%%html
<script src="https://bits.csb.pitt.edu/preamble.js"></script>
Molecular Dynamics¶
The simulation of the physical motions of atoms using classical physics (Newton's equations of motion).
- Forces determined by a force field
- Biological force fields are primarily calibrated for proteins
- Simulation done in a small box (usually only a few proteins)
- Result is an approximation
%%html
<div id="md1" style="width: 500px"></div>
<script>
$('head').append('<link rel="stylesheet" href="https://bits.csb.pitt.edu/asker.js/themes/asker.default.css" />');
var divid = '#md1';
jQuery(divid).asker({
id: divid,
question: "Assuming sufficient simulation time, which of the following processes can <b>not</b> be observed in a molecular dynamics simulation?",
answers: ["The disassociation of two proteins",'The folding of a protein','The construction of RNA by RNA polymerase',
'The conformational sampling of residue side chains or short loop regions '],
server: "https://bits.csb.pitt.edu/asker.js/example/asker.cgi",
charter: chartmaker})
$(".jp-InputArea .o:contains(html)").closest('.jp-InputArea').hide();
</script>
Why Simulate?¶
- assess stability of model
- observe dynamics of interactions
- observe effects of mutations
- compute properties of the ensemble
- ...
A typically simulation ranges from 10ns to 100ns, so we are limited in the sorts of processes that can be observed.
MD Packages¶
Amber http://ambermd.org
- Fastest GPU implementation (in my experience)
Gromacs http://www.gromacs.org
- Open-source (LGPL)
NAMD http://www.ks.uiuc.edu/Research/namd/
- Highly optimized for cluster computing
- Integrated with VMD
LAMMPS http://lammps.sandia.gov
- Open-source (GPL)
Forcefields¶
The forcefield determines what forces are applied to each atom. For example, electrostatic interactions and torsion angle potentials. Examples of force field families:
- CHARMM
- Gromacs
- OPLS
- Amber
$$V(r^N)=\sum_\text{bonds} k_b (l-l_0)^2 + \sum_\text{angles} k_a (\theta - \theta_0)^2 $$ $$+ \sum_\text{torsions} \frac{1}{2} V_n [1+\cos(n \omega- \gamma)] +\sum_{j=1} ^{N-1} \sum_{i=j+1} ^N \biggl\{\epsilon_{i,j}\biggl[\left(\frac{r_{0ij}}{r_{ij}} \right)^{12} - 2\left(\frac{r_{0ij}}{r_{ij}} \right)^{6} \biggr]+ \frac{q_iq_j}{4\pi \epsilon_0 r_{ij}}\biggr\} $$
Timestep¶
Every timestep of the simulation (1 or 2 femtoseconds) all the forces exerted on every atom are calculated and the positions and velocities are updated appropriately according to Newton's laws of motion.
%%html
<div id="md2" style="width: 500px"></div>
<script>
$('head').append('<link rel="stylesheet" href="https://bits.csb.pitt.edu/asker.js/themes/asker.default.css" />');
var divid = '#md2';
jQuery(divid).asker({
id: divid,
question: "How many seconds is a femtosecond?",
answers: ["10^-18","10^-15","10^-12","10^-9","10^-6"],
server: "https://bits.csb.pitt.edu/asker.js/example/asker.cgi",
charter: chartmaker})
$(".jp-InputArea .o:contains(html)").closest('.jp-InputArea').hide();
</script>
File Formats¶
Different packages use different file formats and have different approaches to setting up and running a simulation. Typically, to start a simulation you need
- a configuration file that specifies how the simulation should be run
- a topology of your system
- initial coordinates of your system (may include velocities)
The output of the simulation is a trajectory file.
!wget http://mscbio2025.csb.pitt.edu/files/shmt2.prmtop
!wget http://mscbio2025.csb.pitt.edu/files/shmt2.dcd
!wget http://mscbio2025.csb.pitt.edu/files/shmt2.inpcrd
--2023-10-10 08:42:54-- http://mscbio2025.csb.pitt.edu/files/shmt2.prmtop Resolving mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)... 136.142.4.139 Connecting to mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)|136.142.4.139|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 17559824 (17M) Saving to: ‘shmt2.prmtop’ shmt2.prmtop 100%[===================>] 16.75M 60.1MB/s in 0.3s 2023-10-10 08:42:54 (60.1 MB/s) - ‘shmt2.prmtop’ saved [17559824/17559824] --2023-10-10 08:42:54-- http://mscbio2025.csb.pitt.edu/files/shmt2.dcd Resolving mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)... 136.142.4.139 Connecting to mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)|136.142.4.139|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 102836596 (98M) [application/DCD] Saving to: ‘shmt2.dcd’ shmt2.dcd 100%[===================>] 98.07M 67.4MB/s in 1.5s 2023-10-10 08:42:55 (67.4 MB/s) - ‘shmt2.dcd’ saved [102836596/102836596] --2023-10-10 08:42:56-- http://mscbio2025.csb.pitt.edu/files/shmt2.inpcrd Resolving mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)... 136.142.4.139 Connecting to mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)|136.142.4.139|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 3127961 (3.0M) Saving to: ‘shmt2.inpcrd’ shmt2.inpcrd 100%[===================>] 2.98M --.-KB/s in 0.05s 2023-10-10 08:42:56 (63.3 MB/s) - ‘shmt2.inpcrd’ saved [3127961/3127961]
Configuration File¶
These are usually plain text and extremely obtuse without reading the documentation.
Amber:
md of hsp27 implicit &cntrl imin = 0, ntb = 0, igb = 1, ntpr = 100, ntwx = 5000, ntwr = 5000, ntpr = 5000, ntt = 3, gamma_ln = 1.0, tempi = 300.0, temp0 = 300.0 nstlim = 50000000, dt = 0.002, ntc = 2, cut = 999 /
Topology¶
The topology stores connectivity and atom type information about a model, but no coordinates.
- .prmtop - Amber
- .psf - NAMD
- .top - Gromacs
Typically you will create a topology from a PDB using the tools provided with the simulation package. For example, you might use tleap in Amber to solvate the protein and create initial coordinates and a topology that includes information about your chosen forcefield.
!head -15 shmt2.prmtop
%VERSION VERSION_STAMP = V0001.000 DATE = 09/02/15 08:40:14 %FLAG TITLE %FORMAT(20a4) default_name %FLAG POINTERS %FORMAT(10I8) 85695 17 78475 7362 16384 9960 33288 31170 0 0 174624 24681 7362 9960 31170 67 152 191 36 1 0 0 0 0 0 0 0 2 26 0 0 %FLAG ATOM_NAME %FORMAT(20a4) N H1 H2 H3 CA HA CB HB2 HB3 CG CD1 HD1 NE1 HE1 CE2 CZ2 HZ2 CH2 HH2 CZ3 HZ3 CE3 HE3 CD2 C O N H CA HA CB HB CG2 HG21HG22HG23OG1 HG1 C O N H CA HA2 HA3 C O N H CA HA CB HB2 HB3 CG HG2 HG3 CD OE1 NE2
Coordinates¶
A coordinate file provides the (x,y,z) coordinates of each atom in your system.
For most coordinate formats, the file is useless without the topology.
Some formats also include velocities and/or energies and can be used to restart the simulation.
- .pdb - Gromacs, NAMD
- .inpcrd - Amber
- .rst - Amber Restart
!head shmt2.inpcrd
default_name 85695 56.0088807 71.7060934 26.6369124 55.9058245 72.6656576 26.9347665 55.3430908 71.5216067 25.9001704 56.9898190 71.5152073 26.4905496 55.5961623 70.8150285 27.7784433 54.7545753 70.2022149 27.4555036 56.7364989 69.8797176 28.2523746 57.1310507 69.3326367 27.3961441 57.5399895 70.4472121 28.7218987 56.2765422 68.8069283 29.2961567 55.8725428 67.5289634 29.0305627 55.8943751 67.2239618 27.9843327 55.5106592 66.8784844 30.1940309 55.1697749 65.9298887 30.2577772 55.6890532 67.7312641 31.2490638 55.4724345 67.5312695 32.6151225
Trajectory¶
The trajectory is the result of the simulation. These files store the coordinates of every atom of the simulation for every output timestep of the simulation.
These files can be huge, so the minimum amount of information is usually stored (i.e. just coordinates) and the file is useless without the topology.
- .trj, .trr - Gromacs full trajectory
- .xtc - Gromacs compressed trajectory
- .dcd - NAMD
- .mdcrd, .nc - Amber
!head shmt2.dcd
T CORDd o�: # T T Cpptraj generated dcd file. T �N �: ��B�\�BԇB�хB���Bhd�B��BT��B��B���B�:�BCS�B L�B&��B�̈́B�Y�B$2�B�5�BكBn�B��Bw݅B�׆B�̅B�B���B0��B��B[Bʍ�B��zB/�vB|B@yB�9|B��B)�yBCHxB�|B�_{B��|B��~BڅzB��{B��|B�vtBL�rB�QqBp�rB�zkBN�iB(�jBŮlB�ffB=:lB�lB�lpB.�hB �jBdBX�aB� cBVhB�OcBW�jB��nBE#hB��dB/�kB�eiB�nB�pB�bnB�7qB7JuB�xB<�uB�eB�BgB �aB��`B_BF\^B0qYB��WB�VBI�YB�?[BN�bBػfBU�aB-J_B��dBB)iB��bB��cB��^B|�dBf�cBkB�IlB70lB�mB��aBѱ]B lbBpcB�idB�nhB._B��\B[�^Brh`B��XB1�WB+�VBq�UB��VBA aB[�aB�lbB�zaB�CeB�fdB��bB��bB�!eB$]BƢZB�E[B`kB�#nB�SmB��jB� sB�#uB��tB�GyB�sB[�rB-�nB+uB��xB-�qB9�nBksBI�vBP�tBg�vB�qBƑyB�ozB�s|B��nB��kB�nB5EqB*XuB�pB�oB��qB%�oB��iB�ghB��fBKjBBfB�RkB��gB�pB�xsB��qB�pB�wB0ayB��xBu�yB�oxBc;xB��B�m�Bs$�B�aoBD�mB�zoBŐpBKwmB��nBapB�ltBJ'oBR�nB��pB[�jBY�oBl�vBoxBG�xB�lxB�mgB�dB+eB��gB�O_B`�]Bt^B_�_Bm5`B�XB�VBʈYB�_QB3OB�_OB8)JB��FB��HB�DB� MB��KBoCRB'WUB��SBE0\B��WBP^B2�aB�*\B�WB{�^BscB�]B�]B�[^B>�^B�WBl�TB�;TB�\Bk�XB�`BdB�LaB#V`B��fB<hB{�fBıkBi�kB��pB-�pB�"tBfvqBepkB`�gB�bnB��kB-*]B�ZBhl\B�_BO�XB�YBgBYBzYB��UB�^B9�aB�G]B�pYB9�`B~�]B�^B4 `B�hbB~=[B�RBu�OB[�QBU�TB�hLB)IB�{LB�;PB �IB��IB|(JBg�FBVJB�ZOB#�PB��QB �NB/{QB�KB�;IB��IB��FB�?KB�LB�PHB�@>B��:B�'>B��@BG:Bjr:B��;B8<B��8B�lAB}1AB7BDB�kCBN|@B�HB_nJB�^KB24B��0B�2Bˀ5B IB|�JB1�/B �-B-e3B%�4B,�4B�d4B�9B�#1B�/B�0B-�1B��-B��*B (+B)U.BF�)B�h&BB�'B5�#B�a"B�� B|1'B?"B��#B��B�8!B�[2B�C4B��3B�c2B��7BI�;B\8B��8B�5B�!=B��<B@�=B�BB�jCB|�DB�5Bb�2Bd|8B��;B�S6B}4B�2B 5B��1B�&-B5�-B��*Boa'B=�-B� BBKS?B<CB3d?BQBB)M>B�!:B��7B�X8B�bKBuwNBd�JB�_HB >NBI�NB3�KB��GBB�LB3�MB��SB��WB5�SB�|PB��XB��ZBWBƏSB��UBs[B��]BM�]B,"`B��`BdZBlsXB��XB�XTB~:ZB&!\B�`B:�YBVB��\B��`Bw�ZB��\B�YB��_B��aB�c_Bna^B��[B�[B��WB�]B��\Bo�dB��hB5�eB<nbB �jBA�mB,aiB��lBI�gB6dB}�`B��kBU�pBK�hB'YeBZ�jBO�kB6�eBvdB�1gB��`B2c_B��\B��^B�YB��[B��aBh fBH=_B�`B�ToB�rB��pB�,nB��uBcvB�3uBQKrB��sBOpzB��{B֙}B�Y�B�}BϵzBQBE$zB�vB �sBu�zB�V{B8~B�~Bɬ�B��|B��B`�vB��yB��qBv�mB�ZmB�nB�3hB��dBE�gB��hB
%%html
<div id="mdsz" style="width: 500px"></div>
<script>
var divid = '#mdsz';
jQuery(divid).asker({
id: divid,
question: "If a trajectory file stores nothing but the single precision (4 byte) coordinates of 1000 frames of a simulation of a 500 atom system, how big is the trajectory file?",
answers: ["0.5 MB", "1.5 MB","5.72 MB","6.0 MB",'16 MB',"48 MB"],
server: "https://bits.csb.pitt.edu/asker.js/example/asker.cgi",
charter: chartmaker})
$(".input .o:contains(html)").closest('.input').hide();
</script>
mdanalysis¶
https://code.google.com/p/mdanalysis/
"MDAnalysis is an object-oriented python toolkit to analyze molecular dynamics trajectories generated by CHARMM, Gromacs, NAMD, LAMMPS, or Amber."
import MDAnalysis
universe = MDAnalysis.Universe('shmt2.prmtop', 'shmt2.dcd')
MDAnalysis starts with a topology and a trajectory.
Atom Groups¶
universe.atoms
<AtomGroup with 85695 atoms>
You can select a specific group of atoms (very similar to ProDy) using atom selections.
universe.select_atoms("protein")
<AtomGroup with 14432 atoms>
Selections can work directly on AtomGroups
universe.select_atoms("resname PRO")
<AtomGroup with 644 atoms>
universe.select_atoms("byres around 5 resid 370")
<AtomGroup with 245 atoms>
prot = universe.select_atoms("protein")
prot.select_atoms("byres around 5 resid 370") #select whole residues within 5 of residue 100
<AtomGroup with 209 atoms>
Like ProDy, can iterate over atoms
for a in prot.atoms[:3]:
print(a)
<Atom 1: N of type N3 of resname TRP, resid 1 and segid SYSTEM> <Atom 2: H1 of type H of resname TRP, resid 1 and segid SYSTEM> <Atom 3: H2 of type H of resname TRP, resid 1 and segid SYSTEM>
Note that atoms retain information about residues (but generally not chains)
print(a.resid,a.resname,a.resnum)
1 TRP 1
Residues¶
AtomGroups can also be traversed and viewed at a residue level
prot.residues
<ResidueGroup with 924 residues>
%%html
<div id="mdres" style="width: 500px"></div>
<script>
var divid = '#mdres';
jQuery(divid).asker({
id: divid,
question: "How many residues are there in our shmt2 protein?",
answers: [ "22","924","1024","24681","85695"],
server: "https://bits.csb.pitt.edu/asker.js/example/asker.cgi",
charter: chartmaker})
$(".jp-InputArea .o:contains(html)").closest('.jp-InputArea').hide();
</script>
Trajectories¶
universe.trajectory
<DCDReader shmt2.dcd with 100 frames of 85695 atoms>
The coordinates of atoms are determined by the current position in the trajectory (trajectory.frame
)
The coordinates of selections refer to whatever the current trajectory frame is
The current frame is set by iterating over the trajectory or indexing into it.
for ts in universe.trajectory[:5]:
print(ts.frame, universe.trajectory.frame, ts.time, prot.center_of_mass())
0 0 4.8888212322065376e-05 [63.06938232 61.67971211 47.8073606 ] 1 1 9.777642464413075e-05 [63.0723881 61.69172611 47.80475064] 2 2 0.00014666463696619611 [63.05890957 61.67358912 47.8136455 ] 3 3 0.0001955528492882615 [63.08430546 61.67402582 47.80758225] 4 4 0.0002444410616103269 [63.07228252 61.68630054 47.7999566 ]
Analysis¶
A number of packages have been contributed to MDAnalysis to perform common tasks.
import MDAnalysis.analysis
PACKAGE CONTENTS
- align - aligning structures
- contacts - native contact analysis
- density - compute water densities
- distances - for computing distances
- gnm
- hbonds - hydrogen bond analysis
- helanal - analysis of helices
- hole - for analyzing pores
- leaflet
- nuclinfo - analysis of nucleic acids
- psa - path simularity
- rms
- waterdynamics - water analysis
- x3dna - a different nucleic analysis
Alignment¶
from MDAnalysis.analysis.align import *
Can align a single structure with alignto
Use AlignTraj
to align and write out a full trajectory (trajectories are not kept in memory)
universe.trajectory[0]
#if we align to ourselves, will fit to current frame
alignment = AlignTraj(universe, universe, select='protein',filename='rmsfit.dcd')
alignment.run()
/home/dkoes/.local/lib/python3.10/site-packages/MDAnalysis/coordinates/DCD.py:432: UserWarning: No dimensions set for current frame, zeroed unitcell will be written warnings.warn(wmsg)
<MDAnalysis.analysis.align.AlignTraj at 0x7fc3b7f1d450>
MDAnalysis.rms¶
from MDAnalysis.analysis.rms import * #this pulls in an rmsd function
Root mean squared deviation (RMSD) $$\sqrt{\frac{\sum_i^n(x_i^a-x_i^b)^2+(y_i^a-y_i^b)^2+(z_i^a-z_i^b)^2}{n}}$$
universe.trajectory[0] #sets the current frame to the start
refcoord = prot.positions # once stored, _coordinates_ do NOT change with trajectory
refcoord
array([[67.44726 , 79.259 , 22.918747], [68.181496, 78.82586 , 22.377106], [67.914246, 80.0178 , 23.394417], ..., [54.802658, 77.382744, 36.064526], [55.336662, 76.558395, 36.86686 ], [55.43933 , 77.794464, 35.015507]], dtype=float32)
universe.trajectory[-1] #last frame
print(rmsd(refcoord,prot.positions))
2.3666243947311068
Plot both the all-protein and calpha only (name CA
) rmsd with the first frame.
universe.trajectory[0]
protref = prot.positions
caref = prot.select_atoms('name CA').positions
protrmsd = []
carmsd = []
for ts in universe.trajectory:
protrmsd.append(rmsd(protref,prot.positions))
carmsd.append(rmsd(caref,prot.select_atoms('name CA').positions))
n = universe.trajectory.n_frames
plt.plot(range(n),protrmsd,range(n),carmsd)
plt.xlabel("Frame #")
plt.ylabel('RMSD')
plt.legend(['Protein','CA'],loc='lower right');
%%html
<div id="mdtri" style="width: 500px"></div>
<script>
var divid = '#mdtri';
jQuery(divid).asker({
id: divid,
question: "If frame 40 is ~2 RMSD from the start and frame 80 is ~2 RMSD from the start. What can be said about the RMSD between frames 40 and 80?",
answers: ['It is ~0', 'It is < ~2','It is < ~4','Nothing'],
server: "https://bits.csb.pitt.edu/asker.js/example/asker.cgi",
charter: chartmaker})
$(".jp-InputArea .o:contains(html)").closest('.jp-InputArea').hide();
</script>
Project¶
- Compute RMSDs
- between frame 0 and frame 40
- between frame 40 and frame 80
- between frame 0 and frame 80
- Compute all RMSDs (put in a list)
- to starting frame
- to ending frame
- Plot start/end RMSDs
- Align protein to only resids 760-924, redo above
!wget http://mscbio2025.csb.pitt.edu/files/shmt2.prmtop
!wget http://mscbio2025.csb.pitt.edu/files/shmt2.dcd
--2023-10-10 08:43:06-- http://mscbio2025.csb.pitt.edu/files/shmt2.prmtop Resolving mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)... 136.142.4.139 Connecting to mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)|136.142.4.139|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 17559824 (17M) Saving to: ‘shmt2.prmtop.1’ shmt2.prmtop.1 100%[===================>] 16.75M 63.6MB/s in 0.3s 2023-10-10 08:43:06 (63.6 MB/s) - ‘shmt2.prmtop.1’ saved [17559824/17559824] --2023-10-10 08:43:06-- http://mscbio2025.csb.pitt.edu/files/shmt2.dcd Resolving mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)... 136.142.4.139 Connecting to mscbio2025.csb.pitt.edu (mscbio2025.csb.pitt.edu)|136.142.4.139|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 102836596 (98M) [application/DCD] Saving to: ‘shmt2.dcd.1’ shmt2.dcd.1 100%[===================>] 98.07M 65.7MB/s in 1.5s 2023-10-10 08:43:08 (65.7 MB/s) - ‘shmt2.dcd.1’ saved [102836596/102836596]