Molecular simulations on GPUs



About

The SOP-GPU package, where SOP stands for the Self-Organized Polymer model fully implemented on a GPU, is a scientific software package designed to perform Langevin dynamics simulations of the mechanical or thermal unfolding and the mechanical indentation of large biomolecular systems on the experimental subsecond (millisecond-to-second) timescale. The SOP-GPU package combines a C_α-based coarse-grained description of proteins with the computational power of modern Graphics Processing Units (GPUs).

Documentation

For a detailed explanation of all the constant parameters and of the input and output file formats, please refer to the SOP-GPU manual. Some technical details of the algorithms used are given here. See also the examples below for tips on running simulations with the SOP-GPU program.

Downloads

You can download the latest source code here. Please feel free to contact us if you have questions or comments, or if you would like to obtain the most recent version of the SOP-GPU package. We would appreciate your suggestions on how to improve SOP-GPU.

Installation instructions

You will need the NVIDIA developer driver and the CUDA Toolkit. Use these tips for installation instructions (Ubuntu Linux).
Download this archive with the program.
Unpack the archive.
In a terminal window, cd to the SOP-GPU folder and type 'make'. This compiles the program and creates two executable files: sop-gpu and sop-top.
To install the SOP-GPU program, copy the executable files to the /usr/bin folder, or type 'sudo make install'.

Using SOP-GPU

Preparing initial (all-atom) pdb structure

Download the protein structure file from the Protein Data Bank. One of the simplest examples is a WW domain, which can be extracted from the PDB entry 1PIN.
Truncate the PDB file, keeping only the part needed for simulations. In the case of 1PIN.pdb (for the WW domain), you will need to delete almost everything, leaving only residues 6 to 39 of chain A*.

*Note: the current version of the "sop-top" program from the SOP-GPU package reads only the "ATOM" and "SSBOND" sections of the PDB file. If you have S-S bonds in your system, make sure that you keep the corresponding "SSBOND" entries in your truncated file.
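The truncation step above can be done by hand in a text editor, but a short script makes it repeatable. The following is a minimal sketch, not part of the SOP-GPU package: the function name `truncate_pdb` is hypothetical, and it assumes the standard fixed-column PDB layout (chain ID in column 22, residue number in columns 23-26). It keeps every "SSBOND" record it finds, so entries that refer to deleted residues still have to be pruned manually.

```python
def truncate_pdb(lines, chain="A", first_res=6, last_res=39):
    """Keep SSBOND records plus ATOM records of one chain in a residue range.

    Assumes fixed-column PDB formatting. All other record types are dropped,
    since the note above says sop-top reads only ATOM and SSBOND sections.
    """
    kept = []
    for line in lines:
        record = line[:6].strip()
        if record == "SSBOND":
            # Kept unconditionally; remove entries for deleted residues by hand.
            kept.append(line)
        elif record == "ATOM" and len(line) >= 26:
            chain_id = line[21]          # column 22: chain identifier
            res_seq = int(line[22:26])   # columns 23-26: residue sequence number
            if chain_id == chain and first_res <= res_seq <= last_res:
                kept.append(line)
    return kept


# Example usage (file names are illustrative):
#   with open("1PIN.pdb") as src:
#       kept = truncate_pdb(src.read().splitlines())
#   with open("1PIN_truncated.pdb", "w") as dst:
#       dst.write("\n".join(kept) + "\n")
```

The defaults correspond to the WW domain example (residues 6 to 39 of chain A); pass other values for a different protein.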

Creating the topology file and coarse-grained .pdb file

To create the molecular topology, you will need the configuration file**. A basic example is available here.
Change the name of the protein in the configuration file so that it corresponds to the name of your truncated .pdb file.
If needed, change the name of the input file (all-atom .pdb file), the output files (topology and coarse-grained pdb structure), the cut-off distances for the native contacts, i.e. between a pair of the C_α-carbons (simple Go definition) and between the side chains (full Go definition), and the energy scale (ε_h)***.
Use the 'sop-top' utility by typing the following command in the terminal window:

$ sop-top top.sop
This will generate the topology and coarse-grained .pdb files. These files will be used in the next step by the "sop-gpu" program. You can inspect your .pdb file by loading it into a visualization program such as VMD.

**Note: the configuration file (.sop) is a set of parameter name/value pairs. Each line in the .sop file contains one pair, with the parameter value separated from the parameter name by spaces or TABs. In addition, the "<" and ">" characters can be used to define macros: if the value of one parameter contains the name of another parameter enclosed in "<" and ">", that part of the value is replaced with the value of the other parameter. For example, if you have the following lines in your configuration file:

name 1PIN
structure <name>.pdb

then the program will use the protein structure from the "1PIN.pdb" file. In the configuration file, the "#" character can be used for comments.
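To make the macro rule concrete, here is a small sketch of how such a file could be parsed and expanded. It is not the parser used by SOP-GPU: the function names `parse_sop` and `expand` are hypothetical, and it assumes that "#" starts a comment anywhere on a line and that a macro naming an undefined parameter is left untouched.

```python
import re


def parse_sop(text):
    """Read 'name value' pairs, one per line; '#' starts a comment (assumed)."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)  # split on the first run of spaces/TABs
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


def expand(value, params, depth=0):
    """Replace each <param> in value with that parameter's (expanded) value."""
    if depth > 16:
        raise ValueError("macro expansion is nested too deeply (cycle?)")

    def substitute(match):
        key = match.group(1)
        if key not in params:
            return match.group(0)  # unknown macro: leave it as written
        return expand(params[key], params, depth + 1)

    return re.sub(r"<([^<>]+)>", substitute, value)


config = parse_sop("name 1PIN\nstructure <name>.pdb\n")
print(expand(config["structure"], config))  # prints 1PIN.pdb
```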

***Note: in general, there is no rigorous rule for choosing the ε_h value or the cut-off distances R_limit_bond and SC_limit_bond (see below). These parameters of the SOP model determine how native contacts are defined and how strong they are. One way of choosing them is to try several combinations until the simulation results agree with experiment. For a protein as simple as the WW domain, formed by 34 amino acid residues, the basic definition is good enough: ε_h = 1.5 kcal/mol, R_limit_bond = 8.0 Å, and SC_limit_bond = 5.2 Å. In general, ε_h should be in the range of 1.0 to 1.5 kcal/mol. The standard cut-off distances are R_limit_bond = 8 Å and SC_limit_bond = 0 Å for the simple Go definition, and R_limit_bond = 8 Å and SC_limit_bond = 5.2 Å for the full Go definition. The "sop-top" utility also supports more sophisticated topology definitions, such as constructing protein tandems or taking the ε_h value for each residue from the initial PDB file. Please see the examples section and the SOP-GPU manual for more details. Also, if you wish to use separate folder(s) for the output files, you need to create these folders before running the "sop-top" utility.
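As an illustration of what a distance cut-off such as R_limit_bond means, the sketch below lists pairs of C_α particles that lie within a given distance of each other. It only illustrates the simple Go idea and does not reproduce what "sop-top" does internally: the function name `native_contacts`, the minimum sequence separation of 3, and the use of "less than or equal" at the cut-off are all assumptions, and the side-chain criterion of the full Go definition is not covered.

```python
import math


def native_contacts(ca_coords, cutoff=8.0, min_separation=3):
    """Return index pairs (i, j) of C-alpha particles closer than `cutoff` (in Å).

    `min_separation` skips residues that are near neighbours along the chain;
    its value here is an assumption, not a documented SOP-GPU setting.
    """
    contacts = []
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                contacts.append((i, j))
    return contacts


# Toy coordinates (Å): four particles on a line, a fifth folded back.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
          (11.4, 0.0, 0.0), (3.8, 3.8, 0.0)]
print(native_contacts(coords))  # prints [(0, 4), (1, 4)]
```

Raising the cut-off adds contacts and lowering it removes them, which is why the choice of cut-off and of ε_h together controls how stable the model protein is.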

Equilibrium simulations

To start equilibrium simulation runs, use the following steps:

1. Start from this configuration file. If you are using the WW domain example, you can go directly to step 4.
2. In the configuration file, change the name of the protein to whatever name you wish to use.
3. In the configuration file, change the number of integration steps (parameter "numsteps") and the random seed (parameter "seed").
4. Once the configuration file is ready, type the following command in the terminal window:
$ sop-gpu equil.sop

This will start the equilibrium simulations and print the output on the screen. The SOP-GPU program reports the current timestep, the energies and the Maxwell-Boltzmann temperature of the system, and an estimate of how long the simulation run will take. All the energies and the temperature are saved in the TAB-separated output (.dat) file. The program also saves a .dcd file with the coordinates of the C_α-particles. The names of these output files are specified in the .sop configuration file. You can view the current structure by loading the .dcd file into a visualization program such as VMD, together with a coarse-grained .pdb file. Here, you can use either the initial .pdb file (generated by "sop-top") or the reference file saved by the SOP-GPU program. A detailed description of the energy output (.dat) file can be found in the SOP-GPU manual.

Pulling a protein

To start pulling simulation runs, use the following steps:

1. Use this configuration file to start. If you are using the WW domain example, you can go directly to step 6.
2. As before, change the protein name (parameter "name").
3. Update the numbers of fixed and pulled residues and their particle IDs. If you wish to constrain just one residue and pull the protein at a single residue, change the parameters "fixed1" and "pulled1" to the desired particle IDs. Note that these are particle IDs, which start from zero, not the actual protein residue numbers. To get the particle ID, find the residue in the coarse-grained .pdb file generated by "sop-top" and subtract 1 from the value in the second column (atom serial number).
4. You may also want to change the pulling speed (parameter "deltax") and the cantilever spring constant (parameter "k_trans"). By default, these parameters are set to the standard experimental values of 2.5 µm/s and 35 pN/nm, respectively.
5. Adjust the number of integration steps (parameter "numsteps").
6. To start pulling simulations, type the following command in the terminal window:
$ sop-gpu pull.sop
In addition to the output for energies and coordinates, this command will also create the TAB-separated "pulling" (.dat) output file, which stores the position of the cantilever tip (in Å) and the values of the molecular and cantilever spring forces (in kcal/(mol·Å)). For more details on the format of this file, see the SOP-GPU manual.
7. To get a force-extension curve, plot the data in column 4 against the data in column 2 of the "pulling" output file. Note that the force is given in units of kcal/(mol·Å) and the distance is reported in Å.
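Experimental force-extension curves are usually reported in pN and nm, so it can help to convert the columns before plotting. The sketch below is one way to do that, under stated assumptions: the function name `force_extension` is hypothetical, the file is assumed to contain TAB-separated numeric columns (lines that do not parse, such as a possible header, are skipped), and the column numbers follow the step above. The conversion factor follows from 1 kcal = 4184 J and Avogadro's number, which gives roughly 69.48 pN per kcal/(mol·Å).

```python
AVOGADRO = 6.02214076e23        # particles per mole
JOULES_PER_KCAL = 4184.0        # thermochemical kilocalorie
METERS_PER_ANGSTROM = 1e-10

# Force of 1 kcal/(mol*Å) expressed in piconewtons (about 69.48 pN).
PN_PER_KCAL_MOL_A = JOULES_PER_KCAL / AVOGADRO / METERS_PER_ANGSTROM * 1e12


def force_extension(text, x_col=2, f_col=4):
    """Extract (distance in nm, force in pN) pairs from TAB-separated text.

    Columns are 1-based. Lines that lack the requested columns or contain
    non-numeric values there are skipped.
    """
    points = []
    for line in text.splitlines():
        fields = line.split("\t")
        try:
            distance_angstrom = float(fields[x_col - 1])
            force_kcal = float(fields[f_col - 1])
        except (IndexError, ValueError):
            continue
        points.append((distance_angstrom / 10.0, force_kcal * PN_PER_KCAL_MOL_A))
    return points


# Example usage (the file name is illustrative):
#   with open("pull.dat") as f:
#       curve = force_extension(f.read())
```

The resulting list of pairs can be passed to any plotting tool.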

Many-runs-per-GPU approach

This approach allows one to perform many simulation runs concurrently on a single GPU device in order to obtain multiple independent trajectories for the same system. Here, we provide an example of how to run equilibrium simulations (steps 1-3) as well as pulling simulations (steps 4 and 5) using the many-runs-per-GPU approach. To initiate many independent runs, use the following steps:

1. Go back to the "equil.sop" configuration file and change the parameter "run 1" to "firstrun 1".****
2. Add the following line to the configuration file:
runnum 100
3. This will start 100 trajectories and generate many output files. You might want to put them into separate folders. If you decide to do so, you will need to create all the output folders before running the simulations. If you change, say, the parameter "DCDfile" to "dcd/<name>_<run>_<stage>.dcd", you will need to create the "dcd" subfolder in the simulation folder.
4. Make similar changes in the "pull.sop" configuration file. Since pulling simulations use the last frame from the equilibrium simulations, make sure to change the parameter "coordinates".
5. To start pulling simulations following equilibrium simulations, use this shell script. To run the script, simply type the following command in the terminal window:
$ sh runall.sh

**** Since you have removed the parameter "run" from the configuration file, the "<run>" macro now expands to each of the "runnum" run numbers (100 in this case), starting from the value of the parameter "firstrun" (1 in this case). If you have, say, "<run>.dcd" as your DCD output, the SOP-GPU program will save the files "1.dcd" through "100.dcd", i.e. one file for each trajectory. Make sure that you keep the "<run>" macro in all the output file names so that the trajectories do not overwrite each other's files.
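Because the output folders must exist before the simulations start, it can be convenient to list the file names that a template will produce and create their folders in advance. The following sketch does this under stated assumptions: the function names are hypothetical, the value "equil" used for the "<stage>" macro is an illustrative guess (the manual defines what this macro actually expands to), and the run numbering follows the "firstrun" and "runnum" rule described above.

```python
from pathlib import Path


def run_output_paths(template, name, firstrun, runnum, stage):
    """Expand <name>, <run> and <stage> for runs firstrun..firstrun+runnum-1."""
    paths = []
    for run in range(firstrun, firstrun + runnum):
        filled = (template.replace("<name>", name)
                          .replace("<run>", str(run))
                          .replace("<stage>", stage))
        paths.append(Path(filled))
    return paths


def make_output_dirs(paths, root="."):
    """Create the parent folder of every output path under `root`."""
    for path in paths:
        (Path(root) / path).parent.mkdir(parents=True, exist_ok=True)


# "equil" is a placeholder value for <stage>, chosen only for this example.
paths = run_output_paths("dcd/<name>_<run>_<stage>.dcd", "1PIN", 1, 100, "equil")
print(paths[0], paths[-1], len(paths))

# To create the folders in the current simulation folder:
#   make_output_dirs(paths)
```

Checking that the expanded names are all distinct is a quick way to confirm that the "<run>" macro is present in a template.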

Examples

The WW domain and tandems of WW monomers using the many-runs-per-GPU approach [tarball].
Forced indentation of the bacteriophage HK97 [tarball].

Questions?

For questions regarding how to install and run the SOP-GPU package, please contact:

Artem Zhmurov at zhmurov@gmail.com
Valeri Barsegov at Valeri_Barsegov@uml.edu

For questions about the SOP model, please contact:

Ruxandra Dima at dimari100@gmail.com

Acknowledgement

Please be sure to cite SOP-GPU in your publications by referring to our papers:

A. Zhmurov, R. I. Dima, Y. Kholodov, and V. Barsegov, "SOP-GPU: Accelerating biomolecular simulations in the centisecond timescale using graphics processors", Proteins, 78, 2984-2999 (2010).
A. Zhmurov, K. Rybnikov, Y. Kholodov, and V. Barsegov, "Generation of random numbers on graphics processors: Forced indentation in silico of the bacteriophage HK97", J. Phys. Chem. B, 115, 5278-5288 (2011).