6. Running a Gromacs/LSDMap Workload¶

This section will discuss details about the execution phase. The input to the tool is given in terms of a resource configuration file and a workload configuration file. The execution is started based on the parameters set in these configuration files. In section 4.1, we discuss execution on Stampede and in section 4.2, we discuss execution on Archer.

6.1. Introduction¶

DM-d-MD (Diffusion-Map directed Molecular Dynamics) is an adaptive sampling algorithm based on LSDMap (Locally Scaled Diffusion Map), a nonlinear dimensionality reduction technique which provides a set of collective variables associated with slow time scales of Molecular Dynamics simulations (MD).

For an introduction to DM-d-MD, including how it can be used as a stand-alone tool, see J.Preto and C. Clementi, Phys. Chem. Chem. Phys., 2014, 16, 19181-19191 .

In a nutshell, DM-d-MD consists in periodically restarting multiple parallel GROMACS MD trajectories from a distribution of configurations uniformly sampled along LSDMap coordinates. In this way, during each DM-d-MD cycle, it becomes possible to visit a wider area of the configuration space without remaining trapped in local minima as it could be the case for plain MD simulations. As another feature, DM-d-MD includes a reweighting scheme that is used to keep track of the free energy landscape all along the procedure. A typical DM-d-MD cycle includes the following steps:

Simulation: Short MD trajectories are run using GROMACS starting from the set of configurations selected in step 3 (one trajectory per configuration). For the first cycle, the trajectories start from configurations specified within an input file provided by the user (option md_input_file in the Workload configuration file).
Analysis: LSDMap is computed from the endpoints of each trajectory. The LSDMap coordinates are stored in a file called lsdmap.ev.
Select + Reweighting: New configurations are selected among the endpoints so that the distribution of new configurations is uniform along LSDMap coordinates. The same endpoint can be selected as a new configuration more than once or can be not selected. At the same time, a statistical weight is provided to each new configuration in order to recover the free energy landscape associated with regular MD. The weights are stored in a file called weight.w.

In common with the other ExTASY workflows, a user prepares the necessary input files and ExTASY configuration files on their local workstation, and launches the job from there, but the calculations are then performed on the execution host, which is typically an HPC resource.

6.2. Required Input files¶

The GROMACS/LSDMap (DM-d-MD) workflow requires the user to prepare at least three GROMACS-style files, one configuration file used for LSDMap, and two ExTASY configuration files.

A topology file (.top format) (specified via the option top_file in the Workload configuration file).
An initial structure file (.gro format) (specified via the option md_input_file in the Workload configuration file).
A parameter file (.mdp format) for MD simulations (specified via the option mdp_file in the Workload configuration file).
A configuration file used for LSDMap (.ini format) (specified via the option lsdm_config_file in the Workload configuration file)
An ExTASY Resource configuration (*.rcfg) file.
An ExTASY Workload configuration (*.wcfg) file.

For more information about .top, .gro and .mdp formats, we refer the user to the following website http://manual.gromacs.org/current/online/files.html. Please note that the parameter “nsteps” specified in the .mdp file should correspond to the number of MD time steps of each DM-d-MD cycle. Documentation on GROMACS can be found on the official website: http://www.gromacs.org.

Here is an example of a typical LSDMap configuration file (config.ini):

[LSDMAP]
;metric used to compute the distance matrix (rmsd, cmd, dihedral)
metric=rmsd

;constant r0 used with cmd metric
r0=0.05

[LOCALSCALE]
;status (constant, kneighbor, kneighbor_mean)
status=constant

;constant epsilon used in case status is constant
epsilon=0.05

;value of k in case status is kneighbor or kneighbor_mean
k=30

Notes:

See the paper W. Zheng, M. A. Rohrdanz, M. Maggioni and C. Clementi, J. Chem. Phys., 2011, 134, 144109 for more information on how LSDMap works.
metric is the metric used with LSDMap (only rmsd, cmd (contact map distance) and dihedral metric are currently supported, see the paper `P. Cossio, A. Laio and F. Pietrucci, Phys. Chem. Chem. Phys., 2011, 13, 10421–10425 <http://pubs.rsc.org/en/Content/ArticleLanding/2011/CP/c0cp02675a#!divAbstract>`_for more information).
status in the section LOCALSCALE refers to the way the local scale is computed when performing LSDMap. constant means that the local scale is the same for all the configurations and is equal to the value specified via the parameter epsilon (in nm). kneighbor implies that the local scale of each MD configuration is given as the distance to its kth nearest neighbor, where k is given by the parameter k. kneighbor_mean means that the local scale is the same for all the configuration and is equal to the average kth-neighbor distance.

The resource and workload configuration files are discussed specific to the resource in the forthcoming sections. In section 6.3, we discuss execution on Stampede and in section 6.4, we discuss execution on Archer.

6.3. Running on Stampede¶

This section is to be done entirely on your laptop. The ExTASY tool expects two input files:

The resource configuration file sets the parameters of the HPC resource we want to run the workload on, in this case Stampede.

The workload configuration file defines the GROMACS/LSDMap workload itself. The configuration file given in this example is strictly meant for the gromacs-lsdmap usecase only.

Step 1: Create a new directory for the example,

mkdir $HOME/extasy-tutorial/
cd $HOME/extasy-tutorial/

Step 2: Download the config files and the input files directly using the following link.

wget https://bitbucket.org/extasy-project/extasy-workflows/downloads/grlsd-on-stampede.tar
tar xf grlsd-on-stampede.tar
cd grlsd-on-stampede

Step 3: In the grlsd-on-stampede folder, a resource configuration file stampede.rcfg exists. Details and modifications required are as follows:

Note

For the purposes of this example, you require to change only:

UNAME

ALLOCATION

The other parameters in the resource configuration are already set up to successfully execute the workload in this example.

Step 4: In the grlsd-on-stampede folder, a workload configuration file gromacslsdmap.wcfg exists. Details and modifications are as follows:

Note

All the parameters in the above example file are mandatory for gromacs-lsdmap. If ndxfile, grompp_options, mdrun_options and itp_file_loc are not required, they should be set to None; but they still have to mentioned in the configuration file. There are no other parameters currently supported for these examples.

Step 5: You can find the executable script `extasy_gromacs_lsdmap.py` in the grlsd-on-stampede folder.

Now you are can run the workload using :

python extasy_gromacs_lsdmap.py --RPconfig stampede.rcfg --Kconfig gromacslsdmap.wcfg

Note

Environment variable RADICAL_ENMD_VERBOSE is set to REPORT in the python script. This specifies the verbosity of the output. For more verbose output, you can use INFO or DEBUG.

Note

Time to completion: ~13 mins (from the time job goes through LRMS)

6.4. Running on Archer¶

This section is to be done entirely on your laptop. The ExTASY tool expects two input files:

The resource configuration file sets the parameters of the HPC resource we want to run the workload on, in this case Archer.

The workload configuration file defines the CoCo/Amber workload itself. The configuration file given in this example is strictly meant for the gromacs-lsdmap usecase only.

Step 1: Create a new directory for the example,

mkdir $HOME/extasy-tutorial/
cd $HOME/extasy-tutorial/

Step 2: Download the config files and the input files directly using the following link.

wget https://bitbucket.org/extasy-project/extasy-workflows/downloads/grlsd-on-archer.tar
tar xf grlsd-on-archer.tar
cd grlsd-on-archer

Step 3: In the grlsd-on-archer folder, a resource configuration file archer.rcfg exists. Details and modifications required are as follows:

Note

For the purposes of this example, you require to change only:

UNAME

ALLOCATION

The other parameters in the resource configuration are already set up to successfully execute the workload in this example.

Step 4: In the grlsd-on-archer folder, a workload configuration file gromacslsdmap.wcfg exists. Details and modifications required are as follows:

Note

All the parameters in the above example file are mandatory for gromacs-lsdmap. If ndxfile, grompp_options, mdrun_options and itp_file_loc are not required, they should be set to None; but they still have to mentioned in the configuration file. There are no other parameters currently supported.

Step 5: You can find the executable script `extasy_gromacs_lsdmap.py` in the grlsd-on-archer folder.

Now you are can run the workload using :

python extasy_gromacs_lsdmap.py --RPconfig archer.rcfg --Kconfig gromacslsdmap.wcfg

Note

Environment variable RADICAL_ENMD_VERBOSE is set to REPORT in the python script. This specifies the verbosity of the output. For more verbose output, you can use INFO or DEBUG.

Note

Time to completion: ~15 mins (from the time job goes through LRMS)

6.5. Running on localhost¶

The above two sections describes execution on XSEDE.Stampede and EPSRC.Archer, assuming you have access to these machines. This section describes the changes required to the EXISTING scripts in order to get Gromacs-LSDMap running on your local machines (label to be used = local.localhost as in the generic examples).

Step 1: You might have already guessed the first step. You need to create a SingleClusterEnvironment object targetting the localhost machine. You can either directly make changes to the extasy_gromacs_lsdmap.py script or create a separate resource configuration file and provide it as an argument.

Step 2: The MD tools require some tool specific environment variables to be setup (AMBERHOME, PYTHONPATH, GCC, GROMACS_DIR, etc). Along with this, you would require to set the PATH environment variable to point to the binary file (if any) of the MD tool. Once you determine all the environment variables to be setup, set them on the terminal and test it by executing the MD command (possibly for a sample case). For example, if you have gromacs installed in $HOME as $HOME/gromacs_5. You probably have to setup GROMACS_DIR to $HOME/gromacs-5 and append $HOME/gromacs-5/bin to PATH. Please check official documentation of the MD tool.

Step 3: There are three options to proceed.

Once you tested the environment setup, next you need to add it to the particular kernel definition. You need to, first, locate the particular file to be modified. All the files related to Ensemble Toolkit are located within the virtualenv (say “myenv”). Go into the following path: myenv/lib/python-2.7/site-packages/radical/ensemblemd/kernel_plugins/md. This path contains all the kernels used for the MD examples. You can open the gromacs.py file and add an entry for local.localhost (in "machine_configs") as follows:
..
..
"machine_configs":
{

    ..
    ..

    "local.localhost":
    {
        "pre_exec"    : ["export GROMACS_DIR=$HOME/gromacs-5", "export PATH=$HOME/gromacs-5/bin:$PATH"],
        "executable"  : ["mdrun"],
        "uses_mpi"    : False       # Could be True or False
    },

    ..
    ..

}
..
..
This would have to be repeated for all the kernels.

Another option is to perform the same above steps. But leave the "pre_exec" value as an empty list and set all the environment variables in your bashrc ($HOME/.bashrc). Remember that you would still need to set the executable as above.

The third option is to create your own kernel plugin as part of your user script. These avoids the entire procedure of locating the existing kernel plugin files. This would also get you comfortable in using kernels other than the ones currently available as part of the package. Creating your own kernel plugins are discussed here

6.6. Understanding the Output of the Examples¶

In the local machine, a “output” folder is created and at the end of every checkpoint intervel (=nsave) an “iter*” folder is created which contains the necessary files to start the next iteration.

For example, in the case of gromacs-lsdmap on stampede, for 4 iterations with nsave=2:

grlsd-on-stampede$ ls
output/  config.ini  gromacslsdmap.wcfg  grompp.mdp  input.gro  stampede.rcfg  topol.top

grlsd-on-stampede/output$ ls
iter1/  iter3/

The “iter*” folder will not contain any of the initial files such as the topology file, minimization file, etc since they already exist on the local machine. In gromacs-lsdmap, the “iter*” folder contains the coordinate file and weight file required in the next iteration. It also contains a logfile about the lsdmap stage of the current iteration.

grlsd-on-stampede/output/iter1$ ls
2_input.gro  lsdmap.log  weight.w

On the remote machine, inside the pilot-* folder you can find a folder called “unit.00000”. This location is used to exchange/link/move intermediate data. The shared data is kept in “unit.00000/” and the iteration specific inputs/outputs can be found in their specific folders (=”unit.00000/iter*”).

$ cd unit.00000/
$ ls
config.ini  gro.py   input.gro   iter1/  iter3/    post_analyze.py  reweighting.py   run.py     spliter.py
grompp.mdp  gro.pyc  iter0/      iter2/  lsdm.py   pre_analyze.py   run_analyzer.sh  select.py  topol.top

As specified above, outputs of the DM-d-MD procedure can be used to recover the free energy landscape of the system. It is however the responsibility of the user to decide how many DM-d-MD cycles he/she wants to perform depending on the region of the configuration space he/she might want to explore. In general, the larger the number of DM-d-MD cycles, the better. However, different systems may require more or less cycles to achieve a complete exploration of their free energy landscape. The free energy landscape can be plotted every nsave cycle. The .gro file in backup/iterX can be used to compute any specific collective variables to build the free energy plot. The weights contained in the .w file should be used to “reweight” each configuration when computing the free energy histogram.