4. Running Generic Examples

4.1. Multiple Simulations Single Analysis Application with SAL pattern

This example shows how to use the SAL pattern to execute 4 iterations of a simulation-analysis loop with multiple simulation instances and a single analysis instance. We skip the pre_loop step in this example. Each simulation_stage generates 16 new random ASCII files, one in each of its instances. In the analysis_stage, the ASCII files from all of the simulation instances are staged into a single analysis instance, which performs a character count on each file. The output is downloaded to the user machine.

[S]    [S]    [S]    [S]    [S]    [S]    [S]
 |      |      |      |      |      |      |
 \-----------------------------------------/
                      |
                     [A]
                      |
 /-----------------------------------------\
 |      |      |      |      |      |      |
[S]    [S]    [S]    [S]    [S]    [S]    [S]
 |      |      |      |      |      |      |
 \-----------------------------------------/
                      |
                     [A]
                      .
                      .
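The full script in Section 4.1.3 implements this workload. As a quick orientation, the sketch below shows the minimal shape of such a pattern class; it mirrors the MSSA class from that script (class, kernel, and argument names are taken from it), so treat it as an outline rather than a replacement for the full example.

from radical.ensemblemd import Kernel, SimulationAnalysisLoop

class MSSA(SimulationAnalysisLoop):

    # pre_loop() is intentionally not defined: no one-time setup is needed.

    def simulation_stage(self, iteration, instance):
        # Each of the 16 simulation instances writes one random ASCII file.
        k = Kernel(name="misc.mkfile")
        k.arguments = ["--size=1000", "--filename=asciifile.dat"]
        return [k]

    def analysis_stage(self, iteration, instance):
        # The single analysis instance links in every simulation output of
        # this iteration and runs a character count over all of them.
        k = Kernel(name="misc.ccount")
        k.arguments = ["--inputfile=asciifile-*.dat", "--outputfile=cfreqs.dat"]
        k.link_input_data = [
            "$PREV_SIMULATION_INSTANCE_{n}/asciifile.dat > asciifile-{n}.dat".format(n=i)
            for i in range(1, self.simulation_instances + 1)]
        k.download_output_data = "cfreqs.dat > cfreqs-{it}.dat".format(it=iteration)
        return [k]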

Warning

In order to run this example, you need access to a MongoDB server and set the RADICAL_PILOT_DBURL in your environment accordingly. The format is mongodb://hostname:port.

4.1.1. Run locally

  • Step 1: View the example source below. You can download the generic examples using the following:
wget https://bitbucket.org/extasy-project/extasy-workflows/downloads/generic.tar
tar xf generic.tar
cd generic

Note

The files in the above link are configured to run for the tutorial. The source at the end of this page is generic and might require changes.

  • Step 2: Run the multiple_simulations_single_analysis.py script:
python multiple_simulations_single_analysis.py

Once the script has finished running, you should see the character frequency files generated by the analysis stages (cfreqs-1.dat, cfreqs-2.dat, ...) in the same directory from which you launched the script. You should see as many such files as there are iterations: in each iteration, the analysis stage generates one character frequency file covering all the files generated in that iteration's simulation stage.
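If you want a quick sanity check of the output, the short standalone snippet below (an optional helper, not part of the example) lists the downloaded files and their sizes; run it from the directory in which you launched the script.

import glob
import os

# List the character-frequency files downloaded by the analysis stages.
for path in sorted(glob.glob("cfreqs-*.dat")):
    print("{0}: {1} bytes".format(path, os.path.getsize(path)))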

Note

The environment variable RADICAL_ENTK_VERBOSE is set to REPORT in the Python script. This controls the verbosity of the output. For more verbose output, set it to INFO or DEBUG.

4.1.2. Run remotely

By default, the simulation and analysis stages run on one core of your local machine:

ResourceHandle(
    resource="local.localhost",
    cores=1,
    walltime=15,
    username=None,
    project=None
)

You can change the script to use a remote HPC cluster and increase the number of cores to see how this affects the runtime, since the individual simulation instances can run in parallel. For example, execution on xsede.stampede using 16 cores would require:

ResourceHandle(
    resource="xsede.stampede",
    cores=16,
    walltime=30,    # minutes
    username=None,  # add your username here
    project=None    # add your allocation or project id here if required
)

4.1.3. Example Script

Download multiple_simulations_single_analysis.py

#!/usr/bin/env python



__author__       = "Vivek <vivek.balasubramanian@rutgers.edu>"
__copyright__    = "Copyright 2014, http://radical.rutgers.edu"
__license__      = "MIT"
__example_name__ = "Multiple Simulations Instances, Single Analysis Instance Example (MSSA)"


import sys
import os
import json

from radical.ensemblemd import Kernel
from radical.ensemblemd import SimulationAnalysisLoop
from radical.ensemblemd import EnsemblemdError
from radical.ensemblemd import ResourceHandle


# ------------------------------------------------------------------------------
# Set default verbosity

if os.environ.get('RADICAL_ENTK_VERBOSE') == None:
	os.environ['RADICAL_ENTK_VERBOSE'] = 'REPORT'

# ------------------------------------------------------------------------------
#
class MSSA(SimulationAnalysisLoop):
	"""MSMA exemplifies how the MSMA (Multiple-Simulations / Multiple-Analsysis)
	   scheme can be implemented with the SimulationAnalysisLoop pattern.
	"""
	def __init__(self, iterations, simulation_instances, analysis_instances):
		SimulationAnalysisLoop.__init__(self, iterations, simulation_instances, analysis_instances)


	def simulation_stage(self, iteration, instance):
		"""In the simulation step we
		"""
		k = Kernel(name="misc.mkfile")
		k.arguments = ["--size=1000", "--filename=asciifile.dat"]
		return [k]

	def analysis_stage(self, iteration, instance):
		"""In the analysis step we use the ``$PREV_SIMULATION`` data reference
		   to refer to the previous simulation. The same
		   instance is picked implicitly, i.e., if this is instance 5, the
		   previous simulation with instance 5 is referenced.
		"""
		link_input_data = []
		for i in range(1, self.simulation_instances+1):
			link_input_data.append("$PREV_SIMULATION_INSTANCE_{instance}/asciifile.dat > asciifile-{instance}.dat".format(instance=i))

		k = Kernel(name="misc.ccount")
		k.arguments            = ["--inputfile=asciifile-*.dat", "--outputfile=cfreqs.dat"]
		k.link_input_data      = link_input_data
		k.download_output_data = "cfreqs.dat > cfreqs-{iteration}.dat".format(iteration=iteration)
		return [k]


# ------------------------------------------------------------------------------
#
if __name__ == "__main__":

	try:

		# Create a new static execution context with one resource and a fixed
		# number of cores and runtime.
		cluster = ResourceHandle(
				resource='local.localhost',
				cores=1,
				walltime=15,
				#username=None,

				#project=None,
				#queue = None,

				database_url='mongodb://extasy:extasyproject@extasy-db.epcc.ed.ac.uk/radicalpilot',
				#database_name=None,
				#access_schema=None
		)

		# Allocate the resources.
		cluster.allocate()

		# We set the simulation step 'instances' to 16 and the analysis
		# step 'instances' to 1.
		mssa = MSSA(iterations=4, simulation_instances=16, analysis_instances=1)

		cluster.run(mssa)

		cluster.deallocate()

	except EnsemblemdError, er:

		print "Ensemble MD Toolkit Error: {0}".format(str(er))
		raise # Just raise the exception again to get the backtrace

In the __main__ block, a ResourceHandle (execution context) is created, targeted at reserving 1 core on localhost for a duration of 15 minutes, and cluster.allocate() then makes the allocation request for this execution context.

The MSSA class defines the pattern to be the SAL pattern by subclassing SimulationAnalysisLoop. We skip the definition of the pre_loop step since we do not require it for this example. simulation_stage defines the kernel that needs to be executed during the simulation stage (misc.mkfile) as well as the arguments to the kernel. analysis_stage defines the kernel that needs to be executed during the analysis stage (misc.ccount); it first builds a list of references to the output data created in each of the simulation instances and assigns it to link_input_data, so that the files are staged into the analysis instance.

Finally, we create an instance of this MSSA class to run 4 iterations with 16 simulation instances and 1 analysis instance, run this pattern in the execution context with cluster.run(mssa), and, once it has completed, deallocate the acquired resources with cluster.deallocate().
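To see what the staging directives built in analysis_stage expand to, the short standalone snippet below (illustrative only, it does not require an EnsembleMD installation) reproduces the string formatting used in the script for 16 simulation instances:

# Reproduce the link_input_data list built in MSSA.analysis_stage.
simulation_instances = 16
link_input_data = [
    "$PREV_SIMULATION_INSTANCE_{n}/asciifile.dat > asciifile-{n}.dat".format(n=i)
    for i in range(1, simulation_instances + 1)]

for reference in link_input_data:
    print(reference)
# $PREV_SIMULATION_INSTANCE_1/asciifile.dat > asciifile-1.dat
# ...
# $PREV_SIMULATION_INSTANCE_16/asciifile.dat > asciifile-16.dat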

4.2. Multiple Simulations Multiple Analysis Application with SAL pattern

This example shows how to use the SAL pattern to execute 2 iterations of a simulation-analysis loop with multiple simulation instances and multiple analysis instances. We skip the pre_loop step in this example. Each simulation_stage generates 8 new random ASCII files, one in each of its instances. In the analysis_stage, the ASCII files from the simulation instances are analyzed and a character count is performed. Each analysis instance uses the file generated by the corresponding simulation instance; this is possible since we use the same number of instances for simulation and analysis. The output is downloaded to the user machine.

[S]    [S]    [S]    [S]    [S]    [S]    [S]    [S]
 |      |      |      |      |      |      |      |
[A]    [A]    [A]    [A]    [A]    [A]    [A]    [A]
 |      |      |      |      |      |      |      |
[S]    [S]    [S]    [S]    [S]    [S]    [S]    [S]
 |      |      |      |      |      |      |      |
[A]    [A]    [A]    [A]    [A]    [A]    [A]    [A]

Warning

In order to run this example, you need access to a MongoDB server and set the RADICAL_PILOT_DBURL in your environment accordingly. The format is mongodb://hostname:port.

4.2.1. Run locally

  • Step 1: View the example sources below. You can download the generic examples using the following (same link as above):
wget https://bitbucket.org/extasy-project/extasy-workflows/downloads/generic.tar
tar xf generic.tar
cd generic

Note

The files in the above link are configured to run for the CECAM workshop. The source at the end of this page is generic and might require changes.

  • Step 2: Run the multiple_simulations_multiple_analysis.py script:
python multiple_simulations_multiple_analysis.py

Once the script has finished running, you should see the character frequency files generated by the individual analysis instances (cfreqs-1-1.dat, cfreqs-1-2.dat, ...) in the same directory from which you launched the script. You should see as many such files as there are iterations times the number of ensembles (i.e. the simulation/analysis width). In each iteration, every analysis instance generates one character frequency file for the file generated by its corresponding simulation instance.
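As a purely illustrative aid (not part of the example), the snippet below prints the file names you can expect for the 2 iterations and 8 instances used in the script:

# Expected download names produced by the analysis stages:
# "cfreqs.dat > cfreqs-{iteration}-{instance}.dat"
iterations = 2
analysis_instances = 8

for iteration in range(1, iterations + 1):
    for instance in range(1, analysis_instances + 1):
        print("cfreqs-{0}-{1}.dat".format(iteration, instance))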

Note

The environment variable RADICAL_ENTK_VERBOSE is set to REPORT in the Python script. This controls the verbosity of the output. For more verbose output, set it to INFO or DEBUG.

4.2.2. Run remotely

By default, the simulation and analysis stages run on one core of your local machine:

ResourceHandle(
    resource="local.localhost",
    cores=1,
    walltime=15,
    username=None,
    project=None
)

You can change the script to use a remote HPC cluster and increase the number of cores to see how this affects the runtime, since the individual simulation instances can run in parallel. For example, execution on xsede.stampede using 16 cores would require:

ResourceHandle(
    resource="xsede.stampede",
    cores=16,
    walltime=30,    # minutes
    username=None,  # add your username here
    project=None    # add your allocation or project id here if required
)

4.2.3. Example Script

Download multiple_simulations_multiple_analysis.py

#!/usr/bin/env python

__author__       = "Vivek <vivek.balasubramanian@rutgers.edu>"
__copyright__    = "Copyright 2014, http://radical.rutgers.edu"
__license__      = "MIT"
__example_name__ = "Multiple Simulations Instances, Multiple Analysis Instances Example (MSMA)"

import sys
import os
import json


from radical.ensemblemd import Kernel
from radical.ensemblemd import SimulationAnalysisLoop
from radical.ensemblemd import EnsemblemdError
from radical.ensemblemd import ResourceHandle


# ------------------------------------------------------------------------------
# Set default verbosity

if os.environ.get('RADICAL_ENTK_VERBOSE') == None:
	os.environ['RADICAL_ENTK_VERBOSE'] = 'REPORT'

# ------------------------------------------------------------------------------
#
class MSMA(SimulationAnalysisLoop):
	"""MSMA exemplifies how the MSMA (Multiple-Simulations / Multiple-Analsysis)
	   scheme can be implemented with the SimulationAnalysisLoop pattern.
	"""
	def __init__(self, iterations, simulation_instances, analysis_instances):
		SimulationAnalysisLoop.__init__(self, iterations, simulation_instances, analysis_instances)


	def simulation_stage(self, iteration, instance):
		"""In the simulation step we
		"""
		k = Kernel(name="misc.mkfile")
		k.arguments = ["--size=1000", "--filename=asciifile.dat"]
		return k

	def analysis_stage(self, iteration, instance):
		"""In the analysis step we use the ``$PREV_SIMULATION`` data reference
		   to refer to the previous simulation. The same
		   instance is picked implicitly, i.e., if this is instance 5, the
		   previous simulation with instance 5 is referenced.
		"""
		k = Kernel(name="misc.ccount")
		k.arguments            = ["--inputfile=asciifile.dat", "--outputfile=cfreqs.dat"]
		k.link_input_data      = "$PREV_SIMULATION/asciifile.dat"
		k.download_output_data = "cfreqs.dat > cfreqs-{iteration}-{instance}.dat".format(instance=instance, iteration=iteration)
		return k


# ------------------------------------------------------------------------------
#
if __name__ == "__main__":

	try:

		# Create a new static execution context with one resource and a fixed
		# number of cores and runtime.
		cluster = ResourceHandle(
				resource='local.localhost',
				cores=1,
				walltime=15,
				#username=None,
				#project=None,
				#queue = None,

				database_url='mongodb://extasy:extasyproject@extasy-db.epcc.ed.ac.uk/radicalpilot',
				#database_name=,
				#access_schema=None,
		)

		# Allocate the resources.
		cluster.allocate()

		# We set both the simulation and the analysis step 'instances' to 8.
		msma = MSMA(iterations=2, simulation_instances=8, analysis_instances=8)

		cluster.run(msma)

		cluster.deallocate()

	except EnsemblemdError, er:

		print "Ensemble MD Toolkit Error: {0}".format(str(er))
		raise # Just raise the exception again to get the backtrace

In the __main__ block, a ResourceHandle (execution context) is created, targeted at reserving 1 core on localhost for a duration of 15 minutes, and cluster.allocate() then makes the allocation request for this execution context.

The MSMA class defines the pattern to be the SAL pattern by subclassing SimulationAnalysisLoop. We skip the definition of the pre_loop step since we do not require it for this example. simulation_stage defines the kernel that needs to be executed during the simulation stage (misc.mkfile) as well as the arguments to the kernel. analysis_stage defines the kernel that needs to be executed during the analysis stage (misc.ccount); its link_input_data refers, via $PREV_SIMULATION, to the output data created in the previous simulation stage by the instance with the same number as the current analysis instance.

Finally, we create an instance of this MSMA class to run 2 iterations with 8 simulation instances and 8 analysis instances, run this pattern in the execution context with cluster.run(msma), and, once it has completed, deallocate the acquired resources with cluster.deallocate().
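The pairing behind $PREV_SIMULATION is implicit: analysis instance i reads the file written by simulation instance i, which is only well defined because the simulation and analysis widths are equal. The tiny standalone snippet below (illustrative only, it does not use the EnsembleMD API) spells out that mapping for the widths used in this script:

# Illustrative: the implicit pairing behind $PREV_SIMULATION in MSMA.
simulation_instances = 8
analysis_instances = 8

# The pairing is only well defined when the two widths match.
assert simulation_instances == analysis_instances

for instance in range(1, analysis_instances + 1):
    # Analysis instance i stages in the file from simulation instance i.
    print("analysis instance {0} <- simulation instance {0}".format(instance))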