Performing Tasks Such As Monte Carlo Simulations on a Cluster

I have been using Drexel’s cluster more and more over the past few months, and several people have asked me for help submitting jobs. In particular: how do I run something like a Monte Carlo simulation? I figured I would share one way to do it with Univa Grid Engine and a simple Matlab program.

First, we need a Matlab program that runs a single simulation. For reproducibility, the random seed needs to be set, and each simulation should get a different seed. To see why, open Matlab and run rand. Then close Matlab and do it again. Notice anything interesting? Matlab resets its generator at startup, so you get the same number both times, and without distinct seeds every task would repeat the same simulation. So we set the seed with an integer that we pass in from our SGE script. Our Matlab function sets the random seed, does something (such as your simulation), and then writes an output file whose name is also passed in. The function looks like:

function matlab_demo(output_fp, seed)
  % set the random seed.
  rng(seed);               % IMPORTANT
  a_random_number = rand;  % or do your simulation here
  save(output_fp);
end

Next, we need a shell script (I’ll call mine submitter.sh) to submit to Grid Engine. Nothing fancy here: just make sure you request two slots with -pe shm and set -t to the range of task IDs, one per simulation you need to run. Grid Engine assigns every task an ID ($SGE_TASK_ID). We can use the ID to (i) set our random seed and (ii) give our output file a unique name. The other thing to watch out for is where you write your files. It seems convenient to write the output file straight to your home directory; however, this should not be done, particularly if you are writing several files in a single script. Instead, write to the scratch space, which is in $TMP, then copy the file from scratch to your home directory.

#!/bin/bash -l
#$ -cwd
#$ -q all.q
#$ -t 1-50
#$ -j y
#$ -M your@email.edu
#$ -P yourGroupsPrj
#$ -S /bin/bash
#$ -pe shm 2
#$ -e /tmp/
#$ -o /tmp/
#$ -l h_vmem=3G
#$ -l h_rt=72:00:00

# boilerplate module loading
. /etc/profile.d/modules.sh
module load shared
module load proteus
module load sge/univa

# load the modules that your program is going to need
module load matlab/R2013a

# set the matlab path if you have other scripts you're going to need
# (exported so that the matlab process can see it)
export MATLABPATH=/path/to/your/matlab/files/

# set up a couple of environment variables
# - write a different file for each task in the array job (this may be
#   analogous to 1 MC simulation).
temp_fp=${TMP}/result_file_${SGE_TASK_ID}.mat
# - where are we going to write the result in our home directory
file_fp=/home/your_home/result_file_${SGE_TASK_ID}.mat

# call our matlab function, you may need to remove "singleCompThread" if
# you're using the parallel computing toolbox. the trailing "exit" closes
# matlab once the function returns so the task can finish.
matlab -singleCompThread -nosplash -nodisplay -r "matlab_demo('${temp_fp}', ${SGE_TASK_ID}); exit"

# now that the matlab program has saved our file to the scratch space,
# copy the file back to our home folder.
cp "${temp_fp}" "${file_fp}"
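
Before submitting the whole array, it can help to see what each task will actually run. The following is only a sketch for a dry run outside Grid Engine: preview_task is a hypothetical helper, and the scratch directory is passed in by hand (defaulting to /tmp) because $TMP and $SGE_TASK_ID are normally set by the scheduler. It prints the Matlab call a given task ID would make, which shows how one ID becomes both the seed and the file name.

```shell
#!/bin/bash
# Hypothetical dry-run helper (not part of Grid Engine): print the Matlab
# call that a given task ID would make in submitter.sh.
# Usage: preview_task <task_id> [scratch_dir]
preview_task() {
  local task_id="$1"
  local scratch="${2:-/tmp}"   # stand-in for the $TMP that Grid Engine sets
  local temp_fp="${scratch}/result_file_${task_id}.mat"
  # the task ID is used twice: in the file name and as the random seed
  echo "matlab_demo('${temp_fp}', ${task_id})"
}

# preview the first three tasks of the array
for id in 1 2 3; do
  preview_task "${id}"
done
```

Each line printed should have a different file name and a different seed; if two lines match, two tasks would overwrite or duplicate each other.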

Then in the shell:

qsub submitter.sh

And that's about it! Now all you need to do is write a reduce script to summarize the results from the 50 simulations. Always refer to the documentation if you need to know how to set the flags in your Grid Engine script.
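
Before reducing, it may be worth confirming that every task actually produced its file, since a single failed task would otherwise skew the summary without any warning. The sketch below is one way to check. missing_results is a hypothetical helper of my own naming, and it assumes the result_file_<ID>.mat naming used in submitter.sh.

```shell
#!/bin/bash
# Hypothetical helper (not part of Grid Engine): list the task IDs whose
# result file is missing from a results directory.
# Usage: missing_results <results_dir> <num_tasks>
missing_results() {
  local dir="$1"
  local n="$2"
  local id
  for ((id = 1; id <= n; id++)); do
    # print the ID only when its output file does not exist
    [[ -f "${dir}/result_file_${id}.mat" ]] || echo "${id}"
  done
}
```

For the job above you would call missing_results /home/your_home 50. No output means all 50 files are there; any IDs printed are the tasks to rerun (for instance by adjusting -t) before you summarize.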
