New Position!

As of Fall 2015, I am joining the Department of Electrical & Computer Engineering at the University of Arizona.

Performing Tasks Such As Monte Carlo Simulations on a Cluster

I have been using Drexel’s cluster more and more over the past few months, and several people have asked me for help with submitting jobs. In particular, they want to know how to run something like a Monte Carlo simulation. I figured I would share one way to go about it with Univa Grid Engine and a simple Matlab program.

First, we need a Matlab program that runs a single simulation. To make the results reproducible, the random seed needs to be set, and ideally each simulation should use a different seed. To see why, open Matlab and run rand. Then close Matlab and do it again. Notice anything interesting? Matlab starts every session from the same default seed, so both sessions return the same number. So we set the random seed with some integer, which we pass in through our SGE script. Our Matlab function sets the random seed, does something (such as your simulation), and then writes an output file whose name is also passed in. The function looks like:

function matlab_demo(output_fp, seed)
  % set the random seed.
  rng(seed);               % IMPORTANT
  a_random_number = rand;  % or do your simulation here
  save(output_fp);
end

Next, we need a shell script (I’ll call mine submitter.sh) to submit to Grid Engine. Nothing fancy here; just make sure you have two slots set aside with -pe shm and set -t to the range of simulations you need to run. Grid Engine assigns every task a task ID ($SGE_TASK_ID). We can use the ID to: (i) set our random seed, and (ii) give our output file a unique name. The other thing to watch out for is where you write your files. It seems convenient to write the output file straight to your home directory; however, this should be avoided, particularly if you are writing several files in a single script. Instead, write to the scratch space, which is in $TMP, then copy the file from scratch to your home directory.

#!/bin/bash -l
#$ -cwd
#$ -q all.q
#$ -t 1-50
#$ -j y
#$ -M your@email.edu
#$ -P yourGroupsPrj
#$ -S /bin/bash
#$ -pe shm 2
#$ -e /tmp/
#$ -o /tmp/
#$ -l h_vmem=3G
#$ -l h_rt=72:00:00

# boiler plate module loading
. /etc/profile.d/modules.sh
module load shared
module load proteus
module load sge/univa

# load the modules that your program is going to need
module load matlab/R2013a

# set the matlab path if you have other scripts you're going to need
# (exported so that the matlab process started below can see it)
export MATLABPATH=/path/to/your/matlab/files/

# set up a couple of environment variables
# - write a different file for each task in the array job (this may be
#   analogous to 1 MC simulation).
temp_fp=${TMP}/result_file_${SGE_TASK_ID}.mat
# - where are we going to write the result in our home directory
file_fp=/home/your_home/result_file_${SGE_TASK_ID}.mat

# call our matlab function, you may need to remove "singleCompThread" if
# you're using the parallel computing toolbox; the trailing "exit" makes
# matlab quit once the function returns
matlab -singleCompThread -nosplash -nodisplay -r "matlab_demo('${temp_fp}', ${SGE_TASK_ID}); exit"

# now that we have saved our file to the scratch space in the matlab program,
# copy the file back to our home folder.
cp "${temp_fp}" "${file_fp}"
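
If Matlab fails partway through a task, nothing is written to scratch and the copy step fails without much notice. As a minimal sketch (not part of the original script), the copy could be wrapped in a small guard. The function name copy_result is hypothetical; it only assumes the temp_fp and file_fp variables defined above.

```shell
#!/bin/bash
# Hypothetical helper: copy a result out of scratch only if it was
# actually written (the file exists and is non-empty).
copy_result() {
  local src="$1" dst="$2"
  if [ ! -s "$src" ]; then
    echo "missing or empty result: $src" >&2
    return 1
  fi
  # create the destination folder if needed, then copy
  mkdir -p "$(dirname "$dst")" && cp "$src" "$dst"
}

# In submitter.sh this could stand in for the plain cp:
#   copy_result "${temp_fp}" "${file_fp}" || exit 1
```

Exiting with a non-zero status gives the task a failed exit code, which is easier to spot in the accounting output than a quietly missing file.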

Then in the shell:

qsub submitter.sh

And that’s about it! Now all you need to do is write a reduce script to summarize the results from the 50 simulations. Always refer to the documentation if you need to know how to set the flags in your Grid Engine script.
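
Before reducing, it can help to confirm that every task actually produced its file. The following is a rough sketch in bash, assuming the result_file_<id>.mat naming used in submitter.sh; the function name missing_tasks and the directory passed to it are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the task IDs in 1..n whose result file
# is absent or empty in the given directory.
missing_tasks() {
  local dir="$1" n="$2" id
  id=1
  while [ "$id" -le "$n" ]; do
    [ -s "${dir}/result_file_${id}.mat" ] || echo "$id"
    id=$((id + 1))
  done
}

# Example: check the 50 tasks from submitter.sh.
#   missing_tasks /home/your_home 50
```

Any IDs it prints can then be rerun on their own. Passing -t on the qsub command line (for example, qsub -t 7 submitter.sh) should take precedence over the value embedded in the script, though it is worth checking this against your site's documentation.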

ACM International Workshop on Big Data in Life Sciences

Gail Rosen and I have an invited talk at the ACM International Workshop on Big Data in Life Sciences (BigLS), which is being held in conjunction with the ACM BCB conference. I will be in Newport Beach, CA to give the talk. We have released most of the code required to reproduce the results on GitHub. Note that the shell script in the root of the directory is used by me to run IPython on our lab’s server. If you’re interested, you can find instructions on how to do this here.

Notes from the WCCI

I sat through several sessions and met some really great people at the WCCI in Beijing. First, Yann LeCun gave one of the best plenary lectures I have had the pleasure of sitting through. Yann gave a great overview of learning representations and convolutional neural networks, and finished off the talk with a really impressive demo. Here are a couple of notes that I took during some of the sessions.

  • Paul Werbos gave a great talk on where he sees some of the grand challenges. One of the themes of his talk was combining approaches from supervised, unsupervised and reinforcement learning, which is an area of interest in his division at the NSF, as are improvements to the optimization of ADP problems.
    P. Werbos, “From ADP to the Brain: Foundations, roadmap, challenges and research priorities,” in International Joint Conference on Neural Networks, 2014.
  • S. Wang et al. presented an interesting multi-objective optimization problem to find the Pareto-optimal weights for OOB and UOB, such that the minority class and majority class recalls are simultaneously maximized.
    S. Wang, L. L. Minku, and X. Yao, “A multi-objective ensemble method for online class imbalance learning,” in International Joint Conference on Neural Networks, 2014.

Maybe I’ll update this later with more notes.

IEEE CIS Travel Award

I am a recipient of a travel award to the IEEE World Congress on Computational Intelligence (WCCI), which is being held in Beijing on July 6-13. The award is generously provided by the IEEE Computational Intelligence Society.