Managing software with Environment Modules

When building a cluster we need software to be available in a standard way across all nodes. When software is packaged for yum this is quite easy, because we can just install the package on each node. Scientific software, however, is often compiled from source, and we would like to install it once and have it available across the whole cluster.

By default, software compiled from source tends to install into a path such as /usr/local, but for a cluster a shared software area should be made available, for example /cluster/software shared via NFS. A package installed into this area is then available on the whole cluster.

An example: wcd

Originally created at Wits University by Scott Hazelhurst’s group, wcd is an open source EST clustering application that is available at code.google.com/p/wcdest. The latest version of the code is at this URL: https://wcdest.googlecode.com/files/wcd-express-0.6.3.tar.gz.

Download it with:

wget https://wcdest.googlecode.com/files/wcd-express-0.6.3.tar.gz

Then unpack it with:

tar xf wcd-express-0.6.3.tar.gz

This unpacks into a directory named wcd-express-0.6.3. You now need to decide where to install it. Since we’ve already created clusters with NFS exports, do this installation on the head node of your cluster and choose a path in the NFS-exported area. For example, if you are exporting /cluster, create a directory such as /cluster/software and install to /cluster/software/wcd-0.6.3. The build uses the usual configure and make commands:

cd wcd-express-0.6.3
./configure --prefix=/cluster/software/wcd-0.6.3
make
sudo make install
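Before moving on, it can be worth confirming that the install landed where you expect. The following is only a minimal sketch: check_install is a hypothetical helper (not part of wcd or its build system) that tests for an executable file at PREFIX/bin/NAME.

```shell
#!/bin/sh
# check_install is a hypothetical helper, not part of wcd.
# It succeeds only if PREFIX/bin/NAME is a regular file and is executable.
check_install() {
    prefix=$1
    name=$2
    if [ -f "$prefix/bin/$name" ] && [ -x "$prefix/bin/$name" ]; then
        echo "ok: $prefix/bin/$name"
    else
        echo "missing: $prefix/bin/$name" >&2
        return 1
    fi
}

# Example, assuming the prefix used above:
#   check_install /cluster/software/wcd-0.6.3 wcd
```

Running the same check on a worker node is a quick way to confirm that the NFS export is mounted there too.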

You will also need the benchmark10000.seq dataset, so download this with:

wget http://www.bioinf.wits.ac.za/~scott/data/other/benchmark10000.seq

Environment Modules: a solution for your path problems

Environment Modules is software (based on Tcl) for managing your PATH and other environment settings. It allows you to add settings to your environment for particular software packages. The settings to add are configured in module files. Install Environment Modules on every machine in the cluster (the head node as well as the worker nodes) with:

sudo yum install -y environment-modules

This creates a settings file in /etc/profile.d that is included every time a login shell starts, but for now import these settings into your shell with the command:

source /etc/profile.d/modules.sh

This provides the module command. You can see what modules are available by running:

module avail

Which should show something like:

----------- /usr/share/Modules/modulefiles --------------------------
dot         module-git  module-info modules     null        use.own

The line /usr/share/Modules/modulefiles is a module path, which specifies where module files are located. If you look in this directory you’ll see files whose names match those listed by module avail. For example, this is the module file for dot:

#%Module1.0#####################################################################
##
## dot modulefile
##
## modulefiles/dot.  Generated from dot.in by configure.
##
proc ModulesHelp { } {
    global dotversion

    puts stderr "\tAdds `.' to your PATH environment variable"
    puts stderr "\n\tThis makes it easy to add the current working directory"
    puts stderr "\tto your PATH environment variable.  This allows you to"
    puts stderr "\trun executables in your current working directory"
    puts stderr "\twithout prepending ./ to the excutable name"
    puts stderr "\n\tVersion $dotversion\n"
}

module-whatis   "adds `.' to your PATH environment variable"

# for Tcl script use only
set dotversion  3.2.10

append-path PATH    .

Module files are written in Tcl and begin with this line:

#%Module1.0

They typically start with a ModulesHelp procedure that explains something about the software the module file provides. E.g. for wcd we might have:

proc ModulesHelp { } {
    puts stderr "wcd 0.6.3 - EST assembler"
}

Then they typically add something to your PATH and possibly to other environment variables such as LD_LIBRARY_PATH. If we installed wcd in /cluster/software/wcd-0.6.3, the actual wcd binary would be in /cluster/software/wcd-0.6.3/bin, so we can add that to the PATH with:

append-path PATH /cluster/software/wcd-0.6.3/bin
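For intuition, the effect of this line on a sh or bash session is roughly that of the assignment below. This is a sketch of the idea, not what Environment Modules literally executes, and unlike a loaded module it cannot be undone later with module unload.

```shell
# Rough shell equivalent of:
#   append-path PATH /cluster/software/wcd-0.6.3/bin
# The new directory goes at the END of PATH, so a wcd found in an earlier
# directory (for example /usr/local/bin) would still take precedence.
PATH="$PATH:/cluster/software/wcd-0.6.3/bin"
export PATH
```

If the cluster copy should win over any locally installed one, the module file command prepend-path puts the directory at the front of PATH instead.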

So the full wcd module file would look like:

#%Module1.0
proc ModulesHelp { } {
    puts stderr "wcd 0.6.3 - EST assembler"
}
append-path PATH /cluster/software/wcd-0.6.3/bin

Putting this in the default system module file directory (/usr/share/Modules/modulefiles) would defeat the purpose of making it available to the whole cluster, since that directory is local to each node. Instead, create a new module file area in the NFS-exported tree on the head node, e.g. /cluster/modules. Since there might be multiple versions of wcd available, put the module file in a wcd subdirectory, at /cluster/modules/wcd/wcd-0.6.3.
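Creating the directory and the module file could be scripted along the following lines. This is a sketch only: write_wcd_modulefile is a hypothetical helper, and it takes the modules area as an argument so that it can be tried out in a scratch directory before being pointed at /cluster/modules.

```shell
#!/bin/sh
# write_wcd_modulefile is a hypothetical helper for this tutorial.
# Usage: write_wcd_modulefile MODULES_ROOT
# It creates MODULES_ROOT/wcd and writes the wcd-0.6.3 module file into it.
write_wcd_modulefile() {
    root=$1
    mkdir -p "$root/wcd" || return 1
    # The quoted 'EOF' stops the shell from expanding anything in the Tcl text.
    cat > "$root/wcd/wcd-0.6.3" <<'EOF'
#%Module1.0
proc ModulesHelp { } {
    puts stderr "wcd 0.6.3 - EST assembler"
}
append-path PATH /cluster/software/wcd-0.6.3/bin
EOF
}

# Example (may need sudo, depending on who owns /cluster):
#   write_wcd_modulefile /cluster/modules
```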

Setting up the MODULEPATH

Environment Modules uses an environment variable named MODULEPATH to find its module files. Each node on the cluster should now look for modules in this new area, so if your new modules area is in /cluster/modules then you should create a file /etc/profile.d/more-modules.sh that contains:

#!/bin/sh

MODULEPATH=$MODULEPATH:/cluster/modules
export MODULEPATH

Then load this setting into your current shell with:

source /etc/profile.d/more-modules.sh
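One caveat with the more-modules.sh shown above: if MODULEPATH is empty when the file is sourced, the result starts with a stray colon, and sourcing the file twice lists /cluster/modules twice. A slightly more defensive variant is sketched below; add_modulepath is a hypothetical helper name, not something Environment Modules provides.

```shell
# add_modulepath is a hypothetical helper for /etc/profile.d/more-modules.sh.
# It appends a directory to MODULEPATH only if it is not already listed,
# and avoids a leading colon when MODULEPATH starts out empty.
add_modulepath() {
    case ":${MODULEPATH:-}:" in
        *":$1:"*) ;;    # already present, nothing to do
        *) MODULEPATH="${MODULEPATH:+$MODULEPATH:}$1" ;;
    esac
    export MODULEPATH
}

add_modulepath /cluster/modules
```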

Now you should be able to run:

module add wcd

And then run wcd. Test wcd on benchmark10000.seq with:

wcd -c benchmark10000.seq
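If module add wcd reports that it cannot find the module, a useful first check is whether any directory in MODULEPATH actually contains a wcd entry. The sketch below does that by hand; find_module_dirs is a hypothetical helper and only looks for a matching name, it does not validate the module file itself.

```shell
# find_module_dirs is a hypothetical helper: it prints each directory listed
# in MODULEPATH that contains an entry (file or directory) with the given name.
find_module_dirs() {
    name=$1
    old_ifs=$IFS
    IFS=:                       # split MODULEPATH on colons
    for dir in ${MODULEPATH:-}; do
        if [ -n "$dir" ] && [ -e "$dir/$name" ]; then
            echo "$dir"
        fi
    done
    IFS=$old_ifs
}

# Example:
#   find_module_dirs wcd
```

No output means that no directory on MODULEPATH provides the module, which usually points at a missing more-modules.sh on that node or an unmounted NFS export.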

Using Environment Modules in your cluster scripts

Now test this on your cluster. Create a file runwcd.sh that contains:

#!/bin/bash

. /etc/profile.d/modules.sh

module add wcd
wcd -c benchmark10000.seq

Make sure you are in the same directory as benchmark10000.seq, then submit this script to the queue with:

qsub -d "$(pwd)" runwcd.sh

Note that this must happen on the head node.
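When a job fails on a worker node, the cause is often simply that the module or the NFS mount is missing there. A small guard in the job script can make that obvious in the job's error output. This is a sketch; require_cmd is a hypothetical helper, not part of Environment Modules or the queueing system.

```shell
#!/bin/bash
# require_cmd is a hypothetical helper: it fails with a message naming the
# node if the given command is not on PATH.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "$(uname -n): '$1' not found in PATH" >&2
        return 1
    fi
}

# In runwcd.sh this could follow the `module add wcd` line:
#   require_cmd wcd || exit 1
#   wcd -c benchmark10000.seq
```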
