Managing cluster resources with Torque and Maui

Scheduling jobs with Torque and Maui

A resource manager is a suite of software that manages the resources on a computing cluster. Such a cluster consists of a management node, one or more worker nodes and (at least) one login node. Typically storage is provided to the cluster from a storage server or storage array and mounted via a network filesystem such as NFS or GlusterFS.

(Figure: cluster diagram)

To share the resources of the cluster, jobs are submitted to the resource manager, which works with a scheduler to allocate jobs fairly to the worker nodes. In addition to the resource manager and scheduler, clusters require a shared filesystem (often provided by NFS) and a shared set of users. This allows jobs to be run on any node, because the view of storage and users is the same across the cluster. The cluster we build in this session will use the Torque resource manager alongside the Maui scheduler.
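
As an illustration of the shared filesystem requirement, the management node could export /home over NFS and each worker could mount it. This is only a sketch assuming a management node called master.example.com with the NFS packages already installed; your cluster might equally use GlusterFS or another shared filesystem:

# On the management node: export /home to the workers (add to /etc/exports)
/home    *.example.com(rw,sync,no_root_squash)

# On each worker node: mount the export (add to /etc/fstab)
master.example.com:/home    /home    nfs    defaults    0 0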

Torque and Maui are maintained by a company called Adaptive Computing, but the versions that we will be using are provided by the European Middleware Initiative (EMI), which provides a software platform for high performance computing (HPC). To use the EMI software we first need to enable the relevant software repositories and prepare the Torque/Maui software, which can be done with this script (which needs to be run as root):

#!/bin/bash

# Install the tools needed to fetch and manage the repositories
yum install -y wget libselinux-python yum-plugin-downloadonly yum-plugin-priorities

# Add the EGI repository that provides the patched Torque packages
echo "[software.vulnerability.group.fixes-sl-6-x86_64]
name=Repository for software.vulnerability.group.fixes (o/s: sl6 arch: x86_64)
baseurl=http://repository.egi.eu/community/software/software.vulnerability.group.fixes/torque/releases/sl/6/x86_64/RPMS/
enabled=1
gpgcheck=0" >/etc/yum.repos.d/egi-vuln.repo

# Add the EMI 3 repositories and import their GPG key
for reponame in base contribs third-party ; do
  wget -O /etc/yum.repos.d/emi3-${reponame}.repo -c http://emisoft.web.cern.ch/emisoft/dist/EMI/3/repos/sl6/emi3-${reponame}.repo
done
rpm --import http://emisoft.web.cern.ch/emisoft/dist/EMI/3/RPM-GPG-KEY-emi

# Make sure the yum priorities plugin checks for obsoleted packages
PRIORITIESCONF=/etc/yum/pluginconf.d/priorities.conf
if [ -e $PRIORITIESCONF ] && ! grep -q 'check_obsoletes = 1' $PRIORITIESCONF ; then
  echo 'check_obsoletes = 1' >> $PRIORITIESCONF
fi

# Pre-fetch the Torque packages (the actual install happens later)
yum install -y --downloadonly emi-torque-server emi-torque-client

This script is available in the training lab at http://train0.bi.up.ac.za/install_torque.sh and can be downloaded and run using the command:

wget -O - http://train0.bi.up.ac.za/install_torque.sh |sudo bash

(If you do not have wget available, install it with yum install -y wget.)

Once the script has finished running you can do the actual Torque/Maui install with:

sudo yum install -y emi-torque-server emi-torque-client

Torque and Maui configuration

Daemons, firewall and authentication

Torque makes use of two daemons, pbs_server and pbs_mom. The pbs_server daemon keeps track of the state of resources and jobs on the cluster, while pbs_mom manages jobs on an individual worker node. Torque supports pluggable schedulers, so Maui interfaces with pbs_server to provide scheduling (and is thus typically run on the same machine).

To enable communication between pbs_server, Maui and pbs_mom you need to open TCP ports 15001 and 15003 on the cluster management server and port 15003 on the worker nodes. If you have ferm installed you can configure the firewall with this ferm script:

chain INPUT {
    proto tcp dport (
        15003 15001
    ) ACCEPT;
}
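
If you are not using ferm, the equivalent rules can be added directly with iptables. This is just a sketch for a stock Scientific Linux 6 firewall setup; adjust it if you manage your rules differently:

# Allow pbs_server (15001) and pbs_mom (15003) traffic
sudo iptables -I INPUT -p tcp --dport 15001 -j ACCEPT
sudo iptables -I INPUT -p tcp --dport 15003 -j ACCEPT

# Persist the rules across reboots
sudo service iptables save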

The Torque server installed from EMI uses munge for authentication, so this needs to be configured on each node in the cluster. First install munge with:

sudo yum install -y munge

And then do the initial munged configuration:

sudo create-munge-key

This creates a key in /etc/munge/munge.key that needs to be copied to the same location on each worker node. The file should be owned by user munge and group munge with mode 0400, so that it shows up as:

-r--------. 1 munge munge 1024 Feb  7 17:29 munge.key
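
A sketch of copying the key to a worker node, assuming a worker called worker1.example.com that you can reach over ssh as root (adjust the hostname and access method for your cluster):

# On the management node: copy the key to the worker
sudo scp /etc/munge/munge.key root@worker1.example.com:/etc/munge/munge.key

# On the worker: fix ownership and permissions to match the listing above
sudo chown munge:munge /etc/munge/munge.key
sudo chmod 0400 /etc/munge/munge.key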

Once the munge key is created, you need to start the munge daemon with:

sudo service munge start
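
You can check that munge is working by encoding a credential and decoding it again, both locally and (once the key has been copied) across nodes. The worker hostname below is just an example:

# Local test: encode a credential and decode it on the same machine
munge -n | unmunge

# Cross-node test: decode on a worker (assumes ssh access to worker1)
munge -n | ssh worker1.example.com unmunge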

Torque configuration

Defining worker nodes and the management server

The file /var/spool/pbs/server_priv/nodes defines the worker nodes in the cluster. E.g. for a cluster with 4 workers named worker1 to worker4 in domain example.com, each having 4 CPU cores, it would contain:

worker1.example.com np=4
worker2.example.com np=4
worker3.example.com np=4
worker4.example.com np=4

The np specification specifies how many processors are available on each node. In our example, however, we're only going to have a single worker node, so create a /var/spool/pbs/server_priv/nodes file containing something like:

train5.bi.up.ac.za np=3

Where train5 is replaced with the name of your computer and np=3 is specified so that only 3 processor cores are made available.

Then the pbs_mom daemons need to know where to find the pbs_server. If your pbs_server was train6.bi.up.ac.za you’d put this in the /var/spool/pbs/mom_priv/config file:

$pbsserver train6.bi.up.ac.za

Modify that line for your specific setup. Finally set the server name in /var/spool/pbs/server_name to the name of the management server, e.g. if you are running on train6 this file should contain:

train6.bi.up.ac.za
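
Since in this training setup the management server and the worker are the same machine, you can populate all three files from the local hostname. This is just a convenience sketch; it assumes hostname -f returns the fully qualified name used above and that you want 3 cores available:

# Worker list with 3 cores (this machine doubles as the worker)
echo "$(hostname -f) np=3" | sudo tee /var/spool/pbs/server_priv/nodes

# Tell pbs_mom where pbs_server runs
echo "\$pbsserver $(hostname -f)" | sudo tee /var/spool/pbs/mom_priv/config

# Record the management server name
hostname -f | sudo tee /var/spool/pbs/server_name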

Initialising the pbs_server database

The Torque resource management server, pbs_server, manages a database tracking the state of the cluster. To initialise the database, first stop the running pbs_server with

sudo service pbs_server stop 

and then initialise a new database with:

sudo pbs_server -t create
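
At this point pbs_server should be running with an empty configuration. You can check that it responds with qstat, for example:

# Print a one-line summary of the batch server (job counters should all be 0)
qstat -B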

By default the only user allowed to manage the Torque pbs_server is root on the management host. You can now add a user to manage the pbs_server and batch jobs with the qmgr command. If the user is called myuser and will be working on master.example.com then use:

qmgr -c 'set server operators += myuser@master.example.com'
qmgr -c 'set server managers += myuser@master.example.com'

You can also use a host pattern to allow administration from more than a single host, e.g.:

qmgr -c 'set server operators += myuser@*.example.com'
qmgr -c 'set server managers += myuser@*.example.com'

This would give myuser permission to make changes from any host in the example.com domain.

One final example, for the user train5 on train5.bi.up.ac.za:

qmgr -c 'set server operators += train5@train5.bi.up.ac.za'
qmgr -c 'set server managers += train5@train5.bi.up.ac.za'

Note that the hostname that you specify must exist, either in DNS or in /etc/hosts.

Having set up the operator permissions, you can now do some other basic setup:

qmgr -c 'set server scheduling = true'
qmgr -c 'set server keep_completed = 300'
qmgr -c 'set server mom_job_sync = true'

And now create and set some default settings for a queue named batch:

qmgr -c 'create queue batch'
qmgr -c 'set queue batch queue_type = execution'
qmgr -c 'set queue batch started = true'
qmgr -c 'set queue batch enabled = true'
qmgr -c 'set queue batch resources_default.walltime = 1:00:00'
qmgr -c 'set queue batch resources_default.nodes = 1'
qmgr -c 'set server default_queue = batch'

This will create and enable the queue named batch and set some default resource limits. By default jobs will use 1 processor and run for a maximum of 1 hour (1 hour of “wallclock” time).
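
You can review the configuration you have built up so far with qmgr, for example:

# Dump the full server configuration, including the batch queue
qmgr -c 'print server'

# Or look at just the attributes of the batch queue
qmgr -c 'list queue batch'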

Configuring the Maui scheduler

While pbs_server keeps track of the state of batch jobs, it doesn’t make decisions about when they should run and which one should run first. The Maui scheduler communicates with pbs_server and signals when jobs should run. It is configured using /var/spool/maui/maui.cfg which looks like this:

#
# Maui configuration example
# @(#)Maui.cfg David Groep 20031015.1
# for Maui version 3.2.5
#
SERVERHOST              master.example.com
ADMIN1                  root myuser
ADMINHOST               master.example.com
RMCFG[0]                TYPE=PBS

SERVERPORT            40559
SERVERMODE            NORMAL

# Set PBS server polling interval. Since we have many short jobs
# and want fast turn-around, set this to 10 seconds (default: 2 minutes)
RMPOLLINTERVAL        00:00:10

# a max. 10 MByte log file in a logical location
LOGFILE               /var/log/maui.log
LOGFILEMAXSIZE        10000000
LOGLEVEL              3

The parameters in this file are explained in the Maui parameters documentation from Adaptive Computing. SERVERHOST refers to the server where Maui is running, ADMIN1 is a space-separated list of users that are allowed to manage the Maui scheduler, and ADMINHOST is the name of the server from which they can manage the scheduler. The RMCFG[0] line specifies the details of the resource manager that the scheduler communicates with, as a list of key/value pairs described in Adaptive Computing's resource manager configuration documentation.
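
For the training setup you would adapt the top of this file to your own host and user. For example, for the user train5 on train5.bi.up.ac.za (replace the names with your own; the rest of the file can stay as in the example above):

SERVERHOST              train5.bi.up.ac.za
ADMIN1                  root train5
ADMINHOST               train5.bi.up.ac.za
RMCFG[0]                TYPE=PBS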

Final steps and testing

After completing the installation, restart the Torque/Maui services:

sudo service pbs_server restart
sudo service pbs_mom restart
sudo service maui restart

Then make sure the daemons are started at boot time using:

sudo chkconfig --add munge
sudo chkconfig --add pbs_server
sudo chkconfig --add pbs_mom
sudo chkconfig --add maui
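
Before submitting a test job you can check that both daemons can see the cluster: pbsnodes queries pbs_server about the worker nodes, and showq asks Maui for its view of the queue. Both should run without errors, with your node listed as free and the queue empty:

# Torque's view of the worker nodes (state should be "free")
pbsnodes -a

# Maui's view of the job queue (should be empty at this point)
showq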

At this point you should be able to run qstat, which will return with no output. Test your new installation by submitting a simple job:

echo 'echo Hello World' | qsub

This will submit the command echo Hello World to the default job queue. You should see output similar to:

$ echo 'echo Hello World' |qsub
0.train6

This means that your job has been submitted as job 0.train6. If you now run qstat you will see something like:

$ qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
0.train6                  STDIN            pvh                    0 C batch

This shows that your job has run (state C, for completed). It will put output in your home directory in files named STDIN.o0 and STDIN.e0, containing the stdout and stderr of your job respectively. Note that in order to produce this output the node where the job runs must be able to copy files to the node where the job was submitted. This is typically done by allowing key-based passwordless ssh access between all nodes in the cluster.
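
A minimal sketch of setting up passwordless ssh for a user, assuming the home directory is shared across the cluster via NFS as described earlier (so the key only needs to be authorised once):

# Generate a key pair without a passphrase (skip if you already have one)
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa

# Authorise the key for the same user on all nodes (shared home directory)
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

Alternatively, pbs_mom supports a $usecp directive in /var/spool/pbs/mom_priv/config, which tells it to copy job output via the shared filesystem instead of scp.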
