Adding and Specifying Compute Resources at Torque

This blog entry is a follow-up to Installing Torque 2.5 on CentOS 6 with xCAT tool. After installing Torque on the head node and the compute nodes, the next step is to configure the Torque server. In this entry, I focus on configuring the compute resources on the Torque server.

Step 1: Adding Nodes to the Torque Server

# qmgr -c "create node node01"
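
On a cluster with more than a handful of nodes, typing one qmgr command per node gets tedious. The following is a minimal sketch, assuming the compute nodes are called node01, node02 and node03; those names and the helper gen_create_node_cmds are illustrative and not part of Torque. It only prints the commands, so they can be reviewed before being run as root on the Torque server.

```shell
#!/bin/sh
# Hypothetical helper: print one "create node" command per host name.
# Nothing is executed here; pipe the output to "sh" once it looks right.
gen_create_node_cmds() {
    for host in "$@"; do
        printf 'qmgr -c "create node %s"\n' "$host"
    done
}

# Example host names (assumed; replace with your own compute nodes).
gen_create_node_cmds node01 node02 node03
```

Printing instead of executing keeps the sketch safe to try on any machine, including one without Torque installed.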

Step 2: Enable automatic detection of each node's CPU count. Setting auto_node_np to True overrides the value of np set in $TORQUEHOME/server_priv/nodes

# qmgr -c "set server auto_node_np = True"

Step 3: Start pbs_mom on the compute nodes. The Torque server will detect the nodes automatically

# service pbs_mom start

Installing Torque 2.5 on CentOS 6

A. Configuring for TORQUE Server

Step 1: Download the Torque Software from Adaptive Computing

# wget <URL of the tarball listed on the TORQUE Downloads page>

Step 2: Configure the Torque Server

./configure \
--prefix=/opt/torque \
--exec-prefix=/opt/torque/x86_64 \
--enable-docs \
--disable-gui \
--with-server-home=/var/spool/torque \
--enable-syslog \
--with-scp \
--disable-rpp \
--disable-spool \
--with-pam

Step 3: Compile Torque

# make
# make install

Step 4: Make packages for the clients

# make packages

You should have the following

torque-package-doc-linux-x86_64.sh
torque.setup
torque-package-clients-linux-x86_64.sh
torque-package-mom-linux-x86_64.sh
torque_setup.sh
torque-package-devel-linux-x86_64.sh
torque-package-server-linux-x86_64.sh
torque-package-pam-linux-x86_64.sh  
torque.spec
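
To confirm that "make packages" produced everything in the listing above, a small check can help. This is a sketch only: the helper name missing_packages is made up for illustration, and it assumes the file names shown above (x86_64 build, configured with --with-pam).

```shell
#!/bin/sh
# Hypothetical helper: list the expected package scripts that are missing
# from a build directory ($1). Prints nothing when all are present.
missing_packages() {
    dir=$1
    for kind in clients mom server devel doc pam; do
        f="$dir/torque-package-$kind-linux-x86_64.sh"
        [ -f "$f" ] || printf '%s\n' "$f"
    done
}

# Check the current directory (assumed to be the Torque source tree).
missing_packages .
```

An empty result means all six self-extracting packages are in place.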

Step 5: Installing Torque as a service (pbs_mom)

I was unable to get the default "init.d" script found at $TORQUE/contrib/init.d to run as a service. A workaround is to use the open-source xCAT, which ships a working pbs_mom script at /opt/xcat/share/xcat/netboot/add-on/torque/pbs_mom. To install the latest xCAT, you may want to read the blog entry Dependency issues when installing xCAT 2.7 on CentOS 6

Assuming you have successfully installed xCAT, copy the pbs_mom script to /etc/init.d/pbs_mom

# cp /opt/xcat/share/xcat/netboot/add-on/torque/pbs_mom /etc/init.d/pbs_mom

Step 5a: Edit the /etc/init.d/pbs_mom and restart the service

# vim /etc/init.d/pbs_mom

Inside pbs_mom script

BASE_PBS_PREFIX=/opt/torque

#ulimit -n 20000
#ulimit -i 20000
ulimit -l unlimited

Save and exit.

At the console, start the pbs_mom service

# service pbs_mom start
Starting PBS Mom:                                          [  OK  ]

Step 5b: Installing Torque as a service (pbs_server)

# cp /opt/xcat/share/xcat/netboot/add-on/torque/pbs_server /etc/init.d/pbs_server

Inside the pbs_server script, ensure that BASE_PBS_PREFIX points to the right directory

BASE_PBS_PREFIX=/opt/torque

Save and Exit.

At the console, start the pbs_server service

# service pbs_server start
Starting PBS Server:                                       [ OK ]

Step 5c: Installing Torque as a service (pbs_sched)

# cp /opt/xcat/share/xcat/netboot/add-on/torque/pbs_sched /etc/init.d/pbs_sched

Inside the pbs_sched script, ensure that BASE_PBS_PREFIX points to the right directory

BASE_PBS_PREFIX=/opt/torque

Save and Exit.

At the console, start the pbs_sched service

# service pbs_sched start
Starting PBS Scheduler:                                    [ OK ]
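
The same BASE_PBS_PREFIX edit is repeated for pbs_mom, pbs_server and pbs_sched, so it can be scripted. The following is a minimal sketch, not the xCAT way of doing it: the helper set_pbs_prefix is hypothetical, it assumes the prefix contains no "|" or "&" characters, and it writes the edited script to standard output instead of changing the file.

```shell
#!/bin/sh
# Hypothetical helper: print an init script ($2) with its BASE_PBS_PREFIX
# line rewritten to the given prefix ($1). The original file is not modified.
set_pbs_prefix() {
    sed "s|^[[:space:]]*BASE_PBS_PREFIX=.*|BASE_PBS_PREFIX=$1|" "$2"
}

# Demonstration on a throwaway file standing in for /etc/init.d/pbs_sched.
demo=$(mktemp)
printf '%s\n' '#!/bin/sh' 'BASE_PBS_PREFIX=/usr/local' 'echo start' > "$demo"
set_pbs_prefix /opt/torque "$demo"
rm -f "$demo"
```

On a real system, redirect the output to a temporary file, compare it with the original, and only then copy it over the script in /etc/init.d.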

B. Configuring the TORQUE Clients

Step 1a: Copy the torque package to the nodes using xCAT

# pscp torque-package-mom-linux-x86_64.sh compute:/tmp
# pscp torque-package-clients-linux-x86_64.sh compute:/tmp

Step 1b: Run the scripts

# psh compute "/tmp/torque-package-mom-linux-x86_64.sh --install"
# psh compute "/tmp/torque-package-clients-linux-x86_64.sh --install"

Step 2a. Copy the /etc/init.d/pbs_mom to compute nodes

# pscp /etc/init.d/pbs_mom compute:/etc/init.d
# psh compute "/sbin/service pbs_mom start"

Further Information:

  1. Configuring the Torque Default Queue
  2. Adding and Specifying Compute Resources at Torque

Resources:

  1. TORQUE installation overview

Enabling Torque for email notification

Step 1:

  1. See the article Configuring CentOS 5 as an SMTP Mail Client with sendmail to set up your Torque Server as an SMTP mail client.

Step 2:

Ensure the Torque Server configuration has this line

  1. "set server mail_from = adm" (you can replace adm with another user ID of your choice). You may want to take a look at Setting up Torque Server on xCAT 2.x from Linux Toolkit

Step 3:

Finally, to have the batch system email the user when a job starts, ends or aborts, you have to set two options in the job script

  1. -m switch, which defines the events that trigger a notification
  2. -M switch, which gives the address the notification is sent to

For example,

# Send notification when job starts.
#PBS -m b
# Send notification when job finishes and aborts.
#PBS -m ea
# Send notification when job starts, finishes and aborts.
#PBS -m bea

A typical submission script will be

#!/bin/bash
#PBS -N jobname
#PBS -j oe
#PBS -V
#PBS -m bea
#PBS -M kittycool@linucluster.wordpress.com
#PBS -l nodes=2:ppn=8

## pre-processing script
cd "$PBS_O_WORKDIR"
# $PBS_NODEFILE has one line per allocated core, so its line count is the core count
NCPUS=$(wc -l < "$PBS_NODEFILE")
echo "$NCPUS"
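
Because the node file repeats each host name once per allocated core, the same file also gives the number of distinct nodes. Here is a minimal sketch that can be tried outside a job: the helper names count_slots and count_hosts are illustrative, and the sample file imitates what a nodes=2:ppn=2 request might produce (host names assumed).

```shell
#!/bin/sh
# Hypothetical helpers for a node file in $PBS_NODEFILE format
# (one host name per line, repeated once per allocated core).
count_slots() { grep -c . "$1"; }            # non-empty lines = cores
count_hosts() { sort -u "$1" | grep -c .; }  # distinct host names = nodes

# Sample node file standing in for $PBS_NODEFILE.
nodefile=$(mktemp)
printf '%s\n' node01 node01 node02 node02 > "$nodefile"
echo "cores: $(count_slots "$nodefile"), nodes: $(count_hosts "$nodefile")"
rm -f "$nodefile"
```

With the sample file this prints "cores: 4, nodes: 2". Inside a real job, pass "$PBS_NODEFILE" instead of the temporary file.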

Commonly used qstat options

Option              Description
qstat -i            Display non-running jobs in the alternative format
qstat -r            Display running jobs
qstat -n            In addition to the basic information, list the nodes allocated to each job
qstat -u user(s)    Display jobs of a user or users
qstat -Q            Status of queues
qstat -Q -f         Full status of queues in the alternative format
qstat -q            Status of queues in the alternative format
qstat -B            Batch server status
qstat -B -f         Full batch server status, including configuration

Overview of MAUI Scheduler Commands

  MAUI is an open source job scheduler for clusters and supercomputers. It is an optimized, configurable tool capable of supporting an array of scheduling policies, dynamic priorities, extensive reservations, and fairshare capabilities.

This blog entry attempts to capture the essence of MAUI and some of its more commonly used commands and configuration.

To download MAUI Scheduler, go to Maui Cluster Scheduler. To download the MAUI Documentation, proceed to Cluster Resources Documentation

Useful commands for MAUI

1. Configuring MAUI Scheduler

  1. Reconfigure the scheduler at any time, forcing it to re-read all config files before continuing
    # schedctl -R
  2. Shut down the MAUI scheduler
    # schedctl -k
  3. Stop MAUI scheduling
    # schedctl -s
  4. Resume MAUI scheduling immediately
    # schedctl -r

 2. Status Commands

Maui provides an array of commands to organize and present information about the current state and historical statistics of the scheduler, jobs, resources, users, accounts, etc. The following commands are taken from Cluster Resources and reproduced here

checkjob -> display job state, resource requirements, environment, constraints, credentials, history, allocated resources, and resource utilization
checknode -> Displays state information and statistics for the specified node.
diagnose -j -> display summarized job information and any unexpected state
diagnose -n -> display summarized node information and any unexpected state
diagnose -p -> display summarized job priority information
diagnose -r -> display summarized reservation information
showgrid -> display various aspects of scheduling performance across a job duration/job size matrix
showq -> display various views of currently queued active, idle, and non-eligible jobs
showstat -f -> display historical fairshare usage on a per credential basis
showstat -g -> display current and historical usage on a per group basis
showstat -u -> display current and historical usage on a per user basis
showstat -v -> display high level current and historical scheduling statistics

3. Job Management Commands

Maui shares job management tasks with the resource manager. The commands below are the available job management commands

canceljob   -> cancel existing job
releasehold [-a]  -> remove job holds or defers
runjob   -> start job immediately if possible
sethold   -> set hold on job
setqos   -> set/modify QoS of existing job
setspri   -> adjust job/system priority of job

4. Reservation Management Commands

Maui exclusively controls and manages all advance reservation features including both standing and administrative reservations

diagnose -r -> display summarized reservation information and any unexpected state
releaseres -> remove reservations
setres -> immediately create an administrative reservation
showres -> display information regarding location and state of reservations

5. Policy/Config Management Commands

Maui allows dynamic modification of most scheduling parameters allowing new scheduling policies, algorithms, constraints, and permissions to be set at any time.

changeparam -> immediately change parameter value
schedctl -> control scheduling behavior (i.e., stop/start scheduling, recycle, shutdown, etc.)
showconfig -> display settings of all configuration parameters

6. End User Commands

canceljob -> cancel existing job
checkjob -> display job state, resource requirements, environment, constraints, credentials, history, allocated resources, and resource utilization
showbf -> show resource availability for jobs with specific resource requirements
showq -> display detailed prioritized list of active and idle jobs
showstart -> show estimated start time of idle jobs
showstats -> show detailed usage statistics for users, groups, and accounts which the end user has access to

MAUI Installation on Torque and xCAT

Maui Cluster Scheduler (a.k.a. Maui Scheduler) is the first generation cluster scheduler, precursor to the highly successful MOAB scheduler. Maui is an advanced policy engine used to improve the manageability and efficiency of machines ranging from clusters of a few processors to multi-teraflop supercomputers.

Taken and modified from http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Maui
 

Step 1: Download MAUI tarball from Cluster Resources

Create an account and download at http://www.clusterresources.com/product/maui/index.php
Untar in /tmp

Step 2: Configure soft links for Torque

# cd /opt/torque
# ln -s x86_64/bin .
# ln -s x86_64/lib .
# ln -s x86_64/sbin .

# export PATH=$PATH:/opt/torque/x86_64/bin/

Step 3: Configure and Install MAUI

# cd maui-3.2.6p21
# ./configure --prefix=/opt/maui --with-pbs=/opt/torque/ --with-spooldir=/opt/maui
# make -j8
# make install
# cp /opt/xcat/share/xcat/netboot/add-on/torque/moab /etc/init.d/maui
(Edit /etc/init.d/maui so that every MOAB becomes MAUI and every moab becomes maui)
# service maui start
# chkconfig --level 345 maui on
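
The MOAB-to-MAUI rename in the copied init script can be done with sed instead of by hand. The following is a rough sketch under stated assumptions: the filter name moab_to_maui is made up, the two sample lines are invented stand-ins for the real script's contents, and mixed-case spellings such as "Moab" are deliberately left untouched.

```shell
#!/bin/sh
# Hypothetical filter: rewrite every MOAB/moab in the input to MAUI/maui.
moab_to_maui() {
    sed -e 's/MOAB/MAUI/g' -e 's/moab/maui/g'
}

# Invented sample lines; the real init script will look different.
printf '%s\n' 'MOAB_HOME=/opt/moab' 'daemon $MOAB_HOME/sbin/moab' | moab_to_maui
```

To use it on the real file, run something like "moab_to_maui < /etc/init.d/maui > /tmp/maui.new", review the result, and copy it back once it looks right.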

Step 4: Configure MAUI and maui.cfg

# touch /etc/profile.d/maui.sh
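# vim /etc/profile.d/maui.sh (Type: export PATH=$PATH:/opt/maui/bin)
# source /etc/profile.d/maui.sh
# vim /usr/local/maui/maui.cfg (maui.cfg lives in the spool directory; with --with-spooldir=/opt/maui as in Step 3, look in /opt/maui instead)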
(Change: RMCFG[] TYPE=PBS@...@ to:
RMCFG[] TYPE=PBS)
# service maui restart

If MAUI reports an error regarding the Torque Server host name, check the order of the host names in /etc/hosts. Assuming pbs_server.com is the name of the Torque Server used in its configuration file, it should come first, before any other aliases:

192.168.1.5       pbs_server.com    pbsserver
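
A quick way to check which name comes first for a given address is to read the second field of the matching line. This is a sketch only: the helper first_name_for_ip is hypothetical, and the demonstration uses a temporary file in /etc/hosts format with the example entry above.

```shell
#!/bin/sh
# Hypothetical helper: print the first name listed for an IP address ($1)
# in a hosts-format file ($2). Lines whose first field is not the IP are skipped.
first_name_for_ip() {
    awk -v ip="$1" '$1 == ip { print $2; exit }' "$2"
}

# Temporary file standing in for /etc/hosts.
hosts=$(mktemp)
printf '%s\n' '127.0.0.1 localhost' '192.168.1.5 pbs_server.com pbsserver' > "$hosts"
first_name_for_ip 192.168.1.5 "$hosts"
rm -f "$hosts"
```

On the Torque Server, call it with /etc/hosts as the second argument; the printed name should match the server name used in the Torque configuration.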

Step 5: Test the Configuration

# showq

(You should see all of the processors. Next try running a job to make sure that maui picks it up.)