Adding and Specifying Compute Resources at Torque

This blog entry is a follow-up to Installing Torque 2.5 on CentOS 6 with xCAT tool. After installing Torque on the head node and the compute nodes, the next step is to configure the Torque server. This entry focuses on configuring the compute resources on the Torque server.

Step 1: Adding Nodes to the Torque Server

# qmgr -c "create node node01"
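With more than a handful of nodes, the same command can be wrapped in a loop. The following is a minimal sketch, not a definitive script: add_nodes and the DRY_RUN switch are hypothetical names, and node01 to node03 are placeholder host names. By default it only prints the qmgr commands so they can be reviewed first.

```shell
#!/bin/bash
# Sketch: register several compute nodes with one loop.
# DRY_RUN defaults to 1, so the commands are only printed for review;
# run with DRY_RUN=0 on the head node to actually call qmgr.
DRY_RUN="${DRY_RUN:-1}"

add_nodes() {
    local node
    for node in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "qmgr -c \"create node $node\""
        else
            qmgr -c "create node $node"
        fi
    done
}

# node01..node03 are placeholder host names.
add_nodes node01 node02 node03
```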

Step 2: Configure automatic detection of each node's CPU count. Setting auto_node_np to TRUE overwrites the value of np set in $TORQUEHOME/server_priv/nodes

# qmgr -c "set server auto_node_np = True"

Step 3: Start pbs_mom on the compute nodes. The Torque server will detect the nodes automatically.

# service pbs_mom start
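Once pbs_mom is running, the detected core counts can be checked from the head node with pbsnodes -a. The following is a minimal sketch, assuming the usual pbsnodes layout (node name at the start of a line, attributes indented as "key = value"); summarize_np is a hypothetical helper name.

```shell
#!/bin/bash
# summarize_np: read "pbsnodes -a" output on stdin and print "<node> <np>" per node.
summarize_np() {
    awk '
        /^[^ \t]/               { node = $1 }        # unindented line: a node name
        $1 == "np" && $2 == "=" { print node, $3 }   # indented "np = N" attribute
    '
}

# Only query the server when the Torque client tools are installed.
if command -v pbsnodes >/dev/null 2>&1; then
    pbsnodes -a | summarize_np
fi
```

If auto_node_np is in effect, the np column should match the number of cores on each node rather than the value in the nodes file.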

Installing Torque 2.5 on CentOS 6

A. Configuring for TORQUE Server

Step 1: Download the Torque Software from Adaptive Computing

# wget TORQUE Downloads

Step 2: Configure the Torque Server

./configure \
--prefix=/opt/torque \
--exec-prefix=/opt/torque/x86_64 \
--enable-docs \
--disable-gui \
--with-server-home=/var/spool/torque \
--enable-syslog \
--with-scp \
--disable-rpp \
--disable-spool \
--with-pam

Step 3: Compile Torque

# make
# make install

Step 4: Make packages for the clients

# make packages

You should have the following files:

torque-package-doc-linux-x86_64.sh
torque.setup
torque-package-clients-linux-x86_64.sh
torque-package-mom-linux-x86_64.sh
torque_setup.sh
torque-package-devel-linux-x86_64.sh
torque-package-server-linux-x86_64.sh
torque-package-pam-linux-x86_64.sh  
torque.spec
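Before moving on, it can be worth confirming that the two packages the compute nodes need (mom and clients) were actually built. A minimal sketch, assuming it is run from the directory where make packages wrote its output; check_packages is a hypothetical helper name.

```shell
#!/bin/bash
# check_packages DIR
# Report which of the mom/clients self-extracting packages are missing from DIR.
# Returns 0 when both are present, 1 otherwise.
check_packages() {
    local dir="$1" missing=0 pkg
    for pkg in mom clients; do
        if [ ! -f "$dir/torque-package-$pkg-linux-x86_64.sh" ]; then
            echo "missing: torque-package-$pkg-linux-x86_64.sh"
            missing=1
        fi
    done
    return "$missing"
}

check_packages . || echo "run 'make packages' in the Torque source tree first"
```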

Step 5: Installing Torque as a service (pbs_mom)

I was unable to run the default “init.d” script found at $TORQUE/contrib/init.d as a service. A workaround is to use the open-source xCAT, which ships a working pbs_mom script at /opt/xcat/share/xcat/netboot/add-on/torque/pbs_mom. To install the latest xCAT, you may want to read the blog entry Dependency issues when installing xCAT 2.7 on CentOS 6

Assuming you have successfully installed xCAT, copy the pbs_mom script to /etc/init.d/pbs_mom

# cp /opt/xcat/share/xcat/netboot/add-on/torque/pbs_mom /etc/init.d/pbs_mom

Step 5a: Edit /etc/init.d/pbs_mom and start the service

# vim /etc/init.d/pbs_mom

Inside the pbs_mom script, set the following:

BASE_PBS_PREFIX=/opt/torque

#ulimit -n 20000
#ulimit -i 20000
ulimit -l unlimited

Save and exit.

At the console, start the pbs_mom service

# service pbs_mom start
Starting PBS Mom:                                          [  OK  ]

Step 5b: Installing Torque as a service (pbs_server)

# cp /opt/xcat/share/xcat/netboot/add-on/torque/pbs_server /etc/init.d/pbs_server

Inside the pbs_server script, just ensure that BASE_PBS_PREFIX points to the right directory

BASE_PBS_PREFIX=/opt/torque

Save and Exit.

At the console, start the pbs_server service

# service pbs_server start
Starting PBS Server:                                       [ OK ]

Step 5c: Installing Torque as a service (pbs_sched)

# cp /opt/xcat/share/xcat/netboot/add-on/torque/pbs_sched /etc/init.d/pbs_sched

Inside the pbs_sched script, just ensure that BASE_PBS_PREFIX points to the right directory

BASE_PBS_PREFIX=/opt/torque

Save and Exit.

At the console, start the pbs_sched service

# service pbs_sched start
Starting PBS Scheduler:                                    [ OK ]
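The BASE_PBS_PREFIX edits in Steps 5a to 5c can also be scripted instead of done in vim. The following is a minimal sketch, assuming each init script contains a BASE_PBS_PREFIX= assignment line as shown above; set_pbs_prefix is a hypothetical helper name.

```shell
#!/bin/bash
# set_pbs_prefix FILE PREFIX
# Hypothetical helper: rewrite the BASE_PBS_PREFIX= line of an init script.
set_pbs_prefix() {
    local script="$1" prefix="$2" tmp
    tmp="$(mktemp)" || return 1
    # Replace everything after "BASE_PBS_PREFIX=", keeping any leading whitespace.
    if sed "s|^\([[:space:]]*BASE_PBS_PREFIX=\).*|\1${prefix}|" "$script" > "$tmp"; then
        # Write back into the existing file so its mode bits are preserved.
        cat "$tmp" > "$script"
    fi
    rm -f "$tmp"
}

# Only touch the scripts that exist and are writable on this machine.
for svc in pbs_mom pbs_server pbs_sched; do
    if [ -w "/etc/init.d/$svc" ]; then
        set_pbs_prefix "/etc/init.d/$svc" /opt/torque
    fi
done
```

Restart the affected services afterwards so they pick up the new prefix.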

B. Configuring the TORQUE Clients

Step 1a: Copy the Torque packages to the nodes using xCAT

# pscp torque-package-mom-linux-x86_64.sh compute:/tmp
# pscp torque-package-clients-linux-x86_64.sh compute:/tmp

Step 1b: Run the scripts

# psh compute "/tmp/torque-package-mom-linux-x86_64.sh --install"
# psh compute "/tmp/torque-package-clients-linux-x86_64.sh --install"

Step 2a: Copy /etc/init.d/pbs_mom to the compute nodes and start the service

# pscp /etc/init.d/pbs_mom compute:/etc/init.d
# psh compute "/sbin/service pbs_mom start"

Further Information:

  1. Configuring the Torque Default Queue
  2. Adding and Specifying Compute Resources at Torque

Resources:

  1. TORQUE installation overview

Automate pushing of ssh-copy-id to multiple servers

This is a follow-up to the write-up Tools to automate ssh-copy-id to remote servers. The server OS used is CentOS 6.2. If you are automating scripts, you may have to modify the default SSH settings first.

You have probably encountered the yes/no question below when trying to ssh into a remote server.

The authenticity of host 'yourserver.com.sg (192.168.1.1)' can't be established.
RSA key fingerprint is 8d:e7:92:ef:86:1a:fb:4a:01:00:6a:fc:8c:23:ed:15.
Are you sure you want to continue connecting (yes/no)?

To avoid the prompt, you can change the setting system-wide in /etc/ssh/ssh_config

# vim /etc/ssh/ssh_config
#  StrictHostKeyChecking ask
StrictHostKeyChecking no

Alternatively, you can do it at the local account level in ~/.ssh/config

$ vim ~/.ssh/config

Add the following lines

StrictHostKeyChecking no
UserKnownHostsFile=/dev/null

After you have pushed your keys, you may want to revert StrictHostKeyChecking to its default setting if you edited /etc/ssh/ssh_config, or remove the 2 lines above if you used the local account configuration.
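Removing the two lines from a local config can be scripted so it is not forgotten once the keys are pushed. A minimal sketch, assuming the lines were added exactly as shown above; revert_relaxed_ssh is a hypothetical helper name, and it leaves every other line of the file alone.

```shell
#!/bin/bash
# revert_relaxed_ssh CONFIG
# Hypothetical helper: delete the two relaxed settings from an ssh client config.
revert_relaxed_ssh() {
    local cfg="$1" tmp
    tmp="$(mktemp)" || return 1
    # Match "Keyword value" and "Keyword=value" forms; commented lines are kept.
    if sed -e '/^[[:space:]]*StrictHostKeyChecking[[:space:]=][[:space:]=]*no[[:space:]]*$/d' \
           -e '/^[[:space:]]*UserKnownHostsFile[[:space:]=][[:space:]=]*\/dev\/null[[:space:]]*$/d' \
           "$cfg" > "$tmp"; then
        # Write back into the existing file so its permissions are preserved.
        cat "$tmp" > "$cfg"
    fi
    rm -f "$tmp"
}

# Usage (run once the keys have been pushed):
#   revert_relaxed_ssh ~/.ssh/config
```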

Next, you can use a simple bash script. I’m not comfortable with keeping the password in plain text, so make sure only you can view the file.

for i in $(cat my_hosts_list)
    do
       sshpass -p 'server_password' ssh-copy-id admin@${i}
    done
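A variation on the loop above keeps the password out of the script itself by using sshpass -f, which reads the password from a file. This is a minimal sketch under stated assumptions: push_keys, the DRY_RUN switch and the $HOME/.ssh_pass path are hypothetical, and the password file should be restricted with chmod 600. By default it only prints the commands.

```shell
#!/bin/bash
# push_keys HOSTS_FILE PASSWORD_FILE
# Push the local public key to every host listed in HOSTS_FILE (one per line).
# Blank lines and lines starting with '#' are skipped.
push_keys() {
    local hosts_file="$1" pass_file="$2" host
    while read -r host || [ -n "$host" ]; do
        case "$host" in ''|'#'*) continue ;; esac
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "sshpass -f $pass_file ssh-copy-id admin@$host"
        else
            # </dev/null stops ssh from swallowing the rest of the host list.
            sshpass -f "$pass_file" ssh-copy-id "admin@$host" < /dev/null
        fi
    done < "$hosts_file"
}

# $HOME/.ssh_pass is a placeholder path for the password file.
if [ -r my_hosts_list ]; then
    push_keys my_hosts_list "$HOME/.ssh_pass"
fi
```

Run it with DRY_RUN=0 once the printed commands look right, and delete the password file when the keys are in place.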

Dependency issues when installing xCAT 2.7 on CentOS 6

If you are installing xCAT 2.7 on CentOS 6 with yum, you will need the two .repo files placed in /etc/yum.repos.d/

# cd /etc/yum.repos.d
# wget http://sourceforge.net/projects/xcat/files/yum/stable/xcat-core/xCAT-core.repo
# wget http://sourceforge.net/projects/xcat/files/yum/xcat-dep/rh6/x86_64/xCAT-dep.repo

Do a yum check-update

# yum check-update

Then do a yum install of xCAT

# yum install xCAT

You might get the error

Error: Package: xCAT-2.7.2-snap201205230215.x86_64 (xcat-2-core)
           Requires: elilo-xcat
Error: Package: xCAT-2.7.2-snap201205230215.x86_64 (xcat-2-core)
           Requires: xCAT-genesis-x86_64
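When the error list is long, the missing package names can be pulled out of the yum output. A minimal sketch, assuming the error layout shown above, where each missing dependency appears on a line starting with "Requires:"; missing_requires is a hypothetical helper name.

```shell
#!/bin/bash
# missing_requires: read yum error output on stdin and print each
# unmet "Requires:" package name once.
missing_requires() {
    awk '$1 == "Requires:" { print $2 }' | sort -u
}

# Usage (on the head node):
#   yum install xCAT 2>&1 | missing_requires
```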

If you see these errors, download the missing packages from http://sourceforge.net/projects/xcat/files/yum/xcat-dep/rh6/x86_64/ and do an rpm install

# rpm -Uvh xCAT-genesis-x86_64-2.7.......
# rpm -Uvh elilo-xcat-3.14-4.noarch.rpm

Finally, do a yum install xCAT again and it should install without issue.