Summary of Job Management Commands for MAUI

This is a good summary of the commands used to manage jobs in MAUI, taken from section 4.3, Job Management Commands, of the Adaptive Computing documentation.

Command       Flags   Description
canceljob             cancel existing job
checkjob              display job state, resource requirements, environment, constraints,
                      credentials, history, allocated resources, and resource utilization
diagnose      -j      display summarized job information and any unexpected state
releasehold   [-a]    remove job holds or defers
runjob                start job immediately if possible
sethold               set hold on job
setqos                set/modify QoS of existing job
setspri               adjust job/system priority of job
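As a quick illustration, here is a minimal shell sketch that strings three of these commands together to inspect a stuck job and then clear its holds. The function name hold_cycle and the job ID 4444 are only examples; check the MAUI man pages for the full option list before relying on it.

```shell
#!/bin/sh
# Sketch: inspect a job with the MAUI commands listed above, then release
# all of its holds. hold_cycle is a hypothetical helper name.
hold_cycle() {
    jobid="$1"
    if [ -z "$jobid" ]; then
        echo "usage: hold_cycle JOBID" >&2
        return 1
    fi
    checkjob "$jobid"        # state, requirements, allocated resources
    diagnose -j "$jobid"     # summarized info and any unexpected state
    releasehold -a "$jobid"  # remove all holds/defers on the job
}
```

For example, hold_cycle 4444 prints the checkjob and diagnose output for job 4444 before releasing it.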

Enabling SRIOV on Intel Ethernet Server Adapter

First things first

Before you begin, check that your Intel Ethernet Server Adapter supports SR-IOV. For more information, take a look at Using SR-IOV with Intel® Ethernet Server Adapters.

In a nutshell, you blacklist the VF driver on the host and make the VFs available to the KVM guests.

Step 1: Add the following line to /etc/modprobe.conf:

options ixgbe max_vfs=8

The above configuration creates 8 virtual NICs (VFs) per port. The Intel card supports up to 64 VFs.

Step 2: Blacklist the ixgbevf driver by creating a file called /etc/modprobe.d/blacklist-ixgbevf.conf containing:

blacklist ixgbevf

Step 3: Reboot the machine
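If you prefer to script Steps 1 and 2, something like the sketch below should work. It is only a sketch under my own assumptions: configure_sriov is a made-up name, and its optional first argument is a root prefix (empty means the real /) so you can try it against a scratch directory first.

```shell
#!/bin/sh
# Sketch of Steps 1 and 2. Arguments (both hypothetical):
#   $1 = root prefix (empty means /), $2 = VFs per port (default 8)
configure_sriov() {
    root="${1:-}"
    numvfs="${2:-8}"
    mkdir -p "$root/etc/modprobe.d"
    # Step 1: have ixgbe create $numvfs virtual functions per port,
    # but do not add the line twice if the script is run again
    if ! grep -qs "^options ixgbe max_vfs=" "$root/etc/modprobe.conf"; then
        echo "options ixgbe max_vfs=$numvfs" >> "$root/etc/modprobe.conf"
    fi
    # Step 2: stop the host from binding the VF driver
    echo "blacklist ixgbevf" > "$root/etc/modprobe.d/blacklist-ixgbevf.conf"
}
```

Run it as root with configure_sriov "" 8, reboot, and the VFs should then be listed by lspci | grep -i "Virtual Function".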

Debugging Tools to track run-time errors for mpirun

If you are having unexplained issues with mpirun, there are various methods you can use to troubleshoot them.

Information on “--mca orte_base_help_aggregate 0”

If your mpirun dies without any error messages, you may want to read this entry from the Open MPI FAQ (Debugging applications in parallel, 7. My process dies without any output. Why?):

If your application fails due to memory corruption, Open MPI may subsequently fail to output an error message before dying. Specifically, starting with v1.3, Open MPI attempts to aggregate error messages from multiple processes in an attempt to show unique error messages only once (vs. one for each MPI process, which can be unwieldy, especially when running large MPI jobs).

However, this aggregation process requires allocating memory in the MPI process when it displays the error message. If the process’ memory is already corrupted, Open MPI’s attempt to allocate memory may fail and the process will simply die, possibly silently. When Open MPI does not attempt to aggregate error messages, most of its setup work is done during MPI_INIT and no memory is allocated during the “print the error” routine. It therefore almost always successfully outputs error messages in real time, but at the expense that you’ll potentially see the same error message for each MPI process that encountered the error.

Hence, the error message aggregation is usually a good thing, but sometimes it can mask a real error. You can disable Open MPI’s error message aggregation with the orte_base_help_aggregate MCA parameter. For example:

 $ mpirun --mca orte_base_help_aggregate 0 ...
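If you don't want to retype the flag on every run, Open MPI also reads MCA parameters from environment variables named OMPI_MCA_<parameter>. The sketch below shows that, plus a small wrapper; mpirun_noagg is a hypothetical name of my own, and this applies to the Open MPI versions covered by the FAQ above.

```shell
#!/bin/sh
# Option 1: set the MCA parameter through the environment. Open MPI treats
# OMPI_MCA_<name>=<value> like passing --mca <name> <value> to mpirun.
export OMPI_MCA_orte_base_help_aggregate=0

# Option 2: a hypothetical wrapper that always adds the flag explicitly
# and passes every other argument through unchanged.
mpirun_noagg() {
    mpirun --mca orte_base_help_aggregate 0 "$@"
}
```

For example, mpirun_noagg -np 4 ./a.out runs ./a.out on 4 processes with aggregation disabled.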

Installing Java 7 on CentOS 5 and 6

Step 1: Go to the Oracle Java download site and download the JDK tarball for your platform (this guide uses jdk-7u51-linux-x64.tar.gz).

Step 2: Unpack the Archive

# cd /usr/local/
# tar -zxvf jdk-7u51-linux-x64.tar.gz

Step 3: Set up the environment variables. Add the following to your .bashrc:

export JAVA_HOME=/usr/local/jdk1.7.0_51
export JRE_HOME=/usr/local/jdk1.7.0_51/jre
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

Step 4: Check the version

# java -version
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
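If another java (for example one installed from the CentOS repositories into /usr/bin) comes earlier in PATH, java -version will report that one instead of the new JDK. A minimal check, assuming JAVA_HOME is set as in Step 3 (check_java is a hypothetical helper name):

```shell
#!/bin/sh
# Sketch: confirm that the java found on PATH is the one under JAVA_HOME
# rather than some other copy. check_java is a hypothetical name.
check_java() {
    found=$(command -v java) || { echo "java is not on PATH" >&2; return 1; }
    if [ "$found" != "$JAVA_HOME/bin/java" ]; then
        echo "warning: java resolves to $found, not $JAVA_HOME/bin/java" >&2
        return 1
    fi
    echo "OK: $found"
}
```

If it prints a warning, reorder PATH so that $JAVA_HOME/bin comes first.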

Compiling gnuplot 4.6.4 on CentOS 5

I compiled gnuplot 4.6.4 on CentOS 5.4

Step 1: Install the prerequisites:

(a) wxGTK and wxGTK-devel

(b) readline and readline-devel

(c) gd and gd-devel

# yum install wxGTK wxGTK-devel readline readline-devel gd gd-devel
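To double-check that yum really installed everything before running configure, a small helper like the one below can query each package from Step 1 with rpm -q. The function name check_gnuplot_deps is just for illustration.

```shell
#!/bin/sh
# Sketch: report any gnuplot prerequisite from Step 1 that rpm does not
# know about. check_gnuplot_deps is a hypothetical helper name.
check_gnuplot_deps() {
    missing=""
    for p in wxGTK wxGTK-devel readline readline-devel gd gd-devel; do
        rpm -q "$p" >/dev/null 2>&1 || missing="$missing $p"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing" >&2
        return 1
    fi
    echo "all prerequisites installed"
}
```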

Step 2: Compile gnuplot-4.6.4

# tar -zxvf gnuplot-4.6.4.tar.gz
# cd gnuplot-4.6.4
# ./configure --prefix=/usr/local/gnuplot-4.6.4
# make
# make install

If you run into issues, take a look at the Fedora Forum thread below.

References:

  1. http://forums.fedoraforum.org/showthread.php?p=1397790
  2. gnuplot homepage

Using options and arguments with the “ls” command

ls has several formatting options that are very useful. Here are some of them. For example, I like to use:

$ ls -ltF
 -a   --all       List all files, including those whose names begin with a dot (.)
 -l               Display results in long format
 -r   --reverse   Reverse the order while sorting
 -S               Sort results by file size
 -t               Sort by modification time
 -F   --classify  Append an indicator character to the end of each listed name
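To see what each flag does, you can try them on a scratch directory. This sketch creates two files of different sizes, a hidden file and a subdirectory, gives each a distinct modification time with touch -t, and then lists them; the file names are arbitrary.

```shell
#!/bin/sh
# Scratch directory: two files of different sizes, a hidden file and a
# subdirectory, each visible entry with a distinct modification time.
demo=$(mktemp -d)
printf 'x' > "$demo/small.txt"            # 1 byte
printf 'xxxxxxxxxx' > "$demo/big.txt"     # 10 bytes
touch "$demo/.hidden"
mkdir "$demo/subdir"
touch -t 202001010000 "$demo/small.txt"   # oldest
touch -t 202101010000 "$demo/big.txt"
touch -t 202201010000 "$demo/subdir"      # newest

ls -a "$demo"      # includes ., .. and .hidden
ls -t "$demo"      # newest first: subdir, big.txt, small.txt
ls -tr "$demo"     # reversed: small.txt, big.txt, subdir
ls -S "$demo"      # by size, largest first (big.txt before small.txt)
ls -F "$demo"      # the directory is shown as subdir/
ls -ltF "$demo"    # long format, newest first, with indicators
# Clean up afterwards with: rm -rf "$demo"
```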

References:

  1. The Linux Command Line (by No Starch Press)

Deleting PBS and MAUI Jobs which cannot be purged

If the pbs_mom on a compute node is lost and cannot be recovered (due to hardware or network failure), use the following steps to purge its running job from the qstat or showq output.

1. Shut down the pbs_server daemon on the PBS server

# service pbs_server stop

2. Remove the job spool files that hold the hung job ID (for example 4444)

# rm /var/spool/torque/server_priv/jobs/4444.headnode.SC
# rm /var/spool/torque/server_priv/jobs/4444.headnode.JB

3. Start the pbs_server daemon

# service pbs_server start

4. Restart the MAUI Daemon

# service maui restart
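If you have to do this more than once, the four steps can be wrapped in a function. This is only a sketch under my own assumptions: purge_job, TORQUE_SPOOL and DRYRUN are hypothetical names, the spool path defaults to the one used above, and the job files are matched as JOBID.<server>.SC and JOBID.<server>.JB like in step 2.

```shell
#!/bin/sh
# Sketch of steps 1-4 for a single job. purge_job, TORQUE_SPOOL and DRYRUN
# are hypothetical names used only by this sketch.
purge_job() {
    jobid="$1"
    case "$jobid" in
        ''|*[!0-9]*) echo "usage: purge_job NUMERIC_JOBID" >&2; return 1 ;;
    esac
    jobs_dir="${TORQUE_SPOOL:-/var/spool/torque}/server_priv/jobs"
    run=""
    if [ "${DRYRUN:-0}" = 1 ]; then
        run="echo"                                    # print, don't execute
    fi
    $run service pbs_server stop                      # step 1
    for f in "$jobs_dir/$jobid".*.SC "$jobs_dir/$jobid".*.JB; do
        if [ -e "$f" ]; then
            $run rm -f "$f"                           # step 2
        fi
    done
    $run service pbs_server start                     # step 3
    $run service maui restart                         # step 4
}
```

Pass only the numeric part of the job ID, and try it with DRYRUN=1 set first (for example DRYRUN=1; purge_job 4444) to see exactly which files would be removed.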

References:

  1. Deleting PBS/Maui Jobs

Compiling Chelsio iWARP Drivers (2.8.0.0) on CentOS 5

The following is a subset of the Chelsio 2.8.0.0 README.

The Chelsio Unified Wire software has been developed to run on 64-bit Linux
based platforms. Following is the list of drivers/software and supported Linux
distributions.

The OS I used was CentOS 5.8

|########################|#####################################################|
|   Linux Distribution   |                Driver/Software                      |
|########################|#####################################################|
|RHEL5.8,2.6.18-308.el5  |NIC/TOE,vNIC,iWARP,WD-UDP*,WD-TOE*,iSCSI Target*,    |
|                        |Bonding,IPv6,Bypass*,Sniffer & Tracer                |
|                        |UM(Agent,Client),UDP-SO,Filtering,TM                 |
|------------------------|-----------------------------------------------------|
|RHEL5.9,2.6.18-348.el5  |NIC/TOE*,vNIC*,iWARP*,WD-UDP*,WD-TOE*,iSCSI Target*, |
|                        |Bonding*,IPv6*,Bypass*,Sniffer & Tracer*,UDP-SO*,    |
|                        |Filtering*,TM*                                       |
|------------------------|-----------------------------------------------------|
|RHEL6.3,                |NIC/TOE,vNIC,iWARP,WD-UDP,WD-TOE*,iSCSI Target*,     |
|2.6.32-279.el6          |iSCSI Initiator*,FCoE Initiator*,                    |
|                        |Bonding,IPv6,Bypass*,Sniffer & Tracer,UDP-SO,        |
|                        |UM(Agent,Client,WebGUI),Filtering,TM                 |
|------------------------|-----------------------------------------------------|
|RHEL6.4,                |NIC/TOE,vNIC,iWARP,WD-UDP,WD-TOE,iSCSI Target,       |
|2.6.32-358.el6          |iSCSI Initiator,FCoE Initiator,Bonding,IPv6,Bypass,  |
|                        |Sniffer & Tracer,UDP-SO,UM(Agent,Client,WebGUI),     |
|                        |Filtering,TM,uBoot(DUD)                              |
|------------------------|-----------------------------------------------------|

Strangely, I was not able to compile against OFED 3.5.1. It seems that the compat-rdma package in OFED 3.5.1 has issues with CentOS 5.8. See Failed to build compat-rdma RPM when compiling OFED 3.5.1 on CentOS 5.8

I also tried OFED 1.5.4.1, but errors occurred there as well. OFED 1.5.3.2, however, compiles cleanly, and the Chelsio T420-BCH drivers compiled nicely against it. To download OFED 1.5.3.2, visit the OFED Downloads site.


Part 1

To compile from source

i.  Download the tarball ChelsioUwire-x.x.x.x.tar.gz

ii. Untar the tarball

[root@host]# tar zxvfm ChelsioUwire-x.x.x.x.tar.gz

iii. Change your current working directory to the Chelsio Unified Wire package directory and build the source:

[root@host]# make

iv. Install the drivers, tools and libraries:

[root@host]# make install

v. The default configuration tuning option is Unified Wire. A different configuration tuning can be selected using the following commands:

[root@host]# make CONF=<T5/T4 Configuration>

[root@host]# make CONF=<T5/T4 Configuration> install

where <T5/T4 Configuration> is one of:

UNIFIED_WIRE, HIGH_CAPACITY_TOE, HIGH_CAPACITY_RDMA, LOW_LATENCY, UDP_OFFLOAD, T5_WIRE_DIRECT_LATENCY
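Since a typo in the CONF value is easy to make, here is a small sketch of a wrapper that only accepts the configuration names listed above before running the two make commands. build_chelsio is a hypothetical name of my own, not something shipped in the Chelsio package; run it from the Chelsio Unified Wire package directory.

```shell
#!/bin/sh
# Sketch: build and install with a chosen tuning profile, rejecting names
# not in the README's list. build_chelsio is a hypothetical wrapper.
build_chelsio() {
    conf="${1:-UNIFIED_WIRE}"    # the README's default tuning
    case "$conf" in
        UNIFIED_WIRE|HIGH_CAPACITY_TOE|HIGH_CAPACITY_RDMA|LOW_LATENCY|UDP_OFFLOAD|T5_WIRE_DIRECT_LATENCY) ;;
        *) echo "unknown configuration: $conf" >&2; return 1 ;;
    esac
    make CONF="$conf" && make CONF="$conf" install
}
```

For example, build_chelsio LOW_LATENCY runs make CONF=LOW_LATENCY and then make CONF=LOW_LATENCY install.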


Part 2 – Installing Individual Drivers

i. To build and install the iWARP driver against an outbox OFED:
[root@host]# make iwarp

[root@host]# make iwarp_install


Part 3a – Loading iWARP Drivers

Manually Load Drivers

To load the iWARP driver, we need to load the NIC driver and the core RDMA drivers first:

[root@host]# modprobe cxgb4

[root@host]# modprobe iw_cxgb4

[root@host]# modprobe rdma_ucm

Part 3b – Loading iWARP Drivers Automatically

To load the Chelsio iWARP drivers automatically, add the following lines to /etc/modprobe.conf:

options iw_cxgb4 peer2peer=1
install cxgb4 /sbin/modprobe -i cxgb4; /sbin/modprobe -f iw_cxgb4; /sbin/modprobe rdma_ucm
alias eth1 cxgb4 # assuming eth1 is used by the Chelsio interface

Finally, reboot the system to load the new modules.
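Once the system is back up, you can confirm that all three modules from Part 3a were loaded. The helper below is a minimal sketch that looks for each module name at the start of a line in /proc/modules (the same data lsmod shows); the function name and its optional file argument are my own illustration, not part of the Chelsio package.

```shell
#!/bin/sh
# Sketch: check that the iWARP modules are loaded. The optional argument
# (a hypothetical override) points at a saved copy of /proc/modules.
check_iwarp_modules() {
    modfile="${1:-/proc/modules}"
    missing=0
    for m in cxgb4 iw_cxgb4 rdma_ucm; do
        # each /proc/modules line starts with "<name> <size> ..."
        if grep -q "^$m " "$modfile"; then
            echo "loaded:  $m"
        else
            echo "MISSING: $m"
            missing=1
        fi
    done
    return $missing
}
```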

References:

  1. Chelsio 2.8.0.0 ReadMe

Compiling R-3.0.2 on CentOS 5

Note that R-3.0.2 requires gfortran 4.4.4 or later. To make sure gfortran44 is available, check that the GNU 4.4 compilers are installed. For more information on how to install them, see Installing GNU 4.4 of C, C++ and gfortran for CentOS 5.
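Before running configure, it can save time to confirm that the three compilers named on the configure line below (gcc44, g++44 and gfortran44) are actually on PATH. A minimal sketch, with check_gnu44 as a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: confirm the GNU 4.4 compilers used by R's configure line are on
# PATH. check_gnu44 is a hypothetical helper name.
check_gnu44() {
    status=0
    for c in gcc44 g++44 gfortran44; do
        if command -v "$c" >/dev/null 2>&1; then
            echo "found:   $c"
        else
            echo "missing: $c" >&2
            status=1
        fi
    done
    return $status
}
```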

Compiling R-3.0.2

# tar -zxvf R-3.0.2.tar.gz
# cd R-3.0.2
# ./configure --prefix=/usr/local/R-3.0.2/ CC=gcc44 CXX=g++44 F77=gfortran44 FC=gfortran44
# make
# make install