Sample PBS Scripts for MATLAB

Here is a sample PBS script that can be used for MATLAB. This is just a suggested PBS script; modify and comment at will. The script below is named matlab_serial.sh

#!/bin/bash
#PBS -N MATLAB_Serial
#PBS -j oe
#PBS -V
#PBS -m bea
#PBS -M myemail@hotmail.com
#PBS -l nodes=1:ppn=1

# comment these out if you wish
echo "qsub host = " $PBS_O_HOST
echo "original queue = " $PBS_O_QUEUE
echo "qsub working directory absolute = " $PBS_O_WORKDIR
echo "pbs environment = " $PBS_ENVIRONMENT
echo "pbs batch = " $PBS_JOBID
echo "pbs job name from me = " $PBS_JOBNAME
echo "Name of file containing nodes = " $PBS_NODEFILE
echo "contents of nodefile = " $(cat $PBS_NODEFILE)
echo "Name of queue to which job went = " $PBS_QUEUE

## pre-processing script
cd $PBS_O_WORKDIR
NCPUS=`cat $PBS_NODEFILE | wc -l`
echo "Number of requested processors = " $NCPUS

# Load MATLAB Module
module load intel/12.0.2
module load matlab/R2011b

cd $PBS_O_WORKDIR
/usr/local/MATLAB/R2011b/bin/matlab -nodisplay -r "${file%.m}"   # -r expects the script name without the .m extension

The corresponding qsub command and its parameters should look something like this:

$ qsub -q dqueue -l nodes=1:ppn=1 -v file=yourmatlabfile.m matlab_serial.sh
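Since MATLAB's -r option takes a script name rather than a file name, it can be convenient to derive that name from the submitted file before building the qsub command. A minimal sketch (the strip_m helper is my own, not part of PBS or MATLAB):

```shell
#!/bin/bash
# strip_m: derive the MATLAB -r argument from a file name by removing a
# trailing .m suffix, since -r expects a command name rather than a file name.
strip_m() {
    echo "${1%.m}"
}

strip_m yourmatlabfile.m   # prints: yourmatlabfile
```

A name without a .m suffix passes through unchanged, so the helper is safe to apply unconditionally.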

Configuring the Torque Default Queue

Here is a sample Torque queue configuration:

qmgr -c "create queue dqueue"
qmgr -c "set queue dqueue queue_type = Execution"
qmgr -c "set queue dqueue resources_default.neednodes = dqueue"
qmgr -c "set queue dqueue enabled = True"
qmgr -c "set queue dqueue started = True"

qmgr -c "set server scheduling = True"
qmgr -c "set server acl_hosts = headnode.com"
qmgr -c "set server default_queue = dqueue"
qmgr -c "set server log_events = 127"
qmgr -c "set server mail_from = Cluster_Admin"
qmgr -c "set server query_other_jobs = True"
qmgr -c "set server resources_default.walltime = 240:00:00"
qmgr -c "set server resources_max.walltime = 720:00:00"
qmgr -c "set server scheduler_iteration = 60"
qmgr -c "set server node_check_rate = 150"
qmgr -c "set server tcp_timeout = 6"
qmgr -c "set server node_pack = False"
qmgr -c "set server mom_job_sync = True"
qmgr -c "set server keep_completed = 300"
qmgr -c "set server submit_hosts = headnode1.com"
qmgr -c "set server submit_hosts += headnode2.com"
qmgr -c "set server allow_node_submit = True"
qmgr -c "set server auto_node_np = True"
qmgr -c "set server next_job_number = 21293"
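The per-queue settings above follow a fixed pattern, so they can be generated from a queue name and piped into qmgr after the queue is created. A sketch of that idea (the gen_queue_config function is my own):

```shell
#!/bin/bash
# gen_queue_config: emit the "set queue" commands for a new execution queue,
# following the dqueue configuration shown above.
gen_queue_config() {
    local q="$1"
    echo "set queue $q queue_type = Execution"
    echo "set queue $q resources_default.neednodes = $q"
    echo "set queue $q enabled = True"
    echo "set queue $q started = True"
}

gen_queue_config dqueue
```

On the head node this could be used as `gen_queue_config dqueue | qmgr` after running `qmgr -c "create queue dqueue"`.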

Quick method for estimating walltime for Torque Resource Manager

For Torque / OpenPBS or any other scheduler, walltime is an important parameter that lets the scheduler determine how long a job will take. You can get a quick rough estimate by using the time command:

# time -p mpirun -np 16 --host node1,node2 hello_world_mpi
real 4.31
user 0.04
sys 0.01

Use the measured real time of 4.31 seconds as the basis for the estimated walltime. Since this is only a rough estimate, you may want to specify a higher value for the walltime, for example 5 minutes:

$ qsub -l walltime=5:00 -l nodes=1:ppn=8 -v file=hello_world openmpi.sh
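Note that `time -p` reports seconds while `-l walltime` takes HH:MM:SS, so a small conversion with a safety margin is handy. A sketch (the to_walltime helper and the 2x margin are my own choices, not part of Torque):

```shell
#!/bin/bash
# to_walltime: turn a measured run time in whole seconds (as reported by
# `time -p`, rounded up) into a Torque HH:MM:SS walltime string, doubling
# the measurement and rounding up to the next minute as a safety margin.
to_walltime() {
    local secs=$1
    local padded=$(( (secs * 2 + 59) / 60 * 60 ))
    printf '%02d:%02d:%02d\n' $((padded / 3600)) $((padded % 3600 / 60)) $((padded % 60))
}

to_walltime 5   # the 4.31 s run above, rounded up to 5 s -> prints: 00:01:00
```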

OFED Performance Micro-Benchmark Latency Test

Open Fabrics Enterprise Distribution (OFED) provides a collection of simple performance micro-benchmarks written over uverbs. Some notes taken from the OFED Performance Tests README:

  1. The benchmark uses the CPU cycle counter to get time stamps without a context switch.
  2. The benchmark measures round-trip time but reports half of that as one-way latency. This means that it may not be sufficiently accurate for asymmetrical configurations.
  3. Min/Median/Max results are reported.
    The Median (vs average) is less sensitive to extreme scores.
    Typically, the Max value is the first value measured.
  4. Larger samples only help marginally. The default (1000) is very satisfactory. Note that an array of cycles_t (typically an unsigned long) is allocated once to collect samples and again to store the difference between them. Really big sample sizes (e.g., 1 million) might expose other problems with the program.

On the Server Side

# ib_write_lat -a

On the Client Side

# ib_write_lat -a Server_IP_address
------------------------------------------------------------------
                    RDMA_Write Latency Test
 Number of qps   : 1
 Connection type : RC
 Mtu             : 2048B
 Link type       : IB
 Max inline data : 400B
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
------------------------------------------------------------------
 local address: LID 0x01 QPN 0x02ce PSN 0x1bd93e RKey 0x014a00 VAddr 0x002b7004651000
 remote address: LID 0x03 QPN 0x00f2 PSN 0x20aec7 RKey 0x010100 VAddr 0x002aeedfbde000
------------------------------------------------------------------

#bytes #iterations    t_min[usec]    t_max[usec]  t_typical[usec]
2       1000          0.92           5.19         1.24
4       1000          0.92           65.20        1.24
8       1000          0.90           72.28        1.23
16      1000          0.92           19.56        1.25
32      1000          0.94           17.74        1.26
64      1000          0.94           26.40        1.20
128     1000          1.05           53.24        1.36
256     1000          1.70           21.07        1.83
512     1000          2.13           11.61        2.22
1024    1000          2.44           8.72         2.52
2048    1000          2.79           48.23        3.09
4096    1000          3.49           52.59        3.63
8192    1000          4.58           64.90        4.69
16384   1000          6.63           42.26        6.76
32768   1000          10.80          31.11        10.91
65536   1000          19.14          35.82        19.23
131072  1000          35.56          62.17        35.84
262144  1000          68.95          80.15        69.10
524288  1000          135.34         195.46       135.62
1048576 1000          268.37         354.36       268.64
2097152 1000          534.34         632.83       534.67
4194304 1000          1066.41        1150.52      1066.71
8388608 1000          2130.80        2504.32      2131.39

Here are the common options you can use.

Common Options to all tests:
-p, --port=<port>            listen on/connect to port <port> (default: 18515)
-m, --mtu=<mtu>              mtu size (default: 1024)
-d, --ib-dev=<dev>           use IB device <dev> (default: first device found)
-i, --ib-port=<port>         use port <port> of IB device (default: 1)
-s, --size=<size>            size of message to exchange (default: 1)
-a, --all                    run sizes from 2 till 2^23
-t, --tx-depth=<dep>         size of tx queue (default: 50)
-n, --iters=<iters>          number of exchanges (at least 100, default: 1000)
-C, --report-cycles          report times in cpu cycle units (default: microseconds)
-H, --report-histogram       print out all results (default: print summary only)
-U, --report-unsorted        (implies -H) print out unsorted results (default: sorted)
-V, --version                display version number

Multiprotocol Performance Test of VMware ESX 3.5 on NetApp Storage Systems

NetApp has written a technical paper, “Performance Report: Multiprotocol Performance Test of VMware® ESX 3.5 on NetApp Storage Systems”, on performance tests using FCP, iSCSI, and NFS on VMware ESX 3.5. Do read the article for the full details; I have listed only the summary.

Fibre Channel Protocol Summary

  1. FC achieved up to 9% higher throughput than the other protocols while requiring noticeably lower CPU utilization on the ESX 3.5 host compared to NFS and iSCSI.
  2. FC storage infrastructures are generally the most costly of all the protocols to install and maintain. FC infrastructure requires expensive Fibre Channel switches and Fibre Channel cabling in order to be deployed.

iSCSI Protocol Summary

  1. Using the VMware iSCSI software initiator, we observed performance was at most 7% lower than FC.
  2. Software iSCSI also exhibited the highest maximum ESX 3.5 host CPU utilization of all the protocols tested.
  3. iSCSI is relatively inexpensive to deploy and maintain, as it runs on a standard TCP/IP network.

NFS Protocol Summary

  1. NFS performance was at most 9% lower than FC. NFS also exhibited maximum ESX 3.5 host CPU utilization that was on average higher than FC but lower than iSCSI.
  2. Running on a standard TCP/IP network, NFS does not require the expensive Fibre Channel switches, host bus adapters, and Fibre Channel cabling that FC requires, making NFS a lower cost alternative of the two protocols.
  3. NFS provides further storage efficiencies by allowing on-demand resizing of data stores and increasing storage saving efficiencies gained when using deduplication. Both of these advantages provide additional operational savings as a result of this storage simplification.

Switching between Ethernet and Infiniband using Virtual Protocol Interconnect (VPI)

The notes below are taken from the Open Fabrics Alliance documentation on the Open Fabrics Enterprise Distribution (OFED) ConnectX driver (mlx4) in the OFED 1.4 Release Notes.

It is recommended to use the QSA Adapter (QSFP+ to SFP+ adapter), which is billed as the world’s first solution for the QSFP to SFP+ conversion challenge for 40Gb InfiniBand to 10G/1G Ethernet. For more information, see the Quad to Serial Small Form Factor Pluggable (QSA) Adapter documentation.


Here is the summary of the excerpts from the document.

Overview
mlx4 is the low level driver implementation for the ConnectX adapters designed by Mellanox Technologies. The ConnectX can operate as an InfiniBand adapter, as an Ethernet NIC, or as a Fibre Channel HBA. The driver in OFED 1.4 supports Infiniband and Ethernet NIC configurations. To accommodate the supported configurations, the driver is split into three modules:

  1. mlx4_core
    Handles low-level functions like device initialization and firmware commands processing. Also controls resource allocation so that the InfiniBand and Ethernet functions can share the device without interfering with each other.
  2. mlx4_ib
    Handles InfiniBand-specific functions and plugs into the InfiniBand midlayer
  3. mlx4_en
    A new 10G driver named mlx4_en was added to drivers/net/mlx4. It handles Ethernet specific functions and plugs into the netdev mid-layer.

Using Virtual Protocol Interconnect (VPI) to switch between Ethernet and Infiniband

Loading Drivers

    1. The VPI driver is a combination of the Mellanox ConnectX HCA Ethernet and Infiniband drivers. It supplies the user with the ability to run Infiniband and Ethernet protocols on the same HCA.
    2. Check that the MLX4_EN Ethernet driver is configured to load:
      # vim /etc/infiniband/openib.conf
      # Load MLX4_EN module
      MLX4_EN_LOAD=yes
    3. If MLX4_EN_LOAD=no is set, the Ethernet driver can be loaded manually by running
      # /sbin/modprobe mlx4_en

Port Management / Driver Switching

  1. Show Port Configuration
    # /sbin/connectx_port_config -s
    --------------------------------
    Port configuration for PCI device: 0000:16:00.0 is:
    eth
    eth
    --------------------------------
  2. Looking at saved configuration
    # vim /etc/infiniband/connectx.conf
  3. Switching between Ethernet and Infiniband
    # /sbin/connectx_port_config
  4. Configuration supported by VPI
    - The following configurations are supported by VPI:
    	Port1 = eth   Port2 = eth
    	Port1 = ib    Port2 = ib
    	Port1 = auto  Port2 = auto
    	Port1 = ib    Port2 = eth
    	Port1 = ib    Port2 = auto
    	Port1 = auto  Port2 = eth
    
      Note: the following options are not supported:
    	Port1 = eth   Port2 = ib
    	Port1 = eth   Port2 = auto
    	Port1 = auto  Port2 = ib
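The support matrix above can be captured in a small check. This sketch (the vpi_supported function is my own) rejects exactly the three unsupported combinations listed:

```shell
#!/bin/bash
# vpi_supported: report whether a requested Port1/Port2 combination is
# supported by VPI, following the support matrix above (eth+ib, eth+auto
# and auto+ib are the unsupported cases).
vpi_supported() {
    case "$1,$2" in
        eth,ib|eth,auto|auto,ib) echo "not supported" ;;
        *)                       echo "supported" ;;
    esac
}

vpi_supported ib eth    # prints: supported
vpi_supported eth ib    # prints: not supported
```

Such a check could be run before calling connectx_port_config to avoid an invalid reconfiguration attempt.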

For more information, see

  1. ConnectX-3 VPI Single and Dual QSFP+ Port Adapter Card User Manual (pdf)
  2. Open Fabrics Enterprise Distribution (OFED) ConnectX driver (mlx4) in OFED 1.4 Release Notes

Compiling and Installing GAP System for Computational Discrete Algebra

GAP is a system for computational discrete algebra, with particular emphasis on Computational Group Theory. GAP provides a programming language, a library of thousands of functions implementing algebraic algorithms written in the GAP language, as well as large data libraries of algebraic objects.

The information below is taken from the GAP compilation instructions.

Step 1: Download the GAP Software

Download the GAP software at http://www.gap-system.org/Releases/index.html. The current version at the time of writing is 4.5.6.

# tar -zxvf gap4r5p6_2012_11_04-18_46.tar.gz
# cd gap4r5p6

Step 2:  Configure and Install (Default Installation)

#./configure
# make

Step 3: Optional Installation – GMP package.
If you use the GAP-internal GMP package, the version of GMP bundled with this GAP release will be used. This is the default.

# ./configure --with-gmp=yes|no|system|"path"

Step 4: Optional Installation – Readline for better command-line editing.
If the argument you supply is yes, GAP will look in standard locations for a Readline installation on your system. Alternatively, you can specify a path to a Readline installation.

# ./configure --with-readline=yes|no|"path"

For more information, see the INSTALL file in the unpacked GAP directory.

Installing and Configuring Environment Modules on CentOS 6

This tutorial is very similar to “Installing and Configuring Environment Modules on CentOS 5”, and the steps for CentOS 6 are largely the same, except that the tcl/tk 8.5.x in the CentOS 6 repository does not include tclConfig.sh, which is needed when you compile the Modules package. I used 8.4.x, which is similar to the version in the CentOS 5 repository. You can use a more updated version of tcl.

Step 1: Download the modules packages

Download the latest modules packages from the Modules Sourceforge project site.

Step 2. Download the tcl/tk package from tcl/tk download site

# tar -zxvf tcl8.4.19-src.tar.gz
# cd tcl8.4.19/unix

Step 2a. Compile the tcl package

# ./configure --prefix=/usr/local/tcl --enable-threads
# make && make install

Step 2b. Compile the tk package

# tar -zxvf tk8.4.19-src.tar.gz
# cd tk8.4.19
# ./configure --prefix=/usr/local/tk --with-tcl=/usr/local/tcl/lib
# make && make install

Make sure you put the tcl and tk library paths in /etc/ld.so.conf.d

# vim /etc/ld.so.conf.d/tclx-x86_64.conf
/usr/local/tcl/lib
/usr/local/tk/lib

Run ldconfig to update the dynamic linker run-time bindings

# /sbin/ldconfig

Step 3: Unpack, Configure and Install

# tar -zxvf modules-3.2.9c.tar.gz

Go to the Modules folder ($ModuleHome)

# cd modules-3.2.9

I wanted to keep all my individual module files at /usr/local/Modules/contents. You can keep module files anywhere you wish.

# ./configure --with-module-path=/usr/local/Modules/contents

Make and install the configuration

# make && make install

Step 4: Amend .modulespath

Edit .modulespath to let Modules know where all the customized module files will be kept

# vim /usr/local/Modules/3.2.9/init/.modulespath

Comment out all the lines except the directory where all the customised module files will be kept.

.....
.....
/usr/local/Modules/contents                             # General module files
.....
.....

Step 5: Update /etc/profile.d of the Servers

Copy the profile.modules from the $ModuleHome Directory

# cp $ModuleHome/etc/global/profile.modules /etc/profile.d/modules.sh

The contents of modules.sh are as follows

#----------------------------------------------------------------------#
# system-wide profile.modules                                          #
# Initialize modules for all sh-derivative shells                      #
#----------------------------------------------------------------------#
trap "" 1 2 3

case "$0" in
-bash|bash|*/bash) . /usr/local/Modules/default/init/bash ;;
-ksh|ksh|*/ksh) . /usr/local/Modules/default/init/ksh ;;
-zsh|zsh|*/zsh) . /usr/local/Modules/default/init/zsh ;;
*) . /usr/local/Modules/default/init/sh ;; # sh and default for scripts
esac

trap 1 2 3

Create a softlink at /usr/local/Modules

# cd /usr/local/Modules
# ln -s 3.2.9 default
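The case statement in profile.modules dispatches on the invoking shell's name in $0, which may appear as a login shell (e.g. -bash) or as a full path (e.g. /bin/zsh). That mapping can be sketched as a standalone function, using the /usr/local/Modules/default softlink created above:

```shell
#!/bin/bash
# modules_init_file: map the shell name found in $0 to the matching Modules
# init file, mirroring the case statement in profile.modules above.
modules_init_file() {
    case "$1" in
        -bash|bash|*/bash) echo /usr/local/Modules/default/init/bash ;;
        -ksh|ksh|*/ksh)    echo /usr/local/Modules/default/init/ksh ;;
        -zsh|zsh|*/zsh)    echo /usr/local/Modules/default/init/zsh ;;
        *)                 echo /usr/local/Modules/default/init/sh ;;
    esac
}

modules_init_file -bash       # prints: /usr/local/Modules/default/init/bash
modules_init_file /bin/zsh    # prints: /usr/local/Modules/default/init/zsh
```

Any shell that is not bash, ksh or zsh falls through to the generic sh init file, which is also what plain scripts get.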

Sample Installation of an application using Modules (Intel Compilers)

Step 1: Create a Module File.

Place the Module File for Intel in /usr/local/Modules/contents

a. Create an Intel Folder inside /usr/local/Modules/contents

# mkdir /usr/local/Modules/contents/intel

b. Create a module file for the version of Intel (in my case, “12.0.2″). To save some time, you can copy a “sample” file and edit it

# cp $ModuleHome/modulefiles/modulefile /usr/local/Modules/contents/intel/12.0.2
# vim /usr/local/Modules/contents/intel/12.0.2
#%Module1.0
proc ModulesHelp { } {
global version prefix

puts stderr "\tIntel XE 12.0.2 (icc, icpc, ifort)"
}

module-whatis   "Intel XE 12.0.2 (icc, icpc, ifort)"

prepend-path    PATH            /opt/intel/composerxe/bin
prepend-path    LIBRARY_PATH    /opt/intel/composerxe/lib/intel64
prepend-path    LD_LIBRARY_PATH /opt/intel/composerxe/lib/intel64:/opt/intel/mkl/10.2.6.038/lib/em64t
prepend-path    MANPATH         /opt/intel/composerxe/man
prepend-path    MKL_HOME        /opt/intel/mkl/10.2.6.038

setenv CC       icc
setenv CXX      icpc
setenv FC       ifort
setenv F77      ifort
setenv F90      ifort

Step 2: Setting the Default versions of Intel.

If you have different versions of a software package you wish to present to the users, do the following

# vim /usr/local/Modules/contents/intel/.version
#%Module1.0
set ModuleVersion "12.0.2"
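As a second example, here is a hypothetical module file for the MATLAB installation referenced in the PBS script earlier; the path is assumed from that script and should be adjusted to your installation. It would live at /usr/local/Modules/contents/matlab/R2011b:

```tcl
#%Module1.0
proc ModulesHelp { } {
    puts stderr "\tMATLAB R2011b"
}

module-whatis   "MATLAB R2011b"

prepend-path    PATH    /usr/local/MATLAB/R2011b/bin
```

With this file in place, `module load matlab/R2011b` (as used in the PBS script at the top of this document) puts the MATLAB binary on the PATH.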

More Information

  1. Part 2 – Usage of Environment Modules on CentOS and in Cluster

Usage of Environment Modules on CentOS and in Cluster

This is the 2nd part of “Installing and Configuring Environment Modules on CentOS 5”.

1. List the Modules on System

# module avail
--------------------------- /usr/local/Modules/contents ----------------------------
R/R-2.15.1            intel/12.0.2(default) matlab/R2011b

2. Load the Modules on System

# module load intel/12.0.2

Checking the version

# icc -v
Version 12.0.2

3. Unload the Modules on System

# module unload intel/12.0.2