Enabling SR-IOV on the Intel Ethernet Server Adapter

First things first

Check that your Intel Ethernet Server Adapter supports SR-IOV. For more information, take a look at “Using SR-IOV with Intel® Ethernet Server Adapters”.

In a nutshell, you blacklist the VF driver on the host and assign the VFs to the KVM guests.

Step 1: Add the following line to /etc/modprobe.conf:

options ixgbe max_vfs=8

The above configuration creates 8 virtual NICs (VFs) per port. The Intel card supports up to 64 VFs.

Step 2: Blacklist the ixgbevf driver by creating a file called /etc/modprobe.d/blacklist-ixgbevf.conf

blacklist ixgbevf

Step 3: Reboot the machine
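After the reboot, each port should expose max_vfs Virtual Functions, which show up as separate PCI devices in lspci, and ixgbevf should not be loaded on the host. The sketch below assumes the VFs carry the string “Virtual Function” in their lspci description (as ixgbe VFs usually do); count_vfs and ixgbevf_loaded are hypothetical helper names, not part of any Intel tool.

```shell
#!/usr/bin/env bash
# count_vfs (hypothetical helper): count SR-IOV Virtual Function entries
# in lspci output read from standard input.
# Assumption: VF devices contain "Virtual Function" in their description.
count_vfs() {
    # grep -c prints the match count (0 if none); "|| true" keeps the exit
    # status at 0 when nothing matches, so callers can compare the number.
    grep -c 'Virtual Function' || true
}

# ixgbevf_loaded (hypothetical helper): succeed if the ixgbevf module is
# listed in lsmod output read from standard input. On the host it should
# NOT be, because it was blacklisted in Step 2.
ixgbevf_loaded() {
    awk '$1 == "ixgbevf" { found = 1 } END { exit !found }'
}
```

For example, on a dual-port card with max_vfs=8 you would expect `lspci | count_vfs` to print 16, and `lsmod | ixgbevf_loaded` to fail, which confirms the blacklist is in effect.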

Debugging Tools to track run-time errors for mpirun

If you are having unexplained issues with mpirun, there are various methods you can use to troubleshoot them.

Information on “--mca orte_base_help_aggregate 0”

If your mpirun dies without any error messages, you may want to read the following entry from the Open MPI FAQ, “Debugging applications in parallel, 7. My process dies without any output. Why?”:

If your application fails due to memory corruption, Open MPI may subsequently fail to output an error message before dying. Specifically, starting with v1.3, Open MPI attempts to aggregate error messages from multiple processes in an attempt to show unique error messages only once (vs. one for each MPI process, which can be unwieldy, especially when running large MPI jobs).

However, this aggregation process requires allocating memory in the MPI process when it displays the error message. If the process’ memory is already corrupted, Open MPI’s attempt to allocate memory may fail and the process will simply die, possibly silently. When Open MPI does not attempt to aggregate error messages, most of its setup work is done during MPI_INIT and no memory is allocated during the “print the error” routine. It therefore almost always successfully outputs error messages in real time, but at the expense that you’ll potentially see the same error message for each MPI process that encountered the error.

Hence, the error message aggregation is usually a good thing, but sometimes it can mask a real error. You can disable Open MPI’s error message aggregation with the orte_base_help_aggregate MCA parameter. For example:

 $ mpirun --mca orte_base_help_aggregate 0 ...
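If you would rather not edit every mpirun command line (for example in batch scripts), Open MPI also reads MCA parameters from environment variables named OMPI_MCA_<parameter>. A minimal wrapper sketch, assuming the parameter name quoted from the FAQ above (it applies to the Open MPI releases that FAQ covers); mpirun_noagg and the MPIRUN override variable are hypothetical names for illustration.

```shell
#!/usr/bin/env bash
# mpirun_noagg (hypothetical wrapper): run mpirun with error-message
# aggregation disabled, so every process prints its errors in real time.
# Setting OMPI_MCA_orte_base_help_aggregate=0 in the environment is
# equivalent to passing "--mca orte_base_help_aggregate 0" to mpirun.
# MPIRUN (hypothetical) lets you point at a specific mpirun binary.
mpirun_noagg() {
    OMPI_MCA_orte_base_help_aggregate=0 "${MPIRUN:-mpirun}" "$@"
}
```

Usage: `mpirun_noagg -np 4 ./a.out`. The environment-variable form keeps job scripts unchanged; the explicit --mca flag shown above remains the clearer choice for one-off runs.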

Installing Java 7 on CentOS 5 and 6

Step 1: Go to the Oracle Java download site and download the JDK 7 tarball for Linux x64 (this guide uses jdk-7u51-linux-x64.tar.gz).

Step 2: Unpack the Archive

# cd /usr/local/
# tar -zxvf jdk-7u51-linux-x64.tar.gz

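Step 3: Set up the environment variables. Add the following to your .bashrc (the JDK directories are placed first in PATH so that they take precedence over any java already installed in /usr/bin):

export JAVA_HOME=/usr/local/jdk1.7.0_51
export JRE_HOME=/usr/local/jdk1.7.0_51/jre
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH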

Step 4: Check the version

# java -version
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
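Because CentOS may already provide another java in /usr/bin, it can be worth confirming in scripts that the JDK you unpacked is the one actually being picked up. A small sketch, assuming the version banner has the “java version "…"” format shown above (OpenJDK prints “openjdk version” instead and would not match); parse_java_version and check_java are hypothetical helpers.

```shell
#!/usr/bin/env bash
# parse_java_version (hypothetical helper): extract the version string
# (e.g. 1.7.0_51) from `java -version` output read on standard input.
# Assumes the Oracle banner format: java version "1.7.0_51"
parse_java_version() {
    sed -n 's/^java version "\(.*\)"$/\1/p' | head -n 1
}

# check_java (hypothetical helper): succeed only if the first java on PATH
# lives under $JAVA_HOME and reports the expected version.
check_java() {
    local expected=$1 actual
    [ "$(command -v java)" = "$JAVA_HOME/bin/java" ] || return 1
    # java -version writes its banner to stderr, hence 2>&1.
    actual=$(java -version 2>&1 | parse_java_version)
    [ "$actual" = "$expected" ]
}
```

Usage: `check_java 1.7.0_51 && echo "JDK 7u51 is active"`.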

Compiling gnuplot 4.6.4 on CentOS 5

I compiled gnuplot 4.6.4 on CentOS 5.4

Step 1: Install the prerequisites:

(a) wxGTK and wxGTK-devel

(b) readline and readline-devel

(c) gd and gd-devel

# yum install wxGTK wxGTK-devel readline readline-devel gd gd-devel

Step 2: Compile gnuplot-4.6.4

# tar -zxvf gnuplot-4.6.4.tar.gz
# cd gnuplot-4.6.4
# ./configure --prefix=/usr/local/gnuplot-4.6.4
# make
# make install
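Because gnuplot is installed under its own prefix (/usr/local/gnuplot-4.6.4), its bin directory is not on the default PATH. One way to add it from your .bashrc, sketched here as a hypothetical path_prepend helper, is to prepend the directory only if it is not already present, so re-sourcing .bashrc does not keep growing PATH.

```shell
#!/usr/bin/env bash
# path_prepend (hypothetical helper): put a directory at the front of PATH
# unless it is already listed, so repeated calls are harmless.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                      # already on PATH, nothing to do
        *) PATH="$1${PATH:+:$PATH}" ;;    # prepend (no stray ":" if PATH is empty)
    esac
    export PATH
}
```

Usage: `path_prepend /usr/local/gnuplot-4.6.4/bin`, then run `gnuplot --version` to check that the new build is found first.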

If you run into issues, take a look at the Fedora Forum thread listed below.

References:

  1. http://forums.fedoraforum.org/showthread.php?p=1397790
  2. gnuplot homepage

Using “ls” options and arguments

ls has some useful formatting options. Here are some arguments you can use. For example, I like:

$ ls -ltF
 -a  --all       List all files, including hidden files whose names begin with “.”
 -l              Display results in long format
 -r  --reverse   Reverse the sort order
 -S              Sort results by file size (largest first)
 -t              Sort by modification time (newest first)
 -F  --classify  Append an indicator character (such as / or *) to the end of each listed name
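If you use this format often, a small shell function can wrap it. The sketch below (recent is a hypothetical name) shows the N most recently modified entries of a directory with ls -ltF, skipping the “total” line that ls -l prints first.

```shell
#!/usr/bin/env bash
# recent (hypothetical helper): list the N most recently modified entries
# of a directory in long format, newest first, with type indicators (-F).
# Usage: recent [DIR] [N]   (defaults: current directory, 10 entries)
recent() {
    local dir=${1:-.} n=${2:-10}
    # ls -l starts with a "total" line; drop it before taking the top N.
    ls -ltF -- "$dir" | sed '1{/^total /d;}' | head -n "$n"
}
```

Usage: `recent /var/log 5`. Swapping -t for -S in the function would list the largest files instead.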

References:

  1. The Linux Command Line (by No Starch Press)

Deleting PBS and MAUI Jobs which cannot be purged

If the pbs_mom on a compute node is lost and cannot be recovered (due to a hardware or network failure), a running job may remain stuck in the qstat or showq output and cannot be purged normally. To remove it:

1. Shut down the pbs_server daemon on the PBS server

# service pbs_server stop

2. Remove the job spool files that hold the hung job ID (for example, 4444)

# rm /var/spool/torque/server_priv/jobs/4444.headnode.SC
# rm /var/spool/torque/server_priv/jobs/4444.headnode.JB

3. Start the pbs_server daemon

# service pbs_server start

4. Restart the MAUI Daemon

# service maui restart
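The file removal in step 2 can be sketched as a small function that deletes every spool file for one job ID, instead of typing each suffix by hand. This assumes the default Torque spool path shown above and that pbs_server has already been stopped (it refuses to run if pgrep finds pbs_server); purge_job_files is a hypothetical name.

```shell
#!/usr/bin/env bash
# purge_job_files (hypothetical helper): remove all spool files for one
# job ID from the pbs_server jobs directory.
# Usage: purge_job_files JOBID [DIR]
purge_job_files() {
    local jobid=$1 dir=${2:-/var/spool/torque/server_priv/jobs}
    local f found=0
    # Editing the spool while pbs_server is running is unsafe.
    if pgrep -x pbs_server >/dev/null 2>&1; then
        echo "pbs_server is still running; stop it first" >&2
        return 2
    fi
    # "$jobid".* matches 4444.headnode.SC but not 44445.headnode.SC.
    for f in "$dir/$jobid".*; do
        [ -e "$f" ] || continue        # an unmatched glob stays literal; skip it
        echo "removing $f"
        rm -f -- "$f"
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "no spool files found for job $jobid in $dir" >&2
        return 1
    fi
}
```

Used in sequence: `service pbs_server stop`, `purge_job_files 4444`, `service pbs_server start`, `service maui restart`.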

References:

  1. Deleting PBS/Maui Jobs

Compiling Chelsio iWARP Drivers (2.8.0.0) on CentOS 5

The OS I used was CentOS 5.8. The following is a subset of the Chelsio 2.8.0.0 ReadMe:

The Chelsio Unified Wire software has been developed to run on 64-bit Linux-based platforms. Following is the list of drivers/software and supported Linux distributions.

|########################|#####################################################|
|   Linux Distribution   |                Driver/Software                      |
|########################|#####################################################|
|RHEL5.8,2.6.18-308.el5  |NIC/TOE,vNIC,iWARP,WD-UDP*,WD-TOE*,iSCSI Target*,    |
|                        |Bonding,IPv6,Bypass*,Sniffer & Tracer                |
|                        |UM(Agent,Client),UDP-SO,Filtering,TM                 |
|------------------------|-----------------------------------------------------|
|RHEL5.9,2.6.18-348.el5  |NIC/TOE*,vNIC*,iWARP*,WD-UDP*,WD-TOE*,iSCSI Target*, |
|                        |Bonding*,IPv6*,Bypass*,Sniffer & Tracer*,UDP-SO*,    |
|                        |Filtering*,TM*                                       |
|------------------------|-----------------------------------------------------|
|RHEL6.3,                |NIC/TOE,vNIC,iWARP,WD-UDP,WD-TOE*,iSCSI Target*,     |
|2.6.32-279.el6          |iSCSI Initiator*,FCoE Initiator*,                    |
|                        |Bonding,IPv6,Bypass*,Sniffer & Tracer,UDP-SO,        |
|                        |UM(Agent,Client,WebGUI),Filtering,TM                 |
|------------------------|-----------------------------------------------------|
|RHEL6.4,                |NIC/TOE,vNIC,iWARP,WD-UDP,WD-TOE,iSCSI Target,       |
|2.6.32-358.el6          |iSCSI Initiator,FCoE Initiator,Bonding,IPv6,Bypass,  |
|                        |Sniffer & Tracer,UDP-SO,UM(Agent,Client,WebGUI),     |
|                        |Filtering,TM,uBoot(DUD)                              |
|------------------------|-----------------------------------------------------|

Strangely, I was not able to compile against OFED 3.5.1. It seems that the compat-rdma package in OFED 3.5.1 has issues with CentOS 5.8. See Failed to build compat-rdma RPM when compiling OFED 3.5.1 on CentOS 5.8

I tried OFED 1.5.4.1, but errors occurred as well. OFED 1.5.3.2, however, compiles cleanly, and the Chelsio T420-BCH drivers compiled nicely against it. To download OFED 1.5.3.2, visit the OFED Downloads Site.


Part 1

To compile from source

i.  Download the tarball ChelsioUwire-x.x.x.x.tar.gz

ii. Untar the tarball

[root@host]# tar zxvfm ChelsioUwire-x.x.x.x.tar.gz

iii. Change your current working directory to the Chelsio Unified Wire package directory and build the source:

[root@host]# make

iv. Install the drivers, tools and libraries:

[root@host]# make install

v. The default configuration tuning option is Unified Wire.

The configuration tuning can be selected using the following commands:

[root@host]# make CONF=<T5/T4 Configuration>

[root@host]# make CONF=<T5/T4 Configuration> install

where <T5/T4 Configuration> is one of:

UNIFIED_WIRE, HIGH_CAPACITY_TOE, HIGH_CAPACITY_RDMA, LOW_LATENCY, UDP_OFFLOAD, T5_WIRE_DIRECT_LATENCY
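The build-and-install pair above can be wrapped so that a mistyped configuration name fails early instead of being passed to make. The list of valid names is the one quoted from the README; chelsio_build and the MAKE override variable are hypothetical names for illustration.

```shell
#!/usr/bin/env bash
# chelsio_build (hypothetical wrapper): validate the tuning configuration
# name, then run "make CONF=<name>" followed by "make CONF=<name> install".
# Run it from the Chelsio Unified Wire package directory.
chelsio_build() {
    local conf=$1 c
    local valid="UNIFIED_WIRE HIGH_CAPACITY_TOE HIGH_CAPACITY_RDMA LOW_LATENCY UDP_OFFLOAD T5_WIRE_DIRECT_LATENCY"
    for c in $valid; do
        if [ "$c" = "$conf" ]; then
            # Install only if the build succeeded; return make's status.
            "${MAKE:-make}" CONF="$conf" && "${MAKE:-make}" CONF="$conf" install
            return
        fi
    done
    echo "unknown configuration: $conf (valid: $valid)" >&2
    return 2
}
```

Usage: `chelsio_build HIGH_CAPACITY_RDMA`.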


Part 2 – Installing Individual Drivers

i. To build and install the iWARP driver against outbox OFED:
[root@host]# make iwarp

[root@host]# make iwarp_install


Part 3a – Loading iWARP Drivers

Manually Load Drivers

To load the iWARP driver, we need to load the NIC driver and core RDMA drivers first:

[root@host]# modprobe cxgb4

[root@host]# modprobe iw_cxgb4

[root@host]# modprobe rdma_ucm
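To confirm that all three modules actually loaded, you can compare the lsmod output against the list. A sketch, assuming the standard lsmod layout (a header line, then one module per line with its name first); missing_modules is a hypothetical helper.

```shell
#!/usr/bin/env bash
# missing_modules (hypothetical helper): read lsmod output on standard
# input and print each named module that is NOT loaded.
# Returns 0 if all are loaded, 1 otherwise.
missing_modules() {
    local loaded m rc=0
    loaded=$(awk 'NR > 1 { print $1 }')   # module names, header skipped
    for m in "$@"; do
        if ! printf '%s\n' "$loaded" | grep -qxF -- "$m"; then
            echo "$m"
            rc=1
        fi
    done
    return "$rc"
}
```

Usage: `lsmod | missing_modules cxgb4 iw_cxgb4 rdma_ucm || echo "some iWARP modules are missing"`.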

Part 3b – Loading iWARP Drivers Automatically

To load the Chelsio iWARP drivers automatically, add these lines to /etc/modprobe.conf:

options iw_cxgb4 peer2peer=1
install cxgb4 /sbin/modprobe -i cxgb4; /sbin/modprobe -f iw_cxgb4; /sbin/modprobe rdma_ucm
alias eth1 cxgb4 # assuming eth1 is used by the Chelsio interface

Finally, reboot the system to load the new modules.
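When scripting this change across several nodes, it helps to append each line only if it is not already there, so rerunning the script does not duplicate entries. A minimal sketch; ensure_line is a hypothetical helper, and the target file is whatever your system uses (/etc/modprobe.conf on CentOS 5).

```shell
#!/usr/bin/env bash
# ensure_line (hypothetical helper): append LINE to FILE unless FILE
# already contains exactly that line. Creates FILE if it does not exist.
ensure_line() {
    local file=$1 line=$2
    touch -- "$file" || return 1
    # -x: whole-line match, -F: no regex interpretation of the line.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

Usage: `ensure_line /etc/modprobe.conf 'options iw_cxgb4 peer2peer=1'`, then the same for the install line and, with your own interface name, the alias line.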

References:

  1. Chelsio 2.8.0.0 ReadMe

Compiling R-3.0.2 on CentOS 5

Note that R-3.0.2 requires gfortran 4.4.4 or later. To make sure gfortran44 is available, check that the GNU 4.4 compilers (gcc44, gcc44-c++ and gcc44-gfortran) are installed. For more information on how to install them, see Installing GNU 4.4 of C, C++ and gfortran for CentOS 5

Compiling R-3.0.2

# tar -zxvf R-3.0.2.tar.gz
# cd R-3.0.2
# ./configure --prefix=/usr/local/R-3.0.2/ CC=gcc44 CXX=g++44 F77=gfortran44 FC=gfortran44
# make
# make install
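Before running configure, you can check that gfortran44 meets the 4.4.4 minimum. The sketch compares dotted version numbers field by field in plain bash, since the sort -V option may not be available in the older coreutils on CentOS 5; version_ge is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# version_ge (hypothetical helper): succeed if dotted numeric version $1
# is greater than or equal to version $2 (e.g. 4.4.7 >= 4.4.4).
# Missing fields count as 0, so 4.4 equals 4.4.0.
version_ge() {
    local IFS=.
    local -a a=($1) b=($2)          # split on "." into numeric fields
    local i x y
    for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
        x=${a[i]:-0}
        y=${b[i]:-0}
        # 10# forces base 10 so fields like "08" are not read as octal.
        ((10#$x > 10#$y)) && return 0
        ((10#$x < 10#$y)) && return 1
    done
    return 0                        # all fields equal
}
```

Usage: `version_ge "$(gfortran44 -dumpversion)" 4.4.4 || echo "gfortran44 is too old for R-3.0.2"`.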

GPFS NSD Nodes stuck in Arbitrating Mode

One of our GPFS NSD nodes was stuck indefinitely in arbitrating mode. One noticeable symptom was that users were able to log in but unable to run “ls” on their own directories. You can get a quick diagnosis by looking at one of the NSD nodes: for this kind of issue, run mmdiag --waiters first. There are few articles on this problem.

# mmdiag --waiters 

.....
.....
0x7FB0C0013D10 waiting 27176.264845756 seconds, SharedHashTabFetchHandlerThread: 
on ThCond 0x1C0000F9B78 (0x1C0000F9B78) (TokenCondvar), reason 'wait for SubToken to become stable'
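On a busy cluster the waiter list can be long, so it can help to keep only the waiters that have been blocked for a long time. A sketch, assuming each waiter line contains the word “waiting” followed by the wait time in seconds, as in the output above; long_waiters is a hypothetical helper.

```shell
#!/usr/bin/env bash
# long_waiters (hypothetical helper): read "mmdiag --waiters" (or
# "mmlsnode -N waiters -L") output on standard input and print the lines
# whose wait time, the number after the word "waiting", exceeds THRESHOLD
# seconds (default 60).
long_waiters() {
    local threshold=${1:-60}
    awk -v t="$threshold" '{
        for (i = 1; i < NF; i++)
            if ($i == "waiting" && $(i + 1) + 0 > t + 0) { print; next }
    }'
}
```

Usage: `mmdiag --waiters | long_waiters 600`.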

References:

  1. IZ17622: GPFS DEADLOCK WAITING FOR SUBTOKEN TO BECOME STABLE CAUSES HANG
  2. GPFS File System Deadlock

Here is the procedure from the PMR for collecting information to help resolve the issue.

The steps below gather all the docs you could provide as first-time data capture for an unknown problem. Do these steps for all your performance, hang, or unknown GPFS issues WHILE the problem is occurring. Commands are executed from one node. Collection of the docs will vary based on the working collective created below.

1) Gather waiters and create a working collective. It can be good to get multiple looks at what the waiters are and how they have changed, so running the first mmlsnode command (with -L) numerous times as you proceed through the steps below might be helpful (especially if the issue is pure performance, with no hangs).

# mmlsnode -N waiters -L  > /tmp/allwaiters.$(date +"%m%d%H%M%S")
# mmlsnode -N waiters > /tmp/waiters.wcoll

View the allwaiters and waiters.wcoll files to verify that they are not empty.

If either (or both) file(s) are empty, this indicates that the issue is not GPFS waiting on any of its threads. The docs to gather in this case will vary, so do not continue with the steps. Tell the service person, who will determine the best course of action and which docs are needed.

2) Gather an internal dump from all nodes in the working collective.

# mmdsh -N /tmp/waiters.wcoll "/usr/lpp/mmfs/bin/mmfsadm dump all > /tmp/\$(hostname -s).dumpall.\$(date +"%m%d%H%M%S")"

3) Gather kernel threads (kthreads) from all nodes in the working collective.

Depending on various factors, this command can take a long time to complete. If you are not specifically looking for kernel threads, this step can be skipped. A running command can be stopped with Ctrl-C.

# mmdsh -N /tmp/waiters.wcoll "/usr/lpp/mmfs/bin/mmfsadm dump kthreads > /tmp/\$(hostname -s).kthreads.\$(date +"%m%d%H%M%S")"

4) If this is a performance problem, collect a 60-second mmfs trace from the nodes in the working collective.

On AIX:

# mmtracectl --start --aix-trace-buffer-size=64M --trace-file-size=128M -N /tmp/waiters.wcoll ; sleep 60; mmtracectl --stop -N /tmp/waiters.wcoll

On Linux:

# mmtracectl --start --trace-file-size=128M -N /tmp/waiters.wcoll ; sleep 60; mmtracectl --stop -N /tmp/waiters.wcoll

5) Gather a gpfs.snap from the same nodes.

# gpfs.snap -N /tmp/waiters.wcoll

Gather the docs taken. The output of steps 1) and 5) will be on the local node, in /tmp and /tmp/gpfs.snapOut respectively, and the output of steps 2) and 3) will be in /tmp on the nodes listed in the waiters.wcoll file. The gpfs.snap will also pick up the trace report (trcrpt) files in /tmp/mmfs.

Steps 3) and 4) are often not needed unless asked for. If supplied, they may or may not be used. If there are any issues collecting docs, steps 1), 2) and 5) are the most critical.
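Steps 1), 2) and 5), which the procedure calls the most critical, can be chained into one function that stops when the waiter files are empty, as the instructions require. A sketch, assuming the GPFS commands are on PATH (they normally live in /usr/lpp/mmfs/bin) and reusing the command lines quoted above; collect_first_time_data and require_nonempty are hypothetical names.

```shell
#!/usr/bin/env bash
# require_nonempty (hypothetical helper): fail if any listed file is
# missing or empty, naming the offending file on stderr.
require_nonempty() {
    local f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "$f is missing or empty; stop here and contact IBM service" >&2
            return 1
        fi
    done
}

# collect_first_time_data (hypothetical wrapper): steps 1), 2) and 5) of
# the procedure above. Run it on one node WHILE the problem is occurring.
collect_first_time_data() {
    local ts
    ts=$(date +%m%d%H%M%S)      # one timestamp for all files of this run

    # Step 1: waiters and the working collective.
    mmlsnode -N waiters -L > "/tmp/allwaiters.$ts"
    mmlsnode -N waiters > /tmp/waiters.wcoll
    require_nonempty "/tmp/allwaiters.$ts" /tmp/waiters.wcoll || return 1

    # Step 2: internal dump on every node in the working collective.
    # \$(hostname -s) is escaped so it expands on each remote node.
    mmdsh -N /tmp/waiters.wcoll \
        "/usr/lpp/mmfs/bin/mmfsadm dump all > /tmp/\$(hostname -s).dumpall.$ts"

    # Step 5: gpfs.snap from the same nodes.
    gpfs.snap -N /tmp/waiters.wcoll
}
```

Steps 3) and 4) are left out on purpose, since they are usually only needed when the service person asks for them.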


Solution:

1) The allwaiters output shows:

nsd1:  0x2AAAACC659F0 waiting 31358.847013000 seconds, GroupProtocolDriverThread: 
on ThCond 0x5572138 (0x5572138) (MsgRecordCondvar), reason 'RPC wait' for ccMsgGroupLeave

2) Look at the tscomm section of the dump to see which node is “pending”:

Output for mmfsadm dump tscomm on nsd1
######################################################################

Pending messages:
msg_id 345326326, service 1.1, msg_type 26 'ccMsgGroupLeave', n_dest 470, n_pending 1
this 0x5571F90, n_xhold 1, cl 0, cbFn 0x0, age 33501 sec
sent by 'GroupProtocolDriverThread' (0x2AAAACC659F0)

.
.
.
dest <c0n3>          status pending   , err 0, reply len 0
c0n3> 10.x.x.x/0, x.y.y.u (nsd2)

3) Waiters for nsd2 show the following:

nsd2:  0x2AAAAC9F5A50 waiting 193857.401337000 seconds, NSDThread: 
on ThCond 0x2AAAC01CA600 (0x2AAAC01CA600) (VERBSEventWaitCondvar), reason 'waiting for RDMA write DTO completion'
nsd2:  0x2AAAAC9F33D0 waiting 193856.387375000 seconds, NSDThread: 
on ThCond 0x2AAAD806B190 (0x2AAAD806B190) (VERBSEventWaitCondvar), reason 'waiting for RDMA write DTO completion'
nsd2:  0x2AAAAC9F2090 waiting 193857.691998000 seconds, NSDThread: 
on ThCond 0x2AAAD40A0F90 (0x2AAAD40A0F90) (VERBSEventWaitCondvar), reason 'waiting for RDMA write DTO completion'
nsd2:  0x2AAAAC9DC610 waiting 193857.589074000 seconds, NSDThread: 
on ThCond 0x2AAAC81B2DE0 (0x2AAAC81B2DE0) (VERBSEventWaitCondvar), reason 'waiting for RDMA read DTO completion'
nsd2:  0x2AAAAC9D8C50 waiting 193857.406763000 seconds, NSDThread: 
on ThCond 0x2AAAC01FE5E0 (0x2AAAC01FE5E0) (VERBSEventWaitCondvar), reason 'waiting for RDMA write DTO completion'
nsd2:  0x2AAAAC9CDF10 waiting 193857.692074000 seconds, NSDThread: 
on ThCond 0x2AAAD806F120 (0x2AAAD806F120) (VERBSEventWaitCondvar), reason 'waiting for RDMA write DTO completion'
nsd2:  0x2AAAAC9CB890 waiting 193857.686966000 seconds, NSDThread: 
on ThCond 0x2AAABC140880 (0x2AAABC140880) (VERBSEventWaitCondvar), reason 'waiting for RDMA write DTO completion'
nsd2:  0x2AAAAC9C31D0 waiting 193857.412257000 seconds, NSDThread: 
on ThCond 0x2AAAACD83400 (0x2AAAACD83400) (VERBSEventWaitCondvar), reason 'waiting for RDMA write DTO completion'

Run “mmfsadm dump verbs” on all of the NSD nodes.

# mmfsadm dump verbs

To fix this issue, stop and restart the GPFS daemon on nsd2.

# mmshutdown -N nsd2
# mmstartup -N nsd2
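In this case the clue was that one node (nsd2) held many long RDMA waiters. When the collected allwaiters file covers many nodes, a quick per-node tally can point at the suspect node faster. A sketch, assuming each waiter line starts with the node name followed by a colon, as in the output above; waiters_by_node is a hypothetical helper.

```shell
#!/usr/bin/env bash
# waiters_by_node (hypothetical helper): count waiter lines per node in
# "mmlsnode -N waiters -L" output read from standard input.
# Only lines of the form "node:  0x... waiting N seconds, ..." are counted;
# output is "node count", busiest node first.
waiters_by_node() {
    awk '$1 ~ /:$/ && $3 == "waiting" {
             node = substr($1, 1, length($1) - 1)   # strip trailing ":"
             count[node]++
         }
         END { for (n in count) printf "%s %d\n", n, count[n] }' |
        sort -k2,2nr -k1,1
}
```

Usage: `waiters_by_node < /tmp/allwaiters.<timestamp>`; the node at the top is a good candidate for a closer look with mmfsadm dump tscomm and mmfsadm dump verbs.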