Installing Voltaire QDR Infiniband Drivers for CentOS 5.4

OS Prerequisites 

  1. RedHat EL4
  2. RedHat EL5
  3. SuSE SLES 10
  4. SuSE SLES 11
  5. CentOS 5

Software Prerequisites 

  1. bash-3.x.x
  2. glibc-2.3.x.x
  3. libgcc-3.4.x-x
  4. libstdc++-3.4.x-x
  5. perl-5.8.x-x
  6. tcl 8.4
  7. tk 8.4.x-x
  8. rpm 4.1.x-x
  9. libgfortran 4.1.x-x

Step 1: Download the Voltaire drivers that match your OS and version.

You can find the link for the Voltaire QDR drivers at Download Voltaire OFED Drivers for CentOS.
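
The driver package is built against a specific kernel (2.6.18-164.el5 in the examples below), so it is worth confirming your running kernel and distribution before picking a download. These are standard commands, not part of the Voltaire package:

# uname -r
# cat /etc/redhat-release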

Step 2: Unzip and Untar the Voltaire OFED Package

# bunzip2 VoltaireOFED-1.5_3-k2.6.18-164.el5-x86_64.tar.bz
# tar -xvf VoltaireOFED-1.5_3-k2.6.18-164.el5-x86_64.tar
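
Alternatively, a recent tar can decompress and extract the bzip2 archive in a single step (same archive as above):

# tar -xjvf VoltaireOFED-1.5_3-k2.6.18-164.el5-x86_64.tar.bz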

Step 3: Install the Voltaire OFED Package

# cd VoltaireOFED-1.5_3-k2.6.18-164.el5-x86_64
# ./install

Step 3a: Reboot the Server

Step 4: Set up IP over InfiniBand (IPoIB)

# vim /etc/sysconfig/network-scripts/ifcfg-ib0

Add the following to the file:

# Voltaire Infiniband IPoIB
DEVICE=ib0
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.10.10.1
NETWORK=10.10.10.0
NETMASK=255.255.255.0
BROADCAST=10.10.10.255
MTU=65520

Then start the openibd service:

# service openibd start
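
Once openibd is up, a quick sanity check of the IPoIB interface never hurts. These are standard networking commands; the 10.10.10.2 peer address is just an assumed second node on the same IPoIB subnet, so substitute your own:

# ifconfig ib0
# ping -c 3 10.10.10.2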

Step 5 (Optional): Disable the yum repositories.

If you plan to use yum to locally install opensm from the Voltaire package directory, you can opt to disable the online yum repositories.

# vim /etc/yum.conf

Type the following under each repository section in /etc/yum.conf:

enabled=0
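
Alternatively, if you would rather not edit /etc/yum.conf, yum can skip the online repositories for the single local install in Step 6. This is a sketch using standard yum options; adjust the package glob to your directory:

# yum --disablerepo="*" localinstall opensm* --nogpgcheck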

Step 6: Install the subnet manager (opensmd). The packages can be found under:

# cd $VoltaireRootDirectory/VoltaireOFED-1.5_3-k2.6.18-164.el5-x86_64/x86_64/2.6.18-164.15.1.el5

Install the opensm packages with yum:

# yum localinstall opensm* --nogpgcheck

Start the opensmd service:

# service opensmd start
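
It is also worth making sure the subnet manager starts on boot and is actually running. These are standard CentOS 5 service commands, assuming the opensmd package installed its init script as usual:

# chkconfig opensmd on
# service opensmd status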

Step 7: Check that Infiniband is working

# ibstat

You should see "State: Active" in the output:

CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 1
        Firmware version: 2.6.0
        Hardware version: a0
        Node GUID: 0x0008f14763280af0
        System image GUID: 0x0008fd6478a5af3
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 2
                LMC: 0
                SM lid: 14
                Capability mask: 0x0251086a
                Port GUID: 0x0008f103467a5af1

Step 8: Test Connectivity

At the Server side,

# ibping -S

Do Steps 1 to 7 again for the client. Once done, run the following from the client, using the Port GUID reported by ibstat on the server:

# ibping -G 0x0008f103467a5af1

You should see a response like this.

Pong from headnode.cluster.com.(none) (Lid 2): time 0.062 ms
Pong from headnode.cluster.com.(none) (Lid 2): time 0.084 ms
Pong from headnode.cluster.com.(none) (Lid 2): time 0.114 ms
Pong from headnode.cluster.com.(none) (Lid 2): time 0.082 ms
Pong from headnode.cluster.com.(none) (Lid 2): time 0.118 ms
Pong from headnode.cluster.com.(none) (Lid 2): time 0.118 ms
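
As a side note, with the infiniband-diags tools the destination defaults to a LID when -G is not given, so you should also be able to ping the server by the Base lid reported by ibstat above (verify against the ibping man page on your build):

# ibping 2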

Great! You are done.

Using xCAT contributed scripts

This is a continuation of the blog entry User Contributed Script ported from xcat 1.x to xcat 2.x.

Step 1: Placing addclusteruser in /opt/xcat/sbin

# cd /opt/xcat/sbin
# wget https://xcat.svn.sourceforge.net/svnroot/xcat/xcat-contrib/admin_patch/xCAT-2-admin_patch-1.1/addclusteruser

Step 2: Placing gensshkeys in /opt/xcat/sbin

# cd /opt/xcat/sbin
# wget https://xcat.svn.sourceforge.net/svnroot/xcat/xcat-contrib/admin_patch/xCAT-2-admin_patch-1.1/gensshkeys

Step 3: Placing shfunctions1 in /opt/xcat/lib

# cd /opt/xcat/lib
# wget https://xcat.svn.sourceforge.net/svnroot/xcat/xcat-contrib/admin_patch/xCAT-2-admin_patch-1.1/shfunctions1
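
The scripts fetched with wget are not marked executable. Assuming the locations above, you will likely need to fix that before running them:

# chmod +x /opt/xcat/sbin/addclusteruser /opt/xcat/sbin/gensshkeys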

To add users using addclusteruser

# addclusteruser
......

I’m assuming you have exported the home directory to the other nodes. Push the updated account files to the compute nodes:

# pscp /etc/passwd compute:/etc/
# pscp /etc/shadow compute:/etc/
# pscp /etc/group compute:/etc/

Tuning NFSD Server Daemon for Performance

Do note that the NFS server daemon (nfsd) plays an important part in performance tuning. Here are some tips:

  1. Number of instances of the NFSD server daemon. By default, the number of nfsd instances is 8. In Optimizing NFS Performance, the author recommends that system admins use at the very least one daemon per processor, but that four to eight per processor may be a better rule of thumb. To modify the number of nfsd instances, you can edit RPCNFSDCOUNT in the NFS startup script (/etc/rc.d/init.d/nfs on RHEL, Fedora or CentOS); see the sketch after this list.
  2. If you want to determine the right number of nfsd instances yourself, you can look at the detailed NFS statistics that the Linux kernel provides at /proc/net/rpc/nfsd.
  3. A sample of /proc/net/rpc/nfsd:

    rc 0 47750055 170015423
    fh 39 0 0 0 0
    io 376475178 3831903891
    th 8 18573687 48505.610 3718.131 2831.176 0.000 1813.483 1468.532 1399.593 1551.349 0.000 12224.473
    ra 16 122635704 971110 83992 77018 15770 11434 1655 550 882 407 518440
    net 217768755 0 217768891 1072
    rpc 217765688 0 0 0 0
    proc2 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    proc3 22 3 24906977 238795 7255551 10595346 837 124313278 42671631 2419345 5043 5865 0 2399297 5130 2560 1593 48707 133600 34910 3 0 2650721
    proc4 2 0 0


  4. To analyse some of the output parameters, I’ll be drawing most of the information below from an excellent article, “Understanding Linux nfsd statistics”. A brief summary is as follows:
    rc reports the stats for the NFS reply cache. The three numbers are cache hits, cache misses, and “nocache”, which is presumably requests that bypassed the cache.
    io reports the overall I/O counters. The two numbers are bytes read and bytes written.
    th reports nfsd thread utilisation. The first number is the number of nfsd threads configured, and the second is the number of times any thread was used. The remaining ten numbers are a histogram, each bucket recording the number of seconds spent in that 10% range of thread utilisation.
    ra reports the read-ahead cache. The first number is the read-ahead cache size. The next 10 numbers are the number of times an entry was found in the read-ahead cache < 10%, < 20%, ..., < 100% into the cache. The last number on this line is the number of times an entry was not found in the cache.
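
As a sketch of item 1 above: on CentOS/RHEL the RPCNFSDCOUNT value can usually be set in /etc/sysconfig/nfs (which the init script reads), so you can raise the thread count, restart NFS, and then watch the th line to see whether the extra threads are actually used. The value 16 is only an example; tune it against your processor count and workload.

# vim /etc/sysconfig/nfs
RPCNFSDCOUNT=16

# service nfs restart
# grep "^th" /proc/net/rpc/nfsd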

Tuning NFS Server exports file for performance

As far as I know, these two options in the NFS server exports file (/etc/exports) are the most important for performance:

  1. async: The default export behavior for both NFS Version 2 and Version 3 protocols, used by exportfs, is “asynchronous”. According to Optimizing NFS Performance, this default permits the server to reply to client requests as soon as it has processed the request and handed it off to the local file system, without waiting for the data to be written to stable storage. This is indicated by the async option in the server’s export list. It yields better performance at the cost of possible data corruption if the server reboots while still holding unwritten data and/or metadata in its caches. This possible data corruption is not detectable at the time of occurrence, since the async option instructs the server to lie to the client, telling the client that all data has indeed been written to stable storage, regardless of the protocol used.
  2. no_subtree_check: For nfs-utils 1.0.x and above, disable subtree checking to speed up transfers, especially if you are exporting a large directory. For example:
/tmp *(rw,async,no_subtree_check)
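
After editing /etc/exports, re-export the file systems and verify the options actually in effect (standard exportfs usage):

# exportfs -ra
# exportfs -v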

For other good materials on the /etc/exports, do check out
http://linux.die.net/man/5/exports

Taking into account Poor Bandwidth and Latency for Remote Desktop Services

Dealing with Poor Bandwidth and Latency is an excellent article on how Terminal Servers or Remote Desktop Servers should take into account poor bandwidth and latency for Remote Desktop Services. The factors highlighted in the article include:

  1. Changing packet acknowledgements timer
  2. Keepalives
  3. The use of QoS
  4. MTU, and more

But that is just the server side. You must also configure the client to deal with high-latency issues. You should update to the latest RDP client and make use of the new features; among the many features are those that support high-latency networks.

There are many ways to reduce the latency between the client and server; I will probably write on this soon.