The variables are as follows:
LSB_JOBID: LSF assigned job ID
LSB_BATCH_JID: Array job ID. Includes job ID and array index number
LSB_JOBINDEX: Job array index
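For example, in an LSF job array these variables let each array element pick its own input file. Below is a minimal sketch of an array submission; the script name myjob.sh and the input file naming are assumptions for illustration only:

# bsub -J "myarray[1-10]" ./myjob.sh

where myjob.sh contains:

#!/bin/bash
# Each array element works on its own input file (input.1 ... input.10)
echo "Job ${LSB_JOBID}, array element ${LSB_JOBINDEX}, full ID ${LSB_BATCH_JID}"
./myprogram input.${LSB_JOBINDEX}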
The IPoIB driver supports two modes of operation: Unreliable Datagram (UD) and Connected Mode.
In Unreliable Datagram mode, the IB UD (Unreliable Datagram) transport is used, so the interface MTU is equal to the IB L2 MTU minus the IPoIB encapsulation header (4 bytes). In QDR, the default MTU value is 2044. From FDR onwards, the default MTU value for Unreliable Datagram is 4096.
In Connected Mode, the IB RC (Reliable Connected) transport is used. Connected Mode takes advantage of the connected nature of the IB transport and allows an MTU up to the maximal IP packet size of 64K, which reduces the number of IP packets needed for handling large UDP datagrams, TCP segments, etc., and increases performance for large messages. The default MTU is 65000.
To verify which mode you are working in, just do a
# cat /sys/class/net/ib0/mode
datagram
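If you want to switch the interface to Connected Mode, you can write to the same sysfs file. A minimal sketch, assuming the interface is ib0 and that your kernel/OFED stack supports Connected Mode; note this change does not persist across reboots unless it is also set in your IPoIB/ifcfg configuration:

# echo connected > /sys/class/net/ib0/mode
# ip link set ib0 mtu 65000
# cat /sys/class/net/ib0/mode
connected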
You can use ibdiagnet to generate the topology of the IB network simply by using the “-w” switch:
# ibdiagnet -w /var/tmp/ibdiagnet2/topology.top
.....
.....
-I- ibdiagnet database file   : /var/tmp/ibdiagnet2/ibdiagnet2.db_csv
-I- LST file                  : /var/tmp/ibdiagnet2/ibdiagnet2.lst
-I- Topology file             : /var/tmp/ibdiagnet2/topology.top
-I- Subnet Manager file       : /var/tmp/ibdiagnet2/ibdiagnet2.sm
-I- Ports Counters file       : /var/tmp/ibdiagnet2/ibdiagnet2.pm
-I- Nodes Information file    : /var/tmp/ibdiagnet2/ibdiagnet2.nodes_info
-I- Partition keys file       : /var/tmp/ibdiagnet2/ibdiagnet2.pkey
-I- Alias guids file          : /var/tmp/ibdiagnet2/ibdiagnet2.aguid
# vim /var/tmp/ibdiagnet2/topology.top
# This topology file was automatically generated by IBDM

SX6036G Left-Leaf-SW03
U1/P1  -4x-14G-> HCA_1  mtlacad05 U1/P1
U1/P17 -4x-14G-> SX6012 Right-Spine-SW02 U1/P2
U1/P18 -4x-14G-> SX6012 Left-Spine-SW01 U1/P2
U1/P2  -4x-14G-> HCA_1  mtlacad07 U1/P1
U1/P3  -4x-14G-> HCA_1  mtlacad03 U1/P1
U1/P4  -4x-14G-> HCA_1  mtlacad04 U1/P1
U1/P6  -4x-14G-> HCA_1  mtlacad06 U1/P1
.....
.....
ibportstate
# ibportstate LID PortNumber
# Port info: Lid 15 port 1
LinkState:.......................Active
PhysLinkState:...................LinkUp
Lid:.............................15
SMLid:...........................1
LMC:.............................0
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedActive:.................10.0 Gbps
LinkSpeedExtSupported:...........14.0625 Gbps
LinkSpeedExtEnabled:.............14.0625 Gbps
LinkSpeedExtActive:..............14.0625 Gbps
Mkey:............................<not displayed>
MkeyLeasePeriod:.................0
ProtectBits:.....................0
# MLNX ext Port info: Lid 15 port 1
StateChangeEnable:...............0x00
LinkSpeedSupported:..............0x01
LinkSpeedEnabled:................0x01
LinkSpeedActive:.................0x00
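Besides querying, ibportstate can also change the state of a port, which can be handy when isolating a bad link. A minimal sketch using the LID and port number from the query above; use with care on a production fabric:

# ibportstate 15 1 disable
# ibportstate 15 1 enable
# ibportstate 15 1 reset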
If you are using Mellanox IB switches, you can use the following tools to conduct performance tests:
Latency test: ib_read_lat (server and client side)
Bandwidth test: ib_read_bw (server and client side)
For example:
1a. Latency Server Side
# ib_read_lat
1b. Client Side
# ib_read_lat IP_Address_of_Server -F -a
 #bytes  #iterations  t_min[usec]  t_max[usec]  t_typical[usec]
 2       1000         1.66         12.98        1.70
 4       1000         1.64         13.40        1.67
 8       1000         1.64         20.25        1.67
 16      1000         1.64         19.61        1.68
 .....
 .....
 4096    1000         2.94         18.45        2.99
2a. Bandwidth Server Side
# ib_read_bw
2b. Bandwidth Client Side
# ib_read_bw IP_Address_of_Server -F -a
 #bytes  #iterations  BW peak[MB/sec]  BW average[MB/sec]  MsgRate[Mpps]
 2       1000         6.97             6.47                3.394435
 ....
 ....
 8192    1000         5983.30          5982.07             0.765704
 ....
 ....
 65536   1000         6075.37          6042.28             0.096676
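The perftest tools take further options as well. For instance, you can pin the test to a particular HCA and message size; a small sketch, assuming the device name mlx4_0 (as reported by ibstat later in this article):

Server side:
# ib_read_bw -d mlx4_0 -F
Client side:
# ib_read_bw -d mlx4_0 -F -s 65536 -n 5000 IP_Address_of_Server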
Suppose you have a source and a destination server with an IB switch in between.
Server 1 is as below. I have changed the GUIDs and masks for confidentiality's sake.
[root@mtlacad03 ~]# ibstat
CA 'mlx4_0'
        CA type: MT4103
        Number of ports: 2
        Firmware version: 2.33.5100
        Hardware version: 0
        Node GUID: 00000000000000
        System image GUID: 00000000000
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 15
                LMC: 0
                SM lid: 5
                Capability mask: 0000000000000
                Port GUID: 00000000000
                Link layer: InfiniBand
        Port 2:
                State: Down
                Physical state: Disabled
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 111111111111
                Port GUID: 1111111111111
                Link layer: Ethernet
Server 2 is
[root@mtlacad07 ~]# ibstat
CA 'mlx4_0'
        CA type: MT4103
        Number of ports: 2
        Firmware version: 2.33.5100
        Hardware version: 0
        Node GUID: 000000000000
        System image GUID: 0000000000
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 13
                LMC: 0
                SM lid: 5
                Capability mask: 00000000000
                Port GUID: 000000000000000
                Link layer: InfiniBand
        Port 2:
                State: Down
                Physical state: Disabled
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 1111111111111
                Port GUID: 1111111111111
                Link layer: Ethernet
So if you do an ibtracert from LID 13 (mtlacad07) to LID 15 (mtlacad03):
[root@mtlacad07 ~]# ibtracert 13 15
From ca {0000000000000} portnum 1 lid 13-13 "mtlacad07 HCA-1"
[1] -> switch port {0000000000000}[2] lid 20-20 "MF0;Left-Leaf-SW03:SX6036G/U1"
[3] -> ca port {00000000000}[1] lid 15-15 "mtlacad03 HCA-1"
To ca {0000000000000} portnum 1 lid 15-15 "mtlacad03 HCA-1"
Basically, what it means is that traffic from mtlacad07 (LID 13) leaves port 1 of its HCA, enters the Left-Leaf-SW03 switch at port 2, and exits the switch at port 3 to reach port 1 of the HCA on mtlacad03 (LID 15).
Octopus is a scientific program aimed at the ab initio virtual experimentation on a hopefully ever-increasing range of system types. Electrons are described quantum-mechanically within density-functional theory (DFT), in its time-dependent form (TDDFT) when doing simulations in time. Nuclei are described classically as point particles. Electron-nucleus interaction is described within the pseudopotential approximation.
Requirements: (Taken from Octopus Installation Wiki)
In a nutshell, this is what you need. Do look at Octopus Wiki for more details
Step 1: Compile libXC. You can download libxc-2.0.0 from the Octopus site.
After untarring, do take a look at the INSTALL file, then compile libXC:
# tar -zxvf libxc-2.0.0.tar.gz
# cd libxc-2.0.0
# ./configure --prefix=/usr/local/libxc-2.0.0/ CC=gcc CXX=g++
# make -j 8
# make install
Step 2: Compile gsl-1.14
Do take a look at Compiling GNU Scientific Library (GSL) gsl-1.16 on CentOS 6 for the procedure. Sources are available at ftp://ftp.gnu.org/gnu/gsl/
Step 3: Update your .bashrc
.......
export FC=mpif90
export CC=mpicc
export FCFLAGS="-O3"
export CFLAGS="-O3"
export PATH=$PATH:/usr/local/openmpi-1.6.4-gnu/bin:...........
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/openmpi-1.6.4-gnu/lib:/usr/local/fftw-3.3.3-single/lib:/usr/local/libxc-2.0.0/lib...................
Step 4: Configure the Octopus-4.1.2
# ./configure --prefix=/usr/local/octopus-4.1.2 \
  --with-libxc-prefix=/usr/local/libxc-2.0.0 \
  --with-libxc-include=/usr/local/libxc-2.0.0/include \
  --with-gsl-prefix=/usr/local/gsl-1.14 \
  --with-blas=/usr/lib64/libblas.so.3.2.1 \
  --with-arpack=/usr/lib64/libarpack.so.2 \
  --with-fft-include=/usr/local/fftw-3.3.3-single/include \
  --enable-mpi
# make -j 8
# make install
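Once installed, a quick way to exercise the build is to run it in parallel on a small test case. A minimal sketch, assuming the MPI binary is installed as octopus_mpi and that an Octopus input file named inp is present in the current working directory:

# export PATH=$PATH:/usr/local/octopus-4.1.2/bin
# mpirun -np 8 octopus_mpi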
If you are planning to have more nodes from which users can submit jobs apart from the Head Node of the cluster, you may want to configure a Submission Node. By default, TORQUE only allows one submission node. There are 2 ways to configure this submission node. One way is using the “submit_hosts” parameter on the Torque Server.
Step 1a: Configuring the Submission Node
First and foremost, one of the main prerequisites is that the submission node must be part of the resource pool identified by the Torque Server. If it is not yet part of the resource pool, you may want to follow the steps to make the to-be-submission node part of the resource pool as a pbs_mom client. You can check the setup by looking at Installing Torque 4.2.5 on CentOS 6 and Configuring the TORQUE Clients. You might want to follow up with the optional setup Adding and Specifying Compute Resources at Torque to make sure your core counts are correct.
Step 1b: Ensure the exchange of SSH keys between the submission node and the Torque Server
For more information, see Auto SSH Login without Password
Step 1c: Configure the submission node as a non-default queue (Optional)
For more information, see Using Torque to set up a Queue to direct users to a subset of resources
Step 2: Registering the Submission Node in Torque
If you do not wish the submission node to be used as a compute resource, you can place it in a non-default or unique queue which users will not submit to.
Once you have configured the to-be-submission node as one of the clients, you now have to configure the Torque server with these commands:
# qmgr -c 'set server submit_hosts = hostname1'
# qmgr -c 'set server allow_node_submit = True'
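To confirm that the settings have taken effect, you can print the server attributes from qmgr; the grep is just for convenience:

# qmgr -c 'print server' | grep -E 'submit_hosts|allow_node_submit'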
Step 3: Putting Submission Node inside Torque Server /etc/hosts.equiv
# vim /etc/hosts.equiv
submission_node.cluster.com
Step 4a: Copy trqauthd from primary submission node to the secondary submission node
# scp -v /etc/init.d/trqauthd root@submission_node.cluster.com:/etc/init.d
Step 4b: Start the trqauthd service on the submission node
# service trqauthd start
Step 5: Test the configuration
Do a
$ qsub -I -l nodes=1:ppn=8
You should see from the torque server that the job has been submitted via the submission node by doing a qstat -an
$ qstat -an
Step 6: Mount Maui Information from PBS/MAUI Server
From the MAUI Server, export the configuration and binaries of MAUI via NFS.
Edit /etc/exports
/opt/maui        Submission-Node1(rw,no_root_squash,async,no_subtree_check)
/usr/local/maui  Submission-Node1(rw,no_root_squash,async,no_subtree_check)
At the MAUI Server, restart NFS Services
# service nfs restart
At the submission node, make sure you have the mount points /opt/maui and /usr/local/maui for the MAUI configuration and binaries.
In /etc/fstab, add the file systems, mount them and restart netfs:
head-node1:/usr/local/maui  /usr/local/maui  nfs  defaults  0 0
head-node1:/opt/maui        /opt/maui        nfs  defaults  0 0
# service netfs restart
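To confirm the mounts are in place on the submission node, a quick check:

# mount | grep maui
# df -h /opt/maui /usr/local/maui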
Tuned is a dynamic adaptive system tuning daemon. According to the manual page:
tuned is a dynamic adaptive system tuning daemon that tunes system settings dynamically depending on usage. For each hardware subsystem a specific monitoring plugin collects data periodically. This information is then used by tuning plugins to change system settings to lower or higher power saving modes in order to adapt to the current usage. Currently monitoring and tuning plugins for CPU, ethernet network and ATA harddisk devices are implemented.
Using Tuned
1. Installing tuned
# yum install tuned
2. To view a list of available tuning profiles
[root@myCentOS ~]# tuned-adm list
Available profiles:
- laptop-ac-powersave
- server-powersave
- laptop-battery-powersave
- desktop-powersave
- virtual-host
- virtual-guest
- enterprise-storage
- throughput-performance
- latency-performance
- spindown-disk
- default
3. Tuning to a specific profile
# tuned-adm profile latency-performance
Switching to profile 'latency-performance'
Applying deadline elevator: dm-0 dm-1 dm-2 sda          [  OK  ]
Applying ktune sysctl settings:
/etc/ktune.d/tunedadm.conf:                             [  OK  ]
Calling '/etc/ktune.d/tunedadm.sh start':               [  OK  ]
Applying sysctl settings from /etc/sysctl.conf
Starting tuned:                                         [  OK  ]
4. Checking current tuned profile used and its status
# tuned-adm active
Current active profile: latency-performance
Service tuned: enabled, running
Service ktune: enabled, running
5. Turning off the tuned daemon
# tuned-adm off
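If you want the selected profile and its services to come back after a reboot on CentOS 6, you may also want to enable the services at boot; a small sketch using chkconfig:

# chkconfig tuned on
# chkconfig ktune on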
Compiling Gromacs has never been easier using cmake. There are a few assumptions.
Here is my configuration using the Intel Compilers and MKL:
# tar xfz gromacs-5.0.4.tar.gz
# cd gromacs-5.0.4
# mkdir build
# cd build
# /usr/local/cmake-3.1.3/bin/cmake .. \
-DGMX_BUILD_OWN_FFTW=ON \
-DREGRESSIONTEST_DOWNLOAD=OFF \
-DCMAKE_INSTALL_PREFIX=/usr/local/gromacs-5.0.4 \
-DGMX_MPI=on \
-DGMX_FFT_LIBRARY=mkl \
-DMKL_LIBRARIES="/usr/local/intel/mkl/lib/intel64" \
-DMKL_INCLUDE_DIR="/usr/local/intel/mkl/include" \
-DGMX_DOUBLE=on \
-DGMX_BUILD_MDRUN_ONLY=off \
-DCMAKE_C_COMPILER=mpicc \
-DCMAKE_CXX_COMPILER=mpicxx
# make
# make check
# sudo make install
# source /usr/local/gromacs-5.0.4/bin/GMXRC
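After sourcing GMXRC, you can verify the build and launch a parallel run. A minimal sketch, assuming the installed binary carries the MPI/double suffix gmx_mpi_d (from -DGMX_MPI=on and -DGMX_DOUBLE=on above) and that topol.tpr is a hypothetical prepared input:

$ gmx_mpi_d -version
$ mpirun -np 16 gmx_mpi_d mdrun -deffnm topol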
Installation Flags
– -DCMAKE_C_COMPILER=xxx equal to the name of the C99 compiler you wish to use (or the environment variable CC)
– -DCMAKE_CXX_COMPILER=xxx equal to the name of the C++98 compiler you wish to use (or the environment variable CXX)
– -DGMX_MPI=on to build using an MPI wrapper compiler
– -DGMX_GPU=on to build using nvcc to run with an NVIDIA GPU
– -DGMX_SIMD=xxx to specify the level of SIMD support of the node on which mdrun will run
– -DGMX_BUILD_MDRUN_ONLY=on to build only the mdrun binary, e.g. for compute cluster back-end nodes
– -DGMX_DOUBLE=on to run GROMACS in double precision (slower, and not normally useful)
– -DCMAKE_PREFIX_PATH=xxx to add a non-standard location for CMake to search for libraries
– -DCMAKE_INSTALL_PREFIX=xxx to install GROMACS to a non-standard location (default /usr/local/gromacs)
– -DBUILD_SHARED_LIBS=off to turn off the building of shared libraries
– -DGMX_FFT_LIBRARY=xxx to select whether to use fftw, mkl or fftpack libraries for FFT support
– -DCMAKE_BUILD_TYPE=Debug to build GROMACS in debug mode