Intel MPI Parameters to Consider for Performance

Selection of the best available communication fabric

Suggestion 1:

 I_MPI_DEVICE  I_MPI_FABRICS  Description
 sock          tcp            TCP/IP-capable network fabrics, such as Ethernet and InfiniBand* (through IPoIB*)
 shm           shm            Shared memory only
 ssm           shm:tcp        Shared memory + TCP/IP
 rdma          dapl           DAPL-capable network fabrics, such as InfiniBand*, iWARP*, Dolphin*, and XPMEM* (through DAPL*)
 rdssm         shm:dapl       Shared memory + DAPL + sockets
               ofa            OFA-capable network fabrics, including InfiniBand* (through OFED* verbs)
               tmi            TMI-capable network fabrics, including QLogic* and Myrinet* (through the Tag Matching Interface)
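As a sketch of how the table is used in practice (the fabric choice and the fallback list below are assumptions about a typical InfiniBand cluster, not measured recommendations), the fabric can be pinned from the job environment before launching:

```shell
# Prefer shared memory within a node and DAPL between nodes.
export I_MPI_FABRICS=shm:dapl
# Optional fallback order if the preferred inter-node fabric is unavailable.
export I_MPI_FABRICS_LIST=dapl,tcp
# With I_MPI_DEBUG=5, Intel MPI prints the fabric it actually selected.
# The launch line itself (echoed here; run it on your cluster):
echo "mpirun -n 64 -env I_MPI_DEBUG 5 ./my_app"
```

Checking the `I_MPI_DEBUG=5` startup output is the quickest way to confirm the intended fabric was actually picked up.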

 

Suggestion 2:

I_MPI_DAPL_UD Values Description
 enable
  • Connectionless feature; works for DAPL fabrics only.
  • Works with OFED 1.4.2 and 2.0.24 or higher.
  • Provides better scalability.
  • Significantly reduces memory requirements.
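A minimal sketch of turning this on for a DAPL run, per the table above (the fabric setting is an assumption; UD has no effect on non-DAPL fabrics):

```shell
# DAPL UD is connectionless and only applies to DAPL fabrics.
export I_MPI_FABRICS=shm:dapl
export I_MPI_DAPL_UD=enable
```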

 

Suggestion 3:

 I_MPI_PERHOST Values  Remarks
 1         Makes a round-robin distribution (default value)
 all       Maps processes to all logical CPUs on a node
 allcores  Maps processes to all physical CPUs on a node
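For example, to make the default placement explicit, or to pack each node before moving on to the next:

```shell
# One process per node, round-robin (the default), set explicitly:
export I_MPI_PERHOST=1
# Alternatively, fill all logical CPUs on a node first:
# export I_MPI_PERHOST=all
```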

 

Suggestion 4:

I_MPI_SHM_BYPASS Values Remarks
 disable (default)  Set I_MPI_SHM_BYPASS to ‘enable’ to turn on RDMA data exchange within a single node, which may outperform the regular shared-memory exchange. This normally happens for large (350 KB+) messages.
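A sketch of enabling this for a workload dominated by large intra-node messages (whether it actually helps depends on the fabric and message sizes, so benchmark both settings):

```shell
# Try intra-node RDMA instead of shared memory for large (350 KB+) messages.
export I_MPI_SHM_BYPASS=enable
```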

 

Suggestion 5:

I_MPI_ADJUST_ALLREDUCE Values  Remarks
 1  Recursive doubling algorithm
 2  Rabenseifner’s algorithm
 3  Reduce + Bcast algorithm
 4  Topology-aware Reduce + Bcast algorithm
 5  Binomial gather + scatter algorithm
 6  Topology-aware binomial gather + scatter algorithm
 7  Ring algorithm
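For example, to pin MPI_Allreduce to one algorithm from the table (which value is fastest depends on message size and node count, so this should be benchmarked rather than assumed):

```shell
# Force Rabenseifner's algorithm (value 2) for all MPI_Allreduce calls.
export I_MPI_ADJUST_ALLREDUCE=2
```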

 

Suggestion 6:

I_MPI_WAIT_MODE Values Remarks
 1 Set I_MPI_WAIT_MODE to ‘enable’ to try the wait mode of the progress engine. Processes that wait for incoming messages without polling the fabric(s) can save CPU time.

Apply wait mode to oversubscribed jobs.
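A minimal sketch for an oversubscribed job, using the value from the table:

```shell
# Let waiting ranks sleep instead of spin-polling the fabric,
# freeing CPU time when there are more ranks than cores.
export I_MPI_WAIT_MODE=1
```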

 

References:

  1. 23 Tips for Performance Tuning with the Intel MPI Library (pdf)

Using Intel IMB-MPI1 to check Fabrics and expected performances

In your .bashrc, source the following:

source /usr/local/intel_2015/parallel_studio_xe_2015/bin/psxevars.sh intel64
source /usr/local/intel_2015/impi/5.0.3.049/bin64/mpivars.sh intel64
source /usr/local/intel_2015/composerxe/bin/compilervars.sh intel64
source /usr/local/intel_2015/mkl/bin/mklvars.sh intel64
MKLROOT=/usr/local/intel_2015/mkl

To simulate 3 workloads (pingpong, sendrecv, and exchange) with IMB-MPI1:

$ mpirun -r ssh -RDMA -n 512 -env I_MPI_DEBUG 5 IMB-MPI1
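IMB-MPI1 also accepts benchmark names as arguments, so a run limited to just the three workloads above could be sketched as follows (the rank count is carried over from the command above; the line is echoed here rather than executed):

```shell
# Only the named benchmarks are executed; all others are skipped.
IMB_CMD="mpirun -n 512 -env I_MPI_DEBUG 5 IMB-MPI1 pingpong sendrecv exchange"
echo "$IMB_CMD"
```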

 

Compiling Intel FFTW3 and FFTW2 Interface Wrapper Library

FFTW3 wrappers to Intel MKL are delivered both within Intel MKL and as source code, which can be compiled to build a standalone wrapper library with exactly the same functionality.

The source code and makefiles for the wrappers are found in the interfaces/fftw3xc subdirectory of the Intel MKL directory.

Intel FFTW3 Interface Wrapper Library. Do the same for fftw3xc and fftw3xf:

# cd $MKLROOT
# cd interfaces/fftw3xc
# make libintel64  INSTALL_DIR=$MKLROOT/lib/intel64
# cd $MKLROOT
# cd interfaces/fftw3xf
# make libintel64  INSTALL_DIR=$MKLROOT/lib/intel64

Once compiled, the libraries are placed in $MKLROOT/lib/intel64.
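With the wrapper library in place, an FFTW3-based C program can be linked against MKL instead of stock FFTW. The source file name and the exact link options below are assumptions for illustration; the command is echoed rather than executed:

```shell
MKLROOT=${MKLROOT:-/usr/local/intel_2015/mkl}
# Hypothetical link line: pick up fftw3.h from MKL's fftw include directory
# and resolve FFTW calls through the wrapper library plus MKL itself.
LINK_CMD="icc my_fft.c -I${MKLROOT}/include/fftw -L${MKLROOT}/lib/intel64 -lfftw3xc_intel -mkl=sequential -o my_fft"
echo "$LINK_CMD"
```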

Intel FFTW2 Interface Wrapper Library. Do the same for fftw2xc and fftw2xf

# cd $MKLROOT
# cd interfaces/fftw2xc
# make libintel64  PRECISION=MKL_DOUBLE INSTALL_DIR=$MKLROOT/lib/intel64
# make libintel64  PRECISION=MKL_SINGLE INSTALL_DIR=$MKLROOT/lib/intel64
# cd $MKLROOT
# cd interfaces/fftw2xf
# make libintel64  PRECISION=MKL_DOUBLE INSTALL_DIR=$MKLROOT/lib/intel64
# make libintel64  PRECISION=MKL_SINGLE INSTALL_DIR=$MKLROOT/lib/intel64

Once compiled, the libraries are placed in $MKLROOT/lib/intel64.

Compiling NWChem 6.6 with Intel MPI 5.0.3

Here is a write-up of my computing platform and applications:

  1. NWChem 6.6 (Oct 2015)
  2. Intel Compilers 2015 XE (version 15.0.6)
  3. Intel MPI (5.0.3)
  4. Intel MKL (11.2.4)
  5. InfiniBand Interconnect (OFED 1.5.3)
  6. CentOS 6.3 (x86_64)

Step 1: First things first, source the Intel component settings:

source /usr/local/intel_2015/parallel_studio_xe_2015/bin/psxevars.sh intel64
source /usr/local/intel_2015/impi/5.0.3.049/bin64/mpivars.sh intel64
source /usr/local/intel_2015/composerxe/bin/compilervars.sh intel64
source /usr/local/intel_2015/mkl/bin/mklvars.sh intel64

Step 2: Assuming you are done, download NWChem 6.6 from the NWChem website. You may also want to take a look at the instructions for Compiling NWChem.

I have fewer problems running NWChem when the installation and compile directories are the same. If you recompile, untar from source again; somehow “make clean” does not clean the directories properly.

# tar -zxvf Nwchem-6.6.revision27746-src.2015-10-20.tar.gz
# cd nwchem-6.6

Step 3: Apply All the Patches for the 27746 revision of NWChem 6.6

cd $NWCHEM_TOP
wget http://www.nwchem-sw.org/download.php?f=Xccvs98.patch.gz
gzip -d Xccvs98.patch.gz
patch -p0 < Xccvs98.patch

Here is my nwchem_script_Feb2017.sh. For more detailed information on some of the parameters, see Compiling NWChem.

export TCGRSH=/usr/bin/ssh
export NWCHEM_TOP=/home/melvin/Downloads/nwchem-6.6
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES=all
export LARGE_FILES=TRUE

export ARMCI_NETWORK=OPENIB
export IB_INCLUDE=/usr/include
export IB_LIB=/usr/lib64
export IB_LIB_NAME="-libumad -libverbs -lpthread"

export MSG_COMMS=MPI
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/usr/local/RH6_apps/intel_2015/impi_5.0.3/intel64
export MPI_LIB=$MPI_LOC/lib
export MPI_INCLUDE=$MPI_LOC/include
export LIBMPI="-lmpigf -lmpigi -lmpi_ilp64 -lmpi"

export FC=ifort
export CC=icc

export MKLLIB=/usr/local/RH6_apps/intel_2015/mkl/lib/intel64
export MKLINC=/usr/local/RH6_apps/intel_2015/mkl/include

export PYTHONHOME=/usr
export PYTHONVERSION=2.6
export USE_PYTHON64=y
export PYTHONLIBTYPE=so
sed -i 's/libpython$(PYTHONVERSION).a/libpython$(PYTHONVERSION).$(PYTHONLIBTYPE)/g' config/makefile.h

export HAS_BLAS=yes
export BLAS_SIZE=8 
export BLASOPT="-L$MKLLIB -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lpthread -lm"
export LAPACK_LIBS="-L$MKLLIB -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lpthread -lm"
export SCALAPACK_SIZE=8
export SCALAPACK="-L$MKLLIB -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lmkl_blacs_intelmpi_ilp64 -lpthread -lm"
export USE_64TO32=y

echo "cd $NWCHEM_TOP/src"
cd $NWCHEM_TOP/src

echo "BEGIN --- make realclean "
make realclean
echo "END --- make realclean "

echo "BEGIN --- make nwchem_config "
make nwchem_config
echo "END --- make nwchem_config "

echo "BEGIN --- make"
make CC=icc FC=ifort FOPTIMIZE=-O3 -j4
echo "END --- make "

cd $NWCHEM_TOP/src/util
make CC=icc FC=ifort FOPTIMIZE=-O3 version
make CC=icc FC=ifort FOPTIMIZE=-O3
cd $NWCHEM_TOP/src
make CC=icc FC=ifort FOPTIMIZE=-O3  link
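If the build succeeds, a quick sanity check is to launch the fresh binary on a small input before doing the site install. The input file name and rank count below are placeholders; the launch line is echoed rather than executed:

```shell
NWCHEM_TOP=${NWCHEM_TOP:-/home/melvin/Downloads/nwchem-6.6}
# With NWCHEM_TARGET=LINUX64, the binary lands in $NWCHEM_TOP/bin/LINUX64.
RUN_CMD="mpirun -np 4 $NWCHEM_TOP/bin/LINUX64/nwchem h2o.nw"
echo "$RUN_CMD"
```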

General Site Installation

Determine the local storage path for the install files. (e.g., /usr/local/NWChem).
Make directories

# mkdir /usr/local/nwchem-6.6
# mkdir /usr/local/nwchem-6.6/bin
# mkdir /usr/local/nwchem-6.6/data

Copy binary

# cp $NWCHEM_TOP/bin/LINUX64/nwchem /usr/local/nwchem-6.6/bin
# cd /usr/local/nwchem-6.6/bin
# chmod 755 nwchem

Copy libraries

# cd $NWCHEM_TOP/src/basis
# cp -r libraries /usr/local/nwchem-6.6/data

# cd $NWCHEM_TOP/src/
# cp -r data /usr/local/nwchem-6.6

# cd $NWCHEM_TOP/src/nwpw
# cp -r libraryps /usr/local/nwchem-6.6/data

The Final Lap (From Compiling NWChem)

Each user will need a .nwchemrc file pointing to these default data files. The best plan for new installs is probably to put a global copy in /usr/local/nwchem-6.6/data and make a symbolic link in each user’s $HOME directory. Users would then issue the following command prior to using NWChem: ln -s /usr/local/nwchem-6.6/data/default.nwchemrc $HOME/.nwchemrc

Contents of the default.nwchemrc file based on the above information should be:

nwchem_basis_library /usr/local/nwchem-6.6/data/libraries/
nwchem_nwpw_library /usr/local/nwchem-6.6/data/libraryps/
ffield amber
amber_1 /usr/local/nwchem-6.6/data/amber_s/
amber_2 /usr/local/nwchem-6.6/data/amber_q/
amber_3 /usr/local/nwchem-6.6/data/amber_x/
amber_4 /usr/local/nwchem-6.6/data/amber_u/
spce    /usr/local/nwchem-6.6/data/solvents/spce.rst
charmm_s /usr/local/nwchem-6.6/data/charmm_s/
charmm_x /usr/local/nwchem-6.6/data/charmm_x/

References:

  1. 470. Very briefly: compiling nwchem 6.3 with ifort and mkl
  2. Compiling NWChem from source
  3. Problem building NWChem version 6.5 on IB cluster with MKL & IntelMPI

Compiling swig-3.0.12 on CentOS 6

Compile swig-3.0.12

Download SWIG from http://www.swig.org/index.php.
PCRE needs to be installed on your system to build SWIG; in particular, pcre-config must be available. For more information on SWIG installation, take a look at http://www.swig.org/Doc3.0/SWIGDocumentation.html

# ./configure --prefix=/usr/local/RH6_apps/swig-3.0.12 --with-pcre-prefix=/usr/local/RH6_apps/pcre-8.40 --without-clisp --without-maximum-compile-warnings
# make
# make install

Configuring 2 Gateways on the Same Linux Box

Suppose you have 2 networks on a PC, say:
192.168.1.5/24 (eth0 – Private Network)
172.16.10.4/24 (eth1 – Public Network)

Let’s assume 172.16.10.254 is the default gateway for the 172.16.10.0 network, and there is a router at 192.168.1.254 for the internal network. If you wish to connect to other networks, say the 192.168.5.0, 192.168.6.0, and 192.168.7.0 networks, you will need to add static routes to them:

route add -net 192.168.5.0 netmask 255.255.255.0 gw 192.168.1.254
route add -net 192.168.6.0 netmask 255.255.255.0 gw 192.168.1.254
route add -net 192.168.7.0 netmask 255.255.255.0 gw 192.168.1.254

To make the settings permanent, edit /etc/sysconfig/network-scripts/route-eth0:

192.168.5.0/24 via 192.168.1.254 dev eth0 
192.168.6.0/24 via 192.168.1.254 dev eth0 
192.168.7.0/24 via 192.168.1.254 dev eth0

To check that the settings are OK:

# ip route show