Compiling Quantum ESPRESSO-6.5.0 with Intel MPI 2018 on CentOS 7

Step 1: Download Quantum ESPRESSO 6.5.0 from the Quantum ESPRESSO download site, or git-clone QE:

$ git clone https://gitlab.com/QEF/q-e.git
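
The clone tracks the development branch; to build the 6.5.0 release specifically, check out its release tag (assuming the repository's usual qe-X.Y tag naming):

$ cd q-e
$ git checkout qe-6.5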

Step 2: Remember to source the Intel compilers and set MKLROOT in your .bashrc

export MKLROOT=/usr/local/intel/2018u3/mkl
source /usr/local/intel/2018u3/parallel_studio_xe_2018/bin/psxevars.sh intel64
source /usr/local/intel/2018u3/compilers_and_libraries/linux/bin/compilervars.sh intel64
source /usr/local/intel/2018u3/impi/2018.3.222/bin64/mpivars.sh intel64
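
A quick sanity check that the scripts above took effect:

$ which mpiifort
$ echo $MKLROOT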

Step 3: Create a file called setup.sh and copy the contents below into it. The last line runs QE's configure, so executing the script both sets the build environment and configures the source tree.

export F90=mpiifort
export F77=mpiifort
export MPIF90=mpiifort
export CC=mpiicc
export CPP="icc -E"
export CFLAGS=$FCFLAGS
export AR=xiar
export BLAS_LIBS=""
export LAPACK_LIBS="-lmkl_blacs_intelmpi_lp64"
export SCALAPACK_LIBS="-lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64"
export FFT_LIBS="-L$MKLROOT/lib/intel64"
./configure --enable-parallel --prefix=/usr/local/espresso-6.5.0

Step 4: Run setup.sh, then build and install:

$ chmod +x setup.sh
$ ./setup.sh
$ make all -j 16
$ make install
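
After make install, the executables land in /usr/local/espresso-6.5.0/bin. A minimal smoke test of the parallel pw.x binary (scf.in stands in for an input file of your own):

$ export PATH=/usr/local/espresso-6.5.0/bin:$PATH
$ mpirun -np 16 pw.x -input scf.in > scf.out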

 

Intel® Math Kernel Library Link Line Advisor

The Intel® Math Kernel Library (Intel® MKL) is designed to run on multiple processors and operating systems. It is also compatible with several compilers and third-party libraries, and provides different interfaces to the functionality. To support these different environments, tools, and interfaces, Intel MKL provides multiple libraries from which to choose.

 

For more information on generating a link line, see the Intel® Math Kernel Library Link Line Advisor.
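
For example, for the stack used above (Linux*, Intel Fortran, Intel MPI, LP64 interface, sequential threading), the advisor returns a link line along these lines:

-L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl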

 

Online AI Training by Intel (December 2019)

Online AI Training

 

TensorFlow*

MXNet

Intel® Distribution of OpenVINO™ Toolkit

Compiling OpenFOAM-5.0 with Intel-MPI

Minimum required versions (commands to check the installed versions follow the list):

  1. gcc: 4.8.5
  2. cmake: 3.3 (required for ParaView and CGAL build)
  3. boost: 1.48 (required for CGAL build)
  4. fftw: 3.3.7 (optional – required for FFT-related functionality)
  5. Qt: 4.8 (optional – required for ParaView build)
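
As a quick check of what is already installed against the list above:

# gcc --version
# cmake --version
# rpm -q boost zlib qt4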

I’m using Intel-16.0.4 and Intel-MPI-5.1.3.258

Step 1a: Download and Unpack the Sources

# wget -O - http://dl.openfoam.org/source/5-0 | tar xvz
# wget -O - http://dl.openfoam.org/third-party/5-0 | tar xvz

Step 1b: Rename the Directories

# mv OpenFOAM-5.x-version-5.0 OpenFOAM-5.0
# mv ThirdParty-5.x-version-5.0 ThirdParty-5.0

Step 2: Initialize the Intel and Intel-MPI environment and source the OpenFOAM-5.0 bashrc

source /usr/local/intel/bin/compilervars.sh intel64
source /usr/local/intel/parallel_studio_xe_2016.4.072/bin/psxevars.sh intel64
source /usr/local/intel/impi/5.1.3.258/bin64/mpivars.sh intel64
source /usr/local/intel/mkl/bin/mklvars.sh intel64
source /usr/local/OpenFOAM/OpenFOAM-5.0/etc/bashrc
export MPI_ROOT=/usr/local/intel/impi/5.1.3.258/intel64
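
A quick check that both environments were picked up (WM_PROJECT_DIR is set by the OpenFOAM bashrc):

# which mpiicc mpirun
# echo $WM_PROJECT_DIR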

Step 3: Make sure your CentOS-7 environment has the following base packages

# yum install gcc-c++ gcc-gfortran gmp flex flex-devel boost zlib zlib-devel qt4 qt4-devel

Step 4: Edit the OpenFOAM internal bashrc

# vim /usr/local/OpenFOAM/OpenFOAM-5.0/etc/bashrc

Lines 35-36

export WM_PROJECT=OpenFOAM
export WM_PROJECT_VERSION=5.0

Line 45

FOAM_INST_DIR=/usr/local/$WM_PROJECT

Line 60

export WM_COMPILER_TYPE=system

Line 65

export WM_COMPILER=Icc

Line 88

export WM_MPLIB=INTELMPI
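
After editing, re-source the bashrc and confirm the settings took effect:

# source /usr/local/OpenFOAM/OpenFOAM-5.0/etc/bashrc
# echo $WM_COMPILER $WM_MPLIB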

Step 5: Compile OpenFOAM

# ./Allwmake -update -j
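
Once Allwmake completes, a quick way to verify the build is to run the bundled installation test and check that a standard solver is on the PATH:

# foamInstallationTest
# icoFoam -help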

Compiling the Intel BLAS95 and LAPACK95 Interface Wrapper Libraries

BLAS95 and LAPACK95 wrappers to Intel MKL are delivered both as part of Intel MKL and as source code, which can be compiled to build a standalone wrapper library with exactly the same functionality.

The source code for the wrappers and their makefiles can be found in the interfaces/blas95 and interfaces/lapack95 subdirectories of the Intel MKL directory ($MKLROOT).

For BLAS95:

# cd $MKLROOT
# cd interfaces/blas95
# make libintel64  INSTALL_DIR=$MKLROOT/lib/intel64

Once compiled, the library is placed in $MKLROOT/lib/intel64.

For LAPACK95:

# cd $MKLROOT
# cd interfaces/lapack95
# make libintel64  INSTALL_DIR=$MKLROOT/lib/intel64

Once compiled, the library is placed in $MKLROOT/lib/intel64.
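
A minimal sketch of linking a Fortran program against the freshly built wrappers (test_gemm.f90 is a hypothetical source file; -I points at the directory holding the prebuilt lp64 module files, so adjust it if your build placed them elsewhere):

# mpiifort -I$MKLROOT/include/intel64/lp64 test_gemm.f90 \
    -L$MKLROOT/lib/intel64 -lmkl_blas95_lp64 -lmkl_lapack95_lp64 \
    -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl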

 

Intel MPI Parameters to Consider for Performance

Selection of the best available communication fabric

Suggestion 1:

I_MPI_DEVICE  I_MPI_FABRICS  Description
sock          tcp            TCP/IP-capable network fabrics, such as Ethernet and InfiniBand* (through IPoIB*)
shm           shm            Shared memory only
ssm           shm:tcp        Shared memory + TCP/IP
rdma          dapl           DAPL-capable network fabrics, such as InfiniBand*, iWarp*, Dolphin*, and XPMEM* (through DAPL*)
rdssm         shm:dapl       Shared memory + DAPL + sockets
              ofa            OFA-capable network fabrics, including InfiniBand* (through OFED* verbs)
              tmi            TMI-capable network fabrics, including Qlogic* and Myrinet* (through the Tag Matching Interface)
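
To pin a job to a particular fabric, set I_MPI_FABRICS before launching (./your_app is a placeholder for the real executable):

$ export I_MPI_FABRICS=shm:dapl
$ mpirun -n 64 ./your_app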

 

Suggestion 2:

I_MPI_DAPL_UD Values  Description
enable                • The connectionless feature works for DAPL fabrics only.
                      • Works with OFED 1.4.2 and 2.0.24 or higher.
                      • Provides better scalability.
                      • Significantly reduces memory requirements.
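
A sketch of enabling it for a large DAPL run (placeholder executable again):

$ export I_MPI_FABRICS=shm:dapl
$ export I_MPI_DAPL_UD=enable
$ mpirun -n 1024 ./your_app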

 

Suggestion 3:

I_MPI_PERHOST Values  Remarks
1                     Round-robin distribution (default value)
all                   Maps processes to all logical CPUs on a node
allcores              Maps processes to all physical cores on a node
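
For example, to place four consecutive ranks on each host (the -perhost option of mpirun is the command-line equivalent):

$ export I_MPI_PERHOST=4
$ mpirun -n 64 ./your_app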

 

Suggestion 4:

I_MPI_SHM_BYPASS Values  Remarks
disable (default)        Set I_MPI_SHM_BYPASS to enable to turn on RDMA data exchange within a single node, which may outperform the regular shared-memory exchange. This normally happens for large (350 KB+) messages.
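
To experiment with it on a large-message workload:

$ export I_MPI_SHM_BYPASS=enable
$ mpirun -n 32 ./your_app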

 

Suggestion 5:

I_MPI_ADJUST_ALLREDUCE Values  Algorithm
1                              Recursive doubling
2                              Rabenseifner's
3                              Reduce + Bcast
4                              Topology-aware Reduce + Bcast
5                              Binomial gather + scatter
6                              Topology-aware binomial gather + scatter
7                              Ring
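
A simple way to pick the best algorithm for your cluster is to benchmark each value with the IMB-MPI1 Allreduce test:

$ for alg in 1 2 3 4 5 6 7; do
>   I_MPI_ADJUST_ALLREDUCE=$alg mpirun -n 64 IMB-MPI1 allreduce
> done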

 

Suggestion 6:

I_MPI_WAIT_MODE Values  Remarks
enable                  Set I_MPI_WAIT_MODE to enable to try the wait mode of the progress engine. Processes that wait for incoming messages without polling the fabric(s) can save CPU time.

Apply wait mode to oversubscribed jobs.
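
A sketch for an oversubscribed run (more ranks than physical cores on the node):

$ export I_MPI_WAIT_MODE=enable
$ mpirun -n 128 ./your_app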

 

References:

  1. 23 Tips for Performance Tuning with the Intel MPI Library (PDF)

Using Intel IMB-MPI1 to Check Fabrics and Expected Performance

In your .bashrc, source the following:

source /usr/local/intel_2015/parallel_studio_xe_2015/bin/psxevars.sh intel64
source /usr/local/intel_2015/impi/5.0.3.049/bin64/mpivars.sh intel64
source /usr/local/intel_2015/composerxe/bin/compilervars.sh intel64
source /usr/local/intel_2015/mkl/bin/mklvars.sh intel64
export MKLROOT=/usr/local/intel_2015/mkl

To run three of the benchmarks (PingPong, Sendrecv, and Exchange) with IMB-MPI1:

$ mpirun -r ssh -RDMA -n 512 -env I_MPI_DEBUG 5 IMB-MPI1 pingpong sendrecv exchange