Compiling CPMD-3.17.1 with Intel-13.0.1.117 and OpenMPI-1.8.3

I’m assuming you have compiled OpenMPI with the Intel compilers. If you are not sure how, see the blog entry
Compiling OpenMPI 1.6.5 with Intel 12.1.5 on CentOS 6

To get the CPMD source code, go to http://www.cpmd.org/

Step 1: Generate the Makefile from the CPMD source directory

cd ~/CPMD-3.17.1/SOURCE
./mkconfig.sh IFORT-AMD64-MPI > Makefile

Step 2: I’m using the stock CentOS 6 BLAS, LAPACK and ATLAS libraries. Make sure your Makefile matches the configuration below.

#--------------- Default Configuration for IFORT-AMD64-MPI ---------------
SRC  = .
DEST = .
BIN  = .
FFLAGS = -pc64  -tpp6 -O2 -unroll
#LFLAGS =  -L. -latlas_x86_64
LFLAGS =  -L/usr/lib64/atlas -llapack -lblas
CFLAGS = -O2 -Wall -m64
CPP = /lib/cpp -P -C -traditional
CPPFLAGS = -D__Linux -D__PGI -DFFT_DEFAULT -DPOINTER8 -DLINUX_IFC \
-DPARALLEL
NOOPT_FLAG =
CC = mpicc
FC = mpif77 -c
LD = mpif77 -i-static
AR = ar
#----------------------------------------------------------------------------

Step 3: Compile CPMD

# make

If the compilation succeeds, it should generate a cpmd.x executable.

Step 4: Pathing
Make sure your $PATH includes the directory containing the cpmd.x executable. It is also important to check that the shared libraries are properly linked to the executable:

# ldd cpmd.x
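
A minimal sketch of adding cpmd.x to your PATH, assuming the executable stays in ~/CPMD-3.17.1/SOURCE (the path is an assumption; adjust it to your install). Add this to your .bashrc:

export PATH=$HOME/CPMD-3.17.1/SOURCE:$PATH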

Step 5: Test your executable. Go to the CPMD Consortium site to download cpmd-test.tar.gz for testing.
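
A typical test invocation looks like the following; the input file name and pseudopotential directory here are placeholders, so substitute the test case you unpacked:

$ mpirun -np 4 cpmd.x test-case.inp /path/to/pseudopotentials > output.log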

LSF retained the original Max Locked Memory and not the updated one

The value of “max locked memory” has been modified at the operating system level, but LSF still returns the original value.
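
For reference, the OS-level change is typically made in /etc/security/limits.conf; a sketch (the values here are illustrative, not taken from this cluster):

*    soft    memlock    unlimited
*    hard    memlock    unlimited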

Symptoms: LSF still reports the original limit

[user1@cluster-h00 ~]$ bsub -q myQueue -W 120:00 -n 16 -P myProjectGroup -m compute-node1 -I ulimit -a
Job <32400> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on compute-node1>>
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1027790
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1027790
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

To resolve this issue, restart the LSF sbatchd daemon on the affected host so that it picks up the new limit:

# badmin hshutdown
# badmin hstartup
[user1@cluster-h00 ~]$ bsub -q gpgpu -m compute-node1 -I ulimit -a
Job <32490> is submitted to queue <gpgpu>.
<<Waiting for dispatch ...>>
<<Starting on compute-node1>>
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 515133
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 515133
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

References:

  1. LSF does not recognize that “max locked memory” has been updated

Algorithm negotiation failed for SSH Secure Shell Client

If you are using the dated SSH Secure Shell Client 3.2.9, you may have issues connecting to a more up-to-date OpenSSH server.


Upgrading the client is the recommended fix; if you cannot change the client, you will have to re-enable the legacy algorithms on the OpenSSH server. Add the following:

# vim /etc/ssh/sshd_config
# Ciphers
Ciphers aes128-cbc,aes192-cbc,aes256-cbc,blowfish-cbc,arcfour
KexAlgorithms diffie-hellman-group1-sha1
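
Then restart sshd for the change to take effect:

# service sshd restart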

*If you are using Centrify-OpenSSH, you have to modify /etc/centrifydc/ssh/sshd_config and make the same change.

References:

  1. Bug 1228013 – Server responded “Algorithm negotiation failed”

Enable Centrify Agent to read UID and GID from Centrify DirectManage Access Manager

We purchased Centrify Standard and set up DirectManage Access Manager. Next we proceeded to install the client agent on the compute nodes.

After unpacking and installing the agent, we ran:

# getent passwd  |grep kittycool
kittycool:x:1304567321211:1304567321211:kittycool:/home/kittycool:/bin/bash
kittycool:x:10001:10001:kittycool:/home/kittycool:/bin/bash

Apparently, getent passwd is returning two entries for the user: one with the Active Directory auto-generated UID and one with the UID defined in DirectManage Access Manager, and the two differ.

To resolve this issue, specify the zone used by DirectManage Access Manager when joining the domain, so that the user's UID is taken from DirectManage Access Manager.

# adjoin -z cluster -u OU_Administrator  staff.mycompany.com.sg -c "staff.mycompany.com.sg/HPC/Computers"
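
To verify that the node has joined the domain in the intended zone, you can run adinfo from the Centrify agent (the output fields vary by agent version):

# adinfo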

To check that it now displays the correct UID and GID:

# getent passwd  |grep kittycool
kittycool:x:10001:10001:kittycool:/home/kittycool:/bin/bash

Compiling and Installing mfix-2016 with Intel MPI 5.0.3

The document to install MFIX can be found at https://mfix.netl.doe.gov/download/mfix/mfix_current_documentation/mfix_user_guide.pdf

We compiled with Intel 15.0.6 and Intel MPI 5.0.3. Once the build environment is initialised (see the note at the end of this section), you can compile with the following configuration parameters:

# ./configure FC=mpif90 FCFLAGS='-g -O2' --prefix=/usr/local/mfix-2016.1_impi --enable-dmp
# make -j 16
# make install

Copy libmfix.a to /usr/local/mfix-2016.1_impi:

# mkdir /usr/local/mfix-2016.1_impi/lib
# cp libmfix.a /usr/local/mfix-2016.1_impi/lib
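
As noted above, the Intel compiler and MPI environments must be initialised before running configure. A sketch for your .bashrc, assuming Intel 2015 install paths like those used elsewhere on this cluster (adjust to yours):

source /usr/local/intel/composerxe/bin/compilervars.sh intel64
source /usr/local/intel/impi/5.0.3.049/bin64/mpivars.sh intel64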

Compiling FDS-SMV with Intel MPI-5.0.3 on CentOS 6

Download the FDS development zip from the fds-smv GitHub repository.

Step 1: Unzip the fds-development.zip

$ unzip fds-development.zip

Step 2: Update .bashrc

.....
.....
export IFORT_COMPILER=/usr/local/intel_2015/composerxe/bin
#FDS environment -----------------------
export MPIDIST_ETH=/usr/local/intel_2015/impi/5.0.3.049/bin64
export MPIDIST_IB=/usr/local/intel_2015/impi/5.0.3.049/bin64
source ~/.bashrc_fds
#FDS -----------------------------------
.....
.....

Step 3: Initialise Intel Compilers

# touch /etc/profile.d/intel.sh
# vim /etc/profile.d/intel.sh
source /usr/local/intel_2015/composerxe/bin/compilervars.sh intel64

Step 4: Compilation

# cd /home/user1/Downloads/fds-smv-development/FDS_Compilation/mpi_intel_linux_64ib
# ./make_fds.sh

Step 5: Test

# ./fds_mpi_intel_linux_64ib

Fire Dynamics Simulator

 Current Date     : July 21, 2016  13:13:56
 Version          : FDS 6.5.1
 Revision         :
 Revision Date    :
 Compilation Date : Jul 21, 2016  12:16:33

 MPI Enabled; Number of MPI Processes:          1
 OpenMP Enabled; Number of OpenMP Threads:   4

 MPI version: 3.0
 MPI library version: Intel(R) MPI Library 5.0 Update 3 for Linux* OS


 Consult FDS Users Guide Chapter, Running FDS, for further instructions.

 Hit Enter to Escape...

Compiling LAMMPS-14May16 with Intel-15.0.6 and Intel-MPI-5.0.3

Step 1: Remember to initialise the Intel environment in your .bashrc:

source /usr/local/intel/impi/5.0.3.049/bin64/mpivars.sh intel64
source /usr/local/intel/composerxe/bin/compilervars.sh intel64
source /usr/local/intel/mkl/bin/mklvars.sh intel64

Step 2: Untar LAMMPS

# tar -zxvf lammps-stable.tar.gz

Step 3: Prepare selected libraries for lammps

3a. lib/reax

# make -f Makefile.gfortran

3b. lib/meam

# make -f Makefile.ifort

3c. lib/poems

# make -f Makefile.icc

3d. lib/colvars

# make -f Makefile.g++

Step 4: Select the required packages. Go to the src directory

# cd src

4a. Check which packages are included

# make package-status
Installed YES: package ASPHERE
Installed YES: package BODY
Installed YES: package CLASS2
Installed YES: package COLLOID
Installed YES: package COMPRESS
Installed YES: package CORESHELL
Installed YES: package DIPOLE
Installed YES: package FLD
Installed  NO: package GPU
Installed YES: package GRANULAR
Installed  NO: package KIM
Installed YES: package KOKKOS
  src/pair_lj_sdk_kokkos.cpp does not exist
  src/pair_lj_sdk_kokkos.h does not exist
Installed YES: package KSPACE
Installed YES: package MANYBODY
Installed YES: package MC
Installed YES: package MEAM
Installed YES: package MISC
Installed YES: package MOLECULE
Installed YES: package MPIIO
Installed YES: package OPT
Installed YES: package PERI
Installed YES: package POEMS
Installed YES: package PYTHON
Installed YES: package QEQ
Installed YES: package REAX
Installed YES: package REPLICA
Installed YES: package RIGID
Installed YES: package SHOCK
Installed YES: package SNAP
Installed YES: package SRD
Installed  NO: package VORONOI
Installed YES: package XTC
.........

4b. Include all the standard packages

# make yes-standard

4c. Exclude packages that are not required

# make no-voronoi
# make no-kim
# make no-gpu
# make no-kokkos

Step 5: Install the User-Contributed Intel Optimised Package (user-intel) and the User-Contributed OpenMP Package (user-omp)

# make yes-user-intel
# make yes-user-omp

Step 6: Compile LAMMPS

Check the available make options, then from the src directory:

# make intel_cpu_intelmpi -j 16

Step 6a:

If in the midst of compilation you land in the error ld: unable to locate -lompstub, this is because Intel 2015 deprecated the ompstub library in favour of libiompstubs5. Just go to the Intel directory and create a symlink:

# cd /usr/local/intel_2015/composerxe/lib/intel64
# ln -s  libiompstubs5.so libompstub.so

Step 7: Create /usr/local/lammps-14May16 and copy the libraries. Go to the LAMMPS root directory
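
First create the target directory and its bin subdirectory:

# mkdir -p /usr/local/lammps-14May16/bin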

# cp -Rv bench /usr/local/lammps-14May16
# cp -Rv doc /usr/local/lammps-14May16
# cp -Rv examples /usr/local/lammps-14May16
# cp -Rv potentials /usr/local/lammps-14May16
# cp README /usr/local/lammps-14May16
# cp -Rv tools /usr/local/lammps-14May16
# cp -Rv lib /usr/local/lammps-14May16
# cp src/lmp_intel_cpu_intelmpi /usr/local/lammps-14May16/bin

7a. Create a softlink

# ln -s /usr/local/lammps-14May16/bin/lmp_intel_cpu_intelmpi lammps
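
A quick sanity check, assuming the softlink is on your PATH and using one of the bundled benchmark inputs (in.lj from the bench directory):

$ mpirun -np 16 lammps -in /usr/local/lammps-14May16/bench/in.lj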

References:

  1. Quick install guide for LAMMPS on Linux cluster
  2. LAMMPS Optimized for Intel on Quad Socket Xeon


Compiling Quantum ESPRESSO-5.4.0 with Intel Parallel Studio 2016 on CentOS 6

Before Compilation,

Step 1: Download Quantum ESPRESSO 5.4.0 from Quantum ESPRESSO Download Site

Step 2: Remember to source the Intel compilers and set MKLROOT in your .bashrc:

export MKLROOT=/usr/local/intel_2016/mkl/lib
source /usr/local/intel_2016/bin/compilervars.sh intel64

Compilation of QE-5.4.0

Step 3: Make a file called setup.sh and copy the contents below into it.

export F90=mpiifort
export F77=mpiifort
export MPIF90=mpiifort
export CC=mpiicc
export CPP="icc -E"
export CFLAGS="-g -O3"
export AR=xiar
export BLAS_LIBS=""
export LAPACK_LIBS="-lmkl_blacs_intelmpi_lp64"
export SCALAPACK_LIBS="-lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64"
export FFT_LIBS="-L$MKLROOT/intel64"
./configure  --enable-parallel --prefix=/usr/local/espresso-5.4.0
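
Make the script executable before running it:

# chmod +x setup.sh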
# ./setup.sh
# make all -j 16
# make install

References:

  1. [Pw_forum] Compilation with intel compiler and MKL
  2. Quantum ESPRESSO: Compiling and Choice of Libraries

GPFS Nodes being Expelled by Failed GPFS Clients

According to the IBM Developer Wiki Troubleshooting Debug Expel:


  • Disk Lease Expiration – GPFS uses a mechanism referred to as a disk lease to prevent file system data corruption by a failing node. A disk lease grants a node the right to submit IO to a file system. File system disk leases are managed by the Cluster Manager of the file system’s home cluster. A node must periodically renew its disk lease with the Cluster Manager to maintain its right to submit IO to the file system. When a node fails to renew a disk lease with the Cluster Manager, the Cluster Manager marks the node as failed, revokes the node’s right to submit IO to the file system, expels the node from the cluster, and initiates recovery processing for the failed node.
  • Node Expel Request – GPFS uses a mechanism referred to as a node expel request to prevent file system resource deadlocks. Nodes in the cluster require reliable communication amongst themselves to coordinate sharing of file system resources. If a node fails while owning a file system resource, a deadlock may ensue. If a node in the cluster detects that another node owning a shared file system resource may have failed, it will send a message to the file system Cluster Manager requesting that the failed node be expelled from the cluster to prevent a shared file system resource deadlock. When the Cluster Manager receives a node expel request, it determines which of the two nodes should be expelled from the cluster and takes similar action as described for the disk lease expiration.

But in my case, I had an errant failed GPFS client node in the cluster. All the other legitimate GPFS clients tried to expel this failed node, but got expelled instead: the errant node remained while the legitimate ones were expelled. The only solution was to power off the errant node, after which the entire GPFS file system became operational again. Here is an excerpt from the log file of the NSD nodes.

In fact, a lot of hints can be found in /var/adm/ras/mmfs.log.latest on any of the NSD nodes. You should be able to locate the offending node there.
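
To watch for expel events as they happen, you can tail this log on an NSD node:

# tail -f /var/adm/ras/mmfs.log.latest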


Fri May 27 16:34:53.249 2016: Expel 172.16.20.5 (goldsvr1) request from 192.168.104.34 (compute186). Expelling: 192.168.104.34 (compute186)
Fri May 27 16:34:53.259 2016: Recovering nodes: 192.168.104.34
Fri May 27 16:34:53.311 2016: Recovered 1 nodes for file system gpfs3.
Fri May 27 16:34:55.636 2016: Accepted and connected to 10.0.104.34 compute186 <c0n135>
Fri May 27 16:39:13.333 2016: Expel 172.16.20.5 (goldsvr1) request from 192.168.104.45 (compute197). Expelling: 192.168.104.45 (compute197)
Fri May 27 16:39:13.334 2016: VERBS RDMA closed connection to 192.168.104.45 compute197 on mlx4_0 port 1
Fri May 27 16:39:13.344 2016: Recovering nodes: 192.168.104.45
Fri May 27 16:39:13.393 2016: Recovered 1 nodes for file system gpfs3.
Fri May 27 16:39:15.725 2016: Accepted and connected to 10.0.104.45 compute197 <c0n141>
Fri May 27 16:40:18.570 2016: VERBS RDMA accepted and connected to 192.168.104.45 on mlx4_0 port 1

Install Nvidia CUDA-7.5 environment in CentOS 6

This note is derived from How to install Nvidia CUDA environment in RHEL 6? It works for me.

Step 1: Install the kernel development packages

# yum install kernel-devel kernel-headers -y

Step 2: Download and Install Cuda Toolkit

Cuda Downloads Site

Step 3: Configure CUDA Toolkit

# echo -e "/usr/local/cuda-7.5/lib64\n/usr/local/cuda-7.5/lib" > /etc/ld.so.conf.d/cuda.conf
# echo 'export PATH=/usr/local/cuda-7.5/bin:$PATH' > /etc/profile.d/cuda.sh
# /sbin/ldconfig

Step 4: Disable the Nouveau Driver

# echo -e "\nblacklist nouveau" >> /etc/modprobe.d/blacklist.conf
# dracut -f /boot/initramfs-`rpm -qa kernel --queryformat "%{PROVIDEVERSION}.%{ARCH}\n" | tail -1`.img `rpm -qa kernel --queryformat "%{PROVIDEVERSION}.%{ARCH}\n" | tail -1`

Step 5: Reboot the Server

Step 6: Check that the Nvidia Card is Detected

[root@comp1 ~]# lspci -d "10de:*" -v
84:00.0 3D controller: NVIDIA Corporation GK110BGL [Tesla K40m] (rev a1)
Subsystem: NVIDIA Corporation Device 097e
Flags: bus master, fast devsel, latency 0, IRQ 64
Memory at c9000000 (32-bit, non-prefetchable) [size=16M]
Memory at 3c400000000 (64-bit, prefetchable) [size=16G]
Memory at 3c3fe000000 (64-bit, prefetchable) [size=32M]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] #19
Kernel driver in use: nvidia
Kernel modules: nvidia, nouveau, nvidiafb
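
To confirm the CUDA toolkit itself is on the PATH and working, check the compiler version:

# nvcc --version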