Reinstating user password-less access to compute nodes

Occasionally in a cluster environment, users accidentally delete their SSH keys on the head node and subsequently find that they cannot submit jobs to the queue, or that their MPI jobs cannot scale beyond one node. You can see the symptoms when you turn on verbose mode.

To conduct a quick test,

# ssh -v remote-host

you will see errors similar to those below:

debug1: Unspecified GSS failure.  Minor code may provide more information
Unknown code krb5 195

OR

debug1: Miscellaneous failure
No credentials cache found

To reinstate password-less access to the compute nodes, do the following. First things first: back up the files in your ~/.ssh/ directory.

Step 1: Regenerate the SSH keys
(See also: SSH Login without Password)
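
A minimal sketch, assuming a standard RSA key pair at the default path (press Enter at the passphrase prompt so the key works for password-less login):

# ssh-keygen -t rsa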

Step 2: Append the public key ~/.ssh/id_rsa.pub to ~/.ssh/authorized_keys

# cd ~/.ssh/
# cat id_rsa.pub >> authorized_keys
# chmod 400 ~/.ssh/authorized_keys
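
If password-less login still fails after this, note that sshd is also strict about permissions on the directory and the private key; a common fix:

# chmod 700 ~/.ssh
# chmod 600 ~/.ssh/id_rsa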

Step 3: Try to ssh into the compute nodes. You should now have password-less access to all nodes.
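
As a quick sanity check, you can loop over the nodes; the node names below are hypothetical:

# for n in node01 node02 node03 node04; do ssh $n hostname; done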

rpcbind.socket systemd unit fails to start when IPv6 is disabled

I encountered this error after I disabled IPv6 with the following command:

echo "net.ipv6.conf.all.disable_ipv6 = 1" >> /etc/sysctl.d/ipv6.conf

When I rebooted the server, my NFS services were dysfunctional: the rpcbind.socket systemd unit failed to start. I found the relevant information in Red Hat Bugzilla Bug 1402961, "rpcbind.socket systemd unit fails to start when IPv6 is disabled".

The solution is simply to remove the net.ipv6.conf.all.disable_ipv6 = 1 line from /etc/sysctl.d/ipv6.conf, i.e. re-enable IPv6.
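
A minimal sketch of the fix, assuming the setting lives only in /etc/sysctl.d/ipv6.conf:

# sed -i '/net.ipv6.conf.all.disable_ipv6/d' /etc/sysctl.d/ipv6.conf
# sysctl --system
# systemctl restart rpcbind.socket rpcbind.service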

Using multiple LDFLAGS and CPPFLAGS

In layman's terms:

LDFLAGS refers to linker flags and typically points to user-defined library directories (-L)
CPPFLAGS is used by the preprocessor and typically points to include directories (-I)

For example, here is how I pass multiple LDFLAGS and CPPFLAGS entries, as required by guile-2.2.4:

# ./configure --prefix=/usr/local/guile-2.2.4 LDFLAGS="-L/usr/local/libtool-2.4.6/lib -L/usr/local/gmp-6.1.0/lib" CPPFLAGS="-I/usr/local/libtool-2.4.6/include -I/usr/local/gmp-6.1.0/include"
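
At runtime, the dynamic linker may also need the matching library directories; assuming the same install prefixes as above:

# export LD_LIBRARY_PATH=/usr/local/libtool-2.4.6/lib:/usr/local/gmp-6.1.0/lib:$LD_LIBRARY_PATH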

Reverting to CMake-3.9.6

When I was compiling CMake-3.11.4 with GNU-5.4.0, I encountered the error

"The C++ compiler does not support C++11 (e.g. std::unique_ptr)"

This was rather tricky to solve. I believe upgrading my GNU compilers might fix it; somehow with GNU 5.4, CMake-3.11.4 does not recognize the C++11 support.

But when I downgraded to CMake-3.9.6, it worked immediately without issues. The release files are at https://cmake.org/files/v3.9/
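
For example, to fetch and unpack the 3.9.6 source (the exact tarball name is an assumption based on the usual naming on that page) and point CMAKE_HOME at it:

# wget https://cmake.org/files/v3.9/cmake-3.9.6.tar.gz
# tar -zxvf cmake-3.9.6.tar.gz
# export CMAKE_HOME=$(pwd)/cmake-3.9.6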

# cd $CMAKE_HOME
# ./bootstrap
# gmake

Compiling NWChem-6.8 with Intel MPI 2018u3

Here is a write-up of my computing platform and applications:

  1. NWChem 6.8 (14 December 2017)
  2. Intel Compilers, IMPI and MKL (2018 U3)
  3. Infiniband Interconnect (OFED-4.3-1.0.1)
  4. CentOS 7.4 (x86_64)

Step 1: First things first, source the Intel component settings:

# source /usr/local/intel/2018u3/bin/compilervars.sh intel64
# source /usr/local/intel/2018u3/impi/2018.3.222/bin64/mpivars.sh intel64
# source /usr/local/intel/2018u3/mkl/bin/mklvars.sh intel64
# source /usr/local/intel/2018u3/parallel_studio_xe_2018/bin/psxevars.sh intel64
export NWCHEM_TOP=/usr/local/software/nwchem-6.8/nwchem-6.8
export NWCHEM_MODULES=pnnl
export NWCHEM_TARGET=LINUX64
export NWCHEM_LONG_PATHS=y
export PYTHONHOME=/usr
export PYTHONVERSION=2.7
export PYTHONLIBTYPE=so
export USE_PYTHON64=y
export USE_NOFSCHECK=y
export TCGRSH=/usr/bin/ssh
export LARGE_FILES=y
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC="/usr/local/intel/2018u3/impi/2018.3.222/intel64"
export MPI_INCLUDE="/usr/local/intel/2018u3/impi/2018.3.222/intel64/include/gfortran/5.1.0 -I/usr/local/intel/2018u3/impi/2018.3.222/intel64/include"
export MPI_LIB="/usr/local/intel/2018u3/impi/2018.3.222/intel64/lib/release_mt -L/usr/local/intel/2018u3/impi/2018.3.222/intel64/lib"
export LIBMPI="-lmpifort -lmpi -lmpigi -ldl -lrt -lpthread"
export USE_OPENMP=y
export MKLROOT=/usr/local/intel/2018u3/mkl
export MKLLIB="${MKLROOT}/lib/intel64"
export MKLINC="${MKLROOT}/include"
export HAS_BLAS=y
export BLAS_SIZE=8
export BLASOPT="-L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl"
export LAPACK_SIZE=8
export LAPACK_LIB="$BLASOPT"
export LAPACK_LIBS="$BLASOPT"
export LAPACKOPT="$BLASOPT"
export USE_SCALAPACK=y
export SCALAPACK_SIZE=8
export SCALAPACK="-L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl"
export SCALAPACK_LIB="$SCALAPACK"
export SCALAPACK_LIBS="$SCALAPACK"
export CC=icc
export FC=ifort
export USE_64TO32=y

cd $NWCHEM_TOP/src
#make realclean
make nwchem_config
make 64_to_32
make CC=icc FC=ifort FOPTIMIZE=-O3
cd $NWCHEM_TOP/src/tools
make CC=icc FC=ifort FOPTIMIZE=-O3 version
make CC=icc FC=ifort FOPTIMIZE=-O3
cd $NWCHEM_TOP/src
make CC=icc FC=ifort FOPTIMIZE=-O3 link
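
If the build completes, the nwchem binary should land under $NWCHEM_TOP/bin/$NWCHEM_TARGET; a quick check:

ls -l $NWCHEM_TOP/bin/LINUX64/nwchem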

General Site Installation

Determine the local storage path for the install files (e.g., /usr/local/nwchem-6.8).
Make directories

# mkdir /usr/local/nwchem-6.8
# mkdir /usr/local/nwchem-6.8/bin
# mkdir /usr/local/nwchem-6.8/data

Copy binary

# cp $NWCHEM_TOP/bin/LINUX64/nwchem /usr/local/nwchem-6.8/bin
# cd /usr/local/nwchem-6.8/bin
# chmod 755 nwchem

Copy libraries

# cd $NWCHEM_TOP/src/basis
# cp -r libraries /usr/local/nwchem-6.8/data

# cd $NWCHEM_TOP/src/
# cp -r data /usr/local/nwchem-6.8

# cd $NWCHEM_TOP/src/nwpw
# cp -r libraryps /usr/local/nwchem-6.8/data

The Final Lap (From Compiling NWChem)

Each user will need a .nwchemrc file pointing to these default data files. Putting a global one in /usr/local/nwchem-6.8/data and making a symbolic link in each user's $HOME directory is probably the best plan for new installs. Users would have to issue the following command prior to using NWChem:

ln -s /usr/local/nwchem-6.8/data/default.nwchemrc $HOME/.nwchemrc

Contents of the default.nwchemrc file based on the above information should be:

nwchem_basis_library /usr/local/nwchem-6.8/data/libraries/
nwchem_nwpw_library /usr/local/nwchem-6.8/data/libraryps/
ffield amber
amber_1 /usr/local/nwchem-6.8/data/amber_s/
amber_2 /usr/local/nwchem-6.8/data/amber_q/
amber_3 /usr/local/nwchem-6.8/data/amber_x/
amber_4 /usr/local/nwchem-6.8/data/amber_u/
spce    /usr/local/nwchem-6.8/data/solvents/spce.rst
charmm_s /usr/local/nwchem-6.8/data/charmm_s/
charmm_x /usr/local/nwchem-6.8/data/charmm_x/
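
Users may also want the nwchem binary on their PATH; assuming bash, something like:

export PATH=/usr/local/nwchem-6.8/bin:$PATH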

References:

  1. Compiling NWChem

Adding New Nodes under PBS Professional

Step 1: Copy /etc/pbs.conf from any of the existing nodes to /etc on the new node

# scp -v /etc/pbs.conf root@remotenode:/etc

Step 2: Install the PBS execution RPM on the new node

# rpm -Uvh /usr/local/software/admin/altair/PBSPro_14.2.5/pbspro-execution-14.2.5.20180221140231-0.el7.x86_64.rpm

Step 3: Copy /var/spool/pbs/mom_priv/config from any existing node to /var/spool/pbs/mom_priv of the new node

# scp -v /var/spool/pbs/mom_priv/config root@remotenode:/var/spool/pbs/mom_priv/

Step 4: Restart the PBS service on the new node

# service pbs restart

Step 5: Create the node on the PBS server

# qmgr -c "create node node-name"
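
To verify that the new node has registered with the server, query it with pbsnodes (using the same node-name as above):

# pbsnodes node-name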