February 15, 2024 by kittycool only

Building LAMMPS using CMAKE with OpenMPI on Rocky Linux 8

What is LAMMPS (briefly)?

LAMMPS is a classical molecular dynamics code with a focus on materials modeling. It’s an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. More Information on the software, do take a look at https://www.lammps.org/

Where to Download?

You can download the latest stable from Download LAMMPS

Step 1: Ensure Prerequisites are present

You may want to use HPCX which has optimised OpenMPI. Do take a look at Installing and using Mellanox HPC-X Software Toolkit
You will need FFTW. Compiling FFTW-3.3.10 with OpenMPI on Rocky Linux 8

Step 2: Download and build LAMMPS

For more information, do take a look at https://docs.lammps.org/Install_tarball.html

$ tar -zxvf lammps-stable.tar.gz

$ cd lammps-2Aug203
$ mkdir build
$ touch setup.sh
$ vim setup.sh

Inside the setup.sh

cmake   -C ../cmake/presets/most.cmake ../cmake             \
        -D CMAKE_INSTALL_PREFIX=/usr/local/lammps-2Aug2023  \
        -D BUILD_MPI=on                                     \
        -D BUILD_SHARED_LIBS=yes                            \
        -D FFT=FFTW3                                        \
        -D FFTW3_INCLUDE_DIRS=${FFTW_INC}                   \
        -D FFTW3_LIBRARIES=${FFTW_LIB}/libfftw3_mpi.a

Notes:
The -C ../cmake/presets/most.cmake command adds the packages that don’t need extra libraries.

Make and Compile……

$ make -j 16 
$ make install

References:

February 13, 2024 by kittycool only

Compiling FFTW-3.3.10 with OpenMPI on Rocky Linux 8

For detailed explanation and information, do take a look at FFTW Installation on UNIX. For my installation.

We will be focusing on using Nvidia hpcx only for this blog. To compile Nvidia hpcx, do take a look at Installing and using Mellanox HPC-X Software Toolkit

You may want to module use which come in the hpcx installation

export HPCX_HOME=/usr/local/hpcx-v2.15-gcc-MLNX_OFED_LINUX-5-redhat8-cuda12-gdrcopy2-nccl2.17-x86_64
module use $HPCX_HOME/modulefiles

Next, I used the following parameters that suit my HPC Environment. The default installation is already double-precision. I needed MPI, OPenMPI and needs AVX512…..

# ./configure --prefix=/usr/local/fftw-3.3.10 --enable-threads --enable-openmp --enable-mpi --enable-avx512
# make && make install

References:

FFTW Installation on UNIX

February 12, 2024 by kittycool only

Installing ORCA-5.0.4 on Rocky Linux 8 with OpenMPI

ORCA is a general-purpose quantum chemistry package that is free of charge for academic users. The Project and Download Website can be found at ORCA Forum. The current version is 5.0.4.

The current prerequisites that I have used were OpenMPI-4.1.1 and System GNU which is 8.5.

Unless I have missed something, the packages of ORCA-5.0.4 has been split into 3 different packages which you have to untar and combine together

orca_5_0_4_linux_x86-64_openmpi411_part1
orca_5_0_4_linux_x86-64_openmpi411_part2
orca_5_0_4_linux_x86-64_openmpi411_part3

How do I untar the packages?

The first thing is to untar all the packages separately first. Assuming you are untarring at the /usr/local/

$ tar -xf orca_5_0_4_linux_x86-64_openmpi411_part1.tar.xz
$ tar -xf orca_5_0_4_linux_x86-64_openmpi411_part2.tar.xz
$ tar -xf orca_5_0_4_linux_x86-64_openmpi411_part3.tar.xz

How do I do with all the untarred packages?

Copy all the untar files into /usr/local/orca-5.0.4.

cp -rv ../orca_5_0_4_linux_x86-64_openmpi411_part1/* .
cp -rv ../orca_5_0_4_linux_x86-64_openmpi411_part2/* .
cp -rv ../orca_5_0_4_linux_x86-64_openmpi411_part3/* .

How to Compile OpenMPI-4.1.1?

Although the Compiling OpenMPI-4.1.5 for ROCEv2 with GNU-8.5 is of a higher version of OpenMPI, the principle and parameters can still be used.

How do I Put them Together?

If you are not using the Module Environment, you can consider installing. For more information do take a look at Installing Environment Modules on Rocky Linux 8.5. All you need to do is then is to load the additional module such as OpenMPI as a prerequisites. Alternatively, you can set the PATH, LD_LIBRARY_PATH of OpenMPI something like this.

export OPENMPI_HOME=/usr/local/openmpi-4.1.1
export PATH=$PATH:$OPENMPI_HOME/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$OPENMPI_HOME/lib:$OPENMPI_HOME/lib64
export MANPATH=$MANPATH:$OPENMPI_HOME/share
export PATH=$PATH:/usr/local/orca-5.0.4

If you are using without Module Environment, you may want to

orca $INPUT > $OUTPUT

References:

Installing ORCA

October 30, 2023 by kittycool only

Enabling Nvidia Tesla 4 x A100 with NVLink for MPI

I was having issues with the Applications like NetKET to detect and enable MPI.

Diagnosis

I have installed OpenMPI and enabled CUDA during the configuration.
CUDA Libraries including nvidia-smi has been installed without issue. But running, nvidia-smi topo –matrix, I am not able to see NVLink similar to

In fact, when I run NetKet on CUDA with MPI, the error that was generated was

mpirun noticed that process rank 0 with PID 0 on node gpu1 exited on signal 11 (Segmentation fault)."

Solution

This forum entry provided some enlightenment. https://forums.developer.nvidia.com/t/cuda-initialization-error-on-8x-a100-gpu-hgx-server/250936

The solution was to disable the Multi-instance GPU Mode which is enabled by default. Reboot the Server and it should see

nvidia-smi -mig 0

Enabling Persistence Mode

Make sure the configuration stays after a reboot.

# systemctl enable nvidia-persistenced.service
# systemctl start nvidia-persistenced.service

September 27, 2023 by kittycool only

Retrieving OpenMPI Configuration

If you need to find out the information on your configuration setting, you may want to use the below commands

$ ./ompi_info -all|grep 'command line'
 Configure command line: '--prefix=/build-result/hpcx-v2.16-gcc-mlnx_ofed-redhat8-cuda12-gdrcopy2-nccl2.18-x86_64/ompi' '--with-libevent=internal' '--enable-mpi1-compatibility' '--without-xpmem' '--with-cuda=/hpc/local/oss/cuda12.1.1' '--with-slurm' '--with-platform=contrib/platform/mellanox/optimized' '--with-hcoll=/build-result/hpcx-v2.16-gcc-mlnx_ofed-redhat8-cuda12-gdrcopy2-nccl2.18-x86_64/hcoll' '--with-ucx=/build-result/hpcx-v2.16-gcc-mlnx_ofed-redhat8-cuda12-gdrcopy2-nccl2.18-x86_64/ucx' '--with-ucc=/build-result/hpcx-v2.16-gcc-mlnx_ofed-redhat8-cuda12-gdrcopy2-nccl2.18-x86_64/ucc'

If you wish to look at the full configuration

$ ./ompi_info
Package: Open MPI root@hpc-kernel-03 Distribution
Open MPI: 4.1.5rc2
Open MPI repo revision: v4.1.5rc1-17-gdb10576f40
Open MPI release date: Unreleased developer copy
Open RTE: 4.1.5rc2
Open RTE repo revision: v4.1.5rc1-17-gdb10576f40
Open RTE release date: Unreleased developer copy
OPAL: 4.1.5rc2
OPAL repo revision: v4.1.5rc1-17-gdb10576f40
OPAL release date: Unreleased developer copy
MPI API: 3.1.0
Ident string: 4.1.5rc2
Prefix: /usr/local/hpcx-v2.16-gcc-mlnx_ofed-redhat8-cuda12-gdrcopy2-nccl2.18-x86_64/ompi
.....
.....
.....

July 9, 2023 by kittycool only

Compiling OpenMPI-4.1.5 for ROCEv2 with GNU-8.5

https://docs.open-mpi.org/en/v5.0.x/release-notes/networks.html

Prerequisites 1

First thing first, You may want to check whether you are using RoCE. Do take a look at Installing RoCE using Mellanox (Nvidia) OFED package

Prerequisites 2

Do check whether you have ucx. You can do a dnf install

# dnf install ucx ucx-devel

Alternatively, you can do a manual install. For information on how to install, do take a look at http://openucx.org/wp-content/uploads/UCX_install_guide.pdf

# wget https://github.com/openucx/ucx/releases/download/v1.4.0/ucx-1.4.0.tar.gz
$ tar xzf ucx-1.4.0.tar.gz
$ cd ucx-1.4.0
$ ./contrib/configure-release --prefix=/usr/local/ucx-1.4.0
$ make -j8 
$ make install

Prerequisites 3

Make sure you have install GNU and GNU-C++. This can be done easily using the

# dnf install gcc-c++ gcc

Step 1: Download the OpenMPI package

You can go to OpenMPI to download the latest package at (https://www.open-mpi.org/software/ompi/v4.1/). The latest one at the point of writing is OpenMPI-4.1.

Step 2: Compile the Package

$ ./configure --prefix=/usr/local/openmpi-4.1.5 --enable-mpi-cxx --with-devel-headers --with-ucx --with-verbs --with-slurm=no
$ make && make install

Step 3: To run the MPIRUN using ROCE, do the following.

You may want to see Network Support Information on OpenMPI

$ mpirun --np 12 --hostfile path/to/hostfile --mca pml ucx -x -x UCX_NET_DEVICES=mlx5_0:1 ........

References:

July 6, 2022 by kittycool only

Encountering shm_open permission denied issues with hpcx

If you are using Nvidia hpc-x and encountering issues like the one below during your MPI Run

shm_open(file_name=/ucx_shm_posix_77de2cf3 flags=0xc2) failed: Permission denied

The error message indicates that the shared memory has no permission to be used, The permission of /dev/shm is found to be 755, not 777, causing the error. The issue can be resolved after the permission is changed to 777. To change and verify the changes:

% chmod 777 /dev/shm 
% ls -ld /dev/shm
drwxrwxrwx 2 root root 40 Jul  6 15:18 /dev/sh

July 2, 2022 by kittycool only

Installing CP2K with Nvidia HPCX on Rocky Linux 8.5

What is HPCX?

NVIDIA® HPC-X® is a comprehensive software package that includes Message Passing Interface (MPI), Symmetrical Hierarchical Memory (SHMEM) and Partitioned Global Address Space (PGAS) communications libraries, and various acceleration packages. For more information, do take a look at https://developer.nvidia.com/networking/hpc-x

What is CP2K?

CP2K is a quantum chemistry and solid state physics software package that can perform atomistic simulations of solid state, liquid, molecular, periodic, material, crystal, and biological systems. CP2K provides a general framework for different modeling methods such as DFT using the mixed Gaussian and plane waves approaches GPW and GAPW. Supported theory levels include DFTB, LDA, GGA, MP2, RPA, semi-empirical methods (AM1, PM3, PM6, RM1, MNDO, …), and classical force fields (AMBER, CHARMM, …). CP2K can do simulations of molecular dynamics, metadynamics, Monte Carlo, Ehrenfest dynamics, vibrational analysis, core level spectroscopy, energy minimisation, and transition state optimization using NEB or dimer method. (Detailed overview of features.). For more information, do take a look at https://www.cp2k.org/

Getting the CP2K

git clone --recursive https://github.com/cp2k/cp2k.git cp2k

Unpack hpcx and Optimised OpenMPI Libraries. For more information on installation, do take a look at Installing and Loading HPC-X

Extract hpcx.tbz into your current working directory.

% tar -xvf hpcx.tbz
% cd hpcx
% export HPCX_HOME=$PWD
% module use $HPCX_HOME/modulefiles
% module load hpcx

Use the CP2K Toolchain to Compile for the easiest

% cd cp2k
% cd /usr/local/software/cp2k/tools/toolchain
% ./install_cp2k_toolchain.sh --no-check-certificate --with-openmpi --with-sirius=no

Compiling the CP2K

.....
.....
==================== generating arch files ====================
arch files can be found in the /usr/local/software/cp2k/tools/toolchain/install/arch subdirectory
Wrote /usr/local/software/cp2k/tools/toolchain/install/arch/local.ssmp
Wrote /usr/local/software/cp2k/tools/toolchain/install/arch/local_static.ssmp
Wrote /usr/local/software/cp2k/tools/toolchain/install/arch/local.sdbg
Wrote /usr/local/software/cp2k/tools/toolchain/install/arch/local_coverage.sdbg
Wrote /usr/local/software/cp2k/tools/toolchain/install/arch/local.psmp
Wrote /usr/local/software/cp2k/tools/toolchain/install/arch/local.pdbg
Wrote /usr/local/software/cp2k/tools/toolchain/install/arch/local_static.psmp
Wrote /usr/local/software/cp2k/tools/toolchain/install/arch/local_warn.psmp
Wrote /usr/local/software/cp2k/tools/toolchain/install/arch/local_coverage.pdbg
========================== usage =========================
Done!
Now copy:
  cp /usr/local/software/cp2k/tools/toolchain/install/arch/* to the cp2k/arch/ directory
To use the installed tools and libraries and cp2k version
compiled with it you will first need to execute at the prompt:
  source /usr/local/software/cp2k/tools/toolchain/install/setup
To build CP2K you should change directory:
  cd cp2k/
  make -j 80 ARCH=local VERSION="ssmp sdbg psmp pdbg"

Do exactly on the ending instruction

% cp /usr/local/software/cp2k/tools/toolchain/install/arch/* /usr/local/software/cp2k/arch
% source /usr/local/software/cp2k/tools/toolchain/install/setup
% cd /usr/local/software/cp2k
% make -j 32 ARCH=local VERSION="ssmp sdbg psmp pdbg"

If you encounter an error during making like the one below, just do an install for liblsan

% /usr/bin/ld: cannot find /usr/lib64/liblsan.so.0.0.0
% dnf install liblsan -y

If you encounter error like the ones below for fftw libraries,

/usr/bin/ld: cannot find -lfftw3_mpi
collect2: error: ld returned 1 exit status

You have to go to the supporting package libraries and do some editing.

% cd /usr/local/software/cp2k/tools/toolchain/install/fftw-3.3.10/lib
% ln -s libfftw3.a libfftw3_mpi.a
% ln -s libfftw3.la libfftw3_mpi.la

Try again

% cd /usr/local/software/cp2k
% make -j 32 ARCH=local VERSION="ssmp sdbg psmp pdbg"

If successful, you should see binaries at /usr/local/software/cp2k/exe/local

March 31, 2022 by kittycool only

Efficient Heterogeneous Parallel Programming Using OpenMP

This article is taken from Intel “Efficient Heterogeneous Parallel Programming Using OpenMP”. In this article, we will show you how to do CPU+GPU asynchronous calculations using OpenMP.

In some cases, offloading computations to an accelerator like a GPU means that the host CPU sits idle until the offloaded computations are finished. However, using the CPU and GPU resources simultaneously can improve the performance of an application. In OpenMP® programs that take advantage of heterogenous parallelism, the master clause can be used to exploit simultaneous CPU and GPU execution. In this article, we will show you how to do CPU+GPU asynchronous calculation using OpenMP.
…..
…..
…..
The Intel® oneAPI DPC++/C++ Compiler was used with following command-line options:
‑O3 ‑Ofast ‑xCORE‑AVX512 ‑mprefer‑vector‑width=512 ‑ffast‑math ‑qopt‑multiple‑gather‑scatter‑by‑shuffles ‑fimf‑precision=low
‑fiopenmp ‑fopenmp‑targets=spir64=”‑fp‑model=precise”
…..
…..
…..
OpenMP provides true asynchronous, heterogeneous execution on CPU+GPU systems. It’s clear from our timing results and VTune profiles that keeping the CPU and GPU busy in the OpenMP parallel region gives the best performance. We encourage you to try this approach.
Intel: Efficient Heterogeneous Parallel Programming Using OpenMP (Best Practices to Keep the CPU and GPU Working at the Same Time)

March 30, 2022 by kittycool only

Compiling ORCA-4.2.1 with OpenMPI-3.1.4

ORCA is a general-purpose quantum chemistry package that is free of charge for academic users. The Project and Download Website can be found at ORCA Forum

You have to register yourself before you can participate in the forum or download ORCA-4.2.1. The current latest version for ORCA is 5.0.3. The package you might want to consider is ORCA 4.2.1, Linux, x86-64, .tar.xz Archive

Prerequisites that I use.

OpenMPI-3.1.4 (For some idea on how to compile, you may want to take a look at
(Compiling OpenMPI-3.1.6 with GCC-6.5)
GNU-6.5.0

Unpacking ORCA-4.2.1

% tar -xvf orca_4_2_1_linux_x86-64_openmpi314.tar.xz
.....
.....
orca_4_2_1_linux_x86-64_openmpi314/autoci_rhf_poly1_sigma
orca_4_2_1_linux_x86-64_openmpi314/orca_eprnmr_mpi
orca_4_2_1_linux_x86-64_openmpi314/autoci_uhf_poly1_sigma
orca_4_2_1_linux_x86-64_openmpi314/orca_casscf
orca_4_2_1_linux_x86-64_openmpi314/autoci_iprocisd_sigma_alpha_doublet_mpi
orca_4_2_1_linux_x86-64_openmpi314/autoci_rohf_cisd_product
orca_4_2_1_linux_x86-64_openmpi314/orca_gstep
orca_4_2_1_linux_x86-64_openmpi314/contrib/
orca_4_2_1_linux_x86-64_openmpi314/contrib/G2_MP2.cmp
orca_4_2_1_linux_x86-64_openmpi314/contrib/W2_2.cmp
orca_4_2_1_linux_x86-64_openmpi314/contrib/G2_MP2_SV.cmp
orca_4_2_1_linux_x86-64_openmpi314/contrib/G2_MP2_SVP.cmp
orca_4_2_1_linux_x86-64_openmpi314/orca4.2-eula.pdf
orca_4_2_1_linux_x86-64_openmpi314/Third_Party_Licenses_ORCA_4.2.pdf

Running ORCA. If your environment has Module Environment

% module load openmpi/3.1.4/gcc-6.5.0

If not, you have to pacify PATH and LD_LIBRARY_PATH, MANPATH

export PATH=$PATH:$OPENMPI_HOME/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$OPENMPI_HOME/lib:$OPENMPI_HOME/lib64
export MANPATH=$MANPATH:$OPENMPI_HOME/share

Typical Input file

Calling ORCA requires full pathing

/usr/local/orca_4_2_1_linux_x86-64_openmpi314/orca $INPUT > $OUTPUT "--bind-to core --verbose"

For Input File usage, you may want to take a look at the ORCA 4.2.1 Manual found when you unpack or you can look at it online at orca_manual_4_2_1.pdf (enea.it) .

For example…….

! B3LYP def2-SVP SP
%tddft
tda false
nroots 50
triplets true
end
%pal
nprocs 32
end

* xyz 0 1 fac_irppy3.xyz
  Ir        0.00000        0.00000        0.03016
   N       -1.05797        1.55546       -1.09121
   N        1.87606        0.13850       -1.09121
.....
.....

The Linux Cluster

Linux Cluster Blog is a collection of how-to and tutorials for Linux Cluster and Enterprise Linux

OpenMPI

Building LAMMPS using CMAKE with OpenMPI on Rocky Linux 8

Compiling FFTW-3.3.10 with OpenMPI on Rocky Linux 8

Installing ORCA-5.0.4 on Rocky Linux 8 with OpenMPI

Enabling Nvidia Tesla 4 x A100 with NVLink for MPI

Retrieving OpenMPI Configuration

Compiling OpenMPI-4.1.5 for ROCEv2 with GNU-8.5

Encountering shm_open permission denied issues with hpcx

Installing CP2K with Nvidia HPCX on Rocky Linux 8.5

Efficient Heterogeneous Parallel Programming Using OpenMP

Compiling ORCA-4.2.1 with OpenMPI-3.1.4