Using firewall-cmd to configure gateways and isolated client network on CentOS-7 and Rocky Linux 8

Objectives:

Compute Nodes in an HPC environment are usually physically isolated from the public network. In small and small-to-medium clusters they reach the internet, or the company LAN (for example, to reach an LDAP server), by routing through a gateway, which is often the Head Node or another delegated node. You can use firewall-cmd on that gateway to route the traffic out through the interface facing the internet.

Scenario:

Traffic from the private network will be routed through the Head Node: it enters on eno1 (private network) and leaves through eno2 (internet facing). The interconnect eno1 is attached to a switch to which the compute nodes are also attached. The following addresses are used:

  1. 192.168.1.0/24 is the private network subnet.
  2. 192.168.1.1 is the IP address of the Head Node on the private network (eno1).
  3. 155.1.1.2 is the IP address of the external-facing interface, i.e. eno2.

Check the zones.

# firewall-cmd --list-all-zones

Check the Active Zones

# firewall-cmd --get-active-zones
external
  interfaces: eno2
internal
  interfaces: eno1

Enable masquerade at the Head Node’s External Zone

IP masquerading is a process where one computer acts as an IP gateway for a network. For masquerading, the gateway dynamically looks up the IP of the outgoing interface all the time and replaces the source address in the packets with this address.

You use masquerading if the IP of the outgoing interface can change. A typical use case for masquerading is if a router replaces the private IP addresses, which are not routed on the internet, with the public dynamic IP address of the outgoing interface on the router.

For more information, do take a look at 5.10. Configuring IP Address Masquerading

# firewall-cmd --zone=external --query-masquerade 
no
# firewall-cmd --zone=external --add-masquerade --permanent
# firewall-cmd --reload
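To confirm that the permanent change is active after the reload, you can query the zone again. firewalld normally turns on IPv4 forwarding when masquerading is enabled in a zone, so the kernel setting is a useful second check. This sketch assumes the zone is named external, as above.

```shell
# Should now print "yes" for the external zone
firewall-cmd --zone=external --query-masquerade

# IPv4 forwarding in the kernel; a value of 1 means the head node will route packets
sysctl net.ipv4.ip_forward
```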

Compute Nodes at the Private Network 

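(This assumes that eno1 is connected to the private switch.) It is very important that you set the gateway in the compute node's interface configuration file, /etc/sysconfig/network-scripts/ifcfg-eno1. The file is normally named after the DEVICE it configures, so for the enp47s0f1 interface in the example below it would be ifcfg-enp47s0f1.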

.....
.....
DEVICE=enp47s0f1
ONBOOT=yes
# Internal IP address of the Compute Node
IPADDR=192.168.1.2
NETMASK=255.255.255.0
# Internal IP address of the Head Node
GATEWAY=192.168.1.1
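After editing the file, apply the change and check that the compute node now routes through the Head Node. The sketch below assumes a Rocky Linux 8 node managed by NetworkManager (on CentOS-7 with the legacy network service, `systemctl restart network` does the same job). The address 8.8.8.8 is only an arbitrary public address used as a reachability test.

```shell
# On the compute node: reload the ifcfg files and re-activate the interface
# so that the new GATEWAY takes effect
nmcli connection reload
nmcli connection up ifname enp47s0f1

# The default route should now point at the Head Node (192.168.1.1)
ip route show default

# Reach the Head Node first, then an external address through the masquerading gateway
ping -c 3 192.168.1.1
ping -c 3 8.8.8.8
```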

Next, put the interface that serves the client network into the internal zone. Assuming that eno1 is the interface used by the client network:

# firewall-cmd --zone=internal --change-interface=eno1 --permanent

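You may want to put SELinux into permissive mode. Note that setenforce 0 sets SELinux to permissive (not disabled), and only until the next reboot: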

# setenforce 0
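To keep SELinux permissive after a reboot, the setting also has to go into /etc/selinux/config. A sketch, assuming the stock file with a `SELINUX=` line; permissive mode still logs denials, whereas switching to disabled only takes effect after a reboot.

```shell
# Show the current mode (Enforcing, Permissive or Disabled)
getenforce

# Persist permissive mode across reboots
sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config
grep '^SELINUX=' /etc/selinux/config
```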

Configure the Head Node’s External Zone.

For zone targets, do take a look at 5.7.8. Using Zone Targets to Set Default Behavior for Incoming Traffic

For this setting, we have chosen target “default”

# firewall-cmd --permanent --zone=external --set-target=default

You can configure other settings for the external zone, for example adding the SSH and mDNS services:

# firewall-cmd --permanent --zone=external --add-service=ssh
# firewall-cmd --permanent --zone=external --add-service=mdns
# firewall-cmd --reload

Make sure each interface is placed in the right zone. Place the external-facing interface (eno2) in the external zone:

# firewall-cmd --zone=external --change-interface=eno2 --permanent

Place the internal-facing interface (eno1) in the internal zone:

# firewall-cmd --zone=internal --change-interface=eno1 --permanent

Add the internal network (the subnet on eno1) as a source of the internal zone:

# firewall-cmd --zone=internal --add-source=192.168.1.0/24
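The `--add-source` command above changes only the runtime configuration, so it is lost at the next reload or reboot. A sketch of making it permanent and checking the result, assuming the same zone and subnet:

```shell
# Add the private subnet to the permanent configuration as well
firewall-cmd --permanent --zone=internal --add-source=192.168.1.0/24
firewall-cmd --reload

# Should list 192.168.1.0/24 and eno1 respectively
firewall-cmd --zone=internal --list-sources
firewall-cmd --zone=internal --list-interfaces
```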

Check the settings with “firewall-cmd --list-all-zones”; zones that are in use are marked “(active)” (only the active zones are shown below):

# firewall-cmd --list-all-zones
internal (active)
  target: default
  icmp-block-inversion: no
  interfaces: eno1
  sources: 192.168.1.0/32
  services: dhcpv6-client mdns ssh
  ports:
  protocols:
  forward: no
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: eno2
  sources:
  services: dhcpv6-client ssh
  ports: 
  protocols:
  forward: no
  masquerade: yes
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:

Check the Firewall Status

systemctl status firewalld.service

Encountering shm_open permission denied issues with hpcx

If you are using NVIDIA HPC-X and encounter an error like the one below during your MPI run:

shm_open(file_name=/ucx_shm_posix_77de2cf3 flags=0xc2) failed: Permission denied

The error message indicates that the process has no permission to use shared memory. In this case the permission of /dev/shm was 755 instead of 777, which caused the error; changing the permission to 777 resolves the issue. To change and verify the permission:

% chmod 777 /dev/shm 
% ls -ld /dev/shm
drwxrwxrwx 2 root root 40 Jul  6 15:18 /dev/shm
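On most Linux systems /dev/shm is a tmpfs mounted world-writable with the sticky bit set (mode 1777, shown by ls as drwxrwxrwt). The sticky bit lets every user create shared-memory files while preventing users from deleting each other's. A small check, using a hypothetical helper named shm_mode_ok, that reports whether a directory has that mode; `chmod 1777 /dev/shm` is an alternative to 777 that keeps the sticky bit.

```shell
#!/bin/sh
# Hypothetical helper: print "ok" if the directory has mode 1777
# (world-writable with sticky bit), otherwise print the actual mode.
shm_mode_ok() {
    dir=${1:-/dev/shm}
    mode=$(stat -c '%a' "$dir")
    if [ "$mode" = "1777" ]; then
        echo "ok"
    else
        echo "bad mode $mode on $dir"
    fi
}

shm_mode_ok /dev/shm
# To fix while keeping the sticky bit (as root): chmod 1777 /dev/shm
```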

Installing CP2K with Nvidia HPCX on Rocky Linux 8.5

What is HPCX?

NVIDIA® HPC-X® is a comprehensive software package that includes Message Passing Interface (MPI), Symmetrical Hierarchical Memory (SHMEM) and Partitioned Global Address Space (PGAS) communications libraries, and various acceleration packages. For more information, do take a look at https://developer.nvidia.com/networking/hpc-x

What is CP2K?

CP2K is a quantum chemistry and solid state physics software package that can perform atomistic simulations of solid state, liquid, molecular, periodic, material, crystal, and biological systems. CP2K provides a general framework for different modeling methods such as DFT using the mixed Gaussian and plane waves approaches GPW and GAPW. Supported theory levels include DFTB, LDA, GGA, MP2, RPA, semi-empirical methods (AM1, PM3, PM6, RM1, MNDO, …), and classical force fields (AMBER, CHARMM, …). CP2K can do simulations of molecular dynamics, metadynamics, Monte Carlo, Ehrenfest dynamics, vibrational analysis, core level spectroscopy, energy minimisation, and transition state optimization using NEB or dimer method. (Detailed overview of features.). For more information, do take a look at https://www.cp2k.org/

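Getting CP2K (the paths in this post assume the source lives in /usr/local/software/cp2k)

% cd /usr/local/software
% git clone --recursive https://github.com/cp2k/cp2k.git cp2k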

Unpack HPC-X and its optimised Open MPI libraries. For more information on installation, do take a look at Installing and Loading HPC-X

Extract hpcx.tbz into your current working directory.

% tar -xvf hpcx.tbz
% cd hpcx
% export HPCX_HOME=$PWD
% module use $HPCX_HOME/modulefiles
% module load hpcx
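Before running the toolchain, it is worth confirming that the hpcx module has put its Open MPI wrappers on your PATH in the current shell. A quick check, assuming the module was loaded as above:

```shell
# HPCX_HOME should point at the unpacked hpcx directory
echo "$HPCX_HOME"

# The MPI wrappers should resolve to paths inside $HPCX_HOME
which mpicc mpif90 mpirun
mpirun --version
```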

The easiest way to compile is to use the CP2K toolchain:

% cd /usr/local/software/cp2k/tools/toolchain
% ./install_cp2k_toolchain.sh --no-check-certificate --with-openmpi --with-sirius=no

Compiling CP2K

.....
.....
==================== generating arch files ====================
arch files can be found in the /usr/local/software/cp2k/tools/toolchain/install/arch subdirectory
Wrote /usr/local/software/cp2k/tools/toolchain/install/arch/local.ssmp
Wrote /usr/local/software/cp2k/tools/toolchain/install/arch/local_static.ssmp
Wrote /usr/local/software/cp2k/tools/toolchain/install/arch/local.sdbg
Wrote /usr/local/software/cp2k/tools/toolchain/install/arch/local_coverage.sdbg
Wrote /usr/local/software/cp2k/tools/toolchain/install/arch/local.psmp
Wrote /usr/local/software/cp2k/tools/toolchain/install/arch/local.pdbg
Wrote /usr/local/software/cp2k/tools/toolchain/install/arch/local_static.psmp
Wrote /usr/local/software/cp2k/tools/toolchain/install/arch/local_warn.psmp
Wrote /usr/local/software/cp2k/tools/toolchain/install/arch/local_coverage.pdbg
========================== usage =========================
Done!
Now copy:
  cp /usr/local/software/cp2k/tools/toolchain/install/arch/* to the cp2k/arch/ directory
To use the installed tools and libraries and cp2k version
compiled with it you will first need to execute at the prompt:
  source /usr/local/software/cp2k/tools/toolchain/install/setup
To build CP2K you should change directory:
  cd cp2k/
  make -j 80 ARCH=local VERSION="ssmp sdbg psmp pdbg"

Follow the instructions at the end of the output exactly:

% cp /usr/local/software/cp2k/tools/toolchain/install/arch/* /usr/local/software/cp2k/arch
% source /usr/local/software/cp2k/tools/toolchain/install/setup
% cd /usr/local/software/cp2k
% make -j 32 ARCH=local VERSION="ssmp sdbg psmp pdbg"

If you encounter an error like the one below during make, install liblsan:

/usr/bin/ld: cannot find /usr/lib64/liblsan.so.0.0.0

% dnf install liblsan -y

If you encounter errors like the ones below for the FFTW libraries:

/usr/bin/ld: cannot find -lfftw3_mpi
collect2: error: ld returned 1 exit status

go to the lib directory of the FFTW built by the toolchain and create symbolic links so that the linker can find the libfftw3_mpi names:

% cd /usr/local/software/cp2k/tools/toolchain/install/fftw-3.3.10/lib
% ln -s libfftw3.a libfftw3_mpi.a
% ln -s libfftw3.la libfftw3_mpi.la

Try again

% cd /usr/local/software/cp2k
% make -j 32 ARCH=local VERSION="ssmp sdbg psmp pdbg"

If successful, you should see binaries at /usr/local/software/cp2k/exe/local
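A quick way to check the build, assuming the toolchain setup file has been sourced in the same shell so that the shared libraries are found:

```shell
# One executable per VERSION built, e.g. cp2k.ssmp, cp2k.psmp
ls /usr/local/software/cp2k/exe/local

# Print version and build information for the MPI+OpenMP binary
/usr/local/software/cp2k/exe/local/cp2k.psmp --version
```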

GCCGO Error During GCC-10.4.0 Compilation on CentOS 7

If you encounter “gccgo: error: ../x86_64-pc-linux-gnu/libgo/libgotool.a: No such file or directory”

.....
.....
/home/user1/gcc-10.4.0/host-x86_64-pc-linux-gnu/gcc/gccgo -B/home/user1/gcc-10.4.0/host-x86_64-pc-linux-gnu/gcc/ -B/usr/x86_64-pc-linux-gnu/bin/ -B/usr/x86_64-pc-linux-gnu/lib/ -isystem /usr/x86_64-pc-linux-gnu/include -isystem /usr/x86_64-pc-linux-gnu/sys-include   -g -O2 -I ../x86_64-pc-linux-gnu/libgo -static-libstdc++ -static-libgcc  -L ../x86_64-pc-linux-gnu/libgo -L ../x86_64-pc-linux-gnu/libgo/.libs -o go ../.././gotools/../libgo/go/cmd/go/alldocs.go ../.././gotools/../libgo/go/cmd/go/go11.go ../.././gotools/../libgo/go/cmd/go/main.go ../x86_64-pc-linux-gnu/libgo/libgotool.a  
gccgo: error: ../x86_64-pc-linux-gnu/libgo/libgotool.a: No such file or directory
make[2]: *** [Makefile:821: go] Error 1
make[2]: Leaving directory '/home/user1/gcc-10.4.0/host-x86_64-pc-linux-gnu/gotools'
make[1]: *** [Makefile:14649: all-gotools] Error 2
make[1]: Leaving directory '/home/user1/gcc-10.4.0'
make: *** [Makefile:997: all] Error 2

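The issue can be resolved by not building GCC in the same directory as the source code. From the top of the GCC source directory, build in a separate build directory instead (if configure was already run inside the source tree, you may need to run make distclean there first):

% ./contrib/download_prerequisites
% mkdir build
% cd build
% ../configure --prefix=/usr/local/gcc-10.4.0 --disable-multilib --enable-languages=all
% make -j 8
% make install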

Compiling GCC-10.4.0 on CentOS-7

Step 1: Download the GCC tarball. If you want to see all the available versions, take a look at http://ftp.mirrorservice.org/sites/sourceware.org/pub/gcc/releases/

For this blog entry, we will install GCC-10.4.0. First things first, let’s get the tarball:

% wget http://ftp.mirrorservice.org/sites/sourceware.org/pub/gcc/releases/gcc-10.4.0/gcc-10.4.0.tar.gz

Step 2: Make sure bzip2 is available on the system (download_prerequisites fetches .tar.bz2 archives)

% yum install bzip2 bzip2-devel

Step 3: Untar the TarBall

% tar -zxvf gcc-10.4.0.tar.gz
% cd gcc-10.4.0 

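Step 4: Download the prerequisites, then configure and build GCC in a separate build directory (building inside the source directory can fail with the gccgo libgotool.a error)

% ./contrib/download_prerequisites
% mkdir build
% cd build
% ../configure --prefix=/usr/local/gcc-10.4.0 --disable-multilib --enable-languages=all
% make -j 8
% make install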

Step 5: Verify the Installation

% gcc --version
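Because GCC was installed under the prefix /usr/local/gcc-10.4.0, a plain `gcc --version` on CentOS-7 will usually still report the system compiler (4.8.5) until the new bin directory comes first on PATH. A sketch of the environment changes, assuming the lib64 library layout of an x86_64 --disable-multilib build:

```shell
# Put the new compiler ahead of the system gcc in this shell
export PATH=/usr/local/gcc-10.4.0/bin:$PATH
# Let programs built with it find the matching libstdc++/libgcc_s at run time
export LD_LIBRARY_PATH=/usr/local/gcc-10.4.0/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}

# Should now resolve to /usr/local/gcc-10.4.0/bin/gcc and report 10.4.0
command -v gcc
gcc --version
```

Adding the two export lines to ~/.bashrc (or a modulefile) makes the change persistent.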

Compiling VASP.6.3.0 with GPGPU Capability using Nvidia HPC-SDK on Rocky Linux 8.5

This post covers compiling VASP with GPGPU capability using the NVIDIA HPC SDK. For more information, do take a look at VASP – Install VASP.6.X.X

VASP supports several compilers, but this blog focuses only on the NVIDIA HPC SDK, which can be downloaded from NVIDIA's developer website.

To install the NVIDIA HPC SDK, do take a look at the HPC SDK Documentation. In short, unpack the tarball and run the bundled install script:

% tar -xpzf <tarfile>.tar.gz
% <tarfile>/install

You may want to use the modulefiles provided with the HPC SDK if you are using Environment Modules:

% module use /usr/local/nvidia/hpc_sdk/modulefiles

Running module avail should then show something like:

------------------- /usr/local/nvidia/hpc_sdk/modulefiles ---------------
nvhpc-byo-compiler/22.5  nvhpc-nompi/22.5  nvhpc/22.5

Untar VASP 6.3.0 and the potpaw_PBE.54 potentials:

% tar -xvf vasp.6.3.0.tar
% tar -xvf potpaw_PBE.54.tar 

From the top-level vasp.6.3.0 directory, copy the NVIDIA HPC SDK template (Open MPI, MKL, OpenMP, OpenACC) to makefile.include:

% cp arch/makefile.include.nvhpc_ompi_mkl_omp_acc ./makefile.include

Load the NVIDIA HPC SDK and MKL modules, then compile. Here MKL comes from an Intel oneAPI installation whose modulefiles are added with module use; installing oneAPI is not covered in this write-up.

% module use /usr/local/intel/oneapi-2022/modulefiles
% module load nvhpc/22.5
% module load mkl/latest
% make veryclean
% make DEPS=1 -j

If you encounter the following error during make:

/usr/local/nvidia/hpc_sdk/Linux_x86_64/22.5/comm_libs/openmpi/openmpi-3.1.5/bin/.bin/mpif90: error while loading shared libraries: libatomic.so.1: cannot open shared object file: No such file or directory

install libatomic:

% dnf install libatomic -y

Then try compiling again.
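Once make finishes, the executables are placed in the bin/ directory of the VASP tree (vasp_std, vasp_gam and vasp_ncl for a default build). A small check, written as a hypothetical helper named check_vasp_bins, that reports which of them were produced:

```shell
#!/bin/sh
# Hypothetical helper: report which standard VASP executables exist in a
# bin directory (defaults to ./bin, i.e. run it from the vasp.6.3.0 top level).
check_vasp_bins() {
    bindir=${1:-bin}
    for exe in vasp_std vasp_gam vasp_ncl; do
        if [ -x "$bindir/$exe" ]; then
            echo "found $exe"
        else
            echo "missing $exe"
        fi
    done
}

check_vasp_bins
```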

References:

  1. Installing VASP.6.X.X

Changes to SSH Server on DevCloud

When connecting to the DevCloud for oneAPI

$ ssh devcloud
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ED25519 key sent by the remote host is
SHA256:/Dlip01tdMyRmhMDc870Z4Uk7AancwwoTnbb0EZajK0.
Please contact your system administrator.
Add correct host key in /home/<user_name>/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /home/<user_name>/.ssh/known_hosts:#
Host key for ssh.devcloud.intel.com has changed and you have requested strict checking.
Host key verification failed.
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535

Cause:

The DevCloud team has migrated the oneAPI DevCloud to a new SSH tunnel server and upgraded the SSH server to OpenSSH_8.2p1. For this reason, the old SSH host key fingerprint cannot be reused for the new server.

Remediation:

Step 1: Remove the Offending FingerPrint(s)

Method 1: rename your existing ~/.ssh/known_hosts file to something else, such as ~/.ssh/known_hosts.yymmdd

$ mv ~/.ssh/known_hosts ~/.ssh/known_hosts.220623 

Method 2: remove the offending host SSH fingerprint only:

$ ssh-keygen -R ssh.devcloud.intel.com
# Host ssh.devcloud.intel.com found: line 1
# Host ssh.devcloud.intel.com found: line 2
# Host ssh.devcloud.intel.com found: line 3
/home/<user_name>/.ssh/known_hosts updated.
Original contents retained as /home/<user_name>/.ssh/known_hosts.old

Step 2: reconnect to the DevCloud and accept the new key.

$ ssh devcloud
The authenticity of host 'ssh.devcloud.intel.com (12.229.61.118)' can't be established.
ED25519 key fingerprint is SHA256:/Dlip01tdMyRmhMDc870Z4Uk7AancwwoTnbb0EZajK0.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'ssh.devcloud.intel.com' (ED25519) to the list of known hosts.

Intel Distribution of OpenVINO Toolkit 2022.1 is available!

For more information, do take a look at Intel® Distribution of OpenVINO™ Toolkit

Updated, Cleaner API

  • The new OpenVINO API 2.0 was introduced, which aligns OpenVINO inputs and outputs with frameworks. Input and output tensors use native framework layouts and element types. 
  • The API parameters in Model Optimizer have been reduced to minimize complexity. Performance has been significantly improved for model conversion on Open Neural Network Exchange (ONNX*) models.

Broader Model Support

  • With Dynamic Input Shapes capabilities on CPU, OpenVINO is able to adapt to multiple input dimensions in a single model providing more complete NLP support. Support for Dynamic Shapes on additional XPUs is expected in a future dot release.
  • New models with a focus on NLP and a new category, Anomaly Detection, and support for conversion and inference of select PaddlePaddle* models:
    • Pretrained models for anomaly segmentation focus on industrial inspection making speech denoising trainable, plus updates on speech recognition and speech synthesis
    • Combined demonstration that includes noise reduction, speech recognition, question answering, translation, and text to speech
    • Public models with a focus on NLP ContextNet, Speech-Transformer, HiFi-GAN, Glow-TTS, FastSpeech2, and Wav2Vec

Portability and Performance

  • New AUTO plug-in self-discovers available system inferencing capacity based on model requirements so applications no longer need to know their compute environment in advance.
  • Automatic batching functionality via code hints automatically scales batch size based on XPU and available memory.
  • Built with 12th generation Intel® Core™ processors (formerly code named Alder Lake) in mind. Supports the hybrid architecture necessary to deliver enhancements for high performance inferencing on CPUs and integrated GPUs.

No matching repo to modify: PowerTools when using dnf install on Rocky Linux 8.5

I was trying to install hdf5 after enabling EPEL. Installing EPEL and enabling PowerTools failed with:

% dnf install -y epel-release
% dnf config-manager --set-enabled PowerTools
Error: No matching repo to modify: PowerTools.

According to the CentOS 8 release notes, the repository IDs were renamed (mostly to lower case) from 8.3 onwards, and Rocky Linux 8 uses the new names. The details can be found at https://wiki.centos.org/Manuals/ReleaseNotes/CentOS8.2011#Yum_repo_file_and_repoid_changes

Repoid (8.2.2004 and before)   Repoid (8.3.2011 and later)
BaseOS                         baseos
AppStream                      appstream
PowerTools                     powertools
centosplus                     plus
HighAvailability               ha
base-debuginfo                 debuginfo
Devel                          devel
BaseOS-source                  baseos-source
AppStream-source               appstream-source
centosplus-source              plus-source

Using the new lower-case repo ID works:

% dnf config-manager --set-enabled powertools
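With powertools enabled, the original goal of installing HDF5 can be completed. A sketch, assuming the hdf5 and hdf5-devel packages from EPEL (if dnf config-manager itself is missing, it is provided by the dnf-plugins-core package):

```shell
# powertools and epel should now appear among the enabled repositories
dnf repolist

# Install the HDF5 runtime library and development headers
dnf install -y hdf5 hdf5-devel
```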