Automating the Linux Client Server for Centrify and 2FA on Rocky Linux 8

The whole manual setup including those on the Active Directory can be found at Preparing a Linux Client Server for Centrify and 2FA for CentOS-7

If you just want to automate the Linux portion, here is something you may wish to consider.

Update the sshd_config Templates (The most important portion is that the “PasswordAuthentication no” and “ChallengeResponseAuthentication yes” is present. The whole sshd_config template is too large for me to put into the blog.

.....
.....
# To disable tunneled clear text passwords, change to no here!
#PasswordAuthentication yes
#PermitEmptyPasswords no
PasswordAuthentication no

# Change to no to disable s/key passwords
#ChallengeResponseAuthentication yes
ChallengeResponseAuthentication yes
.....
.....
- name: Generate /etc/ssh/sshd_config from /etc/ssh/sshd_config.j2 template
  template:
      src: ../templates/sshd_config.j2
      dest: /etc/ssh/sshd_config
      owner: root
      group: root
      mode: 0600
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"

- name: Restart SSH Service
  systemd:
    name: sshd
    state: restarted
    enabled: yes
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
  changed_when: false

Here is Centrify_2FA.yml to insert the IWaTrustRoot.pem certificate

- name: Copy IwaTrustRoot.pem to /etc/pki/ca-trust/source/anchors/
  template:
      src: /usr/local/software/certificate/IwaTrustRoot.pem
      dest: /etc/pki/ca-trust/source/anchors/
      owner: root
      group: root
      mode: 0600
  become: true
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"

- name: Copy IwaTrustRoot.pem to /var/centrify/net/certs
  template:
      src: /usr/local/software/certificate/IwaTrustRoot.pem
      dest: /var/centrify/net/certs
      owner: root
      group: root
      mode: 0600
  become: true
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"

Restart the CentrifyDC and do a Flush so that the AD is updated.

- name: CentrifyDC Restart
  ansible.builtin.shell: "/usr/share/centrifydc/bin/centrifydc restart"
  register: centrifydc_status
  changed_when: false

- name: Active Directory Flush
  ansible.builtin.shell: "adflush -f"
  register: flush_status
  changed_when: false

- name: Centrify Service Restarted
  debug:
    msg: "Load Average: {{ centrifydc_status.stdout }}"

Azure looks like a house of cards collapsing under the weight of exploits and vulnerabilities

Taken from Microsoft comes under blistering criticism for “grossly irresponsible” security – ars Technica

Microsoft has once again come under blistering criticism for the security practices of Azure and its other cloud offerings, with the CEO of security firm Tenable saying Microsoft is “grossly irresponsible” and mired in a “culture of toxic obfuscation.”

The comments from Amit Yoran, chairman and CEO of Tenable, come six days after Sen. Ron Wyden (D-Ore.) blasted Microsoft for what he said were “negligent cybersecurity practices” that enabled hackers backed by the Chinese government to steal hundreds of thousands of emails from cloud customers………. 

Microsoft comes under blistering criticism for “grossly irresponsible” security

Installing CUDA with Ansible for Rocky Linux 8

Installation Guide

You can take a look at Nvidia CUDA Installation Guide for more information

Step 1: Get the Nvidia CUDA Repo

You can find the Repo from the Nvidia Download Sites. It should be named cuda_rhel8.repo. Copy it and use it as a template with a j2 extension.

[cuda-rhel8-x86_64]
name=cuda-rhel8-x86_64
baseurl=https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64
enabled=1
gpgcheck=1
gpgkey=https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/D42D0685.pub

Step 2: Use Ansible to Generate the repo from Templates.

The Ansible Script should look like this.

 - name: Generate /etc/yum.repos.d/cuda_rhel8.repo
   template:
    src: ../templates/cuda-rhel8-repo.j2
    dest: /etc/yum.repos.d/cuda_rhel8.repo
    owner: root
    group: root
    mode: 0644
   become: true
   when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"

Step 3: Install the Kernel-Headers and Kernel-Devel

The CUDA Driver requires that the kernel headers and development packages for the running version of the kernel be installed at the time of the driver installation, as well as whenever the driver is rebuilt.

- name: Install Kernel-Headers and  Kernel-Devel
  dnf:
    name:
        - kernel-devel
        - kernel-headers
    state: present
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"

Step 4: Disabling Nouveau

To install the Display Driver, the Nouveau drivers must first be disabled. I use a template to disable it. I created a template called blacklist-nouveau-conf.j2. Here is the content

blacklist nouveau
options nouveau modeset=0

The Ansible script for disabling Noveau using a template

- name: Generate blacklist nouveau
  template:
    src: ../templates/blacklist-nouveau-conf.j2
    dest: /etc/modprobe.d/blacklist-nouveau.conf
    owner: root
    group: root
    mode: 0644
  become: true
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"

Step 5: Install the Drivers and CUDA

- name: Install driver packages RHEL 8 and newer
  dnf:
    name: '@nvidia-driver:latest-dkms'
    state: present
    update_cache: yes
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
  register: install_driver

- name: Install CUDA
  dnf:
    name: cuda
    state: present
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
  register: install_cuda

Step 6: Reboot if there are changes to Drivers and CUDA

- name: Reboot if there are changes to Drivers or CUDA
  ansible.builtin.reboot:
  when:
    - install_driver.changed or install_cuda.changed
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"

Aftermath

After reboot, you should try to do “nvidia-smi” commands, hopefully, you should see

If you have an error “NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver for RHEL 8“, do follow the steps in NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver for RHEL 8 and run the ansible script in the blog.

You may also combine all these yml into one large yml file

Other better? Ansible Scripts

You may want to consider other better? options for https://github.com/NVIDIA/ansible-role-nvidia-docker

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver for RHEL 8

If you have installed the CUDA Drivers and CUDA SDK using the NVIDIA CUDA Installation Guide for Linux. Look for Section 3.3.3 for RHEL 8 / Rocky 9

If after following instruction, you are still facing issues, you may want to consider the following

Step 1 – Blacklist nouveau.conf

$ vim /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0

Step 2 – Remove Nvidia driver installation

# dnf module remove --all nvidia-driver

Step 3 – Remove CUDA-Related Installation

sudo dnf remove "cuda*" "*cublas*" "*cufft*" "*cufile*" "*curand*" \
 "*cusolver*" "*cusparse*" "*gds-tools*" "*npp*" "*nvjpeg*" "nsight*"

Step 4 – Reboot

# shutdown -r now

Step 5 – Make sure your BIOS SecureBoot is off. You can check

mokutil --sb-state
SecureBoot disabled

References:

  1. Forum – CentOS Stream 8: NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver

Guide to Creating Symbolic Links with Ansible

You can use the ansible.builtin.file module. In my example below, I wanted to link the Module Environment profile.csh and profile.sh to be placed on the /etc/profile.d so that it will load on startup. Do take a look at the Ansible Document ansible.builtin.file module – Manage files and file properties

- name: Check for CUDA Link
  stat: path=/usr/local/cuda
  register: link_available

- name: Create a symbolic link for CUDA
  ansible.builtin.file:
    src: /usr/local/cuda-12.2
    dest: /usr/local/cuda
    owner: root
    group: root
    state: link
  when:
    - ansible_os_family == "RedHat"
    - ansible_distribution_major_version == "8"
    - link_available.stat.isdir is not defined and link_available.stat.isdir == False

Compiling VASP.6.3.0 using Nvidia hpcx, gcc-12.3 and MKL on Rocky Linux 8.7

For information, do take a look at VASP – Install VASP.6.X.X

VASP support several compilers. But we will be focusing on Nvidia hpcx only for this blog. To compile Nvidia hpcx, do take a look at Installing and using Mellanox HPC-X Software Toolkit

You may want to use Nvida hpcx. You may want to module use

export HPCX_HOME=/usr/local/hpcx-v2.15-gcc-MLNX_OFED_LINUX-5-redhat8-cuda12-gdrcopy2-nccl2.17-x86_64
module use $HPCX_HOME/modulefiles
----------------- /usr/local/hpcx-v2.15 --------------------------------------------
hpcx  hpcx-debug  hpcx-debug-ompi  hpcx-mt  hpcx-mt-ompi  hpcx-ompi  hpcx-prof  hpcx-prof-ompi  hpcx-stack

You can untar the VASP.6.3.3. and potpaw_PBE.54

% tar -xvf vasp.6.3.0.tar
% tar -xvf potpaw_PBE.54.tar 

At the installation base of vasp.6.3.0 base

% cp arch/makefile.include.gnu_ompi_mkl_omp ./makefile.include

You will need the latest GNU GCC-10 or the latest for a successful compile. I compiled with GCC-12.3. You may want to take a look at Compiling GCC 12.1.0 on Rocky Linux 8.5

If you are using OneAPI Intel MKL, you can use module use after compilation. It will not be covered in this write-up. But you can:

% module use /usr/local/intel/2023.1/modulefiles

Finally,

% module load mkl/latest
% module load gnu/gcc-12.3
% module load hpcx
% make veryclean
% make DEPS=1 -j
.....
.....
make[2]: Leaving directory '/usr/local/vasp/vasp.6.3.0/build/std'
make[1]: Leaving directory '/usr/local/vasp/vasp.6.3.0/build/std'

In the bin directory, you should see vasp_gam vasp_ncl vasp_std

Using Ansible Expect Module to executes a command and responds to prompts

Ansible Documentation:

Ansible Expect Module is very useful to listen for certain strings in stdout and react accordingly. This is particularly useful if you have to respond to accept a license agreement or enter some important information. Here is my sample

- name: Install RPM package from local system
  yum:
    name: /tmp/my-software.rpm
    state: present
    disable_gpg_check: true
  when: ansible_os_family == "RedHat"

- name:
  ansible.builtin.stat:
    path: /usr/local/mysoftware
  register: directory_check

- name: Setup Licensing Server's Connection if directory does not exist
  ansible.builtin.expect:
    command: /usr/local/mysoftware/install.sh
    responses:
      (?i)Do you already have a license server on your network? [y/N] "y"
      (?i)Enter the name (or IP address) of your license server "xx.xx.xx.xx"
      (?i)Install/update the MySoftware web service? [Y/n] "n"
  when: not directory_check.stat.isdir

Compiling OpenMPI-4.1.5 for ROCEv2 with GNU-8.5

https://docs.open-mpi.org/en/v5.0.x/release-notes/networks.html

Prerequisites 1

First thing first, You may want to check whether you are using RoCE. Do take a look at Installing RoCE using Mellanox (Nvidia) OFED package

Prerequisites 2

Do check whether you have ucx. You can do a dnf install

# dnf install ucx ucx-devel

Alternatively, you can do a manual install. For information on how to install, do take a look at http://openucx.org/wp-content/uploads/UCX_install_guide.pdf

# wget https://github.com/openucx/ucx/releases/download/v1.4.0/ucx-1.4.0.tar.gz
$ tar xzf ucx-1.4.0.tar.gz
$ cd ucx-1.4.0
$ ./contrib/configure-release --prefix=/usr/local/ucx-1.4.0
$ make -j8 
$ make install

Prerequisites 3

Make sure you have install GNU and GNU-C++. This can be done easily using the

# dnf install gcc-c++ gcc

Step 1: Download the OpenMPI package

You can go to OpenMPI to download the latest package at (https://www.open-mpi.org/software/ompi/v4.1/). The latest one at the point of writing is OpenMPI-4.1.

Step 2: Compile the Package

$ ./configure --prefix=/usr/local/openmpi-4.1.5 --enable-mpi-cxx --with-devel-headers --with-ucx --with-verbs --with-slurm=no
$ make && make install

Step 3: To run the MPIRUN using ROCE, do the following.

You may want to see Network Support Information on OpenMPI

$ mpirun --np 12 --hostfile path/to/hostfile --mca pml ucx -x -x UCX_NET_DEVICES=mlx5_0:1 ........

References:

  1. Setting up a RoCE cluster
  2. OpenMPI – Network Support
  3. How do I run Open MPI over RoCE? (UCX PML)

Installing RoCE using Mellanox (Nvidia) OFED package

Prerequisites:

Do read Basic Understanding RoCE and Infiniband

Step 1: Install Mellanox Package

First and Foremost, you have to install Mellanox Package which you can download at https://developer.nvidia.com/networking/ethernet-software. You may want to consider installing using the traditional method or Ansible Method (Installing Mellanox OFED (mlnx_ofed) packages using Ansible)

Step 2: Load the Drivers

Activate two kernel modules that are needed for rdma and RoCE exchanges by using the command

# modprobe rdma_cm ib_umad

Step 3: Verify the drivers are loaded

# ibv_devinfo

Step 4: Set the RoCE to version 2

Set the version of the RoCE protocol to v2 by issuing the command below.

  • -d is the device, 
  • -p is the port 
  • -m the version of RoCE:
[root@node1]# cma_roce_mode -d mlx5_0 -p 1 -m 2
RoCE v2

Step 5: Check which RoCE devices are enabled on the Ethernet

[root@node-1]# ibdev2netdev
mlx5_0 port 1 ==> ens1f0 (Up)
mlx5_1 port 1 ==> ens1f1 (Down)

Refererences:

  1. Setting up a RoCE cluster

Basic Understanding RoCE and Infiniband

Prerequisites:

  1. RoCE required Compliant Ethernet. Currently, I am using Mellanox ConnectX-6 Cards
  2. RoCE required a Compliant Switch. I used Mellanox 100G Switch.

The Difference between Traditional Ethernet Communication and RoCE can be explained very clearly in the diagram taken by Huawei’s Basic Knowledge and Differences of RoCE, IB, and TCP Networks

Some Key Pointers on the difference between TCP/IP and RDMA

  1. The Traditional TCP/IP network communication uses the Kernel to send messages which have high data movement and data replication overhead.
  2. RDMA can bypass the kernel and access the memory directly which allows low-latency network communication.

There are 3 types of RDMA network technologies is so neatly presented in Basic Knowledge and Differences of RoCE, IB, and TCP Networks

References:

  1. Basic Knowledge and Differences of RoCE, IB, and TCP Networks