Installing MLNX_OFED 5.5-1 on Rocky Linux 8.5

If you are installing MLNX_OFED 5.5-1 on Rocky Linux 8.5, you can download the drivers from the NVIDIA Linux Drivers page.

Step 1: Installing Prerequisites

# dnf install tk tcsh tcl gcc-gfortran kernel-modules-extra
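
With the prerequisites installed, extract the downloaded bundle and change into it (the file name below is hypothetical; substitute the exact version you downloaded):

# tar xzf MLNX_OFED_LINUX-5.5-1.0.3.2-rhel8.5-x86_64.tgz
# cd MLNX_OFED_LINUX-5.5-1.0.3.2-rhel8.5-x86_64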

Step 2a: Installing MLNX_OFED on Rocky 8.5

If you just run ./mlnxofedinstall, the installer reports that the OS is unsupported:

# ./mlnxofedinstall
Current operation system is not supported!

Step 2b: Force the install with the right distro

# ./mlnxofedinstall --distro rhel8.5 --force
.....
.....
.....
Device #1:
----------

  Device Type:      ConnectX5
  Part Number:      MCX512F-ACH_Ax_Bx
  Description:      ConnectX-5 EN network interface card; with host management 25GbE Dual-port SFP28; PCIe3.0 x16; ROHS
  PSID:             MT_0000000416
  PCI Device Name:  10:00.0
  Base GUID:        xxxxxxxxxxxx
  Base MAC:         yyyyyyyyyyyy
  Versions:         Current        Available
     FW             16.31.1014     16.32.1010
     PXE            3.6.0403       3.6.0502
     UEFI           14.24.0013     14.25.0017

  Status:           Update required

After the installation completes:

Restart needed for updates to take effect.
Log File: /tmp/PAl8Z5mkHc
Real log file: /tmp/MLNX_OFED_LINUX.150443.logs/fw_update.log
To load the new driver, run:
/etc/init.d/openibd restart

Step 3: Unload the conflicting kernel modules before you can run /etc/init.d/openibd restart

[root@h00 media]# modprobe -rv ib_isert rpcrdma ib_srpt
rmmod ib_isert
rmmod iscsi_target_mod
rmmod rpcrdma
rmmod ib_srpt
[root@h00 media]# /etc/init.d/openibd restart
Unloading HCA driver:                                      [  OK  ]
Loading HCA driver and Access Layer:                       [  OK  ]
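
To confirm that the new stack is active, you can query the installed OFED version and the adapter state (ofed_info and ibstat ship with the OFED stack; output varies with your hardware):

# ofed_info -s
# ibstat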

References:

  1. Driver Installation of Mellanox InfiniBand
  2. Mellanox NIC driver: Current Operation System is not supported

Disable the web console message from Cockpit on Rocky Linux 8.5

When you log on to Rocky Linux 8.5, you may see this message.

Activate the web console with: systemctl enable --now cockpit.socket

This system is not registered to Red Hat Insights. See https://cloud.redhat.com/
To register this system, run: insights-client --register

If you wish to disable the message, remove the file:

rm -f /etc/motd.d/cockpit

The command removes the symlink to the file /run/cockpit/motd.
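
The Red Hat Insights lines come from a similar snippet; on systems where the insights-client package is installed, removing its motd file in the same way should silence them (the path below is my assumption based on RHEL, where /etc/motd.d/insights-client is a symlink to /run/insights-client/motd):

# rm -f /etc/motd.d/insights-client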

Setting up NTP Client in Rocky Linux 8.5

Prerequisites Step 1: Ensure you are in the correct time zone

# timedatectl
               Local time: Wed 2022-04-20 10:04:44 +08
           Universal time: Wed 2022-04-20 02:04:44 UTC
                 RTC time: Wed 2022-04-20 02:04:44
                Time zone: Asia/Singapore (+08, +0800)
System clock synchronized: no
              NTP service: active
          RTC in local TZ: no

Prerequisites Step 2: List Time Zone

# timedatectl list-timezones
.....
Asia/Singapore
.....

Prerequisites Step 3: Set Time Zone

# timedatectl set-timezone Asia/Singapore

In Rocky Linux 8.5, the ntp package is no longer supported; time synchronization is implemented by chronyd, a user-space daemon provided in the chrony package.

chrony works both as an NTP server and as an NTP client, and is used here to synchronize the system clock with upstream NTP servers.

To install the chrony suite, use the DNF Package Manager.

# dnf install chrony

Start and enable the service

# systemctl start chronyd
# systemctl status chronyd
# systemctl enable chronyd
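
On systemd distributions the start and enable steps can also be combined into a single command:

# systemctl enable --now chronyd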

Check that the clock is synchronized

[root@h00 etc]# timedatectl
               Local time: Wed 2022-04-20 10:19:56 +08
           Universal time: Wed 2022-04-20 02:19:56 UTC
                 RTC time: Wed 2022-04-20 02:19:56
                Time zone: Asia/Singapore (+08, +0800)
System clock synchronized: yes
              NTP service: active
          RTC in local TZ: no

Setting up NTP Client Using Chrony in Rocky Linux 8.5

# vim /etc/chrony.conf
.....
pool sg.pool.ntp.org iburst
.....
# systemctl restart chronyd
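
If the system clock is far off, you can tell chronyd to step it immediately instead of slewing gradually (chronyc makestep is a standard chrony command; use it with care on hosts running services that dislike time jumps):

# chronyc makestep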

Show the current time sources that chronyd is accessing

# chronyc sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^? 178.128.223.142               0   6     0     -     +0ns[   +0ns] +/-    0ns
.....
.....
.....
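
For more detail on the currently selected source, including stratum and measured offset, chronyc tracking is useful:

# chronyc tracking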

Comparison between the /etc/os-release of RHEL-8.5 and Rocky Linux 8.5

Rocky Linux is a production-ready downstream version of Red Hat Enterprise Linux, started by the original founder of CentOS, Gregory Kurtzer. The OS is almost identical to RHEL and is under intensive development by the community.

For RHEL-8.5,

NAME="Red Hat Enterprise Linux"
VERSION="8.5 (Green Obsidian)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.5"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.5 (Green Obsidian)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8.5:GA"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/red_hat_enterprise_linux/8/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

For Rocky-Linux-8.5,

NAME="Rocky Linux"
VERSION="8.5 (Green Obsidian)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.5"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Rocky Linux 8.5 (Green Obsidian)"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:rocky:rocky:8.5:GA"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
ROCKY_SUPPORT_PRODUCT="Rocky Linux"
ROCKY_SUPPORT_PRODUCT_VERSION="8"
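
A minimal sketch of how a shell script can key off these fields; sourcing /etc/os-release is the usual pattern, and the variable names come straight from the files above:

#!/bin/bash
# Load ID, ID_LIKE, PRETTY_NAME, etc. into the environment
. /etc/os-release

# Treat anything that declares rhel/rocky ancestry as EL-compatible
case "$ID $ID_LIKE" in
  *rhel*|*rocky*) echo "EL-compatible distro: $PRETTY_NAME" ;;
  *)              echo "Not an EL derivative: $PRETTY_NAME" ;;
esac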

Container runtimes, container images, and other key concepts

This is Linux container fundamentals 101. If you want more information after reading this blog entry, see “A sysadmin’s guide to containers” by Opensource.com.

What is a Linux container?

A Linux® container is a set of one or more processes that are isolated from the rest of the system. All the files necessary to run them are provided from a distinct image, meaning Linux containers are portable and consistent as they move from development to testing, and finally to production. This makes them much quicker to use than development pipelines that rely on replicating traditional testing environments. Because of their popularity and ease of use, containers are also an important part of IT security. Read more

What are Container Runtimes?

Take a deep dive into container runtimes so you can understand how container environments are built. Read more

What is a container image?

A container image contains a packaged application, along with its dependencies, and information on what processes it runs when launched. Read more

4 Linux technologies fundamental to containers

Namespaces, cgroups, seccomp, and SELinux are the Linux technologies that make up the foundations of building and running a container process on your system. Read more
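
To see some of this isolation first-hand on Rocky Linux, where podman is the stock container tool, you can compare the os-release inside a UBI 8 image with the host's and list the PID namespaces in use (a quick sketch; the image reference assumes network access to Red Hat's public registry):

# podman run --rm registry.access.redhat.com/ubi8 cat /etc/os-release
# lsns -t pid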

Configuring NVMeoF RoCE For SUSE 15

This blog entry is taken from Configuring NVMeoF RoCE For SUSE 15.

The purpose of this blog post is to provide the steps required to implement NVMe-oF using RDMA over Converged Ethernet (RoCE) for SUSE Enterprise Linux (SLES) 15 and subsequent releases.

An important item to note is that RoCE requires a lossless network, which means global pause flow control or Priority Flow Control (PFC) must be configured on the network for smooth operation.

All of the below steps are implemented using Mellanox ConnectX-4 adapters.
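
For a flavour of what the client-side steps look like with nvme-cli, a hedged sketch of discovering and connecting over RDMA follows; the target address, port, and subsystem NQN are placeholders, and the full procedure is in the linked post:

# modprobe nvme-rdma
# nvme discover -t rdma -a 192.168.1.10 -s 4420
# nvme connect -t rdma -n nqn.2022-04.com.example:subsystem1 -a 192.168.1.10 -s 4420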


Listing the 50 biggest files in recursive directories

I have always enjoyed BASH commands that can reveal more about the files and directories we have. I wrote a short write-up, Checking Disk Usage within the subfolders but avoid mount-point.

I wondered whether I could capture the top 20 directories that use the most disk space. There is a document by Opensource.com that I find helpful: 7 Linux command-line tips for saving media file space.

1. To find the 50 biggest files in the recursive directory tree:

% find  -type f  -exec  du -Sh {} +  |  sort -rh  |  head -n 50
19G     ./Downloads/xxxxxx.iso
17G     ./Downloads/AI/AI.tar
17G     ./Downloads/ISO/xxxx.iso
6.4G    ./Ansys/uuuuuuuu.zip
5.3G    ./.cache/tracker/meta.db
4.5G    ./Downloads/HelloThere_AVX2/tar/E45.tbJ
4.4G    ./Documents/EEEEE/EEEEEE_dvd.iso
3.7G    ./Downloads/12345/2021/2021.tar
.....
......
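
If you want the heaviest directories rather than individual files, du can summarize each first-level subdirectory instead (same sort-and-head idea; --max-depth controls how deep du aggregates):

% du -h --max-depth=1 . 2>/dev/null | sort -rh | head -n 20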

abrtd daemon deleting recently created application core dumps on CentOS 7

When I ran “systemctl status abrtd.service”, I noticed the following:

[root@node1 ~]# systemctl status abrtd.service
● abrtd.service - ABRT Automated Bug Reporting Tool
   Loaded: loaded (/usr/lib/systemd/system/abrtd.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2022-01-19 09:52:50 +08; 2 months 11 days ago
 Main PID: 1113 (abrtd)
   CGroup: /system.slice/abrtd.service
           └─1113 /usr/sbin/abrtd -d -s


Apr 01 22:29:43 node1 abrt-server[361048]: Package 'rapidfile' isn't signed with proper key
Apr 01 22:29:44 node1 abrt-server[361048]: 'post-create' on '/var/spool/abrt/ccpp-2022-04-01-22:29:29-360942' exited with 1
Apr 01 22:29:44 node1 abrt-server[361048]: Deleting problem directory '/var/spool/abrt/ccpp-2022-04-01-22:29:29-360942'
Apr 01 22:53:14 node1 abrt-server[423453]: Executable '/usr/local/intel/2019u5/intelpython3/bin/python3.6' doesn't belong to any package and ProcessUnp...et to 'no'
Apr 01 22:53:14 node1 abrt-server[423453]: 'post-create' on '/var/spool/abrt/ccpp-2022-04-01-22:52:40-420563' exited with 1
Apr 01 22:53:14 node1 abrt-server[423453]: Deleting problem directory '/var/spool/abrt/ccpp-2022-04-01-22:52:40-420563'
Apr 01 23:55:22 node1 abrt-server[432522]: '.' does not exist
Apr 01 23:55:23 node1 abrt-server[432522]: 'post-create' on '/var/spool/abrt/ccpp-2022-04-01-23:55:09-432449' exited with 1
Apr 01 23:55:23 node1 abrt-server[432522]: Deleting problem directory '/var/spool/abrt/ccpp-2022-04-01-23:55:09-432449'
Apr 01 23:55:23 node1 abrt-server[432522]: '/var/spool/abrt/ccpp-2022-04-01-23:55:09-432449' does not exist
Hint: Some lines were ellipsized, use -l to show in full.

The issue seems to be caused by:

  • The abrtd daemon deletes recently created core dumps
  • Error: Package isn’t signed with proper key

According to the Red Hat Knowledge Base article “Why does the abrtd daemon delete recently created application core dumps?”, the resolution can be as simple as:

% vim /etc/abrt/abrt-action-save-package-data.conf 
OpenGPGCheck = no
ProcessUnpackaged = yes
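
If you prefer to apply the same two settings non-interactively, a sed one-liner works as well (the patterns assume the stock key = value layout of the file):

# sed -i -e 's/^OpenGPGCheck.*/OpenGPGCheck = no/' \
         -e 's/^ProcessUnpackaged.*/ProcessUnpackaged = yes/' \
         /etc/abrt/abrt-action-save-package-data.conf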

Restart the abrtd daemon, as root, for the new settings to take effect:

# systemctl restart abrtd.service

According to Red Hat, the root cause is as follows:
When the OpenGPGCheck variable is set to yes (the default setting), ABRT only analyses and handles crashes in applications provided by packages which are signed by the GPG keys whose locations are listed in the /etc/abrt/gpg_keys file. Setting OpenGPGCheck = no tells ABRT to catch crashes in all programs. Also, abrt is configured to capture core dumps only for files installed from rpm; the ProcessUnpackaged variable tells abrt to keep the core dump even if the application was not installed via rpm/yum.

Efficient Heterogeneous Parallel Programming Using OpenMP

This article is taken from Intel’s “Efficient Heterogeneous Parallel Programming Using OpenMP”.

In some cases, offloading computations to an accelerator like a GPU means that the host CPU sits idle until the offloaded computations are finished. However, using the CPU and GPU resources simultaneously can improve the performance of an application. In OpenMP® programs that take advantage of heterogeneous parallelism, the master clause can be used to exploit simultaneous CPU and GPU execution. In this article, we will show you how to do CPU+GPU asynchronous calculation using OpenMP.
…..
…..
…..

The Intel® oneAPI DPC++/C++ Compiler was used with the following command-line options:
-O3 -Ofast -xCORE-AVX512 -mprefer-vector-width=512 -ffast-math -qopt-multiple-gather-scatter-by-shuffles -fimf-precision=low -fiopenmp -fopenmp-targets=spir64="-fp-model=precise"
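
As a rough illustration, the options above would be passed to the oneAPI compiler driver like this (icpx is the oneAPI DPC++/C++ compiler; the source and output file names are hypothetical):

$ icpx -O3 -Ofast -xCORE-AVX512 -mprefer-vector-width=512 -ffast-math \
       -qopt-multiple-gather-scatter-by-shuffles -fimf-precision=low \
       -fiopenmp -fopenmp-targets=spir64="-fp-model=precise" \
       offload.cpp -o offload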

…..
…..
…..
OpenMP provides true asynchronous, heterogeneous execution on CPU+GPU systems. It’s clear from our timing results and VTune profiles that keeping the CPU and GPU busy in the OpenMP parallel region gives the best performance. We encourage you to try this approach.

References:

  1. Intel: Efficient Heterogeneous Parallel Programming Using OpenMP (Best Practices to Keep the CPU and GPU Working at the Same Time)