Installing Nvidia Drivers on Rocky Linux 8.5

If you are planning to install NVIDIA drivers on Rocky Linux 8.5, you may want to use DNF modularity streams to install them. For a detailed explanation, see Streamlining NVIDIA Driver Deployment on RHEL 8 with Modularity Streams.

# dnf config-manager --add-repo=https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
# dnf module install nvidia-driver:latest
cuda-rhel8-x86_64                                                                                                            18 MB/s | 1.4 MB     00:00
Dependencies resolved.
============================================================================================================================================================
 Package                                               Architecture           Version                               Repository                         Size
============================================================================================================================================================
Installing group/module packages:
 cuda-drivers                                          x86_64                 510.47.03-1                           cuda-rhel8-x86_64                 7.0 k
 nvidia-driver                                         x86_64                 3:510.47.03-1.el8                     cuda-rhel8-x86_64                  22 M
 nvidia-driver-NVML                                    x86_64                 3:510.47.03-1.el8                     cuda-rhel8-x86_64                 516 k
 nvidia-driver-NvFBCOpenGL                             x86_64                 3:510.47.03-1.el8                     cuda-rhel8-x86_64                  52 k
 nvidia-driver-cuda                                    x86_64                 3:510.47.03-1.el8                     cuda-rhel8-x86_64                 591 k
 nvidia-driver-cuda-libs                               x86_64                 3:510.47.03-1.el8                     cuda-rhel8-x86_64                  63 M
 nvidia-driver-devel                                   x86_64                 3:510.47.03-1.el8                     cuda-rhel8-x86_64                  12 k
 nvidia-driver-libs                                    x86_64                 3:510.47.03-1.el8                     cuda-rhel8-x86_64                 168 M
 nvidia-kmod-common                                    noarch                 3:510.47.03-1.el8                     cuda-rhel8-x86_64                  12 k
.....
.....
.....
Total download size: 292 M
Installed size: 697 M
Is this ok [y/N]:

Once done, do a reboot:

# reboot
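
After the reboot, a quick sanity check (a minimal sketch; the exact output depends on your GPU model and the driver stream you installed) is to confirm that the kernel modules are loaded and that DNF shows the module stream as installed:

# lsmod | grep ^nvidia
# dnf module list nvidia-driver
# nvidia-smi

lsmod should list nvidia, nvidia_modeset, nvidia_drm and nvidia_uvm, and nvidia-smi should print the driver version together with the detected GPUs.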

If, after rebooting, you run “nvidia-smi” and receive an error like the one below:

# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

You may want to take a look at https://gist.github.com/espoirMur/65cec3d67e0a96e270860c9c276ab9fa. The error could be caused by the Secure Boot option in your BIOS.
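
A quick way to confirm whether Secure Boot is the issue (a short sketch, assuming the mokutil package is installed) is:

# mokutil --sb-state

If it reports “SecureBoot enabled”, either disable Secure Boot in the BIOS or sign the NVIDIA kernel modules with a MOK key so that the kernel will load them.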

GTC 2021 Keynote with NVIDIA CEO Jensen Huang

NVIDIA CEO Jensen Huang announced NVIDIA’s first data center CPU, Grace, named after Grace Hopper, a U.S. Navy rear admiral and computer programming pioneer. Grace is a highly specialized processor targeting the largest data-intensive HPC and AI applications, such as the training of next-generation natural-language processing models that have more than one trillion parameters.

Further accelerating the infrastructure upon which hyperscale data centers, workstations, and supercomputers are built, Huang announced the NVIDIA BlueField-3 DPU.

The next-generation data processing unit will deliver the most powerful software-defined networking, storage and cybersecurity acceleration capabilities.

Where BlueField-2 offloads the equivalent of 30 CPU cores, it would take 300 CPU cores to secure, offload, and accelerate network traffic at 400 Gbps the way BlueField-3 does, a 10x leap in performance, Huang explained.

CUDA driver version is insufficient for CUDA runtime version

When you run “/usr/local/cuda-10.1/extras/demo_suite/deviceQuery”, you might get an error like the one below:

[root@node1 ~]# /usr/local/cuda-10.1/extras/demo_suite/deviceQuery
/usr/local/cuda-10.1/extras/demo_suite/deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

The issue may cause some confusion: it is not your libraries, it is the power setting in the BIOS. Most servers are configured for a balanced power profile, but for GPGPU workloads you need to set the power profile to “Maximum Performance”. For example, on an HPE server you should select “Static High Performance Mode”.
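
Before heading into the BIOS, it can be worth checking what the driver itself reports (a minimal sketch using standard nvidia-smi queries; the power-related fields will show whether the GPU is being held back):

# nvidia-smi --query-gpu=driver_version,pstate,power.limit --format=csv
# nvidia-smi -q -d PERFORMANCE

The second command lists the clock throttle reasons, which can confirm that the GPU is being capped before you go and change the BIOS power profile.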

Compiling Gromacs-2019.3 with Intel MKL and CUDA

Prerequisites

GCC-6.5 compilers and associated libraries
m4-1.4.18
mpfr-3.1.4
cmake-3.15.1
gmp-6.1.0
mpc-1.0.3

Intel Compilers and Prerequisites

% source /usr/local/intel/2018u3/bin/compilervars.sh intel64
% source /usr/local/intel/2018u3/impi/2018.3.222/bin64/mpivars.sh intel64
% source /usr/local/intel/2018u3/mkl/bin/mklvars.sh intel64
% source /usr/local/intel/2018u3/parallel_studio_xe_2018/bin/psxevars.sh intel64
% export MKLROOT=/usr/local/intel/2018u3/mkl
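
After sourcing the Intel environment, it is worth confirming that the MPI wrappers and compilers resolve to the Intel toolchain (a quick sketch; paths will vary with your installation):

% which mpicc mpicxx icc icpc
% icc --version
% mpicc -show

mpicc -show prints the underlying compiler invocation that the wrapper will use.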

Create a setup file

% touch gromacs_gpgpu.sh

Put the following into gromacs_gpgpu.sh:

CC=mpicc CXX=mpicxx cmake .. -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx -DGMX_MPI=on -DGMX_FFT_LIBRARY=mkl \
-DCMAKE_INSTALL_PREFIX=/usr/local/gromacs-2019.3_intel18_mkl_cuda10.1 -DREGRESSIONTEST_DOWNLOAD=ON \
-DCMAKE_C_FLAGS:STRING="-cc=icc -O3 -xHost -ip" \
-DCMAKE_CXX_FLAGS:STRING="-cxx=icpc -O3 -xHost -ip -I/usr/local/intel/2018u3/compilers_and_libraries_2018.3.222/linux/mpi/intel64/include/" \
-DGMX_GPU=on \
-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-10.1 \
-DCMAKE_BUILD_TYPE=Release \
-DCUDA_HOST_COMPILER:FILEPATH=/usr/local/intel/2018u3/compilers_and_libraries_2018.3.222/linux/bin/intel64/icpc
% chmod +x gromacs_gpgpu.sh
% ./gromacs_gpgpu.sh
% make
% make install

Testing and Verification

$ source /your/installation/prefix/here/bin/GMXRC
$ ./gmxtest.pl all -np 2
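
Beyond the regression tests, a quick way to confirm that GPU offload is actually being used (a sketch; topol.tpr stands in for any prepared run input you already have) is to force the non-bonded kernels onto the GPU and inspect the log:

$ source /usr/local/gromacs-2019.3_intel18_mkl_cuda10.1/bin/GMXRC
$ mpirun -np 1 gmx_mpi mdrun -s topol.tpr -nb gpu -deffnm gpu_test
$ grep -i "gpu" gpu_test.log

The log should report the detected GPUs and show that the non-bonded interactions were computed on them.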

Developing a Linux Kernel Module using GPUDirect RDMA

Taken from Developing a Linux Kernel Module using GPUDirect RDMA

1.0 Overview

GPUDirect RDMA is a technology introduced in Kepler-class GPUs and CUDA 5.0 that enables a direct path for data exchange between the GPU and a third-party peer device using standard features of PCI Express. Examples of third-party devices are: network interfaces, video acquisition devices, storage adapters.

GPUDirect RDMA is available on both Tesla and Quadro GPUs.

A number of limitations can apply, the most important being that the two devices must share the same upstream PCI Express root complex. Some of the limitations depend on the platform used and could be lifted in current/future products.

A few straightforward changes must be made to device drivers to enable this functionality with a wide range of hardware devices. This document introduces the technology and describes the steps necessary to enable a GPUDirect RDMA connection to NVIDIA GPUs on Linux.
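
To check whether the GPU and the third-party device (for example, an InfiniBand HCA) actually share an upstream PCI Express root complex, the PCIe tree and NVIDIA’s topology matrix are useful (a short sketch; the device names will differ on your system):

# lspci -tv
# nvidia-smi topo -m

In the topology matrix, PIX or PXB between the GPU and the NIC indicates a path through a shared PCIe switch, while SYS means the traffic would have to cross the CPU interconnect.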


1.1. How GPUDirect RDMA Works

When setting up GPUDirect RDMA communication between two peers, all physical addresses are the same from the PCI Express devices’ point of view. Within this physical address space are linear windows called PCI BARs. Each device has six BAR registers at most, so it can have up to six active 32bit BAR regions. 64bit BARs consume two BAR registers. The PCI Express device issues reads and writes to a peer device’s BAR addresses in the same way that they are issued to system memory.

Traditionally, resources like BAR windows are mapped to user or kernel address space using the CPU’s MMU as memory mapped I/O (MMIO) addresses. However, because current operating systems don’t have sufficient mechanisms for exchanging MMIO regions between drivers, the NVIDIA kernel driver exports functions to perform the necessary address translations and mappings.

To add GPUDirect RDMA support to a device driver, a small amount of address mapping code within the kernel driver must be modified. This code typically resides near existing calls to get_user_pages().

The APIs and control flow involved with GPUDirect RDMA are very similar to those used with standard DMA transfers.
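
Before modifying a third-party driver, you can check that the NVIDIA kernel driver on the target system exports the GPUDirect RDMA kernel API (nvidia_p2p_get_pages() and related symbols); a minimal sketch, run with the nvidia module loaded:

# grep nvidia_p2p /proc/kallsyms
# modinfo nvidia | grep -i ^version

If the nvidia_p2p_* symbols are listed, the loaded driver provides the address translation and mapping functions described above.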

References:

Read more at: http://docs.nvidia.com/cuda/gpudirect-rdma/index.html