Cannot install the best candidate for the job for CUDA Drivers and Rocky Linux 8.5

I followed the blog post Installing Nvidia Drivers on Rocky Linux 8.5, but I encountered an error that I had not seen before:

Error:
 Problem 1: package nvidia-kmod-common-3:515.48.07-1.el8.noarch requires nvidia-kmod = 3:515.48.07, but none of the providers can be installed
  - cannot install the best candidate for the job
  - package kmod-nvidia-latest-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering
  - nothing provides dkms needed by kmod-nvidia-latest-dkms-3:515.48.07-1.el8.x86_64
  - package kmod-nvidia-open-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering
  - nothing provides dkms needed by kmod-nvidia-open-dkms-3:515.48.07-1.el8.x86_64
 Problem 2: package cuda-drivers-515.48.07-1.x86_64 requires nvidia-kmod >= 3:515.48.07, but none of the providers can be installed
  - cannot install the best candidate for the job
  - package kmod-nvidia-latest-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering
  - nothing provides dkms needed by kmod-nvidia-latest-dkms-3:515.48.07-1.el8.x86_64
  - package kmod-nvidia-open-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering
  - nothing provides dkms needed by kmod-nvidia-open-dkms-3:515.48.07-1.el8.x86_64
 Problem 3: package nvidia-driver-3:515.48.07-1.el8.x86_64 requires nvidia-kmod-common = 3:515.48.07, but none of the providers can be installed
  - package nvidia-kmod-common-3:515.48.07-1.el8.noarch requires nvidia-kmod = 3:515.48.07, but none of the providers can be installed
  - cannot install the best candidate for the job
  - package kmod-nvidia-latest-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering
  - nothing provides dkms needed by kmod-nvidia-latest-dkms-3:515.48.07-1.el8.x86_64
  - package kmod-nvidia-open-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering
  - nothing provides dkms needed by kmod-nvidia-open-dkms-3:515.48.07-1.el8.x86_64
 Problem 4: package nvidia-driver-3:515.48.07-1.el8.x86_64 requires nvidia-kmod-common = 3:515.48.07, but none of the providers can be installed
  - package nvidia-modprobe-3:515.48.07-1.el8.x86_64 requires nvidia-driver(x86-64) = 3:515.48.07, but none of the providers can be installed
  - package nvidia-kmod-common-3:515.48.07-1.el8.noarch requires nvidia-kmod = 3:515.48.07, but none of the providers can be installed
  - cannot install the best candidate for the job
  - package kmod-nvidia-latest-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering
  - nothing provides dkms needed by kmod-nvidia-latest-dkms-3:515.48.07-1.el8.x86_64
  - package kmod-nvidia-open-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering
  - nothing provides dkms needed by kmod-nvidia-open-dkms-3:515.48.07-1.el8.x86_64
 Problem 5: package nvidia-driver-3:515.48.07-1.el8.x86_64 requires nvidia-kmod-common = 3:515.48.07, but none of the providers can be installed
  - package nvidia-settings-3:515.48.07-1.el8.x86_64 requires nvidia-driver(x86-64) = 3:515.48.07, but none of the providers can be installed
  - package nvidia-kmod-common-3:515.48.07-1.el8.noarch requires nvidia-kmod = 3:515.48.07, but none of the providers can be installed
  - cannot install the best candidate for the job
  - package kmod-nvidia-latest-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering
  - nothing provides dkms needed by kmod-nvidia-latest-dkms-3:515.48.07-1.el8.x86_64
  - package kmod-nvidia-open-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering
  - nothing provides dkms needed by kmod-nvidia-open-dkms-3:515.48.07-1.el8.x86_64
 Problem 6: package nvidia-driver-3:515.48.07-1.el8.x86_64 requires nvidia-kmod-common = 3:515.48.07, but none of the providers can be installed
  - package nvidia-xconfig-3:515.48.07-1.el8.x86_64 requires nvidia-driver(x86-64) = 3:515.48.07, but none of the providers can be installed
  - package nvidia-kmod-common-3:515.48.07-1.el8.noarch requires nvidia-kmod = 3:515.48.07, but none of the providers can be installed
  - cannot install the best candidate for the job
  - package kmod-nvidia-latest-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering
  - nothing provides dkms needed by kmod-nvidia-latest-dkms-3:515.48.07-1.el8.x86_64
  - package kmod-nvidia-open-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering
  - nothing provides dkms needed by kmod-nvidia-open-dkms-3:515.48.07-1.el8.x86_64

The hint is that dkms is required, and no enabled repository provides it. On Rocky Linux 8, dkms is available from EPEL.

nothing provides dkms needed by kmod-nvidia-latest-dkms-3:515.48.07-1.el8.x86_64
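Because the same complaint repeats across every "Problem" block, it can help to reduce the error to the distinct missing dependencies. The filter below is only a sketch: missing_deps is a hypothetical helper name, and it assumes the "nothing provides X needed by Y" wording seen in the output above.

```shell
# Hypothetical helper: read dnf error text on stdin and print each
# package name that dnf reports as "nothing provides", once.
missing_deps() {
    # Matches lines such as "  - nothing provides dkms needed by kmod-nvidia-..."
    sed -n 's/^.*nothing provides \([^ ]*\) needed by.*$/\1/p' | sort -u
}

# Example use (not executed here):
#   dnf module install nvidia-driver:latest 2>&1 | missing_deps
```

For the error above, this would print a single line, dkms, which is the package to go looking for.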

Enable EPEL Repository

# dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
# dnf config-manager --enable epel

Install dkms

# dnf install 'dkms*'

Install the latest Nvidia drivers (if possible).

# dnf module install nvidia-driver:latest

If an error like this appears:

Last metadata expiration check: 0:01:01 ago on Mon 06 Jun 2022 08:47:40 PM EDT.
Error:
 Problem 1: package nvidia-kmod-common-3:515.48.07-1.el8.noarch requires nvidia-kmod = 3:515.48.07, but none of the providers can be installed
  - cannot install the best candidate for the job
  - package kmod-nvidia-latest-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering
  - package kmod-nvidia-open-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering
 Problem 2: package cuda-drivers-515.48.07-1.x86_64 requires nvidia-kmod >= 3:515.48.07, but none of the providers can be installed
  - cannot install the best candidate for the job
  - package kmod-nvidia-latest-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering
  - package kmod-nvidia-open-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering
 Problem 3: package nvidia-driver-3:515.48.07-1.el8.x86_64 requires nvidia-kmod-common = 3:515.48.07, but none of the providers can be installed
  - package nvidia-kmod-common-3:515.48.07-1.el8.noarch requires nvidia-kmod = 3:515.48.07, but none of the providers can be installed
  - cannot install the best candidate for the job
  - package kmod-nvidia-latest-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering
  - package kmod-nvidia-open-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering
 Problem 4: package nvidia-driver-3:515.48.07-1.el8.x86_64 requires nvidia-kmod-common = 3:515.48.07, but none of the providers can be installed
  - package nvidia-modprobe-3:515.48.07-1.el8.x86_64 requires nvidia-driver(x86-64) = 3:515.48.07, but none of the providers can be installed
  - package nvidia-kmod-common-3:515.48.07-1.el8.noarch requires nvidia-kmod = 3:515.48.07, but none of the providers can be installed
  - cannot install the best candidate for the job
  - package kmod-nvidia-latest-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering
  - package kmod-nvidia-open-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering
 Problem 5: package nvidia-driver-3:515.48.07-1.el8.x86_64 requires nvidia-kmod-common = 3:515.48.07, but none of the providers can be installed
  - package nvidia-settings-3:515.48.07-1.el8.x86_64 requires nvidia-driver(x86-64) = 3:515.48.07, but none of the providers can be installed
  - package nvidia-kmod-common-3:515.48.07-1.el8.noarch requires nvidia-kmod = 3:515.48.07, but none of the providers can be installed
  - cannot install the best candidate for the job
  - package kmod-nvidia-latest-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering
  - package kmod-nvidia-open-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering
 Problem 6: package nvidia-driver-3:515.48.07-1.el8.x86_64 requires nvidia-kmod-common = 3:515.48.07, but none of the providers can be installed
  - package nvidia-xconfig-3:515.48.07-1.el8.x86_64 requires nvidia-driver(x86-64) = 3:515.48.07, but none of the providers can be installed
  - package nvidia-kmod-common-3:515.48.07-1.el8.noarch requires nvidia-kmod = 3:515.48.07, but none of the providers can be installed
  - cannot install the best candidate for the job
  - package kmod-nvidia-latest-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering
  - package kmod-nvidia-open-dkms-3:515.48.07-1.el8.x86_64 is filtered out by modular filtering

You will notice that the dkms errors have been resolved, but the kmod packages are still filtered out by modular filtering. Try installing the module without specifying the latest stream:

# dnf module install nvidia-driver
===================================================================================================================================================
 Package                                Architecture        Version                                           Repository                      Size
===================================================================================================================================================
Upgrading:
 bcc                                    x86_64              0.19.0-5.el8                                      appstream                      674 k
 bcc-tools                              x86_64              0.19.0-5.el8                                      appstream                      447 k
 bpftrace                               x86_64              0.12.1-4.el8                                      appstream                      1.3 M
 clang-libs                             x86_64              13.0.1-1.module+el8.6.0+825+7e27476a              appstream                       23 M
 clang-resource-filesystem              x86_64              13.0.1-1.module+el8.6.0+825+7e27476a              appstream                       13 k
 compiler-rt                            x86_64              13.0.1-1.module+el8.6.0+825+7e27476a              appstream                      4.2 M
 libglvnd                               x86_64              1:1.3.4-1.el8                                     appstream                      126 k
 libglvnd-egl                           x86_64              1:1.3.4-1.el8                                     appstream                       48 k
 libglvnd-gles                          x86_64              1:1.3.4-1.el8                                     appstream                       39 k
 libglvnd-glx                           x86_64              1:1.3.4-1.el8                                     appstream                      136 k
 libomp-devel                           x86_64              13.0.1-1.module+el8.6.0+825+7e27476a              appstream                       28 k
 llvm-libs                              x86_64              13.0.1-1.module+el8.6.0+825+7e27476a              appstream                       24 M
 mesa-dri-drivers                       x86_64              21.3.4-1.el8                                      appstream                       11 M
 mesa-filesystem                        x86_64              21.3.4-1.el8                                      appstream                       33 k
 mesa-libxatracker                      x86_64              21.3.4-1.el8                                      appstream                      2.0 M
 python3-bcc                            x86_64              0.19.0-5.el8                                      appstream                       89 k
Installing group/module packages:
 cuda-drivers                           x86_64              515.48.07-1                                       cuda-rhel8-x86_64              8.1 k
 kmod-nvidia-latest-dkms                x86_64              3:515.48.07-1.el8                                 cuda-rhel8-x86_64               30 M
 nvidia-driver                          x86_64              3:515.48.07-1.el8                                 cuda-rhel8-x86_64               23 M
 nvidia-driver-NVML                     x86_64              3:515.48.07-1.el8                                 cuda-rhel8-x86_64              462 k
 nvidia-driver-NvFBCOpenGL              x86_64              3:515.48.07-1.el8                                 cuda-rhel8-x86_64               54 k
 nvidia-driver-cuda                     x86_64              3:515.48.07-1.el8                                 cuda-rhel8-x86_64              455 k
 nvidia-driver-cuda-libs                x86_64              3:515.48.07-1.el8                                 cuda-rhel8-x86_64               54 M
 nvidia-driver-devel                    x86_64              3:515.48.07-1.el8                                 cuda-rhel8-x86_64               13 k
 nvidia-driver-libs                     x86_64              3:515.48.07-1.el8                                 cuda-rhel8-x86_64              177 M
 nvidia-kmod-common                     noarch              3:515.48.07-1.el8                                 cuda-rhel8-x86_64               13 k
 nvidia-libXNVCtrl                      x86_64              3:515.48.07-1.el8                                 cuda-rhel8-x86_64               26 k
 nvidia-libXNVCtrl-devel                x86_64              3:515.48.07-1.el8                                 cuda-rhel8-x86_64               56 k
 nvidia-modprobe                        x86_64              3:515.48.07-1.el8                                 cuda-rhel8-x86_64               37 k
 nvidia-persistenced                    x86_64              3:515.48.07-1.el8                                 cuda-rhel8-x86_64               43 k
 nvidia-settings                        x86_64              3:515.48.07-1.el8                                 cuda-rhel8-x86_64              835 k
 nvidia-xconfig                         x86_64              3:515.48.07-1.el8                                 cuda-rhel8-x86_64              106 k
Installing dependencies:
 dnf-plugin-nvidia                      noarch              2.0-1.el8                                         cuda-rhel8-x86_64               12 k
 egl-wayland                            x86_64              1.1.9-3.el8                                       appstream                       39 k
 libX11-devel                           x86_64              1.6.8-5.el8                                       appstream                      975 k
 libXau-devel                           x86_64              1.0.9-3.el8                                       appstream                       19 k
 libglvnd-opengl                        x86_64              1:1.3.4-1.el8                                     appstream                       46 k
 libvdpau                               x86_64              1.4-2.el8                                         appstream                       40 k
 libxcb-devel                           x86_64              1.13.1-1.el8                                      appstream                      1.1 M
 mesa-vulkan-drivers                    x86_64              21.3.4-1.el8                                      appstream                      6.7 M
 ocl-icd                                x86_64              2.2.12-1.el8                                      appstream                       50 k
 opencl-filesystem                      noarch              1.0-6.el8                                         appstream                      7.3 k
 vulkan-loader                          x86_64              1.3.204.0-2.el8                                   appstream                      133 k
 xorg-x11-proto-devel                   noarch              2020.1-3.el8                                      appstream                      279 k
Installing module profiles:
 nvidia-driver/default
Enabling module streams:
 nvidia-driver                                              latest-dkms

.....
.....

Finally, run nvidia-smi to verify that the driver is loaded:

# nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.48.07    Driver Version: 515.48.07    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  Off  | 00000000:A3:00.0 Off |                    0 |
| N/A   49C    P0    46W / 250W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCI...  Off  | 00000000:C3:00.0 Off |                    0 |
| N/A   53C    P0    46W / 250W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Installing Nvidia Drivers on Rocky Linux 8.5

If you are planning to install Nvidia drivers on Rocky Linux 8.5, you may want to use DNF. For a detailed explanation, see Streamlining NVIDIA Driver Deployment on RHEL 8 with Modularity Streams.

Step 1: Add the official Nvidia repository to the package manager's repository list.

# dnf config-manager --add-repo=https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo

Step 2: Install Kernel-Devel and Headers used by the Drivers

# dnf install kernel-devel-$(uname -r) kernel-headers-$(uname -r)

Step 3: Install Nvidia Drivers and Settings

# dnf install nvidia-driver nvidia-settings

Step 4: Install CUDA Drivers and Reboot

# dnf install cuda-drivers

Once done, reboot:

# reboot

If, after the reboot, “nvidia-smi” returns an error like this one:

# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

You may want to take a look at https://gist.github.com/espoirMur/65cec3d67e0a96e270860c9c276ab9fa. The cause could be the Secure Boot option in your BIOS.
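To check whether Secure Boot is actually on before changing anything in the BIOS, the state can be queried from the running system. This is a minimal sketch: check_secure_boot is a hypothetical helper, and it assumes the mokutil tool is installed and prints "SecureBoot enabled" or "SecureBoot disabled".

```shell
# Hypothetical helper: report whether Secure Boot might be the reason
# the nvidia kernel module is not loading.
check_secure_boot() {
    if ! command -v mokutil >/dev/null 2>&1; then
        echo "mokutil not found; cannot determine Secure Boot state"
        return 2
    fi
    # Assumed output format: "SecureBoot enabled" / "SecureBoot disabled"
    case "$(mokutil --sb-state 2>/dev/null)" in
        *"SecureBoot enabled"*)
            echo "Secure Boot is enabled; an unsigned nvidia module may be rejected"
            return 1 ;;
        *)
            echo "Secure Boot does not appear to be enabled"
            return 0 ;;
    esac
}
```

If the helper reports that Secure Boot is enabled, the options are to disable it in the BIOS or to sign the module, as the gist describes.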

GTC 2021 Keynote with NVIDIA CEO Jensen Huang

NVIDIA CEO Jensen Huang announced NVIDIA’s first data center CPU, Grace, named after Grace Hopper, a U.S. Navy rear admiral and computer programming pioneer. Grace is a highly specialized processor targeting the largest data-intensive HPC and AI applications, such as the training of next-generation natural-language processing models that have more than one trillion parameters.

Further accelerating the infrastructure upon which hyperscale data centers, workstations, and supercomputers are built, Huang announced the NVIDIA BlueField-3 DPU.

The next-generation data processing unit will deliver the most powerful software-defined networking, storage and cybersecurity acceleration capabilities.

Where BlueField-2 offloaded the equivalent of 30 CPU cores, it would take 300 CPU cores to secure, offload, and accelerate network traffic at 400 Gbps as BlueField-3 does, a 10x leap in performance, Huang explained.

CUDA driver version is insufficient for CUDA runtime version

When you run “/usr/local/cuda-10.1/extras/demo_suite/deviceQuery”, you might get the following error:

[root@node1 ~]# /usr/local/cuda-10.1/extras/demo_suite/deviceQuery
/usr/local/cuda-10.1/extras/demo_suite/deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

The issue may cause some confusion. In this case the cause was not the libraries but the power setting in the BIOS. Most servers are configured for balanced power, but for GPGPU you need to set the power profile to “Maximum Performance”. For example, on an HPE server, you should select “Static High Performance Mode”.
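Before going into the BIOS, it may be worth ruling out a genuine version mismatch, since that is what the message literally describes. The sketch below compares two version strings with GNU sort -V. version_ge is a hypothetical helper, and the 418.39 minimum for CUDA 10.1 is quoted from memory of the CUDA release notes, so treat it as an assumption to verify.

```shell
# Hypothetical helper: succeed when version $1 is greater than or equal to $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example use (not executed here; 418.39 is an assumed minimum for CUDA 10.1):
#   drv=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   version_ge "$drv" 418.39 && echo "driver is new enough" || echo "driver too old"
```

If the driver passes this check and deviceQuery still fails, the power setting described above is the next thing to look at.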

Compiling Gromacs-2019.3 with Intel MKL and CUDA

Prerequisites

GCC-6.5 compilers and associated libraries
m4-1.4.18
mpfr-3.1.4
cmake-3.15.1
gmp-6.1.0
mpc-1.0.3

Intel Compilers and Prerequisites

% source /usr/local/intel/2018u3/bin/compilervars.sh intel64
% source /usr/local/intel/2018u3/impi/2018.3.222/bin64/mpivars.sh intel64
% source /usr/local/intel/2018u3/mkl/bin/mklvars.sh intel64
% source /usr/local/intel/2018u3/parallel_studio_xe_2018/bin/psxevars.sh intel64
% MKLROOT=/usr/local/intel/2018u3/mkl

Create a setup file

% touch gromacs_gpgpu.sh

Put the following into gromacs_gpgpu.sh:

CC=mpicc CXX=mpicxx cmake .. -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx -DGMX_MPI=on -DGMX_FFT_LIBRARY=mkl \
-DCMAKE_INSTALL_PREFIX=/usr/local/gromacs-2019.3_intel18_mkl_cuda10.1 -DREGRESSIONTEST_DOWNLOAD=ON \
-DCMAKE_C_FLAGS:STRING="-cc=icc -O3 -xHost -ip" \
-DCMAKE_CXX_FLAGS:STRING="-cxx=icpc -O3 -xHost -ip -I/usr/local/intel/2018u3/compilers_and_libraries_2018.3.222/linux/mpi/intel64/include/" \
-DGMX_GPU=on \
-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-10.1 \
-DCMAKE_BUILD_TYPE=Release \
-DCUDA_HOST_COMPILER:FILEPATH=/usr/local/intel/2018u3/compilers_and_libraries_2018.3.222/linux/bin/intel64/icpc

Make the script executable and run it, then build and install:

% chmod +x gromacs_gpgpu.sh
% ./gromacs_gpgpu.sh
% make
% make install

Testing and Verification

$ source /your/installation/prefix/here/bin/GMXRC
$ ./gmxtest.pl all -np 2