How Synthetic Data Supercharges Vision AI Development (NVIDIA Webinar)

In this meetup you’ll learn how synthetic data is transforming AI development efforts:

  • Learn how to use NVIDIA’s Omniverse Replicator to quickly create synthetic data and how it can integrate with NVIDIA TAO training tools.
  • Hear from Sky Engine AI, an NVIDIA synthetic data partner, on how you can leverage third-party synthetic data services.
  • Get your questions answered in a live Q&A session with our team of experts.

Register here and select one of the following sessions:

  • Americas, Europe, Middle East: Wednesday, May 18 – 8am PT | 4pm CET
  • Asia-Pacific: Thursday, May 19 – 11am SST | 12pm JST/KST | 8:30am IST

EOL notice for Mellanox ConnectX-5 VPI host channel adapters and Switch-IB 2 based EDR InfiniBand Switches

NVIDIA Corporation has announced EOL Notice #LCR-000906 – MELLANOX.

PCN INFORMATION:
PCN Number: LCR-000906 – MELLANOX
PCN Description: EOL notice for Mellanox ConnectX-5 VPI host channel adapters and Switch-IB 2 based EDR InfiniBand Switches
Publish Date: May 8, 2022
Type: FYI

Installing Nvidia Drivers on Rocky Linux 8.5

If you are planning to install NVIDIA drivers on Rocky Linux 8.5, you may want to use DNF. For a detailed explanation, see Streamlining NVIDIA Driver Deployment on RHEL 8 with Modularity Streams.

# dnf config-manager --add-repo=https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
# dnf module install nvidia-driver:latest
cuda-rhel8-x86_64                                                                                                            18 MB/s | 1.4 MB     00:00
Dependencies resolved.
============================================================================================================================================================
 Package                                               Architecture           Version                               Repository                         Size
============================================================================================================================================================
Installing group/module packages:
 cuda-drivers                                          x86_64                 510.47.03-1                           cuda-rhel8-x86_64                 7.0 k
 nvidia-driver                                         x86_64                 3:510.47.03-1.el8                     cuda-rhel8-x86_64                  22 M
 nvidia-driver-NVML                                    x86_64                 3:510.47.03-1.el8                     cuda-rhel8-x86_64                 516 k
 nvidia-driver-NvFBCOpenGL                             x86_64                 3:510.47.03-1.el8                     cuda-rhel8-x86_64                  52 k
 nvidia-driver-cuda                                    x86_64                 3:510.47.03-1.el8                     cuda-rhel8-x86_64                 591 k
 nvidia-driver-cuda-libs                               x86_64                 3:510.47.03-1.el8                     cuda-rhel8-x86_64                  63 M
 nvidia-driver-devel                                   x86_64                 3:510.47.03-1.el8                     cuda-rhel8-x86_64                  12 k
 nvidia-driver-libs                                    x86_64                 3:510.47.03-1.el8                     cuda-rhel8-x86_64                 168 M
 nvidia-kmod-common                                    noarch                 3:510.47.03-1.el8                     cuda-rhel8-x86_64                  12 k
.....
.....
.....
Total download size: 292 M
Installed size: 697 M
Is this ok [y/N]:

Once done, reboot the system:

# reboot

If, after the reboot, you run nvidia-smi and receive an error like the one below:

# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

You may want to take a look at https://gist.github.com/espoirMur/65cec3d67e0a96e270860c9c276ab9fa. The failure could be caused by the Secure Boot option in your BIOS.
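As a first triage step (a rough sketch assuming a Linux host; mokutil may not be installed), you can check whether the nvidia kernel module loaded at all:

```shell
# Check whether the nvidia kernel module is actually loaded.
if lsmod 2>/dev/null | grep -q '^nvidia'; then
  status="loaded"
else
  status="not-loaded"
fi
echo "nvidia kernel module: $status"

# On UEFI systems, Secure Boot can refuse to load the unsigned module;
# mokutil reports whether Secure Boot is enabled (skipped if not installed).
command -v mokutil >/dev/null 2>&1 && mokutil --sb-state || true
```

If the module is not loaded, `dmesg | grep -i nvidia` usually shows why.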

Webinar – Cloud-Native Supercomputing Powers New Data Centre Architecture

Computing power is becoming a service, and the data center is becoming the new unit of computing, serving virtually unlimited resources with high performance, flexibility, and security. The network, as the bridge between compute and storage, between data centers, and between users and the data center, is becoming the key factor for both performance and security. The cloud-native supercomputing architecture combines the advantages of the supercomputer and the cloud to deliver the best performance in a modern zero-trust environment.

By attending this webinar, you will learn how to:

  • Use supercomputing technologies in the data center
  • Deliver cloud flexibility with supercomputing technologies to drive the most powerful data center
  • Provide cloud-native supercomputing services in a zero-trust environment

Date: February 23, 2022
Time: 15:00 – 16:00 SGT
Duration: 1 hour

To register: Cloud Native Supercomputing Powers New Data Center Architecture (nvidianews.com)

nvidia-smi – failed to initialize nvml: insufficient permissions

The Error Encountered

If you are a non-root user and run nvidia-smi, you might see the error:

% nvidia-smi
Failed to initialize NVML: Insufficient Permissions

The default module option NVreg_DeviceFileMode=0660 is set via /etc/modprobe.d/nvidia-default.conf, which causes the NVIDIA device nodes to be created with 660 permissions:

vim /etc/modprobe.d/nvidia-default.conf
options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=1001 NVreg_DeviceFileMode=0660
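As an illustration of what mode 0660 grants (read/write for the owner and group, nothing for others), here is a scratch-file sketch; the real device nodes receive this mode from NVreg_DeviceFileMode at module load:

```shell
# Create a throwaway file with the same mode the driver applies to
# /dev/nvidia* and inspect the resulting permissions.
tmp=$(mktemp)
chmod 0660 "$tmp"
perms=$(stat -c '%a %A' "$tmp")
echo "$perms"    # -> 660 -rw-rw----
rm -f "$tmp"
```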

The Fix

[user1@node1 dev]$ ls -l nvidia*
crw-rw---- 1 root vglusers 195,   0 Jan  5 17:07 nvidia0
crw-rw---- 1 root vglusers 195, 255 Jan  5 17:07 nvidiactl
crw-rw---- 1 root vglusers 195, 254 Jan  5 17:07 nvidia-modeset

The error occurs because the NVIDIA device nodes are owned by the vglusers (or video) group and the user is not a member of it. The fix is simply to add the user to that group in /etc/group:

# usermod -a -G vglusers user1 

Log off and log in again, and you should be able to run nvidia-smi.
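To confirm the membership took effect, a small helper like this can be used (the user and group names follow the example above; adjust to video if your distribution uses that group instead):

```shell
# Return success if the user's group list contains the given group.
in_group() {
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

if in_group user1 vglusers; then
  echo "user1 is in vglusers"
else
  echo "user1 is NOT in vglusers yet (log out and back in after usermod)"
fi
```

Note that id reflects the groups of the user's current login session, which is why a fresh login is needed after usermod.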

References:

A fix for the “NVML: Insufficient Permissions”

Install Nvidia Drivers on CentOS 7

Getting Information on Nvidia GPU on CentOS 7

# lspci | grep -i --color 'vga\|3d\|2d'
02:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 42)
86:00.0 VGA compatible controller: NVIDIA Corporation GP102GL [Quadro P6000] (rev a1)
# lshw -class display
 *-display
       description: VGA compatible controller
       product: MGA G200e [Pilot] ServerEngines (SEP1)
       vendor: Matrox Electronics Systems Ltd.
       physical id: 0
       bus info: pci@0000:02:00.0
       version: 42
       width: 32 bits
       clock: 33MHz
       capabilities: pm msi vga_controller bus_master cap_list rom
       configuration: driver=mgag200 latency=0
       resources: irq:16 memory:d3000000-d3ffffff memory:d4a10000-d4a13fff memory:d4000000-d47fffff memory:d4a00000-d4a0ffff
  *-display
       description: VGA compatible controller
       product: GP102GL [Quadro P6000]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:86:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
       configuration: driver=nvidia latency=0
       resources: iomemory:3df0-3def iomemory:3df0-3def irq:320 memory:ec000000-ecffffff memory:3dfe0000000-3dfefffffff memory:3dff0000000-3dff1ffffff ioport:c000(size=128) memory:ed000000-ed07ffff
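When scripting checks against this inventory, the lspci line can be parsed for the PCI bus ID and GPU model; a minimal sketch, with the sample line from the listing above hardcoded for illustration:

```shell
# Sample lspci output line (taken from the listing above).
line='86:00.0 VGA compatible controller: NVIDIA Corporation GP102GL [Quadro P6000] (rev a1)'

bus=${line%% *}    # everything before the first space -> 86:00.0
model=$(echo "$line" | sed 's/.*NVIDIA Corporation //; s/ (rev.*//')

echo "$bus: $model"    # -> 86:00.0: GP102GL [Quadro P6000]
```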

Nvidia Downloads Site

Based on the GPU information above, download the appropriate driver from the NVIDIA Download Page.

Yum Install Libraries and Dependencies

# yum group install "Development Tools"
# yum install kernel-devel
# yum install epel-release
# yum install dkms

Disable Nouveau Drivers

Disable the nouveau driver by editing the /etc/default/grub configuration file: add nouveau.modeset=0 to the line starting with GRUB_CMDLINE_LINUX. This disables the nouveau driver after the next reboot.

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet nouveau.modeset=0"
GRUB_DISABLE_RECOVERY="true"
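Alongside the kernel parameter, nouveau is commonly blacklisted through a modprobe configuration file as well (the file name below is a common convention, not mandated):

```
# /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
```

If you add this file, rebuild the initramfs (e.g. with dracut --force) so the blacklist also applies in early boot.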

Modifying the Grub.cfg

For BIOS users:

# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-957.5.1.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-957.5.1.el7.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-957.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-957.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-86f557f292e5492aa7ac0bf1cb2670b0
Found initrd image: /boot/initramfs-0-rescue-86f557f292e5492aa7ac0bf1cb2670b0.img
done

For UEFI users:

# grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg

Switch CentOS from GUI to Text Mode

First, switch to text mode:

# systemctl isolate multi-user.target

Installing the Nvidia Driver on CentOS 7

# bash NVIDIA-Linux-x86_64-*

Reboot the System

# reboot

Finally, run the nvidia-settings command to check and configure the driver:

# nvidia-settings


VMware-NVIDIA AI-Ready Enterprise platform

NVIDIA and VMware have formed a strategic partnership to transform the data center to bring AI and modern workloads to every enterprise.

NVIDIA AI Enterprise is an end-to-end, cloud-native suite of AI and data analytics software, optimized, certified, and supported by NVIDIA to run on VMware vSphere with NVIDIA-Certified Systems. It includes key enabling technologies from NVIDIA for rapid deployment, management, and scaling of AI workloads in the modern hybrid cloud.

For more information, see NVIDIA AI Enterprise