nvidia-smi – failed to initialize nvml: insufficient permissions

The Error Encountered

If you run nvidia-smi as a non-root user, you might see the following error:

% nvidia-smi
Failed to initialize NVML: Insufficient Permissions

The default module option NVreg_DeviceFileMode=0660 is set via /etc/modprobe.d/nvidia-default.conf. This gives the nvidia device nodes 0660 permissions, so only the owner and the owning group can read and write them.

vim /etc/modprobe.d/nvidia-default.conf
options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=1001 NVreg_DeviceFileMode=0660
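To see which group the options line grants access to, the values can be pulled out and the numeric GID resolved to a name. The following is a minimal sketch; get_opt is a hypothetical helper name, and the sample line is copied from the configuration above rather than read from the live file.

```shell
#!/bin/sh
# get_opt NAME LINE: print the value of NAME=value found in an options line.
# (Hypothetical helper, not part of the nvidia driver tooling.)
get_opt() {
    printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1=//p"
}

# Sample line copied from the configuration shown above.
line='options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=1001 NVreg_DeviceFileMode=0660'

gid=$(get_opt NVreg_DeviceFileGID "$line")
mode=$(get_opt NVreg_DeviceFileMode "$line")
echo "device nodes: gid=$gid mode=$mode"

# Resolve the numeric GID to a group name, if this host has such a group.
getent group "$gid" 2>/dev/null | cut -d: -f1
```

On the node described here, the last command would be expected to print the group that owns the device nodes.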

The Fix

[user1@node1 dev]$ ls -l nvidia*
crw-rw---- 1 root vglusers 195,   0 Jan  5 17:07 nvidia0
crw-rw---- 1 root vglusers 195, 255 Jan  5 17:07 nvidiactl
crw-rw---- 1 root vglusers 195, 254 Jan  5 17:07 nvidia-modeset

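The error occurs because the device nodes are accessible only to root and to members of the group that owns them (vglusers here; on some systems it is video). The fix is simply to add the user to that group in /etc/group: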

# usermod -a -G vglusers user1 

Log off and log in again. You should then be able to run nvidia-smi.
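Group membership is read at login, which is why a fresh session is needed. A quick way to confirm that the new session picked up the group is sketched below; has_group is a hypothetical helper, and vglusers is simply the group from the listing above.

```shell
#!/bin/sh
# has_group "GROUP LIST" GROUP: succeed if GROUP appears in the space-separated list.
# (Hypothetical helper; assumes group names contain no spaces.)
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
    esac
    return 1
}

# id -nG prints the group names of the current session.
if has_group "$(id -nG)" vglusers; then
    echo "current session is in vglusers"
else
    echo "current session is not in vglusers; nvidia-smi will likely fail"
fi
```

If the second message still appears after usermod, the session most likely predates the change.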

References:

A fix for the “NVML: Insufficient Permissions”

Basic Usage of Singularity (Part 2)

The information is taken from https://github.com/NIH-HPC/Singularity-Tutorial. I’m just documenting what I understand, step by step.

EXEC Command

Using the exec command, we can run commands within the container from the host system.

[user1@node ~]$ singularity exec lolcow_latest.sif cowsay 'How is today?'
 _______________
< How is today? >
 ---------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

RUN Command

Some containers can also be run directly with the run command:

[user1@node ~]$ singularity run lolcow_latest.sif
 _____________________________________
/ Fine day to work off excess energy. \
\ Steal something heavy.              /
 -------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

A special script called the “runscript” is executed when the “run” command is used. To take a closer look:

[user1@node ~]$ singularity inspect --runscript lolcow_latest.sif
#!/bin/sh

    fortune | cowsay | lolcat

The runscript consists of three simple commands, with the output of each command piped to the next. Alternatively, you can execute the image file directly:

[user1@node ~]$ ./lolcow_latest.sif

Pipes and Redirection

Singularity supports pipes and redirection, so the host system can exchange data with the container. Here we execute a command in the container and redirect its output to a file called output.txt

[user1@node ~]$ singularity exec lolcow_latest.sif cowsay 'How are you' > output.txt

The output is written to output.txt. The host shell creates the file if it does not exist, so there is no need to create it beforehand. Open it to check:

[user1@node ~]$ vim output.txt

Inside output.txt

 _____________
< How are you >
 -------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

Let’s say we create a file called HelloThere containing the text “Shame on You”. The contents of the file are then piped into cowsay running inside the container.

% cat HelloThere | singularity exec lolcow_latest.sif cowsay -n
 ______________
< Shame on You >
 --------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
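Both the redirection and the pipe are set up by the host shell before the containerized command starts. The sketch below illustrates this without needing Singularity: run is a stand-in function (an assumption for illustration) that just runs the command on the host, and tr takes the place of cowsay.

```shell
#!/bin/sh
# run stands in for "singularity exec lolcow_latest.sif"; here it simply runs
# the command on the host so the sketch works without Singularity installed.
run() { "$@"; }

workdir=$(mktemp -d)

# Redirection: the host shell opens (and creates) output.txt before run starts.
run echo moo > "$workdir/output.txt"

# Pipe: the host's cat writes to the stdin of the command started by run.
echo "Shame on You" > "$workdir/HelloThere"
cat "$workdir/HelloThere" | run tr 'a-z' 'A-Z' >> "$workdir/output.txt"

cat "$workdir/output.txt"
```

On a real system, redefining run as run() { singularity exec lolcow_latest.sif "$@"; } should give the same data flow with the commands executing inside the container.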

References:

  1. https://github.com/NIH-HPC/Singularity-Tutorial

Basic Usage of Singularity (Part 1)

This article is based on https://github.com/NIH-HPC/Singularity-Tutorial. I’m learning a lot from its rich, systematic content, and below are my learning points.

Popular Sites

Several popular sites host pre-built Singularity containers for download.

Downloading the Containers

% singularity pull library://godlovedc/funny/lolcow
INFO:    Downloading library image
89.2MiB / 89.2MiB [========================================================================================================================================================] 100 % 5.4 MiB/s 0s
WARNING: integrity: signature not found for object group 1
WARNING: Skipping container verification

Singularity File

The downloaded Singularity image file has a .sif extension, for example:

lolcow_latest.sif
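The file name follows from the URI that was pulled: the last path component, then the tag (latest when none is given), then .sif. A small sketch of that naming rule follows; sif_name is a hypothetical helper, and the rule itself is inferred from the pull above, so treat it as an assumption.

```shell
#!/bin/sh
# sif_name URI: guess the file name that "singularity pull" writes for a URI.
# (Hypothetical helper; naming rule inferred from the example above.)
sif_name() {
    base=${1##*/}                      # drop everything up to the last slash
    case "$base" in
        *:*) name=${base%%:*}; tag=${base#*:} ;;
        *)   name=$base;       tag=latest ;;
    esac
    printf '%s_%s.sif\n' "$name" "$tag"
}

sif_name library://godlovedc/funny/lolcow
```

This can be handy in scripts that need to know the image path before calling singularity shell or singularity exec.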

Entering the Shell of Singularity

% singularity shell lolcow_latest.sif
Singularity> cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.5 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.5 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

You will notice that the container is running Ubuntu, even though the host OS may be different.

Interesting Observations

a. The user remains the same inside and outside of the container.

Singularity> whoami
admin

b. The hostname remains the same inside and outside the container

Singularity> hostname
node1

c. Running the application within the containers

Singularity> cowsay moo
 _____
< moo >
 -----
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

d. Exiting the container shell

Singularity> exit

References:

  1. https://github.com/NIH-HPC/Singularity-Tutorial
  2. https://github.com/sylabs/singularity

Install Nvidia Drivers on CentOS 7

Getting Information on Nvidia GPU on CentOS 7

# lspci | grep -i --color 'vga\|3d\|2d'
02:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 42)
86:00.0 VGA compatible controller: NVIDIA Corporation GP102GL [Quadro P6000] (rev a1)
# lshw -class display
 *-display
       description: VGA compatible controller
       product: MGA G200e [Pilot] ServerEngines (SEP1)
       vendor: Matrox Electronics Systems Ltd.
       physical id: 0
       bus info: pci@0000:02:00.0
       version: 42
       width: 32 bits
       clock: 33MHz
       capabilities: pm msi vga_controller bus_master cap_list rom
       configuration: driver=mgag200 latency=0
       resources: irq:16 memory:d3000000-d3ffffff memory:d4a10000-d4a13fff memory:d4000000-d47fffff memory:d4a00000-d4a0ffff
  *-display
       description: VGA compatible controller
       product: GP102GL [Quadro P6000]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:86:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
       configuration: driver=nvidia latency=0
       resources: iomemory:3df0-3def iomemory:3df0-3def irq:320 memory:ec000000-ecffffff memory:3dfe0000000-3dfefffffff memory:3dff0000000-3dff1ffffff ioport:c000(size=128) memory:ed000000-ed07ffff
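On hosts with several display controllers it helps to isolate the NVIDIA device before looking up a driver. The sketch below filters lspci output for NVIDIA entries and prints their PCI addresses; nvidia_slots is a hypothetical helper, and the sample input is the two lspci lines shown above.

```shell
#!/bin/sh
# nvidia_slots: read lspci output on stdin, print the PCI address of each NVIDIA device.
# (Hypothetical helper for illustration.)
nvidia_slots() {
    grep -i 'nvidia' | cut -d' ' -f1
}

# Sample input: the two lspci lines shown above.
sample='02:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 42)
86:00.0 VGA compatible controller: NVIDIA Corporation GP102GL [Quadro P6000] (rev a1)'

printf '%s\n' "$sample" | nvidia_slots
```

On a live system the input would come from lspci itself (lspci | nvidia_slots), and lspci -s with the printed address plus -k shows which kernel driver is bound to the card.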

Nvidia Downloads Site

Using the GPU model identified above, download the matching driver from the Nvidia download page.

Yum Install Libraries and Dependencies

# yum group install "Development Tools"
# yum install kernel-devel
# yum install epel-release
# yum install dkms

Disable Nouveau Drivers

Disable the nouveau driver by editing the /etc/default/grub file. Add nouveau.modeset=0 to the line starting with GRUB_CMDLINE_LINUX. The nouveau driver will then be disabled once grub.cfg has been regenerated and the system rebooted.

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet nouveau.modeset=0"
GRUB_DISABLE_RECOVERY="true"
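The same edit can be scripted so that it is safe to run more than once. The sketch below appends the parameter only when it is missing; add_nouveau_off is a hypothetical helper, and it is demonstrated on a scratch file rather than on /etc/default/grub.

```shell
#!/bin/sh
# add_nouveau_off FILE: append nouveau.modeset=0 to GRUB_CMDLINE_LINUX unless present.
# (Hypothetical helper; back up the real file before using it there.)
add_nouveau_off() {
    sed '/^GRUB_CMDLINE_LINUX=/{
/nouveau\.modeset=0/!s/"$/ nouveau.modeset=0"/
}' "$1" > "$1.new" && mv "$1.new" "$1"
}

# Demonstrate on a scratch copy, not on /etc/default/grub itself.
demo=$(mktemp)
printf '%s\n' 'GRUB_TIMEOUT=5' 'GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet"' > "$demo"
add_nouveau_off "$demo"
add_nouveau_off "$demo"   # second run changes nothing
grep '^GRUB_CMDLINE_LINUX=' "$demo"
```

The inner address with ! skips lines that already carry the parameter, which is what makes the second call a no-op.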

Regenerating grub.cfg

For BIOS users:

# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-957.5.1.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-957.5.1.el7.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-957.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-957.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-86f557f292e5492aa7ac0bf1cb2670b0
Found initrd image: /boot/initramfs-0-rescue-86f557f292e5492aa7ac0bf1cb2670b0.img
done

For UEFI users (the vendor directory is centos on CentOS and redhat on RHEL):

# grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
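If it is unclear whether a machine booted in BIOS or UEFI mode, the presence of /sys/firmware/efi is the usual indicator. The sketch below picks the output path accordingly; grub_cfg_path is a hypothetical helper, and the default vendor directory (centos) is an assumption that should be checked against the contents of /boot/efi/EFI.

```shell
#!/bin/sh
# grub_cfg_path [EFI_SYSFS_DIR] [VENDOR]: print the grub.cfg path for this boot mode.
# The kernel exposes /sys/firmware/efi only when the system booted via UEFI.
grub_cfg_path() {
    efi_dir=${1:-/sys/firmware/efi}
    vendor=${2:-centos}     # assumption: EFI vendor directory, verify under /boot/efi/EFI
    if [ -d "$efi_dir" ]; then
        echo "/boot/efi/EFI/$vendor/grub.cfg"
    else
        echo "/boot/grub2/grub.cfg"
    fi
}

echo "would run: grub2-mkconfig -o $(grub_cfg_path)"
```

The sketch only prints the command, leaving the actual grub2-mkconfig call to be run by hand as root.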

Switch CentOS from GUI to Text Mode

First, switch to text mode, since the Nvidia installer cannot run while the X server is active.

# systemctl isolate multi-user.target

Installing the Nvidia Driver on CentOS 7

# bash NVIDIA-Linux-x86_64-*

Reboot the System

# reboot

Finally, run nvidia-settings to check and configure the driver.

# nvidia-settings


Top 10 videos from Red Hat Developer

  1. Podman: A Linux tool for working with containers and pods
     Get started with Podman, an open source, Linux-based tool that builds Docker-compatible container images.

  2. Easily secure your Spring Boot applications with Keycloak
     Discover how to deploy and configure a Keycloak server and then secure a Spring Boot application.

  3. Learn how to move your existing Java app to Kubernetes - without changing a single line of code
     Using the free Developer Sandbox for Red Hat OpenShift, we demo how you can take your existing source code or create a new application and easily deploy and manage them as containers.

  4. KBE Insider (E3): Luke Hinds
     We talk to Luke Hinds, Security Lead for Office of CTO, Red Hat, about his work on the Kubernetes Security Response Team, Sigstore, and the Kubernetes HackerOne Bug Bounty Program.

  5. Local OpenShift environment on Windows with Red Hat CodeReady Containers
     Brian Tannous walks through getting a local OpenShift environment installed on Windows using Red Hat CodeReady Containers.

  6. Securing apps and services with Keycloak authentication | DevNation Tech Talk
     See how to easily secure all of your applications and services, regardless of how they’re implemented and hosted, with Keycloak - all with little-to-no code required.

  7. A deep dive into Keycloak | DevNation Tech Talk
     This tutorial introduces Keycloak, an open source identity and access management solution for modern applications and services.

  8. Secure Spring Boot Microservices with Keycloak | DevNation Tech Talk
     In this interactive, live-coding session, you’ll explore the Spring Boot adapter provided by Keycloak.

  9. KBE Insider (E5): Savitha Raghunathan
     We talk to Savitha Raghunathan, Senior Software Engineer at Red Hat, about her work and experience as an open source contributor within the Kubernetes ecosystem.

  10. Apache Kafka + Debezium | DevNation Tech Talk
      This tutorial explores how to use Apache Kafka and Debezium. Learn how to use change data capture for reliable microservices integration.

Compiling Singularity-CE-3.9.2 on CentOS-7

The Official Documentation can be found at https://sylabs.io/guides/3.0/user-guide/installation.html

Prerequisites 1 – Yum

If you are using CentOS, install the necessary libraries and dependencies first.

yum update -y && \
yum groupinstall -y 'Development Tools' && \
yum install -y \
openssl-devel \
libuuid-devel \
libseccomp-devel \
wget \
squashfs-tools

Prerequisites 2 – Go

Go to the Download Page https://go.dev/dl/ to download the Linux Version.

Extract the archive you downloaded into /usr/local, creating a Go tree in /usr/local/go.

This step below will remove a previous installation at /usr/local/go, if any, prior to extracting. Please back up any data before proceeding.

% rm -rf /usr/local/go && tar -C /usr/local -xzf go1.17.5.linux-amd64.tar.gz

Add /usr/local/go/bin to the PATH environment variable. You can do this by adding the following line to your $HOME/.profile or /etc/profile (for a system-wide installation):

export PATH=$PATH:/usr/local/go/bin
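Because profile files can be sourced more than once, a plain append may leave duplicate entries in PATH. The sketch below is one hedged alternative that appends only when the directory is missing; path_add is a hypothetical helper name.

```shell
#!/bin/sh
# path_add DIR: append DIR to PATH only if it is not already listed.
# (Hypothetical helper for use in $HOME/.profile or /etc/profile.)
path_add() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

path_add /usr/local/go/bin
path_add /usr/local/go/bin           # no duplicate entry is added
export PATH
echo "$PATH"
```

Wrapping PATH in colons before matching avoids false hits on directories that merely share a prefix, such as /usr/local/go/bin2.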

Verify the Installation with the command

% go version

Compiling Singularity

To download Singularity, visit the download site. Singularity uses a build system called makeit: mconfig is called to generate the Makefile, and then make is used to compile and install.

% cd singularity
% ./mconfig --prefix=/usr/local/singularity-ce-3.9.2
% cd builddir
% make
% make install

Source bash-completion file

To enable bash completion for Singularity commands and options, source the bash-completion file. Also copy it to the bash-completion directory in /etc so that it is sourced automatically when you start another shell.

% . /usr/local/singularity-ce-3.9.2/etc/bash_completion.d/singularity
% cp /usr/local/singularity-ce-3.9.2/etc/bash_completion.d/singularity /etc/bash_completion.d/

Testing

As long as you see a cow, your installation is working properly.

% singularity run library://godlovedc/funny/lolcow
 _____________________________________
/ Q: What is the difference between a \
\ duck? A: One leg is both the same.  /
 -------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

References

  1. https://sylabs.io/guides/3.0/user-guide/installation.html
  2. https://github.com/NIH-HPC/Singularity-Tutorial

Working around Docker’s download limit on Red Hat OpenShift

Taken from “How to work around Docker’s new download rate limit on Red Hat OpenShift” from Red Hat Developer.

Docker recently changed its policy for downloading images as an anonymous user. The company now has a limit of 100 downloads every six hours from a single IP address.

If you are using the OpenShift Developer Sandbox to experiment with a free OpenShift cluster, you might encounter the error message shown in Figure 1.

All you have to do to avoid Docker’s new rate-limit error is authenticate to your Docker Hub account. After you’ve authenticated to the account, you won’t be pulling the image as an anonymous user but as an authenticated user. The image download will count against your personal limit of 200 downloads per six hours instead of the 100 downloads shared across all anonymous cluster users.
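The limits quoted here are expressed as a number of pulls per time window. As an illustration, the sketch below turns a rate-limit header of the form value;w=seconds into a readable figure. The header name and format are assumptions based on what Docker Hub is understood to return, and parse_limit is a hypothetical helper; the sample line is illustrative, not captured output.

```shell
#!/bin/sh
# parse_limit: read a "ratelimit-limit" style header on stdin and print
# "<pulls> pulls per <hours>h". The header format (value;w=seconds) is an assumption.
parse_limit() {
    sed -n 's/^[^:]*: *\([0-9][0-9]*\);w=\([0-9][0-9]*\).*/\1 \2/p' |
    while read -r pulls window; do
        echo "$pulls pulls per $((window / 3600))h"
    done
}

# Illustrative header for an anonymous client: 100 pulls in a 21600-second window.
echo 'ratelimit-limit: 100;w=21600' | parse_limit
```

A window of 21600 seconds corresponds to the six-hour period mentioned above.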

For the complete article, take a look at “How to work around Docker’s new download rate limit on Red Hat OpenShift” from Red Hat Developer.