Deep Learning Training Performance with NVIDIA A100 and V100 on Dell EMC PowerEdge R7525 Servers

Article taken from: Deep Learning Training Performance on Dell EMC PowerEdge R7525 Servers with NVIDIA A100 GPUs

CUDA Basic Linear Algebra Subroutines (cuBLAS)

  • For FP16, HGEMM on the NVIDIA A100 GPU achieves 2.27 times the TFLOPS of the NVIDIA V100S GPU.
  • For FP32, SGEMM on the NVIDIA A100 GPU achieves 1.3 times the TFLOPS of the NVIDIA V100S GPU.
  • For TF32, a performance improvement is expected without code changes for deep learning applications on the new NVIDIA A100 GPUs, because math operations run on the NVIDIA A100 GPU's Tensor Cores in the new TF32 precision format. Although TF32 reduces precision by a small margin, it preserves the range of FP32 and strikes an excellent balance between speed and accuracy. Matrix multiplication gained a sizable boost, from 13.4 TFLOPS (FP32 on the NVIDIA V100S GPU) to 86.5 TFLOPS (TF32 on the NVIDIA A100 GPU).

 

MLPerf Training v0.7 ResNet-50

The runs with two NVIDIA A100 GPUs and with two NVIDIA V100S GPUs both converged at the 40th epoch. The NVIDIA A100 run took 166 minutes to converge, which is 1.8 times faster than the NVIDIA V100S run. Regarding throughput, the two NVIDIA A100 GPUs processed 5240 images per second, which is also 1.8 times faster than the two NVIDIA V100S GPUs.

HPC Application Performance with NVIDIA A100 versus V100S on Dell PowerEdge R7525 Servers

Article taken from: HPC Application Performance on Dell PowerEdge R7525 Servers with NVIDIA A100 GPGPUs

Differences between the NVIDIA A100 GPGPU and the NVIDIA V100S GPGPU

                        NVIDIA A100       NVIDIA A100       NVIDIA V100       NVIDIA V100S
                        (SXM4)            (PCIe Gen4)       (SXM2)            (PCIe Gen3)
GPU architecture        Ampere            Ampere            Volta             Volta
Memory size             40 GB             40 GB             32 GB             32 GB
CUDA cores              6912              6912              5120              5120
Base clock              1095 MHz          765 MHz           1290 MHz          1245 MHz
Boost clock             1410 MHz          1410 MHz          1530 MHz          1597 MHz
Memory clock            1215 MHz          1215 MHz          877 MHz           1107 MHz
MIG support             Yes               Yes               No                No
Peak memory bandwidth   Up to 1555 GB/s   Up to 1555 GB/s   Up to 900 GB/s    Up to 1134 GB/s
Total board power       400 W             250 W             300 W             250 W

Benchmark Results (In Summary)

HPL performance comparison for the PowerEdge R7525 server with either NVIDIA A100 or NVIDIA V100S GPGPUs

HPCG performs at a rate 70 percent higher with the NVIDIA A100 GPGPU due to higher memory bandwidth


Onboarding an NVIDIA GPGPU on CentOS KVM

  1. For vGPU testing, you will need a license, which can be requested here:
    https://www.nvidia.com/object/nvidia-enterprise-account.html
  2. Other documentation for installing vGPU on Red Hat / CentOS is here:
    https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#red-hat-el-kvm-install-configure-vgpu
  3. Virtual GPU Software Quick Start Guide
    https://linuxcluster.wordpress.com/2019/01/28/virtual-gpu-software-quick-start-guide/

In summary, the steps are:
– Install software in the host/hypervisor to help virtualize the GPUs
– Install the GPU drivers inside the guest OS of the VMs
– Install a license server (FlexNet-based) for the licensing
– Configure the license server, and configure settings within each VM to connect to it

 

NVIDIA Tesla versus NVIDIA GTX Cards

References

  1. Performance Comparison between NVIDIA’s GeForce GTX 1080 and Tesla P100 for Deep Learning
  2. Comparison of NVIDIA Tesla/Quadro and NVIDIA GeForce GPUs

 

NVIDIA EULA

The key clause is 2.1.3, which states that the driver is not licensed for data center deployment, commercial hosting, or broadcast services:
http://www.nvidia.com/content/DriverDownload-March2009/licence.php?lang=us&type=GeForce

 

FP64: 64-bit (Double-Precision) Floating-Point Calculation


Image taken from Comparison of NVIDIA Tesla/Quadro and NVIDIA GeForce GPUs

FP16: 16-bit (Half-Precision) Floating-Point Calculation


Image taken from Comparison of NVIDIA Tesla/Quadro and NVIDIA GeForce GPUs

Developing a Linux Kernel Module using GPUDirect RDMA

Taken from Developing a Linux Kernel Module using GPUDirect RDMA

1.0 Overview

GPUDirect RDMA is a technology introduced in Kepler-class GPUs and CUDA 5.0 that enables a direct path for data exchange between the GPU and a third-party peer device using standard features of PCI Express. Examples of third-party devices include network interfaces, video acquisition devices, and storage adapters.

GPUDirect RDMA is available on both Tesla and Quadro GPUs.

A number of limitations can apply, the most important being that the two devices must share the same upstream PCI Express root complex. Some of the limitations depend on the platform used and could be lifted in current/future products.

A few straightforward changes must be made to device drivers to enable this functionality with a wide range of hardware devices. This document introduces the technology and describes the steps necessary to enable a GPUDirect RDMA connection to NVIDIA GPUs on Linux.

 

1.1. How GPUDirect RDMA Works

When setting up GPUDirect RDMA communication between two peers, all physical addresses are the same from the PCI Express devices’ point of view. Within this physical address space are linear windows called PCI BARs. Each device has six BAR registers at most, so it can have up to six active 32-bit BAR regions. 64-bit BARs consume two BAR registers. The PCI Express device issues reads and writes to a peer device’s BAR addresses in the same way that they are issued to system memory.

Traditionally, resources like BAR windows are mapped to user or kernel address space using the CPU’s MMU as memory mapped I/O (MMIO) addresses. However, because current operating systems don’t have sufficient mechanisms for exchanging MMIO regions between drivers, the NVIDIA kernel driver exports functions to perform the necessary address translations and mappings.

To add GPUDirect RDMA support to a device driver, a small amount of address mapping code within the kernel driver must be modified. This code typically resides near existing calls to get_user_pages().
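In place of get_user_pages(), a GPUDirect-aware driver pins GPU memory through the NVIDIA driver's nv-p2p.h interface. The following kernel-side sketch is illustrative only: the helper name pin_gpu_buffer is invented, error handling and the surrounding driver plumbing are omitted, and the code only builds inside a kernel module compiled against the NVIDIA driver.

```c
/* Kernel-side sketch (not runnable stand-alone): pinning a region of GPU
 * virtual memory for peer-to-peer DMA via the GPUDirect RDMA API. */
#include <nv-p2p.h>

/* Invoked by the NVIDIA driver if the GPU mapping is revoked
 * (e.g., the owning process exits); the peer driver must stop DMA. */
static void free_callback(void *data)
{
    /* Tear down any DMA state that references the page table. */
}

static int pin_gpu_buffer(uint64_t gpu_va, uint64_t len,
                          struct nvidia_p2p_page_table **page_table)
{
    /* gpu_va and len must be aligned to the 64 KB GPU page boundary. */
    return nvidia_p2p_get_pages(0, 0, gpu_va, len,
                                page_table, free_callback, NULL);
    /* On success, the page table lists the physical addresses of the
     * GPU pages; program these into the peer device's DMA engine and
     * release them later with nvidia_p2p_put_pages(). */
}
```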

The APIs and control flow involved with GPUDirect RDMA are very similar to those used with standard DMA transfers.

References:

Read more at: http://docs.nvidia.com/cuda/gpudirect-rdma/index.html