Compiling OpenMPI-1.8.8 with Intel Compiler and CUDA

The configuration parameters I used for compiling OpenMPI-1.8.8 with the Intel Compiler and CUDA are shown below.

# ./configure --prefix=/usr/local/openmpi-1.8.8-gpu_intel-15.0.7 CC=icc CXX=icpc F77=ifort FC=ifort --with-devel-headers --enable-binaries --with-cuda=/usr/local/cuda/
# make -j 16
# make install
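
Once the install completes, the build can be sanity-checked for CUDA awareness with ompi_info. The snippet below is a minimal sketch: it greps a captured sample line (the mpi_built_with_cuda_support parameter) so it runs anywhere; on the real host, pipe the live ompi_info output instead, as shown in the comment.

```shell
# Check that the OpenMPI build is CUDA-aware.  On the real host run:
#   /usr/local/openmpi-1.8.8-gpu_intel-15.0.7/bin/ompi_info --parsable --all | grep cuda_support
# Here a captured sample line stands in for the live output (assumption).
sample="mca:mpi:base:param:mpi_built_with_cuda_support:value:true"
echo "$sample" | grep -q ":value:true" \
  && echo "CUDA-aware build detected" \
  || echo "CUDA support missing"
```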

References:

  1. Compiling OpenMPI 1.6.5 with Intel 12.1.5 on CentOS 6
  2. Building OpenMPI Libraries for 64-bit integers

Install Nvidia CUDA-7.5 environment in CentOS 6

This note is derived from "How to install Nvidia CUDA environment in RHEL 6?"; the same steps work for me on CentOS 6.

Step 1: Install kernel development packages

# yum install kernel-devel kernel-headers -y

Step 2: Download and Install Cuda Toolkit

Cuda Downloads Site

Step 3: Configure CUDA Toolkit

# echo -e "/usr/local/cuda-7.5/lib64\n/usr/local/cuda-7.5/lib" > /etc/ld.so.conf.d/cuda.conf
# echo 'export PATH=/usr/local/cuda-7.5/bin:$PATH' > /etc/profile.d/cuda.sh
# /sbin/ldconfig
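
The two echo lines in Step 3 just drop small config files into place; their effect can be previewed against a scratch directory without touching /etc. A sketch (the paths mirror the real commands above):

```shell
# Dry run of Step 3 against a scratch directory so nothing under /etc
# is touched; the file contents mirror the real commands above.
scratch=$(mktemp -d)
echo -e "/usr/local/cuda-7.5/lib64\n/usr/local/cuda-7.5/lib" > "$scratch/cuda.conf"
# Single quotes keep $PATH unexpanded, so it is resolved at login time
# rather than frozen to the PATH of the shell that wrote the file.
echo 'export PATH=/usr/local/cuda-7.5/bin:$PATH' > "$scratch/cuda.sh"
cat "$scratch/cuda.conf" "$scratch/cuda.sh"
```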

Step 4: Disable Nouveau Driver

# echo -e "\nblacklist nouveau" >> /etc/modprobe.d/blacklist.conf
# dracut -f /boot/initramfs-`rpm -qa kernel --queryformat "%{PROVIDEVERSION}.%{ARCH}\n" | tail -1`.img `rpm -qa kernel --queryformat "%{PROVIDEVERSION}.%{ARCH}\n" | tail -1`
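
The backtick expression in the dracut line simply resolves to "<version>.<arch>" of the newest installed kernel. A small simulation with sample rpm output (the version strings are assumptions) shows what the command expands to:

```shell
# Simulate the embedded rpm query with sample output; on a real host it is:
#   rpm -qa kernel --queryformat "%{PROVIDEVERSION}.%{ARCH}\n" | tail -1
kver=$(printf '2.6.32-431.el6.x86_64\n2.6.32-573.el6.x86_64\n' | tail -1)
echo "dracut -f /boot/initramfs-${kver}.img ${kver}"
# -> dracut -f /boot/initramfs-2.6.32-573.el6.x86_64.img 2.6.32-573.el6.x86_64
```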

Step 5: Reboot the Server

Step 6: Check Supported Nvidia Card

[root@comp1 ~]# lspci -d "10de:*" -v
84:00.0 3D controller: NVIDIA Corporation GK110BGL [Tesla K40m] (rev a1)
        Subsystem: NVIDIA Corporation Device 097e
        Flags: bus master, fast devsel, latency 0, IRQ 64
        Memory at c9000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 3c400000000 (64-bit, prefetchable) [size=16G]
        Memory at 3c3fe000000 (64-bit, prefetchable) [size=32M]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] #19
        Kernel driver in use: nvidia
        Kernel modules: nvidia, nouveau, nvidiafb

Using nvidia-smi to get information on GPU Cards

NVIDIA’s System Management Interface (nvidia-smi) is a useful tool for monitoring and managing GPU cards. A few use cases are listed here.

1. Listing of NVIDIA GPU Cards

# nvidia-smi -L

GPU 0: Tesla M2070 (S/N: 03212xxxxxxxx)
GPU 1: Tesla M2070 (S/N: 03212yyyyyyyy)
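
For scripting (for example the ECC commands later in this note), the numeric GPU indices can be pulled out of the "-L" listing. A sketch using sample output modelled on the listing above, so it runs without a GPU:

```shell
# Extract the numeric GPU indices from "nvidia-smi -L" style output.
# Sample text stands in for the live command (nvidia-smi -L).
sample='GPU 0: Tesla M2070 (S/N: 03212xxxxxxxx)
GPU 1: Tesla M2070 (S/N: 03212yyyyyyyy)'
echo "$sample" | awk '{sub(":", "", $2); print $2}'
# -> 0 and 1, one per line
```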

2. Display GPU information

# nvidia-smi -i 0 -q

==============NVSMI LOG==============

Timestamp                       : Sun Jul 28 23:49:20 2013

Driver Version                  : 295.41

Attached GPUs                   : 2

GPU 0000:19:00.0
    Product Name                : Tesla M2070
    Display Mode                : Disabled
    Persistence Mode            : Disabled
    Driver Model
        Current                 : N/A
        Pending                 : N/A
    Serial Number               : 03212xxxxxxxx
    GPU UUID                    : GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx
    VBIOS Version               : 70.00.3E.00.03
    Inforom Version
        OEM Object              : 1.0
        ECC Object              : 1.0
        Power Management Object : 1.0
    PCI
        Bus                     : 0x19
        Device                  : 0x00
        Domain                  : 0x0000
        Device Id               : 0xxxxxxxxx
        Bus Id                  : 0000:19:00.0
        Sub System Id           : 0x083010DE
        GPU Link Info
            PCIe Generation
                Max             : 2
                Current         : 2
            Link Width
                Max             : 16x
                Current         : 16x
    Fan Speed                   : N/A
    Performance State           : P0
    Memory Usage
        Total                   : 6143 MB
        Used                    : 10 MB
        Free                    : 6132 MB
    Compute Mode                : Exclusive_Thread
    Utilization
        Gpu                     : 0 %
        Memory                  : 0 %
    Ecc Mode
        Current                 : Disabled
        Pending                 : Disabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
            Double Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
        Aggregate
            Single Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
            Double Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
    Temperature
        Gpu                     : N/A
    Power Readings
        Power Management        : N/A
        Power Draw              : N/A
        Power Limit             : N/A
    Clocks
        Graphics                : 573 MHz
        SM                      : 1147 MHz
        Memory                  : 1566 MHz
    Max Clocks
        Graphics                : 573 MHz
        SM                      : 1147 MHz
        Memory                  : 1566 MHz
    Compute Processes           : None
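
The "-q" dump is plain "Label : value" text, so individual fields are easy to scrape for monitoring scripts. A sketch that pulls the used-memory figure out of a two-line sample; on a real host, pipe `nvidia-smi -i 0 -q -d MEMORY` instead:

```shell
# Scrape the "Used" memory value from "nvidia-smi -q" style output.
sample='        Total                   : 6143 MB
        Used                    : 10 MB'
echo "$sample" | awk -F': ' '/Used/ {print $2}'
# -> 10 MB
```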

3. Display selected GPU Information (MEMORY, UTILIZATION, ECC, TEMPERATURE, POWER, CLOCK, COMPUTE, PIDS, PERFORMANCE)

# nvidia-smi -i 0 -q -d MEMORY,ECC

==============NVSMI LOG==============

Timestamp                       : Mon Jul 29 00:04:36 2013

Driver Version                  : 295.41

Attached GPUs                   : 2

GPU 0000:19:00.0
    Memory Usage
        Total                   : 6143 MB
        Used                    : 10 MB
        Free                    : 6132 MB
    Ecc Mode
        Current                 : Disabled
        Pending                 : Disabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
            Double Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
        Aggregate
            Single Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
            Double Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A

Turning off and on ECC RAM for NVIDIA GP-GPU Cards

From NVIDIA Developer site.

Turn off ECC (C2050 and later). ECC can cost you up to 10% in performance and hurts parallel scaling. Before attempting this, you should verify that your GPUs are working correctly and are not, for example, giving ECC errors. You can turn ECC off on Fermi-based cards and later by running the following command for each GPU ID as root, followed by a reboot.

Extensive testing of AMBER on a wide range of hardware has established that ECC has little to no benefit on the reliability of AMBER simulations. This is part of the reason it is acceptable (see recommended hardware) to use the GeForce gaming cards for AMBER simulations.

To turn off ECC RAM, run:

# nvidia-smi -g 0 --ecc-config=0
(repeat with -g x for each GPU ID)

To turn ECC RAM back on, run:

# nvidia-smi -g 0 --ecc-config=1
(repeat with -g x for each GPU ID)
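
Rather than repeating "-g x" by hand, the ECC setting can be applied to every card in a loop. The sketch below is a dry run: it echoes the commands instead of executing them, and derives the index list from sample "nvidia-smi -L" output (an assumption) so it runs without a GPU.

```shell
# Dry run: print the per-GPU ECC-off commands instead of executing them.
# On a real host, drop the echo and feed the loop from: nvidia-smi -L
sample='GPU 0: Tesla M2070 (S/N: ...)
GPU 1: Tesla M2070 (S/N: ...)'
for i in $(echo "$sample" | awk '{sub(":", "", $2); print $2}'); do
    echo "nvidia-smi -g $i --ecc-config=0"
done
```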

Compiling OpenMPI 1.7.2 with CUDA and Intel Compilers 13

If you intend to compile OpenMPI with CUDA support, note that you have to download the feature series of OpenMPI. The version I used is 1.7.2; the current stable version, OpenMPI 1.6.5, does not have CUDA support.

1. Download and unpack OpenMPI 1.7.2 (features)

# wget http://www.open-mpi.org/software/ompi/v1.7/downloads/openmpi-1.7.2.tar.gz
# tar -zxvf openmpi-1.7.2.tar.gz
# cd openmpi-1.7.2

2. Configure the OpenMPI with CUDA Support

# ./configure --prefix=/usr/local/openmpi-1.7.2-intel-cuda CC=icc CXX=icpc F77=ifort FC=ifort --with-cuda=/opt/cuda --with-cuda-libdir=/usr/lib64
# make -j 8
# make install

References:

  1. How do I build Open MPI with support for sending CUDA device memory?