July 28, 2013 by kittycool only

Using nvidia-smi to get information on GPU Cards

NVIDIA’s System Management Interface (nvidia-smi) is a useful tool for querying and controlling GPU cards. A few use cases are listed here.

1. Listing of NVIDIA GPU Cards

# nvidia-smi -L

GPU 0: Tesla M2070 (S/N: 03212xxxxxxxx)
GPU 1: Tesla M2070 (S/N: 03212yyyyyyyy)
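
If you need the GPU count or names in a script, the listing can be parsed with standard text tools. The following is only a minimal sketch: the function names count_gpus and list_gpu_names are made up for illustration, and it assumes every card is printed on one line in the "GPU <index>: <name> (S/N: ...)" form shown above. Other driver versions may print a different trailer, such as a UUID.

```shell
# Parse `nvidia-smi -L` style output read from standard input.
# Assumption: one line per card, "GPU <index>: <name> (<trailer>)".

# Print the number of GPU lines (hypothetical helper).
count_gpus() {
    grep -c '^GPU [0-9][0-9]*:'
}

# Print "<index> <name>" for each GPU line (hypothetical helper).
list_gpu_names() {
    sed -n 's/^GPU \([0-9][0-9]*\): \(.*\) (.*$/\1 \2/p'
}

# Sample input copied from the listing above, so the sketch runs without a GPU.
sample='GPU 0: Tesla M2070 (S/N: 03212xxxxxxxx)
GPU 1: Tesla M2070 (S/N: 03212yyyyyyyy)'

printf '%s\n' "$sample" | count_gpus
printf '%s\n' "$sample" | list_gpu_names
```

On a machine with the driver installed, the same functions would be fed from the real command, for example `nvidia-smi -L | count_gpus`.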

2. Display GPU information

# nvidia-smi -i 0 -q

==============NVSMI LOG==============

Timestamp : Sun Jul 28 23:49:20 2013

Driver Version : 295.41

Attached GPUs : 2

GPU 0000:19:00.0
    Product Name : Tesla M2070
    Display Mode : Disabled
    Persistence Mode : Disabled
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : 03212xxxxxxxx
    GPU UUID : GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx
    VBIOS Version : 70.00.3E.00.03
    Inforom Version
        OEM Object : 1.0
        ECC Object : 1.0
        Power Management Object : 1.0
    PCI
        Bus : 0x19
        Device : 0x00
        Domain : 0x0000
        Device Id : 0xxxxxxxxx
        Bus Id : 0000:19:00.0
        Sub System Id : 0x083010DE
        GPU Link Info
            PCIe Generation
                Max : 2
                Current : 2
            Link Width
                Max : 16x
                Current : 16x
    Fan Speed : N/A
    Performance State : P0
    Memory Usage
        Total : 6143 MB
        Used : 10 MB
        Free : 6132 MB
    Compute Mode : Exclusive_Thread
    Utilization
        Gpu : 0 %
        Memory : 0 %
    Ecc Mode
        Current : Disabled
        Pending : Disabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory : N/A
                Register File : N/A
                L1 Cache : N/A
                L2 Cache : N/A
                Total : N/A
            Double Bit
                Device Memory : N/A
                Register File : N/A
                L1 Cache : N/A
                L2 Cache : N/A
                Total : N/A
        Aggregate
            Single Bit
                Device Memory : N/A
                Register File : N/A
                L1 Cache : N/A
                L2 Cache : N/A
                Total : N/A
            Double Bit
                Device Memory : N/A
                Register File : N/A
                L1 Cache : N/A
                L2 Cache : N/A
                Total : N/A
    Temperature
        Gpu : N/A
    Power Readings
        Power Management : N/A
        Power Draw : N/A
        Power Limit : N/A
    Clocks
        Graphics : 573 MHz
        SM : 1147 MHz
        Memory : 1566 MHz
    Max Clocks
        Graphics : 573 MHz
        SM : 1147 MHz
        Memory : 1566 MHz
    Compute Processes : None

3. Display selected GPU information with -d (valid types: MEMORY, UTILIZATION, ECC, TEMPERATURE, POWER, CLOCK, COMPUTE, PIDS, PERFORMANCE)

# nvidia-smi -i 0 -q -d MEMORY,ECC

==============NVSMI LOG==============

Timestamp                       : Mon Jul 29 00:04:36 2013

Driver Version                  : 295.41

Attached GPUs                   : 2

GPU 0000:19:00.0
    Memory Usage
        Total                   : 6143 MB
        Used                    : 10 MB
        Free                    : 6132 MB
    Ecc Mode
        Current                 : Disabled
        Pending                 : Disabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
            Double Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
        Aggregate
            Single Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
            Double Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
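
The query output is plain "key : value" text, so a single figure can be pulled out with awk. The sketch below is illustrative only: mem_used_percent is a hypothetical helper name, and it assumes the "Memory Usage" section has the Total/Used/Free layout shown above and that a single GPU was selected with -i. With several GPUs in the input it would report only the last one.

```shell
# Read `nvidia-smi -q` style output on standard input and print the
# percentage of GPU memory in use, to one decimal place.
# Assumption: the "Memory Usage" section lists Total, Used, Free in MB.
mem_used_percent() {
    awk -F ':' '
        # Strip leading and trailing blanks from a string.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        {
            key = trim($1)
            val = trim($2)
        }
        # Only read Total/Used inside the Memory Usage section, because
        # the ECC section also has lines named Total.
        key == "Memory Usage" { in_mem = 1; next }
        in_mem && key == "Total" { total = val + 0 }
        in_mem && key == "Used"  { used = val + 0 }
        in_mem && key == "Free"  { in_mem = 0 }
        END {
            if (total > 0) printf "%.1f\n", 100 * used / total
            else exit 1
        }
    '
}

# Sample input taken from the output above, so the sketch runs without a GPU.
sample_query='Memory Usage
Total                   : 6143 MB
Used                    : 10 MB
Free                    : 6132 MB
ECC Errors
Total           : N/A'

printf '%s\n' "$sample_query" | mem_used_percent   # prints 0.2
```

On a live system the input would come from the command in this step, for example `nvidia-smi -i 0 -q -d MEMORY | mem_used_percent`.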

July 24, 2013 by kittycool only

Turning off and on ECC RAM for NVIDIA GP-GPU Cards

From the NVIDIA Developer site:

Turn off ECC (C2050 and later). ECC can cost you up to 10% in performance and hurts parallel scaling. Before attempting this, you should verify that your GPUs are working correctly and are not, for example, reporting ECC errors. On Fermi-based cards and later, you can turn ECC off by running the nvidia-smi command given further down for each GPU ID as root, followed by a reboot.

Extensive testing of AMBER on a wide range of hardware has established that ECC has little to no benefit on the reliability of AMBER simulations. This is part of the reason it is acceptable (see recommended hardware) to use the GeForce gaming cards for AMBER simulations.

To turn off ECC RAM, run:

# nvidia-smi -g 0 --ecc-config=0
(repeat with -g x for each GPU ID)

To turn ECC RAM back on, run:

# nvidia-smi -g 0 --ecc-config=1
(repeat with -g x for each GPU ID)
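
Repeating the command by hand for every GPU ID gets tedious on multi-GPU nodes. One possible approach, sketched below, is to generate the commands from the `nvidia-smi -L` listing and review them before running anything. The helper name ecc_commands is hypothetical, and the sketch reuses the -g and --ecc-config flags exactly as given in this post. Check them against the nvidia-smi help for your driver version, since option names have changed between releases.

```shell
# Print one ECC command per GPU listed on standard input (dry run only).
# $1 is the ECC setting: 0 to disable, 1 to enable.
# Assumption: input looks like `nvidia-smi -L` output ("GPU <index>: ...").
ecc_commands() {
    setting=$1
    case "$setting" in
        0|1) ;;
        *) echo "usage: ecc_commands 0|1" >&2; return 1 ;;
    esac
    # Keep only the numeric GPU index from each listing line.
    sed -n 's/^GPU \([0-9][0-9]*\):.*/\1/p' |
    while read -r id; do
        echo "nvidia-smi -g $id --ecc-config=$setting"
    done
}

# Sample listing, so the sketch runs without a GPU.
sample='GPU 0: Tesla M2070 (S/N: 03212xxxxxxxx)
GPU 1: Tesla M2070 (S/N: 03212yyyyyyyy)'

printf '%s\n' "$sample" | ecc_commands 0
```

The function only prints the commands. To apply them, you would run `nvidia-smi -L | ecc_commands 0 | sh` as root once the printed lines look right, and then reboot as described above.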

July 15, 2013 by kittycool only

Compiling OpenMPI 1.7.2 with CUDA and Intel Compilers 13

If you intend to compile OpenMPI with CUDA support, note that you have to download the feature release series of OpenMPI. I used version 1.7.2. The current stable version, OpenMPI 1.6.5, does not have CUDA support.

1. Download and unpack OpenMPI 1.7.2 (features)

# wget http://www.open-mpi.org/software/ompi/v1.7/downloads/openmpi-1.7.2.tar.gz
# tar -zxvf openmpi-1.7.2.tar.gz
# cd openmpi-1.7.2

2. Configure, build and install OpenMPI with CUDA support

# ./configure --prefix=/usr/local/openmpi-1.7.2-intel-cuda CC=icc CXX=icpc F77=ifort FC=ifort --with-cuda=/opt/cuda --with-cuda-libdir=/usr/lib64
# make -j 8
# make install
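
After installation, the new build still has to be found ahead of any other MPI on the system. A minimal sketch of the environment setup follows. It assumes the install prefix from the configure line above and a lib directory directly under that prefix. The variable name OMPI_PREFIX is chosen here for illustration only.

```shell
# Install prefix taken from the configure line above.
OMPI_PREFIX=/usr/local/openmpi-1.7.2-intel-cuda

# Put this build ahead of any other MPI installation on the search paths.
export PATH="$OMPI_PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Report which mpicc would be picked up, if one is present on this machine.
command -v mpicc || echo "mpicc not found on PATH" >&2
```

These lines could go into a shell profile or an environment module file so that every node in the cluster picks up the same build.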

References:

34. How do I build Open MPI with support for sending CUDA device memory?

The Linux Cluster

Linux Cluster Blog is a collection of how-to and tutorials for Linux Cluster and Enterprise Linux
