Compiling OpenMPI-1.8.8 with Intel Compiler and CUDA

The configuration parameters I used for compiling OpenMPI-1.8.8 with the Intel Compiler and CUDA are shown below.

# ./configure --prefix=/usr/local/openmpi-1.8.8-gpu_intel-15.0.7 CC=icc CXX=icpc F77=ifort FC=ifort --with-devel-headers --enable-binaries --with-cuda=/usr/local/cuda/
# make -j 16
# make install
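
Once the install completes, the build can be sanity-checked for CUDA awareness with ompi_info. The snippet below is a minimal sketch: it greps a captured sample line (the mpi_built_with_cuda_support parameter) so it runs anywhere; on the real host, pipe the live ompi_info output instead, as shown in the comment.

```shell
# Check that the OpenMPI build is CUDA-aware.  On the real host run:
#   /usr/local/openmpi-1.8.8-gpu_intel-15.0.7/bin/ompi_info --parsable --all | grep cuda_support
# Here a captured sample line stands in for the live output (assumption).
sample="mca:mpi:base:param:mpi_built_with_cuda_support:value:true"
echo "$sample" | grep -q ":value:true" \
  && echo "CUDA-aware build detected" \
  || echo "CUDA support missing"
```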

References:

  1. Compiling OpenMPI 1.6.5 with Intel 12.1.5 on CentOS 6
  2. Building OpenMPI Libraries for 64-bit integers

Install Nvidia CUDA-7.5 environment in CentOS 6

This note is derived from "How to install Nvidia CUDA environment in RHEL 6?"; the same steps work for me on CentOS 6.

Step 1: Install kernel development packages

# yum install kernel-devel kernel-headers -y

Step 2: Download and Install Cuda Toolkit

Cuda Downloads Site

Step 3: Configure CUDA Toolkit

# echo -e "/usr/local/cuda-7.5/lib64\n/usr/local/cuda-7.5/lib" > /etc/ld.so.conf.d/cuda.conf
# echo 'export PATH=/usr/local/cuda-7.5/bin:$PATH' > /etc/profile.d/cuda.sh
# /sbin/ldconfig
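
The two echo lines in Step 3 just drop small config files into place; their effect can be previewed against a scratch directory without touching /etc. A sketch (the paths mirror the real commands above):

```shell
# Dry run of Step 3 against a scratch directory so nothing under /etc
# is touched; the file contents mirror the real commands above.
scratch=$(mktemp -d)
echo -e "/usr/local/cuda-7.5/lib64\n/usr/local/cuda-7.5/lib" > "$scratch/cuda.conf"
# Single quotes keep $PATH unexpanded, so it is resolved at login time
# rather than frozen to the PATH of the shell that wrote the file.
echo 'export PATH=/usr/local/cuda-7.5/bin:$PATH' > "$scratch/cuda.sh"
cat "$scratch/cuda.conf" "$scratch/cuda.sh"
```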

Step 4: Disable Nouveau Driver

# echo -e "\nblacklist nouveau" >> /etc/modprobe.d/blacklist.conf
# dracut -f /boot/initramfs-`rpm -qa kernel --queryformat "%{PROVIDEVERSION}.%{ARCH}\n" | tail -1`.img `rpm -qa kernel --queryformat "%{PROVIDEVERSION}.%{ARCH}\n" | tail -1`
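
The backtick expression in the dracut line simply resolves to "<version>.<arch>" of the newest installed kernel. A small simulation with sample rpm output (the version strings are assumptions) shows what the command expands to:

```shell
# Simulate the embedded rpm query with sample output; on a real host it is:
#   rpm -qa kernel --queryformat "%{PROVIDEVERSION}.%{ARCH}\n" | tail -1
kver=$(printf '2.6.32-431.el6.x86_64\n2.6.32-573.el6.x86_64\n' | tail -1)
echo "dracut -f /boot/initramfs-${kver}.img ${kver}"
# -> dracut -f /boot/initramfs-2.6.32-573.el6.x86_64.img 2.6.32-573.el6.x86_64
```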

Step 5: Reboot the Server

Step 6: Check Supported Nvidia Card

[root@comp1 ~]# lspci -d "10de:*" -v
84:00.0 3D controller: NVIDIA Corporation GK110BGL [Tesla K40m] (rev a1)
        Subsystem: NVIDIA Corporation Device 097e
        Flags: bus master, fast devsel, latency 0, IRQ 64
        Memory at c9000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 3c400000000 (64-bit, prefetchable) [size=16G]
        Memory at 3c3fe000000 (64-bit, prefetchable) [size=32M]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] #19
        Kernel driver in use: nvidia
        Kernel modules: nvidia, nouveau, nvidiafb

Using nvidia-smi to get information on GPU Cards

NVIDIA’s System Management Interface (nvidia-smi) is a useful tool for monitoring and managing GPU cards. A few use cases are listed here.

1. Listing of NVIDIA GPU Cards

# nvidia-smi -L

GPU 0: Tesla M2070 (S/N: 03212xxxxxxxx)
GPU 1: Tesla M2070 (S/N: 03212yyyyyyyy)
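
For scripting (for example the ECC commands later in this note), the numeric GPU indices can be pulled out of the "-L" listing. A sketch using sample output modelled on the listing above, so it runs without a GPU:

```shell
# Extract the numeric GPU indices from "nvidia-smi -L" style output.
# Sample text stands in for the live command (nvidia-smi -L).
sample='GPU 0: Tesla M2070 (S/N: 03212xxxxxxxx)
GPU 1: Tesla M2070 (S/N: 03212yyyyyyyy)'
echo "$sample" | awk '{sub(":", "", $2); print $2}'
# -> 0 and 1, one per line
```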

2. Display GPU information

# nvidia-smi -i 0 -q

==============NVSMI LOG==============

Timestamp                       : Sun Jul 28 23:49:20 2013

Driver Version                  : 295.41

Attached GPUs                   : 2

GPU 0000:19:00.0
    Product Name                : Tesla M2070
    Display Mode                : Disabled
    Persistence Mode            : Disabled
    Driver Model
        Current                 : N/A
        Pending                 : N/A
    Serial Number               : 03212xxxxxxxx
    GPU UUID                    : GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx
    VBIOS Version               : 70.00.3E.00.03
    Inforom Version
        OEM Object              : 1.0
        ECC Object              : 1.0
        Power Management Object : 1.0
    PCI
        Bus                     : 0x19
        Device                  : 0x00
        Domain                  : 0x0000
        Device Id               : 0xxxxxxxxx
        Bus Id                  : 0000:19:00.0
        Sub System Id           : 0x083010DE
        GPU Link Info
            PCIe Generation
                Max             : 2
                Current         : 2
            Link Width
                Max             : 16x
                Current         : 16x
    Fan Speed                   : N/A
    Performance State           : P0
    Memory Usage
        Total                   : 6143 MB
        Used                    : 10 MB
        Free                    : 6132 MB
    Compute Mode                : Exclusive_Thread
    Utilization
        Gpu                     : 0 %
        Memory                  : 0 %
    Ecc Mode
        Current                 : Disabled
        Pending                 : Disabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
            Double Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
        Aggregate
            Single Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
            Double Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
    Temperature
        Gpu                     : N/A
    Power Readings
        Power Management        : N/A
        Power Draw              : N/A
        Power Limit             : N/A
    Clocks
        Graphics                : 573 MHz
        SM                      : 1147 MHz
        Memory                  : 1566 MHz
    Max Clocks
        Graphics                : 573 MHz
        SM                      : 1147 MHz
        Memory                  : 1566 MHz
    Compute Processes           : None
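
The "-q" dump is plain "Label : value" text, so individual fields are easy to scrape for monitoring scripts. A sketch that pulls the used-memory figure out of a two-line sample; on a real host, pipe `nvidia-smi -i 0 -q -d MEMORY` instead:

```shell
# Scrape the "Used" memory value from "nvidia-smi -q" style output.
sample='        Total                   : 6143 MB
        Used                    : 10 MB'
echo "$sample" | awk -F': ' '/Used/ {print $2}'
# -> 10 MB
```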

3. Display selected GPU Information (MEMORY, UTILIZATION, ECC, TEMPERATURE, POWER, CLOCK, COMPUTE, PIDS, PERFORMANCE)

# nvidia-smi -i 0 -q -d MEMORY,ECC

==============NVSMI LOG==============

Timestamp                       : Mon Jul 29 00:04:36 2013

Driver Version                  : 295.41

Attached GPUs                   : 2

GPU 0000:19:00.0
    Memory Usage
        Total                   : 6143 MB
        Used                    : 10 MB
        Free                    : 6132 MB
    Ecc Mode
        Current                 : Disabled
        Pending                 : Disabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
            Double Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
        Aggregate
            Single Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A
            Double Bit
                Device Memory   : N/A
                Register File   : N/A
                L1 Cache        : N/A
                L2 Cache        : N/A
                Total           : N/A

Turning off and on ECC RAM for NVIDIA GP-GPU Cards

From NVIDIA Developer site.

Turn off ECC (C2050 and later). ECC can cost you up to 10% in performance and hurts parallel scaling. Before attempting this, you should verify that your GPUs are working correctly and are not, for example, giving ECC errors. You can turn ECC off on Fermi-based cards and later by running the following command for each GPU ID as root, followed by a reboot.

Extensive testing of AMBER on a wide range of hardware has established that ECC has little to no benefit on the reliability of AMBER simulations. This is part of the reason it is acceptable (see recommended hardware) to use the GeForce gaming cards for AMBER simulations.

To turn off ECC RAM, run:

# nvidia-smi -g 0 --ecc-config=0
(repeat with -g x for each GPU ID)

To turn ECC RAM back on, run:

# nvidia-smi -g 0 --ecc-config=1
(repeat with -g x for each GPU ID)
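
Rather than repeating "-g x" by hand, the ECC setting can be applied to every card in a loop. The sketch below is a dry run: it echoes the commands instead of executing them, and derives the index list from sample "nvidia-smi -L" output (an assumption) so it runs without a GPU.

```shell
# Dry run: print the per-GPU ECC-off commands instead of executing them.
# On a real host, drop the echo and feed the loop from: nvidia-smi -L
sample='GPU 0: Tesla M2070 (S/N: ...)
GPU 1: Tesla M2070 (S/N: ...)'
for i in $(echo "$sample" | awk '{sub(":", "", $2); print $2}'); do
    echo "nvidia-smi -g $i --ecc-config=0"
done
```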

Compiling OpenMPI 1.7.2 with CUDA and Intel Compilers 13

If you intend to compile OpenMPI with CUDA support, note that you have to download the feature series of OpenMPI. The version I used is 1.7.2; the current stable version, OpenMPI 1.6.5, does not have CUDA support.

1. Download and unpack OpenMPI 1.7.2 (features)

# wget http://www.open-mpi.org/software/ompi/v1.7/downloads/openmpi-1.7.2.tar.gz
# tar -zxvf openmpi-1.7.2.tar.gz
# cd openmpi-1.7.2

2. Configure the OpenMPI with CUDA Support

# ./configure --prefix=/usr/local/openmpi-1.7.2-intel-cuda CC=icc CXX=icpc F77=ifort FC=ifort --with-cuda=/opt/cuda --with-cuda-libdir=/usr/lib64
# make -j 8
# make install

References:

  1. How do I build Open MPI with support for sending CUDA device memory?