Ganglia and Gmond Python module for GPUs

If you are running a cluster with NVIDIA GPUs, there is now a Python module for monitoring them using the newly released Python bindings for NVML (NVIDIA Management Library). These bindings are released under the BSD license and allow simplified access to GPU metrics such as temperature, memory usage, and utilization.

Nvidia Developer – Ganglia Monitoring System
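
To give a feel for the kind of metrics the NVML bindings expose, here is a minimal sketch. It is not the Ganglia plug-in itself. It assumes the pynvml package is installed and an NVIDIA driver is present on the node; the function name read_gpu_metrics and the dictionary keys are my own choices for illustration.

```python
def read_gpu_metrics():
    """Return one dict of basic metrics per NVIDIA GPU visible to NVML."""
    # Assumption: the NVML Python bindings (pynvml) are installed.
    import pynvml

    pynvml.nvmlInit()
    try:
        metrics = []
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            memory = pynvml.nvmlDeviceGetMemoryInfo(handle)
            utilization = pynvml.nvmlDeviceGetUtilizationRates(handle)
            metrics.append({
                "index": index,
                "temperature_c": pynvml.nvmlDeviceGetTemperature(
                    handle, pynvml.NVML_TEMPERATURE_GPU),
                "memory_used_bytes": memory.used,
                "memory_total_bytes": memory.total,
                "gpu_utilization_percent": utilization.gpu,
            })
        return metrics
    finally:
        # Always release NVML, even if a query fails.
        pynvml.nvmlShutdown()
```

Calling read_gpu_metrics() on a GPU node should return a list with one entry per GPU; on a node without the driver, nvmlInit() raises an error.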

Graphite – highly scalable real-time graphing system

Graphite is an interesting project. If you wish to look at it in more depth, the official Graphite documentation is very comprehensive.

But some pointers could be useful.

Point 1: What is Graphite?

Graphite is a highly scalable real-time graphing system. As a user, you write an application that collects numeric time-series data that you are interested in graphing, and send it to Graphite’s processing backend, carbon, which stores the data in Graphite’s specialized database. The data can then be visualized through graphite’s web interfaces.

Graphite 1.2.0 Documentation

Point 2: Architecture

Graphite consists of 3 software components:

  1. carbon – a Twisted daemon that listens for time-series data
  2. whisper – a simple database library for storing time-series data (similar in design to RRD)
  3. graphite webapp – A Django webapp that renders graphs on-demand using Cairo

Point 3: Who should be using Graphite?

Anybody who would want to track values of anything over time. If you have a number that could potentially change over time, and you might want to represent the value over time on a graph, then Graphite can probably meet your needs.

Specifically, Graphite is designed to handle numeric time-series data. For example, Graphite would be good at graphing stock prices because they are numbers that change over time. Whether it’s a few data points, or dozens of performance metrics from thousands of servers, then Graphite is for you. As a bonus, you don’t necessarily know the names of those things in advance (who wants to maintain such huge configuration?); you simply send a metric name, a timestamp, and a value, and Graphite takes care of the rest!

Graphite 1.2.0 Documentation
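
As a rough illustration of "send a metric name, a timestamp, and a value", here is a minimal sketch using carbon's plaintext protocol, where each line has the form "name value timestamp". The host localhost, the default plaintext port 2003, and the metric name in the test are assumptions; adjust them to your own carbon setup.

```python
import socket
import time


def format_metric(name, value, timestamp=None):
    """Build one plaintext-protocol line: name, value, timestamp, newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{name} {value} {int(timestamp)}\n"


def send_metric(line, host="localhost", port=2003):
    """Send an already formatted line to carbon over TCP."""
    # Assumption: carbon-cache listens for plaintext metrics on host:port.
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

Something like send_metric(format_metric("servers.node1.load", 0.42)) would push a single data point, and Graphite creates the metric on first sight.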

Point 4: Tools

Ganglia, a tool used by many High Performance Computing (HPC) clusters worldwide, can be integrated with Graphite. Other tools that work with Graphite are listed in the Graphite documentation.

Point 5: Get the book…..

3 Tenets of Monitoring and Approach to IT Monitoring

I read the book Monitoring with Graphite, published by O'Reilly. It is a good read and I recommend reading it in full. I'm just penning my own thoughts here.

The author mentions something interesting that I had not really thought about: monitoring can be divided into 3 main categories:

  1. Fault Detection
  2. Alerting
  3. Capacity Planning

Fault Detection

Fault detection identifies when a resource becomes unavailable or starts to perform poorly. Traditionally, system administrators employ thresholds to recognise the delta in a system’s behaviour.
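
A threshold check can be sketched in a few lines. This is only an illustration of the idea, not how any particular monitoring tool implements it; the function name and the sample data format (timestamp, value) are assumptions.

```python
def detect_faults(samples, threshold):
    """Return the (timestamp, value) samples whose value exceeds the threshold."""
    return [(ts, value) for ts, value in samples if value > threshold]
```

For example, with disk usage samples expressed as fractions and a threshold of 0.9, only the samples above 90% are reported.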

Alerting

Alerting happens the moment the monitoring system identifies a fault: the recipient(s) are notified through some means, such as email or SMS, so that they can take further action.

Capacity Planning

Capacity planning is the ability to study trends in the data and use that knowledge to make informed decisions about adding capacity now or in the near future. You can use Graphite to work on the time-series data.
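
One simple way to study a trend is a straight-line fit. The sketch below is my own illustration under that assumption (real usage is rarely perfectly linear): it fits a least-squares line through (time, value) points and estimates when the line reaches a given capacity. It needs at least two points with distinct timestamps.

```python
def linear_trend(points):
    """Least-squares fit of value = slope * t + intercept over (t, value) points."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    cov = sum((t - mean_t) * (v - mean_v) for t, v in points)
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


def time_until_full(points, capacity):
    """Estimate the t at which the fitted line reaches capacity.

    Returns None when usage is flat or shrinking.
    """
    slope, intercept = linear_trend(points)
    if slope <= 0:
        return None
    return (capacity - intercept) / slope
```

The points could come from any exported time series, for example daily disk usage readings.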

Pull and Push Model

Pull Model – The traditional approach to IT monitoring centers around a polling agent spending resources to connect to remote hosts or appliances to determine their current status. However, the traditional pull method has limitations in integrating trending and monitoring, and different software stacks are often required.

Push Model – Metrics are pushed from the sources to a unified storage repository, providing a consolidated set of data to drive both IT responses and business decisions. The advantage is that collection tasks are decentralised, so we no longer need to scale our collection system vertically as the architecture grows horizontally. One of the interesting aspects of the push model is that we can isolate the functional responsibilities of the monitoring system.

chsh -s /bin/tcsh and you (user) don’t exist error

Sometimes, as a non-root user, you try to change your shell and get an error:

$ chsh -s /bin/tcsh
chsh you (user xxxxxxxxx) don't exist

This error occurs when the user ID and password come from LDAP or Active Directory, so there is no local account in /etc/passwd, which is where chsh looks first. I use Centrify, where the default shell environment can be configured on AD. But there is a simple workaround if you do not want to bother your system administrator.
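
If you want to confirm that a missing local entry is the cause, a small check like the sketch below can help. It reads the passwd file directly rather than going through the name service, so directory-only accounts are not found. The function name and the passwd_path parameter are my own for illustration.

```python
def has_local_account(username, passwd_path="/etc/passwd"):
    """Return True if username has an entry in the local passwd file."""
    with open(passwd_path, encoding="utf-8") as passwd:
        for line in passwd:
            # Each entry looks like name:password:UID:GID:GECOS:home:shell
            if line.split(":", 1)[0] == username:
                return True
    return False
```

If has_local_account("your_login") returns False while you can still log in, your account is most likely served from the directory.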

Step 1: Check that tcsh is installed. I have it!

$ chsh -l
/bin/bash
/bin/cdax/bash
/bin/cdax/csh
/bin/cdax/ksh
/bin/cdax/rksh
/bin/cdax/sh
/bin/cdax/tcsh
/bin/csh
/bin/ksh
/bin/rksh
/bin/sh
/bin/tcsh
/sbin/nologin
/usr/bin/bash
/usr/bin/cdax/bash
/usr/bin/cdax/dzsh
/usr/bin/cdax/sh
/usr/bin/dzsh
/usr/bin/sh
/usr/sbin/nologin
/usr/bin/tmux

Step 2: Check your current shell

$ echo "$SHELL"
/bin/bash

Step 3: Write a simple .profile file

$ vim ~/.profile
if [ "$SHELL" != "/bin/tcsh" ]
then
    export SHELL="/bin/tcsh"
    exec /bin/tcsh -l    # -l: login shell again
fi

Step 4: In your .bashrc, add the line “source ~/.profile”

# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi

source ~/.profile

Step 5: Source the .bashrc again

$ source ~/.bashrc

Basic Introduction to Git (Part 2)

Initialising a Repository in an Existing Directory

If you wish to put a project directory under version control with Git, do the following:

$ cd /home/user/my_project
$ git init

If you wish to add existing files to version control:

$ git add *.sh
$ git add LICENSE
$ git commit -m "Gekko Menu Help Application"
[master (root-commit) c98ae91] Gekko Menu Help Application
 1 file changed, 73 insertions(+)
 create mode 100755 mymenu.sh

You have an initial commit and tracked files. Hooray.

Checking the status of your Files

[user1@node1 menu]$ git status
# On branch master
nothing to commit, working directory clean

This means you have a clean working directory; in other words, none of your tracked files are modified.

Adding new files to your Git Directory

Let’s say you added a new file called check_license_abaqus.sh to the project directory. Running git status will then show something like

# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       check_license_abaqus.sh
nothing added to commit but untracked files present (use "git add" to track)

To add files

[user1@node1 menu]$ git add check_license_abaqus.sh
[user1@node1 menu]$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       new file:   check_license_abaqus.sh
#

To remove the file (the -f flag forces the removal because the file is staged but not yet committed)

[user1@node1 menu]$ git rm -f check_license_abaqus.sh
rm 'check_license_abaqus.sh'
[user1@node1 menu]$ git status
# On branch master
nothing to commit, working directory clean

To see the commit log, use the command

[user1@node1 menu]$ git log
commit xxxxxxxxx
Author: user1 <myemail_used_in_Github@hotmail.com>
Date:   Sun Sep 25 23:50:33 2022 +0800

    Gekko Menu Help Application

There are many more options. For more information, do take a look at 2.3 Git Basics – Viewing the Commit History

References:

  1. 2.2 Git Basics – Recording Changes to the Repository
  2. 2.3 Git Basics – Viewing the Commit History
  3. 2.4 Git Basics – Undoing Things

Intel OpenVINO 2022.2 is available

Key updates include:

Broader Model & Hardware Support

  • Preview support for upcoming Intel® processors, including the Intel® Data Center GPU Flex Series and Intel® Arc™ GPU
  • Support for 4th Gen Intel® Xeon Scalable processor (code named Sapphire Rapids)
  • Reduced memory consumption when using dynamic shapes on CPU to improve efficiency of NLP applications

Portability and Performance

Introducing new performance hint “Cumulative throughput” in AUTO device plug-in, enabling multiple accelerators (e.g. multiple GPUs) to be used at once maximizing inferencing performance.

To download the latest release, do take a look at Intel® Distribution of OpenVINO™ Toolkit

Basic Introduction to GitHub (Part 1)

GitHub is the largest code-hosting platform in the world. It uses Git for version control, and the repositories are hosted on GitHub. Features such as Pull Requests, Project Boards and GitHub Actions are central and found in one place.

Sign up for a Free Account

To start using GitHub, please go to https://github.com/join and follow the instructions

Creating a PAT or SSH Key

A PAT (Personal Access Token) is a string of characters that can be used in place of a password against the GitHub API and on the command line.
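
As a rough sketch of what "in place of a password against the GitHub API" means, the token is sent in the Authorization header of each request. The example below only builds the request and does not send it; the function name and the placeholder token in the usage note are hypothetical.

```python
import urllib.request


def build_github_request(token, url="https://api.github.com/user"):
    """Build (but do not send) a GitHub API request authenticated with a PAT."""
    request = urllib.request.Request(url)
    # The PAT takes the place of a password in the Authorization header.
    request.add_header("Authorization", f"Bearer {token}")
    request.add_header("Accept", "application/vnd.github+json")
    return request
```

Passing the result to urllib.request.urlopen() would perform the call. In practice, read the token from an environment variable or a credential store rather than hard-coding it.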

You may need to understand the various scopes on GitHub such as repo, admin:repo_hook, user etc. For more information, do take a look at https://docs.github.com/en/developers/apps/building-oauth-apps/scopes-for-oauth-apps#available-scopes

For starters, you may want to go to https://github.com/settings/tokens and click on Generate new token

On Linux, you can generate your SSH key using the email address registered with your GitHub user account

[user1@node1 ~]$ ssh-keygen -t rsa -C "myemail_used_in_Github@hotmail.com"
Generating public/private rsa key pair.
Enter file in which to save the key (/home/user1/.ssh/id_rsa):
/home/user1/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/user1/.ssh/id_rsa.
Your public key has been saved in /home/user1/.ssh/id_rsa.pub.
The key fingerprint is:
........
........

Adding the SSH Key to the ssh-agent

Although this is not mandatory, adding the SSH key to the ssh-agent is good practice. The ssh-agent is an SSH key manager that helps to keep your SSH keys safe because it protects them from being exported. It also saves you from having to type the passphrase you created every time your SSH key is used.

Before adding the key, check your ~/.ssh/config first

$ vim ~/.ssh/config
Host * 
AddKeysToAgent yes

At the Terminal,

$ ssh-add ~/.ssh/id_rsa

Copy your SSH public key into the SSH key field in your GitHub account settings. The public key is the file in your ~/.ssh directory with a .pub extension, such as id_rsa.pub

Configuring Git

To set up Git for first use, do the following. You may want to take a look at 1.6 Getting Started – First-Time Git Setup

[user1@node1 ~]$ git config --global user.name "user1"
[user1@node1 ~]$ git config --global user.email "myemail_used_in_Github@hotmail.com"
[user1@node1 ~]$ git config --global init.defaultBranch main
[user1@node1 ~]$ git config --list
credential.helper=netrc -f ~/.netrc.gpg -v
user.name=user1
user.email=myemail_used_in_Github@hotmail.com
init.defaultbranch=main

To continue, see Basic Introduction to GitHub (Part 2)

References:

  1. 1.5 Getting Started – Installing Git
  2. 1.6 Getting Started – First-Time Git Setup

Basic Commands for Mellanox Network Switches for Break-out-Ports

More information can be found at Command Line Interface (CLI)

Point 1: To configure Break-Out

> enable
# configure terminal
# interface ethernet ?
R2-R8-LEAF01 [standalone: master] (config) # interface ethernet ?
<Device/Port>[-<Device/Port>]
1/1/1
1/1/2
1/1/3
1/1/4
1/3/1
1/3/2
1/3/3
1/3/4
1/5/1
1/5/2
1/5/3
1/5/4
1/7/1
1/7/2
1/7/3
1/7/4
1/9/1
1/9/2
1/9/3
1/9/4
.....
.....
1/25
1/26
1/27
1/28
1/29
1/30
1/31
1/32
# interface ethernet 1/25 shutdown
# interface ethernet 1/26 shutdown
# interface ethernet 1/25
# (config interface ethernet 1/25) # module-type qsfp-split-4 force

The resulting interface will become

Ethernet 1/25/1
Ethernet 1/25/2
Ethernet 1/25/3
Ethernet 1/25/4
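
The naming pattern of the split interfaces can be sketched as follows. This is only an illustration of the naming shown above, handy when generating configuration for many ports; the function name is my own and it does not talk to the switch.

```python
def split_port_names(port, lanes=4):
    """Return the interface names a port gets after being split into lanes."""
    return [f"{port}/{lane}" for lane in range(1, lanes + 1)]
```

For port 1/25 split four ways, this yields 1/25/1 through 1/25/4.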

Speed can then be configured on each of the resulting interfaces:

# interface ethernet 1/25/1
# speed 25G

Using Ethtool to query Network and Driver Information

Ethtool is a utility for configuration of Network Interface Cards (NICs). This utility allows querying and changing settings such as speed, port, auto-negotiation, PCI locations and checksum offload on many network devices, especially Ethernet devices.

1. Query the specified network device for associated driver information

# ethtool -i ens3f1np1
driver: mlx5_core
version: 5.7-1.0.2
firmware-version: 16.34.1002 (MT_0000000416)
expansion-rom-version:
bus-info: 0000:0f:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
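
If you need this information in a script, the "key: value" output is easy to parse. The sketch below assumes you have already captured the text of ethtool -i (for example with subprocess); the function name is my own.

```python
def parse_ethtool_info(output):
    """Parse the 'key: value' lines of `ethtool -i` output into a dict."""
    info = {}
    for line in output.splitlines():
        # Split on the first colon only, since values such as bus-info contain colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info
```

With the output above, info["driver"] would be mlx5_core, and empty fields such as expansion-rom-version come back as empty strings.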

2. Enable an operator to easily identify the adapter by sight.
This involves blinking one or more LEDs on the specified network port.

# ethtool -p ens3f1np1 5

where the integer 5 is the duration of the action in seconds.

3. Turn off auto-negotiation and fix the speed at 25 Gb/s

# ethtool -s ens3f1np1 speed 25000 autoneg off duplex full

References:

Red Hat Documentation 11.8. Ethtool