The NVIDIA Mellanox Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) takes advantage of the in-network computing capabilities in the NVIDIA Mellanox Quantum switch, dramatically improving the performance of distributed machine learning workloads.
Author: kittycool only
Listing processes for a specific user
Using top to list a user's processes, which is one of my favourites.
% top -U user1

pstree displays a tree of processes, including parent and child processes, which makes the relationships easier to understand.
% pstree -l -a -p -s user1

where
-l : Long format
-a : Show command line args
-p : Display Linux PIDs
-s : Show parents of the selected process
pgrep looks up processes based on name and other attributes.
% pgrep -l -u user1
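The same listing can also be scripted. What follows is a minimal sketch, assuming a POSIX shell and a procps-style ps; the helper names list_user_procs and count_user_procs are hypothetical and only serve the illustration.

```shell
# Hypothetical helper: print PID and command name for every process
# owned by the given user (defaults to the current user).
list_user_procs() {
    user="${1:-$(id -un)}"
    # "pid=" and "comm=" suppress the header line.
    ps -u "$user" -o pid= -o comm=
}

# Hypothetical helper: count that user's processes.
count_user_procs() {
    list_user_procs "$1" | wc -l | tr -d ' '
}

list_user_procs "$(id -un)"
```

Calling count_user_procs user1 would print the number of processes owned by user1, which is handy for a quick check before reaching for top or pstree.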

SC20 TOP500 Birds of a Feather (BoF)
The TOP500 list of supercomputers serves as a “Who’s Who” in the field of high-performance computing (HPC). It started as a list of the most powerful supercomputers in the world and has evolved into a major source of information about trends in HPC. The 56th TOP500 list was published in November 2020, just in time for SC20.
No rule to make target `/usr/include/sgidefs.h', needed by `surf.o'
I was compiling surf, an external program required by VMD, at $VMDHOME/vmd-1.9.4a48/lib/surf
% make depend
make: *** No rule to make target `/usr/include/sgidefs.h', needed by `surf.o'. Stop.
You will need makedepend, which on some releases is packaged as part of imake:
# yum install makedepend
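Before rerunning make depend, it can help to confirm that the tool is actually on the PATH. This is a minimal sketch, assuming a POSIX shell; the helper name check_tool is hypothetical.

```shell
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

check_tool makedepend
```

If the helper reports that makedepend is missing, the package installation did not put it on the PATH and make depend is likely to fail again.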
Installing CUDA Python
How to install CUDA Python followed by a tutorial on how to run a Python example on a GPU
Scalable multi-node training for AI jobs on NVIDIA DGX, OpenShift and Spectrum Scale
NVIDIA and IBM ran a complex proof of concept to demonstrate the scaling of AI workloads on NVIDIA DGX, Red Hat OpenShift and IBM Spectrum Scale, using ResNet-50 and image segmentation on the Audi A2D2 dataset as examples. The project team published an IBM Redpaper with all the technical details and will present the key learnings and results.
Tutorial on In-Network Computing SHARP Technology for MPI Offloads
In this video from the 2017 HPC Advisory Council Stanford Conference, Devendar Bureddy from Mellanox presents a Tutorial on In-Network Computing SHARP Technology for MPI Offloads.
How to prevent SSH from disconnecting
From Sharcnet HPC
Building the Future Today with HPC
At SC20, Intel’s Trish Damkroger, vice president and general manager of HPC at Intel, shows how Intel and its partners are building the future of HPC today through hardware and software technologies that accelerate the broad deployment of advanced HPC systems. (Credit: Intel Corporation)
NVIDIA SC20 Special Address