Researchers Train Fluid Dynamics Neural Networks on Supercomputers

Fluid dynamics simulations are critical for applications ranging from wind turbine design to aircraft optimization. Running them as direct numerical simulations, however, is computationally costly. Many researchers instead turn to large-eddy simulations (LES), which generalize the motions of a given fluid in order to reduce the computational costs – but these generalizations lead to tradeoffs in accuracy. Now, researchers are using supercomputers at the High-Performance Computing Center Stuttgart (HLRS) to help make the more accurate simulations accessible to more researchers.

 

For more information, do take a look at Researchers Train Fluid Dynamics Neural Networks on Supercomputers

Intel turns to TSMC: another step towards fabless?

The recent news that Intel will turn to TSMC to mass produce CPU products signals a new era in the processor IDM/foundry arena. The production is slated to start in the second half of 2021 and will cover some of Intel’s low- and mid- tier CPU products. Yole Développement’s report “Computing for Datacenter Servers 2021” and “Processor Quarterly Market Monitor” cover the market space where these events are occurring. Meanwhile, speculation over Intel’s motivation is rampant, as are theories of what this means for the firm’s long-term strategy.

 

For more information, do take a look at Intel turns to TSMC: another step towards fabless?

CUDA driver version is insufficient for CUDA runtime version

When you run “/usr/local/cuda-10.1/extras/demo_suite/deviceQuery”, you might get the error shown in the title above:

[root@node1 ~]# /usr/local/cuda-10.1/extras/demo_suite/deviceQuery
/usr/local/cuda-10.1/extras/demo_suite/deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

This error can be confusing. In this case the cause was not the libraries but the power setting in the BIOS. Most servers are configured for a balanced power profile, but for GPGPU workloads you need to set the power profile to “Maximum Performance”. For example, on an HPE server, select “Static High Performance Mode”.
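Before changing BIOS settings, it may be worth ruling out a genuine driver/runtime mismatch. The sketch below is not from the original troubleshooting steps: it assumes a Linux host with GNU sort, takes 418.39 as the minimum driver for CUDA 10.1 (the figure given in NVIDIA's CUDA release notes, worth re-checking for your exact toolkit build), and uses a hypothetical helper name, driver_at_least.

```shell
#!/bin/sh
# Assumption: minimum Linux driver version for CUDA 10.1, per NVIDIA's release notes.
MIN_DRIVER="418.39"

# driver_at_least INSTALLED REQUIRED   (hypothetical helper)
# Succeeds when INSTALLED >= REQUIRED, using GNU sort's version ordering.
driver_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Usage on a GPU node (requires nvidia-smi on the PATH):
#   installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   if driver_at_least "$installed" "$MIN_DRIVER"; then
#       echo "driver is new enough; look at the BIOS power profile next"
#   else
#       echo "driver is older than $MIN_DRIVER; upgrade the driver first"
#   fi
```

If the driver passes this check and deviceQuery still fails, the BIOS power profile described above is the next thing to look at.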

How to unmount an NFS mount that fails to unmount with ‘device is busy’

If you are attempting to unmount an NFS mount and commands like the following fail with ‘device is busy’:

# mount -t nfs -o remount /mnt/nfs 
# umount /mnt/nfs 
# umount -f /mnt/nfs 
# umount -l /mnt/nfs 
# umount -lf /mnt/nfs

Identify which processes tied to the mount need to be killed by using lsof and fuser:

# lsof | grep /mnt/nfs

The lsof command above identifies the PIDs of the processes associated with the /mnt/nfs share. Kill any processes locking the stale mount.
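Picking the PIDs out of the lsof listing by eye gets tedious on a busy node. A minimal sketch of automating that step follows; pids_using_mount is a hypothetical helper, not part of lsof, and it assumes the default lsof column layout with the PID in the second column.

```shell
#!/bin/sh
# pids_using_mount MOUNTPOINT   (hypothetical helper)
# Reads lsof output on stdin and prints the unique PIDs (column 2) of
# processes that have MOUNTPOINT itself, or any path beneath it, open.
pids_using_mount() {
    awk -v mnt="$1" '
        NR > 1 {                          # skip the lsof header line
            for (i = 3; i <= NF; i++) {
                if ($i == mnt || index($i, mnt "/") == 1) {
                    print $2
                    break
                }
            }
        }' | sort -un
}

# Usage (as root); review the list before killing anything:
#   lsof | pids_using_mount /mnt/nfs
#   lsof | pids_using_mount /mnt/nfs | xargs -r kill    # -r is a GNU xargs option
```

Matching on the mount point followed by a slash keeps a neighbouring path such as /mnt/nfs2 out of the result. On systems with psmisc installed, fuser -vm /mnt/nfs should report the same processes.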

Try to force the unmount again after the processes have been killed:

# umount -lf /mnt/nfs

References:

  1. How to unmount a stale NFS mount that fails to unmount with ‘device is busy’ after network disconnectivity?

How AI Is Reshaping HPC And What This Means For Data Center Architects

In quarterly earnings reports this year, the CEO and founder of NVIDIA (a Liqid partner) noted that its recent advancements in delivering its new compute platform designed with AI in mind and its acquisition of a leading networking company this year are all designed to achieve the central goal of advancing what is increasingly known as data center-scale computing. For providers of high-performance computing solutions, both those built around NVIDIA’s tech and those that are competing with the GPU goliath, this need for data center-scale computing has been defined by and escalated alongside the data performance requirements of artificial intelligence and machine learning (AI+ML), something I discuss further in a recent article.

https://www.forbes.com/sites/forbestechcouncil/2021/01/19/how-ai-is-reshaping-hpc-and-what-this-means-for-data-center-architects/?sh=3dec4e4d7371

How to train a robot (using AI and supercomputers)

From Science Daily

Computer scientists developed a deep learning method to create realistic objects for virtual environments that can be used to train robots. The researchers used TACC’s Maverick2 supercomputer to train the generative adversarial network. The network is the first that can produce colored point clouds with fine details at multiple resolutions.

https://www.sciencedaily.com/releases/2021/01/210119194329.htm

RapidFile Toolkit 1.0

RapidFile Toolkit 1.0 (formerly PureTools) provides fast client-side alternatives for common Linux commands like ls, du, find, chown, chmod, rm and cp, which have been optimized for the high level of concurrency supported by FlashBlade NFS.

For CentOS/RHEL

# sudo rpm -U rapidfile-1.0.0-beta.5/rapidfile-1.0.0-beta.5-Linux.rpm

Examples:

Disk Usage:

% pdu -sh /scratch/user1

Copy Files:

% pcp -r -p -u /scratch/user1/ /backup/user1/

Remove Files:

% prm -rv /scratch/user1/

Change Ownership:

% pchown -Rv user1:usergroup /scratch/user1

Change Permissions:

% pchmod -Rv 755 /scratch/user1

References:

  1. RapidFile Toolkit for FlashBlade (PureTools)

Increasing NFS Performance by using nconnect

What is nconnect? nconnect enables multiple TCP connections for a single NFS mount. It is included in Linux kernel versions >= 5.3 and is easy to enable.

From the command line (the same options, rw,nconnect=16, can also go in the options field of an /etc/fstab entry):

mount -t nfs -o rw,nconnect=16 192.168.1.0:/applications /user/local
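To check whether a client kernel is new enough before adding the option, a version comparison along these lines may help. It is only a sketch: kernel_supports_nconnect is a hypothetical helper, and it compares the version string against 5.3 and nothing more, so a distribution kernel that backports nconnect would be reported as unsupported even though the option works there.

```shell
#!/bin/sh
# kernel_supports_nconnect [VERSION]   (hypothetical helper)
# Succeeds when VERSION (default: the running kernel, from uname -r) is 5.3 or newer.
kernel_supports_nconnect() {
    ver=${1:-$(uname -r)}
    major=${ver%%.*}              # text before the first dot, e.g. "5"
    rest=${ver#*.}                # text after the first dot, e.g. "4.0-42-generic"
    minor=${rest%%[!0-9]*}        # leading digits of the remainder, e.g. "4"
    [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "${minor:-0}" -ge 3 ]; }
}

# Usage:
#   if kernel_supports_nconnect; then
#       echo "kernel version is >= 5.3"
#   else
#       echo "kernel is older than 5.3; check whether your distribution backports nconnect"
#   fi
```

After mounting, grep nconnect /proc/mounts should show the option on the NFS entry if it took effect.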

For more information, do take a look at Session Trunking for NFS available in RHEL-8

References:

  1. Use nconnect to effortlessly increase NFS performance