Nvidia and IBM ran a complex proof-of-concept to demonstrate scaling of AI workloads with Nvidia DGX, Red Hat OpenShift and IBM Spectrum Scale, using ResNet-50 and image segmentation on the Audi A2D2 dataset as examples. The project team published an IBM Redpaper with all the technical details and will present the key learnings and results.
- How NFS exports became more dynamic with Spectrum Scale 5.0.2
- HPC storage on AWS (IBM Spectrum Scale)
- Upgrade with Excluding the node(s) using Install-toolkit
- Offline upgrade using Install-toolkit
- IBM Spectrum Scale for Linux on IBM Z – What’s new in IBM Spectrum Scale 5.0.2
- What’s New in IBM Spectrum Scale 5.0.2
- Starting IBM Spectrum Scale 5.0.2 release, the installation toolkit supports upgrade rerun if fresh upgrade fails.
- IBM Spectrum Scale installation toolkit enhancements over releases
- Announcing HDP 3.0 support with IBM Spectrum Scale
- IBM Spectrum Scale Tuning Overview for Hadoop Workload
- Making the Most of Multicloud Storage
- Disaster Recovery for Transparent Cloud Tiering using SOBAR
- Your Optimal Choice of AI Storage for Today and Tomorrow
- Analyze IBM Spectrum Scale File Access Audit with ELK Stack
- Mellanox SX1710 40G switch MLAG configuration for IBM ESS
- Protocol Problem Determination Guide for IBM Spectrum Scale SMB and NFS Access issues
- Access Control in IBM Spectrum Scale Object
- IBM Spectrum Scale HDFS Transparency Docker support
- Protocol Problem Determination Guide for IBM Spectrum Scale Log Collection
mmlspdisk lists information for one or more GPFS Native RAID pdisks. To check for faulty disks, run the following commands:
# mmlspdisk all --not-ok
mmlspdisk: [I] No disks were found.

# mmlspdisk all --replace
mmlspdisk: [I] No disks were found.
We encountered a situation where a defunct disk was still accepting I/O requests but did not return any failure in time. As a result, these I/O requests hung there until they timed out (default 10 seconds). Normally, when Spectrum Scale/GPFS fails to read or write a disk, the failure is written to the log and I/O is shifted to other available disks, which should be quick.
Normally such an operation should return in 20 milliseconds or less. With an I/O timeout, a single request takes 10 seconds / 20 milliseconds = 500 times as long. Even if Spectrum Scale/GPFS picks a fast disk on the second attempt, the operation is far slower than normal.
Because of striping, a bad or slow disk affects the I/O of many files, far more than it would without striping. I/O on a single file is spread across several disks, and the operation has to wait for the slowest request to return. So a single bad or slow disk can have a considerable impact on overall Spectrum Scale/GPFS performance.
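If a disk is suspected to be slow rather than failed outright, the recent I/O history on the NSD server owning it can show which disks take unusually long to complete requests. A quick check (run on the NSD server in question):
# mmdiag --iohist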
Before restarting the NSD nodes, quorum/manager nodes, or other critical nodes, run the following checks first to make sure the file system is in a healthy state.
1. Make sure all three quorum nodes are active.
# mmgetstate -N quorumnodes
If any quorum node is not active, do *not* proceed.
2. Make sure the file system is mounted on the nodes.
# mmlsmount gpfs0
If the file system is not mounted somewhere it should be, try to resolve that first before restarting.
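If you prefer to script these pre-restart checks, here is a minimal sketch; it assumes the file system is named gpfs0 as in this example and that the GPFS commands live in the default /usr/lpp/mmfs/bin path.

#!/bin/bash
# Pre-restart sanity checks (sketch only; adjust names to your cluster).
BIN=/usr/lpp/mmfs/bin

# 1. Show the GPFS state of all quorum nodes; every node should report "active".
$BIN/mmgetstate -N quorumnodes

# 2. Show which nodes currently have the file system mounted.
$BIN/mmlsmount gpfs0 -L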
There were good and varied topics discussed at the Spectrum Scale User Group:
- Opening & welcome – Simon Thompson, Claire O’Toole, Ted Hoover
- Update Scale (video)/ ESS / Support (video) – Mathias Dietz & Chris Maestas
- MultiCloud Transparent Cloud Tiering (video) – Rob Basham
- Shared NVMe for High Performance Spectrum Scale Clusters (video)- Stuart Campbell
- User Talk – EBI MMAP issues (video – both speakers) – Jordi Valls / Sven Oehme
- GxFS Storage Appliance at Karlsruher Institute of Technology (video) – Jan Erik Sundermann
- Tooling Scale – Automation
- R&S VSA (Virtual storage access) Reliable fault tolerant storage in Broadcast (video) – Oliver Gappa
- Novel TCT: A brief demo on using TCT with alternative cloud gateways – Laurence Horrocks-Barlow
- Ten commandments of good I/O – Rosemary Francis
- Scientific Computing & Storage at The Francis Crick Institute – Michael Holliday
- Mixing storage systems in Spectrum Scale – Migrations & pools stories – Luis Bolinches
- File System Audit Logging / Running Spectrum Scale in a Vagrant environment – Chris Maestas
- AFM Deep Dive – Tuning and debugging – Venkateswara Puvvada
- User Talk – QMUL – Peter Childs
- Sponsor Talk – Lenovo – Michael Hennecke
- User Talk – MAX IV – Andreas Mattsson
- Sponsor Talk – DDN – Vic Cornell
- Cognitive, ML, Hortonworks – Yong ZY Zheng
If your cluster shows symptoms of overload and GPFS keeps reporting “overloaded” in its logs, like the entries below, you may see long waiters and sometimes deadlocks.
Wed Apr 11 15:53:44.232 2018: [I] Sending 'overloaded' status to the entire cluster
Wed Apr 11 15:55:24.488 2018: [I] Sending 'overloaded' status to the entire cluster
Wed Apr 11 15:57:04.743 2018: [I] Sending 'overloaded' status to the entire cluster
Wed Apr 11 15:58:44.998 2018: [I] Sending 'overloaded' status to the entire cluster
Wed Apr 11 16:00:25.253 2018: [I] Sending 'overloaded' status to the entire cluster
Wed Apr 11 16:28:45.601 2018: [I] Sending 'overloaded' status to the entire cluster
Wed Apr 11 16:33:56.817 2018: [N] sdrServ: Received deadlock notification from
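When the cluster is in this state, listing the current waiters on a suspect node shows how long individual requests have been stuck and points at the bottleneck. For example:
# mmdiag --waiters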
Increase scatterBufferSize to a value that matches the IB fabric
One of the first things to tune is scatterBufferSize. According to the wiki, FDR10 can be tuned to 131072 and FDR14 to 262144.
The default value of 32768 may perform OK. If the CPU utilization on the NSD IO servers is observed to be high and client IO performance is lower than expected, increasing the value of scatterBufferSize on the clients may improve performance.
# mmchconfig scatterBufferSize=131072
There are other parameters that can be tuned, but scatterBufferSize worked immediately for me.
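To see which value is currently in effect before and after the change, the configuration can be queried directly; for example:
# mmlsconfig scatterBufferSize
# mmfsadm dump config | grep -i scatterBufferSize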
Disable verbsRdmaSend on the NSD nodes
# mmchconfig verbsRdmaSend=no -N nsd1,nsd2
Verify that the settings have taken effect
# mmfsadm dump config | grep verbsRdmasPerNode
Increase verbsRdmasPerNode to 514 for NSD Nodes
# mmchconfig verbsRdmasPerNode=514 -N nsd1,nsd2
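Note that these RDMA-related settings typically take effect only after GPFS is restarted on the affected nodes; a minimal sketch for the two NSD nodes in this example:
# mmshutdown -N nsd1,nsd2
# mmstartup -N nsd1,nsd2
# mmfsadm dump config | grep verbsRdmasPerNode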
Here is a list of development blogs from this quarter (Q1 2018). As discussed in the User Groups, passing it along:
- GDPR Compliance and Unstructured Data Storage
- IBM Spectrum Scale for Linux on IBM Z – Release 5.0 features and highlights
- Management GUI enhancements in IBM Spectrum Scale release 5.0.0
- IBM Spectrum Scale 5.0.0 – What’s new in NFS?
- Benefits and implementation of Spectrum Scale sudo wrappers
- IBM Spectrum Scale: Big Data and Analytics Solution Brief
- Variant Sub-blocks in Spectrum Scale 5.0
- Compression support in Spectrum Scale 5.0.0
- IBM Spectrum Scale Versus Apache Hadoop HDFS
- ESS Fault Tolerance
- Genomic Workloads – How To Get it Right From Infrastructure Point Of View.
IBM Spectrum Scale On AWS Cloud: This video explains how to deploy IBM Spectrum Scale on AWS. This solution helps users who require highly available access to a shared namespace across multiple instances with good performance, without requiring in-depth knowledge of IBM Spectrum Scale.
If you wish to collect information from selected node classes, such as nsdnodes, quorumnodes, and managernodes:
# /usr/lpp/mmfs/bin/gpfs.snap -N nsdnodes,quorumnodes,managernodes
Removing existing NSD Nodes from the GPFS Cluster is not difficult, but there are several steps to take note of.
Step 1: Check and make sure the NSD Nodes you are removing are not quorum-manager.
Step 2: Check that the quorum-manager role has been removed from the old NSD nodes
# mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         mygpfs.gpfsnsd1
  GPFS cluster id:           720691660936079521
  GPFS UID domain:           mygpfs.gpfsnsd1
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp

GPFS cluster configuration servers:
-----------------------------------
  Primary server:    newnsd3
  Secondary server:  newnsd4

 .....
 .....
 714   oldnsd1   192.168.111.5   oldnsd1
 715   oldnsd2   192.168.111.6   oldnsd2
 716   newnsd3   192.168.111.7   newnsd3   quorum-manager
 717   newnsd4   192.168.111.8   newnsd4   quorum-manager
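If the old NSD nodes still showed the quorum-manager designation, it could be removed first; a minimal sketch using the node names from this example:
# mmchnode --nonquorum --client -N oldnsd1,oldnsd2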
Step 3a: Unmount the GPFS file system
# mmumount all -a
Step 3b: Check that the file system has been unmounted on all GPFS nodes
# mmlsmount all
File system gpfs0 is mounted on 0 nodes.
Step 4: Display Network Shared Disk (NSD) information for the GPFS cluster.
# mmlsnsd

 File system   Disk name    NSD servers
---------------------------------------------------------------------------
 gpfs0         dcs3700A_2   newnsd3,oldnsd1
 gpfs0         dcs3700A_3   newnsd3,oldnsd1
 gpfs0         dcs3700A_4   newnsd3,oldnsd1
 gpfs0         dcs3700A_5   newnsd3,oldnsd1
 gpfs0         dcs3700A_6   newnsd3,oldnsd1
 gpfs0         dcs3700A_7   newnsd3,oldnsd1
 gpfs0         dcs3700B_2   newnsd4,oldnsd2
 gpfs0         dcs3700B_3   newnsd4,oldnsd2
 gpfs0         dcs3700B_4   newnsd4,oldnsd2
 gpfs0         dcs3700B_5   newnsd4,oldnsd2
 gpfs0         dcs3700B_6   newnsd4,oldnsd2
 gpfs0         dcs3700B_7   newnsd4,oldnsd2
Step 5: Change the Network Shared Disk (NSD) server configuration so that the disks are served by the new NSD nodes only.
# mmchnsd dcs3700A_2:newnsd3
To confirm the changes, issue this command:
# mmlsnsd -d dcs3700A_2

 File system   Disk name    NSD servers
---------------------------------------------------------------------------
 gpfs0         dcs3700A_2   newnsd3
Do the same for the rest of the disks (a scripted version is sketched after these commands):
# mmchnsd dcs3700A_3:newnsd3
# mmlsnsd -d dcs3700A_3
# mmchnsd dcs3700A_4:newnsd3
# mmlsnsd -d dcs3700A_4
# mmchnsd dcs3700A_5:newnsd3
# mmlsnsd -d dcs3700A_5
# mmchnsd dcs3700A_6:newnsd3
# mmlsnsd -d dcs3700A_6
# mmchnsd dcs3700A_7:newnsd3
# mmlsnsd -d dcs3700A_7
# mmchnsd dcs3700B_2:newnsd4
# mmlsnsd -d dcs3700B_2
# mmchnsd dcs3700B_3:newnsd4
# mmlsnsd -d dcs3700B_3
# mmchnsd dcs3700B_4:newnsd4
# mmlsnsd -d dcs3700B_4
# mmchnsd dcs3700B_5:newnsd4
# mmlsnsd -d dcs3700B_5
# mmchnsd dcs3700B_6:newnsd4
# mmlsnsd -d dcs3700B_6
# mmchnsd dcs3700B_7:newnsd4
# mmlsnsd -d dcs3700B_7
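The same changes can be scripted instead of typed one by one; a minimal sketch assuming the disk and server names used above:

#!/bin/bash
# Re-home each remaining NSD to its new (sole) NSD server, then verify the change.
for d in dcs3700A_{3..7}; do
    mmchnsd "${d}:newnsd3"
    mmlsnsd -d "$d"
done
for d in dcs3700B_{2..7}; do
    mmchnsd "${d}:newnsd4"
    mmlsnsd -d "$d"
done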
Step 6: Remove Old NSD Nodes
Once you have confirmed that all the NSDs are now served by the new NSD nodes only, remove the old nodes from the cluster.
# mmdelnode -N oldnsd1,oldnsd2
Verify that the old NSD nodes are no longer in the cluster.
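For example, listing the cluster again should no longer show oldnsd1 and oldnsd2:
# mmlscluster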
Step 7: Remount the File System
# mmmount all -a