Pre-check before restarting the NSD Nodes

Before restarting the NSD Nodes, Quorum Manager Nodes or other critical nodes, check the following first to make sure the file system is in good order.

1. Make sure all three quorum nodes are active.

# mmgetstate -N quorumnodes

If any of the quorum nodes is not active, do not proceed.

2. Make sure the file system is mounted on the nodes.

# mmlsmount gpfs0

If the file system is not mounted where it should be, resolve that first.
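
The two checks above can be wrapped into a small pre-flight script. A minimal sketch, assuming the file system is gpfs0 and that the mmgetstate state strings (active, down, arbitrating, unknown) and the mmlsmount wording ("mounted on 0 nodes") match what your release prints:

#!/bin/bash
# Pre-flight check before restarting critical GPFS nodes (sketch)

# 1. All quorum nodes should report "active"
if mmgetstate -N quorumnodes | grep -Eiq "down|arbitrating|unknown"; then
    echo "One or more quorum nodes are not active - do NOT proceed."
    exit 1
fi

# 2. The file system should be mounted somewhere
if mmlsmount gpfs0 | grep -q "mounted on 0 nodes"; then
    echo "gpfs0 is not mounted anywhere - resolve this first."
    exit 1
fi

echo "Pre-checks passed."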

Spectrum Scale User Group @ London (April)

There were good and varied topics discussed at the Spectrum Scale User Group meeting in London in April.

Basic Tuning of RDMA Parameters for Spectrum Scale

If your cluster shows symptoms of overload and GPFS keeps reporting “overloaded” in its logs, like the entries below, you may get long waiters and sometimes deadlocks.

Wed Apr 11 15:53:44.232 2018: [I] Sending 'overloaded' status to the entire cluster
Wed Apr 11 15:55:24.488 2018: [I] Sending 'overloaded' status to the entire cluster
Wed Apr 11 15:57:04.743 2018: [I] Sending 'overloaded' status to the entire cluster
Wed Apr 11 15:58:44.998 2018: [I] Sending 'overloaded' status to the entire cluster
Wed Apr 11 16:00:25.253 2018: [I] Sending 'overloaded' status to the entire cluster
Wed Apr 11 16:28:45.601 2018: [I] Sending 'overloaded' status to the entire cluster
Wed Apr 11 16:33:56.817 2018: [N] sdrServ: Received deadlock notification from
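
When GPFS is in this state, a look at the long waiters on the NSD servers usually confirms the pressure. A minimal sketch (mmdiag --waiters lists the threads that are currently waiting and for how long):

# mmdiag --waiters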

Increase scatterBufferSize to a Number that Matches the IB Fabric
One of the first parameters to tune is scatterBufferSize. According to the wiki, FDR10 can be tuned to 131072 and FDR14 to 262144.

The default value of 32768 may perform OK. If the CPU utilization on the NSD IO servers is observed to be high and client IO performance is lower than expected, increasing the value of scatterBufferSize on the clients may improve performance.

# mmchconfig scatterBufferSize=131072
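
To confirm the new value on a changed node, the configuration can be checked in two places. A sketch (mmlsconfig shows the committed value, while mmfsadm dump config shows what the running daemon is using):

# mmlsconfig scatterBufferSize
# mmfsadm dump config | grep -i scatterBufferSize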

There are other parameters that can be tuned, but scatterBufferSize was the one that worked immediately for me:
verbsRdmaSend
verbsRdmasPerConnection
verbsRdmasPerNode

Disable verbsRdmaSend on the NSD servers (verbsRdmaSend=no)

# mmchconfig verbsRdmaSend=no -N nsd1,nsd2

Verify that the settings have taken effect, for example:

# mmfsadm dump config | grep verbsRdmasPerNode

Increase verbsRdmasPerNode to 514 for NSD Nodes

# mmchconfig verbsRdmasPerNode=514 -N nsd1,nsd2
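
If the new values do not show up in the configuration dump, the verbs settings may only take effect after the GPFS daemon is recycled on the affected nodes. A sketch, assuming nsd1 and nsd2 as above and that a brief outage on those servers is acceptable:

# mmshutdown -N nsd1,nsd2
# mmstartup -N nsd1,nsd2
# mmgetstate -N nsd1,nsd2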

References:

  1. Best Practices RDMA Tuning

IBM Spectrum Scale Development Blogs (Q1 2018)

Here is a list of development blogs from this quarter (Q1 2018). As discussed at the User Group, I am passing it along:

GDPR Compliance and Unstructured Data Storage
https://developer.ibm.com/storage/2018/03/27/gdpr-compliance-unstructure-data-storage/

IBM Spectrum Scale for Linux on IBM Z - Release 5.0 features and highlights
https://developer.ibm.com/storage/2018/03/09/ibm-spectrum-scale-linux-ibm-z-release-5-0-features-highlights/

Management GUI enhancements in IBM Spectrum Scale release 5.0.0
https://developer.ibm.com/storage/2018/01/18/gui-enhancements-in-spectrum-scale-release-5-0-0/

IBM Spectrum Scale 5.0.0 - What's new in NFS?
https://developer.ibm.com/storage/2018/01/18/ibm-spectrum-scale-5-0-0-whats-new-nfs/

Benefits and implementation of Spectrum Scale sudo wrappers
https://developer.ibm.com/storage/2018/01/15/benefits-implementation-spectrum-scale-sudo-wrappers/

IBM Spectrum Scale: Big Data and Analytics Solution Brief
https://developer.ibm.com/storage/2018/01/15/ibm-spectrum-scale-big-data-analytics-solution-brief/

Variant Sub-blocks in Spectrum Scale 5.0
https://developer.ibm.com/storage/2018/01/11/spectrum-scale-variant-sub-blocks/

Compression support in Spectrum Scale 5.0.0
https://developer.ibm.com/storage/2018/01/11/compression-support-spectrum-scale-5-0-0/

IBM Spectrum Scale Versus Apache Hadoop HDFS
https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/

ESS Fault Tolerance
https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/

Genomic Workloads - How To Get it Right From Infrastructure Point Of View.
https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/

IBM Spectrum Scale On AWS Cloud: This video explains how to deploy IBM Spectrum Scale on AWS. This solution helps the users who require highly available access to a shared name space across multiple instances with good performance, without requiring an in-depth knowledge of IBM Spectrum Scale.

Detailed Demo : https://www.youtube.com/watch?v=6j5Xj_d0bh4
Brief Demo : https://www.youtube.com/watch?v=-aMQKPW_RfY

Removing Existing NSD Nodes from GPFS Cluster

Removing existing NSD Nodes from the GPFS Cluster is not difficult, but there are several steps to take note of.

Step 1: Check and make sure the NSD Nodes you are removing are not quorum-manager.

See Removing Quorum Manager from NSD Nodes

Step 2: Check that the quorum-manager roles have been removed from the old NSD nodes.

# mmlscluster

GPFS cluster information
========================
GPFS cluster name:         mygpfs.gpfsnsd1
GPFS cluster id:           720691660936079521
GPFS UID domain:           mygpfs.gpfsnsd1
Remote shell command:      /usr/bin/ssh
Remote file copy command:  /usr/bin/scp

GPFS cluster configuration servers:
-----------------------------------
Primary server:    newnsd3
Secondary server:  newnsd4
.....
.....
.....
.....
714   oldnsd1          192.168.111.5   oldnsd1
715   oldnsd2          192.168.111.6   oldnsd2
716   newnsd3          192.168.111.7   newnsd3         quorum-manager
717   newnsd4          192.168.111.8   newnsd4         quorum-manager

Step 3a: Unmount the GPFS File System

# mmumount all -a

Step 3b: Check that the file system has been unmounted on all GPFS nodes

# mmlsmount all

File system gpfs0 is mounted on 0 nodes.

Step 4: Display the Network Shared Disk (NSD) information for the GPFS cluster.

# mmlsnsd

File system   Disk name    NSD servers
---------------------------------------------------------------------------
gpfs0         dcs3700A_2   newnsd3,oldnsd1
gpfs0         dcs3700A_3   newnsd3,oldnsd1
gpfs0         dcs3700A_4   newnsd3,oldnsd1
gpfs0         dcs3700A_5   newnsd3,oldnsd1
gpfs0         dcs3700A_6   newnsd3,oldnsd1
gpfs0         dcs3700A_7   newnsd3,oldnsd1
gpfs0         dcs3700B_2   newnsd4,oldnsd2
gpfs0         dcs3700B_3   newnsd4,oldnsd2
gpfs0         dcs3700B_4   newnsd4,oldnsd2
gpfs0         dcs3700B_5   newnsd4,oldnsd2
gpfs0         dcs3700B_6   newnsd4,oldnsd2
gpfs0         dcs3700B_7   newnsd4,oldnsd2

Step 5: Change the Network Shared Disk (NSD) configuration attributes so that each disk is served only by the new NSD servers.

# mmchnsd dcs3700A_2:newnsd3

To confirm the changes, issue this command:

# mmlsnsd -d dcs3700A_2
File system   Disk name    NSD servers
---------------------------------------------------------------------------
 gpfs0         dcs3700A_2   newnsd3

Do the same for the rest (a loop version is sketched after these commands):

# mmchnsd dcs3700A_3:newnsd3
# mmlsnsd -d dcs3700A_3
# mmchnsd dcs3700A_4:newnsd3
# mmlsnsd -d dcs3700A_4
# mmchnsd dcs3700A_5:newnsd3
# mmlsnsd -d dcs3700A_5
# mmchnsd dcs3700A_6:newnsd3
# mmlsnsd -d dcs3700A_6
# mmchnsd dcs3700A_7:newnsd3
# mmlsnsd -d dcs3700A_7
# mmchnsd dcs3700B_2:newnsd4
# mmlsnsd -d dcs3700B_2
# mmchnsd dcs3700B_3:newnsd4
# mmlsnsd -d dcs3700B_3
# mmchnsd dcs3700B_4:newnsd4
# mmlsnsd -d dcs3700B_4
# mmchnsd dcs3700B_5:newnsd4
# mmlsnsd -d dcs3700B_5
# mmchnsd dcs3700B_6:newnsd4
# mmlsnsd -d dcs3700B_6
# mmchnsd dcs3700B_7:newnsd4
# mmlsnsd -d dcs3700B_7
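
Rather than typing each pair by hand, the same sequence can be scripted. A minimal sketch in bash, assuming the disk names follow the dcs3700A_*/dcs3700B_* pattern above:

# for d in dcs3700A_{2..7}; do mmchnsd ${d}:newnsd3; mmlsnsd -d ${d}; done
# for d in dcs3700B_{2..7}; do mmchnsd ${d}:newnsd4; mmlsnsd -d ${d}; done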

Step 6: Remove Old NSD Nodes

Once you have confirmed that the NSD server changes are in place, remove the old nodes.

# mmdelnode -N oldnsd1,oldnsd2
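
If mmdelnode complains that GPFS is still running on the old nodes, shut the daemon down on them first and re-run the delete. A sketch:

# mmshutdown -N oldnsd1,oldnsd2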

Verify that the old NSD nodes are no longer in the cluster:

# mmlscluster

Step 7: Remount the File System

# mmmount all -a

References:

  1. General Parallel File System

GPFS Nodes being Expelled by Failed GPFS Clients

According to the IBM Developer Wiki page Troubleshooting Debug Expel:

  • Disk Lease Expiration – GPFS uses a mechanism referred to as a disk lease to prevent file system data corruption by a failing node. A disk lease grants a node the right to submit IO to a file system. File system disk leases are managed by the Cluster Manager of the file system’s home cluster. A node must periodically renew its disk lease with the Cluster Manager to maintain its right to submit IO to the file system. When a node fails to renew a disk lease with the Cluster Manager, the Cluster Manager marks the node as failed, revokes the node’s right to submit IO to the file system, expels the node from the cluster, and initiates recovery processing for the failed node. (A quick way to inspect the lease settings in force is sketched after this list.)
  • Node Expel Request – GPFS uses a mechanism referred to as a node expel request to prevent file system resource deadlocks. Nodes in the cluster require reliable communication amongst themselves to coordinate sharing of file system resources. If a node fails while owning a file system resource, a deadlock may ensue. If a node in the cluster detects that another node owning a shared file system resource may have failed, it will send a message to the file system Cluster Manager requesting that the failed node be expelled from the cluster to prevent a shared file system resource deadlock. When the Cluster Manager receives a node expel request, it determines which of the two nodes should be expelled from the cluster and takes similar action as described for the disk lease expiration.
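
The lease-related settings in force on a node can be inspected with mmdiag. A minimal sketch (mmdiag --config dumps the active configuration; the exact parameter names, such as leaseDuration or failureDetectionTime, can vary between releases):

# mmdiag --config | grep -iE "lease|failureDetection"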

But in my case, there was an errant failed GPFS client node on the network that was still part of the cluster. All the other legitimate GPFS clients were trying to expel this failed node, but got expelled instead. The errant node remained while the legitimate nodes were expelled. The only solution was to power off the errant node, after which the entire GPFS file system became operational again. Here is an excerpt from the log file of the NSD nodes.

In fact, a lot of hints can be found in /var/adm/ras/mmfs.log.latest on any of the NSD nodes. You should be able to locate the relevant messages there.
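
Two things helped while chasing this down: finding out which node is currently the cluster manager (the node that decides the expels) and watching the log for expel messages in real time. A minimal sketch:

# mmlsmgr -c
# tail -f /var/adm/ras/mmfs.log.latest | grep -i expel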

Fri May 27 16:34:53.249 2016: Expel 172.16.20.5 (goldsvr1) request from 192.168.104.34 (compute186). Expelling: 192.168.104.34 (compute186)
Fri May 27 16:34:53.259 2016: Recovering nodes: 192.168.104.34
Fri May 27 16:34:53.311 2016: Recovered 1 nodes for file system gpfs3.
Fri May 27 16:34:55.636 2016: Accepted and connected to 10.0.104.34 compute186 <c0n135>
Fri May 27 16:39:13.333 2016: Expel 172.16.20.5 (goldsvr1) request from 192.168.104.45 (compute197). Expelling: 192.168.104.45 (compute197)
Fri May 27 16:39:13.334 2016: VERBS RDMA closed connection to 192.168.104.45 compute197 on mlx4_0 port 1
Fri May 27 16:39:13.344 2016: Recovering nodes: 192.168.104.45
Fri May 27 16:39:13.393 2016: Recovered 1 nodes for file system gpfs3.
Fri May 27 16:39:15.725 2016: Accepted and connected to 10.0.104.45 compute197 <c0n141>
Fri May 27 16:40:18.570 2016: VERBS RDMA accepted and connected to 192.168.104.45 on mlx4_0 port 1

Input/Output Error at General Parallel File System (GPFS)

I was getting this error: “Input/output error on /usr/local/………….”

I did an mmdiag --waiters on all the NSD nodes, and no issues were reported there. Checking the disks with mmlsdisk, however, showed that some of them were down:

# mmlsdisk gpfs0
disk         driver   sector failure holds    holds                            storage
name         type       size   group metadata data  status        availability pool
------------ -------- ------ ------- -------- ----- ------------- ------------ ------------
dcs3700A_2   nsd         512    4714 yes      yes   ready         down         system
dcs3700A_3   nsd         512    4715 yes      yes   ready         up           system
dcs3700A_4   nsd         512    4714 yes      yes   ready         down         system
dcs3700A_5   nsd         512    4715 yes      yes   ready         up           system
dcs3700A_6   nsd         512    4714 yes      yes   ready         up           system
dcs3700A_7   nsd         512    4715 yes      yes   ready         up           system
dcs3700B_2   nsd         512    4716 yes      yes   ready         up           system
dcs3700B_3   nsd         512    4717 yes      yes   ready         up           system
dcs3700B_4   nsd         512    4716 yes      yes   ready         up           system
dcs3700B_5   nsd         512    4717 yes      yes   ready         up           system
dcs3700B_6   nsd         512    4716 yes      yes   ready         up           system
dcs3700B_7   nsd         512    4717 yes      yes   ready         up           system

To bring a down disk back online, start it with mmchdisk (shown here for dcs3700A_2):

[root@gpfsnsd1]# mmchdisk gpfs0 start -d dcs3700A_2
mmnsddiscover:  Attempting to rediscover the disks.  This may take a while ...
mmnsddiscover:  Finished.
gpfsnsd6:  Rediscovered nsd server access to dcs3700A_2.
gpfsnsd5:  Rediscovered nsd server access to dcs3700A_2.
gpfsnsd10:  Rediscovered nsd server access to dcs3700A_2.
gpfsnsd9:  Rediscovered nsd server access to dcs3700A_2.
Scanning file system metadata, phase 1 ...
6 % complete on Wed Apr 27 13:33:47 2016
25 % complete on Wed Apr 27 13:33:50 2016
41 % complete on Wed Apr 27 13:33:53 2016
58 % complete on Wed Apr 27 13:33:56 2016
76 % complete on Wed Apr 27 13:33:59 2016
90 % complete on Wed Apr 27 13:34:02 2016
100 % complete on Wed Apr 27 13:34:04 2016
Scan completed successfully.
Scanning file system metadata, phase 2 ...
Scan completed successfully.
Scanning file system metadata, phase 3 ...
Scan completed successfully.
Scanning file system metadata, phase 4 ...
Scan completed successfully.
Scanning user file metadata ...
50.23 % complete on Wed Apr 27 13:34:25 2016  (  26607616 inodes   38050968 MB)
100.00 % complete on Wed Apr 27 13:34:45 2016
Scan completed successfully.
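
After the scan completes, re-checking the disk should show its availability back to up. A sketch (the -d option limits mmlsdisk to the named disk):

# mmlsdisk gpfs0 -d dcs3700A_2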

References:

  1. Displaying GPFS™ disk states
  2. Changing GPFS disk states and parameters

GPFS Client Node cannot be added to the GPFS cluster

At the NSD node, I issued the command:

# mmaddnode -N node1
Thu Jul 23 13:40:12 SGT 2015: mmaddnode: Processing node node1
mmaddnode: Node node1 was not added to the cluster.
The node appears to already belong to a GPFS cluster.
mmaddnode: mmaddnode quitting.  None of the specified nodes are valid.
mmaddnode: Command failed.  Examine previous error messages to determine cause.

If we do an mmlscluster, the node is not in the cluster:

# mmlscluster | grep node1

If the node is not in the cluster, issue this command on the client node that could not be added:

# mmdelnode -f
mmdelnode: All GPFS configuration files on node goldsvr1 have been removed.
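
With the stale configuration cleared, the node can be re-added and the daemon brought up. A minimal sketch, assuming node1 as above (mmstartup and mmgetstate simply start GPFS on the node and confirm its state):

# mmaddnode -N node1
# mmstartup -N node1
# mmgetstate -N node1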

Reissue the mmaddnode command, as in the sketch above.

References:

  1. Node cannot be added to the GPFS cluster