GPFS NSD Nodes stuck in Arbitrating Mode

One of our GPFS NSD nodes was stuck in the arbitrating state indefinitely. The most noticeable symptom was that users were able to log in but unable to run “ls” on their own directories. You can get a quick diagnosis by looking at one of the NSD nodes. For this kind of issue, run mmdiag --waiters first. There are few articles on this problem.

# mmdiag --waiters 

.....
.....
0x7FB0C0013D10 waiting 27176.264845756 seconds, SharedHashTabFetchHandlerThread: 
on ThCond 0x1C0000F9B78 (0x1C0000F9B78) (TokenCondvar), reason 'wait for SubToken to become stable'
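
On a busy node the waiters list can be long, so it may help to keep only the threads that have been waiting longer than some threshold. The following is a minimal sketch, not part of GPFS: long_waiters is a hypothetical helper name, and it assumes only the "waiting <N> seconds" wording visible in the sample output above.

```shell
# long_waiters THRESHOLD
# Hypothetical helper: read waiter lines on stdin and print those whose
# "waiting <N> seconds" value is at least THRESHOLD seconds.
long_waiters() {
    awk -v min="$1" '
        {
            # Find the word "waiting" and compare the number that follows it.
            for (i = 1; i < NF; i++) {
                if ($i == "waiting" && $(i + 1) + 0 >= min) {
                    print
                    break
                }
            }
        }'
}
```

A possible use would be to pipe the command output through it, for example mmdiag --waiters | long_waiters 3600 to keep only threads stuck for an hour or more.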

References:

Here is the PMR solution for collecting information to help resolve the issue.

The steps below will gather all the docs you could provide in terms of first time data capture given an unknown problem. Do these steps for all your performance/hang/unknown GPFS issues WHEN the problem is occurring. Commands are executed from one node. Collection of the docs will vary based on the working collective created below.
.
1) Gather waiters and create working collective. It can be good to get multiple looks at what the waiters are and how they have changed, so running the first mmlsnode command (with the -L) several times as you proceed through the steps below might be helpful (especially if the issue is pure performance, no hangs).
.

# mmlsnode -N waiters -L  > /tmp/allwaiters.$(date +"%m%d%H%M%S")
# mmlsnode -N waiters > /tmp/waiters.wcoll
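
Since several looks at the waiters are suggested, the first command could be wrapped in a small loop. This is only a sketch: snapshot_waiters is a hypothetical name, the count and interval are whatever you choose, and a sequence number is appended to the file name so snapshots taken within the same second do not overwrite each other.

```shell
# snapshot_waiters COUNT INTERVAL [DIR]
# Hypothetical helper: save COUNT snapshots of "mmlsnode -N waiters -L",
# INTERVAL seconds apart, into DIR (default /tmp).
snapshot_waiters() {
    count=$1
    interval=$2
    dir=${3:-/tmp}
    i=1
    while [ "$i" -le "$count" ]; do
        mmlsnode -N waiters -L > "$dir/allwaiters.$(date +%m%d%H%M%S).$i"
        # Pause between snapshots, but not after the last one.
        [ "$i" -lt "$count" ] && sleep "$interval"
        i=$((i + 1))
    done
}
```

For example, snapshot_waiters 5 60 would take five snapshots one minute apart.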

.
View allwaiters and waiters.wcoll files to verify that these files are not empty.
.
If either file (or both) is empty, the issues seen are not caused by GPFS waiting on any of its threads. The docs to gather in this case will vary. Do not continue with the steps. Tell the Service person, who will determine the best course of action and what docs will be needed.
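
The empty-file check can be scripted so that the remaining steps are skipped automatically. A minimal sketch, assuming nothing beyond the standard shell test for a non-empty file; files_nonempty is a hypothetical helper name.

```shell
# files_nonempty FILE...
# Hypothetical helper: succeed only if every named file exists and
# contains data; report the first empty or missing file on stderr.
files_nonempty() {
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "empty or missing: $f" >&2
            return 1
        fi
    done
    return 0
}
```

It could be used as files_nonempty /tmp/allwaiters.* /tmp/waiters.wcoll || echo "No waiters found, stop here and contact Service".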
.
2) Gather internaldump from all nodes in the working collective
.

# mmdsh -N /tmp/waiters.wcoll "/usr/lpp/mmfs/bin/mmfsadm dump all > /tmp/\$(hostname -s).dumpall.\$(date +"%m%d%H%M%S")"

.
3) Gather kthreads from all nodes in the working collective
.
Depending on various factors, this command can take a long time
to complete. If you are not specifically looking for kernel threads, this
step can be skipped. If the command is running, it can be stopped with
ctrl-C.
.

# mmdsh -N /tmp/waiters.wcoll "/usr/lpp/mmfs/bin/mmfsadm dump kthreads > /tmp/\$(hostname -s).kthreads.\$(date +"%m%d%H%M%S")"

.
4) If this is a performance problem, get 60 seconds mmfs trace from the
nodes in the working collective.
.
If AIX …
.

# mmtracectl --start --aix-trace-buffer-size=64M --trace-file-size=128M -N /tmp/waiters.wcoll ; sleep 60; mmtracectl --stop -N /tmp/waiters.wcoll

.
If Linux ..
.

# mmtracectl --start --trace-file-size=128M -N /tmp/waiters.wcoll ; sleep 60; mmtracectl --stop -N /tmp/waiters.wcoll

.
5) Gather gpfs.snap from same nodes.
.

# gpfs.snap -N /tmp/waiters.wcoll

.
Gather the docs taken. The output of steps 1) and 5) will be on the local node, in /tmp and /tmp/gpfs.snapOut respectively, and the output of steps 2) and 3) will be in /tmp on the nodes listed in the waiters.wcoll file. The gpfs.snap will pick up the trcrpt in /tmp/mmfs.

Many times steps 3) and 4) are not needed unless asked for. If supplied they may or may not be used. If there are any issues collecting doc, Steps 1), 2) and 5) are the most critical.

Solution:

1) The allwaiters file shows:

nsd1:  0x2AAAACC659F0 waiting 31358.847013000 seconds, GroupProtocolDriverThread: 
on ThCond 0x5572138 (0x5572138) (MsgRecordCondvar), reason 'RPC wait' for ccMsgGroupLeave
nsd1:  0x2AAAACC659F0 waiting 31358.847013000 seconds, GroupProtocolDriverThread: 
on ThCond 0x5572138 (0x5572138) (MsgRecordCondvar), reason 'RPC wait' for ccMsgGroupLeave

2) Look at the tscomm section to see which node is “pending”:

Output for mmfsadm dump tscomm on nsd1
######################################################################

Pending messages:
msg_id 345326326, service 1.1, msg_type 26 'ccMsgGroupLeave', n_dest 470, n_pending 1
this 0x5571F90, n_xhold 1, cl 0, cbFn 0x0, age 33501 sec
sent by 'GroupProtocolDriverThread' (0x2AAAACC659F0)

.
.
.
dest <c0n3>          status pending   , err 0, reply len 0

<c0n3> 10.x.x.x/0, x.y.y.u (nsd2)
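
With hundreds of destinations (n_dest is 470 here), finding the pending one by eye is tedious. The sketch below pulls out the destination IDs that are still pending. pending_dests is a hypothetical helper, and it assumes the "dest <id> status pending" line layout shown in the dump above.

```shell
# pending_dests
# Hypothetical helper: read "mmfsadm dump tscomm" text on stdin and print
# the destination IDs whose status is still "pending".
pending_dests() {
    awk '$1 == "dest" && $3 == "status" && $4 == "pending" {
        gsub(/[<>]/, "", $2)   # strip the angle brackets around the ID
        print $2
    }'
}
```

For example, mmfsadm dump tscomm | pending_dests would print c0n3 for the dump above; the ID can then be matched to a host name in the node list that follows the pending messages.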

3) Waiters for nsd2 show the following:

nsd2:  0x2AAAAC9F5A50 waiting 193857.401337000 seconds, NSDThread: 
on ThCond 0x2AAAC01CA600 (0x2AAAC01CA600) (VERBSEventWaitCondvar), reason 'waiting for RDMA write DTO completion'
nsd2:  0x2AAAAC9F33D0 waiting 193856.387375000 seconds, NSDThread: 
on ThCond 0x2AAAD806B190 (0x2AAAD806B190) (VERBSEventWaitCondvar), reason 'waiting for RDMA write DTO completion'
nsd2:  0x2AAAAC9F2090 waiting 193857.691998000 seconds, NSDThread: 
on ThCond 0x2AAAD40A0F90 (0x2AAAD40A0F90) (VERBSEventWaitCondvar), reason 'waiting for RDMA write DTO completion'
nsd2:  0x2AAAAC9DC610 waiting 193857.589074000 seconds, NSDThread: 
on ThCond 0x2AAAC81B2DE0 (0x2AAAC81B2DE0) (VERBSEventWaitCondvar), reason 'waiting for RDMA read DTO completion'
nsd2:  0x2AAAAC9D8C50 waiting 193857.406763000 seconds, NSDThread: 
on ThCond 0x2AAAC01FE5E0 (0x2AAAC01FE5E0) (VERBSEventWaitCondvar), reason 'waiting for RDMA write DTO completion'
nsd2:  0x2AAAAC9CDF10 waiting 193857.692074000 seconds, NSDThread: 
on ThCond 0x2AAAD806F120 (0x2AAAD806F120) (VERBSEventWaitCondvar), reason 'waiting for RDMA write DTO completion'
nsd2:  0x2AAAAC9CB890 waiting 193857.686966000 seconds, NSDThread: 
on ThCond 0x2AAABC140880 (0x2AAABC140880) (VERBSEventWaitCondvar), reason 'waiting for RDMA write DTO completion'
nsd2:  0x2AAAAC9C31D0 waiting 193857.412257000 seconds, NSDThread: 
on ThCond 0x2AAAACD83400 (0x2AAAACD83400) (VERBSEventWaitCondvar), reason 'waiting for RDMA write DTO completion'
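
When a node has many waiters, grouping them by reason makes the pattern obvious. A minimal sketch, assuming every waiter line carries its reason in single quotes as in the output above; count_reasons is a hypothetical helper name.

```shell
# count_reasons
# Hypothetical helper: read waiter lines on stdin and print each distinct
# reason with the number of times it occurs, most frequent first.
count_reasons() {
    awk -F"'" '
        /reason / { n[$2]++ }                       # $2 is the quoted reason
        END { for (r in n) print n[r] "\t" r }
    ' | sort -rn
}
```

Fed the eight nsd2 waiters above, it would report 7 threads waiting for RDMA write DTO completion and 1 waiting for RDMA read DTO completion, which points at the RDMA (verbs) layer on nsd2.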

Do a “mmfsadm dump verbs” from all of the NSD nodes.

# mmfsadm dump verbs

To fix this issue, stop and restart the GPFS daemon on nsd2.

# mmshutdown -N nsd2
# mmstartup -N nsd2
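
After the restart it is worth confirming that the node actually leaves the arbitrating state. The sketch below polls mmgetstate until the node reports active. wait_active is a hypothetical helper, the retry count and interval are arbitrary defaults, and it assumes the GPFS state is the last column of the mmgetstate output.

```shell
# wait_active NODE [TRIES] [INTERVAL]
# Hypothetical helper: poll "mmgetstate -N NODE" until the last column of
# some output line reads "active"; give up after TRIES attempts.
wait_active() {
    node=$1
    tries=${2:-30}
    interval=${3:-10}
    while [ "$tries" -gt 0 ]; do
        if mmgetstate -N "$node" |
            awk '$NF == "active" { found = 1 } END { exit !found }'; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1
}
```

For example, wait_active nsd2 || echo "nsd2 is still not active" would wait up to about five minutes with the defaults.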
