Several diagnostic tools can be used to gather and check InfiniBand fabric information. See each tool's man page for its parameters.
- ibnodes (Show InfiniBand nodes in topology)
- ibhosts (Show InfiniBand host nodes in topology)
- ibswitches (Show InfiniBand switch nodes in topology)
- ibnetdiscover (Discover InfiniBand topology)
- ibchecknet (Validate IB subnet and report errors)
- ibdiag (Scan the fabric using directed route packets and extract all available information about its connectivity and devices)
- perfquery (Find errors on one or more HCA and switch ports)
ibnodes (Show InfiniBand nodes in topology)
ibnodes is a script that either walks the IB subnet topology or uses an already saved topology file, and extracts the IB nodes (CAs and switches).
# ibnodes
.....
Ca : 0x0000000000009b02 ports 2 "c00 HCA-1"
Ca : 0x0000000000005af0 ports 1 "h00 HCA-1"
Switch : 0x00000000000000fa ports 36 "IBM HSSM" enhanced port 0 lid 19 lmc 0
.....
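Because ibnodes prints one node per line, its output is easy to post-process in scripts. The sketch below is a minimal Python example, assuming the line layout shown above (node type, GUID, port count, quoted NodeDescription); `parse_ibnodes` and `IbNode` are hypothetical helpers, not part of the InfiniBand diagnostic tools. The ibhosts and ibswitches samples below use the same line layout, so the parser should work on their output too.

```python
import re
from typing import NamedTuple


class IbNode(NamedTuple):
    """One node line from ibnodes (or ibhosts/ibswitches) output."""
    node_type: str    # "Ca" or "Switch"
    guid: int         # node GUID
    ports: int        # number of ports on the node
    description: str  # NodeDescription (the quoted string)


# Matches e.g.:  Ca : 0x0000000000009b02 ports 2 "c00 HCA-1"
# Trailing fields (such as "enhanced port 0 lid 19 lmc 0" on switch lines)
# are ignored. ASSUMPTION: the layout matches the sample output above.
NODE_RE = re.compile(r'^(Ca|Switch)\s*:\s*(0x[0-9a-fA-F]+)\s+ports\s+(\d+)\s+"([^"]*)"')


def parse_ibnodes(text: str) -> list[IbNode]:
    """Return one IbNode per recognised line; any other line is skipped."""
    nodes = []
    for line in text.splitlines():
        m = NODE_RE.match(line.strip())
        if m:
            node_type, guid, ports, desc = m.groups()
            nodes.append(IbNode(node_type, int(guid, 16), int(ports), desc))
    return nodes


# Sample taken from the ibnodes output above.
sample = """\
.....
Ca : 0x0000000000009b02 ports 2 "c00 HCA-1"
Ca : 0x0000000000005af0 ports 1 "h00 HCA-1"
Switch : 0x00000000000000fa ports 36 "IBM HSSM" enhanced port 0 lid 19 lmc 0
.....
"""

for node in parse_ibnodes(sample):
    print(f"{node.node_type:6} {node.guid:#018x} ports={node.ports} {node.description!r}")
```

On a live system you would pass it the captured output of the command, for example `subprocess.run(["ibnodes"], capture_output=True, text=True).stdout`.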
ibhosts (Show InfiniBand host nodes in topology)
ibhosts is a script which either walks the IB subnet topology or uses an already saved topology file and extracts the CA nodes.
# ibhosts
Ca : 0x0000000000009b02 ports 2 "c00 HCA-1"
Ca : 0x0000000000005af0 ports 1 "h00 HCA-1"
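A common follow-up to ibhosts is checking that every expected host is actually on the fabric. The sketch below groups CA GUIDs by host name and reports expected hosts that are missing. It assumes, as in the sample above, that each NodeDescription begins with the host name ("c00 HCA-1"); that naming is a site convention, not something ibhosts guarantees. `hcas_by_host`, `missing_hosts`, and the expected host list are hypothetical.

```python
import re
from collections import defaultdict

# Matches CA lines such as:  Ca : 0x0000000000009b02 ports 2 "c00 HCA-1"
CA_RE = re.compile(r'^Ca\s*:\s*(0x[0-9a-fA-F]+)\s+ports\s+\d+\s+"([^"]*)"')


def hcas_by_host(text: str) -> dict[str, list[str]]:
    """Group CA GUIDs by host name.

    ASSUMPTION: each NodeDescription starts with the host name followed by a
    space (e.g. "c00 HCA-1"). CAs with an empty description are grouped under "".
    """
    hosts = defaultdict(list)
    for line in text.splitlines():
        m = CA_RE.match(line.strip())
        if m:
            guid, desc = m.groups()
            words = desc.split()
            hosts[words[0] if words else ""].append(guid)
    return dict(hosts)


def missing_hosts(text: str, expected: set[str]) -> set[str]:
    """Return the expected host names that have no CA in the ibhosts output."""
    return expected - set(hcas_by_host(text))


# Sample taken from the ibhosts output above.
sample = """\
Ca : 0x0000000000009b02 ports 2 "c00 HCA-1"
Ca : 0x0000000000005af0 ports 1 "h00 HCA-1"
"""

print(hcas_by_host(sample))
print(missing_hosts(sample, {"c00", "c01", "h00"}))  # hypothetical host list
```

A host that appears with more than one GUID has more than one CA, which is also worth knowing when comparing against an inventory.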
ibswitches (Show InfiniBand switch nodes in topology)
ibswitches is a script which either walks the IB subnet topology or uses an already saved topology file and extracts the switch nodes.
# ibswitches
Switch : 0x00000000000003fa ports 36 "IBM HSSM" enhanced port 0 lid 19 lmc 0
Switch : 0x00000000000003cc ports 36 "IBM HSSM" enhanced port 0 lid 16 lmc 0
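Both switches above share the NodeDescription "IBM HSSM", so the description alone cannot tell them apart, while the LID on each line can (perfquery, for example, labels its output by LID). A minimal sketch, assuming the line format above, that builds a LID-to-switch lookup table; `switches_by_lid` is a hypothetical helper.

```python
import re

# Matches switch lines such as:
#   Switch : 0x00000000000003fa ports 36 "IBM HSSM" enhanced port 0 lid 19 lmc 0
SWITCH_RE = re.compile(
    r'^Switch\s*:\s*(0x[0-9a-fA-F]+)\s+ports\s+(\d+)\s+"([^"]*)"'
    r'.*?\blid\s+(\d+)\s+lmc\s+(\d+)'
)


def switches_by_lid(text: str) -> dict[int, tuple[str, str]]:
    """Map each switch's LID to its (GUID, NodeDescription)."""
    table = {}
    for line in text.splitlines():
        m = SWITCH_RE.match(line.strip())
        if m:
            guid, _ports, desc, lid, _lmc = m.groups()
            table[int(lid)] = (guid, desc)
    return table


# Sample taken from the ibswitches output above.
sample = """\
Switch : 0x00000000000003fa ports 36 "IBM HSSM" enhanced port 0 lid 19 lmc 0
Switch : 0x00000000000003cc ports 36 "IBM HSSM" enhanced port 0 lid 16 lmc 0
"""

lids = switches_by_lid(sample)
for lid in sorted(lids):
    guid, desc = lids[lid]
    print(f"lid {lid:>5}  {guid}  {desc}")
```

LIDs are assigned by the subnet manager and can change, so the GUID remains the stable identifier; rebuild the table from fresh output rather than caching it.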
ibnetdiscover (Discover InfiniBand topology)
ibnetdiscover performs IB subnet discovery and outputs a human-readable topology file. GUIDs, node types, and port numbers are displayed, as well as port LIDs and NodeDescriptions. All nodes and links are displayed (full topology). Optionally, this utility can list the currently connected nodes by node type. The output is printed to standard output unless a topology file is specified.
# ibnetdiscover
#
# Topology file: generated on Mon Jan 28 14:19:57 2013
#
# Initiated from node 0000000000000080 port 0000090300451281
vendid=0x2c9
devid=0xc738
sysimgguid=0x2c90000000000
switchguid=0x2c90000000080(0000000000080)
Switch 36 "S-0002c9030071ba80" # "MF0;switch-6260a0:SX90Y3245/U1" enhanced port 0 lid 2 lmc 0
[2] "H-00000000000011e0"[1](00000000000e1) # "node-c01 HCA-1" lid 3 4xQDR
[3] "H-00000000000012d0"[1](00000000000d1) # "node-c02 HCA-1" lid 4 4xQDR
....
....
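Each `[port]` line under a node in the ibnetdiscover output describes one link: the local port, the remote node and port, the remote NodeDescription and LID, and the negotiated width and speed (4xQDR above). The sketch below parses those lines and flags links that did not come up at an expected rate, a common sign of a bad cable or port. It assumes the exact layout of the sample; the expected rate and the names `Link`, `parse_links`, and `find_degraded` are illustrative assumptions.

```python
import re
from typing import NamedTuple


class Link(NamedTuple):
    local_port: int
    remote_node: str   # quoted node id, e.g. "H-00000000000011e0" (H- = CA, S- = switch)
    remote_port: int
    remote_desc: str   # remote NodeDescription
    remote_lid: int
    rate: str          # link width and speed, e.g. "4xQDR"


# Matches per-port link lines such as:
#   [2] "H-00000000000011e0"[1](00000000000e1) # "node-c01 HCA-1" lid 3 4xQDR
# The parenthesised hex value after the remote port number is treated as optional.
LINK_RE = re.compile(
    r'^\[(\d+)\]\s+"([^"]+)"\[(\d+)\](?:\([0-9a-fA-F]+\))?'
    r'\s+#\s+"([^"]*)"\s+lid\s+(\d+)\s+(\S+)'
)


def parse_links(text: str) -> list[Link]:
    """Return one Link per port line; node header and comment lines are skipped."""
    links = []
    for line in text.splitlines():
        m = LINK_RE.match(line.strip())
        if m:
            lp, node, rp, desc, lid, rate = m.groups()
            links.append(Link(int(lp), node, int(rp), desc, int(lid), rate))
    return links


def find_degraded(links: list[Link], expected_rate: str) -> list[Link]:
    """Links whose negotiated width/speed differs from expected_rate."""
    return [link for link in links if link.rate != expected_rate]


# Sample taken from the ibnetdiscover output above.
sample = """\
Switch 36 "S-0002c9030071ba80" # "MF0;switch-6260a0:SX90Y3245/U1" enhanced port 0 lid 2 lmc 0
[2] "H-00000000000011e0"[1](00000000000e1) # "node-c01 HCA-1" lid 3 4xQDR
[3] "H-00000000000012d0"[1](00000000000d1) # "node-c02 HCA-1" lid 4 4xQDR
"""

for link in find_degraded(parse_links(sample), expected_rate="4xQDR"):
    print(f"port {link.local_port} -> {link.remote_desc}: {link.rate}")
```

With the sample above nothing is printed, because both links are running at 4xQDR.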
ibchecknet (Validate IB subnet and report errors)
# ibchecknet
......
......
## Summary: 31 nodes checked, 0 bad nodes found
##          88 ports checked, 59 bad ports found
##          12 ports have errors beyond threshold
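The three summary lines are the quickest health indicator from ibchecknet. A minimal Python sketch, assuming the summary wording shown above, that extracts the counts and decides whether anything was flagged; `parse_summary` and `fabric_ok` are hypothetical helpers. A missing count (for example, from truncated output) is treated as a failure rather than a pass.

```python
import re

# ASSUMPTION: the summary wording matches the sample output above.
SUMMARY_PATTERNS = {
    "nodes_checked": r"(\d+) nodes checked",
    "bad_nodes": r"(\d+) bad nodes found",
    "ports_checked": r"(\d+) ports checked",
    "bad_ports": r"(\d+) bad ports found",
    "ports_over_threshold": r"(\d+) ports have errors beyond threshold",
}


def parse_summary(text: str) -> dict[str, int]:
    """Extract the numeric fields of the ibchecknet summary."""
    result = {}
    for key, pattern in SUMMARY_PATTERNS.items():
        m = re.search(pattern, text)
        if m:
            result[key] = int(m.group(1))
    return result


def fabric_ok(summary: dict[str, int]) -> bool:
    """True only if every 'bad' count is present and zero."""
    flagged = ("bad_nodes", "bad_ports", "ports_over_threshold")
    return all(summary.get(key) == 0 for key in flagged)


# Sample taken from the ibchecknet output above.
sample = """\
## Summary: 31 nodes checked, 0 bad nodes found
##          88 ports checked, 59 bad ports found
##          12 ports have errors beyond threshold
"""

summary = parse_summary(sample)
print(summary, "OK" if fabric_ok(summary) else "PROBLEMS FOUND")
```

Returning a boolean makes it easy to turn the result into an exit status for a cron job or monitoring check.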
perfquery (Query InfiniBand port counters)
The perfquery command is useful for finding errors on one or more HCA and switch ports. You can also use perfquery to reset HCA and switch port counters.
# Port counters: Lid 1 port 1
PortSelect:......................1
CounterSelect:...................0x1400
SymbolErrorCounter:..............0
LinkErrorRecoveryCounter:........0
LinkDownedCounter:...............0
PortRcvErrors:...................13
PortRcvRemotePhysicalErrors:.....0
PortRcvSwitchRelayErrors:........0
PortXmitDiscards:................0
PortXmitConstraintErrors:........0
PortRcvConstraintErrors:.........0
CounterSelect2:..................0x00
LocalLinkIntegrityErrors:........0
ExcessiveBufferOverrunErrors:....0
VL15Dropped:.....................0
PortXmitData:....................199578830
PortRcvData:.....................504398997
PortXmitPkts:....................15649860
PortRcvPkts:.....................15645526
PortXmitWait:....................0
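When reading perfquery output, the error counters matter more than the traffic counters. The sketch below, assuming the `Name:....value` layout shown above, parses the counters and reports the non-zero error counters (in this sample, PortRcvErrors). It also converts PortXmitData and PortRcvData to bytes, since the InfiniBand specification defines these counters in units of 4 octets. Which counters count as "errors" is a judgment call made for this example, and the helper names are hypothetical.

```python
import re

# Counters that record errors; any non-zero value deserves a closer look.
# Data/packet counters and PortXmitWait (congestion) are deliberately excluded.
# ASSUMPTION: this selection is illustrative, not an official classification.
ERROR_COUNTERS = (
    "SymbolErrorCounter", "LinkErrorRecoveryCounter", "LinkDownedCounter",
    "PortRcvErrors", "PortRcvRemotePhysicalErrors", "PortRcvSwitchRelayErrors",
    "PortXmitDiscards", "PortXmitConstraintErrors", "PortRcvConstraintErrors",
    "LocalLinkIntegrityErrors", "ExcessiveBufferOverrunErrors", "VL15Dropped",
)

# Matches lines such as "PortRcvErrors:...................13".
COUNTER_RE = re.compile(r"^(\w+):\.*(\S+)$")


def parse_perfquery(text: str) -> dict[str, int]:
    """Parse 'Name:....value' lines; hex values such as 0x1400 are supported."""
    counters = {}
    for line in text.splitlines():
        m = COUNTER_RE.match(line.strip())
        if m:
            name, value = m.groups()
            counters[name] = int(value, 16) if value.lower().startswith("0x") else int(value)
    return counters


def nonzero_errors(counters: dict[str, int]) -> dict[str, int]:
    return {name: counters[name] for name in ERROR_COUNTERS if counters.get(name, 0) > 0}


def data_bytes(counters: dict[str, int]) -> tuple[int, int]:
    """PortXmitData/PortRcvData count 4-octet units; return (xmit, rcv) in bytes."""
    return counters["PortXmitData"] * 4, counters["PortRcvData"] * 4


# Sample taken from the perfquery output above.
sample = """\
# Port counters: Lid 1 port 1
PortSelect:......................1
CounterSelect:...................0x1400
SymbolErrorCounter:..............0
LinkErrorRecoveryCounter:........0
LinkDownedCounter:...............0
PortRcvErrors:...................13
PortRcvRemotePhysicalErrors:.....0
PortRcvSwitchRelayErrors:........0
PortXmitDiscards:................0
PortXmitConstraintErrors:........0
PortRcvConstraintErrors:.........0
CounterSelect2:..................0x00
LocalLinkIntegrityErrors:........0
ExcessiveBufferOverrunErrors:....0
VL15Dropped:.....................0
PortXmitData:....................199578830
PortRcvData:.....................504398997
PortXmitPkts:....................15649860
PortRcvPkts:.....................15645526
PortXmitWait:....................0
"""

counters = parse_perfquery(sample)
print("errors:", nonzero_errors(counters))
xmit, rcv = data_bytes(counters)
print(f"transmitted {xmit} bytes, received {rcv} bytes")
```

The non-extended data counters are 32 bits wide and stop incrementing once they saturate on a busy link; `perfquery -x` reads the 64-bit extended counters instead.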