Diagnostic Tools for InfiniBand Fabric Information

There are a few tools for diagnosing InfiniBand fabric information. See each tool's man page for its parameters.

  1. ibnodes (Show InfiniBand nodes in topology)
  2. ibhosts (Show InfiniBand host nodes in topology)
  3. ibswitches (Show InfiniBand switch nodes in topology)
  4. ibnetdiscover (Discover InfiniBand topology)
  5. ibchecknet (Validate IB subnet and report errors)
  6. ibdiag (Scan the fabric using directed route packets and extract all available information about its connectivity and devices)
  7. perfquery (Find errors on one or more HCA and switch ports)

ibnodes (Show InfiniBand nodes in topology)

ibnodes is a script that extracts the IB nodes (CAs and switches), either by walking the IB subnet topology or by reading an already saved topology file.

# ibnodes
.....
Ca      : 0x0000000000009b02 ports 2 "c00 HCA-1"
Ca      : 0x0000000000005af0 ports 1 "h00 HCA-1"
Switch  : 0x00000000000000fa ports 36 "IBM HSSM" enhanced port 0 lid 19 lmc 0
.....
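When a fabric has many nodes, it can help to count the node types instead of reading the list by eye. The following is a minimal Python sketch that assumes the line format shown in the sample above. The exact layout may differ between infiniband-diags versions, and parse_ibnodes is a hypothetical helper, not part of any InfiniBand package.

```python
import re
from collections import Counter

# Assumes lines shaped like the sample output, e.g.
#   Ca      : 0x0000000000009b02 ports 2 "c00 HCA-1"
#   Switch  : 0x00000000000000fa ports 36 "IBM HSSM" enhanced port 0 lid 19 lmc 0
NODE_RE = re.compile(r'^(Ca|Switch)\s*:\s*(0x[0-9a-fA-F]+)\s+ports\s+(\d+)\s+"([^"]*)"')


def parse_ibnodes(text):
    """Return a list of (node_type, guid, num_ports, description) tuples."""
    nodes = []
    for line in text.splitlines():
        m = NODE_RE.match(line.strip())
        if m:  # elisions such as "....." and other non-node lines are skipped
            node_type, guid, ports, desc = m.groups()
            nodes.append((node_type, guid, int(ports), desc))
    return nodes


sample = '''\
Ca      : 0x0000000000009b02 ports 2 "c00 HCA-1"
Ca      : 0x0000000000005af0 ports 1 "h00 HCA-1"
Switch  : 0x00000000000000fa ports 36 "IBM HSSM" enhanced port 0 lid 19 lmc 0
'''

nodes = parse_ibnodes(sample)
print(Counter(t for t, _, _, _ in nodes))  # Counter({'Ca': 2, 'Switch': 1})
```

In practice the text would be the captured output of ibnodes. The sample is embedded here so the sketch runs on its own.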

ibhosts (Show InfiniBand host nodes in topology)

ibhosts is a script that extracts the CA (host) nodes, either by walking the IB subnet topology or by reading an already saved topology file.

# ibhosts
Ca      : 0x0000000000009b02 ports 2 "c00 HCA-1"
Ca      : 0x0000000000005af0 ports 1 "h00 HCA-1"
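The node description usually carries the host name, so ibhosts output can be turned into a host-to-GUID map. This is a sketch only. It assumes every description has the form "<hostname> HCA-<n>", as in the sample above, and that convention is site-specific. The helper name hosts_from_ibhosts is hypothetical.

```python
import re

# Assumption: the node description is "<hostname> HCA-<n>", as in the sample.
HOST_RE = re.compile(r'^Ca\s*:\s*(0x[0-9a-fA-F]+)\s+ports\s+\d+\s+"(\S+)\s+HCA-\d+"')


def hosts_from_ibhosts(text):
    """Map each host name to the list of CA GUIDs it owns."""
    hosts = {}
    for line in text.splitlines():
        m = HOST_RE.match(line.strip())
        if m:
            guid, hostname = m.groups()
            # A host with several HCAs (HCA-1, HCA-2, ...) collects several GUIDs.
            hosts.setdefault(hostname, []).append(guid)
    return hosts


sample = '''\
Ca      : 0x0000000000009b02 ports 2 "c00 HCA-1"
Ca      : 0x0000000000005af0 ports 1 "h00 HCA-1"
'''

print(hosts_from_ibhosts(sample))
# {'c00': ['0x0000000000009b02'], 'h00': ['0x0000000000005af0']}
```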

ibswitches (Show InfiniBand switch nodes in topology)

ibswitches is a script that extracts the switch nodes, either by walking the IB subnet topology or by reading an already saved topology file.

# ibswitches
Switch  : 0x00000000000003fa ports 36 "IBM HSSM" enhanced port 0 lid 19 lmc 0
Switch  : 0x00000000000003cc ports 36 "IBM HSSM" enhanced port 0 lid 16 lmc 0
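Each ibswitches line ends with the switch's base LID. That is the address perfquery accepts as a target, so collecting the LIDs is a natural first step before querying switch ports. Below is a minimal sketch that assumes the line format shown above. The helper switch_lids is hypothetical.

```python
import re

# Assumes lines like:
#   Switch  : 0x00000000000003fa ports 36 "IBM HSSM" enhanced port 0 lid 19 lmc 0
SWITCH_RE = re.compile(
    r'^Switch\s*:\s*(0x[0-9a-fA-F]+)\s+ports\s+(\d+)\s+"([^"]*)".*\blid\s+(\d+)\s+lmc\s+(\d+)'
)


def switch_lids(text):
    """Return {switch GUID: base LID} parsed from ibswitches output."""
    lids = {}
    for line in text.splitlines():
        m = SWITCH_RE.match(line.strip())
        if m:
            guid, _ports, _desc, lid, _lmc = m.groups()
            lids[guid] = int(lid)
    return lids


sample = '''\
Switch  : 0x00000000000003fa ports 36 "IBM HSSM" enhanced port 0 lid 19 lmc 0
Switch  : 0x00000000000003cc ports 36 "IBM HSSM" enhanced port 0 lid 16 lmc 0
'''

print(switch_lids(sample))  # {'0x00000000000003fa': 19, '0x00000000000003cc': 16}
```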

ibnetdiscover (Discover InfiniBand topology)

ibnetdiscover performs IB subnet discovery and outputs a human-readable topology file. It shows GUIDs, node types, and port numbers, as well as port LIDs and NodeDescriptions. All nodes (and links) are displayed, giving the full topology. Optionally, the utility can instead list the currently connected nodes by node type. The output is printed to standard output unless a topology file is specified.

# ibnetdiscover
#
# Topology file: generated on Mon Jan 28 14:19:57 2013
#
# Initiated from node 0000000000000080 port 0000090300451281

vendid=0x2c9
devid=0xc738
sysimgguid=0x2c90000000000
switchguid=0x2c90000000080(0000000000080)
Switch  36 "S-0002c9030071ba80"         # "MF0;switch-6260a0:SX90Y3245/U1" enhanced port 0 lid 2 lmc 0
[2]     "H-00000000000011e0"[1](00000000000e1)          # "node-c01 HCA-1" lid 3 4xQDR
[3]     "H-00000000000012d0"[1](00000000000d1)          # "node-c02 HCA-1" lid 4 4xQDR
....
....
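Each "[n]" line under a Switch entry describes one switch port: the port number, the remote node and port, the remote node description, the remote LID, and the negotiated link width and speed (for example 4xQDR). The sketch below parses those lines and flags links that did not come up at the width and speed you expect. It assumes the line layout in the sample above. The expected values (4x, QDR) and the helper name parse_links are illustrative assumptions.

```python
import re

# Port lines under a "Switch" entry, e.g.
# [2]     "H-00000000000011e0"[1](00000000000e1)          # "node-c01 HCA-1" lid 3 4xQDR
LINK_RE = re.compile(
    r'^\[(\d+)\]\s+"([^"]+)"\[(\d+)\]'  # local port, remote node id, remote port
    r'(?:\([0-9a-fA-F]+\))?'            # optional remote port GUID in parentheses
    r'\s+#\s+"([^"]*)"'                 # remote node description
    r'\s+lid\s+(\d+)\s+(\d+)x(\w+)'     # remote LID, link width, link speed
)


def parse_links(text):
    links = []
    for line in text.splitlines():
        m = LINK_RE.match(line.strip())
        if m:  # header lines such as "Switch  36 ..." do not match and are skipped
            lport, remote, rport, desc, lid, width, speed = m.groups()
            links.append({
                "local_port": int(lport),
                "remote": remote,
                "remote_port": int(rport),
                "remote_desc": desc,
                "remote_lid": int(lid),
                "width": int(width),
                "speed": speed,
            })
    return links


sample = '''\
Switch  36 "S-0002c9030071ba80"         # "MF0;switch-6260a0:SX90Y3245/U1" enhanced port 0 lid 2 lmc 0
[2]     "H-00000000000011e0"[1](00000000000e1)          # "node-c01 HCA-1" lid 3 4xQDR
[3]     "H-00000000000012d0"[1](00000000000d1)          # "node-c02 HCA-1" lid 4 4xQDR
'''

links = parse_links(sample)
# Assumption: every link in this fabric should train at 4x QDR.
degraded = [link for link in links if link["width"] != 4 or link["speed"] != "QDR"]
print(len(links), degraded)  # 2 []
```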

ibchecknet (Validate IB subnet and report errors)

# ibchecknet
......
......
## Summary: 31 nodes checked, 0 bad nodes found
##          88 ports checked, 59 bad ports found
##          12 ports have errors beyond threshold
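The summary at the end of ibchecknet is the quickest health indicator. In this example, 59 of 88 ports were reported as bad. A monitoring script might extract those counts and treat any bad node or port as a failure. The sketch below assumes the summary wording shown above. The helper name and the "healthy" rule are illustrative choices, not something ibchecknet defines.

```python
import re


def parse_ibchecknet_summary(text):
    """Extract the counts from the '## Summary' block printed by ibchecknet."""
    patterns = {
        "nodes_checked": r"(\d+) nodes checked",
        "bad_nodes": r"(\d+) bad nodes found",
        "ports_checked": r"(\d+) ports checked",
        "bad_ports": r"(\d+) bad ports found",
        "ports_over_threshold": r"(\d+) ports have errors beyond threshold",
    }
    summary = {}
    for key, pattern in patterns.items():
        m = re.search(pattern, text)
        if m:
            summary[key] = int(m.group(1))
    return summary


sample = '''\
## Summary: 31 nodes checked, 0 bad nodes found
##          88 ports checked, 59 bad ports found
##          12 ports have errors beyond threshold
'''

summary = parse_ibchecknet_summary(sample)
# Illustrative rule: the fabric is "healthy" only if nothing was flagged as bad.
healthy = summary.get("bad_nodes", 0) == 0 and summary.get("bad_ports", 0) == 0
print(summary, healthy)
```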

perfquery command

The perfquery command is useful for finding errors on one or more HCA and switch ports. You can also use perfquery to reset HCA and switch port counters.

# Port counters: Lid 1 port 1
PortSelect:......................1
CounterSelect:...................0x1400
SymbolErrorCounter:..............0
LinkErrorRecoveryCounter:........0
LinkDownedCounter:...............0
PortRcvErrors:...................13
PortRcvRemotePhysicalErrors:.....0
PortRcvSwitchRelayErrors:........0
PortXmitDiscards:................0
PortXmitConstraintErrors:........0
PortRcvConstraintErrors:.........0
CounterSelect2:..................0x00
LocalLinkIntegrityErrors:........0
ExcessiveBufferOverrunErrors:....0
VL15Dropped:.....................0
PortXmitData:....................199578830
PortRcvData:.....................504398997
PortXmitPkts:....................15649860
PortRcvPkts:.....................15645526
PortXmitWait:....................0
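Most of these fields are error counters that should stay at zero on a healthy port. PortXmitData, PortRcvData, PortXmitPkts, and PortRcvPkts are traffic counters that grow normally. The sketch below parses the dotted "Name:....value" layout shown above and reports only the nonzero error counters. Here that is PortRcvErrors = 13. Which fields count as "errors" is an assumption made for this example. PortXmitWait is left out because it measures congestion rather than errors.

```python
# Error counters taken from the sample output above (illustrative selection).
ERROR_FIELDS = {
    "SymbolErrorCounter", "LinkErrorRecoveryCounter", "LinkDownedCounter",
    "PortRcvErrors", "PortRcvRemotePhysicalErrors", "PortRcvSwitchRelayErrors",
    "PortXmitDiscards", "PortXmitConstraintErrors", "PortRcvConstraintErrors",
    "LocalLinkIntegrityErrors", "ExcessiveBufferOverrunErrors", "VL15Dropped",
}


def parse_perfquery(text):
    """Parse 'Name:.....value' lines into {name: int}."""
    counters = {}
    for line in text.splitlines():
        if line.startswith("#") or ":" not in line:
            continue  # skip the "# Port counters: ..." header and blank lines
        name, _, value = line.partition(":")
        value = value.lstrip(".")
        counters[name.strip()] = int(value, 0)  # base 0 accepts "13" and "0x1400"
    return counters


def nonzero_errors(counters):
    return {k: v for k, v in counters.items() if k in ERROR_FIELDS and v != 0}


sample = '''\
# Port counters: Lid 1 port 1
PortSelect:......................1
CounterSelect:...................0x1400
SymbolErrorCounter:..............0
LinkDownedCounter:...............0
PortRcvErrors:...................13
PortXmitDiscards:................0
CounterSelect2:..................0x00
PortXmitData:....................199578830
PortRcvData:.....................504398997
PortXmitWait:....................0
'''

counters = parse_perfquery(sample)
print(nonzero_errors(counters))  # {'PortRcvErrors': 13}
```

These counters are cumulative, so comparing two snapshots taken some time apart usually says more than a single reading.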


Diagnostic Tools for InfiniBand Devices

There are a few tools for diagnosing InfiniBand devices:

  1. ibv_devinfo (Query RDMA devices)
  2. ibstat (Query basic status of InfiniBand device(s))
  3. ibstatus (Query basic status of InfiniBand device(s))

ibv_devinfo (Query RDMA devices)

ibv_devinfo prints information about the RDMA devices available for use from userspace.

# ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.10.2322
        node_guid:                      0002:c903:0045:1280
        sys_image_guid:                 0002:c903:0045:1283
        vendor_id:                      0x02c9
        vendor_part_id:                 4099
        hw_ver:                         0x0
        board_id:                       IBM0FD0140019
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                2048 (4)
                        active_mtu:             2048 (4)
                        sm_lid:                 1
                        port_lid:               1
                        port_lmc:               0x00
                        link_layer:             IB

                port:   2
                        state:                  PORT_DOWN (1)
                        max_mtu:                2048 (4)
                        active_mtu:             2048 (4)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             IB
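On a host with several HCAs or ports, the most useful field is usually each port's state. Here port 1 is PORT_ACTIVE and port 2 is PORT_DOWN. The following is a minimal sketch that walks the indented ibv_devinfo output and records the state per (hca_id, port). It assumes the layout shown above, and port_states is a hypothetical helper.

```python
import re


def port_states(text):
    """Return {(hca_id, port): state_name} from ibv_devinfo output."""
    states = {}
    hca = port = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("hca_id:"):
            hca = line.split(":", 1)[1].strip()
        elif line.startswith("port:"):  # "port_lid:" etc. do not match "port:"
            port = int(line.split(":", 1)[1])
        elif line.startswith("state:") and hca is not None and port is not None:
            # e.g. "state:                  PORT_ACTIVE (4)"
            m = re.match(r"state:\s+(\w+)\s+\((\d+)\)", line)
            if m:
                states[(hca, port)] = m.group(1)
    return states


sample = '''\
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        port_lid:               1
                port:   2
                        state:                  PORT_DOWN (1)
                        port_lid:               0
'''

states = port_states(sample)
down = [key for key, state in states.items() if state != "PORT_ACTIVE"]
print(down)  # [('mlx4_0', 2)]
```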

ibstat (Query basic status of InfiniBand device(s))

ibstat is a binary that displays basic information obtained from the local IB driver. Output includes LID, SMLID, port state, active link width, and port physical state.

It is similar to the ibstatus utility but is implemented as a binary rather than a script. It has options to list CAs and/or ports, and it displays more information than ibstatus.

# ibstat
CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 2
        Firmware version: 2.10.2322
        Hardware version: 0
        Node GUID: 0x0002c90300451280
        System image GUID: 0x0002c90300451283
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 1
                LMC: 0
                SM lid: 1
                Capability mask: 0x0251486a
                Port GUID: 0x0002c90300451281
                Link layer: InfiniBand
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02514868
                Port GUID: 0x0002c90300451282
                Link layer: InfiniBand
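"Rate: 40" is the signalling rate of a 4X QDR link (ibstatus below reports it as "40 Gb/sec (4X QDR)"): 4 lanes at 10 Gb/s each. SDR, DDR, and QDR use 8b/10b encoding, so the usable data rate is 80% of that, or 32 Gb/s. FDR and EDR use 64b/66b encoding. The worked arithmetic is shown below as a small Python function. The table covers only the common generations listed, and link_rates is a hypothetical helper.

```python
from fractions import Fraction

# Per-lane signalling rate (Gb/s) and line-encoding efficiency.
LANE_RATES = {
    "SDR": (Fraction(5, 2), Fraction(8, 10)),         # 8b/10b
    "DDR": (Fraction(5), Fraction(8, 10)),            # 8b/10b
    "QDR": (Fraction(10), Fraction(8, 10)),           # 8b/10b
    "FDR": (Fraction("14.0625"), Fraction(64, 66)),   # 64b/66b
    "EDR": (Fraction("25.78125"), Fraction(64, 66)),  # 64b/66b
}


def link_rates(width, speed):
    """Return (signalling rate, data rate) in Gb/s, e.g. width=4, speed="QDR"."""
    lane_rate, efficiency = LANE_RATES[speed]
    signalling = width * lane_rate
    return float(signalling), float(signalling * efficiency)


print(link_rates(4, "QDR"))  # (40.0, 32.0)
```

Exact fractions avoid floating-point rounding in the intermediate steps. For example, 4X EDR gives exactly 103.125 Gb/s signalling and 100 Gb/s of data.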

ibstatus (Query basic status of InfiniBand device(s))

ibstatus is a script that displays basic information obtained from the local IB driver. Output includes LID, SMLID, port state, active link width, and port physical state.

# ibstatus
Infiniband device 'mlx4_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0002:c903:0045:1281
        base lid:        0x1
        sm lid:          0x1
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            40 Gb/sec (4X QDR)
        link_layer:      InfiniBand

Infiniband device 'mlx4_0' port 2 status:
        default gid:     fe80:0000:0000:0000:0002:c903:0045:1282
        base lid:        0x0
        sm lid:          0x0
        state:           1: DOWN
        phys state:      2: Polling
        rate:            40 Gb/sec (4X QDR)
        link_layer:      InfiniBand
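Because ibstatus prints one "key: value" block per port, its output is easy to check from a script. Any port whose state is not ACTIVE or whose physical state is not LinkUp can be flagged, as port 2 is in the output above. This is a sketch that assumes the layout shown here. The helper parse_ibstatus is hypothetical.

```python
import re

HEADER_RE = re.compile(r"Infiniband device '([^']+)' port (\d+) status:")


def parse_ibstatus(text):
    """Return {(device, port): {field: value}} from ibstatus output."""
    ports = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            current = ports.setdefault((header.group(1), int(header.group(2))), {})
        elif current is not None and ":" in line:
            # Split on the first colon only, so GIDs like "fe80:..." stay intact.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return ports


sample = '''\
Infiniband device 'mlx4_0' port 1 status:
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            40 Gb/sec (4X QDR)

Infiniband device 'mlx4_0' port 2 status:
        state:           1: DOWN
        phys state:      2: Polling
        rate:            40 Gb/sec (4X QDR)
'''

ports = parse_ibstatus(sample)
unhealthy = [
    key for key, fields in ports.items()
    if not fields.get("state", "").endswith("ACTIVE")
    or not fields.get("phys state", "").endswith("LinkUp")
]
print(unhealthy)  # [('mlx4_0', 2)]
```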