Selection of best available communication fabrics
Suggestion 1:
I_MPI_DEVICE | I_MPI_FABRICS | Description |
---|---|---|
sock | tcp | TCP/IP-capable network fabrics, such as Ethernet and InfiniBand* (through IPoIB*) |
shm | shm | Shared-memory only |
ssm | shm:tcp | Shared-memory + TCP/IP |
rdma | dapl | DAPL-capable network fabrics, such as InfiniBand*, iWARP*, Dolphin*, and XPMEM* (through DAPL*) |
rdssm | shm:dapl | Shared memory + DAPL + sockets |
n/a | ofa | OFA-capable network fabrics, including InfiniBand* (through OFED* verbs) |
n/a | tmi | TMI-capable network fabrics, including QLogic* and Myrinet* (through Tag Matching Interface) |
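
The mapping in the table above can be captured in a small lookup, which may help when migrating job scripts from the legacy `I_MPI_DEVICE` setting to `I_MPI_FABRICS`. This is a minimal sketch: the helper name `fabrics_env` is hypothetical, and only the five legacy values listed in the table are covered.

```python
# Legacy I_MPI_DEVICE values and their I_MPI_FABRICS equivalents,
# taken from the table above.
DEVICE_TO_FABRICS = {
    "sock": "tcp",
    "shm": "shm",
    "ssm": "shm:tcp",
    "rdma": "dapl",
    "rdssm": "shm:dapl",
}


def fabrics_env(device):
    """Return the I_MPI_FABRICS setting for a legacy I_MPI_DEVICE value.

    Hypothetical helper for illustration; it is not part of any MPI library.
    """
    try:
        return {"I_MPI_FABRICS": DEVICE_TO_FABRICS[device]}
    except KeyError:
        raise ValueError(
            f"no I_MPI_FABRICS equivalent listed for {device!r}"
        ) from None


print(fabrics_env("rdssm"))  # {'I_MPI_FABRICS': 'shm:dapl'}
```

The returned dictionary can be merged into the environment of whatever launcher the job uses.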
Suggestion 2:
I_MPI_DAPL_UD value | Description |
---|---|
enable | |
Suggestion 3:
I_MPI_PERHOST value | Remarks |
---|---|
1 | Round-robin distribution (default value) |
all | Maps processes to all logical CPUs on a node |
allcores | Maps processes to all physical CPUs on a node |
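
To make the effect of the per-host count concrete, the sketch below simulates how ranks land on hosts when a fixed number of consecutive ranks is placed on each host before moving to the next. The function `place_ranks` and the host names are hypothetical. The sketch assumes that `all` and `allcores` behave like passing the node's logical or physical CPU count as the per-host number.

```python
def place_ranks(hosts, nranks, perhost):
    """Assign ranks to hosts: `perhost` consecutive ranks per host,
    cycling through the host list round-robin.

    Illustrative model only; the real launcher decides actual placement.
    """
    if perhost < 1:
        raise ValueError("perhost must be a positive integer")
    return [hosts[(rank // perhost) % len(hosts)] for rank in range(nranks)]


hosts = ["node0", "node1"]  # hypothetical host names

# One rank per host at a time: plain round-robin.
print(place_ranks(hosts, 4, 1))  # ['node0', 'node1', 'node0', 'node1']

# Two ranks per host, for example a node with two CPUs under 'all'.
print(place_ranks(hosts, 4, 2))  # ['node0', 'node0', 'node1', 'node1']
```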
Suggestion 4:
I_MPI_SHM_BYPASS value | Remarks |
---|---|
disable | Set I_MPI_SHM_BYPASS to 'enable' to turn on RDMA data exchange within a single node, which may outperform regular shared-memory exchange. This normally happens for large (350 KB+) messages. |
Suggestion 5:
I_MPI_ADJUST_ALLREDUCE value | Algorithm |
---|---|
1 | Recursive doubling algorithm |
2 | Rabenseifner's algorithm |
3 | Reduce + Bcast |
4 | Topology-aware Reduce + Bcast algorithm |
5 | Binomial gather + scatter algorithm |
6 | Topology-aware binomial gather + scatter algorithm |
7 | Ring algorithm |
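
Because the best algorithm depends on message size and process count, one way to use this table is to try each value in turn and time the application. The sketch below only generates the environment for each run. The name `allreduce_sweep` is hypothetical, and how each run is launched and timed is left to the reader's own benchmark.

```python
# I_MPI_ADJUST_ALLREDUCE values and algorithm names from the table above.
ALLREDUCE_ALGORITHMS = {
    1: "Recursive doubling",
    2: "Rabenseifner's",
    3: "Reduce + Bcast",
    4: "Topology-aware Reduce + Bcast",
    5: "Binomial gather + scatter",
    6: "Topology-aware binomial gather + scatter",
    7: "Ring",
}


def allreduce_sweep(base_env=None):
    """Yield (algorithm name, environment dict) pairs, one per table entry.

    Hypothetical helper; `base_env` is copied so the caller's dict is untouched.
    """
    for value, name in ALLREDUCE_ALGORITHMS.items():
        env = dict(base_env or {})
        env["I_MPI_ADJUST_ALLREDUCE"] = str(value)
        yield name, env


for name, env in allreduce_sweep():
    print(env["I_MPI_ADJUST_ALLREDUCE"], name)
```

In practice `base_env` would usually be a copy of `os.environ`, so that the launcher still sees the rest of the job's environment.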
Suggestion 6:
I_MPI_WAIT_MODE value | Remarks |
---|---|
1 | Set I_MPI_WAIT_MODE to 'enable' to try the wait mode of the progress engine. Processes that wait to receive messages without polling the fabric(s) can save CPU time. Apply wait mode to oversubscribed jobs. |
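
The oversubscription rule can be expressed as a simple check: turn on wait mode only when a node runs more ranks than it has cores. This is a rough sketch; the function name `wait_mode_env` and the "ranks greater than cores" test are assumptions made for illustration, not a documented threshold.

```python
def wait_mode_env(ranks_per_node, cores_per_node):
    """Suggest I_MPI_WAIT_MODE=1 when a node is oversubscribed.

    Hypothetical helper: 'oversubscribed' is taken to mean more MPI ranks
    than physical cores on the node.
    """
    oversubscribed = ranks_per_node > cores_per_node
    return {"I_MPI_WAIT_MODE": "1"} if oversubscribed else {}


print(wait_mode_env(ranks_per_node=32, cores_per_node=16))  # {'I_MPI_WAIT_MODE': '1'}
print(wait_mode_env(ranks_per_node=16, cores_per_node=16))  # {}
```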
References: