When launching with mpirun, Open MPI 1.4 and later accept processor-affinity options that can improve performance:
- --bind-to-none: Do not bind processes (default)
- --bind-to-core: Bind each MPI process to a core
- --bind-to-socket: Bind each MPI process to a processor socket
- --report-bindings: Report how the launched processes are bound by Open MPI
If the hardware has multiple hardware threads per core (e.g. Hyper-Threading), only the first thread of each core is used with --bind-to-*. According to the article, this is expected to be fixed in v1.5.
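As a quick illustration, the binding and reporting flags can be combined on one command line. The executable name `my_mpi_program` below is a placeholder, not from the original text:

```shell
# Bind each of the 4 MPI processes to its own core and print the
# resulting bindings to stderr (assumes Open MPI 1.4+ is installed)
mpirun -np 4 --bind-to-core --report-bindings ./my_mpi_program
```

With --report-bindings, each daemon prints a line per local process showing which core(s) it was bound to, which is a convenient sanity check before a long run.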
The following layout options are meant to be used together with --bind-to-*:
- --byslot: Alias for --bycore
- --bycore: When laying out processes, put sequential MPI processes on adjacent processor cores (default)
- --bysocket: When laying out processes, put sequential MPI processes on adjacent processor sockets
- --bynode: When laying out processes, put sequential MPI processes on adjacent nodes
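For example, layout and binding can be combined to spread consecutive ranks across sockets rather than packing them onto one socket first. The executable name `my_mpi_program` is a placeholder:

```shell
# Place sequential MPI ranks round-robin across sockets, and bind
# each rank to the socket it was placed on
mpirun -np 4 --bysocket --bind-to-socket --report-bindings ./my_mpi_program
```

Spreading ranks across sockets this way can help memory-bandwidth-bound codes, since each socket's memory controllers serve fewer processes.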
Finally, you can use --cpus-per-proc, which binds ncpus OS processor IDs to each MPI process. Suppose a machine has 4 processor sockets with 4 cores each, for 16 cores in total:
$ mpirun -np 8 --cpus-per-proc 2 my_mpi_program
This command binds each of the 8 MPI processes to ncpus=2 cores, so all 16 cores on the machine are used.