First things first
Step 1: Check that your Intel Ethernet Server Adapter supports SR-IOV. For more information, take a look at Using SR-IOV with Intel® Ethernet Server Adapters.
In a nutshell, you blacklist the VF driver on the host and enable the VFs in the KVM guests.
Step 1: Add the following line to /etc/modprobe.conf
options ixgbe max_vfs=8
The above configuration creates 8 virtual NICs (virtual functions, or VFs) per port. The Intel card supports up to 64 VFs.
Step 2: Blacklist the ixgbevf driver by creating a file called /etc/modprobe.d/blacklist-ixgbevf.conf
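The step above does not show what goes into the file. A minimal sketch, assuming the standard modprobe `blacklist` directive is what is intended here; the file is generated in the current directory first so that it can be inspected before it is installed:

```shell
# Write a one-line modprobe blacklist entry for the VF driver.
# Assumption: a plain "blacklist <module>" directive is sufficient on this host.
printf 'blacklist ixgbevf\n' > blacklist-ixgbevf.conf

# Show the result for a quick visual check.
cat blacklist-ixgbevf.conf
```

Once the contents look right, copy the file to /etc/modprobe.d/blacklist-ixgbevf.conf as root.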
Step 3: Reboot the machine
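After the reboot, it may be worth confirming that the VFs actually appeared. A rough sketch, assuming the VFs show up in `lspci` output with "Virtual Function" in their description (which is how they are typically listed); the helper name `count_vfs` is made up for illustration:

```shell
# Count lines on stdin that mention "Virtual Function" (case-insensitive).
count_vfs() {
    grep -ci 'virtual function'
}

# Only query the PCI bus if lspci is installed on this machine.
if command -v lspci >/dev/null 2>&1; then
    echo "VFs visible on this host: $(lspci | count_vfs)"
fi
```

With max_vfs=8, the count should be 8 multiplied by the number of ixgbe ports in the machine.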
If you are having unexplained issues with mpirun, there are various methods you can use to troubleshoot.
Information on “--mca orte_base_help_aggregate 0”
If your mpirun dies without any error messages, you may want to read the following entry from the Open MPI FAQ:
Debugging applications in parallel, question 7: My process dies without any output. Why?
If your application fails due to memory corruption, Open MPI may subsequently fail to output an error message before dying. Specifically, starting with v1.3, Open MPI attempts to aggregate error messages from multiple processes in an attempt to show unique error messages only once (vs. one for each MPI process, which can be unwieldy, especially when running large MPI jobs).
However, this aggregation process requires allocating memory in the MPI process when it displays the error message. If the process’ memory is already corrupted, Open MPI’s attempt to allocate memory may fail and the process will simply die, possibly silently. When Open MPI does not attempt to aggregate error messages, most of its setup work is done during MPI_INIT and no memory is allocated during the “print the error” routine. It therefore almost always successfully outputs error messages in real time, but at the expense that you’ll potentially see the same error message for each MPI process that encountered the error.
Hence, the error message aggregation is usually a good thing, but sometimes it can mask a real error. You can disable Open MPI’s error message aggregation with the orte_base_help_aggregate MCA parameter. For example:
$ mpirun --mca orte_base_help_aggregate 0 ...
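Open MPI also reads MCA parameters from environment variables named OMPI_MCA_<parameter>, so the same setting can be applied without touching the mpirun command line, for example inside a job script. A minimal sketch; the application name and process count in the comment are hypothetical:

```shell
# Equivalent to passing "--mca orte_base_help_aggregate 0" to mpirun.
export OMPI_MCA_orte_base_help_aggregate=0

# Then launch as usual (hypothetical application and process count):
#   mpirun -np 4 ./my_app
```

Unset the variable again once the problem is found, so that normal runs go back to aggregated error messages.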