For a highly parallel job that spans multiple hosts, it is desirable to allocate hosts that are close together in the network topology. The purpose is to minimize communication latency.
This article is taken from the IBM Platform LSF Wiki page “Using Compute units for Topology Scheduling”.
Step 1: Define COMPUTE_UNIT_TYPES in lsb.params
COMPUTE_UNIT_TYPES = enclosure! switch rack
- The example specifies three CU types. In this parameter, the order of the values corresponds to levels in the network topology: CU type enclosure is contained in CU type switch, and CU type switch is contained in CU type rack.
- The exclamation mark (!) following a type marks it as the default level to be used for jobs with CU topology requirements; in the line above it follows enclosure. If the exclamation mark is omitted, the first type listed is the default.
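The two rules above can be illustrated with a small Python sketch. This is not LSF code; the function name parse_cu_types is made up for illustration, and the sketch only mirrors the rules as described here (values are ordered levels, "!" marks the default, and the first type is the default when no "!" is present).

```python
def parse_cu_types(value):
    """Split a COMPUTE_UNIT_TYPES value into ordered levels and the default level."""
    levels = []
    default = None
    for token in value.split():
        if token.endswith("!"):
            # A trailing "!" marks this type as the default level.
            token = token[:-1]
            default = token
        levels.append(token)
    # Without an explicit "!", the first type listed is the default.
    if default is None and levels:
        default = levels[0]
    return levels, default


levels, default = parse_cu_types("enclosure! switch rack")
print(levels)   # ['enclosure', 'switch', 'rack']
print(default)  # enclosure
```

Moving the "!" to another type, for example "enclosure switch! rack", would make switch the default in this sketch.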
Step 2: Arrange hosts into compute units in lsb.hosts
Begin ComputeUnit
NAME    TYPE        CONDENSE    MEMBER
en1-1   enclosure   Y           (c00 c01 c02)
en1-2   enclosure   Y           (c03 c04 c05)
en1-3   enclosure   Y           (c06 c07 c08 c09 c10)
.....
s1      switch      Y           (en1-1 en1-2)
s2      switch      Y           (en1-3)
.....
r1      rack        Y           (s1 s2)
.....
End ComputeUnit
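To see how the nesting in this section resolves to actual hosts, here is a rough Python sketch. It is only a model of the membership listed above, not something LSF provides: the COMPUTE_UNITS dictionary and the expand_hosts function are hypothetical names, and the elided rows are left out.

```python
# Membership mirrored from the ComputeUnit section above (elided rows omitted).
COMPUTE_UNITS = {
    "en1-1": ("enclosure", ["c00", "c01", "c02"]),
    "en1-2": ("enclosure", ["c03", "c04", "c05"]),
    "en1-3": ("enclosure", ["c06", "c07", "c08", "c09", "c10"]),
    "s1": ("switch", ["en1-1", "en1-2"]),
    "s2": ("switch", ["en1-3"]),
    "r1": ("rack", ["s1", "s2"]),
}


def expand_hosts(name):
    """Return the hosts under a compute unit, following nested units recursively."""
    if name not in COMPUTE_UNITS:
        return [name]  # not a compute unit, so treat it as a host name
    _cu_type, members = COMPUTE_UNITS[name]
    hosts = []
    for member in members:
        hosts.extend(expand_hosts(member))
    return hosts


print(expand_hosts("s1"))       # ['c00', 'c01', 'c02', 'c03', 'c04', 'c05']
print(len(expand_hosts("r1")))  # 11
```

In this model, a job confined to s1 could only land on the six hosts in en1-1 and en1-2, while r1 covers all eleven hosts listed.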
After editing lsb.hosts, reconfigure mbatchd so that the change takes effect:
# badmin reconfig
View the CU configuration:
# bmgroup -cu
Step 3: Using bhosts to display information
Because CONDENSE is set to “Y” for the compute units in lsb.hosts, bhosts displays each compute unit as a single condensed entry instead of listing its member hosts. To see all the individual nodes, run bhosts -X, which displays uncondensed output.