Topology Scheduling on Platform LSF

For a highly parallel job that spans multiple hosts, it is desirable to allocate hosts that are close together in the network topology. Doing so minimizes communication latency.

This article is adapted from the IBM Platform LSF Wiki page “Using Compute Units for Topology Scheduling”.

Step 1: Define COMPUTE_UNIT_TYPES in lsb.params

COMPUTE_UNIT_TYPES = enclosure! switch rack

The example specifies three CU types. In this parameter, the order of the values corresponds to levels in the network topology: CUs of type enclosure are contained in CUs of type switch, and CUs of type switch are contained in CUs of type rack.
The exclamation mark (!) following enclosure means that this is the default level to be used for jobs with CU topology requirements. If the exclamation mark is omitted, the first type listed is the default.
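The default-type rule can be illustrated with a small shell sketch. It is not an LSF tool, and the variable names cu_types and default_type are made up for this example. It only mimics the rule described above: take the type marked with "!", otherwise the first one listed.

```shell
#!/bin/sh
# Value copied from the lsb.params example above.
cu_types='enclosure! switch rack'

# Look for the type marked with a trailing "!".
default_type=""
for t in $cu_types; do
    case "$t" in
        *\!) default_type="${t%\!}"; break ;;
    esac
done

# No "!" present: the first listed type is the default.
if [ -z "$default_type" ]; then
    default_type="${cu_types%% *}"
fi

echo "default CU type: $default_type"
```

Changing cu_types to 'enclosure switch! rack' would make the sketch report switch instead.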

Step 2: Arrange hosts into compute units in lsb.hosts

Begin ComputeUnit
NAME    TYPE         CONDENSE   MEMBER
en1-1   enclosure    Y          (c00 c01 c02)
en1-2   enclosure    Y          (c03 c04 c05)
en1-3   enclosure    Y          (c06 c07 c08 c09 c10)
.....
s1      switch       Y          (en1-1 en1-2)
s2      switch       Y          (en1-3)
.....
r1      rack         Y          (s1 s2)
.....
.....
End ComputeUnit
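Because CUs nest (a rack lists switches, a switch lists enclosures), it can help to check which hosts a given CU finally resolves to before reconfiguring. The following is a rough sketch using awk, not an LSF utility. The function name expand_cu and the variable cu_section are hypothetical, and the sample data is only the rows shown above with the elided "....." rows left out.

```shell
#!/bin/sh
# Sample ComputeUnit section (rows from the example above, elided rows omitted).
cu_section='Begin ComputeUnit
NAME    TYPE         CONDENSE   MEMBER
en1-1   enclosure    Y          (c00 c01 c02)
en1-2   enclosure    Y          (c03 c04 c05)
en1-3   enclosure    Y          (c06 c07 c08 c09 c10)
s1      switch       Y          (en1-1 en1-2)
s2      switch       Y          (en1-3)
r1      rack         Y          (s1 s2)
End ComputeUnit'

# expand_cu NAME: print the hosts under compute unit NAME, one per line.
expand_cu() {
    printf '%s\n' "$cu_section" | awk -v target="$1" '
        # A member that is itself a CU is expanded recursively;
        # anything else is treated as a host name.
        function expand(name,    n, i, parts) {
            if (!(name in members)) { print name; return }
            n = split(members[name], parts, " ")
            for (i = 1; i <= n; i++) expand(parts[i])
        }
        /^Begin ComputeUnit/ { in_cu = 1; next }
        /^End ComputeUnit/   { in_cu = 0; next }
        # Store the text between the parentheses, keyed by CU name.
        in_cu && $1 != "NAME" && match($0, /\(.*\)/) {
            members[$1] = substr($0, RSTART + 1, RLENGTH - 2)
        }
        END { expand(target) }
    '
}

echo "s1: $(expand_cu s1 | paste -sd' ' -)"
echo "r1 host count: $(expand_cu r1 | wc -l | tr -d ' ')"
```

To try it against a real configuration, cu_section could be filled from the actual lsb.hosts file instead of the inline sample.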

Reconfigure mbatchd to apply the changes:

# badmin reconfig

View the CU configuration:

# bmgroup -cu

Step 3: Use bhosts to display information

Because CONDENSE is set to “Y” in the ComputeUnit section of lsb.hosts, bhosts displays each compute unit as a single condensed entry rather than listing its member hosts. To see all the nodes, run bhosts -X.
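Once the compute units are in place, a job can request topology-aware placement through the cu[] section of its resource requirement string. The sketch below only builds and prints such a bsub command, so it can be inspected without a cluster. The slot count and the application name ./my_mpi_app are placeholders, and pref=maxavail is one possible preference, chosen here purely for illustration.

```shell
#!/bin/sh
cu_type="enclosure"    # one of the types listed in COMPUTE_UNIT_TYPES
slots=32               # placeholder slot count
app="./my_mpi_app"     # placeholder application name

# cu[] section of the resource requirement: CU type plus a preference
# for the CUs with the most available slots.
res_req="cu[type=${cu_type}:pref=maxavail]"

# Print the command instead of running it.
bsub_cmd="bsub -n ${slots} -R \"${res_req}\" ${app}"
echo "$bsub_cmd"
```

If type= is left out of the cu[] section, the default CU type from COMPUTE_UNIT_TYPES applies.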

References:

Using Compute Units for Topology Scheduling
