June 12, 2021 – The Linux Cluster

Taken from RedHat Article “What is the relation between I/O wait and load average?” I have learned quite a bit on this article.

Linux, unlike traditional UNIX operating systems, computes its load average as the average number of runnable or running processes (R state), and the number of processes in uninterruptable sleep (D state) over the specified interval. On UNIX systems, only the runnable or running processes are taken into account for the load average calculation.

On Linux the load average is a measurement of the amount of “work” being done by the machine (without being specific as to what that work is). This “work” could reflect a CPU intensive application (compiling a program or encrypting a file), or something I/O intensive (copying a file from disk to disk, or doing a database full table scan), or a combination of the two.

In the article, you can determine whether the high load average is the result processes in the running state or uninterruptible state,

I like this script…… that was written in the knowledgebase. The script show the running, blocked and runnin+blocked.

[user@node1 ~]$ while true; do echo; uptime; ps -efl | awk 'BEGIN {running = 0; blocked = 0} $2 ~ /R/ {running++}; $2 ~ /D/ {blocked++} END {print "Number of running/blocked/running+blocked processes: "running"/"blocked"/"running+blocked}'; sleep 5; done

 23:45:52 up 52 days,  7:06, 22 users,  load average: 1.40, 1.26, 1.02
Number of running/blocked/running+blocked processes: 3/1/4

 23:45:57 up 52 days,  7:06, 22 users,  load average: 1.45, 1.27, 1.02
Number of running/blocked/running+blocked processes: 2/0/2

 23:46:02 up 52 days,  7:06, 22 users,  load average: 1.41, 1.27, 1.02
Number of running/blocked/running+blocked processes: 1/1/2

 23:46:07 up 52 days,  7:07, 22 users,  load average: 1.46, 1.28, 1.03
Number of running/blocked/running+blocked processes: 2/0/2

 23:46:12 up 52 days,  7:07, 22 users,  load average: 1.42, 1.27, 1.03
Number of running/blocked/running+blocked processes: 2/0/2

 23:46:17 up 52 days,  7:07, 22 users,  load average: 1.55, 1.30, 1.04
Number of running/blocked/running+blocked processes: 2/0/2

 23:46:22 up 52 days,  7:07, 22 users,  load average: 1.51, 1.30, 1.04
Number of running/blocked/running+blocked processes: 1/1/2

 23:46:27 up 52 days,  7:07, 22 users,  load average: 1.55, 1.31, 1.05
Number of running/blocked/running+blocked processes: 2/0/2

 23:46:32 up 52 days,  7:07, 22 users,  load average: 1.62, 1.33, 1.06
Number of running/blocked/running+blocked processes: 2/1/3

 23:46:38 up 52 days,  7:07, 22 users,  load average: 1.81, 1.38, 1.07
Number of running/blocked/running+blocked processes: 1/1/2

 23:46:43 up 52 days,  7:07, 22 users,  load average: 1.66, 1.35, 1.07
Number of running/blocked/running+blocked processes: 1/0/1

 23:46:48 up 52 days,  7:07, 22 users,  load average: 1.53, 1.33, 1.06
Number of running/blocked/running+blocked processes: 1/0/1

Another useful way to typical top output when the load average is high (filter the idle/sleep status tasks with i). So the high load average is because lots of sendmail tasks are in D status. They may be waiting either for I/O or network.

op - 13:23:21 up 329 days,  8:35,  0 users,  load average: 50.13, 13.22, 6.27
Tasks: 437 total,   1 running, 435 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.1%us,  1.5%sy,  0.0%ni, 93.6%id,  4.5%wa,  0.1%hi,  0.2%si,  0.0%st
Mem:  34970576k total, 24700568k used, 10270008k free,  1166628k buffers
Swap:  2096440k total,        0k used,  2096440k free, 11233868k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
11975 root      15   0 13036 1356  820 R  0.7  0.0   0:00.66 top                
15915 root      17   0  5312  872   80 D  0.0  0.0   0:00.00 sendmail           
15918 root      17   0  5312  872   80 D  0.0  0.0   0:00.00 sendmail           
15920 root      17   0  5312  872   80 D  0.0  0.0   0:00.00 sendmail           
15921 root      17   0  5312  872   80 D  0.0  0.0   0:00.00 sendmail           
15922 root      17   0  5312  872   80 D  0.0  0.0   0:00.00 sendmail           
15923 root      17   0  5312  872   80 D  0.0  0.0   0:00.00 sendmail           
15924 root      17   0  5312  872   80 D  0.0  0.0   0:00.00 sendmail           
15926 root      17   0  5312  872   80 D  0.0  0.0   0:00.00 sendmail           
15928 root      17   0  5312  872   80 D  0.0  0.0   0:00.00 sendmail           
15929 root      17   0  5312  872   80 D  0.0  0.0   0:00.00 sendmail           
15930 root      17   0  5312  872   80 D  0.0  0.0   0:00.00 sendmail           
15931 root      18   0  5312  872   80 D  0.0  0.0   0:00.00 sendmail

References:

What is the relation between I/O wait and load average?

The Linux Cluster

Linux Cluster Blog is a collection of how-to and tutorials for Linux Cluster and Enterprise Linux

Day: June 12, 2021

Finding physical cpus, cores and logical cpus

Understanding Load Average in Linux