Taken from the Red Hat article “What is the relation between I/O wait and load average?” I learned quite a bit from this article.
Linux, unlike traditional UNIX operating systems, computes its load average as the average number of runnable or running processes (R state) plus the number of processes in uninterruptible sleep (D state) over the specified interval. On traditional UNIX systems, only the runnable or running processes are taken into account for the load average calculation.
On Linux the load average is a measurement of the amount of “work” being done by the machine (without being specific as to what that work is). This “work” could reflect a CPU-intensive application (compiling a program or encrypting a file), something I/O-intensive (copying a file from disk to disk, or doing a full table scan in a database), or a combination of the two.
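The “average over the specified interval” is not a plain arithmetic mean. To my understanding of the kernel, Linux samples the count of R and D tasks roughly every 5 seconds and folds it into an exponentially damped average with time constants of 1, 5 and 15 minutes. The sketch below reproduces that update step in floating point (the kernel itself uses fixed-point arithmetic); the function name next_load and its arguments are made up for illustration.

```shell
#!/bin/sh
# next_load PREV ACTIVE [WINDOW_SECONDS]
# One update step of an exponentially damped load average.
#   PREV    previous load average value
#   ACTIVE  number of tasks in R or D state at this sample
#   WINDOW  time constant in seconds (60, 300 or 900); defaults to 60
# Assumes a 5-second sampling period.
next_load() {
    awk -v prev="$1" -v n="$2" -v window="${3:-60}" 'BEGIN {
        decay = exp(-5 / window)                 # weight kept from the old value
        printf "%.2f\n", prev * decay + n * (1 - decay)
    }'
}

# Example: 1-minute load was 1.00 and 4 tasks are currently active.
# next_load 1.00 4
```

Each sample only nudges the value, which is why the 1-minute figure reacts to a burst of D-state tasks long before the 15-minute figure does.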
The article shows how to determine whether a high load average is the result of processes in the running state or in the uninterruptible state.
I like the script given in the knowledgebase article. It prints the number of running, blocked, and running+blocked processes every five seconds.
[user@node1 ~]$ while true; do echo; uptime; ps -efl | awk 'BEGIN {running = 0; blocked = 0} $2 ~ /R/ {running++}; $2 ~ /D/ {blocked++} END {print "Number of running/blocked/running+blocked processes: "running"/"blocked"/"running+blocked}'; sleep 5; done
23:45:52 up 52 days, 7:06, 22 users, load average: 1.40, 1.26, 1.02
Number of running/blocked/running+blocked processes: 3/1/4
23:45:57 up 52 days, 7:06, 22 users, load average: 1.45, 1.27, 1.02
Number of running/blocked/running+blocked processes: 2/0/2
23:46:02 up 52 days, 7:06, 22 users, load average: 1.41, 1.27, 1.02
Number of running/blocked/running+blocked processes: 1/1/2
23:46:07 up 52 days, 7:07, 22 users, load average: 1.46, 1.28, 1.03
Number of running/blocked/running+blocked processes: 2/0/2
23:46:12 up 52 days, 7:07, 22 users, load average: 1.42, 1.27, 1.03
Number of running/blocked/running+blocked processes: 2/0/2
23:46:17 up 52 days, 7:07, 22 users, load average: 1.55, 1.30, 1.04
Number of running/blocked/running+blocked processes: 2/0/2
23:46:22 up 52 days, 7:07, 22 users, load average: 1.51, 1.30, 1.04
Number of running/blocked/running+blocked processes: 1/1/2
23:46:27 up 52 days, 7:07, 22 users, load average: 1.55, 1.31, 1.05
Number of running/blocked/running+blocked processes: 2/0/2
23:46:32 up 52 days, 7:07, 22 users, load average: 1.62, 1.33, 1.06
Number of running/blocked/running+blocked processes: 2/1/3
23:46:38 up 52 days, 7:07, 22 users, load average: 1.81, 1.38, 1.07
Number of running/blocked/running+blocked processes: 1/1/2
23:46:43 up 52 days, 7:07, 22 users, load average: 1.66, 1.35, 1.07
Number of running/blocked/running+blocked processes: 1/0/1
23:46:48 up 52 days, 7:07, 22 users, load average: 1.53, 1.33, 1.06
Number of running/blocked/running+blocked processes: 1/0/1
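The one-liner is easier to reuse when the counting step is separated from the ps call. The following is a minimal sketch of that idea, not the knowledgebase script itself. The function name count_states is made up for illustration, and the usage line assumes a procps-style ps that accepts -eo state=.

```shell
#!/bin/sh
# count_states: read one process state per line on stdin
# (as printed by `ps -eo state=`) and print "running/blocked/total".
count_states() {
    awk '
        BEGIN { running = 0; blocked = 0 }
        $1 ~ /^R/ { running++ }   # runnable or running
        $1 ~ /^D/ { blocked++ }   # uninterruptible sleep
        END { print running "/" blocked "/" (running + blocked) }
    '
}

# Usage, mirroring the loop from the article:
# while true; do uptime; ps -eo state= | count_states; sleep 5; done
```

Matching only the state column avoids depending on the column layout of ps -efl. Note that ps itself is usually counted as one running process.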
Another useful check is top. Below is typical top output when the load average is high (press i to filter out idle/sleeping tasks). Here the high load average comes from the many sendmail tasks in the D state. They may be waiting on either disk I/O or the network.
top - 13:23:21 up 329 days, 8:35, 0 users, load average: 50.13, 13.22, 6.27
Tasks: 437 total, 1 running, 435 sleeping, 0 stopped, 1 zombie
Cpu(s): 0.1%us, 1.5%sy, 0.0%ni, 93.6%id, 4.5%wa, 0.1%hi, 0.2%si, 0.0%st
Mem: 34970576k total, 24700568k used, 10270008k free, 1166628k buffers
Swap: 2096440k total, 0k used, 2096440k free, 11233868k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11975 root 15 0 13036 1356 820 R 0.7 0.0 0:00.66 top
15915 root 17 0 5312 872 80 D 0.0 0.0 0:00.00 sendmail
15918 root 17 0 5312 872 80 D 0.0 0.0 0:00.00 sendmail
15920 root 17 0 5312 872 80 D 0.0 0.0 0:00.00 sendmail
15921 root 17 0 5312 872 80 D 0.0 0.0 0:00.00 sendmail
15922 root 17 0 5312 872 80 D 0.0 0.0 0:00.00 sendmail
15923 root 17 0 5312 872 80 D 0.0 0.0 0:00.00 sendmail
15924 root 17 0 5312 872 80 D 0.0 0.0 0:00.00 sendmail
15926 root 17 0 5312 872 80 D 0.0 0.0 0:00.00 sendmail
15928 root 17 0 5312 872 80 D 0.0 0.0 0:00.00 sendmail
15929 root 17 0 5312 872 80 D 0.0 0.0 0:00.00 sendmail
15930 root 17 0 5312 872 80 D 0.0 0.0 0:00.00 sendmail
15931 root 18 0 5312 872 80 D 0.0 0.0 0:00.00 sendmail
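To dig into what such D-state tasks are blocked on, it can help to list only those tasks together with their kernel wait channel. This is a sketch under assumptions rather than something from the article: the function name blocked_only is made up, and the usage line assumes a procps-style ps that supports the wchan output column.

```shell
#!/bin/sh
# blocked_only: keep the header line plus every row whose second
# column (the process state) starts with D (uninterruptible sleep).
blocked_only() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage: show PID, state, wait channel and command for blocked tasks.
# ps -eo pid,stat,wchan:32,comm | blocked_only
```

The wait channel names the kernel function the task is sleeping in, which often hints at whether the wait is on disk or network.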
References:
- What is the relation between I/O wait and load average?