Uncovering the reasons why slow disks cause soaring Linux load
Here we need to distinguish between CPU load and CPU utilization. They are two different concepts, although the top command displays both. CPU utilization is the percentage of CPU time that running programs occupy, measured over a period of time. It tells you how busy the CPU was during that period. If it stays very high, you need to consider whether the CPU is already overloaded. CPU load is the number of processes the CPU is currently running plus the number waiting for the CPU, measured over a period of time. In other words, it describes the length of the CPU run queue.
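Because CPU utilization is a ratio over an interval, it is normally computed from two snapshots of cumulative CPU time. The sketch below assumes the standard layout of the aggregate "cpu" line in /proc/stat (user, nice, system, idle, iowait, irq, softirq, steal, in clock ticks). Counting iowait as idle time is a choice made for this sketch, and the helper names are hypothetical.

```python
def cpu_times(stat_line: str) -> tuple[int, int]:
    """Return (busy, total) clock ticks from the aggregate 'cpu' line of /proc/stat."""
    fields = [int(x) for x in stat_line.split()[1:9]]  # user .. steal
    idle = fields[3] + fields[4]  # idle + iowait: the CPU was not executing code
    total = sum(fields)
    return total - idle, total


def utilization(before: str, after: str) -> float:
    """CPU utilization (0.0 to 1.0) between two samples of the 'cpu' line."""
    busy0, total0 = cpu_times(before)
    busy1, total1 = cpu_times(after)
    elapsed = total1 - total0
    return (busy1 - busy0) / elapsed if elapsed else 0.0


if __name__ == "__main__":
    import time

    def read_cpu_line() -> str:
        with open("/proc/stat") as f:
            return f.readline()  # the first line is the aggregate "cpu" line

    first = read_cpu_line()
    time.sleep(1)
    second = read_cpu_line()
    print(f"CPU utilization over 1s: {utilization(first, second):.1%}")
```

Note that nothing in this calculation looks at how many tasks are waiting, which is exactly why utilization and load can disagree.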
High CPU utilization does not necessarily mean a high load; the task may simply be CPU-intensive. Conversely, can a high load average occur while CPU utilization is low? It can. When the CPU grants a process a time slice, how much of that slice the process actually uses is up to the process, so low utilization combined with a high load average is entirely possible. In addition, IO devices can also cause a high CPU load.
From this point of view, CPU utilization alone is not enough to judge whether the CPU is overloaded; it must be combined with the load average to get a complete picture. A common analogy illustrates the difference. In a public phone booth, one person is calling and four people are waiting. Each person may use the phone for only one minute; anyone who has not finished within that minute must hang up and rejoin the queue for the next round. The phone is the CPU, and the people calling or waiting are the tasks. While the booth is in use, some people leave after finishing their calls, some rejoin the queue because they have not finished, and new people arrive; these changes correspond to tasks being added and removed. To compute the average load, we count the people every 5 seconds and average those counts over the last 1, 5, and 15 minutes, giving the 1-, 5-, and 15-minute load averages.
Some people pick up the phone and talk for the whole minute, while others spend the first thirty seconds looking up the number or hesitating, and only talk for the last thirty seconds. With the phone as the CPU and the people as tasks, the first person (task) has high CPU utilization and the second has low CPU utilization. Of course, this does not mean the CPU really idles for thirty seconds and then works for thirty; the CPU keeps working. The analogy only shows that some programs involve a lot of computation and therefore have high CPU utilization, while others involve very little computation, so their utilization is naturally low.
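The "count every 5 seconds and average" description is a simplification. Linux folds each sample into three exponentially damped moving averages with decay factors exp(-5/60), exp(-5/300), and exp(-5/900), and the kernel does this in fixed-point integer arithmetic. The floating-point sketch below shows the idea using the phone-booth numbers; the function names and the 10-minute scenario are illustrative assumptions.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples (the kernel uses roughly 5 s)
WINDOWS = (60.0, 300.0, 900.0)  # 1-, 5- and 15-minute windows, in seconds
DECAY = tuple(math.exp(-SAMPLE_INTERVAL / w) for w in WINDOWS)


def update(loads, active):
    """Fold one sample of 'active' tasks into the three moving averages."""
    return tuple(load * e + active * (1 - e) for load, e in zip(loads, DECAY))


def simulate(samples, loads=(0.0, 0.0, 0.0)):
    """Feed a sequence of per-sample task counts; return the final averages."""
    for active in samples:
        loads = update(loads, active)
    return loads


if __name__ == "__main__":
    # 5 people at the booth (1 calling + 4 waiting) for 10 minutes = 120 samples.
    one, five, fifteen = simulate([5] * 120)
    print(f"{one:.2f} {five:.2f} {fifteen:.2f}")
```

After ten minutes of a constant five tasks, the 1-minute figure has essentially reached 5, while the 15-minute figure is still only about 2.4. This is why the three numbers lag each other after the load changes.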
But whether a task's CPU utilization is high or low has nothing to do with how many tasks are queued behind it.
The number of CPUs and the number of cores per CPU affect how much load a machine can handle, because tasks are ultimately dispatched to CPU cores. Two CPUs are better than one, and dual-core is better than single-core. So, apart from differences in per-core performance, remember that load must be judged against the number of cores: a machine can sustain as much load as it has cores. For example, a single-core machine should ideally not exceed 100%, that is, a load of 1.00; a dual-core machine should not exceed 2.00, and so on.
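The "as many cores, as much load" rule can be checked by dividing each load average by the core count. This sketch uses Python's os.getloadavg() (Unix only) and os.cpu_count(), which counts logical CPUs, so hyper-threads are included. The 1.0-per-core threshold is the rule of thumb from the text, not a hard limit.

```python
import os


def per_core(loads, cores):
    """Divide each load average by the number of logical CPUs."""
    return tuple(load / cores for load in loads)


if __name__ == "__main__":
    cores = os.cpu_count() or 1  # cpu_count() may return None; fall back to 1
    loads = os.getloadavg()  # (1-min, 5-min, 15-min)
    for label, value in zip(("1m", "5m", "15m"), per_core(loads, cores)):
        status = "over capacity" if value > 1.0 else "ok"
        print(f"{label}: {value:.2f} per core ({status})")
```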
Linux has a /proc directory, a virtual filesystem that exposes the state of the running system. Its cpuinfo file contains CPU information. /proc/cpuinfo is organized by logical CPU rather than physical CPU: each logical CPU gets its own section, and logical CPU numbering starts at 0.
$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 63
model name      : Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
stepping        : 2
microcode       : 0x36
cpu MHz         : 2399.998
cache size      : 20480 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 15
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr ......
bogomips        : 4799.99
clflush size    : 64
cache_alignment : 64
address sizes   : 42 bits physical, 48 bits virtual
power management:
To read this file, you need to know a few fields: processor is the ID of the logical CPU, model name is the model of the physical CPU, physical id identifies the physical CPU the logical CPU belongs to, cpu cores is the number of cores in that physical CPU, and so on.
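These fields are enough to count logical CPUs, physical CPUs, and physical cores. The sketch assumes the x86 layout shown above, where each section lists "physical id" before "core id"; other architectures may omit these fields, in which case the package and core counts come out as 0. The function name is hypothetical.

```python
def summarize_cpuinfo(text):
    """Count logical CPUs, physical packages and physical cores in /proc/cpuinfo text."""
    logical = 0
    packages = set()
    cores = set()  # (physical id, core id) pairs identify a physical core
    physical_id = None
    for line in text.splitlines():
        if ":" not in line:
            continue  # blank separator lines between sections
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            physical_id = value
            packages.add(value)
        elif key == "core id":
            cores.add((physical_id, value))
    return {"logical": logical, "packages": len(packages), "cores": len(cores)}


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(summarize_cpuinfo(f.read()))
```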
A note on logical CPUs: today's servers generally use Hyper-Threading (HT) to improve CPU performance. Hyper-Threading lets one physical core execute two threads at the same time while sharing that core's resources, so in theory it behaves like two CPUs. Unlike two real CPUs, however, the two threads do not have independent resources: when both need the same resource at the same time, one of them must stall until the resource is free. Therefore, Hyper-Threading does not deliver the performance of two CPUs, and CPUs with Hyper-Threading have other limitations as well.
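On x86, one way to tell whether Hyper-Threading is visible is to compare two fields of a /proc/cpuinfo section: siblings (logical CPUs in the physical package) and cpu cores (physical cores in that package). A ratio above 1 means each core exposes more than one logical CPU. This is a minimal sketch with a hypothetical function name; it reads only the first section and assumes both fields are present.

```python
def threads_per_core(text):
    """Return siblings // cpu cores from the first section of /proc/cpuinfo text."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the first logical CPU's section
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return int(fields["siblings"]) // int(fields["cpu cores"])


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        n = threads_per_core(f.read())
    print("Hyper-Threading visible" if n > 1 else "one logical CPU per core")
```

For the sample output shown earlier (siblings 2, cpu cores 2), the ratio is 1, so each logical CPU there maps to its own core.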
The concept of load average originates from UNIX. Although each vendor computes it somewhat differently, all versions measure the number of processes using the CPU plus the number of processes waiting for it; in short, the number of runnable processes. The load average can therefore serve as a reference indicator of a CPU bottleneck: if it is greater than the number of CPUs, the CPU may not be enough.
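On Linux the three load averages can be read from /proc/loadavg, a single line such as "0.20 0.18 0.12 1/80 11206": the 1-, 5-, and 15-minute averages, then the number of currently runnable scheduling entities over the total number, then the most recently assigned PID. A minimal parser (the class and function names are hypothetical) might look like this:

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    one: float
    five: float
    fifteen: float
    runnable: int  # scheduling entities runnable right now (a snapshot)
    total: int  # total scheduling entities (processes and threads)
    last_pid: int  # most recently assigned PID


def parse_loadavg(text):
    """Parse the single line of /proc/loadavg."""
    one, five, fifteen, ratio, last_pid = text.split()
    runnable, total = ratio.split("/")
    return LoadAvg(float(one), float(five), float(fifteen),
                   int(runnable), int(total), int(last_pid))


if __name__ == "__main__":
    with open("/proc/loadavg") as f:
        print(parse_loadavg(f.read()))
```

The runnable/total field is an instantaneous count, while the first three fields are averages over time.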
However, there is a little difference on Linux!
In addition to the processes using the CPU and the processes waiting for it, the load average on Linux also includes processes in uninterruptible sleep (state D). A process typically enters uninterruptible sleep while waiting for an IO device, such as a disk or a network filesystem. The Linux designers reasoned that uninterruptible sleep should be very short and the process would resume soon, so they treated it as equivalent to runnable. However, uninterruptible sleep is still sleep, however short, and in the real world it is not always short. A large number of processes in uninterruptible sleep, or long stays in it, usually means an IO device has hit a bottleneck. Sleeping processes do not need the CPU: even if every CPU is idle, a sleeping process cannot run. The number of sleeping processes is therefore unsuitable as a measure of CPU load, and counting uninterruptible sleep in the load average subverts its original meaning. As a result, on Linux the load average on its own is of little use, because you cannot tell what it means. When you see a high load average, you do not know whether there are too many runnable processes or too many processes in uninterruptible sleep, so you cannot tell whether the CPU is insufficient or an IO device is the bottleneck.
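To tell the two cases apart, you can count processes by state: R (running or runnable) versus D (uninterruptible sleep). The state is the third field of /proc/<pid>/stat, after the command name in parentheses. This sketch counts one entry per process; per-thread states live under /proc/<pid>/task/, so it can undercount compared with what the kernel feeds into the load average. The function names are hypothetical.

```python
import os
from collections import Counter


def state_of(stat_text):
    """Return the one-letter state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces or ')',
    # so locate the LAST ')' instead of splitting on whitespace.
    return stat_text[stat_text.rfind(")") + 2]


def count_states(proc="/proc"):
    """Count processes by state letter, e.g. 'R' (runnable) and 'D' (uninterruptible)."""
    counts = Counter()
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/cpuinfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                counts[state_of(f.read())] += 1
        except OSError:
            continue  # the process exited while we were scanning
    return counts


if __name__ == "__main__":
    c = count_states()
    print(f"runnable (R): {c['R']}, uninterruptible (D): {c['D']}")
```

A high load average with few R processes but many D processes points at IO rather than the CPU.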
Seen from the other side, this also explains why the load soars when the disk is slow (for example, under heavy disk usage). In my experience, high load comes from one of two situations: the CPU has too many tasks to handle, with frequent soft interrupts and context switches adding to the load; or the disk is too slow, leaving many processes in uninterruptible sleep, which drives the load up.
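The two situations can be separated with a rough triage rule, given counts of R and D tasks (gathered, for example, by scanning /proc/<pid>/stat) and the number of logical CPUs. The function name and the thresholds below are a hypothetical heuristic for illustration, not an established formula; soft interrupts and context switches are not modelled.

```python
def diagnose(runnable, uninterruptible, cores):
    """Hypothetical triage of high load.

    runnable: tasks in state R; uninterruptible: tasks in state D;
    cores: number of logical CPUs.
    """
    if runnable + uninterruptible <= cores:
        return "within capacity"
    if runnable > cores:
        # Too many runnable tasks for the CPUs; D tasks may add to it.
        return "cpu" if runnable >= uninterruptible else "cpu+io"
    # Runnable tasks fit on the CPUs; D tasks alone push the load past cores.
    return "io"
```

For example, diagnose(1, 20, 4) returns "io": the load is high, but almost all of it comes from processes waiting on IO, not from demand for CPU time.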
The above is the detailed content of Uncovering the reasons why slow disks cause soaring Linux load. For more information, please follow other related articles on the PHP Chinese website!