Detailed explanation of the Linux load average problem

不言

Mar 12, 2019 pm 05:24 PM

This article explains what the Linux load average means and how to investigate it when it is high.

In an interview, the interviewer asked: CPU usage is not high, but the load average is very high. How would you track down the problem?

At the time I did not understand what load meant. The interviewer explained that the indicator largely reflects processes in an uninterruptible state. Drawing on my back-end development experience, I answered that the system might have a lot of blocked I/O, most likely network I/O, and that I would run netstat -tnp to check whether many TCP connections were in the TIME_WAIT state...

I knew my answer was very one-sided, so afterwards I reviewed the topic and took notes.

What is load average

Anyone familiar with Linux knows that the top and uptime commands show the load average.

man uptime explains load average as follows:

System load averages is the average number of processes that are either in a runnable or uninterruptable state. A process in a runnable state is either using the CPU or waiting to use the CPU. A process in uninterruptable state is waiting for some I/O access, eg waiting for disk. The averages are taken over the three time intervals. Load averages are not normalized for the number of CPUs in a system, so a load average of 1 means a single CPU system is loaded all the time while on a 4 CPU system it means it was idle 75% of the time.

The key point: load average is the average number of processes in the runnable or uninterruptible state over a given period, in other words the average number of active processes. Note that it has no direct relationship with CPU utilization.

Use the command ps aux to view each process's state in the STAT column. The two states that matter here are:

R: runnable state (Running / Runnable). The process is using the CPU or waiting for the CPU.

D: uninterruptible state (Uninterruptible Sleep, also known as Disk Sleep). The process is in a critical kernel-mode path and cannot be interrupted.

Why can the D state not be interrupted? Take a system call waiting for an I/O response from a hardware device: to keep data consistent, the process cannot be interrupted by other processes or by interrupts until the disk device returns the data. If it were interrupted, the data on disk and the data held by the process could easily diverge. The uninterruptible (D) state is therefore a mechanism by which the system protects processes and hardware devices.
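As a rough illustration of how these two states make up the "active process" count, the sketch below tallies R and D entries from the STAT column printed by ps. The helper name count_active_states and the sample text are made up for illustration; on a live system the input could come from `ps -eo stat=`.

```python
from collections import Counter


def count_active_states(stat_column: str) -> Counter:
    """Count processes whose state starts with R (runnable) or D (uninterruptible).

    stat_column holds one STAT value per line, e.g. "Ss", "R+", "D<".
    Only the first character is the process state; the rest are flags.
    """
    counts = Counter({"R": 0, "D": 0})
    for line in stat_column.splitlines():
        state = line.strip()[:1]
        if state in ("R", "D"):
            counts[state] += 1
    return counts


# Hypothetical sample, not captured from a real machine.
sample = "Ss\nR+\nS\nD\nR\nI<\nD+\n"
active = count_active_states(sample)
print(f"R={active['R']} D={active['D']} active={active['R'] + active['D']}")
```

A single snapshot like this is only one sample; the load average smooths many such samples over time.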

Strictly speaking, the average number of active processes is an exponentially decaying average of the active process count (in exponential decay, a quantity falls at a rate proportional to its current value). In practice, it can be read as the number of active processes per unit time.
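To make the exponentially decaying average concrete, here is a simplified floating-point sketch. It assumes the active process count is sampled every 5 seconds and that each sample is blended into the running value with a weight derived from the window length (60 seconds for the 1-minute figure). The kernel does this with fixed-point arithmetic; the function names below are illustrative only.

```python
import math

SAMPLE_INTERVAL = 5.0  # assumed seconds between samples


def update_load(load: float, active: int, window: float) -> float:
    """One step of an exponentially decaying average over `window` seconds."""
    decay = math.exp(-SAMPLE_INTERVAL / window)
    return load * decay + active * (1.0 - decay)


def simulate(active: int, seconds: float, window: float = 60.0) -> float:
    """Start from an idle system and hold `active` processes for `seconds`."""
    load = 0.0
    for _ in range(int(seconds / SAMPLE_INTERVAL)):
        load = update_load(load, active, window)
    return load


# Two busy processes on a previously idle machine:
for elapsed in (60, 300, 900):
    print(f"after {elapsed:>3}s: 1-min load ~ {simulate(2, elapsed):.2f}")
```

Under these assumptions, after one minute the 1-minute figure has covered only about 63% of the distance to its steady value, which is why short bursts appear damped.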

CPU utilization and load average

From the CPU's point of view, load average only reflects how many processes occupy or wait for the CPU per unit time, whereas CPU utilization is not directly related to the number of processes. Use top or vmstat to check CPU utilization. The indicators are:

%us: CPU time used by user-space programs (not rescheduled through nice)

%sy: CPU time used in system space, mainly by the kernel

%ni: CPU time used by user-space programs that were rescheduled through nice

%id: idle CPU time

%wa: time the CPU spends idle while waiting for I/O

%hi: CPU time spent handling hardware interrupts

%si: CPU time spent handling software interrupts

%st: CPU time stolen from this virtual machine by the hypervisor
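These percentages can be derived from the cumulative tick counters on the aggregate "cpu" line of /proc/stat by sampling twice and comparing. The sketch below is a minimal version of that idea; the function names are illustrative and the two sample lines contain made-up values.

```python
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line: str) -> dict:
    """Parse the aggregate "cpu" line of /proc/stat into named tick counters."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # /proc/stat order: user nice system idle iowait irq softirq steal ...
    return dict(zip(FIELDS, map(int, parts[1:9])))


def cpu_percentages(before: str, after: str) -> dict:
    """Percentage of time spent in each category between two samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    deltas = {name: b[name] - a[name] for name in FIELDS}
    total = sum(deltas.values()) or 1  # avoid division by zero
    return {name: 100.0 * d / total for name, d in deltas.items()}


# Hypothetical samples taken a few seconds apart (values are made up).
t0 = "cpu 1000 0 500 8000 400 10 20 0 0 0"
t1 = "cpu 1100 0 550 8600 640 12 28 0 0 0"
for name, pct in cpu_percentages(t0, t1).items():
    print(f"%{name}: {pct:.1f}")
```

On a real machine the two strings would be the first line of /proc/stat read at two different moments.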

How to measure a reasonable average load

Generally speaking, if the load average is lower than the number of CPUs, the machine's performance meets the service's needs. Exceeding that number is not necessarily a problem either: the load average does not directly represent CPU utilization, and the excess may come from processes blocked on I/O. As a rule of thumb, once the load average is higher than 70% of the number of CPUs, processes may respond slowly, which affects the normal function of the service.
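Because the load average is not normalized by CPU count, it helps to divide it by the number of CPUs before comparing it with the 70% rule of thumb. A minimal sketch, with illustrative function names and the threshold exposed as a parameter:

```python
import os


def load_ratio(load: float, cpus: int) -> float:
    """Load average divided by the number of CPUs."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return load / cpus


def needs_attention(load: float, cpus: int, threshold: float = 0.7) -> bool:
    """True when the per-CPU load exceeds the rule-of-thumb threshold."""
    return load_ratio(load, cpus) > threshold


try:
    one_min, _, _ = os.getloadavg()  # not available on every platform
    cpus = os.cpu_count() or 1
    print(f"1-min load {one_min:.2f} on {cpus} CPUs, "
          f"ratio {load_ratio(one_min, cpus):.2f}")
except (AttributeError, OSError):
    print("load average is not available on this platform")
```

For example, a load of 3.0 on 4 CPUs gives a ratio of 0.75 and would be flagged, while 2.0 on the same machine would not.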

From the perspective of historical changes

Generally speaking, top and uptime report the load average over three windows: the last 1, 5, and 15 minutes. Together these show the recent trend in the state of the system. In a real production environment, we need long-term monitoring records. If the values change abnormally, for example the load average reaches twice the number of CPUs, the problem needs to be analyzed and investigated.
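One simple way to read the trend is to compare the 1-minute value with the 15-minute value. The sketch below parses text in the format of /proc/loadavg (three averages followed by other fields); the function names, the tolerance, and the sample values are assumptions made for illustration.

```python
def parse_loadavg(text: str) -> tuple:
    """Return the 1, 5 and 15 minute values from /proc/loadavg-style text."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def trend(one: float, five: float, fifteen: float, tolerance: float = 0.1) -> str:
    """Compare the shortest window with the longest to guess the direction."""
    if one > fifteen + tolerance:
        return "rising"
    if one < fifteen - tolerance:
        return "falling"
    return "steady"


# Hypothetical contents of /proc/loadavg (values are made up).
sample = "3.20 1.75 0.90 2/345 12345"
print(trend(*parse_loadavg(sample)))
```

A rising 1-minute value against a low 15-minute value suggests a recent spike; the reverse suggests the load is already easing.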

Comprehensive analysis of the differences between the two types of indicators

Combining load average and CPU utilization gives the following possible cases:

Load average high, CPU utilization high: either CPU-intensive processes (threads) are running, or a large number of processes (threads) are waiting for the CPU to schedule them.

Load average high, CPU utilization low: I/O-intensive processes are running.

Load average low, CPU utilization low: normal.

Load average low, CPU utilization high: this combination does not occur.
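The combinations above can be written as a small decision function. This is only a sketch: the name diagnose and both thresholds (0.7 load per CPU, 70% busy) are assumptions chosen for illustration, not values measured in this article.

```python
def diagnose(load: float, cpus: int, cpu_busy_pct: float,
             load_threshold: float = 0.7, busy_threshold: float = 70.0) -> str:
    """Map (load average, CPU utilization) onto the four cases."""
    load_high = load / cpus > load_threshold
    cpu_high = cpu_busy_pct > busy_threshold
    if load_high and cpu_high:
        return "CPU-bound work or too many processes waiting for the CPU"
    if load_high:
        return "likely I/O-bound: processes blocked in uninterruptible sleep"
    if cpu_high:
        return "unexpected combination: re-check the measurements"
    return "normal"


# Hypothetical readings on a 4-CPU machine.
print(diagnose(load=8.0, cpus=4, cpu_busy_pct=95.0))
print(diagnose(load=8.0, cpus=4, cpu_busy_pct=10.0))
print(diagnose(load=0.5, cpus=4, cpu_busy_pct=5.0))
```

The result only narrows the search; tools such as mpstat and pidstat are still needed to find the responsible process.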

Simulation cases and tools

How can we analyze cases with different combinations of these two indicators, load average and CPU utilization, and find the source of the change?

The test environment is Arch Linux with kernel 4.19, 4 CPUs, and 8 GB of memory.

Tool list

stress: a system stress-testing tool

sysstat: a performance analysis tool package, which includes:

mpstat: a multi-core CPU performance analysis tool; mp stands for multi-processor

pidstat: a per-process performance analysis tool; pid stands for process ID. It shows the CPU, memory, I/O, and context-switch indicators of a process

Simulation scenarios

stress can simulate the following scenarios:

CPU-intensive processes

# Simulate one process using 100% of a CPU, for 600 s
stress --cpu 1 --timeout 600

I/O-intensive processes

The stress -i option spawns N workers spinning on sync().

# Simulate one process that calls sync continuously
stress -i 1 --timeout 600

A large number of processes

# Simulate 16 processes, each using 100% of a CPU, for 600 s
stress --cpu 16 --timeout 600

Tool indicators

mpstat -P ALL 5: monitors all CPUs and prints a set of data every 5 seconds. Watch %usr (user-space CPU usage) and %iowait (time spent waiting on I/O) to tell whether the workload is CPU-intensive or I/O-intensive.

pidstat -u 5 1: prints one report over a 5-second interval for processes that used the CPU. Watch %usr (CPU usage) and %wait (time spent waiting for the CPU) to tell whether there are too many processes (threads).

Statement

This article is reproduced from segmentfault. If there is any infringement, please contact admin@php.cn to have it removed.
