Whoa, the Linux server's CPU is at 100%!
Yesterday afternoon I suddenly received an email alert from the operations team: CPU utilization on the data platform server had reached 98.94%, and it had been sitting above 70% for some time. At first glance it looks as if the hardware has hit a bottleneck and needs to be scaled out. But thinking it over, our business system is neither highly concurrent nor CPU-intensive; a utilization figure this high is out of all proportion, and we should not be hitting a hardware bottleneck this quickly. There had to be a problem somewhere in the business code logic.
First, log in to the server and run top to confirm what is actually going on, then analyze and decide based on what it shows.
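In practice this is a single interactive command; a minimal sketch of what to run and what to look at (the observations in the next two paragraphs refer to this output):

```bash
# Check overall CPU usage and load interactively
top
# Inside top:
#   - compare the "load average" values against the core count (8 on this machine)
#   - press 1 to expand per-core CPU usage
#   - press Shift+P to sort processes by %CPU and spot the hungriest PID
```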
Comparing the load average against the evaluation baseline for this machine (8 cores), it is clear that the server is under heavy load: a load average persistently above the core count means runnable tasks are queuing for CPU.
Looking at per-process resource usage, the process with PID 682 clearly accounts for the largest share of CPU.
Here we can use the pwdx command to find the process's working directory from its PID, and from that identify the project and the person responsible for it:
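For example (the PID comes from the top output above; the path in the output is hypothetical):

```bash
# pwdx prints the current working directory of a process, given its PID
pwdx 682
# 682: /opt/app/data-platform-web   <- hypothetical output; the path identifies the deployed project
```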
It can be concluded that this process corresponds to the web service of the data platform.
The traditional approach generally takes four steps (a consolidated sketch follows the list):
1. top, then press Shift+P to sort by CPU and find the busiest process // e.g. PID 1040
2. top -Hp 1040 // find the busiest thread inside that process, e.g. thread PID 1073
3. printf "0x%x\n" 1073 // convert the thread PID to hex (0x431), because jstack reports thread ids (nid) in hex
4. jstack 1040 | vim +/0x431 - // dump the Java thread stacks and jump to the frame for that nid
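Strung together with the example PIDs from the list above (substitute the values from your own output), the sequence looks like this; grep is used in the last step as a non-interactive alternative to vim:

```bash
# 1. Find the busiest process (press Shift+P inside top to sort by CPU), e.g. PID 1040
top

# 2. List that process's threads sorted by CPU and note the hottest one, e.g. thread PID 1073
top -Hp 1040

# 3. Convert the thread PID to hex, because jstack prints native thread ids (nid) in hex
printf "0x%x\n" 1073    # prints 0x431

# 4. Dump the Java thread stacks and pull out the frames for that nid
jstack 1040 | grep -A 30 'nid=0x431'
```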
But when locating problems in production every second counts, and the four steps above are still too cumbersome and slow. oldratlee (from Taobao, mentioned earlier) has packaged the whole process into a single tool, show-busy-java-threads.sh, which makes it easy to pin down this kind of problem directly on a live server:
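A typical invocation looks roughly like the sketch below. The script comes from oldratlee's useful-scripts repository, and the exact options can differ between versions, so treat the flags shown here as assumptions and check the script's --help output:

```bash
# Make the downloaded script executable (file name as given above)
chmod +x show-busy-java-threads.sh

# With no arguments it prints the stack traces of the busiest Java threads on the machine
./show-busy-java-threads.sh

# In the versions I have seen, -p restricts the check to one process and -c sets how many threads to show
./show-busy-java-threads.sh -p 1040 -c 5
```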
From its output it can be seen that a time-utility method in the system is consuming a disproportionately large share of CPU. With the specific method located, the next step is to check its code logic for performance problems.
※ If the production issue is urgent, you can skip 2.1 and 2.2 and go straight to 2.3; the analysis here is done from several angles only to present a complete troubleshooting approach.
After the analysis and troubleshooting above, the problem was finally traced to this time-utility method, which was responsible for the excessive server load and CPU usage.
It follows that if the current time is 10 a.m., a single query performs 10 × 60 × 60 × n = 36,000n calculations, and the per-query cost grows linearly through the day, peaking just before midnight. Since modules such as real-time query and real-time alerting issue large numbers of requests that each call this method several times, a great deal of CPU is consumed and wasted.
Having located the problem, the first idea was to cut down the number of calculations by optimizing the offending method. Investigation showed that the logic layer never used the contents of the Set returned by this method; it only used the Set's size. Once that was confirmed, the calculation was simplified with a new method (current seconds minus the seconds at midnight of the same day), the original call was replaced, and the excessive computation was eliminated. After the change went live, server load and CPU usage dropped to roughly one-thirtieth of their levels during the abnormal period and returned to normal. With that, the problem was solved.
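The article does not show the service code itself, but the replacement logic boils down to a single subtraction, which can be sanity-checked from the shell (GNU date assumed):

```bash
# Seconds elapsed since midnight = current epoch seconds - epoch seconds at 00:00 today
now=$(date +%s)
midnight=$(date -d "$(date +%Y-%m-%d) 00:00:00" +%s)
echo $(( now - midnight ))
# Run at exactly 10:00 this prints 36000, matching the 36,000-per-call estimate above
```

Either way, the key point is replacing a per-second loop whose cost grows all day with a constant-time subtraction.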