Explore new paths - Diagnostic tool for IO waiting

王林

Dec 29, 2023 pm 10:29 PM


Introduction

I have recently been working on real-time log synchronization. Before going live, I ran a stress test with a single production log. The message queue, the client, and the local machine all held up, but I did not expect that a problem would appear as soon as the second log was brought online:

1. Question:

On one machine in the cluster, top showed a very high load. All machines in the cluster have the same hardware configuration and run the same software, yet only this one machine had a load problem, so my first guess was a hardware fault.

At the same time, we needed to find the culprit behind the abnormal load and then look for solutions at both the software and the hardware level.


2. Troubleshooting:

You can see from top that the load average is high, %wa is high, and %us is low:

[Figure: top output]

From the above figure, we can roughly infer that IO has encountered a bottleneck. Next, we can use related IO diagnostic tools for specific verification and troubleshooting.
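To make the %wa and %us figures less of a black box, here is a minimal Python sketch of how such percentages can be derived from two samples of the aggregate "cpu" line in /proc/stat. The function names and the sample counter values are made up for illustration, and top's own accounting may differ in detail.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of counters."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]


def cpu_percentages(before, after):
    """Return us/sy/id/wa percentages for the interval between two samples."""
    a = parse_cpu_line(before)
    b = parse_cpu_line(after)
    delta = [y - x for x, y in zip(a, b)]
    # Column order: user nice system idle iowait irq softirq steal ...
    # Guest time is already counted in user time, so only sum the first 8.
    total = sum(delta[:8])
    if total == 0:
        return {"us": 0.0, "sy": 0.0, "id": 0.0, "wa": 0.0}
    return {
        "us": 100.0 * delta[0] / total,
        "sy": 100.0 * delta[2] / total,
        "id": 100.0 * delta[3] / total,
        "wa": 100.0 * delta[4] / total,
    }


# Made-up samples: most of the elapsed time went to iowait.
sample_before = "cpu  100 0 50 800 50 0 0 0 0 0"
sample_after = "cpu  110 0 60 820 110 0 0 0 0 0"
print(cpu_percentages(sample_before, sample_after))
```

On a real machine the two strings would be the first line of /proc/stat, read a few seconds apart.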

Commonly used tool combinations are as follows:
•Use vmstat, sar, iostat to detect whether it is a CPU bottleneck
•Use free and vmstat to detect whether there is a memory bottleneck
•Use iostat and dmesg to detect whether it is a disk I/O bottleneck
•Use netstat to detect network bandwidth bottlenecks

2.1 vmstat

The name vmstat stands for "Virtual Memory Statistics", but the command also reports on the overall state of the system, including processes, memory, and I/O.

[Figure: vmstat output]
Its related fields are described as follows:

Procs (processes)
•r: The number of processes in the run queue, that is, tasks that are running or waiting for CPU resources. If this value stays above the number of CPUs for a long time, there is a CPU bottleneck and more CPU capacity may be needed.
•b: The number of processes waiting for IO, that is, processes in uninterruptible sleep.

Memory
•swpd: The amount of virtual memory (swap) in use. If swpd is not 0 but si and so stay at 0 for a long time, system performance is not affected.
•free: The amount of free physical memory.
•buff: The amount of memory used as buffers.
•cache: The amount of memory used as cache. A large cache value means many files are cached. If frequently accessed files can be served from the cache, disk read IO (bi) will be very small.

Swap (swap area)
•si: The amount of memory swapped in per second, that is, read from the swap area on disk back into memory.
•so: The amount of memory swapped out per second, that is, written from memory to the swap area on disk.

Note: When memory is sufficient, both values are 0. If they stay above 0 for a long time, system performance suffers, because swapping consumes disk IO and CPU resources. Some people conclude that memory is short as soon as they see that free memory (free) is very small or close to 0. Free alone is not enough to judge; it has to be read together with si and so. If free is very low but si and so are also very low (0 most of the time), there is nothing to worry about, and system performance is not affected.
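The "look at si and so, not just free" rule can be sketched in Python. This is only an illustration: it assumes the pswpin and pswpout counters from /proc/vmstat and a 4 KB page size, and the helper names and the "every sample shows swapping" criterion are my own stand-in for "greater than 0 for a long time".

```python
def parse_vmstat_counters(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before, after, interval_s, page_kb=4):
    """Return (si, so) in KB/s between two counter snapshots.

    page_kb=4 is an assumption; the real page size depends on the platform.
    """
    si = (after["pswpin"] - before["pswpin"]) * page_kb / interval_s
    so = (after["pswpout"] - before["pswpout"]) * page_kb / interval_s
    return si, so


def sustained_swapping(rates):
    """True only if every (si, so) sample shows some swap traffic."""
    return bool(rates) and all(si > 0 or so > 0 for si, so in rates)


# Made-up snapshots taken 5 seconds apart.
t0 = parse_vmstat_counters("pswpin 100\npswpout 200\n")
t1 = parse_vmstat_counters("pswpin 150\npswpout 200\n")
print(swap_rates(t0, t1, interval_s=5))
```

A single burst of swapping is not a problem by this rule; only a series of samples that all show swap traffic would count as a memory shortage.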

IO (input and output)

(On current Linux versions the block size is 1 KB)
•bi: Number of blocks read per second
•bo: Number of blocks written per second

Note: With random disk reads and writes, the larger these two values are (for example, above 1024k), the higher the CPU's IO wait will be.

system
•in: Number of interrupts per second, including clock interrupts.
•cs: Number of context switches per second.

Note: The larger the above two values are, the greater the CPU time consumed by the kernel will be.

CPU

(expressed as a percentage)
•us: Percentage of time spent running user processes (user time). A high us value means user processes are consuming a lot of CPU time. If it stays above 50% for a long time, we should consider optimizing the program's algorithms or otherwise speeding it up.
•sy: Percentage of time spent running kernel code (system time). A high sy value means the kernel is consuming a lot of CPU resources. This is not a healthy sign, and we should look for the cause.
•wa: Percentage of time spent waiting for IO. A high wa value means IO wait is serious. This may be caused by a large amount of random disk access, or by a bottleneck on the disk (block operations).
•id: idle time percentage

As vmstat shows, most of the CPU's time is wasted waiting for IO, which may be caused by a large amount of random disk access or by limited disk bandwidth. bi and bo also exceed 1024k, which points to an IO bottleneck.
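The rules of thumb from the field descriptions can be collected into a small Python sketch that parses vmstat's text output and flags the suspicious columns. The sample output, the function names, and the 30% wa threshold are all assumptions for illustration, not values taken from the machine in this article.

```python
def parse_vmstat_output(text):
    """Turn vmstat's text output into a list of {column: int} dicts."""
    rows, header = [], None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[:2] == ["r", "b"]:
            header = parts  # the column-name line
        elif header and all(p.isdigit() for p in parts):
            rows.append(dict(zip(header, map(int, parts))))
    return rows


def diagnose(row, ncpu, wa_threshold=30):
    """Apply the rules of thumb above to one vmstat row.

    wa_threshold is a hypothetical cut-off, not an official limit.
    """
    findings = []
    if row["r"] > ncpu:
        findings.append("cpu: run queue longer than CPU count")
    if row["si"] > 0 or row["so"] > 0:
        findings.append("memory: swapping in progress")
    if row["wa"] >= wa_threshold:
        findings.append("io: high iowait")
    return findings


# Made-up vmstat output resembling an IO-bound machine.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  3      0 123456   7890 456789    0    0  2048  4096  500 1000  3  2 15 80  0
"""

rows = parse_vmstat_output(SAMPLE)
print(diagnose(rows[0], ncpu=4))
```

A single row proves little; in practice the same check would be run over many samples, since the article's criteria are all about values that stay high for a long time.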

2.2 iostat

Let's use a more specialized disk IO diagnostic tool to look at the relevant statistics.
[Figure: iostat output]

Its related fields are described as follows:
•rrqm/s: Number of read requests merged per second. That is delta(rmerge)/s
•wrqm/s: Number of write requests merged per second. That is delta(wmerge)/s
•r/s: Number of reads from the I/O device completed per second. That is delta(rio)/s
•w/s: Number of writes to the I/O device completed per second. That is delta(wio)/s
•rsec/s: Number of sectors read per second. That is delta(rsect)/s
•wsec/s: Number of sectors written per second. That is delta(wsect)/s
•rkB/s: Kilobytes read per second. This is half of rsec/s, because each sector is 512 bytes. (needs calculation)
•wkB/s: Kilobytes written per second. This is half of wsec/s. (needs calculation)
•avgrq-sz: Average size (in sectors) of each device I/O operation. That is delta(rsect+wsect)/delta(rio+wio)
•avgqu-sz: Average I/O queue length. That is delta(aveq)/s/1000 (because the unit of aveq is milliseconds).
•await: Average wait time (in milliseconds) of each device I/O operation. That is delta(ruse+wuse)/delta(rio+wio)
•svctm: Average service time (in milliseconds) of each device I/O operation. That is delta(use)/delta(rio+wio)
•%util: The fraction of each second spent on I/O operations, in other words how much of each second the I/O queue is non-empty. That is delta(use)/s/1000 (because the unit of use is milliseconds)
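The delta formulas above can be worked through in a short Python sketch that takes two /proc/diskstats lines for the same device and derives the iostat-style columns. The counter names follow the ones used in the list; the sample lines are invented, and %util is multiplied by 100 here to express it as a percentage.

```python
FIELDS = ("rio", "rmerge", "rsect", "ruse",
          "wio", "wmerge", "wsect", "wuse",
          "running", "use", "aveq")


def parse_diskstats_line(line):
    """Return (device name, counters) from one /proc/diskstats line."""
    parts = line.split()
    # parts[0] and parts[1] are the major and minor device numbers.
    return parts[2], dict(zip(FIELDS, map(int, parts[3:14])))


def iostat_metrics(prev, curr, interval_s):
    """Derive iostat-style values from two counter snapshots."""
    # 'running' is a gauge rather than a counter; its delta is not used.
    d = {k: curr[k] - prev[k] for k in FIELDS}
    n_io = d["rio"] + d["wio"]
    return {
        "rrqm/s": d["rmerge"] / interval_s,
        "wrqm/s": d["wmerge"] / interval_s,
        "r/s": d["rio"] / interval_s,
        "w/s": d["wio"] / interval_s,
        "rkB/s": d["rsect"] / 2 / interval_s,  # 512-byte sectors
        "wkB/s": d["wsect"] / 2 / interval_s,
        "avgrq-sz": (d["rsect"] + d["wsect"]) / n_io if n_io else 0.0,
        "avgqu-sz": d["aveq"] / interval_s / 1000,
        "await": (d["ruse"] + d["wuse"]) / n_io if n_io else 0.0,
        "svctm": d["use"] / n_io if n_io else 0.0,
        "%util": d["use"] / interval_s / 1000 * 100,
    }


# Made-up samples for one device, taken one second apart.
LINE_T0 = "   8  16 sdb 1000 10 80000 5000 2000 20 160000 30000 0 9000 35000"
LINE_T1 = "   8  16 sdb 1100 15 88000 6000 2300 30 184000 36000 5 10000 42000"

name, prev = parse_diskstats_line(LINE_T0)
_, curr = parse_diskstats_line(LINE_T1)
metrics = iostat_metrics(prev, curr, interval_s=1.0)
print(name, metrics)
```

In this invented sample the device was busy for 1000 ms out of a 1000 ms interval, so %util comes out at 100, which is the kind of saturation described next.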

You can see that of the two hard disks, sdb has a utilization of 100%, which is a serious IO bottleneck. The next step is to find out which process is reading and writing data on this disk.

2.3 iotop

[Figure: iotop output]

With the iotop results, we quickly traced the problem to the flume process, which was causing a large amount of IO wait.

But as I said at the beginning, the machines in the cluster have the same configuration, and the deployed programs are identical copies distributed with rsync. Could the hard disk be faulty?

This had to be verified by a colleague in operations. The final conclusion was:

sdb is a two-disk RAID 1 array behind an "LSI Logic / Symbios Logic SAS1068E" RAID card, which has no cache. At nearly 400 IOPS, the load had reached the hardware limit. The other machines use an "LSI Logic / Symbios Logic MegaRAID SAS 1078" RAID card, which has a 256 MB cache, so they had not hit the hardware bottleneck. The solution was to move to a machine that can deliver more IOPS; we eventually switched to a machine with an integrated PERC 6/i RAID controller card. Note that RAID information is stored both on the RAID card and in the disk firmware, and the format of the RAID information on the disk must match that on the RAID card. Otherwise the RAID card cannot recognize the disk, and the disk has to be formatted.

IOPS ultimately depends on the disk itself, but there are many ways to improve it. Adding a hardware cache and using RAID arrays are common methods. For workloads with high IOPS demands, such as databases, it is now popular to replace traditional mechanical hard disks with SSDs.

But as mentioned before, the point of approaching the problem from both the software and the hardware side is to find the least expensive solution on each side.

Now that we know the hardware cause, we can try moving the read and write operations to another disk and then see the effect:

[Figure: result after moving the read and write operations to another disk]

3. Final words: Find another way

In fact, besides using the specialized tools above to locate this problem, we can also find the relevant processes directly from their process state.

A process can be in one of the following states:
•D uninterruptible sleep (usually IO)
•R running or runnable (on run queue)
•S interruptible sleep (waiting for an event to complete)
•T stopped, either by a job control signal or because it is being traced.
•W paging (not valid since the 2.6.xx kernel)
•X dead (should never be seen)
•Z defunct ("zombie") process, terminated but not reaped by its parent.

State D is generally the so-called "uninterruptible sleep" caused by waiting for IO. We can start from this point and then locate the problem step by step:
[Figure: locating processes in state D]
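As a rough Python sketch of the same idea, the state letter of every process can be read from /proc/<pid>/stat and filtered for D. The function names are made up, and this is not necessarily the command used in the original screenshot.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split on the first '(' and the last ')'.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    state = text[rparen + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc_root="/proc"):
    """List (pid, comm) for processes currently in uninterruptible sleep."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited or its stat file was unreadable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)


# Made-up stat line for a process blocked on IO.
print(parse_stat("1234 (flume agent) D 1 1234 1234 0 -1 4194304"))
```

Because D is a momentary state, one call to d_state_processes() is only a snapshot. Calling it repeatedly and counting which process names keep showing up should point at the process that is stuck on IO.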

Statement

This article is reproduced from Linux就该这么学.
