Why %util number from iostat is meaningless for MySQL capaci

首頁

資料庫

mysql教程

Why %util number from iostat is meaningless for MySQL capaci_MySQL

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jun 01, 2016 pm 01:07 PM

Earlier this month I wrote aboutvmstat iowait cpu numbersand some of the comments I got were advertising the use of util% as reported by the iostat tool instead. I find this number even more useless for MySQL performance tuning and capacity planning.

Now let me start by saying this is a really tricky and deceptive number. Many DBAs who report instances of their systems having a very busy IO subsystem said the util% in vmstat was above 99% and therefore they believe this number is a good indicator of an overloaded IO subsystem.

Indeed – when your IO subsystem is busy, up to its full capacity, the utilization should be very close to 100%. However, it is perfectly possible for the IO subsystem and MySQL with it to have plenty more capacity than when utilization is showing 100% – as I will show in an example.

Before that though lets see what the iostat manual page has to say on this topic – fromthis main pagewe can read:

%util

Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100% for devices serving requests serially. But for devices serving requests in parallel, such as RAID arrays and modern SSDs, this number does not reflect their performance limits.

Which says right here that the number is useless for pretty much any production database server that is likely to be running RAID, Flash/SSD, SAN or cloud storage (such as EBS) capable of handling multiple requests in parallel.

Let’s look at the following illustration. I will run sysbench on a system with a rather slow storage data size larger than buffer pool and uniform distribution to put pressure on the IO subsystem. I will use a read-only benchmark here as it keeps things more simple…

sysbench –num-threads=1 –max-requests=0 –max-time=6000000 –report-interval=10 –test=oltp –oltp-read-only=on –db-driver=mysql –oltp-table-size=100000000 –oltp-dist-type=uniform –init-rng=on –mysql-user=root –mysql-password= run

I’m seeing some 9 transactions per second, while disk utilization from iostat is at nearly 96%:

[ 80s] threads: 1, tps: 9.30, reads/s: 130.20, writes/s: 0.00 response time: 171.82ms (95%)

[ 90s] threads: 1, tps: 9.20, reads/s: 128.80, writes/s: 0.00 response time: 157.72ms (95%)

[ 100s] threads: 1, tps: 9.00, reads/s: 126.00, writes/s: 0.00 response time: 215.38ms (95%)

[ 110s] threads: 1, tps: 9.30, reads/s: 130.20, writes/s: 0.00 response time: 141.39ms (95%)

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %utildm-0 0.00 0.00 127.90 0.70 4070.40 28.00 31.87 1.01 7.83 7.52 96.68

This makes a lot of sense – with read single thread read workload the drive should be only used getting data needed by the query, which will not be 100% as there is some extra time needed to process the query on the MySQL side as well as passing the result set back to sysbench.

So 96% utilization; 9 transactions per second, this is a close to full-system capacity with less than 5% of device time to spare, right?

Let’s run a benchmark with more concurrency – 4 threads at the time; we’ll see…

[ 110s] threads: 4, tps: 21.10, reads/s: 295.40, writes/s: 0.00 response time: 312.09ms (95%)

[ 120s] threads: 4, tps: 22.00, reads/s: 308.00, writes/s: 0.00 response time: 297.05ms (95%)

[ 130s] threads: 4, tps: 22.40, reads/s: 313.60, writes/s: 0.00 response time: 335.34ms (95%)

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %utildm-0 0.00 0.00 295.40 0.90 9372.80 35.20 31.75 4.06 13.69 3.38 100.01

So we’re seeing 100% utilization now, but what is interesting – we’re able to reclaim much more than less than 5% which was left if we look at utilization – throughput of the system increased about 2.5x

Finally let’s do the test with 64 threads – this is more concurrency than exists at storage level which is conventional hard drives in RAID on this system…

[ 70s] threads: 64, tps: 42.90, reads/s: 600.60, writes/s: 0.00 response time: 2516.43ms (95%)

[ 80s] threads: 64, tps: 42.40, reads/s: 593.60, writes/s: 0.00 response time: 2613.15ms (95%)

[ 90s] threads: 64, tps: 44.80, reads/s: 627.20, writes/s: 0.00 response time: 2404.49ms (95%)

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %utildm-0 0.00 0.00 601.20 0.80 19065.60 33.60 31.73 65.98 108.72 1.66 100.00

In this case we’re getting 4.5x of throughput compared to single thread and 100% utilization. We’re also getting almost double throughput of the run with 4 thread where 100% utilization was reported. This makes sense – there are 4 drives which can work in parallel and with many outstanding requests they are able to optimize their seeks better hence giving a bit more than 4x.

So what have we so ? The system which was 96% capacity but which could have driven still to provide 4.5x throughput – so it had plenty of extra IO capacity. More powerful storage might have significantly more ability to run requests in parallel so it is quite possible to see 10x or more room after utilization% starts to be reported close to 100%

So if utilization% is not very helpful what can we use to understand our database IO capacity better ? First lets look at the performance reported from those sysbench runs. If we look at 95% response time you can see 1 thread and 4 threads had relatively close 95% time growing just from 150ms to 250-300ms. This is the number I really like to look at- if system is able to respond to the queries with response time not significantly higher than it has with concurrency of 1 it is not overloaded. I like using 3x as multiplier – ie when 95% spikes to be more than 3x of the single concurrency the system might be getting to the overload.

With 64 threads the 95% response time is 15-20x of the one we see with single thread so it is surely overloaded.

Do we have anything reported by iostat which we can use in a similar way? It turns out we do! Check out the “await” column which tells us how much the requester had to wait for the IO request to be serviced. With single concurrency it is 7.8ms which is what this drives can do for random IO and is as good as it gets. With 4 threads it is 13.7ms – less than double of best possible, so also good enough… with concurrency of 64 it is however 108ms which is over 10x of what this device could produce with no waiting and which is surely sign of overload.

A couple words of caution. First, do not look at svctm which is not designed with parallel processing in mind. You can see in our case it actually gets better with high concurrency while really database had to wait a lot longer for requests submitted. Second, iostat mixes together reads and writes in single statistics which specifically for databases and especially on RAID with BBU might be like mixing apples and oranges together – writes should go to writeback cache and be acknowledged essentially instantly while reads only complete when actual data can be delivered. The toolpt-diskstatsfromPercona Tookitbreaks them apart and so can be much more for storage tuning for database workload.

Final note – I used a read-only workload on purpose – when it comes to writes things can be even more complicated – MySQL buffer pool can be more efficient with more intensive writes plus group commit might be able to commit a lot of transactions with single disk write. Still, the same base logic will apply.

Summary:The take away is simple – util% only shows if a device has at least one operation to deal with or is completely busy, which does not reflect actual utilization for a majority of modern IO subsystems. So you may have a lot of storage IO capacity left even when utilization is close to 100%.

陳述

本文內容由網友自願投稿，版權歸原作者所有。本站不承擔相應的法律責任。如發現涉嫌抄襲或侵權的內容，請聯絡admin@php.cn

您什麼時候應該使用複合索引與多個單列索引？Apr 11, 2025 am 12:06 AM

在數據庫優化中，應根據查詢需求選擇索引策略：1.當查詢涉及多個列且條件順序固定時，使用複合索引；2.當查詢涉及多個列但條件順序不固定時，使用多個單列索引。複合索引適用於優化多列查詢，單列索引則適合單列查詢。

如何識別和優化MySQL中的慢速查詢？（慢查詢日誌，performance_schema）Apr 10, 2025 am 09:36 AM

要優化MySQL慢查詢，需使用slowquerylog和performance_schema：1.啟用slowquerylog並設置閾值，記錄慢查詢；2.利用performance_schema分析查詢執行細節，找出性能瓶頸並優化。

MySQL和SQL：開發人員的基本技能Apr 10, 2025 am 09:30 AM

MySQL和SQL是開發者必備技能。 1.MySQL是開源的關係型數據庫管理系統，SQL是用於管理和操作數據庫的標準語言。 2.MySQL通過高效的數據存儲和檢索功能支持多種存儲引擎，SQL通過簡單語句完成複雜數據操作。 3.使用示例包括基本查詢和高級查詢，如按條件過濾和排序。 4.常見錯誤包括語法錯誤和性能問題，可通過檢查SQL語句和使用EXPLAIN命令優化。 5.性能優化技巧包括使用索引、避免全表掃描、優化JOIN操作和提升代碼可讀性。

描述MySQL異步主奴隸複製過程。Apr 10, 2025 am 09:30 AM

MySQL異步主從復制通過binlog實現數據同步，提升讀性能和高可用性。 1)主服務器記錄變更到binlog；2)從服務器通過I/O線程讀取binlog；3)從服務器的SQL線程應用binlog同步數據。

mysql：簡單的概念，用於輕鬆學習Apr 10, 2025 am 09:29 AM

MySQL是一個開源的關係型數據庫管理系統。 1）創建數據庫和表：使用CREATEDATABASE和CREATETABLE命令。 2）基本操作：INSERT、UPDATE、DELETE和SELECT。 3）高級操作：JOIN、子查詢和事務處理。 4）調試技巧：檢查語法、數據類型和權限。 5）優化建議：使用索引、避免SELECT*和使用事務。

MySQL：數據庫的用戶友好介紹Apr 10, 2025 am 09:27 AM

MySQL的安裝和基本操作包括：1.下載並安裝MySQL，設置根用戶密碼；2.使用SQL命令創建數據庫和表，如CREATEDATABASE和CREATETABLE；3.執行CRUD操作，使用INSERT,SELECT,UPDATE,DELETE命令；4.創建索引和存儲過程以優化性能和實現複雜邏輯。通過這些步驟，你可以從零開始構建和管理MySQL數據庫。

InnoDB緩衝池如何工作，為什麼對性能至關重要？Apr 09, 2025 am 12:12 AM

InnoDBBufferPool通過將數據和索引頁加載到內存中來提升MySQL數據庫的性能。 1)數據頁加載到BufferPool中，減少磁盤I/O。 2)臟頁被標記並定期刷新到磁盤。 3)LRU算法管理數據頁淘汰。 4)預讀機制提前加載可能需要的數據頁。

MySQL：初學者的數據管理易用性Apr 09, 2025 am 12:07 AM

MySQL適合初學者使用，因為它安裝簡單、功能強大且易於管理數據。 1.安裝和配置簡單，適用於多種操作系統。 2.支持基本操作如創建數據庫和表、插入、查詢、更新和刪除數據。 3.提供高級功能如JOIN操作和子查詢。 4.可以通過索引、查詢優化和分錶分區來提升性能。 5.支持備份、恢復和安全措施，確保數據的安全和一致性。

See all articles