首頁 >資料庫 >mysql教程 >完全掌握MySQL主從延遲的解決方法

完全掌握MySQL主從延遲的解決方法

WBOY
WBOY轉載
2022-06-29 15:03:562971瀏覽

這篇文章為大家帶來了關於mysql的相關知識,其中主要整理了主從延遲的解決方法相關問題,包括了什麼是主從延遲、主從延遲的來源、主從延遲的解決方案等等內容,下面一起來看一下,希望對大家有幫助。

完全掌握MySQL主從延遲的解決方法

推薦學習:mysql影片教學

#之前專案中基於MySQL 主從複製以及AOP 的方式實作了讀寫分離,也寫了部落格記錄了這個實現過程。既然配置了MySQL 主從複製,那麼自然會存在主從延遲,如何盡可能減小主從延遲對應用系統的影響是很有必要的思考點,我個人認為主從延遲的解決方案正是實現讀寫入分離、MySQL 主從複製的精髓。

關於這個主題其實我之前就想著寫篇部落格分享一下,但我一直沒有提上日程。最近有讀者在《SpringBoot實現MySQL讀寫分離》 中留言問到了這個問題,也激勵我寫下了本文。關於這個問題,我閱讀了很多資料和博客,並經過自己的實踐實操,站在大佬的肩膀上總結下了這篇博客。

什麼是主從延遲

在討論如何解決主從延遲之前,我們先了解下什麼是主從延遲。

為了完成主從複製,從庫需要透過I/O 執行緒取得主庫中dump 執行緒讀取的binlog 內容並寫入到自己的中繼日誌relay log 中,從庫的SQL 執行緒再讀取中繼日誌,重做中繼日誌中的日誌,相當於再執行一遍SQL,更新自己的資料庫,以達到資料的一致性

與資料同步有關的時間點主要包括以下三個:

  1. 主函式庫執行完一個事務,寫入binlog,將這個時刻記為T1;
  2. 之後傳給從庫,將從庫接收完這個binlog 的時刻記為T2;
  3. 從庫執行完成這個事務,將這個時刻記為T3。

所謂主從延遲,就是同一個事務,從函式庫執行完成的時間與主函式庫執行完成的時間之差,也就是 T3 - T1

可以在備庫上執行 show slave status 指令,它的回傳結果裡面會顯示 seconds_behind_master,用來表示目前備庫延遲了多少秒。
seconds_behind_master 的計算方法是這樣的:

  1. 每個事務的binlog 裡面都有一個時間字段,用於記錄主庫上寫入的時間;
  2. 備庫取出目前正在執行的交易的時間欄位的值,計算它與目前系統時間的差值,得到seconds_behind_master

在網路正常的時候,日誌從主函式庫傳給從函式庫所需的時間是很短的,即 T2 - T1 的值是非常小的。也就是說,在網路正常情況下,主從延遲的主要來源是從函式庫接收完 binlog 和執行完這個交易之間的時間差。

由於主從延遲的存在,我們可能會發現,資料剛寫入主庫,結果卻查不到,因為可能還未同步到從庫。主從延遲越嚴重,問題也愈加明顯。

主從延遲的來源

主函式庫和從函式庫在執行同一筆交易的時候出現時間差的問題,主要原因包括但不限於以下幾種情況:

  • #有些部署條件下,從庫所在機器的效能要比主庫效能差
  • 從庫的壓力較大,即從庫承受了大量的請求。
  • 執行大事務。因為主庫上必須等事務執行完成才會寫入 binlog,再傳給備庫。如果一個主庫上語句執行 10 分鐘,那麼這個交易可能會導致從庫延遲 10 分鐘。
  • 從庫的平行複製能力

主從延遲的解決方案

解決主從延遲主要有以下方案:

  1. 配合semi-sync 半同步複製;
  2. 一主多從,分攤從庫壓力;
  3. 強制走主庫方案(強一致性);
  4. sleep 方案:主庫更新後,讀從庫之前先sleep 一下;
  5. 判斷主備無延遲方案(例如判斷seconds_behind_master 參數是否已經等於0、對比位點);
  6. 並行複製 — 解決從庫複製延遲的問題;

這裡主要介紹我在專案中使用的幾種方案,分別是半同步複製、即時性操作強制走主庫、並行複製

semi-sync 半同步複製

MySQL 有三種同步模式,分別是:

「Asynchronous Replication」: MySQL’s default replication is asynchronous. The main database will immediately return the results to the client after executing the transaction submitted by the client. It does not care whether the slave database has been Receive and process. This will cause a problem. Once the main database goes down, the transactions that have been submitted on the main database may not be transmitted to the slave database due to network reasons. If a failover is performed at this time and the slave is forcibly promoted to the master, it may cause The data on the new master is incomplete.

"Fully synchronous replication": It means that when the main database has completed a transaction and all slave databases have executed the transaction, the main database will submit the transaction and return the results to the client. Because you need to wait for all slave libraries to complete the transaction before returning, the performance of fully synchronous replication will inevitably be seriously affected.

"Semi-synchronous replication": It is a type between fully synchronous replication and fully asynchronous replication. The master library only needs to wait for at least one slave library to receive and write to the Relay Log Just file, the main library does not need to wait for all slave libraries to return ACK to the main library. Only after the main library receives this ACK can it return a "transaction completed" confirmation to the client.

MySQL's default replication is asynchronous, so there will be a certain delay in data between the master database and the slave database. More importantly, asynchronous replication may cause data loss. However, fully synchronous replication will lengthen the time to complete a transaction and reduce performance. So I turned my attention to semi-synchronous replication. Starting from MySQL 5.5, MySQL supports semi-sync semi-synchronous replication in the form of a plug-in.

Compared with asynchronous replication, semi-synchronous replication improves data security and reduces master-slave delay. Of course, it still has a certain degree of delay. This delay is at least one TCP/IP round-trip time. Therefore, semi-synchronous replication is best used in a low-latency network.

It should be noted that:

  • Both the master library and the slave library must enable semi-synchronous replication before semi-synchronous replication can be performed. Otherwise, the master library will be restored to the default Asynchronous replication.
  • If during the waiting process, the waiting time has exceeded the configured timeout and no ACK is received from any slave library, then the main library will automatically convert to asynchronous replication at this time. When at least one semi-synchronous slave node catches up, the master database will automatically convert to semi-synchronous replication.

Potential problems with semi-synchronous replication

In traditional semi-synchronous replication (introduced in MySQL 5.5), the main database writes data to binlog and executes commit after committing the transaction. , will always wait for an ACK from the slave library, that is, after the slave library writes the Relay Log, writes the data to disk, and then returns the ACK to the main library. Only after the main library receives this ACK can it return a "transaction completed" message to the client. confirm.

完全掌握MySQL主從延遲的解決方法

This will cause a problem, that is, the main library has actually committed the transaction to the storage engine layer. The application can already see the data changes and is just waiting for the return. That’s all. If the master database is down at this time , the slave database may not have written the Relay Log, and data inconsistency between the master and slave databases will occur.

In order to solve the above problems, MySQL 5.7 introduces enhanced semi-synchronous replication. For the above picture, "Waiting Slave dump" is adjusted to before "Storage Commit", that is, after the master library writes data to the binlog, it starts to wait for the response ACK from the slave library until at least one slave library writes to the Relay Log, and then The data is written to disk, and then ACK is returned to the main library, notifying the main library that it can perform the commit operation, and thenthe main library submits the transaction to the transaction engine layer, and the application can see the data changes at this time.

完全掌握MySQL主從延遲的解決方法

Of course, the previous semi-synchronization scheme is also supported. MySQL 5.7.2 introduces a new parameter rpl_semi_sync_master_wait_point for control. This parameter has two values:

  1. AFTER_SYNC: This is a new semi-synchronization scheme, Waiting Slave dump before Storage Commit.
  2. AFTER_COMMIT: This is the old semi-synchronous scheme.

In MySQL 5.5 - 5.6 using after_commit mode, after the client transaction is submitted at the storage engine layer, while the main database is waiting for confirmation from the slave database, the main database goes down. At this time, although the result is not returned to the current client, the transaction has been submitted, and other clients will read the submitted transaction. If the slave database does not receive the transaction or does not write it to the relay log, and the master database is down, and then switches to the standby database, the previously read transactions will disappear, and phantom reads will occur, which means the data will be lost.

The default value of MySQL 5.7 is after_sync. The master database writes each transaction to binlog, passes it to the slave database and flushes it to disk (relay log). The main library waits until the slave library returns ack, then commits the transaction and returns the commit OK result to the client. Even if the main library crashes, all transactions that have been committed on the main library can be guaranteed to be synchronized to the relay log of the slave library, solving the problems of phantom reading and data loss caused by the after_commit mode, Data consistency during failover Will be promoted. Because if the slave database does not write successfully, the master database will not commit the transaction. In addition, waiting for ACK from the slave library before committing can also accumulate transactions, which is beneficial to group commit group submission and improves performance.

But this will also have a problem. Assuming that the main library hangs before the storage engine is submitted, then it is obvious that the transaction is unsuccessful. However, since the corresponding Binlog has already performed a Sync operation, from now on The library has received these Binlogs and executed them successfully, which is equivalent to having extra data on the slave library (the slave library has this data but the main library does not), which can be considered a problem, but the extra data is generally not a serious problem. What it can guarantee is that no data will be lost. Having more data is better than losing data.

One master and multiple slaves

If the slave database undertakes a large number of query requests, the query operation on the slave database will consume a lot of CPU resources, thus affecting the synchronization speed and causing the master to from delay. Then we can connect several more slave libraries and let these slave libraries share the reading pressure.

In short, it is to add machines. The method is simple and crude, but it will also bring a certain cost.

Forcing the main database solution

If some operations have strict requirements on real-time data, they need to reflect the latest real-time data, such as finance involving money. system, online real-time system, or business that reads immediately after writing, then we have to give up the separation of reading and writing, and let such read requests also go through the main library, so there is no delay problem.

Of course, this also loses the performance improvement brought to us by the separation of reading and writing, so appropriate trade-offs are required.

Parallel replication

Generally, MySQL master-slave replication involves three threads, all of which are single threads: Binlog Dump thread, IO thread, and SQL thread. Replication delays generally occur in two places:

  • SQL threads are too busy (the main reason);
  • Network jitter causes IO thread replication delays (secondary reasons).

The execution of the log on the standby database is the logic of the SQL thread on the standby database executing the relay log (relay log) to update the data.

Before MySQL version 5.6, MySQL only supported single-threaded replication. As a result, serious master-slave delay problems would occur when the main database concurrency and TPS were high. Starting from MySQL 5.6, there is the concept of multiple SQL threads, which can restore data concurrently, that is, parallel replication technology. This can very well solve the MySQL master-slave delay problem.

From single-threaded replication to the latest version of multi-threaded replication, the evolution has gone through several versions. In fact, in the final analysis, all multi-threaded replication mechanisms are to split the sql_thread with only one thread into multiple threads, which means they all conform to the following multi-threading model:

coordinator is the original sql_thread, but now it no longer directly updates data, it is only responsible for reading the transit log and distributing transactions. What actually updates the log becomes the worker thread. The number of worker threads is determined by the parameter slave_parallel_workers.

Since worker threads run concurrently, in order to ensure transaction isolation and avoid update coverage problems, the coordinator needs to meet the following two basic requirements when distributing:

  1. Two transactions that update the same row must be distributed to the same worker (to avoid update coverage) .
  2. The same transaction cannot be split and must be placed in the same worker (to ensure transaction isolation).

Various versions of multi-threaded replication follow these two basic principles.

The following are table-by-table distribution strategy and row-by-row distribution strategy, which can help understand the iteration of the MySQL official version of parallel replication strategy:

  • Distribution by table strategy: If two transactions update different tables, they can be parallel. Because the data is stored in the table, distribution by table ensures that two workers will not update the same row.
    • The table-based distribution scheme works well in scenarios where multiple tables have even loads, but the disadvantage is: if a hot table is encountered, for example, when all update transactions involve a certain table, All transactions will be assigned to the same worker, which becomes single-threaded replication.
  • Row-by-row distribution strategy: If two transactions do not update the same rows, they can run in parallel on the standby database. Obviously, this mode requires that the binlog format must be row.
    • The row-by-row parallel replication solution solves the problem of hotspot tables and has a higher degree of parallelism. However, the disadvantage is: compared with the table-by-table parallel distribution strategy, the row-by-row parallel strategy requires consumption when deciding thread distribution. More computing resources.

Parallel replication strategy of MySQL 5.6 version

MySQL 5.6 version supports parallel replication, but the supported granularity is Per-database parallelism (based on Schema).

The core idea is: When tables under different schemas are submitted concurrently, the data will not affect each other, that is, the slave library can allocate a SQL thread-like function to different schemas in the relay log. thread to replay the transactions submitted by the main library in the relay log to keep the data consistent with the main library.

If there are multiple DBs on the main database, using this strategy can greatly improve the replication speed from the slave database. But usually there are multiple tables in a single database, so database-based concurrency has no effect, and parallel replay cannot be done at all, so this strategy is not used much.

Parallel replication strategy of MySQL 5.7

MySQL 5.7 introduces parallel replication based on group submission, the parameter slave_parallel_workers sets the number of parallel threads, by the parameter slave-parallel-type to control the parallel replication strategy:

  • is configured as DATABASE, which means using the database-by-database parallel strategy of MySQL 5.6 version;
  • is configured as LOGICAL_CLOCK , indicating the use of a parallel replication strategy based on group submission;

Using the group commit (group commit) mechanism of binlog, it can be concluded that transactions submitted by a group can be executed in parallel for the following reasons: Transactions that can be submitted in the same group will definitely not modify the same row (due to MySQL's locking mechanism), because the transaction has passed the lock conflict test.

The specific process of parallel replication based on group submission is as follows:

  1. Transactions submitted together in a group have the same commit_id, and the next group is commit_id 1; commit_id is written directly into the binlog;
  2. When transmitted to the standby database application, transactions with the same commit_id are distributed to multiple workers for execution;
  3. After all executions of this group are completed, the coordinator Go and get the next batch for execution.

All transactions in prepare and commit states can be executed in parallel on the standby database.

Two relevant parameters submitted by the binlog group:

  • binlog_group_commit_sync_delay parameter, which indicates the number of microseconds to delay before calling fsync to flush the disk;
  • binlog_group_commit_sync_no_delay_count parameter, which indicates The accumulated number of times before fsync is called.

These two parameters are used to deliberately lengthen the time from binlog write to fsync, thereby reducing the number of binlog writes to disk. In the parallel replication strategy of MySQL 5.7, they can be used to create more "transactions in the prepare phase simultaneously". You can consider adjusting the values ​​of these two parameters to achieve the purpose of improving the concurrency of the standby database replication.

Recommended learning: mysql video tutorial

以上是完全掌握MySQL主從延遲的解決方法的詳細內容。更多資訊請關注PHP中文網其他相關文章!

陳述:
本文轉載於:csdn.net。如有侵權,請聯絡admin@php.cn刪除