Home  >  Article  >  Database  >  Completely master the solution to MySQL master-slave delay

Completely master the solution to MySQL master-slave delay

WBOY
WBOYforward
2022-06-29 15:03:562925browse

This article brings you relevant knowledge about mysql, which mainly organizes the issues related to the solution of master-slave delay, including what is master-slave delay, the source of master-slave delay, Let’s take a look at the master-slave delay solution and so on. I hope it will be helpful to everyone.

Completely master the solution to MySQL master-slave delay

Recommended learning: mysql video tutorial

In previous projects, read-write separation was achieved based on MySQL master-slave replication and AOP. , also wrote a blog to record this implementation process. Since MySQL master-slave replication is configured, there will naturally be master-slave delay. How to minimize the impact of master-slave delay on the application system is a necessary thinking point. I personally think that the solution to master-slave delay is to implement read Write separation, the essence of MySQL master-slave replication.

Regarding this topic, I actually thought about writing a blog to share it before, but I never put it on the agenda. Recently, a reader left a message asking this question in "SpringBoot Implements MySQL Read-Write Separation", which also inspired me to write this article. Regarding this issue, I read a lot of information and blogs, and through my own practice, I summarized this blog by standing on the shoulders of the boss.

What is master-slave delay

Before discussing how to solve master-slave delay, let us first understand what master-slave delay is.

In order to complete the master-slave replication, the slave library needs to obtain the binlog content read by the dump thread in the master library through the I/O thread and write it into its own relay log, and then the SQL thread of the slave library needs to Reading the relay log and redoing the logs in the relay log is equivalent to executing SQL again and updating your own database to achieve data consistency.

The time points related to data synchronization mainly include the following three:

  1. The main library completes a transaction and writes it to the binlog, recording this moment as T1;
  2. After that, it is passed to the slave library, and the time when the binlog is received from the slave library is recorded as T2;
  3. The slave library executes the transaction and records this time as T3.

The so-called master-slave delay is the difference between the time when the execution of the slave library is completed and the time when the execution of the main library is completed for the same transaction, that is, T3 - T1.

You can execute the show slave status command on the standby database, and its return result will display seconds_behind_master, which is used to indicate how many seconds the current standby database is delayed.
seconds_behind_master The calculation method is as follows:

  1. There is a time field in the binlog of each transaction, which is used to record the time written on the master library;
  2. The standby database takes out the value of the time field of the currently executing transaction, calculates the difference between it and the current system time, and obtains seconds_behind_master.

When the network is normal, the time required for the log to be transmitted from the master database to the slave database is very short, that is, the value of T2 - T1 is very small. In other words, under normal network conditions, the main source of master-slave delay is the time difference between the slave library receiving the binlog and executing the transaction.

Due to the existence of master-slave delay, we may find that the data has just been written to the master library, but the result cannot be found because it may not have been synchronized to the slave library. The more serious the master-slave delay is, the more obvious this problem becomes.

Source of master-slave delay

There is a time difference problem between the master library and the slave library when executing the same transaction. The main reasons include but are not limited to the following situations:

  • Under some deployment conditions, the performance of the machine where the slave library is located is worse than the performance of the main library.
  • The slave library is under greater pressure, that is, the slave library is subjected to a large number of requests.
  • Execute large transactions. Because the main database must wait until the transaction execution is completed before writing it to the binlog and then passing it to the standby database. If a statement on a master database takes 10 minutes to execute, then this transaction may cause a 10-minute delay in the slave database.
  • Parallel copy capability from the library.

Solutions for master-slave delay

There are mainly the following solutions to solve master-slave delay:

  1. With semi-sync semi-synchronous replication;
  2. One master and multiple slaves, share the pressure from the slave database;
  3. Force the master database solution (strong consistency);
  4. sleep solution: After the main library is updated, sleep before reading from the slave library;
  5. Determine the master and backup no-delay scheme (for example, determine whether the seconds_behind_master parameter is already equal to 0, compare the bit point);
  6. Parallel replication - Solve the problem of replication delay from the library;

Here we mainly introduce several solutions I used in the project, namely Semi-synchronous replication, real-time operations force the main database to be used, and parallel replication.

semi-sync Semi-synchronous replication

MySQL has three synchronization modes, namely:

「Asynchronous Replication」: MySQL’s default replication is asynchronous. The main database will immediately return the results to the client after executing the transaction submitted by the client. It does not care whether the slave database has been Receive and process. This will cause a problem. Once the main database goes down, the transactions that have been submitted on the main database may not be transmitted to the slave database due to network reasons. If a failover is performed at this time and the slave is forcibly promoted to the master, it may cause The data on the new master is incomplete.

"Fully synchronous replication": It means that when the main database has completed a transaction and all slave databases have executed the transaction, the main database will submit the transaction and return the results to the client. Because you need to wait for all slave libraries to complete the transaction before returning, the performance of fully synchronous replication will inevitably be seriously affected.

"Semi-synchronous replication": It is a type between fully synchronous replication and fully asynchronous replication. The master library only needs to wait for at least one slave library to receive and write to the Relay Log Just file, the main library does not need to wait for all slave libraries to return ACK to the main library. Only after the main library receives this ACK can it return a "transaction completed" confirmation to the client.

MySQL's default replication is asynchronous, so there will be a certain delay in data between the master database and the slave database. More importantly, asynchronous replication may cause data loss. However, fully synchronous replication will lengthen the time to complete a transaction and reduce performance. So I turned my attention to semi-synchronous replication. Starting from MySQL 5.5, MySQL supports semi-sync semi-synchronous replication in the form of a plug-in.

Compared with asynchronous replication, semi-synchronous replication improves data security and reduces master-slave delay. Of course, it still has a certain degree of delay. This delay is at least one TCP/IP round-trip time. Therefore, semi-synchronous replication is best used in a low-latency network.

It should be noted that:

  • Both the master library and the slave library must enable semi-synchronous replication before semi-synchronous replication can be performed. Otherwise, the master library will be restored to the default Asynchronous replication.
  • If during the waiting process, the waiting time has exceeded the configured timeout and no ACK is received from any slave library, then the main library will automatically convert to asynchronous replication at this time. When at least one semi-synchronous slave node catches up, the master database will automatically convert to semi-synchronous replication.

Potential problems with semi-synchronous replication

In traditional semi-synchronous replication (introduced in MySQL 5.5), the main database writes data to binlog and executes commit after committing the transaction. , will always wait for an ACK from the slave library, that is, after the slave library writes the Relay Log, writes the data to disk, and then returns the ACK to the main library. Only after the main library receives this ACK can it return a "transaction completed" message to the client. confirm.

Completely master the solution to MySQL master-slave delay

This will cause a problem, that is, the main library has actually committed the transaction to the storage engine layer. The application can already see the data changes and is just waiting for the return. That’s all. If the master database is down at this time , the slave database may not have written the Relay Log, and data inconsistency between the master and slave databases will occur.

In order to solve the above problems, MySQL 5.7 introduces enhanced semi-synchronous replication. For the above picture, "Waiting Slave dump" is adjusted to before "Storage Commit", that is, after the master library writes data to the binlog, it starts to wait for the response ACK from the slave library until at least one slave library writes to the Relay Log, and then The data is written to disk, and then ACK is returned to the main library, notifying the main library that it can perform the commit operation, and thenthe main library submits the transaction to the transaction engine layer, and the application can see the data changes at this time.

Completely master the solution to MySQL master-slave delay

Of course, the previous semi-synchronization scheme is also supported. MySQL 5.7.2 introduces a new parameter rpl_semi_sync_master_wait_point for control. This parameter has two values:

  1. AFTER_SYNC: This is a new semi-synchronization scheme, Waiting Slave dump before Storage Commit.
  2. AFTER_COMMIT: This is the old semi-synchronous scheme.

In MySQL 5.5 - 5.6 using after_commit mode, after the client transaction is submitted at the storage engine layer, while the main database is waiting for confirmation from the slave database, the main database goes down. At this time, although the result is not returned to the current client, the transaction has been submitted, and other clients will read the submitted transaction. If the slave database does not receive the transaction or does not write it to the relay log, and the master database is down, and then switches to the standby database, the previously read transactions will disappear, and phantom reads will occur, which means the data will be lost.

The default value of MySQL 5.7 is after_sync. The master database writes each transaction to binlog, passes it to the slave database and flushes it to disk (relay log). The main library waits until the slave library returns ack, then commits the transaction and returns the commit OK result to the client. Even if the main library crashes, all transactions that have been committed on the main library can be guaranteed to be synchronized to the relay log of the slave library, solving the problems of phantom reading and data loss caused by the after_commit mode, Data consistency during failover Will be promoted. Because if the slave database does not write successfully, the master database will not commit the transaction. In addition, waiting for ACK from the slave library before committing can also accumulate transactions, which is beneficial to group commit group submission and improves performance.

But this will also have a problem. Assuming that the main library hangs before the storage engine is submitted, then it is obvious that the transaction is unsuccessful. However, since the corresponding Binlog has already performed a Sync operation, from now on The library has received these Binlogs and executed them successfully, which is equivalent to having extra data on the slave library (the slave library has this data but the main library does not), which can be considered a problem, but the extra data is generally not a serious problem. What it can guarantee is that no data will be lost. Having more data is better than losing data.

One master and multiple slaves

If the slave database undertakes a large number of query requests, the query operation on the slave database will consume a lot of CPU resources, thus affecting the synchronization speed and causing the master to from delay. Then we can connect several more slave libraries and let these slave libraries share the reading pressure.

In short, it is to add machines. The method is simple and crude, but it will also bring a certain cost.

Forcing the main database solution

If some operations have strict requirements on real-time data, they need to reflect the latest real-time data, such as finance involving money. system, online real-time system, or business that reads immediately after writing, then we have to give up the separation of reading and writing, and let such read requests also go through the main library, so there is no delay problem.

Of course, this also loses the performance improvement brought to us by the separation of reading and writing, so appropriate trade-offs are required.

Parallel replication

Generally, MySQL master-slave replication involves three threads, all of which are single threads: Binlog Dump thread, IO thread, and SQL thread. Replication delays generally occur in two places:

  • SQL threads are too busy (the main reason);
  • Network jitter causes IO thread replication delays (secondary reasons).

The execution of the log on the standby database is the logic of the SQL thread on the standby database executing the relay log (relay log) to update the data.

Before MySQL version 5.6, MySQL only supported single-threaded replication. As a result, serious master-slave delay problems would occur when the main database concurrency and TPS were high. Starting from MySQL 5.6, there is the concept of multiple SQL threads, which can restore data concurrently, that is, parallel replication technology. This can very well solve the MySQL master-slave delay problem.

From single-threaded replication to the latest version of multi-threaded replication, the evolution has gone through several versions. In fact, in the final analysis, all multi-threaded replication mechanisms are to split the sql_thread with only one thread into multiple threads, which means they all conform to the following multi-threading model:

coordinator is the original sql_thread, but now it no longer directly updates data, it is only responsible for reading the transit log and distributing transactions. What actually updates the log becomes the worker thread. The number of worker threads is determined by the parameter slave_parallel_workers.

Since worker threads run concurrently, in order to ensure transaction isolation and avoid update coverage problems, the coordinator needs to meet the following two basic requirements when distributing:

  1. Two transactions that update the same row must be distributed to the same worker (to avoid update coverage) .
  2. The same transaction cannot be split and must be placed in the same worker (to ensure transaction isolation).

Various versions of multi-threaded replication follow these two basic principles.

The following are table-by-table distribution strategy and row-by-row distribution strategy, which can help understand the iteration of the MySQL official version of parallel replication strategy:

  • Distribution by table strategy: If two transactions update different tables, they can be parallel. Because the data is stored in the table, distribution by table ensures that two workers will not update the same row.
    • The table-based distribution scheme works well in scenarios where multiple tables have even loads, but the disadvantage is: if a hot table is encountered, for example, when all update transactions involve a certain table, All transactions will be assigned to the same worker, which becomes single-threaded replication.
  • Row-by-row distribution strategy: If two transactions do not update the same rows, they can run in parallel on the standby database. Obviously, this mode requires that the binlog format must be row.
    • The row-by-row parallel replication solution solves the problem of hotspot tables and has a higher degree of parallelism. However, the disadvantage is: compared with the table-by-table parallel distribution strategy, the row-by-row parallel strategy requires consumption when deciding thread distribution. More computing resources.

Parallel replication strategy of MySQL 5.6 version

MySQL 5.6 version supports parallel replication, but the supported granularity is Per-database parallelism (based on Schema).

The core idea is: When tables under different schemas are submitted concurrently, the data will not affect each other, that is, the slave library can allocate a SQL thread-like function to different schemas in the relay log. thread to replay the transactions submitted by the main library in the relay log to keep the data consistent with the main library.

If there are multiple DBs on the main database, using this strategy can greatly improve the replication speed from the slave database. But usually there are multiple tables in a single database, so database-based concurrency has no effect, and parallel replay cannot be done at all, so this strategy is not used much.

Parallel replication strategy of MySQL 5.7

MySQL 5.7 introduces parallel replication based on group submission, the parameter slave_parallel_workers sets the number of parallel threads, by the parameter slave-parallel-type to control the parallel replication strategy:

  • is configured as DATABASE, which means using the database-by-database parallel strategy of MySQL 5.6 version;
  • is configured as LOGICAL_CLOCK , indicating the use of a parallel replication strategy based on group submission;

Using the group commit (group commit) mechanism of binlog, it can be concluded that transactions submitted by a group can be executed in parallel for the following reasons: Transactions that can be submitted in the same group will definitely not modify the same row (due to MySQL's locking mechanism), because the transaction has passed the lock conflict test.

The specific process of parallel replication based on group submission is as follows:

  1. Transactions submitted together in a group have the same commit_id, and the next group is commit_id 1; commit_id is written directly into the binlog;
  2. When transmitted to the standby database application, transactions with the same commit_id are distributed to multiple workers for execution;
  3. After all executions of this group are completed, the coordinator Go and get the next batch for execution.

All transactions in prepare and commit states can be executed in parallel on the standby database.

Two relevant parameters submitted by the binlog group:

  • binlog_group_commit_sync_delay parameter, which indicates the number of microseconds to delay before calling fsync to flush the disk;
  • binlog_group_commit_sync_no_delay_count parameter, which indicates The accumulated number of times before fsync is called.

These two parameters are used to deliberately lengthen the time from binlog write to fsync, thereby reducing the number of binlog writes to disk. In the parallel replication strategy of MySQL 5.7, they can be used to create more "transactions in the prepare phase simultaneously". You can consider adjusting the values ​​of these two parameters to achieve the purpose of improving the concurrency of the standby database replication.

Recommended learning: mysql video tutorial

The above is the detailed content of Completely master the solution to MySQL master-slave delay. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:csdn.net. If there is any infringement, please contact admin@php.cn delete