Home >Database >Mysql Tutorial >Completely master the solution to MySQL master-slave delay
This article brings you relevant knowledge about mysql, which mainly organizes the issues related to the solution of master-slave delay, including what is master-slave delay, the source of master-slave delay, Let’s take a look at the master-slave delay solution and so on. I hope it will be helpful to everyone.
Recommended learning: mysql video tutorial
In previous projects, read-write separation was achieved based on MySQL master-slave replication and AOP. , also wrote a blog to record this implementation process. Since MySQL master-slave replication is configured, there will naturally be master-slave delay. How to minimize the impact of master-slave delay on the application system is a necessary thinking point. I personally think that the solution to master-slave delay is to implement read Write separation, the essence of MySQL master-slave replication.
Regarding this topic, I actually thought about writing a blog to share it before, but I never put it on the agenda. Recently, a reader left a message asking this question in "SpringBoot Implements MySQL Read-Write Separation", which also inspired me to write this article. Regarding this issue, I read a lot of information and blogs, and through my own practice, I summarized this blog by standing on the shoulders of the boss.
Before discussing how to solve master-slave delay, let us first understand what master-slave delay is.
In order to complete the master-slave replication, the slave library needs to obtain the binlog content read by the dump thread in the master library through the I/O thread and write it into its own relay log, and then the SQL thread of the slave library needs to Reading the relay log and redoing the logs in the relay log is equivalent to executing SQL again and updating your own database to achieve data consistency.
The time points related to data synchronization mainly include the following three:
The so-called master-slave delay is the difference between the time when the execution of the slave library is completed and the time when the execution of the main library is completed for the same transaction, that is, T3 - T1
.
You can execute the show slave status
command on the standby database, and its return result will display seconds_behind_master
, which is used to indicate how many seconds the current standby database is delayed. seconds_behind_master
The calculation method is as follows:
seconds_behind_master
. When the network is normal, the time required for the log to be transmitted from the master database to the slave database is very short, that is, the value of T2 - T1
is very small. In other words, under normal network conditions, the main source of master-slave delay is the time difference between the slave library receiving the binlog and executing the transaction.
Due to the existence of master-slave delay, we may find that the data has just been written to the master library, but the result cannot be found because it may not have been synchronized to the slave library. The more serious the master-slave delay is, the more obvious this problem becomes.
There is a time difference problem between the master library and the slave library when executing the same transaction. The main reasons include but are not limited to the following situations:
There are mainly the following solutions to solve master-slave delay:
seconds_behind_master
parameter is already equal to 0, compare the bit point);Here we mainly introduce several solutions I used in the project, namely Semi-synchronous replication, real-time operations force the main database to be used, and parallel replication.
MySQL has three synchronization modes, namely:
「Asynchronous Replication」: MySQL’s default replication is asynchronous. The main database will immediately return the results to the client after executing the transaction submitted by the client. It does not care whether the slave database has been Receive and process. This will cause a problem. Once the main database goes down, the transactions that have been submitted on the main database may not be transmitted to the slave database due to network reasons. If a failover is performed at this time and the slave is forcibly promoted to the master, it may cause The data on the new master is incomplete.
"Fully synchronous replication": It means that when the main database has completed a transaction and all slave databases have executed the transaction, the main database will submit the transaction and return the results to the client. Because you need to wait for all slave libraries to complete the transaction before returning, the performance of fully synchronous replication will inevitably be seriously affected.
"Semi-synchronous replication": It is a type between fully synchronous replication and fully asynchronous replication. The master library only needs to wait for at least one slave library to receive and write to the Relay Log Just file, the main library does not need to wait for all slave libraries to return ACK to the main library. Only after the main library receives this ACK can it return a "transaction completed" confirmation to the client.
MySQL's default replication is asynchronous, so there will be a certain delay in data between the master database and the slave database. More importantly, asynchronous replication may cause data loss. However, fully synchronous replication will lengthen the time to complete a transaction and reduce performance. So I turned my attention to semi-synchronous replication. Starting from MySQL 5.5, MySQL supports semi-sync semi-synchronous replication in the form of a plug-in.
Compared with asynchronous replication, semi-synchronous replication improves data security and reduces master-slave delay. Of course, it still has a certain degree of delay. This delay is at least one TCP/IP round-trip time. Therefore, semi-synchronous replication is best used in a low-latency network.
It should be noted that:
- Both the master library and the slave library must enable semi-synchronous replication before semi-synchronous replication can be performed. Otherwise, the master library will be restored to the default Asynchronous replication.
- If during the waiting process, the waiting time has exceeded the configured timeout and no ACK is received from any slave library, then the main library will automatically convert to asynchronous replication at this time. When at least one semi-synchronous slave node catches up, the master database will automatically convert to semi-synchronous replication.
In traditional semi-synchronous replication (introduced in MySQL 5.5), the main database writes data to binlog and executes commit after committing the transaction. , will always wait for an ACK from the slave library, that is, after the slave library writes the Relay Log, writes the data to disk, and then returns the ACK to the main library. Only after the main library receives this ACK can it return a "transaction completed" message to the client. confirm.
This will cause a problem, that is, the main library has actually committed the transaction to the storage engine layer. The application can already see the data changes and is just waiting for the return. That’s all. If the master database is down at this time , the slave database may not have written the Relay Log, and data inconsistency between the master and slave databases will occur.
In order to solve the above problems, MySQL 5.7 introduces enhanced semi-synchronous replication. For the above picture, "Waiting Slave dump" is adjusted to before "Storage Commit", that is, after the master library writes data to the binlog, it starts to wait for the response ACK from the slave library until at least one slave library writes to the Relay Log, and then The data is written to disk, and then ACK is returned to the main library, notifying the main library that it can perform the commit operation, and thenthe main library submits the transaction to the transaction engine layer, and the application can see the data changes at this time.
Of course, the previous semi-synchronization scheme is also supported. MySQL 5.7.2 introduces a new parameter
rpl_semi_sync_master_wait_point
for control. This parameter has two values:
- AFTER_SYNC: This is a new semi-synchronization scheme, Waiting Slave dump before Storage Commit.
- AFTER_COMMIT: This is the old semi-synchronous scheme.
In MySQL 5.5 - 5.6 using after_commit mode, after the client transaction is submitted at the storage engine layer, while the main database is waiting for confirmation from the slave database, the main database goes down. At this time, although the result is not returned to the current client, the transaction has been submitted, and other clients will read the submitted transaction. If the slave database does not receive the transaction or does not write it to the relay log, and the master database is down, and then switches to the standby database, the previously read transactions will disappear, and phantom reads will occur, which means the data will be lost.
The default value of MySQL 5.7 is after_sync. The master database writes each transaction to binlog, passes it to the slave database and flushes it to disk (relay log). The main library waits until the slave library returns ack, then commits the transaction and returns the commit OK result to the client. Even if the main library crashes, all transactions that have been committed on the main library can be guaranteed to be synchronized to the relay log of the slave library, solving the problems of phantom reading and data loss caused by the after_commit mode, Data consistency during failover Will be promoted. Because if the slave database does not write successfully, the master database will not commit the transaction. In addition, waiting for ACK from the slave library before committing can also accumulate transactions, which is beneficial to group commit group submission and improves performance.
But this will also have a problem. Assuming that the main library hangs before the storage engine is submitted, then it is obvious that the transaction is unsuccessful. However, since the corresponding Binlog has already performed a Sync operation, from now on The library has received these Binlogs and executed them successfully, which is equivalent to having extra data on the slave library (the slave library has this data but the main library does not), which can be considered a problem, but the extra data is generally not a serious problem. What it can guarantee is that no data will be lost. Having more data is better than losing data.
If the slave database undertakes a large number of query requests, the query operation on the slave database will consume a lot of CPU resources, thus affecting the synchronization speed and causing the master to from delay. Then we can connect several more slave libraries and let these slave libraries share the reading pressure.
In short, it is to add machines. The method is simple and crude, but it will also bring a certain cost.
If some operations have strict requirements on real-time data, they need to reflect the latest real-time data, such as finance involving money. system, online real-time system, or business that reads immediately after writing, then we have to give up the separation of reading and writing, and let such read requests also go through the main library, so there is no delay problem.
Of course, this also loses the performance improvement brought to us by the separation of reading and writing, so appropriate trade-offs are required.
Generally, MySQL master-slave replication involves three threads, all of which are single threads: Binlog Dump thread, IO thread, and SQL thread. Replication delays generally occur in two places:
The execution of the log on the standby database is the logic of the SQL thread on the standby database executing the relay log (relay log) to update the data.
Before MySQL version 5.6, MySQL only supported single-threaded replication. As a result, serious master-slave delay problems would occur when the main database concurrency and TPS were high. Starting from MySQL 5.6, there is the concept of multiple SQL threads, which can restore data concurrently, that is, parallel replication technology. This can very well solve the MySQL master-slave delay problem.
From single-threaded replication to the latest version of multi-threaded replication, the evolution has gone through several versions. In fact, in the final analysis, all multi-threaded replication mechanisms are to split the sql_thread with only one thread into multiple threads, which means they all conform to the following multi-threading model:
coordinator is the original sql_thread, but now it no longer directly updates data, it is only responsible for reading the transit log and distributing transactions. What actually updates the log becomes the worker thread. The number of worker threads is determined by the parameter slave_parallel_workers
.
Since worker threads run concurrently, in order to ensure transaction isolation and avoid update coverage problems, the coordinator needs to meet the following two basic requirements when distributing:
Various versions of multi-threaded replication follow these two basic principles.
The following are table-by-table distribution strategy and row-by-row distribution strategy, which can help understand the iteration of the MySQL official version of parallel replication strategy:
MySQL 5.6 version supports parallel replication, but the supported granularity is Per-database parallelism (based on Schema).
The core idea is: When tables under different schemas are submitted concurrently, the data will not affect each other, that is, the slave library can allocate a SQL thread-like function to different schemas in the relay log. thread to replay the transactions submitted by the main library in the relay log to keep the data consistent with the main library.
If there are multiple DBs on the main database, using this strategy can greatly improve the replication speed from the slave database. But usually there are multiple tables in a single database, so database-based concurrency has no effect, and parallel replay cannot be done at all, so this strategy is not used much.
MySQL 5.7 introduces parallel replication based on group submission, the parameter slave_parallel_workers
sets the number of parallel threads, by the parameter slave-parallel-type
to control the parallel replication strategy:
Using the group commit (group commit) mechanism of binlog, it can be concluded that transactions submitted by a group can be executed in parallel for the following reasons: Transactions that can be submitted in the same group will definitely not modify the same row (due to MySQL's locking mechanism), because the transaction has passed the lock conflict test.
The specific process of parallel replication based on group submission is as follows:
All transactions in prepare and commit states can be executed in parallel on the standby database.
Two relevant parameters submitted by the binlog group:
These two parameters are used to deliberately lengthen the time from binlog write to fsync, thereby reducing the number of binlog writes to disk. In the parallel replication strategy of MySQL 5.7, they can be used to create more "transactions in the prepare phase simultaneously". You can consider adjusting the values of these two parameters to achieve the purpose of improving the concurrency of the standby database replication.
Recommended learning: mysql video tutorial
The above is the detailed content of Completely master the solution to MySQL master-slave delay. For more information, please follow other related articles on the PHP Chinese website!