How to solve the dual-write problem between Redis and MySQL
Strictly speaking, no non-atomic operation can guarantee consistency unless reads and writes are blocked to achieve strong consistency. The goal we pursue in a caching architecture is therefore eventual consistency.
Caching improves performance by sacrificing strong consistency.
This follows from the CAP theorem: caching fits scenarios that do not require strong consistency, i.e. the AP side of CAP.
The following three cache read/write strategies each have their own advantages and disadvantages; there is no single best one.
The Cache-Aside Pattern, or bypass cache mode, is designed to reduce data inconsistency between the cache and the database as much as possible.
Read: read from the cache and return directly on a hit. On a miss, load the data from the database, write it into the cache, and then return the response.
Write: when updating, first update the database and then delete the cache.
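A minimal sketch of the Cache-Aside read and write paths, assuming a local Redis instance accessed through redis-py; `db_query` and `db_update` are placeholder helpers standing in for real MySQL access, not part of the original article.

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Placeholder "database": stands in for real MySQL queries/updates in this sketch.
_fake_db = {"user:1": "alice"}

def db_query(key):
    return _fake_db.get(key)

def db_update(key, value):
    _fake_db[key] = value

def cache_aside_read(key, ttl=60):
    """Read path: try the cache first; on a miss, load from the DB and backfill the cache."""
    value = r.get(key)
    if value is not None:
        return value
    value = db_query(key)
    if value is not None:
        r.setex(key, ttl, value)  # backfill with an expiration time
    return value

def cache_aside_write(key, value):
    """Write path: update the database first, then delete the cache entry."""
    db_update(key, value)
    r.delete(key)
```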
In the Read/Write Through Pattern, the application treats the cache as the primary data store: it reads from and writes to the cache, and the cache service is responsible for reading and writing the DB, which reduces the burden on the application.
Because Redis, the distributed cache we use most often, does not provide the ability to write cached data through to the DB, this pattern is not used much.
Write: check the cache first. If the key is not in the cache, update the DB directly. If it is in the cache, update the cache, and the cache service then updates the DB itself (cache and DB are updated synchronously).
Read: read from the cache and return directly on a hit. On a miss, load the data from the DB, write it into the cache, and then return the response.
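Since Redis does not offer this pattern out of the box, the sketch below models the cache service as an application-level class. It reuses the `r`, `db_query`, and `db_update` placeholders from the Cache-Aside sketch above and only illustrates the flow just described.

```python
class ReadWriteThroughCache:
    """Toy Read/Write Through cache service: callers talk only to this layer,
    which is responsible for keeping the DB in sync."""

    def __init__(self, redis_client, ttl=60):
        self.r = redis_client
        self.ttl = ttl

    def read(self, key):
        value = self.r.get(key)
        if value is not None:
            return value
        value = db_query(key)              # on a miss, the cache service loads from the DB
        if value is not None:
            self.r.setex(key, self.ttl, value)
        return value

    def write(self, key, value):
        if self.r.exists(key):
            self.r.setex(key, self.ttl, value)  # key is cached: update cache and DB together
            db_update(key, value)
        else:
            db_update(key, value)               # key not cached: update the DB directly
```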
The Write Behind Pattern is very similar to the Read/Write Through Pattern: in both, the cache service handles reading and writing of both the cache and the DB.
The key difference is that Read/Write Through updates the cache and DB synchronously, while Write Behind only updates the cache and does not update the DB directly; instead, it updates the DB asynchronously in batches.
Obviously, this approach poses a bigger challenge to data consistency: if the cache service crashes before the cached data has been asynchronously written to the DB, that data is lost, which can be a serious problem.
We rarely implement this strategy directly in daily development, but that does not mean it has few applications: message queues flushing messages to disk asynchronously and MySQL's InnoDB Buffer Pool both use this kind of strategy.
The Write Behind Pattern gives very high DB write performance, which makes it well suited to scenarios where data changes frequently and consistency requirements are relaxed, such as view counts and like counts.
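A minimal Write Behind sketch under the same assumptions as the earlier sketches (same `r` and `db_update` placeholders): writes touch only the cache plus a dirty-key set, and a background thread flushes dirty keys to the DB in batches.

```python
import threading
import time

class WriteBehindCache:
    """Toy Write Behind cache: writes update only the cache; a background
    thread periodically flushes dirty keys to the DB in a batch."""

    def __init__(self, redis_client, flush_interval=1.0):
        self.r = redis_client
        self.dirty = set()
        self.lock = threading.Lock()
        t = threading.Thread(target=self._flush_loop, args=(flush_interval,), daemon=True)
        t.start()

    def write(self, key, value):
        self.r.set(key, value)         # only the cache is updated synchronously
        with self.lock:
            self.dirty.add(key)        # remember that this key still has to reach the DB

    def _flush_loop(self, interval):
        while True:
            time.sleep(interval)
            with self.lock:
                keys, self.dirty = self.dirty, set()
            for key in keys:           # asynchronous batch update of the DB
                db_update(key, self.r.get(key))
```

If the process dies before a flush, the buffered writes are lost, which is exactly the consistency risk described above.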
Cache-Aside (bypass cache) is the pattern we use most in daily development. Based on the pattern introduced above, the following questions often come up.
Why does the write operation delete the cache instead of updating it?
Answer: suppose thread A initiates a write and updates the database first, and thread B then initiates another write and updates the database after A. Due to network delays and other reasons, thread B updates the cache before thread A does, so A's cache update lands last.
At this point the cache holds A's data (old data) while the database holds B's data (new data): the data is inconsistent and dirty data appears. If the write deletes the cache instead of updating it, this dirty-data problem does not occur.
In fact, it is possible to update the cache on writes, but then we need a lock/distributed lock to make sure there are no thread-safety issues when updating the cache, as in the sketch below.
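A hedged sketch of "update the cache under a lock", using Redis SET with NX and EX as a simple distributed lock (again reusing `r` and `db_update` from the first sketch). A production lock would release via a Lua script that checks the token atomically; the check-then-delete here is a simplification.

```python
import time
import uuid

def update_with_lock(key, value, lock_ttl=5):
    """Update the DB and the cache while holding a simple Redis lock,
    so concurrent writers cannot interleave their cache updates."""
    lock_key = f"lock:{key}"
    token = str(uuid.uuid4())
    # Spin until the lock is acquired (simplified; real code should bound the wait).
    while not r.set(lock_key, token, nx=True, ex=lock_ttl):
        time.sleep(0.01)
    try:
        db_update(key, value)
        r.set(key, value)              # updating (not deleting) the cache is safe under the lock
    finally:
        if r.get(lock_key) == token:   # non-atomic release, acceptable for a sketch
            r.delete(lock_key)
```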
When writing data, why do we update the DB first and then delete the cache?
Answer: suppose request 1 is a write operation that deletes cache entry A first, and request 2 is a read operation: it reads cache A, finds the entry has been deleted (by request 1), and falls back to the database. At this point request 1 has not yet finished updating the database, so request 2 reads the old value and also writes that old value back into the cache, causing data inconsistency.
In fact, it is also possible to delete the cache first and then update the database, for example with the delayed double delete strategy sketched below: delete the cache, update the database, then sleep for about 1 second and delete the cache again, so any dirty data written back into the cache during that second is evicted. The delay does not have to be 1 second; it depends on your business. However, this approach is not generally recommended, because too much can happen during that second and the uncertainty is too great.
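A short sketch of the delayed double delete just described, using the same placeholders as the earlier sketches; the 1-second delay is only an example value.

```python
import threading

def delayed_double_delete(key, value, delay=1.0):
    """Delete the cache, update the DB, then delete the cache again after
    `delay` seconds, evicting stale data that concurrent readers wrote back."""
    r.delete(key)                                            # first delete
    db_update(key, value)
    threading.Timer(delay, lambda: r.delete(key)).start()    # second, delayed delete
```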
When writing data, is it really safe to update the DB first and then delete the cache?
Answer: In theory, data inconsistency may still occur, but the probability is very small.
Assume there are two requests: request A performs a query and request B performs an update. The following sequence can occur:
(1) The cache has just expired
(2) Request A queries the database and gets an old value
(3) Request B writes the new value into the database
(4) Request B deletes the cache
(5) Request A writes the old value it read back into the cache
If the above sequence occurs, dirty data will indeed appear.
However, the probability of this happening is not high.
For the above sequence to occur there is an inherent precondition: the database write in step (3) must take less time than the database read in step (2), so that step (4) can happen before step (5).
But if you think about it, database reads are much faster than writes (that is part of why we do read/write splitting: reads are faster and consume fewer resources), so step (3) normally takes longer than step (2), and this situation is hard to hit.
Are there any other reasons for the inconsistency?
Answer: if the cache deletion fails, inconsistency will result.
How to solve it?
Use Canal to subscribe to the database's binlog and capture the data that has changed, then have another program consume those change events and delete the corresponding cache entries, as in the sketch below.
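Canal itself is a Java component, so the sketch below only models the downstream consumer: it assumes row-change events arrive as (table, primary key) pairs, for example via a message queue fed by Canal, and deletes the corresponding cache key, retrying on failure. The function name, key convention, and message shape are assumptions for illustration; `r` is the Redis client from the first sketch.

```python
import time
import redis

def handle_row_change(table, pk, max_retries=3):
    """Delete the cache entry for a changed row, retrying on Redis errors.
    `table` and `pk` are assumed to come from a binlog-change message."""
    key = f"{table}:{pk}"                     # assumed cache-key convention
    for attempt in range(max_retries):
        try:
            r.delete(key)
            return True
        except redis.RedisError:
            time.sleep(0.1 * (attempt + 1))   # simple backoff before retrying
    return False                              # after repeated failures, alert or requeue
```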
Drawback 1: the first request for a piece of data will never hit the cache.
Solution: hot data can be preloaded into the cache in advance.
Drawback 2: frequent write operations cause cached data to be deleted frequently, which hurts the cache hit rate.
If strong consistency between the database and the cache is required: when updating the DB, also update the cache, and use a lock/distributed lock to make sure there are no thread-safety issues when updating the cache. If temporary inconsistency between the database and the cache is acceptable: when updating the DB, also update the cache, but give the cache entry a relatively short expiration time, so that even if the data becomes inconsistent the impact stays small (see the sketch below).
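A one-function sketch of the "short expiration time" option, using the same placeholders as the earlier sketches; the 30-second TTL is only an example value.

```python
def write_with_short_ttl(key, value, ttl=30):
    """Update the DB and the cache; even if a race leaves stale data in the
    cache, it expires within `ttl` seconds, limiting the inconsistency window."""
    db_update(key, value)
    r.setex(key, ttl, value)
```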