In a high-concurrency scenario, should the cache or the database be updated first?

Java学习指南 (reposted)
2023-07-26 14:53:29, 1756 views

In large systems, a caching mechanism is usually introduced to reduce pressure on the database. Once a cache is in place, however, the cache and the database can easily become inconsistent, causing users to see stale data.
To reduce such inconsistency, the way the cache and the database are updated is particularly important. Next, we will walk you through the pitfalls.

Cache aside

Cache aside, also known as the bypass cache, is one of the more commonly used caching strategies.

(1) Common process for read requests

[Figure: Cache aside read request]

The application first checks whether the cache holds the data. On a cache hit, the data is returned directly. On a cache miss, the request falls through to the database: the application queries the database, writes the result back to the cache, and finally returns the data to the client.
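The read flow can be sketched in Java roughly as follows. This is a minimal illustration, not a production implementation: the class name `CacheAsideReader` is hypothetical, and two in-memory maps stand in for a real cache service and a real database.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the cache-aside read path.
// Both maps are stand-ins: "cache" for a cache service, "database" for a real DB.
class CacheAsideReader {
    final Map<String, Integer> cache = new ConcurrentHashMap<>();
    final Map<String, Integer> database = new ConcurrentHashMap<>();

    Integer read(String key) {
        Integer cached = cache.get(key);
        if (cached != null) {
            return cached;                  // cache hit: return directly
        }
        Integer fromDb = database.get(key); // cache miss: query the database
        if (fromDb != null) {
            cache.put(key, fromDb);         // write the result back to the cache
        }
        return fromDb;                      // return the data to the caller
    }
}
```

Note that the application itself talks to both stores here, which is the defining trait of cache aside.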

(2) Common process for write requests

[Figure: Cache aside write request]

First update the database and then delete the data from the cache.
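As a rough Java sketch of this write flow (the class name `CacheAsideWriter` and the in-memory maps are assumptions made for illustration only):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the cache-aside write path.
// The maps are stand-ins for a real cache service and a real database.
class CacheAsideWriter {
    final Map<String, Integer> cache = new ConcurrentHashMap<>();
    final Map<String, Integer> database = new ConcurrentHashMap<>();

    void write(String key, int value) {
        database.put(key, value); // 1. update the database first
        cache.remove(key);        // 2. then delete (not update) the cached entry
    }
}
```

The next read of that key misses the cache and reloads the fresh value from the database.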

After looking at the write request diagram, some readers may ask: why delete the cache entry? Why not just update it directly? Several pitfalls are involved here, so let's walk through them one by one.

Cache aside pitfalls

If the Cache aside strategy is used incorrectly, you will run into serious pitfalls. Let's go through them one at a time.

Pitfall 1: Update the database first, then update the cache

Suppose two write requests need to update the same data at the same time, and each write request updates the database first and then updates the cache. In concurrent scenarios, this can lead to data inconsistency.

[Figure: Update the database first, then update the cache]

The execution process, as shown above:

(1) Write request 1 updates the database, setting the age field to 18;

(2) Write request 2 updates the database, setting the age field to 20;

(3) Write request 2 updates the cache, setting the cached age to 20;

(4) Write request 1 updates the cache, setting the cached age to 18.

The expected result is that both the database and the cache hold age 20. The actual result is that the database holds 20 while the cache holds 18, so the cache is not up to date and dirty data appears.
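To make the interleaving concrete, the four steps can be replayed in order on a single thread. This is only a deterministic replay of the schedule described above, not a real concurrency test; the class name `UpdateThenUpdateRace` and the map-based stores are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Replays the problematic schedule step by step (illustrative only).
class UpdateThenUpdateRace {
    // Returns {database age, cache age} after the four steps.
    static int[] replay() {
        Map<String, Integer> database = new HashMap<>();
        Map<String, Integer> cache = new HashMap<>();

        database.put("age", 18); // (1) write request 1 updates the database
        database.put("age", 20); // (2) write request 2 updates the database
        cache.put("age", 20);    // (3) write request 2 updates the cache
        cache.put("age", 18);    // (4) write request 1 updates the cache, late

        return new int[] { database.get("age"), cache.get("age") };
    }
}
```

Calling `UpdateThenUpdateRace.replay()` yields a database age of 20 and a cache age of 18, matching the inconsistency described above.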

Pitfall 2: Delete the cache first, then update the database

If a write request deletes the cache first and then updates the database, data inconsistency may occur when a read request and a write request run concurrently.

[Figure: Delete the cache first, then update the database]

The execution process, as shown above:

(1) The write request deletes the cached data;

(2) The read request misses the cache, then queries the database and writes the returned data back to the cache;

(3) The write request updates the database.

After the whole process, the age in the database is 20 while the age in the cache is 18. The cache and the database are inconsistent, and dirty data appears in the cache.
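The same schedule can be replayed sequentially in Java. The sketch assumes that both stores hold age 18 before the write request starts; that starting state, the class name `DeleteThenUpdateRace`, and the map-based stores are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Replays the read/write interleaving step by step (illustrative only).
class DeleteThenUpdateRace {
    // Returns {database age, cache age} after the three steps.
    static int[] replay() {
        Map<String, Integer> database = new HashMap<>();
        Map<String, Integer> cache = new HashMap<>();
        database.put("age", 18);         // assumed starting state
        cache.put("age", 18);

        cache.remove("age");             // (1) write request deletes the cache

        Integer seen = cache.get("age"); // (2) read request misses the cache,
        if (seen == null) {
            seen = database.get("age");  //     reads the old value 18,
            cache.put("age", seen);      //     and writes it back to the cache
        }

        database.put("age", 20);         // (3) write request updates the database

        return new int[] { database.get("age"), cache.get("age") };
    }
}
```

Nothing removes the stale cached value afterwards, which is why this ordering is considered worse than updating the database first.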

Pitfall 3: Update the database first, then delete the cache

In real systems, updating the database first and then deleting the cache is still the recommended approach for write requests. In theory, however, it can still go wrong, as the following example shows.

[Figure: Update the database first, then delete the cache]

The execution process, as shown above:

(1) The read request queries the cache first, misses, and queries the database, which returns the old data;

(2) The write request updates the database and deletes the cache;

(3) The read request writes the old data back to the cache.

After the entire process, the age in the database is 20 while the age in the cache is 18. The database and the cache are inconsistent, so the application reads old data from the cache.
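A sequential replay of this schedule is sketched below. It assumes the database starts with age 18 and that the cache entry is absent when the read request begins (for example, because it has expired); these starting conditions and the class name `UpdateThenDeleteRace` are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Replays the rare interleaving step by step (illustrative only).
class UpdateThenDeleteRace {
    // Returns {database age, cache age} after the three steps.
    static int[] replay() {
        Map<String, Integer> database = new HashMap<>();
        Map<String, Integer> cache = new HashMap<>();
        database.put("age", 18);                  // assumed start; cache is empty

        boolean miss = cache.get("age") == null;  // (1) read request misses the cache
        int staleValue = database.get("age");     //     and reads 18 from the database

        database.put("age", 20);                  // (2) write request updates the database
        cache.remove("age");                      //     and deletes the (empty) cache entry

        if (miss) {
            cache.put("age", staleValue);         // (3) read request writes back 18
        }
        return new int[] { database.get("age"), cache.get("age") };
    }
}
```

The inconsistency only arises because step (3) lands after the delete in step (2); if the write-back happens before the delete, the stale value is removed.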

On closer inspection, however, the probability of this problem occurring is actually very low. A database update usually takes several orders of magnitude longer than an in-memory operation, so the last step in the figure above, the write-back to the cache (set age 18), is very fast and usually completes before the write request has finished updating the database. The write request's cache deletion then removes the stale value.

What if this extreme scenario does occur? We need a fallback: set an expiration time on cached data. Most systems can tolerate a small amount of data being inconsistent for a short period.
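The idea of an expiration time can be sketched as a small in-memory cache whose entries carry a deadline. Real cache services typically provide expiration natively; the class `ExpiringCache` below is a hypothetical stand-in, and the current time is passed in explicitly so that the behavior is deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory cache with a fixed time-to-live per entry.
class ExpiringCache {
    private static final class Entry {
        final int value;
        final long expiresAtMillis;
        Entry(int value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;

    ExpiringCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    void put(String key, int value, long nowMillis) {
        entries.put(key, new Entry(value, nowMillis + ttlMillis));
    }

    // Returns null on a miss or when the entry has expired.
    Integer get(String key, long nowMillis) {
        Entry e = entries.get(key);
        if (e == null) {
            return null;
        }
        if (nowMillis >= e.expiresAtMillis) {
            entries.remove(key, e); // drop the stale entry so the next read reloads it
            return null;
        }
        return e.value;
    }
}
```

With a TTL in place, a stale value such as age 18 can survive for at most one TTL period before a read misses and reloads the current value from the database.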

Read through

In the Cache Aside pattern, the application code has to maintain two data sources: the cache and the database. Under the Read-Through strategy, the application does not need to manage both; it delegates synchronization with the database to the cache provider (Cache Provider). All data interactions go through the abstract cache layer.

[Figure: Read-Through process]

As shown above, the application only needs to interact with the Cache Provider and does not need to care whether the data comes from the cache or the database.

When performing a large number of reads, Read-Through can reduce the load on the data source and is also resilient to cache service failures. If the cache service goes down, the cache provider can still operate by going directly to the data source.

Read-Through suits scenarios where the same data is requested many times. It is very similar to the Cache-Aside strategy, but the two differ in one respect worth repeating:

  • In Cache-Aside, the application is responsible for getting data from the data source and updating it to the cache.
  • In Read-Through, this logic is usually supported by an independent cache provider (Cache Provider).
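The difference can be illustrated with a minimal read-through wrapper in Java. The class `ReadThroughCache` and its `loader` parameter are hypothetical names; the loader represents whatever the cache provider uses to fetch from the data source.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical read-through cache: callers only ever call get().
class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // fetches a value from the data source

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // On a miss, computeIfAbsent invokes the loader and stores its result;
        // on a hit, the loader is not called at all.
        return cache.computeIfAbsent(key, loader);
    }
}
```

Compared with the cache-aside read path, the load-and-populate logic lives inside the cache layer, so application code never touches the database directly.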

Write through

Under the Write-Through strategy, when a data update (write) occurs, the cache provider (Cache Provider) is responsible for updating both the underlying data source and the cache.

The cache remains consistent with the data source, and writes always reach the data source through the abstract cache layer.

The Cache Provider acts like a proxy.

[Figure: Write-Through process]
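A minimal write-through sketch in Java might look like the following. The class `WriteThroughCache` and the `storeWriter` callback are hypothetical; the callback stands in for the provider's synchronous write to the data source. Writing the data source before the cache, and serializing writes with `synchronized`, are design choices of this sketch rather than requirements stated by the article.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical write-through cache: every write goes to both stores.
class WriteThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final BiConsumer<K, V> storeWriter; // synchronous write to the data source

    WriteThroughCache(BiConsumer<K, V> storeWriter) {
        this.storeWriter = storeWriter;
    }

    // synchronized so that two writers cannot interleave their two steps
    synchronized void put(K key, V value) {
        storeWriter.accept(key, value); // 1. write to the data source
        cache.put(key, value);          // 2. then update the cache
    }

    V get(K key) {
        return cache.get(key);
    }
}
```

Because the caller waits for the data source write, writes are slower than in write behind, but the cache and the data source stay in step.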

Write behind

Write behind, in some places also called write back, can be understood simply: when the application updates data, it only updates the cache, and the Cache Provider flushes the data to the database at regular intervals. Put plainly, it is delayed writing.

[Figure: Write behind process]

As shown above, when the application updates two pieces of data, the Cache Provider writes them to the cache immediately, but writes them to the database in a batch only after a period of time.

This method has advantages and disadvantages:

  • The advantage is that writes are very fast, which suits write-heavy scenarios.

  • The disadvantage is that the cache and the database are not strongly consistent, so use it with caution in systems with high consistency requirements.
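The trade-off can be seen in a small write-behind sketch. The class `WriteBehindCache` and the `batchWriter` callback are hypothetical names; in a real system `flush()` would be invoked periodically, for example by a `ScheduledExecutorService`, whereas here it is called by hand to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical write-behind cache: writes hit the cache now, the database later.
class WriteBehindCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Map<K, V> dirty = new LinkedHashMap<>(); // pending, unflushed writes
    private final Consumer<Map<K, V>> batchWriter;         // batched write to the database

    WriteBehindCache(Consumer<Map<K, V>> batchWriter) {
        this.batchWriter = batchWriter;
    }

    synchronized void put(K key, V value) {
        cache.put(key, value); // visible to readers immediately
        dirty.put(key, value); // remembered for the next flush
    }

    synchronized V get(K key) {
        return cache.get(key);
    }

    // Intended to run at regular intervals; sends all pending writes in one batch.
    synchronized void flush() {
        if (dirty.isEmpty()) {
            return;
        }
        batchWriter.accept(new LinkedHashMap<>(dirty));
        dirty.clear();
    }
}
```

Between a `put` and the next `flush`, the database lags behind the cache, and any pending writes are lost if the process crashes, which is the consistency risk noted above.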

To summarize

After covering so much, you should now have a clear understanding of cache update strategies. Finally, a short summary.

There are three main strategies for cache update:

  • Cache aside
  • Read/Write through
  • Write behind

With Cache aside, the database is usually updated first and the cache is then deleted. As a safety net, an expiration time is usually set on cached data.

With Read/Write through, read and write operations are generally provided by a Cache Provider, and the application does not need to know whether it is operating on the cache or the database.

Write behind can be understood simply as delayed writing: the Cache Provider writes to the database in batches at regular intervals. The advantage is that application writes are very fast.

That's all for today. Did you get it?

Statement:
This article is reproduced from Java学习指南. If there is any infringement, please contact admin@php.cn to request removal.