Cache avalanche means that a large number of cached keys in Redis expire at the same time. If many requests happen to arrive during this window, they all go straight to the database, which may overwhelm it.
Cache avalanche generally involves data that exists in the database but is no longer in the cache: the cached copies have expired, so requests fall through directly to the database.
There are many ways to solve the cache avalanche:
1. Lock to ensure single-threaded access to the cache, so that many requests cannot hit the database at the same time.
2. Do not give all keys the same expiration time. A typical example is cache warm-up: when storing the warm-up data in the cache, add a random offset to each key's expiration time so that a large number of keys do not expire at the same moment.
3. If memory permits, the cache can be set to never expire.
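Solution 2 above can be sketched in a few lines. This is a minimal illustration with a made-up method name (`randomizedTtl`); the Jedis call in the comment is only an example of where such a TTL would be used.

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch of solution 2: add random jitter to each key's TTL so that
// warm-up data does not all expire at the same moment.
public class TtlJitter {

    // Returns a TTL of baseSeconds plus a random offset in [0, jitterSeconds).
    static long randomizedTtl(long baseSeconds, long jitterSeconds) {
        return baseSeconds + ThreadLocalRandom.current().nextLong(jitterSeconds);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            long ttl = randomizedTtl(600, 300);
            // in a real setup, e.g.: jedis.setex("key" + i, (int) ttl, value);
            System.out.println("key" + i + " expires in " + ttl + "s");
        }
    }
}
```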
Cache breakdown is very similar to cache avalanche. The difference is that cache breakdown generally refers to a single hot key expiring while many concurrent requests need that key, which puts pressure on the database.
The method to solve cache breakdown is very similar to the method to solve cache avalanche:
1. Lock to ensure single-threaded access to the cache. This way, the first request rebuilds the cache after reaching the database, and subsequent requests can read the cache directly.
2. If memory permits, the cache can be set to never expire.
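Solution 1 above is the classic "mutex rebuild" pattern. The sketch below uses an in-memory map and a counter as stand-ins for Redis and the database (the class and field names are our own); only the first thread that misses the hot key loads it, and the others find the rebuilt entry after acquiring the lock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the lock-based fix for cache breakdown: one thread rebuilds
// the expired hot key while concurrent threads wait, then read the cache.
public class BreakdownGuard {
    static final Map<String, String> cache = new ConcurrentHashMap<>();
    static final AtomicInteger dbHits = new AtomicInteger();
    static final Object lock = new Object();

    static String loadFromDb(String key) {      // simulated database query
        dbHits.incrementAndGet();
        return "value-of-" + key;
    }

    static String get(String key) {
        String v = cache.get(key);
        if (v != null) return v;                // fast path: cache hit
        synchronized (lock) {                   // single-threaded rebuild
            v = cache.get(key);                 // re-check after acquiring the lock
            if (v == null) {
                v = loadFromDb(key);
                cache.put(key, v);              // later requests read the cache
            }
        }
        return v;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] ts = new Thread[10];
        for (int i = 0; i < ts.length; i++) ts[i] = new Thread(() -> get("hotKey"));
        for (Thread t : ts) t.start();
        for (Thread t : ts) t.join();
        System.out.println("database hits: " + dbHits.get()); // prints 1
    }
}
```

A production version would lock per key (or use a distributed lock) rather than one global monitor, but the re-check-after-lock step is the essential part.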
The essential difference between cache penetration and the two phenomena above is that the requested data does not exist in the database at all; since it is not in the database, it will certainly not be in the cache either. Under high concurrency, such requests keep reaching the database, putting great pressure on it.
For the cache penetration problem, locking does not help much: the key itself does not exist, so even if the number of concurrent threads is controlled, requests still continuously reach the database.
To solve the problem of cache penetration, the following solutions can generally be used together:
1. The interface layer performs validation and rejects illegal keys directly. For example, if the table uses auto-incrementing ids, any non-integer or negative id can be rejected immediately; if 32-character UUIDs are used, any id whose length is not 32 can be rejected immediately.
2. Cache non-existent data as well. You can directly cache an empty or other agreed invalid value. With this solution, it is best to set a short expiration time for the key, otherwise a large number of non-existent keys will be stored in Redis, which will also occupy a lot of memory.
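The two defences above can be combined as in the following sketch. The `NULL_VALUE` sentinel, the maps standing in for Redis and the database, and the method names are all our own conventions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the two penetration defences: reject obviously invalid ids at
// the interface layer, and cache a sentinel for ids missing from the database.
public class PenetrationGuard {
    static final String NULL_VALUE = "##NULL##";        // agreed "does not exist" marker
    static final Map<String, String> cache = new HashMap<>();
    static final Map<String, String> db = new HashMap<>(); // stand-in database

    // Defence 1: a table with auto-increment ids only accepts positive integers.
    static boolean isValidId(String id) {
        return id.matches("[1-9][0-9]*");
    }

    static String get(String id) {
        if (!isValidId(id)) return null;                // rejected at the interface layer
        String v = cache.get(id);
        if (v != null) return NULL_VALUE.equals(v) ? null : v;
        v = db.get(id);
        // Defence 2: cache the miss too (give it a short TTL in a real cache).
        cache.put(id, v == null ? NULL_VALUE : v);
        return v;
    }

    public static void main(String[] args) {
        db.put("42", "alice");
        System.out.println(get("-1"));        // invalid id, never reaches the db
        System.out.println(get("42"));        // real row
        System.out.println(get("99"));        // miss, now cached as the sentinel
        System.out.println(cache.get("99"));  // the cached NULL_VALUE
    }
}
```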
Regarding the above cache penetration solutions, consider this: if a key can bypass the validation in the first method, and a large number of non-existent keys are then accessed (say 100 million or 1 billion), all of them end up stored in the cache. This occupies a very large amount of space, wastes server memory, and can lead to memory exhaustion.
So is there a better solution? Yes: the Bloom filter, which we introduce next. A Bloom filter addresses the problem of too many keys to the greatest possible extent.
Most people have probably seen this interview question: how do you quickly determine whether an element exists among 1 billion unordered records?
To solve this problem, you need to use a Bloom filter, otherwise the memory of most servers cannot store such a large amount of data.
The Bloom Filter was proposed by Bloom in 1970. It's actually a long binary vector (bitmap) and a series of random mapping functions (hash functions).
A Bloom filter can be used to test whether an element is in a set. Its advantage is that its space efficiency and query time are far better than those of ordinary algorithms; its disadvantages are a certain false positive rate and the difficulty of deleting elements.
One of the data structures Redis provides is the bitmap, and the core of a Bloom filter implementation is exactly such a bit array. Each position in this array has only two states, 0 and 1, and each position occupies only 1 bit: 0 means no element maps there, and 1 means some element does. The following figure is a simple example of a Bloom filter (the position a key falls into is determined by hashing and a bit/modulo operation):
In the figure we see that lonely and wolf fall in the same position. This phenomenon, where different keys produce the same value after hashing, is called a hash collision. Once a hash collision occurs, the subsequent bit operation necessarily lands on the same position.
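The mapping described above can be pictured with `java.util.BitSet`. This is only a toy illustration: the array size of 16 and the use of `hashCode()` as the hash function are arbitrary choices, and whether any two of these keys actually collide depends on those choices.

```java
import java.util.BitSet;

// A minimal picture of the bit array underlying a Bloom filter: each key
// is hashed and reduced modulo the array size to a position, and that
// position is flipped from 0 to 1.
public class BitmapDemo {
    public static void main(String[] args) {
        int size = 16;
        BitSet bits = new BitSet(size);
        for (String key : new String[]{"lonely", "wolf", "Nosql"}) {
            int pos = Math.floorMod(key.hashCode(), size); // hash + modulo -> slot
            System.out.println(key + " -> bit " + pos);
            bits.set(pos);                                 // mark the slot as 1
        }
        // If two keys collided, cardinality() will be less than 3.
        System.out.println("bits set: " + bits.cardinality());
    }
}
```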
If too many hash collisions occur, it will affect the accuracy of the judgment. Therefore, in order to reduce hash collisions, we generally consider the following two factors:
1. Increase the size of the bitmap array (the larger the bitmap array, the larger the memory occupied).
2. Increase the number of hash functions (two different keys may collide on one hash function, but the probability that they collide on two or more hash functions simultaneously is naturally much lower).
We need to weigh these two factors against each other: a larger bit array consumes more space, and more hash computations consume more CPU and increase the lookup time. How large the bit array should be and how many hash functions to compute must therefore be decided case by case.
The following figure shows a Bloom filter built with two hash functions. We can easily see that even though the key Redis was never inserted, the two positions it maps to after the two hash functions are already 1 (one was set by wolf through f2, and the other by Nosql through f1).
From this phenomenon we can conclude that, from the filter's perspective, a Bloom filter has two major characteristics:
1. If the Bloom filter determines that an element exists, then this element may exist.
2. If the Bloom filter determines that an element does not exist, then this element must not exist.
From the perspective of elements, we can also draw two major characteristics:
1. If the element actually exists, the Bloom filter will definitely report that it exists.
2. If the element does not exist, the Bloom filter may still report that it exists.
PS: Note that with N hash functions, all N resulting positions must be 1 to judge that an element exists. As long as even one position is 0, the element definitely does not exist in the Bloom filter.
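The behaviour described above can be made concrete with a toy Bloom filter. The class below derives its k hash functions from two base hashes (the common double-hashing trick); the sizes and names are our own choices, and this is a sketch of the idea rather than a production implementation.

```java
import java.util.BitSet;

// A toy Bloom filter with k hash functions derived by double hashing.
public class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int k;    // number of hash functions

    SimpleBloomFilter(int size, int k) {
        this.bits = new BitSet(size);
        this.size = size;
        this.k = k;
    }

    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = h1 >>> 16;                       // second hash from the high bits
        return Math.floorMod(h1 + i * h2, size);  // i-th derived position
    }

    void add(String key) {
        for (int i = 0; i < k; i++) bits.set(position(key, i));
    }

    // true means "maybe present" (false positives possible);
    // false means "definitely absent" -- a single 0 bit is enough to be sure.
    boolean mightContain(String key) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(position(key, i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 3);
        filter.add("lonely");
        filter.add("wolf");
        System.out.println(filter.mightContain("wolf"));     // true: inserted keys always match
        System.out.println(filter.mightContain("no-such-key")); // usually false, but could be a false positive
    }
}
```

In practice you would use a tested library such as Guava's `BloomFilter` or a Redis module rather than hand-rolling this.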
A Bloom filter always has a false positive rate, because hash collisions can never be avoided 100%. This probability is called the false positive probability, i.e. False Positive Probability, or fpp for short.
When using a Bloom filter in practice, you define an fpp yourself, and the theory behind Bloom filters then tells you how many hash functions and how large a bit array you need. Note that the fpp can never be set to 0, because there is no way to guarantee 100% that hash collisions will not occur.
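The sizing calculation mentioned above uses the standard Bloom filter formulas: for n elements and a desired false positive probability p, the optimal bit array size is m = -n·ln(p)/(ln 2)² and the optimal hash count is k = (m/n)·ln 2. The class and method names below are our own.

```java
// Standard Bloom filter sizing: given n elements and a target fpp,
// compute the optimal bit array size m and number of hash functions k.
public class BloomSizing {
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1_000_000_000L;   // one billion elements
        double p = 0.01;           // 1% false positive rate
        long m = optimalBits(n, p);
        int k = optimalHashes(n, m);
        System.out.printf("bits: %d (~%.2f GiB), hash functions: %d%n",
                m, m / 8.0 / 1024 / 1024 / 1024, k);
    }
}
```

For one billion elements at a 1% fpp this works out to roughly 9.6 billion bits (about 1.1 GiB) and 7 hash functions, which is why a Bloom filter can answer the billion-element membership question from the interview problem above while storing the raw data cannot fit in memory.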
The above is the detailed content of avalanche and breakdown issues in Java cache and their solutions.