If you notice that Redis access latency suddenly increases, how should you troubleshoot it?
The first step is to check the Redis slow log. Redis records commands whose execution time exceeds a configurable threshold, so by setting the following options we can see which commands are causing high latency.
First, set the slow log threshold. Only commands that exceed the threshold are recorded, and the unit is microseconds. For example, to set the threshold to 5 milliseconds and keep only the 1,000 most recent entries:
# Log commands whose execution takes longer than 5 milliseconds
CONFIG SET slowlog-log-slower-than 5000
# Keep only the most recent 1000 slow log entries
CONFIG SET slowlog-max-len 1000
Once this is configured, Redis records every command whose execution takes longer than 5 milliseconds. Run SLOWLOG GET 5 to query the five most recent slow log entries:
127.0.0.1:6379> SLOWLOG get 5
1) 1) (integer) 32693          # slow log entry ID
   2) (integer) 1593763337     # Unix timestamp when the command ran
   3) (integer) 5299           # execution time (microseconds)
   4) 1) "LRANGE"              # the command and its arguments
      2) "user_list_2000"
      3) "0"
      4) "-1"
2) 1) (integer) 32692
   2) (integer) 1593763337
   3) (integer) 5044
   4) 1) "GET"
      2) "book_price_1000"
...
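When the slow log is long, it can help to aggregate it per command. The following is a minimal Python sketch, assuming each entry has already been fetched as a tuple in the order shown above (ID, timestamp, duration in microseconds, command arguments); newer Redis versions append extra fields such as the client address, which the sketch simply ignores. The helper name `summarize_slowlog` is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Sequence, Tuple


def summarize_slowlog(entries: Iterable[Sequence]) -> List[Tuple[str, int, int]]:
    """Return (command, count, total_us) rows, largest total time first.

    Assumed entry layout, matching the SLOWLOG GET output above:
    (entry_id, unix_timestamp, duration_us, [command, arg1, ...], ...extra fields)
    """
    totals = defaultdict(lambda: [0, 0])  # command name -> [count, total microseconds]
    for entry in entries:
        duration_us, args = entry[2], entry[3]
        name = str(args[0]).upper()  # Redis command names are case-insensitive
        totals[name][0] += 1
        totals[name][1] += duration_us
    rows = [(cmd, count, total) for cmd, (count, total) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    sample = [
        (32693, 1593763337, 5299, ["LRANGE", "user_list_2000", "0", "-1"]),
        (32692, 1593763337, 5044, ["GET", "book_price_1000"]),
    ]
    for cmd, count, total in summarize_slowlog(sample):
        print(f"{cmd}: {count} call(s), {total} us total")
```

Grouping by total time rather than by single slowest call makes it easier to spot a cheap command that is slow only because it is called very often.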
By examining the slow log, we can see which commands took a long time and when they ran. If your business frequently uses commands of O(N) or higher complexity, such as SORT, SUNION, ZUNIONSTORE, or KEYS, or runs O(N) commands over a large amount of data, Redis will spend a long time processing them.
If the Redis instance's CPU usage is high while your request volume is modest, the cause is probably these high-complexity commands.
The solution is to avoid these complex commands and avoid fetching too much data at once. Operate on a small amount of data per request so that Redis can process it and return promptly.
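For example, instead of reading a whole list with LRANGE key 0 -1 as in the slow log above, a client can read it in fixed-size slices. The Python sketch below only computes the inclusive (start, stop) index pairs to pass to successive LRANGE calls; the batch size of 1,000 and the helper name `lrange_batches` are illustrative assumptions, and the actual client calls are left out so the example runs without a Redis server.

```python
from typing import Iterator, Tuple


def lrange_batches(list_length: int, batch_size: int = 1000) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, stop) pairs covering indexes 0..list_length-1.

    Each pair maps to one `LRANGE key start stop` call, so no single call
    touches more than `batch_size` elements (hypothetical helper).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, list_length, batch_size):
        yield start, min(start + batch_size, list_length) - 1


if __name__ == "__main__":
    # A 2,500-element list read 1,000 elements at a time needs three LRANGE calls.
    for start, stop in lrange_batches(2500):
        print(f"LRANGE user_list_2000 {start} {stop}")
```

In practice the list length would come from LLEN, and the list may change between calls, so this pattern suits cases where slightly inconsistent reads are acceptable.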
If the slow log shows that the latency is not caused by high-complexity commands, for example, simple SET and DEL operations appear in it, you should suspect that Redis is writing bigkeys (keys that hold very large values).
When Redis writes new data, it allocates memory for it; when data is deleted, the corresponding memory is released.
When the value written to a key is very large, allocating its memory takes longer. Likewise, deleting that key takes a long time because a large amount of memory must be released.
Check your business code to see whether it writes bigkeys, and evaluate how much data each write stores. The business layer should avoid storing an excessive amount of data in a single key.
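One way to evaluate the amount of data written is to measure the serialized size of a value before writing it. The Python sketch below assumes values are serialized as compact JSON and uses an arbitrary 10 KiB limit; both the serialization format and the threshold are assumptions to adapt to your own business, and `check_value_size` is a hypothetical helper.

```python
import json

# Assumed limit: adjust to what your business considers a "bigkey".
MAX_VALUE_BYTES = 10 * 1024


def serialized_size(value) -> int:
    """Size in bytes of the value after compact JSON serialization (assumed wire format)."""
    return len(json.dumps(value, separators=(",", ":")).encode("utf-8"))


def check_value_size(key: str, value, limit: int = MAX_VALUE_BYTES) -> int:
    """Raise ValueError if writing `value` under `key` would exceed `limit`; else return its size."""
    size = serialized_size(value)
    if size > limit:
        raise ValueError(f"refusing to write {key!r}: {size} bytes exceeds the {limit}-byte limit")
    return size
```

Rejecting oversized values at the business layer keeps the problem out of Redis entirely, rather than relying on deleting bigkeys after the fact.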
To address the bigkey problem, Redis introduced the lazy-free mechanism in version 4.0 (for example, the UNLINK command), which releases a bigkey's memory asynchronously to reduce the impact on Redis performance. Even so, we do not recommend using bigkeys: they also hurt performance while data is migrated in a cluster, which will be covered in detail in a later article on clustering.
Sometimes Redis shows no significant latency most of the time, but at certain moments a wave of latency suddenly appears, and these slow moments follow a very regular pattern, such as on the hour or at a fixed interval.
If this happens, consider whether a large number of keys are expiring at the same time.
If many keys expire at a fixed point in time, requests to Redis around that time may see increased latency.
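Redis combines two expiration strategies: periodic (active) deletion, in which a scheduled task samples keys that have a TTL and deletes the expired ones, and lazy deletion, in which an expired key is deleted only when it is accessed.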
Note that Redis's periodic deletion task also runs in the Redis main thread. If an active expiration pass finds a large number of expired keys to delete, business requests can only be processed after the expiration task finishes. This increases access latency, and the maximum added delay is 25 milliseconds.
This latency is also not recorded in the slow log. The slow log only records a command's actual execution time, while active expiration runs before the command is executed. If the command itself stays under the slow log threshold, it does not appear in the slow log statistics, even though the business has experienced increased latency.
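To see why a burst of expirations can stall the main thread, here is a simplified Python model of the active expiration loop as the Redis documentation describes it: sample 20 keys that have a TTL, delete the expired ones, and start again while more than 25% of the sample was expired, subject to a time budget. The real implementation differs in many details (it runs per database, its effort is adaptive, and it is driven by the server's timer); the function name and the 25 ms default budget here are assumptions for illustration.

```python
import random
import time
from typing import Dict, Optional


def active_expire_cycle(
    expires: Dict[str, float],
    now: float,
    sample_size: int = 20,
    repeat_ratio: float = 0.25,
    time_budget_s: float = 0.025,
    rng: Optional[random.Random] = None,
) -> int:
    """Delete expired keys from `expires` (key -> expiry timestamp) in place.

    Simplified model: keep sampling while more than `repeat_ratio` of a random
    sample is expired and the time budget is not used up. While this loop runs,
    a single-threaded server cannot serve client requests.
    Returns the number of keys deleted.
    """
    rng = rng or random.Random()
    deadline = time.monotonic() + time_budget_s
    deleted = 0
    while expires:
        sample = rng.sample(list(expires), min(sample_size, len(expires)))
        expired = [key for key in sample if expires[key] <= now]
        for key in expired:
            del expires[key]
        deleted += len(expired)
        # Stop when few sampled keys were expired, or when the time budget runs out.
        if len(expired) <= repeat_ratio * len(sample) or time.monotonic() >= deadline:
            break
    return deleted
```

If every sampled key has expired, the model keeps looping until the keys are gone or the budget is spent; with millions of keys expiring at once, that looping is what clients experience as added latency.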
The solution is to add a random offset to the expiration time of these keys, spreading out the moments at which they expire.
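A minimal Python sketch of this idea: instead of giving every key the same TTL, add a random offset so that expirations spread over a window. The base TTL, the jitter range, and the helper name `ttl_with_jitter` are assumptions; the resulting value would be passed to something like SET key value EX seconds, or to EXPIRE.

```python
import random
from typing import Optional


def ttl_with_jitter(base_ttl_s: int, max_jitter_s: int, rng: Optional[random.Random] = None) -> int:
    """Return base_ttl_s plus a random offset in [0, max_jitter_s] seconds.

    Keys written together then expire across a window of max_jitter_s seconds
    instead of all at the same instant (hypothetical helper).
    """
    if base_ttl_s <= 0 or max_jitter_s < 0:
        raise ValueError("base_ttl_s must be positive and max_jitter_s non-negative")
    rng = rng or random.Random()
    return base_ttl_s + rng.randint(0, max_jitter_s)  # randint includes both endpoints


if __name__ == "__main__":
    rng = random.Random(42)
    # e.g. a one-hour TTL spread over an extra five minutes
    print([ttl_with_jitter(3600, 300, rng) for _ in range(5)])
```

The wider the jitter window, the flatter the expiration load, at the cost of some keys living a little longer than the base TTL.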
When we use Redis as a pure cache, we sometimes set a memory limit, maxmemory, on the instance and enable an LRU eviction policy.
Once the instance's memory reaches maxmemory, you may find that every write of new data becomes slower.
The slowdown happens because, once memory reaches maxmemory, Redis must evict some data before each new write to keep memory usage below maxmemory.
Evicting old data also takes time, and how long it takes depends on the configured eviction policy.
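To illustrate why eviction adds work to every write, here is a simplified Python model of sampled LRU eviction, loosely following the approximated LRU that Redis documents: sample a few keys (the number is controlled by maxmemory-samples, default 5) and evict the least recently used among them. Real Redis keeps an eviction pool across rounds and accounts memory very differently; the class name and the byte accounting (values only) are assumptions.

```python
import random
from typing import Dict, Optional, Tuple


class SampledLRUCache:
    """Toy cache that evicts by sampling, in the spirit of Redis's approximated LRU."""

    def __init__(self, maxmemory: int, samples: int = 5, rng: Optional[random.Random] = None):
        self.maxmemory = maxmemory  # byte limit, like the maxmemory setting
        self.samples = samples      # keys inspected per eviction, like maxmemory-samples
        self.rng = rng or random.Random()
        self.data: Dict[str, Tuple[bytes, int]] = {}  # key -> (value, last access tick)
        self.used = 0
        self.clock = 0
        self.evictions = 0

    def _tick(self) -> int:
        self.clock += 1
        return self.clock

    def get(self, key: str) -> Optional[bytes]:
        if key not in self.data:
            return None
        value, _ = self.data[key]
        self.data[key] = (value, self._tick())  # refresh the access time
        return value

    def set(self, key: str, value: bytes) -> None:
        if len(value) > self.maxmemory:
            raise ValueError("value larger than maxmemory")
        if key in self.data:
            self.used -= len(self.data.pop(key)[0])
        # Evict before writing until the new value fits: this loop is the extra
        # per-write work that appears once memory reaches maxmemory.
        while self.data and self.used + len(value) > self.maxmemory:
            candidates = self.rng.sample(list(self.data), min(self.samples, len(self.data)))
            victim = min(candidates, key=lambda k: self.data[k][1])
            self.used -= len(self.data.pop(victim)[0])
            self.evictions += 1
        self.data[key] = (value, self._tick())
        self.used += len(value)
```

Below the limit, set does no eviction work at all; once the cache is full, every write pays for at least one sampling round, which mirrors the slowdown described above.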
If your Redis instance has automatic RDB generation and AOF rewriting enabled, access latency may rise while these tasks run in the background, and it disappears once they finish.
When you see this pattern, it is usually caused by the RDB generation and AOF rewrite tasks.
Generating an RDB file or rewriting the AOF requires the parent process to fork a child process that persists the data. During fork, the parent process must copy its memory page table to the child. If the instance uses a large amount of memory, copying the page table is time-consuming and consumes a lot of CPU. Until fork completes, the entire instance is blocked and cannot process any requests. If CPU resources are tight at that moment, fork takes even longer, sometimes reaching the order of seconds, which seriously affects Redis performance.
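To get a feel for how much fork has to copy, the following back-of-the-envelope calculation assumes x86-64 with 4 KiB pages and 8-byte page table entries, and counts only the last-level entries (upper page-table levels and huge pages are ignored). These are assumptions for illustration; the fork time Redis actually measured is reported in the latest_fork_usec field of INFO.

```python
def page_table_bytes(memory_bytes: int, page_size: int = 4096, pte_size: int = 8) -> int:
    """Approximate last-level page table size for `memory_bytes` of resident memory.

    Assumes every page is mapped by one `pte_size`-byte entry (8 bytes on x86-64)
    and ignores upper page-table levels and huge pages.
    """
    pages = -(-memory_bytes // page_size)  # ceiling division: a partial page still needs an entry
    return pages * pte_size


if __name__ == "__main__":
    GiB, MiB = 1024 ** 3, 1024 ** 2
    for size_gib in (1, 10, 50):
        table_mib = page_table_bytes(size_gib * GiB) / MiB
        print(f"{size_gib} GiB of memory -> about {table_mib:.0f} MiB of page table entries")
```

Under these assumptions a 10 GiB instance has about 20 MiB of entries to copy while the main thread is blocked, and the cost grows linearly with the instance's memory.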
When deploying services, we often bind processes to CPUs (set their CPU affinity) to improve performance and reduce the context-switching overhead of a program running across multiple CPUs.
For Redis, however, we do not recommend doing this, for the following reason.
When Redis is bound to a CPU, the child process it forks for data persistence inherits the parent's CPU affinity. The child consumes a large amount of CPU while persisting data, so it competes with the main process for the same CPU. The main process is left short of CPU, and access latency increases.
Therefore, when deploying Redis with RDB or AOF rewriting enabled, do not bind the Redis process to a CPU.
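The inheritance described above is standard Linux behavior: a child created by fork inherits its parent's CPU affinity mask. The Python sketch below demonstrates it with os.sched_setaffinity and os.fork, standing in for a pinned Redis process forking its persistence child. It is Linux-only, temporarily pins the current process to a single CPU, and the function name is hypothetical.

```python
import os


def forked_child_inherits_pinning() -> bool:
    """Pin this process to one CPU, fork, and check the child is pinned to the same CPU.

    Linux-only (os.sched_getaffinity / os.sched_setaffinity). The original
    affinity is restored before returning.
    """
    original = os.sched_getaffinity(0)
    pinned = {min(original)}  # pin to the lowest-numbered CPU we are allowed to use
    os.sched_setaffinity(0, pinned)
    try:
        pid = os.fork()
        if pid == 0:
            # Child: exit 0 if it inherited exactly the parent's single-CPU mask.
            os._exit(0 if os.sched_getaffinity(0) == pinned else 1)
        _, status = os.waitpid(pid, 0)
        return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
    finally:
        os.sched_setaffinity(0, original)


if __name__ == "__main__":
    print("child inherited CPU pinning:", forked_child_inherits_pinning())
```

Because the child ends up restricted to the same single CPU as its parent, any CPU-heavy work it does runs on exactly the core the parent needs, which is the contention described above.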
If Redis suddenly becomes very slow, with each access taking hundreds of milliseconds or even seconds, check whether Redis is using swap. In this situation Redis basically cannot provide high-performance service.
The operating system provides a swap mechanism: when memory is insufficient, it moves part of the in-memory data to disk to relieve memory pressure.
However, once data has been swapped out to disk, accessing it requires reading from disk, which is far slower than reading from memory.
For a high-performance in-memory database like Redis, which is extremely sensitive to latency, having its memory swapped to disk makes operation times unacceptable. As a temporary measure, you can disable swap on the operating system.
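On Linux, one way to check whether a Redis process has been partly swapped out is to sum the Swap: fields in /proc/PID/smaps. The Python sketch below only parses that text; locating the Redis PID and reading the file are left to the caller, and the helper name `swapped_kb` is hypothetical.

```python
def swapped_kb(smaps_text: str) -> int:
    """Sum the 'Swap:' fields (in kB) of a /proc/<pid>/smaps dump.

    A non-zero result means part of the process's memory currently lives on swap.
    """
    total = 0
    for line in smaps_text.splitlines():
        fields = line.split()
        # Matching "Swap:" exactly skips related fields such as "SwapPss:".
        # Lines look like: "Swap:                  0 kB"
        if len(fields) >= 2 and fields[0] == "Swap:":
            total += int(fields[1])
    return total
```

For example, calling swapped_kb(open(f"/proc/{pid}/smaps").read()) with pid set to the Redis server's process ID returns 0 when none of its memory is on swap.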
Another pattern is that Redis starts to slow down at a certain point in time and stays slow. In this case, check whether the machine's network card bandwidth is saturated.
A high network load can cause transmission delays and packet loss at the network and TCP layers. Besides keeping data in memory, Redis owes its high performance to excellent network I/O; however, as the number of requests keeps growing, the load on the network card grows accordingly.
If this happens, identify which Redis instance on the machine is generating excessive traffic and saturating the bandwidth, then confirm whether the traffic spike reflects normal business. If it does, scale out promptly or migrate the instance so that other instances on the same machine are not affected.
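To find which instance is filling the link, you can sample each instance's byte counters twice and compare the rate with the NIC capacity. Redis exposes total_net_input_bytes and total_net_output_bytes in INFO stats; the Python helper below only does the arithmetic on two such samples and assumes the link capacity in megabits per second is known. The function name and the 1 Gbit/s example are assumptions.

```python
def link_utilization(bytes_before: int, bytes_after: int, interval_s: float, link_mbps: float) -> float:
    """Fraction of link capacity used between two byte-counter samples.

    bytes_before / bytes_after: cumulative counters (e.g. total_net_output_bytes)
    interval_s: seconds between the samples; link_mbps: capacity in Mbit/s.
    """
    if interval_s <= 0 or link_mbps <= 0:
        raise ValueError("interval_s and link_mbps must be positive")
    if bytes_after < bytes_before:
        raise ValueError("counter went backwards (was the instance restarted?)")
    bits_per_second = (bytes_after - bytes_before) * 8 / interval_s
    return bits_per_second / (link_mbps * 1_000_000)


if __name__ == "__main__":
    # 1,250,000,000 bytes sent in 10 s on a 1 Gbit/s NIC saturates the link.
    print(f"{link_utilization(0, 1_250_000_000, 10, 1000):.0%}")
```

Computing this per instance, for both input and output counters, shows which instance accounts for most of the machine's traffic.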
The above is the detailed content of How to solve common latency problems in Redis. For more information, please follow other related articles on the PHP Chinese website!