This article covers Redis persistence: why persistence is needed, RDB persistence, AOF persistence, and related topics.
Redis operates on data entirely in memory. If the process exits or the server goes down unexpectedly and there is no persistence mechanism, the data in Redis is lost and cannot be recovered. With persistence, Redis can restore data from previously persisted files the next time it starts. Redis supports two persistence mechanisms:
RDB: Generate a snapshot of the current data and save it on the hard disk.
AOF: Record every operation on data to the hard disk.
RDB (Redis DataBase) persistence generates a snapshot of all the current data in Redis and saves it on the hard disk. Snapshots of the in-memory data set are written to disk at specified time intervals; on recovery, the snapshot file is read directly back into memory. RDB persistence can be triggered manually or automatically.
Redis creates (forks) a separate child process for persistence. The child first writes the data to a temporary file; once persistence is complete, the temporary file replaces the previously persisted file. Throughout the process the main process performs no disk I/O, which keeps performance high. If large-scale data recovery is needed and losing the most recent changes is acceptable, RDB is more efficient than AOF. The disadvantage of RDB is that data written after the last snapshot may be lost.
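The fork, temporary file, and replace sequence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Redis is implemented: `background_save` and `demo` are hypothetical names, the "snapshot" is a JSON dump of a dict rather than the RDB format, and `os.fork` requires a POSIX system.

```python
import json
import os
import tempfile


def background_save(data: dict, rdb_path: str) -> int:
    """Fork a child that writes `data` to `rdb_path`; return the child's PID."""
    pid = os.fork()
    if pid == 0:
        # Child process: it sees the data exactly as it was at fork time.
        status = 1
        try:
            directory = os.path.dirname(os.path.abspath(rdb_path))
            fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="temp-")
            with os.fdopen(fd, "w") as tmp:
                json.dump(data, tmp)
                tmp.flush()
                os.fsync(tmp.fileno())
            # Swap the finished temporary file in place of the old snapshot.
            os.replace(tmp_path, rdb_path)
            status = 0
        finally:
            os._exit(status)  # never fall through into the parent's code
    return pid


def demo() -> dict:
    rdb_path = os.path.join(tempfile.mkdtemp(), "dump.json")
    data = {"a": 1, "b": 2}
    pid = background_save(data, rdb_path)
    data["c"] = 3  # the parent keeps changing data; the child's view is unaffected
    os.waitpid(pid, 0)
    with open(rdb_path) as snapshot:
        return json.load(snapshot)


if __name__ == "__main__":
    print(demo())
```

The write made by the parent after the fork does not appear in the snapshot, which is the same property that makes an RDB file a point-in-time copy.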
Both the save and bgsave commands can manually trigger RDB persistence.
save
The save command manually triggers RDB persistence, but it blocks the Redis service until RDB persistence is complete. When Redis stores a large amount of data, this causes a long block, so it is not recommended.
bgsave
Executing the bgsave command also manually triggers RDB persistence. Unlike the save command, it generally does not block the Redis service: the Redis process performs a fork operation to create a child process, and the child process carries out RDB persistence without blocking the Redis service process. The Redis service is blocked only during the fork phase, which is usually very short.
The specific process of the bgsave command is as follows:
1. After the bgsave command is executed, the Redis process first checks whether an RDB or AOF child process is currently running. If one is, the command returns immediately.
2. The Redis process performs a fork operation to create a child process. The Redis process is blocked during the fork.
3. Once the fork completes, the bgsave command returns. From then on the Redis process is no longer blocked and can respond to other commands.
4. The child process generates a snapshot file from the memory of the Redis process and replaces the original RDB file.
Automatic triggering also uses bgsave, to reduce blocking of the Redis process. So, under what circumstances is it triggered automatically?
1. save m n is set in the configuration file, which means that the bgsave operation is triggered automatically when the data has been modified n times within m seconds.
2. During replication, the master node automatically performs the bgsave operation and sends the generated RDB file to the slave node.
3. Executing certain commands also triggers the bgsave operation automatically.
4. For some commands, the bgsave operation is triggered automatically if AOF persistence is not enabled.
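The save m n rule can be expressed as a small check. The sketch below is an assumption about how such a rule can be evaluated (enough changes accumulated and enough time elapsed since the last save), not a copy of the Redis source. `should_trigger_bgsave` is a hypothetical helper, and the default save points are the ones quoted in the configuration later in this article.

```python
from typing import Iterable, Tuple

# (seconds, changes) pairs, matching "save 900 1", "save 300 10", "save 60 10000".
DEFAULT_SAVE_POINTS = ((900, 1), (300, 10), (60, 10000))


def should_trigger_bgsave(
    seconds_since_last_save: float,
    changes_since_last_save: int,
    save_points: Iterable[Tuple[int, int]] = DEFAULT_SAVE_POINTS,
) -> bool:
    """Return True if any save point is satisfied."""
    return any(
        changes_since_last_save >= changes and seconds_since_last_save > seconds
        for seconds, changes in save_points
    )
```

For example, a single change is enough once more than 900 seconds have passed, while 9 changes after 301 seconds satisfy none of the three default rules.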
Every bgsave requires a fork operation to create a child process, which is a heavyweight operation. Executing it frequently is too costly, so RDB cannot provide real-time or second-level persistence.
In addition, as Redis versions have evolved, the RDB format has changed, so RDB files written by a newer version may not be readable by an older version of Redis.
Snapshot period: although technicians can take a memory snapshot manually with the SAVE or BGSAVE command, most production environments configure conditions for periodic execution.
# Periodic execution conditions are set in the format
save <seconds> <changes>
# The default settings are:
save 900 1
save 300 10
save 60 10000
# The following setting disables RDB snapshots
save ""
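The three default settings above mean:
save 900 1: take a snapshot if at least 1 change was made within 900 seconds.
save 300 10: take a snapshot if at least 10 changes were made within 300 seconds.
save 60 10000: take a snapshot if at least 10000 changes were made within 60 seconds.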
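# File name
dbfilename dump.rdb
# Directory where the file is saved
dir ./
# Whether the main process stops accepting writes if persistence fails
stop-writes-on-bgsave-error yes
# Whether to compress the file
rdbcompression yes
# Whether to verify the checksum when loading
rdbchecksum yes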
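As long as the main thread only reads the data, it and the bgsave child process do not affect each other. However, if the main thread needs to modify a piece of data (for example, key-value pair C in the figure), that piece of data is copied to produce a replica. The bgsave child process then writes the replica to the RDB file, while the main thread can still modify the original data directly.
To address the fact that RDB is not suitable for real-time persistence, Redis provides AOF persistence.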
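AOF (Append Only File) persistence appends every write command to a log. To recover data, Redis simply re-executes the commands in the AOF file. AOF solves the real-time problem of data persistence and is currently the mainstream Redis persistence method.
Redis uses a "write-after" log: Redis first executes the command and writes the data to memory, and only then records the log. The log records each command Redis receives, saved in text form. PS: most databases use a write-ahead log (WAL); MySQL, for example, uses write-ahead logging and two-phase commit to keep data and logic consistent.
The AOF log, by contrast, is a write-after log: memory is written first, then the log.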
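Why use a write-after log?
Redis demands high performance, and writing the log after execution has two benefits:
1. Only commands that executed successfully are logged, so no extra syntax check is needed before logging.
2. Logging does not block the current write operation.
But this approach carries potential risks:
1. If Redis goes down after a command has executed but before it has been logged, that command is lost.
2. Writing the log may block the next operation, because the log is written in the main thread.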
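The AOF log records every Redis write command in three steps: command append (append), file write (write), and file sync (sync).
By default, AOF is not enabled in Redis. It can be enabled by configuring the redis.conf file. The AOF-related configuration is as follows: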
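A minimal sketch of the execute-then-log order and of recovery by replay is shown below. It is an illustration only: `TinyAof` is a hypothetical class, it supports just SET and DEL, and it stores one space-separated command per line instead of the real AOF encoding, so values containing spaces are not supported.

```python
import os
import tempfile


class TinyAof:
    """A toy key-value store that logs each write command after executing it."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.data = {}
        self.log = open(path, "a", encoding="utf-8")

    @staticmethod
    def _apply(data: dict, command) -> None:
        name, key, *args = command
        if name == "SET":
            data[key] = args[0]
        elif name == "DEL":
            data.pop(key, None)
        else:
            raise ValueError(f"unsupported command: {name}")

    def execute(self, *command: str) -> None:
        self._apply(self.data, command)           # run in memory first
        self.log.write(" ".join(command) + "\n")  # append: add to the buffer
        self.log.flush()                          # write: hand bytes to the OS
        os.fsync(self.log.fileno())               # sync: force them to disk

    def close(self) -> None:
        self.log.close()

    @classmethod
    def recover(cls, path: str) -> dict:
        """Rebuild the data set by replaying every logged command in order."""
        data = {}
        with open(path, encoding="utf-8") as log:
            for line in log:
                cls._apply(data, line.split())
        return data


def demo() -> dict:
    path = os.path.join(tempfile.mkdtemp(), "appendonly.aof")
    aof = TinyAof(path)
    aof.execute("SET", "a", "1")
    aof.execute("SET", "b", "2")
    aof.execute("DEL", "a")
    aof.close()
    return TinyAof.recover(path)
```

Because the command is applied before it is logged, a command that fails in `_apply` never reaches the file. This sketch calls `fsync` on every write, which is the most conservative of the possible sync policies.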
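# The appendonly parameter enables AOF persistence
appendonly no
# AOF file name; the default is appendonly.aof
appendfilename "appendonly.aof"
# The AOF file is saved in the same location as the RDB file; both are set by the dir parameter
dir ./
# Sync policy
# appendfsync always
appendfsync everysec
# appendfsync no
# Whether to skip fsync while a rewrite is in progress
no-appendfsync-on-rewrite no
# Rewrite trigger settings
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
# How to handle errors when loading the AOF
aof-load-truncated yes
# File rewrite strategy
aof-rewrite-incremental-fsync yes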
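The main AOF configuration items in Redis are as follows:
appendfsync: one of the most important AOF settings. It sets the policy for "actually performing" the synchronization of operation commands to the AOF file.
What does "actually performing" mean? Recall how the Linux operating system handles disk devices. To keep the I/O queue efficient, I/O requests submitted by applications are generally placed in the Linux page cache first, and the operating system then decides by its own policy when to actually write them to disk. Redis can call the fsync() function to force the data pending in the page cache onto the physical device. The downside is that calling fsync() frequently interferes with the operating system's own policy and may cause frequent I/O stalls.
Corresponding to the above, appendfsync can be set to one of three values: always (sync after every write command), everysec (sync once per second), and no (leave the timing to the operating system). The default is everysec.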
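The difference between the three appendfsync values comes down to when fsync() is called. The function below is a simplified model of that decision, written as an assumption for illustration. `should_fsync` is a hypothetical name, and the model ignores details such as which thread performs the sync.

```python
def should_fsync(policy: str, now: float, last_fsync: float) -> bool:
    """Decide whether to call fsync() after a write, given the sync policy.

    `now` and `last_fsync` are timestamps in seconds.
    """
    if policy == "always":
        return True                       # sync after every write command
    if policy == "everysec":
        return now - last_fsync >= 1.0    # sync at most once per second
    if policy == "no":
        return False                      # let the operating system decide
    raise ValueError(f"unknown appendfsync policy: {policy}")
```

With everysec, a write arriving 0.4 seconds after the last sync is left in the page cache, which is why up to about one second of data can be lost under that policy.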
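no-appendfsync-on-rewrite: the always and everysec settings make real I/O operations occur at high frequency and can even cause long stalls. The problem arises at the operating system level, so Redis, which runs on top of the operating system, cannot solve it. To mitigate it as far as possible, Redis provides this setting: when it is set to yes, Redis does not call fsync() on the AOF file while a background save or AOF rewrite is in progress, even though it keeps accepting write commands from clients during that time.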
auto-aof-rewrite-percentage: as mentioned above, in a production environment technicians cannot run the BGREWRITEAOF command to rewrite the AOF file at arbitrary times, so more often we rely on Redis's automatic AOF rewrite strategy. Redis provides two settings that trigger automatic rewriting of the AOF file:
auto-aof-rewrite-percentage: when the current AOF file has grown by this percentage relative to its size after the last rewrite, the AOF file is rewritten again. The default is 100, meaning a rewrite starts once the AOF file has grown by 100% of its size after the last rewrite, that is, doubled.
auto-aof-rewrite-min-size: the minimum AOF file size for an automatic rewrite. If the AOF file is smaller than this value, no rewrite is triggered. Note that auto-aof-rewrite-percentage and auto-aof-rewrite-min-size only control automatic rewriting of AOF files in Redis. A BGREWRITEAOF command issued manually by a technician is not subject to these two limits.
AOF records every write command in the AOF file, so the file grows larger and larger over time. Left unchecked, it will affect the Redis server and even the operating system. Moreover, the larger the AOF file, the slower data recovery becomes. To solve the problem of AOF file growth, Redis provides an AOF rewrite mechanism to "slim down" AOF files.
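The "slimming" effect can be illustrated with a toy rewrite: many historical commands collapse into one command per surviving key. This is a sketch under assumptions, not the Redis algorithm itself. `rewrite_commands` is a hypothetical function that supports only SET, INCR, and DEL, and it derives the current state by replaying the old commands, whereas Redis reads the current in-memory data directly.

```python
from typing import Iterable, List, Tuple


def rewrite_commands(commands: Iterable[Tuple[str, ...]]) -> List[Tuple[str, str, str]]:
    """Return the shortest SET-only log that rebuilds the same final data."""
    data = {}
    for name, key, *args in commands:
        if name == "SET":
            data[key] = args[0]
        elif name == "INCR":
            data[key] = str(int(data.get(key, "0")) + 1)
        elif name == "DEL":
            data.pop(key, None)
        else:
            raise ValueError(f"unsupported command: {name}")
    # One command per surviving key is enough to rebuild the final state.
    return [("SET", key, value) for key, value in data.items()]


old_log = [
    ("SET", "counter", "0"),
    ("INCR", "counter"),
    ("INCR", "counter"),
    ("SET", "tmp", "x"),
    ("DEL", "tmp"),
]
new_log = rewrite_commands(old_log)
```

Here five historical commands shrink to a single `SET counter 2`, and the deleted key leaves no trace in the new log.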
Illustration explaining AOF rewriting
Will AOF rewriting block?
The AOF rewrite is carried out by the background child process bgrewriteaof, which the main thread forks. The fork gives the bgrewriteaof child process a copy of the main thread's memory, which contains the latest data of the database. The child process can then convert the copied data into write operations one by one and record them in the rewrite log without affecting the main thread. Therefore, an AOF rewrite blocks the main thread only while the child process is being forked.
When will the AOF log be rewritten?
There are two configuration items to control the triggering of AOF rewrite:
auto-aof-rewrite-min-size: the minimum file size at which an AOF rewrite runs. The default is 64MB.
auto-aof-rewrite-percentage: the difference between the current AOF file size and the AOF file size after the last rewrite, divided by the AOF file size after the last rewrite. In other words, it is the ratio of the growth since the last rewrite to the file size after the last rewrite.
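The two settings combine into a simple check, sketched below. The function names are hypothetical and the comparison operators are an assumption about the boundary cases. The defaults mirror the configuration shown earlier (100 percent and 64MB).

```python
MB = 1024 * 1024


def aof_growth_percentage(current_size: int, base_size: int) -> float:
    """Growth since the last rewrite, as a percentage of the post-rewrite size."""
    base = base_size if base_size > 0 else 1  # avoid dividing by zero
    return (current_size - base) * 100 / base


def should_auto_rewrite(
    current_size: int,
    base_size: int,
    percentage: int = 100,
    min_size: int = 64 * MB,
) -> bool:
    """Return True if both automatic rewrite conditions are met."""
    if percentage == 0:
        return False  # a percentage of zero turns automatic rewriting off
    if current_size <= min_size:
        return False  # file is still too small to be worth rewriting
    return aof_growth_percentage(current_size, base_size) >= percentage
```

For example, a file that was 64MB after the last rewrite triggers a rewrite when it reaches 128MB, but not at 100MB, where growth is only about 56 percent.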
What should I do if new data is written when rewriting the log?
The rewriting process can be summarized as "one copy, two logs". After the child process has been forked, if new data is written during the rewrite, the main thread records the command in two in-memory log buffers. If the AOF write-back policy is configured as always, the command is written straight back to the old log file, and a copy of the command is saved in the AOF rewrite buffer. These operations have no effect on the new log file. (Old log file: the log file used by the main thread; new log file: the log file used by the bgrewriteaof process.)
After the bgrewriteaof child process finishes rewriting the log file, it notifies the main thread that the rewrite is complete, and the main thread appends the commands in the AOF rewrite buffer to the end of the new log file. Under high concurrency the AOF rewrite buffer may have grown very large, so this step can block. Redis later used Linux pipes to replay these commands while the rewrite is still in progress, so that only a small amount of remaining data needs to be replayed after the rewrite finishes. Finally, the file is switched atomically by renaming it.
If Redis goes down during an AOF rewrite, the log file has not yet been switched, so the old log file is still used when restoring data.
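The final steps (write the new log, append the rewrite buffer, then switch files by renaming) can be sketched as follows. This is a simplified, single-process illustration with hypothetical names (`finish_rewrite`, `demo`) and a one-command-per-line text format. It does not model the child process or the pipe-based replay.

```python
import os
import tempfile
from typing import Dict, List


def finish_rewrite(snapshot: Dict[str, str], rewrite_buffer: List[str], aof_path: str) -> None:
    """Build the new log in a temporary file, then atomically replace the old log."""
    directory = os.path.dirname(os.path.abspath(aof_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="temp-rewriteaof-")
    with os.fdopen(fd, "w", encoding="utf-8") as new_log:
        # The data as it was at fork time, one command per key.
        for key, value in snapshot.items():
            new_log.write(f"SET {key} {value}\n")
        # Commands that arrived while the rewrite was running.
        for command in rewrite_buffer:
            new_log.write(command + "\n")
        new_log.flush()
        os.fsync(new_log.fileno())
    # Until this rename succeeds, the old log file is still the one on disk.
    os.replace(tmp_path, aof_path)


def demo() -> List[str]:
    aof_path = os.path.join(tempfile.mkdtemp(), "appendonly.aof")
    with open(aof_path, "w", encoding="utf-8") as old_log:
        old_log.write("SET a 1\nSET a 2\nSET a 3\n")
    snapshot = {"a": "3"}            # state captured when the rewrite started
    rewrite_buffer = ["SET b 1"]     # written during the rewrite
    with open(aof_path, "a", encoding="utf-8") as old_log:
        old_log.write("SET b 1\n")   # the old log keeps receiving commands too
    finish_rewrite(snapshot, rewrite_buffer, aof_path)
    with open(aof_path, encoding="utf-8") as log:
        return log.read().splitlines()
```

If the process stopped at any point before `os.replace`, the old log would remain intact, which matches the recovery behaviour described above.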
Summary operation:
Note
The concepts of process and thread are somewhat mixed here. The background bgrewriteaof process has only one thread doing the work, and the main thread is the Redis working process, which is also single-threaded. The point is that after the Redis main process forks a background process, the background process's work is independent of the main process and does not block the main thread.
How does the main thread fork the child process and copy the memory data?
Fork uses the copy-on-write mechanism provided by the operating system, which avoids the blocking that copying a large amount of memory to the child process all at once would cause. When a child process is forked, it copies the parent's page table, that is, the virtual-to-physical mapping (the index table mapping virtual memory to physical memory), but does not copy the physical memory. Copying the page table consumes a lot of CPU, and the main thread is blocked until the copy completes. The blocking time depends on the amount of data in memory: the more data, the larger the page table. After the copy completes, the parent and child processes share the same physical memory.
But the main process can still write data, and at that point the data in physical memory is copied. As shown below (process 1 is the main process, process 2 is the child process):
When the main process writes data that happens to be in page c, the operating system creates a copy of this page (a copy of page c): it copies the physical data of the page and maps the copy to the main process, while the child process still uses the original page c.
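The page-table behaviour described above can be modelled with a toy simulation. Everything here is hypothetical (`Memory`, `Process`, and the page names), and real copy-on-write happens inside the operating system kernel, not in application code. The model only shows which frames are shared and which are copied.

```python
class Memory:
    """Physical memory: frames plus a count of page tables that map each one."""

    def __init__(self) -> None:
        self.frames = {}
        self.refcount = {}
        self._next_id = 0

    def allocate(self, contents: str) -> int:
        frame_id = self._next_id
        self._next_id += 1
        self.frames[frame_id] = contents
        self.refcount[frame_id] = 1
        return frame_id


class Process:
    def __init__(self, memory: Memory, page_table=None) -> None:
        self.memory = memory
        self.page_table = page_table if page_table is not None else {}

    def map_page(self, page: str, contents: str) -> None:
        self.page_table[page] = self.memory.allocate(contents)

    def fork(self) -> "Process":
        # Only the page table is copied; both processes share every frame.
        for frame_id in self.page_table.values():
            self.memory.refcount[frame_id] += 1
        return Process(self.memory, dict(self.page_table))

    def read(self, page: str) -> str:
        return self.memory.frames[self.page_table[page]]

    def write(self, page: str, contents: str) -> None:
        frame_id = self.page_table[page]
        if self.memory.refcount[frame_id] > 1:
            # Shared frame: copy it and remap this process to the private copy.
            self.memory.refcount[frame_id] -= 1
            frame_id = self.memory.allocate(self.memory.frames[frame_id])
            self.page_table[page] = frame_id
        self.memory.frames[frame_id] = contents


memory = Memory()
main = Process(memory)
main.map_page("a", "A")
main.map_page("c", "C")
child = main.fork()
main.write("c", "C2")  # triggers a copy of page c for the main process only
```

After the write, page a is still shared by both processes, while page c exists twice: the child keeps the original contents and the main process sees the new value.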
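During the whole log rewrite process, where is the main thread blocked?
1. When forking the child process, while the page table is copied.
2. When the main process writes data during the rewrite and the operating system has to copy the affected memory pages before the write proceeds.
3. When the child process finishes and the main thread appends the AOF rewrite buffer to the new log file.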
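Why does AOF rewriting not reuse the original AOF log?
Writing to a separate file means the parent and child processes do not compete for the same file, and if the rewrite fails the original AOF log is left untouched and can still be used for recovery.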
Redis 4.0 proposes a method of mixed use of AOF logs and memory snapshots. Simply put, memory snapshots are executed at a certain frequency, and between two snapshots, AOF logs are used to record all command operations during this period.
In this way, snapshots do not need to be executed very frequently, which avoids the impact of frequent forks on the main thread. Moreover, the AOF log only records operations between two snapshots, which means that there is no need to record all operations. Therefore, the file will not be too large and rewriting overhead can be avoided.
As shown in the figure below, the modifications at T1 and T2 are recorded in the AOF log. When the second full snapshot is taken, the AOF log can be cleared, because all modifications up to that point are already recorded in the snapshot and the log is no longer needed for recovery.
This method combines the fast recovery of RDB files with the simplicity of AOF recording only operation commands, and it is widely used in practice.
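Recovery under the mixed approach amounts to loading the latest snapshot and then replaying only the commands logged after it. The sketch below illustrates that idea with a hypothetical `recover` function and a command format limited to SET and DEL. It is not the on-disk format Redis uses.

```python
from typing import Dict, Iterable, Tuple


def recover(snapshot: Dict[str, str], aof_tail: Iterable[Tuple[str, ...]]) -> Dict[str, str]:
    """Start from the snapshot, then apply the commands recorded after it."""
    data = dict(snapshot)  # fast path: bulk-load the snapshot
    for name, key, *args in aof_tail:
        if name == "SET":
            data[key] = args[0]
        elif name == "DEL":
            data.pop(key, None)
        else:
            raise ValueError(f"unsupported command: {name}")
    return data


snapshot_at_t0 = {"a": "1", "b": "2"}
changes_after_t0 = [("SET", "a", "9"), ("DEL", "b"), ("SET", "c", "3")]
restored = recover(snapshot_at_t0, changes_after_t0)
```

Only three commands are replayed here regardless of how long the data set took to build, which is where the recovery speed-up comes from.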
After the data backup and persistence are completed, how do we restore data from these persistent files? If there are both RDB files and AOF files on a server, which one should be loaded?
In fact, to recover data from these files you only need to restart Redis. On startup, Redis loads the AOF file first if AOF is enabled and an AOF file exists; otherwise it loads the RDB file.
So why is AOF loaded first? Because the data saved by AOF is more complete: from the analysis above, with the everysec policy AOF loses at most about 1 second of data.
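The load order can be written as a small decision function. This is a sketch of the flow as it is commonly described, with a hypothetical name (`choose_file_to_load`). The branch where AOF is enabled but no AOF file exists is an assumption, since the exact behaviour there may differ between Redis versions.

```python
from typing import Optional


def choose_file_to_load(aof_enabled: bool, aof_exists: bool, rdb_exists: bool) -> Optional[str]:
    """Return which persistence file to load at startup, or None for an empty start."""
    if aof_enabled and aof_exists:
        return "aof"  # preferred: usually the more complete record
    if rdb_exists:
        return "rdb"  # assumption: fall back to the snapshot otherwise
    return None       # nothing to load
```

When both files are present and AOF is enabled, the function returns "aof", matching the priority explained above.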