
Fully master Redis persistence: RDB and AOF

WBOY
2022-06-16 12:10:46

This article covers Redis persistence: why persistence is needed, how RDB persistence works, how AOF persistence works, how the two can be combined, and how data is restored from the persisted files.


1. Why is persistence needed?

Redis operates on data entirely in memory. If the process exits or the server goes down unexpectedly and there is no persistence mechanism, the data in Redis is lost and cannot be recovered. With persistence, Redis can use the previously persisted files to recover its data the next time it starts. Redis supports two persistence mechanisms:

RDB: Generate a snapshot of the current data and save it on the hard disk.
AOF: Record every operation on data to the hard disk.

2. RDB persistence

RDB (Redis DataBase) persistence generates a snapshot of all the data currently in Redis and saves it to disk. In other words, the in-memory data set is written to disk as a snapshot at specified intervals, and on recovery the snapshot file is read straight back into memory. RDB persistence can be triggered manually or automatically.

1. How is backup performed?

Redis creates (forks) a separate child process for persistence. The child first writes the data to a temporary file, and once persistence has finished, the temporary file replaces the previously persisted file. The main process performs no disk I/O for the snapshot during the whole procedure, which keeps performance very high. If a large data set has to be recovered and losing the most recent changes is acceptable, RDB is more efficient than AOF. The disadvantage of RDB is that data written after the last snapshot may be lost.
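The "write to a temporary file, then replace the previous file" step can be sketched in Python as follows. This is a minimal illustration, not Redis's actual code: the function names are hypothetical, and JSON stands in for the binary RDB format.

```python
import json
import os
import tempfile


def write_snapshot(data: dict, path: str) -> None:
    """Write `data` to `path` through a temporary file and an atomic rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Keep the temporary file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(prefix="temp-", suffix=".rdb", dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk before the swap
        os.replace(tmp_path, path)  # atomically replaces the previous snapshot
    except BaseException:
        # On any failure the previous snapshot at `path` is left untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_snapshot(path: str) -> dict:
    """Read a snapshot back into memory."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

Because the old file is only replaced after the new one is complete, a crash in the middle of a save leaves the last good snapshot in place.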

2. RDB persistence process

[Figure: RDB persistence process]

3. Manual trigger

Both the save and bgsave commands can trigger RDB persistence manually.

  1. save
    Executing the save command manually triggers RDB persistence, but save blocks the Redis service until RDB persistence is complete. When Redis holds a large amount of data this causes long blocking, so it is not recommended.
  2. bgsave
    Executing the bgsave command also manually triggers RDB persistence. Unlike save, it generally does not block the Redis service: the Redis process performs a fork operation to create a child process, and the child process is responsible for RDB persistence. The Redis service is blocked only during the fork phase, which is usually very short.
    The bgsave command proceeds as follows:
    1. The bgsave command is executed. The Redis process first checks whether an RDB or AOF child process is already running. If one exists, the bgsave command returns immediately.
    2. The Redis process performs a fork operation to create a child process. The Redis process is blocked during the fork operation.
    3. Once the fork completes, the bgsave command returns. From then on the Redis process is no longer blocked and can respond to other commands.
    4. The child process generates a snapshot file from the memory of the Redis process and replaces the original RDB file.
    5. The child process then signals the main process that RDB persistence is complete, and the main process updates the related statistics (the rdb_* fields under info Persistence).
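The bgsave steps above can be imitated with a small Python sketch built on os.fork (POSIX only). The class name, the return strings and the use of JSON are assumptions made for illustration; Redis itself is written in C and uses its own RDB format.

```python
import json
import os


class BackgroundSaver:
    """Toy imitation of the bgsave flow; not Redis's implementation."""

    def __init__(self):
        self.child_pid = None  # PID of the running background save, if any

    def bgsave(self, data: dict, path: str) -> str:
        if self.child_pid is not None:
            return "background save already in progress"  # step 1: return at once
        pid = os.fork()  # step 2: the only point where the parent is blocked
        if pid == 0:
            # Child process: it sees the data exactly as it was at fork time.
            status = 1
            try:
                tmp_path = f"{path}.tmp-{os.getpid()}"
                with open(tmp_path, "w", encoding="utf-8") as f:
                    json.dump(data, f)
                os.replace(tmp_path, path)  # step 4: replace the old snapshot
                status = 0
            finally:
                os._exit(status)  # step 5: the exit status tells the parent the result
        self.child_pid = pid
        return "background saving started"  # step 3: the parent keeps serving commands

    def wait(self) -> bool:
        """Reap the child and report whether the save succeeded."""
        if self.child_pid is None:
            return False
        _, status = os.waitpid(self.child_pid, 0)
        self.child_pid = None
        return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

If the parent changes `data` after `bgsave` returns, the file written by the child still contains the values from the moment of the fork, which is the snapshot behaviour the steps describe.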

4. Automatic triggering

Besides the manual commands above, Redis can trigger RDB persistence automatically. Automatically triggered RDB persistence uses bgsave to reduce blocking of the Redis process. Under what circumstances is it triggered automatically?

  1. A save rule is set in the configuration file, such as save m n, which means that a bgsave operation is triggered automatically when the data is modified at least n times within m seconds.
  2. When a slave node performs full replication, the master node automatically performs a bgsave operation and sends the generated RDB file to the slave node.
  3. When the debug reload command is executed, an RDB save is also triggered automatically.
  4. When the shutdown command is executed and AOF persistence is not enabled, an RDB save is triggered automatically.

5. RDB advantages

The RDB file is a compact binary compressed file, which is a snapshot of all Redis data at a certain point in time. Therefore, the speed of data recovery using RDB is much faster than that of AOF, which is very suitable for scenarios such as backup, full replication, and disaster recovery.

6. Disadvantages of RDB

Every bgsave requires a fork operation to create a child process, which is a heavyweight operation. Running it frequently is too costly, so RDB cannot provide real-time or second-level persistence.

In addition, as Redis has evolved, the RDB format has gone through several versions, and older versions of Redis may be unable to read RDB files written by newer versions.

7. Configuring RDB in redis.conf

Snapshot period: although a memory snapshot can be taken manually with the SAVE or BGSAVE command, most production environments configure conditions for periodic execution.

  • Default snapshot-period settings in Redis
# Format of a periodic execution condition
save <seconds> <changes>

# Default settings:
save 900 1
save 300 10
save 60 10000

# The following setting disables RDB snapshots
save ""

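The three default rules above mean:

  • If at least 1 key changes within 900 seconds, take a snapshot;
  • If at least 10 keys change within 300 seconds, take a snapshot;
  • If at least 10000 keys change within 60 seconds, take a snapshot. You can adjust these rules to match your actual request load.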
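As a rough illustration of how such rules combine, the following Python sketch checks a list of (seconds, changes) pairs. The function name is hypothetical, and it assumes each rule is read as "at least that many seconds since the last save and at least that many changes"; it is not taken from the Redis source.

```python
from typing import Iterable, Tuple

# (seconds, changes) pairs matching the three default rules quoted above
DEFAULT_SAVE_RULES = [(900, 1), (300, 10), (60, 10000)]


def should_snapshot(seconds_since_last_save: float,
                    changes_since_last_save: int,
                    rules: Iterable[Tuple[int, int]] = DEFAULT_SAVE_RULES) -> bool:
    """Return True if any `save <seconds> <changes>` rule is satisfied."""
    return any(
        seconds_since_last_save >= seconds and changes_since_last_save >= changes
        for seconds, changes in rules
    )
```

The rules are alternatives: a snapshot is due as soon as any one of them is met, and an empty rule list (the equivalent of save "") never triggers one.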
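  • Other related settings
# File name
dbfilename dump.rdb

# Directory where the file is saved
dir ./

# Whether the main process stops accepting writes if persistence fails
stop-writes-on-bgsave-error yes

# Whether to compress
rdbcompression yes

# Whether to verify the checksum on load
rdbchecksum yes
  • dbfilename: the name of the RDB file on disk.
  • dir: the directory where the RDB file is stored. The default is "./", that is, the working directory of the Redis service.
  • stop-writes-on-bgsave-error: the ability of the main process to keep accepting client writes while a snapshot is in progress applies only when snapshots are working normally. If a snapshot fails (for example because the operating system user lacks permissions or the disk is full), Redis rejects write operations. The main purpose is to make operators notice the failure immediately and fix it. In some scenarios you may need to change this behaviour, which is what this parameter is for. It defaults to yes; to turn the feature off so that Redis keeps accepting writes even after a snapshot error, set it to no.
  • rdbcompression: enables LZF compression for string-type data when it is written to the snapshot file on disk. The official Redis recommendation is to leave this set to yes, because "it's almost always a win".
  • rdbchecksum: since version 5 of the RDB format, a 64-bit CRC checksum is placed at the end of the RDB file so that the integrity of the whole file can be verified. This costs roughly 10% in performance but gives higher data reliability. If your Redis service needs the highest possible performance, you can set this option to no.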

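8. A deeper look at RDB

  • In production the memory allocated to Redis is usually large (for example 6GB), so writing the in-memory data to disk can take a fairly long time, and Redis will normally keep receiving write requests during that time. How is data consistency guaranteed?
    The core idea in RDB is copy-on-write, which ensures that, from the snapshot's point of view, the data being written to disk does not change in memory while the snapshot is in progress. In a normal snapshot operation, the Redis main process forks a new snapshot process dedicated to this work, so the Redis service keeps responding to clients, including write requests. Meanwhile, any data modified during this period is written to a fresh copy of the affected memory, so the snapshot process still sees the data exactly as it was when the snapshot started.
    For example: if the main thread only reads the data (such as key-value pair A in the figure), the main thread and the bgsave child process do not affect each other. But if the main thread modifies a piece of data (such as key-value pair C in the figure), that data is copied first. The bgsave child process writes the version as it was at snapshot time to the RDB file, while the main thread goes on modifying its own version.
    [Figure: copy-on-write during bgsave; key-value pair A is only read, key-value pair C is modified]
  • What happens if the service crashes while a snapshot is in progress?
    Until all the data has been written to disk, the snapshot does not count as successful. If the service crashes, the last complete RDB snapshot file is used to restore the in-memory data. In other words, a snapshot operation must not affect the previous backup. Redis creates a temporary file on disk for the snapshot and replaces the previous backup with it only after the operation succeeds.
  • Can a snapshot be taken every second?
    For snapshots, "burst shooting" means taking snapshots continuously. The interval between snapshots becomes very short, so even if the machine goes down at some point, little data is lost because the previous snapshot has only just been taken. The interval between snapshots is therefore critical.
    As the figure below shows, we take one snapshot at time T0 and another at time T0+t, and data blocks 5 and 9 are modified in between. If the machine goes down during the interval t, we can only recover from the snapshot taken at T0. The modified values of data blocks 5 and 9 were not captured in any snapshot and cannot be recovered.
    [Figure: snapshots at T0 and T0+t; data blocks 5 and 9 are modified in between]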

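Because RDB is not suitable for real-time persistence, Redis provides AOF persistence to address the problem.

3. AOF persistence

AOF (Append Only File) persistence appends every write command to a log; to recover data, Redis simply re-executes the commands in the AOF file. AOF addresses the real-time aspect of data persistence and is currently the mainstream Redis persistence method.

Redis uses a write-after log: Redis executes the command first, writing the data to memory, and only then records the log. The log records each write command Redis receives, saved in text form. PS: most databases use a write-ahead log (WAL); MySQL, for example, uses write-ahead logging and two-phase commit to keep data and logic consistent.

The AOF log, by contrast, is written after the fact: memory first, then the log.
[Figure: write-after logging; the command is executed in memory first, then written to the log]
Why use a write-after log?
Redis aims for high performance, and writing the log afterwards has two benefits:

  • Avoids extra checking overhead: when Redis records a command in the AOF, it does not first check the command's syntax. If the log were written before the command was executed, the log could contain invalid commands, and Redis could fail when using the log to recover data. Logging only after execution means that only commands that ran successfully are recorded.
  • Does not block the current write operation.

But this approach has potential risks:

  • If the server goes down after a command has executed but before it is logged, that data is lost.
  • The log is written to disk by the main thread; under heavy disk pressure the write becomes slow and blocks subsequent operations.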
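The "execute first, log afterwards" order can be illustrated with a toy key-value store in Python. The class and its two commands are hypothetical and only mimic the idea; they are not how Redis parses or stores commands.

```python
class MiniStore:
    """Toy key-value store that logs a command only after executing it."""

    def __init__(self):
        self.data = {}
        self.log = []  # stands in for the AOF

    def execute(self, *command: str) -> str:
        if not command:
            return "ERR empty command"
        name, args = command[0].upper(), command[1:]
        # 1. Execute first: invalid commands fail here and never reach the log.
        if name == "SET" and len(args) == 2:
            self.data[args[0]] = args[1]
        elif name == "DEL" and len(args) == 1:
            self.data.pop(args[0], None)
        else:
            return "ERR unknown command or wrong number of arguments"
        # 2. Only then record the command.
        self.log.append((name, *args))
        return "OK"

    @classmethod
    def replay(cls, log):
        """Rebuild a store by re-executing a log, as done on recovery."""
        store = cls()
        for command in log:
            store.execute(*command)
        return store
```

A mistyped command such as `SETT a 2` returns an error and leaves the log unchanged, so replaying the log never meets an invalid entry. The flip side is the first risk listed above: a crash between step 1 and step 2 loses that command.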

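1. How is AOF implemented?

The AOF log records every Redis write command in three steps: command append (append), file write (write), and file sync (sync).

  • Command append: when AOF persistence is enabled, after executing a write command the server appends it, in protocol format, to the server's aof_buf buffer.
  • File write and sync: to decide when the contents of aof_buf are written to the AOF file, Redis provides three write-back strategies:
  • Always (synchronous write-back): after each write command executes, the log is synchronously written back to disk straight away;
  • Everysec (write-back every second): after each write command executes, the log is only written to the AOF file's memory buffer, and the buffer contents are written to disk once per second;
  • No (write-back controlled by the operating system): after each write command executes, the log is only written to the AOF file's memory buffer, and the operating system decides when to write the buffer contents back to disk.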
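The three steps and the three strategies can be sketched in Python as below. The class and method names are hypothetical, and the once-per-second sync is done inline for simplicity, which is an assumption of this sketch rather than a description of Redis internals. The encoder follows the Redis protocol (RESP) layout of an array of bulk strings.

```python
import os
import time


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        raw = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(raw), raw))
    return b"".join(parts)


class AofWriter:
    def __init__(self, path: str, policy: str = "everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.buffer = bytearray()  # plays the role of aof_buf
        self.file = open(path, "ab")
        self.last_fsync = time.monotonic()
        self.fsync_count = 0

    def append(self, *args: str) -> None:
        """Command append: queue the encoded command in the buffer."""
        self.buffer += encode_command(*args)

    def flush(self) -> None:
        """File write and file sync, applied according to the policy."""
        wrote = bool(self.buffer)
        if wrote:
            self.file.write(self.buffer)
            self.file.flush()  # hand the bytes to the operating system
            self.buffer.clear()
        now = time.monotonic()
        due = self.policy == "everysec" and now - self.last_fsync >= 1.0
        if (self.policy == "always" and wrote) or due:
            os.fsync(self.file.fileno())  # force the data onto the disk
            self.last_fsync = now
            self.fsync_count += 1
        # policy "no": no fsync here; the operating system decides when to write back

    def close(self) -> None:
        self.flush()
        self.file.close()
```

With `always`, every flush that wrote something pays for an fsync; with `no`, the data may sit in the operating system's cache for an unknown time; `everysec` bounds the exposure to roughly one second of writes.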

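2. Configuring AOF in redis.conf

By default, AOF is not enabled in Redis. AOF persistence can be enabled in the redis.conf file. The AOF-related settings are:

# The appendonly parameter turns AOF persistence on (set it to yes to enable)
appendonly no

# Name of the AOF file; the default is appendonly.aof
appendfilename "appendonly.aof"

# The AOF file is saved in the same location as the RDB file; both are set by the dir parameter
dir ./

# Sync policy
# appendfsync always
appendfsync everysec
# appendfsync no

# Whether to suspend fsync while an AOF rewrite is in progress
no-appendfsync-on-rewrite no

# Rewrite trigger settings
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

# How to handle errors when loading the AOF
aof-load-truncated yes

# File rewrite strategy
aof-rewrite-incremental-fsync yes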

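The main AOF settings in Redis are described below.
appendfsync: one of the most important AOF settings. It sets the policy for when write commands are "actually performed" on the AOF file, that is, synced to it.

What does "actually performed" mean? Recall how Linux handles disk I/O: to keep the I/O queue efficient, I/O requests submitted by applications are generally placed in the Linux page cache first, and the operating system's own policy decides when they are actually written to disk. Redis can call the fsync() function to force the data waiting in the page cache onto the physical device. The drawback is that calling fsync() frequently overrides the operating system's policy and can cause frequent I/O stalls.

As described in the previous section, appendfsync accepts three values: always, everysec, and no. The default is everysec.

no-appendfsync-on-rewrite: with always and everysec, real I/O operations occur at high frequency and can even cause long stalls. The problem arises at the operating system level, so Redis, which runs on top of the operating system, cannot solve it. To mitigate it as far as possible, Redis provides this setting: when it is set to yes, the main process does not call fsync() on the AOF file while a background save or AOF rewrite is running (Redis still accepts client write commands during that time).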

Automatic rewriting: in a production environment, technicians cannot be on hand to run the BGREWRITEAOF command whenever an AOF file needs rewriting, so in most cases we rely on Redis's automatic AOF rewrite policy. Redis provides two settings that trigger an automatic rewrite:
auto-aof-rewrite-percentage: starts a rewrite when the current AOF file has grown by a given percentage over its size after the last rewrite. The default value is 100, which means a rewrite starts once the AOF file has grown by 100% over its size after the last rewrite, that is, once it has doubled.
auto-aof-rewrite-min-size: sets the minimum AOF file size at which a rewrite may start. If the AOF file is smaller than this value, no rewrite is triggered. Note that auto-aof-rewrite-percentage and auto-aof-rewrite-min-size only control automatic rewriting. A manual BGREWRITEAOF call is not subject to these two limits.

3. In-depth understanding of AOF rewriting

AOF records every write command in the AOF file, so the file grows larger and larger over time. Left unchecked, this affects the Redis server and even the operating system, and the larger the AOF file, the slower data recovery becomes. To keep the file size under control, Redis provides an AOF rewrite mechanism that "slims down" the AOF file.

[Figure: illustration of AOF rewriting]
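The "slimming" effect can be shown with a small Python sketch. The helper names and the three toy commands are hypothetical; the point is only that the rewritten log is generated from the current data, one command per key, rather than by copying the old log.

```python
def apply_log(log):
    """Replay a list of commands and return the resulting key-value state."""
    state = {}
    for command in log:
        name, args = command[0].upper(), command[1:]
        if name == "SET":
            state[args[0]] = args[1]
        elif name == "INCR":
            state[args[0]] = str(int(state.get(args[0], "0")) + 1)
        elif name == "DEL":
            state.pop(args[0], None)
        else:
            raise ValueError(f"unsupported command: {name}")
    return state


def rewrite_from_memory(state):
    """Emit the shortest log that rebuilds `state`: one SET per key."""
    return [("SET", key, value) for key, value in state.items()]
```

Seven commands that overwrite, increment and delete keys can collapse to two SET commands, and replaying the short log yields exactly the same data.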
Will AOF rewriting block?
The AOF rewrite is carried out by the background child process bgrewriteaof. The main thread forks the bgrewriteaof child process; the fork gives the child a copy of the main thread's memory, which contains the latest data in the database. The bgrewriteaof child process can then turn that data into write operations one by one and record them in the rewrite log without affecting the main thread. So an AOF rewrite blocks the main thread only while the child process is being forked.

When will the AOF log be rewritten?
Two configuration items control when an AOF rewrite is triggered:
auto-aof-rewrite-min-size: the minimum file size for an AOF rewrite to run. The default is 64MB.
auto-aof-rewrite-percentage: the growth ratio that triggers a rewrite. The ratio is the difference between the current AOF file size and the AOF file size after the last rewrite, divided by the AOF file size after the last rewrite. In other words, it is how much the AOF file has grown since the last rewrite, relative to its size after that rewrite.
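The two conditions can be combined in a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, and treating a percentage of 0 as "automatic rewrite disabled" and a missing base size as 1 byte reflects my understanding of Redis behaviour rather than anything stated in this article.

```python
def should_auto_rewrite(current_size: int,
                        base_size: int,
                        percentage: int = 100,
                        min_size: int = 64 * 1024 * 1024) -> bool:
    """Decide whether an automatic AOF rewrite is due.

    current_size: current AOF file size in bytes
    base_size:    AOF file size right after the last rewrite, in bytes
    """
    if percentage == 0:
        return False  # assumption: 0 turns automatic rewriting off
    if current_size <= min_size:
        return False  # the file is still too small to be worth rewriting
    base = base_size if base_size > 0 else 1  # assumption: avoid dividing by zero
    growth = (current_size - base) * 100 / base  # growth since the last rewrite, in percent
    return growth >= percentage
```

For example, with the defaults a 128MB file whose size after the last rewrite was 64MB has grown by exactly 100% and qualifies, while a 100MB file with the same base has grown by only about 56% and does not.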

What happens if new data is written while the log is being rewritten?
The rewrite process can be summarized as "one copy, two logs". Once the child process has been forked and the rewrite is under way, the main thread records every newly written command in two AOF memory buffers. For example, if the AOF write-back policy is always, the command is written straight to the old log file, and a copy of the command is also kept in the AOF rewrite buffer. Neither operation touches the new log file. (Old log file: the log file used by the main thread. New log file: the log file used by the bgrewriteaof process.)

When the bgrewriteaof child process finishes rewriting the log file, it notifies the main thread that the rewrite is complete, and the main thread appends the commands in the AOF rewrite buffer to the end of the new log file. Under high concurrency the AOF rewrite buffer can grow very large, and appending it can block the main thread. Redis later used Linux pipes so that the buffered commands are replayed while the rewrite is still running, leaving only a small amount of data to replay once the rewrite finishes. Finally, the new file is renamed over the old one, which makes the switch atomic.

If the server goes down during an AOF rewrite, the log file has not been switched yet, so the old log file is still used when restoring data.

Summary of the steps:

  • The main thread forks the child process, which rewrites the AOF log
  • After the child process finishes rewriting the log, the main thread appends the AOF rewrite buffer
  • The log file is replaced
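The three steps can be walked through with a single-process Python sketch. Everything here is a simplification chosen for illustration: the class name is hypothetical, a dictionary copy stands in for the memory copy made by fork, the log uses one "SET key value" line per command (keys without spaces), and the child's work is done inline.

```python
import os


class AofRewriteDemo:
    def __init__(self, directory: str):
        self.data = {}
        self.aof_path = os.path.join(directory, "appendonly.aof")
        self.rewrite_buffer = None  # a list while a rewrite is in progress
        self._snapshot = None
        open(self.aof_path, "a", encoding="utf-8").close()

    def set(self, key: str, value: str) -> None:
        self.data[key] = value
        line = f"SET {key} {value}\n"
        with open(self.aof_path, "a", encoding="utf-8") as f:
            f.write(line)  # the old log keeps receiving every command
        if self.rewrite_buffer is not None:
            self.rewrite_buffer.append(line)  # second copy, destined for the new log

    def start_rewrite(self) -> None:
        self._snapshot = dict(self.data)  # stands in for the copy the child gets from fork
        self.rewrite_buffer = []

    def finish_rewrite(self) -> None:
        if self._snapshot is None:
            raise RuntimeError("no rewrite in progress")
        tmp_path = self.aof_path + ".rewrite"
        with open(tmp_path, "w", encoding="utf-8") as f:
            for key, value in self._snapshot.items():  # the child's work: one command per key
                f.write(f"SET {key} {value}\n")
            f.writelines(self.rewrite_buffer)  # the main thread appends the buffered commands
        os.replace(tmp_path, self.aof_path)  # the rename makes the switch atomic
        self.rewrite_buffer = None
        self._snapshot = None


def replay(path: str) -> dict:
    """Rebuild the data by replaying the log file."""
    data = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            _, key, value = line.rstrip("\n").split(" ", 2)
            data[key] = value
    return data
```

Commands that arrive between `start_rewrite` and `finish_rewrite` end up at the tail of the new file, so replaying it gives the same data as the live store, and the old file stays valid until the moment of the rename.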

Note

The concepts of process and thread are easy to confuse here. The background bgrewriteaof process has only one thread doing the work, and the main thread is the Redis server process, which is also single-threaded. The point is that after the Redis main process forks a background process, the work done by the background process is independent of the main process and does not block the main thread.

How does the main thread fork the child process and copy the memory data?
Fork uses the copy-on-write mechanism provided by the operating system, which avoids the blocking that copying a large amount of memory to the child process all at once would cause. When a child process is forked, it copies the parent's page table, that is, the mapping between virtual memory and physical memory, but not the physical memory itself. This copy consumes a lot of CPU, and the main thread is blocked until it completes. The blocking time depends on the amount of data in memory: the more data, the larger the page table. Once the copy is complete, the parent and child processes share the same physical memory.

The main process can still write data, however, and when it does, the data in physical memory is copied (in the figure, process 1 is the main process and process 2 is the child process):
[Figure: copy-on-write after fork; process 1 is the main process, process 2 is the child process]
When the main process writes data that happens to be in page c, the operating system creates a copy of that page (a copy of page c): it copies the physical data of the page and maps the copy to the main process, while the child process keeps using the original page c.
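The page-level behaviour can be modelled with a toy Python sketch. The `Page` and `Process` classes are hypothetical stand-ins for physical pages and page tables; a real operating system does this in the kernel with hardware support, so this only mirrors the bookkeeping.

```python
class Page:
    """A physical page: some data plus a count of page tables that map it."""

    def __init__(self, data):
        self.data = list(data)
        self.refs = 1


class Process:
    def __init__(self, page_table):
        self.page_table = page_table  # page name -> Page

    def fork(self):
        # Only the page table (the mapping) is copied, not the pages themselves.
        for page in self.page_table.values():
            page.refs += 1
        return Process(dict(self.page_table))

    def read(self, name, index):
        return self.page_table[name].data[index]

    def write(self, name, index, value):
        page = self.page_table[name]
        if page.refs > 1:
            # The page is shared: copy just this page and remap it for the writer.
            page.refs -= 1
            page = Page(page.data)
            self.page_table[name] = page
        page.data[index] = value
```

After a fork, both processes map the same three pages; a write to page c by the parent copies only that page, and the child continues to read the original contents.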

Where can the main thread be blocked during the whole log rewrite?

  • When the child process is forked, the page table has to be copied, which blocks the main thread.
  • When the main process writes to a big key, the operating system creates a copy of the page and copies the original data, which blocks the main thread.
  • After the child process finishes rewriting the log, the main thread may be blocked while it appends the AOF rewrite buffer.

Why does AOF rewriting not reuse the original AOF log?

  • If the parent and child processes wrote to the same file, they would compete for it, which would affect the performance of the parent process.
  • If the AOF rewrite failed, the original AOF file would be contaminated and could no longer be used to recover data.

4. RDB and AOF hybrid method (version 4.0)

Redis 4.0 proposes a method of mixed use of AOF logs and memory snapshots. Simply put, memory snapshots are executed at a certain frequency, and between two snapshots, AOF logs are used to record all command operations during this period.

In this way, snapshots do not need to be executed very frequently, which avoids the impact of frequent forks on the main thread. Moreover, the AOF log only records operations between two snapshots, which means that there is no need to record all operations. Therefore, the file will not be too large and rewriting overhead can be avoided.

As the figure below shows, the modifications at T1 and T2 are recorded in the AOF log. When the second full snapshot is taken, the AOF log can be cleared, because all modifications up to that point are already recorded in the snapshot and the log will no longer be needed during recovery.
[Figure: hybrid persistence; modifications at T1 and T2 are logged to the AOF between two full snapshots]
This method offers both the fast recovery of RDB files and the simplicity of AOF, which only records operation commands. It is widely used in practice.
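The hybrid idea as described here, a full snapshot plus a log of the commands since that snapshot, can be sketched in Python. The class is hypothetical, JSON stands in for the RDB format, and a list stands in for the AOF; it follows the description in this section rather than the on-disk layout Redis uses.

```python
import json


class HybridStore:
    def __init__(self):
        self.data = {}
        self.snapshot = "{}"  # last full snapshot
        self.aof = []         # commands executed since the last snapshot

    def set(self, key: str, value: str) -> None:
        self.data[key] = value
        self.aof.append(("SET", key, value))

    def delete(self, key: str) -> None:
        self.data.pop(key, None)
        self.aof.append(("DEL", key))

    def take_snapshot(self) -> None:
        self.snapshot = json.dumps(self.data)
        self.aof.clear()  # everything so far is now covered by the snapshot

    def recover(self) -> dict:
        """Load the snapshot, then replay only the commands logged after it."""
        data = json.loads(self.snapshot)
        for name, *args in self.aof:
            if name == "SET":
                data[args[0]] = args[1]
            elif name == "DEL":
                data.pop(args[0], None)
        return data
```

Recovery starts from the snapshot and replays a short log, so it stays fast, while writes made after the snapshot are not lost.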

5. Restoring data from persistence files

After the data backup and persistence are completed, how do we restore data from these persistent files? If there are both RDB files and AOF files on a server, which one should be loaded?

In fact, to recover data from these files you only need to restart Redis. The process is as follows:
[Figure: data-loading flow when Redis starts]

  • When Redis restarts, it checks whether AOF is enabled. If AOF is enabled, the AOF file is loaded first;
  • If the AOF file exists, Redis loads it. If loading succeeds, Redis restarts successfully. If loading fails, Redis logs that startup failed; you can then repair the AOF file and restart;
  • If the AOF file does not exist, Redis loads the RDB file instead. If the RDB file does not exist either, Redis starts successfully with no data;
  • If the RDB file exists, Redis loads it to restore the data. If loading fails, Redis logs that startup failed. If loading succeeds, Redis restarts successfully with the data restored from the RDB file.
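The decision flow in the list above can be written as a small Python function. The function name and return values are hypothetical, and the logic simply mirrors the steps as this article describes them; actual behaviour may differ between Redis versions.

```python
def choose_load_source(aof_enabled: bool, aof_exists: bool, rdb_exists: bool) -> str:
    """Return which file is loaded at startup: "aof", "rdb", or "empty"."""
    if aof_enabled and aof_exists:
        return "aof"  # AOF takes priority when it is enabled and present
    if rdb_exists:
        return "rdb"  # otherwise fall back to the RDB snapshot
    return "empty"    # nothing to load: start with no data
```

For instance, `choose_load_source(True, True, True)` picks the AOF file even though an RDB file is also present.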

So why is AOF loaded first? Because the data saved by AOF is more complete: as the analysis above shows, with the everysec policy AOF loses at most about 1 second of data.


Statement:
This article is reproduced from csdn.net. In case of infringement, please contact admin@php.cn to have it removed.