Many readers have probably already configured master-slave replication without a deep understanding of its workflow or the common problems that come with it. Kaka spent two days compiling the key knowledge points of Redis master-slave replication into this article.
The environment used in this article:
CentOS 7.0
Redis 4.0
Master-slave replication means that there are two Redis servers and the data of one is synchronized to the other. The former is called the master node and the latter the slave node. Data is synchronized in one direction only, from the master to the slave.
In practice, however, master-slave replication rarely involves only two Redis servers, which means that any Redis server may act as a master node (master).
In the case below, slave3 is both a slave node of the master and the master node of another slave.
First understand this concept, and continue to read below for more detailed explanations.
Assume that we have a redis server now, which is a stand-alone state.
The first problem in this case is server downtime, which leads directly to data loss. If the project handles money, the consequences are easy to imagine.
The second problem is memory. With only one server, memory will eventually hit its ceiling, and a single server cannot be upgraded indefinitely.
To address these two problems, we prepare a few more servers and configure master-slave replication, so that data is stored on multiple servers and kept synchronized between them. Even if one server goes down, users are not affected: Redis remains highly available and the data has redundant backups.
Several questions come up at this point. How are master and slave connected? How is data synchronized? What if the master server goes down? Don't worry, we will work through them one by one.
Modify the configuration file, mainly the port. To make things easier to inspect, name the log file and the persistence file after the respective port.
Then start two Redis services, one on port 6379 and one on port 6380. Start the 6380 instance with redis-server redis6380.conf and connect to it with redis-cli -p 6380. Start the other instance with redis-server redis6379.conf; because 6379 is the default Redis port, redis-cli connects to it directly without the -p option.
At this time we have successfully configured two redis services, one for 6380 and one for 6379. This is just for demonstration. In actual work, it needs to be configured on two different servers.
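As a rough illustration of naming files after the port, the following sketch renders a minimal configuration body for each instance. The directives port, logfile, dbfilename and appendfilename are standard redis.conf settings, but the file name patterns and the helper render_config are assumptions made for this example, not the configuration used in the article.

```python
def render_config(port: int) -> str:
    """Return a minimal redis.conf body whose file names carry the port.

    The naming scheme (6380.log, dump-6380.rdb, ...) is hypothetical.
    """
    lines = [
        f"port {port}",
        f'logfile "{port}.log"',
        f"dbfilename dump-{port}.rdb",
        f'appendfilename "appendonly-{port}.aof"',
    ]
    return "\n".join(lines) + "\n"


# Print one configuration body per demo instance.
for port in (6379, 6380):
    print(f"# redis{port}.conf")
    print(render_config(port))
```

Writing each result to redis6379.conf and redis6380.conf would give two files that differ only in the port-specific values.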
Keep one concept in mind first: when configuring master-slave replication, all operations are performed on the slave node (slave).
Execute slaveof 127.0.0.1 6379 on the slave node. Once the command completes, the slave is connected to the master.
Let's test whether master-slave replication works. Execute set kaka 123 and set master 127.0.0.1 on the master server. If both keys can then be read on the slave at port 6380, master-slave replication has been configured successfully. For a production environment, however, this is not the end: the setup will be optimized further later on until high availability is achieved.
Execute slaveof no one on the slave host to disconnect master-slave replication.
How can you check that the slave node has disconnected from the master node? Run info on a client connected to the master node.
After the slave node has connected to the master node from the client command line, the info output on the master node contains an entry for slave0.
After the slave node executes slaveof no one, the info output on the master node no longer contains that entry, indicating that the slave node has disconnected from the master node.
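The check can also be done programmatically. The sketch below parses the replication section of info output and lists the connected slaves. SAMPLE_INFO is an illustrative sample written for this example (it is not output captured from the article's servers), and the helper names are hypothetical.

```python
# Illustrative sample of the replication section printed by `info` on a master.
SAMPLE_INFO = """\
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6380,state=online,offset=14,lag=0
"""


def parse_replication_info(text: str) -> dict:
    """Turn 'key:value' lines into a dict, skipping blanks and '# Section' headers."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def connected_slaves(info: dict) -> list:
    """Return one dict of fields for every slaveN entry."""
    slaves = []
    for key, value in info.items():
        if key.startswith("slave") and key[5:].isdigit():
            slaves.append(dict(item.split("=", 1) for item in value.split(",")))
    return slaves


info = parse_replication_info(SAMPLE_INFO)
print(info["role"], connected_slaves(info))
```

An empty list from connected_slaves corresponds to the state after slaveof no one.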
Master-slave replication can also be enabled through the configuration file (for example, a slaveof 127.0.0.1 6379 line in redis6380.conf). Start the Redis service from that file: redis-server redis6380.conf
After the slave node is restarted, its connection information can be viewed directly on the master node.
Test the data: whatever is written on the master node is still synchronized automatically to the slave node.
This method is also very simple: enable master-slave replication directly when starting the Redis server by executing redis-server --slaveof host port.
The master node's log shows the connection from its side. The slave node's log includes the connection information of the master node and the RDB snapshot storage.
The figure above shows the complete workflow for establishing a master-slave connection. In short:
While the connection is being established, the slave node saves the address and port of the master, and the master node saves the port of the slave.
This picture describes in detail the data synchronization process when the slave node connects to the master node for the first time.
When the slave node connects to the master node for the first time, it first performs a full replication. This full replication is unavoidable.
After the full replication completes, the master node sends the data in the replication backlog buffer, and the slave node executes bgrewriteaof to restore the data. This is partial replication.
Three new terms appear at this stage: full replication, partial replication, and the replication backlog buffer. They are explained in detail in the FAQ below.
When the master database is modified and the data on the master and slave servers becomes inconsistent, the two are synchronized until they are consistent again. This process is called command propagation.
The master sends each data change command it receives to the slave, and the slave executes the command on receipt so that master and slave data stay consistent.
Partial replication during the command propagation phase
This is the most complete view of the master-slave replication process. Let's briefly go through its steps.
The slave node sends psync ? -1 or psync runid offset, using the runid to find the matching master and request data. When the slave node connects for the first time, it knows neither the runid nor the offset of the master node, so the first command it sends is psync ? -1, which means: I want all the data of the master node.
On later requests the slave node sends psync runid offset. If either the runid or the offset fails the master node's check, the master goes back to step 2 and performs full replication again.
A runid mismatch can only be caused by the master node restarting (this problem is solved later). An offset mismatch means that the replication backlog buffer has overflowed.
If the runid and offset checks pass and the slave node's offset is the same as the master node's offset, the request is ignored.
If the runid and offset checks pass and the slave node's offset differs from the master node's offset, the master sends +CONTINUE offset (this offset is the master's) and then sends, through the socket, the data in the replication backlog buffer between the slave node's offset and the master node's offset.
In the figure, steps 1-4 are full replication and steps 5-8 are partial replication.
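The master's decision between full and partial replication can be sketched as a small function. This is a simplified model of the checks described above, not Redis source code: the Master class, its backlog_start field and the returned labels are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Master:
    runid: str
    offset: int         # current replication offset of the master
    backlog_start: int  # oldest offset still held in the backlog (assumed field)


def handle_psync(master: Master, runid: str, offset: int) -> tuple:
    """Decide how the master answers `psync runid offset`.

    Returns (action, master_offset), where action is one of
    "FULLRESYNC", "IGNORE" or "CONTINUE".
    """
    # First connection ("?") or a runid that does not match: full replication.
    if runid != master.runid:
        return ("FULLRESYNC", master.offset)
    # Offset no longer covered by the backlog: full replication.
    if not (master.backlog_start <= offset <= master.offset):
        return ("FULLRESYNC", master.offset)
    # Same offset on both sides: nothing to send.
    if offset == master.offset:
        return ("IGNORE", master.offset)
    # Otherwise send the bytes between the slave offset and the master offset.
    return ("CONTINUE", master.offset)


master = Master(runid="a" * 40, offset=1000, backlog_start=400)
print(handle_psync(master, "?", -1))
print(handle_psync(master, "a" * 40, 700))
```

The first call models a slave connecting for the first time, the second a reconnecting slave whose offset is still inside the backlog.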
In the command propagation stage, the master node and the slave node exchange information continuously, using a heartbeat mechanism to keep the connection between them alive.
Notes on the heartbeat phase
To ensure data stability, the master node refuses writes, and with them all data synchronization, when too many slave nodes have dropped or their delay is too high.
There are two parameters for configuration adjustment:
min-slaves-to-write 2
min-slaves-max-lag 8
These two parameters mean that when fewer than 2 slave nodes remain connected, or when the delay of the slave nodes exceeds 8 seconds, the master node forcibly turns off its write function and stops data synchronization.
So how does the master node know how many slave nodes have dropped and what their delay is? In the heartbeat mechanism, each slave sends the replconf ack command every second. The command carries the slave's replication offset, and from these acknowledgements the master can work out each slave's delay and the number of slave nodes.
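The effect of the two parameters can be sketched as follows. This is a simplified model assuming the master knows, for each connected slave, how many seconds have passed since its last replconf ack; the function name and its arguments are hypothetical.

```python
def master_accepts_writes(slave_lags, min_slaves_to_write=2, min_slaves_max_lag=8):
    """Return True if enough slaves are within the allowed lag.

    slave_lags: seconds since the last acknowledgement of each connected slave.
    """
    # A slave counts as healthy when its lag does not exceed the limit.
    healthy = sum(1 for lag in slave_lags if lag <= min_slaves_max_lag)
    return healthy >= min_slaves_to_write


print(master_accepts_writes([0, 1, 3]))  # three healthy slaves
print(master_accepts_writes([0, 12]))    # only one slave within 8 seconds
```

With the values from the configuration above, the second call models the situation in which the master stops accepting writes.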
Let’s first take a look at what this run id is. You can see it by executing the info command. We can also see this when we look at the startup log information above.
Redis automatically generates a random id at startup (note that the id is different on every start). It consists of 40 random hexadecimal characters and uniquely identifies a Redis node.
When master-slave replication first starts, the master sends its runid to the slave, and the slave saves it. The saved id can be viewed with the info command.
When the slave reconnects after a disconnection, it sends this id to the master. If the runid saved by the slave matches the master's current runid, the master tries partial replication (whether it succeeds also depends on the offset). If the runid saved by the slave differs from the master's current runid, full replication is performed directly.
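To make the shape of a runid concrete, here is a minimal sketch that produces a 40-character hexadecimal id and compares a saved id with the current one. It uses Python's secrets module purely for illustration; it is not how Redis itself generates the id, and both function names are hypothetical.

```python
import secrets


def generate_runid() -> str:
    """Return 40 random hexadecimal characters (20 random bytes)."""
    return secrets.token_hex(20)


def can_try_partial_replication(saved_master_runid: str, current_master_runid: str) -> bool:
    """Partial replication is only attempted when the saved id still matches."""
    return saved_master_runid == current_master_runid


before_restart = generate_runid()
after_restart = generate_runid()  # a restart yields a new id
print(before_restart, len(before_restart))
print(can_try_partial_replication(before_restart, after_restart))
```

Because the id changes on every start, the comparison fails after a restart, which is why a restart leads to full replication.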
The replication backlog buffer is a first-in-first-out queue used to store the command records the master collects. Its default size is 1 MB.
You can change repl-backlog-size 1mb in the configuration file to control the buffer size. Adjust it according to your own server memory; about 30% is reserved here.
What exactly is stored in the replication backlog buffer?
After executing a command such as set name kaka, we can look at the persistence file to see what was recorded.
The replication backlog buffer stores the same data as AOF persistence, split into bytes, and each byte has its own offset. This offset is the replication offset (offset).
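The byte-level view can be illustrated by encoding set name kaka in the Redis serialization protocol, the format in which commands appear in the AOF file. The helper encode_command is written for this example, and the starting offset of 0 is an assumption.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a protocol array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


payload = encode_command("set", "name", "kaka")
print(payload)

# Every byte occupies one offset, so the replication offset grows by len(payload).
offset_before = 0  # assumed starting point
offset_after = offset_before + len(payload)
print(offset_after)
```

Each command written to the master therefore advances the offset by the encoded length of that command, not by one per command.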
Then why can the replication backlog buffer cause full replication?
In the command propagation phase, the master node stores the collected commands in the replication backlog buffer and then sends them to the slave node. This is where the problem arises: when the master node receives an extremely large amount of data in an instant and it exceeds the size of the buffer, the oldest data is squeezed out, leaving the master and slave inconsistent, so a full replication is needed. If the buffer size is not set appropriately, this can turn into an endless loop: the slave node performs full replication, clears its data, and performs full replication again.
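The overflow can be demonstrated with a tiny model of a fixed-size first-in-first-out buffer. This is an illustrative sketch only: the class ReplicationBacklog and its methods are hypothetical and far simpler than the real implementation.

```python
class ReplicationBacklog:
    """Fixed-size FIFO of bytes, addressed by replication offset."""

    def __init__(self, size: int):
        self.size = size
        self.buffer = bytearray()
        self.master_offset = 0  # total number of bytes ever written

    def feed(self, data: bytes) -> None:
        """Append new command bytes and drop the oldest ones on overflow."""
        self.buffer += data
        self.master_offset += len(data)
        overflow = len(self.buffer) - self.size
        if overflow > 0:
            del self.buffer[:overflow]  # oldest bytes are squeezed out

    @property
    def first_offset(self) -> int:
        """Offset of the oldest byte still in the buffer."""
        return self.master_offset - len(self.buffer)

    def read_from(self, slave_offset: int):
        """Return the bytes the slave is missing, or None if full replication is needed."""
        if slave_offset < self.first_offset or slave_offset > self.master_offset:
            return None
        return bytes(self.buffer[slave_offset - self.first_offset:])


backlog = ReplicationBacklog(size=10)
backlog.feed(b"abcdef")
print(backlog.read_from(2))  # still covered by the buffer
backlog.feed(b"ghijklmn")    # pushes the first 4 bytes out
print(backlog.read_from(2))  # no longer covered
```

Once a slave's offset falls before first_offset, the missing bytes are gone and only full replication can bring it back in sync.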
The master node records its replication offset each time it sends data to the slave node, and the slave node records its own each time it receives data.
The offset is used to synchronize information, to compare the differences between the master node and the slave node, and to restore data when the slave reconnects after a disconnection.
This value is an offset into the replication backlog buffer.
When the master node restarts, the value of runid will change, which will cause all slave nodes to perform full replication.
We do not need to handle this ourselves; we only need to know how the system optimizes for it.
After master-slave replication is established, the master node creates the master-replid variable. It is generated in the same way as the runid, with a length of 41 (the runid has a length of 40), and is then sent to the slave node.
When the shutdown save command is executed on the master node, an RDB persistence is performed and the runid and offset are saved to the RDB file. You can view this information with the redis-check-rdb command.
After the master node restarts, it loads the RDB file and restores the repl-id and repl-offset from it into memory, so that all slave nodes still regard it as the previous master node.
When the network environment is poor, the slave node's connection drops. If the replication backlog buffer is too small, data overflows and the slave node's offset falls outside the buffer, so full replication occurs. This may lead to repeated full replication.
Solution: adjust the size of the replication backlog buffer with repl-backlog-size.
Recommended setting: measure the time it takes the slave node to reconnect to the master node, and obtain the average amount of command data the master node generates per second (write_size_per_second).
Replication backlog buffer size = 2 * master-slave reconnection time * amount of data the master node generates per second
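As a worked example of this sizing rule, the sketch below plugs in made-up numbers: a reconnection time of 5 seconds and 200 KB of command data per second. Both figures and the function name are assumptions chosen for illustration.

```python
def recommended_backlog_size(reconnect_seconds: float, write_size_per_second: int) -> int:
    """Apply: size = 2 * reconnection time * bytes written per second."""
    return int(2 * reconnect_seconds * write_size_per_second)


size = recommended_backlog_size(reconnect_seconds=5, write_size_per_second=200 * 1024)
print(size)                  # bytes
print(size / (1024 * 1024))  # megabytes
```

With these numbers the result is a little under 2 MB, so the default of 1 MB would be too small for such a workload.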
When the master node's CPU usage is too high, or slave nodes connect too frequently, the master node's resources become seriously occupied, including but not limited to buffers, bandwidth, and connections.
Why are the master node's resources so heavily occupied?
In the heartbeat mechanism, each slave node sends a replconf ack command to the master node every second.
If a slave node executes a slow query, it occupies a large amount of CPU.
The master node calls the replication timing function replicationCron every second, while the slave node fails to respond for a long time.
Solution:
Set a timeout after which the slave node is released.
Parameter: repl-timeout
The default is 60 seconds. After 60 seconds without a response, the master releases the slave.
Due to network factors, the data of multiple slave nodes will be inconsistent. There is no way to avoid this factor.
There are two solutions to this problem:
The first: for data that requires strong consistency, configure a single Redis server to handle both reads and writes. This approach only suits small amounts of data that must be highly consistent.
The second: monitor the offsets of the master and slave nodes. If a slave node's delay is too large, temporarily block client access to that slave. The parameter is slave-serve-stale-data yes|no. When it is set to no, the slave responds only to a few commands such as info and slaveof.
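The monitoring idea in the second solution can be sketched as a filter over offsets. This assumes the application already has the master's offset and each slave's offset (for example, from info output); the function name, the slave names and the threshold are hypothetical.

```python
def readable_slaves(master_offset: int, slave_offsets: dict, max_lag_bytes: int) -> list:
    """Return the slaves whose offset is close enough to the master's."""
    return sorted(
        name
        for name, offset in slave_offsets.items()
        if master_offset - offset <= max_lag_bytes
    )


offsets = {"slave0": 990, "slave1": 400}
print(readable_slaves(master_offset=1000, slave_offsets=offsets, max_lag_bytes=100))
```

Clients would then read only from the slaves in the returned list until the lagging ones catch up.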
This article explained what master-slave replication is, its three major stages and their workflows, the three core components of partial replication, and the heartbeat mechanism of the command propagation phase. Finally, it covered common problems with master-slave replication.