
Redis master-slave replication working principle and common problems

咔咔 (Kaka) · Original · 2020-06-02 01:26:10 · 2025 views

Many readers have probably already configured master-slave replication, but without an in-depth understanding of its workflow and common problems. Kaka spent two days compiling the key points of Redis master-slave replication for this article.

Environment used in this article:

CentOS 7.0

Redis 4.0


1. What is Redis master-slave replication?


Master-slave replication means there are two Redis servers, and the data of one is synchronized to the other. The former is called the master node and the latter the slave node. Data is synchronized in one direction only, from master to slave.

In practice, however, replication rarely involves just two Redis servers, which means any Redis server may also act as a master node.

In the example diagram, slave3 is both a slave node of the master and the master node of another slave.


First understand this concept, and continue to read below for more detailed explanations.



2. Why is Redis master-slave replication needed?


Assume we have a single Redis server running stand-alone.

The first problem in this situation is server downtime, which directly leads to data loss. If the project involves money, the consequences are easy to imagine.

The second problem is memory. With only one server, memory will eventually hit its ceiling, and a single server cannot be upgraded indefinitely.


To address these two problems, we prepare several more servers and configure master-slave replication, so that data is stored on multiple servers and kept synchronized between them. Even if one server goes down, users are not affected: Redis keeps serving requests, achieving high availability and redundant backup of data.

Several questions naturally arise at this point. How are master and slave connected? How is data synchronized? What if the master server goes down? Don't worry, we will work through them one by one.



3. The role of Redis master-slave replication


We explained above why Redis master-slave replication is used; its roles are essentially the reasons for using it.

The roles are:
  1. Data redundancy: replication provides a hot backup of the data and is another form of safeguard besides persistence.
  2. Failure recovery: when the master node has a problem, a slave node can provide the service, enabling rapid recovery from failures. This is service redundancy.
  3. Read/write separation: the master is mainly used for writes and the slaves for reads, which improves the load capacity of the servers. Slave nodes can be added as demand changes.
  4. Load balancing: combined with read/write separation, the master provides write services and the slaves provide read services, sharing the server load. Especially in read-heavy, write-light workloads, spreading reads across multiple slaves can greatly increase the concurrency Redis can handle.
  5. Cornerstone of high availability: master-slave replication is the basis on which sentinels and clusters are implemented.




4. Configure Redis master-slave replication


With that said, let's first configure a simple master-slave replication setup, and then discuss the implementation principles.


The redis storage path is: usr/local/redis


The log and configuration files are stored in: usr/local/redis/data


First we configure two configuration files, namely redis6379.conf and redis6380.conf


Modify the configuration files, mainly to change the port. For easier viewing, the log file and persistence file names include the respective port.


Then start two Redis services, one on port 6379 and one on port 6380. Run redis-server redis6380.conf and connect with redis-cli -p 6380. Since 6379 is the default Redis port, start the other server with redis-server redis6379.conf and connect with plain redis-cli.


We now have two Redis services running, one on 6380 and one on 6379. This is only for demonstration; in real deployments they should be configured on two different servers.




1. Start using the client command line


Keep one thing in mind first: when configuring master-slave replication, all operations are performed on the slave node.

Run slaveof 127.0.0.1 6379 on the slave node. Once it has executed, the slave is connected to the master.


Let's test whether master-slave replication works. Run set kaka 123 and set master 127.0.0.1 on the master; if both values can then be read on the slave (port 6380), master-slave replication is configured. This is still far from production-ready, though. Later the setup will be further optimized until high availability is achieved.




2. Use the configuration file to enable


Before enabling master-slave replication through the configuration file, first disconnect the replication set up earlier from the client command line by running slaveof no one on the slave.


How can we check that the slave node has disconnected from the master node? Run info in the master node's client.

After the slave connects to the master via the client command line, the output of info on the master contains an entry for slave0.


After the slave executes slaveof no one, the output of info on the master no longer contains that entry, indicating that the slave has disconnected from the master.


Add the slaveof directive to the slave's configuration file (here, slaveof 127.0.0.1 6379), then start the Redis service from that configuration file: redis-server redis6380.conf


After the slave node is restarted, you can directly view the connection information of the slave node on the master node.


Test with some data: whatever is written on the master node is still automatically synchronized to the slave node.



3. Start when starting the redis server


This method is also very simple: enable master-slave replication directly when starting the Redis server by running redis-server --slaveof host port.


4. View the log information after the master-slave replication is started


Both the master node and the slave node record the start of replication in their logs. The slave node's log includes the connection information for the master node and the storage of the RDB snapshot.


5. Working principle of master-slave replication


1. Three stages of master-slave replication


The complete workflow of master-slave replication is divided into the following three phases. Each phase has its own internal workflow, so we will go through the three in turn.

  • Connection establishment: the slave connects to the master
  • Data synchronization: the master synchronizes its data to the slave
  • Command propagation: data is synchronized repeatedly as it changes


2. Phase 1: Connection establishment process



The connection establishment workflow can be summarized in a few short steps:

  1. Set the master's address and port, and save the master's information
  2. Establish a socket connection (what this connection does is described below)
  3. Send ping commands periodically
  4. Authenticate
  5. Send the slave's port information

During connection establishment, the slave node saves the master's address and port, and the master node saves the slave node's port.


3. Phase 2: Data synchronization process



The following describes the data synchronization process when the slave node connects to the master node for the first time.

When the slave node connects to the master node for the first time, it first performs a full replication. This full replication is unavoidable.

After the full replication completes, the master node sends the commands accumulated in the replication backlog buffer, and the slave node applies them (running bgrewriteaof if AOF is enabled) to bring its data up to date. This is partial replication.

Three new terms appear in this phase: full replication, partial replication, and the replication backlog buffer. They are explained in detail in the sections below.


4. Phase 3: Command propagation process


When data on the master is modified, the master and slave data become inconsistent and must be synchronized again. This process is called command propagation.

The master sends each data-changing command it receives to the slave, and the slave executes the command on receipt, keeping master and slave data consistent.
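Command propagation can be pictured with a small toy model. This is only a sketch, not Redis source code: the `Node` class and `propagate_set` function are hypothetical names, and the offset is assumed to advance by the byte length of the command in the Redis wire format.

```python
# Toy model of command propagation (illustrative only, not Redis code).

def encode_command(*parts):
    """Encode a command as a Redis protocol array of bulk strings."""
    out = f"*{len(parts)}\r\n"
    for part in parts:
        out += f"${len(part.encode())}\r\n{part}\r\n"
    return out.encode()


class Node:
    def __init__(self):
        self.data = {}
        self.offset = 0  # replication offset, counted in bytes

    def apply_set(self, key, value):
        self.data[key] = value
        self.offset += len(encode_command("SET", key, value))


def propagate_set(master, slaves, key, value):
    """The master applies the write, then forwards the same command to every slave."""
    master.apply_set(key, value)
    for slave in slaves:
        slave.apply_set(key, value)


master, slaves = Node(), [Node(), Node()]
propagate_set(master, slaves, "kaka", "123")
print(master.offset, [s.offset for s in slaves])
```

After the call, master and slaves hold the same data and the same offset, which is the state command propagation tries to maintain.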


Partial replication during the command propagation phase


  • If the network drops or jitters during the command propagation phase, the connection is lost
  • The master node keeps writing data to the replication backlog buffer
  • The slave node keeps trying to reconnect to the master
  • Once reconnected, the slave sends its saved runid and replication offset to the master with the psync command
  • If the master determines that the offset is within the replication backlog buffer, it replies with continue and sends the buffered data to the slave
  • The slave receives the data and applies it (running bgrewriteaof if AOF is enabled) to restore its data


6. Detailed introduction to the principle of master-slave replication (full replication and partial replication)



This is the most complete walk-through of master-slave replication. Let's briefly go through each step.


  1. The slave node sends psync ? -1. The general form is psync runid offset, which requests data from the master with the given runid starting at the given offset. On the first connection, however, the slave knows neither the master's runid nor its offset, so the first command it sends is psync ? -1, meaning it wants all of the master's data.
  2. The master node runs bgsave to generate an RDB file and records its current replication offset.
  3. The master node sends its runid and offset with the FULLRESYNC runid offset reply, and sends the RDB file to the slave over the socket.
  4. The slave node receives FULLRESYNC, saves the master's runid and offset, clears all of its current data, receives the RDB file over the socket, and restores the data from it.
  5. After full replication, the slave knows the master's runid and offset and starts sending psync runid offset.
  6. The master node receives the command and checks whether the runid matches and whether the offset is within the replication backlog buffer.
  7. If either check fails, the master goes back to step 2 and performs a full replication. A runid mismatch can only be caused by a master restart (this problem is addressed later), and an offset miss is caused by the replication backlog buffer overflowing. If both checks pass and the slave's offset equals the master's offset, nothing needs to be sent. If both checks pass and the offsets differ, the master replies CONTINUE offset (the master's offset) and sends the data between the slave's offset and the master's offset from the replication backlog buffer over the socket.
  8. The slave node receives CONTINUE and saves the master's offset. After receiving the data over the socket, it applies it (running bgrewriteaof if AOF is enabled) to restore the data.


Steps 1-4 are full replication; steps 5-8 are partial replication.
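The decision the master makes in steps 6 and 7 can be sketched as a small function. This is a simplified model under stated assumptions, not the Redis implementation: `handle_psync` and its parameters are hypothetical names, offsets count bytes already applied, and `backlog_first_offset` is assumed to be the offset just before the oldest byte still held in the backlog.

```python
def handle_psync(master_runid, master_offset, backlog_first_offset,
                 slave_runid, slave_offset):
    """Decide between full and partial replication for one psync request."""
    # A first-time slave sends "?" as the runid, which never matches.
    if slave_runid != master_runid:
        return ("FULLRESYNC", master_runid, master_offset)
    # The bytes the slave is missing must still be inside the backlog.
    if slave_offset < backlog_first_offset or slave_offset > master_offset:
        return ("FULLRESYNC", master_runid, master_offset)
    # Partial replication: report how many bytes will be sent from the backlog.
    return ("CONTINUE", master_offset - slave_offset)


print(handle_psync("abc", 1000, 400, "?", -1))     # first connection
print(handle_psync("abc", 1000, 400, "abc", 700))  # short disconnection
print(handle_psync("abc", 1000, 400, "abc", 100))  # fell out of the backlog
```

The first and third calls end in full replication, while the second continues with the 300 bytes the slave is missing.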


During step 3, the master node keeps receiving client writes while replication is in progress, so its offset keeps changing. Only the changes are sent to each slave, and this ongoing exchange is maintained by the heartbeat mechanism.


7. Heartbeat mechanism


In the command propagation phase, the master node and the slave nodes need to exchange information continuously. The heartbeat mechanism maintains this exchange and keeps the connection between master and slave alive.


  • master heartbeat
    • Command: ping
    • Default: once every 10 seconds, controlled by the repl-ping-slave-period parameter
    • Purpose: determine whether the slave node is online
    • Use info replication to check how long ago the slave last communicated with the master (lag); a lag of 0 or 1 is normal
  • slave heartbeat
    • Command: replconf ack {offset}
    • Frequency: once per second
    • Purpose: report the slave's replication offset to the master so that the master can resend any missing data-changing commands, and determine whether the master node is online
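The lag value mentioned above can be understood as the time since the last replconf ack arrived. The sketch below illustrates that idea only; `SlaveState`, `on_replconf_ack`, and `lag` are hypothetical names, and timestamps are passed in explicitly so the example is deterministic.

```python
class SlaveState:
    """What a master might remember about one slave (hypothetical structure)."""

    def __init__(self):
        self.ack_offset = 0
        self.last_ack_time = 0.0


def on_replconf_ack(state, offset, now):
    """Record the offset carried by the ack and the time it arrived."""
    state.ack_offset = offset
    state.last_ack_time = now


def lag(state, now):
    """Whole seconds since the last ack; 0 or 1 means the slave is healthy."""
    return int(now - state.last_ack_time)


slave = SlaveState()
on_replconf_ack(slave, offset=1024, now=100.0)
print(lag(slave, now=100.4))  # ack arrived less than a second ago
print(lag(slave, now=112.0))  # no ack for 12 seconds
```

With one ack per second, a healthy slave never shows more than a second or so of lag; a growing value suggests the slave or the network is in trouble.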


Notes on the heartbeat phase

To ensure data stability, the master node refuses all data synchronization operations when too many slave nodes are offline or their delay is too high.


There are two parameters for configuration adjustment:


min-slaves-to-write 2


min-slaves-max-lag 8


These two parameters mean that when fewer than 2 slave nodes are connected with a delay of at most 8 seconds, the master node stops accepting writes, and data synchronization stops with it.

So how does the master know how many slaves are offline and how far behind they are? In the heartbeat mechanism, each slave sends the replconf ack command every second. From these acks the master learns each slave's offset, and from their arrival times it derives each slave's delay and the number of slaves still online.
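The effect of the two parameters can be sketched as a single check. This is a minimal model, assuming the master already knows each slave's lag in seconds; `master_accepts_writes` is a hypothetical helper, and only its two keyword arguments mirror the configuration parameters above.

```python
def master_accepts_writes(slave_lags, min_slaves_to_write=2, min_slaves_max_lag=8):
    """Return True if enough slaves are within the allowed lag."""
    good_slaves = sum(1 for lag in slave_lags if lag <= min_slaves_max_lag)
    return good_slaves >= min_slaves_to_write


print(master_accepts_writes([0, 1, 3]))  # three healthy slaves
print(master_accepts_writes([0, 12]))    # one slave lags more than 8 seconds
print(master_accepts_writes([1]))        # only one slave left
```

Only the first case keeps the master writable; in the other two, fewer than 2 slaves are within 8 seconds of lag.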


8. Three core elements of partial replication


1. Server’s running id (run id)


Let’s first take a look at what this run id is. You can see it by executing the info command. We can also see this when we look at the startup log information above.



Redis automatically generates a random id every time it starts (note that the id differs on every start). It consists of 40 random hexadecimal characters and uniquely identifies a Redis node.


When the master-slave replication is first started, the master will send its runid to the slave, and the slave will save the master's id. We can use the info command to view it




On reconnecting after a disconnection, the slave sends this id to the master. If the runid saved by the slave matches the master's current runid, the master tries partial replication (whether it succeeds also depends on the offset). If the runid saved by the slave differs from the master's current runid, full replication is performed directly.
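To make the role of the run id concrete, here is a small sketch. It is not how Redis generates its id internally; `new_runid` and `sync_mode` are hypothetical helpers that only reproduce the shape of the id (40 hexadecimal characters) and the comparison described above.

```python
import secrets


def new_runid():
    """Return a fresh random id: 20 random bytes rendered as 40 hex characters."""
    return secrets.token_hex(20)


def sync_mode(saved_master_runid, current_master_runid):
    """A mismatch forces full replication; a match allows a partial attempt."""
    return "partial" if saved_master_runid == current_master_runid else "full"


before_restart = new_runid()
after_restart = new_runid()  # a restarted master gets a different id

print(len(before_restart))
print(sync_mode(before_restart, before_restart))
print(sync_mode(before_restart, after_restart))
```

Because a restart produces a new id, a slave holding the old one cannot continue with partial replication, which is the problem discussed in the master node restart section.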


2. Replication backlog buffer

The replication backlog buffer is a first-in, first-out queue that stores the records of data-changing commands received by the master. Its default size is 1 MB.

You can control the buffer size with repl-backlog-size 1mb in the configuration file. Adjust it according to your server's memory; about 30% is reserved here.

What exactly is stored in the replication backlog buffer?

After executing a command such as set name kaka, we can inspect the persistence file to see how it is recorded.


The replication backlog buffer stores data in the same form as AOF persistence data. It is split into bytes, and each byte has its own offset. This offset is the replication offset.


Then why can the replication backlog buffer lead to full replication?

In the command propagation phase, the master stores the commands it receives in the replication backlog buffer and then sends them to the slaves. This is where the problem arises. When the master receives a very large amount of data in a short time, more than the buffer can hold, the oldest data is squeezed out. If a slave still needs that data, master and slave become inconsistent and a full replication is required. If the buffer size is set inappropriately, this can turn into an endless loop: the slave performs a full replication, clears its data, and performs a full replication again.
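The overflow behaviour can be reproduced with a tiny fixed-size queue. This is an illustrative sketch rather than the Redis data structure: `ReplicationBacklog` and its methods are hypothetical names, and a 10-byte buffer stands in for the real 1 MB default.

```python
from collections import deque


class ReplicationBacklog:
    """Fixed-size FIFO of bytes; a toy stand-in for the replication backlog buffer."""

    def __init__(self, size):
        self.buf = deque(maxlen=size)  # the oldest bytes fall out automatically
        self.master_offset = 0         # total number of bytes ever written

    def feed(self, data):
        self.buf.extend(data)
        self.master_offset += len(data)

    @property
    def first_offset(self):
        """Offset just before the oldest byte still held."""
        return self.master_offset - len(self.buf)

    def read_from(self, slave_offset):
        """Bytes after slave_offset, or None if they were already squeezed out."""
        if slave_offset < self.first_offset or slave_offset > self.master_offset:
            return None
        skip = slave_offset - self.first_offset
        return bytes(list(self.buf)[skip:])


backlog = ReplicationBacklog(size=10)
backlog.feed(b"abcdef")           # the slave is in sync at offset 6, then disconnects
backlog.feed(b"ghij")
print(backlog.read_from(6))       # still available: partial replication
backlog.feed(b"klmnopqrstuv")     # a burst of writes larger than the buffer
print(backlog.read_from(6))       # squeezed out: full replication is needed
```

The second read returns None because the bytes the slave is missing no longer exist in the buffer, which is exactly the situation that forces a full replication.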


3. Replication offset



The master node's replication offset advances each time it sends data to a slave, and the slave node's replication offset advances each time it receives data.

It is used to synchronize information, compare the difference between master and slave, and restore data when the slave reconnects after a disconnection.

This value is the offset within the replication backlog buffer.


9. Common problems with master-slave replication


1. Master node restart problem (internal optimization)


When the master node restarts, the value of runid will change, which will cause all slave nodes to perform full replication.


We don’t need to consider this issue, we just need to know how the system is optimized.


After master-slave replication is established, the master node creates the master-replid variable. It is generated the same way as the runid (a 40-character hexadecimal string) and is then sent to the slave nodes.


When the shutdown save command is executed on the master node, an RDB persistence will be performed and the runid and offset will be saved to the RDB file. You can use the command redis-check-rdb to view this information.



After restarting, the master node loads the RDB file and restores the repl-id and repl-offset stored in it into memory. All slave nodes then still regard it as the same master as before.


2. Slave network interruption pushes the offset out of range, causing full replication

A poor network environment interrupts the slave node's connection. If the replication backlog buffer is too small, data overflows, the slave's offset falls outside the buffer, and full replication occurs. This can result in repeated full replications.


Solution: Modify the size of the replication backlog buffer: repl-backlog-size


Recommended setting: measure how long a slave takes to reconnect to the master, and obtain the average amount of write data the master generates per second (write_size_per_second).

Replication backlog buffer size = 2 * master-slave reconnection time * amount of data the master generates per second
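As a worked example of this rule of thumb, the sketch below plugs in made-up numbers. The function name and the sample values (a 30-second reconnection time and 100 KB of writes per second) are assumptions for illustration only.

```python
def recommended_backlog_size(reconnect_seconds, write_size_per_second):
    """Apply the rule of thumb: 2 * reconnection time * bytes written per second."""
    return 2 * reconnect_seconds * write_size_per_second


size_bytes = recommended_backlog_size(reconnect_seconds=30,
                                      write_size_per_second=100 * 1024)
print(size_bytes)                          # bytes
print(round(size_bytes / 1024 / 1024, 2))  # megabytes
```

With these numbers the result is about 5.86 MB, well above the 1 MB default, so repl-backlog-size would need to be raised accordingly.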


3. Frequent network interruption


This happens when the master node's CPU usage is too high or slave nodes reconnect too frequently. As a result, the master's resources are heavily consumed, including but not limited to buffers, bandwidth, and connections.


Why are the master node resources severely occupied?


In the heartbeat mechanism, each slave node sends a replconf ack command to the master node every second.

If the slave node runs a slow query, it occupies a large amount of CPU.

The master node calls the replication timer function replicationCron every second, and the slave node fails to respond for a long time.

Solution:

Set a timeout after which the slave node is released.

Parameter: repl-timeout

This parameter defaults to 60 seconds. After 60 seconds without a response, the slave is released.


4. Data inconsistency problem


Due to network factors, data on multiple slave nodes may be inconsistent. This cannot be avoided entirely.

There are two solutions to this problem:

First, where data must be highly consistent, use a single Redis server for both reads and writes. This approach only suits small data volumes with strict consistency requirements.

Second, monitor the offsets of the master and slave nodes. If a slave's delay is too large, temporarily block client access to that slave. The parameter is slave-serve-stale-data yes|no. When it is set to no, a slave that is out of sync with the master responds only to a few commands such as info and slaveof.
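The second approach, monitoring offsets and steering clients away from lagging slaves, might look like the sketch below on the client side. Everything here is hypothetical: `readable_slaves`, the slave names, and the 1024-byte threshold are illustrative, and the offsets are assumed to have been collected from info replication beforehand.

```python
def readable_slaves(master_offset, slave_offsets, max_lag_bytes):
    """Return the names of slaves whose offset is close enough to the master's."""
    return sorted(
        name
        for name, offset in slave_offsets.items()
        if master_offset - offset <= max_lag_bytes
    )


offsets = {"slave-a": 9990, "slave-b": 4200}
print(readable_slaves(10000, offsets, max_lag_bytes=1024))
```

Here only slave-a is within the threshold, so reads would be routed to it until slave-b catches up.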


10. Summary


This article explained what master-slave replication is, its three phases and their workflows, the three core elements of partial replication, and the heartbeat mechanism in the command propagation phase. Finally, it covered common problems with master-slave replication.
