Asked about the Redis cluster in the interview, I was tortured to death...

Kaka (咔咔) · Original · 2020-08-28 17:23:42

Sentinel mainly addresses the problem that a failed single node cannot recover automatically. Cluster mainly addresses single-node capacity limits, concurrency pressure, and linear scalability. This article uses the official Redis Cluster. At the end of the article there is a bonus: how to set up the SSH terminal background you asked for!

Preface

Kaka has put together a road map for an interview guide and plans to write articles following it. Some knowledge points turned out to be missing from the map and will be added along the way. Suggestions for further additions are welcome too. See you in the comment area!

This article introduces the cluster from the following aspects:

  • Cluster introduction
  • Cluster function
  • Configuring the cluster
  • Manual and automatic failover
  • Failover principle

Environment used in this article

  • CentOS 7.3
  • Redis 4.0
  • Redis working directory: /usr/local/redis
  • All operations are performed in a virtual machine

1. Introduction to clusters

A cluster solves the single-machine memory limit and concurrency problems of master-slave replication. Suppose your cloud server has 256 GB of memory. Once that limit is reached, Redis can no longer provide service. With that much data the write volume is also very large, which can easily overflow the replication buffer and push the slave nodes into endless full resynchronization, so master-slave replication stops working properly.

So the single master-slave setup needs to become a many-to-many setup in which all master nodes are connected and communicate with each other. This not only spreads the memory load of a single machine but also distributes requests and improves the availability of the system.

When there are many write requests, commands are no longer sent to a single master node. They are distributed across the master nodes, which share the memory load and keep any one node from being flooded with requests.

So how are commands distributed and stored? To answer that, we need to look at the cluster's storage structure.

2. Cluster function

  • Spreads the storage load of a single machine across nodes and makes it easy to scale out
  • Spreads the access requests of a single machine across nodes
  • Improves the availability of the system

How does it improve availability? Consider a cluster with several masters: when master1 goes down, the impact on the system is limited and normal service can still be provided.

You may ask how the cluster keeps working when master1 goes down. That question is answered in the failover section below and explained in detail in the section on failover principles.

3. Cluster storage structure

1. Storage structure

On a single machine, storage is simple: after the user issues a request, the key is stored directly in that machine's memory. Storage in a cluster is not that simple. First, look at what happens when the user issues a command for a key:

  • A value is calculated with CRC16(key)
  • That value is taken modulo 16384; suppose the result is 28
  • The value 28 is the slot in which the key is stored
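
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming the CRC16 variant is the XMODEM one (polynomial 0x1021, initial value 0). The function names `crc16_xmodem` and `key_slot` are made up for this example, and hash tags (`{...}` inside a key) are deliberately ignored.

```python
# Sketch of key-to-slot mapping: CRC16 (XMODEM variant) modulo 16384.
# Hash tags ("{...}" inside a key) are not handled in this sketch.

SLOT_COUNT = 16384


def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0, computed bit by bit."""
    crc = 0
    for byte in data:
        crc ^= byte << 8  # feed the next byte into the high half of the register
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Return the hash slot (0..16383) that a key maps to."""
    return crc16_xmodem(key.encode("utf-8")) % SLOT_COUNT


if __name__ == "__main__":
    for key in ("name", "foo", "hello"):
        print(key, "->", key_slot(key))
```

A handy cross-check is the key name: later in this article the cluster itself reports its slot as 5798, so the printed value for name can be compared against that.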

Now the question is: which Redis node's storage space should this key go into?

In fact, once the cluster has been created, Redis divides the storage space into 16384 slots, and each master node holds a share of them.

Note that each number assigned to the Redis storage space stands for one small storage area (the technical term is "hash slot"). You can think of the numbers as room numbers in a building: the building is the entire storage space of Redis, and each room number is one storage area that holds the keys mapped to it. The number is the slot's label, not the position of an individual key after taking the modulo.

In the figure, the arrow pointing at 28 means that keys mapped to slot 28 are stored in that area. The same node may also hold slots 29, 30, 31, and so on.

So what happens when a machine is added or removed?

After a new machine is added, some slots are moved to it from the other three nodes. You can choose how many slots to give the new machine.

Similarly, when a machine is removed, its slots are reallocated to the remaining machines. Just as when adding a node, you can specify which node receives the slots.

In other words, adding or removing a node simply changes where slots are stored.

With the storage structure understood, another question remains: how is communication inside the cluster designed? When a request for a key arrives, where does the data come from? That is the next topic.

2. Communication design

Every node in the cluster periodically sends ping messages to the other nodes, and they reply with pong. After a while, every node knows the slot information of all nodes in the cluster.

Suppose there are three master nodes. The 16384 hash slots are then divided into three parts:

0-5460, 5461-10922, and 10923-16383
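
To see where such boundaries come from, here is a small sketch that splits slots 0..16383 into contiguous, near-equal ranges. It assumes a simple rounding scheme and is not taken from the cluster tooling; `split_slots` and `owner_index` are hypothetical names used only for illustration.

```python
# Sketch: divide the 16384 hash slots into contiguous, near-equal ranges.
# The rounding scheme is an assumption chosen for illustration.

SLOT_COUNT = 16384


def split_slots(master_count: int) -> list:
    """Return one (first_slot, last_slot) pair per master, covering 0..16383."""
    per_node = SLOT_COUNT / master_count
    ranges = []
    first = 0
    cursor = 0.0
    for index in range(master_count):
        last = round(cursor + per_node - 1)
        # The last master always takes whatever remains up to slot 16383.
        if index == master_count - 1 or last > SLOT_COUNT - 1:
            last = SLOT_COUNT - 1
        ranges.append((first, last))
        first = last + 1
        cursor += per_node
    return ranges


def owner_index(slot: int, ranges: list) -> int:
    """Return the index of the master whose range contains the slot."""
    for index, (first, last) in enumerate(ranges):
        if first <= slot <= last:
            return index
    raise ValueError(f"slot {slot} is outside 0..{SLOT_COUNT - 1}")


if __name__ == "__main__":
    for first, last in split_slots(3):
        print(f"{first}-{last}")
```

Note that the slots are numbered from 0, so the highest slot is 16383 and every slot belongs to exactly one range.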

When a user initiates a key request, how does the cluster process the request?

In the figure, the black box represents the slot information of all nodes in the cluster; it also contains a lot of other information.

The user sends a request for a key. The Redis node that receives it calculates the key's slot and finds the node responsible for that slot.

If the slot is served by the receiving node itself, the data for the key is returned directly.

Otherwise, the node replies with a MOVED redirection error that tells the client which node is correct.

The client then resends the command to that node.
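
The request flow described above can be simulated without a running cluster. The sketch below is a toy model, not the Redis implementation: `ToyNode`, `MovedError`, `build_cluster`, and `send` are hypothetical names, the three-node layout is assumed, and only SET and GET are modelled. The slot function relies on Python's `binascii.crc_hqx`, which with an initial value of 0 computes the same CRC16 variant.

```python
# Toy simulation of slot lookup and MOVED redirection (no real Redis involved).
import binascii

SLOT_COUNT = 16384


def key_slot(key: str) -> int:
    """CRC16 (XMODEM) of the key, modulo 16384."""
    return binascii.crc_hqx(key.encode("utf-8"), 0) % SLOT_COUNT


class MovedError(Exception):
    """Carries the same information as a 'MOVED <slot> <host:port>' reply."""

    def __init__(self, slot: int, address: str):
        super().__init__(f"MOVED {slot} {address}")
        self.slot = slot
        self.address = address


class ToyNode:
    def __init__(self, address: str, slots: range, cluster: dict):
        self.address = address
        self.slots = slots
        self.cluster = cluster  # shared slot map, as learned via ping/pong
        self.data = {}

    def execute(self, command: str, key: str, value=None):
        slot = key_slot(key)
        if slot not in self.slots:
            # Not our slot: point the client at the node that owns it.
            owner = next(n for n in self.cluster.values() if slot in n.slots)
            raise MovedError(slot, owner.address)
        if command == "SET":
            self.data[key] = value
            return "OK"
        return self.data.get(key)


def build_cluster() -> dict:
    """Assumed layout: three masters on ports 6379-6381."""
    cluster = {}
    layout = [
        ("127.0.0.1:6379", range(0, 5461)),
        ("127.0.0.1:6380", range(5461, 10923)),
        ("127.0.0.1:6381", range(10923, 16384)),
    ]
    for address, slots in layout:
        cluster[address] = ToyNode(address, slots, cluster)
    return cluster


def send(cluster: dict, entry_address: str, command: str, key: str, value=None):
    """Client side: ask one node and follow a single MOVED redirect if needed."""
    node = cluster[entry_address]
    try:
        return node.execute(command, key, value), node.address
    except MovedError as err:
        node = cluster[err.address]
        return node.execute(command, key, value), node.address


if __name__ == "__main__":
    cluster = build_cluster()
    print(send(cluster, "127.0.0.1:6379", "SET", "name", "kaka"))
    print(send(cluster, "127.0.0.1:6380", "GET", "name"))
```

The client never needs to know the slot map in advance. Any node can serve as the entry point, and a single redirect is enough once the map is stable.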

4. Configure the cluster

1. Modify the configuration file

You only need to pay attention to the following settings:

  • cluster-enabled yes: enables cluster mode
  • cluster-config-file nodes-6379.conf: the cluster configuration file
  • cluster-node-timeout 10000: node timeout in milliseconds, set to 10 s here to make testing easier

2. Build configuration files for 6 nodes and start them all

A handy way to create the other files is to substitute the port number with sed: sed 's/6379/6380/g' 6379-redis.conf > 6380-redis.conf

Create the configuration files for the 6 different ports in this way.
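
If you prefer to generate all six files in one go, the same substitution can be scripted. The sketch below is only an illustration: the `TEMPLATE` text is an assumed minimal configuration (the three cluster settings above plus `port`), and `render_config` and `write_configs` are hypothetical helper names. Your real 6379-redis.conf will contain more settings.

```python
# Sketch: generate one configuration file per port by text substitution,
# mirroring what the sed command does. TEMPLATE is an assumed minimal config.
from pathlib import Path

TEMPLATE = """port 6379
cluster-enabled yes
cluster-config-file nodes-6379.conf
cluster-node-timeout 10000
"""


def render_config(template: str, base_port: int, port: int) -> str:
    """Replace every occurrence of the base port with the target port."""
    return template.replace(str(base_port), str(port))


def write_configs(directory, ports, template: str = TEMPLATE, base_port: int = 6379):
    """Write <port>-redis.conf for each port and return the created paths."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for port in ports:
        path = directory / f"{port}-redis.conf"
        path.write_text(render_config(template, base_port, port))
        paths.append(path)
    return paths


if __name__ == "__main__":
    for created in write_configs("conf", range(6379, 6385)):
        print(created)
```

As with sed, this replaces every occurrence of 6379, so check the result if that number appears in the file for some other reason.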

Open any of the configuration files to check that the replacement succeeded.

To make the logs easy to follow, all nodes are started in the foreground. Check that the services started normally with the command ps -ef | grep redis

You can see an additional cluster marker after each process, which means they are all running as cluster nodes.

All nodes are now started. The command that creates the cluster depends on Ruby (Kaka uses Redis 4.0), so let's install it next.

3. Install ruby

Download: wget https://cache.ruby-lang.org/pub/ruby/2.7/ruby-2.7.1.tar.gz

Decompress: tar -xvzf ruby-2.7.1.tar.gz (adjust the file name to the version you downloaded)

Install: ./configure && make && make install (run inside the extracted directory; the three steps run one after another)

Check the Ruby and gem versions: ruby -v and gem -v

4. Start the cluster

The command for creating the cluster is /usr/local/redis/src/redis-trib.rb

Note that if you want to call redis-trib.rb directly, you need to link it into a bin directory with ln; otherwise you must run it as ./redis-trib.rb.

If you have followed the steps so far, an error appears at this point. Run gem install redis. Unfortunately, this fails as well. You then need to run yum install zlib-devel and yum install openssl-devel.

After the installation is complete, run ruby extconf.rb in /ruby-2.7.1/ext/openssl and in /ruby-2.7.1/ext/zlib, followed by make && make install in each.

Then run gem install redis again and it will succeed.

Now go back and run ./redis-trib.rb create --replicas 1 127.0.0.1:6379 127.0.0.1:6380 127.0.0.1:6381 127.0.0.1:6382 127.0.0.1:6383 127.0.0.1:6384

「Interpreting the output」

The command creates a cluster from the 6 nodes and assigns the hash slots. The last three nodes are configured as slaves of the first three. The output shows the hash slot information and node ID of each node. In the last step you need to type yes.

Look at the changes to the cluster configuration file in the data directory. Its main content is the slots assigned to each master node.

「Viewing the master node's log」

The key line here is Cluster state changed: ok, which means the cluster state is normal.

5. Setting and getting data in the cluster

Setting a value directly produces an error: the key name maps to slot 5798, and the reply gives the IP address and port of the node that holds it.

Instead, connect with redis-cli -c

Now, when you set the value, the client reports that it was redirected to slot 5798.

When you then get the data, the client switches nodes automatically.

5. Failover

1. A slave node goes offline

From the cluster creation output above, we know that port 6383 is the slave of 6379.

Take 6383 offline and watch the log of 6379.

6379 reports that the connection to 6383 is lost and marks it as fail, meaning it is unavailable. The cluster keeps working normally.

"Summary: a slave node going offline has no impact on the cluster"

When port 6383 comes back online, all nodes clear the fail mark.

2. A master node goes offline

Take the master node 6379 offline manually and watch the log of its slave 6383.

Node 6383 keeps trying to reconnect to 6379, 10 times in total. Why 10 times? Because of the cluster-node-timeout 10000 (10 s) we configured. The log shows one connection attempt per second until the timeout expires, and then the failover starts.

6383 wins the failover election, and the slave is promoted to master.

Now check the cluster's node information with the command cluster nodes.

You will find four master entries, one of which is offline.

「The original master 6379 comes back online」

After 6379 comes back online, all nodes again clear the fail mark.

The node information changes as well: 6379 is now a slave of 6383.

3. Add a new master node

Start two new instances on ports 6385 and 6386.

Run the add command ./redis-trib.rb add-node 127.0.0.1:6385 127.0.0.1:6379. This sends a meet message.

For add-node, the first argument is the IP and port of the new node, and the second is a node that already belongs to the cluster. The output shows that the new node has joined the cluster.

"Note: although 6385 is now a node in the cluster, it differs from the others: it has no data, that is, no hash slots"

Next, some hash slots need to be assigned to the new node. Once that is done, it becomes a real master node.

Run the command ./redis-trib.rb reshard 127.0.0.1:6385

It prompts for the number of hash slots to move and for the id of the receiving node. The last step asks which nodes to take the slots from; Kaka uses all.

Check with the command cluster nodes: node 6385 now holds hash slots in three ranges.

"The master node has been added; now configure a slave node 6386 for the master 6385"

Command: ./redis-trib.rb add-node --slave --master-id dcc0ec4d0c932ac5c35ae76af4f9c5d27a422d9f 127.0.0.1:6386 127.0.0.1:6385

master-id is the id of 6385. The first argument is the IP and port of the new node, and the second is the IP and port of the designated master node.

4. Manual failover

If you want to upgrade a master node in the cluster, you can manually fail it over to its slave first, so that cluster availability is not affected.

Run this command on the slave node: cluster failover

「Execution process」

Check the node information and you can see that node 6386 has become the master.

When the cluster failover command is sent to the slave, the slave sends a CLUSTERMSG_TYPE_MFSTART packet to the master, asking it to stop serving clients so that the replication offsets of the two can converge.

From this point the master being replaced no longer processes client requests. It sends its replication offset to the slave. Once the slave has reached that offset, it starts the failover and then notifies the master to switch its configuration. Clients blocked on the old master are then released and reconnect to the new master.

6. Principles of Failover

Above we tested failover: after the master went offline, its slave became the master. Now let's analyze that process.

1. From fault discovery to confirmation

Every node in the cluster periodically sends ping messages to the other nodes, and the receiver responds with pong.

If pings to a node keep failing for longer than cluster-node-timeout, that node is marked as pfail, that is, subjectively offline.

Does this offline status look familiar? It resembles how a sentinel judges whether the master node is abnormal. When a sentinel detects a problem with the master, it also marks the master as subjectively offline (s_down). I just realized I've drifted off topic, how embarrassing...

Since sentinels came up: when one sentinel thinks the master is abnormal, it marks it subjectively offline. But why should the other sentinels agree? One sentinel's word is not enough. They all try to connect to the suspect master, and when more than half of the sentinels consider it abnormal, the master is marked objectively offline.

Similarly, a cluster does not declare a node offline on the word of a single node. Nodes exchange state through gossip messages. Each node keeps collecting offline reports about the suspect node and stores them in its local list of offline reports for that node. When more than half of the master nodes holding slots have marked the node as subjectively offline, its status changes to objectively offline.

Finally, a fail message is broadcast to the cluster, telling all nodes to mark the failed node as objectively offline.

For example: node A pings node B, the communication fails, and A marks B as pfail. A then pings node C and includes B's pfail information, and C saves an offline report for B. When the number of offline reports exceeds half of the master nodes that hold hash slots, the node attempts to mark B as objectively offline.
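
The counting rule in this example can be written down as a small model. This is a sketch of the idea only, not the Redis data structures: `FailureDetector` and its methods are hypothetical names, and the expiry of old reports is left out.

```python
# Sketch: promote a node from pfail (subjective) to fail (objective) once
# more than half of the slot-holding masters have reported it offline.
# Report expiry is omitted; all names are hypothetical.


class FailureDetector:
    """Offline reports collected by one node about the other nodes."""

    def __init__(self, masters_with_slots):
        self.masters_with_slots = set(masters_with_slots)
        self.reports = {}  # suspect -> set of masters that reported it

    def add_report(self, reporter: str, suspect: str) -> None:
        # Only masters that hold hash slots count toward the majority.
        if reporter in self.masters_with_slots:
            self.reports.setdefault(suspect, set()).add(reporter)

    def is_objectively_offline(self, suspect: str) -> bool:
        needed = len(self.masters_with_slots) // 2 + 1  # strictly more than half
        return len(self.reports.get(suspect, set())) >= needed


if __name__ == "__main__":
    detector = FailureDetector({"A", "B", "C"})
    detector.add_report("A", "B")
    print(detector.is_objectively_offline("B"))  # one report out of three masters
    detector.add_report("C", "B")
    print(detector.is_objectively_offline("B"))  # two reports out of three masters
```

Using a set per suspect means that repeated reports from the same master are counted only once.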

2. Fault recovery (the slave finally gets its turn as master)

Once the faulty node is marked objectively offline, all of its slave nodes share the responsibility for fault recovery.

Fault recovery starts when a slave discovers, through its scheduled task, that its master is objectively offline.

『1. Qualification check』

Every slave checks when it was last connected to the master. A slave that has been disconnected for longer than cluster-node-timeout * cluster-slave-validity-factor is not eligible for failover.
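
As a worked example of this check, the sketch below applies the rule exactly as stated here. It assumes the 10000 ms timeout configured earlier and a validity factor of 10, and it treats a factor of 0 as "always eligible". The function name `is_eligible` is made up, and the real check may include further terms.

```python
# Sketch of the qualification check as described in the text:
# a slave is disqualified when it has been disconnected from its master
# for longer than cluster-node-timeout * cluster-slave-validity-factor.


def is_eligible(ms_since_last_contact: int,
                node_timeout_ms: int = 10000,
                validity_factor: int = 10) -> bool:
    """Return True if the slave may take part in the failover."""
    if validity_factor == 0:
        # Assumption: a factor of 0 disables the check entirely.
        return True
    return ms_since_last_contact <= node_timeout_ms * validity_factor


if __name__ == "__main__":
    # With a 10 s timeout and factor 10, the limit is 100 s.
    print(is_eligible(30_000))
    print(is_eligible(150_000))
```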

『2. Preparation for the election time』

First, why is there a preparation time before the election?

If several slaves pass the qualification check, each one delays its election by a different amount to express priority. The priority is based on the replication offset: the larger the offset, the less the slave lags behind the failed master, the shorter its delay, and the greater its chance of replacing the master.

The main purpose is to let the slave with the most up-to-date data start the election first.
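
A sketch of how such a delay could be computed follows. It uses the formula given in the Redis Cluster specification: a fixed 500 ms, plus a random 0 to 500 ms, plus 1000 ms per rank, where the rank is the number of other slaves with a larger replication offset. The function names, the offsets, and port 6387 are invented for the example.

```python
# Sketch: election delay derived from replication offset rank.
# delay = 500 ms + random(0..500) ms + rank * 1000 ms
import random


def replica_rank(name: str, offsets: dict) -> int:
    """Number of other slaves whose replication offset is strictly larger."""
    mine = offsets[name]
    return sum(1 for other, offset in offsets.items()
               if other != name and offset > mine)


def election_delay_ms(rank: int, rng=random) -> int:
    """Fixed part + random jitter + one extra second per rank."""
    return 500 + rng.randint(0, 500) + rank * 1000


if __name__ == "__main__":
    # Hypothetical offsets for two slaves of the same failed master.
    offsets = {"6383": 1200, "6387": 900}
    for name in offsets:
        rank = replica_rank(name, offsets)
        print(name, "rank", rank, "delay", election_delay_ms(rank), "ms")
```

The slave with the largest offset has rank 0 and therefore always waits less than one with rank 1, whatever the random part turns out to be.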

『3. Election voting』

In Redis Cluster, the slaves do not elect a leader among themselves; do not confuse this with Sentinel. Voting in the cluster is done by the master nodes that hold slots.

A slave of the failed node broadcasts a FAILOVER_AUTH_REQUEST packet to all master nodes holding slots to request their votes.

Once a master replies with a FAILOVER_AUTH_ACK vote, it cannot vote for another slave of the same master for a period of NODE_TIMEOUT * 2.

When a slave has obtained more than half of the votes, it proceeds to the failover phase.
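
The voting rules can be modelled in a few lines. This is a simplified sketch under stated assumptions: time is passed in explicitly as milliseconds, every master in the model guards the same failed master, and the class and function names (`VotingMaster`, `run_election`) are hypothetical.

```python
# Sketch: masters grant at most one vote per NODE_TIMEOUT * 2 window,
# and a slave wins with more than half of all slot-holding masters.


class VotingMaster:
    def __init__(self, name: str, node_timeout_ms: int = 10000):
        self.name = name
        self.node_timeout_ms = node_timeout_ms
        self.last_vote_ms = None  # time of the last granted vote, if any

    def request_vote(self, candidate: str, now_ms: int) -> bool:
        """Answer a FAILOVER_AUTH_REQUEST; True stands for FAILOVER_AUTH_ACK."""
        window = self.node_timeout_ms * 2
        if self.last_vote_ms is not None and now_ms - self.last_vote_ms < window:
            return False  # already voted recently, refuse other slaves
        self.last_vote_ms = now_ms
        return True


def run_election(candidate: str, now_ms: int, reachable_masters, total_masters: int) -> bool:
    """Collect votes and check for a majority of all slot-holding masters."""
    votes = sum(1 for master in reachable_masters
                if master.request_vote(candidate, now_ms))
    return votes >= total_masters // 2 + 1


if __name__ == "__main__":
    # Three masters hold slots; 6379 has failed, so two can still vote.
    masters = [VotingMaster("6380"), VotingMaster("6381")]
    print(run_election("6383", 0, masters, total_masters=3))
    print(run_election("6387", 1000, masters, total_masters=3))
```

The majority is counted against all masters that hold slots, including the failed one, which is why two votes out of three are required here.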

『4. Failover』

The elected slave stops replicating and becomes a master.

It removes the slots from the failed node and assigns them to itself.

It broadcasts its own pong message to the cluster, announcing that it is now a master and has taken over the failed node's slots.

The SSH terminal background you asked for!

A Redis Sentinel article took two nights to write, yet your attention was not on the article itself! The author is heartbroken.

To meet everyone's request, Kaka will reluctantly explain how to set up that dazzling background.

The tool Kaka uses is Xshell. Open the tool's Options, then tick the transparent window setting to adjust Xshell's transparency.

Yes, you guessed right: what you see is simply the desktop background. Ready to set it up? How about coming back to read the article afterwards? Kaka also welcomes experts from all fields to suggest technical points and correct mistakes.

Persistence in learning, in blogging, and in sharing is the belief Kaka has upheld throughout his career. I hope Kaka's articles can bring you a little help on the vast Internet. See you in the next issue.
