❝Sentinel is mainly a solution for automatic recovery from single-node failures. Cluster is mainly a solution to single-node capacity and concurrency limits, and it scales linearly. This article uses the official Redis Cluster. At the end of the article, you can also learn how to set up the SSH client background you want!
❞
❝Kaka put together a roadmap for an interview guide and planned to write articles following it. Later I found that some knowledge points were missing, so I will add them as I go. I also look forward to you, my readers, helping to add to it. See you in the comment area!❞
The cluster solves the single-machine memory limit and concurrency problems of master-slave replication. Suppose your cloud server has 256GB of memory. When this limit is reached, Redis can no longer provide service. At the same time, with that much data, the volume of writes will be very large, which can easily cause buffer overflows and endless full resynchronization from the replicas, leaving master and slave unable to work properly.
So we need to change the single-machine master-slave setup into a many-to-many mode in which all master nodes are connected together and communicate with each other. This approach not only spreads the memory load across machines, but also distributes requests and improves the availability of the system.
As shown in the figure: when there are a large number of write requests, the commands are no longer sent to a single master node; instead they are distributed across the master nodes, sharing the memory load and avoiding overloading a single node.
So how are commands routed and stored? We need to dig into the cluster storage structure.
How does this improve the availability of the system? Look at the picture below: when master1 goes down, the impact on the system is limited, and normal service can still be provided.
At this point someone will ask: how does the cluster keep working when master1 is down? That question is answered in the failover section below, and the details are explained in the principles chapter.
On a single machine, storage is simple: after the user issues a request, the key is stored directly in the node's own memory. The cluster's storage structure is not that simple. First, let's see what happens when the user issues a command on a key.
The question then becomes: in which Redis node's storage space should this key be stored?
In fact, once the cluster is started, Redis divides the storage space into 16384 slots, and each host holds a portion of them.
Note that the numbers I gave each Redis storage area correspond to small storage spaces (the technical term is "hash slot"). You can think of them as room numbers inside a building: the building is the entire Redis storage space, and each room number is one slot. Each slot holds a certain range of keys, not just the single position obtained after taking the modulus in the picture above. The 28 pointed to by the arrow means that 28 is stored in this area; the same room may also hold 29, 30, 31, and so on.
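You don't have to compute CRC16 yourself to see where a key lands; the cluster can tell you. A minimal check, assuming a node is listening on 127.0.0.1:6379 (the key name is just an example):

```bash
# ask the cluster which of the 16384 hash slots a key maps to
# (internally: slot = CRC16(key) mod 16384)
redis-cli -p 6379 cluster keyslot user:1000
```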
The question at this point is: what do we do if we add or remove a machine? Let the pictures speak; where a picture can explain it, I try not to use words.
After a new machine is added, some slots are moved to it from the other three storage spaces. You can set how many slots you want to place on the new machine.
Similarly, after a machine is removed, its slots are reallocated to the remaining machines. Just as with adding a node, you can specify which node receives the slots.
So adding or removing a node simply means changing where the slots are stored. Now that we understand the cluster's storage structure, we need to explain another question: how is communication inside the cluster designed? When a request for a key arrives, where is the data fetched from? We will look at this question below.
Each node in the cluster periodically sends ping messages to the other nodes, and the other nodes return pong as a response. After a period of time, every node knows the slot information of all the nodes in the cluster.
There are three nodes in the figure below, so the 16384 hash slots are divided into three ranges: 0-5500, 5501-11000, and 11001-16383.
When a user initiates a key request, how does the cluster process the request?
The black boxes in the picture below represent the slot information of all nodes in the cluster, along with a lot of other information.
As shown in the figure, the user issues a request for a key. After Redis receives it, it calculates the key's slot position and finds the corresponding node based on that slot.
If the slot being accessed belongs to the node itself, the data corresponding to the key is returned directly.
Otherwise, the node replies with a MOVED redirection error, returning the correct node to the client.
The client then resends the key command to that node.
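You can see the MOVED reply yourself by talking to a node without the cluster-aware flag; a sketch, where the slot number and target address are illustrative:

```bash
# without -c, redis-cli surfaces the redirection instead of following it
redis-cli -p 6379 set user:1000 kaka
# (error) MOVED 5474 127.0.0.1:6380    <- slot/address illustrative
```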
You only need to pay attention to the configuration items circled in the figure.
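For reference, these are the cluster-related directives that matter in each file; a minimal sketch using this article's ports (the timeout value here is an assumption, in milliseconds):

```
# 6379-redis.conf (excerpt)
port 6379
cluster-enabled yes                   # run this instance in cluster mode
cluster-config-file nodes-6379.conf   # state file maintained by Redis itself
cluster-node-timeout 10000            # ms of unreachability before a node is considered failing
```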
Here is a command that makes it easy to produce a modified copy of the file: sed 's/6379/6380/g' 6379-redis.conf > 6380-redis.conf
Create the configuration files for the 6 different ports in this way.
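To avoid running sed five times by hand, a small loop does the job; a sketch assuming 6379-redis.conf sits in the current directory:

```bash
# generate the configs for ports 6380-6384 from the 6379 template
for port in 6380 6381 6382 6383 6384; do
  sed "s/6379/${port}/g" 6379-redis.conf > "${port}-redis.conf"
done
```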
Open any of the configuration files to check whether the replacement succeeded. To make the log information easy to view, start everything in the foreground. Then check whether the services started normally by executing the command ps -ef | grep redis.
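Starting and checking might look like this; a sketch assuming redis-server is on the PATH, with one terminal per instance:

```bash
# in six separate terminals (or a terminal multiplexer), one per port:
redis-server 6379-redis.conf
# ...repeat for 6380-6384, then verify that all six are up:
ps -ef | grep redis
```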
You can see that there is an extra [cluster] flag after startup, which means they are all running as cluster nodes. All nodes are now started. The cluster creation command depends on Ruby (Kaka uses Redis version 4.0), so next let's install it together.
Execute the command: wget https://cache.ruby-lang.org/pub/ruby/2.7/ruby-2.7.1.tar.gz
Decompress it (adjust for the version you downloaded): tar -xvzf ruby-2.7.1.tar.gz
Install: ./configure && make && make install. These three steps can be completed in one go; note that they are chained with && rather than |, so each step runs only if the previous one succeeded.
Check the Ruby and gem versions: ruby -v and gem -v
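Put together, the Ruby installation looks like this; a sketch using the version from this article:

```bash
wget https://cache.ruby-lang.org/pub/ruby/2.7/ruby-2.7.1.tar.gz
tar -xvzf ruby-2.7.1.tar.gz
cd ruby-2.7.1
./configure && make && make install   # chained with && so a failure stops the build
ruby -v && gem -v                     # verify the installation
```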
The cluster creation script is located at /usr/local/redis/src/redis-trib.rb
Note that if you want to use the redis-trib.rb command directly, you need to link it into a bin directory with ln; otherwise you must invoke it as ./redis-trib.rb.
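The link might be created like this; a sketch assuming the Redis source tree sits under /usr/local/redis:

```bash
# make redis-trib.rb runnable from anywhere
ln -s /usr/local/redis/src/redis-trib.rb /usr/local/bin/redis-trib.rb
```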
If you have been following along, an error will appear here when you execute gem install redis.
Unfortunately, another error appears here too. You then need to install the missing development libraries: yum install zlib-devel and yum install openssl-devel.
After the installation is complete, execute ruby extconf.rb in ruby-2.7.1/ext/openssl and ruby-2.7.1/ext/zlib respectively, and in each directory run make && make install.
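Consolidated, the recovery steps look like this; a sketch following the article's procedure (depending on your Ruby version you may need to adjust the generated Makefiles):

```bash
yum install -y zlib-devel openssl-devel
cd ruby-2.7.1/ext/openssl && ruby extconf.rb && make && make install
cd ../zlib && ruby extconf.rb && make && make install
```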
Then execute gem install redis again and it will succeed. At this point, go back and execute ./redis-trib.rb create --replicas 1 127.0.0.1:6379 127.0.0.1:6380 127.0.0.1:6381 127.0.0.1:6382 127.0.0.1:6383 127.0.0.1:6384
「Message Interpretation」
redis-trib creates the cluster and assigns hash slots to the 6 nodes, configuring the last three nodes as slaves of the first three. It prints each node's hash slot information and node ID; in the last step you need to enter yes to confirm. You can then view the changes in the configuration files in the data directory; the main information recorded there is the slots assigned to each master node.
「View the running log of the master node」
The key message given here is Cluster state changed: ok, meaning the cluster status is normal.
To connect you need to use the command redis-cli -c. With it, when you fetch data the client automatically switches to the correct node.
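A quick session sketch (the key name, slot, and redirected address are illustrative):

```bash
redis-cli -c -p 6379
# 127.0.0.1:6379> set user:1000 kaka
# -> Redirected to slot [5474] located at 127.0.0.1:6380   <- illustrative
# OK
```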
「Manually take slave node 6383 offline」 6379 will report that the connection to 6383 is lost and mark it as fail, meaning it is unavailable. Even so, the cluster still works normally.
「Summary: a slave node going offline has no impact on the cluster」 When port 6383 comes back online, all nodes clear its fail mark.
「Manually take master node 6379 offline and check the log of slave node 6383」
At this point, node 6383 keeps trying to reconnect to 6379, 10 times in total. Why 10 times?
It is determined by the parameter we configured, cluster-node-timeout 10. The message given here says it retries the connection once per second, until the time expires and failover begins.
This time, 6383 won the failover election, and the slave turned the tables to become a master node. Now check the cluster's node information with the command cluster nodes.
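The check might look like this; a sketch (each line of the real output carries the node ID, address, flags, master ID, and served slot ranges):

```bash
redis-cli -p 6383 cluster nodes
# look for the 'master' flag on 6383 and 'master,fail' on the downed 6379
```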
You will find that there are now four master nodes, but one of them is offline. 「The original master node 6379 comes back online」
After 6379 comes back online, all nodes likewise clear its fail mark.
The node information also changes: 6379 now becomes a slave of 6383.
To add a new node 6385, execute the command ./redis-trib.rb add-node 127.0.0.1:6385 127.0.0.1:6379; what is sent here is a meet message.
「Note: although 6385 has become a node in the cluster, it is different from the other nodes: it has no data, that is, no hash slots」 Next we need to allocate some of the cluster's hash slots to this new node. Once the allocation is complete, it becomes a real master node.
Execute the command ./redis-trib.rb reshard 127.0.0.1:6385. The tool then asks interactively: how many slots to move, the receiving node's ID (the ID of 6385), and the source nodes to take them from (enter all to draw evenly from every existing master).
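The interactive exchange looks roughly like this (prompt wording may vary slightly between versions; the slot count and ID are illustrative):

```bash
./redis-trib.rb reshard 127.0.0.1:6385
# How many slots do you want to move (from 1 to 16384)? 1000
# What is the receiving node ID? <node ID of 6385>
# Source node #1: all    <- take slots evenly from all existing masters
```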
Use the command cluster nodes to check: node 6385 now holds hash slots in three ranges.
「The master node has been added; now you need to configure a slave node, 6386, for the master node 6385」
Command: ./redis-trib.rb add-node --slave --master-id dcc0ec4d0c932ac5c35ae76af4f9c5d27a422d9f 127.0.0.1:6386 127.0.0.1:6385
master-id is the ID of 6385. The first address parameter is the new node's ip:port, and the second specifies the master node's ip:port.
If you want to upgrade a master node in the cluster, you can manually fail over to its slave node to avoid affecting cluster availability.
Execute the command on the slave node: cluster failover
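Concretely, something like this; a sketch assuming 6386 is the slave to be promoted:

```bash
# run against the slave that should take over from its master
redis-cli -p 6386 cluster failover
```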
「Execution process」
Looking at the node information, you can see that node 6386 has become a master node.
When the cluster failover command is sent to a slave node, the slave sends a CLUSTERMSG_TYPE_MFSTART packet to its master, requesting that the master stop serving clients so that the replication offsets of the two become consistent.
During this time, clients are blocked from the paused master node. The master sends its replication offset to the slave; once the slave has caught up to that offset, the failover begins, and the slave notifies the master to switch configuration. When clients are unblocked on the old master, they reconnect to the new master node.
Above we tested failover: after the master went offline, the slave node became the master. Next, let's analyze that process.
Each node in the cluster periodically sends ping messages to the other nodes, and the receiver responds with pong.
If pings to a node keep failing for the duration of cluster-node-timeout, the pinged node is marked as pfail, that is, subjectively offline.
Does this offline state look familiar? Yes, it is somewhat similar to how Sentinel judges whether the master node is abnormal: when one sentinel detects a problem with the master, it also marks it subjectively offline (s_down). I suddenly realize I am off topic, embarrassing...
While we are on the subject of sentinels: when one sentinel thinks the master node is abnormal, it marks it subjectively offline, but the other sentinels cannot just take its word for it. They all try to connect to the suspect master themselves, and when more than half of the sentinels consider it abnormal, they take the master objectively offline.
Similarly, the cluster does not declare a node offline just because one node says so. Nodes spread state via gossip messages: the nodes in the cluster keep collecting failure reports about the faulty node and store them locally as offline reports. When more than half of the master nodes holding slots have marked the node subjectively offline, its state changes to objectively offline.
Finally, a fail message is broadcast to the cluster, notifying all nodes to mark the failed node as objectively offline.
For example: node A pings node B and, after the communication fails, marks B as pfail. Node A then keeps pinging node C, carrying B's pfail information, and node C stores it as an offline report about B. When the number of offline reports exceeds half of the number of masters holding hash slots, the cluster attempts to take B objectively offline.
Once the faulty node is marked objectively offline, its slave nodes take on the responsibility of fault recovery. Fault recovery is performed by the slave nodes' scheduled tasks: when a slave discovers that its master is objectively offline, it executes the fault recovery process.
『1. Qualification check』
Every slave node checks the time since it was last connected to its master. If the disconnection time is greater than cluster-node-timeout * cluster-slave-validity-factor, the node is not eligible for failover.
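A worked example, with illustrative values in this article's configuration style:

```
# a slave disconnected from its master for more than
# 10000 ms * 10 = 100 seconds is not eligible to initiate failover
cluster-node-timeout 10000
cluster-slave-validity-factor 10
```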
『2. Election delay preparation』
Let's first talk about why there is a delay before the election.
If multiple slave nodes pass the qualification check, they need different election delays so that priority can be respected. Priority here is based on the replication offset: the larger the offset, the smaller the gap to the failed master, the shorter the delay, and the greater the chance of replacing the master.
Its main purpose is to ensure that the node with the best data consistency initiates the election first.
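For reference, the Redis Cluster design describes the delay roughly as follows (exact constants may vary between versions):

```
DELAY = 500 ms + random(0..500) ms + SLAVE_RANK * 1000 ms
# SLAVE_RANK is 0 for the slave with the most up-to-date replication
# offset, 1 for the next best, and so on
```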
「3. Election voting」
Redis Cluster's voting mechanism does not use the slave nodes to elect a leader; be careful not to confuse this with Sentinel. The cluster's voting is carried out by the master nodes holding slots.
The slave nodes of the failed node broadcast a FAILOVER_AUTH_REQUEST packet to all masters holding slots to request their votes.
When a master replies with a FAILOVER_AUTH_ACK vote, it cannot vote for any other slave within NODE_TIMEOUT * 2.
Once a slave node obtains more than half of the votes (for example, at least two of three voting masters), it proceeds to the failover phase.
『4. Failover』
The slave that wins the election cancels replication and becomes a master itself. It revokes the failed node's slots and assigns those slots to itself. It then broadcasts its own pong message to the cluster, notifying the other nodes that it has changed role and taken over the failed node's slot information.
A Redis article that took two nights to complete, and yet your focus is not on the article itself. Ah! The editor is heartbroken.
To meet everyone's requests, Kaka will reluctantly explain how to set up a dazzling background. The tool Kaka uses is Xshell. Open the tool's options, and you can set the Xshell transparency by ticking the transparent window option. Yes! You read that right: that is simply the desktop showing through. Ready to go set it up? How about coming back to finish the article after you set it? Kaka also needs experts from all walks of life to contribute technical points and correct mistakes.
❝Persisting in learning, persisting in blogging, and persisting in sharing are the beliefs Kaka has always upheld throughout his career. I hope that Kaka's articles on the vast Internet can bring you a little help. See you in the next issue.
❞
Recommendation: "redis tutorial"