What is the principle of Sentinel failover in Redis?-Redis-php.cn


王林

May 27, 2023 am 10:55 AM


What is Sentinel?

Sentinel is Redis's high-availability solution. The master-slave replication we discussed earlier is the foundation of high availability, but with replication alone, failover has to be performed manually. Sentinel solves this problem: when the master node in a master-slave setup fails, Sentinel automatically detects the failure and completes the failover, giving Redis true high availability. In a Sentinel cluster, each Sentinel monitors the status of all Redis servers and of the other Sentinel nodes. This lets it detect failures promptly and carry out the failover, which keeps Redis highly available.

Building a Sentinel Cluster

Although Sentinel is essentially a Redis server, it provides different functions from an ordinary Redis server. Sentinel has a distributed architecture: to keep Redis highly available, Sentinel must first be highly available itself. So a Sentinel deployment needs at least three instances, preferably an odd number, because the failover process involves voting.
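The voting requirement explains why an odd count is preferred. The sketch below assumes the rule described later in the leader election step, where a leader needs votes from more than half of the Sentinels. It counts how many Sentinel failures each cluster size can survive. The function names are illustrative and are not part of Redis.

```python
# Minimal sketch: how many Sentinels a deployment can lose and still elect a leader,
# assuming a leader needs votes from more than half of all Sentinels.

def majority(n: int) -> int:
    """Smallest vote count that is more than half of n Sentinels."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Sentinels that can fail while the survivors can still form a majority."""
    return n - majority(n)


for n in range(1, 7):
    print(f"{n} sentinels: majority={majority(n)}, can lose {tolerated_failures(n)}")
```

Going from 3 to 4 Sentinels raises the required majority from 2 to 3 but still tolerates only one failure. The extra even-numbered instance therefore adds cost without adding resilience.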

The Redis GitHub repository contains a file called sentinel.conf, which you can use as a template for the Sentinel configuration. Alternatively, you can start from a redis.conf file and add the Sentinel-related settings to it.

Sentinel has only a few configuration items of its own. The main ones are:

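# Port number. By convention it is the Redis instance port + 20000, so we follow that rule
port 26379
# Run as a daemon
daemonize yes
# Log file location. This is very important: the log shows how a failover proceeds
logfile "26379.log"
# Monitor a Redis master named mymaster (the name is arbitrary) at IP address 127.0.0.1, port 6379.
# The trailing 2 is the quorum: at least two Sentinels must consider the master down before a
# failover happens; otherwise the master is not considered to have failed
sentinel monitor mymaster 127.0.0.1 6379 2
# How long (in milliseconds) a Sentinel waits for a reply before it considers the server down
sentinel down-after-milliseconds mymaster 30000
# After a failover, the maximum number of slaves that can resync with the new master at the same time.
# A smaller number means all slaves take longer to finish resyncing;
# a larger number puts more load on the master
sentinel parallel-syncs mymaster 1
# Failover timeout in milliseconds
sentinel failover-timeout mymaster 180000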
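To make the meaning of each field explicit, here is a small Python parser for the per-master `sentinel` lines above. It is a hypothetical helper for illustration only: `parse_sentinel_conf` and `MonitoredMaster` are made-up names, not part of Redis. It understands only the four directives shown and assumes the `monitor` line comes before the other options for the same master.

```python
from dataclasses import dataclass


@dataclass
class MonitoredMaster:
    name: str
    host: str
    port: int
    quorum: int
    # Assumed defaults: simply the values used in the example configuration.
    down_after_ms: int = 30000
    parallel_syncs: int = 1
    failover_timeout_ms: int = 180000


# Per-master integer options and the dataclass field each one fills in.
INT_OPTIONS = {
    "down-after-milliseconds": "down_after_ms",
    "parallel-syncs": "parallel_syncs",
    "failover-timeout": "failover_timeout_ms",
}


def parse_sentinel_conf(text: str) -> dict:
    """Collect `sentinel ...` directives into one MonitoredMaster per master name."""
    masters = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        parts = line.split()
        if len(parts) < 4 or parts[0] != "sentinel":
            continue  # not a per-master sentinel directive (port, logfile, ...)
        option, name = parts[1], parts[2]
        if option == "monitor" and len(parts) == 6:
            # sentinel monitor <name> <host> <port> <quorum>
            masters[name] = MonitoredMaster(name, parts[3], int(parts[4]), int(parts[5]))
        elif option in INT_OPTIONS and name in masters:
            setattr(masters[name], INT_OPTIONS[option], int(parts[3]))
    return masters


CONF = """
port 26379
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
"""

master = parse_sentinel_conf(CONF)["mymaster"]
print(master)
```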

Apart from port and logfile, which differ for each Sentinel instance, all other configuration items are the same. After editing the configuration, start a Sentinel with the ./redis-sentinel sentinel.conf command, much like starting a Redis instance. Because a Sentinel is also a Redis instance, we can run ./redis-cli -p 26379 info sentinel to view the current Sentinel information, as shown in the figure below:

Figure: Sentinel information
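Since only `port` and `logfile` change between instances, the per-instance files can be generated from one shared template. This is a minimal sketch. The extra ports 26380 and 26381 and the `sentinel-<port>.conf` file names are assumptions for a three-node deployment on one host. Each instance needs its own file anyway, because Sentinel rewrites its configuration file at runtime.

```python
# Hypothetical helper: render one sentinel.conf per instance. Only `port` and
# `logfile` change; every other line is shared between the instances.
SHARED_LINES = """daemonize yes
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
"""


def render_sentinel_conf(port: int) -> str:
    """Return the config text for the Sentinel listening on `port`."""
    return f'port {port}\nlogfile "{port}.log"\n' + SHARED_LINES


# Assumed ports for a three-node deployment on one host.
configs = {port: render_sentinel_conf(port) for port in (26379, 26380, 26381)}

for port, text in configs.items():
    # Each text would be saved to its own file and started with ./redis-sentinel <file>.
    print(f"### sentinel-{port}.conf")
    print(text)
```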

Question: Sentinel is configured with only the master server, so how does it discover the slave servers and the other Sentinels?

Slave servers are discovered by asking the master: Sentinel sends the INFO command to the master and reads the list of slaves from the reply. Other Sentinel nodes are discovered through publish/subscribe, by exchanging messages on the __sentinel__:hello channel. This takes two steps:

1. Every 2 seconds, each Sentinel uses publish/subscribe to send a message to the __sentinel__:hello channel of every master and slave server it monitors. The message contains the Sentinel's IP address, port number and run ID (runid).

2. Each Sentinel subscribes to the __sentinel__:hello channel of every master and slave server it monitors and looks for Sentinels it has not seen before. When a Sentinel discovers a new Sentinel, it adds it to its list of known Sentinels that monitor the same master server.
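Both discovery paths can be modeled in a few lines of Python. This is a simplified sketch, not Sentinel's actual code. `parse_replicas` reads the `slaveN:` lines that a master reports under INFO replication, and `SentinelPeers` keeps the list of known Sentinels described in step 2. The class and function names, the run IDs and the sample INFO text are all made up for illustration. The sketch only reads the first three fields of a hello message (IP, port, run ID); the real message carries more.

```python
def parse_replicas(info_replication: str) -> list:
    """Return (ip, port) for each `slaveN:` line in a master's INFO replication output."""
    replicas = []
    for line in info_replication.splitlines():
        key, _, value = line.partition(":")
        if key.startswith("slave") and key[len("slave"):].isdigit():
            fields = dict(item.split("=", 1) for item in value.split(","))
            replicas.append((fields["ip"], int(fields["port"])))
    return replicas


class SentinelPeers:
    """One Sentinel's list of other Sentinels monitoring the same master (simplified)."""

    def __init__(self, my_runid: str):
        self.my_runid = my_runid
        self.known = {}  # runid -> (ip, port)

    def on_hello(self, message: str) -> bool:
        """Handle one __sentinel__:hello message; True if it revealed a new Sentinel."""
        # Assumption: the first three comma-separated fields are IP, port and run ID.
        ip, port, runid = message.split(",")[:3]
        if runid == self.my_runid or runid in self.known:
            return False  # our own announcement, or a Sentinel we already know
        self.known[runid] = (ip, int(port))
        return True


INFO = (
    "# Replication\r\n"
    "role:master\r\n"
    "connected_slaves:2\r\n"
    "slave0:ip=127.0.0.1,port=6380,state=online,offset=42,lag=0\r\n"
    "slave1:ip=127.0.0.1,port=6381,state=online,offset=42,lag=1\r\n"
)
replicas = parse_replicas(INFO)

peers = SentinelPeers("runid-a")
first = peers.on_hello("127.0.0.1,26380,runid-b")  # a Sentinel we have not seen
again = peers.on_hello("127.0.0.1,26380,runid-b")  # already known
own = peers.on_hello("127.0.0.1,26379,runid-a")    # our own hello message
```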

Sentinel failover principle

Failover is Sentinel's main job, and the logic behind it is quite complex; see the relevant books for the full implementation details. I have summarized Sentinel's failover in the following three points:

1. Monitoring the servers

Every second, each Sentinel node sends a PING command to the master node, the slave nodes and the other Sentinel nodes as a heartbeat. The replies tell it whether each server is alive.

Each node replies to the PING. Only the following three replies count as valid:

+PONG
-LOADING
-MASTERDOWN

If a node sends no valid reply within the period set by the down-after-milliseconds option in the Sentinel configuration file, Sentinel marks the server as offline. This is called subjectively down: only this one Sentinel considers the server offline.
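The heartbeat rule can be sketched as a small state machine. This is an illustrative model, not Sentinel's implementation. `HeartbeatMonitor` and its methods are hypothetical names, and time is passed in explicitly in milliseconds instead of coming from the 1-second PING timer. Only the three reply types listed above reset the clock. A server becomes subjectively down once no valid reply has arrived for longer than down-after-milliseconds.

```python
# Simplified model of one Sentinel's heartbeat bookkeeping for one server.
VALID_REPLY_PREFIXES = ("+PONG", "-LOADING", "-MASTERDOWN")


def is_valid_ping_reply(reply: str) -> bool:
    """True for the three reply types that count as 'alive'."""
    return reply.startswith(VALID_REPLY_PREFIXES)


class HeartbeatMonitor:
    def __init__(self, down_after_ms: int, now_ms: int):
        self.down_after_ms = down_after_ms
        self.last_valid_reply_ms = now_ms  # start the clock when monitoring begins

    def on_reply(self, reply: str, now_ms: int) -> None:
        if is_valid_ping_reply(reply):
            self.last_valid_reply_ms = now_ms  # invalid replies do not reset the clock

    def subjectively_down(self, now_ms: int) -> bool:
        """Down once no valid reply has arrived for longer than down-after-milliseconds."""
        return now_ms - self.last_valid_reply_ms > self.down_after_ms


monitor = HeartbeatMonitor(down_after_ms=30000, now_ms=0)
monitor.on_reply("+PONG", now_ms=1000)
monitor.on_reply("-ERR some other error", now_ms=2000)  # not a valid reply
print(monitor.subjectively_down(now_ms=30000))  # False: 29 s since the last valid reply
print(monitor.subjectively_down(now_ms=31001))  # True: just over 30 s
```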

If the subjectively down server is a master, the Sentinel asks the other Sentinels that also monitor this master whether they also consider it offline, to confirm that it really is down. When enough Sentinels (at least the quorum set in the sentinel monitor line) agree that the master is offline, the Sentinel marks it as objectively down. The master is then treated as truly offline, and a failover is performed on it.
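The quorum check fits in a single function. In this sketch, `objectively_down` is a hypothetical helper, not Sentinel's source. The asking Sentinel counts its own verdict plus every peer that reported the master as down, then compares the total with the quorum from `sentinel monitor` (2 in the example configuration). The run IDs are made up.

```python
def objectively_down(i_see_sdown: bool, peer_reports: dict, quorum: int) -> bool:
    """Decide whether a master is objectively down.

    i_see_sdown:  whether this Sentinel itself considers the master subjectively down
    peer_reports: run ID of each other Sentinel -> whether it reported the master as down
    quorum:       the last argument of `sentinel monitor`
    """
    if not i_see_sdown:
        return False  # the check only starts after our own subjective verdict
    agreeing = 1 + sum(1 for is_down in peer_reports.values() if is_down)  # count ourselves
    return agreeing >= quorum


print(objectively_down(True, {"runid-b": True, "runid-c": False}, quorum=2))   # True
print(objectively_down(True, {"runid-b": False, "runid-c": False}, quorum=2))  # False
```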

2. Electing a Sentinel leader to carry out the failover

The failover is not performed by all Sentinels together. Instead, one Sentinel is elected leader and carries it out. So once the master server is marked objectively down, the Sentinels elect a leader using a leader-election algorithm based on Raft. The general rules Redis follows for Sentinel leader election are:

All online Sentinels are eligible to be elected leader.
When a Sentinel finds that the master server is objectively down, it sends the SENTINEL is-master-down-by-addr command, carrying its own run ID, to the other Sentinel nodes, asking them to vote for it as leader.
A Sentinel that receives this request follows a first-come-first-served rule: if it has not yet agreed to another Sentinel's request in the current election round (epoch), it grants its vote; otherwise it refuses.
A Sentinel that receives votes from more than half of the Sentinels (and at least as many as the configured quorum) becomes the leader.
If no leader is elected within the allotted time, the Sentinels wait for a while and hold a new election, repeating until a leader is elected.
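The voting rules above can be checked with a toy model. This sketch is not Sentinel's election code. `Voter` and `elect_leader` are hypothetical names, and each Sentinel is assumed to receive the vote requests in a known order. Each Sentinel votes at most once per epoch, for the first request it sees. A candidate wins only with a majority of all Sentinels and at least the quorum. Real Sentinel also manages epochs and retry timing, which this sketch leaves out.

```python
from collections import Counter
from typing import Dict, List, Optional


class Voter:
    """Vote bookkeeping of one Sentinel: at most one vote per epoch, first come, first served."""

    def __init__(self) -> None:
        self.voted_epoch = -1
        self.voted_for: Optional[str] = None

    def request_vote(self, candidate: str, epoch: int) -> bool:
        if epoch > self.voted_epoch:  # first request seen in this epoch gets the vote
            self.voted_epoch, self.voted_for = epoch, candidate
        return epoch == self.voted_epoch and self.voted_for == candidate


def elect_leader(requests_seen: Dict[str, List[str]], epoch: int, quorum: int) -> Optional[str]:
    """requests_seen maps each Sentinel to the vote requests it received, in arrival order.

    Returns the leader's run ID, or None when nobody wins (the election is retried later).
    """
    tally: Counter = Counter()
    for arrivals in requests_seen.values():
        voter = Voter()
        for candidate in arrivals:
            voter.request_vote(candidate, epoch)
        if voter.voted_for is not None:
            tally[voter.voted_for] += 1
    if not tally:
        return None
    winner, votes = tally.most_common(1)[0]
    needed = max(len(requests_seen) // 2 + 1, quorum)  # majority, and at least the quorum
    return winner if votes >= needed else None


# Three Sentinels; a and b both ask for votes, and each votes for itself first.
print(elect_leader({"a": ["a", "b"], "b": ["b", "a"], "c": ["a", "b"]}, epoch=1, quorum=2))
# Four Sentinels split 2-2: nobody reaches the majority of 3.
print(elect_leader({"a": ["a", "b"], "b": ["b", "a"], "c": ["a", "b"], "d": ["b", "a"]}, epoch=1, quorum=2))
```

The first call elects `a` with 2 of 3 votes. The second returns `None`, which corresponds to the "re-elect after a period of time" case.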

3. Elect the new master server to complete the failover

The elected Sentinel leader carries out the rest of the failover, which has three main steps:

(1) Select a new master server

The leader picks one of the offline master's slave servers and converts it into the master server. The new master is chosen as follows:

Slave servers of the failed master that are marked subjectively down, that are disconnected, or that last replied to PING more than five seconds ago are eliminated.
Slave servers whose connection to the failed master has been broken for longer than ten times the down-after-milliseconds setting are eliminated.
The remaining slave servers are ranked. The one with the highest priority (the lowest replica-priority value, formerly slave-priority; a value of 0 means the slave is never promoted) is preferred. Among slaves with equal priority, the one with the largest replication offset becomes the new master server. If the replication offsets are unavailable or equal, the slave with the smallest run ID becomes the new master server.

The leader then sends the SLAVEOF NO ONE command to the selected slave server, turning it into the master node.
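The selection rules can be sketched as a filter followed by a sort. This is a simplified model: `Replica` and `select_new_master` are hypothetical names, and the health fields are assumed to be already known. The sketch also applies the replica-priority check that Redis performs before comparing offsets (lower value wins, 0 means never promote). Its default of 100 matches Redis's default replica-priority.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    runid: str
    subjectively_down: bool
    disconnected: bool
    ms_since_ping_reply: int    # time since its last reply to PING
    master_link_down_ms: int    # how long its link to the failed master has been down
    repl_offset: int            # replication offset
    priority: int = 100         # replica-priority; 0 means "never promote"


def select_new_master(replicas: List[Replica], down_after_ms: int) -> Optional[Replica]:
    """Filter out unhealthy slaves, then rank by priority, offset and run ID."""
    candidates = [
        r for r in replicas
        if not r.subjectively_down
        and not r.disconnected
        and r.ms_since_ping_reply <= 5000                # replied to PING within 5 s
        and r.master_link_down_ms <= 10 * down_after_ms  # link not down for too long
        and r.priority != 0
    ]
    if not candidates:
        return None
    # Lowest priority value first, then largest offset, then smallest run ID.
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.runid))


slaves = [
    Replica("r1", False, False, 800, 0, repl_offset=1000),
    Replica("r2", False, False, 900, 0, repl_offset=1200),
    Replica("r3", True, False, 9000, 0, repl_offset=1500),  # subjectively down: eliminated
    Replica("r4", False, False, 700, 0, repl_offset=1200),  # ties r2; larger run ID loses
]
new_master = select_new_master(slaves, down_after_ms=30000)
print(new_master.runid)  # r2
```

The leader would then send SLAVEOF NO ONE to the chosen slave (`r2` here).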

(2) Modify the replication targets of other slave servers

Once the new master exists, the Sentinel leader makes the other slave servers replicate from it by sending them the SLAVEOF <new_master_ip> <new_master_port> command. The parallel-syncs setting in the configuration file controls how many slaves are reconfigured at the same time.
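The effect of parallel-syncs can be approximated by splitting the remaining slaves into batches. This sketch uses fixed-size batches for simplicity. Real Sentinel behaves more like a sliding window: it keeps up to parallel-syncs reconfigurations in flight and starts the next one as soon as one finishes. `reconfiguration_batches` and the addresses below are hypothetical.

```python
from typing import List, Tuple


def reconfiguration_batches(
    slaves: List[str], new_master_ip: str, new_master_port: int, parallel_syncs: int
) -> List[List[Tuple[str, str]]]:
    """Group the SLAVEOF commands so that at most `parallel_syncs` slaves resync at once."""
    if parallel_syncs < 1:
        raise ValueError("parallel_syncs must be at least 1")
    command = f"SLAVEOF {new_master_ip} {new_master_port}"
    pending = [(slave, command) for slave in slaves]
    return [pending[i:i + parallel_syncs] for i in range(0, len(pending), parallel_syncs)]


# Hypothetical topology: the promoted slave listens on 127.0.0.1:6380.
others = ["127.0.0.1:6381", "127.0.0.1:6382", "127.0.0.1:6383"]
for step, batch in enumerate(reconfiguration_batches(others, "127.0.0.1", 6380, parallel_syncs=1), 1):
    print(step, batch)
```

With parallel-syncs set to 1, as in the example configuration, the three slaves are moved one at a time. This means the full resync takes longer but puts less load on the new master.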

(3) Turn the old master server into a slave server

The final step of the failover is to make the offline master a slave of the new master. Sentinel keeps watching the old master, and once it comes back online, Sentinel commands it to replicate the new master node.


Statement

This article is reproduced from 亿速云. If there is any infringement, please contact admin@php.cn to have it deleted.
