關於Redis集群故障的分析-php教程-PHP中文網

首頁

後端開發

php教程

關於Redis集群故障的分析

小云云

Dec 14, 2017 pm 03:04 PM

redis分析故障

Redis叢集是一個實現分散式並且允許單點故障的Redis進階版本。 Redis叢集沒有最重要或說中心節點，這個版本最主要的一個目標是設計一個線性可伸縮（可隨意增刪節點）的功能。

本文主要介紹了詳細分析Redis叢集故障的相關內容，希望能幫助大家。

故障表象：

業務層面顯示提示查詢redis失敗

叢集組成：

3主3從，每個節點的資料有8GB

#機器分佈：

在同一個機架中，

xx.x.xxx.199
xx.x.xxx.200
xx.x.xxx.201

redis-server進程狀態：

透過指令ps -eo pid,lstart | grep $pid，

發現進程已經持續執行了3個月

發生故障前叢集的節點狀態：

xx.x.xxx.200:8371(bedab2c537fe94f8c0363ac4ae97d56832316e65) master
xx.x.xxx.199:8373(792020fe66c00ae56e27cd7a048ba6bb2b67adb6) slave
#xx.x.xxx.201:8375(5ab4f85306da6#xx.x.xxx.201:8375(5ab4f85306da6d633e483fx 01:8372(826607654f5ec81c3756a4a21f357e644efe605a) slave
xx.x.xxx.199:8370(462cadcb41e635d460425430d318f2fe464665c5) master
#xx.x.xxx.200:8374(1238085b578390f388da#adsl
--------- ------------------------下面是日誌分析---------------------- ------------------

步驟1：

主節點8371失去和從節點8373的連接：

46590:M 09 Sep 18:57:51.379 # Connection with slave xx.x.xxx.199:8373 lost.

主節點8370/8375判定8371失聯：42645:M 09 Sep 18:57:50.117 * Marking node b編輯##從節點8372/8373/8374收到主節點8375說8371失聯：
46986:S 09 Sep 18:57:50.120 * FAIL message received from 5ab4f85306dad633306dbabout 7fe94f8c0363ac4ae97d56832316e65

步驟4：
主節點8370/8375授權8373升級為主節點轉移：42645:M 09 Sep 18:57:51.055 # Failover auth granted to 792020fe66c0056e # Failover auth granted to 792020fe66c0056e for 47686276200b^oon^m

##步驟5：

原主節點8371修改自己的配置，成為8373的從節點：46590:M 09 Sep 18:57:51.488 # Configuration change detected. Reconfiguring myself as a replica of 792020fe66c00ae56e27cd7a048ba6bb2b67adb6

#步驟6：

#主節點8370/8375/837308375/17370837221370/8375/8373：F83725: 狀態失敗。 18:57:51.522 * Clear FAIL state for node bedab2c537fe94f8c0363ac4ae97d56832316e65: master without slots is reachable again.

#137313731371374個節點。，第一次全量同步資料：

8373日誌：：4255:M 09 Sep 18:57:51.906 * Full resync requested by slave xx.x.xxx.200:83714255:M 09 Sep 18:57:51.906 * Starting BGSAVE for SYNC with target: disk
4255:M 09 Sep 18:57:51.941 * Background saving started by pid 5230
8371日誌：##46590:#46 Sep 18:57:51.948 * Full resync from master: d7751c4ebf1e63d3baebea1ed409e0e7243a4423:440721826993

##182683
##175/23

##1753222275：275222223

1475：2312275：23

##132 ##42645:M 09 Sep 18:58:00.320 * Marking node 792020fe66c00ae56e27cd7a048ba6bb2b67adb6 as failing (quorum reached).

#11adb6 2#1/

#11382#1382#122#1/##1382#72122#2#122#122

#1382#721222#122

##1382#7/# 8373(新主)恢復：
60295:M 09 Sep 18:58:18.181 * Clear FAIL state for node 792020fe66c00ae56e27cd7a048ba6bb2b67adb6: is# reachable asserving some#. 步驟10：

主節點8373完成全量同步所需的BGSAVE操作：
5230:C 09 Sep 18:59:01.474 * DB saved on disk
5230:C 09 Sep 18:59:01.491 * RDB: 7112 MB of memory used by copy-on-write

4255:M 09 Sep 18:59:01.877 * Background saving terminated with success

步驟11：

從節點8371開始從主節點8373接收到資料：

46590:S 09 Sep 18:59:02.263 * MASTER SLAVE sync: receiving 2657606930 by> master#ync: receiving 2657606930 by>

##步驟12：

主節點8373發現從節點8371對output buffer作了限制：
4255:M 09 Sep 19:00:19.014 # Client id =14259015 addr=xx.x.xxx.200:21772 fd=844 name= age=148 idle=148 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl= 16349 oll=4103 omem=95944066 events=rw cmd=psync scheduled to be closed ASAP for overcoming of output buffer limits.
4255:M 09 Sep 19:00:19.0155 #xxx $. 8371 lost.

###

步驟13：
從節點8371從主節點8373同步資料失敗，連線斷了，第一次全量同步失敗：
46590:S 09 Sep 19:00:19.018 # I/O error trying to sync with MASTER: connection lost
46590:S 09 Sep 19:00:20.102 * Connecting to MASTER xx.x.xxx.199:8373
46590:S 090 19:00 :20.102 * MASTER SLAVE sync started

步驟14：
從節點8371重新開始同步，連線失敗，主節點8373的連線數滿了：
46590:S 09 Sep 19:00:21.103 * Connecting to MASTER xx.x.xxx.199:8373
46590:S 09 Sep 19:00:21.103 * MASTER &SLlt-&nc gt; started
46590:S 09 Sep 19:00:21.104 * Non blocking connect for SYNC fired the event.
46590:S 09 Sep 19:00:21.104 # Error masterreply to PING master: number of clients reached'

步驟15：
從節點8371重新連接主節點8373，第二次開始全量同步：
8371日誌：
46590:S 09 Sep 19:00:49.175 * Connecting to MASTER xx.x.xxx.199:8373
46590:S 09 Sep 19:00:49.175 * MASTER 46590:S 09 Sep 19:00:49.175 * Non blocking connect for SYNC fired the event.
46590:S 09 Sep 19:00:49.176 * Master replireped to PING, lication# can continue... S 09 Sep 19:00:49.179 * Partial resynchronization not possible (no cached master)
46590:S 09 Sep 19:00:49.501 * Full resync from master: d7751c482202020:49.5098: d7751c48 63454
8373日誌：
4255 :M 09 Sep 19:00:49.176 * Slave xx.x.xxx.200:8371 asks for synchronization
4255:M 09 Sep 19:00:49.176 * Full resync requested by ave xx.x. 8371
4255:M 09 Sep 19:00:49.176 * Starting BGSAVE for SYNC with target: disk
4255:M 09 Sep 19:00:49.498 * Background saving started by pid 18413
18413:C 09 Sep 19:01:52.466 * DB saved on disk
18413:C 09 Sep 19:01:52.620 * RDB: 2124 MB of memory used by copy-on-write
4255:M 09 Sep 19:01: 53.186 * Background saving terminated with success

步驟16：

從節點8371同步資料成功，開始載入經記憶體：46590:S 09 Sep 19: 01:53.190 * MASTER SLAVE sync: receiving 2637183250 bytes from master
46590:S 09 Sep 19:04:51.485 * MASTER 09 Sep 19:05:58.695 * MASTER SLAVE sync: Loading DB in memory

步驟17：

叢集恢復正常：42645 :M 09 Sep 19:05:58.786 * Clear FAIL state for node bedab2c537fe94f8c0363ac4ae97d56832316e65: slave is reachable again.

##1181821847個資料步驟

##1182182：2182：2182：2183個節點2
##1183，資料。耗時7分鐘：
46590:S 09 Sep 19:08:19.303 * MASTER SLAVE sync: Finished with success

8371失聯原因分析：

由於幾台機器在同一個機架，不太可能發生網路中斷的情況，於是透過SLOWLOG GET指令查看了慢查詢日誌，發現有一個KEYS指令被執行了，耗時8.3秒，再查看叢集節點逾時設置，發現是5s(cluster-node-timeout 5000)

出現節點失聯的原因：

客戶端執行了耗時1條8.3s的指令，

2016/9/9 18:57:43 開始執行KEYS指令
2016/9/9 18:57:50 8371被判斷失聯（redis日誌）
2016/9/9 18:57:51 執行完KEYS指令

總結來說，有以下幾個問題：

#1.由於cluster-node-timeout設定比較短，慢查詢KEYS導致了叢集判斷節點8371失聯

2.由於8371失聯，導致8373升級為主，開始主從同步

3.由於配置client-output-buffer-limit的限制，導致第一次全量同步失敗了

4.又由於PHP客戶端的連線池有問題，瘋狂連線伺服器，產生了類似SYN攻擊的效果

5.第一次全量同步失敗後，從節點重連主節點花了30秒（超過了最大連接數1w）

#關於client-output-buffer-limit參數：

# The syntax of every client-output-buffer-limit directive is the following: 
# 
# client-output-buffer-limit <class> <hard limit> <soft limit> <soft seconds> 
# 
# A client is immediately disconnected once the hard limit is reached, or if 
# the soft limit is reached and remains reached for the specified number of 
# seconds (continuously). 
# So for instance if the hard limit is 32 megabytes and the soft limit is 
# 16 megabytes / 10 seconds, the client will get disconnected immediately 
# if the size of the output buffers reach 32 megabytes, but will also get 
# disconnected if the client reaches 16 megabytes and continuously overcomes 
# the limit for 10 seconds. 
# 
# By default normal clients are not limited because they don&#39;t receive data 
# without asking (in a push way), but just after a request, so only 
# asynchronous clients may create a scenario where data is requested faster 
# than it can read. 
# 
# Instead there is a default limit for pubsub and slave clients, since 
# subscribers and slaves receive data in a push fashion. 
# 
# Both the hard or the soft limit can be disabled by setting them to zero. 
client-output-buffer-limit normal 0 0 0 
client-output-buffer-limit slave 256mb 64mb 60 
client-output-buffer-limit pubsub 32mb 8mb 60

#採取措施：

##1.單一實例的切割到4G以下，否則發生主從切換會耗時很長

2.調整client-output-buffer-limit參數，防止同步進行到一半失敗

3.調整cluster-node-timeout，不能少於15s

4.禁止任何耗時超過cluster-node-timeout的慢查詢，因為會導致主從切換

###5.修復客戶端類似SYN攻擊的瘋狂連線方式######相關建議：###########Redis群集搭建全記錄############詳解redis群集規範知識#######

redis集群實戰

以上是關於Redis集群故障的分析的詳細內容。更多資訊請關注PHP中文網其他相關文章！

陳述

本文內容由網友自願投稿，版權歸原作者所有。本站不承擔相應的法律責任。如發現涉嫌抄襲或侵權的內容，請聯絡admin@php.cn

絕對會話超時有什麼區別？May 03, 2025 am 12:21 AM

絕對會話超時從會話創建時開始計時，閒置會話超時則從用戶無操作時開始計時。絕對會話超時適用於需要嚴格控制會話生命週期的場景，如金融應用；閒置會話超時適合希望用戶長時間保持會話活躍的應用，如社交媒體。

如果會話在服務器上不起作用，您將採取什麼步驟？May 03, 2025 am 12:19 AM

服務器會話失效可以通過以下步驟解決：1.檢查服務器配置，確保會話設置正確。 2.驗證客戶端cookies，確認瀏覽器支持並正確發送。 3.檢查會話存儲服務，如Redis，確保其正常運行。 4.審查應用代碼，確保會話邏輯正確。通過這些步驟，可以有效診斷和修復會話問題，提升用戶體驗。

session_start（）函數的意義是什麼？May 03, 2025 am 12:18 AM

session_start（）iscucialinphpformanagingusersessions.1）ItInitiateSanewsessionifnoneexists，2）resumesanexistingsessions，and3）setsasesessionCookieforContinuityActinuityAccontinuityAcconActInityAcconActInityAcconAccRequests，EnablingApplicationsApplicationsLikeUseAppericationLikeUseAthenticationalticationaltication and PersersonalizedContentent。

為會話cookie設置httponly標誌的重要性是什麼？May 03, 2025 am 12:10 AM

設置httponly標誌對會話cookie至關重要，因為它能有效防止XSS攻擊，保護用戶會話信息。具體來說，1）httponly標誌阻止JavaScript訪問cookie，2）在PHP和Flask中可以通過setcookie和make_response設置該標誌，3）儘管不能防範所有攻擊，但應作為整體安全策略的一部分。

PHP會議在網絡開發中解決了什麼問題？May 03, 2025 am 12:02 AM

phpsessions solvathepromblymaintainingStateAcrossMultipleHttpRequestsbyStoringDataTaNthEserVerAndAssociatingItwithaIniquesestionId.1）他們儲存了AtoredAtaserver side，通常是Infilesordatabases，InseasessessionIdStoreDistordStoredStoredStoredStoredStoredStoredStoreDoreToreTeReTrestaa.2）

可以在PHP會話中存儲哪些數據？May 02, 2025 am 12:17 AM

phpsessionscanStorestrings，數字，數組和原始物。

您如何開始PHP會話？May 02, 2025 am 12:16 AM

tostartaphpsession，usesesses_start（）attheScript'Sbeginning.1）placeitbeforeanyOutputtosetThesessionCookie.2）useSessionsforuserDatalikeloginstatusorshoppingcarts.3）regenerateSessiveIdStopreventFentfixationAttacks.s.4）考慮使用AttActAcks.s.s.4）