搜尋
首頁後端開發php教程關於Redis集群故障的分析

關於Redis集群故障的分析

Dec 14, 2017 pm 03:04 PM
redis分析故障

Redis叢集是一個實現分散式並且允許單點故障的Redis進階版本。 Redis叢集沒有最重要或說中心節點,這個版本最主要的一個目標是設計一個線性可伸縮(可隨意增刪節點)的功能。

本文主要介紹了詳細分析Redis叢集故障的相關內容,希望能幫助大家。

故障表象:

業務層面顯示提示查詢redis失敗

叢集組成:

3主3從,每個節點的資料有8GB

#機器分佈:

在同一個機架中,

xx.x.xxx.199
xx.x.xxx.200
xx.x.xxx.201

redis-server進程狀態:

透過指令ps -eo pid,lstart | grep $pid,

發現進程已經持續執行了3個月

發生故障前叢集的節點狀態:

xx.x.xxx.200:8371(bedab2c537fe94f8c0363ac4ae97d56832316e65) master
xx.x.xxx.199:8373(792020fe66c00ae56e27cd7a048ba6bb2b67adb6) slave
#xx.x.xxx.201:8375(5ab4f85306da6#xx.x.xxx.201:8375(5ab4f85306da6d633e483fx 01:8372(826607654f5ec81c3756a4a21f357e644efe605a) slave
xx.x.xxx.199:8370(462cadcb41e635d460425430d318f2fe464665c5) master
#xx.x.xxx.200:8374(1238085b578390f388da#adsl
--------- ------------------------下面是日誌分析---------------------- ------------------

步驟1:

主節點8371失去和從節點8373的連接:

46590:M 09 Sep 18:57:51.379 # Connection with slave xx.x.xxx.199:8373 lost.


主節點8370/8375判定8371失聯:42645:M 09 Sep 18:57:50.117 * Marking node b編輯##從節點8372/8373/8374收到主節點8375說8371失聯:
46986:S 09 Sep 18:57:50.120 * FAIL message received from 5ab4f85306dad633306dbabout 7fe94f8c0363ac4ae97d56832316e65

步驟4:
主節點8370/8375授權8373升級為主節點轉移:
42645:M 09 Sep 18:57:51.055 # Failover auth granted to 792020fe66c0056e # Failover auth granted to 792020fe66c0056e for 47686276200b^oo​​n^m

##步驟5:

原主節點8371修改自己的配置,成為8373的從節點:46590:M 09 Sep 18:57:51.488 # Configuration change detected. Reconfiguring myself as a replica of 792020fe66c00ae56e27cd7a048ba6bb2b67adb6


#步驟6:

#主節點8370/8375/837308375/17370837221370/8375/8373:F83725: 狀態失敗。 18:57:51.522 * Clear FAIL state for node bedab2c537fe94f8c0363ac4ae97d56832316e65: master without slots is reachable again.


#137313731371374個節點。 ,第一次全量同步資料:

8373日誌::4255:M 09 Sep 18:57:51.906 * Full resync requested by slave xx.x.xxx.200:83714255:M 09 Sep 18:57:51.906 * Starting BGSAVE for SYNC with target: disk
4255:M 09 Sep 18:57:51.941 * Background saving started by pid 5230
8371日誌:##46590:#46 Sep 18:57:51.948 * Full resync from master: d7751c4ebf1e63d3baebea1ed409e0e7243a4423:440721826993


##182683
##175/23

##1753222275:275222223



1475:2312275:23

##132 ##42645:M 09 Sep 18:58:00.320 * Marking node 792020fe66c00ae56e27cd7a048ba6bb2b67adb6 as failing (quorum reached).

#11adb6 2#1/

#11382#1382#122#1/##1382#72122#2#122#122


#1382#721222#122

##1382#7/# 8373(新主)恢復:
60295:M 09 Sep 18:58:18.181 * Clear FAIL state for node 792020fe66c00ae56e27cd7a048ba6bb2b67adb6: is# reachable asserving some#.
步驟10:

主節點8373完成全量同步所需的BGSAVE操作:
5230:C 09 Sep 18:59:01.474 * DB saved on disk
5230:C 09 Sep 18:59:01.491 * RDB: 7112 MB of memory used by copy-on-write

4255:M 09 Sep 18:59:01.877 * Background saving terminated with success


步驟11:

從節點8371開始從主節點8373接收到資料:

46590:S 09 Sep 18:59:02.263 * MASTER SLAVE sync: receiving 2657606930 by> master#ync: receiving 2657606930 by>

##步驟12:

主節點8373發現從節點8371對output buffer作了限制:
4255:M 09 Sep 19:00:19.014 # Client id =14259015 addr=xx.x.xxx.200:21772 fd=844 name= age=148 idle=148 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl= 16349 oll=4103 omem=95944066 events=rw cmd=psync scheduled to be closed ASAP for overcoming of output buffer limits.
4255:M 09 Sep 19:00:19.0155 #xxx $. 8371 lost.

###

步驟13:
從節點8371從主節點8373同步資料失敗,連線斷了,第一次全量同步失敗:
46590:S 09 Sep 19:00:19.018 # I/O error trying to sync with MASTER: connection lost
46590:S 09 Sep 19:00:20.102 * Connecting to MASTER xx.x.xxx.199:8373
46590:S 090 19:00 :20.102 * MASTER SLAVE sync started

步驟14:
從節點8371重新開始同步,連線失敗,主節點8373的連線數滿了:
46590:S 09 Sep 19:00:21.103 * Connecting to MASTER xx.x.xxx.199:8373
46590:S 09 Sep 19:00:21.103 * MASTER &SLlt-&nc gt; started
46590:S 09 Sep 19:00:21.104 * Non blocking connect for SYNC fired the event.
46590:S 09 Sep 19:00:21.104 # Error masterreply to PING master: number of clients reached'

步驟15:
從節點8371重新連接主節點8373,第二次開始全量同步:
8371日誌:
46590:S 09 Sep 19:00:49.175 * Connecting to MASTER xx.x.xxx.199:8373
46590:S 09 Sep 19:00:49.175 * MASTER 46590:S 09 Sep 19:00:49.175 * Non blocking connect for SYNC fired the event.
46590:S 09 Sep 19:00:49.176 * Master replireped to PING, lication# can continue... S 09 Sep 19:00:49.179 * Partial resynchronization not possible (no cached master)
46590:S 09 Sep 19:00:49.501 * Full resync from master: d7751c482202020:49.5098: d7751c48 63454
8373日誌:
4255 :M 09 Sep 19:00:49.176 * Slave xx.x.xxx.200:8371 asks for synchronization
4255:M 09 Sep 19:00:49.176 * Full resync requested by ave xx.x. 8371
4255:M 09 Sep 19:00:49.176 * Starting BGSAVE for SYNC with target: disk
4255:M 09 Sep 19:00:49.498 * Background saving started by pid 18413
18413:C 09 Sep 19:01:52.466 * DB saved on disk
18413:C 09 Sep 19:01:52.620 * RDB: 2124 MB of memory used by copy-on-write
4255:M 09 Sep 19:01: 53.186 * Background saving terminated with success


步驟16:

從節點8371同步資料成功,開始載入經記憶體:46590:S 09 Sep 19: 01:53.190 * MASTER SLAVE sync: receiving 2637183250 bytes from master
46590:S 09 Sep 19:04:51.485 * MASTER 09 Sep 19:05:58.695 * MASTER SLAVE sync: Loading DB in memory



步驟17:

叢集恢復正常:42645 :M 09 Sep 19:05:58.786 * Clear FAIL state for node bedab2c537fe94f8c0363ac4ae97d56832316e65: slave is reachable again.


##1181821847個資料步驟

##1182182:2182:2182:2183個節點2
##1183,資料。耗時7分鐘:
46590:S 09 Sep 19:08:19.303 * MASTER SLAVE sync: Finished with success

8371失聯原因分析:

由於幾台機器在同一個機架,不太可能發生網路中斷的情況,於是透過SLOWLOG GET指令查看了慢查詢日誌,發現有一個KEYS指令被執行了,耗時8.3秒,再查看叢集節點逾時設置,發現是5s(cluster-node-timeout 5000)

出現節點失聯的原因:

客戶端執行了耗時1條8.3s的指令,

2016/9/9 18:57:43 開始執行KEYS指令
2016/9/9 18:57:50 8371被判斷失聯(redis日誌)
2016/9/9 18:57:51 執行完KEYS指令

總結來說,有以下幾個問題:

#1.由於cluster-node-timeout設定比較短,慢查詢KEYS導致了叢集判斷節點8371失聯

2.由於8371失聯,導致8373升級為主,開始主從同步

3.由於配置client-output-buffer-limit的限制,導致第一次全量同步失敗了

4.又由於PHP客戶端的連線池有問題,瘋狂連線伺服器,產生了類似SYN攻擊的效果

5.第一次全量同步失敗後,從節點重連主節點花了30秒(超過了最大連接數1w)

#關於client-output-buffer-limit參數:

# The syntax of every client-output-buffer-limit directive is the following: 
# 
# client-output-buffer-limit <class> <hard limit> <soft limit> <soft seconds> 
# 
# A client is immediately disconnected once the hard limit is reached, or if 
# the soft limit is reached and remains reached for the specified number of 
# seconds (continuously). 
# So for instance if the hard limit is 32 megabytes and the soft limit is 
# 16 megabytes / 10 seconds, the client will get disconnected immediately 
# if the size of the output buffers reach 32 megabytes, but will also get 
# disconnected if the client reaches 16 megabytes and continuously overcomes 
# the limit for 10 seconds. 
# 
# By default normal clients are not limited because they don&#39;t receive data 
# without asking (in a push way), but just after a request, so only 
# asynchronous clients may create a scenario where data is requested faster 
# than it can read. 
# 
# Instead there is a default limit for pubsub and slave clients, since 
# subscribers and slaves receive data in a push fashion. 
# 
# Both the hard or the soft limit can be disabled by setting them to zero. 
client-output-buffer-limit normal 0 0 0 
client-output-buffer-limit slave 256mb 64mb 60 
client-output-buffer-limit pubsub 32mb 8mb 60


#採取措施:





##1.單一實例的切割到4G以下,否則發生主從切換會耗時很長


2.調整client-output-buffer-limit參數,防止同步進行到一半失敗

3.調整cluster-node-timeout,不能少於15s

4.禁止任何耗時超過cluster-node-timeout的慢查詢,因為會導致主從切換

###5.修復客戶端類似SYN攻擊的瘋狂連線方式######相關建議:###########Redis群集搭建全記錄############詳解redis群集規範知識#######

redis集群實戰

#

以上是關於Redis集群故障的分析的詳細內容。更多資訊請關注PHP中文網其他相關文章!

陳述
本文內容由網友自願投稿,版權歸原作者所有。本站不承擔相應的法律責任。如發現涉嫌抄襲或侵權的內容,請聯絡admin@php.cn
絕對會話超時有什麼區別?絕對會話超時有什麼區別?May 03, 2025 am 12:21 AM

絕對會話超時從會話創建時開始計時,閒置會話超時則從用戶無操作時開始計時。絕對會話超時適用於需要嚴格控制會話生命週期的場景,如金融應用;閒置會話超時適合希望用戶長時間保持會話活躍的應用,如社交媒體。

如果會話在服務器上不起作用,您將採取什麼步驟?如果會話在服務器上不起作用,您將採取什麼步驟?May 03, 2025 am 12:19 AM

服務器會話失效可以通過以下步驟解決:1.檢查服務器配置,確保會話設置正確。 2.驗證客戶端cookies,確認瀏覽器支持並正確發送。 3.檢查會話存儲服務,如Redis,確保其正常運行。 4.審查應用代碼,確保會話邏輯正確。通過這些步驟,可以有效診斷和修復會話問題,提升用戶體驗。

session_start()函數的意義是什麼?session_start()函數的意義是什麼?May 03, 2025 am 12:18 AM

session_start()iscucialinphpformanagingusersessions.1)ItInitiateSanewsessionifnoneexists,2)resumesanexistingsessions,and3)setsasesessionCookieforContinuityActinuityAccontinuityAcconActInityAcconActInityAcconAccRequests,EnablingApplicationsApplicationsLikeUseAppericationLikeUseAthenticationalticationaltication and PersersonalizedContentent。

為會話cookie設置httponly標誌的重要性是什麼?為會話cookie設置httponly標誌的重要性是什麼?May 03, 2025 am 12:10 AM

設置httponly標誌對會話cookie至關重要,因為它能有效防止XSS攻擊,保護用戶會話信息。具體來說,1)httponly標誌阻止JavaScript訪問cookie,2)在PHP和Flask中可以通過setcookie和make_response設置該標誌,3)儘管不能防範所有攻擊,但應作為整體安全策略的一部分。

PHP會議在網絡開發中解決了什麼問題?PHP會議在網絡開發中解決了什麼問題?May 03, 2025 am 12:02 AM

phpsessions solvathepromblymaintainingStateAcrossMultipleHttpRequestsbyStoringDataTaNthEserVerAndAssociatingItwithaIniquesestionId.1)他們儲存了AtoredAtaserver side,通常是Infilesordatabases,InseasessessionIdStoreDistordStoredStoredStoredStoredStoredStoredStoreDoreToreTeReTrestaa.2)

可以在PHP會話中存儲哪些數據?可以在PHP會話中存儲哪些數據?May 02, 2025 am 12:17 AM

phpsessionscanStorestrings,數字,數組和原始物。

您如何開始PHP會話?您如何開始PHP會話?May 02, 2025 am 12:16 AM

tostartaphpsession,usesesses_start()attheScript'Sbeginning.1)placeitbeforeanyOutputtosetThesessionCookie.2)useSessionsforuserDatalikeloginstatusorshoppingcarts.3)regenerateSessiveIdStopreventFentfixationAttacks.s.4)考慮使用AttActAcks.s.s.4)

什麼是會話再生,如何提高安全性?什麼是會話再生,如何提高安全性?May 02, 2025 am 12:15 AM

會話再生是指在用戶進行敏感操作時生成新會話ID並使舊ID失效,以防會話固定攻擊。實現步驟包括:1.檢測敏感操作,2.生成新會話ID,3.銷毀舊會話ID,4.更新用戶端會話信息。

See all articles

熱AI工具

Undresser.AI Undress

Undresser.AI Undress

人工智慧驅動的應用程序,用於創建逼真的裸體照片

AI Clothes Remover

AI Clothes Remover

用於從照片中去除衣服的線上人工智慧工具。

Undress AI Tool

Undress AI Tool

免費脫衣圖片

Clothoff.io

Clothoff.io

AI脫衣器

Video Face Swap

Video Face Swap

使用我們完全免費的人工智慧換臉工具,輕鬆在任何影片中換臉!

熱工具

Dreamweaver CS6

Dreamweaver CS6

視覺化網頁開發工具

PhpStorm Mac 版本

PhpStorm Mac 版本

最新(2018.2.1 )專業的PHP整合開發工具

WebStorm Mac版

WebStorm Mac版

好用的JavaScript開發工具

記事本++7.3.1

記事本++7.3.1

好用且免費的程式碼編輯器

Atom編輯器mac版下載

Atom編輯器mac版下載

最受歡迎的的開源編輯器