Redis Cluster does not guarantee strong consistency. In some scenarios, a client can lose a write even after the cluster has acknowledged it.
Scenario 1: Asynchronous replication
- client writes to master B
- master B replies OK
- master B propagates the write to its slaves B1, B2, B3
B replies to the client without waiting for confirmation from B1, B2, or B3. If B goes down before the write has reached the slaves, one of the slaves is promoted to master, and the data the client just wrote is lost.
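The sequence above can be illustrated with a small toy model. This is only a sketch: `Node` and `write_then_crash` are hypothetical names invented for the illustration, and nothing here talks to a real Redis server. It simply shows that an acknowledged write survives failover only if replication happened before the crash.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a Redis node; it only holds a key/value dict."""
    name: str
    data: dict = field(default_factory=dict)


def write_then_crash(replicate_before_crash: bool) -> bool:
    """Return True if the acknowledged write survives failover to B1."""
    master = Node("B")
    slave = Node("B1")

    # 1. The client writes to master B, which replies OK immediately.
    master.data["k"] = "v"
    acknowledged = True

    # 2. Replication to B1 happens afterwards, in the background.
    if replicate_before_crash:
        slave.data.update(master.data)

    # 3. B crashes and B1 is promoted; only B1's data set remains.
    new_master = slave
    return acknowledged and new_master.data.get("k") == "v"


print(write_then_crash(replicate_before_crash=True))   # True
print(write_then_crash(replicate_before_crash=False))  # False
```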
The WAIT command can improve data safety in this scenario. WAIT blocks the current client until the previous writes have been acknowledged by the specified number of slaves, or until its timeout expires.
WAIT improves data safety, but it does not guarantee strong consistency. Even with this form of synchronous replication there are special cases in which a slave that has not received the write is elected master.
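A minimal sketch of how a client might use WAIT after a write is shown below. It assumes a client object exposing `set(key, value)` and `wait(num_replicas, timeout_ms)` methods shaped like those of the redis-py `Redis` class; `FakeClient` and `set_with_wait` are hypothetical names used so the example runs without a server. How a cluster-aware client routes WAIT to the correct node is not covered here and should be checked against the client library's documentation.

```python
class FakeClient:
    """Hypothetical offline stand-in for a Redis client."""

    def __init__(self, replicas_that_ack: int):
        self.replicas_that_ack = replicas_that_ack
        self.store = {}

    def set(self, key, value):
        self.store[key] = value
        return True

    def wait(self, num_replicas, timeout_ms):
        # Like WAIT, report how many slaves acknowledged the writes.
        # The count may be lower than requested if the timeout expired.
        return self.replicas_that_ack


def set_with_wait(client, key, value, min_replicas=1, timeout_ms=100):
    """Write a key, then report whether enough slaves acknowledged it."""
    client.set(key, value)
    acked = client.wait(min_replicas, timeout_ms)
    return acked >= min_replicas


print(set_with_wait(FakeClient(replicas_that_ack=2), "k", "v", min_replicas=2))  # True
print(set_with_wait(FakeClient(replicas_that_ack=0), "k", "v", min_replicas=2))  # False
```

A `False` result means the caller should treat the write as at risk, for example by retrying or reporting an error. A `True` result lowers the chance of loss but, as noted above, is still not a strong consistency guarantee.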
Scenario 2: Network partition
Consider a cluster of 6 nodes, A, B, C, A1, B1, C1 (3 masters and 3 slaves), and one client, Z1.
After a network partition occurs, two sides are formed: A, C, A1, B1, C1 on one side, and B and Z1 on the other.
At this time, Z1 can still write to B. If the partition heals within a short time, there is no problem and the cluster continues to work normally. If the partition lasts long enough, however, B1 is promoted to master on the majority side of the partition, and the data that Z1 wrote to B in the meantime is lost.
There is a maximum time window that limits this data loss by bounding the number of writes Z1 can send to B:
After a certain period of time has passed, the majority side of the partition holds an election and promotes the slave to master. At that point, the master on the minority side of the partition refuses write requests.
This amount of time is an important configuration setting called the node timeout (the node expiration time, cluster-node-timeout in redis.conf).
Once the node timeout has elapsed, an unreachable master is considered failed and can be replaced by one of its slaves. Likewise, a master that has been unable to reach the majority of the other masters for the node timeout enters an error state and stops accepting write requests.
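The effect of this window can be estimated with some simple arithmetic. The sketch below is a deliberately simplified model, not Redis's actual failure-detection logic: it assumes a constant write rate and assumes the isolated master stops accepting writes exactly when the node timeout elapses. The function names and the example values (200 writes per second, a 15000 ms timeout) are assumptions chosen for illustration.

```python
def max_lost_writes(writes_per_second: float, node_timeout_ms: int) -> float:
    """Rough upper bound on writes accepted by an isolated master."""
    return writes_per_second * node_timeout_ms / 1000


def accepts_write(ms_since_partition: int, node_timeout_ms: int) -> bool:
    """In this toy model, the minority master accepts writes only
    until the node timeout has elapsed."""
    return ms_since_partition < node_timeout_ms


print(max_lost_writes(200, 15000))   # 3000.0
print(accepts_write(5000, 15000))    # True
print(accepts_write(20000, 15000))   # False
```

Under these assumptions, lowering the timeout shrinks the number of writes at risk proportionally, at the cost of triggering failovers more readily on short network glitches.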
Summary
Redis Cluster does not guarantee strong consistency, and data can be lost in two scenarios:
- Asynchronous replication
The write succeeds on the master, but the master crashes before the slaves have received it. A slave becomes the master, and the data is lost. The WAIT command can be used to make replication synchronous, but it cannot completely guarantee that no data is lost, and it reduces performance.
- Network Partition
After a partition, an isolated master continues to accept write requests. When the partition heals, that master may have been demoted to a slave, and the data written to it during the partition is lost.
You can tune the node timeout (expiration time) to limit the number of writes the master accepts during the partition and so reduce the cost of data loss.