Home  >  Article  >  Database  >  What is split-brain in MySQL?

What is split-brain in MySQL?

青灯夜游
青灯夜游Original
2022-06-27 10:59:315226browse

In MySQL, split-brain means that in a high availability (HA) system, when two connected nodes are disconnected, the system that was originally a whole is split into two independent nodes. At this time, the two nodes begin to compete for shared resources, resulting in system chaos and data corruption. For the HA system of stateless services, it doesn't matter whether it is split-brain or not; but for the HA of stateful services (such as MySQL), split-brain must be strictly prevented.

What is split-brain in MySQL?

The operating environment of this tutorial: windows7 system, mysql8 version, Dell G3 computer.

Split-brain

refers to a high availability (HA) system, when the two connected nodes are disconnected. When the connection is opened, the system that was originally a whole is split into two independent nodes. At this time, the two nodes begin to compete for shared resources, which will lead to system chaos and data damage.

In a high-availability cluster environment there is an active node and one or more standby nodes that take over the service when the active node fails or stops responding.

This sounds like a reasonable assumption until you consider the network layers between nodes. What if the network path between nodes fails?

Neither node can now communicate with the other node, in which case the standby server may promote itself to the active server on the basis that it believes the active node has failed. This causes both nodes to become "alive" because each node will think the other node is dead. As a result, data integrity and consistency are compromised because data changes on both nodes. This is called "split-brain".

For HA of stateless services, it doesn't matter whether it is split-brain or not; but for HA of stateful services (such as MySQL), split-brain must be strictly prevented . (But some systems in production environments configure stateful services according to the stateless service HA set, and the results can be imagined...)

How to prevent HA cluster split-brain

Generally use 2 methods 1) Arbitration When two nodes disagree, the arbiter of the third party decides whose decision to listen to. This arbiter may be a lock service, a shared disk or something else.

2) fencing When the status of a node cannot be determined, kill the other node through fencing to ensure that shared resources are completely released. The premise is that there must be reliable fence equipment.

Ideally, neither of the above should be missing. However, if the node does not use shared resources, such as database HA based on master-slave replication, the fence device can be safely omitted and only the quorum is retained. And many times there are no fence devices available in our environment, such as in cloud hosts.

So can we omit arbitration and only keep the fence device? Can't. Because when two nodes lose contact with each other, they will fencing each other at the same time. If the fencing method is reboot, then the two machines will restart continuously. If the fencing method is power off, then the outcome may be that two nodes die together, or one may survive. But if the reason why two nodes lose contact with each other is that one of the nodes has a network card failure, and the one that survives happens to be the faulty node, then the ending will be tragic. Therefore, a simple double node cannot prevent split brain in any case.

How to implement the above strategy

You can implement a set of scripts that conform to the above logic from scratch. It is recommended to use mature cluster software to build, such as Pacemaker Corosync and appropriate resource agents. Keepalived is not suitable for HA of stateful services. Even if arbitration and fences are added to the solution, it always feels awkward.

There are also some precautions when using Pacemaker Corosync. 1) Understand the functions and principles of resource agents Only by understanding the functions and principles of Resource Agent can you know the scenarios in which it is applicable. For example, the resource agent of pgsql is relatively complete, supports synchronous and asynchronous stream replication, and can automatically switch between the two, and can ensure that data will not be lost during synchronous replication. But the current resource agent of MySQL is very weak. Without GTID and without log compensation, it is easy to lose data. It is better not to use it and continue to use MHA (but be sure to guard against split-brain when deploying MHA).

2) Ensure the legal number of votes (quorum) Quorum can be thought of as Pacemkaer's own arbitration mechanism. A majority of all nodes in the cluster selects a coordinator, and all instructions in the cluster are issued by this coordinator, which can perfectly eliminate the split-brain problem. In order for this mechanism to work effectively, there must be at least 3 nodes in the cluster, and no-quorum-policy is set to stop, which is also the default value. (Many tutorials set no-quorum-policy to ignore for the convenience of demonstration. If the production environment also does this and there is no other arbitration mechanism, it is very dangerous!)

However, if there are only 2 nodes what to do?

  • 一是拉一个机子借用一下凑足3个节点,再设置location限制,不让资源分配到那个节点上。
  • 二是把多个不满足quorum小集群拉到一起,组成一个大的集群,同样适用location限制控制资源的分配的位置。

但是如果你有很多双节点集群,找不到那么多用于凑数的节点,又不想把这些双节点集群拉到一起凑成一个大的集群(比如觉得不方便管理)。那么可以考虑第三种方法。 第三种方法是配置一个抢占资源,以及服务和这个抢占资源的colocation约束,谁抢到抢占资源谁提供服务。这个抢占资源可以是某个锁服务,比如基于zookeeper包装一个,或者干脆自己从头做一个,就像下面这个例子。这个例子是基于http协议的短连接,更细致的做法是使用长连接心跳检测,这样服务端可以及时检出连接断开而释放锁)但是,一定要同时确保这个抢占资源的高可用,可以把提供抢占资源的服务做成lingyig高可用的,也可以简单点,部署3个服务,双节点上个部署一个,第三个部署在另外一个专门的仲裁节点上,至少获取3个锁中的2个才视为取得了锁。这个仲裁节点可以为很多集群提供仲裁服务(因为一个机器只能部署一个Pacemaker实例,否则可以用部署了N个Pacemaker实例的仲裁节点做同样的事情。但是,如非迫不得已,尽量还是采用前面的方法,即满足Pacemaker法定票数,这种方法更简单,可靠。

--------------------------------------------------------------keepalived的脑裂问题----------------------------------------------

1)解决keepalived脑裂问题

检测思路:正常情况下keepalived的VIP地址是在主节点上的,如果在从节点发现了VIP,就设置报警信息。脚本(在从节点上)如下:

[root@slave-ha ~]# vim split-brainc_check.sh
#!/bin/bash
# 检查脑裂的脚本,在备节点上进行部署
LB01_VIP=192.168.1.229
LB01_IP=192.168.1.129
LB02_IP=192.168.1.130
while true
do
  ping -c 2 -W 3 $LB01_VIP &>/dev/null
    if [ $? -eq 0 -a `ip add|grep "$LB01_VIP"|wc -l` -eq 1 ];then
        echo "ha is brain."
    else
        echo "ha is ok"
    fi
    sleep 5
done

执行结果如下:
[root@slave-ha ~]# bash check_split_brain.sh 
ha is ok
ha is ok
ha is ok
ha is ok
当发现异常时候的执行结果:
[root@slave-ha ~]# bash check_split_brain.sh 
ha is ok
ha is ok
ha is ok
ha is ok
ha is brain.
ha is brain.

2)曾经碰到的一个keepalived脑裂的问题(如果启用了iptables,不设置"系统接收VRRP协议"的规则,就会出现脑裂)

曾经在做keepalived+Nginx主备架构的环境时,当重启了备用机器后,发现两台机器都拿到了VIP。这也就是意味着出现了keepalived的脑裂现象,检查了两台主机的网络连通状态,发现网络是好的。然后在备机上抓包:

[root@localhost ~]#  tcpdump -i eth0|grep VRRP  
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode  
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes  
22:10:17.146322 IP 192.168.1.54 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 160, authtype simple, intvl 1s, length 20  
22:10:17.146577 IP 192.168.1.96 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 50, authtype simple, intvl 1s, length 20  
22:10:17.146972 IP 192.168.1.54 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 160, authtype simple, intvl 1s, length 20  
22:10:18.147136 IP 192.168.1.96 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 50, authtype simple, intvl 1s, length 20  
22:10:18.147576 IP 192.168.1.54 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 160, authtype simple, intvl 1s, length 20  
22:10:25.151399 IP 192.168.1.96 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 50, authtype simple, intvl 1s, length 20  
22:10:25.151942 IP 192.168.1.54 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 160, authtype simple, intvl 1s, length 20  
22:10:26.151703 IP 192.168.1.96 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 50, authtype simple, intvl 1s, length 20  
22:10:26.152623 IP 192.168.1.54 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 160, authtype simple, intvl 1s, length 20  
22:10:27.152456 IP 192.168.1.96 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 50, authtype simple, intvl 1s, length 20  
22:10:27.153261 IP 192.168.1.54 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 160, authtype simple, intvl 1s, length 20  
22:10:28.152955 IP 192.168.1.96 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 50, authtype simple, intvl 1s, length 20  
22:10:28.153461 IP 192.168.1.54 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 160, authtype simple, intvl 1s, length 20  
22:10:29.153766 IP 192.168.1.96 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 50, authtype simple, intvl 1s, length 20  
22:10:29.155652 IP 192.168.1.54 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 160, authtype simple, intvl 1s, length 20  
22:10:30.154275 IP 192.168.1.96 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 50, authtype simple, intvl 1s, length 20  
22:10:30.154587 IP 192.168.1.54 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 160, authtype simple, intvl 1s, length 20  
22:10:31.155042 IP 192.168.1.96 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 50, authtype simple, intvl 1s, length 20  
22:10:31.155428 IP 192.168.1.54 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 160, authtype simple, intvl 1s, length 20  
22:10:32.155539 IP 192.168.1.96 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 50, authtype simple, intvl 1s, length 20  
22:10:32.155986 IP 192.168.1.54 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 160, authtype simple, intvl 1s, length 20  
22:10:33.156357 IP 192.168.1.96 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 50, authtype simple, intvl 1s, length 20  
22:10:33.156979 IP 192.168.1.54 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 160, authtype simple, intvl 1s, length 20  
22:10:34.156801 IP 192.168.1.96 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 50, authtype simple, intvl 1s, length 20  
22:10:34.156989 IP 192.168.1.54 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 51, prio 160, authtype simple, intvl 1s, length 20  

备机能接收到master发过来的VRRP广播,那为什么还会有脑裂现象?
接着发现重启后iptables开启着,检查了防火墙配置。发现系统不接收VRRP协议。
于是修改iptables,添加允许系统接收VRRP协议的配置:
-A INPUT -i lo -j ACCEPT   
-----------------------------------------------------------------------------------------
我自己添加了下面的iptables规则:
-A INPUT -s 192.168.1.0/24 -d 224.0.0.18 -j ACCEPT       #允许组播地址通信
-A INPUT -s 192.168.1.0/24 -p vrrp -j ACCEPT             #允许VRRP(虚拟路由器冗余协)通信
-----------------------------------------------------------------------------------------

最后重启iptables,发现备机上的VIP没了。
虽然问题解决了,但备机明明能抓到master发来的VRRP广播包,却无法改变自身状态。只能说明网卡接收到数据包是在iptables处理数据包之前发生的事情。

3)预防keepalived脑裂问题     

1)可以采用第三方仲裁的方法。由于keepalived体系中主备两台机器所处的状态与对方有关。如果主备机器之间的通信出了网题,就会发生脑裂,此时keepalived体系中会出现双主的情况,产生资源竞争。      2)一般可以引入仲裁来解决这个问题,即每个节点必须判断自身的状态。最简单的一种操作方法是,在主备的keepalived的配置文件中增加check配置,服务器周期性地ping一下网关,如果ping不通则认为自身有问题 。     3)最容易的是借助keepalived提供的vrrp_script及track_script实现。如下所示:

# vim /etc/keepalived/keepalived.conf
   ......
   vrrp_script check_local {
    script "/root/check_gateway.sh" 
    interval 5
    }
   ......

   track_script {     
   check_local                   
   }

   脚本内容:
   # cat /root/check_gateway.sh
   #!/bin/sh
   VIP=$1
   GATEWAY=192.168.1.1 
   /sbin/arping -I em1 -c 5 -s $VIP $GATEWAY &>/dev/null

   check_gateway.sh 就是我们的仲裁逻辑,发现ping不通网关,则关闭keepalived service keepalived stop。

4)推荐自己写脚本

写一个while循环,每轮ping网关,累计连续失败的次数,当连续失败达到一定次数则运行service keepalived stop关闭keepalived服务。如果发现又能够ping通网关,再重启keepalived服务。最后在脚本开头再加上脚本是否已经运行的判断逻辑,将该脚本加到crontab里面。

【相关推荐:mysql视频教程

The above is the detailed content of What is split-brain in MySQL?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn