MHA::MasterRotate::main()
-> do_master_online_switch:

Phase 1: Configuration Check Phase
  -> identify_orig_master
       connect_all_and_read_server_status:
         connect_check: first run a connection check to make sure the MySQL service on every server is healthy
         connect_and_get_status: read server_id/mysql_version/log_bin and similar information from each MySQL instance
           This step also has another important job: identifying the current master.
           It runs show slave status on each node; if the output is empty, that node is the master.
         validate_current_master: fetch the master's information and verify that the configuration is correct
       check whether any server is down; if so, exit the rotate
       check whether the master is alive; if it is dead, exit the rotate
       check_repl_priv: check whether the user has the replication privilege
       acquire monitor_advisory_lock, to make sure no other monitor process is running against the master
         runs: SELECT GET_LOCK('MHA_Master_High_Availability_Monitor', ?) AS Value
       acquire failover_advisory_lock, to make sure no other failover process is running against the slaves
         runs: SELECT GET_LOCK('MHA_Master_High_Availability_Failover', ?) AS Value
       check_replication_health:
         runs SHOW SLAVE STATUS to determine: current_slave_position/has_replication_problem
         has_replication_problem checks the following: IO thread/SQL thread/Seconds_Behind_Master(1s)
       get_running_update_threads: uses show processlist to check whether any thread is currently executing an update; if so, exit the switch
  -> identify_new_master
       set_latest_slaves: all current slaves are treated as latest slaves
       select_new_master: choose the new master
         If a preferred node is specified, one of the active preferred nodes will be the new master.
         If the latest server is too far behind (i.e. the SQL thread was stopped for online backups),
         we should not use it as the new master; we should fetch the relay log from it instead.
         Even if a preferred master is configured, it does not become the master if it is far behind.
         get_candidate_masters: the nodes that have candidate_master>0 in the configuration file
         get_bad_candidate_masters:
           # The following servers can not be master:
           # - dead servers
           # - Set no_master in conf files (i.e. DR servers)
           # - log_bin is disabled
           # - Major version is not the oldest
           # - too much replication delay (the slave's binlog position is more than 100000000 behind the master's)
         Searching from candidate_master slaves which have received the latest relay log events
         if NOT FOUND: Searching from all candidate_master slaves
         if NOT FOUND: Searching from all slaves which have received the latest relay log events
         if NOT FOUND: Searching from all slaves
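The four-step search order in select_new_master can be sketched roughly as follows. This is an illustrative Python model, not MHA's Perl implementation: the Slave record and its field names are hypothetical, and it assumes that servers reported by get_bad_candidate_masters are skipped at every step.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    # Hypothetical record for illustration; not MHA's internal structure.
    host: str
    candidate_master: bool = False  # candidate_master>0 in the config file
    latest: bool = False            # has received the latest relay log events
    bad: bool = False               # dead, no_master, log_bin off, wrong version, or too far behind


def select_new_master(slaves):
    """Return the first eligible slave, trying the four searches in order."""
    searches = [
        lambda s: s.candidate_master and s.latest,  # candidates with the latest events
        lambda s: s.candidate_master,               # any candidate
        lambda s: s.latest,                         # any slave with the latest events
        lambda s: True,                             # any slave at all
    ]
    for matches in searches:
        for s in slaves:
            # Assumption: bad candidates are excluded in every search.
            if matches(s) and not s.bad:
                return s
    return None  # no usable slave found
```

For example, given a candidate that holds the latest events but is marked bad, the sketch falls through to the second search and picks the next healthy candidate.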
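Phase 2: Rejecting updates Phase
  reject_update: lock tables so that no further writes reach the binlog
    If the "master_ip_online_change_script" parameter is set in the MHA configuration file, run that script to disable writes on the current master.
    The script only needs to be configured when a VIP is in use.
    reconnect: make sure the connection to the master is still healthy
    lock_all_tables: run FLUSH TABLES WITH READ LOCK to lock the tables
    check_binlog_stop: run show master status twice in a row to determine whether binlog writes have stopped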
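The idea behind check_binlog_stop, comparing two consecutive show master status results, can be sketched as below. This is a minimal Python illustration rather than MHA's code: show_master_status is a hypothetical caller-supplied function that returns the (File, Position) pair, and the pause between the two samples is an assumption, since the notes do not say how long MHA waits.

```python
import time


def binlog_stopped(show_master_status, interval=1.0, sleep=time.sleep):
    """Sample the master's binlog coordinates twice and report whether they match.

    show_master_status: hypothetical callable returning a (File, Position) tuple.
    interval: assumed pause in seconds between the two samples.
    """
    first = show_master_status()
    sleep(interval)
    second = show_master_status()
    # Unchanged coordinates suggest nothing was written to the binlog in between.
    return first == second
```

In practice the callable would wrap a real query against the master, issued after FLUSH TABLES WITH READ LOCK has been taken; if the coordinates still move, writes have not actually stopped.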