Maison > Article > base de données > Principe de commutation entre le basculement automatique MHA et le basculement manuel
Informations sur le cluster
Rôle 192.168.244.10 1 1 Écrire
Candidat maître 192.168 .244.20 2 Lire
S lave 192.168.244.30 3 Lire
Esclave ' ' ' peut renvoyer à un autre blog :
MySQLDéploiement et principe de la solution haute disponibilité MHA
Afin de juger clairement le principe de mise en œuvre du basculement automatique MHA via les journaux MHA, le scénario suivant doit être simulé : Lorsque la bibliothèque principale tombe en panne, il reste encore des journaux binlog dans le maître qui ne sont pas disponibles Transmis au maître candidat et Esclave, et il y a plus de journaux binaires sur l'esclave que sur le maître candidat. J'ai essayé plusieurs solutions et j'ai finalement obtenu ce que je voulais.
Option 1 :
1. Fermez la réplication maître-esclave du candidat maître et de l'esclave.
2. Générez des données de test via
procédure stockée
3. Activez la réplication maître-esclave du maître candidat et de l'esclave et tuez le processus mysqld du maître pour simuler une défaillance dans le master database, effectuez une opération de basculement automatique
À cet effet, j'ai également écrit un script Malheureusement, l'effet n'est pas idéal Pendant le processus de basculement automatique, le message "SSH n'est PAS rchaquecapable ».
Option 2 : Dommage que l'effet soit le même que le Plan 1. Option 3 : En réplication maître-esclave, fermez directement la base de données mysqld du maître. Cependant, de cette manière, l'effet d'avoir plus de logs binaires sur l'esclave que sur le maître candidat ne sera pas obtenu. Option 4 : En interceptant le journal du relais et en éteignant l'esclave, modifiez les points de localisation de master.info et relay-log.info pour créer artificiellement le Le volume du journal du candidat maître est inférieur à celui de l'esclave. En y réfléchissant après, en fait, cette méthode équivaut à d'abord fermer la réplication maître-esclave du maître candidat, puis effectuer une opération sur le maître, puis fermer la réplication maître-esclave sur l'esclave, puis effectuer une opération sur le maître. De cette façon, le journal binaire du maître > le journal de relais de l'esclave > L'option 4 a finalement obtenu ce qu'elle voulait Jetons un coup d'œil au principe de mise en œuvre du MHA à travers l'option 4 1. Créez un tableau de test, et insérez les données de test
mysql> create table sbtest.b(id int,name varchar(10)); Query OK, 0 rows affected (0.12 sec) mysql> insert into sbtest.b values(1,'a'); Query OK, 1 row affected (0.00 sec) mysql> insert into sbtest.b values(2,'b'); Query OK, 1 row affected (0.01 sec) mysql> insert into sbtest.b values(3,'c'); Query OK, 1 row affected (0.00 sec) mysql> insert into sbtest.b values(4,'d'); Query OK, 1 row affected (0.00 sec) mysql> insert into sbtest.b values(5,'e'); Query OK, 1 row affected (0.01 sec)
2 . Vérifiez le contenu du binlog sur le maître et le contenu du journal relais sur l'esclave
Esclave
mysql> show binlog events;+------------------+------+-------------+-----------+-------------+------------------------------------------------+| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |+------------------+------+-------------+-----------+-------------+------------------------------------------------+| mysql-bin.000001 | 4 | Format_desc | 1 | 120 | Server ver: 5.6.31-log, Binlog ver: 4 || mysql-bin.000001 | 120 | Query | 1 | 238 | create table sbtest.b(id int,name varchar(10)) || mysql-bin.000001 | 238 | Query | 1 | 315 | BEGIN || mysql-bin.000001 | 315 | Query | 1 | 421 | insert into sbtest.b values(1,'a') || mysql-bin.000001 | 421 | Xid | 1 | 452 | COMMIT /* xid=102 */ || mysql-bin.000001 | 452 | Query | 1 | 529 | BEGIN || mysql-bin.000001 | 529 | Query | 1 | 635 | insert into sbtest.b values(2,'b') || mysql-bin.000001 | 635 | Xid | 1 | 666 | COMMIT /* xid=103 */ || mysql-bin.000001 | 666 | Query | 1 | 743 | BEGIN || mysql-bin.000001 | 743 | Query | 1 | 849 | insert into sbtest.b values(3,'c') || mysql-bin.000001 | 849 | Xid | 1 | 880 | COMMIT /* xid=104 */ || mysql-bin.000001 | 880 | Query | 1 | 957 | BEGIN || mysql-bin.000001 | 957 | Query | 1 | 1063 | insert into sbtest.b values(4,'d') || mysql-bin.000001 | 1063 | Xid | 1 | 1094 | COMMIT /* xid=105 */ || mysql-bin.000001 | 1094 | Query | 1 | 1171 | BEGIN || mysql-bin.000001 | 1171 | Query | 1 | 1277 | insert into sbtest.b values(5,'e') || mysql-bin.000001 | 1277 | Xid | 1 | 1308 | COMMIT /* xid=106 */ |+------------------+------+-------------+-----------+-------------+------------------------------------------------+17 rows in set (0.01 sec)
En comparant le binlog événement dans le maître, vous pouvez voir le End_log_pos dans show relaylog events Il fait en fait référence à la position de l'événement binaire
mysql> show relaylog events in 'mysqld-relay-bin.000002';+-------------------------+------+-------------+-----------+-------------+------------------------------------------------+| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |+-------------------------+------+-------------+-----------+-------------+------------------------------------------------+| mysqld-relay-bin.000002 | 4 | Format_desc | 2 | 120 | Server ver: 5.6.31-log, Binlog ver: 4 || mysqld-relay-bin.000002 | 120 | Rotate | 1 | 0 | mysql-bin.000001;pos=120 || mysqld-relay-bin.000002 | 167 | Format_desc | 1 | 0 | Server ver: 5.6.31-log, Binlog ver: 4 || mysqld-relay-bin.000002 | 283 | Query | 1 | 238 | create table sbtest.b(id int,name varchar(10)) || mysqld-relay-bin.000002 | 401 | Query | 1 | 315 | BEGIN || mysqld-relay-bin.000002 | 478 | Query | 1 | 421 | insert into sbtest.b values(1,'a') || mysqld-relay-bin.000002 | 584 | Xid | 1 | 452 | COMMIT /* xid=102 */ || mysqld-relay-bin.000002 | 615 | Query | 1 | 529 | BEGIN || mysqld-relay-bin.000002 | 692 | Query | 1 | 635 | insert into sbtest.b values(2,'b') || mysqld-relay-bin.000002 | 798 | Xid | 1 | 666 | COMMIT /* xid=103 */ || mysqld-relay-bin.000002 | 829 | Query | 1 | 743 | BEGIN || mysqld-relay-bin.000002 | 906 | Query | 1 | 849 | insert into sbtest.b values(3,'c') || mysqld-relay-bin.000002 | 1012 | Xid | 1 | 880 | COMMIT /* xid=104 */ || mysqld-relay-bin.000002 | 1043 | Query | 1 | 957 | BEGIN || mysqld-relay-bin.000002 | 1120 | Query | 1 | 1063 | insert into sbtest.b values(4,'d') || mysqld-relay-bin.000002 | 1226 | Xid | 1 | 1094 | COMMIT /* xid=105 */ || mysqld-relay-bin.000002 | 1257 | Query | 1 | 1171 | BEGIN || mysqld-relay-bin.000002 | 1334 | Query | 1 | 1277 | insert into sbtest.b values(5,'e') || mysqld-relay-bin.000002 | 1440 | Xid | 1 | 1308 | COMMIT /* xid=106 */ |+-------------------------+------+-------------+-----------+-------------+------------------------------------------------+19 rows in set (0.00 sec)
Vérifions le contenu du log de relais correspondant dans Candidat maître
Afficher le code
[root@node2 mysql]# mysqlbinlog mysqld-relay-bin.000002/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;/*!40019 SET @@session.max_insert_delayed_threads=0*/;/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/; DELIMITER /*!*/; # at 4#170524 17:16:37 server id 2 end_log_pos 120 CRC32 0x4faba9ae Start: binlog v 4, server v 5.6.31-log created 170524 17:16:37BINLOG 'dU8lWQ8CAAAAdAAAAHgAAABAAAQANS42LjMxLWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAXAAEGggAAAAICAgCAAAACgoKGRkAAa6p q08='/*!*/;# at 120#700101 8:00:00 server id 1 end_log_pos 0 CRC32 0x74c6d70c Rotate to mysql-bin.000001 pos: 120# at 167#170524 17:15:49 server id 1 end_log_pos 0 CRC32 0xed2672eb Start: binlog v 4, server v 5.6.31-log created 170524 17:15:49BINLOG 'RU8lWQ8BAAAAdAAAAAAAAAAAAAQANS42LjMxLWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAXAAEGggAAAAICAgCAAAACgoKGRkAAety Ju0='/*!*/;# at 283#170524 17:17:20 server id 1 end_log_pos 238 CRC32 0xdd48c118 Query thread_id=2 exec_time=0 error_code=0SET TIMESTAMP=1495617440/*!*/; SET @@session.pseudo_thread_id=2/*!*/; SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/; SET @@session.sql_mode=1075838976/*!*/; SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;/*!\C utf8 *//*!*/; SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/; SET @@session.lc_time_names=0/*!*/; SET @@session.collation_database=DEFAULT/*!*/; create table sbtest.b(id int,name varchar(10))/*!*/; # at 401#170524 17:17:27 server id 1 end_log_pos 315 CRC32 0xae393750 Query thread_id=2 exec_time=0 error_code=0SET TIMESTAMP=1495617447/*!*/; BEGIN/*!*/; # at 478#170524 17:17:27 server id 1 end_log_pos 421 CRC32 0x28a781ae Query thread_id=2 exec_time=0 error_code=0SET TIMESTAMP=1495617447/*!*/; insert into sbtest.b values(1,'a')/*!*/; # at 584#170524 17:17:27 server id 1 end_log_pos 452 CRC32 0x680f1bfe Xid = 29COMMIT/*!*/; # at 615#170524 17:17:33 server id 1 end_log_pos 529 CRC32 0x6a1aae7e Query thread_id=2 exec_time=0 error_code=0SET TIMESTAMP=1495617453/*!*/; BEGIN/*!*/; # at 692#170524 17:17:33 server id 1 end_log_pos 635 CRC32 0x117786ca Query thread_id=2 exec_time=0 error_code=0SET TIMESTAMP=1495617453/*!*/; insert into sbtest.b values(2,'b')/*!*/; # at 798#170524 17:17:33 server id 1 end_log_pos 666 CRC32 0xa8400ec6 Xid = 30COMMIT/*!*/; # at 829#170524 17:17:38 server id 1 end_log_pos 743 CRC32 0x24f9a1d2 Query thread_id=2 exec_time=0 error_code=0SET TIMESTAMP=1495617458/*!*/; BEGIN/*!*/; # at 906#170524 17:17:38 server id 1 end_log_pos 849 CRC32 0x56fa9e89 Query thread_id=2 exec_time=0 error_code=0SET TIMESTAMP=1495617458/*!*/; insert into sbtest.b values(3,'c')/*!*/; # at 1012#170524 17:17:38 server id 1 end_log_pos 880 CRC32 0x2ac656d4 Xid = 31COMMIT/*!*/; # at 1043#170524 17:17:44 server id 1 end_log_pos 957 CRC32 0x73a903bf Query thread_id=2 exec_time=0 error_code=0SET TIMESTAMP=1495617464/*!*/; BEGIN/*!*/; # at 1120#170524 17:17:44 server id 1 end_log_pos 1063 CRC32 0x171b9b27 Query thread_id=2 exec_time=0 error_code=0SET TIMESTAMP=1495617464/*!*/; insert into sbtest.b values(4,'d')/*!*/; # at 1226#170524 17:17:44 server id 1 end_log_pos 1094 CRC32 0x47d6fe57 Xid = 32COMMIT/*!*/; # at 1257#170524 17:17:49 server id 1 end_log_pos 1171 CRC32 0x2d37da37 Query thread_id=2 exec_time=0 error_code=0SET TIMESTAMP=1495617469/*!*/; BEGIN/*!*/; # at 1334#170524 17:17:49 server id 1 end_log_pos 1277 CRC32 0xd2201fa2 Query thread_id=2 exec_time=0 error_code=0SET TIMESTAMP=1495617469/*!*/; insert into sbtest.b values(5,'e')/*!*/; # at 1440#170524 17:17:49 server id 1 end_log_pos 1308 CRC32 0xac1b464e Xid = 33COMMIT/*!*/; DELIMITER ; # End of log fileROLLBACK /* added by mysqlbinlog */;/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
mysql中binlog有个有意思的地方是,位置点其实是也是字节的大小。
譬如,上面这个relay log中,最后一个位点是# at 1440,算上最后一个commit操作需占用31个字节,所以整个文件的大小是1471,与实际大小吻合。
[root@node2 mysql]# ll mysqld-relay-bin.000002-rw-rw---- 1 mysql mysql 1471 May 24 17:17 mysqld-relay-bin.000002
3. 通过show slave status查看io thread和sql thread的位置信息
mysql> show slave status\G*************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.244.10 Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000001 Read_Master_Log_Pos: 1308 Relay_Log_File: mysqld-relay-bin.000002 Relay_Log_Pos: 1471 Relay_Master_Log_File: mysql-bin.000001 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 1308 Relay_Log_Space: 1645 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 1 Master_UUID: 2a6365e0-1d05-11e7-956d-000c29c64704 Master_Info_File: /var/lib/mysql/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: Auto_Position: 01 row in set (0.00 sec)
待会儿需要修改上面Master_Log_File,Read_Master_Log_Pos,Relay_Log_File,Relay_Log_Pos,Relay_Master_Log_File,Exec_Master_Log_Pos的值。
虽然这几个参数的值与master.info和relay-log.info文件是相对应的,
但通过修改master.info和relay-log.info的值,并重启slave,并不会将上述几个参数值修改。
上述几个参数是保存到内存中的,唯一可行的方案是首先关闭slave实例,再修改master.info和relay-log.info文件,然后重新启动mysql实例。
4. 关闭Candicate master实例,剪裁relay log,修改master.info和relay-log.info文件。
关闭实例
[root@node2 mysql]# service mysqld stop
剪裁relay log
这里,写了个python脚本实现该功能
#!/usr/bin/pythonf1 = open('mysqld-relay-bin.000002','r') f2 = open('tmp_relay_bin','w+') size1=f1.read(615)f2.write(size1) f1.seek(1471) size2=f1.read() f2.write(size2) f1.close() f2.close()
在上述脚本size1中,615对应的是insert into sbtest.b values(1,'a')这条记录
[root@node2 mysql]# python 1.py
[root@node2 mysql]# mv tmp_relay_bin mysqld-relay-bin.000002
修改master.info的内容
主要是修改第三行
23mysql-bin.0000011308
修改为
23mysql-bin.000001452
修改relay-log.info的内容
原文件如下:
7./mysqld-relay-bin.0000021471mysql-bin.0000011308001
修改为:
7./mysqld-relay-bin.000002615mysql-bin.000001452001
启动slave,注意,配置文件中必须设置skip-slave-start,不然它自动开启主从复制。
[root@node2 mysql]# service mysqld start
5. 对于slave,同样如此处理,只不过relay log的位置点要靠后些
#!/usr/bin/pythonf1 = open('mysqld-relay-bin.000002','r') f2 = open('tmp_relay_bin','w+') size1=f1.read(1043) f2.write(size1) f1.seek(1471) size2=f1.read() f2.write(size2) f1.close() f2.close()
1043对应的是insert into sbtest.b values(3,'c')这条记录
修改master.info的内容
23mysql-bin.000001 880
修改relay-log.info的内容
7./mysqld-relay-bin.000002 1043mysql-bin.000001 8800 01
6. 开启MHA监控
# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null >
/masterha/app1/manager.log 2>&1 &
# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:1615) is running(0:PING_OK), master:192.168.244.10
7. 关闭master实例
# service mysqld stop
8. 查看MHA切换日志,了解整个切换过程。
该日志的时间点与上面relay log的时间点并不吻合,原因在于这个反复测试了很多次。
Sun May 21 20:50:46 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111) Sun May 21 20:50:46 2017 - [warning] Connection failed 1 time(s).. Sun May 21 20:50:46 2017 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s 192.168.244.20 -s 192.168.244.30 --user=root --master_host=192.168.244.10 --master_ip=192.168.244.10 --master_port=3306 --user=root --master_host=192.168.244.10 --master_ip=192.168.244.10 --master_port=3306 --master_user=monitor --master_password=monitor123 --ping_type=SELECT Sun May 21 20:50:46 2017 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --binlog_prefix=mysql-bin Monitoring server 192.168.244.20 is reachable, Master is not reachable from 192.168.244.20. OK. Sun May 21 20:50:46 2017 - [info] HealthCheck: SSH to 192.168.244.10 is reachable. Monitoring server 192.168.244.30 is reachable, Master is not reachable from 192.168.244.30. OK.-- 当monitor检测到master mysqld不可用的时候,即根据masterha_secondary_check脚本从Candicate master和Slave上判断master mysqld的可用性, 根据上面的显示信息,通过192.168.244.20和192.168.244.30也判断到master mysqld不可用。 如果任意一个slave判断到master mysqld可用,则输出的信息如下: # /usr/local/bin/masterha_secondary_check -s 192.168.244.20 -s 192.168.244.30 --user=root --master_host=192.168.244.10 --master_ip=192.168.244.10 --master_port=3306 Master is reachable from 192.168.244.20! Sun May 21 20:50:46 2017 - [info] Master is not reachable from all other monitoring servers. Failover should start. Sun May 21 20:50:47 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111) Sun May 21 20:50:47 2017 - [warning] Connection failed 2 time(s).. Sun May 21 20:50:48 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111) Sun May 21 20:50:48 2017 - [warning] Connection failed 3 time(s).. Sun May 21 20:50:49 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111) Sun May 21 20:50:49 2017 - [warning] Connection failed 4 time(s).. Sun May 21 20:50:49 2017 - [warning] Master is not reachable from health checker!Sun May 21 20:50:49 2017 - [warning] Master 192.168.244.10(192.168.244.10:3306) is not reachable!Sun May 21 20:50:49 2017 - [warning] SSH is reachable.-- 一共判断了4次,均判断master mysqld不可用,但是master主机通过ssh还是能登录上去。 如果这里显示的是SSH is NOT reachable,则代表master主机也已经宕机了, 刚开始还以为是通过ssh来判断主机是否宕机,但在之前的测试方案中(具体可见文末),master并没有宕机,这里却显示SSH is NOT reachable. 通过上面的输出才知道是通过save_binary_logs脚本来判断ssh可用性的。 Sun May 21 20:50:49 2017 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status.. Sun May 21 20:50:49 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Sun May 21 20:50:49 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Sun May 21 20:50:49 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Sun May 21 20:50:50 2017 - [warning] SQL Thread is stopped(no error) on 192.168.244.20(192.168.244.20:3306) Sun May 21 20:50:50 2017 - [warning] SQL Thread is stopped(no error) on 192.168.244.30(192.168.244.30:3306) Sun May 21 20:50:50 2017 - [info] GTID failover mode = 0Sun May 21 20:50:50 2017 - [info] Dead Servers: Sun May 21 20:50:50 2017 - [info] 192.168.244.10(192.168.244.10:3306) Sun May 21 20:50:50 2017 - [info] Alive Servers: Sun May 21 20:50:50 2017 - [info] 192.168.244.20(192.168.244.20:3306) Sun May 21 20:50:50 2017 - [info] 192.168.244.30(192.168.244.30:3306) Sun May 21 20:50:50 2017 - [info] Alive Slaves: Sun May 21 20:50:50 2017 - [info] 192.168.244.20(192.168.244.20:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabled Sun May 21 20:50:50 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Sun May 21 20:50:50 2017 - [info] Primary candidate for the new Master (candidate_master is set) Sun May 21 20:50:50 2017 - [info] 192.168.244.30(192.168.244.30:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabled Sun May 21 20:50:50 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Sun May 21 20:50:50 2017 - [info] Checking slave configurations.. Sun May 21 20:50:50 2017 - [info] Checking replication filtering settings.. Sun May 21 20:50:50 2017 - [info] Replication filtering check ok. Sun May 21 20:50:50 2017 - [info] Master is down!Sun May 21 20:50:50 2017 - [info] Terminating monitoring script. Sun May 21 20:50:50 2017 - [info] Got exit code 20 (Master dead). Sun May 21 20:50:50 2017 - [info] MHA::MasterFailover version 0.56. Sun May 21 20:50:50 2017 - [info] Starting master failover.-- 读取MHA的配置文件,检查slave的相关配置,比如read_only参数,是否设置了复制的过滤规则 从上面的输出中可以看出,SQL Thread正常停止了并不影响MHA的切换。 Sun May 21 20:50:50 2017 - [info] Sun May 21 20:50:50 2017 - [info] * Phase 1: Configuration Check Phase.. Sun May 21 20:50:50 2017 - [info] Sun May 21 20:50:51 2017 - [warning] SQL Thread is stopped(no error) on 192.168.244.20(192.168.244.20:3306) Sun May 21 20:50:51 2017 - [warning] SQL Thread is stopped(no error) on 192.168.244.30(192.168.244.30:3306) Sun May 21 20:50:51 2017 - [info] GTID failover mode = 0Sun May 21 20:50:51 2017 - [info] Dead Servers: Sun May 21 20:50:51 2017 - [info] 192.168.244.10(192.168.244.10:3306) Sun May 21 20:50:51 2017 - [info] Checking master reachability via MySQL(double check)... Sun May 21 20:50:51 2017 - [info] ok. Sun May 21 20:50:51 2017 - [info] Alive Servers: Sun May 21 20:50:51 2017 - [info] 192.168.244.20(192.168.244.20:3306) Sun May 21 20:50:51 2017 - [info] 192.168.244.30(192.168.244.30:3306) Sun May 21 20:50:51 2017 - [info] Alive Slaves: Sun May 21 20:50:51 2017 - [info] 192.168.244.20(192.168.244.20:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabled Sun May 21 20:50:51 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Sun May 21 20:50:51 2017 - [info] Primary candidate for the new Master (candidate_master is set) Sun May 21 20:50:51 2017 - [info] 192.168.244.30(192.168.244.30:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabled Sun May 21 20:50:51 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Sun May 21 20:50:51 2017 - [info] Starting SQL thread on 192.168.244.20(192.168.244.20:3306) .. Sun May 21 20:50:51 2017 - [info] done. Sun May 21 20:50:51 2017 - [info] Starting SQL thread on 192.168.244.30(192.168.244.30:3306) .. Sun May 21 20:50:51 2017 - [info] done. Sun May 21 20:50:51 2017 - [info] Starting Non-GTID based failover. Sun May 21 20:50:51 2017 - [info] Sun May 21 20:50:51 2017 - [info] ** Phase 1: Configuration Check Phase completed.-- 第一阶段,检查了MHA的配置信息,并再次判断了master的可用性。 第二阶段,关闭dead master。 包括执行摘除master上的vip,同时执行shutdown_script脚本,因为该脚本在配置文件中没有定义,故跳过。 Sun May 21 20:50:51 2017 - [info] Sun May 21 20:50:51 2017 - [info] * Phase 2: Dead Master Shutdown Phase.. Sun May 21 20:50:51 2017 - [info] Sun May 21 20:50:51 2017 - [info] Forcing shutdown so that applications never connect to the current master.. Sun May 21 20:50:51 2017 - [info] Executing master IP deactivation script: Sun May 21 20:50:51 2017 - [info] /usr/local/bin/master_ip_failover --orig_master_host=192.168.244.10 --orig_master_ip=192.168.244.10 --orig_master_port=3306 --command=stopssh --ssh_user=root Disabling the VIP an old master: 192.168.244.10 SIOCSIFFLAGS: Cannot assign requested address Sun May 21 20:50:51 2017 - [info] done. Sun May 21 20:50:51 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. Sun May 21 20:50:51 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed. Sun May 21 20:50:51 2017 - [info] -- 第三阶段 3.1 判断哪个slave的二进制日志是最新的。 通过下面的输出可以看出,所有的slave中,最新的二进制日志位置是mysql-bin.000001:880(通过show slave status中的Master_Log_File, Read_Master_Log_Pos得到) 最旧的二进制日志位置是mysql-bin.000001:452 Sun May 21 20:50:51 2017 - [info] * Phase 3: Master Recovery Phase.. Sun May 21 20:50:51 2017 - [info] Sun May 21 20:50:51 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Sun May 21 20:50:51 2017 - [info] Sun May 21 20:50:51 2017 - [info] The latest binary log file/position on all slaves is mysql-bin.000001:880Sun May 21 20:50:51 2017 - [info] Latest slaves (Slaves that received relay log files to the latest): Sun May 21 20:50:51 2017 - [info] 192.168.244.30(192.168.244.30:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabled Sun May 21 20:50:51 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Sun May 21 20:50:51 2017 - [info] The oldest binary log file/position on all slaves is mysql-bin.000001:452Sun May 21 20:50:51 2017 - [info] Oldest slaves: Sun May 21 20:50:51 2017 - [info] 192.168.244.20(192.168.244.20:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabled Sun May 21 20:50:51 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Sun May 21 20:50:51 2017 - [info] Primary candidate for the new Master (candidate_master is set) Sun May 21 20:50:51 2017 - [info] -- 3.2 保存master的binlog 注意,上面已经判断到slave中最新二进制日志是mysql-bin.000001:880,所以它把该位置后所有二进制日志都拼接起来,并scp到monitor的/masterha/app1目录下。 Sun May 21 20:50:51 2017 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..Sun May 21 20:50:51 2017 - [info] Sun May 21 20:50:52 2017 - [info] Fetching dead master's binary logs..Sun May 21 20:50:52 2017 - [info] Executing command on the dead master 192.168.244.10(192.168.244.10:3306): save_binary_logs --command=save --start_file=mysql-bin.000001 --start_pos=880 --binlog_dir=/var/lib/mysql --output_file=/tmp/saved_master_binlog_from_192.168.244.10_3306_20170521205050.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 Creating /tmp if not exists.. ok. Concat binary/relay logs from mysql-bin.000001 pos 880 to mysql-bin.000001 EOF into /tmp/saved_master_binlog_from_192.168.244.10_3306_20170521205050.binlog .. Binlog Checksum enabled Dumping binlog format description event, from position 0 to 120.. ok. Dumping effective binlog data from /var/lib/mysql/mysql-bin.000001 position 880 to tail(1308).. ok. Binlog Checksum enabled Concat succeeded. Sun May 21 20:50:52 2017 - [info] scp from root@192.168.244.10:/tmp/saved_master_binlog_from_192.168.244.10_3306_20170521205050.binlog to local:/masterha/app1/saved_master_binlog_from_192.168.244.10_3306_20170521205050.binlog succeeded. Sun May 21 20:50:52 2017 - [info] HealthCheck: SSH to 192.168.244.20 is reachable. Sun May 21 20:50:53 2017 - [info] HealthCheck: SSH to 192.168.244.30 is reachable. Sun May 21 20:50:53 2017 - [info] -- 3.3 选新主阶段 首先判断最新的slave中是否包括最旧的二进制日志(mysql-bin.000001:452)以后的relay log。 接着选新主, 因为192.168.244.20中设置了candidate_master设置了,所以192.168.244.20被指定为新主。 Sun May 21 20:50:53 2017 - [info] * Phase 3.3: Determining New Master Phase.. Sun May 21 20:50:53 2017 - [info] Sun May 21 20:50:53 2017 - [info] Finding the latest slave that has all relay logs for recovering other slaves.. Sun May 21 20:50:53 2017 - [info] Checking whether 192.168.244.30 has relay logs from the oldest position.. Sun May 21 20:50:53 2017 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=mysql-bin.000001 --latest_rmlp=880 --target_mlf=mysql-bin.000001 --target_rmlp=452 --server_id=3 --workdir=/tmp --timestamp=20170521205050 --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ : Opening /var/lib/mysql/relay-log.info ... ok. Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000003 Fast relay log position search failed. Reading relay logs to find.. Reading mysqld-relay-bin.000003 Binlog Checksum enabled Reading mysqld-relay-bin.000002 Binlog Checksum enabled Master Version is 5.6.31-log Binlog Checksum enabled mysqld-relay-bin.000002 contains master mysql-bin.000001 from position 120Target relay log FOUND!Sun May 21 20:50:53 2017 - [info] OK. 192.168.244.30 has all relay logs. Sun May 21 20:50:53 2017 - [info] Searching new master from slaves.. Sun May 21 20:50:53 2017 - [info] Candidate masters from the configuration file: Sun May 21 20:50:53 2017 - [info] 192.168.244.20(192.168.244.20:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabled Sun May 21 20:50:53 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Sun May 21 20:50:53 2017 - [info] Primary candidate for the new Master (candidate_master is set) Sun May 21 20:50:53 2017 - [info] Non-candidate masters: Sun May 21 20:50:53 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Sun May 21 20:50:53 2017 - [info] Not found. Sun May 21 20:50:53 2017 - [info] Searching from all candidate_master slaves.. Sun May 21 20:50:53 2017 - [info] New master is 192.168.244.20(192.168.244.20:3306) Sun May 21 20:50:53 2017 - [info] Starting master failover.. Sun May 21 20:50:53 2017 - [info] From:192.168.244.10(192.168.244.10:3306) (current master) +--192.168.244.20(192.168.244.20:3306) +--192.168.244.30(192.168.244.30:3306) To:192.168.244.20(192.168.244.20:3306) (new master) +--192.168.244.30(192.168.244.30:3306) Sun May 21 20:50:53 2017 - [info] -- 3.3 获取新主所需的差异二进制日志,包括两部分 1> 新主和最新的slave之间差异的relay log 2> 保存在MHA Manager上的最新的slave和原master之前差异的binlog 其中,差异的relay log通过如下方式获取: ssh到192.168.244.30上,执行apply_diff_relay_logs获取差异的relay log。将差异的relay log scp到192.168.244.20。 Sun May 21 20:50:53 2017 - [info] * Phase 3.3: New Master Diff Log Generation Phase.. Sun May 21 20:50:53 2017 - [info] Sun May 21 20:50:53 2017 - [info] Server 192.168.244.20 received relay logs up to: mysql-bin.000001:452Sun May 21 20:50:53 2017 - [info] Need to get diffs from the latest slave(192.168.244.30) up to: mysql-bin.000001:880 (using the latest slave's relay logs)Sun May 21 20:50:53 2017 - [info] Connecting to the latest slave host 192.168.244.30, generating diff relay log files.. Sun May 21 20:50:53 2017 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=192.168.244.20 --latest_mlf=mysql-bin.000001 --latest_rmlp=880 --target_mlf=mysql-bin.000001 --target_rmlp=452 --server_id=3 --diff_file_readtolatest=/tmp/relay_from_read_to_latest_192.168.244.20_3306_20170521205050.binlog --workdir=/tmp --timestamp=20170521205050 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ Sun May 21 20:50:54 2017 - [info] Opening /var/lib/mysql/relay-log.info ... ok. Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000003 Fast relay log position search failed. Reading relay logs to find.. Reading mysqld-relay-bin.000003 Binlog Checksum enabled Reading mysqld-relay-bin.000002 Binlog Checksum enabled Master Version is 5.6.31-log Binlog Checksum enabled mysqld-relay-bin.000002 contains master mysql-bin.000001 from position 120 Target relay log file/position found. start_file:mysqld-relay-bin.000002, start_pos:615. Concat binary/relay logs from mysqld-relay-bin.000002 pos 615 to mysqld-relay-bin.000003 EOF into /tmp/relay_from_read_to_latest_192.168.244.20_3306_20170521205050.binlog .. Binlog Checksum enabled Binlog Checksum enabled Dumping binlog format description event, from position 0 to 283.. ok. Dumping effective binlog data from /var/lib/mysql/mysqld-relay-bin.000002 position 615 to tail(1066).. ok. Dumping binlog head events (rotate events), skipping format description events from /var/lib/mysql/mysqld-relay-bin.000003.. Binlog Checksum enabled dumped up to pos 120. ok. No need to dump effective binlog data from /var/lib/mysql/mysqld-relay-bin.000003 (pos starts 120, filesize 120). Skipping. Binlog Checksum enabled Binlog Checksum enabled Concat succeeded. Generating diff relay log succeeded. Saved at /tmp/relay_from_read_to_latest_192.168.244.20_3306_20170521205050.binlog . scp node3:/tmp/relay_from_read_to_latest_192.168.244.20_3306_20170521205050.binlog to root@192.168.244.20(22) succeeded. Sun May 21 20:50:54 2017 - [info] Generating diff files succeeded. Sun May 21 20:50:54 2017 - [info] Sending binlog.. Sun May 21 20:50:54 2017 - [info] scp from local:/masterha/app1/saved_master_binlog_from_192.168.244.10_3306_20170521205050.binlog to root@192.168.244.20:/tmp/saved_master_binlog_from_192.168.244.10_3306_20170521205050.binlog succeeded. Sun May 21 20:50:54 2017 - [info] -- 3.4 应用从master保存的二进制日志事件 首先,等原来的所有的relay log都应用完。 其次,再通过apply_diff_relay_logs应用差异的relay log,及差异的binlog。 应用完毕后,得到新的master binlog的文件和位置,其它slave可根据该文件和位置来建立主从复制关系。 第三,执行master_ip_failover脚本,执行如下操作 1> 将新主的read_only设置为0 2> 启动vip Sun May 21 20:50:54 2017 - [info] * Phase 3.4: Master Log Apply Phase.. Sun May 21 20:50:54 2017 - [info] Sun May 21 20:50:54 2017 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. Sun May 21 20:50:54 2017 - [info] Starting recovery on 192.168.244.20(192.168.244.20:3306).. Sun May 21 20:50:54 2017 - [info] Generating diffs succeeded. Sun May 21 20:50:54 2017 - [info] Waiting until all relay logs are applied. Sun May 21 20:50:54 2017 - [info] done. Sun May 21 20:50:54 2017 - [info] Getting slave status.. Sun May 21 20:50:54 2017 - [info] This slave(192.168.244.20)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000001:452). No need to recover from Exec_Master_Log_Pos.Sun May 21 20:50:54 2017 - [info] Connecting to the target slave host 192.168.244.20, running recover script.. Sun May 21 20:50:54 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='monitor' --slave_host=192.168.244.20 --slave_ip=192.168.244.20 --slave_port=3306 --apply_files=/tmp/relay_from_read_to_latest_192.168.244.20_3306_20170521205050.binlog,/tmp/saved_master_binlog_from_192.168.244.10_3306_20170521205050.binlog --workdir=/tmp --target_version=5.6.31-log --timestamp=20170521205050 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx Sun May 21 20:50:55 2017 - [info] Concat all apply files to /tmp/total_binlog_for_192.168.244.20_3306.20170521205050.binlog .. Copying the first binlog file /tmp/relay_from_read_to_latest_192.168.244.20_3306_20170521205050.binlog to /tmp/total_binlog_for_192.168.244.20_3306.20170521205050.binlog.. ok. Dumping binlog head events (rotate events), skipping format description events from /tmp/saved_master_binlog_from_192.168.244.10_3306_20170521205050.binlog.. Binlog Checksum enabled dumped up to pos 120. ok. /tmp/saved_master_binlog_from_192.168.244.10_3306_20170521205050.binlog has effective binlog events from pos 120. Dumping effective binlog data from /tmp/saved_master_binlog_from_192.168.244.10_3306_20170521205050.binlog position 120 to tail(548).. ok. Concat succeeded. All apply target binary logs are concatinated at /tmp/total_binlog_for_192.168.244.20_3306.20170521205050.binlog . MySQL client version is 5.6.31. Using --binary-mode. Applying differential binary/relay log files /tmp/relay_from_read_to_latest_192.168.244.20_3306_20170521205050.binlog,/tmp/saved_master_binlog_from_192.168.244.10_3306_20170521205050.binlog on 192.168.244.20:3306. This may take long time... Applying log files succeeded. Sun May 21 20:50:55 2017 - [info] All relay logs were successfully applied. Sun May 21 20:50:55 2017 - [info] Getting new master's binlog name and position..Sun May 21 20:50:55 2017 - [info] mysql-bin.000002:976Sun May 21 20:50:55 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.244.20', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=976, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Sun May 21 20:50:55 2017 - [info] Executing master IP activate script: Sun May 21 20:50:55 2017 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.244.10 --orig_master_ip=192.168.244.10 --orig_master_port=3306 --new_master_host=192.168.244.20 --new_master_ip=192.168.244.20 --new_master_port=3306 --new_master_user='monitor' --new_master_password='monitor123' Set read_only=0 on the new master. Enabling the VIP 192.168.244.188 on the new master: 192.168.244.20 Sun May 21 20:50:55 2017 - [info] OK. Sun May 21 20:50:55 2017 - [info] ** Finished master recovery successfully. Sun May 21 20:50:55 2017 - [info] * Phase 3: Master Recovery Phase completed.-- 第四阶段 slave恢复阶段-- 4.1 因为192.168.244.30拥有最新的relay log,所以也没必要获取差异的relay log-- 4.2 开始slave的恢复阶段 1> 将monitor上保存的master上的差异的二进制日志scp到slave上。 2> 应用差异日志。 3> 清除原来的复制关系,并再次执行change master命令建立新的主从同步。 如果有多个slave,则该恢复过程是并行的。 Sun May 21 20:50:55 2017 - [info] Sun May 21 20:50:55 2017 - [info] * Phase 4: Slaves Recovery Phase.. Sun May 21 20:50:55 2017 - [info] Sun May 21 20:50:55 2017 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. Sun May 21 20:50:55 2017 - [info] Sun May 21 20:50:55 2017 - [info] -- Slave diff file generation on host 192.168.244.30(192.168.244.30:3306) started, pid: 4966. Check tmp log /masterha/app1/192.168.244.30_3306_20170521205050.log if it takes time.. Sun May 21 20:50:56 2017 - [info] Sun May 21 20:50:56 2017 - [info] Log messages from 192.168.244.30 ... Sun May 21 20:50:56 2017 - [info] Sun May 21 20:50:55 2017 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Sun May 21 20:50:56 2017 - [info] End of log messages from 192.168.244.30. Sun May 21 20:50:56 2017 - [info] -- 192.168.244.30(192.168.244.30:3306) has the latest relay log events. Sun May 21 20:50:56 2017 - [info] Generating relay diff files from the latest slave succeeded. Sun May 21 20:50:56 2017 - [info] Sun May 21 20:50:56 2017 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. Sun May 21 20:50:56 2017 - [info] Sun May 21 20:50:56 2017 - [info] -- Slave recovery on host 192.168.244.30(192.168.244.30:3306) started, pid: 4968. Check tmp log /masterha/app1/192.168.244.30_3306_20170521205050.log if it takes time.. Sun May 21 20:50:58 2017 - [info] Sun May 21 20:50:58 2017 - [info] Log messages from 192.168.244.30 ... Sun May 21 20:50:58 2017 - [info] Sun May 21 20:50:56 2017 - [info] Sending binlog.. Sun May 21 20:50:57 2017 - [info] scp from local:/masterha/app1/saved_master_binlog_from_192.168.244.10_3306_20170521205050.binlog to root@192.168.244.30:/tmp/saved_master_binlog_from_192.168.244.10_3306_20170521205050.binlog succeeded. Sun May 21 20:50:57 2017 - [info] Starting recovery on 192.168.244.30(192.168.244.30:3306).. Sun May 21 20:50:57 2017 - [info] Generating diffs succeeded. Sun May 21 20:50:57 2017 - [info] Waiting until all relay logs are applied. Sun May 21 20:50:57 2017 - [info] done. Sun May 21 20:50:57 2017 - [info] Getting slave status.. Sun May 21 20:50:57 2017 - [info] This slave(192.168.244.30)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000001:880). No need to recover from Exec_Master_Log_Pos.Sun May 21 20:50:57 2017 - [info] Connecting to the target slave host 192.168.244.30, running recover script.. Sun May 21 20:50:57 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='monitor' --slave_host=192.168.244.30 --slave_ip=192.168.244.30 --slave_port=3306 --apply_files=/tmp/saved_master_binlog_from_192.168.244.10_3306_20170521205050.binlog --workdir=/tmp --target_version=5.6.31-log --timestamp=20170521205050 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx Sun May 21 20:50:57 2017 - [info] MySQL client version is 5.6.31. Using --binary-mode. Applying differential binary/relay log files /tmp/saved_master_binlog_from_192.168.244.10_3306_20170521205050.binlog on 192.168.244.30:3306. This may take long time... Applying log files succeeded. Sun May 21 20:50:57 2017 - [info] All relay logs were successfully applied. Sun May 21 20:50:57 2017 - [info] Resetting slave 192.168.244.30(192.168.244.30:3306) and starting replication from the new master 192.168.244.20(192.168.244.20:3306).. Sun May 21 20:50:58 2017 - [info] Executed CHANGE MASTER. Sun May 21 20:50:58 2017 - [info] Slave started. Sun May 21 20:50:58 2017 - [info] End of log messages from 192.168.244.30. Sun May 21 20:50:58 2017 - [info] -- Slave recovery on host 192.168.244.30(192.168.244.30:3306) succeeded. Sun May 21 20:50:58 2017 - [info] All new slave servers recovered successfully.-- 第五阶段 清理阶段 从MHA的配置文件中剔除server1的配置信息 Sun May 21 20:50:58 2017 - [info] Sun May 21 20:50:58 2017 - [info] * Phase 5: New master cleanup phase.. Sun May 21 20:50:58 2017 - [info] Sun May 21 20:50:58 2017 - [info] Resetting slave info on the new master.. Sun May 21 20:50:58 2017 - [info] 192.168.244.20: Resetting slave info succeeded. Sun May 21 20:50:58 2017 - [info] Master failover to 192.168.244.20(192.168.244.20:3306) completed successfully. Sun May 21 20:50:58 2017 - [info] Deleted server1 entry from /etc/masterha/app1.cnf . Sun May 21 20:50:58 2017 - [info] -- 生成 Failover 报告,发送告警邮件----- Failover Report -----app1: MySQL Master failover 192.168.244.10(192.168.244.10:3306) to 192.168.244.20(192.168.244.20:3306) succeeded Master 192.168.244.10(192.168.244.10:3306) is down!Check MHA Manager logs at node4:/masterha/app1/manager.log for details. Started automated(non-interactive) failover. Invalidated master IP address on 192.168.244.10(192.168.244.10:3306) The latest slave 192.168.244.30(192.168.244.30:3306) has all relay logs for recovery. Selected 192.168.244.20(192.168.244.20:3306) as a new master.192.168.244.20(192.168.244.20:3306): OK: Applying all logs succeeded.192.168.244.20(192.168.244.20:3306): OK: Activated master IP address.192.168.244.30(192.168.244.30:3306): This host has the latest relay log events. Generating relay diff files from the latest slave succeeded.192.168.244.30(192.168.244.30:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.244.20(192.168.244.20:3306)192.168.244.20(192.168.244.20:3306): Resetting slave info succeeded. Master failover to 192.168.244.20(192.168.244.20:3306) completed successfully. Sun May 21 20:50:58 2017 - [info] Sending mail.. Unknown option: conf
从上面的日志输出可以看出整个MHA的切换过程
首先,MHA Manager检测到master不可用,则会通过另外两个slave检查master的可用性。一共检测4次。
同时判断MHA Manager到master的ssh可用性。
ssh可用性的判断结果影响后后续切换中的“Phase 3.2: Saving Dead Master's Binlog Phase”
Phase 1: Configuration Check Phase..
检查了MHA的配置信息,并再次判断了master的可用性。
Phase 2: Dead Master Shutdown Phase..
宕机的master处理阶段,包括摘除VIP,执行shutdown_script中定义的脚本。
Phase 3: Master Recovery Phase..
Phase 3.1: Getting Latest Slaves Phase..
判断哪个slave拥有最新的relay log(通过比较show slave status中的Master_Log_File, Read_Master_Log_Pos位置),
哪个slave拥有最旧的relay log
Phase 3.2: Saving Dead Master's Binlog Phase..
根据上面得到的slave的最新位置信息,将差异的二进制日志保存到MHA Manager的指定目录下。
如果在第一步骤中,判断了MHA Manager到master的ssh不可用,则会跳过这个阶段。
Phase 3.3: Determining New Master Phase..
选择新的master
Phase 3.3: New Master Diff Log Generation Phase..
将差异的relay log和master差异日志scp到新的master上。
Phase 3.4: Master Log Apply Phase..
首先,等待slave上已有的relay log都应用完。
其次,通过apply_diff_relay_logs应用差异的relay log,及差异的binlog。
应用完毕后,得到新的master binlog的文件和位置,其它slave可根据该文件和位置来建立主从复制关系。
第三,执行master_ip_failover脚本,执行如下操作
1> 将新主的read_only设置为0
2> 启动vip
Phase 4: Slaves Recovery Phase
Phase 4.1: Starting Parallel Slave Diff Log Generation Phase
为slave获取差异的relay log
因为192.168.244.30拥有最新的relay log,所以也没必要获取差异的relay log
Phase 4.2: Starting Parallel Slave Log Apply Phase
开始slave的恢复阶段
1> 将差异的relay log和master差异日志scp到slave上。
2> 应用差异日志。
3> 清除原来的复制关系,并再次执行change master命令建立新的主从同步。
如果有多个slave,则该恢复过程是并行的。
Phase 5: New master cleanup phase
从MHA的配置文件中剔除server1的配置信息
最后,生成Failover Report并发送告警邮件。
手动Failover
与自动Failover相对应的是手动Failover,即当master发生故障时,通过手动执行脚本来进行故障切换。
命令如下:
# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.244.10 --dead_master_port=3306 --new_master_host=192.168.244.30 --new_master_port=3306 --ignore_last_failover
输出日志信息如下:
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.244.10. Wed May 24 19:44:20 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Wed May 24 19:44:20 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Wed May 24 19:44:20 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Wed May 24 19:44:20 2017 - [info] MHA::MasterFailover version 0.56. Wed May 24 19:44:20 2017 - [info] Starting master failover. Wed May 24 19:44:20 2017 - [info] Wed May 24 19:44:20 2017 - [info] * Phase 1: Configuration Check Phase.. Wed May 24 19:44:20 2017 - [info] Wed May 24 19:44:21 2017 - [info] GTID failover mode = 0Wed May 24 19:44:21 2017 - [info] Dead Servers: Wed May 24 19:44:21 2017 - [info] 192.168.244.10(192.168.244.10:3306) Wed May 24 19:44:21 2017 - [info] Checking master reachability via MySQL(double check)... Wed May 24 19:44:21 2017 - [info] ok. Wed May 24 19:44:21 2017 - [info] Alive Servers: Wed May 24 19:44:21 2017 - [info] 192.168.244.20(192.168.244.20:3306) Wed May 24 19:44:21 2017 - [info] 192.168.244.30(192.168.244.30:3306) Wed May 24 19:44:21 2017 - [info] Alive Slaves: Wed May 24 19:44:21 2017 - [info] 192.168.244.20(192.168.244.20:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabledWed May 24 19:44:21 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Wed May 24 19:44:21 2017 - [info] Primary candidate for the new Master (candidate_master is set) Wed May 24 19:44:21 2017 - [info] 192.168.244.30(192.168.244.30:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabledWed May 24 19:44:21 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Master 192.168.244.10(192.168.244.10:3306) is dead. Proceed? (yes/NO): yes Wed May 24 19:44:23 2017 - [info] Starting Non-GTID based failover. Wed May 24 19:44:23 2017 - [info] Wed May 24 19:44:23 2017 - [info] ** Phase 1: Configuration Check Phase completed. Wed May 24 19:44:23 2017 - [info] Wed May 24 19:44:23 2017 - [info] * Phase 2: Dead Master Shutdown Phase.. Wed May 24 19:44:23 2017 - [info] Wed May 24 19:44:24 2017 - [info] HealthCheck: SSH to 192.168.244.10 is reachable. Wed May 24 19:44:24 2017 - [info] Forcing shutdown so that applications never connect to the current master.. Wed May 24 19:44:24 2017 - [info] Executing master IP deactivation script: Wed May 24 19:44:24 2017 - [info] /usr/local/bin/master_ip_failover --orig_master_host=192.168.244.10 --orig_master_ip=192.168.244.10 --orig_master_port=3306 --command=stopssh --ssh_user=root Disabling the VIP an old master: 192.168.244.10 SIOCSIFFLAGS: Cannot assign requested address Wed May 24 19:44:24 2017 - [info] done. Wed May 24 19:44:24 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. Wed May 24 19:44:24 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed. Wed May 24 19:44:24 2017 - [info] Wed May 24 19:44:24 2017 - [info] * Phase 3: Master Recovery Phase.. Wed May 24 19:44:24 2017 - [info] Wed May 24 19:44:24 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Wed May 24 19:44:24 2017 - [info] Wed May 24 19:44:24 2017 - [info] The latest binary log file/position on all slaves is mysql-bin.000002:120Wed May 24 19:44:24 2017 - [info] Latest slaves (Slaves that received relay log files to the latest): Wed May 24 19:44:24 2017 - [info] 192.168.244.20(192.168.244.20:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabledWed May 24 19:44:24 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Wed May 24 19:44:24 2017 - [info] Primary candidate for the new Master (candidate_master is set) Wed May 24 19:44:24 2017 - [info] 192.168.244.30(192.168.244.30:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabledWed May 24 19:44:24 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Wed May 24 19:44:24 2017 - [info] The oldest binary log file/position on all slaves is mysql-bin.000002:120Wed May 24 19:44:24 2017 - [info] Oldest slaves: Wed May 24 19:44:24 2017 - [info] 192.168.244.20(192.168.244.20:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabledWed May 24 19:44:24 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Wed May 24 19:44:24 2017 - [info] Primary candidate for the new Master (candidate_master is set) Wed May 24 19:44:24 2017 - [info] 192.168.244.30(192.168.244.30:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabledWed May 24 19:44:24 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Wed May 24 19:44:24 2017 - [info] Wed May 24 19:44:24 2017 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..Wed May 24 19:44:24 2017 - [info] Wed May 24 19:44:24 2017 - [info] Fetching dead master's binary logs..Wed May 24 19:44:24 2017 - [info] Executing command on the dead master 192.168.244.10(192.168.244.10:3306): save_binary_logs --comman d=save --start_file=mysql-bin.000002 --start_pos=120 --binlog_dir=/var/lib/mysql --output_file=/tmp/saved_master_binlog_from_192.168.244.10_3306_20170524194420.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 Creating /tmp if not exists.. ok. Concat binary/relay logs from mysql-bin.000002 pos 120 to mysql-bin.000002 EOF into /tmp/saved_master_binlog_from_192.168.244.10_330 6_20170524194420.binlog .. Binlog Checksum enabled Dumping binlog format description event, from position 0 to 120.. ok. Dumping effective binlog data from /var/lib/mysql/mysql-bin.000002 position 120 to tail(143).. ok. Binlog Checksum enabled Concat succeeded. saved_master_binlog_from_192.168.244.10_3306_20170524194420.binlog 100% 143 0.1KB/s 00:00 Wed May 24 19:44:24 2017 - [info] scp from root@192.168.244.10:/tmp/saved_master_binlog_from_192.168.244.10_3306_20170524194420.binlo g to local:/masterha/app1/saved_master_binlog_from_192.168.244.10_3306_20170524194420.binlog succeeded.Wed May 24 19:44:25 2017 - [info] HealthCheck: SSH to 192.168.244.20 is reachable. Wed May 24 19:44:25 2017 - [info] HealthCheck: SSH to 192.168.244.30 is reachable. Wed May 24 19:44:25 2017 - [info] Wed May 24 19:44:25 2017 - [info] * Phase 3.3: Determining New Master Phase.. Wed May 24 19:44:25 2017 - [info] Wed May 24 19:44:25 2017 - [info] Finding the latest slave that has all relay logs for recovering other slaves.. Wed May 24 19:44:25 2017 - [info] All slaves received relay logs to the same position. No need to resync each other. Wed May 24 19:44:25 2017 - [info] 192.168.244.30 can be new master. Wed May 24 19:44:25 2017 - [info] New master is 192.168.244.30(192.168.244.30:3306) Wed May 24 19:44:25 2017 - [info] Starting master failover.. Wed May 24 19:44:25 2017 - [info] From:192.168.244.10(192.168.244.10:3306) (current master) +--192.168.244.20(192.168.244.20:3306) +--192.168.244.30(192.168.244.30:3306) To:192.168.244.30(192.168.244.30:3306) (new master) +--192.168.244.20(192.168.244.20:3306) Starting master switch from 192.168.244.10(192.168.244.10:3306) to 192.168.244.30(192.168.244.30:3306)? (yes/NO): yes Wed May 24 19:44:32 2017 - [info] New master decided manually is 192.168.244.30(192.168.244.30:3306) Wed May 24 19:44:32 2017 - [info] Wed May 24 19:44:32 2017 - [info] * Phase 3.3: New Master Diff Log Generation Phase.. Wed May 24 19:44:32 2017 - [info] Wed May 24 19:44:32 2017 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Wed May 24 19:44:32 2017 - [info] Sending binlog.. saved_master_binlog_from_192.168.244.10_3306_20170524194420.binlog 100% 143 0.1KB/s 00:00 Wed May 24 19:44:33 2017 - [info] scp from local:/masterha/app1/saved_master_binlog_from_192.168.244.10_3306_20170524194420.binlog to root@192.168.244.30:/tmp/saved_master_binlog_from_192.168.244.10_3306_20170524194420.binlog succeeded.Wed May 24 19:44:33 2017 - [info] Wed May 24 19:44:33 2017 - [info] * Phase 3.4: Master Log Apply Phase.. Wed May 24 19:44:33 2017 - [info] Wed May 24 19:44:33 2017 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. Wed May 24 19:44:33 2017 - [info] Starting recovery on 192.168.244.30(192.168.244.30:3306).. Wed May 24 19:44:33 2017 - [info] Generating diffs succeeded. Wed May 24 19:44:33 2017 - [info] Waiting until all relay logs are applied. Wed May 24 19:44:33 2017 - [info] done. Wed May 24 19:44:33 2017 - [info] Getting slave status.. Wed May 24 19:44:33 2017 - [info] This slave(192.168.244.30)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000002:120). No need to recover from Exec_Master_Log_Pos.Wed May 24 19:44:33 2017 - [info] Connecting to the target slave host 192.168.244.30, running recover script.. Wed May 24 19:44:33 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='monitor' --slave_host=192.168.244.30 --slave_ip=192.168.244.30 --slave_port=3306 --apply_files=/tmp/saved_master_binlog_from_192.168.244.10_3306_20170524194420.binlog --workdir=/tmp --target_version=5.6.31-log --timestamp=20170524194420 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxxWed May 24 19:44:33 2017 - [info] MySQL client version is 5.6.31. Using --binary-mode. Applying differential binary/relay log files /tmp/saved_master_binlog_from_192.168.244.10_3306_20170524194420.binlog on 192.168.244.30:3306. This may take long time...Applying log files succeeded. Wed May 24 19:44:33 2017 - [info] All relay logs were successfully applied. Wed May 24 19:44:33 2017 - [info] Getting new master's binlog name and position..Wed May 24 19:44:33 2017 - [info] mysql-bin.000001:1429Wed May 24 19:44:33 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_ HOST='192.168.244.30', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=1429, MASTER_USER='repl', MASTER_PASSWORD='xxx';Wed May 24 19:44:33 2017 - [info] Executing master IP activate script: Wed May 24 19:44:33 2017 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.244.10 --orig_master_ip=192.168.244.10 --orig_master_port=3306 --new_master_host=192.168.244.30 --new_master_ip=192.168.244.30 --new_master_port=3306 --new_master_user='monitor' --new_master_password='monitor123' Set read_only=0 on the new master. Enabling the VIP 192.168.244.188 on the new master: 192.168.244.30 Wed May 24 19:44:33 2017 - [info] OK. Wed May 24 19:44:33 2017 - [info] ** Finished master recovery successfully. Wed May 24 19:44:33 2017 - [info] * Phase 3: Master Recovery Phase completed. Wed May 24 19:44:33 2017 - [info] Wed May 24 19:44:33 2017 - [info] * Phase 4: Slaves Recovery Phase.. Wed May 24 19:44:33 2017 - [info] Wed May 24 19:44:33 2017 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. Wed May 24 19:44:33 2017 - [info] Wed May 24 19:44:33 2017 - [info] -- Slave diff file generation on host 192.168.244.20(192.168.244.20:3306) started, pid: 1598. Check tmp log /masterha/app1/192.168.244.20_3306_20170524194420.log if it takes time..Wed May 24 19:44:34 2017 - [info] Wed May 24 19:44:34 2017 - [info] Log messages from 192.168.244.20 ... Wed May 24 19:44:34 2017 - [info] Wed May 24 19:44:33 2017 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Wed May 24 19:44:34 2017 - [info] End of log messages from 192.168.244.20. Wed May 24 19:44:34 2017 - [info] -- 192.168.244.20(192.168.244.20:3306) has the latest relay log events. Wed May 24 19:44:34 2017 - [info] Generating relay diff files from the latest slave succeeded. Wed May 24 19:44:34 2017 - [info] Wed May 24 19:44:34 2017 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. Wed May 24 19:44:34 2017 - [info] Wed May 24 19:44:34 2017 - [info] -- Slave recovery on host 192.168.244.20(192.168.244.20:3306) started, pid: 1600. Check tmp log /ma sterha/app1/192.168.244.20_3306_20170524194420.log if it takes time..saved_master_binlog_from_192.168.244.10_3306_20170524194420.binlog 100% 143 0.1KB/s 00:00 Wed May 24 19:44:35 2017 - [info] Wed May 24 19:44:35 2017 - [info] Log messages from 192.168.244.20 ... Wed May 24 19:44:35 2017 - [info] Wed May 24 19:44:34 2017 - [info] Sending binlog.. Wed May 24 19:44:35 2017 - [info] scp from local:/masterha/app1/saved_master_binlog_from_192.168.244.10_3306_20170524194420.binlog to root@192.168.244.20:/tmp/saved_master_binlog_from_192.168.244.10_3306_20170524194420.binlog succeeded.Wed May 24 19:44:35 2017 - [info] Starting recovery on 192.168.244.20(192.168.244.20:3306).. Wed May 24 19:44:35 2017 - [info] Generating diffs succeeded. Wed May 24 19:44:35 2017 - [info] Waiting until all relay logs are applied. Wed May 24 19:44:35 2017 - [info] done. Wed May 24 19:44:35 2017 - [info] Getting slave status.. Wed May 24 19:44:35 2017 - [info] This slave(192.168.244.20)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000002:120). No need to recover from Exec_Master_Log_Pos.Wed May 24 19:44:35 2017 - [info] Connecting to the target slave host 192.168.244.20, running recover script.. Wed May 24 19:44:35 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='monitor' --slave_host=192.168.244.20 --slave_ip=192.168.244.20 --slave_port=3306 --apply_files=/tmp/saved_master_binlog_from_192.168.244.10_3306_20170524194420.binlog --workdir=/tmp --target_version=5.6.31-log --timestamp=20170524194420 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxxWed May 24 19:44:35 2017 - [info] MySQL client version is 5.6.31. Using --binary-mode. Applying differential binary/relay log files /tmp/saved_master_binlog_from_192.168.244.10_3306_20170524194420.binlog on 192.168.244.20:3306. This may take long time...Applying log files succeeded. Wed May 24 19:44:35 2017 - [info] All relay logs were successfully applied. Wed May 24 19:44:35 2017 - [info] Resetting slave 192.168.244.20(192.168.244.20:3306) and starting replication from the new master 192.168.244.30(192.168.244.30:3306)..Wed May 24 19:44:35 2017 - [info] Executed CHANGE MASTER. Wed May 24 19:44:35 2017 - [info] Slave started. Wed May 24 19:44:35 2017 - [info] End of log messages from 192.168.244.20. Wed May 24 19:44:35 2017 - [info] -- Slave recovery on host 192.168.244.20(192.168.244.20:3306) succeeded. Wed May 24 19:44:35 2017 - [info] All new slave servers recovered successfully. Wed May 24 19:44:35 2017 - [info] Wed May 24 19:44:35 2017 - [info] * Phase 5: New master cleanup phase.. Wed May 24 19:44:35 2017 - [info] Wed May 24 19:44:35 2017 - [info] Resetting slave info on the new master.. Wed May 24 19:44:35 2017 - [info] 192.168.244.30: Resetting slave info succeeded. Wed May 24 19:44:35 2017 - [info] Master failover to 192.168.244.30(192.168.244.30:3306) completed successfully. Wed May 24 19:44:35 2017 - [info] ----- Failover Report -----app1: MySQL Master failover 192.168.244.10(192.168.244.10:3306) to 192.168.244.30(192.168.244.30:3306) succeeded Master 192.168.244.10(192.168.244.10:3306) is down!Check MHA Manager logs at node4 for details. Started manual(interactive) failover. Invalidated master IP address on 192.168.244.10(192.168.244.10:3306) The latest slave 192.168.244.20(192.168.244.20:3306) has all relay logs for recovery. Selected 192.168.244.30(192.168.244.30:3306) as a new master.192.168.244.30(192.168.244.30:3306): OK: Applying all logs succeeded.192.168.244.30(192.168.244.30:3306): OK: Activated master IP address.192.168.244.20(192.168.244.20:3306): This host has the latest relay log events. Generating relay diff files from the latest slave succeeded.192.168.244.20(192.168.244.20:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.244.30(192.168.244.30:3306)192.168.244.30(192.168.244.30:3306): Resetting slave info succeeded. Master failover to 192.168.244.30(192.168.244.30:3306) completed successfully. Wed May 24 19:44:35 2017 - [info] Sending mail.. Unknown option: conf
在切换的过程中,会有两次确认操作。
通过日志的输出,可以看出,手动Failover的切换逻辑和自动Failover的切换逻辑基本一致。
在上面的命令中,显示指定了--new_master_host参数,如果,没有显示指定该参数的话,则默认新主为192.168.244.20,因为在/etc/masterha/app1.cnf中将主机设置为candidate_master=1。
事实上,如果master alive的话,是不允许切换的。
Wed May 24 19:26:24 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Wed May 24 19:26:24 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Wed May 24 19:26:24 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Wed May 24 19:26:24 2017 - [info] MHA::MasterFailover version 0.56. Wed May 24 19:26:24 2017 - [info] Starting master failover. Wed May 24 19:26:24 2017 - [info] Wed May 24 19:26:24 2017 - [info] * Phase 1: Configuration Check Phase.. Wed May 24 19:26:24 2017 - [info] Wed May 24 19:26:26 2017 - [info] GTID failover mode = 0Wed May 24 19:26:26 2017 - [info] Dead Servers: Wed May 24 19:26:26 2017 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln187] None of server is dead. Stop failover. Wed May 24 19:26:26 2017 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/local/bin/masterha_master_switch line 53.
View Code
在默认情况下,如果MHA检测到两次切换的时间小于8小时,则不允许进行Failover,这个时候,需指定--ignore_last_failover参数
# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.244.10 --dead_master_port=3306--dead_master_ip=<dead_master_ip> is not set. Using 192.168.244.10. Wed May 24 19:26:24 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Wed May 24 19:26:24 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Wed May 24 19:26:24 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Wed May 24 19:26:24 2017 - [info] MHA::MasterFailover version 0.56. Wed May 24 19:26:24 2017 - [info] Starting master failover. Wed May 24 19:26:24 2017 - [info] Wed May 24 19:26:24 2017 - [info] * Phase 1: Configuration Check Phase.. Wed May 24 19:26:24 2017 - [info] Wed May 24 19:26:26 2017 - [info] GTID failover mode = 0Wed May 24 19:26:26 2017 - [info] Dead Servers: Wed May 24 19:26:26 2017 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln187] None of server is dead. Stop failover. Wed May 24 19:26:26 2017 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/local/bin/masterha_master_sw itch line 53.[root@node4 ~]# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.244.10 --dead_mas ter_port=3306--dead_master_ip=<dead_master_ip> is not set. Using 192.168.244.10. Wed May 24 19:29:23 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Wed May 24 19:29:23 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Wed May 24 19:29:23 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Wed May 24 19:29:23 2017 - [info] MHA::MasterFailover version 0.56. Wed May 24 19:29:23 2017 - [info] Starting master failover. Wed May 24 19:29:23 2017 - [info] Wed May 24 19:29:23 2017 - [info] * Phase 1: Configuration Check Phase.. Wed May 24 19:29:23 2017 - [info] Wed May 24 19:29:24 2017 - [info] GTID failover mode = 0Wed May 24 19:29:24 2017 - [info] Dead Servers: Wed May 24 19:29:24 2017 - [info] 192.168.244.10(192.168.244.10:3306) Wed May 24 19:29:24 2017 - [info] Checking master reachability via MySQL(double check)... Wed May 24 19:29:24 2017 - [info] ok. Wed May 24 19:29:24 2017 - [info] Alive Servers: Wed May 24 19:29:24 2017 - [info] 192.168.244.20(192.168.244.20:3306) Wed May 24 19:29:24 2017 - [info] 192.168.244.30(192.168.244.30:3306) Wed May 24 19:29:24 2017 - [info] Alive Slaves: Wed May 24 19:29:24 2017 - [info] 192.168.244.20(192.168.244.20:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabledWed May 24 19:29:24 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Wed May 24 19:29:24 2017 - [info] Primary candidate for the new Master (candidate_master is set) Wed May 24 19:29:24 2017 - [info] 192.168.244.30(192.168.244.30:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabledWed May 24 19:29:24 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Master 192.168.244.10(192.168.244.10:3306) is dead. Proceed? (yes/NO): yes Wed May 24 19:29:46 2017 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln309] Last failover was done at 2017/05/24 16:04:54. Current time is too early to do failover again. If you want to do failover, manually remove /masterha/app1/app1.failover.complete and run this script again.Wed May 24 19:29:46 2017 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/local/bin/masterha_master_sw itch line 53.
View Code
后面附上之前提到的测试结果不理想的两种方案
方案一:
模拟步骤如下:
1. master上创建测试表
mysql sbtest.t1(id ,col1 (),col2 (),col3 (),col4 (),col5 ),col6 (),col7 (),col8 (),col9 (),col10 ()); Query OK, rows affected ( sec)
2. 关闭Candicate master和Slave的主从复制。
mysql> stop slave; Query OK, 0 rows affected (0.03 sec)
3. 在master上生成测试数据
创建存储过程
delimiter //create procedure p1()begin declare v1 int default 0; while v1 <=700000 do insert into sbtest.t1 values(v1,'aaaaaaaaaa','bbbbbbbbbb','cccccccccc','dddddddddd','eeeeeeeeee','ffffffffff','gggggggggg','hhhhhhhhhh','iiiiiiiiii','jjjjjjjjjj'); set v1=v1+1; if v1 %1000 =0 then commit; end if; end while;end//delimiter ;
执行存储过程
mysql> call p1;
共执行了两次,生成了500M左右的binlog
[root@node1 ~]# ll -h /var/lib/mysql/mysql-bin.000046 -rw-rw---- 1 mysql mysql 502M Apr 24 15:18 /var/lib/mysql/mysql-bin.00004
5. 开启Candicate master和Slave的主从复制并kill掉master的mysqld进程,模拟主库发生故障,进行自动failover操作
在这里,我写了个脚本来实现开启slave复制并kill掉master mysqld进程的功能
# vim monitor_slave.py
= MySQLdb.connect(host, , === slave_status_result[10], slave_status_result[11= [, == CheckSlaveStatus(check_host[1 check_slave_status = slave_status == [, 510 == subprocess.Popen(command_line, stdout=subprocess.PIPE, shell= ==
执行该脚本
6. 查看MHA的切换日志
Sun May 21 09:46:56 2017 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away) Sun May 21 09:46:56 2017 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s 192.168.244.20 -s 192.168.244.30 --user=root --master_host=192.168.244.10 --master_ip=192.168.244.10 --master_port=3306 --user=root --master_host=192.168.244.10 --master_ip=192.168.244.10 --master_port=3306 --master_user=monitor --masteConnection timed out during banner exchange Sun May 21 09:47:08 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Sun May 21 09:47:08 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Sun May 21 09:47:08 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf.. to MySQL server at 'reading initial communication packet', system error: 111) Sun May 21 09:46:57 2017 - [warning] Connection failed 2 time(s).. Sun May 21 09:46:58 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111) Sun May 21 09:46:58 2017 - [warning] Connection failed 3 time(s).. Sun May 21 09:46:59 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111) Sun May 21 09:46:59 2017 - [warning] Connection failed 4 time(s).. Monitoring server 192.168.244.20 is reachable, Master is not reachable from 192.168.244.20. OK. Sun May 21 09:47:01 2017 - [warning] HealthCheck: SSH to 192.168.244.10 is NOT reachable. Monitoring server 192.168.244.30 is reachable, Master is not reachable from 192.168.244.30. OK. Sun May 21 09:47:07 2017 - [info] Master is not reachable from all other monitoring servers. Failover should start. Sun May 21 09:47:07 2017 - [warning] Master is not reachable from health checker!Sun May 21 09:47:07 2017 - [warning] Master 192.168.244.10(192.168.244.10:3306) is not reachable!Sun May 21 09:47:07 2017 - [warning] SSH is NOT reachable. Sun May 21 09:47:07 2017 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status.. Sun May 21 09:47:07 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Sun May 21 09:47:07 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Sun May 21 09:47:07 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Sun May 21 09:47:08 2017 - [info] GTID failover mode = 0Sun May 21 09:47:08 2017 - [info] Dead Servers: Sun May 21 09:47:08 2017 - [info] 192.168.244.10(192.168.244.10:3306) Sun May 21 09:47:08 2017 - [info] Alive Servers: Sun May 21 09:47:08 2017 - [info] 192.168.244.20(192.168.244.20:3306) Sun May 21 09:47:08 2017 - [info] 192.168.244.30(192.168.244.30:3306) Sun May 21 09:47:08 2017 - [info] Alive Slaves: Sun May 21 09:47:08 2017 - [info] 192.168.244.20(192.168.244.20:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabled Sun May 21 09:47:08 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Sun May 21 09:47:08 2017 - [info] Primary candidate for the new Master (candidate_master is set) Sun May 21 09:47:08 2017 - [info] 192.168.244.30(192.168.244.30:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabled Sun May 21 09:47:08 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Sun May 21 09:47:08 2017 - [info] Checking slave configurations.. Sun May 21 09:47:08 2017 - [info] Checking replication filtering settings.. Sun May 21 09:47:08 2017 - [info] Replication filtering check ok. Sun May 21 09:47:08 2017 - [info] Master is down!Sun May 21 09:47:08 2017 - [info] Terminating monitoring script. Sun May 21 09:47:08 2017 - [info] Got exit code 20 (Master dead). Sun May 21 09:47:08 2017 - [info] MHA::MasterFailover version 0.56. Sun May 21 09:47:08 2017 - [info] Starting master failover. Sun May 21 09:47:08 2017 - [info] Sun May 21 09:47:08 2017 - [info] * Phase 1: Configuration Check Phase.. Sun May 21 09:47:08 2017 - [info] Sun May 21 09:47:09 2017 - [info] GTID failover mode = 0Sun May 21 09:47:09 2017 - [info] Dead Servers: Sun May 21 09:47:09 2017 - [info] 192.168.244.10(192.168.244.10:3306) Sun May 21 09:47:09 2017 - [info] Checking master reachability via MySQL(double check)... Sun May 21 09:47:09 2017 - [info] ok. Sun May 21 09:47:09 2017 - [info] Alive Servers: Sun May 21 09:47:09 2017 - [info] 192.168.244.20(192.168.244.20:3306) Sun May 21 09:47:09 2017 - [info] 192.168.244.30(192.168.244.30:3306) Sun May 21 09:47:09 2017 - [info] Alive Slaves: Sun May 21 09:47:09 2017 - [info] 192.168.244.20(192.168.244.20:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabled Sun May 21 09:47:09 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Sun May 21 09:47:09 2017 - [info] Primary candidate for the new Master (candidate_master is set) Sun May 21 09:47:09 2017 - [info] 192.168.244.30(192.168.244.30:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabled Sun May 21 09:47:09 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Sun May 21 09:47:09 2017 - [info] Starting Non-GTID based failover. Sun May 21 09:47:09 2017 - [info] Sun May 21 09:47:09 2017 - [info] ** Phase 1: Configuration Check Phase completed. Sun May 21 09:47:09 2017 - [info] Sun May 21 09:47:09 2017 - [info] * Phase 2: Dead Master Shutdown Phase.. Sun May 21 09:47:09 2017 - [info] Sun May 21 09:47:09 2017 - [info] Forcing shutdown so that applications never connect to the current master.. Sun May 21 09:47:09 2017 - [info] Executing master IP deactivation script: Sun May 21 09:47:09 2017 - [info] /usr/local/bin/master_ip_failover --orig_master_host=192.168.244.10 --orig_master_ip=192.168.244.10 --orig_master_port=3306 --command=stop Disabling the VIP an old master: 192.168.244.10 Sun May 21 09:47:09 2017 - [info] done. Sun May 21 09:47:09 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. Sun May 21 09:47:09 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed. Sun May 21 09:47:09 2017 - [info] Sun May 21 09:47:09 2017 - [info] * Phase 3: Master Recovery Phase.. Sun May 21 09:47:09 2017 - [info] Sun May 21 09:47:09 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Sun May 21 09:47:09 2017 - [info] Sun May 21 09:47:09 2017 - [info] The latest binary log file/position on all slaves is mysql-bin.000001:37651149Sun May 21 09:47:09 2017 - [info] Latest slaves (Slaves that received relay log files to the latest): Sun May 21 09:47:09 2017 - [info] 192.168.244.20(192.168.244.20:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabled Sun May 21 09:47:09 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Sun May 21 09:47:09 2017 - [info] Primary candidate for the new Master (candidate_master is set) Sun May 21 09:47:09 2017 - [info] The oldest binary log file/position on all slaves is mysql-bin.000001:35083093Sun May 21 09:47:09 2017 - [info] Oldest slaves: Sun May 21 09:47:09 2017 - [info] 192.168.244.30(192.168.244.30:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabled Sun May 21 09:47:09 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Sun May 21 09:47:09 2017 - [info] Sun May 21 09:47:09 2017 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..Sun May 21 09:47:09 2017 - [info] Sun May 21 09:47:09 2017 - [warning] Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost. Sun May 21 09:47:09 2017 - [info] Sun May 21 09:47:09 2017 - [info] * Phase 3.3: Determining New Master Phase.. Sun May 21 09:47:09 2017 - [info] Sun May 21 09:47:09 2017 - [info] Finding the latest slave that has all relay logs for recovering other slaves.. Sun May 21 09:47:10 2017 - [info] HealthCheck: SSH to 192.168.244.20 is reachable. Sun May 21 09:47:10 2017 - [info] Checking whether 192.168.244.20 has relay logs from the oldest position.. Sun May 21 09:47:10 2017 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=mysql-bin.000001 --latest_rmlp=37651149 --target_mlf=mysql-bin.000001 --target_rmlp=35083093 --server_id=2 --workdir=/tmp --timestamp=20170521094708 --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ : Opening /var/lib/mysql/relay-log.info ... ok. Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000004 Fast relay log position search succeeded. Target relay log file/position found. start_file:mysqld-relay-bin.000004, start_pos:35083256. Target relay log FOUND!Sun May 21 09:47:11 2017 - [info] OK. 192.168.244.20 has all relay logs. Sun May 21 09:47:11 2017 - [info] HealthCheck: SSH to 192.168.244.30 is reachable. Sun May 21 09:47:13 2017 - [info] Searching new master from slaves.. Sun May 21 09:47:13 2017 - [info] Candidate masters from the configuration file: Sun May 21 09:47:13 2017 - [info] 192.168.244.20(192.168.244.20:3306) Version=5.6.31-log (oldest major version between slaves) log-bin:enabled Sun May 21 09:47:13 2017 - [info] Replicating from 192.168.244.10(192.168.244.10:3306) Sun May 21 09:47:13 2017 - [info] Primary candidate for the new Master (candidate_master is set) Sun May 21 09:47:13 2017 - [info] Non-candidate masters: Sun May 21 09:47:13 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Sun May 21 09:47:13 2017 - [info] New master is 192.168.244.20(192.168.244.20:3306) Sun May 21 09:47:13 2017 - [info] Starting master failover.. Sun May 21 09:47:13 2017 - [info] From:192.168.244.10(192.168.244.10:3306) (current master) +--192.168.244.20(192.168.244.20:3306) +--192.168.244.30(192.168.244.30:3306) To:192.168.244.20(192.168.244.20:3306) (new master) +--192.168.244.30(192.168.244.30:3306) Sun May 21 09:47:13 2017 - [info] Sun May 21 09:47:13 2017 - [info] * Phase 3.3: New Master Diff Log Generation Phase.. Sun May 21 09:47:13 2017 - [info] Sun May 21 09:47:13 2017 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Sun May 21 09:47:13 2017 - [info] Sun May 21 09:47:13 2017 - [info] * Phase 3.4: Master Log Apply Phase.. Sun May 21 09:47:13 2017 - [info] Sun May 21 09:47:13 2017 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. Sun May 21 09:47:13 2017 - [info] Starting recovery on 192.168.244.20(192.168.244.20:3306).. Sun May 21 09:47:13 2017 - [info] This server has all relay logs. Waiting all logs to be applied.. Sun May 21 09:47:56 2017 - [info] done. Sun May 21 09:47:56 2017 - [info] All relay logs were successfully applied. Sun May 21 09:47:56 2017 - [info] Getting new master's binlog name and position..Sun May 21 09:47:56 2017 - [info] mysql-bin.000010:166054508Sun May 21 09:47:56 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.244.20', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000010', MASTER_LOG_POS=166054508, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Sun May 21 09:47:56 2017 - [info] Executing master IP activate script: Sun May 21 09:47:56 2017 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.244.10 --orig_master_ip=192.168.244.10 --orig_master_port=3306 --new_master_host=192.168.244.20 --new_master_ip=192.168.244.20 --new_master_port=3306 --new_master_user='monitor' --new_master_password='monitor123' Set read_only=0 on the new master. Enabling the VIP 192.168.244.188 on the new master: 192.168.244.20 Sun May 21 09:47:59 2017 - [info] OK. Sun May 21 09:47:59 2017 - [info] ** Finished master recovery successfully. Sun May 21 09:47:59 2017 - [info] * Phase 3: Master Recovery Phase completed. Sun May 21 09:47:59 2017 - [info] Sun May 21 09:47:59 2017 - [info] * Phase 4: Slaves Recovery Phase.. Sun May 21 09:47:59 2017 - [info] Sun May 21 09:47:59 2017 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. Sun May 21 09:47:59 2017 - [info] Sun May 21 09:47:59 2017 - [info] -- Slave diff file generation on host 192.168.244.30(192.168.244.30:3306) started, pid: 3606. Check tmp log /masterha/app1/192.168.244.30_3306_20170521094708.log if it takes time.. Sun May 21 09:48:04 2017 - [info] Sun May 21 09:48:04 2017 - [info] Log messages from 192.168.244.30 ... Sun May 21 09:48:04 2017 - [info] Sun May 21 09:47:59 2017 - [info] Server 192.168.244.30 received relay logs up to: mysql-bin.000001:35083093Sun May 21 09:47:59 2017 - [info] Need to get diffs from the latest slave(192.168.244.20) up to: mysql-bin.000001:37651149 (using the latest slave's relay logs)Sun May 21 09:47:59 2017 - [info] Connecting to the latest slave host 192.168.244.20, generating diff relay log files.. Sun May 21 09:47:59 2017 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=192.168.244.30 --latest_mlf=mysql-bin.000001 --latest_rmlp=37651149 --target_mlf=mysql-bin.000001 --target_rmlp=35083093 --server_id=2 --diff_file_readtolatest=/tmp/relay_from_read_to_latest_192.168.244.30_3306_20170521094708.binlog --workdir=/tmp --timestamp=20170521094708 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ Sun May 21 09:48:04 2017 - [info] Opening /var/lib/mysql/relay-log.info ... ok. Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000004 Fast relay log position search succeeded. Target relay log file/position found. start_file:mysqld-relay-bin.000004, start_pos:35083256. Concat binary/relay logs from mysqld-relay-bin.000004 pos 35083256 to mysqld-relay-bin.000004 EOF into /tmp/relay_from_read_to_latest_192.168.244.30_3306_20170521094708.binlog .. Binlog Checksum enabled Binlog Checksum enabled Dumping binlog format description event, from position 0 to 283.. ok. Dumping effective binlog data from /var/lib/mysql/mysqld-relay-bin.000004 position 35083256 to tail(37651312).. ok. Binlog Checksum enabled Binlog Checksum enabled Concat succeeded. Generating diff relay log succeeded. Saved at /tmp/relay_from_read_to_latest_192.168.244.30_3306_20170521094708.binlog . scp node2:/tmp/relay_from_read_to_latest_192.168.244.30_3306_20170521094708.binlog to root@192.168.244.30(22) succeeded. Sun May 21 09:48:04 2017 - [info] Generating diff files succeeded. Sun May 21 09:48:04 2017 - [info] End of log messages from 192.168.244.30. Sun May 21 09:48:04 2017 - [info] -- Slave diff log generation on host 192.168.244.30(192.168.244.30:3306) succeeded. Sun May 21 09:48:04 2017 - [info] Generating relay diff files from the latest slave succeeded. Sun May 21 09:48:04 2017 - [info] Sun May 21 09:48:04 2017 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. Sun May 21 09:48:04 2017 - [info] Sun May 21 09:48:04 2017 - [info] -- Slave recovery on host 192.168.244.30(192.168.244.30:3306) started, pid: 3613. Check tmp log /masterha/app1/192.168.244.30_3306_20170521094708.log if it takes time.. Sun May 21 09:48:12 2017 - [info] Sun May 21 09:48:12 2017 - [info] Log messages from 192.168.244.30 ... Sun May 21 09:48:12 2017 - [info] Sun May 21 09:48:04 2017 - [info] Starting recovery on 192.168.244.30(192.168.244.30:3306).. Sun May 21 09:48:04 2017 - [info] Generating diffs succeeded. Sun May 21 09:48:04 2017 - [info] Waiting until all relay logs are applied. Sun May 21 09:48:04 2017 - [info] done. Sun May 21 09:48:04 2017 - [info] Getting slave status.. Sun May 21 09:48:04 2017 - [info] This slave(192.168.244.30)'s Exec_Master_Log_Pos(mysql-bin.000001:35083010) does not equal to Read_Master_Log_Pos(mysql-bin.000001:35083093). It is likely that relay log was cut during transaction. Need to recover from Exec_Master_Log_Pos.Sun May 21 09:48:04 2017 - [info] Saving local relay logs from exec pos to read pos on 192.168.244.30: from mysqld-relay-bin.000004:35083173 to the end of the relay log.. Sun May 21 09:48:04 2017 - [info] Executing command : save_binary_logs --command=save --start_file=mysqld-relay-bin.000004 --start_pos=35083173 --output_file=/tmp/relay_from_exec_to_read_192.168.244.30_3306_20170521094708.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --binlog_dir=/var/lib/mysql/ Sun May 21 09:48:05 2017 - [info] Creating /tmp if not exists.. ok. Concat binary/relay logs from mysqld-relay-bin.000004 pos 35083173 to mysqld-relay-bin.000004 EOF into /tmp/relay_from_exec_to_read_192.168.244.30_3306_20170521094708.binlog .. Binlog Checksum enabled Binlog Checksum enabled Dumping binlog format description event, from position 0 to 283.. ok. Dumping effective binlog data from /var/lib/mysql/mysqld-relay-bin.000004 position 35083173 to tail(35083256).. ok. Binlog Checksum enabled Binlog Checksum enabled Concat succeeded. Sun May 21 09:48:05 2017 - [info] Connecting to the target slave host 192.168.244.30, running recover script.. Sun May 21 09:48:05 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='monitor' --slave_host=192.168.244.30 --slave_ip=192.168.244.30 --slave_port=3306 --apply_files=/tmp/relay_from_exec_to_read_192.168.244.30_3306_20170521094708.binlog,/tmp/relay_from_read_to_latest_192.168.244.30_3306_20170521094708.binlog --workdir=/tmp --target_version=5.6.31-log --timestamp=20170521094708 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx Sun May 21 09:48:12 2017 - [info] Concat all apply files to /tmp/total_binlog_for_192.168.244.30_3306.20170521094708.binlog .. Copying the first binlog file /tmp/relay_from_exec_to_read_192.168.244.30_3306_20170521094708.binlog to /tmp/total_binlog_for_192.168.244.30_3306.20170521094708.binlog.. ok. Dumping binlog head events (rotate events), skipping format description events from /tmp/relay_from_read_to_latest_192.168.244.30_3306_20170521094708.binlog.. Binlog Checksum enabled Binlog Checksum enabled dumped up to pos 283. ok. /tmp/relay_from_read_to_latest_192.168.244.30_3306_20170521094708.binlog has effective binlog events from pos 283. Dumping effective binlog data from /tmp/relay_from_read_to_latest_192.168.244.30_3306_20170521094708.binlog position 283 to tail(2568339).. ok. Concat succeeded. All apply target binary logs are concatinated at /tmp/total_binlog_for_192.168.244.30_3306.20170521094708.binlog . MySQL client version is 5.6.31. Using --binary-mode. Applying differential binary/relay log files /tmp/relay_from_exec_to_read_192.168.244.30_3306_20170521094708.binlog,/tmp/relay_from_read_to_latest_192.168.244.30_3306_20170521094708.binlog on 192.168.244.30:3306. This may take long time... Applying log files succeeded. Sun May 21 09:48:12 2017 - [info] All relay logs were successfully applied. Sun May 21 09:48:12 2017 - [info] Resetting slave 192.168.244.30(192.168.244.30:3306) and starting replication from the new master 192.168.244.20(192.168.244.20:3306).. Sun May 21 09:48:12 2017 - [info] Executed CHANGE MASTER. Sun May 21 09:48:12 2017 - [info] Slave started. Sun May 21 09:48:12 2017 - [info] End of log messages from 192.168.244.30. Sun May 21 09:48:12 2017 - [info] -- Slave recovery on host 192.168.244.30(192.168.244.30:3306) succeeded. Sun May 21 09:48:12 2017 - [info] All new slave servers recovered successfully. Sun May 21 09:48:12 2017 - [info] Sun May 21 09:48:12 2017 - [info] * Phase 5: New master cleanup phase.. Sun May 21 09:48:12 2017 - [info] Sun May 21 09:48:12 2017 - [info] Resetting slave info on the new master.. Sun May 21 09:48:12 2017 - [info] 192.168.244.20: Resetting slave info succeeded. Sun May 21 09:48:12 2017 - [info] Master failover to 192.168.244.20(192.168.244.20:3306) completed successfully. Sun May 21 09:48:12 2017 - [info] Deleted server1 entry from /etc/masterha/app1.cnf . Sun May 21 09:48:12 2017 - [info] ----- Failover Report -----app1: MySQL Master failover 192.168.244.10(192.168.244.10:3306) to 192.168.244.20(192.168.244.20:3306) succeeded Master 192.168.244.10(192.168.244.10:3306) is down!Check MHA Manager logs at node4:/masterha/app1/manager.log for details. Started automated(non-interactive) failover. Invalidated master IP address on 192.168.244.10(192.168.244.10:3306) The latest slave 192.168.244.20(192.168.244.20:3306) has all relay logs for recovery. Selected 192.168.244.20(192.168.244.20:3306) as a new master.192.168.244.20(192.168.244.20:3306): OK: Applying all logs succeeded.192.168.244.20(192.168.244.20:3306): OK: Activated master IP address.192.168.244.30(192.168.244.30:3306): Generating differential relay logs up to 192.168.244.20(192.168.244.20:3306)succeeded. Generating relay diff files from the latest slave succeeded.192.168.244.30(192.168.244.30:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.244.20(192.168.244.20:3306)192.168.244.20(192.168.244.20:3306): Resetting slave info succeeded. Master failover to 192.168.244.20(192.168.244.20:3306) completed successfully. Sun May 21 09:48:12 2017 - [info] Sending mail.. Unknown option: conf
View Code
方案二
对Candicate master上和Slave上的网卡进行带宽限制
Candicate master
[root@node2 ~]# /sbin/tc qdisc add dev eth0 root tbf rate 50kbit latency 50ms burst 15kb
Slave
/sbin/tc qdisc add dev eth0 root tbf rate 80kbit latency 50ms burst 15kb
注意:Candicate master的网卡限制在50kbit,slave的网卡限制在80kbit,
要确保Candicate master的小于slave的,这样才能模拟出Candicate master上relay log的位置小于slave上relay log的位置。
关于如何确认网卡的速率控制到多大合适,
可通过压测,测试master的吞吐量,
以及结合如下方法,测试主机之间的传输速率来确定网卡合适的速率。
[root@node1 ~]# dd if=/dev/zero of=hello.txt bs=100M count=1
[root@node1 ~]# time scp hello.txt 192.168.244.20:/tmp/
hello.txt 100% 100MB 2.6MB/s 00:39
参考
《深入浅出MySQL》
Ce qui précède est le contenu détaillé de. pour plus d'informations, suivez d'autres articles connexes sur le site Web de PHP en chinois!