1. Introduction
"MySQL master-slave replication" technology is widely used in common high-availability architectures in the Internet industry, such as the common one-master-slave replication architecture, keepalived + MySQL dual-master (master-slave) replication architecture , MHA + one master and two slave replication architecture, etc. all apply MySQL master-slave replication technology. However, since master-slave replication is a logical replication based on binlog, the risk of inconsistency in replicated data is inevitable. This risk will not only cause inconsistency in user data access, but also lead to 1032 and 1062 errors in subsequent replication, which will cause the hidden danger of stagnation in the replication architecture. , In order to discover and solve this problem in time, we need to regularly or irregularly carry out master-slave replication data consistency verification and repair work, so how to achieve this work? How to automate this work? Let's explore these questions.
2. Data consistency check and repair method
In order to achieve master-slave replication data consistency check and repair, we first recommend two popular tools, respectively They are Percona's pt-table-checksum and pt-table-sync. The former is used to verify the consistency of master-slave replication data, and the latter is used to repair data and restore the data to consistency.
GRANTUPDATE,INSERT,DELETE,SELECT, PROCESS, SUPER, REPLICATION SLAVE ON *.* TO 'hangxing'@'MasterIP'identified by 'PASSWORD'; GRANTALL ON test.* TO 'hangxing'@'MasterIP' IDENTIFIED BY 'PASSWORD';(2) Create a verification account in the main library Verification information table
CREATETABLE IF NOT EXISTS checksums ( db char(64)NOT NULL, tblchar(64) NOT NULL, chunk intNOT NULL, chunk_timefloat NULL, chunk_indexvarchar(200) NULL, lower_boundarytext NULL, upper_boundarytext NULL, this_crcchar(40) NOT NULL, this_cntint NOT NULL, master_crcchar(40) NULL, master_cntint NULL, tstimestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY(db, tbl, chunk), INDEXts_db_tbl (ts, db, tbl) )ENGINE=InnoDB;(3) Determine the primary key If there is no primary key for verification and repair, the impact on performance will be very serious. The most important constraint for data verification and repair is the primary key. No primary key or unique index will result in unsuccessful repair. Primary key judgment statement:
SELECTDISTINCT CONCAT(t.table_schema,'.',t.table_name) astbl,t.engine,IF(ISNULL(c.constraint_name),'NOPK','') AS nopk, IF(s.index_type ='FULLTEXT','FULLTEXT','') as ftidx,IF(s.index_type = 'SPATIAL','SPATIAL','') asgisidx FROM information_schema.tables AS t LEFT JOINinformation_schema.key_column_usage AS c ON (t.table_schema =c.constraint_schema AND t.table_name = c.table_name AND c.constraint_name ='PRIMARY')LEFT JOIN information_schema.statistics AS s ON (t.table_schema =s.table_schema AND t.table_name = s.table_name AND s.index_type IN('FULLTEXT','SPATIAL')) WHERE t.table_schema NOT IN('information_schema','performance_schema','mysql') AND t.table_type = 'BASETABLE' AND (t.engine <> 'InnoDB' OR c.constraint_name IS NULL ORs.index_type IN ('FULLTEXT','SPATIAL')) ORDER BY t.table_schema,t.table_name;(4) Master-slave data verification Master-slave data verification is implemented using pt-table-checksum and must be executed on the main database , when performing verification, control whether to verify the entire database and all tables or only the core table through parameters. Example of verification instructions: ./pt-table-checksum--nocheck-binlog-format --nocheck-plan --nocheck-replication-filters--replicate=test.checksums --databases=db1--tables=tb1 -h 192.168.XXX.XX -P 3306-u'hangxing' -p'PASSOWRD' --recursion-method="processlist"Analysis:
--no-check-binlog-format Does not check the copied binlog mode. --nocheck-replication-filters does not check replication filters, it is recommended to enable it. --replicate=test.checksums The check results are written to the checksums table of the test library. --databases=db1 --tables=tb1 Verify the tb1 table in the db1 database. If there are no parameters, verify the entire database and all tables. -h 192.168.XXX.XX -P 3306 Main library IP address and 3306 port. -u'hangxing' -p'PASSOWRD' Verify account password. --recursion-method="processlist" Use the processlist method to discover slave libraries. Output result after execution:
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE 03-23T15:29:17 0 1 30000 1 0 1.270 testhx1.testhx1Analysis:
TS : Time to complete the check.
ERRORS : The number of errors and warnings that occurred during the check.
DIFFS : 0 means consistent, greater than 0 means inconsistent. It mainly depends on whether there is any inconsistent data in this column.
ROWS : The number of rows in the table.
CHUNKS : The number of blocks divided into the table.
SKIPPED : Number of blocks to skip due to errors or warnings or too large.
TIME : Execution time.
TABLE : The name of the table being checked.
上述输出关键看DIFFS列,结果为0说明数据一致,无需进行数据修复,如果不为0则需要继续开展数据一致性修复工作。上述语句执行后也会将详细的内容会写入test库的checksums表中,可以查看这个库表得到详细的数据校验信息,此表中信息内容格式举例如下:
主库的test.checksums中输出this_crc和master_crc,无不一致。
mysql> select * fromtest.checksums; +---------+---------+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+ | db | tbl | chunk | chunk_time |chunk_index |lower_boundary | upper_boundary | this_crc | this_cnt |master_crc| master_cnt |ts| +---------+---------+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+ | testhx1 | testhx1 | 1 | 0.003661 | NULL | NULL | NULL| cac6c46f| 4 | cac6c46f | 4 | 2016-03-23 15:29:16 | +---------+---------+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------
------------+
1 row in set (0.00 sec)
从库的test.checksums中输出this_crc和master_crc,不一致。
mysql>select * from checksums; +---------+---------+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+ |db | tbl | chunk | chunk_time | chunk_index |lower_boundary | upper_boundary | this_crc |this_cnt |master_crc | master_cnt|ts | +---------+---------+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+ |testhx1 | testhx1 | 1 | 0.003661 | NULL | NULL | NULL | 7c2e5f75| 5 | cac6c46f | 4 | 2016-03-23 15:29:16 | +---------+---------+-------+------------+-------------+----------------+----------------+----------+----------+------------+------------+---------------------+ 1row in set (0.00 sec)
(5)主从数据修复
用pt-table-checksum工具确定确实存在数据不一致的情况下开始修复数据,数据修复使用工具pt-table-sync,内带校验功能,但前提是修复的表必须要有主键,这个工具也要在主库上执行。
方法1:语句量大的情况下将修复的语句导入到sql文件中,再直接导入执行
在主库用pt-table-sync打印出修复不一致数据的SQL,后将修复语句在从库执行。
举例:
pt-table-sync --print--sync-to-master h='SlaveIP',P=3306,u=hangxing,p='PASSWORD' --databases=db1--tables=tb1 > /tmp/repair.sql
方法2:语句量不大的情况下,将修复的语句print出来,再execute
举例:
打印数据修复语句
pt-table-sync--print --sync-to-master h='SlaveIP',P=3306,u=hangxing,p=' PASSWORD '--databases=testhx1 --tables=testhx1 DELETE FROM`testhx1`.`testhx1` WHERE `id`='11' LIMIT 1 /*percona-toolkit src_db:testhx1src_tbl:testhx1 src_dsn:P=3306,h=’MasterIP’, p=...,u=checksums dst_db:testhx1dst_tbl:testhx1 dst_dsn:P=3306,h='SlaveIP',p=...,u=checksums lock:1transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:24745 user:hangxinghost:XXXXXXXXXX*/; REPLACEINTO `testhx1`.`testhx1`(`name`, `age`, `id`) VALUES ('bobby', '6', '7')/*percona-toolkit src_db:testhx1 src_tbl:testhx1 src_dsn:P=3306,h=’MasterIP’, p=...,u=hangxingdst_db:testhx1 dst_tbl:testhx1 dst_dsn:P=3306,h=’SlaveIP’,p=...,u=hangxinglock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:24745user:root host: XXXXXXXXXX */;REPLACEINTO `testhx1`.`testhx1`(`name`, `age`, `id`) VALUES ('lily', '5', '9')/*percona-toolkit src_db:testhx1 src_tbl:testhx1 src_dsn:P=3306,h=’MasterIP’,p=...,u=hangxing dst_db:testhx1 dst_tbl:testhx1dst_dsn:P=3306,h=’SlaveIP’,p=...,u=hangxing lock:1 transaction:1 changing_src:1replicate:0 bidirectional:0 pid:24745 user:root host: XXXXXXXXXX */;
执行数据修复语句
pt-table-sync--execute --sync-to-master h='SlaveIP',P=3306,u=hangxing,p='PASSWORD'--databases=testhx1 --tables=testhx1
(6)再次校验
上述修复完成之后,需要再次执行一次数据校验,确保数据成功修复,校验方法同(4)主从数据校验。
2.4 值得注意的点
(1)校验修复工作每月定期开展;
(2)主从复制架构在割接操作前后均需执行数据校验和修复工作;
(3)主从复制出现故障后要开展数据校验和修复工作;
(4)校验修复需在业务低谷期进行,CPU利用率超过60%时不建议做数据校验和修复;
(5)校验和修复必须在主库进行;
(6)数据库的表要有主键,否则校验效率极差,并且修复不成功。
3.数据一致性校验和修复的自动化实现
理解上述方法后,我们可以顺利完成主从复制数据一致性的校验和修复工作,但是这项工作在MySQL主从复制架构维护中开展频率较高,包括定期和各种不定期的情况,每次都手工开展耗时耗力,并且容易出现人为错误及隐患,因此,我们考虑将这项工作通过脚本实现自动化。
3.1前提准备
创建校验账号,创建校验结果输出表,配置两台主机的ssh免密码登录。
3.2自动化实现
(1)部署自动化脚本和定时任务
理解数据一致性校验和修复的全部原理和详细步骤,将其转化为多个自动化脚本,分别部署在主从库上,每月定期执行可通过在主库制定crontab定时任务调用主从库脚本实现,不定期执行可通过手动调用主从库部署的自动化脚本来实现。
(2)自动化脚本实现步骤
a.将DB相关信息赋予对应参数,如账户密码、IP、端口、常用指令等等
dbuser=XXXX dbpasswd="XXXXX" port=3306 mysql_commend="mysql-u${dbuser} -p${dbpasswd} -P${port}" master_ip=XXXXX slave_ip=XXXXX password="XXXXX" date=`date+%Y%m%d` logfile="XXXXX" hostname=`XXXXX`
b.检查ssh免密码登录是否成功;
ssh_status=`XXXXX` if [ $ssh_status != $hostname ]; then echo -e "\nthe ssh should berepair" >$logfile exit else echo -e "\nthe ssh is ok">$logfile fi
c.脚本实现准备工作:包括账号密码的创建、建立校验结果输出表,代码可参见第2小节;
d.将主库的脚本执行校验和主键判断写成联合SQL语句,实现剔除无主键表的所有表的自动数据校验,并将结果存入所建表中;
selectXXXXX NOT IN XXXXX
e.从库部署检查校验结果输出表的脚本,主库执行d后自动登录从库调用这个脚本,实现对从库上输出表中校验字段的对比如master_crc 和 this_crc,找到数据不一致的表,并且通过执行调用修复工具的指令实现不一致数据修复语句的print;
master_cnt<> this_cnt OR master_crc <> this_crc OR isnull(master_crc)<> isnull(this_crc))
f.print结果自动存储从库的某个路径文件下;
intooutfile '/tmp/execute_sql.sh'
g.主库自动登录从库scp获取语句修复文件;
scp/tmp/execute_sql.sh root@$master_ip:/tmp/execute_sql.sh
h.主库上自动执行修复语句;
sh/tmp/execute_sql.sh
i.清理掉各个中间文件,中间表等
上述内容记录了该项工作的自动化实现思路及部分实现要点,自动化便是通过在这个思路的基础上编写主从库部署的脚本来实现,目前已亲测成功,已实现自动化的数据校验和修复,说明上述思路正确。
4.结语
本文分享了MySQL复制数据一致性校验和修复的详细步骤及其自动化实现思路和方法,对MySQL复制架构运维中该项工作的实施及其自动化具有较好的借鉴意义。
The above is the content of MySQL master-slave replication data consistency check and repair method and automatic implementation. For more related content, please pay attention to the PHP Chinese website (www.php.cn)!