Home  >  Article  >  Database  >  How to expand the fields of large tables under MySQL cascade replication

How to expand the fields of large tables under MySQL cascade replication

WBOY
WBOYforward
2023-05-29 23:13:39749browse

Field expansion of large tables under MySQL cascade replication

1. Background

A customer’s business has a table with about 400 million rows. Due to business expansion, the open_id in the table varchar(50) needs to be expanded to varchar(500).
During the change, try to minimize the impact on the main library (it is best not to have any impact -> In the end, we strived for a 4-hour window period).

2. Database table information

Environment: Mysql 8.0.22
1 Master 1 slave replication based on Gtid

1. The first question, this is a big picture Table? Yes, please look at the ibd file of this table. 280G count has not returned for a long time. Use the standby database to check and confirm the number of rows > 400 million.

以下语句也可以查看:
show table status from dbname like 'tablename'\G # Rows 的值不准,有时误差有2倍

SELECT a.table_schema,a.table_name,concat(round(sum(DATA_LENGTH/1024/1024)+sum(INDEX_LENGTH/1024/1024),2) ,'MB')total_size,concat(round(sum(DATA_LENGTH/1024/1024),2),'MB') AS data_size,concat(round(sum(INDEX_LENGTH/1024/1024),2),'MB') AS index_size FROM information_schema.TABLES a WHERE a.table_schema = 'dbname' AND a.table_name = 'tablename'; #看下此表的数据量

Since it is Big table, what method should we use to make changes?

3. Solution selection

M in the following represents the main database, S1 is slave 1, and S2 is slave 2

MethodOnlineDDLGh-ostPt-osc##M-S1-S2Time can be estimatedCascade replication, manual operation1 star #Why didn’t we choose the first 3 options?
Advantages Disadvantages Feasibility
Native, when using intermediate temporary table ALGORITHM=COPY, DML will be blocked. Recommended version>MySQL5.7 5 stars
Use binlog playback thread instead of trigger Third-party tools, which lead to longer execution time depending on different parameters 4 stars
Good version compatibility, use triggers to keep the main and secondary tables consistent Third-party tools, and there are many restrictions on use 3 stars

Based on the actual situation assessment, the demand on the business side this time is

This table has business traffic 24 hours a day, and does not accept business unavailability for more than 4 hours

OnlineDDL method, when ALGORITHM=COPY, DML will be blocked during the period (read-only). Finally, during the rename operation of the main and secondary tables (not readable and writable), until DDL is completed (the time required is uncertain).

It is recommended to connect to the slave library in Gh-ost mode for conversion. This mode has the least impact on the main library, and the flow control can be set through parameters. The fatal weakness of this tool is that the change takes too long. For a table with 400 million pieces of data, it took 70 hours in the test environment. Finally, we also need to issue a switching command and manually delete the intermediate table *_del. The advantage of this solution is that there is a slave database to ensure data security, so if you use a 1 master 2 slave architecture, it is more recommended.

Pt-osc and Gh-ost both belong to third parties. Pt-osc's operation of large tables and OnlineDDL have a common disadvantage, which is that the cost of rollback on failure is very high.

If it is a lower version such as MySQL Finally we chose the DBA’s favorite (xin ku) method, which is performed under M-S1-S2 cascade replication.

4. How to operate

Create a new S1 slave library and build M-S1-S2 cascade replication
  • Use OnlineDDL to perform field expansion on S2 (the advantage is that the master of M-S1 is never affected during the period)
  • After the expansion is completed, wait for delayed synchronization of M-S1-S2 (reduce S2 Data difference with M, and perform data verification)
  • Remove S1 and establish a master-slave relationship between M-S2 (allowing S2 to continue synchronizing M's data)
  • Backup S2 and restore S1, establish M-S2-S1 cascade replication
  • Stop the application and wait for the master-slave data to be consistent (the advantage is the synchronization time of different data amounts Very short)
  • Finally S2 becomes the main library and S1 is the slave library (the application needs to modify the front-end connection information)
  • Apply for regression verification
  • The above content may seem complicated, but it is essentially backup and recovery. Readers can consider this as an alternative. Share the specific steps?
环境装备:开启Gtid,注意M,S1 binlog保存时长,磁盘剩余空间大于待变更表的2倍
show global variables like 'binlog_expire_logs_seconds'; # 默认604800
set global binlog_expire_logs_seconds=1209600; # 主库和级联主库都需要设置
1.搭建 1主2从的级联复制,M -> S1 -> S2 ,安装MySQL注意本次环境lower_case_table_names = 0 
2.在S2 上做字段扩容。 预估 10个小时
`参数设置:`
set global slave_type_conversions='ALL_NON_LOSSY'; # 防止复制报错SQL_Errno: 13146,属于字段类型长度不一致无法回放
set global interactive_timeout=144000;set global wait_timeout =144000;
`磁盘IO参数设置:`
set global innodb_buffer_pool_size=32*1024*1024*1024;# 增加buffer_pool 防止Error1206The total number of locks exceeds the lock table size 资源不足
set global sync_binlog=20000;set global innodb_flush_log_at_trx_commit=2;
set global innodb_io_capacity=600000;set global innodb_io_capacity_max=1200000; # innodb_io_capacity需要设置两次
show variables like '%innodb_io%'; # 验证以上设置
screen 下执行:
time mysql -S /data/mysql/3306/data/mysqld.sock -p'' dbname -NBe "ALTER TABLE tablename MODIFY COLUMN open_id VARCHAR(500) NULL DEFAULT NULL COMMENT 'Id' COLLATE 'utf8mb4_bin';"
查看DDL进度:
SELECT EVENT_NAME, WORK_COMPLETED, WORK_ESTIMATED  FROM performance_schema.events_stages_current;
3.扩容完成后,等待延迟同步M-S1-S2 
数据同步至主从一致,对比主从Gtid
4.移除S1,建立M-S2的主从关系
S1 (可选)
stop slave;
reset slave all;
systemctl stop mysql_3306
S2
stop slave;
reset slave all;
# MASTER_HOST='M主机IP'  
CHANGE MASTER TO
  MASTER_HOST='',
  MASTER_USER='',
  MASTER_PASSWORD=',
  MASTER_PORT=3306,
  MASTER_AUTO_POSITION=1,
  MASTER_CONNECT_RETRY=10;
start slave; (flush privileges;# 验证数据可正常同步)
5.备份S2恢复S1,建立M-S2-S1级联复制
物理备份S2,重做S2->S1 级联主从
rm -rf binlog/*
rm -rf redolog/*
xtrabackup --defaults-file=/data/mysql/3306/my.cnf.3306 --move-back --target-dir=/data/actionsky/xtrabackup_recovery/data
chown -R mysql. data/
chown -R mysql. binlog/*
chown -R mysql. redolog/*
systemctl start mysql_3306
set global gtid_purged='';
reset slave all;
# MASTER_HOST='S2主机IP'  ,已扩容变更完的主机
CHANGE MASTER TO
  MASTER_HOST='',
  MASTER_USER='',
  MASTER_PASSWORD='',
  MASTER_PORT=3306,
  MASTER_AUTO_POSITION=1,
  MASTER_CONNECT_RETRY=10;
`MySQL8.0版本需要在上面语句中添加 GET_MASTER_PUBLIC_KEY=1; #防止 Last_IO_Errno: 2061 message: Authentication plugin 'caching_sha2_password' reported error: Authentication requires secure connection.`
start slave;
6.应用停服,等待主从数据一致
主库停服+可设置read_only+flush privileges,对比主从Gtid
7.最终S2成为主库,S1为从库
应用更改配置连接新主库。
S2上:
stop slave;reset slave all;
set global read_only=0;set global super_read_only=0;
`show master status\G 观察是否有新事务写入`

收尾:还原第2步的参数设置。
set global interactive_timeout=28800;set global wait_timeout =28800;
set global innodb_buffer_pool_size=8*1024*1024*1024;
set global slave_type_conversions='';
set global sync_binlog=1;set global innodb_flush_log_at_trx_commit=1;
set global innodb_io_capacity=2000;set global innodb_io_capacity_max=4000;

Supplementary scenario: Test based on disk IO capability

Modify directly on the main database without traffic Case:
Scenario 1, the disk is a physical machine of NVME, and it takes about 5 hours to process 400 million data (disk performance 1G/s).
Scenario 2, the virtual machine whose disk is a mechanical disk, this amount of data takes about 40 hours (disk performance 100M/s).


The above is the detailed content of How to expand the fields of large tables under MySQL cascade replication. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:yisu.com. If there is any infringement, please contact admin@php.cn delete