Home  >  Article  >  Database  >  Let mysqldump become the magic of concurrent export and import

Let mysqldump become the magic of concurrent export and import

高洛峰
高洛峰Original
2016-11-21 15:18:581541browse

First of all, the export speed of mysqldump is not slow. After testing, it can reach a speed of 50M/s. It takes 3 minutes for 10G data. It can be seen that the bottleneck lies in the network and disk IO. No matter how much the export tool is, it cannot be much faster. But the import took 60 minutes, and only about 20% of the disk and network were used. The bottleneck was the writing speed of the target library (general sequential writing cannot reach the IOPS limit), so mypumpkin was born - taking into account the import speed of myloader. and mysqldump export flexibility.

95712e0b0bc62c64b1fddfac486f9cb8

Use python to construct a queue, put all the tables that need to be exported into the queue at once, start N python threads at the same time, each take out the table name from this Queue, and subprocess Call the mysqldump command of the operating system to export the data to a file named dbname.tablename.sql. Load in is similar to dump out. According to the specified library name or table name, all sql files are found from the dump_dir directory and pushed into the queue. N threads call mysql at the same time to construct a new command and simulate the ac5915a1075029823508cbc152f1506d or < is not allowed to be used. If the directory specified by dump-dir does not exist, it will try to be created automatically.

--threads=N, N specifies the number of concurrent export or import threads. The default number of threads for dump out is 2, and the default number of threads for mypumpkin load in is the number of CPUs * 2.
Note: The larger the number of threads, the better. The main measurement indicators here are network bandwidth, disk IO, and target library IOPS. It is best to use dstat to observe it.

-B, --tables, --ignore-table, use the same as mysqldump, such as:

In mysqldump, --tables will override the --databases/-B option

In mysqldump, --tables and --ignore-table cannot appear at the same time

In mysqldump, if -B is not specified, --tables or --ignore-table must follow the db name

Other options, mypumpkin will remain intact. Put it in the shell to execute. Therefore, if there are errors in other options, the check is left to the native mysqldump. If a failure occurs during the execution, the thread will exit.

4.2 example

Export:

## 导出源库所有db到visit_dumpdir2目录 (不包括information_schema和performance_schema)
$ ./mypumpkin.py mysqldump -h dbhost_name -utest_user -pyourpassword -P3306 \
 --single-transaction --opt -A --dump-dir visit_dumpdir2

## 导出源库db1,db2,会从原库查询所有表名来过滤
$ ./mypumpkin.py mysqldump -h dbhost_name -utest_user -pyourpassword -P3306 \
 --single-transaction --opt -B db1 db2 --dump-dir visit_dumpdir2

## 只导出db1库的t1,t2表,如果指定表不存在则有提示
$ ./mypumpkin.py mysqldump -h dbhost_name -utest_user -pyourpassword -P3306 \
 --single-transaction --opt -B db1 --tables t1 t2 --dump-dir visit_dumpdir2

## 导出db1,db2库,但忽略 db1.t1, db2.t2, db2.t3表
## mysqldump只支持--ignore-table=db1.t1这种,使用多个重复指令来指定多表。这里做了兼容扩展
$ ./mypumpkin.py mysqldump -h dbhost_name -utest_user -pyourpassword --single-transaction \
 --opt -B db1 db2 --ignore-table=db1.t1 --ignore-table db2.t2 db2.t3 --dump-dir visit_dumpdir2 (如果-A表示全部db)

## 不带 -A/-B
$ ./mypumpkin.py mysqldump -h dbhost_name -utest_user -pyourpassword -P3306 \
 --single-transaction --opt db1 --ignore-table=db1.t1 --dump-dir=visit_dumpdir2

## 其它选项不做处理
$ ./mypumpkin.py mysqldump -h dbhost_name -utest_user -pyourpassword -P3306 \
 --single-transaction --set-gtid-purged=OFF --no-set-names --skip-add-locks -e -q -t -n --skip-triggers \
 --max-allowed-packet=134217728 --net-buffer-length=1638400 --default-character-set=latin1 \
 --insert-ignore --hex-blob --no-autocommit \
 db1 --tables t1 --dump-dir visit_dumpdir2

Import:
-A, -B, --tables, --ignore-table, --threads, --dump-dir usage and function are exactly the same as above, here are some examples :

## 导入dump-dir目录下所有表
$ ./mypumpkin.py mysql -h dbhost_name -utest_user -pyourpassword --port 3307 -A \
 --dump-dir=visit_dumpdir2

## 导入db1库(所有表)
$ ./mypumpkin.py mysql -h dbhost_name -utest_user -pyourpassword --port 3307 -B db1 \
 --dump-dir=visit_dumpdir2

## 只导入db.t1表
$ ./mypumpkin.py mysql -h dbhost_name -utest_user -pyourpassword --port 3307 \
 --default-character-set=utf8mb4 --max-allowed-packet=134217728 --net-buffer-length=1638400 \
 -B db1 --tables t1 --dump-dir=visit_dumpdir2

## 导入db1,db2库,但忽略db1.t1表(会到dump-dir目录检查db1,db2有无对应的表存在,不在目标库检查)
$ ./mypumpkin.py mysql -h dbhost_name -utest_user -pyourpassword --port 3307 \
 -B db1 db2 --ignore-table=db1.t1 --dump-dir=visit_dumpdir2


Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn