bitsCN.com
千万级别mysql合并表快速去重简析 mysql合并表去重目标:现有表a和b,把两个表中的数据合并去重到c表中。其中a和b表中数据量大概在2千万左右。基本情况操作系统版本:CentOS release 5.6 64位操作系统内存:8G数据库版本:5.1.56-community 64位数据库初始化参数:默认 数据库表和数据量表a: mysql> desc a2kw;+-------+-------------+------+-----+---------+-------+| Field | Type | Null | Key | Default | Extra |+-------+-------------+------+-----+---------+-------+| c1 | varchar(20) | YES | MUL | NULL | || c2 | varchar(30) | YES | | NULL | || c3 | varchar(12) | YES | | NULL | || c4 | varchar(20) | YES | | NULL | |+-------+-------------+------+-----+---------+-------+4 rows in set (0.00 sec)表bmysql> desc b2kw;+-------+-------------+------+-----+---------+-------+| Field | Type | Null | Key | Default | Extra |+-------+-------------+------+-----+---------+-------+| c1 | varchar(20) | YES | | NULL | || c2 | varchar(30) | YES | | NULL | || c3 | varchar(12) | YES | | NULL | || c4 | varchar(20) | YES | | NULL | |+-------+-------------+------+-----+---------+-------+4 rows in set (0.00 sec) a和b表的数据概况如下mysql> select * from a2kw limit 10;+-----------+-----------+------+----------+| c1 | c2 | c3 | c4 |+-----------+-----------+------+----------+| 662164461 | 131545534 | TOM0 | 20120520 || 226662142 | 605685564 | TOM0 | 20120516 || 527008225 | 172557633 | TOM0 | 20120514 || 574408183 | 350897450 | TOM0 | 20120510 || 781619324 | 583989494 | TOM0 | 20120510 || 158872754 | 775676430 | TOM0 | 20120512 || 815875622 | 631631832 | TOM0 | 20120514 || 905943640 | 477433083 | TOM0 | 20120514 || 660790641 | 616774715 | TOM0 | 20120512 || 999083595 | 953186525 | TOM0 | 20120513 |+-----------+-----------+------+----------+10 rows in set (0.01 sec) 基本步骤 1、在B表上创建索引mysql> select count(*) from b2kw;+----------+| count(*) |+----------+| 20000002 |+----------+1 row in set (0.00 sec)mysql> create index ind_b2kw_c1 on b2kw(c1);Query OK, 20000002 rows affected (1 min 2.94 sec)Records: 20000002 Duplicates: 0 Warnings: 0数据量为:20000002 ,时间为:1 min 2.94 sec2、把a、b分别插入中间表temp表中 创建中间表mysql> create table temp select * from c2kw where 1=2;Query OK, 0 rows affected (0.00 sec)Records: 0 Duplicates: 0 Warnings: 0插入数据mysql> insert into temp select * from a2kw;Query OK, 20000002 rows affected (13.23 sec)Records: 20000002 Duplicates: 0 Warnings: 0mysql> insert into temp select * from b2kw;Query OK, 20000002 rows affected (13.27 sec)Records: 20000002 Duplicates: 0 Warnings: 0 mysql> select count(*) from temp;+----------+| count(*) |+----------+| 40000004 |+----------+1 row in set (0.00 sec)数据量为:40000004 ,时间为:26.50 sec3、temp建立联合索引,强制索引去掉重复数据mysql> create index ind_temp_c123 on temp(c1,c2,c3);Query OK, 40000004 rows affected (3 min 43.87 sec)Records: 40000004 Duplicates: 0 Warnings: 0查看执行计划mysql> explain select c1,c2,c3,max(c4) from temp FORCE INDEX
(ind_temp_c123) group by c1,c2,c3 ;+----+-------------+-------+-------+---------------+----------
-----+---------+------+----------+-------+| id | select_type | table | type | possible_keys | key
| key_len | ref | rows | Extra |+----+-------------+-------+-------+---------------+-------------
--+---------+------+----------+-------+| 1 | SIMPLE | temp | index | NULL | ind_temp_c123 | 71
| NULL | 40000004 | |+----+-------------+-------+ -------+---------------+--------
-------+---------+------+----------+-------+1 row in set (0.05 sec) mysql> insert into c2kw select c1,c2,c3,max(c4) from temp
FORCE INDEX (ind_temp_c123) group by c1,c2,c3 ;Query OK, 20000004 rows affected (2 min 0.85 sec)Records: 20000004 Duplicates: 0 Warnings: 0实际大约花费实际为:6 min
4、删除中间表mysql> drop table temp;Query OK, 0 rows affected (0.99 sec)实际大约花费实际为:1 sec
5、建立c索引mysql> create index ind_c2kw_c1 on c2kw(c1);Query OK, 20000004 rows affected (49.74 sec)Records: 20000004 Duplicates: 0 Warnings: 0mysql> create index ind_c2kw_c2 on c2kw(c2);Query OK, 20000004 rows affected (1 min 47.20 sec)Records: 20000004 Duplicates: 0 Warnings: 0mysql> create index ind_c2kw_c3 on c2kw(c3);Query OK, 20000004 rows affected (2 min 42.02 sec)Records: 20000004 Duplicates: 0 Warnings: 0实际大约花费实际为:5分钟
6、清空a、b表mysql> truncate table a2kw;Query OK, 0 rows affected (1.15 sec)mysql> truncate table b2kw;Query OK, 0 rows affected (1.34 sec)实际大约花费实际为:3sec 一共花费的时间大概在15分钟左右 作者 RuleV5 bitsCN.com

mysql的msi与zip版本的区别:1、zip包含的安装程序是一种主动安装,而msi包含的是被installer所用的安装文件以提交请求的方式安装;2、zip是一种数据压缩和文档存储的文件格式,msi是微软格式的安装包。

在mysql中,可以利用char()和REPLACE()函数来替换换行符;REPLACE()函数可以用新字符串替换列中的换行符,而换行符可使用“char(13)”来表示,语法为“replace(字段名,char(13),'新字符串') ”。

转换方法:1、利用cast函数,语法“select * from 表名 order by cast(字段名 as SIGNED)”;2、利用“select * from 表名 order by CONVERT(字段名,SIGNED)”语句。

本篇文章给大家带来了关于mysql的相关知识,其中主要介绍了关于MySQL复制技术的相关问题,包括了异步复制、半同步复制等等内容,下面一起来看一下,希望对大家有帮助。

在mysql中,可以利用REGEXP运算符判断数据是否是数字类型,语法为“String REGEXP '[^0-9.]'”;该运算符是正则表达式的缩写,若数据字符中含有数字时,返回的结果是true,反之返回的结果是false。

本篇文章给大家带来了关于mysql的相关知识,其中主要介绍了mysql高级篇的一些问题,包括了索引是什么、索引底层实现等等问题,下面一起来看一下,希望对大家有帮助。

在mysql中,是否需要commit取决于存储引擎:1、若是不支持事务的存储引擎,如myisam,则不需要使用commit;2、若是支持事务的存储引擎,如innodb,则需要知道事务是否自动提交,因此需要使用commit。

“mysql-connector”是mysql官方提供的驱动器,可以用于连接使用mysql;可利用“pip install mysql-connector”命令进行安装,利用“import mysql.connector”测试是否安装成功。


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

mPDF
mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

ZendStudio 13.5.1 Mac
Powerful PHP integrated development environment

Zend Studio 13.0.1
Powerful PHP integrated development environment

SublimeText3 Chinese version
Chinese version, very easy to use

Safe Exam Browser
Safe Exam Browser is a secure browser environment for taking online exams securely. This software turns any computer into a secure workstation. It controls access to any utility and prevents students from using unauthorized resources.
