Home  >  Article  >  Database  >  count distinct优化

count distinct优化

WBOY
WBOYOriginal
2016-06-07 16:41:472723browse

系统要进行压力测试,开启漫日志查询后。 [root@ora11g mysql]# less ora11g-slow.log /usr/sbin/mysqld, Version: 5.6.12 (MySQL Community Server (GPL)). started with: Tcp port: 3306 Unix socket: /var/lib/mysql/mysql.sock Time Id Command Argument

系统要进行压力测试,开启漫日志查询后。<br> [root@ora11g mysql]# less ora11g-slow.log<br> /usr/sbin/mysqld, Version: 5.6.12 (MySQL Community Server (GPL)). started with:<br> Tcp port: 3306 Unix socket: /var/lib/mysql/mysql.sock<br> Time Id Command Argument<br> # Time: 140508 12:15:52<br> # User@Host: root[root] @ [10.8.8.64] Id: 86<br> # Query_time: 124.894071 Lock_time: 0.000228 Rows_sent: 1 Rows_examined: 510103<br> use decathlon_production;<br> SET timestamp=1399522552;<br> select count(distinct customer0_.CUSTOMER_NO) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' );<br> # Time: 140508 12:17:33<br> # User@Host: root[root] @ [10.8.8.64] Id: 91<br> # Query_time: 144.808880 Lock_time: 0.000330 Rows_sent: 1 Rows_examined: 510103<br> SET timestamp=1399522653;<br> select count(distinct customer0_.CUSTOMER_NO) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' );

检索下列sql语句消耗了较多的性能,这个表是innodb存储引擎。
mysql> explain select count(distinct customer0_.CUSTOMER_NO) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' );
+----+-------------+------------+------+---------------------------+-------------------+---------+-------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------------------+-------------------+---------+-------+--------+--------------------------+
| 1 | SIMPLE | customer0_ | ref | PRIMARY,ind_CUSTOMER_TYPE | ind_CUSTOMER_TYPE | 767 | const | 258611 | Using where; Using index |
+----+-------------+------------+------+---------------------------+-------------------+---------+-------+--------+--------------------------+
1 row in set (0.00 sec)

而我们看下下面的几个同等含义的sql语句:
mysql> explain select count(*) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' );
+----+-------------+------------+------+-------------------+-------------------+---------+-------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+-------------------+-------------------+---------+-------+--------+--------------------------+
| 1 | SIMPLE | customer0_ | ref | ind_CUSTOMER_TYPE | ind_CUSTOMER_TYPE | 767 | const | 258611 | Using where; Using index |
+----+-------------+------------+------+-------------------+-------------------+---------+-------+--------+--------------------------+
1 row in set (0.00 sec)

mysql> explain select count(customer0_.CUSTOMER_NO) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' );
+----+-------------+------------+------+-------------------+-------------------+---------+-------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+-------------------+-------------------+---------+-------+--------+--------------------------+
| 1 | SIMPLE | customer0_ | ref | ind_CUSTOMER_TYPE | ind_CUSTOMER_TYPE | 767 | const | 258611 | Using where; Using index |
+----+-------------+------------+------+-------------------+-------------------+---------+-------+--------+--------------------------+
1 row in set (0.00 sec)

分别执行下看看执行时间:
mysql> select count(distinct customer0_.CUSTOMER_NO) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' );
+--------+
| x0_0_ |
+--------+
| 510069 |
+--------+
1 row in set (45.33 sec)

mysql> select count(*) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' );
+--------+
| x0_0_ |
+--------+
| 510069 |
+--------+
1 row in set (0.70 sec)

mysql> select count(customer0_.CUSTOMER_NO) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' );
+--------+
| x0_0_ |
+--------+
| 510069 |
+--------+
1 row in set (0.74 sec)

上面三个sql虽然执行计划一样ref,都是直接走customer_type字段的索引ind_CUSTOMER_TYPE,并且extra Using where; Using index的using index告诉我们这里优化器只读取了索引,并没有通过索引来回表(mysql没有oracle的索引快速扫描),但是执行时间却完全不是一个数量级的,加上distinct的sql语句足足要1分多钟。

开启profile来查看下其资源消耗的具体信息:
mysql> set profiling=1;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> select count(distinct customer0_.CUSTOMER_NO) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' );
+--------+
| x0_0_ |
+--------+
| 510069 |
+--------+
1 row in set (1 min 3.71 sec)

mysql> select count(customer0_.CUSTOMER_NO) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' );
+--------+
| x0_0_ |
+--------+
| 510069 |
+--------+
1 row in set (0.73 sec)

mysql> show profiles;
+----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+
| Query_ID | Duration | Query |
+----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+
| 1 | 43.56143300 | select count(distinct customer0_.CUSTOMER_NO) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' ) |
| 2 | 0.68889800 | select count(customer0_.CUSTOMER_NO) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' ) |
+----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)

这里的区别主要还是sending data这一步骤时间的差别,这个差别主要是cpu_user、cpu_system、context_voluntary和coutext_involuntary,加上distinct后优化器需要对其进行排序去重后才会去计算count,这个排序去重是很消耗cpu资源的,所以这里的sending data的cpu user和cpu system差别比较大。

mysql的explain不考虑各种cache和mysql在执行查询时所作的优化工作,这里mysql并没有显示排序去重的执行计划,而由于customer_no是主键,不需要进行distinct去重,所以这里直接改写应用程序的sql即可,如果不是主键可以拆分成子查询的方式来进行优化

mysql> explain select count(*) as x0_0_ from (select distinct CUSTOMER_NO from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' ))a;
+----+-------------+------------+------+---------------------------+-------------------+---------+-------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------------------+-------------------+---------+-------+--------+--------------------------+
| 1 | PRIMARY | | ALL | NULL | NULL | NULL | NULL | 258611 | NULL |
| 2 | DERIVED | customer0_ | ref | PRIMARY,ind_CUSTOMER_TYPE | ind_CUSTOMER_TYPE | 767 | const | 258611 | Using where; Using index |
+----+-------------+------------+------+---------------------------+-------------------+---------+-------+--------+--------------------------+
2 rows in set (0.00 sec)

mysql> select count(*) as x0_0_ from (select distinct CUSTOMER_NO from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' ))a;
+--------+
| x0_0_ |
+--------+
| 510069 |
+--------+
1 row in set (1.40 sec)

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn