MySQL在关联复杂情况下所能做出的一些优化

Home

Database

Mysql Tutorial

MySQL在关联复杂情况下所能做出的一些优化_MySQL

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jun 01, 2016 pm 01:00 PM

mysql

昨天处理了一则复杂关联SQL的优化，这类SQL的优化往往考虑以下四点：

第一.查询所返回的结果集，通常查询返回的结果集很少，是有信心进行优化的；

第二.驱动表的选择至关重要，通过查看执行计划，可以看到优化器选择的驱动表,从执行计划中的rows可以大致反映出问题的所在；

第三.理清各表之间的关联关系，注意关联字段上是否有合适的索引；

第四.使用straight_join关键词来强制表之间的关联顺序，可以方便我们验证某些猜想；

SQL:
执行时间：

mysql> select c.yh_id,
-> c.yh_dm,
-> c.yh_mc,
-> c.mm,
-> c.yh_lx,
-> a.jg_id,
-> a.jg_dm,
-> a.jg_mc,
-> a.jgxz_dm,
-> d.js_dm yh_js
-> from a, b, c
-> left join d on d.yh_id = c.yh_id
-> where a.jg_id = b.jg_id
-> and b.yh_id = c.yh_id
-> and a.yx_bj = ‘Y'
-> and c.sc_bj = ‘N'
-> and c.yx_bj = ‘Y'
-> and c.sc_bj = ‘N'
-> and c.yh_dm = '006939748XX' ;

1 row in set (0.75 sec)

这条SQL查询实际只返回了一行数据，但却执行耗费了750ms，查看执行计划：

mysql> explain
-> select c.yh_id,
-> c.yh_dm,
-> c.yh_mc,
-> c.mm,
-> c.yh_lx,
-> a.jg_id,
-> a.jg_dm,
-> a.jg_mc,
-> a.jgxz_dm,
-> d.js_dm yh_js
-> from a, b, c
-> left join d on d.yh_id = c.yh_id
-> where a.jg_id = b.jg_id
-> and b.yh_id = c.yh_id
-> and a.yx_bj = ‘Y'
-> and c.sc_bj = ‘N'
-> and c.yx_bj = ‘Y'
-> and c.sc_bj = ‘N'
-> and c.yh_dm = '006939748XX' ;

+—-+————-+——-+——–+——————+———+———+————–+——-+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+——-+——–+——————+———+———+————–+——-+————-+
| 1 | SIMPLE | a | ALL | PRIMARY,INDEX_JG | NULL | NULL | NULL | 52616 | Using where |
| 1 | SIMPLE | b | ref | PRIMARY | PRIMARY | 98 | test.a.JG_ID | 1 | Using index |
| 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 98 | test.b.YH_ID | 1 | Using where |
| 1 | SIMPLE | d | index | NULL | PRIMARY | 196 | NULL | 54584 | Using index |
+—-+————-+——-+——–+——————+———+———+————–+——-+————-+

可以看到执行计划中有两处比较显眼的性能瓶颈：

| 1 | SIMPLE | a | ALL | PRIMARY,INDEX_JG | NULL | NULL | NULL | 52616 | Using where |

| 1 | SIMPLE | d | index | NULL | PRIMARY | 196 | NULL | 54584 | Using index |

由于d是left join的表，所以驱动表不会选择d表，我们在来看看a,b,c三表的大小：

mysql> select count(*) from c;
+———-+
| count(*) |
+———-+
| 53731 |
+———-+

mysql> select count(*) from a;
+———-+
| count(*) |
+———-+
| 53335 |
+———-+

mysql> select count(*) from b;
+———-+
| count(*) |
+———-+
| 105809 |
+———-+

由于b表的数据量大于其他的两表，同时b表上基本没有查询过滤条件，所以驱动表选择B的可能排除；

优化器实际选择了a表作为驱动表，而为什么不是c表作为驱动表？我们来分析一下：

第一阶段：a表作为驱动表
a–>b–>c–>d:
(1):a.jg_id=b.jg_id—>(b索引:PRIMARY KEY (`JG_ID`,`YH_ID`) )

(2):b.yh_id=c.yh_id—>(c索引:PRIMARY KEY (`YH_ID`))

(3):c.yh_id=d.yh_id—>(d索引:PRIMARY KEY (`JS_DM`,`YH_ID`))
由于d表上没有yh_id的索引，索引在d表上添加索引：

alter table d add index ind_yh_id(yh_id);

执行计划：

+—-+————-+——-+——–+——————+———–+———+————–+——-+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+——-+——–+——————+———–+———+————–+——-+————-+
| 1 | SIMPLE | a | ALL | PRIMARY,INDEX_JG | NULL | NULL | NULL | 52616 | Using where |
| 1 | SIMPLE | b | ref | PRIMARY | PRIMARY | 98 | test.a.JG_ID | 1 | Using index |
| 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 98 | test.b.YH_ID | 1 | Using where |
| 1 | SIMPLE | d | ref | ind_yh_id | ind_yh_id | 98 | test.b.YH_ID | 272 | Using index |
+—-+————-+——-+——–+——————+———–+———+————–+——-+————-+

执行时间：

1 row in set (0.77 sec)

在d表上添加索引后，d表的扫描行数下降到272行（最开始为：54584 ）

| 1 | SIMPLE | d | ref | ind_yh_id | ind_yh_id | 98 | test.b.YH_ID | 272 | Using index |

第二阶段：c表作为驱动表

d
^
|
c–>b–>a
由于在c表上有yh_dm过滤性很高的筛选条件，所以我们在yh_dm上创建一个索引：

mysql> select count(*) from c where yh_dm = '006939748XX';
+———-+
| count(*) |
+———-+
| 2 |
+———-+

添加索引：

alter table c add index ind_yh_dm(yh_dm)

查看执行计划：

+—-+————-+——-+——–+——————-+———–+———+————–+——-+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+——-+——–+——————-+———–+———+————–+——-+————-+
| 1 | SIMPLE | a | ALL | PRIMARY,INDEX_JG | NULL | NULL | NULL | 52616 | Using where |
| 1 | SIMPLE | b | ref | PRIMARY | PRIMARY | 98 | test.a.JG_ID | 1 | Using index |
| 1 | SIMPLE | c | eq_ref | PRIMARY,ind_yh_dm | PRIMARY | 98 | test.b.YH_ID | 1 | Using where |
| 1 | SIMPLE | d | ref | ind_yh_id | ind_yh_id | 98 | test.b.YH_ID | 272 | Using index |
+—-+————-+——-+——–+——————-+———–+———+————–+——-+————-+

执行时间：

1 row in set (0.74 sec)

在c表上添加索引后，索引还是没有走上，执行计划还是以a表作为驱动表，所以我们这里来分析一下为什么还是以a表作为驱动表？

1):c.yh_id=b.yh_id—>( PRIMARY KEY (`JG_ID`,`YH_ID`) )

a.如果以c表为驱动表，则c表与b表在关联的时候，由于在b表没有yh_id字段的索引，由于b表的数据量很大，所以优化器认为这里如果以c表作为驱动表，则会与b表产生较大的关联（这里可以使用straight_join强制使用c表作为驱动表）；
b.如果以a表为驱动表，则a表与b表在关联的时候，由于在b表上有jg_id字段的索引，所以优化器认为以a作为驱动表的代价是小于以c作为驱动板的代价；
所以我们如果要以C表为驱动表，只需要在b上添加yh_id的索引：

alter table b add index ind_yh_id(yh_id);

2):b.jg_id=a.jg_id—>( PRIMARY KEY (`JG_ID`) )

3):c.yh_id=d.yh_id—>( KEY `ind_yh_id` (`YH_ID`) )
执行计划：

+—-+————-+——-+——–+——————-+———–+———+————–+——+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+——-+——–+——————-+———–+———+————–+——+————-+
| 1 | SIMPLE | c | ref | PRIMARY,ind_yh_dm | ind_yh_dm | 57 | const | 2 | Using where |
| 1 | SIMPLE | d | ref | ind_yh_id | ind_yh_id | 98 | test.c.YH_ID | 272 | Using index |
| 1 | SIMPLE | b | ref | PRIMARY,ind_yh_id | ind_yh_id | 98 | test.c.YH_ID | 531 | Using index |
| 1 | SIMPLE | a | eq_ref | PRIMARY,INDEX_JG | PRIMARY | 98 | test.b.JG_ID | 1 | Using where |
+—-+————-+——-+——–+——————-+———–+———+————–+——+————-+

执行时间：

1 row in set (0.00 sec)

可以看到执行计划中的rows已经大大降低，执行时间也由原来的750ms降低到0 ms级别；

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Explain the ACID properties (Atomicity, Consistency, Isolation, Durability).Apr 16, 2025 am 12:20 AM

ACID attributes include atomicity, consistency, isolation and durability, and are the cornerstone of database design. 1. Atomicity ensures that the transaction is either completely successful or completely failed. 2. Consistency ensures that the database remains consistent before and after a transaction. 3. Isolation ensures that transactions do not interfere with each other. 4. Persistence ensures that data is permanently saved after transaction submission.

MySQL: Database Management System vs. Programming LanguageApr 16, 2025 am 12:19 AM

MySQL is not only a database management system (DBMS) but also closely related to programming languages. 1) As a DBMS, MySQL is used to store, organize and retrieve data, and optimizing indexes can improve query performance. 2) Combining SQL with programming languages, embedded in Python, using ORM tools such as SQLAlchemy can simplify operations. 3) Performance optimization includes indexing, querying, caching, library and table division and transaction management.

MySQL: Managing Data with SQL CommandsApr 16, 2025 am 12:19 AM

MySQL uses SQL commands to manage data. 1. Basic commands include SELECT, INSERT, UPDATE and DELETE. 2. Advanced usage involves JOIN, subquery and aggregate functions. 3. Common errors include syntax, logic and performance issues. 4. Optimization tips include using indexes, avoiding SELECT* and using LIMIT.

MySQL's Purpose: Storing and Managing Data EffectivelyApr 16, 2025 am 12:16 AM

MySQL is an efficient relational database management system suitable for storing and managing data. Its advantages include high-performance queries, flexible transaction processing and rich data types. In practical applications, MySQL is often used in e-commerce platforms, social networks and content management systems, but attention should be paid to performance optimization, data security and scalability.

SQL and MySQL: Understanding the RelationshipApr 16, 2025 am 12:14 AM

The relationship between SQL and MySQL is the relationship between standard languages and specific implementations. 1.SQL is a standard language used to manage and operate relational databases, allowing data addition, deletion, modification and query. 2.MySQL is a specific database management system that uses SQL as its operating language and provides efficient data storage and management.

Explain the role of InnoDB redo logs and undo logs.Apr 15, 2025 am 12:16 AM

InnoDB uses redologs and undologs to ensure data consistency and reliability. 1.redologs record data page modification to ensure crash recovery and transaction persistence. 2.undologs records the original data value and supports transaction rollback and MVCC.

What are the key metrics to look for in an EXPLAIN output (type, key, rows, Extra)?Apr 15, 2025 am 12:15 AM

Key metrics for EXPLAIN commands include type, key, rows, and Extra. 1) The type reflects the access type of the query. The higher the value, the higher the efficiency, such as const is better than ALL. 2) The key displays the index used, and NULL indicates no index. 3) rows estimates the number of scanned rows, affecting query performance. 4) Extra provides additional information, such as Usingfilesort prompts that it needs to be optimized.

What is the Using temporary status in EXPLAIN and how to avoid it?Apr 15, 2025 am 12:14 AM

Usingtemporary indicates that the need to create temporary tables in MySQL queries, which are commonly found in ORDERBY using DISTINCT, GROUPBY, or non-indexed columns. You can avoid the occurrence of indexes and rewrite queries and improve query performance. Specifically, when Usingtemporary appears in EXPLAIN output, it means that MySQL needs to create temporary tables to handle queries. This usually occurs when: 1) deduplication or grouping when using DISTINCT or GROUPBY; 2) sort when ORDERBY contains non-index columns; 3) use complex subquery or join operations. Optimization methods include: 1) ORDERBY and GROUPB

See all articles