Home >Database >Mysql Tutorial >复合索引创建案例分享-索引的access和filter(一)
好久没写blog了,确实刚来南京才一个多月,新的工作也需要慢慢适应,学习脚步因此也确实放慢了很多,虽然不见得以后一直做技术,但是如果能做一天就当认真对待,言归正传,有个sql语句因为走全表扫描的执行计划需要优化,具体如下: SELECT OID, SUBSID FROM
好久没写blog了,确实刚来南京才一个多月,新的工作也需要慢慢适应,学习脚步因此也确实放慢了很多,虽然不见得以后一直做技术,但是如果能做一天就当认真对待,言归正传,有个sql语句因为走全表扫描的执行计划需要优化,具体如下:
SELECT OID, SUBSID
FROM SUBS_SERVICE A
WHERE SERVICEID IN (:SERVICEID1, :SERVICEID2, :SERVICEID3)
AND ENDDATE
AND ENDDATE >= SYSDATE - 395
AND REGION = 23
AND STATUS 8
AND STATUS 9
AND NOT EXISTS (SELECT 1
FROM SUBS_SERVICE B
WHERE B.REGION = A.REGION
AND B.SUBSID = A.SUBSID
AND B.SERVICEID IN (:SERVICEID1, :SERVICEID2, :SERVICEID3)
AND NVL(ENDDATE, SYSDATE) > SYSDATE - 365)
AND ROWNUM
Plan hash value: 591110695
--------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 263K(100)| | | |
|* 1 | COUNT STOPKEY | | | | | | | |
|* 2 | FILTER | | | | | | | |
|* 3 | FILTER | | | | | | | |
| 4 | PARTITION RANGE SINGLE | | 90 | 3060 | 263K (2)| 00:52:44 | 4 | 4 |
|* 5 | TABLE ACCESS FULL | SUBS_SERVICE | 90 | 3060 | 263K (2)| 00:52:44 | 4 | 4 |
| 6 | PARTITION RANGE SINGLE | | 1 | 22 | 6 (0)| 00:00:01 | KEY | KEY |
|* 7 | TABLE ACCESS BY LOCAL INDEX ROWID| SUBS_SERVICE | 1 | 22 | 6 (0)| 00:00:01 | KEY | KEY |
|* 8 | INDEX RANGE SCAN | IDX_SUBS_SERVICE_SUBSID | 1 | | 4 (0)| 00:00:01 | KEY | KEY |
--------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM
2 - filter( IS NULL)
3 - filter(SYSDATE@!-395
5 - filter((INTERNAL_FUNCTION("SERVICEID") AND "STATUS"8 AND "STATUS"9 AND "ENDDATE"
"ENDDATE">=SYSDATE@!-395 AND "REGION"=23))
7 - filter(("B"."REGION"=:B1 AND NVL("ENDDATE",SYSDATE@!)>SYSDATE@!-365))
8 - access("B"."SUBSID"=:B1)
filter(("B"."SERVICEID"=:SERVICEID1 OR "B"."SERVICEID"=:SERVICEID2 OR "B"."SERVICEID"=:SERVICEID3))
这里很明显有个bad的执行计划table access full SUBS_SERVICE的全表扫描,通过谓词条件基本可以得知正式因为这个bad的执行计划导致,而优化这个sql其实很简单就是建立合适的索引。
那么接下来如何建立索引了,我们看执行计划ID:5对应的谓词条件是enddate和serviceid的两个列的条件,而serviceid的不同值较少,enddate的不同值相对较多。
Table Number Empty Chain Average Global Sample Date
Name of Rows Blocks Blocks Count Row Len Stats Size MM-DD-YYYY
------------------------------ -------------- --------------- ------------ -------- ------- ------ -------------- ----------
SUBS_SERVICE 322,621,100 56,409,85 0 0 109 YES 16,131,055 09-28-2014
Column Distinct Number Number Sample Date
Name Values Density Buckets Nulls Size MM-DD-YYYY
------------------------------ ------------ ----------- ------- ------------ -------------- ----------
SUBSID 18,668,054 .00000005 1 0 16,131,055 09-28-2014
REGION 4 .25000000 1 0 16,131,055 09-28-2014
SERVICEID 402 .00248756 1 0 16,131,055 09-28-2014
ENDDATE 1,628,520 .00000061 1 258,160,160 3,223,047 09-28-2014
STATUS 7 .14285714 1 0 16,131,055 09-28-2014
...
这种情况下,我们一般可能都是选择的enddate作为前导列,serviceid作为后导列的组合索引,也是为了能够有效的利用enddate作为前导列去驱动别的sql语句来利用这个索引。
挖掘shared pool和sql历史的执行信息:
SQL> select sql_id
2 from gv$sql_plan
3 where options = 'FULL'
4 and object_owner = 'TBCS'
5 and object_name like 'SUBS_SERVICE'
6 and instr(filter_predicates, 'ENDDATE') > 0
7 and instr(filter_predicates, 'SERVICEID')
no rows selected
SQL> select sql_id
2 from dba_hist_sql_plan
3 where object_owner = 'TBCS'
4 and object_name like 'SUBS_SERVICE'
5 and instr(filter_predicates, 'ENDDATE') > 0
6 and instr(filter_predicates, 'SERVICEID')
挖掘shared pool中发觉没有单独出现enddate的sql语句,那么是没有别的sql语句能够利用enddate为前缀的索引来索引扫描的。
虽然这里enddate的前缀索引可能没有办法作用于别的sql语句,但是对于这个sql而言,建立索引是必然的,首先建立enddate的前缀的复合索引ind_enddate_serviceid
SQL> create index tbcs.ind_enddate_serviceid on tbcs.subs_service(enddate,serviceid)
Index created.
执行上述sql语句,查看每个步骤产生的逻辑读和资源消耗
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID f1cmqc5sag5s0, child number 0
-------------------------------------
SELECT /*+gather_plan_statistics*/OID, SUBSID
FROM tbcs.SUBS_SERVICE A
WHERE SERVICEID IN (:SERVICEID1, :SERVICEID2, :SERVICEID3)
AND ENDDATE
AND ENDDATE >= SYSDATE - 395
AND REGION = 23
AND STATUS 8
AND STATUS 9
AND NOT EXISTS (SELECT 1
FROM tbcs.SUBS_SERVICE B
WHERE B.REGION = A.REGION
AND B.SUBSID = A.SUBSID
AND B.SERVICEID IN (:SERVICEID1, :SERVICEID2, :SERVICEID3)
AND NVL(ENDDATE, SYSDATE) > SYSDATE - 365)
AND ROWNUM
Plan hash value: 2714263820
------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 200 |00:00:16.94 | 2619 | 1797 |
|* 1 | COUNT STOPKEY | | 1 | | 200 |00:00:16.94 | 2619 | 1797 |
|* 2 | FILTER | | 1 | | 200 |00:00:16.94 | 2619 | 1797 |
|* 3 | FILTER | | 1 | | 261 |00:00:44.31 | 1237 | 1218 |
|* 4 | TABLE ACCESS BY GLOBAL INDEX ROWID| SUBS_SERVICE | 1 | 90 | 261 |00:00:44.31 | 1237 | 1218 |
|* 5 | INDEX RANGE SCAN | IND_ENDDATE_SERVICEID | 1 | 20 | 2078 |00:00:02.73 | 873 | 859 |
| 6 | PARTITION RANGE SINGLE | | 257 | 1 | 58 |00:00:04.86 | 1382 | 579 |
|* 7 | TABLE ACCESS BY LOCAL INDEX ROWID | SUBS_SERVICE | 257 | 1 | 58 |00:00:04.86 | 1382 | 579 |
|* 8 | INDEX RANGE SCAN | IDX_SUBS_SERVICE_SUBSID | 257 | 1 | 325 |00:00:04.22 | 1057 | 510 |
------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM
2 - filter( IS NULL)
3 - filter(SYSDATE@!-395
4 - filter(("STATUS"8 AND "STATUS"9 AND "REGION"=23))
5 - access("ENDDATE">=SYSDATE@!-395 AND "ENDDATE"
filter(("SERVICEID"=:SERVICEID1 OR "SERVICEID"=:SERVICEID2 OR "SERVICEID"=:SERVICEID3))
7 - filter(("B"."REGION"=:B1 AND NVL("ENDDATE",SYSDATE@!)>SYSDATE@!-365))
8 - access("B"."SUBSID"=:B1)
filter(("B"."SERVICEID"=:SERVICEID1 OR "B"."SERVICEID"=:SERVICEID2 OR "B"."SERVICEID"=:SERVICEID3))
sql语句已经走了ind_enddate_serviceid的索引范围扫描,执行计划id:5索引范围扫描消耗的逻辑读为873,而后回表达到了1237的逻辑读
而我们看ID:5 oracle通过(enddate,serviceid)组合索引IND_ENDDATE_SERVICEID只能用access过滤满足enddate的条件,需要filter再次对serviceid的条件进行过滤
这种情况下index range scan只需要扫描满足enddate的谓词条件,会扫描更多的叶块节点,产生更多的逻辑读。
那么既然ind_enddate_serviceid的索引其实在index range scan部分索引的范围扫描只针对了enddate条件的,那么是否我们可以直接建立enddate的单列索引了。
SQL> create index tbcs.ind_enddate on tbcs.subs_service(enddate);
Index created.
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID c12gvu0qs792j, child number 0
-------------------------------------
SELECT /*+gather_plan_statistics index(A,ind_enddate)*/OID, SUBSID
FROM tbcs.SUBS_SERVICE A
WHERE SERVICEID IN (:SERVICEID1, :SERVICEID2, :SERVICEID3)
AND ENDDATE
AND ENDDATE >= SYSDATE - 395
AND REGION = 23
AND STATUS 8
AND STATUS 9
AND NOT EXISTS (SELECT 1
FROM tbcs.SUBS_SERVICE B
WHERE B.REGION = A.REGION
AND B.SUBSID = A.SUBSID
AND B.SERVICEID IN (:SERVICEID1, :SERVICEID2, :SERVICEID3)
AND NVL(ENDDATE, SYSDATE) > SYSDATE - 365)
AND ROWNUM
Plan hash value: 439697064
------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 200 |00:10:41.48 | 37372 | 32157 |
|* 1 | COUNT STOPKEY | | 1 | | 200 |00:10:41.48 | 37372 | 32157 |
|* 2 | FILTER | | 1 | | 200 |00:10:41.48 | 37372 | 32157 |
|* 3 | FILTER | | 1 | | 261 |00:06:07.29 | 35990 | 32095 |
|* 4 | TABLE ACCESS BY GLOBAL INDEX ROWID| SUBS_SERVICE | 1 | 90 | 261 |00:06:07.29 | 35990 | 32095 |
|* 5 | INDEX RANGE SCAN | IND_ENDDATE | 1 | 2810 | 199K|00:00:03.08 | 660 | 646 |
| 6 | PARTITION RANGE SINGLE | | 257 | 1 | 58 |00:00:00.11 | 1382 | 62 |
|* 7 | TABLE ACCESS BY LOCAL INDEX ROWID | SUBS_SERVICE | 257 | 1 | 58 |00:00:00.11 | 1382 | 62 |
|* 8 | INDEX RANGE SCAN | IDX_SUBS_SERVICE_SUBSID | 257 | 1 | 325 |00:00:00.09 | 1057 | 50 |
------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM
2 - filter( IS NULL)
3 - filter(SYSDATE@!-395
4 - filter((INTERNAL_FUNCTION("SERVICEID") AND "STATUS"8 AND "STATUS"9 AND "REGION"=23))
5 - access("ENDDATE">=SYSDATE@!-395 AND "ENDDATE"
7 - filter(("B"."REGION"=:B1 AND NVL("ENDDATE",SYSDATE@!)>SYSDATE@!-365))
8 - access("B"."SUBSID"=:B1)
filter(("B"."SERVICEID"=:SERVICEID1 OR "B"."SERVICEID"=:SERVICEID2 OR "B"."SERVICEID"=:SERVICEID3))
由于ind_enddate是单列索引,每个叶块存储的键值会多些,那么index range scan部分消耗应该会更小,其实果然也是如我们推断的,但是我们发觉在通过ind_enddate回表时逻辑读增加到了35990
这个是由于虽然扫描的索引叶块更少了,但是扫描完后不能做任何进一步的过滤,导致需要回表的rowid非常多,导致回表成本增加,而在回表后再进行serviceid谓词条件的过滤。
那么除了上述的两种索引的创建方式,是否还有一种更优秀的:
如果我们创建serviceid作为前导列,enddate作为后导列的索引ind_serviceid_enddate
SQL> create index tbcs.ind_serviceid_enddateon tbcs.subs_service(serviceid,enddate);
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 48bfmd32ag3nk, child number 0
-------------------------------------
SELECT /*+gather_plan_statistics*/OID, SUBSID
FROM tbcs.SUBS_SERVICE A
WHERE SERVICEID IN (:SERVICEID1, :SERVICEID2, :SERVICEID3)
AND ENDDATE
AND ENDDATE >= SYSDATE - 395
AND REGION = 23
AND STATUS 8
AND STATUS 9
AND NOT EXISTS (SELECT 1
FROM tbcs.SUBS_SERVICE B
WHERE B.REGION = A.REGION
AND B.SUBSID = A.SUBSID
AND B.SERVICEID IN (:SERVICEID1, :SERVICEID2, :SERVICEID3)
AND NVL(ENDDATE, SYSDATE) > SYSDATE - 365)
AND ROWNUM
Plan hash value: 771671576
----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 200 |00:00:00.01 | 1745 |
|* 1 | COUNT STOPKEY | | 1 | | 200 |00:00:00.01 | 1745 |
|* 2 | FILTER | | 1 | | 200 |00:00:00.01 | 1745 |
|* 3 | FILTER | | 1 | | 256 |00:00:00.01 | 373 |
| 4 | INLIST ITERATOR | | 1 | | 256 |00:00:00.01 | 373 |
|* 5 | TABLE ACCESS BY GLOBAL INDEX ROWID| SUBS_SERVICE | 1 | 90 | 256 |00:00:00.01 | 373 |
|* 6 | INDEX RANGE SCAN | IND_SERVICEID_ENDDATE | 1 | 20 | 3186 |00:00:00.01 | 33 |
| 7 | PARTITION RANGE SINGLE | | 256 | 1 | 56 |00:00:00.01 | 1372 |
|* 8 | TABLE ACCESS BY LOCAL INDEX ROWID | SUBS_SERVICE | 256 | 1 | 56 |00:00:00.01 | 1372 |
|* 9 | INDEX RANGE SCAN | IDX_SUBS_SERVICE_SUBSID | 256 | 1 | 328 |00:00:00.01 | 1044 |
----------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM
2 - filter( IS NULL)
3 - filter(SYSDATE@!-395
5 - filter(("STATUS"8 AND "STATUS"9 AND "REGION"=23))
6 - access((("SERVICEID"=:SERVICEID1 OR "SERVICEID"=:SERVICEID2 OR "SERVICEID"=:SERVICEID3)) AND
"ENDDATE">=SYSDATE@!-395 AND "ENDDATE"
8 - filter(("B"."REGION"=:B1 AND NVL("ENDDATE",SYSDATE@!)>SYSDATE@!-365))
9 - access("B"."SUBSID"=:B1)
filter(("B"."SERVICEID"=:SERVICEID1 OR "B"."SERVICEID"=:SERVICEID2 OR "B"."SERVICEID"=:SERVICEID3))
以serviceid前导列索引前提下,执行计划ID:6 oracle通过(serviceid,enddate)组合索引IND_SERVICEID_ENDDATE可以全部用access的方式完成index range scan,
此时index range scan只会扫描同时满足serviceid和enddate的谓词条件的索引叶块,相对索引ind_enddate_serviceid而言,serviceid作为前导列的索引在index range scan扫描更少索引叶块,相应的逻辑读也会更低。
而如果我们细心观察发现索引ind_enddate_serviceid和ind_serviceid_enddate主要区别是在是否在index range scan阶段能够直接access方式读取数据,而在回表阶段其实消耗的逻辑读大体相同。
ind_enddate_serviceid回表的逻辑读=1237-873=364
ind_serviceid_enddate回表的逻辑读=373-33=340
区别主要在于index range scan部分的区别,两个复合索引扫描的叶块是完全不同的,索引能够全部走access的索引必然成本要低很多,而先走access然后走filter的索引,虽然回表成本不变,但是index range scan部分会扫描很多不满足的条件的leaf block,导致index range scan部分逻辑读增加。
这里我们再看一个sql语句,如果是两个范围的索引如何去创建索引:
SQL> SELECT OID, SUBSID
2 FROM tbcs.SUBS_SERVICE A
3 WHERE SERVICEID between :serviceid1 and :serviceid3
4 AND ENDDATE
5 AND ENDDATE >= SYSDATE - 395
6 AND REGION = 23
7 AND STATUS 8
8 AND STATUS 9
9 AND NOT EXISTS (SELECT 1
10 FROM tbcs.SUBS_SERVICE B
11 WHERE B.REGION = A.REGION
12 AND B.SUBSID = A.SUBSID
13 AND B.SERVICEID IN (:SERVICEID1, :SERVICEID2, :SERVICEID3)
14 AND NVL(ENDDATE, SYSDATE) > SYSDATE - 365)
15 AND ROWNUM
200 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 928699648
---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 30 | 1020 | 4813 (1)| 00:00:58 | | |
|* 1 | COUNT STOPKEY | | | | | | | |
|* 2 | FILTER | | | | | | | |
|* 3 | FILTER | | | | | | | |
|* 4 | TABLE ACCESS BY GLOBAL INDEX ROWID| SUBS_SERVICE | 30 | 1020 | 4807 (1)| 00:00:58 | 4 | 4 |
|* 5 | INDEX RANGE SCAN | IND_SERVICEID_ENDDATE | 12 | | 4776 (1)| 00:00:58 | | |
| 6 | PARTITION RANGE SINGLE | | 1 | 22 | 6 (0)| 00:00:01 | KEY | KEY |
|* 7 | TABLE ACCESS BY LOCAL INDEX ROWID | SUBS_SERVICE | 1 | 22 | 6 (0)| 00:00:01 | KEY | KEY |
|* 8 | INDEX RANGE SCAN | IDX_SUBS_SERVICE_SUBSID | 1 | | 4 (0)| 00:00:01 | KEY | KEY |
---------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM
2 - filter( NOT EXISTS (SELECT 0 FROM "TBCS"."SUBS_SERVICE" "B" WHERE "B"."SUBSID"=:B1 AND "B"."REGION"=:B2 AND
NVL("ENDDATE",SYSDATE@!)>SYSDATE@!-365 AND ("B"."SERVICEID"=:SERVICEID1 OR "B"."SERVICEID"=:SERVICEID2 OR
"B"."SERVICEID"=:SERVICEID3)))
3 - filter(SYSDATE@!-395=:SERVICEID1)
4 - filter("STATUS"8 AND "STATUS"9 AND "REGION"=23)
5 - access("SERVICEID">=:SERVICEID1 AND "ENDDATE">=SYSDATE@!-395 AND "SERVICEID"
"ENDDATE"
filter("ENDDATE"=SYSDATE@!-395)
7 - filter("B"."REGION"=:B1 AND NVL("ENDDATE",SYSDATE@!)>SYSDATE@!-365)
8 - access("B"."SUBSID"=:B1)
filter("B"."SERVICEID"=:SERVICEID1 OR "B"."SERVICEID"=:SERVICEID2 OR "B"."SERVICEID"=:SERVICEID3)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1732 consistent gets
2 physical reads
0 redo size
7107 bytes sent via SQL*Net to client
663 bytes received via SQL*Net from client
15 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
200 rows processed
这里看见如果是两个范围的谓词,在执行计划ID:5中对应的谓词信息中显示同时有access和filter,而比较奇怪的是access阶段其实已经有了enddate的过滤条件,但是filter中又包含了同样的access的过滤条件,那么这个index range scan究竟是怎么完成扫描的。
这里oracle是先找到了”SERVICEID”>=:SERVICEID1 AND “ENDDATE”>=SYSDATE@!-395索引入口,然后通过索引的双向指针左右滑动来查找数据,但是由于range scan的过程中,没办法保证每个leaf block都是满足enddate的条件的,事实也是确实如此,指针滑动过程中肯定有可能出现不满足enddate的条件的数据,比如这里出现了(serviceid2,enddate-10000)的键值,而且serviceid2是大于serviceid1的,这个键值也会出现在(serviceid1,enddate-365)的右边,所以在access完成后还需要filter满足enddate的leaf block。
创建复合索引时:如果单纯为了调整某类sql语句,不考虑别的sql是否能够最大程度的使用该索引,一般将等值条件的列作为索引的前导列,这样cbo能够尽可能的在index range scan部分只扫描满足条件的leaf block。
可以参考下兔子的一篇大作: http://blog.chinaunix.net/uid-7655508-id-3639188.html
本文出自:http://www.dbaxiaoyu.com, 原文地址:http://www.dbaxiaoyu.com/archives/2354, 感谢原作者分享。