Home >Database >Mysql Tutorial >关于谓词条件有倾斜性的全表扫描count stopkey的成本估算影响

关于谓词条件有倾斜性的全表扫描count stopkey的成本估算影响

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB
WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOriginal
2016-06-07 16:40:171347browse

监控系统发现一个sql语句的相应时间特别长,消耗的IO资源也非常高,但是cost成本却非常低 SQL SELECT A.ATTRVALUE, B.TYPENAME 2 FROM tbcs.GROUP_SUBS_MEMBER_ATTR A, tbcs.RESOURCE_TYPE B 3 WHERE A.ATTRID = 'res_type' 4 AND A.REGION = 23 5 AND B.ITE

监控系统发现一个sql语句的相应时间特别长,消耗的IO资源也非常高,但是cost成本却非常低

SQL> SELECT A.ATTRVALUE, B.TYPENAME
  2        FROM tbcs.GROUP_SUBS_MEMBER_ATTR A, tbcs.RESOURCE_TYPE B
  3       WHERE A.ATTRID = 'res_type'
  4         AND A.REGION = 23
  5         AND B.ITEMID = A.ATTRVALUE
  6         AND ROWNUM = 1;

Elapsed: 00:03:18.62

Execution Plan
----------------------------------------------------------
Plan hash value: 1650466411

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                   | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                        |     1 |    55 |     7   (0)| 00:00:01 |       |       |
|*  1 |  COUNT STOPKEY               |                        |       |       |            |          |       |       |
|   2 |   TABLE ACCESS BY INDEX ROWID| IM_RES_TYPE            |     1 |    36 |     1   (0)| 00:00:01 |       |       |
|   3 |    NESTED LOOPS              |                        |     4 |   165 |     7   (0)| 00:00:01 |       |       |
|   4 |     PARTITION RANGE SINGLE   |                        |     4 |    76 |     3   (0)| 00:00:01 |     4 |     4 |
|*  5 |      TABLE ACCESS FULL       | GROUP_SUBS_MEMBER_ATTR |     4 |    76 |     3   (0)| 00:00:01 |     4 |     4 |
|*  6 |     INDEX UNIQUE SCAN        | PK_IMRESTYPE           |     1 |       |     0   (0)| 00:00:01 |       |       |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM=1)
   5 - filter("A"."ATTRID"='res_type' AND "A"."ATTRVALUE" IS NOT NULL AND "A"."REGION"=23)
   6 - access("RES_TYPE_ID"="A"."ATTRVALUE")

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
     469240  consistent gets
     469186  physical reads
          0  redo size
        619  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

那么问题是为什么cost的成本这么低,但是逻辑读确如此之高。

首先这里简单的分析下这个执行计划为什么会有如此高的逻辑读,先来看下面的查询:

SQL>      select count(*)
  2         from tbcs.GROUP_SUBS_MEMBER_ATTR A
  3        where A.ATTRID = 'res_type'
  4          AND A.REGION = 23
  5          and "A"."ATTRVALUE" IS NOT NULL
  6  ;

  COUNT(*)
----------
        14

而且这部分数据都在rownum 7千万以上的位置。

SQL> select cn from (select a.*, rownum cn from tbcs.GROUP_SUBS_MEMBER_ATTR A where region=23)b where b. ATTRID = 'res_type'  and ATTRVALUE IS NOT NULL

        CN
----------
  72251394
  72253121
  72261116
  72287094
  72292151
  72296915
  72296922
  72304758
  72333694
  72334266
  72334281
  72334924
  72336096
  72336103

14 rows selected.

虽然sql语句加上了rownum=1的限制,但是由于tbcs.GROUP_SUBS_MEMBER_ATTR A中(“A”.”ATTRID”=’res_type’ AND “A”.”ATTRVALUE” IS NOT NULL AND “A”.”REGION”=23)的满足这个条件的第一行数据在分区段的较后面的block中(全表扫描也是有顺序的)。 此时这个单分区全表扫描 count stopeky的时候也要遍历这个分区的大部分block才能找到满足(“A”.”ATTRID”=’res_type’ AND “A”.”ATTRVALUE” IS NOT NULL AND “A”.”REGION”=23)的第一行数据,然后将这行数据同时与IM_RES_TYPE表做nested loop,当发现这行数据满足”RES_TYPE_ID”=”A”.”ATTRVALUE”即终止查询。 如果不满足再去单分区全扫描找第二行满足谓词条件5的数据行,然后再去和IM_RES_TYPE表比对,一直到找到符合条件的数据行为止。

如果全表扫描前几次IO能够扫描的block刚好能够满足谓词条件,则加上rownum限制条件确实是能够减少大部分的IO消耗。

sys@CRMDB1>select attrid from tbcs.GROUP_SUBS_MEMBER_ATTR A where region=23 and  ATTRVALUE IS NOT NULL rownum=1;

ATTRID
--------------------------------
Flag1

1 row selected.

sys@CRMDB1>select * from tbcs.GROUP_SUBS_MEMBER_ATTR A where region=23 and attrid='Flag1' and ATTRVALUE IS NOT NULL and rownum=1;

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 484799315

------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name                   | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                        |     1 |    41 |     2   (0)| 00:00:01 |       |       |
|*  1 |  COUNT STOPKEY          |                        |       |       |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE|                        |     2 |    82 |     2   (0)| 00:00:01 |     4 |     4 |
|*  3 |    TABLE ACCESS FULL    | GROUP_SUBS_MEMBER_ATTR |     2 |    82 |     2   (0)| 00:00:01 |     4 |     4 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM=1)
   3 - filter("ATTRID"='Flag1' AND "A"."ATTRVALUE" IS NOT NULL AND "REGION"=23)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        968  bytes sent via SQL*Net to client
        521  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

由于满足谓词5的数据都在分区段的后面的block中,所以这个sql会消耗特别多的IO资源。

下面我们来解释下为什么在单分区全表扫描然后count stopkey时cbo的评估的cost成本只有3,先来看下表的统计信息:

关于GROUP_SUBS_MEMBER_ATTR表的统计信息:

Table????????????????????????????????? Number??????????????????????? Empty??? Chain Average Global???????? Sample Date
Name????????????????????????????????? of Rows????????? Blocks?????? Blocks??? Count Row Len Stats??????????? Size MM-DD-YYYY
------------------------------ -------------- --------------- ------------ -------- ------- ------ -------------- ----------
GROUP_SUBS_MEMBER_ATTR??????????? 344,752,080?????? 22,395,19??????????? 0??????? 0????? 41 YES??????? 17,237,604 09-10-2014


Column???????????????????????????? Distinct????????????? Number?????? Number???????? Sample Date
Name???????????????????????????????? Values???? Density Buckets??????? Nulls?????????? Size MM-DD-YYYY
------------------------------ ------------ ----------- ------- ------------ -------------- ----------
REGION??????????????????????????????????? 4?? .25000000?????? 1??????????? 0???? 17,237,604 09-10-2014
ATTRID????????????????????????????????? 166?? .00602410?????? 1??????????? 0???? 17,237,604 09-10-2014
ATTRVALUE?????????????????????????? 189,682?? .00000527?????? 1?? 60,824,860???? 14,196,361 09-10-2014


Index????????????????????????????????????? Leaf?????? Distinct???????? Number????? AV????? Av????? Cluster Date
Name?????????????????????????? BLV???????? Blks?????????? Keys??????? of Rows???? LEA??? Data?????? Factor MM-DD-YYYY
------------------------------ --- ------------ -------------- -------------- ------- ------- ------------ ----------
PK_CM_GROUP_MEMBERATTR?????????? 3??? 3,462,820??? 341,230,660??? 341,230,660?????? 1?????? 1? 108,506,400 09-10-2014

index????????????????????????? Column????????????????????????? Col Column
Name?????????????????????????? Name??????????????????????????? Pos Details
------------------------------ ------------------------------ ---- ------------------
PK_CM_GROUP_MEMBERATTR???????? GRPSUBSMEMOID???????????????????? 1 NUMBER(18,0) NOT NULL
?????????????????????????????? ATTRID??????????????????????????? 2 VARCHAR2(32) NOT NULL
?????????????????????????????? REGION??????????????????????????? 3 NUMBER(5,0) NOT NULL

**********************************************************
Partition Level
**********************************************************

Partition????????????? Number??????????????????????? Empty Average??? Chain Average Global Date
Name????????????????? of Rows????????? Blocks?????? Blocks?? Space??? Count Row Len Stats? MM-DD-YYYY
-------------- -------------- --------------- ------------ ------- -------- ------- ------ ----------
P_R_20??????????? 107,562,800??????? 7,002,86??????????? 0?????? 0??????? 0????? 41 YES??? 10-10-2014
P_R_21???????????? 65,051,340??????? 4,220,31??????????? 0?????? 0??????? 0????? 41 YES??? 07-07-2014
P_R_22???????????? 89,764,040??????? 5,816,18??????????? 0?????? 0??????? 0????? 41 YES??? 08-07-2014
P_R_23???????????? 77,962,200??????? 5,069,40??????????? 0?????? 0??????? 0????? 41 YES??? 07-26-2014
P_R_99????????????????????? 0??????????? 0,00??????????? 0?????? 0??????? 0?????? 0 YES??? 09-15-2013

对于attrid的num_distinct是166,num_nulls是0,attrvalue的num_distinct是189682,num_nulls是60824860,而且他们的number buckets都是1,也就是没有直方图,此时cbo要根据(“A”.”ATTRID”=’res_type’ AND “A”.”ATTRVALUE” IS NOT NULL AND “A”.”REGION”=23)谓词条件取出满足条件的数据,由于根据统计信息得出大部分的数据都是满足于这个条件的,cbo会认为只需要全表扫描的前几次IO都能够取出满足条件的数据。(关于选择率和cost成本的计算太过于复杂,不属于本篇blog讨论的范围)

单独分离出来这部分查询cbo估算的执行成本也很低,但是消耗的逻辑读还是特别高。

sys@CRMDB1>select * from tbcs.GROUP_SUBS_MEMBER_ATTR A where rownum=1 and region=23 and ATTRID = 'res_type'  AND "A"."ATTRVALUE" IS NOT NULL

1 row selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 484799315

------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name                   | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                        |     1 |    41 |     2   (0)| 00:00:01 |       |       |
|*  1 |  COUNT STOPKEY          |                        |       |       |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE|                        |     2 |    82 |     2   (0)| 00:00:01 |     4 |     4 |
|*  3 |    TABLE ACCESS FULL    | GROUP_SUBS_MEMBER_ATTR |     2 |    82 |     2   (0)| 00:00:01 |     4 |     4 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM=1)
   3 - filter("ATTRID"='Flag1' AND "A"."ATTRVALUE" IS NOT NULL AND "REGION"=23)


Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
     469266  consistent gets
     468242  physical reads
          0  redo size
        984  bytes sent via SQL*Net to client
        521  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

这里由于(“ATTRID”=’Flag1′ AND “A”.”ATTRVALUE” IS NOT NULL AND “REGION”=23)的数据具有较大的倾斜性且没有直方图,导致了cbo全表扫描count stopkey时错误的估算了成本。

下面xiaoyu新建一个表来测试有直方图和没直方图时估算类似rownum=1 and column1=A这类查询的成本区别。

SQL> select owner,count(*) from tab01 group by owner;

OWNER                COUNT(*)
------------------------------ ----------
OWBSYS_AUDIT                   24
MDSYS                    4022
QWE                 1
PUBLIC                  67990
OUTLN                      20
CTXSYS                    778
OLAPSYS                  1442
FLOWS_FILES                26
OWBSYS                  4
SYSTEM                   1236
ORACLE_OCM                 16
EXFSYS                    624
APEX_030200              5122
DBSNMP                    114
ORDSYS                   5026
ORDPLUGINS                 20
SYSMAN                   7108
APPQOSSYS                  10
XDB                  2336
ORDDATA                   514
XIAOYU                    154
SYS                 75434
WMSYS                     666

23 rows selected.

SQL> exec dbms_stats.gather_table_stats(ownname=>USER,tabname=>'TAB01',method_opt=>'for all columns size 1');

PL/SQL procedure successfully completed.

SQL> set autotrace traceonly;
SQL> select * from tab01 where rownum=1 and owner='SYS';


Execution Plan
----------------------------------------------------------
Plan hash value: 4034257318

----------------------------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |     1 |    98 |     2   (0)| 00:00:01 |       |       |
|*  1 |  COUNT STOPKEY       |       |       |       |        |      |       |       |
|   2 |   PARTITION RANGE ALL|       |     2 |   196 |     2   (0)| 00:00:01 |     1 |    40 |
|*  3 |    TABLE ACCESS FULL | TAB01 |     2 |   196 |     2   (0)| 00:00:01 |     1 |    40 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM=1)
   3 - filter("OWNER"='SYS')


Statistics
----------------------------------------------------------
      1  recursive calls
      0  db block gets
      6  consistent gets
     45  physical reads
      0  redo size
       1605  bytes sent via SQL*Net to client
    523  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      1  rows processed

SQL> select * from tab01 where rownum=1 and owner='XIAOYU';


Execution Plan
----------------------------------------------------------
Plan hash value: 4034257318

----------------------------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |     1 |    98 |     2   (0)| 00:00:01 |       |       |
|*  1 |  COUNT STOPKEY       |       |       |       |        |      |       |       |
|   2 |   PARTITION RANGE ALL|       |     2 |   196 |     2   (0)| 00:00:01 |     1 |    40 |
|*  3 |    TABLE ACCESS FULL | TAB01 |     2 |   196 |     2   (0)| 00:00:01 |     1 |    40 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM=1)
   3 - filter("OWNER"='XIAOYU')


Statistics
----------------------------------------------------------
      1  recursive calls
      0  db block gets
       2576  consistent gets
       2673  physical reads
      0  redo size
       1615  bytes sent via SQL*Net to client
    523  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      1  rows processed

SQL> select * from tab01 where rownum=1 and owner='IMP';

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 4034257318

----------------------------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |     1 |    98 |     2   (0)| 00:00:01 |       |       |
|*  1 |  COUNT STOPKEY       |       |       |       |        |      |       |       |
|   2 |   PARTITION RANGE ALL|       |     2 |   196 |     2   (0)| 00:00:01 |     1 |    40 |
|*  3 |    TABLE ACCESS FULL | TAB01 |     2 |   196 |     2   (0)| 00:00:01 |     1 |    40 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM=1)
   3 - filter("OWNER"='IMP')


Statistics
----------------------------------------------------------
      1  recursive calls
      0  db block gets
       3359  consistent gets
       3263  physical reads
      0  redo size
       1343  bytes sent via SQL*Net to client
    512  bytes received via SQL*Net from client
      1  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      0  rows processed

这里看出来在num bucket等于1时,全表扫描然后count stop key成本估算都是2,但是由于数据的分布问题实际的逻辑读是有量变的。

SQL> exec dbms_stats.gather_table_stats(ownname=>USER,tabname=>'TAB01',method_opt=>'for all columns size auto');

PL/SQL procedure successfully completed.

SQL> select * from tab01 where rownum=1 and owner='SYS';


Execution Plan
----------------------------------------------------------
Plan hash value: 4034257318

----------------------------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |     1 |    98 |     2   (0)| 00:00:01 |       |       |
|*  1 |  COUNT STOPKEY       |       |       |       |        |      |       |       |
|   2 |   PARTITION RANGE ALL|       |     2 |   196 |     2   (0)| 00:00:01 |     1 |    40 |
|*  3 |    TABLE ACCESS FULL | TAB01 |     2 |   196 |     2   (0)| 00:00:01 |     1 |    40 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM=1)
   3 - filter("OWNER"='SYS')


Statistics
----------------------------------------------------------
      0  recursive calls
      0  db block gets
      6  consistent gets
     45  physical reads
      0  redo size
       1605  bytes sent via SQL*Net to client
    523  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      1  rows processed

SQL> select * from tab01 where rownum=1 and owner='XIAOYU';


Execution Plan
----------------------------------------------------------
Plan hash value: 4034257318

----------------------------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |     1 |    98 |    25   (0)| 00:00:01 |       |       |
|*  1 |  COUNT STOPKEY       |       |       |       |        |      |       |       |
|   2 |   PARTITION RANGE ALL|       |     2 |   196 |    25   (0)| 00:00:01 |     1 |    40 |
|*  3 |    TABLE ACCESS FULL | TAB01 |     2 |   196 |    25   (0)| 00:00:01 |     1 |    40 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM=1)
   3 - filter("OWNER"='XIAOYU')


Statistics
----------------------------------------------------------
      0  recursive calls
      0  db block gets
       2576  consistent gets
       2673  physical reads
      0  redo size
       1615  bytes sent via SQL*Net to client
    523  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      1  rows processed

SQL> select * from tab01 where rownum=1 and owner='IMP';

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 4034257318

----------------------------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |     1 |    98 |   161   (0)| 00:00:02 |       |       |
|*  1 |  COUNT STOPKEY       |       |       |       |        |      |       |       |
|   2 |   PARTITION RANGE ALL|       |     2 |   196 |   161   (0)| 00:00:02 |     1 |    40 |
|*  3 |    TABLE ACCESS FULL | TAB01 |     2 |   196 |   161   (0)| 00:00:02 |     1 |    40 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM=1)
   3 - filter("OWNER"='IMP')


Statistics
----------------------------------------------------------
      0  recursive calls
      0  db block gets
       3359  consistent gets
       3263  physical reads
      0  redo size
       1343  bytes sent via SQL*Net to client
    512  bytes received via SQL*Net from client
      1  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      0  rows processed

如果收集该列的直方图,此时优化器清楚的知道谓词条件的数据分布,对于这类查询优化器就能够较准确的评估cost的成本。

优化这个sql并不难,有一个思路就是利用小表IM_RES_TYPE去做驱动表驱动大表GROUP_SUBS_MEMBER_ATTR,可以利用attrvalue和attrid的等值条件创建索引来避免对大表的单分区全扫描。

直接创建(ATTRVALUE+ATTRID)的索引。

explain plan for
SELECT A.ATTRVALUE, B.TYPENAME
  FROM tbcs.GROUP_SUBS_MEMBER_ATTR A, tbcs.RESOURCE_TYPE B
 WHERE A.ATTRID = 'res_type'
   AND A.REGION = 23
   AND B.ITEMID = A.ATTRVALUE
   AND ROWNUM = 1;

Plan hash value: 1650466411
 
-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                   | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                        |     1 |    55 |     7   (0)| 00:00:01 |       |       |
|*  1 |  COUNT STOPKEY               |                        |       |       |            |          |       |       |
|   2 |   TABLE ACCESS BY INDEX ROWID| IM_RES_TYPE            |     1 |    36 |     1   (0)| 00:00:01 |       |       |
|   3 |    NESTED LOOPS              |                        |     4 |   165 |     7   (0)| 00:00:01 |       |       |
|   4 |     PARTITION RANGE SINGLE   |                        |     4 |    76 |     3   (0)| 00:00:01 |     4 |     4 |
|*  5 |      TABLE ACCESS FULL       | GROUP_SUBS_MEMBER_ATTR |     4 |    76 |     3   (0)| 00:00:01 |     4 |     4 |
|*  6 |     INDEX UNIQUE SCAN        | PK_IMRESTYPE           |     1 |       |     0   (0)| 00:00:01 |       |       |
-----------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(ROWNUM=1)
   5 - filter("A"."ATTRID"='res_type' AND "A"."ATTRVALUE" IS NOT NULL AND "A"."REGION"=23)
   6 - access("RES_TYPE_ID"="A"."ATTRVALUE")

这里默认情况下cbo根据cost成本估算并不会选择先用tbcs.RESOURCE_TYPE B去做驱动表做nested loop关联,这是因为两个sql的执行成本太接近,优化器既有可能选择全表扫描count stopkey,也有可能选择索引扫描count stopkey,但是消耗的IO资源是存在量变的。

SQL> SELECT /*+leading(B A)*/A.ATTRVALUE, B.TYPENAME
  2    FROM tbcs.GROUP_SUBS_MEMBER_ATTR A, tbcs.RESOURCE_TYPE B
  3   WHERE A.ATTRID = 'res_type'
  4     AND A.REGION = 23
  5     AND B.ITEMID = A.ATTRVALUE
  6     AND ROWNUM = 1;

Elapsed: 00:00:00.06

Execution Plan
----------------------------------------------------------
Plan hash value: 458037665

-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name                   | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                        |     1 |    55 |    14   (0)| 00:00:01 |       |       |
|*  1 |  COUNT STOPKEY                     |                        |       |       |            |          |       |       |
|*  2 |   TABLE ACCESS BY LOCAL INDEX ROWID| GROUP_SUBS_MEMBER_ATTR |     1 |    19 |    12   (0)| 00:00:01 |     4 |     4 |
|   3 |    NESTED LOOPS                    |                        |    10 |    55 |    14   (0)| 00:00:01 |       |       |
|   4 |     TABLE ACCESS FULL              | IM_RES_TYPE            |     1 |    36 |     2   (0)| 00:00:01 |       |       |
|   5 |     PARTITION RANGE SINGLE         |                        |    10 |       |     3   (0)| 00:00:01 |     4 |     4 |
|*  6 |      INDEX RANGE SCAN              | IND_ATTRVALUE_ATTRID   |    10 |       |     3   (0)| 00:00:01 |     4 |     4 |
-----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM=1)
   2 - filter("A"."REGION"=23)
   6 - access("RES_TYPE_ID"="A"."ATTRVALUE" AND "A"."ATTRID"='res_type')
       filter("A"."ATTRVALUE" IS NOT NULL)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        370  consistent gets
          0  physical reads
          0  redo size
        619  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

这里需要强制加上leading(B A)让优化器先以tbcs.RESOURCE_TYPE B去做驱动表,然后驱动GROUP_SUBS_MEMBER_ATTR A表。

我们这里还可以有个想法就是,既然原sql消耗的成本主要在全表扫描取(“A”.”ATTRID”=’res_type’ AND “A”.”ATTRVALUE” IS NOT NULL AND “A”.”REGION”=23)的数据,那么我们能否用另外的一个取数据的办法就是通过index range scan的方式了,那我们能否让cbo走(ATTRVALUE+ATTRID)的索引来取数据。

SQL> SELECT /*+index(A ind_attrvalue_attrid)*/A.ATTRVALUE, B.TYPENAME
  2    FROM tbcs.GROUP_SUBS_MEMBER_ATTR A, tbcs.RESOURCE_TYPE B
  3   WHERE A.ATTRID = 'res_type'
  4     AND A.REGION = 23
  5     AND B.ITEMID = A.ATTRVALUE
  6     AND ROWNUM = 1;

Elapsed: 00:02:21.41

Execution Plan
----------------------------------------------------------
Plan hash value: 485372855

--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                   | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                        |     1 |    55 |    11   (0)| 00:00:01 |       |       |
|*  1 |  COUNT STOPKEY                        |                        |       |       |            |          |       |       |
|   2 |   TABLE ACCESS BY INDEX ROWID         | IM_RES_TYPE            |     1 |    36 |     1   (0)| 00:00:01 |       |       |
|   3 |    NESTED LOOPS                       |                        |     4 |   165 |    11   (0)| 00:00:01 |       |       |
|   4 |     PARTITION RANGE SINGLE            |                        |     4 |    76 |     7   (0)| 00:00:01 |     4 |     4 |
|*  5 |      TABLE ACCESS BY LOCAL INDEX ROWID| GROUP_SUBS_MEMBER_ATTR |     4 |    76 |     7   (0)| 00:00:01 |     4 |     4 |
|*  6 |       INDEX FULL SCAN                 | IND_ATTRVALUE_ATTRID   | 98189 |       |     6   (0)| 00:00:01 |     4 |     4 |
|*  7 |     INDEX UNIQUE SCAN                 | PK_IMRESTYPE           |     1 |       |     0   (0)| 00:00:01 |       |       |
--------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM=1)
   5 - filter("A"."REGION"=23)
   6 - access("A"."ATTRID"='res_type')
       filter("A"."ATTRID"='res_type' AND "A"."ATTRVALUE" IS NOT NULL)
   7 - access("RES_TYPE_ID"="A"."ATTRVALUE")


Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
     227934  consistent gets
     227981  physical reads
          0  redo size
        619  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

cob采取比较糟糕的index full scan然会回表的方式,并没有采取我们所认为的index range scan的方式来回表,那我们想想为什么优化器不能够采用index range scan的方式了,这里是因为我们创建索引是(ATTRVALUE+ATTRID)的组合索引,在这个索引取数据时由于”A”.”ATTRVALUE” IS NOT NULL并不是一个等值的条件,而这个attrvalue又是前导列,这个导致优化器如果想走index range scan需要走类似的index range scan然后 INLIST ITERATOR迭代的方式,由于attrvalue有很大一部分的null value,这将会导致这部分执行cbo估算时较高,而不选择这种执行计划,进而选择了更糟糕的index full scan回表的方式,由于attrid=’res_type’ 具有很大的倾斜性,刚好这部分数据又在索引的后面的几个leaf block中。

再来想想(ATTRID+ATTRVALUE)复合索引,通过attrid是索引前导列,(“A”.”ATTRID”=’res_type’ AND “A”.”ATTRVALUE” IS NOT NULL AND “A”.”REGION”=23)这个谓词条件利用attrid是索引前导列的复合索引可以很精准的从root到branch再到leaf block,从而在leaf block时index range scan。

SQL> SELECT A.ATTRVALUE, B.TYPENAME
  2    FROM tbcs.GROUP_SUBS_MEMBER_ATTR A, tbcs.RESOURCE_TYPE B
  3   WHERE A.ATTRID = 'res_type'
  4     AND A.REGION = 23
  5     AND B.ITEMID = A.ATTRVALUE
  6     AND ROWNUM = 1;

Elapsed: 00:00:00.12

Execution Plan
----------------------------------------------------------
Plan hash value: 2801988880

--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                   | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                        |     1 |    55 |     9   (0)| 00:00:01 |       |       |
|*  1 |  COUNT STOPKEY                        |                        |       |       |            |          |       |       |
|   2 |   TABLE ACCESS BY INDEX ROWID         | IM_RES_TYPE            |     1 |    36 |     1   (0)| 00:00:01 |       |       |
|   3 |    NESTED LOOPS                       |                        |     4 |   165 |     9   (0)| 00:00:01 |       |       |
|   4 |     PARTITION RANGE SINGLE            |                        |     4 |    76 |     5   (0)| 00:00:01 |     4 |     4 |
|*  5 |      TABLE ACCESS BY LOCAL INDEX ROWID| GROUP_SUBS_MEMBER_ATTR |     4 |    76 |     5   (0)| 00:00:01 |     4 |     4 |
|*  6 |       INDEX RANGE SCAN                | IND_ATTRID_ATTRVALUE   | 98189 |       |     4   (0)| 00:00:01 |     4 |     4 |
|*  7 |     INDEX UNIQUE SCAN                 | PK_IMRESTYPE           |     1 |       |     0   (0)| 00:00:01 |       |       |
--------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM=1)
   5 - filter("A"."REGION"=23)
   6 - access("A"."ATTRID"='res_type')
       filter("A"."ATTRVALUE" IS NOT NULL)
   7 - access("RES_TYPE_ID"="A"."ATTRVALUE")


Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
         10  consistent gets
          5  physical reads
          0  redo size
        619  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

这里我们需要注意的是在表中添加了rownum的限制又添加了部分谓词的条件时,刚好满足这个谓词条件对应的数据块在表段较后面的数据块中,且对应的列没有直方图又存在数据倾斜,此时cbo估算这个全表扫描 count stopkey成本往往是不准确的,当然上面这个例子确实比较特殊,没有直方图且数据有倾斜性,这里借助这个sql case分析只是为了让我们更好的理解nested loop的原理、以及rownum对于表扫描索引扫描时执行计划和成本估算的影响。

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn