Heim >Datenbank >MySQL-Tutorial >一个标量子查询联想到with as改写方案

一个标量子查询联想到with as改写方案

WBOY
WBOYOriginal
2016-06-07 16:36:531382Durchsuche

群里有朋友提到下列sql语句如何避免反复扫描t_dict表 select o.order_id, (SELECT DICT_NAME FROM T_DICT WHERE DICT_TYPEID = 'ORDERSTAT' AND DICT_ID = o.STAT) AS STAT, (SELECT DICT_NAME FROM T_DICT WHERE DICT_TYPEID = 'PAYSTAT' AND DICT_ID = o.P

群里有朋友提到下列sql语句如何避免反复扫描t_dict表<br> select o.order_id,<br> (SELECT DICT_NAME<br> FROM T_DICT<br> WHERE DICT_TYPEID = 'ORDERSTAT'<br> AND DICT_ID = o.STAT) AS STAT,<br> (SELECT DICT_NAME<br> FROM T_DICT<br> WHERE DICT_TYPEID = 'PAYSTAT'<br> AND DICT_ID = o.PAY_STAT) AS PAY_STAT,<br> (SELECT DICT_NAME<br> FROM T_DICT<br> WHERE DICT_TYPEID = 'ACTSTAT'<br> AND DICT_ID = o.ACT_STAT) AS ACT_STAT<br> FROM T_ORDER o

另一个朋友提到用下列sql语句来避免反复扫描子表t_dict
/* Formatted on 2014/6/12 17:57:18 (QP5 v5.149.1003.31008) */
WITH vts1
AS (SELECT o.order_id, d.DICT_NAME, d.DICT_TYPEID
FROM T_DICT d, T_ORDER o
WHERE ( d.DICT_ID = o.STAT
OR d.DICT_ID = o.PAY_STAT
OR d.DICT_ID = o.ACT_STAT)
AND d.DICT_TYPEID IN ('ORDERSTAT', 'PAYSTAT', 'ACTSTAT'))
SELECT order_id,
(SELECT DICT_NAME
FROM vts1
WHERE DICT_TYPEID = 'ORDERSTAT')
AS stat,
(SELECT DICT_NAME
FROM vts1
WHERE DICT_TYPEID = 'PAYSTAT')
AS stat,
(SELECT DICT_NAME
FROM vts1
WHERE DICT_TYPEID = 'ACTSTAT')
AS stat
FROM vts1;

上面的sql是一个标量子查询,而小鱼手动构造了下列表t01和t02
SQL> create table t01 as select * from dba_objects;

Table created.
SQL> create table t02 as select * from dba_objects;

Table created.
SQL> alter table t01 add id number;

Table altered.
SQL> update t01 set id=object_id;

50328 rows updated.

SQL> commit;

SQL> SELECT t01.object_id,
2 (SELECT object_name
3 FROM t02
4 WHERE t02.object_type = 'TABLE' AND t02.object_id = t01.object_id)
5 col1,
6 (SELECT object_name
7 FROM t02
8 WHERE t02.object_type = 'INDEX'
9 AND t02.object_id = t01.data_object_id)
10 col2,
11 (SELECT object_name
12 FROM t02
13 WHERE t02.object_type = 'VIEW' AND t02.object_id = t01.id)
14 col3
15 FROM t01;

50328 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3013675264

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 46210 | 1759K| 155 (2)| 00:00:02 |
|* 1 | TABLE ACCESS FULL| T02 | 20 | 1800 | 17 (0)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| T02 | 18 | 1620 | 18 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| T02 | 33 | 2970 | 7 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| T01 | 46210 | 1759K| 155 (2)| 00:00:02 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("T02"."OBJECT_TYPE"='TABLE' AND "T02"."OBJECT_ID"=:B1)
2 - filter("T02"."OBJECT_TYPE"='INDEX' AND "T02"."OBJECT_ID"=:B1)
3 - filter("T02"."OBJECT_TYPE"='VIEW' AND "T02"."OBJECT_ID"=:B1)

Note
-----
- dynamic sampling used for this statement

Statistics
----------------------------------------------------------
92 recursive calls
0 db block gets
72905089 consistent gets
0 physical reads
0 redo size
1241710 bytes sent via SQL*Net to client
37397 bytes received via SQL*Net from client
3357 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
50328 rows processed
由于重复值较少,标量子查询的执行成本很高,不要只看上面的cost是155,这个只是一个循环的成本,我们看下逻辑读高达了72905089,一般这种sql我们优先想到的是将标量子查询写成外连接

SQL> SELECT t01.object_id,
2 a.object_name,
3 b.object_name,
4 c.object_name
5 FROM t01,
6 (SELECT object_name, object_id
7 FROM t02
8 WHERE t02.object_type = 'TABLE') a,
9 (SELECT object_name, object_id
10 FROM t02
11 WHERE t02.object_type = 'INDEX') b,
12 (SELECT object_name, object_id
13 FROM t02
14 WHERE t02.object_type = 'VIEW') c
15 WHERE t01.object_id = a.object_id(+)
16 AND t01.data_object_id = b.object_id(+)
17 AND t01.id = c.object_id(+);

50328 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3250688725

--------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

--------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 46210 | 13M| 620 (2)| 00:00:08 |

|* 1 | HASH JOIN RIGHT OUTER | | 46210 | 13M| 620 (2)| 00:00:08 |

|* 2 | TABLE ACCESS FULL | T02 | 3262 | 286K| 154 (1)| 00:00:02 |

|* 3 | HASH JOIN RIGHT OUTER | | 46210 | 9882K| 465 (2)| 00:00:06 |

|* 4 | TABLE ACCESS FULL | T02 | 2032 | 178K| 154 (1)| 00:00:02 |

|* 5 | HASH JOIN RIGHT OUTER| | 46210 | 5821K| 310 (2)| 00:00:04 |

|* 6 | TABLE ACCESS FULL | T02 | 1845 | 162K| 154 (1)| 00:00:02 |

| 7 | TABLE ACCESS FULL | T01 | 46210 | 1759K| 155 (2)| 00:00:02 |

--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("T01"."ID"="OBJECT_ID"(+))
2 - filter("T02"."OBJECT_TYPE"(+)='VIEW')
3 - access("T01"."OBJECT_ID"="OBJECT_ID"(+))
4 - filter("T02"."OBJECT_TYPE"(+)='TABLE')
5 - access("T01"."DATA_OBJECT_ID"="OBJECT_ID"(+))
6 - filter("T02"."OBJECT_TYPE"(+)='INDEX')

Note
-----
- dynamic sampling used for this statement

Statistics
----------------------------------------------------------
17 recursive calls
0 db block gets
6327 consistent gets
0 physical reads
0 redo size
1241731 bytes sent via SQL*Net to client
37397 bytes received via SQL*Net from client
3357 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
50328 rows processed

改成外连接后,逻辑读降到了6327,sql的相应时间大幅度降低,那么是否还可以写成只扫描一次子查询的表t02了,当然也可以尝试改写成with as的写法,尝试了半天,写出来感觉业务逻辑始终存在问题,而上面最先提出的例子给出的with as的写法,小鱼个人觉得也是有问题的

WITH vts1
AS (SELECT o.order_id, d.DICT_NAME, d.DICT_TYPEID
FROM T_DICT d, T_ORDER o
WHERE ( d.DICT_ID = o.STAT
OR d.DICT_ID = o.PAY_STAT
OR d.DICT_ID = o.ACT_STAT)
AND d.DICT_TYPEID IN ('ORDERSTAT', 'PAYSTAT', 'ACTSTAT'))
我们来看看上面这段构造的表vts1,很明显如果返回超过三行数据,那么下面的标量子查询无疑会报错
SELECT order_id,
(SELECT DICT_NAME
FROM vts1
WHERE DICT_TYPEID = 'ORDERSTAT')
AS stat
FROM vts1;

这里只截取一个查询,换句话说如果vts1临时表的dict_typeid=’ORDERSTAT’返回了多行,那么这个标量子查询肯定会报错。

特别明显的就是如果我们构造一个with as的表,其实数据并不需要绝对的满足with as的条件,因为标量子查询是类似于外连接的,子表不存在满足的数据则补全null,当然如果t02这个表比较大,我们也是可以用with as去优化,类似于物化部分结果集为临时表。
SQL> WITH m AS (SELECT object_name, object_id, object_type
2 FROM t02
3 WHERE object_type IN ('TABLE', 'INDEX', 'VIEW'))
4 SELECT t01.object_id,
5 a.object_name,
6 b.object_name,
7 c.object_name
8 FROM t01,
9 (SELECT object_name, object_id
10 FROM m
11 WHERE m.object_type = 'TABLE') a,
12 (SELECT object_name, object_id
13 FROM m
14 WHERE m.object_type = 'INDEX') b,
15 (SELECT object_name, object_id
16 FROM m
17 WHERE m.object_type = 'VIEW') c
18 WHERE t01.object_id = a.object_id(+)
19 AND t01.data_object_id = b.object_id(+)
20 AND t01.id = c.object_id(+);

50293 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3949065611

--------------------------------------------------------------------------------
------------------------
| Id | Operation | Name | Rows | Bytes |
Cost (%CPU)| Time |
--------------------------------------------------------------------------------
------------------------
| 0 | SELECT STATEMENT | | 50293 | 13M|
343 (3)| 00:00:05 |
| 1 | TEMP TABLE TRANSFORMATION | | | |
| |
| 2 | LOAD AS SELECT | | | |
| |
|* 3 | TABLE ACCESS FULL | T02 | 6856 | 261K|
155 (2)| 00:00:02 |
|* 4 | HASH JOIN RIGHT OUTER | | 50293 | 13M|
188 (3)| 00:00:03 |
|* 5 | VIEW | | 6856 | 602K|
10 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL | SYS_TEMP_0FD9D661A_2EC7C2 | 6856 | 261K|
10 (0)| 00:00:01 |
|* 7 | HASH JOIN RIGHT OUTER | | 50293 | 9429K|
177 (3)| 00:00:03 |
|* 8 | VIEW | | 6856 | 602K|
10 (0)| 00:00:01 |
| 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9D661A_2EC7C2 | 6856 | 261K|
10 (0)| 00:00:01 |
|* 10 | HASH JOIN RIGHT OUTER | | 50293 | 5009K|
166 (2)| 00:00:02 |
|* 11 | VIEW | | 6856 | 602K|
10 (0)| 00:00:01 |
| 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D661A_2EC7C2 | 6856 | 261K|
10 (0)| 00:00:01 |
| 13 | TABLE ACCESS FULL | T01 | 50293 | 589K|
155 (2)| 00:00:02 |
--------------------------------------------------------------------------------
------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("OBJECT_TYPE"='INDEX' OR "OBJECT_TYPE"='TABLE' OR "OBJECT_TYPE"='V
IEW')
4 - access("T01"."ID"="OBJECT_ID"(+))
5 - filter("M"."OBJECT_TYPE"(+)='VIEW')
7 - access("T01"."DATA_OBJECT_ID"="OBJECT_ID"(+))
8 - filter("M"."OBJECT_TYPE"(+)='INDEX')
10 - access("T01"."OBJECT_ID"="OBJECT_ID"(+))
11 - filter("M"."OBJECT_TYPE"(+)='TABLE')

Statistics
----------------------------------------------------------
2 recursive calls
40 db block gets
4810 consistent gets
33 physical reads
648 redo size
1240850 bytes sent via SQL*Net to client
37364 bytes received via SQL*Net from client
3354 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
50293 rows processed
群里同学给出的那个with as的sql语句,小鱼个人并不赞同。

既然提到了with as,下面我们就来对with as做一个简单的介绍

With as主要体现在增加sql易读性和减少IO成本,增加易读性比较好理解,比如查询业务部分所有人的平均薪水
SQL> select mgr,avg(sal) avg_sal from emp group by mgr;

MGR AVG_SAL
---------- ----------
7839 2758.33333
5000
7782 1300
7698 1310
7902 800
7566 3000
7788 1100

7 rows selected.
SQL> with avgsal as (select mgr,avg(sal) avg_sal from emp group by mgr)
2 select * from avgsal;

7 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4067220884

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6 | 48 | 4 (25)| 00:00:01 |
| 1 | HASH GROUP BY | | 6 | 48 | 4 (25)| 00:00:01 |
| 2 | TABLE ACCESS FULL| EMP | 14 | 112 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
7 consistent gets
0 physical reads
0 redo size
715 bytes sent via SQL*Net to client
492 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
7 rows processed

还有一个比较突出的作用就是减少重复部分IO成本,此时oracle会对重复部分构造一个临时表,而后oracle只会扫描这个临时表来减少重复IO带来的资源消耗。

下面我们来构造一个with as的改写的案例
SQL> SELECT COUNT (*) cnt, object_type
2 FROM t01
3 WHERE object_type = 'TABLE'
4 GROUP BY object_type
5 UNION ALL
6 SELECT COUNT (*), object_type
7 FROM t01
8 WHERE object_type = 'INDEX'
9 GROUP BY object_type
10 UNION ALL
11 SELECT COUNT (*), object_type
12 FROM t01
13 WHERE object_type = 'VIEW'
14 GROUP BY object_type;

CNT OBJECT_TYPE
---------- -------------------
1610 TABLE
1721 INDEX
3672 VIEW

Execution Plan
----------------------------------------------------------
Plan hash value: 2988477672

------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 27 | 463 (67)| 00:00:06 |
| 1 | UNION-ALL | | | | | |
| 2 | SORT GROUP BY NOSORT| | 1 | 9 | 154 (1)| 00:00:02 |
|* 3 | TABLE ACCESS FULL | T01 | 1734 | 15606 | 154 (1)| 00:00:02 |
| 4 | SORT GROUP BY NOSORT| | 1 | 9 | 154 (1)| 00:00:02 |
|* 5 | TABLE ACCESS FULL | T01 | 1734 | 15606 | 154 (1)| 00:00:02 |
| 6 | SORT GROUP BY NOSORT| | 1 | 9 | 154 (1)| 00:00:02 |
|* 7 | TABLE ACCESS FULL | T01 | 1734 | 15606 | 154 (1)| 00:00:02 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("OBJECT_TYPE"='TABLE')
5 - filter("OBJECT_TYPE"='INDEX')
7 - filter("OBJECT_TYPE"='VIEW')

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2088 consistent gets
0 physical reads
0 redo size
662 bytes sent via SQL*Net to client
492 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3 rows processed

SQL> WITH a AS ( SELECT COUNT (*) cnt, object_type
2 FROM t01
3 GROUP BY object_type)
4 SELECT cnt, object_type
5 FROM a
6 WHERE a.object_type = 'TABLE'
7 UNION ALL
8 SELECT cnt, object_type
9 FROM a
10 WHERE a.object_type = 'INDEX'
11 UNION ALL
12 SELECT cnt, object_type
13 FROM a
14 WHERE a.object_type = 'VIEW';

CNT OBJECT_TYPE
---------- -------------------
1610 TABLE
1721 INDEX
3672 VIEW

Execution Plan
----------------------------------------------------------
Plan hash value: 48651815

--------------------------------------------------------------------------------
------------------------
| Id | Operation | Name | Rows | Bytes |
Cost (%CPU)| Time |
--------------------------------------------------------------------------------
------------------------
| 0 | SELECT STATEMENT | | 87 | 2088 |
6 (67)| 00:00:01 |
| 1 | TEMP TABLE TRANSFORMATION | | | |
| |
| 2 | LOAD AS SELECT | | | |
| |
| 3 | HASH GROUP BY | | 29 | 261 |
158 (4)| 00:00:02 |
| 4 | TABLE ACCESS FULL | T01 | 50293 | 442K|
154 (1)| 00:00:02 |
| 5 | UNION-ALL | | | |
| |
|* 6 | VIEW | | 29 | 696 |
2 (0)| 00:00:01 |
| 7 | TABLE ACCESS FULL | SYS_TEMP_0FD9D660F_2EC7C2 | 29 | 261 |
2 (0)| 00:00:01 |
|* 8 | VIEW | | 29 | 696 |
2 (0)| 00:00:01 |
| 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9D660F_2EC7C2 | 29 | 261 |
2 (0)| 00:00:01 |
|* 10 | VIEW | | 29 | 696 |
2 (0)| 00:00:01 |
| 11 | TABLE ACCESS FULL | SYS_TEMP_0FD9D660F_2EC7C2 | 29 | 261 |
2 (0)| 00:00:01 |
--------------------------------------------------------------------------------
------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

6 - filter("A"."OBJECT_TYPE"='TABLE')
8 - filter("A"."OBJECT_TYPE"='INDEX')
10 - filter("A"."OBJECT_TYPE"='VIEW')

Statistics
----------------------------------------------------------
102 recursive calls
11 db block gets
719 consistent gets
1 physical reads
1584 redo size
662 bytes sent via SQL*Net to client
492 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3 rows processed

这里我们看下with as的执行计划中存在了TEMP TABLE TRANSFORMATION和LOAD AS SELECT,这里是将with as组成的结果集转换为了一个临时表,而后面的查询则可以使用这个临时表,由于这个临时表一般比原来的表或者表连接的数据量小,所以无论IO还是cpu成本都相对原sql语句都减小了。

oracle一般会在重复使用两次的地方自动将with as转化为临时表,这个思想有点类似物化视图的含义,比如有些sql语句我们无法将其优化,将sql语句查询返回的结果集(with as也是如此)物化为一个实体的表,而这个表的访问成本大大小于之前返回该结果集的方式(比如访问该结果集要采取全表扫描、多表关联hash join、nested loop等)

同样即使with as转换的结果集我们只访问一次,也可以加上hint materialize将其构造为临时表。
SQL> WITH a AS ( SELECT /*+materialize*/COUNT (*) cnt, object_type
2 FROM t01
3 GROUP BY object_type)
4 SELECT cnt, object_type
5 FROM a where a.object_type='TABLE';

CNT OBJECT_TYPE
---------- -------------------
1610 TABLE

Execution Plan
----------------------------------------------------------
Plan hash value: 2073943260

--------------------------------------------------------------------------------
------------------------
| Id | Operation | Name | Rows | Bytes |
Cost (%CPU)| Time |
--------------------------------------------------------------------------------
------------------------
| 0 | SELECT STATEMENT | | 29 | 696 |
160 (4)| 00:00:02 |
| 1 | TEMP TABLE TRANSFORMATION | | | |
| |
| 2 | LOAD AS SELECT | | | |
| |
| 3 | HASH GROUP BY | | 29 | 261 |
158 (4)| 00:00:02 |
| 4 | TABLE ACCESS FULL | T01 | 50293 | 442K|
154 (1)| 00:00:02 |
|* 5 | VIEW | | 29 | 696 |
2 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6616_2EC7C2 | 29 | 261 |
2 (0)| 00:00:01 |
--------------------------------------------------------------------------------
------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - filter("A"."OBJECT_TYPE"='TABLE')

Statistics
----------------------------------------------------------
102 recursive calls
10 db block gets
713 consistent gets
1 physical reads
1468 redo size
586 bytes sent via SQL*Net to client
492 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

生产环境的一个sql语句:
SELECT t1.contactid,
t1.skillid,
t2.turntime starttime,
t2.intime
FROM call.base_contactstate t1,
( SELECT t.contactid,
MIN (t.starttime) turntime,
MAX (t.starttime) intime
FROM call.base_contactstate t
WHERE t.eventid = 806355209 AND t.contactstatus = 2
GROUP BY t.contactid) t2
WHERE t1.contactid = t2.contactid
AND t1.starttime = t2.intime
AND t1.eventid = 806355209
AND t1.contactstatus = 2;

Execution Plan
----------------------------------------------------------
Plan hash value: 2829800415

--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 145 | | 76953 (2)| 00:15:24 |
|* 1 | FILTER | | | | | | |
| 2 | HASH GROUP BY | | 1 | 145 | | 76953 (2)| 00:15:24 |
|* 3 | HASH JOIN | | 114K| 15M| 8272K| 76943 (2)| 00:15:24 |
|* 4 | TABLE ACCESS FULL| BASE_CONTACTSTATE | 109K| 6977K| | 38029 (2)| 00:07:37 |
|* 5 | TABLE ACCESS FULL| BASE_CONTACTSTATE | 109K| 8587K| | 38029 (2)| 00:07:37 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("T1"."STARTTIME"=MAX("T"."STARTTIME"))
3 - access("T1"."CONTACTID"="T"."CONTACTID")
4 - filter("T"."EVENTID"=806355209 AND "T"."CONTACTSTATUS"=2)
5 - filter("T1"."EVENTID"=806355209 AND "T1"."CONTACTSTATUS"=2)

Statistics
----------------------------------------------------------
666 recursive calls
0 db block gets
324498 consistent gets
336648 physical reads
0 redo size
52016865 bytes sent via SQL*Net to client
397768 bytes received via SQL*Net from client
36118 SQL*Net roundtrips to/from client
21 sorts (memory)
0 sorts (disk)
541747 rows processed

这个sql语句如果想避免base_contactstate的全表扫描很难,但是我们注意到全表扫描了两次base_contactstate表,我们尝试利用下with as物化这个表的部分结果集为临时表来减小IO成本。
WITH t AS (SELECT contactid, skillid, starttime
FROM call.base_contactstate
WHERE eventid = 806355209 AND contactstatus = 2)
SELECT t1.contactid,
t1.skillid,
t2.turntime starttime,
t2.intime
FROM t t1,
( SELECT t.contactid,
MIN (t.starttime) turntime,
MAX (t.starttime) intime
FROM t
GROUP BY t.contactid) t2
WHERE t1.contactid = t2.contactid AND t1.starttime = t2.intime

Execution Plan
----------------------------------------------------------
Plan hash value: 1354327316

------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 183 | | 41280 (2)| 00:08:16 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | | |
| 2 | LOAD AS SELECT | | | | | | |
|* 3 | TABLE ACCESS FULL | BASE_CONTACTSTATE | 109K| 7299K| | 38029 (2)| 00:07:37 |
|* 4 | HASH JOIN | | 1 | 183 | 9984K| 3251 (1)| 00:00:40 |
| 5 | VIEW | | 109K| 8694K| | 227 (2)| 00:00:03 |
| 6 | TABLE ACCESS FULL | SYS_TEMP_0FD9D66ED_3363B5A2 | 109K| 6225K| | 227 (2)| 00:00:03 |
| 7 | VIEW | | 109K| 10M| | 1942 (1)| 00:00:24 |
| 8 | HASH GROUP BY | | 109K| 6977K| 8232K| 1942 (1)| 00:00:24 |
| 9 | VIEW | | 109K| 6977K| | 227 (2)| 00:00:03 |
| 10 | TABLE ACCESS FULL | SYS_TEMP_0FD9D66ED_3363B5A2 | 109K| 6225K| | 227 (2)| 00:00:03 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("EVENTID"=806355209 AND "CONTACTSTATUS"=2)
4 - access("T1"."CONTACTID"="T2"."CONTACTID" AND "T1"."STARTTIME"="T2"."INTIME")

Statistics
----------------------------------------------------------
1937 recursive calls
5572 db block gets
173790 consistent gets
173540 physical reads
1768 redo size
52132593 bytes sent via SQL*Net to client
397768 bytes received via SQL*Net from client
36118 SQL*Net roundtrips to/from client
44 sorts (memory)
0 sorts (disk)
541747 rows processed

这里我们用with as改写后,逻辑读和cost成本都降低了接近一半的样子,上次在群里讨论到sql优化,结果群里的有位朋友居然说到没有技术含量可言,在小鱼看来cbo是oracle最复杂的部分,没有之一。

其实在with as的改写后,oracle的优化器会计算是否用临时表的方式来完成查询,如果临时表的执行计划计算而来的cost较大,oracle还是会选择之前的方式,即使已经改写成了with as的临时表的sql语句

SQL> SELECT COUNT (1)
2 FROM (SELECT s.*
3 FROM call.HF_DM_SAMPLE s
4 INNER JOIN
5 ( SELECT biz_sheet_no bizno, MAX (project_task_times) times
6 FROM call.HF_DM_SAMPLE
7 WHERE project_id = '000000000001'
8 GROUP BY biz_sheet_no) tb_tmp
9 ON tb_tmp.bizno = s.biz_sheet_no
10 AND tb_tmp.times = s.project_task_times
11 AND s.blank0 LIKE '%160173576%') c__;

Plan hash value: 1949914683

---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 13469 (1)| 00:02:42 |
| 1 | SORT AGGREGATE | | 1 | | | |
| 2 | VIEW | | 1431 | | 13469 (1)| 00:02:42 |
|* 3 | FILTER | | | | | |
| 4 | HASH GROUP BY | | 1431 | 95877 | 13469 (1)| 00:02:42 |
|* 5 | HASH JOIN | | 10011 | 655K| 13468 (1)| 00:02:42 |
|* 6 | TABLE ACCESS FULL| HF_DM_SAMPLE | 8478 | 314K| 6709 (1)| 00:01:21 |
|* 7 | TABLE ACCESS FULL| HF_DM_SAMPLE | 165K| 4677K| 6757 (2)| 00:01:22 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("S"."PROJECT_TASK_TIMES"=MAX("PROJECT_TASK_TIMES"))
5 - access("BIZ_SHEET_NO"="S"."BIZ_SHEET_NO")
6 - filter("S"."BLANK0" LIKE '%160173576%' AND "S"."BLANK0" IS NOT NULL)
7 - filter("PROJECT_ID"='000000000001')

SQL> WITH a AS (SELECT
biz_sheet_no,
project_task_times,
project_id,
blank0
FROM call.HF_DM_SAMPLE)
SELECT COUNT (1)
FROM (SELECT a.*
FROM a
INNER JOIN
( SELECT biz_sheet_no bizno, MAX (project_task_times) times
FROM a
WHERE a.project_id = '000000000001'
GROUP BY biz_sheet_no) tb_tmp
ON tb_tmp.bizno = a.biz_sheet_no
AND tb_tmp.times = a.project_task_times
AND a.blank0 LIKE '%160173576%') c__;

Plan hash value: 1949914683

---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 13469 (1)| 00:02:42 |
| 1 | SORT AGGREGATE | | 1 | | | |
| 2 | VIEW | | 1431 | | 13469 (1)| 00:02:42 |
|* 3 | FILTER | | | | | |
| 4 | HASH GROUP BY | | 1431 | 95877 | 13469 (1)| 00:02:42 |
|* 5 | HASH JOIN | | 10011 | 655K| 13468 (1)| 00:02:42 |
|* 6 | TABLE ACCESS FULL| HF_DM_SAMPLE | 8478 | 314K| 6709 (1)| 00:01:21 |
|* 7 | TABLE ACCESS FULL| HF_DM_SAMPLE | 165K| 4677K| 6757 (2)| 00:01:22 |
---------------------------------------------------------------------------------------

看出这里改写为with as临时表的sql语句后,cbo还是用的之前的执行计划的方式,而我们必须加上hint materialize的优化器才会采用临时表的方式完成查询。

SQL> WITH a AS (SELECT /*+materialize*/
2 biz_sheet_no,
3 project_task_times,
4 project_id,
5 blank0
6 FROM call.HF_DM_SAMPLE)
7 SELECT COUNT (1)
8 FROM (SELECT a.*
9 FROM a
10 INNER JOIN
11 ( SELECT biz_sheet_no bizno, MAX (project_task_times) times
12 FROM a
13 WHERE a.project_id = '000000000001'
14 GROUP BY biz_sheet_no) tb_tmp
15 ON tb_tmp.bizno = a.biz_sheet_no
16 AND tb_tmp.times = a.project_task_times
17 AND a.blank0 LIKE '%160173576%') c__;

Execution Plan
----------------------------------------------------------
Plan hash value: 2209692010

------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 425 | | 14130 (1)| 00:02:50 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | | |
| 2 | LOAD AS SELECT | | | | | | |
| 3 | TABLE ACCESS FULL | HF_DM_SAMPLE | 174K| 6640K| | 6759 (2)| 00:01:22 |
| 4 | SORT AGGREGATE | | 1 | 425 | | | |
|* 5 | HASH JOIN | | 24908 | 10M| 13M| 7371 (1)| 00:01:29 |
| 6 | VIEW | | 143K| 11M| | 3590 (1)| 00:00:44 |
| 7 | HASH GROUP BY | | 143K| 13M| 18M| 3590 (1)| 00:00:44 |
|* 8 | VIEW | | 174K| 15M| | 209 (3)| 00:00:03 |
| 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6603_FFCC105A | 174K| 6640K| | 209 (3)| 00:00:03 |
|* 10 | VIEW | | 174K| 56M| | 209 (3)| 00:00:03 |
| 11 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6603_FFCC105A | 174K| 6640K| | 209 (3)| 00:00:03 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("TB_TMP"."BIZNO"="A"."BIZ_SHEET_NO" AND "TB_TMP"."TIMES"="A"."PROJECT_TASK_TIMES")
8 - filter("A"."PROJECT_ID"='000000000001')
10 - filter("A"."BLANK0" LIKE '%160173576%' AND "A"."BLANK0" IS NOT NULL)

在优化sql的时候,我们一旦看见比如重复的大表扫描、或者表连接等,要联想到是否能够物化这个结果集为一个临时表,关于with as的介绍就到这里了,有好的案例和优化文章会继续贴出来于大家分享。

Stellungnahme:
Der Inhalt dieses Artikels wird freiwillig von Internetnutzern beigesteuert und das Urheberrecht liegt beim ursprünglichen Autor. Diese Website übernimmt keine entsprechende rechtliche Verantwortung. Wenn Sie Inhalte finden, bei denen der Verdacht eines Plagiats oder einer Rechtsverletzung besteht, wenden Sie sich bitte an admin@php.cn