分析函数和表连接的性能分析-MySQL 튜토리얼-php.cn

집

데이터 베이스

MySQL 튜토리얼

分析函数和表连接的性能分析

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jun 07, 2016 pm 04:36 PM

sql기능분석하다성능보고서성명연결하다

同事报表有些sql语句经常会涉及求表中分组后求某列最大或者最小等行的所有行记录，而这个往往开发人员写的sql都是先构造一个已经完成分组和某列max或者min的表，然后原表做子查询还是表关联。 SQL> set linesize 120 SQL> SELECT r.sample_id, 2 r.result_co

同事报表有些sql语句经常会涉及求表中分组后求某列最大或者最小等行的所有行记录，而这个往往开发人员写的sql都是先构造一个已经完成分组和某列max或者min的表，然后原表做子查询还是表关联。

SQL> set linesize 120
SQL> SELECT r.sample_id,
2 r.result_code,
3 r.reason_code,
4 r.begin_time
5 FROM call.hf_dm_visit_record r,
6 ( SELECT r.sample_id, MIN (r.begin_time) begin_time
7 FROM call.hf_dm_visit_record r
8 GROUP BY r.sample_id) r1
9 WHERE r.sample_id = r1.sample_id AND r.begin_time = r1.begin_time;

137540 rows selected.

Elapsed: 00:00:09.85

Execution Plan
----------------------------------------------------------
Plan hash value: 4064551521

---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 474 | | 5438 (2)| 00:01:06 |
|* 1 | FILTER | | | | | | |
| 2 | HASH GROUP BY | | 3 | 474 | | 5438 (2)| 00:01:06 |
|* 3 | HASH JOIN | | 555K| 83M| 20M| 5381 (1)| 00:01:05 |
| 4 | TABLE ACCESS FULL| HF_DM_VISIT_RECORD | 274K| 17M| | 1500 (1)| 00:00:19 |
| 5 | TABLE ACCESS FULL| HF_DM_VISIT_RECORD | 274K| 23M| | 1503 (2)| 00:00:19 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("R"."BEGIN_TIME"=MIN("R"."BEGIN_TIME"))
3 - access("R"."SAMPLE_ID"="R"."SAMPLE_ID")

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
13338 consistent gets
1951 physical reads
172 redo size
9797533 bytes sent via SQL*Net to client
101351 bytes received via SQL*Net from client
9171 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
137540 rows processed

这个是开发人员的原sql语句，同样可以改写成子查询，但是由于子查询是可以展开的，所以一般执行计划不会变化，由于执行计划一样，就不重复列出。
SELECT r.sample_id,
r.result_code,
r.reason_code,
r.begin_time
FROM call.hf_dm_visit_record r
WHERE (sample_id, begin_time) IN
( SELECT r.sample_id, MIN (r.begin_time) begin_time
FROM call.hf_dm_visit_record r
GROUP BY r.sample_id)

而如果我们改成写分析函数，此时oracle只需要扫描一次hf-dm_visit_record表，但是有个WINDOW SORT的排序成本
SQL> SELECT a.sample_id,
a.result_code,
2 3 a.reason_code,
a.begin_time
FROM (SELECT r.sample_id,
r.result_code,
r.reason_code,
r.begin_time,
ROW_NUMBER ()
OVER (PARTITION BY r.sample_id ORDER BY r.sample_id )
cnt
FROM call.hf_dm_visit_record r)a
WHERE cnt = 1; 4 5 6 7 8 9 10 11 12 13

137540 rows selected.

Elapsed: 00:00:12.19

Execution Plan
----------------------------------------------------------
Plan hash value: 679670933

-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 274K| 46M| | 6564 (1)| 00:01:19 |
|* 1 | VIEW | | 274K| 46M| | 6564 (1)| 00:01:19 |
|* 2 | WINDOW SORT PUSHED RANK| | 274K| 20M| 24M| 6564 (1)| 00:01:19 |
| 3 | TABLE ACCESS FULL | HF_DM_VISIT_RECORD | 274K| 20M| | 1503 (2)| 00:00:19 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CNT"=1)
2 - filter(ROW_NUMBER() OVER ( PARTITION BY "R"."SAMPLE_ID" ORDER BY NULL )

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
6670 consistent gets
1951 physical reads
0 redo size
9796204 bytes sent via SQL*Net to client
101351 bytes received via SQL*Net from client
9171 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
137540 rows processed

看sql的执行时间，相比第一种表关联的方式相应更加快，而且cost值也更加低，只是表关联方式的逻辑读相比分析函数要高一部分,还有一个特别需要我们关注的就是hash join和windows sort PUSHED RANK都用到了临时表空间，我们看下hash join大概用到了TempSpc 20M，而windows sort pushed rank则达到了TempSpc 24M,注意这里的TempSpc表示的是hash join和windows sort pushed rank排序消耗的临时表空间大小。

但是大多数系统都是以sql的响应时间为性能参考的，上述sql语句改写为分析函数后执行效率并没有表连接或者子查询效率高，所以经常网络上有文章提到分析函数性能较高，那个也只是片面的，要依据实际的数据分布。

再看下面的一个sql语句：
SQL> SELECT r.called_object
2 FROM call.hf_script_callrecord r,
3 ( SELECT r.called_object, MAX (r.begin_time) begin_time
4 FROM call.hf_script_callrecord r
GROUP BY r.called_object) r1
WHERE r.called_object = r1.called_object AND r.begin_time = r1.begin_time; 5 6

138246 rows selected.

Elapsed: 00:00:30.70

Execution Plan
----------------------------------------------------------
Plan hash value: 4009191755

-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 26 | 3796 | | 8740 (14)| 00:01:45 |
|* 1 | FILTER | | | | | | |
| 2 | HASH GROUP BY | | 26 | 3796 | | 8740 (14)| 00:01:45 |
|* 3 | HASH JOIN | | 8534K| 1188M| 25M| 7706 (3)| 00:01:33 |
| 4 | TABLE ACCESS FULL| HF_SCRIPT_CALLRECORD | 338K| 21M| | 2434 (1)| 00:00:30 |
| 5 | TABLE ACCESS FULL| HF_SCRIPT_CALLRECORD | 338K| 25M| | 2434 (1)| 00:00:30 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("R"."BEGIN_TIME"=MAX("R"."BEGIN_TIME"))
3 - access("R"."CALLED_OBJECT"="R"."CALLED_OBJECT")

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
20540 consistent gets
6818 physical reads
0 redo size
5652618 bytes sent via SQL*Net to client
101868 bytes received via SQL*Net from client
9218 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
138246 rows processed

消耗的逻辑读是20540，排序消耗的temp是25M，cost为8740

SQL> SELECT *
2 FROM (SELECT r.called_object,
3 ROW_NUMBER ()
4 OVER (PARTITION BY called_object ORDER BY called_object DESC)
5 cnt
6 FROM call.hf_script_callrecord r)
7 WHERE cnt = 1;

138199 rows selected.

Elapsed: 00:00:05.50

Execution Plan
----------------------------------------------------------
Plan hash value: 1404543553

---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 338K| 30M| | 5099 (2)| 00:01:02 |
|* 1 | VIEW | | 338K| 30M| | 5099 (2)| 00:01:02 |
|* 2 | WINDOW SORT PUSHED RANK| | 338K| 9268K| 11M| 5099 (2)| 00:01:02 |
| 3 | TABLE ACCESS FULL | HF_SCRIPT_CALLRECORD | 338K| 9268K| | 2434 (1)| 00:00:30 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CNT"=1)
2 - filter(ROW_NUMBER() OVER ( PARTITION BY "CALLED_OBJECT" ORDER BY NULL )

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
10275 consistent gets
171 physical reads
136 redo size
6175932 bytes sent via SQL*Net to client
101835 bytes received via SQL*Net from client
9215 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
138199 rows processed

改写为分析函数后，逻辑读降低到10275，cost成本也降为了5099，消耗的tempspc则是降低为9268K，而sql的相应时间则从30秒降低为5秒

改写为分析函数后，sql响应时间得到了提高，而消耗的系统资源也较少了，而如果这个sql语句我们改写为分析函数无疑是比较高效的。

同样小鱼创建了一个大表，我们来看下这两张方式消耗资源和响应时间：
SQL>create table t01 as select * from dba_objects;
SQL>insert into t01 select * from t01;
SQL>insert into t01 select * from t01;
SQL>insert into t01 select * from t01;
…
SQL> select count(*) from t01;

COUNT(*)
----------
3220992

这个表有将近320w的数据，分别设置event 10046和tkprof后查看整个sql执行时间所消耗的资源和等待事件。

SQL> select a.object_id,a.object_name,a.object_type from t01 a,(select max(objec
t_id) col,object_type from t01 group by object_type) b where a.object_id=b.col a
nd a.object_type=b.object_type;

SQL> select * from (select object_id,object_name,object_type,max(object_id)over(
partition by object_type) col from t01)a where a.object_id=a.col;

TKPROF: Release 10.2.0.4.0 - Production on Fri Jun 13 15:32:05 2014

Trace file: g:\oracle\product\10.2.0\admin\ora10g\udump\ora10g_ora_8256.trc
Sort options: default

********************************************************************************
count = number of times OCI procedure was executed
cpu = cpu time in seconds executing
elapsed = elapsed time in seconds executing
disk = number of physical reads of buffers from disk
query = number of buffers gotten for consistent read
current = number of buffers gotten in current mode (usually for update)
rows = number of rows processed by the fetch or execute call
--------------------------------------------------------------------------------

*** SESSION ID:(143.195) 2014-06-13 15:29:31.947

********************************************************************************

select a.object_id,a.object_name,a.object_type
from
t01 a,(select max(object_id) col,object_type from t01 group by object_type)
b where a.object_id=b.col and a.object_type=b.object_type

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 168 2.12 1.91 0 92607 0 2496
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 170 2.12 1.91 0 92607 0 2496

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS

Rows Row Source Operation
------- ---------------------------------------------------
2496 HASH JOIN (cr=92607 pr=0 pw=0 time=2380108 us)
39 VIEW (cr=46223 pr=0 pw=0 time=1239182 us)
39 HASH GROUP BY (cr=46223 pr=0 pw=0 time=1239142 us)
3220992 TABLE ACCESS FULL T01 (cr=46223 pr=0 pw=0 time=79 us)
3220992 TABLE ACCESS FULL T01 (cr=46384 pr=0 pw=0 time=30 us)

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 169 0.00 0.00
SQL*Net message from client 169 19.92 32.62
********************************************************************************
select *
from
(select object_id,object_name,object_type,max(object_id)over(partition by
object_type) col from t01)a where a.object_id=a.col

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 168 7.28 23.42 71080 46223 15 2496
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 170 7.28 23.42 71080 46223 15 2496

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: SYS

Rows Row Source Operation
------- ---------------------------------------------------
2496 VIEW (cr=46223 pr=71080 pw=52672 time=289701960 us)
3220992 WINDOW SORT (cr=46223 pr=71080 pw=52672 time=130099916 us)
3220992 TABLE ACCESS FULL T01 (cr=46223 pr=0 pw=0 time=89 us)

Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 168 0.00 0.00
direct path write temp 251 0.24 1.98
direct path read temp 44876 0.26 14.55
SQL*Net message from client 168 23.36 25.61
********************************************************************************

表连接的sql语句响应时间明显快于分析函数，而且分析函数有个很致命的问题是排序用了磁盘的temp，我们可以在10046 格式化的trac文件中看见很明显的direct path write temp和direct path read temp，同样由于是自己的测试机（T430），分析函数的sql执行时特别消耗自己的pc资源。

对于上述这类sql请求，分析函数的优势是减少表扫描次数，但是有个windows sort的排序成本，而表连接主要是表扫描次数增多，同样会有一个表连接和group by的排序，两种方式并不是绝对的谁优于谁，需要根据具体的数据分布来进行评估，有兴趣的朋友可以自己找自己的生产系统的sql语句来进行测试，当然也可以自己模拟数据。

原文地址：分析函数和表连接的性能分析, 感谢原作者分享。

성명

본 글의 내용은 네티즌들의 자발적인 기여로 작성되었으며, 저작권은 원저작자에게 있습니다. 본 사이트는 이에 상응하는 법적 책임을 지지 않습니다. 표절이나 침해가 의심되는 콘텐츠를 발견한 경우 admin@php.cn으로 문의하세요.

관련 기사

MySQL의 라이센스는 다른 데이터베이스 시스템과 어떻게 비교됩니까?Apr 25, 2025 am 12:26 AM

MySQL은 GPL 라이센스를 사용합니다. 1) GPL 라이센스는 MySQL의 무료 사용, 수정 및 분포를 허용하지만 수정 된 분포는 GPL을 준수해야합니다. 2) 상업용 라이센스는 공개 수정을 피할 수 있으며 기밀이 필요한 상업용 응용 프로그램에 적합합니다.

MyISAM을 통해 언제 innodb를 선택 하시겠습니까?Apr 25, 2025 am 12:22 AM

MyISAM 대신 InnoDB를 선택할 때의 상황에는 다음이 포함됩니다. 1) 거래 지원, 2) 높은 동시성 환경, 3) 높은 데이터 일관성; 반대로, MyISAM을 선택할 때의 상황에는 다음이 포함됩니다. 1) 주로 읽기 작업, 2) 거래 지원이 필요하지 않습니다. InnoDB는 전자 상거래 플랫폼과 같은 높은 데이터 일관성 및 트랜잭션 처리가 필요한 응용 프로그램에 적합하지만 MyISAM은 블로그 시스템과 같은 읽기 집약적 및 트랜잭션이없는 애플리케이션에 적합합니다.

MySQL에서 외국 키의 목적을 설명하십시오.Apr 25, 2025 am 12:17 AM

MySQL에서 외국 키의 기능은 테이블 간의 관계를 설정하고 데이터의 일관성과 무결성을 보장하는 것입니다. 외국 키는 참조 무결성 검사 및 계단식 작업을 통해 데이터의 효과를 유지합니다. 성능 최적화에주의를 기울이고 사용할 때 일반적인 오류를 피하십시오.

MySQL의 다른 유형의 인덱스는 무엇입니까?Apr 25, 2025 am 12:12 AM

MySQL에는 B-Tree Index, Hash Index, Full-Text Index 및 공간 인덱스의 네 가지 주요 인덱스 유형이 있습니다. 1.B- 트리 색인은 범위 쿼리, 정렬 및 그룹화에 적합하며 직원 테이블의 이름 열에서 생성에 적합합니다. 2. HASH 인덱스는 동등한 쿼리에 적합하며 메모리 저장 엔진의 HASH_Table 테이블의 ID 열에서 생성에 적합합니다. 3. 전체 텍스트 색인은 기사 테이블의 내용 열에서 생성에 적합한 텍스트 검색에 사용됩니다. 4. 공간 지수는 지리 공간 쿼리에 사용되며 위치 테이블의 Geom 열에서 생성에 적합합니다.

MySQL에서 인덱스를 어떻게 생성합니까?Apr 25, 2025 am 12:06 AM

toreateanindexinmysql, usethecreateindexstatement.1) forasinglecolumn, "createindexidx_lastnameonemployees (lastname);"2) foracompositeIndex를 사용하고 "createDexIdx_nameonemployees (forstName, FirstName);"3)을 사용하십시오

MySQL은 sqlite와 어떻게 다릅니 까?Apr 24, 2025 am 12:12 AM

MySQL과 Sqlite의 주요 차이점은 설계 개념 및 사용 시나리오입니다. 1. MySQL은 대규모 응용 프로그램 및 엔터프라이즈 수준의 솔루션에 적합하며 고성능 및 동시성을 지원합니다. 2. SQLITE는 모바일 애플리케이션 및 데스크탑 소프트웨어에 적합하며 가볍고 내부질이 쉽습니다.

MySQL의 색인이란 무엇이며 성능을 어떻게 향상 시키는가?Apr 24, 2025 am 12:09 AM

MySQL의 인덱스는 데이터 검색 속도를 높이는 데 사용되는 데이터베이스 테이블에서 하나 이상의 열의 주문 구조입니다. 1) 인덱스는 스캔 한 데이터의 양을 줄임으로써 쿼리 속도를 향상시킵니다. 2) B-Tree Index는 균형 잡힌 트리 구조를 사용하여 범위 쿼리 및 정렬에 적합합니다. 3) CreateIndex 문을 사용하여 CreateIndexIdx_customer_idonorders (customer_id)와 같은 인덱스를 작성하십시오. 4) Composite Indexes는 CreateIndexIdx_customer_orderOders (Customer_id, Order_Date)와 같은 다중 열 쿼리를 최적화 할 수 있습니다. 5) 설명을 사용하여 쿼리 계획을 분석하고 피하십시오

MySQL에서 트랜잭션을 사용하여 데이터 일관성을 보장하는 방법을 설명하십시오.Apr 24, 2025 am 12:09 AM

MySQL에서 트랜잭션을 사용하면 데이터 일관성이 보장됩니다. 1) STARTTRANSACTION을 통해 트랜잭션을 시작한 다음 SQL 작업을 실행하고 커밋 또는 롤백으로 제출하십시오. 2) SavePoint를 사용하여 부분 롤백을 허용하는 저장 지점을 설정하십시오. 3) 성능 최적화 제안에는 트랜잭션 시간 단축, 대규모 쿼리 방지 및 격리 수준을 합리적으로 사용하는 것이 포함됩니다.

See all articles