Research on optimizing SQL efficiency
Jan 28, 2024 08:09 AM

This case was shared by Teacher Chen Hongyi (Old K) at the Shanghai MOORACLE Conference in August 2016. By rewriting a MERGE statement as PL/SQL, he greatly improved its execution efficiency. When Tiger Liu first saw the case, he overlooked the actual row counts of each table shown in the execution plan and did not believe the PL/SQL rewrite could beat a rewrite based on analytic functions. He exchanged several emails with Teacher Chen about it, and only a later, closer look at the execution plan made the reason clear.

The original SQL is as follows:

merge into t_customer c using
(
  select a.cstno, a.amount
  from t_trade a,
       (select cstno, max(trade_date) trade_date
        from t_trade
        group by cstno) b
  where a.cstno = b.cstno and a.trade_date = b.trade_date
) m
on (c.cstno = m.cstno)
when matched then
  update set c.amount = m.amount;

This SQL uses a MERGE to copy the amount of each customer's most recent transaction in the transaction detail table (t_trade) into the amount column of the customer table (t_customer).

Execution plan:

(Execution plan screenshot not preserved.)

Tiger Liu's note:

For developers who have not yet learned analytic functions, the part of the SQL highlighted in red in the original article (joining t_trade back to its own GROUP BY subquery) is the common way to fetch other columns alongside a GROUP BY result. It is also the root cause of this SQL's poor execution efficiency.

The original SQL has another hidden danger: if a cstno has more than one row in t_trade with its maximum trade_date, the MERGE fails with ORA-30926 (unable to get a stable set of rows in the source tables) and cannot be executed.
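Whether a given data set is exposed to this error can be checked before running the MERGE. The query below is a minimal diagnostic sketch that is not part of the original case; it assumes only the three columns that appear in the article (cstno, trade_date, amount), and the alias tied_rows is made up for illustration.

```sql
-- List every customer whose latest trade_date occurs on more than one row.
-- Any row returned here means the original MERGE would see two source rows
-- for the same cstno and raise ORA-30926.
select cstno, trade_date, count(*) as tied_rows
from t_trade
where (cstno, trade_date) in
      (select cstno, max(trade_date) from t_trade group by cstno)
group by cstno, trade_date
having count(*) > 1;
```

An empty result means the original statement would run on the current data, although it says nothing about rows inserted later.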

If you do not look closely at the execution plan (in particular, the actual row counts of the two tables), the usual way to optimize this kind of SQL is to rewrite it with an analytic function:

Rewriting method 1:

merge into t_customer c using
(
  select a.cstno, a.amount
  from (select trade_date, cstno, amount,
               row_number() over (partition by cstno order by trade_date desc) rno
        from t_trade) a
  where rno = 1
) m
on (c.cstno = m.cstno)
when matched then
  update set c.amount = m.amount;

This rewrite is much more efficient than the original SQL, and because row_number() keeps exactly one row per cstno, it no longer fails when a cstno has several rows with the same maximum trade_date.
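One detail the rewrite leaves open is which of the tied rows wins: with only trade_date in the ORDER BY, the choice among rows sharing the latest date is arbitrary. The variation below is a sketch, not the author's statement. It adds amount as a second sort key so the result is repeatable; preferring the larger amount on a tie is purely an assumption for illustration, and the right rule depends on the business meaning of the data.

```sql
merge into t_customer c using
(
  select a.cstno, a.amount
  from (select cstno, amount,
               -- Second sort key breaks ties on trade_date deterministically.
               -- "nulls last" keeps a NULL amount from winning the tie,
               -- since Oracle sorts NULLs first in descending order by default.
               row_number() over (partition by cstno
                                  order by trade_date desc,
                                           amount desc nulls last) rno
        from t_trade) a
  where a.rno = 1
) m
on (c.cstno = m.cstno)
when matched then
  update set c.amount = m.amount;
```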

However, Teacher Chen did not use the analytic-function rewrite. Instead, exploiting the large difference in data volume between the two tables, he rewrote the SQL as a more efficient PL/SQL block:

Rewriting method 2:

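declare
  vamount number;
begin
  -- Drive the loop from the small table: one iteration per customer.
  for v in (select * from t_customer)
  loop
    -- Latest amount for this customer: the ordered inline view plus rownum
    -- keeps only the most recent row. Raises NO_DATA_FOUND if the customer
    -- has no rows in t_trade.
    select amount into vamount from
      (select amount from t_trade where cstno = v.cstno order by trade_date desc)
    where rownum <= 1;
    update t_customer set amount = vamount where cstno = v.cstno;
  end loop;
  commit;
end;
/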

The execution plan of the original SQL shows that t_customer is small, with just over 1,000 rows, while t_trade has 10 million rows, a ratio of 1:10,000. (I do not know whether this is real or test data; just over 1,000 customers averaging 10,000 transactions each does not look like real data.)

In a special case like this, where the two tables differ so much in size, the PL/SQL version is indeed more efficient than the analytic-function version. The rewrite is very clever.

Let's compare the strengths and weaknesses of the two rewrites:

1. The PL/SQL rewrite suits the case where t_customer is small and t_trade holds many times more rows than t_customer; there it runs faster than the analytic-function rewrite. If t_customer in this example had 100,000 rows instead, the analytic-function version would be dozens to hundreds of times faster than the PL/SQL version.

2. The PL/SQL rewrite requires a composite index on the cstno and trade_date columns of t_trade. The analytic-function rewrite needs no index support.

3. For a table with tens of millions of rows such as t_trade, the analytic-function version can be sped up by enabling parallel execution. To speed up the PL/SQL version, you have to split t_customer into groups by cstno and run the groups concurrently in multiple sessions.
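The index and parallelism points above can be sketched as follows. This is an illustration under stated assumptions rather than what was actually run in the case: the index name idx_t_trade follows the name that appears later in the article, while the parallel degree of 4 and the split into 4 buckets are made-up values.

```sql
-- Composite index the PL/SQL rewrite depends on: with it, each per-customer
-- lookup can read that customer's index entries in trade_date order
-- instead of scanning t_trade.
create index idx_t_trade on t_trade (cstno, trade_date);

-- Analytic-function rewrite with parallel execution (assumed degree 4).
-- Enabling parallel DML lets the update side run in parallel as well;
-- without it, only the query side can be parallelized.
alter session enable parallel dml;

merge /*+ parallel(4) */ into t_customer c using
(
  select a.cstno, a.amount
  from (select cstno, amount,
               row_number() over (partition by cstno order by trade_date desc) rno
        from t_trade) a
  where a.rno = 1
) m
on (c.cstno = m.cstno)
when matched then
  update set c.amount = m.amount;

commit;

-- One possible way to split t_customer for concurrent PL/SQL sessions:
-- ora_hash(cstno, 3) returns a bucket from 0 to 3, so session k would use
-- this as its cursor query with "= k".
select cstno from t_customer where ora_hash(cstno, 3) = 0;
```

Using a hash bucket rather than cstno ranges is a design choice made here so that the split does not depend on the data type or distribution of cstno.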

Can Teacher Chen's PL/SQL be expressed as a single SQL statement? Here is my attempt:

merge into t_customer c using
(
  select tc.cstno,
         (select amount
          from t_trade td1
          where td1.cstno = tc.cstno
            and td1.trade_date = (select max(trade_date) from t_trade td2 where tc.cstno = td2.cstno)
            and rownum = 1) as amount
  from t_customer tc
) m
on (c.cstno = m.cstno)
when matched then
  update set c.amount = m.amount;

The execution plan is roughly as follows:

(Execution plan screenshot not preserved.)

This version also requires the composite index on cstno and trade_date (IDX_T_TRADE) to exist on t_trade, and t_customer must be much smaller than t_trade.

Judging from the execution plan, this SQL should perform about as well as the PL/SQL version.
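One behavioral difference from the original MERGE is worth noting. Because the source query here returns every row of t_customer, a customer with no rows in t_trade gets a NULL from the scalar subquery and would have amount overwritten with NULL, whereas the original statement leaves such customers untouched. A possible guard, shown as a sketch rather than as part of the author's statement, is a WHERE clause on the update branch:

```sql
merge into t_customer c using
(
  select tc.cstno,
         (select amount
          from t_trade td1
          where td1.cstno = tc.cstno
            and td1.trade_date = (select max(trade_date) from t_trade td2 where td2.cstno = tc.cstno)
            and rownum = 1) as amount
  from t_customer tc
) m
on (c.cstno = m.cstno)
when matched then
  update set c.amount = m.amount
  -- Skip customers for whom the scalar subquery returned no amount.
  where m.amount is not null;
```

The guard assumes that amount is never NULL in t_trade; if it can be, a customer whose latest trade has a NULL amount would also be skipped.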

Summary:

Beyond avoiding inefficient ways of writing SQL, optimization depends mainly on the data volume and data distribution of the tables involved. The PL/SQL rewrite is more efficient only in a few special cases; with other data distributions it may even be slower than the original SQL. The idea behind it is still worth learning from.

The analytic-function rewrite is more efficient than the original SQL regardless of how the data is distributed, and it is the more general solution.

Many developers and DBAs probably still write SQL the way the original statement in this example was written. Once you understand analytic functions, that inefficient pattern should be abandoned entirely.

The final rewrite of the PL/SQL as a single SQL statement has logic that is fairly convoluted and hard to follow, so it is rarely used in practice. It is enough to understand how it works.

Once again, there is no fixed formula for optimization. The optimizer is rigid, but the human mind is flexible. Only by mastering the underlying principles can you keep making SQL run faster.


Statement: This article is reproduced from Linux就该这么学. If there is any infringement, please contact admin@php.cn for removal.