
Research on optimizing SQL efficiency

王林 (reposted)
2024-01-28 08:09:05

This case was shared by Chen Hongyi (Old K) at the MOORACLE conference in Shanghai in August 2016. By rewriting a MERGE statement as a PL/SQL block, he greatly improved its execution efficiency. When I (Tiger Liu) first saw the case, I did not notice the actual row counts of each table shown in the execution plan. So I did not believe the PL/SQL rewrite could beat a rewrite using analytic functions, and I exchanged several emails with Mr. Chen about it. Only later did I look at the execution plan more closely.

The original SQL is as follows:

merge into t_customer c using
(
  select a.cstno, a.amount
  from t_trade a,
       (select cstno, max(trade_date) trade_date
        from t_trade
        group by cstno) b
  where a.cstno = b.cstno and a.trade_date = b.trade_date
) m
on (c.cstno = m.cstno)
when matched then
  update set c.amount = m.amount;

This SQL uses a MERGE to copy each customer's latest transaction amount from the transaction detail table (t_trade) into the amount field of the customer table (t_customer).
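The following is a minimal sketch that makes the intended result concrete. It uses Python's built-in sqlite3 module as a stand-in for Oracle. The table and column names come from the article, but the rows are hypothetical. SQLite has no MERGE statement, so the "when matched then update" step is emulated with one UPDATE per source row.

```python
import sqlite3

# Hypothetical toy data; only the table/column names follow the article.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_customer (cstno INTEGER PRIMARY KEY, amount NUMERIC);
    CREATE TABLE t_trade (cstno INTEGER, trade_date TEXT, amount NUMERIC);
    INSERT INTO t_customer VALUES (1, NULL), (2, NULL), (3, NULL);
    INSERT INTO t_trade VALUES
        (1, '2016-08-01', 10), (1, '2016-08-03', 30), (1, '2016-08-02', 20),
        (2, '2016-07-15', 99), (2, '2016-07-20', 55);
""")

# Source set "m" of the original MERGE: join t_trade back to its own
# GROUP BY result to recover the amount on each customer's latest date.
m = conn.execute("""
    SELECT a.cstno, a.amount
    FROM t_trade a
    JOIN (SELECT cstno, MAX(trade_date) AS trade_date
          FROM t_trade GROUP BY cstno) b
      ON a.cstno = b.cstno AND a.trade_date = b.trade_date
""").fetchall()

# Emulated "when matched then update": only customers present in m change;
# customer 3 has no trades and keeps its NULL amount.
conn.executemany("UPDATE t_customer SET amount = ? WHERE cstno = ?",
                 [(amount, cstno) for cstno, amount in m])

result = dict(conn.execute("SELECT cstno, amount FROM t_customer ORDER BY cstno"))
print(result)  # {1: 30, 2: 55, 3: None}
```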

Execution plan:

[Image: execution plan of the original SQL]

Tiger Liu's note:

Before analytic functions became familiar, the common way to fetch other columns alongside a GROUP BY result was the pattern shown in the original SQL (highlighted in red in the original article): join the table back to its own GROUP BY subquery. Because this reads t_trade twice, it is the root cause of the SQL's poor performance.

The original SQL also has a latent bug. If the maximum trade_date for some cstno appears on more than one row of t_trade, the source set returns several rows for that customer. The MERGE then fails with ORA-30926 (unable to get a stable set of rows in the source tables).
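The ORA-30926 risk can be reproduced on toy data. This is a hypothetical sketch, again using sqlite3 as a stand-in for Oracle. Customer 1 has two trades on its latest date, so the join-back query returns two source rows for the same cstno. SQLite has no MERGE to raise the error, so the code only detects the condition that would trigger it.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_trade (cstno INTEGER, trade_date TEXT, amount NUMERIC);
    -- Hypothetical data: customer 1 has two trades on its latest date.
    INSERT INTO t_trade VALUES
        (1, '2016-08-01', 10), (1, '2016-08-03', 30), (1, '2016-08-03', 45),
        (2, '2016-07-20', 55);
""")

# Same join-back source set as the original MERGE.
m = conn.execute("""
    SELECT a.cstno, a.amount
    FROM t_trade a
    JOIN (SELECT cstno, MAX(trade_date) AS trade_date
          FROM t_trade GROUP BY cstno) b
      ON a.cstno = b.cstno AND a.trade_date = b.trade_date
""").fetchall()

# A MERGE may not update one target row from several source rows;
# list the cstno values that would break that rule.
dupes = [cstno for cstno, n in Counter(c for c, _ in m).items() if n > 1]
print(dupes)  # [1]
```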

Without looking closely at the execution plan (the actual row counts of the two tables), the usual way to optimize this kind of SQL is to rewrite it with an analytic function:

Rewrite 1 (analytic function):

merge into t_customer c using
(
  select a.cstno, a.amount
  from (select trade_date, cstno, amount,
               row_number() over (partition by cstno order by trade_date desc) RNO
        from t_trade) a
  where RNO = 1
) m
on (c.cstno = m.cstno)
when matched then
  update set c.amount = m.amount;

This rewrite is much more efficient than the original SQL. Because row_number() gives RNO = 1 to exactly one row per cstno, a repeated maximum trade_date no longer causes an error.
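Here is a minimal sqlite3 sketch of rewrite 1. It uses hypothetical data with a tie on customer 1's latest date and assumes SQLite 3.25 or later, which is needed for ROW_NUMBER(). Which of the tied rows receives RNO = 1 is not specified, in SQLite or in Oracle. Adding a tiebreaker column to the ORDER BY would make the choice deterministic.

```python
import sqlite3

# Requires SQLite 3.25+ for window functions such as ROW_NUMBER().
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_customer (cstno INTEGER PRIMARY KEY, amount NUMERIC);
    CREATE TABLE t_trade (cstno INTEGER, trade_date TEXT, amount NUMERIC);
    INSERT INTO t_customer VALUES (1, NULL), (2, NULL);
    INSERT INTO t_trade VALUES
        (1, '2016-08-01', 10), (1, '2016-08-03', 30), (1, '2016-08-03', 45),
        (2, '2016-07-20', 55);
""")

# Number each customer's trades newest-first and keep RNO = 1:
# one source row per cstno even when the latest date is tied.
m = conn.execute("""
    SELECT cstno, amount FROM (
        SELECT cstno, amount,
               ROW_NUMBER() OVER (PARTITION BY cstno
                                  ORDER BY trade_date DESC) AS rno
        FROM t_trade)
    WHERE rno = 1
""").fetchall()

# Emulated "when matched then update" (SQLite has no MERGE).
conn.executemany("UPDATE t_customer SET amount = ? WHERE cstno = ?",
                 [(amount, cstno) for cstno, amount in m])
result = dict(conn.execute("SELECT cstno, amount FROM t_customer"))
```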

However, Mr. Chen did not use the analytic-function rewrite. Instead, he exploited the large difference in row counts between the two tables and rewrote the SQL as a more efficient PL/SQL block:

Rewrite 2 (PL/SQL):

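declare
  vamount number;
begin
  -- one pass over the small table; each lookup relies on the (cstno, trade_date) index of t_trade
  for v in (select cstno from t_customer)
  loop
    begin
      -- latest trade of this customer: sort newest first, keep the first row
      select amount into vamount from
        (select amount from t_trade where cstno = v.cstno order by trade_date desc)
      where rownum = 1;
      update t_customer set amount = vamount where cstno = v.cstno;
    exception
      when no_data_found then
        null;  -- customer has no trades: leave amount unchanged, as the MERGE does
    end;
  end loop;
  commit;
end;
/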
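The loop in rewrite 2 can be mirrored in Python with sqlite3. This is a sketch on hypothetical data: one indexed lookup per customer, with LIMIT 1 standing in for Oracle's rownum = 1. It skips customers with no trades. In Oracle, a bare SELECT INTO would raise NO_DATA_FOUND for such a customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_customer (cstno INTEGER PRIMARY KEY, amount NUMERIC);
    CREATE TABLE t_trade (cstno INTEGER, trade_date TEXT, amount NUMERIC);
    -- The (cstno, trade_date) index is what keeps each lookup cheap.
    CREATE INDEX idx_t_trade ON t_trade (cstno, trade_date);
    INSERT INTO t_customer VALUES (1, NULL), (2, NULL), (3, NULL);
    INSERT INTO t_trade VALUES
        (1, '2016-08-01', 10), (1, '2016-08-03', 30),
        (2, '2016-07-20', 55);
""")

# Fetch the customer list first so the UPDATEs don't run during the scan.
for (cstno,) in conn.execute("SELECT cstno FROM t_customer").fetchall():
    row = conn.execute(
        "SELECT amount FROM t_trade WHERE cstno = ? "
        "ORDER BY trade_date DESC LIMIT 1", (cstno,)).fetchone()
    if row is None:
        continue  # no trades: leave amount untouched, as the MERGE would
    conn.execute("UPDATE t_customer SET amount = ? WHERE cstno = ?",
                 (row[0], cstno))
conn.commit()

result = dict(conn.execute("SELECT cstno, amount FROM t_customer"))
```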

The execution plan of the original SQL shows that t_customer is small, with only a little over 1,000 rows, while t_trade has 10 million rows. That is a ratio of about 1:10,000. (I don't know whether this is real or test data; only about 1,000 customers averaging 10,000 transactions each does not look like real data.)

In this special case, where the two tables differ so much in size, the PL/SQL version really is more efficient than the analytic-function version. It is a very clever rewrite.

Let's analyze the strengths and weaknesses of the two rewrites:

1. The PL/SQL rewrite suits the case where t_customer is small and t_trade has far more rows than t_customer. There, it runs faster than the analytic-function rewrite. In this example, if t_customer had 100,000 rows, the analytic-function version would be dozens to hundreds of times faster than the PL/SQL version.

2. The PL/SQL rewrite requires a composite index on the two columns (cstno, trade_date) of t_trade. The analytic-function rewrite needs no index support.

3. For a table with tens of millions of rows like t_trade, the analytic-function version can be sped up by enabling parallel execution. To speed up the PL/SQL version, you must first split t_customer into groups by cstno and run the groups concurrently in multiple sessions.

Can Mr. Chen's PL/SQL be implemented as a single SQL statement? I gave it a try; the SQL is as follows:

merge into t_customer c using
(
  select tc.cstno,
         (select amount
          from t_trade td1
          where td1.cstno = tc.cstno
            and td1.trade_date = (select max(trade_date) from t_trade td2 where tc.cstno = td2.cstno)
            and rownum = 1) as amount
  from t_customer tc
) m
on (c.cstno = m.cstno)
when matched then
  update set c.amount = m.amount;

The execution plan is roughly as follows:

[Image: execution plan of the single-SQL rewrite]

This version also requires the composite index on t_trade (cstno, trade_date), IDX_T_TRADE. It also only pays off when t_customer has far fewer rows than t_trade.

Judging from the execution plan, this SQL should perform about as well as the PL/SQL version.
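Here is a sqlite3 sketch of the single-statement idea on hypothetical data. It is written as a correlated UPDATE because SQLite has no MERGE, and LIMIT 1 plays the role of rownum = 1. One difference is worth noting. As written in the article, the scalar-subquery MERGE matches every customer, so a customer with no trades would have amount set to NULL. The original MERGE leaves such customers alone, so the sketch adds an EXISTS filter to keep that behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_customer (cstno INTEGER PRIMARY KEY, amount NUMERIC);
    CREATE TABLE t_trade (cstno INTEGER, trade_date TEXT, amount NUMERIC);
    CREATE INDEX idx_t_trade ON t_trade (cstno, trade_date);
    INSERT INTO t_customer VALUES (1, 0), (2, 0), (3, 0);
    INSERT INTO t_trade VALUES
        (1, '2016-08-01', 10), (1, '2016-08-03', 30),
        (2, '2016-07-20', 55);
""")

# One statement: for each customer, the correlated subqueries look up the
# latest date and its amount through idx_t_trade. The EXISTS filter skips
# customers without trades instead of overwriting their amount with NULL.
conn.execute("""
    UPDATE t_customer
    SET amount = (
        SELECT td1.amount FROM t_trade td1
        WHERE td1.cstno = t_customer.cstno
          AND td1.trade_date = (SELECT MAX(td2.trade_date) FROM t_trade td2
                                WHERE td2.cstno = t_customer.cstno)
        LIMIT 1)
    WHERE EXISTS (SELECT 1 FROM t_trade t WHERE t.cstno = t_customer.cstno)
""")
conn.commit()

result = dict(conn.execute("SELECT cstno, amount FROM t_customer ORDER BY cstno"))
print(result)  # {1: 30, 2: 55, 3: 0}
```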

Summary:

Apart from avoiding inefficient SQL patterns, SQL optimization depends mainly on the data volume and data distribution of the tables. The PL/SQL rewrite is faster only in a few special cases; with some data distributions it may even be slower than the original SQL. Still, its optimization idea is worth learning from.

Regardless of how the data is distributed, the analytic-function rewrite is more efficient than the original SQL and applies more generally.

Many developers and DBAs probably still write SQL like the original in this example. Once you understand analytic functions, you should abandon that inefficient pattern entirely.

The last rewrite, which turns the PL/SQL into a single SQL statement, produces logic that is complicated and hard to follow. Such rewrites are rarely used in practice; it is enough to understand it.

Once again, there is no fixed formula for optimization. The optimizer is mechanical, but the human brain is flexible. Only by mastering the underlying principles can you keep making SQL run faster.


Statement: This article is reproduced from linuxprobe.com.