Optimize SQL queries to reduce 'Not in' runtime
Introduction: In database environments that a DBA has already tuned, the vast majority of performance problems are caused by poorly written SQL. The world of SQL is full of surprises. Today we will look at a killer SQL statement that will make you want to spit blood.
At an insurance client, the ETL job took several hours. We generated a SQL report and found that the load came mainly from one SQL statement:
Execution time per run: 5,788 seconds
Logical reads per run: 1 billion blocks
Rows returned per run: 210,000
Let's look at the SQL statement first. Because it is relatively long, we only show an excerpt.
View its execution plan:
We focus mainly on rows 7 to 16 of the plan, which contain two full table scans with a FILTER operation between them.
Years of experience tell me that a FILTER built on two full table scans is a serious problem, because it processes the data row by row. In this execution plan, the driven table is still scanned in its entirety.
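To make the row-by-row cost concrete, here is a minimal Python sketch (not Oracle's implementation) contrasting a FILTER-style NOT IN, which rescans the inner rows for every outer row, with a hash anti-join that reads each side once. The function names and the cost counters are illustrative assumptions, and the sketch assumes neither input contains NULLs.

```python
# Illustrative sketch, not Oracle internals. Assumes no NULLs in either input.

def filter_not_in(outer, inner):
    """FILTER-style NOT IN: rescan the inner rows for every outer row."""
    comparisons = 0
    kept = []
    for value in outer:
        matched = False
        for candidate in inner:  # unmatched rows always scan the whole inner side
            comparisons += 1
            if candidate == value:
                matched = True
                break
        if not matched:
            kept.append(value)
    return kept, comparisons


def hash_anti_join(outer, inner):
    """Hash anti-join: build a hash set once, then probe it once per outer row."""
    build = set(inner)                                       # one pass over inner
    kept = [value for value in outer if value not in build]  # one probe per outer row
    return kept, len(inner) + len(outer)                     # rows read, not comparisons


outer = list(range(1000))
inner = list(range(500, 1500))
print(filter_not_in(outer, inner)[1])   # 625250
print(hash_anti_join(outer, inner)[1])  # 2000
```

Both functions return the same rows; the FILTER cost grows with the product of the two row counts, while the hash version grows with their sum.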
NOT IN/IN operations do sometimes produce FILTER operations. In versions before 11g, a NOT IN can be converted into an anti-join only if the column in the NOT IN condition has the NOT NULL attribute, or the statement includes an IS NOT NULL restriction; otherwise the optimizer can only use a FILTER that checks rows one by one.
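The reason a nullable column blocks the plain anti-join is SQL's three-valued logic: if the subquery returns a NULL, an unmatched value can never make NOT IN evaluate to TRUE. The sketch below uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption; both follow standard SQL semantics for NOT IN), and the helper not_in is hypothetical.

```python
import sqlite3

# SQLite stands in for Oracle; both apply SQL three-valued logic to NOT IN.
conn = sqlite3.connect(":memory:")
LABELS = {1: "TRUE", 0: "FALSE", None: "UNKNOWN"}


def not_in(value, values_sql):
    """Evaluate `value NOT IN (values_sql)`; values_sql is trusted literal SQL."""
    row = conn.execute(f"SELECT ? NOT IN ({values_sql})", (value,)).fetchone()
    return LABELS[row[0]]


print(not_in("T1", "'T2', 'T3'"))  # TRUE: no match and no NULL
print(not_in("T1", "'T1', NULL"))  # FALSE: a match is found
print(not_in("T1", "'T2', NULL"))  # UNKNOWN: no match, but the NULL might be 'T1'
print(not_in(None, "'T2'"))        # UNKNOWN: NULL on the left side
```

Because a WHERE clause keeps only rows whose condition is TRUE, a single NULL returned by the subquery removes every row, which an ordinary anti-join would not do.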
Let’s give an example:
Check the column attributes of T_OBJ:
None of the three columns has a NOT NULL constraint.
For now, we make the optimizer behave as it did in 10g:
SQL> alter session set optimizer_features_enable='10.2.0.5';
Execute the following SQL:
SQL> set autotrace traceonly explain
SQL> SELECT * FROM T_TABLE WHERE TABLE_NAME NOT IN(SELECT OBJECT_NAME FROM T_OBJ);
The execution plan now uses a FILTER:
In 11g, however, the optimizer can automatically convert the NOT IN from an expensive FILTER into a null-aware anti-join.
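A null-aware anti-join has to reproduce exactly those NULL rules while still using a hash table. The following Python sketch shows the semantics it must preserve; it assumes Python's None stands for SQL NULL, and it is an illustration of the behavior, not a description of how Oracle implements the operation internally.

```python
# Illustrative semantics only; Python's None plays the role of SQL NULL.

def null_aware_anti_join(outer_values, inner_values):
    """Return outer values v for which `v NOT IN (inner_values)` is TRUE."""
    inner_values = list(inner_values)
    if not inner_values:
        # NOT IN over an empty subquery is TRUE for every row, even NULLs.
        return list(outer_values)
    if any(v is None for v in inner_values):
        # Any NULL in the subquery makes NOT IN FALSE or UNKNOWN for every row.
        return []
    build = set(inner_values)  # hash table built once from the subquery
    # A NULL outer value is UNKNOWN against a non-empty set, so it is dropped.
    return [v for v in outer_values if v is not None and v not in build]


print(null_aware_anti_join(["EMP", "DEPT", None], ["EMP"]))  # ['DEPT']
print(null_aware_anti_join(["EMP", "DEPT"], ["EMP", None]))  # []
print(null_aware_anti_join(["EMP", None], []))               # ['EMP', None]
```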
If we add an IS NOT NULL condition or set the column attribute to NOT NULL:
SQL> alter table T_OBJ modify(OBJECT_NAME NOT NULL);
Execute the same statement again:
SQL> SELECT * FROM T_TABLE WHERE TABLE_NAME
NOT IN(SELECT OBJECT_NAME FROM T_OBJ
WHERE OBJECT_NAME IS NOT NULL);
View the execution plan again:
This time the execution plan uses HASH JOIN ANTI.
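Note that adding WHERE OBJECT_NAME IS NOT NULL is only a free performance fix when the column really contains no NULLs; if it does, the result set changes. The sketch below uses hypothetical rows in tables named after T_TABLE and T_OBJ, with SQLite (through Python's sqlite3 module) standing in for Oracle, to show the difference.

```python
import sqlite3

# Hypothetical rows; SQLite stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T_TABLE (TABLE_NAME TEXT);
    CREATE TABLE T_OBJ (OBJECT_NAME TEXT);
    INSERT INTO T_TABLE VALUES ('EMP'), ('DEPT'), ('BONUS');
    INSERT INTO T_OBJ VALUES ('EMP'), (NULL);
""")

# Plain NOT IN: the NULL in T_OBJ makes the condition UNKNOWN for every row.
plain = conn.execute(
    "SELECT TABLE_NAME FROM T_TABLE "
    "WHERE TABLE_NAME NOT IN (SELECT OBJECT_NAME FROM T_OBJ)"
).fetchall()

# With IS NOT NULL in the subquery, the NULL is ignored and unmatched rows return.
filtered = conn.execute(
    "SELECT TABLE_NAME FROM T_TABLE "
    "WHERE TABLE_NAME NOT IN (SELECT OBJECT_NAME FROM T_OBJ "
    "WHERE OBJECT_NAME IS NOT NULL) ORDER BY TABLE_NAME"
).fetchall()

print(plain)     # []
print(filtered)  # [('BONUS',), ('DEPT',)]
```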
Moreover, in 11g the NOT IN column does not need a NOT NULL restriction for the conversion to an anti-join to happen:
SQL> alter session set optimizer_features_enable='11.2.0.4';
SQL> alter table T_OBJ modify(OBJECT_NAME NULL);
SQL> SELECT * FROM T_TABLE WHERE TABLE_NAME
NOT IN (SELECT OBJECT_NAME FROM T_OBJ);
View execution plan:
This time HASH JOIN ANTI is used as well, even without a NOT NULL restriction. This feature can be controlled through an optimizer parameter:
SQL> alter session set "_optimizer_null_aware_antijoin"=FALSE;
Execute the above statement again and view the execution plan:
SQL> SELECT * FROM T_TABLE WHERE TABLE_NAME
NOT IN (SELECT OBJECT_NAME FROM T_OBJ);
HASH JOIN ANTI is still used. Verification showed that this parameter setting was not the cause of the problem.
The logic of NOT IN is mutual exclusion between result sets, and there are in fact several ways to rewrite it, for example:
- NOT EXISTS
- OUTER JOIN ... IS NULL
- MINUS
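The key difference between NOT IN and these three forms is NULL handling: unless the subquery returns no rows, NOT IN never returns a row whose compared value is NULL, and if the subquery returns even one NULL, NOT IN returns no rows at all.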
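The following sketch runs the four forms side by side on hypothetical data that has NULLs on both sides. SQLite (through Python's sqlite3 module) stands in for Oracle, and SQLite's EXCEPT plays the role of Oracle's MINUS; both are assumptions of this illustration.

```python
import sqlite3

# Hypothetical data; SQLite stands in for Oracle, EXCEPT stands in for MINUS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T_TABLE (TABLE_NAME TEXT);
    CREATE TABLE T_OBJ (OBJECT_NAME TEXT);
    INSERT INTO T_TABLE VALUES ('EMP'), ('DEPT'), (NULL);
    INSERT INTO T_OBJ VALUES ('EMP'), (NULL);
""")

QUERIES = {
    "NOT IN": """
        SELECT TABLE_NAME FROM T_TABLE
        WHERE TABLE_NAME NOT IN (SELECT OBJECT_NAME FROM T_OBJ)""",
    "NOT EXISTS": """
        SELECT t.TABLE_NAME FROM T_TABLE t
        WHERE NOT EXISTS (SELECT 1 FROM T_OBJ o
                          WHERE o.OBJECT_NAME = t.TABLE_NAME)""",
    "OUTER JOIN IS NULL": """
        SELECT t.TABLE_NAME FROM T_TABLE t
        LEFT JOIN T_OBJ o ON o.OBJECT_NAME = t.TABLE_NAME
        WHERE o.OBJECT_NAME IS NULL""",
    "MINUS (EXCEPT)": """
        SELECT TABLE_NAME FROM T_TABLE
        EXCEPT
        SELECT OBJECT_NAME FROM T_OBJ""",
}


def run(sql):
    """Return the single result column as a set (None stands for NULL)."""
    return {row[0] for row in conn.execute(sql)}


for name, sql in QUERIES.items():
    print(f"{name:20} {run(sql)}")
```

On this data NOT IN returns nothing because T_OBJ contains a NULL; NOT EXISTS and the outer-join form both return DEPT and the NULL row; the EXCEPT (MINUS) form returns only DEPT, because set operators treat two NULLs as duplicates and also remove duplicate rows. So the rewrites are only equivalent to NOT IN when the columns involved cannot be NULL.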
So we tried rewriting the statement.
Then, just when you thought a miracle was about to happen, the statement raised an error!
Why is an error reported?
If we convert this statement into NOT IN form:
Following the logic of NOT IN, 'A.' should now be added before fee_code. That by itself is no problem, but look at the statement again and it becomes:
Because TMP_APP_xxx_PREM A has no FEE_CODE column, the NOT IN cannot be automatically converted into a null-aware anti-join.
So the answer is revealed: it was a mistake all along?! I guessed the beginning, but not the ending.
Because the column was not explicitly qualified with its table in the SQL statement, this error was never discovered during the earlier analysis.
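The trap is name resolution: when a column named in a subquery does not exist in the subquery's own tables, the database looks for it in the outer query, silently turning the NOT IN into a correlated subquery instead of raising an error. The client's actual statement is only excerpted above, so the following Python sketch reconstructs the general pattern with invented tables (prem, fee_dict) and SQLite standing in for Oracle.

```python
import sqlite3

# Invented tables for illustration: prem has fee_code, fee_dict does not
# (its column is called code). SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prem (policy_no TEXT, fee_code TEXT);
    CREATE TABLE fee_dict (code TEXT);
    INSERT INTO prem VALUES ('P1', 'F1'), ('P2', 'F2');
    INSERT INTO fee_dict VALUES ('F1');
""")


def unqualified():
    # fee_code is not a column of fee_dict, so it binds to the outer prem row:
    # each row is compared with its own fee_code, and no error is raised.
    return conn.execute(
        "SELECT policy_no FROM prem p "
        "WHERE p.fee_code NOT IN (SELECT fee_code FROM fee_dict)"
    ).fetchall()


def qualified():
    # With an explicit alias, the wrong column can no longer escape outward.
    try:
        conn.execute(
            "SELECT policy_no FROM prem p "
            "WHERE p.fee_code NOT IN (SELECT d.fee_code FROM fee_dict d)"
        ).fetchall()
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"
    return "no error"


def intended():
    # What the query was meant to say: fee codes missing from fee_dict.
    return conn.execute(
        "SELECT policy_no FROM prem p "
        "WHERE p.fee_code NOT IN (SELECT d.code FROM fee_dict d)"
    ).fetchall()


print(unqualified())  # []
print(qualified())
print(intended())     # [('P2',)]
```

Qualifying every column in a subquery with its table alias is a cheap habit that turns this kind of silent logic error into an immediate error when the statement is parsed.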
Are you speechless too? What I really want to ask is: do you often write killer SQL? Don't worry. If you are sick, I have medicine. (Innocent face, don't hit me)
We all know that in database environments a DBA has already tuned, the vast majority of performance problems are caused by poorly written SQL.
For systems not yet in production, early SQL auditing and control will eliminate 80% of SQL problems before they take root. For systems already running, potential performance problems can be discovered and solved before they cause trouble.
SQL auditing lets the DBA move from being the system's emergency doctor to being its preventive-care doctor:
1. The DBA takes part in application development and testing: providing developers with professional database development and optimization advice.
2. Pre-optimization: designing efficient SQL and indexes according to business needs before the application code goes live.
3. Controlling change risk: assessing in advance how table structure changes and SQL changes made during development will affect running applications, and choosing appropriate change windows and change plans.
The above is the detailed content of Optimize SQL queries to reduce 'Not in' runtime. For more information, please follow other related articles on the PHP Chinese website!