In some business scenarios, the NOT EXISTS statement will be used to ensure that the returned data does not exist in a specific collection. Some colleagues will find that the performance of NOT EXISTS in some scenarios is poor, and even some online There is a rumor that "NOT EXISTS does not use the index". How can we optimize the NOT EXISTS statement?
Take today’s optimized SQL as an example. The SQL before optimization is:
SELECT count(1) FROM t_monitor m WHERE NOT exists ( SELECT 1 FROM t_alarm_realtime AS a WHERE a.resource_id=m.resource_id AND a.resource_type=m.resource_type AND a.monitor_name=m.monitor_name)
We use the LEFT JOIN method to optimize. The SQL after optimization is:
SELECT count(1) FROM t_monitor m LEFT JOIN t_alarm_realtime AS a ON a.resource_id=m.resource_id AND a.resource_type=m.resource_type AND a.monitor_name=m.monitor_name WHERE a.resource_id is NULL
Optimization Effect:
The execution time before optimization is more than 29 seconds, and after optimization it is 1.2 seconds, and the optimization is improved by 25 times.
NOT EXISTS really not indexed?
View the execution plans of two SQLs!
Execution plan using NOT EXIST method:
##Execution plan using LEFT JOIN method:
From the execution plan, both tables use index , the difference is that NOT EXISTS uses the "DEPENDENT SUBQUERY" method , while LEFT JOIN uses ordinary table association.
Recommend reading:Why can indexes improve query speed?
Use theProfiling method provided by MySQL to view the execution process of the two methods.
Execution process using NOT EXIST mode:The execution process of using LEFT JOIN method:
From the execution process, the main consumption of LEFT JOIN method is Sending data One item (1.2s), while the NOT EXISTS method mainly consumes two items: executing and Sending data, which is limited by the fact that Profiling only stores 100 rows of records.
From Profiling, we can only see 47 combinations of "executing and Sending data" (each combination is about 50us). From the execution plan, we can see that the amount of data in the outer t_monitor is 578436 rows, ignoring statistics. If the information is inaccurate, using the NOT EXISTS method should produce 578436 combinations of "executing and Sending data", with a total consumption time of =50μs*578436=28921800us=28.92s.
It can be inferred from the above execution process:
The execution performance of NOT EXISTS method depends heavily on the number of execution times of NOT EXISTS subquery, that is, The amount of data in the outer query result set.
When the data volume N of the outer query result set is small, the execution performance is better. If N=10, the execution time is 50μs*10= 500us=0.005s, plus some additional consumption, the execution result can also be within the range of 0.01 seconds or 10 milliseconds. This response time should be acceptable to most applications.
When the data volume N of the outer Chengxun result set is large or even tens of millions of data, the query performance of NOT EXISTS will become very bad. It may even consume a large amount of server IO and CPU resources, affecting the normal operation of other businesses.
In addition to the above problems, during the optimization process it was discovered that the resource_id column that should store the same data is defined differently in the two tables, one table is VARCHAR and the other table is BIGINT, The field type of the external result set is different from the field type in the NOT EXIST word table, resulting in the index being unable to be used in the NOT EXISTS subquery, resulting in poor subquery performance and ultimately affecting the execution performance of the entire query.
Jingdong Mall has also seen a large number of similar cases. Some tables use VARCHAR to store order numbers, while other tables use BIGINT to store them. When managing the two tables, the performance is extremely poor. I hope my R&D colleagues will take this as a warning. Follow the public account Java Technology Stack and reply to m36 to get a copy of MySQL R&D military regulations.
Related learning recommendations: mysql video tutorial
The above is the detailed content of The relationship between MySQL not exists and indexes. For more information, please follow other related articles on the PHP Chinese website!