Interfering with the MySQL optimizer's use of hash join

WBOY
2022-09-15 16:15:01

This is original content from the GreatSQL community and may not be used without authorization. To reprint it, please contact the editor and indicate the source. GreatSQL is a Chinese-developed branch of MySQL, and it is used in the same way as MySQL.

Preface

The database optimizer is like the human brain. Most of the time it makes the right decision, builds a correct execution plan, and finds an efficient path. But its judgments rest on fixed rules and algorithms, so it is sometimes less flexible than a human. What should we do when we are sure the optimizer has chosen the wrong execution plan? Adding a hint to the statement to tell it which path to take is a common optimization technique.

Oracle provides fairly flexible hints for telling the optimizer which join method to use when joining multiple tables: use_nl and no_use_nl control whether Nested Loop Join is used, and use_hash and no_use_hash control whether hash join is used.

MySQL, however, had only one join method for a long time: Nested Loop Join. Hash join did not appear until MySQL 8.0.18, so MySQL does not offer such a rich set of hints for controlling the join method. The hash_join and no_hash_join hints were short-lived: they take effect only in 8.0.18, and from 8.0.19 onward they are deprecated and have no effect. So what can we do if we want two tables to be joined with a hash join?

Experiment

Let's run an experiment on a standalone MySQL 8.0.25 instance. Create two tables, insert 10,000 rows into each, and join the two tables on their primary keys.

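-- Two identical test tables, each with an integer primary key.
create table t1(id int primary key,c1 int,c2 int);
create table t2(id int primary key,c1 int,c2 int);
-- Procedure that fills both tables with rows 1..10000.
delimiter //
CREATE PROCEDURE p_test()
BEGIN
  declare i int;
  set i=1;
  while i<10001 do
    insert into t1 values(i,i,i);
    insert into t2 values(i,i,i);
    SET i = i + 1;
  end while;
END //
delimiter ;
-- Run the procedure to actually load the data.
call p_test();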

The actual execution plan when the two tables are joined on their primary key columns is shown in the figure below:
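
The figure is not reproduced in this text. As a rough sketch, a plan of this kind can be obtained with EXPLAIN ANALYZE, which is available from MySQL 8.0.18. The exact statement the author ran is not shown, so the query below is an assumption based on the t1 and t2 tables from the experiment.

```sql
-- Assumed query: join t1 and t2 on their primary keys
-- and print the plan that was actually executed.
EXPLAIN ANALYZE
SELECT t1.*, t2.*
FROM t1
JOIN t2 ON t1.id = t2.id;
```

In the tree-format output, a nested loop appears as a line such as "Nested loop inner join", while a hash join appears as "Inner hash join".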

The actual execution plan when the two tables are joined on non-indexed columns is shown in the following figure:
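
This figure is likewise not reproduced here. A minimal sketch of such a query, assuming the join is on column c1 (which has no index at this point in the experiment), might look like this:

```sql
-- Assumed query: c1 is not indexed yet, so no index lookup
-- is possible on the driven table.
EXPLAIN ANALYZE
SELECT t1.*, t2.*
FROM t1
JOIN t2 ON t1.c1 = t2.c1;
```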

The execution plans show that when the join column of the driven table has an index, the optimizer prefers Nested Loop Join, and when no index is available it prefers hash join.

Based on this, we can use the no_index hint to keep the statement from using the index on the join column.
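
As a hedged sketch of what such a statement could look like for the primary key join: NO_INDEX is an index-level optimizer hint available from MySQL 8.0.20, and PRIMARY is the name used to refer to a primary key. The exact statement used in the experiment is not shown in the text, so this form is an assumption.

```sql
-- Assumed form: hide the primary key of both tables from the optimizer
-- for this statement only, so that no index lookup is possible for the join.
EXPLAIN ANALYZE
SELECT /*+ NO_INDEX(t1 PRIMARY) NO_INDEX(t2 PRIMARY) */ t1.*, t2.*
FROM t1
JOIN t2 ON t1.id = t2.id;
```

The hint is written as a comment right after the SELECT keyword and applies only to the statement that contains it.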

The execution plan above shows that with the no_index hint, the optimizer chose hash join.

When an index has poor selectivity, it is very inefficient for the optimizer to use it for a Nested Loop Join.

We now change the data in column c1 of both experiment tables to make it less selective, and create an ordinary index on column c1.

update t1 set c1=1 where id<5000;
update t2 set c1=1 where id<5000;
create index idx_t1 on t1(c1);
create index idx_t2 on t2(c1);
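
To see how selective c1 has become, one option is to compare the row count with the number of distinct values. This check is not part of the original experiment; the column aliases total_rows and distinct_c1 are made up for illustration.

```sql
-- Compare total rows with the number of distinct c1 values in each table.
SELECT COUNT(*) AS total_rows, COUNT(DISTINCT c1) AS distinct_c1 FROM t1;
SELECT COUNT(*) AS total_rows, COUNT(DISTINCT c1) AS distinct_c1 FROM t2;

-- Refresh index statistics so the optimizer sees the new distribution.
ANALYZE TABLE t1, t2;
```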

When we execute the following SQL:

select t1.*,t2.* from t1 join t2 on t1.c1=t2.c1;

This query returns a large amount of data (about 25 million rows, because the 4,999 rows with c1=1 in each table all match one another). The index on the join column c1 of the driven table has poor selectivity, so hash join is the wiser choice here, yet the optimizer chooses Nested Loop Join. We can verify the performance difference between hash join and Nested Loop Join by experiment.

The hash join takes about 1/6 of the time of the Nested Loop Join. However, when the optimizer estimates costs, it rates Nested Loop Join as much cheaper than hash join, so it chooses Nested Loop Join. In this case we need to add a hint that prohibits the use of the index on the join column. A Nested Loop Join would then have to scan the whole driven table on every iteration, which is very expensive, so after re-estimating the cost the optimizer chooses hash join.
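
The comparison can be sketched as the following pair of statements. The index names idx_t1 and idx_t2 come from the experiment; the hinted form is an assumption, since the text does not show the exact statement that was run.

```sql
-- Default plan: the optimizer is free to use idx_t1 / idx_t2.
EXPLAIN ANALYZE
SELECT t1.*, t2.*
FROM t1
JOIN t2 ON t1.c1 = t2.c1;

-- Assumed hinted form: hide the c1 indexes from this statement only.
EXPLAIN ANALYZE
SELECT /*+ NO_INDEX(t1 idx_t1) NO_INDEX(t2 idx_t2) */ t1.*, t2.*
FROM t1
JOIN t2 ON t1.c1 = t2.c1;
```

EXPLAIN ANALYZE executes the query, so both statements take roughly as long as the real join and report actual timings alongside the estimated costs.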

The official MySQL documentation mentions using the BNL and NO_BNL hints to influence hash join optimization. However, experiments show that when no index is available on the join columns, the optimizer, after estimating the cost, does not perform a nested loop join with BNL full table scans on the driven table. It chooses hash join instead, so NO_BNL is of no use in this scenario.
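
For readers who want to repeat this check, BNL and NO_BNL are written as table-level optimizer hints. The statements below are only a sketch on the experiment's tables, not the author's original test. Comparing each plan with the unhinted one shows whether the hint had any influence in a given setup.

```sql
-- Ask the optimizer to consider block nested loop / hash join for t1 and t2.
EXPLAIN
SELECT /*+ BNL(t1, t2) */ t1.*, t2.*
FROM t1
JOIN t2 ON t1.c1 = t2.c1;

-- The opposite hint, for comparison.
EXPLAIN
SELECT /*+ NO_BNL(t1, t2) */ t1.*, t2.*
FROM t1
JOIN t2 ON t1.c1 = t2.c1;
```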

Since we do not want this index here, why not simply drop it? Why go to the trouble of a no_index hint? Because business workloads are varied: an index that is unhelpful here may greatly speed up a query elsewhere. This is where the hint shows its advantage: it affects only the one statement.

Summary

Nested Loop Join has its advantages: it returns the first rows fastest and suits scenarios where little data is returned. When two large tables are joined, a large amount of data is returned, and the index on the join column is inefficient, hash join is more efficient. We can use the no_index hint to disable the inefficient index on the join column, prompting the optimizer to choose hash join.

Statement:
This article is reproduced from jb51.net. In case of any infringement, please contact admin@php.cn to have it removed.