MySQL advanced learning: in-depth understanding of the three algorithms of join

Reposted by 青灯夜游
2021-10-09 18:43:45

This article is an advanced look at MySQL joins. It explains how a join is executed and walks through the three join algorithms in detail. I hope you find it helpful!

We often use join to combine several tables in one query. A join can be expensive, so it should be used with care and avoided where it is not needed. In essence, a join matches rows by looping over each table. Before hash joins arrived in MySQL 8.0.18, MySQL supported only one join algorithm, Nested-Loop Join, but it implements several variants of that algorithm, which improve join execution efficiency in practice.

1. Simple Nested-Loop Join

The Simple Nested-Loop Join (NLJ) algorithm reads rows one at a time from the first table and passes each row to a nested loop that checks the second table for matching data. For example, with User as the driving table and User_info as the driven table, the SQL is select * from User u left join User_info info on u.id = info.user_id. This is just the familiar for loop. In pseudocode, the logic is:

for (User u : Users) {
    for (UserInfo info : UserInfos) {
        if (u.id == info.userId) {
            // matching row found
        }
    }
}

The algorithm is simple and crude: it takes one row at a time from the User table, scans every record in User_info for matches, and finally merges the matched data and returns it.

If the driving table User has 10 rows and the driven table User_info also has 10 rows, the driving table is read once (10 rows) and the driven table is scanned in full 10 times, for 10*10=100 row reads (every row read from the driving table triggers a full scan of the driven table). This is very inefficient and puts a relatively heavy load on the database, especially on the driven table. Each scan may have to read data from disk and load it into memory, which is an I/O operation, and I/O is usually the biggest bottleneck.
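
The row counts above can be checked with a small runnable sketch. It is only an illustration, not how MySQL is implemented: in-memory lists stand in for the two tables, and the class name, the drivenRowsRead counter, and the sample data are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: in-memory lists stand in for the two tables.
class SimpleNestedLoopJoinSketch {
    static final class User {
        final int id;
        User(int id) { this.id = id; }
    }

    static final class UserInfo {
        final int userId;
        UserInfo(int userId) { this.userId = userId; }
    }

    // Driven-table rows read by the most recent join() call (hypothetical counter).
    static long drivenRowsRead;

    // Returns the matching (User.id, UserInfo.userId) pairs. Unmatched driving rows,
    // which a real LEFT JOIN would still return, are left out for brevity.
    static List<int[]> join(List<User> users, List<UserInfo> infos) {
        drivenRowsRead = 0;
        List<int[]> matches = new ArrayList<>();
        for (User u : users) {             // a single pass over the driving table
            for (UserInfo info : infos) {  // a full scan of the driven table per driving row
                drivenRowsRead++;
                if (u.id == info.userId) {
                    matches.add(new int[] {u.id, info.userId});
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<User> users = new ArrayList<>();
        List<UserInfo> infos = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            users.add(new User(i));
            infos.add(new UserInfo(i));
        }
        int matched = join(users, infos).size();
        System.out.println(matched + " matches, " + drivenRowsRead + " driven-table rows read");
    }
}
```

With 10 rows in each table, the sketch finds 10 matches and reads 100 driven-table rows, which is the 10*10 cost described above.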

2. Index Nested-Loop Join

Index Nested-Loop Join uses an index to reduce the number of rows scanned and so improve efficiency, which means the driven table must have an index on the join column.

For each row of the driving table (User), the query looks up the join column in the index of the driven table (User_info), and the table is accessed only when a matching value is found in the index. If the join column (user_id) of the driven table is its primary key, the lookup is very efficient, because in InnoDB the leaf nodes of the primary key index contain the complete row data. If it is not the primary key, each index match is followed by a lookup back to the table: the secondary (non-primary-key) index yields the primary key value, which is then used to fetch the row. This is slower than a primary key lookup.

[Figure: Index Nested-Loop Join]

The index lookup in the figure above does not always have to go back to the table. Whether it does depends on whether the columns stored in the index cover all the columns the query needs. For details, see the previous article: Some basic index knowledge and B-tree index knowledge you need to know
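
The index lookup can be sketched in the same spirit. This is a rough illustration under stated assumptions, not MySQL's implementation: a TreeMap stands in for the index on User_info.user_id, row positions in an array stand in for table rows, and the class name, the indexLookups counter, and the sample data are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: a TreeMap stands in for an index on User_info.user_id.
class IndexNestedLoopJoinSketch {
    // Index lookups performed by the most recent join() call (hypothetical counter).
    static long indexLookups;

    // Builds the "index": join-column value -> positions of the matching driven rows.
    static TreeMap<Integer, List<Integer>> buildIndex(int[] infoUserIds) {
        TreeMap<Integer, List<Integer>> index = new TreeMap<>();
        for (int pos = 0; pos < infoUserIds.length; pos++) {
            index.computeIfAbsent(infoUserIds[pos], k -> new ArrayList<>()).add(pos);
        }
        return index;
    }

    // Returns pairs of (driving row position, driven row position) that match.
    static List<int[]> join(int[] userIds, Map<Integer, List<Integer>> index) {
        indexLookups = 0;
        List<int[]> matches = new ArrayList<>();
        for (int u = 0; u < userIds.length; u++) {
            indexLookups++;                        // one index probe per driving row
            List<Integer> positions = index.get(userIds[u]);
            if (positions == null) {
                continue;                          // no driven row has this key
            }
            for (int pos : positions) {
                // A real engine would now fetch row `pos` from the table,
                // unless the index already covers every column the query needs.
                matches.add(new int[] {u, pos});
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int[] userIds = {1, 2, 3, 4};
        int[] infoUserIds = {3, 1, 3, 9};
        List<int[]> matches = join(userIds, buildIndex(infoUserIds));
        System.out.println(matches.size() + " matches, " + indexLookups + " index lookups");
    }
}
```

The driven table is never scanned in full here. Each driving row costs one index probe, plus one row fetch per match when the index does not cover the query.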

3. Block Nested-Loop Join

If the join column has an index, the index method is used for the join. If it has no index, the driven table has to be scanned too many times. On every access to the driven table, its records are loaded into memory and matched against one record taken from the driving table. When the match is finished the memory is cleared, the next record is taken from the driving table, and the driven table's records are loaded into memory again for matching. This repeats over and over and greatly increases the number of I/O operations. The Block Nested-Loop Join method was introduced to reduce the number of I/O operations on the driven table.

Instead of fetching the driving table's rows one by one, this method fetches them block by block. It introduces the join buffer, which caches the relevant columns of a batch of driving-table rows (limited by the size of the join buffer), and then scans the driven table in full. Each record in the driven table is compared with all the driving-table records in the join buffer at once, in memory, so the many comparisons of the simple nested loop are merged into one pass, which reduces how often the driven table is accessed.
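
The block-by-block idea can be sketched as follows. This is a simplified model, not MySQL's code: the join buffer is measured in rows rather than bytes, the tables are plain int arrays of join-column values, and the class name, the bufferRows parameter, and the drivenTableScans counter are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: the join buffer is modelled as a fixed number of rows,
// whereas MySQL sizes it in bytes (join_buffer_size).
class BlockNestedLoopJoinSketch {
    // Full scans of the driven table made by the most recent join() call (hypothetical counter).
    static int drivenTableScans;

    // Returns the matching (driving value, driven value) pairs.
    static List<int[]> join(int[] userIds, int[] infoUserIds, int bufferRows) {
        if (bufferRows <= 0) {
            throw new IllegalArgumentException("bufferRows must be positive");
        }
        drivenTableScans = 0;
        List<int[]> matches = new ArrayList<>();
        for (int start = 0; start < userIds.length; start += bufferRows) {
            // "Fill" the join buffer with the next block of driving rows.
            int end = Math.min(start + bufferRows, userIds.length);
            // Scan the driven table once for the whole block.
            drivenTableScans++;
            for (int infoUserId : infoUserIds) {
                // Compare this driven row with every buffered driving row, in memory.
                for (int i = start; i < end; i++) {
                    if (userIds[i] == infoUserId) {
                        matches.add(new int[] {userIds[i], infoUserId});
                    }
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int[] userIds = new int[10];
        int[] infoUserIds = new int[10];
        for (int i = 0; i < 10; i++) {
            userIds[i] = i + 1;
            infoUserIds[i] = i + 1;
        }
        int matched = join(userIds, infoUserIds, 4).size();
        System.out.println(matched + " matches, " + drivenTableScans + " driven-table scans");
    }
}
```

With 10 driving rows and a buffer of 4 rows, the driven table is scanned 3 times instead of 10. The number of in-memory comparisons stays the same; what shrinks is the number of passes over the driven table.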

Whether the driving table can be loaded in one go depends on whether the join buffer can hold all of its data. By default, join_buffer_size=256k. During a query, the join buffer caches every column that participates in the query, not only the join columns. A statement that joins N tables may allocate up to N-1 join buffers when it runs. So select only the columns you need when querying, so that more rows fit into the join buffer.
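
To see why trimming the selected columns matters, here is a back-of-the-envelope estimate. It is a deliberately rough model: it ignores the per-row overhead of a real join buffer, and the class name, the table size, and the two row widths are hypothetical values chosen for illustration.

```java
// Illustrative estimate only: real join buffers also carry per-row overhead,
// so actual scan counts can differ from these numbers.
class JoinBufferEstimateSketch {
    // Estimated full scans of the driven table for a given buffered row width and buffer size.
    static long estimateDrivenScans(long drivingRows, long bytesPerBufferedRow, long joinBufferBytes) {
        if (bytesPerBufferedRow <= 0 || joinBufferBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        if (drivingRows <= 0) {
            return 0;
        }
        // At least one row is assumed to fit, even if it is wider than the buffer.
        long rowsPerBuffer = Math.max(1, joinBufferBytes / bytesPerBufferedRow);
        return (drivingRows + rowsPerBuffer - 1) / rowsPerBuffer;  // ceiling division
    }

    public static void main(String[] args) {
        long bufferBytes = 256 * 1024;  // the 256k default mentioned above
        long drivingRows = 100_000;     // hypothetical driving-table size
        // Hypothetical row widths: every column selected versus only the needed ones.
        System.out.println("wide rows (512 bytes):  " + estimateDrivenScans(drivingRows, 512, bufferBytes));
        System.out.println("narrow rows (64 bytes): " + estimateDrivenScans(drivingRows, 64, bufferBytes));
    }
}
```

Under these assumptions, the 512-byte rows need 196 passes over the driven table and the 64-byte rows need 25, which is the practical reason for selecting fewer columns or raising join_buffer_size.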

You can check the join buffer size with show variables like '%join_buffer%' and adjust join_buffer_size to suit your actual workload.

Using the Block Nested-Loop Join algorithm requires the block_nested_loop flag of the optimizer_switch system variable to be set to on, which is the default. You can check the block_nested_loop status with show variables like '%optimizer_switch%'.

Understanding these three algorithms is enough. In practice, making good use of indexes gets you most of the way. For joins too, check that the join columns are indexed, and use indexes well to improve query efficiency.

Original address: https://juejin.cn/post/7014105037517357093

Author: Mr. Ji

Statement:
This article is reproduced from 掘金 (Juejin), by 纪先生. In case of any infringement, please contact admin@php.cn for removal.