MySQL advanced learning: in-depth understanding of the three algorithms of join-Mysql Tutorial-php.cn


青灯夜游

Oct 09, 2021 06:43 PM


This article is part of an advanced MySQL series. It explains how a join works and covers the three join algorithms in detail. I hope it is helpful to you!


When a query involves several tables, we often use join to combine them. Joins are not free, though. At heart, a join is a set of nested for loops that match rows table by table, so a careless join, especially one on unindexed columns, can be expensive and should be used with care. Before MySQL 8.0.18, MySQL supported only one join algorithm, Nested-Loop Join. That algorithm has several variants, and they make joins considerably more efficient in practice. Since MySQL 8.0.18 there is also a hash join, which from 8.0.20 takes over the cases Block Nested-Loop Join used to handle.

1. Simple Nested-Loop Join

The Simple Nested-Loop Join algorithm reads one row at a time from the first (driving) table. It passes each row to an inner loop that scans the second (driven) table for rows that match. Take User as the driving table and User_info as the driven table in this query:

select * from User u left join User_info info on u.id = info.user_id

This is the ordinary nested for loop we write every day. In pseudocode:

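for (User u : users) {                   // driving table User
    for (UserInfo info : userInfos) {    // driven table User_info, scanned in full each time
        if (u.id == info.userId) {
            // match found: combine u and info into one result row
        }
    }
    // for a LEFT JOIN, a u with no match is still returned, with NULLs for the User_info columns
}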

It is a simple, brute-force algorithm. For each row taken from the User table, every record in User_info is scanned for a match, and the matching rows are merged and returned.

Suppose the driving table User has 10 rows and the driven table User_info also has 10 rows. The driving table is read only once (10 rows). The driven table, however, is scanned in full 10 times, once per driving row, which means 10 * 10 = 100 row reads and comparisons.

This is very inefficient and puts a heavy load on the database, especially on the driven table. Each pass may have to read the driven table's pages from disk into memory, and I/O is usually the biggest bottleneck.
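To make the 10 * 10 arithmetic concrete, here is a small, self-contained Java sketch of the loop above. It is a toy in-memory model, not MySQL's implementation. The class and method names (SimpleNestedLoopJoin, join, Result) are hypothetical, and rows are plain objects. The sketch also handles the LEFT JOIN case that the pseudocode leaves out: a User with no match is returned with a null User_info. It counts how many times the join condition is evaluated.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model (hypothetical names) of Simple Nested-Loop Join for
//   select * from User u left join User_info info on u.id = info.user_id
// Rows are plain objects here; a real engine reads them from pages.
final class SimpleNestedLoopJoin {

    static final class User {
        final long id;
        User(long id) { this.id = id; }
    }

    static final class UserInfo {
        final long id;      // primary key of User_info
        final long userId;  // join column user_id
        UserInfo(long id, long userId) {
            this.id = id;
            this.userId = userId;
        }
    }

    static final class Row {
        final User user;
        final UserInfo info; // null when the user has no match (LEFT JOIN)
        Row(User user, UserInfo info) {
            this.user = user;
            this.info = info;
        }
    }

    static final class Result {
        final List<Row> rows = new ArrayList<>();
        long comparisons = 0; // evaluations of u.id == info.userId
    }

    static Result join(List<User> users, List<UserInfo> userInfos) {
        Result r = new Result();
        for (User u : users) {                  // driving table: read once
            boolean matched = false;
            for (UserInfo info : userInfos) {   // driven table: full scan per driving row
                r.comparisons++;
                if (u.id == info.userId) {
                    r.rows.add(new Row(u, info));
                    matched = true;
                }
            }
            if (!matched) {
                r.rows.add(new Row(u, null));   // LEFT JOIN keeps the unmatched user
            }
        }
        return r;
    }
}
```

With 10 users and 10 User_info rows, comparisons comes out to 100 no matter how many rows actually match. That is the m * n cost described above.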


2. Index Nested-Loop Join

Index Nested-Loop Join uses an index to cut down the number of rows scanned. It therefore requires an index on the join column of the driven (non-driving) table.

For each row of the driving table (User), MySQL does not scan the driven table. Instead it searches the index on the join column, and when a matching entry is found in the index, it fetches the corresponding row.

If the join column of the driven table (User_info.user_id) is the primary key, the lookup is very efficient. In InnoDB, the leaf nodes of the primary key (clustered) index contain the complete row. If the column has only a secondary (non-primary-key) index, each index match usually has to be followed by a back-to-table lookup: a second search of the clustered index using the primary key ID stored in the secondary index entry. That makes it slower than a primary key lookup.

Either way, the cost becomes roughly one index search per driving row, instead of one full scan of the driven table per driving row.
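The sketch below models Index Nested-Loop Join in Java. It uses java.util.TreeMap as a stand-in for InnoDB B+trees, which is a simplification for illustration only. One map plays the clustered index (primary key to full row). Another plays a secondary index on user_id (user_id to primary keys). The class names and the nickname column are hypothetical. The counters show the cost structure described above: one index search per driving row, plus one back-to-table lookup per secondary-index match.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Toy model (hypothetical names) of Index Nested-Loop Join for
//   User LEFT JOIN User_info ON u.id = info.user_id
// TreeMap stands in for an InnoDB B+tree (an illustrative assumption):
//   clustered: primary key id -> full row (clustered index leaves hold the row)
//   byUserId:  secondary index on user_id -> primary keys (leaves hold only the PK)
final class IndexNestedLoopJoin {

    static final class UserInfo {
        final long id;          // primary key
        final long userId;      // join column, has a secondary index
        final String nickname;  // hypothetical non-indexed column
        UserInfo(long id, long userId, String nickname) {
            this.id = id;
            this.userId = userId;
            this.nickname = nickname;
        }
    }

    static final class Table {
        final TreeMap<Long, UserInfo> clustered = new TreeMap<>();
        final TreeMap<Long, List<Long>> byUserId = new TreeMap<>();

        void insert(UserInfo row) {
            clustered.put(row.id, row);
            byUserId.computeIfAbsent(row.userId, k -> new ArrayList<>()).add(row.id);
        }
    }

    static final class Row {
        final long userId;
        final UserInfo info; // null when the user has no User_info row
        Row(long userId, UserInfo info) {
            this.userId = userId;
            this.info = info;
        }
    }

    static final class Result {
        final List<Row> rows = new ArrayList<>();
        long indexLookups = 0;        // secondary-index searches (one per driving row)
        long backToTableLookups = 0;  // clustered-index searches (one per index match)
    }

    static Result join(List<Long> drivingUserIds, Table userInfo) {
        Result r = new Result();
        for (long uid : drivingUserIds) {            // driving table: read once
            r.indexLookups++;                        // search the index instead of a full scan
            List<Long> pks = userInfo.byUserId.getOrDefault(uid, Collections.emptyList());
            if (pks.isEmpty()) {
                r.rows.add(new Row(uid, null));      // LEFT JOIN keeps the unmatched user
            }
            for (long pk : pks) {
                r.backToTableLookups++;              // fetch the full row by primary key
                r.rows.add(new Row(uid, userInfo.clustered.get(pk)));
            }
        }
        return r;
    }
}
```

Suppose there are 10 users and 9 matching User_info rows. Then indexLookups is 10 and backToTableLookups is 9. If user_id were the primary key, the second step would disappear, because the clustered index lookup already returns the full row.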


The index lookup in the figure above does not always need the back-to-table step. Whether it does depends on whether the columns stored in the index cover every column the query needs (a covering index). For details, please refer to the previous article: Some basic index knowledge and B-tree index knowledge you need to know.
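Whether the back-to-table step is needed can be phrased as a simple set check. The sketch below assumes InnoDB's layout, where each secondary index entry carries the indexed columns plus the primary key columns. It ignores prefix indexes and the other details the real optimizer considers. The class and method names (CoveringIndexCheck, isCovering) are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Rule-of-thumb check (illustrative assumption, not MySQL optimizer code):
// an InnoDB secondary index entry stores its own columns plus the primary key,
// so if those cover every column the query needs, no back-to-table lookup is required.
final class CoveringIndexCheck {

    static boolean isCovering(List<String> indexColumns,
                              List<String> primaryKeyColumns,
                              List<String> neededColumns) {
        // MySQL column names are case-insensitive, so compare in lower case.
        Set<String> available = new HashSet<>();
        for (String c : indexColumns) available.add(c.toLowerCase(Locale.ROOT));
        for (String c : primaryKeyColumns) available.add(c.toLowerCase(Locale.ROOT));
        for (String c : neededColumns) {
            if (!available.contains(c.toLowerCase(Locale.ROOT))) {
                return false; // this column lives only in the clustered index row
            }
        }
        return true;
    }
}
```

Take an index on user_id in a table whose primary key is id. A query that needs only id and user_id is covered. One that also needs a hypothetical nickname column is not, so each match triggers a lookup in the clustered index.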

3. Block Nested-Loop Join (nested-loop join with a join buffer)

If the join column has an index, MySQL usually joins through that index. If it does not, the driven table must be scanned in full for every driving row. For each row taken from the driving table, the driven table's records are read into memory and compared with it. That memory is then discarded, the next driving row is taken, and the driven table is read again. Repeating this over and over greatly increases the number of I/Os. Block Nested-Loop Join was introduced to reduce the I/O on the driven table.

Instead of taking driving-table rows one at a time, BNL takes them a block at a time, using a join buffer:

- The driving table's columns needed by the join are copied into the join buffer until it is full. Its size is limited by join_buffer_size.
- The driven table is then scanned in full.
- Each driven-table row is compared in memory against all the driving rows held in the buffer at once.

The total number of comparisons is unchanged. The difference is that the driven table is read once per buffer-full of driving rows instead of once per driving row. This greatly reduces how often the driven table is accessed.
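Here is a toy Java model of the block-at-a-time idea. For simplicity, the buffer capacity is counted in rows; MySQL sizes the buffer in bytes via join_buffer_size. The model performs an inner join on plain long keys, and its names are hypothetical. It counts both the passes over the driven table and the comparisons, to show which of the two BNL actually reduces.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical names) of Block Nested-Loop Join: driving rows are
// collected into a fixed-size "join buffer" (capacity in rows here, bytes in MySQL),
// and the driven table is scanned once per filled buffer, not once per driving row.
final class BlockNestedLoopJoin {

    static final class Result {
        final List<long[]> matches = new ArrayList<>(); // {drivingKey, drivenKey} pairs
        int drivenTableScans = 0;
        long comparisons = 0;
    }

    static Result join(List<Long> driving, List<Long> driven, int bufferCapacity) {
        if (bufferCapacity <= 0) {
            throw new IllegalArgumentException("bufferCapacity must be positive");
        }
        Result r = new Result();
        List<Long> joinBuffer = new ArrayList<>(bufferCapacity);
        for (int i = 0; i < driving.size(); i++) {
            joinBuffer.add(driving.get(i));
            boolean bufferFull = joinBuffer.size() == bufferCapacity;
            boolean lastRow = i == driving.size() - 1;
            if (bufferFull || lastRow) {
                scanDriven(joinBuffer, driven, r);
                joinBuffer.clear(); // empty the buffer for the next block
            }
        }
        return r;
    }

    // One full pass over the driven table; every driven row is compared in memory
    // against all driving rows currently in the buffer.
    private static void scanDriven(List<Long> joinBuffer, List<Long> driven, Result r) {
        r.drivenTableScans++;
        for (long d : driven) {
            for (long b : joinBuffer) {
                r.comparisons++;
                if (b == d) {
                    r.matches.add(new long[] {b, d});
                }
            }
        }
    }
}
```

Take 10 driving rows, 10 driven rows and room for 4 rows in the buffer. The driven table is scanned 3 times instead of 10, while the comparison count stays at 100. With a capacity of 1 row, the model degenerates into the simple nested loop (10 scans). Matches come out grouped by driven row within each block, so the output order can differ from the simple nested loop.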


Whether the driving table fits into the buffer in one go depends on the buffer's size. By default, join_buffer_size is 256 KB (262144 bytes). The join buffer stores every driving-table column the query needs, not just the join columns (though not whole rows either). A query that joins N tables may allocate up to N-1 join buffers, one for each join that can use a buffer. So select only the columns you actually need: narrower rows mean more rows fit in the join buffer, and the driven table is scanned fewer times.
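A rough estimate shows why narrower rows help. The sketch below assumes every cached driving row takes a fixed number of bytes. It ignores MySQL's per-row bookkeeping and variable-length columns, so treat its numbers as orders of magnitude only. The class name JoinBufferEstimate is hypothetical.

```java
// Back-of-the-envelope model (hypothetical, simplified): fixed bytes per cached
// driving row, no per-row overhead. Estimates how many times BNL scans the driven table.
final class JoinBufferEstimate {

    static long rowsPerBuffer(long joinBufferSizeBytes, long bytesPerRow) {
        if (bytesPerRow <= 0) {
            throw new IllegalArgumentException("bytesPerRow must be positive");
        }
        return joinBufferSizeBytes / bytesPerRow;
    }

    static long drivenTableScans(long drivingRows, long joinBufferSizeBytes, long bytesPerRow) {
        long perBuffer = rowsPerBuffer(joinBufferSizeBytes, bytesPerRow);
        if (perBuffer == 0) {
            throw new IllegalArgumentException("a single row does not fit in the join buffer");
        }
        return (drivingRows + perBuffer - 1) / perBuffer; // ceiling division
    }
}
```

Take the default 262144 bytes and 100 bytes cached per driving row. About 2,621 rows fit per buffer, so 10,000 driving rows need 4 passes over the driven table. Selecting columns that add up to 1,000 bytes per row drops that to 262 rows per buffer and 39 passes.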

You can check the current value with show variables like '%join_buffer%' and adjust join_buffer_size to suit your actual workload.


Block Nested-Loop Join is controlled by the block_nested_loop flag of the optimizer_switch system variable. The flag must be on, which it is by default. You can check its status with show variables like '%optimizer_switch%'. In MySQL 8.0.20 and later, BNL itself has been removed, and the same flag controls the use of hash join instead.
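You may want to check the flag from application code. The value returned by show variables like '%optimizer_switch%' (or select @@optimizer_switch) is a comma-separated list of flag=on/off pairs. The small parser below is a hypothetical helper, not part of any MySQL client library, and it only handles that flag=value format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: parses an optimizer_switch value such as
//   "index_merge=on,batched_key_access=off,block_nested_loop=on"
// into a flag -> enabled map.
final class OptimizerSwitch {

    static Map<String, Boolean> parse(String value) {
        Map<String, Boolean> flags = new LinkedHashMap<>();
        for (String part : value.split(",")) {
            String[] kv = part.trim().split("=", 2);
            if (kv.length == 2) {
                flags.put(kv[0].trim(), "on".equalsIgnoreCase(kv[1].trim()));
            }
        }
        return flags;
    }

    // Treats a missing flag as off.
    static boolean isBlockNestedLoopOn(String value) {
        return parse(value).getOrDefault("block_nested_loop", false);
    }
}
```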


Understanding these three algorithms is enough. In day-to-day work, what matters most is using indexes well. Even for joins, make sure the join columns are indexed, and rely on those indexes to keep queries efficient.

Original address: https://juejin.cn/post/7014105037517357093

Author: Mr. Ji


Statement

This article is reproduced from 掘金 (Juejin), by 纪先生 (Mr. Ji). If there is any infringement, please contact admin@php.cn to have it deleted.
