Home  >  Article  >  Database  >  Detailed introduction to the principle of MySQL JOIN

Detailed introduction to the principle of MySQL JOIN

零下一度
零下一度Original
2017-07-20 15:31:092295browse

one. Overview of Join syntax

join is used to connect fields in multiple tables. The syntax is as follows:

... FROM table1 INNER|LEFT|RIGHT JOIN table2 ON conditiona

table1: left table; table2: right table.

JOIN is roughly divided into the following three categories according to its functions:

INNER JOIN (inner join, or equivalent join): obtains records with a connection matching relationship in two tables.

LEFT JOIN (Left Join): Get the complete records of the left table (table1), that is, there is no corresponding matching record in the right table (table2).

RIGHT JOIN (right join): Contrary to LEFT JOIN, complete records of the right table (table2) are obtained, that is, there is no matching corresponding record in the left table (table1).

Note: mysql does not support Full join, but you can use the UNION keyword to combine LEFT JOIN and RIGHT JOIN to simulate a FULL join.

First look at the two tables in the experiment:

Table comments, the total number of rows is 28856
Table comments_for, the total number of rows is 57, comments_id is indexed, and the ID column is the primary key.
The above two tables are the basis for our testing, and then look at the index. The comments_for table comments_id is indexed, and the ID is the primary key.
Recently, a developer in the company asked me about MySQL JOIN JOIN. In detail, I told you that my understanding of MySQL JOIN is not very deep, so I also checked a lot of documents, and finally posted it on the InsideMySQL official account. I saw two analysis articles about JOIN and felt they were very well written. Let me share my actual test of JOIN. Let's first introduce MySQL's JOIN algorithm, which is divided into three types (source: InsideMySQL):
MySQL only supports one JOIN algorithm, Nested-Loop Join (nested loop link), unlike other businesses The database can support hash links and merge connections, but MySQL's Nested-Loop Join (nested loop link) also has many variants, which can help MySQL perform JOIN operations more efficiently:
(1) Simple Nested- Loop Join (picture taken from InsideMySQL)
This algorithm is relatively simple. R1 is taken from the driver table to match all columns of the S table, and then R2, R3, until all the data in the R table are matched, and then the data are merged. It can be seen that this algorithm requires RN accesses to the S table. Although it is simple, the overhead is still relatively too high
(2) Index Nested-Loop Join, the implementation method is as follows:
Index nested connection Since there are indexes on the non-driven table, when comparing Instead of comparing records one by one, indexes can be used to reduce comparisons, thus speeding up queries. This is one of the main reasons why we usually require the related fields to have indexes when doing related queries.
When this algorithm performs a link query, the driver table will search based on the index of the associated field. When a matching value is found on the index, it will return to the table for query, that is, only when the index is matched. Only then will the table be returned. As for the selection of the driver table, the MySQL optimizer will generally choose the driver table with a small number of records. However, when the SQL is particularly complex, incorrect selections cannot be ruled out.
In the index nested link mode, if the associated key of the non-driven table is the primary key, the performance will be very high. If it is not the primary key, the number of rows returned will be large if the association is , the efficiency will be particularly low because multiple table return operations are required. First associate the index, and then perform the table return operation based on the primary key ID of the secondary index. In this case, the performance will be relatively poor.
(3) Block Nested-Loop Join, implemented as follows:
When there is an index, MySQL will try to use Index Nested -Loop Join algorithm. In some cases, the Join column may not have an index. In this case, MySQL's choice will definitely not be the Simple Nested-Loop Join algorithm introduced first, but will give priority to Block Nested-Loop Join. algorithm.
Compared with Simple Nested-Loop Join, Block Nested-Loop Join has an additional intermediate processing process, which is the join buffer. Use the join buffer to buffer all the query JOIN related columns of the driver table into the JOIN BUFFER, and then Compare batches with non-driven tables. If this is also implemented, multiple comparisons can be merged into one, reducing the access frequency of non-driven tables. That is, the S table only needs to be accessed once. In this way, the non-driven table will not be accessed multiple times, and only in this case will the join buffer be accessed.
In MySQL, we can set the value of the join buffer through the parameter join_buffer_size, and then perform the operation. By default join_buffer_size=256K, MySQL will cache all required columns into the join buffer during search, including the selected columns, instead of just caching the associated columns. In a SQL with N JOIN associations, N-1 join buffers will be allocated during execution.
The above introduction is over, let’s take a look at the specific examples
(1) Full table JOIN
EXPLAIN SELECT * FROM comments gc
JOIN comments_for gcf ON gc.comments_id=gcf.comments_id;

Look at the output information:
You can see that comments_for is used during the full table scan. In the driver table, because the related fields are indexed, a full index scan is performed on the index idx_commentsid to match the non-driven table comments, and one row can be matched each time. At this time, Index Nested-Loop Join is used, and the entire table is matched through the index. We can see that because the magnitude of the comments_for table is much smaller than comments, MySQL gives priority to the small table comments_for as the driving table.
(2) Full table JOIN + filter conditions
SELECT * FROM comments gc
JOIN comments_for gcf ON gc.comments_id=gcf.comments_id
WHERE gc.comments_id =2056

At this time, Index Nested-Loop Join is used. First, filter the primary key of the driver table comments to match one, and then search and match the index idx_commentsid of the non-driven table comments_for. The final matching result is expected to affect one, so that only the non-driver table comments_for is matched. The idx_commentsid index of the driver table performs an access operation, and the efficiency is relatively high.
(3) Take a look at the situation where the associated field has no index:
EXPLAIN SELECT * FROM comments gc
JOIN comments_for gcf ON gc.order_id=gcf.product_id

Let’s take a look at the execution plan:
We can see from the execution plan that this table JOIN uses Block Nested-Loop Join to perform table association. First, comments_for ( Only 57 rows) This small table is used as the driver table, and then the required data of comments_for is cached in the JOIN buffer, and the comments table is scanned in batches, that is, only one match is performed. The premise is that the join buffer is large enough to store the comments_for cache. data.
And we see a very clear prompt in the execution plan: Using where; Using join buffer (Block Nested Loop)
Generally, if this happens, it proves that our SQL needs to be optimized. .
It should be noted that in this case, MySQL will also choose the violent method of Simple Nested-Loop Join. I haven’t understood how it chooses this optimizer, but it usually uses Block Nested. -Loop Join, because CBO is based on overhead, the performance of Block Nested-Loop Join is much better than that of Simple Nested-Loop Join.
(4) Take a look at left join
EXPLAIN SELECT * FROM comments gc
LEFT JOIN comments_for gcf ON gc.comments_id=gcf.comments_id

Look at the execution plan:
In this case, since our related fields are indexed, we call Index Nested-Loop Join, but when there are no filter conditions, the first table will be selected as the driving table. To perform JOIN, to associate the index of the non-driven table to perform Index Nested-Loop Join.
If you add the filter condition gc.comments_id =2056, this will filter out an Index Nested-Loop Join for the non-driven table, which is very efficient.
If it is the following:
EXPLAIN SELECT * FROM comments_for gcf
LEFT JOIN comments gc ON gc.comments_id=gcf.comments_id
WHERE gcf.comments_id =2056

If you filter through the gcf table, the gcf table will be selected by default As a driver table, because it has obviously been filtered, there will be very few matching conditions. For details, you can look at the execution plan:
At this point, the join is basically very clear, to be continued , everyone is welcome to point out errors and I will seriously correct them. . . .

The above is the detailed content of Detailed introduction to the principle of MySQL JOIN. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn