Detailed introduction to the principle of MySQL JOIN-Mysql Tutorial-php.cn

Home

Database

Mysql Tutorial

Detailed introduction to the principle of MySQL JOIN

零下一度

Jul 20, 2017 pm 03:31 PM

joinmysqlprinciple

one. Overview of Join syntax

join is used to connect fields in multiple tables. The syntax is as follows:

... FROM table1 INNER|LEFT|RIGHT JOIN table2 ON conditiona

table1: left table; table2: right table.

JOIN is roughly divided into the following three categories according to its functions:

INNER JOIN (inner join, or equivalent join): obtains records with a connection matching relationship in two tables.

LEFT JOIN (Left Join): Get the complete records of the left table (table1), that is, there is no corresponding matching record in the right table (table2).

RIGHT JOIN (right join): Contrary to LEFT JOIN, complete records of the right table (table2) are obtained, that is, there is no matching corresponding record in the left table (table1).

Note: mysql does not support Full join, but you can use the UNION keyword to combine LEFT JOIN and RIGHT JOIN to simulate a FULL join.

First look at the two tables in the experiment:

Table comments, the total number of rows is 28856

Table comments_for, the total number of rows is 57, comments_id is indexed, and the ID column is the primary key.

The above two tables are the basis for our testing, and then look at the index. The comments_for table comments_id is indexed, and the ID is the primary key.

Recently, a developer in the company asked me about MySQL JOIN JOIN. In detail, I told you that my understanding of MySQL JOIN is not very deep, so I also checked a lot of documents, and finally posted it on the InsideMySQL official account. I saw two analysis articles about JOIN and felt they were very well written. Let me share my actual test of JOIN. Let's first introduce MySQL's JOIN algorithm, which is divided into three types (source: InsideMySQL):

MySQL only supports one JOIN algorithm, Nested-Loop Join (nested loop link), unlike other businesses The database can support hash links and merge connections, but MySQL's Nested-Loop Join (nested loop link) also has many variants, which can help MySQL perform JOIN operations more efficiently:

(1) Simple Nested- Loop Join (picture taken from InsideMySQL)

This algorithm is relatively simple. R1 is taken from the driver table to match all columns of the S table, and then R2, R3, until all the data in the R table are matched, and then the data are merged. It can be seen that this algorithm requires RN accesses to the S table. Although it is simple, the overhead is still relatively too high

(2) Index Nested-Loop Join, the implementation method is as follows:

Index nested connection Since there are indexes on the non-driven table, when comparing Instead of comparing records one by one, indexes can be used to reduce comparisons, thus speeding up queries. This is one of the main reasons why we usually require the related fields to have indexes when doing related queries.

When this algorithm performs a link query, the driver table will search based on the index of the associated field. When a matching value is found on the index, it will return to the table for query, that is, only when the index is matched. Only then will the table be returned. As for the selection of the driver table, the MySQL optimizer will generally choose the driver table with a small number of records. However, when the SQL is particularly complex, incorrect selections cannot be ruled out.

In the index nested link mode, if the associated key of the non-driven table is the primary key, the performance will be very high. If it is not the primary key, the number of rows returned will be large if the association is , the efficiency will be particularly low because multiple table return operations are required. First associate the index, and then perform the table return operation based on the primary key ID of the secondary index. In this case, the performance will be relatively poor.

(3) Block Nested-Loop Join, implemented as follows:

When there is an index, MySQL will try to use Index Nested -Loop Join algorithm. In some cases, the Join column may not have an index. In this case, MySQL's choice will definitely not be the Simple Nested-Loop Join algorithm introduced first, but will give priority to Block Nested-Loop Join. algorithm.

Compared with Simple Nested-Loop Join, Block Nested-Loop Join has an additional intermediate processing process, which is the join buffer. Use the join buffer to buffer all the query JOIN related columns of the driver table into the JOIN BUFFER, and then Compare batches with non-driven tables. If this is also implemented, multiple comparisons can be merged into one, reducing the access frequency of non-driven tables. That is, the S table only needs to be accessed once. In this way, the non-driven table will not be accessed multiple times, and only in this case will the join buffer be accessed.

In MySQL, we can set the value of the join buffer through the parameter join_buffer_size, and then perform the operation. By default join_buffer_size=256K, MySQL will cache all required columns into the join buffer during search, including the selected columns, instead of just caching the associated columns. In a SQL with N JOIN associations, N-1 join buffers will be allocated during execution.

The above introduction is over, let’s take a look at the specific examples

(1) Full table JOIN

EXPLAIN SELECT * FROM comments gc
JOIN comments_for gcf ON gc.comments_id=gcf.comments_id;

Look at the output information:

You can see that comments_for is used during the full table scan. In the driver table, because the related fields are indexed, a full index scan is performed on the index idx_commentsid to match the non-driven table comments, and one row can be matched each time. At this time, Index Nested-Loop Join is used, and the entire table is matched through the index. We can see that because the magnitude of the comments_for table is much smaller than comments, MySQL gives priority to the small table comments_for as the driving table.

(2) Full table JOIN + filter conditions

SELECT * FROM comments gc
JOIN comments_for gcf ON gc.comments_id=gcf.comments_id
WHERE gc.comments_id =2056

At this time, Index Nested-Loop Join is used. First, filter the primary key of the driver table comments to match one, and then search and match the index idx_commentsid of the non-driven table comments_for. The final matching result is expected to affect one, so that only the non-driver table comments_for is matched. The idx_commentsid index of the driver table performs an access operation, and the efficiency is relatively high.

(3) Take a look at the situation where the associated field has no index:

EXPLAIN SELECT * FROM comments gc
JOIN comments_for gcf ON gc.order_id=gcf.product_id

Let’s take a look at the execution plan:

We can see from the execution plan that this table JOIN uses Block Nested-Loop Join to perform table association. First, comments_for ( Only 57 rows) This small table is used as the driver table, and then the required data of comments_for is cached in the JOIN buffer, and the comments table is scanned in batches, that is, only one match is performed. The premise is that the join buffer is large enough to store the comments_for cache. data.

And we see a very clear prompt in the execution plan: Using where; Using join buffer (Block Nested Loop)

Generally, if this happens, it proves that our SQL needs to be optimized. .

It should be noted that in this case, MySQL will also choose the violent method of Simple Nested-Loop Join. I haven’t understood how it chooses this optimizer, but it usually uses Block Nested. -Loop Join, because CBO is based on overhead, the performance of Block Nested-Loop Join is much better than that of Simple Nested-Loop Join.

(4) Take a look at left join

EXPLAIN SELECT * FROM comments gc
LEFT JOIN comments_for gcf ON gc.comments_id=gcf.comments_id

Look at the execution plan:

In this case, since our related fields are indexed, we call Index Nested-Loop Join, but when there are no filter conditions, the first table will be selected as the driving table. To perform JOIN, to associate the index of the non-driven table to perform Index Nested-Loop Join.

If you add the filter condition gc.comments_id =2056, this will filter out an Index Nested-Loop Join for the non-driven table, which is very efficient.

If it is the following:

EXPLAIN SELECT * FROM comments_for gcf
LEFT JOIN comments gc ON gc.comments_id=gcf.comments_id
WHERE gcf.comments_id =2056

If you filter through the gcf table, the gcf table will be selected by default As a driver table, because it has obviously been filtered, there will be very few matching conditions. For details, you can look at the execution plan:

At this point, the join is basically very clear, to be continued , everyone is welcome to point out errors and I will seriously correct them. . . .

The above is the detailed content of Detailed introduction to the principle of MySQL JOIN. For more information, please follow other related articles on the PHP Chinese website!

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Explain the ACID properties (Atomicity, Consistency, Isolation, Durability).Apr 16, 2025 am 12:20 AM

ACID attributes include atomicity, consistency, isolation and durability, and are the cornerstone of database design. 1. Atomicity ensures that the transaction is either completely successful or completely failed. 2. Consistency ensures that the database remains consistent before and after a transaction. 3. Isolation ensures that transactions do not interfere with each other. 4. Persistence ensures that data is permanently saved after transaction submission.

MySQL: Database Management System vs. Programming LanguageApr 16, 2025 am 12:19 AM

MySQL is not only a database management system (DBMS) but also closely related to programming languages. 1) As a DBMS, MySQL is used to store, organize and retrieve data, and optimizing indexes can improve query performance. 2) Combining SQL with programming languages, embedded in Python, using ORM tools such as SQLAlchemy can simplify operations. 3) Performance optimization includes indexing, querying, caching, library and table division and transaction management.

MySQL: Managing Data with SQL CommandsApr 16, 2025 am 12:19 AM

MySQL uses SQL commands to manage data. 1. Basic commands include SELECT, INSERT, UPDATE and DELETE. 2. Advanced usage involves JOIN, subquery and aggregate functions. 3. Common errors include syntax, logic and performance issues. 4. Optimization tips include using indexes, avoiding SELECT* and using LIMIT.

MySQL's Purpose: Storing and Managing Data EffectivelyApr 16, 2025 am 12:16 AM

MySQL is an efficient relational database management system suitable for storing and managing data. Its advantages include high-performance queries, flexible transaction processing and rich data types. In practical applications, MySQL is often used in e-commerce platforms, social networks and content management systems, but attention should be paid to performance optimization, data security and scalability.

SQL and MySQL: Understanding the RelationshipApr 16, 2025 am 12:14 AM

The relationship between SQL and MySQL is the relationship between standard languages and specific implementations. 1.SQL is a standard language used to manage and operate relational databases, allowing data addition, deletion, modification and query. 2.MySQL is a specific database management system that uses SQL as its operating language and provides efficient data storage and management.

Explain the role of InnoDB redo logs and undo logs.Apr 15, 2025 am 12:16 AM

InnoDB uses redologs and undologs to ensure data consistency and reliability. 1.redologs record data page modification to ensure crash recovery and transaction persistence. 2.undologs records the original data value and supports transaction rollback and MVCC.

What are the key metrics to look for in an EXPLAIN output (type, key, rows, Extra)?Apr 15, 2025 am 12:15 AM

Key metrics for EXPLAIN commands include type, key, rows, and Extra. 1) The type reflects the access type of the query. The higher the value, the higher the efficiency, such as const is better than ALL. 2) The key displays the index used, and NULL indicates no index. 3) rows estimates the number of scanned rows, affecting query performance. 4) Extra provides additional information, such as Usingfilesort prompts that it needs to be optimized.

What is the Using temporary status in EXPLAIN and how to avoid it?Apr 15, 2025 am 12:14 AM

Usingtemporary indicates that the need to create temporary tables in MySQL queries, which are commonly found in ORDERBY using DISTINCT, GROUPBY, or non-indexed columns. You can avoid the occurrence of indexes and rewrite queries and improve query performance. Specifically, when Usingtemporary appears in EXPLAIN output, it means that MySQL needs to create temporary tables to handle queries. This usually occurs when: 1) deduplication or grouping when using DISTINCT or GROUPBY; 2) sort when ORDERBY contains non-index columns; 3) use complex subquery or join operations. Optimization methods include: 1) ORDERBY and GROUPB

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)

4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

R.E.P.O. Best Graphic Settings

4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Assassin's Creed Shadows: Seashell Riddle Solution

2 weeks agoByDDD

R.E.P.O. How to Fix Audio if You Can't Hear Anyone

1 months agoBy尊渡假赌尊渡假赌尊渡假赌

R.E.P.O. Chat Commands and How to Use Them

1 months agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Atom editor mac version download

The most popular open source editor

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.