Briefly talk about join query in MySQL-Mysql Tutorial-php.cn

Home

Database

Mysql Tutorial

Briefly talk about join query in MySQL

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Nov 03, 2022 pm 04:49 PM

mysql

This article brings you relevant knowledge about mysql, which mainly introduces related issues about join query. Let’s take a look at it together. I hope it will be helpful to everyone.

Recommended learning: mysql video tutorial

The impact of index on join query

Data preparation

Assume that there are two tables t1 and t2. Both tables have primary key index id and index fields a and b fields without indexes. Then insert 100 rows of data into the t1 table and into the t2 table. Insert 1000 rows of data for experiment

CREATE TABLE `t2` (
 `id` int NOT NULL,
 `a` int DEFAULT NULL,
 `b` int DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `t2_a_index` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

CREATE PROCEDURE **idata**()
BEGIN
  DECLARE i INT;
  SET i = 1;
  WHILE (i <h3 id="strong-There-is-an-index-query-process-strong"><strong>There is an index query process</strong></h3><p>We use the query SELECT * FROM t1 STRAIGHT_JOIN t2 ON (t1.a=t2.a); because The join query MYSQL optimizer may not be able to execute according to our wishes, so for analysis we choose to use STRAIGHT_JOIN instead to observe more intuitively</p><p><briefly talk about join query in mysql src="https://Briefly%20talk%20about%20join%20query%20in%20MySQL.php.cn/upload/article/000/000/067/64b9ac3a02f2e5095eba59bdfbcd23d6-2.png" alt="图 1" loading="lazy"></briefly> Figure 1</p><p>You can see We use t1 as the driving table and t2 as the driven table. The explanation in the above figure shows that this query uses the field a index of the t2 table, so the execution process of this statement should be as follows: </p>

Read a row of data r from table t1
Get field a from data r to table t2 for matching
Take out the rows that meet the conditions in the t2 table, and form a row with r as part of the result set
Repeat steps 1-3 until table t1 loops data

This process is called Index Nested-Loop Join. In this process, the driver table t1 performs a full table scan. Because we inserted 100 rows of data into the t1 table, this scan row The number is 100, and when performing a join query, each row of the t1 table needs to be searched in the t2 table. An index tree search is used. Because the data we construct are in one-to-one correspondence, each search only scans One row, that is, the t2 table also scans a total of 100 rows. The total number of rows scanned during the entire query process is 100 100=200 rows.

No index query process

SELECT * FROM t1 STRAIGHT_JOIN t2 ON (t1.a = t2.b);

Figure 2

It can be seen that since there is no index on field B of t2 table, according to the above SQL During execution, a full table scan must be performed every time t1 matches t2. In this way, t2 must be scanned up to 100 times, and the total number of scans is 100*1000 = 100,000 rows.

Of course, this query result is still based on the fact that the two tables we built are small. If it is a table with 100,000 rows, 10 billion rows will need to be scanned. This is terrible!

2. UnderstandBlock Nested-Loop Join

##Block Nested-Loop JoinQuery process

So there is no index on the driven table, how did all this happen?

In fact, when there is no available index on the driven table, the algorithm process is as follows:

Read the data of t1 into the thread memory join_buffer, because we wrote above is select * from, so it is equivalent to putting the entire t1 table into the memory;
The process of scanning t2 is actually taking out each row of t2 and matching it with the rows in join_buffer The data is compared and those that meet the join conditions are returned as part of the result set.

So combined with the Extra part in Figure 2 to explain Using join buffer, we can find this clue. During the whole process, both tables t1 and t2 were done. A full table scan, so the number of rows scanned is 100 1000 = 1100 rows. Because join_buffer is organized in an unordered array, so for each row in table t2, 100 judgments must be made, and a total of 100 judgments must be made in memory. The number of judgments is 100*1000=100,000 times, but because these 100,000 times occur in memory, the speed is much faster and the performance is better.

Join_buffer

As we know from the above, without an index, MySQL reads the data into the memory for loop judgment, so this memory is definitely not unlimited. You use, then we need to use a parameter join_buffer_size, the value default size is 256k, as shown below:

SHOW VARIABLES LIKE '%join_buffer_size%';

Figure 4

If query The data is too large to be loaded at once, and only part of the data (80 items) can be loaded. Then the query process becomes as follows

扫描表 t1，顺序读取数据行放入 join_buffer 中，直至加载完第 80 行满了
扫描表 t2，把 t2 表中的每一行取出来跟 join_buffer 中的数据做对比，将满足条件的数据作为结果集的一部分返回
清空 join_buffer
继续扫描表 t1，顺序读取剩余的数据行放入 join_buffer 中，执行步骤 2

这个流程体现了算法名称中 Block 的由来，分块 join，可以看出虽然查询过程中 t1 被分成了两次放入 join_buffer 中，导致 t2 表被扫描了 2次，但是判断等值条件的次数还是不变的，依然是（80+20）*1000=10 万次。

所以这就是有时候 join 查询很慢，有些大佬会让你把 join_buffer_size 调大的原因。

如何正确的写出 join 查询

驱动表的选择

有索引的情况下

在这个 join 语句执行过程中，驱动表是走全表扫描，而被驱动表是走树搜索。

假设被驱动表的行数是 M，每次在被驱动表查询一行数据，先要走索引 a，再搜索主键索引。每次搜索一棵树近似复杂度是以 2为底的 M的对数，记为 log2M，所以在被驱动表上查询一行数据的时间复杂度是 2*log2M。

假设驱动表的行数是 N，执行过程就要扫描驱动表 N 行，然后对于每一行，到被驱动表上匹配一次。因此整个执行过程，近似复杂度是 N + N2log2M。显然，N 对扫描行数的影响更大，因此应该让小表来做驱动表。

那没有索引的情况

上述我知道了，因为 join_buffer 因为存在限制，所以查询的过程可能存在多次加载 join_buffer，但是判断的次数都是 10 万次，这种情况下应该怎么选择？

假设，驱动表的数据行数是 N，需要分 K 段才能完成算法流程，被驱动表的数据行数是 M。这里的 K不是常数，N 越大 K就越大，因此把 K 表示为λ*N，显然λ的取值范围是 (0,1)。

扫描的行数就变成了 N+λNM，显然内存的判断次数是不受哪个表作为驱动表而影响的，而考虑到扫描行数，在 M和 N大小确定的情况下，N 小一些，整个算是的结果会更小，所以应该让小表作为驱动表

总结：真相大白了，不管是有索引还是无索引参与 join 查询的情况下都应该是使用小表作为驱动表。

什么是小表

还是以上面表 t1 和表 t2 为例子：

SELECT * FROM t1 STRAIGHT_JOIN t2 ON t1.b = t2.b WHERE t2.id <p>上面这两条 SQL 我们加上了条件 t2.id </p><p>再看另一组：</p><pre class="brush:php;toolbar:false">SELECT t1.b,t2.* FROM t1 STRAIGHT_JOIN t2 ON t1.b = t2.b WHERE t2.id <p>这个例子里，表 t1 和 t2 都是只有 100 行参加 join。 但是，这两条语句每次查询放入 join_buffer 中的数据是不一样的: 表 t1 只查字段 b，因此如果把 t1 放到 join_buffer 中，只需要放入字段 b 的值; 表 t2 需要查所有的字段，因此如果把表 t2 放到 join_buffer 中的话，就需要放入三个字 段 id、a 和 b。</p><p>这里，我们应该选择表 t1 作为驱动表。也就是说在这个例子里，”只需要一列参与 join 的 表 t1“是那个相对小的表。</p><p>结论：</p><p>在决定哪个表做驱动表的时候，应该是两个表按照各自的条件过滤，过 滤完成之后，计算参与 join 的各个字段的总数据量，数据量小的那个表，就是“小表”， 应该作为驱动表。</p><p>推荐学习：<a href="https://www.php.cn/course/list/51.html" target="_blank" textvalue="mysql视频教程">mysql视频教程</a></p>

The above is the detailed content of Briefly talk about join query in MySQL. For more information, please follow other related articles on the PHP Chinese website!

Statement

This article is reproduced at:掘金. If there is any infringement, please contact admin@php.cn delete

MySQL's Place: Databases and ProgrammingApr 13, 2025 am 12:18 AM

MySQL's position in databases and programming is very important. It is an open source relational database management system that is widely used in various application scenarios. 1) MySQL provides efficient data storage, organization and retrieval functions, supporting Web, mobile and enterprise-level systems. 2) It uses a client-server architecture, supports multiple storage engines and index optimization. 3) Basic usages include creating tables and inserting data, and advanced usages involve multi-table JOINs and complex queries. 4) Frequently asked questions such as SQL syntax errors and performance issues can be debugged through the EXPLAIN command and slow query log. 5) Performance optimization methods include rational use of indexes, optimized query and use of caches. Best practices include using transactions and PreparedStatemen

MySQL: From Small Businesses to Large EnterprisesApr 13, 2025 am 12:17 AM

MySQL is suitable for small and large enterprises. 1) Small businesses can use MySQL for basic data management, such as storing customer information. 2) Large enterprises can use MySQL to process massive data and complex business logic to optimize query performance and transaction processing.

What are phantom reads and how does InnoDB prevent them (Next-Key Locking)?Apr 13, 2025 am 12:16 AM

InnoDB effectively prevents phantom reading through Next-KeyLocking mechanism. 1) Next-KeyLocking combines row lock and gap lock to lock records and their gaps to prevent new records from being inserted. 2) In practical applications, by optimizing query and adjusting isolation levels, lock competition can be reduced and concurrency performance can be improved.

MySQL: Not a Programming Language, But...Apr 13, 2025 am 12:03 AM

MySQL is not a programming language, but its query language SQL has the characteristics of a programming language: 1. SQL supports conditional judgment, loops and variable operations; 2. Through stored procedures, triggers and functions, users can perform complex logical operations in the database.

MySQL: An Introduction to the World's Most Popular DatabaseApr 12, 2025 am 12:18 AM

MySQL is an open source relational database management system, mainly used to store and retrieve data quickly and reliably. Its working principle includes client requests, query resolution, execution of queries and return results. Examples of usage include creating tables, inserting and querying data, and advanced features such as JOIN operations. Common errors involve SQL syntax, data types, and permissions, and optimization suggestions include the use of indexes, optimized queries, and partitioning of tables.

The Importance of MySQL: Data Storage and ManagementApr 12, 2025 am 12:18 AM

MySQL is an open source relational database management system suitable for data storage, management, query and security. 1. It supports a variety of operating systems and is widely used in Web applications and other fields. 2. Through the client-server architecture and different storage engines, MySQL processes data efficiently. 3. Basic usage includes creating databases and tables, inserting, querying and updating data. 4. Advanced usage involves complex queries and stored procedures. 5. Common errors can be debugged through the EXPLAIN statement. 6. Performance optimization includes the rational use of indexes and optimized query statements.

Why Use MySQL? Benefits and AdvantagesApr 12, 2025 am 12:17 AM

MySQL is chosen for its performance, reliability, ease of use, and community support. 1.MySQL provides efficient data storage and retrieval functions, supporting multiple data types and advanced query operations. 2. Adopt client-server architecture and multiple storage engines to support transaction and query optimization. 3. Easy to use, supports a variety of operating systems and programming languages. 4. Have strong community support and provide rich resources and solutions.

Describe InnoDB locking mechanisms (shared locks, exclusive locks, intention locks, record locks, gap locks, next-key locks).Apr 12, 2025 am 12:16 AM

InnoDB's lock mechanisms include shared locks, exclusive locks, intention locks, record locks, gap locks and next key locks. 1. Shared lock allows transactions to read data without preventing other transactions from reading. 2. Exclusive lock prevents other transactions from reading and modifying data. 3. Intention lock optimizes lock efficiency. 4. Record lock lock index record. 5. Gap lock locks index recording gap. 6. The next key lock is a combination of record lock and gap lock to ensure data consistency.

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

R.E.P.O. Best Graphic Settings

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Assassin's Creed Shadows: Seashell Riddle Solution

2 weeks agoByDDD

R.E.P.O. How to Fix Audio if You Can't Hear Anyone

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

WWE 2K25: How To Unlock Everything In MyRise

4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.