This article brings you relevant knowledge about mysql, which mainly introduces related issues about join query. Let’s take a look at it together. I hope it will be helpful to everyone.
Recommended learning: mysql video tutorial
Assume that there are two tables t1 and t2. Both tables have primary key index id and index fields a and b fields without indexes. Then insert 100 rows of data into the t1 table and into the t2 table. Insert 1000 rows of data for experiment
CREATE TABLE `t2` ( `id` int NOT NULL, `a` int DEFAULT NULL, `b` int DEFAULT NULL, PRIMARY KEY (`id`), KEY `t2_a_index` (`a`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci; CREATE PROCEDURE **idata**() BEGIN DECLARE i INT; SET i = 1; WHILE (i <h3 data-id="heading-3"><strong>There is an index query process</strong></h3><p>We use the query SELECT * FROM t1 STRAIGHT_JOIN t2 ON (t1.a=t2.a); because The join query MYSQL optimizer may not be able to execute according to our wishes, so for analysis we choose to use STRAIGHT_JOIN instead to observe more intuitively</p><p><briefly talk about join query in mysql src="https://Briefly%20talk%20about%20join%20query%20in%20MySQL.php.cn/upload/article/000/000/067/64b9ac3a02f2e5095eba59bdfbcd23d6-2.png" alt="图 1" loading="lazy"></briefly> Figure 1</p><p>You can see We use t1 as the driving table and t2 as the driven table. The explanation in the above figure shows that this query uses the field a index of the t2 table, so the execution process of this statement should be as follows: </p>
Read a row of data r from table t1
Get field a from data r to table t2 for matching
Take out the rows that meet the conditions in the t2 table, and form a row with r as part of the result set
Repeat steps 1-3 until table t1 loops data
This process is called Index Nested-Loop Join. In this process, the driver table t1 performs a full table scan. Because we inserted 100 rows of data into the t1 table, this scan row The number is 100, and when performing a join query, each row of the t1 table needs to be searched in the t2 table. An index tree search is used. Because the data we construct are in one-to-one correspondence, each search only scans One row, that is, the t2 table also scans a total of 100 rows. The total number of rows scanned during the entire query process is 100 100=200 rows.
SELECT * FROM t1 STRAIGHT_JOIN t2 ON (t1.a = t2.b);
It can be seen that since there is no index on field B of t2 table, according to the above SQL During execution, a full table scan must be performed every time t1 matches t2. In this way, t2 must be scanned up to 100 times, and the total number of scans is 100*1000 = 100,000 rows.
Of course, this query result is still based on the fact that the two tables we built are small. If it is a table with 100,000 rows, 10 billion rows will need to be scanned. This is terrible!
SHOW VARIABLES LIKE '%join_buffer_size%';
扫描表 t1,顺序读取数据行放入 join_buffer 中,直至加载完第 80 行满了
扫描表 t2,把 t2 表中的每一行取出来跟 join_buffer 中的数据做对比,将满足条件的数据作为结果集的一部分返回
清空 join_buffer
继续扫描表 t1,顺序读取剩余的数据行放入 join_buffer 中,执行步骤 2
这个流程体现了算法名称中 Block 的由来,分块 join,可以看出虽然查询过程中 t1 被分成了两次放入 join_buffer 中,导致 t2 表被扫描了 2次,但是判断等值条件的次数还是不变的,依然是(80+20)*1000=10 万次。
所以这就是有时候 join 查询很慢,有些大佬会让你把 join_buffer_size 调大的原因。
有索引的情况下
在这个 join 语句执行过程中,驱动表是走全表扫描,而被驱动表是走树搜索。
假设被驱动表的行数是 M,每次在被驱动表查询一行数据,先要走索引 a,再搜索主键索引。每次搜索一棵树近似复杂度是以 2为底的 M的对数,记为 log2M,所以在被驱动表上查询一行数据的时间复杂度是 2*log2M。
假设驱动表的行数是 N,执行过程就要扫描驱动表 N 行,然后对于每一行,到被驱动表上 匹配一次。因此整个执行过程,近似复杂度是 N + N2log2M。显然,N 对扫描行数的影响更大,因此应该让小表来做驱动表。
那没有索引的情况
上述我知道了,因为 join_buffer 因为存在限制,所以查询的过程可能存在多次加载 join_buffer,但是判断的次数都是 10 万次,这种情况下应该怎么选择?
假设,驱动表的数据行数是 N,需要分 K 段才能完成算法流程,被驱动表的数据行数是 M。这里的 K不是常数,N 越大 K就越大,因此把 K 表示为λ*N,显然λ的取值范围 是 (0,1)。
扫描的行数就变成了 N+λNM,显然内存的判断次数是不受哪个表作为驱动表而影响的,而考虑到扫描行数,在 M和 N大小确定的情况下,N 小一些,整个算是的结果会更小,所以应该让小表作为驱动表
总结:真相大白了,不管是有索引还是无索引参与 join 查询的情况下都应该是使用小表作为驱动表。
还是以上面表 t1 和表 t2 为例子:
SELECT * FROM t1 STRAIGHT_JOIN t2 ON t1.b = t2.b WHERE t2.id <p>上面这两条 SQL 我们加上了条件 t2.id </p><p>再看另一组:</p><pre class="brush:php;toolbar:false">SELECT t1.b,t2.* FROM t1 STRAIGHT_JOIN t2 ON t1.b = t2.b WHERE t2.id <p>这个例子里,表 t1 和 t2 都是只有 100 行参加 join。 但是,这两条语句每次查询放入 join_buffer 中的数据是不一样的: 表 t1 只查字段 b,因此如果把 t1 放到 join_buffer 中,只需要放入字段 b 的值; 表 t2 需要查所有的字段,因此如果把表 t2 放到 join_buffer 中的话,就需要放入三个字 段 id、a 和 b。</p><p>这里,我们应该选择表 t1 作为驱动表。也就是说在这个例子里,”只需要一列参与 join 的 表 t1“是那个相对小的表。</p><p>结论:</p><p>在决定哪个表做驱动表的时候,应该是两个表按照各自的条件过滤,过 滤完成之后,计算参与 join 的各个字段的总数据量,数据量小的那个表,就是“小表”, 应该作为驱动表。</p><p>推荐学习:<a href="https://www.php.cn/course/list/51.html" target="_blank" textvalue="mysql视频教程">mysql视频教程</a></p>
The above is the detailed content of Briefly talk about join query in MySQL. For more information, please follow other related articles on the PHP Chinese website!