
In-depth understanding of the join statement algorithm and optimization methods in MySQL

青灯夜游
2021-08-27 18:59:21

This article explains the algorithms MySQL uses to execute join statements and introduces ways to optimize them.


1. Join statement algorithm

Create two tables t1 and t2

CREATE TABLE `t2` (
  `id` int(11) NOT NULL,
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `a` (`a`)
) ENGINE=InnoDB;

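-- Change the delimiter so that the semicolons inside the procedure body do not end the statement
DELIMITER ;;
CREATE DEFINER=`root`@`%` PROCEDURE `idata`()
BEGIN
  declare i int;
  set i=1;
  while(i<=1000)do
    insert into t2 values(i, i, i);
    set i=i+1;
  end while;
END;;
DELIMITER ;

-- Fill t2 with 1000 rows, then copy the first 100 rows into t1
CALL idata();

create table t1 like t2;
insert into t1 (select * from t2 where id<=100);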

Both tables have a primary key index on id and a secondary index on a; field b has no index. The stored procedure idata() inserts 1000 rows into table t2, and the first 100 of those rows are then copied into table t1

1. Index Nested-Loop Join

select * from t1 straight_join t2 on (t1.a=t2.a);

If you use the join statement directly, the MySQL optimizer may select either table t1 or table t2 as the driving table. Writing straight_join instead makes MySQL join the tables in the order they are written, so in this statement t1 is the driving table and t2 is the driven table

There is an index on field a of driven table t2, and the join process uses this index, so the execution flow of this statement is as follows:

1. Read a row of data R from table t1

2. Take the a field from row R and look it up in table t2

3. Take the rows in table t2 that meet the condition and combine each of them with R as a row of the result set

4. Repeat steps 1 to 3 until the end of table t1 is reached

This process can use the index of the driven table, so it is called Index Nested-Loop Join, or NLJ for short

In this process:

1. A full table scan is performed on the driving table t1, which scans 100 rows

2. For each row R, table t2 is searched by the a field using an index tree search. Since the data we constructed has a one-to-one correspondence, each search scans only one row, for a total of 100 rows

3. Therefore, the total number of scanned rows in the entire execution process is 200
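The flow and the row count above can be checked with a small model. This is only a sketch: plain Python lists stand in for the tables, and a dictionary stands in for the index on t2.a; none of these names come from MySQL itself.

```python
from collections import defaultdict

# Same data as the article: t2 has 1000 rows (id, a, b), t1 holds the first 100.
t2 = [(i, i, i) for i in range(1, 1001)]
t1 = [row for row in t2 if row[0] <= 100]

# Assumed stand-in for the secondary index on t2.a: value of a -> matching rows.
t2_index_a = defaultdict(list)
for row in t2:
    t2_index_a[row[1]].append(row)

def index_nested_loop_join(driving, driven_index):
    """Join on driving.a = driven.a, probing the driven table through its index."""
    result, scanned = [], 0
    for r in driving:                              # step 1: read one row R from t1
        scanned += 1
        for match in driven_index.get(r[1], []):   # step 2: look up R.a in t2
            scanned += 1
            result.append(r + match)               # step 3: combine into a result row
    return result, scanned

rows, scanned = index_nested_loop_join(t1, t2_index_a)
print(len(rows), scanned)  # prints: 100 200
```

The counter only tracks rows touched; the cost of walking the index tree for each lookup is not modelled.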

If join is not used, the same result has to be built from single-table queries:

1. Execute select * from t1 to find out all the data in table t1. There are 100 rows here

2. Loop through these 100 rows of data:

  • Take the value of field a from each row R, denoted $R.a
  • Execute select * from t2 where a=$R.a
  • Combine the returned result with R to form a row of the result set

This process also scans 200 rows, but it executes 101 statements in total, which is 100 more round trips than a direct join. The client also has to assemble the SQL statements and the results itself. This is worse than joining directly
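The statement count can be reproduced with a short client-side sketch. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for a MySQL connection (an assumption made so the example runs without a server); the index name idx_a is also made up for this illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table t2 (id int primary key, a int, b int)")
conn.execute("create index idx_a on t2 (a)")
conn.executemany("insert into t2 values (?, ?, ?)",
                 [(i, i, i) for i in range(1, 1001)])
conn.execute("create table t1 as select * from t2 where id <= 100")

statements = 0
result = []

# Statement 1: fetch every row of t1.
driving_rows = conn.execute("select * from t1").fetchall()
statements += 1

# Statements 2..101: one single-table query per row R.
for r in driving_rows:
    matches = conn.execute("select * from t2 where a = ?", (r[1],)).fetchall()
    statements += 1
    for match in matches:
        result.append(r + match)   # the client splices the rows together itself

print(len(result), statements)  # prints: 100 101
```

Against a real server, each of those 101 statements would also be a network round trip, which is the overhead the paragraph above refers to.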


In the case where the index of the driven table can be used:

  • Using the join statement performs better than splitting the query into multiple single-table SQL statements
  • If you use the join statement, the small table should be the driving table

2. Simple Nested-Loop Join

select * from t1 straight_join t2 on (t1.a=t2.b);

Since there is no index on field b of table t2, a full table scan must be done every time t2 is used for matching. This algorithm is called Simple Nested-Loop Join

Calculated this way, this SQL request scans table t2 up to 100 times, for a total of 100*1000=100,000 rows
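The figure of 100,000 rows can be reproduced with a naive double loop. This is a sketch with Python lists standing in for the tables, not a description of any code path inside MySQL.

```python
# Same data as the article: t2 has 1000 rows (id, a, b), t1 holds the first 100.
t2 = [(i, i, i) for i in range(1, 1001)]
t1 = [row for row in t2 if row[0] <= 100]

def simple_nested_loop_join(driving, driven):
    """Join on driving.a = driven.b without any index on the driven table."""
    result, driven_rows_scanned = [], 0
    for r in driving:
        for s in driven:              # no index on t2.b: full scan for every row R
            driven_rows_scanned += 1
            if r[1] == s[2]:          # t1.a = t2.b
                result.append(r + s)
    return result, driven_rows_scanned

rows, driven_rows_scanned = simple_nested_loop_join(t1, t2)
print(len(rows), driven_rows_scanned)  # prints: 100 100000
```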

MySQL does not use this Simple Nested-Loop Join algorithm, but uses another algorithm called Block Nested-Loop Join, referred to as BNL

3. Block Nested-Loop Join

When no usable index exists on the driven table, the algorithm flow is as follows:

1. Read the data of table t1 into the thread's memory area join_buffer. Since this statement is written with select *, the entire table t1 is put into memory

2. Scan table t2, take out each row in table t2, and compare it with the data in join_buffer. If the join conditions are met, it will be returned as part of the result set


In this process, a full table scan is performed on both table t1 and table t2, so the total number of scanned rows is 1100. Since join_buffer is organized as an unordered array, 100 comparisons must be made for each row in table t2, so a total of 100*1000=100,000 comparisons are made in memory

A query using the Simple Nested-Loop Join algorithm would also scan 100,000 rows. In terms of time complexity the two algorithms are therefore the same. However, the 100,000 comparisons of the Block Nested-Loop Join algorithm are memory operations, so it is much faster and performs better.
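A small model makes the difference between rows scanned and in-memory comparisons visible. It is a sketch only: a Python list plays the role of join_buffer, and it assumes the whole of t1 fits into the buffer at once.

```python
# Same data as the article: t2 has 1000 rows (id, a, b), t1 holds the first 100.
t2 = [(i, i, i) for i in range(1, 1001)]
t1 = [row for row in t2 if row[0] <= 100]

def block_nested_loop_join(driving, driven):
    """Join on driving.a = driven.b, buffering the whole driving table in memory."""
    join_buffer = list(driving)       # step 1: load t1 into join_buffer
    scanned = len(join_buffer)        # rows read from t1
    comparisons = 0
    result = []
    for s in driven:                  # step 2: a single full scan of t2
        scanned += 1
        for r in join_buffer:         # unordered buffer: compare with every entry
            comparisons += 1
            if r[1] == s[2]:          # t1.a = t2.b
                result.append(r + s)
    return result, scanned, comparisons

rows, scanned, comparisons = block_nested_loop_join(t1, t2)
print(len(rows), scanned, comparisons)  # prints: 100 1100 100000
```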


In this case, the execution time is the same whether the large table or the small table is chosen as the driving table.

The size of join_buffer is set by the parameter join_buffer_size, whose default value is 256k. If all the data of table t1 does not fit, the strategy is simple: load it in segments

1) Scan table t1, reading rows sequentially into join_buffer. Assume that join_buffer is full after row 88

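2) Scan table t2, take out each row in t2, and compare it with the data in join_buffer. Rows that meet the join condition are returned as part of the result set

3) Clear join_buffer

4) Continue scanning table t1, read the remaining 12 rows sequentially into join_buffer, and go back to step 2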


Because table t1 is loaded into join_buffer in two passes, table t2 is scanned twice. Even so, the number of equality comparisons is unchanged: (88+12)*1000=100,000
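The segmented variant can be modelled the same way. One simplifying assumption is made in this sketch: the buffer capacity is expressed as a number of rows (the hypothetical parameter buffer_rows), whereas the real join_buffer_size is a size in bytes.

```python
# Same data as the article: t2 has 1000 rows (id, a, b), t1 holds the first 100.
t2 = [(i, i, i) for i in range(1, 1001)]
t1 = [row for row in t2 if row[0] <= 100]

def segmented_bnl_join(driving, driven, buffer_rows):
    """Block Nested-Loop Join when the driving table must be loaded in segments."""
    result, scanned, comparisons, driven_scans = [], 0, 0, 0
    for start in range(0, len(driving), buffer_rows):
        join_buffer = driving[start:start + buffer_rows]   # steps 1 and 4: fill buffer
        scanned += len(join_buffer)
        driven_scans += 1
        for s in driven:                                   # step 2: full scan of t2
            scanned += 1
            for r in join_buffer:
                comparisons += 1
                if r[1] == s[2]:                           # t1.a = t2.b
                    result.append(r + s)
        # step 3: the buffer is discarded before the next segment is loaded
    return result, scanned, comparisons, driven_scans

rows, scanned, comparisons, driven_scans = segmented_bnl_join(t1, t2, buffer_rows=88)
print(len(rows), scanned, comparisons, driven_scans)  # prints: 100 2100 100000 2
```

With a buffer of 88 rows the segments hold 88 and 12 rows, so t2 is scanned twice and the scanned-row count grows from 1100 to 2100 while the comparison count stays at 100,000.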


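4. Can join statements be used?

1. If the Index Nested-Loop Join algorithm can be used, that is, the index on the driven table can be used, there is no problem

2. If the Block Nested-Loop Join algorithm is used, too many rows are scanned. Especially for joins on large tables, the driven table may be scanned many times, which consumes a lot of system resources. This kind of join should be avoided where possible

5. If join is used, should the large table or the small table be the driving table?

1. With the Index Nested-Loop Join algorithm, the small table should be the driving table

2. With the Block Nested-Loop Join algorithm:

  • When join_buffer_size is large enough, it makes no difference
  • When join_buffer_size is not large enough, the small table should be the driving table

When deciding which table should be the driving table, filter both tables by their respective conditions first, then compute the total data volume of the fields that take part in the join. The table with the smaller data volume is the "small table" and should be the driving table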
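The Block Nested-Loop Join rule above follows from simple arithmetic, sketched here. The function name bnl_cost and the row-based buffer capacity buffer_rows are assumptions made for this illustration; the model counts only rows scanned and in-memory comparisons.

```python
import math

def bnl_cost(driving_rows, driven_rows, buffer_rows):
    """Return (segments, rows scanned, comparisons) for a Block Nested-Loop Join."""
    segments = math.ceil(driving_rows / buffer_rows)    # times the buffer is filled
    scanned = driving_rows + segments * driven_rows     # driven table scanned once per segment
    comparisons = driving_rows * driven_rows            # independent of the segment count
    return segments, scanned, comparisons

small, large = 100, 1000

# Buffer large enough for either table: the choice makes no difference.
print(bnl_cost(small, large, buffer_rows=1000))  # prints: (1, 1100, 100000)
print(bnl_cost(large, small, buffer_rows=1000))  # prints: (1, 1100, 100000)

# Buffer too small: driving with the small table needs fewer segments and scans fewer rows.
print(bnl_cost(small, large, buffer_rows=50))    # prints: (2, 2100, 100000)
print(bnl_cost(large, small, buffer_rows=50))    # prints: (20, 3000, 100000)
```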

2. Join statement optimization

Create two tables t1 and t2

create table t1(id int primary key, a int, b int, index(a));
create table t2 like t1;

-- Change the delimiter so that the semicolons inside the procedure body do not end the statement
DELIMITER ;;
CREATE DEFINER = CURRENT_USER PROCEDURE `idata`()
BEGIN
  declare i int;
  set i=1;
  while(i<=1000)do
    insert into t1 values(i, 1001-i, i);
    set i=i+1;
  end while;

  set i=1;
  while(i<=1000000)do
    insert into t2 values(i, i, i);
    set i=i+1;
  end while;
END;;
DELIMITER ;

CALL idata();

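Table t1 receives 1000 rows, each with a=1001-id, so field a in table t1 is in reverse order. Table t2 receives 1 million rows

1. Multi-Range Read optimization

The main purpose of the Multi-Range Read (MRR) optimization is to use sequential disk reads as much as possible

select * from t1 where a>=1 and a<=100;

The primary key index is a B+ tree, and each lookup in this tree finds only one row by its primary key id. So the lookups back into the primary key index happen one row at a time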
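If rows are looked up in increasing order of a, the id values arrive in effectively random order, which causes random access and relatively poor performance

Because most data is inserted in increasing primary key order, querying in increasing primary key order makes disk reads close to sequential, which improves read performance

This is the design idea of the MRR optimization. The statement is executed as follows:

1. Use index a to locate the records that meet the condition, and put their id values into read_rnd_buffer

2. Sort the ids in read_rnd_buffer in increasing order

3. Look up the records in the primary key index in the order of the sorted id array, and return them as the result

The size of read_rnd_buffer is controlled by the parameter read_rnd_buffer_size. If read_rnd_buffer fills up during step 1, steps 2 and 3 are executed first and read_rnd_buffer is then cleared. The scan then continues with the next record in index a, and the cycle repeats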
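The three steps and the buffer-full case can be sketched as follows. This is a model under stated assumptions: a dictionary stands in for the primary key index, a sorted list of (a, id) pairs for index a, and the buffer capacity is given as a number of ids (the hypothetical parameter buffer_ids) rather than in bytes.

```python
# t1 from this section: 1000 rows (id, a, b) with a = 1001 - id.
primary = {i: (i, 1001 - i, i) for i in range(1, 1001)}        # primary key index: id -> row
index_a = sorted((row[1], row[0]) for row in primary.values()) # secondary index: (a, id) ordered by a

def mrr_range_read(index_a, primary, low, high, buffer_ids):
    """Range read on a, visiting the primary key index in sorted id order per batch."""
    result, lookup_order = [], []
    read_rnd_buffer = []

    def flush():
        read_rnd_buffer.sort()               # step 2: sort the buffered ids
        for pk in read_rnd_buffer:           # step 3: look up rows in id order
            lookup_order.append(pk)
            result.append(primary[pk])
        read_rnd_buffer.clear()

    for a, pk in index_a:
        if low <= a <= high:                 # step 1: collect ids from index a
            read_rnd_buffer.append(pk)
            if len(read_rnd_buffer) == buffer_ids:
                flush()                      # buffer full: run steps 2 and 3, then continue
    flush()                                  # process whatever is left
    return result, lookup_order

rows, order = mrr_range_read(index_a, primary, 1, 100, buffer_ids=1000)
print(order[:3], order[-1])   # prints: [901, 902, 903] 1000

rows_small, order_small = mrr_range_read(index_a, primary, 1, 100, buffer_ids=40)
print(order_small[:3])        # prints: [961, 962, 963]
```

Without the sort, index a would hand back the ids as 1000, 999, 998 and so on. With a buffer of only 40 ids, each batch is sorted on its own, so the lookups are ordered within a batch but not across batches.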

To use the MRR optimization reliably, run set optimizer_switch="mrr_cost_based=off";

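In the explain result, the Extra field additionally shows Using MRR, which means the MRR optimization is used. Because the ids are sorted in read_rnd_buffer, the final result is also returned in increasing order of primary key id

The key reason MRR improves performance is that this query performs a range query on index a and can therefore collect enough primary key ids. Only after these ids are sorted does the lookup in the primary key index benefit from sequential access

2. Batched Key Access

MySQL 5.6 introduced the Batched Key Access (BKA) algorithm. BKA is an optimization of the NLJ algorithm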

[Figure: NLJ algorithm flowchart]

The NLJ algorithm takes the value of a from the driving table t1 row by row and then joins against the driven table t2

[Figure: BKA algorithm flowchart]

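The BKA algorithm takes part of the data from table t1, puts it into a join_buffer first, and passes it to table t2 as a batch. Only the fields needed by the query are put into join_buffer. If join_buffer cannot hold all the data, the data is split into segments and the flow described above is executed for each segment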
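A rough model of the batching idea follows. Several things here are assumptions for the sake of a runnable sketch: the join condition is taken to be t1.a = t2.a, t2 is shrunk to 2000 rows, dictionaries stand in for the indexes of t2, and the buffer capacity buffer_rows is counted in rows.

```python
# t1 from this section: 1000 rows (id, a, b) with a = 1001 - id.
t1 = [(i, 1001 - i, i) for i in range(1, 1001)]

# Reduced stand-in for t2 (2000 rows instead of 1 million, to keep the sketch fast).
t2_primary = {i: (i, i, i) for i in range(1, 2001)}      # primary key index: id -> row
t2_index_a = {row[1]: [pk] for pk, row in t2_primary.items()}  # index a: a -> list of ids

def batched_key_access_join(driving, driven_index_a, driven_primary, buffer_rows):
    """Join on driving.a = driven.a, probing the driven table one batch at a time."""
    result, batches = [], 0
    for start in range(0, len(driving), buffer_rows):
        join_buffer = driving[start:start + buffer_rows]   # one batch of t1 rows
        batches += 1
        # Probe index a for the whole batch and collect (primary key, t1 row) pairs.
        pending = [(pk, r) for r in join_buffer
                   for pk in driven_index_a.get(r[1], [])]
        pending.sort(key=lambda pair: pair[0])   # MRR step: visit t2 rows in id order
        for pk, r in pending:
            result.append(r + driven_primary[pk])
    return result, batches

rows, batches = batched_key_access_join(t1, t2_index_a, t2_primary, buffer_rows=300)
print(len(rows), batches)  # prints: 1000 4
print(rows[0])             # prints: (300, 701, 300, 701, 701, 701)
```

The first batch holds the t1 rows with id 1 to 300, whose a values run from 1000 down to 701. After the sort, t2 is visited from id 701 upward, which is why the first result row pairs t1 row 300 with t2 row 701.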

To use the BKA optimization, set the following before executing the SQL statement:

set optimizer_switch='mrr=on,mrr_cost_based=off,batched_key_access=on';

The first two settings enable MRR, because the BKA optimization depends on MRR

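3. Performance problems of the BNL algorithm

InnoDB optimizes the LRU algorithm of the Buffer Pool: a data page read from disk into memory for the first time is placed in the old area first. If the page is not accessed again after 1 second, it is not moved to the head of the LRU list, so the impact on the Buffer Pool hit rate is small

If a join statement using the BNL algorithm scans a cold table multiple times and the statement runs for more than 1 second, the cold table's data pages are moved to the head of the LRU list when the cold table is scanned again. This happens when the cold table's data is smaller than 3/8 of the whole Buffer Pool and fits entirely in the old area

If the cold table is large, a different situation arises: the data pages that the business normally accesses get no chance to enter the young area.

Because of this optimization, a normally accessed data page must be accessed again after an interval of 1 second to enter the young area. But since the join statement keeps reading from disk and evicting memory pages, a data page that enters the old area is likely to be evicted within 1 second. As a result, during this period the data pages in the young area of the MySQL instance's Buffer Pool are not evicted in a reasonable way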


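4. Converting BNL to BKA

In some cases, we can create an index directly on the driven table, and the join can then use the BKA algorithm directly

When creating an index on the driven table is not appropriate, a temporary table can be used. For the following statement, the general idea is:

select * from t1 join t2 on (t1.b=t2.b) where t2.b>=1 and t2.b<=2000;

1) Put the rows of table t2 that meet the condition into a temporary table temp_t

2) Add an index on field b of temp_t so that the join can use the BKA algorithm

3) Join table t1 with temp_t

The SQL statements are as follows: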

create temporary table temp_t(id int primary key, a int, b int, index(b))engine=innodb;
insert into temp_t select * from t2 where b>=1 and b<=2000;
select * from t1 join temp_t on (t1.b=temp_t.b);
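A back-of-the-envelope comparison shows why the detour through a temporary table can pay off. The numbers below are derived from the table sizes in this section under two assumptions that the article does not state: for the plain BNL join, t1 is the driving table and fits into join_buffer in one pass; for the temporary-table version, each t1 row triggers one index lookup in temp_t.

```python
t1_rows = 1_000
t2_rows = 1_000_000
temp_rows = 2_000          # rows of t2 with b between 1 and 2000

# Plain BNL join of t1 and t2 on b (no index on t2.b):
bnl_rows_scanned = t1_rows + t2_rows     # one pass over each table
bnl_comparisons = t1_rows * t2_rows      # every t2 row against every buffered t1 row

# Temporary-table approach:
fill_rows_scanned = t2_rows              # one full scan of t2 to find the 2000 rows
fill_rows_inserted = temp_rows           # rows written into temp_t
join_rows_scanned = t1_rows              # scan of the driving table t1
join_index_lookups = t1_rows             # one lookup in the index on temp_t.b per t1 row

print(bnl_rows_scanned, bnl_comparisons)
# prints: 1001000 1000000000
print(fill_rows_scanned, fill_rows_inserted, join_rows_scanned, join_index_lookups)
# prints: 1000000 2000 1000 1000
```

Both approaches scan t2 once, but the temporary table replaces a billion in-memory comparisons with about a thousand index lookups.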

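5. Extension: hash join

In MySQL versions before 8.0.18, the optimizer and executor do not support hash join, but it can be implemented on the application side. The flow is roughly as follows:

1. select * from t1; fetches all 1000 rows of table t1 and stores them in a hash structure on the application side

2. select * from t2 where b>=1 and b<=2000; fetches the 2000 rows of table t2 that meet the condition

3. Fetch these 2000 rows to the application side one by one and look for matching data in the hash structure. Each row that matches becomes a row of the result set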
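The application-side flow above might look like this in Python. It is a minimal sketch: the two lists stand in for the result sets of the two select statements (no database connection is made), and the function name hash_join is made up for this example.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Join two row lists on equal keys using an in-memory hash table."""
    table = defaultdict(list)
    for r in build_rows:                        # step 1: build the hash structure
        table[build_key(r)].append(r)
    result = []
    for s in probe_rows:                        # step 3: probe one row at a time
        for r in table.get(probe_key(s), []):
            result.append(r + s)
    return result

# Stand-in for "select * from t1": 1000 rows (id, a, b) with a = 1001 - id.
t1_rows = [(i, 1001 - i, i) for i in range(1, 1001)]
# Stand-in for step 2: the 2000 rows of t2 with b between 1 and 2000.
t2_rows = [(i, i, i) for i in range(1, 2001)]

joined = hash_join(t1_rows, t2_rows,
                   build_key=lambda r: r[2],    # t1.b
                   probe_key=lambda s: s[2])    # t2.b
print(len(joined))  # prints: 1000
```

The hash table is built on the smaller input (the 1000 rows of t1), so each of the 2000 probe rows costs one dictionary lookup instead of a scan over t1.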


Statement:
This article is reproduced from csdn.net. In case of infringement, please contact admin@php.cn for removal