Let's talk about how to optimize the ORDER BY statement in SQL
How can you optimize an ORDER BY statement in SQL? This article explains how MySQL executes ORDER BY and how to optimize it. I hope it serves as a useful reference.
When querying a database, you will inevitably need to sort the result set by certain fields. In SQL this is usually done with ORDER BY: put the fields to sort by after the keyword, separating multiple fields with commas.
select * from `table` t order by t.column1, t.column2;
The SQL above queries the data in the table and sorts it by column1 first; rows with the same column1 are then sorted by column2. The default sort order is ascending. You can also specify the order explicitly by adding DESC or ASC after a sort field, for descending or ascending order respectively.
ORDER BY makes everyday sorting easy, and I use it a lot. But you may have run into this situation: sometimes a query with ORDER BY runs very slowly, and sometimes it is fast. Since I spend all day on CRUD work, I never had time to dig into why; it just seemed like magic. Now that I have some free time this weekend, let's study how ORDER BY is implemented in MySQL.
To make the discussion concrete, first create a table t1:
CREATE TABLE `t1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  `c` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `a` (`a`) USING BTREE
) ENGINE=InnoDB;
Then insert the test data:
insert into t1 (a,b,c) values (1,1,3);
insert into t1 (a,b,c) values (1,4,5);
insert into t1 (a,b,c) values (1,3,3);
insert into t1 (a,b,c) values (1,3,4);
insert into t1 (a,b,c) values (1,2,5);
insert into t1 (a,b,c) values (1,3,6);
So that the optimizer actually uses the index, also insert 10,000 rows of (7,7,7), for example by running the following statement 10,000 times. If the table held only this handful of rows, MySQL would simply scan the whole table instead.
insert into t1 (a,b,c) values (7,7,7);
Now suppose we need to find all records with a=1, sort them by field b, and return the first two.
The query is:
select a,b,c from t1 where a = 1 order by b limit 2;
To avoid a full table scan during the query, we added an index on field a.
First, check the execution plan with the following statement:
explain select a,b,c from t1 where a = 1 order by b limit 2;
In the Extra column, Using filesort appears, which means a sort operation is performed while the SQL executes. Despite the name, Using filesort only means a sort step is needed; the sort happens in memory whenever the data fits. The sort is performed in sort_buffer, a memory buffer that MySQL allocates to each thread specifically for sorting. Its size is controlled by the variable sort_buffer_size (the default is 262144 bytes, i.e. 256 KB, in MySQL 5.7 and 8.0).
When implementing ORDER BY, MySQL uses one of two methods, depending on which fields it puts into sort_buffer: full-field sorting and rowid sorting.
First, let's walk through how the query is executed with full-field sorting.
Based on the query condition, MySQL first determines the data set to be sorted: the rows in the table with a=1, that is, the records with primary key id 1 to 6.
The entire sql execution process is as follows:
1. Create and initialize sort_buffer, and determine the fields to put into it: a, b, and c.
2. Find the first primary key id satisfying a=1 in index tree a, that is, id=1.
3. Go back to the primary key (id) index with that id (a table lookup), fetch the whole row, and put the values of a, b, and c into sort_buffer.
4. Get the next primary key id with a=1 from index a, in order.
5. Repeat steps 3 and 4 until the last record with a=1 has been read, that is, primary key id=6.
6. At this point, the a, b, and c fields of every record satisfying a=1 have been read into sort_buffer. The data is then sorted by b. The sorting method is quicksort, the same quicksort that often comes up in interviews, whose average time complexity is O(n log n).
7. Finally, take the first 2 rows from the sorted result and return them.
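The seven steps above can be sketched as a small Python model. This is a hedged illustration, not MySQL's source code: a dict stands in for the primary key (clustered) index, a sorted list of (a, id) pairs stands in for index a, and only two (7,7,7) rows are included instead of 10,000.

```python
# Toy model of full-field sorting for:
#   select a,b,c from t1 where a = 1 order by b limit 2;
# Python dicts and lists stand in for InnoDB indexes; this only illustrates
# the steps and is not MySQL's actual implementation.

# Primary key (clustered) index: id -> whole row (a, b, c)
primary = {
    1: (1, 1, 3), 2: (1, 4, 5), 3: (1, 3, 3),
    4: (1, 3, 4), 5: (1, 2, 5), 6: (1, 3, 6),
    7: (7, 7, 7), 8: (7, 7, 7),  # a few stand-ins for the (7,7,7) rows
}

# Secondary index a: (a, id) entries kept in sorted order
index_a = sorted((row[0], row_id) for row_id, row in primary.items())


def full_field_sort(a_value, limit):
    sort_buffer = []                          # step 1: will hold (a, b, c)
    for a, row_id in index_a:                 # steps 2 and 4: walk index a
        if a < a_value:
            continue
        if a > a_value:                       # past the matching range: stop
            break
        sort_buffer.append(primary[row_id])   # step 3: table lookup, copy a, b, c
    sort_buffer.sort(key=lambda row: row[1])  # step 6: sort by b
    return sort_buffer[:limit]                # step 7: keep the first `limit` rows


print(full_field_sort(1, 2))  # [(1, 1, 3), (1, 2, 5)]
```

Because every output column is copied into the buffer, each matching row needs only one table lookup; the price is that wide rows use up sort_buffer quickly.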
The above is how ORDER BY is executed in MySQL. Because every field to be output is placed in sort_buffer, this method is called full-field sorting.
Do you have a question at this point? What happens if the data to be sorted is so large that it does not fit in sort_buffer?
Indeed, if many rows have a=1 and many fields must be stored in sort_buffer (some businesses need to output far more fields than just a, b, and c), a sort_buffer of the default size may not be able to hold them all.
When sort_buffer cannot hold the data, MySQL creates a batch of temporary disk files to help with the sort. The data to be sorted is split into several parts; the number of files is not a fixed default but depends on how much data there is relative to sort_buffer_size. Each part is sorted separately into an ordered temporary file, and these ordered files are then merged into one large sorted file, completing the sort.
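The split, sort, and merge idea can be illustrated with a minimal external merge sort. This sketch assumes Python lists in place of temporary disk files and a row count (`buffer_rows`, a hypothetical parameter) in place of sort_buffer_size; MySQL's actual merge passes differ in detail.

```python
import heapq
from itertools import islice

# Toy model of a file-based external merge sort. Plain lists stand in for the
# temporary disk files, and `buffer_rows` (a row count) stands in for
# sort_buffer_size; both are simplifying assumptions.


def external_sort(rows, buffer_rows, key):
    temp_files = []
    source = iter(rows)
    while True:
        run = list(islice(source, buffer_rows))  # fill the buffer
        if not run:
            break
        run.sort(key=key)                        # sort this part in memory
        temp_files.append(run)                   # "spill" it to a temp file
    # k-way merge of the sorted temp files into one sorted output
    merged = list(heapq.merge(*temp_files, key=key))
    return merged, len(temp_files)


rows = [(1, b, 0) for b in [9, 3, 7, 1, 8, 2, 6, 4, 5, 0]]
result, n_files = external_sort(rows, buffer_rows=3, key=lambda r: r[1])
print(n_files)                  # 4
print([r[1] for r in result])   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The smaller the buffer relative to the data, the more temporary files are produced and the more merge work is needed, which is why the text recommends keeping the sort within sort_buffer.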
File-based sorting is much slower than in-memory sorting. To keep sorting efficient, you should try to avoid it, which means sort_buffer must be able to hold the data that needs to be sorted.
For this reason, MySQL has an optimization for the case where sort_buffer cannot hold the data: reduce the number of fields stored in sort_buffer during the sort.
That optimization is rowid sorting, described below.
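In full-field sorting, every field to be output is placed in sort_buffer. When many fields are output, fewer rows fit in sort_buffer, which increases the risk that sort_buffer cannot hold the data and the sort falls back to files.
Rowid sorting improves on full-field sorting mainly by reducing the number of fields placed in sort_buffer.
In rowid sorting, only the field being sorted on and the primary key id are placed in sort_buffer. Take the same query as before: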
select a,b,c from t1 where a = 1 order by b limit 2;
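For this query, rowid sorting executes as follows:
1. Create and initialize sort_buffer, and determine the fields to put into it: id and b.
2. Find the first primary key id satisfying a=1 in index tree a, that is, id=1.
3. Go back to the primary key index with that id, fetch the whole row, take id and b from it, and store them in sort_buffer.
4. Get the primary key id of the next record satisfying a=1 from index a.
5. Repeat steps 3 and 4 until the last primary key id satisfying a=1 has been read, that is, id=6.
6. Sort the data in sort_buffer by field b.
7. Take the first 2 entries from the sorted data in sort_buffer. Since these entries contain only id and b, getting a and c requires using id to go back to the primary key index, fetching the whole row, and taking the needed fields from it.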
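Rowid sorting can be modeled in Python too. In this hedged sketch (not MySQL's code), a dict stands in for the primary key index and a sorted list of (a, id) pairs for index a; the function also counts table lookups so the extra lookups of step 7 are visible.

```python
# Toy model of rowid sorting for:
#   select a,b,c from t1 where a = 1 order by b limit 2;
# Only (b, id) pairs go into the sort buffer; a and c are fetched afterwards
# by id. Not MySQL's actual implementation.

primary = {
    1: (1, 1, 3), 2: (1, 4, 5), 3: (1, 3, 3),
    4: (1, 3, 4), 5: (1, 2, 5), 6: (1, 3, 6),
    7: (7, 7, 7), 8: (7, 7, 7),  # a few stand-ins for the (7,7,7) rows
}
index_a = sorted((row[0], row_id) for row_id, row in primary.items())


def rowid_sort(a_value, limit):
    sort_buffer, lookups = [], 0              # step 1: will hold (b, id)
    for a, row_id in index_a:                 # steps 2 and 4: walk index a
        if a < a_value:
            continue
        if a > a_value:
            break
        _, b, _ = primary[row_id]             # step 3: table lookup, keep b and id
        lookups += 1
        sort_buffer.append((b, row_id))
    sort_buffer.sort()                        # step 6: sort by b (ties broken by id)
    result = []
    for _, row_id in sort_buffer[:limit]:     # step 7: second lookup for a and c
        result.append(primary[row_id])
        lookups += 1
    return result, lookups


rows, lookups = rowid_sort(1, 2)
print(rows)     # [(1, 1, 3), (1, 2, 5)]
print(lookups)  # 8: six lookups to fill the buffer + two for the final rows
```

The extra lookups grow with the LIMIT value, not with the number of matching rows, which is why rowid sorting stays cheap when only a few rows are returned.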
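Following these steps, we can see that, compared with full-field sorting, rowid sorting reduces the amount of data stored in sort_buffer and lowers the chance of a file-based external sort.
Does rowid sorting have any drawbacks? Certainly, otherwise full-field sorting would have no reason to exist. The drawback is that step 7 adds extra table lookups. However, the number of these lookups depends on the value after LIMIT; if the result set returned is small, there are only a few of them.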
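How does MySQL choose between full-field sorting and rowid sorting? It decides based on the length of each row that would be stored in sort_buffer: if MySQL considers the rows placed in sort_buffer too large, it uses rowid sorting; otherwise it uses full-field sorting. How large is too large? The threshold is set by the variable max_length_for_sort_data: if the size of a row to be put into sort_buffer exceeds this value, rowid sorting is used; otherwise full-field sorting is used. (As of MySQL 8.0.20, max_length_for_sort_data is deprecated and no longer has any effect.)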
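The choice can be written as a simple rule. The function `choose_sort_mode` and the per-column byte widths are hypothetical; MySQL computes row length with additional overhead, and the rule only applies to versions where max_length_for_sort_data still has an effect.

```python
# Hypothetical sketch of the rule described above. The byte widths below are
# assumptions (4 bytes per INT); MySQL's real per-row length calculation also
# counts extra overhead, so treat this as an illustration only.


def choose_sort_mode(column_bytes, max_length_for_sort_data):
    row_length = sum(column_bytes.values())
    if row_length > max_length_for_sort_data:
        return "rowid"        # rows too long: sort only (sort key, id)
    return "full-field"       # rows short enough: sort whole output rows


columns = {"a": 4, "b": 4, "c": 4}
print(choose_sort_mode(columns, max_length_for_sort_data=1024))  # full-field
print(choose_sort_mode(columns, max_length_for_sort_data=8))     # rowid
```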
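The sections above describe the two ways ORDER BY sorts data and some optimization strategies. The main goal of these optimizations is to avoid external sorting on disk files, because sorting based on disk files is far less efficient than in-memory sorting in sort_buffer.
However, when the data volume is large, even if sort_buffer is big enough to sort everything in memory, the query as a whole is still not very efficient, because sorting is an expensive operation in itself.
Imagine that the a=1 rows fetched through index a were already naturally ordered by b. Then no sort would be needed: the rows read would already be the final result set, and the query would run much faster.
Avoiding the sort in this query is actually not hard: just create a composite index on a and b.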
alter table t1 add index a_b (a,b);
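After the composite index on (a, b) is added, the execution flow becomes:
1. Find the first primary key id satisfying a=1 in index tree (a,b), that is, id=1.
2. Go back to the primary key index tree, fetch the whole row, take a, b, and c from it, and return them directly as part of the result set.
3. Get the primary key id of the next entry satisfying a=1 from index tree (a,b).
4. Repeat steps 2 and 3 until the second primary key id satisfying a=1 has been found and its a, b, and c fetched from the table.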
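The effect of the (a, b) index can be modeled with a scan over a sorted list. In this hedged sketch (not InnoDB's implementation), a sorted list of (a, b, id) tuples stands in for the composite index and `bisect` positions the scan on the first a=1 entry; the scan stops as soon as LIMIT rows are produced, with one table lookup per returned row and no sort.

```python
import bisect

# Toy model of reading through a composite index (a, b): entries are already
# ordered by b within each a, so no sort is needed and the scan can stop as
# soon as `limit` rows have been produced. Not MySQL's actual implementation.

primary = {
    1: (1, 1, 3), 2: (1, 4, 5), 3: (1, 3, 3),
    4: (1, 3, 4), 5: (1, 2, 5), 6: (1, 3, 6),
    7: (7, 7, 7), 8: (7, 7, 7),  # a few stand-ins for the (7,7,7) rows
}

# Composite index (a, b): (a, b, id) entries in sorted order
index_ab = sorted((row[0], row[1], row_id) for row_id, row in primary.items())


def ordered_scan(a_value, limit):
    result, lookups = [], 0
    # Position on the first entry with a >= a_value
    pos = bisect.bisect_left(index_ab, (a_value,))
    while pos < len(index_ab) and len(result) < limit:
        a, _, row_id = index_ab[pos]
        if a != a_value:                  # left the a = a_value range
            break
        result.append(primary[row_id])    # table lookup for a, b, c
        lookups += 1
        pos += 1                          # next entry, already ordered by b
    return result, lookups


rows, lookups = ordered_scan(1, 2)
print(rows)     # [(1, 1, 3), (1, 2, 5)]
print(lookups)  # 2
```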
We can now check the execution plan to see whether a sort is performed during execution:
explain select a,b,c from t1 where a = 1 order by b limit 2;
The plan shows that Using filesort no longer appears in Extra, meaning no sort is performed.
The query can be optimized further with a covering index: including field c in the index avoids the table lookups.
alter table t1 add index a_b_c (a,b,c);
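After index a_b_c is added, the query executes as follows:
1. Find the first entry satisfying a=1 in index tree (a,b,c), take a, b, and c from it, and return them directly as part of the result set.
2. Take the next entry satisfying a=1 from index (a,b,c) and add it to the result set.
3. Repeat step 2 until the second a=1 record has been found, or an entry that does not satisfy a=1 is reached.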
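A covering index can be modeled with a sorted list whose entries already contain a, b, and c, so the primary key index is never consulted. This is a toy Python sketch under that assumption, not InnoDB's implementation.

```python
import bisect

# Toy model of a covering index (a, b, c): every selected column is stored in
# the index entry itself, so rows come straight from the index and the
# primary key index is never touched. Not MySQL's actual implementation.

table = [  # (id, a, b, c)
    (1, 1, 1, 3), (2, 1, 4, 5), (3, 1, 3, 3),
    (4, 1, 3, 4), (5, 1, 2, 5), (6, 1, 3, 6),
    (7, 7, 7, 7), (8, 7, 7, 7),
]

# Covering index (a, b, c): (a, b, c, id) entries in sorted order
index_abc = sorted((a, b, c, row_id) for row_id, a, b, c in table)


def covering_scan(a_value, limit):
    result = []
    pos = bisect.bisect_left(index_abc, (a_value,))
    while pos < len(index_abc) and len(result) < limit:
        a, b, c, _ = index_abc[pos]
        if a != a_value:                  # left the a = a_value range
            break
        result.append((a, b, c))          # answered from the index entry alone
        pos += 1
    return result


print(covering_scan(1, 2))  # [(1, 1, 3), (1, 2, 5)]
```

No step in `covering_scan` reads `table` after the index is built, which mirrors why Extra shows only Using index.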
Checking the execution plan now shows only Using index in Extra:
explain select a,b,c from t1 where a = 1 order by b limit 2;
After these optimizations, the query runs about as fast as an ordinary query with no sorting. The ORDER BY sort can be avoided because the query takes advantage of the fact that index entries are naturally kept in order.
However, although indexes speed up queries, they are relatively expensive to maintain: every insert or update to the table also has to update its indexes, so more indexes are not always better. Sometimes it is not worth adding extra indexes just for an uncommon query or sort.
The above is the detailed content of Let's talk about how to optimize the order By statement in SQL. For more information, please follow other related articles on the PHP Chinese website!