This article will take you to understand the LIMIT statement in MySQL and talk about a question - is MySQL's LIMIT so bad? I hope to be helpful!
Recently, many friends asked children about LIMIT in the Q&A group. Let me briefly describe this question.
Question
In order for the story to develop smoothly, we must first have a table:
CREATE TABLE t ( id INT UNSIGNED NOT NULL AUTO_INCREMENT, key1 VARCHAR(100), common_field VARCHAR(100), PRIMARY KEY (id), KEY idx_key1 (key1) ) Engine=InnoDB CHARSET=utf8;
Table t contains 3 columns, and the id column is the primary key , the key1 column is a secondary index column. The table contains 10,000 records. [Related recommendations: mysql video tutorial]
When we execute the following statement, the secondary index idx_key1 is used:
mysql> EXPLAIN SELECT * FROM t ORDER BY key1 LIMIT 1; +----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+-------+ | 1 | SIMPLE | t | NULL | index | NULL | idx_key1 | 303 | NULL | 1 | 100.00 | NULL | +----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+-------+ 1 row in set, 1 warning (0.00 sec)
This is easy to understand, because In the secondary index idx_key1, the key1 column is ordered. The query is to retrieve the first record sorted by the key1 column. Then MySQL only needs to obtain the first secondary index record from idx_key1, and then directly return to the table to obtain the complete record.
But if we replace LIMIT 1
in the above statement with LIMIT 5000, 1
, then we need to perform a full table scan and filesort. The execution plan is as follows:
mysql> EXPLAIN SELECT * FROM t ORDER BY key1 LIMIT 5000, 1; +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+ | 1 | SIMPLE | t | NULL | ALL | NULL | NULL | NULL | NULL | 9966 | 100.00 | Using filesort | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+ 1 row in set, 1 warning (0.00 sec)
Some students don’t understand: LIMIT 5000, 1
You can also use the secondary index idx_key1. We can scan the 5001st secondary index record first, and then scan the 5001st secondary index record first. Wouldn't it be nice to perform a table return operation on 5001 secondary index records? This cost is definitely better than a full table scan filesort.
I regret to tell you that due to the flaws in MySQL implementation, the above ideal situation will not occur. It will only perform a full table scan filesort stupidly. Let’s talk about what is going on. Son.
Server layer and storage engine layer
As we all know, MySQL is actually divided into server layer and storage engine layer:
The server layer is responsible for handling some common things, such as connection management, SQL syntax parsing, analysis of execution plans, etc.
The storage engine layer is responsible for specific data storage, such as Whether the data is stored in a file or in memory, what is the specific storage format, etc. We basically use the InnoDB storage engine now, and other storage engines are rarely used, so we will not cover other storage engines.
The execution of a SQL statement in MySQL obtains the final result through multiple interactions between the server layer and the storage engine layer. For example, the following query:
SELECT * FROM t WHERE key1 > 'a' AND key1 < 'b' AND common_field != 'a';
The server layer will analyze that the above statement can be executed using the following two options:
-
Option 1: Use full table scan
Option 2: Use the secondary index idx_key1. At this time, you need to scan all secondary index records with the key1 column value between ('a', 'b'), and each secondary index All records need to be returned to the table.
The server layer will analyze which of the above two options is lower cost, and then select the lower cost option as the execution plan. Then the interface provided by the storage engine is called to actually execute the query.
It is assumed here that option 2 is adopted, that is, the secondary index idx_key1 is used to execute the above query. Then the dialogue between the server layer and the storage engine layer can be as follows:
Server layer: "Hey, please check the ('a', ' of the idx_key1 secondary index b') The first record in the interval, and then return the complete record to me after returning to the table."
InnoDB: "Received, check it now", and then InnoDB uses the idx_key1 secondary index The corresponding B-tree quickly locates the first secondary index record in the scan interval ('a', 'b'), and then returns the table to get the complete clustered index record and returns it to the server layer.
After the server layer receives the complete clustered index record, it continues to determine whether the common_field!='a'
condition is true. If it is not true, it will be discarded. record, otherwise the record is sent to the client. Then say to the storage engine: "Please give me the next record."
Tips:
The record sent to the client here is actually sent to the local network. Buffer, the buffer size is controlled by net_buffer_length, the default size is 16KB. Wait until the buffer is full before actually sending the network packet to the client.
InnoDB: "Received, check now". InnoDB finds the next secondary index record in the ('a', 'b') interval of idx_key1 based on the next_record attribute of the record, then performs a table return operation and returns the complete clustered index record to the server layer.
小贴士:
不论是聚簇索引记录还是二级索引记录,都包含一个称作next_record
的属性,各个记录根据next_record连成了一个链表,并且链表中的记录是按照键值排序的(对于聚簇索引来说,键值指的是主键的值,对于二级索引记录来说,键值指的是二级索引列的值)。
server层收到完整的聚簇索引记录后,继续判断common_field!='a'
条件是否成立,如果不成立则舍弃该记录,否则将该记录发送到客户端。然后对存储引擎说:“请把下一条记录给我哈”
... 然后就不停的重复上述过程。
直到:
也就是直到InnoDB发现根据二级索引记录的next_record获取到的下一条二级索引记录不在('a', 'b')区间中,就跟server层说:“好了,('a', 'b')区间没有下一条记录了”
server层收到InnoDB说的没有下一条记录的消息,就结束查询。
现在大家就知道了server层和存储引擎层的基本交互过程了。
那LIMIT是什么鬼?
说出来大家可能有点儿惊讶,MySQL是在server层准备向客户端发送记录的时候才会去处理LIMIT子句中的内容。拿下边这个语句举例子:
SELECT * FROM t ORDER BY key1 LIMIT 5000, 1;
如果使用idx_key1执行上述查询,那么MySQL会这样处理:
server层向InnoDB要第1条记录,InnoDB从idx_key1中获取到第一条二级索引记录,然后进行回表操作得到完整的聚簇索引记录,然后返回给server层。server层准备将其发送给客户端,此时发现还有个
LIMIT 5000, 1
的要求,意味着符合条件的记录中的第5001条才可以真正发送给客户端,所以在这里先做个统计,我们假设server层维护了一个称作limit_count的变量用于统计已经跳过了多少条记录,此时就应该将limit_count设置为1。server层再向InnoDB要下一条记录,InnoDB再根据二级索引记录的next_record属性找到下一条二级索引记录,再次进行回表得到完整的聚簇索引记录返回给server层。server层在将其发送给客户端的时候发现limit_count才是1,所以就放弃发送到客户端的操作,将limit_count加1,此时limit_count变为了2。
... 重复上述操作
直到limit_count等于5000的时候,server层才会真正的将InnoDB返回的完整聚簇索引记录发送给客户端。
从上述过程中我们可以看到,由于MySQL中是在实际向客户端发送记录前才会去判断LIMIT子句是否符合要求,所以如果使用二级索引执行上述查询的话,意味着要进行5001次回表操作。server层在进行执行计划分析的时候会觉得执行这么多次回表的成本太大了,还不如直接全表扫描+filesort快呢,所以就选择了后者执行查询。
怎么办?
由于MySQL实现LIMIT子句的局限性,在处理诸如LIMIT 5000, 1
这样的语句时就无法通过使用二级索引来加快查询速度了么?其实也不是,只要把上述语句改写成:
SELECT * FROM t, (SELECT id FROM t ORDER BY key1 LIMIT 5000, 1) AS d WHERE t.id = d.id;
这样,SELECT id FROM t ORDER BY key1 LIMIT 5000, 1
作为一个子查询单独存在,由于该子查询的查询列表只有一个id
列,MySQL可以通过仅扫描二级索引idx_key1执行该子查询,然后再根据子查询中获得到的主键值去表t中进行查找。
这样就省去了前5000条记录的回表操作,从而大大提升了查询效率!
吐个槽
设计MySQL的大叔啥时候能改改LIMIT子句的这种超笨的实现呢?还得用户手动想欺骗优化器的方案才能提升查询效率~
更多编程相关知识,请访问:编程视频!!
The above is the detailed content of In-depth analysis of the LIMIT statement in MySQL. For more information, please follow other related articles on the PHP Chinese website!

InnoDBBufferPool reduces disk I/O by caching data and indexing pages, improving database performance. Its working principle includes: 1. Data reading: Read data from BufferPool; 2. Data writing: After modifying the data, write to BufferPool and refresh it to disk regularly; 3. Cache management: Use the LRU algorithm to manage cache pages; 4. Reading mechanism: Load adjacent data pages in advance. By sizing the BufferPool and using multiple instances, database performance can be optimized.

Compared with other programming languages, MySQL is mainly used to store and manage data, while other languages such as Python, Java, and C are used for logical processing and application development. MySQL is known for its high performance, scalability and cross-platform support, suitable for data management needs, while other languages have advantages in their respective fields such as data analytics, enterprise applications, and system programming.

MySQL is worth learning because it is a powerful open source database management system suitable for data storage, management and analysis. 1) MySQL is a relational database that uses SQL to operate data and is suitable for structured data management. 2) The SQL language is the key to interacting with MySQL and supports CRUD operations. 3) The working principle of MySQL includes client/server architecture, storage engine and query optimizer. 4) Basic usage includes creating databases and tables, and advanced usage involves joining tables using JOIN. 5) Common errors include syntax errors and permission issues, and debugging skills include checking syntax and using EXPLAIN commands. 6) Performance optimization involves the use of indexes, optimization of SQL statements and regular maintenance of databases.

MySQL is suitable for beginners to learn database skills. 1. Install MySQL server and client tools. 2. Understand basic SQL queries, such as SELECT. 3. Master data operations: create tables, insert, update, and delete data. 4. Learn advanced skills: subquery and window functions. 5. Debugging and optimization: Check syntax, use indexes, avoid SELECT*, and use LIMIT.

MySQL efficiently manages structured data through table structure and SQL query, and implements inter-table relationships through foreign keys. 1. Define the data format and type when creating a table. 2. Use foreign keys to establish relationships between tables. 3. Improve performance through indexing and query optimization. 4. Regularly backup and monitor databases to ensure data security and performance optimization.

MySQL is an open source relational database management system that is widely used in Web development. Its key features include: 1. Supports multiple storage engines, such as InnoDB and MyISAM, suitable for different scenarios; 2. Provides master-slave replication functions to facilitate load balancing and data backup; 3. Improve query efficiency through query optimization and index use.

SQL is used to interact with MySQL database to realize data addition, deletion, modification, inspection and database design. 1) SQL performs data operations through SELECT, INSERT, UPDATE, DELETE statements; 2) Use CREATE, ALTER, DROP statements for database design and management; 3) Complex queries and data analysis are implemented through SQL to improve business decision-making efficiency.

The basic operations of MySQL include creating databases, tables, and using SQL to perform CRUD operations on data. 1. Create a database: CREATEDATABASEmy_first_db; 2. Create a table: CREATETABLEbooks(idINTAUTO_INCREMENTPRIMARYKEY, titleVARCHAR(100)NOTNULL, authorVARCHAR(100)NOTNULL, published_yearINT); 3. Insert data: INSERTINTObooks(title, author, published_year)VA


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

mPDF
mPDF is a PHP library that can generate PDF files from UTF-8 encoded HTML. The original author, Ian Back, wrote mPDF to output PDF files "on the fly" from his website and handle different languages. It is slower than original scripts like HTML2FPDF and produces larger files when using Unicode fonts, but supports CSS styles etc. and has a lot of enhancements. Supports almost all languages, including RTL (Arabic and Hebrew) and CJK (Chinese, Japanese and Korean). Supports nested block-level elements (such as P, DIV),

SecLists
SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

WebStorm Mac version
Useful JavaScript development tools

DVWA
Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software

Zend Studio 13.0.1
Powerful PHP integrated development environment