#How to optimize the performance of database queries? (Detailed explanation)
Query is the most commonly used operation in database technology. The process of query operation is relatively simple. First, the query SQL statement is issued from the client. After receiving the SQL statement sent by the client, the database server executes the SQL statement and then returns the query results to the client. Although the process is very simple, different query methods and database settings will have a great impact on query performance.
Therefore, this article discusses query optimization techniques commonly used in MySQL. Contents discussed include: improving query speed through query buffering; MySQL's automatic optimization of queries; index-based sorting; detection of unreachable queries and the use of various query options to improve performance.
1. Improve query speed through query buffering
Generally when we use SQL statements to query, the database server will execute this statement every time it receives SQL from the client. SQL statement. But when the exact same SQL statement is received within a certain interval (such as within 1 minute), it will be executed in the same way. Although this can ensure the real-time nature of the data, most of the time, the data does not require complete real-time, which means there can be a certain delay. If this is the case, executing the exact same SQL in a short period of time is not worth the gain.
Fortunately, MySQL provides us with the query buffering function (query buffering can only be used in MySQL 4.0.1 and above). We can improve query performance to a certain extent through query caching.
We can set the query buffer through the my.ini file in the MySQL installation directory. The setting is also very simple, just set query_cache_type to 1. After setting this attribute, before executing any SELECT statement, MySQL will check in its buffer whether the same SELECT statement has been executed before. If so, and the execution result has not expired, then the query result will be returned directly to client. But when writing SQL statements, please note that MySQL's query buffer is case-sensitive. The following two SELECT statements are as follows:
SELECT * from TABLE1 SELECT * FROM TABLE1
The above two SQL statements are completely different SELECTs for query buffering. Moreover, the query buffer does not automatically handle spaces. Therefore, when writing SQL statements, you should try to reduce the use of spaces, especially the spaces at the beginning and end of SQL (because the query cache does not automatically intercept the spaces at the beginning and end).
Although not setting up a query buffer may sometimes cause performance losses, there are some SQL statements that need to query data in real time, or are not used frequently (perhaps executed once or twice a day). This requires turning off buffering. Of course, you can turn off the query cache by setting the value of query_cache_type, but this will permanently turn off the query cache. MySQL 5.0 provides a method to temporarily turn off the query buffer:
SELECT SQL_NO_CACHE field1, field2 FROM TABLE1
The above SQL statement uses SQL_NO_CACHE, so regardless of whether this SQL statement has been executed, the server will not will look in the buffer and execute it every time.
We can also set query_cache_type in my.ini to 2, so that the query cache will be used only after SQL_CACHE is used.
SELECT SQL_CALHE * FROM TABLE1
2. MySQL’s automatic optimization of queries
Indexes are very important for the database. Indexes can be used to improve performance during queries. But sometimes using indexes can reduce performance. We can look at the following SALES table:
CREATE TABLE SALES ( ID INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, NAME VARCHAR(100) NOT NULL, PRICE FLOAT NOT NULL, SALE_COUNT INT NOT NULL, SALE_DATE DATE NOT NULL, PRIMARY KEY(ID), INDEX (NAME), INDEX (SALE_DATE) )
Assume that millions of pieces of data are stored in this table, and we want to query the average price of the product with product number 1000 in 2004 and 2005. We can write the following SQL statement:
SELECT AVG(PRICE) FROM SALES WHERE ID = 1000 AND SALE_DATE BETWEEN '2004-01-01' AND '2005-12-31';
If the quantity of this product is very large, it accounts for almost 50% or more of the records in the SALES table. Then using the index on the SALE_DATE field to calculate the average is a bit slow. Because if you use an index, you have to sort the index. When there are very many records that meet the conditions (such as accounting for 50% or more of the records in the entire table), the speed will slow down, so it is better to scan the entire table. Therefore, MySQL will automatically decide whether to use the index for query based on the proportion of data that meets the conditions in the entire table.
For MySQL, the index is not used when the proportion of the above query results to the records of the entire table is about 30%. This proportion is obtained by MySQL developers based on their experience. However, the actual ratio value will vary depending on the database engine used
3. Index-based sorting
One of the weaknesses of MySQL is its sorting. Although MySQL can query approximately 15,000 records in 1 second, MySQL can only use at most one index when querying. Therefore, if the WHERE condition already occupies the index, the index will not be used in sorting, which will greatly reduce the speed of the query. We can look at the following SQL statement:
SELECT * FROM SALES WHERE NAME = “name” ORDER BY SALE_DATE DESC;
The index on the NAME field has been used in the WHERE clause of the above SQL. Therefore, the index will no longer be used when sorting SALE_DATE. . In order to solve this problem, we can create a composite index on the SALES table:
ALTER TABLE SALES DROP INDEX NAME, ADD INDEX (NAME, SALE_DATE)
这样再使用上述的SELECT语句进行查询时速度就会大副提升。但要注意,在使用这个方法时,要确保WHERE子句中没有排序字段,在上例中就是不能用SALE_DATE进行查询,否则虽然排序快了,但是SALE_DATE字段上没有单独的索引,因此查询又会慢下来。
四、 不可达查询的检测
在执行SQL语句时,难免会遇到一些必假的条件。所谓必假的条件是无论表中的数据如何变化,这个条件都为假。如WHERE value 200。我们永远无法找到一个既小于100又大于200的数。
如果遇到这样的查询条件,再去执行这样的SQL语句就是多此一举。幸好MySQL可以自动检测这种情况。如我们可以看看如下的SQL语句:
SELECT * FROM SALES WHERE NAME = “name1” AND NAME = “name2”
以上的查询语句要查找NAME既等于name1又等于name2的记录。很明显,这是一个不可达的查询,WHERE条件一定是假。MySQL在执行SQL语句之前,会先分析WHERE条件是否是不可达的查询,如果是,就不再执行这条SQL语句了。为了验证这一点。我们首先对如下的SQL使用EXPLAIN进行测试:
EXPLAIN SELECT * FROM SALES WHERE NAME = “name1”
上面的查询是一个正常的查询,我们可以看到使用EXPLAIN返回的执行信息数据中table项是SALES。这说明MySQL对SALES进行操作了。再看看下面的语句:
EXPLAIN SELECT * FROM SALES WHERE NAME = “name1” AND NAME = “name2”
我们可以看到,table项是空,这说明MySQL并没有对SALES表进行操作。
五、 使用各种查询选择来提高性能
SELECT语句除了正常的使用外,MySQL还为我们提供了很多可以增强查询性能的选项。如上面介绍的用于控制查询缓冲的SQL_NO_CACHE和SQL_CACHE就是其中两个选项。在这一部分,我将介绍几个常用的查询选项。
1. STRAIGHT_JOIN:强制连接顺序
当我们将两个或多个表连接起来进行查询时,我们并不用关心MySQL先连哪个表,后连哪个表。而这一切都是由MySQL内部通过一系列的计算、评估,最后得出的一个连接顺序决定的。如下列的SQL语句中,TABLE1和TABLE2并不一定是谁连接谁:
SELECT TABLE1.FIELD1, TABLE2.FIELD2 FROM TABLE1 ,TABLE2 WHERE …
如果开发人员需要人为地干预连接的顺序,就得使用STRAIGHT_JOIN关键字,如下列的SQL语句:
SELECT TABLE1.FIELD1, TABLE2.FIELD2 FROM TABLE1 STRAIGHT_JOIN TABLE2 WHERE …
由上面的SQL语句可知,通过STRAIGHT_JOIN强迫MySQL按TABLE1、TABLE2的顺序连接表。如果你认为按自己的顺序比MySQL推荐的顺序进行连接的效率高的话,就可以通过STRAIGHT_JOIN来确定连接顺序。
2. 干预索引使用,提高性能
在上面已经提到了索引的使用。一般情况下,在查询时MySQL将自己决定是否使用索引,使用哪一个索引。但在一些特殊情况下,我们希望MySQL只使用一个或几个索引,或者不希望使用某个索引。这就需要使用MySQL的控制索引的一些查询选项。
限制使用索引的范围:
有时我们在数据表里建立了很多索引,当MySQL对索引进行选择时,这些索引都在考虑的范围内。但有时我们希望MySQL只考虑几个索引,而不是全部的索引,这就需要用到USE INDEX对查询语句进行设置。
SELECT * FROM TABLE1 USE INDEX (FIELD1, FIELD2) …
从以上SQL语句可以看出,无论在TABLE1中已经建立了多少个索引,MySQL在选择索引时,只考虑在FIELD1和FIELD2上建立的索引。
限制不使用索引的范围
如果我们要考虑的索引很多,而不被使用的索引又很少时,可以使用IGNORE INDEX进行反向选取。在上面的例子中是选择被考虑的索引,而使用IGNORE INDEX是选择不被考虑的索引。
SELECT * FROM TABLE1 IGNORE INDEX (FIELD1, FIELD2) …
在上面的SQL语句中,TABLE1表中只有FIELD1和FIELD2上的索引不被使用。
强迫使用某一个索引
上面的两个例子都是给MySQL提供一个选择,也就是说MySQL并不一定要使用这些索引。而有时我们希望MySQL必须要使用某一个索引(由于MySQL在查询时只能使用一个索引,因此只能强迫MySQL使用一个索引)。这就需要使用FORCE INDEX来完成这个功能。
SELECT * FROM TABLE1 FORCE INDEX (FIELD1) …
以上的SQL语句只使用建立在FIELD1上的索引,而不使用其它字段上的索引。
3. 使用临时表提供查询性能
当我们查询的结果集中的数据比较多时,可以通过SQL_BUFFER_RESULT.选项强制将结果集放到临时表中,这样就可以很快地释放MySQL的表锁(这样其它的SQL语句就可以对这些记录进行查询了),并且可以长时间地为客户端提供大记录集。
SELECT SQL_BUFFER_RESULT * FROM TABLE1 WHERE …
和SQL_BUFFER_RESULT.选项类似的还有SQL_BIG_RESULT,这个选项一般用于分组或DISTINCT关键字,这个选项通知MySQL,如果有必要,就将查询结果放到临时表中,甚至在临时表中进行排序。
SELECT SQL_BUFFER_RESULT FIELD1, COUNT(*) FROM TABLE1 GROUP BY FIELD1
六、 结论
在程序设计中同样存在一个“二八原则”,即20%的代码用去了80%的时间。数据库应用程序的开发亦然。数据库应用程序的优化,重点在于SQL的执行效率。而数据查询优化的重点,则是使得数据库服务器少从磁盘中读数据以及顺序读页而不是非顺序读页。
推荐教程:《MySQL教程》
The above is the detailed content of How to optimize database query performance? (detailed explanation). For more information, please follow other related articles on the PHP Chinese website!