Detailed explanation of how to optimize query speed when MySQL processes large amounts of data

怪我咯
2017-04-30 10:10:37

Recently, for work, I have been looking into ways to optimize select queries in MySQL. Readers who need this can refer to the following.

In the projects I have worked on, I found that once a MySQL table reaches millions of rows, the efficiency of ordinary SQL queries plummets, and with many conditions in the where clause the query speed becomes intolerable. I once tested a conditional query on an indexed table containing more than 4 million records, and it took as long as 40 seconds. A delay that long will drive any user crazy, so improving the efficiency of SQL queries is very important. The following are 30 SQL query optimization tips that circulate widely on the Internet. Several of them use SQL Server syntax, so check each one against your own MySQL version with explain before relying on it:

1. Try to avoid using the != or <> operators in the where clause; otherwise the engine may give up using the index and perform a full table scan.

2. To optimize the query, try to avoid full table scans. First, consider creating indexes on the columns involved in where and order by.
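As a minimal sketch of tip 2 in MySQL, the statements below create an index on the filter and sort columns and then check the plan with EXPLAIN. The table t_index_demo, its columns, and the index name idx_num_createdate are hypothetical and used only for illustration.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE t_index_demo (
    id         INT PRIMARY KEY AUTO_INCREMENT,
    num        INT NOT NULL DEFAULT 0,
    name       VARCHAR(50),
    createdate DATETIME NOT NULL
);

-- Put the column used in WHERE first, then the ORDER BY column,
-- so one index can serve both the filter and the sort.
CREATE INDEX idx_num_createdate ON t_index_demo (num, createdate);

-- EXPLAIN reports the chosen access path. In its output, the "key"
-- column names the index that was picked, and a "type" of ALL
-- means a full table scan.
EXPLAIN SELECT id FROM t_index_demo WHERE num = 10 ORDER BY createdate;
```

The plan depends on the data volume and statistics, so it is worth running EXPLAIN against a table of realistic size rather than an empty one.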

3. Try to avoid null checks on fields in the where clause, because the engine may give up using the index and perform a full table scan (MySQL itself can use an index for is null, but other engines may not), such as:
select id from t where num is null
You can set a default value of 0 on num, ensure that the num column contains no null values, and then query like this:
select id from t where num=0
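The change described in tip 3 could be applied roughly as follows in MySQL. This is a sketch under assumptions: the table t_null_demo and the index name idx_num are made up, and using 0 as a stand-in for "no value" only works if 0 is not already a meaningful value in the column.

```sql
-- Hypothetical table with a nullable column, for illustration only.
CREATE TABLE t_null_demo (
    id  INT PRIMARY KEY AUTO_INCREMENT,
    num INT NULL
);
INSERT INTO t_null_demo (num) VALUES (NULL), (10), (NULL);

-- Step 1: backfill existing NULLs so the NOT NULL constraint can be added.
UPDATE t_null_demo SET num = 0 WHERE num IS NULL;

-- Step 2: forbid NULLs from now on and default new rows to 0.
ALTER TABLE t_null_demo MODIFY num INT NOT NULL DEFAULT 0;

-- Step 3: index the column and query with an equality condition.
CREATE INDEX idx_num ON t_null_demo (num);
SELECT id FROM t_null_demo WHERE num = 0;
```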

4. Try to avoid using or to connect conditions in the where clause; otherwise the engine may give up using the index and perform a full table scan, such as:
select id from t where num=10 or num=20
You can query like this:
select id from t where num=10
union all
select id from t where num=20

5. The following query will also cause a full table scan, because the pattern starts with a percent sign:
select id from t where name like '%abc%'
To improve efficiency, you can consider full-text search.
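For the full-text search mentioned in tip 5, a minimal MySQL sketch might look like this. The table t_fulltext_demo and the index name ft_name are hypothetical. InnoDB supports FULLTEXT indexes from MySQL 5.6 onward. A full-text search matches whole words rather than arbitrary substrings, so it is not a drop-in replacement for like '%abc%'.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE t_fulltext_demo (
    id   INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(200),
    FULLTEXT KEY ft_name (name)
) ENGINE=InnoDB;

INSERT INTO t_fulltext_demo (name) VALUES
    ('abc widget'), ('widget for abc users'), ('unrelated row');

-- Finds rows whose name contains the word "abc" through the
-- full-text index instead of scanning every row with LIKE.
SELECT id
FROM t_fulltext_demo
WHERE MATCH(name) AGAINST('abc' IN NATURAL LANGUAGE MODE);
```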

6. in and not in should also be used with caution, because they can lead to a full table scan, such as:
select id from t where num in(1,2,3)
For continuous values, use between instead of in where possible:
select id from t where num between 1 and 3

7. Using parameters in the where clause can also cause a full table scan. SQL resolves local variables only at run time, while the optimizer must choose an access plan at compile time. The variable values are therefore still unknown when the plan is built and cannot be used as input for index selection. For example, the following statement will perform a full table scan:
select id from t where num=@num
You can force the query to use the index instead (this is SQL Server table hint syntax; in MySQL the counterpart is a force index hint placed after the table name):
select id from t with (index (index name)) where num=@num
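Because the hint in tip 7 is written in SQL Server syntax, here is a sketch of the MySQL form using FORCE INDEX. The table t_hint_demo and the index name idx_num are hypothetical. In practice it is worth checking first whether MySQL already picks the index without the hint.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE t_hint_demo (
    id  INT PRIMARY KEY AUTO_INCREMENT,
    num INT NOT NULL,
    KEY idx_num (num)
);

-- A MySQL user-defined variable standing in for the parameter.
SET @num = 10;

-- Plan without a hint, for comparison.
EXPLAIN SELECT id FROM t_hint_demo WHERE num = @num;

-- FORCE INDEX tells the optimizer to treat a table scan as very
-- expensive, so the named index is used whenever it is applicable.
EXPLAIN SELECT id FROM t_hint_demo FORCE INDEX (idx_num) WHERE num = @num;
```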

8. Try to avoid expression operations on fields in the where clause, which will cause the engine to give up using the index and perform a full table scan. For example:
select id from t where num/2=100
should be changed to:
select id from t where num=100*2

9. Try to avoid applying functions to fields in the where clause, which will cause the engine to give up using the index and perform a full table scan. For example:
select id from t where substring(name,1,3)='abc' -- ids whose name starts with abc
select id from t where datediff(createdate,'2005-11-30')=0 -- ids created on 2005-11-30
should be changed to:
select id from t where name like 'abc%'
select id from t where createdate>='2005-11-30' and createdate<'2005-12-1'

10. Do not apply functions, arithmetic operations, or other expressions to the left side of "=" in the where clause; otherwise the system may not be able to use the index correctly.

11. When an indexed field is used as a condition and the index is a composite index, the first field of the index must appear in the condition for the system to use the index; otherwise the index will not be used. Keep the field order consistent with the index order as far as possible.
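The leading-column rule in tip 11 can be tried out with a sketch like the one below. The table orders_demo, its columns, and the index name idx_customer_status are all hypothetical. The comments describe what the optimizer typically does; the actual plan depends on the MySQL version and the data.

```sql
-- Hypothetical table with a composite index, for illustration only.
CREATE TABLE orders_demo (
    id          INT PRIMARY KEY AUTO_INCREMENT,
    customer_id INT NOT NULL,
    status      TINYINT NOT NULL,
    created_at  DATETIME NOT NULL,
    KEY idx_customer_status (customer_id, status)
);

-- Both index columns in the condition: the index can be used.
EXPLAIN SELECT created_at FROM orders_demo
WHERE customer_id = 42 AND status = 1;

-- Leading column only: the index can still be used.
EXPLAIN SELECT created_at FROM orders_demo
WHERE customer_id = 42;

-- Leading column missing: the index cannot be used for a direct
-- lookup, so this typically falls back to a full table scan.
EXPLAIN SELECT created_at FROM orders_demo
WHERE status = 1;
```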

12. Do not write meaningless queries. For example, if you need to generate an empty table structure:
select col1,col2 into #t from t where 1=0
This type of code returns no result set but still consumes system resources. It should be changed to this:
create table #t(...)

13. In many cases, using exists instead of in is a good choice:
select num from a where num in(select num from b)
Replace with the following statement:
select num from a where exists(select 1 from b where num=a.num)

14. Not all indexes are effective for queries. SQL optimizes queries based on the data in the table, and when an indexed column contains a large amount of duplicate data, the query may not use the index. For example, if a table has a field sex whose values are split almost evenly between male and female, an index on sex will do nothing for query efficiency.
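One rough way to judge whether a column is worth indexing, in the spirit of tip 14, is to compare the number of distinct values with the number of rows. This is a sketch only: the table users_demo and its columns are hypothetical, and the ratio is a rule of thumb rather than a threshold MySQL itself applies.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE users_demo (
    id    INT PRIMARY KEY AUTO_INCREMENT,
    sex   CHAR(1) NOT NULL,
    email VARCHAR(100) NOT NULL
);
INSERT INTO users_demo (sex, email) VALUES
    ('M', 'a@example.com'), ('F', 'b@example.com'),
    ('M', 'c@example.com'), ('F', 'd@example.com');

-- Distinct values divided by total rows: a ratio near 0 means many
-- duplicates (a poor index candidate), near 1 means almost unique.
SELECT COUNT(DISTINCT sex)   / COUNT(*) AS sex_selectivity,
       COUNT(DISTINCT email) / COUNT(*) AS email_selectivity
FROM users_demo;

-- For existing indexes, the Cardinality column gives the
-- optimizer's own estimate of the number of distinct values.
SHOW INDEX FROM users_demo;
```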

15. More indexes are not always better. An index improves the efficiency of the corresponding select, but it also reduces the efficiency of insert and update, because the index may have to be rebuilt on insert or update. How to build indexes therefore needs careful, case-by-case consideration. It is best to have no more than 6 indexes on a table; if there are more, consider whether the indexes on rarely used columns are really necessary.

16. Avoid updating clustered index columns as much as possible, because the order of the clustered index columns is the physical storage order of the table records. Once a column value changes, the order of the table records must be adjusted, which consumes considerable resources. If the application needs to update clustered index columns frequently, consider whether the index should be a clustered index at all.

17. Try to use numeric fields. If a field contains only numeric information, try not to design it as a character field, which reduces query and join performance and increases storage overhead. This is because the engine compares a string character by character when processing queries and joins, whereas a numeric type needs only one comparison.

18. Use varchar/nvarchar instead of char/nchar as much as possible. First, variable-length fields take up less storage space. Second, searching within a smaller field is clearly more efficient.

19. Do not use select * from t anywhere, replace "*" with a specific field list, and do not return any unused fields.
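One concrete benefit of listing only the needed columns, as tip 19 recommends, is that MySQL may be able to answer the query from an index alone. The sketch below assumes a hypothetical table t_cover_demo and index idx_num_name. The Extra values mentioned in the comments are what EXPLAIN typically reports, not a guarantee.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE t_cover_demo (
    id   INT PRIMARY KEY AUTO_INCREMENT,
    num  INT NOT NULL,
    name VARCHAR(50),
    note TEXT,
    KEY idx_num_name (num, name)
);

-- Only indexed columns are requested, so the index can cover the
-- query; EXPLAIN typically shows "Using index" in the Extra column.
EXPLAIN SELECT num, name FROM t_cover_demo WHERE num = 10;

-- SELECT * also needs the note column, so MySQL has to read the
-- full row for every match and cannot use the index alone.
EXPLAIN SELECT * FROM t_cover_demo WHERE num = 10;
```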

20. Try to use table variables instead of temporary tables. If the table variable contains a large amount of data, be aware that the indexes are very limited (only primary key indexes).

21. Avoid frequent creation and deletion of temporary tables to reduce the consumption of system table resources.

22. Temporary tables are not unusable. Using them appropriately can make certain routines more efficient, for example when you need to repeatedly reference a data set in a large table or a frequently used table. For one-time events, however, it is better to use a derived table.

23. When creating a temporary table, if a large amount of data is inserted at once, you can use select into instead of create table to avoid generating a large volume of logs and so increase speed. If the amount of data is small, create the table first and then insert, to ease the load on system table resources.

24. If temporary tables are used, be sure to explicitly delete all of them at the end of the stored procedure: truncate table first, then drop table. This avoids locking system tables for a long time.

25. Try to avoid using cursors, because cursors are less efficient. If a cursor operation involves more than 10,000 rows of data, you should consider rewriting it.

26. Before using the cursor-based method or the temporary table method, you should first look for a set-based solution to solve the problem. The set-based method is usually more effective.
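To illustrate the set-based approach from tip 26, the sketch below applies every payment to its account in a single statement, instead of looping over payments with a cursor. The tables accounts_demo and payments_demo are hypothetical.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE accounts_demo (
    id      INT PRIMARY KEY,
    balance DECIMAL(12,2) NOT NULL
);
CREATE TABLE payments_demo (
    id         INT PRIMARY KEY AUTO_INCREMENT,
    account_id INT NOT NULL,
    amount     DECIMAL(12,2) NOT NULL,
    KEY idx_account (account_id)
);

INSERT INTO accounts_demo VALUES (1, 100.00), (2, 50.00);
INSERT INTO payments_demo (account_id, amount) VALUES
    (1, 10.00), (1, 5.00), (2, 20.00);

-- Set-based: total the payments per account once, then update all
-- matching accounts in one pass. No row-by-row loop is needed.
UPDATE accounts_demo a
JOIN (
    SELECT account_id, SUM(amount) AS total
    FROM payments_demo
    GROUP BY account_id
) p ON p.account_id = a.id
SET a.balance = a.balance + p.total;
```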

27. Like temporary tables, cursors are not unusable. Using FAST_FORWARD cursors with small data sets is often superior to other row-by-row processing methods, especially when several tables must be referenced to obtain the required data. Routines that include "totals" in a result set are usually faster than using a cursor. If development time permits, you can try both the cursor-based method and the set-based method to see which works better.

28. Set SET NOCOUNT ON at the beginning of all stored procedures and triggers, and set SET NOCOUNT OFF at the end. There is no need to send a DONE_IN_PROC message to the client after each statement of stored procedures and triggers.

29. Try to avoid returning large amounts of data to the client. If the amount of data is too large, you should consider whether the corresponding requirements are reasonable.
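When a large result really is needed, one common way to follow tip 29 is to fetch it in pages keyed on the primary key. This technique is not from the original list, and the table events_demo, the page size of 100, and the variable @last_id are hypothetical.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE events_demo (
    id      INT PRIMARY KEY AUTO_INCREMENT,
    payload VARCHAR(255)
);

-- First page: the 100 rows with the smallest ids.
SELECT id, payload FROM events_demo ORDER BY id LIMIT 100;

-- Later pages: remember the last id the client received and
-- continue from there. The value 100 is just a placeholder.
SET @last_id = 100;
SELECT id, payload
FROM events_demo
WHERE id > @last_id
ORDER BY id
LIMIT 100;
```

Filtering on id lets each page start from an index lookup, whereas a growing offset makes the server read and discard all the skipped rows.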

30. Try to avoid large transaction operations and improve the system's concurrency capability.
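One way to keep transactions small, as tip 30 suggests, is to split a bulk change into batches. The sketch below assumes autocommit is on, so that each statement is its own transaction. The table logs_demo, the cutoff date, and the batch size of 1000 are hypothetical.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE logs_demo (
    id         BIGINT PRIMARY KEY AUTO_INCREMENT,
    created_at DATETIME NOT NULL,
    KEY idx_created (created_at)
);

-- Delete old rows at most 1000 at a time instead of in one huge
-- statement, so locks are held only briefly.
DELETE FROM logs_demo
WHERE created_at < '2017-01-01'
ORDER BY id
LIMIT 1000;

-- Rows removed by the previous statement. The calling code would
-- repeat the DELETE until this returns 0.
SELECT ROW_COUNT();
```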
