
Detailed introduction to MySQL indexing and query optimization

不言
2019-01-07 11:04:36


The article "MySQL Query Analysis" describes how to use the MySQL slow query log and the explain command to locate MySQL performance bottlenecks. Once the bottleneck SQL statements have been located, the inefficient statements need to be analyzed and optimized. This article mainly discusses how MySQL indexes work and commonly used SQL query optimizations.

A simple comparison test

In the previous case, the only index on the c2c_zwdb.t_file_count table is its auto-increment id. With no index on the FFileName field, the query executes as follows:

[Figure: EXPLAIN output for the query on FFileName without an index]

In the figure above, type=ALL, key=NULL, rows=33777. This SQL uses no index and performs a very inefficient full table scan. If joins and other conditions are added, the database can consume a large amount of memory and slow down the front-end program.

Now add an index to the FFileName field:

alter table c2c_zwdb.t_file_count add index index_title(FFileName);

Run the same query again and the contrast is obvious:

[Figure: EXPLAIN output for the same query after adding index_title]

In this figure, type=ref, key=index_title, rows=1. This SQL uses the index index_title to look up a constant value, and only one row is scanned through the index.

Compared with the unindexed case, query efficiency improves markedly once the index is added.
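The comparison above can be reproduced with a sketch like the following. The original query appears only in the screenshots, so the file name value 'example.txt' is a hypothetical placeholder; the table, column, and index names come from the text. The trailing \G is the mysql command-line client terminator that prints the result vertically.

```sql
-- Before the index exists: type=ALL and key=NULL are expected (full table scan).
EXPLAIN SELECT * FROM c2c_zwdb.t_file_count
WHERE FFileName = 'example.txt'\G

-- Same statement as in the text; skip it if the index has already been added.
ALTER TABLE c2c_zwdb.t_file_count ADD INDEX index_title (FFileName);

-- After the index exists: type=ref and key=index_title are expected.
EXPLAIN SELECT * FROM c2c_zwdb.t_file_count
WHERE FFileName = 'example.txt'\G
```

The exact rows estimate depends on the data in the table, so treat the numbers in the figures as one sample rather than a fixed result.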

MySQL Index

The comparison above shows that indexes are the key to fast lookups. Building the right indexes is essential for MySQL to run efficiently. With a small amount of data, the lack of a suitable index has little impact, but as the data grows, performance drops sharply. When an index covers multiple columns (a combined index), the order of the columns matters a great deal, because MySQL can only search efficiently on a leftmost prefix of the index.

The following introduces several common MySQL index types.

Indexes are divided into single-column indexes and combined indexes. A single-column index contains only one column; a table can have several single-column indexes, but together they do not form a combined index. A combined index is one index that contains multiple columns.

1. MySQL index type

(1) Primary key index PRIMARY KEY

It is a special unique index that does not allow null values. The primary key index is generally created together with the table.

[Figure: CREATE TABLE statement that defines a primary key]

Of course, you can also use the ALTER command. Remember: a table can only have one primary key.
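Since the original statement is only available as an image, here is a minimal sketch of both ways to define a primary key. The table names t_user and t_user_no_pk and their columns are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table: the primary key index is created together with the table.
CREATE TABLE t_user (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(64)  NOT NULL,
    PRIMARY KEY (id)
);

-- Hypothetical table created without a primary key ...
CREATE TABLE t_user_no_pk (
    id   INT UNSIGNED NOT NULL,
    name VARCHAR(64)  NOT NULL
);

-- ... which is then added afterwards with the ALTER command.
ALTER TABLE t_user_no_pk ADD PRIMARY KEY (id);
```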

(2) Unique index UNIQUE

The values in a unique index column must be unique, but null values are allowed. For a combined index, the combination of column values must be unique. You can specify it when creating the table, or add it by modifying the table structure, such as:

ALTER TABLE table_name ADD UNIQUE (column)

(3) Ordinary index INDEX

This is the most basic index, it has no restrictions. You can specify it when creating the table, or you can modify the table structure, such as:

ALTER TABLE table_name ADD INDEX index_name (column)

(4) Combined index INDEX

Combined index, that is, an index contains multiple columns. You can specify it when creating the table, or you can modify the table structure, such as:

ALTER TABLE table_name ADD INDEX index_name (column1, column2, column3)

(5) Full-text index FULLTEXT

Full-text indexing (also called full-text search) is a key technology currently used by search engines. It uses techniques such as word segmentation to analyze the frequency and importance of keywords in the text, and then filters out the search results we want according to certain algorithmic rules.

It can be specified when creating the table, or added by modifying the table structure, such as:

ALTER TABLE table_name ADD FULLTEXT (column)
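The ALTER TABLE forms above each have a counterpart at table-creation time. The sketch below declares all five index types in one statement; the table t_article, its columns, and the index names are hypothetical. It assumes InnoDB on MySQL 5.6 or later, where FULLTEXT indexes are supported.

```sql
-- Hypothetical table showing each index type declared at creation time.
CREATE TABLE t_article (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    slug       VARCHAR(128) NOT NULL,
    author_id  INT UNSIGNED NOT NULL,
    created_at DATETIME     NOT NULL,
    title      VARCHAR(255) NOT NULL,
    body       TEXT         NOT NULL,
    PRIMARY KEY (id),                                -- primary key index
    UNIQUE KEY uk_slug (slug),                       -- unique index
    KEY idx_created (created_at),                    -- ordinary index
    KEY idx_author_created (author_id, created_at),  -- combined index
    FULLTEXT KEY ft_title_body (title, body)         -- full-text index
) ENGINE=InnoDB;

-- A full-text index is queried with MATCH ... AGAINST; the column list
-- must match the columns of the FULLTEXT index.
SELECT id, title
FROM t_article
WHERE MATCH(title, body) AGAINST('index optimization');
```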

2. Index structure and principle

MySQL commonly uses the B+Tree as its index structure, but the implementation differs between clustered and non-clustered indexes. This article does not discuss that difference for now.

b+ tree introduction

The b+ tree picture below can be seen in many places. I chose it here because I think it explains the index search process very well.

[Figure: b+ tree with disk blocks, data items, and pointers]

The figure above shows a b+ tree. Each light blue block is called a disk block. Each disk block contains several data items (shown in dark blue) and pointers (shown in yellow). For example, disk block 1 contains data items 17 and 35 and pointers P1, P2, and P3. P1 points to the disk block with values less than 17, P2 to the disk block with values between 17 and 35, and P3 to the disk block with values greater than 35.

The real data exists in leaf nodes, namely 3, 5, 9, 10, 13, 15, 28, 29, 36, 60, 75, 79, 90, 99. Non-leaf nodes do not store real data, only data items that guide the search direction. For example, 17 and 35 do not actually exist in the data table.

Search process

In the figure above, to find data item 29, disk block 1 is first loaded from disk into memory, which costs one IO. A binary search in memory determines that 29 lies between 17 and 35, which selects the P2 pointer of disk block 1. The time spent in memory is very short compared with a disk IO and can be ignored. Disk block 3 is then loaded from disk into memory through the disk address in the P2 pointer of disk block 1, which costs a second IO. Since 29 lies between 26 and 30, the P2 pointer of disk block 3 is selected, and disk block 8 is loaded into memory through that pointer, which costs a third IO. A binary search in memory then finds 29 and the query ends, after three IOs in total.

In practice, a 3-level b+ tree can represent millions of rows. If a lookup among millions of rows needs only three IOs, the performance gain is huge. Without an index, each data item could cost one IO, so millions of IOs would be needed in total, which is obviously far too expensive.

Properties

(1) The index field should be as small as possible.

From the search process of the b+ tree above, or from the fact that the real data lives in the leaf nodes, we know that the number of IOs depends on the height h of the b+ tree.

Assume that the data table currently holds N rows and that each disk block holds m data items. Then the tree height is $h = \log_{m+1} N$. For a fixed data volume N, the larger m is, the smaller h is.

Also, m = size of a disk block / size of a data item. The size of a disk block is the size of one data page, which is fixed. So the less space a data item occupies, the larger the number m of data items per block and the lower the height h of the tree. This is why each data item, that is, the index field, should be as small as possible. For example, int occupies 4 bytes, half of the 8 bytes of bigint.
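A small worked example makes the effect of the field size concrete. The numbers below are illustrative assumptions, not measurements: a 16 KB page (the InnoDB default), N = 10^7 rows, and a data item that consists only of the key, ignoring the pointers and page overhead that reduce the real fan-out.

```latex
% Illustrative only: 16 KB page, key bytes only, pointer and page overhead ignored.
\[
h = \log_{m+1} N, \qquad m = \frac{\text{page size}}{\text{data item size}}
\]
\[
\text{bigint (8 bytes):}\quad m = \frac{16384}{8} = 2048, \qquad
h = \log_{2049} 10^{7} \approx 2.11
\]
\[
\text{int (4 bytes):}\quad m = \frac{16384}{4} = 4096, \qquad
h = \log_{4097} 10^{7} \approx 1.94
\]
```

Rounding up to whole levels, the bigint key needs 3 levels under these assumptions while the int key fits in 2, which is one IO less per lookup.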

(2) The leftmost matching characteristic of the index.

When the data items of the b+ tree are composite data structures such as (name, age, sex), the b+ tree builds the search tree by comparing fields from left to right. For example, when retrieving data like (Zhang San, 20, F), the b+ tree first compares name to determine the next search direction. If the names are equal, it then compares age and sex in turn and finally reaches the requested data. When data without a name such as (20, F) arrives, the b+ tree does not know which node to check next, because name is the first comparison factor used when the search tree was built, and it must search by name first to know where to look next. When retrieving data like (Zhang San, F), the b+ tree can use name to choose the search direction, but the next field age is missing, so it can only find all the data whose name equals Zhang San and then filter for rows whose sex is F. This very important property is the leftmost matching characteristic of the index.

Several major principles for building an index

(1) Leftmost prefix matching principle

For a multi-column index, matching always starts from the first field of the index and continues to the right; fields in the middle cannot be skipped. For example, if you create a multi-column index (name, age, sex), the name field is matched first, then the age field, and then the sex field. MySQL keeps matching to the right until it encounters a range query (>, <, between, like), and then stops matching.

Generally, when creating a multi-column index, the most frequently used column in the where clause is placed on the far left.
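The matching rules can be checked with explain on a small table. The sketch below uses a hypothetical table t_person with the (name, age, sex) index from the text; the extra column city is an assumption added so that select * is not answered from the index alone. The comments describe the typical behavior, and the actual plan depends on the data and the MySQL version.

```sql
-- Hypothetical table with the combined index (name, age, sex) from the text.
CREATE TABLE t_person (
    id   INT UNSIGNED     NOT NULL AUTO_INCREMENT,
    name VARCHAR(32)      NOT NULL,
    age  TINYINT UNSIGNED NOT NULL,
    sex  CHAR(1)          NOT NULL,
    city VARCHAR(32)      NOT NULL,
    PRIMARY KEY (id),
    KEY idx_name_age_sex (name, age, sex)
);

-- All three index columns can be used for the lookup.
EXPLAIN SELECT * FROM t_person
WHERE name = 'Zhang San' AND age = 20 AND sex = 'F';

-- age is skipped: only name narrows the search, sex is filtered afterwards.
EXPLAIN SELECT * FROM t_person
WHERE name = 'Zhang San' AND sex = 'F';

-- The leading column name is missing: the index cannot be used for a lookup.
EXPLAIN SELECT * FROM t_person
WHERE age = 20 AND sex = 'F';

-- The range condition on age stops the matching: sex is not used for the lookup.
EXPLAIN SELECT * FROM t_person
WHERE name = 'Zhang San' AND age > 18 AND sex = 'F';
```

Comparing the key_len column across these plans is a convenient way to see how many index columns were actually used.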

Compare an example that violates the leftmost prefix matching principle with one that complies with it.

Example: Table c2c_db.t_credit_detail has an index (Flistid,Fbank_listid)

[Figure: index definition of table t_credit_detail]

SQL statements that do not comply with the leftmost prefix matching principle:

select * from t_credit_detail where Fbank_listid='201108010000199'\G

This SQL statement uses the second index field Fbank_listid directly and skips the first index field Flistid, which does not comply with the leftmost prefix matching principle. Use the explain command to view the execution plan of the statement, as shown below:

[Figure: EXPLAIN output for the query on Fbank_listid only]

As the figure above shows, the statement uses no index and performs an inefficient full table scan.

SQL statement that conforms to the leftmost prefix matching principle:

select * from t_credit_detail where Flistid='2000000608201108010831508721' and Fbank_listid='201108010000199'\G

This sql first uses the first field Flistid of the index, and then uses the second field Fbank_listid of the index. There is no skipping in the middle, which conforms to the leftmost prefix matching principle. Use the explain command to view the execution plan of the sql statement, as shown below:

[Figure: EXPLAIN output for the query on Flistid and Fbank_listid]

As the figure above shows, the statement uses the index and scans only one row.

Comparison shows that the efficiency of SQL statements that comply with the leftmost prefix matching principle is greatly improved compared to SQL statements that do not comply with this principle, from full table scan to constant scan.

(2) Try to select columns with high differentiation as indexes.

For example, we will choose the student number as the index, but not the gender.

(3) = and in can be out of order

For example, for a = 1 and b = 2 and c = 3, the conditions can be written in any order and an (a, b, c) index can still be used; the MySQL query optimizer rearranges them into a form that the index can recognize.
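A quick way to confirm this is to run explain on the same conditions written in different orders. The table t_abc and its index name below are hypothetical.

```sql
-- Hypothetical table with a combined index on (a, b, c).
CREATE TABLE t_abc (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    a  INT NOT NULL,
    b  INT NOT NULL,
    c  INT NOT NULL,
    d  INT NOT NULL,
    PRIMARY KEY (id),
    KEY idx_a_b_c (a, b, c)
);

-- Conditions written in index order.
EXPLAIN SELECT * FROM t_abc WHERE a = 1 AND b = 2 AND c = 3;

-- Same conditions in a different order, with IN on b:
-- the optimizer can still use idx_a_b_c.
EXPLAIN SELECT * FROM t_abc WHERE c = 3 AND a = 1 AND b IN (2, 4);
```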

(4) Index columns cannot take part in calculations; keep the columns "clean"

For example: Flistid+1 > '2000000608201108010831508721'. The reason is simple: the index stores the original column values, so if the index column takes part in a calculation, every indexed value must be computed before it can be compared. Obviously, the cost is too high.
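The usual fix is to move the calculation to the constant side of the comparison. The sketch below shows this on a hypothetical table t_order; the table, columns, and values are assumptions for illustration only.

```sql
-- Hypothetical table with indexes on amount and created_at.
CREATE TABLE t_order (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    amount     INT          NOT NULL,
    created_at DATETIME     NOT NULL,
    PRIMARY KEY (id),
    KEY idx_amount (amount),
    KEY idx_created (created_at)
);

-- Avoid: arithmetic on the indexed column prevents a range lookup on idx_amount.
SELECT id FROM t_order WHERE amount + 1 > 100;

-- Prefer: the column stays "clean" and the constant is adjusted instead.
SELECT id FROM t_order WHERE amount > 99;

-- Avoid: wrapping the indexed column in a function has the same effect.
SELECT id FROM t_order WHERE YEAR(created_at) = 2019;

-- Prefer: an equivalent range on the bare column.
SELECT id FROM t_order
WHERE created_at >= '2019-01-01' AND created_at < '2020-01-01';
```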

(5) Expand the index as much as possible, do not create a new index.

For example, there is already an index of a in the table, and now you want to add an index of (a, b), then you only need to modify the original index.
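One way to extend an index is to drop and re-create it in a single ALTER TABLE statement. The table t_ab and the index names below are hypothetical; on a large table this rebuild can take time, so it is only a sketch of the idea.

```sql
-- Hypothetical table that starts with a single-column index on a.
CREATE TABLE t_ab (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    a  INT NOT NULL,
    b  INT NOT NULL,
    PRIMARY KEY (id),
    KEY idx_a (a)
);

-- Replace idx_a with (a, b) instead of keeping both indexes:
-- queries on a alone can still use the leftmost prefix of idx_a_b.
ALTER TABLE t_ab
    DROP INDEX idx_a,
    ADD INDEX idx_a_b (a, b);
```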

Disadvantages of index

Although indexes can improve query efficiency, indexes also have their own shortcomings.

Additional overhead of index:

(1) Space: The index needs to occupy space;

(2) Time: Querying the index takes time;

(3) Maintenance: The index needs to be maintained (when data changes);

Cases where using an index is not recommended:

(1) Table with a small amount of data

(2) Space is tight

Commonly used optimization summary

There are many optimization statements, and there are many things that need to be paid attention to. Here are a few points based on the usual situation:

1. There is an index but it is not used. (not recommended)

(1) When the parameters of Like start with a wildcard character

Try to avoid Like parameters that start with a wildcard; otherwise the database engine gives up the index and performs a full table scan.

A SQL statement starting with a wildcard, for example: select * from t_credit_detail where Flistid like '%0'\G

[Figure: EXPLAIN output for the like '%0' query]

This is a full table scan that uses no index, and is not recommended.

A SQL statement that does not start with a wildcard, for example: select * from t_credit_detail where Flistid like '2%'\G

[Figure: EXPLAIN output for the like '2%' query]

Clearly, this query uses the index as a range search and is much more efficient than the statement that starts with a wildcard.

(2) When the where condition does not comply with the leftmost prefix principle

Examples have been given in the content of the leftmost prefix matching principle.

(3) Using the != or <> operators

Try to avoid the != and <> operators; otherwise the database engine may give up the index and perform a full table scan. Using > or < is more efficient.

select * from t_credit_detail where Flistid != '2000000608201108010831508721'\G

[Figure: EXPLAIN output for the != query]

(4) Index columns participate in the calculation

Try to avoid expression operations on fields in the where clause, which cause the engine to give up the index and perform a full table scan.

select * from t_credit_detail where Flistid +1 > '2000000608201108010831508722'\G

[Figure: EXPLAIN output for the query with a calculation on Flistid]

(5) Judge the null value of the field

Try to avoid null checks on fields in the where clause; otherwise the engine may give up the index and perform a full table scan (whether it does depends on the MySQL version and the data, so check with explain). For example: Inefficient: select * from t_credit_detail where Flistid is null;

You can set a default value of 0 on Flistid, make sure the Flistid column contains no null values, and then query like this: Efficient: select * from t_credit_detail where Flistid = 0;

(6) Use or to connect conditions

Try to avoid using or to connect conditions in the where clause; otherwise the engine may give up the index and perform a full table scan. For example: Inefficient: select * from t_credit_detail where Flistid = '2000000608201108010831508721' or Flistid = '10000200001';

You can replace the or query above with the following query: Efficient: select * from t_credit_detail where Flistid = '2000000608201108010831508721' union all select * from t_credit_detail where Flistid = '10000200001';

[Figure: EXPLAIN output for the union all query]

2. Avoid select *

During parsing, '*' is expanded into all the column names in turn. This is done by querying the data dictionary, which takes extra time.

So develop the good habit of selecting only the columns you need.
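As a small sketch using the t_credit_detail table and the two columns named earlier in the text, the query below lists only the columns it needs. Because both columns belong to the index (Flistid, Fbank_listid), MySQL may be able to answer it from the index alone, which explain typically reports as "Using index".

```sql
-- Name only the needed columns instead of using select *.
SELECT Flistid, Fbank_listid
FROM t_credit_detail
WHERE Flistid = '2000000608201108010831508721';
```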

3. Order by statement optimization

Any non-index items or calculation expressions in the Order by statement will reduce the query speed.

Method: 1. Rewrite the order by statement so that it uses an index;

  2. Create another index on the columns used;

  3. Absolutely avoid using expressions in the order by clause.
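The difference shows up in the Extra column of explain, where an extra sort step is reported as "Using filesort". The table t_score, its columns, and the index name below are hypothetical.

```sql
-- Hypothetical table with a combined index that matches the filter and the sort.
CREATE TABLE t_score (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    student_id INT UNSIGNED NOT NULL,
    score      INT          NOT NULL,
    PRIMARY KEY (id),
    KEY idx_student_score (student_id, score)
);

-- Rows for one student are already ordered by score inside the index,
-- so no separate sort should be needed.
EXPLAIN SELECT student_id, score FROM t_score
WHERE student_id = 42
ORDER BY score;

-- An expression in order by cannot use the index order; expect a filesort.
EXPLAIN SELECT student_id, score FROM t_score
WHERE student_id = 42
ORDER BY score + 0;
```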

4. GROUP BY statement optimization

Improve the efficiency of a GROUP BY statement by filtering out unneeded records before the GROUP BY.

Inefficient:

SELECT JOB, AVG(SAL)

FROM EMP

GROUP by JOB

HAVING JOB = 'PRESIDENT'

OR JOB = 'MANAGER'

Efficient:

SELECT JOB , AVG(SAL)

FROM EMP

WHERE JOB = 'PRESIDENT'

OR JOB = 'MANAGER'

GROUP by JOB

5. Use exists instead of in

In many cases, using exists instead of in is a good choice: select num from a where num in (select num from b) can be replaced with the following statement: select num from a where exists (select 1 from b where num = a.num)

6. Use varchar/nvarchar instead of char/nchar

Use varchar/nvarchar instead of char/nchar as much as possible. First, variable-length fields take less storage space. Second, for queries, searching within a relatively small field is obviously more efficient.

7. If you can use DISTINCT, you don’t need GROUP BY

SELECT OrderID FROM Details WHERE UnitPrice > 10 GROUP BY OrderID

can be changed to:

SELECT DISTINCT OrderID FROM Details WHERE UnitPrice > 10

8. If you can use UNION ALL, don’t use UNION

UNION ALL does not perform the duplicate-removal (SELECT DISTINCT) step that UNION does, which avoids a lot of unnecessary resource use.

9. Use columns of the same type when joining tables, and index them

If the application has many JOIN queries, make sure the Join fields in both tables are indexed. MySQL can then use its internal mechanism to optimize the Join SQL statement for you.

Moreover, the fields used for the Join should be of the same type. For example, if you join a DECIMAL field with an INT field, MySQL cannot use their indexes. For STRING types, the character sets also need to be the same (the character sets of the two tables may differ).
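A minimal sketch of this advice, using two hypothetical tables t_customer and t_invoice, keeps the join columns at exactly the same type and indexes both sides.

```sql
-- Hypothetical parent table: id is the join target and is indexed as the primary key.
CREATE TABLE t_customer (
    id   INT UNSIGNED NOT NULL,
    name VARCHAR(64)  NOT NULL,
    PRIMARY KEY (id)
);

-- Hypothetical child table: customer_id has the same type as t_customer.id
-- and carries its own index.
CREATE TABLE t_invoice (
    id          INT UNSIGNED  NOT NULL AUTO_INCREMENT,
    customer_id INT UNSIGNED  NOT NULL,
    total       DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_customer (customer_id)
);

-- Both join columns are INT UNSIGNED and indexed.
SELECT c.name, i.total
FROM t_customer AS c
JOIN t_invoice AS i ON i.customer_id = c.id;

-- If the types had drifted apart, they could be aligned like this:
-- ALTER TABLE t_invoice MODIFY customer_id INT UNSIGNED NOT NULL;
```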


Statement:
This article is reproduced from segmentfault.com.