
How to create high-performance indexes for MySQL

WBOY
2023-04-17 18:13:06

    1 Index Basics

    1.1 Index Function

    In MySQL, a lookup first finds the matching value in the index and then uses the matching index record to locate the corresponding data row. Suppose you run the following query:

    SELECT * FROM USER WHERE uid = 5;

    If there is an index on uid, MySQL uses it to find the entries with uid 5. In other words, MySQL first searches the index by value and then returns all rows containing that value.

    1.2 Commonly used data structures for MySQL indexes

    MySQL indexes are implemented at the storage engine level, not at the server level. There is therefore no unified indexing standard: indexes work differently in different storage engines.

    1.2.1 B-Tree

    Most MySQL storage engines support B-Tree indexes. Even when several storage engines support the same index type, their underlying implementations may differ. For example, InnoDB uses a B+Tree.

    Storage engines implement B-Tree indexes in different ways, each with its own performance characteristics and advantages. For example, MyISAM uses prefix compression to make indexes smaller, while InnoDB stores values in their original format. A MyISAM index refers to indexed rows by their physical storage location, while InnoDB refers to them by their primary key.

    All values in a B-Tree are stored in order, and every leaf page is the same distance from the root. The following figure roughly shows how an InnoDB index works. MyISAM uses a different structure, but the basic implementation is similar.

    [Figure: B-Tree index structure]

    Description of the example figure:

    Each node occupies one disk block. A node holds two keys in ascending order and three pointers to subtrees; each pointer stores the address of the disk block that contains the child node. The two keys divide the key space into three ranges, which correspond to the data ranges of the subtrees that the three pointers point to. Taking the root node as an example, the keys are 16 and 34: the subtree under pointer P1 holds values less than 16, the subtree under P2 holds values between 16 and 34, and the subtree under P3 holds values greater than 34. The search for key 28 proceeds as follows:

    • Locate disk block 1 through the root node and read it into memory. [1st disk I/O]

    • Key 28 falls in the interval (16, 34), so follow pointer P2 of disk block 1.

    • Locate disk block 3 through pointer P2 and read it into memory. [2nd disk I/O]

    • Key 28 falls in the interval (25, 31), so follow pointer P2 of disk block 3.

    • Locate disk block 8 through pointer P2 and read it into memory. [3rd disk I/O]

    • Key 28 is found in the key list of disk block 8.
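
    The search steps above can be sketched in a few lines of Python. This is only an illustrative model, not InnoDB code: the Node class, the search function, and the leaf helper are hypothetical names, each node visit stands in for one disk read, and every key other than 16, 34, 25, 31, and 28 is made up to fill out the tree.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One tree node, standing in for one disk block."""
    block_id: int
    keys: List[int]  # keys in ascending order
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def search(root: Node, target: int) -> Tuple[bool, List[int]]:
    """Return (found, ids of the blocks read); each visit models one disk I/O."""
    visited: List[int] = []
    node = root
    while True:
        visited.append(node.block_id)
        if target in node.keys:
            return True, visited
        if not node.children:
            return False, visited
        # The number of keys smaller than the target selects the pointer:
        # 0 -> P1, 1 -> P2, 2 -> P3.
        slot = sum(1 for key in node.keys if key < target)
        node = node.children[slot]


def leaf(block_id: int, *keys: int) -> Node:
    """Build a node without children."""
    return Node(block_id, list(keys))


# Keys 16/34 and 25/31 and the position of 28 come from the description above;
# every other key and block number is made up for illustration.
root = Node(1, [16, 34], [
    Node(2, [6, 11], [leaf(5, 2, 4), leaf(6, 8, 9), leaf(10, 13, 15)]),
    Node(3, [25, 31], [leaf(7, 18, 21), leaf(8, 27, 28), leaf(9, 32, 33)]),
    Node(4, [50, 70], [leaf(11, 40, 45), leaf(12, 55, 60), leaf(13, 80, 90)]),
])

found, blocks_read = search(root, 28)
print(found, blocks_read)  # True [1, 3, 8]
```

    The list of blocks read has three entries, matching the three disk I/O operations in the walkthrough. A deeper tree would add one read per extra level.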

    Disadvantages:

    • Each node stores data as well as keys, and the storage space of each page is limited. If the data is large, each node can hold fewer keys;

    • When a large amount of data is stored, the tree becomes deeper, which increases the number of disk I/O operations per query and hurts query performance.

    1.2.2 B+Tree Index

    A B+Tree is a variant of the B-Tree. The difference is that a B+Tree stores data only in its leaf nodes; non-leaf nodes store only keys and pointers.

    A B+Tree keeps two head pointers: one points to the root node and the other to the leaf node with the smallest key. All leaf nodes (that is, the data nodes) are linked to one another in a chain. A B+Tree therefore supports two kinds of search: a range search on the primary key, and a random search that starts from the root node.

    A B* tree is similar to a B+Tree. The difference is that a B* tree also links its non-leaf nodes in a chain.

    1.2.3 Hash Index

    A hash index is built on a hash table and is useful only for queries that exactly match every column in the index. For each row, the storage engine computes a hash code over all indexed columns. The hash code is a small value, and rows with different key values will usually produce different hash codes. The hash index stores the hash codes in the index, and the hash table keeps a pointer to each data row.

    In MySQL, the hash index is the default index type only for the Memory engine, which also supports B-Tree indexes. The Memory engine supports non-unique hash indexes: if several rows have the same hash value, the index stores their row pointers in the same hash entry as a linked list, much like HashMap.

    Advantages:

    The index only needs to store the hash values, so its structure is very compact and lookups are very fast.

    Disadvantages:

    • Hash storage requires loading all the data into memory, which consumes more memory;

    • The data in a hash index is not stored in order, so the index cannot be used for sorting;

    • If every query is an equality query, hashing is indeed very fast. In real production workloads, however, range queries are more common than equality queries, so hashing is a poor fit;

    • If there are many hash collisions, index maintenance becomes expensive. This is the same collision problem that later versions of HashMap mitigate by adding red-black trees;
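
    To make the trade-offs above concrete, here is a minimal Python sketch of a hash index with chained entries. It is a toy model rather than the Memory engine's implementation: the HashIndex class, its method names, and the deliberately weak hash function are all assumptions chosen so that a collision is easy to reproduce.

```python
from typing import Dict, List, Tuple


class HashIndex:
    """Toy hash index: slot number -> chain of (key, row pointer) entries."""

    def __init__(self, slots: int = 8) -> None:
        self.slot_count = slots
        self.table: Dict[int, List[Tuple[str, int]]] = {}

    def slot_of(self, key: str) -> int:
        # Deliberately weak hash so that collisions are easy to trigger.
        return sum(ord(ch) for ch in key) % self.slot_count

    def add(self, key: str, row_id: int) -> None:
        self.table.setdefault(self.slot_of(key), []).append((key, row_id))

    def lookup(self, key: str) -> List[int]:
        """Equality lookup: hash once, then walk a single chain."""
        chain = self.table.get(self.slot_of(key), [])
        return [row_id for k, row_id in chain if k == key]

    def range_scan(self, low: str, high: str) -> List[int]:
        """Range lookup: slots are unordered, so every entry is examined."""
        return sorted(
            row_id
            for chain in self.table.values()
            for k, row_id in chain
            if low <= k <= high
        )


index = HashIndex()
index.add("ab", 1)
index.add("ba", 2)  # same characters as "ab", so it lands in the same slot
index.add("ab", 3)  # non-unique key: a second pointer in the same chain

print(index.lookup("ab"))            # [1, 3]
print(index.range_scan("aa", "az"))  # [1, 3]
```

    The equality lookup touches one chain, while the range scan has to visit every chain because the slots carry no ordering. The longer the chains grow through collisions, the more work each lookup and each update has to do.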

    2 High-performance index strategy

    2.1 Clustered index and non-clustered index

    Clustered index

    A clustered index is not a separate index type but a way of storing data. In the InnoDB storage engine, the clustered index stores the key values and the data rows in the same structure. When a table has a clustered index, its data rows are stored in the leaf pages of the index. Because a data row cannot be stored in two places at once, a table can have only one clustered index (covering indexes can emulate multiple clustered indexes).

    Advantages of a clustered index:

    • Related data can be kept together;

    • Data access is faster, because the index and the data are stored in the same tree;

    • Queries that use covering index scans can use the primary key values stored in the leaf nodes directly;

    Disadvantages:

    • Clustering helps most with I/O-bound workloads. If all the data fits in memory, a clustered index offers little advantage;

    • Insert speed depends heavily on insertion order. Inserting rows in primary key order is the fastest;

    • Updating a clustered index column is expensive, because each updated row has to be moved to a new location;

    • A table with a clustered index may suffer page splits when new rows are inserted, or when the primary key is updated and rows have to be moved;

    • A clustered index can make full table scans slower, especially when rows are sparsely packed or stored non-contiguously because of page splits;

    Non-clustered index

    In a non-clustered index, the data file and the index file are stored separately.

    2.2 Prefix index

    Sometimes you need to index very long strings, which makes the index large and slow. In that case you can usually index only the first part of the column, which saves a lot of index space and improves index efficiency, but it also reduces index selectivity. Index selectivity is the ratio of the number of distinct indexed values (also called the cardinality) to the total number of rows in the table (#T), and it ranges from 1/#T to 1. The more selective an index is, the more efficient queries are, because a highly selective index lets MySQL filter out more rows during a lookup.

    In general, the prefix of a column is selective enough to give good query performance. For BLOB and TEXT columns, and for very long VARCHAR columns, you must use a prefix index, because MySQL does not allow indexing the full length of these columns. The trick is to choose a prefix that is long enough to give good selectivity but no longer than necessary.

    Example

    The table structure and data can be downloaded from the MySQL official website or GitHub.

    city table columns

    Field name     Meaning
    city_id        City primary key ID
    city           City name
    country_id     Country ID
    last_update    Creation or last update time

    -- Compute the selectivity of prefixes of different lengths
    select count(distinct left(city,3))/count(*) as sel3,
        count(distinct left(city,4))/count(*) as sel4,
        count(distinct left(city,5))/count(*) as sel5,
        count(distinct left(city,6))/count(*) as sel6,
        count(distinct left(city,7))/count(*) as sel7,
        count(distinct left(city,8))/count(*) as sel8
    from citydemo;

    As the results show, once the prefix length reaches 7, increasing it further improves selectivity only marginally. A prefix length of 7 is therefore the best choice for this prefix index.
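
    The same calculation can be tried out on a small in-memory sample. The Python sketch below mirrors the count(distinct left(city, n))/count(*) expression; the function names, the 0.01 tolerance, and the list of city names are assumptions made for illustration and are not taken from the citydemo table.

```python
from typing import List


def prefix_selectivity(values: List[str], length: int) -> float:
    """Distinct prefixes of the given length divided by the number of rows."""
    return len({value[:length] for value in values}) / len(values)


def shortest_good_prefix(values: List[str], max_len: int,
                         tolerance: float = 0.01) -> int:
    """Smallest prefix length whose selectivity is within `tolerance`
    of the selectivity of the full column."""
    full = len(set(values)) / len(values)
    for length in range(1, max_len + 1):
        if full - prefix_selectivity(values, length) <= tolerance:
            return length
    return max_len


# Hypothetical sample data; "London" appears twice on purpose.
cities = ["Santiago", "Santander", "Santa Fe", "San Juan",
          "Salamanca", "Salzburg", "London", "London"]

for n in range(3, 7):
    print(n, prefix_selectivity(cities, n))

print(shortest_good_prefix(cities, 8))  # 6
```

    On this sample the selectivity climbs from 0.375 at length 3 to 0.875 at length 6, which equals the selectivity of the full column, so a longer prefix would only make the index larger.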

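    2.3 Back-to-table lookups

    To understand back-to-table lookups, you first need to understand clustered indexes and ordinary indexes. The clustered index is the primary key index defined when the table is created. If no primary key is defined, MySQL automatically uses the first unique index whose columns are all non-null. If there is no such index either, InnoDB creates a hidden row ID to serve as the index (in Oracle the row ID is exposed explicitly and can be used for pagination). An ordinary index is an index created on an ordinary column. The leaf nodes of an ordinary index store the primary key rather than the whole row. A lookup through an ordinary index first searches that index's B+ tree for the primary key, and then searches the primary key's B+ tree for the row. This second lookup is the back-to-table lookup.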
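
    The two-step lookup can be modelled with two Python dictionaries. This is a rough sketch under simplifying assumptions: plain dicts stand in for the two B+ trees, and the rows, the age_index name, and the find_by_age function are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, int]  # (uid, uname, age)

# Clustered index: primary key -> full row.
clustered: Dict[int, Row] = {
    1: (1, "zhang", 18),
    2: (2, "li", 25),
    3: (3, "wang", 18),
}

# Ordinary index on age: indexed value -> primary keys, not whole rows.
age_index: Dict[int, List[int]] = {18: [1, 3], 25: [2]}


def find_by_age(age: int) -> Tuple[List[Row], int]:
    """Return the matching rows and the number of back-to-table lookups."""
    rows: List[Row] = []
    back_to_table = 0
    for pk in age_index.get(age, []):  # step 1: ordinary index yields primary keys
        rows.append(clustered[pk])     # step 2: clustered index yields the row
        back_to_table += 1
    return rows, back_to_table


print(find_by_age(18))  # ([(1, 'zhang', 18), (3, 'wang', 18)], 2)
```

    Every matching entry in the ordinary index costs one extra lookup in the clustered index, which is why a query that matches many rows through an ordinary index can become expensive.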

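    2.4 Covering index

    Covering indexes are especially useful in InnoDB. MySQL can read column data directly from an index: if the leaf nodes of the index already contain the data being queried, there is no need to go back to the table. An index that contains (covers) the values of all the columns a query needs is a covering index. Put simply, a covering index returns the column data from a single index lookup, without going back to the table.

    Table definition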

    CREATE TABLE `t_user` (
      `uid` int(11) NOT NULL AUTO_INCREMENT,
      `uname` varchar(255) DEFAULT NULL,
      `age` int(11) DEFAULT NULL,
      `update_time` datetime DEFAULT NULL,
      PRIMARY KEY (`uid`)
    ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;

    Example

    -- uid is the primary key, so for the query below the Extra column of EXPLAIN shows "Using index"
    explain select uid from t_user where uid = 1;

    Covering indexes are used most often with composite indexes. For example:

    explain select age,uname from t_user where age = 10 ;

    Without a composite index, this query has to go back to the table.

    Create the composite index and run the query again:

    create index index_user on t_user(age,uname);
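
    Whether a query can be answered from index_user alone comes down to a set comparison. The Python sketch below is a simplified check, not what the MySQL optimizer runs: is_covering is a hypothetical helper, and `referenced` is assumed to list every column the query touches (select list, WHERE, and ORDER BY).

```python
from typing import Iterable


def is_covering(referenced: Iterable[str],
                index_columns: Iterable[str],
                primary_key: Iterable[str]) -> bool:
    """True if every referenced column is stored in the secondary index.

    In InnoDB the leaf nodes of a secondary index also hold the primary key,
    so primary key columns count as available.
    """
    available = set(index_columns) | set(primary_key)
    return set(referenced) <= available


index_user = ["age", "uname"]
primary_key = ["uid"]

print(is_covering(["age", "uname"], index_user, primary_key))                 # True
print(is_covering(["uid", "age"], index_user, primary_key))                   # True
print(is_covering(["age", "uname", "update_time"], index_user, primary_key))  # False
```

    The last call returns False because update_time is stored only in the clustered index, so a query that reads it has to go back to the table.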


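    2.5 Index matching modes

    2.5.1 Leftmost prefix match

    With a composite index such as (age, uname), a query that uses the leftmost column of the index, alone or together with the columns that follow it, can match the index. A query that does not use the leftmost column does not use the index. For example:

    -- Uses the index
    explain select * from t_user where age=10 and uname='zhang';

    The following SQL does not use the index: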

    explain select * from t_user where  uname='zhang';
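
    The reason for the leftmost prefix rule becomes visible once the composite index is pictured as a sorted list. The Python sketch below is an illustrative model only: a sorted list of (age, uname, uid) tuples stands in for the B+ tree, the rows are made up, and the function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

Entry = Tuple[int, str, int]  # (age, uname, uid); rows are made up

# A composite index keeps its entries sorted by age first, then by uname.
index_user: List[Entry] = sorted([
    (10, "zhang", 1), (18, "wang", 2), (25, "chen", 3),
    (10, "li", 4), (18, "zhang", 5),
])


def by_age_and_uname(age: int, uname: str) -> Tuple[List[int], int]:
    """Return (uids, entries examined) using the sort order of the index."""
    start = bisect_left(index_user, (age, uname))  # binary search
    uids: List[int] = []
    examined = 0
    for entry in index_user[start:]:
        examined += 1
        if entry[:2] != (age, uname):
            break  # sorted order: no later entry can match
        uids.append(entry[2])
    return uids, examined


def by_uname(uname: str) -> Tuple[List[int], int]:
    """Return (uids, entries examined) when the leftmost column is missing."""
    # Entries with the same uname are scattered across different ages,
    # so the sort order does not help and every entry is checked.
    uids = [uid for _, name, uid in index_user if name == uname]
    return uids, len(index_user)


print(by_age_and_uname(10, "zhang"))  # ([1], 2)
print(by_uname("zhang"))              # ([1, 5], 5)
```

    With the leftmost column present, the binary search jumps straight to the matching entries. Without it, all five entries have to be examined, which is the index equivalent of a full scan.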

    2.5.2 Matching a column prefix

    The index can match the beginning of a column's value, for example like 'abc%'.

    2.5.3 Matching a range of values

    The index can find the rows that fall within a range.

    explain select * from t_user where age>18;

    2.5.4 Exact match on one column and range match on another

    The index can match the first column exactly and the second column partially.

    explain select * from t_user where age=18 and uname like 'zhang%';

    2.5.5 Index-only queries

    The query needs to access only the index, not the data rows. This is essentially a covering index. With the composite index index_user (age, uname), a query is index-only as long as it reads just age, uname, and the primary key uid:

    explain select uid,age,uname from t_user
                where age=18 and uname='zhang';

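    3 Index optimization best practices

    1. Avoid expressions on indexed columns in queries; do the computation in the application layer rather than in the database.

    -- Recommended
    select uid,age,uname from t_user where uid=1;

    -- Not recommended
    select uid,age,uname from t_user where uid+9=10;

    2. Query by primary key rather than by other indexes where possible, because a primary key lookup does not trigger a back-to-table lookup.

    3. Use prefix indexes; see 2.2 Prefix index.

    4. Use index scans for sorting. MySQL can produce ordered results in two ways: with a sort operation, or by scanning an index in order. If the type column of the EXPLAIN output is index, MySQL is using an index scan for the sort.

    Scanning the index itself is fast, because it only requires moving from one index record to the next. But if the index does not cover all the columns the query needs, MySQL has to go back to the table for every index record it scans. These lookups are mostly random I/O, so reading data in index order is usually slower than a sequential full table scan.

    MySQL can use the same index both to sort and to find rows. Where possible, design indexes so that they serve both purposes.

    MySQL can use an index to sort the result only when the column order of the index exactly matches the order of the ORDER BY clause and all columns are sorted in the same direction. If the query joins several tables, the index can be used for sorting only when every column in the ORDER BY clause belongs to the first table. The ORDER BY clause is subject to the same restriction as lookup queries: it must form a leftmost prefix of the index. Otherwise MySQL has to perform a sort operation and cannot use the index for sorting.

    For example, using a table whose structure and data can be downloaded from the MySQL official website or GitHub: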

    CREATE TABLE `rental` (
      `rental_id` int(11) NOT NULL AUTO_INCREMENT,
      `rental_date` datetime NOT NULL,
      `inventory_id` mediumint(8) unsigned NOT NULL,
      `customer_id` smallint(5) unsigned NOT NULL,
      `return_date` datetime DEFAULT NULL,
      `staff_id` tinyint(3) unsigned NOT NULL,
      `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      PRIMARY KEY (`rental_id`),
      UNIQUE KEY `rental_date` (`rental_date`,`inventory_id`,`customer_id`),
      KEY `idx_fk_inventory_id` (`inventory_id`),
      KEY `idx_fk_customer_id` (`customer_id`),
      KEY `idx_fk_staff_id` (`staff_id`),
      CONSTRAINT `fk_rental_customer` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`customer_id`) ON UPDATE CASCADE,
      CONSTRAINT `fk_rental_inventory` FOREIGN KEY (`inventory_id`) REFERENCES `inventory` (`inventory_id`) ON UPDATE CASCADE,
      CONSTRAINT `fk_rental_staff` FOREIGN KEY (`staff_id`) REFERENCES `staff` (`staff_id`) ON UPDATE CASCADE
    ) ENGINE=InnoDB AUTO_INCREMENT=16050 DEFAULT CHARSET=utf8mb4;

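    The rental table has an index named rental_date on (rental_date, inventory_id, customer_id). The following query uses the rental_date index for sorting:

    -- This query supplies a constant for the first index column and sorts by the second, so the two columns together form a leftmost prefix of the index
    explain select rental_id,staff_id from rental
    where rental_date='2005-05-25' order by inventory_id desc;

    -- The following query may not use the index. Its ORDER BY columns also form a leftmost prefix of the index, but when the range on rental_date matches most of the table, the optimizer can prefer a full table scan followed by a sort
    explain select rental_id,staff_id from rental
    where rental_date>'2005-05-25' order by rental_date,inventory_id;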

    5. union all, in, and or can all use an index, but in is recommended.

    explain select * from actor where actor_id = 1 union all select * from actor where actor_id = 2;
    explain select * from actor where actor_id in (1,2);
    explain select * from actor where actor_id = 1 or actor_id =2;

    6. Range columns can use an index. The range conditions are <, <=, >, >=, and between. A range column can use the index, but the columns after it in the index cannot; an index is used for at most one range column.
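
    The rule in item 6 can be written down as a short Python function. This is a simplified model of the rule as stated, not the optimizer's actual logic: usable_index_columns is a hypothetical helper, and each predicate is labelled by hand as "eq" (equality) or "range".

```python
from typing import Dict, List


def usable_index_columns(index_columns: List[str],
                         predicates: Dict[str, str]) -> List[str]:
    """Return the leading index columns that can narrow the index search.

    `predicates` maps a column name to "eq" or "range". Matching stops at
    the first column without a predicate, and after the first range column.
    """
    used: List[str] = []
    for column in index_columns:
        kind = predicates.get(column)
        if kind is None:
            break  # leftmost prefix is broken here
        used.append(column)
        if kind == "range":
            break  # columns after a range column are not used
    return used


rental_date_index = ["rental_date", "inventory_id", "customer_id"]

print(usable_index_columns(rental_date_index,
                           {"rental_date": "range", "inventory_id": "eq"}))
# ['rental_date']
print(usable_index_columns(rental_date_index,
                           {"rental_date": "eq", "inventory_id": "range",
                            "customer_id": "eq"}))
# ['rental_date', 'inventory_id']
```

    In the second call the equality on customer_id is not used for the index search, because it comes after the range column inventory_id.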

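    7. Do not index columns that are updated very frequently or that have low selectivity.

    • Updates modify the B+ tree, so indexing a frequently updated column significantly degrades database performance;

    • For low-selectivity attributes such as gender, an index is pointless because it cannot filter the data effectively;

    • As a rule of thumb, a column is worth indexing when its selectivity is above 80%. Selectivity can be computed as count(distinct(column_name))/count(*);

    8. Indexed columns should not allow NULL; otherwise queries may return unexpected results.

    9. When joining tables, it is best to join no more than three. The columns used in the join must have the same data type.

    10. Use limit whenever you can.

    11. Keep the number of indexes on a single table to five or fewer.

    12. A single (composite) index should not contain more than five columns.

    13. When creating indexes, avoid the following misconceptions:

    • The more indexes, the better;

    • Premature optimization, that is, optimizing without understanding the system.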

    4 Index monitoring

    show status like 'Handler_read%';

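    Parameter                Description
    Handler_read_first       Number of times the first entry in an index was read
    Handler_read_key         Number of times a row was read through an index
    Handler_read_last        Number of times the last entry in an index was read
    Handler_read_next        Number of times the next row was read in index order
    Handler_read_prev        Number of times the previous row was read in index order
    Handler_read_rnd         Number of times a row was read from a fixed position
    Handler_read_rnd_next    Number of times the next row was read from the data file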


    Statement:
    This article is reproduced from yisu.com. If there is any infringement, please contact admin@php.cn to have it removed.