
MySQL index knowledge point analysis

PHPz (reposted)
2023-05-27 20:38:35

    1 The concept of an index

    1.1 Definition

    In a relational database, an index is a separate, physical storage structure that sorts the values of one or more columns in a table. It consists of the values of one or more columns, together with a list of logical pointers to the data pages in the table that physically hold those values.
    An index works like a book's table of contents: you use the page numbers listed there to jump straight to the content you need. The database uses the index to find a specific value and then follows the pointer to the row that contains it. This makes SQL statements against the table run faster and gives quick access to specific information in the table.

    1.2 Types

    InnoDB supports three index types: ordinary indexes, unique indexes (a primary key index is a special unique index whose values cannot be NULL), and full-text indexes.

    • Ordinary (normal) index: also called a non-unique index. It places no restrictions on key values.
    • Unique index: key values must not repeat, although they may be NULL. A primary key index is a special unique index with one extra restriction: its key values cannot be NULL. It is created with PRIMARY KEY.
    • Full-text (FULLTEXT) index: intended for large text data such as articles, texts, and emails, where a single field may take several KB. If LIKE queries are too slow for full-text matching, you can create a full-text index instead. Only CHAR, VARCHAR, and TEXT columns can have full-text indexes. Both MyISAM and InnoDB support them.
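    The rules for unique and primary key indexes can be illustrated with a toy Python model. It shows the semantics described above, not how InnoDB implements constraints. The class name `ToyUniqueIndex` and the index names are hypothetical. The model relies on the fact that a MySQL unique index allows several rows to hold NULL, because one NULL is never equal to another.

```python
class ToyUniqueIndex:
    """Toy model of unique-key checks (hypothetical, not InnoDB code)."""

    def __init__(self, name: str, allow_null: bool = True):
        self.name = name
        self.allow_null = allow_null   # False models a PRIMARY KEY (NOT NULL)
        self.keys = set()
        self.null_count = 0

    def insert(self, key) -> bool:
        """Return True if the key is accepted, False if the insert is rejected."""
        if key is None:
            if not self.allow_null:
                return False           # primary key values cannot be NULL
            self.null_count += 1       # NULLs never collide with one another
            return True
        if key in self.keys:
            return False               # duplicate non-NULL key
        self.keys.add(key)
        return True


email_idx = ToyUniqueIndex("uk_email")                # ordinary unique index
pk_idx = ToyUniqueIndex("PRIMARY", allow_null=False)  # primary key index

print(email_idx.insert("a@example.com"))               # True
print(email_idx.insert(None), email_idx.insert(None))  # True True
print(email_idx.insert("a@example.com"))               # False: duplicate
print(pk_idx.insert(None))                             # False: NULL primary key
```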

    1.3 Purpose

    In one sentence:

    An index makes data retrieval more efficient and reduces the database's I/O cost.

    A question to keep in mind: an index trades space for time, but what data structure does it use, how much I/O does a query cost, and how is the data stored?

    2 The evolution of the index data structure: the B+ tree

    Let's trace how the B+ tree evolves, starting from the page.

    A page is the basic unit InnoDB uses to manage storage space. InnoDB stores the database's data in pages, and the page is also the basic unit of transfer between memory and disk: the database reads whole pages from disk into memory and flushes whole pages from memory back to disk.
    A page is 16 KB by default.

    Suppose we want to execute this SQL and get 10 records:

    SELECT * FROM INNODB_USER LIMIT 0 , 10;

    If each record is 4 KB, how many records can one page hold?

    16 KB divided by 4 KB gives 4 records (ignoring the space taken by the page header and other page metadata).

    Every record in a page has an important attribute called record_type:
    • 0: ordinary user record
    • 1: directory (index) record
    • 2: minimum record
    • 3: maximum record

    The figure below shows how records are laid out in a page:

    [Figure: record layout within a page]

    Each page stores its records in primary key order.

    Writing records in order means sequential I/O, which makes storing them cheap. But what is convenient for storage is inconvenient for queries: to find the last record, do we have to traverse the entire page?

    2.1 Question

    If we want to look up a single record, how can we find it quickly?

    • Think about the data structures we have learned: if the records in a page were connected to one another in some way, which structure would that suggest?

    • Connecting them solves the problem, and the structure is the linked list. Here is how records in the same page are connected:

    MySQL links the records in a page into a singly linked list ordered by primary key. A lookup by primary key can use binary search and is very fast. A lookup on a non-primary-key column can only walk the singly linked list, starting from the smallest record.
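    Here is a minimal Python sketch of the two kinds of lookup inside one page. It assumes a page is simply built from a list of `(pk, name)` rows. The class `ToyPage` and the `name` column are hypothetical. InnoDB itself runs its binary search over the slots of its page directory, and a sorted list of keys stands in for those slots here. The record_type values follow the list above.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Optional

USER, INFIMUM, SUPREMUM = 0, 2, 3  # record_type values from the list above


@dataclass
class Record:
    record_type: int
    pk: Optional[int] = None
    name: Optional[str] = None
    next: Optional["Record"] = None  # singly linked: each record points forward


class ToyPage:
    """A page whose user records sit between a minimum and a maximum record."""

    def __init__(self, rows):
        rows = sorted(rows)  # (pk, name) pairs, ordered by primary key
        self.records = [Record(USER, pk, name) for pk, name in rows]
        self.infimum = Record(INFIMUM)
        self.supremum = Record(SUPREMUM)
        chain = [self.infimum, *self.records, self.supremum]
        for prev, cur in zip(chain, chain[1:]):
            prev.next = cur
        self.pks = [r.pk for r in self.records]  # stand-in for the page directory

    def find_by_pk(self, pk):
        """Binary search on the sorted primary keys: O(log n) comparisons."""
        i = bisect_left(self.pks, pk)
        if i < len(self.pks) and self.pks[i] == pk:
            return self.records[i]
        return None

    def find_by_name(self, name):
        """'name' is not sorted, so walk the list from the minimum record.
        Returns (record or None, number of user records visited)."""
        visited, rec = 0, self.infimum.next
        while rec.record_type != SUPREMUM:
            visited += 1
            if rec.name == name:
                return rec, visited
            rec = rec.next
        return None, visited


page = ToyPage([(3, "carol"), (1, "alice"), (4, "dave"), (2, "bob")])
print(page.find_by_pk(3).name)       # carol
print(page.find_by_name("dave")[1])  # 4 records visited
```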

    How are multiple pages connected (when the records are in different pages)?

    MySQL links pages into a doubly linked list, so from any page we can reach both the next page and the previous one. Because we cannot jump directly to the page that holds a record, we have to start from the first page, walk down the doubly linked list, and search each page for the record as described above. This is a full table scan.

    [Figure: data pages connected by a doubly linked list]
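    A minimal Python sketch of such a full table scan follows. It assumes each page is a hypothetical `Page` object that holds sorted `(pk, name)` rows plus `prev`/`next` pointers. The number of pages read is what grows with the table. In a real database, each page that is not already in memory costs a disk read.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Page:
    rows: List[Tuple[int, str]]  # (pk, name) pairs sorted by pk
    prev: Optional["Page"] = field(default=None, repr=False, compare=False)
    next: Optional["Page"] = field(default=None, repr=False, compare=False)


def link_pages(chunks):
    """Build a doubly linked list of pages and return the first page."""
    pages = [Page(rows) for rows in chunks]
    for a, b in zip(pages, pages[1:]):
        a.next, b.prev = b, a
    return pages[0]


def full_scan(first, predicate):
    """No directory: start at the first page, follow next pointers, and
    test every row. Returns (matching rows, pages read)."""
    matches, pages_read, page = [], 0, first
    while page is not None:
        pages_read += 1
        matches.extend(row for row in page.rows if predicate(row))
        page = page.next
    return matches, pages_read


first = link_pages([[(1, "a"), (2, "b")], [(3, "c"), (4, "d")], [(5, "e"), (6, "f")]])
print(full_scan(first, lambda row: row[1] == "e"))  # ([(5, 'e')], 3)
```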

    2.2 Question

    As the number of pages keeps growing, what problems do queries run into, and how can we solve them?

    As the linked list grows, we still cannot jump directly to a record, so queries become slow. Looking closer, a slow query here comes down to two problems:

    • The query's time complexity is O(N).

    • It takes too many disk reads and writes (I/O operations).

    Think about how we find information on a particular page of a book: we check the table of contents. And what is a table of contents if not an index?

    Here is a sample table of contents found online:

    [Figure: a book's table of contents]

    It contains two very important pieces of information:

    • Content introduction (chapter title)

    • The page number

    Borrowing the idea of a book's table of contents, we can make queries fast:

    Add a directory to the data. When querying, first use the directory to find which page holds the record, then search only that page.

    2.3 Question: How do we build the directory? Do we need an entry for every page?

    Should the directory follow some rule? A dictionary's index, for example, is arranged alphabetically. What does that suggest? The primary key. MySQL's auto-increment primary key fits the bill: it is ordered, compact, and never repeats, which makes it a perfect directory key. For each page, we record its smallest primary key together with a pointer to the page. At query time we compare primary key values, use binary search to find the right directory entry quickly, and then follow it to the data.
    But do we need a directory entry for every data page? It seems we do. Without one, how could we locate the data in a page, other than by scanning every page?
    Yet if every page gets an entry, the directory pages multiply too, and traversing them one by one degrades query performance again.
    Can we create a directory for the directory?
    Yes. We can build a directory over the directory pages, extracting another level upward until we reach a single root node, which makes queries easier still.

    [Figure: the primary key index tree]

    Because this tree is organized by primary key, we call it the primary key index tree. Its leaf pages store all of the table's data, which is why people say that in MySQL the index is the data and the data is the index.

    This is the data structure of MySQL's B+ tree primary key index. Isn't deriving it step by step more memorable than memorizing it outright?
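    The derivation above can be sketched in Python. The sketch assumes tiny pages of three entries each (real pages are 16 KB) and represents pages as Python lists. Leaf pages hold `(pk, row)` pairs. Each directory entry holds the smallest key of a child page plus that child's position, and directory levels are added until a single root page remains. `build_tree` and `lookup` are hypothetical names.

```python
from bisect import bisect_right

PAGE_CAPACITY = 3  # entries per page, tiny so that several levels appear


def build_tree(rows):
    """Return levels bottom-up: levels[0] = leaf pages, levels[-1] = [root]."""
    rows = sorted(rows)
    leaves = [rows[i:i + PAGE_CAPACITY] for i in range(0, len(rows), PAGE_CAPACITY)]
    levels = [leaves]
    # one directory entry per child page: (smallest key in child, child index)
    entries = [(page[0][0], i) for i, page in enumerate(leaves)]
    while len(entries) > PAGE_CAPACITY:  # too many entries for one page
        pages = [entries[i:i + PAGE_CAPACITY] for i in range(0, len(entries), PAGE_CAPACITY)]
        levels.append(pages)
        entries = [(page[0][0], i) for i, page in enumerate(pages)]
    levels.append([entries])  # a single root page
    return levels


def lookup(levels, pk):
    """Descend from the root, binary-searching the directory at each level."""
    page = levels[-1][0]
    for depth in range(len(levels) - 1, 0, -1):
        keys = [k for k, _ in page]
        i = bisect_right(keys, pk) - 1  # last entry whose smallest key <= pk
        if i < 0:
            return None  # pk is smaller than every key in the table
        page = levels[depth - 1][page[i][1]]
    return next((row for k, row in page if k == pk), None)


levels = build_tree([(pk, f"row{pk}") for pk in range(1, 31)])
print(len(levels))         # 4: 10 leaf pages, 4 and 2 directory pages, 1 root
print(lookup(levels, 17))  # row17
```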

    2.4 The index tree: page splitting and merging

    We now have a way to improve query performance. But what problems arise when records are inserted, updated, or deleted?

    What happens when a new record is inserted in increasing key order? If the last page is full, do we have to open a new page?
    The pages must satisfy one condition:
    the primary key values of the user records in the next data page must be greater than those of the user records in the previous page. Because the keys arrive in increasing order, we can simply add a new page at the end of the doubly linked list of pages.
    What happens when a new record is inserted out of order?

    • A new page is opened, and the position where the new record belongs is located.

    • Existing records are moved to the new page, and the new record is placed in its ordered position.

    • Leaf-node data keeps getting shifted around.

    • The leaf pages split and merge, which in turn makes the upper-level non-leaf pages and the root split and merge.

    • As the saying goes, "pull one hair and the whole body moves". This is called a page split!
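    To make the contrast concrete, here is a minimal Python simulation of one level of leaf pages. It assumes a capacity of four records per page and a split that moves the upper half of a full page into a new page. InnoDB's real split and merge logic is more involved, and `ToyLeafLevel` and its counters are hypothetical. The simulation counts how many existing records get moved under increasing inserts and under decreasing (out-of-order) inserts.

```python
from bisect import bisect_right, insort

CAPACITY = 4  # records per page (tiny, for illustration)


class ToyLeafLevel:
    def __init__(self):
        self.pages = [[]]   # leaf pages in key order; each page is a sorted list
        self.new_pages = 0  # pages allocated after the first one
        self.moved = 0      # existing records copied into a new page by a split

    def insert(self, key):
        mins = [p[0] if p else float("-inf") for p in self.pages]
        i = max(bisect_right(mins, key) - 1, 0)  # page whose range covers key
        page = self.pages[i]
        if len(page) < CAPACITY:
            insort(page, key)
            return
        self.new_pages += 1
        if i == len(self.pages) - 1 and key > page[-1]:
            self.pages.append([key])  # ordered append: nothing has to move
            return
        half = len(page) // 2         # out of order: split the full page
        upper = page[half:]
        del page[half:]
        self.moved += len(upper)
        self.pages.insert(i + 1, upper)
        insort(page if key < upper[0] else upper, key)

    def keys(self):
        return [k for p in self.pages for k in p]


ordered, reverse = ToyLeafLevel(), ToyLeafLevel()
for k in range(1, 21):
    ordered.insert(k)
for k in range(20, 0, -1):
    reverse.insert(k)
print(ordered.moved, reverse.moved)  # 0 16
```

    Increasing keys only ever append a fresh page. Decreasing keys force a split, and a copy of existing records, on every other insert once the first page fills.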

    Summary: the problems encountered when inserting, updating, and deleting records

    Out-of-order inserts, updates that change primary key values, and deletions that empty index pages all require many adjustments to the tree. They trigger splits and merges of the leaf pages, which in turn trigger splits and merges of the upper-level non-leaf pages and the root page. This produces a lot of disk fragmentation and costs database performance. It explains why we should not build indexes on columns that are updated frequently, and should not update primary keys.

    Let us summarize:

    Clustered index:

    The primary key index tree is also called the clustered index. In InnoDB, a table has exactly one clustered index tree. If the table defines a primary key, the primary key index is the clustered index. The key values of the clustered index determine the physical storage order of the rows, and the clustered index stores all columns of the table in that sorted order. "The index is the data and the data is the index" refers to this primary key index tree.

    2.5 Some interview questions based on what we just derived

    Why is it best for the primary key ID to have an increasing trend?

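    You just read this, so surely you still remember: with ordered inserts, the primary key values of the user records in the next data page must be greater than those in the previous page. If the primary key trends upward, each new record goes at the end of the last page or into a newly appended page. It never triggers the page splits and merges that slow inserts down.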

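    How much data can a three-level B+ tree store?

    What this tests: the page size and the definition of a B+ tree.
    1 GB = 1024 MB, 1 MB = 1024 KB, 1 KB = 1024 bytes

    Answer:
    Given: one index entry (the logical unit in a non-leaf page) takes 16 bytes, so a 16 KB non-leaf page holds 16 × 1024 / 16 = 1024 entries. Assuming each row is about 1 KB, a 16 KB leaf page holds 16 rows. Three levels therefore hold 1024 × 1024 × 16 = 16 × 1024 × 1024 = 16,777,216 rows, well over ten million. So in InnoDB, a B+ tree only three levels deep can store tens of millions of rows.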
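    The same arithmetic can be written as a small Python function. It assumes 16-byte directory entries and rows of about 1 KB, and it ignores page headers. `btree_capacity` is a hypothetical helper. Real capacity depends on key types, row format, and how full pages are kept.

```python
def btree_capacity(levels=3, page_size=16 * 1024, entry_size=16, row_size=1024):
    """Rough number of rows a B+ tree of the given height can hold.
    Non-leaf pages hold page_size // entry_size directory entries;
    leaf pages hold page_size // row_size rows."""
    fanout = page_size // entry_size       # 1024 entries per non-leaf page
    rows_per_leaf = page_size // row_size  # 16 rows per leaf page
    return fanout ** (levels - 1) * rows_per_leaf


print(btree_capacity())               # 16777216
print(btree_capacity(row_size=4096))  # 4194304: 4 KB rows, 4 per page
```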

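    Why should large columns in MySQL be split out?

    A page holds 16 KB of data. Large columns take up a lot of storage space, so fewer rows fit in each page and more pages are needed to store the table. More pages mean a deeper tree, which means more disk I/O, lower performance, and slower queries. Moreover, a large column is stored in the index whether or not a query uses it, taking up a lot of memory and reducing how many rows each page can hold.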
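    Here is a rough Python sketch of how row size affects tree height. It assumes 16 KB pages and 16-byte directory entries, and it ignores page headers and fill factor. The row counts and sizes are illustrative, and `levels_needed` is a hypothetical helper.

```python
import math

PAGE_SIZE = 16 * 1024  # bytes per page
ENTRY_SIZE = 16        # assumed bytes per directory entry (key + page pointer)


def levels_needed(total_rows, row_size):
    """Smallest B+ tree height that fits total_rows rows of row_size bytes."""
    rows_per_leaf = max(PAGE_SIZE // row_size, 1)
    fanout = PAGE_SIZE // ENTRY_SIZE
    pages = math.ceil(total_rows / rows_per_leaf)  # leaf pages needed
    levels = 1
    while pages > 1:
        pages = math.ceil(pages / fanout)  # directory pages one level up
        levels += 1
    return levels


print(levels_needed(10_000_000, 1024))  # 3: 16 rows per leaf page
print(levels_needed(10_000_000, 8192))  # 4: only 2 rows per leaf page
```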

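    Why use a B+ tree?

    A B+ tree is a multi-way balanced search tree. Every query starts from the root node, and data is stored only in the leaf nodes. The root and the other non-leaf nodes store only index entries (keys and pointers). The leaf nodes are linked to one another, so data can be queried by key value and ranges can be read by walking along the leaves. This gives the B+ tree stronger full-database and full-table scanning, better sorting, stable query efficiency and performance, and greater storage capacity: a B+ tree with just three levels can store tens of millions of rows.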
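    The following minimal Python sketch shows why linked leaves matter. It assumes leaf pages of three keys each and uses a hypothetical `Leaf` class. A binary search over each leaf's largest key stands in for the root-to-leaf descent that finds the first leaf. After that, a range query only follows `next` pointers and never revisits the upper levels.

```python
from bisect import bisect_left


class Leaf:
    def __init__(self, keys):
        self.keys = keys  # sorted keys stored in this leaf page
        self.next = None  # pointer to the next leaf page


def make_leaves(keys, per_page=3):
    leaves = [Leaf(keys[i:i + per_page]) for i in range(0, len(keys), per_page)]
    for a, b in zip(leaves, leaves[1:]):
        a.next = b
    return leaves


def range_scan(leaves, lo, hi):
    """Return (keys in [lo, hi], leaf pages visited)."""
    maxes = [leaf.keys[-1] for leaf in leaves]
    i = bisect_left(maxes, lo)  # first leaf whose largest key >= lo
    leaf = leaves[i] if i < len(leaves) else None
    found, visited = [], 0
    while leaf is not None:
        visited += 1
        for k in leaf.keys:
            if k > hi:
                return found, visited  # past the range: stop early
            if k >= lo:
                found.append(k)
        leaf = leaf.next  # move sideways, no trip back up the tree
    return found, visited


leaves = make_leaves(list(range(1, 31)))
print(range_scan(leaves, 8, 14))  # ([8, 9, 10, 11, 12, 13, 14], 3)
```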

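    3 What is a secondary index tree?

    What we just looked at is the index built from the primary key. What if we are not querying by primary key, or the table has no primary key at all? We can also create a composite index on several columns (also called a combined or joint index; it is only a name, so call it whatever you like).

    An index tree built from the primary key is called the primary key index tree. An index tree built from other columns is called a secondary index tree.

    The following SQL creates a composite index: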

    ALTER TABLE INNODB_USER ADD INDEX
    SECOND_INDEX_AGE_USERNAME_PHONE(`age`, `user_name`, `phone`);

    It looks like a single index, but it takes effect whether you query on age; on age and user_name; or on age, user_name, and phone.
    You can think of it as having created these three indexes:

    ALTER TABLE INNODB_USER ADD INDEX
    SECOND_INDEX_AGE(`age`);
    ALTER TABLE INNODB_USER ADD INDEX
    SECOND_INDEX_AGE_USERNAME(`age`, `user_name`);
    ALTER TABLE INNODB_USER ADD INDEX
    SECOND_INDEX_AGE_USERNAME_PHONE(`age`, `user_name`, `phone`);

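    3.1 How is a secondary index tree sorted?

    First, we need to know whether the columns involved are of an inherently ordered type.

    If a column is of an ordered type, such as INT, its values are sorted in natural order: 1 2 3 4.
    If not, its values are sorted by the collation of the column's character set. We won't go deeper here; just be aware of it.

    Suppose we have a composite index (A, B, C). It sorts by the columns in the order in which you declared them:
    rows with the same A are sorted by B, rows with the same A and B are sorted by C, and rows where A, B, and C are all equal are sorted by the clustered index key.

    Our pages store records ordered by the composite index's columns in their declared order: age, user name, phone number.
    Apart from that, the data structure is the same as that of the primary key index tree.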
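    Here is a minimal Python sketch of this ordering, assuming hypothetical rows `(id, age, user_name, phone)` where `id` is the primary key. Each index entry is the indexed columns followed by the primary key. Python's tuple comparison therefore sorts by age, then user_name, then phone, and finally by id. Note that Python compares strings by code point, not by a MySQL collation.

```python
rows = [  # (id, age, user_name, phone); id is the primary key
    (4, 30, "bob", "555-0003"),
    (1, 25, "carol", "555-0001"),
    (3, 30, "alice", "555-0009"),
    (2, 30, "alice", "555-0009"),
    (5, 25, "carol", "555-0000"),
]


def secondary_entries(rows):
    """Entries of the composite index (age, user_name, phone): indexed columns
    first, primary key last, sorted column by column. The trailing id only
    matters when age, user_name and phone are all equal (ids 2 and 3 here)."""
    return sorted((age, name, phone, pk) for pk, age, name, phone in rows)


for entry in secondary_entries(rows):
    print(entry)
```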

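    3.2 What is the "index bridge" (the leftmost prefix rule)?

    Take the same index again: age, user name, phone number (age, user_name, phone).
    Its first column is age. For this index to be used, the query has to filter on age first, and then on user_name if it needs that column too.

    Can a query that filters only on user_name use the index?
    No. The index is sorted by age first, so you must match age first, then user_name, then phone. This is what we call the leftmost prefix rule (leftmost matching principle).

    The rule exists because records are stored in the order of the composite index's columns. That is also why people talk about an "index bridge": to use a composite index, you must cross it like a bridge, stepping on the first plank, then the second, and finally the third.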
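    Here is a minimal Python sketch of the leftmost prefix rule over a sorted list of hypothetical entries `(age, user_name, phone, id)`. A leftmost prefix such as `(30,)` or `(41, "alice")` can be located by binary search, because the matching entries are contiguous. `user_name` alone cannot, because equal names are scattered across different ages, so every entry has to be checked. The helper names are assumptions for illustration.

```python
from bisect import bisect_left

# Hypothetical entries of the index (age, user_name, phone), each ending with id.
ENTRIES = sorted([
    (25, "carol", "555-0001", 1),
    (30, "alice", "555-0009", 2),
    (30, "bob", "555-0003", 4),
    (41, "alice", "555-0007", 6),
    (41, "dave", "555-0002", 5),
])


def seek_prefix(entries, prefix):
    """Works for a leftmost prefix: binary-search to the first entry >= prefix,
    then read forward while the leading columns still equal the prefix."""
    i = bisect_left(entries, prefix)
    ids = []
    while i < len(entries) and entries[i][:len(prefix)] == prefix:
        ids.append(entries[i][-1])
        i += 1
    return ids


def scan_by_user_name(entries, name):
    """user_name alone is not a leftmost prefix, so check every entry.
    Returns (matching ids, entries examined)."""
    ids = [pk for _, user_name, _, pk in entries if user_name == name]
    return ids, len(entries)


print(seek_prefix(ENTRIES, (30,)))          # [2, 4]
print(seek_prefix(ENTRIES, (41, "alice")))  # [6]
print(scan_by_user_name(ENTRIES, "alice"))  # ([2, 6], 5)
```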

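    3.3 Table lookups, covering indexes, and index condition pushdown

    Three important concepts involve the secondary index tree: the table lookup (also called "going back to the table"), the covering index, and index condition pushdown.

    Table lookup: when the data we query is not in the secondary index tree, we must take the ID found there and look up the row in the primary key index tree.

    Covering index: when all the data we query is in the secondary index tree, it is returned directly. This is called a covering index.
    Index condition pushdown (ICP): introduced in MySQL 5.6 to optimize queries that need table lookups. Conditions on columns that are in the index are checked against the index entries first, so rows that fail them never trigger a table lookup.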
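    The following minimal Python sketch counts table lookups for the three cases. It assumes a hypothetical table whose clustered index is a dict from `id` to the full row, plus a secondary index on `(age, user_name, phone)`. The `name_filter` stands for a condition such as `user_name LIKE '%t'`, which cannot narrow the index range but can be checked against index entries. The sketch only models the counting; it is not how the MySQL server and InnoDB divide the work internally.

```python
# Hypothetical table: the clustered index maps id -> full row.
CLUSTERED = {
    1: {"id": 1, "age": 25, "user_name": "carol", "phone": "555-0001", "address": "A St"},
    2: {"id": 2, "age": 30, "user_name": "alice", "phone": "555-0009", "address": "B St"},
    3: {"id": 3, "age": 30, "user_name": "bob", "phone": "555-0003", "address": "C St"},
    4: {"id": 4, "age": 30, "user_name": "bert", "phone": "555-0004", "address": "D St"},
}
INDEX_COLS = ("age", "user_name", "phone", "id")  # indexed columns + primary key
SECONDARY = sorted(tuple(r[c] for c in INDEX_COLS) for r in CLUSTERED.values())


def run(age, columns, name_filter=None, icp=False):
    """Simulate WHERE age = ? [AND name_filter(user_name)].
    Returns (result rows, number of table lookups)."""
    results, lookups = [], 0
    for entry in SECONDARY:
        if entry[0] != age:
            continue  # a real engine would seek straight to this age range
        idx_row = dict(zip(INDEX_COLS, entry))
        if icp and name_filter and not name_filter(idx_row["user_name"]):
            continue  # ICP: rejected using the index entry, no table lookup
        if set(columns) <= set(INDEX_COLS):
            row = idx_row  # covering index: everything needed is in the entry
        else:
            row = CLUSTERED[idx_row["id"]]  # table lookup into the clustered index
            lookups += 1
        if name_filter and not name_filter(row["user_name"]):
            continue  # without ICP, this is the only place the filter is checked
        results.append({c: row[c] for c in columns})
    return results, lookups


def ends_with_t(name):  # stands in for user_name LIKE '%t'
    return name.endswith("t")


print(run(30, ["age", "user_name"])[1])                # 0: covering index
print(run(30, ["user_name", "address"])[1])            # 3: one lookup per match
print(run(30, ["address"], ends_with_t)[1])            # 3: filter after lookups
print(run(30, ["address"], ends_with_t, icp=True)[1])  # 1: filter pushed down
```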

    3.4 A few more interview questions

    Why don't columns with low dispersion use the index?

    What is dispersion? The more duplicate values a column has, the lower its dispersion; the fewer duplicates, the higher its dispersion.
    If all the values are the same, how would the index sort them? It can't order them in any useful way.
    When the B+ tree holds too many duplicate values and the MySQL optimizer estimates that using the index costs about as much as a full table scan, it will not use the index even though one exists. Whether an index is used is decided by the MySQL optimizer.
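    Dispersion is commonly measured as selectivity: the number of distinct values divided by the number of rows. Below is a minimal Python sketch with hypothetical data. The optimizer actually works from its own cost estimates and index statistics, so this ratio is only a rule of thumb.

```python
def selectivity(values):
    """Distinct values / total rows: 1.0 = all unique, near 0 = mostly duplicates."""
    values = list(values)
    return len(set(values)) / len(values) if values else 0.0


user_ids = range(1, 1001)     # unique, like a primary key
genders = ["M", "F"] * 500    # only two distinct values in 1000 rows
print(selectivity(user_ids))  # 1.0
print(selectivity(genders))   # 0.002
```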

    Are more indexes always better?

    Space: an index trades space for time and occupies disk space.
    Time: hitting an index speeds up queries. However, inserts, updates, and deletes can cause pages to split and merge, which lengthens the response time of insert and update statements and slows performance.
    For columns that are updated frequently, creating an index is not recommended, because it frequently triggers page splits and merges.

    3.5 Summary of the secondary index tree

    A secondary index tree (called a combined or composite index when it covers several columns) stores entries in the order of the columns specified when the index was created. It holds only part of the data: the columns used to create the secondary index. The secondary index tree exists to assist queries and improve query efficiency. Three behaviors involve it: table lookups, covering indexes, and index condition pushdown. Of these, a covering index performs best.

    4 Differences between the primary key index and a secondary index

    Here is a comparison diagram found online:

    [Figure: primary key index vs. secondary index]


    Statement:
    This article is reproduced from yisu.com. If there is any infringement, please contact admin@php.cn to have it deleted.