This article brings you relevant knowledge about mysql, which mainly introduces related issues about the index structure. So, what is the structure of the index? Why can indexing be so fast? Let’s take a look at it below, I hope it will be helpful to everyone.
Recommended learning: mysql tutorial
First of all, we need to know that in order to achieve persistence ization, the index can only be stored on the hard disk. When querying through the index, I/O operations on the hard disk will occur. Therefore, when designing the index, it is necessary to reduce the number of searches as much as possible, thereby reducing I/O time-consuming.
In addition, you need to know a very important principle: the basic unit of database management storage space is Page (Page)
, and multiple row records (Row) are stored in one page.
The computer system will do read-ahead
optimization for disk I/O. When an I/O is performed, in addition to the data at the current disk address, adjacent data will also be read. In the memory buffer pool, the data read by each I/O becomes one page. InnoDB's default page size is 16KB.
64 consecutive pages form an Extent
, one or more extents form a Segment
, and one or more segments formTablespace
. InnoDB has two table space types. Shared table space means that multiple tables share one table space. Independent table space means that the data and indexes of each table are all stored in independent table spaces.
The structure of the data page is as follows (Source: Geek Time "MySQL Must Know"):
The 7 structural contents of the data page can be roughly divided into the following three categories:
For details, please refer to Taobao’s database kernel monthly report
Naturally, we will think of some common data structures involved in search algorithms, such as binary search trees, binary balanced trees, etc. In fact, Innodb’s index uses B Tree
is implemented. Let’s take a look at why this index structure was chosen.
Let’s briefly review the definition of Binary Search Tree. In a binary search tree, if the key to be found is greater than the root node, then in Search in the right subtree. If the key is smaller than the root node, search in the left subtree until the key is found. The time complexity is O(logn). For example, the sequence [4,2,6,1,3,5,7] will generate the following binary search tree:
However, in some special cases, the depth of the binary tree will be very large. For example, [1,2,3,4,5,6,7] will generate the following tree:
In the following situation, in the worst case, it takes 7 times to check. The desired results can be found, and the query time becomes O(n).
In order to optimize this situation, there is a balanced binary search tree (AVL tree). An AVL tree refers to a tree in which the height difference between the left and right subtrees does not exceed 1. The search time complexity is O(logn) , this is already an ideal search tree, but in a database with tens of millions of rows of records, the depth of the tree will still be very high, and it is still not the most ideal structure.
So, if you expand from a binary tree to an N-ary tree, it is easy to imagine that the N-ary tree can greatly reduce the depth of the tree. In fact, the 4-layer tree structure is It can already support dozens of terabytes of data.
B-tree (Balance Tree) is such an N-ary tree. B-tree is also called B-tree, which satisfies the following definition:
Let k be the degree of B-tree (degree, representing each The maximum number of child nodes a node can have),
k - 1
keywords and k
pointers to child nodes As mentioned above, each I/O will pre-read the data of a disk block, which is one page in size. The content of a disk block is used to represent an I/O. The structure of the B-tree is as follows Picture (Source: Geek Time SQL must know):
The B tree is also ordered. Since the child node pointer must be 1 more than the keyword, it can be divided into sub-trees by keywords. The section of the node, as in the example in the figure, each node has 2 keys and 3 child nodes, such as disk block 2, the key of the first byte point is 3, 5 is less than its first child node 8 , the second child node's 9, 10 are between 8 and 12, the third child node's value 13, 15 is greater than its own second child node 12.
Suppose we want to find 9 now, the steps are as follows:
. You can see that although many comparison operations have been performed, due to pre-reading, the comparison within the disk block is performed in memory and does not consume disk. I/O, the above operation only requires 3 I/O times to complete, which is already an ideal structure.
B-tree is further improved on the basis of B-tree. The differences between B-tree and B-tree are as follows:
The example is as follows. In this example, the keywords of the parent node are the minimum values among the child nodes (Source: Geek Time SQL must know):
Assumption To find the keyword 16, the search steps are as follows:
Advantages of B tree:
The default index structure of MySQL's memory storage engine is the Hash index. Hash is a function called a hash function, which is passed through a specific Algorithms (such as MD5, SHA1, SHA2, etc.) convert input of arbitrary length into output of fixed length. Input and output correspond one to one. This article will not give an in-depth introduction to the hash function. For details, please refer to Baidu Encyclopedia.
Hash search efficiency is O(1), which is very efficient. Python's dict, golang's map, and java's hash map are all implemented based on hash. Key-Value databases such as Redis are also implemented. Implemented by Hash.
For precise searches, Hash indexes are more efficient than B-tree indexes, but Hash indexes have some limitations and are therefore not the most mainstream index structure.
Based on the above reasons, the Mysql InnoDB engine does not support Hash index, but there is an adaptive Hash index function in the memory structure. When an index value is used very frequently, it will be in B Based on the tree index, automatically creates a Hash index to improve query performance.
Adaptive Hash index can be understood as a kind of "index of indexes". The Hash index is used to store the page address in the B-tree index and quickly locate the corresponding leaf node. It can be viewed through the innodb_adaptive_hash_index
variable.
Recommended learning: mysql tutorial
The above is the detailed content of In-depth understanding of MySQL index structure. For more information, please follow other related articles on the PHP Chinese website!