What is the reason why MySQL index can improve query efficiency so much?
I believe everyone brings up indexes when talking about database optimization, and I am no exception. Almost everyone can answer a question or two about the underlying data structures, and I can say a few words about page caching and the like. But once an Alibaba P9 interviewer asked me: can you describe the process of loading index data from the computer's point of view? (He just wanted me to talk about IO.)
I died on the spot... The fundamentals of computer networks and operating systems really were my blind spot, but I made up for it later. So, without further ado, let's start with how a computer loads data and talk about indexes from a different angle.
MySQL's index is essentially a data structure
Let us first understand the data loading of the computer.
Let's talk about disk IO first. Reading data from a disk relies on mechanical movement. Each read takes three steps: seeking, rotational latency, and copying to memory.
Seek time is the time required for the disk arm to move to the specified track, usually less than 5ms;
Rotational latency is the time spent waiting for the requested data on the track to rotate under the head, which on average is half a revolution. For a 7200 rpm disk, the average is 60000/7200/2=4.17ms;
Copying to memory is very fast and negligible compared with the previous two, so one IO takes about 9ms on average. That sounds fast, but going through one million records in a database at one IO per record would take 9000 seconds, which is obviously a disaster.
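The arithmetic above can be checked with a few lines of Python. This is only a rough sketch: the 5ms seek time is the upper bound quoted above, the transfer time is ignored, and the function and constant names are my own.

```python
# Rough cost model for one random read on a spinning disk.
SEEK_MS = 5.0   # upper bound on seek time quoted in the text
RPM = 7200      # disk rotation speed used in the text


def rotational_latency_ms(rpm: int) -> float:
    """Average wait for the data to rotate under the head: half a revolution."""
    ms_per_revolution = 60_000 / rpm  # one minute is 60,000 ms
    return ms_per_revolution / 2


latency = rotational_latency_ms(RPM)
per_io_ms = SEEK_MS + latency  # copy-to-memory time is treated as negligible

# The text rounds one IO to about 9 ms and assumes one IO per record.
records = 1_000_000
total_seconds = records * round(per_io_ms) / 1000

print(f"rotational latency: {latency:.2f} ms")
print(f"one IO: about {round(per_io_ms)} ms")
print(f"{records} records: {total_seconds:.0f} s")
```

Note the 60,000 in the numerator: it converts one minute into milliseconds, which is where the 4.17ms comes from.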
Because disk IO is such an expensive operation, the operating system optimizes it with read-ahead. When an IO is performed, it reads not only the data at the current disk address but also the adjacent data into the memory buffer, because when a computer accesses the data at one address, the data adjacent to it is likely to be accessed soon as well.
We call the data read by one IO a page. The size of a page depends on the operating system and is usually 4k or 8k. In other words, when we read the data within one page, only one IO actually occurs. (This suddenly reminds me of a question I was asked just after graduation: in a 64-bit operating system, how many bytes does the int type in Java occupy? What is its maximum value? Why?) So if we want to optimize database queries, we must reduce disk IO operations as much as possible, and that is why indexes exist.
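To get a feel for how much page-sized reads help, here is a small estimate. The 100-byte record size is a made-up number for illustration, and the sketch assumes records sit next to each other on disk and that every page read still costs a full 9ms IO, which is pessimistic for a sequential scan.

```python
import math

PAGE_SIZE = 4 * 1024   # 4k page, one of the typical sizes mentioned above
RECORD_SIZE = 100      # hypothetical record size in bytes (an assumption)
IO_MS = 9              # average cost of one IO from the earlier estimate


def page_reads(num_records: int,
               record_size: int = RECORD_SIZE,
               page_size: int = PAGE_SIZE) -> int:
    """Number of page-sized IOs needed to read num_records contiguous records."""
    records_per_page = page_size // record_size
    return math.ceil(num_records / records_per_page)


reads = page_reads(1_000_000)
seconds = reads * IO_MS / 1000
print(f"{reads} page reads, about {seconds:.0f} s instead of 9000 s")
```

Even under these crude assumptions, reading a page at a time cuts the number of IOs by the number of records that fit in one page.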
What is an index? MySQL's official definition is: an index is a data structure that helps MySQL obtain data efficiently.
The indexes commonly used in MySQL fall physically into two categories: B-tree indexes and hash indexes.
BTree index.
A BTree is also called a multi-way balanced search tree. The characteristics of an m-way BTree are as follows:
1. To look up the key 29, we start from the root node, locate disk block 1 and read it into memory. [Disk IO operations: 1 time]
2. Disk block 1 stores 17, 35 and three pointers. We find 17<29<35, so we take pointer p2.
3. Following the p2 pointer, we locate and read disk block 3. [Disk IO operations: 2 times]
4. Disk block 3 stores 26, 30 and three pointers. We find 26<29<30, so we take pointer p2.
5. Following the p2 pointer, we locate and read disk block 8. [Disk IO operations: 3 times]
6. Disk block 8 stores 28, 29. We find 29 and get the data corresponding to 29.
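The walk above can be sketched in Python. Only disk blocks 1, 3 and 8 and their keys come from the steps above; the dictionary layout, the function names, and the choice to leave every other child as None (because those blocks are not described here) are my own assumptions, not how MySQL stores anything.

```python
import bisect

# Each "disk block" holds sorted keys and, for non-leaf blocks, child block ids.
# Children that are not part of the walkthrough are unknown and left as None.
BLOCKS = {
    1: {"keys": [17, 35], "children": [None, 3, None]},
    3: {"keys": [26, 30], "children": [None, 8, None]},
    8: {"keys": [28, 29], "children": None},  # leaf block
}


def read_block(block_id, io_log):
    """Simulate one disk IO: record which block was read and return it."""
    io_log.append(block_id)
    return BLOCKS[block_id]


def search(key, root_id=1):
    """Return (found, list of blocks read) for a BTree-style lookup."""
    io_log = []
    block_id = root_id
    while True:
        node = read_block(block_id, io_log)
        if key in node["keys"]:
            return True, io_log
        if node["children"] is None:  # reached a leaf without a match
            return False, io_log
        # Pick the child whose key range contains the search key.
        idx = bisect.bisect_left(node["keys"], key)
        block_id = node["children"][idx]
        if block_id is None:
            raise LookupError("this subtree is not part of the walkthrough")


found, ios = search(29)
print(found, ios, f"{len(ios)} disk IOs")
```

Each pass through the loop corresponds to one disk IO, so the number of IOs equals the number of levels visited on the way down.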
It can be seen that with a BTree index, every disk I/O brings data into memory that is actually useful for the lookup, which improves query efficiency.
But is there anything that can be optimized?
We can see from the figure that each node contains not only the key of the data but also the data value. The storage space of each page is limited. If the data values are large, the number of keys that each node (i.e. one page) can store will be very small. When the amount of stored data is large, this also makes the B-Tree deeper, which increases the number of disk I/Os during a query and hurts query efficiency.
B+Tree is an optimization based on B-Tree that makes it more suitable for implementing external-storage index structures. In a B+Tree, all data records are stored on leaf nodes of the same level in key order, and non-leaf nodes store only key information. This greatly increases the number of keys each node can store and reduces the height of the B+Tree.
B+Tree differs from B-Tree in several ways:
Non-leaf nodes store only key information, and data records are stored in leaf nodes. This optimizes the B-Tree from the previous section: since the non-leaf nodes of a B+Tree store only key information, the height of the tree can be kept very low.
The specific data is as follows:
The page size in the InnoDB storage engine is 16KB. The primary key of a typical table is INT (4 bytes) or BIGINT (8 bytes), and a pointer is generally 4 or 8 bytes, which means one page (one node of the B+Tree) stores roughly 16KB/(8B+8B)=1K keys (since this is an estimate, K is taken as 10^3 for ease of calculation).
That is to say, a B+Tree index with a depth of 3 can maintain 10^3 × 10^3 × 10^3 = 1 billion records. (This calculation is not exact: it does not account for the leaf nodes. If the leaf nodes are counted, the depth is actually 4.)
We need only three IO operations to find the data we want among 1 billion records, which is I don't know how many times better than the 9000 seconds for one million records at the start.
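The estimate can be reproduced in a few lines. This sketch uses the same simplification as the text (an 8-byte key plus an 8-byte pointer per entry, no page headers or other overhead), and the names are my own.

```python
PAGE_SIZE = 16 * 1024   # InnoDB page size quoted above, in bytes
KEY_BYTES = 8           # BIGINT primary key
POINTER_BYTES = 8       # pointer size assumed in the text

# Number of key/pointer pairs that fit in one page, ignoring any overhead.
fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)


def capacity(levels: int, keys_per_node: int) -> int:
    """Keys reachable through `levels` levels of nodes with the given fan-out."""
    return keys_per_node ** levels


print(fanout)                # exact fan-out under these assumptions
print(capacity(3, fanout))   # exact figure for 3 levels
print(capacity(3, 10 ** 3))  # the rounded estimate used in the text
```

The exact fan-out is 1024 rather than 1000, so the rounded figure of 1 billion slightly understates what three levels can address under these assumptions.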
A B+Tree usually has two head pointers: one points to the root node, and the other points to the leaf node with the smallest key. All leaf nodes (i.e. data nodes) are linked to one another in a chain. Therefore, we can perform two kinds of lookups on a B+Tree: range searches and paging searches on the primary key along the leaf chain, and random searches starting from the root node.
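Here is a minimal sketch of why the linked leaves make range queries cheap. The Leaf class and both functions are hypothetical, the chain is modelled as a simple singly linked list, and for brevity the scan starts from the smallest leaf instead of first descending from the root to the leaf that contains the lower bound.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    keys: list
    next: Optional["Leaf"] = None  # link to the next leaf in key order


def build_leaf_chain(sorted_keys, per_leaf):
    """Split sorted keys into leaves and link them; return the first leaf."""
    head = prev = None
    for i in range(0, len(sorted_keys), per_leaf):
        leaf = Leaf(sorted_keys[i:i + per_leaf])
        if prev is None:
            head = leaf
        else:
            prev.next = leaf
        prev = leaf
    return head


def range_scan(head, low, high):
    """Collect keys in [low, high] by walking the leaf chain in order."""
    result = []
    leaf = head
    while leaf is not None:
        for key in leaf.keys:
            if key > high:      # keys are sorted, so we can stop early
                return result
            if key >= low:
                result.append(key)
        leaf = leaf.next
    return result


head = build_leaf_chain([5, 8, 9, 10, 15, 18, 20, 26, 27, 28, 30, 33], per_leaf=3)
print(range_scan(head, 10, 27))
```

Once the first matching leaf is reached, the rest of the range is read by following the links, with no need to go back up through the non-leaf nodes.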
The B+Tree index in the database can be divided into the clustered index and the auxiliary index (secondary index).
The B+Tree example above is implemented in the database as a clustered index. The leaf nodes of the clustered index's B+Tree store the row data of the entire table. The auxiliary index differs from the clustered index in that its leaf nodes do not contain the full row data; instead they store the clustered index key of the corresponding row, that is, the primary key.
When querying data through an auxiliary index, the InnoDB storage engine traverses the auxiliary index to find the primary key, and then uses the primary key to find the complete row record in the clustered index.
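The two-step lookup can be illustrated with a toy model. Plain Python dictionaries stand in for the two B+Trees here, and the table contents, column names and function name are all invented for the example.

```python
# Clustered index: primary key -> complete row (the "leaf" holds the whole row).
clustered_index = {
    1: {"id": 1, "name": "alice", "age": 30},
    2: {"id": 2, "name": "bob", "age": 25},
}

# Auxiliary (secondary) index on name: indexed value -> primary key only.
name_index = {
    "alice": 1,
    "bob": 2,
}


def find_by_name(name):
    """Look up a row by name: auxiliary index first, then the clustered index."""
    primary_key = name_index.get(name)   # step 1: auxiliary index gives the primary key
    if primary_key is None:
        return None
    return clustered_index[primary_key]  # step 2: clustered index gives the full row


print(find_by_name("bob"))
print(find_by_name("carol"))
```

The second step is an extra tree traversal in a real engine, which is why a query through an auxiliary index costs more than a lookup by primary key alone.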
However, although indexes can speed up queries and improve MySQL's processing performance, excessive use of indexes will also cause the following disadvantages:
Note: Indexes can speed up queries in some cases, but in some cases, they will reduce efficiency.
An index is only one factor in improving efficiency, so the following principles should be followed when creating an index:
Now everyone knows why an index can be so fast. It really comes down to one sentence: the index structure minimizes the number of IOs the database performs. After all, a single IO really does take too long...
As far as interviews go, a lot of knowledge can be picked up easily. But if the goal is to learn, you will find that many things only reveal their secrets once you dig into computer fundamentals. Many people ask me how I remember so many things. In fact, learning itself is something we have little choice about. Since we have to learn, why not learn it well and learn to enjoy it? Recently I have been going back over the basics myself, and I will start posting about computer fundamentals and networking later.
I am Ao Bing. The more you know, the more you don’t know. See you in the next issue!
Talents, your 【三连】 (like, favorite, and share) is the biggest motivation for Ao Bing's writing. If there are any errors or suggestions for this blog, you are welcome to leave a comment!
The above is the detailed content of What is the reason why MySQL index can improve query efficiency so much?. For more information, please follow other related articles on the PHP Chinese website!