The main role of a database index is to speed up queries by reducing the number of rows that have to be scanned. An index is a data structure maintained by the database that stores all the values of a column in a table; in other words, an index is created on a column of a data table.
A database index is a structure attached to table fields in order to increase query speed. I have seen many people understand the concept of an index mechanically and assume that adding indexes has only benefits and no drawbacks. Here I would like to summarize my earlier study notes on indexes:
First, understand why an index increases speed. When the database executes an SQL statement without an index, the default method is a full table scan: every row is checked against the search conditions, and each matching row is added to the result set. If we add an index on a field, the query first locates the rows with a specific value through the index, which greatly reduces the number of rows traversed, so the query can be significantly faster. So should an index always be added? Here are a few counter-examples:
1. If you need to fetch all records of the table every time, a full table scan is unavoidable, and an index serves no purpose.
2. For non-unique fields with a large number of repeated values, such as "gender", an index brings little benefit, because each value still matches a large share of the rows.
3. For tables with relatively few records, an index brings no noticeable speedup but still costs storage space. Indexes also have a serious drawback: every update/insert/delete on the table must also update all indexes on the affected fields.
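The difference between a full table scan and an index lookup can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the "table" is a plain list, the "index" is a sorted list of (value, row id) pairs searched with binary search, and the names `full_scan` and `SortedIndex` are hypothetical helpers, not part of any database.

```python
from bisect import bisect_left, bisect_right


def full_scan(values, target):
    """Check every row; return matching row ids and the number of rows examined."""
    hits = [row_id for row_id, value in enumerate(values) if value == target]
    return hits, len(values)


class SortedIndex:
    """Toy index: (value, row_id) pairs kept in sorted order (an assumption,
    standing in for the on-disk structure a real database would use)."""

    def __init__(self, values):
        self.entries = sorted((value, row_id) for row_id, value in enumerate(values))
        self.keys = [value for value, _ in self.entries]

    def lookup(self, target):
        """Binary-search the range of equal keys; return row ids and entries read."""
        lo = bisect_left(self.keys, target)
        hi = bisect_right(self.keys, target)
        return [row_id for _, row_id in self.entries[lo:hi]], hi - lo


# Hypothetical data: 10,000 rows, a selective column and a low-cardinality one.
names = [f"user{i % 1000}" for i in range(10_000)]
genders = ["M" if i % 2 == 0 else "F" for i in range(10_000)]

scan_hits, scan_examined = full_scan(names, "user42")
name_hits, name_read = SortedIndex(names).lookup("user42")
gender_hits, gender_read = SortedIndex(genders).lookup("F")

print(scan_examined, name_read, gender_read)
```

On the selective column the index reads only the handful of matching entries, while on the "gender" column it still reads half of the table, which is the situation counter-example 2 describes.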
So when is it appropriate to add an index? Let's look at an example given in the MySQL manual. Here is an SQL statement:
SELECT c.companyID, c.companyName FROM Companies c, User u WHERE c.companyID = u.fk_companyID AND c.numEmployees >= 0 AND c.companyName LIKE '%i%' AND u.groupID IN (SELECT g.groupID FROM Groups g WHERE g.groupLabel = 'Executive')
This statement joins 3 tables and includes several kinds of search conditions, such as a range comparison and LIKE matching. Without indexes, MySQL needs to scan 77721876 rows. After we add indexes on the companyID and groupLabel fields, the number of scanned rows is only 134. In MySQL, you can view the estimated number of scanned rows with EXPLAIN SELECT. In the case of such multi-table joins with complex search conditions, the performance improvement brought by indexes far outweighs the disk space they occupy.
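The effect of checking a query plan before and after adding an index can be tried without a MySQL server. The sketch below uses SQLite through Python's standard `sqlite3` module as a stand-in, so it runs `EXPLAIN QUERY PLAN` rather than MySQL's `EXPLAIN SELECT`, and the output format differs from MySQL's. The table contents, the simplified single-table query, and the index name `idx_companies_id` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Companies (companyID INTEGER, companyName TEXT, numEmployees INTEGER)"
)
conn.executemany(
    "INSERT INTO Companies VALUES (?, ?, ?)",
    [(i, f"company{i}", i % 500) for i in range(1000)],  # hypothetical rows
)


def plan(sql):
    """Return the human-readable 'detail' column of each query-plan row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


query = "SELECT companyName FROM Companies WHERE companyID = 42"

before = plan(query)  # no index yet: SQLite has to scan the table
conn.execute("CREATE INDEX idx_companies_id ON Companies (companyID)")
after = plan(query)   # the plan should now mention the new index

print("before:", before)
print("after: ", after)
```

The exact wording of the plan depends on the SQLite version, but the first plan reports a scan of the table and the second one names the index.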
So how is an index implemented? Most database vendors implement indexes on top of one data structure, the B-tree, because a B-tree is well suited to organizing dynamic lookup tables on direct-access storage devices such as disks. The B-tree is defined as follows: a B-tree of order m (m >= 3) is an m-way tree that satisfies the following conditions:
1. Each node has the form (j, p0, k1, p1, k2, p2, ..., kj, pj), where j is the number of keys, k1 ... kj are the keys in ascending order, and p0 ... pj are the child pointers
2. All leaf nodes are on the same layer, and the number of layers equals the height h of the tree
3. Each non-root node contains j keys, where ⌈m/2⌉ - 1 <= j <= m - 1
4. If the tree is not empty, the root has at least 1 key; if the root is not a leaf, it has at least 2 and at most m subtrees
Let's look at an example: the 26 English letters can be organized into a B-tree.
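One possible arrangement of the 26 letters can be sketched in Python. The layout below is an assumption chosen for illustration (order m = 3, three layers, two keys per node), and `Node`, `search`, and `check` are hypothetical helper names. The sketch builds the tree by hand, verifies conditions 2 to 4 of the definition, and counts how many nodes a lookup visits.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from math import ceil


@dataclass
class Node:
    keys: list                                    # k1..kj in ascending order
    children: list = field(default_factory=list)  # p0..pj; empty for a leaf


def search(node, key):
    """Return how many nodes were visited to find key, or None if it is absent."""
    visited = 0
    while node is not None:
        visited += 1
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return visited
        # Descend into the subtree between keys[i-1] and keys[i], if any.
        node = node.children[i] if node.children else None
    return None


def check(node, m, is_root=True):
    """Assert conditions 2-4 for an order-m B-tree; return the number of layers."""
    j = len(node.keys)
    low = 1 if is_root else ceil(m / 2) - 1
    assert low <= j <= m - 1
    if not node.children:
        return 1
    assert len(node.children) == j + 1
    depths = {check(child, m, is_root=False) for child in node.children}
    assert len(depths) == 1  # all leaves sit on the same layer
    return 1 + depths.pop()


def leaf(letters):
    return Node(list(letters))


root = Node(["I", "R"], [
    Node(["C", "F"], [leaf("AB"), leaf("DE"), leaf("GH")]),
    Node(["L", "O"], [leaf("JK"), leaf("MN"), leaf("PQ")]),
    Node(["U", "X"], [leaf("ST"), leaf("VW"), leaf("YZ")]),
])

layers = check(root, m=3)
print(layers, search(root, "I"), search(root, "L"), search(root, "Z"))
```

In this layout no lookup visits more than three nodes, one per layer, even though the tree holds 26 keys.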
In such a B-tree, a lookup visits at most one node per layer, so the cost of searching for a letter grows with the height of the tree, which is O(log n) for n keys, rather than with n itself. When the amount of data is large, such a structure greatly increases query speed.
However, there is another data structure that answers lookups even faster than a B-tree: the hash table. It is defined as follows: let U be the set of all possible keys and K the set of keys actually stored, with |K| much smaller than |U|. Hashing uses a hash function h to map U onto the subscripts of a table T[0, m-1]: taking a key in U as the argument, the value of h is the storage address of the corresponding node. A search can therefore be completed in O(1) time on average.
However, the hash table has a flaw: hash collisions, where two different keys produce the same result under the hash function. Let m be the length of the hash table and n the number of filled nodes; n/m is the load factor of the hash table. The larger the load factor, the greater the chance of a collision.
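Collisions and the load factor can be illustrated with a small hash table that resolves collisions by chaining. This is a minimal sketch, not how any particular database implements hash indexes: the class name `ChainedHashTable` is hypothetical, and the hash function (sum of the key's bytes modulo m) is deliberately weak so that a collision is easy to reproduce.

```python
class ChainedHashTable:
    """Toy hash table: m buckets, each bucket is a list of (key, value) pairs."""

    def __init__(self, m):
        self.m = m                      # length of the table T[0, m-1]
        self.n = 0                      # number of stored keys
        self.buckets = [[] for _ in range(m)]

    def h(self, key):
        # Deliberately simple hash (an assumption for illustration only):
        # keys with the same bytes in a different order always collide.
        return sum(key.encode()) % self.m

    def put(self, key, value):
        bucket = self.buckets[self.h(key)]
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self.n += 1

    def get(self, key):
        # On a collision the bucket holds several pairs, so compare keys.
        for existing, value in self.buckets[self.h(key)]:
            if existing == key:
                return value
        return None

    def load_factor(self):
        return self.n / self.m


table = ChainedHashTable(m=8)
table.put("ab", 1)
table.put("ba", 2)   # same byte sum as "ab", so it lands in the same bucket
table.put("cd", 3)

print(table.h("ab"), table.h("ba"), table.h("cd"), table.load_factor())
```

Both "ab" and "ba" end up in one bucket, so a lookup there has to compare keys one by one; the more keys are packed into the same number of buckets, the more often this happens.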
Because of this flaw, databases do not use hash tables as the default implementation of indexes. Hash tables also do not keep keys in order, so they cannot help with range comparisons or sorting. MySQL states that, depending on the pattern of the queries being executed, it will try to convert a disk-based B-tree index into a suitable hash index in order to further improve search speed. I think other database vendors have similar strategies. After all, in the database battlefield, search speed and management security are very important competitive points.