MySQL single-column index and joint index summary
This article summarizes how single-column indexes and joint (composite) indexes work in MySQL, including the leftmost-prefix principle, common cases in which an index is not used, and the B+Tree and Hash index structures. Let's take a look at it together. I hope it will be helpful to everyone.
Adding more columns to an index can narrow the scope of a search, but using one index with two columns is different from using two separate single-column indexes.
The structure of a joint index is similar to that of a phone book. A person's name is composed of a surname and a given name. The phone book is sorted first by surname, and then by given name for people with the same surname. A phone book is very useful if you know the surname, even more useful if you know both the surname and the given name, but useless if you know only the given name.
So when creating a joint index, consider the order of the columns carefully. A joint index is useful when a query filters on all of its columns or on only the first few of them; it is not useful when a query filters only on later columns.
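The phone-book analogy can be sketched in a few lines of Python. This is only an illustrative model, not how MySQL stores an index: the entries and the names `PHONE_BOOK`, `find_by_surname`, `find_by_full_name`, and `find_by_given_name` are hypothetical, and a sorted list with binary search stands in for the index tree.

```python
from bisect import bisect_left, bisect_right

# Hypothetical phone book, kept sorted by (surname, given_name): the same
# ordering a joint index on (surname, given_name) would maintain.
PHONE_BOOK = sorted([
    ("Li", "Na"),
    ("Li", "Wei"),
    ("Wang", "Fang"),
    ("Wang", "Wei"),
    ("Zhang", "Min"),
])
SURNAMES = [surname for surname, _ in PHONE_BOOK]  # leading column only


def find_by_surname(surname):
    """Leading column known: binary search finds the contiguous block."""
    lo = bisect_left(SURNAMES, surname)
    hi = bisect_right(SURNAMES, surname)
    return PHONE_BOOK[lo:hi]


def find_by_full_name(surname, given_name):
    """Both columns known: binary search lands directly on the entry."""
    i = bisect_left(PHONE_BOOK, (surname, given_name))
    if i < len(PHONE_BOOK) and PHONE_BOOK[i] == (surname, given_name):
        return [PHONE_BOOK[i]]
    return []


def find_by_given_name(given_name):
    """Only the second column known: the ordering does not help, scan everything."""
    return [entry for entry in PHONE_BOOK if entry[1] == given_name]


print(find_by_surname("Wang"))
print(find_by_full_name("Li", "Wei"))
print(find_by_given_name("Wei"))
```

The first two lookups use the sort order; the third has to visit every entry because people named "Wei" are scattered across different surnames.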
When several single-column indexes are available for a multi-condition query, the optimizer chooses what it estimates to be the best index strategy. It may use only one of the indexes, or it may use several of them together. However, multiple single-column indexes mean multiple B+Tree index trees under the hood, which takes up more space and costs some search efficiency. Therefore, if the columns are only ever queried together in multi-condition queries, it is best to build a joint index.
As the name implies, the leftmost-prefix principle gives priority to the leftmost columns: any consecutive run of index columns starting from the leftmost one can be matched. If the first field is used in a range query, a separate index needs to be created for it. When creating a joint index, place the column used most frequently in WHERE clauses on the far left, according to business needs; this also extends better to other queries. For example, if username is often used as a query condition but age is not, username should be placed in the first position of the joint index, that is, on the far left.
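ALTER TABLE employee ADD INDEX idx_name_salary (name, salary);
SELECT * FROM employee WHERE name = '哪吒编程';                    -- leftmost column present: can use idx_name_salary
SELECT * FROM employee WHERE salary = 5000;                       -- leftmost column missing: cannot use idx_name_salary for the lookup
SELECT * FROM employee WHERE name = '哪吒编程' AND salary = 5000;  -- both columns present: can use idx_name_salary
SELECT * FROM employee WHERE salary = 5000 AND name = '哪吒编程';  -- same as the previous query: the order of conditions does not matter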
A compound index is also called a joint index. Creating a joint index such as (k1, k2, k3) is equivalent to creating three indexes: (k1), (k1, k2), and (k1, k2, k3). This is the leftmost matching principle.
If a query does not satisfy the leftmost principle, the joint index generally cannot be used.
This involves the strategy of MySQL's own query optimizer. When a table has multiple indexes, MySQL chooses which index to use based on the estimated cost of the query statement.
Some people say that WHERE conditions are evaluated from left to right, so the most selective conditions should be written first. A Baidu search does turn up this claim, but in my own tests the MySQL optimizer reorders the conditions. Leaving indexes aside, the order of WHERE conditions has no impact on efficiency. What really matters is whether an index is used!
Creating an (a, b, c) joint index is equivalent to creating an (a) single-column index, an (a, b) joint index, and an (a, b, c) joint index, so the index is fully effective only for these three combinations. Of course, as tested above, a query on a and c can also use the index, but only the a part of the index is actually used; c is not.
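The rule above can be written down as a tiny function. This is a simplified sketch that considers equality conditions only and ignores range conditions and optimizer features; the function name `usable_prefix` is hypothetical and does not correspond to anything in MySQL.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns that an equality lookup can use.

    Walk the index columns from the left and stop at the first one that has
    no equality condition; columns after that gap cannot narrow the search.
    """
    prefix = []
    for column in index_columns:
        if column not in equality_columns:
            break
        prefix.append(column)
    return tuple(prefix)


INDEX = ("a", "b", "c")
print(usable_prefix(INDEX, {"a", "b", "c"}))  # ('a', 'b', 'c')
print(usable_prefix(INDEX, {"a", "c"}))       # ('a',)
print(usable_prefix(INDEX, {"b", "c"}))       # ()
```

Because `equality_columns` is a set, the order in which conditions are written has no effect on the result, which mirrors the point about WHERE clause order.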
Cases in which an index is not used:
1. LIKE queries with a leading % wildcard;
2. IS NOT NULL checks, for which the optimizer may fall back to a full table scan;
3. OR conditions in which the columns on the two sides are not both indexed. If only one of the columns on either side of the OR is indexed, the index is not used; it takes effect only when the columns on both sides are indexed. Avoid OR where possible when optimizing SQL;
4. Implicit data type conversion. If a value compared with a varchar column is not enclosed in single quotes, it may be treated as a number; the column values are then converted for the comparison, the index is invalidated, and a full table scan results.
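Why implicit conversion defeats an index can be illustrated with a small Python model. It is a sketch under assumptions: the values in `CODE_INDEX` are made up, and `to_number` is only a stand-in for string-to-number conversion, whose exact rules in MySQL differ in detail. The point is that the index is ordered by string value, and that order is different from numeric order.

```python
from bisect import bisect_left, bisect_right

# Hypothetical varchar values, stored in string order as an index would keep them.
CODE_INDEX = sorted(["01", "1", "1.0", "10", "2", "abc"])


def to_number(text):
    """Stand-in for string-to-number conversion: non-numeric text becomes 0."""
    try:
        return float(text)
    except ValueError:
        return 0.0


def lookup_quoted(value):
    """Like WHERE code = '1': string comparison, so binary search works."""
    lo = bisect_left(CODE_INDEX, value)
    hi = bisect_right(CODE_INDEX, value)
    return CODE_INDEX[lo:hi]


def lookup_unquoted(number):
    """Like WHERE code = 1: each stored string is converted, so all are visited."""
    return [text for text in CODE_INDEX if to_number(text) == number]


print(CODE_INDEX)          # string order puts "10" before "2"
print(lookup_quoted("1"))
print(lookup_unquoted(1))
```

The quoted lookup finds exactly one entry by binary search. The unquoted lookup matches several different strings and cannot rely on the stored order, so it has to examine every entry.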
Suggestions for creating indexes:
1. Index the fields that appear in WHERE conditions.
2. Fields with a small amount of data do not need to be indexed, because building and maintaining an index has a certain overhead. If the amount of data is small, an index is unnecessary and may even make things slower.
3. A joint index has advantages over a separate index on each column, because the more indexes are created, the more disk space is occupied and the slower data updates become. In addition, the column order in a multi-column index matters: the most selective column should be placed first, so that it filters more strongly and the index is more efficient.
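InnoDB supports transactions, foreign keys, crash recovery, and concurrency control. If transaction integrity requirements are high (for example, in banking) or concurrency control is required (for example, in ticketing), choosing InnoDB has a great advantage. InnoDB is also a good choice for databases with frequent update and delete operations, because it supports transaction commit and rollback.
MyISAM offers fast inserts and low disk and memory usage. If a table is used mainly for inserting new records and reading records, choosing MyISAM gives efficient processing. It can also be used when the application's integrity and concurrency requirements are low.
Note that tables in the same database can use different storage engines. If a table requires strong transaction processing, choose InnoDB. Tables in the same database with heavy query requirements can be stored with MyISAM. If the database needs a temporary table for queries, the MEMORY storage engine can be used.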
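MySQL commonly uses two index structures (algorithms), BTree and Hash. The two retrieve data differently and serve queries differently.
A Hash index is implemented with a hash table underneath and is very well suited to key-value style lookups, that is, single-key lookups, or equality queries.
A Hash index handles equality queries conveniently because it locates the data in a single step, unlike a BTree index, which must go from the root node through branch nodes before finally reaching a leaf node, with several I/O accesses along the way. For such lookups its retrieval efficiency is therefore much higher than that of a BTree index. A range query, however, requires a full table scan.
So why do we use BTree more than Hash? Mainly because the special nature of Hash brings many restrictions and drawbacks:
Hash indexes can only serve "=", "IN", and "<=>" queries; they cannot be used for range queries.
In a joint index, a Hash index cannot be used with only part of the index key. For the columns of a joint index, Hash uses either all of them or none of them; it does not support the leftmost prefix that BTree supports, so when a query uses only the first one or several index keys of the joint index, the Hash index cannot be used.
Hash indexes cannot avoid sorting. A Hash index stores the hash values computed from the keys, and the ordering of the hash values is not necessarily the same as the ordering of the original keys, so the database cannot use the index data to avoid any sort operation.
Hash indexes can never avoid accessing the table. A Hash index stores the hash value of each index key together with the corresponding row pointer in a hash table. Because different index keys can have the same hash value, the query cannot be completed from the Hash index alone, even just to count the records matching a given key; the actual table data must still be read and compared to obtain the result.
With many equal hash values, a Hash index does not necessarily perform better than BTree. For an index key with low selectivity, a Hash index will have a large number of row pointers associated with the same hash value. Locating a particular record then becomes cumbersome and wastes many table accesses, which lowers overall performance.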
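The difference between the two structures for equality and range lookups can be sketched as follows. This is a toy model, not MySQL's implementation: a Python dict stands in for the Hash index, a sorted list of pairs stands in for the BTree index, and the rows and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows: (id, age). Both "indexes" below are on the age column.
ROWS = [(1, 30), (2, 25), (3, 41), (4, 25), (5, 36)]

# Hash-style index: age -> list of row ids (a dict is a hash table).
hash_index = {}
for row_id, age in ROWS:
    hash_index.setdefault(age, []).append(row_id)

# BTree-style index, modeled as (age, id) pairs kept in sorted order.
sorted_index = sorted((age, row_id) for row_id, age in ROWS)


def hash_equal(age):
    """Equality lookup: a single hash probe."""
    return hash_index.get(age, [])


def hash_range(low, high):
    """Range lookup: hash order is unrelated to age order, so every key is checked."""
    return sorted(
        row_id
        for age, row_ids in hash_index.items()
        if low <= age <= high
        for row_id in row_ids
    )


def sorted_range(low, high):
    """Range lookup: binary search for both ends, then read the slice in between."""
    lo = bisect_left(sorted_index, (low,))
    hi = bisect_right(sorted_index, (high, float("inf")))
    return [row_id for _, row_id in sorted_index[lo:hi]]


print(hash_equal(25))        # [2, 4]
print(hash_range(25, 36))    # [1, 2, 4, 5]
print(sorted_range(25, 36))  # [2, 4, 1, 5]
```

Note that `sorted_range` returns ids already ordered by age, which is also why an ordered index can help avoid a separate sort, while the hash version cannot.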
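The B+Tree index is the most commonly used index algorithm in MySQL, because it can be used not only with comparison operators such as =, >, >=, <, <=, and BETWEEN, but also with the LIKE operator, as long as the pattern is a constant that does not start with a wildcard.
For example:
select * from user where name like 'jack%';
select * from user where name like 'jac%k%';
If the pattern starts with a wildcard, or is not a constant, the index is not used.
For example:
select * from user where name like '%jack';
select * from user where name like simply_name;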
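The reason a constant prefix works and a leading wildcard does not can be seen on a sorted list. In this sketch the list `NAMES` and both function names are hypothetical, and the sorted list stands in for the ordered index; it assumes no stored name contains the character U+10FFFF, which is used as an upper bound.

```python
from bisect import bisect_left

# Hypothetical indexed column values, kept in sorted order.
NAMES = sorted(["amy", "jack", "jackie", "jackson", "jill", "mjack", "zoe"])


def like_prefix(prefix):
    """Like name LIKE 'prefix%': the matches form one contiguous range."""
    lo = bisect_left(NAMES, prefix)
    hi = bisect_left(NAMES, prefix + "\U0010ffff")  # just past the last match
    return NAMES[lo:hi]


def like_suffix(suffix):
    """Like name LIKE '%suffix': matches are scattered, so every name is checked."""
    return [name for name in NAMES if name.endswith(suffix)]


print(like_prefix("jack"))  # ['jack', 'jackie', 'jackson']
print(like_suffix("jack"))  # ['jack', 'mjack']
```

The prefix search touches only the matching range, while the suffix search has to look at every entry because "jack" and "mjack" are far apart in sorted order.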
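Databases hold relatively large amounts of data, so multiway search trees are clearly a better fit for database workloads. Next we introduce these two kinds of multiway search tree; after all, how can a programmer get by without a B-tree in mind?
B-tree: the B-tree, also written B tree, has the following properties:
Unlike a binary tree, a B-tree node can store multiple keys and multiple subtree pointers; this is what makes it a multiway tree (the B+ tree has the same property);
In a B-tree of order m, every non-leaf node other than the root must have between ⌈m/2⌉ and m subtrees;
The root node must have at least two subtrees, unless the root is the only node in the tree;
A B-tree is a search tree, much like a binary search tree: subtrees further to the left hold smaller keys, and within a node the keys are sorted by size;
The number of subtrees of a non-leaf node equals its number of keys + 1;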
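The properties listed above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical in-memory `Node` class; it verifies only the properties named in the list (sorted keys, subtree count, subtree limits, search ordering) and does not check other B-tree requirements such as all leaves being at the same depth.

```python
from dataclasses import dataclass, field
from math import ceil, inf


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty for a leaf


def check_node(node, order, low=-inf, high=inf, is_root=True):
    """Return True if this node and its subtrees satisfy the listed properties."""
    if node.keys != sorted(node.keys):
        return False  # keys inside a node must be sorted
    if not all(low < key < high for key in node.keys):
        return False  # keys must fit between the parent's separator keys
    if not node.children:
        return True  # leaf: nothing more to check in this sketch
    if len(node.children) != len(node.keys) + 1:
        return False  # number of subtrees = number of keys + 1
    min_children = 2 if is_root else ceil(order / 2)
    if not min_children <= len(node.children) <= order:
        return False  # subtree count must stay within the allowed range
    bounds = [low] + node.keys + [high]
    return all(
        check_node(child, order, bounds[i], bounds[i + 1], is_root=False)
        for i, child in enumerate(node.children)
    )


good = Node([10, 20], [Node([1, 5]), Node([12, 15]), Node([25, 30])])
missing_child = Node([10, 20], [Node([1, 5]), Node([12, 15])])
wrong_order = Node([10], [Node([1, 11]), Node([12])])

print(check_node(good, order=4))           # True
print(check_node(missing_child, order=4))  # False
print(check_node(wrong_order, order=4))    # False
```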
The B+ tree is the "plus" version of the B-tree.
A B+ tree stores all lookup results in its leaf nodes, which means a search in a B+ tree must reach a leaf node before it can return a result;
In a B+ tree, the number of keys in each node is the same as the number of subtree pointers;
Each key in a non-leaf node of a B+ tree corresponds to one pointer, and the key is the maximum or minimum value of that subtree;
The B+Tree is an optimization of the B-Tree from the previous section. Because the non-leaf nodes of a B+Tree store only key information, assume that each disk block can store 4 keys and their pointers; after the B-Tree is converted to a B+Tree, its structure is as shown in the figure below:
[Figure: structure of the example B+Tree]
A B+Tree usually has two head pointers: one points to the root node and the other points to the leaf node with the smallest key, and all leaf nodes (that is, the data nodes) are linked in a chain. Two kinds of search can therefore be performed on a B+Tree: a range search or paging search over the primary key, and a random search starting from the root node.
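The two kinds of search can be sketched on a tiny hand-built tree. Everything here is hypothetical and heavily simplified: the tree has only a root and three leaves, the `Leaf` class and the keys are made up, and the root's separator keys are taken to be the smallest key of each leaf after the first.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys  # sorted keys stored in this leaf (data node)
        self.next = None  # link to the next leaf in key order


# Hypothetical two-level tree: one root above three linked leaves.
leaves = [Leaf([1, 3, 5]), Leaf([8, 10, 12]), Leaf([15, 18, 21])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right

root_keys = [8, 15]     # separator keys: smallest key of leaves[1] and leaves[2]
first_leaf = leaves[0]  # second head pointer: the leaf with the smallest key


def random_search(key):
    """Descend from the root to the one leaf that could hold the key."""
    leaf = leaves[bisect_right(root_keys, key)]
    return key in leaf.keys


def range_search(low, high):
    """Walk the leaf chain from the smallest leaf, collecting keys in [low, high]."""
    result = []
    leaf = first_leaf
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result  # keys are ordered, so nothing further can match
            if key >= low:
                result.append(key)
        leaf = leaf.next
    return result


print(random_search(10))    # True
print(random_search(9))     # False
print(range_search(5, 15))  # [5, 8, 10, 12, 15]
```

The range search never goes back to the root; it only follows the links between leaves, which is what makes range and paging queries cheap on this structure.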
The example above has only 22 data records, so the advantages of the B+Tree may not be obvious. Here is a calculation:
The page size in the InnoDB storage engine is 16KB. The primary key of a typical table is INT (4 bytes) or BIGINT (8 bytes), and a pointer is generally 4 or 8 bytes, so one page (one node in the B+Tree) stores roughly 16KB / (8B + 8B) = 1K key values (because this is an estimate, K is taken as 10^3 for convenience).
In other words, a B+Tree index with a depth of 3 can maintain 10^3 * 10^3 * 10^3 = 1 billion records.
In practice, nodes may not be completely full, so the height of a B+Tree in a database is generally 2 to 4 levels. MySQL's InnoDB storage engine is designed to keep the root node resident in memory, which means that finding the row record for a given key value takes only 1 to 3 disk I/O operations.
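The estimate above is easy to reproduce. The sketch below follows the same simplification as the text (every level fans out by the number of keys per page, with an 8-byte key and an 8-byte pointer); the constant and function names are hypothetical, and real pages also hold headers and, in leaf pages, full rows, so actual capacity is lower.

```python
PAGE_SIZE = 16 * 1024  # InnoDB page size in bytes (16KB)
KEY_SIZE = 8           # BIGINT primary key, in bytes
POINTER_SIZE = 8       # pointer size in bytes (upper end of the 4 to 8 byte range)


def keys_per_page():
    """How many (key, pointer) pairs fit in one page."""
    return PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)


def max_records(height):
    """Same simplification as the text: each level multiplies by keys_per_page()."""
    return keys_per_page() ** height


def disk_reads(height):
    """One page read per level, minus the root, which is assumed to stay in memory."""
    return height - 1


print(keys_per_page())  # 1024
print(max_records(3))   # 1073741824
print([disk_reads(h) for h in (2, 3, 4)])  # [1, 2, 3]
```

Using the exact value 1024 instead of the rounded 10^3 gives about 1.07 billion, which agrees with the "1 billion" figure in the text.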
B+Tree indexes in a database can be divided into clustered indexes and auxiliary (secondary) indexes. The B+Tree example above corresponds to a clustered index in the database: the leaf nodes of a clustered index's B+Tree store the row data of the entire table. An auxiliary index differs in that its leaf nodes do not contain the full row data; instead they store the clustered index key of the corresponding row, that is, the primary key. When data is queried through an auxiliary index, the InnoDB storage engine traverses the auxiliary index to find the primary key, and then uses the primary key to find the complete row in the clustered index.
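The two-step lookup through an auxiliary index can be modeled with two dictionaries. This is only a sketch: plain dicts stand in for the two B+Trees, and the table contents and the name `find_by_name` are hypothetical.

```python
# Clustered index: primary key -> complete row.
clustered_index = {
    1: {"id": 1, "name": "alice", "salary": 5000},
    2: {"id": 2, "name": "bob", "salary": 7000},
    3: {"id": 3, "name": "alice", "salary": 6500},
}

# Auxiliary index on name: name -> primary keys (no other row data is stored).
secondary_index = {}
for primary_key, row in clustered_index.items():
    secondary_index.setdefault(row["name"], []).append(primary_key)


def find_by_name(name):
    """Step 1: auxiliary index gives primary keys. Step 2: clustered index gives rows."""
    primary_keys = secondary_index.get(name, [])
    return [clustered_index[pk] for pk in primary_keys]


for row in find_by_name("alice"):
    print(row)
```

Each matching name costs one extra lookup in the clustered index, which is the second step described in the paragraph above.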