1 The concept of index
1.1 Definition
In a relational database, an index is a separate, physical storage structure that sorts the values of one or more columns in a table. It consists of a collection of those column values together with a list of logical pointers to the data pages in the table where the values are physically stored.
An index is equivalent to the table of contents of a book: you can quickly find the content you need from the page numbers listed there. The database uses the index to find a specific value, then follows the pointer to the row containing that value. As a result, SQL statements against the table execute faster and specific information in the table can be accessed quickly.
1.2 Type
InnoDB has three index types: ordinary index, unique index (the primary key index is a special unique index whose key cannot be NULL), and full-text index.
- Ordinary index (Normal): also known as a non-unique index; it has no restrictions.
- Unique index (Unique): requires that key values cannot be repeated (NULL is allowed). The primary key index is a special unique index with one additional restriction: the key value cannot be NULL. A primary key index is created with PRIMARY KEY.
- Full-text index (Fulltext): intended for relatively large text. For example, when we store articles, texts, or emails, a single field may take several KB. To solve the low efficiency of LIKE queries for full-text matching, you can create a full-text index. Only CHAR, VARCHAR, and TEXT columns can have full-text indexes. Both MyISAM and InnoDB support full-text indexing (InnoDB since MySQL 5.6).
1.3 Function
One-sentence summary:
An index improves the efficiency of data retrieval and reduces the I/O cost of the database.
This raises a question: an index trades space for time, but what is its data structure, what does a query cost in I/O, and how is the data stored?
2 Index data structure: the evolution of the B+ tree
We look at the evolution of the B+ tree from the perspective of a Page.
A Page is the basic unit InnoDB uses to manage storage space: InnoDB stores the database's data in pages, and a page is also the basic unit of interaction between memory and disk. The database reads several pages from disk into memory at a time, and flushes several pages from memory back to disk at a time.
The default size of one page is 16 KB.
Suppose we want to execute this SQL and get 10 records:
SELECT * FROM INNODB_USER LIMIT 0 , 10;
If one record is 4 KB, how many records can one Page store?
16 KB divided by 4 KB gives 4 records.
Every record in a Page has a key attribute called record_type:
- 0: ordinary user record
- 1: directory (index) record
- 2: minimum record
- 3: maximum record
The data is placed in the page as follows:
Each Page stores its records in order of primary key.
We know that data is stored with sequential I/O, which makes storing convenient. But what is convenient for storing is inconvenient for querying: if we look up the last record, do we have to traverse the entire page?
2.1 Question: What if we want to look up a single record? How can we find it quickly?
- If the records in a Page are connected to one another in some way, the problem can be solved. Think about the data structures we have learned: which one helps here?
- That's right, a linked list. This is how records are connected when they are in the same page:
MySQL connects the records within a page through a singly linked list. If the query is by primary key, the record can be located very quickly by binary search. If the query is by a non-primary-key column, the only option is to traverse the singly linked list starting from the smallest record.
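The two in-page lookups described above can be sketched in Python. This is a simplified model, not InnoDB's actual page format: the `Page` class, its `keys` list (standing in for a sorted directory over the primary keys), and the `find_by_pk` / `find_by_name` helpers are hypothetical names introduced only for illustration.

```python
from bisect import bisect_left


class Page:
    """Toy model of one page: records kept in primary-key order."""

    def __init__(self, records):
        # records: (primary_key, name) tuples, stored sorted by primary key
        self.records = sorted(records)
        self.keys = [pk for pk, _ in self.records]  # sorted key directory

    def find_by_pk(self, pk):
        """Binary search on the sorted keys: O(log n) comparisons."""
        i = bisect_left(self.keys, pk)
        if i < len(self.keys) and self.keys[i] == pk:
            return self.records[i]
        return None

    def find_by_name(self, name):
        """No order on name: walk the records from the smallest key, O(n)."""
        steps = 0
        for record in self.records:
            steps += 1
            if record[1] == name:
                return record, steps
        return None, steps


page = Page([(3, "carol"), (1, "alice"), (4, "dave"), (2, "bob")])
by_pk = page.find_by_pk(4)
by_name, steps = page.find_by_name("dave")
```

Both calls find the same record, but the lookup by name has to visit every record before it, while the lookup by primary key halves the search range at each step.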
How is a connection established between multiple Pages (when the data is in different pages)?
MySQL links different pages together through a doubly linked list, so that we can reach the next page from the previous one and the previous page from the next one. Since we cannot quickly locate the page that holds the data, we can only start from the first page and search all the way down the doubly linked list, looking for the specified record inside each page in the same way as within a single page. This is a full table scan.
As the number of Pages grows, what problems appear in queries, and how can they be solved and optimized?
As the linked list grows, we cannot locate records directly, so queries become slow. Thinking it through, a so-called slow query really comes down to two problems:
- The query time complexity is O(N)
- Too many disk I/O reads and writes
Think about it: when we read a book and want to find the information on a certain page, what do we do?
We check the table of contents, right? And what is a table of contents? Isn't it just an index?
Take the table of contents of any book as an example.
It contains two very important pieces of information:
- The content summary (chapter title)
- The page number
We borrow the idea of a book's table of contents to achieve our goal of querying data quickly:
Add a directory to the data. When looking up data, we first use the directory to find which page the data is on, which improves query performance.
But this raises the next question.
2.3 Question: How to create a directory? Create a directory for each page?
Should a directory follow some regular order? For example, the index of a dictionary is arranged alphabetically. What does that remind you of? That's right, the primary key. The auto-increment primary key in MySQL meets our requirements exactly: it is ordered, small, and never repeated, which makes it a perfect directory entry. We store the primary keys of each page in order, add a pointer to the location of the data, and at query time use binary search on the primary key value to find the directory entry quickly, and then the data.
But do we need to create a directory for each data page? It seems we do. If a page has no directory, how can we locate data inside it? By scanning the whole page?
However, once every page has a directory, the directory pages themselves multiply, and we would have to traverse them one by one, so query performance drops again.
Can we create a directory for the directories?
Yes: we can create a directory for the directory pages as well, extracting a root node one level up, which makes lookups easier.
This tree is organized by primary key, so we call it the primary key index tree. Because the primary key index tree stores all the data of the table, in MySQL "the index is the data, and the data is the index".
This is the data structure of MySQL's B+ tree primary key index. How about it? Doesn't it stick better than knowledge you simply memorize?
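The "directory of directories" idea can be sketched as a three-level lookup. The layout below (`leaf_pages`, `directory_pages`, `root`) and the helpers `pick` and `lookup` are hypothetical and heavily simplified: each directory entry is just the smallest key of a child page plus that child's position in a list, which stands in for a page pointer.

```python
from bisect import bisect_right

# Leaf pages: each holds (primary_key, row) pairs in primary-key order.
leaf_pages = [
    [(1, "a"), (2, "b"), (3, "c")],
    [(4, "d"), (5, "e"), (6, "f")],
    [(7, "g"), (8, "h"), (9, "i")],
    [(10, "j"), (11, "k"), (12, "l")],
]

# Directory pages: (smallest key in the child page, child page number).
directory_pages = [
    [(1, 0), (4, 1)],
    [(7, 2), (10, 3)],
]

# Root: a directory over the directory pages.
root = [(1, 0), (7, 1)]


def pick(entries, key):
    """Binary search for the last entry whose smallest key is <= key."""
    i = bisect_right([k for k, _ in entries], key) - 1
    return entries[max(i, 0)][1]


def lookup(key):
    """Descend root -> directory page -> leaf page: three page reads."""
    directory = directory_pages[pick(root, key)]
    leaf = leaf_pages[pick(directory, key)]
    for pk, row in leaf:  # small in-page scan, kept linear for brevity
        if pk == key:
            return row
    return None
```

Whatever key is requested, `lookup` touches exactly one page per level, instead of walking the whole chain of leaf pages.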
2.4 Index tree, Page splitting and merging
We have found a way to improve query performance. So what problems do we run into when records in Pages are added, modified, or deleted?
What if the primary key increases in order and a new record is added?
If the page is full, do we have to open a new page?
The data in the pages must satisfy one condition:
The primary key values of the user records in the next data page must be greater than the primary key values of the user records in the previous page.
Because the keys increase in order, we can simply append a new page to the end of the doubly linked list of pages.
What if the primary key increases out of order and a new record is added?
- Open a new page and find the position where the record belongs.
- Move the old data to the new page and put the new record in its ordered position.
- The data in the leaf nodes keeps being shifted.
- This triggers the splitting and merging of the leaf-node data Pages, which in turn triggers the splitting and merging of the upper non-leaf nodes and the root node.
- This is what "pull one hair and the whole body moves" means, and it is also called page splitting!
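The difference between ordered and out-of-order inserts can be made concrete with a small simulation. This is a toy model under stated assumptions, not InnoDB's real split algorithm: only leaf pages are modeled, a page holds `PAGE_CAPACITY` keys, and a full page is split in half. `insert_all` and `moved` (the count of records copied to a new page) are names made up for this sketch.

```python
import random
from bisect import bisect_right, insort

PAGE_CAPACITY = 4  # matches the 16 KB page / 4 KB record example


def insert_all(keys):
    """Insert keys one by one; return (pages, number of records moved)."""
    pages = [[]]  # sorted pages; every key in page i < every key in page i+1
    moved = 0
    for key in keys:
        # Find the page whose key range should contain this key.
        firsts = [p[0] for p in pages if p]
        idx = max(bisect_right(firsts, key) - 1, 0)
        page = pages[idx]
        if len(page) < PAGE_CAPACITY:
            insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            # Ordered insert at the very end: just append a fresh page.
            pages.append([key])
        else:
            # Out-of-order insert into a full page: split it in half and
            # move the upper half of the old records to a new page.
            half = PAGE_CAPACITY // 2
            new_page = page[half:]
            del page[half:]
            moved += len(new_page)
            insort(page if key < new_page[0] else new_page, key)
            pages.insert(idx + 1, new_page)
    return pages, moved


n = 1000
_, moved_ordered = insert_all(range(1, n + 1))
shuffled = list(range(1, n + 1))
random.Random(0).shuffle(shuffled)
pages, moved_random = insert_all(shuffled)
```

In this model, inserting the keys in increasing order never moves an existing record, while inserting the same keys in shuffled order repeatedly splits pages and moves records.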
Summary: problems encountered when records in Pages are added, modified, or deleted:
When out-of-order inserts occur, or during operations such as updating primary key IDs or deleting index pages, a large number of tree nodes must be adjusted. This triggers the splitting and merging of leaf-node Pages as well as of the upper non-leaf and root node pages, which produces a lot of disk fragmentation and costs the database performance. This explains why we should not build indexes on columns that are frequently updated, and why we should not update the primary key.
Let us summarize:
Clustered index:
The primary key index tree is also called the clustered index. In InnoDB, a table has exactly one clustered index tree. If a table defines a primary key, that primary key index is the clustered index. The key values of the clustered index determine the physical storage order of the rows, and the clustered index stores all the columns of the table in sorted order. "The index is the data, and the data is the index" refers to this primary key index tree.
2.5 Based on what we just deduced, here are some interview questions
Why is it best for the primary key ID to have an increasing trend?
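You have just read this, so you should remember: with ordered increase, the primary key values of the user records in the next data page must be greater than those in the previous page. If the primary key trends upward, new data always lands at the very end of the linked list or in a newly added page, so it does not trigger the page splitting and merging that slows down inserts.
How much data can a three-level B+ tree store?
Points tested: the size of a Page, the definition of a B+ tree
1 GB = 1024 MB, 1 MB = 1024 KB, 1 KB = 1024 bytes
Answer:
Given that one index entry takes 16 bytes and one page is 16 KB = 16 × 1024 bytes, a non-leaf page holds 16 × 1024 / 16 = 1024 entries, so two non-leaf levels can address 1024 × 1024 (about 1.05 million) leaf pages. Assuming, for example, rows of about 1 KB (16 rows per leaf page), that is about 16.7 million rows, which is more than ten million. In InnoDB, a B+ tree with a depth of 3 levels is enough to store data on the order of tens of millions of rows.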
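The arithmetic behind this answer can be written out as a short calculation. The 16 KB page and the 16-byte index entry come from the text; the 1 KB average row size (`ROW_SIZE`) is an assumption chosen for illustration, and real tables will differ.

```python
PAGE_SIZE = 16 * 1024   # one page: 16 KB, in bytes
INDEX_ENTRY_SIZE = 16   # bytes per index entry (figure used in the text)
ROW_SIZE = 1024         # ASSUMPTION: average row size of 1 KB

entries_per_index_page = PAGE_SIZE // INDEX_ENTRY_SIZE  # fan-out of a non-leaf page
rows_per_leaf_page = PAGE_SIZE // ROW_SIZE


def max_rows(levels):
    """Rows addressable by a tree with `levels` levels, leaf level included."""
    return entries_per_index_page ** (levels - 1) * rows_per_leaf_page


print(max_rows(3))  # 1024 * 1024 * 16
```

The result depends heavily on the row size: with the 4 KB records used in the earlier example, a leaf page holds only 4 rows and the same three levels address 1024 × 1024 × 4, about 4.2 million rows.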
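Why should large fields in MySQL be split out?
One Page can hold 16 KB of data. Large fields take up a lot of storage space, which means fewer records fit in one Page, so more Pages are needed to store the data, and the tree becomes deeper. The number of disk I/O operations then increases, performance drops, and queries get slower. Whether or not a large field is used, it is stored in the index, where it occupies a lot of memory and squeezes the number of records a Page can hold.
Why use a B+ tree?
A B+ tree is a multi-way balanced search tree. Every query starts from the root node, and data is stored only in the leaf nodes; the root node and the other non-leaf nodes store only index keys and pointers. The leaf nodes are linked to one another, so data can be looked up by key value along the leaf level. A B+ tree therefore has stronger database and table scanning ability, stronger sorting ability, and more stable query efficiency and performance. Its storage capacity is also greater: three levels are enough to store tens of millions of rows.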
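3 What is a secondary index tree
What we just looked at is the index built from the primary key. What if our query does not use the primary key, or the table has no primary key at all? We can also create a joint index on several fields (composite index, combined index... it is only a name, call it what you like).
The index tree built from the primary key is called the primary key index tree, and an index tree built from other fields is called a secondary index tree.
A composite index can be created with the following SQL:
ALTER TABLE INNODB_USER ADD INDEX SECOND_INDEX_AGE_USERNAME_PHONE (`age`, `user_name`, `phone`);
In fact, although it looks as if only one index was created, it takes effect for queries on age, on age and user_name, and on age, user_name, and phone.
You can also think of it as having created these three indexes:
ALTER TABLE INNODB_USER ADD INDEX SECOND_INDEX_AGE (`age`);
ALTER TABLE INNODB_USER ADD INDEX SECOND_INDEX_AGE_USERNAME (`age`, `user_name`);
ALTER TABLE INNODB_USER ADD INDEX SECOND_INDEX_AGE_USERNAME_PHONE (`age`, `user_name`, `phone`);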
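3.1 How is a secondary index tree sorted?
First, we need to know whether the type of the field being sorted has a natural order.
If the field is naturally ordered, it is sorted by that order, for example (int) 1 2 3 4.
If the field has no natural order, it is sorted according to the collation of the column's character set. We will not go deeper into this; it is enough to know it.
Suppose we have a composite index (A-B-C). It is sorted in the order in which the fields were defined:
If A is the same, sort by B; if B is the same, sort by C; if A, B, and C are all the same, sort by the clustered index.
Our Page stores data in the order in which the fields of the composite index were defined: age, user name, phone number.
Its data structure is otherwise the same.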
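The A-B-C ordering rule can be illustrated with tuple sorting in Python. The sample rows are made up, and Python's built-in string comparison stands in for the column's collation, which is a simplification of how MySQL compares strings.

```python
# Each entry: (age, user_name, phone, primary_key_id)
entries = [
    (30, "bob",   "139", 7),
    (25, "carol", "138", 2),
    (30, "alice", "137", 5),
    (25, "carol", "138", 1),
    (30, "bob",   "135", 3),
]

# Tuples compare element by element, which mirrors the A-B-C rule:
# age first, then user_name, then phone, and finally the primary key.
index_order = sorted(entries)
ids_in_index_order = [entry[3] for entry in index_order]
```

The two entries that agree on age, user_name, and phone (ids 1 and 2) end up ordered by primary key, and the two "bob" entries with age 30 are ordered by phone.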
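3.2 What is the concept of the index bridge (leftmost matching principle)?
Take the same index as above: age, user name, phone number (age, user_name, phone).
The first field is age. For this index to take effect, the query must first filter on age, and then, if it also needs user_name, filter on user_name.
Can the index be used if only user_name is given?
Actually, it cannot. The index is sorted by age first, so a query must hit age first, then user_name, then phone. This is what we call the leftmost matching principle.
"Leftmost" simply follows from the fact that entries are stored in the order of the composite index columns. The commonly mentioned "index bridge" exists for the same reason: to hit a composite index you must cross it like a bridge, stepping on the first plank, then the second, and finally the third.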
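Why the leftmost columns must be hit first can be seen by treating the index as a sorted list of tuples. This sketch uses made-up rows and a hypothetical helper `seek_prefix`; it models only the ordering argument, not how the MySQL optimizer actually chooses a plan.

```python
from bisect import bisect_left, bisect_right

# Composite index on (age, user_name, phone), kept in index order.
index = sorted([
    (20, "zoe", "131"), (20, "amy", "132"),
    (25, "bob", "133"), (25, "amy", "134"),
    (30, "bob", "135"), (30, "cat", "136"),
])


def seek_prefix(prefix):
    """Binary-search the range of entries whose leading columns equal prefix."""
    n = len(prefix)
    leading = [entry[:n] for entry in index]
    return index[bisect_left(leading, prefix):bisect_right(leading, prefix)]


by_age = seek_prefix((25,))
by_age_name = seek_prefix((25, "bob"))

# user_name alone is not sorted across the whole index, so binary search
# cannot be applied to it; every entry has to be examined instead.
names = [entry[1] for entry in index]
names_are_sorted = names == sorted(names)
by_name_scan = [entry for entry in index if entry[1] == "bob"]
```

Searching by `(age,)` or `(age, user_name)` narrows to a contiguous range, whereas the user_name values taken on their own appear in the order amy, zoe, amy, bob, bob, cat, which is not sorted, so only a full scan finds all the "bob" entries.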
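3.3 Table return, covering index, and index pushdown
The secondary index tree has three important concepts: table return, covering index, and index pushdown.
Table return: the data we query is not in the secondary index tree, so we take the ID and look it up in the primary key index tree.
Covering index: all the data we need to query is in the secondary index tree and is returned directly; this situation is called a covering index.
Index condition pushdown (ICP): introduced in MySQL 5.6, it is used to optimize queries that require a table return.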
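The three concepts can be compared by counting how many times the primary key index tree has to be visited. The model below is a rough sketch with hypothetical data and names (`clustered`, `secondary`, `query`, the `city` column); it imitates a query of the form `WHERE age = ? AND user_name LIKE '%suffix'`, where the user_name condition cannot narrow the index range but can still be checked against the index entry.

```python
# Clustered index: primary key -> full row (toy data, hypothetical columns).
clustered = {
    1: {"id": 1, "age": 25, "user_name": "amy", "phone": "131", "city": "X"},
    2: {"id": 2, "age": 25, "user_name": "bob", "phone": "132", "city": "Y"},
    3: {"id": 3, "age": 25, "user_name": "bea", "phone": "133", "city": "Z"},
    4: {"id": 4, "age": 30, "user_name": "bob", "phone": "134", "city": "X"},
}

# Secondary index on (age, user_name, phone); each entry also carries the id.
secondary = sorted(
    (r["age"], r["user_name"], r["phone"], r["id"]) for r in clustered.values()
)
INDEX_COLS = {"age": 0, "user_name": 1, "phone": 2, "id": 3}


def query(age, name_suffix, columns, pushdown):
    """Return (rows, number of table returns) for the modeled query."""
    covering = all(c in INDEX_COLS for c in columns)
    rows, table_returns = [], 0
    for entry in secondary:  # a real engine would seek to age, not scan
        if entry[0] != age:
            continue
        name_ok = entry[1].endswith(name_suffix)
        if covering:
            # Covering index: everything needed is in the index entry.
            if name_ok:
                rows.append({c: entry[INDEX_COLS[c]] for c in columns})
            continue
        if pushdown and not name_ok:
            continue  # ICP: rejected inside the index, no table return
        table_returns += 1  # table return: fetch the full row by id
        row = clustered[entry[3]]
        if row["user_name"].endswith(name_suffix):
            rows.append({c: row[c] for c in columns})
    return rows, table_returns


no_icp = query(25, "ob", ["city"], pushdown=False)
with_icp = query(25, "ob", ["city"], pushdown=True)
covered = query(25, "ob", ["user_name", "phone"], pushdown=False)
```

All three calls return the single matching row, but without pushdown every entry with age 25 costs a table return, with pushdown only the matching entry does, and when the selected columns are all in the index no table return is needed at all.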
3.4 A few more interview questions:
Why are columns with low dispersion not served by the index?
What is dispersion? The more identical values a column has, the lower its dispersion; the fewer identical values, the higher its dispersion.
If the values are all the same, how can they be sorted? They cannot be, in any useful way.
When there are too many duplicate values in the B+ tree, the MySQL optimizer finds that using the index costs almost the same as a full table scan, and it will not use the index even though it has been created. Whether an index is used is decided by the MySQL optimizer.
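One common way to put a number on dispersion is the ratio of distinct values to total rows. The helper `selectivity` and the sample columns `gender` and `user_id` below are hypothetical; the sketch shows only the ratio, not the cost model the optimizer really uses.

```python
def selectivity(values):
    """Distinct values divided by total rows: 1.0 is unique, near 0 is low."""
    return len(set(values)) / len(values)


gender = ["M", "F"] * 500    # 2 distinct values in 1000 rows
user_id = list(range(1000))  # 1000 distinct values in 1000 rows

low = selectivity(gender)
high = selectivity(user_id)
```

A column like `gender` here matches half the table for any given value, so an index on it saves little over a full scan, while a column like `user_id` pins down a single row.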
Is it true that the more indexes, the better?
- Space: an index trades space for time, and every index occupies disk space.
- Time: hitting an index speeds up queries. But updates and deletes cause pages to split and merge, which lengthens the response time of INSERT and UPDATE statements and slows performance down.
If a column needs to be updated frequently, creating an index on it is not recommended, because page splitting and merging would be triggered frequently.
3.5 Summary of the secondary index tree
A secondary index built on several columns is also called a combined (composite) index. The secondary index tree stores its entries in the order of the columns given when the index was created, and it holds only part of the data: the columns used to create the index. The secondary index tree exists to assist our queries and improve query efficiency. There are three behaviors associated with it: table return, covering index, and index pushdown. Of these, the covering index performs best.
4 The difference between primary key index and secondary index
I found a difference picture on the Internet
The above is the detailed content of MySQL index knowledge point analysis. For more information, please follow other related articles on the PHP Chinese website!
