InnoDB data storage files are different from MyISAM-Mysql Tutorial-php.cn

Home

Database

Mysql Tutorial

InnoDB data storage files are different from MyISAM

coldplay.xixi

Feb 02, 2021 am 09:16 AM

MySQL TutorialWhy use B Tree for the index introduced in the column

##Preface

The title of this article is a real problem I encountered during the interview process. An Internet crowdfunding company asked the first question when testing the interviewer's MySQL-related knowledge. , I was quite confused at the time. I didn’t expect that this young man didn’t follow martial arts ethics and didn’t play his cards according to routines. When people generally asked about MySQL-related knowledge, didn’t they always ask about index optimization, index failure and other related issues? Why did it come out? The storage files are different? Even if you examine the MVCC mechanism, it will do. So this time I will summarize this part of the knowledge points.

Why you need to create an index

First of all, we all know that the purpose of creating an index is to improve the query speed, so why can the query speed be improved with an index?

Let’s take a look at a schematic diagram of an index.

If I have a SQL statement:
select * from Table where id = 15 Then in the absence of an index, a full table scan will actually be performed, that is, one by one Search until you find the record with id=15, the time complexity is O(n);

What if you query it with an index? First, a binary search will be performed in the index value based on id=15. The efficiency of binary search is very high, and its time complexity is O(logn);

This is why indexes can improve query efficiency. , but the amount of index data is also relatively large, so it is generally not stored in memory, but is stored directly in the disk. Therefore, disk IO is unavoidable when reading the file content in the disk.

Why does MySQL index use B Tree

As we said above, index data is generally stored on disk, but calculated data must be in memory. If the index file is very large, it cannot be loaded into the memory at once, so when using the index for data search, multiple disk IOs will be performed to load the index data into the memory in batches,

Therefore, a good index data structure must have the least number of disk IO times under the premise of obtaining correct results. <strong></strong>

Hash type

Currently, MySQL actually has two index data types to choose from, one is BTree (actually B Tree), A Hash.

But why do most people choose BTree in actual use?

Because if you use a Hash type index, MySQL will perform a Hash operation on the index data when creating the index, so that the disk pointer can be quickly located based on the Hash value, even if the amount of data is large. Data can also be located quickly and accurately.

select * from Table where id > 15, Hash type indexes cannot be handled. For this range query, the entire table will be directly Scanning, and Hash type indexes cannot be sorted.

Binary tree

Then why does MySQL not have a binary tree as its index data structure? We all know that binary trees locate data through binary search, so the effect is still good, and the time complexity is O(logn);

InnoDB data storage files are different from MyISAM However, there is a problem with binary trees, that is, under special circumstances , it will degenerate into a stick, that is, a one-way linked list. At this time, its time complexity will degenerate to O(n);

So when we want to query the record with id=50, it is actually the same as a full table scan. So because of this situation, the binary tree is not suitable as an index data structure.

Balanced Binary Tree

So since a binary tree will degenerate into a linked list under special circumstances, why can't a balanced binary tree be used?

The height difference between the child nodes of a balanced binary tree cannot exceed 1. Like the binary tree in the figure below, the node with the key 15 has a height of 0 on its left child node and a height of 0 on its right child node. is 1, and the height difference does not exceed 1, so the tree below is a balanced binary tree.
平衡InnoDB data storage files are different from MyISAM Because it can maintain balance, its query time complexity is O(logN). As for how to maintain balance, it mainly involves doing some left-hand rotation, right-hand rotation, etc. The specific details of maintaining balance are not in this article. If you want to know the main content, you can search it yourself.

What are the problems with using this data structure to index MySQL?

Excessive disk IO: In MySQL, an IO operation only reads one node. If a node has at most two child nodes, then there are only queries for these two child nodes. range, so to be accurate to specific data, multiple reads are required. If the tree is very deep, a large amount of disk IO will be performed. Performance naturally degrades.
Low space utilization: For a balanced binary tree, each node value stores a keyword, a data area, and pointers to two child nodes. As a result, loading only such a small amount of data in one hard-working IO operation is really overkill.
The query effect is unstable: If in a balanced binary tree with a very deep height, if the queried data happens to be the root node, then it will be found quickly. If the queried data is If it happens to be a leaf node, it will need to perform multiple disk IOs before it can be returned. The response time may not be in the same order of magnitude as that of the root node.

Although the binary tree solves the problem of balance, it also brings new problems, that is, due to the depth of its own tree, it will cause a series of efficiency problems.

So in order to solve the problem of balancing binary trees, balanced multi-tree (Balance Tree) has become a better choice.

Balance Tree–B-Tree

B-Tree means balanced multi-tree. Generally, a node in B-Tree has The number of child nodes is what we call a B-Tree of that order. Usually m is used to represent the order. When m is 2, it is a balanced binary tree.

Each node of a B-Tree can have at most m-1 keywords, and at least Math.ceil(m/2)-1 keywords must be stored. All The leaf nodes are all on the same layer. The picture below is a 4th order B-Tree.
InnoDB data storage files are different from MyISAM
Then let’s take a look at how B-Tree searches for data:

If you query the data with id=7, first add the keyword The node of 20 is loaded into the memory, and it is judged that 7 is smaller than 20;
Then load the first child node, and if the queried data is equal to 12 or 17, it will be returned directly. If it is not equal, continue to search downwards and find 7 Less than 12;
Then continue to load the first child node. After finding 7, directly return the data below 7.

In this way, the entire operation actually performs 3 IO operations, but in fact, a general B-Tree has many branches (usually greater than 100) in each layer.

In order to better utilize the IO capability of the disk, MySQL sets the size of the operation page to 16K, that is, the size of each node is 16K. If the keywords in each node are of int type, then it is 4 bytes. If the size of the data area is 8 bytes, and the node pointer occupies another 4 bytes, then each node of the B-Tree The number of keywords that can be saved is: (16*1000) / (4 8 4)=1000. Each node can store up to 1000 keywords, and each node can have up to 1001 branch nodes. .

In this way, when querying index data, one disk IO operation can read 1000 keywords into the memory for calculation. One disk IO operation of B-Tree balances the binary data on top. N disk IO operations have been performed.

It should be noted that:B-Tree will perform a series of operations in order to ensure the balance of the data. This process of maintaining balance is relatively time-consuming. , so when creating an index, you must choose appropriate fields and do not create too many indexes. If you create too many indexes, the process of updating the index will be more time-consuming when updating the data.

Also Don’t choose low-discrimination field values as indexes, such as gender fields. There are only two values in total. Then it may cause the depth of the B-Tree to be too large and the index Efficiency is reduced.

B Tree

B-Tree has solved the problem of balanced binary trees very well, and can also ensure query efficiency, so why is there B What about Tree?

Let’s first look at what B Tree looks like.

B Tree is a variant of B-Tree. The relationship between each node keyword and the m-order formula of B Tree is different from that of B-Tree.

First of all, the ratio of the number of child nodes of each node and the keywords that can be stored in each node is 1:1. Secondly, when querying data, the left closed interval is used for query. There is also no data in the branch node. Only the keywords and child node points are saved, and the data is stored in the leaf nodes.
InnoDB data storage files are different from MyISAM
Then let’s take a look at how to perform data query in B Tree.

For example:

Now if we want to query the data with id=2, we will first take out the root node and load it into the memory. We find that id=2 exists in the root node because it stores data in the left closed interval. , so id are all on the first child node of the root node;
Then take out the first child node, load it into the memory, and find that the current node existsid=2 keyword, and it has reached the leaf node, then directly retrieve the data in the leaf node and return it.

Now let’s look at the difference between B-Tree and B Tree

B Tree’s query uses the left closed interval, so It can better support the query effect of auto-incrementing indexes, so generally when creating primary keys, they are usually auto-incrementing. This is different from B-Tree.
B Tree does not save data on the root node and branch nodes. Keyword-related data is only saved on the leaf nodes. This ensures the stability of the query effect. Any query You have to go to the leaf node to get the data. B-Tree saves data in branch nodes, and if the keyword is hit, the data is returned directly.
B Tree's leaf nodes are arranged sequentially, and two adjacent leaf nodes have a sequential reference relationship, which can better support range queries. B-Tree does not have this order relationship.

Why MySQL chose B Tree for its index

After the above layer-by-layer analysis, we can now summarize why MySQL chose B Tree as the data structure for its index. Woolen cloth.

First of all, compared with the balanced binary tree, the depth of the B Tree is lower, the nodes save more keywords, the number of disk IOs is fewer, and the query calculation efficiency is better.
B Tree has stronger global scanning capabilities. If you want to globally scan the data table based on index data, B-Tree will scan the entire tree. Then traverse layer by layer. As for B Tree, you only need to traverse the leaf nodes, because there is a sequential reference relationship between leaf nodes.
B Tree’s disk IO read and write capabilities are stronger, because only keywords are saved on each branch node of B Tree, so that every time the disk IO is When reading and writing, one page of 16K data can store more keywords, and each node can store more keywords than B-Tree. In this way, B Tree's disk IO loads much more data than B-Tree.
B Tree data structure has natural sorting ability, which is stronger than other data structures and sorting is done through branch nodes. If Branch nodes need to be loaded into memory for sorting, and more data can be loaded at one time.
B Tree's query effect is more stable, because all queries need to scan the leaf nodes before returning the data. The effect is only stable but not necessarily optimal. If the root node data of B-Tree is directly queried, B-Tree can directly return the data with only one disk IO, but the effect is optimal.

After analyzing the above points, MySQL finally chose B Tree as the data structure of its index.

What are the differences between InnDB’s data storage files and MyISAM’s?

The above summarizes the data structure of MySQL index. This time we can talk about the second question, because this question actually has a certain relationship with MySQL index.
Let's take a look, first find the directory where the server MySQL stores data:
Log in to MySQL, open the MySQL command line interface: enter show variables like '�tadir%';, and you can See the directory where the data is stored.
The directory where MySQL stores data in my server is:

/var/lib/mysql/

After entering this directory, you can see the directories of all databases and create a new database of study_test.
Then enter the directory

/var/lib/mysql/study_test

. Currently there is only one file. This file is used to record the contents of the character set configured when creating the database.

-rw-r----- 1 mysql mysql     60 1月  31 10:28 db.opt

Now create two new tables, select InnoDB as the engine type of the first table, and select MyISAM as the engine type of the second table.

student_innodb：

CREATE TABLE `student_innodb` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `name` varchar(50) COLLATE utf8mb4_bin DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  `address` varchar(100) COLLATE utf8mb4_bin DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_name` (`name`) USING BTREE COMMENT 'name索引') ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='innodb引擎表';

student_myisam：

CREATE TABLE `student_myisam` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `name` varchar(50) COLLATE utf8mb4_bin DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  `address` varchar(100) COLLATE utf8mb4_bin DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_name` (`name`) USING BTREE COMMENT 'name索引') ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='myISAM引擎类型表';

After the two tables are created, we enter /var/lib/mysql/study_testTake a look:

-rw-r----- 1 mysql mysql     60 1月  31 10:28 db.opt-rw-r----- 1 mysql mysql   8650 1月  31 10:41 student_innodb.frm-rw-r----- 1 mysql mysql 114688 1月  31 10:41 student_innodb.ibd-rw-r----- 1 mysql mysql   8650 1月  31 10:58 student_myisam.frm-rw-r----- 1 mysql mysql      0 1月  31 10:58 student_myisam.MYD-rw-r----- 1 mysql mysql   1024 1月  31 10:58 student_myisam.MYI

Through the files in the directory, you can see that there are several more files after creating the table. This also shows that the InnoDB engine type table File differences with MyISAM engine type tables.

Each of these files has its own function:

There are two table files of the InnoDB engine:
- *.frm This type of file is the definition file of the table.
- *.ibd This type of file is a data and index storage file. Table data and indexes are aggregated and stored, and the data can be directly queried through the index.
There are three table files in the MyIASM engine:
- *.frm This type of file is the definition file of the table.
- *.MYD This type of file is a table data file, and all data in the table is saved in this file.
- *.MYI This type of file is the index file of the table, and the index data of the MyISAM storage engine is stored separately.

MyISAM data storage engine, index and data storage structure

When the MyISAM storage engine stores the index, it The data is stored separately, and the indexed B Tree ultimately points to the physical address where the data exists, not the specific data. Then find the specific data in the data file (*.MYD) according to the physical address.

As shown in the figure below:
InnoDB data storage files are different from MyISAM
Then when there are multiple indexes, multiple indexes point to the same physical address.
As shown in the figure below:

Through this structure, we can see that the indexes of MyISAM's storage engine are all at the same level, and the primary key and non-primary key index structures and query methods are exactly the same.

InnoDB data storage engine, index and data storage structure

First of all, InnoDB’s index is divided into clustered index and non-clustered index. Clustered index saves key Words save data, keywords are saved on each branch node of B Tree, and data are saved on leaf nodes.
"Clustering" means that the data rows are stored closely together one by one in a certain order. A table can only have one clustered index, because there is only one way to store data in a table. Generally, the primary key is used as a clustered index. If there is no primary key, InnoDB will generate a hidden column as the primary key by default.

As shown in the figure below:
InnoDB data storage files are different from MyISAM
Non-clustered index, also known as secondary index, although it also saves keywords on each branch node of B Tree, but the leaf node Not the saved data, but the saved primary key value. Querying data through the secondary index will first query the primary key corresponding to the data, and then query the specific data row based on the primary key.

As shown in the figure below:
InnoDB data storage files are different from MyISAM
Due to the design structure of the non-clustered index, the non-clustered index needs to perform two index retrievals when querying. This design The advantage is that once data migration occurs, only the primary key index needs to be updated, and the non-clustered index does not need to be moved. It also avoids the need to store physical addresses like MyISAM indexes, which need to be re-maintained during data migration. All indexing issues.

Summary

This time I have clearly summarized the data structure of MySQL index and the file storage structure. Later, in the actual work process, when designing the index I can think more comprehensively. By understanding the data structure of the index, I can also consider when writing SQL, which situations are indexed and which are not.

MySQL uses B Tree as the data structure of the index, because the depth of B Tree is low, the nodes save many keywords, and the number of disk IOs is small, thus ensuring higher query efficiency.
B Tree can ensure that the query effect of MySQL is stable whether it is a primary key index or a non-primary key index. Each time, the leaf node must be queried to return data. The depth of the leaf node of B Tree is the same. And in order to better support auto-incrementing primary keys, the query node range of B Tree is closed on the left and open on the right.
MySQL's MyISAM storage engine, table data and index data are stored in two files respectively, because the B Tree of its own index The disk address where the table data pointed to by the leaf node is located, and the index does not distinguish between primary key and non-primary key, so separate storage can better manage the index in a unified manner;
MySQL's InnoDB storage engine, table data and Index data are stored in a file, because the leaf nodes of InnoDB's clustered index point to specific data rows, and in order to ensure the stability of the query effect, there must be For a clustered index, when the secondary index performs index retrieval, it will first retrieve the primary key value of the data through the secondary index, and then retrieve the specific data in the clustered index based on the primary key.

Related free learning recommendations: mysql video tutorial

The above is the detailed content of InnoDB data storage files are different from MyISAM. For more information, please follow other related articles on the PHP Chinese website!

Statement

This article is reproduced at:CSDN. If there is any infringement, please contact admin@php.cn delete

Explain the role of InnoDB redo logs and undo logs.Apr 15, 2025 am 12:16 AM

InnoDB uses redologs and undologs to ensure data consistency and reliability. 1.redologs record data page modification to ensure crash recovery and transaction persistence. 2.undologs records the original data value and supports transaction rollback and MVCC.

What are the key metrics to look for in an EXPLAIN output (type, key, rows, Extra)?Apr 15, 2025 am 12:15 AM

Key metrics for EXPLAIN commands include type, key, rows, and Extra. 1) The type reflects the access type of the query. The higher the value, the higher the efficiency, such as const is better than ALL. 2) The key displays the index used, and NULL indicates no index. 3) rows estimates the number of scanned rows, affecting query performance. 4) Extra provides additional information, such as Usingfilesort prompts that it needs to be optimized.

What is the Using temporary status in EXPLAIN and how to avoid it?Apr 15, 2025 am 12:14 AM

Usingtemporary indicates that the need to create temporary tables in MySQL queries, which are commonly found in ORDERBY using DISTINCT, GROUPBY, or non-indexed columns. You can avoid the occurrence of indexes and rewrite queries and improve query performance. Specifically, when Usingtemporary appears in EXPLAIN output, it means that MySQL needs to create temporary tables to handle queries. This usually occurs when: 1) deduplication or grouping when using DISTINCT or GROUPBY; 2) sort when ORDERBY contains non-index columns; 3) use complex subquery or join operations. Optimization methods include: 1) ORDERBY and GROUPB

Describe the different SQL transaction isolation levels (Read Uncommitted, Read Committed, Repeatable Read, Serializable) and their implications in MySQL/InnoDB.Apr 15, 2025 am 12:11 AM

MySQL/InnoDB supports four transaction isolation levels: ReadUncommitted, ReadCommitted, RepeatableRead and Serializable. 1.ReadUncommitted allows reading of uncommitted data, which may cause dirty reading. 2. ReadCommitted avoids dirty reading, but non-repeatable reading may occur. 3.RepeatableRead is the default level, avoiding dirty reading and non-repeatable reading, but phantom reading may occur. 4. Serializable avoids all concurrency problems but reduces concurrency. Choosing the appropriate isolation level requires balancing data consistency and performance requirements.

MySQL vs. Other Databases: Comparing the OptionsApr 15, 2025 am 12:08 AM

MySQL is suitable for web applications and content management systems and is popular for its open source, high performance and ease of use. 1) Compared with PostgreSQL, MySQL performs better in simple queries and high concurrent read operations. 2) Compared with Oracle, MySQL is more popular among small and medium-sized enterprises because of its open source and low cost. 3) Compared with Microsoft SQL Server, MySQL is more suitable for cross-platform applications. 4) Unlike MongoDB, MySQL is more suitable for structured data and transaction processing.

How does MySQL index cardinality affect query performance?Apr 14, 2025 am 12:18 AM

MySQL index cardinality has a significant impact on query performance: 1. High cardinality index can more effectively narrow the data range and improve query efficiency; 2. Low cardinality index may lead to full table scanning and reduce query performance; 3. In joint index, high cardinality sequences should be placed in front to optimize query.

MySQL: Resources and Tutorials for New UsersApr 14, 2025 am 12:16 AM

The MySQL learning path includes basic knowledge, core concepts, usage examples, and optimization techniques. 1) Understand basic concepts such as tables, rows, columns, and SQL queries. 2) Learn the definition, working principles and advantages of MySQL. 3) Master basic CRUD operations and advanced usage, such as indexes and stored procedures. 4) Familiar with common error debugging and performance optimization suggestions, such as rational use of indexes and optimization queries. Through these steps, you will have a full grasp of the use and optimization of MySQL.

Real-World MySQL: Examples and Use CasesApr 14, 2025 am 12:15 AM

MySQL's real-world applications include basic database design and complex query optimization. 1) Basic usage: used to store and manage user data, such as inserting, querying, updating and deleting user information. 2) Advanced usage: Handle complex business logic, such as order and inventory management of e-commerce platforms. 3) Performance optimization: Improve performance by rationally using indexes, partition tables and query caches.

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)

4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

R.E.P.O. Best Graphic Settings

4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Assassin's Creed Shadows: Seashell Riddle Solution

2 weeks agoByDDD

R.E.P.O. How to Fix Audio if You Can't Hear Anyone

4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

WWE 2K25: How To Unlock Everything In MyRise

1 months agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

SublimeText3 Chinese version

Chinese version, very easy to use

Atom editor mac version download

The most popular open source editor

VSCode Windows 64-bit Download

A free and powerful IDE editor launched by Microsoft

Zend Studio 13.0.1

Powerful PHP integrated development environment

DVWA

Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software

Hot Topics

Where is the login entrance for gmail email?

7517

CakePHP Tutorial

1378

What is the format of the account name of steam

win11 activation key permanent

nyt connections hints and answers