Why does MySQL choose B+ tree as the index structure? (detailed explanation)-Mysql Tutorial-php.cn

Home

Database

Mysql Tutorial

Why does MySQL choose B+ tree as the index structure? (detailed explanation)

青灯夜游

Nov 23, 2019 pm 03:28 PM

mysql

In MySQL, both Innodb and MyIsam use B-tree as the index structure (other indexes such as hash are not considered here). This article will start with the most common binary search tree, and gradually explain the problems solved by various trees and the new problems faced, thereby explaining why MySQL chooses B-tree as the index structure.

Why does MySQL choose B+ tree as the index structure? (detailed explanation)

##1. Binary Search Tree (BST): Unbalanced

Binary Search Tree (BST, Binary Search Tree), also called binary sorting tree, needs to satisfy on the basis of binary tree: the value of all nodes on the left subtree of any node is not greater than the value of the root node, any node The value of all nodes in the right subtree is not less than the value of the root node. The following is a BST (picture source).

When fast search is required, storing data in BST is a common choice, because at this time the query time depends on the tree height, and the average time complexity is O( lgn). However,

BST may grow skewed and become unbalanced, as shown in the figure below (picture source). At this time, BST degenerates into a linked list, and the time complexity degenerates into O(n).

In order to solve this problem, balanced binary trees are introduced.

2. Balanced Binary Tree (AVL): Rotation takes time

AVL tree is strictly balanced For binary trees, the height difference between the left and right subtrees of all nodes cannot exceed 1; AVL tree search, insertion and deletion are O(lgn) in both the average and worst cases.

The key to achieving balance in AVL lies in the rotation operation: insertion and deletion may destroy the balance of the binary tree, and the tree needs to be rebalanced through one or more tree rotations. When inserting data, only one rotation (single rotation or double rotation) is required at most; but when data is deleted, it will cause the tree to become unbalanced. AVL needs to maintain the balance of all nodes on the path from the deleted node to the root node. Rotation The magnitude of is O(lgn).

Due to the time-consuming rotation, the AVL tree is very inefficient when deleting data; when there are many deletion operations, the cost of maintaining balance may be higher than benefits, so AVL is not widely used in practice.

3. Red-black tree: The tree is too tall

Compared with AVL trees, red-black trees do not pursue strict balance, but is a rough balance: just make sure that the longest possible path from root to leaf is no more than twice as long as the shortest possible path. From an implementation point of view, the biggest feature of the red-black tree is that each node belongs to one of two colors (red or black), and the division of node colors needs to meet specific rules (the specific rules are omitted). An example of a red-black tree is as follows (picture source):

Compared with an AVL tree, the query efficiency of a red-black tree will decrease. This is because the balance of the tree changes. Bad, higher height. However, the deletion efficiency of the red-black tree has been greatly improved, because the red-black tree also introduces color. When inserting or deleting data, only O(1) times of rotation and color changes are needed to ensure the basic balance. There is no need for AVL The tree performs O(lgn) number of rotations. In general, the statistical performance of red-black trees is higher than that of AVL.

Therefore, in practical applications, AVL trees are used relatively rarely, while red-black trees are used very widely. For example, TreeMap in Java uses red-black trees to store sorted key-value pairs; HashMap in Java 8 uses linked-list red-black trees to solve hash conflict problems (when there are fewer conflicting nodes, use linked lists; when there are more conflicting nodes, use red-black Tree).

For situations where the data is in memory (such as the above-mentioned TreeMap and HashMap), the performance of the red-black tree is very excellent. But

for the situation where the data is in auxiliary storage devices such as disks (such as MySQL and other databases), the red-black tree is not good at it, because the red-black tree still grows too high. . When the data is on the disk, disk IO will become the biggest performance bottleneck, and the design goal should be to minimize the number of IOs; the higher the height of the tree, the more IO times required for additions, deletions, modifications, and searches, which will seriously affect performance.

4. B-tree: Born for disk

BThe tree is also called B- tree (where - is not a minus sign) is a multi-path balanced search tree designed for auxiliary storage devices such as disks. Compared with binary trees, Each non-leaf node of the B tree can have multiple subtrees. Therefore, when the total number of nodes is the same, the height of the B-tree is much smaller than the AVL tree and the red-black tree (the B-tree is a "dumpty"), and the number of disk IOs is greatly reduced.

The most important concept in defining a B-tree is the order. For an m-order B-tree, the following conditions need to be met:

Each node contains at most m child nodes.
If the root node contains child nodes, it must contain at least 2 child nodes; except for the root node, each non-leaf node must contain at least m/2 child nodes.
A non-leaf node with k child nodes will contain k - 1 records.
All leaf nodes are in the same layer.

It can be seen that the definition of B-tree mainly limits the number of child nodes and records of non-leaf nodes.

The following picture is an example of a 3-order B-tree (picture source):

The advantages of the B-tree are not only the small tree height, but also the ability to access local areas. Utilization of sexual principles. The so-called locality principle means that when a piece of data is used, the data nearby has a higher probability of being used in a short time. B-tree stores data with similar keys in the same node. When one of the data is accessed, the database will read the entire node into the cache; when its adjacent data is accessed immediately, it can be read directly from the cache. , no disk IO is required; in other words, the cache hit rate of B-tree is higher.

B-tree has some applications in databases. For example, mongodb's index uses B-tree structure. However, in many database applications, B-tree, a variant of B-tree, is used.

5. B-tree

B-tree is also a multi-path balanced search tree. Its main difference from B-tree is:

Each node in the B-tree (including leaf nodes and non-leaf nodes) stores real data. Only leaf nodes in the B-tree store real data, and non-leaf nodes only store keys. In MySQL, the real data mentioned here may be all the data of the row (such as Innodb's clustered index), or it may be just the primary key of the row (such as Innodb's auxiliary index), or the address of the row (such as MyIsam's non-clustered index).
A record in the B-tree will only appear once and will not appear repeatedly, while the key of the B-tree may appear repeatedly - it will definitely appear in a leaf node, or it may appear repeatedly in a non-leaf node.
The leaf nodes of the B tree are linked through a doubly linked list.
For non-leaf nodes in the B-tree, the number of records is 1 less than the number of child nodes; while the number of records in the B-tree is the same as the number of child nodes.

Thus, B-tree has the following advantages compared to B-tree:

Fewer IO times: The non-leaf nodes of the B tree only contain keys, not real data, so the number of records stored in each node is much more than the number of B (that is, the order m is larger), so the height of the B tree is lower, and the access time is Fewer IO times are required. In addition, since each node stores more records, the access locality principle is better utilized and the cache hit rate is higher.
More suitable for range queries: When performing a range query in a B-tree, first find the lower limit to be found, and then perform an in-order traversal of the B-tree until Find the upper limit of the search; and the range query of the B-tree only needs to traverse the linked list.
More stable query efficiency: The query time complexity of B-tree is between 1 and the tree height (corresponding to the root node and leaf node respectively), The query complexity of the B-tree is stable at the tree height, because all data is in the leaf nodes.

B trees also have disadvantages: they take up more space because keys are repeated. However, compared with the performance advantages, the space disadvantage is often acceptable, so B-trees are more widely used in databases than B-trees.

6. Feel the power of B-tree

As mentioned earlier, compared with binary trees such as red-black trees, B-tree/B-tree has the largest The advantage is that the tree height is smaller. In fact, for Innodb's B index, the height of the tree is generally 2-4 levels. Let’s make some specific estimates below.

The height of the tree is determined by the order. The larger the order, the shorter the tree; and the size of the order depends on how many records each node can store. Each node in Innodb uses a page (page), the page size is 16KB, of which metadata only accounts for about 128 bytes (including file management header information, page header information, etc.), most of the space is used to store data .

For non-leaf nodes, the record only contains the key of the index and a pointer to the next level node. Assuming that each non-leaf page stores 1000 records, each record takes up approximately 16 bytes; this assumption is reasonable when the index is an integer or a shorter string. By extension, we often hear suggestions that the length of the index column should not be too large. Here is the reason: if the index column is too long and each node contains too few records, the tree will be too tall and the indexing effect will be greatly reduced. And the index will waste more space.
For leaf nodes, the record contains the key and value of the index (the value may be the primary key of the row, a complete row of data, etc., see the previous article for details), and the amount of data is larger. It is assumed here that each leaf node page stores 100 records (actually, when the index is a clustered index, this number may be less than 100; when the index is a auxiliary index, this number may be much greater than 100; it can be estimated based on the actual situation) .

For a 3-layer B-tree, the first layer (root node) has 1 page and can store 1000 records; the second layer has 1000 pages and can store 1000*1000 records. ;The third layer (leaf node) has 1000*1000 pages, and each page can store 100 records, so it can store 1000*1000*100 records, or 100 million records. For a binary tree, about 26 layers are needed to store 100 million records.

7. Summary

Finally, summarize the problems solved by various trees and the new problems faced:

1 ), Binary Search Tree (BST): solves the basic problem of sorting, but because balance cannot be guaranteed, it may degenerate into a linked list;

2), Balanced Binary Tree (AVL): solves the problem of balance through rotation , but the rotation operation efficiency is too low;

3), red-black tree: By abandoning strict balance and introducing red-black nodes, the problem of AVL rotation efficiency being too low is solved. However, in scenarios such as disks, the tree Still too high, too many IO times;

4), B-tree: By changing the binary tree to a multi-path balanced search tree, the problem of the tree being too high is solved;

5), B Tree: Based on the B-tree, non-leaf nodes are transformed into pure index nodes that do not store data, further reducing the height of the tree; in addition, the leaf nodes are connected into a linked list using pointers, making range queries more efficient.

Recommended learning: MySQL tutorial

The above is the detailed content of Why does MySQL choose B+ tree as the index structure? (detailed explanation). For more information, please follow other related articles on the PHP Chinese website!

Statement

This article is reproduced at:cnblogs. If there is any infringement, please contact admin@php.cn delete

MySQL's Place: Databases and ProgrammingApr 13, 2025 am 12:18 AM

MySQL's position in databases and programming is very important. It is an open source relational database management system that is widely used in various application scenarios. 1) MySQL provides efficient data storage, organization and retrieval functions, supporting Web, mobile and enterprise-level systems. 2) It uses a client-server architecture, supports multiple storage engines and index optimization. 3) Basic usages include creating tables and inserting data, and advanced usages involve multi-table JOINs and complex queries. 4) Frequently asked questions such as SQL syntax errors and performance issues can be debugged through the EXPLAIN command and slow query log. 5) Performance optimization methods include rational use of indexes, optimized query and use of caches. Best practices include using transactions and PreparedStatemen

MySQL: From Small Businesses to Large EnterprisesApr 13, 2025 am 12:17 AM

MySQL is suitable for small and large enterprises. 1) Small businesses can use MySQL for basic data management, such as storing customer information. 2) Large enterprises can use MySQL to process massive data and complex business logic to optimize query performance and transaction processing.

What are phantom reads and how does InnoDB prevent them (Next-Key Locking)?Apr 13, 2025 am 12:16 AM

InnoDB effectively prevents phantom reading through Next-KeyLocking mechanism. 1) Next-KeyLocking combines row lock and gap lock to lock records and their gaps to prevent new records from being inserted. 2) In practical applications, by optimizing query and adjusting isolation levels, lock competition can be reduced and concurrency performance can be improved.

MySQL: Not a Programming Language, But...Apr 13, 2025 am 12:03 AM

MySQL is not a programming language, but its query language SQL has the characteristics of a programming language: 1. SQL supports conditional judgment, loops and variable operations; 2. Through stored procedures, triggers and functions, users can perform complex logical operations in the database.

MySQL: An Introduction to the World's Most Popular DatabaseApr 12, 2025 am 12:18 AM

MySQL is an open source relational database management system, mainly used to store and retrieve data quickly and reliably. Its working principle includes client requests, query resolution, execution of queries and return results. Examples of usage include creating tables, inserting and querying data, and advanced features such as JOIN operations. Common errors involve SQL syntax, data types, and permissions, and optimization suggestions include the use of indexes, optimized queries, and partitioning of tables.

The Importance of MySQL: Data Storage and ManagementApr 12, 2025 am 12:18 AM

MySQL is an open source relational database management system suitable for data storage, management, query and security. 1. It supports a variety of operating systems and is widely used in Web applications and other fields. 2. Through the client-server architecture and different storage engines, MySQL processes data efficiently. 3. Basic usage includes creating databases and tables, inserting, querying and updating data. 4. Advanced usage involves complex queries and stored procedures. 5. Common errors can be debugged through the EXPLAIN statement. 6. Performance optimization includes the rational use of indexes and optimized query statements.

Why Use MySQL? Benefits and AdvantagesApr 12, 2025 am 12:17 AM

MySQL is chosen for its performance, reliability, ease of use, and community support. 1.MySQL provides efficient data storage and retrieval functions, supporting multiple data types and advanced query operations. 2. Adopt client-server architecture and multiple storage engines to support transaction and query optimization. 3. Easy to use, supports a variety of operating systems and programming languages. 4. Have strong community support and provide rich resources and solutions.

Describe InnoDB locking mechanisms (shared locks, exclusive locks, intention locks, record locks, gap locks, next-key locks).Apr 12, 2025 am 12:16 AM

InnoDB's lock mechanisms include shared locks, exclusive locks, intention locks, record locks, gap locks and next key locks. 1. Shared lock allows transactions to read data without preventing other transactions from reading. 2. Exclusive lock prevents other transactions from reading and modifying data. 3. Intention lock optimizes lock efficiency. 4. Record lock lock index record. 5. Gap lock locks index recording gap. 6. The next key lock is a combination of record lock and gap lock to ensure data consistency.

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

R.E.P.O. Best Graphic Settings

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Assassin's Creed Shadows: Seashell Riddle Solution

2 weeks agoByDDD

R.E.P.O. How to Fix Audio if You Can't Hear Anyone

3 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

WWE 2K25: How To Unlock Everything In MyRise

4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Hot Tools

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

PhpStorm Mac version

The latest (2018.2.1) professional PHP integrated development tool

SAP NetWeaver Server Adapter for Eclipse

Integrate Eclipse with SAP NetWeaver application server.

DVWA

Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Where is the login entrance for gmail email?

7482

CakePHP Tutorial

1377

What is the format of the account name of steam

win11 activation key permanent

nyt connections hints and answers