What are the disadvantages of a MySQL clustered index?

一个新手

Sep 19, 2017 am 09:35 AM

A clustered index is not a separate index type but a way of storing data (a storage structure rather than a data structure). The exact details depend on the implementation, but InnoDB's clustered index actually stores the B-tree index and the data rows in the same structure.

When a table has a clustered index, its data rows are actually stored in the leaf pages of the index. "Clustered" means that data rows and adjacent key values are stored compactly together. Because the rows cannot be stored in two different places at once, a table can have only one clustered index. Because the storage engine is responsible for implementing indexes, not all storage engines support clustered indexes. The following mainly covers InnoDB, but the principles apply to any engine that supports clustered indexes:

The leaf pages contain all of the row data, but the node pages contain only the indexed columns (put another way, the non-leaf node pages hold only key values, which are extracted from the indexed columns).

InnoDB clusters data by primary key. If no primary key is defined, InnoDB chooses the first unique index whose columns are all NOT NULL instead. If there is no such index, InnoDB implicitly defines a hidden 6-byte row ID and uses it as the clustered index. InnoDB only clusters records within the same page; pages containing adjacent key values may be far apart.
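The selection rule described above can be sketched in a few lines of Python. This is only an illustrative model: the `Index` class, its fields, and `choose_clustered_key` are hypothetical names made up for this sketch, not part of MySQL or any client library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Index:
    # Hypothetical description of an index, for illustration only.
    name: str
    unique: bool = False
    nullable: bool = False   # True if any indexed column allows NULL
    primary: bool = False


def choose_clustered_key(indexes: List[Index]) -> str:
    """Return the name of the index the table would be clustered on."""
    # 1. An explicit primary key always wins.
    for idx in indexes:
        if idx.primary:
            return idx.name
    # 2. Otherwise the first unique index with no nullable columns.
    for idx in indexes:
        if idx.unique and not idx.nullable:
            return idx.name
    # 3. Otherwise fall back to the hidden 6-byte row ID.
    return "<hidden 6-byte row ID>"


no_pk = [
    Index("uk_nickname", unique=True, nullable=True),
    Index("uk_email", unique=True, nullable=False),
    Index("idx_created"),
]
chosen = choose_clustered_key(no_pk)  # "uk_email"
```

Note that the nullable unique index is skipped even though it is declared first, which is why declaring an explicit primary key is usually the clearer choice.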

Note: Clustering on the primary key can help performance, but it can also cause serious performance problems, so think carefully about clustering, especially when converting a table's storage engine from InnoDB to another engine.

Clustering data has some important advantages:

A: Related data can be kept together. For example, when implementing a mailbox, you can cluster by user ID, so that all of a user's emails can be retrieved by reading only a small number of data pages from disk. Without clustering, each email may require its own disk I/O.

B: Data access is faster. A clustered index holds both the index and the data in the same B-tree, so retrieving rows from a clustered index is usually faster than a comparable lookup in a nonclustered index.

C: Queries that use covering index scans can use the primary key values stored in the leaf nodes directly.
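The mailbox example in point A can be made concrete with a toy model. The sketch below assumes fixed-size pages holding 25 rows each and 100 users who each receive 50 emails in round-robin order; these numbers and the `pages_touched` helper are invented for illustration and say nothing about real InnoDB page sizes.

```python
def pages_touched(row_order, user_id, rows_per_page):
    """Count the distinct pages that hold rows belonging to user_id."""
    return len({
        position // rows_per_page
        for position, (uid, _) in enumerate(row_order)
        if uid == user_id
    })


# Emails arrive interleaved: one email for every user, then the next round.
arrival_order = [(uid, n) for n in range(50) for uid in range(100)]

# Clustering by (user_id, email_no) stores each user's emails side by side.
clustered_order = sorted(arrival_order)

unclustered_pages = pages_touched(arrival_order, user_id=7, rows_per_page=25)
clustered_pages = pages_touched(clustered_order, user_id=7, rows_per_page=25)
# unclustered_pages == 50, clustered_pages == 2
```

In this model, reading one user's mailbox touches 50 pages when rows are stored in arrival order and only 2 pages when they are clustered by user ID.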

Disadvantages of clustered indexes:

A: Clustering gives the largest improvement for I/O-bound workloads. If the data fits entirely in memory, the order in which it is accessed hardly matters, so clustering offers little benefit.

B: Insert speed depends heavily on insertion order. Inserting rows in primary key order is the fastest way to load data into an InnoDB table. If the data was not loaded in primary key order, it is best to reorganize the table with OPTIMIZE TABLE after the load completes.

C: Updating the clustered index columns is expensive, because it forces InnoDB to move each updated row to a new location.

D: When a new row is inserted into a clustered table, or when a primary key update forces a row to move, a page split may occur. A page split happens when a row's primary key value requires the row to be placed in a page that is already full: the storage engine splits that page into two to make room for the row. Page splits can cause the table to occupy more disk space.

E: Clustered indexes can make full table scans slower, especially when rows are sparsely packed or stored nonsequentially because of page splits.

F: Secondary indexes can be larger than expected, because their leaf nodes contain the primary key columns of the rows they reference.

G: Secondary index access requires two index lookups instead of one.

Two lookups are needed because the leaf nodes of a secondary index store the row's primary key value rather than a pointer to the row's physical location. To find a row through a secondary index, the storage engine first locates the secondary index leaf node to obtain the primary key value, then uses that value to find the row in the clustered index. That is double work: two B-tree lookups instead of one. In InnoDB, the adaptive hash index can help reduce this penalty.
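The two-step lookup can be illustrated with a minimal sketch. It uses plain Python dictionaries as stand-ins for the two B-trees, so it shows only the order of the lookups, not their real cost; the table contents and the `find_by_email` helper are hypothetical.

```python
# Clustered index: primary key -> full row (stand-in for the clustered B-tree).
clustered = {
    1: {"id": 1, "email": "ann@example.com", "name": "Ann"},
    2: {"id": 2, "email": "bob@example.com", "name": "Bob"},
}

# Secondary index on email: indexed value -> primary key, not a row pointer.
email_index = {row["email"]: pk for pk, row in clustered.items()}


def find_by_email(email):
    """Return (row, lookups): the matching row and how many lookups ran."""
    lookups = 1
    pk = email_index.get(email)      # lookup 1: secondary index
    if pk is None:
        return None, lookups
    lookups += 1
    return clustered[pk], lookups    # lookup 2: clustered index


row, lookups = find_by_email("bob@example.com")
# row["name"] == "Bob", lookups == 2
```

A query that needs only the indexed column and the primary key could stop after the first lookup, which is the covering-index case mentioned among the advantages.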

Comparison of the physical data layout in InnoDB and MyISAM:

MyISAM:

MyISAM stores rows on disk in the order in which they were inserted. In MyISAM there is no structural difference between the primary key index and a secondary index: the primary key is simply a unique, non-nullable index named PRIMARY.

InnoDB:

Because InnoDB supports clustered indexes, it stores the same data very differently. InnoDB's clustered index contains the data of the entire table, not just the index: in InnoDB the clustered index is the table, so it does not need separate row storage the way MyISAM does. Each leaf node of the clustered index contains the primary key value, the transaction ID and rollback pointer (used for transactions and MVCC), and the values of all remaining columns. If the primary key is defined on a column prefix, InnoDB still stores the full column value along with the remaining columns.

Another difference from MyISAM is that InnoDB's secondary indexes are very different from its clustered index. The leaf nodes of an InnoDB secondary index store the primary key value instead of a row pointer, and use that value as the "pointer" to the row. This strategy reduces the work needed to maintain secondary indexes when rows move or data pages split. Using the primary key value as the pointer makes secondary indexes larger, but in return InnoDB does not need to update the pointer in the secondary index when it moves a row.
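To get a feel for how primary key width affects secondary index size, here is a back-of-the-envelope sketch. It counts only the indexed value plus the primary key in each leaf entry and ignores record headers, page overhead, and fill factor, so the results are rough lower bounds rather than real on-disk sizes. The row count and column widths are assumptions chosen for illustration.

```python
def secondary_index_payload_bytes(rows, indexed_col_bytes, pk_bytes):
    """Rough leaf payload: each entry holds the indexed value plus the primary key."""
    return rows * (indexed_col_bytes + pk_bytes)


ROWS = 10_000_000  # assumed table size

# Secondary index on a 4-byte integer column, with two primary key choices.
with_int_pk = secondary_index_payload_bytes(ROWS, 4, 8)      # 8-byte integer key
with_string_pk = secondary_index_payload_bytes(ROWS, 4, 36)  # 36-byte string key
# with_int_pk == 120_000_000, with_string_pk == 400_000_000
```

Under these assumptions the same secondary index carries more than three times as much payload when the primary key is a 36-byte string, and the cost repeats for every secondary index on the table.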

Inserting rows in primary key order in InnoDB tables: if you are using InnoDB and do not need any particular clustering, you can define a surrogate key as the primary key, that is, a key whose values have nothing to do with the application's data. The simplest approach is an AUTO_INCREMENT column, which ensures that rows are inserted in sequential order and gives better performance for joins on the primary key.

Do not use a UUID as the clustered key. It performs very badly, because it makes inserts into the clustered index effectively random and leaves the data with no useful clustering. Inserting rows with a UUID primary key not only takes longer, it also produces a larger index. That is partly because the primary key column is longer, and partly, no doubt, because of page splits and the resulting fragmentation.

With a sequential primary key, by contrast, InnoDB stores each record immediately after the previous one. When a page reaches its maximum fill factor (InnoDB's default maximum fill factor is 15/16 of the page size, leaving some free space for later modifications), the next record goes into a new page. Once data has been loaded in this sequential way, the primary key pages are packed nearly full with in-order records, which is the desired result (secondary index pages, however, may look different).

With a UUID primary key, the primary key value of a new row is not necessarily greater than that of the previous row, so InnoDB cannot simply append the new row at the end of the index. It has to find the appropriate place for the row, usually somewhere in the middle of the existing data, and make room for it. This adds a lot of extra work and results in a suboptimal data layout. Using a UUID as the primary key has the following drawbacks:

A: The target page may already have been flushed to disk and evicted from the cache, or may never have been loaded into the cache. InnoDB then has to find the page and read it from disk into memory before it can insert the row, which causes a lot of random I/O.

B: Because writes arrive out of order, InnoDB has to split pages frequently to make room for new rows. A page split moves a lot of data around, and a single insert ends up modifying at least three pages instead of one.

C: Because of the frequent page splits, pages become sparse and irregularly filled, so the resulting data is fragmented.

After loading such random values into a clustered index, you may need to run OPTIMIZE TABLE to rebuild the table and optimize page fill. With InnoDB, you should insert data in primary key order whenever possible, and prefer a clustering key whose value increases monotonically for each new row.
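The difference between sequential and random insertion can be reproduced with a small simulation. This is a deliberately simplified model, not InnoDB's actual algorithm: pages are sorted Python lists with an assumed capacity of 16 keys, a full page is split in half, and an insert past the end of the last page simply opens a new page. It ignores the 15/16 fill factor and the tree levels above the leaves. The `load` function is a hypothetical name for this sketch.

```python
import bisect
import random


def load(keys, capacity=16):
    """Insert keys into sorted pages; return (split count, average page fill)."""
    pages = [[]]
    splits = 0
    for key in keys:
        # Locate the page whose key range should hold this key.
        first_keys = [page[0] for page in pages[1:]]
        i = bisect.bisect_right(first_keys, key)
        page = pages[i]
        if len(page) < capacity:
            bisect.insort(page, key)
        elif i == len(pages) - 1 and key > page[-1]:
            # Appending past the end: open a fresh page, no rows move.
            pages.append([key])
        else:
            # Full page in the middle of the index: split it, then insert.
            mid = capacity // 2
            left, right = page[:mid], page[mid:]
            pages[i:i + 1] = [left, right]
            splits += 1
            bisect.insort(left if key < right[0] else right, key)
    fill = sum(len(page) for page in pages) / (len(pages) * capacity)
    return splits, fill


sequential_keys = list(range(1600))
random_keys = sequential_keys[:]
random.Random(0).shuffle(random_keys)   # stands in for UUID-like ordering

seq_splits, seq_fill = load(sequential_keys)
rnd_splits, rnd_fill = load(random_keys)
```

In this model the sequential load causes no splits and leaves every page full, while the shuffled load splits pages repeatedly and ends with partly empty pages, which is the fragmentation that rebuilding the table is meant to repair.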

Note: When does a sequential primary key cause worse results?

For high-concurrency workloads, inserting in primary key order in InnoDB can cause noticeable contention. The upper end of the primary key becomes a hot spot: because all inserts happen there, concurrent inserts may contend for gap locks. Another hot spot may be the AUTO_INCREMENT locking mechanism. If you run into this problem, you may need to redesign the table or the application, or change the innodb_autoinc_lock_mode setting.
