MySQL数据库InnoDB存储引擎多版本控制(MVCC)实现原理分析

Home

Database

Mysql Tutorial

MySQL数据库InnoDB存储引擎多版本控制(MVCC)实现原理分析_MySQL

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jun 01, 2016 pm 01:13 PM

technologydatabaseResearch InstituteNetEase

文/何登成

导读：

来自网易研究院的MySQL内核技术研究人何登成，把MySQL数据库InnoDB存储引擎的多版本控制（简称：MVCC）实现原理，做了深入的研究与详细的文字图表分析，方便大家理解InnoDB存储引擎实现的多版本控制技术（简称：MVCC）。

基本知识

假设对于多版本控制(MVCC)的基础知识，有所了解。MySQL数据库InnoDB存储引擎为了实现多版本的一致性读，采用的是基于回滚段的协议。

行结构

MySQL数据库InnoDB存储引擎表数据的组织方式为主键聚簇索引。由于采用索引组织表结构，记录的ROWID是可变的(索引页分裂的时候，Structure Modification Operation，SMO)，因此二级索引中采用的是(索引键值, 主键键值)的组合来唯一确定一条记录。

无论是聚簇索引，还是二级索引，其每条记录都包含了一个DELETED BIT位，用于标识该记录是否是删除记录。除此之外，聚簇索引记录还有两个系统列：DATA_TRX_ID，DATA_ROLL_PTR。DATA _TRX_ID表示产生当前记录项的事务ID；DATA _ROLL_PTR指向当前记录项的undo信息。

聚簇索引行结构(与多版本一致读有关的部分，DELETED BIT省略)：

innodb-mvcc-1

二级索引行结构：

innodb-mvcc-2

从聚簇索引行结构，与二级索引行结构可以看出，聚簇索引中包含版本信息(事务号+回滚指针)，二级索引不包含版本信息，二级索引项的可见性如何判断？下面将会给出。

Read View

InnoDB存储引擎默认的隔离级别为Repeatable Read (RR)，可重复读。InnoDB存储引擎在开始一个RR读之前，会创建一个Read View。Read View用于判断一条记录的可见性。Read View定义在read0read.h文件中，其中最主要的与可见性相关的属性如下：

123456789101112131415161718192021

dulintlow_limit_id;/* 事务号 >= low_limit_id的记录，对于当前Read View都是不可见的 */

dulintup_limit_id;/* 事务号

ulintn_trx_ids;/* Number of cells in the trx_ids array */

dulint*trx_ids;/* Additional trx ids which the read should

not see: typically, these are the active

transactions at the time when the read is

serialized, except the reading transaction

itself; the trx ids in this array are in a

descending order */

dulintcreator_trx_id;/* trx id of creating transaction, or

(0, 0) used in purge */

简单来说，Read View记录读开始时，所有的活动事务，这些事务所做的修改对于Read View是不可见的。除此之外，所有其他的小于创建Read View的事务号的所有记录均可见。可见包括两层含义：

记录可见，且Deleted bit = 0；当前记录是可见的有效记录。
记录可见，且Deleted bit = 1；当前记录是可见的删除记录。此记录在本事务开始之前，已经删除。

测试方法：

12345678910111213141516

-create table and index

create table test (id int primary key, comment char(50)) engine=InnoDB;

create index test_idx on test(comment);

-Insert

insert into test values(1, ‘aaa’);

insert into test values(2, ‘bbb’);

-update primary key

update test set id = 9 where id = 1;

-update non-primary key with different value

update test set comment = ‘ccc’ where id = 9;

-update non-primary key with same value

update test set comment = ‘bbb’ where id = 2 and comment = ‘bbb’;

-read隔离级别

repeatable read（RR）

测试结果

update primary key

代码调用流程：

1	`ha_innobase::update_row -> row_update_for_mysql -> row_upd_step -> row_upd -> row_upd_clust_step -> row_upd_clust_rec_by_insert -> btr_cur_del_mark_set_clust_rec -> row_ins_index_entry`

简单来说，就是将cluster index的旧记录标记位删除；插入一条新纪录。该语句执行完之后，数据结构如下：

innodb-mvcc-3-300x248

老版本仍旧存储在聚簇索引之中，其DATA_TRX_ID被设置为1811，Deleted bit设置为1，undo中记录了前镜像的事务id = 1809。新版本DATA_TRX_ID也为1811。通过此图，还可以发现，虽然新老版本是一条记录，但是在聚簇索引中是通过两条记录来标识的。同时，由于更新了主键，二级索引也需要做相应的更新(二级索引中包含主键项)。

update non-primary key(diff value)

更新comment字段，代码调用流程与上面有部分不同，可以自行跟踪，此处省略。更新操作执行完之后，索引结构变更如下：

innodb-mvcc-4-300x213

从上图可见，更新二级索引的键值时，聚簇索引本身并不会产生新的记录项，而是将旧版本信息记录在undo之中。与此同时，二级索引将会产生新的索引项，其PK值保持不变，指向聚簇索引的同一条记录。细心的读者可能会发现，二级索引页面中有一个MAX_TRX_ID，此值记录的是更新二级索引页面的最大事务ID。通过MAX_TRX_ID的过滤，INNODB能够实现大部分的辅助索引覆盖性扫描(仅仅扫描辅助索引，不需要回聚簇索引)。具体过滤方法，将在后面的内容中给出。

update non-primary key(same value)

最后一个测试用例，是更新comment项为同样的值。在我的测试中，更新之后的索引结构如下：

innodb-mvcc-5-300x220

聚簇索引仍旧会更新，但是二级索引保持不变。

总结

无论是聚簇索引，还是二级索引，只要其键值更新，就会产生新版本。将老版本数据deleted bti设置为1；同时插入新版本。
对于聚簇索引，如果更新操作没有更新primary key，那么更新不会产生新版本，而是在原有版本上进行更新，老版本进入undo表空间，通过记录上的undo指针进行回滚。
对于二级索引，如果更新操作没有更新其键值，那么二级索引记录保持不变。
对于二级索引，更新操作无论更新primary key，或者是二级索引键值，都会导致二级索引产生新版本数据。
聚簇索引设置记录deleted bit时，会同时更新DATA_TRX_ID列。老版本DATA_TRX_ID进入undo表空间；二级索引设置deleted bit时，不写入undo。

可见性判断

主键查找

select * from test where id = 1;

针对测试1，如果1811(DATA_TRX_ID) 无记录返回。
针对测试1，如果 1811(DATA_TRX_ID) >= read_view.low_limit_id，证明被标记为删除的记录1不可见，通过DATA_ROLL_PTR回滚记录，得到DATA_TRX_ID = 1809。如果1809可见，则返回记录(1，aaa)；否则无记录返回。
针对测试1，如果up_limit_id，low_limit_id都无法判断可见性，那么遍历read_view中的trx_ids，依次对比事务id，如果在DATA_TRX_ID在trx_ids数组中，则不可见(更新未提交)。

select * from test where id = 9;

针对测试2，如果1816可见，返回(9,ccc)。
针对测试2，如果1816不可见，通过DATA_ROLL_PTR回滚到1811，如果1811可见，返回(9, aaa)。
针对测试2，如果1811不可见，无结果返回。

select * from test where id > 0;

针对测试1，索引中，满足条件的同一记录，有两个版本(版本1，delete bit =1)。那么是否会一条记录返回两次呢？必定不会，这是因为pk = 1的可见性与pk = 9的可见性是一致的，同时pk = 1是标记了deleted bit的版本。如果事务ID = 1811可见。那么pk = 1 delete可见，无记录返回，pk = 9返回记录；如果1811不可见，回滚到1809可见，那么pk = 1返回记录，pk = 9回滚后无记录。

总结：

通过主键查找记录，需要配合read_view，记录DATA_TRX_ID，记录DATA_ROLL_PTR指针共同判断。
read_view用于判断当前记录是否可见(判断DATA_TRX_ID)。DATA_ROLL_PTR用于将当前记录回滚到前一版本。

非主键查找

select comment from test where comment > ‘ ‘;

针对测试2，二级索引，当前页面的最大更新事务MAX_TRX_ID = 1816。如果MAX_TRX_ID lock_sec_rec_cons_read_sees)
针对测试2，二级索引，如果当前页面不能满足MAX_TRX_ID ?

1234567 if (clust_rec
&& (old_vers || rec_get_deleted_flag(
rec,dict_table_is_comp(sec_index->table)))&& !row_sel_sec_rec_is_for_clust_rec(rec, sec_index, clust_rec, clust_index))

满足if判断的所有聚簇索引记录，都直接丢弃，以上判断的逻辑如下：

需要回聚簇索引扫描，并且获得记录
聚簇索引记录为回滚版本，或者二级索引中的记录为删除版本
聚簇索引项，与二级索引项，其键值并不相等

为什么满足if判断，就可以直接丢弃数据？用白话来说，就是我们通过二级索引记录，定位聚簇索引记录，定位之后，还需要再次检查聚簇索引记录是否仍旧是我在二级索引中看到的记录。如果不是，则直接丢弃；如果是，则返回。

根据此条件，结合查询与测试2中的索引结构。可见版本为事务1811.二级索引中的两项pk = 9都能通过聚簇索引回滚到1811版本。但是，二级索引记录(ccc,9)与聚簇索引回滚后的版本(aaa,9)不一致，直接丢弃。只有二级索引记录 (aaa,9)保持一致，直接返回。

总结：

二级索引的多版本可见性判断，需要通过聚簇索引完成。
二级索引页面中保存了MAX_TRX_ID，可以快速判断当前页面中，是否所有项均可见，可以实现二级索引页面级别的索引覆盖扫描。一般而言，此判断是满足条件的，保证了索引覆盖扫描 (index only scan)的高效性。
二级索引中的项，需要与聚簇索引中的可见性进行比较，保证聚簇索引中的可见项，与二级索引中的项数据一致。

疑问

在http://blogs.InnoDB.com/wp/2011/04/mysql-5-6-multi-threaded-purge/中，作者提到，InnoDB存储引擎的purge操作，是通过遍历undo来实现对于标记位deleted项的回收的。如果二级索引本身标记deleted位不记录 undo，那么这个回收操作如何完成？还是说purge是通过解析redo来完成回收的？（根据下面对于purge的流程分析，此问题已解决）

Purge流程

Purge功能：

InnoDB由于要支持多版本协议，因此无论是更新，删除，都只是设置记录上的deleted bit标记位，而不是真正的删除记录。后续这些记录的真正删除，是通过Purge后台进程实现的。Purge进程定期扫描InnoDB的undo，按照先读老undo，再读新undo的顺序，读取每条undo record。对于每一条undo record，判断其对应的记录是否可以被purge(purge进程有自己的read view，等同于进程开始时最老的活动事务之前的view，保证purge的数据，一定是不可见数据，对任何人来说)，如果可以purge，则构造完整记录(row_purge_parse_undo_rec)。然后按照先purge二级索引，最后purge聚簇索引的顺序，purge一个操作生成的旧版本完整记录。

一个完整的purge函数调用流程如下：

123	`row_purge_step->row_purge->trx_purge_fetch_next_rec->row_purge_parse_undo_rec->row_purge_del_mark->row_purge_remove_sec_if_poss->row_purge_remove_clust_if_poss`

总结：

purge是通过遍历undo实现的。
purge的粒度是一条记录上的一个操作。如果一条记录被update了3次，产生3个old版本，均可purge。那么purge读取undo，对于每一个操作，都会调用一次purge。一个purge删除一个操作产生的old版本(按照操作从老到新的顺序)。
purge按照先二级索引，最后聚簇索引的顺序进行。
purge二级索引，通过构造出的索引项进行查找定位。不能直接针对某个二级页面进行，因为不知道记录的存放page。
对于二级索引设置deleted bit为不需要记录undo，因为purge是根据聚簇索引undo实现。因此二级索引deleted bit被设置为1的项，没有记录undo，仍旧可以被purge。
purge是一个耗时的操作。二级索引的purge，需要search_path定位数据，相当于每个二级索引，都做了一次index unique scan。
一次delete操作，IO翻番。第一次IO是将记录的deleted bit设置为1；第二次的IO是将记录删除。

文章具体来源不详，如有知情者，请在评论中回复。

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Explain the InnoDB Buffer Pool and its importance for performance.Apr 19, 2025 am 12:24 AM

InnoDBBufferPool reduces disk I/O by caching data and indexing pages, improving database performance. Its working principle includes: 1. Data reading: Read data from BufferPool; 2. Data writing: After modifying the data, write to BufferPool and refresh it to disk regularly; 3. Cache management: Use the LRU algorithm to manage cache pages; 4. Reading mechanism: Load adjacent data pages in advance. By sizing the BufferPool and using multiple instances, database performance can be optimized.

MySQL vs. Other Programming Languages: A ComparisonApr 19, 2025 am 12:22 AM

Compared with other programming languages, MySQL is mainly used to store and manage data, while other languages such as Python, Java, and C are used for logical processing and application development. MySQL is known for its high performance, scalability and cross-platform support, suitable for data management needs, while other languages have advantages in their respective fields such as data analytics, enterprise applications, and system programming.

Learning MySQL: A Step-by-Step Guide for New UsersApr 19, 2025 am 12:19 AM

MySQL is worth learning because it is a powerful open source database management system suitable for data storage, management and analysis. 1) MySQL is a relational database that uses SQL to operate data and is suitable for structured data management. 2) The SQL language is the key to interacting with MySQL and supports CRUD operations. 3) The working principle of MySQL includes client/server architecture, storage engine and query optimizer. 4) Basic usage includes creating databases and tables, and advanced usage involves joining tables using JOIN. 5) Common errors include syntax errors and permission issues, and debugging skills include checking syntax and using EXPLAIN commands. 6) Performance optimization involves the use of indexes, optimization of SQL statements and regular maintenance of databases.

MySQL: Essential Skills for Beginners to MasterApr 18, 2025 am 12:24 AM

MySQL is suitable for beginners to learn database skills. 1. Install MySQL server and client tools. 2. Understand basic SQL queries, such as SELECT. 3. Master data operations: create tables, insert, update, and delete data. 4. Learn advanced skills: subquery and window functions. 5. Debugging and optimization: Check syntax, use indexes, avoid SELECT*, and use LIMIT.

MySQL: Structured Data and Relational DatabasesApr 18, 2025 am 12:22 AM

MySQL efficiently manages structured data through table structure and SQL query, and implements inter-table relationships through foreign keys. 1. Define the data format and type when creating a table. 2. Use foreign keys to establish relationships between tables. 3. Improve performance through indexing and query optimization. 4. Regularly backup and monitor databases to ensure data security and performance optimization.

MySQL: Key Features and Capabilities ExplainedApr 18, 2025 am 12:17 AM

MySQL is an open source relational database management system that is widely used in Web development. Its key features include: 1. Supports multiple storage engines, such as InnoDB and MyISAM, suitable for different scenarios; 2. Provides master-slave replication functions to facilitate load balancing and data backup; 3. Improve query efficiency through query optimization and index use.

The Purpose of SQL: Interacting with MySQL DatabasesApr 18, 2025 am 12:12 AM

SQL is used to interact with MySQL database to realize data addition, deletion, modification, inspection and database design. 1) SQL performs data operations through SELECT, INSERT, UPDATE, DELETE statements; 2) Use CREATE, ALTER, DROP statements for database design and management; 3) Complex queries and data analysis are implemented through SQL to improve business decision-making efficiency.

MySQL for Beginners: Getting Started with Database ManagementApr 18, 2025 am 12:10 AM

The basic operations of MySQL include creating databases, tables, and using SQL to perform CRUD operations on data. 1. Create a database: CREATEDATABASEmy_first_db; 2. Create a table: CREATETABLEbooks(idINTAUTO_INCREMENTPRIMARYKEY, titleVARCHAR(100)NOTNULL, authorVARCHAR(100)NOTNULL, published_yearINT); 3. Insert data: INSERTINTObooks(title, author, published_year)VA

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Assassin's Creed Shadows: Seashell Riddle Solution

3 weeks agoByDDD

What's New in Windows 11 KB5054979 & How to Fix Update Issues

2 weeks agoByDDD

Where to find the Crane Control Keycard in Atomfall

3 weeks agoByDDD

Assassin's Creed Shadows - How To Find The Blacksmith And Unlock Weapon And Armour Customisation

1 months agoByDDD

Roblox: Dead Rails - How To Complete Every Challenge

3 weeks agoByDDD

Hot Tools

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

WebStorm Mac version

Useful JavaScript development tools

Atom editor mac version download

The most popular open source editor

EditPlus Chinese cracked version

Small size, syntax highlighting, does not support code prompt function

DVWA

Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software

Hot Topics

Where is the login entrance for gmail email?

7631

CakePHP Tutorial

1389

What is the format of the account name of steam

win11 activation key permanent

nyt connections hints and answers

141