Home  >  Article  >  Database  >  An article explaining the principles of transactions and MVCC in MySQL in detail

An article explaining the principles of transactions and MVCC in MySQL in detail

青灯夜游
青灯夜游forward
2022-03-09 11:05:012449browse

This article will take you to understand the transactions in MySQL and introduce the principle of MVCC. I hope it can help you!

An article explaining the principles of transactions and MVCC in MySQL in detail

#01 What is a transaction?

Database transaction refers to a set of data operations. The operations within the transaction are either all successful or all failed. Nothing is done. In fact, it is not that nothing is done. It is possible that some of it is done, but as long as If one step fails, all operations must be rolled back, which is a bit of a non-stop operation.

In MySQL, transaction support is implemented at the engine layer. MySQL is a system that supports multiple engines, but not all engines support transactions. For example, MySQL's native MyISAM engine does not support transactions, which is one of the important reasons why MyISAM was replaced by InnoDB.

1.1 Four major characteristics

  • Atomicity: After the transaction starts, all operations are either completed or completed If you don’t do it, you won’t be stuck in the middle. If an error occurs during transaction execution, it will be rolled back to the state before the transaction started, and all operations will be as if they did not happen. That is to say, affairs are an indivisible whole, just like atoms learned in chemistry, which are the basic units of matter.
  • Consistency: Before and after the transaction starts and ends, the integrity constraints of the database are not violated. For example, when A transfers money to B, it is impossible for A to deduct the money but B not to receive it.
  • Isolation: Only one transaction is allowed to request the same data at the same time, and there is no interference between different transactions. For example, A is withdrawing money from a bank card. B cannot transfer money to this card before A's withdrawal process is completed.
  • Durability (Durability): After the transaction is completed, all updates to the database by the transaction will be saved to the database and cannot be rolled back.

1.2 Isolation level

Among the four major characteristics of SQL transactions, atomicity, consistency, and durability are all relatively easy to understand. But the isolation level of transactions is indeed difficult. Today we will mainly talk about the isolation of MySQL transactions.

SQL standard transaction isolation from low to high levels are: read uncommitted (read uncommitted), read committed (read committed), repeatable read (repeatable read) and serializable (serializable) ). The higher the level, the lower the efficiency.

  • Read uncommitted: When a transaction has not been committed, the changes it makes can be seen by other transactions.
  • Read commit: After a transaction is committed, the changes it makes will be seen by other transactions.
  • Repeatable read: The data seen during the execution of a transaction is always consistent with the data seen when the transaction is started. Of course, under the repeatable read isolation level, uncommitted changes are also invisible to other transactions.
  • Serialization: As the name implies, for the same row of records, "write" will add a "write lock", and "read" will add a "read lock". When a read-write lock conflict occurs, the transaction accessed later must wait for the completion of the previous transaction before it can continue to execute. So all data under this isolation level is the most stable, but the performance is also the worst.

1.3 Solved concurrency issues

SQL transaction isolation level is designed to solve concurrency issues to the greatest extent:

  • Dirty read: Transaction A reads the data updated by transaction B, and then B rolls back the operation, then the data read by A is dirty data
  • Non-repeatable read: Transaction A has more When transaction A reads the same data multiple times, transaction B updates and submits the data, resulting in inconsistent results when transaction A reads the same data multiple times.
  • Phantom reading: System administrator A changed the grades of all students in the database from specific scores to ABCDE grades, but system administrator B inserted a record of specific scores at this time. When system administrator A After the modification is completed, I find that there is still one record that has not been modified, as if I have hallucinated. This is called phantom reading.

SQL Different transaction isolation levels can solve different concurrency problems, as shown in the following table: Only the serialized isolation level solves all three problems, and the other 3 Each isolation level has flaws.

##Serializationimpossibleimpossibleimpossible

PS: It is easy to confuse non-repeatable reading and phantom reading. Non-repeatable reading focuses on modification, and phantom reading focuses on adding or deleting. To solve the problem of non-repeatable reading, you only need to lock the rows that meet the conditions. To solve the problem of phantom reading, you need to lock the table

1.4 Let’s give an example

This It may be a bit difficult to understand, so give me an example. Still the previous table structure and table data

CREATE TABLE `student`  (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `age` int(11) NULL DEFAULT NULL,
  PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 66 CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Compact;

An article explaining the principles of transactions and MVCC in MySQL in detail

Suppose now, I want to start two foods at the same time, a transaction A queries students with id = 2 age, a transaction B updates the age of the student with id = 2. The process is as follows, What are the values ​​of X1, X2, and X3 under the four isolation levels?

An article explaining the principles of transactions and MVCC in MySQL in detail

  • Read uncommitted: The value of X1 is 23, because although transaction B has not been committed, its changes have been See. (If B later rolls back the value of X1, it will be dirty). The values ​​of X2 and X3 are also 23, which is understandable.
  • Read submitted: The value of X1 is 22, because although B has changed, A cannot see it. (If B is rolled back later, the value of X1 remains unchanged, which solves the problem of dirty reading). The values ​​of X2 and
  • Repeatable reading: X1 and (No matter how B is modified during this period, as long as A has not submitted, it will be invisible, which solves the problem of non-repeatable reading), and the value of X3 is 23, because A has submitted and B can be seen The value has been modified.
  • Serialization: B will be locked while performing changes until A commits. B can continue execution. (While A is reading, B cannot write. It must be ensured that the data is the latest at this time. Solving phantom reading) So X1 and X2 are both 22, and the final X3 is executed after B submits, and its value That’s 23.

Then why does such a result occur? How is the transaction isolation level implemented?

How is the transaction isolation level implemented? I found the answer in Teacher Ding Qi’s class at Geek Time:

In fact, a view will be created in the database, and the logical result of the view shall prevail when accessing. Under the "Repeatable Read" isolation level, this view is created when the transaction starts and is used throughout the transaction. Under the "read-committed" isolation level, this view is created at the beginning of each SQL statement. What needs to be noted here is that under the "read uncommitted" isolation level, the latest value on the record is directly returned, without the concept of a view; while under the "serialization" isolation level, locking is directly used to avoid parallel access.

1.5 Set transaction isolation level

The default transaction isolation level of different databases is also very different. The default isolation level of Oracle database is Read commit, while MySQL is repeatable read. Therefore, When your system needs to migrate the database from Oracle to MySQL, please set the level to be consistent with the one before the migration (read commit) to avoid unpredictable problems.

1.5.1 Check the transaction isolation level

# 查看事务隔离级别
5.7.20 之前
SELECT @@transaction_isolation
show variables like 'transaction_isolation';

# 5.7.20 以及之后
SELECT @@tx_isolation
show variables like 'tx_isolation'

+---------------+-----------------+
| Variable_name | Value           |
+---------------+-----------------+
| tx_isolation  | REPEATABLE-READ |
+---------------+-----------------+

1.5.2 Set the isolation level

The format of the statement to modify the isolation level is: set [scope] transaction isolation level [transaction isolation level]

The scope is optional: SESSION (session), GLOBAL (global); isolation level It is the 4 mentioned above, not case sensitive.

For example: Set the global isolation level to read-commit

set global transaction isolation level read committed;

1.6 Transaction startup

MySQL transaction startup is as follows Several ways:

  • 显式启动事务语句, begin 或 start transaction。配套的提交语句是 commit,或者回滚语句是 rollback。
# 更新学生名字
START TRANSACTION;
update student set name = '张三' where id = 2;
commit;
  • set autocommit = 0,这个命令会将线程的自动提交关掉。意味着如果你只执行一个 select 语句,这个事务就启动了,而且并不会自动提交。这个事务持续存在直到你主动执行 commit 或 rollback 语句,或者断开连接。
  • set autocommit = 1,表示 MySQL 自动开启和提交事务。 比如执行一个 update 语句,语句只完成后就自动提交了。不需要显示的使用 begin、commit 来开启和提交事务。所以当我们执行多个语句的时候,就需要手动的用 begin、commit 来开启和提交事务。
  • start transaction with consistent snapshot;上面提到的 begin/start transaction 命令并不是一个事务的起点,在执行到它们之后的第一个操作 InnoDB 表的语句,事务才真正启动。如果你想要马上启动一个事务,可以使用 start transaction with consistent snapshot 命令。 第一种启动方式,一致性视图是在执行第一个快照读语句时创建的; 第二种启动方式,一致性视图是在执行 start transaction with consistent snapshot 时创建的

02 事务隔离的实现

理解了隔离级别,那事务的隔离是怎么实现的呢?要想理解事务隔离,先得了解 MVCC 多版本的并发控制这个概念。而 MVCC 又依赖于 undo log 和 read view 实现。

2.1 什么是 MVCC?

百度上的解释是这样的:

MVCC,全称 Multi-Version Concurrency Control,即多版本并发控制。MVCC 是一种并发控制的方法,一般在数据库管理系统中,实现对数据库的并发访问,在编程语言中实现事务内存。

MVCC 使得数据库读不会对数据加锁,普通的 SELECT 请求不会加锁,提高了数据库的并发处理能力;数据库写才会加锁。 借助 MVCC,数据库可以实现 READ COMMITTED,REPEATABLE READ 等隔离级别,用户可以查看当前数据的前一个或者前几个历史版本,保证了 ACID 中的 I 特性(隔离性)。

MVCC 只在 REPEATABLE READ 和 READ COMMITIED 两个隔离级别下工作。其他两个隔离级别都和 MVCC 不兼容 ,因为 READ UNCOMMITIED 总是读取最新的数据行,而不是符合当前事务版本的数据行。而 SERIALIZABLE 则会对所有读取的行都加锁。

2.1.1 InnDB 中的 MVCC

InnDB 中每个事务都有一个唯一的事务 ID,记为 transaction_id。它在事务开始时向 InnDB 申请,按照时间先后严格递增。

而每行数据其实都有多个版本,这就依赖 undo log 来实现了。每次事务更新数据就会生成一个新的数据版本,并把  transaction_id 记为 row trx_id。同时旧的数据版本会保留在 undo log 中,而且新的版本会记录旧版本的回滚指针,通过它直接拿到上一个版本。

所以,InnDB 中的 MVCC 其实是通过在每行记录后面保存两个隐藏的列来实现的。一列是事务 ID:trx_id;另一列是回滚指针:roll_pt。

2.2 undo log

回滚日志保存了事务发生之前的数据的一个版本,可以用于回滚,同时可以提供多版本并发控制下的读(MVCC),也即非锁定读。

根据操作的不同,undo log 分为两种: insert undo log 和 update undo log。

2.2.1 insert undo log

insert 操作产生的 undo log,因为 insert 操作记录没有历史版本只对当前事务本身可见,对于其他事务此记录不可见,所以 insert undo log 可以在事务提交后直接删除而不需要进行 purge 操作。

purge 的主要任务是将数据库中已经 mark del 的数据删除,另外也会批量回收 undo pages

所以,插入数据时。它的初始状态是这样的:

insert undo log

2.2.2 update undo log

UPDATE 和 DELETE 操作产生的 Undo log 都属于同一类型:update_undo。(update 可以视为 insert 新数据到原位置,delete 旧数据,undo log 暂时保留旧数据)。

事务提交时放到 history list 上,没有事务要用到这些回滚日志,即系统中没有比这个回滚日志更早的版本时,purge 线程将进行最后的删除操作。

One transaction modifies the current data:

An article explaining the principles of transactions and MVCC in MySQL in detail

Another transaction modifies the data:

An article explaining the principles of transactions and MVCC in MySQL in detail

There are multiple versions of the same record in the database, which is the multi-version concurrency control MVCC mentioned above.

In addition, you can return to the previous version state by rolling back with the help of undo log. For example, to return to V1, you only need to perform two rollbacks in sequence.

2.3 read-view

#read view is a consistent read view used by InnDB when implementing MVCC to support Implementation of RC (Read Committed) and RR (Repeatable Read) isolation levels.

read view does not really exist, it is just a concept, and undo log is its embodiment. It is mainly calculated through version and undolog. The function is to determine what data the transaction can see.

Each transaction or statement has its own consistency view. Ordinary query statements are consistent reads. Consistent reads determine the visibility of the data version based on row trx_id and consistent views.

2.3.1 Visibility rules of data versions

read view mainly contains other active read and write transactions in the current system. In terms of implementation, InnDB constructs an array for each transaction to save the currently active (not yet submitted) transaction at the moment when the transaction is started.

As mentioned earlier, the transaction ID strictly increases with time. The maximum value of the submitted transaction ID in the system is recorded as the low water level of the array, and the created transaction ID 1 is recorded as the high water level.

This view array and the high water level form the consistency view of the current transaction (read view)

Draw a picture of this array, it looks like this:

An article explaining the principles of transactions and MVCC in MySQL in detail

The rules are as follows:

  • 1 If trx_id is in the gray area, it means that the trx_id of the accessed version is less than the id value of the low water level in the array , that is, the transaction that generated this version has been committed before generating the read view, so this version is visible and can be accessed by the current transaction.
  • 2 If trx_id is in the orange area, it means that the trx_id of the accessed version is greater than the id value of the high water level in the array, that is, the transaction that generated this version was generated after the read view was generated, so this version is not visible and cannot be Current transaction access.
  • 3 If it is in the green area, there will be two situations:

    • a) trx_id is in the array, proving that this version is generated by an uncommitted transaction , invisible
    • b) trx_id is not in the array, proving that this version is generated by a committed transaction, visible

The third point I am looking at I was a little confused during the tutorial, but fortunately some enthusiastic netizens answered:

falling in the green area means that the transaction ID is within the range of low water level and high water level. Whether it is really visible depends on whether there is this in the green area. value. If the green area does not have this transaction ID, it is visible, if it does, it is not visible. Being in this range does not mean that this range has this value, such as [1,2,3,5], 4 is in the range of 1-5 of this array, but not in this array.

This may be a bit difficult to understand. I assume a scenario: three transactions query and update the same piece of data. I drew a picture to facilitate understanding:

An article explaining the principles of transactions and MVCC in MySQL in detail

The original data is still as shown below. Update the information for Zhang San with id = 2:

An article explaining the principles of transactions and MVCC in MySQL in detail

Regarding the above picture, I would like to ask a question. Under the RC (Read Commit) and RR (Repeatable Read) isolation levels, what are the query age values ​​at T4 and T5 time points respectively? What is the updated value of T4? Think for a moment, I believe everyone has their own answer. The answer is at the end of the article. I hope you can continue reading with your own questions.

2.3.2 Results under RR (Repeatable Read)

At RR level, the query is only recognized before the transaction is started. For the completed data that has been submitted, the view will be built once the transaction is started. So use the start transaction with consistent snapshot command, and the view will be created immediately.

现在假设:

  • 事务 A 开始前,只有一个活跃的事务,ID = 2,
  • 已提交的事务也就是插入数据的事务 ID = 1
  • 事务 A、B、C 的事务 ID 分别是 3、4、5

在这种隔离级别下,他们创建视图的时刻如下:

An article explaining the principles of transactions and MVCC in MySQL in detail

根据上图得,事务 A 的视图数组是[2,3];事务 B 的视图数组是 [2,3,4];事务 C 的视图数组是[2,3,4,5]。分析一波:

  • T4 时刻,B 读数据都是从当前版本读起,过程是这样的:

    • 读到当前版本的 trx_id = 4,刚好是自己,可见
    • 所以 age = 24
  • T5 时刻,A 读数据都是从当前版本读起,过程是这样的:

    • 读到当前版本的 trx_id = 4,比自己视图数组的高水位大,不可见
    • 再往上读到 trx_id = 5,比自己视图数组高水位大,不可见
    • 再往上读到 trx_id = 1,比自己视图数组低水位小,可见
    • 所以 age = 22

这样执行下来,虽然期间这一行数据被修改过,但是事务 A 不论在什么时候查询,看到这行数据的结果都是一致的,所以我们称之为一致性读

其实视图是否可见主要看创建视图和提交的时机,总结下规律:

  • 版本未提交,不可见
  • 版本已提交,但在视图创建后提交,不可见
  • 版本已提交,但在视图创建前提交,可见

2.3.2.1 快照读和当前读

事务 B 的 update 语句,如果按照上图的一致性读,好像结果不大对?

如下图周明,B 的视图数组是先生成的,之后事务 C 才提交。那就应该看不见 C 修改的 age = 23 呀?最后 B 怎么得出 24 了?

An article explaining the principles of transactions and MVCC in MySQL in detail

没错,如果 B 在更新之前执行查询语句,那返回的结果肯定是 age = 22。问题是更新就不能在历史版本更新了呀,否则 C 的更新不就丢失了?

所以,更新有个规则:更新数据都是先读后写(读是更新语句执行,不是我们手动执行),读的就是当前版本的值,叫当前读;而我们普通的查询语句就叫快照读

因此,在更新时,当前读读到的是 age = 23,更新之后就成 24 啦。

2.3.2.2 select 当前读

除了更新语句,查询语句如果加锁也是当前读。如果把事务 A 的查询语句 select age from t where id = 2 改一下,加上锁(lock in mode 或者 for update),也都可以得到当前版本 4 返回的 age = 24

下面就是加了锁的 select 语句:

select age from t where id = 2 lock in mode;
 select age from t where id = 2 for update;

2.3.2.3 事务 C 不马上提交

假设事务 C 不马上提交,但是 age = 23 版本已生成。事务 B 的更新将会怎么走呢?

事务 C 不马上提交

事务 C 还没提交,写锁还没释放,但是事务 B 的更新必须要当前读且必须加锁。所以事务 B 就阻塞了,必须等到事务 C 提交,释放锁才能继续当前的读。

被事务 C 锁住

2.3.3 RC(读提交)下的结果

在读提交隔离级别下,查询只承认在语句启动前就已经提交完成的数据;每一个语句执行之前都会重新算出一个新的视图

注意:在上图的表格中用于启动事务的是 start transaction with consistent snapshot 命令,它会创建一个持续整个事务的视图。所以,在  RC 级别下,这命令其实不起作用。等效于普通的 start transaction(在执行 sql 语句之前才算是启动了事务)。所以,事务 B 的更新其实是在事务 C 之后的,它还没真正启动事务,而 C 已提交

现在假设:

  • Before transaction A starts, there is only one active transaction, ID = 2,
  • The committed transaction is the transaction ID = 1 that inserts data
  • Transaction A, B , C's transaction IDs are 3, 4, and 5 respectively

Under this isolation level, the time when they create the view is as follows:

An article explaining the principles of transactions and MVCC in MySQL in detail

According to the above figure, the view array of transaction A is [2,3,4], but its high water level is 6 or greater (transaction ID 1 has been created); the view array of transaction B is [ 2,4]; the view array of transaction C is [2,5]. A wave of analysis:

  • At T4 time, B reads data from the current version. The process is as follows:

    • reads the trx_id of the current version = 4, which happens to be myself. It can be seen that
    • so age = 24
  • At time T5, A reads data from the current version. The process is as follows of:

    • Read the current version of trx_id = 4, which is within the scope of your own consistency view but contains 4, and is invisible
    • Read further up trx_id = 5, which is within your own consistency view Consistency view range but does not include 5, visible
    • so age = 23

##03 Shoulders of Giants

    cnblogs.com/wyaokai/p/10921323.html
  • time.geekbang.org/column/article/70562
  • zhuanlan.zhihu.com/p/117476959
  • cnblogs.com/xd502djj/p/6668632.html
  • blog.csdn.net/article/details/109044141
  • blog.csdn.net/u014078930/article/details/ 99659272

04 Summary

This article talks about all aspects of transactions in detail, such as: four major features, isolation level, concurrency problems solved, how to set up, and view Isolation levels, how to start transactions, etc. In addition, we also gained an in-depth understanding of how the two levels of isolation, RR and RC, are achieved? Including a detailed explanation of how MVCC, undo log and read view work together to implement MVCC. Finally, we talked about snapshot reading, current reading, etc. It can be said that all the knowledge points related to affairs are here. If you still don’t understand after reading this article, come and hit me!

[Related recommendations:

mysql video tutorial]

Transaction isolation level Dirty read Non-repeatable read Phantom read
Read Uncommitted Possible Possible Possible
Read Committed Impossible Possible Possible
Repeatable Read Impossible impossible possible

The above is the detailed content of An article explaining the principles of transactions and MVCC in MySQL in detail. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:segmentfault.com. If there is any infringement, please contact admin@php.cn delete