Home  >  Article  >  Database  >  My understanding of MySQL Part 4: Transactions, Isolation Levels and MVCC

My understanding of MySQL Part 4: Transactions, Isolation Levels and MVCC

coldplay.xixi
coldplay.xixiforward
2020-10-27 17:03:222659browse

mysql教程栏目介绍MySQL相关的事务、隔离级别及MVCC。

My understanding of MySQL Part 4: Transactions, Isolation Levels and MVCC

MySQL 系列的第四篇,主要内容是事务,包括事务 ACID 特性,隔离级别,脏读、不可重复读、幻读的理解以及多版本并发控制(MVCC)等内容。

事务(Transaction)能够保证一组不可分割的原子性操作集合要么都执行,要么都不执行。在MySQL 常用的存储引擎中,InnoDB 是支持事务的,原生的 MyISAM 引擎则不支持事务。

在本文中,若未特殊说明,使用的数据表及数据如下所示:

CREATE TABLE `user`  (  `id` int(11) DEFAULT NULL,  `name` varchar(12) DEFAULT NULL) ENGINE = InnoDB;insert into user values(1, '刺猬');复制代码

1. ACID 四大特性

首先需要理解的是事务 ACID 四大特性,即原子性(Atomicity)、一致性(Consistency)、隔离性(Isolation)、持久性(Durability),这也是事务的四个基本要素。

为了详细解释 ACID 特性,在这里先设想一个场景:我向你转账100元。

假设这个操作可以分为以下几步(假设我和你的账户余额均为100元):

  1. 查询我的账户余额
  2. 我的账户扣款100元
  3. 100元开始转移
  4. 查询你的账户余额
  5. 你的账户到账100元

1.1 原子性(Atomicity)

事务的原子性是指:一个事务必须是不可再分割的最小工作单元,一个事务中的操作要么都成功,要么都失败,不可能存在只执行一个事务中部分操作的情况。

在上述的转账场景中,原子性就要求了这五个步骤要么都执行,要么都不执行,不可能存在我的账户扣款100元,而你的账户100元没有到账的情况。

1.2 一致性(Consistency)

事务的一致性是指:数据库总是从一个一致性状态转换到另一个一致性状态,一致性侧重的是数据的可见性,数据的中间状态对外是不可见的。

同时,事务的一致性要求符合开发人员定义的约束,如金额大于0、身高大于0等。

在上述的转账场景中,一致性能够保证最终执行完整个转账操作后,我账户的扣款金额与你账户到账金额是一致的,同时如果我和你的账户余额不满足金额的约束(如小于0),整个事务会回滚。

1.3 隔离性(Isolation)

事务的隔离性是指:在一次状态转换过程中不会受到其他状态转换的影响。

假设我和你都有100元,我发起两次转账,转账金额都是50元,下面使用伪代码来表示的操作步骤:

  1. 查询我的账户余额 read my
  2. 我的账户扣款50元 my=my-50
  3. 50元开始转移
  4. 查询你的账户余额 read yours
  5. 你的账户到账50元 yours=yours+50

如果未保证隔离性就可能发生下面的情况:

时刻 第一次转账 第二次转账 我的账户余额 你的账户余额
1 read my(100)
my=100 yours=100
2
read my(100) my=100 yours=100
3 my=my-50=100-50=50
my=50 yours=100
4 read yours(100) my=my-50=100-50=50 my=50 yours=100
5 yours=yours+50=100+50=150
my=50 yours=150
6
read yours(150) my=50 yours=150
7
yours=yours+50=150+50=200 my=50 yours=200
7 end end my=50 yours=200

两次转账后,最终的结果是我的账户余额为50元,你的账户余额为200元,这显然是不对的。

而如果在保证事务隔离性的情况下,就不会发生上面的情况,损失的只是一定程度上的一致性。

1.4 持久性(Durability)

事务的持久性是指:事务在提交以后,它所做的修改就会被永久保存到数据库。

在上述的转账场景中,持久性就保证了在转账成功之后,我的账户余额为0,你的账户余额为200。

2. 自动提交与隐式提交

2.1 自动提交

在 MySQL 中,我们可以通过 begin 或 start transaction 来开启事务,通过 commit 来关闭事务,如果 SQL 语句中没有这两个命令,默认情况下每一条 SQL 都是一个独立的事务,在执行完成后自动提交

比如:

update user set name='重塑' where id=1;复制代码

假设我只执行这一条更新语句,在我关闭 MySQL 客户端然后重新打开一个新的客户端后,可以看到 user 表中的 name 字段值全变成了「重塑」,这也印证了这条更新语句在执行后已被自动提交。

自动提交是 MySQL 的一个默认属性,可以通过 SHOW VARIABLES LIKE 'autocommit' 语句来查看,当它的值为 ON 时,就代表开启事务的自动提交。

mysql> SHOW VARIABLES LIKE 'autocommit';
+---------------+-------+| Variable_name | Value |
+---------------+-------+| autocommit    | ON    |
+---------------+-------+1 row in set (0.00 sec)复制代码

我们可以通过 SET autocommit = OFF 来关闭事务的自动提交。

2.2 隐式提交

然而,即便我们已经将 autocommit 变量的值改为 OFF 关闭事务自动提交了,在执行某些 SQL 语句的时候,MySQL 还是会将事务自动提交掉,这被称为隐式提交

会触发隐式提交的 SQL 语句有:

  • DDL(Data definition language,数据定义语言),如 create, drop, alter, truncate
  • 修改 MySQL 自带表数据的语句,如 create/drop user, grant, set password
  • 在一个事务中,开启一个新的事务,会隐式提交上一个事务,如:
时刻 事务A 事务B
1 begin;
2 update user set name='重塑' where id=1;
3
select name from user where id=1;(N1)
4 begin;
5
select name from user where id=1;(N2)

There are two query statements N1 and N2 in transaction B. The execution result is N1=hedgehog and N2=reshape, which can be proved.

  • There are some other management statements, so I won’t give examples one by one. You can Baidu them by yourself.

3. Isolation level

The isolation level of a transaction specifies the visibility of modifications made in a transaction within and between transactions. Lower levels of isolation typically allow for higher concurrency and lower system overhead.

There are four transaction isolation levels defined in the SQL standard, namely Read Uncommitted, Read Committed, Repeatable Read, Serializable ( Serializable).

In order to explain these four isolation levels and their respective phenomena in detail, assume that two transactions are about to be executed, and the execution content is as follows:

##5678 During the execution of transaction A and transaction B, there are three queries N1, N2, and N3. Under each isolation level, their values ​​​​are different. , discussed separately below.
Time TransactionA TransactionB
1 begin;
2
begin;
3
update user set name='reshape' where id=1;
4 select name from user where id=1; (N1)

##commit;
select name from user where id=1;(N2)
commit;
select name from user where id=1;(N3)

3.1 Read Uncommitted

Under the read uncommitted isolation level,

modifications in a transaction are visible to other transactions even if they are not committed

. In the above scenario, if the isolation level of the database is read uncommitted, since transaction A can read the modified data of uncommitted transaction B, that is, the modification of transaction B in time 3 is visible to transaction A, so N1=reshape, N2=reshape, N3=reshape.

3.2 Read Committed

Under the read committed isolation level,

the modifications in the transaction will not be visible to other transactions

until they are committed. In the above scenario, if the isolation level of the database is read committed, since transaction A can only read the data after transaction B is submitted, that is, the modification of transaction B at time 3 is not visible to transaction A, N2 The query at is after transaction B is submitted, so it is visible to transaction A. So N1=Hedgehog, N2=Reinvention, N3=Reinvention.

3.3 Repeatable Read

Repeatable Read is MySQL’s default transaction isolation level

. Under the repeatable read isolation level, the same record is queried multiple times in a transaction, and the results are always consistent . In the above scenario, if the isolation level of the database is repeatable read, since queries N1 and N2 are in a transaction, their values ​​​​are both "hedgehogs", and N3 is executed after transaction A is submitted. For the query performed, the modifications to transaction B are visible, so N3 = reshape.

3.4 Serializable

Under the serializable isolation level,

transactions are executed serially, reading will increase read locks, and writing will increase Write lock

, transactions will not be executed concurrently, so no exceptions will occur. In the above scenario, if the isolation level of the database is serializable, transaction A is opened first and is blocked when transaction B is opened. Transaction B will not be opened until transaction A is submitted, so N1 = Hedgehog , N2=hedgehog. The query at N3 will be executed after transaction B is submitted (transaction B is blocked first, and the execution order is before the N3 query statement), so N3=reshape.

4. Problems caused by isolation levels

In different transaction isolation levels, if transactions are executed concurrently, many problems will occur, such as dirty read (Dirty Read), non-repeatable Read (Non-Repeatable Read), Phantom Read (Phantom Read), etc. Different examples will be used to explain these issues in detail below.

4.1 Dirty Read

Dirty Read

(Dirty Read) means that a transaction can read data modified by another uncommitted transaction. Look at the following case, assuming the isolation level is read uncommitted:

Time123456##7commit;
Transaction A Transaction B
begin;

#begin;

update user set name='Re Plastic' where id=1;
select name from user where id=1;(N1)

rollback;
select name from user where id=1;(N2)

##

Under the isolation level of read uncommitted, the value of N1 is "reshape", and due to the rollback of transaction B, the value of N2 is "hedgehog". A dirty read occurred here at N1. Obviously, the query result at N1 is dirty data, which will have an impact on normal business.

Dirty reads will occur in the read uncommitted isolation level.

4.2 Non-Repeatable Read

Non-Repeatable Read (Non-Repeatable Read) means that executing the same query twice may result in different results. Same result.

Continue to use the AB transaction case when introducing the isolation level, and assume that the isolation level is read committed:

##7commit;8select name from user where id=1;(N3)
Time Transaction A TransactionB
1 begin;
2
begin;
3
update user set name='reshape' where id=1;
4 select name from user where id=1;(N1)
5
commit;
6 select name from user where id=1;(N2)


Under the read-committed isolation level, transactions can read data submitted by other transactions. In the above case, the results are N1=hedgehog, N2=reshape, N3=reshape. In transaction A, there are two identical queries N1 and N2, but the results of these two queries are not the same. This happened Cannot be read repeatedly.

Non-repeatable reads will occur in the isolation levels of read uncommitted and read committed.

4.3 Phantom Read

Phantom Read (Phantom Read) means that when a transaction reads records in a certain range, another transaction Insert a new record in this range. When the previous transaction reads the records in this range again, this new record will be read.

Look at the following case, assuming that the isolation level is repeatable read at this time:

TimeTransaction A Transaction B1begin;2 select name from user;(N1)##3456##7select name from user for update;(N3)8commit;

Transaction A has three queries. Between N1 and N2, transaction B executed an insert statement and submitted it. The query at N3 used for update.

The result at N1 is obviously only a "hedgehog". The result at N2 is also a "hedgehog" because transaction A is opened before transaction B. The result at N3 is theoretically at the repeatable read isolation level. There should only be "hedgehog" in N2, but in fact the result of N2 is "hedgehog" and "gojo man", which causes phantom reading.

This is very strange. Doesn’t it mean that the isolation level of repeatable read can ensure that the same record is queried multiple times in a transaction and the results are always consistent? This result does not meet the definition of repeatable read.

In fact, under the isolation level of repeatable read, if current read is used, phantom reading may occur.

Current reading and snapshot reading will be discussed later when the implementation principles of transactions and MVCC are introduced. Here is a conclusion first.

Phantom reading will occur in the isolation levels of read uncommitted, read committed, and repeatable read.

What needs extra attention here is: Phantom reading and non-repeatable reading both mean that the results of the same query statement in a transaction are different, but phantom reading is more It focuses on querying data newly inserted by other transactions (insert) or data deleted by other transactions (delete), while non-repeatable read has a wider scope. As long as the results are different, it can be considered as non-repeatable read, but generally we think of non-repeatable read. Focus more on the update of data by other transactions.

4.4 Summary

Through the above description, we already know the concepts of four isolation levels and the problems they will encounter respectively. The higher the isolation level of a transaction, the stronger the isolation. There will be fewer problems encountered. But at the same time, the higher the isolation level, the weaker the concurrency capability.

The following table is a summary of the concepts of isolation levels and problems that may occur at different isolation levels:




begin;

insert into user values(2, '五条人');

commit;
select name from user;(N2 )


##Read submitted√√√Modifications in a transaction are visible to other transactions even if they are not committedRead uncommitted√√The modifications in the transaction will only be visible to other transactions after they are submittedRepeatable readSerializable

5. MVCC

MVCC(Multi-Version Concurrency Control)即多版本并发控制,这是 MySQL 为了提高数据库并发性能而实现的。它可以在并发读写数据库时,保证不同事务的读-写操作并发执行,同时也能解决脏读、不可重复读、幻读等事务隔离问题。

在前文讨论幻读的时候提到过当前读的概念,正是由于当前读,才会在可重复读的隔离级别下也会发生幻读的情况。

在解释可重复读隔离级别下发生幻读的原因之前,首先介绍 MVCC 的实现原理。

5.1 MVCC 的实现原理

首先我们需要知道,InnoDB 的数据页中每一行的数据是有隐藏字段的:

  • DB_ROW_ID: 隐式主键,若表结构中未定义主键,InnoDB 会自动生成该字段作为表的主键
  • DB_TRX_ID: 事务ID,代表修改此行记录的最后一次事务ID
  • DB_ROLL_PTR: 回滚指针,指向此行记录的上一个版本(上一个事务ID对应的记录)

每一条修改语句都会相应地记录一条回滚语句(undo log),如果把每一条回滚语句视为一条数据表中的记录,那么通过事务ID和回滚指针就可以将对同一行的修改记录看作一个链表,链表上的每一个节点就是一个快照版本,这就是 MVCC 中多版本的意思。

举个例子,假设对 user 表中唯一的一行「刺猬」进行多次修改。

update user set name='重塑' where id=1;update user set name='木马' where id=1;update user set name='达达' where id=1;复制代码

那么这条记录的My understanding of MySQL Part 4: Transactions, Isolation Levels and MVCC就是:

My understanding of MySQL Part 4: Transactions, Isolation Levels and MVCC

在这个My understanding of MySQL Part 4: Transactions, Isolation Levels and MVCC中,头结点就是当前记录的最新版本。DB_TRX_ID 事务ID 字段是非常重要的属性,先 Mark 一下。

除此之外,在读已提交(RC,Read Committed)和可重复读(RR,Repeatable Read)的隔离级别中,事务在启动的时候会创建一个读视图(Read View),用它来记录当前系统的活跃事务信息,通过读视图来进行本事务之间的可见性判断

在读视图中有两个重要的属性:

  • 当前事务ID:表示生成读视图的事务的事务ID
  • 事务ID列表:表示在生成读视图时,当前系统中活跃着的事务ID列表
  • 最小事务ID:表示在生成读视图时,当前系统中活跃着的最小事务ID
  • 下一个事务ID:表示在生成读视图时,系统应该分配给下一个事务的事务ID

需要注意下一个事务I的值,并不是事务ID列表中的最大值+1,而是当前系统中已存在过的事务的最大值+1。例如当前数据库中活跃的事务有(1,2),此时事务2提交,同时又开启了新事务,在生成的读视图中,下一个事务ID的值为3。

我们通过将My understanding of MySQL Part 4: Transactions, Isolation Levels and MVCC与读视图两者结合起来,来进行并发事务间可见性的判断,判断规则如下(假设现在要判断事务A是否可以访问到事务B的修改记录):

  • 若事务B的当前事务ID小于事务A的最小事务ID的值,代表事务B是在事务A生成读视图之前就已经提交了的,所以事务B对于事务A来说是可见的。
  • 若事务B的当前事务ID大于或等于事务A下一个事务ID的值,代表事务B是在事务A生成读视图之后才开启,所以事务B对于事务A来说是不可见的。
  • 若事务B的当前事务ID在事务A的最小事务ID下一个事务ID之间(左闭右开,[最小事务ID, 下一个事务ID)),需要分两种情况讨论:
    • 若事务B的当前事务ID在事务A的事务ID列表中,代表创建事务A时事务B还是活跃的,未提交,所以事务B对于事务A来说是不可见的。
    • 若事务B的当前事务ID不在事务A的事务ID列表中,代表创建事务A时事务B已经提交,所以事务B对于事务A来说是可见的。

如果事务B对于事务A来说是不可见的,就需要顺着修改记录的My understanding of MySQL Part 4: Transactions, Isolation Levels and MVCC,从回滚指针开始往前遍历,直到找到第一个对于事务A来说是可见的事务ID,或者遍历完My understanding of MySQL Part 4: Transactions, Isolation Levels and MVCC也未找到(表示这条记录对事务A不可见)。

这就是 MVCC 的实现原理。

5.2 The creation timing of the read view

What needs to be noted here is the creation timing of the read view. In the above discussion, we already know that a transaction will create a when it starts. Read View (Read View), and there are two ways to start a transaction, one is begin/start transaction, the other is start transaction with consistent snapshot, through these two To open a transaction in the method, the timing of creating a read view is also different:

  • If the transaction is opened in the begin/start transaction method, the read view will execute the first snapshot read statement
  • If you start the transaction with start transaction with consistent snapshot, a read view will be created at the same time

5.3 MVCC running process

In order to explain the running process of MVCC in detail, the following is an example, assuming that there are currently two transactions (the transaction isolation level is MySQL's default repeatable read):

What needs to be noted here isTransaction startup timing, in the above discussion we already know that a transaction will create a Read View (Read View) when it starts, and there are two ways to start a transaction, one is begin/start transaction, the second is start transaction with consistent snapshot, through these two methods to start the transaction, the timing of creating the read view is also different:

  • If If the transaction is started in the begin/start transaction method, the read view will be created when the first snapshot read statement is executed
  • If it is started in the start transaction with consistent snapshot method transaction, a read view will be created at the same time
Isolation Level Dirty Read Non-repeatable read Phantom read Concept



##√
Multiple queries in one transaction The same record, the result is always consistent



Transactions are executed serially, reading will add a read lock, and writing will add a write lock
##3update user set name ='Reshape' where id=1;##4567

Then analyze the running process of MVCC based on the version chain described above and the read view when the two transactions are started.

My understanding of MySQL Part 4: Transactions, Isolation Levels and MVCC

The above picture is the read view when two transactions are started. When the update statement of transaction B is executed, the version chain of row id=1 is as follows.

mvcc2-My understanding of MySQL Part 4: Transactions, Isolation Levels and MVCC

Let’s first look at the query statement at N1. The current transaction ID of transaction B is 2, and its value is equal to the next of transaction A. Transaction ID, so according to the visibility judgment discussed above, transaction B is invisible to transaction A, and needs to be retrieved online along the version chain of the current row.

So we followed the version chain to DB_TRX_ID=1The historical version of transaction ID=1 happens to be equal to the transaction ID value of transaction A, which is the version of the row when transaction A was started. This version is of course visible to transaction A, so the name='hedgehog' in row id=1 is read, that is, the final N1=hedgehog.

Looking at the query statement at N2, transaction B has been submitted at this time, and the version chain is still as shown in the figure above, because the transaction ID of the current version is equal to the next transaction ID# in the read view of transaction A ##, so the current version of the record is invisible to transaction A, so the same N2=hedgehog.

It should be noted here that if the time 4 statement of transaction A in the example is changed to an update statement for the row, then transaction A will wait for transaction B to submit before executing the update statement. This is Because transaction B has not committed, that is, transaction B's write lock on row id=1 has not been released, and transaction A also needs to update the row, it must update the latest version (current read), so transaction A is blocked. , you must wait for transaction B's write lock on the row to be released before continuing to execute the update statement.

5.4 Comparison of the timing of generating read views between RC and RR

The MVCC running process discussed above is all for the Repeatable Read (RR, Repeatable Read) isolation level. If it is What about the Read Committed (RC, Read Committed) level?

We have already discussed the situation of non-repeatable reads in the Read Committed isolation level. We will not give examples here and just give the conclusion directly.

    The time to generate the Read View (Read View) under the Repeatable Read (RR, Repeatable Read) isolation level is when a transaction is started.
  • Read Committed (RC, Read Committed) The timing to generate a Read View (Read View) under the isolation level is before each statement is executed.
For the example in the MVCC execution process described above, if the isolation level is Read Committed (RC, Read Committed ): The query statement at

    N1, since transaction B has not yet been submitted, the visible version of transaction A is still the version with transaction ID=1, so N1=Hedgehog
  • N2 In the query statement, transaction B has been submitted. When the query statement at N2 is executed, a read view will also be generated. Its current transaction ID=3, and in the version chain of the record, the transaction ID of the current version
  • DB_TRX_ID=2, is visible before the N2 query statement transaction ID, so N2=reshape
5.5 Current read and snapshot read

  • Current read: Read the latest version of the record
  • Snapshot read: When reading the record, the transaction visible version of the record will be read according to certain rules
5.6 Repeatable Reasons why phantom reads occur

After understanding MVCC, let’s look at the reasons why phantom reads occur under the repeatable read isolation level. As mentioned above, it is precisely because of the current read that phantom reads occur under the isolation level of repeatable reads. First, let's review the example.

Time Transaction A Transaction B
1 start transaction with consistent snapshot;
2
start transaction with consistent snapshot;

select name from user where id=1;(N1)

commit;
select name from user where id=1;(N2)
commit;
TimeTransaction ATransaction B##1234567##8commit;

The queries at N1 and N2 must have been very clear that they are all "hedgehogs". The query statement used at N3 is for update. Using it to query will add a "row-level lock" to the target record. The meaning of the row-level lock will be discussed later. Now you only need to know for updateJust be able to lock the target record.

Locking is naturally to prevent others from modifying it, so of course, what is locked is the latest version of the record. Therefore, when using for update to query, current read will be used to read the latest version of the target record, so the query statement at N3 will change the original version of transaction B Records invisible to transaction A are also queried, and phantom reading occurs.

The statements using the current read are:

  • select ... for update
  • select ... lock in share mode (shared read lock)
  • update ...
  • insert ...
  • delete ...

6. Review the past and learn the new

  1. Affairs The four major characteristics of ACID
  2. Automatic submission and implicit submission (implicitly submitted statements)
  3. Isolation level of transaction
  4. Possible problems at each isolation level of transaction
  5. The implementation principle and process of MVCC
  6. The timing of RC and RR to generate read views

More related free learning recommendations:mysql tutorial(Video)

begin;
select name from user;(N1)

begin;

insert into user values(2, '五条人');

commit;
select name from user;(N2)
select name from user for update;(N3)

The above is the detailed content of My understanding of MySQL Part 4: Transactions, Isolation Levels and MVCC. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:juejin.im. If there is any infringement, please contact admin@php.cn delete