This article will take you to understand the transactions in MySQL and introduce the principle of MVCC. I hope it can help you!
Database transaction refers to a set of data operations. The operations within the transaction are either all successful or all failed. Nothing is done. In fact, it is not that nothing is done. It is possible that some of it is done, but as long as If one step fails, all operations must be rolled back, which is a bit of a non-stop operation.
In MySQL, transaction support is implemented at the engine layer. MySQL is a system that supports multiple engines, but not all engines support transactions. For example, MySQL's native MyISAM engine does not support transactions, which is one of the important reasons why MyISAM was replaced by InnoDB.
Among the four major characteristics of SQL transactions, atomicity, consistency, and durability are all relatively easy to understand. But the isolation level of transactions is indeed difficult. Today we will mainly talk about the isolation of MySQL transactions.
SQL transaction isolation level is designed to solve concurrency issues to the greatest extent:
SQL Different transaction isolation levels can solve different concurrency problems, as shown in the following table: Only the serialized isolation level solves all three problems, and the other 3 Each isolation level has flaws.
Transaction isolation level |
Dirty read |
Non-repeatable read |
Phantom read |
Read Uncommitted |
Possible |
Possible |
Possible |
Read Committed |
Impossible |
Possible |
Possible |
Repeatable Read |
Impossible |
impossible |
possible |
##Serialization | impossible | impossible | impossible |
PS: It is easy to confuse non-repeatable reading and phantom reading. Non-repeatable reading focuses on modification, and phantom reading focuses on adding or deleting. To solve the problem of non-repeatable reading, you only need to lock the rows that meet the conditions. To solve the problem of phantom reading, you need to lock the table
1.4 Let’s give an example
This It may be a bit difficult to understand, so give me an example. Still the previous table structure and table data
CREATE TABLE `student` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
`age` int(11) NULL DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 66 CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Compact;
Suppose now, I want to start two foods at the same time, a transaction A queries students with id = 2 age, a transaction B updates the age of the student with id = 2. The process is as follows, What are the values of X1, X2, and X3 under the four isolation levels?
- Read uncommitted: The value of X1 is 23, because although transaction B has not been committed, its changes have been See. (If B later rolls back the value of X1, it will be dirty). The values of X2 and X3 are also 23, which is understandable.
- Read submitted: The value of X1 is 22, because although B has changed, A cannot see it. (If B is rolled back later, the value of X1 remains unchanged, which solves the problem of dirty reading). The values of X2 and
- Repeatable reading: X1 and (No matter how B is modified during this period, as long as A has not submitted, it will be invisible, which solves the problem of non-repeatable reading), and the value of X3 is 23, because A has submitted and B can be seen The value has been modified.
- Serialization: B will be locked while performing changes until A commits. B can continue execution. (While A is reading, B cannot write. It must be ensured that the data is the latest at this time. Solving phantom reading) So X1 and X2 are both 22, and the final X3 is executed after B submits, and its value That’s 23.
Then why does such a result occur? How is the transaction isolation level implemented?
How is the transaction isolation level implemented? I found the answer in Teacher Ding Qi’s class at Geek Time:
In fact, a view will be created in the database, and the logical result of the view shall prevail when accessing. Under the "Repeatable Read" isolation level, this view is created when the transaction starts and is used throughout the transaction. Under the "read-committed" isolation level, this view is created at the beginning of each SQL statement. What needs to be noted here is that under the "read uncommitted" isolation level, the latest value on the record is directly returned, without the concept of a view; while under the "serialization" isolation level, locking is directly used to avoid parallel access.
1.5 Set transaction isolation level
The default transaction isolation level of different databases is also very different. The default isolation level of Oracle database is Read commit, while MySQL is repeatable read. Therefore, When your system needs to migrate the database from Oracle to MySQL, please set the level to be consistent with the one before the migration (read commit) to avoid unpredictable problems.
1.5.1 Check the transaction isolation level
# 查看事务隔离级别
5.7.20 之前
SELECT @@transaction_isolation
show variables like 'transaction_isolation';
# 5.7.20 以及之后
SELECT @@tx_isolation
show variables like 'tx_isolation'
+---------------+-----------------+
| Variable_name | Value |
+---------------+-----------------+
| tx_isolation | REPEATABLE-READ |
+---------------+-----------------+
1.5.2 Set the isolation level
The format of the statement to modify the isolation level is: set [scope] transaction isolation level [transaction isolation level]
The scope is optional: SESSION (session), GLOBAL (global); isolation level It is the 4 mentioned above, not case sensitive.
For example: Set the global isolation level to read-commit
set global transaction isolation level read committed;
1.6 Transaction startup
MySQL transaction startup is as follows Several ways:
- 显式启动事务语句, begin 或 start transaction。配套的提交语句是 commit,或者回滚语句是 rollback。
# 更新学生名字
START TRANSACTION;
update student set name = '张三' where id = 2;
commit;
- set autocommit = 0,这个命令会将线程的自动提交关掉。意味着如果你只执行一个 select 语句,这个事务就启动了,而且并不会自动提交。这个事务持续存在直到你主动执行 commit 或 rollback 语句,或者断开连接。
- set autocommit = 1,表示 MySQL 自动开启和提交事务。 比如执行一个 update 语句,语句只完成后就自动提交了。不需要显示的使用 begin、commit 来开启和提交事务。所以当我们执行多个语句的时候,就需要手动的用 begin、commit 来开启和提交事务。
- start transaction with consistent snapshot;上面提到的 begin/start transaction 命令并不是一个事务的起点,在执行到它们之后的第一个操作 InnoDB 表的语句,事务才真正启动。如果你想要马上启动一个事务,可以使用 start transaction with consistent snapshot 命令。 第一种启动方式,一致性视图是在执行第一个快照读语句时创建的; 第二种启动方式,一致性视图是在执行 start transaction with consistent snapshot 时创建的。
02 事务隔离的实现
理解了隔离级别,那事务的隔离是怎么实现的呢?要想理解事务隔离,先得了解 MVCC 多版本的并发控制这个概念。而 MVCC 又依赖于 undo log 和 read view 实现。
2.1 什么是 MVCC?
百度上的解释是这样的:
MVCC,全称 Multi-Version Concurrency Control,即多版本并发控制。MVCC 是一种并发控制的方法,一般在数据库管理系统中,实现对数据库的并发访问,在编程语言中实现事务内存。
MVCC 使得数据库读不会对数据加锁,普通的 SELECT 请求不会加锁,提高了数据库的并发处理能力;数据库写才会加锁。 借助 MVCC,数据库可以实现 READ COMMITTED,REPEATABLE READ 等隔离级别,用户可以查看当前数据的前一个或者前几个历史版本,保证了 ACID 中的 I 特性(隔离性)。
MVCC 只在 REPEATABLE READ 和 READ COMMITIED 两个隔离级别下工作。其他两个隔离级别都和 MVCC 不兼容 ,因为 READ UNCOMMITIED 总是读取最新的数据行,而不是符合当前事务版本的数据行。而 SERIALIZABLE 则会对所有读取的行都加锁。
2.1.1 InnDB 中的 MVCC
InnDB 中每个事务都有一个唯一的事务 ID,记为 transaction_id。它在事务开始时向 InnDB 申请,按照时间先后严格递增。
而每行数据其实都有多个版本,这就依赖 undo log 来实现了。每次事务更新数据就会生成一个新的数据版本,并把 transaction_id 记为 row trx_id。同时旧的数据版本会保留在 undo log 中,而且新的版本会记录旧版本的回滚指针,通过它直接拿到上一个版本。
所以,InnDB 中的 MVCC 其实是通过在每行记录后面保存两个隐藏的列来实现的。一列是事务 ID:trx_id;另一列是回滚指针:roll_pt。
2.2 undo log
回滚日志保存了事务发生之前的数据的一个版本,可以用于回滚,同时可以提供多版本并发控制下的读(MVCC),也即非锁定读。
根据操作的不同,undo log 分为两种: insert undo log 和 update undo log。
2.2.1 insert undo log
insert 操作产生的 undo log,因为 insert 操作记录没有历史版本只对当前事务本身可见,对于其他事务此记录不可见,所以 insert undo log 可以在事务提交后直接删除而不需要进行 purge 操作。
purge 的主要任务是将数据库中已经 mark del 的数据删除,另外也会批量回收 undo pages
所以,插入数据时。它的初始状态是这样的:
2.2.2 update undo log
UPDATE 和 DELETE 操作产生的 Undo log 都属于同一类型:update_undo。(update 可以视为 insert 新数据到原位置,delete 旧数据,undo log 暂时保留旧数据)。
事务提交时放到 history list 上,没有事务要用到这些回滚日志,即系统中没有比这个回滚日志更早的版本时,purge 线程将进行最后的删除操作。
One transaction modifies the current data:
Another transaction modifies the data:
There are multiple versions of the same record in the database, which is the multi-version concurrency control MVCC mentioned above.
In addition, you can return to the previous version state by rolling back with the help of undo log. For example, to return to V1, you only need to perform two rollbacks in sequence.
2.3 read-view
#read view is a consistent read view used by InnDB when implementing MVCC to support Implementation of RC (Read Committed) and RR (Repeatable Read) isolation levels.
read view does not really exist, it is just a concept, and undo log is its embodiment. It is mainly calculated through version and undolog. The function is to determine what data the transaction can see.
Each transaction or statement has its own consistency view. Ordinary query statements are consistent reads. Consistent reads determine the visibility of the data version based on row trx_id and consistent views.
2.3.1 Visibility rules of data versions
read view mainly contains other active read and write transactions in the current system. In terms of implementation, InnDB constructs an array for each transaction to save the currently active (not yet submitted) transaction at the moment when the transaction is started.
As mentioned earlier, the transaction ID strictly increases with time. The maximum value of the submitted transaction ID in the system is recorded as the low water level of the array, and the created transaction ID 1 is recorded as the high water level.
This view array and the high water level form the consistency view of the current transaction (read view)
Draw a picture of this array, it looks like this:
The rules are as follows:
The third point I am looking at I was a little confused during the tutorial, but fortunately some enthusiastic netizens answered:
falling in the green area means that the transaction ID is within the range of low water level and high water level. Whether it is really visible depends on whether there is this in the green area. value. If the green area does not have this transaction ID, it is visible, if it does, it is not visible. Being in this range does not mean that this range has this value, such as [1,2,3,5], 4 is in the range of 1-5 of this array, but not in this array.
This may be a bit difficult to understand. I assume a scenario: three transactions query and update the same piece of data. I drew a picture to facilitate understanding:
The original data is still as shown below. Update the information for Zhang San with id = 2:
Regarding the above picture, I would like to ask a question. Under the RC (Read Commit) and RR (Repeatable Read) isolation levels, what are the query age values at T4 and T5 time points respectively? What is the updated value of T4? Think for a moment, I believe everyone has their own answer. The answer is at the end of the article. I hope you can continue reading with your own questions.
2.3.2 Results under RR (Repeatable Read)
At RR level, the query is only recognized before the transaction is started. For the completed data that has been submitted, the view will be built once the transaction is started. So use the start transaction with consistent snapshot command, and the view will be created immediately.
现在假设:
- 事务 A 开始前,只有一个活跃的事务,ID = 2,
- 已提交的事务也就是插入数据的事务 ID = 1
- 事务 A、B、C 的事务 ID 分别是 3、4、5
在这种隔离级别下,他们创建视图的时刻如下:
根据上图得,事务 A 的视图数组是[2,3];事务 B 的视图数组是 [2,3,4];事务 C 的视图数组是[2,3,4,5]。分析一波:
这样执行下来,虽然期间这一行数据被修改过,但是事务 A 不论在什么时候查询,看到这行数据的结果都是一致的,所以我们称之为一致性读。
其实视图是否可见主要看创建视图和提交的时机,总结下规律:
- 版本未提交,不可见
- 版本已提交,但在视图创建后提交,不可见
- 版本已提交,但在视图创建前提交,可见
2.3.2.1 快照读和当前读
事务 B 的 update 语句,如果按照上图的一致性读,好像结果不大对?
如下图周明,B 的视图数组是先生成的,之后事务 C 才提交。那就应该看不见 C 修改的 age = 23 呀?最后 B 怎么得出 24 了?
没错,如果 B 在更新之前执行查询语句,那返回的结果肯定是 age = 22。问题是更新就不能在历史版本更新了呀,否则 C 的更新不就丢失了?
所以,更新有个规则:更新数据都是先读后写(读是更新语句执行,不是我们手动执行),读的就是当前版本的值,叫当前读;而我们普通的查询语句就叫快照读。
因此,在更新时,当前读读到的是 age = 23,更新之后就成 24 啦。
2.3.2.2 select 当前读
除了更新语句,查询语句如果加锁也是当前读。如果把事务 A 的查询语句 select age from t where id = 2 改一下,加上锁(lock in mode 或者 for update),也都可以得到当前版本 4 返回的 age = 24
下面就是加了锁的 select 语句:
select age from t where id = 2 lock in mode;
select age from t where id = 2 for update;
2.3.2.3 事务 C 不马上提交
假设事务 C 不马上提交,但是 age = 23 版本已生成。事务 B 的更新将会怎么走呢?
事务 C 还没提交,写锁还没释放,但是事务 B 的更新必须要当前读且必须加锁。所以事务 B 就阻塞了,必须等到事务 C 提交,释放锁才能继续当前的读。
2.3.3 RC(读提交)下的结果
在读提交隔离级别下,查询只承认在语句启动前就已经提交完成的数据;每一个语句执行之前都会重新算出一个新的视图。
注意:在上图的表格中用于启动事务的是 start transaction with consistent snapshot 命令,它会创建一个持续整个事务的视图。所以,在 RC 级别下,这命令其实不起作用。等效于普通的 start transaction(在执行 sql 语句之前才算是启动了事务)。所以,事务 B 的更新其实是在事务 C 之后的,它还没真正启动事务,而 C 已提交。
现在假设:
- Before transaction A starts, there is only one active transaction, ID = 2,
- The committed transaction is the transaction ID = 1 that inserts data
- Transaction A, B , C's transaction IDs are 3, 4, and 5 respectively
Under this isolation level, the time when they create the view is as follows:
According to the above figure, the view array of transaction A is [2,3,4], but its high water level is 6 or greater (transaction ID 1 has been created); the view array of transaction B is [ 2,4]; the view array of transaction C is [2,5]. A wave of analysis:
-
At T4 time, B reads data from the current version. The process is as follows:
- reads the trx_id of the current version = 4, which happens to be myself. It can be seen that
- so age = 24
-
At time T5, A reads data from the current version. The process is as follows of:
- Read the current version of trx_id = 4, which is within the scope of your own consistency view but contains 4, and is invisible
- Read further up trx_id = 5, which is within your own consistency view Consistency view range but does not include 5, visible
- so age = 23
##03 Shoulders of Giants
cnblogs.com/wyaokai/p/10921323.html- time.geekbang.org/column/article/70562
- zhuanlan.zhihu.com/p/117476959
- cnblogs.com/xd502djj/p/6668632.html
- blog.csdn.net/article/details/109044141
- blog.csdn.net/u014078930/article/details/ 99659272
-
04 Summary
This article talks about all aspects of transactions in detail, such as: four major features, isolation level, concurrency problems solved, how to set up, and view Isolation levels, how to start transactions, etc. In addition, we also gained an in-depth understanding of how the two levels of isolation, RR and RC, are achieved? Including a detailed explanation of how MVCC, undo log and read view work together to implement MVCC. Finally, we talked about snapshot reading, current reading, etc. It can be said that all the knowledge points related to affairs are here. If you still don’t understand after reading this article, come and hit me! [Related recommendations: mysql video tutorial]
The above is the detailed content of An article explaining the principles of transactions and MVCC in MySQL in detail. For more information, please follow other related articles on the PHP Chinese website!