MVCC (Multiversion Concurrency Control), multi-version concurrency control
. As the name suggests, MVCC implements concurrency control
of the database through multiple version management of data rows. This technology ensures consistent read operations under InnoDB's transaction isolation level. In other words, it is to query some rows that are being updated by another transaction, and you can see the values before they were updated, so that you do not have to wait for another transaction to release the lock when doing the query. There is no formal standard for MVCC. The implementation of MVCC may be different in different DBMS, and it is not universally used (you can refer to the relevant DBMS documentation). Here we explain the implementation mechanism of MVCC in InnoDB (other MySQL storage engines do not support it)
2 Snapshot reading and current reading
, so that even if there are read-write conflicts, no locking
, non-blocking concurrent reading can be achieved
, and this read refers to the snapshot read
, not the current read
. The current read is actually a locking operation and an implementation of pessimistic locking. The essence of MVCC is a way of using optimistic locking thinking. 2.1 Snapshot read
, is a non-blocking read without locking; for example:
select * from player where ...
The reason why snapshot reading occurs , is based on the consideration of improving concurrency performance. The implementation of snapshot reading is based on MVCC, which avoids locking operations and reduces overhead in many cases.
Since it is based on multiple versions, the snapshot read may not necessarily read the latest version of the data, but may be the previous historical version.
The premise of snapshot reading is that the isolation level is not the serial level. Snapshot reading at the serial level will degenerate into current reading.
2.2 Current Reading
3. Review
In MysQL, the default isolation level is repeatable read, which can solve dirty reads and non-repeatable reads. If we look at the problem of reading only from the perspective of definition, it does not solve the problem of phantom reading. If we want to solve the phantom read problem, we need to use serialization, that is, to increase the isolation level to the highest level, but this will significantly reduce the transaction concurrency capability of the database.
MVCC can not use the lock mechanism, but solve the problem of non-repeatable reads and phantom reads through optimistic locking! It can replace row-level locks in most cases and reduce system overhead.
3.2 Hidden fields, Undo Log version chain
trx_id: Every time a transaction changes a clustered index record, the transaction ID of the transaction will be assigned to the trx_id hidden column. roll_pointer: Every time a clustered index record is modified, the old version will be written to the undo log. Then this hidden column is equivalent to a pointer, through which the information before the record can be found is modified.
##4. MVCC Implementation Principle of ReadView
4.1 What is ReadView
In the MVCC mechanism, multiple transactions pair Updating the same row record will generate multiple historical snapshots, and these historical snapshots are saved in the Undo Log. If a transaction wants to query this row record, which version of the row record needs to be read? At this time, ReadView needs to be used, which helps us solve the row visibility problem.When a transaction uses the MVCC mechanism to perform a snapshot read operation, a read view will be generated. This view is ReadView. When a transaction is started, a current snapshot of the database system will be generated. InnoDB constructs an array for each transaction to record and maintain the ID of the current active transactions in the system ("active" refers to started but not yet submitted. ).
Use READ UNCOMMITTED
isolation level transactions. Since you can read the records modified by uncommitted transactions, you can directly read the latest records of the records. The version is just fine.
Use SERIALIZABLE
isolation level transactions, InnoDB stipulates to use locking to access records.
Use READ COMMITTED
Transactions with isolation level REPEATABLE READ
must ensure that they read the records modified by has been submitted by the transaction
. If another transaction has modified the record but has not yet submitted it, it cannot directly When reading the latest version of records, the core problem is to determine which version in the version chain is visible to the current transaction. This is the main problem to be solved by ReadView.
This ReadView mainly contains 4 The more important contents are as follows:
With this ReadView, when accessing a certain record, Just follow the steps below to determine whether a certain version of the record is visible.
If the trx_id attribute value of the accessed version is the same as the creator_trx_id
value in ReadView, it means The current transaction is accessing its own modified records, so this version can be accessed by the current transaction.
If the value of the trx_id attribute of the accessed version is less than the up_limit_id# in ReadView ## value, indicating that the transaction that generated this version has been committed before the current transaction generates ReadView, so this version can be accessed by the current transaction.
low_limit_id value in ReadView, indicating that the transaction that generated this version was opened after the current transaction generated ReadView, so this version cannot be accessed by the current transaction.
up_limit_id and
low_limit_id, then you need to determine whether the trx_id attribute value is in the
trx_ids list. If it is, it means that the transaction that generated this version when the ReadView was created is still active, and this version cannot be accessed. If it is not there, it means that the transaction that generated this version when the ReadView was created has been committed, and this version can be accessed. 4.4 MVCC overall operation process
After understanding these concepts, let’s take a look at how the system finds it through MVCC when querying a record:
is as follows:
5. Example transaction whose transaction id
is 8
:
READ COMMITTED and
REPEATABLE READ.
Under isolation level:
READ COMMITTED: A ReadView is generated before each data reading
There are currently two transactions with transaction IDs 10 and 20 being executed:
Description: Transaction execution During the process, only when the record is actually modified for the first time (such as using INSERT, DELETE, UPDATE statements), a separate transaction ID will be assigned, and this transaction ID will be incremented. That's why we updated some records in other tables in transaction 2 in order to let it assign transaction ids.
At this moment, the version linked list obtained by the record with id 1 in the student table is as follows:
Assume now A transaction using the READ COMMITED isolation level begins execution:
The execution process of this SELECT1 is as follows:
Step 1: When executing the SELECT statement, a ReadView
will be generated first. The content of the trx_ids
list of ReadView is [10, 20], up_limit_id
is 10
, low_limit_id
is 21
, creator_trx_id
is 0
.
Step 2: Select visible records from the version chain. As can be seen from the figure, the content of the column name
of the latest version is '王五'
, and the # of this version The value of ##trx_id is
10, which is in the trx_ids list, so it does not meet the visibility requirements and jumps to the next version according to roll_pointer.
Step 3: The content of the column
name of the next version is
'李思', and the
trx_id value of this version is also
10 is also in the
trx_ids list, so it does not meet the requirements and continues to jump to the next version.
Step 4: The content of the column
name of the next version is
‘Zhang San', and the
trx_id value of this version is
8, is smaller than the up_limit_id value 10 in ReadView, so this version meets the requirements. The version returned to the user is the record whose column name is "Zhang San".
id to
10:
Transaction id is
20 Update the record in table
student where
id is
1:
At this moment, the version chain of the record with id 1 in the student table looks like this:
Then continue to search for the record with id 1 in the transaction that just used the READ COMMITTED isolation level,is as follows:
Execution of this SELECT2 The process is as follows:
Step 1: When executing the SELECT statement, a ReadView will be generated separately. The content of the trx_ids list of the ReadView is [20], and the up_limit_id is 20. low_limit_id is 21, creator_trx_id is 0.
Step 2:Select visible records from the version chain. As can be seen from the figure, the content of the column name of the latest version is "Song Ba", and the tr_id value of this version is 20, is in the trx_ids list, so it does not meet the visibility requirements and jumps to the next version according to roll.pointer.
Step 3:The content of the column name of the next version is "Qian Qi". The trx_id value of this version is 20, which is also in the trx_ids list, so it does not meet the requirements. Continue to jump. to the next version. Step 4: The content of the column name of the next version is "王五". The trx_id value of this version is 10, which is less than the up_limit.id value 20 in ReadView, so this version meets the requirements, and finally returns The version given to users is the record whose column name is "王五".
By analogy, if the record with transaction ID 20 is also submitted later, when the record with ID value 1 in table student is queried again in a transaction using READ CONMMITTED isolation level, the result obtained is "Song Ba" ;, we will not analyze the specific process.
Under the isolation level:
For transactions using the REPEATABLE READ isolation level, the query statement will only be executed for the first time When a ReadView is generated, subsequent queries will not be generated repeatedly.For example, there are two transactions with transaction IDs 10 and 20 being executed in the system:
此刻,表student中id为1的记录得到的版本链表如下所示:
假设现在有一个使用REPEATABLE READ隔离级别的事务开始执行:
此时执行过程与read committed相同
然后再到刚才使用REPEATABLE READ隔离级别的事务中继续查找id为1的记录,如下:
这个SELECT2的执行过程如下:
步骤1:因为当前事务的隔离级别为REPEATABLE READ,而之前在执行SELECT1时已经生成过ReadView了,所以此时直接复用之前的ReadView,之前的ReadView的trx_ids列表的内容就是[10,20],up_limit_id为10, low_limit_id为21 , creator_trx_id为0。
步骤2:然后从版本链中挑选可见的记录,从图中可以看出,最新版本的列name的内容是’宋八’trx_id值为20,在trx_ids列表内,所以不符合可见性要求,根据roll_pointer跳到下一个版本。
步骤3:下一个版本的列name的内容是’钱七’,该版本的trx_id值为20,也在trx_ids列表内合要求,继续跳到下一个版本。
步骤4:下一个版本的列name的内容是’王五’,该版本的trx_id值为10,而trx_ids列表中是包含值为10的事务id的,所以该版本也不符合要求,同理下一个列name的内容是’李四’的版本也不符合要求。继续跳到下个版本。
步聚5∶下一个版本的列name的内容是’张三’,该版本的trx_id值为80,小于Readview中的up_limit_id值10,所以这个版本是符合要求的,最后返回给用户的版本就是这条列c为‘张三’的记录。
两次SELECT查询得到的结果是重复的,记录的列c值都是’张三’,这就是可重复读的含义。如果我们之后再把事务id为20的记录提交了,然后再到刚才使用REPEATABLE READ隔离级刷的事务中继续查找这个id为1的记录,得到的结果还是’张三’,具体执行过程大家可以自己分析一下。
假设现在表student中只有一条数据,数据内容中,主键id=1,隐藏的trx_id=10,它的undo log如下图所示。
假设现在有事务A和事务B并发执行,事务A的事务id为20,事务B的事务id为30。
步骤1:事务A开始第一次查询数据,查询的SQL语句如下。
select * from student where id > 1;
在开始查询之前,MySQL会为事务A产生一个ReadView,此时ReadView的内容如下: trx_ids=[20, 30 ] ,up_limit_id=20 , low_limit_id=31 , creator_trx_id=20。
因为表student只有一条符合条件 where id>=1 的数据,所以会被查询出来。然后根据ReadView机制,发现该行数据的trx_id=10,小于事务A的ReadView里up_limit_id,这表示这条数据是事务A开启之前,其他事务就已经提交了的数据,因此事务A可以读取到。
结论:事务A的第一次查询,能读取到一条数据,id=1。
步骤2∶接着事务B(trx_id=30),往表student中新插入两条数据,并提交事务。
insert into student(id,name) values(2,'李四'); insert into student(id,name) values(3,'王五');
此时表student中就有三条数据了,对应的undo如下图所示:
Step 3: Then transaction A starts the second query. According to the rules of repeatable read isolation level, transaction A will not regenerate ReadView at this time. At this time, the three pieces of data in the student table all meet the condition of where id>=1, so they will be found first. Then according to the ReadView mechanism, it is judged whether each piece of data can be seen by transaction A.
1) First of all, the data with id=1, as mentioned before, can be seen by transaction A.
2) Then there is the data with id=2, and its trx_id=30. At this time, transaction A finds that this value is between up_limit_id and low_limit_id, so it is necessary to determine whether 30 is in the trx_ids array. Since transaction A's trx_ids=[20,30], in the array, this means that the data with id=2 was submitted by other transactions started at the same time as transaction A, so this data cannot be seen by transaction A. .
3) Similarly, the trx_id of this data with id=3 is also 30, so it cannot be seen by transaction A.
Conclusion:The second query of transaction A can only query the data with id=1. This is the same as the result of the first query of transaction A, so there is no phantom reading phenomenon. Therefore, under the repeatable isolation level of MySQL, there is no phantom reading problem.
The above is the detailed content of MySQL multi-version concurrency control MVCC instance analysis. For more information, please follow other related articles on the PHP Chinese website!