Home >Database >Mysql Tutorial >Detailed introduction to the implementation principle of ACID characteristics of MySQL transactions (picture and text)

Detailed introduction to the implementation principle of ACID characteristics of MySQL transactions (picture and text)

不言
不言forward
2019-01-30 10:15:593529browse

This article brings you a detailed introduction (pictures and texts) on the implementation principles of the ACID feature of MySQL transactions. It has certain reference value. Friends in need can refer to it. Hope it helps.

Transactions are an important aspect that distinguishes relational databases such as MySQL from NoSQL and are an important means to ensure data consistency. This article will first introduce MySQL basic concepts related to transactions, then introduce the ACID characteristics of transactions, and analyze its implementation principles.

MySQL is extensive and profound, and omissions in the article are inevitable. Criticisms and corrections are welcome.

1. Basic concepts

Transaction is a program execution unit that accesses and updates the database; a transaction may contain one or more SQL statements, which are either Either execute, or neither. As a relational database, MySQL supports transactions. This article is based on MySQL5.6.

First review the basics of MySQL transactions. (Recommended course: MySQL video tutorial)

1. Logical architecture and storage engine

As shown above As shown, the MySQL server logical architecture can be divided into three layers from top to bottom:

(1) The first layer: handles client connections, authorization authentication, etc.

(2) Second layer: Server layer, responsible for parsing, optimizing, caching of query statements, implementation of built-in functions, stored procedures, etc.

(3) The third layer: storage engine, responsible for the storage and retrieval of data in MySQL. MySQLThe server layer does not manage transactions, and transactions are implemented by the storage engine. MySQL's storage engines that support transactions include InnoDB, NDB Cluster, etc., among which InnoDB is the most widely used; other storage engines do not support transactions, such as MyIsam, Memory, etc.

Unless otherwise specified, the content described in the following article is based on InnoDB.

2. Submit and rollback

A typical MySQL transaction is operated as follows:

start transaction;
……  #一条或多条sql语句
commit;

where start transaction identifies the start of the transaction, commit commits the transaction, and writes the execution results to the database. If there is a problem with the execution of SQL statements, rollback will be called to roll back all SQL statements that have been successfully executed. Of course, you can also use the rollback statement directly in the transaction to roll back.

Autocommit

MySQL uses the autocommit mode by default, as shown below:

In auto-commit mode, if a transaction is not explicitly started by start transaction, then each SQL statement will be treated as a transaction to perform a commit operation.

You can turn off autocommit in the following ways; it should be noted that the autocommit parameters are specific to connections. Modifying parameters in one connection will not affect other connections.

If autocommit is turned off, all SQL statements are in one transaction until commit or rollback is executed, the transaction ends, and another transaction starts.

Special operations

In MySQL, there are some special commands. If these commands are executed in a transaction, commit will be forced to commit the transaction immediately; such as DDL statements (create table/drop table/alter/table), lock tables statements, etc.

However, the commonly used select, insert, update and delete commands will not force the transaction to be committed.

3. ACID characteristics

ACID is a measure of four characteristics of transactions:

  • Atomicity (or indivisibility)

  • Consistency

  • Isolation

  • Durability

According to strict standards, only transactions that meet the ACID characteristics at the same time are considered transactions; however, in the implementations of major database vendors, there are very few transactions that truly meet ACID. For example, MySQL's NDB Cluster transaction does not meet durability and isolation; InnoDB's default transaction isolation level is repeatable read, which does not meet isolation; Oracle's default transaction isolation level is READ COMMITTED, which does not meet isolation...So instead of It is said that ACID is a condition that a transaction must meet. Rather, it is better to say that they are the four dimensions of measuring a transaction.

The ACID characteristics and their implementation principles will be introduced in detail below; for ease of understanding, the order of introduction is not strictly A-C-I-D.

2. Atomicity

1. Definition

Atomicity means that a transaction is an indivisible unit of work, in which all operations are done or none are done. ; If the execution of a SQL statement in the transaction fails, the executed statement must also be rolled back, and the database returns to the state before the transaction.

2. Implementation principle: undo log

Before explaining the principle of atomicity, first introduce the MySQL transaction log. There are many types of MySQL logs, such as binary logs, error logs, query logs, slow query logs, etc. In addition, the InnoDB storage engine also provides two transaction logs: redo log (redo log) and undo log (rollback log). The redo log is used to ensure transaction durability; the undo log is the basis for transaction atomicity and isolation.

Let’s talk about undo log. The key to achieving atomicity is to be able to undo all successfully executed SQL statements when the transaction is rolled back. InnoDBRollback is achieved by relying on undo log: When a transaction modifies the database, InnoDB will generate the corresponding undo log ; If the transaction execution fails or rollback is called, causing the transaction to be rolled back, you can use the information in the undo log to roll back the data to the way it was before modification.

undo log is a logical log, which records information related to SQL execution. When a rollback occurs, InnoDB will do the opposite of the previous work based on the contents of the undo log: for each insert, a delete will be executed during rollback; for each delete, an insert will be executed during rollback; for each update, a delete will be executed during rollback. When rolling, a reverse update will be performed to change the data back.

Take the update operation as an example: when a transaction executes an update, the undo log generated will contain the primary key of the modified row (in order to know which rows have been modified), which columns have been modified, and the values ​​of these columns before and after the modification. Value and other information, you can use this information to restore the data to the state before the update when rolling back.

3. Durability

1. Definition

Persistence means that once a transaction is submitted, its changes to the database should be permanent. Subsequent operations or failures should not have any impact on it.

2. Implementation principle: redo log

Both redo log and undo log belong to InnoDB transaction log. Let’s first talk about the background of the existence of redo log.

InnoDB is the storage engine of MySQL, and data is stored on the disk. However, if disk IO is required every time to read and write data, the efficiency will be very low. To this end, InnoDB provides a cache (Buffer Pool). The Buffer Pool contains the mapping of some data pages on the disk and serves as a buffer for accessing the database: when reading data from the database, it will first be read from the Buffer Pool. If the Buffer Pool If it does not exist in the Pool, it will be read from the disk and put into the Buffer Pool; when writing data to the database, it will be written to the Buffer Pool first, and the modified data in the Buffer Pool will be regularly refreshed to the disk (this process is called dirty flushing ).

The use of Buffer Pool greatly improves the efficiency of reading and writing data, but it also brings new problems: if MySQL goes down and the modified data in the Buffer Pool has not been flushed to the disk, it will cause Data loss and transaction durability cannot be guaranteed.

So, redo log was introduced to solve this problem: when the data is modified, in addition to modifying the data in the Buffer Pool, the operation will also be recorded in the redo log; when the transaction is committed, the fsync interface will be called to Redo log is used to flush the disk. If MySQL goes down, you can read the data in the redo log and restore the database when it restarts. The redo log uses WAL (Write-ahead logging, write-ahead log). All modifications are first written to the log and then updated to the Buffer Pool, ensuring that the data will not be lost due to MySQL downtime, thereby meeting the durability requirements.

Since the redo log also needs to write the log to the disk when the transaction is committed, why is it faster than directly writing the modified data in the Buffer Pool to the disk (that is, brushing it dirty)? There are mainly two reasons:

(1) Dirty cleaning is random IO, because the data location modified each time is random, but writing redo log is an append operation and belongs to sequential IO.

(2) Dirty cleaning is based on data pages (Page). The default page size of MySQL is 16KB. A small modification on a Page requires the entire page to be written; and the redo log only contains what is really needed. In the writing part, invalid IO is greatly reduced.

3. redo log and binlog

We know that there is also a binlog (binary log) in MySQL that can also record write operations and be used for data recovery, but the two are fundamentally different. of:

(1) Different functions: redo log is used for crash recovery to ensure that MySQL downtime will not affect durability; binlog is used for point-in-time recovery to ensure that the server can be based on Point-in-time recovery of data, in addition binlog is also used for master-slave replication.

(2) Different levels: redo log is implemented by the InnoDB storage engine, while binlog is implemented by the MySQL server layer (please refer to the introduction to the MySQL logical architecture earlier in the article), and supports InnoDB and other storages at the same time. engine.

(3) The content is different: redo log is a physical log, and the content is based on the page of the disk; binlog is a logical log, and the content is one sql.

(4) The writing timing is different: binlog is written when the transaction is committed; the writing timing of the redo log is relatively diverse:

  • As mentioned earlier: when the transaction When committing, fsync will be called to flush the redo log; this is the default strategy. Modifying the innodb_flush_log_at_trx_commit parameter can change the strategy, but the durability of the transaction cannot be guaranteed.

  • 除了事务提交时,还有其他刷盘时机:如master thread每秒刷盘一次redo log等,这样的好处是不一定要等到commit时刷盘,commit速度大大加快。

四、隔离性

1. 定义

与原子性、持久性侧重于研究事务本身不同,隔离性研究的是不同事务之间的相互影响。隔离性是指,事务内部的操作与其他事务是隔离的,并发执行的各个事务之间不能互相干扰。严格的隔离性,对应了事务隔离级别中的Serializable (可串行化),但实际应用中出于性能方面的考虑很少会使用可串行化。

隔离性追求的是并发情形下事务之间互不干扰。简单起见,我们仅考虑最简单的读操作和写操作(暂时不考虑带锁读等特殊操作),那么隔离性的探讨,主要可以分为两个方面:

  • (一个事务)写操作对(另一个事务)写操作的影响:锁机制保证隔离性

  • (一个事务)写操作对(另一个事务)读操作的影响:MVCC保证隔离性

2. 锁机制

首先来看两个事务的写操作之间的相互影响。隔离性要求同一时刻只能有一个事务对数据进行写操作,InnoDB通过锁机制来保证这一点。

锁机制的基本原理可以概括为:事务在修改数据之前,需要先获得相应的锁;获得锁之后,事务便可以修改数据;该事务操作期间,这部分数据是锁定的,其他事务如果需要修改数据,需要等待当前事务提交或回滚后释放锁。

行锁与表锁

按照粒度,锁可以分为表锁、行锁以及其他位于二者之间的锁。表锁在操作数据时会锁定整张表,并发性能较差;行锁则只锁定需要操作的数据,并发性能好。但是由于加锁本身需要消耗资源(获得锁、检查锁、释放锁等都需要消耗资源),因此在锁定数据较多情况下使用表锁可以节省大量资源。MySQL中不同的存储引擎支持的锁是不一样的,例如MyIsam只支持表锁,而InnoDB同时支持表锁和行锁,且出于性能考虑,绝大多数情况下使用的都是行锁。

如何查看锁信息

有多种方法可以查看InnoDB中锁的情况,例如:

select * from information_schema.innodb_locks; #锁的概况
show engine innodb status; #InnoDB整体状态,其中包括锁的情况

下面来看一个例子:

#在事务A中执行:
start transaction;
update account SET balance = 1000 where id = 1;
#在事务B中执行:
start transaction;
update account SET balance = 2000 where id = 1;

此时查看锁的情况:

show engine innodb status查看锁相关的部分:

通过上述命令可以查看事务24052和24053占用锁的情况;其中lock_type为RECORD,代表锁为行锁(记录锁);lock_mode为X,代表排它锁(写锁)。

除了排它锁(写锁)之外,MySQL中还有共享锁(读锁)的概念。由于本文重点是MySQL事务的实现原理,因此对锁的介绍到此为止,后续会专门写文章分析MySQL中不同锁的区别、使用场景等,欢迎关注。

介绍完写操作之间的相互影响,下面讨论写操作对读操作的影响。

3. 脏读、不可重复读和幻读

首先来看并发情况下,读操作可能存在的三类问题:

(1)脏读:当前事务(A)中可以读到其他事务(B)未提交的数据(脏数据),这种现象是脏读。举例如下(以账户余额表为例):

(2)不可重复读:在事务A中先后两次读取同一个数据,两次读取的结果不一样,这种现象称为不可重复读。脏读与不可重复读的区别在于:前者读到的是其他事务未提交的数据,后者读到的是其他事务已提交的数据。举例如下:

(3)幻读:在事务A中按照某个条件先后两次查询数据库,两次查询结果的条数不同,这种现象称为幻读。不可重复读与幻读的区别可以通俗的理解为:前者是数据变了,后者是数据的行数变了。举例如下:

4. 事务隔离级别

SQL标准中定义了四种隔离级别,并规定了每种隔离级别下上述几个问题是否存在。一般来说,隔离级别越低,系统开销越低,可支持的并发越高,但隔离性也越差。隔离级别与读问题的关系如下:

In actual applications, read uncommitted will cause many problems during concurrency, but the performance improvement is very limited compared to other isolation levels, so it is less used. SerializableForcing transactions to be serialized, the concurrency efficiency is very low. It can only be used when data consistency requirements are extremely high and no concurrency is acceptable, so it is rarely used. Therefore, in most database systems, the default isolation level is Read Committed ( such as Oracle) or Repeatable Read (hereinafter referred to as RR).

You can view the global isolation level and the isolation level of this session through the following two commands:

InnoDB The default isolation level is RR, and we will focus on RR later. It should be noted that in the SQL standard, RR cannot avoid the phantom read problem, but the RR implemented by InnoDB avoids the phantom read problem.

5. MVCC

RR solves problems such as dirty reads, non-repeatable reads, phantom reads, etc., using MVCC: MVCC’s full name is Multi-Version Concurrency Control, which is a multi-version concurrency control protocol. The following example well reflects the characteristics of MVCC: at the same time, the data read by different transactions may be different (i.e., multiple versions) - at time T5, transaction A and transaction C can read different versions. The data.

#The biggest advantage of MVCC is that reading is not locked, so there is no conflict between reading and writing, and the concurrency performance is good. InnoDB implements MVCC, and multiple versions of data can coexist, mainly relying on the hidden columns of data (also called mark bits) and undo log. The hidden columns of the data include the version number of the row of data, the deletion time, the pointer to the undo log, etc.; when reading the data, MySQL can use the hidden columns to determine whether rollback is needed and find the undo log required for rollback. Thus MVCC is implemented; the detailed format of hidden columns is no longer expanded.

The following are explained separately in conjunction with the several issues mentioned above.

(1) Dirty read

When transaction A reads zhangsan’s balance at the T3 time node, it will be found that the data has been modified by other transactions, and The status is Uncommitted. At this time, after transaction A reads the latest data, it performs a rollback operation based on the undo log of the data and obtains the data before transaction B modified, thus avoiding dirty reads.

(2) Non-repeatable read

When transaction A reads data for the first time on the T2 node, the version number of the data (data The version number is recorded in row units), assuming the version number is 1; when transaction B commits, the version number recorded in this row increases, assuming the version number is 2; when transaction A reads data again at T5, It is found that the version number of the data (2) is greater than the version number (1) recorded when reading for the first time. Therefore, a rollback operation is performed based on the undo log to obtain the data when the version number is 1, thereby achieving repeatable reading.

(3) Phantom reading

The RR implemented by InnoDB avoids the phenomenon of phantom reading through the next-key lock mechanism.

next-key lock is a type of row lock, which is equivalent to record lock(record lock) gap lock( Gap lock); Its characteristic is that it not only locks the record itself (the function of record lock), but also locks a range (gap lockfunction). Of course, what we are talking about here is unlocked reading: the next-key lock at this time is not really locked, it just adds a mark to the read data (the mark content includes the version number of the data, etc.); accurate For the sake of convenience, let’s call it a next-key lock mechanism. Let’s use the previous example to illustrate:

When transaction A reads 0

6. Summary

In summary, the RR implemented by InnoDB achieves a certain degree of isolation through the lock mechanism, hidden columns of data, undo log and next-key lock class. Can meet the needs of most scenarios. However, it should be noted that although RR avoids the phantom read problem, it is not Serializable after all and cannot guarantee complete isolation. Here is an example, you can verify it yourself.

5. Consistency

1. Basic concept

Consistency means that after the transaction execution ends, the integrity constraints of the database are not destroyed, and the integrity constraints of the database are not destroyed before and after the transaction execution. Legal data status. Integrity constraints of the database include but are not limited to: entity integrity (such as the primary key of the row exists and is unique), column integrity (such as the type, size, and length of the field must meet the requirements), foreign key constraints, and user-defined Completeness (for example, the sum of the balances of the two accounts should remain unchanged before and after the transfer).

2. Implementation

It can be said that consistency is the ultimate goal pursued by transactions: the atomicity, persistence and isolation mentioned above are all to ensure the consistency of the database state. In addition, in addition to guarantees at the database level, the implementation of consistency also requires guarantees at the application level.

Measures to achieve consistency include:

  • Ensure atomicity, durability and isolation. If these characteristics cannot be guaranteed, transaction consistency cannot be guaranteed

  • The database itself provides guarantees, for example, it is not allowed to insert string values ​​into integer columns, the string length cannot exceed the column limit, etc.

  • Conducted at the application level Guarantee, for example, if the transfer operation only deducts the balance of the transferor and does not increase the balance of the recipient, no matter how perfect the database is implemented, the consistency of the status cannot be guaranteed

6. Summary

The following summarizes the ACID characteristics and their implementation principles:

  • Atomicity: Statements are either fully executed or not executed at all. This is the core characteristic of a transaction. The transaction itself is Defined by atomicity; the implementation is mainly based on undo log

  • Persistence: to ensure that data will not be lost due to downtime and other reasons after the transaction is submitted; the implementation is mainly based on redo log

  • Isolation: Ensure that transaction execution is not affected by other transactions as much as possible; InnoDB’s default isolation level is RR. The implementation of RR is mainly based on the lock mechanism, hidden columns of data, undo log and class next- key lock mechanism

  • Consistency: The ultimate goal pursued by transactions. The realization of consistency requires both database-level guarantees and application-level guarantees

The above is the detailed content of Detailed introduction to the implementation principle of ACID characteristics of MySQL transactions (picture and text). For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:cnblogs.com. If there is any infringement, please contact admin@php.cn delete