Home >Database >Mysql Tutorial >What is transaction in mysql

What is transaction in mysql

青灯夜游
青灯夜游Original
2022-02-17 16:17:4111840browse

In MySQL, a transaction is a mechanism, a sequence of operations, and a program execution unit for accessing and updating the database. A transaction contains one or more database operation commands, and all commands will be submitted or revoked to the system as a whole. That is, this group of database commands will either be executed or none of them will be executed.

What is transaction in mysql

The operating environment of this tutorial: windows7 system, mysql5.6 version, Dell G3 computer.

Database transaction (Transaction) is a mechanism, an operation sequence, and a program execution unit for accessing and updating the database. It includes a set of database operation commands.

A transaction submits or revokes an operation request to the system together with all commands as a whole, that is, this group of database commands are either executed or not executed, so the transaction is an indivisible logical unit of work.

When performing concurrent operations on a database system, transactions are used as the smallest control unit, which is especially suitable for database systems operated by multiple users at the same time.

As a relational database, MySQL supports transactions. This article is based on MySQL5.6.

First review the basics of MySQL transactions.

1. Logical architecture and storage engine

Image source: https://blog.csdn .net/fuzhongmin05/article/details/70904190

As shown in the figure above, the MySQL server logical architecture can be divided into three layers from top to bottom:

(1) The first layer: processing customers terminal connection, authorization authentication, etc.

(2) Second layer: Server layer, responsible for parsing, optimizing, caching of query statements, implementation of built-in functions, stored procedures, etc.

(3) The third layer: storage engine, responsible for the storage and retrieval of data in MySQL. MySQLThe server layer does not manage transactions, and transactions are implemented by the storage engine. MySQL's storage engines that support transactions include InnoDB, NDB Cluster, etc., among which InnoDB is the most widely used; other storage engines do not support transactions, such as MyIsam, Memory, etc.

Unless otherwise specified, the content described in the following article is based on InnoDB.

2. Submit and rollback

A typical MySQL transaction is operated as follows:

start transaction;
……  #一条或多条sql语句
commit;

where start transaction identifies The transaction starts, commit commits the transaction, and writes the execution results to the database. If there is a problem with the execution of SQL statements, rollback will be called to roll back all SQL statements that have been successfully executed. Of course, you can also use the rollback statement directly in the transaction to roll back.

Autocommit

MySQL uses the autocommit mode by default, as shown below:

In auto-commit mode, if a transaction is not explicitly started by start transaction, then each SQL statement will be treated as a transaction to perform a commit operation.

You can turn off autocommit in the following ways; it should be noted that the autocommit parameters are specific to connections. Modifying parameters in one connection will not affect other connections.

If autocommit is turned off, all SQL statements are in one transaction until commit or rollback is executed, the transaction ends, and another transaction starts.

Special operations

In MySQL, there are some special commands. If these commands are executed in a transaction, commit will be forced to commit the transaction immediately; such as DDL statements (create table/drop table/alter/table), lock tables statements, etc.

However, the commonly used select, insert, update and delete commands will not force the transaction to be committed.

3. ACID characteristics

ACID is a measure of the four characteristics of a transaction:

  • Atomicity , or indivisibility)
  • Consistency
  • Isolation
  • Durability

According to According to strict standards, only transactions that meet the ACID characteristics at the same time are considered transactions; however, in the implementations of major database vendors, there are very few transactions that truly meet ACID. For example, MySQL's NDB Cluster transaction does not meet durability and isolation; InnoDB's default transaction isolation level is repeatable read, which does not meet isolation; Oracle's default transaction isolation level is READ COMMITTED, which does not meet isolation...So instead of It is said that ACID is a condition that a transaction must meet. Rather, it is better to say that they are the four dimensions of measuring a transaction.

The ACID characteristics and their implementation principles will be introduced in detail below; for ease of understanding, the order of introduction is not strictly A-C-I-D.

Atomicity

1. Definition

Atomicity means that a transaction is an indivisible unit of work, in which all operations are done or none are done; if a SQL statement in the transaction fails to execute, the executed statement must also be rolled back and the database returns. to the state before the transaction.

2. Implementation principle: undo log

Before explaining the principle of atomicity, first introduce the MySQL transaction log. There are many types of MySQL logs, such as binary logs, error logs, query logs, slow query logs, etc. In addition, the InnoDB storage engine also provides two transaction logs: redo log (redo log) and undo log (rollback log). The redo log is used to ensure transaction durability; the undo log is the basis for transaction atomicity and isolation.

Let’s talk about undo log. The key to achieving atomicity is to be able to undo all successfully executed SQL statements when the transaction is rolled back. InnoDBRollback is achieved by relying on undo log: When a transaction modifies the database, InnoDB will generate the corresponding undo log ; If the transaction execution fails or rollback is called, causing the transaction to be rolled back, you can use the information in the undo log to roll back the data to the way it was before modification.

undo log is a logical log, which records information related to SQL execution. When a rollback occurs, InnoDB will do the opposite of the previous work based on the contents of the undo log: for each insert, a delete will be executed during rollback; for each delete, an insert will be executed during rollback; for each update, a delete will be executed during rollback. When rolling, a reverse update will be performed to change the data back.

Take the update operation as an example: when a transaction executes an update, the undo log generated will contain the primary key of the modified row (in order to know which rows have been modified), which columns have been modified, and the values ​​of these columns before and after the modification. Value and other information, you can use this information to restore the data to the state before the update when rolling back.

Persistence

1. Definition

Persistence means that once a transaction is committed, its changes to the database should be is permanent. Subsequent operations or failures should not have any impact on it.

2. Implementation principle: redo log

Both redo log and undo log belong to the InnoDB transaction log. Let’s first talk about the background of the existence of redo log.

InnoDB is the storage engine of MySQL, and data is stored on the disk. However, if disk IO is required every time to read and write data, the efficiency will be very low. To this end, InnoDB provides a cache (Buffer Pool). The Buffer Pool contains the mapping of some data pages on the disk and serves as a buffer for accessing the database: when reading data from the database, it will first be read from the Buffer Pool. If the Buffer Pool If it does not exist in the Pool, it will be read from the disk and put into the Buffer Pool; when writing data to the database, it will be written to the Buffer Pool first, and the modified data in the Buffer Pool will be regularly refreshed to the disk (this process is called dirty flushing ).

The use of Buffer Pool greatly improves the efficiency of reading and writing data, but it also brings new problems: if MySQL goes down and the modified data in the Buffer Pool has not been flushed to the disk, it will cause Data loss and transaction durability cannot be guaranteed.

So, redo log was introduced to solve this problem: when the data is modified, in addition to modifying the data in the Buffer Pool, the operation will also be recorded in the redo log; when the transaction is committed, the fsync interface will be called to Redo log is used to flush the disk. If MySQL goes down, you can read the data in the redo log and restore the database when it restarts. The redo log uses WAL (Write-ahead logging, write-ahead log). All modifications are first written to the log and then updated to the Buffer Pool, ensuring that the data will not be lost due to MySQL downtime, thereby meeting the durability requirements.

Since the redo log also needs to write the log to the disk when the transaction is committed, why is it faster than directly writing the modified data in the Buffer Pool to the disk (that is, brushing it dirty)? There are mainly two reasons:

(1) Dirty cleaning is random IO, because the data location modified each time is random, but writing redo log is an append operation and belongs to sequential IO.

(2) Dirty cleaning is based on data pages (Page). The default page size of MySQL is 16KB. A small modification on a Page requires the entire page to be written; and the redo log only contains what is really needed. In the writing part, invalid IO is greatly reduced.

3. redo log and binlog

We know that there is also a binlog (binary log) in MySQL that can also record write operations and be used for Data recovery, but the two are fundamentally different:

(1) Different functions: redo log is used for crash recovery to ensure that MySQL downtime will not affect durability; binlog is used for crash recovery Point-in-time recovery ensures that the server can recover data based on time points. In addition, binlog is also used for master-slave replication.

(2) Different levels: redo log is implemented by the InnoDB storage engine, while binlog is implemented by the MySQL server layer (please refer to the introduction to the MySQL logical architecture earlier in the article), and supports InnoDB and other storages at the same time. engine.

(3)内容不同:redo log是物理日志,内容基于磁盘的Page;binlog的内容是二进制的,根据binlog_format参数的不同,可能基于sql语句、基于数据本身或者二者的混合。

(4)写入时机不同:binlog在事务提交时写入;redo log的写入时机相对多元:

  • 前面曾提到:当事务提交时会调用fsync对redo log进行刷盘;这是默认情况下的策略,修改innodb_flush_log_at_trx_commit参数可以改变该策略,但事务的持久性将无法保证。
  • 除了事务提交时,还有其他刷盘时机:如master thread每秒刷盘一次redo log等,这样的好处是不一定要等到commit时刷盘,commit速度大大加快。

四、隔离性

1. 定义

与原子性、持久性侧重于研究事务本身不同,隔离性研究的是不同事务之间的相互影响。隔离性是指,事务内部的操作与其他事务是隔离的,并发执行的各个事务之间不能互相干扰。严格的隔离性,对应了事务隔离级别中的Serializable (可串行化),但实际应用中出于性能方面的考虑很少会使用可串行化。

隔离性追求的是并发情形下事务之间互不干扰。简单起见,我们主要考虑最简单的读操作和写操作(加锁读等特殊读操作会特殊说明),那么隔离性的探讨,主要可以分为两个方面:

  • (一个事务)写操作对(另一个事务)写操作的影响:锁机制保证隔离性
  • (一个事务)写操作对(另一个事务)读操作的影响:MVCC保证隔离性

2. 锁机制

首先来看两个事务的写操作之间的相互影响。隔离性要求同一时刻只能有一个事务对数据进行写操作,InnoDB通过锁机制来保证这一点。

锁机制的基本原理可以概括为:事务在修改数据之前,需要先获得相应的锁;获得锁之后,事务便可以修改数据;该事务操作期间,这部分数据是锁定的,其他事务如果需要修改数据,需要等待当前事务提交或回滚后释放锁。

行锁与表锁

按照粒度,锁可以分为表锁、行锁以及其他位于二者之间的锁。表锁在操作数据时会锁定整张表,并发性能较差;行锁则只锁定需要操作的数据,并发性能好。但是由于加锁本身需要消耗资源(获得锁、检查锁、释放锁等都需要消耗资源),因此在锁定数据较多情况下使用表锁可以节省大量资源。MySQL中不同的存储引擎支持的锁是不一样的,例如MyIsam只支持表锁,而InnoDB同时支持表锁和行锁,且出于性能考虑,绝大多数情况下使用的都是行锁。

如何查看锁信息

有多种方法可以查看InnoDB中锁的情况,例如:

select * from information_schema.innodb_locks; #锁的概况
show engine innodb status; #InnoDB整体状态,其中包括锁的情况

下面来看一个例子:

#在事务A中执行:
start transaction;
update account SET balance = 1000 where id = 1;
#在事务B中执行:
start transaction;
update account SET balance = 2000 where id = 1;

此时查看锁的情况:

show engine innodb status查看锁相关的部分:

通过上述命令可以查看事务24052和24053占用锁的情况;其中lock_type为RECORD,代表锁为行锁(记录锁);lock_mode为X,代表排它锁(写锁)。

除了排它锁(写锁)之外,MySQL中还有共享锁(读锁)的概念。由于本文重点是MySQL事务的实现原理,因此对锁的介绍到此为止,后续会专门写文章分析MySQL中不同锁的区别、使用场景等,欢迎关注。

介绍完写操作之间的相互影响,下面讨论写操作对读操作的影响。

3. 脏读、不可重复读和幻读

首先来看并发情况下,读操作可能存在的三类问题:

(1)脏读:当前事务(A)中可以读到其他事务(B)未提交的数据(脏数据),这种现象是脏读。举例如下(以账户余额表为例):

(2)不可重复读:在事务A中先后两次读取同一个数据,两次读取的结果不一样,这种现象称为不可重复读。脏读与不可重复读的区别在于:前者读到的是其他事务未提交的数据,后者读到的是其他事务已提交的数据。举例如下:

(3) Phantom reading: In transaction A, the database is queried twice according to a certain condition, and the number of results of the two queries is different. This phenomenon is called phantom reading. The difference between non-repeatable reading and phantom reading can be easily understood as: the former means that the data has changed, and the latter means that the number of rows of data has changed. For example:

4. Transaction isolation level

There are four isolation levels defined in the SQL standard , and stipulates whether the above problems exist under each isolation level. Generally speaking, the lower the isolation level, the lower the system overhead, the higher the concurrency that can be supported, but the worse the isolation. The relationship between isolation level and read problems is as follows:

In actual applications, read uncommitted will cause many problems during concurrency, and performance is relatively Other isolation level improvements are limited and therefore less commonly used. SerializableForcing transactions to be serialized, the concurrency efficiency is very low. It can only be used when data consistency requirements are extremely high and no concurrency is acceptable, so it is rarely used. Therefore, in most database systems, the default isolation level is Read Committed ( such as Oracle) or Repeatable Read (hereinafter referred to as RR).

You can view the global isolation level and the isolation level of this session through the following two commands:

InnoDB The default isolation level is RR, and we will focus on RR later. It should be noted that in the SQL standard, RR cannot avoid the phantom read problem, but the RR implemented by InnoDB avoids the phantom read problem.

5. MVCC

RR solves problems such as dirty reads, non-repeatable reads, phantom reads, etc., using MVCC: MVCC’s full name is Multi- Version Concurrency Control, a multi-version concurrency control protocol. The following example well reflects the characteristics of MVCC: at the same time, the data read by different transactions may be different (i.e., multiple versions) - at time T5, transaction A and transaction C can read different versions. The data.

#The biggest advantage of MVCC is that reading is not locked, so there is no conflict between reading and writing, and the concurrency performance is good. InnoDB implements MVCC, and multiple versions of data can coexist. It is mainly based on the following technologies and data structures:

1) Hidden columns: Each row of data in InnoDB has hidden columns, and the hidden columns contain the data of this row. Transaction id, pointer to undo log, etc.

2) Version chain based on undo log: As mentioned earlier, the hidden column of each row of data contains a pointer to the undo log, and each undo log will also point to an earlier version of the undo log, thus forming A version chain.

3) ReadView: By hiding columns and version chains, MySQL can restore data to a specified version; but which version to restore to specifically needs to be determined based on ReadView. The so-called ReadView means that a transaction (recorded as transaction A) takes a snapshot of the entire transaction system (trx_sys) at a certain moment. When a read operation is performed later, the transaction ID in the read data will be compared with the trx_sys snapshot, thus Determine whether the data is visible to the ReadView, that is, whether it is visible to transaction A.

The main content in trx_sys and the method of judging visibility are as follows:

  • low_limit_id: Indicates the id that should be assigned to the next transaction in the system when generating ReadView. If the transaction ID of the data is greater than or equal to low_limit_id, it will not be visible to the ReadView.
  • up_limit_id: Indicates the smallest transaction ID among the active read and write transactions in the current system when the ReadView is generated. If the transaction id of the data is less than up_limit_id, it is visible to the ReadView.
  • rw_trx_ids: Represents the transaction ID list of active read and write transactions in the current system when the ReadView is generated. If the transaction id of the data is between low_limit_id and up_limit_id, you need to determine whether the transaction id is in rw_trx_ids: if it is, it means that the transaction is still active when the ReadView is generated, so the data is not visible to the ReadView; if not, it means that the transaction is when the ReadView is generated. It has been submitted, so the data is visible to ReadView.

The following takes the RR isolation level as an example and explains it separately based on the several issues mentioned above.

(1)Dirty read

当事务A在T3时刻读取zhangsan的余额前,会生成ReadView,由于此时事务B没有提交仍然活跃,因此其事务id一定在ReadView的rw_trx_ids中,因此根据前面介绍的规则,事务B的修改对ReadView不可见。接下来,事务A根据指针指向的undo log查询上一版本的数据,得到zhangsan的余额为100。这样事务A就避免了脏读。

(2)不可重复读

当事务A在T2时刻读取zhangsan的余额前,会生成ReadView。此时事务B分两种情况讨论,一种是如图中所示,事务已经开始但没有提交,此时其事务id在ReadView的rw_trx_ids中;一种是事务B还没有开始,此时其事务id大于等于ReadView的low_limit_id。无论是哪种情况,根据前面介绍的规则,事务B的修改对ReadView都不可见。

当事务A在T5时刻再次读取zhangsan的余额时,会根据T2时刻生成的ReadView对数据的可见性进行判断,从而判断出事务B的修改不可见;因此事务A根据指针指向的undo log查询上一版本的数据,得到zhangsan的余额为100,从而避免了不可重复读。

(3)幻读

MVCC避免幻读的机制与避免不可重复读非常类似。

当事务A在T2时刻读取0

当事务A在T5时刻再次读取0

扩展

前面介绍的MVCC,是RR隔离级别下“非加锁读”实现隔离性的方式。下面是一些简单的扩展。

(1)读已提交(RC)隔离级别下的非加锁读

RC与RR一样,都使用了MVCC,其主要区别在于:

RR是在事务开始后第一次执行select前创建ReadView,直到事务提交都不会再创建。根据前面的介绍,RR可以避免脏读、不可重复读和幻读。

RC每次执行select前都会重新建立一个新的ReadView,因此如果事务A第一次select之后,事务B对数据进行了修改并提交,那么事务A第二次select时会重新建立新的ReadView,因此事务B的修改对事务A是可见的。因此RC隔离级别可以避免脏读,但是无法避免不可重复读和幻读。

(2)加锁读与next-key lock

按照是否加锁,MySQL的读可以分为两种:

一种是非加锁读,也称作快照读、一致性读,使用普通的select语句,这种情况下使用MVCC避免了脏读、不可重复读、幻读,保证了隔离性。

另一种是加锁读,查询语句有所不同,如下所示:

#共享锁读取
select...lock in share mode
#排它锁读取
select...for update

加锁读在查询时会对查询的数据加锁(共享锁或排它锁)。由于锁的特性,当某事务对数据进行加锁读后,其他事务无法对数据进行写操作,因此可以避免脏读和不可重复读。而避免幻读,则需要通过next-key lock。next-key lock是行锁的一种,实现相当于record lock(记录锁) + gap lock(间隙锁);其特点是不仅会锁住记录本身(record lock的功能),还会锁定一个范围(gap lock的功能)因此,加锁读同样可以避免脏读、不可重复读和幻读,保证隔离性。

6. 总结

In summary, the RR implemented by InnoDB achieves a certain degree of isolation through the lock mechanism (including next-key lock), MVCC (including hidden columns of data, version chain based on undo log, ReadView), etc. Can meet the needs of most scenarios.

However, it should be noted that although RR avoids the phantom read problem, it is not Serializable after all and cannot guarantee complete isolation. The following are two examples:

The first example, if The first read in the transaction uses non-locked read, and the second read uses locked read. If the data changes between the two reads, the results of the two reads are different because of the lock. MVCC will not be used when reading.

The second example is as shown below, you can verify it yourself.

Consistency

1. Basic concept

Consistency refers to transactions After the execution, the integrity constraints of the database have not been destroyed, and the data state before and after the transaction is executed is legal. Integrity constraints of the database include but are not limited to: entity integrity (such as the primary key of the row exists and is unique), column integrity (such as the type, size, and length of the field must meet the requirements), foreign key constraints, and user-defined Completeness (for example, the sum of the balances of the two accounts should remain unchanged before and after the transfer).

2. Implementation

It can be said that consistency is the ultimate goal pursued by transactions: the atomicity, durability and isolation mentioned earlier sex, all to ensure the consistency of the database state. In addition, in addition to guarantees at the database level, the implementation of consistency also requires guarantees at the application level.

Measures to achieve consistency include:

  • Ensure atomicity, durability and isolation. If these characteristics cannot be guaranteed, transaction consistency cannot be guaranteed either
  • The database itself provides guarantees, for example, it is not allowed to insert string values ​​into integer columns, the string length cannot exceed the column limit, etc.
  • Guarantees are provided at the application level, for example, if the transfer operation only deducts the balance of the transferor, but not Increasing the recipient's balance, no matter how perfect the database is implemented, the state cannot be guaranteed to be consistent

Summary

The following summarizes the ACID characteristics and their implementation principles:

  • Atomicity: Statements are either fully executed or not executed at all. This is the core feature of a transaction. The transaction itself is defined by atomicity; the implementation is mainly based on undo log
  • Persistence: guaranteed transaction After submission, data will not be lost due to downtime and other reasons; the implementation is mainly based on redo log
  • Isolation: ensure that transaction execution is not affected by other transactions as much as possible; InnoDB's default isolation level is RR, and the implementation of RR is mainly Based on lock mechanism (including next-key lock), MVCC (including hidden columns of data, version chain based on undo log, ReadView)
  • Consistency: The ultimate goal pursued by transactions, the realization of consistency requires both database Protection at the application level also requires protection at the application level

The above is the detailed content of What is transaction in mysql. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn