Transactions are an important feature that distinguishes MySQL from many NoSQL systems and a key technology for ensuring data consistency in relational databases. A transaction is a basic execution unit consisting of one or more SQL statements that operate on the database; when these statements are executed, either all of them take effect or none of them do.
The execution of a transaction ends with one of two operations, commit or rollback.
Commit: write the results of the transaction to the database permanently.
Rollback: undo all statements that have already been executed and restore the data to its state before the modification.
A minimal example is sketched below.
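Here the accounts table and its columns are invented purely for illustration; the point is that the two updates either both take effect or neither does:

```sql
-- Hypothetical table used only for this sketch.
START TRANSACTION;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

-- If both updates succeeded, make the changes permanent:
COMMIT;

-- If anything went wrong, undo every statement executed so far instead:
-- ROLLBACK;
```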
MySQL transactions have four characteristics, widely known as the ACID properties.
Atomicity: the statements in a transaction are either all executed or none of them are executed. This is the core property that defines a transaction; its implementation is based mainly on the undo log.
Durability: ensures that data will not be lost after a transaction is committed, even in the event of a crash or other failure; its implementation is based mainly on the redo log.
Isolation: ensures that the execution of a transaction is affected as little as possible by other transactions; InnoDB's default isolation level is repeatable read (RR), whose implementation is based mainly on the lock mechanism, hidden columns in the data rows, the undo log and the next-key lock mechanism.
Consistency: the ultimate goal that a transaction pursues; achieving consistency requires guarantees at both the database level and the application level.
Atomicity of transactions
Like an atomic operation, atomicity means that a transaction cannot be divided any further: either all of its operations are performed or none of them are. If one SQL statement in the transaction fails to execute, the statements that have already been executed must also be rolled back, and the database returns to the state it was in before the transaction. There is only 0 or 1, no value in between.
The atomicity of a transaction means that the transaction is a whole: when it cannot complete successfully, all statements that have already been executed within it must be rolled back, returning the database to the state it was in before the transaction started.
Atomicity is implemented through the undo log. When a transaction needs to be rolled back, the InnoDB engine uses the undo log to undo the executed SQL statements and roll the data back.
The durability of a transaction means that after the transaction is committed, its changes to the database are saved permanently rather than temporarily. In other words, once a transaction is committed, neither subsequent operations nor a system crash will affect its results.
The durability of transactions is implemented through the redo log in the InnoDB storage engine; the specific implementation is described below.
Atomicity and durability are properties of a single transaction itself, while isolation refers to the relationship that should be maintained between transactions: the effects of different transactions must be isolated from one another, and the operations of each transaction should not interfere with other transactions.
Since a transaction may contain more than one SQL statement, other transactions may begin executing while it is still in progress. Concurrent execution of multiple transactions therefore requires that their operations be isolated from one another, which is somewhat similar to data synchronization between multiple threads.
Lock Mechanism
Isolation between transactions is achieved through the lock mechanism. When a transaction needs to modify a row of data in the database, it must lock that data first; other transactions cannot operate on the locked data and can only wait until the current transaction commits or rolls back and releases the lock.
We use different kinds of locks to protect and synchronize data in many scenarios, so the lock mechanism is not an unfamiliar concept. In MySQL, locks can likewise be divided into different types according to different criteria.
By granularity: row locks, table locks, page locks
By usage: shared locks, exclusive locks
By locking strategy: pessimistic locks, optimistic locks
There is a lot to know about the locking mechanism; due to limited space I will not cover it all here, and will only give a brief introduction to locks divided by granularity.
Granularity: refers to the level of refinement or detail of the data stored in a data unit (a term borrowed from data warehousing). The higher the degree of refinement, the smaller the granularity; the lower the degree of refinement, the larger the granularity.
According to lock granularity, MySQL locks can be divided into row locks, table locks and page locks.
Row lock: the lock with the smallest granularity; only the row being operated on is locked.
Table lock: the lock with the largest granularity; the entire table involved in the operation is locked.
Page lock: a lock whose granularity lies between row locks and table locks; it locks a page.
(Figure: granularity levels of database locks)
These three types of locks lock data at different levels; because their granularity differs, the advantages and disadvantages they bring also differ.
A table lock locks the entire table while data is being operated on, so its concurrency is poor.
A row lock locks only the data that needs to be operated on, so its concurrency is good. However, since locking itself consumes resources (acquiring, checking and releasing locks all have a cost), using a table lock can save a lot of resources when a large amount of data would otherwise have to be locked.
Different MySQL storage engines support different locks: MyISAM supports only table locks, while InnoDB supports both table locks and row locks; for performance reasons, row locks are used in most cases.
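For reference, the statements below sketch the explicit locking syntax involved; the articles table is hypothetical, and FOR SHARE replaces LOCK IN SHARE MODE in MySQL 8.0+:

```sql
-- InnoDB row locks taken explicitly inside a transaction:
START TRANSACTION;
SELECT * FROM articles WHERE id = 1 LOCK IN SHARE MODE;  -- shared (read) lock
SELECT * FROM articles WHERE id = 1 FOR UPDATE;          -- exclusive (write) lock
COMMIT;

-- Explicit table locks, the only granularity MyISAM offers:
LOCK TABLES articles READ;   -- or WRITE for an exclusive table lock
UNLOCK TABLES;
```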
Concurrent reading and writing issues
Under concurrency, simultaneous reading and writing in MySQL may cause three kinds of problems: dirty reads, non-repeatable reads and phantom reads.
(1) Dirty read: the current transaction reads data that another transaction has not yet committed, i.e. dirty data.
In the example scenario above, when transaction A reads the article's view count, it obtains a value that transaction B has not yet committed. If transaction B ultimately fails to commit and is rolled back, the view count was never actually modified, yet transaction A has already read the modified value, which is obviously unreasonable.
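A rough way to reproduce a dirty read is sketched below; the articles table and views column are hypothetical, and the two sessions are indicated by comments:

```sql
-- Session A: allow reading uncommitted data
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
START TRANSACTION;

-- Session B: modify the row but do not commit yet
START TRANSACTION;
UPDATE articles SET views = views + 1 WHERE id = 1;

-- Session A: sees B's uncommitted change, i.e. a dirty read
SELECT views FROM articles WHERE id = 1;

-- Session B: roll back, so the value A read never really existed
ROLLBACK;
```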
(2) Non-repeatable read: the same data is read twice within transaction A, but the two reads return different results. The difference between a dirty read and a non-repeatable read is that the former reads data other transactions have not committed, while the latter reads data other transactions have already committed.
For example, if transaction A reads the article's view count twice in succession and gets different results, it means that during transaction A's execution the view count was modified by another transaction. This makes the query results unreliable, which is likewise unacceptable.
(3) Phantom read: within transaction A, two queries are executed with the same condition, but the number of rows in the results differs; this is a phantom read. Compared with a non-repeatable read, which reflects a change in the values of the data, a phantom read reflects a change in the number of rows.
(Figure: an example of a phantom read, in which the same query on articles returns different numbers of rows.)
Isolation Level
Based on the three problems above, four isolation levels have been defined, indicating the degree to which a database isolates transactions. In practical database design, the higher the isolation level, the lower the database's concurrency; if the isolation level is too low, all sorts of problems appear during reads and writes. In most database systems the default isolation level is therefore read committed (for example Oracle) or repeatable read, RR (MySQL's InnoDB engine).
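The current level can be inspected and changed with the statements below (the variable is named transaction_isolation in MySQL 5.7.20+ and 8.0; older versions use tx_isolation):

```sql
-- Check the isolation level of the current session
SELECT @@transaction_isolation;   -- e.g. REPEATABLE-READ

-- Change it for the current session only
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Change the server-wide default (affects new connections)
SET GLOBAL TRANSACTION ISOLATION LEVEL REPEATABLE READ;
```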
MVCC
Another big piece that is hard to chew through: MVCC is used to implement the third isolation level above, repeatable read (RR). Multi-Version Concurrency Control (MVCC) is a concurrency control protocol that supports multiple versions of the same data.
The characteristic of MVCC is that, at the same moment, different transactions can read different versions of the data, which solves the problems of dirty reads and non-repeatable reads. MVCC achieves the coexistence of multiple data versions through hidden columns in the rows and the rollback log (undo log). The benefit is that reading data under MVCC requires no locks, which avoids conflicts between simultaneous reads and writes.
To implement MVCC, several additional hidden columns are stored in each row, such as the version number at which the row was created, the deletion time, and a rollback pointer into the undo log. The version number here is not an actual time value but a system version number: every time a new transaction starts, the system version number is incremented. At the start of a transaction, the system version number is taken as the transaction's version number, to be compared with the version number of every row the transaction reads. Each transaction has its own version number, so data operations inside the transaction achieve version control through these comparisons.
In addition, the RR isolation level implemented by InnoDB can avoid the phenomenon of phantom reads, which is achieved through the next-key lock mechanism.
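The read-view behaviour that MVCC produces under RR can be observed with two sessions; the articles table is again hypothetical, and this only illustrates the visible effect, not the hidden columns themselves:

```sql
-- Session A (REPEATABLE READ, the InnoDB default)
START TRANSACTION;
SELECT views FROM articles WHERE id = 1;   -- suppose this returns 100

-- Session B: commit a change while A is still open
UPDATE articles SET views = 200 WHERE id = 1;
COMMIT;

-- Session A: still reads from the snapshot created at its first read
SELECT views FROM articles WHERE id = 1;   -- still returns 100
COMMIT;
```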
A next-key lock is essentially a kind of row lock, except that it locks not only the current index record itself but also a range. In the phantom read example above, when transaction A queries articles by a range condition on the view count, the next-key lock locks the matching records together with the gaps around them, so other transactions cannot insert new rows into that range.
Gap lock: a lock on the gap between index records.
Although InnoDB uses next-key locks to avoid phantom reads, this is not truly serializable isolation. Consider another example, and first a question: at time T6, after transaction A commits, what are the view counts of article A and article B? The answer is that the view counts of both articles have been changed to 10,000. This means that the commit of transaction B actually affected the execution of transaction A, which further shows that the two transactions are not completely independent. So although RR can avoid the phenomenon of phantom reads, it does not reach the serializable level.
Avoiding dirty reads, non-repeatable reads and phantom reads is a necessary condition for the serializable isolation level, but it is not a sufficient one: serializability avoids all three phenomena, but avoiding all three does not necessarily achieve serializability.
Consistency
Consistency means that after a transaction executes, the integrity constraints of the database are not violated and the data is in a legal state both before and after the transaction. Consistency is the ultimate goal a transaction pursues; atomicity, durability and isolation all exist to guarantee the consistency of the database state. I won't say more here; think it over carefully.
2. MySQL log system
Having studied the basic architecture of MySQL, we now have a fairly accurate picture of how MySQL executes statements. Next, let's look at the logging system. The MySQL log system is an important component of the database, used to record updates and modifications to the database. If the database fails, the original data can be restored from the various log records, so the log system directly determines how robustly MySQL runs. MySQL has many kinds of logs, such as the binary log (binlog), the error log, the query log and the slow query log. In addition, the InnoDB storage engine provides two more: the redo log and the undo log (rollback log). Here we focus on the InnoDB engine and analyse three of them: the redo log, the undo log and the binary log.
Redo log
The redo log is a log at the InnoDB engine layer. It records the changes made to data by transaction operations, that is, physical modifications to data pages.
The purpose of the redo log is easy to understand with an analogy. Modifying data in the database is like writing a paper: what if the paper is lost one day? To guard against that misfortune, while writing the paper we can keep a small notebook and record every modification in it, noting when and what kind of change was made on which page. That notebook is the redo log.
When InnoDB updates data, it first writes the update record to the redo log, and then flushes the changes to disk when the system is idle or according to the configured flush strategy. This is the so-called write-ahead logging (WAL) technique, which greatly reduces the frequency of I/O operations and improves the efficiency of flushing data.
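The flush strategy mentioned above is controlled by server variables; they can be inspected as shown below (names as in MySQL 5.7/8.0, where a value of 1 for innodb_flush_log_at_trx_commit means the redo log is flushed and fsynced at every commit):

```sql
-- Redo log file size and number of files
SHOW VARIABLES LIKE 'innodb_log_file%';

-- How aggressively the redo log buffer is flushed to disk
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
```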
Flushing dirty data
It is worth noting that the size of the redo log is fixed. To keep writing update records continuously, two marker positions are maintained in the redo log, checkpoint and write_pos, which indicate the position up to which records have been erased and the position up to which records have been written, respectively. The layout of redo log writes can be seen in the figure below. When write_pos reaches the end of the log, it jumps back to the head and continues writing, so the logical structure of the redo log is not linear but circular. The space between write_pos and checkpoint can be used for new records; writing and erasing chase each other around the circle. When write_pos catches up with checkpoint, the redo log is full: new database update statements cannot be executed, and MySQL must stop, erase some records first, and execute the checkpoint rule to free writable space.
Checkpoint rule: when a checkpoint is triggered, both dirty data pages and dirty log pages in the buffer are flushed to disk.
Dirty data: data in memory that has not yet been flushed to disk.
The most important related concept is the buffer pool, an area allocated in memory that holds a mapping of some of the data pages on disk and serves as a buffer for database access. When a read request arrives, MySQL first checks whether it hits the buffer pool; on a miss, the page is fetched from disk and placed into the buffer pool. When a write request arrives, it is written to the buffer pool first, and the modified data in the buffer pool is periodically flushed to disk; this process is called flushing dirty pages.
So when data is modified, besides changing the data in the buffer pool, the operation is also recorded in the redo log; when the transaction commits, data is flushed to disk based on the redo log records. If MySQL crashes, the redo log can be read on restart and the database restored, which guarantees the durability of transactions and gives the database crash-safe capability.
Flushing dirty logs
Besides the flushing of dirty data described above, when redo log records are written, the log records themselves must also go through a process of being written from memory to disk to guarantee the persistence of the log file. The redo log consists of two parts: a cache in volatile memory, the redo log buffer, and the redo log files saved on disk. To ensure that every record reaches the log on disk, the operating system's fsync call is issued each time the contents of the redo log buffer are written to the redo log file.
fsync: declared in the UNIX header file unistd.h; writes also pass through the operating system's kernel-space cache before reaching disk. The writing process of the redo log can be seen in the figure below.
(Figure: redo log flushing process)
Binary log (binlog)
The binary log, binlog, is a server-layer log, also called the archive log. The binlog records all update operations in the database: every operation that changes data must be recorded in the binary log. The binlog therefore makes it easy to replicate and back up data, so it is often used for synchronizing slave databases with the master.
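A few standard statements for inspecting the binlog are shown below; whether binary logging is enabled and what the files are called depend on the server configuration, so the file name here is only an example:

```sql
-- Is binary logging enabled, and in which format (STATEMENT, ROW or MIXED)?
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';

-- List the binlog files the server currently keeps
SHOW BINARY LOGS;

-- Inspect the events recorded in one of them
SHOW BINLOG EVENTS IN 'binlog.000001' LIMIT 10;
```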
Although the contents stored in the binlog and the redo log look similar, they are actually different. The redo log is a physical log: it records the actual modifications made to specific data pages. The binlog is a logical log: it records the original logic of the SQL statement, for example "add 1 to the a field of the row with ID = 2". The content of the binlog is binary; depending on the log format parameter, it may record SQL statements, the row data itself, or a mixture of the two; recording SQL statements is the commonly used mode.
My personal understanding of "physical" versus "logical" here is:
A physical log can be regarded as the change information on the data pages of the actual database; only the result matters, not which path led to it.
A logical log can be regarded as the change in data produced by some method or operation; it stores the logical operation itself.
At the same time, the redo log is used for crash recovery, ensuring that data can be recovered after MySQL crashes, while the binlog is used for point-in-time recovery, ensuring that the server can restore data to a given point in time or make backups of the data.
In fact, MySQL did not have a redo log at the beginning, because at first MySQL did not include the InnoDB engine; its built-in engine was MyISAM. The binlog is a server-layer log, so all engines can use it, but the binlog alone only provides archiving and cannot provide crash-safe capability. The InnoDB engine therefore adopted a technique learned from Oracle, the redo log, to achieve crash safety.
To summarize the comparison: the redo log is a physical log at the InnoDB engine layer, written in a fixed-size circular fashion, while the binlog is a logical log at the server layer that all engines can use.
When a transaction is committed, InnoDB first writes the redo log in the prepare state, then writes the binlog, and finally marks the redo log as committed; this is the so-called two-phase commit. There is a good reason for arranging the commit in two phases. Suppose instead we used a "single-phase" commit: either write the redo log first and then the binlog, or write the binlog first and then the redo log. Either order can leave the state of the original database inconsistent with the state of a database restored from the logs.
Write the redo log first, then the binlog: once the redo log is written, the data is crash-safe, so after a crash the update survives in the original database. But if the system crashes after the redo log is written and before the binlog is written, the binlog does not contain the update statement, so the statement is missing when the binlog is used to back up or restore the database, and the row with id = 2 is not updated there.
Write the binlog first, then the redo log: once the binlog is written, the statement is saved, so a database copied or restored from the binlog will have the row with id = 2 updated to a = 1. But if the system crashes before the redo log is written, the transaction is invalid in the original database, and the row with id = 2 in the actual database is not updated.
It can be seen that two-phase commit exists to avoid these problems and keep the information stored in the binlog and the redo log consistent.
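Whether each of the two logs is actually forced to disk at commit time is controlled by two server variables; the common "double 1" configuration sketched below trades some performance for maximum safety (variable names as in MySQL 5.7/8.0):

```sql
-- Flush and fsync the redo log at every transaction commit
SET GLOBAL innodb_flush_log_at_trx_commit = 1;

-- fsync the binlog at every transaction commit
SET GLOBAL sync_binlog = 1;

-- Verify the current settings
SHOW VARIABLES WHERE Variable_name IN ('innodb_flush_log_at_trx_commit', 'sync_binlog');
```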
Rollback log (undo log)
The InnoDB engine also provides a rollback log, the undo log, which is used for rolling data back.
When a transaction modifies the database, the InnoDB engine not only records the redo log but also generates a corresponding undo log. If the transaction fails to execute or rollback is called, causing the transaction to be rolled back, the information in the undo log is used to roll the data back to what it looked like before the modification.
Unlike the redo log, the undo log is a logical log: it records information related to the execution of SQL statements. When a rollback occurs, the InnoDB engine does the opposite of the earlier work according to the records in the undo log: for every insert, a delete is performed during rollback; for every delete, an insert is performed; for every update, a reverse update is performed to change the data back.
The undo log serves two purposes: providing rollback, and implementing MVCC.
3. Master-slave replication
The concept of master-slave replication is simple: replicate an identical database from the original one. The original database is called the master and the replicated database is called the slave; the slave synchronizes data with the master to keep the two consistent.
Master-slave replication is implemented through the binlog. The binlog stores all the SQL statements that change data in the database, so by copying the SQL statements from the master's binlog and executing them, the slave can be kept in sync with the master. The replication process can be seen in the figure below.
Replication is carried out mainly by three threads. A dump (sending) thread runs on the master server and sends binlog content to the slave. Two more threads run on the slave, an I/O thread and a SQL thread: the I/O thread reads the binlog content sent by the master and copies it into the local relay log, while the SQL thread reads the data-changing SQL statements from the relay log and executes them, keeping the data on the master and the slave consistent.
The need for master-slave replication is determined by real application scenarios. The benefits it brings include:
1. Off-site backup of data through replication: when the master database fails, the system can switch to the slave database, avoiding data loss.
2. A scalable architecture: when business volume grows and the I/O access frequency is too high, multiple databases can share the storage load, reducing the frequency of disk I/O and improving the I/O performance of a single machine.
3. Read/write splitting, so that the database can support greater concurrency.
4. Server load balancing, by dividing client queries between the master and slave servers.
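A minimal sketch of attaching a replica to a master follows; the host, user, password and binlog coordinates are placeholders, and the statements shown use the MySQL 8.0.22+ spelling (older versions use CHANGE MASTER TO, START SLAVE and SHOW SLAVE STATUS):

```sql
-- Run on the replica (values are placeholders, not real credentials)
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'master.example.com',
  SOURCE_USER = 'repl',
  SOURCE_PASSWORD = '***',
  SOURCE_LOG_FILE = 'binlog.000001',
  SOURCE_LOG_POS = 4;

START REPLICA;

-- Check that the I/O thread and SQL thread described above are running
SHOW REPLICA STATUS\G
```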