Transactions in DB Management System (DBMS)
Definition:
A transaction in a database management system (DBMS) is a sequence of operations that are performed as a single logical unit of work. A transaction can involve reading, inserting, updating, or deleting data in the database. The key characteristic of a transaction is that it is atomic, meaning that all operations within a transaction are completed successfully, or none of them are applied to the database at all.
Key Properties of Transactions: ACID Properties
Transactions must adhere to the ACID properties to ensure database consistency and reliability. These properties are:
-
Atomicity:
- Atomicity ensures that a transaction is treated as a single, indivisible unit of work. Either all operations within the transaction are executed, or none are. If any part of the transaction fails, the entire transaction is rolled back, ensuring the database remains in a consistent state.
-
Example: If a transaction involves transferring money from Account A to Account B, atomicity ensures that either the entire transfer is completed (money is deducted from A and added to B) or nothing is done (no money is transferred if there’s a failure).
-
Consistency:
- Consistency ensures that a transaction moves the database from one valid state to another. Any transaction should not violate the database's integrity constraints (such as primary keys, foreign keys, etc.). After the transaction is committed, the database should always be in a consistent state.
-
Example: After transferring money between two bank accounts, the sum of the balances in both accounts should remain constant. If the database consistency rule is violated, the transaction will be rolled back.
-
Isolation:
- Isolation ensures that the operations of a transaction are hidden from other transactions while it is being executed. Even if multiple transactions are happening concurrently, each transaction should be unaware of the others' operations. This prevents dirty reads, non-repeatable reads, and phantom reads.
-
Example: If two users are transferring money from the same bank account simultaneously, isolation ensures that one transaction does not interfere with the other, preventing errors like double withdrawal.
-
Durability:
- Durability guarantees that once a transaction has been committed, its changes are permanent, even in the case of a system crash. After a successful commit, the changes made by the transaction are saved to the database, and they are not lost.
-
Example: If a user has successfully placed an order, the changes in the database (such as stock update and order placement) should persist, even if there’s a sudden power failure right after the commit.
Types of Transactions
-
Simple Transactions:
- These involve a single operation like reading or writing data. For example, a simple read query or an update query could be part of a transaction.
-
Complex Transactions:
- These involve multiple operations, including reads, updates, inserts, and deletes. A complex transaction can be a sequence of operations designed to complete a business process, such as transferring money from one account to another, which involves checking account balances, deducting from one, and adding to the other.
Transaction Lifecycle
A typical transaction follows these steps:
-
Begin Transaction:
- This marks the start of a transaction. It indicates that a new transaction is about to be performed.
-
Perform Operations:
- The transaction performs a series of database operations, such as selecting, inserting, updating, or deleting records. These operations are executed in a sequential manner as part of the transaction.
-
Commit Transaction:
- After all operations are completed, the transaction is committed. This means that all changes made by the transaction are saved permanently to the database. Once committed, the transaction cannot be rolled back.
-
Rollback Transaction:
- If there is an error or failure during the transaction execution (such as a violation of a constraint), the transaction is rolled back. This means that all changes made by the transaction are undone, and the database returns to its original state.
-
End Transaction:
- After either a commit or rollback, the transaction ends. This signifies that the transaction’s lifecycle is complete.
Transaction States
A transaction must be in one of the following states during its lifecycle:
-
Active:
- The initial state of a transaction. The transaction stays in this state while it is executing and performing operations.
-
Partially Committed:
- This state occurs after the final statement of the transaction has been executed. At this point, the transaction has finished all operations but has not yet been committed to the database.
-
Failed:
- A transaction enters the failed state when it is discovered that normal execution can no longer proceed, usually due to a problem such as a constraint violation or a system failure.
-
Aborted:
- After the transaction has been rolled back and the database has been restored to its state prior to the start of the transaction, it enters the aborted state.
-
Committed:
- A transaction enters the committed state after all operations are successfully completed, and its changes have been permanently applied to the database.
-
Terminated:
- A transaction is considered terminated if it has either committed or aborted. Once a transaction has reached the committed or aborted state, it cannot be restarted.
Schedules in DBMS
A schedule is a sequence of operations from multiple transactions that is executed in a specific order. A schedule determines the execution order of read and write operations for multiple transactions. The key aim is to determine how concurrent transactions interact and ensure the database remains in a consistent state.
A schedule can be serial or non-serial.
Types of Schedules
-
Serial Schedule:
- A serial schedule is one where transactions are executed one after the other without any interleaving. This means that the operations of one transaction are completed entirely before the next transaction starts.
- In a serial schedule, there is no concurrency, and the schedule is equivalent to executing the transactions sequentially.
Example of a Serial Schedule:
- Transaction 1: T1: Read(A); Write(A); Commit
- Transaction 2: T2: Read(B); Write(B); Commit
- The transactions are executed one after another, and there is no overlap in the operations.
-
Non-Serial Schedule:
- A non-serial schedule allows operations from multiple transactions to interleave. In this type of schedule, transactions are executed concurrently, meaning that their operations are mixed together.
- A non-serial schedule can lead to different outcomes depending on the order in which operations are executed, and this requires careful management to maintain the ACID properties.
Example of a Non-Serial Schedule:
- T1: Read(A); Write(A);
- T2: Read(B); Write(B);
- T1: Commit;
- T2: Commit;
- The operations of transactions T1 and T2 are interleaved.
Serializability
Serializability is the concept of ensuring that the result of executing multiple transactions concurrently is the same as if the transactions were executed serially (one after the other). A schedule is said to be serializable if it is equivalent to a serial schedule in terms of its effect on the database.
Importance:
The goal of serializability is to ensure that concurrent transactions do not cause conflicts or anomalies (such as dirty reads, lost updates, etc.) and that the database remains in a consistent state.
There are two main types of serializability:
-
Conflict Serializability:
- A schedule is conflict-serializable if it can be transformed into a serial schedule by swapping non-conflicting operations. Two operations are considered conflicting if they are from different transactions, access the same data item, and at least one of them is a write operation.
-
Conflict Operations:
- A read-write conflict occurs when one transaction reads a data item that another transaction writes.
- A write-write conflict occurs when two transactions both write to the same data item.
Example of Conflict Serializability:
- Schedule:
- T1: Write(A);
- T2: Write(B);
- T1: Read(B);
- T2: Read(A);
- This schedule can be rearranged to a serial schedule and is conflict serializable.
-
View Serializability:
- A schedule is view-serializable if it is equivalent to a serial schedule in terms of the final result, i.e., the same reads and writes occur, and the transactions are serialized correctly. However, in view serializability, the operations do not necessarily have to be swapped like in conflict serializability.
Conflict Equivalent, Conflict Serializable
-
Conflict Equivalent:
- Two schedules are said to be conflict equivalent if they contain the same operations, the operations are in the same order, and the order of the conflicting operations is preserved. This means that the interleaving of transactions in both schedules produces the same outcome, ensuring they are conflict equivalent.
-
Conflict Serializable:
- A schedule is conflict serializable if its transactions can be rearranged to form a serial schedule, maintaining the conflict order. In simple terms, a schedule is conflict serializable if the transactions can be serialized without violating the data consistency, i.e., maintaining the order of conflicting operations.
Example of Conflict Serializability and Conflict Equivalent Schedules
Let’s consider the following transactions and their operations:
- Transaction T1: Read(A); Write(A); Commit
- Transaction T2: Read(A); Write(A); Commit
Schedule 1:
T1: Read(A);
T2: Read(A);
T1: Write(A);
T2: Write(A);
T1: Commit;
T2: Commit;
Schedule 2:
T2: Read(A);
T1: Read(A);
T2: Write(A);
T1: Write(A);
T1: Commit;
T2: Commit;
Conflict Serializability Check:
-
Schedule 1 and Schedule 2 are conflict equivalent because the order of conflicting operations (Read/Write on A) is maintained. Both schedules can be transformed into a serial schedule:
- T1: Read(A); Write(A); Commit;
- T2: Read(A); Write(A); Commit;
- Therefore, both schedules are conflict serializable.
Transaction Isolation and Atomicity
In addition to the fundamental properties of transactions, such as ACID properties, managing transaction failures is an important aspect of maintaining the consistency and reliability of a database. When transactions are executed concurrently, the database must ensure that any failure during a transaction does not corrupt the database state, and the database's atomicity and consistency are preserved. This section discusses how failure handling impacts transaction isolation and atomicity in DBMS.
Atomicity and Failure Handling
Atomicity is the property of a transaction that ensures it is treated as a single, indivisible unit of work. This means that a transaction is either fully completed or not executed at all. If any part of the transaction fails, the entire transaction must be rolled back to ensure that no partial changes are applied to the database.
When a transaction fails, any effects it may have had on the database must be undone, so the database can return to its state before the transaction started. In systems that allow concurrent execution of transactions, if a transaction T1 fails, all the transactions that depend on T1 (i.e., any transaction T2 that has read or written data affected by T1) must also be aborted to preserve the atomicity of the database. If a transaction is dependent on another transaction that fails, it should not leave partial changes or inconsistent data.
To prevent inconsistencies during transaction execution, it is crucial to manage schedules that dictate the order of transaction operations, especially when transactions fail. Certain types of schedules need to be restricted to ensure that failure can be managed properly.
Recoverable Schedules
A recoverable schedule is one in which a transaction T2 only commits after the transaction T1 it depends on has committed. In simpler terms, if transaction T2 reads data written by transaction T1, T1 must commit before T2 does. This ensures that T2 does not commit changes based on an uncommitted transaction, preventing the scenario where rolling back T1 would require rolling back T2 as well.
A schedule that violates this rule is called a nonrecoverable schedule. For example, if T2 commits after reading data written by T1, but T1 fails and is rolled back, the database cannot be recovered to a consistent state because T2’s commit depends on the uncommitted changes of T1. In this situation, undoing T2 becomes impossible, leading to data inconsistency.
Example of a Nonrecoverable Schedule:
In a nonrecoverable schedule, let’s say:
- Transaction T6 writes data to item A.
- Transaction T7 reads the value of A written by T6 and commits immediately.
If T6 fails before it commits, T7 is dependent on the uncommitted changes made by T6. Since T7 has already committed, we cannot roll back T7, leading to a situation where it is impossible to recover from the failure of T6. This results in a nonrecoverable schedule.
For the schedule to be recoverable, T7 should have delayed its commit until after T6 commits, ensuring that T7 does not depend on uncommitted data from T6.
Cascadeless Schedules
Even if a schedule is recoverable, it can still lead to issues during transaction failure recovery, especially when cascading rollbacks occur. Cascading rollback refers to a situation where the failure of one transaction leads to a chain reaction of rollbacks for other dependent transactions. This situation happens when transactions read data that was written by another uncommitted transaction.
For example, consider the scenario where:
- Transaction T8 writes a value to data item A, which is read by transaction T9.
- Transaction T9 writes a value to A, which is then read by transaction T10.
- If T8 fails, T9, which depends on T8, also needs to be rolled back. T10, which depends on T9, must also be rolled back. This creates a cascading rollback effect, undoing the work of multiple transactions unnecessarily.
Cascading rollbacks are undesirable because they lead to the undoing of a significant amount of work, even if only a single transaction fails. To prevent cascading rollbacks, we can use cascadeless schedules. A cascadeless schedule is one where transactions do not read data that was written by a transaction that has not yet committed.
Formally, a cascadeless schedule is one where, for any two transactions T1 and T2, if T2 reads a data item written by T1, then T1 must have committed before T2 reads the data item. This ensures that no transaction can depend on uncommitted data, and there are no cascading rollbacks.
Every cascadeless schedule is also a recoverable schedule. This means that cascadeless schedules not only prevent cascading rollbacks but also ensure that the database can recover correctly in the event of a transaction failure.
Example of a Cascadeless Schedule:
Consider the following:
- Transaction T8 writes A, and T9 reads A.
- In a cascadeless schedule, T9 can only read A after T8 has committed.
This guarantees that T9 does not read data from T8 unless T8 has completed successfully, ensuring that no rollback of T9 is necessary if T8 fails.
Transaction Isolation Levels
Transaction isolation ensures that concurrent transactions do not interfere with each other in ways that violate the database's consistency. The isolation level of a transaction defines the degree to which the transaction is isolated from other transactions.
There are different isolation levels, which range from low to high isolation:
-
Read Uncommitted:
- This isolation level allows a transaction to read data written by other uncommitted transactions. This level can result in issues like dirty reads, where a transaction reads data that might later be rolled back, leading to inconsistent results.
-
Read Committed:
- A transaction at this level can only read data that has been committed by other transactions. While this prevents dirty reads, it can still lead to non-repeatable reads, where a transaction reads the same data twice and gets different results because another transaction has modified the data in the meantime.
-
Repeatable Read:
- This level prevents both dirty reads and non-repeatable reads. However, it still allows phantom reads, where a transaction may encounter different sets of rows if other transactions insert or delete records.
-
Serializable:
- This is the highest isolation level. It ensures that the result of concurrent transactions is the same as if the transactions were executed serially, one after another. This prevents dirty reads, non-repeatable reads, and phantom reads, but can have a performance impact due to its strict nature.
The isolation level chosen for a database or a transaction affects the balance between data consistency and performance, with higher isolation levels generally leading to slower performance due to greater restrictions on concurrency.
Concurrency Control in DBMS
Concurrency control is a critical aspect of database management systems (DBMS) that ensures correct transaction execution while allowing multiple transactions to be executed concurrently. The goal of concurrency control is to maintain database consistency and integrity in the face of transaction interleaving and failure scenarios. This section covers Lock-Based Protocols, including various lock modes, the Two-Phase Locking Protocol, Deadlock Handling mechanisms, and Timestamp-Based Protocols.
Lock-Based Protocols
Locking is a fundamental mechanism used in DBMS to prevent conflicts between concurrently executing transactions. Locks are applied to data items to control access, ensuring that multiple transactions do not violate the database’s integrity. In lock-based concurrency control, transactions acquire locks on data items before performing operations on them, and release the locks once the operation is completed.
Types of Locks
There are various modes in which a data item may be locked. In this section, we restrict our attention to two basic modes:
-
Shared Lock (S):
- A transaction holds a shared lock on a data item when it only needs to read the item.
- Multiple transactions can hold a shared lock on the same data item simultaneously. This allows concurrent reads of the data item by different transactions.
-
Shared lock does not allow a transaction to modify the data item.
Example:
- Transaction T1 holds a shared lock on item A and reads A.
- Transaction T2 holds a shared lock on item A and reads A.
- Both can read the data but cannot write to A.
-
Exclusive Lock (X):
- A transaction holds an exclusive lock on a data item when it needs to both read and write the item.
- Only one transaction can hold an exclusive lock on a particular data item at any time. If a transaction has an exclusive lock, no other transaction can acquire a shared or exclusive lock on the same data item.
Example:
- Transaction T1 holds an exclusive lock on item A and writes to it.
- While T1 holds the exclusive lock on A, no other transaction can read or write to A.
Granting of Locks
Locks are granted based on the protocol that the system follows, and different lock-based protocols can govern how locks are requested and granted during transaction execution. These protocols help avoid conflicts such as lost updates, temporary inconsistency, and uncommitted data being accessed by other transactions.
Two-Phase Locking Protocol (2PL)
The Two-Phase Locking Protocol is a widely used protocol to ensure serializability — a property that guarantees transactions execute in such a way that their results are equivalent to executing them one after another (serially). Two-Phase Locking ensures serializability by enforcing two phases during transaction execution:
-
Growing Phase:
- In this phase, a transaction can acquire locks but cannot release any locks.
- The transaction can request any number of locks, but once it releases a lock, it cannot acquire any new locks. This phase ends when the first unlock operation is performed.
-
Shrinking Phase:
- In this phase, a transaction can release locks but cannot acquire any new locks.
- Once a transaction starts releasing locks, it cannot lock any additional data items. This phase ends when the transaction either commits or aborts.
The Two-Phase Locking protocol guarantees serializability because it prevents cycles in the locking graph, ensuring that the order of execution follows a strict sequence that is serializable.
Example:
- Transaction T1 and T2 need to update data items A and B. Using 2PL, both transactions will acquire the necessary locks in the growing phase and only release them after they have completed their operations (in the shrinking phase).
Deadlock Handling
Deadlock occurs when two or more transactions are waiting for each other to release locks, leading to a situation where none of them can proceed. This creates a cycle of waiting transactions that cannot be resolved unless one or more transactions are rolled back.
Deadlock Prevention
Deadlock prevention techniques are designed to avoid the occurrence of deadlock by imposing restrictions on transaction behavior. One common strategy for preventing deadlock is the use of timestamps to prioritize transactions.
Timestamp-Based Deadlock Prevention Schemes
There are two prominent deadlock prevention schemes that use timestamps:
-
The Wait-Die Scheme:
- This is a non-preemptive deadlock prevention technique.
- If transaction Ti requests a data item held by Tj, Ti is allowed to wait only if its timestamp is smaller than that of Tj (i.e., Ti is older than Tj).
- If Ti's timestamp is greater than Tj's, Ti is rolled back (i.e., Ti "dies").
Example:
- If transactions T14, T15, and T16 have timestamps 5, 10, and 15, respectively:
- If T14 requests a data item held by T15, T14 will wait because T14 is older.
- If T16 requests a data item held by T15, T16 will be rolled back because T16 is younger than T15.
-
The Wound-Wait Scheme:
- This is a preemptive deadlock prevention technique.
- If transaction Ti requests a data item held by Tj, Ti is allowed to wait only if its timestamp is larger than that of Tj (i.e., Ti is younger than Tj).
- If Ti's timestamp is smaller than Tj's, Ti preempts Tj, and Tj is rolled back (i.e., Tj is "wounded" by Ti).
Example:
- If transactions T14, T15, and T16 have timestamps 5, 10, and 15, respectively:
- If T14 requests a data item held by T15, T14 will preempt T15, causing T15 to be rolled back.
- If T16 requests a data item held by T15, T16 will wait because it is younger than T15.
Timestamp-Based Protocols
In addition to lock-based protocols, timestamp-based protocols also manage concurrency in databases. These protocols use timestamps to order transactions and resolve conflicts, ensuring that the system behaves as if transactions are executed serially.
Timestamps and Their Role
Timestamps are numerical values assigned to transactions when they are created. The timestamp of a transaction determines its priority — a lower timestamp value indicates an older transaction, and a higher value indicates a newer transaction.
-
W-timestamp(Q):
- This denotes the largest timestamp of any transaction that has successfully executed a write operation on data item Q.
- It helps to identify the last transaction that modified the data item.
-
R-timestamp(Q):
- This denotes the largest timestamp of any transaction that has successfully executed a read operation on data item Q.
- It helps to identify the last transaction that read the data item.
The Timestamp-Ordering Protocol
The Timestamp-Ordering Protocol ensures serializability by enforcing a total order on the transactions based on their timestamps. The protocol requires that:
- If a transaction Ti writes a data item Q, and another transaction Tj reads or writes Q, Ti must have a smaller timestamp than Tj.
- Similarly, if Ti reads a data item Q, and Tj writes Q, Ti must have a smaller timestamp than Tj.
This protocol resolves conflicts based on the timestamps of transactions rather than locks.
Example:
- Transaction T1 with timestamp 10 writes data item A.
- Transaction T2 with timestamp 12 reads data item A.
- The timestamp-ordering protocol ensures that the write operation of T1 happens before the read operation of T2, preserving the correct transaction order.
The above is the detailed content of Transactions and Concurrency Controls: DBMS. For more information, please follow other related articles on the PHP Chinese website!