
In-depth understanding of JAVA distributed transactions

零到壹度 (original), 2018-04-03 14:33:53

This article introduces distributed transactions in Java: what they are, why they arise, where they are used, and the common ways to implement them.

1. What is a distributed transaction?

A distributed transaction is one in which the transaction participants, the servers that support the transaction, the resource servers, and the transaction manager are located on different nodes of a distributed system. That is the definition given by Baidu Encyclopedia. Put simply, one large operation is made up of several small operations that run on different servers and belong to different applications, and a distributed transaction must ensure that these small operations either all succeed or all fail. In essence, distributed transactions exist to keep data consistent across different databases.

2. Reasons for distributed transactions

2.1. Database and table sharding

When a single table accumulates more than 10 million rows in a year, it is time to consider database and table sharding. The principles of sharding are not covered here; I will go into them in a later article when time allows. In short, what used to be one database becomes several. If a single operation then touches both database 01 and database 02 and the data must stay consistent, a distributed transaction is required.

2.2. Adoption of SOA

SOA here means turning the business into services. For example, a single machine may originally have run an entire e-commerce website, but the site has since been split into an order center, a user center, and an inventory center. The order center has its own database for order information, the user center has its own database for user information, and the inventory center has its own database for inventory information. An operation that touches orders and inventory at the same time now involves both the order database and the inventory database, and a distributed transaction is needed to keep the data consistent.

The two situations look different, but the essence is the same: there are now more databases to operate on.
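The problem can be illustrated with a minimal sketch. It assumes two hypothetical in-memory classes, `InventoryDb` and `OrderDb`, that stand in for two separate databases, each committing its own local change independently. None of these names come from the article.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical inventory database; each call commits locally and immediately. */
class InventoryDb {
    int stock = 10;

    void deduct(int quantity) {
        stock -= quantity;
    }
}

/** Hypothetical order database; each call commits locally and immediately. */
class OrderDb {
    final List<String> orders = new ArrayList<>();
    boolean available = true; // set to false to simulate an outage

    void insert(String orderId) {
        if (!available) {
            throw new IllegalStateException("order database unavailable");
        }
        orders.add(orderId);
    }
}

/** Places an order with two independent local commits and no coordination. */
class NaiveOrderService {
    static boolean placeOrder(InventoryDb inventory, OrderDb orderDb, String orderId, int quantity) {
        inventory.deduct(quantity);      // local transaction 1 commits here
        try {
            orderDb.insert(orderId);     // local transaction 2 may fail
            return true;
        } catch (IllegalStateException e) {
            return false;                // stock is already deducted, but no order exists
        }
    }

    public static void main(String[] args) {
        InventoryDb inventory = new InventoryDb();
        OrderDb orderDb = new OrderDb();
        orderDb.available = false;
        boolean placed = placeOrder(inventory, orderDb, "order-1", 3);
        System.out.println(placed + ", stock=" + inventory.stock + ", orders=" + orderDb.orders.size());
    }
}
```

When the second database fails, the first change has already been committed, so the two databases disagree. Preventing this kind of partial result is what a distributed transaction is for.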

3. ACID characteristics of transactions

3.1. Atomicity (A)

Atomicity means that the operations in a transaction either all complete or none of them do; there is no in-between state. If an error occurs during execution, every operation is rolled back, and it is as if the transaction had never run.

3.2. Consistency (C)

A transaction must preserve the consistency of the system. Take a transfer as an example: A has 500 yuan and B has 300 yuan. If A transfers 50 yuan to B in a transaction, then regardless of concurrency or anything else that happens, as long as the transaction succeeds, account A must end up with 450 yuan and account B with 350 yuan.
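The transfer example can be sketched in code. The block below is an illustration only: `Ledger` is a hypothetical in-memory class standing in for a single database, and the snapshot-and-restore rollback is an assumption made to keep the sketch self-contained, not how a real database implements transactions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory ledger; stands in for a single database. */
class Ledger {
    private final Map<String, Integer> balances = new HashMap<>();

    void open(String account, int amount) {
        balances.put(account, amount);
    }

    int balance(String account) {
        return balances.get(account);
    }

    /**
     * Moves an amount between two accounts as one unit of work.
     * Returns true if the work was committed, false if it was rolled back.
     */
    synchronized boolean transfer(String from, String to, int amount) {
        // Snapshot the state so that a failure can undo every change.
        Map<String, Integer> snapshot = new HashMap<>(balances);
        try {
            if (amount <= 0) {
                throw new IllegalArgumentException("amount must be positive");
            }
            if (!balances.containsKey(from) || !balances.containsKey(to)) {
                throw new IllegalArgumentException("unknown account");
            }
            balances.put(from, balances.get(from) - amount);
            if (balances.get(from) < 0) {
                throw new IllegalStateException("insufficient funds");
            }
            balances.put(to, balances.get(to) + amount);
            return true; // commit: both changes are kept
        } catch (RuntimeException e) {
            balances.clear();
            balances.putAll(snapshot); // rollback: neither change is kept
            return false;
        }
    }
}

class TransferDemo {
    public static void main(String[] args) {
        Ledger ledger = new Ledger();
        ledger.open("A", 500);
        ledger.open("B", 300);
        System.out.println(ledger.transfer("A", "B", 50));    // true
        System.out.println(ledger.balance("A") + " " + ledger.balance("B")); // 450 350
        System.out.println(ledger.transfer("A", "B", 1000));  // false, rolled back
        System.out.println(ledger.balance("A") + " " + ledger.balance("B")); // 450 350
    }
}
```

The failed second transfer leaves both balances untouched, which is the atomicity property, and the total of 800 yuan is the same before and after every call, which is the consistency property.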

3.3. Isolation (I)

Isolation means that transactions do not affect one another, and the intermediate state of one transaction is not visible to other transactions.

3.4. Durability (D)

Durability means that once a transaction completes, the changes it made are permanently stored in the database, even if a power failure or system crash follows.

4. Application scenarios of distributed transactions

4.1. Payment

The most classic scenario is payment. A payment deducts money from the buyer's account and adds money to the seller's account. These operations must run in one transaction: either all succeed or all fail. The buyer's account belongs to the buyer center and lives in the buyer database, while the seller's account belongs to the seller center and lives in the seller database. Operating on different databases requires a distributed transaction.

4.2. Online ordering

When a buyer places an order on an e-commerce platform, two actions are usually involved: deducting inventory and updating the order status. Inventory and orders generally live in different databases, so a distributed transaction is needed to keep the data consistent.

5. Common distributed transaction solutions

5.1. Two-phase commit based on the XA protocol

XA is a distributed transaction protocol that originated with Tuxedo and was standardized by X/Open. It has two main parts: a transaction manager and local resource managers. The local resource manager is usually implemented by the database; commercial databases such as Oracle and DB2 implement the XA interface. The transaction manager acts as the global coordinator and is responsible for committing and rolling back each local resource. In outline, the transaction manager first asks every resource manager to prepare, then commits all of them if every one succeeds, or rolls all of them back otherwise.
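The coordination between the transaction manager and the resource managers can be sketched as a plain two-phase commit. This is a simulation under stated assumptions: `ResourceManager` and `TransactionManager` are hypothetical in-memory classes, the prepare vote is hard-coded per participant, and the sketch does not use the real Java XA interface (`javax.transaction.xa.XAResource`) or handle coordinator crashes.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a local resource manager, for example one database. */
class ResourceManager {
    final String name;
    private final boolean voteYes; // simulated outcome of the prepare phase
    String state = "ACTIVE";

    ResourceManager(String name, boolean voteYes) {
        this.name = name;
        this.voteYes = voteYes;
    }

    /** Phase 1: get ready to commit, then vote yes or no. */
    boolean prepare() {
        state = voteYes ? "PREPARED" : "FAILED";
        return voteYes;
    }

    /** Phase 2, success path: make the prepared work permanent. */
    void commit() {
        state = "COMMITTED";
    }

    /** Phase 2, failure path: undo any work done so far. */
    void rollback() {
        state = "ROLLED_BACK";
    }
}

/** Hypothetical transaction manager acting as the global coordinator. */
class TransactionManager {
    /** Returns true if the global transaction committed on every participant. */
    static boolean run(List<ResourceManager> participants) {
        // Phase 1: collect votes; stop at the first "no".
        boolean allPrepared = true;
        for (ResourceManager rm : participants) {
            if (!rm.prepare()) {
                allPrepared = false;
                break;
            }
        }
        // Phase 2: the same decision is applied to every participant.
        for (ResourceManager rm : participants) {
            if (allPrepared) {
                rm.commit();
            } else {
                rm.rollback();
            }
        }
        return allPrepared;
    }

    public static void main(String[] args) {
        ResourceManager orders = new ResourceManager("orders", true);
        ResourceManager inventory = new ResourceManager("inventory", false);
        boolean committed = run(Arrays.asList(orders, inventory));
        System.out.println(committed + " " + orders.state + " " + inventory.state);
    }
}
```

Every participant holds its prepared state until the coordinator announces the decision, which is one reason the protocol tends to perform poorly under high concurrency.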

Overall, the XA protocol is fairly simple, and once a commercial database implements it, the cost of using distributed transactions is relatively low. However, XA has a fatal weakness: poor performance. Concurrency on the order-placement path is often very high, and XA cannot cope with such high-concurrency scenarios. XA is well supported by commercial databases but less so by MySQL: older MySQL XA implementations do not log the prepare phase, so a switchover between primary and standby can leave the two databases inconsistent. Many NoSQL stores do not support XA either, which makes its range of application quite narrow.

5.2. Message transactions + eventual consistency

A message transaction is a two-phase commit built on message middleware; it is essentially a special use of the middleware. It places the local transaction and the sending of a message in one distributed transaction, guaranteeing that either the local operation succeeds and the message is sent, or both fail. The open-source RocketMQ supports this feature. It works as follows:

1. System A sends a prepared message to the message middleware
2. The message middleware saves the prepared message and returns success
3. System A executes its local transaction
4. System A sends a commit message to the message middleware

These four steps complete one message transaction. Any of the steps can fail, so let us analyze them one by one:

  • If step 1 fails, the whole transaction fails and A's local operation is not executed.

  • If step 2 fails, the whole transaction fails and A's local operation is not executed.

  • If step 3 fails, the prepared message must be rolled back. How? System A implements a callback interface for the message middleware, and the middleware keeps invoking it to check whether A's local transaction succeeded. If it failed, the prepared message is rolled back.

  • If step 4 fails, A's local transaction has already succeeded. Does the middleware need to roll A back? No. Through the callback interface, the middleware can see that A's transaction succeeded. A's commit message is then not actually needed: the middleware can commit the message itself and so complete the message transaction.
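The four steps and the callback check can be sketched as follows. This is a simulation, not RocketMQ's API: `Broker`, `Producer`, `MsgState`, and the `failLocal` and `loseCommit` flags are hypothetical names introduced only to reproduce the failure cases for steps 3 and 4.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

enum MsgState { PREPARED, COMMITTED, ROLLED_BACK }

/** Hypothetical message middleware that tracks the state of each transactional message. */
class Broker {
    private final Map<String, MsgState> messages = new HashMap<>();

    /** Steps 1 and 2: store the prepared message; consumers cannot see it yet. */
    void prepare(String txId) {
        messages.put(txId, MsgState.PREPARED);
    }

    /** Step 4: the producer confirms, so the message becomes deliverable. */
    void commit(String txId) {
        messages.put(txId, MsgState.COMMITTED);
    }

    MsgState state(String txId) {
        return messages.get(txId);
    }

    /**
     * Callback check: for every message still PREPARED, ask the producer
     * whether its local transaction succeeded, then commit or roll back.
     */
    void checkPending(Predicate<String> localTransactionSucceeded) {
        for (Map.Entry<String, MsgState> entry : messages.entrySet()) {
            if (entry.getValue() == MsgState.PREPARED) {
                boolean ok = localTransactionSucceeded.test(entry.getKey());
                entry.setValue(ok ? MsgState.COMMITTED : MsgState.ROLLED_BACK);
            }
        }
    }
}

/** Hypothetical system A. */
class Producer {
    private final Broker broker;
    private final Map<String, Boolean> localResults = new HashMap<>();

    Producer(Broker broker) {
        this.broker = broker;
    }

    /** failLocal simulates a failure in step 3; loseCommit simulates a failure in step 4. */
    void send(String txId, boolean failLocal, boolean loseCommit) {
        broker.prepare(txId);                 // steps 1 and 2
        localResults.put(txId, !failLocal);   // step 3: record the local outcome
        if (failLocal || loseCommit) {
            return;                           // step 4 never reaches the broker
        }
        broker.commit(txId);                  // step 4
    }

    /** The callback interface that the broker invokes. */
    boolean localTransactionSucceeded(String txId) {
        return localResults.getOrDefault(txId, false);
    }

    public static void main(String[] args) {
        Broker broker = new Broker();
        Producer a = new Producer(broker);
        a.send("tx-1", false, false); // normal path
        a.send("tx-2", true, false);  // step 3 fails
        a.send("tx-3", false, true);  // step 4 is lost
        broker.checkPending(a::localTransactionSucceeded);
        System.out.println(broker.state("tx-1") + " " + broker.state("tx-2") + " " + broker.state("tx-3"));
    }
}
```

In the sketch the callback runs once on demand; a real broker would invoke it repeatedly on a schedule until every prepared message is resolved.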

Two-phase commit based on message middleware is often used in high-concurrency scenarios. It splits a distributed transaction into a message transaction (system A's local operation + sending the message) plus system B's local operation, where B's operation is driven by the message. Once the message transaction succeeds, A's operation has succeeded and the message has been sent. B then receives the message and performs its local operation; if that operation fails, the message is redelivered until B succeeds. In this way the distributed transaction across A and B is achieved indirectly.
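The redelivery behaviour on the consumer side can be sketched like this. The names `Redelivery` and `InventoryService` are hypothetical, and the `maxAttempts` bound is an assumption added so that the sketch terminates; the article itself describes redelivery as continuing until B succeeds.

```java
import java.util.function.Predicate;

/** Hypothetical delivery loop on the middleware side. */
class Redelivery {
    /**
     * Delivers the message until the consumer acknowledges it or maxAttempts is reached.
     * Returns the number of attempts used, or -1 if delivery was given up.
     */
    static int deliver(String message, Predicate<String> consumer, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            boolean acknowledged;
            try {
                acknowledged = consumer.test(message);
            } catch (RuntimeException e) {
                acknowledged = false; // an exception counts as a failed attempt
            }
            if (acknowledged) {
                return attempt;
            }
        }
        return -1;
    }
}

/** Hypothetical system B whose local operation fails a fixed number of times first. */
class InventoryService {
    int stock = 10;
    private int failuresLeft;

    InventoryService(int failuresBeforeSuccess) {
        this.failuresLeft = failuresBeforeSuccess;
    }

    /** Returns true only when the local operation has been applied. */
    boolean onMessage(String message) {
        if (failuresLeft > 0) {
            failuresLeft--;
            return false; // local operation failed, nothing was changed
        }
        stock--;
        return true;
    }

    public static void main(String[] args) {
        InventoryService b = new InventoryService(2);
        int attempts = Redelivery.deliver("order-1", b::onMessage, 5);
        System.out.println(attempts + " attempts, stock=" + b.stock); // 3 attempts, stock=9
    }
}
```

Between A's commit and B's eventual success the two systems disagree, and if delivery is ever given up they stay that way.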

Although this scheme completes the operations of both A and B, A and B are not strictly consistent, only eventually consistent. We give up strict consistency here in exchange for a large gain in performance. This approach carries risk, too: if B keeps failing, consistency is broken. Whether to use it depends on how much risk the business can tolerate.

5.3. TCC programming model

The TCC programming model is also a variant of two-phase commit. TCC provides a programming framework that splits the business logic into three operations: Try, Confirm, and Cancel. Taking online ordering as an example, the Try phase deducts inventory and the Confirm phase updates the order status; if the order update fails, the Cancel phase runs and restores the inventory. In short, TCC implements two-phase commit by hand in application code. The code differs from one business scenario to another, as does its complexity, so this model does not lend itself well to reuse.
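The online-ordering example can be sketched as three methods plus a driver. This is a minimal illustration, assuming a hypothetical `OrderTcc` class that keeps inventory and order status in memory; the status strings and the `failConfirm` flag are invented for the sketch and do not belong to any TCC framework.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical TCC participant covering inventory and order status. */
class OrderTcc {
    int stock;
    final Map<String, String> orderStatus = new HashMap<>();
    private final Map<String, Integer> deducted = new HashMap<>();

    OrderTcc(int stock) {
        this.stock = stock;
    }

    /** Try: deduct inventory and remember how much, so Cancel can restore it. */
    boolean tryDeduct(String orderId, int quantity) {
        if (quantity <= 0 || quantity > stock) {
            return false;
        }
        stock -= quantity;
        deducted.put(orderId, quantity);
        orderStatus.put(orderId, "PENDING");
        return true;
    }

    /** Confirm: update the order status; simulateFailure forces the update to fail. */
    void confirm(String orderId, boolean simulateFailure) {
        if (simulateFailure) {
            throw new IllegalStateException("order update failed");
        }
        orderStatus.put(orderId, "CONFIRMED");
        deducted.remove(orderId);
    }

    /** Cancel: restore the inventory deducted by Try; calling it twice is harmless. */
    void cancel(String orderId) {
        Integer quantity = deducted.remove(orderId);
        if (quantity != null) {
            stock += quantity;
        }
        orderStatus.put(orderId, "CANCELLED");
    }

    /** Driver: Try first, then Confirm, falling back to Cancel if Confirm fails. */
    boolean placeOrder(String orderId, int quantity, boolean failConfirm) {
        if (!tryDeduct(orderId, quantity)) {
            return false;
        }
        try {
            confirm(orderId, failConfirm);
            return true;
        } catch (RuntimeException e) {
            cancel(orderId);
            return false;
        }
    }

    public static void main(String[] args) {
        OrderTcc tcc = new OrderTcc(10);
        System.out.println(tcc.placeOrder("o1", 3, false) + " stock=" + tcc.stock); // true stock=7
        System.out.println(tcc.placeOrder("o2", 2, true) + " stock=" + tcc.stock);  // false stock=7
    }
}
```

All three operations are ordinary business code written for this one scenario, which illustrates why the article says the model is hard to reuse.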

6. Summary

Distributed transactions are, in essence, the unified control of transactions across multiple databases. By strength of control they fall into three kinds: no control, partial control, and full control. No control means not introducing distributed transactions at all. Partial control covers the variants of two-phase commit, including the message transaction + eventual consistency and TCC models described above. Full control means implementing two-phase commit in full. Partial control offers very good concurrency and performance at the cost of weaker data consistency; full control sacrifices performance to guarantee consistency. Which to use ultimately depends on the business scenario. As engineers, we must not forget that technology serves the business; do not use technology for its own sake. Choosing the right technology for each business is itself an important skill.
