
[Interview] How to ensure 100% successful delivery of messages? How to ensure message idempotence?

Java学习指南

Jul 26, 2023 pm 02:58 PM


1. Introduction

Most of us have heard plenty about message-queue (MQ) middleware such as RabbitMQ, RocketMQ, and Kafka. Introducing such middleware helps a system absorb high concurrency, smooth out traffic peaks, and decouple business services.


Consider a typical flow:

(1) The order service delivers messages to the MQ middleware.
(2) The logistics service listens to the MQ middleware and consumes the messages.

In this article, we discuss how to ensure that the order service successfully delivers the message to the MQ middleware, taking RabbitMQ as an example.

2. Analysis of the problem

You may wonder: if the order service calls the send method and the call returns successfully, doesn't that mean the message was delivered?

Messages are generally sent exactly this way: call send, and treat a normal return as success. Is there any problem with that?
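A minimal sketch of that fire-and-forget style, written against a hypothetical `MqClient` interface rather than a real client library (with the RabbitMQ Java client the equivalent call would be `channel.basicPublish`). `MqClient`, `InMemoryMqClient`, `OrderService`, and the queue name are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an MQ client; not a real library API.
interface MqClient {
    void send(String queue, String body);
}

// In-memory stub so the sketch runs without a broker.
class InMemoryMqClient implements MqClient {
    final List<String> received = new ArrayList<>();

    @Override
    public void send(String queue, String body) {
        received.add(queue + ":" + body);
    }
}

class OrderService {
    private final MqClient mq;

    OrderService(MqClient mq) {
        this.mq = mq;
    }

    // Returns true as soon as send() returns without throwing.
    // Nothing here tells us whether the broker actually kept the message.
    boolean createOrder(String orderId) {
        try {
            mq.send("order.queue", orderId);
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }
}
```

The only signal the caller gets is that `send` did not throw, which says nothing about what happens to the message afterwards.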

Consider a scenario: what happens if the MQ server suddenly goes down? Are all the messages sent by our order service gone? Yes. To improve throughput, MQ middleware generally keeps messages in memory. If nothing else is done, all of those messages are lost once the MQ server goes down. The business cannot tolerate this, and the impact would be severe.

3. Persistence

Experienced readers will point out one remedy: persist the messages. RabbitMQ provides a durable parameter; when it is set to true, messages are persisted.

In this case, even if the MQ server goes down, the messages are stored in a disk file and are still there after a restart, so they are not lost. This does ensure, with a certain probability, that messages will not be lost.

But consider this scenario: the message has just been saved to MQ memory, and MQ crashes before the message is written to the disk file. (You might think the window is so short that the probability is negligible.) Under continuous, large-volume message delivery, this scenario is actually quite common.
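The window described above can be illustrated with a toy in-memory model. `ToyBroker` is a hypothetical class, not RabbitMQ's implementation: it keeps messages in a memory buffer until `flush()` copies them to a list standing in for the disk file, and `crash()` discards the buffer. (With the RabbitMQ Java client, persistence is typically requested by declaring the queue with `durable = true` and publishing with `MessageProperties.PERSISTENT_TEXT_PLAIN`.)

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a broker: a memory buffer plus a list standing in for the disk file.
class ToyBroker {
    private final List<String> memory = new ArrayList<>();
    private final List<String> disk = new ArrayList<>();

    // Accept a persistent message; it sits in memory until the next flush.
    void publish(String message) {
        memory.add(message);
    }

    // Write everything buffered in memory to "disk".
    void flush() {
        disk.addAll(memory);
        memory.clear();
    }

    // Simulate a crash: whatever was only in memory is gone.
    void crash() {
        memory.clear();
    }

    // Messages available after a restart.
    List<String> recover() {
        return new ArrayList<>(disk);
    }
}

class ToyBrokerDemo {
    public static void main(String[] args) {
        ToyBroker broker = new ToyBroker();
        broker.publish("order-1");
        broker.flush();            // order-1 reaches disk
        broker.publish("order-2"); // order-2 is still only in memory
        broker.crash();
        System.out.println(broker.recover()); // prints [order-1]
    }
}
```

In this model `order-2` was accepted but never flushed, so it does not survive the restart even though it was marked persistent.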

What should we do? How can we ensure that it will be persisted to the disk?

4. Confirm mechanism

The problem above arises because nothing tells us whether persistence succeeded. Fortunately, many MQs offer callback notifications, and RabbitMQ has a confirm mechanism that notifies us whether persistence succeeded.

Principle of the confirm mechanism:

(1) The message producer sends the message to MQ. If the message is received successfully, MQ returns an ack message to the producer.

(2) If the message is not received successfully, MQ returns a nack message to the producer.

The producer therefore handles the result in two ways: an ack callback and a nack callback.
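A minimal sketch of the two callbacks, using a hypothetical `ConfirmingBroker` stub in place of a real connection. With the RabbitMQ Java client the corresponding calls are `channel.confirmSelect()` and `channel.addConfirmListener(ackCallback, nackCallback)`; the stub here only mimics that shape and, for simplicity, invokes the callbacks synchronously.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

// Hypothetical broker stub: confirms each publish through one of two callbacks.
class ConfirmingBroker {
    private LongConsumer onAck = seq -> { };
    private LongConsumer onNack = seq -> { };
    private long nextSeq = 1;
    boolean healthy = true; // flip to false to simulate a broker-side failure

    void addConfirmListener(LongConsumer ack, LongConsumer nack) {
        this.onAck = ack;
        this.onNack = nack;
    }

    // Returns the sequence number assigned to the message.
    long publish(String body) {
        long seq = nextSeq++;
        if (healthy) {
            onAck.accept(seq);  // broker accepted the message
        } else {
            onNack.accept(seq); // broker could not take the message
        }
        return seq;
    }
}

class ConfirmDemo {
    public static void main(String[] args) {
        List<Long> acked = new ArrayList<>();
        List<Long> nacked = new ArrayList<>();

        ConfirmingBroker broker = new ConfirmingBroker();
        broker.addConfirmListener(seq -> acked.add(seq), seq -> nacked.add(seq));

        broker.publish("order-1"); // acked
        broker.healthy = false;
        broker.publish("order-2"); // nacked
        System.out.println("acked=" + acked + " nacked=" + nacked); // prints acked=[1] nacked=[2]
    }
}
```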

Does this ensure that 100% of the messages will not be lost?

Look at the confirm mechanism more closely. Suppose that every time the producer sends a message, MQ had to persist it to disk before issuing the ack or nack callback. MQ throughput would then be very low, because writing to disk is slow. That is unacceptable in high-concurrency scenarios.

So in practice MQ persists to disk asynchronously, following some batching policy: for example, once a few thousand messages have accumulated, they are flushed to disk in one go, rather than flushing on every incoming message.

So the confirm mechanism is actually an asynchronous listening mechanism, designed to preserve the system's high throughput. This means it still cannot guarantee 100% that messages will not be lost: even with the confirm mechanism in place, MQ may crash before the in-memory messages are flushed to disk, and those messages cannot be recovered.

After all that, delivery still cannot be guaranteed. So what should we do?

5. Message persistence in advance + scheduled tasks

The root cause is that we cannot tell whether a message has been persisted. So can we persist the message ourselves? The answer is yes, and the solution evolves one step further.

The process is as follows:

(1) Before the order service producer delivers a message, it persists the message to Redis or a database. Redis is recommended for its high performance. The message status is set to [Sending].

(2) Use the confirm mechanism to monitor whether the message was sent successfully. On ack, delete the message from Redis.

(3) On nack, decide according to your business whether to resend the message or simply delete it.

(4) A scheduled task periodically pulls messages whose status is still [Sending] after a certain period of time. This status indicates that the order service has not received the ack.

(5) The scheduled task re-delivers these messages as compensation. If the ack callback from MQ is then received, the message is deleted from Redis.

Such a mechanism is really a compensation mechanism. It does not matter whether MQ actually received the message: as long as the message status in Redis is still [Sending], the message is treated as not successfully delivered, and the scheduled task picks it up and re-delivers it.

Of course, we can also cap the number of compensation attempts. If a message has been retried more than 3 times and the ack has still not been received, set its status to [Failed] and leave it for manual investigation.

With this, the solution is relatively complete and ensures that 100% of messages are not lost (of course, this does not cover disk failure, for which a master-slave setup can be used).

However, this solution may send the same message more than once. MQ may well have received the message already, but a network failure during the ack callback meant the producer never received the ack.

We must therefore require consumers to be idempotent when consuming.
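The steps above can be sketched roughly as follows. This is a simplified, single-threaded illustration under stated assumptions: `HashMap`s stand in for Redis, a plain call to `compensate()` stands in for the scheduled task, and the hypothetical `Transport` interface collapses the asynchronous confirm into a boolean return value. `ReliableProducer` and all of its members are invented names; the limit of 3 retries follows the text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

class ReliableProducer {
    enum Status { SENDING, FAILED }

    // Hypothetical transport: returns true when the broker acks, false on nack or timeout.
    interface Transport {
        boolean sendAndAwaitAck(String id, String body);
    }

    static final int MAX_RETRIES = 3;

    private final Transport transport;
    // Stand-ins for the Redis records: body, status, and retry count per message ID.
    final Map<String, String> bodies = new HashMap<>();
    final Map<String, Status> status = new HashMap<>();
    final Map<String, Integer> retries = new HashMap<>();

    ReliableProducer(Transport transport) {
        this.transport = transport;
    }

    // Step 1: persist first, then deliver. Step 2: delete the record on ack.
    void send(String id, String body) {
        bodies.put(id, body);
        status.put(id, Status.SENDING);
        retries.put(id, 0);
        if (transport.sendAndAwaitAck(id, body)) {
            remove(id);
        }
    }

    // Steps 4 and 5: one run of the scheduled task re-delivers everything still SENDING.
    void compensate() {
        for (String id : new ArrayList<>(status.keySet())) {
            if (status.get(id) != Status.SENDING) {
                continue;
            }
            if (retries.get(id) >= MAX_RETRIES) {
                status.put(id, Status.FAILED); // leave for manual investigation
                continue;
            }
            retries.merge(id, 1, Integer::sum);
            if (transport.sendAndAwaitAck(id, bodies.get(id))) {
                remove(id);
            }
        }
    }

    private void remove(String id) {
        bodies.remove(id);
        status.remove(id);
        retries.remove(id);
    }
}
```

Because a record is deleted only after an ack, a lost ack leads to a re-delivery of a message the broker may already hold, so the same message can reach the consumer more than once.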

6. The meaning of idempotence

First, what is idempotence? In distributed applications idempotence is very important: performing a business operation under the same conditions yields the same result no matter how many times it is performed.

6.1. Why is idempotence needed?

In large systems, services are deployed in a distributed manner. For example, the order business and the inventory business may be deployed independently as separate services. When a user places an order, both the order service and the inventory service are called.

Because of distributed deployment, the order service's call to the inventory service may well fail due to network or other problems, even though the inventory service has in fact processed the request; the exception occurs only when the result is being returned to the order service. The system then usually applies a compensation plan: the order service calls the inventory service again, and the inventory is reduced by 1 again.

Here is the problem: the previous call already reduced the inventory by 1, but the order service never received the result. The second call reduces it by 1 again, which violates the business rules: the inventory has been deducted once too many.

The concept of idempotence is that no matter how many times the inventory service is called under the same conditions, the processing result will be the same. Only in this way can the feasibility of the compensation plan be ensured.

6.2. Optimistic locking scheme

Borrow the optimistic locking mechanism of databases: the update statement includes the version number in its WHERE condition and increments the version when it succeeds.

The approach is based on a version number: read the current version of the product before operating on the inventory, and pass that version along with the operation. Walk through it: the first time we operate on the inventory we read version 1, and after the inventory service is called the version becomes 2. But something goes wrong while returning to the order service, so the order service calls the inventory service again. The version it passes is still 1, so when the same update statement runs again nothing is updated: the version is now 2 and the WHERE condition no longer holds. This ensures that no matter how many times the call is made, it is processed only once.
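A minimal sketch of the version check, with an in-memory object standing in for the inventory row. The table and column names in the comment (`t_goods`, `count`, `version`, `good_id`) and the class names are assumptions for illustration.

```java
// In-memory stand-in for one inventory row guarded by a version column.
// The equivalent SQL would look something like this (hypothetical names):
//   UPDATE t_goods SET count = count - 1, version = version + 1
//   WHERE good_id = ? AND version = ?
class InventoryRow {
    private int count;
    private int version;

    InventoryRow(int count, int version) {
        this.count = count;
        this.version = version;
    }

    // Deducts one unit only if the caller's version matches the current one.
    // Returns the number of "rows" updated: 1 on success, 0 if the version is stale.
    synchronized int deduct(int expectedVersion) {
        if (version != expectedVersion) {
            return 0;
        }
        count -= 1;
        version += 1;
        return 1;
    }

    synchronized int count() {
        return count;
    }

    synchronized int version() {
        return version;
    }
}

class OptimisticLockDemo {
    public static void main(String[] args) {
        InventoryRow row = new InventoryRow(10, 1);
        int first = row.deduct(1); // succeeds, version becomes 2
        int retry = row.deduct(1); // same stale version: no effect
        System.out.println(first + " " + retry + " " + row.count()); // prints 1 0 9
    }
}
```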

6.3. Unique ID + fingerprint code

The principle is to use the database primary key for deduplication, inserting the primary-key record after the business operation completes.

The unique ID is the primary key of the business table, such as the product ID.
The fingerprint code distinguishes each individual operation and is generated per operation; a timestamp + business number can be used.

Before the operation, run a count query against t_check for the unique ID + fingerprint code:

If the return value is 0, the operation has not been performed yet; perform the business operation and then insert into t_check (unique ID + fingerprint code).
If the return value is greater than 0, the operation has already been performed; return directly.
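A sketch of this check-then-insert flow, with a `HashSet` standing in for the `t_check` table and its primary key. `DedupChecker`, `runOnce`, and the key format are hypothetical; a single synchronized method plays the role that the primary-key constraint would play against concurrent duplicates in a real database.

```java
import java.util.HashSet;
import java.util.Set;

class DedupChecker {
    // Stand-in for t_check; each entry is "uniqueId:fingerprint" acting as the primary key.
    private final Set<String> processed = new HashSet<>();

    // Runs the business operation at most once per (uniqueId, fingerprint) pair.
    // Returns true if the operation ran, false if it was a duplicate.
    synchronized boolean runOnce(String uniqueId, String fingerprint, Runnable operation) {
        String key = uniqueId + ":" + fingerprint;
        if (processed.contains(key)) { // like the count query returning more than 0
            return false;
        }
        operation.run();               // the business operation
        processed.add(key);            // like the insert into t_check afterwards
        return true;
    }
}

class DedupDemo {
    public static void main(String[] args) {
        DedupChecker checker = new DedupChecker();
        int[] stock = {10};
        // Fingerprint built from a timestamp plus a business number (made-up values).
        String fingerprint = "1690354680000-order-1";
        checker.runOnce("product-42", fingerprint, () -> stock[0]--);
        checker.runOnce("product-42", fingerprint, () -> stock[0]--); // duplicate, skipped
        System.out.println(stock[0]); // prints 9
    }
}
```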

Benefits: Simple implementation

Disadvantages: Database bottleneck under high concurrency

Solution: Shard databases and tables, routing by ID with an algorithm

6.4. Redis atomic operation

Use an atomic Redis operation to mark the operation as completed. This performs better, but it raises some problems.
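A sketch of the marking step, using `ConcurrentHashMap.putIfAbsent` as an in-process stand-in for an atomic Redis command such as `SET key value NX`. `RedisLikeMarker`, `markIfFirst`, and the key format are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class RedisLikeMarker {
    private final ConcurrentMap<String, String> store = new ConcurrentHashMap<>();

    // Atomically claims the key; returns true only for the first caller.
    boolean markIfFirst(String operationKey) {
        return store.putIfAbsent(operationKey, "done") == null;
    }
}

class MarkerDemo {
    public static void main(String[] args) {
        RedisLikeMarker marker = new RedisLikeMarker();
        int stock = 10;
        for (int attempt = 0; attempt < 3; attempt++) {
            if (marker.markIfFirst("deduct:order-1")) {
                stock -= 1; // runs only on the first attempt
            }
        }
        System.out.println(stock); // prints 9
    }
}
```

This covers only the atomicity of the mark itself; it does not make the mark and a separate database write atomic with each other.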

First: do the business results need to be persisted to the database? If so, the key problem is how to make the database operation and the Redis operation atomic.

That is, the inventory is reduced by 1 in the database, but what if marking completion in Redis then fails? We must ensure that the database write and the Redis write either both succeed or both fail.

Second: if the results are not persisted to the database and everything lives in the cache, how should the scheduled synchronization strategy be set up?

That is, the inventory is reduced by 1 without being written to the database; the Redis operation directly marks completion, and a separate synchronization service later writes the inventory to the database. This increases the complexity of the system, and the synchronization strategy still has to be designed.


Statement

This article is reproduced from Java学习指南. If there is any infringement, please contact admin@php.cn to have it removed.