Author: Anatoly Yakovenko
Compiled by: Deep Tide TechFlow
MEV is a fundamental problem in permissionless blockchains. Like most permissionless blockchains, Solana aims to minimize the MEV that chain operators can extract from users.
Solana’s approach is to reduce MEV by maximizing competition among leaders (i.e. block producers). This means shortening slot times, reducing the number of slots scheduled consecutively by a single leader, and increasing the number of concurrent leaders per slot.
Generally speaking, more leaders per second means that a user who waits T seconds has more incoming leaders to choose the best offer from. More leaders also lowers the cost for good leaders to provide block space, making it easier for users to transact only with good leaders and exclude the bad ones. The market should decide what is good and what is bad.
Solana’s larger vision is to build a global permissionless price discovery engine capable of competing with the best performance of any centralized exchange (CEX).
If a market-impacting event occurs in Singapore, the message still has to travel over optical fiber, at roughly the speed of light, to a CEX in New York. Before the message reaches New York, a Solana leader should already have broadcast it in a block. So unless a physical Internet partition happens at the same moment, Solana's state will already reflect the event by the time it reaches New York, and there should be no arbitrage opportunity between the CEX and Solana in New York.
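As a back-of-the-envelope illustration of this latency argument, here is a minimal sketch; the great-circle distance and the fiber slowdown factor are assumptions for illustration, not figures from the article.

```rust
// Rough one-way latency for a Singapore -> New York fiber link.
// Distance and fiber slowdown factor are illustrative assumptions.
fn main() {
    let distance_km = 15_300.0_f64; // rough great-circle distance
    let c_km_per_ms = 299_792.458 / 1_000.0; // speed of light in vacuum, km/ms
    let fiber_factor = 0.67; // light in fiber travels at roughly 2/3 of c
    let one_way_ms = distance_km / (c_km_per_ms * fiber_factor);
    println!("one-way fiber latency ≈ {one_way_ms:.0} ms"); // ≈ 76 ms
}
```

At roughly 76 ms one way, a leader near Singapore has most of a 400 ms slot to include the event in a block before the message even arrives in New York.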
To fully achieve this goal, Solana needs many concurrent leaders and strong optimistic confirmation guarantees.
Just like the current leader schedule, the system would configure each slot with 2 leaders instead of 1. To differentiate between the two leaders, one channel is labeled A and the other B, and A and B can rotate independently (a minimal sketch of such a two-channel schedule follows the questions below). The questions that need to be answered to implement this plan are:
What if blocks A and B arrive at different times or fail?
How to merge the transaction order in blocks A and B?
How to allocate block capacity between A and B?
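Before answering these, here is a rough illustration of what an independently rotating two-channel schedule could look like. The struct and function names are assumptions for illustration, not Solana's actual leader-schedule types.

```rust
// Minimal sketch of a two-channel leader schedule where channels A and B
// rotate independently. Types and names are illustrative assumptions.
type LeaderId = u64; // stand-in for a validator identity pubkey

struct ConcurrentLeaderSchedule {
    channel_a: Vec<LeaderId>, // leader for channel A in each slot
    channel_b: Vec<LeaderId>, // leader for channel B in each slot
}

impl ConcurrentLeaderSchedule {
    /// Both leaders scheduled for a given slot.
    fn leaders_for_slot(&self, slot: usize) -> (LeaderId, LeaderId) {
        (
            self.channel_a[slot % self.channel_a.len()],
            self.channel_b[slot % self.channel_b.len()],
        )
    }
}

fn main() {
    let schedule = ConcurrentLeaderSchedule {
        channel_a: vec![1, 2, 3, 4],
        channel_b: vec![9, 8, 7, 6],
    };
    // Slot 5 wraps around: leader 2 on channel A, leader 8 on channel B.
    println!("{:?}", schedule.leaders_for_slot(5));
}
```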
To understand the specific process, we need to quickly understand Turbine.
The leader splits the block into shreds as it builds it. A batch of 32 data shreds is erasure-coded into 32 coding shreds. Each batch of 64 shreds is Merkle-rooted and signed, and chained to the previous batch.
Each shred is sent over an independent, deterministically random path. The retransmitters for the latest batch sign its root.
From the receiver's perspective, each receiver needs to receive 32 of the 64 shreds from authenticated retransmitters. Any missing shreds are repaired via random repair requests.
This number can be increased or decreased with minimal impact on latency.
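A rough sketch of the batch layout just described; the struct and field names are illustrative assumptions, not the actual Solana shred types.

```rust
// Sketch of a Turbine shred batch: 32 data shreds plus 32 coding shreds,
// committed to with a signed Merkle root and chained to the previous batch.
// Names here are illustrative assumptions, not Solana's real types.
const DATA_SHREDS_PER_BATCH: usize = 32;

struct Shred {
    index: u32,       // position of this shred within the batch
    payload: Vec<u8>, // slice of the block (data) or erasure code (coding)
}

struct ShredBatch {
    shreds: Vec<Shred>,         // up to 64 shreds actually received
    merkle_root: [u8; 32],      // root over all 64 shreds in the batch
    prev_batch_root: [u8; 32],  // chains this batch to the previous one
    leader_signature: [u8; 64], // leader's signature over the root
}

impl ShredBatch {
    /// A receiver can reconstruct the batch from any 32 of the 64 shreds.
    fn is_recoverable(&self) -> bool {
        self.shreds.len() >= DATA_SHREDS_PER_BATCH
    }
}

fn main() {
    let batch = ShredBatch {
        shreds: Vec::new(),
        merkle_root: [0; 32],
        prev_batch_root: [0; 32],
        leader_signature: [0; 64],
    };
    println!("recoverable: {}", batch.is_recoverable()); // false: no shreds yet
}
```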
Assuming that retransmitter sampling for shred paths is sufficiently random and stake-weighted, the stake required to cooperatively partition the network, in terms of both arrival times and data, would have to be far greater than some small fraction ε of stake. If a receiver sees each batch of 32/64 (configurable) shreds arrive within time T, then most likely every other node does as well, because 32 randomly sampled nodes are a large enough sample that they are unlikely to all fall in the same partition by chance.
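As a rough illustration of why 32 independently sampled retransmitter paths make an accidental partition unlikely, consider the calculation below; the partition sizes used are assumptions.

```rust
// If a partition holds a stake fraction f, and retransmitter sampling is
// stake-weighted and independent per shred, the chance that all 32 sampled
// paths land inside that partition is roughly f^32.
fn main() {
    for f in [0.5_f64, 0.33, 0.2] {
        println!(
            "partition with {:.0}% of stake: p(all 32 inside) ≈ {:.2e}",
            f * 100.0,
            f.powi(32)
        );
    }
}
```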
If a partition does occur, consensus needs to resolve it. This does not affect safety, but it is relatively slow.
In the single-block case, each receiver, including the next leader, sees the shred batches arrive for each block. If a block is still incomplete after T milliseconds, the next leader will skip it and build its fork without it. If that leader is wrong, the rest of the nodes will have voted on the block, and the skipping leader's own block will be skipped instead. Non-faulty leaders immediately switch to the heaviest fork indicated by the votes.
In the case of multiple concurrent block transmissions, each node may need to wait up to T milliseconds before voting on the block partition it observes. For two concurrent leaders, the possible outcomes are A, B, or A and B. Extra latency is only added if a block is delayed; under normal operation all blocks should arrive at roughly the same time, and each validator can vote as soon as both have arrived. In practice, T may therefore be close to zero.
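A minimal sketch of that per-slot decision; the enum and function names are illustrative assumptions, not actual consensus code.

```rust
// After waiting up to T milliseconds, a node votes on whichever of the two
// concurrent blocks have arrived.
#[derive(Debug)]
enum SlotVote {
    Both,    // normal case: A and B both arrived in time
    AOnly,   // B was delayed or missing
    BOnly,   // A was delayed or missing
    Neither, // both missing; consensus has to resolve the skipped slot
}

fn vote_after_timeout(a_complete: bool, b_complete: bool) -> SlotVote {
    match (a_complete, b_complete) {
        (true, true) => SlotVote::Both,
        (true, false) => SlotVote::AOnly,
        (false, true) => SlotVote::BOnly,
        (false, false) => SlotVote::Neither,
    }
}

fn main() {
    // Under normal operation both blocks arrive together, so the wait is ~0.
    println!("{:?}", vote_after_timeout(true, true)); // Both
}
```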
The attack to worry about here is whether a leader with a very small amount of stake can transmit its block slightly late at the slot boundary and thereby reliably split the network, forcing it to spend a long time resolving the split through consensus. Part of the network would vote for A, part for B, and part for both A and B, and all three of these partitions would need to be resolved by the consensus mechanism.
Specifically, the goal for neighborhood zero should be to ensure that nodes recover the block at roughly the same time. If an attacker has a cooperating node in neighborhood zero, it can transmit 31 of the 64 shreds normally and have the colluding retransmitter selectively transmit the last shred in an attempt to create a partition. Honest nodes can detect which retransmitters are late, and as soon as any single honest node recovers the block it can push the missing shreds to the rest; a retransmitter can continue as soon as it receives the shred from anywhere or recovers it. Therefore, all nodes should recover the block shortly after one honest node does. Testing is needed to determine how long to wait, whether the timeout should be absolute or weighted by each shred's arrival time, and whether stake-weighted node reputation should be used.
The probability that a colluding leader and retransmitter line up on a given batch is roughly the leader's stake fraction P times the chance of being sampled as a retransmitter for one of the batch's 64 shreds (roughly 64P for small P). An attacker with 1% of the stake could therefore attempt the attack in about half of the shred batches of the blocks it is scheduled to lead, so detection and mitigation need to be robust.
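A small, illustrative version of that estimate; the 64-shred batch size comes from the Turbine description above, and the rest is arithmetic.

```rust
// An attacker with stake fraction p is the scheduled leader with probability
// ~p, and is sampled as a retransmitter for at least one of a batch's 64
// shreds with probability 1 - (1 - p)^64, roughly 64*p for small p.
fn main() {
    let p: f64 = 0.01; // 1% of stake
    let per_batch = 1.0 - (1.0 - p).powi(64);
    // ~0.47: the attack can be attempted in roughly half of the shred batches
    // of the attacker's own blocks.
    println!("p(colluding retransmitter in a batch) ≈ {per_batch:.2}");
}
```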
This attack has minimal impact on the next leader, because asynchronous execution allows unused capacity to be carried forward. So if the current leader forces the next leader's slot to be skipped, and that leader has 4 consecutive slots, the unused capacity of the skipped slot can be carried over, allowing the leader to re-include the skipped slot's transactions.
If a user sends the same transaction to both leaders A and B to increase its chance of inclusion, or to land first in the block, resources are wasted. If this happens, increasing the number of concurrent leaders yields very limited performance improvement, because they would simply be processing twice as many junk transactions.
To avoid duplicate transactions, the top N bits of the fee payer's key determine which leader channel the transaction is valid in; in this example, the highest bit selects A or B. Fee payers must be assigned to an exclusive channel so the leader can be certain the fee payer is valid and has not already spent all of its lamports (the smallest unit of SOL) with another leader.
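A minimal sketch of that channel assignment, assuming the top bit of the fee payer's 32-byte public key is used; the function name and types are illustrative.

```rust
// Assign a fee payer to an exclusive leader channel from the top bit of its
// public key; with 2^N channels, the top N bits would be used instead.
#[derive(Debug, PartialEq)]
enum Channel {
    A,
    B,
}

fn channel_for_fee_payer(fee_payer_pubkey: &[u8; 32]) -> Channel {
    // The highest-order bit of the key selects channel A or B.
    if fee_payer_pubkey[0] & 0x80 == 0 {
        Channel::A
    } else {
        Channel::B
    }
}

fn main() {
    let mut key = [0u8; 32];
    assert_eq!(channel_for_fee_payer(&key), Channel::A);
    key[0] = 0xF0; // high bit set
    assert_eq!(channel_for_fee_payer(&key), Channel::B);
    println!("channel assignment works as expected");
}
```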
This forces a spammer to pay at least twice for a logically identical transaction, but a spammer may still send the duplicate in order to increase its probability of being the first transaction.
To discourage this behavior, users can choose to include an additional, 100%-burned ordering fee on top of the leader's priority fee. Transactions with the highest ordering fees are executed first; otherwise first-in-first-out (FIFO) ordering is used, and ties are resolved with a deterministic random permutation. It is therefore more cost-effective for a spammer to raise its ordering fee and execute first than to pay the inclusion fee twice.
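A hedged sketch of that ordering rule; the struct fields, the mixing function used for the deterministic tie-break, and the notion of an "arrival batch" are illustrative assumptions.

```rust
// Highest ordering fee first, then FIFO by arrival, with remaining ties broken
// by a deterministic pseudo-random permutation seeded per block.
struct PendingTx {
    tx_id: u64,         // stand-in for the transaction signature
    arrival_batch: u64, // transactions received together count as simultaneous
    ordering_fee: u64,  // the extra 100%-burned fee, in lamports
}

// Cheap deterministic mixer standing in for a seeded random permutation.
fn tie_break(seed: u64, tx_id: u64) -> u64 {
    let mut x = seed ^ tx_id.wrapping_mul(0x9E37_79B9_7F4A_7C15);
    x ^= x >> 33;
    x = x.wrapping_mul(0xFF51_AFD7_ED55_8CCD);
    x ^ (x >> 33)
}

fn order_transactions(txs: &mut [PendingTx], block_seed: u64) {
    txs.sort_by(|a, b| {
        b.ordering_fee
            .cmp(&a.ordering_fee) // highest ordering fee first
            .then(a.arrival_batch.cmp(&b.arrival_batch)) // otherwise FIFO
            .then(tie_break(block_seed, a.tx_id).cmp(&tie_break(block_seed, b.tx_id)))
    });
}

fn main() {
    let mut txs = vec![
        PendingTx { tx_id: 1, arrival_batch: 0, ordering_fee: 0 },
        PendingTx { tx_id: 2, arrival_batch: 0, ordering_fee: 5_000 },
        PendingTx { tx_id: 3, arrival_batch: 1, ordering_fee: 0 },
    ];
    order_transactions(&mut txs, 42);
    // tx 2 paid the highest ordering fee, then FIFO: batch 0 before batch 1.
    for tx in &txs {
        println!("tx {}", tx.tx_id); // 2, 1, 3
    }
}
```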
To handle bundles and reordered transaction sequences, the system needs to support bundled transactions, which can attach an ordering fee that covers the ordering cost of the entire sequence. Since a fee payer is only valid in its assigned channel, a bundle can only manipulate ordering within its own channel.
Alternatively, ordering fees may not be necessary at all. If FIFO ordering is used and spammers always pay the priority fee in every channel, it may be possible to simply let the market decide between paying N leaders to increase the chance of inclusion and paying only the nearest leader, the one most likely to include the transaction first.
When there are two concurrent leaders, every system-wide block capacity limit needs to be split between them: not just the total capacity, but each specific limit. For example, with the write-lock limit, no account may be write-locked for more than 6 million compute units (CUs) in each leader's block, and each leader can schedule at most 24 million CUs of transactions in total. This way, even in the worst case, the merged block will not exceed the system-wide capacity limits.
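A minimal sketch of how those per-leader limits could be derived; the 48M-CU block limit and 12M-CU per-account write-lock limit used below are assumed system-wide values implied by the halves quoted above, not authoritative protocol constants.

```rust
// Split every system-wide limit equally across concurrent leaders so the
// merged block can never exceed the caps, even in the worst case.
const BLOCK_CU_LIMIT: u64 = 48_000_000;      // assumed total block limit
const WRITE_LOCK_CU_LIMIT: u64 = 12_000_000; // assumed per-account write-lock limit

fn per_leader_limits(concurrent_leaders: u64) -> (u64, u64) {
    (
        BLOCK_CU_LIMIT / concurrent_leaders,
        WRITE_LOCK_CU_LIMIT / concurrent_leaders,
    )
}

fn main() {
    let (block_cu, write_lock_cu) = per_leader_limits(2);
    println!("per-leader block limit: {block_cu} CUs");           // 24,000,000
    println!("per-leader write-lock limit: {write_lock_cu} CUs"); // 6,000,000
}
```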
This mechanism may lead to fee volatility and resource underutilization, because the priority-fee market is determined by each leader's individual capacity, and each leader knows little about the scheduling state of the other concurrent leaders.
To mitigate resource underutilization and the resulting fee spikes, any unused block capacity should roll over to future blocks. That is, if the current merged block leaves X of the write-lock, total-byte, or total compute unit (CU) limit unused, K*X should be added to the next block's limit, where 0 < K < 1, up to some maximum. Since asynchronous execution can lag the tip of the chain by up to an epoch, the capacity rollover can be fairly aggressive.
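A minimal sketch of that rollover rule; the carry factor K and the hard cap used here are illustrative parameters, not values from the article.

```rust
// If the merged block leaves X units of a limit unused, add K * X to the next
// block's limit, up to some maximum.
fn next_block_limit(base_limit: u64, used: u64, carry_factor: f64, max_limit: u64) -> u64 {
    let unused = base_limit.saturating_sub(used);       // X: capacity left unused
    let bonus = (unused as f64 * carry_factor) as u64;  // K * X, with 0 < K < 1
    (base_limit + bonus).min(max_limit)
}

fn main() {
    // Example: a 48M-CU block that was 80% full, K = 0.5, capped at 1.5x.
    let limit = next_block_limit(48_000_000, 38_400_000, 0.5, 72_000_000);
    println!("next block CU limit: {limit}"); // 52,800,000
}
```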
Based on recent block data, most blocks are typically about 80% full, while write-lock usage is well below 50% of its limit. Generally speaking, there should always be some spare capacity to roll over into future blocks. Since blocks may temporarily exceed the capacity limits, execution must happen asynchronously from consensus. For more on the asynchronous execution proposal, see the APE article.