The Role of Relays in Reorgs

dataalways · December 20, 2024, 7:24am


Special thanks to Quintus, Burak, Christoph, Tesa, and Hasu for review and discussions.

tl;dr

MEV-Boost has helped scale the network and significantly reduced reorg rates, but relay quality is heterogeneous. We analyze reorg rates to study the difference in performance between relays. We focus on slots where the previous block was slow to propagate and did not receive enough attestations, leading the proposer to fall behind the chain tip. During these slots there is a misalignment of expectations between the proposer and block builders, which leads to less competitive auctions and allows the Flashbots relay to capture an outsized share of the market due to the diminished significance of bidding latency.

Introduction

Simple models view blockchains as a sequence of updates to the state of a ledger, but when we incorporate decentralized consensus we start to see disagreement about the validity of these updates. In particular, for updates to be added to the sequence, they must be added in a valid block and one of the requirements for block validity is timely delivery.

Message latency due to geographical separation between voting nodes drives differing views on the timeliness of blocks and therefore differing views on whether updates should be added to the sequence. On Ethereum this latency causes about 1-in-300 blocks to be reorged.

The term reorg conjures fear in many, but to be explicit: there is no evidence of malicious reorgs on Ethereum today, only latency-induced single-block disputes about the canonical state of the chain. The blocks that get reorged were propagated either too late or too slowly for enough validators to record them in their individual ledgers and cement them into the blockchain.

MEV-Boost facilitates a market for the outsourcing of block production and propagation. This report focuses on the role of MEV-Boost in Ethereum reorgs, and more specifically on how out-of-protocol actors (various relays and builders) handle these edge cases. Reorg rates are primarily determined by data transfer (block size and blob counts), block release timing (proposer timing games), and the position of a slot in an epoch (complexity of epoch transitions). The variance between different builders and different relays is mostly a second-order effect, but studying how the MEV supply chain handles these contentious blocks sheds light on the reliability of PBS and provides insight into the latency dynamics within the MEV supply chain.

Figure 1: the share of blocks propagated by each relay that reorg other blocks. Flashbots relay is a clear outlier, prompting the question of why blocks delivered by Flashbots are more likely to induce reorgs than blocks delivered by other relays. This dynamic is analyzed in the second half of this article. Image retrieved October 13, 2024 from reorg.pics.

Methodology

This analysis primarily relies on Xatu data provided by ethPandaOps to identify reorgs. The statistics presented here show higher reorg rates than reorg.pics. This is expected: reorg.pics is populated with data sourced from only one node, whereas reorg data is sparse by definition and should therefore be sourced from a geographically distributed set of nodes to create an accurate and fair picture. In this context, fair alludes to differences in latency between relays and a given validator. For example, a node colocated with ultrasound relay should show accurate reorg rates for ultrasound relay, while reorged blocks from Flashbots relay (located across the Atlantic Ocean) may instead appear as missed slots to that same node because it did not receive the reorged block in time.
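The difference between a single-node view and a multi-node view can be illustrated with a minimal sketch. The sightings, node names, and block roots below are made up for illustration and do not reflect the Xatu schema; the only idea carried over from the text is that a block seen by at least one node but absent from the canonical chain counts as reorged, while a node that never saw it would record a missed slot.

```python
from collections import defaultdict

# Hypothetical sightings: (node, slot, block_root). Not the Xatu schema.
SIGHTINGS = [
    ("node-eu", 100, "0xaa"),
    ("node-us", 100, "0xaa"),
    ("node-eu", 101, "0xbb"),  # only the European node saw this block
    ("node-eu", 102, "0xcc"),
    ("node-us", 102, "0xcc"),
]
# Canonical chain after the fact: slot 101 ended up with no block.
CANONICAL = {100: "0xaa", 102: "0xcc"}


def classify(sightings, canonical, slots, nodes=None):
    """Label each slot 'canonical', 'reorged', or 'missed' for a set of nodes.

    nodes=None means all nodes contribute sightings.
    """
    seen = defaultdict(set)
    for node, slot, root in sightings:
        if nodes is None or node in nodes:
            seen[slot].add(root)

    labels = {}
    for slot in slots:
        # Roots that were observed but are not the canonical block.
        orphans = seen[slot] - {canonical.get(slot)}
        if orphans:
            labels[slot] = "reorged"
        elif slot in canonical:
            labels[slot] = "canonical"
        else:
            labels[slot] = "missed"  # nothing seen, nothing canonical
    return labels


all_nodes = classify(SIGHTINGS, CANONICAL, [100, 101, 102])
us_only = classify(SIGHTINGS, CANONICAL, [100, 101, 102], nodes={"node-us"})
print(all_nodes[101], us_only[101])
```

With all nodes pooled, slot 101 is labelled as a reorg; restricted to the hypothetical US node, the same slot looks like a missed slot, which is the bias a single-node dataset would carry.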

This analysis chooses to treat bloXroute’s relays as a single entity. We make this decision because bloXroute uses the same infrastructure for both their relays resulting in skewed results when comparing blocks that are propagated by individual vs. multiple relays.

Blocks That Are Reorged

Proposer-builder separation (PBS) is a scaling technology. The community tends to focus on the increase in income from outsourcing block building, but relays have also made significant investments in network infrastructure to optimize the propagation of blocks. When proposers sign a block header, relays shotgun the data across the globe. This results in significantly lower reorg rates even for proposers that may be weakly connected to their peers.

Reorg rates at the ultrasound relay are as low as 0.06%, while locally built blocks are reorged more than 10x as frequently. This delta persists despite ultrasound relay introducing artificial latency into the system in order to capture additional MEV for proposers.

Figure 2: seven-day rolling average of the share of blocks reorged by entity. We see significantly lower rates for blocks built through PBS than for blocks built by the proposer.

One downside of the current implementation of PBS is that many relays are closed-source. At times relays implement aggressive optimizations to gain a competitive advantage, but the standard to which these are tested is lower than for changes to the Ethereum protocol. A prominent example of this is the network instability in late March due to the rise of blobscriptions and bloXroute’s untested blob propagation strategy.

Although MEV-Boost has drastically reduced reorg rates compared to local building, not all relays perform equally. Flashbots and Manifold stand out as the most reorged relays, while ultrasound, Aestus, and Agnostic at first glance appear to be reorged the least often.

Figure 3: reorg rates by relay between May 1, 2024 and September 30, 2024.

In Figure 3 we segmented each relay’s net reorg rate into blocks delivered by one relay vs. multiple relays. By default, block proposers submit the signed header of the highest received bid to all connected relays. This means that every relay configured by the validator that saw the winning bid propagates the block, rather than just the relay that delivered the bid. This further reduces reorg rates but makes it difficult to isolate the role played by each relay.
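The solo vs. multi-relay split can be sketched as a simple grouping of payload deliveries by slot. The row format, the relay labels, and the `ALIASES` mapping are illustrative assumptions (the mapping mirrors the decision in the Methodology section to treat bloXroute's relays as one entity); this is not the exact pipeline used for the figures.

```python
from collections import defaultdict

# Assumed labels: collapse bloXroute's relays into one entity.
ALIASES = {
    "bloxroute-max-profit": "bloxroute",
    "bloxroute-regulated": "bloxroute",
}


def solo_share(deliveries):
    """Return {relay: share of its delivered payloads that were solo}.

    deliveries: iterable of (slot, relay) rows, one row per relay that
    delivered the payload for that slot.
    """
    relays_by_slot = defaultdict(set)
    for slot, relay in deliveries:
        relays_by_slot[slot].add(ALIASES.get(relay, relay))

    total = defaultdict(int)
    solo = defaultdict(int)
    for relays in relays_by_slot.values():
        for relay in relays:
            total[relay] += 1
            if len(relays) == 1:  # no other entity delivered this payload
                solo[relay] += 1
    return {relay: solo[relay] / total[relay] for relay in total}


example = [
    (1, "flashbots"),
    (2, "flashbots"), (2, "ultrasound"),
    (3, "bloxroute-max-profit"), (3, "bloxroute-regulated"),
    (4, "ultrasound"),
]
shares = solo_share(example)
print(shares)
```

In this toy input, slot 3 counts as solo for bloXroute because both of its relays collapse into one entity, whereas slot 2 is cooperative for both Flashbots and ultrasound.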

Figure 4: diagram of a block propagated by multiple relays. In this example, a proposer in Northern Africa submits the signed header to both Flashbots relay and ultrasound relay. Although ultrasound is likely the first relay to receive the signed header and begin propagating the block, the direct route to Flashbots relay allows the block to begin propagating in North America earlier than had ultrasound been the only relay to receive the signed header.

To better understand how individual relays perform, we need to remove the noise introduced by blocks that are propagated by multiple relays. Yuki from Fenbushi Capital / Sorella Labs modelled the reduction in missed slot rates due to multi-relay propagation in late 2023, and we replicate his analysis for May to September of this year in Table 1 to show the current state of the network.

Relay	Payloads Delivered	Share of Payloads at Relay That Are Solo	Reorg Rate (Solo)	Reorg Rate (Coop)	Reorg Rate (Total)
Bloxroute	579,694	54%	0.12%	0.06%	0.09%
Ultrasound	563,270	47%	0.06%	0.07%	0.07%
Agnostic	193,335	3%	0.15%	0.08%	0.08%
Flashbots	119,246	23%	1.11%	0.07%	0.32%
Titan	61,010	29%	0.24%	0.05%	0.10%
Aestus	51,789	2%	0.34%	0.06%	0.07%
Manifold	2,988	14%	1.17%	0.12%	0.27%

Table 1: payload delivery statistics by relay between May 1, 2024 and September 30, 2024.

Key takeaways from Table 1:

Agnostic and Aestus are rarely the only relay that delivers a payload. Their average reorg rates are thus depressed due to multi-relay cooperation. Agnostic still performs very well when it is the only relay propagating a block. This also suggests that the market share of these relays is inflated relative to their impact.
Ultrasound’s solo performance is in line with the cooperative performance of other relays. They are impressively fast.
Flashbots and Manifold do not perform well when they are the only relay propagating a block. Their solo reorg rates are above 1%, roughly in line with locally built blocks.
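As a sanity check on how the columns of Table 1 relate, the total reorg rate for each relay should be approximately the solo and cooperative rates weighted by the solo share. The sketch below uses only the rounded figures from Table 1; the function name `blended_rate` is introduced here for illustration.

```python
# (relay, solo share, solo reorg %, coop reorg %, reported total %),
# copied from Table 1 (already rounded in the source).
TABLE_1 = [
    ("Bloxroute", 0.54, 0.12, 0.06, 0.09),
    ("Ultrasound", 0.47, 0.06, 0.07, 0.07),
    ("Agnostic", 0.03, 0.15, 0.08, 0.08),
    ("Flashbots", 0.23, 1.11, 0.07, 0.32),
    ("Titan", 0.29, 0.24, 0.05, 0.10),
    ("Aestus", 0.02, 0.34, 0.06, 0.07),
    ("Manifold", 0.14, 1.17, 0.12, 0.27),
]


def blended_rate(solo_share, solo_rate, coop_rate):
    """Weighted average of solo and cooperative reorg rates."""
    return solo_share * solo_rate + (1 - solo_share) * coop_rate


for relay, share, solo, coop, reported in TABLE_1:
    estimate = blended_rate(share, solo, coop)
    print(f"{relay:<10} blended={estimate:.3f}%  reported={reported:.2f}%")
```

The blended values land within about two hundredths of a percentage point of the reported totals, which is consistent with rounding in the inputs. It also makes the first takeaway concrete: with a solo share of 2-3%, the totals for Agnostic and Aestus are almost entirely determined by their cooperative rate.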

Yet, even isolating non-cooperative propagation doesn’t tell us the whole story. Over the past year, bloXroute, ultrasound, and more recently Aestus relay have introduced artificial latency into the system. This artificial latency, colloquially known as timing games as a service, raises revenue for proposers but puts strain on honest actors. The key dynamic that is driving this is that the expected value of a block increases as more time elapses since the block from the previous slot was built.

When a consensus client requests block headers from multiple relays, it waits until it has received a response from each queried relay (or until a timeout) before choosing the highest bid. Core developers have raised concerns about the impact of these delays on reorg rates when proposers fall back to local block building. The excessive wait time induced by these games means that if the proposer needs to revert to building locally, for example because no bids are above the proposer's min-bid setting, there may not be enough time for the locally built block to be propagated. These delays also impose negative externalities on other relays. For example, when the Flashbots relay returns a header immediately but the proposer waits an additional 500 ms for a response from bloXroute, Flashbots experiences the delay without the benefit. In this scenario, if the Flashbots relay delivers the best bid, it still has 500 ms less to propagate the block, leading to increased reorg rates and worse apparent performance. As relays and proposers continue to optimize their setups to build and propose blocks as late as possible, slower infrastructure will exhibit worse performance even without any explicit changes to the Ethereum protocol. Further, since the precise configuration of these delays is a competitive advantage for relays, we may never be able to analyze the resulting impact on other relays due to a lack of data and transparency.
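The externality in the 500 ms example can be written down as a small model: the proposer's wait is set by the slowest relay (capped by a timeout), and every faster relay loses the difference. The function names, relay latencies, and the 950 ms timeout below are illustrative assumptions, not measured values or actual client settings.

```python
def header_wait_ms(relay_latencies_ms, timeout_ms):
    """Time spent waiting before a bid is chosen.

    The proposer waits for the slowest relay, but never past the timeout.
    """
    return min(max(relay_latencies_ms.values()), timeout_ms)


def time_lost_ms(relay, relay_latencies_ms, timeout_ms):
    """Propagation time a relay loses because of slower peers."""
    wait = header_wait_ms(relay_latencies_ms, timeout_ms)
    return max(0, wait - relay_latencies_ms[relay])


# Assumed numbers mirroring the example in the text: Flashbots answers
# immediately, bloXroute answers after an artificial 500 ms delay.
latencies = {"flashbots": 0, "bloxroute": 500}
TIMEOUT_MS = 950  # illustrative timeout, an assumption

lost_flashbots = time_lost_ms("flashbots", latencies, TIMEOUT_MS)
lost_bloxroute = time_lost_ms("bloxroute", latencies, TIMEOUT_MS)
print(lost_flashbots, lost_bloxroute)
```

Under these assumptions the relay that adds the delay loses nothing, while the relay that answered immediately gives up the full 500 ms of propagation time if its bid wins.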

Blocks That Reorg Other Blocks

Previous sections discussed blocks which were reorged and therefore didn’t land on chain. We now turn to the blocks that immediately follow these reorged blocks, effectively doing the reorging. When we examine the mechanics of a block proposal, we see that the proposer decides whether there is a reorg attempt, and that the relay is simply a middleman. Every slot, each relay accepts bids from builders for multiple distinct parent hashes; in essence they hold a series of auctions without knowing which auction is going to matter. The proposer then submits the parent hash on which they want to build and the relay returns the best bid for that parent hash. This poses the question: if the relay does not control whether a reorg attempt takes place, why does Figure 1 show that the Flashbots relay reorgs more prior blocks than its peers?
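A toy model of this mechanic, assuming a relay that simply keeps the highest bid per (slot, parent hash) pair, is sketched below. The class and method names are hypothetical and the model ignores signatures, bid cancellations, and validation; it only illustrates that the relay runs parallel auctions and that the proposer's requested parent hash selects which one matters.

```python
class RelayAuctions:
    """Toy relay: one independent auction per (slot, parent_hash)."""

    def __init__(self):
        # (slot, parent_hash) -> (value, builder) of the best bid so far
        self._best = {}

    def submit_bid(self, slot, parent_hash, builder, value):
        """Record a builder bid if it beats the current best for that parent."""
        key = (slot, parent_hash)
        if key not in self._best or value > self._best[key][0]:
            self._best[key] = (value, builder)

    def get_header(self, slot, parent_hash):
        """Best bid for the parent the proposer asks for, or None."""
        return self._best.get((slot, parent_hash))


relay = RelayAuctions()
# Most builders expect the proposer to build on the latest block (alpha).
relay.submit_bid(10, "alpha", "builder-a", 0.08)
relay.submit_bid(10, "alpha", "builder-b", 0.10)
relay.submit_bid(10, "alpha", "builder-c", 0.09)
# A single builder shares the proposer's lagging view (beta).
relay.submit_bid(10, "beta", "builder-d", 0.07)

# The proposer, not the relay, decides which auction is settled.
winning = relay.get_header(10, "beta")
print(winning)
```

In this example the relay returns the lone bid on the older parent even though higher bids exist on the newer one, because the choice of parent hash comes entirely from the proposer's request.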

Figure 5: diagram of a reorging MEV-Boost auction. The relay does not have prior knowledge of the proposer’s chain view and accepts blocks from builders for both parent hashes. Although the proposer’s chain view favours parent hash β, the majority of builders are expecting the proposer to call getHeader() on parent hash ɑ. Adapted from: https://writings.flashbots.net/searching-post-merge

In Figure 5 we skewed the number of builders submitting blocks for the two parent hashes. If we only consider MEV-Boost blocks that successfully reorg previous blocks and dive into the auction data, we see that this skew holds true. In Figure 6 we isolate these blocks and show that en masse, even when a reorg does occur, builders are caught off-guard and have a different view of the tip of the blockchain than the proposer. Intuitively this makes sense: builders are well connected entities with highly optimized infrastructure, whereas poorly connected proposers are more likely to see late-arriving blocks as invalid.

Figure 6: histogram of the share of block bids submitted building on the latest parent hash vs. on stale parent hashes (only considering slots that resulted in reorgs). Of the 934 blocks in the sample, the median slot had less than 4% of bids building on the stale parent hash. This shows that builders and proposers are not in sync about the state of the chain.

The result of the mismatch of expectations between well connected builders and less well connected proposers is that the auctions for these reorging blocks are significantly less competitive. In Figure 7, we see that the median auction for reorging blocks has only 41 unique bids across all relays, while those same auctions have a median of 1,128 bids that are not reorging. This represents a reduction of over 96% in bid counts, and by extension competitiveness, in these auctions.
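The reduction quoted above follows directly from the two medians. A quick check, using only the figures stated in the text:

```python
reorging_median_bids = 41        # median unique bids on the reorging parent
non_reorging_median_bids = 1128  # median unique bids on the latest parent

# Relative reduction in bid count, and the reorging share of all bids.
reduction = 1 - reorging_median_bids / non_reorging_median_bids
reorging_share = reorging_median_bids / (
    reorging_median_bids + non_reorging_median_bids
)

print(f"reduction in bids: {reduction:.1%}")
print(f"reorging share of bids: {reorging_share:.1%}")
```

The second ratio is only a rough cross-check against Figure 6, since a ratio of medians is not the same as the median per-slot share, but it falls below the 4% quoted there.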

Figure 7: cumulative distribution of unique bids between parent hashes in PBS auctions where the proposer requested a reorging parent hash. We see significantly fewer bids in these reorging auctions, leading to less competitive auctions with fewer builders participating.

If we think about these reorging blocks naïvely, their effective slot durations are twice as long as a normal block. Twice the duration should mean twice the order flow, and therefore blocks that are twice as valuable. In Figure 8, we plot a histogram of the comparative value of the best bid at +500ms into a slot (approximately the median winning bid arrival time for all blocks). However, due to the reduced competition, we do not see a skew towards reorging blocks that are twice as valuable. In fact, the median best reorging bid in our sample was 8% less valuable than if the proposer had seen and chosen to build on the newer parent hash. In our sample, there was no market for reorgs.

Figure 8: histogram of relative values of the best reorging vs. non-reorging bid before +500ms during slots where the block ended up reorging the prior block. We found a median relative value of 92%, suggesting that the median proposer lost 8% of their revenue by inducing a reorg. Although the long tail shows that in some cases time-bandit attacks may be net-profitable, the reduced competitiveness of these auctions leads to a reduction in revenue for proposers that induce reorgs.

One byproduct of these less competitive auctions is a shift in relay market structure. Ordinary MEV-Boost auctions have become dominated by bidding latency, with geographically distributed builder networks operated by the same entities choosing to only submit bids to the nearest relays in order to not bid against their other pubkeys. However, in these reorging auctions, the differing chain view of builders reduces the importance of latency and encourages builders to submit matching bids to more distant relays—because their machines colocated at distant relays expect a different parent hash. This leads to an increase in matching bids across relays and an increase in market share for slower and smaller relays, in particular the Flashbots relay.

Rates of reorged and reorging blocks at the Flashbots relay are an outlier amongst the major relays with most reorged blocks being solo propagated. We can loosely attribute reorged blocks to unoptimized infrastructure leading to higher latency and late propagation. However, when we invert the situation and focus on blocks that reorg the previous block, speed and infrastructure no longer play a significant role. In Figure 9, we see that the majority of reorging blocks at the Flashbots relay are now non-exclusive and shared with other relays. This suggests that Flashbots is not causing these reorgs, but instead gains relative market share through cooperative block propagation in these less competitive auctions.

Figure 9: the share of blocks propagated by each relay between May 1, 2024 and September 30, 2024 that reorged other blocks. We note that Flashbots has a significantly higher share than any other major relay, but that these blocks are not exclusive to Flashbots relay. Manifold is not present in the figure; its market share is only about 10% of the smallest relay in the figure, which leads to an extremely sparse and unreliable dataset.

Another disparity between these reorging auctions and normal auctions is builder market share. To submit bids and win these reorging auctions, a builder needs to have the same latent chain view as the proposer, which favors less well connected and more geographically distributed entities. Well connected builders in geographically favorable locations (near the relay for the previous block) are more likely to be building on the latest block and therefore not participating in the reorging auction. By contrast, if a builder is further away from the source of the previous block then it is more likely to fall behind the chain tip and view the previous block as invalid—when the proposer is also out of sync with the latest block, reorg attempts occur. The result is a drop in market share for the top three (and best connected) builders and a large increase from smaller builders; notably Flashbots, blockbeelder, and Builder+.

Block Builder	Market Share (all blocks)	Market Share (reorging)
beaverbuild	50%	41%
Titan Builder	35%	4%
rsync-builder	10%	7%
Flashbots	2%	31%
Blockbeelder	< 1%	5%
Builder+	< 1%	4%
Others	< 2%	8%

Table 2: builder market share between May 1, 2024 and September 30, 2024 for all blocks and for only blocks that reorg other blocks. We see a large drop for Titan Builder in reorging auctions, suggesting that they have a well-connected builder network that is usually building on the latest block. We see a large increase for Flashbots, suggesting that they are more frequently building on latent parent hashes.

Concluding Thoughts

Ethereum is designed with redundancy in mind; proposer-builder separation extends this principle by encouraging the propagation of blocks from many relays. However, our ecosystem’s failure to appropriately fund relays has driven these infrastructure providers to distinguish themselves from their competition. In this case, by adding artificial latency to the system, relays are able to gain a last-mover advantage. Although the participating relays hold up well amidst these intentional delays, little attention has been paid to the negative externalities forced upon honest relays.

Our analysis also showed that although latency leads to more reorgs, block builders are not anticipating them, leading to less competitive auctions when they do happen. The reduced competition drives down block values leading to lower revenue for proposers despite theoretically having access to twice the order flow.

Open Questions

If PBS as a scaling technology relies on blocks being propagated by multiple relays, what is the impact of removing the relay in ePBS or TEE-Boost?
How large is the impact of relays adding artificial latency on reorg rates at non-participating relays or local building? How sensitive are reorg rates to the duration of client timeouts?
How large a role does the geolocation of proposers play in reorg rates? Is the distribution of proposers for blocks relayed only by Flashbots or Manifold measurably different from other relays? What about position in the network topology?
How often are builders building on the previous parent hash when there isn’t a reorging hash submitted by the proposer? Are the reorging auctions meaningfully different than regular auctions?
Does the type of MEV extracted differ in reorging auctions from normal auctions? Is there less non-atomic arbitrage?
Should proposers broadcast the parent hash on which they intend to build to relays and builders in order to get the most competitive bids?

potuz · December 21, 2024, 1:34am

This was a great read, which I haven’t really absorbed yet, but wanted to leave a couple of quick cellphone comments

Locally built blocks having higher reorg count than ultrasound is simply that they count the relays that timed out and not only those nodes that decided not to even ask the relay. There’s always been some guessing towards home stakers running in constrained environments, but in fact locally built blocks are mostly made from nodes actually connected to MEV boost and using min bid or similar. An analysis with these taken out would probably shed a different view.

I find fascinating the information about less bidding for reorging blocks. I would venture to guess that here the problem may be a mixture of client diversity and connectivity of builders. If builders are highly connected they’ll see the previous block as head. I’m not sure how much builders are

building for both heads even though they see the incoming block as canonical
adjusting their bidding strategy for both actively.

Really not into the depths of how builders do this but what I would do is monitor attestations more than the event stream for head