Overview
High-availability (HA) L2 sequencer setups increase the variance of cross-chain, a.k.a. "inter-sequencer", communication latencies, and therefore the worst-case communication delay. By leveraging TEEs to gather honest measurements of network latencies, similar to ideas in DePIN, we can construct a leader-election schedule that rotates sequencers in a way that minimizes the worst-case latency between pairs of sequencers across multiple L2 networks. Whether this is overoptimization or overengineering is left as an exercise to the reader.
Context
There are two relevant pieces of context to understand this proposal.
1. High-Availability L2 Sequencers
Under the hood, all OP Stack chains aim for "high availability" by rotating sequencers and/or having multiple standby sequencers. This ensures someone is always ready to propose a block, albeit in a permissioned setting. These are usually out-of-the-box web2 HA setups: typically crash-fault-tolerant, Paxos-style software. Some teams run each of these sequencers in a different data center, and some even run them on different continents to reduce the possibility of single points of failure. I have yet to see any run in multiple cloud environments on different continents.
2. Optimism-Style “In-Cluster” Interoperability
Optimism is shipping an interoperability protocol that will allow chains in the same "interop set" (aka "in-cluster"), and specifically the Superchain at first, to pass cross-chain messages secured by the underlying fault proof and dispute game architecture. A consequence of this is that chains must now reorg with each other, or at least must do so whenever an "initiating" cross-chain message is reorged out of the source chain after its "executing" message has been included on the destination chain.
In the happy path, the latency of a cross-chain message is approximately the network delay between the two sequencers. There is additional complexity in that a finality gadget as well as equivocation protection may be introduced to reduce the possibility of a reorg, but these typically manifest as added delay on cross-chain messages.
The Problem
We have two L2 networks, Network A and Network B. Each network has multiple TEE-equipped nodes that can become the leader (sequencer). Different pairs of nodes (one on Network A, one on Network B) can have significantly different communication latencies. A random schedule might frequently pair two physically distant leaders, causing high cross-chain message delays. We want a rotation schedule that often pairs two close leaders, thereby minimizing the worst-case cross-network latency.
To illustrate, suppose we have a simple example with 5 total TEE nodes:
- Network A: Nodes 1, 2, 3
- Network B: Nodes 4, 5
During a single epoch, the nodes measure (using TEEs) the following latency matrix (in milliseconds):
|        | Node 1 | Node 2 | Node 3 | Node 4 | Node 5 |
|--------|--------|--------|--------|--------|--------|
| Node 1 | —      | 8      | 15     | 30     | 35     |
| Node 2 | 9      | —      | 12     | 25     | 28     |
| Node 3 | 16     | 7      | —      | 40     | 42     |
| Node 4 | 32     | 27     | 40     | —      | 5      |
| Node 5 | 34     | 26     | 42     | 6      | —      |
We see that Node 1 ↔ Node 2 is very fast (8–9 ms), while cross-network latencies between A-nodes and B-nodes can be as high as 42 ms. If the schedule randomly pairs (Node 3, Node 4), we risk a 40 ms inter-chain delay. But if we carefully choose pairs like (Node 2, Node 4) or (Node 1, Node 4), we can keep inter-chain delays closer to 25–30 ms.
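To make the selection concrete, here is a minimal sketch (in Python) that scans the cross-network entries of the matrix above and picks the leader pair with the lowest worst-direction RTT. The node labels, helper names, and the max-of-both-directions cost are illustrative assumptions, not part of any existing implementation.

```python
# Minimal sketch: pick the cross-network leader pair with the lowest
# worst-direction RTT from the example matrix above. Names are illustrative.
from itertools import product

# latency[i][j] = measured RTT (ms) from node i to node j, per the table above
latency = {
    1: {2: 8, 3: 15, 4: 30, 5: 35},
    2: {1: 9, 3: 12, 4: 25, 5: 28},
    3: {1: 16, 2: 7, 4: 40, 5: 42},
    4: {1: 32, 2: 27, 3: 40, 5: 5},
    5: {1: 34, 2: 26, 3: 42, 4: 6},
}

network_a = [1, 2, 3]  # nodes on Network A
network_b = [4, 5]     # nodes on Network B

def pair_cost(a: int, b: int) -> int:
    # Use the worse of the two measured directions as the pair's effective latency.
    return max(latency[a][b], latency[b][a])

best_pair = min(product(network_a, network_b), key=lambda ab: pair_cost(*ab))
print(best_pair, pair_cost(*best_pair))  # -> (2, 4) with 27 ms
```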
Solution
Below is a protocol for selecting which node on Network A and which node on Network B become leaders during each epoch, aiming to minimize the maximum cross-chain latency.
Two Chains, n and m Block Builders
- Chain A has n TEE nodes: \{A_1, A_2, \dots, A_n\}.
- Chain B has m TEE nodes: \{B_1, B_2, \dots, B_m\}.
Each node A_i (resp. B_j) measures its Round Trip Time (RTT) to each node in the other network using TEEs for authenticity and correctness.
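For concreteness, a single measurement could be represented as a small attested record like the following. This is a hypothetical sketch of a data format: the field names, the attestation field, and the worse-of-both-directions aggregation are assumptions for illustration, not a specification.

```python
# Hypothetical shape of a single attested RTT measurement produced inside a TEE.
# Field names and the attestation mechanism are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class AttestedPing:
    epoch: int          # epoch in which the measurement was taken
    src_node: str       # e.g. "A_1"
    dst_node: str       # e.g. "B_2"
    rtt_ms: float       # measured round-trip time in milliseconds
    attestation: bytes  # TEE quote binding the measurement to the enclave

def latency_matrix(pings: list[AttestedPing]) -> dict[tuple[str, str], float]:
    """Aggregate attested reports into L(A_i, B_j), keeping the worse of the
    two measured directions as a conservative estimate."""
    matrix: dict[tuple[str, str], float] = {}
    for p in pings:
        a, b = sorted((p.src_node, p.dst_node))  # direction-agnostic key
        matrix[(a, b)] = max(matrix.get((a, b), 0.0), p.rtt_ms)
    return matrix
```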
Protocol Steps
- TEE Node Discovery & Pings
  - Every TEE node on both networks (A and B) discovers the other nodes.
  - Once per epoch, each A_i pings each B_j, recording L(A_i, B_j). Symmetrically, B_j pings A_i.
- Onchain Publication
  - At epoch's end, each node posts its collected ping results onchain, in an attested format.
  - The onchain aggregator constructs a global latency matrix: L = \bigl( L(A_i, B_j) \bigr)_{i \in \{1,\dots,n\},\, j \in \{1,\dots,m\}}.
- Leader Rotation Calculation
  - We define a schedule over E upcoming epochs. In each epoch e, exactly one node from A and one node from B become leaders: (A_{i_e}, B_{j_e}) \quad \text{for} \quad e = 1, 2, \dots, E.
  - Our objective is to choose \{(i_e, j_e)\} so as to minimize the worst-case cross-network latency: \min_{\{(i_e, j_e)\}} \left(\max_{e \in \{1,\ldots,E\}} L\bigl(A_{i_e}, B_{j_e}\bigr)\right). Other variants might aim to minimize the sum of latencies or incorporate weighting for certain epochs; a sketch of this calculation appears after this list.
- New Epoch Commences
  - For epoch e, the protocol enforces that A_{i_e} is the active leader on Chain A, and B_{j_e} is the active leader on Chain B.
  - Because we specifically choose pairs with lower latency, cross-chain message passing during epoch e is faster.
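Below is a minimal sketch of the schedule calculation in step 3, assuming the aggregated matrix is already available (offchain or inside the TEE). Note that without any additional rotation constraint the min-max objective is trivially satisfied by repeating the single best pair, so the sketch also shows a hypothetical variant that forces each (A, B) pair to be used at most once; that constraint is an assumption of this example, not part of the proposal.

```python
# Sketch of the leader-rotation calculation (not a reference implementation).
# latency[(i, j)] = L(A_i, B_j) in milliseconds, aggregated from attested pings.
from itertools import product

def minmax_schedule(latency: dict[tuple[int, int], float],
                    a_nodes: list[int],
                    b_nodes: list[int],
                    epochs: int,
                    distinct_pairs: bool = False) -> list[tuple[int, int]]:
    """Return [(i_e, j_e)] for e = 1..E minimizing the maximum L(A_i, B_j).

    With no extra constraint the optimum simply repeats the globally best pair;
    with distinct_pairs=True (a hypothetical rotation rule) we take the E
    cheapest distinct pairs instead, which is optimal for the min-max objective.
    """
    pairs = sorted(product(a_nodes, b_nodes), key=lambda ij: latency[ij])
    if not distinct_pairs:
        return [pairs[0]] * epochs
    if epochs > len(pairs):
        raise ValueError("not enough distinct pairs for the requested epochs")
    return pairs[:epochs]

# Example with the cross-network entries from "The Problem" section
# (worse-of-both-directions values, as in the earlier sketch).
L = {(1, 4): 32, (1, 5): 35, (2, 4): 27, (2, 5): 28, (3, 4): 40, (3, 5): 42}
print(minmax_schedule(L, [1, 2, 3], [4, 5], epochs=3, distinct_pairs=True))
# -> [(2, 4), (2, 5), (1, 4)]; worst-case cross-chain latency 32 ms
```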
Open Questions & Future Work
- Algorithm Computation
  Very small sets may be feasible to compute onchain; it is unlikely that larger sets can be. As long as the inputs are posted onchain, this is likely acceptable, since the leader-schedule calculation retains integrity inside the TEE. There is also likely a trade-off between the cost of posting this data to L1 frequently and the benefit derived from that frequency. On top of this, the data should likely be posted to the common chain of the two L2s, i.e. the L1.
- Algorithm Complexity
  As n and m grow large (with many epochs E), computing an optimal schedule becomes more complex. This may require heuristics or approximation algorithms inspired by bipartite matching or other combinatorial optimization techniques; one such direction is sketched below.
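One plausible direction, for a variant that requires distinct leader pairs across a window of epochs and optimizes total rather than worst-case latency, is to cast the window as an assignment problem. The sketch below uses SciPy's Hungarian-algorithm solver; it is an illustration of the bipartite-matching direction mentioned above, with illustrative pair costs, not a prescribed algorithm.

```python
# Illustration of the bipartite-matching direction: assign each of E epochs a
# distinct (A-leader, B-leader) pair minimizing total latency over the window.
import numpy as np
from scipy.optimize import linear_sum_assignment

# Rows: candidate (A_i, B_j) pairs; columns: epochs. Here the cost of running a
# pair in any epoch is just its latency, but per-epoch weights could be added.
pairs = [(1, 4), (1, 5), (2, 4), (2, 5), (3, 4), (3, 5)]
pair_latency = np.array([32, 35, 27, 28, 40, 42], dtype=float)
E = 3  # epochs in the scheduling window

cost = np.tile(pair_latency[:, None], (1, E))  # shape: (num_pairs, E)
rows, cols = linear_sum_assignment(cost)       # Hungarian algorithm
schedule = sorted(zip(cols, (pairs[r] for r in rows)))
print(schedule)  # e.g. [(0, (2, 4)), (1, (2, 5)), (2, (1, 4))]
```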