Benchmarking Op-stack Sequencers: Design & Implementation

This post shares an internal design doc that describes techniques for benchmarking Op-stack sequencers at production scale.

The design doc outlines the design principles, while this link contains implementation details and documentation for our testing tool built on these concepts for op-rbuilder.


Introduction

This RFC describes a technique for benchmarking an Op-stack sequencer with production-scale state sizes. Instead of waiting for a network to organically grow its state, we propose methods to artificially simulate large state conditions by leveraging the Engine API architecture and existing chain data.

The approach focuses on two key aspects: simulating gigabytes of state data and driving the Execution Layer directly without the complexity of running a full op-node. This enables early performance testing under realistic state conditions, helping identify potential issues before they impact production systems.

Goals/Non-goals

Goals:

  • Define technical mechanisms to simulate gigabytes of state
  • Describe how to drive the EL directly without requiring an op-node

Non-goals:

  • Describing specific benchmark tooling implementation
  • Defining developer experience (DevEx) for the benchmark system
  • Specifying how to simulate transactions and load patterns
  • Creating metrics or specific performance tests

Motivation

Benchmarking an Op-stack sequencer presents a fundamental challenge centered around state size and its impact on performance. Sequencer performance depends heavily on the size and structure of chain state, so benchmarks run against empty state produce misleading metrics. In production environments, chain state can accumulate to hundreds of gigabytes, dramatically altering performance characteristics compared to testing environments with minimal state.

For a builder, waiting for a network’s state to grow naturally introduces significant risk: performance issues may only become apparent after months of state accumulation, when they’re already impacting users.

The core challenge lies in creating a benchmarking environment that can simulate realistic state sizes proactively, without waiting for organic chain growth, while providing consistent and reproducible results that can help identify performance bottlenecks before they impact production systems.

Proposal

The proposed solution leverages the fundamental architecture of Op-stack to create a benchmarking environment that can simulate large state sizes without requiring organic network growth. This document focuses specifically on the technical mechanisms that enable state size simulation, rather than describing the complete benchmark system implementation or developer experience. Our goal is to establish the foundational techniques that will later inform the development of comprehensive benchmarking tools.

Engine API Simulation

Op-stack follows the same architectural pattern as Ethereum L1, separating the system into two main components: a Consensus Layer (CL) and an Execution Layer (EL). The CL, implemented as op-node in Op-stack, handles consensus rules and determines which blocks should be included in the chain. The EL processes and executes these blocks, maintaining the chain’s state.

The Execution Layer operates passively, waiting for commands from the op-node through Engine API requests:

  1. forkchoiceUpdatedV3 with payload attributes triggers asynchronous block building
  2. getPayloadV3 is called after the configured block time has passed to retrieve the block
  3. newPayloadV3 executes and validates the block (redundant, since the EL already processed the block while building it, but still performed)
  4. forkchoiceUpdatedV3 without payload attributes updates the chain head

Since the EL operates by responding to these API calls, we can create a lightweight test harness that mimics an op-node by making these Engine API calls in the correct sequence.
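The four steps above can be sketched as a single function that drives one block through the EL. This is a minimal sketch, not the op-rbuilder tester itself: `build_block`, `send`, and `sleep` are hypothetical names, and `send` is assumed to be a transport that posts an authenticated JSON-RPC request to the EL's Engine API endpoint and returns the `result` field. Pointing safe and finalized at the head is a simplification for benchmarking.

```python
import time
from typing import Any, Callable, List

# Hypothetical transport: (method, params) -> JSON-RPC "result".
Send = Callable[[str, List[Any]], Any]


def build_block(send: Send, head_hash: str, attrs: dict,
                block_time: float = 2.0,
                sleep: Callable[[float], None] = time.sleep) -> str:
    """Drive one block through the EL and return the new head hash."""
    # Assumption: safe and finalized simply track the head in a benchmark.
    state = {"headBlockHash": head_hash,
             "safeBlockHash": head_hash,
             "finalizedBlockHash": head_hash}

    # 1. FCU with payload attributes starts asynchronous block building.
    fcu = send("engine_forkchoiceUpdatedV3", [state, attrs])
    payload_id = fcu["payloadId"]
    if payload_id is None:
        raise RuntimeError("EL did not start building a payload")

    # 2. Wait one block time, then fetch the built payload.
    sleep(block_time)
    payload = send("engine_getPayloadV3", [payload_id])["executionPayload"]

    # 3. Hand the payload back for execution and validation.
    #    Params: payload, expected blob versioned hashes, parent beacon root.
    status = send("engine_newPayloadV3",
                  [payload, [], attrs["parentBeaconBlockRoot"]])
    if status["status"] != "VALID":
        raise RuntimeError("payload rejected: " + status["status"])

    # 4. FCU without payload attributes makes the new block the head.
    new_hash = payload["blockHash"]
    new_state = {"headBlockHash": new_hash,
                 "safeBlockHash": new_hash,
                 "finalizedBlockHash": new_hash}
    send("engine_forkchoiceUpdatedV3", [new_state, None])
    return new_hash
```

Calling `build_block` in a loop, feeding each returned hash into the next call, plays the role that op-node would otherwise play.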

For a practical implementation and demo of this approach, see the code and this walkthrough video.

State Initialization

The EL is agnostic to the origin of its state data, which means we can initialize it with any existing state. We can take advantage of this by using an archive of state data from networks that already have hundreds of gigabytes of accumulated state, such as Base testnet or mainnet.

However, when using existing state data, we don’t have access to the private keys of any accounts in that state, making it impossible to simulate transactions from these accounts. To overcome this limitation, we can use Deposit transactions, a special type of transaction in Op-stack that allows minting balance in L2 accounts. The CL node includes these transactions in the payload attributes of the forkchoiceUpdatedV3 (FCU) request.

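Deposit transactions carry their sender as an explicit from field rather than a signature that the EL verifies. Even when tooling routes them through a signing step, any private key will do: the signer doesn’t need to have any existing state or balance in the L2 chain.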

For example, a Deposit transaction that mints 1212121212121212 wei in the account 0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92200 would look like this:

{
  "sourceHash": "0x0000000000000000000000000000000000000000000000000000000000000000",
  "from": "0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92200",
  "to": "0x0000000000000000000000000000000000000000",
  "mint": "1212121212121212",
  "value": "0",
  "gasLimit": "210000",
  "isSystemTransaction": true,
  "input": "0x"
}
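To place such a transaction in payload attributes, it has to be serialized. The sketch below assumes the deposit encoding from the OP Stack specification, the type byte 0x7E followed by the RLP list of the eight fields in the order shown above, and uses a small hand-written RLP encoder so that it has no dependencies. The helper names (`rlp_encode`, `uint`, `encode_deposit`) are hypothetical.

```python
def _length_prefix(length: int, offset: int) -> bytes:
    """RLP length header for a string (offset 0x80) or list (offset 0xC0)."""
    if length < 56:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """Minimal RLP encoder for byte strings and (nested) lists of them."""
    if isinstance(item, (bytes, bytearray)):
        data = bytes(item)
        if len(data) == 1 and data[0] < 0x80:
            return data  # a single low byte encodes as itself
        return _length_prefix(len(data), 0x80) + data
    body = b"".join(rlp_encode(x) for x in item)
    return _length_prefix(len(body), 0xC0) + body


def uint(value: int) -> bytes:
    """Big-endian integer without leading zeros; zero is the empty string."""
    return value.to_bytes((value.bit_length() + 7) // 8, "big")


def encode_deposit(source_hash: bytes, sender: bytes, to: bytes, mint: int,
                   value: int, gas: int, is_system_tx: bool,
                   data: bytes) -> str:
    """Return the 0x-prefixed hex of a serialized Deposit transaction."""
    fields = [source_hash, sender, to, uint(mint), uint(value), uint(gas),
              uint(int(is_system_tx)), data]
    return "0x" + (b"\x7e" + rlp_encode(fields)).hex()


# Field values mirror the JSON example above.
raw_deposit = encode_deposit(
    source_hash=bytes(32),
    sender=bytes.fromhex("f39Fd6e51aad88F6F4ce6aB8827279cffFb92200"),
    to=bytes(20),
    mint=1212121212121212,
    value=0,
    gas=210000,
    is_system_tx=True,
    data=b"",
)
```

The resulting hex string is the form in which a transaction appears in the `transactions` list of the payload attributes.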

We can leverage this mechanism in our benchmarking setup: when triggering block building through the forkchoiceUpdatedV3 call, we include Deposit transactions in the payload attributes to create accounts with balances that we can control for testing.
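As a rough illustration, the payload attributes passed to forkchoiceUpdatedV3 could be assembled as follows. The field names follow the Op-stack extension of the Engine API payload attributes (`transactions`, `noTxPool`, `gasLimit`) as we understand it; the function name, the zeroed `prevRandao` and `parentBeaconBlockRoot`, and the default fee recipient are placeholder assumptions, and newer hardforks may require additional fields.

```python
from typing import List

ZERO_HASH = "0x" + "00" * 32
ZERO_ADDRESS = "0x" + "00" * 20


def payload_attributes(timestamp: int, deposits: List[str], gas_limit: int,
                       fee_recipient: str = ZERO_ADDRESS,
                       no_tx_pool: bool = False) -> dict:
    """Build payload attributes that force-include the given raw deposits.

    `deposits` holds 0x-prefixed serialized Deposit transactions. They are
    included at the start of the block in the order given.
    """
    return {
        "timestamp": hex(timestamp),           # must exceed the parent's
        "prevRandao": ZERO_HASH,               # placeholder for benchmarks
        "suggestedFeeRecipient": fee_recipient,
        "withdrawals": [],
        "parentBeaconBlockRoot": ZERO_HASH,    # placeholder for benchmarks
        "transactions": deposits,              # forced Deposit transactions
        "noTxPool": no_tx_pool,                # False: also pull from mempool
        "gasLimit": hex(gas_limit),
    }


attrs = payload_attributes(timestamp=1700000002,
                           deposits=["0x7e00"],  # dummy bytes, not a real tx
                           gas_limit=30_000_000)
```

Leaving `noTxPool` false lets the block builder fill the rest of the block from its transaction pool, which is what a sequencer benchmark would normally want once the funded accounts start sending regular transactions.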

For a practical implementation and demo of this approach, see the code and this walkthrough video.

Load Testing

While this proposal focuses on state size simulation, transaction patterns and load testing are equally critical for comprehensive sequencer benchmarking. These aspects require simulating realistic user behaviors, transaction mixes, and varying load patterns. However, this is intentionally out of scope for this proposal.

These load testing requirements will be covered by our internal tool, contender.


This is great! Thanks for sharing!