I’d like to host a call in around two weeks’ time to discuss the status of mev-boost (and, by extension, the relay and builder implementations). I will return with an exact time once I have more clarity on some agenda items, but the call will be in early February (the week of the 6th, or potentially in the first few days of the month).
I am posting this now so I can circulate it to all the relevant parties and soft-reserve some calendar time. Timezone-wise, I’m thinking around 2 or 3pm UTC. Please comment if you would prefer a different time of day.
as for the agenda:
- builder-specs update for the Capella hard fork
- builder-specs update for the Deneb (EIP-4844) hard fork
- relay and builder implementation-readiness for the Capella hard fork
- testing and security updates
- open time for research updates or discussion around mev-boost and the builder ecosystem
I have a few more agenda items in mind around a mev-boost roadmap for this year and will update here once they are more concrete. If you have anything you’d like to discuss, please comment below.
Here is a rough transcript from the call. If any information was missed or misrepresented, please call it out and I will add/fix it. For those who did not attend the call, reviewing the side chat above provides additional context. I did not capture the extended discussion after the call; anyone who wants to add it is welcome. Thank you to all of the organizers and participants involved.
builder-specs updates for capella: v0.3.0
The main thing here is updating these for Capella. This is the core API that consensus clients use to talk to the builder network that mev-boost helps orchestrate. v0.3.0 was released last week.
For Capella, the main work is updating the API types to support the new formats: withdrawals in Capella mean changes to the builder APIs.
There is another repo of relay specs that Flashbots currently maintains. The only change here is accepting block builder submissions in both Bellatrix and Capella versions. It looks releasable, but there may be an issue with how it’s displayed in the Swagger UI.
So the main change was supporting either type across the fork in the block-submission APIs. The main implementation of that so far is the Flashbots relay; if you want to build a conformant relay, these relay specs are the place to start.
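To make the "either type across the fork" idea concrete, here is a minimal Go sketch of one way a relay endpoint could dispatch on a version field. The type names are illustrative assumptions, not the actual relay-specs or Flashbots types.

```go
// Sketch of accepting block submissions across a fork boundary via a
// version-discriminated wrapper. Names are illustrative only.
package main

import (
	"encoding/json"
	"fmt"
)

// VersionedSubmission wraps a builder submission with an explicit
// consensus version, so one endpoint can accept both formats.
type VersionedSubmission struct {
	Version string          `json:"version"` // "bellatrix" or "capella"
	Data    json.RawMessage `json:"data"`    // fork-specific payload, decoded later
}

// HasWithdrawals reports whether the submission's fork includes the
// withdrawals field introduced in Capella.
func HasWithdrawals(s VersionedSubmission) (bool, error) {
	switch s.Version {
	case "bellatrix":
		return false, nil
	case "capella":
		return true, nil
	default:
		return false, fmt.Errorf("unknown version %q", s.Version)
	}
}

func main() {
	raw := []byte(`{"version":"capella","data":{}}`)
	var sub VersionedSubmission
	if err := json.Unmarshal(raw, &sub); err != nil {
		panic(err)
	}
	ok, _ := HasWithdrawals(sub)
	fmt.Println(sub.Version, ok) // capella true
}
```

The point of the wrapper is that the endpoint stays stable across forks; only the set of accepted versions and their payload decoders changes.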
builder-specs updates for deneb
Specs generally ~ There is another PR for Deneb (EIP-4844). The changes will be a bit more involved for the upcoming fork with respect to the builder APIs, because now we have blobs. There are still design decisions being made about how blobs are passed back and forth, and that has implications for the builder APIs, so the PR could still change.
Wrt 4844 ~ The main thing is whether we have one blob vs many uncoupled blobs; you essentially need to send many unlinked blobs or blinded roots, but I believe the PR reflects those changes.
Capella hard fork readiness
Implementation of the changes in all the various software we have out there. A big one was getting all the relay operators together to coordinate for the upcoming hard fork.
Rough overview of where the Flashbots stack stands
Changes were needed at lots of layers, including the relay, builder, validation nodes, and the CL client; mev-boost itself is all wrapped up.
We use our Prysm fork to trigger the block-building process, and we have now added a data endpoint to get withdrawals so a relay can validate them when a builder submits a block, filtering out invalid withdrawals.
The mev-boost relay on our side depends on our custom Prysm fork, which adds an additional endpoint. We use the custom Prysm fork to trigger block building, and for the Capella upgrade we introduced another get_withdrawals endpoint so the relay can verify the withdrawals in block submissions. Eventually the SSZ endpoint would provide that functionality, standardized across CL clients.
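The validation step described here is essentially "fetch the expected withdrawals from the CL, compare against what the builder submitted." A rough Go sketch of the comparison half, with types that are assumptions for illustration (not Prysm's actual API shapes):

```go
// Sketch of the withdrawal check a relay might run on a block
// submission. The Withdrawal struct is illustrative; in practice the
// expected list would come from the CL's get_withdrawals endpoint.
package main

import (
	"fmt"
	"reflect"
)

type Withdrawal struct {
	Index     uint64
	Validator uint64
	Address   string
	Amount    uint64
}

// WithdrawalsMatch reports whether a builder's submitted withdrawals
// are exactly the ones the CL says belong in this slot, in order.
func WithdrawalsMatch(expected, submitted []Withdrawal) bool {
	return reflect.DeepEqual(expected, submitted)
}

func main() {
	expected := []Withdrawal{{Index: 7, Validator: 42, Address: "0xabc", Amount: 1000}}
	fmt.Println(WithdrawalsMatch(expected, expected)) // true
	fmt.Println(WithdrawalsMatch(expected, nil))      // false
}
```

A submission failing this check would be filtered out before its bid is ever considered, which is why the relay needs CL access to this data at all.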
Overall on the relay there are 4 open PRs, each building on the previous one, to separate out the individual changes. They all run on our relay on the Zhejiang network; we’ve gotten a few thousand blocks through. All of the changes throughout the stack are there and documented in the side chat. We are wrapping up merging into the main branches; early next week I think the work will be done.
All changes are there and mostly complete: no more major changes, just polishing and merging. The next steps are participating in the Sepolia and Goerli upgrades. It would be great to have many relays involved in the tested upgrades.
There was one PR to expose withdrawals. If we step back for a second: if with each fork we have to keep adding things for builder functionality, it will be tedious to keep adding new endpoints. So I think instead we are leaning towards having one SSZ endpoint that links everything together.
The main thing is having relays get access to this information. Right now they need a Prysm fork; ideally you could plug any client in. There are other use cases for having this information exposed as well, so I think it makes sense to standardize this SSZ endpoint.
So far the encouraging thing is that randao and withdrawals are both very cheap for the CL to serve, so there is no real “cost” to doing this.
Hopefully we don’t have to add anything more in the future. Blobs could get hectic if we have to send a lot of data around. One open question is how this handles building on multiple heads.
Sepolia is happening next Monday or Tuesday. We could aim for Goerli for relays and coordinate async to make it happen. If Sepolia goes well, the feeling is Goerli happens two weeks after that ~ mid-March seems realistic.
The Ultrasound relay runs on Goerli.
Blocknative: not running Sepolia; will be testing on Goerli and will be ready to test with a Goerli relay.
Brief update on Zhejiang
Chris: We put the whole stack up there and keep deploying, working with different CL teams to test the whole stack. I think we are now at the place where there are no errors and everything works post-fork. The next interesting point in time will be the actual transition from before the upgrade to after the upgrade on the next testnet; that will be very useful for testing everything.
One thing we can do with hard-fork testing is actually test the boundary. Mario at the EF is working on Hive support, which is pretty new; Hive is an automated client-testing framework, and they have been testing the circuit-breaker idea in it. Cool to see the builder support, and the results of how all the different clients behaved under the circuit-breaker scenario. The move is towards essentially having the mev-boost relay and the different components under different hard-fork scenarios: before, through, and after the fork. Generally we have devnets to test pre- and post-fork; the part we get fewer shots at is the actual transitions. To the extent we can have automated testing there, that is very valuable.
mev-rs boost and builder components
Alex: Still pretty early; bandwidth-constrained on what I can build out. I put together this thing called the mempool builder. The idea is to emulate the builder APIs using the mempool of a local execution client. I think clients are already using it for testing. One example for the future: say the Deneb changes are done ahead of time, then we could do early testing on the CL side with that stack. If anyone is interested, check out the repo.
What do I mean? The idea is that there is a consensus client keeping track of consensus and an EL client, and there is another component, this builder, just building blocks from the mempool. Rather than having to run a Prysm fork, you can instead run one piece of software that orchestrates all the right pieces at the right time.
Terrance - when relays say they won’t be able to test on Goerli, is it because the code is not ready or because they don’t have any place to test? If the code is ready, I feel like you should just test today. At EF devops we run devnets every week, so we can easily plug in the relay and test. There is no reason to wait until Goerli unless the code is not ready. If there is a bug, it will be a very stressful 2-3 weeks.
Justin: For USR, we are a 2-person team with a very limited budget; running a relay costs $1-2k per month, if not more. We are very constrained from a human-resources standpoint. To be effective as a relay you need builders to connect, and I have spent the last 2-3 months trying to convince builders. We could spin up dummy builders, but that also adds to the complexity of setting things up. We also need to coordinate with validators to have a meaningful connection.
Terrance: We do have a mock builder and a mock validator.
Chris: The FB builder can easily run on any testnet. You can even send fake bundles to get blocks through. The builder can act as both a builder and a relay validation node in one instance: if you run a relay on top of it, the builder fulfills both the validation-node and block-building tasks on a testnet in a single process. That’s about as easy as it gets, but it’s still as involved as running a Geth instance. If you have funds, it should be possible to get winning blocks through any test network.
Alex: We do have a lot of automated tooling at the EF. If we can coordinate with the relays so they can point to a software release that is ready, we can start including that in all the relay testing. This warrants a more focused call on relay testing and that strategy generally.
Blocknative: would be happy to work with a testnet. We were working on upgrades internally but haven’t committed any code to the repo yet to accommodate Capella.
Alex: I’m hearing two things. First, relay operators have a software set that they run; that is what Terrance is getting at: if there is a fixed software artifact, how can we test it? Second, at the end of the day relay operators have actual deployments, and it may be good to test those too. Given the overhead, it’s tricky to have every operator spin up an instance for every devnet. A good compromise, to the extent we can manage it: if you can produce most of the software you plan to run, we can put it in this automated testing framework; the actual deployments we would test on Sepolia, Goerli, and the other longstanding testnets that relay operators already support.
It sounds like just Flashbots will be operating a relay for Sepolia. This is fine. For future hard forks it would be nice to push for more, and I think there is a lot we can do around automated testing.
You can always run relays. The validator set is closed on Sepolia, so you just have a stable test state vs. Goerli. Validators still do have to connect, but the idea is that this is something the relay would have to coordinate. If it seems like it’s worth testing, we should be able to.
Local payload fallback upon relay failure
Terrance: One more thing worth bringing up. Post-Shapella, clients will have this feature: they will compare the block value between the local block and the builder block and choose whichever is highest. I don’t think this is going to matter much in terms of the relay and mev-boost, because the validators will not sign it anyway.
An important change with Capella was to the engine API: you can ask via the local pathway or the remote pathway, and we now return the value of the block in the local pathway too. There are a number of reasons to do this around censorship resistance; it is a very cool thing added in Capella.
Will mev-boost update its min-bid strategy? I don’t think anyone has thought about changing mev-boost. The CL client says: regardless of what mev-boost sends me, take the highest of the remote bid and the local bid. Separately, there is still the min-bid option: you need a remote bid above that floor for it to get into the beacon node at all, and separately you have this local comparison happening upstream.
Research Updates ~ Optimistic Relay
TLDR ~ Change how the block submission flow works.
Non-optimistic block submission - block validation happens before a bid is marked active for the builder: the relay sends the execution payload to the validation nodes, the validation nodes confirm that it’s valid, and only then does the bid become active and able to win the auction. All of that has to happen before get_header is called; from the proposer’s perspective, get_header returns the max bid from the relay.
Optimistic block submission - mark the bid active immediately. When the builder submits a block, we immediately say this bid is active and can win the auction. The validation happens asynchronously: we queue up the simulation of the block against the validation nodes in a separate goroutine, after the bid has been marked active.
The reason to do this is that winning bids often come in at the end of the slot, and validation takes some amount of time: it is a call between the relay and the validation nodes, and there is some time involved in simulating the block against Geth. If we remove those extra milliseconds at the end of the slot, the winning bid can come in closer to the 12-second mark and thus contain more MEV for both the builder and the proposer.
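The optimistic flow above can be sketched as a toy Go program: mark the bid active immediately, run the simulation in a separate goroutine, and demote the builder if the block turns out invalid. Everything here is illustrative; the real relay's data model and demotion logic are more involved.

```go
// Toy sketch of optimistic block submission: bid active first,
// validation async, demotion on failure.
package main

import (
	"fmt"
	"sync"
)

type Relay struct {
	mu         sync.Mutex
	activeBids map[string]uint64 // builder pubkey -> bid value
	demoted    map[string]bool   // builders knocked out of optimistic mode
}

func NewRelay() *Relay {
	return &Relay{activeBids: map[string]uint64{}, demoted: map[string]bool{}}
}

// SubmitOptimistic marks the bid active right away and queues the
// simulation asynchronously. It returns a WaitGroup so callers can
// wait for the async check to finish.
func (r *Relay) SubmitOptimistic(builder string, value uint64, simulate func() bool) *sync.WaitGroup {
	r.mu.Lock()
	r.activeBids[builder] = value // bid can win the auction immediately
	r.mu.Unlock()

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // validation happens after the bid is already active
		defer wg.Done()
		if !simulate() {
			r.mu.Lock()
			delete(r.activeBids, builder) // pull the invalid bid
			r.demoted[builder] = true     // back to pessimistic mode
			r.mu.Unlock()
		}
	}()
	return &wg
}

func main() {
	r := NewRelay()
	wg := r.SubmitOptimistic("0xbuilder", 100, func() bool { return false })
	wg.Wait()
	fmt.Println(r.demoted["0xbuilder"]) // true: invalid block leads to demotion
}
```

The danger window is exactly the gap between "bid active" and "simulation done": if get_header is served during that gap and the block later fails simulation, you get the missed-slot scenario discussed next.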
If you serve an invalid block to the proposer and they sign that header, and then no valid block is produced, you have a missed slot; this is the worst-case scenario.
Defense ~ hold collateral for the builder, so that any builder who submits a block that wins the auction but doesn’t produce a valid block on chain has their collateral used to repay the proposer who lost out on the slot.
Discussion (Mike & Justin)
In terms of making a refund to the proposer, the relay has all the information it needs, because it has the fee recipient from the registration message.
There have been refunds done by various companies: Blocknative, bloXroute, maybe Manifold also. Bugs happened and refunds were given; I don’t think there were any issues with the refunds. The proposer gets compensated with a delay (usually they would receive the execution rewards immediately, accessible in the EVM). Our goal with USR will be to make any refund within 24 hours.
In terms of reputational damage for the relay itself, it should be fairly minimal if there is an understanding and education within the community that it was not the relay that was at fault but the builder, and that the relay did the best it could to repay the proposer promptly.
One thing we intend to do as a relay operator is to be very transparent every time we make such a refund: a log on GitHub detailing why we made the refund, for example exposing the bad block so everyone can verify that it’s invalid. If and when there is a bad event, make sure to ask the builder to fix whatever bug caused the issue. Another possible idea is a time-out or cool-down period, maybe one day. In addition to punishing the builder financially (the builder loses funds, a pure financial loss), they would be disqualified from the optimistic relay for 24 hours, which would lead to further losses because they would forgo winning any auctions. You can also have an additional refund fee, a fixed fee of 0.1 ETH etc., to be more stringent.
We don’t expect these things to happen often in practice. If and when there is a builder bug, it’s overwhelmingly likely that it won’t lead to a loss for the proposer. Roughly speaking, we’re seeing builders provide 50 blocks in a single slot, and they are competing with many other builders, so on the order of 1000 submissions per slot. If even one is detected to be invalid, even a non-winning one, it will lead to the builder being demoted. So if there is a bug, there is a ~99% chance it will be caught early and lead to a demotion without the bad block ever being the winning block that requires a refund. In practice we don’t expect to be doing refunds, but we do want all of the infra in place, especially from a policy standpoint, to discourage bugs on the builders’ side to start with.
Another optimization - one of the bottlenecks of the block-submission flow is the actual data: the bytes of the block getting into the relay. Blocks can be big, and both receiving them and unmarshalling them into a Go object is part of the slowness. By having builders submit only headers, or by parsing the header before the body and marking the bid active immediately, we could get further performance improvements. We would then need logic that says: if the header wins the auction but the body is not available, wait some amount of time and hope the body comes over the network. This introduces more complexity, which is why we are not proposing it in the initial version.
The design goal is that we are not going to significantly increase the number of missed slots; ideally a missed slot should be an extremely rare event. By making a refund and doing a postmortem, hopefully we will be able to see if this causes any network degradation, and we can turn it off immediately.
In the worst case it’s one missed slot per builder per manual intervention; it will be a single isolated missed slot. One of the things we observe empirically is that every day there are on the order of 10 orphaned blocks. In addition, about 1% of blocks are empty because the validator did not show up. These cause zero network degradation. Optimistic relaying will be at least an order of magnitude, maybe several orders of magnitude, less degradation than what we already have.
From a design perspective, we have designed the Ethereum blockchain to be WW3-resilient. Even if 90% of blocks were empty, the Ethereum chain would keep on running.
The intent is to merge this into the FB relay; would it be on by default or opt-in? The way we are doing development now is running on Goerli, planning to push to prod and start with a small set of one or two trusted builders. The nice thing about the patch is that it runs the Flashbots relay in non-optimistic mode by default; it takes a manual intervention to modify the database and turn a given builder pubkey into an optimistic builder. The default strategy is the status quo of today. We made the PR mainly for visibility reasons. It is definitely opt-in per builder, and we expect that by starting with a very small set of builders and a relatively small amount of collateral we can see how it runs and whether there is anything to learn from it.
Our expectation is that, because we are in a race to zero, in a few months/weeks if you are not running in optimistic mode you will not be relevant as a relay.
We have invited builders for early testing via the Flashbots Discord. We will keep the order of collateral low, say 1 ETH per builder.
Other relays besides Ultrasound might want to adopt this but might be held back by the financial and legal implications of holding collateral. One idea we have to address this is what we are calling a builder guarantor. Under this model, the relay would make it known to the builder that a slot was missed, and the builder would be responsible for refunding the proposer that missed the slot. If the builder declines to pay the proposer back, the guarantor would be responsible for refunding the proposer. Builder reputation far exceeds a single slot payment. This would hopefully simplify a lot of things and allow different forms of guarantors to back the builders who are doing block submissions.
There is a way to run the relay without taking any collateral, at least no financial collateral: you use reputational collateral or legal collateral. One of the things we have observed is that as a builder your reputation is worth a lot, because it gives you access to bundle flow and proprietary transaction flow, both of which make you relevant as a builder. We believe the reputation of builders is worth at least an order of magnitude more than 1 ETH, if not several. What we are willing to do is not take collateral from most builders, at least the identifiable builders that have reputation, and expect that if and when a bad block lands on chain they make the refund directly to the proposer without any sort of intermediary.
What happens if the builder decides to burn their reputation? In that case there would be some type of guarantor that would step in and make the proposer whole, that guarantor potentially being the relay operator. One model we are considering for initial testing, where there is only 1 ETH of collateral per builder: we basically credit builders 1 ETH of collateral for free, with the understanding that if there is a bug on their side they make the proposer whole, and with the understanding on the proposers’ side that if a builder doesn’t make them whole, we step in as guarantor of last resort.
If a block has a value higher than the collateral, we process it in the non-optimistic way: the first thing we do is check whether the value exceeds the collateral of the builder or guarantor. That way there will never be a situation where a proposer misses out on a huge MEV opportunity and there is insufficient collateral to repay them if they sign an invalid header. Higher collateral does imply that a higher share of a builder’s submitted blocks will be processed optimistically, but in the long tail, super-high-MEV blocks are rare enough that this doesn’t seem like a large issue. The more collateral you have, the more of an edge you have, so we want to limit the amount of collateral per builder; it could be a limit of 1 ETH. The good news here is that if you look at the MEV distribution over all blocks, roughly 95% of blocks have less than 1 ETH in rewards.
We want to improve the censorship resistance of Ethereum, and it’s perfectly fine if we do so on 95% of blocks; the 5% tail would go through status-quo pessimistic validation. It is important that the limit is the same for every builder and that the limit is low. If the price of ether increases dramatically, we can reduce the size of the ether bond over time. One possible policy would be to cap the amount of collateral so as to target 95% of blocks. This is more of a policy thing for relays, and we encourage relay operators to have a reasonable policy there in terms of the cap.
Good point, I edited the title with the meeting index + date so we can keep things in order more easily.
I think our forum is a superb place to host these conversations! It is intuitive for anyone to get up to speed on previous calls, and it improves discoverability and cross-referencing/communication between topics. I’m not that worried about cluttering; if that becomes an issue we’ll find a solution.