This is a brainstorming thread to collect more specific details on what it means for a relay to be trustworthy. Please reply with your requirements. This topic will be updated to reflect the main findings.
With the current design of proposer/builder separation that will go live at the merge, the relay is still a trusted mediator between proposers and builders.
In particular:
it can steal MEV opportunities from builders
it can lie about the amount to be paid to the proposer
it can withhold the block body after it has been signed by the proposer
it can deliver an invalid block to the proposer
it can filter out any transactions it dislikes
So a relay has to promise to make its best effort to avoid those situations; in other words, it has to promise to be trustworthy.
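Some of the failures listed above can at least be detected after the fact, even if they cannot be prevented. Here is a minimal sketch of such a check from the proposer's side. The `Bid` and `Delivered` types, their fields, and `audit_relay` are hypothetical names made up for illustration; they are not part of mev-boost or any relay API, and the sketch only covers a lied-about payment, a withheld body, and a body that does not match the signed header.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Bid:
    """What the relay promised (hypothetical shape, for illustration only)."""
    block_hash: str
    value_wei: int


@dataclass(frozen=True)
class Delivered:
    """What the proposer actually observed (hypothetical shape)."""
    block_hash: str
    proposer_payment_wei: int


def audit_relay(bid: Bid, delivered: Optional[Delivered]) -> List[str]:
    """Return a list of detected problems; an empty list means none were found."""
    if delivered is None:
        # The header was signed but no block body ever arrived.
        return ["payload withheld"]
    problems = []
    if delivered.block_hash != bid.block_hash:
        problems.append("payload does not match signed header")
    if delivered.proposer_payment_wei < bid.value_wei:
        problems.append("payment below bid value")
    return problems
```

Stolen MEV and transaction filtering are not visible from this vantage point, so they would need a different kind of monitoring.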
This could be broken down into the first round (getting bids) and the second round (getting the full blocks).
Just want to point out that there is currently a 1-second timeout for mev-boost to provide any available bids before clients fall back to their local builder.
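To make the fallback behaviour concrete, here is a rough sketch of the idea, not the actual client or mev-boost code. `fetch_relay_bid` and `build_local_block` are hypothetical callables standing in for the relay request and the local builder, and the 1-second default is simply the value quoted above.

```python
import concurrent.futures
import time

BID_TIMEOUT_SECONDS = 1.0  # value quoted in the thread; may change


def get_block_with_fallback(fetch_relay_bid, build_local_block,
                            timeout=BID_TIMEOUT_SECONDS):
    """Try the relay first; fall back to the local builder on timeout or error.

    Both callables are hypothetical stand-ins supplied by the caller.
    Returns a (source, result) tuple where source is "relay" or "local".
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fetch_relay_bid)
    try:
        bid = future.result(timeout=timeout)
        if bid is not None:
            return ("relay", bid)
    except concurrent.futures.TimeoutError:
        pass  # the relay was too slow
    except Exception:
        pass  # the relay request failed outright
    finally:
        # Do not block on a slow relay call; let it finish in the background.
        pool.shutdown(wait=False)
    return ("local", build_local_block())


if __name__ == "__main__":
    def slow_relay():
        time.sleep(0.3)  # simulate a relay that answers too late
        return {"value_wei": 123}

    print(get_block_with_fallback(slow_relay, lambda: "locally built block",
                                  timeout=0.05))
```

The point for relay performance requirements is that a bid arriving after the deadline is worth the same as no bid at all.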
This now seems the easier, cheaper, and more forward-looking option. I’m aware this brings challenges to Lido and maybe some other node operators, so we have to continue building on top of it to come up with a solution that is satisfactory for everybody.
At the very least, it has to be documented. From there, the fancier the better. I imagine a system where creating a tag publishes the source and the binaries, which triggers integration tests in staging, then a canary deployment in production, and then a full release. Everything would be automated, with manual gates to move to the next stage.
This is, of course, hard. We have been improving our infrastructure, we are hiring a second devops engineer, and plan to share more about the way we run our relay. We have been supporting other teams that run relays, and I would be very happy to collaborate with them on improving our release processes.
When it comes to performance, I think we should add uptime here. However, I do think it is important that the /eth/v1/builder/status endpoint be dynamic and return the last block number/header or a timestamp. Otherwise, there’s nothing preventing a relay from pointing its /eth/v1/builder/status endpoint at a statically served page to create artificial uptime. Curious to hear thoughts on this.
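As a rough illustration of what an uptime monitor could do if the status endpoint were dynamic, here is a minimal sketch. It assumes a hypothetical JSON body of the form `{"timestamp": <unix seconds>}`, which is the proposal above and not what the endpoint returns today; `is_status_fresh`, `MAX_AGE_SECONDS`, and `MAX_SKEW_SECONDS` are likewise made-up names and values.

```python
import json
import time

MAX_AGE_SECONDS = 36   # assumption: roughly three 12-second slots
MAX_SKEW_SECONDS = 2   # assumption: tolerated clock skew between monitor and relay


def is_status_fresh(body, now=None, max_age=MAX_AGE_SECONDS,
                    max_skew=MAX_SKEW_SECONDS):
    """Return True only if the status body carries a recent timestamp.

    A statically served page fails this check: either it has no parsable
    timestamp at all, or its frozen timestamp ages past max_age.
    """
    now = time.time() if now is None else now
    try:
        payload = json.loads(body)
        reported = float(payload["timestamp"])
    except (ValueError, KeyError, TypeError):
        return False  # empty, non-JSON, or missing/invalid timestamp
    age = now - reported
    return -max_skew <= age <= max_age
```

A monitor would count a poll as "up" only when the HTTP request succeeds and this check passes. Returning the last block number or header instead of a timestamp would work the same way, with the monitor comparing it against its own view of the chain.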