I’d like to modify the data API validator registration endpoint to be more private. Given a list of validator pubkeys, it’s possible to download all of the registrations and run analytics on them. One particular concern is that the registrations could be used to group validators that were previously not associated. Because validators send registrations in batches, registrations for validators on the same machine will most likely have the exact same timestamp. This information could facilitate DoS attacks.
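To make the timestamp concern concrete, here is a minimal sketch of the kind of analysis this enables. The flattened record shape and all values are made up for illustration; only the idea of bucketing by `timestamp` comes from the point above.

```python
from collections import defaultdict

# Hypothetical registrations as downloaded from a relay; all values are made up.
registrations = [
    {"pubkey": "0xaa", "timestamp": 1663000000},
    {"pubkey": "0xbb", "timestamp": 1663000000},
    {"pubkey": "0xcc", "timestamp": 1663000417},
]

def group_by_timestamp(regs):
    """Bucket pubkeys by registration timestamp and keep buckets with more than one key."""
    buckets = defaultdict(list)
    for reg in regs:
        buckets[reg["timestamp"]].append(reg["pubkey"])
    return {ts: keys for ts, keys in buckets.items() if len(keys) > 1}

# Pubkeys sharing an exact timestamp were likely registered in the same batch.
clusters = group_by_timestamp(registrations)
```

A shared timestamp is only a hint, not proof, that two validators run on the same machine.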
There are multiple ways to go about this.
Remove the validator registration data API endpoint.
Return “yes” or “no” to whether the validator is registered, instead of the registration.
Require some type of signed message to get the registration.
The endpoint is very useful to solo stakers that want to verify their once-a-year block will actually be built how they want.
I agree that signing the message would be the best way to go, but it’s also the most complex - can it be easily done?
Otherwise, how about confirming whether an exact registration is seen - pubkey + gas + fee recipient? I think that wouldn’t leak as much information as the current API while still being useful.
Yes, this is a useful endpoint for both solo stakers and staking pools. It seems it would still achieve its goal without exposing the feeRecipients.
One idea could be that the API returns only yes or no about whether this is an active (or known?) validator on the relay.
It could also be an idea to allow sending pubkey+feeRecipient/gasLimit, and have the API return whether that’s the latest known/active registration (for someone to check the latest feeRecipient without exposing it publicly).
Lastly, staking pools want a way to check many pubkeys at once. This API could also accept an array of such entries and return all the responses at once.
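A rough sketch of what that check could look like on the relay side, assuming the relay keeps the latest registration per pubkey. The `LATEST` store, the function names, and the sample values are all hypothetical, not an existing relay API.

```python
# Hypothetical in-memory view of the latest registration per pubkey (made-up values).
LATEST = {
    "0xaa": {"fee_recipient": "0x1111", "gas_limit": 30000000},
}

def is_latest_registration(store, pubkey, fee_recipient, gas_limit):
    """Return True only if the submitted values match the latest known registration."""
    known = store.get(pubkey.lower())
    if known is None:
        return False
    return (
        known["fee_recipient"].lower() == fee_recipient.lower()
        and known["gas_limit"] == gas_limit
    )

def check_many(store, queries):
    """Batch variant: one yes/no answer per query, in the same order as the input."""
    return [is_latest_registration(store, **query) for query in queries]

answers = check_many(LATEST, [
    {"pubkey": "0xAA", "fee_recipient": "0x1111", "gas_limit": 30000000},
    {"pubkey": "0xaa", "fee_recipient": "0x2222", "gas_limit": 30000000},
    {"pubkey": "0xbb", "fee_recipient": "0x1111", "gas_limit": 30000000},
])
```

In this sketch an unknown pubkey and a mismatched fee recipient both come back as "no", so the answer does not reveal which of the two was wrong.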
This would resolve my biggest concern (timestamps) but it would still be possible for anyone to check if a validator is connected to a relay at a given point in time. That information (pubkey, fee recipient, gas limit) is mostly public. You can assume 30000000 for the gas limit and look at the most recent (post-merge) block to determine the fee recipient.
I think this is okay though. It’s a good balance between privacy and usability. Requesting a signed message would require several changes to validators/remote-signers and I don’t think that’s worth it.
The gas_limit field could be optional. I don’t think the fee_recipient should be optional though. Only return found entries. For example, if the first one exists and the second one doesn’t:
Or, if you want to be minimalist, an array of found public keys:
[
"<a_pubkey>"
]
There should be some limits/restrictions. For example, limit the number of entries to 1,000 per request. This would prevent someone from sending a request with 10,000,000 duplicate entries.
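One way such a restriction might be enforced before doing any lookups, as a sketch only. The 1,000 cap is the number suggested above; the function name and the choice to silently drop duplicates rather than reject them are assumptions.

```python
MAX_ENTRIES = 1000  # per-request cap suggested above

def validate_request(entries):
    """Reject oversized requests, then drop duplicate entries while keeping order."""
    if len(entries) > MAX_ENTRIES:
        raise ValueError(f"too many entries: {len(entries)} > {MAX_ENTRIES}")
    seen = set()
    unique = []
    for entry in entries:
        key = (
            entry["pubkey"].lower(),
            entry["fee_recipient"].lower(),
            entry.get("gas_limit"),  # None when the optional field is omitted
        )
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique
```

Checking the length first means an oversized request is rejected before any per-entry work is done.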
It’s an important detail whether fee_recipient should be optional!
Requiring it would make the API even more private, because you would already need to know both values and couldn’t look anyone up by pubkey alone. The downside is that it would probably make it a little harder to casually check for a valid registration.
If there are no specific needs otherwise, then I think there’s a strong argument for going with the more private solution (requiring pubkey and fee_recipient as inputs to the APIs, with gas_limit optional).
Definitely agree, returning the set of found entries is more intuitive. However, the real actionable information we need is the complement of that set.
Perhaps returning two arrays, registered and unregistered?
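A sketch of how the two-array response could be assembled, with fee_recipient required and gas_limit optional as discussed earlier in the thread. The `registered` and `unregistered` keys follow the suggestion above; the store layout, function name, and sample values are hypothetical.

```python
def partition(store, entries):
    """Split the requested pubkeys into registered and unregistered lists."""
    result = {"registered": [], "unregistered": []}
    for entry in entries:
        known = store.get(entry["pubkey"])
        matches = (
            known is not None
            and known["fee_recipient"] == entry["fee_recipient"]
            # gas_limit is only compared when the caller supplied it
            and ("gas_limit" not in entry or known["gas_limit"] == entry["gas_limit"])
        )
        result["registered" if matches else "unregistered"].append(entry["pubkey"])
    return result

# Made-up store and request for illustration.
store = {"0xaa": {"fee_recipient": "0x1111", "gas_limit": 30000000}}
response = partition(store, [
    {"pubkey": "0xaa", "fee_recipient": "0x1111"},
    {"pubkey": "0xbb", "fee_recipient": "0x2222", "gas_limit": 30000000},
])
```

Returning both arrays gives a pool operator the complement directly, without diffing the response against the request.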