We recently introduced Flashnet, an anonymous broadcast protocol in development that aims to provide participants with censorship resistance and strong anonymity at much lower latency overhead than existing protocols, with bandwidth high enough for most blockchain use cases.
In this post I'd like to explore one of the protocol's novel aspects: auction-based scheduling of messages. The idea is quite simple, but it seems to make for a much more practical approach.
The playful prototype I have implemented is a simple modification of ZIPNet (Low-bandwidth anonymous broadcast from (dis)Trusted Execution Environments), and the new scheduling is only one of many improvements we are developing for Flashnet.
Message scheduling in ZIPNet
In short, ZIPNet broadcasts anonymous messages through a large vector. For the protocol to function, it is critical that messages within the broadcast vector do not collide with each other, as colliding messages would simply come out scrambled. To ensure that doesn't happen, clients participate in a separate scheduling step beforehand.
When preparing to send a message, each client computes a pseudorandom fingerprint for its message, along with the scheduling slot (position) it will attempt to reserve. Each client then inserts its fingerprint at the computed slot in the scheduling vector, simply by xor-ing it in place. The scheduling vector is initially all zeroes, so after this step it contains the client's message fingerprint at some pseudorandom location.
The clients jointly reveal only the sum of the individual scheduling vectors, hiding who contributed which fingerprint. More concretely, each client encrypts its scheduling vector by xor-ing it with a pseudorandom pad so as not to reveal the fingerprint in plaintext. The encrypted scheduling vectors are then aggregated (xor-ed) together, and eventually, after being sent across the network, the pads are removed (xor-ed out), revealing the xor of all the clients' initial scheduling vectors with their individual fingerprints. Each client then checks whether its fingerprint has remained unaltered at its location; if so, it has been allocated the corresponding indices in the (much larger) message vector for the next round.
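To make this concrete, here is a minimal Python sketch of one xor-based scheduling round. The parameters, helper names, and flat pad handling are mine for illustration, not ZIPNet's actual construction (in the real protocol the pads are shared with the servers, which strip them):

```python
# Minimal sketch of xor-based scheduling (illustrative, not ZIPNet's API).
import hashlib
import secrets

SLOTS = 8       # slots in the scheduling vector
FP_BYTES = 4    # fingerprint width per slot

def fingerprint_and_slot(message: bytes, round_nonce: bytes):
    # Derive a pseudorandom fingerprint and slot from the message.
    digest = hashlib.sha256(round_nonce + message).digest()
    fp = digest[:FP_BYTES]
    slot = int.from_bytes(digest[FP_BYTES:FP_BYTES + 4], "big") % SLOTS
    return fp, slot

def scheduling_vector(fp: bytes, slot: int) -> bytearray:
    vec = bytearray(SLOTS * FP_BYTES)           # initially all zeroes
    for i, b in enumerate(fp):                  # xor the fingerprint in place
        vec[slot * FP_BYTES + i] ^= b
    return vec

def xor_vectors(a: bytes, b: bytes) -> bytearray:
    return bytearray(x ^ y for x, y in zip(a, b))

nonce = secrets.token_bytes(16)
messages = [b"alice's message", b"bob's message"]
pads = [secrets.token_bytes(SLOTS * FP_BYTES) for _ in messages]

# Each client encrypts its scheduling vector with its pad.
encrypted = []
for msg, pad in zip(messages, pads):
    fp, slot = fingerprint_and_slot(msg, nonce)
    encrypted.append(xor_vectors(scheduling_vector(fp, slot), pad))

# Aggregate the encrypted vectors, then strip the pads (done by servers
# in the real protocol), leaving the xor of all scheduling vectors.
aggregate = bytearray(SLOTS * FP_BYTES)
for vec in encrypted:
    aggregate = xor_vectors(aggregate, vec)
for pad in pads:
    aggregate = xor_vectors(aggregate, pad)

# Each client checks whether its fingerprint survived unaltered.
for msg in messages:
    fp, slot = fingerprint_and_slot(msg, nonce)
    won = bytes(aggregate[slot * FP_BYTES:(slot + 1) * FP_BYTES]) == fp
    print(f"{msg!r}: slot {slot} {'reserved' if won else 'collided'}")
```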
Properties of ZIPNet’s scheduling
The xor-based scheduling is quite simple and decently performant, but it doesn't schedule messages efficiently, and plenty of bandwidth in the message channel goes to waste. When two fingerprints collide, neither party writes a message to that slot, wasting its bandwidth. The probability that some indices collide during scheduling is high even under light load, and the issue is not easily mitigated.
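To put a rough number on it, here is a back-of-the-envelope birthday-style estimate with illustrative parameters (not ZIPNet's):

```python
# Chance that a given client's randomly chosen slot collides with
# at least one of the other clients' slots.
n, m = 128, 1024                          # clients, scheduling slots
p_collide = 1 - (1 - 1 / m) ** (n - 1)
print(f"{p_collide:.1%} of clients lose their slot on average")
# -> roughly 11.7%, even though the vector is only 12.5% full
```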
Another property, though not necessarily a downside in all circumstances, is that messages are scheduled essentially at random. That is just fine for some use cases, but if messages in the overall system differ in urgency or utility, there is simply no way to account for it.
Lastly, scheduling in ZIPNet only allocates constant-sized buckets, which is problematic if messages vary in size. Fragmenting the messages would usually solve this problem, but because slots are allocated at random, only some of a message's fragments may pass through the channel in a single round.
Optimal scheduling
The ideal scheduling protocol would always allocate all of the bandwidth to messages, schedule messages atomically (even if they have to be fragmented), and schedule according to the messages' utility within the context of the broader system. A protocol fulfilling all three would be far more efficient, if not optimal: it would maximize the total utility moving through the anonymous broadcast channel.
One way to realize these goals would be to run some kind of knapsack or auction for packing the message broadcast vector. But before we can get there, we need to introduce a new data structure that can transmit metadata (the size and utility of a message in addition to its fingerprint) while keeping the anonymity guarantees.
Invertible Bloom Lookup Tables (IBLTs)
In essence, an invertible Bloom lookup table supports three operations we care about: inserting a chunk of data, aggregating two IBLTs together, and recovering all previously inserted chunks of data with high probability. The implementation follows Invertible Bloom Lookup Tables with Less Memory and Randomness for its improved efficiency.
Internally, the IBLT is a multi-layer structure, with each layer consisting of two vectors: one holding data buckets (in our case, the short hash, utility, and size of a message) and one counting the chunks in each data bucket. The data buckets and the chunk counters are encoded as finite field elements, so they can be encrypted and aggregated through addition and subtraction in their respective fields. And crucially, encrypted IBLTs can still be aggregated.
Each data chunk is inserted once into each layer at a pseudorandom location, so that even if a given chunk collides with another one in some bucket, it is unlikely that all of its buckets across all layers hold more than a single chunk; at least one of them will be "pure", containing a single chunk. Given a pure bucket, its chunk can be recovered and removed from all the other buckets it appears in, which in turn makes more chunks recoverable.
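Here is a stripped-down Python sketch of the three IBLT operations. The field modulus, hashing, layer count, and the simple checksum used to detect pure buckets are all illustrative choices of mine, simplified from the paper's construction:

```python
# Minimal IBLT sketch over a prime field (illustrative parameters).
import hashlib

P = (1 << 61) - 1        # prime field modulus
LAYERS, BUCKETS = 3, 16

def h(layer: int, chunk: int) -> int:
    # Pseudorandom bucket index for a chunk within a layer.
    d = hashlib.sha256(f"{layer}:{chunk}".encode()).digest()
    return int.from_bytes(d[:8], "big") % BUCKETS

def checksum(chunk: int) -> int:
    # Used to confirm a bucket really holds exactly one chunk.
    d = hashlib.sha256(f"chk:{chunk}".encode()).digest()
    return int.from_bytes(d[:7], "big") % P

def empty():
    # (count, data_sum, checksum_sum) per bucket, per layer.
    return [[[0, 0, 0] for _ in range(BUCKETS)] for _ in range(LAYERS)]

def insert(t, chunk: int, sign: int = 1):
    # sign=1 inserts; sign=-1 removes (used while peeling).
    for layer in range(LAYERS):
        b = t[layer][h(layer, chunk)]
        b[0] = (b[0] + sign) % P
        b[1] = (b[1] + sign * chunk) % P
        b[2] = (b[2] + sign * checksum(chunk)) % P

def aggregate(a, b):
    # Bucket-wise field addition; works on encrypted IBLTs too.
    return [[[(x + y) % P for x, y in zip(ba, bb)]
             for ba, bb in zip(la, lb)] for la, lb in zip(a, b)]

def recover(t):
    # Repeatedly find pure buckets and peel their chunks out.
    out, progress = [], True
    while progress:
        progress = False
        for layer in t:
            for b in layer:
                if b[0] == 1 and checksum(b[1]) == b[2]:   # pure bucket
                    chunk = b[1]
                    out.append(chunk)
                    insert(t, chunk, sign=-1)
                    progress = True
    return out

a, b = empty(), empty()
insert(a, 1234)
insert(b, 5678)
print(sorted(recover(aggregate(a, b))))   # -> [1234, 5678]
```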
Auction-based scheduling
The basic idea is to transmit the size and utility of each client's message through an IBLT, a collision-tolerant data structure that can be aggregated while encrypted. Once the clients have every other client's message size and utility, they can simply run a knapsack solver to arrive at the optimal allocation of the message vector, maximizing the total utility realized by the anonymous broadcast. Note that even if there is no clear way to discern utility, just transmitting the sizes and running the knapsack still brings substantial benefits.
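As a sketch, here is a standard deterministic 0/1 knapsack solver over (hash, size, utility) triples; Flashnet's actual solver and tie-breaking rules may differ, and the item values are made up:

```python
# 0/1 knapsack: pack the message vector to maximize total utility.
def knapsack(items, capacity):
    """items: list of (msg_hash, size, utility); returns chosen hashes."""
    # best[c] = (total_utility, chosen_hashes) achievable within capacity c.
    best = [(0, ())] * (capacity + 1)
    for msg_hash, size, utility in sorted(items):  # sort for determinism
        for c in range(capacity, size - 1, -1):
            u, chosen = best[c - size]
            if u + utility > best[c][0]:
                best[c] = (u + utility, chosen + (msg_hash,))
    return set(best[capacity][1])

items = [("a1", 3, 10), ("b2", 4, 7), ("c3", 2, 6), ("d4", 5, 12)]
print(knapsack(items, 9))   # every client computes the same winners
```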
Messages can now be dynamically sized up to some maximum. The message channel is allocated optimally, and no message bandwidth is wasted.
When preparing to send a message, each client constructs an empty IBLT of appropriate size and inserts its message metadata: hash, size, and utility. The client then encodes the IBLT as a vector of field elements and encrypts it by adding a pseudorandom pad (field addition). The encrypted IBLTs are then aggregated together (field addition), and eventually the pads are removed (field subtraction), revealing the aggregated IBLT containing all of the original messages' metadata. Each client then recovers the metadata and runs a knapsack solver. If its message (identified by hash) is in the knapsack solution, it has been allocated space in the message vector for the next round. The starting index of each message in the message vector can be trivially derived from the (deterministic) knapsack solution. Just as in ZIPNet, the clients run in TEEs to ensure they follow the scheduling protocol.
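A simplified sketch of the field-element pad encryption and the offset derivation follows. In the real protocol the pads are split among the servers and everything runs inside TEEs; here the helper names and canonical sort-by-hash layout are illustrative assumptions:

```python
# Pad encryption over field elements, and offsets from the knapsack winners.
P = (1 << 61) - 1

def encrypt(vec, pad):    # element-wise field addition
    return [(v + p) % P for v, p in zip(vec, pad)]

def decrypt(vec, pad):    # element-wise field subtraction
    return [(v - p) % P for v, p in zip(vec, pad)]

def aggregate(vecs):
    out = [0] * len(vecs[0])
    for v in vecs:
        out = [(a + b) % P for a, b in zip(out, v)]
    return out

# Encrypted vectors aggregate to the same result as the plaintexts.
v1, v2 = [42, 7], [5, 11]
p1, p2 = [123, 456], [789, 12]
agg = aggregate([encrypt(v1, p1), encrypt(v2, p2)])
assert decrypt(decrypt(agg, p1), p2) == aggregate([v1, v2])

# Starting offsets fall out of the deterministic knapsack solution:
# sort the winners canonically and lay them out back to back.
def offsets(winners):     # winners: [(msg_hash, size)]
    out, cursor = {}, 0
    for msg_hash, size in sorted(winners):
        out[msg_hash] = cursor
        cursor += size
    return out

print(offsets([("c3", 2), ("a1", 3), ("b2", 4)]))
# -> {'a1': 0, 'b2': 3, 'c3': 7}
```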
Impact on performance
The main downside of this approach is that it is not as performant as the xor-based scheduling, since it relies on finite field operations. If the actual messages are large and there aren't many of them, this overhead is negligible compared to the processing required to transfer the messages themselves, and the resulting overall system is much more performant.
If there are lots of small messages, the tradeoff space is more complex. On the one hand, the auction-based scheduling is less performant in practice than the xor-based one. On the other hand, the xor-based scheduling starts to deteriorate, necessitating a much larger message vector to soften the impact of collisions and therefore also increasing the computational load. The scheduling protocols should be benchmarked for specific use cases.
Scheduling anonymity
The scheduling messages (the IBLTs) are first encrypted with pads that only all of the servers together can remove, and then aggregated before the eventual decryption. The only way to deanonymize a participant would be for all of the servers to collude and decrypt an individual client's initial message. Just like in ZIPNet, a single honest server preserves the anonymity of all participants.
And, just like in ZIPNet, cover traffic is essential to build up an anonymity set, for the scheduling as well as the messaging.
Next steps
If you are interested, check out the implementation and the interactive visualizations!
As we continue working on Flashnet and its design matures, we will move to a Rust-based implementation and I will stop hand-rolling all of the cryptography (I did it only out of a desire to learn).
Comment with any use cases that come to mind where anonymous broadcast would benefit users! It would be interesting to think through the upsides and the performance tradeoffs, and whether we can already make it happen.
Reach out if you are curious to build or research!