Quantifying Info I: Quantifying The Information Leakage of Computing on Encrypted Transactions

Part 1

The following is an abstraction of a problem. For more context on application, a brief description is attached below.

Description

The setting
An adversary is given a finite set of functions \mathcal{F} where every element f_i\in\mathcal{F} is a mapping from a set X to a set L. \mathcal{F}. The adversary is also given a prior distribution, \mathcal{D}, over F.

The game
A function f^* is drawn from \mathcal{D}. The adversary does not know f^*.

Now there are n rounds. In each round i, the adversary selects any x_i\in X and is told f^*(x_i).

After the n'th round, the adversary attempts to guess f^*.

Example
X:={(1,1),(0,1),(0,0),(1,0)}
L:={0,1}
\mathcal{F}:={\text{AND, OR, NAND, XOR}}
\mathcal{D} is uniform over F.
n=1

A valid policy for the adversary is to query (1,1). If f^*(1,1)=1, choose either OR or AND according to a cointoss. If f^*(1,1)=0 choose NAND or XOR in the same way.

The success rate of this policy if \frac{1}{2} which is better than complete random guessing which yields \frac{1}{4}.

This is a relatively basic example with small sets. In general, |L|,n<<|F|,|X|. One can also reasonably assume \mathcal{D} is power law distributed.

Question

  • Is there existing literature which studies this setting? Information theory and computational learning theory seem like the most obvious areas to investigate.
  • Given a setting, how much can the adversary learn from the best possible policy? One way of measuring this is the reduction of entropy from the prior distribution to the posterior belief after the last query as a function of n (see part 2 for some more ideas)
  • Assuming the adversary is polynomially bounded (runs in probabilistic polynomial time), what is the best policy they can find given the setting?

The actual application

This question comes up when considering doing computation on private information. Feedback from this computation may degrade the privacy and this question aims to quantify to what degree privacy is degraded. More particularly, this problem comes up in a blockchain setting in which a transaction must be ordered in a block along with other transactions. This process requires doing computation on these transactions. In many cases the ordering of transactions within the block has an impact on how a block fares according to some objective function. In order to improve upon brute force search over all orderings, additional computation on private transactions may be required.

A transaction in blockchain can be considered a function which takes as input the state of a VM and outputs the future state (although the input and output can be generalised). In this case \mathcal{F} is the set of all transactions (finite because of gas limits), X is the set of all possible VM states and L are the bits of information leaked (e.g. boolean for success/revert of the transaction). One can simplify the problem by considering \mathcal{F} the set of all swaps, with each swap parameterised by pool, size and slippage limit.

Edit

  • The most concrete direction I have in mind at the moment is looking at bounds on concept learning with membership queries in computational learning theory (see this or this) and combining that with results from PAC-Bayes which is PAC learning with priors.

Part 2

Now we layer on some economics.

EDIT: Sorry I skimmed through some of the writing and forgot about the polynomial bound. Clearly what I suggested below is a naive and not so polynomial time algorithm.

I do have to ask couple questions from here though

  1. How much overlap is there in function in F every time an adversary has to solve ? (like can you just take advantage of memoization) so they just have to query without solving a lot of equalities?

with memoization and right kind of data structure you can do it linear time or better. I don’t think you even need a paper to refer to at that point.

  1. Could you give an example of a function F in closed-form?

Are you assuming that the adversary already knows what F and the ranges of the functions f_i are? (therefore the values in X). Can adversaries evaluate f_i(X) before choosing x? If F a finite set? If so, then I think it’s an easy trap to fall into thinking this as a learning problem.

I think it could rather be seen as a “search” problem where you’re trying to find values of X where each f_i(x) values are mutually exclusive. (regardless of whether X is a discrete or continuous set)

If you already know what each f_i looks like then you can just first solve for each pair f_i and f_j for x values where f_i(x) = f_j(x). You can create a set X’ using values that have overlap and just simply pick x values that aren’t in this set. If X \ X’ isn’t a null set then you can pretty much figure out what the chosen f is in 1 iteration. If not (the example you posted), then you can attach each segment in X’ with a number of functions it overlaps

Downside of the following approach would of course be that you’d have to have different ways of generating X’ per set of problems classes F. So really the papers you’d want to read just becomes how to solve for equalities efficiently when you have (|F| choose 2) number of equalities to solve for.

2 Examples - 1 the same as you posted and 1 my own.

Example 1:
AND(x) = OR(x) → (1,1) and (0,0)
AND(x) = NAND(x) → empty
AND(x) = XOR(x) → (0,0)
OR(x) = NAND(x) → (0,1), (1,0)
OR(x) = XOR(x) → (0,0), (1,0), (0,1)
NAND(x)=XOR(x) → (1,0), (0,1), (1,1)

(1,1): 2 overlaps
(0,0): 2 overlaps
(1,0): 3 overlaps
(0,1): 3 overlaps

From here we can tell that we should prob choose (1,1) or (0,0) as our first iteration.
If n=2, and we’ve narrowed it down f to OR and AND, then we repeat this again with OR and AND

AND(x) = OR(x) → (1,1) and (0,0)
(1,1): 1 overlap
(0,0): 1 overlap
(1,0): 0 overlap
(0,1): 0 overlap

So 2nd iteration we choose (1,0) or (0,1), which will give us a definitive answer.

Example 2:
F: x^2, x^3, -x, x
X: all real numbers

Again, just evaluate equalities and solve for x - you get

(0,0): 4 overlaps
(1,1): 3 overlaps
(-1,1): 2 overlaps
(-1,-1): 2 overlaps
(the rest): 0 overlap

Now you just choose any x that’s X\ {(0,0), (1,1), (-1,1), (-1,-1)} and you got yourself a solution

1 Like

Nice! Thanks for the response :slight_smile:

The VC dimension seems to be quite closely related to (or the opposite of?) what you are talking about, if you haven’t looked at it.

How much overlap is there in function in F every time an adversary has to solve ? (like can you just take advantage of memoization) so they just have to query without solving a lot of equalities?

We can assume they play the game many times with the same F (e.g. always trying to figure out what kind of swap a txn is)

Could you give an example of a function F in closed-form?

I assume you mean an element of F. Its easier to think of the functions as programs than analytic functions. e.g. swaps could be thought of as some code that does something like take a list of pools as input (each pool parameterised by a price, liquidity, asset pair) and checks whether a set of these pools conforms to certain parameters (e.g. prices are within a certain bound) before returning true or false