Portrait of a TEE: applications and identity

mateusz · November 21, 2024, 2:00pm

I’d like to offer some food for thought on the general topic of identity in TEEs, with some emphasis on applications.

How we identify TEEs right now

The top-level question I’d want to start with is: what do we want to identify when it comes to a TEE instance?
There are only a few things we can uniquely identify, so let’s look at those first:

The specific CPU, for example through the encrypted ppid (which can be extracted from an attestation)
Image boot process measurements, also extracted from an attestation
Any additional runtime extensions of registers, in particular the unused RTMR3
User data passed during quote generation, although it’s easily faked because the caller can set it to any value

From the above, we can identify the following (non-exhaustive):

Who has direct access to the specific hardware, by verifying a signature of encrypted ppid (with some caveats)
What is the image that’s running, based on measured boot
Any additional, post-boot configuration through runtime measurements

This is all very useful. We are all of course using PCRs to identify which VM image is running, and whether it’s the one we expect.
Dstack uses RTMR3 to record the compose manifest running within a Dstack instance. RTMR3 then serves as a domain separator: for example, generated keys are specific to the compose manifest’s hash, so two different deployments will never get the same keys because their RTMR3 values differ.
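To make the domain-separator idea concrete, here is a minimal sketch. It is not Dstack’s actual implementation. It assumes the TDX-style extend rule (new value = SHA-384 of the old value concatenated with a 48-byte digest) and uses HMAC-SHA256 as a stand-in key derivation function. The names `extend_rtmr`, `derive_key`, and `root_secret` are hypothetical.

```python
import hashlib
import hmac

RTMR_SIZE = 48  # SHA-384 digest length in bytes


def extend_rtmr(current: bytes, event_data: bytes) -> bytes:
    """Extend a register: new = SHA384(current || SHA384(event_data))."""
    digest = hashlib.sha384(event_data).digest()
    return hashlib.sha384(current + digest).digest()


def derive_key(root_secret: bytes, rtmr3: bytes, purpose: bytes) -> bytes:
    """Derive a key bound to an RTMR3 value (HMAC-SHA256 as a stand-in KDF)."""
    return hmac.new(root_secret, rtmr3 + b"|" + purpose, hashlib.sha256).digest()


initial = bytes(RTMR_SIZE)  # assumption: the register starts zeroed
rtmr3_a = extend_rtmr(initial, b"compose manifest A")
rtmr3_b = extend_rtmr(initial, b"compose manifest B")

# Hypothetical secret held by whatever service hands out keys.
root_secret = b"hypothetical root secret held by a key service"
key_a = derive_key(root_secret, rtmr3_a, b"disk-encryption")
key_b = derive_key(root_secret, rtmr3_b, b"disk-encryption")
```

Because the manifests differ, `rtmr3_a` and `rtmr3_b` differ, and so do the two derived keys. Redeploying the same manifest reproduces the same key.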
We are currently not using ppid to identify the infrastructure operator, and we should! Something as simple as a signed list of ppids published by infrastructure operators would give us something closely resembling “proof of cloud” (see cloud attestations). If each of those ppids also came with an attestation, this signed list wouldn’t even be a bad solution.
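As a rough illustration of how such a published list could be consumed, the sketch below looks up which operator claims a given ppid. It assumes the signature on each list has already been verified elsewhere, and that ppids are compared as lowercase hex strings. `OperatorList` and `find_operator` are hypothetical names, not an existing API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class OperatorList:
    """A published ppid list whose signature is assumed already verified."""
    operator: str
    ppids: frozenset  # lowercase hex-encoded ppids claimed by this operator


def find_operator(ppid_hex: str, lists: Iterable[OperatorList]) -> Optional[str]:
    """Return the operator claiming this ppid, or None if nobody does."""
    needle = ppid_hex.lower()
    claimants = [entry.operator for entry in lists if needle in entry.ppids]
    if len(claimants) > 1:
        # Two operators claiming one CPU means at least one list is wrong.
        raise ValueError(f"ppid claimed by multiple operators: {claimants}")
    return claimants[0] if claimants else None


# Placeholder data for illustration only.
published = [
    OperatorList("operator-a", frozenset({"aa11", "bb22"})),
    OperatorList("operator-b", frozenset({"cc33"})),
]
```

A verifier would call `find_operator` with the ppid taken from a quote and treat a `None` result as “operator unknown”.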

What’s missing

There are a couple of obviously missing entities that we still want to identify.

It’s not currently possible to distinguish instances other than by ppid, which means that if there are two instances on the same CPU and one is misbehaving, no one would know! I’m not sure it’s a problem we should solve, but it’s interesting that this is the case.
It is not obvious how to distinguish owners of workloads. Imagine two people want to deploy the same workload in Dstack: only the operator of the underlying infrastructure can tell which is which. Ideally, either no one can distinguish them or everyone can, depending on the privacy requirements. Crucially, the owners of workloads themselves must be able to tell them apart. Ideally we would use runtime registers for this purpose, but it’s not trivial!
Most importantly, it’s not obvious how to connect instances (specific TEE virtual machines) with applications (more on those later).

Identifying TEE Applications

Short detour. What’s an application?
For the next bit I’ll assume an application is a uniquely identifiable software system, stable over time, that performs a specific function for users. Crucially, an application is governed (e.g. modified) by a specific process, usually a group of people.

Now this starts to become problematic, because we can identify instances and images, but the applications we care about are spread across multiple instances and multiple images that change over time. We can’t use any of the attestation-based identifiers. In particular, runtime measurement extensions to RTMR3 won’t work.

Despair not, this problem is of course not new.

What I think identifying applications requires of us in the TEE world is a way to map the existing unique identifiers present in attestations (ppid, PCRs, RTMRs, in some combination) to a unique handle that identifies an application. This is usually referred to as “expected measurement” and you can find it on your favorite TEE app’s webpage.

For permissionless applications we need, of course, transparent and enforceable governance. The obvious solution for permissionless applications is smart contracts.
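The mapping from attested measurements to an application handle, under governance control, can be sketched as follows. This is a plain Python model of the state such a contract might hold, not contract code. `AppRegistry`, `resolve_app`, and the placeholder addresses are all hypothetical.

```python
class AppRegistry:
    """Toy model of an on-chain registry: one instance per application."""

    def __init__(self, address: str, governance: str):
        self.address = address        # the application's stable handle
        self.governance = governance  # account allowed to change the allowlist
        self._allowed = set()         # hex-encoded expected measurements

    def _require_governance(self, caller: str) -> None:
        if caller != self.governance:
            raise PermissionError("only governance may modify the allowlist")

    def allow(self, caller: str, measurement: str) -> None:
        """Add an expected measurement, e.g. when a new version ships."""
        self._require_governance(caller)
        self._allowed.add(measurement)

    def revoke(self, caller: str, measurement: str) -> None:
        """Remove a measurement, e.g. after a vulnerable version is found."""
        self._require_governance(caller)
        self._allowed.discard(measurement)

    def is_allowed(self, measurement: str) -> bool:
        return measurement in self._allowed


def resolve_app(measurement: str, registries) -> list:
    """Return the handles of all applications that accept this measurement."""
    return [r.address for r in registries if r.is_allowed(measurement)]


registry = AppRegistry(address="0xApp", governance="0xGov")
registry.allow("0xGov", "measurement-v1")
registry.allow("0xGov", "measurement-v2")
```

The handle (`address`) stays fixed while the set of accepted measurements changes over time, which is exactly the property that measurements alone cannot provide.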

Measurements vs Smart Contracts

One side note I’d like to make here is the use of measurements as domain separators in key derivation and other functionalities.

Each identifier, when used as a domain separator, has a specific meaning. What I mean by this is that if we use boot measurements, each deployment configuration (image+firmware) will derive its own keys. The same key will be derived regardless of any further runtime configuration.
This is catastrophic in some circumstances: in “searching in tdx”, this would result in unauthorized access to proprietary logic and data. It’s also a blessing in other circumstances: it would allow all instances in the same configuration to encrypt data to themselves.

If we want applications to be able to derive keys specific to themselves, meaning any instance with a measurement allowlisted by the governance of a smart contract can derive them, then the smart contract address effectively becomes the domain separator of choice. There’s a caveat: this considerably widens the attack surface, since keys derived from an app handle rather than from a specific version of the app could leak through an exploit in any of the app’s versions.
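A minimal sketch of app-scoped key derivation, under stated assumptions: a key service holds a `root_secret`, the allowlist has already been read from the contract, and HMAC-SHA256 stands in for a real KDF. `derive_app_key` and the address below are hypothetical placeholders.

```python
import hashlib
import hmac


def derive_app_key(root_secret: bytes, app_address: str,
                   measurement: str, allowlist: set) -> bytes:
    """Release an app-scoped key only to an allowlisted measurement.

    The key depends on the app handle alone, so every allowlisted
    version of the app derives the same key.
    """
    if measurement not in allowlist:
        raise PermissionError("measurement not allowlisted for this app")
    info = b"app-key|" + app_address.lower().encode()
    return hmac.new(root_secret, info, hashlib.sha256).digest()


root_secret = b"hypothetical root secret held by a key service"
app = "0x00000000000000000000000000000000000000aa"  # placeholder address
allowlist = {"measurement-v1", "measurement-v2"}

key_v1 = derive_app_key(root_secret, app, "measurement-v1", allowlist)
key_v2 = derive_app_key(root_secret, app, "measurement-v2", allowlist)
```

Here `key_v1` and `key_v2` are identical, which is what lets versions share state, and also why an exploit in any single allowlisted version exposes the key for all of them.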
But using the smart contract address to form a p2p overlay mesh would actually be an amazing primitive.

Takeaway

The key point I’d like to reiterate is that we can’t use only measurements — including runtime measurements — to identify and distinguish applications. We need an external solution to map between applications and expected measurements, with smart contracts being the obvious solution.