The goal of this document is to explore alternatives to aTLS that better fit our use cases — specifically BuilderNet.
Breaking down aTLS
aTLS stands for “attested transport layer security” and, unsurprisingly, combines two protocols: a remote attestation protocol (MAA/DCAP/vTPM) and the TLS protocol. The aim of aTLS is to combine the security guarantees of attestation (measured boot endorsed by the infrastructure or hardware provider) with those of TLS (confidentiality, integrity, and mitigation of man-in-the-middle attacks). aTLS is one instance of a family of protocols roughly equivalent to RATLS.
First, the remote attestation protocol is a way to provide evidence for what’s running on a remote machine. This comes in two parts: the measurements of the machine (PCRs, RTMRs, various other measurements), and the chain of trust from hardware signatures through certificates all the way to some root of trust (sometimes called collateral or endorsements).
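To make the two parts concrete, here is a minimal sketch of how a verifier might represent and appraise evidence. The names `Evidence` and `appraise` and the field layout are hypothetical, and the collateral is reduced to a list of issuer names. A real verifier would check actual signatures and certificates at every step rather than compare strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    # Part 1: measurement registers by name, hex-encoded (e.g. RTMRs or PCRs).
    measurements: dict[str, str]
    # Part 2: stand-in for the collateral chain, leaf first and root last.
    # Assumption: real collateral is a chain of signed certificates.
    collateral: tuple[str, ...]


def appraise(evidence: Evidence,
             expected: dict[str, str],
             trusted_roots: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the evidence is accepted."""
    problems = []
    # Every measurement the policy cares about must match exactly.
    for name, want in expected.items():
        got = evidence.measurements.get(name)
        if got != want:
            problems.append(f"{name}: expected {want}, got {got}")
    # The chain of trust must terminate at a root the verifier already trusts.
    if not evidence.collateral or evidence.collateral[-1] not in trusted_roots:
        problems.append("collateral does not end at a trusted root")
    return problems


evidence = Evidence(
    measurements={"RTMR0": "aa11", "RTMR1": "bb22"},
    collateral=("leaf", "intermediate", "root-a"),
)
print(appraise(evidence, {"RTMR0": "aa11"}, {"root-a"}))  # prints []
```

Both checks are needed: matching measurements without a trusted chain could be forged by anyone, and a trusted chain without matching measurements only shows that some genuine machine is running unknown software.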
Second, TLS is a tool for establishing encrypted connections over TCP. It prevents man-in-the-middle attacks through key exchange and through certificate chains, which aim to guarantee that you are connecting to the expected server (and not to some random server that will steal your keys). Server identity is usually tied to DNS, but not always.
aTLS combines the two by attesting the (temporary) TLS certificate together with a random nonce during the TLS handshake. This prevents most issues that could arise when attestation and TLS are considered separately, and guarantees that the machine holding the TLS session decryption key is the same machine as the one producing the attestation.
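The binding can be illustrated with a short sketch. It assumes the attestation carries a 64-byte user-data field (as the TDX report data field does) and that this field is filled with a hash over the verifier's nonce and the TLS certificate. The exact encoding below is a hypothetical choice for illustration, not the layout used by any particular aTLS implementation, and verification of the quote signature itself is out of scope here.

```python
import hashlib
import hmac
import os


def report_data(cert_der: bytes, nonce: bytes) -> bytes:
    """Value the attester places into the quote (hypothetical encoding).

    SHA-512 yields 64 bytes, which fits a 64-byte report data field.
    """
    cert_digest = hashlib.sha256(cert_der).digest()
    return hashlib.sha512(nonce + cert_digest).digest()


def binding_ok(quoted_report_data: bytes,
               presented_cert_der: bytes,
               nonce: bytes) -> bool:
    """Verifier side: recompute the value from what was seen in this session."""
    expected = report_data(presented_cert_der, nonce)
    return hmac.compare_digest(quoted_report_data, expected)


# The verifier picks a fresh nonce for every handshake.
nonce = os.urandom(32)
cert = b"stand-in for the DER-encoded temporary certificate"
quoted = report_data(cert, nonce)

print(binding_ok(quoted, cert, nonce))                       # same session
print(binding_ok(quoted, b"certificate of a relay", nonce))  # different key holder
print(binding_ok(quoted, cert, os.urandom(32)))              # replayed old quote
```

Only the first check succeeds. A quote produced for another certificate fails because the certificate hash differs, and a quote recorded in an earlier session fails because the nonce differs.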
The issues with aTLS
So, where does aTLS fall short?
- Longer connection establishment time. Because an attestation has to be produced and validated, the handshake takes longer. This is not too bad for DCAP (raw TDX), but has a significant impact with MAA (around 2 seconds in our experience).
- Necessity to implement a custom TLS handshake protocol in our products. This is actually quite an issue, since the implementation is both security-critical and not trivial.
- Due to certificate restrictions, it’s not possible to realize the second part of the TLS guarantees: server identity based on certificates, tied to DNS. At least this is true for certificates issued through ACME (such as Let’s Encrypt).
- The TLS certificates are self-signed, and hence not accepted by standard tools (curl, HTTPS and TLS libraries) without using “insecure” options.
How are we solving those issues? We simply took aTLS off every hot path in our products, and instead rely on it only to exchange regular TLS certificates. There are some exceptions, but they are not very consequential.
If we have already moved away from aTLS for connections other than certificate exchange — why keep using it?
What aTLS does right
There is one thing that aTLS does really well, and it is why we accept its shortcomings: it ties the attestation to the temporary TLS certificate, and vice versa. Because the key exchange and the attestation use the same nonce and session, replay, man-in-the-middle, and relay attacks are not possible.
And we absolutely need to re-create the security of the joint handshake.
What do we want from our communication channels?
Threat model: assume that the client, the server, or both are malicious, and can inspect, modify, replay, redirect, or relay any messages they wish to. We assume (for the purposes of communication channels) that code running inside TDX is free of bugs and backdoors, and that neither the client nor the server can break the security of TDX. The last assumption is a fragile one, since the security model of TDX does not include physical access. We’ll therefore assume specifically that TDX VMs are run in such a way that at least one party involved is disincentivized to break TDX: the cloud service provider (CSP), the bare metal hosting provider (BM), or the VM deployer.
Guarantees: any channel established with a TDX-protected endpoint (client, server, or both) must come with the following:
- A fresh attestation binding the TLS session to the measured TDX VM
- End-to-end encryption and integrity for all requests and responses originating from the TDX-protected endpoint
- Assurance that unauthorized physical access to the TDX host machine is unlikely (see DCEA)
  - Assurance from the cloud service provider for TDX CVMs
  - Assurance from the bare metal hosting provider for TDX VMs deployed on bare metal machines
  - Assurance from the VM deployer as a last resort (forces permissioning)
- Authenticated identity of the TDX VM deployer in order to attribute faults (in particular network censorship)
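As a sketch of how these guarantees could be checked together, the snippet below models what a verifier has established about one channel and reports what is still missing. All names (`ChannelClaims`, `missing_guarantees`, and the fields) are hypothetical. Treating the physical access assurance as satisfied when any one of the three parties provides it is my reading of the threat model above, not a settled design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChannelClaims:
    """What the verifier has established about one channel (hypothetical fields)."""
    attestation_fresh: bool            # quote covers a nonce from this session
    attestation_bound_to_tls: bool     # quote covers this session's TLS key
    encrypted_end_to_end: bool         # encryption and integrity up to the TDX VM
    csp_assurance: bool                # cloud service provider vouches for the host
    bare_metal_assurance: bool         # bare metal hosting provider vouches for the host
    deployer_assurance: bool           # last resort, implies permissioning
    deployer_identity: Optional[str]   # authenticated identity, None if unknown


def missing_guarantees(claims: ChannelClaims) -> list[str]:
    """Return the guarantees that are not met; an empty list means all are met."""
    missing = []
    if not (claims.attestation_fresh and claims.attestation_bound_to_tls):
        missing.append("fresh attestation bound to the TLS session")
    if not claims.encrypted_end_to_end:
        missing.append("end-to-end encryption and integrity")
    # Assumption: one assurance about physical access is enough.
    if not (claims.csp_assurance
            or claims.bare_metal_assurance
            or claims.deployer_assurance):
        missing.append("physical access assurance")
    if not claims.deployer_identity:
        missing.append("authenticated deployer identity")
    return missing


claims = ChannelClaims(
    attestation_fresh=True,
    attestation_bound_to_tls=True,
    encrypted_end_to_end=True,
    csp_assurance=False,
    bare_metal_assurance=True,
    deployer_assurance=False,
    deployer_identity=None,
)
print(missing_guarantees(claims))  # prints ['authenticated deployer identity']
```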
Request for comments
Feedback is more than welcome. I would love to see different perspectives on the whole issue of attested communication channels, as well as comments on any specific aspect of the problem statement. Pointers to related work, further considerations, and suggestions for solutions are also very welcome.