Moving away from aTLS

The goal of this document is to explore alternatives to aTLS that better fit our use cases — specifically BuilderNet.

Breaking down aTLS

aTLS stands for “attested transport layer security” and, unsurprisingly, combines two protocols: a remote attestation protocol (MAA/DCAP/vTPM) and the TLS protocol. The aim of aTLS is to combine the security guarantees of attestation (measured boot endorsed by the infrastructure or hardware provider) with those of TLS (confidentiality, integrity, and mitigation of man-in-the-middle attacks). aTLS is one instance of a family of protocols roughly equivalent to RATLS.

First, the remote attestation protocol is a way to provide evidence for what’s running on a remote machine. This comes in two parts: the measurements of the machine (PCRs, RTMRs, various other measurements), and the chain of trust from hardware signatures through certificates all the way to some root of trust (sometimes called collateral or endorsements).

Second, TLS is a tool for establishing secure, encrypted TCP connections, while preventing man-in-the-middle attacks through key exchange, and certificate chains which aim to guarantee that you are connecting to the expected server (and not to some rando server who will steal your keys). The latter is usually connected to DNS, but not always.

How aTLS combines the two is through attesting the (temporary) TLS certificate along with a random nonce during the TLS handshake. This prevents most issues that could arise when considering attestations and TLS separately, and guarantees that the machine holding the TLS session decryption key is the same machine as the one producing the attestation.
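The binding described above can be sketched as follows. This is a minimal illustration, not the aTLS wire format: the `report_data` layout (SHA-512 over a length-prefixed nonce plus the public key, chosen here because it yields 64 bytes), the function names, and the dictionary standing in for a quote are all assumptions, and verification of the quote signature and collateral is omitted entirely.

```python
import hashlib
import hmac
import os


def make_report_data(nonce: bytes, tls_public_key: bytes) -> bytes:
    """Hypothetical binding: a 64-byte digest over the nonce and the TLS public key."""
    # Length-prefix the nonce so the two fields cannot be shifted into each other.
    prefix = len(nonce).to_bytes(2, "big")
    return hashlib.sha512(prefix + nonce + tls_public_key).digest()


def attest(nonce: bytes, tls_public_key: bytes) -> dict:
    """Stand-in for the TEE; a real quote is signed by the hardware."""
    return {"report_data": make_report_data(nonce, tls_public_key)}


def verify_binding(quote: dict, nonce: bytes, peer_tls_public_key: bytes) -> bool:
    """Check that the quote commits to this session's nonce and the key seen in TLS."""
    expected = make_report_data(nonce, peer_tls_public_key)
    return hmac.compare_digest(quote["report_data"], expected)


nonce = os.urandom(32)                         # chosen by the verifier for this handshake
enclave_key = b"enclave-ephemeral-public-key"  # placeholder bytes, not a real key
quote = attest(nonce, enclave_key)

print(verify_binding(quote, nonce, enclave_key))           # True
print(verify_binding(quote, nonce, b"relay-public-key"))   # False: different TLS key
print(verify_binding(quote, os.urandom(32), enclave_key))  # False: different session nonce
```

The second check is the relay case: a machine that forwards someone else's quote but terminates TLS with its own key fails the comparison, because the quote commits to a different key.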

The issues with aTLS

So, where does aTLS fall short?

  • Longer connection establishment time. Since we are producing and validating an attestation, the handshake takes longer. This is not too bad for DCAP (raw TDX), but it is quite noticeable with MAA (around 2 seconds in our experience).
  • Necessity to implement a custom TLS handshake protocol in our products. This is actually quite an issue, since the implementation is both security-critical and not trivial.
  • Due to certificate restrictions, it’s not possible to realize the second part of the TLS guarantees: server identity based on certificates, and connecting those to DNS. At least this is true for ACME-based issuance (like Let’s Encrypt).
  • The TLS certificates are self-signed, and hence not accepted by standard tools (curl, HTTPS and TLS libraries) without using “insecure” options.

How are we solving those issues? We simply removed aTLS from every hot path in our products, and instead rely on aTLS only to exchange regular TLS certificates. There are some exceptions, but they are not very consequential.

If we have already moved away from aTLS for connections other than certificate exchange — why keep using it?

What aTLS does right

There is one thing that aTLS does really well, and it is why we accept its shortcomings: it ties the attestation to the temporary TLS certificate, and vice versa. It’s impossible to perform replay, man-in-the-middle, or relay attacks, because the key exchange and the attestation use the same nonce and session.

And we absolutely need to re-create the security of the joint handshake.

What do we want from our communication channels?

Threat model: assume that either the client or the server (or both) is malicious, and can inspect, modify, replay, redirect, or relay any messages they wish. We assume (for the purposes of communication channels) that code running inside TDX is free of bugs and backdoors, and that neither the client nor the server can break the security of TDX. The last assumption is a weak one, since the security model of TDX does not include physical access. We’ll assume specifically that TDX VMs are run in such a way that at least one party involved is disincentivized to break TDX: the cloud service provider (CSP), the bare metal hosting provider (BM), or the VM deployer.

Guarantees: any channel established with a TDX-protected endpoint (client, server, or both) must come with the following:

  • A fresh attestation binding the TLS session to the measured TDX VM
  • End-to-end encryption and integrity for all requests and responses originating from the TDX-protected endpoint
  • Assurance that unauthorized physical access to the TDX host machine is unlikely (see DCEA)
    • Assurance from cloud service provider for TDX CVMs
    • Assurance from bare metal hosting provider for TDX VMs deployed on bare metal machines
    • Assurance from the VM deployer as a last resort (forces permissioning)
  • Authenticated identity of the TDX VM deployer in order to attribute faults (in particular network censorship)

Request for comments

Feedback is more than welcome. I would love to see different perspectives on the whole issue of attested communication channels, as well as comments on any specific aspects of the problem statement. Pointers to related work, further considerations, and suggestions for solutions are also very welcome.


The requirements list “fresh attestation”. To me this reads as though we do not explicitly require per-session nonces, correct?

The original RA-TLS generated a temporary key/cert but explicitly did not generate a fresh attestation for each connection. Now, the RATLS paper linked in the article has a paragraph arguing for why a nonce is supposedly required.

If we assume the TLS key is not compromised, simply replaying an attestation would not enable a third party to impersonate a valid endpoint—making the nonce unnecessary in this scenario. Conversely, if the TLS key is compromised, a nonce would not help either.

This raises the question of why aTLS chose to include a per-session nonce, given that attestation generation is expensive and this approach appears to offer limited security benefits for busy endpoints.

If the goal is to convey attestation recency to the verifier, a more efficient approach could be to periodically regenerate the attestation and embed a timestamp. This would allow the same attestation to be reused across multiple sessions, while still enabling the verifier to check that it was generated within the last X hours.
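A rough sketch of that idea, under loud assumptions: the class and function names are hypothetical, the six-hour window is only a stand-in for “X hours”, `generate` stands in for the expensive quote generation, and both sides are assumed to have a clock they can trust, which is not a given inside a TEE.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_AGE_SECONDS = 6 * 60 * 60  # hypothetical verifier policy for "X hours"


@dataclass
class TimedAttestation:
    quote: bytes
    attested_at: float  # seconds since epoch; would be covered by the quote in a real design


class AttestationCache:
    """Attester side: reuse one attestation across sessions, regenerate periodically."""

    def __init__(self, generate: Callable[[float], bytes], refresh_every: float):
        self._generate = generate
        self._refresh_every = refresh_every
        self._current: Optional[TimedAttestation] = None

    def get(self, now: float) -> TimedAttestation:
        stale = (
            self._current is None
            or now - self._current.attested_at >= self._refresh_every
        )
        if stale:
            self._current = TimedAttestation(self._generate(now), now)
        return self._current


def is_recent(attested_at: float, now: float,
              max_age: float = MAX_AGE_SECONDS, max_skew: float = 60.0) -> bool:
    """Verifier side: accept only timestamps within the window (small skew allowed)."""
    age = now - attested_at
    return -max_skew <= age <= max_age
```

With `refresh_every` set to one hour, a busy endpoint pays for one attestation per hour instead of one per connection, and the verifier's `is_recent` check bounds how old a presented attestation can be.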

That, however, leads to the question of time sources. Historically, trusted time has been problematic (at least for SGX). Perhaps TDX addresses this, or alternatively, NTP with signed responses could provide a verifiable time reference within the TEE.

This is correct under such assumptions, but in which other cases would a ‘fresh attestation’ be useful and from which perspective?
Some use-cases that I can quickly think of are:

  • When the TLS private key is rotated frequently, a verifier needs a fresh attestation to make sure it is using the new TLS certificate.
  • From a verifier’s perspective, if the verifier releases confidential secrets and sensitive data only upon a fresh, valid attestation, it needs to be sure the attestation was generated for its session only.
  • In some cases where the attestation is the authenticator (no separate PKI), a verifier also needs per-session freshness with a verifier-chosen nonce. I believe this is what the RATLS paper refers to when it talks about preventing replay attacks.

There might be more use-cases that I didn’t mention, so I would like to hear your thoughts on those points or if you would add others to them.

This is a good point. Several months ago, I explored the idea of timestamps in TEEs (TDX VMs) and, as you mentioned, there are some tradeoffs and issues, as well as a lack of trusted timestamps. Even if you have a secure network of NTP services within TDX instances, this would increase the system complexity and the attack surface, since a malicious host still has full control of the network interfaces and could distort, delay, or deny (DoS) the results. Of course, it can be fortified further against such attacks, but that would again increase the complexity and hurt efficiency even more. The Triad paper is very close to this topic.
There is also this nice paper that discusses the challenges of timestamps within TEEs.

Sure — if the TLS key changes, a new attestation is required. But this would likely happen only once every few hours at most. I’d consider that orthogonal to the discussion around per-session nonces.

One thing we need to specify is whether anything other than the nonce changes as part of the attestation. With SGX, for example, the attributes conveyed through the attestation do not change at runtime — unlike with TDX, which can extend registers at runtime. In that sense, the scenario where an attacker might benefit from replaying an old attestation applies only to systems where the attestation attributes can change during execution.

Thinking about it further, I can see how an attacker might exploit an old attestation. Imagine a CVM with some security feature enabled is later compromised. This security feature is reflected in the attestation. The attacker could create an attestation with the feature enabled, then disable the feature afterward. The next time the CVM needs to attest, the attacker could present the old attestation (with the feature enabled) — potentially leading to harmful consequences.
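To make the scenario concrete, here is a small simulation. It assumes the feature state is recorded by extending a runtime measurement register with a TPM-style hash chain over SHA-384; the event strings, the dictionary standing in for a quote, and the `policy_accepts` helper are hypothetical, and quote signatures are not modelled.

```python
import hashlib

REGISTER_SIZE = 48  # SHA-384 digest length in bytes


def extend(register: bytes, event: bytes) -> bytes:
    """TPM-style extend: new = SHA-384(old || SHA-384(event))."""
    return hashlib.sha384(register + hashlib.sha384(event).digest()).digest()


# Value the verifier's policy expects: only the "enabled" event was measured.
expected = extend(bytes(REGISTER_SIZE), b"security-feature=enabled")

# 1. The CVM starts with the feature on; the attacker obtains and stores a quote.
register = extend(bytes(REGISTER_SIZE), b"security-feature=enabled")
stale_quote = {"runtime_register": register}

# 2. The feature is later disabled and (by assumption) that change is measured.
register = extend(register, b"security-feature=disabled")
current_quote = {"runtime_register": register}


def policy_accepts(quote: dict) -> bool:
    """Verifier policy that looks only at the register value, with no freshness check."""
    return quote["runtime_register"] == expected


print(policy_accepts(stale_quote))    # True: the replayed quote still passes
print(policy_accepts(current_quote))  # False: an honest, current quote would fail
```

A verifier that checks only measurements cannot tell the stale quote from a current one, which is exactly the gap a freshness mechanism is meant to close.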

The question is: if the attacker already controls the CVM and can mount this kind of attack, are replayed attestations really the primary concern?

As for your third point — I don’t see it. TLS itself prevents session replay. I don’t see how a per-session nonce helps here. I’ll reach out to the authors to clarify what they had in mind.

Just looked at the paragraph again:

Freshness of Attestation Reports. To prevent replay attacks, it is essential
that pre-generation of attestation reports is not possible. Otherwise, an attacker
could intercept a valid attestation report and send it again when trying to spoof
an identity in an impersonation attack. To prevent replay attacks, RATLS must
include a nonce, which is generated by the challenger, in each attestation report
issued by the attester.

I got confused. We are not concerned about replaying a TLS session, but about replaying an attestation report. That’s a valid concern, I suppose. My point from above still holds though: if the attacker has this much control over the CVM, a replayed attestation may be the least of our concerns.
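For reference, the challenger-generated nonce from the quoted paragraph amounts to something like the sketch below. The `Challenger` class and its method names are made up for illustration, and the check that the nonce really sits inside a hardware-signed report is left out.

```python
import secrets


class Challenger:
    """Issues single-use nonces and accepts a report only if it echoes one of them."""

    def __init__(self) -> None:
        self._outstanding: set[bytes] = set()

    def new_nonce(self) -> bytes:
        nonce = secrets.token_bytes(32)
        self._outstanding.add(nonce)
        return nonce

    def accept(self, report_nonce: bytes) -> bool:
        # A nonce is valid exactly once; a pre-generated or replayed report
        # carries a nonce that was never issued or has already been consumed.
        if report_nonce in self._outstanding:
            self._outstanding.remove(report_nonce)
            return True
        return False


challenger = Challenger()
nonce = challenger.new_nonce()

print(challenger.accept(nonce))        # True: first use of an issued nonce
print(challenger.accept(nonce))        # False: same report presented again
print(challenger.accept(bytes(32)))    # False: nonce the challenger never issued
```

This only rules out pre-generated and re-sent reports. It does nothing against an attacker who can still make the compromised CVM produce new reports on demand.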