
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Sat, 04 Apr 2026 18:46:34 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Post-quantumify internal services: Logfwdr, Tunnel, and gokeyless]]></title>
            <link>https://blog.cloudflare.com/post-quantumify-cloudflare/</link>
            <pubDate>Fri, 25 Feb 2022 16:03:12 GMT</pubDate>
            <description><![CDATA[ A big challenge is coming: to change all internal connections at Cloudflare to use post-quantum cryptography. Read how we are tackling this challenge! ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Theoretically, there is no impediment to adding <a href="/post-quantum-taxonomy">post-quantum cryptography</a> to any system. But the reality is harder. In the middle of last year, we posed ourselves a big challenge: <b>to change all</b> <b><i>internal connections</i></b> <b>at Cloudflare to use post-quantum cryptography</b>. We call this, in a cheeky way, “post-quantum-ifying” our services. Theoretically, this should be simple: swap algorithms for post-quantum ones and move along. But with dozens of different services in various programming languages (as we have at Cloudflare), it is not so simple. The challenge is big, but we are here and up for the task! In this blog post, we will look at what our plan was, where we are now, and what we have learned so far. Welcome to the first announcement of a post-quantum future at Cloudflare: our connections are going to be quantum-secure!</p>
    <div>
      <h3>What are we doing?</h3>
      <a href="#what-are-we-doing">
        
      </a>
    </div>
    <p>The life of most requests at Cloudflare <a href="/more-data-more-data/">begins and ends at the edge</a> of our global network. Not all requests are equal, and along their path they are carried by several protocols. Some of those protocols provide security properties, while others do not. Among the protocols that do, Cloudflare uses: <a href="/rfc-8446-aka-tls-1-3/">TLS</a>, <a href="/tag/quic/">QUIC</a>, <a href="/1111-warp-better-vpn/">WireGuard</a>, <a href="/tag/dnssec/">DNSSEC</a>, <a href="/anycast-ipsec/">IPsec</a>, <a href="/privacy-pass-v3/">Privacy Pass</a>, and more. Migrating all of these protocols and connections to use post-quantum cryptography is a formidable task. It is also a task that we do not treat lightly, because:</p><ul><li><p>We have to be assured that the security properties provided by the protocols are not diminished.</p></li><li><p>We have to be assured that performance is not negatively affected.</p></li><li><p>We have to be wary of other requirements of our ever-changing ecosystem (for example, keeping in mind our <a href="https://www.cloudflare.com/en-gb/press-releases/2021/cloudflare-hits-milestone-in-fedramp-approval/">FedRAMP certification efforts</a>).</p></li></ul><p>Given these requirements, we had to decide on the following:</p><ul><li><p>How are we going to introduce post-quantum cryptography into the protocols?</p></li><li><p>Which protocols will we be migrating to post-quantum cryptography?</p></li><li><p>Which Cloudflare services will be targeted for this migration?</p></li></ul><p>Let’s explore what we chose: welcome to our path!</p>
    <div>
      <h3>TLS and post-quantum in the real world</h3>
      <a href="#tls-and-post-quantum-in-the-real-world">
        
      </a>
    </div>
    <p>One of the most used security protocols is Transport Layer Security (TLS). It is the vital protocol that protects most of the data that flows over the Internet today. Many of Cloudflare’s internal services also rely on TLS for security. It seemed natural that, for our migration to post-quantum cryptography, we would start with this protocol.</p><p>The protocol provides three security properties: integrity, authentication, and confidentiality. The algorithms used to provide the first property, integrity, do not appear to be <a href="https://research.kudelskisecurity.com/2017/02/01/defeating-quantum-algorithms-with-hash-functions/">quantum-threatened</a> (there is <a href="https://eprint.iacr.org/2012/606.pdf">some research</a> on the matter). The second property, authentication, is under quantum threat, but we will not focus on it for reasons detailed later. The third property, confidentiality, is the one we are most urgently interested in protecting.</p><p>Confidentiality assures that no one other than the intended receiver and sender of a message can read the transmitted message. Confidentiality is especially threatened by quantum computers, as an attacker can record traffic now and decrypt it in the future (when they get access to a quantum computer): this means that all past and current traffic, not just future traffic, is vulnerable to being read by anyone who obtains a quantum computer (and has stored the encrypted traffic captured today).</p><p>At Cloudflare, to protect many of our connections, we use TLS. We mainly use the latest version of the protocol, TLS 1.3, but we sometimes do still use TLS 1.2 (the image below, however, only shows connections from websites to our network). As a company that pushes for innovation, we intend to use this migration to post-quantum cryptography as an opportunity to also move handshakes to TLS 1.3 and to be assured that we are using TLS in the right way (for example, by ensuring that we are not using deprecated features of TLS).</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5toLKjOnOcTdY4jkqTodFb/80452cac0010e2311e76e31c0a138820/image2-24.png" />
            
            </figure><p>Cloudflare’s TLS and QUIC usage, <a href="https://radar.cloudflare.com/#anchor-tls-versions-vs">taken on</a> 17 February 2022 and showing the previous seven days.</p><p>Changing TLS 1.3 to provide quantum-resistant security for its confidentiality means changing the ‘key exchange’ phase of the TLS handshake. Let’s briefly look at how the TLS 1.3 handshake works.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5qFzOkZw6F47b7q1FeoJrf/c781aa2f7621d61664b7f7efafe3927e/image3-26.png" />
            
            </figure><p>The TLS 1.3 handshake</p><p>In <a href="https://datatracker.ietf.org/doc/html/rfc8446">TLS 1.3</a>, there are always two parties: a client and a server. A client is a party that wants the server to “serve” them something, which could be a website, emails, chat messages, voice messages, and more. The handshake is the process by which the server and client attempt to agree on a shared secret, which will be used to encrypt the subsequent exchange of data (this shared secret is called the “master secret”). The client selects their favorite key exchange algorithms and submits one or more “key shares” to the server (they send both the name of the key share and its public key parameter). The server picks one of the key exchange algorithms (assuming that it supports one of them), and replies with its own key share. Both the server and the client then combine the key shares to compute a shared secret (the “master secret”), which is used to protect the remainder of the connection. If the client only chooses algorithms that the server does not support, the server instead replies with the algorithms that it does support and asks the client to try again. During this initial conversation, the client and server also agree on authentication methods and the parameters for encryption, but we can leave that aside in this blog post. This description is also simplified to focus only on the “key exchange” phase.</p><p>There is a mechanism to add post-quantum cryptography to this procedure: advertise post-quantum algorithms in the list of key shares, so the final derived shared key (the “master secret”) is quantum-secure. But there are requirements we had to take into account when doing this with our connections: the security of much post-quantum cryptography is still under debate, and we need to respect our <a href="https://www.cloudflare.com/en-gb/trust-hub/compliance-resources/">compliance efforts</a>. 
The solution to these requirements is to use a <a href="https://datatracker.ietf.org/doc/draft-ietf-tls-hybrid-design/">“hybrid” mechanism</a>.</p><p>A “hybrid” mechanism means using both a “pre-quantum” or “classical” algorithm and a “post-quantum” algorithm, and mixing both generated shared secrets into the derivation of the “master secret”. The combination of the two shared secrets is of the form <i>Z′ = Z || T</i> (for TLS, with fixed-size shared secrets, simple concatenation is secure; in other cases, you have to be more careful). This concatenation consists of:</p><ul><li><p>A “classical” shared secret <i>Z</i>, derived following the guidelines of <a href="https://www.nist.gov/standardsgov/compliance-faqs-federal-information-processing-standards-fips#:~:text=are%20FIPS%20developed%3F-,What%20are%20Federal%20Information%20Processing%20Standards%20(FIPS)%3F,by%20the%20Secretary%20of%20Commerce.">Federal Information Processing Standards</a> (FIPS 140-2) <a href="https://en.wikipedia.org/wiki/FIPS_140-2">approved mechanisms</a> (as recommended <a href="https://csrc.nist.gov/publications/detail/sp/800-56a/rev-3/final">over</a> <a href="https://csrc.nist.gov/publications/detail/sp/800-56b/rev-2/final">here</a>), like, for example, the <a href="https://csrc.nist.gov/csrc/media/events/workshop-on-elliptic-curve-cryptography-standards/documents/papers/session6-adalier-mehmet.pdf">P-256 elliptic curve</a>.</p></li><li><p>An auxiliary shared secret <i>T</i>, derived with some other method: in this case, in a quantum-secure way.</p></li></ul><p>The usage of a “hybrid” approach allows us to safeguard our connections in case the security of the post-quantum algorithm fails. It also results in a suitable-for-FIPS secret, as it is approved in the <a href="https://csrc.nist.gov/publications/detail/sp/800-56c/rev-2/final">“Recommendation for Key-Derivation Methods in Key-Establishment Schemes”</a> (SP 800-56C Rev. 
2), which is listed in <a href="https://csrc.nist.gov/csrc/media/publications/fips/140/2/final/documents/fips1402annexd.pdf">Annex D</a> as an approved key-establishment technique for FIPS 140-2.</p><p>At Cloudflare, we use several different TLS libraries. We decided to add post-quantum cryptography to them: specifically, to the <a href="https://csrc.nist.gov/projects/cryptographic-module-validation-program/Certificate/3318"><i>BoringCrypto</i></a> library and to the version of <a href="https://github.com/golang/go/tree/dev.boringcrypto.go1.8">Golang compiled with <i>BoringCrypto</i></a>. We added <a href="https://github.com/cloudflare/circl/tree/master/kem/kyber">our implementation</a> of the <a href="https://pq-crystals.org/kyber/resources.shtml">Kyber-512</a> algorithm (this algorithm can eventually be swapped for another; we are not committing to one here, and are using it only for our testing phase) to those libraries and implemented the “hybrid” mechanism as part of the TLS handshake. For the “classical” algorithm we used <a href="https://en.wikipedia.org/wiki/Elliptic-curve_cryptography">curve P-256</a>. We then compiled certain services with these new TLS libraries.</p><table><tr><td><p><b>Name of algorithm</b></p></td><td><p><b>Number of times loop executed</b></p></td><td><p><b>Average runtime per operation</b></p></td><td><p><b>Number of bytes required per operation</b></p></td><td><p><b>Number of allocations</b></p></td></tr><tr><td><p>Curve P-256</p></td><td><p>23,056</p></td><td><p>52,204 ns/op</p></td><td><p>256 B/op</p></td><td><p>5 allocs/op</p></td></tr><tr><td><p>Kyber-512</p></td><td><p>100,977</p></td><td><p>11,793 ns/op</p></td><td><p>832 B/op</p></td><td><p>3 allocs/op</p></td></tr></table><p>Table 1: Benchmarks of the “key share” operation of Curve P-256 and Kyber-512: scalar multiplication and encapsulation, respectively. 
Benchmarks were run on Darwin, amd64, Intel(R) Core(TM) i7-9750H CPU @ 2.60 GHz.</p><p>Note that as TLS supports the described “negotiation” mechanism for the key exchange, the client and server have a way of mutually deciding which algorithms they want to use. This means that the client and the server are not required to support, or even prefer, the exact same algorithms: they just need to share support for a single algorithm for a handshake to succeed. As a result, even if we advertise post-quantum cryptography to a server or client that does not support it, the handshake will not fail: the two parties will agree on some other algorithm they share.</p><p>A note on a matter we left on hold above: why are we not migrating the authentication phase of TLS to post-quantum? Certificate-based authentication in TLS, which is the one we commonly use at Cloudflare, also depends on systems on the wider Internet. Thus, changes to authentication require a coordinated and much wider effort. Certificates are attested as proofs of identity by outside parties: migrating authentication means coordinating a migration with these outside parties. Note, though, that at Cloudflare we use a <a href="/how-to-build-your-own-public-key-infrastructure/">PKI with internally-hosted Certificate Authorities (CAs)</a>, which means that we can more easily change our algorithms. This will still need careful planning. We will not do this today, but we will in the near future.</p>
    <div>
      <h3>Cloudflare services</h3>
      <a href="#cloudflare-services">
        
      </a>
    </div>
    <p>The first step of our post-quantum migration is done: we have TLS libraries with post-quantum cryptography using a hybrid mechanism. The second step is to test this new mechanism in specific Cloudflare connections and services. We will look at three systems at Cloudflare that we have started migrating to post-quantum cryptography: Logfwdr, Cloudflare Tunnel, and gokeyless.</p>
    <div>
      <h3>A post-quantum Logfwdr</h3>
      <a href="#a-post-quantum-logfwdr">
        
      </a>
    </div>
    <p><a href="/more-data-more-data/">Logfwdr</a> is an internal service, written in Golang, that handles structured logs and sends them from our servers to a subservice called ‘Logreceiver’ for processing, where they are written to Kafka. The connection between Logfwdr and Logreceiver is protected by TLS. The same goes for the connection between Logreceiver and <a href="https://kafka.apache.org/">Kafka</a> in core. Logfwdr pushes its logs through “streams” for processing.</p><p>This service seemed an ideal candidate for migrating to post-quantum cryptography as its architecture is simple, it has long-lived connections, and it handles a lot of traffic. To first test the viability of using post-quantum cryptography, we created our own instance of Logreceiver and deployed it. We also created our own stream (the “pq-stream”), which is basically a copy of an HTTP stream (and was remarkably easy to add). We then compiled these services with the modified TLS library, and we got a post-quantum-protected Logfwdr.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2YlRJqczYHpPshoEZzpu9b/160d794b37b407bb09fd345b889101c7/image7-5.png" />
            
            </figure><p>Figure 1: TLS latency of Logfwdr for selected metals. Notice how post-quantum cryptography (labeled “PQ”) is faster than non-post-quantum cryptography.</p><p>What we found was that using post-quantum cryptography is faster than using “classical” cryptography! This was expected, though, as we are using a lattice-based post-quantum algorithm (Kyber-512). The TLS latency of both post-quantum handshakes and “classical” ones can be seen in Figure 1. The figure shows more handshakes than usual because these servers are frequently restarted.</p><p>Note, though, that we are not using “only” post-quantum cryptography but rather the “hybrid” mechanism described above. This could increase handshake times: in this case, the increase was minimal, and the post-quantum handshakes remained faster than the classical ones. Perhaps what makes the TLS handshakes faster in the post-quantum case is the usage of TLS 1.3, as the “classical” Logfwdr is using TLS 1.2. Logfwdr, though, is a service with long-lived connections, so in aggregate TLS 1.2 is not “slower”, but it does have a slower start.</p><p>As shown in Figure 2, the average batch duration of the post-quantum stream is lower than that of the non-post-quantum one. This may be in part because we are not sending the quantum-protected data all the way to Kafka (as the non-post-quantum stream does): we have not yet made the connection to Kafka post-quantum.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7s0OprfksLusP0685ydiYl/69af1192a703c1224f746673d83c44d2/image8-3.png" />
            
            </figure><p>Figure 2: Average batch send duration: post-quantum (orange) and non-post-quantum (green) streams.</p><p>We did not encounter any failures during this testing, which ran for several weeks. This showed us that putting post-quantum cryptography into our internal network with actual data is possible, and it gave us confidence to begin migrating codebases to the modified TLS libraries, which we will maintain.</p><p>What are the next steps for Logfwdr? Now that we have confirmed it is possible, we will migrate stream by stream to this hybrid mechanism until we reach a full post-quantum migration.</p>
    <div>
      <h3>A post-quantum gokeyless</h3>
      <a href="#a-post-quantum-gokeyless">
        
      </a>
    </div>
    <p><a href="/going-keyless-everywhere/"><i>gokeyless</i></a> is our own way to separate servers from TLS long-term private keys. With it, private keys are kept on a specialized key server operated by customers on their own architecture or, if using <a href="/introducing-cloudflare-geo-key-manager/">Geo Key Manager</a>, in selected Cloudflare locations. We also use it for Cloudflare-held private keys with a service creatively known as <i>gokeyless-internal</i>. The final piece of this architecture is another service called <a href="/scaling-geo-key-manager/"><i>Keynotto</i></a>. Keynotto is a service written in Rust that only mints RSA and <a href="https://www.cloudflare.com/learning/dns/dnssec/ecdsa-and-dnssec/">ECDSA</a> signatures (computed with the stored private key).</p><p>How does the overall architecture of gokeyless work? Let’s start with a request. The request arrives at the Cloudflare network, and we perform TLS termination. Any signing request is forwarded to Keynotto. A small portion of requests (specifically, those from GeoKDL or external gokeyless) cannot be handled by Keynotto directly, and are instead forwarded to gokeyless-internal. gokeyless-internal also acts as a key server proxy, as it redirects connections to the customer’s key servers (external gokeyless). <a href="https://github.com/cloudflare/gokeyless">External gokeyless</a> is both the server that a customer runs and the client that will be used to contact it. The architecture can be seen in Figure 3.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5c2XtFKTmHLgy9XNmAfJsh/496e49511e6aea34309abbae92b846f5/image5-10.png" />
            
            </figure><p>Figure 3: The life of a gokeyless request.</p><p>Migrating the transport that this architecture uses to post-quantum cryptography is a bigger challenge, as it involves migrating a service that lives on the customer side. So, for our testing phase, we decided to go for the simpler path that we are able to change ourselves: the TLS handshake between Keynotto and gokeyless-internal. This small test-bed means two things: first, that we needed to change another TLS library (as Keynotto is written in Rust) and, second, that we needed to change gokeyless-internal in such a way that it used post-quantum cryptography only for the handshakes with Keynotto and for nothing else. Note that we did not migrate the signing operations that gokeyless or Keynotto executes with the stored private key; we just migrated the transport connections.</p><p>Adding post-quantum cryptography to the <a href="https://github.com/rustls/rustls">rustls codebase</a> was a straightforward exercise and we exposed an easy-to-use API call to signal the usage of post-quantum cryptography (as seen in Figure 4 and Figure 5). One thing that we noted when reviewing the TLS usage in several Cloudflare services is that giving the option to choose the algorithms for a ciphersuite, key share, and authentication in the TLS handshake confuses users. It seemed more straightforward to define the algorithm at the library level, and have a boolean or API call signal the need for this post-quantum algorithm.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4i2qzprdHeQ5F7Y9CLn1S2/354947695ac81ead9fd7e414a562bc37/image9-3.png" />
            
            </figure><p>Figure 4: Post-quantum API for rustls.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5rk2okKUdjZmpisim3f5bk/bd2b2bec41309221236591c53b071d6d/image4-16.png" />
            
            </figure><p>Figure 5: Usage of the post-quantum API for rustls.</p><p>We ran a small test between Keynotto and gokeyless-internal with much success. Our next steps are to integrate this test into the real connection between Keynotto and gokeyless-internal, and to devise a plan for a customer post-quantum protected gokeyless external. This is the first instance in which our migration to post-quantum will not be ending at our edge but rather at the customer’s connection point.</p>
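<p>The API shape we settled on for rustls (Figures 4 and 5) can be transliterated into a short Go sketch. All names below are illustrative, not a real library’s API: the point is the design choice that the library, not the caller, pins the ciphersuite and key-share algorithms, and callers see a single post-quantum switch.</p>

```go
package main

import "fmt"

// KeyExchange enumerates the key-share choices the library supports internally.
type KeyExchange int

const (
	ClassicalP256      KeyExchange = iota // default: P-256 only
	HybridP256Kyber512                    // hybrid "classical + post-quantum" share
)

// Config hides the algorithm choice; callers cannot set it directly.
type Config struct {
	keyExchange KeyExchange
}

// EnablePostQuantum is the single knob callers see. The library decides which
// hybrid algorithm this implies, so users never pick ciphersuites by hand.
func (c *Config) EnablePostQuantum() { c.keyExchange = HybridP256Kyber512 }

// KeyExchange reports the algorithm the library will offer in the handshake.
func (c *Config) KeyExchange() KeyExchange { return c.keyExchange }

func main() {
	var cfg Config
	cfg.EnablePostQuantum()
	fmt.Println(cfg.KeyExchange() == HybridP256Kyber512)
}
```

<p>This mirrors the lesson above: exposing ciphersuite, key-share, and authentication choices separately confuses users, while a boolean-style call keeps the decision in one audited place.</p>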
    <div>
      <h3>A post-quantum Cloudflare Tunnel</h3>
      <a href="#a-post-quantum-cloudflare-tunnel">
        
      </a>
    </div>
    <p><a href="https://www.cloudflare.com/en-gb/products/tunnel/">Cloudflare Tunnel</a> is a reverse proxy that allows customers to quickly connect their private services and networks to the Cloudflare network without having to expose their public IPs or ports through their firewall. It is mainly managed at the customer level through the <a href="https://github.com/cloudflare/cloudflared">usage of <i>cloudflared</i></a>, a lightweight server-side daemon, in their infrastructure. <i>cloudflared</i> opens several long-lived TCP connections (although <i>cloudflared</i> is <a href="/getting-cloudflare-tunnels-to-connect-to-the-cloudflare-network-with-quic/">increasingly using the QUIC</a> protocol) to servers on <a href="https://www.cloudflare.com/en-gb/learning/serverless/glossary/what-is-edge-computing/">Cloudflare’s global network</a>. When a request for a hostname arrives, it is proxied through these connections to the origin service behind <i>cloudflared</i>.</p><p>The easiest part of the service to make post-quantum secure appears to be the connection between our network (with a service called <i>origintunneld</i>, part of Tunnel, located there) and <i>cloudflared</i>, which we have started migrating. While exploring this path and looking at the whole life of a Tunnel connection, though, we found something more interesting. When Tunnel connections eventually reach core, they end up going to a service called <i>Tunnelstore</i>. <i>Tunnelstore</i> runs as a stateless application in a Kubernetes deployment, and to provide TLS termination (alongside load balancing and more) it uses a <a href="https://kubernetes.io/">Kubernetes</a> <a href="https://kubernetes.io/docs/concepts/services-networking/ingress/">ingress</a>.</p><p>The <a href="/cloudflare-ingress-controller/">Kubernetes ingress</a> we use at Cloudflare is made of <a href="https://www.envoyproxy.io/">Envoy</a> and <a href="https://projectcontour.io/">Contour</a>. 
The latter configures the former based on Kubernetes resources. Envoy uses the <a href="https://www.envoyproxy.io/docs/envoy/latest/faq/build/boringssl">BoringSSL library</a> for TLS. Switching TLS libraries in Envoy seemed difficult: there are <a href="https://github.com/envoyproxy/envoy-openssl">thoughts</a> on how to integrate OpenSSL into it (and <a href="https://github.com/open-quantum-safe/oqs-demos/issues/79">even some thoughts</a> on adding post-quantum cryptography) and <a href="https://github.com/envoyproxy/envoy/pull/7377">ways to switch TLS libraries</a>. Adding post-quantum cryptography to a modified version of BoringSSL, and then specifying <a href="https://github.com/google/boringssl/blob/master/INCORPORATING.md">that dependency</a> in the <a href="https://github.com/envoyproxy/envoy/blob/main/bazel/repository_locations.bzl#L74">Bazel file of Envoy</a>, seems to be the right path, as our internal test confirmed (see Figure 6). As for Contour: for many years, Cloudflare has been running its own patched version of it, and we will have to patch this version again with our Golang library to provide post-quantum cryptography. We will make these libraries (and the TLS ones) available for use.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/itO6WZ2vhxBBR8Jy8iWRp/5c435ec9ccac61b74fe0f9499b171011/image1-27.png" />
            
            </figure><p>Figure 6: Option to allow post-quantum cryptography in Envoy.</p><p>Changing the Kubernetes ingress at Cloudflare not only makes Tunnel completely quantum-safe (beyond the connection between our global network and <i>cloudflared</i>), but it also makes any other services using ingress safe. Our first tests on migrating Envoy and Contour to TLS libraries that contain post-quantum protections have been successful, and now we have to test how it behaves in the whole ingress ecosystem.</p>
    <div>
      <h3>What is next?</h3>
      <a href="#what-is-next">
        
      </a>
    </div>
    <p>The main tests are now done. We now have TLS libraries (in Go, Rust, and C) that give us post-quantum cryptography. We have two systems ready to deploy post-quantum cryptography, and a shared service (Kubernetes ingress) that we can change. At the beginning of the blog post, we said that “the life of most requests at Cloudflare begins and ends at the edge of our global network”: our aim is that post-quantum cryptography does not end there, but rather reaches all the way to where customers connect as well. Let’s explore the future challenges and this customer post-quantum path in this <a href="/post-quantum-future">other blog post</a>!</p> ]]></content:encoded>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Cryptography]]></category>
            <category><![CDATA[Post-Quantum]]></category>
            <guid isPermaLink="false">6Idg7A13yPnfsxtQpwQXs1</guid>
            <dc:creator>Sofía Celi</dc:creator>
            <dc:creator>Goutam Tamvada</dc:creator>
            <dc:creator>Thom Wiggers</dc:creator>
        </item>
        <item>
            <title><![CDATA[Building Confidence in Cryptographic Protocols]]></title>
            <link>https://blog.cloudflare.com/post-quantum-formal-analysis/</link>
            <pubDate>Thu, 24 Feb 2022 17:30:00 GMT</pubDate>
            <description><![CDATA[ This blog post covers efforts to use formal verification of protocol designs and their implementations to achieve better assurance for post-quantum algorithms. It also touches on Cloudflare’s efforts in this area. ]]></description>
            <content:encoded><![CDATA[ <p><b>An introduction to formal analysis and our proof of the security of KEMTLS</b></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/393auOPCMHtWuvmZxJZ44s/a634d8c90882e32c599268fddd3cfc6b/Blog-827---KEMs-and-KEMTLS.png" />
            
            </figure><p>Good morning everyone, and welcome to another Post-Quantum–themed blog post! Today we’re going to look at something a little different. Rather than look into the quantum future, we’re going to look as far back as the ’80s and ’90s, to try to get some perspective on how we can determine whether a protocol is or is not secure. Unsurprisingly, this question comes up all the time. Cryptographers like to build fancy new cryptosystems, but just because we, the authors, can’t break our own designs, it doesn’t mean they are secure: it just means we are not smart enough to break them.</p><p>One might at this point wonder why in a post-quantum themed blog post we are talking about security proofs. The reason is simple: the new algorithms that claim to be safe against quantum threats need proofs showing that they actually are safe. In this blog post, not only are we going to introduce how we go about proving a protocol is secure, we’re going to introduce the security proofs of KEMTLS, a version of TLS designed to be more secure against quantum computers, and give you a whistle-stop tour of the formal analysis we did of it.</p><p>Let’s go back for the moment to not being smart enough to break a cryptosystem. Saying “I tried very hard to break this, and couldn’t” isn’t a very satisfying answer, and so for many years cryptographers (and others) have been trying to find a better one. There are some obvious approaches to building confidence in your cryptosystem: for example, you could <a href="https://github.com/google/wycheproof">try all previously known attacks</a>, and see if the system breaks. 
This approach will probably weed out any simple flaws, but it doesn’t mean that some new <a href="https://en.wikipedia.org/wiki/Cryptanalysis_of_the_Enigma">attack won’t be found</a> or even that some new twist on an old one <a href="https://ieeexplore.ieee.org/document/8835216">won’t be discovered</a>.</p><p>Another approach you can take is to offer a large prize to anyone who can break your new system; but to do that not only do you need a <a href="https://en.wikipedia.org/wiki/RSA_Factoring_Challenge">big</a> <a href="https://www.microsoft.com/en-us/msrc/sike-cryptographic-challenge">prize</a> that you can afford to give away if you’re wrong, you can’t be sure that everyone would prefer your prize to, for example, selling an attack to cybercriminals, or even to a government.</p><p>Simply trying hard, and inducing other people to do so too still felt unsatisfactory, so in the late ‘80s researchers started trying to use mathematical techniques to prove that their protocol <a href="https://en.wikipedia.org/wiki/Burrows%E2%80%93Abadi%E2%80%93Needham_logic">was secure</a>. Now, if you aren’t versed in theoretical computer science you might not even have a clear idea of what it even means to “prove” a protocol is secure, let alone how you might go about it, so let’s start at the very beginning.</p>
    <div>
      <h3>A proof</h3>
      <a href="#a-proof">
        
      </a>
    </div>
    <p>First things first: let’s nail down what we mean by a <i>proof</i>. At its most general level, a mathematical proof starts with some assumptions, and by making logical inferences it builds towards a statement. If you can derive your target statement from your initial assumptions then you can be sure that, if your assumptions are right, then your final statement is true.</p><p>Euclid’s famous work, <i>The Elements</i>, a standard math textbook for over 2,000 years, is written in this style. Euclid gives five “postulates”, or assumptions, from which he can derive a huge portion of the geometry known in his day. Euclid’s first postulate, that you can draw a straight line between any two points, is never proven, but taken as read. You can take his first postulate, and his third, that you can draw a circle with any center and radius, and use it to prove his first proposition, that you can draw an equilateral triangle given any finite line. For the curious, you can find <a href="https://gutenberg.org/ebooks/21076">public-domain translations of Euclid’s work</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2qV3nhDT4bKQvndrAnbw9N/a167c02b1a7127c6b2e7207a69a44ea8/Screenshot-2022-02-15-at-22.27.54.png" />
            
            </figure><p>Euclid’s method of drawing an equilateral triangle based on the finite line AB, by drawing two circles around points A and B, with the radius AB. The intersection finds point C of the triangle. (Image: original raster by Mcgill at en.wikibooks; SVG by Beao. Public domain, via Wikimedia Commons.)</p><p>Whilst it’s fairly easy to intuit how such geometry proofs work, it’s not immediately clear how one could prove something as abstract as the security of a cryptographic protocol. Proofs of protocols operate in a similar way: we build a logical argument starting from a set of assumptions. Security proofs, however, can be much, much bigger than anything in <i>The Elements</i> (for example, our proof of the security properties of <a href="https://thomwiggers.nl/publication/kemtls/kemtls.pdf">KEMTLS</a>, which we will talk about later, is nearly 500,000 lines long) and the only reason we are able to do such complex proofs is that we have something of an advantage over Euclid. We have computers. Using a mix of human-guided theorem proving and automated algorithms, we can prove incredibly complex things, such as the security of protocols like the one we will discuss.</p><p>Now that we know a proof is a set of logical steps built from a set of assumptions, let’s talk a bit about how security proofs work. First, we need to work out how to describe the protocol in terms that we can reason about. Over the years researchers have come up with many ways of describing computer processes mathematically, most famously <a href="https://www.wolframscience.com/prizes/tm23/images/Turing.pdf">Alan Turing defined a-machines, which we now know as Turing Machines</a>, a way to describe a <i>computer program</i> in an <i>algebraic form</i>. A protocol is slightly more complex than a single program. 
A protocol can be seen as a number of computers running a set of computer programs that interact with each other.</p><p>We’re going to use a class of techniques called <a href="https://en.wikipedia.org/wiki/Process_calculus"><i>process algebras</i></a> to describe the interacting processes of a protocol. At its most basic level, algebra is the art of generalizing a statement by replacing specific values with general symbols. In standard algebra, these specific values are numbers, so for example we can write <code>(cos 37)² + (sin 37)² = 1</code>, which is true, but we can generalize it to <code>(cos θ)² + (sin θ)² = 1</code>, replacing the specific value, 37, with the symbol θ.</p><p>Now you might be wondering why it’s useful to replace things with symbols. The answer is it lets us solve entire classes of problems instead of solving each individual instance. When it comes to security protocols, this is especially important. We can’t possibly try every possible set of inputs to a protocol and check nothing weird happens to one of them. In fact, one of the assumptions we’re going to make when we prove KEMTLS secure is that trying every possible value for some inputs is impossible<sup>1</sup>. By representing the protocol symbolically, we can write a proof that applies to all possible inputs of the protocol.</p><p>Let’s go back to algebra. A <i>process algebra</i> is similar to the kind of algebra you might have learnt in high school: we represent a computer program with symbols for the specific values. We also treat functions symbolically. Rather than try and compute what happens when we apply a function <code>f</code> to a value <code>x</code>, we just create a new symbol <code>f(x)</code>. An algebra also provides rules for manipulating expressions. For example, in standard algebra we can <i>transform</i> <code>y + 5 = x² - x</code> into <code>y = x² - x - 5</code>. 
A process algebra is the same: it not only defines a language to describe interacting processes, it also defines rules for how we can manipulate those expressions.</p><p>We can use tools, such as <a href="https://tamarin-prover.github.io/">Tamarin</a> (the one we use), to help us do this reasoning. Every protocol has its own rules for what transformations are allowed. It is very useful to have a tool like Tamarin that we can teach these rules to, and let it do all the work of symbol manipulation. Tamarin does far, far more than that, though.</p><p>A rule that we tell Tamarin might look like this:</p>
            <pre><code>rule Register_pk:
  [ Fr(~ltkA) ]
--[ GenLtk($A, ~ltkA)]-&gt;
  [ !Ltk($A, ~ltkA), !Pk($A, pk(~ltkA)), Out(pk(~ltkA)) ]</code></pre>
            <p>This rule is used to represent that a protocol participant has acquired a new public/private key pair. The rule has three parts:</p><ul><li><p>The first part lists the preconditions. In this case, there is only one: we take a <code>Fr</code>esh value called <code>~ltkA</code>, the “long-term key of A”. This precondition is always met, because Tamarin always allows us to generate fresh values.</p></li><li><p>The third part lists the postconditions (what we get back when we apply the rule). Rather than operating on an initial statement, as in high-school algebra, Tamarin instead operates on what we call a “bag of facts” model. Instead of starting with <code>y + 5 = x² - x</code>, we start with an empty “bag”, and from there, apply rules. These rules take facts out of the bag and put new ones in. In this case, we put in:</p><ul><li><p><code>!Ltk($A, ~ltkA)</code>, which represents the private portion of the key, <code>~ltkA</code>, and the name of the participant it was issued to, <code>$A</code>.</p></li><li><p><code>!Pk($A, pk(~ltkA))</code>, which represents the public portion of the key, <code>pk(~ltkA)</code>, and the name of the participant it was issued to, <code>$A</code>.</p></li><li><p><code>Out(pk(~ltkA))</code>, which represents us publishing the public portion of the key, <code>pk(~ltkA)</code>, to the network. Tamarin is based on the <a href="https://en.wikipedia.org/wiki/Dolev%E2%80%93Yao_model">Dolev-Yao model</a>, which assumes the attacker controls the network. Thus, this fact makes <code>$A</code>’s public key available to the attacker.</p></li></ul></li></ul><p>We can only apply a rule if the preconditions are met: the facts we need appear in the bag. By having rules for each step of the protocol, we can apply the rules in order and simulate a run of the protocol. But, as I’m sure you’ve noticed, we skipped the second part of the rule. 
The second part of the rule is where we list what we call <i>actions</i>.</p><p>We use actions to record what happened in a protocol run. In our example, we have the action <code>GenLtk($A, ~ltkA)</code>. <code>GenLtk</code> means that a new Long-Term Key (LTK) has been <code>Gen</code>erated. Whenever we trigger the <code>Register_pk</code> rule, we note this with two parameters: <code>$A</code>, the party to whom the new key pair belongs; and <code>~ltkA</code>, the private part of the generated key<sup>2</sup>.</p><p>If we simulate a single run of the protocol, we can record all the actions executed in a list. However, at any point in the protocol, there may be multiple rules that can be triggered. A list only captures a single run of a protocol, but we want to reason about all possible runs of the protocol. We can arrange our rules into a tree: every time we have multiple rules that could be executed, we give each one of them its own branch.</p><p>If we could write this entire tree, it would represent every possible run of the protocol. Because every possible run appears in this tree, if we can show that there are no “bad” runs on this tree, we can be sure that the protocol is “secure”. We put “bad” and “secure” in quotation marks here because we <i>still</i> haven’t actually defined what those terms mean.</p><p>But before we get to that, let’s quickly recap what we have so far. We have:</p><ul><li><p>A protocol we want to prove secure.</p></li><li><p>A definition of <i>protocol</i>, as a number of computers running a set of computer programs that interact with each other.</p></li><li><p>A technique, <i>process algebras</i>, to describe the interacting processes of the protocol: this technique provides us with symbols and rules for manipulating them.</p></li><li><p>A tree that represents every possible run of the protocol.</p></li></ul><p>We can reason about a protocol by looking at the properties this tree gives us. 
As we are interested in cryptographic protocols, we would like to reason about their security. “Security” is a pretty abstract concept and its meaning changes in different contexts. In our case, to prove something is secure, we first have to say <i>what our security goals</i> are. One thing we might want to prove is, for example, that an attacker can never learn the encryption key of a session. We capture this idea with a <i>reachability</i> lemma.</p><p>A reachability lemma asks whether there is a path in the tree that leads to a specific state: can we “reach” this state? In this case, we ask: “can we reach a state where the attacker knows the session key?” If the answer is “no”, we are sure that our protocol has that property (an attacker never learns the session key), or at least that the property holds in our model of the protocol.</p><p>So, if we want to prove the security of a cryptographic protocol, we need to:</p><ol><li><p>Define the security goals being proven.</p></li><li><p>Describe the protocol as an interacting process of symbols, rules, and expressions.</p></li><li><p>Build a tree of all the steps the protocol can take.</p></li><li><p>Check that the tree of protocol runs attains the security goals we specified.</p></li></ol><p>This process of creating a model of a program and writing rigorous proofs about that model is called “formal analysis”.</p><p>Writing formal proofs of protocol correctness has been very effective at finding and fixing all kinds of issues. During the design of TLS 1.3, for example, it uncovered a number of serious security flaws that were eventually fixed prior to standardization. However, something we need to be wary of with formal analysis is being over-reliant on its results. It’s very possible to be so taken with the rigour of the process and of its mathematical proofs that the result gets overinterpreted. 
Not only can a mistake exist in a proof, even in a machine-checked one; the proof may also not prove what you think it does. There are many examples of this: <a href="https://en.wikipedia.org/wiki/Needham%E2%80%93Schroeder_protocol">Needham-Schroeder</a> had a <a href="https://www.cs.cmu.edu/afs/cs/academic/class/17654-f01/www/refs/BAN.pdf">proof of security written in the BAN logic</a>, before Lowe found <a href="https://www.sciencedirect.com/science/article/abs/pii/0020019095001442">an attack on a case that the BAN logic did not cover</a>.</p><p>In fact, the initial specification of the TLS 1.3 proof made the assumption that nobody uses the same certificate for both a client and a server, even though this is not explicitly disallowed in the specification. This gap led to the <a href="https://eprint.iacr.org/2019/347">“Selfie” vulnerability</a>, where a client could be tricked into connecting to itself, potentially leading to resource exhaustion attacks.</p><p>Formal analysis of protocol designs also tells you nothing about whether a particular implementation correctly implements a protocol; we will talk about this in other <a href="/post-quantum-easycrypt-jasmin">blog posts</a>. Let’s now return to our core topic: the formal analysis of KEMTLS.</p>
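<p>To make the recap concrete, here is a deliberately tiny Python sketch of the bag-of-facts idea (our own illustration; the names mimic the <code>Register_pk</code> rule above, but this is far simpler than Tamarin’s actual term rewriting):</p>

```python
from itertools import count

# The state of the world is a "bag" of facts; a rule consumes its
# preconditions, adds its postconditions, and records its actions in a
# trace. This mimics the Register_pk rule shown earlier, in miniature.

fresh = count()  # a source of fresh values, like Tamarin's Fr(~x)

def register_pk(bag, trace, a):
    """Generate a long-term key pair for participant `a`."""
    ltk = f"ltk{next(fresh)}"
    # Postconditions: the private key fact, the public key fact, and the
    # public key being sent Out to the attacker-controlled network.
    bag |= {("Ltk", a, ltk), ("Pk", a, ("pk", ltk)), ("Out", ("pk", ltk))}
    trace.append(("GenLtk", a, ltk))

bag, trace = set(), []
register_pk(bag, trace, "A")

# A reachability question over the recorded trace: did some participant
# ever generate a long-term key?
reached = any(action[0] == "GenLtk" for action in trace)
print(reached)                       # → True
print(("Ltk", "A", "ltk0") in bag)   # → True
```

<p>A real tool explores every applicable rule at every step, producing the tree of all runs; the reachability question is then asked over every branch of that tree, not over one hand-picked run.</p>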
    <div>
      <h3>Proving KEMTLS’s security</h3>
      <a href="#proving-kemtlss-security">
        
      </a>
    </div>
    <p>Now that we have the needed notions, let’s get down to the nitty-gritty: how we proved that KEMTLS is secure. KEMTLS is a proposal to do authentication in TLS handshakes using key exchange (via key encapsulation mechanisms, or KEMs). <a href="/making-protocols-post-quantum">KEMTLS examines</a> the trade-offs between post-quantum signature schemes and post-quantum key exchange, as we discussed in <a href="/making-protocols-post-quantum">other blog posts</a>.</p><p>The main idea of KEMTLS is the following: instead of using a signature to prove that you have access to the private key that corresponds to the (signing) public key in the certificate presented, we derive a shared secret <i>encapsulated to</i> a (KEM) public key. The party that presented the certificate can only derive (<i>decapsulate</i>) it from the resulting <i>encapsulation</i> (often also called the ciphertext) if they have access to the correct private key; and only then can they read encrypted traffic. A brief overview of how this looks in the “traditional arrows on paper” form is given below.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4mgeVds1Jxd59Qoy4QT187/0c5b8f84c1e4195904982efe5149400f/image3-24.png" />
            
            </figure><p>Brief overview of the core idea of KEMTLS.</p><p>We want to show that the KEMTLS handshake is secure, no matter how an adversary might mess with, reorder, or even create new protocol messages. Symbolic analysis tools such as Tamarin or <a href="https://bblanche.gitlabpages.inria.fr/proverif/">ProVerif</a> are well suited to this task: as we said, they allow us to consider every possible combination or manipulation of protocol messages, participants, and key information. We can then write lemmas about the behavior of the protocol.</p>
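<p>The encapsulation-based authentication idea above can be sketched with a toy Diffie-Hellman-based KEM in Python (our own illustration with wildly insecure parameters; real KEMTLS uses post-quantum KEMs):</p>

```python
import secrets

# A toy Diffie-Hellman-based KEM (illustration only: these parameters are
# wildly insecure, and real KEMTLS uses post-quantum KEMs).
P = 2**127 - 1   # a small toy prime modulus
G = 3

def keygen():
    """The server's certificate holds pk; sk stays private."""
    sk = secrets.randbelow(P - 2) + 1
    return pow(G, sk, P), sk

def encaps(pk):
    """Client side: derive a shared secret encapsulated to pk."""
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), pow(pk, r, P)   # (ciphertext, shared secret)

def decaps(sk, ct):
    """Only the holder of sk can recover the shared secret."""
    return pow(ct, sk, P)

# KEMTLS core idea: a server that derives the same secret as the client
# has implicitly proven possession of the certificate's private key.
server_pk, server_sk = keygen()
ct, client_ss = encaps(server_pk)
print(decaps(server_sk, ct) == client_ss)  # → True
```

<p>The implicit authentication is the point: no signature is sent, yet only the legitimate certificate holder can read traffic keyed from the shared secret.</p>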
    <div>
      <h3>Why prove it in Tamarin?</h3>
      <a href="#why-prove-it-in-tamarin">
        
      </a>
    </div>
    <p>There exists a pen-and-paper proof of the KEMTLS handshake. You might ask: why should we still invest the effort of modeling it in a tool like Tamarin?</p><p>Pen-and-paper proofs are in theory fairly straightforward. However, they are <i>very hard</i> to get right. We need to carefully express the security properties of the protocol, and it is very easy for unstated assumptions to lead you to write something that your model does not correctly cover. Verifying that a proof has been done correctly is also very difficult and requires almost as much careful attention as writing the proof itself. In fact, several mistakes were found in the definitions of the properties of the original KEMTLS proof’s model after the paper had been accepted and published at a top-tier security conference.</p><blockquote><p><i>For those familiar with these kinds of game-based proofs, another “war story”: while modeling the ephemeral key exchange, the authors of KEMTLS initially assumed all we needed was an </i><a href="https://en.wikipedia.org/wiki/Ciphertext_indistinguishability#Indistinguishability_under_chosen-plaintext_attack_(IND-CPA)"><i>IND-CPA</i></a><i> secure KEM. After writing out all the simulations in pseudo code (which is not part of the proof or the paper otherwise!), it turned out that we needed an additional oracle to answer a single decapsulation query, resulting in requiring a weaker variant of IND-CCA security for our KEM (namely, IND-1CCA security). Using a KEM that is only IND-CPA secure turned out not to be secure!</i></p></blockquote><p>Part of the problem with pen-and-paper proofs is perhaps the human tendency to read between the lines: we quickly figure out what is intended by a certain sentence, even if the intent is not strictly clarified. 
Computers do not allow that, to everyone’s familiar frustration: a computer does not do what you wanted, just what you told it to do.</p><p>A benefit of computer code, though, is that all the effort is in writing the instructions. A carefully constructed model and proof result in an executable program: verifying should be as simple as running it to the end. However, as always, we need to:</p><ol><li><p>Be very careful that we have modeled the right thing and,</p></li><li><p>Note that even the machine prover might have bugs: this second computer-assisted proof is a complement to, and not a replacement of, the pen-and-paper proof.</p></li></ol><p>Another reason why computer proofs are interesting is that they give us the ability to construct extensions. The <a href="https://eprint.iacr.org/2021/779">“pre-distributed keys” extension of KEMTLS</a>, for example, has only been proven on paper in isolation. Tamarin allows us to construct that extension in the same space as the main proof, which will help rule out any cross-functional attacks. This greatly increases the complexity of the proof, but Tamarin allows us to handle that just by using more computing power. Doing the same on paper requires very, very careful consideration.</p><p>One final reason we wanted to perform this computer analysis is that whilst the pen-and-paper proof was in the computational model, our computer analysis is in the symbolic model. Computational proofs are “high resolution”, giving very tight bounds on exactly how secure a protocol is. Symbolic models are “low resolution”, giving a binary yes/no answer on whether a protocol meets the security goals (with the assumption that the underlying cryptographic primitives are secure). This might sound like computational proofs are the best; their downside is that one has to simplify the model in other areas. 
The computational proof of KEMTLS, for example, does not model TLS message formats, which a symbolic model can.</p>
    <div>
      <h3>Modeling KEMTLS in Tamarin</h3>
      <a href="#modeling-kemtls-in-tamarin">
        
      </a>
    </div>
    <p>Before we can start making security claims and asking Tamarin to prove them, we first need to explain to Tamarin what KEMTLS is. As we mentioned earlier, Tamarin treats the world as a “bag of facts”. Keys, certificates, identities, and protocol messages are all facts. Tamarin can take those facts and apply rules to them to construct (or deconstruct) new facts. Executing steps in the KEMTLS protocol is, in a very literal sense, just another way to perform such transformations, and, if everything is well-designed, the only “honest” way to reach the end state of the protocol.</p><p>We need to start by modeling the protocol. We were fortunate to be able to reuse the work of <a href="https://tls13tamarin.github.io/TLS13Tamarin/">Cremers et al.</a>, who contributed their significant modeling talent to the TLS 1.3 standardization effort. They created a very complete model of the TLS 1.3 protocol, which showed that the protocol is generally secure. For more details, see <a href="https://tls13tamarin.github.io/TLS13Tamarin/">their paper</a>.</p><p>We modified the ephemeral key exchange by substituting the Diffie-Hellman operations in TLS 1.3 with the appropriate KEM operations. Similarly, we modified the messages that perform the certificate handling: instead of verifying a signature, we send back a KemEncapsulation message with the ciphertext. Let’s have a look at one of the changed rules. It looks a bit scary, but don’t worry: we’re going to break it down for you. And don’t worry if you do not grasp all the details either: we will cover the most necessary bits when they come up again, so you can also just skip ahead to the next section, “Modeling the adversary”, instead.</p>
            <pre><code>rule client_recv_server_cert_emit_kex:
let
  // … snip
  ss = kemss($k, ~saseed)
  ciphertext = kemencaps($k, ss, certpk)
  // NOTE: the TLS model uses M4 preprocessor macros for notation
  // We also made some edits for clarity
in
  [
    State(C2d, tid, $C, $S, PrevClientState),
    In(senc{&lt;'certificate', certpk&gt;}hs_keys),
    !Pk($S, certpk),
    Fr(~saseed)
  ]
  --[
    C2d(tid),
    KemEncap($k, certpk, ss)
  ]-&gt;
  [
    State(C3, tid, $C, $S, ClientState),
    Out(senc{&lt;'encaps', ciphertext&gt;}hs_keyc)
  ]</code></pre>
            <p>This rule represents the client getting the server’s certificate and encapsulating a fresh key to it. It then sends the encapsulated key back to the server.</p><p>Note that the <code>let … in</code> part of the rule is used to assign expressions to variables. The real meat of the rule starts with the preconditions. As we can see, in this rule there are four preconditions that Tamarin needs to already have in its bag for this rule to be triggered:</p><ul><li><p>The first precondition is <code>State(C2d, …)</code>. This condition tells us that we have some client that has reached the stage <code>C2d</code>, which is what we call this stage in our representation. The remaining variables define the state of that client.</p></li><li><p>The second precondition is an <code>In</code> one. This is how Tamarin denotes messages received from the network. As we mentioned before, we assume that the network is controlled by the attacker. Until we can prove otherwise, we don’t know whether this message was created by the honest server, if it has been manipulated by the attacker, or even forged. The message content, <code>senc{&lt;'certificate', certpk&gt;}hs_keys</code>, is symmetrically encrypted (<code>senc{}</code>) under the server’s handshake key (we’ve slightly edited this message for clarity, and removed various other bits to keep this at least somewhat readable, but you can see the whole definition in <a href="https://github.com/thomwiggers/TLS13Tamarin">our model</a>).</p></li><li><p>The third precondition is the server’s public key, <code>!Pk($S, certpk)</code>. This condition is preceded by a <code>!</code> symbol, which means that it is a permanent fact that can be consumed many times. Usually, once a fact is removed from the bag, it is gone; but permanent facts remain. <code>$S</code> is the name of the server, and <code>certpk</code> is the KEM public key.</p></li><li><p>The fourth precondition is the fresh random value, <code>~saseed</code>.</p></li></ul><p>The postconditions of this rule are a little simpler. We have:</p><ul><li><p><code>State(C3, …)</code>, which represents that the client (which was in state <code>C2d</code> at the start of the rule) is now in state <code>C3</code>.</p></li><li><p><code>Out(senc{&lt;'encaps', ciphertext&gt;}hs_keyc)</code>, which represents the action of the client sending the encapsulated key to the network, encrypted under the client’s handshake key.</p></li></ul><p>The actions recorded in this rule are:</p><ul><li><p>First, we record that the client with thread id <code>tid</code> reached the state <code>C2d</code>.</p></li><li><p>Second and third, we record that the client was running the protocol with various intermediate values. We use the phrase “running with” to indicate that although the client believes these values to be correct, it can’t yet be certain that they haven’t been tampered with, so it hasn’t yet committed to them.</p></li><li><p>Finally, we record the parameters we put into the KEM with the <code>KemEncap</code> action.</p></li></ul><p>We modify and add such rules to the TLS 1.3 model, so we can run KEMTLS instead of TLS 1.3. As a sanity check, we need to make sure that the protocol can actually be executed: a protocol that cannot run cannot leak your secrets. We use a <i>reachability lemma</i> to do that:</p>
            <pre><code>lemma exists_C2d:
    exists-trace
    "Ex tid #j. C2d(tid)@#j"</code></pre>
            <p>This was the first lemma that we asked Tamarin to prove. Because we’ve marked this lemma <code>exists-trace</code>, it does not need to hold in all traces (all runs of the protocol); it just needs one. This lemma asks if there exists a trace (<code>exists-trace</code>) where there exists (<code>Ex</code>) a variable <code>tid</code> and a time <code>#j</code> (times are marked with <code>#</code>) at which the action <code>C2d(tid)</code> is recorded. What this captures is that Tamarin could find a branch of the tree where the rule described above was triggered. Thus, we know that our model can be executed, at least as far as <code>C2d</code>.</p>
    <div>
      <h3>Modeling the adversary</h3>
      <a href="#modeling-the-adversary">
        
      </a>
    </div>
    <p>In the symbolic model, all cryptography is perfect: if the adversary does not have a particular key, they cannot perform any deconstructions to, for example, decrypt a message or decapsulate a ciphertext. Although a proof with this default adversary would show the protocol to be secure against, for example, reordering or replaying messages, we want it to be secure against a slightly stronger adversary. Fortunately, we can model this adversary. Let’s see how.</p><p>As an example, we have a rule that honestly generates long-term keys (certificates) for participants:</p>
            <pre><code>rule Register_pk:
  [ Fr(~ltkA) ] 
  --[  GenLtk($A, ~ltkA)  ]-&gt;
  [ !Ltk($A, ~ltkA), 
    !Pk($A, kempk($k, ~ltkA)), 
    Out(kempk($k, ~ltkA))
  ]</code></pre>
            <p>This rule is very similar to the one we saw at the beginning of this blog post, but we’ve tweaked it to generate KEM public keys. It goes as follows: it generates a fresh value and registers it as the actor <code>$A</code>’s long-term private key symbol <code>!Ltk</code> and <code>$A</code>’s public key symbol <code>!Pk</code>, which we use to model our certificate infrastructure. It also sends (<code>Out</code>) the public key to the network so that the adversary has access to it.</p><p>The adversary cannot deconstruct symbols like <code>Ltk</code> without rules to do so. Thus, we provide the adversary with a special <code>Reveal</code> rule that takes the <code>!Ltk</code> fact and reveals the private key:</p>
            <pre><code>rule Reveal_Ltk:
   [ !Ltk($A, ~ltkA) ] --[ RevLtk($A) ]-&gt; [ Out(~ltkA) ]</code></pre>
            <p>Executing this rule registers the <code>RevLtk($A)</code> action, so that we know that <code>$A</code>’s certificate can no longer be trusted after <code>RevLtk</code> occurred.</p>
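<p>The effect of such a reveal on the attacker’s knowledge can be sketched as a small deduction loop in Python (our own toy model of Dolev-Yao-style deduction, not Tamarin’s machinery): the attacker knows every term sent to the network, and can open an encrypted term only once it also knows the key.</p>

```python
# Toy Dolev-Yao deduction: the attacker's knowledge is a set of terms, and
# an encrypted term ("senc", msg, key) can be opened only with a known key.

def close(knowledge):
    """Apply decryption steps until the knowledge set stops growing."""
    grown = True
    while grown:
        grown = False
        for term in list(knowledge):
            if isinstance(term, tuple) and term[0] == "senc":
                _, msg, key = term
                if key in knowledge and msg not in knowledge:
                    knowledge.add(msg)
                    grown = True
    return knowledge

# The network carried a secret encrypted under ltkA; without the key, the
# attacker cannot deduce it.
knowledge = close({("senc", "session_secret", "ltkA"), ("pk", "ltkA")})
print("session_secret" in knowledge)  # → False

# After a Reveal_Ltk-style rule outputs the private key, closure succeeds.
knowledge.add("ltkA")
print("session_secret" in close(knowledge))  # → True
```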
    <div>
      <h3>Writing security lemmas</h3>
      <a href="#writing-security-lemmas">
        
      </a>
    </div>
    <p>KEMTLS, like TLS, is a cryptographic handshake protocol. These protocols have the general goal of generating session keys that we can use to encrypt users’ traffic, preferably as quickly as possible. One thing we might want to prove is that these session keys are secret:</p>
            <pre><code>lemma secret_session_keys [/*snip*/]:
  "All tid actor peer kw kr aas #i.
      SessionKey(tid, actor, peer, &lt;aas, 'auth'&gt;, &lt;kw, kr&gt;)@#i &amp;
      not (Ex #r. RevLtk(peer)@#r &amp; #r &lt; #i) &amp;
      not (Ex tid3 esk #r. RevEKemSk(tid3, peer, esk)@#r &amp; #r &lt; #i) &amp;
      not (Ex tid4 esk #r. RevEKemSk(tid4, actor, esk)@#r &amp; #r &lt; #i)
    ==&gt; not Ex #j. K(kr)@#j"</code></pre>
            <p>This lemma states that if the actor has completed the protocol and the attacker hasn’t used one of their special actions, then the attacker doesn’t know the actor’s read key, <code>kr</code>. We’ll go through the details of the lemma in a moment, but first let’s address some questions you might have about this proof statement.</p><p>The first question that might arise is: “If we are only secure in the case where the attacker doesn’t use their special abilities, then why bother modeling those abilities?” The answer has two parts:</p><ol><li><p>We do not restrict the attacker from using their abilities: they can compromise every key except the ones used by the participants in this session. If they managed to somehow make a different participant use the same ephemeral key, then this lemma wouldn’t hold, and we would not be able to prove it.</p></li><li><p>We allow the attacker to compromise keys used in this session <i>after</i> the session has completed. This means that what we are proving is: an attacker who recorded this session in the past and now has access to the long-term keys (by using their shiny new quantum computer, for example) can’t decrypt what happened in the session. This property is also known as <a href="https://en.wikipedia.org/wiki/Forward_secrecy">forward secrecy</a>.</p></li></ol><p>The second question you might ask is: “Why do we only care about the read key?” We only care about the read key because this lemma is symmetric: it holds for all actors. When a client and server have established a TLS session, the client’s read key is the server’s write key and vice versa. 
Because this lemma applies symmetrically to the client and the server, we prove that the attacker doesn’t know either of those keys.</p><p>Let’s return now to the syntax of this lemma.</p><p>The first line of this lemma is a “For all” statement over seven variables, which means that we are trying to prove that no matter what values these seven variables hold, the rest of the statement is true. These variables are:</p><ul><li><p>the thread id <code>tid</code>,</p></li><li><p>a protocol participant, <code>actor</code>,</p></li><li><p>the person they think they’re talking to, <code>peer</code>,</p></li><li><p>the final read and write keys, <code>kr</code> and <code>kw</code> respectively,</p></li><li><p>the actor’s authentication status, <code>aas</code>,</p></li><li><p>and a time <code>#i</code>.</p></li></ul><p>The next line of the lemma is about the <code>SessionKey</code> action. We record the <code>SessionKey</code> action when the client or the server thinks they have completed the protocol.</p><p>The next lines are about two attacker abilities: <code>RevLtk</code>, as discussed earlier; and <code>RevEKemSk</code>, which the attacker can use to reveal ephemeral secrets. The <code>K(x)</code> action means that the attacker learns (or rather, <code>K</code>nows) <code>x</code>. We then assert that if there does not <code>Ex</code>ist a <code>RevEKemSk</code> or <code>RevLtk</code> action on one of the keys used in the session, then there also does not exist a time at which <code>K(kr)</code> holds, that is, a time at which the attacker learns the read key. <a href="https://en.wikipedia.org/wiki/Q.E.D."><i>Quod erat demonstrandum</i></a>. Let’s run the proofs now.</p>
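<p>The quantifier structure of such a lemma can be read as a predicate over traces. As a hand-rolled Python illustration (not Tamarin, which proves this over <i>all</i> traces, and simplified here to the <code>RevLtk</code> premise only), checking the shape of <code>secret_session_keys</code> against one recorded trace looks like:</p>

```python
# Checking the shape of secret_session_keys over one recorded trace.
# Events are (time, action, args); simplified to the RevLtk premise only.

def session_key_secret(trace):
    for t, action, args in trace:
        if action != "SessionKey":
            continue
        actor, peer, kr = args
        # Premise: the peer's long-term key was not revealed before time t.
        revealed = any(a == "RevLtk" and g == (peer,) and u < t
                       for u, a, g in trace)
        # Conclusion: the attacker never learns the read key kr.
        attacker_knows = any(a == "K" and g == (kr,) for _, a, g in trace)
        if not revealed and attacker_knows:
            return False    # a "bad" run: secrecy violated
    return True

# A benign trace: a session completes and the attacker learns nothing.
good = [(1, "SessionKey", ("client", "server", "kr1"))]
print(session_key_secret(good))  # → True

# A trace where the attacker knows kr1 without any reveal: the lemma fails.
bad = good + [(2, "K", ("kr1",))]
print(session_key_secret(bad))   # → False
```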
    <div>
      <h3>Proving lemmas in Tamarin</h3>
      <a href="#proving-lemmas-in-tamarin">
        
      </a>
    </div>
    <p>Tamarin offers two methods to prove security lemmas: it has an autoprover that can try to find the solution for you, or you can do it manually. Tamarin sometimes has a hard time figuring out what is important for proving a particular security property, so manual effort is occasionally unavoidable.</p><p>The manual prover interface allows you to select what goal Tamarin should pursue step by step. A proof quickly splits into separate branches: in the picture, you see that Tamarin has already been able to prove the branches that are green, leaving us to make a choice for case 1.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/39wLyAIAVaVCmfgpQRrq7o/798666afd187a98809b4706bb8e62faa/pasted-image-0.png" />
            
            </figure><p>Screenshot from the Tamarin user interface, showing a prompt for the next step in a proof. The branches of the proof that are marked green have already been proven.</p><p>Sometimes whilst working in the manual interface, you realize that there are certain subgoals that Tamarin is trying to prove while working on a bigger lemma. By writing what we call a helper lemma we can give Tamarin a shortcut of sorts. Rather than trying to solve all the subgoals of one big lemma, we can split the proof into more digestible chunks. Tamarin can then later reuse these helper lemmas when trying to prove bigger lemmas; much like factoring out functions while programming. Sometimes this even allows us to make lemmas auto-provable. Other times we can extract the kinds of decisions we’re making and heuristics we’re manually applying into a separate “oracle” script: a script that interacts with Tamarin’s prover heuristics on our behalf. This can also automate proving tricky lemmas.</p><p>Once you realize how much easier certain things are to prove with helper lemmas, you can get a bit carried away. However, you quickly find that Tamarin is being “distracted” by one of the helper lemmas and starts going down long chains of irrelevant reasoning. When this happens, you can hide the helper lemma from the main lemma you’re trying to prove, and sometimes that allows the autoprover to figure out the rest.</p><p>Unfortunately, all these strategies require a lot of intuition that is very hard to obtain without spending a lot of time hands-on with Tamarin. Tamarin can sometimes be a bit unclear about what lemmas it’s trying to apply. We had to resort to tricks, like using unique, highly recognizable variable names in lemmas, such that we can reconstruct where a certain goal in the Tamarin interface is coming from.</p><p>While doing this work, auto-proving lemmas has been incredibly helpful. 
Each time you make a tiny change in either a lemma (or any of the lemmas that are reused by it) or in the whole model, you have to re-prove everything. If we needed to put in lots of manual effort each time, this project would be nowhere near done.</p><p>This was demonstrated by two bugs we found in one of the core lemmas of the TLS 1.3 model. It turned out that after completing the proof, some <a href="https://www.cloudflare.com/learning/cloud/how-to-refactor-applications/">refactoring</a> changes were made to the <code>session_key_agreement</code> lemma. These changes seemed innocent, but actually changed the meaning of the lemma, so that it didn’t make sense anymore (the original definition did cover the right security properties, so luckily this doesn’t cause a security problem). Unfortunately, this took a lot of our time to figure out. However, after a huge effort, we’ve done it. We have a proof that <a href="https://github.com/thomwiggers/TLS13Tamarin">KEMTLS achieves its security goals</a>.</p>
    <div>
      <h3>Conclusions</h3>
      <a href="#conclusions">
        
      </a>
    </div>
    <p>Formal methods definitely have a place in the development of security protocols; the development process of TLS 1.3 has really demonstrated this. We think that any proposal for new security protocols should be accompanied by a machine-verified proof of its security properties. Furthermore, because many protocols are currently specified in natural language, formal specification languages should definitely be under consideration. Natural language is inherently ambiguous, and the inevitable differences in interpretation that come from that lead to all kinds of problems.</p><p>However, this work cannot be done by academics alone. Many protocols come out of industry, which will need to do this work for itself. We would be the first to admit that the usability of these tools for non-experts is not all the way there yet — and industry and academia should collaborate on making these tools more accessible for everyone. We welcome and look forward to these collaborations in the future!</p>
    <div>
      <h4>References</h4>
      <a href="#references">
        
      </a>
    </div>
    <ul><li><p><a href="https://dl.acm.org/doi/10.1145/2637166.2637237">“Why does cryptographic software fail?: a case study and open problems”</a> by David Lazar, Haogang Chen, Xi Wang, and Nickolai Zeldovich</p></li><li><p><a href="https://eprint.iacr.org/2019/1393.pdf">“SoK: Computer-Aided Cryptography”</a> by Manuel Barbosa, Gilles Barthe, Karthik Bhargavan, Bruno Blanchet, Cas Cremers, Kevin Liao, and Bryan Parno</p></li><li><p><a href="https://eprint.iacr.org/2020/534/">“Post-quantum TLS without handshake signatures”</a> by Peter Schwabe, Douglas Stebila, and Thom Wiggers (*)</p></li><li><p><a href="https://eprint.iacr.org/2021/779/">“More efficient post-quantum KEMTLS with pre-distributed public keys”</a> by Peter Schwabe, Douglas Stebila, and Thom Wiggers (*)</p></li><li><p><a href="https://londmathsoc.onlinelibrary.wiley.com/doi/10.1112/plms/s2-42.1.230">“On computable numbers, with an application to the Entscheidungsproblem”</a> by Alan Turing</p></li><li><p><a href="https://dl.acm.org/doi/pdf/10.1145/3133956.3134063">“A comprehensive symbolic analysis of TLS 1.3”</a> by Cas Cremers, Marko Horvat, Jonathan Hoyland, Sam Scott, and Thyla van der Merwe (*)</p></li><li><p><a href="https://www.cs.cmu.edu/afs/cs/academic/class/17654-f01/www/refs/BAN.pdf">“A Logic of Authentication”</a> by Michael Burrows, Martin Abadi, and Roger Michael Needham</p></li></ul><p>* The authors of this blog post were authors on these papers.</p><hr /><p><sup>1</sup>Of course, trying every value isn’t technically impossible, just infeasible, so we make the simplifying assumption that it’s impossible, and just say our proof only applies if the attacker can’t simply try every value. Other styles of proof that don’t make that assumption are possible, but we’re not going to go into them.</p><p><sup>2</sup>For simplicity, this representation assumes that the public portion of a key pair can be derived from the private part, which may not be true in practice. 
Usually this simplification won’t affect the analysis.</p> ]]></content:encoded>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Cryptography]]></category>
            <category><![CDATA[Post-Quantum]]></category>
            <guid isPermaLink="false">7erNSwLGj4BHA9bhRf2HOq</guid>
            <dc:creator>Thom Wiggers</dc:creator>
            <dc:creator>Jonathan Hoyland</dc:creator>
        </item>
        <item>
            <title><![CDATA[Making protocols post-quantum]]></title>
            <link>https://blog.cloudflare.com/making-protocols-post-quantum/</link>
            <pubDate>Wed, 23 Feb 2022 13:59:42 GMT</pubDate>
            <description><![CDATA[ Post-quantum key exchange and signature algorithms come with different trade-offs than the ones we’re familiar with. How do we handle that when updating protocols, and is this an opportunity to revisit the status quo? ]]></description>
            <content:encoded><![CDATA[ <p>Ever since the (public) invention of cryptography based on mathematical trap-doors by <a href="https://ieeexplore.ieee.org/document/1055638">Whitfield Diffie, Martin Hellman</a>, and Ralph Merkle, the world has had key agreement and signature schemes based on discrete logarithms. <a href="https://patents.google.com/patent/US4405829">Rivest, Shamir, and Adleman</a> invented integer factorization-based signature and encryption schemes a few years later. The core idea, which has perhaps changed the world in ways that are hard to comprehend, is that of public key cryptography. We can give you a piece of information that is completely public (the <i>public key</i>), known to all our adversaries, and yet we can still securely communicate as long as we do not reveal our piece of extra information (the <i>private key</i>). With the private key, we can then efficiently solve mathematical problems that, without the secret information, would be practically unsolvable.</p><p>In later decades, there were advancements in our understanding of integer factorization that required us to bump up the key sizes for finite-field-based schemes. The cryptographic community largely solved that problem by figuring out how to base the same schemes on elliptic curves. The world has since then grown accustomed to having algorithms where public keys, secret keys, and signatures are just a handful of bytes and the speed is measured in the tens of microseconds. This allowed cryptography to be bolted onto previously insecure protocols such as HTTP or <a href="https://www.cloudflare.com/learning/dns/what-is-dns/">DNS</a> without much overhead in either time or the data transmitted. 
We previously wrote about <a href="/sizing-up-post-quantum-signatures/">how TLS loves small signatures</a>; similar things can probably be said for a lot of present-day protocols.</p><p>But this blog has “post-quantum” in the title; quantum computers are likely to make our cryptographic lives significantly harder by undermining many of the assurances we previously took for granted. The old schemes are no longer secure because new algorithms can efficiently solve their particular mathematical trapdoors. We, together with everyone on the Internet, will need to swap them out. There are whole suites of <a href="https://www.cloudflare.com/learning/ssl/quantum/what-is-post-quantum-cryptography/">quantum-resistant replacement algorithms</a>; however, right now it seems that we need to choose between “fast” and “small”. The new alternatives also do not always have the same properties that we have based some protocols on.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3DEjVrUy9g5YErrYBWnxwh/398bd862409ed0d2c8f880c0d2e1e4e6/image4-13.png" />
            
            </figure><p>Fast or small: Cloudflare <a href="/the-tls-post-quantum-experiment/">previously experimented</a> with NTRU-HRSS (a fast key exchange scheme with large public keys and ciphertexts) and SIKE (a scheme with very small public keys and ciphertexts, but much slower algorithm operations).</p><p>In this blog post, we will discuss how one might upgrade cryptographic protocols to make them secure against quantum computers. We will focus on the cryptography that they use and see what the challenges are in making them secure against quantum computers. We will show how trade-offs might motivate completely new protocols for some applications. We will use TLS here as a stand-in for other protocols, as it is one of the most deployed protocols.</p>
    <div>
      <h3>Making TLS post-quantum</h3>
      <a href="#making-tls-post-quantum">
        
      </a>
    </div>
    <p>TLS, of SSL and HTTPS fame, gets discussed a lot, so we will keep it brief here. TLS 1.3 consists of an ephemeral Elliptic-curve Diffie-Hellman (ECDH) key exchange that is authenticated by a digital signature, which is verified using a public key provided by the server in a certificate. We know that this public key is the right one because the certificate contains another signature by the issuer of the certificate, and our client has a repository of valid issuer (“certificate authority”) public keys that it can use to verify the authenticity of the server’s certificate.</p><p>In principle, TLS can become post-quantum straightforwardly: we just write “PQ” in front of the algorithms. We replace the ECDH key exchange by post-quantum (PQ) key exchange provided by a post-quantum Key Encapsulation Mechanism (KEM). For the signatures on the handshake, we just use a post-quantum signature scheme instead of an elliptic curve or RSA signature scheme. No big changes to the actual “arrows” of the protocol are necessary, which is super convenient because we don’t need to revisit our security proofs. Mission accomplished, cake for everyone, right?</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7CWLpWVAzCcJ5cmwcFneEt/118fb95c11ef060e250545f9fd2962e1/image3-22.png" />
            
            </figure><p>Upgrading the cryptography in TLS seems as easy as scribbling in “PQ-”.</p>
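<p>To make the “signed key exchange” shape described above concrete, here is a toy sketch in Python. It is not real TLS and not post-quantum: a small Diffie-Hellman group stands in for ECDH, and textbook RSA with tiny demo parameters stands in for the certificate key. Real implementations use vetted cryptographic libraries and much larger parameters.</p>

```python
# Toy sketch of a signed ephemeral key exchange (NOT real TLS, NOT post-quantum).
import hashlib
import secrets

P = 2**127 - 1  # a Mersenne prime; a toy Diffie-Hellman group
G = 3

# Server's long-term "certificate" key pair: the classic textbook RSA numbers.
N, E, D = 3233, 17, 2753  # n = 61 * 53, e * d = 1 (mod phi(n))

def sign(message: bytes) -> int:
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % N
    return pow(digest, D, N)

def verify(message: bytes, signature: int) -> bool:
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % N
    return pow(signature, E, N) == digest

# Each side generates an ephemeral key share.
client_priv = secrets.randbelow(P - 2) + 1
server_priv = secrets.randbelow(P - 2) + 1
client_share = pow(G, client_priv, P)
server_share = pow(G, server_priv, P)

# The server signs the handshake transcript with its certified key,
# and the client verifies it before trusting the derived shared secret.
transcript = client_share.to_bytes(16, "big") + server_share.to_bytes(16, "big")
handshake_sig = sign(transcript)

assert verify(transcript, handshake_sig)
assert pow(server_share, client_priv, P) == pow(client_share, server_priv, P)
```

<p>Swapping in post-quantum algorithms keeps exactly this shape: the DH shares become a KEM public key and ciphertext, and the signature scheme becomes a post-quantum one.</p>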
    <div>
      <h3>Key exchange</h3>
      <a href="#key-exchange">
        
      </a>
    </div>
    <p>Of course, it’s not so simple. There are nine different suites of post-quantum key exchange algorithms still in the running in round three of the NIST Post-Quantum standardization project: Kyber, SABER, NTRU, and Classic McEliece (the “finalists”); and SIKE, BIKE, FrodoKEM, HQC, and NTRU Prime (“alternates”). These schemes have wildly different characteristics. This means that for step one, replacing the key exchange by post-quantum key exchange, we need to understand the differences between these schemes and decide which one fits best in TLS. Because we’re doing ephemeral key exchange, we consider the size of the public key and the ciphertext since they need to be transmitted for every handshake. We also consider the “speed” of the key generation, encapsulation, and decapsulation operations, because these will affect how many servers we will need to handle these connections.</p><p><i><b>Table 1: Post-Quantum KEMs at security level comparable with AES128. Sizes in bytes.</b></i></p><table><tr><td><p><b>Scheme</b></p></td><td><p><b>Transmission size (pk+ct)</b></p></td><td><p><b>Speed of operations</b></p></td></tr><tr><td><p><i>Finalists</i></p></td><td><p></p></td><td><p></p></td></tr><tr><td><p>Kyber512</p></td><td><p>1,632</p></td><td><p>Very fast</p></td></tr><tr><td><p>NTRU-HPS-2048-509</p></td><td><p>1,398</p></td><td><p>Very fast</p></td></tr><tr><td><p>SABER (LightSABER)</p></td><td><p>1,408</p></td><td><p>Very fast</p></td></tr><tr><td><p>Classic McEliece</p></td><td><p>261,248</p></td><td><p>Very slow</p></td></tr><tr><td><p><i>Alternate candidates</i></p></td><td><p></p></td><td><p></p></td></tr><tr><td><p>SIKEp434</p></td><td><p>676</p></td><td><p>Slow</p></td></tr><tr><td><p>NTRU Prime (ntrulpr)</p></td><td><p>1,922</p></td><td><p>Very fast</p></td></tr><tr><td><p>NTRU Prime 
(sntru)</p></td><td><p>1,891</p></td><td><p>Fast</p></td></tr><tr><td><p>BIKE</p></td><td><p>5,892</p></td><td><p>Slow</p></td></tr><tr><td><p>HQC</p></td><td><p>6,730</p></td><td><p>Reasonable</p></td></tr><tr><td><p>FrodoKEM</p></td><td><p>19,336</p></td><td><p>Reasonable</p></td></tr></table><p>Fortunately, once we make this table, the landscape of KEMs suitable for use in TLS quickly becomes clear. We will have to sacrifice an additional 1,400 - 2,000 bytes, assuming SIKE’s slow runtime is a bigger penalty to connection establishment than those extra bytes (<a href="/the-tls-post-quantum-experiment/">see our previous work here</a>). So we can choose one of the lattice-based finalists (Kyber, SABER, or NTRU) and call it a day.<sup>1</sup></p>
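<p>As a rough illustration of the bandwidth cost, the following sketch compares the transmission sizes (public key plus ciphertext) of the lattice finalists from Table 1 against classical X25519, which sends a 32-byte key share in each direction. The numbers are simply copied from the table above.</p>

```python
# Per-handshake key-exchange bytes on the wire, using pk + ct sizes from Table 1.
X25519_BYTES = 32 + 32  # one 32-byte key share in each direction

kem_bytes = {  # lattice-based finalists, transmission size in bytes
    "Kyber512": 1632,
    "NTRU-HPS-2048-509": 1398,
    "LightSABER": 1408,
}

for name, size in kem_bytes.items():
    print(f"{name}: {size} bytes (+{size - X25519_BYTES} over X25519)")
```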
    <div>
      <h3>Signature schemes</h3>
      <a href="#signature-schemes">
        
      </a>
    </div>
    <p>For our post-quantum signature scheme, we can draw a similar table. In TLS, we generally care about the sizes of public keys and signatures. In terms of runtime, we care about signing and verification times, as key generation is only done once for each certificate, offline. The round three candidates for signature schemes are: Dilithium, Falcon, Rainbow (the three finalists), and SPHINCS+, Picnic, and GeMSS.</p><p><i><b>Table 2: Post-Quantum signature schemes at security level comparable with AES128 (or smallest parameter set). Sizes in bytes.</b></i></p><table><tr><td><p><b>Scheme</b></p></td><td><p><b>Public key size</b></p></td><td><p><b>Signature size</b></p></td><td><p><b>Speed of operations</b></p></td></tr><tr><td><p><i>Finalists</i></p></td><td><p></p></td><td><p></p></td><td><p></p></td></tr><tr><td><p>Dilithium2</p></td><td><p>1,312</p></td><td><p>2,420</p></td><td><p>Very fast</p></td></tr><tr><td><p>Falcon-512</p></td><td><p>897</p></td><td><p>690</p></td><td><p>Fast if you have the right hardware</p></td></tr><tr><td><p>Rainbow-I-CZ</p></td><td><p>103,648</p></td><td><p>66</p></td><td><p>Fast</p></td></tr><tr><td><p><i>Alternate Candidates</i></p></td><td><p></p></td><td><p></p></td><td><p></p></td></tr><tr><td><p>SPHINCS+-128f</p></td><td><p>32</p></td><td><p>17,088</p></td><td><p>Slow</p></td></tr><tr><td><p>SPHINCS+-128s</p></td><td><p>32</p></td><td><p>7,856</p></td><td><p>Very slow</p></td></tr><tr><td><p>GeMSS-128</p></td><td><p>352,188</p></td><td><p>33</p></td><td><p>Very slow</p></td></tr><tr><td><p>Picnic3</p></td><td><p>35</p></td><td><p>14,612</p></td><td><p>Very slow</p></td></tr></table><p>There are many signatures in a TLS handshake. 
Aside from the handshake signature that the server creates to authenticate the handshake (with its public key in the server certificate), there are signatures on the certificate chain (with public keys for intermediate certificates), as well as OCSP Stapling (1) and Certificate Transparency (2) signatures (without public keys).</p><p>This means that if we used Dilithium for all of these, we would require 17KB of public keys and signatures. Falcon is looking very attractive here, only requiring 6KB, but it might not run fast enough on embedded devices that don’t have special hardware to accelerate 64-bit floating point computations in constant time. SPHINCS+, GeMSS, and Rainbow each have significant deployment challenges, so it seems that there is no one-scheme-fits-all solution.</p><p>Picking and choosing specific algorithms for particular use cases, such as using a scheme with short signatures for root certificates, OCSP Stapling, and CT, might help to alleviate the problems a bit. We might use Rainbow for the CA root, OCSP staples, and CT logs, which would mean we only need 66 bytes for each signature. It is very nice that Rainbow signatures are only slightly larger than 64-byte ed25519 elliptic curve signatures, and they are significantly smaller than 256-byte RSA-2048 signatures. This gives us a lot of space to absorb the impact of the larger handshake signatures required. For intermediate certificates, where both the public key and the signature are transmitted, we might use Falcon because it’s nice and small, and the client only needs to do signature verification.</p>
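<p>The 17KB and 6KB figures above can be checked with a back-of-the-envelope count. The sketch below assumes two transmitted public keys (leaf and intermediate certificate) and six signatures (the handshake signature, two certificate signatures, one OCSP staple, and two CT SCTs), with sizes taken from Table 2.</p>

```python
# Rough total of authentication bytes in a TLS handshake: two transmitted
# public keys plus six signatures (handshake, two certificates, one OCSP
# staple, two Certificate Transparency SCTs). Sizes in bytes, from Table 2.
N_PUBLIC_KEYS = 2
N_SIGNATURES = 6

schemes = {  # name: (public key bytes, signature bytes)
    "Dilithium2": (1312, 2420),
    "Falcon-512": (897, 690),
    "Ed25519 (classical baseline)": (32, 64),
}

for name, (pk, sig) in schemes.items():
    total = N_PUBLIC_KEYS * pk + N_SIGNATURES * sig
    print(f"{name}: {total:,} bytes")
```

<p>This yields roughly 17KB for Dilithium2 and 6KB for Falcon-512, against well under 1KB for the classical baseline.</p>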
    <div>
      <h3>Using KEMs for authentication</h3>
      <a href="#using-kems-for-authentication">
        
      </a>
    </div>
    <p>In the pre-quantum world, key exchange and signature schemes used to be roughly equivalent in terms of work required or bytes transmitted. As we saw in the previous section, this doesn’t hold up in the post-quantum world. This means that this might be a good opportunity to also investigate alternatives to the classic “signed key exchange” model. Deploying significant changes to an existing protocol might be harder than just swapping out primitives, but we might gain better characteristics. We will look at such a proposed redesign for TLS here.</p><p>The idea is to use key exchange not just for confidentiality, but also for authentication. This uses the following idea: what a signature in a protocol like TLS is actually proving is that the person signing has possession of the secret key that corresponds to the public key that’s in the certificate. But we can also do this with a key exchange key by showing you can derive the same shared secret (if you want to prove this explicitly, you might do so by computing a <a href="https://en.wikipedia.org/wiki/Message_authentication_code">Message Authentication Code</a> using the established shared secret).</p><p>This isn’t new; many modern cryptographic protocols, such as the Signal messaging protocol, have used such mechanisms. They offer privacy benefits like (offline) <a href="https://en.wikipedia.org/wiki/Deniable_authentication">deniability</a>. But now we might also use this to obtain a faster or “smaller” protocol.</p><p>However, this does not come for free. Because authentication via key exchange (via KEM at least) inherently requires two participants to exchange keys, we need to send more messages. In TLS, this means that the server that wants to authenticate first needs to give the client their public key. 
The client obviously cannot encapsulate a shared secret against a public key it does not yet have.</p><p>We also still need to verify the signatures on the certificate chain, and the signatures for OCSP stapling and Certificate Transparency remain necessary. Because we need to do “offline” verification for those elements of the handshake, it is hard to get rid of those signatures. So we will still need to look carefully at those signatures and pick an algorithm that fits there.</p>
            <pre><code>   Client                                  Server
 ClientHello         --------&gt;
                     &lt;--------         ServerHello
                                             &lt;...&gt;
                     &lt;--------       &lt;Certificate&gt;  ^
 &lt;KEMEncapsulation&gt;                                 | Auth
 {Finished}          --------&gt;                      |
 [Application Data]  --------&gt;                      |
                     &lt;--------          {Finished}  v
 [Application Data]  &lt;-------&gt;  [Application Data]

&lt;msg&gt;: encrypted w/ keys derived from ephemeral KEX (HS)
{msg}: encrypted w/ keys derived from HS+KEM (AHS)
[msg]: encrypted w/ traffic keys derived from AHS (MS)

</code></pre>
            <p>Authentication via KEM in TLS from the AuthKEM proposal</p><p>If we put the necessary arrows to authenticate via KEM into TLS, it looks something like Figure 2. This is actually a fully-fledged proposal for an alternative to the usual TLS handshake. The academic proposal KEMTLS was published at the ACM CCS conference in 2020; a proposal to integrate this into TLS 1.3 is described in the <code>draft-celi-wiggers-tls-authkem</code> draft RFC.</p><p>What this proposal illustrates is that the transition to post-quantum cryptography might motivate, or even require, us to take a brand-new look at what the desired characteristics of our protocol are and what properties we need, like what budget we have for round-trips versus the budget for data transmitted. We might even pick up some properties, like deniability, along the way. For some protocols this is somewhat easy, like TLS; in other protocols there isn’t even a clear idea of where to start (<a href="https://www.cloudflare.com/learning/dns/dnssec/ecdsa-and-dnssec/">DNSSEC</a> has very tight limits).</p>
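<p>To illustrate the mechanism (not the AuthKEM wire format), here is a toy Python sketch of authentication via KEM. It uses a classical Diffie-Hellman-based KEM, which is not post-quantum; the point is only that a correct “Finished” MAC implicitly proves possession of the certified private key, because only that key can decapsulate.</p>

```python
# Toy KEM built from Diffie-Hellman (NOT post-quantum) to illustrate
# implicit authentication: only the holder of the certified private key
# can decapsulate, so a valid MAC over the transcript proves possession.
import hashlib
import hmac
import secrets

P = 2**127 - 1  # a Mersenne prime; a toy group
G = 3

def keygen():
    sk = secrets.randbelow(P - 2) + 1
    return sk, pow(G, sk, P)

def encapsulate(pk: int):
    """ElGamal-style encapsulation: returns (ciphertext, shared secret)."""
    r = secrets.randbelow(P - 2) + 1
    ct = pow(G, r, P)
    ss = hashlib.sha256(pow(pk, r, P).to_bytes(16, "big")).digest()
    return ct, ss

def decapsulate(sk: int, ct: int) -> bytes:
    return hashlib.sha256(pow(ct, sk, P).to_bytes(16, "big")).digest()

# The server's long-term KEM key pair; the public key sits in its certificate.
server_sk, server_pk = keygen()

# Client: encapsulate against the certified public key and send the ciphertext.
ct, client_ss = encapsulate(server_pk)

# Server: only server_sk can recover the shared secret; its "Finished" MAC
# over the transcript therefore implicitly authenticates it.
server_ss = decapsulate(server_sk, ct)
finished = hmac.new(server_ss, b"transcript", "sha256").digest()

expected = hmac.new(client_ss, b"transcript", "sha256").digest()
assert hmac.compare_digest(finished, expected)
```

<p>In AuthKEM, this encapsulation replaces the handshake signature, while the signatures from certificate authorities, OCSP, and Certificate Transparency remain.</p>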
    <div>
      <h3>Conclusions</h3>
      <a href="#conclusions">
        
      </a>
    </div>
    <p>We should not wait until NIST has finished standardizing post-quantum key exchange and signature schemes before thinking about whether our protocols are ready for the post-quantum world. For our current protocols, we should investigate how the proposed post-quantum key exchange and signature schemes can be fitted in. At the same time, we might use this opportunity for careful protocol redesigns, especially if the constraints are so tight that it is not easy to fit in post-quantum cryptography. Cloudflare is participating in the IETF and working with partners in both academia and the industry to investigate the impact of post-quantum cryptography and make the transition as easy as possible.</p><p>If you want to be a part of the future of cryptography on the Internet, either as an academic or an engineer, be sure to check out our <a href="https://research.cloudflare.com/outreach/">academic outreach</a> or <a href="https://www.cloudflare.com/careers/jobs/">jobs</a> pages.</p>
    <div>
      <h3>Reference</h3>
      <a href="#reference">
        
      </a>
    </div>
    <ul><li><p>“Post-Quantum TLS without Handshake Signatures” by Peter Schwabe, Douglas Stebila, and Thom Wiggers: <a href="https://eprint.iacr.org/2020/534/">https://eprint.iacr.org/2020/534/</a></p></li></ul><hr /><p><sup>1</sup>Of course, it’s not so simple. The performance measurements were done on a beefy MacBook, using AVX2 intrinsics. For stuff like IoT (yes, your smart washing machine will also need to go post-quantum) or a smart card, you probably want to add another few columns to this table before making a choice, such as code size, side channel considerations, power consumption, and execution time on your platform.</p> ]]></content:encoded>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Cryptography]]></category>
            <category><![CDATA[Post-Quantum]]></category>
            <guid isPermaLink="false">1GCkhPqo2EcYKHgs43NWvA</guid>
            <dc:creator>Thom Wiggers</dc:creator>
        </item>
        <item>
            <title><![CDATA[KEMTLS: Post-quantum TLS without signatures]]></title>
            <link>https://blog.cloudflare.com/kemtls-post-quantum-tls-without-signatures/</link>
            <pubDate>Fri, 15 Jan 2021 12:00:00 GMT</pubDate>
            <description><![CDATA[ The TLS 1.3 protocol has been around for quite some time, but it will be broken once quantum computers arrive. What can we do? In this blog post, we will examine a technique for achieving full post-quantum security for TLS 1.3 in the face of quantum computers: KEMTLS. ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/TYxahpsj66q7L2bERLIZa/537157a1eecaedcbd187f9b1d391523d/KEMTLS--Post-quantum-TLS-without-signatures-blog-home-2.png" />
            
            </figure><p>The Transport Layer Security protocol (TLS), which secures most Internet connections, has mainly consisted of a key exchange, authenticated by digital signatures, that is used to encrypt data in transport[1]. Even though it has undergone major changes since 1994, when SSL 1.0 was introduced by Netscape, its main mechanism has remained the same. The key exchange was first based on RSA, and later on traditional Diffie-Hellman (DH) and Elliptic-curve Diffie-Hellman (ECDH). The signatures used for authentication have almost always been RSA-based, though in recent years other kinds of signatures have been adopted, mainly <a href="https://www.cloudflare.com/learning/dns/dnssec/ecdsa-and-dnssec/">ECDSA</a> and Ed25519. This recent change to elliptic curve cryptography at both the key exchange and the signature level has resulted in considerable speed and bandwidth benefits in comparison to traditional Diffie-Hellman and RSA.</p><p>TLS is the main protocol that protects the connections we use every day. It’s everywhere: we use it when we buy products online, when we register for a newsletter — when we access any kind of website, IoT device, API for mobile apps and more, really. But with the imminent threat of the arrival of <a href="/securing-the-post-quantum-world/">quantum computers</a> (a threat that seems to be getting closer and closer), we need to reconsider the future of TLS once again. <a href="/the-tls-post-quantum-experiment/">A wide-scale post-quantum experiment</a> was carried out by Cloudflare and Google: two post-quantum key exchanges were integrated into our TLS stack and deployed at our edge servers as well as in Chrome Canary clients. The goal of that experiment was to evaluate the performance and feasibility of deployment of two post-quantum key exchanges in TLS.</p><p>Similar experiments have been proposed for introducing post-quantum algorithms into the TLS handshake itself. 
Unfortunately, it seems infeasible to replace both the key exchange and the signatures with post-quantum primitives, because post-quantum cryptographic primitives are bigger, or slower (or both), than their predecessors. The proposed algorithms under consideration in the <a href="https://csrc.nist.gov/Projects/post-quantum-cryptography/round-3-submissions">NIST post-quantum standardization process</a> use mathematical objects that are larger than the ones used for elliptic curves, traditional Diffie-Hellman, or RSA. As a result, the overall size of public keys, signatures and key exchange material is much bigger than those from elliptic curves, Diffie-Hellman, or RSA.</p><p>How can we solve this problem? How can we use post-quantum algorithms as part of the TLS handshake without making the material too big to be transmitted? In this blog post, we will introduce a new mechanism for making this happen. We’ll explain how it can be integrated into the handshake and we’ll cover implementation details. The key observation in this mechanism is that, while post-quantum algorithms have bigger communication sizes than their predecessors, post-quantum <i>key exchanges</i> have somewhat smaller sizes than post-quantum <i>signatures</i>, so we can try to replace signatures with key exchanges in some places to save space. We will only focus on the TLS 1.3 handshake, as it is the TLS version that should currently be used.</p>
    <div>
      <h3>Past experiments: making the TLS 1.3 handshake post-quantum</h3>
      <a href="#past-experiments-making-the-tls-1-3-handshake-post-quantum">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/589KMSU0NPVnVGNPCiWIXg/b12d643b105956d0c452dbdcb122ce61/image2-5.png" />
            
            </figure><p><a href="https://tools.ietf.org/html/rfc8446">TLS 1.3</a> was introduced in August 2018, and it brought many security and performance improvements (notably, having only one round-trip to complete the handshake). But TLS 1.3 is designed for a world with classical computers, and some of its functionality will be broken by quantum computers when they do arrive.</p><p>The primary goal of TLS 1.3 is to provide authentication (the server side of the channel is always authenticated, the client side is optionally authenticated), confidentiality, and integrity by using a handshake protocol and a record protocol. The handshake protocol, the one of interest for us today, establishes the cryptographic parameters for securing and authenticating a connection. It can be thought of as having three main phases, as defined in <a href="https://tools.ietf.org/html/rfc8446">RFC8446</a>:</p><p>-  The <b>Parameter Negotiation</b> phase (referred to as ‘Server Parameters’ in RFC8446), which establishes other handshake parameters (whether the client is authenticated, application-layer protocol support, etc.).</p><p>-  The <b>Key Exchange</b> phase, which establishes shared keying material and selects the cryptographic parameters to be used. Everything after this phase will be encrypted.</p><p>-  The <b>Authentication</b> phase, which authenticates the server (and, optionally, the client) and provides key confirmation and handshake integrity.</p><p>The main idea of past experiments that introduced post-quantum algorithms into the handshake of TLS 1.3 was to use them in place of classical algorithms by advertising them as part of the <a href="https://tools.ietf.org/html/rfc8446#section-4.2.7">supported groups</a>[2] and <a href="https://tools.ietf.org/html/rfc8446#section-4.2.8">key share</a>[3] extensions, thereby establishing the negotiated connection parameters with them. 
Key encapsulation mechanisms (KEMs) are an abstraction of the basic key exchange primitive, and were used to generate the shared secrets. When using a <a href="https://tools.ietf.org/html/rfc8446#section-4.2.11">pre-shared key</a>, its symmetric algorithms can be easily replaced by post-quantum KEMs as well; and, in the case of password-authenticated TLS, some <a href="https://eprint.iacr.org/2017/1192.pdf">ideas</a> have been proposed on how to use post-quantum algorithms with them.</p><p>Most of the above ideas only provide what is often defined as ‘transitional security’, because their main focus is to provide quantum-resistant confidentiality, and not to take quantum-resistant authentication into account. It is possible to use post-quantum signatures for TLS authentication, but the post-quantum signatures are larger than traditional ones. Furthermore, it is <a href="https://csrc.nist.gov/Presentations/2019/the-2nd-round-of-the-nist-pqc-standardization-proc">worth noting</a> that using post-quantum signatures is much more expensive than using post-quantum KEMs.</p><p>We can estimate the impact of such a replacement on network traffic by simply looking at the sum of the cryptographic objects that are transmitted during the handshake. A typical TLS 1.3 handshake using elliptic curve X25519 and RSA-2048 would transmit 1,376 bytes, which would correspond to the public keys for key exchange, the certificate, the signature of the handshake, and the certificate chain. If we were to replace X25519 by the post-quantum KEM <a href="https://pq-crystals.org/kyber/">Kyber512</a> and RSA by the post-quantum signature <a href="https://pq-crystals.org/dilithium/">Dilithium II</a>, two of the more efficient proposals, the size of the transmitted data would increase to 10,036 bytes[4]. 
The increase is mostly due to the size of the post-quantum signatures.</p><p>The question then is: how can we achieve full post-quantum security with a handshake that is still efficient to use?</p>
    <div>
      <h3>A more efficient proposal: KEMTLS</h3>
      <a href="#a-more-efficient-proposal-kemtls">
        
      </a>
    </div>
    <p>There is a long history of other mechanisms, besides signatures, being used for authentication. Modern protocols, such as the Signal protocol, the Noise framework, or WireGuard, rely on key exchange mechanisms for authentication; but they are unsuitable for the TLS 1.3 case as they expect the long-term key material to be known in advance by the interested parties.</p><p>The <a href="https://eprint.iacr.org/2015/978.pdf">OPTLS proposal</a> by Krawczyk and Wee authenticates the TLS handshake without signatures by using a non-interactive key exchange (NIKE). However, the only somewhat efficient construction for a post-quantum NIKE is CSIDH, the security of which is the subject of an ongoing debate. But we can build on this idea, and use KEMs for authentication. KEMTLS, the current proposed experiment, replaces the handshake signature by a post-quantum KEM key exchange. It was designed and introduced by Peter Schwabe, Douglas Stebila and Thom Wiggers in the publication <a href="https://thomwiggers.nl/publication/kemtls/kemtls.pdf">‘Post-Quantum TLS Without Handshake Signatures’</a>.</p><p>KEMTLS, therefore, achieves the same goals as TLS 1.3 (authentication, confidentiality and integrity) in the face of quantum computers. But there’s one small difference compared to the TLS 1.3 handshake. KEMTLS allows the client to send encrypted application data in the second client-to-server TLS message flow when client authentication is not required, and in the third client-to-server TLS message flow when mutual authentication is required. Note that with TLS 1.3, the server is able to send encrypted and authenticated application data in its first response message (although, in most uses of TLS 1.3, this feature is not actually used). 
With KEMTLS, when client authentication is not required, the client is able to send its first encrypted application data after the same number of handshake round trips as in TLS 1.3.</p><p>Intuitively, the handshake signature in TLS 1.3 proves possession of the private key corresponding to the public key certified in the TLS 1.3 server certificate. For these signature schemes, this is the straightforward way to prove possession; another way to prove possession is through key exchanges. By carefully considering the key derivation sequence, a server can decrypt any messages sent by the client only if it holds the private key corresponding to the certified public key. Therefore, implicit authentication is fulfilled. It is worth noting that KEMTLS still relies on signatures by certificate authorities to authenticate the long-term KEM keys.</p><p>With KEMTLS, application data transmitted during the handshake is implicitly authenticated rather than explicitly (as in TLS 1.3), and has slightly weaker downgrade resilience and forward secrecy; but full downgrade resilience and forward secrecy are achieved once the KEMTLS handshake completes.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/UREaRAF5rjwsKtO4HpZ2I/96dcf49355e3f1f67d88bcb336515dd1/image1-7.png" />
            
            </figure><p>By replacing the handshake signature with a KEM key exchange, we reduce the size of the data transmitted in the example handshake to 8,344 bytes, using Kyber512 and Dilithium II: a significant reduction. Even for algorithms with a less pronounced size gap, such as the NTRU-assumption-based KEM NTRU and the signature algorithm Falcon, the handshake still shrinks. Moreover, KEM operations are typically computationally much lighter than signing operations, which makes the savings even more significant.</p><p>KEMTLS was presented at ACM CCS 2020. You can read more about its details in <a href="https://thomwiggers.nl/publication/kemtls/kemtls.pdf">the paper</a>. It was initially <a href="https://github.com/thomwiggers/kemtls-experiment">implemented in the RustTLS library</a> by Thom Wiggers using optimized C/assembly implementations of the post-quantum algorithms provided by the <a href="https://github.com/PQClean/PQClean">PQClean</a> and <a href="https://openquantumsafe.org/">Open Quantum Safe</a> projects.</p>
    <div>
      <h3>Cloudflare and KEMTLS: the implementation</h3>
      <a href="#cloudflare-and-kemtls-the-implementation">
        
      </a>
    </div>
    <p>As part of our effort to show that TLS can be completely post-quantum safe, we implemented the full KEMTLS handshake in Golang’s TLS 1.3 suite. The implementation was done in several steps:</p><ol><li><p>We first needed to fork our own version of Golang, so we could add different post-quantum algorithms to it. You can find our own version <a href="https://github.com/cloudflare/go/">here</a>. This code gets constantly updated with every release of Golang, following <a href="https://github.com/cloudflare/go/wiki/Starting-out">these steps</a>.</p></li><li><p>We needed to implement post-quantum algorithms in Golang, which we did in our own cryptographic library, <a href="https://github.com/cloudflare/circl/tree/master/kem">CIRCL</a>.</p></li><li><p>As we cannot force certificate authorities to use certificates with long-term post-quantum KEM keys, we decided to use <a href="/keyless-delegation/">Delegated Credentials</a>. A delegated credential is a short-lived key that the certificate’s owner has delegated for use in TLS. It can therefore carry a post-quantum KEM key. See its implementation in our Golang code <a href="https://github.com/cloudflare/go/tree/cf-delegated-credentials">here</a>.</p></li><li><p>We implemented mutual-authentication (client and server) KEMTLS by using <a href="/keyless-delegation/">Delegated Credentials</a> for the authentication process. See its implementation in our Golang code <a href="https://github.com/cloudflare/go/tree/cf-pq-kemtls">here</a>. You can also check its <a href="https://github.com/cloudflare/go/blob/cf-pq-kemtls/src/crypto/tls/delegated_credentials_test.go#L774">test</a> for an overview of how it works.</p></li></ol><p>Implementing KEMTLS was a straightforward process, although it did require changes to the way Golang handles a TLS 1.3 handshake and to how the key schedule works.</p><p>A “regular” TLS 1.3 handshake in Golang (from the server perspective) looks like this:</p>
            <pre><code>func (hs *serverHandshakeStateTLS13) handshake() error {
    c := hs.c

    // For an overview of the TLS 1.3 handshake, see RFC 8446, Section 2.
    if err := hs.processClientHello(); err != nil {
   	 return err
    }
    if err := hs.checkForResumption(); err != nil {
   	 return err
    }
    if err := hs.pickCertificate(); err != nil {
   	 return err
    }
    c.buffering = true
    if err := hs.sendServerParameters(); err != nil {
   	 return err
    }
    if err := hs.sendServerCertificate(); err != nil {
   	 return err
    }
    if err := hs.sendServerFinished(); err != nil {
   	 return err
    }
    // Note that at this point we could start sending application data without
    // waiting for the client's second flight, but the application might not
    // expect the lack of replay protection of the ClientHello parameters.
    if _, err := c.flush(); err != nil {
   	 return err
    }
    if err := hs.readClientCertificate(); err != nil {
   	 return err
    }
    if err := hs.readClientFinished(); err != nil {
   	 return err
    }

    atomic.StoreUint32(&amp;c.handshakeStatus, 1)

    return nil
}</code></pre>
            <p>We had to interrupt the process when the server sends the Certificate (<code>sendServerCertificate()</code>) in order to send the KEMTLS-specific messages. In the same way, we had to add the appropriate KEMTLS messages to the client’s side of the handshake. And, as we didn’t want to change the way Golang handles TLS 1.3 too much, we only added one new constant to the configuration, which a server can use to request the client’s certificate (the constant is <code>serverConfig.ClientAuth = RequestClientKEMCert</code>).</p><p>The implementation is easy to work with: if a delegated credential or a certificate has a public key of a supported post-quantum KEM algorithm, the handshake will proceed with KEMTLS. If the server requests a client KEMTLS certificate, the handshake will use client KEMTLS authentication.</p>
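<p>With the fork in place, opting in is a small configuration change. A sketch of what server setup could look like (this fragment compiles only against our Golang fork, since <code>RequestClientKEMCert</code> does not exist in upstream <code>crypto/tls</code>; <code>serverCert</code> is a placeholder for a certificate chain carrying a KEM delegated credential):</p>

```go
// Configuration fragment; requires the cloudflare/go fork
// (cf-pq-kemtls branch), not upstream Go.
cfg := &tls.Config{
	MinVersion:   tls.VersionTLS13,
	Certificates: []tls.Certificate{serverCert}, // chain with a KEM delegated credential
	// Ask the client for a KEMTLS certificate, enabling mutual KEMTLS
	// authentication; without this, only the server is KEM-authenticated.
	ClientAuth: tls.RequestClientKEMCert,
}
listener, err := tls.Listen("tcp", ":8443", cfg)
```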
    <div>
      <h3>Running the Experiment</h3>
      <a href="#running-the-experiment">
        
      </a>
    </div>
    <p>So, what’s next? We’ll take the code we have produced and run it on actual Cloudflare infrastructure to measure how efficiently it works.</p>
    <div>
      <h3>Thanks</h3>
      <a href="#thanks">
        
      </a>
    </div>
    <p>Many thanks to everyone involved in the project: Chris Wood, Armando Faz-Hernández, Thom Wiggers, Bas Westerbaan, Peter Wu, Peter Schwabe, Goutam Tamvada, Douglas Stebila, Thibault Meunier, and the whole Cloudflare Research team.</p><p><sup>1</sup>It is worth noting that the RSA key transport in TLS ≤1.2 authenticates the server only through RSA public-key encryption, although the server's RSA public key is certified using RSA signatures by Certificate Authorities.</p><p><sup>2</sup>An extension used by the client to indicate which named groups (Elliptic Curve Groups, Finite Field Groups) it supports for key exchange.</p><p><sup>3</sup>An extension which contains the endpoint’s cryptographic parameters.</p><p><sup>4</sup>These numbers, as noted in the paper, are based on the round-2 submissions.</p> ]]></content:encoded>
            <category><![CDATA[Cryptography]]></category>
            <category><![CDATA[TLS]]></category>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Post-Quantum]]></category>
            <guid isPermaLink="false">5jJAu3Wf2WiwWl9zjhkHTV</guid>
            <dc:creator>Sofía Celi</dc:creator>
            <dc:creator>Thom Wiggers</dc:creator>
        </item>
    </channel>
</rss>