IMp免费翻墙网 IMp免费翻墙网 Counter Galois Onion: Improved encryption for Tor circuit trafficIMp免费翻墙网
It's always a good day when we can talk about cryptography. Especially when we are sunsetting one of the oldest and most important encryption algorithms in Tor and replacing it with a research-backed new design, called Counter Galois Onion.IMp免费翻墙网
This overhaul will defend users against a broader class of online attackers (described below), and form the basis for more encryption work in the future.IMp免费翻墙网
Which cryptography are we talking about here?
The algorithm in question is Tor's relay encryption. While Tor uses the standard TLS protocol for communication between relays, and between clients and relays, it needs a specialized algorithm for encrypting user data as it traverses multiple relays in a circuit.1IMp免费翻墙网
That's the relay encryption algorithm. The client shares a symmetric key with each relay on its circuit, and encrypts an outgoing message, or "relay cell" with each one of those keys. Each relay can remove a single layer of encryption, until the client's cell reaches the exit relay.IMp免费翻墙网
Of course, we need to make sure that the data isn't modified on the way from the client. For that, we include a cryptographic digest in the cell. The digest covers not only the cell itself, but also all previous cells sent through the circuit (to prevent re-ordering) and another secret shared key (to make the digest unpredictable).IMp免费翻墙网
So with the digest added, clients behave as follows: they calculate and set the digest, then they use a stream cipher (AES-128-CTR in Tor's case) multiple times to encrypt the cell for each relay.IMp免费翻墙网
If we simplify a lot, then a relay cell looks a little like this:IMp免费翻墙网
Field
Width
Zero
2 bytes
Digest
4 bytes
Other stuff
...
The "zero" field is there so that nodes other than the exit can avoid checking the digest: If a relay gets a cell with a value other than zero, then it can tell that it isn't the recipient of that cell, so it doesn't need to check the digest: it just decrypts the cell and forwards it to the next relay.IMp免费翻墙网
When we designed this algorithm, we didn't give it a name. Now that we're replacing it, it helps to have some way to identify it, so we're calling it "tor1".IMp免费翻墙网
Figure 1. The tor1 encryption algorithm, as used at a middle layer.
The input is a message M. A counter CTR is expanded via a psueodrandom function (PRF, instantiated with AES-128-CTR), to produce a stream of bytes. These bytes are xored with M to produce a ciphertext C.IMp免费翻墙网
Figure 2: The tor1 encryption algorithm, as used to originate a message.
The message M is mixed with the hash state HS, and a portion of the digest is appended to the message. The whole thing is then xored with the PRF (AES-128-CTR) to produce our ciphertext C.IMp免费翻墙网
Wait, that design looks funny!
Yeah, you wouldn't build it that way nowadays, would you? That's why we're replacing it.IMp免费翻墙网
Existing designs weren't suitable for Tor's needs. The original onion routing paper didn't specify a means for authentication. Designs like mixmaster and mixminion were optimized for larger message sizes, and required a separate full digest for every possible layer of encryption. (For example, Mixmaster supported up to 20 remailers, so had to reserve space for 20 digests (and other stuff) in every message.)IMp免费翻墙网
Some of the "weird things" in the current tor1 design are outdated, but not awful; others are things that are more valuable to resolve.IMp免费翻墙网
So, what are the problems with the tor1 design?
There are a few. First the big one:IMp免费翻墙网
Problem 1: Tagging attacks
Tagging attacks enable an active adversary to trace traffic by modifying it in one place on the network, and observing predicatable changes in another. Even when tagging attacks don't succeed immediately, their side effects can give the attacker more and more opportunities to retry.IMp免费翻墙网
This is the most important attack we're solving with CGO. Even without the other problems below, this one would be worth fixing on its own.IMp免费翻墙网
The main version of this attack arises because tor1's use of AES-CTR encryption with no hop-by-hop authentication means that the relay encryption is malleable. Since counter mode derives its ciphertext C by XORing a secret key stream S with the plaintext P (C = S ⊕ P), an attacker who can XOR their own pattern M in to the ciphertext will produce C' = (S ⊕ P) ⊕ M = S ⊕ (P ⊕ M) — that is, a valid encryption of (P ⊕ M).IMp免费翻墙网
An attacker can use this attack to ensure that they control both ends of the circuit. They XOR a pattern onto a cell at one end, and then see if any garbled cells at the other end become clear when whey remove that same pattern. Any circuits with an honest endpoint will fail (and not be deanonymized), but the client will retry them until they eventually choose a malicious endpoint.IMp免费翻墙网
If the attacker chooses a known-plaintext portion of the relay cell for their marker (such as the header or slack zero space), then they can use their marker to communicate an identifier across the circuit, by retrieving it at the end:IMp免费翻墙网
M = (P ⊕ M) ⊕ P.
M can then be used to transmit an IP address or unique identifier for the user.IMp免费翻墙网
In comparison to probabilistic traffic correlation, this attack provides definite results immediately, with a strength multiplier: it also allows the attacker to ensure that all the traffic they successfully carry is fully deanonymized, before the circuit is used for any application traffic at all.IMp免费翻墙网
The downside for the attacker is that the resulting failure rate of circuits can be detected by the client. Currently, Tor clients emit log notices and warnings when circuit failure rates are excessively high. Unfortunately, as vigilant users have noticed, when the DDoS attacks on Tor become severe, these detectors give false alarms.IMp免费翻墙网
This class of attacks (where an adversary is able to abuse the Tor Protocol to transmit information between relays before application activity) is known as Internal Covert Channel attacks. Tor is in the process of updating its threat model to cover these attack vectors explicitly, along with two other categories of attack vectors.IMp免费翻墙网
Problem 2: Forward secrecy begins when a circuit closes
This attack and the one after it are much less severe than the tagging attack above; we mention them for the sake of completeness.IMp免费翻墙网
In many modern online protocols, including messaging apps like Signal, the keys used to decrypt a message are destroyed as soon as the message is decrypted, so that nobody can steal them and use them later on. But Tor's old encryption algorithm (tor1) doesn't provide this property: the same AES keys are used for the entire life of the circuit. That means that if a key was stolen while the circuit was still alive, all previous traffic on the circuit could be decrypted.IMp免费翻墙网
When a circuit's lifetime is on the order of minutes, that's not so bad, but sometimes circuits stay around for days. (What's more, longer-lived circuits may be better for anonymity, especially when the user is maintaining a persistent identity, so it's a good idea to make them stronger.)IMp免费翻墙网
Although this attack is minor in comparison to the tagging issue, we may as well address it while we are updating our encryption.IMp免费翻墙网
Problem 3: A 4-byte authenticator? Seriously?
Yeah, that's not great. The use of a mere 4-byte digest means that there's a one-in-4-billion chance to forge a cell undetected.IMp免费翻墙网
That isn't a very good attack in practice: if the attacker doesn't get lucky with their guess, then their invalid message causes the circuit to fail, and they can't try again unless the client builds another circuit through them. The same pathbias mechanisms that help resist tagging attacks also help here, but it would be better not to need them.IMp免费翻墙网
(Also, it's using SHA-1, which is showing its age, to say the least.2)IMp免费翻墙网
So, how did we think about replacing this in the past?
We've wanted to replace this algorithm a few times, but we've been hung up on issues of design and efficiency.IMp免费翻墙网
We definitely don't want to follow remailer designs by adding one authenticator per layer of encryption: that way lies big overhead. If we tried something like that with onion services, we'd be devoting something like 15% of our bandwidth to authenticator fields.IMp免费翻墙网
One promising design element has been wide-block ciphers: these are ciphers (or modes of using ciphers) that encrypt an entire message as if it were a single opaque block: any change in the ciphertext garbles the entire message as if it were a single block in a regular block cipher.IMp免费翻墙网
You can make a wide-block cipher into an authenticated cipher by reserving some portion of the plaintext for a known value -- say, 16 bytes of zeros.IMp免费翻墙网
But most wide-block cipher designs are comparatively expensive. Nearly all of the strong ones (BEARESS, LIONESS, biIGE, HHFHFH) require two full encryptions and two hashes over the data. Newer modes like HCTR2 require only one encryption pass, but still need two hashes. (For comparison: tor1 requires 3 encryptions and one hash for a cell on a 3-hop circuit, whereas one of these designs requires on the order of 6 encryptions and 6 hashes.)IMp免费翻墙网
We're willing to pay some CPU cost for improved cryptography (and we should expect to pay some cost, since authentication doesn't come for free) but we need to keep the cost to a minimum.IMp免费翻墙网
Now, multiple passes are necessary for any wide-block design: it's provable that there's no way to make sure that changing any bit will potentially garble every other bit unless there are at least two passes. But we'd like to make these passes as cheap as possible!IMp免费翻墙网
There has also been excellent work on other wide-block designs built from scratch, rather than from an underlying cipher (notably AEZ).IMp免费翻墙网
What are we going with?
For years now, cryptographers have been looking for good solutions here.IMp免费翻墙网
Jean Paul Degabriele, Alessandro Melloni, Jean-Pierre Münch, and Martijn Stam have a design that they're calling Counter Galois Onion (CGO). It's based on a kind of construction called a Rugged Pseudorandom Permutation (RPRP): essentially, it's a design for a wide-block cipher that resists malleability in one direction (for the encrypt operation, but not the decrypt operation). If we deploy this so that clients always decrypt and relays always encrypt, then we have a tagging resistant3 cipher at less cost than a full SPRP!IMp免费翻墙网
Using a RPRP that they call UIV+ (see the paper), the authors achieve all of our goals (tagging resistance, immediate forward secrecy, longer authentication tags, limited bandwidth overhead, relatively efficient operation, and modernized cryptography).IMp免费翻墙网
(Shortly before this blog post went up, they revised their paper and released a new security proof.)IMp免费翻墙网
We've written a specification which matches their paper and their reference implementation.IMp免费翻墙网
How does it work?
CGO makes it so that if anybody tampers with any part of your encrypted data, the entire message, and all future messages, become unrecoverable. Here's how!IMp免费翻墙网
(If you don't like reading cipher diagrams, you may want to skip this section. And if you really like reading them, you should check out the specification and the paper!)IMp免费翻墙网
The figures below present the UIV+ building block, and show how it is used to build CGO encryption.IMp免费翻墙网
Figure 3: UIV+ encryption
The input X is split into two parts: A short X_L, and a longer X_R. X_R, and a "tweak" value H, are themselves passed as tweaks to a tweakable block cipher E_T (instantiated with LRW2), which is then used used to encrypt X_L. The output of this encryption seeds a PRF, which is xored into X_R to encrypt it.IMp免费翻墙网
Figure 4: Middle-layer CGO encryption.
CGO treats every message as a 16-byte tag T, and a 493-byte ciphertext C. These are passed as X_L and X_R to the UIV+ encryption algorithm above. The tweak value (H in UIV+) is here called T': each cell's "T" value, after encryption, is taken as the T' for the next cell.IMp免费翻墙网
Figure 5: Originating a CGO message
When _originating_ the message, CGO initializes its tag as a nonce value N. The value of N, _and the encryption keys_, are all transformed using an "Update" algorithm as the message is encrypted. The new N, and the new encryption keys, will be used to encrypt the next cell.IMp免费翻墙网
Okay, but how does all of this cryptography solve the problems we began with?IMp免费翻墙网
First (and most importantly) tagging attacks are prevented by two factors:IMp免费翻墙网
When encrypting4, the message is transformed in a wide-block construction, so that any change to the input renders the entire output unrecoverable.
The chaining of T' and N values means that a message's encryption depends on all previous messages, so if a single message is garbled, all subsequent messages will be unrecoverable.
Second, forward secrecy is achieved with the Update construction in figure 5. Every time a new cell is originated or received, the keys used to originate or receive it are transformed unrecoverably, so that the encryptor/decryptor no longer holds the keys necessary to decrypt earlier cells.IMp免费翻墙网
Third, the truncated digest is now replaced with a nice long 16-byte authenticator, like sensible people use.IMp免费翻墙网
Aside: What if we're wrong?
CGO is a fairly new design, and it's reasonable to ask whether there could be weaknesses in it that would make it worse than it's designed to be. I'd answer: There might be! Attacks only get better with time, and although the cryptographers behind CGO are skilled and well regarded, even the best cryptographers can make mistakes. There is a security proof, but it's fairly recent, and and it hasn't yet gotten intensive scrutiny. With time, as CGO gets attention from more cryptographers, we'll (hopefully) gain more confidence in its strength. (And if we do decide that we need to replace it, the work we've done to add it to Arti and the C Tor implementation will make it much easier to do a later migration to a different system later on.)IMp免费翻墙网
But what we are pretty sure about is that there aren't likely to be any weaknesses in CGO that would make it worse than tor1.IMp免费翻墙网
Where does our implementation stand?
It's underway!IMp免费翻墙网
We've implemented the cryptography for Arti, the Rust Tor implementation. We've also implemented it in C, since it won't do us any good unless relays support it too, and the Arti relay project is still a work in progress.IMp免费翻墙网
In order to build this implementation, we've had to refactor a lot of code to revise its existing assumptions: for example, we've had to revise all the places where we assumed anything about the layout of a relay cell, or where we assumed that there was only one way to do relay encryption. These changes will help us with any other changes to relay cell formatting and encryption in the future.IMp免费翻墙网
Our next steps are:IMp免费翻墙网
Enable CGO by default in Arti. (It's currently marked as experimental because of some of its dependencies.)IMp免费翻墙网
Implement CGO negotiation for onion services. (This feature is likely to be be Arti-only, due to its complexity.)IMp免费翻墙网
Tune the performance for modern CPUs. (The CGO authors got impressively good results for their optimized implementation, but some of the tricks they used will be pretty hard to deliver in the C Tor implementation without big refactoring. Fortunately, there are some low-hanging fruit in optimizing what we have today.)IMp免费翻墙网
Thanks for your help!
Thanks to all the cryptographers, programmers, researchers, and cypherpunks who have contributed to this work over the years. Thanks also to all the researchers who have advanced this work, and the state of modern cryptography; we all stand on the work they have built.IMp免费翻墙网
And thanks to Mike Perry for helping me write this and get the parts of the threat model right.IMp免费翻墙网
And finally, thanks to everybody who has donated to Tor over the years! A lot of this work is done through a grant from the Bureau of Democracy, Human Rights, and Labor, but many critical parts of this work (including years of past groundwork, and all the parts related to onion services) have been paid for out of our unrestricted funds, which rely on donors like you. Thanks for believing in our mission, and thanks for helping to make Tor better! <3IMp免费翻墙网
Why not just use multiple layers of nested TLS? First, because each layer of TLS adds bandwidth overhead, and that overhead adds up fast. Second, because TLS implementations aren't aiming for anonymity, they tend to do things like generate records of different sizes, making it hard to cause traffic to appear uniform. Third, because the TLS protocol exists in many different flavors that client implementations are often distinguishable from one another: we would need to take special care to keep every client on the same TLS implementation, with locked-down flags and options. That would make upgrading and portability highly difficult.↩IMp免费翻墙网
Yeah, the tor1 algorithm uses SHA-1. It was 2002, and OpenSSL didn't have SHA-256 support yet. This isn't as bad as it looks, though: we don't actually need collision resistence here, and the attacker doesn't actually get to see the digest under relevant circumstances, so the regular weaknesses of SHA-1 don't apply in full force.IMp免费翻墙网
IMp免费翻墙网
Nonetheless: we will be quite glad to say goodbye to SHA-1 once tor1 is finally replaced.↩IMp免费翻墙网
Technically, the client can tag their own traffic, but they wouldn't gain anything by doing so.↩IMp免费翻墙网
Remember the RPRP property: encryption resists malleability, but decryption is malleable.↩IMp免费翻墙网