This article first appeared in Dutch, simultaneously in Ministry Talks and via the Cybersecurity Research Flanders project. This is an English-language translation of the original publication.

The year is 2015. The message that Mae sent to Annie via Facebook Messenger is intercepted.

“I’m sick of this place,” reads Mark, her boss at Facebook. Mae and Annie were no longer allowed to communicate. Above all, Mark now knows that Mae doesn’t appreciate her job or Facebook’s values and standards very much.

In a parallel universe, we find ourselves in the year 2022. Facebook is now called Meta. In 2016, Meta added end-to-end encryption to Messenger in view of their Commitment to Privacy. With end-to-end encryption, the content of a message is readable only by the sender and the recipient, and no longer by the messaging service. However, the message that Mae just sent to Annie via Messenger is intercepted again. Mark knows that Mae and Annie are communicating; Mae is in trouble again.

Since Edward Snowden’s revelations in 2013, the use of end-to-end encryption has been gaining momentum, because it safeguards our privacy. Despite this, Mae is in trouble in both universes. End-to-end encryption is a good start, but unfortunately, it seems insufficient.

In fact, end-to-end encryption only protects messages from being read in transit. It is therefore a minimum requirement for a private messaging service, just as a letter in the mail travels in a sealed envelope, or just as a message that has to remain within the four walls of your home does not get projected onto your outside walls.

Few messaging apps turn on end-to-end encryption by default. Messenger has been planning to turn that option on by default for a while, but for now, the user has to do that manually, after a search through the options. Even with Telegram you have to manually create a “Secret Chat”, which additionally means you lose a lot of functionality. The top of the class here are Signal and Threema, where end-to-end encryption is mandatory.

To understand why Mae is in trouble, despite end-to-end encryption, we need to look at metadata. Since 2016, Meta, then Facebook, has been making a statement: they are not (or no longer) interested in the content of conversations, because conversations are now optionally end-to-end encrypted. For them, the value of a messaging service lies in knowing the dynamics of the social network: Who talks to whom? When? Where are the individuals when they send those messages? How frequently do they talk? Like no other, Meta knows the value of metadata; they changed their name for a reason. Meta knows all too well how to exploit and valorize this data, and mentions this in their privacy policy:

[W]e combine this information across different devices you use. For example, we use information collected about your use of our Products on your phone to better personalize the content (including ads) or features you see when you use our Products on another device, such as your laptop or tablet, or to measure whether you took an action in response to an ad we showed you on your phone on a different device.
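To make the value of such metadata concrete, here is a minimal Python sketch of what a server could learn from a log of end-to-end encrypted messages without ever decrypting one. The `MessageRecord` schema, the sample log, and the function name are hypothetical illustrations, not a description of any real service's logging.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MessageRecord:
    """What a server can log about an encrypted message (hypothetical schema)."""
    sender: str
    recipient: str
    sent_at: datetime
    ciphertext: bytes  # opaque to the server


def contact_frequencies(log):
    """Count messages per (sender, recipient) pair without reading any content."""
    return Counter((record.sender, record.recipient) for record in log)


# Invented sample log; the ciphertexts are placeholders, not real encryption output.
log = [
    MessageRecord("mae", "annie", datetime(2022, 3, 1, 9, 15), b"\x8f\x02"),
    MessageRecord("annie", "mae", datetime(2022, 3, 1, 9, 17), b"\x11\xa4"),
    MessageRecord("mae", "annie", datetime(2022, 3, 1, 23, 40), b"\x5c\x77"),
]

frequencies = contact_frequencies(log)
late_night = [record for record in log if record.sent_at.hour >= 22]
print(frequencies[("mae", "annie")])  # 2
print(len(late_night))                # 1
```

The ciphertext field is never touched, yet the log already answers who talks to whom, how often, and at what time of day.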

Sealed sending, Signal and zero-knowledge

Their polar opposite is called Signal. Signal rules the Metadata-free Empire. Had Mae used Signal, Mark would not have known that Mae and Annie were still talking. In fact, since 2018, Signal has been using “sealed sending” (Signal itself calls the feature “sealed sender”), where the sender of a message does not identify themselves to the server when sending a message. “Sealed sending” is the technological equivalent of not listing the sender on the envelope. This sounds simple in theory, but is quite complicated in practice. If you allow anyone to reach a digital mailbox anonymously, then you also expose its owner to unwanted messages, and you potentially create an avalanche of spam.

Signal uses a zero-knowledge proof against this. Zero-knowledge proofs have recently become very popular for solving privacy problems in cryptocurrencies such as Monero and Zcash. They allow you to convince someone that a certain proposition is true, without disclosing any other information in the process. In the case of cryptocurrencies, a zero-knowledge proof is used to ensure that an encrypted transaction is sound: the payer proves that their wallet does not go below zero because of that transaction, without disclosing the size of the transaction or the contents of their wallet. In Signal, you prove to the servers that the receiver allows messages to be received from you, without disclosing who you are.
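As an illustration of the general idea, the sketch below implements the Schnorr identification protocol, a textbook zero-knowledge proof in which a prover shows that they know the secret key behind a public key without revealing it. It is not the proof system used by Signal, Monero or Zcash, and the group parameters are toy values chosen for readability; all function names are assumptions made for this example.

```python
import secrets

# Toy parameters, far too small for real use: P = 2*Q + 1 with P and Q prime,
# and G = 4 generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4


def keygen():
    secret = secrets.randbelow(Q - 1) + 1   # x in [1, Q-1]
    return secret, pow(G, secret, P)        # (x, y = G^x mod P)


def commit():
    nonce = secrets.randbelow(Q)            # r, fresh for every proof
    return nonce, pow(G, nonce, P)          # (r, t = G^r mod P)


def challenge():
    return secrets.randbelow(Q)             # c, chosen by the verifier


def respond(secret, nonce, c):
    return (nonce + c * secret) % Q         # s = r + c*x mod Q


def verify(public, t, c, s):
    # Accept if G^s == t * y^c mod P, which holds when s was built from x.
    return pow(G, s, P) == (t * pow(public, c, P)) % P


secret, public = keygen()
nonce, t = commit()                         # prover -> verifier: t
c = challenge()                             # verifier -> prover: c
s = respond(secret, nonce, c)               # prover -> verifier: s
print(verify(public, t, c, s))              # True for an honest prover
```

The verifier only ever sees `t`, `c` and `s`. Because the nonce is random and used once, the response `s` is uniformly distributed and tells an honest verifier nothing about the secret.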

The practical implications of zero-knowledge proofs reach a lot further than private transactions or sending sealed messages. For example, it is possible to design an alternative to the EU Digital COVID certificate (in Belgium, COVID-Safe Ticket, CST), which dynamically generates a QR-code depending on the local rules. This QR-code, unlike the original on the CST, does not contain any information about your vaccination or test status. It only contains a “proof” that indicates you comply with the rules. Moreover, creating such a zero-knowledge CST is possible without any cooperation from the governments involved. This highlights the contrast between the CST and Bluetooth-based contact tracing. The contact tracing application was developed with the explicit intention of collecting and processing as little information as possible, but no comparable effort was made during the development of the digital COVID certificate.

Private group messaging

At the end of 2019, Signal introduced another technological advancement, also made possible in part by zero-knowledge proofs. For sending messages in groups, most messaging services make use of “server fan-out”: the sender sends the message once to the server, and the server takes care of distributing the message to the individual group members. For the server to execute that process correctly, it needs to know the structure of the group. This is not the case for Signal, nor for most other “client fan-out” systems. In these systems, the sender is responsible for sending the message to every group member individually. The disadvantage of this technique is the extra bandwidth the sender needs for every message.
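The difference between the two approaches can be sketched in a few lines of Python. This is a simplified model, assuming a single opaque ciphertext per message; the function names and the group below are hypothetical, and a real client fan-out system would typically encrypt each copy separately for its recipient.

```python
def server_fan_out(sender, members, message):
    """The sender uploads once; the server needs the member list to distribute."""
    uploads = [message]
    deliveries = {m: message for m in members if m != sender}
    return uploads, deliveries


def client_fan_out(sender, members, message):
    """The sender uploads one copy per recipient; the server relays each copy
    to a single address and never handles the member list as a whole."""
    uploads = [(m, message) for m in members if m != sender]
    deliveries = dict(uploads)
    return uploads, deliveries


members = ["mae", "annie", "member_3", "member_4"]  # hypothetical group
server_uploads, server_deliveries = server_fan_out("mae", members, b"<ciphertext>")
client_uploads, client_deliveries = client_fan_out("mae", members, b"<ciphertext>")
print(len(server_uploads), len(client_uploads))  # 1 3
```

Both variants deliver the same messages to the same people. The upload count shows the bandwidth cost of client fan-out, which grows with the size of the group.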

Signal always used client fan-out because of its privacy advantage. The most important argument for server fan-out, apart from its lower bandwidth requirement, is keeping the group structure consistent. In a naive client fan-out system, the group structure is dictated by the last observed update, independent of who supplied that update. You could consider this total anarchy, but only until December 2019.

In order to keep the group structure consistent, Signal now stores the group structure on their server, in encrypted form. Note that Signal still does not see who the members of the group are, but it acts as an authority on the correct member list. A group administrator can now change the list, by adding or removing members. This poses a problem, however: the server needs to be sure that the list is modified by someone with permission, but the users of the group do not want to identify themselves when carrying out such a change. The privacy of the users is guaranteed through a zero-knowledge proof: the user proves that they are in the list of administrators, without disclosing their identity.

The net effect is an anonymous authentication system: users can prove their permission level, without disclosing who they really are, all while the servers are assured that only privileged users can access sensitive data.
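One textbook way to prove “I am one of the administrators” without saying which one is a one-out-of-many proof over Schnorr keys, made non-interactive with a hash. The sketch below is an illustration under that assumption and is not Signal's actual construction. The parameters are toy values, and every function name is hypothetical.

```python
import hashlib
import secrets

# Toy parameters, far too small for real use: P = 2*Q + 1, G has order Q.
P, Q, G = 2039, 1019, 4


def hash_to_challenge(admin_keys, commitments, message):
    data = ",".join(map(str, list(admin_keys) + list(commitments))).encode()
    digest = hashlib.sha256(data + b"|" + message).digest()
    return int.from_bytes(digest, "big") % Q


def prove_membership(admin_keys, my_index, my_secret, message):
    n = len(admin_keys)
    challenges, responses, commitments = [0] * n, [0] * n, [0] * n
    for i, key in enumerate(admin_keys):
        if i == my_index:
            continue
        # Simulate a valid-looking transcript for every key we do not own.
        challenges[i] = secrets.randbelow(Q)
        responses[i] = secrets.randbelow(Q)
        commitments[i] = (pow(G, responses[i], P) * pow(key, Q - challenges[i], P)) % P
    nonce = secrets.randbelow(Q)
    commitments[my_index] = pow(G, nonce, P)
    total = hash_to_challenge(admin_keys, commitments, message)
    # The challenges must add up to the hash, so exactly one of them is forced.
    challenges[my_index] = (total - sum(challenges)) % Q
    responses[my_index] = (nonce + challenges[my_index] * my_secret) % Q
    return challenges, responses


def verify_membership(admin_keys, message, proof):
    challenges, responses = proof
    if not (len(challenges) == len(responses) == len(admin_keys)):
        return False
    # Recompute each commitment as G^s * y^(-c) mod P.
    commitments = [
        (pow(G, s, P) * pow(key, Q - c, P)) % P
        for key, c, s in zip(admin_keys, challenges, responses)
    ]
    return sum(challenges) % Q == hash_to_challenge(admin_keys, commitments, message)


admin_secrets = [secrets.randbelow(Q - 1) + 1 for _ in range(3)]
admin_keys = [pow(G, x, P) for x in admin_secrets]
proof = prove_membership(admin_keys, 1, admin_secrets[1], b"remove member 7")
print(verify_membership(admin_keys, b"remove member 7", proof))  # True
```

The verifier learns that the prover holds one of the three secret keys, but the proof looks the same whichever administrator produced it.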

Other modern cryptographic techniques

Zero-knowledge proofs are only one of the many modern cryptographic advancements that are currently being studied. They are deployed more and more, because they achieve a concrete result without overly compromising the performance of the application. Other techniques are still young, and might have an unacceptable impact on performance. As an example, homomorphic encryption allows computations to be carried out on encrypted data, so that only the computed result needs to be decrypted afterwards. A dream application is to run artificial intelligence in large data centres, without those data centres ever needing access to the actual privacy-sensitive data. This is possible on paper, but still prohibitively expensive in terms of computational cost. For simpler computations, homomorphic encryption is already very practical, and is for instance used in the new Signal group system.
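A simple computation of this kind is addition. The sketch below uses the Paillier cryptosystem, a well-known additively homomorphic scheme, to add two numbers while they stay encrypted. It is meant purely as an illustration: the primes are toy values, the function names are assumptions, and nothing here is claimed to match the scheme Signal uses.

```python
import math
import secrets

# Toy primes, far too small for real use.
P_PRIME, Q_PRIME = 1009, 1013
N = P_PRIME * Q_PRIME
N_SQUARED = N * N
# Private key: LAMBDA = lcm(p - 1, q - 1) and its inverse modulo N.
LAMBDA = (P_PRIME - 1) * (Q_PRIME - 1) // math.gcd(P_PRIME - 1, Q_PRIME - 1)
MU = pow(LAMBDA, -1, N)  # modular inverse, needs Python 3.8+


def encrypt(message):
    """Encrypt an integer 0 <= message < N under the public key N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(1 + N, message, N_SQUARED) * pow(r, N, N_SQUARED)) % N_SQUARED


def decrypt(ciphertext):
    u = pow(ciphertext, LAMBDA, N_SQUARED)
    return ((u - 1) // N * MU) % N


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo N."""
    return (c1 * c2) % N_SQUARED


total = add_encrypted(encrypt(20), encrypt(22))
print(decrypt(total))  # 42
```

Whoever calls `add_encrypted` needs neither the private key nor the plaintexts, which is exactly the property that makes outsourcing a computation possible.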

Another promising technique is “secure multi-party computation” (SMPC or simply MPC). MPC allows two or more parties to keep some values secret, while still being able to compute a common result from those secret values. The classic example is the rich man game, better known as the millionaires’ problem: working out who is the richest among a group of people, without anyone disclosing their net worth. In practice, MPC has been used to compute statistics and run studies across databases that could not be joined because of their privacy-sensitive contents.
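Comparing secret values takes more machinery than fits in a short example, but a joint statistic such as a sum can be sketched with additive secret sharing, one of the basic building blocks of MPC. The code below simulates all parties in a single process and assumes they follow the protocol honestly; the modulus, the function names and the input values are hypothetical.

```python
import secrets

MODULUS = 2**61 - 1  # any modulus larger than the true sum works


def share(value, parties):
    """Split value into random-looking shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    n = len(private_values)
    # Row i holds the shares party i hands out; column j is what party j receives.
    distributed = [share(value, n) for value in private_values]
    # Each party publishes only the sum of the shares it received.
    partial_sums = [sum(row[j] for row in distributed) % MODULUS for j in range(n)]
    return sum(partial_sums) % MODULUS


wealth = [120, 75, 310]    # invented inputs, one per party
print(secure_sum(wealth))  # 505
```

Each individual share is a uniformly random number, so no single party learns another party's input from what it receives; only the published partial sums, and therefore the total, become public.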

With modern cryptography, many modern digital privacy issues can be tackled, not only in messaging services but also far outside that realm. Often, those problems are not visible at the surface.

While the focus of this article was on the privacy-enhancing techniques used in Signal, Signal is of course neither beyond criticism nor unique in improving technology. Newcomers such as Session also innovate in techniques to hide metadata. For an outsider, it is sadly difficult to judge these alternatives. Signal is criticised, rightfully so, for requiring a phone number during registration. This criticism often overshadows the more fundamental, less visible positive aspects of Signal’s engineering. It is however clear that Signal leads many innovations in terms of privacy-enhancing technology. End-to-end encryption is clearly only the start of innovation in privacy-enhancing technology, and the innovations mentioned in this article are most definitely not the end.

Acknowledgement

The writing of this article was supported by Cybersecurity Vlaanderen, under the Cybersecurity: Outreach & Training programme. This article is a companion to the article “Cryptographic techniques for data minimisation” (Nl. “Cryptografische technieken voor dataminimalisatie”, currently untranslated), which dives deeper into the technical details of the cryptographic primitives mentioned in this article.