The problems with repeated keystream in stream ciphers

Repeated keystream can sometimes be devastating when using stream ciphers. The Capture the Flag event co-organized by Debricked at Lund University included examples of this problem.

Stream ciphers try to mimic the One Time Pad (OTP), but without the inherent drawbacks of a cipher that requires a key the size of the plaintext. Instead, the stream cipher expands a short key (80-256 bits) to a long sequence through the use of a keystream generator. The keystream generator outputs keystream bits (or words) based on the value of an internal state, a key and an initialization vector (IV). The exact definition of this function varies between stream ciphers and often the key and IV is only used to initialize the internal state. The output then only depends on the current internal state. The goal of an attack could be to either compute the internal state or to recover the key. This assumes knowledge of the keystream, corresponding to a known plaintext attack. Another, weaker assumption, is that the attacker does not know the keystream, but the goal is instead to recover the plaintext given some ciphertext. Repeated keystream falls into this category.

Using and reusing an IV

The purpose of the IV is to allow reuse of the same key for several different messages, i.e., to avoid the keystream to be repeated. Each message is then associated with a new IV, which can be either computed using a counter or be arbitrarily chosen by the sender. How to choose the IV is defined by the protocol, but the main property is that it is assumed to be public knowledge. The key is the only value that is assumed unknown to attackers.

How to include the IV (and key) in the initial state computation in order to avoid attacks has been the subject of much research. However, the importance of the IV being unique for each new keystream that reuses the key is easily understood. If two messages m^0=m^0_0,m^0_1,m^0_2,\ldots and m^1=m^1_0,m^1_1,m^1_2,\ldots are encrypted using the keystream generated by the same key and the same IV, the bits in the two ciphertexts are given by

c^0_i=m^0_i \oplus k_i,
c^1_i=m^1_i \oplus k_i.

Combining the two equations gives

c^0_i \oplus c^1_i = m^0_i \oplus m^1_i.

Thus, if we have information about one of the plaintexts, this leaks information about the second plaintext. In one of the CTF challenges, you were given two ciphertexts, both encrypted using the same keystream. This is equivalent to an OTP encryption where the same “One” Time Pad has been used for two different plaintexts. Assuming that both plaintexts can be distinguished as being plaintext, it is possible to test words in a large dictionary to find the two plaintexts.

IV in RC4 and practical attacks

Modern stream ciphers always include an IV and the design is careful to ensure that the key and IV is properly mixed into the state of the cipher upon initialization. Otherwise, since the IV is public, it could be possible extract information about the initial state, and thus also the keystream.

An important exception to the use of IVs is RC4 which is probably the historically most deployed and used stream cipher. It was designed in 1987, but the design was not publicly disclosed. Still, reverse engineering of the design allowed it to be publicly known in 1994. It was for many years referred to as “alleged RC4”, ARC4 or ARCFOUR since the description that was reverse engineered was not the official description.

RC4 did not include the usage of an IV, so how to include this had to be up to the application or protocol designers. Two examples include the WEP encryption standard for WiFi and TLS. In WEP, the key was simply concatenated with the IV and the result was seen as the key in the initialization algorithm. This turned out to be a devastating design choice when an efficient key recovery attack on WEP was published by Fluhrer, Mantin and Shamir in 2005. The attack was soon followed by even more efficient ones. In TLS, the usage was different. Looking at the specification for TLS 1.2, we find

Note that the MAC is computed before encryption. The stream cipher encrypts the entire block,
including the MAC. For stream ciphers that do not use a synchronization vector (such as RC4), 
the stream cipher state from the end of one record is simply used on the subsequent packet.

The attack on RC4 that was possible on WEP could not be extended to TLS since it only generated one long keystream for each key. Still, RC4 was deprecated in TLS in early 2015 since keystream had been shown to exhibit several nonrandomness properties. This led to several attacks, e.g., the attack by AlFardan et.al. Nonrandomness properties in RC4 has been known for a long time, but as the attacks become increasingly efficient, the cipher should not be used in modern applications.

Write A Comment