DES - Standard Block Cipher Systems - A Study of Linear Attacks on Substitution-Permutation Networks

2.3 Standard Block Cipher Systems

2.3.1 DES

The Data Encryption Standard (DES) originates from Lucifer, a cipher developed by IBM, which was later modified and adopted as a block cipher standard in 1977 by NIST (National Institute of Standards and Technology, then called the National Bureau of Standards). DES is a Feistel cipher of the kind described earlier. The block length is 64 bits and the key length is also 64 bits (including 8 parity-check bits, so the effective key length is 56 bits). Figure 3 gives an overview of the DES structure.

Figure 3: DES structure (the plaintext is split into L0 and R0; round i, for i = 1, ..., 16, computes Li = Ri-1 and Ri = Li-1 ⊕ f(Ri-1, Ki); the result passes through IP^-1 to give the ciphertext)

IP and IP^-1 are the initial permutation and its inverse, respectively. Each Li and Ri is 32 bits long. The f function takes Ri-1 and Ki as inputs; it is shown in Figure 4.

Figure 4: The DES f function (Ri-1, 32 bits → E(Ri-1), 48 bits → blocks B1, ..., B8 → S-boxes S1, ..., S8 → outputs C1, ..., C8, 32 bits)

Ri-1 is first expanded to 48 bits by E and then XORed with the 48-bit round key. The result is divided into 8 blocks Bi of 6 bits each. These 8 blocks are fed into the 8 S-boxes, each of which outputs 4 bits. The resulting 8 blocks Ci (32 bits in total) are permuted according to P, and the output of P is the output of the f function.
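The round structure described above can be sketched in a few lines of Python. This is only an illustration of the Feistel skeleton: `toy_f` is a made-up placeholder and not the DES f function (it has no expansion E, S-boxes, or permutation P), IP and IP^-1 are omitted, and the subkeys are assumed to be given rather than produced by the DES key schedule.

```python
MASK32 = 0xFFFFFFFF

def toy_f(right, subkey):
    # Placeholder round function (an assumption for illustration only).
    x = (right ^ subkey) & MASK32
    return ((x * 0x9E3779B1) ^ (x >> 15)) & MASK32

def feistel_encrypt(block64, subkeys, f=toy_f):
    left, right = (block64 >> 32) & MASK32, block64 & MASK32
    for k in subkeys:
        # L_i = R_{i-1},  R_i = L_{i-1} XOR f(R_{i-1}, K_i)
        left, right = right, left ^ f(right, k)
    # Swap the halves at the end so that decryption can reuse this routine.
    return (right << 32) | left

def feistel_decrypt(block64, subkeys, f=toy_f):
    # Same network, subkeys applied in reverse order.
    return feistel_encrypt(block64, list(reversed(subkeys)), f)
```

Note that `f` never has to be inverted: decryption works for any round function, which is the property that lets DES use non-invertible S-boxes.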

The key schedule generates 16 subkeys of 48 bits each from the initial key of 56 bits (64 bits including parity). The schedule is shown in Figure 5. PC-1 and PC-2 are also permutations.

Figure 5: Key scheduling of DES

2.3.2 AES

On January 2, 1997, NIST began the process of choosing a replacement for DES, called the Advanced Encryption Standard (AES). AES requires a block length of 128 bits and support for key lengths of 128, 192, and 256 bits (Nk = 4, 6, 8).

On October 2, 2000, Rijndael [3][4] was selected as the new standard.

Rijndael supports block lengths of 128, 192, and 256 bits (Nb = 4, 6, 8), for which the number of rounds Nr is 10, 12, and 14, respectively, when the key is no longer than the block (in general, Nr = max(Nb, Nk) + 6). All operations in AES are byte-oriented.

The State is the input arranged as a byte array (Figure 6). AES first generates the required subkeys from the initial key using the KeyExpansion algorithm. Then, for the first Nr-1 rounds, it performs the Round function, which consists of ByteSub, ShiftRow, MixColumn, and AddRoundKey. Finally it applies FinalRound, which is the same as Round but without MixColumn. The algorithm is given in Algorithm 2.1 in C-like pseudocode.

KeyExpansion generates the Nr+1 round subkeys from the initial key. The expanded key is a linear array of 4-byte words. The first Nk words contain the cipher key. All other words are defined recursively in terms of words with smaller indices. The algorithm is given in Algorithm 2.2.

Rijndael(State, Key)

Figure 6: AES State representation for Nb=4, 6, 8

Algorithm 2.1: AES algorithm

The round constants are independent of Nk and defined by:

Rcon[i] = (RC[i], ‘00’, ‘00’, ‘00’), where RC[i] is the element of GF(2⁸) with value x^(i-1), so that:

RC[1] = 1 (i.e. ‘01’)

RC[i] = x · RC[i-1] = x^(i-1), where x is ‘02’
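As a small illustration of the recursion for RC[i], the sketch below computes the round constants by repeated multiplication by x in GF(2⁸). It assumes the standard Rijndael reduction polynomial x⁸ + x⁴ + x³ + x + 1 (‘11B’), which is not stated in this section; the function names are chosen here for illustration.

```python
def xtime(b):
    """Multiply a GF(2^8) element by x, reducing by x^8+x^4+x^3+x+1 (0x11B)."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b

def round_constants(count):
    """Return [RC[1], ..., RC[count]] with RC[i] = x^(i-1)."""
    rc, value = [], 0x01
    for _ in range(count):
        rc.append(value)
        value = xtime(value)
    return rc

def rcon(i):
    """Rcon[i] = (RC[i], '00', '00', '00') as a tuple of four bytes."""
    return (round_constants(i)[-1], 0x00, 0x00, 0x00)
```

The first eight constants are plain powers of two; the reduction only takes effect from RC[9] onward.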

RotByte rotates the bytes of a word, i.e., RotByte(B0,B1,B2,B3)=(B1,B2,B3,B0). Round key i is then given by the round-key buffer words W[Nb*i] to W[Nb*(i+1)-1].

For the first Nr-1 rounds, we perform the Round function, which consists of four sub-functions: ByteSub, ShiftRow, MixColumn, and AddRoundKey. FinalRound is the same as Round but without MixColumn. Next, we briefly introduce the sub-functions.

ByteSub replaces each byte by another byte, i.e., it acts as an S-box. The detailed algorithm will be given in Chapter 4.

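Algorithm 2.2:

/* Key expansion for Nk <= 6; W receives Nb*(Nr+1) four-byte words. */
KeyExpansion(byte Key[4*Nk], word W[Nb*(Nr+1)]) {
    /* The first Nk words are copied directly from the cipher key. */
    for (i = 0; i < Nk; i++)
        W[i] = (Key[4*i], Key[4*i+1], Key[4*i+2], Key[4*i+3]);

    /* Every later word is the XOR of the word Nk positions earlier and temp. */
    for (i = Nk; i < Nb * (Nr + 1); i++) {
        temp = W[i - 1];
        if (i % Nk == 0)
            temp = SubByte(RotByte(temp)) ⊕ Rcon[i / Nk];
        W[i] = W[i - Nk] ⊕ temp;
    }
}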

Figure 7: AES ByteSub function

ShiftRow cyclically shifts the rows of the State to the left by the offsets given in Table 1.

Table 1: Shift offsets with different Nb

Nb C1 C2 C3

Figure 8: AES ShiftRow operation

MixColumn replaces a column by a new one formed by multiplying the column with a matrix.

Figure 9: AES MixColumn operation

Finally, AddRoundKey simply XORs the State with the round key.

2.4 Other Block Cipher Systems

Although the previous two ciphers are the most commonly used today, there are other systems that do not belong to either kind. However, they all share one common feature: they use repeated rounds to achieve their security requirements.

2.4.1 RC6

RC6 [28] is a block cipher designed to meet the requirements of AES. The design is based on RC5 and modified to increase security and performance. It has a 128-bit block length and can be seen as an extension of RC5 from 64-bit to 128-bit blocks.

However, instead of using two 64-bit registers, the designers use four 32-bit registers, since the target architectures for AES did not support 64-bit operations efficiently. Like RC5, RC6 makes extensive use of data-dependent rotations. The philosophy of RC5 is to exploit operations (such as rotations) that are efficiently implemented on modern processors. RC6 follows this trend and adds 32-bit integer multiplication, since this operation is now implemented on almost all processors. The advantage of integer multiplication is that it diffuses effectively. RC6 uses it to compute the rotation amounts, so that they depend on all of the bits of another register. Thus RC6 has much faster diffusion than RC5 and achieves higher security with fewer rounds.

A version of RC6 is more accurately specified as RC6-w/r/b, where the word size is w bits, encryption consists of a nonnegative number of rounds r, and b denotes the length of the encryption key in bytes. RC6 uses the following six basic operations:

a + b: integer addition modulo 2^w

a − b: integer subtraction modulo 2^w

a ⊕ b: bitwise exclusive-or of w-bit words

a × b: integer multiplication modulo 2^w

a <<< b: rotate the w-bit word a to the left by the amount given by the least significant lg w bits of b

a >>> b: rotate the w-bit word a to the right by the amount given by the least significant lg w bits of b
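The six operations can be written down directly on Python integers. The sketch below assumes w = 32, the word size used for the AES submission, and the function names are chosen here for illustration; it shows only the primitive operations, not the RC6 round function or key schedule.

```python
W = 32                       # word size w in bits (assumed)
MASK = (1 << W) - 1
LGW = W.bit_length() - 1     # lg w = 5 for w = 32

def add(a, b):
    return (a + b) & MASK    # addition modulo 2^w

def sub(a, b):
    return (a - b) & MASK    # subtraction modulo 2^w

def xor(a, b):
    return a ^ b             # bitwise exclusive-or

def mul(a, b):
    return (a * b) & MASK    # multiplication modulo 2^w

def rotl(a, b):
    s = b & (W - 1)          # only the least significant lg w bits of b count
    return ((a << s) | (a >> (W - s))) & MASK

def rotr(a, b):
    s = b & (W - 1)
    return ((a >> s) | (a << (W - s))) & MASK
```

Because the rotation amount is taken from the low lg w bits of a data word, `rotl(a, 33)` and `rotl(a, 1)` give the same result for w = 32.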

The key schedule is as follows. The user supplies a key of b bytes, where 0 ≦ b ≦ 255. From this key, 2r + 4 words (w bits each) are derived and stored in the array S[0,1,…,2r + 3]. This array is used in both encryption and decryption. The encryption and decryption algorithms are shown in the following figures.

Figure 10. Encryption algorithm with RC6-w/r/b (input: plaintext stored in four w-bit input registers A, B, C, D; number r of rounds; w-bit round keys S[0,1,…,2r + 3])

Figure 11. Decryption algorithm with RC6-w/r/b (input: ciphertext stored in four w-bit input registers A, B, C, D; number r of rounds; w-bit round keys S[0,1,…,2r + 3])

2.4.3 IDEA

IDEA (International Data Encryption Algorithm) [16][17] was developed by Lai in 1991. IDEA is used in PGP (Pretty Good Privacy), a cryptographic system for Internet and e-mail security. Like DES, IDEA has a 64-bit block length; it has 8 rounds and a 128-bit key.

The algorithm is illustrated in Figure 12. The 64-bit plaintext is divided into four 16-bit blocks X1, X2, X3, X4. In each round, six 16-bit subkeys are used, denoted by Ki,1, Ki,2, …, Ki,6 for round i. Since there are 8 rounds, 48 subkeys are used, plus 4 extra subkeys that transform the output after the last round, for 52 subkeys in total. The four output ciphertext blocks are denoted by Y1, Y2, Y3, Y4.

In each round, the 16-bit blocks are XORed, added, and multiplied as the figure shows. The multiplication modulo 2¹⁶+1 can be regarded as the S-box of IDEA. After the last round, each of the resulting 16-bit blocks is combined with its corresponding subkey: the first and fourth blocks by multiplication modulo 2¹⁶+1, the second and third by addition modulo 2¹⁶.
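The multiplication modulo 2¹⁶+1 is the least familiar of the three IDEA operations, so a minimal sketch may help. It follows the convention that the 16-bit value 0 stands for 2¹⁶; the function names are chosen here for illustration, and the inverse is computed by Fermat's little theorem since 2¹⁶+1 is prime.

```python
MODULUS = 0x10001   # 2^16 + 1, a prime

def idea_mul(a, b):
    """Multiply two 16-bit blocks modulo 2^16 + 1, with 0 standing for 2^16."""
    a = a or 0x10000
    b = b or 0x10000
    r = (a * b) % MODULUS    # r is in 1..2^16 because the modulus is prime
    return r & 0xFFFF        # 2^16 is mapped back to the 16-bit value 0

def idea_mul_inv(a):
    """Multiplicative inverse under idea_mul (needed for decryption subkeys)."""
    a = a or 0x10000
    return pow(a, MODULUS - 2, MODULUS) & 0xFFFF
```

With this convention every 16-bit value has an inverse, which is why the operation is a permutation of the 16-bit blocks for any fixed subkey.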

The key schedule is very simple. The initial 128-bit key is divided into 8 blocks of 16 bits, which become K1,1,…,K1,6 and K2,1, K2,2. Then the key is cyclically shifted 25 bits to the left and again divided into 8 blocks, which give the next 8 subkeys. The procedure continues until 52 subkeys are generated.
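The schedule just described can be sketched as follows. The key is assumed to be given as a 128-bit Python integer whose most significant 16 bits form the first block; the function name is chosen here for illustration.

```python
def idea_subkeys(key128):
    """Return the 52 16-bit encryption subkeys derived from a 128-bit key."""
    mask128 = (1 << 128) - 1
    subkeys = []
    while len(subkeys) < 52:
        # Split the current 128-bit value into 8 blocks of 16 bits, leftmost first.
        for i in range(8):
            subkeys.append((key128 >> (112 - 16 * i)) & 0xFFFF)
        # Cyclic left shift by 25 bits before the next batch of 8 subkeys.
        key128 = ((key128 << 25) | (key128 >> 103)) & mask128
    return subkeys[:52]   # the last batch is only partly used
```

The subkeys are then consumed in order: the first six by round 1, the next six by round 2, and so on, with the final four used in the output transformation.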

The decryption algorithm is the same as encryption. The subkeys are used in reverse order with some modifications: each decryption subkey is the inverse of the corresponding encryption subkey, multiplicative for the multiplications and additive for the additions.

Figure 12. The IDEA structure (legend: ⊕ is bit-by-bit XOR; the other two operator symbols denote addition modulo 2¹⁶ and multiplication modulo 2¹⁶+1, where zero corresponds to 2¹⁶)

In this chapter, we introduced several block cipher systems from basic schemes, Feistel Networks and SPNs, to standard systems, DES and AES. In the next chapter, we will start to use linear cryptanalysis to attack the SPNs and use our strategies to attack them more efficiently.

Chapter 3 Linear Cryptanalysis

In this chapter, we introduce linear cryptanalysis, one of the most important attacks on block ciphers. Section 3.1 briefly introduces the concept of Matsui’s attack on DES. Section 3.2 gives the complete procedure of the attack on SPNs. Section 3.3 introduces some improved techniques proposed by other researchers. Section 3.4 presents our new strategies, which find trails with good bias for the attack, and closes with their performance.

3.1 Matsui’s Attack on DES

Matsui and Yamagishi [21] originally developed linear cryptanalysis against the FEAL (Fast Data Encipherment Algorithm) cipher [31] in 1992. In 1994, Matsui modified it and applied it to DES [18] in a theoretical attack on the full 16-round DES, which requires 2⁴⁷ known plaintext-ciphertext pairs and successfully obtains 14 key bits. It has since become one of the most important attacks against block ciphers. In his paper, Matsui introduced two attack algorithms. The first, called Algorithm 1, recovers only one bit of key information. The second, called Algorithm 2, can extract more key bits in one attack.

Algorithm 1:

Step 1: Let T be the number of plaintexts such that the left side of the equation

$$P[i_1, i_2, \ldots, i_a] \oplus C[j_1, j_2, \ldots, j_b] = K[k_1, k_2, \ldots, k_c]$$

is equal to zero.

Step 2: If T>N/2 (N denotes the number of plaintexts),

then guess K[k₁,k₂,...,k_c]=0 (when p>1/2) or 1 (when p<1/2), else guess K[k₁,k₂,...,k_c]=1 (when p>1/2) or 0 (when p<1/2).

Algorithm 2:

Step 1: For each candidate $K_n^{(i)}$ (i = 1, 2, ...) of $K_n$, let $T_i$ be the number of plaintexts such that the left side of the equation

$$P[i_1, i_2, \ldots, i_a] \oplus C[j_1, j_2, \ldots, j_b] \oplus F_n(C_L, K_n^{(i)})[l_1, l_2, \ldots, l_d] = K[k_1, k_2, \ldots, k_c]$$

is equal to zero.

Step 2: Let Tmax be the maximal value and Tmin be the minimal value of all Ti’s.

• If |Tmax − N/2| > |Tmin − N/2|, then adopt the key candidate corresponding to Tmax and guess K[k₁,k₂,...,k_c]=0 (when p>1/2) or 1 (when p<1/2).

• If |Tmax − N/2| < |Tmin − N/2|, then adopt the key candidate corresponding to Tmin and guess K[k₁,k₂,...,k_c]=1 (when p>1/2) or 0 (when p<1/2).
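Step 2 of Algorithm 2 is a pure decision rule on the counters and can be sketched as below. The counters T_i are assumed to have been collected already into a dictionary keyed by candidate; the function name and the tie handling (a tie falls into the Tmin branch) are assumptions made here, since the algorithm does not specify them.

```python
def algorithm2_decision(counts, n, p_above_half=True):
    """Apply Step 2: counts maps each candidate K_n^(i) to its counter T_i.

    Returns (adopted candidate, guessed value of K[k_1, ..., k_c]).
    n is the number of plaintexts N; p_above_half tells whether p > 1/2.
    """
    cand_max = max(counts, key=counts.get)
    cand_min = min(counts, key=counts.get)
    t_max, t_min = counts[cand_max], counts[cand_min]
    if abs(t_max - n / 2) > abs(t_min - n / 2):
        return cand_max, (0 if p_above_half else 1)
    # Tmin deviates more from N/2 (ties also land here, by assumption).
    return cand_min, (1 if p_above_half else 0)
```

For example, with N = 1000 and counters 500, 620, and 480, the candidate with counter 620 is adopted because it deviates most from N/2.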

In the remaining parts of this thesis, we focus on Algorithm 2 since it is much more powerful.

3.2 Linear Cryptanalysis on SPNs

Here we briefly explain how linear cryptanalysis works on SPNs. A detailed introduction is given in [11][33], and Keliher also discusses linear attacks on SPNs in [14]. To apply a linear attack, we need to find a subset of bits whose XOR behaves in a non-random way. First, we introduce a lemma that is useful in linear attacks.

3.2.1 The Piling-up lemma

Suppose X1, X2, … ∈ {0,1} are independent random variables and p1, p2, … are real numbers such that 0 ≦ pi ≦ 1, with Pr[Xi=0]=pi and Pr[Xi=1]=1−pi. We define the bias of Xi to be $\varepsilon_i = p_i - \frac{1}{2}$. Let $\varepsilon_{i_1,i_2,\ldots,i_k}$ denote the bias of the random variable $X_{i_1} \oplus \cdots \oplus X_{i_k}$. It is easy to see that

$$\varepsilon_{i_1,i_2} = 2\,\varepsilon_{i_1}\varepsilon_{i_2},$$

and we can generalize this in the following lemma.

Lemma 3.1 (Piling-up Lemma) [18]: Let $\varepsilon_{i_1,i_2,\ldots,i_k}$ denote the bias of the random variable $X_{i_1} \oplus \cdots \oplus X_{i_k}$. Then

$$\varepsilon_{i_1,i_2,\ldots,i_k} = 2^{k-1} \prod_{j=1}^{k} \varepsilon_{i_j}.$$
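The lemma is easy to check numerically. The sketch below computes the exact bias of an XOR of independent bits by enumerating all outcomes and compares it with the product formula; the function names and the sample probabilities are chosen here purely for illustration.

```python
from itertools import product
from math import prod

def xor_bias(probs_zero):
    """Exact bias of X_1 xor ... xor X_k for independent bits with Pr[X_i = 0] = p_i."""
    p_zero = 0.0
    for bits in product((0, 1), repeat=len(probs_zero)):
        if sum(bits) % 2 == 0:   # this outcome makes the XOR equal to 0
            p_zero += prod(p if b == 0 else 1 - p
                           for p, b in zip(probs_zero, bits))
    return p_zero - 0.5

def piling_up(biases):
    """Right-hand side of the Piling-up Lemma: 2^(k-1) times the product of the biases."""
    return 2 ** (len(biases) - 1) * prod(biases)

probs = [0.75, 0.25, 0.625]
biases = [p - 0.5 for p in probs]   # 1/4, -1/4, 1/8
```

For these three variables both sides equal 4 · (1/4) · (−1/4) · (1/8) = −1/32. Note also that a single unbiased variable (ε = 0) makes the whole XOR unbiased.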

3.2.2 Linear approximations of S-boxes

Next, we compute the linear approximation table of an S-box so that we can determine which XORs of input and output bits are not random.

Example 3.1: Consider the following S-box πS: {0,1}⁴ → {0,1}⁴.

X1 X2 X3 X4 | Y1 Y2 Y3 Y4
 0  0  0  0 |  1  1  1  0
 0  0  0  1 |  0  1  0  0
 0  0  1  0 |  1  1  0  1
 0  0  1  1 |  0  0  1  0
 0  1  0  0 |  0  0  0  1
 0  1  0  1 |  1  1  1  1
 0  1  1  0 |  1  0  1  1
 0  1  1  1 |  1  0  0  0
 1  0  0  0 |  0  0  1  1
 1  0  0  1 |  1  0  1  0
 1  0  1  0 |  0  1  1  0
 1  0  1  1 |  1  1  0  0
 1  1  0  0 |  0  1  0  1
 1  1  0  1 |  0  0  0  0
 1  1  1  0 |  1  0  0  1
 1  1  1  1 |  0  1  1  1

If we want to know the probability of X₂⊕Y₂⊕Y₃=0, we count the number of rows in the table above where X₂⊕Y₂⊕Y₃=0 and denote this number by NL. We then divide NL by 2⁴ (4 is the number of S-box input bits) to get the probability of X₂⊕Y₂⊕Y₃=0. Here NL=4, so the probability is 4/16 and the bias is 4/16 − 1/2 = −1/4.
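The count in Example 3.1 can be reproduced mechanically. In the sketch below the S-box is written as a lookup table read off the rows of the example, with X1 and Y1 taken as the most significant bits; the helper name `bit` is chosen here for illustration.

```python
# S-box of Example 3.1: SBOX[X] = Y, with X1 and Y1 as the most significant bits.
SBOX = [0xE, 0x4, 0xD, 0x2, 0x1, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x0, 0x9, 0x7]

def bit(value, index, width=4):
    """Return bit number `index` of value, counting from 1 at the most significant bit."""
    return (value >> (width - index)) & 1

# NL = number of inputs for which X2 xor Y2 xor Y3 = 0.
nl = sum(1 for x in range(16)
         if bit(x, 2) ^ bit(SBOX[x], 2) ^ bit(SBOX[x], 3) == 0)
probability = nl / 16
bias = probability - 0.5
```

The four inputs that satisfy the relation are 0000, 0110, 1010, and 1100.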

In a similar way, we can record all possible input-output XORs in a linear approximation table (Table 2). We read the table using the following notation:

$$\left(\bigoplus_{i=1}^{4} a_i X_i\right) \oplus \left(\bigoplus_{i=1}^{4} b_i Y_i\right), \qquad a_i, b_i \in \{0,1\}.$$

Take (a1,…,a4) as the row index and (b1,…,b4) as the column index. The values in the table are NL − 8. Thus, X₂⊕Y₂⊕Y₃ of Example 3.1 is expressed as a=0100, b=0110, and the corresponding value NL − 8 is the entry in row 4, column 6 of the table, which is −4, in agreement with the count in Example 3.1. The table consists of 2ⁿ×2^m entries, where n and m denote the number of X variables and Y variables, respectively (in Example 3.1, n=m=4). In linear cryptanalysis, we search for entries with a large absolute bias to use in the attack.

Table 2: Linear approximation table of Example 3.1 (rows: input mask a; columns: output mask b; entries: NL − 8)

a\b   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
 0   +8   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 1    0  -4   0  -4   0  -4   0  +4   0   0   0   0   0   0   0   0
 2    0   0   0   0   0   0   0   0  +2  -2  +6  +2  +2  -2  -2  +2
 3    0   0   0   0   0   0   0   0  +2  -6  -2  -2  +2  +2  -2  -2
 4    0  +4  -2  -2  -2  -2  -4   0   0   0  -2  +2  +2  -2   0   0
 5    0   0  -2  +2  -2  +2  +4  +4   0   0  -2  +2  +2  -2   0   0
 6    0   0  -2  +2  +2  -2   0   0  -2  -2   0  +4  -4   0  -2  -2
 7    0   0  -2  +2  +2  -2   0   0  -2  +2   0   0  +4  +4  -2  +2
 8    0   0   0   0   0   0   0   0  -2  +2  +2  -2  +2  -2  -2  -6
 9    0   0   0   0   0   0   0   0  -2  -2  +2  +2  +2  +2  +6  -2
10    0   0   0   0  -4  -4  +4  -4   0   0   0   0   0   0   0   0
11    0  +4   0  -4  +4   0  +4   0   0   0   0   0   0   0   0   0
12    0   0  +2  -2  -2  +2   0   0  +2  +2   0  +4   0  +4  -2  -2
13    0   0  +2  -2  -2  +2   0   0  -6  -2   0   0   0   0  -2  +2
14    0  +4  +2  +2  -2  -2   0  +4   0   0  +2  -2  -2  +2   0   0
15    0   0  -6  -2  -2  +2   0   0   0   0  +2  -2  -2  +2   0   0
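A table of this kind can be generated for any S-box with a short routine. The sketch below uses the S-box of Example 3.1 with the masks a and b read as 4-bit integers (a1 and b1 most significant); the function names are chosen here for illustration. The entries spot-checked in the test, including the −4 at a = 0100, b = 0110, agree with the table above.

```python
# S-box of Example 3.1: SBOX[X] = Y, with X1 and Y1 as the most significant bits.
SBOX = [0xE, 0x4, 0xD, 0x2, 0x1, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x0, 0x9, 0x7]

def parity(v):
    """XOR of all bits of v."""
    return bin(v).count("1") & 1

def linear_approximation_table(sbox):
    """table[a][b] = NL(a, b) - len(sbox)/2 for input mask a and output mask b."""
    n = len(sbox)
    table = [[0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            # NL counts the inputs x with (a . x) xor (b . S(x)) = 0.
            nl = sum(1 for x in range(n)
                     if parity(a & x) == parity(b & sbox[x]))
            table[a][b] = nl - n // 2
    return table

LAT = linear_approximation_table(SBOX)
```

An entry e corresponds to a bias of e/16, so the attacker looks for entries of large absolute value such as ±6 or ±4.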

3.2.3 Linear expression of a trail

We then use such weaknesses (large biases) to find a trail through the entire SPN and obtain a linear expression involving only some plaintext bits, some data bits entering the last round (bits of $U^{Nr}$), and all subkey bits encountered along the path. All other intermediate data bits of $U^r$ and $V^r$, where r < Nr, cancel out. Thus we obtain a linear expression of the form

$$P_I \oplus C_J \oplus K_K, \qquad (3.1)$$

where $P_I$, $C_J$, and $K_K$ denote the XOR of some plaintext bits, some data bits of $U^{Nr}$, and the encountered key bits, respectively. Since the key is fixed, $K_K$ is a constant, so what we care about is only

$$P_I \oplus C_J. \qquad (3.2)$$

Figure 13: A possible attack trail (a four-round SPN with plaintext bits P1, …, P16, S-boxes S11 to S44, subkey mixing with K1 through K5, the last-round input U4, and ciphertext bits C1, …, C16).

Figure 13 shows a possible attack trail. Here, $P_I$ is $P_5 \oplus P_7 \oplus P_8$ and $C_J$ is $U^4_6 \oplus U^4_7 \oplus U^4_{14} \oplus U^4_{15}$. The trail is formed as follows. In S12, we choose $X_1 \oplus X_3 \oplus X_4 \oplus Y_4$ since it has a large bias. We then follow the output through the permutation and the XOR with K2. In round 2, this bit becomes the input $X_2$ of S24. So we look up the linear approximation table to find which output bits, XORed with $X_2$, give a large bias (row 4, since $X_2$ corresponds to 0100 in binary). Continuing this procedure, we form a trail.

After the trail is determined, the overall bias of the entire SPN can be calculated by the Piling-up lemma (the approximation of each S-box encountered is treated as one $\varepsilon_i$), and we denote the resulting bias by $\varepsilon$.

3.2.4 Subkeys attack

Once we have the trail and the bias, we then begin to extract the subkeys of the last round. It proceeds as follows:

1. The subkeys we are going to extract are those involved in the last part of the trail. For example, in Figure 13, the bits of $C_J$ in (3.2) are inputs to the second and fourth S-boxes of the last round. The subkey bits being extracted are then those at the positions of the output bits of these S-boxes, i.e., the circled part in Figure 13.

2. Since the attack is a known-plaintext attack, we have many plaintext-ciphertext pairs, say T pairs. We maintain an array of counters, one for each candidate subkey. For each candidate subkey, we partially decrypt each ciphertext. If the linear expression (3.2) holds (i.e., evaluates to zero), we increment the counter of that candidate.

3. In the end, we expect the candidate whose counter is closest to (1/2 ± ε)T to be the most likely subkey.
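The three steps above can be sketched on a deliberately tiny example. The cipher below is a hypothetical 4-bit, two-round toy (key mixing, S-box, key mixing, S-box, key mixing) built from the S-box of Example 3.1; it is not the SPN of Figure 13, and the keys and function names are made up for illustration. The approximation used is a = 0100, b = 0110 with |ε| = 1/4. With only 16 possible texts, wrong candidates may tie with the right one, so the sketch returns all candidates at minimal distance rather than a single winner.

```python
# S-box of Example 3.1 and its inverse.
SBOX = [0xE, 0x4, 0xD, 0x2, 0x1, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x0, 0x9, 0x7]
INV_SBOX = [SBOX.index(v) for v in range(16)]

def parity(v):
    return bin(v).count("1") & 1

def toy_encrypt(p, k0, k1, k2):
    """Hypothetical toy cipher; u_last plays the role of the last-round input."""
    u_last = SBOX[p ^ k0] ^ k1
    return SBOX[u_last] ^ k2

def count_per_candidate(pairs, p_mask, u_mask):
    """Step 2: counters[k] = number of pairs with P_I xor C_J = 0 under candidate k."""
    counters = [0] * 16
    for cand in range(16):
        for p, c in pairs:
            u_last = INV_SBOX[c ^ cand]          # partial decryption of the last round
            if parity(p_mask & p) ^ parity(u_mask & u_last) == 0:
                counters[cand] += 1
    return counters

def best_candidates(counters, total, eps):
    """Step 3: candidates whose counter is closest to (1/2 + eps)T or (1/2 - eps)T."""
    def distance(count):
        return min(abs(count - (0.5 + eps) * total),
                   abs(count - (0.5 - eps) * total))
    best = min(distance(c) for c in counters)
    return [k for k, c in enumerate(counters) if distance(c) == best]

K0, K1, K2 = 0x3, 0xA, 0x6                        # made-up keys for the demonstration
PAIRS = [(p, toy_encrypt(p, K0, K1, K2)) for p in range(16)]
COUNTERS = count_per_candidate(PAIRS, 0b0100, 0b0110)
LIKELY = best_candidates(COUNTERS, len(PAIRS), 0.25)
```

For the correct last-round key the counter is exactly (1/2 ± 1/4)·16, i.e. 4 or 12, so the correct key is always among `LIKELY`; a real attack relies on many more pairs and a larger cipher to separate it from the wrong candidates.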

3.3 More on Linear Cryptanalysis

In this section, we introduce some further research that followed the development of linear cryptanalysis. With the help of these techniques, we can increase the success rate and reduce the number of data pairs needed.

3.3.1 Linear hull

Nyberg [24] proposed the linear hull effect in 1994. The main result shows that the success rate of Algorithm 2 is underestimated in Matsui’s paper, or equivalently that the required data complexity can be reduced. The reason is that there may be many linear expressions with the same input and output masks but different internal subkey bits, i.e., $P_I$ and $C_J$ are the same but $K_K$ is different. For input mask a and output mask b, ALH(a,b) denotes the approximate linear hull. We give the definition and theorem in the more accessible form of [15].

Definition 3.1: Given nonzero N-bit masks a, b, the approximate linear hull, ALH(a,b), is the set of all T-round characteristics, for the T rounds under consideration, having a as the input mask for round 1 and b as the output mask for round T, i.e., all characteristics of the form Ω = ⟨a, a², a³, ..., a^T, b⟩, where the superscripts are round indices.

The characteristic Ω here corresponds to the trail described earlier. We have the following theorem.

Theorem 3.1: Let a and b be fixed nonzero N-bit input and output masks, respectively, for T rounds of an SPN. Then

$$E_T[a,b] = \sum_{\Omega \in ALH(a,b)} LCP(\Omega). \qquad (3.3)$$

$E_T[a,b]$ denotes the expected value of the linear probability of the mask pair (a,b) over all independent keys, and LCP(Ω) denotes the linear characteristic probability of a characteristic Ω. The theorem shows that for given masks (a,b) there may be many different characteristics, and the expected linear probability for (a,b) is the sum of LCP(Ω) over a large set of characteristics. In other words, for given $P_I$ and $C_J$, the expected value is the sum over a large set of different trails with the same $P_I$ and $C_J$. Therefore, the linear characteristic probability of the best characteristic is in general strictly less than $E_T[a,b]$. This implies that an attacker who considers only the best trail will overestimate the number of pairs required for a given success rate.

3.3.2 Key ranking

After linear cryptanalysis was proposed, Matsui carried out the attack experimentally in 1994 with some modifications [20]. In that paper, he uses two new linear approximation equations, each of which provides candidates for 13 key bits. Further, he takes the reliability of key candidates into consideration: he stores not only the most likely key bits but also the i-th most likely candidates. That is, he stores the candidates $\hat{k}_1, \hat{k}_2, \ldots$ in order, where $\hat{k}_i$ is the i-th most likely value of the key bits.

Then, if the most likely candidate turns out to be wrong, he falls back to the second most likely candidate, and so on. A candidate is tested with a given plaintext-ciphertext pair (P, C): the remaining key bits are searched exhaustively to check whether the candidate key bits can generate C from P. To increase accuracy, a few more pairs {(P1, C1), (P2, C2),…} can be used, since wrong key bits generate the correct Ci only with negligible probability. Thus, if $\hat{k}_1$ fails the test, $\hat{k}_2$ is used, and so on until the correct one is found.
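The ranking idea can be sketched as follows. The ordering criterion used here, the deviation |T_i − N/2| from Algorithm 2, is an assumption about how "likelihood" is measured, and `verify` is a hypothetical callback standing in for the exhaustive search over the remaining key bits against known pairs.

```python
def rank_candidates(counts, n):
    """Order candidates from most to least likely by |T_i - N/2| (assumed criterion)."""
    return sorted(counts, key=lambda cand: abs(counts[cand] - n / 2), reverse=True)

def first_verified(ranked, verify):
    """Try the candidates in order; return (candidate, rank) for the first that passes.

    verify(candidate) is a placeholder for the test against plaintext-ciphertext
    pairs; None is returned if every candidate fails.
    """
    for rank, cand in enumerate(ranked, start=1):
        if verify(cand):
            return cand, rank
    return None
```

For instance, with counters {"a": 510, "b": 400, "c": 560} and N = 1000, the ranking is b, c, a; if the true key bits are "c", they are found at rank 2 instead of being missed.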

With this simple improvement, he increased the success rate. In his experiment, he successfully recovered 26 key bits of the full 16-round DES with 2⁴³ plaintext-ciphertext pairs. The remaining 30 key bits can be found by exhaustive key search. Compared with his original attack, more key bits are recovered with fewer pairs.

3.3.3 Multiple linear approximations

Kaliski and Robshaw [29] proposed using multiple linear approximations at CRYPTO ’94. Suppose we have n linear approximations that involve the same key bits but differ in the plaintext and ciphertext bits they use. Each linear approximation i is assigned a weight $a_i$ (which may be chosen according to its bias) such that

$$\sum_{i=1}^{n} a_i = 1.$$

Then, for each candidate $K^{(j)}$, j = 1, 2, …, of the key bits and each linear approximation i, let $T_i^j$ be the number of pairs for which the linear equation holds. We calculate

$$U^j = \sum_{i=1}^{n} a_i T_i^j$$

for each j. The rest is just like Algorithm 2 of Matsui’s attack: we find the $U^j$ that is furthest from N/2 (N is the number of pairs) and take the corresponding candidate as the most likely key bits.
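The combined statistic is a weighted sum and can be sketched in a few lines. The sketch assumes that all approximations are biased in the same direction, so that their counters deviate from N/2 on the same side; the function names and the sample counters are made up for illustration.

```python
def combined_statistic(weights, counts_per_approx):
    """U^j = sum over i of a_i * T_i^j for one candidate j."""
    return sum(a * t for a, t in zip(weights, counts_per_approx))

def best_candidate(weights, counts, n):
    """counts maps candidate j to [T_1^j, ..., T_n^j]; n is the number of pairs N.

    Returns the candidate whose U^j is furthest from N/2, plus all U^j values.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("the weights a_i must sum to 1")
    stats = {j: combined_statistic(weights, t) for j, t in counts.items()}
    best = max(stats, key=lambda j: abs(stats[j] - n / 2))
    return best, stats

WEIGHTS = [0.5, 0.25, 0.25]
COUNTS = {0: [520, 480, 510], 1: [600, 590, 610], 2: [495, 505, 500]}
BEST, STATS = best_candidate(WEIGHTS, COUNTS, 1000)
```

Here candidate 1 gets U = 600, far from N/2 = 500, while the other two stay close to 500, so candidate 1 is selected.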

This technique is expected to increase the success rate and reduce the data complexity. In their experiments, however, the improvement in effectiveness on DES was somewhat limited. It is nevertheless an important technique, since it may be generally applicable to other block ciphers and may be very effective in reducing data complexity.

3.4 Our Attack Design

As mentioned in the introduction, we want to apply linear cryptanalysis repeatedly to obtain most of the key bits. We use one trail to extract a subset of key bits and another trail to extract another subset. Once all bits of the last round key $K^{Nr+1}$ have been extracted, we go one level up and extract the key bits of $K^{Nr}$ with new trails, and so on.

3.4.1 Observations

Before we explain our strategies, there are some observations to be made.

1. The number of subkey bits attacked in a single attack should not be too large, i.e., not too many S-boxes in the last round should be involved. The more subkey bits we want to extract in one attack, the more time we need. For example, if we want to get 8 key bits at once, we have to test 2⁸ = 256 candidates against all pairs. But if we get 4 bits and then another 4 bits in two attacks, we only need to test 2×2⁴ = 32 candidates against all pairs.

2. The fewer S-boxes are involved, the larger the bias. There may be an input-output XOR with the largest bias whose output nevertheless spreads to many S-boxes through the permutation. We should then consider whether choosing such a path is worthwhile.

3. It is easy to see that with first Nr-1 round trail we can get bits of KNr+1. So with Nr-2 round trail we can get bits of KNr. Continuing the process we can get all key
