
Design of AES Crypto Engines

3.1 Previous Works on the AES algorithm

3.1.1 SubBytes Transformation

SubBytes can be implemented in two ways: with a combinational look-up table (LUT) or with composite field arithmetic. The LUT-based method is usually adopted in high-throughput architectures because the SubBytes transformation is implemented at the logic level and the critical path can be optimized through truth-table or Karnaugh-map minimization. The composite field method uses finite field arithmetic to optimize SubBytes in terms of hardware complexity. A finite field arithmetic unit is used for SubBytes to reduce the required hardware resources. Therefore, the composite field based method is usually adopted for low-cost architectures. In this subsection, previous works on the composite field based SubBytes are reviewed.

As mentioned earlier, SubBytes can be decomposed into a multiplicative inversion followed by an affine transformation. The multiplicative inversion in $GF(2^8)$ can be computed with the extended Euclidean algorithm. However, computing the inverse directly in $GF(2^8)$ is quite complicated, while computing the inverse in $GF(2^2)$ is relatively easy, as suggested by Rijmen [57]. Therefore, field elements can be transformed to the composite field $GF((2^4)^2)$ [22, 31–33, 38, 53] or $GF(((2^2)^2)^2)$ [20, 54] to reduce the hardware complexity.
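The decomposition into inversion plus affine transformation can be sketched in software as follows. This is only an illustrative model, not a hardware description: the function names are hypothetical, the reduction polynomial $x^8 + x^4 + x^3 + x + 1$ and the affine constant 0x63 are the standard AES values, and the inverse is computed with the extended Euclidean algorithm on polynomials over $GF(2)$.

```python
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, the AES reduction polynomial


def gf2_poly_mul(a, b):
    """Carry-less product of two polynomials over GF(2), stored as ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def gf2_poly_divmod(a, b):
    """Polynomial division over GF(2); returns (quotient, remainder)."""
    q = 0
    while a.bit_length() >= b.bit_length():
        shift = a.bit_length() - b.bit_length()
        q ^= 1 << shift
        a ^= b << shift
    return q, a


def gf256_inv(x):
    """Inverse in GF(2^8) by the extended Euclidean algorithm (0 maps to 0)."""
    if x == 0:
        return 0
    r0, r1 = AES_POLY, x
    t0, t1 = 0, 1          # invariant: t_i * x = r_i (mod AES_POLY)
    while r1:
        q, r = gf2_poly_divmod(r0, r1)
        r0, r1 = r1, r
        t0, t1 = t1, t0 ^ gf2_poly_mul(q, t1)
    return t0              # here r0 == 1, so t0 * x = 1 (mod AES_POLY)


def rotl8(b, n):
    """Rotate an 8-bit value left by n positions."""
    return ((b << n) | (b >> (8 - n))) & 0xFF


def sub_byte(x):
    """SubBytes for one byte: inversion followed by the affine transformation."""
    b = gf256_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63
```

Calling `sub_byte` on every byte value from 0 to 255 would reproduce the full S-box table that a LUT-based design stores directly.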

An element $G$ in $GF(2^8)$ can be represented over $GF(2^4)$ as $G = \gamma_1 y + \gamma_0$ with the irreducible polynomial $r(y) = y^2 + \tau y + \nu$. Note that the coefficients $\gamma_1$ and $\gamma_0$ are both 4-bit elements of the subfield $GF(2^4)$. In this way, the pair $[\gamma_1, \gamma_0]$ can be used to represent the element $G$ in the field $GF((2^4)^2)$. The element can be represented using the polynomial basis $[Y, 1]$ or the normal basis $[Y^{16}, Y]$. Since $Y^{16}$ and $Y$ are both roots of $r(y)$,

$$r(y) = y^2 + \tau y + \nu = (y + Y)(y + Y^{16}), \tag{3.1}$$

which means that $\tau = Y + Y^{16}$ is the trace and $\nu = Y \cdot Y^{16}$ is the norm of $Y$.
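The trace and norm relations can be checked numerically. The sketch below is an assumption-laden illustration: it works inside the standard AES representation of $GF(2^8)$ (polynomial 0x11B), identifies the subfield $GF(2^4)$ as the 16 elements fixed by $x \mapsto x^{16}$, and uses hypothetical helper names. Subfield elements therefore appear as bytes rather than as 4-bit values, since no basis change is applied.

```python
def gf256_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r


def pow16(x):
    """Return x^16 using four successive squarings."""
    for _ in range(4):
        x = gf256_mul(x, x)
    return x


def in_subfield(x):
    """True if x lies in the GF(2^4) subfield, i.e. x^16 == x."""
    return pow16(x) == x


def trace_and_norm(Y):
    """Return (tau, nu) = (Y + Y^16, Y * Y^16) for an element Y of GF(2^8)."""
    Y16 = pow16(Y)
    return Y ^ Y16, gf256_mul(Y, Y16)


def is_root(Y):
    """Check that Y satisfies y^2 + tau*y + nu = 0 with its own trace and norm."""
    tau, nu = trace_and_norm(Y)
    return (gf256_mul(Y, Y) ^ gf256_mul(tau, Y) ^ nu) == 0
```

For any `Y` outside the subfield, `trace_and_norm(Y)` returns two subfield elements and `is_root(Y)` holds, which mirrors equation 3.1.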

Similarly, coefficients in $GF(2^4)$ can be represented over $GF(2^2)$ as $\Gamma_1 z + \Gamma_0$ with the irreducible polynomial $r(z) = z^2 + Tz + N$, where $\Gamma_1$ and $\Gamma_0$ are both in the subfield $GF(2^2)$. Again, the element can be represented using the polynomial basis $[Z, 1]$ or the normal basis $[Z^4, Z]$. Note that $T = Z + Z^4$ is the trace and $N = Z \cdot Z^4$ is the norm of $Z$, where $Z^4$ and $Z$ are the roots of $r(z)$.

Finally, an element of $GF(2^2)$ can also be represented over $GF(2)$ as $g_1 w + g_0$ with the irreducible polynomial $r(w) = w^2 + w + 1$, where $g_1$ and $g_0$ are both in $GF(2)$, that is, single bits. The polynomial basis $[W, 1]$ or the normal basis $[W^2, W]$ can be used to represent the pair $[g_1, g_0]$. This decomposition reduces operations in $GF(2^8)$ to operations in $GF(2^4)$, which in turn can be reduced further to $GF(2^2)$ and $GF(2)$.

Polynomial Basis

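The multiplication in $GF(2^8)$ can be mapped to $GF((2^4)^2)$ modulo $r(y)$ as

$$(\gamma_1 y + \gamma_0)(\delta_1 y + \delta_0) = (\gamma_1\delta_0 + \gamma_0\delta_1 + \gamma_1\delta_1\tau)\,y + (\gamma_0\delta_0 + \gamma_1\delta_1\nu). \tag{3.2}$$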

Thus, the multiplicative inverse can be computed by setting the right-hand side of equation 3.2 equal to 1, so that $\delta_1 y + \delta_0$ is the multiplicative inverse of $\gamma_1 y + \gamma_0$. The inverse can be found by solving the following equations:

$$
\begin{aligned}
\gamma_1\delta_0 + \gamma_0\delta_1 + \gamma_1\delta_1\tau &= 0 \\
\gamma_0\delta_0 + \gamma_1\delta_1\nu &= 1.
\end{aligned} \tag{3.3}
$$

Figure 3.1: Polynomial basis inverter. (a) Over $GF(2^8)$. (b) Over $GF(2^4)$.

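The multiplicative inverse is then given by

$$(\gamma_1 y + \gamma_0)^{-1} = (\delta_1 y + \delta_0) = [\theta^{-1}\gamma_1]\,y + [\theta^{-1}(\gamma_0 + \gamma_1\tau)], \tag{3.4}$$

where $\theta = \gamma_1^2\nu + \gamma_1\gamma_0\tau + \gamma_0^2$.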
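Equations 3.2 and 3.4 can be exercised with a small software model. The following is a sketch under stated assumptions rather than any particular published design: $GF(2^4)$ is built with the polynomial $x^4 + x + 1$, $\tau$ is fixed to 1, $\nu$ is found by searching for a value that makes $r(y)$ irreducible, and the subfield inverter is a plain search. All names are hypothetical.

```python
def gf16_mul(a, b):
    """Multiply in GF(2^4) modulo x^4 + x + 1 (an assumed field polynomial)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r


def gf16_inv(a):
    """Subfield inverse by exhaustive search (0 maps to 0)."""
    return next((b for b in range(1, 16) if gf16_mul(a, b) == 1), 0)


TAU = 1  # assumed trace coefficient
# Assumed norm: the first nu for which y^2 + TAU*y + nu has no root in GF(2^4).
NU = next(n for n in range(1, 16)
          if all(gf16_mul(y, y) ^ gf16_mul(TAU, y) ^ n for y in range(16)))


def poly_mul(g, d):
    """Equation 3.2: multiply (g1*y + g0) by (d1*y + d0) modulo r(y)."""
    (g1, g0), (d1, d0) = g, d
    g1d1 = gf16_mul(g1, d1)
    hi = gf16_mul(g1, d0) ^ gf16_mul(g0, d1) ^ gf16_mul(g1d1, TAU)
    lo = gf16_mul(g0, d0) ^ gf16_mul(g1d1, NU)
    return hi, lo


def poly_inv(g):
    """Equation 3.4: inverse of (g1*y + g0) using one GF(2^4) inversion."""
    g1, g0 = g
    theta = (gf16_mul(gf16_mul(g1, g1), NU)
             ^ gf16_mul(gf16_mul(g1, g0), TAU)
             ^ gf16_mul(g0, g0))
    t_inv = gf16_inv(theta)
    return gf16_mul(t_inv, g1), gf16_mul(t_inv, g0 ^ gf16_mul(g1, TAU))
```

Multiplying any nonzero pair by its `poly_inv` result with `poly_mul` gives `(0, 1)`, the polynomial-basis representation of 1.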

The multiplicative inverse in $GF(2^8)$ is thus decomposed into a series of multiplications, additions, and one multiplicative inverse in $GF(2^4)$. Equations 3.2 and 3.4 can then be adapted to decompose the operations in $GF(2^4)$ into $GF(2^2)$. Once the operation is reduced to $GF(2^2)$, inversion is identical to squaring because for nonzero $\Gamma \in GF(2^2)$, $\Gamma^4 = \Gamma$, and therefore $\Gamma^2 = \Gamma^{-1}$.

Fig. 3.1(a) shows the multiplicative inverter over the polynomial basis in $GF((2^4)^2)$. The inverter in the field $GF(2^4)$ can be implemented by a series of multiplications using $x^{-1} = x^{14}$, by a combinational LUT with 16 entries, or by further decomposition of the inverter over $GF(2^2)$. Fig. 3.1(b) shows these different implementations of the inverter in $GF(2^4)$.
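The first two options for the $GF(2^4)$ inverter can be compared in a few lines. The sketch assumes the field polynomial $x^4 + x + 1$ (the text does not fix one) and uses hypothetical names; it computes $x^{14} = x^2 \cdot x^4 \cdot x^8$ with three squarings and two multiplications, then tabulates the result as a 16-entry LUT.

```python
def gf16_mul(a, b):
    """Multiply in GF(2^4) modulo x^4 + x + 1 (an assumed field polynomial)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r


def gf16_inv_pow(x):
    """Inverse as x^14 = x^2 * x^4 * x^8; the input 0 maps to 0."""
    x2 = gf16_mul(x, x)
    x4 = gf16_mul(x2, x2)
    x8 = gf16_mul(x4, x4)
    return gf16_mul(gf16_mul(x2, x4), x8)


# The same function stored as a 16-entry look-up table.
GF16_INV_LUT = [gf16_inv_pow(x) for x in range(16)]
```

In hardware the power-based form trades table area for multiplier depth, while the LUT form is a single level of 4-input logic; which is preferable depends on the target library.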

Normal Basis

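Operations in the normal basis $[Y^{16}, Y]$, where $Y^{16}$ and $Y$ are the roots of $r(y) = y^2 + \tau y + \nu$, use the properties $\tau = Y^{16} + Y$, $\nu = Y^{16} \cdot Y$, and $1 = \tau^{-1}(Y^{16} + Y)$. The multiplication in $GF(2^8)$ can then be decomposed into $GF((2^4)^2)$ modulo $r(y)$ as:

$$
\begin{aligned}
(\gamma_1 Y^{16} + \gamma_0 Y)(\delta_1 Y^{16} + \delta_0 Y) &= \gamma_1\delta_1 Y^{32} + (\gamma_1\delta_0 + \gamma_0\delta_1) Y^{17} + \gamma_0\delta_0 Y^2 \\
&= \gamma_1\delta_1(\tau Y^{16} + \nu) + (\gamma_1\delta_0 + \gamma_0\delta_1)\nu + \gamma_0\delta_0(\tau Y + \nu) \\
&= \gamma_1\delta_1\tau Y^{16} + \gamma_0\delta_0\tau Y + (\gamma_1 + \gamma_0)(\delta_1 + \delta_0)\nu\tau^{-1}(Y^{16} + Y) \\
&= (\gamma_1\delta_1\tau + \theta) Y^{16} + (\gamma_0\delta_0\tau + \theta) Y
\end{aligned} \tag{3.5}
$$

where $\theta = (\gamma_1 + \gamma_0)(\delta_1 + \delta_0)\nu\tau^{-1}$. Again, the multiplicative inverse can be calculated by setting $(\gamma_1 Y^{16} + \gamma_0 Y)(\delta_1 Y^{16} + \delta_0 Y) = 1 = \tau^{-1} Y^{16} + \tau^{-1} Y$. The inverse can then be found by solving the following equations:

$$
\begin{aligned}
\gamma_1\delta_1\tau + (\gamma_1 + \gamma_0)(\delta_1 + \delta_0)\nu\tau^{-1} &= \tau^{-1} \\
\gamma_0\delta_0\tau + (\gamma_1 + \gamma_0)(\delta_1 + \delta_0)\nu\tau^{-1} &= \tau^{-1}
\end{aligned} \tag{3.6}
$$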

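The multiplicative inverse in normal basis is then given by

$$(\gamma_1 Y^{16} + \gamma_0 Y)^{-1} = (\delta_1 Y^{16} + \delta_0 Y) = [\theta'^{-1}\gamma_0]\,Y^{16} + [\theta'^{-1}\gamma_1]\,Y, \tag{3.7}$$

where $\theta' = \gamma_1\gamma_0\tau^2 + (\gamma_1^2 + \gamma_0^2)\nu$.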
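The normal-basis formulas can be modelled in the same way as the polynomial-basis ones. The sketch below makes several assumptions that are not taken from the cited designs: $GF(2^4)$ uses the polynomial $x^4 + x + 1$, $\tau$ is arbitrarily set to 2 so that $\tau^{-1}$ is not trivial, and $\nu$ is found by searching for a value that makes $r(y)$ irreducible. A pair `(g1, g0)` stands for $\gamma_1 Y^{16} + \gamma_0 Y$; all names are hypothetical.

```python
def gf16_mul(a, b):
    """Multiply in GF(2^4) modulo x^4 + x + 1 (an assumed field polynomial)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r


def gf16_inv(a):
    """Subfield inverse by exhaustive search (0 maps to 0)."""
    return next((b for b in range(1, 16) if gf16_mul(a, b) == 1), 0)


TAU = 2  # assumed nonzero trace coefficient
# Assumed norm: the first nu for which y^2 + TAU*y + nu has no root in GF(2^4).
NU = next(n for n in range(1, 16)
          if all(gf16_mul(y, y) ^ gf16_mul(TAU, y) ^ n for y in range(16)))
TAU_INV = gf16_inv(TAU)
ONE = (TAU_INV, TAU_INV)  # 1 = tau^-1 * (Y^16 + Y) in the normal basis


def nb_mul(g, d):
    """Last line of equation 3.5: normal-basis product of two coefficient pairs."""
    (g1, g0), (d1, d0) = g, d
    theta = gf16_mul(gf16_mul(g1 ^ g0, d1 ^ d0), gf16_mul(NU, TAU_INV))
    return (gf16_mul(gf16_mul(g1, d1), TAU) ^ theta,
            gf16_mul(gf16_mul(g0, d0), TAU) ^ theta)


def nb_inv(g):
    """Equation 3.7: normal-basis inverse; note the swapped coefficients."""
    g1, g0 = g
    theta_p = (gf16_mul(gf16_mul(g1, g0), gf16_mul(TAU, TAU))
               ^ gf16_mul(gf16_mul(g1, g1) ^ gf16_mul(g0, g0), NU))
    t_inv = gf16_inv(theta_p)
    return gf16_mul(t_inv, g0), gf16_mul(t_inv, g1)
```

For every nonzero pair `g`, `nb_mul(g, nb_inv(g))` equals `ONE`, which corresponds to the two conditions in equation 3.6.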

The multiplicative inverse in $GF(2^8)$ can thus be decomposed into a series of multiplications, additions, and an inverse in $GF(2^4)$. Fig. 3.2 shows the inverter in $GF(2^8)$ with normal basis. The inverter consists of three multipliers, two adders, one inverter, and one squarer followed by scaling by the constant $\nu$. The multiplication in $GF(2^4)$ is analogous to that in $GF(2^8)$:

$$(\Gamma_1 Z^4 + \Gamma_0 Z)(\Delta_1 Z^4 + \Delta_0 Z) = (\Gamma_1\Delta_1 T + \Theta)\,Z^4 + (\Gamma_0\Delta_0 T + \Theta)\,Z, \tag{3.8}$$

where $\Theta = (\Gamma_1 + \Gamma_0)(\Delta_1 + \Delta_0) N T^{-1}$. The architecture of the multiplier in $GF(2^4)$ is shown in Fig. 3.3. Note that the multiplications and additions are performed over $GF(2^2)$. The multiplication in $GF(2^2)$ has the same structure, except that it lacks the scaling by the norm, and in $GF(2)$ multiplication is identical to the AND operation.

Figure 3.2: Normal basis inverter in $GF(2^8)$.

Figure 3.3: Normal basis multiplier in $GF(2^4)$.

In addition to multipliers and adders, another operation needed in $GF(2^4)$ is the combined operation of squaring followed by scaling by $\nu$, as shown in Fig. 3.2. The combined operation can be represented as:

$$(\Gamma_1 Z^4 + \Gamma_0 Z)^2 \times \nu = [(\Gamma_1 + \Gamma_0)^2]\,Z^4 + [(N \times \Gamma_0)^2]\,Z. \tag{3.9}$$

The operation in $GF(2^4)$ can now be performed with addition, multiplication, and squaring in $GF(2^2)$. Note that in $GF(2^2)$ inversion is the same as squaring and can be represented as:

$$
\begin{aligned}
(g_1 W^2 + g_0 W)^{-1} = (g_1 W^2 + g_0 W)^2 &= g_1^2 W^4 + g_0^2 W^2 \\
&= g_1^2 (W^2 + 1) + g_0^2 W^2 \\
&= g_1^2 W^2 + g_1^2 (W^2 + W) + g_0^2 W^2 \\
&= g_0^2 W^2 + g_1^2 W \\
&= g_0 W^2 + g_1 W,
\end{aligned} \tag{3.10}
$$

indicating that squaring or inversion in $GF(2^2)$ is free: it only swaps the two bit positions.
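Equation 3.10 is small enough to check exhaustively. In the sketch below, which uses hypothetical names, an element $g_1 W^2 + g_0 W$ is packed into a 2-bit integer with $g_1$ in bit 1 and $g_0$ in bit 0. The multiplier follows the structure of equation 3.5 specialised to $GF(2^2)$, where the trace and norm are both 1.

```python
def gf4_mul(a, b):
    """Normal-basis product in GF(2^2) with basis [W^2, W]."""
    g1, g0 = a >> 1, a & 1
    d1, d0 = b >> 1, b & 1
    cross = (g1 ^ g0) & (d1 ^ d0)      # shared term (g1 + g0)(d1 + d0)
    return (((g1 & d1) ^ cross) << 1) | ((g0 & d0) ^ cross)


def gf4_square(a):
    """Squaring (and inversion) in GF(2^2): swap the two coefficient bits."""
    return ((a & 1) << 1) | (a >> 1)


GF4_ONE = 0b11  # 1 = W^2 + W in the normal basis
```

Because `gf4_square` is pure wiring, the innermost inverter of the tower costs no gates in this representation.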

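The remaining operation needed in the $GF(2^4)$ multiplier is multiplication in $GF(2^2)$ followed by scaling by $N = W^2$. The combined operation can be represented as:

$$
\begin{aligned}
(g_1 W^2 + g_0 W)(d_1 W^2 + d_0 W) \times N &= g_1 d_1 W^6 + (g_1 d_0 + g_0 d_1) W^5 + g_0 d_0 W^4 \\
&= g_1 d_1 + (g_1 d_0 + g_0 d_1) W^2 + g_0 d_0 W \\
&= g_1 d_1 (W^2 + W) + (g_1 d_0 + g_0 d_1) W^2 + g_0 d_0 W \\
&= (g_1 d_1 + g_1 d_0 + g_0 d_1) W^2 + (g_1 d_1 + g_0 d_0) W \\
&= [g_0 d_0 + (g_1 + g_0)(d_1 + d_0)]\,W^2 + (g_1 d_1 + g_0 d_0)\,W
\end{aligned} \tag{3.11}
$$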
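The last line of equation 3.11 can be compared against a plain multiplication followed by a separate scaling. The sketch uses the same 2-bit packing as the derivation ($g_1$ in bit 1, $g_0$ in bit 0) and hypothetical function names; `gf4_mul` is the unscaled normal-basis product and `N` is $W^2$, encoded as `0b10`.

```python
def gf4_mul(a, b):
    """Unscaled normal-basis product in GF(2^2) with basis [W^2, W]."""
    g1, g0 = a >> 1, a & 1
    d1, d0 = b >> 1, b & 1
    cross = (g1 ^ g0) & (d1 ^ d0)
    return (((g1 & d1) ^ cross) << 1) | ((g0 & d0) ^ cross)


N = 0b10  # N = W^2 in the normal basis


def gf4_mul_scale_n(a, b):
    """Equation 3.11: product of a and b already scaled by N = W^2."""
    g1, g0 = a >> 1, a & 1
    d1, d0 = b >> 1, b & 1
    hi = (g0 & d0) ^ ((g1 ^ g0) & (d1 ^ d0))   # coefficient of W^2
    lo = (g1 & d1) ^ (g0 & d0)                 # coefficient of W
    return (hi << 1) | lo
```

The combined form needs three AND gates and three XOR gates, the same AND count as the unscaled multiplier, so in this model the scaling adds no extra multiplication.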

Inv-/SubBytes Sharing

For the SubBytes and Inv-SubBytes transformations, the multiplicative inversion can be shared to reduce hardware resources. Fig. 3.4 shows the data flow of the forward SubBytes transformation. The input byte is first multiplied by a field transformation matrix $\delta$, and the resulting element is fed to the multiplicative inversion over the composite field $GF((2^4)^2)$ or $GF(((2^2)^2)^2)$. The field element is then mapped back to $GF(2^8)$ by the inverse field transformation matrix $\delta^{-1}$. Finally, the affine transformation is applied to complete the SubBytes transformation. Note that the inverse field transformation matrix and the affine transformation can be combined to reduce the number of matrix operations.

Fig. 3.4 also shows the flow of the Inv-SubBytes transformation. The input byte is first applied to the inverse affine transformation and then mapped into the composite field by $\delta$. The final result is obtained by applying the inverse field transformation to the multiplicative inverse. Note that the inverse affine transformation and the field transformation matrix can also be combined to reduce the number of matrix operations. With this structure, the multiplicative inversion can be shared between the encryption and decryption processes.

Figure 3.4: Inv-/SubBytes sharing structure.
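The sharing idea can be illustrated with a behavioural model in which one inversion routine serves both directions. This is a simplified sketch with hypothetical names: the field transformation $\delta$ and its inverse are omitted, so the shared inversion is computed directly in $GF(2^8)$ as $x^{254}$ rather than in a composite field. The affine constants 0x63 and 0x05 are the standard AES values.

```python
def gf256_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r


def gf256_inv(x):
    """Shared block: inverse as x^254 (0 maps to 0)."""
    r = 1
    for _ in range(254):
        r = gf256_mul(r, x)
    return r


def rotl8(b, n):
    """Rotate an 8-bit value left by n positions."""
    return ((b << n) | (b >> (8 - n))) & 0xFF


def affine(b):
    """Forward affine transformation of SubBytes."""
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


def inv_affine(s):
    """Inverse affine transformation of Inv-SubBytes."""
    return rotl8(s, 1) ^ rotl8(s, 3) ^ rotl8(s, 6) ^ 0x05


def shared_sbox(x, decrypt=False):
    """SubBytes or Inv-SubBytes built around a single inversion."""
    if decrypt:
        x = inv_affine(x)      # pre-processing used only for decryption
    y = gf256_inv(x)           # the one block shared by both directions
    if not decrypt:
        y = affine(y)          # post-processing used only for encryption
    return y
```

For example, `shared_sbox(0x53)` gives `0xED`, and `shared_sbox(0xED, decrypt=True)` maps it back to `0x53`. In hardware the two `if` branches would correspond to multiplexers selecting between the merged matrix stages on either side of the inverter.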

The results in [54] show that 180 and 182 gates are required for SubBytes and Inv-SubBytes, respectively, so implementing both separately for encryption and decryption would take 362 gates in total. By sharing the multiplicative inverse unit, the combined SubBytes/Inv-SubBytes requires only 234 gates.