
Academic year: 2023

Efficient schemes with diverse of a pair of circulant matrices for AES MixColumns-InvMixcolumns transformation

Jeng-Jung Wang1, Yan-Haw Chen2*, Guan-Hsiung Liaw3, Jack Chang4, Cheng-Chih Lee5

1,2,3,5 Dept. of Information Engineering, I-Shou University, Kaohsiung, Taiwan 84008.

4 Intellectual Property Group, Davis, Wright, & Tremaine, Seattle, Washington, USA

1,2,3,5 yanchen@isu.edu.tw, 4 JackChang@dwt.com

Abstract

AES is a commonly used encryption-decryption algorithm in wireless communication protocols, and both the confidentiality and the speed of the Cipher-InvCipher are important issues in many current communication systems. The key idea of this paper is a method that allows more variation in the circulant matrix in order to enhance security in the AES MixColumns-InvMixColumns step. The paper also proposes a method that minimizes the number of multiplications in the matrix multiplication, based on the two-point cyclic convolution property of circulant matrices. The conventional 4×4 matrix multiplication typically needs 16 multiplications and 12 additions; however, the proposed method, described herein as Scheme 3, reduces the matrix multiplication to 5 multiplications and 15 additions for both encryption and decryption. Using Scheme 3 and Horner's rule-based multiplication on an Intel CPU, the computational cost of the matrix multiplication is reduced by ~63%. Furthermore, experiments using Scheme 3 with Horner's rule-based multiplication and AES key lengths of 128, 192, and 256 bits on an STM32L476VG MCU reduce the encryption and decryption times by ~60% each. Finally, the proposed procedure finds many pairs of circulant matrices for the AES Cipher-InvCipher, and this diversity of matrix pairs can enhance the security of data transmission.

Keywords: AES; Circulant; Lookup Table; Finite Field; Multiplication

*Corresponding author. Email: yanchen@isu.edu.tw, Fax: (886-7)-657-8944.


1. Introduction

New features are being introduced and protecting data transmission is now more important than ever. Thus, efficiently applying the Advanced Encryption Standard (AES) to communication systems and to cloud computing in healthcare systems [18] is important. The MixColumns-InvMixColumns transformation [13] is one of the functions in the Cipher-InvCipher. In AES, the MixColumns transformation is a computationally expensive operation in which the input matrix is multiplied by the MDS matrix. This transformation plays an important role in the wide trail strategy of the cipher. MDS matrices were used earlier in error-correcting codes, where Lacan [14] and MacWilliams [9] performed cyclic convolution of complex values with a hybrid transformation over finite fields. Several new research directions on search methods for finding MDS matrices are suggested in [7][8][16][17]. Moreover, [10] shows a method that can generate a random MDS matrix, and such techniques can be enhanced by dynamic MDS matrices. Diverse circulant matrices are used in modern cryptographic methods such as AES. MDS matrix computation is used for encryption and decryption in ciphers such as Rijndael and Twofish [5]. However, these articles do not describe a method for obtaining the inverse MDS matrices.

Furthermore, attacks on AES-128 [1] that use a known-key distinguishing attack with a computation complexity 2 create an opportunity to enhance the security of data transmission. We propose using different coefficients of the polynomial $A(x)$ and of its inverse polynomial, $A^{-1}(x)$. They are used in the AES MixColumns-InvMixColumns step, with some bits of the AES key serving as an index that selects the coefficients of the polynomial. This makes the coefficients more difficult for attackers to locate and the cipher less prone to attacks in general. This paper also proposes an efficient method to find pairs consisting of the polynomials $A(x)$ and $A^{-1}(x)$ by the Find_inv_matrix() procedure. Scheme 3, as described in this paper, may be designed as a VLSI circuit (see [2][4][6][15][11][20]), which can reduce the number of logic gates. The matrix product can be combined with different methods of multiplication in the finite field (see [3][12]). The method can also secure data transfer to a health monitoring system on ARM-based microcontrollers [18].

The remainder of this paper is organized as follows. Section 2 introduces enhanced security in the AES MixColumns step. Section 3 discusses the finite field multiplication concepts necessary for further developments and proposes methods, called Scheme 1, Scheme 2, and Scheme 3, that reduce the multiplications in the matrix products for AES encryption-decryption. Section 4 proposes an efficient method for finding the row vectors of the inverse of the matrix A for use in the AES MixColumns-InvMixColumns step. Section 5 presents a performance analysis of the AES Cipher-InvCipher on an Intel CPU and an STM32L476VG ARM-based MCU. Section 6 concludes the paper.

2. Enhanced security in AES MixColumns step

This paper does not focus on a fixed polynomial a(x) in the AES MixColumns transformation. We aim to enhance the security of the AES algorithm by diversifying the coefficients of the MixColumns polynomial. If both the plaintext and the ciphertext are given, determining the key requires an exhaustive search. In addition, encrypting and decrypting data requires knowledge of Table A and Table B, as shown in Figure 1. In other words, the key cannot be known from the plaintext and the ciphertext because the ciphertext and plaintext are obtained from the AES standard MixColumns (02, 03, 01, 01) and InvMixColumns (0e, 0b, 0d, 09) transformations.

Furthermore, the coefficients of the polynomial a(x) might be sent to the receiver using elliptic curve cryptography with the ECDH algorithm. Having received the polynomial a(x), the receiver must compute its inverse polynomial for decryption. In that case Table A and Table B are not needed.

Figure 1: Some bits of a key as index of coefficients


3. Fast matrix multiplication in AES Mixcolumns step

A new method for computing the circulant matrix product is described herein, based on the 2-point cyclic convolution matrix. This section consists of three subsections. The first subsection describes different methods of multiplication over a finite field, which can also be applied to matrix operations. The second presents Scheme 1, which uses a two-point cyclic matrix to reduce the multiplications in the matrix product. The third presents Scheme 2, which uses the fact that 2 multiplied by any element of $GF(2^m)$ is zero to save one multiplication. In addition, the coefficients of the polynomial $A(x)$ have the property $a_0+a_1+a_2+a_3 = r_2$, where $a_j \in GF(2^m)$, so a lookup table can replace 4 multiplications. Lastly, Scheme 3 uses the property that the coefficients of the polynomial $A(x)$ sum to $a_0+a_1+a_2+a_3 = 1$, which removes those 4 multiplications.

3.1 Multiplication over finite field

Let $a(x)=\sum_{i=0}^{m-1} a_i x^i$ and $b(x)=\sum_{i=0}^{m-1} b_i x^i$ be polynomials of degree $m-1$ in $GF(2^m)$, where $a_i, b_i \in \{0, 1\}$. It is well known that finite field addition is defined as:

$$c(x) = a(x) + b(x). \tag{1}$$

Note that the symbol "+" denotes the bitwise XOR operation, so it needs no extra function in C. Finite field multiplication is defined as:

$$c(x) = a(x)\,b(x) \bmod f(x), \tag{2}$$

where, for the AES algorithm, the irreducible polynomial is $f(x) = x^8 + x^4 + x^3 + x + 1$. For (2), the Russian Peasant method can be written as a C function as follows:


Russian Peasant method

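    /* Multiply a and b in GF(2^8) by shift-and-add (Russian Peasant). */
    unsigned char GFM(unsigned char a, unsigned char b){
        unsigned char c = 0;
        for (int i = 0; i < 8; i++){
            if (b & 1) c ^= a;          /* add a when the low bit of b is set */
            if (a & 0x80)
                a = (a << 1) ^ 0x11b;   /* a*x overflows: reduce by f(x) */
            else
                a <<= 1;
            b >>= 1;
        }
        return c;                       /* the accumulated product */
    }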

The multiplication in (2) can also be evaluated by using Horner's rule, according to the following recursive formula:

$$c(x) = \Big(\big((a_7Bx \bmod f(x) + a_6B)\,x^2 \bmod f(x) + a_5Bx \bmod f(x) + a_4B\big)\,x^2 \bmod f(x) + a_3Bx \bmod f(x) + a_2B\Big)\,x^2 \bmod f(x) + a_1Bx \bmod f(x) + a_0B,$$

where $B$ is represented as the polynomial $b(x)$.

Thus, an expression $(a_iBx \bmod f(x) + a_jB)$ can be represented as a lookup table $Bt[a_i, a_j] = (a_iBx \bmod f(x) + a_jB)$, where $a_i, a_j \in GF(2)$. Let $Bt[a_i, a_j]$ be $c$; then $c = c\,x^2 \bmod f(x)$ can be represented as $c = c\,x^2 + f[c_{m-1}, c_{m-2}]$, keeping only the low $m$ bits of $c\,x^2$, where $f[c_i, c_j] = c_i\,re(x)\,x + c_j\,re(x)$ and $re(x) = x^m \bmod f(x)$ is a remainder polynomial (e.g., $re(x) = x^4+x^3+x+1$, binary 11011, hex 0x1b). The Horner's rule method is written in C as shown below:

Horner’s rule

    unsigned char f[4]; unsigned char Bt[4];

    unsigned char GFM(unsigned char a, unsigned char b){
        unsigned char c;
        /* f[k]: reduction of the two bits shifted out by c << 2 */
        f[0] = 0; f[1] = 0x1b; f[2] = 0x36; f[3] = 0x2d;
        /* Bt[k] = (k >> 1)*b*x + (k & 1)*b */
        Bt[0] = 0; Bt[1] = b;
        if (b & 0x80)
            Bt[2] = (b << 1) ^ 0x1b;
        else
            Bt[2] = (b << 1);
        Bt[3] = Bt[2] ^ b;
        /* consume a two bits at a time, highest pair first */
        c = Bt[(a >> 6) & 0x3];
        c = (c << 2) ^ f[c >> 6] ^ Bt[(a >> 4) & 0x3];
        c = (c << 2) ^ f[c >> 6] ^ Bt[(a >> 2) & 0x3];
        c = (c << 2) ^ f[c >> 6] ^ Bt[a & 0x3];
        return c;
    }
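The two routines above should return the same product for every input pair. The following is a minimal sketch, not taken from the paper, of how that could be checked exhaustively. The names `gfm_peasant`, `gfm_horner` and `count_mismatches` are hypothetical, and the tables are local constants here rather than the global arrays used above.

```c
#include <stdint.h>

/* Shift-and-add (Russian Peasant) product in GF(2^8), modulus 0x11b. */
uint8_t gfm_peasant(uint8_t a, uint8_t b) {
    uint8_t c = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) c ^= a;
        uint8_t hi = a & 0x80;
        a = (uint8_t)(a << 1);
        if (hi) a ^= 0x1b;              /* reduce by x^8 = x^4 + x^3 + x + 1 */
        b >>= 1;
    }
    return c;
}

/* Horner's-rule product: consumes a two bits at a time, high bits first. */
uint8_t gfm_horner(uint8_t a, uint8_t b) {
    static const uint8_t f[4] = {0x00, 0x1b, 0x36, 0x2d};
    uint8_t bx = (uint8_t)(b << 1);
    if (b & 0x80) bx ^= 0x1b;           /* bx = b * x mod f(x) */
    const uint8_t bt[4] = {0, b, bx, (uint8_t)(bx ^ b)};
    uint8_t c = bt[(a >> 6) & 3];
    for (int shift = 4; shift >= 0; shift -= 2)
        c = (uint8_t)((c << 2) ^ f[c >> 6] ^ bt[(a >> shift) & 3]);
    return c;
}

/* Number of (a, b) pairs on which the two routines disagree. */
int count_mismatches(void) {
    int bad = 0;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (gfm_peasant((uint8_t)a, (uint8_t)b) !=
                gfm_horner((uint8_t)a, (uint8_t)b))
                bad++;
    return bad;
}
```

A caller would expect `count_mismatches()` to return 0. The product {57}·{83} = {c1} from the AES specification is a convenient spot check for either routine.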

As mentioned above, either of the two multiplication methods can be used to build a 2D array GFMT[][] for the lookup table method (i.e., GFMT[i][j] = GFM(i, j), where 0 ≤ i, j ≤ 255). The array GFMT[][] needs 256*256 = 64K bytes of storage. The lookup table method is shown below:

Lookup table method

    /* Table-driven product; GFMT[a][b] must already hold a*b in GF(2^8). */
    unsigned char GFM(unsigned char a, unsigned char b) {
        unsigned char c = 0;
        c = GFMT[a][b];
        return c;
    }
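The table itself has to be filled once before the lookup routine is used. A minimal sketch of that step follows, assuming a shift-and-add multiply as the generator. The names `gf_mul`, `init_gfmt` and `gfm_lookup` are hypothetical; only the array name GFMT comes from the text.

```c
#include <stdint.h>

static uint8_t GFMT[256][256];          /* 256 * 256 = 64K bytes */

/* Shift-and-add product in GF(2^8), modulus x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t c = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) c ^= a;
        uint8_t hi = a & 0x80;
        a = (uint8_t)(a << 1);
        if (hi) a ^= 0x1b;
        b >>= 1;
    }
    return c;
}

/* Fill GFMT so that GFMT[i][j] holds the product of i and j. */
void init_gfmt(void) {
    for (int i = 0; i < 256; i++)
        for (int j = 0; j < 256; j++)
            GFMT[i][j] = gf_mul((uint8_t)i, (uint8_t)j);
}

/* Table-driven multiply; init_gfmt() must have been called once. */
uint8_t gfm_lookup(uint8_t a, uint8_t b) {
    return GFMT[a][b];
}
```

The trade-off is memory for time: each product becomes a single load, at the cost of the 64K-byte table, which may be too large for a small MCU.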

3.2 Reducing multiplications in matrix multiplication

The AES MixColumns transformation, the modular product of $A(x)$ and $B(x)$, is presented as the four-term polynomial $D(x)$, defined as

$$D(x) = A(x)\,B(x) \bmod T(x), \tag{3}$$

where $T(x) = x^4+1$, $A(x) = a_3x^3 + a_2x^2 + a_1x + a_0$ and $B(x) = b_3x^3 + b_2x^2 + b_1x + b_0$, for $a_i, b_i \in GF(2^m)$. By (3), there is a circulant matrix form:

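$$\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \end{bmatrix} = \begin{bmatrix} a_0 & a_3 & a_2 & a_1 \\ a_1 & a_0 & a_3 & a_2 \\ a_2 & a_1 & a_0 & a_3 \\ a_3 & a_2 & a_1 & a_0 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix}. \tag{4}$$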

In (4), the matrix D is the product of the matrices A and B, which requires 16 multiplications and 12 additions (16M, 12A), as listed below:

(16M, 12A)

$$\begin{aligned}
d_0 &= a_0b_0 + a_3b_1 + a_2b_2 + a_1b_3 \\
d_1 &= a_1b_0 + a_0b_1 + a_3b_2 + a_2b_3 \\
d_2 &= a_2b_0 + a_1b_1 + a_0b_2 + a_3b_3 \\
d_3 &= a_3b_0 + a_2b_1 + a_1b_2 + a_0b_3
\end{aligned}$$
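For reference, the (16M, 12A) baseline can be sketched directly from the four rows above. This is an illustrative sketch only; the names `gf_mul` and `circulant_mul_direct` are hypothetical and the field multiply is the shift-and-add method.

```c
#include <stdint.h>

/* Shift-and-add product in GF(2^8), modulus x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t c = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) c ^= a;
        uint8_t hi = a & 0x80;
        a = (uint8_t)(a << 1);
        if (hi) a ^= 0x1b;
        b >>= 1;
    }
    return c;
}

/* d = circ(a) * b: row i uses coefficient a[(i - j) mod 4] for column j. */
void circulant_mul_direct(const uint8_t a[4], const uint8_t b[4],
                          uint8_t d[4]) {
    for (int i = 0; i < 4; i++) {
        uint8_t acc = 0;
        for (int j = 0; j < 4; j++)
            acc ^= gf_mul(a[(i - j + 4) & 3], b[j]);  /* 4 products per row */
        d[i] = acc;
    }
}
```

With the standard AES coefficients (a0, a1, a2, a3) = (02, 01, 01, 03), the first matrix row is (02, 03, 01, 01), and the commonly used test column (db, 13, 53, 45) maps to (8e, 4d, a1, bc).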


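Using the two-point cyclic convolution matrix property, the 2×2 matrix multiplication is given by:

$$\begin{bmatrix} y_0 \\ y_1 \end{bmatrix} = \begin{bmatrix} a_0 & a_1 \\ a_1 & a_0 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = \begin{bmatrix} a_0(b_0+b_1) + (a_0+a_1)\,b_1 \\ a_0(b_0+b_1) + (a_0+a_1)\,b_0 \end{bmatrix}. \tag{5}$$

Hence, the method requires only 3 multiplications and 4 additions, or 3 additions (3M, 3A) when $a_0+a_1$ is precomputed, as shown in Table 1.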

Table 1: The two-point cyclic convolution method with (3M, 3A).

$$\begin{array}{ll}
s_0 = a_0(b_0+b_1) & s_1 = a_0+a_1 \\
y_0 = s_0 + s_1b_1 & y_1 = s_0 + s_1b_0
\end{array}$$
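The entries of Table 1 translate almost line for line into code. The sketch below is illustrative rather than the authors' implementation; `gf_mul`, `cyclic2_mul` and `cyclic2_direct` are hypothetical names, and the direct 4-multiplication version is included only for comparison.

```c
#include <stdint.h>

/* Shift-and-add product in GF(2^8), modulus x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t c = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) c ^= a;
        uint8_t hi = a & 0x80;
        a = (uint8_t)(a << 1);
        if (hi) a ^= 0x1b;
        b >>= 1;
    }
    return c;
}

/* y = [[a0, a1], [a1, a0]] * [b0, b1] with 3 multiplications (Table 1). */
void cyclic2_mul(uint8_t a0, uint8_t a1, uint8_t b0, uint8_t b1,
                 uint8_t y[2]) {
    uint8_t s0 = gf_mul(a0, (uint8_t)(b0 ^ b1));  /* s0 = a0 (b0 + b1) */
    uint8_t s1 = (uint8_t)(a0 ^ a1);              /* constant for fixed a */
    y[0] = s0 ^ gf_mul(s1, b1);
    y[1] = s0 ^ gf_mul(s1, b0);
}

/* The same product computed directly with 4 multiplications. */
void cyclic2_direct(uint8_t a0, uint8_t a1, uint8_t b0, uint8_t b1,
                    uint8_t y[2]) {
    y[0] = gf_mul(a0, b0) ^ gf_mul(a1, b1);
    y[1] = gf_mul(a1, b0) ^ gf_mul(a0, b1);
}
```

The saving relies on characteristic 2: the extra term a0·b1 (or a0·b0) appears twice in each row and cancels under XOR.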

In Table 1, the two entries $a_0$ and $a_1$ are fixed data, so the item $s_1 = a_0+a_1$ can be precomputed in the program. Thus, the 2-point cyclic matrix method uses only 3 multiplications and 3 additions. If the matrix $A = \begin{bmatrix} a_0 & a_3 \\ a_1 & a_0 \end{bmatrix}$ is not a 2-point cyclic matrix, the product of the matrices A and B is given by

$$\begin{bmatrix} y_0 \\ y_1 \end{bmatrix} = \begin{bmatrix} a_0 & a_3 \\ a_1 & a_0 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = \begin{bmatrix} a_0b_0 + a_3b_1 \\ a_1b_0 + a_0b_1 \end{bmatrix}. \tag{6}$$

Theorem 1. Let A be any $n \times n$ cyclic matrix, where $n = n_1 n_2$ and $GCD(n_1, n_2) = 1$; then the matrix A can be partitioned into a cyclic $n_1 \times n_1$ matrix whose entries are $n_2 \times n_2$ submatrices.

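The proof is similar to that of Winograd (1978). Using (4), by Theorem 1, the four-point cyclic matrix can be partitioned as

$$\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \end{bmatrix} = \left[\begin{array}{cc|cc} a_0 & a_3 & a_2 & a_1 \\ a_1 & a_0 & a_3 & a_2 \\ \hline a_2 & a_1 & a_0 & a_3 \\ a_3 & a_2 & a_1 & a_0 \end{array}\right] \begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix}. \tag{7}$$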

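From (7), it can be rewritten as

$$\begin{bmatrix} D_0 \\ D_1 \end{bmatrix} = \begin{bmatrix} A_0 & A_1 \\ A_1 & A_0 \end{bmatrix} \begin{bmatrix} B_0 \\ B_1 \end{bmatrix}, \tag{8}$$

where $D_0 = \begin{bmatrix} d_0 \\ d_1 \end{bmatrix}$, $D_1 = \begin{bmatrix} d_2 \\ d_3 \end{bmatrix}$, $A_0 = \begin{bmatrix} a_0 & a_3 \\ a_1 & a_0 \end{bmatrix}$, $A_1 = \begin{bmatrix} a_2 & a_1 \\ a_3 & a_2 \end{bmatrix}$, $B_0 = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}$, and $B_1 = \begin{bmatrix} b_2 \\ b_3 \end{bmatrix}$. The form (5) can be applied to (8) to reduce the multiplications as follows:

$$\begin{bmatrix} D_0 \\ D_1 \end{bmatrix} = \begin{bmatrix} A_0(B_0+B_1) + (A_0+A_1)B_1 \\ A_0(B_0+B_1) + (A_0+A_1)B_0 \end{bmatrix} = \begin{bmatrix} F+G \\ F+H \end{bmatrix}, \tag{9}$$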

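where

$$F = A_0(B_0+B_1) = \begin{bmatrix} a_0 & a_3 \\ a_1 & a_0 \end{bmatrix} \begin{bmatrix} b_0+b_2 \\ b_1+b_3 \end{bmatrix},$$

$$G = (A_0+A_1)B_1 = \begin{bmatrix} a_0+a_2 & a_3+a_1 \\ a_1+a_3 & a_0+a_2 \end{bmatrix} \begin{bmatrix} b_2 \\ b_3 \end{bmatrix},$$

and

$$H = (A_0+A_1)B_0 = \begin{bmatrix} a_0+a_2 & a_3+a_1 \\ a_1+a_3 & a_0+a_2 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}.$$

The matrix F can be formed by (6), and the matrices G and H can be formed by (5), which yields:

$$\begin{aligned}
F &= \begin{bmatrix} a_0(b_0+b_2) + a_3(b_1+b_3) \\ a_1(b_0+b_2) + a_0(b_1+b_3) \end{bmatrix}, \\
G &= \begin{bmatrix} (a_0+a_2)(b_2+b_3) + (a_0+a_2+a_3+a_1)\,b_3 \\ (a_0+a_2)(b_2+b_3) + (a_0+a_2+a_3+a_1)\,b_2 \end{bmatrix}, \\
H &= \begin{bmatrix} (a_0+a_2)(b_0+b_1) + (a_0+a_2+a_3+a_1)\,b_1 \\ (a_0+a_2)(b_0+b_1) + (a_0+a_2+a_3+a_1)\,b_0 \end{bmatrix}.
\end{aligned} \tag{10}$$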

Obviously, the matrices F, G, and H are combinations of sums of the elements $b_i$. Rewriting the terms with $s_0 = b_0+b_2$, $s_1 = b_1+b_3$, $s_2 = a_0s_0 + a_3s_1$, $s_3 = a_1s_0 + a_0s_1$, $s_4 = b_2+b_3$, and $s_5 = b_0+b_1$ gives:

$$F = \begin{bmatrix} s_2 \\ s_3 \end{bmatrix}, \quad G = \begin{bmatrix} (a_0+a_2)\,s_4 + (a_0+a_2+a_3+a_1)\,b_3 \\ (a_0+a_2)\,s_4 + (a_0+a_2+a_3+a_1)\,b_2 \end{bmatrix}, \quad \text{and} \quad H = \begin{bmatrix} (a_0+a_2)\,s_5 + (a_0+a_2+a_3+a_1)\,b_1 \\ (a_0+a_2)\,s_5 + (a_0+a_2+a_3+a_1)\,b_0 \end{bmatrix}.$$

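Next, the matrices G and H are rewritten with $w_0 = a_0+a_2$ and $w_1 = a_3+a_1$. Thus, the matrices G and H can be given as

$$G = \begin{bmatrix} r_0 + r_2b_3 \\ r_0 + r_2b_2 \end{bmatrix} \quad \text{and} \quad H = \begin{bmatrix} r_1 + r_2b_1 \\ r_1 + r_2b_0 \end{bmatrix},$$

where $r_0 = w_0s_4$, $r_1 = w_0s_5$, and $r_2 = w_0+w_1$. Finally, the four-point cyclic matrix method can be obtained in a new matrix form:

$$\begin{bmatrix} D_0 \\ D_1 \end{bmatrix} = \begin{bmatrix} F+G \\ F+H \end{bmatrix} = \begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \end{bmatrix} = \begin{bmatrix} s_2 + r_0 + r_2b_3 \\ s_3 + r_0 + r_2b_2 \\ s_2 + r_1 + r_2b_1 \\ s_3 + r_1 + r_2b_0 \end{bmatrix}.$$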

In the simplified case, the MixColumns transformation can be performed with 10 multiplications and 17 additions. The three items $w_0 = a_0+a_2$, $w_1 = a_3+a_1$, and $r_2 = w_0+w_1$ are known because the coefficients $a_i$ of the polynomial $A(x)$ are fixed, so they can be precomputed in the program. The method therefore uses only 10 multiplications and 14 additions, denoted (10M, 14A).
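As a quick consistency check, which is not part of the original derivation, the first row of the new form can be expanded and compared with the first row of (4). Every cross term appears twice and therefore vanishes over $GF(2^m)$:

$$\begin{aligned}
d_0 &= s_2 + r_0 + r_2b_3 \\
    &= a_0(b_0+b_2) + a_3(b_1+b_3) + (a_0+a_2)(b_2+b_3) + (a_0+a_1+a_2+a_3)\,b_3 \\
    &= a_0b_0 + a_3b_1 + a_2b_2 + a_1b_3 + 2a_0b_2 + 2a_0b_3 + 2a_2b_3 + 2a_3b_3 \\
    &= a_0b_0 + a_3b_1 + a_2b_2 + a_1b_3 .
\end{aligned}$$

The other three rows can be checked in the same way.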

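Scheme 1. (10M, 14A)

$$\begin{aligned}
& s_0 = b_0+b_2, \quad s_1 = b_1+b_3, \\
& s_2 = a_0s_0 + a_3s_1, \quad s_3 = a_1s_0 + a_0s_1, \quad w_0 = a_0+a_2, \quad w_1 = a_3+a_1, \\
& r_0 = w_0(b_2+b_3), \quad r_1 = w_0(b_0+b_1), \quad r_2 = w_0+w_1, \\
& d_0 = s_2 + r_0 + r_2b_3, \\
& d_1 = s_3 + r_0 + r_2b_2, \\
& d_2 = s_2 + r_1 + r_2b_1, \\
& d_3 = s_3 + r_1 + r_2b_0 .
\end{aligned}$$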
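Scheme 1 can be sketched in C by following the listing above term by term. This is a hedged illustration, not the authors' code: `gf_mul` and `scheme1_mul` are hypothetical names, and w0, w1 and r2 are recomputed on every call here although they could be precomputed for fixed coefficients.

```c
#include <stdint.h>

/* Shift-and-add product in GF(2^8), modulus x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t c = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) c ^= a;
        uint8_t hi = a & 0x80;
        a = (uint8_t)(a << 1);
        if (hi) a ^= 0x1b;
        b >>= 1;
    }
    return c;
}

/* Scheme 1: d = circ(a) * b using 10 field multiplications. */
void scheme1_mul(const uint8_t a[4], const uint8_t b[4], uint8_t d[4]) {
    /* These depend only on a and could be precomputed. */
    uint8_t w0 = (uint8_t)(a[0] ^ a[2]);
    uint8_t w1 = (uint8_t)(a[3] ^ a[1]);
    uint8_t r2 = (uint8_t)(w0 ^ w1);

    uint8_t s0 = (uint8_t)(b[0] ^ b[2]);
    uint8_t s1 = (uint8_t)(b[1] ^ b[3]);
    uint8_t s2 = gf_mul(a[0], s0) ^ gf_mul(a[3], s1);   /* 2 M */
    uint8_t s3 = gf_mul(a[1], s0) ^ gf_mul(a[0], s1);   /* 2 M */
    uint8_t r0 = gf_mul(w0, (uint8_t)(b[2] ^ b[3]));    /* 1 M */
    uint8_t r1 = gf_mul(w0, (uint8_t)(b[0] ^ b[1]));    /* 1 M */

    d[0] = s2 ^ r0 ^ gf_mul(r2, b[3]);                  /* 4 M in total */
    d[1] = s3 ^ r0 ^ gf_mul(r2, b[2]);
    d[2] = s2 ^ r1 ^ gf_mul(r2, b[1]);
    d[3] = s3 ^ r1 ^ gf_mul(r2, b[0]);
}
```

The result should match the direct 16-multiplication product for any coefficient vector, since no property of the coefficients is assumed at this stage.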

3.3 Reducing multiplications by multiplying by 2

The matrix product $A_0B_0 = \begin{bmatrix} a_0 & a_3 \\ a_1 & a_0 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}$ can be further simplified by the properties of addition over $GF(2^m)$. The two zero terms $2a_0b_1 = 0$ and $2a_0b_0 = 0$ are added into the matrix product $A_0B_0$ as follows:

$$A_0B_0 = \begin{bmatrix} a_0b_0 + a_3b_1 + 2a_0b_1 \\ a_1b_0 + a_0b_1 + 2a_0b_0 \end{bmatrix} = \begin{bmatrix} a_0(b_0+b_1) + (a_0+a_3)\,b_1 \\ a_0(b_0+b_1) + (a_0+a_1)\,b_0 \end{bmatrix}. \tag{11}$$


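In Scheme 1, the matrix F was formed by (6). Now, the matrix F is formed by (11) to obtain the following matrix:

$$F = \begin{bmatrix} a_0 & a_3 \\ a_1 & a_0 \end{bmatrix} \begin{bmatrix} b_0+b_2 \\ b_1+b_3 \end{bmatrix} = \begin{bmatrix} a_0(b_0+b_2+b_1+b_3) + (a_0+a_3)(b_1+b_3) \\ a_0(b_0+b_2+b_1+b_3) + (a_0+a_1)(b_0+b_2) \end{bmatrix} = \begin{bmatrix} a_0(s_0+s_1) + t_0s_1 \\ a_0(s_0+s_1) + t_1s_0 \end{bmatrix} = \begin{bmatrix} t_2 + t_0s_1 \\ t_2 + t_1s_0 \end{bmatrix},$$

where $s_0 = b_0+b_2$, $s_1 = b_1+b_3$, $t_0 = a_0+a_3$, $t_1 = a_0+a_1$, and $t_2 = a_0(s_0+s_1)$.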

In Scheme 1, the two items $s_2 = a_0s_0 + a_3s_1$ and $s_3 = a_1s_0 + a_0s_1$ can be replaced by $s_2 = t_2 + t_0s_1$ and $s_3 = t_2 + t_1s_0$, with $t_0 = a_0+a_3$, $t_1 = a_0+a_1$, and $t_2 = a_0(s_0+s_1)$, for computing the MixColumns transformation. Furthermore, each term $r_2b_i$ in Scheme 1 can be replaced by a lookup table $tc[b_i] = r_2b_i$; that is, multiplication by the constant $r_2$ no longer requires computing a multiplication. The table needs 256 bytes of memory. The resulting method is called Scheme 2.

Scheme 1 can thus be rewritten as follows:

Scheme 2. (5M, 15A) (It needs 256 bytes as a lookup table)

$$\begin{aligned}
& s_0 = b_0+b_2, \quad s_1 = b_1+b_3, \\
& t_0 = a_0+a_3, \quad t_1 = a_0+a_1, \quad t_2 = a_0(s_0+s_1), \\
& s_2 = t_2 + t_0s_1, \quad s_3 = t_2 + t_1s_0, \quad w_0 = a_0+a_2, \\
& r_0 = w_0(b_2+b_3), \quad r_1 = w_0(b_0+b_1), \\
& d_0 = s_2 + r_0 + tc[b_3], \\
& d_1 = s_3 + r_0 + tc[b_2], \\
& d_2 = s_2 + r_1 + tc[b_1], \\
& d_3 = s_3 + r_1 + tc[b_0] .
\end{aligned}$$
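A possible C rendering of Scheme 2 separates the per-key setup (the constants and the 256-byte table tc) from the per-column work. The structure `scheme2_ctx` and the functions `gf_mul`, `scheme2_init` and `scheme2_mul` are hypothetical names chosen for this sketch and do not appear in the paper.

```c
#include <stdint.h>

/* Shift-and-add product in GF(2^8), modulus x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t c = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) c ^= a;
        uint8_t hi = a & 0x80;
        a = (uint8_t)(a << 1);
        if (hi) a ^= 0x1b;
        b >>= 1;
    }
    return c;
}

typedef struct {
    uint8_t a0, t0, t1, w0;   /* a0, a0+a3, a0+a1, a0+a2 */
    uint8_t tc[256];          /* tc[x] = r2 * x with r2 = a0+a1+a2+a3 */
} scheme2_ctx;

/* Precompute everything that depends only on the coefficients a. */
void scheme2_init(scheme2_ctx *ctx, const uint8_t a[4]) {
    ctx->a0 = a[0];
    ctx->t0 = (uint8_t)(a[0] ^ a[3]);
    ctx->t1 = (uint8_t)(a[0] ^ a[1]);
    ctx->w0 = (uint8_t)(a[0] ^ a[2]);
    uint8_t r2 = (uint8_t)(a[0] ^ a[1] ^ a[2] ^ a[3]);
    for (int x = 0; x < 256; x++)
        ctx->tc[x] = gf_mul(r2, (uint8_t)x);
}

/* Scheme 2: 5 field multiplications plus 4 table lookups per column. */
void scheme2_mul(const scheme2_ctx *ctx, const uint8_t b[4], uint8_t d[4]) {
    uint8_t s0 = (uint8_t)(b[0] ^ b[2]);
    uint8_t s1 = (uint8_t)(b[1] ^ b[3]);
    uint8_t t2 = gf_mul(ctx->a0, (uint8_t)(s0 ^ s1));
    uint8_t s2 = t2 ^ gf_mul(ctx->t0, s1);
    uint8_t s3 = t2 ^ gf_mul(ctx->t1, s0);
    uint8_t r0 = gf_mul(ctx->w0, (uint8_t)(b[2] ^ b[3]));
    uint8_t r1 = gf_mul(ctx->w0, (uint8_t)(b[0] ^ b[1]));
    d[0] = s2 ^ r0 ^ ctx->tc[b[3]];
    d[1] = s3 ^ r0 ^ ctx->tc[b[2]];
    d[2] = s2 ^ r1 ^ ctx->tc[b[1]];
    d[3] = s3 ^ r1 ^ ctx->tc[b[0]];
}
```

Keeping the table inside a context structure is only one design option; it lets several coefficient sets coexist, which fits the idea of selecting coefficients by key bits.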

Scheme 2 uses only 5 multiplications and 18 additions (15 additions once $w_0$, $t_0$, and $t_1$ are precomputed), with 256 bytes of memory, for the matrix multiplication. Obviously, if the coefficients of the polynomial $A(x)$ satisfy $a_0+a_3+a_2+a_1 = 1$, as in the AES standard, then $r_2 = w_0+w_1 = 1$ in Scheme 2. Consequently, with $r_2 = 1$ the lookup table of Scheme 2 is no longer required (i.e., $tc[b_i] = r_2b_i = 1 \cdot b_i = b_i$) and no table memory is needed in an embedded system; the method rewritten in this way is Scheme 3. In Scheme 3, the three items $w_0 = a_0+a_2$, $t_0 = a_0+a_3$, and $t_1 = a_0+a_1$ can be precomputed in the program, so the method uses only 5 multiplications and 15 additions, namely (5M, 15A).
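Following that description, Scheme 3 can be sketched as Scheme 2 with each `tc[b_i]` replaced by `b_i`. The code below is an illustration under that reading, with hypothetical names `gf_mul` and `scheme3_mul`. The return code used to reject coefficient sets whose sum is not 1 is an assumption of this sketch, not something the paper specifies.

```c
#include <stdint.h>

/* Shift-and-add product in GF(2^8), modulus x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t c = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) c ^= a;
        uint8_t hi = a & 0x80;
        a = (uint8_t)(a << 1);
        if (hi) a ^= 0x1b;
        b >>= 1;
    }
    return c;
}

/* Scheme 3: valid only when a0 + a1 + a2 + a3 = 1, so that r2 * b_i = b_i.
 * Returns 0 on success and -1 if the coefficients do not qualify. */
int scheme3_mul(const uint8_t a[4], const uint8_t b[4], uint8_t d[4]) {
    if ((a[0] ^ a[1] ^ a[2] ^ a[3]) != 0x01)
        return -1;

    /* Precomputable for fixed coefficients. */
    uint8_t t0 = (uint8_t)(a[0] ^ a[3]);
    uint8_t t1 = (uint8_t)(a[0] ^ a[1]);
    uint8_t w0 = (uint8_t)(a[0] ^ a[2]);

    uint8_t s0 = (uint8_t)(b[0] ^ b[2]);
    uint8_t s1 = (uint8_t)(b[1] ^ b[3]);
    uint8_t t2 = gf_mul(a[0], (uint8_t)(s0 ^ s1));       /* M 1 */
    uint8_t s2 = t2 ^ gf_mul(t0, s1);                    /* M 2 */
    uint8_t s3 = t2 ^ gf_mul(t1, s0);                    /* M 3 */
    uint8_t r0 = gf_mul(w0, (uint8_t)(b[2] ^ b[3]));     /* M 4 */
    uint8_t r1 = gf_mul(w0, (uint8_t)(b[0] ^ b[1]));     /* M 5 */

    d[0] = s2 ^ r0 ^ b[3];
    d[1] = s3 ^ r0 ^ b[2];
    d[2] = s2 ^ r1 ^ b[1];
    d[3] = s3 ^ r1 ^ b[0];
    return 0;
}
```

With the standard coefficients (a0, a1, a2, a3) = (02, 01, 01, 03), the column (db, 13, 53, 45) should map to (8e, 4d, a1, bc), and the InvMixColumns coefficients (0e, 09, 0d, 0b), which also sum to 1, should map it back.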
