**Efficient schemes with diverse pairs of circulant matrices for the AES MixColumns-InvMixColumns transformation**

Jeng-Jung Wang^{1}, Yan-Haw Chen^{2*}, Guan-Hsiung Liaw^{3}, Jack Chang^{4}, Cheng-Chih Lee^{5}

^{1,2,3,5}*Dept. of Information Engineering, I-Shou University, Kaohsiung, Taiwan 84008.*

^{4}*Intellectual Property Group, Davis, Wright, & Tremaine, Seattle, Washington, USA*

^{1,2,3,5}yanchen@isu.edu.tw, ^{4}JackChang@dwt.com

**Abstract**

AES is a commonly used encryption-decryption algorithm in wireless communication protocols. Both the confidentiality and the speed of Cipher-InvCipher are important issues in many current communication systems. The key idea of this paper is a method that allows more variation in the circulant matrix in order to enhance the security of the AES MixColumns-InvMixColumns step. The paper also proposes a method that minimizes the number of multiplications in the matrix multiplication, based on the two-point cyclic convolution property of circulant matrices. The conventional 4×4 matrix multiplication needs 16 multiplications and 12 additions; the proposed method, described herein as Scheme 3, reduces this to 5 multiplications and 15 additions, and it is used for both encryption and decryption. Using Scheme 3 with Horner’s rule-based multiplication on an Intel CPU, the computational cost of the matrix multiplication is reduced by ~63%. Furthermore, experiments using Scheme 3 with Horner’s rule-based multiplication and AES key lengths of 128, 192, and 256 bits on an STM32L476VG MCU reduced both encryption and decryption time by ~60%. Finally, the proposed procedure finds many pairs of circulant matrices for AES Cipher-InvCipher, so that the diversity of these pairs can enhance the security of data transmission.

**Keywords**: AES; Circulant; Lookup Table; Finite Field; Multiplication

*Corresponding author. Email: yanchen@isu.edu.tw, Fax: (886-7)-657-8944.

**1. Introduction**

New features are being introduced and protecting data transmission is now more important than ever. Thus, applying the Advanced Encryption Standard (AES) efficiently to communication systems, and to cloud computing in healthcare systems [18], is important. The MixColumns-InvMixColumns transformation [13] is one of the functions in Cipher-InvCipher. In AES, the MixColumns transformation is a computationally expensive operation in which the input matrix is multiplied by an MDS matrix. This transformation plays an important role in the wide trail strategy of the cipher. MDS matrices were used earlier in error-correcting codes, where Lacan [14] and MacWilliams [9] performed cyclic convolution of complex values with a hybrid transformation over finite fields. Several new research directions based on search methods for finding MDS matrices are suggested in [7][8][16][17]. Moreover, [10] shows that a random MDS matrix can be generated and that those techniques can be enhanced by dynamic MDS matrices. Diverse circulant matrices are used in modern cryptographic methods such as AES. MDS matrix computation is used in the encryption and decryption of ciphers such as Rijndael and Twofish [5]. However, these articles do not describe a method for obtaining the inverse MDS matrices.

Furthermore, due to attacks [1] on AES-128 using known-key distinguishing attack with
a computation complexity 2 method, this leads to opportunities to enhance security of data
transmission. We propose using different coefficients for the polynomial $A(x)$ and its inverse polynomial $A^{-1}(x)$. They are used in AES MixColumns-InvMixColumns, with some bits of the AES key serving as an index that selects the coefficients of the polynomial. The coefficients in use would then be more difficult for attackers to determine, which makes the cipher less prone to attacks in general. This paper also proposes an efficient method to find pairs consisting of the polynomial $A(x)$ and $A^{-1}(x)$ by the Find_inv_matrix() procedure. Scheme 3, as described in this paper, may be designed as a VLSI circuit, see [2][4][6][15][11][20], in order to decrease the number of logic gates. The matrix product can be combined with different methods of multiplication in a finite field, see [3][12]. The method can also secure data transfer to a health monitoring system on ARM-based microcontrollers [18].

The remainder of this paper is organized as follows. Section 2 introduces enhanced security in the AES MixColumns step. Section 3 discusses the finite field multiplication concepts necessary for further developments, and proposes methods to reduce the multiplications in the matrix products for AES encryption-decryption; these methods are called Scheme 1, Scheme 2, and Scheme 3, respectively. Section 4 proposes an efficient procedure for finding the row vectors of the inverse of matrix $A$ for use in the AES MixColumns-InvMixColumns step. Section 5 presents a performance analysis of AES Cipher-InvCipher on an Intel CPU and an STM32L476VG ARM-based MCU. Section 6 concludes the paper.

**2. Enhanced security in AES MixColumns step**

This paper does not focus on the fixed polynomial $a(x)$ of the AES MixColumns transformation. We aim to enhance the security of the AES algorithm by diversifying the coefficients of the MixColumns polynomial. Even if both the plaintext and the ciphertext are given, determining the key requires an exhaustive search. Moreover, encrypting and decrypting data requires knowledge of Table A and Table B, as shown in Figure 1. In other words, an attacker who assumes that the ciphertext and plaintext were obtained with the standard AES MixColumns (02, 03, 01, 01) and InvMixColumns (0e, 0b, 0d, 09) transformations cannot recover the key from them.

Furthermore, the coefficients of the polynomial $a(x)$ might be sent to the receiver under the protection of elliptic curve cryptography, using the ECDH algorithm. Having obtained the polynomial $a(x)$, the receiver must compute its inverse polynomial for decryption, so that Table A and Table B are not needed.

Figure 1: Some bits of a key as index of coefficients

**3. Fast matrix multiplication in AES MixColumns step**

A new method for computing the circulant matrix product, based on the 2-point cyclic convolution matrix, is described herein. This section consists of three subsections. The first subsection describes different methods of multiplication over a finite field that can be applied to the matrix operation. The second presents Scheme 1, which uses a two-point cyclic matrix to reduce the multiplications in the matrix product. The third presents Scheme 2, which uses the fact that 2 multiplied by any element of $GF(2^m)$ is zero to save one multiplication. In addition, when the coefficients of the polynomial $A(x)$ satisfy $a_0 + a_1 + a_2 + a_3 = r_2$, where each $a_j$ is in $GF(2^m)$, a lookup table can replace 4 multiplications. Lastly, Scheme 3 uses the property that the coefficients of $A(x)$ sum to one, $a_0 + a_1 + a_2 + a_3 = 1$, which removes those 4 multiplications without any table.

**3.1 Multiplication over finite field**

Let $a(x) = \sum_{i=0}^{m-1} a_i x^i$ and $b(x) = \sum_{i=0}^{m-1} b_i x^i$ be polynomials of degree $m-1$ in $GF(2^m)$, where $a_i, b_i \in \{0, 1\}$. It is well known that finite field addition is defined as:

$$c(x) = a(x) + b(x). \tag{1}$$

Note that the symbol “+” is the bitwise XOR operation, so it does not need an extra function in C programming. Finite field multiplication is defined as:

$$c(x) = a(x)\,b(x) \bmod f(x), \tag{2}$$

where, for the AES algorithm, the irreducible polynomial is $f(x) = x^8 + x^4 + x^3 + x + 1$. The multiplication in (2) can be computed by the Russian Peasant method, written as a C function as follows:

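**Russian Peasant method**

    /* Multiply a and b in GF(2^8) modulo f(x) = x^8 + x^4 + x^3 + x + 1. */
    unsigned char GFM(unsigned char a, unsigned char b) {
        unsigned char c = 0;
        for (int i = 0; i < 8; i++) {
            if (b & 1) c ^= a;                          /* add a if the current bit of b is set */
            if (a & 0x80)
                a = (unsigned char)((a << 1) ^ 0x11b);  /* a = a*x mod f(x) */
            else
                a <<= 1;
            b >>= 1;
        }
        return c;
    }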
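As a worked instance of (2), the byte product $\{57\}\cdot\{83\}$ can be expanded by hand. The operands are chosen here only for illustration (they are the example used in the AES specification, FIPS-197) and do not come from this paper:

```latex
\begin{aligned}
\{57\} &= x^6 + x^4 + x^2 + x + 1, \qquad \{83\} = x^7 + x + 1, \\
\{57\}\cdot\{83\} &= x^{13} + x^{11} + x^9 + x^8 + x^6 + x^5 + x^4 + x^3 + 1 \\
&\equiv x^7 + x^6 + 1 \pmod{x^8 + x^4 + x^3 + x + 1} \\
&= \{c1\}.
\end{aligned}
```

A correct $GF(2^8)$ multiplier should therefore return 0xc1 for the inputs 0x57 and 0x83.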

The multiplication in (2) can also be evaluated by Horner’s rule, two bits of $a(x)$ at a time, according to the following recursive formula:

$$c(x) = \Big(\big((a_7Bx \bmod f(x) + a_6B)\,x^2 \bmod f(x) + a_5Bx \bmod f(x) + a_4B\big)\,x^2 \bmod f(x) + a_3Bx \bmod f(x) + a_2B\Big)\,x^2 \bmod f(x) + a_1Bx \bmod f(x) + a_0B,$$

where $B$ represents the polynomial $b(x)$. Thus, an expression $(a_iBx \bmod f(x) + a_jB)$ can be represented as a lookup table $\mathrm{Bt}[a_i, a_j] = (a_iBx \bmod f(x) + a_jB)$, where $a_i, a_j \in GF(2)$. Let $\mathrm{Bt}[a_i, a_j]$ be $c$; then $c = cx^2 \bmod f(x)$ can be represented as $c = cx^2 + \mathrm{f}[c_{m-1}, c_{m-2}]$, with $cx^2$ truncated to $m$ bits, where $\mathrm{f}[c_i, c_j] = c_i\,re(x)\,x + c_j\,re(x)$ and $re(x) = x^m \bmod f(x)$ is a remainder polynomial (*e.g.*, $re(x) = x^4 + x^3 + x + 1$, binary 11011, hex 0x1b). Horner’s rule method is rewritten in C programming as shown below:

**Horner’s rule**

    unsigned char f[4];   /* reduction of the two bits shifted out by c << 2 */
    unsigned char Bt[4];  /* Bt[2*ai + aj] = ai*b*x mod f(x) + aj*b */

    unsigned char GFM(unsigned char a, unsigned char b) {
        unsigned char c;
        f[0] = 0; f[1] = 0x1b; f[2] = 0x36; f[3] = 0x2d;
        Bt[0] = 0; Bt[1] = b;
        if (b & 0x80)
            Bt[2] = (b << 1) ^ 0x1b;                      /* b*x mod f(x) */
        else
            Bt[2] = (b << 1);
        Bt[3] = Bt[2] ^ b;
        c = Bt[(a >> 6) & 0x3];                           /* bits a7, a6 */
        c = (c << 2) ^ f[c >> 6] ^ Bt[(a >> 4) & 0x3];    /* bits a5, a4 */
        c = (c << 2) ^ f[c >> 6] ^ Bt[(a >> 2) & 0x3];    /* bits a3, a2 */
        c = (c << 2) ^ f[c >> 6] ^ Bt[a & 0x3];           /* bits a1, a0 */
        return c;
    }
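One way to gain confidence in the two-bit Horner routine is to compare it exhaustively with a bit-serial reference. The following is a minimal sketch under stated assumptions: the names `gf_mul_ref`, `gf_mul_horner2`, and `horner_mismatches` are hypothetical, and the Horner variant keeps its tables local instead of using the global arrays of the listing above.

```c
#include <stdint.h>

/* Reference: bit-serial shift-and-add multiplication in GF(2^8). */
uint8_t gf_mul_ref(uint8_t a, uint8_t b) {
    uint8_t c = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) c ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
        b >>= 1;
    }
    return c;
}

/* Horner's rule, two bits of a per step, with local tables (hypothetical variant). */
uint8_t gf_mul_horner2(uint8_t a, uint8_t b) {
    /* red[2*c7 + c6]: reduction of the two bits shifted out by c << 2 */
    static const uint8_t red[4] = {0x00, 0x1b, 0x36, 0x2d};
    uint8_t bt[4];
    bt[0] = 0;
    bt[1] = b;
    bt[2] = (uint8_t)((b << 1) ^ ((b & 0x80) ? 0x1b : 0x00)); /* b*x mod f(x) */
    bt[3] = (uint8_t)(bt[2] ^ b);                             /* b*x + b */
    uint8_t c = bt[(a >> 6) & 3];
    for (int shift = 4; shift >= 0; shift -= 2)
        c = (uint8_t)((c << 2) ^ red[c >> 6] ^ bt[(a >> shift) & 3]);
    return c;
}

/* Count the operand pairs on which the two routines disagree. */
int horner_mismatches(void) {
    int bad = 0;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (gf_mul_horner2((uint8_t)a, (uint8_t)b) !=
                gf_mul_ref((uint8_t)a, (uint8_t)b))
                bad++;
    return bad;
}
```

Calling `horner_mismatches()` once covers all 65536 operand pairs, which is cheap enough to run as a start-up self-test on a desktop build.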

As mentioned above, either of the two multiplication methods can be used to build a 2D array GFMT[][] for the lookup table method (*i.e.*, GFMT[*i*][*j*] = GFM(*i*, *j*), where 0 ≤ *i*, *j* ≤ 255). The array GFMT[][] needs 256 × 256 = 64K bytes of storage. The lookup table method is shown below:

**Lookup table method**

    /* GFMT is the precomputed 256 x 256 product table described above. */
    unsigned char GFM(unsigned char a, unsigned char b) {
        return GFMT[a][b];
    }
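The table has to be filled before the lookup routine is used. The following is a minimal sketch of that step, assuming the shift-and-add multiplier as the generator; the names `gf_mul_slow`, `gfmt_init`, and `gf_mul_table` are hypothetical and not taken from the paper.

```c
#include <stdint.h>

uint8_t GFMT[256][256];   /* 64 KiB product table, GFMT[i][j] = i*j in GF(2^8) */

/* Bit-serial multiplier used only to fill the table. */
uint8_t gf_mul_slow(uint8_t a, uint8_t b) {
    uint8_t c = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) c ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
        b >>= 1;
    }
    return c;
}

/* Fill GFMT once, for example at program start-up. */
void gfmt_init(void) {
    for (int i = 0; i < 256; i++)
        for (int j = 0; j < 256; j++)
            GFMT[i][j] = gf_mul_slow((uint8_t)i, (uint8_t)j);
}

/* Table-based multiplication: one memory access per product. */
uint8_t gf_mul_table(uint8_t a, uint8_t b) {
    return GFMT[a][b];
}
```

On a small microcontroller the 64K-byte table may not fit in RAM, so a const table placed in flash, or one of the computed multipliers, may be the more practical choice there.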

**3.2 Reducing multiplications in matrix multiplication**

The AES MixColumns transformation, the modular product of $A(x)$ and $B(x)$, is presented as the four-term polynomial $D(x)$, defined as

$$D(x) = A(x)\,B(x) \bmod T(x), \tag{3}$$

where $T(x) = x^4 + 1$, $A(x) = a_3x^3 + a_2x^2 + a_1x + a_0$, and $B(x) = b_3x^3 + b_2x^2 + b_1x + b_0$, for $a_i, b_i \in GF(2^m)$. By (3), there is a circulant matrix form:

$$\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \end{bmatrix} = \begin{bmatrix} a_0 & a_3 & a_2 & a_1 \\ a_1 & a_0 & a_3 & a_2 \\ a_2 & a_1 & a_0 & a_3 \\ a_3 & a_2 & a_1 & a_0 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix}. \tag{4}$$

In (4), the matrix $D$ is the product of the matrices $A$ and $B$, which requires 16 multiplications and 12 additions (16M, 12A), listed below:

**(16M, 12A)**

$$\begin{aligned}
d_0 &= a_0b_0 + a_3b_1 + a_2b_2 + a_1b_3 \\
d_1 &= a_1b_0 + a_0b_1 + a_3b_2 + a_2b_3 \\
d_2 &= a_2b_0 + a_1b_1 + a_0b_2 + a_3b_3 \\
d_3 &= a_3b_0 + a_2b_1 + a_1b_2 + a_0b_3
\end{aligned}$$
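The (16M, 12A) product above can be sketched directly in C as a baseline for the faster schemes. This is an illustration under stated assumptions: the names `gf_mul` and `mixcol_direct` are hypothetical, and the coefficients are passed as the array `a[0..3]` = $(a_0, a_1, a_2, a_3)$.

```c
#include <stdint.h>

/* Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (shift-and-add). */
uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t c = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) c ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
        b >>= 1;
    }
    return c;
}

/* d = circ(a) * b as in (4): d[i] = sum over j of a[(i - j) mod 4] * b[j].
 * Uses 16 multiplications; b and d must not alias. */
void mixcol_direct(const uint8_t a[4], const uint8_t b[4], uint8_t d[4]) {
    for (int i = 0; i < 4; i++) {
        uint8_t acc = 0;
        for (int j = 0; j < 4; j++)
            acc ^= gf_mul(a[(i - j + 4) & 3], b[j]);
        d[i] = acc;
    }
}
```

With the standard AES coefficients $(a_0, a_1, a_2, a_3)$ = (02, 01, 01, 03), the column (db, 13, 53, 45) maps to (8e, 4d, a1, bc), a commonly quoted MixColumns test vector.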

Using the two-point cyclic convolution matrix property, a 2×2 matrix multiplication is given by:

$$\begin{bmatrix} y_0 \\ y_1 \end{bmatrix} = \begin{bmatrix} a_0 & a_1 \\ a_1 & a_0 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = \begin{bmatrix} a_0(b_0 + b_1) + (a_0 + a_1)b_1 \\ a_0(b_0 + b_1) + (a_0 + a_1)b_0 \end{bmatrix}. \tag{5}$$

Hence, the method requires only 3 multiplications and 4 additions, or 3 additions (3M, 3A) once $a_0 + a_1$ is precomputed, as shown in Table 1.

Table 1: The two-point cyclic convolution method with (3M, 3A).

$$\begin{array}{ll}
s_0 = a_0(b_0 + b_1) & s_1 = a_0 + a_1 \\
y_0 = s_0 + s_1b_1 & y_1 = s_0 + s_1b_0
\end{array}$$
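The steps of Table 1 can be sketched in C and checked against the direct four-multiplication product. This is a minimal sketch, not the authors' implementation: the names `cyc2_direct`, `cyc2_fast`, and `cyc2_mismatches` are hypothetical, and $s_1$ is passed in as a precomputed argument.

```c
#include <stdint.h>

/* Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (shift-and-add). */
uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t c = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) c ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
        b >>= 1;
    }
    return c;
}

/* Direct product [a0 a1; a1 a0] * [b0; b1]: 4 multiplications. */
void cyc2_direct(uint8_t a0, uint8_t a1, uint8_t b0, uint8_t b1, uint8_t y[2]) {
    y[0] = (uint8_t)(gf_mul(a0, b0) ^ gf_mul(a1, b1));
    y[1] = (uint8_t)(gf_mul(a1, b0) ^ gf_mul(a0, b1));
}

/* Table 1: 3 multiplications; s1 = a0 ^ a1 is supplied precomputed. */
void cyc2_fast(uint8_t a0, uint8_t s1, uint8_t b0, uint8_t b1, uint8_t y[2]) {
    uint8_t s0 = gf_mul(a0, (uint8_t)(b0 ^ b1));
    y[0] = (uint8_t)(s0 ^ gf_mul(s1, b1));
    y[1] = (uint8_t)(s0 ^ gf_mul(s1, b0));
}

/* Count the inputs (b0, b1) on which the two versions disagree. */
int cyc2_mismatches(uint8_t a0, uint8_t a1) {
    int bad = 0;
    uint8_t s1 = (uint8_t)(a0 ^ a1);
    for (int b0 = 0; b0 < 256; b0++)
        for (int b1 = 0; b1 < 256; b1++) {
            uint8_t y[2], z[2];
            cyc2_direct(a0, a1, (uint8_t)b0, (uint8_t)b1, y);
            cyc2_fast(a0, s1, (uint8_t)b0, (uint8_t)b1, z);
            if (y[0] != z[0] || y[1] != z[1]) bad++;
        }
    return bad;
}
```

The saving relies on addition being XOR: the extra $a_0b_1$ (or $a_0b_0$) term introduced by factoring appears twice and cancels.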

In Table 1, the two entries $a_0$ and $a_1$ are fixed data, so the item $s_1 = a_0 + a_1$ can be precomputed in the program. Thus, the 2-point cyclic matrix method uses only 3 multiplications and 3 additions. If the matrix $A = \begin{bmatrix} a_0 & a_3 \\ a_1 & a_0 \end{bmatrix}$ is not a 2-point cyclic matrix, the product of the matrices $A$ and $B$ is given by

$$\begin{bmatrix} y_0 \\ y_1 \end{bmatrix} = \begin{bmatrix} a_0 & a_3 \\ a_1 & a_0 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = \begin{bmatrix} a_0b_0 + a_3b_1 \\ a_1b_0 + a_0b_1 \end{bmatrix}. \tag{6}$$

**Theorem 1** Let $A$ be any $n \times n$ cyclic matrix, where $n = n_1n_2$ and $\mathrm{GCD}(n_1, n_2) = 1$; then the matrix $A$ can be partitioned into a cyclic $n_1 \times n_1$ matrix whose entries are $n_2 \times n_2$ submatrices.

The proof is similar to that of Winograd (1978). Using (4), by Theorem 1, the four-point cyclic matrix can be partitioned as

$$\left[\begin{array}{c} d_0 \\ d_1 \\ \hline d_2 \\ d_3 \end{array}\right] = \left[\begin{array}{cc|cc} a_0 & a_3 & a_2 & a_1 \\ a_1 & a_0 & a_3 & a_2 \\ \hline a_2 & a_1 & a_0 & a_3 \\ a_3 & a_2 & a_1 & a_0 \end{array}\right] \left[\begin{array}{c} b_0 \\ b_1 \\ \hline b_2 \\ b_3 \end{array}\right]. \tag{7}$$

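From (7), it can be rewritten as

$$\begin{bmatrix} D_0 \\ D_1 \end{bmatrix} = \begin{bmatrix} A_0 & A_1 \\ A_1 & A_0 \end{bmatrix} \begin{bmatrix} B_0 \\ B_1 \end{bmatrix}, \tag{8}$$

where $D_0 = \begin{bmatrix} d_0 \\ d_1 \end{bmatrix}$, $D_1 = \begin{bmatrix} d_2 \\ d_3 \end{bmatrix}$, $A_0 = \begin{bmatrix} a_0 & a_3 \\ a_1 & a_0 \end{bmatrix}$, $A_1 = \begin{bmatrix} a_2 & a_1 \\ a_3 & a_2 \end{bmatrix}$, $B_0 = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}$, and $B_1 = \begin{bmatrix} b_2 \\ b_3 \end{bmatrix}$. Applying the form of (5) to (8) reduces the multiplications as follows:

$$\begin{bmatrix} D_0 \\ D_1 \end{bmatrix} = \begin{bmatrix} A_0(B_0 + B_1) + (A_0 + A_1)B_1 \\ A_0(B_0 + B_1) + (A_0 + A_1)B_0 \end{bmatrix} = \begin{bmatrix} F + G \\ F + H \end{bmatrix}, \tag{9}$$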

where

$$\begin{aligned}
F &= A_0(B_0 + B_1) = \begin{bmatrix} a_0 & a_3 \\ a_1 & a_0 \end{bmatrix} \begin{bmatrix} b_0 + b_2 \\ b_1 + b_3 \end{bmatrix}, \\
G &= (A_0 + A_1)B_1 = \begin{bmatrix} a_0 + a_2 & a_3 + a_1 \\ a_1 + a_3 & a_0 + a_2 \end{bmatrix} \begin{bmatrix} b_2 \\ b_3 \end{bmatrix}, \\
H &= (A_0 + A_1)B_0 = \begin{bmatrix} a_0 + a_2 & a_3 + a_1 \\ a_1 + a_3 & a_0 + a_2 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}.
\end{aligned}$$

The matrix $F$ can be formed by (6), and the matrices $G$ and $H$ are formed by (5), which yields:

$$\begin{aligned}
F &= \begin{bmatrix} a_0(b_0 + b_2) + a_3(b_1 + b_3) \\ a_1(b_0 + b_2) + a_0(b_1 + b_3) \end{bmatrix}, \\
G &= \begin{bmatrix} (a_0 + a_2)(b_2 + b_3) + (a_0 + a_2 + a_3 + a_1)b_3 \\ (a_0 + a_2)(b_2 + b_3) + (a_0 + a_2 + a_3 + a_1)b_2 \end{bmatrix}, \\
H &= \begin{bmatrix} (a_0 + a_2)(b_0 + b_1) + (a_0 + a_2 + a_3 + a_1)b_1 \\ (a_0 + a_2)(b_0 + b_1) + (a_0 + a_2 + a_3 + a_1)b_0 \end{bmatrix}.
\end{aligned} \tag{10}$$

Obviously, the matrices $F$, $G$, and $H$ are combinations of sets of the elements $b_i$. With the terms $s_0 = b_0 + b_2$, $s_1 = b_1 + b_3$, $s_2 = a_0s_0 + a_3s_1$, $s_3 = a_1s_0 + a_0s_1$, $s_4 = b_2 + b_3$, and $s_5 = b_0 + b_1$, they are rewritten as follows:

$$F = \begin{bmatrix} s_2 \\ s_3 \end{bmatrix}, \quad
G = \begin{bmatrix} (a_0 + a_2)s_4 + (a_0 + a_2 + a_3 + a_1)b_3 \\ (a_0 + a_2)s_4 + (a_0 + a_2 + a_3 + a_1)b_2 \end{bmatrix}, \quad \text{and} \quad
H = \begin{bmatrix} (a_0 + a_2)s_5 + (a_0 + a_2 + a_3 + a_1)b_1 \\ (a_0 + a_2)s_5 + (a_0 + a_2 + a_3 + a_1)b_0 \end{bmatrix}.$$

Next, the matrices $G$ and $H$ are rewritten with $w_0 = a_0 + a_2$ and $w_1 = a_3 + a_1$. Thus, the matrices $G$ and $H$ can be given as

$$G = \begin{bmatrix} r_0 + r_2b_3 \\ r_0 + r_2b_2 \end{bmatrix} \quad \text{and} \quad H = \begin{bmatrix} r_1 + r_2b_1 \\ r_1 + r_2b_0 \end{bmatrix},$$

where $r_0 = w_0s_4$, $r_1 = w_0s_5$, and $r_2 = w_0 + w_1$. Finally, the four-point cyclic matrix method can be obtained as a new matrix form

$$\begin{bmatrix} D_0 \\ D_1 \end{bmatrix} = \begin{bmatrix} F + G \\ F + H \end{bmatrix} = \begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \end{bmatrix} = \begin{bmatrix} s_2 + r_0 + r_2b_3 \\ s_3 + r_0 + r_2b_2 \\ s_2 + r_1 + r_2b_1 \\ s_3 + r_1 + r_2b_0 \end{bmatrix}.$$

In this simplified form, the MixColumns transformation can be performed with 10 multiplications and 17 additions. The three items $w_0 = a_0 + a_2$, $w_1 = a_3 + a_1$, and $r_2 = w_0 + w_1$ depend only on the known coefficients $a_i$ of the polynomial $A(x)$, so they can be precomputed in the program. The method therefore uses only 10 multiplications and 14 additions, denoted **(10M, 14A)**.

**Scheme 1. (10M, 14A)**

$$\begin{aligned}
& s_0 = b_0 + b_2, \quad s_1 = b_1 + b_3, \\
& s_2 = a_0s_0 + a_3s_1, \quad s_3 = a_1s_0 + a_0s_1, \quad w_0 = a_0 + a_2, \quad w_1 = a_3 + a_1, \\
& r_0 = w_0(b_2 + b_3), \quad r_1 = w_0(b_0 + b_1), \quad r_2 = w_0 + w_1, \\
& d_0 = s_2 + r_0 + r_2b_3, \\
& d_1 = s_3 + r_0 + r_2b_2, \\
& d_2 = s_2 + r_1 + r_2b_1, \\
& d_3 = s_3 + r_1 + r_2b_0.
\end{aligned}$$
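The Scheme 1 steps listed above translate almost line by line into C. The following is a minimal sketch under stated assumptions: `gf_mul` and `mixcol_scheme1` are hypothetical names, the coefficients are passed as `a[0..3]` = $(a_0, a_1, a_2, a_3)$, and $w_0$, $w_1$, $r_2$ are recomputed on every call for brevity although they could be precomputed.

```c
#include <stdint.h>

/* Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (shift-and-add). */
uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t c = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) c ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
        b >>= 1;
    }
    return c;
}

/* Scheme 1: d = circ(a) * b with 10 multiplications; b and d must not alias. */
void mixcol_scheme1(const uint8_t a[4], const uint8_t b[4], uint8_t d[4]) {
    /* These three values depend only on a and could be precomputed. */
    uint8_t w0 = (uint8_t)(a[0] ^ a[2]);
    uint8_t w1 = (uint8_t)(a[3] ^ a[1]);
    uint8_t r2 = (uint8_t)(w0 ^ w1);

    uint8_t s0 = (uint8_t)(b[0] ^ b[2]);
    uint8_t s1 = (uint8_t)(b[1] ^ b[3]);
    uint8_t s2 = (uint8_t)(gf_mul(a[0], s0) ^ gf_mul(a[3], s1));
    uint8_t s3 = (uint8_t)(gf_mul(a[1], s0) ^ gf_mul(a[0], s1));
    uint8_t r0 = gf_mul(w0, (uint8_t)(b[2] ^ b[3]));
    uint8_t r1 = gf_mul(w0, (uint8_t)(b[0] ^ b[1]));

    d[0] = (uint8_t)(s2 ^ r0 ^ gf_mul(r2, b[3]));
    d[1] = (uint8_t)(s3 ^ r0 ^ gf_mul(r2, b[2]));
    d[2] = (uint8_t)(s2 ^ r1 ^ gf_mul(r2, b[1]));
    d[3] = (uint8_t)(s3 ^ r1 ^ gf_mul(r2, b[0]));
}
```

With the standard AES coefficients (02, 01, 01, 03) the column (db, 13, 53, 45) maps to (8e, 4d, a1, bc), and the inverse coefficients (0e, 09, 0d, 0b) map it back, so the same routine serves MixColumns and InvMixColumns.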

**3.3 Reducing multiplications by multiplying by 2**

The matrix product $A_0B_0 = \begin{bmatrix} a_0 & a_3 \\ a_1 & a_0 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}$ can be further simplified by the properties of addition over $GF(2^m)$. The two zero entries $2a_0b_1 = 0$ and $2a_0b_0 = 0$ are added into the matrix product $A_0B_0$ as follows:

$$A_0B_0 = \begin{bmatrix} a_0b_0 + a_3b_1 \\ a_1b_0 + a_0b_1 \end{bmatrix} = \begin{bmatrix} a_0b_0 + a_3b_1 + 2a_0b_1 \\ a_1b_0 + a_0b_1 + 2a_0b_0 \end{bmatrix} = \begin{bmatrix} a_0(b_0 + b_1) + (a_0 + a_3)b_1 \\ a_0(b_0 + b_1) + (a_0 + a_1)b_0 \end{bmatrix}. \tag{11}$$

In Scheme 1, the matrix $F$ was formed by (6). Now, the matrix $F$ is formed by (11) to obtain the following matrix:

$$\begin{aligned}
F &= \begin{bmatrix} a_0 & a_3 \\ a_1 & a_0 \end{bmatrix} \begin{bmatrix} b_0 + b_2 \\ b_1 + b_3 \end{bmatrix} = \begin{bmatrix} a_0(b_0 + b_2 + b_1 + b_3) + (a_0 + a_3)(b_1 + b_3) \\ a_0(b_0 + b_2 + b_1 + b_3) + (a_0 + a_1)(b_0 + b_2) \end{bmatrix} \\
&= \begin{bmatrix} a_0(s_0 + s_1) + t_0s_1 \\ a_0(s_0 + s_1) + t_1s_0 \end{bmatrix} = \begin{bmatrix} t_2 + t_0s_1 \\ t_2 + t_1s_0 \end{bmatrix},
\end{aligned}$$

where $s_0 = b_0 + b_2$, $s_1 = b_1 + b_3$, $t_0 = a_0 + a_3$, $t_1 = a_0 + a_1$, and $t_2 = a_0(s_0 + s_1)$. In Scheme 1, the two items $s_2 = a_0s_0 + a_3s_1$ and $s_3 = a_1s_0 + a_0s_1$ can therefore be replaced by $s_2 = t_2 + t_0s_1$ and $s_3 = t_2 + t_1s_0$, with $t_0 = a_0 + a_3$, $t_1 = a_0 + a_1$, and $t_2 = a_0(s_0 + s_1)$, for computing the MixColumns transformation. In addition, each $r_2b_i$ term in Scheme 1 can be replaced by a lookup table $tc[b_i] = r_2b_i$; that is, multiplication by the constant $r_2$ no longer requires computing a multiplication. The table needs 256 bytes of memory. The resulting method is called Scheme 2.

Scheme 1 can further be rewritten as follows:

**Scheme 2. (5M, 15A)** (It needs 256 bytes as lookup table)

$$\begin{aligned}
& s_0 = b_0 + b_2, \quad s_1 = b_1 + b_3, \\
& t_0 = a_0 + a_3, \quad t_1 = a_0 + a_1, \quad t_2 = a_0(s_0 + s_1), \\
& s_2 = t_2 + t_0s_1, \quad s_3 = t_2 + t_1s_0, \quad w_0 = a_0 + a_2, \\
& r_0 = w_0(b_2 + b_3), \quad r_1 = w_0(b_0 + b_1), \\
& d_0 = s_2 + r_0 + tc[b_3], \\
& d_1 = s_3 + r_0 + tc[b_2], \\
& d_2 = s_2 + r_1 + tc[b_1], \\
& d_3 = s_3 + r_1 + tc[b_0].
\end{aligned}$$
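A possible C realization of Scheme 2 separates the per-key constants from the per-column work. This is a sketch under stated assumptions, not the authors' code: the struct `scheme2_ctx` and the functions `scheme2_init`, `mixcol_scheme2`, and `mixcol_direct` are hypothetical names, and the direct product is included only as a reference for checking.

```c
#include <stdint.h>

/* Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (shift-and-add). */
uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t c = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) c ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
        b >>= 1;
    }
    return c;
}

/* Values that depend only on the coefficients a0..a3 (hypothetical layout). */
typedef struct {
    uint8_t a0, t0, t1, w0;
    uint8_t tc[256];                  /* tc[x] = r2 * x, 256 bytes */
} scheme2_ctx;

void scheme2_init(scheme2_ctx *ctx, const uint8_t a[4]) {
    uint8_t r2 = (uint8_t)(a[0] ^ a[1] ^ a[2] ^ a[3]);   /* r2 = w0 + w1 */
    ctx->a0 = a[0];
    ctx->t0 = (uint8_t)(a[0] ^ a[3]);
    ctx->t1 = (uint8_t)(a[0] ^ a[1]);
    ctx->w0 = (uint8_t)(a[0] ^ a[2]);
    for (int x = 0; x < 256; x++)
        ctx->tc[x] = gf_mul(r2, (uint8_t)x);
}

/* Scheme 2: 5 multiplications plus 4 table lookups; b and d must not alias. */
void mixcol_scheme2(const scheme2_ctx *ctx, const uint8_t b[4], uint8_t d[4]) {
    uint8_t s0 = (uint8_t)(b[0] ^ b[2]);
    uint8_t s1 = (uint8_t)(b[1] ^ b[3]);
    uint8_t t2 = gf_mul(ctx->a0, (uint8_t)(s0 ^ s1));
    uint8_t s2 = (uint8_t)(t2 ^ gf_mul(ctx->t0, s1));
    uint8_t s3 = (uint8_t)(t2 ^ gf_mul(ctx->t1, s0));
    uint8_t r0 = gf_mul(ctx->w0, (uint8_t)(b[2] ^ b[3]));
    uint8_t r1 = gf_mul(ctx->w0, (uint8_t)(b[0] ^ b[1]));
    d[0] = (uint8_t)(s2 ^ r0 ^ ctx->tc[b[3]]);
    d[1] = (uint8_t)(s3 ^ r0 ^ ctx->tc[b[2]]);
    d[2] = (uint8_t)(s2 ^ r1 ^ ctx->tc[b[1]]);
    d[3] = (uint8_t)(s3 ^ r1 ^ ctx->tc[b[0]]);
}

/* Reference: direct circulant product with 16 multiplications. */
void mixcol_direct(const uint8_t a[4], const uint8_t b[4], uint8_t d[4]) {
    for (int i = 0; i < 4; i++) {
        uint8_t acc = 0;
        for (int j = 0; j < 4; j++)
            acc ^= gf_mul(a[(i - j + 4) & 3], b[j]);
        d[i] = acc;
    }
}
```

The context would be initialized once per coefficient set, for example when key bits select a new polynomial, and then reused for every column.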

Scheme 2 uses only 5 multiplications and 18 additions, with 256 bytes of memory, for the matrix multiplication. If the coefficients of the polynomial $A(x)$ satisfy $a_0 + a_3 + a_2 + a_1 = 1$, as in the AES standard, then $r_2 = w_0 + w_1 = 1$ in Scheme 2. Consequently, with $r_2 = 1$ the lookup table of Scheme 2 is no longer required (*e.g.*, $tc[b_i] = r_2b_i = 1 \cdot b_i = b_i$), which saves that memory in an embedded system, and the method can be rewritten as Scheme 3. In Scheme 3, the three items $w_0 = a_0 + a_2$, $t_0 = a_0 + a_3$, and $t_1 = a_0 + a_1$ can be precomputed in the program, so the method uses only 5 multiplications and 15 additions, namely **(5M, 15A)**.
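The $r_2 = 1$ case described above can be sketched in C by taking the Scheme 2 steps and replacing each table lookup $tc[b_i]$ with $b_i$ itself. This is an illustration under stated assumptions: `scheme3_applicable` and `mixcol_scheme3` are hypothetical names, and $t_0$, $t_1$, $w_0$ are recomputed on every call for brevity although they could be precomputed.

```c
#include <stdint.h>

/* Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (shift-and-add). */
uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t c = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1) c ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
        b >>= 1;
    }
    return c;
}

/* The table-free form is valid only when a0 + a1 + a2 + a3 = 1. */
int scheme3_applicable(const uint8_t a[4]) {
    return (uint8_t)(a[0] ^ a[1] ^ a[2] ^ a[3]) == 0x01;
}

/* 5 multiplications, no lookup table; b and d must not alias. */
void mixcol_scheme3(const uint8_t a[4], const uint8_t b[4], uint8_t d[4]) {
    /* These three values depend only on a and could be precomputed. */
    uint8_t t0 = (uint8_t)(a[0] ^ a[3]);
    uint8_t t1 = (uint8_t)(a[0] ^ a[1]);
    uint8_t w0 = (uint8_t)(a[0] ^ a[2]);

    uint8_t s0 = (uint8_t)(b[0] ^ b[2]);
    uint8_t s1 = (uint8_t)(b[1] ^ b[3]);
    uint8_t t2 = gf_mul(a[0], (uint8_t)(s0 ^ s1));
    uint8_t s2 = (uint8_t)(t2 ^ gf_mul(t0, s1));
    uint8_t s3 = (uint8_t)(t2 ^ gf_mul(t1, s0));
    uint8_t r0 = gf_mul(w0, (uint8_t)(b[2] ^ b[3]));
    uint8_t r1 = gf_mul(w0, (uint8_t)(b[0] ^ b[1]));

    d[0] = (uint8_t)(s2 ^ r0 ^ b[3]);   /* r2 * b3 = b3 because r2 = 1 */
    d[1] = (uint8_t)(s3 ^ r0 ^ b[2]);
    d[2] = (uint8_t)(s2 ^ r1 ^ b[1]);
    d[3] = (uint8_t)(s3 ^ r1 ^ b[0]);
}
```

Both the standard MixColumns coefficients (02, 01, 01, 03) and the InvMixColumns coefficients (0e, 09, 0d, 0b) sum to 01, so this form applies to encryption and decryption alike; a caller using other coefficient pairs would need to check `scheme3_applicable` first.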