
Halve-and-Add Algorithm

The halve-and-add algorithm [5] is similar to the double-and-add algorithm, but the point doubling step is replaced by point halving. The procedure for point halving is given next.

Point Halving

For $P=(x_1, y_1)$ and $2P=(x_3, y_3)$, the point doubling formula given in equation (2.38) is equivalent to:

$$\lambda = x_1 + \frac{y_1}{x_1}, \qquad x_3 = \lambda^2 + \lambda + a, \qquad y_3 = x_1^2 + x_3(\lambda + 1) \tag{3.2}$$

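Point halving is the reverse of point doubling: given an input point $2P=(x_3, y_3)$, find $P=(x_1, y_1)$. To compute $x_1$ and $y_1$, we first solve for $\lambda$ from

$$\lambda^2 + \lambda = a + x_3 \tag{3.3}$$

and then solve for $x_1$ from

$$x_1^2 = y_3 + x_3(\lambda + 1) \tag{3.4}$$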

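The trace plays an important role in deriving the algorithm for point halving. For $c \in GF(2^n)$, the trace is defined as:

$$\mathrm{Tr}(c) = c + c^{2} + c^{2^2} + \cdots + c^{2^{n-1}}$$

The trace of an element of the finite field is either 0 or 1. The following are some properties of the trace. Let $c, d \in GF(2^n)$.

Trace is linear:

$$\mathrm{Tr}(c + d) = \mathrm{Tr}(c) + \mathrm{Tr}(d)$$

Trace is invariant under squaring:

$$\mathrm{Tr}(c^2) = \mathrm{Tr}(c)$$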

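My implementation uses a pseudo-random curve over $GF(2^{163})$, which has the form

$$y^2 + xy = x^3 + x^2 + b$$

The coefficient $a$ in equation (2.16) is always equal to 1. So, for the x-coordinate $x$ of any point of odd order:

$$\mathrm{Tr}(a) = 1 \tag{3.10}$$

$$\mathrm{Tr}(x) = \mathrm{Tr}(a) \tag{3.11}$$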

The following theorem identifies the correct solution of equation (3.3) when halving a point:

Let $P=(x_1, y_1)$ and $2P=(x_3, y_3)$.

Let $\hat{\lambda}$ be a solution to (3.3) and $t = y_3 + x_3\hat{\lambda}$.

Suppose that $\mathrm{Tr}(a)=1$. Then $\hat{\lambda}$ is the correct solution if and only if $\mathrm{Tr}(t)=0$.

(3.12)

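We now prove the theorem. If $\hat{\lambda}$ is the correct solution, then it satisfies equation (3.4), that is,

$$x_1^2 = y_3 + x_3(\hat{\lambda} + 1) \tag{3.13}$$

From equations (3.10) and (3.11),

$$\mathrm{Tr}(y_3 + x_3(\hat{\lambda}+1)) = \mathrm{Tr}(x_1^2) = \mathrm{Tr}(x_1) = \mathrm{Tr}(a) = 1 \tag{3.14}$$

and

$$\mathrm{Tr}(y_3 + x_3(\hat{\lambda}+1)) = \mathrm{Tr}((y_3 + x_3\hat{\lambda}) + x_3) = \mathrm{Tr}(y_3 + x_3\hat{\lambda}) + \mathrm{Tr}(x_3) = \mathrm{Tr}(t) + 1 \tag{3.15}$$

Finally, combining (3.14) and (3.15), we get

$$\mathrm{Tr}(t) + 1 = 1, \quad \text{that is,} \quad \mathrm{Tr}(t) = 0$$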

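Otherwise, if $\hat{\lambda}$ is not the correct solution, then the correct solution must be $\hat{\lambda}+1$, which satisfies equation (3.4). Substituting $\hat{\lambda}+1$ into equation (3.4) gives $x_1^2 = y_3 + x_3\hat{\lambda} = t$, so $\mathrm{Tr}(t) = \mathrm{Tr}(x_1^2) = \mathrm{Tr}(x_1) = 1 \neq 0$.

If we use $(x_3, \lambda_3)$, the λ-representation of $2P$ with $\lambda_3 = x_3 + y_3/x_3$, as the input to point halving, then $t$ in equation (3.12) can be computed directly from this λ-representation:

$$t = y_3 + x_3\hat{\lambda} = x_3(x_3 + \lambda_3 + \hat{\lambda})$$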

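Next is the full algorithm of point halving. The input of the algorithm is the λ-representation $2P=(x_3, \lambda_3)$. The output is the λ-representation $P=(x_1, \lambda_1)$.

1. Find a solution $\hat{\lambda}$ of $\lambda^2 + \lambda = a + x_3$

2. Compute $t = x_3(x_3 + \lambda_3 + \hat{\lambda})$

3. If $\mathrm{Tr}(t)=0$, then $\lambda_1 = \hat{\lambda}$, $x_1 = \sqrt{t + x_3}$

else $\lambda_1 = \hat{\lambda}+1$, $x_1 = \sqrt{t}$

(3.21)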
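The steps of the halving algorithm can be tried out on a toy curve. The following is only a sketch under stated assumptions: it uses a polynomial basis over GF(2^7) with the reduction polynomial x^7 + x + 1 and the curve constants A = B = 1 (none of which come from the thesis, which works in a normal basis over GF(2^163)), takes affine points rather than the λ-representation, and all function names are hypothetical. The quadratic is solved with the half-trace, which requires an odd field degree.

```python
N = 7                 # toy field degree (odd, so the half-trace applies)
MOD = 0b10000011      # x^7 + x + 1, irreducible over GF(2) (assumed toy field)
A, B = 1, 1           # toy curve y^2 + xy = x^3 + A*x^2 + B with Tr(A) = 1

def mul(a, b):
    """Multiply two field elements (shift-and-add with reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> N:
            a ^= MOD
    return r

def sqr(a):
    return mul(a, a)

def inv(a):
    """Inverse as a^(2^N - 2); slow but adequate for a toy field."""
    r = 1
    for _ in range(2**N - 2):
        r = mul(r, a)
    return r

def sqrt(a):
    """Square root as a^(2^(N-1)), i.e. N-1 successive squarings."""
    for _ in range(N - 1):
        a = sqr(a)
    return a

def trace(c):
    """Tr(c) = c + c^2 + ... + c^(2^(N-1)); the result is 0 or 1."""
    t = 0
    for _ in range(N):
        t ^= c
        c = sqr(c)
    return t

def half_trace(c):
    """H(c) = sum of c^(2^(2i)) for i = 0..(N-1)/2."""
    h = 0
    for _ in range((N - 1) // 2 + 1):
        h ^= c
        c = sqr(sqr(c))
    return h

def double(P):
    """Affine point doubling, equation (3.2); requires x1 != 0."""
    x1, y1 = P
    lam = x1 ^ mul(y1, inv(x1))
    x3 = sqr(lam) ^ lam ^ A
    y3 = sqr(x1) ^ mul(x3, lam ^ 1)
    return (x3, y3)

def halve(Q):
    """Point halving following steps 1-3 of (3.21), on affine input."""
    x3, y3 = Q
    lam = half_trace(A ^ x3)          # step 1: solve lam^2 + lam = a + x3
    t = y3 ^ mul(x3, lam)             # step 2: t = y3 + x3*lam
    if trace(t) == 0:                 # step 3: pick the correct root
        lam1, x1 = lam, sqrt(t ^ x3)
    else:
        lam1, x1 = lam ^ 1, sqrt(t)
    y1 = mul(x1, x1 ^ lam1)           # from lam1 = x1 + y1/x1
    return (x1, y1)

def curve_points():
    """Brute-force all affine points with x != 0 on the toy curve."""
    pts = []
    for x in range(1, 2**N):
        rhs = mul(sqr(x), x) ^ mul(A, sqr(x)) ^ B
        for y in range(2**N):
            if (sqr(y) ^ mul(x, y)) == rhs:
                pts.append((x, y))
    return pts
```

A simple consistency check is to double every point P of the toy curve, halve the result, and confirm that doubling the half gives 2P again and that the half has Tr(x) = 1.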

Point halving requires one multiplication and three major operations:

1. Solving $\lambda^2 + \lambda = a + x_3$

2. Computing the trace of $t$

3. Calculating a square root, $\sqrt{t}$ or $\sqrt{t + x_3}$

A normal basis is of the form $\{\beta^{2^{n-1}}, \beta^{2^{n-2}}, \ldots, \beta^{2}, \beta\}$. Let $c$ be an element of the field $GF(2^n)$.

By equation (2.3):

$$c = c_{n-1}\beta^{2^{n-1}} + c_{n-2}\beta^{2^{n-2}} + \cdots + c_1\beta^{2^1} + c_0\beta^{2^0} \tag{3.22}$$

The trace of $c$ is

$$\mathrm{Tr}(c) = c_{n-1} + c_{n-2} + \cdots + c_1 + c_0 \tag{3.23}$$

The square root is a cyclic right shift by one bit, the inverse of squaring.

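Now consider the solution of the second-degree equation in (3.21). Let $c$ be as in equation (3.22). There are two ways to solve a second-degree equation of the form

$$\lambda^2 + \lambda = c \tag{3.25}$$

as given below. Writing $\lambda = \sum_i \lambda_i \beta^{2^i}$, squaring is a cyclic shift of the coefficients, so comparing coefficients gives $\lambda_{i-1} + \lambda_i = c_i$. A solution is given by:

$$\lambda_0 = 0, \qquad \lambda_i = \lambda_{i-1} + c_i, \quad i = 1, \ldots, n-1$$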

These operations are expected to be inexpensive relative to normal basis multiplication.
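The normal basis operations described above reduce to simple bit manipulations. The following is a minimal sketch, assuming an element is stored as a Python list `c` in which `c[i]` is the coefficient of β^(2^i); the function names are hypothetical and the bit-serial solver follows the coefficient comparison λ_(i-1) + λ_i = c_i rather than any particular circuit in the thesis.

```python
def nb_trace(c):
    """Trace in a normal basis: the XOR (parity) of all coefficients."""
    return sum(c) % 2

def nb_square(c):
    """Squaring moves the coefficient of beta^(2^i) to beta^(2^(i+1))."""
    return [c[-1]] + c[:-1]

def nb_sqrt(c):
    """Square root is the inverse cyclic shift."""
    return c[1:] + [c[0]]

def nb_solve_quadratic(c):
    """Return one solution lam of lam^2 + lam = c (requires Tr(c) = 0)."""
    if nb_trace(c) != 0:
        raise ValueError("no solution: Tr(c) must be 0")
    lam = [0] * len(c)
    for i in range(1, len(c)):
        lam[i] = lam[i - 1] ^ c[i]   # one XOR per bit
    return lam

def nb_add(a, b):
    """Field addition is a bitwise XOR."""
    return [x ^ y for x, y in zip(a, b)]
```

For instance, with `c = [1, 0, 1, 1, 1]` the solver returns `lam = [0, 0, 1, 0, 1]`, and `nb_add(nb_square(lam), lam)` gives back `c`. The other solution is obtained by complementing every bit of `lam`.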

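Alternatively, we can solve equation (3.25) by the half-trace. For odd $n$, the half-trace of $c$ is

$$H(c) = \sum_{i=0}^{(n-1)/2} c^{2^{2i}} \tag{3.28}$$

Substituting equation (3.28) into (3.25) with $c = a + x_3$, and using equations (2.2) and (2.5):

$$H(a + x_3)^2 + H(a + x_3) = (a + x_3) + \mathrm{Tr}(a + x_3) = a + x_3 \tag{3.31}$$

The last step holds because $\mathrm{Tr}(a + x_3) = \mathrm{Tr}(a) + \mathrm{Tr}(x_3) = 0$ by (3.11).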

The following table compares the operations of point halving with those of point doubling in affine and projective coordinates.

Table 3.1: Comparison between halving and doubling in affine and projective coordinates

Operations                      Doubling (affine)  Doubling (projective)  Halving
Multiplication                  2                  4                      1
Squaring                        1                  5                      0
Inversion                       1                  0                      0
Solving second-degree equation  0                  0                      1
Square root                     0                  0                      1
Check                           0                  0                      1

If the computation time of one second-degree equation solution, one square root, and one check is less than that of 3 multiplications and 5 squarings, then halving performs better than point doubling in projective coordinates.

Halve-and-Add Algorithm

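Having gone through point halving, we now employ it in scalar multiplication. Over $GF(2^n)$, we are given a point $P$ of odd order $r$ on the elliptic curve and a scalar $k$.

In order to compute $kP$, we will prove that [6]:

For every scalar $k$, we can find $k' = (k'_{n-1} \ldots k'_1 k'_0)$ such that

$$2^{n-1} k \equiv \sum_{i=0}^{n-1} k'_i\, 2^i \pmod r, \qquad k'_i \in \{0, 1\} \tag{3.32}$$

Dividing both sides by $2^{n-1}$ gives the result:

$$k \equiv \sum_{i=0}^{n-1} \frac{k'_i}{2^{\,n-1-i}} \pmod r, \qquad k'_i \in \{0, 1\} \tag{3.33}$$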

Next is a left-to-right version of the halve-and-add algorithm, where $k$ is first converted to $k'$ by equation (3.33). Over $GF(2^n)$, the input is $k'$ and $P$, and the output is $kP$. Only $P$ is halved before it is added to $Q$, so $Q$ can be kept in projective coordinates and $Q+P$ is computed by (2.41).

For example, let the field be $GF(2^4)$ and $r = 11$ = "1011". Given $P$ and a scalar $k = 10$, compute $kP$.

First we convert $k$ using equation (3.33):

$$k' = 2^{3} \cdot 10 \bmod 11 = 80 \bmod 11 = 3 = \text{"0011"}$$

This example requires computing $P/2$, that is, division by 2 modulo 11. We have $2^{-1} \bmod 11 = 6$, since $6 \cdot 2 \bmod 11 = 12 \bmod 11 = 1$. For any integer $x$, $x/2 \bmod 11 = 6x \bmod 11$. From (3.35):

k' = "0 0 1 1"

P: P → P/2 = 6P → 6P/2 = 3P → 3P/2 = 14P/2 = 7P

Q: O → O → O → O + 3P = 3P → 3P + 7P = 10P

The result is the same as the one computed from (3.1).
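The scalar conversion and the left-to-right loop can be checked at the level of scalars alone. The sketch below is an illustration, not the thesis's circuit: each point is modelled by its multiple of P modulo r (so P is 1 and O is 0), halving becomes multiplication by 2^(-1) mod r, and the function names are hypothetical.

```python
def to_halving_scalar(k, r, n):
    """Return the bits of k' = 2^(n-1) * k mod r, most significant first."""
    kp = (pow(2, n - 1, r) * k) % r
    return [(kp >> i) & 1 for i in range(n - 1, -1, -1)]

def halve_and_add_ltr(bits, r):
    """Left-to-right halve-and-add on scalars modulo the odd order r."""
    half = (r + 1) // 2          # inverse of 2 modulo an odd r
    p, q = 1, 0                  # p models P, q models the accumulator Q
    for bit in bits:
        if bit:
            q = (q + p) % r      # Q = Q + P
        p = (p * half) % r       # P = P / 2
    return q
```

With `r = 11`, `n = 4` and `k = 10`, `to_halving_scalar` yields the bits `[0, 0, 1, 1]` and `halve_and_add_ltr` returns 10, matching the worked example.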

Another version of the halve-and-add algorithm is a right-to-left method. Point halving is applied to the accumulator $Q$, hence projective coordinates cannot be used for $Q$.

Use the same setting, $GF(2^4)$ and $r = 11$ = "1011". Given $P$ and a scalar $k = 10$, that is, $k'$ = "0011", start from the right and proceed to the left:

k' = "0 0 1 1"

9P/2 = 10P ← 7P/2 = 9P ← P/2 + P = 6P + P = 7P ← O + P = P ← O = Q

The final answer is 10P. Unlike algorithm (3.35), this version requires only one register, for Q.
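The right-to-left variant can be checked the same way on scalars. This is again only a sketch with a hypothetical function name: points are modelled by their multiple of P modulo r, the bits of k' are given most significant first, and the loop halves the accumulator before each conditional addition, as in the worked example.

```python
def halve_and_add_rtl(bits, r):
    """Right-to-left halve-and-add on scalars modulo the odd order r."""
    half = (r + 1) // 2          # inverse of 2 modulo an odd r
    q = 0                        # models the accumulator Q, starting at O
    for bit in reversed(bits):   # scan k' from its rightmost bit
        q = (q * half) % r       # Q = Q / 2
        if bit:
            q = (q + 1) % r      # Q = Q + P
    return q
```

For `bits = [0, 0, 1, 1]` and `r = 11` the accumulator takes the values 1, 7, 9, 10, which is the chain P, 7P, 9P, 10P shown above.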

We can further encode the scalar $k$ or $k'$ of the halve-and-add algorithm when computing $kP$ to reduce the Hamming weight of $k$ or $k'$, and hence reduce the number of point additions.

Since point addition is more expensive than point doubling or halving, the performance of scalar multiplication is improved. The add-and-subtract algorithm [2] eliminates runs of consecutive 1's by combining additions and subtractions. Given an $n$-bit scalar $k$,

$$k = \sum_{i=0}^{n} e_i\, 2^i, \qquad e_i \in \{-1, 0, 1\} \tag{3.37}$$

Using the add-and-subtract algorithm, we find $e$:

Let $k_{n-1}k_{n-2}\ldots k_1k_0$ be the binary representation of $k$.
Let $h_nh_{n-1}\ldots h_1h_0$ be the sum $k_{n-1}k_{n-2}\ldots k_1k_0 + k_{n-1}k_{n-2}\ldots k_1$.
Let $g_ng_{n-1}\ldots g_1g_0$ equal $00k_{n-1}k_{n-2}\ldots k_1$.

for i from 0 to n {

if h_i = 1 and g_i = 0, then e_i = 1
else if h_i = 0 and g_i = 1, then e_i = -1
else e_i = 0

}

(3.38)

Take k = 29 = "11101" for example. Then h = "11101" + "1110" = "101011".

h = "1 0 1 0 1 1"

g = "0 0 1 1 1 0"

e = "1 0 0 -1 0 1"

It is easy to verify that:

$$29 = 2^5 - 2^2 + 2^0 = 32 - 4 + 1$$
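The recoding rule of (3.38) amounts to a digit-wise subtraction of g from h. A minimal sketch, with a hypothetical function name and the digits returned most significant first:

```python
def add_subtract_recode(k):
    """Signed-digit recoding of k following (3.38): e_i = h_i - g_i."""
    n = k.bit_length()
    g = k >> 1                   # 0 0 k_(n-1) ... k_1
    h = k + g                    # k plus k shifted right by one bit
    digits = []
    for i in range(n, -1, -1):   # from e_n down to e_0
        h_i = (h >> i) & 1
        g_i = (g >> i) & 1
        digits.append(h_i - g_i) # 1, -1 or 0, as in the three cases of (3.38)
    return digits
```

For `k = 29` this gives `[1, 0, 0, -1, 0, 1]`, the same digits as in the example, and weighting the digits by powers of two recovers 29.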

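The add-and-subtract algorithm can be combined with (3.35): the scalar is first encoded into the digits $e_i$, and $P$ or $-P$ is added to $Q$ when $e_i$ is 1 or -1, respectively.

$-P$ is given by (2.36). The add-and-subtract algorithm can also be combined with (3.1) or (3.36).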

CHAPTER 4

Implementation Results and Comparisons

My implementation uses a pseudo-random curve in normal basis over $GF(2^{163})$, of the form

$$y^2 + xy = x^3 + x^2 + b \tag{4.1}$$

The normal basis is of type 4, which is not an optimal normal basis. The base point is $P=(P_x, P_y)$, where

$$P_x = x_{162}\beta^{2^{162}} + x_{161}\beta^{2^{161}} + \cdots + x_1\beta^{2^1} + x_0\beta^{2^0} \tag{4.2}$$

$$P_y = y_{162}\beta^{2^{162}} + y_{161}\beta^{2^{161}} + \cdots + y_1\beta^{2^1} + y_0\beta^{2^0} \tag{4.3}$$

Express $P_x$ and $P_y$ as the 163-bit numbers $x_{162}x_{161}\ldots x_1x_0$ and $y_{162}y_{161}\ldots y_1y_0$. Their values in hexadecimal are

Px=0_bb95_2eb0_8fc0_b1c8_699f_739a_9357_3474_1e04_4460 (4.4)

Py= 7_f185_6ef0_98cf_adc8_077e_e437_33a7_f113_1e41_ae66 (4.5)

If P is in λ-representation, then

Pλ= 3_e6c0_a681_341a_b0a3_6cc5_c338_7bff_ea7e_014f_a6a3 (4.6)

The value of coefficient b in equation (4.1) is

b= 6_fcde_3c9e_f967_437b_e459_b1ce_438e_3479_a9e7_d133 (4.7)

The order of the base point P is

r=5846006549323611672814742442876390689256843201587 (4.8)

The number of points on the elliptic curve is 2r.

The fundamental element of the entire circuit is the $GF(2^{163})$ normal basis serial multiplier.

Let the inputs be as in (2.5) and the output as in (2.6). Using the algorithm in [2], we derive the product:

$$c_0 = a_1(b_0 + b_{13} + b_{132} + b_{117}) + a_2(b_{117} + b_{92} + b_{111} + b_{145}) + \cdots \tag{4.9}$$

The formulas for the other coordinates can be derived from the above by increasing every subscript by one:

$$c_1 = a_2(b_1 + b_{14} + b_{133} + b_{118}) + a_3(b_{118} + b_{93} + b_{112} + b_{146}) + \cdots$$

$$c_2 = a_3(b_2 + b_{15} + b_{134} + b_{119}) + a_4(b_{119} + b_{94} + b_{113} + b_{147}) + \cdots$$

$$\vdots$$

We can implement this using three registers to store the inputs A and B and the output C. The circuit implements equation (4.9) and cyclically shifts these three registers by one bit at each cycle, so the product is generated bit by bit. The circuit diagram is given below:

Figure 4.1: Normal Basis Multiplier version 1

The combinational circuit at the input of $c_0$ is not shown; only the idea of the connection is given. The latency of this multiplier is 163 cycles, and $c_0$ has a large fan-in. We can modify the above multiplier so that each cell adds one term per cycle [9]. For example, one cell adds the term $a_1(b_0 + b_{13} + b_{132} + b_{117})$ to the partial sum it receives from its neighbouring cell, and another cell adds the term $a_4(b_{119} + b_{94} + b_{113} + b_{147})$ in the same way.

The following is the multiplication cell for adding one term at each cycle:

Figure 4.2: Multiplier element of $c_k$

Modifying the original multiplier, we get:

This is a conceptual diagram showing the difference in wiring. The fan-in of the output register is reduced. Another benefit of this multiplier is that the register C can be set to an initial value, say D. The final output then equals A*B+D, which is the effect of a multiply-and-accumulate (MAC) unit.

The solution of the second-degree equation is given by equation (3.27). This can be easily implemented using a one-bit register and an exclusive-or gate. Since the solution is produced serially, we can modify the above multiplier so that the $a_i$ term of the product is added at each cycle. For example, $c_0$ is accumulated as

$$c_0 \leftarrow c_0 + a_1(b_0 + b_{13} + b_{117} + b_{132})$$

$$c_0 \leftarrow c_0 + a_2(b_{92} + b_{117} + b_{145} + b_{111})$$

$$\vdots$$

Using cells similar to those in Figure 4.2, the new normal basis multiplier is:

Figure 4.4: Serial-input normal basis multiplier

Combining the solution circuit with the serial-input normal basis multiplier results in an efficient implementation.

The input of point halving is in λ-representation. The implementation of point halving uses one normal basis multiplier. The second-degree equation is solved by the half-trace as given by equation (3.31). The trace of t is obtained by exclusive-oring every bit of t. Since only one multiplier is required, the overall latency is 163 cycles. The architecture of point halving is given below. Let 2P=(x3, λ3); the output is P=(x1, λ1).

Figure 4.5: Circuit for point halving

The procedure of point halving is:

Figure 4.6: Point halving flow

The coefficient $a$ of the pseudo-random curve is always equal to one. In normal basis, one (the multiplicative identity) is the number in which every bit is 1. The right-hand side of equation (3.3) equals:

$$a + x_3 = 1 + x_3 = \overline{x_3}$$

That is, exclusive-oring each bit of $x_3$ with 1 is the same as inverting each bit.

In order to implement scalar multiplication efficiently, algorithm (3.39) is chosen. Since point addition in projective coordinates requires no inversion, we keep the accumulator Q of (3.39) in projective coordinates. The point addition Q+P or Q-P has Q in projective coordinates and P in λ-representation, P=(X1, λ1). Formula (2.41) is modified accordingly using (3.5).

My implementation of (2.41) contains three multipliers. Because of data dependencies, the values computed by each multiplication are scheduled as follows to achieve minimum latency. The data dependencies are indicated.

Table 4.1: The data flow of mix-coordinates addition (5.10)

As the above table shows, the mix-coordinates addition takes the time of 4 multiplications, which is 4 × 163 = 652 cycles.

The following is the circuit diagram of the mix-coordinates addition. The multiplier in the diagram has three inputs: two for the multiplication and one for accumulation.

The neg signal is for adding -P to Q. The ini signal indicates the initial condition O+P=P, that is, X3=X2, Y3=Y2 or X2+Y2, and Z3=1.

Figure 4.7: Circuit for mix-coordinates addition

My proposed design is a scalar multiplication circuit based on algorithm (3.39). It is composed of the point halving circuit, the point addition circuit, and some control signals.

The inputs are k', which is derived from k as shown in (3.33), and the base point P. The output is kP. k' is first encoded into e as in (3.39). Following (3.38), the encoding logic uses two shift registers to store g and h. The shift registers shift by one bit each time a point halving completes. We observe the most significant bits of the g and h registers to decide whether the input to the point addition circuit is P or -P. Since there are separate registers for the accumulator Q and for P, the halving circuit and the addition circuit can operate at the same time. This makes

different, the halving circuit must hold its output until the adding circuit reads the result. The point addition circuit adds P or –P to the accumulator when ei is 1 or -1. The control flow of the whole circuit is:

Figure 4.8: The control of point halving and projective addition

The synthesis results are given below. The cycle time is set to 5 ns and the standard-cell synthesis library is a 0.18 μm technology.

Table 4.2: The synthesized results

Circuits               Gate counts
Multiplier             6961
Halving                14321
Addition               45723
Scalar multiplication  77100

The average latency of scalar multiplication is about 37,000 cycles and the clock frequency is 200 MHz, so the throughput is 2 × 163 × 200 MHz / 37,000 ≈ 1.76 Mbit/s.
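The throughput figure can be reproduced with a few lines of arithmetic. The sketch below simply evaluates the formula from the text; the function name is hypothetical, and reading the factor 2 as the two 163-bit coordinates of the result is an assumption, not something the text states.

```python
def throughput_bps(field_bits, clock_hz, latency_cycles):
    """Bits per second, using the formula 2 * field_bits * f / latency."""
    # Assumption: the factor 2 counts two field elements per result point.
    return 2 * field_bits * clock_hz / latency_cycles

# Figures quoted in the text: GF(2^163), 200 MHz, about 37,000 cycles.
print(round(throughput_bps(163, 200e6, 37000) / 1e6, 2), "Mbit/s")
```

The same function can be applied to the cycle counts and clock rates of the other designs in the comparison tables where those figures are reported.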

The design is verified on an integrated FPGA system called iProve, which allows the outputs of the FPGA to be displayed directly in ModelSim. The FPGA chip is a Xilinx Virtex2 XC2V8000. The synthesis frequency is set to 90 MHz and the design uses a total of 8815 LUTs.

The tables below compare implementations of elliptic curve cryptosystems.

We can see that our design has about the same throughput as [12] while the area is smaller.

Table 4.3: The performance comparison of Elliptic Curve Cryptosystems implementations on ASIC

Authors                  Huang [10]           Okada [11]  Bai [12]    Daneshbeh [13]           Sozzani [14]           Proposed
Technology               0.35 μm              0.25 μm     0.18 μm     0.18 μm                  0.13 μm                0.18 μm
Field                    GF(2^251)            GF(2^163)   GF(2^233)   GF(2^163)                GF(2^163)              GF(2^163)
Gate counts              56K                  165K        120K        74K                      ?                      77K
Clock rate               100 MHz              66 MHz      100 MHz     700 MHz                  400 MHz                200 MHz
Latency for kP (cycles)  ?                    ?           ?           212,552                  11,320                 37,000
Processor                Y                    Y           N           Y                        Y                      N
Algorithm for kP         Montgomery (affine)  ?           Montgomery  Double-and-Add (serial)  Montgomery (parallel)  Halve-and-Add
Basis                    Poly                 Poly        Poly        Poly                     Poly                   Normal
Throughput               91 Kb/s              501 Kb/s    1.86 Mb/s   1.1 Mb/s                 12 Mb/s                1.76 Mb/s

Table 4.4: The performance comparison of Elliptic Curve Cryptosystems implementations on FPGA

Authors           Orlando & Paar [15]  Gura [16]        Lutz [17]        Proposed
Platform          Xilinx XCV400E       Xilinx XCV2000E  Xilinx XCV2000E  Xilinx XC2V8000
Technology        0.18 μm              0.18 μm          0.18 μm          0.15/0.12 μm
Field             2^167                2^163            2^163            2^163
LUTs              3002                 19508            10017            8815
FFs               1769                 6442             1930             N/A
Processor         Y                    Y                Y                N
Clock rate        76 MHz               66 MHz           66 MHz           90 MHz
Algorithm for kP  Montgomery           Montgomery       τ-NAF            Halve-and-Add

CHAPTER 5

Conclusion

In this paper, an implementation of an elliptic curve cryptosystem is presented. The architecture uses point halving to reduce the computational complexity. Point halving requires only one multiplier and some addition circuits, which allows the double-and-add algorithm to be replaced by the halve-and-add algorithm.

The normal basis multiplier in the implementation is a serial multiplier. The projective addition circuit contains three multipliers, takes 4 times the latency of a single multiplication, and requires no inversion over the finite field. The input scalar is encoded for use with halve-and-add, and its Hamming weight is further reduced using the add-and-subtract algorithm. The halving circuit and the projective addition circuit can work in parallel when there is no data dependency between them.

The implementation is synthesized using a 0.18 μm synthesis library. We use a Xilinx Virtex2 (XC2V8000) to verify the implementation.

APPENDIX

Elliptic Curve Cryptosystems

In elliptic curve cryptosystems, we need to map a message onto a point on an elliptic curve. The cryptosystem then operates on that point to yield a new point that serves as the ciphertext. The idea of the mapping method is the following. Let equation (2.15) be the elliptic curve. The message m is first assigned as the x-coordinate of a point. However, there is only about a 1/2 chance that there exists a solution y such that

$$y^2 \equiv x^3 + ax + b \pmod p \tag{a.1}$$

Therefore, we append a few bits to the end of m and try every pattern of these bits until equation (a.1) has a solution. Namely, let K be a large integer such that the failure rate $1/2^K$ of mapping a message to a point on the elliptic curve is low. Suppose that

$$(m+1)K < p \tag{a.2}$$

Represent the message m as

$$x = mK + j, \quad \text{where } 0 \le j < K \tag{a.3}$$

For j = 0, 1, …, K-1, try to find a solution y of (a.1). If a solution y exists, the message m is mapped to $P_m = (x, y)$ and we can stop. Otherwise, increase j by one and use the new x to look for a solution again. If no solution is found for j = 0 to K-1, we have failed to map the message to a point. The message is recovered from $P_m$ by

$$m = \lfloor x / K \rfloor \tag{a.4}$$

For example, let the message be m = 5, p = 179, and the elliptic curve be $y^2 = x^3 + 2x + 7$. Pick K = 10, so the failure rate is $1/2^{10}$, which is acceptable. Then x = mK + j = 50 + j, so x = 50, 51, …, 59. For x = 51 we get $x^3 + 2x + 7 \equiv 121 \pmod{179}$, thus y = 11. The message m is mapped to the point (51, 11) and can be recovered by $m = \lfloor 51/10 \rfloor = 5$.
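The mapping procedure can be sketched in a few lines. This is an illustration with hypothetical function names; the square root is found by brute force, which is only reasonable for a small prime such as the p = 179 of the example.

```python
def embed_message(m, K, p, a, b):
    """Map message m to a point (x, y) on y^2 = x^3 + a*x + b (mod p)."""
    assert (m + 1) * K < p               # condition (a.2)
    for j in range(K):
        x = m * K + j                    # candidate x-coordinate, (a.3)
        rhs = (x**3 + a * x + b) % p
        for y in range(p):               # brute-force square root (toy sizes only)
            if (y * y) % p == rhs:
                return (x, y)
    return None                          # mapping failed for every j

def recover_message(point, K):
    """Recover m from the point using m = floor(x / K), equation (a.4)."""
    return point[0] // K
```

With `m = 5`, `K = 10`, `p = 179`, `a = 2` and `b = 7`, the candidate x = 50 has no square root, x = 51 gives 121 = 11^2, and `recover_message((51, 11), 10)` returns 5.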

For an elliptic curve over $GF(2^n)$ of the form (2.16), the steps for representing the message m are the same. Let the message m have t bits; we append a u-bit number j to the end of m, where $t + u \le n$.

The message m is represented as $x = m2^u + j$. For $j = 0, 1, \ldots, 2^u - 1$, try to find a solution y of (2.16). If a solution is found, we take $P_m = (x, y)$; otherwise, increase j and try again. Solving for y in (2.16) given x is explained in [8].

Elliptic curve cryptosystems rely on the difficulty of solving the discrete logarithm problem for elliptic curves, which is described as follows: given two points P and Q on an elliptic curve, find k such that Q = kP [7].

a.1 Elliptic Curve ElGamal Cryptosystem

The Elliptic Curve ElGamal Cryptosystem, a public-key system, is one popular application of elliptic curve cryptography. The public key is used to encrypt the plaintext and the private key is used to decrypt the ciphertext. Suppose Alice wants to send a message to Bob. Bob chooses an elliptic curve (2.15), where p is a large prime. He also chooses a point P and a scalar k, which is the private key. He computes

$$Q = kP \tag{a.5}$$

The points Q and P are Bob's public key. Alice represents her message as a point x on the elliptic curve (2.15). She also chooses a private integer a and computes

$$y_1 = aP \quad \text{and} \quad y_2 = x + aQ \tag{a.6}$$

The additions and subtractions here are point operations. She sends $y_1$ and $y_2$ to Bob. Bob can decrypt x by calculating

$$y_2 - ky_1 = (x + aQ) - kaP = x + akP - kaP = x \tag{a.7}$$

Next is an example of the Elliptic Curve ElGamal Cryptosystem. Let the point be P = (4, 11) and the elliptic curve be $y^2 \equiv x^3 + 3x + 45 \pmod{8831}$. Alice's message is represented as the point $P_m = (5, 1743)$. She wants to send the message to Bob.

Bob has a private key k = 3 and computes Q = kP = (413, 1808). Q is made public. Alice takes Bob's public key Q and chooses a random number a = 8. She computes $y_1 = aP = (5415, 6321)$ and $y_2 = P_m + aQ = (6626, 3576)$ and sends $(y_1, y_2)$ to Bob.

To decrypt $(y_1, y_2)$, Bob first calculates $ky_1 = 3(5415, 6321) = (673, 146)$ and subtracts this from $y_2$:

(6626, 3576) - (673, 146) = (6626, 3576) + (673, -146) = (5, 1743)
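The encryption and decryption steps can be exercised with a small amount of code. The following is a sketch only, using textbook affine arithmetic on a curve y^2 = x^3 + a*x + b over GF(p); the function names are hypothetical, `None` stands for the point at infinity, and `pow(x, -1, p)` requires Python 3.8 or later.

```python
def ec_add(P, Q, a, p):
    """Add two affine points on y^2 = x^3 + a*x + b (mod p)."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                  # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

def ec_neg(P, p):
    return None if P is None else (P[0], (-P[1]) % p)

def ec_mul(k, P, a, p):
    """Scalar multiplication by double-and-add."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P, a, p)
        P = ec_add(P, P, a, p)
        k >>= 1
    return R

def elgamal_encrypt(Pm, eph, P, Q, a, p):
    """Return (y1, y2) = (eph*P, Pm + eph*Q), as in (a.6)."""
    return ec_mul(eph, P, a, p), ec_add(Pm, ec_mul(eph, Q, a, p), a, p)

def elgamal_decrypt(y1, y2, k, a, p):
    """Return y2 - k*y1, as in (a.7)."""
    return ec_add(y2, ec_neg(ec_mul(k, y1, a, p), p), a, p)
```

Using the parameters of the example (a = 3, p = 8831, P = (4, 11), private key 3, random number 8, message point (5, 1743)), encrypting and then decrypting should return the message point unchanged.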

a.2 Elliptic Curve Diffie-Hellman Key Exchange

Another useful system is the Elliptic Curve Diffie-Hellman Key Exchange, which can be used to exchange a key for a private-key system. Alice and Bob want to exchange a key. They agree on an elliptic curve and a point P on it. Alice chooses a secret integer a, Bob chooses a secret integer b, and they exchange the points aP and bP. For example, with a = 12:

aP=(1794,6375) and bP=(3861, 1242)

Alice takes bP and multiplies it by a to get the key

a(bP)=12(3861, 1242) =(1472,2098)

In the same way, Bob takes aP and computes b(aP)

b(aP)=b(1794, 6375) =(1472,2098)

Now they have the same key.

a.3 Elliptic Curve Digital Signature Algorithm

A signature is the opposite of a public-key encryption system: one uses the private key to sign, and others use the public key to verify the signature. Next is the Elliptic Curve Digital Signature Algorithm:

Let p be a prime and let E be an elliptic curve defined over GF(p).

Let P be a point on E having prime order q, and define:

K=(p, q, E, P, m, Q), where Q=mP

p, q, E, P and Q are the public key and m is the private key.

For K=(p, q, E, P, m, Q) and a random number k, define

sigK(x, k)=(r, s), where

kP=(u, v)

r=u mod q

s=k^-1(SHA-1(x)+mr) mod q

(a.8)

Verification is given below:

w=s^-1 mod q

i=w SHA-1(x) mod q

j=wr mod q

(u, v)=iP+jQ

verK(x, (r, s)) is true if and only if

u mod q=r

Let E: $y^2 \equiv x^3 + x + 6 \pmod{11}$, with p=11, q=13, P=(2,7), m=7 and Q=(7,2). Suppose the message is x with SHA-1(x)=4, and Alice signs the message with the random value k=3. She computes:

(u, v)=3(2, 7)=(8, 3)

r=u mod 13=8, and s=3^-1(4+7*8) mod 13=7

(8, 7) is the signature.

Bob verifies the signature by

w=7^-1 mod 13=2

i=2*4 mod 13=8

j=2*8 mod 13=3

(u, v)=8P+3Q=(8, 3)

u mod 13=8=r.

Then the signature is verified.
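The signing and verification equations can be checked against the small example. This is a sketch with hypothetical function names: the hash value is passed in directly as an integer instead of calling SHA-1, the curve arithmetic is textbook affine arithmetic over GF(p), `None` stands for the point at infinity, and `pow(x, -1, n)` requires Python 3.8 or later.

```python
def ec_add(P, Q, a, p):
    """Add two affine points on y^2 = x^3 + a*x + b (mod p)."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                  # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(k, P, a, p):
    """Scalar multiplication by double-and-add."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P, a, p)
        P = ec_add(P, P, a, p)
        k >>= 1
    return R

def ecdsa_sign(h, k, m, P, q, a, p):
    """Sign hash value h with private key m and random value k."""
    u, _ = ec_mul(k, P, a, p)
    r = u % q
    s = pow(k, -1, q) * (h + m * r) % q
    return (r, s)

def ecdsa_verify(h, sig, P, Q, q, a, p):
    """Check the signature (r, s) on hash value h with public key Q."""
    r, s = sig
    w = pow(s, -1, q)
    i, j = w * h % q, w * r % q
    R = ec_add(ec_mul(i, P, a, p), ec_mul(j, Q, a, p), a, p)
    return R is not None and R[0] % q == r
```

With the example parameters (a = 1, p = 11, q = 13, P = (2, 7), m = 7, k = 3 and hash value 4), `ecdsa_sign` returns (8, 7), and `ecdsa_verify` accepts that signature when Q is computed as 7P.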

BIBLIOGRAPHY

[1] “Certicom ECC FAQ”, http://www.certicom.com/index.php?action=ecc,ecc_faq

[2] IEEE Std 1363-2000, IEEE standard specifications for public-key cryptography, IEEE Computer Society, August 29, 2000.

[3] Douglas R. Stinson, Cryptography: Theory and Practice, Second edition, Chapman & Hall/CRC, 2002.

[4] J. Lopez and R. Dahab, "Improved algorithms for elliptic curve arithmetic in GF(2^n)", Selected Areas in Cryptography - SAC '98, LNCS 1556, 1999, 201-212.

[5] K. Fong, D. Hankerson, J. Lopez, and A. Menezes, "Field Inversion and Point Halving Revisited", IEEE Transactions on Computers, 53(8):1047-1059, August 2004.

[6] E. Knudsen, "Elliptic scalar multiplication using point halving", Advances in Cryptology - Asiacrypt '99, LNCS 1716, 1999, 135-149.

[7] W. Trappe and L.C. Washington, Introduction to Cryptography with Coding Theory, Prentice Hall, 2001.

[8] ANSI X9.62, Public Key Cryptography for the Financial Services Industry: The Elliptic Curve Digital Signature Algorithm (ECDSA), 1998.

[9] Philip H. W. Leong and Ivan K. H. Leung, "A microcoded elliptic curve processor using FPGA technology", IEEE Transactions on VLSI Systems, 10(5), October 2002.

[11] Souichi Okada, Naoya Torii, Kouichi Itoh, and Masahiko Takenaka, "Implementation of elliptic curve cryptographic coprocessor over GF(2^m) on an FPGA", Cryptographic Hardware and Embedded Systems (CHES), pages 25-40, Springer-Verlag, 2000.

[12] Guoqiang Bai, Zhun Huang, Hang Yuan, Hongyi Chen, Ming Liu, Gang Chen, Tao Zhou, and Zhihua Chen, "A high performance VLSI chip of the elliptic curve cryptosystems", 7th Int. Conf. SICT, pp. 2059-2062, Oct. 2004.

[13] A. Daneshbeh and M. Hasan, "Area Efficient High Speed Elliptic Curve Cryptoprocessors for Random Curves", Proceedings of ITCC 04, Las Vegas, NV, USA, 2004.

[14] F. Sozzani, G. Bertoni, S. Turcato, and L. Breveglieri, "A parallelized Design for an Elliptic Curve Cryptosystem Coprocessor", Proceedings of ITCC 05, 2005.

[15] G. Orlando and C. Paar, "A high-performance reconfigurable elliptic curve processor for GF(2^m)", Cryptographic Hardware and Embedded Systems (CHES), 2000.

[16] N. Gura, S. C. Shantz, H. Eberle, S. Gupta, V. Gupta, D. Finchelstein, E. Goupy, and D. Stebila, "An end-to-end systems approach to elliptic curve cryptography", Cryptographic Hardware and Embedded Systems (CHES), 2002.

[17] J. Lutz and A. Hasan, "High Performance FPGA based Elliptic Curve Cryptographic Co-Processor", Proceedings of ITCC 04, Las Vegas, NV, USA, 2004.
