The halve-and-add algorithm [5] is similar to the double-and-add algorithm, but the point doubling step is replaced by point halving. The procedure of point halving is given next.
Point Halving
For P=(x1, y1), 2P=(x3, y3), the formula of point doubling is given in equation (2.38) which is the same as:
$$\lambda = x_1 + \frac{y_1}{x_1}$$
$$x_3 = \lambda^2 + \lambda + a$$
$$y_3 = x_1^2 + x_3(\lambda + 1) \qquad (3.2)$$
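Point halving is the reverse of point doubling: given an input point 2P=(x3, y3), find P=(x1, y1). In order to compute x1 and y1, we first have to solve for λ in
$$\lambda^2 + \lambda = x_3 + a \qquad (3.3)$$
and then obtain x1 from
$$x_1^2 = y_3 + x_3(\lambda + 1) \qquad (3.4)$$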
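The idea of trace plays an important role in deriving the algorithm for point halving. For c∈GF(2^n), the trace is defined as:
$$\mathrm{Tr}(c) = c + c^{2} + c^{2^2} + \cdots + c^{2^{n-1}}$$
The trace of an element of the finite field is either 0 or 1. The trace has the following properties for c,d∈GF(2^n):
Trace is linear: Tr(c+d)=Tr(c)+Tr(d)
Trace is invariant under squaring: Tr(c^2)=Tr(c)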
My implementation uses a pseudo-random curve over GF(2^163), which has the form
$$y^2 + xy = x^3 + x^2 + b$$
The coefficient a in equation (2.16) is always equal to 1. So:
Tr(a)=1 (3.10)
Tr(x)=Tr(a) (3.11)
The following theorem finds the correct solution of equation (3.3) while halving a point:
Let P=(x1, y1) and 2P=(x3, y3).
Let λ̂ be a solution to (3.3) and t=y3+x3λ̂.
Suppose that Tr(a)=1. Then λ̂ is the correct solution if and only if Tr(t)=0. (3.12)
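We now prove the theorem. If λ̂ is the correct solution, then it satisfies equation (3.4), that is,
$$x_1^2 = y_3 + x_3(\hat{\lambda} + 1) \qquad (3.13)$$
From equation (3.10) and equation (3.11),
$$\mathrm{Tr}(y_3 + x_3(\hat{\lambda} + 1)) = \mathrm{Tr}(x_1^2) = \mathrm{Tr}(x_1) = \mathrm{Tr}(a) = 1 \qquad (3.14)$$
and
$$\mathrm{Tr}(y_3 + x_3(\hat{\lambda} + 1)) = \mathrm{Tr}((y_3 + x_3\hat{\lambda}) + x_3) = \mathrm{Tr}(y_3 + x_3\hat{\lambda}) + \mathrm{Tr}(x_3) = \mathrm{Tr}(t) + 1 \qquad (3.15)$$
Finally, comparing (3.14) and (3.15), we get
$$\mathrm{Tr}(t) + 1 = 1, \quad \text{that is,} \quad \mathrm{Tr}(t) = 0$$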
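Otherwise, if λ̂ is not the correct solution, then the correct solution must be λ̂+1, and λ̂+1 satisfies equation (3.4). Substituting λ̂+1 into equation (3.4) gives
$$x_1^2 = y_3 + x_3(\hat{\lambda} + 1 + 1) = y_3 + x_3\hat{\lambda} = t$$
so that Tr(t)=Tr(x1^2)=Tr(x1)=Tr(a)=1.
If we take (x3, λ3) with λ3=x3+y3/x3, the λ-representation of 2P, as the input to point halving, then t in equation (3.12) can be computed directly from this λ-representation:
$$t = y_3 + x_3\hat{\lambda} = x_3(x_3 + \lambda_3 + \hat{\lambda})$$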
Next is the full algorithm of point halving. The input of the algorithm is the λ-representation of 2P, (x3, λ3). The output is the λ-representation of P, (x1, λ1).
1. Find a solution λ̂ of λ^2+λ=a+x3
2. Compute t=x3(x3+λ3+λ̂)
3. If Tr(t)=0, then λ1=λ̂, x1=√(t+x3)
   else λ1=λ̂+1, x1=√t
(3.21)
Point halving requires one multiplication and three major operations:
Solving λ^2+λ=a+x3
Computing the trace of t
Computing the square root of t or t+x3
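The steps of (3.21) can be tried out on a toy field. The following Python sketch is only an illustration under assumptions that are not part of this work: it uses GF(2^5) in polynomial basis with the reduction polynomial x^5 + x^2 + 1 instead of a normal basis over GF(2^163), the curve coefficient b = 1 is chosen arbitrarily, and all function names are hypothetical. The quadratic equation is solved with the half-trace, which is valid because the field degree is odd.

```python
# Toy model of point halving (3.21) on y^2 + xy = x^3 + a*x^2 + b over GF(2^5).
# Assumptions for illustration only: polynomial basis, x^5 + x^2 + 1, a = 1, b = 1.
M = 5
POLY = 0b100101  # x^5 + x^2 + 1
A, B = 1, 1


def gf_mul(u, v):
    """Multiply two field elements (bit i is the coefficient of x^i)."""
    r = 0
    while v:
        if v & 1:
            r ^= u
        v >>= 1
        u <<= 1
        if u >> M:
            u ^= POLY
    return r


def gf_inv(u):
    """Inverse as u^(2^M - 2), built from repeated squarings."""
    r = 1
    for _ in range(M - 1):
        u = gf_mul(u, u)
        r = gf_mul(r, u)
    return r


def gf_sqrt(u):
    """Square root as u^(2^(M-1))."""
    for _ in range(M - 1):
        u = gf_mul(u, u)
    return u


def trace(u):
    """Tr(u) = u + u^2 + ... + u^(2^(M-1)); the result is 0 or 1."""
    t = 0
    for _ in range(M):
        t ^= u
        u = gf_mul(u, u)
    return t


def half_trace(u):
    """H(u) = sum of u^(4^i) for i = 0..(M-1)/2; H^2 + H = u when Tr(u) = 0."""
    h = 0
    for _ in range((M + 1) // 2):
        h ^= u
        u = gf_mul(gf_mul(u, u), gf_mul(u, u))
    return h


def double_lambda(x1, lam1):
    """Doubling (3.2) with the lambda-representation (x1, lam1) as input."""
    x3 = gf_mul(lam1, lam1) ^ lam1 ^ A
    y3 = gf_mul(x1, x1) ^ gf_mul(x3, lam1 ^ 1)
    return x3, x3 ^ gf_mul(y3, gf_inv(x3))


def halve_lambda(x3, lam3):
    """Point halving (3.21): (x3, lam3) of 2P -> (x1, lam1) of P."""
    lam_hat = half_trace(A ^ x3)             # step 1
    t = gf_mul(x3, x3 ^ lam3 ^ lam_hat)      # step 2
    if trace(t) == 0:                        # step 3
        return gf_sqrt(t ^ x3), lam_hat
    return gf_sqrt(t), lam_hat ^ 1


def affine_points():
    """All affine points with x != 0 on the toy curve, found by brute force."""
    pts = []
    for x in range(1, 1 << M):
        rhs = gf_mul(gf_mul(x, x), x) ^ gf_mul(A, gf_mul(x, x)) ^ B
        for y in range(1 << M):
            if (gf_mul(y, y) ^ gf_mul(x, y)) == rhs:
                pts.append((x, y))
    return pts


# Doubling any point and halving the result must give a point that doubles back.
for x, y in affine_points():
    doubled = double_lambda(x, x ^ gf_mul(y, gf_inv(x)))
    assert double_lambda(*halve_lambda(*doubled)) == doubled
```

The halved point is not always the point that was doubled: it is the one of the two candidates whose x-coordinate has trace Tr(a)=1, which is the choice made by the theorem above.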
A normal basis is of the form {β^(2^(n−1)), β^(2^(n−2)), ..., β^2, β}. Let c be an element of the field GF(2^n).
By equation (2.3):
$$c = c_{n-1}\beta^{2^{n-1}} + c_{n-2}\beta^{2^{n-2}} + \cdots + c_1\beta^{2} + c_0\beta \qquad (3.22)$$
The trace of c is
$$\mathrm{Tr}(c) = c_{n-1} + c_{n-2} + \cdots + c_1 + c_0 \qquad (3.23)$$
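The square root is a cyclic right shift by one bit, the inverse of squaring.
Now we deal with the solutions of the second degree equation in (3.21). Let c be given as in equation (3.22). There are two ways to solve the second degree equation
$$\lambda^2 + \lambda = c \qquad (3.25)$$
as given below.
Writing λ in the normal basis with coordinates λ_{n−1}, ..., λ_1, λ_0, a solution is given bit by bit by:
$$\lambda_0 = 0, \qquad \lambda_i = \lambda_{i-1} + c_i \quad (1 \le i \le n-1)$$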
These operations are expected to be inexpensive relative to normal basis multiplication.
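The normal basis operations used by point halving can be sketched with plain bit manipulation. The Python fragment below is a minimal illustration, assuming that bit i of an integer holds the coordinate c_i of (3.22); the degree N = 7 and all function names are hypothetical choices made for the example. No multiplication matrix is needed because squaring, square root, trace and the serial quadratic solver only shift and XOR coordinates.

```python
N = 7                       # toy field degree (assumption; this work uses 163)
MASK = (1 << N) - 1


def nb_square(c):
    """Squaring in a normal basis: cyclic left shift by one bit."""
    return ((c << 1) | (c >> (N - 1))) & MASK


def nb_sqrt(c):
    """Square root: cyclic right shift by one bit."""
    return (c >> 1) | ((c & 1) << (N - 1))


def nb_trace(c):
    """Trace (3.23): XOR of all coordinate bits."""
    return bin(c).count("1") & 1


def nb_solve_quadratic(c):
    """Return lam with lam^2 + lam = c, or None when Tr(c) = 1 (no solution)."""
    if nb_trace(c):
        return None
    lam, bit = 0, 0             # lam_0 = 0; the other solution is lam ^ MASK
    for i in range(1, N):
        bit ^= (c >> i) & 1     # lam_i = lam_{i-1} + c_i
        lam |= bit << i
    return lam


# Every element of trace 0 has a solution that satisfies the equation.
for c in range(1 << N):
    lam = nb_solve_quadratic(c)
    if lam is not None:
        assert nb_square(lam) ^ lam == c
```

The solver needs one running bit and one XOR per coordinate, which is why it can be produced serially.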
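Alternatively, we can solve equation (3.25) by the half-trace. For odd n, the half-trace of c is
$$H(c) = \sum_{i=0}^{(n-1)/2} c^{2^{2i}} \qquad (3.28)$$
Substituting equation (3.28) into (3.25) and using equations (2.2) and (2.5) gives
$$H(c)^2 + H(c) = c + \mathrm{Tr}(c)$$
With c=a+x3, whose trace is 0, this becomes
$$H(a + x_3)^2 + H(a + x_3) = \mathrm{Tr}(a + x_3) + a + x_3 = a + x_3 \qquad (3.31)$$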
Table 3.1 compares the operations of point halving with those of point doubling in affine and projective coordinates.
Table 3.1: Comparison between halving and doubling in affine and projective coordinates

| Operations | Doubling (affine coordinates) | Doubling (projective coordinates) | Halving |
|---|---|---|---|
| Multiplication | 2 | 4 | 1 |
| Squaring | 1 | 5 | 0 |
| Inversion | 1 | 0 | 0 |
| Solving second degree equation | 0 | 0 | 1 |
| Square root | 0 | 0 | 1 |
| Check | 0 | 0 | 1 |

If the computation time of 1 second degree equation solving + 1 square root + 1 check is less than that of 3 multiplications + 5 squarings, then halving has a better performance than point doubling in projective coordinates.
Halve-and-Add Algorithm
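Now that we have gone through point halving, we want to employ it in scalar multiplication. Over GF(2^n), we are given a point P of odd order r on the elliptic curve and a scalar k.
In order to compute kP, we use the following result [6]:
For every scalar k, we can find k' such that
$$2^{n-1} k \equiv \sum_{i=0}^{n-1} k'_i\, 2^{i} \pmod{r}, \qquad k'_i \in \{0, 1\} \qquad (3.32)$$
Dividing both sides by 2^(n−1) gives the result:
$$k \equiv \sum_{i=0}^{n-1} \frac{k'_i}{2^{\,n-1-i}} \pmod{r}, \qquad k'_i \in \{0, 1\} \qquad (3.33)$$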
Next is a left-to-right version of the halve-and-add algorithm, where k is first converted to k' by equation (3.33). Over GF(2^n), the input is k' and P, and the output is kP. P is halved repeatedly, and the current P is added to Q whenever the corresponding bit of k' is 1. Q may be kept in projective coordinates, in which case Q+P is computed by (2.41).
For example, let the field be GF(2^4) and r=11="1011". Given P and a scalar k=10, compute kP.
First we convert k using equation (3.33):
k' = 2^3·10 mod 11 = 80 mod 11 = 3 = "0011"
We also need the value of P/2 (mod 11) in this example. We have 2^(−1) (mod 11)=6, since 6·2 (mod 11)=12 (mod 11)=1. For any integer x, x/2 (mod 11)=x·6 (mod 11). From (3.35), using 3P=14P (mod 11) in the last halving:
k' = "0 0 1 1"
P: P → P/2=6P → 6P/2=3P → 14P/2=7P
Q: O → O → O+3P=3P → 3P+7P=10P
The result is the same as the one computed from (3.1).
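The worked example can be replayed with a small Python model. This is a sketch under a simplifying assumption: a point cP is represented only by the integer c modulo r, so halving becomes multiplication by the inverse of 2 modulo r and no curve arithmetic is involved. The function names are hypothetical, and the loop order is inferred from the example above.

```python
def recode(k, r, n):
    """k' = 2^(n-1) * k mod r, as in the worked example (requires r < 2^n)."""
    return (k << (n - 1)) % r


def halve_and_add_ltr(k, r, n):
    """Left-to-right halve-and-add; returns c such that the result is cP."""
    half = (r + 1) // 2          # inverse of 2 modulo the odd order r
    k_prime = recode(k, r, n)
    p, q = 1, 0                  # P = 1*P, Q = O
    for i in range(n - 1, -1, -1):
        if (k_prime >> i) & 1:
            q = (q + p) % r      # Q = Q + P
        p = (p * half) % r       # P = P/2
    return q


print(recode(10, 11, 4))             # 3, that is "0011"
print(halve_and_add_ltr(10, 11, 4))  # 10, that is kP = 10P
```

In this model P passes through 1, 6, 3 and 7, matching the sequence P, 6P, 3P, 7P of the example.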
Another version of the halve-and-add algorithm is a right-to-left method (3.36). Point halving is applied to the accumulator Q, hence projective coordinates cannot be used for Q.
Use the same setting, GF(2^4) and r=11="1011". Given P and a scalar k=10, that is, k'="0011", start from the rightmost bit and proceed to the left:
k' = "0 0 1 1"
9P/2=10P ← 7P/2=9P ← P/2+P=6P+P=7P ← O+P=P ← O=Q
The final answer is 10P. Unlike algorithm (3.35), this version requires only one register, for Q.
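The right-to-left trace above can be reproduced in the same way. The Python sketch below again assumes that a point cP is modelled by the integer c modulo r, so that Q/2 is a multiplication by the inverse of 2 modulo r; the function name is hypothetical and the loop order is inferred from the example.

```python
def halve_and_add_rtl(k_prime, r, n):
    """Right-to-left halve-and-add on the recoded scalar k'; returns c for cP."""
    half = (r + 1) // 2          # inverse of 2 modulo the odd order r
    q = 0                        # Q = O
    for i in range(n):           # bit 0 (rightmost) first
        q = (q * half) % r       # Q = Q/2 (no effect while Q = O)
        if (k_prime >> i) & 1:
            q = (q + 1) % r      # Q = Q + P
    return q


print(halve_and_add_rtl(0b0011, 11, 4))  # 10
```

Only the accumulator changes during the loop, which reflects the single-register property noted above.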
We can further encode the scalar k or k' of the halve-and-add algorithm to reduce its Hamming weight, and hence the number of point additions needed to compute kP.
Since point addition is more expensive than point doubling or halving, the performance of scalar multiplication is improved. The add-and-subtract algorithm [2] eliminates runs of consecutive 1's by combining additions and subtractions. Given an n-bit scalar k, it finds e such that
$$k = \sum_{i=0}^{n} e_i 2^{i}, \qquad e_i \in \{-1, 0, 1\} \qquad (3.37)$$
Using the add-and-subtract algorithm, we find e as follows:
Let k_{n−1}k_{n−2}...k_1k_0 be the binary representation of k.
Let h_nh_{n−1}...h_1h_0 be the sum k_{n−1}k_{n−2}...k_1k_0 + k_{n−1}k_{n−2}...k_1.
Let g_ng_{n−1}...g_1g_0 equal 00k_{n−1}k_{n−2}...k_1.
for i from 0 to n {
if hi=1 and gi=0, then ei=1
else if hi=0 and gi=1, then ei=-1
else ei=0
}
(3.38)
Take k=29="11101" for example. h="11101"+"1110"="101011"
h= "1 0 1 0 1 1"
g= "0 0 1 1 1 0"
e= "1 0 0 -1 0 1"
It is easy to verify that:
$$29 = 2^5 - 2^2 + 2^0$$
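The recoding (3.38) translates directly into a few lines of Python. This is a minimal sketch with a hypothetical function name; it returns the digits least significant first, so the list for k=29 reads in the reverse order of the string e shown above.

```python
def add_subtract_recode(k):
    """Return digits e_0..e_n (least significant first) with sum(e_i * 2^i) = k."""
    n = max(k.bit_length(), 1)
    g = k >> 1                  # g = 0 0 k_{n-1} ... k_1
    h = k + g                   # h = k + (k >> 1)
    digits = []
    for i in range(n + 1):
        h_i, g_i = (h >> i) & 1, (g >> i) & 1
        digits.append(h_i - g_i)    # 1 for (1,0), -1 for (0,1), otherwise 0
    return digits


e = add_subtract_recode(29)
print(e)                                        # [1, 0, -1, 0, 0, 1]
print(sum(d << i for i, d in enumerate(e)))     # 29
```

For k=29 the number of nonzero digits drops from four to three, so one point addition is saved.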
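Combining the add-and-subtract algorithm with (3.35) gives algorithm (3.39), in which P is added to Q when ei=1 and −P is added to Q when ei=−1.
−P is given by (2.36). The add-and-subtract algorithm can be combined with (3.1) or (3.36) in the same way.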
CHAPTER 4
Implementation Results and Comparisons
My implementation uses a pseudo-random curve in normal basis over GF(2^163), of the form
$$y^2 + xy = x^3 + x^2 + b \qquad (4.1)$$
The normal basis is of type 4, which is not an optimal normal basis. The base point is P=(Px, Py), with
$$P_x = x_{162}\beta^{2^{162}} + x_{161}\beta^{2^{161}} + \cdots + x_1\beta^{2} + x_0\beta \qquad (4.2)$$
$$P_y = y_{162}\beta^{2^{162}} + y_{161}\beta^{2^{161}} + \cdots + y_1\beta^{2} + y_0\beta \qquad (4.3)$$
Px and Py are expressed as the 163-bit numbers x162x161...x1x0 and y162y161...y1y0. Their values in hexadecimal are
Px=0_bb95_2eb0_8fc0_b1c8_699f_739a_9357_3474_1e04_4460 (4.4)
Py= 7_f185_6ef0_98cf_adc8_077e_e437_33a7_f113_1e41_ae66 (4.5)
If P is in λ-representation, then
Pλ= 3_e6c0_a681_341a_b0a3_6cc5_c338_7bff_ea7e_014f_a6a3 (4.6)
The value of coefficient b in equation (4.1) is
b= 6_fcde_3c9e_f967_437b_e459_b1ce_438e_3479_a9e7_d133 (4.7)
r=5846006549323611672814742442876390689256843201587 (4.8)
The number of points on elliptic curve is 2r.
The fundamental element of the entire circuit is the GF(2^163) normal basis serial multiplier.
Let the inputs be as in (2.5) and the output as in (2.6). Using the algorithm in [2], we derive the product:
$$c_0 = a_1(b_0 + b_{13} + b_{132} + b_{117}) + a_2(b_{117} + b_{92} + b_{111} + b_{145}) + \cdots \qquad (4.9)$$
The formulas for the other coordinates are derived from the above by increasing every subscript by one:
$$c_1 = a_2(b_1 + b_{14} + b_{133} + b_{118}) + a_3(b_{118} + b_{93} + b_{112} + b_{146}) + \cdots$$
$$c_2 = a_3(b_2 + b_{15} + b_{134} + b_{119}) + a_4(b_{119} + b_{94} + b_{113} + b_{147}) + \cdots$$
$$\vdots$$
We can implement this using three registers to store the inputs A and B and the output C. Equation (4.9) is implemented once, and the three registers are cyclically shifted by one bit at each cycle, so the product is generated bit by bit. The circuit diagram is given below:
Figure 4.1: Normal Basis Multiplier version 1
The combinational circuit at the input of c0 is not shown; only the idea of the connection is given. The latency of this multiplier is 163 cycles, and c0 has a large fan-in. We can modify the above multiplier so that one term is added at each cycle [9]. For example:
) ( 0 13 132 117
1 1
2 c a b b b b
c = + + + +
) ( 119 94 113 147
4 2
3 c a b b b b
c = + + + +
#
The following is the multiplication cell for adding one term at each cycle:
Figure 4.2: Multiplier element of ck
Modifying the original multiplier, we get:
This is a conceptual diagram showing the difference in wiring. The fan-in of the output register is reduced. Another benefit of this multiplier is that the register C can be set to an initial value, say D. The final output then equals A*B+D, which gives the effect of a multiply-and-accumulate (MAC) unit.
The solution of the second degree equation is given by equation (3.27). This can be easily implemented using a one bit register and an exclusive-or. Since the solution is given out serially, we can modify the above multiplier by adding each ai term of the product at each cycle. For example,
( )
(
111 145 117 92)
2 1
132 0 117 13 1
0
b b b b a c c
b b b b a c c
+ + + +
=
+ + + +
=
#
Using cells similar to those in Figure 4.2, the new normal basis multiplier is
Figure 4.4: Serial input normal basis multiplier
Combining the solution circuit with the serial-input normal basis multiplier results in an efficient implementation.
The input of point halving is in λ-representation. For the implementation of point halving, a normal basis multiplier is used. The second degree equation is solved by the half-trace as given by equation (3.31). The trace of t is obtained by XORing every bit of t. Since only one multiplication is required, the overall latency is 163 cycles. The architecture of point halving is given below. Let the input be 2P=(x3, λ3); the output is P=(x1, λ1).
Figure 4.5: Circuit for point halving
The procedure of point halving is:
Figure 4.6: Point halving flow
The coefficient a of the pseudo-random curve is always equal to one. One, the multiplicative identity, is represented in a normal basis by a number in which every bit is 1. The right-hand side of equation (3.3) therefore equals:
$$a + x_3 = 1 + x_3 = \overline{x_3}$$
That is, XORing each bit of x3 with 1 is the same as inverting each bit.
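As a small illustration of this observation, the following Python fragment forms a+x3 for a=1 by complementing a 163-bit value. It assumes that field elements are stored as integers whose bits are the normal basis coordinates; the names are hypothetical.

```python
N = 163
ALL_ONES = (1 << N) - 1      # the field element 1 in a normal basis


def rhs_of_quadratic(x3):
    """a + x3 with a = 1: flip every coordinate bit of x3."""
    return x3 ^ ALL_ONES


x3 = 0b1011
assert rhs_of_quadratic(x3) == ALL_ONES - x3   # bitwise complement on N bits
assert rhs_of_quadratic(rhs_of_quadratic(x3)) == x3
```

In hardware this corresponds to one inverter per bit, with no adder involved.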
In order to implement scalar multiplication efficiently, algorithm (3.39) is chosen. Since point addition in projective coordinates requires no inversion, the accumulator Q of (3.39) is kept in projective coordinates. The point addition Q+P or Q−P therefore has Q in projective coordinates and P in λ-representation, P=(X1, λ1). Formula (2.41) is modified accordingly using (3.5).
My implementation of (2.41) contains three multipliers. Because of the data dependencies, the values computed by each multiplication are scheduled as follows to achieve minimum latency. The data dependencies are indicated.
Table 4.1: The data flow of mixed-coordinate addition (5.10)
As the table shows, the latency of this mixed-coordinate addition equals that of 4 multiplications, which is 4*163 cycles.
The following is the circuit diagram of the mixed-coordinate addition. Each multiplier in the diagram has three inputs: two for the multiplication and one for the accumulation.
The neg signal selects the addition of −P to Q. The ini signal indicates the initial condition O+P=P, that is, X3=X2, Y3=Y2 or X2+Y2, and Z3=1.
Figure 4.7: Circuit for mixed-coordinate addition
My proposed design is a scalar multiplication circuit based on algorithm (3.39). It is composed of the point halving circuit and the point addition circuit, plus some control signals.
The inputs are k', which is derived from k as shown in (3.33), and the base point P. The output is kP. k' is first encoded into e as in (3.39). Following (3.38), the encoding logic uses two shift registers to store g and h. The shift registers shift by one bit each time a point halving completes. The most significant bits of the g and h registers decide whether the input to the point addition circuit is P or −P. Since there are separate registers for the accumulator Q and for P, the halving circuit and the addition circuit can operate at the same time. Because their latencies are different, the halving circuit must hold its output until the addition circuit reads the result. The point addition circuit adds P or −P to the accumulator when ei is 1 or −1. The control flow of the whole circuit is:
Figure 4.8: The control of point halving and projective addition
The synthesis results are given below. The cycle time is set to 5 ns and the synthesis standard library is a 0.18 μm technology.
Table 4.2: The synthesized results

| Circuit | Gate count |
|---|---|
| Multiplier | 6961 |
| Halving | 14321 |
| Addition | 45723 |
| Scalar multiplication | 77100 |
The average latency of scalar multiplication is about 37000 cycles at a clock frequency of 200 MHz, so the throughput is 2*163*200 MHz/37000=1.76 Mbit/s.
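The throughput figure can be checked with a few lines of Python. The numbers are those quoted in this chapter; the function name is hypothetical, and the factor 2*163 is simply taken from the formula above.

```python
FIELD_BITS = 163
CLOCK_HZ = 200e6             # 5 ns cycle time
LATENCY_CYCLES = 37_000      # average latency of one scalar multiplication


def throughput_bits_per_second(bits_per_op, clock_hz, cycles_per_op):
    """Bits produced per second when one operation takes cycles_per_op cycles."""
    return bits_per_op * clock_hz / cycles_per_op


mbit = throughput_bits_per_second(2 * FIELD_BITS, CLOCK_HZ, LATENCY_CYCLES) / 1e6
print(f"{mbit:.2f} Mbit/s")  # prints 1.76 Mbit/s
```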
The design is verified on an integrated FPGA system called iProve. This system allows the outputs from the FPGA to be displayed directly in ModelSim. The FPGA chip is a Xilinx Virtex2 XC2V8000. The synthesis frequency is set to 90 MHz and the total number of LUTs is 8815.
The tables below compare implementations of Elliptic Curve Cryptosystems.
We can see that our design has about the same throughput as [12] while its area is smaller.
Table 4.3: The performance comparison of Elliptic Curve Cryptosystems implementations on ASIC

| Authors | Huang [10] | Okada [11] | Bai [12] | Daneshbeh [13] | Sozzani [14] | Proposed |
|---|---|---|---|---|---|---|
| Technology | 0.35 μm | 0.25 μm | 0.18 μm | 0.18 μm | 0.13 μm | 0.18 μm |
| Field | GF(2^251) | GF(2^163) | GF(2^233) | GF(2^163) | GF(2^163) | GF(2^163) |
| Gate counts | 56K | 165K | 120K | 74K | ? | 77K |
| Clock rate | 100 MHz | 66 MHz | 100 MHz | 700 MHz | 400 MHz | 200 MHz |
| Latency for kP (cycles) | ? | ? | ? | 212,552 | 11,320 | 37,000 |
| Processor | Y | Y | N | Y | Y | N |
| Algorithm for kP | Montgomery (affine) | ? | Montgomery | Double-and-Add (serial) | Montgomery (parallel) | Halve-and-Add |
| Basis | Poly | Poly | Poly | Poly | Poly | Normal |
| Throughput | 91 Kb/s | 501 Kb/s | 1.86 Mb/s | 1.1 Mb/s | 12 Mb/s | 1.76 Mb/s |
Table 4.4: The performance comparison of Elliptic Curve Cryptosystems implementations on FPGA

| Authors | Orlando & Paar [15] | Gura [16] | Lutz [17] | Proposed |
|---|---|---|---|---|
| Platform | Xilinx XCV400E | Xilinx XCV2000E | Xilinx XCV2000E | Xilinx XC2V8000 |
| Technology | 0.18 μm | 0.18 μm | 0.18 μm | 0.15/0.12 μm |
| Field | GF(2^167) | GF(2^163) | GF(2^163) | GF(2^163) |
| LUTs | 3002 | 19508 | 10017 | 8815 |
| FFs | 1769 | 6442 | 1930 | N/A |
| Processor | Y | Y | Y | N |
| Clock rate | 76 MHz | 66 MHz | 66 MHz | 90 MHz |
| Algorithm for kP | Montgomery | Montgomery | τ-NAF | Halve-and-Add |
CHAPTER 5
Conclusion
In this paper, an implementation of Elliptic Curve Cryptosystems is presented. The architecture uses point halving to reduce the computational complexity. Point halving requires only one multiplier and some addition circuits, so the double-and-add algorithm can be replaced by halve-and-add algorithms.
The normal basis multiplier in the implementation is a serial multiplier. The projective addition circuit contains three multipliers; its latency equals 4 times the latency of a multiplier, and no inversion over the finite field is required. The input scalar is encoded for use with halve-and-add. The Hamming weight of the input is further reduced using the add-and-subtract algorithm. The halving circuit and the projective addition circuit can work in parallel when there is no data dependency between them.
The implementation is synthesized using a synthesis library of 0.18 μm technology. A Xilinx Virtex2 (XC2V8000) is used to verify the implementation.
APPENDIX
Elliptic Curve Cryptosystems
In elliptic curve cryptosystems, we need to map a message onto a point on an elliptic curve.
The cryptosystem then operates on that point to yield a new point that serves as the ciphertext. The idea of the mapping method is the following. Let equation (2.15) be the elliptic curve. The message m is first assigned as the x-coordinate of a point, x=m. However, there is only about a 1/2 chance that there exists a solution y such that
$$y^2 \equiv x^3 + ax + b \pmod{p} \qquad (a.1)$$
Therefore, we append a few bits at the end of m and try every pattern of these bits until there is a solution to equation (a.1). Namely, let K be a large integer so that the failure rate 1/2^K of mapping a message to a point on the elliptic curve is low. Suppose that
(m+1)K<p (a.2)
Represent the message m as
x=mK+j, where 0≤ j<K (a.3)
For j=0, 1, …, K-1, try to find a solution y of (a.1). If a solution y exists, then the message m is mapped to Pm=(x, y) and we can stop. Otherwise, increase j by one and use the new x to look for a solution again. If we cannot find any solution for j=0 to K-1, then we have failed to map the message to a point. The message is recovered from the point by
$$m = \lfloor x / K \rfloor \qquad (a.4)$$
For example, let the message be m=5, p=179 and the elliptic curve be y^2=x^3+2x+7. Pick K=10, so the failure rate is 1/2^10, which is acceptable. x=mK+j=50+j, so x=50, 51, …, 59. For x=51 we get x^3+2x+7 ≡ 121 (mod 179), thus y=11. The message m is mapped to the point (51, 11) and can be recovered by m=⌊51/10⌋=5.
For an elliptic curve over GF(2^n) of the form (2.16), the steps of representing the message m are the same. Let the message m have t bits; we append a u-bit number j to the end of m, with t+u ≤ n.
The message m is represented as x=m·2^u+j. For j=0, 1, …, 2^u−1, try to find a solution y of (2.16). If a solution is found we take Pm=(x, y); otherwise we increase j and try again. Solving for y from (2.16) given x is explained in [8].
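The prime-field mapping procedure, with the example m=5, p=179, K=10, can be sketched in Python as follows. The function names are hypothetical, and the square root is found by brute force, which is only reasonable for a tiny prime such as the one in the example.

```python
def map_message(m, p, a, b, K):
    """Try x = m*K + j for j = 0..K-1 until x^3 + a*x + b is a square mod p."""
    assert (m + 1) * K < p              # condition (a.2)
    for j in range(K):
        x = m * K + j
        rhs = (x * x * x + a * x + b) % p
        for y in range(p // 2 + 1):     # brute-force root search, tiny p only
            if (y * y) % p == rhs:
                return x, y
    return None                         # mapping failed for every j


def recover_message(x, K):
    """Recover m from the x-coordinate, as in (a.4)."""
    return x // K


point = map_message(5, 179, 2, 7, 10)
print(point)                            # (51, 11)
print(recover_message(point[0], 10))    # 5
```

The candidate x=50 is skipped because its right-hand side, 165, is not a square modulo 179, so the first hit is x=51.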
Elliptic Curve Cryptosystems rely on the difficulty of solving the discrete logarithm problem for elliptic curves, which is described as follows: given two points P and Q on an elliptic curve, find k such that Q=kP [7].
a.1 Elliptic Curve ElGamal Cryptosystem
The Elliptic Curve ElGamal Cryptosystem, a public key system, is one popular application of elliptic curve cryptography. The public key is used to encrypt plaintext and the private key is used to decrypt ciphertext. Suppose Alice wants to send a message to Bob. Bob chooses an elliptic curve (2.15), where p is a large prime. He also chooses a point P and a scalar k, which is his private key. He computes
$$Q = kP \qquad (a.5)$$
The points Q and P are Bob's public key. Alice represents her message as a point x on the elliptic curve (2.15). She also chooses a private integer a and computes
$$y_1 = aP \quad \text{and} \quad y_2 = x + aQ \qquad (a.6)$$
The additions and subtractions here are point operations. She sends y1 and y2 to Bob. Bob can recover x by calculating
$$y_2 - k y_1 = (x + aQ) - kaP = x + akP - kaP = x \qquad (a.7)$$
Next is an example of the Elliptic Curve ElGamal Cryptosystem. Let the point be P=(4,11) and the elliptic curve be y^2 ≡ x^3+3x+45 (mod 8831). Alice's message is represented as the point Pm=(5, 1743). She wants to send the message to Bob.
Bob has a private key k=3 and computes Q=kP=(413, 1808). Q is made public. Alice takes Bob's public key Q and chooses a random number a=8. She computes y1=aP=(5415, 6321) and y2=Pm+aQ=(6626, 3576) and sends (y1, y2) to Bob.
To decrypt (y1, y2), Bob first calculates ky1=3(5415, 6321)=(673, 146) and subtracts this from y2:
(6626, 3576)−(673, 146)=(6626, 3576)+(673, −146)=(5, 1743)
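The example can be replayed with a short Python sketch of affine point arithmetic over GF(p). The helper names are hypothetical, the point at infinity is represented by None, and the parameters are the ones of the example. This is an illustration of the protocol steps, not a secure implementation.

```python
P_MOD, A_COEF = 8831, 3      # curve y^2 = x^3 + 3x + 45 (mod 8831)


def ec_add(p1, p2):
    """Add two affine points; None stands for the point at infinity O."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A_COEF) * pow(2 * y1 % P_MOD, P_MOD - 2, P_MOD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, P_MOD - 2, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return x3, (lam * (x1 - x3) - y1) % P_MOD


def ec_neg(pt):
    """Negate a point: (x, y) -> (x, -y)."""
    return None if pt is None else (pt[0], (-pt[1]) % P_MOD)


def ec_mul(k, pt):
    """Double-and-add scalar multiplication."""
    acc = None
    while k:
        if k & 1:
            acc = ec_add(acc, pt)
        pt = ec_add(pt, pt)
        k >>= 1
    return acc


P = (4, 11)
k = 3                                    # Bob's private key
Q = ec_mul(k, P)                         # Bob's public key
message = (5, 1743)                      # Alice's message point Pm
a = 8                                    # Alice's random integer
y1, y2 = ec_mul(a, P), ec_add(message, ec_mul(a, Q))     # encryption (a.6)
decrypted = ec_add(y2, ec_neg(ec_mul(k, y1)))            # decryption (a.7)
print(Q, decrypted)
```

Decryption returns the message point for any choice of a, because aQ and k(aP) are the same point.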
a.2 Elliptic Curve Diffie-Hellman Key Exchange
Another useful system is the Elliptic Curve Diffie-Hellman Key Exchange, which can be used to establish a shared key for a private key system. Alice and Bob want to exchange a key. They agree on a public elliptic curve and a point P on it. Alice chooses a secret integer a and Bob chooses a secret integer b, and they exchange aP and bP. In the example,
aP=(1794,6375) and bP=(3861, 1242)
Alice takes bP and multiplies it by a to get the key
a(bP)=12(3861, 1242)=(1472,2098)
In the same way, Bob takes aP and computes b(aP):
b(aP)=b(1794, 6375)=(1472,2098)
Now they have the same key.
a.3 Elliptic Curve Digital Signature Algorithm
A signature scheme works in the opposite direction to a public key encryption system: the signer uses the private key to sign, and others use the public key to verify the signature. Next is the Elliptic Curve Digital Signature Algorithm:
Let p be a prime and let E be an elliptic curve defined over GF(p).
Let P be a point on E having prime order q, and define
K=(p, q, E, P, m, Q), where Q=mP
p, q, E, P and Q are the public key and m is the private key.
For K=(p, q, E, P, m, Q) and a random number k, define
sigK(x, k)=(r, s), where
kP=(u, v)
r=u mod q (a.8)
s=k^(−1)(SHA-1(x)+mr) mod q
Verification is given below:
w=s^(−1) mod q
i=w·SHA-1(x) mod q
j=wr mod q
(u, v)=iP+jQ
verK(x, (r, s)) is true if and only if
u mod q=r
For example, let E: y^2 ≡ x^3+x+6 (mod 11) with p=11, q=13, P=(2,7), m=7 and Q=(7,2). Suppose the message x has SHA-1(x)=4, and Alice signs the message with the random value k=3. She computes:
(u, v)=3(2, 7)=(8, 3)
r=u mod 13=8, and s=3^(−1)(4+7*8) mod 13=7
(8, 7) is the signature.
Bob verifies the signature by
w=7^(−1) mod 13=2
i=2*4 mod 13=8
j=2*8 mod 13=3
(u, v)=8P+3Q=(8, 3)
u mod 13=8=r.
Then the signature is verified.
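The signing and verification steps can be checked on the toy parameters of the example with the following Python sketch. The function names are hypothetical, the hash value 4 is taken from the example instead of computing SHA-1, and the point at infinity is represented by None. Parameters this small offer no security and serve only to follow the arithmetic.

```python
P_MOD, A_COEF, Q_ORDER = 11, 1, 13   # E: y^2 = x^3 + x + 6 (mod 11), q = 13


def ec_add(p1, p2):
    """Add two affine points; None stands for the point at infinity O."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A_COEF) * pow(2 * y1 % P_MOD, P_MOD - 2, P_MOD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, P_MOD - 2, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return x3, (lam * (x1 - x3) - y1) % P_MOD


def ec_mul(k, pt):
    """Double-and-add scalar multiplication."""
    acc = None
    while k:
        if k & 1:
            acc = ec_add(acc, pt)
        pt = ec_add(pt, pt)
        k >>= 1
    return acc


def sign(digest, k, m, base):
    """Return (r, s) for a given hash value, random k and private key m."""
    u, _ = ec_mul(k, base)
    r = u % Q_ORDER
    s = pow(k, Q_ORDER - 2, Q_ORDER) * (digest + m * r) % Q_ORDER
    return r, s


def verify(digest, sig, base, pub):
    """Check u mod q == r for (u, v) = iP + jQ."""
    r, s = sig
    w = pow(s, Q_ORDER - 2, Q_ORDER)
    i, j = w * digest % Q_ORDER, w * r % Q_ORDER
    pt = ec_add(ec_mul(i, base), ec_mul(j, pub))
    return pt is not None and pt[0] % Q_ORDER == r


P, m = (2, 7), 7
Q = ec_mul(m, P)                     # public key Q = mP
sig = sign(4, 3, m, P)               # hash value 4, random k = 3
print(Q, sig, verify(4, sig, P, Q))  # (7, 2) (8, 7) True
```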
BIBLIOGRAPHY
[1] “Certicom ECC FAQ”, http://www.certicom.com/index.php?action=ecc,ecc_faq
[2] IEEE Std 1363-2000, IEEE standard specifications for public-key cryptography, IEEE Computer Society, August 29, 2000.
[3] Douglas R. Stinson, Cryptography: Theory and Practice, Second edition, Chapman & Hall/CRC, 2002.
[4] J. Lopez and R. Dahab, “Improved algorithms for elliptic curve arithmetic in GF(2^n)”, Selected Areas in Cryptography - SAC '98, LNCS 1556, 1999, 201-212.
[5] K. Fong, D. Hankerson, J. Lopez, and A. Menezes. “Field Inversion and Point Halving Revisited". IEEE Transactions on Computers, 53(8):1047-1059, August 2004.
[6] E. Knudsen, “Elliptic scalar multiplication using point halving", Advances in Cryptology - Asiacrypt '99, LNCS 1716, 1999, 135-149.
[7] W. Trappe and L.C. Washington: Introduction to Cryptography with Coding Theory, Prentice Hall, 2001.
[8] ANSI X9.62, Public Key Cryptography for the Financial Services Industry: The Elliptic Curve Digital Signature Algorithm (ECDSA), 1998.
[9] Philip H. W. Leong and Ivan K. H. Leung, “A microcoded elliptic curve processor using FPGA technology”, IEEE Transactions on VLSI Systems, 10(5), October 2002.
[11] Souichi Okada, Naoya Torii, Kouichi Itoh, and Masahiko Takenaka, “Implementation of elliptic curve cryptographic coprocessor over GF(2^m) on an FPGA”, Cryptographic Hardware and Embedded Systems (CHES), pages 25–40, Springer-Verlag, 2000.
[12] Guoqiang Bai, Zhun Huang, Hang Yuan, Hongyi Chen, Ming Liu, Gang Chen, Tao Zhou, and Zhihua Chen, “A high performance VLSI chip of the elliptic curve cryptosystems”, 7th Int. Conf. SICT, pp. 2059-2062, Oct. 2004.
[13] A. Daneshbeh and M. Hasan, “Area Efficient High Speed Elliptic Curve Cryptoprocessors for Random Curves”, Proceedings of ITCC 04, Las Vegas, NV, USA, 2004.
[14] F. Sozzani, G. Bertoni, S. Turcato, and L. Breveglieri, “A parallelized Design for an Elliptic Curve Cryptosystem Coprocessor”, Proceedings of ITCC 05, 2005.
[15] G. Orlando and C. Paar, “A high-performance reconfigurable elliptic curve processor for GF(2^m)”, Cryptographic Hardware and Embedded Systems (CHES), 2000.
[16] N. Gura, S. C. Shantz, H. Eberle, S. Gupta, V. Gupta, D. Finchelstein, E. Goupy, and D. Stebila, “An end-to-end systems approach to elliptic curve cryptography”, Cryptographic Hardware and Embedded Systems (CHES), 2002.
[17] J. Lutz and A. Hasan, “High Performance FPGA based Elliptic Curve Cryptographic Co-Processor”, Proceedings of ITCC 04, Las Vegas, NV, USA, 2004.