Proposed Architectures - Design and Implementation of an Elliptic Curve Arithmetic Unit Resistant to Simple Power Analysis

In this chapter, a bottom-up illustration of the proposed architectures is presented.

Section 5.1 describes the architecture of the universal dual-field Galois field arithmetic unit. Section 5.2 describes the universal dual-field elliptic curve scalar multiplier, and Section 5.3 describes the universal dual-field elliptic curve arithmetic unit. These designs support any field length shorter than the given maximum lengths. Both prime field, GF (p), and binary extension field, GF (2^m), applications are included. The background knowledge and mathematical theorems are covered in earlier chapters.

In this thesis, all hardware designs are coded in Verilog HDL (hardware description language) and synthesized for both an application-specific integrated circuit (ASIC) and field-programmable gate arrays (FPGAs). The designs are implemented with the UMC¹ 0.18-µm CMOS process and the Synopsys² Design Compiler, and the FPGA platform is the Xilinx³ Virtex-4 XC4VLX160.

¹ United Microelectronics Corporation. The SoC solution foundry. http://www.umc.com

² Synopsys, Inc. The developer of EDA tools. http://www.synopsys.com

³ Xilinx, Inc. The developer and fabless manufacturer of FPGAs. http://www.xilinx.com

5.1 Galois Field Arithmetic Unit

An elliptic curve cryptosystem uses four field operations: modular addition, modular subtraction, Montgomery multiplication, and Montgomery division. Therefore, an area-efficient universal Galois field arithmetic unit (GFAU) is proposed to meet this requirement. "Universal" here means that the field length is an input to the design, and any length smaller than 512 bits is permitted. Among these operations, Montgomery division is the most complicated and takes the most iterations. The central question of this section is therefore how to integrate the other operations into the Montgomery division hardware. Returning to Algorithm 3.9, the Montgomery division flow is shown below:

Figure 5.1: Flow chart of the Montgomery division algorithm.

The Montgomery division algorithm can be separated into three parts:

1. EEA: Bit-wise reduce U and V until V = 0.

2. RECOVER: Divide R by 2 until k = m.

3. NEGATE : Negate R to get the final result.
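The three parts can be checked against a small software model. The sketch below is a Python reference model under stated assumptions, not Algorithm 3.9 verbatim. It assumes Kaliski-style update rules for U, V, R, and S. A conditional subtraction of p keeps R and S in [0, p), mirroring the MSB-controlled multiplexers of the divider. It also takes m as the bit length of an odd prime p and returns A·B⁻¹·2^m mod p. The names `mont_div`, `dbl`, and `add` are hypothetical.

```python
def mont_div(a: int, b: int, p: int) -> int:
    """Reference model: a * b^-1 * 2^m mod p, with m = bit length of the odd prime p."""
    if not 0 < b < p:
        raise ValueError("divisor must satisfy 0 < b < p")
    a %= p
    if a == 0:
        return 0  # a zero operand needs no iterations

    def dbl(x: int) -> int:
        # 2x followed by a conditional subtraction of p (keeps x in [0, p))
        return 2 * x - p if 2 * x >= p else 2 * x

    def add(x: int, y: int) -> int:
        # x + y followed by a conditional subtraction of p
        return x + y - p if x + y >= p else x + y

    m = p.bit_length()
    u, v, r, s, k = p, b, 0, a, 0
    # EEA: bitwise reduce U and V until V = 0 (one pass ~ one hardware cycle)
    while v > 0:
        if u % 2 == 0:
            u, s = u // 2, dbl(s)
        elif v % 2 == 0:
            v, r = v // 2, dbl(r)
        elif u > v:
            u, r, s = (u - v) // 2, add(r, s), dbl(s)
        else:
            v, s, r = (v - u) // 2, add(s, r), dbl(r)
        k += 1
    # Now R = -a * b^-1 * 2^k (mod p); since U + V at least halves per step, k >= m.
    # RECOVER: divide R by 2 modulo p until k = m
    while k > m:
        r = r // 2 if r % 2 == 0 else (r + p) // 2
        k -= 1
    # NEGATE: R = p - R (R is nonzero because a is nonzero)
    return p - r
```

For example, with p = 7 (m = 3), `mont_div(3, 5, 7)` returns 2, which equals 3 · 5⁻¹ · 2³ mod 7.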

These three parts are the main states of the finite state machine of the Montgomery divider. In part EEA, one subtracter handles both (U − V)/2 and (V − U)/2. The most significant bit (MSB) of U − V determines whether the result of U − V should be negated.

R + S, 2R, and 2S, together with the conditional subtractions on R and S in step 2.5 and step 2.6, are realized by one carry-save adder (CSA), three adders, and two multiplexers. The multiplexers are controlled by the MSB of 2R − p or 2S − p, and by the MSB of R + S − p, respectively. With these elements, each iteration of part EEA completes in one cycle. In part RECOVER, R = (R + p)/2 simply reuses one adder, and in part NEGATE, R = p − R reuses the single subtracter. Note that in the 2's complement number system, −p can be derived by adding 1 to the 1's complement of p, that is:

$$-p = 2^m - p = (2^m - 1 - p) + 1 = \bar{p} + 1 \qquad (5.1)$$

Since p is odd, $\bar{p}$ is always even, so adding 1 to $\bar{p}$ simply turns the least significant bit (LSB) of $\bar{p}$ from 0 to 1. Therefore negating p only requires inverting every bit of p except the LSB, which saves an incrementer. From the above analysis, a 514-bit Montgomery divider requires one 514-bit CSA, four 514-bit CPAs (including one 514-bit subtracter), one 514-bit negater, one 10-bit incrementer, and one 10-bit decrementer.
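Both two's complement tricks (the sign-controlled shared subtracter and the incrementer-free negation of Eq. (5.1)) can be checked in a few lines. This is a minimal sketch that assumes a 514-bit datapath width `W`. The names `W`, `shared_subtract`, and `negate_odd` are hypothetical, not signals of the actual design.

```python
from typing import Tuple

W = 514  # assumed datapath width, matching the 514-bit registers


def shared_subtract(u: int, v: int, w: int = W) -> Tuple[int, int]:
    """One w-bit subtracter serving both U - V and V - U.

    Returns (|u - v|, msb), where msb is the sign bit of the two's complement
    difference u - v. The sign bit is only meaningful while |u - v| < 2**(w - 1).
    """
    mask = (1 << w) - 1
    diff = (u - v) & mask        # w-bit two's complement of u - v
    msb = diff >> (w - 1)        # 1 means u < v
    if msb:
        diff = (-diff) & mask    # negate to obtain v - u
    return diff, msb


def negate_odd(p: int, w: int = W) -> int:
    """-p mod 2**w for odd p without an incrementer (Eq. 5.1).

    The 1's complement of an odd p is even, so adding 1 only sets its LSB.
    Equivalently, invert every bit of p except bit 0.
    """
    if p % 2 == 0:
        raise ValueError("p must be odd")
    return p ^ (((1 << w) - 1) ^ 1)  # flip bits w-1 .. 1, keep bit 0
```

For instance, with an 8-bit width, `negate_odd(7, 8)` gives 249, which is 256 − 7.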

Looking back at Algorithm 3.3, the Montgomery multiplication involves two main parts:

1. MM : Adding partial products and modular right shift.

2. RECOVER: Bound C in GF (p).
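The two parts can be modelled for both fields in one routine. This is a minimal radix-2 sketch. It assumes the usual formulation R = (R + s_k V + q_k p)/2, where q_k is chosen from the least significant bits so that the division is exact. For GF (p) it applies a single final conditional subtraction. In GF (2^m) the additions become XORs, p encodes an irreducible polynomial of degree m with constant term 1, and no final subtraction is needed. The name `mont_mul` and its parameters are hypothetical, and Algorithms 3.3 and 3.4 may order the steps differently.

```python
def mont_mul(a: int, b: int, p: int, m: int, binary: bool = False) -> int:
    """Radix-2 Montgomery multiplication over GF(p) or GF(2^m).

    Prime field:  returns a*b*2^-m mod p, assuming 0 <= a, b < p < 2^m.
    Binary field: returns a(x)*b(x)*x^-m mod p(x), with deg a, deg b < m.
    """
    r = 0
    # MM: add partial products and perform the modular right shift
    for k in range(m):
        s_k = (a >> k) & 1                   # k-th bit of the multiplier S = A
        if binary:
            t = r ^ (b if s_k else 0)        # carry-free addition (XOR)
            q = t & 1                        # q_k = r_0 xor (s_k and v_0)
            r = (t ^ (p if q else 0)) >> 1   # make t divisible by x, then shift
        else:
            t = r + (b if s_k else 0)
            q = t & 1                        # q_k = (r_0 + s_k * v_0) mod 2
            r = (t + (p if q else 0)) >> 1   # exact division by 2
    # RECOVER (prime field only): R < 2p here, so one subtraction bounds R in GF(p)
    if not binary and r >= p:
        r -= p
    return r
```

In GF (2^3) with p(x) = x³ + x + 1 (`0b1011`), `mont_mul(1, 1, 0b1011, 3, binary=True)` returns 6, that is, x⁻³ = x² + x.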

Part MM executes step 3 of Algorithm 3.3 and Algorithm 3.4. Step 3 of Algorithm 3.3 is implemented by a CSA and a carry propagation adder (CPA), whereas step 3 of Algorithm 3.4 requires only the CSA. Part RECOVER handles step 4 of Algorithm 3.3 by only changing the inputs of the CSA after part MM. Thus, for the dual-field design, the hardware implementation of the Montgomery multiplication contains only one 514-bit CSA and one 514-bit CPA. In addition, modular addition and modular subtraction simply reuse the existing elements of the Montgomery divider. The flow of the Montgomery multiplication is shown below:

[Flow chart: INITIAL: $U = p$, $V = B$, $R = 0$, $S = A$, $k = 0$. While $k < m$: $R = (R + s_k V + q_k p)/2$ with $q_k = (r_0 + s_k v_0) \bmod 2$, then $k = k + 1$. After the loop, if $R \ge p$ then $R = R - p$. OUTPUT $R = A \cdot B \cdot 2^{-m} \pmod p$.]

Figure 5.2: Flow chart of the Montgomery multiplication algorithm.

When the flow of the Montgomery multiplication is merged with the flow of the Montgomery division, part MM of the Montgomery multiplication and part EEA of the Montgomery division are replaced by a new part named EEA MM. Parts RECOVER and NEGATE are retained in the GFAU. As a result, the GFAU consists of three main parts:

1. EEA MM : Operation of EEA for the Montgomery division, MM for the Montgomery multiplication, modular addition, and modular subtraction.

2. RECOVER: Divide R by 2 until k = m in the Montgomery division and bound C in GF (p) in the Montgomery multiplication.

3. NEGATE : Negate R to get the final result in the Montgomery division and bound C in GF (p) in the Montgomery multiplication.
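Modular addition and subtraction in part EEA MM can reuse the same kind of MSB-controlled selection applied to R + S − p. The sketch below is one plausible model, not the exact GFAU wiring. It assumes operands already in [0, p), a 514-bit two's complement datapath with p well below 2^513, and XOR for both addition and subtraction in GF (2^m). The name `mod_add_sub` is hypothetical.

```python
W = 514  # assumed datapath width


def mod_add_sub(a: int, b: int, p: int, subtract: bool = False,
                binary: bool = False, w: int = W) -> int:
    """Modular addition or subtraction selected by a single sign bit.

    Assumes 0 <= a, b < p and p < 2**(w - 2), so the sign bits are valid.
    """
    if binary:
        return a ^ b  # GF(2^m): addition and subtraction are both XOR
    mask = (1 << w) - 1
    if not subtract:
        t = (a + b) & mask                 # a + b
        d = (t - p) & mask                 # a + b - p in two's complement
        negative = (d >> (w - 1)) & 1      # MSB of a + b - p
        return t if negative else d        # keep a + b when a + b < p
    t = (a - b) & mask                     # a - b in two's complement
    negative = (t >> (w - 1)) & 1          # MSB of a - b
    return (t + p) & mask if negative else t  # add p back when a < b
```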


Figure 5.3: Finite state transition chart of the GFAU.

The finite state transition chart is shown in Figure 5.3. In state IDLE, if IN A or IN B is zero, the state machine transfers directly to state OUTVALUE. In state EEA MM, if addition (MODE 0) or subtraction (MODE 1) is requested, the state machine transfers to state OUTVALUE and outputs the result. If multiplication or division is requested, the state machine transfers to state RECOVER when counter K equals LENGTH or when register V equals zero, respectively. Division (MODE 3) is the only operation that requires state NEGATE. Therefore, if MODE is 2, which denotes multiplication, the state machine transfers directly from RECOVER to NEGATE without doing anything.

State NEGATE transfers to state OUTVALUE when K equals LENGTH, which is given as an input. This condition denotes the termination of the division.
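The transitions can be summarized as a next-state function. This Python sketch uses hypothetical names for the states and status flags. Three transitions are assumptions not stated in the text: IDLE is evaluated once an operation has been started, OUTVALUE returns to IDLE, and for division the RECOVER-to-NEGATE condition (K equals LENGTH) is inferred from "divide R by 2 until k = m".

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    EEA_MM = auto()
    RECOVER = auto()
    NEGATE = auto()
    OUTVALUE = auto()


ADD, SUB, MUL, DIV = 0, 1, 2, 3  # MODE encoding given in the text


def next_state(state: State, mode: int, operand_zero: bool,
               v_zero: bool, k_eq_length: bool) -> State:
    """Hypothetical next-state logic for the GFAU controller."""
    if state is State.IDLE:
        # Assumption: evaluated once an operation has been started.
        return State.OUTVALUE if operand_zero else State.EEA_MM
    if state is State.EEA_MM:
        if mode in (ADD, SUB):
            return State.OUTVALUE          # one pass suffices for add/sub
        if (mode == MUL and k_eq_length) or (mode == DIV and v_zero):
            return State.RECOVER
        return State.EEA_MM
    if state is State.RECOVER:
        # MUL passes straight through; DIV halves R until K equals LENGTH (inferred).
        if mode == MUL or k_eq_length:
            return State.NEGATE
        return State.RECOVER
    if state is State.NEGATE:
        return State.OUTVALUE if k_eq_length else State.NEGATE
    return State.IDLE                      # assumption: OUTVALUE returns to IDLE
```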


Figure 5.4: Architecture of the GFAU.

The complete architecture of the GFAU is shown in Figure 5.4. The control signals are generated by the finite state machine in Figure 5.3. Through combinational logic controlled by the finite state machine, all inputs are stored in four main registers and two 1-bit registers. The four main registers are U (514-bit), V (514-bit), S (514-bit), and K (10-bit). The register values pass through one level of combinational logic, producing temporary wires named R1, R2, R3, R4, R5, 2 × R, and 2 × S. These values then pass through the datapath and another level of combinational logic, and update the registers at each rising edge of the clock signal. Compared with the hardware requirement of the Montgomery divider derived earlier, the chief advantage of the GFAU becomes clear. With almost the same hardware and only different control signals, the GFAU performs all four fundamental arithmetic operations over the Galois field.
