Galois Field Arithmetic Unit - Proposed Architectures

Proposed Architectures

4.1 Galois Field Arithmetic Unit

In this section, the radix-2 Galois field arithmetic unit (R2-GFAU) and the radix-4 Galois field arithmetic unit (R4-GFAU) are proposed, based on the proposed R2-UD/M and R4-UD/M, respectively. Both architectures support all finite field operations, such as MA, MS, MMM, MM, MD, and MMD, over dual fields. Several techniques are presented to increase the operating frequency and reduce the hardware cost. Figures 4.1 and 4.2 show the architectures of R2-GFAU and R4-GFAU. Since the two architectures are very similar, only the details of R4-GFAU are described in the following.

In Figure 4.2, the R4-GFAU is controlled by its inputs to accomplish the dual-field modular operations. The UV data-path executes the U, V operations, and the R and S data-paths execute the R, S operations. The following example illustrates the data flow of R4-GFAU. Initially, the operation is set to MMD over GF(p). During the operation, the UV data-path cell compares the two operands (U′, V′) = (U/2, V) when the operating step is 11. Suppose the decision results are U/2 > V and i < m. Then (R′, S′, P′, P′′) is set to (2 · R, −S, +p, −p) in the R data-path and (S′′, P′′′) is set to (4 · S, −p) in the S data-path to compute the next R, S values. The result of R is selected from 2R − S + p, 2R − S, 2R − S − p, and 2R − S − 2p in the R data-path by choosing the candidate that lies within [0, p − 1]. Similarly, the result of S is selected from 4S, 4S − p, 4S − 2p, and 4S − 3p in the S data-path.
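The candidate selection in this example can be mimicked in software. The following Python sketch is illustrative only: the function names `select_in_range`, `next_r`, and `next_s` are hypothetical, and it assumes 0 ≤ R, S < p. The hardware would produce the candidates with parallel adders rather than by iterating over a list.

```python
def select_in_range(candidates, p):
    """Return the first candidate that lies within [0, p - 1]."""
    for c in candidates:
        if 0 <= c < p:
            return c
    raise ValueError("no candidate in [0, p - 1]; inputs violate 0 <= R, S < p")


def next_r(r, s, p):
    """R update of the example: R = 2R - S (mod p), chosen from four candidates."""
    base = 2 * r - s
    return select_in_range([base + p, base, base - p, base - 2 * p], p)


def next_s(s, p):
    """S update of the example: S = 4S (mod p), chosen from four candidates."""
    base = 4 * s
    return select_in_range([base, base - p, base - 2 * p, base - 3 * p], p)


if __name__ == "__main__":
    p = 13  # small prime chosen only for illustration
    print(next_r(5, 11, p))  # 2*5 - 11 = -1, so the +p candidate is selected
    print(next_s(11, p))     # 4*11 = 44, so the -3p candidate is selected
```

For every pair with 0 ≤ R, S < p, the selected values agree with (2R − S) mod p and 4S mod p.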

Figure 4.1: Architecture of R2-GFAU.

4.1.1 Data-path Separation

Since the critical path in the proposed R4-UD runs from the UV data-path to the R, S data-path, a data-path separation method is presented to break it. The control signal from the UV data-path is stored and sent to the RS data-path in the next cycle. Although this approach adds one cycle, the critical path is reduced from two adders to one adder, not counting the data pre-/post-operation. Figure 4.3 shows the detailed flow of the proposed method. First, the UV data-path is executed. Then, the RS data-path is executed in the next cycle. The data-path is thus separated at the cost of one additional cycle.
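The effect of registering the control signal can be illustrated with a small cycle-level model. This is a minimal sketch under stated assumptions: `run_combined`, `run_separated`, and `toy_rs_step` are hypothetical names, the UV decisions are assumed not to depend on R or S, and the toy RS update merely stands in for the real operations.

```python
def run_combined(decisions, rs_step, state):
    """UV decision and RS update happen in the same cycle (longer critical path)."""
    cycles = 0
    for ctrl in decisions:
        state = rs_step(state, ctrl)  # RS uses the control signal immediately
        cycles += 1
    return state, cycles


def run_separated(decisions, rs_step, state):
    """The UV decision is registered; RS consumes it one cycle later."""
    ctrl_reg = None  # register between the UV and RS data-paths
    cycles = 0
    for ctrl in list(decisions) + [None]:  # one extra cycle drains the register
        if ctrl_reg is not None:
            state = rs_step(state, ctrl_reg)
        ctrl_reg = ctrl
        cycles += 1
    return state, cycles


def toy_rs_step(state, ctrl):
    """Toy stand-in for the RS update; not the operations of the real unit."""
    r, s = state
    return (r + 1, s) if ctrl == "update_r" else (r, s + 1)


if __name__ == "__main__":
    decisions = ["update_r", "update_s", "update_r"]
    print(run_combined(decisions, toy_rs_step, (0, 0)))
    print(run_separated(decisions, toy_rs_step, (0, 0)))
```

Both runs reach the same final state; the separated version takes exactly one more cycle, which matches the trade-off described above.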

Figure 4.2: Architecture of R4-GFAU.

4.1.2 Hardware Sharing

Since the carry-propagation adder and the XOR gate are the kernel arithmetic units of every modular operation, these addition units can be reused to reduce the cost. The detailed hardware sharing method is shown in Tables 4.1 and 4.2. The MMD and MM operations require the most adder units, spread across the UV, R, and S data-paths, whereas the MA, MS, and MMM operations require only the R data-path.

Figure 4.3: Data-path separation method.

Besides, the division operation requires 21 different operations in the R, S data-path. To reduce the hardware complexity, we propose a swap logic circuit. In Algorithm 3.3, the operations on R and S share some common arithmetic operations, such as R = R − 2 · S (mod p) and S = S − 2 · R (mod p) in steps 15 and 22. The swap logic circuit decides at the beginning of each operation whether the R, S values are swapped. The decision is based on the previous and current values of the swap signal, SWp and SWc. When the operating step is 3, 4, 6, 8, 10, or 12 in Algorithm 3.3, the swap signal is set to 1; otherwise, it is set to 0. The two operands R, S are swapped when the previous and current swap signals differ. All the operations of this algorithm are paired, such as operating steps 4 and 5, 6 and 7, and 8 and 9. With the swap logic circuit, similar operations can be shared, and the number of operation types is reduced to 11.
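The swap rule can be written down compactly. The sketch below is an illustration under assumptions, not the circuit itself: the names `swap_signal`, `maybe_swap`, `shared_sub2`, and `s_minus_2r_via_swap` are hypothetical, and only the step numbers and the SWp/SWc comparison are taken from the text.

```python
SWAP_STEPS = {3, 4, 6, 8, 10, 12}  # steps of Algorithm 3.3 that set the swap signal


def swap_signal(step):
    """Swap signal for a given operating step: 1 for the listed steps, else 0."""
    return 1 if step in SWAP_STEPS else 0


def maybe_swap(r, s, sw_prev, sw_cur):
    """Swap R and S only when the previous and current swap signals differ."""
    return (s, r) if sw_prev != sw_cur else (r, s)


def shared_sub2(x, y, p):
    """One shared unit computing x - 2*y (mod p)."""
    return (x - 2 * y) % p


def s_minus_2r_via_swap(r, s, p):
    """Compute S = S - 2R (mod p) on the shared unit by swapping the operands."""
    x, y = maybe_swap(r, s, 0, 1)  # signals differ, so the operands are swapped
    x = shared_sub2(x, y, p)       # same unit that would compute R = R - 2S
    return y, x                    # return in (R, S) order


if __name__ == "__main__":
    print(s_minus_2r_via_swap(3, 10, 13))
```

With the operands swapped, the unit that computes R = R − 2 · S also serves S = S − 2 · R, which is the kind of sharing that lets paired operations collapse into one type.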

In addition, the proposed R4-UD has some control signals in common between the dual fields (e.g., j < m, c = 0, d = 0), so they can be shared to reduce the complexity of the controller.

Table 4.1: Details of hardware sharing method in R2-GFAU.

Field Operation FUV1 FUV2 FUV3 FR1 FR2 FR3 FS1 FS2

Table 4.2: Details of hardware sharing method in R4-GFAU.

Field Operation FUV1 FUV2 FUV3 FR1 FR2 FR3 FR4 FR5 FS1 FS2 FS3 FS4

GF(p)

MA/MS X X

MMM X X

MM X X X X X X X

MMD X X X X X X X X

MD X X X X X

GF(2^m)

MA X

MMM X

MM X X

MMD X X X

MD X X

4.1.3 Degree Checker

Intuitively, the degree-check operations in GF(2^m), such as 2 · S (mod p) and 4 · S (mod p), are implemented with large multiplexers, as shown in Figure 4.4(a), but this method results in a long critical path of log2(n) AND-gate delays plus log2(n) OR-gate delays.

Figure 4.4(b) shows the proposed degree checker, which requires only n two-input AND gates and one n-input OR gate to perform the degree-checking operation. Its critical path is one AND-gate delay plus log2(n) OR-gate delays. The checker compares the degree of the input value Din with the field length. Note that the m-th bit of the field length register is set to 1 and the others are set to 0. If the input degree is smaller than the field length, the output Dout is 0; otherwise, the output is 1. This approach can also be used for the counter comparison j < m by setting Din = i and updating i = {i_{n−2}, i_{n−3}, ..., i_0, 1'b1} every cycle, where i = {i_{n−1} = 0, ..., i_j = 0, i_{j−1} = 1, ..., i_0 = 1}. When Dout = 1, the variable j is equal to m.
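A bit-level model makes the gate structure concrete. The following Python sketch follows a literal reading of the description (n AND gates feeding one OR gate, with a one-hot field length register); the names `degree_check` and `counter_step` are hypothetical, and counting bit positions from 0 is an assumption about the source's indexing convention.

```python
def degree_check(din, m, n):
    """Model of the checker: AND each bit of Din with a one-hot register, then OR."""
    field_len = 1 << m  # one-hot field length register (assumed bit index m, m < n)
    and_bits = [((din >> k) & 1) & ((field_len >> k) & 1) for k in range(n)]
    dout = 0
    for bit in and_bits:  # n-input OR gate
        dout |= bit
    return dout


def counter_step(i, n):
    """Counter update i = {i[n-2:0], 1'b1}: shift left by one and fill with 1."""
    return ((i << 1) | 1) & ((1 << n) - 1)


if __name__ == "__main__":
    n, m = 8, 5
    print(degree_check(0b00011111, m, n))  # degree 4, below the marked bit
    print(degree_check(0b00100000, m, n))  # degree 5, reaches the marked bit
    i = 0
    for _ in range(3):
        i = counter_step(i, n)
    print(bin(i))  # three low-order ones after three updates
```

Under this reading, Dout is simply the bit of Din at the marked position, so the check costs one AND level and one OR tree. The sketch relies on the marked bit being observed when the degree or the counter first reaches it.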

4.1.4 Ladder Selection

Figure 4.4: (a) The degree checking architecture by intuitive implementation. (b) Architecture of degree checker.

In R4-GFAU, the selection in the data post-operation of the RS data-path is more complex than in R2-GFAU. Intuitively, the post-operation architecture is implemented with many multiplexers, as shown in Figure 4.5. In each state, the data must be selected by a multiplexer, so the number of multiplexers equals the total number of states. To reduce the selection complexity, we propose a ladder selection architecture, shown in Figure 4.6. The output value of the RS data-path is decided in a fixed order. For example, if the operation is S = 4S (mod p), the operands are {S′′, P′′′} = {4S, −p}. Following the order, which runs from FS3 < 0 to FS2 < 0, the correct value is decided. Consequently, the output value is within the range [0, p − 1]. This approach reduces the data selection hardware cost in the data post-operation block.
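The idea of deciding the output by a fixed order of sign checks can be sketched for S = 4S (mod p). This is a hedged illustration, not the circuit of Figure 4.6: the function name `ladder_select_4s` and the intermediate names `d0` to `d3` are hypothetical, their mapping onto the FS units is not taken from the source, and 0 ≤ S < p is assumed.

```python
def ladder_select_4s(s, p):
    """Select 4S mod p by testing the signs of the candidates in a fixed order."""
    d0 = 4 * s        # no subtraction
    d1 = d0 - p       # one subtraction of p
    d2 = d0 - 2 * p   # two subtractions of p
    d3 = d0 - 3 * p   # three subtractions of p
    # Walk the ladder from the largest subtraction down: the first
    # non-negative candidate is the one that lies within [0, p - 1].
    if d3 >= 0:
        return d3
    if d2 >= 0:
        return d2
    if d1 >= 0:
        return d1
    return d0


if __name__ == "__main__":
    p = 13  # small prime chosen only for illustration
    print([ladder_select_4s(s, p) for s in range(p)])
```

Because each decision depends only on a sign bit and the order is fixed, the selection does not need one multiplexer per state.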

Figure 4.5: The data post-operation by intuitive implementation.

4.2 Dual-Field Elliptic Curve Cryptography

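In the document Design and Implementation of a Dual-Field Elliptic Curve Cryptographic Arithmetic Unit Resistant to Power Analysis Attacks (抵抗能量攻擊法的雙域橢圓曲線密碼運算單元之設計與實現), pages 48-53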