Dual-Field Elliptic Curve Cryptography Processor

Proposed Architectures

Figure 4.4: (a) The degree checking architecture by intuitive implementation. (b) Architecture of degree checker.

... Figure 4.6. The output value of the RS data-path is decided in a fixed order. For example, if the current operation is S = 4S (mod p), the operands are {S′′, P′′′} = {4S, −p}. Checking the sign conditions in order, from FS3 < 0 to FS2 < 0, selects the correct value, so the output value lies within the range [0, p − 1]. This approach reduces the data-selection hardware cost in the post-data operation block.
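The fixed-order selection can be modelled in software. The following is a minimal sketch, assuming the candidates are FS_k = S′′ + k·P′′′ for k = 0..3 and that the first non-negative candidate in the order FS3, FS2, FS1 is selected. The function name `ladder_select` and the candidate names other than FS2 are hypothetical; the hardware computes the candidates in parallel and inspects only their sign bits.

```python
def ladder_select(s2: int, p: int, steps: int = 3) -> int:
    """Reduce s2 (for example 4*S) into [0, p-1], assuming 0 <= s2 < (steps+1)*p."""
    # Candidates FS_k = s2 - k*p (assumed naming; computed in parallel in hardware).
    candidates = [s2 - k * p for k in range(steps + 1)]
    # Fixed priority: test the largest subtraction first, then smaller ones.
    for k in range(steps, 0, -1):
        if candidates[k] >= 0:  # sign bit clear, so FS_k is already in range
            return candidates[k]
    return candidates[0]        # no subtraction needed


if __name__ == "__main__":
    p = 23  # small illustrative modulus
    print([ladder_select(4 * s, p) for s in range(6)])
```

Because the candidates are tested from the largest subtraction downwards, the first non-negative one is always smaller than p, so no further comparison against p is needed.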

FR3 = R + S + P + P

Figure 4.5: The data post-operation by intuitive implementation.

4.2 Dual-Field Elliptic Curve Cryptography Processor

Figure 4.7 shows the overall block diagram of our proposed DECP with a standard advanced microcontroller bus architecture (AMBA) high-performance bus (AHB) interface.

FS2 = S − 2·p

Figure 4.6: Architecture of ladder selection.

The ECSM with modular operations over dual fields, which is required by ECC schemes such as signature, authentication, and key exchange defined in IEEE 1363 [1], is computed by the Galois field arithmetic unit (GFAU). The inputs are the user public/private key, the EC coordinates, the EC parameters, and the protocol instructions. To process these inputs in real time, the instruction decoding and pre-/post-processing stages are combined in our processor. After instruction decoding, the pre-processing stage converts the EC coordinates and parameters into the Montgomery domain. Before the calculation results are returned, the post-processing stage converts the EC coordinates back to the integer domain. All operands are stored in the register file and transmitted to the GFAU under the control of the EC controller. Furthermore, to reduce the load on the host CPU, the pre-/post-processing stages are carried out with the MMD and MMM operations: an input value X is converted between the integer domain and the Montgomery domain through MMD(X, 1) ≡ X · 2^m (mod p) and MMM(X · 2^m, 1) ≡ X (mod p).
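The two conversion identities can be checked with a small functional model. This is a sketch only, assuming MMM(X, Y) ≡ X · Y · 2^(−m) (mod p) and MMD(X, Y) ≡ X · Y^(−1) · 2^m (mod p), which is consistent with the identities above. The helper names are hypothetical, and the model uses Python big-integer arithmetic rather than the bit-serial hardware algorithms.

```python
def mmm(x: int, y: int, p: int, m: int) -> int:
    """Functional model of Montgomery multiplication: x * y * 2^(-m) mod p."""
    r_inv = pow(pow(2, m, p), -1, p)  # 2^(-m) mod p, requires odd p
    return (x * y * r_inv) % p


def mmd(x: int, y: int, p: int, m: int) -> int:
    """Functional model of Montgomery division: x * y^(-1) * 2^m mod p."""
    return (x * pow(y, -1, p) * pow(2, m, p)) % p


def to_montgomery(x: int, p: int, m: int) -> int:
    """Pre-processing: MMD(X, 1) = X * 2^m mod p."""
    return mmd(x, 1, p, m)


def from_montgomery(x_mont: int, p: int, m: int) -> int:
    """Post-processing: MMM(X * 2^m, 1) = X mod p."""
    return mmm(x_mont, 1, p, m)


if __name__ == "__main__":
    p, m = 97, 7  # toy prime and bit length, illustrative only
    x = 42
    x_mont = to_montgomery(x, p, m)
    print(x_mont, from_montgomery(x_mont, p, m))
```

Reusing MMD and MMM for the conversions means no extra arithmetic unit is needed for pre-/post-processing, which is the point made in the paragraph above.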

In Figure 4.8, the ECSM is carried out with the double-and-add/sub always method. To save one register, the point is negated when an ECSUB is executed and restored afterwards.

The performance analyses of the R2-DECP and the R4-DECP are shown in Tables 4.3 and 4.4. The R4-DECP needs about half the execution cycles of the R2-DECP. It offers higher throughput at the cost of some area overhead, and this overhead is greatly reduced by our proposed techniques, such as swap logic and ladder selection.

Table 4.3: Performance Analysis of R2-DECP.

Operations           Execution Cycles, GF(p)            Execution Cycles, GF(2^m)
ECDBL                1·D + 3·M + 4·A/S                  1·D + 2·M + 8·A
                     = 4m + 4 ∼ 5m + 4                  = 3m + 8 ∼ 4m + 8
ECADD                1·D + 2·M + 6·S                    1·D + 2·M + 9·A
                     = 3m + 6 ∼ 4m + 6                  = 3m + 9 ∼ 4m + 9
Domain Tran. (DT)    3·D + 2·M                          3·D + 2·M
                     = 5m ∼ 8m                          = 5m ∼ 8m
ECSM                 m·ECDBL + (m/3)·ECADD + DT         m·ECDBL + (m/3)·ECADD + DT
                     = 5m² + 11m ∼ 6.33m² + 14m         = 4m² + 14m ∼ 5.33m² + 17m

Operation            Critical Path Complexity
                     GF(p)          GF(2^m)
ECSM                 m              log₂ m
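The GF(p) ECSM entry of Table 4.3 follows from the per-operation rows. The sketch below recomputes it, assuming the ECSM cost is m·ECDBL + (m/3)·ECADD + DT with the best-case and worst-case cycle counts listed in the table. The function name is hypothetical and exact fractions are used so that the m/3 term is not rounded.

```python
from fractions import Fraction


def r2_decp_gfp_ecsm_cycles(m: int):
    """Return (best, worst) ECSM cycle counts over GF(p) from the rows of Table 4.3."""
    ecdbl = (4 * m + 4, 5 * m + 4)  # 1*D + 3*M + 4*A/S
    ecadd = (3 * m + 6, 4 * m + 6)  # 1*D + 2*M + 6*S
    dt = (5 * m, 8 * m)             # 3*D + 2*M
    # ECSM = m*ECDBL + (m/3)*ECADD + DT, evaluated for the best and worst case.
    return tuple(
        m * dbl + Fraction(m, 3) * add + trans
        for dbl, add, trans in zip(ecdbl, ecadd, dt)
    )


if __name__ == "__main__":
    m = 192  # example field size, chosen only for illustration
    best, worst = r2_decp_gfp_ecsm_cycles(m)
    print(float(best), float(worst))
```

Expanding the sum gives 5m² + 11m in the best case and (19/3)m² + 14m in the worst case, which the table rounds to 6.33m² + 14m.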

4.3 Dual-Field Elliptic Curve Cryptography Processor with Power Analysis Countermeasures

To protect the secret key against power analysis, we randomize the operating domain. Normally the operating data carry a factor 2^m, which means that the data are operated in the Montgomery domain; a factor of 2⁰ means that the operating domain is the integer domain. We define data carrying a factor 2^λ, where 0 ≤ λ ≤ m, as data operated in a random domain. With this masking method, the domain value changes in each ECSM operation. However, there are only m + 1 possible domains, which is too few, so we exploit the proposed URD and URM algorithms to increase this number.

Before performing the ECSM, we choose a random number r to decide the random domain 2^λ. Note that the number of ones in r is equal to λ.
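The relation between r and λ can be illustrated with a short functional model. This is a sketch under stated assumptions: λ is taken as the Hamming weight of an m-bit random r, and a value X in the random domain is represented as X · 2^λ (mod p). The helper names are hypothetical, and the code does not reproduce the URD and URM algorithms themselves.

```python
import secrets


def random_domain(m: int):
    """Pick an m-bit random r and derive lambda as its number of one bits."""
    r = secrets.randbits(m)
    lam = bin(r).count("1")  # Hamming weight, so 0 <= lam <= m
    return r, lam


def to_domain(x: int, lam: int, p: int) -> int:
    """Represent x with factor 2^lam: x * 2^lam mod p."""
    return (x * pow(2, lam, p)) % p


def from_domain(x_lam: int, lam: int, p: int) -> int:
    """Remove the factor 2^lam again (requires odd p)."""
    return (x_lam * pow(pow(2, lam, p), -1, p)) % p


if __name__ == "__main__":
    p, m = 97, 7  # toy parameters, illustrative only
    r, lam = random_domain(m)
    masked = to_domain(42, lam, p)
    print(lam, masked, from_domain(masked, lam, p))
```

With λ = 0 the representation is the integer domain and with λ = m it is the Montgomery domain, matching the two special cases described above.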

However, the random-domain method can only randomize the first m cycles of the division operation; the next m cycles must be protected by another method. Since the S data-path is not used in the next m cycles, we set the input S′′ to a random number to randomize the power consumption. Moreover, the total number of random values for MA and MS is still equal to m + 1. Because these two operations are accomplished by the R data-path alone,

Table 4.4: Performance Analysis of R4-DECP.

Operations           Execution Cycles, GF(p)                  Execution Cycles, GF(2^m)
ECDBL                1·D + 3·M + 4·A/S                        1·D + 2·M + 8·A
                     = 2.06m + 4 ∼ 2.62m + 4                  = 1.56m + 8 ∼ 2.12m + 8
ECADD                1·D + 2·M + 6·S                          1·D + 2·M + 9·A
                     = 1.56m + 6 ∼ 2.12m + 6                  = 1.56m + 9 ∼ 2.12m + 9
Domain Tran. (DT)    3·D + 2·M                                3·D + 2·M
                     = 2.68m ∼ 4.36m                          = 2.68m ∼ 4.36m
ECSM                 m·ECDBL + (m/3)·ECADD + DT               m·ECDBL + (m/3)·ECADD + DT
                     = 2.58m² + 8.68m ∼ 3.32m² + 10.36m       = 2.08m² + 13.68m ∼ 2.82m² + 15.36m

we also set the input of the S data-path to a random number to randomize the power consumption. In this way, the secret information is masked. In addition, we use the double-and-add/sub always method to resist SPA attacks. The detailed operation flow is shown in Figure 4.10.

By applying the above idea, the R2-DECP with power analysis countermeasures (R2-DECPAC) is proposed, as shown in Figure 4.9. The R2-DECPAC is based on an R2-GFAU with power analysis countermeasures (R2-GFAUPAC) that supports the random-domain operations. We make three modifications to the DECP. First, the architecture of the R2-GFAUPAC implements the proposed URD and URM. Second, we include an |r|-bit chaos-based pseudo-random number generator [61] in our R2-DECPAC to generate an m-bit random number in each ECSM operation. This approach prevents power analysis (PA) attacks on the input of random numbers. Finally, the ECSM stage in the EC controller is based on the double-and-add/sub always method. The performance analysis is shown in Table 4.5.
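The double-and-add/sub always control flow can be sketched on a toy curve. The example below is illustrative only and rests on several assumptions not stated in the text: the scalar is recoded in non-adjacent form (NAF), a zero digit triggers a fake addition whose result is discarded, and a subtraction is performed by negating the point before the addition. The curve y² = x³ + 2x + 3 over GF(97) and all function names are hypothetical, and a Python model is a functional illustration, not a constant-time implementation.

```python
P_MOD, A = 97, 2  # toy curve y^2 = x^3 + 2x + 3 over GF(97)
INF = None        # point at infinity


def ec_add(p1, p2):
    """Affine point addition (covers doubling and the point at infinity)."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF
    if p1 == p2:
        slope = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    slope %= P_MOD
    x3 = (slope * slope - x1 - x2) % P_MOD
    return (x3, (slope * (x1 - x3) - y1) % P_MOD)


def ec_neg(pt):
    """Negate a point; used to turn an addition into a subtraction."""
    return INF if pt is INF else (pt[0], (-pt[1]) % P_MOD)


def naf(k: int):
    """Signed-digit (NAF) recoding of k, most significant digit first."""
    digits = []
    while k > 0:
        d = (2 - k % 4) if k & 1 else 0
        digits.append(d)
        k = (k - d) // 2
    return digits[::-1]


def ecsm_always(k: int, pt):
    """Scalar multiplication with one doubling and one add/sub per digit."""
    acc = INF
    for d in naf(k):
        acc = ec_add(acc, acc)                 # point doubling in every iteration
        operand = ec_neg(pt) if d < 0 else pt  # ECSUB: negate, then add
        result = ec_add(acc, operand)          # executed for every digit
        acc = result if d != 0 else acc        # fake operation: result discarded
    return acc


if __name__ == "__main__":
    print(ecsm_always(13, (3, 6)))
```

Every digit costs one doubling and one addition or subtraction regardless of its value, which corresponds to the m·ECDBL + m·ECADD term in Table 4.5.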

Compared with our previous work [33], as shown in Table 4.6, our approach requires lower area overhead and includes a pseudo-random number generator. In addition, the SPA countermeasure increases the execution cycles by 50% but does not cause any area degradation.

Figure 4.7: Architecture of DECP.

Table 4.5: Performance Analysis of R2-DECPAC.

Operations    Execution Cycles, GF(p)         Execution Cycles, GF(2^m)
ECSM          m·ECDBL + m·ECADD + DT          m·ECDBL + m·ECADD + DT
              = 7m² + 15m ∼ 9m² + 18m         = 6m² + 22m ∼ 8m² + 25m

Table 4.6: Comparison with our previous work.

                              R2-DECPAC                             ESSCIRC'10 [33]
SPA countermeasure            Double-and-add/sub-always method      Double-and-add/sub-always method
DPA countermeasure            Random-domain method                  Random scalar method
Additional arithmetic unit    none                                  (n + |r|)-bit adder, (n + |r|)-bit multiplier
Random number generator       |r|-bit chaos-based pseudo-random     none
                              number generator
Execution cycle increase

Figure 4.8: The flow chart of DECP.

Figure 4.9: Architecture of R2-DECPAC.

Figure 4.10: Flow Chart of R2-DECPAC. (Blocks in the chart: Data/Command Input; Random Number Generation; Domain Transformation; Scalar Scan; Point Doubling; Subtraction/Addition; Point Addition/Subtraction; Fake Point Addition/Subtraction; loop test counter_k ≤ m with Y/N branches; Domain Transformation; Data Output.)

Chapter 5

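In document: Design and Implementation of a Dual-Field Elliptic Curve Cryptographic Processor Resistant to Power Analysis Attacks (pages 53-60)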