In this chapter, we present the implementation results of the proposed GFAU, DECP, GFAUPAC, and DECPAC. The comparison tables show that our designs outperform related works.

5.1 Galois Field Arithmetic Unit

Tables 5.1 and 5.2 show the implementation results of the proposed R2-GFAU and R4-GFAU. The R4-GFAU requires about half the operation cycles of the R2-GFAU, but at about twice the hardware cost. The AT product (gates × execution time) of the R2-GFAU is 1.3 and 1.4 times better than that of the R4-GFAU over GF(p) and GF(2^m), respectively. This ratio decreases at the processor level, since the GFAU is only part of the DECP: apart from the GFAU, the architectures of R2-DECP and R4-R2-DECP are similar. Moreover, owing to the proposed R2-UD and R4-UD, our designs need about 2∼3 times fewer operation cycles than Chen’s work [36] and Kaihara and Takagi’s work [35], which are based on T-UMD. Compared with Tseng’s work [32] and Liu’s work [31], which are based on L-UD, our GFAU is 1∼3 times better in execution cycles.

With the proposed data-path separation technique, the operating frequency of our GFAU is higher than theirs. The work in [30] is based on a word-based architecture and the K-UI algorithm, but it results in a longer execution time. Because our design supports more modular operations, its hardware cost is larger than that of previous works, but its execution time is shorter thanks to the fast UD algorithm. The AT product of our GFAU is 2.3∼91.5 times better than that of previous works.
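The execution times listed in Tables 5.1 and 5.2 follow from the cycle counts and the maximum operating frequency. As a minimal sketch (the helper name `exec_time_us` is an illustrative assumption, not part of the design), the relation time = cycles / fmax can be checked against a few Table 5.1 entries:

```python
def exec_time_us(cycles: int, fmax_mhz: float) -> float:
    """Execution time in microseconds: cycles divided by frequency in MHz."""
    return cycles / fmax_mhz


# (function @ fmax, cycles, fmax in MHz), copied from Table 5.1 (256-bit GF(p)).
entries = [
    ("MMM/MM @400.0", 257, 400.0),
    ("MMD/MD @327.8", 191, 327.8),
    ("MMM/MM @327.8", 129, 327.8),
]

for name, cycles, fmax in entries:
    print(f"{name}: {exec_time_us(cycles, fmax):.2f} us")
```

The printed values can be compared with the time column of Table 5.1; the same helper applies to the GF(2^m) entries of Table 5.2.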

Table 5.1: Comparisons among 256-bit finite field designs over GF(p).

                                           256   MMM/MM   257   0.64µ@400.0
                                                 MA/MS    2     5.0n@400.0
R4-GFAU,1      90nm     100.5K,            256   MMD/MD   191   0.58µ@327.8   1.3
                        42Kcells                 MMM/MM   129   0.39µ@327.8
                                                 MA/MS    2     7.1n@327.8
TC’07 [36]1    0.35µm   33Kcells           256   MD       624   1.76µ@354.6   2.3
TC’05 [35]1    0.35µm   27Kcells           256   MD       517   4.53µ@114.2   4.8
                                                 MMM      175   1.53µ@114.2

Dual-field design. 1 Synthesis result. a Supporting MMM, MA, and MS.

5.2 Dual-Field Elliptic Curve Cryptography Processor

The implementation results of R2-DECP and R4-DECP are shown in Tables 5.3 and 5.4. The results are verified with the NIST-recommended elliptic curves [2, 59]. The AT products of the two designs are almost the same, while the proposed R4-DECP achieves higher throughput.

Table 5.2: Implementation results of proposed 256-bit GFAU over GF(2^m).

                                           256   MMM/MM   257   0.46µ@555.6
                                                 MA/MS    2     3.6n@555.6
R4-GFAU,1      90nm     100.5K,            256   MMD/MD   216   0.56µ@384.6   1.4
                        42Kcells                 MMM/MM   129   0.33µ@384.6
                                                 MA/MS    2     5.2n@384.6

In Tables 5.5 and 5.6, compared with our previous work [33], our R2-DECP is 1.4 and 1.6 times better in AT product owing to the proposed R2-UD and degree checker. Based on R2-UD, our design reduces the execution cycles by 27% compared with [33]. In addition, the operating frequency increases by 10% in binary-field operation because of the proposed degree checker. Owing to the proposed hardware sharing methods, our design is also smaller than [33].
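The 27% cycle reduction quoted for R2-UD can be reproduced from the 521-bit GF(p) cycle counts in Table 5.5. The following is a minimal sketch; the function and variable names are illustrative assumptions.

```python
def cycle_reduction_percent(baseline_cycles: int, new_cycles: int) -> float:
    """Relative reduction in execution cycles, in percent of the baseline."""
    return 100.0 * (baseline_cycles - new_cycles) / baseline_cycles


# 521-bit GF(p) cycle counts copied from Table 5.5.
previous_work_33 = 1_967_982   # ESSCIRC'10 [33]
r2_decp = 1_438_637            # proposed R2-DECP

reduction = cycle_reduction_percent(previous_work_33, r2_decp)
print(f"cycle reduction: {reduction:.1f}%")  # prints "cycle reduction: 26.9%"
```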

Our proposed 160-bit and 256-bit R4-DECP are implemented in UMC 90 nm CMOS technology. Figure 5.1 shows the physical views of the two DECPs, which have core areas of 0.29 mm² and 0.45 mm², respectively; the post-layout simulation results are shown in Tables 5.7, 5.8, 5.9, and 5.10.

The comparison with previous works is given in Tables 5.7, 5.8, 5.9, 5.10, and 5.11.

Our design supports all EC functions, including point addition, point doubling, point scalar multiplication, domain transformation, and finite-field operations. In [13], Chen adopts T-UMD and a systolic array to accomplish ECSM, but the design is three times slower than ours in execution cycles. Furthermore, our design achieves execution cycles competitive with Satoh and Takano’s work [23] and Lai and Huang’s work [24], which use one 64-bit and four 32-bit multipliers. Both [24] and [15] exploit parallel architectures to reduce the execution cycles, but substantially increase the hardware cost; consequently, the area of our DECP is about 2 times smaller than theirs. The work in [9] uses a systolic array to achieve the highest operating frequency, but is about 3 times slower than our design in execution cycles. Compared with the 160-bit and 256-bit designs in [24], our DECP is about 4 and 2 times better in AT product. As the tables show, our DECP outperforms other EC processor designs in terms of functionality, hardware efficiency, execution time, and power consumption.
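The AT column in the comparison tables is the area-time product normalized to the proposed design. As a minimal sketch (the helper name `normalized_at` is an illustrative assumption), the 160-bit GF(p) ratios can be recomputed from the gate counts and execution times in Table 5.7:

```python
def normalized_at(gates_k: float, time_ms: float,
                  ref_gates_k: float, ref_time_ms: float) -> float:
    """Area-time product of a design relative to a reference design."""
    return (gates_k * time_ms) / (ref_gates_k * ref_time_ms)


# Reference: 160-bit R4-DECP over GF(p), gate count (K) and time (ms) from Table 5.7.
REF_GATES_K, REF_TIME_MS = 82.8, 0.31

# (gates in K, execution time in ms), copied from Table 5.7.
designs = {
    "TCAS-2'09 [24]": (169.4, 0.61),
    "TVLSI'08 [25]": (150.6, 0.34),
    "TC'03 [23]": (117.5, 1.21),
}

for name, (gates_k, time_ms) in designs.items():
    ratio = normalized_at(gates_k, time_ms, REF_GATES_K, REF_TIME_MS)
    print(f"{name}: AT = {ratio:.1f}")
```

A ratio above 1 means the compared design has a larger area-time product than the R4-DECP, which is how the AT columns in this chapter are read.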

Figure 5.1: (a) Layout of 160-bit R4-DECP chip. (b) Layout of 256-bit R4-DECP chip.

Table 5.3: Implementation results of 256-bit R2/4-DECP over GF(p).

              Tech.   Gates(K)   Key Size   Cycles    Time(ms)@fmax(MHz)   AT
R2-DECP,1     90nm    82.0       256        347,266   0.86@400.0           1
R4-DECP,1     90nm    134.3      256        193,386   0.51@333.3           1.0

5.3 Dual-Field Elliptic Curve Cryptography Processor with Power Analysis Countermeasures

Figure 5.2: Random test on a 32-bit pseudo-random number generator.

Table 5.4: Implementation results of 256-bit R2/4-DECP over GF(2^m).

              Tech.   Gates(K)   Key Size   Cycles    Time(ms)@fmax(MHz)   AT
R2-DECP,1     90nm    82.0       256        298,210   0.54@555.6           1
R4-DECP,1     90nm    134.3      256        165,354   0.44@377.3           1.3

Table 5.12 shows the implementation results of R2-GFAUPAC. The AT products of R2-GFAU and R2-GFAUPAC are similar, since the algorithms and the number of arithmetic units are almost the same. To compare with our previous work, we implement the proposed DECPAC with a maximum field size of 521 bits. The 521-bit R2-DECPAC adopts a 32-bit chaos-based pseudo-random number generator, which passes the random tests [62] shown in Figure 5.2. The implementation results of R2-DECPAC are shown in Table 5.13. Compared with R2-DECP, the R2-DECPAC requires 1.55 and 1.65 times the execution cycles over GF(p) and GF(2^m), respectively, due to the double-and-add/sub always method. Moreover, the area degradation is just 8.4%, and the AT product is 1.7 and 1.8 times worse than that of R2-DECP. In Tables 5.14 and 5.15, compared with [33], which is based on L-UD, our approach is 1.3 times better in execution cycles owing to the proposed R2-UD. In addition, [32] uses scalar splitting to resist SPA attacks, but is 1.8 times slower than ours in execution cycles. The implementation results show that our approach is advantageous in both system speed and hardware cost.
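To illustrate why an "always" style countermeasure costs extra cycles, the sketch below compares plain double-and-add with the textbook double-and-add-always variant on a toy curve. Everything here is an assumption made for illustration: the curve y^2 = x^3 + 2x + 2 over GF(17) with base point (5, 1), the function names, and the use of the plain add-always form rather than the add/sub variant and hardware architecture of the proposed DECPAC. The Python-level selection is also not constant-time; the point is only the operation count.

```python
# Toy parameters (illustrative only): y^2 = x^3 + 2x + 2 over GF(17).
P_MOD, A = 17, 2
G = (5, 1)
INF = None  # point at infinity


def point_add(p, q):
    """Affine point addition; also handles doubling and the point at infinity."""
    if p is INF:
        return q
    if q is INF:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF
    if p == q:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)  # Python 3.8+
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def ecsm_plain(k, p):
    """Left-to-right double-and-add: adds only when the key bit is 1."""
    r, doublings, additions = INF, 0, 0
    for bit in bin(k)[2:]:
        r = point_add(r, r)
        doublings += 1
        if bit == "1":
            r = point_add(r, p)
            additions += 1
    return r, doublings, additions


def ecsm_always(k, p):
    """Double-and-add-always: one addition per bit, result discarded on 0 bits."""
    r, doublings, additions = INF, 0, 0
    for bit in bin(k)[2:]:
        r = point_add(r, r)
        doublings += 1
        t = point_add(r, p)  # computed regardless of the key bit
        additions += 1
        r = t if bit == "1" else r
    return r, doublings, additions


k = 13  # binary 1101
print("plain :", ecsm_plain(k, G)[1:])   # (doublings, additions)
print("always:", ecsm_always(k, G)[1:])
```

Both routines return the same point, but the always variant performs one addition per key bit instead of one per set bit, so its operation sequence no longer depends on the key at the price of more point operations.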

Table 5.5: Comparisons among 521-bit ECC designs over GF(p).

                      Tech.    Gates(K)   Key Size   Cycles      Time(ms)@fmax(MHz)   AT    Power(mW)
R2-DECP,1             90nm     165.9      521        1,438,637   3.88@370.3           1
ESSCIRC’10 [33],1     90nm     170.7      521        1,967,982   5.31@370.3           1.4
MT’08 [32],1,a        0.18µm   225.0      512        1,824,522   13.7@133.0           4.8

a 512-bit DECP.

Table 5.6: Comparisons among 521-bit ECC designs over GF(2^m).

                      Tech.    Gates(K)   Key Size   Cycles      Time(ms)@fmax(MHz)   AT    Power(mW)
R2-DECP,1             90nm     165.9      409        769,492     1.38@555.6           1
ESSCIRC’10 [33]⋆,1    90nm     170.7      409        1,165,672   2.23@500.0           1.6

Table 5.7: Comparisons among 160-bit ECC designs over GF(p).

                     Tech.    Core(mm²)/Gates(K)   Key Size   Cycles    Time(ms)@fmax(MHz)   AT    Power(mW)
R4-DECP⋆,2           90nm     0.29/82.8            160        79,528    0.31@256.4           1     22.5
TCAS-2’09 [24],3     0.13µm   1.44/169.4           160        74,021    0.61@121.0           4.0   70.0
TVLSI’08 [25],2      0.13µm   1.06/150.6           160        74,021    0.34@217.0           2.0
TC’03 [23],1         0.13µm   – /117.5             160        153,000   1.21@137.7           5.5

2 Post-simulation result. 3 Measurement result.

Table 5.8: Comparisons with 160-bit ECC designs over GF(2^m).

                     Tech.    Core(mm²)/Gates(K)   Key Size   Cycles    Time(ms)@fmax(MHz)   AT    Power(mW)
R4-DECP,2            90nm     0.29/82.8            160        56,506    0.19@289.9           1     25.9
TCAS-2’09 [24],3     0.13µm   1.44/169.4           160        54,319    0.37@146.0           4.0   82.1
TVLSI’08 [25],2      0.13µm   1.06/150.6           160        54,319    0.16@350.0           1.5
TC’03 [23],1         0.13µm   – /117.5             160        86,000    0.19@510.2           1.4
DATE’07 [21]1,a      0.25µm   – / –                163        9,251     0.08@111.1                 154.2

a 163-bit ECC processor.

Table 5.9: Comparisons among 256-bit ECC designs over GF(p).

                     Tech.    Core(mm²)/Gates(K)   Key Size   Cycles    Time(ms)@fmax(MHz)   AT     Power(mW)
R4-DECP,2            90nm     0.45/122.0           160        79,720    0.32@250.0
                                                   256        193,386   0.77@250.0           1      31.0
TCAS-2’09 [24],1     0.13µm   – /197.0             256        252,067   1.21@208.0           2.5
ISCAS’08 [55]⋆,2     0.18µm   17.8/ –              160        28,000    0.12@233.0           ∼10
                                                   256        70,457    0.30@233.0           ∼10
MT’07 [31],1         0.18µm   – /292.5             256        439,746   5.86@75.0            18.2
TC’03 [23],1         0.13µm   – /120.2             256        369,000   2.69@137.0           3.4
TCAS-2’07 [9]1       0.13µm   – /122.0             256        562,000   1.01@556.0           1.3

Table 5.10: Comparisons among 256-bit ECC designs over GF(2^m).

                     Tech.    Core(mm²)/Gates(K)   Field   Cycles    Time(ms)@fmax(MHz)   AT      Power(mW)
R4-DECP,2            90nm     0.45/122.0           160     56,698    0.20@277.8
                                                   256     165,354   0.59@277.8           1       35.6
TCAS-2’09 [24],1     0.13µm   – /197.0             256     195,714   0.74@263.0           2.0
ISCAS’08 [55],2      0.18µm   17.8/ –              160     22,000    0.095@233.0          ∼10
                                                   256     56,050    0.24@233.0           ∼10
TC’03 [23],1         0.13µm   – /120.2             256     230,000   0.45@510.0           0.6
DATE’08 [20]1,a      90nm     – /1494.7            233     3,077     0.015@200.0          0.3     64.64
JSSC’01 [12]3,b      0.25µm   – /880.0             256     725,000   14.5@50.0            177.2

a 233-bit ECC processor. b Including modular exponentiation hardware.

Table 5.11: Comparisons among 571-bit ECC designs over GF(2^m).

                     Tech.    Core(mm²)/Gates(K)   Key Size   Cycles      Time(ms)@fmax(MHz)   AT    Power(mW)
R4-DECP,1            90nm     – /308.2             571        719,659     2.09@344.8           1
TVLSI’09 [13]2,a     0.13µm   2.34/331.7           571        2,033,500   4.9@415.0            2.5   277.6
TC’07 [15]1,b        0.13µm   – /343.0             571        407,048     1.39@292.0           0.7
                              – /244.0             571        451,140     1.55@292.0           0.6
DATE’07 [21]1        0.25µm   – / –                571        322,275     0.48@53.3                  396.1
CHES’00 [18]1        0.25µm   – /165.0             571        1,452,000   22.0@66.0            5.6

a Including modular exponentiation hardware. b This work can perform Hyper-ECC.

Table 5.12: Implementation results of 256-bit R2-GFAUPAC.

                  Tech.   Gates(K)   Field       Function   Cycles   Time(s)@fmax(MHz)   AT
R2-GFAUPAC,1      90nm    56.4       GF(p256)    RD         316      0.79µ@400.0         1a
                                                 RM         257      0.64µ@400.0
                                                 MA/MS      2        5.0n@400.0
                                     GF(2^256)   RD         427      0.77µ@555.6         1.2b
                                                 RM         257      0.46µ@555.6
                                                 MA/MS      2        3.6n@555.6

a Compared with 256-bit R2-GFAU for GF(p256) division operation.

b Compared with 256-bit R2-GFAU for GF(2^256) division operation.

Table 5.13: Implementation results of 256-bit R2-DECPAC.

                  Tech.   Gates(K)   Field       Cycles    Time(ms)@fmax(MHz)   AT
R2-DECPAC,1       90nm    88.8       GF(p256)    539,134   1.37@392.1           1.7a
                                     GF(2^256)   494,196   0.89@555.6           1.8b

a Compared with 256-bit R2-DECP for GF(p256) ECSM operation.

b Compared with 256-bit R2-DECP for GF(2^256) ECSM operation.

Table 5.14: Comparisons among 521-bit ECC designs over GF(p).

                        Tech.    Gates(K)/Area Degradation   Key Size   Cycles                Time(ms)@fmax(MHz)   AT        Time Increase
R2-DECPAC,1             90nm     179.9/8.4%                  521        2,020,494             5.46@370.3           1         39.4%
ESSCIRC’10 [33],1,a     90nm     185.1/8.9%                  521        2,534,400∼2,690,063   6.84∼7.26@370.3      1.3∼1.4   37.2∼45.6%
MT’08 [32],1,b          0.18µm   277.0/23.1%                 512        3,649,044             27.4@133.0           7.7       62.5%

a PA-resistant DECP. b 512-bit SPA-resistant DECP.

Table 5.15: Comparisons among 521-bit ECC designs over GF(2^m).

                        Tech.    Gates(K)/Area Degradation   Key Size   Cycles                Time(ms)@fmax(MHz)   AT        Time Increase
R2-DECPAC,1             90nm     179.9/6.9%                  409        1,224,496             2.20@555.6           1         59.1%
ESSCIRC’10 [33],1,a     90nm     185.1/8.9%                  409        1,748,502∼1,852,862   3.5∼3.7@500.0        1.6∼1.7   50.0∼59.3%

a PA-resistant DECP.

Chapter 6
