7.5.2 Energy-Efficient and SCA-Resistant Crypto Engine for IEEE 802.15.4/6 Applications

Table 7.10 compares our CE chip with related AES [76, 104–107] and ECC [22, 24, 25, 28, 57, 58] hardware implementations. Owing to its highly integrated architecture and the low overhead of its randomization techniques, our CE shows advantages in both energy efficiency and power-analysis resistance. Moreover, through system-level integration, it supports the security schemes specified in both the IEEE 802.15.4 and IEEE 802.15.6 standards. For comparison with a previous IEEE 802.15.4 security device that uses RSA-based Diffie-Hellman key (DHK) agreement [50], we also implemented a second crypto engine (CE-II) on the same FPGA family. The key size of the DF-ECC processor in CE-II is set to 224 bits, which provides the same security level as the 2048-bit RSA used in [50]. The synthesis results are also listed in Table 7.10. CE-II occupies 20,166 slice LUTs and 4,399 slice registers, which are 13% and 65% fewer than those of [50], respectively. Furthermore, our DF-ECC processor needs at most 336K cycles to complete the DHK agreement, about 95% fewer than the estimated 6,291K cycles of the RSA-based DHK agreement design. In addition, our design resists both SPA and DPA attacks, whereas no power-analysis resistance is reported for [50]. These advantages indicate that the proposed solution is well suited to resource-constrained applications such as the IoT.
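The percentages quoted above can be recomputed from the Table 7.10 entries. The short Python sketch below does this; the helper name `reduction_pct` is hypothetical, 336K is taken as exactly 336,000 cycles, and the RSA figure is the table's estimate of 6,291K cycles (6,291,456).

```python
# Relative savings of CE-II over the RSA-based design [50], from the Table 7.10 entries.
def reduction_pct(ours: float, reference: float) -> float:
    """Percentage by which `ours` is smaller than `reference`."""
    return 100.0 * (reference - ours) / reference


CE_II = {"slice LUTs": 20_166, "slice registers": 4_399, "DHK cycles": 336_000}
REF_50 = {"slice LUTs": 23_079, "slice registers": 12_679,
          "DHK cycles": 6_291_456}  # Table 7.10 estimate (6,291K cycles) for RSA-2048

for metric, ours in CE_II.items():
    print(f"{metric}: {reduction_pct(ours, REF_50[metric]):.1f}% fewer")
```

The LUT and register savings come out at about 12.6% and 65.3%, and the cycle saving at about 94.7%.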

Table 7.10: ASIC and FPGA Comparison Among Previous Works

| Design | Technology | Area (mm²) | AES Throughput (Mb/s)@f (MHz) | AES Energy (µJ/Mb) | Field / Algorithm | ECSM Time (ms)@f (MHz) | ECSM Energy (µJ) | Power-Analysis Resistance | Standards |
|---|---|---|---|---|---|---|---|---|---|
| Our CE | 90-nm (Measurement) | 0.34 | 60@25 | 1.65 | GF(p192) / GF(2¹⁹²) | 10.8@25 / 9.2@25 | 4.73 / 4.02 | SPA and DPA attacks | IEEE 802.15.4, IEEE 802.15.6 |
| JSSC’11 [104] | 45-nm (Measurement) | 0.052 | 53,000@2,100 | 2.36 | - | - | - | SPA attacks | - |
| | (Measurement) | | | | GF(2¹⁶⁰) | 0.27@158 | 21.6 | | |
| ISCAS’11 [58] | 90-nm (Post-layout) | 0.29 | - | - | GF(p160) / GF(2¹⁶⁰) | 0.31@256 / 0.19@290 | 6.98 / 4.92 | - | - |
| TCAS-II’09 [24] | 130-nm (Measurement) | 1.44 | - | - | GF(p160) / GF(2¹⁶⁰) | 0.61@121 / 0.37@146 | 42.6 / 30.5 | - | - |
| MWSCAS’09 [25] | 180-nm (Post-layout) | 2.10 | - | - | GF(2¹⁶³) | 1.89@181 | 257 | - | - |
| TC’08 [22] | 130-nm (Synthesis) | - | - | - | GF(2¹⁶³) | 244@0.001 | 8.94 | SPA attacks | - |
| CRASH’05 [57] | 90-nm (Post-layout) | 0.09 | - | - | GF(p192) / GF(2¹⁹¹) | 1.13@600 / 0.71@600 | 37.29 / 23.46 | - | - |
| Our CE-II | Spartan 6 xc6slx75-3 (Synthesis) | 20,166/4,399† (LUTs/REGs) | 133.295@56.234 | - | GF(p224) / GF(2²²⁴) | 336K/286K cycles | - | SPA and DPA attacks | IEEE 802.15.4, IEEE 802.15.6 |
| SCVT’11 [50] | Spartan 6 xc6slx75-3 (Synthesis) | 23,079/12,679‡ (LUTs/REGs) | 452.85@233.503 | - | RSA2048 | 6,291K∗ cycles | - | - | IEEE 802.15.4 |

Energy = average power × time.

†AES core + DF-ECC processor.

‡AES core + RSA processor.

∗Estimated by m × (m + 0.5 × m) cycles [108], where m is the key size.
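Both footnote formulas are simple enough to check numerically. In the minimal Python sketch below, the function names are hypothetical, and the average-power inputs (438 µW and 437 µW) are assumed to be the soECC-G figures quoted in Section 8.1, which reproduce the energy values of the "Our CE" row.

```python
def rsa_cycles_estimate(m: int) -> int:
    """Footnote (*) estimate for an m-bit RSA key: m * (m + 0.5 * m) = 1.5 * m^2 cycles."""
    return 3 * m * m // 2  # exact for even key sizes such as 2048


def energy_uj(avg_power_uw: float, time_ms: float) -> float:
    """Energy = average power x time; 1 uW * 1 ms = 1 nJ, hence the division by 1000."""
    return avg_power_uw * time_ms / 1000.0


print(rsa_cycles_estimate(2048))       # 6291456, i.e. the 6,291K cycles in Table 7.10
print(round(energy_uj(438, 10.8), 2))  # 4.73 (GF(p192) ECSM of our CE)
print(round(energy_uj(437, 9.2), 2))   # 4.02 (GF(2^192) ECSM of our CE)
```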

Chapter 8 Conclusion

8.1 Summary

This dissertation has reported our research on the design and implementation of hardware for public-key cryptography (PKC). We reviewed state-of-the-art approaches to ECC hardware implementation and SCA countermeasures. Previous works have presented several design techniques for improving hardware performance, and some arithmetic methods for ECC have been used to avoid key-dependent intermediate data and thus resist SCAs. However, few complete solutions exist for SCA-resistant ECC processors. In addition, although ECC has been adopted in several existing standards, design methods for distinct real-world applications are lacking. In light of these considerations, our design objectives covered not only hardware efficiency with SCA resistance but also standards compliance.

In our work, we adopted a new top-down design approach that covers the basic modular operations over DFs, operation scheduling for both the ECSM and ECPG, and on-chip random bitstream generation.
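As an illustration of a basic modular operation over dual fields, the sketch below implements a bit-serial (radix-2) Montgomery multiplication in which GF(p) and GF(2ᵐ) share one loop and differ only in whether additions propagate carries, following the well-known unified-multiplier idea. It is a textbook-style sketch, not the dissertation's JS-GFAU/GFAU datapath; the function names, the radix, and the small test fields (p = 23 and f(x) = x⁴ + x + 1) are assumptions chosen for readability.

```python
import operator


def dual_field_mont_mul(a: int, b: int, modulus: int, n: int, binary_field: bool) -> int:
    """Bit-serial (radix-2) Montgomery product over GF(p) or GF(2^m).

    GF(p):   returns a*b*2^(-n) mod p, for odd p < 2^n and 0 <= a, b < p.
    GF(2^m): returns a(x)*b(x)*x^(-n) mod f(x), for deg f = n and f(0) = 1,
             with polynomials packed into ints (bit i = coefficient of x^i).
    """
    # The only difference between the fields: GF(2^m) addition is carry-free (XOR).
    add = operator.xor if binary_field else operator.add
    t = 0
    for i in range(n):
        if (a >> i) & 1:        # accumulate b when multiplier bit a_i is 1
            t = add(t, b)
        if t & 1:               # add the modulus to clear the lowest bit
            t = add(t, modulus)
        t >>= 1                 # exact division by 2 (GF(p)) or by x (GF(2^m))
    if not binary_field and t >= modulus:
        t -= modulus            # t < 2p here, so one conditional subtraction suffices
    return t


def gf2_mulmod(a: int, b: int, f: int) -> int:
    """Reference (non-Montgomery) product a(x)*b(x) mod f(x), for a already reduced."""
    deg = f.bit_length() - 1
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> deg) & 1:      # reduce as soon as the degree reaches deg f
            a ^= f
    return result


# GF(p) with p = 23, R = 2^5: MontMul(a, b) * R mod p equals a*b mod p.
print(all(dual_field_mont_mul(a, b, 23, 5, False) * 32 % 23 == a * b % 23
          for a in range(23) for b in range(23)))
# GF(2^4) with f = x^4 + x + 1 (0b10011): x^4 mod f = x + 1 (0b0011).
print(all(gf2_mulmod(dual_field_mont_mul(a, b, 0b10011, 4, True), 0b0011, 0b10011)
          == gf2_mulmod(a, b, 0b10011) for a in range(16) for b in range(16)))
```

Both checks print `True`. In hardware, the `binary_field` flag would typically correspond to a carry-suppression control on a shared adder; in Python it simply selects integer addition or XOR.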

We proposed randomized Montgomery operations with low hardware overhead for DPA resistance; in addition, the randomized Montgomery division requires fewer iterations than Kaliski's Montgomery inversion, which shortens the execution time. Domain conversion can be performed directly with a few Montgomery multiplications and divisions, so it adds virtually no time to the ECSM computation. To prevent attackers from observing key-dependent special intermediate values such as zero, the masked-base-point technique for ZPA resistance is employed. By reusing the PEs, the ECPG is computed in parallel efficiently, saving time overhead. In addition, to avoid potential threats arising from operation scheduling, a right-to-left double-and-add-always (RL-DAA) ECSM with SPA and CPA resistance is used, modified to exploit parallelism. A priority-oriented scheduling method further improves the execution time. Note that our SCA countermeasures require no modification of the standard cells or the ASIC/FPGA design flow, and they can be applied to standard ECC functions over DFs.
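The following Python sketch shows the operation sequence of a textbook right-to-left double-and-add-always scalar multiplication on a toy curve (y² = x³ + 2x + 3 over GF(97), base point (3, 6)). Every scalar bit triggers exactly one point addition (real or dummy) and one doubling, and the doubling depends only on the previous T, which is what allows the two to run in parallel. The curve, the names, and the dummy-accumulator formulation are assumptions for illustration; the dissertation's modified RL-DAA and its randomization are not reproduced, and Python code like this is not constant-time.

```python
# Toy curve E: y^2 = x^3 + 2x + 3 over GF(97); far too small for real use.
P_MOD, A_COEF = 97, 2
INF = None  # point at infinity


def ec_add(P, Q):
    """Affine point addition/doubling on the toy curve (handles INF and P + (-P))."""
    if P is INF:
        return Q
    if Q is INF:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF
    if P == Q:  # doubling: slope of the tangent
        lam = (3 * x1 * x1 + A_COEF) * pow(2 * y1, P_MOD - 2, P_MOD) % P_MOD
    else:       # addition: slope of the chord
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, P_MOD - 2, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult_rl_daa(k: int, P, nbits: int):
    """Right-to-left double-and-add-always: one addition and one doubling per bit."""
    acc = [INF, INF]   # acc[0] accumulates k*P; acc[1] absorbs the dummy additions
    T = P              # T = 2^i * P at iteration i
    for i in range(nbits):
        bit = (k >> i) & 1
        acc[1 - bit] = ec_add(acc[1 - bit], T)  # real add if bit = 1, dummy if bit = 0
        T = ec_add(T, T)                         # uses only the old T, so it can overlap the add
    return acc[0]


G = (3, 6)  # on the toy curve: 6^2 = 36 = 3^3 + 2*3 + 3
print(scalar_mult_rl_daa(2, G, 8))  # (80, 10)
```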

As described above, the randomized computations require a random sequence, and a bitstream with low randomness weakens DPA resistance. To address this problem, we introduced a new ring-oscillator-based random number generator (RO-RNG) that generates the random sequence without a deterministic state. A jitter amplifier enlarges the sampling space, making the generator more robust against biased sampling. With this technique, ten one-million-bit sequences pass all 15 tests of the NIST randomness test suite at a 99% confidence level.

According to the measurement results, DPA attacks cannot reveal the key of the ECC chip that uses these random sequences, even with 12 million power traces.
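The 99% confidence level corresponds to the significance level α = 0.01 used by the NIST SP 800-22 suite, in which a sequence passes a test when its P-value is at least α. As an example of one of the 15 tests, the sketch below implements the frequency (monobit) test; the function names are hypothetical, and the 10-bit input is the worked example from the specification (which recommends at least 100 bits in practice), not data from the RO-RNG.

```python
import math


def monobit_p_value(bits) -> float:
    """NIST SP 800-22 frequency (monobit) test P-value for a sequence of 0/1 values."""
    n = len(bits)
    s_n = sum(1 if b else -1 for b in bits)  # map 1 -> +1 and 0 -> -1, then sum
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def passes(p_value: float, alpha: float = 0.01) -> bool:
    """Accept the sequence as random when P-value >= alpha (0.01 -> 99% confidence)."""
    return p_value >= alpha


# Worked example from SP 800-22, Section 2.1: epsilon = 1011010101.
example = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
p_val = monobit_p_value(example)
print(f"{p_val:.6f}", passes(p_val))  # 0.527089 True
```

A heavily biased stream, such as 100 consecutive ones, yields a P-value far below 0.01 and fails.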

At the architecture level, integrated and fully pipelined PEs (i.e., JS-GFAU, GFAU, and MAS) are used instead of multiple separate ALUs to save area. To further improve utilization, a heterogeneous two-PE architecture performs the ECSM and ECPG computations in parallel without duplicating PEs. In addition, a two-level memory hierarchy with local memory coherence reduces data transitions, lowering power dissipation compared with conventional shift-register-based approaches.

Using UMC 90-nm CMOS technology, we fabricated several SCA-resistant ECC chips with different specifications and design criteria for various applications, including mobile devices, computing servers, and the IoT. The 0.41-mm² 160-bit ECC chip, soECC-P, performs one GF(p)/GF(2ᵐ) ECSM in 0.34/0.29 ms with an energy of 11.7/9.3 µJ; its low hardware cost makes it well suited to mobile devices. The 521-bit ECC chip, soECC-S, which supports the compressed public-key form, performs a GF(p521) ECSM in 3.40 ms and a GF(2⁵²¹) ECSM in 2.77 ms. This is the fastest design for cloud computing. Furthermore, the 192-bit ECC chip, soECC-G, which operates at a low supply voltage of 0.5 V and works together with a bio-signal module, performs a GF(p192)/GF(2¹⁹²) ECSM in 10.8/9.2 ms at 438/437 µW. It targets power efficiency and is suitable for IoT applications.

From the document: A High-Hardware-Efficiency Elliptic Curve Cryptographic Processor with Side-Channel Attack Resistance (pp. 134-138)