High Hardware Performance Elliptic Curve Cryptographic Processor with Side-Channel Attack Resistance

Figure 7.9: System architecture of soECC-P. (Block diagram: dual-field arithmetic units with a divisor, multipliers, and adders/subtractors, plus an RO-RNG.)

Figure 7.10: Shmoo plot for the measurement results of chip soECC-P.

Figure 7.11: Chip micrograph of our 160-bit DF-ECC processor, soECC-P.

7.4.3 soECC-S: A 1.38 mm² 3.40/2.77 ms GF(p₅₂₁)/GF(2⁵²¹) 521-bit SCA-Resistant DF-ECC Processor Using Heterogeneous Two-PE Architecture

Figure 7.12 shows the system block diagram of soECC-S. To avoid the computational overhead that the key-blinding countermeasure incurs through its extended key size, soECC-S exploits the radix-4 randomized Montgomery operations [97] described in Subsection 4.1.
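Subsection 4.1 is not part of this section, so the sketch below only illustrates the general idea under two loudly stated assumptions: "radix-4" is taken to mean that the multiplier is scanned two bits (one radix-4 digit) per iteration, and "randomized" is modeled as masking each operand by adding a random multiple of p, which leaves its residue unchanged. The names `radix4_mont_mul` and `randomize` are hypothetical, and the final `% p` is a simplification rather than how a hardware datapath would bound the result.

```python
import secrets

def radix4_mont_mul(a, b, p, n_digits):
    """Radix-4 Montgomery product a*b*4^(-n_digits) mod p.

    Scans a two bits (one radix-4 digit) per iteration, so an n-bit operand
    needs about n/2 iterations. Requires odd p and 0 <= a < 4**n_digits.
    """
    p_neg_inv = (-pow(p, -1, 4)) % 4      # -p^(-1) mod 4
    t = 0
    for i in range(n_digits):
        digit = (a >> (2 * i)) & 3        # i-th radix-4 digit of a
        t += digit * b
        q = (t * p_neg_inv) % 4           # chosen so that t + q*p is divisible by 4
        t = (t + q * p) >> 2              # exact division by 4
    # Here t * 4**n_digits == a*b + Q*p for some integer Q.
    return t % p                          # sketch only: full reduction for simplicity

def randomize(x, p, mask_bits):
    """Hypothetical operand masking: x + r*p for a random r (same value mod p)."""
    return x + secrets.randbelow(1 << mask_bits) * p

p = 2**127 - 1                                      # a Mersenne prime, illustration only
mask_bits = 32
n_digits = (p.bit_length() + mask_bits + 1) // 2    # 4**n_digits exceeds any masked operand
a, b = 123456789, 987654321
res = radix4_mont_mul(randomize(a, p, mask_bits), randomize(b, p, mask_bits), p, n_digits)
assert res == a * b * pow(4, -n_digits, p) % p
```

Because a fresh multiple of p is added on every call, the bit patterns fed to the multiplier change from run to run while the Montgomery result stays the same, which is the property a randomized operation relies on.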

The heterogeneous two-PE architecture, consisting of one JS-GFAU and one MAS, accelerates the ECPC, ECPG, and modular operations over dual fields. A memory hierarchy with local memory coherency transfers data efficiently. Combining the RL-DAA ECSM of Algorithm 9 with base-point masking defeats SPA, DPA, ZPA, and CPA attacks. To strengthen robustness against SCAs, an RO-RNG with a jitter amplifier is implemented on chip. To accelerate ECPG, random EC points are generated by computing in parallel on the JS-GFAU and the MAS.

Because ECPG can be performed on the chip itself, another advantage is that the public key can be transmitted in compressed form, which reduces transmission [7]. In compressed form, the y-coordinate is represented by a single bit ỹ. Over GF(p), ỹ = y mod 2, and decompression computes a square root z of g ≡ x³ + a_p·x + b_p (mod p). Let z̃ be the rightmost bit of z; if z̃ = ỹ, then y ← z, else y ← p − z. Over GF(2^m), ỹ is the rightmost bit of y·x⁻¹ (for x ≠ 0), and decompression first computes β ≡ α(x²)⁻¹, where α ≡ x³ + a_b·x² + b_b (mod p(x)). It then finds a field element z such that z² + z = β, lets z̃ be the rightmost bit of z, and finally returns the coordinate y ≡ (z + z̃ + ỹ)x.
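The decompression procedure for both fields can be sketched in Python as follows. This is a minimal illustration, not the chip's datapath, and it makes two assumptions that are not stated in the text: p ≡ 3 (mod 4), so the square root is a single exponentiation, and m is odd, so z² + z = β is solved with the half-trace. All function names are hypothetical; the GF(2^m) elements are integers whose bits are polynomial coefficients, and `f` is the reduction polynomial p(x) including its x^m term.

```python
# Sketch of point decompression over GF(p) and GF(2^m).
# Assumptions: p ≡ 3 (mod 4) for the square root; m odd for the half-trace.

def decompress_gfp(x, y_bit, a, b, p):
    """Recover y on y^2 = x^3 + a*x + b (mod p) from x and y_bit = y mod 2."""
    assert p % 4 == 3, "this sketch only handles p ≡ 3 (mod 4)"
    g = (x * x * x + a * x + b) % p
    z = pow(g, (p + 1) // 4, p)              # candidate square root of g
    if z * z % p != g:
        raise ValueError("x is not the x-coordinate of a curve point")
    return z if (z & 1) == y_bit else (p - z) % p

def gf2_mul(u, v, f, m):
    """Multiply u*v in GF(2^m) = GF(2)[t]/(f); f includes the t^m term."""
    r = 0
    while v:
        if v & 1:
            r ^= u
        v >>= 1
        u <<= 1
        if (u >> m) & 1:
            u ^= f                           # reduce as soon as the degree reaches m
    return r

def gf2_inv(u, f, m):
    """u^(2^m - 2) = u^(-1) for nonzero u."""
    result, e = 1, (1 << m) - 2
    while e:
        if e & 1:
            result = gf2_mul(result, u, f, m)
        u = gf2_mul(u, u, f, m)
        e >>= 1
    return result

def half_trace(beta, f, m):
    """Sum of beta^(4^i), i = 0..(m-1)/2; solves z^2 + z = beta when Tr(beta) = 0."""
    h = t = beta
    for _ in range((m - 1) // 2):
        t = gf2_mul(t, t, f, m)
        t = gf2_mul(t, t, f, m)              # t <- t^4
        h ^= t
    return h

def decompress_gf2m(x, y_bit, a, b, f, m):
    """Recover y on y^2 + x*y = x^3 + a*x^2 + b from x != 0 and y_bit = lsb(y / x)."""
    assert m % 2 == 1 and x != 0
    x2 = gf2_mul(x, x, f, m)
    alpha = gf2_mul(x2, x, f, m) ^ gf2_mul(a, x2, f, m) ^ b
    beta = gf2_mul(alpha, gf2_inv(x2, f, m), f, m)
    z = half_trace(beta, f, m)
    if gf2_mul(z, z, f, m) ^ z != beta:
        raise ValueError("x is not the x-coordinate of a curve point")
    return gf2_mul(z ^ ((z & 1) ^ y_bit), x, f, m)   # y = (z + z~ + y~) * x

# Toy check over GF(103): (3, 6) and (3, 97) lie on y^2 = x^3 + 2x + 3.
assert decompress_gfp(3, 0, 2, 3, 103) == 6
assert decompress_gfp(3, 1, 2, 3, 103) == 97
```

In the binary case, dividing the curve equation by x² gives (y/x)² + (y/x) = β, so the two roots z and z + 1 correspond to the two possible y values; the compressed bit selects between them, which is why ỹ there is taken from y·x⁻¹ rather than from y.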

Table 7.5 summarizes the chip performance of our 521-bit SCA-resistant DF-ECC processor. The measured operating frequency and energy dissipation versus supply voltage, over the range from 0.6 V to 1.2 V, are shown in Figure 7.13(a) and Figure 7.13(b), respectively. The maximum frequency increases as the field length decreases, because the critical path depends on the field length. In contrast, the energy consumed per ECSM grows with the field length, because the binary method of scalar multiplication performs one iteration per key bit. Figure 7.14 shows the die photo of the ECC chip.
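To make the per-bit structure concrete, the following sketch shows a right-to-left double-and-add-always ECSM. Algorithm 9 is not reproduced in this section, so reading "RL-DAA" as right-to-left double-and-add-always is an assumption, and the tiny curve y² = x³ + 2x + 3 over GF(97) with base point (3, 6) is a hypothetical toy chosen only so the code can be checked against repeated addition. Each iteration performs exactly one point addition and one doubling whatever the key bit is, so the operation count grows linearly with the key length. Python integers are not constant-time; the sketch only illustrates the operation sequence.

```python
# Hypothetical toy curve y^2 = x^3 + 2x + 3 over GF(97); far too small for real use.
P, A, B = 97, 2, 3
G = (3, 6)  # on the curve: 6^2 = 36 = 27 + 6 + 3 (mod 97)

def ec_add(p1, p2):
    """Affine addition/doubling on the toy curve; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                                          # Q + (-Q) = O
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P     # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P      # chord slope
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def ecsm_rl_daa(k, point, n_bits):
    """Right-to-left double-and-add-always: one addition and one doubling
    per key bit, regardless of the bit value."""
    acc = [None, None]          # acc[0] accumulates k*point, acc[1] is a dummy sink
    t = point                   # t = 2^i * point at iteration i
    for i in range(n_bits):
        bit = (k >> i) & 1
        acc[1 - bit] = ec_add(acc[1 - bit], t)  # real add if bit = 1, dummy add if 0
        t = ec_add(t, t)
    return acc[0]

def ecsm_naive(k, point):
    """Reference result: k-fold repeated addition."""
    r = None
    for _ in range(k):
        r = ec_add(r, point)
    return r

assert all(ecsm_rl_daa(k, G, 8) == ecsm_naive(k, G) for k in range(256))
```

With n_bits equal to the key length, a 521-bit key runs 521 iterations versus 160 for a 160-bit key, and each iteration's field operations also operate on longer words, which is consistent with the energy trend described above.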

Figure 7.12: System architecture of soECC-S. (Block diagram: a wrapper and address decoder with 521-bit modulus, field-length, key, and operand inputs and a 1-bit field select; instruction decode, task management, pre/post-processing, memory control, and ECC control with shared control logic on an internal bus; dual-field arithmetic units with a divisor, multipliers, and adders/subtractors; an RO-RNG with jitter amplifier (JA); and Jacobi-symbol and point/curve-check units.)

Table 7.5: Chip Summary of soECC-S

Technology          90 nm
Core Area           1.38 mm²
Gate Count          342 K
Key Size            521 bits
Field               Dual (GF(p) and GF(2^m))

                    GF(p)           GF(2^m)
Field Length        160     521     163     409     521
Time (ms/ECSM)      0.29    3.40    0.25    1.72    2.77
f (MHz)             214     187     224     217     216
Energy (µJ/ECSM)    57      598     56      329     532
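Two derived quantities help read Table 7.5: clock cycles per ECSM (time × frequency) and average power (energy ÷ time). The short script below computes both from the transcribed table values; it assumes that the time, frequency, and energy columns refer to the same operating point, which the table does not state explicitly. The dictionary name and helper are illustrative only.

```python
# Values transcribed from Table 7.5: (field, length) -> (time_ms, f_mhz, energy_uj).
TABLE_7_5 = {
    ("GF(p)", 160): (0.29, 214, 57),
    ("GF(p)", 521): (3.40, 187, 598),
    ("GF(2^m)", 163): (0.25, 224, 56),
    ("GF(2^m)", 409): (1.72, 217, 329),
    ("GF(2^m)", 521): (2.77, 216, 532),
}

def cycles_and_power(time_ms, f_mhz, energy_uj):
    """Derived figures: clock cycles per ECSM and average power in mW."""
    cycles = time_ms * 1e-3 * f_mhz * 1e6
    avg_power_mw = energy_uj / time_ms      # µJ per ms equals mW
    return cycles, avg_power_mw

for (field, length), row in TABLE_7_5.items():
    cycles, power = cycles_and_power(*row)
    print(f"{field:8} {length:4d} bits: {cycles:9.0f} cycles/ECSM, {power:6.1f} mW average")
```

For GF(p), going from 160 to 521 bits multiplies the field length by about 3.3 but the cycle count by about 10.2 (roughly 62,060 to 635,800 cycles), so the per-ECSM cost grows faster than linearly in the field length.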

Figure 7.13: Shmoo plot for the measurement results of chip soECC-S: (a) operating frequency over supply voltage; (b) energy dissipation over supply voltage.

Figure 7.14: Chip micrograph of our 521-bit DF-ECC processor, soECC-S.

7.4.4 soECC-G: A 10.8/9.2 ms 438/437 µW GF(p₁₉₂)/GF(2¹⁹²) 192-bit SCA-Resistant DF-ECC Processor Using Single-GFAU Architecture

Figure 7.15 shows the system block diagram of our proposed crypto engine (CE). The CE control manages the following security schemes:

• AES schemes: CTR, CBC-MAC, CMAC, and CCM modes, with 128-bit encryption and decryption keys.

• ECC schemes: ECPA, ECPD, ECPS, and ECSM operations, and DHK agreement over GF(p) and GF(2^m), with 192-bit public and private keys.

• Modular operations: addition, subtraction, multiplication, inversion, and division over GF(p) and GF(2^m).

• Random number sequence: an 8-bit true random bitstream per cycle.
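Division over both fields is a good example of why a single dual-field datapath is attractive. This section does not describe the divider's algorithm, so the sketch below uses a binary extended-Euclidean division loop as an illustrative assumption, not necessarily the circuit in soECC-G. The GF(p) and GF(2^m) versions share the same control flow: integer addition and subtraction become XOR, halving becomes division by the polynomial variable t, and the magnitude comparison becomes a degree comparison. Function names are hypothetical.

```python
def div_gfp(y, x, p):
    """y / x mod an odd prime p via a binary Euclid-style loop (no separate inversion)."""
    if x % p == 0:
        raise ZeroDivisionError("x must be nonzero mod p")
    def half(u):                          # u * 2^(-1) mod p
        return u >> 1 if u % 2 == 0 else (u + p) >> 1
    a, b, u, v = x % p, p, y % p, 0
    # Invariants: y*a ≡ u*x and y*b ≡ v*x (mod p); gcd(a, b) = 1 throughout.
    while a != b:
        if a % 2 == 0:
            a, u = a >> 1, half(u)
        elif b % 2 == 0:
            b, v = b >> 1, half(v)
        elif a > b:
            a, u = (a - b) >> 1, half((u - v) % p)
        else:
            b, v = (b - a) >> 1, half((v - u) % p)
    return u                              # loop ends with a == b == 1, so u ≡ y/x

def div_gf2m(y, x, f):
    """y / x in GF(2)[t]/(f) for irreducible f with constant term 1.
    Same loop as div_gfp: add/subtract become XOR, '>' becomes a degree test."""
    if x == 0:
        raise ZeroDivisionError("x must be nonzero")
    def half(u):                          # u * t^(-1) mod f
        return u >> 1 if (u & 1) == 0 else (u ^ f) >> 1
    a, b, u, v = x, f, y, 0
    while a != b:
        if (a & 1) == 0:
            a, u = a >> 1, half(u)
        elif (b & 1) == 0:
            b, v = b >> 1, half(v)
        elif a.bit_length() >= b.bit_length():
            a, u = (a ^ b) >> 1, half(u ^ v)
        else:
            b, v = (a ^ b) >> 1, half(u ^ v)
    return u

def gf2_mulmod(u, v, f, m):
    """Reference multiplication in GF(2^m), used only to check quotients."""
    r = 0
    while v:
        if v & 1:
            r ^= u
        v >>= 1
        u <<= 1
        if (u >> m) & 1:
            u ^= f
    return r

assert all(div_gfp(y, x, 97) * x % 97 == y for x in range(1, 97) for y in range(97))
assert div_gf2m(1, 0x53, 0x11B) == 0xCA   # AES field: the inverse of {53} is {CA}
```

Because the loop computes y/x directly, it replaces an inversion followed by a multiplication, and the shared control flow is what allows one divider to serve both GF(p) and GF(2^m).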

To ease integration of the proposed CE into an embedded system, a standard AMBA AHB bus interface [96] is used. In addition, to transmit encrypted data in real time, the data memory is used to access external memory without occupying the system bus. Because electronic metastability is inherent in free-running ring oscillators (ROs), the ROs can be efficiently reused to implement both a random power generator (RO-RPG) [84] and a random number generator (RO-RNG), which are required to protect the key in the AES core from being revealed by power-analysis attacks. To avoid the SCA-resistance overhead that the key-blinding approach incurs through its extended key size, the ECC processor soECC-G exploits the radix-2 randomized Montgomery operations described in Subsection 4.1. A single processing element, the GFAU, accelerates the ECPC in the ECC schemes.

Figure 7.15: System architecture of our CE. (Block diagram: AES core with 128-bit CBC, nonce/IV, CTR, plaintext, and ciphertext registers supporting the CTR, CBC-MAC, CMAC, and CCM modes; CE control; soECC-G with DF-ECC I/O; RO-RPG/RNG with 8-bit output; address decoder for input text, output text, key, and application data; AMBA AHB bus interface; 0.5 V supply with ON/OFF power gating.)

To support the security functions required by IoT applications, a 128-bit AES core, a 192-128-bit DF-ECC processor, and an 8-128-bit RO-RNG are integrated with a bio-signal processing system [98]. The system also includes a digital processing module and a sensing interface, and a 32-bit RISC CPU core, the Andes N903-C05 [99], handles instruction scheduling. To reduce system power, a wakeup/power control logic puts the processor to sleep during the data collection stage and wakes it for the data processing stage.

To extend the battery life of portable devices, the chip operates at a low supply voltage of 0.5 V by means of reconstructed logic cells [100]. To apply voltage scaling, the behavior and timing of the cells are simulated and re-characterized over supply voltages from 1.0 V down to 0.5 V, and the cell library is then rebuilt from only the cells that work correctly over this range. With the reconstructed cell library, the proposed CE chip can be implemented with a standard-cell-based design flow. Scaling the supply voltage from 1.0 V to 0.5 V reduces power by 80-84%; Figure 7.16 plots the power consumption versus supply voltage and operating frequency. Because leakage dominates power consumption at low frequencies, the CE operating frequency is raised to 25 MHz to improve energy efficiency. In addition, once the operations of the security schemes finish, the CE can be switched off by power gating to save leakage power.

The AES core achieves 60 Mb/s at 99 µW; this throughput is 6 times the 10 Mb/s required by the IEEE 802.15.6 standard and 30 times the 2 Mb/s specified in the IEEE 802.15.4 standard. The RO-RNG generates a 25 Mb/s random sequence while consuming 47 µW. The DF-ECC processor performs one GF(p₁₉₂) ECSM in 10.8 ms at 438 µW and one GF(2¹⁹²) ECSM in 9.2 ms at 437 µW, comfortably meeting the 250 ms response-time requirement at 13.56 MHz in ISO 18000-3 for RFID tag applications [44] that use Schnorr's identification protocol [22]. In terms of hardware complexity, the equivalent gate counts of the AES core, RO-RNG, and DF-ECC processor are 7.24 K, 0.43 K, and 61.68 K, respectively. Figure 7.17 shows the die photo, and Table 7.6 summarizes the chip performance of our 192-bit SCA-resistant DF-ECC processor.
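The ISO 18000-3 comparison assumes Schnorr's identification protocol [22], in which the tag's dominant cost is a single ECSM for its commitment R = rG, followed by a modular multiply-add for the response. The sketch below walks through one identification run over a hypothetical toy curve (y² = x³ + 2x + 3 over GF(97), base point (3, 6)); the parameters are far too small to be secure, the point order is found by brute force, and the scalar multiplication has no SCA protection. Function names are illustrative only.

```python
import secrets

# Hypothetical toy parameters: y^2 = x^3 + 2x + 3 over GF(97), base point G = (3, 6).
P, A, B = 97, 2, 3
G = (3, 6)

def ec_add(p1, p2):
    """Affine addition/doubling; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def ec_mul(k, pt):
    """Plain left-to-right double-and-add (no SCA protection; sketch only)."""
    r = None
    for bit in bin(k)[2:]:
        r = ec_add(r, r)
        if bit == "1":
            r = ec_add(r, pt)
    return r

def point_order(pt):
    """Smallest n > 0 with n*pt = O, by brute force (fine for a toy curve)."""
    n, q = 1, pt
    while q is not None:
        q, n = ec_add(q, pt), n + 1
    return n

N = point_order(G)

def tag_commit():
    """Tag: pick a random nonce r and send R = r*G (the one ECSM per run)."""
    r = secrets.randbelow(N - 1) + 1
    return r, ec_mul(r, G)

def tag_respond(r, e, x):
    """Tag: answer challenge e with s = r + e*x mod N."""
    return (r + e * x) % N

def reader_verify(R, e, s, X):
    """Reader: accept iff s*G == R + e*X."""
    return ec_mul(s, G) == ec_add(R, ec_mul(e, X))

x = secrets.randbelow(N - 1) + 1      # tag private key
X = ec_mul(x, G)                      # tag public key, known to the reader
r, R = tag_commit()
e = secrets.randbelow(N)              # reader's random challenge
s = tag_respond(r, e, x)
assert reader_verify(R, e, s, X)
```

Verification succeeds because sG = (r + e·x)G = R + eX. The reader performs the heavier verification work, while the tag needs only one ECSM per run, which is the operation timed by the 10.8 ms and 9.2 ms figures.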

Figure 7.16: The power consumption of CE chip working at different supply voltage and operation frequency.

Figure 7.17: Chip micrograph of our CE integrated with the embedded processor and other components, such as the data memory (DM), program memory (PM), sensing interface, and bio-signal processing module.
