
Chapter 3 AES Design

3.3 Proposed AES Architecture


Figure 3.11 Implementation of MixColumns and InvMixColumns

3.3.2 Reconfigurable Key Unit

Fig. 3.12 shows our Key Unit, which is composed of two parts: the control logic and the KeyGenerator. The KeyGenerator generates the round key for each round of AES encryption.

The counter counts 10 rounds for a 128-bit key, 12 rounds for a 192-bit key, and 14 rounds for a 256-bit key. Because only a 128-bit round key is needed in each round, we use registers to hold the key material for the next round during 192-bit and 256-bit key scheduling. The SBox in the KeyGenerator is the same as that in the Main Function Unit. The control logic selects the correct round key for the AddRoundKey transformation of the Main Function Unit in every round.

Figure 3.12 Block diagram of Key Unit for Encryption
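The round-by-round key generation described above can be modelled in software. The following Python sketch is an illustration only, not the hardware design of this work: it follows the standard FIPS-197 key expansion, keeps a sliding window of the last Nk words (standing in for the registers), and yields one 128-bit round key per round. The names gf_mul, sbox_entry, sub_word, and round_keys are hypothetical, and the S-box is computed by plain exponentiation rather than by the composite-field circuit used in the hardware.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def sbox_entry(a):
    """AES S-box value: multiplicative inverse, then the affine transform."""
    inv = 0
    if a:
        inv = 1
        for _ in range(254):  # a^254 equals a^-1 in GF(2^8)
            inv = gf_mul(inv, a)
    out = inv
    for shift in range(1, 5):  # XOR with four left rotations of inv
        out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return out ^ 0x63


SBOX = [sbox_entry(a) for a in range(256)]


def sub_word(w):
    """Apply the S-box to each byte of a 32-bit word."""
    return int.from_bytes(bytes(SBOX[b] for b in w.to_bytes(4, "big")), "big")


def round_keys(key):
    """Yield the Nr+1 round keys one at a time as tuples of four 32-bit words."""
    if len(key) not in (16, 24, 32):
        raise ValueError("key must be 128, 192, or 256 bits")
    nk = len(key) // 4            # key length in words: 4, 6, or 8
    nr = nk + 6                   # number of rounds: 10, 12, or 14
    # window models the registers: it always holds the last nk words.
    window = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(nk)]
    pending = list(window)        # words generated but not yet output
    rcon, i = 1, nk
    for _ in range(nr + 1):
        while len(pending) < 4:
            t = window[-1]
            if i % nk == 0:
                t = ((t << 8) | (t >> 24)) & 0xFFFFFFFF   # RotWord
                t = sub_word(t) ^ (rcon << 24)
                rcon = gf_mul(rcon, 2)
            elif nk > 6 and i % nk == 4:                  # extra SubWord, 256-bit keys only
                t = sub_word(t)
            new = window[0] ^ t
            window = window[1:] + [new]
            pending.append(new)
            i += 1
        yield tuple(pending[:4])
        pending = pending[4:]


if __name__ == "__main__":
    key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")  # FIPS-197 example key
    for r, rk in enumerate(round_keys(key)):
        print(r, " ".join(f"{w:08x}" for w in rk))
```

For 192-bit and 256-bit keys the pending list carries words that were generated in one round but are consumed in the next, which is the software counterpart of storing round-key material in registers.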

The Key Unit for decryption differs from that for encryption. We first generate the round keys for all rounds and store them in the STACK. Once all the round keys needed for decryption are ready, we start to decrypt the ciphertext. Fig. 3.13 shows our Key Unit for decryption.

Figure 3.13 Block diagram of Key Unit for Decryption
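The last-in, first-out use of the STACK can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a description of the actual circuit: the round keys are represented by placeholder labels, the function names decryption_key_order and stack_bits are hypothetical, and the storage estimate assumes that all Nr+1 round keys of 128 bits each are held at once.

```python
def decryption_key_order(round_keys):
    """Push every round key, then pop them: last generated, first used."""
    stack = []
    for rk in round_keys:      # fill phase: the key schedule runs forward
        stack.append(rk)
    while stack:               # decrypt phase: keys are consumed in reverse
        yield stack.pop()


def stack_bits(key_bits):
    """Bits needed to hold all round keys (assumes Nr+1 keys of 128 bits)."""
    rounds = {128: 10, 192: 12, 256: 14}[key_bits]
    return (rounds + 1) * 128


if __name__ == "__main__":
    labels = [f"rk{r}" for r in range(11)]          # placeholder keys, AES-128
    print(list(decryption_key_order(labels)))
    for bits in (128, 192, 256):
        print(bits, "bit key:", stack_bits(bits), "bits of round-key storage")
```

Decryption cannot begin until the fill phase has finished, because the first key it needs is the last one the forward key schedule produces.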

Chapter 4 Simulation and FPGA Verification

This work presents a hardware implementation of AES arithmetic and its design for embedded systems. This chapter presents the hardware implementation results. The implementation results and design flow are described in Sec. 4.1; RTL synthesis for the ASIC uses Synopsys Design Compiler. FPGA verification is discussed in Sec. 4.2.

4.1 ASIC Implementation

Fig. 4.1 illustrates the entire ASIC design and testing flow with various CAD (Computer Aided Design) tools. The design has been verified by pre-layout gate-level simulation, but pre-layout simulation cannot estimate the circuit speed precisely. The results of post-layout gate-level simulation are expected to be worse than the pre-layout results reported here.

Tab. 4.1 compares our design with other published designs. [14] implements the SBox using a look-up table. [13] uses composite field arithmetic to implement the SBox. Our design is 2-stage pipelined. Its throughput for a 128-bit key is 1.82 Gbps.

Figure 4.1 ASIC design flow

Table 4.1 The AES Core Comparison

                    Kuo [14]   Lai [15]   Horng [13]   Ours
Technology (μm)     0.18       0.25       0.18         0.18
Clock rate (MHz)    154        125        125          150
Gate count          173K       80K        67.9K        47.5K
Throughput (Gbps)   1.6        1.454      1.6          1.82
Pipeline stages     1          6          1            2
Key size            All        128        All          All
Function            E          E/D        E/D          E
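The clock rate and throughput rows of Tab. 4.1 can be related by a simple formula. The Python sketch below is only a consistency check under an assumption that the table does not state: throughput = 128 bits × clock rate / cycles per block. The resulting value is an effective average, since pipelined designs overlap several blocks, and the function name implied_cycles_per_block is hypothetical.

```python
BLOCK_BITS = 128  # AES block size in bits

# (clock rate in MHz, throughput in Gbps), copied from Tab. 4.1
designs = {
    "Kuo [14]": (154, 1.6),
    "Lai [15]": (125, 1.454),
    "Horng [13]": (125, 1.6),
    "Ours": (150, 1.82),
}


def implied_cycles_per_block(clock_mhz, throughput_gbps):
    """Effective clock cycles per 128-bit block implied by the two figures."""
    return BLOCK_BITS * clock_mhz * 1e6 / (throughput_gbps * 1e9)


for name, (clock_mhz, throughput_gbps) in designs.items():
    cycles = implied_cycles_per_block(clock_mhz, throughput_gbps)
    print(f"{name}: {cycles:.2f} cycles per block")
```

Under this assumption the Horng [13] figures correspond to 10 cycles per block, which matches the 10 rounds of AES-128.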

Tab. 4.2 compares S+Core with and without the AES-128 encryption accelerator. It shows the number of cycles needed to encrypt the first data block. With the accelerator, the computation takes far fewer cycles than before.

Table 4.2 Comparison of S+Core with and without the accelerator

S+Core (without accelerator)   3329 cycles
S+Core (with accelerator)      206 cycles

4.2 FPGA Verification

Figure 4.2 illustrates the FPGA design and testing flow, in contrast to the ASIC design flow. Besides RTL simulation, we also verified our design on a Field Programmable Gate Array (FPGA). Our design is implemented in S+Core, and the operating clock rate is about 33 MHz. Tab. 4.3 shows the hardware utilization of our design.

Figure 4.2 FPGA design flow

Table 4.3 The hardware utilization on S+Core

Device                       S+Core
Number of Slice Flip-Flops   20801/93184 (22%)
Number of 4-input LUTs       62400/93184 (66%)
Clock rate                   33 MHz

Chapter 5 Conclusions

First, we have proposed an efficient AES design that supports 128-, 192-, and 256-bit key lengths. Because our KeyGenerator produces the round keys in real time, we do not need to store all round keys, and we need only about 10% of the storage area of other designs. Second, by implementing the multiplicative inverter in a composite field, the area cost is smaller than that of a look-up-table (LUT) implementation. The whole design area can be reduced further by sharing hardware between encryption and decryption. We also proposed an AES accelerator for 32-bit embedded processors. Third, we extended the instruction set of the processor. As a result, AES encryption takes fewer than 300 cycles, whereas the processor without the accelerator needs over 3000 cycles, a speedup of more than 10 times. In addition, our analysis of various instruction schedules shows that the 2-stage pipelined architecture is suitable and efficient for most schedules. The total gate count is about 47.5K gates, and the maximum throughput is about 1.82 Gbps in a UMC 0.18 μm process.

Bibliography

[1] W. Stallings, Cryptography and Network Security: Principles and Practice. Prentice Hall, 2002.

[2] Recommendation on Key Management, NIST Special Publications Std. 800-57, 2005.

[3] J. Daemen and V. Rijmen, AES Proposal: Rijndael, AES Algorithm Submission, September 3, 1999.

[4] X. Zhang and K. K. Parhi, “High-speed VLSI architectures for the AES algorithm,” IEEE Trans. on VLSI Systems, vol. 12, no. 9, pp. 957-967, 2004.

[5] C. Paar, “Efficient VLSI architecture for bit-parallel computations in Galois field,” Ph.D. dissertation, Institute for Experimental Mathematics, University of Essen, Essen, Germany, 1994.

[6] A. Satoh, S. Morioka, K. Takano, and S. Munetoh, “A compact Rijndael hardware architecture with S-Box optimization,” in Proc. ASIACRYPT 2001, Gold Coast, Australia, Dec. 2000, pp. 239-254.

[7] M. H. Jing, Y. H. Chen, Y. T. Chang, and C. H. Hsu, “The design of a fast inverse module in AES,” in Proc. Int. Conf. Info-Tech and Info-Net, vol. 3, Beijing, China, Nov. 2001, pp. 298–303.

[8] V. Fischer and M. Drutarovsky, “Two methods of Rijndael implementation in reconfigurable hardware,” in Proc. CHES 2001, Paris, France, May 2001, pp. 77–92.

[9] H. Kuo and I. Verbauwhede, “Architectural optimization for a 1.82 Gbits/sec VLSI implementation of the AES Rijndael algorithm,” in Proc. Cryptographic Hardware and Embedded Systems (CHES) 2001, Paris, France, May 2001, pp. 51–64.

[10] C. C. Lu and S. Y. Tseng, “Integrated design of AES (advanced encryption standard) encrypter and decrypter,” in Proc. IEEE Int. Conf. Application Specific Systems, Architectures Processors, 2002, pp. 277–285.

[11] X. Zhang and K. K. Parhi, “Implementation approaches for the advanced encryption standard algorithm,” IEEE Circuits Syst. Mag., vol. 2, no. 4, pp. 24–46, 2002.

[12] http://w3.sunplus.com/products/S%2Bcore.asp

[13] C. L. Horng, “An AES cipher chip design using on-the-fly key scheduler,” Master's thesis, Dept. Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan, June 2004.

[14] I. Verbauwhede, P. Schaumont, and H. Kuo, “Design and performance testing of a 2.29 Gb/s Rijndael processor,” IEEE Jour. of Solid-State Circuits, vol. 38, no. 3, pp. 569-572, March 2003.

[15] Y. K. Lai, L. C. Chang, L. F. Chen, C. C. Chou, and C. W. Chiu, “A novel memory less AES cipher architecture for networking applications,” in Proc. IEEE Circuit and Systems Symp, May 2004.

About the Author

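Name: Po-Yuan Yeh (葉博元)

Place of birth: Taipei City

Date of birth: 1982.11.09

Education:

1989.9 ~ 1995.6 Taipei Municipal Kang-Ning Elementary School

1995.9 ~ 1998.6 Taipei Municipal San-Min Junior High School

1998.9 ~ 2001.6 The Affiliated Senior High School of National Taiwan Normal University

2001.9 ~ 2005.6 B.S., Department of Electrical Engineering, National Chung Cheng University

2005.9 ~ 2007.8 Institute of Electronics (Systems Group), National Chiao Tung University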
