
Reconfigurable VLSI Architecture for FFT Processor

6 Conclusion

This paper presents low-power, high-speed FFT processors for OFDM systems based on CORDIC and split-radix techniques. The architectures are built around a reusable 128-point CORDIC-based split-radix FFT IP core. A pipelined CORDIC arithmetic unit computes the complex multiplications of the FFT, and the required twiddle factors are produced by the proposed ROM-free twiddle factor generator rather than being stored in a large ROM.
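As a rough illustration of the idea summarized above, the following Python sketch multiplies a complex sample by a twiddle factor using CORDIC micro-rotations instead of a stored sine/cosine table. It is a floating-point model under stated assumptions, not the paper's hardware design: the iteration count `ITERATIONS`, the function names, and the way the odd multiples n, 3n, 5n, 7n are derived by shifts and adds are all hypothetical choices made for this example.

```python
import cmath
import math

ITERATIONS = 16  # assumed number of micro-rotations (not taken from the paper)

# Elementary angles atan(2^-i) and the accumulated CORDIC gain.
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_rotate(x, y, theta):
    """Rotate (x, y) by theta radians using shift-and-add style micro-rotations."""
    # Reduce theta to [-pi, pi), then fold into the CORDIC convergence range.
    theta = (theta + math.pi) % (2.0 * math.pi) - math.pi
    if theta > math.pi / 2:
        x, y, theta = -x, -y, theta - math.pi
    elif theta < -math.pi / 2:
        x, y, theta = -x, -y, theta + math.pi
    z = theta
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0 else -1.0
        # In hardware the factor 2^-i would be a right shift by i bits.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return x / GAIN, y / GAIN


def twiddle_multiply(value, n_points, k):
    """Approximate value * W_N^k, where W_N^k = exp(-2j*pi*k/N)."""
    re, im = cordic_rotate(value.real, value.imag, -2.0 * math.pi * k / n_points)
    return complex(re, im)


def odd_multiple_phases(n, n_points):
    """Phase indices (n, 3n, 5n, 7n) mod N built from shifts and adds only.

    This mirrors the kind of arithmetic a table-free generator could use;
    it is an assumption, not the generator described in the paper.
    """
    return (
        n % n_points,
        ((n << 1) + n) % n_points,
        ((n << 2) + n) % n_points,
        ((n << 3) - n) % n_points,
    )


if __name__ == "__main__":
    sample = 0.5 + 0.25j
    for k in odd_multiple_phases(3, 128):
        approx = twiddle_multiply(sample, 128, k)
        exact = sample * cmath.exp(-2j * math.pi * k / 128)
        print(k, abs(approx - exact))
```

With 16 micro-rotations the printed errors are small compared with the sample magnitude; a fixed-point implementation would add quantization error on top of this.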

The CORDIC-based 128/256/512/1024/2048/4096/8192-point FFT processors have been implemented in 0.18 μm CMOS and take 395 μs, 176.8 μs, 77.9 μs, 33.6 μs, 14 μs, 5.5 μs, and 1.88 μs to compute the 8192-point, 4096-point, 2048-point, 1024-point, 512-point, 256-point, and 128-point FFT, respectively.
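The reported times can be turned into approximate cycle counts and transform rates with a few lines of Python. This sketch assumes the times were measured at the 200 MHz clock rate listed in Table 2, which this section does not state explicitly; the names `EXEC_TIME_US`, `cycles`, and `transforms_per_second` are introduced only for this example.

```python
# Execution times reported in the conclusion, in microseconds, keyed by FFT size.
EXEC_TIME_US = {
    128: 1.88,
    256: 5.5,
    512: 14.0,
    1024: 33.6,
    2048: 77.9,
    4096: 176.8,
    8192: 395.0,
}

CLOCK_MHZ = 200  # clock rate from Table 2; assumed to apply to the timings


def cycles(n_points):
    """Clock cycles implied by the reported time at the assumed clock rate."""
    return round(EXEC_TIME_US[n_points] * CLOCK_MHZ)


def transforms_per_second(n_points):
    """Back-to-back transforms per second implied by the reported time."""
    return 1e6 / EXEC_TIME_US[n_points]


if __name__ == "__main__":
    for n_points in sorted(EXEC_TIME_US):
        print(n_points, cycles(n_points), round(transforms_per_second(n_points)))
```

Under that assumption, the 128-point transform corresponds to about 376 cycles and the 8192-point transform to about 79,000 cycles.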

The CORDIC-based FFT processors are designed in portable, reusable Verilog®. The 128-point FFT core is a reusable intellectual property (IP) block that can be implemented in various processes and that makes efficient use of hardware resources when trading off performance, area, and power consumption.

References:

Proceedings of the 9th WSEAS International Conference on Multimedia Systems & Signal Processing

[Fig. 1 block labels: Reg. Memory (128×32 Reg.); Modified Split-Radix 2/8 FFT Architecture; Controller; width annotations 8×32, 32, 16]

[1] T. Y. Sung, “… pipelined CORDIC rotations,” IEE Proc.-Vis. Image Signal Process., Vol. 153, No. 4, Aug. 2006, pp.405-410.

[2] J. C. Kuo, C. H. Wen, A. Y. Wu, “Implementation of a programmable 64~2048-point FFT/IFFT processor for OFDM-based communication systems,” Proceedings of the 2003 International Symposium on Circuits and Systems, Volume 2, 25-28 May 2003, pp.II-121 - II-124.

[3] L. Xiaojin, Z. Lai, C. J. Cui, “A low power and small area FFT processor for OFDM demodulator,” IEEE Transactions on Consumer Electronics, Volume 53, Issue 2, May 2007, pp.274 – 277.

[4] J. Lee, H. Lee, S. I. Cho, S. S. Choi, “A high-speed, low-complexity radix-216 FFT processor for MB-OFDM UWB systems,” Proceedings of the 2006 IEEE International Symposium on Circuits and Systems, May 2006, pp.

[5] A. Cortes, I. Velez, J. F. Sevillano, A. Irizar, “An approach to simplify the design of IFFT/FFT cores for OFDM systems,” IEEE Transactions on Consumer Electronics, Volume 52, Issue 1, Feb. 2006, pp.26 – 32.

[6] Y. H. Lee, T. H. Yu, K. K. Huang, A. Y. Wu, “Rapid IP design of variable-length cached-FFT processor for OFDM-based communication systems,” IEEE Workshop on Signal Processing Systems Design and Implementation, Oct. 2006, pp.62-65.

[7] C. L. Wey, W. C. Tang, S. Y. Lin, “Efficient memory-based FFT architectures for digital video broadcasting (DVB-T/H),” 2007 International Symposium on VLSI Design, Automation and Test, 25-27 April 2007, pp.1-4.

[8] Y. W. Lin, H. Y. Liu, C. Y. Lee, “A 1-GS/s FFT/IFFT processor for UWB applications,” IEEE Journal of Solid-State Circuits, Volume 40, Issue 8, Aug. 2005, pp.1726-1735.

[9] C. D. Thompson, “Fourier transform in VLSI,” IEEE Transactions on Computers, Vol.32, No. 11, 1983, pp.1047-1057.

[10] E. H. Wold, A. M. Despain, “Pipelined and parallel-pipelined FFT processor for VLSI implementation,” IEEE Transactions on Computers, Vol.33, No. 5, 1984, pp.414-426.

[11] T. Widhe, “Efficient implementation of FFT processing elements,” Linkoping Studies in Science and Technology, Thesis No. 619, Linkoping University, Sweden, 1997.

[12] P. Duhamel, H. Hollmann, “Implementation of "split-radix" FFT algorithms for complex, real, and real symmetric data,” IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 10, April 1985, pp.784 – 787.

[13] A. A. Petrovsky, S. L. Shkredov, “Automatic generation of split-radix 2-4 parallel-pipeline FFT processors: hardware reconfiguration and core optimizations,” 2006 International Symposium on Parallel Computing in Electrical Engineering, pp.181-186.

[14] S. Bouguezel, M. O. Ahmad, M. N. S. Swamy, “A new radix-2/8 FFT algorithm for length-q×2^m DFTs,” IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, Volume 51, Issue 9, 2004, pp.1723-1732.

[15] W. C. Yeh, C. W. Jen, “High-speed and low-power split-radix FFT,” IEEE Transactions on Signal Processing, Volume 51, Issue 3, March 2003, pp.864 – 874.

[16] M. D. Ercegovac, T. Lang, “CORDIC algorithm and implementations.” Digital Arithmetic, Morgan Kaufmann Publishers, 2004, Chapter 11.

[17] T. Y. Sung, H. C. Hsin, “Fixed-point error analysis of CORDIC arithmetic for special-purpose signal processors,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol.E90-A, No.9, Sep. 2007, pp.2006-2013.

[18] “TSMC 0.18 CMOS Design Libraries and Technical Data, v.3.2,” Taiwan Semiconductor Manufacturing Company, Hsinchu, Taiwan, and National Chip Implementation Center (CIC), National Science Council, Hsinchu, Taiwan, R.O.C., 2006.

[19] Cadence design systems: http://www.cadence.com/products/pages/default.aspx.

Fig. 1 The proposed 128-point CORDIC-based split-radix FFT processor


Fig. 2 Data flow of the butterfly computation of the modified split-radix 2/8 FFT

[Fig. 2 labels: inputs x(n), x(n+N/8), x(n+N/4), x(n+3N/8), x(n+N/2), x(n+5N/8), x(n+3N/4), x(n+7N/8); multiplications by -1 and j; constant multipliers CM 8(1) and CM 8(3); twiddle factors W_N^n, W_N^3n, W_N^5n, W_N^7n; four CORDIC blocks; outputs a(8k), a(8k+2), a(8k+4), a(8k+6) and X(8k+1), X(8k+3), X(8k+5), X(8k+7)]

Fig. 3 Constant multiplier (CM) architecture for the modified split-radix 2/8 FFT

[Fig. 3 labels: Re[X], Im[X]; Add; Sub; Shifter 2/Sub; Shifter 4/Sub; Latch; Mux; outputs (√2/2)Re[X'], (√2/2)Im[X']]

Fig. 4 Proposed ROM-free twiddle factor generator for 128-point FFT

[Fig. 4 labels: 16-bit Accumulator; 16-bit Reg.; 16-bit Shifter; 16-bit Shifter/Adder; Control; angles θ_N^n, θ_N^3n, θ_N^5n, θ_N^7n]

Fig. 5 Hardware architecture of the 128/256/512/1024/2048/4096/8192-point FFT processor

[Fig. 5 labels: 8192-, 4096-, 2048-, 1024-, 512-, 256- and 128-point FFT processors; IP; Radix 2; Split 2/4; Split 2/8 (×4); P/S; S/P; Internal Memory 4096/2048/1024/512/256/0 × 32; External Memory 8192/4096/2048/1024/512/256/128 × 32]

Fig. 6 Layout view of the 8192-point FFT processor

Fig. 7 Log-log plot of the CORDIC computations versus the number of FFT points


Table 1 Hardware requirements of the full-ROM, the CORDIC twiddle factor generator [1], and the ROM-free twiddle factor generator (gate estimates use 1 bit ~ 1 gate)

Full-Twiddle Factor ROM: 8192-point ROM, 4K × 16 bit

CORDIC Twiddle Factor Generator (T. Y. Sung, 2006) [1]: 11-bit Adder, 11-bit Shifter, 16-bit CORDIC, 16-bit Shifter, 16-bit Adder; sizes listed: ~18K bit, ~150 gates, ~50 gates, ~90 gates, ~200 gates

ROM-free Twiddle Factor Generator (Sung, Hsin and Cheng, 2008): 16-bit Accumulator, 16-bit Shifter, 16-bit Shifter/Adder, 16-bit Register; sizes listed: ~(200 × 2 + 90 × 2) gates, ~90 gates, ~200 gates, ~32 gates

Table 2 Core areas, power consumptions, clock rates of 128-, 256-, 512-, 1024-, 2048-, 4096- and 8192-point FFT

FFT Size      Core Area    Power Consumption    Clock Rate
128-point     2.28 mm²     80 mW                200 MHz
256-point     2.37 mm²     84 mW                200 MHz
512-point     2.49 mm²     88 mW                200 MHz
1024-point    2.62 mm²     94 mW                200 MHz
2048-point    2.81 mm²     99 mW                200 MHz
4096-point    3.10 mm²     106 mW               200 MHz
8192-point    3.62 mm²     117 mW               200 MHz

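Attachment 2: Report on attending an international academic conference, with the published paper

Report on attendance at an international academic conference by domestic experts and scholars, subsidized by the National Science Council, Executive Yuan

May 26, 2009 (ROC year 98)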
