Reconfigurable VLSI Architecture for FFT Processor
6 Conclusion
This paper presents low-power, high-speed FFT processors based on CORDIC and split-radix techniques for OFDM systems. The architectures are built around a reusable 128-point CORDIC-based split-radix FFT IP core. A pipelined CORDIC arithmetic unit computes the complex multiplications involved in the FFT, and the required twiddle factors are produced by the proposed ROM-free twiddle factor generator rather than stored in a large ROM.
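As an illustration of how a CORDIC unit can stand in for a complex multiplier, the following is a minimal floating-point sketch of a rotation-mode CORDIC applied to a twiddle factor W_N^n = exp(-j2πn/N). It is not the paper's fixed-point pipelined design: the function names `cordic_rotate` and `twiddle_multiply`, the 16-iteration default, and the quadrant pre-rotation step are assumptions made for this example, and the angle is computed directly rather than by the ROM-free generator.

```python
import cmath
import math


def cordic_rotate(x, y, angle, iterations=16):
    """Rotate the vector (x, y) by `angle` radians using shift-add style steps.

    Assumes |angle| <= pi/2, which lies inside the CORDIC convergence range.
    """
    gain = 1.0
    z = angle
    for i in range(iterations):
        step = 2.0 ** -i                  # a right shift by i bits in hardware
        d = 1.0 if z >= 0.0 else -1.0     # rotation direction for this stage
        x, y = x - d * y * step, y + d * x * step
        z -= d * math.atan(step)          # residual angle still to rotate
        gain *= math.sqrt(1.0 + step * step)
    # Each micro-rotation stretches the vector; undo the accumulated gain.
    return x / gain, y / gain


def twiddle_multiply(re, im, n, N, iterations=16):
    """Return (re + j*im) * exp(-j*2*pi*n/N) as a (real, imag) pair."""
    angle = -2.0 * math.pi * n / N
    # Fold the angle into [-pi, pi).
    angle = (angle + math.pi) % (2.0 * math.pi) - math.pi
    # Pre-rotate by +/- 90 degrees (a swap and a sign change) if needed.
    if angle > math.pi / 2:
        re, im = -im, re
        angle -= math.pi / 2
    elif angle < -math.pi / 2:
        re, im = im, -re
        angle += math.pi / 2
    return cordic_rotate(re, im, angle, iterations)


if __name__ == "__main__":
    got = complex(*twiddle_multiply(0.5, -0.25, 17, 128))
    want = complex(0.5, -0.25) * cmath.exp(-2j * cmath.pi * 17 / 128)
    print(abs(got - want))  # small residual error from the finite iteration count
```

In hardware the multiplications by `step` become shifts and the gain correction is typically a fixed constant, which is why a CORDIC stage needs only adders and shifters.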
The CORDIC-based 128/256/512/1024/2048/4096/8192-point FFT processors have been implemented in 0.18 μm CMOS and take 395 μs, 176.8 μs, 77.9 μs, 33.6 μs, 14 μs, 5.5 μs, and 1.88 μs to compute the 8192-point, 4096-point, 2048-point, 1024-point, 512-point, 256-point, and 128-point FFT, respectively.
The CORDIC-based FFT processors are designed in portable, reusable Verilog®. The 128-point FFT core is a reusable intellectual property (IP) block that can be implemented in various processes and makes efficient use of hardware resources, allowing trade-offs among performance, area, and power consumption.
References:
Proceedings of the 9th WSEAS International Conference on Multimedia Systems & Signal Processing
[1] T. Y. Sung, “… pipelined CORDIC rotations,” IEE Proc.-Vis. Image Signal Process., Vol. 153, No. 4, Aug. 2006, pp.405-410.
[2] J. C. Kuo, C. H. Wen, A. Y. Wu, “Implementation of a programmable 64~2048-point FFT/IFFT processor for OFDM-based communication systems,” Proceedings of the 2003 International Symposium on Circuits and Systems, Volume 2, 25-28 May 2003, pp.II-121 - II-124.
[3] L. Xiaojin, Z. Lai, C. J. Cui, “A low power and small area FFT processor for OFDM demodulator,” IEEE Transactions on Consumer Electronics, Volume 53, Issue 2, May 2007, pp.274-277.
[4] J. Lee, H. Lee, S. I. Cho, S. S. Choi, “A high-speed, low-complexity radix-216 FFT processor for MB-OFDM UWB systems,” Proceedings of the 2006 IEEE International Symposium on Circuits and Systems, May 2006, pp.
[5] A. Cortes, I. Velez, J. F. Sevillano, A. Irizar, “An approach to simplify the design of IFFT/FFT cores for OFDM systems,” IEEE Transactions on Consumer Electronics, Volume 52, Issue 1, Feb. 2006, pp.26-32.
[6] Y. H. Lee, T. H. Yu, K. K. Huang, A. Y. Wu, “Rapid IP design of variable-length cached-FFT processor for OFDM-based communication systems,” IEEE Workshop on Signal Processing Systems Design and Implementation, Oct. 2006, pp.62-65.
[7] C. L. Wey, W. C. Tang, S. Y. Lin, “Efficient memory-based FFT architectures for digital video broadcasting (DVB-T/H),” 2007 International Symposium on VLSI Design, Automation and Test, 25-27 April 2007, pp.1-4.
[8] Y. W. Lin, H. Y. Liu, C. Y. Lee, “A 1-GS/s FFT/IFFT processor for UWB applications,” IEEE Journal of Solid-State Circuits, Volume 40, Issue 8, Aug. 2005, pp.1726-1735.
[9] C. D. Thompson, “Fourier transform in VLSI,” IEEE Transactions on Computers, Vol.32, No. 11, 1983, pp.1047-1057.
[10] E. H. Wold, A. M. Despain, “Pipelined and parallel-pipelined FFT processor for VLSI implementation,” IEEE Transactions on Computers, Vol.33, No. 5, 1984, pp.414-426.
[11] T. Widhe, “Efficient implementation of FFT processing elements,” Linkoping Studies in Science and Technology, Thesis No. 619, Linkoping University, Sweden, 1997.
[12] P. Duhamel, H. Hollmann, “Implementation of "split-radix" FFT algorithms for complex, real, and real symmetric data,” IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 10, April 1985, pp.784-787.
[13] A. A. Petrovsky, S. L. Shkredov, “Automatic generation of split-radix 2-4 parallel-pipeline FFT processors: hardware reconfiguration and core optimizations,” 2006 International Symposium on Parallel Computing in Electrical Engineering, pp.181-186.
[14] S. Bouguezel, M. O. Ahmad, M. N. S. Swamy, “A new radix-2/8 FFT algorithm for length-q×2^m DFTs,” IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, Volume 51, Issue 9, 2004, pp.1723-1732.
[15] W. C. Yeh, C. W. Jen, “High-speed and low-power split-radix FFT,” IEEE Transactions on Signal Processing, Volume 51, Issue 3, March 2003, pp.864-874.
[16] M. D. Ercegovac, T. Lang, “CORDIC algorithm and implementations,” Digital Arithmetic, Morgan Kaufmann Publishers, 2004, Chapter 11.
[17] T. Y. Sung, H. C. Hsin, “Fixed-point error analysis of CORDIC arithmetic for special-purpose signal processors,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol.E90-A, No.9, Sep. 2007, pp.2006-2013.
[18] “TSMC 0.18 CMOS Design Libraries and Technical Data, v.3.2,” Taiwan Semiconductor Manufacturing Company, Hsinchu, Taiwan, and National Chip Implementation Center (CIC), National Science Council, Hsinchu, Taiwan, R.O.C., 2006.
[19] Cadence design systems: http://www.cadence.com/products/pages/default.aspx.
Fig. 1 The proposed 128-point CORDIC-based split-radix FFT processor
Fig. 2 Data flow of the butterfly computation of the modified split-radix 2/8 FFT (diagram labels: x(n), x(n+N/8), x(n+N/4), x(n+3N/8), x(n+N/2), x(n+5N/8), x(n+3N/4), x(n+7N/8); constant multipliers CM; CORDIC rotations by W_N^n, W_N^(3n), W_N^(5n), W_N^(7n); a(8k), a(8k+2), a(8k+4), a(8k+6); X(8k+1), X(8k+3), X(8k+5), X(8k+7))
Fig. 3 Constant multiplier (CM) architecture for the modified split-radix 2/8 FFT (diagram labels: Re[X], Im[X], Add/Sub, Shifter 2/Sub, Shifter 4/Sub, Latch, Mux)
Fig. 4 Proposed ROM-free twiddle factor generator for 128-point FFT (diagram labels: 16-bit Accumulator, 16-bit Reg., 16-bit Shifter, 16-bit Shifter/Adder, Control)
Fig. 5 Hardware architecture of the 128/256/512/1024/2048/4096/8192-point FFT processor (diagram labels: IP, Radix 2, Split 2/4, Split 2/8, S/P, P/S, 4096/2048/1024/512/256/0*32 Internal Memory, 8192/4096/2048/1024/512/256/128*32 External Memory)
Fig. 6 Layout view of the 8192-point FFT processor
Fig. 7 Log-log plot of the CORDIC computations versus the number of FFT points
Full-twiddle factor ROM (8192-point): ROM, 4K × 16 bit
CORDIC twiddle factor generator (T. Y. Sung, 2006) [1]: 11-bit adder, 11-bit shifter, 16-bit CORDIC, 16-bit shifter, 16-bit adder
ROM-free twiddle factor generator (Sung, Hsin and Cheng, 2008): 16-bit accumulator, 16-bit shifter, 16-bit shifter/adder, 16-bit register
Hardware cost entries appearing in the table (cell positions not recoverable): ~18K bit, ~150 gates, ~50 gates, ~90 gates, ~200 gates, ~200 × 2 + 90 × 2 gates, ~32 gates (1 bit ~ 1 gate)
Table 1 Hardware requirements of the full-ROM, the CORDIC twiddle factor generator [1], and the ROM-free twiddle factor generator

FFT Size      Core Area    Power Consumption    Clock Rate
128-point     2.28 mm²     80 mW                200 MHz
256-point     2.37 mm²     84 mW                200 MHz
512-point     2.49 mm²     88 mW                200 MHz
1024-point    2.62 mm²     94 mW                200 MHz
2048-point    2.81 mm²     99 mW                200 MHz
4096-point    3.10 mm²     106 mW               200 MHz
8192-point    3.62 mm²     117 mW               200 MHz

Table 2 Core areas, power consumptions, and clock rates of the 128-, 256-, 512-, 1024-, 2048-, 4096- and 8192-point FFT processors
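
The execution times reported in the conclusion can be combined with the power and clock figures of Table 2 to estimate clock cycles and energy per transform. The short sketch below does that arithmetic. It assumes, which the paper does not state explicitly, that the reported times were measured at the 200 MHz clock rate and that the listed power is drawn for the whole duration of a transform; the names `RESULTS`, `clock_cycles`, and `energy_microjoules` are introduced here only for illustration.

```python
# FFT size -> (execution time in microseconds, power in milliwatts),
# taken from the conclusion and from Table 2.
RESULTS = {
    128: (1.88, 80),
    256: (5.5, 84),
    512: (14.0, 88),
    1024: (33.6, 94),
    2048: (77.9, 99),
    4096: (176.8, 106),
    8192: (395.0, 117),
}

CLOCK_MHZ = 200  # clock rate listed for every size in Table 2


def clock_cycles(size):
    """Estimated cycles per transform: microseconds * MHz (assumes a 200 MHz clock)."""
    time_us, _ = RESULTS[size]
    return time_us * CLOCK_MHZ


def energy_microjoules(size):
    """Estimated energy per transform: mW * us gives nJ, divided by 1000 for uJ.

    Assumes the Table 2 power is constant over the whole transform.
    """
    time_us, power_mw = RESULTS[size]
    return time_us * power_mw / 1000.0


if __name__ == "__main__":
    for size in sorted(RESULTS):
        print(size, round(clock_cycles(size)), round(energy_microjoules(size), 3))
```

Under these assumptions the 128-point transform takes about 376 cycles and the 8192-point transform about 79,000 cycles, that is, roughly 46 μJ per 8192-point FFT.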