The wavelet transform has been adopted by JPEG2000 as the underlying method to decompose an image into subbands with orientation selectivity. It provides many desirable properties, e.g. multiresolution analysis, high correlation across wavelet subbands of the same orientation, and energy clustering within each subband, all of which suit image compression applications. In EBCOT, all the code blocks of an image are coded to generate a set of code streams together with their respective contributions to the decoded image; based on these, the optimal code stream is obtained by concatenating the suitably truncated code streams through the PCRD algorithm.
Because some less important code blocks are not needed for the optimal decoded image at a given bit rate, computational power and memory space may be wasted. Furthermore, implementation of the PCRD algorithm is one of the crucial issues. To overcome these problems, a simple context-based rate distortion estimation (CBRDE) is proposed to arrange the scanning order of code blocks in an adaptive manner. To avoid transmitting side information about the scanning order of code blocks from encoder to decoder, the proposed CBRDE is based on the MQ table, which is available at both encoder and decoder. The second tier of EBCOT, i.e. PCRD, can therefore be replaced by the CBRDE-based adaptive block ordering (ABO).
Experimental results show that the rate distortion curves are almost convex.
Recall that for each code block, the coding procedure proceeds in distinct passes. Thus, the proposed image coding with adaptive block ordering
ordering, however, at the cost of increasing complexity from the implementation point of view.
References:
[1] J. M. Shapiro, “Embedded Image Coding Using Zero-Trees of Wavelet Coefficients,” IEEE Trans. on Signal Processing, vol. 40, pp. 3445-3462, 1993.
[2] A. Said and W. A. Pearlman, “A New, Fast, and Efficient Image Codec Based on Set Partitioning in Hierarchical Trees,” IEEE Trans. on Circuits Syst. Video Tech., vol. 6, pp. 243-250, 1996.
[3] W. A. Pearlman, A. Islam, N. Nagaraj, and A. Said, “Efficient, Low Complexity Image Coding With a Set-Partitioning Embedded Block Coder,” IEEE Trans. on Circuits Syst. Video Tech., vol. 14, pp. 1219-1235, Nov. 2004.
[4] D. Taubman, “High Performance Scalable Image Compression with EBCOT,” IEEE Trans. on Image Processing, vol. 9, pp. 1158-1170, July 2000.
[5] A. Skodras, C. Christopoulos, and T. Ebrahimi, “The JPEG 2000 still image compression standard,” IEEE Signal Process. Mag., vol. 18, pp. 36-58, September 2001.
[6] H.-C. Fang, Y.-W. Chang, T.-C. Wang, C.-T. Huang, and L.-G. Chen, “High-Performance JPEG 2000 Encoder with Rate-Distortion Optimization,” IEEE Trans. on Multimedia, vol. 8, no. 4, pp. 645-653, August 2006.
Fig. 1 Block diagram of JPEG2000 Encoder (blocks: Image, WT, Q, EBCOT algorithm with BPC, MQ encoder and rate control, Code stream)
Proceedings of the 9th WSEAS International Conference on Multimedia Systems & Signal Processing
Fig. 2 Performance of CBRDE applied to Lena image (panel title: true bit plane 7); (a) horizontal axis: true D, vertical axis: estimated D; (b) horizontal axis: true R, vertical axis: estimated R.
Fig. 3 Proposed image encoder using EBC with ABO (blocks: Image, DWT, Q, BPC, MQ, CBRDE, ABO, EBC, Bit-stream)
Fig. 4 Proposed image decoder using EBC with ABO (blocks: Bit-stream, MQ, BPC, DeQ, CBRDE, ABO, EBC, IDWT, Decoded image)
Fig. 5 Test images (a) Barbara (b) Fingerprint
Fig. 6 Rate distortion curves of Barbara image by EBC with the CBRDE-based ABO (dashed line) and EBC with a fixed scan order (solid line); vertical axis: mean square error (MSE); horizontal axis: bit rate (bpp).
Fig. 7 Rate distortion curves of Fingerprint image by EBC with the CBRDE-based ABO (dashed line) and EBC with a fixed scan order (solid line); vertical axis: mean square error (MSE); horizontal axis: bit rate (bpp).
Reconfigurable VLSI Architecture for FFT Processor
Tze-Yun Sung
Department of Microelectronics Engineering
Chung Hua University, 707, Sec. 2, Wufu Road, Hsinchu City 300-12, Taiwan
bobsung@chu.edu.tw
Hsi-Chin Hsin
Department of Computer Science and Information Engineering
National United University Miaoli 36003, Taiwan
hsin@nuu.edu.tw
Abstract: - This paper presents a CORDIC (Coordinate Rotation Digital Computer)-based split-radix fast Fourier transform (FFT) core for OFDM systems, for example, Ultra Wide Band (UWB), Asymmetric Digital Subscriber Line (ADSL), Digital Audio Broadcasting (DAB), Digital Video Broadcasting – Terrestrial (DVB-T), Very High Bitrate DSL (VHDSL), and Worldwide Interoperability for Microwave Access (WiMAX).
High-speed 128/256/512/1024/2048/4096/8192-point FFT processors have been implemented in a 0.18 μm (1p6m) CMOS process at 1.8 V, in which all the control signals are generated internally. These FFT processors outperform the conventional ones in terms of both power consumption and core area.
Key-Words: - IP, FFT, split-radix, CORDIC, OFDM.
1 Introduction
A high-performance fast Fourier transform (FFT) processor is needed especially for real-time digital signal processing applications. Specifically, the computation of the discrete Fourier transform (DFT) ranging from 128 to 8192 points is required for the orthogonal frequency division multiplexing (OFDM) of the following standards: Ultra Wide Band (UWB), Asymmetric Digital Subscriber Line (ADSL), Digital Audio Broadcasting (DAB), Digital Video Broadcasting – Terrestrial (DVB-T), Very High Bitrate DSL (VHDSL) and Worldwide Interoperability for Microwave Access (WiMAX) [1]-[8]. Thompson [9] proposed an efficient VLSI architecture for FFT in 1983. Wold and Despain [10] proposed pipelined and parallel-pipelined FFT for VLSI implementations in 1984. Widhe [11] developed efficient processing elements of FFT in 1997. To reduce the computation complexity, the split-radix 2/4, 2/8, and 2/16 FFT algorithms were proposed in [12]-[15].
As the Booth multiplier is not suitable for hardware implementations of large FFT, we propose the CORDIC-based multiplier. Moreover, we develop a ROM-free twiddle factor generator using simple shifters and adders only [1], which obviates the need to store all the twiddle factors in a large ROM space.
As a result, the proposed CORDIC-based split-radix FFT core with the ROM-free twiddle factor generator is suitable for wireless local area network (WLAN) applications. In this paper, a high-performance 128/256/512/1024/2048/4096/8192-point FFT processor is presented for the European and Japanese standards. The remainder of this paper proceeds as follows. In Section 2, the split-radix 2/8 FFT algorithm and the CORDIC algorithm are reviewed briefly. In Section 3, the reusable IP 128-point CORDIC-based split-radix FFT core is proposed. In Section 4, the hardware implementations of FFT processors are described. The performance analysis is presented in Section 5. Finally, the conclusion is given in Section 6.
2 Review of Split-Radix FFT and CORDIC Algorithm
2.1 Split-Radix FFT
The idea behind the split-radix FFT algorithm is to compute the even and odd terms of FFT separately.
The even term of the split-radix 2/8 FFT algorithm is given by
$$X(2k) = \sum_{n=0}^{N/2-1} \left( x(n) + x\!\left(n + \tfrac{N}{2}\right) \right) W_{N/2}^{nk} \tag{1}$$
where $W_{N/2} = e^{-j2\pi/(N/2)}$ and $k = 0, 1, 2, \ldots, (N/2)-1$. The odd term is as follows:
The National Science Council of Taiwan, under Grant NSC97-2221-E-216-044, and the Chung Hua University, Hsinchu, Taiwan, under Contract CHU-NSC97-2221-E-216-044 supported this work.
$$X(8k+l) = \sum_{n=0}^{N/8-1} \Big[ \Big( x(n) + x\!\left(n+\tfrac{2N}{8}\right) W_4^{l} + x\!\left(n+\tfrac{4N}{8}\right) W_4^{2l} + x\!\left(n+\tfrac{6N}{8}\right) W_4^{3l} \Big) + \Big( x\!\left(n+\tfrac{N}{8}\right) + x\!\left(n+\tfrac{3N}{8}\right) W_4^{l} + x\!\left(n+\tfrac{5N}{8}\right) W_4^{2l} + x\!\left(n+\tfrac{7N}{8}\right) W_4^{3l} \Big) W_8^{l} \Big] W_N^{nl}\, W_{N/8}^{nk} \tag{2}$$
where $k = 0, 1, 2, \ldots, (N/8)-1$ and $l = 1, 3, 5, 7$. The split-radix 2/8 FFT algorithm, combined with radix-2 and radix-4 stages, proves effective for developing a reusable IP 128-point FFT core.
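The decomposition in equations (1) and (2) can be checked numerically. The following is a minimal sketch, not the processor's data path: it evaluates one decomposition stage in floating point and compares it with a direct DFT. The function names `dft`, `w` and `split_radix_2_8_stage` are hypothetical, and the grouping of the eight input samples assumes the factorization $W_8^{ml}$, $m = 0, \ldots, 7$, of the odd term.

```python
import cmath


def w(m, e):
    """Twiddle factor W_m^e = exp(-j*2*pi*e/m)."""
    return cmath.exp(-2j * cmath.pi * e / m)


def dft(x):
    """Direct O(N^2) DFT, used only as a reference."""
    N = len(x)
    return [sum(x[n] * w(N, n * k) for n in range(N)) for k in range(N)]


def split_radix_2_8_stage(x):
    """One split-radix 2/8 stage: even outputs by eq. (1), odd outputs by eq. (2)."""
    N = len(x)
    assert N % 8 == 0, "N must be a multiple of 8"
    X = [0j] * N
    # Even outputs: a length-N/2 DFT of x(n) + x(n + N/2).
    for k in range(N // 2):
        X[2 * k] = sum((x[n] + x[n + N // 2]) * w(N // 2, n * k)
                       for n in range(N // 2))
    # Odd outputs: one length-N/8 DFT for each l in {1, 3, 5, 7}.
    for l in (1, 3, 5, 7):
        for k in range(N // 8):
            acc = 0j
            for n in range(N // 8):
                even = sum(x[n + 2 * m * N // 8] * w(4, m * l) for m in range(4))
                odd = sum(x[n + (2 * m + 1) * N // 8] * w(4, m * l) for m in range(4))
                acc += (even + odd * w(8, l)) * w(N, n * l) * w(N // 8, n * k)
            X[8 * k + l] = acc
    return X
```

For any input whose length is a multiple of 8, the output of `split_radix_2_8_stage` should agree with `dft` up to floating-point rounding.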
2.2 CORDIC Algorithm
The CORDIC algorithm in the circular coordinate system is as follows [16].
$$x(i+1) = x(i) - \sigma_i 2^{-i} y(i) \tag{3}$$
$$y(i+1) = y(i) + \sigma_i 2^{-i} x(i) \tag{4}$$
$$z(i+1) = z(i) - \sigma_i \alpha_i \tag{5}$$
$$\alpha_i = \tan^{-1} 2^{-i} \tag{6}$$
where $\sigma_i = \mathrm{sign}(z(i))$ with $z(i) \to 0$ in the rotation mode, and $\sigma_i = -\mathrm{sign}(x(i)) \cdot \mathrm{sign}(y(i))$ with $y(i) \to 0$ in the vectoring mode. The scale factor $k(i)$ is equal to $\sqrt{1 + \sigma_i^2 2^{-2i}}$. After $n$ micro-rotations, the product of the scale factors is given by
$$K_c = \prod_{i=0}^{n-1} k(i) = \prod_{i=0}^{n-1} \sqrt{1 + 2^{-2i}} \tag{7}$$
Notice that CORDIC in the circular coordinate system with rotation mode can be written as
$$\begin{bmatrix} x_n \\ y_n \end{bmatrix} = K_c \begin{bmatrix} \cos z_0 & -\sin z_0 \\ \sin z_0 & \cos z_0 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \tag{8}$$
where $[x_0 \; y_0]^T$ and $[x_n \; y_n]^T$ are the input vector and the output vector, respectively, $z_0$ is the rotation angle, and $K_c$ is the scale factor. In [1], the circular rotation computation of CORDIC was used for complex multiplication with $e^{-j\theta}$, which is given by
$$\begin{bmatrix} \mathrm{Re}[X'] \\ \mathrm{Im}[X'] \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} \mathrm{Re}[X] \\ \mathrm{Im}[X] \end{bmatrix} \tag{9}$$
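As an illustration of equations (3) to (9), the sketch below runs the rotation-mode recurrence in floating point and uses it to multiply a complex number by $e^{-j\theta}$. It is a behavioural model under stated assumptions, not the fixed-point pipeline of the paper: the function names are hypothetical, the shifts are modelled as multiplications by $2^{-i}$, and the gain $K_c$ is removed by a division at the end rather than by a pre-scaling stage. The recurrence only converges for rotation angles of magnitude up to about 1.74 rad.

```python
import math


def cordic_rotate(x0, y0, z0, n=12):
    """Rotation-mode CORDIC, eqs. (3)-(6); returns the unscaled (x_n, y_n, z_n)."""
    x, y, z = x0, y0, z0
    for i in range(n):
        sigma = 1.0 if z >= 0 else -1.0       # sigma_i = sign(z(i))
        # Both updates use the values from the previous micro-rotation.
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * math.atan(2.0 ** -i)     # alpha_i = atan(2^-i)
    return x, y, z


def scale_factor(n=12):
    """Product of the per-stage gains, eq. (7)."""
    return math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(n))


def multiply_by_exp_minus_j_theta(re, im, theta, n=12):
    """Eq. (9): rotate (re, im) by -theta, then divide out the gain K_c."""
    x, y, _ = cordic_rotate(re, im, -theta, n)
    k = scale_factor(n)
    return x / k, y / k
```

With 12 micro-rotations the residual angle is bounded by $\tan^{-1} 2^{-11}$, roughly $4.9 \times 10^{-4}$ rad, and `scale_factor(12)` rounds to 1.64676.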
3 Reusable IP 128-Point CORDIC-Based Split-Radix FFT Core
Figure 1 shows the proposed 128-point CORDIC-based split-radix FFT processor, which can be used as a reusable IP core for various FFT with
ROM-free twiddle factor generator are used. In addition, an internal (128 × 32-bit) SRAM is used to store the input and output data for hardware efficiency, through the use of the in-place computation algorithm [1].
3.1 CORDIC-Based Split-Radix 2/8 FFT Processor
For the butterfly computation of the proposed CORDIC-based split-radix 2/8 FFT processor, sixteen complex additions, two constant multiplications (CM), and four CORDIC operations are needed, as shown in Figure 2. The CORDIC algorithm has been widely used in various DSP applications because of the hardware simplicity.
According to equation (9), the twiddle factor multiplication of FFT can be considered a 2-D vector rotation in the circular coordinate system. Thus, CORDIC in the circular coordinate system with rotation mode is adopted to compute complex multiplications of FFT.
The pipelined CORDIC arithmetic unit can be obtained by decomposing the CORDIC algorithm into a sequence of operational stages. In [17], we derived the error analysis of fixed-point CORDIC arithmetic, based on which the number of CORDIC stages can be determined effectively. For example, the number of CORDIC stages is 12 if the overall relative error of 16-bit CORDIC arithmetic is required to be less than $10^{-3}$. The pipelined CORDIC arithmetic unit thus consists of 12 stages and an additional pre-scaling stage, in which the pre-calculated scaling factor $K_c \approx 1.64676$ is expressed in the Booth binary recoded format as 1.101001. The main concern for the design of the CORDIC arithmetic unit is throughput rather than latency. The gate count of the proposed CORDIC arithmetic unit is significantly lower than that of four real multipliers. In addition, the power consumption can be reduced significantly by using the proposed CORDIC arithmetic unit; it has been reduced by 30% according to the report of PrimePower® distributed by Synopsys.
As the twiddle factors $W_8^1$ and $W_8^3$ are equal to $\frac{\sqrt{2}}{2}(1 - j)$ and $-\frac{\sqrt{2}}{2}(1 + j)$, respectively, a complex number, say $(a + bj)$, times $W_8^1$ or $W_8^3$ can be written as
$$(a + bj) \times \frac{\sqrt{2}}{2}(1 - j) = \frac{\sqrt{2}}{2}\big((a + b) + j(-a + b)\big) \tag{10}$$
where $\frac{\sqrt{2}}{2}$ can be represented as 1.0101010 using the Booth binary recoded form (BBRF). Thus, the CM unit can be implemented by using simple adders and shifters only. Figure 3 shows the pipelined CM architecture, which uses three subtractions/additions and therefore improves the computation speed significantly.
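The shift-and-add idea behind the CM unit can be sketched in a few lines. This is an illustration under two assumptions that are not confirmed by the text: the recoded constant is read as the signed-digit value $1 - 2^{-2} - 2^{-4} + 2^{-6} = 0.703125$, and the two multiplier stages follow the "Shifter 2 / Sub" and "Shifter 4 / Sub" labels of Figure 3, i.e. $(1 - 2^{-2})(1 - 2^{-4})$, which gives the same value. The function names are hypothetical and integer inputs stand in for fixed-point samples.

```python
def times_sqrt_half(v):
    """Approximate v * sqrt(2)/2 by 0.703125 * v using two shift-and-subtract stages."""
    t = v - (v >> 2)        # stage 1: v * (1 - 2^-2)
    return t - (t >> 4)     # stage 2: t * (1 - 2^-4)


def cm_w8_1(a, b):
    """(a + jb) * W_8^1 per eq. (10): one add/sub, then the constant multiplier."""
    return times_sqrt_half(a + b), times_sqrt_half(b - a)
```

Counting the initial addition or subtraction of $a$ and $b$, each output component needs three additions/subtractions. Under this reading the constant 0.703125 differs from $\sqrt{2}/2 \approx 0.70711$ by about 0.6%, so a design that needs tighter accuracy would carry more recoded digits.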
Based on the above-mentioned CORDIC arithmetic unit and CM unit, the computational circuit and hardware architecture of the CORDIC-based split-radix 2/8 FFT butterfly computation are realized.
As one can see, the pipelined CORDIC arithmetic unit aims at increasing the throughput of complex multiplications.
3.2 ROM-Free Twiddle Factor Generator
In the conventional FFT processor, a large ROM space is needed to store all the twiddle factors. To reduce the chip area, a twiddle factor generator is thus proposed. Figure 4 shows the ROM-free twiddle factor generator using simple adders and shifters for 128-point FFT, in which the 16-bit accumulator generates the value $2n\pi$ for each index $n$, $n = 0, 1, \ldots, 2^{\log_2 N - 3} - 1$; the 16-bit shifter divides $2n\pi$ by $N$; and the 16-bit shifter/adder produces the twiddle factors $\theta_N^{1n}$, $\theta_N^{3n}$, $\theta_N^{5n}$ and $\theta_N^{7n}$. By using the twiddle factor generator, the chip area and power consumption can be reduced significantly at the cost of an additional logic circuit. Table 1 shows the gate counts of the full-ROM storing all the twiddle factors, the CORDIC twiddle factor generator [1] and the ROM-free twiddle factor generator.
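The accumulate, shift and shift/add steps described above can be modelled as follows. This is a minimal sketch under assumptions that are not taken from the paper: angles are encoded so that $2\pi$ corresponds to $2^{16}$ integer units, $N$ is a power of two, and Python's unbounded integers stand in for the 16-bit registers (register widths and wrap-around are not modelled). The name `twiddle_angles` is hypothetical.

```python
import math

FULL_TURN = 1 << 16   # assumed encoding: 2*pi radians == 2^16 units


def twiddle_angles(N):
    """Yield (n, [theta_1, theta_3, theta_5, theta_7]) in radians for n = 0 .. N/8 - 1."""
    shift = N.bit_length() - 1           # log2(N) when N is a power of two
    acc = 0                              # accumulator: holds n * 2*pi in units
    for n in range(N // 8):
        theta = acc >> shift             # shifter: (2*pi*n) / N
        theta3 = theta + (theta << 1)    # shifter/adder: 3*theta = theta + 2*theta
        theta5 = theta + (theta << 2)    # 5*theta = theta + 4*theta
        theta7 = (theta << 3) - theta    # 7*theta = 8*theta - theta
        yield n, [t * 2 * math.pi / FULL_TURN
                  for t in (theta, theta3, theta5, theta7)]
        acc += FULL_TURN                 # next index: add another 2*pi
```

Each angle equals $2\pi n l / N$ for $l = 1, 3, 5, 7$ and would then be fed to a rotation unit in place of a ROM look-up; for example, `dict(twiddle_angles(128))[5]` lists the four angles for $n = 5$.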
4 Implementation of FFT Processors
In the 128/256/512/1024/2048/4096/8192-point FFT processors, the radix-2 and split-radix 2/4 butterfly processors [1] using the pipelined CORDIC arithmetic units and twiddle factor generators are implemented; moreover, two memory banks (4096/2048/1024/512/256/0 × 32-bit and 8192/4096/2048/1024/512/256/128 × 32-bit) are allocated for increased efficiency by using the in-place computation algorithm [1]. The hardware architecture is shown in Figure 5.
The hardware code, written in Verilog®, is run on a workstation with the ModelSim® simulation tool and the Synopsys® synthesis tool (Design Compiler). The chips are synthesized with the TSMC 0.18 μm 1p6m CMOS cell libraries [18]. The physical circuit is synthesized by the Astro® tool. The circuits are evaluated by DRC, LVS and PVS [19].
The layout view of the 8192-point FFT processor is shown in Figure 6. The core areas, power consumptions, and clock rates of the 128-point, 256-point, 512-point, 1024-point, 2048-point, 4096-point and 8192-point FFT processors are shown in Table 2. All the control signals are generated internally on-chip.
The chip provides both high throughput and low gate count.
5 Performance Analysis of The Proposed FFT Architecture
FFT processors used to compute 128/256/512/1024/2048/4096/8192-point FFT are composed mainly of the 128-point CORDIC-based split-radix 2/8 FFT core; the computation complexity using a single 128-point FFT core is O(N/6) for N-point FFT. The log-log plot of the CORDIC computations versus the number of FFT points is shown in Figure 7. As one can see, the proposed FFT architecture is able to improve the power consumption and computation speed significantly.
6 Conclusion
This paper presents low-power and high-speed FFT processors based on CORDIC and split-radix techniques for OFDM systems. The architectures are mainly based on a reusable IP 128-point CORDIC-based split-radix FFT core. The pipelined CORDIC arithmetic unit is used to compute the complex multiplications involved in FFT, and moreover the required twiddle factors are obtained by using the proposed ROM-free twiddle factor generator rather than storing them in a large ROM space.
The CORDIC-based 128/256/512/1024/2048/4096/8192-point FFT processors have been implemented in 0.18 μm CMOS, and take 395 μs, 176.8 μs, 77.9 μs, 33.6 μs, 14 μs, 5.5 μs and 1.88 μs to compute 8192-point, 4096-point, 2048-point, 1024-point, 512-point, 256-point and 128-point FFT, respectively.
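The reported computation times can be turned into clock-cycle counts and sample throughput with simple arithmetic. The sketch below assumes that the 200 MHz clock rate listed in Table 2 applies to every timing figure quoted above; the names `cycles` and `throughput_msps` are hypothetical helpers, not quantities reported in the paper.

```python
CLOCK_MHZ = 200   # clock rate from Table 2 (assumed to apply to all sizes)

# Computation time in microseconds per FFT size, as quoted in the conclusion.
FFT_TIME_US = {8192: 395.0, 4096: 176.8, 2048: 77.9, 1024: 33.6,
               512: 14.0, 256: 5.5, 128: 1.88}


def cycles(n_points):
    """Clock cycles per transform: microseconds times MHz."""
    return round(FFT_TIME_US[n_points] * CLOCK_MHZ)


def throughput_msps(n_points):
    """Input samples processed per microsecond, i.e. Msamples/s."""
    return n_points / FFT_TIME_US[n_points]


for size in sorted(FFT_TIME_US):
    print(size, cycles(size), round(throughput_msps(size), 1))
```

Under that assumption the 128-point transform takes 376 cycles and the 8192-point transform takes 79000 cycles.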
The CORDIC-based FFT processors are designed by using the portable and reusable Verilog®. The 128-point FFT core is a reusable intellectual property (IP), which can be implemented in various processes and combined with an efficient use of hardware resources for the trade-offs of performance, area, and power consumption.
References:
[1] T. Y. Sung, “Memory-efficient and high-speed split-radix FFT/IFFT processor based on pipelined CORDIC rotations,” IEE Proc.-Vis. Image Signal Process., Vol. 153, No. 4, Aug. 2006, pp.405-410.
[2] J. C. Kuo, C. H. Wen, A. Y. Wu, “Implementation of a programmable 64~2048-point FFT/IFFT processor for OFDM-based communication systems,” Proceedings of the 2003 International Symposium on Circuits and Systems, Volume 2, 25-28 May 2003, pp.II-121 - II-124.
[3] L. Xiaojin, Z. Lai, C. J. Cui, “A low power and small area FFT processor for OFDM demodulator,” IEEE Transactions on Consumer Electronics, Volume 53, Issue 2, May 2007, pp.274 – 277.
[4] J. Lee, H. Lee, S. I. Cho, S. S. Choi, “A high-speed, low-complexity radix-216 FFT processor for MB-OFDM UWB systems,” Proceedings of the 2006 IEEE International Symposium on Circuits and Systems, May 2006, pp.
[5] A. Cortes, I. Velez, J. F. Sevillano, A. Irizar, “An approach to simplify the design of IFFT/FFT cores for OFDM systems,” IEEE Transactions on Consumer Electronics, Volume 52, Issue 1, Feb. 2006, pp.26 – 32.
[6] Y. H. Lee, T. H. Yu, K. K. Huang, A. Y. Wu, “Rapid IP design of variable-length cached-FFT processor for OFDM-based communication systems,” IEEE Workshop on Signal Processing Systems Design and Implementation, Oct. 2006, pp.62-65.
[7] C. L. Wey, W. C. Tang, S. Y. Lin, “Efficient memory-based FFT architectures for digital video broadcasting (DVB-T/H),” 2007 International Symposium on VLSI Design, Automation and Test, 25-27 April 2007, pp.1-4.
[8] Y. W. Lin, H. Y. Liu, C. Y. Lee, “A 1-GS/s FFT/IFFT processor for UWB applications,” IEEE Journal of Solid-State Circuits, Volume 40, Issue 8, Aug. 2005, pp.1726-1735.
[9] C. D. Thompson, “Fourier transform in VLSI,” IEEE Transactions on Computers, Vol.32, No. 11, 1983, pp.1047-1057.
[10] E. H. Wold, A. M. Despain, “Pipelined and parallel-pipelined FFT processor for VLSI implementation,” IEEE Transactions on Computers, Vol.33, No. 5, 1984, pp.414-426.
[11] T. Widhe, “Efficient implementation of FFT processing elements,” Linkoping Studies in Science and Technology, Thesis No. 619, Linkoping University, Sweden, 1997.
[12] P. Duhamel, H. Hollmann, “Implementation of
Conference on Acoustics, Speech, and Signal Processing, Volume 10, April 1985, pp.784 – 787.
[13] A. A. Petrovsky, S. L. Shkredov, “Automatic generation of split-radix 2-4 parallel-pipeline FFT processors: hardware reconfiguration and core optimizations,” 2006 International Symposium on Parallel Computing in Electrical Engineering, pp.181-186.
[14] S. Bouguezel, M. O. Ahmad, M. N. S. Swamy, “A new radix-2/8 FFT algorithm for length-q×2^m DFTs,” IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, Volume 51, Issue 9, 2004, pp.1723-1732.
[15] W. C. Yeh, C. W. Jen, “High-speed and low-power split-radix FFT,” IEEE Transactions on Acoustics, Speech, and Signal Processing, Volume 51, Issue 3, March 2003, pp.864 – 874.
[16] M. D. Ercegovac, T. Lang, “CORDIC algorithm and implementations,” Digital Arithmetic, Morgan Kaufmann Publishers, 2004, Chapter 11.
[17] T. Y. Sung, H. C. Hsin, “Fixed-point error analysis of CORDIC arithmetic for special-purpose signal processors,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol.E90-A, No.9, Sep. 2007, pp.2006-2013.
[18] “TSMC 0.18 CMOS Design Libraries and Technical Data, v.3.2,” Taiwan Semiconductor Manufacturing Company, Hsinchu, Taiwan, and National Chip Implementation Center (CIC), National Science Council, Hsinchu, Taiwan, R.O.C., 2006.
[19] Cadence design systems: http://www.cadence.com/products/pages/default.aspx.
Fig. 1 The proposed 128-point CORDIC-based split-radix FFT processor
Fig. 2 Data flow of the butterfly computation of the modified split-radix 2/8 FFT (inputs: x(n), x(n+N/8), x(n+N/4), x(n+3N/8), x(n+N/2), x(n+5N/8), x(n+3N/4), x(n+7N/8); blocks: CM, CORDIC; outputs: a(8k), a(8k+2), a(8k+4), a(8k+6), X(8k+1), X(8k+3), X(8k+5), X(8k+7))
Fig. 3 Constant multiplier (CM) architecture for the modified split-radix 2/8 FFT (blocks: Add, Sub, Shifter 2 / Sub, Shifter 4 / Sub, Latch, Mux; inputs: Re[X], Im[X])
Fig. 4 Proposed ROM-free twiddle factor generator for 128-point FFT (blocks: 16-bit Accumulator, 16-bit Reg., 16-bit Shifter, 16-bit Shifter/Adder, Control)
Fig. 5 Hardware architecture of the 128/256/512/1024/2048/4096/8192-point FFT processor (blocks: Radix 2, Split 2/4, Split 2/8, S/P, P/S; 4096/2048/1024/512/256/0 × 32 internal memory; 8192/4096/2048/1024/512/256/128 × 32 external memory)
Fig. 6 Layout view of the 8192-point FFT processor
Fig. 7 Log-log plot of the CORDIC computations versus the number of FFT points
Full-Twiddle Factor ROM
CORDIC Twiddle Factor Generator
ROM-free Twiddle Factor Generator (Sung, Hsin and Cheng, 2008) 8192-Point ROM
bit 16 K 4 ×
11-bit Adder 11-bit Shifter
16-bit CORDIC 16-bit Shifter 16-bit Adder bit
K 18
~ ~150gates ~50gates ~90gates ~200gates
16-bit Accumulator 16-bit Shifter 16-bit Shifter/Adder gates 2 200 2 90
~ × + × gates
0 9
~ 200gates
~
16-bit Register gates 32
~
1bit~1gate
(T. Y. Sung, 2006) [1]
Table 1 Hardware requirements of the full-ROM, the CORDIC twiddle factor generator [1], and the ROM-free twiddle factor generator

FFT Size      Core Area    Power Consumption    Clock Rate
128-point     2.28 mm^2    80 mW                200 MHz
256-point     2.37 mm^2    84 mW                200 MHz
512-point     2.49 mm^2    88 mW                200 MHz
1024-point    2.62 mm^2    94 mW                200 MHz
2048-point    2.81 mm^2    99 mW                200 MHz
4096-point    3.10 mm^2    106 mW               200 MHz
8192-point    3.62 mm^2    117 mW               200 MHz
Table 2 Core areas, power consumptions, clock rates of 128-, 256-, 512-, 1024-, 2048-, 4096- and 8192-point FFT