5 Conclusion

The wavelet transform has been adopted by JPEG2000 as the underlying method to decompose an image into subbands with orientation selectivity. It provides many desirable properties, e.g. multiresolution analysis, high correlation across wavelet subbands of the same orientation, and energy clustering within each subband; these properties suit image compression applications. In EBCOT, all the code blocks of an image are coded to generate a set of code streams with their respective contributions to the decoded image, based on which the optimal code stream can be obtained by concatenating the suitably truncated code streams through the PCRD algorithm.

Because some less important code blocks are not needed for the optimal decoded image at a given bit rate, coding them may waste computational power and memory space. Furthermore, implementation of the PCRD algorithm is one of the crucial issues. To overcome the above-mentioned problems, a simple context-based rate distortion estimation (CBRDE) is proposed to arrange the scanning order of code blocks in an adaptive manner. To avoid transmitting the side information regarding the scanning order of code blocks from encoder to decoder, the proposed CBRDE is based on the MQ table, which is available at both encoder and decoder. The second tier of EBCOT, i.e. PCRD, can therefore be replaced by the CBRDE-based adaptive block ordering (ABO).

Experimental results show that the rate distortion curves are almost convex.

Recall that for each code block, the coding procedure proceeds in distinct passes. Thus, the proposed image coding with adaptive block ordering

ordering, however, at the cost of increasing complexity from the implementation point of view.


Fig. 1 Block diagram of JPEG2000 Encoder (blocks: WT, Q, EBCOT algorithm with BPC, MQ encoder and rate control; input: image; output: code stream)

Proceedings of the 9th WSEAS International Conference on Multimedia Systems & Signal Processing

Fig. 2 Performance of CBRDE applied to Lena image (panel title: true bit plane 7); (a) horizontal axis: true D, vertical axis: estimated D; (b) horizontal axis: true R, vertical axis: estimated R.

Fig. 3 Proposed image encoder using EBC with ABO (blocks: DWT, Q, BPC, MQ, CBRDE, ABO, EBC; input: image; output: bit-stream)

Fig. 4 Proposed image decoder using EBC with ABO (blocks: EBC, BPC, CBRDE, ABO, DeQ, IDWT; input: bit-stream; output: decoded image)

Fig. 5 Test images (a) Barbara (b) Fingerprint

Fig. 6 Rate distortion curves of Barbara image by EBC with the CBRDE-based ABO (dashed line) and EBC with a fixed scan order (solid line); vertical axis: mean square error (MSE); horizontal axis: bit rate (bpp).

Fig. 7 Rate distortion curves of Fingerprint image by EBC with the CBRDE-based ABO (dashed line) and EBC with a fixed scan order (solid line); vertical axis: mean square error (MSE); horizontal axis: bit rate (bpp).

Reconfigurable VLSI Architecture for FFT Processor

Tze-Yun Sung

Department of Microelectronics Engineering

Chung Hua University, 707, Sec. 2, Wufu Road, Hsinchu City 300-12, Taiwan

bobsung@chu.edu.tw

Hsi-Chin Hsin

Department of Computer Science and Information Engineering

National United University Miaoli 36003, Taiwan

hsin@nuu.edu.tw

Abstract: - This paper presents a CORDIC (Coordinate Rotation Digital Computer)-based split-radix fast Fourier transform (FFT) core for OFDM systems, for example, Ultra Wide Band (UWB), Asymmetric Digital Subscriber Line (ADSL), Digital Audio Broadcasting (DAB), Digital Video Broadcasting – Terrestrial (DVB-T), Very High Bitrate DSL (VHDSL), and Worldwide Interoperability for Microwave Access (WiMAX). High-speed 128/256/512/1024/2048/4096/8192-point FFT processors have been implemented in a 0.18 μm (1p6m) CMOS process at 1.8 V, in which all the control signals are generated internally. These FFT processors outperform the conventional ones in terms of both power consumption and core area.

Key-Words: - IP, FFT, split-radix, CORDIC, OFDM.

1 Introduction

A high-performance fast Fourier transform (FFT) processor is needed especially for real-time digital signal processing applications. Specifically, the computation of the discrete Fourier transform (DFT) ranging from 128 to 8192 points is required for the orthogonal frequency division multiplexing (OFDM) of the following standards: Ultra Wide Band (UWB), Asymmetric Digital Subscriber Line (ADSL), Digital Audio Broadcasting (DAB), Digital Video Broadcasting – Terrestrial (DVB-T), Very High Bitrate DSL (VHDSL) and Worldwide Interoperability for Microwave Access (WiMAX) [1]-[8]. Thompson [9] proposed an efficient VLSI architecture for FFT in 1983. Wold and Despain [10] proposed pipelined and parallel-pipelined FFT for VLSI implementations in 1984. Widhe [11] developed efficient processing elements of FFT in 1997. To reduce the computation complexity, the split-radix 2/4, 2/8, and 2/16 FFT algorithms were proposed in [12]-[15].

As the Booth multiplier is not suitable for hardware implementations of large FFT, we propose the CORDIC-based multiplier. Moreover, we develop a ROM-free twiddle factor generator using simple shifters and adders only [1], which obviates the need to store all the twiddle factors in a large ROM space.

As a result, the proposed CORDIC-based split-radix FFT core with the ROM-free twiddle factor generator is suitable for wireless local area network (WLAN) applications. In this paper, a high-performance 128/256/512/1024/2048/4096/8192-point FFT processor is presented for the European and Japanese standards. The remainder of this paper proceeds as follows. In Section 2, the split-radix 2/8 FFT algorithm and the CORDIC algorithm are reviewed briefly. In Section 3, the reusable IP 128-point CORDIC-based split-radix FFT core is proposed. In Section 4, the hardware implementations of FFT processors are described. The performance analysis is presented in Section 5. Finally, the conclusion is given in Section 6.

2 Review of Split-Radix FFT and CORDIC Algorithm

2.1 Split-Radix FFT

The idea behind the split-radix FFT algorithm is to compute the even and odd terms of FFT separately.

The even term of the split-radix 2/8 FFT algorithm is given by

$$X(2k) = \sum_{n=0}^{N/2-1} \left( x(n) + x(n + N/2) \right) W_{N/2}^{nk} \tag{1}$$

where $W_{N/2} = e^{-j2\pi/(N/2)}$ and $k = 0, 1, 2, \ldots, (N/2)-1$. The odd term is as follows:

The National Science Council of Taiwan, under Grant NSC97-2221-E-216-044, and the Chung Hua University, Hsinchu, Taiwan, under Contract CHU-NSC97-2221-E-216-044 supported this work.


$$X(8k+l) = \sum_{n=0}^{N/8-1} \Big[ \big( x(n) + W_4^{l}\, x(n + 2N/8) + W_2^{l}\, x(n + 4N/8) + W_4^{3l}\, x(n + 6N/8) \big) + W_8^{l} \big( x(n + N/8) + W_4^{l}\, x(n + 3N/8) + W_2^{l}\, x(n + 5N/8) + W_4^{3l}\, x(n + 7N/8) \big) \Big] W_N^{nl}\, W_{N/8}^{nk} \tag{2}$$

where $k = 0, 1, 2, \ldots, (N/8)-1$ and $l = 1, 3, 5, 7$. The split-radix 2/8 FFT algorithm, combined with the radix-2 and radix-4 algorithms, proves effective for developing a reusable IP 128-point FFT core.
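As a numerical sanity check of this decomposition, the following Python sketch compares the even terms X(2k) and the odd terms X(8k+l) against a direct DFT. The function names (`dft`, `W`, `even_terms`, `odd_terms`) are illustrative and not taken from the paper. The inner sum over the eight input samples is written in the compact form sum over m of x(n + mN/8) W_8^{ml}, which regroups to the bracketed form of equation (2) because W_8^{2l} = W_4^{l} and W_8^{4l} = W_2^{l}.

```python
import cmath

def W(M, p):
    """Twiddle factor W_M^p = exp(-j*2*pi*p/M)."""
    return cmath.exp(-2j * cmath.pi * p / M)

def dft(x):
    """Direct N-point DFT, used only as a reference."""
    N = len(x)
    return [sum(x[n] * W(N, n * k) for n in range(N)) for k in range(N)]

def even_terms(x):
    """X(2k) from equation (1): an N/2-point DFT of x(n) + x(n + N/2)."""
    N = len(x)
    H = N // 2
    return [sum((x[n] + x[n + H]) * W(H, n * k) for n in range(H))
            for k in range(H)]

def odd_terms(x):
    """X(8k + l), l = 1, 3, 5, 7, from equation (2); N must be a multiple of 8."""
    N = len(x)
    E = N // 8
    out = {}
    for k in range(E):
        for l in (1, 3, 5, 7):
            acc = 0
            for n in range(E):
                # Combine the eight samples spaced N/8 apart.
                inner = sum(x[n + m * E] * W(8, m * l) for m in range(8))
                acc += inner * W(N, n * l) * W(E, n * k)
            out[8 * k + l] = acc
    return out

if __name__ == "__main__":
    x = [complex((3 * n) % 7, (5 * n) % 4) for n in range(16)]
    X = dft(x)
    print(max(abs(v - X[i]) for i, v in odd_terms(x).items()))
```

Running the script prints the largest deviation of the odd terms from the direct DFT, which should be at the level of floating-point rounding.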

2.2 CORDIC Algorithm

The CORDIC algorithm in the circular coordinate system is as follows [16].

$$x(i+1) = x(i) - \sigma_i 2^{-i} y(i) \tag{3}$$

$$y(i+1) = y(i) + \sigma_i 2^{-i} x(i) \tag{4}$$

$$z(i+1) = z(i) - \sigma_i \alpha(i) \tag{5}$$

$$\alpha(i) = \tan^{-1} 2^{-i} \tag{6}$$

where $\sigma_i = \mathrm{sign}(z(i))$ with $z(i) \to 0$ in the rotation mode, and $\sigma_i = -\mathrm{sign}(x(i)) \cdot \mathrm{sign}(y(i))$ with $y(i) \to 0$ in the vectoring mode. The scale factor $k(i)$ is equal to $\sqrt{1 + \sigma_i^2 2^{-2i}}$. After $n$ micro-rotations, the product of the scale factors is given by

$$K_c = \prod_{i=0}^{n-1} k(i) = \prod_{i=0}^{n-1} \sqrt{1 + 2^{-2i}} \tag{7}$$

Notice that CORDIC in the circular coordinate system with rotation mode can be written as

$$\begin{bmatrix} x_n \\ y_n \end{bmatrix} = K_c \begin{bmatrix} \cos z_0 & -\sin z_0 \\ \sin z_0 & \cos z_0 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}, \tag{8}$$

where $[x_0 \; y_0]^T$ and $[x_n \; y_n]^T$ are the input vector and the output vector, respectively, $z_0$ is the rotation angle, and $K_c$ is the scale factor. In [1], the circular rotation computation of CORDIC was used for complex multiplication with $e^{-j\theta}$, which is given by

$$\begin{bmatrix} \mathrm{Re}[X'] \\ \mathrm{Im}[X'] \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} \mathrm{Re}[X] \\ \mathrm{Im}[X] \end{bmatrix} \tag{9}$$

3 Reusable IP 128-Point CORDIC-Based Split-Radix FFT Core

Figure 1 shows the proposed 128-point CORDIC-based split-radix FFT processor, which can be used as a reusable IP core for various FFT with

ROM-free twiddle factor generator are used. In addition, an internal (128 × 32-bit) SRAM is used to store the input and output data for hardware efficiency, through the use of the in-place computation algorithm [1].

3.1 CORDIC-Based Split-Radix 2/8 FFT Processor

For the butterfly computation of the proposed CORDIC-based split-radix 2/8 FFT processor, sixteen complex additions, two constant multiplications (CM), and four CORDIC operations are needed, as shown in Figure 2. The CORDIC algorithm has been widely used in various DSP applications because of its hardware simplicity.

According to equation (9), the twiddle factor multiplication of FFT can be considered a 2-D vector rotation in the circular coordinate system. Thus, CORDIC in the circular coordinate system with rotation mode is adopted to compute complex multiplications of FFT.

The pipelined CORDIC arithmetic unit can be obtained by decomposing the CORDIC algorithm into a sequence of operational stages. In [17], we derived the error analysis of fixed-point CORDIC arithmetic, based on which the number of CORDIC stages can be determined effectively. For example, the number of CORDIC stages is 12 if the overall relative error of 16-bit CORDIC arithmetic is required to be less than $10^{-3}$. The pipelined CORDIC arithmetic unit has 12 stages and an additional pre-scaling stage, in which the pre-calculated scaling factor $K_c \approx 1.64676$ is represented as 1.101001 in the Booth binary recoded format. The main concern in the design of the CORDIC arithmetic unit is throughput rather than latency. In terms of gate count, the proposed CORDIC arithmetic unit is significantly smaller than four real multipliers. In addition, the power consumption is reduced by 30% when the proposed CORDIC arithmetic unit is used, according to the report of PrimePower® distributed by Synopsys.
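The rotation-mode iteration of equations (3)-(6) and the scale factor of equation (7) can be sketched in a few lines of Python. This is a floating-point illustration only, assuming 12 micro-rotations as in the text; it does not model the 16-bit fixed-point datapath, the pipelining, or the Booth-recoded pre-scaling stage, and the function names are hypothetical.

```python
import cmath
import math

def cordic_rotate(x0, y0, z0, n=12):
    """Rotation mode of equations (3)-(6): drive z toward 0 in n micro-rotations."""
    x, y, z = x0, y0, z0
    for i in range(n):
        sigma = 1.0 if z >= 0 else -1.0
        t = 2.0 ** -i
        x, y = x - sigma * t * y, y + sigma * t * x
        z -= sigma * math.atan(t)
    return x, y, z

def scale_factor(n=12):
    """Product of the per-stage scale factors sqrt(1 + 2^(-2i)), equation (7)."""
    k = 1.0
    for i in range(n):
        k *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return k

def times_exp_minus_j_theta(X, theta, n=12):
    """Complex multiplication X * exp(-j*theta) as in equation (9): rotate by -theta."""
    x, y, _ = cordic_rotate(X.real, X.imag, -theta, n)
    k = scale_factor(n)
    return complex(x / k, y / k)

if __name__ == "__main__":
    print(scale_factor(12))
    X, theta = complex(0.6, 0.8), 0.7
    print(times_exp_minus_j_theta(X, theta), X * cmath.exp(-1j * theta))
```

With 12 micro-rotations the residual angle is bounded by atan(2^-11), so for a unit-magnitude input the two printed products should agree to roughly three decimal places, in line with the $10^{-3}$ error target mentioned above.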

As the twiddle factors $W_8^1$ and $W_8^3$ are equal to $\frac{\sqrt{2}}{2}(1-j)$ and $-\frac{\sqrt{2}}{2}(1+j)$, respectively, a complex number, say $(a+bj)$, times $W_8^1$ or $W_8^3$ can be written as

$$(a+bj) \times \frac{\sqrt{2}}{2}(1-j) = \frac{\sqrt{2}}{2}\big((a+b) + j(-a+b)\big) \tag{10}$$

where $\frac{\sqrt{2}}{2}$ can be represented as 1.0101010 using the Booth binary recoded form (BBRF). Thus, the CM unit can be implemented by using simple adders and shifters only. Figure 3 shows the pipelined CM architecture, which uses three subtractions/additions and therefore improves the computation speed significantly.
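Equation (10) can be illustrated with integer arithmetic in Python. The sketch below is an assumption-laden illustration, not the paper's CM circuit: it approximates the constant by 181/256 (an 8-bit fraction chosen here for simplicity, using the plain binary expansion 181 = 128 + 32 + 16 + 4 + 1 rather than the BBRF digits), and the names `cm_scale` and `times_w8_1` are hypothetical.

```python
import math

def cm_scale(v):
    """Multiply integer v by 181/256 (about sqrt(2)/2) using shifts and adds only."""
    return ((v << 7) + (v << 5) + (v << 4) + (v << 2) + v) >> 8

def times_w8_1(a, b):
    """(a + bj) * W_8^1 per equation (10): scale (a + b) and (-a + b) by sqrt(2)/2."""
    return cm_scale(a + b), cm_scale(b - a)

if __name__ == "__main__":
    re, im = times_w8_1(1000, 2000)
    exact = complex(1000, 2000) * complex(math.sqrt(2) / 2, -math.sqrt(2) / 2)
    print(re, im, exact)
```

The point of the identity is that the complex product needs only one addition, one subtraction and two real scalings by the same constant, and each scaling reduces to a handful of shifted additions.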

Based on the above-mentioned CORDIC arithmetic unit and CM unit, the computational circuit and hardware architecture of the CORDIC-based split-radix 2/8 FFT butterfly computation are realized.

As one can see, the pipelined CORDIC arithmetic unit aims at increasing the throughput of complex multiplications.

3.2 ROM-Free Twiddle Factor Generator

In the conventional FFT processor, a large ROM space is needed to store all the twiddle factors. To reduce the chip area, a twiddle factor generator is thus proposed. Figure 4 shows the ROM-free twiddle factor generator using simple adders and shifters for 128-point FFT, in which the 16-bit accumulator generates the value $2n\pi$ for each index $n$ up to $2^{\log_2 N - 3} - 1$, the 16-bit shifter divides $2n\pi$ by $N$, and the 16-bit shifter/adder produces the twiddle factors $\theta_N^{n}$, $\theta_N^{3n}$, $\theta_N^{5n}$ and $\theta_N^{7n}$. By using the twiddle factor generator, the chip area and power consumption can be reduced significantly at the cost of an additional logic circuit. Table 1 shows the gate counts of the full-ROM storing all the twiddle factors, the CORDIC twiddle factor generator [1] and the ROM-free twiddle factor generator.
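The accumulate, shift, and shift/add steps described above can be mimicked in Python. This is a minimal sketch under stated assumptions rather than the circuit of Figure 4: angles are held as 16-bit phase words in which 2^16 units stand for one full turn of 2π (an encoding assumed here, not given in the text), N is a power of two, and the names `twiddle_angles` and `to_radians` are hypothetical.

```python
import math

PHASE_BITS = 16  # assumption: 2**16 phase units represent the angle 2*pi

def twiddle_angles(n, N):
    """Return phase words for the angles 2*pi*l*n/N with l = 1, 3, 5, 7."""
    log2_n_points = N.bit_length() - 1        # log2(N) for power-of-two N
    mask = (1 << PHASE_BITS) - 1
    acc = n << PHASE_BITS                     # accumulator: n full turns, i.e. 2*n*pi
    t1 = (acc >> log2_n_points) & mask        # shifter: divide by N
    t3 = (t1 + (t1 << 1)) & mask              # shifter/adder: 3x = x + 2x
    t5 = (t1 + (t1 << 2)) & mask              # 5x = x + 4x
    t7 = ((t1 << 3) - t1) & mask              # 7x = 8x - x
    return t1, t3, t5, t7

def to_radians(phase):
    """Convert a phase word back to radians."""
    return 2.0 * math.pi * phase / (1 << PHASE_BITS)

if __name__ == "__main__":
    for p in twiddle_angles(5, 128):
        print(p, to_radians(p))
```

Each resulting angle could then be fed to a rotation-mode CORDIC stage as the rotation angle, so no table of precomputed sine and cosine values is required.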

4 Implementation of FFT Processors

In the 128/256/512/1024/2048/4096/8192-point FFT processors, the radix-2 and split-radix 2/4 butterfly processors [1] using the pipelined CORDIC arithmetic units and twiddle factor generators are implemented; moreover, two memory banks (4096/2048/1024/512/256/0 × 32-bit and 8192/4096/2048/1024/512/256/128 × 32-bit) are allocated for increased efficiency by using the in-place computation algorithm [1]. The hardware architecture is shown in Figure 5.

The hardware code, written in Verilog®, is run on a workstation with the ModelSim® simulation tool and the Synopsys® synthesis tool (Design Compiler). The chips are synthesized with the TSMC 0.18 μm 1p6m CMOS cell libraries [18]. The physical circuit is synthesized by the Astro® tool. The circuits are evaluated by DRC, LVS and PVS [19].

The layout view of the 8192-point FFT processor is shown in Figure 6. The core areas, power consumptions and clock rates of the 128-point, 256-point, 512-point, 1024-point, 2048-point, 4096-point and 8192-point FFT processors are shown in Table 2. All the control signals are internally generated on-chip. The chip provides both high throughput and low gate count.

5 Performance Analysis of The Proposed FFT Architecture

FFT processors used to compute 128/256/512/1024/2048/4096/8192-point FFT are composed mainly of the 128-point CORDIC-based split-radix 2/8 FFT core; the computation complexity using a single 128-point FFT core is O(N/6) for N-point FFT. The log-log plot of the CORDIC computations versus the number of FFT points is shown in Figure 7. As one can see, the proposed FFT architecture is able to improve the power consumption and computation speed significantly.

6 Conclusion

This paper presents low-power and high-speed FFT processors based on CORDIC and split-radix techniques for OFDM systems. The architectures are mainly based on a reusable IP 128-point CORDIC-based split-radix FFT core. The pipelined CORDIC arithmetic unit is used to compute the complex multiplications involved in FFT, and moreover the required twiddle factors are obtained by using the proposed ROM-free twiddle factor generator rather than storing them in a large ROM space.

The CORDIC-based 128/256/512/1024/2048/4096/8192-point FFT processors have been implemented in 0.18 μm CMOS; they take 395 μs, 176.8 μs, 77.9 μs, 33.6 μs, 14 μs, 5.5 μs and 1.88 μs to compute 8192-point, 4096-point, 2048-point, 1024-point, 512-point, 256-point and 128-point FFT, respectively.
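For a rough feel of these figures, the reported computation times can be converted to clock cycles and sample throughput. The short Python sketch below assumes that every time was obtained at the 200 MHz clock rate listed in Table 2, which the text does not state explicitly; the names are illustrative.

```python
CLOCK_MHZ = 200  # clock rate from Table 2, assumed to apply to the reported times

# Computation time in microseconds for each FFT size, as reported above.
LATENCY_US = {8192: 395, 4096: 176.8, 2048: 77.9, 1024: 33.6,
              512: 14, 256: 5.5, 128: 1.88}

def cycles(points):
    """Clock cycles per transform: time (us) times clock rate (MHz)."""
    return round(LATENCY_US[points] * CLOCK_MHZ)

def throughput_msps(points):
    """Input samples processed per microsecond, i.e. mega-samples per second."""
    return points / LATENCY_US[points]

if __name__ == "__main__":
    for n in sorted(LATENCY_US):
        print(n, cycles(n), round(throughput_msps(n), 2))
```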

The CORDIC-based FFT processors are designed using portable and reusable Verilog®. The 128-point FFT core is a reusable intellectual property (IP), which can be implemented in various processes and combined with an efficient use of hardware resources for the trade-offs of performance, area, and power consumption.

References:

[1] T. Y. Sung, “Memory-efficient and high-speed split-radix FFT/IFFT processor based on pipelined CORDIC rotations,” IEE Proc.-Vis. Image Signal Process., Vol. 153, No. 4, Aug. 2006, pp. 405-410.

[2] J. C. Kuo, C. H. Wen, A. Y. Wu, “Implementation of a programmable 64~2048-point FFT/IFFT processor for OFDM-based communication systems,” Proceedings of the 2003 International Symposium on Circuits and Systems, Volume 2, 25-28 May 2003, pp. II-121 - II-124.

[3] L. Xiaojin, Z. Lai, C. J. Cui, “A low power and small area FFT processor for OFDM demodulator,” IEEE Transactions on Consumer Electronics, Volume 53, Issue 2, May 2007, pp. 274-277.

[4] J. Lee, H. Lee, S. I. Cho, S. S. Choi, “A high-speed, low-complexity radix-216 FFT processor for MB-OFDM UWB systems,” Proceedings of the 2006 IEEE International Symposium on Circuits and Systems, May 2006, pp.

[5] A. Cortes, I. Velez, J. F. Sevillano, A. Irizar, “An approach to simplify the design of IFFT/FFT cores for OFDM systems,” IEEE Transactions on Consumer Electronics, Volume 52, Issue 1, Feb. 2006, pp. 26-32.

[6] Y. H. Lee, T. H. Yu, K. K. Huang, A. Y. Wu, “Rapid IP design of variable-length cached-FFT processor for OFDM-based communication systems,” IEEE Workshop on Signal Processing Systems Design and Implementation, Oct. 2006, pp. 62-65.

[7] C. L. Wey, W. C. Tang, S. Y. Lin, “Efficient memory-based FFT architectures for digital video broadcasting (DVB-T/H),” 2007 International Symposium on VLSI Design, Automation and Test, 25-27 April 2007, pp. 1-4.

[8] Y. W. Lin, H. Y. Liu, C. Y. Lee, “A 1-GS/s FFT/IFFT processor for UWB applications,” IEEE Journal of Solid-State Circuits, Volume 40, Issue 8, Aug. 2005, pp. 1726-1735.

[9] C. D. Thompson, “Fourier transform in VLSI,” IEEE Transactions on Computers, Vol. 32, No. 11, 1983, pp. 1047-1057.

[10] E. H. Wold, A. M. Despain, “Pipelined and parallel-pipelined FFT processor for VLSI implementation,” IEEE Transactions on Computers, Vol. 33, No. 5, 1984, pp. 414-426.

[11] T. Widhe, “Efficient implementation of FFT processing elements,” Linkoping Studies in Science and Technology, Thesis No. 619, Linkoping University, Sweden, 1997.

[12] P. Duhamel, H. Hollmann, “Implementation of … Conference on Acoustics, Speech, and Signal Processing, Volume 10, April 1985, pp. 784-787.

[13] A. A. Petrovsky, S. L. Shkredov, “Automatic generation of split-radix 2-4 parallel-pipeline FFT processors: hardware reconfiguration and core optimizations,” 2006 International Symposium on Parallel Computing in Electrical Engineering, pp. 181-186.

[14] S. Bouguezel, M. O. Ahmad, M. N. S. Swamy, “A new radix-2/8 FFT algorithm for length-q×2^m DFTs,” IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, Volume 51, Issue 9, 2004, pp. 1723-1732.

[15] W. C. Yeh, C. W. Jen, “High-speed and low-power split-radix FFT,” IEEE Transactions on Acoustics, Speech, and Signal Processing, Volume 51, Issue 3, March 2003, pp. 864-874.

[16] M. D. Ercegovac, T. Lang, “CORDIC algorithm and implementations,” Digital Arithmetic, Morgan Kaufmann Publishers, 2004, Chapter 11.

[17] T. Y. Sung, H. C. Hsin, “Fixed-point error analysis of CORDIC arithmetic for special-purpose signal processors,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol. E90-A, No. 9, Sep. 2007, pp. 2006-2013.

[18] “TSMC 0.18 CMOS Design Libraries and Technical Data, v.3.2,” Taiwan Semiconductor Manufacturing Company, Hsinchu, Taiwan, and National Chip Implementation Center (CIC), National Science Council, Hsinchu, Taiwan, R.O.C., 2006.

[19] Cadence design systems: http://www.cadence.com/products/pages/default.aspx.

Fig. 1 The proposed 128-point CORDIC-based split-radix FFT processor (blocks: 128 × 32 memory, registers, modified split-radix 2/8 FFT architecture, controller)

Fig. 2 Data flow of the butterfly computation of the modified split-radix 2/8 FFT (inputs: x(n), x(n+N/8), x(n+N/4), x(n+3N/8), x(n+N/2), x(n+5N/8), x(n+3N/4), x(n+7N/8); outputs: a(8k), a(8k+2), a(8k+4), a(8k+6), X(8k+1), X(8k+3), X(8k+5), X(8k+7); two CM units and four CORDIC units)

Fig. 3 Constant multiplier (CM) architecture for the modified split-radix 2/8 FFT (blocks: Add, Sub, Shifter/Sub stages, latches, Mux; inputs: Re[X], Im[X])

Fig. 4 Proposed ROM-free twiddle factor generator for 128-point FFT (blocks: 16-bit accumulator, 16-bit register, 16-bit shifter, 16-bit shifter/adder, control)

Fig. 5 Hardware architecture of the 128/256/512/1024/2048/4096/8192-point FFT processor (blocks: IP core with radix-2, split-radix 2/4 and four split-radix 2/8 units, P/S and S/P converters, 4096/2048/1024/512/256/0 × 32 internal memory, 8192/4096/2048/1024/512/256/128 × 32 external memory)

Fig. 6 Layout view of the 8192-point FFT processor

Fig. 7 Log-log plot of the CORDIC computations versus the number of FFT points

Full-Twiddle Factor ROM

CORDIC Twiddle Factor Generator

ROM-free Twiddle Factor Generator (Sung, Hsin and Cheng, 2008) 8192-Point ROM

bit 16 K 4 ×

11-bit Adder 11-bit Shifter

16-bit CORDIC 16-bit Shifter 16-bit Adder bit

K 18

~ ~150gates ~50gates ~90gates ~200gates

16-bit Accumulator 16-bit Shifter 16-bit Shifter/Adder gates 2 200 2 90

~ × + × gates

0 9

~ 200gates

16-bit Register gates 32

1bit~1gate

(T. Y. Sung, 2006) [1]

Table 1 Hardware requirements of the full-ROM, the CORDIC twiddle factor generator [1], and the ROM-free twiddle factor generator

Table 2 Core areas, power consumptions, clock rates of 128-, 256-, 512-, 1024-, 2048-, 4096- and 8192-point FFT

FFT Size      Core Area    Power Consumption    Clock Rate
128-point     2.28 mm²     80 mW                200 MHz
256-point     2.37 mm²     84 mW                200 MHz
512-point     2.49 mm²     88 mW                200 MHz
1024-point    2.62 mm²     94 mW                200 MHz
2048-point    2.81 mm²     99 mW                200 MHz
4096-point    3.10 mm²     106 mW               200 MHz
8192-point    3.62 mm²     117 mW               200 MHz

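National Science Council, Executive Yuan: Report on Subsidized Attendance of Domestic Experts and Scholars at an International Academic Conference

May 26, 2009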
