
Chapter 3. Proposed architecture

3.1. Design issue

This section discusses the main hardware design considerations. Table 3.1 lists the control code for each FFT size. To keep the control circuit simple, the control code uses four bits instead of three. The most significant bit (MSB) of the control code is the Radix-2_Flag: when it is high, the radix-2/4 butterfly operator performs a radix-2 butterfly computation. The remaining three bits encode the initial stage, as given in eq. 3.1. According to Fig. 3.2, when N is 128, 512, 2048, or 8192, the butterfly operator must first perform one extra radix-2 step.

Table 3.1 Different size of FFT and the corresponding control code

(3.1)

Figure 3.2 Control of signal flow
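The sizes that need the extra radix-2 step are exactly those where log2(N) is odd, which can be illustrated with a short sketch. The function name `stage_plan` is hypothetical, and the three-bit initial-stage field of eq. 3.1 is deliberately not modeled because that equation is not reproduced in this text.

```python
def stage_plan(n):
    """Return (radix2_flag, radix4_stages) for an N-point FFT, N a power of two."""
    if n < 4 or n & (n - 1):
        raise ValueError("N must be a power of two, at least 4")
    log2n = n.bit_length() - 1
    radix2_flag = log2n % 2      # 1 when one radix-2 step is needed first
    radix4_stages = log2n // 2   # every remaining stage is radix-4
    return radix2_flag, radix4_stages


for n in (64, 128, 256, 512, 1024, 2048, 4096, 8192):
    flag, stages = stage_plan(n)
    print(f"N={n:5d}  Radix-2_Flag={flag}  radix-4 stages={stages}")
```

Under this view, N = 128 factors as 2 x 4^3, so the flag is set and three radix-4 stages follow, while N = 64 = 4^3 needs no radix-2 step.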

3.2. Radix-2/4 Butterfly

The proposed radix-2/4 butterfly operator is shown in Fig. 3.3. Radix-4 computation is the default mode. Radix-2 mode is enabled when N is 128, 512, 2048, or 8192. In radix-2 mode the proposed architecture executes two radix-2 computations at a time, so its throughput is double that of the conventional architecture.

Figure 3.3 Radix-2/4 butterfly operator
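The two modes can be sketched arithmetically as follows. This is only an illustration of the computations, not of the datapath in Fig. 3.3: the function names are hypothetical, and the pairing of inputs in radix-2 mode, (a, b) and (c, d), is an assumption.

```python
import cmath


def radix4_butterfly(a, b, c, d):
    """4-point DFT using only additions and multiplications by +/-1 and +/-j."""
    return (a + b + c + d,
            a - 1j * b - c + 1j * d,
            a - b + c - d,
            a + 1j * b - c - 1j * d)


def dual_radix2_butterfly(a, b, c, d):
    """Two independent radix-2 butterflies, on the pairs (a, b) and (c, d)."""
    return (a + b, a - b, c + d, c - d)


def dft(x):
    """Direct DFT, used here only as a reference for checking the butterfly."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]
```

Because the same four inputs feed either one radix-4 butterfly or two radix-2 butterflies, one operator can serve both modes with four results per pass.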

3.3. Multiplier module


Figure 3.4 The proposed multiplier module

Fig. 3.4 shows the proposed multiplier module. It consists of three complex multipliers and several multiplexers. As discussed in Section 2.2.1, it has no switching activity at stage 010, as shown in Fig. 3.5. Fig. 3.5(a), (b), (c), and (d) show the execution state for data sequences 1, 2, 3, and 4 at stage 010, respectively.

Figure 3.5 The behavior of multiplier module at stage 010

3.4. Phase Compensator

Figure 3.6 Phase compensator

Fig. 3.6 shows the architecture of the phase compensator. The purpose is to recover the phase of the outputs from the multiplier module, since the modified coefficients are fed into the multiplier module. From eq. 2.3, we assume the modified coefficient $W_N^{C'}$ is fed into the multiplier module. We derive the output of the phase compensator as follows:

Assume

$$\mathrm{input} = X \cdot W_N^{C'}$$

$$
\begin{aligned}
\mathrm{out}_1 &= \mathrm{input} \cdot 1 = X \cdot W_N^{C'} \\
\mathrm{out}_2 &= \mathrm{input} \cdot j = X \cdot W_N^{C' + 3N/4} \\
\mathrm{out}_3 &= \mathrm{input} \cdot (-1) = X \cdot W_N^{C' + N/2} \\
\mathrm{out}_4 &= \mathrm{input} \cdot (-j) = X \cdot W_N^{C' + N/4}
\end{aligned}
$$

$$
\mathrm{output} = X \cdot \{\, W_N^{C'},\; W_N^{C' + N/4},\; W_N^{C' + N/2},\; W_N^{C' + 3N/4} \,\} = X \cdot W_N^{C}
$$
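The identities behind the phase compensator can be checked numerically. The sketch below assumes the usual twiddle convention W_N^k = exp(-j*2*pi*k/N); the function names and the sample values of N, C', and X are hypothetical.

```python
import cmath


def twiddle(n, k):
    """W_N^k under the convention W_N = exp(-j*2*pi/N)."""
    return cmath.exp(-2j * cmath.pi * k / n)


def phase_compensate(value):
    """Return value times 1, -j, -1, +j, i.e. exponent offsets 0, N/4, N/2, 3N/4."""
    return (value, -1j * value, -value, 1j * value)


n, c_mod, x = 64, 5, 0.3 - 0.7j           # illustrative values only
rotated = phase_compensate(x * twiddle(n, c_mod))
for q, out in enumerate(rotated):
    expected = x * twiddle(n, c_mod + q * n // 4)
    print(q, abs(out - expected) < 1e-12)
```

Each of the four rotations only swaps or negates real and imaginary parts, so the compensator needs no multipliers of its own.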

3.5. Memory Address Assignment

We adopt the in-place memory addressing scheme for the radix-4 FFT algorithm [8]. To support concurrent read and write operations, the memory is partitioned into four banks. Table 3.2 shows the address assignment for a 16-point FFT.

Table 3.2 Address assignment for a 16-point FFT

According to this table, for every butterfly computation of the 16-point FFT, the four inputs can be read from different banks and the four outputs can be written to different banks.

SEL selects the memory bank. In our design, it is implemented with several adders, as shown in eq. 3.2.

(3.2)

This memory assignment strategy can be extended to longer FFTs. Table 3.3 shows the address assignment for a 64-point FFT. Table 3.2 appears as a sub-block of Table 3.3, which means that the memory assignment strategy is suitable for a reconfigurable design.

Table 3.3 Address assignment for a 64-point FFT
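A conflict-free bank selection of this kind can be sketched as follows. Eq. 3.2 is not reproduced in this text, so the sketch assumes a commonly used rule for in-place radix-4 addressing, in which the bank is the sum of the base-4 digits of the address modulo 4. The function names are hypothetical and the resulting assignment may differ from Tables 3.2 and 3.3.

```python
def bank_of(address, digits):
    """Assumed SEL rule: sum of the base-4 digits of the address, modulo 4."""
    total = 0
    for _ in range(digits):
        total += address & 3   # lowest base-4 digit
        address >>= 2
    return total % 4


def butterfly_addresses(stage, group):
    """Four addresses of one radix-4 butterfly.

    The base-4 digit at position `stage` runs over 0..3; the other digits
    are taken from `group`.
    """
    low = group & ((1 << (2 * stage)) - 1)
    high = group >> (2 * stage)
    base = (high << (2 * (stage + 1))) | low
    return [base | (d << (2 * stage)) for d in range(4)]


# 16-point FFT: 2 base-4 digits, 2 stages, 4 butterflies per stage
for stage in range(2):
    for group in range(4):
        addrs = butterfly_addresses(stage, group)
        print(stage, addrs, [bank_of(a, 2) for a in addrs])
```

Under this rule the four operands of a butterfly differ in exactly one base-4 digit, so their digit sums are distinct modulo 4 and each operand lands in its own bank. Addresses 0 to 15 also receive the same bank in the 16-point and 64-point cases, which matches the sub-block property described above.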

Chapter 4. Simulation and Performance Analysis

In this chapter we discuss simulation and verification against an ideal model built in MATLAB, which provides a complete mathematical and simulation environment. The design flow is illustrated in Fig. 4.1. It is a waterfall-style flow, which works well for designs of up to about 100k gates.

After the RTL code is developed, we compare the ideal model with the RTL model to check that they have the same function. After functional verification there are two implementation paths: synthesis for ASIC and FPGA prototyping. FPGA prototyping is commonly used for verification because it exercises the design on real hardware. After FPGA prototyping, we synthesize the RTL design to a gate-level netlist under reasonable design constraints. The synthesis report gives the timing and area. We also run gate-level simulation to double-check the function, and then run PrimePower on the waveform from the gate-level simulation to analyze power consumption.

Figure 4.1 Design and verification flow

4.1. Simulation

Figure 4.2 Simulation environment

Fig. 4.2 shows the RTL simulation environment, which determines the signal to quantization noise ratio (SNQR) between the ideal FFT and a fixed-point FFT model. The ideal FFT is built in MATLAB and the practical FFT is our RTL design. The input data are 100 random patterns with a 16-bit word length for each FFT size. The SNQR is the ratio of the power of the ideal FFT output to the power of the error between the ideal and fixed-point outputs, expressed in dB.
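A measurement of this kind can be sketched as follows. The sketch assumes the common definition of the ratio, 10*log10 of signal power over error power, and a 16-bit format with 15 fractional bits; the thesis's exact formula and number format are not reproduced in this text, and the function names are hypothetical.

```python
import math


def sqnr_db(ideal, actual):
    """10*log10(sum |ideal|^2 / sum |ideal - actual|^2), in dB."""
    signal = sum(abs(x) ** 2 for x in ideal)
    noise = sum(abs(x - y) ** 2 for x, y in zip(ideal, actual))
    if noise == 0:
        return math.inf
    return 10 * math.log10(signal / noise)


def quantize(value, frac_bits=15):
    """Round real and imaginary parts to a grid of 2**-frac_bits (assumed format)."""
    value = complex(value)
    scale = 1 << frac_bits
    return complex(round(value.real * scale) / scale,
                   round(value.imag * scale) / scale)


ideal = [0.3 + 0.4j, -0.123 + 0.777j, 0.05 - 0.6j]
fixed = [quantize(v) for v in ideal]
print(f"{sqnr_db(ideal, fixed):.1f} dB")
```

In the actual flow the `ideal` list would hold the MATLAB FFT outputs and the `fixed` list the RTL outputs for the same input pattern.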

Table 4.1 shows the mean square error for each FFT size, and Fig. 4.3 plots the data in Table 4.1. The mean square error increases as the FFT size increases. The SNQR leads to the same conclusion, as shown in Fig. 4.4.

Table 4.1 Mean square error for each size of FFT

Figure 4.3 Mean square error versus different size of FFT

Figure 4.4 SNQR of each size of FFT (x-axis: FFT size, 64 to 8192; y-axis: SNQR in dB)

4.2. FPGA prototyping

Figure 4.5 XILINX VIRTEX-4 FPGA

Fig. 4.5 shows the FPGA used. It is convenient for verifying the design because it connects to a computer through USB. Test patterns are fed to the FPGA from the computer, and the computation results are returned to the computer for display. Fig. 4.6 shows the waveform of the design in the FPGA. The first four signals are inputs generated in MATLAB, and the rest are outputs delivered from the FPGA to the computer.

Figure 4.6 Waveform of the proposed design in FPGA

Figure 4.7 Verification flow of FPGA

Fig. 4.7 shows the FPGA verification flow. A sine wave sampled at N points is fed to the FPGA, and the output file is then exported from the FPGA to MATLAB. MATLAB converts the output file to a waveform so that it can be verified easily. The following figures show the verification for each FFT size according to Fig. 4.7.

We feed a sine wave into the imaginary part of the input and observe the output from the FPGA. For long FFTs such as 4096 and 8192 points, we decrease the magnitude of the sine wave to avoid overflow, so those figures show slight distortion. Fig. 4.16 shows the synthesis report of the proposed architecture from Xilinx ISE.
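The expected result of this test can be worked out with a floating-point reference. The sketch below assumes a single sine cycle over the N samples and an unscaled DFT; the number of cycles, the amplitude, and any scaling applied by the hardware are not stated in the text, and the function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct DFT, used as a floating-point reference."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n))
            for m in range(n)]


def sine_test_vector(n, cycles=1, amplitude=1.0):
    """N samples of a sine wave in the imaginary part; the real part is zero."""
    return [complex(0.0, amplitude * math.sin(2 * math.pi * cycles * k / n))
            for k in range(n)]


n = 64
spectrum = dft(sine_test_vector(n))
peaks = [(m, round(v.real, 6)) for m, v in enumerate(spectrum) if abs(v) > 1e-6]
print(peaks)
```

With the sine in the imaginary part, the ideal spectrum is purely real: a value of N/2 at bin 1 and -N/2 at bin N-1. This agrees with the verification figures, where the real part of the output carries the spikes and the imaginary part stays near zero.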

Figures 4.8 to 4.15 each contain three panels: the input signal, the real part of the output, and the imaginary part of the output.

Figure 4.8 Verification of 64-point FFT

Figure 4.9 Verification of 128-point FFT

Figure 4.10 Verification of 256-point FFT

Figure 4.11 Verification of 512-point FFT

Figure 4.12 Verification of 1024-point FFT

Figure 4.13 Verification of 2048-point FFT

Figure 4.14 Verification of 4096-point FFT

Figure 4.15 Verification of 8192-point FFT

Figure 4.16 Synthesis report of the proposed architecture from Xilinx ISE

4.3. Synthesis Reports and Power Analysis

In this section, we discuss the implementation of the proposed FFT design. The design is synthesized with Synopsys Design Compiler using the UMC 0.18 um CMOS standard cell library. The gate count of the proposed architecture is shown in Table 4.2.

Table 4.2 Synthesis report of proposed architecture

| Module | Gate count | Size |
|---|---|---|
| Proposed FFT (not including memory) | 53306 | x |
| RAM | 4 × 483248 | 4 × 32 × 2048 |
| Coefficient ROM1 | 6201 | 32 × 1024 |
| Coefficient ROM2 | 10417 | 32 × 2048 |
| Coefficient ROM3 | 6502 | 32 × 1024 |

Table 4.3 shows the power consumption for each FFT size. The power report is obtained by feeding the waveform from gate-level simulation to PrimePower. The measurement window covers the time to write data into memory, the computation time, and the time to read data from memory to the output. According to Table 2.8, the reduction in switching activity becomes smaller as the FFT size increases. Table 4.3 confirms the conclusion of Section 2.3. Fig. 4.17 plots the data in Table 4.3.

Table 4.3 Power consumption for each size of FFT

Figure 4.17 Power consumption versus each size of FFT

4.4. Comparisons

The following tables show the comparisons of gate count, power consumption, and latency.

Table 4.4 Comparisons of gate count and power consumption

Table 4.5 Comparisons of latency

Chapter 5. Conclusions and Future works

In this thesis, we propose a low power reconfigurable FFT/IFFT processor. The proposed memory-based FFT processor can be configured for sizes from 64 points to 8192 points. A low power design with minimum switching activity is proposed. Section 4.3 shows that it is most effective for short FFTs. The maximum power consumption is 75.82 at a 1.8 V power supply. The gate count of the proposed architecture without memory is 53306 with Synopsys Design Compiler and the UMC 0.18 um library.

A low power reconfigurable FFT is presented in this thesis. Minimizing switching activity is an effective way to reduce dynamic power consumption. However, this technique restricts the flexibility of the FFT processor. In the future, we can try to apply this technique on a digital signal processor.

Bibliography

[1] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, 2nd Edition, New Jersey: Prentice-Hall, 1999.

[2] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comp., vol. 19, pp. 297-301, April 1965.

[3] L. R. Rabiner and B. Gold, Theory and Application of Digital Signal Processing, Prentice-Hall, 1975.

[4] E. H. Wold and A. M. Despain, "Pipeline and Parallel-Pipeline FFT Processors for VLSI Implementation," IEEE Trans. Comput., vol. 33, no. 5, pp. 414-426, May 1984.

[5] Jen-Ming Wu and Yang-Chun Fan, "Coefficient Ordering Based Pipelined FFT/IFFT with Minimum Switching Activity for Low Power OFDM Communication," IEEE Int'l Symposium on Consumer Electronics, St. Petersburg, Russia, 2006.

[6] "A low-power VLSI architecture for a shared-memory FFT processor with a mixed-radix algorithm and a simple memory control scheme," Proceedings of the 2006 IEEE International Symposium on Circuits and Systems (ISCAS 2006), 2006.

[7] M. Hasan, T. Arslan, and J. S. Thompson, "A Novel Coefficient Ordering based Low Power Pipelined Radix-4 FFT Processor for Wireless LAN Applications," IEEE Transactions on Consumer Electronics, vol. 49, no. 1, Feb. 2003.

[8] L. G. Johnson, "Conflict free memory addressing for dedicated FFT hardware," IEEE Trans. Circuits Syst. II, vol. 39, pp. 312-316, May 1992.

[9] Xiaojin Li, Zongsheng Lai, and Jianmin Cui, "A Low Power and Small Area FFT Processor for OFDM Demodulator," IEEE Transactions on Consumer Electronics, vol. 53, no. 2, May 2007.

[10] Chin-Long Wey, Wei-Chien Tang, and Shin-Yo Lin, "Efficient Memory-Based FFT Architectures for Digital Video Broadcasting (DVB-T/H)," VLSI Design, Automation and Test (VLSI-DAT 2007), 2007.

[11] Guihua Liu and Quanyuan Feng, "ASIC Design of Low-power Reconfigurable FFT Processor," ASIC (ASICON '07), 2007.

Vita

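Name: 黃謙若

Gender: Male

Place of birth: Taipei City

Date of birth: August 20, 1984 (ROC year 73)

Address: 3F, No. 99, Fuxing 4th Rd., Beitou Dist., Taipei City

Education: Master's program, Institute of Electronics, National Chiao Tung University, 2006/09~2008/06

Department of Electrical Engineering, National Chung Hsing University, 2002/09~2006/06

The Affiliated Senior High School of National Taiwan Normal University, 1999/09~2002/06

Thesis title: A Low Power Reconfigurable FFT Processor with Minimum Switching Activity (可重組態之低功率快速傅立葉轉換處理器)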
