Experimental Results - A High-Performance and Low-Cost Parameterizable Fast Fourier Transform Hardware Generator

Chapter 4 Experiments

4.2 Experimental Results

Figure 41 shows the relation between throughput and area for N=256, where area is measured in gate counts. For Pease, three architectures are generated; from left to right, the parameters are j=1, j=2, and j=4, respectively. For all architectures, we assume k=1. For R2MDC, three architectures are generated; from left to right, the parameters are [...] same as the area of R2MDC under the same throughput. From Table 3, we can see that the hardware requirement is also the same under the same throughput. Figure 42 shows the relation between throughput and area for N=1024.

[Figure 41 plot: FFT Length N=256; axis: Area (gate counts); legend: Pease, R2MDC]

[Plot: FFT Length N=1024; axis: Area (gate counts); legend: Pease, R2MDC]

Figure 42 Relation between throughput and area for Pease and R2MDC, N=1024

Figure 43 shows the relation between throughput and area for N=256. For Pease, five architectures are generated; from left to right, the parameters are j=8, j=16, …, and j=128, respectively. For R2²MDC, six architectures are generated; from left to right, the parameters are t=1, t=2, …, and t=32, respectively. The area of Pease is considerably larger than that of the R2²MDC vertical expansion architectures under the same throughput, because Pease uses a large number of multipliers. This can also be seen in Table 3.

Figure 44 shows the relation between throughput and area for N=1024. For Pease, five architectures are generated; from left to right, the parameters are j=8, j=16, …, and j=128, respectively. For R2²MDC, five architectures are generated; from left to right, the parameters are t=1, t=2, …, and t=16, respectively. The area of Pease is still considerably larger than that of the R2²MDC vertical expansion architectures under the same throughput.
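The throughput values printed in Table 4 can be reproduced by a simple cycle-count reading, sketched below in Python. Everything in it is an assumption rather than something stated in this section: that throughput means transforms per clock cycle, that a Pease design with j parallel butterflies needs (N/2)·log2(N)/j cycles per transform, that an MDC design with parameter t accepts 2t samples per cycle, and that the four table rows per length correspond to j=8, 16, 32, 64 and t=1, 2, 4, 8. The function names are hypothetical.

```python
def log2_exact(n: int) -> int:
    """Return log2(n) for a power-of-two n."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    return n.bit_length() - 1


def pease_throughput(n: int, j: int) -> float:
    """Transforms per cycle, ASSUMING j butterflies execute per cycle."""
    total_butterflies = (n // 2) * log2_exact(n)
    return j / total_butterflies


def mdc_throughput(n: int, t: int) -> float:
    """Transforms per cycle, ASSUMING 2*t samples enter the datapath per cycle."""
    return 2 * t / n


# Throughput values as printed in Table 4, keyed by (N, assumed parameter).
TABLE_4_PEASE = {
    (256, 8): 0.0078, (256, 16): 0.0156, (256, 32): 0.0313, (256, 64): 0.0625,
    (1024, 8): 0.0016, (1024, 16): 0.0031, (1024, 32): 0.0063, (1024, 64): 0.0125,
}
TABLE_4_MDC = {
    (256, 1): 0.0078, (256, 2): 0.0156, (256, 4): 0.0313, (256, 8): 0.0625,
    (1024, 1): 0.002, (1024, 2): 0.0039, (1024, 4): 0.0078, (1024, 8): 0.0156,
}


def max_model_error() -> float:
    """Largest absolute gap between the assumed models and the printed values."""
    errors = [abs(pease_throughput(n, j) - v) for (n, j), v in TABLE_4_PEASE.items()]
    errors += [abs(mdc_throughput(n, t) - v) for (n, t), v in TABLE_4_MDC.items()]
    return max(errors)


if __name__ == "__main__":
    print(f"largest deviation from Table 4: {max_model_error():.6f}")
```

Under these assumptions every modelled value agrees with the printed one to within 0.0001. That is consistent with four-decimal rounding, but it does not confirm that the thesis defines throughput this way.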

[Plot: FFT Length N=256; axis: Area (gate counts); legend: Pease, R2²MDC]

Figure 43 Relation between throughput and area for Pease and R2²MDC, N=256

[Plot: FFT Length N=1024; axes: Throughput, Area (gate counts); legend: Pease, R2²MDC]

Figure 44 Relation between throughput and area for Pease and R2²MDC, N=1024

Figure 45 shows the joint result of Figure 41 and Figure 43, and Figure 46 shows the joint result of Figure 42 and Figure 44.

[Plot: FFT Length N=256; axis: Area (gate counts); legend: Pease, R2MDC/R2²MDC]

Figure 45 Relation between throughput and area for Pease and R2MDC/R2²MDC, N=256

[Plot: FFT Length N=1024; axis: Area (gate counts); legend: Pease, R2MDC/R2²MDC]

Figure 46 Relation between throughput and area for Pease and R2MDC/R2²MDC, N=1024

Compared with the Pease architecture, for FFT lengths of 256 and 1024, the generated FFT processor saves about 30.8% of the area under throughput constraints, as shown in Table 4.

Table 4 Area comparison (area in gate counts)

FFT Length (N)   Pease                      R2²MDC                     Area Reduction
                 Throughput   Area          Throughput   Area          Percentage (%)
256              0.0078       190524        0.0078       128033        32.8
256              0.0156       307040        0.0156       202469        34.06
256              0.0313       533357        0.0313       350469        34.29
256              0.0625       1044244       0.0625       641511        38.57
1024             0.0016       434154        0.002        313669        27.75
1024             0.0031       565576        0.0039       417760        26.14
1024             0.0063       825269        0.0078       623772        24.42
1024             0.0125       1314636       0.0156       1029338       21.70
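The reduction percentages in Table 4 follow from the two area columns as (Pease area - R2²MDC area) / Pease area. The short Python sketch below recomputes them from the tabulated gate counts. The function and variable names are hypothetical, and the unweighted mean it prints is only one possible way to aggregate the rows; this section does not say how the overall figure was obtained.

```python
# Rows of Table 4:
# (N, Pease throughput, Pease area, R2^2MDC throughput, R2^2MDC area)
TABLE_4 = [
    (256, 0.0078, 190524, 0.0078, 128033),
    (256, 0.0156, 307040, 0.0156, 202469),
    (256, 0.0313, 533357, 0.0313, 350469),
    (256, 0.0625, 1044244, 0.0625, 641511),
    (1024, 0.0016, 434154, 0.002, 313669),
    (1024, 0.0031, 565576, 0.0039, 417760),
    (1024, 0.0063, 825269, 0.0078, 623772),
    (1024, 0.0125, 1314636, 0.0156, 1029338),
]


def area_reduction_percent(pease_area: int, r22mdc_area: int) -> float:
    """Area saved by R2^2MDC relative to Pease, in percent."""
    return 100.0 * (pease_area - r22mdc_area) / pease_area


reductions = [area_reduction_percent(p, r) for (_, _, p, _, r) in TABLE_4]
mean_reduction = sum(reductions) / len(reductions)

if __name__ == "__main__":
    for (n, tp, _, _, _), red in zip(TABLE_4, reductions):
        print(f"N={n:5d}  Pease throughput={tp:.4f}  reduction={red:.2f}%")
    print(f"unweighted mean over all rows: {mean_reduction:.2f}%")
```

The per-row results match the last column of Table 4 to two decimals. The unweighted mean over the eight rows comes out near 30%, close to the overall saving quoted in the text.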

Chapter 5 Conclusions and Future Work

The FFT processor is an important computing block in communication and signal processing systems. To improve productivity and shorten time-to-market, an automatic FFT generator can be used to design a specified FFT processor. In this thesis, we propose a parameterizable FFT generator with two approaches for making a good design trade-off between throughput and area under the design constraints. First, the vertical expansion approach parallelizes the datapath to increase the throughput. Second, the horizontal compression approach folds the datapath to reduce the hardware usage. In addition, to reduce the computation time of the proposed FFT generator, only the best FFT architecture is generated under the user-specified throughput constraint. Compared with the Pease architecture, for FFT lengths of 256 and 1024, the generated FFT processor saves about 30.8% of the area under throughput constraints.
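The idea of keeping only the best architecture under a throughput constraint can be illustrated with a small Python sketch. It is not the generator's actual search procedure, which this chapter does not describe. It assumes that "best" means the smallest area among candidates that meet the required throughput, and it simply filters a list of candidates built from the N=256 rows of Table 4. The names Candidate and select_min_area are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    architecture: str
    throughput: float
    area: int  # gate counts


def select_min_area(
    candidates: Sequence[Candidate], min_throughput: float
) -> Optional[Candidate]:
    """Return the smallest-area candidate meeting the throughput constraint.

    Returns None when no candidate is fast enough.
    """
    feasible = [c for c in candidates if c.throughput >= min_throughput]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.area)


# Candidate points taken from the N=256 rows of Table 4.
CANDIDATES_N256 = [
    Candidate("Pease", 0.0078, 190524),
    Candidate("Pease", 0.0156, 307040),
    Candidate("Pease", 0.0313, 533357),
    Candidate("Pease", 0.0625, 1044244),
    Candidate("R2^2MDC", 0.0078, 128033),
    Candidate("R2^2MDC", 0.0156, 202469),
    Candidate("R2^2MDC", 0.0313, 350469),
    Candidate("R2^2MDC", 0.0625, 641511),
]

if __name__ == "__main__":
    best = select_min_area(CANDIDATES_N256, min_throughput=0.0156)
    print(best)
```

With a required throughput of 0.0156, the sketch picks the R2²MDC point with 202469 gates, since it is the smallest design among those that are fast enough.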

Various FFT architectures have been proposed in the literature, and they can be incorporated into our proposed FFT generator. In the future, more FFT algorithms, such as the R2³MDC FFT algorithm and the mixed-radix FFT algorithm [17], will be considered to enlarge the search space. In addition, the bit-width optimization techniques proposed in [18] will also be considered.

References

[1] J. W. Cooley and J. W. Tukey, “An Algorithm for Machine Computation of Complex Fourier Series,” Math. Computation, vol. 19, pp. 297-301, April 1965.

[2] L. R. Rabiner and B. Gold, Theory and Application of Digital Signal Processing. Prentice-Hall, Inc., 1975.

[3] E. H. Wold and A. M. Despain, “Pipeline and Parallel-Pipeline FFT Processors for VLSI Implementation,” IEEE Trans. Computers, vol. 33, no. 5, pp. 414-426, May 1984.

[4] A. M. Despain, “Fourier Transform Computer using CORDIC Iterations,” IEEE Trans. Comput., vol. C-23, no. 10, pp. 993-1001, Oct. 1974.

[5] S. He and M. Torkelson, “A New Approach to Pipeline FFT Processor,” in Proc. 10th Int’l Parallel Processing Symp. (IPPS ’96), pp. 766-770, 1996.

[6] R. Storn, “Radix-2 FFT-pipeline Architecture with Reduced Noise-to-signal Ratio,” IEE Proceedings - Vision, Image and Signal Processing, vol. 141, pp. 81-86, 1994.

[7] S. He and M. Torkelson, “Designing Pipeline FFT Processor for OFDM (de)Modulation,” International Symposium on Signals, Systems, and Electronics, pp. 257-262, Oct. 1998.

[8] P. Duhamel and H. Hollmann, “Split Radix FFT Algorithm,” Electronics Letters, vol. 20, pp. 14-16, January 1984.

[9] P. Duhamel and H. Hollmann, “Split Radix FFT Algorithm,” Electronics Letters, vol. 20, pp. 14-16, Jan. 5, 1984.

[11] G. Nordin, P. A. Milder, J. C. Hoe, and M. Püschel, “Automatic Generation of Customized Discrete Fourier Transform IPs,” In Proc. of ACM/IEEE Design Automation Conf., pp. 471-474, 2005.

[12] P. A. Milder, M. Ahmad, J. C. Hoe, and M. Püschel, “Fast and Accurate Resource Estimation of Automatically Generated Custom DFT IP Cores,” in Proc. of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pp. 211-220, 2006.

[13] P. A. Milder, F. Franchetti, J. C. Hoe, and M. Püschel, “Formal Datapath Representation and Manipulation for Implementing DSP Transforms,” in Proc. of ACM/IEEE Design Automation Conf., pp. 385-390, 2008.

[14] J. Takala, T. Jarvinen, P. Salmela, and D. Akopian, “Multi-port Interconnection Networks for Radix-r Algorithms,” in Proc. IEEE International Conference Acoustics, Speech, Signal Processing, pp. 1177-1180, 2001.

[15] Synopsys DesignWare [Online], Available: http://www.synopsys.com .

[16] Matlab [Online], Available: http://www.mathworks.com .

[17] R. C. Singleton, “An Algorithm for Computing the Mixed Radix Fast Fourier Transform,” IEEE Trans. Audio Electroacoust., vol. 17, no. 2, pp. 93-103, June 1969.

[18] C. Y. Wang, C. B. Kuo, and J. Y. Jou, “Hybrid Word-Length Optimization Methods of Pipelined FFT Processors,” IEEE Trans. Computers, vol. 56, no. 8, pp. 1105-1118, Aug. 2007.

[19] P.D. Welch, “A Fixed-Point Fast Fourier Transform Error Analysis,” IEEE Trans. Audio Electroacoustics, vol. 17, pp. 151-157, June 1969.

[20] A. Pomerleau, H.L. Buijs, and M. Fournier, “A Two-Pass Fixed Point Fast Fourier Transform Error Analysis,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 25, pp. 582-585, Dec. 1977.
