Chapter 5 Conclusion
5.2 Future Research Directions
The presented GDA design approach involves cyclic convolution, its partitioning scheme, and a hardware-efficient GDA implementation. Since linear convolution and correlation share similar characteristics with cyclic convolution, the proposed GDA design approach can be applied to any DSP algorithm that can be expressed as a cyclic convolution, giving a low hardware cost for the corresponding applications.
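The relation between these three operations can be illustrated with a minimal sketch. It is not the GDA hardware formulation; it only shows, under the usual textbook definitions, that linear convolution reduces to cyclic convolution by zero-padding and that cyclic correlation reduces to cyclic convolution by index reversal. The function names are chosen here for illustration.

```python
def cyclic_convolution(x, h):
    """N-point cyclic convolution: y[n] = sum_k x[k] * h[(n - k) mod N]."""
    n = len(x)
    if len(h) != n:
        raise ValueError("both sequences must have the same length")
    return [sum(x[k] * h[(i - k) % n] for k in range(n)) for i in range(n)]


def linear_convolution(x, h):
    """Linear convolution computed as a zero-padded cyclic convolution."""
    size = len(x) + len(h) - 1
    xp = list(x) + [0] * (size - len(x))
    hp = list(h) + [0] * (size - len(h))
    return cyclic_convolution(xp, hp)


def cyclic_correlation(x, h):
    """Cyclic correlation r[n] = sum_k x[k] * h[(n + k) mod N],
    computed as a cyclic convolution with x index-reversed."""
    n = len(x)
    x_reversed = [x[(-j) % n] for j in range(n)]
    return cyclic_convolution(x_reversed, h)


if __name__ == "__main__":
    print(linear_convolution([1, 2, 3], [4, 5]))  # [4, 13, 22, 15]
```

Any algorithm that can be brought into one of these forms can, in principle, reuse the same cyclic convolution core.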
From the power consumption point of view, given the address grouping used in the proposed GDA design, we will further explore how to choose the set of seed partial products for the groups in the memory module so that the transition activity on the memory bit-lines is minimized and power consumption is reduced. Since the optimal arrangement of these seed partial products depends on the characteristics of the image sequences as well as on the distribution of the input data, an optimal arrangement of seed partial products should exist for each kind of image sequence.
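One simple way to compare candidate arrangements is to count bit toggles between consecutive memory read-outs, which is a common first-order proxy for bit-line switching activity. The sketch below assumes this toggle-count model; the function names, memory contents, and address trace are hypothetical and are not taken from the GDA design.

```python
def bit_transitions(words):
    """Total number of bit toggles between consecutive words
    (sum of Hamming distances)."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


def readout_activity(memory, address_trace):
    """Toggle count on the read-out lines when `memory` is read
    in the order given by `address_trace`."""
    return bit_transitions([memory[addr] for addr in address_trace])


if __name__ == "__main__":
    # Two hypothetical arrangements of the same four stored words.
    arrangement_a = [0b0000, 0b1111, 0b0001, 0b1110]
    arrangement_b = [0b0000, 0b0001, 0b1111, 0b1110]
    trace = [0, 1, 2, 3]  # hypothetical address sequence
    print(readout_activity(arrangement_a, trace))  # 11
    print(readout_activity(arrangement_b, trace))  # 5
```

Because the address trace depends on the input data, the arrangement that minimizes this count will in general differ from one kind of image sequence to another.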
For the application of prime-length DCT, since the prime-length cyclic convolution DCT algorithm has less overhead in pre- and post-processing, the GDA-based variable-length DCT design should be an alternative hardware-efficient DA solution for the shape-adaptive discrete cosine transform (SA-DCT) in MPEG-4 codec applications. However, since the non-prime-length cyclic convolution DCT incurs more overhead, this part of the SA-DCT realization must be combined with an existing DA design or another efficient design.
Based on the derivation of DSSTs in cyclic convolution form, a unified GDA-based design of DSSTs is worth considering for hybrid systems that must meet both multimedia and communication requirements, such as portable devices. With a memory module shared across the GDA design, we can preload the corresponding partial products and configure the design with a different data flow for each of the DSSTs involved. In particular, given the acceptable overhead of the cyclic convolution algorithm, a unified DFT/IDFT design should be feasible for communication applications. However, the choice between a general-purpose design and a dedicated design has long been a trade-off between flexibility and hardware cost.
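The idea of one convolution core serving both DFT and IDFT can be sketched with Rader's well-known re-indexing for odd prime lengths. This is a software illustration only and may differ from the derivation used in this work; the function names are chosen here for illustration. The forward and inverse transforms share the same data flow and differ only in the preloaded kernel.

```python
import cmath


def cyclic_convolution(a, b):
    """y[p] = sum_q a[q] * b[(p - q) mod M] for two length-M sequences."""
    m = len(a)
    return [sum(a[q] * b[(p - q) % m] for q in range(m)) for p in range(m)]


def primitive_root(p):
    """Smallest generator of the multiplicative group modulo an odd prime p."""
    for g in range(2, p):
        if len({pow(g, e, p) for e in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found; p must be an odd prime")


def rader_kernel(n, inverse=False):
    """Kernel b[m] = W^(g^(-m) mod n), with W = exp(-2j*pi/n) for the DFT
    and W = exp(+2j*pi/n) for the (unscaled) IDFT."""
    sign = 1 if inverse else -1
    g = primitive_root(n)
    # g^(-m) mod n equals g^(n-1-m) mod n because g^(n-1) = 1 (mod n).
    return [cmath.exp(sign * 2j * cmath.pi * pow(g, n - 1 - m, n) / n)
            for m in range(n - 1)]


def rader_transform(x, kernel):
    """Prime-length transform through one (n-1)-point cyclic convolution."""
    n = len(x)
    g = primitive_root(n)
    a = [x[pow(g, q, n)] for q in range(n - 1)]  # permuted input
    conv = cyclic_convolution(a, kernel)
    out = [0] * n
    out[0] = sum(x)                               # zero-index term
    for p in range(n - 1):
        out[pow(g, n - 1 - p, n)] = x[0] + conv[p]  # permuted output
    return out


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    X = rader_transform(x, rader_kernel(5))
    x_back = [v / 5 for v in rader_transform(X, rader_kernel(5, inverse=True))]
    print([round(v.real, 6) for v in x_back])
```

Switching between the two transforms only requires loading a different kernel, which mirrors the preloading of partial products described above. The input and output permutations are the pre- and post-processing overhead.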
VITA
Hun-Chen Chen was born in Taiwan in 1961. He received the B.S. and M.S. degrees, both in electronics engineering, from National Taiwan Technology University and National Chiao-Tung University, Taiwan, in 1990 and 1998, respectively. He is currently pursuing the Ph.D. degree in low-cost bit-level DSP VLSI design and its applications to multimedia and communication systems. His research interests include VLSI digital signal processing and computer architecture.