Objectives - Design and Implementation of High-Performance Pipelined Fourier Transform Processors

1 Introduction

1.2 Objectives

The objective of this thesis is to propose highly efficient pipeline processors for DFT computation in different real-time applications. Four applications are considered: recursive DFT computation in the DTMF standard [12-15], FFT computation in multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) wireless LAN (WLAN) [22, 23], long-length FFT/IFFT computation in the digital video broadcasting-handheld (DVB-H) standard [27, 28], and FFT/IFFT/2-D DCT computation in next-generation mobile multimedia applications [5-7, 44]. The objectives of these four designs are described below:

1. Recursive DFT/IDFT Design: The Goertzel algorithm has been widely applied to compute the spectra of interest in dual-tone multi-frequency (DTMF) detection [11]-[16] for voice over packet (VoP) networks [17]-[19], in the discrete multitone equalizer of multicarrier modulation systems [20]-[31], and in speed detection. State-of-the-art applications demand high channel-density dual-tone detectors [17]-[19]. Some advanced DTMF detectors for high-density VoP network applications have been realized with a single embedded DSP processor [12]-[14], [17]-[19].

Although a DSP processor based design retains maximum flexibility, it may not be cost-effective. Moreover, it may lose the advantages of high throughput, low power, and small area offered by application-specific integrated circuit (ASIC) designs [45]. In [13], the DSP processor based DTMF detector needs a large amount of memory to decode only 24 channels: 800 words of data memory and 1000 words of program memory, with a 16-bit wordlength. It also has to operate at a relatively high frequency of 24 MHz. To optimize overall system performance and cost, much research [46]-[53] has concentrated on dedicated core design. In [15]-[17], recursive expressions for the DCT computation that are suitable for VLSI implementation are presented. It is worth noting that in [46]-[48] the recursive algorithms are used solely to design recursive DCT architectures rather than recursive DFT architectures. In the past two decades, several recursive DFT algorithms and architectures have been explored [49]-[53]. Compared with the conventional second-order recursive DFT/IDFT architecture, Van et al. [51] used resource-sharing and register-splitting schemes to remove two multipliers and to speed up the computation, respectively. Yang et al. [52] proposed two unified IIR filter structures to save hardware cost for the DFT computation. Nevertheless, neither Van et al. [51] nor Yang et al. [52] reduces the number of computation cycles. In [53], Fan et al. applied a previously proposed method to reduce the computation cycles, but the performance gain is limited. Moreover, [53] presents only the recursive DFT algorithm; the corresponding IDFT algorithm is not provided. A short description of the proposed algorithm has been presented in the associated conference papers [54, 55].
This thesis provides a detailed description of a high-performance, power-efficient VLSI algorithm and architecture for the DTMF application, which combine an input strength reduction scheme, Chebyshev polynomials, and a register-splitting scheme. The derived recursive algorithm and the devised architecture [54, 55] offer a low computation cycle count (i.e., high throughput) and power efficiency, at the expense of a slightly larger area than existing recursive DFT/IDFT structures.
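For orientation, the conventional second-order Goertzel recursion that this line of work starts from can be sketched as follows. This is the textbook form only, not the algorithm proposed in this thesis (it uses no input strength reduction, Chebyshev polynomials, or register splitting); the function names `goertzel_bin` and `direct_dft_bin` are illustrative assumptions.

```python
import cmath
import math


def goertzel_bin(samples, k):
    """Return the k-th DFT bin of `samples` using the second-order Goertzel recursion."""
    n = len(samples)
    omega = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(omega)      # the only multiplier inside the loop
    s_prev, s_prev2 = 0.0, 0.0         # s[m-1] and s[m-2]
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # One extra step with zero input gives s[N]; then apply the
    # feed-forward part y[N] = s[N] - exp(-j*omega) * s[N-1].
    s_last = coeff * s_prev - s_prev2
    return s_last - cmath.exp(-1j * omega) * s_prev


def direct_dft_bin(samples, k):
    """Reference: evaluate the DFT definition for a single bin."""
    n = len(samples)
    return sum(x * cmath.exp(-2j * math.pi * k * m / n)
               for m, x in enumerate(samples))
```

A tone detector evaluates only the few bins of interest rather than a full N-point transform, which is why a per-bin recursion is attractive. In this textbook form each bin takes N loop iterations plus one final step, which is the computation cycle count that the designs discussed above aim to reduce.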

2. MIMO-OFDM FFT Design: Future broadband wireless access systems, including wireless LANs (WLANs) and fourth-generation (4G) mobile radio systems, need much higher spectral efficiency and service quality than current standards provide [22, 23]. Multiple-input multiple-output (MIMO) wireless systems have been studied extensively in recent years because of their potential for raising system capacity [24-26]. The orthogonal frequency division multiplexing (OFDM) modulation scheme not only decreases the receiver complexity but also improves performance on highly dispersive channels.

An especially promising candidate for next-generation fixed and mobile wireless systems is the combination of MIMO technology with OFDM, called the MIMO-OFDM system. A MIMO-OFDM system with k antennas in the transmitter and the receiver comprises k OFDM baseband processors working in parallel, and thus requires k FFT processors, one for each antenna [24-26]. Because of the high throughput requirements of the FFT computation in the MIMO-OFDM system, three 4×4 MIMO-FFT architectures have been presented [25]: the parallel multi-path architecture, the serial multi-stream architecture, and the serial blockwise architecture, depicted in Fig. 1(a)-(c), respectively. The parallel multi-path architecture includes k FFT blocks for k antennas, as depicted in Fig. 1(a), so its area cost rises linearly with the number of antennas (i.e., k times the FFT block area). Conversely, the serial multi-stream and serial blockwise architectures each require only one FFT block to handle the concurrent computation of k antennas. The serial multi-stream architecture uses one lower-throughput FFT processor with a k-times larger buffer for intermediate computation, as depicted in Fig. 1(b). To process k channels, it must operate at a clock frequency higher than the data sampling frequency F_s; its operating frequency grows linearly with the number of antennas (i.e., k times F_s). In the serial blockwise architecture, the input data of the FFT block are provided in parallel from k embedded input buffers, as depicted in Fig. 1(c). By using one higher-throughput FFT processor, the serial blockwise processor can complete the k channel FFT computations concurrently.

Among these three architectures, only the serial blockwise architecture requires a single FFT block operating at the same clock frequency as the data sampling frequency F_s. Analysis has shown that a single operating frequency close or equal to the sampling frequency F_s is preferable for the FFT processor when power consumption is constrained by the application environment, such as mobile communications [26, 31, 38, 56, 57]. In terms of memory cost, the serial blockwise architecture needs one extra buffer of size N compared with the other architectures, which slightly increases the cost. However, this memory overhead becomes increasingly minor as the number of antennas in the MIMO-OFDM system grows. Consequently, the serial blockwise MIMO-FFT architecture uses a single FFT block to achieve the required throughput while minimizing power consumption for MIMO-OFDM WLAN applications.

[Figure 1: block diagrams of three 4-channel MIMO-FFT processors. In each panel, four A/D converters sample Channels 1-4 at frequency F_s and the processor outputs Z₁(k) to Z₄(k).]

(a) Parallel multi-path MIMO-FFT architecture: four FFT blocks, each operating at F_s, each with a buffer of size N.

(b) Serial multi-stream MIMO-FFT architecture: one FFT block operating at 4F_s between a MUX and a DeMUX, with a buffer of size 4N.

(c) Serial blockwise MIMO-FFT architecture: one FFT block operating at F_s between a MUX and a DeMUX, with five buffers of size N.

Fig. 1: MIMO-FFT architectures.
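The trade-off among the three architectures can be tabulated with a small sketch. The block counts, clock multiples, and buffer totals below are read from the description above and from the labels in Fig. 1; the names `MimoFftCost` and `mimo_fft_costs` and the example values k = 4, N = 64 are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MimoFftCost:
    fft_blocks: int       # number of FFT blocks
    clock_multiple: int   # FFT clock as a multiple of the sampling frequency F_s
    buffer_words: int     # total buffer size in samples


def mimo_fft_costs(k, n):
    """First-order resource figures for k antennas and an n-point FFT."""
    return {
        # k FFT blocks at F_s, one size-n buffer each
        "parallel multi-path": MimoFftCost(k, 1, k * n),
        # one FFT block at k*F_s, one buffer of size k*n
        "serial multi-stream": MimoFftCost(1, k, k * n),
        # one FFT block at F_s, k input buffers plus one extra size-n buffer
        "serial blockwise": MimoFftCost(1, 1, (k + 1) * n),
    }


def blockwise_memory_overhead(k):
    """Extra buffer of the serial blockwise design relative to the k*n baseline."""
    return 1.0 / k


if __name__ == "__main__":
    for name, cost in mimo_fft_costs(k=4, n=64).items():
        print(f"{name:20s} {cost}")
    for k in (2, 4, 8):
        print(k, blockwise_memory_overhead(k))
```

Under these assumptions the relative memory overhead of the serial blockwise design is 1/k, which is consistent with the statement that the extra buffer matters less as the number of antennas grows.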

3. Long-Length FFT Design: The FFT and IFFT are essential in digital signal processing (DSP) and communication systems. In practice, many applications require FFT/IFFT implementations that can perform long-length computations with low cost, low power consumption, and high throughput. Long-length FFT/IFFT processors have been widely applied in real-time applications such as DVB-H (Digital Video Broadcasting-Handheld) [27, 28], VDSL (Very-high-speed Digital Subscriber Line) [29], and audio measurement [30]. DVB-H is a digital broadcast standard that delivers high-data-rate audio/video content to handheld devices; it requires a 4096-point FFT computation (i.e., 4k mode) for flexible network design in single frequency networks (SFNs) [27, 28]. The VDSL transceiver and the audio analyzer also involve complicated FFT computations with a transform length of 4096 points [29, 30]. Because such long-length FFT computations are time-consuming, efficient FFT processors are necessary to meet real-time requirements. Furthermore, handheld devices, which include multimedia mobile phones with color displays, personal digital assistants (PDAs), and pocket PCs, must be small, lightweight, portable, and battery-powered.

4. Triple-Mode Reconfigurable FFT/IFFT/2-D DCT Design: Next-generation mobile multimedia devices, including mobile phones and personal digital assistants (PDAs), require high processing power for multimedia applications. These applications include video/audio codecs, speech recognition, and echo cancellers. Speech recognition in voice command applications requires speech extraction and autocorrelation coefficient computations [58]. The video codec is the most challenging element of a multimedia application, since it requires considerable processing power and bandwidth. Hence, a flexible, low-cost pipeline processor with a high processing rate is required to realize the necessary computation-intensive algorithms, such as the 256-point FFT/IFFT and the 8×8 2-D DCT [5]-[7]. Additionally, a major integration challenge is to design the digital baseband and the accompanying control logic.

The WiMAX baseband is built around orthogonal frequency division multiplexing (OFDM) technology, which requires high processing throughput. The fixed version of WiMAX, IEEE 802.16e [44], also needs a 256-point FFT computation. Many researchers have recently concentrated on designing an optimized reconfigurable DSP processor to achieve a high processing rate and low power consumption in next-generation mobile multimedia applications [5], [6]. Software-based architectures, such as the co-processor and dual-MAC designs, have been proposed by Chai et al. [5] and Kolagotla et al. [6], respectively. However, their high flexibility comes at the cost of a large chip size. Vorbach et al. have also presented hardware-based concepts such as the processing element (PE) array [7], which achieves a high processing rate with reasonable flexibility. However, the processing kernel suffers from a low utilization rate, with a large array memory and multiple MACs, leading to poor cost efficiency. Dedicated ASIC designs based on a fast computation algorithm provide high cost efficiency [8]-[10]. Tell et al. [8] presented an FFT/WALSH/1-D DCT processor for the multiple radio standards of upcoming fourth-generation wireless systems. However, these designs [8]-[10] support only 1-D DCT computation and have no 2-D DCT support, although the 2-D DCT is desirable for video compression in wireless communication applications. This study not only presents a single reconfigurable architecture for the 256-point FFT/IFFT modes and the 8×8 2-D DCT mode, but also achieves high cost efficiency in portable multimedia applications.
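To make the distinction between 1-D and 2-D DCT support concrete, the sketch below computes an 8×8 2-D DCT by the standard row-column decomposition, i.e., a 1-D DCT on every row followed by a 1-D DCT on every column. This is the textbook separable formulation with orthonormal DCT-II scaling, chosen here for illustration; it is not claimed to be the datapath of the proposed reconfigurable architecture, and the names `dct_1d` and `dct_2d` are hypothetical.

```python
import math


def dct_1d(vec):
    """Orthonormal 1-D DCT-II of a sequence."""
    n = len(vec)
    out = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        acc = sum(x * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                  for i, x in enumerate(vec))
        out.append(scale * acc)
    return out


def dct_2d(block):
    """2-D DCT-II of a square block via row-column decomposition."""
    rows = [dct_1d(row) for row in block]          # transform each row
    cols = [dct_1d(col) for col in zip(*rows)]     # transform each column
    return [list(row) for row in zip(*cols)]       # transpose back to [v][u]


if __name__ == "__main__":
    flat = [[1.0] * 8 for _ in range(8)]           # constant 8x8 block
    coeffs = dct_2d(flat)
    print(round(coeffs[0][0], 6))                  # all energy lands in the DC term
```

A processor that offers only a 1-D DCT leaves the transposition and the second pass to other hardware or software, which is one reason native 2-D DCT support matters for video compression.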
