The MIMO-FFT Architecture - Literature Review

2 Literature Review

2.4 The MIMO-FFT Architecture

The High Throughput Task Group, which established the IEEE 802.11n standard, is developing the next-generation wireless LAN (WLAN) on the basis of 802.11a/g, the current OFDM-based WLAN standards [23]. According to the IEEE 802.11n standard [23], 128-point and 64-point FFT/IFFT processors are used to support four throughput rates (R, 2R, 3R and 4R) within 3.6 or 4 µs. The transmitted signal bandwidths are 40 MHz and 20 MHz for the 128-point and 64-point FFT/IFFT processors, respectively. In this study, we focus our 64-point FFT/IFFT design on 2×2 and 4×4 MIMO-OFDM WLAN systems, which require the high throughput rates of 2R and 4R, respectively.
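The timing figures quoted above can be sanity-checked with a little arithmetic. The sketch below assumes that the complex sampling rate equals the channel bandwidth, and that the 3.6 µs and 4 µs durations correspond to the FFT window plus a 0.4 µs or 0.8 µs guard interval. Neither assumption is stated in the text, and the function name is purely illustrative.

```python
def fft_window_us(n_points: int, bandwidth_hz: float) -> float:
    """Time needed to clock n_points samples at one sample per Hz of
    bandwidth, in microseconds (assumes sampling rate == bandwidth)."""
    return n_points / bandwidth_hz * 1e6


if __name__ == "__main__":
    for n_points, bandwidth_hz in [(64, 20e6), (128, 40e6)]:
        window = fft_window_us(n_points, bandwidth_hz)
        for guard_us in (0.4, 0.8):  # assumed short and long guard intervals
            total = window + guard_us
            print(f"{n_points}-point at {bandwidth_hz / 1e6:.0f} MHz: "
                  f"{window:.1f} us window + {guard_us} us guard = {total:.1f} us")
```

Under these assumptions both configurations share the same FFT window length, which is why the same 3.6 µs and 4 µs budgets apply to the 64-point and the 128-point processors.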

Sansaloni et al. presented a detailed comparison of several 64-point FFT/IFFT algorithms for the MIMO-OFDM WLAN system [26]. According to that comparison, the multi-path delay commutator (MDC) based design, built on a serial blockwise architecture, is the most cost-efficient architecture for the MIMO-OFDM system. For a 4×4 MIMO-OFDM system, the radix-4 multi-path delay commutator (R4MDC) architecture achieves the lowest hardware requirement, with the operating frequency equal to the sampling frequency, while the radix-2 multi-path delay commutator (R2MDC) architecture is the most cost-efficient architecture for the 2×2 MIMO-OFDM system. However, the R4MDC-based and R2MDC-based 64-point FFT/IFFT designs both have higher complex multiplicative complexity than the radix-8, radix-2/4/8 and radix-2/8 based designs, as listed in Table 1.

The design with the highest complex multiplicative complexity has the highest power consumption [26, 31, 32, 36, 56, 57]. Maharatna et al. [41] recently presented a modified radix-8 multi-path delay commutator (R8MDC) based 64-point FFT/IFFT WLAN processor that reduces the hardware cost relative to the conventional R8MDC design, with a throughput rate of 5.33R. Although the modified R8MDC design achieves the same low complex multiplicative complexity as the radix-8 algorithm, its large memory and four constant multipliers still lead to a large chip cost.

Table 1: Number of complex multiplications needed to compute a 64-point FFT/IFFT.

| Algorithm | Complex multiplications | Constant multiplications |
|---|---|---|
| Radix-2 | 98 | N/A |
| Radix-2² | 76 | N/A |
| Radix-4 | 76 | N/A |
| Radix-2/4 | 72 | N/A |
| Radix-2/4/8 | 48 | 32 |
| Radix-8 | 48 | 32 |
| Radix-2/8 | 48 | 32 |
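The Radix-2 and Radix-4 entries of Table 1 can be cross-checked by counting twiddle-factor multiplications directly. The sketch below assumes that a "complex multiplication" in Table 1 means a twiddle multiplication other than the trivial factors ±1 and ±j, and it uses textbook radix-2 and radix-4 decompositions. The function names are illustrative and do not come from the cited works.

```python
def nontrivial_radix2(n: int) -> int:
    """Count twiddle multiplications other than +/-1 and +/-j in a
    radix-2 FFT of length n (n must be a power of 2, n >= 4)."""
    count = 0
    m = 2                                # butterfly span of the current stage
    while m <= n:
        groups = n // m                  # groups that reuse the same twiddle set
        for k in range(m // 2):
            exponent = k * (n // m)      # twiddle is W_n^exponent
            if exponent % (n // 4) != 0: # multiples of n/4 give +/-1 or +/-j
                count += groups
        m *= 2
    return count


def nontrivial_radix4(n: int) -> int:
    """Same count for a radix-4 FFT of length n (n must be a power of 4)."""
    count = 0
    size = n                             # length of the current sub-transform
    while size > 4:                      # the last radix-4 stage has no twiddles
        blocks = n // size
        quarter = size // 4
        for i in range(quarter):         # sample index inside one quarter
            for k in range(4):           # which of the four butterfly outputs
                if (i * k) % quarter != 0:   # twiddle is W_size^(i*k)
                    count += blocks
        size //= 4
    return count


if __name__ == "__main__":
    print("radix-2, N=64:", nontrivial_radix2(64))
    print("radix-4, N=64:", nontrivial_radix4(64))
```

Both counts agree with the corresponding rows of Table 1 for N = 64, which supports reading the table as a count of non-trivial multiplications only.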

Bouguezel et al. [59] reported a comprehensive analysis of data transfer, address generation, and twiddle factor evaluation or lookup-table access. Their comparison reveals that the radix-2/8 algorithm requires fewer arithmetic operations than other low-radix and mixed-radix algorithms. Additionally, Yeh et al. [32] indicate that the radix-2/8 algorithm is computationally superior to all other algorithms because it has the most trivial multiplications (i.e., ±1 and ±j). The radix-2/8 based architecture is therefore attractive for its few constant multipliers, high utilization, and low complex multiplicative complexity. Yeh et al. [32] apply the radix-2/8 algorithm in radix-2/8 single-path delay feedback (R28SDF) based 64-point FFT/IFFT processors. However, the single-path delay feedback (SDF) based architecture [32] has the lowest throughput rate, R. This investigation adopts a novel radix-2/8 algorithm, which differs from the conventional radix-2/8 algorithm [32, 59, 60], to further reduce the constant multiplier requirement in the proposed retrenched 8-point FFT (R8-FFT) unit. Lin et al. briefly described this algorithm as adopted in a SISO-OFDM application [57]. This work combines the novel radix-2/8 algorithm with the multiplier after write (MAW) scheme [57] to devise two architectures, radix-2/8 multiple-path delay feedback (R28MDF) and radix-2/8 multiple-path delay commutator (R28MDC), for the high-throughput-rate systems of 2R and 4R, respectively.

Chapter 3 The Low-Computation Cycle and Power-Efficient Recursive DFT/IDFT Design

In this chapter, we focus on a recursive DFT/IDFT design that is power-efficient and requires few computation cycles. We describe in detail a high-performance VLSI algorithm and architecture for the DTMF application, obtained by combining the input strength reduction scheme, Chebyshev polynomials, and the register-splitting scheme. The derived algorithm and devised architecture [23] possess the following features: low computation cycle count (i.e., high throughput) and power efficiency, at the expense of a slightly increased area overhead compared with existing recursive DFT/IDFT structures. This chapter is organized as follows. Section 3.1 presents the new recursive DFT/IDFT algorithm and architecture based on the hybrid of the input strength reduction, Chebyshev polynomial, and register-splitting schemes. Section 3.2 demonstrates the DTMF application of this new architecture; after bit-level SNR simulation, the 212/106-point DFT/IDFT chip was successfully implemented for the DTMF detector system. Section 3.3 tabulates the comparison results in terms of the number of computation cycles for each output and for the N-point DFT/IDFT, the maximum channel density, the clock period, and the number of real multipliers. Finally, Section 3.4 concludes this chapter.

3.1 New Recursive Algorithm and Architecture

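The DFT of the N-point input x[n] is defined as

$$y[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{kn}, \qquad k = 0, 1, \ldots, N-1,$$

where $W_N = e^{-j2\pi/N}$.

[Equation (47) is not recoverable from the source.]

(47) can be rewritten as

[equation not recoverable from the source]

It is known that Chebyshev polynomials are well defined as

[equations not recoverable from the source, including (50) and the recursive identity (51)]

Using the recursive identity stated in (51), equation (50) can be deduced as

[equation (53) not recoverable from the source]

The z-transform of (53) can be denoted as

[equation (54) not recoverable from the source]

[Equation (56) is not recoverable from the source.]

The z-transform of (56) can be denoted as

[equation (57) not recoverable from the source]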

Equations (54) and (57) can be easily mapped into the recursive DFT structures shown in Fig. 5(a) and (b), respectively. Compared with the conventional architectures [51, 52, 62], the proposed DFT algorithm and architecture reduce the number of computation cycles by 50%. In other words, as a result of the algorithm derivation, the throughput rate is doubled without increasing the operating frequency.
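The halving of the cycle count can be illustrated with a Goertzel-type second-order recursion. The sketch below assumes that the input strength reduction folds x[n] and x[n + N/2] into one sample using the factor (−1)^k, which follows from W_N^{k(n+N/2)} = (−1)^k W_N^{kn} for even N. The feedback coefficient 2cos θ_k comes from the three-term recurrence cos((r+1)θ) = 2cos θ · cos(rθ) − cos((r−1)θ). This is a plausible reading of Fig. 5, not a reproduction of the thesis equations, and the function names are illustrative.

```python
import cmath
import math


def direct_dft_bin(x, k):
    """Reference: k-th DFT bin computed straight from the definition."""
    n_points = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_points)
               for n in range(n_points))


def folded_recursive_dft_bin(x, k):
    """k-th DFT bin using a second-order recursion over only N/2 cycles."""
    n_points = len(x)
    if n_points % 2:
        raise ValueError("the folding step assumes an even length")
    half = n_points // 2
    theta = 2 * math.pi * k / n_points
    sign = -1 if k % 2 else 1            # (-1)^k
    coeff = 2 * math.cos(theta)          # feedback coefficient 2cos(theta_k)
    s1 = s2 = 0                          # state: s[n-1] and s[n-2]
    for n in range(half):                # N/2 recursion cycles instead of N
        folded = x[n] + sign * x[n + half]
        s0 = folded + coeff * s1 - s2
        s2, s1 = s1, s0
    # One output step; (-1)^k appears because exp(-j*theta*N/2) = (-1)^k.
    return sign * (cmath.exp(1j * theta) * s1 - s2)


if __name__ == "__main__":
    samples = [complex(n % 5, -(n % 3)) for n in range(16)]
    worst = max(abs(folded_recursive_dft_bin(samples, k) - direct_dft_bin(samples, k))
                for k in range(len(samples)))
    print("largest deviation from the direct DFT:", worst)
```

The loop body runs N/2 times per output bin rather than N, which is the source of the 50% cycle reduction in this sketch. The cost is one extra addition per cycle for the folding step.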

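[Figure 5: signal-flow diagram. Recoverable labels: (a) input r_k[n], coefficients 2cos θ_k, cos θ_k and −1, scaling (−1)^k, output y_DCT[k]; (b) input s_k[n], coefficients 2cos θ_k, sin θ_k and −1, scaling (−1)^k, output y_DST[k].]

Fig. 5: Block diagram of the low-computation-cycle (a) DCT part and (b) DST part of the DFT computation.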

For power efficiency, we adopt the register-splitting scheme [51], a type of retiming, to shorten the critical path. Retiming offers two main advantages [65]: higher speed and lower power. In this work, the technique is used to lower the power consumption, since the speed does not need to be increased. The resulting DCT part is depicted in the upper diagram of Fig. 6, where $\ll 1$ denotes a hardwired shifter performing a one-bit left shift. Similarly, the DST part can be modified as shown in the lower diagram of Fig. 6. To maintain the minimum clock period for the recursive DFT computation, a forward pipeline register is exploited for the final sum output.

Combining these two new parts yields a novel recursive DFT architecture that requires fewer computation cycles and is more power-efficient than the conventional DFT structures.


Fig. 6: Block diagram of the proposed low-computation-cycle and power-efficient recursive DFT architecture.

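The IDFT of the N-point input y[k] is defined as

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} y[k]\, W_N^{-kn}, \qquad n = 0, 1, \ldots, N-1.$$

With the input strength reduction scheme, the IDFT can be modified as

[equation (60) not recoverable from the source]

(60) can be rewritten as

[equation not recoverable from the source, including (63)]

Using the recursive identity stated in (51), equation (63) can be deduced as

[equation (64) not recoverable from the source]

The z-transform of (64) can be denoted as

[equation not recoverable from the source]

[Equation (67) is not recoverable from the source.]

The z-transform of (67) can be denoted as

[equation not recoverable from the source]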

For the resulting IDFT architecture, a 50% reduction in computation cycles is achieved in contrast with [50, 51, 62]. That is, the throughput rate is doubled at the same operating frequency.
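The origin of the halved cycle count on the IDFT side can be seen from a short worked identity. It assumes an even N and the usual convention $W_N = e^{-j2\pi/N}$. It is consistent with the (−1)^n and N factors that survive in Fig. 7, although the thesis's own equations may arrange the terms differently.

$$x[n] = \frac{1}{N}\sum_{k=0}^{N-1} y[k]\, W_N^{-kn} = \frac{1}{N}\sum_{k=0}^{N/2-1} \Bigl( y[k] + (-1)^n\, y[k + N/2] \Bigr) W_N^{-kn},$$

since $W_N^{-(k+N/2)n} = W_N^{-kn}\, e^{j\pi n} = (-1)^n\, W_N^{-kn}$. The folded sum has N/2 terms, so a recursion that accumulates one term per cycle needs half as many cycles per output sample.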

[Figure 7: signal-flow diagram. Recoverable labels: inputs r_n[k] and s_n[k], coefficients 2cos θ_n, cos θ_n, sin θ_n and −1, factors j, (−1)^n and N, output x[n].]

Fig. 7: Block diagram of the proposed low-computation-cycle and power-efficient recursive IDFT architecture.

Source document: Design and Implementation of a High-Performance Pipelined Fourier Transform Processor (pp. 44-53)