Finite Word-Length Analysis - Long-Length based Effective Pipeline FFT/IFFT Processor

5 Long-Length based Effective Pipeline FFT/IFFT Processor

5.3 Finite Word-Length Analysis

Handheld devices impose specific constraints: small dimensions, light weight, and battery-powered operation. The system performance must nevertheless satisfy the relevant specifications. Higher system performance implies a wider internal word-length, and therefore a larger chip cost and greater power consumption. Because chip cost and system performance trade off against each other, this study performed a finite word-length analysis to estimate the appropriate word-length for the R4²SDF and R4³SDF based 4096-point FFT/IFFT processors. In this work, the output signal-to-noise ratio (SNR) of the 4096-point FFT/IFFT processor is estimated under a 40 dB additive white Gaussian noise (AWGN) channel. In our fixed-point simulation environment, the input data were generated in double-precision floating point by passing the output of the ideal IFFT (FFT) model through the 40 dB AWGN channel model in Matlab. The noisy input data are sent into the proposed R4²SDF and R4³SDF pipeline FFT/IFFT architectures, which are modeled at different fixed-point levels for each function unit. The output SNR is obtained by comparing the original input data with the fixed-point model output.

The results, averaged over 100,000 iterations, are depicted in Fig. 29, where the x-axis and y-axis represent the internal word-length and the overall system output SNR, respectively. These results demonstrate that the output SNR saturates as the internal word-length increases. The proposed R4³SDF requires only a 13-bit internal word-length for each function unit to produce satisfactory performance in a 40 dB noise environment, satisfying the DVB-H specification [27, 28]. The proposed R4²SDF requires one more bit than R4³SDF, that is, a 14-bit internal word-length for each function unit. Consequently, the R4²SDF design has a larger chip cost than the R4³SDF design in the 4096-point FFT/IFFT computation.
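The measurement procedure described above can be illustrated with a small, self-contained Python sketch. It is not the Matlab model used in this work: it assumes a plain radix-2 FFT instead of the R4²SDF/R4³SDF pipelines, a 64-point transform, QPSK test symbols, 20 frames instead of 100,000 iterations, and a simplified fixed-point model that only rounds to a given number of fractional bits without saturation. All function names are hypothetical, and the bit counts it prints are not comparable to the 13-bit and 14-bit results reported here; the sketch only shows how the output SNR flattens near the channel SNR once the word-length is large enough.

```python
import cmath
import math
import random

def quantize(z, frac_bits):
    """Round both parts of z to frac_bits fractional bits (no saturation modeled)."""
    scale = 1 << frac_bits
    return complex(round(z.real * scale) / scale, round(z.imag * scale) / scale)

def fft(x, frac_bits=None):
    """Recursive radix-2 DIT FFT. When frac_bits is given, twiddle factors and
    butterfly outputs are rounded to that many fractional bits."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], frac_bits)
    odd = fft(x[1::2], frac_bits)
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)
        if frac_bits is not None:
            w = quantize(w, frac_bits)
        t = w * odd[k]
        a, b = even[k] + t, even[k] - t
        if frac_bits is not None:
            a, b = quantize(a, frac_bits), quantize(b, frac_bits)
        out[k], out[k + n // 2] = a, b
    return out

def ifft(X):
    """Ideal floating-point IFFT using the conjugate identity."""
    n = len(X)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in X])]

def make_frames(n_points, n_frames, channel_snr_db, rng):
    """Build (original symbols, noisy time-domain samples) pairs:
    QPSK symbols -> ideal IFFT -> AWGN at the requested SNR."""
    frames = []
    for _ in range(n_frames):
        X = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2)
             for _ in range(n_points)]
        x = ifft(X)
        p_sig = sum(abs(v) ** 2 for v in x) / n_points
        sigma = math.sqrt(p_sig / 10 ** (channel_snr_db / 10) / 2)
        noisy = [v + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for v in x]
        frames.append((X, noisy))
    return frames

def output_snr_db(frames, frac_bits):
    """Compare the original symbols with the fixed-point FFT output."""
    sig = err = 0.0
    for X, noisy in frames:
        Y = fft([quantize(v, frac_bits) for v in noisy], frac_bits)
        sig += sum(abs(v) ** 2 for v in X)
        err += sum(abs(y - v) ** 2 for y, v in zip(Y, X))
    return 10 * math.log10(sig / err)

# The same noisy frames are reused for every word-length so that only the
# quantization changes between runs.
frames = make_frames(64, 20, 40.0, random.Random(0))
snr_by_bits = {b: output_snr_db(frames, b) for b in (6, 8, 10, 12, 14, 16, 20)}
for b, snr in snr_by_bits.items():
    print(f"{b:2d} fractional bits -> {snr:5.1f} dB")
```

Reusing one set of noisy frames across all word-lengths keeps the channel noise fixed, so any difference between rows of the printout is caused by quantization alone.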

Fig. 29: Finite word-length analysis of the proposed pipeline R4²SDF and R4³SDF-based 4096-point FFT/IFFT architectures.

5.4 The Comparison of Pipeline FFT/IFFT Architecture

This section compares the proposed R4²SDF and R4³SDF FFT/IFFT architectures with several well-known pipeline FFT/IFFT architectures to demonstrate their efficiency. The architectures are compared on two indices, cost and utilization, to express the hardware efficiency of the proposed FFT/IFFT architectures, as listed in Tables 8 and 9. Table 8 lists the required hardware resources, where T denotes the number of complex adders required to implement the constant multiplier.

The areas of the complex multipliers and the memory are well known to dominate the cost of a pipeline FFT/IFFT design. The comparison in Table 8 shows that the proposed R4³SDF-based FFT/IFFT architecture requires the fewest complex multipliers among the pipeline architectures compared. The R4³SDF based 4096-point FFT/IFFT architecture needs only one complex multiplier, which is 80% and 95% below the requirements of the R2²SDF and R8MDC FFT/IFFT architectures, respectively.
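The 80% and 95% figures follow from the complex multiplier formulas in Table 8 evaluated at N = 4096. The short Python check below is only an arithmetic illustration; the helper name `stages` is hypothetical.

```python
N = 4096

def stages(radix, n=N):
    """Return log_radix(n) for n an exact power of radix."""
    count = 0
    while n > 1:
        if n % radix:
            raise ValueError(f"{n} is not a power of {radix}")
        n //= radix
        count += 1
    return count

# Complex multiplier counts from the Table 8 formulas.
multipliers = {
    "R2²SDF": stages(4) - 1,       # log4(N) - 1
    "R8MDC": 7 * stages(8) - 7,    # 7*log8(N) - 7
    "R4³SDF": stages(64) - 1,      # log64(N) - 1
}

# Percentage reduction of R4³SDF relative to each reference architecture.
reduction = {
    name: 100.0 * (1 - multipliers["R4³SDF"] / count)
    for name, count in multipliers.items()
    if name != "R4³SDF"
}

for name, pct in reduction.items():
    print(f"R4³SDF vs {name}: {multipliers[name]} -> 1 multiplier ({pct:.1f}% fewer)")
# R4³SDF vs R2²SDF: 5 -> 1 multiplier (80.0% fewer)
# R4³SDF vs R8MDC: 21 -> 1 multiplier (95.2% fewer)
```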

Additionally, the proposed architectures retain the minimum shift-register requirement among the pipeline architectures compared. Although the proposed R4²SDF and R4³SDF based architectures need slightly more complex adders than the R2²SDF based architecture, this small cost penalty is acceptable. To estimate the total chip cost of the 4096-point FFT/IFFT architectures, which includes the number of complex multipliers, the number of complex adders and the memory size, the conventional comparative methodology [26, 34] was used, with the cost of each architecture expressed in units of equivalent adders. Based on the implementation results in our process, each complex multiplier and each complex memory element are taken as equivalent in area to 50 and 1.3 complex adders, respectively, where the complex multiplier is implemented with the scheme that uses three real multiplications and five real additions. The rightmost column of Table 8 lists the resulting equivalent-adder area index of each 4096-point FFT/IFFT architecture. The proposed R4³SDF-based 4096-point FFT/IFFT architecture has the lowest hardware requirement. Moreover, the cost advantage of the proposed architectures becomes more evident as the transform length increases, which makes them well suited to long-length FFT/IFFT computation.
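The equivalent-adder conversion can be checked against the rightmost column of Table 8. The Python sketch below applies the weights stated above (50 per complex multiplier, 1.3 per complex memory element, 1 per complex adder) to the Table 8 formulas at N = 4096. It covers only the rows whose adder count does not depend on T, because the value of T is not given in this section; the names `log_r` and `equivalent_area` are hypothetical.

```python
import math

N = 4096
MULT_WEIGHT = 50.0   # one complex multiplier in equivalent complex adders
MEM_WEIGHT = 1.3     # one complex memory element in equivalent complex adders

def log_r(radix, n=N):
    """Integer log_radix(n); valid here because n and radix are powers of two."""
    return int(math.log2(n)) // int(math.log2(radix))

def equivalent_area(mults, adders, memory):
    """Total cost in equivalent complex adders."""
    return MULT_WEIGHT * mults + adders + MEM_WEIGHT * memory

# (complex multipliers, complex adders, complex memory size) from Table 8,
# restricted to rows without the constant-multiplier term T.
rows = {
    "R2SDF":  (log_r(2) - 2,       2 * log_r(2), N - 1),
    "R4SDF":  (log_r(4) - 1,       8 * log_r(4), N - 1),
    "R2²SDF": (log_r(4) - 1,       4 * log_r(4), N - 1),
    "R2³SDF": (2 * (log_r(8) - 1), 6 * log_r(8), N - 1),
    "R2MDC":  (log_r(2) - 2,       2 * log_r(2), 1.5 * N - 2),
    "R4MDC":  (3 * log_r(4) - 3,   4 * log_r(2), 2.5 * N - 4),
}

areas = {name: round(equivalent_area(*cost), 1) for name, cost in rows.items()}
for name, area in areas.items():
    print(f"{name:7s} {area:9.1f}")
```

For these rows the computed values agree with the equivalent areas listed in Table 8, for example 5847.5 for R2SDF and 5597.5 for R2²SDF.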

Thus, the proposed R4³SDF architecture has a lower hardware cost than R4²SDF and the other well-known pipeline FFT/IFFT architectures in terms of the number of ROMs, complex multipliers, complex adders, constant multipliers and shift registers.

Table 8: Hardware Cost Comparisons of the Pipelined FFT/IFFT Architecture.

| Pipeline architecture | Mult. complexity | Complex mult. | Complex adders (including constant mult.) | Complex memory size | Equivalent area in 4096 points |
|---|---|---|---|---|---|
| R2SDF [17] | Radix-2 | log₂N-2 | 2log₂N | N-1 | 5847.5 |
| R4SDF [18] | Radix-4 | log₄N-1 | 8log₄N | N-1 | 5621.5 |
| R8SDF [8] | Radix-8 | log₈N-1 | (24+2T)log₈N | N-1 | 5609.5 |
| R2²SDF [6] | Radix-2² | log₄N-1 | 4log₄N | N-1 | 5597.5 |
| R2³SDF [5] | Radix-2³ | 2(log₈N-1) | 6log₈N | N-1 | 5647.5 |
| R2MDC [13] | Radix-2 | log₂N-2 | 2log₂N | 1.5N-2 | 8508.6 |
| R2²MDC [9] | Radix-2² | log₂N-2 | 2log₂N | 1.5N-2 | 8508.6 |
| R4MDC [14] | Radix-4 | 3log₄N-3 | 4log₂N | 2.5N-4 | 14104.8 |
| R8MDC [15] | Radix-8 | 7log₈N-7 | (24+2T)log₈N | 4.5N-8 | 30664.4 |
| Proposed R4²SDF | Radix-4² | log₁₆N-1 | (16+T)log₁₆N | N-1 | 5470.5 |
| Proposed R4³SDF | Radix-4³ | log₆₄N-1 | (24+2T)log₆₄N | N-1 | 5429.5 |

Table 9: Hardware Utilization Rate Comparisons of the Pipelined FFT/IFFT Architecture.

| Pipeline architecture | Utilization rate of complex mult. | Utilization rate of complex adders (including constant mult.) | Utilization rate of complex memory |
|---|---|---|---|
| R2SDF [17] | 50% | 50% | 100% |
| R4SDF [18] | 75% | 25% | 100% |
| R8SDF [8] | 87.5% | 12.5% | 100% |
| R2²SDF [6] | 75% | 50% | 100% |
| R2³SDF [5] | 87.5% | 50% | 100% |
| R2MDC [13] | 50% | 50% | 50% |
| R2²MDC [9] | 37.5% | 50% | 50% |
| R4MDC [14] | 25% | 25% | 25% |
| R8MDC [15] | 12.5% | 12.5% | 12.5% |
| Proposed R4²SDF | 87.5% | 56.25% | 100% |
| Proposed R4³SDF | 96.9% | 60.42% | 100% |

Table 9 compares the hardware utilization rates of the complex multipliers, complex adders and complex memory. The proposed R4³SDF architecture achieves the highest complex multiplier utilization rate among the pipeline architectures compared (96.9%). Additionally, the proposed architectures maintain the maximum complex memory utilization rate of 100%. Furthermore, the proposed R4³SDF architecture, including the constant multipliers, has the highest complex adder utilization rate of 60.42%. Thus, the proposed R4³SDF architecture achieves a higher hardware utilization rate than R4²SDF and the other well-known pipeline FFT/IFFT architectures in terms of the utilization rate of complex multipliers, complex adders, constant multipliers and complex memory.

From the document: Design and Implementation of a High-Performance Pipelined Fourier Transform Processor (pp. 94-98)