
5 Long-Length based Effective Pipeline FFT/IFFT Processor

6.4 Comparison and Chip Implementation

6.4.1 Comparison between R4²SDF and R2²SDF

He et al. presented an efficient pipeline FFT processor, several reliable architectures, and a detailed comparison of their hardware costs [31]. That comparison indicates that R2²SDF has the highest butterfly utilization (50%), the highest complex multiplier utilization (75%), and the lowest hardware resource requirement [31][34]. Additionally, the SDF-based design has the structural merits of high regularity and modularity with simple wiring, making it well suited to the VLSI implementation of a pipeline FFT processor [31, 32, 34]. This section presents a comprehensive comparison of several well-known pipeline FFT/IFFT architectures to demonstrate the cost efficiency of the proposed R4²SDF FFT/IFFT architecture. The architectures are compared on two indices, cost and utilization, as listed in Tables 13 and 14. Table 13 lists the required hardware resources, where T denotes the number of complex adders required to implement the constant multiplier.

Significantly, the proposed constant multiplier is minimized using the complex conjugate symmetry rule and a subexpression elimination algorithm. The area of the complex multipliers is known to be a dominant cost index in pipeline FFT/IFFT design. The comparison results in Table 14 clearly demonstrate that the proposed R4²SDF-based FFT/IFFT architecture requires the fewest complex multipliers among the pipeline architectures. The 256-point FFT/IFFT architecture needs only one complex multiplier, which is 67% and 95% below the requirements of the R2²SDF and R8MDC FFT/IFFT architectures, respectively. Additionally, the proposed architecture applies a feedback-type memory structure to keep the shift register requirement at its minimum. Although the proposed R4²SDF-based architecture needs slightly more complex adders than the R2²SDF-based architecture, this small cost penalty is acceptable.
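The 67% figure can be cross-checked against the commonly cited multiplier count for R2²SDF, namely log4(N) - 1 complex multipliers for an N-point transform. The following minimal sketch assumes that formula, which is not stated in this section, and takes the single multiplier of the proposed design from the text above. The function names are illustrative only.

```python
import math

def r22sdf_multipliers(n_points: int) -> int:
    """Complex multipliers of an N-point R2^2SDF pipeline.

    Assumes the commonly cited count log4(N) - 1 (an assumption,
    not a value taken from Table 13).
    """
    stages = round(math.log(n_points, 4))
    if 4 ** stages != n_points:
        raise ValueError("n_points must be a power of 4")
    return stages - 1

def reduction_percent(baseline: int, proposed: int) -> float:
    """Relative saving of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline

baseline = r22sdf_multipliers(256)        # three multipliers under the assumed formula
saving = reduction_percent(baseline, 1)   # proposed design: one multiplier (from the text)
print(f"{baseline} -> 1 complex multipliers: {saving:.0f}% fewer")
```

Under this assumption, going from three multipliers to one gives a reduction of about two thirds, which agrees with the quoted 67%.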

To estimate the total chip cost of the 256-point FFT/IFFT architectures, which covers the complex multipliers, complex adders and memory, the conventional comparative methodology [26, 32], with the equivalent adder as the unit, was adopted. Based on the implementation results in our process, the area of each complex multiplier and each complex memory unit is converted to 50 and 1.3 complex adders, respectively, when adopting 13-bit precision and a complex multiplication scheme with three real multiplications and five real additions. The rightmost column of Table 13 lists the resulting equivalent-adder area index of each 256-point FFT/IFFT architecture. Clearly, the proposed R4²SDF-based 256-point FFT/IFFT architecture has the lowest hardware requirement: its cost is 16% lower than that of the R2²SDF-based 256-point FFT/IFFT architecture. Significantly, the cost advantage of the proposed architecture becomes more evident as the transform length grows. Thus, the proposed R4²SDF-based architecture has a lower hardware cost than R2²SDF and other well-known pipeline FFT/IFFT architectures in terms of the number of ROMs, complex multipliers, complex adders, constant multipliers and shift registers.
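The cost model above can be illustrated with a short sketch. The first function shows one standard way of computing a complex product with three real multiplications and five real additions; the source does not say which three-multiplier variant was implemented, so this particular arrangement is an assumption. The second function applies the 50 and 1.3 conversion factors quoted in the text. The function names and the resource counts in the usage line are hypothetical placeholders, not values from Table 13.

```python
def complex_mul_3m5a(a, b, c, d):
    """Return (real, imag) of (a + jb)(c + jd) with 3 real multiplications
    and 5 real additions (one common arrangement, assumed here)."""
    s1 = a + b           # addition 1
    s2 = d - c           # addition 2
    s3 = c + d           # addition 3
    k1 = c * s1          # multiplication 1: ac + bc
    k2 = a * s2          # multiplication 2: ad - ac
    k3 = b * s3          # multiplication 3: bc + bd
    return k1 - k3, k1 + k2   # additions 4 and 5: (ac - bd, ad + bc)

# Conversion factors quoted in the text (13-bit precision).
MULT_WEIGHT = 50.0   # complex adders per complex multiplier
MEM_WEIGHT = 1.3     # complex adders per unit of complex memory

def equivalent_adders(n_mult, n_adders, n_mem):
    """Area index in equivalent complex adders."""
    return MULT_WEIGHT * n_mult + n_adders + MEM_WEIGHT * n_mem

# Hypothetical counts, for illustration only (not taken from Table 13).
print(equivalent_adders(n_mult=1, n_adders=20, n_mem=255))
```

Because each multiplier is weighted at 50 adders, removing a single complex multiplier outweighs a moderate increase in adder count, which is why the slightly larger adder count of the proposed design is described as an acceptable penalty.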

Table 14 shows the comprehensive comparison of the hardware utilization rate in terms of the utilization rate of complex multipliers, complex adders and complex memory.

Clearly, the proposed architecture achieves the highest complex multiplier utilization rate among the pipeline architectures (87.5%). Additionally, the proposed architecture maintains the maximum complex memory utilization rate of 100%. Furthermore, the proposed architecture, including the constant multipliers, has the highest complex adder utilization rate of 56.25%. Thus, the proposed architecture achieves a higher hardware utilization rate than R2²SDF and other well-known pipeline FFT/IFFT architectures in terms of the utilization of complex multipliers, complex adders, constant multipliers and complex memory. Although the R2MDC, R4MDC and R8MDC architectures have higher throughput rates (2, 4 and 8 outputs per cycle, respectively) than the SDF-based architectures, they require substantially more hardware, such as complex multipliers, adders and memory, as shown in Table 13. Therefore, this investigation focuses on the “hardware-oriented” architecture, in which the arithmetic operations can be tightly scheduled for efficient hardware utilization. This study demonstrates that the proposed R4²SDF-based pipeline FFT/IFFT architecture has the lowest hardware cost and the highest hardware utilization. Consequently, the proposed R4²SDF-based pipeline FFT/IFFT architecture is the most cost-efficient.

6.4.2 8×8 2-D DCT Comparison

Many DCT implementations exist, spanning a broad spectrum of architectures and targeting different applications. Lee et al. [78] presented a highly parallel approach for high-performance applications, at the price of high arithmetic cost and high power consumption. The systolic implementation of Lee et al. [78] employs row-column decomposition to derive a configurable 2D N×N DCT in three steps, each implemented in systolic form. This work concentrates on high-speed FFT/IFFT/2D DCT architectures with a throughput rate of at least one output sample per cycle, targeted at applications in next-generation handheld devices that need a high data-processing rate.
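As a software illustration of row-column decomposition, the sketch below computes an 8×8 2D DCT as 1-D transforms over the rows, a transposition, and 1-D transforms over the columns. It is a plain reference model, not the systolic design of [78] or the proposed processor. The orthonormal DCT-II scaling and the function names are assumptions made for the example.

```python
import math

def dct_1d(x):
    """Orthonormal 1-D DCT-II of a sequence (direct O(N^2) evaluation)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out

def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]

def dct_2d(block):
    """2-D DCT by row-column decomposition."""
    rows = [dct_1d(r) for r in block]             # 1-D DCT on every row
    cols = [dct_1d(c) for c in transpose(rows)]   # transpose, then 1-D DCT on every column
    return transpose(cols)                        # restore row-major orientation

# A constant block has energy only in the DC coefficient.
flat = [[1.0] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))
```

The decomposition reduces the 2D transform to repeated use of a single 1-D kernel, which is what makes it attractive for sharing datapath hardware between transform modes.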

Moreover, the proposed architecture offers high cost efficiency and low cost for portable consumer devices. This subsection compares the hardware requirements of six different implementations in terms of the number of real (complex) multipliers, the number of real (complex) adders, twiddle factor realization, total transistor count, hardware complexity, throughput, internal wordlength, interconnect complexity and support for triple-mode operation, as shown in Table 15. Clearly, the proposed pipeline R4²SDF-based FFT/IFFT/2D-DCT processor has the fewest complex multipliers and the lowest hardware complexity, together with an acceptable throughput rate and moderate interconnect complexity. Although the proposed processor uses more complex adders than the designs in [79] and [80], its total area, including the complex multiplier, is still lower than that of the other designs. The total transistor count indicates that the proposed design achieves the smallest chip cost among architectures supporting the FFT/IFFT mode.

Table 13 Hardware Cost Comparisons of the Pipelined FFT/IFFT Architecture.

Table 14 Hardware Utilization Rate Comparisons of the Pipelined FFT/IFFT Architecture.

Pipeline architecture | Utilization rate of complex multipliers | Utilization rate of complex adders (including constant multipliers) | Utilization rate of complex memory

Table 15 Hardware Requirement Comparison of 8×8 2D DCT Architecture.

8×8 DCT | Lee et al. [78] (parallel) | Chang & Wang [81] (2D systolic) | Hsiao and Shiue [79] (linear-array) | Ruetz et al. [80] (linear-array) | Madisetti et al. [82] | Proposed

ROM based LUT ROM based LUT Hardwired Multiplier

Internal wordlength | 18 | 16 | 16 | 14 | 22 | 13

Interconnect complexity | Complex | Simple | Moderate | Moderate | Simple | Moderate

FFT/IFFT/2-D DCT triple modes | No | No | No | No | No | Yes

1 Where a gate count was available, the number of transistors was estimated by assuming four transistors per gate.

2 An unknown gate count is indicated by “N/A”.

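Source document: 高效能之管線式傅立葉轉換處理器之設計與實現 (Design and Implementation of a High-Performance Pipeline Fourier Transform Processor), pp. 122-126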