5 Long-Length based Effective Pipeline FFT/IFFT Processor

5.5 Chip Implementation

Following functional verification in the Matlab environment, the proposed R42SDF- and R43SDF-based 4096-point FFT/IFFT architectures, with internal word-lengths of 14 and 13 bits, respectively, were synthesized with Design Compiler in TSMC 0.13 µm CMOS technology. Under the standard logic process rules, the single-port SRAM uses a 6T bit cell. Floorplanning and layout were performed with Astro. Post-layout simulation was run in NC-Simulator to verify functionality after back-annotation from the Star-RC extractor. Static timing was signed off with PrimeTime. Finally, power analysis and DRC were conducted with Astro Rail and Dracula, respectively. The post-layout core areas of the R42SDF and R43SDF designs are 1.01 and 0.89 mm², respectively, including the power rings and power straps, as depicted in Fig. 30(a) and 30(b). The gate count usage of each building block for the R42SDF and R43SDF designs is listed in Table 10. Compared with the R42SDF architecture, the R43SDF architecture replaces one complex multiplier with one constant multiplier in the 4096-point FFT/IFFT computation, as depicted in Fig. 22 and 23. As a result, the multiplier share of the gate count (complex plus constant multipliers) is 3.9 percentage points lower in the R43SDF design than in the R42SDF design, as listed in Table 10. The 4095-word feedback memory clearly dominates both chips, accounting for 77.34 % and 80.72 % of the R42SDF and R43SDF designs, respectively.

Both chips can operate at 20 MHz, which satisfies the high-throughput requirement. Regarding speed, because a pipelined multiplier is easy to design for a clock rate of 20 MHz or higher, the proposed architectures can reach a high clock rate by simple pipelining of the arithmetic components. The average power dissipation of the R42SDF- and R43SDF-based 4096-point FFT/IFFT designs is 6.3725 and 5.985 mW, respectively, at 20 MHz and a 1.2 V supply voltage. The layout of the R42SDF design, shown in Fig. 30(a), has 68 I/O pins, eight of which are power supply pins. Owing to its narrower data width, the layout of the R43SDF design, shown in Fig. 30(b), has only 64 I/O pins. The proposed R42SDF- and R43SDF-based 4096-point FFT/IFFT implementations satisfy the system performance of the DVB-H standard.
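The timing and power figures above can be turned into a rough per-transform estimate. The following is a minimal sketch, not a measurement from this work: it assumes the SDF pipeline accepts one complex sample per clock cycle (typical of single-path delay feedback pipelines, but not stated in this section), and it takes the DVB-T/H 4K-mode useful symbol duration for an 8 MHz channel as 448 µs, a value that does not come from this section. The function names are illustrative only.

```python
# Rough per-transform timing and energy estimate from the reported figures.
# Assumption: one complex sample enters the pipeline per clock cycle.

N_POINTS = 4096                   # FFT length (from the text)
F_CLK_HZ = 20e6                   # reported clock rate
POWER_W = {"R42SDF": 6.3725e-3,   # reported average power at 1.2 V
           "R43SDF": 5.985e-3}

# Assumption (not from this section): DVB-T/H 4K mode, 8 MHz channel.
DVBH_4K_USEFUL_SYMBOL_S = 448e-6


def transform_time_s(n_points, f_clk_hz):
    """Time to stream n_points samples at one sample per clock."""
    return n_points / f_clk_hz


def energy_per_transform_j(power_w, n_points, f_clk_hz):
    """Average power multiplied by the streaming time of one transform."""
    return power_w * transform_time_s(n_points, f_clk_hz)


t_fft = transform_time_s(N_POINTS, F_CLK_HZ)
print(f"time per 4096-point transform: {t_fft * 1e6:.1f} us")
print(f"fits in one 4K useful symbol:  {t_fft < DVBH_4K_USEFUL_SYMBOL_S}")
for name, p in POWER_W.items():
    e = energy_per_transform_j(p, N_POINTS, F_CLK_HZ)
    print(f"{name}: about {e * 1e6:.3f} uJ per transform")
```

Under these assumptions, one 4096-point transform streams through in 204.8 µs, which is shorter than the assumed 448 µs symbol duration.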

Additionally, the proposed R43SDF-based 4096-point FFT/IFFT implementation has low power consumption (5.985 mW) and the lowest hardware requirement among the tested pipeline architectures. These findings indicate that the proposed design meets the requirements of a highly effective pipeline FFT/IFFT processor for SoC IP.

(a) Layout view of the proposed R42SDF design.

(b) Layout view of the proposed R43SDF design.

Fig. 30: Layout views of the proposed 4096-point pipeline FFT/IFFT processors.

Table 10: Gate count usage of each building block in the proposed designs.

Categories   Control   Butterfly Cores   Complex Multiplier   Constant Multipliers   Shift Registers
R42SDF       0.33 %    10.1 %            9.83 %               2.4 %                  77.34 %
R43SDF       0.35 %    10.6 %            5.03 %               3.3 %                  80.72 %
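The multiplier comparison quoted in the text can be checked directly from Table 10. The short sketch below only re-adds the published percentages; the dictionary keys and the helper name `multiplier_share` are illustrative and not part of the original design flow. Each row is a share of that design's own gate count, so the difference is in percentage points of the gate count breakdown, not an absolute gate-count saving.

```python
# Re-derive the multiplier share of each design from the Table 10 percentages.
TABLE_10 = {
    "R42SDF": {"control": 0.33, "butterfly": 10.1, "complex_mult": 9.83,
               "constant_mult": 2.4, "shift_registers": 77.34},
    "R43SDF": {"control": 0.35, "butterfly": 10.6, "complex_mult": 5.03,
               "constant_mult": 3.3, "shift_registers": 80.72},
}


def multiplier_share(design):
    """Complex plus constant multiplier share, in percent of gate count."""
    row = TABLE_10[design]
    return row["complex_mult"] + row["constant_mult"]


for design, row in TABLE_10.items():
    total = sum(row.values())
    print(f"{design}: total {total:.2f} %, "
          f"multipliers {multiplier_share(design):.2f} %")

diff = multiplier_share("R42SDF") - multiplier_share("R43SDF")
print(f"multiplier share difference: {diff:.2f} percentage points")
```

Both rows sum to 100 %, and the multiplier shares of 12.23 % and 8.33 % differ by the 3.9 quoted in the text.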

5.6 Summary

This work develops two highly effective pipeline VLSI architectures, R42SDF and R43SDF, that support long-length FFT/IFFT computations. The proposed R43SDF pipeline FFT/IFFT architecture has lower multiplicative complexity and a higher hardware utilization rate at lower cost than R42SDF and other pipeline architectures. Following fixed-point analysis in a 40 dB AWGN environment, the proposed R42SDF- and R43SDF-based 4096-point FFT/IFFT designs were successfully implemented in 0.13 µm CMOS technology with internal word-lengths of 14 and 13 bits, respectively. The two designs consume 6.3725 and 5.985 mW, respectively, at 20 MHz and a 1.2 V supply voltage. These features indicate that the proposed R43SDF pipeline 4096-point FFT/IFFT processor meets the requirements of a highly effective VLSI architecture.

Chapter 6 Effective Triple-Mode Reconfigurable Pipeline FFT/IFFT/2D-DCT Processor

Tell et al. [8] presented an FFT/WALSH/1-D DCT processor for the multiple radio standards of upcoming 4th-generation wireless systems. However, some designs [8-10] support only 1-D DCT computation and provide no 2-D DCT support, although the 2-D DCT is desirable for video compression in wireless communication applications. This study presents a single reconfigurable architecture for the 256-point FFT/IFFT modes and the 8×8 2-D DCT mode, and also achieves high cost-efficiency in portable multimedia applications. A comprehensive comparison further indicates that the proposed R42SDF-based pipeline processor achieves higher utilization with a smaller hardware requirement than the R22SDF-based pipeline processor [31] in the 256-point FFT/IFFT mode, and thus has higher cost efficiency. The proposed R42SDF-based design also achieves satisfactory performance for the DV encoding standard at the lowest cost in the 8×8 2-D DCT mode. This chapter is organized as follows. Section 6.1 gives a new R42SDF FFT/IFFT and 8×8 2-D DCT algorithm. Section 6.2 presents the proposed FFT/IFFT/2-D DCT pipeline architecture based on the R42SDF algorithm. Section 6.3 gives the finite word-length analysis, which indicates that the proposed architecture achieves the required system performance in both the 256-point FFT/IFFT and 8×8 2-D DCT modes at the lowest hardware cost. Section 6.4 tabulates the comparison results in terms of hardware utilization and cost to demonstrate the high cost-efficiency of the proposed architecture, and also discusses the chip implementation. Section 6.5 draws conclusions.

6.1 8×8 2D FFT and 8×8 2D DCT Formula

Two concurrent 2D DCTs can be calculated with a single 2D shifted FFT (SFFT) algorithm [74] through input reordering and post-computation. This study presents a high-speed pipeline processor that supports the triple-mode 256-point FFT/IFFT/8×8 2D DCT with the radix-4² algorithm. Two concurrent 2D DCT results can be obtained by the [...]

This study neglects the post-scaling factor of [...] could then be reordered as [...] 1/4 samples. A detailed description of the transfer function between the 8×8 2D SFFT and the 8×8 2D DCT can be found in Appendix A.
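The link between a quarter-sample-shifted FFT and the DCT can be illustrated in one dimension. The sketch below is not the 8×8 2D derivation of this chapter or of Appendix A. It uses the well-known 1-D relation in which an even/odd input reordering followed by a DFT with its time index shifted by 1/4 sample yields the unnormalized DCT-II in its real part. All function names are hypothetical, and a direct O(N²) DFT is used for clarity instead of a fast algorithm.

```python
import cmath
import math


def dct2_direct(x):
    """Unnormalized DCT-II: C(k) = sum_n x(n) cos(pi (2n+1) k / (2N))."""
    n_len = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_len))
                for n in range(n_len))
            for k in range(n_len)]


def reorder_even_odd(x):
    """Even-indexed samples in order, then odd-indexed samples reversed."""
    n_len = len(x)
    v = [0.0] * n_len
    for n in range((n_len + 1) // 2):
        v[n] = x[2 * n]
    for n in range(n_len // 2):
        v[n_len - 1 - n] = x[2 * n + 1]
    return v


def dct2_via_shifted_dft(x):
    """Real part of a DFT whose time index is shifted by 1/4 sample."""
    n_len = len(x)
    v = reorder_even_odd(x)
    out = []
    for k in range(n_len):
        acc = sum(v[n] * cmath.exp(-2j * math.pi * (n + 0.25) * k / n_len)
                  for n in range(n_len))
        out.append(acc.real)
    return out


sample = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]
direct = dct2_direct(sample)
shifted = dct2_via_shifted_dft(sample)
print(max(abs(a - b) for a, b in zip(direct, shifted)))  # numerically tiny
```

The two routines agree to rounding error, which is the 1-D counterpart of computing a DCT through input reordering, a shifted FFT, and a real-part extraction.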

[equation not recoverable]

where 0 ≤ k1, k2, n1, n2 ≤ 7. Since the input data y(n1,n2) form a real-valued sequence, the second-half output can be derived as

[equation not recoverable]

[...] respectively, can then be created as

[equation not recoverable; only the fragments Re[ ( , )] and Re[ (8 , 8 )] survive]

[...] substituted as X[8k1+k2] and x(8n1+n2), respectively. Then, the specific two-dimensional (2D) linear index map is applied as follows:

[equation not recoverable]

[...] minimized by following the specific mapping in (96). The 8×8 2-D FFT CFA form can then [...]

The butterfly structures for the 8×8 2D DCT, corresponding to equations (88)-(97) above, are summarized as follows:

Butterfly stage I: [equations not recoverable]

The time-domain shift stage of 2-D DCT: [equations not recoverable]

[...]

\[
X_2[8k_1+k_2] = \frac{1}{4}\left\{\mathrm{Im}[Y_s(8k_1+k_2)] - \mathrm{Im}[Y_s(72-8k_1-k_2)]\right\} + \frac{1}{4}\left\{\mathrm{Re}[Y_s(64-8k_1+k_2)] + \mathrm{Re}[Y_s(8+8k_1-k_2)]\right\}. \qquad (98)
\]

The two 8×8 2D DCT results X1[8k1+k2] and X2[8k1+k2] are calculated concurrently in the post-computation of butterfly stage IV. The 8×8 2-D IDCT can be obtained by a similar decomposition procedure. Because of the cost constraint in the physical design, this study considers only the triple-mode FFT/IFFT and 2-D DCT computations. The derivation of the radix-4²-based FFT/IFFT/2-D DCT algorithm indicates that all butterfly computations can be implemented with four four-input complex adders and some shuffle circuits. The radix-4 butterfly structure requires no multipliers. Additionally, a regular structure can be easily derived for both the 8×8 2-D DCT and the 256-point FFT/IFFT pipeline processor architectures.
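The claim that a radix-4 butterfly needs no multipliers can be seen in a small software model. The following is a sketch only, not the hardware described in this work: it computes a 4-point DFT as four four-input additions, where the factors ±j are realized by swapping the real and imaginary parts and negating one of them (the "shuffle"). The function names are illustrative assumptions.

```python
import cmath
import math


def times_minus_j(z):
    """(a + jb)(-j) = b - ja: a swap and a sign change, no multiplier."""
    return complex(z.imag, -z.real)


def times_plus_j(z):
    """(a + jb)(+j) = -b + ja: a swap and a sign change, no multiplier."""
    return complex(-z.imag, z.real)


def radix4_butterfly(x0, x1, x2, x3):
    """4-point DFT written as four four-input complex additions."""
    y0 = x0 + x1 + x2 + x3
    y1 = x0 + times_minus_j(x1) - x2 + times_plus_j(x3)
    y2 = x0 - x1 + x2 - x3
    y3 = x0 + times_plus_j(x1) - x2 + times_minus_j(x3)
    return [y0, y1, y2, y3]


def dft_direct(x):
    """Reference DFT using explicit complex exponentials."""
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / n_len)
                for n in range(n_len))
            for k in range(n_len)]


inputs = [1 + 2j, -3 + 0.5j, 4 - 1j, 0.25 + 7j]
fast = radix4_butterfly(*inputs)
ref = dft_direct(inputs)
print(max(abs(a - b) for a, b in zip(fast, ref)))  # rounding-level difference
```

Only additions, subtractions, and real/imaginary swaps appear in `radix4_butterfly`. Twiddle-factor multiplications between butterfly stages are a separate matter and are not modeled here.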

6.2 Pipeline 256-Point FFT/IFFT/8×8 2D-DCT Processor