Chapter 5 MPEG-4 AAC Implementation and Optimization on DSP/FPGA

5.3 Implementation on DSP/FPGA

We have implemented and optimized MPEG-4 AAC on a TI C64x DSP and a Xilinx Virtex-II FPGA. The optimized results are shown in Table 5.4. We use a 0.95-second test sequence to compare the performance of the DSP implementation with that of the DSP/FPGA implementation. The overall speed is 8.17 times faster than the original version, and the DSP/FPGA version can process 48 seconds of audio data in 1 second.

Table 5.4 Comparison of DSP and DSP/FPGA implementation

Chapter 6 Conclusion and Future Work

We have implemented the MPEG-4 AAC decoder on a combined DSP and FPGA platform. In this project, we first speed up the IMDCT on the DSP; the modified version is 503 times faster than the original version. We then implement the Huffman decoding and the IFFT on the FPGA. As expected, the optimized FPGA implementations are faster than the DSP versions.

For the IMDCT calculation, we use the radix-2³ FFT algorithm in the DSP implementation. We then use a fixed-point data type to represent the input data, rearrange the order of the data calculations in the IFFT, and use intrinsic functions to speed up the IFFT further. The resulting version is 503 times faster than the original version. The details of our design and results can be found in Chapter 4.
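As an illustration of what a fixed-point representation involves, the following minimal sketch multiplies a complex sample by a twiddle factor in Q15 arithmetic, the kind of operation an IFFT butterfly performs. The Q15 format, the rounding rule and all function names are assumptions made for this example; they are not taken from the implementation described in Chapter 4.

```python
import math

Q = 15            # assumed number of fractional bits (Q15); the thesis format may differ
ONE = 1 << Q      # the value 1.0 expressed in Q15

def to_q15(value):
    """Quantize a float to a saturated 16-bit Q15 integer."""
    scaled = int(round(value * ONE))
    return max(-ONE, min(ONE - 1, scaled))

def from_q15(value):
    """Convert a Q15 integer back to a float."""
    return value / ONE

def q15_mul(a, b):
    """Multiply two Q15 integers with rounding (no saturation, for brevity)."""
    return (a * b + (1 << (Q - 1))) >> Q

def q15_cmul(ar, ai, br, bi):
    """Complex multiply (ar + j*ai) * (br + j*bi) with Q15 operands."""
    return q15_mul(ar, br) - q15_mul(ai, bi), q15_mul(ar, bi) + q15_mul(ai, br)

# Example: rotate the sample 0.5 + 0.25j by the twiddle factor exp(-j*pi/8).
wr, wi = to_q15(math.cos(math.pi / 8)), to_q15(-math.sin(math.pi / 8))
yr, yi = q15_cmul(to_q15(0.5), to_q15(0.25), wr, wi)
```

Each product is rounded back to 16 bits, so the result differs from the floating-point product only by a small quantization error. On a DSP the same pattern is usually mapped to packed multiply instructions through compiler intrinsics.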

We use the FPGA to implement a fixed-output-rate Huffman decoder. We also modify this architecture into a variable-output-rate architecture, which is more efficient in principle. In practice, however, the latter is slower than the former because its complex control signals create slow paths on the FPGA. The FPGA implementation is about 56 times faster than the DSP implementation.
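The idea of a decoder that emits one symbol per step can be sketched in software as a table lookup on a fixed-width window of the bitstream. This is only a behavioral model under stated assumptions: the four-entry codebook is a hypothetical toy example (not an AAC codebook), the names are invented for illustration, and the input is assumed to contain whole codewords only.

```python
# Hypothetical prefix-free toy codebook; NOT one of the AAC Huffman codebooks.
CODEBOOK = {"0": "A", "10": "B", "110": "C", "111": "D"}

def build_lut(codebook):
    """Build a table indexed by the next max_len bits -> (symbol, code length)."""
    max_len = max(len(code) for code in codebook)
    lut = [None] * (1 << max_len)
    for code, symbol in codebook.items():
        pad = max_len - len(code)
        base = int(code, 2) << pad
        # Every bit pattern that starts with this code maps to the same symbol.
        for tail in range(1 << pad):
            lut[base + tail] = (symbol, len(code))
    return lut, max_len

def decode(bits, codebook=CODEBOOK):
    """Emit exactly one symbol per loop iteration (one 'cycle' of the model)."""
    lut, max_len = build_lut(codebook)
    padded = bits + "0" * max_len          # lets the last window be read in full
    pos, symbols = 0, []
    while pos < len(bits):
        window = int(padded[pos:pos + max_len], 2)
        symbol, length = lut[window]
        symbols.append(symbol)
        pos += length                      # consume a variable number of input bits
    return "".join(symbols)
```

In this model the output rate is fixed at one symbol per iteration while the number of consumed input bits varies with the codeword length. A variable-output-rate design instead consumes a fixed number of input bits per cycle, which is where the extra control logic comes from.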

We also use the FPGA to implement the IFFT. As in the DSP implementation, we use the radix-2³ FFT algorithm. Because the 512-point IFFT has a heavy computational load, we use three types of PE to perform the computations in order to reduce the chip area. The FPGA implementation of the IFFT is about 4 times faster than the fastest DSP version. The details of our design and results can be found in Chapter 5.

Because of a hardware defect on the board and/or a bug in the system software, we have not yet been able to run and test our implementations on the DSP/FPGA baseboard. Thus, there are two important targets for future work. First, the DSP implementation should be executed on the DSP baseboard, which requires a streaming interface to the Host PC for real-time execution. The Host PC reads in the source data from a file and transfers it to the DSP through the streaming interface; after the DSP has processed the data, it transfers the results back to the Host PC. Second, the FPGA implementation should be integrated with the DSP to demonstrate the overall system. The DSP does the pre-processing and transfers the data to the FPGA through the streaming interface; after the FPGA has processed the data, it transfers the results back to the DSP.

Bibliography

[1] ISO/IEC JTC1/SC29/WG11 MPEG, International Standard ISO/IEC 13818-7 “Advanced Audio Coding”, 1997

[2] ISO/IEC JTC1/SC29/WG11 MPEG, International Standard ISO/IEC 14496-3 “Advanced Audio Coding”, 1999

[3] M. Bosi et al., “ISO/IEC MPEG-2 Advanced Audio Coding”, JAES, Vol. 45, No. 10, Oct. 1997

[4] M. Wolters et al., “A closer look into MPEG-4 High Efficiency AAC”, AES 115th Convention Paper, 2003

[5] Innovative Integration, “Quixote User’s Manual”, Dec. 2003

[6] Texas Instruments, “TMS320C6000 Programmer’s Guide”, SPRU198F, Feb. 2001

[7] Texas Instruments, “TMS320C6000 CPU and Instruction Set Reference Guide”, SPRU189F, Jan. 2000

[8] Texas Instruments, “TMS320C6000 Peripherals Reference Guide”, SPRU190D, Mar. 2001

[9] Texas Instruments, “TMS320C64x Technical Overview”, SPRU395B, Jan. 2001

[10] Xilinx, “Virtex-II Platform FPGA User Guide”, UG002 (v1.7), Feb. 2004

[11] K. S. Lee et al., “A VLSI implementation of MPEG-2 AAC decoder system”, ASICs, 1999 AP-ASIC '99. The First IEEE Asia Pacific Conf., pp. 139-142, 23-25 Aug. 1999

[12] M. K. Rudberg and L. Wanhammar, “New approaches to high speed Huffman decoding”, IEEE Int. Symp., Vol. 2, pp. 149-152, 12-15 May 1996

[13] M. K. Rudberg and L. Wanhammar, “High speed pipelined parallel Huffman decoding”, IEEE Proc. Int. Symp., Vol. 3, pp. 2080-2083, 9-12 Jun. 1997

[14] P. Duhamel et al., “A fast algorithm for the implementation of filter banks based on ‘time domain aliasing cancellation’”, IEEE Trans. Acous., Speech, Signal Processing, ICASSP, Vol. 3, pp. 2209-2212, Apr. 1991

[15] P. Duhamel and H. Hollmann, “Split-radix FFT algorithm for complex, real, and real symmetric data”, IEEE Trans. Acous., Speech, Signal Processing, ICASSP, Vol. 10, pp. 784-787, Apr. 1985

[16] S. He and M. Torkelson, “A new approach to pipeline FFT processor”, IEEE Proc. 10th Int. Parallel Processing Symp., IPPS, Apr. 1996

[17] S. He and M. Torkelson, “Designing pipeline FFT processor for OFDM (de)modulation”, IEEE Proc. URSI Int. Symp. Signals, Syst., Electron., pp. 257-262, Oct. 1998

[18] W. C. Yeh and C. W. Jen, “High speed and low power split-radix FFT”, IEEE Trans. Signal Processing, Vol. 51, No. 3, Mar. 2003

Appendix A

N/4-point FFT Algorithm for MDCT

In this appendix we describe the N/4-point complex FFT algorithm for the MDCT in detail and give its mathematical derivation. Further details can be found in [14].

A.1 MDCT

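The MDCT can be seen as the projection of a block of signal x_m(n) onto a set of cosine functions:

$$Y_m(k) = \sum_{n=0}^{N-1} x_m(n)\, h(N-1-n)\, \cos\!\left[\frac{2\pi}{N}\left(k+\frac{1}{2}\right)\left(n+n_0\right)\right], \quad k = 0, \ldots, N-1 \tag{A.1}$$

where h(n) is the analysis window and n_0 = N/4 + 1/2. The output coefficients are antisymmetric,

$$Y_m(N-1-k) = -Y_m(k) \tag{A.2}$$

which means that this transform is not invertible, since only N/2 output points are linearly independent.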
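The antisymmetry can be checked numerically with a direct O(N²) evaluation. The sketch below assumes the usual windowed MDCT definition with phase term n_0 = N/4 + 1/2 and a block length N that is a multiple of 4; the function name and the sine window used in the check are choices made for this example only.

```python
import math

def mdct_direct(block, window):
    """Direct evaluation of all N windowed MDCT outputs (reference only, O(N^2))."""
    N = len(block)
    n0 = N / 4 + 0.5                       # assumed phase term of the transform
    out = []
    for k in range(N):
        acc = 0.0
        for n in range(N):
            acc += (block[n] * window[N - 1 - n]
                    * math.cos(2 * math.pi / N * (k + 0.5) * (n + n0)))
        out.append(acc)
    return out

# Example with N = 16: output k and output N-1-k differ only in sign.
N = 16
window = [math.sin(math.pi * (n + 0.5) / N) for n in range(N)]   # sine window (assumed)
block = [math.sin(0.7 * n) + 0.05 * n for n in range(N)]
Y = mdct_direct(block, window)
max_mismatch = max(abs(Y[N - 1 - k] + Y[k]) for k in range(N))
```

Here `max_mismatch` stays at the level of floating-point rounding, which confirms that only N/2 of the N outputs carry independent information.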

However, if two adjacent blocks x_m(n) and x_{m+1}(n) overlap by N/2 samples, the set of values x_m(n) can be recovered from two successive output sets Y_{m-1}(k) and Y_m(k). Let h(n) and g(n) denote the analysis and synthesis windows. The reconstruction is perfect when the windows are symmetric and identical, that is, g(n) = h(n).
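The overlap-and-add reconstruction can be illustrated with a small round-trip experiment. This is a sketch under explicit assumptions: identical sine windows are used for analysis and synthesis (a symmetric window whose squares at n and n + N/2 sum to one), the inverse transform is scaled by 4/N, and the hop size is N/2. None of these choices, nor the function names, are taken from the thesis.

```python
import math

def sine_window(N):
    """Symmetric window with w(n)^2 + w(n + N/2)^2 = 1 (assumed choice)."""
    return [math.sin(math.pi * (n + 0.5) / N) for n in range(N)]

def mdct(block, w):
    """N windowed inputs -> N/2 coefficients (w symmetric, so w[n] == w[N-1-n])."""
    N = len(block)
    n0 = N / 4 + 0.5
    return [sum(block[n] * w[n] * math.cos(2 * math.pi / N * (k + 0.5) * (n + n0))
                for n in range(N)) for k in range(N // 2)]

def imdct(coeffs, w):
    """N/2 coefficients -> N windowed, time-aliased output samples."""
    N = 2 * len(coeffs)
    n0 = N / 4 + 0.5
    return [4.0 / N * w[n] * sum(coeffs[k] * math.cos(2 * math.pi / N * (k + 0.5) * (n + n0))
                                 for k in range(N // 2)) for n in range(N)]

def tdac_roundtrip(signal, N):
    """Analyze blocks that overlap by N/2, then overlap-add the inverse transforms."""
    hop = N // 2
    w = sine_window(N)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - N + 1, hop):
        u = imdct(mdct(signal[start:start + N], w), w)
        for n in range(N):
            out[start + n] += u[n]
    return out
```

Every sample covered by two overlapping blocks is recovered exactly, because the time-domain aliasing terms of neighboring blocks cancel. The first and last N/2 samples are covered by one block only and therefore remain aliased.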

A.2 N/4-Point FFT

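The antisymmetry of the MDCT output coefficients means that only half of them need to be computed. To obtain a formula that is easy to handle, we keep the even coefficients; the odd ones are deduced from Eq. (A.2). Hence Eq. (A.1) is equivalent to

$$Y_m(2k) = \sum_{n=0}^{N-1} x_m(n)\, h(N-1-n)\, \cos\!\left[\frac{2\pi}{4N}\left(2n+1+\frac{N}{2}\right)\left(4k+1\right)\right], \quad k = 0, \ldots, \frac{N}{2}-1$$

which can be rewritten, after an input permutation of the kind that is typical in the DCT case, in a form that is computed with an N/4-point complex FFT. The remaining steps of the derivation are given in [14].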
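The claim that the even coefficients are sufficient can be checked with a short sketch. It evaluates only the even-indexed outputs with the cosine argument (2π/4N)(2n + 1 + N/2)(4k + 1) and fills in the odd-indexed ones through the antisymmetry Y(N-1-k) = -Y(k). The input is assumed to be an already windowed block whose length is a multiple of 4, and the function names are illustrative only. The N/4-point FFT factorization itself is not shown here.

```python
import math

def mdct_even(xw):
    """Even-indexed MDCT outputs Y(2k), k = 0..N/2-1, of a windowed block xw."""
    N = len(xw)
    return [sum(xw[n] * math.cos(2 * math.pi / (4 * N) * (2 * n + 1 + N // 2) * (4 * k + 1))
                for n in range(N)) for k in range(N // 2)]

def mdct_from_even(xw):
    """All N outputs: even ones computed, odd ones deduced by antisymmetry."""
    N = len(xw)
    even = mdct_even(xw)
    Y = [0.0] * N
    for k in range(N // 2):
        Y[2 * k] = even[k]
        Y[N - 1 - 2 * k] = -even[k]        # N-1-2k is odd, so every odd index is covered
    return Y

def mdct_reference(xw):
    """Direct evaluation of all N outputs, used only for comparison."""
    N = len(xw)
    n0 = N / 4 + 0.5
    return [sum(xw[n] * math.cos(2 * math.pi / N * (k + 0.5) * (n + n0))
                for n in range(N)) for k in range(N)]
```

Comparing `mdct_from_even` with `mdct_reference` on any test block shows agreement up to rounding, while only N/2 cosine sums are actually evaluated.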

Appendix B

Radix-2² and Radix-2³ FFT

In this appendix we describe the radix-2² and radix-2³ FFT algorithms in detail and discuss their mathematical derivation. Further details can be found in [16] and [17].

B.1 Radix-2² FFT

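First, the analytical expression for the FFT is

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \quad k = 0, 1, \ldots, N-1, \qquad W_N = e^{-j 2\pi / N} \tag{B.1}$$

and the analytical expression for the IFFT is

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, W_N^{-nk}, \quad n = 0, 1, \ldots, N-1 \tag{B.2}$$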
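For reference, the two definitions can be written directly as O(N²) sums. The sketch below also shows the well-known identity that an inverse transform equals the conjugate of a forward transform of the conjugated input, divided by N, which is one common way to reuse a forward FFT datapath for the IFFT. The function names are chosen for this example only.

```python
import cmath

def dft(x):
    """Forward transform: X(k) = sum_n x(n) * exp(-j*2*pi*n*k/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse transform: x(n) = (1/N) * sum_k X(k) * exp(+j*2*pi*n*k/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * n * k / N) for k in range(N)) / N
            for n in range(N)]

def idft_via_dft(X):
    """Inverse transform computed with the forward routine and two conjugations."""
    N = len(X)
    return [v.conjugate() / N for v in dft([c.conjugate() for c in X])]
```

Both inverse routines return the original sequence when applied to the output of `dft`, up to floating-point rounding.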

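The derivation of the radix-2² FFT algorithm starts with a substitution using a 3-dimensional index map. The indices n and k in Eq. (B.1) can be expressed as

$$n = \left\langle \frac{N}{2} n_1 + \frac{N}{4} n_2 + n_3 \right\rangle_N, \quad n_1, n_2 \in \{0, 1\},\ n_3 = 0, \ldots, \frac{N}{4}-1 \tag{B.3}$$

$$k = \left\langle k_1 + 2 k_2 + 4 k_3 \right\rangle_N, \quad k_1, k_2 \in \{0, 1\},\ k_3 = 0, \ldots, \frac{N}{4}-1 \tag{B.4}$$

When the above substitutions are applied to the DFT definition, the definition can be rewritten as

$$\begin{aligned} X(k_1 + 2k_2 + 4k_3) &= \sum_{n_3=0}^{N/4-1} \sum_{n_2=0}^{1} \sum_{n_1=0}^{1} x\!\left(\tfrac{N}{2} n_1 + \tfrac{N}{4} n_2 + n_3\right) W_N^{\left(\frac{N}{2} n_1 + \frac{N}{4} n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)} \\ &= \sum_{n_3=0}^{N/4-1} \sum_{n_2=0}^{1} \left\{ B_{N/2}^{k_1}\!\left(\tfrac{N}{4} n_2 + n_3\right) W_N^{\left(\frac{N}{4} n_2 + n_3\right) k_1} \right\} W_N^{\left(\frac{N}{4} n_2 + n_3\right)(2k_2 + 4k_3)} \end{aligned} \tag{B.5}$$

where

$$B_{N/2}^{k_1}\!\left(\tfrac{N}{4} n_2 + n_3\right) = x\!\left(\tfrac{N}{4} n_2 + n_3\right) + (-1)^{k_1}\, x\!\left(\tfrac{N}{4} n_2 + n_3 + \tfrac{N}{2}\right) \tag{B.6}$$

which is a general radix-2 butterfly.

Now, the product of the two twiddle factors in Eq. (B.5) can be rewritten as

$$W_N^{\left(\frac{N}{4} n_2 + n_3\right) k_1}\, W_N^{\left(\frac{N}{4} n_2 + n_3\right)(2k_2 + 4k_3)} = W_N^{N n_2 k_3}\, W_N^{\frac{N}{4} n_2 (k_1 + 2k_2)}\, W_N^{n_3 (k_1 + 2k_2)}\, W_N^{4 n_3 k_3} = (-j)^{n_2 (k_1 + 2k_2)}\, W_N^{n_3 (k_1 + 2k_2)}\, W_N^{4 n_3 k_3} \tag{B.7}$$

Observe that the last twiddle factor in Eq. (B.7) can be rewritten as

$$W_N^{4 n_3 k_3} = e^{-j \frac{2\pi}{N} 4 n_3 k_3} = W_{N/4}^{n_3 k_3} \tag{B.8}$$

Inserting Eq. (B.7) and Eq. (B.8) into Eq. (B.5) and expanding the summation over n_2 gives a set of four DFTs, each four times shorter than the original:

$$X(k_1 + 2k_2 + 4k_3) = \sum_{n_3=0}^{N/4-1} \left[ H(k_1, k_2, n_3)\, W_N^{n_3 (k_1 + 2k_2)} \right] W_{N/4}^{n_3 k_3} \tag{B.9}$$

The result is that the butterflies have the following structure, in which the PE2 butterfly takes its input from two PE1 butterflies:

$$H(k_1, k_2, n_3) = \left[ x(n_3) + (-1)^{k_1} x\!\left(n_3 + \tfrac{N}{2}\right) \right] + (-j)^{(k_1 + 2k_2)} \left[ x\!\left(n_3 + \tfrac{N}{4}\right) + (-1)^{k_1} x\!\left(n_3 + \tfrac{3N}{4}\right) \right] \tag{B.10}$$

These calculations form the first radix-2² butterfly, which consists of the PE1 and PE2 butterflies. PE1 is represented by the expressions in brackets in Eq. (B.10), and PE2 is the outer computation in the same equation. The complete radix-2² algorithm is derived by applying this procedure recursively.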
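One decomposition step of this kind can be verified numerically. The sketch below applies a single radix-2² stage: a PE1 butterfly (add or subtract samples N/2 apart), a PE2 butterfly (combine two PE1 outputs with the trivial factor (-j)^(k1+2k2)), a twiddle multiplication, and then four DFTs of length N/4. The short DFTs are evaluated directly here rather than recursively, and all names are chosen for this illustration only.

```python
import cmath

def dft(x):
    """Direct DFT, used for the short transforms and as the reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def radix22_stage(x):
    """One radix-2^2 step: N-point DFT from four N/4-point DFTs (N multiple of 4)."""
    N = len(x)
    M = N // 4
    X = [0j] * N
    for k1 in (0, 1):
        for k2 in (0, 1):
            h = []
            for n3 in range(M):
                pe1_a = x[n3] + (-1) ** k1 * x[n3 + N // 2]          # PE1 butterfly
                pe1_b = x[n3 + M] + (-1) ** k1 * x[n3 + 3 * M]       # PE1 butterfly
                pe2 = pe1_a + (-1j) ** (k1 + 2 * k2) * pe1_b         # PE2 butterfly
                twiddle = cmath.exp(-2j * cmath.pi * n3 * (k1 + 2 * k2) / N)
                h.append(pe2 * twiddle)
            sub = dft(h)                                             # N/4-point DFT
            for k3 in range(M):
                X[k1 + 2 * k2 + 4 * k3] = sub[k3]
    return X
```

For any input whose length is a multiple of 4, `radix22_stage` agrees with `dft` up to rounding. The only non-trivial multiplications are the twiddle factors applied after PE2, because multiplying by -1 or -j needs only sign changes and a swap of real and imaginary parts.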

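B.2 Radix-2³ FFT

The derivation of the radix-2³ FFT algorithm follows the same steps with a 4-dimensional index map. The indices n and k in Eq. (B.1) can be expressed as

$$n = \left\langle \frac{N}{2} n_1 + \frac{N}{4} n_2 + \frac{N}{8} n_3 + n_4 \right\rangle_N, \quad n_1, n_2, n_3 \in \{0, 1\},\ n_4 = 0, \ldots, \frac{N}{8}-1 \tag{B.11}$$

$$k = \left\langle k_1 + 2 k_2 + 4 k_3 + 8 k_4 \right\rangle_N, \quad k_1, k_2, k_3 \in \{0, 1\},\ k_4 = 0, \ldots, \frac{N}{8}-1 \tag{B.12}$$

When the above substitutions are applied to the DFT definition, the definition can be rewritten as

$$\begin{aligned} X(k_1 + 2k_2 + 4k_3 + 8k_4) &= \sum_{n_4=0}^{N/8-1} \sum_{n_3=0}^{1} \sum_{n_2=0}^{1} \sum_{n_1=0}^{1} x\!\left(\tfrac{N}{2} n_1 + \tfrac{N}{4} n_2 + \tfrac{N}{8} n_3 + n_4\right) W_N^{nk} \\ &= \sum_{n_4=0}^{N/8-1} \sum_{n_3=0}^{1} \sum_{n_2=0}^{1} \left\{ B_{N/2}^{k_1}\!\left(\tfrac{N}{4} n_2 + \tfrac{N}{8} n_3 + n_4\right) W_N^{\left(\frac{N}{4} n_2 + \frac{N}{8} n_3 + n_4\right) k_1} \right\} W_N^{\left(\frac{N}{4} n_2 + \frac{N}{8} n_3 + n_4\right)(2k_2 + 4k_3 + 8k_4)} \end{aligned} \tag{B.13}$$

where B_{N/2}^{k_1}(n) = x(n) + (-1)^{k_1} x(n + N/2) is a general radix-2 butterfly.

Now, the product of the two twiddle factors in Eq. (B.13) can be rewritten as

$$W_N^{\left(\frac{N}{4} n_2 + \frac{N}{8} n_3 + n_4\right)(k_1 + 2k_2 + 4k_3 + 8k_4)} = (-j)^{n_2 (k_1 + 2k_2)}\, W_N^{\frac{N}{8} n_3 (k_1 + 2k_2 + 4k_3)}\, W_N^{n_4 (k_1 + 2k_2 + 4k_3)}\, W_{N/8}^{n_4 k_4} \tag{B.14}$$

Substituting Eq. (B.14) into Eq. (B.13) and expanding the summation with regard to the indices n_1, n_2 and n_3 gives, after simplification, a set of 8 DFTs of length N/8:

$$X(k_1 + 2k_2 + 4k_3 + 8k_4) = \sum_{n_4=0}^{N/8-1} \left[ T(k_1, k_2, k_3, n_4)\, W_N^{n_4 (k_1 + 2k_2 + 4k_3)} \right] W_{N/8}^{n_4 k_4} \tag{B.15}$$

The third butterfly structure has the expression

$$T(k_1, k_2, k_3, n_4) = H(k_1, k_2, n_4) + W_N^{\frac{N}{8}(k_1 + 2k_2 + 4k_3)}\, H\!\left(k_1, k_2, n_4 + \tfrac{N}{8}\right) \tag{B.16}$$

where H(k_1, k_2, n) = [x(n) + (-1)^{k_1} x(n + N/2)] + (-j)^{(k_1 + 2k_2)} [x(n + N/4) + (-1)^{k_1} x(n + 3N/4)].

As in the radix-2² FFT algorithm, Eq. (B.6) and Eq. (B.10) represent the first two columns of butterflies, which contain only trivial multiplications in the radix-2³ FFT algorithm. The third butterfly contains a special twiddle factor

$$W_N^{\frac{N}{8}(k_1 + 2k_2 + 4k_3)} = \left( \frac{\sqrt{2}}{2} (1 - j) \right)^{k_1} (-j)^{k_2}\, (-1)^{k_3} \tag{B.17}$$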
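A single radix-2³ decomposition step can be checked in the same way as the radix-2² case. In the sketch below, the first two butterfly columns use only the factors ±1 and ±j, the third column applies a power of W_8 = (√2/2)(1 - j), and the result feeds eight DFTs of length N/8 that are evaluated directly. The function names and the direct evaluation of the short DFTs are assumptions of this example, not the structure of the hardware described in Chapter 5.

```python
import cmath

def dft(x):
    """Direct DFT, used for the short transforms and as the reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

W8 = cmath.exp(-1j * cmath.pi / 4)        # equals (sqrt(2)/2) * (1 - j)

def radix23_stage(x):
    """One radix-2^3 step: N-point DFT from eight N/8-point DFTs (N multiple of 8)."""
    N = len(x)
    M = N // 8
    X = [0j] * N

    def first_two_columns(k1, k2, n):
        # Butterfly columns 1 and 2: only multiplications by +/-1 and +/-j.
        a = x[n] + (-1) ** k1 * x[n + N // 2]
        b = x[n + N // 4] + (-1) ** k1 * x[n + 3 * N // 4]
        return a + (-1j) ** (k1 + 2 * k2) * b

    for k1 in (0, 1):
        for k2 in (0, 1):
            for k3 in (0, 1):
                q = k1 + 2 * k2 + 4 * k3
                t = []
                for n4 in range(M):
                    # Butterfly column 3: contains the special factor W8**q.
                    third = (first_two_columns(k1, k2, n4)
                             + W8 ** q * first_two_columns(k1, k2, n4 + M))
                    t.append(third * cmath.exp(-2j * cmath.pi * n4 * q / N))
                sub = dft(t)                                  # N/8-point DFT
                for k4 in range(M):
                    X[q + 8 * k4] = sub[k4]
    return X
```

The factor `W8 ** q` takes only eight distinct values, and only the odd powers need a real multiplication, by √2/2. This is why the third column is cheaper than a general complex twiddle multiplication.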

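Author Biography

曾建統 was born in Hsinchu City in 1978. The author graduated from the Department of Electronics Engineering, National Chiao Tung University, in June 2002, and entered the Institute of Electronics at the same university in September of that year to conduct research on the design and implementation of multimedia signal processing systems. The author received the master's degree in June 2004 with a thesis titled “Implementation and Optimization of MPEG-4 Advanced Audio Coding on a DSP/FPGA Platform”. Research interests include multimedia signal processing and integrated hardware/software implementation and optimization.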
