Conclusions - Efficient Coding Strategies for Advanced Audio Coding

The main contributions of this dissertation are summarized as follows:

Cascaded Trellis-Based Rate-Distortion Control Algorithm (CTB)

The cascaded trellis-based (CTB) optimization scheme is a low-complexity, high-performance R-D control algorithm for the MPEG-4 AAC coder. It is essentially a fast version of the earlier joint trellis-based (JTB) scheme. In the CTB scheme, the optimization procedure for finding the coding parameters, SF and HCB, is partitioned into two sequential stages with carefully inserted intermediate steps, which greatly reduces the computation. The proposed CTB scheme is approximately 71 to 142 times faster than the JTB scheme. Simulation results show that both the objective and the subjective quality of the proposed CTB scheme are close to those of the JTB scheme.

In addition, we propose a lossless fast search algorithm for the trellis-based optimization of HCB, which provides roughly a fourfold speed-up. Furthermore, two non-uniform search algorithms for the trellis-based MNMR optimization of SF, called GMNU and LMNU, are proposed to reduce the number of candidates in the trellis search.

Simulation results indicate that a further 25-fold speed-up can be achieved using GMNU with negligible loss of audio quality. These two fast search algorithms can be applied to both the CTB scheme and the JTB scheme.
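As a reminder of what a trellis search over per-band coding parameters looks like, the following is a minimal Viterbi-style sketch. It is not the CTB or JTB algorithm itself: the names `trellis_search`, `node_cost` and `transition_cost` are hypothetical, and the cost values are made up for illustration. The sketch only assumes that each band contributes a cost for its chosen candidate and that moving between candidates in adjacent bands adds a further cost.

```python
def trellis_search(node_cost, transition_cost):
    """Find the minimum-cost path through a trellis (Viterbi-style).

    node_cost[t][s]       -- hypothetical cost of picking candidate s in band t
    transition_cost(p, s) -- hypothetical cost of going from candidate p to s
    Returns (total_cost, path), where path[t] is the candidate chosen in band t.
    """
    n_states = len(node_cost[0])
    best = list(node_cost[0])   # best accumulated cost ending in each state
    back = []                   # back-pointers, one list per later stage

    for t in range(1, len(node_cost)):
        new_best, pointers = [], []
        for s in range(n_states):
            # Cheapest predecessor for state s at stage t.
            p = min(range(n_states),
                    key=lambda q: best[q] + transition_cost(q, s))
            new_best.append(best[p] + transition_cost(p, s) + node_cost[t][s])
            pointers.append(p)
        best = new_best
        back.append(pointers)

    # Trace the surviving path backwards from the cheapest final state.
    s = min(range(n_states), key=lambda q: best[q])
    total = best[s]
    path = [s]
    for pointers in reversed(back):
        s = pointers[s]
        path.append(s)
    path.reverse()
    return total, path


# Toy example: three bands, two candidates per band.
costs = [[1, 5], [4, 1], [1, 6]]
total, path = trellis_search(costs, lambda p, s: abs(p - s))
```

The work per stage grows with the square of the number of candidates, which is why reducing the candidate set, as the non-uniform search algorithms aim to do, pays off directly.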

Enhanced BFOS Bit Allocation Algorithm for AAC (EBFOS)

EBFOS is an efficient bit allocation algorithm for MPEG-4 AAC. Instead of performing a heavy trellis search through the entire frame, the EBFOS scheme allocates bits step by step to the band that needs them most. It thus has the advantages of low complexity and higher flexibility. The EBFOS scheme outperforms the VM-TLS and generalized BFOS algorithms. Moreover, its performance is close to that of the trellis-search-based algorithm optimized for the average NMR (JTB-ANMR). To reduce computation, a fast algorithm is also introduced for the EBFOS scheme. The fast version reduces the complexity to 1/10. Simulation results show that adopting the fast algorithm for the EBFOS scheme causes almost no loss of performance (less than 0.06 dB).
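The idea of giving bits step by step to the neediest band can be illustrated with a simple greedy allocator. This is a simplified stand-in, not the EBFOS algorithm: the function name `greedy_allocate` is hypothetical, and the sketch assumes that each extra bit lowers a band's noise-to-mask ratio by a fixed amount (about 6.02 dB per bit, the usual rule of thumb for uniform quantization), which a real AAC encoder would replace with measured rate and distortion values.

```python
import heapq


def greedy_allocate(nmr_db, total_bits, db_per_bit=6.02):
    """Hand out bits one at a time to the band with the highest NMR.

    nmr_db     -- initial noise-to-mask ratio of each band, in dB
    total_bits -- number of bits to distribute
    db_per_bit -- assumed NMR improvement per allocated bit (illustrative)
    Returns the number of bits given to each band.
    """
    bits = [0] * len(nmr_db)
    # heapq is a min-heap, so store negated NMR to pop the worst band first.
    heap = [(-value, band) for band, value in enumerate(nmr_db)]
    heapq.heapify(heap)

    for _ in range(total_bits):
        neg_nmr, band = heapq.heappop(heap)
        bits[band] += 1
        # One more bit lowers this band's NMR by db_per_bit.
        heapq.heappush(heap, (neg_nmr + db_per_bit, band))
    return bits


# Toy example: three bands sharing three bits.
allocation = greedy_allocate([12.0, 3.0, 7.0], total_bits=3, db_per_bit=6.0)
```

Each step costs only a heap update rather than a search over the whole frame, which is the kind of saving a step-by-step allocation offers over a full trellis search.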

Perceptually Weighted Inter-Channel Prediction (PW-ICP)

PW-ICP is an efficient inter-channel redundancy removal algorithm. Unlike M/S stereo coding or the KLT-based approach, the PW-ICP scheme does not propagate quantization noise from one channel to another. Therefore, no extra perceptual masking control is needed. Moreover, as with the INT-DCT based approach, our method induces no audio quality degradation. The PW-ICP algorithm introduces two types of predictors, TSP and SCP. This dissertation also presents simulations and detailed discussions on how to determine the parameters of the predictors. We find that our new index, PWPEmin, performs better than the traditional correlation method, Corrmax. Regarding the choice of predictor order, the predictor with adaptive order achieves the best performance for all kinds of audio signals. (Larger-order predictors are often not preferred.) For the predictor with fixed order, in general, an order around 1 is appropriate for TSP, and an order less than or equal to 5 works best for SCP.

To evaluate the performance of our PW-ICP algorithms, the INT-DCT based approach is also implemented and compared. We have tested this scheme on a number of two-channel and five-channel audio sequences. Based on the simulation results, we find that the bit rate reduction of our new method is, on average, about 10% better than that of the well-known INT-DCT based approach for audio sequences that show a bit rate reduction of 5% or more relative to separate-channel coding.
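The basic mechanism behind inter-channel prediction can be sketched with a single-tap least-squares predictor. This is only an illustration under simplifying assumptions: it omits the perceptual weighting that defines PW-ICP, it is neither the TSP nor the SCP predictor, and the function name `predict_channel` is hypothetical.

```python
def predict_channel(ref, target):
    """Predict `target` from `ref` with one least-squares gain.

    ref, target -- equal-length sequences of samples or coefficients
    Returns (gain, residual), where residual[i] = target[i] - gain * ref[i].
    """
    energy = sum(x * x for x in ref)
    if energy == 0:
        # A silent reference carries no information; leave the target as is.
        return 0.0, list(target)
    # Least-squares gain: minimizes the energy of the residual.
    gain = sum(x * y for x, y in zip(ref, target)) / energy
    residual = [y - gain * x for x, y in zip(ref, target)]
    return gain, residual


# Toy example: the second channel is a scaled copy of the first.
left = [1.0, 2.0, -1.0, 0.5]
right = [0.5, 1.0, -0.5, 0.25]
gain, residual = predict_channel(left, right)
```

When the channels are strongly correlated, the residual has much less energy than the original channel and is cheaper to code. The decoder recovers the predicted channel by adding `gain * ref[i]` back to each residual value.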

