MPEG-1 Layer III Audio Codec Optimization and Implementation on a DSP Chip


National Chiao Tung University
Department of Electrical and Control Engineering
Master's Thesis

MPEG-1 LAYER III AUDIO CODEC OPTIMIZATION AND IMPLEMENTATION ON A DSP CHIP

Student: Yu-Shiang Lin (林煜翔)
Advisor: Prof. Bing-Fei Wu (吳炳飛)

July 2004

MPEG-1 LAYER III AUDIO CODEC OPTIMIZATION AND IMPLEMENTATION ON A DSP CHIP. Student: Yu-Shiang Lin. Advisor: Prof. Bing-Fei Wu. A Thesis Submitted to the Department of Electrical and Control Engineering, College of Electrical Engineering and Computer Science, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master in Electrical and Control Engineering. July 2004, Hsinchu, Taiwan, Republic of China.

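ABSTRACT (CHINESE)

This thesis presents a set of optimized MP3 encoding and decoding algorithms and an efficient 16-bit fixed-point DSP implementation. For MP3 encoder optimization, building on the removal of the computationally heavy psychoacoustic model, we propose a new rate control loop algorithm and adopt bandwidth control and dynamic bit allocation. For MP3 decoder optimization, we propose a new way of implementing the dequantization equation that is suitable for fixed-point processors; fast algorithms are also adopted for the IMDCT and subband synthesis. The optimized MP3 codec algorithms are implemented on a 16-bit fixed-point DSP, the ADSP-2181, and a dynamic fixed-point format is used to reduce the distortion of fixed-point arithmetic. The resulting MP3 encoder requires only 21.05 MIPS and 44 kilobytes of memory, and the decoder requires only 18.67 MIPS and 44.3 kilobytes of memory, which is the best performance among the commercial products and academic works compared. Finally, this thesis presents an integrated design of a dual-core embedded system based on a 32-bit RISC processor and a DSP.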

ABSTRACT (ENGLISH)

This thesis presents the algorithm optimization and an efficient 16-bit fixed-point DSP implementation of MP3 encoding and decoding. For the MP3 encoder, we propose several approaches: removal of the psychoacoustic model, a simplified iteration loop, a fast rate control loop, bandwidth control, and dynamic bit allocation proportional to the energy of the granules. For the MP3 decoder, we propose a fast dequantization method that achieves a high SNR in fixed-point implementation, and we apply fast algorithms to the IMDCT and subband synthesis. The algorithms are fully realized on a 16-bit fixed-point DSP, the ADSP-2181, and a dynamic fixed-point format is applied to improve audio quality. The MP3 encoder consumes 21.05 MIPS and 44 kbytes of memory, and the MP3 decoder consumes 18.67 MIPS and 44.3 kbytes of memory. Both outperform other commercial products and published implementations. Finally, this thesis presents an integrated design of a dual-core embedded system combining a 32-bit RISC processor, the Intel® StrongARM SA-1110, with the ADSP-2181 DSP.

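ACKNOWLEDGEMENTS

First, I thank my advisor, Prof. Bing-Fei Wu (吳炳飛), for four years of guidance. Ever since supervising my third-year undergraduate project, Prof. Wu has given me many opportunities to explore different research areas and to take part in competitions, and has provided abundant research resources that allowed my work to proceed smoothly. I especially thank the graduated senior lab members 錢昱瑋, 許子偉, 魏宏宇 and 張芷燕, whose generous guidance when I first joined the laboratory laid the foundation of my knowledge of audio compression theory and practice. I also thank 呂紹麒 and 鄭光輝 for introducing me to embedded systems, and 顏志旭 for his many valuable suggestions and for teaching me methods of research and analysis. Thanks also go to my fellow researcher 黃榮煌 and to the members of the CSSP laboratory who carried out the audio quality tests; I could not have completed this thesis without your full support. To the senior students, lab mates and partners from National Chengchi University who competed alongside me: your wholehearted participation made the awards we won more meaningful. I am deeply grateful to my family, who gave me a carefree environment throughout my studies and supported me completely; with your support I was able to finish graduate school. Finally, I thank my girlfriend (郭小姐), who shared many joys and hardships with me during these six years of study and put up with the long hours I spent doing research in the laboratory.

This thesis is dedicated to my dearest family and to everyone who has supported and cared for me.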

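AWARDS

In 2002 (ROC year 91), this research was entered in the 2nd Macronix Golden Silicon Awards semiconductor design and application competition (旺宏金矽獎第二屆半導體設計與應用大賽) and won the first prize in the application category (應用組一獎) with the work "MP3/CD-ROM Recorder System". The other team members were 許子偉, 張芷燕 and 魏宏宇.

In 2003 (ROC year 92), this research was entered in the 3rd Macronix Golden Silicon Awards semiconductor design and application competition and won the second prize in the application category (應用組二獎) with the work "Multimedia Box". The other team members were 顏志旭, 王坤卿, 魏宏宇 and 鄭光輝.

In 2003 (ROC year 92), this research was entered in the 7th Student Innovation Award competition organized by the Chinese Society for Management of Technology (中華民國科管學會) and won first place with the work "Backward-Compatible MP3 Music Security Mechanism" (向下相容的 MP3 音樂安全機制). The other team members were 顏志旭, 黃榮煌 and 林映伶.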

CONTENTS

ABSTRACT (CHINESE)
ABSTRACT (ENGLISH)
ACKNOWLEDGEMENTS
AWARDS
CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1. INTRODUCTION
  1.1. MPEG/Audio Compression
  1.2. Motivations
  1.3. The Overview of The Proposed Method and Contributions
  1.4. The Experimental Results and Potential Applications
  1.5. Content and Organization
CHAPTER 2. ENCODER OPTIMIZATION
  2.1. Encoding Overview and Complexity Analysis
    2.1.1 Psychoacoustic model II
    2.1.2 Time to frequency mapping transform
    2.1.3 Iteration loop
    2.1.4 Bitstream formatting
  2.2. Simplified PAM-II
    2.2.1 Distortion control loop analysis
    2.2.2 Removal of window switching
  2.3. Fast rate control loop
    2.3.1 Non-uniform quantization
    2.3.2 Dynamic bit allocation proportional to the energy of granules
    2.3.3 Precise initialization of the quantization parameter
    2.3.4 Fast search of the optimal quantizer parameter
CHAPTER 3. DECODER OPTIMIZATION
  3.1. Decoding Overview and Complexity Analysis
  3.2. Dequantization
  3.3. IMDCT and Subband Synthesis
CHAPTER 4. DSP IMPLEMENTATION
  4.1. Target DSP Architecture
  4.2. Data precision optimization in the proposed MP3 encoder
  4.3. Data precision optimization in the proposed MP3 decoder
CHAPTER 5. EXPERIMENTAL RESULTS
CHAPTER 6. DUAL CORE EMBEDDED SYSTEM
  6.1. System Overview
  6.2. Hardware Platform
    6.2.1 Host system – AdvanTech PCM-7130 SBC
    6.2.2 DSP system – ADI ADSP-2181 EZ-LAB
    6.2.3 Design of hardware adapter
  6.3. Firmware Design
    6.3.1 Linux Character Device Driver
    6.3.2 DSP BIOS
CHAPTER 7. CONCLUSIONS AND FUTURE WORKS
  7.1. Conclusions
  7.2. Future Works
REFERENCE
APPENDIX

LIST OF FIGURES

Figure 1. MPEG/audio encoding process
Figure 2. The absolute threshold of hearing
Figure 3. Frequency masking effect combined with ATH
Figure 4. Temporal masking effect
Figure 5. Hybrid transform for time to frequency mapping
Figure 6. The 32-channel analysis polyphase filterbank
Figure 7. The coefficient of low-pass filter, h[n]
Figure 8. The four types of MDCT window and the arrangement
Figure 9. Distortion control in the iteration loops
Figure 10. Noise analysis before distortion control
Figure 11. Coefficients of bandwidth controller (sampling rate is 44100 Hz)
Figure 12. Time domain waveforms from using window switching or not
Figure 13. Rate control in iteration loops
Figure 14. The new rate control algorithm
Figure 15. The error of $x_{f,g}(i)^{0.75}$ approximation
Figure 16. The histogram to difference between initial and final value of $\Delta_{f,g}$
Figure 17. The adaptive approach to iterative search optimum parameter
Figure 18. Pseudo code of iteration loops (a) ISO method (b) Proposed method
Figure 19. MPEG/Audio Layer III decoding block diagram
Figure 20. Bitstream decoding
Figure 21. Frequency to time mapping
Figure 22. The implementation of $y_{f,g}(i)^{1/3}$
Figure 23. The error to real output ratio of $y_{f,g}(i)^{4/3}$ approximation
Figure 24. The error to real output ratio of $y_{f,g}(i)^{4/3}$ fixed point approximation
Figure 25. The ADSP-2181 DSP core and peripheral integration
Figure 26. Double precision multiplication, R(32-bit) = X(32-bit) x Y(16-bit)
Figure 27. Data precision between each stage in proposed MP3 encoder
Figure 28. Data precision between each stage in proposed MP3 decoder
Figure 29. Different format between subbands and the modified IMDCT
Figure 30. The dual core embedded system
Figure 31. PCM-7130 SBC [15]
Figure 32. ADI ADSP-2181 EZ-LAB evaluation board
Figure 33. Functional diagram of hardware adapter
Figure 34. General IDMA transfer protocol [17]
Figure 35. Port access timing
Figure 36. The hierarchical view of software, firmware and hardware layer
Figure 37. The firmware block diagram

LIST OF TABLES

Table 1. Predicted complexity to implement MPEG/audio encoder [4]
Table 2. The number of DSP instruction cycles in calculation of two regions
Table 3. Symbols descriptions of Figure 17
Table 4. The average number of inner iteration
Table 5. The implementation result and comparisons with commercial products
Table 6. The comparison of peak consumed MIPS in different MP3 encoder
Table 7. The comparison of peak consumed MIPS in different MP3 decoder
Table 8. Test audio samples
Table 9. The subjective evaluation results (1)
Table 10. The subjective evaluation results (2)
Table 11. The subjective evaluation results (3)
Table 12. Host port pins
Table 13. ADSP-2181 IDMA port pins

CHAPTER 1. INTRODUCTION

1.1. MPEG/Audio Compression

Digital audio compression is now used in a wide range of multimedia applications, such as network multimedia streaming, online music stores, DAB (Digital Audio Broadcasting), digital television and portable devices (pen drives, walkmans, voice recorders, cellular phones, etc.). MPEG/audio compression is the most popular international standard for digital compression of high-fidelity audio. State-of-the-art audio compression algorithms, such as MPEG and WMA, transform the audio signal for de-correlation and quantize the transformed coefficients according to the perceptual properties determined by a psychoacoustic model (PAM) [1]. In this approach, the limitations of human hearing are exploited to remove the inaudible components of audio signals and so achieve a high compression ratio.

MPEG/audio offers a choice of three distinct compression layers [2], which provides a wide range of trade-offs between codec complexity and compressed audio quality. Layer I forms the basic algorithm and is suitable for bit rates above 128 kbps per channel. Layer II targets bit rates around 128 kbps per channel; possible applications include audio coding for DAB and the storage of synchronized video-and-audio sequences on CD-ROM. Layer III is the most complex but offers the best audio quality, particularly at bit rates around 64 kbps per channel. This layer suits audio transmission over ISDN and multimedia applications on portable devices. Which layer is used for an application is determined by the computational complexity and the performance requirements [3].

1.2. Motivations

MPEG/audio Layer III, also referred to as MP3, is currently the most popular digital audio format on the Internet, and with the help of the Internet it has also gained popularity as a portable solid-state audio format. Recently, various kinds of devices that support MP3 have appeared on the consumer market. However, most of them are decoding-only, and few support high-quality MP3 encoding. This is because the MP3 encoding algorithm often consumes too many computational resources to be implemented on a battery-powered system.

This thesis proposes high-quality MPEG/audio Layer III encoding and decoding algorithms optimized for 16-bit fixed-point arithmetic, together with a real-time implementation on a low-cost 16-bit fixed-point DSP. The ADI ADSP-2181 is chosen as the target DSP. A prototype design of a dual-core embedded system is also presented. It integrates the proposed MP3 codec implementation on the ADSP-2181 DSP with a 32-bit RISC CPU, the Intel® StrongARM SA-1110, on an existing embedded system, the AdvanTech PCM-7130 SBC.

1.3. The Overview of The Proposed Method and Contributions

In this thesis, we propose several fast algorithms for MP3 encoding and decoding. In the MP3 encoding algorithm, the psychoacoustic model (PAM), the most computationally complex part of the entire encoder, is removed on the basis of several experimental results, and the PAM-based distortion control loop is simplified accordingly. Techniques including bandwidth control and dynamic bit allocation proportional to the energy of the granules are added to improve the audio quality. Furthermore, a fast rate control loop algorithm is proposed to reduce both the complexity of the non-uniform quantizer and the number of iterations. The complexity of the non-uniform quantizer is reduced by moving the time-consuming operation outside the iteration and by applying a piecewise linear approximation in the non-uniform quantization. The quantizer is thus divided into two parts, which consume fewer than 10 DSP instructions outside the iteration and fewer than 4 inside it. The number of iterations is reduced by a precise initialization and a fast iterative search of the non-uniform quantizer parameter. As a result, the average number of iterations is only 1.8, while the original method takes more than 45 iterations on average.

In the MP3 decoding algorithm, the dequantization operation is implemented by applying a piecewise linear approximation and Newton's method for root-finding to achieve a higher SNR. We also adopt Lee's fast DCT/IDCT algorithm to realize the IMDCT and the matrixing operation in the synthesis filterbank.

1.4. The Experimental Results and Potential Applications

The results of the proposed MP3 codec optimization and implementation are also analyzed. The MP3 encoder consumes 21.05 MIPS and 44 kbytes of memory, and the MP3 decoder consumes 18.67 MIPS and 44.3 kbytes of memory. Both outperform other commercial products and published implementations. This performance opens up two main areas of potential application:

- The low MIPS and memory requirements suit battery-powered systems such as pen drives, walkmans, voice recorders and cellular phones. These devices can thus support both MP3 encoding and decoding.

- The system integration work makes it possible to use the low-cost ADSP-2181 as an audio coprocessor in a larger system. With the firmware loading protocol proposed in this thesis, the system can support not only MP3 but also other audio applications. This feature suits many current products such as PVRs/DVRs, DVD players/recorders, IP phones and digital broadcasting systems.

1.5. Content and Organization

This thesis contains seven chapters:

- Chapter 1 introduces digital audio compression algorithms and the motivation, overview and contributions of this thesis.

- Chapter 2 introduces the MPEG/audio Layer III encoding algorithm and presents the proposed optimization. The proposed methods focus on minimizing the complexity of the PAM-based bit allocation process and improving the coding efficiency. Based on a series of experiments and analyses, the PAM is simplified and a new fast bit allocation algorithm is developed.

- Chapter 3 introduces the MPEG/audio Layer III decoding algorithm and presents the proposed optimization. The proposed methods focus on minimizing the complexity of the dequantization and the filterbank.

- Chapter 4 introduces the 16-bit fixed-point DSP, the ADSP-2181, and presents the DSP implementation of the proposed MP3 codec methods.

- Chapter 5 presents the experimental results and comparisons with other methods.

- Chapter 6 introduces an application example, a dual-core embedded system built around an Intel® StrongARM MPU and an ADI ADSP-2181 DSP, and presents the firmware design.

- Chapter 7 presents the conclusions and future works.

- The Appendix contains pictures of the whole system and its sub-systems.

CHAPTER 2. ENCODER OPTIMIZATION

2.1. Encoding Overview and Complexity Analysis

Figure 1 shows the block diagram of the MPEG/audio Layer III encoding process. Every 1152 consecutive PCM samples are grouped together into one audio frame. The time to frequency mapping transforms the audio input into spectral lines frame by frame. These spectral components are then divided into several scalefactor bands according to the critical-band rate. The audio input simultaneously passes through PAM-II (psychoacoustic model II), which determines the ratio of the signal energy to the masking threshold for each scalefactor band. To meet the bit rate constraint, the rate controller varies the quantizer in an orderly way, quantizes the spectral values and counts the number of Huffman code bits required to code the quantized values. The quantizer in MP3 is non-uniform, so the quantization noise depends on the quantized value instead of on the quantization parameters, as is the case for general uniform quantizers. Huffman coding is chosen as the lossless coding tool; the Huffman tables are pre-defined and have been statistically analyzed [5]. The distortion controller adapts the scalefactors to control the quality when the quantization noise exceeds the masking threshold.
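To make the frame structure and the bit rate constraint concrete, the following Python sketch computes the duration of one 1152-sample frame and the average number of bits available to it. The function names are illustrative only, the two-granules-per-frame constant is taken from the MPEG-1 Layer III format rather than from the paragraph above, and the 44.1 kHz and 128 kbps values are just example operating points.

```python
SAMPLES_PER_FRAME = 1152   # PCM samples grouped into one audio frame
GRANULES_PER_FRAME = 2     # assumption: MPEG-1 Layer III uses two 576-sample granules


def frame_duration_ms(sample_rate_hz):
    """Duration of one audio frame in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz


def mean_bits_per_frame(bit_rate_bps, sample_rate_hz):
    """Average bit budget of one frame, header and side information included."""
    return bit_rate_bps * SAMPLES_PER_FRAME / sample_rate_hz


def mean_bits_per_granule(bit_rate_bps, sample_rate_hz):
    """Average bit budget of one granule (all channels together)."""
    return mean_bits_per_frame(bit_rate_bps, sample_rate_hz) / GRANULES_PER_FRAME


if __name__ == "__main__":
    print(f"{frame_duration_ms(44100):.2f} ms per frame")
    print(f"{mean_bits_per_frame(128000, 44100):.1f} bits per frame")
```

At 44.1 kHz a frame lasts about 26.1 ms, and at 128 kbps it carries about 3344 bits on average, which is the budget the rate controller has to respect.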

[Figure 1. MPEG/audio encoding process. Blocks: PCM audio input; hybrid transform for time to frequency mapping; FFT; psychoacoustic model II (masking threshold); iteration loop (rate control for bit allocation, distortion control for noise allocation); bitstream formatting with optional ancillary data; encoded bitstream.]

The functionality of each block is described in the following subsections.

2.1.1 Psychoacoustic model II

The psychoacoustic model, a model of human auditory perception, supplies the non-uniform quantization block with information on how the frequency lines should be quantized and scaled according to their perceptual relevance. Relevance here refers to the ability of a signal (the masker) to mask other signals (the maskee). In MPEG/audio coding, the maskee is usually the noise from the non-uniform quantization of the transformed coefficients. Masking is a perceptual property of the human auditory system that occurs when the presence of a strong audio signal makes a temporally or spectrally neighboring weaker audio signal imperceptible. Three types of auditory masking effects are described below:

- The absolute threshold of hearing (ATH): this is characterized by the minimum intensity of a pure tone that the ear can hear in a noiseless environment. The threshold is frequency dependent and typically shows a minimum (indicating the maximum sensitivity of the ear) at frequencies between 1 kHz and 5 kHz. A typical ATH curve is shown in Figure 2.

[Figure 2. The absolute threshold of hearing. Axes: frequency (Hz, logarithmic scale) vs. sound pressure level, SPL (dB).]

[Figure 3. Frequency masking effect combined with ATH. Axes: frequency (kHz) vs. sound pressure level, SPL (dB). Curves: masker, masking threshold, threshold in quiet, maskee.]

- Frequency masking: also called simultaneous masking, this is a frequency domain phenomenon in which a weaker signal (maskee) cannot be perceived in the presence of a simultaneously occurring stronger signal (masker), as long as the two are close enough to each other in frequency. The masking threshold is the level below which any signal is imperceptible; it depends on the sound pressure level and the frequency of the masker. As shown in Figure 3, the complete masking threshold combines the masking threshold of the masker with the absolute threshold of hearing.

- Temporal masking: this is the phenomenon that relatively loud sounds in an audio signal, such as a loud trumpet note, tend to overpower other sounds that occur just before and just after them, as shown in Figure 4.

[Figure 4. Temporal masking effect. Axes: time (ms) vs. sound pressure level, SPL (dB). Regions: pre-masking, simultaneous masking, post-masking.]

2.1.2 Time to frequency mapping transform

The MP3 algorithm uses a hybrid transform to perform the time to frequency mapping. As shown in Figure 5, the hybrid transform includes a 32-channel analysis polyphase filterbank, also called subband analysis, and an MDCT filterbank. Before the frequency lines (transformed coefficients) are passed to the next stage of the encoding process, alias reduction is applied in order to reduce the amount of information to be transmitted.
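As a rough illustration of the second stage of this hybrid transform, the Python sketch below evaluates the MDCT of one already-windowed block directly from its defining sum, assuming the standard Layer III form with block length n (36 for long blocks, 12 for short blocks) and n/2 output lines. It is a plain O(n^2) evaluation meant only to show the arithmetic; the function name `mdct` is illustrative, the windowing and alias reduction steps are left out, and it is not the fast algorithm used in the thesis implementation.

```python
import math


def mdct(z):
    """Direct MDCT of one windowed block z of even length n; returns n/2 lines.

    x(i) = sum_k z(k) * cos(pi / (2n) * (2k + 1 + n/2) * (2i + 1))
    """
    n = len(z)
    if n == 0 or n % 2:
        raise ValueError("block length must be a positive even number")
    half = n // 2
    return [
        sum(
            z[k] * math.cos(math.pi / (2 * n) * (2 * k + 1 + half) * (2 * i + 1))
            for k in range(n)
        )
        for i in range(half)
    ]


if __name__ == "__main__":
    long_block = [1.0] * 36          # placeholder input, not real subband data
    print(len(mdct(long_block)))     # 18 frequency lines for a long block
```

A long block of 36 windowed samples therefore yields 18 frequency lines per subband, and a short block of 12 samples yields 6.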

[Figure 5. Hybrid transform for time to frequency mapping. Blocks: input audio frame; analysis polyphase filterbank; subbands 0 to 31, each followed by an MDCT window and an MDCT; alias reduction (only for long blocks); output transform coefficients. The window selection (normal, start, short or stop) and the long or short block control come from psychoacoustic model II.]

Figure 6 is the functional diagram of the 32-channel analysis polyphase filterbank, which is composed of 32 band-pass filters. The band-pass filter H_i(n) is generated by modulating the low-pass filter h(n) to the ith subband as in (1). The coefficients of h(n) are shown in Figure 7.

$$H_i(n) = h(n) \cdot \cos\left(\frac{\pi \cdot (2i + 1) \cdot (n - 16)}{64}\right), \quad \text{where } n = 0 \sim 511. \qquad (1)$$

[Figure 6. The 32-channel analysis polyphase filterbank. The input audio signal x(n) passes through the band-pass filters H_0(n) to H_31(n), giving P_0(n) to P_31(n); each output is down-sampled by 32 to give the output subband signals S_i(n) = {..., P_i(0), P_i(32), P_i(64), ...} for i = 0 to 31.]

Each group of 32 consecutive audio samples is passed simultaneously into the 32 band-pass filters. The filter outputs (computed with 480 overlapped inputs) are down-sampled to produce the output subband signals.

[Figure 7. The coefficient of low-pass filter, h[n]. Axes: n vs. h(n).]

The MDCT (Modified Discrete Cosine Transform) provides a finer frequency resolution of the 32 subband outputs from the analysis polyphase filterbank, as shown in Figure 5. First, each subband output is windowed. The MDCT uses four types of window, shown in Figure 8 (a) to (d). MP3 specifies two MDCT block lengths: a long block of 18 samples and a short block of 6 samples. The normal, start and stop windows are used in granules designated as long blocks, and the short window is used in granules designated as short blocks. As shown in Figure 8 (e), each window overlaps its neighboring window by 50%, so the window sizes are 36 and 12 respectively. The start and stop windows are the so-called adaptive windows: the start window provides the transition from the normal window to the short window, and the stop window provides the transition from the short window back to the normal window. Which window is used is determined by PAM-II. In general, the long block gives better frequency resolution (less block effect) for stationary signals, and the short block gives better time resolution for transients.

[Figure 8. The four types of MDCT window and the arrangement. Panels: (a) normal, (b) start, (c) stop, (d) short, (e) the arrangement of overlapping MDCT windows.]

The formula of the MDCT is shown in (2):

$$x(i) = \sum_{k=0}^{n-1} z(k) \cdot \cos\left(\frac{\pi}{2n} \cdot \left(2k + 1 + \frac{n}{2}\right) \cdot (2i + 1)\right), \quad \text{where } i = 0 \sim \frac{n}{2} - 1. \qquad (2)$$

Here n is 36 for a long block and 12 for a short block.

2.1.3 Iteration loop

The iteration loop performs the quantization and Huffman coding needed to achieve a high compression ratio. This block outputs the coded data, shaped to suit the human auditory system, together with the associated side information. The iteration loop allocates bits and allowable noise to each scalefactor band using two main modules: the outer and inner iteration loops.

The outer iteration loop, also called the distortion control loop, controls the quantization noise produced by the non-uniform quantization within the inner iteration loop. If the quantization noise in a scalefactor band is found to exceed the masking threshold obtained from PAM-II, the scalefactor of that band is adjusted to reduce the noise. The outer loop is executed until the actual noise is below the masking threshold in every scalefactor band.

The inner iteration loop, also called the rate control loop, does the actual quantization. The quantized coefficients are then Huffman coded, and the number of coded bits is counted. If the Huffman coder demands more bits than the frame can supply, the quantizer parameter is adjusted. The inner iteration loop is repeated with different quantizer parameters until the number of bits demanded by the Huffman coder is small enough. The Huffman coding algorithm is based on 32 static Huffman tables; it provides lossless compression and thereby reduces the amount of data to be transmitted without loss of quality.

2.1.4 Bitstream formatting

This block produces the MPEG/audio Layer III compliant bitstream. The Huffman coded frequency lines, side information and frame header are assembled to form the bitstream. Ancillary data not necessarily related to the audio frame can be inserted into the coded bitstream.

Table 1 summarizes the complexity of the MP3 encoding algorithm in DSP MIPS. According to the analysis, the PAM and the iteration loops are the two most time-critical processes. PAM-II normally requires transcendental computations such as logarithm, exponential and power, which are often computationally demanding.

Table 1. Predicted complexity to implement MPEG/audio encoder [4]

MP3 Encoder         MIPS
Hybrid transform    25
PAM-II              90
Iteration loop      70
Etc.                5
Total               190

The other computationally demanding task is the iteration loop, also called the bit or noise allocation process. This process finds the optimal quantization parameters and scalefactors to obtain the best audio quality within a limited bit budget. Because the quantizer is non-uniform, MP3 adopts an iterative approach to evaluate the parameters; it is thus based on an analysis-by-synthesis scheme. Experimental results show that the number of iterations per audio granule can reach 50. It should also be mentioned that the number of iterations depends on the characteristics of the input signal, so the number of execution cycles varies from frame to frame.

2.2. Simplified PAM-II

As shown in Table 1, the traditional MP3 encoding algorithm consumes too many MIPS and is hard to implement on power-limited devices. Since the complexity analysis shows that both of the most computationally demanding processes are related to the ISO PAM-II, we first consider the possibility of encoding without it [4].
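The cost of the iterative inner (rate control) loop described in Section 2.1.3 can be pictured with the following Python sketch, which raises the quantizer step one unit at a time until the coded block fits the bit budget. Several parts are assumptions made for illustration: the quantizer follows the form used in the ISO reference encoder as commonly stated, nint((|xr| / 2^(step/4))^0.75 - 0.0946); `estimate_bits` is a made-up stand-in for the 32 Huffman tables; and the search always starts from step 0. It is not the fast rate control loop proposed in this thesis.

```python
def quantize(xr, step):
    """Non-uniform quantizer: nint((|xr| / 2**(step / 4)) ** 0.75 - 0.0946)."""
    scale = 2.0 ** (-step / 4.0)
    # For v >= 0, floor(v + 0.4054) equals nint(v - 0.0946).
    return [int((abs(x) * scale) ** 0.75 + 0.4054) for x in xr]


def estimate_bits(ix):
    """Hypothetical bit counter standing in for the real Huffman tables."""
    return sum(2 * v.bit_length() + 1 for v in ix if v > 0)


def rate_control_loop(xr, available_bits, count_bits=estimate_bits):
    """Linear search: enlarge the step until the quantized block fits the budget."""
    if available_bits < 0:
        raise ValueError("available_bits must be non-negative")
    step = 0
    iterations = 0
    while True:
        iterations += 1
        ix = quantize(xr, step)
        if count_bits(ix) <= available_bits:
            return step, ix, iterations
        step += 1  # coarser quantization, fewer bits


if __name__ == "__main__":
    spectrum = [1000.0, 500.0, 250.0, 10.0, 0.0]   # illustrative values only
    step, ix, iterations = rate_control_loop(spectrum, available_bits=20)
    print(step, ix, iterations)
```

Because every pass re-quantizes and re-counts the whole granule, the total cost grows with the number of passes, which is why both the starting value and the search strategy of the quantizer parameter matter.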

2.2.1 Distortion control loop analysis

Figure 9 shows the traditional iterative approach to distortion control. After the rate control loop quantizes the spectral lines, the distortion control loop first reconstructs the spectrum by inverse quantization of the quantized values, so that the distortion introduced by the quantization can be evaluated. If the distortion exceeds the masking threshold in a scalefactor band, the original signal in that band is amplified, which also amplifies the masking threshold. The pre-emphasis process turns the pre-emphasis flag on and amplifies the whole spectrum by pre-defined factors if all of the upper four scalefactor bands have unmasked distortion.

[Figure 9. Distortion control in the iteration loops. Blocks: rate control loop; distortion calculation; pre-emphasis; amplify scalefactor band; loop break condition.]

The motivation for simplifying PAM-II is that PAM-II is ineffective above a certain bit rate [4]. Figure 10 shows the masking threshold and the quantization noise in each scalefactor band before distortion control is applied. The signal was encoded at 128 kbps and 256 kbps stereo with a 44.1 kHz sampling rate.
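The decision taken in one pass of the distortion control loop described in this subsection can be sketched in Python as follows. The function name, the use of dB values and the return format are assumptions made for illustration; the sketch only covers the comparison against the masking threshold, the choice of bands to amplify, and the pre-emphasis condition on the upper four scalefactor bands, and it treats "no band left to amplify" as the point where the loop stops.

```python
def distortion_control_step(noise_db, threshold_db):
    """Compare per-band quantization noise with the masking threshold.

    Returns (bands_to_amplify, preemphasis, done):
      bands_to_amplify  indices of scalefactor bands with unmasked distortion
      preemphasis       True if all of the upper four bands are unmasked
      done              True if no band exceeds its masking threshold
    """
    if len(noise_db) != len(threshold_db):
        raise ValueError("one noise value and one threshold per scalefactor band")
    unmasked = [n > t for n, t in zip(noise_db, threshold_db)]
    bands_to_amplify = [band for band, flag in enumerate(unmasked) if flag]
    preemphasis = len(unmasked) >= 4 and all(unmasked[-4:])
    done = not bands_to_amplify
    return bands_to_amplify, preemphasis, done


if __name__ == "__main__":
    # Made-up numbers for six bands, purely to exercise the function.
    noise = [-70.0, -65.0, -60.0, -55.0, -50.0, -45.0]
    threshold = [-60.0, -60.0, -62.0, -58.0, -55.0, -50.0]
    print(distortion_control_step(noise, threshold))
```

In a full encoder this check would sit inside the outer loop, with the rate control loop re-run after each amplification.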

(29) [Figure 10. Noise analysis before distortion control. (a) Sample granule: distortion (dB) versus scalefactor band at 128kbps and 256kbps, together with the masking threshold. (b) Average distortion: distortion (dB) versus scalefactor band at 128kbps, 128kbps with bandwidth control, and 256kbps, together with the masking threshold.]

As observed in Figure 10 (a), at 256kbps the distortion is much lower than the masking threshold, so no distortion control is needed. At 128kbps, however, the distortion exceeds the threshold at the 19th and higher scalefactor bands, and distortion control is needed to shape the noise.

(30) Figure 10 (b) shows the average distortion. Similarly, the distortion only exceeds the threshold at the higher scalefactor bands. Related research has investigated the contribution of PAM-II to distortion control [4]. By analyzing the number of distortion control iterations and the results of subjective quality preference tests, Oh et al. [4] showed that PAM-II is unnecessary when the bit rate is over 256kbps. To recover the audio quality at lower bit rates, they proposed a bandwidth control scheme. Subjective tests revealed that listeners prefer sound with a limited bandwidth to sound with full bandwidth but unmasked distortion. In this thesis, we also employ bandwidth control of the input signal. Figure 11 shows the bandwidth coefficient versus the demanded bit rate.

[Figure 11. Coefficients of bandwidth controller (sampling rate is 44100 Hz): the corresponding cut-off frequency of each bit rate is obtained from LAME [14]. Plot of cut-off frequency (Hz) versus bit rate (kbps). Cut-off frequency axis ticks: 5091, 5895, 6852, 7886, 8843, 10298, 11905, 13705, 15389, 16805, 19677, 20787, 22050. Bit rate axis ticks: 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320.]

A low pass filter is applied in the bandwidth control. The ith frequency line

(31) $x_{f,g}(i)$ in the $g$th granule of the $f$th frame is filtered by (3),

\[ L(x_{f,g}(i)) = \begin{cases} x_{f,g}(i), & \text{if } i \le \mathrm{nint}\left(\frac{\Omega_c}{f_s/2} \times 576\right) \\ 0, & \text{if } i > \mathrm{nint}\left(\frac{\Omega_c}{f_s/2} \times 576\right) \end{cases} \tag{3} \]

where the cut-off frequency $\Omega_c$ is defined as the bandwidth coefficient and $f_s$ is the sampling rate. In other words, bandwidth control lets the encoder allocate more bits to the low frequency bands, which improves their quality. Figure 10 (b) shows the decrease of the average distortion when bandwidth control is employed. Experiments show that the bandwidth control scheme is effective when the MP3 is encoded at lower bit rates.

In this thesis, we propose the removal of ISO PAM-II and related processes such as distortion control and window switching, and employ an efficient allocation of the limited bit resource to recover the audio quality. The proposed method for allocating the bit resource more efficiently is addressed later.

2.2.2 Removal of window switching

Modern audio compression algorithms often use dynamic window switching to avoid preechoes. A preecho occurs when the amplitude of the audio signal rises sharply within an instant, as observed in Figure 12 (a). If the algorithm cannot encode signals of different characteristics separately, the signals it groups together are encoded with the same quantization parameter, i.e. the quantization noise spreads over the whole block, and it is hard to obtain a better coding gain. In a transform coding based algorithm, signals of the same time slice are always grouped first and then encoded together, so preechoes are unavoidable. Psychoacoustics, however, reveals that a preecho shorter than 18ms can be masked by the loud sound that follows it.

(32) Thus the algorithm can group a smaller number of samples and encode them together, even though a preecho is still produced. That is why dynamic window switching is needed in the MP3 algorithm.

In general, PAM-II detects the appearance of a preecho by calculating the perceptual entropy (PE), i.e. the predicted number of bits needed to encode the granule. PAM-II, however, is not implemented in the proposed design for power-limited devices, and it is not easy to find another metric for detecting preechoes. Related research shows that encoding without window switching does not have a significant negative effect on the audio quality [4]. Figure 12 (b) and (c) show the time domain waveforms encoded at 128kbps with and without window switching. In both cases, a preecho appears as a notable difference around the 7000th sample (the 6000th sample of the source signal).
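Returning to the bandwidth control of (3) in Section 2.2.1, the filter simply zeroes every frequency line above a cut-off index. The following is a minimal sketch of that step, not the thesis implementation. The function names, the zero-based line index and the rounding convention of `nint` are assumptions made for illustration.

```python
import math

LINES_PER_GRANULE = 576  # frequency lines per granule, as in (3)

def nint(value):
    """Round to the nearest integer, halves rounded up (assumed convention)."""
    return math.floor(value + 0.5)

def cutoff_line(cutoff_hz, sample_rate_hz):
    """Index of the last frequency line kept by the low pass filter in (3)."""
    return nint(cutoff_hz / (sample_rate_hz / 2) * LINES_PER_GRANULE)

def bandwidth_control(lines, cutoff_hz, sample_rate_hz):
    """Zero every line whose (assumed zero-based) index exceeds the cut-off."""
    limit = cutoff_line(cutoff_hz, sample_rate_hz)
    return [x if i <= limit else 0 for i, x in enumerate(lines)]

# Example: a flat granule filtered at a quarter of the sampling rate.
granule = [1] * LINES_PER_GRANULE
filtered = bandwidth_control(granule, 11025, 44100)
```

Because the filter only clears coefficients, it costs no multiplications, and the cleared lines need no bits in the later quantization and Huffman coding stages.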

(33) [Figure 12. Time domain waveforms from using window switching or not. (a) Source signal. (b) Encoded with window switching. (c) Encoded without window switching. Horizontal axis: sample index from 0 to 12000; vertical axis: amplitude from -1 to 1 (x 10^4).]

(34) 2.3. Fast rate control loop

As observed in Figure 13, the rate control loop, also called the inner iteration loop, allocates the bit resources to each frequency line through quantization and Huffman coding. The difficulty lies in finding an optimal quantizer parameter, also called the global gain, and selecting a suitable Huffman table. The ISO standard adopts a step-by-step approach that searches for the optimal parameter from an initial value determined by the spectral flatness measure. Because the Huffman coder accepts only a limited input range, additional quantization iterations are spent guaranteeing that the quantized output falls within that range. In this thesis, we propose a new rate control algorithm, whose flowchart is illustrated in Figure 14. With the removal of PAM-II and the related distortion control loop, the iteration loops are also simplified as shown in Figure 14.

[Figure 13. Rate control in iteration loops. Flowchart: initialize the quantizer parameter (step-size); non-linear quantization; Huffman coding; test "# coded_bits < max_bits"; on N, tune the step-size and quantize again; on Y, continue to distortion control and the loop break condition.]

(35) [Figure 14. The new rate control algorithm. Flowchart: compute $\hat{x}_{f,g}(i) = x_{f,g}(i)^{0.75}$ and $\hat{X}_{f,g}(i) = \max_i\{\hat{x}_{f,g}(i)\}$; dynamic bit allocation (energy distribution between granules); variable initialization (derive the initial value of quantizerStepSize); first trial (apply quantization and count the bits needed); iterative search (apply quantization and count the bits needed); finalizing process (update the predicted value).]

2.3.1 Non-uniform quantization

In the ISO MP3 algorithm, the non-uniform quantizer is defined as (4),

\[ y_{f,g}(i) = \mathrm{nint}\left(\left(\frac{x''_{f,g}(i)}{2^{\frac{\delta+q}{4}}}\right)^{0.75} - 0.0946\right), \tag{4} \]

where nint is a rounding function, $q$ is the lower bound of the quantization parameter, i.e. the initial value, $\delta$ is the increasing variable, and $x''_{f,g}(i)$ is the $i$th frequency line pre-emphasized (5) and amplified (6) in the distortion control loop.

\[ x'_{f,g}(i) = x_{f,g}(i) \times \sqrt{2}^{\,z_2 \times (1+z_1) \times P(b_i)}, \text{ and} \tag{5} \]

(36)
\[ x''_{f,g}(i) = x'_{f,g}(i) \times \sqrt{2}^{\,(1+z_1) \times C(b_i)}, \tag{6} \]

where $x_{f,g}(i)$ represents the original frequency line, $i$ is the index of the spectral line, $z_2 \in \{0, 1\}$ switches preemphasis on or off, and $z_1 \in \{0, 1\}$ determines whether the scalefactors are logarithmically quantized with a step size of 2 or $\sqrt{2}$. $b_i$ is the scalefactor band index of the $i$th spectral line, $P(\cdot)$ is the preemphasis table defined in [5], and $C(b_i)$ is the scalefactor of the scalefactor band $b_i$. Since distortion control is not used in this implementation, (5) and (6) no longer apply, and (4) simplifies to (7),

\[ y_{f,g}(i) = \mathrm{nint}\left(\left(\frac{x_{f,g}(i)}{2^{\frac{\Delta_{f,g}}{4}}}\right)^{0.75} - 0.0946\right), \tag{7} \]

where the quantization parameter $\Delta_{f,g} = \delta + q$.

(7) is executed iteratively while searching for an optimal $\Delta_{f,g}$. The rounding function nint is unnecessary in the fixed point implementation. We can further rearrange (7) to (8),

\[ y_{f,g}(i) = x_{f,g}(i)^{0.75} \times 2^{-\frac{3 \times \Delta_{f,g}}{16}} - 0.0946. \tag{8} \]

In the rate control iteration, $\Delta_{f,g}$ is the only running variable, so $x_{f,g}(i)^{0.75}$ can be taken out of the iteration. The quantizer can therefore be decomposed into two equations, where one is calculated outside the iteration (9),

\[ \hat{x}_{f,g}(i) = x_{f,g}(i)^{0.75}, \tag{9} \]

(37) and another is calculated in the iteration (10),

\[ y_{f,g}(i) = \hat{x}_{f,g}(i) \times 2^{-\frac{3 \times \Delta_{f,g}}{16}} - 0.0946. \tag{10} \]

The decomposition reduces the complexity of the non-uniform quantizer. The most computationally demanding process, the $x_{f,g}(i)^{0.75}$ function, is calculated only once in each granule, and the iterative equation (10) can be simplified in the fixed point implementation to one multiplication, one shift operation and one subtraction.

The implementation of (9) is optimized for the target DSP, ADSP-2181. The unsigned 16-bit fixed point inputs $x_{f,g}(i)$, ranging from 0 to 65535, are divided into two regions. The first region, covering the range from 0 to 31, is implemented using a 32-word lookup table. From the probability model of $x_{f,g}(i)^{0.75}$, the first region covers over 60 percent of the inputs, so a small lookup table is applied here to speed up the calculation. The second region, from 32 to 65535, is implemented using piecewise linear interpolation over 11 sub-regions. The segmentation is also optimized for the target DSP. Since the ADSP-2181 supports a hardware detector of leading ones/zeros, a biased $\log_2(x)$ can be derived in one instruction cycle. The boundaries of the sub-regions are therefore set to powers of 2, i.e. 32, 64, 128, …, 65536. The approximation error has been analyzed using (11),

\[ \varepsilon(x_{f,g}(i)) = \frac{x_{f,g}(i)^{0.75} - \mathrm{pow075}(x_{f,g}(i))}{x_{f,g}(i)^{0.75}}, \tag{11} \]

where pow075 is the implementation of the proposed approximation method, also represented by $Q_1(\cdot)$. Figure 15 shows the error to real output ratio, $\varepsilon$. The ratio is around ±1%.
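The hybrid scheme described above can be sketched as follows. This is an illustration under stated assumptions rather than the thesis code: each linear segment is taken as the chord through its two end points, because the thesis does not list its coefficient table, and `bit_length()` stands in for the leading-zero detector of the DSP. With chords the error is one-sided and slightly above 1%, whereas the fitted segments of the thesis give the roughly ±1% shown in Figure 15.

```python
# Lookup table for the first region, 0..31 (32 words).
LUT = [x ** 0.75 for x in range(32)]

# One (slope, intercept) pair per sub-region with boundaries 32, 64, ..., 65536.
# ASSUMPTION: chord through the segment end points.
SEGMENTS = []
for k in range(5, 16):
    lo, hi = 1 << k, 1 << (k + 1)
    slope = (hi ** 0.75 - lo ** 0.75) / (hi - lo)
    SEGMENTS.append((slope, lo ** 0.75 - slope * lo))

def pow075(x):
    """Approximate x**0.75 for an unsigned 16-bit integer x."""
    if x < 32:
        return LUT[x]
    # The bit length selects the power-of-two sub-region in one step.
    slope, intercept = SEGMENTS[x.bit_length() - 6]
    return slope * x + intercept

def relative_error(x):
    """Error to real output ratio, as defined in (11)."""
    exact = x ** 0.75
    return (exact - pow075(x)) / exact

worst = max(abs(relative_error(x)) for x in range(1, 65536))
```

Choosing power-of-two boundaries means the segment index falls out of the exponent of the input, so no comparison chain is needed to locate the segment.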

(38) Table 2 summarizes the number of DSP instruction cycles needed to calculate the two regions.

[Figure 15. The error of the $x_{f,g}(i)^{0.75}$ approximation. Vertical axis: $\varepsilon(x_{f,g}(i))$ from -1% to 0.5%; horizontal axis: $x_{f,g}(i)$ from 0 to 6 x 10^4.]

Table 2. The number of DSP instruction cycles in calculation of two regions.

Input range    DSP instruction cycles    Probability    Table size
0 ~ 31         4                         > 60%          32 words
32 ~ 65535     9                         < 40%          22 words

We can rewrite (10) as (12),

\[ y_{f,g}(i) = \hat{x}_{f,g}(i) \times 2^{\Delta_Q} \times 2^{\Delta_N} - 0.0946, \tag{12} \]

where $\Delta_N$ is the integer part and $\Delta_Q$ is the fractional part of $\frac{-3 \times \Delta_{f,g}}{16}$. In the fixed point implementation, the multiplication by $2^{\Delta_N}$ can be easily implemented by the hardware barrel shifter. And the $2^{\Delta_Q}$ is derived from a

(39) 16-word lookup table that contains the fixed point values $2^{0/16}, 2^{1/16}, \ldots, 2^{15/16}$. The implementation is denoted as $Q_2(\cdot)$.

2.3.2 Dynamic bit allocation proportional to the energy of granules

In the MP3 bitstream, each frame has a fixed amount of bit resources at a constant bit rate. With the help of bit reservoir control, unused bits can be saved in the reservoir if the quantization distortion is imperceptible, which benefits the encoding of succeeding frames. In the proposed algorithm without PAM-II and distortion control, however, the quality constraint no longer exists, so the rate control loop uses as many of the available bits as it can to encode one audio granule. Normally an audio frame contains two (mono) or four (stereo) granules. The traditional MP3 algorithm divides the total bits equally among the granules in each frame. We propose an asymmetric allocation of the bit resources, proportional to the energy of the granules, to equalize the quality, i.e. the allowed distortion, between granules in the same frame.

In general, a transformed coefficient with a higher amplitude is quantized to a higher integer value, and from the property of the Huffman code words, an integer input with a higher value is usually coded with more bits. The idea extends to a group of coefficients, i.e. a granule. If a granule has more energy, or more coefficients with higher amplitude, it needs more bits to maintain the same quality as the others. Considering the non-uniform property of the quantizer, the power function is taken into account. From experimental results, the frequency lines below 4,000Hz dominate the full bandwidth (22050Hz) energy. In the proposed approach, for a sampling rate of 44.1kHz the score of the granule energy defined as (13) takes only $\frac{4000}{44100/2} \times 576 \approx 105$ spectral lines of $x_{f,g}(i)^{0.75}$ for

(40) calculation.

\[ E_{f,g} = \sum_{i=1}^{105} \hat{x}_{f,g}(i), \tag{13} \]

where $\hat{x}_{f,g}(i)$ are determined from (9), and $E_{f,g}$ is the energy score of the granule. The resource allocation is not exactly proportional to the granule score. The modification shown in (14) takes the minimum resource into account.

\[ B_{f,g} = b_{f,g} + \frac{E_{f,g}}{\sum_g E_{f,g}} \times B_p, \tag{14} \]

where $b_{f,g}$ is the minimum number of encoding bits given by (15), and $B_p$ is the number of bits left to distribute among the granules, given by (16). $B_f$ is the total available number of bits in the frame.

\[ b_{f,g} = \begin{cases} \frac{B_f}{6}, & \text{Mono} \\ \frac{B_f}{12}, & \text{Left/Right channel} \\ \frac{B_f}{9}, & \text{Mid channel} \\ \frac{B_f}{18}, & \text{Side channel} \end{cases} \tag{15} \]

\[ B_p = B_f - \sum_g b_{f,g} \tag{16} \]

In this thesis, we also propose a fast search approach consisting of two parts. One is the precise initialization of the quantization parameter, i.e. $\Delta_{f,g}$. The other is the fast search for the optimal quantizer parameter.

2.3.3 Precise initialization of the quantization parameter

In the ISO MP3 algorithm, the initial value of quantization parameter is

(41) derived as (17),

\[ q = 8.0 \times \ln(\mu_{f,g}). \tag{17} \]

The spectral flatness measure, $\mu_{f,g}$, is defined as (18). The derivation contains complex non-linear mathematics and is inefficient in a fixed point implementation.

\[ \mu_{f,g} = \frac{e^{\frac{1}{576} \cdot \left(\sum_{i=0}^{575} \ln\left(x_{f,g}(i)^2\right)\right)}}{\frac{1}{576} \cdot \left(\sum_{i=0}^{575} x_{f,g}(i)^2\right)} \tag{18} \]

We propose that the initial value of $\Delta_{f,g}$ is predicted from the value of the previous granule and a lower bound. To derive the lower bound we consider (10) again. From the property of the Huffman table, the quantized value $y_{f,g}(i)$ has an upper bound of 8207. The lower bound of $\Delta_{f,g}$ then follows directly from the derivation in (19),

\[ \begin{aligned} & 8207 > \hat{x}_{f,g}(i) \times 2^{-\frac{3 \times \Delta_{f,g}}{16}} - 0.0946 \\ \Rightarrow\ & 8207 > \max_i\{\hat{x}_{f,g}(i)\} \times 2^{-\frac{3 \times \Delta_{f,g}}{16}} - 0.0946 \\ \Rightarrow\ & \Delta_{f,g} > -\frac{16}{3} \log_2\left(\frac{8207 + 0.0946}{\max_i\{\hat{x}_{f,g}(i)\}}\right) \\ \Rightarrow\ & \Delta_l = \left\lceil \frac{16}{3} \log_2\left(\max_i\{\hat{x}_{f,g}(i)\}\right) - 69.35 \right\rceil \end{aligned} \tag{19} \]

The lower bound, $\Delta_l$, guarantees that the quantized value stays within the range of the Huffman table. The initialization with prediction is therefore derived as (20),

\[ \Delta_{f,g}(n) = \max\{\Delta_l,\ \Delta_{f,g}(n-1) + \sigma\}, \tag{20} \]

(42) where $n$ is the iteration index of the iterative search in Figure 14 and starts from zero, $\Delta_{f,g}(-1) = \Delta_{f-1,g}$, $\Delta_{0,g}(-1) = -150$, and $\sigma$ is the addend of the step size, which is equal to zero during initialization.

The effectiveness of the proposed method is demonstrated by comparing the difference between the initial value and the final value. Figure 16 shows the difference histograms of the ISO MP3 method and the proposed method. According to the statistics, over 60 percent of the predicted initial values are very close to the final values, i.e. the difference $|\varepsilon| \le 1$. The precise choice of the initial value reduces the number of iterations of the following iterative search.
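As a small sketch of (19) and (20), the lower bound and the predicted starting value can be computed as below. The function names are hypothetical, and the constant is evaluated from 8207 + 0.0946 instead of being hard-coded as 69.35; `xq_max` must be positive.

```python
import math

MAX_QUANTIZED = 8207      # upper bound of the Huffman input range, from (19)
ROUND_OFFSET = 0.0946     # constant subtracted in (10)

def lower_bound(xq_max):
    """Delta_l of (19): smallest step size that keeps max(xq) in range."""
    limit = math.log2(MAX_QUANTIZED + ROUND_OFFSET)   # 16/3 * limit is about 69.35
    return math.ceil(16.0 / 3.0 * (math.log2(xq_max) - limit))

def initial_step(low, previous, sigma=0):
    """Delta(n) of (20): predicted value, clamped from below by Delta_l."""
    return max(low, previous + sigma)

def quantize(xq, delta):
    """Unrounded quantizer output of (10), used here only to check the bound."""
    return xq * 2.0 ** (-3.0 * delta / 16.0) - ROUND_OFFSET

# Example: the largest possible input, 65535**0.75, in the first granule.
xq_max = 65535 ** 0.75
low = lower_bound(xq_max)
first = initial_step(low, -150)   # Delta_{0,g}(-1) = -150 gives way to the bound
```

Starting at or above the lower bound removes the overflow test that the ISO loop repeats after every quantization pass.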

(43) [Figure 16. The histogram of the difference between the initial value and the final value of $\Delta_{f,g}$ (counts versus difference). The accuracy is determined by the expected difference between the initial value and the final value. (a) ISO MP3 method in (17) and (18), which initializes it by the measure of spectral flatness: expected difference = 39.31, differences spread over about -80 to 80. (b) Proposed method in (19) and (20), which initializes it by the value of the previous granule and a lower bound: expected difference = -0.037, differences spread over about -30 to 30.]
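The search described in the next section takes its bit budget $B_{f,g}$ from (14). As a reminder of how that budget is formed, the following sketch applies (14) to (16) to one frame. The dictionary keys, the integer floor and the equal split for an all-zero frame are assumptions; the thesis does not state how fractions of a bit or silent frames are handled.

```python
# Divisor of the frame budget B_f for the minimum share b of (15).
MIN_SHARE_DIVISOR = {"mono": 6, "left_right": 12, "mid": 9, "side": 18}

def allocate_bits(frame_bits, granules):
    """Split frame_bits over granules following (14) to (16).

    granules is a list of (channel_type, energy_score) pairs, where the
    energy score is E of (13).
    """
    minimum = [frame_bits // MIN_SHARE_DIVISOR[kind] for kind, _ in granules]
    pool = frame_bits - sum(minimum)                  # B_p of (16)
    total_energy = sum(energy for _, energy in granules)
    if total_energy == 0:
        # ASSUMPTION: with no energy anywhere, share the pool equally.
        return [low + pool // len(granules) for low in minimum]
    return [low + int(energy / total_energy * pool)   # B of (14)
            for low, (_, energy) in zip(minimum, granules)]

# Example: a stereo frame whose first granule carries most of the energy.
budget = allocate_bits(1800, [("left_right", 3), ("left_right", 1),
                              ("left_right", 0), ("left_right", 0)])
```

In each channel mode the minimum shares add up to one third of the frame, so two thirds of the bits follow the energy distribution.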

(44) 2.3.4 Fast search of the optimal quantizer parameter

[Figure 17. The adaptive approach to iterative search of the optimum parameter. Flowchart in three stages:
First trial. Nonuniform quantization: quantize_tj (xq, ix, quantizerStepSize); bits = countHuffBits (ix). Update addend of step size: jump = updateJumpSize(bits-max_bits, step[ch]). If jump = 0, go to the final trial; if jump != 0, enter the iterative search.
Iterative search. Update step size of quantizer: quantizerStepSize = max(quantizerStepSize+jump, low_bound). Nonuniform quantization: quantize_tj (xq, ix, quantizerStepSize); bits_modify = countHuffBits (ix). Update step unit: updateUnitStep(bits, bits_modify, jump, step+ch); bits = bits_modify. Update addend of step size: jump = updateJumpSize(bits-max_bits, step[ch]). Repeat until a loop break condition holds.
Final trial. If bits ≤ max_bits, exit. Otherwise quantizerStepSize += 1; quantize_tj (xq, ix, quantizerStepSize); bits = countHuffBits (ix); and test again.]

Figure 17 illustrates our approach to the iterative search. Table 3 describes the symbols used in Figure 17.
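The quantize_tj step in Figure 17 is the implementation of (12). The sketch below shows the table-and-shift structure for a single value; it is a floating point illustration with a simplified scalar signature, not the fixed point routine of the thesis. `math.ldexp` stands in for the barrel shifter, and the result is left unrounded.

```python
import math

ROUND_OFFSET = 0.0946
# 16-word table of 2**(k/16), k = 0..15, for the fractional exponent part.
FRACTION_TABLE = [2.0 ** (k / 16.0) for k in range(16)]

def quantize_tj(xq, step_size):
    """Q2 of (12): xq * 2**(-3*step_size/16) - 0.0946 via table and shift."""
    # Split -3*step_size/16 into an integer part and a fraction k/16.
    integer_part, k = divmod(-3 * step_size, 16)
    # ldexp multiplies by 2**integer_part, i.e. a left or right shift.
    return math.ldexp(xq * FRACTION_TABLE[k], integer_part) - ROUND_OFFSET

# Example: the same coefficient quantized with two step sizes.
coarse = quantize_tj(1000.0, 7)
fine = quantize_tj(1000.0, -5)
```

Floor division keeps the fractional index in 0..15 for both signs of the step size, which is why one 16-word table is enough.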

(45) Table 3. Symbols descriptions of Figure 17.

Symbol name          Abbreviation            Description
xq                   $\hat{x}_{f,g}$         The frequency lines raised to the power 0.75 in (9)
ix                   $y_{f,g}$               The quantized integer value in (10)
quantizerStepSize    $\Delta_{f,g}$          Quantizer parameter, also called global gain
low_bound            $\Delta_l$              The lower bound of $\Delta_{f,g}$ guaranteeing that the quantized value can be coded within the Huffman table
jump                 $\sigma$                Addend of $\Delta_{f,g}$
step                 $\rho_c$                Predicted difference in the number of bits used in Huffman coding when $\Delta_{f,g}$ is increased by one
max_bits             $B_{f,g}$               The bit budget of this granule determined from (14)
bits, bits_modify    $b_h$, $\hat{b}_h$      Number of bits used in Huffman coding of the quantized values
quantize_tj          $Q_2(\cdot)$            Implementation of (10)
countHuffBits        $C_h(\cdot)$            Counting the number of bits used in Huffman coding of $y_{f,g}$
updateUnitStep       $U_s(\cdot)$            Updating $\rho_c$ from the results of the latest two iterations
updateJumpSize       $U_j(\cdot)$            Updating $\sigma$ from $\rho_c$

The proposed approach can be divided into three parts. The first part, the first trial, performs quantization with the initial value of $\Delta_{f,g}$ derived from (20): $\hat{x}_{f,g}$ are quantized to $y_{f,g}$ by $Q_2(\cdot)$. The following $C_h(\cdot)$ chooses appropriate Huffman tables for $y_{f,g}$ and counts the number of coded bits, $b_h(0)$. Based on $\rho_c$ and the difference between $b_h(0)$ and $B_{f,g}$, a new $\sigma$ is derived by $U_j(\cdot)$. A $\sigma$ equal to zero implies that $b_h(0)$ is very close to $B_{f,g}$; in that case the iterative search part is omitted and the final trial is applied directly.

The iterative search is applied when $\sigma$ is not equal to zero. $n$ is

(46) representing the iteration index. In the $n$th iteration, (20) is used to derive $\Delta_{f,g}(n)$, and $n$ starts from one, since $\Delta_{f,g}(0)$ is used in the first trial. After the update of $\Delta_{f,g}(n)$, $Q_2(\cdot)$ and $C_h(\cdot)$ are used to obtain $y_{f,g}$ and $\hat{b}_h$. The difference in the number of bits from the previous iteration, $\hat{b}_h - b_h(n-1)$, together with $\sigma$ and $\rho_c(n-1)$, is sent to $U_s(\cdot)$, and a new $\rho_c(n)$ is obtained. As in the first trial, $U_j(\cdot)$ determines a new $\sigma$ to be used in the $(n+1)$th iteration. The iterative search block terminates when one of the following loop break conditions holds:

- $n$ is greater than 5,
- $|\hat{b}_h - b_h(n-1)|$ is less than 32,
- $\sigma$ is zero.

The final trial is applied to guarantee that $b_h \le B_{f,g}$. Unlike the iterative search, a fine tuning of $\Delta_{f,g}$ is applied here to prevent a deadlock loop condition.

For example, let $B_{f,g} = 1000$, $\Delta_{f,g}(0) = -80$, $\Delta_l = -100$, and $\rho_c(0) = 100$ (obtained from the previous granule). $\hat{x}_{f,g}$ are passed to $Q_2(\cdot)$ and $C_h(\cdot)$, and we obtain $b_h(0) = 500$. $U_j(\cdot)$ updates by

\[ \sigma = \mathrm{nint}\left(\frac{b_h(0) - B_{f,g}}{\rho_c(0)}\right) = -5 \]

after the first trial. Since $\sigma$ is not equal to zero, the iterative search is applied. (20) updates $\Delta_{f,g}(1)$ to -85. $Q_2(\cdot)$ and $C_h(\cdot)$ are then executed again to

(47) obtain $\hat{b}_h = 900$. $U_s(\cdot)$ updates by

\[ \rho_c(1) = \mathrm{nint}\left(\frac{3}{4} \cdot \rho_c(0) + \frac{1}{4} \cdot \left|\frac{\hat{b}_h - b_h(0)}{\sigma}\right|\right) = 95 \]

in the 1st iteration. Then $\Delta_{f,g}(2)$ is updated again to -86. The process is repeated until any one of the loop break conditions holds.

In this thesis, we propose a new iteration loops algorithm. Figure 18 compares the pseudo code of the ISO method and the proposed method. With the removal of PAM-II, the distortion control is also removed, and the rate control is optimized for speed. The solid lines in Figure 18 link the blocks with the same functionality that are optimized in the proposed method. The dotted lines link the blocks that use a different measurement in the proposed method. The boldface marks the blocks added by the proposed method.

(a) ISO iteration loop

For // Granule loop
    quantizerStepSize = nint(system_const * log(sfm(xr)));
    do { // Outer loop
        quantizerStepSize -= 1;
        do { // Inner loop
            do {
                quantizerStepSize += 1;
                pow075(xr, xq);
                quantize_tj(xq, ix, quantizerStepSize);
            } while (testOverflow(ix));
            bits = countHuffmanBits(ix);
        } while (bits > max_bits);
        loop_break = distortionControl(xr, ix, threshold, scf);
    } while (loop_break);
end

(b) Proposed iteration loop

pow075(xr, xq);
bitAllocation(xq, frame_bits, max_bits);
For // Granule loop
    low_bound = (int)(16*log2(xq_max)/3 - 69.35);
    quantizerStepSize = max(low_bound, predict_quantizerStepSize);
    quantize_tj(xq, ix, quantizerStepSize);
    bits = countHuffmanBits(ix);
    jump = updateJumpSize(bits-max_bits, step);
    if (jump != 0) {
        while (1) {
            quantizerStepSize = max(low_bound, quantizerStepSize+jump);
            quantize_tj(xq, ix, quantizerStepSize);
            bits_modify = countHuffmanBits(ix);
            updateUnitStep(bits, bits_modify, jump, step);
            bits = bits_modify;
            jump = updateJumpSize(bits-max_bits, step);
            if (iter > 3 || abs(bits-max_bits) < 32 || jump == 0) break;
        }
    }
    while (bits > max_bits) {
        quantizerStepSize += 1;
        quantize_tj(xq, ix, quantizerStepSize);
        bits = countHuffmanBits(ix);
    }
    predict_quantizerStepSize = quantizerStepSize;
    // No distortion control
end

Figure 18. Pseudo code of iteration loops (a) ISO method (b) Proposed method.
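The two update rules used in Figure 18 (b) can be checked against the worked example of Section 2.3.4 with the short sketch below. The function bodies are a reading of the example, not the thesis source: the absolute value in `update_unit_step` and the rounding convention of `nint` are assumptions, chosen because they reproduce the stated results of -5, 95 and -86. Unlike the pseudo code, `update_unit_step` returns the new value instead of updating `step` in place.

```python
import math

def nint(value):
    """Round to the nearest integer, halves away from zero (assumed)."""
    magnitude = int(math.floor(abs(value) + 0.5))
    return magnitude if value >= 0 else -magnitude

def update_jump_size(bit_error, step):
    """U_j: addend sigma from the bit error b_h - B and the unit step rho_c."""
    return nint(bit_error / step)

def update_unit_step(bits, bits_modify, jump, step):
    """U_s: blend the old rho_c with the bits per unit step just observed."""
    observed = abs((bits_modify - bits) / jump)
    return nint(0.75 * step + 0.25 * observed)

# Worked example from the text: B = 1000, Delta(0) = -80, Delta_l = -100.
max_bits, low_bound, step_size, step = 1000, -100, -80, 100
bits = 500                                         # b_h(0) from the first trial
jump = update_jump_size(bits - max_bits, step)     # sigma after the first trial
step_size = max(low_bound, step_size + jump)       # Delta(1)
bits_modify = 900                                  # bits after re-quantizing
step = update_unit_step(bits, bits_modify, jump, step)
bits = bits_modify
jump = update_jump_size(bits - max_bits, step)
next_step_size = max(low_bound, step_size + jump)  # Delta(2)
```

The 3/4 and 1/4 weights act as a first-order smoothing filter, so a single unusual granule does not dominate the predicted bits per step.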

(48) To evaluate the performance of the optimized rate control process, the number of iterations has been analyzed by counting the executions of $Q_1(\cdot)$, $Q_2(\cdot)$, and $C_h(\cdot)$ in each granule. Their computational complexities are denoted as p, q and c respectively. When encoding stereo MP3 at 128kbps, the experiments show that the ISO method takes 45p+45q+47c on average, while the proposed method takes only 1p+2q+2c. Table 4 lists the number of inner iterations of each method. The proposed method takes fewer iterations on average than the other methods. Owing to the decomposition of the non-uniform quantizer into $Q_1(\cdot)$ and $Q_2(\cdot)$, the computational complexity of each inner iteration is also much lower than in the other methods.

Table 4. The average number of inner iteration.

           ISO     Oh et al. [4]    Proposed
Average    45      2.1              1.8
Max        >100    3                8

(49) CHAPTER 3. DECODER OPTIMIZATION

3.1. Decoding Overview and Complexity Analysis

[Figure 19. MPEG/Audio Layer III decoding block diagram: encoded bitstream, bitstream decoding, dequantization, frequency to time mapping, PCM audio output.]

The MPEG/Audio Layer III decoding process has three main parts [5]: bitstream decoding, inverse quantization and frequency-to-time mapping, as shown in Figure 19. The first part synchronizes the encoded bitstream input and extracts the quantized frequency coefficients and other information of each frame. Figure 20 illustrates the detailed function blocks.

The second part, inverse quantization, also called dequantization, reconstructs a perceptually identical version of the frequency coefficients generated by the MDCT block during encoding. Based on the output of Huffman decoding and the scalefactor information, the dequantization equation is represented in (21) [5].

(50) [Figure 20. Bitstream decoding. Block diagram: the encoded bitstream enters synchronization, which feeds Huffman decoding (Huffman code bits), Huffman info decoding (Huffman information) and scalefactor decoding (scalefactor information); the outputs are magnitude & sign, scalefactors and ancillary data.]

\[ x_{f,g}(i) = (-1)^{s(i)} \cdot y_{f,g}(i)^{\frac{4}{3}} \cdot 2^{\frac{1}{4} \cdot \left(\Delta_{f,g} - 8 \cdot \Delta_s(w_i)\right)} \cdot 2^{\frac{1}{4} \cdot (1+z_1) \cdot \left(C(b_i) + P(b_i)\right)}, \tag{21} \]

where $y_{f,g}(i)$ is the output of Huffman decoding, and $\Delta_{f,g}$, $z_1$, $\Delta_s(w_i)$ (the subblock gain, only used in short blocks), and $C(b_i)$ are part of the side information.

[Figure 21. Frequency to time mapping. Block diagram: alias reduction, inverse MDCT, frequency inversion, synthesis subband filter bank, PCM audio output.]

The last part, frequency to time mapping, produces the audio PCM output from the dequantized frequency lines. It is the set of reversed operations of the MDCT and the analysis polyphase filterbank in the encoder. The alias reduction block adds alias artifacts to the dequantized outputs in order to obtain a correct reconstruction of the subband signals. The inverse MDCT then reconstructs time domain subband signals from the frequency lines. Frequency inversion is applied next to compensate for the decimation used in the analysis polyphase filterbank. After that, the synthesis polyphase filterbank, also called subband synthesis, is applied to the subband signals to yield the audio PCM output.

(51) Among them, dequantization, IMDCT, and the synthesis polyphase filterbank in particular require a large number of arithmetic operations and produce quantization noise in a fixed point implementation. In this thesis, we propose a fast realization of dequantization and adopt fast algorithms for the IMDCT and the synthesis polyphase filterbank.

3.2. Dequantization

The dequantization equation is represented in (21). Its complexity lies in the calculation of $y_{f,g}^{4/3}$, where $y_{f,g}(i)$ is an integer ranging from 0 to 8207. Direct evaluation using mathematical libraries is too time-consuming and is not suitable for real-time implementation.

First, the calculation of $y_{f,g}^{4/3}$ is decomposed as in (22) in order to minimize the quantization noise of the fixed point implementation. Comparing the dynamic range of $y_{f,g}^{4/3}$ (0 to 165543.67) with that of $y_{f,g}^{1/3}$ (0 to 20.171), it is obvious that the implementation of $y_{f,g}^{1/3}$ produces lower quantization noise because of its smaller dynamic range.

\[ y_{f,g}^{\frac{4}{3}}(i) = y_{f,g}^{\frac{1}{3}}(i) \cdot y_{f,g}(i) \tag{22} \]

As in the encoder case, the power function is implemented with a hybrid scheme. First the input range is split into three sections, as shown in Figure 22. The first section, $0 \le y_{f,g}(i) < 32$, uses a small lookup table to obtain the noiseless value directly. The other two sections adopt the piecewise linear approximation method. The segmentation is also optimized for the target DSP. In order to minimize the approximation error, the segmentation of the second section has been

(52) made according to the leading-zeros of $y_{f,g}^{3}(i)$.

[Figure 22. The implementation of $y_{f,g}^{1/3}(i)$. Plot of $y_{f,g}^{1/3}(i)$ versus $y_{f,g}(i)$ in three sections: $0 \le y_{f,g}(i) < 32$, lookup table; $32 \le y_{f,g}(i) < 256$, piecewise linear approximation with partitioning by the leading-zeros of $x^3$; $256 \le y_{f,g}(i) \le 8207$, piecewise linear approximation with partitioning by the leading-zeros of $x$.]

Ignoring the frequency index $i$ for generality, (23) represents the approximation of $u = y_{f,g}^{1/3}$.

\[ u = \begin{cases} \mathrm{LUT}\left(y_{f,g}^{\frac{1}{3}}\right), & 0 \le y_{f,g} < 32 \\ \alpha_2(S_2(y_{f,g})) \cdot y_{f,g} + \beta_2(S_2(y_{f,g})), & 32 \le y_{f,g} < 256 \\ \alpha_3(S_3(y_{f,g})) \cdot y_{f,g} + \beta_3(S_3(y_{f,g})), & 256 \le y_{f,g} \le 8207 \end{cases} \tag{23} \]

where $\mathrm{LUT}(\cdot)$ represents the lookup table method applied in the 1st section, $\alpha_2$ and $\beta_2$ are the linear approximation coefficients of the 2nd section, $\alpha_3$ and $\beta_3$ are the linear approximation coefficients of the 3rd section, $S_2(\cdot)$ is the segment index of the 2nd section derived from (24), and $S_3(\cdot)$ is the segment index of the 3rd section derived from (25).
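The partitioning by leading zeros can be sketched with integer arithmetic only. In this illustration, `bit_length()` stands in for the leading-zero detector of the DSP, and the boundaries of the second section are taken as the first integer whose cube reaches each power of two. That reading is an assumption; it reproduces the boundary values listed in (24) and (25).

```python
def first_integer_with_cube_at_least(power_of_two):
    """Smallest integer y such that y**3 >= power_of_two."""
    y = 1
    while y ** 3 < power_of_two:
        y += 1
    return y

# Second section: y**3 crosses 2**15, 2**16, ..., 2**24 for y in 32..256.
B2 = [first_integer_with_cube_at_least(1 << (15 + j)) for j in range(10)]
# Third section: powers of two from 256, closed by the end of the range.
B3 = [256 << j for j in range(6)] + [8208]

def segment_index(y):
    """Return (section, index) for a decoded value 0 <= y <= 8207."""
    if y < 32:
        return 1, y                            # direct lookup table entry
    if y < 256:
        return 2, (y ** 3).bit_length() - 16   # leading zeros of y**3
    return 3, y.bit_length() - 9               # leading zeros of y
```

Partitioning on the cube in the second section gives three segments per octave, which is where the finer boundaries 41, 51, 81, 102, 162 and 204 come from.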

(53)
\[ \begin{aligned} & B_2(j) = \mathrm{nint}\left(\left((32)^3 \cdot 2^j\right)^{\frac{1}{3}}\right), \quad j = 0 \sim 9 \\ & \Rightarrow B_2(j) \in \{32, 41, 51, 64, 81, 102, 128, 162, 204, 256\} \\ & S_2(y_{f,g}) = \{\, j \mid B_2(j) \le y_{f,g} < B_2(j+1) \,\} \end{aligned} \tag{24} \]

\[ \begin{aligned} & B_3(j) = \begin{cases} \mathrm{nint}\left(256 \cdot 2^j\right), & j = 0 \sim 5 \\ 8208, & j = 6 \end{cases} \\ & \Rightarrow B_3 \in \{256, 512, 1024, 2048, 4096, 8192, 8208\} \\ & S_3(y_{f,g}(i)) = \{\, j \mid B_3(j) \le y_{f,g} < B_3(j+1) \,\} \end{aligned} \tag{25} \]

The analysis of the approximation error shows that the error to real output ratio is around ±1% and the SNR is around 46dB. This error is still too large and will probably cause the following processes, such as the IMDCT and subband synthesis, to produce more error, especially in a fixed point implementation. In the encoding case this is not an issue, because the following process, Huffman coding, is noiseless.

To refine the approximation, we propose to apply Newton's method in the section $32 \le y_{f,g}(i) \le 8207$. Let $u = y_{f,g}^{1/3}(i)$, where (26) is another representation which is suitable for Newton's method of root-finding. The method yields a value of $u$ that approximates $y_{f,g}^{1/3}(i)$.

\[ u^3 - y_{f,g}(i) = 0 \tag{26} \]

The result is calculated through repeated iterations that successively reduce the residual error $u^3 - y_{f,g}(i)$. The iteration formula is shown in (27),

\[ \tilde{u}_1 = \tilde{u}_0 - \frac{\tilde{u}_0^3 - y_{f,g}(i)}{3 \cdot \tilde{u}_0^2} = \frac{2\tilde{u}_0^3 + y_{f,g}(i)}{3 \cdot \tilde{u}_0^2} = \frac{1}{3} \cdot \left(2 \cdot \tilde{u}_0 + \frac{y_{f,g}(i)}{\tilde{u}_0^2}\right), \tag{27} \]

where the starting value $\tilde{u}_0$ is obtained from (23).

The desired accuracy can be achieved in only one iteration. Figure 23 shows

(54) the error to real output ratio. The ratio is around ±0.01% and the SNR is increased to 86dB.

[Figure 23. The error to real output ratio of the $y_{f,g}^{4/3}(i)$ approximation, $\varepsilon(y_{f,g}(i)) = \left(y_{f,g}^{4/3}(i) - \mathrm{pow3}(y_{f,g}(i)) \cdot y_{f,g}(i)\right) / y_{f,g}^{4/3}(i)$, where pow3 is the proposed implementation of $y_{f,g}^{1/3}(i)$. Vertical axis: $\varepsilon(y_{f,g}(i))$ from -0.014% to 0%; horizontal axis: $y_{f,g}(i)$ from 0 to 8000.]

The effect of the fixed point implementation has also been analyzed. Figure 24 shows the error to real output ratio. The ratio is around ±0.08%, and the SNR is around 82dB.
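The effect of the single Newton step in (27) can be reproduced with a short floating point sketch. It is an illustration under stated assumptions, not the thesis implementation: the linear segments are chords through their end points, the second section uses real-valued boundaries at 32·2^(j/3), and the names `initial_cbrt` and `pow43` are hypothetical. The thesis coefficients $\alpha$ and $\beta$ are not listed, so the exact error figures differ from Figures 23 and 24.

```python
# y**(4/3) computed as y**(1/3) * y, following (22), (23) and (27).
CBRT_LUT = [y ** (1.0 / 3.0) for y in range(32)]

def _bounds():
    second = [32 * 2.0 ** (j / 3.0) for j in range(10)]         # 32 ... 256
    third = [256.0 * 2 ** j for j in range(1, 6)] + [8208.0]    # 512 ... 8208
    return second + third

BOUNDS = _bounds()

def initial_cbrt(y):
    """Starting value u0: table below 32, chord interpolation above."""
    if y < 32:
        return CBRT_LUT[y]
    for lo, hi in zip(BOUNDS, BOUNDS[1:]):
        if y < hi:
            f_lo, f_hi = lo ** (1.0 / 3.0), hi ** (1.0 / 3.0)
            return f_lo + (f_hi - f_lo) * (y - lo) / (hi - lo)
    raise ValueError("y outside 0..8207")

def pow43(y, newton=True):
    """Approximate y**(4/3); one Newton step of (27) refines u0."""
    u = initial_cbrt(y)
    if newton and y >= 32:
        u = (2.0 * u + y / (u * u)) / 3.0
    return u * y

def worst_error(newton):
    """Largest error to real output ratio over the whole input range."""
    return max(abs(1.0 - pow43(y, newton) / y ** (4.0 / 3.0))
               for y in range(1, 8208))
```

Under these assumptions the worst ratio falls from roughly 1% without the Newton step to the order of 0.01% with it, because one step approximately squares the relative error of the starting value.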

(55) [Figure 24. The error to real output ratio of the $y_{f,g}^{4/3}(i)$ fixed point approximation, $\varepsilon(y_{f,g}(i)) = \left(y_{f,g}^{4/3}(i) - \mathrm{pow3fx}(y_{f,g}(i)) \cdot y_{f,g}(i)\right) / y_{f,g}^{4/3}(i)$, where pow3fx is the proposed fixed point implementation of $y_{f,g}^{1/3}(i)$. Vertical axis: $\varepsilon(y_{f,g}(i))$ from -0.08% to 0.04%; horizontal axis: $y_{f,g}(i)$ from 0 to 8000.]

3.3. IMDCT and Subband Synthesis

The frequency to time mapping tool is another computationally demanding process. The IMDCT and subband synthesis blocks in particular contain a large number of multiply-accumulate operations with cosine coefficients, so optimization such as a fast algorithm is necessary. In general, however, a fast algorithm introduces more quantization error in fixed point operation. Based on the analysis results of Lee et al. [6], the prevailing Lee's Fast DCT algorithm [7] is adopted as the fast algorithm for the IMDCT and subband synthesis blocks. For the IMDCT block, 9-point and 3-point Lee's Fast IDCT is applied, and for the matrixing routine in the subband synthesis block, 64-point Lee's Fast DCT is used.
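For readers unfamiliar with Lee's algorithm, the recursion can be sketched as below for an unscaled forward DCT-II of power-of-two length. This is the common textbook form of the decomposition, shown for illustration only. The 9-point and 3-point IDCT variants and the exact 64-point routine used in the thesis are not reproduced here, and the function names are hypothetical.

```python
import math

def fast_dct_lee(x):
    """Unscaled DCT-II of a power-of-two length list via Lee's recursion."""
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    # Butterfly: sums feed the even outputs, scaled differences the odd ones.
    sums = [x[i] + x[n - 1 - i] for i in range(half)]
    diffs = [(x[i] - x[n - 1 - i]) / (2.0 * math.cos((i + 0.5) * math.pi / n))
             for i in range(half)]
    even = fast_dct_lee(sums)
    odd = fast_dct_lee(diffs)
    out = []
    for i in range(half - 1):
        out.append(even[i])
        out.append(odd[i] + odd[i + 1])
    out.append(even[half - 1])
    out.append(odd[half - 1])
    return out

def naive_dct(x):
    """Direct O(n^2) DCT-II, used only as a reference."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * (j + 0.5) * k / n) for j in range(n))
            for k in range(n)]

# Example: a 64-point transform compared against the direct form.
signal = [math.sin(0.37 * j) + 0.01 * j for j in range(64)]
fast = fast_dct_lee(signal)
reference = naive_dct(signal)
```

The division by small cosine values in the butterfly is the step that amplifies rounding error, which is consistent with the remark above that fast algorithms bring more quantization error in fixed point operation.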

CHAPTER 4. DSP IMPLEMENTATION

4.1. Target DSP Architecture

Using the proposed architecture, we implement the MP3 encoder and decoder on a 16-bit fixed point DSP, the ADSP-2181. Figure 25 shows the block diagram of the ADSP-2181 [17].

Figure 25. The ADSP-2181 DSP core and peripheral integration.

The ADSP-2181 is a single-chip microcomputer optimized for digital signal processing (DSP) and other high speed numeric processing applications [17]. It combines the ADSP-2100 family base architecture (three computational units, data address generators and a program sequencer) with two serial ports, a 16-bit internal DMA port, a byte DMA port, a programmable timer, Flag I/O, extensive interrupt capabilities and on-chip program and data memory. The features of the ADSP-2100 family DSP core are as follows [17]:

• Computational units: There are three independent, fully functional computational units: a 16-bit arithmetic/logic unit (ALU), a 40-bit multiplier/accumulator unit (MAC) and a 32-bit barrel shifter. The ALU performs a standard set of arithmetic and logic operations; division primitives are also supported. The MAC performs single-cycle multiply, multiply/add and multiply/subtract operations with 40 bits for accumulation. The SHIFTER performs logical and arithmetic shifts, normalization, denormalization and derive-exponent operations. The SHIFTER can be used to efficiently implement numeric format control, including multiword and block floating point representations.

• Data address generators (DAGs): Dual DAGs allow the processor to generate simultaneous addresses for dual operand fetches and support circular, post-modify and bit-reversed addressing modes. In sum-of-product calculations, the DAGs allow the processor to fetch two operands and execute one ALU/MAC/SHIFTER instruction in a single cycle.

• Program sequencer: It provides single-cycle conditional branching and executes program loops with zero loop overhead.

The ADSP-2181 also integrates on-chip RAM and peripherals. The DSP core can access the on-chip peripherals through memory-mapped control registers. The integrated components are as follows:

• 80K bytes of on-chip RAM: The RAM is configured as 16K words of program memory RAM (24 bits per word) and 16K words of data memory RAM (16 bits per word). The ADSP-21xx uses a modified Harvard architecture in which data memory stores data, and program memory stores both program and data. This allows the processor core to fetch two operands (one from data memory and one from program memory) and an instruction (from program memory) in a single instruction cycle.

• Serial ports (SPORTs): There are two bi-directional, double-buffered serial ports for serial communication. Each SPORT can use an external serial clock or generate its own over a wide range of frequencies down to 0 Hz. The SPORTs also support framing, hardware companding (A-law and µ-law), autobuffering, interrupt generation and multichannel capability (time-division multiplexed into 24 or 32 channels).

• Timer: The programmable interval timer provides periodic interrupt generation.

• DMA ports: There are two DMA ports, the Internal DMA (IDMA) port and the Byte DMA (BDMA) port. The IDMA port is a parallel I/O port that lets the processor's internal memory (except for the processor's memory-mapped control registers) be read or written by a host system. The read/write access is completely asynchronous, and a host can access the DSP's internal memory with an overhead of one DSP processor cycle per word while the DSP is operating at full speed. The BDMA port allows the processor to load and store program and data from/to external byte memory with very low processor overhead, and it supports interrupt generation when the DMA transfer is completed.

4.2. Data precision optimization in the proposed MP3 encoder

Basically, the ADSP-2181 performs 16-bit arithmetic. However, double precision, i.e. 32-bit, arithmetic provides more accuracy in the processed data but also
increases the computational complexity. As shown in Figure 26, five instructions are needed to perform the double precision multiplication, so its complexity is five times that of the single precision multiplication.

[Figure 26 diagram: the 32-bit operand X is held as Xh (mx1) and Xl (mx0), the 16-bit operand Yl is held in my0, and the result is held as Rs (mr2), Rh (mr1) and Rl (mr0), where Rh and Rl form the 32-bit result.]

    {ADSP-21xx instructions}
    mr = mx0 * my0 (us);         {unsigned x signed: Xl x Yl}
    mr = mr (rnd);
    mr0 = mr1;
    mr1 = mr2;
    mr = mr + mx1 * my0 (ss);    {signed x signed: Xh x Yl}

Figure 26. Double precision multiplication, R(32-bit) = X(32-bit) x Y(16-bit).

To determine the data precision, we first divide the encoding process into six stages, as shown in Figure 27. The PCM samples are always 16-bit, and their format is denoted as $(1.15)_{16}$. The format $(\alpha.\beta)_{\gamma}$ means that a fixed point number of $\gamma$ bits is represented by placing the binary point just after the $\alpha$th most significant bit, so that $\alpha + \beta = \gamma$. The subband analysis is divided into two stages, the windowing with partial calculation [5] and the matrixing [5]. The windowing with partial calculation performs 16-bit multiply-then-accumulate operations and produces the 32-bit result vector Y [5]. The matrixing performs double precision multiply-then-accumulate operations as in Figure 26 and produces the subband signals S, keeping only the 16-bit rounded result of Rh. According to the static analysis, the dynamic range of the subband signals is $-2.0 < S < 2.0$; therefore the format is derived as $(2.14)_{16}$. Then the subband signals are passed to the MDCT and antialias stage.
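
The two ideas in this passage, the fixed point format notation and the split multiplication of Figure 26, can be sketched in Python as follows. The helper names (`to_fixed`, `from_fixed`, `split32`, `mul32x16`) are hypothetical, registers are modeled as plain Python integers, and the MAC's own number-format handling, rounding modes and saturation are not modeled. The sketch is therefore an arithmetic illustration rather than a cycle-accurate emulation of the ADSP-21xx instructions.

```python
def to_fixed(value, beta, bits=16):
    """Quantize a real value to a signed `bits`-bit integer with `beta`
    fractional bits, saturating at the limits of the word."""
    scaled = int(round(value * (1 << beta)))
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, scaled))


def from_fixed(q, beta):
    """Convert a fixed point integer with `beta` fractional bits back to a float."""
    return q / (1 << beta)


def split32(x):
    """Split a signed 32-bit integer into (signed high word, unsigned low word)."""
    xl = x & 0xFFFF
    xh = (x - xl) >> 16
    return xh, xl


def mul32x16(x, y):
    """Multiply a signed 32-bit x by a signed 16-bit y with two 16-bit products.

    Mirrors the structure of Figure 26: low partial product, round and shift
    down by 16 bits, then accumulate the high partial product.
    """
    xh, xl = split32(x)
    acc = xl * y                 # unsigned x signed: Xl x Yl
    acc = (acc + 0x8000) >> 16   # round at bit 15, then move down one word
    acc += xh * y                # signed x signed: Xh x Yl
    return acc


if __name__ == "__main__":
    s = to_fixed(1.5, beta=14)   # a subband value in the 2.14 format
    print(s, from_fixed(s, 14))
    print(mul32x16(0x12345678, -12345))
```

In this model, `mul32x16(x, y)` equals the full product `x * y` rounded at bit 15 and shifted right by 16 bits, which is why two 16-bit multiplications are enough for a 32-bit by 16-bit product.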

A faster MDCT algorithm is applied here to decrease the computational complexity while keeping the quantization error due to fixed point arithmetic in check. After 16-bit multiply-then-accumulate operations are performed on the subband signals, the 32-bit transform coefficients are produced and then passed to the antialias block. Again, the double precision arithmetic of Figure 26 is performed, and the antialiased 32-bit transform coefficients $x_{f,g}(i)$ are produced from Rh and Rl.

[Figure 27 diagram: data flow through the encoder stages, with the word length (16 or 32 bits) and fixed point format of the data after each stage.]

    Input: PCM sample, $(1.15)_{16}$
    Subband analysis
      1. Windowing: partial results Y [5], $(2.30)_{32}$
      2. Matrixing: subband signal $S_i(t)$, $(2.14)_{16}$
    3. MDCT and antialias: transform coefficient $x_{f,g}(i)$, $(2.30)_{32}$
    4. Format converter: shifted transform coefficient $\lfloor x_{f,g}(i) \rfloor_{b16}$, $(M.N)_{16}$
    Iteration loops
      5. Pow075 in equation (9): modified transform coefficient $\lfloor \hat{x}_{f,g}(i) \rfloor_{b16}$, $(M-4.N+4)_{16}$
      6. Quantizer in equation (10): quantized transform coefficient $y_{f,g}(i)$, $(16.0)_{16}$

Figure 27. Data precision between each stage in the proposed MP3 encoder. $(M.N)_{16}$, determined by the format converter, is the fixed point format of the transform coefficients.

A special format converter, added after the antialias block, is used to convert the 32-bit data with format $(2.30)_{32}$ to 16-bit data while the fixed point format is determined