DSP Implementation of AMR Speech Coding and the Reed-Solomon Decoder in IEEE 802.16a Standard


(1) National Chiao Tung University, Department of Electronics Engineering and Institute of Electronics. Master's Thesis. DSP Implementation of AMR Speech Coding and the Reed-Solomon Decoder in IEEE 802.16a Standard. Student: Chih-Ying Chen. Advisor: Dr. Hsueh-Ming Hang. June 2005.

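(2) DSP Implementation of AMR Speech Coding and the Reed-Solomon Decoder in IEEE 802.16a Standard. Student: Chih-Ying Chen. Advisor: Dr. Hsueh-Ming Hang. Department of Electronics Engineering and Institute of Electronics, National Chiao Tung University.

Chinese Abstract

In recent years, multimedia and wireless communication have become very important market trends. The IEEE 802.16a standard aims mainly at transmitting high-quality multimedia over wireless networks. In this thesis, we implement speech coding and Reed-Solomon coding schemes on a TI DSP platform. One focus of the thesis is multimedia coding: we discuss Adaptive Multi-Rate (AMR) coding, the speech standard adopted by third-generation wireless communication systems, which provides a variety of coding modes to cope with the effects of different channels. The other focus is the forward error correction (FEC) scheme of the IEEE 802.16a wireless communication standard. Because of its high error-correcting capability, Reed-Solomon coding is adopted by IEEE 802.16a as one of the FEC procedures. We first briefly describe the algorithms and architectures of the AMR speech standard and the IEEE 802.16a FEC. We then improve the execution efficiency of the AMR speech codec and the Reed-Solomon decoder according to the characteristics of the digital signal processor (DSP) platform, and implement them on that platform. The core of our implementation platform is a digital signal processor developed by Texas Instruments. After our improvements, the AMR speech encoder reaches a processing rate of 22.78 Kbits per second on the DSP platform and the decoder reaches 31.84 Kbits per second, while the Reed-Solomon decoder of IEEE 802.16a even reaches 176.4 Kbits per second. These measurements all include the time spent transferring data between the PC and the DSP; the rates would be higher if that time were excluded.

(3) In addition, we compared the results with the original programs: AMR coding improved by 65.94%, and the Reed-Solomon decoder improved by 96.44% over the originally implemented version.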

(4) DSP Implementation of AMR Speech Coding and the Reed-Solomon Decoder in IEEE 802.16a Standard. Student: Chih-Ying Chen. Advisor: Dr. Hsueh-Ming Hang. Department of Electronics & Institute of Electronics, National Chiao Tung University.

Abstract

Multimedia and wireless communication have been two very important trends in recent years. Transmitting high-quality multimedia data over wireless channels is the target of the IEEE 802.16a standard. In this thesis, we implement a speech coding scheme and a Reed-Solomon coding scheme on a TI DSP. One focus of this thesis is Adaptive Multi-Rate (AMR), the speech coding standard of 3GPP. It provides various coding modes to match the channel error rates. Another focus of this thesis is the Forward Error Correction (FEC) scheme of the IEEE 802.16a wireless communication standard. Reed-Solomon coding is adopted by IEEE 802.16a because of its high error-correcting capability. We first describe the basic structure and algorithms of AMR speech coding and of the FEC in IEEE 802.16a. Then, we adopt and modify fast schemes to accelerate the AMR speech codec and Reed-Solomon decoder programs so that they match the architecture of the DSP baseboard. We further implement them on the DSP platform, which contains the Texas Instruments (TI) TMS320C6416 digital signal processor (DSP).

(5) The processing rate of the AMR codec on the DSP platform reaches 22.78 Kbytes/sec for the encoder and 31.84 Kbytes/sec for the decoder, and the Reed-Solomon decoder reaches up to 176.4 Kbytes/sec. These processing rates include the data transfer time between the host and the DSP board; they can be much faster if the data transfer time is excluded. In addition, after our improvement the AMR speech codec is 65.94% faster for the encoder and 61.31% faster for the decoder than the original one, and the Reed-Solomon decoder is 96.44% faster than the original one.

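(6) Acknowledgments

First, I would like to thank my advisor, Dr. Hsueh-Ming Hang, for his careful guidance over the past two years, which allowed me to complete this thesis. During the research there were times of stagnation and times of confusion. He always offered concern and understanding instead of reproach and gave timely guidance, which helped me overcome the bottlenecks I met, and he never forgot to encourage me when I made a breakthrough. Beyond research topics, he also kept encouraging us to explore other related fields to strengthen the foundation for future research. He was always attentive to and understanding of the problems in our daily lives, so that I could strike a good balance between research and life.

I would also like to thank Professor 張錫嘉, who gave me a great deal of help in my research and led me in the right direction. I benefited greatly, and my research proceeded smoothly as a result; I am especially grateful for his patient advice.

I also thank the Communication Electronics and Signal Processing Laboratory for providing ample hardware and software resources, so that I lacked nothing during my research, and all members of the laboratory for creating a lively and harmonious environment in which we could share the joys and hardships of research life. Thanks to my senior labmates 楊政翰 and 陳繼大 for generously sharing their experience and encouragement, and to 蔡家揚 for timely technical support, which solved many difficulties I encountered and taught me the right way to approach all kinds of problems. I also thank my classmates 王盈閔, 董景中, 陳昱昇, and 洪朝雄, who helped with research work and coursework despite their busy schedules, so that the thesis could proceed smoothly.

Finally, I thank my family, who gave me the greatest encouragement and support in both life and study, allowing me to concentrate on research and giving me the courage to face setbacks. Without their efforts behind the scenes, I would not be where I am today; I offer them my deepest gratitude and apologies. Thank you to all the teachers, peers, and family members who accompanied me through these years.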

(7) Contents
1 Introduction
2 Adaptive Multi-Rate of Speech Coding
  2.1 The Overview of AMR
  2.2 Principles of the Encoder
    2.2.1 Pre-processing
    2.2.2 Linear Prediction
      2.2.2.1 Windowing and auto-correlation
      2.2.2.2 Levinson-Durbin algorithm
      2.2.2.3 LP to LSP Conversion
      2.2.2.4 Monitoring resonance in the LPC spectrum
    2.2.3 Open-loop pitch analysis
    2.2.4 Impulse response computation (all modes)
    2.2.5 Target signal computation (all modes)
    2.2.6 Adaptive codebook
      2.2.6.1 Adaptive codebook search
      2.2.6.2 Adaptive codebook gain control (all modes)
    2.2.7 Algebraic codebook
      2.2.7.1 Algebraic codebook structure
      2.2.7.2 Algebraic codebook search
    2.2.8 Quantization of adaptive and fixed codebook gains
      2.2.8.1 Adaptive codebook gain limitation
      2.2.8.2 Quantization of codebook gains
    2.2.9 Memory update (all modes)
  2.3 Functional description of the decoder
    2.3.1 Decoding and speech synthesis
    2.3.2 Post-processing
      2.3.2.1 Adaptive post-filtering (all modes)
      2.3.2.2 High-pass filtering and up-scaling
  2.4 Bit Allocation

(8)
3 Overview of IEEE 802.16a FEC Scheme
  3.1 Introduction to IEEE 802.16a Standard
  3.2 IEEE 802.16a FEC Specifications
    3.2.1 Randomizer
    3.2.2 Forward Error Correction Coding
      3.2.2.1 Reed-Solomon Code Specification
      3.2.2.2 Convolutional Code Specification
      3.2.2.3 Interleaver
  3.3 Implementation Issues of the FEC Scheme
    3.3.1 Reed-Solomon Code
      3.3.1.1 Encoding of Shortened and Punctured Reed-Solomon Codes
      3.3.1.2 Decoding of Shortened and Punctured Reed-Solomon Codes
    3.3.2 Convolutional Code
      3.3.2.1 Encoding of Punctured Convolutional Code
      3.3.2.2 Viterbi Decoding of Punctured Convolutional Code
      3.3.2.3 Bit Interleaved Soft Decision Viterbi Decoding
      3.3.2.4 Viterbi Decoding of Tail-Biting Convolutional Code
      3.3.2.5 The Butterfly Structure in the Trellis Diagram
4 DSP Implementation Environment
  4.1 The DSP Chip
    4.1.1 Central Processing Unit
    4.1.2 Memory
    4.1.3 Peripherals
  4.2 The DSP Baseboard
  4.3 DSP Transmission Mechanism
  4.4 Features of TI TMSC6000 Family DSP for Optimization
    4.4.1 Code Development Flow
    4.4.2 Pipeline Structure of the TI TMSC6000 Family
    4.4.3 Software Pipelining
    4.4.4 Program-Level Optimization
5 Implementation and Acceleration of AMR Speech Coding on TI DSP Platform
  5.1 AMR Codec Acceleration
    5.1.1 AMR Code Profile
    5.1.2 Acceleration by Using the Intrinsics
    5.1.3 Compiler Level Improvement
  5.2 AMR Codec on C64x DSP Platform

(9)
    5.2.1 Structure of AMR Implementation
    5.2.2 Execution Flow of AMR Implementation
    5.2.3 Performance Analysis
      5.2.3.1 AMR Encoder Performance Analysis
      5.2.3.2 AMR Decoder Performance Analysis
6 Implementation and Acceleration of 802.16a Reed-Solomon Decoder on TI DSP Platform
  6.1 Acceleration on Reed-Solomon Decoder
    6.1.1 Profiling the Original RS Decoder
    6.1.2 Modifications of RS Decoder
      6.1.2.1 Syndrome Computation Improvement
      6.1.2.2 Chien Search Improvement
    6.1.3 Performance Analysis
  6.2 Remainder Decoding Algorithm for RS Decoder
    6.2.1 Remainder Decoding Algorithm
    6.2.2 Program Flow and Performance Analysis
  6.3 DSP Implementation of Reed-Solomon Decoder and Viterbi Decoder
    6.3.1 Structure of RS Decoder and Viterbi Decoder Implementation
    6.3.2 Execution Flow of RS Decoder and Viterbi Decoder
      6.3.2.1 DSP Program Flow for RS Decoder
      6.3.2.2 DSP Program Flow for Viterbi Decoder
    6.3.3 Performance Analysis
7 Conclusions and Future Work
  7.1 Conclusion
  7.2 Future Work
Bibliography

(10)

(11) List of Figures
Figure 2.1 Simplified block diagram of the CELP speech synthesis model
Figure 2.2 Simplified block diagram of the adaptive multi-rate encoder
Figure 2.3 LP analysis windows
Figure 2.4 Simplified block diagram of the adaptive multi-rate decoder
Figure 3.1 IEEE local and metropolitan area networks standards family
Figure 3.2 Channel coding structure in transmitter side (top) and receiver side (bottom)
Figure 3.3 PRBS for Data Randomization
Figure 3.4 Creation of OFDMA randomizer initialization vector
Figure 3.5 Forward Error Correction structure in transmitter side (left) and receiver side (right)
Figure 3.6 Convolutional Encoder of Rate 1/2
Figure 3.7 Block Diagram of the RS Encoder Program
Figure 3.8 The Linear Feedback Shift Register Structure of RS Encoder
Figure 3.9 Block Diagram of a Conventional RS Encoder
Figure 3.10 Block Diagram of the RS Decoder Program
Figure 3.11 Syndrome Computation Circuit
Figure 3.12 Block Diagram of the Convolutional Encoder Program
Figure 3.13 State Transition Diagram Example
Figure 3.14 Trellis Diagram Example for a Viterbi Decoder
Figure 3.15 Survivor path of the Trellis Diagram
Figure 3.16 Block Diagram of the Viterbi Decoder Program
Figure 3.17 Structure of the Viterbi Algorithm
Figure 3.18 Partition of the 16-QAM Constellation
Figure 3.19 Block Diagram of the Suboptimal Tail-Biting Viterbi Decoder
Figure 3.20 Butterfly Structure Showing Branch Cost Symmetry

(12)
Figure 4.1 The Block Diagram of TMS320C6x DSP Chip
Figure 4.2 The TMS320C64x DSP Chip Architecture and Comparison with Ancient TMS320C62x/C67x Chip
Figure 4.3 Innovative Integration’s Quixote DSP Baseboard Card
Figure 4.4 The Architecture of Quixote Baseboard
Figure 4.5 Block Diagram of DSP Streaming Mode
Figure 4.6 Code Development Flow
Figure 4.7 (a) The Original Loop. (b) The Loop After Applying Software Pipelining
Figure 4.8 (a) Execution Record of the Original Loop. (b) Execution Record of the Software Pipelined Loop
Figure 5.1 Structure of AMR Speech Codes Implementation on the Host and DSP
Figure 5.2 (a) Graphical Interface of the AMR Encoder Implementation. (b) A Snapshot of Running the Program
Figure 5.3 (a) Graphical Interface of the AMR Decoder Implementation. (b) A Snapshot of Running the Program
Figure 5.4 The Flowchart of the AMR Encoder Implementation
Figure 6.1 The C Code of the Syndrome Computation in the Lee Decoder
Figure 6.2 The Plot of the Decoding Cycle versus SNR
Figure 6.3 The Plot of the Correct Decoding Ratio versus SNR
Figure 6.4 Implementation of LFSR with the Intrinsics
Figure 6.5 The Interface of our RS Decoder implementation
Figure 6.6 The Flowchart of our RS Decoder Implementation
Figure 6.7 The Interface of the Viterbi Decoder Implementation

(13) List of Tables
Table 2.1 Bit allocation of the AMR coding algorithm for 20 ms frame
Table 3.1 Mandatory Channel Coding per Modulation
Table 3.2 The Inner Convolutional Code with Puncturing Configuration
Table 3.3 Bit Interleaved Block Sizes and Modulo
Table 4.1 Completing Phase of Different Type Instructions
Table 5.1 Profile of AMR Encoder Provided by 3GPP
Table 5.2 Profile of the Top Ten Encoder Functions Called Most (Except for the Functions Containing Value Assignment Only)
Table 5.3 Profile of AMR Codec Arithmetic Functions (Not Counted are Value Assignments or Function Calling Only)
Table 5.4 Profile of AMR Arithmetic Functions Listed in Table 5.3 after Acceleration
Table 5.5 Profile of Different Improved Versions of AMR Encoder
Table 5.6 Profile of Different Improved Versions of AMR Decoder
Table 5.7 Code Size of the AMR Encoder for Different Acceleration Level
Table 5.8 Code Size of the AMR Decoder for Different Acceleration Level
Table 5.9 Execution Time of the DSP Implementation under Different Source Rate for Each Test Sequence
Table 5.10 Execution Time of the DSP Implementation under Different Source Rate for Each Test Sequence (ms/frame: the Processing Time for one frame, %: Improvement Percentage)
Table 5.11 Execution Time of the DSP Implementation under Different Source Rate for Each Test Sequence (the List Representation is the Same as Table 5.10)
Table 5.12 Execution Time of the DSP Implementation under Different Source Rate for Each Test Sequence
Table 5.13 Execution Time of the DSP Implementation under Different Source Rate for Each Test Sequence (ms/frame: the Processing Time for one frame, %: Improvement Percentage)
Table 5.14 Execution Time of the DSP Implementation under Different Source Rate for Each Test Sequence (the List Representation is the Same as Table 5.13)
Table 6.1 Profile of the Lee RS Decoder
Table 6.2 Improvement of Syndrome Computation
Table 6.3 Profile Chien Search without the Intrinsics and Compiler Optimization
Table 6.4 Profile Chien Search with _gmpy4 and file-Level Optimization
Table 6.5 Simulation Profile for RS Decoder
Table 6.6 The Decoding Ratio and Cycle under the Channel with Different SNR
Table 6.7 Comparison of the Remainder Decoding Algorithm and the Lee Decoder (without the Intrinsics)
Table 6.8 Profile of the Improved Remainder Decoding Algorithm
Table 6.9 Profile of our Implementation for RS Decoder and Viterbi Decoder

(15) Chapter 1 Introduction

Digital wireless transmission of multimedia content is one of the important trends in consumer electronics today. Because of the demand for wireless communication of multimedia content, achieving a high compression ratio with high quality is an important issue for multimedia transmission. Multimedia services contain many different types of content, such as data, audio, video, images, and traditional speech. These services have poor quality if they are overly compressed by inefficient source coding or cannot recover from the errors introduced by a noisy channel. It is therefore desirable to adjust the source and channel coding rates according to the channel condition to provide better overall performance.

The international organization 3GPP has adopted this concept in its standards. For traditional speech coding, it defines a set of technical specifications that include the G.723.1 and AMR (Adaptive Multi-Rate) codecs. Both G.723.1 and AMR are CELP-based coders; however, AMR has better speech quality than G.723.1 at a similar data rate. AMR also offers multiple modes for joint source/channel coding, providing flexibility for different QoS (Quality of Service) levels.

For efficient channel coding, the OFDM modulation technique has become the mainstream of wireless communication in recent years. IEEE has completed several standards based on OFDM, such as the IEEE 802.11 series for LANs (Local Area Networks) and the IEEE 802.16 series for MANs (Metropolitan Area Networks).

(16) The advantage of digital wireless communication is that consumers can conveniently receive or transmit digital content without connecting to transmission lines. However, one major problem is that the transmission channel is not noise-free. The transmitted signals are easily interfered with and distorted by several different types of noise sources, such as crowded traffic, bad weather, and obstruction by buildings. To improve the robustness of wireless communication against noisy channel conditions, an FEC (Forward Error-Correcting Coding) and FED (Forward Error-Correcting Decoding) mechanism is necessary to reduce channel errors, and it is adopted by almost every commercial communication standard, including IEEE 802.16a.

Our study focuses on the Reed-Solomon coding included in the FEC/FED of the IEEE 802.16a standard, which specifies the air interface of fixed broadband wireless access systems providing multiple access. Reed-Solomon coding directly protects the front-end multimedia data against channel effects. It has been widely used and investigated because of its high capability of correcting both random and burst errors and because efficient decoding algorithms exist for it.

In this thesis, we implement the AMR speech codec and the Reed-Solomon coding scheme of the IEEE 802.16a standard on the Innovative Integration (II) Quixote DSP/FPGA board. We first review the algorithms of the AMR codec and the whole FEC/FED scheme of IEEE 802.16a in detail. Then, we simulate them in C and accelerate the programs to improve their execution efficiency. Finally, we implement the AMR codec and the Reed-Solomon coding algorithm on our DSP platform. The AMR encoder reaches a processing time of 14.05 ms/frame, and the AMR decoder reaches 2.43 ms/frame. The Reed-Solomon decoder achieves a processing rate of 176.4 Kbytes/s after our improvement and implementation.

In Chapter 2, the concept and the major algorithm blocks of AMR are introduced. Because of limited space, we present only the issues that are important for understanding the structure of speech compression, such as the ACELP model, LSP, and codebook formation.

(17) In Chapter 3, we briefly introduce the forward error correction scheme of the IEEE 802.16a standard and describe the algorithms to be implemented. In Chapter 4, we give a brief description of our implementation environment, which includes II’s Quixote DSP baseboard, the transmission mechanism between the host PC and the target DSP, and the techniques used to accelerate the programs. In Chapter 5, we profile and accelerate the AMR codec program before implementing it on the TI C6x DSP. We first describe, step by step, the techniques used to accelerate our C code; the structure and the execution flow of the DSP implementation are then introduced in detail. In Chapter 6, we first discuss the original Reed-Solomon program that needs to be sped up. Second, the acceleration steps applied to the Reed-Solomon decoder are discussed in detail. Third, the DSP implementation of the improved program and of the Viterbi decoder in the IEEE 802.16a FED scheme is described. Finally, in Chapter 7, we give some observations and conclusions, together with possible subjects for future work.

(18) Chapter 2 Adaptive Multi-Rate of Speech Coding

2.1 Overview of AMR

AMR (Adaptive Multi-Rate) is a concept for achieving high speech quality while maintaining efficient spectrum usage. A trade-off between speech quality and system capacity can be achieved for a variety of radio channel and operating conditions. It is a successful joint source/channel codec standard. The system allows the channel mode (HR or FR) and the codec mode (the combination of speech and channel bit-rates) to vary in order to suit traffic and channel conditions.

The channel mode offers two transmission bit rates: 22.8 kbit/s (full rate, FR) and 11.4 kbit/s (half rate, HR). It can be switched to increase channel capacity, for example by replacing one full-rate channel with two half-rate channels, while maintaining a certain lower limit on speech quality. These AMR handovers occur much less frequently than codec mode changes, probably a few times per minute [1]. For each channel mode (HR or FR), the codec mode, i.e. the bit partitioning between speech and channel coding bit-rates, can be varied rapidly to track the channel error rate or the channel’s carrier-to-interference ratio (C/I). The changes must take effect quickly (several times a second) with no perceptible speech degradation. The process is equivalent to link adaptation. Besides the basic source and channel codec for the speech signal payload, the AMR system concept further includes channel state tracking and in-band transmission of adaptation data.

(19) The AMR coder consists of eight source codecs with bit-rates of 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s. The codec is based on the code-excited linear predictive (CELP) coding model. In this model, the excitation signal at the input of the short-term LP synthesis filter is constructed by adding two excitation vectors from the adaptive and fixed (innovative) codebooks. The speech is synthesized by feeding the two properly chosen vectors from these codebooks through the short-term synthesis filter. The optimum excitation sequence in a codebook is chosen using an analysis-by-synthesis search procedure in which the error between the original and synthesized speech is minimized according to a perceptually weighted distortion measure. The structure of the CELP speech synthesis model is shown in Figure 2.1 [2][3]. More details can be found in [14][15][16].

Figure 2.1: Simplified block diagram of the CELP speech synthesis model. (The adaptive codebook vector $v(n)$ scaled by $g_p$ and the fixed codebook vector $c(n)$ scaled by $g_c$ are summed into the excitation $u(n)$, which passes through the LP synthesis filter $1/A(z)$ to give $\hat{s}(n)$; post-filtering then gives $\hat{s}'(n)$.)
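The synthesis path of Figure 2.1 can be sketched in C as follows. This is a floating-point illustration under stated assumptions, not the fixed-point code used on the DSP: the function name `celp_synthesize`, its argument layout, and the filter-memory convention are hypothetical, the LP filter is taken as $A(z) = 1 + \sum_{i=1}^{10} a_i z^{-i}$, and post-filtering is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define LP_ORDER 10  /* predictor order m assumed for the LP synthesis filter */

/* Hypothetical helper: builds the excitation u(n) = gp*v(n) + gc*c(n) and
 * passes it through the all-pole filter 1/A(z), where
 * A(z) = 1 + sum_{i=1..LP_ORDER} a[i] z^-i (a[0] is unused).
 * mem[0..LP_ORDER-1] holds the past outputs s(n-1), ..., s(n-LP_ORDER) and is
 * updated in place, so consecutive subframes can be processed back to back. */
void celp_synthesize(const double *v, const double *c, double gp, double gc,
                     const double a[LP_ORDER + 1], double mem[LP_ORDER],
                     double *s, size_t len)
{
    assert(v && c && a && mem && s);
    for (size_t n = 0; n < len; n++) {
        double acc = gp * v[n] + gc * c[n];      /* excitation u(n) */
        for (int i = 1; i <= LP_ORDER; i++)
            acc -= a[i] * mem[i - 1];            /* subtract a_i * s(n-i) */
        for (int i = LP_ORDER - 1; i > 0; i--)   /* shift the output history */
            mem[i] = mem[i - 1];
        mem[0] = acc;
        s[n] = acc;
    }
}
```

A caller would zero `mem` once, then call the function once per subframe with the decoded codebook vectors and gains, reusing the same `mem` array between calls.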

(20) 2.2 Principles of the Encoder

The AMR coder operates on speech frames of 20 ms, corresponding to 160 samples at the sampling frequency of 8000 samples/s. For each frame of 160 speech samples, the speech signal is analyzed to extract the parameters of the CELP model (LP filter coefficients, adaptive and fixed codebook indices and gains). A 10th-order linear prediction (LP), or short-term, synthesis filter is used, which is given by [3]:

$$H(z) = \frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{m} \hat{a}_i z^{-i}} \qquad (2.1)$$

where $\hat{a}_i$, $i = 1, \ldots, m$, are the (quantized) linear prediction (LP) parameters and $m = 10$ is the predictor order. The long-term, or pitch, synthesis filter is given by:

$$\frac{1}{B(z)} = \frac{1}{1 - g_p z^{-T}} \qquad (2.2)$$

where $T$ is the pitch delay and $g_p$ is the pitch gain. The pitch synthesis filter is implemented using the so-called adaptive codebook approach.

The following operations are then repeated for each subframe. The target signal $x(n)$ is computed by filtering the LP residual through the weighted synthesis filter $W(z)H(z)$, with the initial states of the filters having been updated by filtering the error between the LP residual and the excitation. The impulse response $h(n)$ of the weighted synthesis filter is then computed. Closed-loop pitch analysis is then performed (to find the pitch lag and gain), using the target $x(n)$ and the impulse response $h(n)$, by searching around the open-loop pitch lag. Fractional pitch with a resolution of 1/6 or 1/3 of a sample (depending on the mode) is used. The target signal $x(n)$ is updated by removing the adaptive codebook contribution (the filtered adaptive codevector), and this new target, $x_2(n)$, is used in the fixed algebraic codebook search (to find the optimum innovation). The gains of the adaptive and fixed codebooks are scalar quantized with 4 and 5 bits respectively, or vector quantized with 6-7 bits (with moving average (MA) prediction applied to the fixed codebook gain).

(21) The different functions of the encoder are presented in Figure 2.2.

2.2.1 Pre-processing

Two pre-processing functions are applied prior to the encoding process: high-pass filtering and signal down-scaling. Down-scaling consists of dividing the input by a factor of 2 to reduce the possibility of overflows in the fixed-point implementation. The high-pass filter, with a cut-off frequency of 80 Hz, serves as a precaution against undesired low-frequency components.

2.2.2 Linear Prediction

The LP analysis and quantization for the 12.2 kbit/s mode follow those of the GSM EFR coder, i.e. two LP filters are computed for each frame. These filters are jointly quantized with split matrix quantization (SMQ) of 1st-order MA-prediction LSF residuals. For all the other modes, one LP filter is estimated per frame, and split VQ (SVQ) of 1st-order MA-prediction LSF residuals is performed with 3 subvectors of dimensions 3, 3, and 4.

2.2.2.1 Windowing and auto-correlation

For 12.2 kbit/s, LP analysis is performed twice per frame using two different 30 ms asymmetric windows. Asymmetric windows have been shown to offer better quality-delay performance than symmetric windows [4]. The first window has its weight concentrated at the second subframe and consists of two halves of Hamming windows of different sizes.

(22) Figure 2.2: Simplified block diagram of the adaptive multi-rate encoder. (Per-frame stages: pre-processing; LPC analysis, twice per frame, with windowing and autocorrelation, Levinson-Durbin, A(z) to LSP conversion, LSP quantization, and interpolation for the 4 subframes; open-loop pitch search, twice per frame, on the weighted speech. Per-subframe stages: adaptive codebook search, innovative codebook search, and filter memory update.)

(23) On the other hand, the second window has its weight concentrated at the fourth subframe and it consists of two parts: the first part is half a Hamming window and the second part is a quarter of a cosine function cycle [5]. No samples from future frames are used (no lookahead). A diagram of the two LP analysis windows is depicted in figure 2.3. The auto-correlations of the windowed speech s ' (n), n = 0,...,239, are computed by: 239. rac (k ) = ∑ s ' (n) s ' (n − k ), k = 0,...,10,. (2.3). n=k. w (n ) II. w (n ) I. t fr a m e n -1. fra m e n 5 ms. 20 m s fra m e (1 6 0 s a m p le s ). s u b fr a m e (4 0 s a m p le s ). Figure 2.3: LP analysis windows and a 60 Hz bandwidth expansion is used by lag windowing the auto-correlations using the window: ⎡ 1 ⎛ 2πf i ⎞ 2 ⎤ 0 ⎟⎟ ⎥, i = 1,...10, wlag (i ) = exp ⎢− ⎜⎜ ⎢⎣ 2 ⎝ f s ⎠ ⎥⎦. (2.4). where f 0 = 60 Hz is the bandwidth expansion. The expansion on the autocorrelation coefficients reduces the possibility of ill-condition in the Levinson algorithm (especially in fixed point). It also reduces the underestimation of the formant bandwidth, which could create undesirably sharp resonances. Further, rac (0) is multiplied by the white noise correction factor 1.0001 which is equivalent to adding a noise floor at –40 dB. The operation reduces the possibility of ill-condition due to bandpass filtering of the input [6].. 9.
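The auto-correlation, lag windowing, and white noise correction steps above can be sketched in a few lines of floating-point Python. This is only an illustration under stated assumptions, not the fixed-point DSP code of this thesis: the function name `modified_autocorrelation` and its parameters are hypothetical, and the default `fs=8000.0` is an assumption inferred from the 160 samples in a 20 ms frame.

```python
import math

def modified_autocorrelation(windowed, order=10, f0=60.0, fs=8000.0,
                             noise_correction=1.0001):
    """Return the modified auto-correlations for lags 0..order.

    windowed: one already-windowed analysis frame s'(n) (240 samples in the text).
    fs=8000.0 is an assumed sampling frequency; the name is illustrative only.
    """
    n_samples = len(windowed)
    # Equation (2.3): r_ac(k) = sum over n = k..N-1 of s'(n) * s'(n - k)
    r = [sum(windowed[n] * windowed[n - k] for n in range(k, n_samples))
         for k in range(order + 1)]
    # White noise correction: scale r_ac(0) by 1.0001
    r[0] *= noise_correction
    # Equation (2.4): Gaussian lag window applied to lags 1..order
    for i in range(1, order + 1):
        r[i] *= math.exp(-0.5 * (2.0 * math.pi * f0 * i / fs) ** 2)
    return r
```

The returned list would then serve as the modified auto-correlations passed to the Levinson-Durbin recursion of the next subsection.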

2.2.2.2 Levinson-Durbin algorithm

The modified auto-correlations $r'_{ac}(k)$ are used to obtain the direct-form LP filter coefficients $a_k$, $k = 1,\ldots,10$, by solving the set of equations

$$\sum_{k=1}^{10} a_k\, r'_{ac}(|i-k|) = -r'_{ac}(i), \quad i = 1,\ldots,10. \tag{2.5}$$

The set of equations is solved using the Levinson-Durbin algorithm:

    $E_{LD}(0) = r'_{ac}(0)$
    for $i = 1$ to $10$ do
        $a_0^{(i-1)} = 1$
        $k_i = -\left[\sum_{j=0}^{i-1} a_j^{(i-1)}\, r'_{ac}(i-j)\right] / E_{LD}(i-1)$
        $a_i^{(i)} = k_i$
        for $j = 1$ to $i-1$ do
            $a_j^{(i)} = a_j^{(i-1)} + k_i\, a_{i-j}^{(i-1)}$
        end
        $E_{LD}(i) = (1 - k_i^2)\, E_{LD}(i-1)$
    end

The final solution is given as $a_j = a_j^{(10)}$, $j = 1,\ldots,10$. The LP filter coefficients are converted to the line spectral pair (LSP) representation for quantization and interpolation purposes.

2.2.2.3 LP to LSP Conversion

LP coefficients are not conducive to efficient quantization because they have relatively high spectral sensitivity. LSPs, on the other hand, are closely related to the formant frequencies, and they can be quantized taking into account spectral features known to be important in perceiving speech signals. For the 10th-order LP filter, the LSPs are defined as the roots of the sum and difference polynomials [3]:

$$F_1'(z) = A(z) + z^{-11} A(z^{-1}) \tag{2.6}$$

and

$$F_2'(z) = A(z) - z^{-11} A(z^{-1}), \tag{2.7}$$

respectively. It can be proven that all roots of these polynomials lie on the unit circle and that they alternate with each other. $F_1'(z)$ has a root at $z = -1$ ($\omega = \pi$) and $F_2'(z)$ has a root at $z = 1$ ($\omega = 0$). To eliminate these two roots, we define the new polynomials:

$$F_1(z) = \frac{F_1'(z)}{1 + z^{-1}} = \prod_{i=1,3,\ldots,9} \left(1 - 2 q_i z^{-1} + z^{-2}\right) \tag{2.8}$$

and

$$F_2(z) = \frac{F_2'(z)}{1 - z^{-1}} = \prod_{i=2,4,\ldots,10} \left(1 - 2 q_i z^{-1} + z^{-2}\right), \tag{2.9}$$

where $q_i = \cos(\omega_i)$, with $\omega_i$ being the line spectral frequencies (LSF), which satisfy the ordering property $0 < \omega_1 < \omega_2 < \ldots < \omega_{10} < \pi$. We refer to $q_i$ as the LSPs in the cosine domain. Since both polynomials $F_1(z)$ and $F_2(z)$ are symmetric, only the first 5 coefficients of each need to be computed. The coefficients of these polynomials are found by the recursive relations (for $i = 0$ to 4):

$$f_1(i+1) = a_{i+1} + a_{m-i} - f_1(i), \qquad f_2(i+1) = a_{i+1} - a_{m-i} + f_2(i), \tag{2.10}$$

where $m = 10$ is the predictor order and $f_1(0) = f_2(0) = 1$. The LSPs are found by evaluating the polynomials $F_1(z)$ and $F_2(z)$ at 60 points equally spaced between 0 and $\pi$ and checking for sign changes. A sign change signifies the existence of a root, and the sign change interval is then divided 4 times to better track the root. The Chebyshev polynomials are used to evaluate $F_1(z)$ and $F_2(z)$ [8]. In this method the roots are found directly in the cosine domain $\{q_i\}$. The polynomials $F_1(z)$ and $F_2(z)$ evaluated at $z = e^{j\omega}$ can be written as:

$$F(\omega) = 2 e^{-j5\omega}\, C(x),$$

with

$$C(x) = T_5(x) + f(1) T_4(x) + f(2) T_3(x) + f(3) T_2(x) + f(4) T_1(x) + f(5)/2, \tag{2.11}$$

where $T_m(x) = \cos(m\omega)$ is the $m$th-order Chebyshev polynomial, and $f(i)$, $i = 1,\ldots,5$, are the coefficients of either $F_1(z)$ or $F_2(z)$. The polynomial $C(x)$ is evaluated at a given value of $x = \cos(\omega)$ using the recurrence relation

$$T_k(x) = 2x\, T_{k-1}(x) - T_{k-2}(x) \quad \text{for } k = 2, 3, \ldots, \tag{2.12}$$

and the trigonometric representation on $[-1, 1]$

$$T_N(x) = \cos(N \arccos(x)) \quad \text{for } -1 \le x \le 1, \tag{2.13}$$

from which we obtain the following recursive relation:

    for $k = 4$ down to $1$
        $\lambda_k = 2x\lambda_{k+1} - \lambda_{k+2} + f(5-k)$
    end
    $C(x) = x\lambda_1 - \lambda_2 + f(5)/2$

with initial values $\lambda_5 = 1$ and $\lambda_6 = 0$.

2.2.2.4 Monitoring resonance in LPC spectrum (all modes)

Resonances in the LPC filter are monitored to detect possible problem areas where divergence between the adaptive codebook memories in the encoder and the decoder could cause unstable filters in areas with highly correlated continuous signals. Typically, this divergence is due to channel errors. The monitoring of resonance signals is performed using the unquantized LSPs $q_i$, $i = 1,\ldots,10$, which are available after the LP to LSP conversion. The algorithm utilizes the fact that LSPs are closely spaced at a peak in the spectrum. First, two distances, $dist_1$ and $dist_2$, are calculated in two different regions, defined as $dist_1 = \min(q_i - q_{i+1})$, $i = 4,\ldots,8$, and $dist_2 = \min(q_i - q_{i+1})$, $i = 2, 3$. Either of these two minimum distance conditions must be fulfilled to classify the frame as a resonance frame and increase the resonance counter. Twelve consecutive resonance frames are needed to indicate a possible problem condition; otherwise the LSP_flag is cleared.

2.2.3 Open-loop pitch analysis

Open-loop pitch analysis is performed in order to simplify the pitch analysis and confine the closed-loop pitch search to a small number of lags around the open-loop estimated lags. Open-loop pitch estimation is based on the weighted speech signal $s_w(n)$, which is obtained by filtering the input speech signal through the weighting filter $W(z) = A(z/\gamma_1) / A(z/\gamma_2)$. Open-loop pitch analysis is performed as follows. In the first step, 3 maxima of the correlation

$$O_k = \sum_{n=0}^{79} s_w(n)\, s_w(n-k) \tag{2.14}$$

are found in the three ranges:

    i = 3:  18, ..., 35,
    i = 2:  36, ..., 71,
    i = 1:  72, ..., 143.

The retained maxima $O_{t_i}$, $i = 1,\ldots,3$, are normalized by dividing by $\sqrt{\sum_n s_w^2(n - t_i)}$, $i = 1,\ldots,3$, respectively. The normalized maxima and corresponding delays are denoted by $(M_i, t_i)$, $i = 1,\ldots,3$. The winner, $T_{op}$, among the three normalized correlations is selected by favouring the delays with values in the lower range. This is performed by weighting the normalized correlations corresponding to the longer delays. This procedure of dividing the delay range into 3 sections and favouring the lower sections is used to avoid choosing pitch multiples.

2.2.4 Impulse response computation (all modes)

The impulse response, $h(n)$, of the weighted synthesis filter $H(z)W(z) = A(z/\gamma_1) / [\hat{A}(z) A(z/\gamma_2)]$ is computed each subframe. This impulse response is needed for the search of the adaptive and fixed codebooks. The use of unquantized coefficients gives a weighting filter that better matches the original spectrum. The values of $\gamma_1$ and $\gamma_2$ modify the frequency response of the filter $W(z)$, and thereby the amount of noise weighting. The filter also de-emphasizes the error in the formant regions of the speech spectrum.

2.2.5 Target signal computation (all modes)

The target signal for the adaptive codebook search is usually computed by subtracting the zero-input response of the weighted synthesis filter $H(z)W(z)$ from the weighted speech signal $s_w(n)$. This is performed on a subframe basis. An equivalent procedure for computing the target signal is to filter the LP residual signal $res_{LP}(n)$ through the combination of the synthesis filter $1/\hat{A}(z)$ and the weighting filter $A(z/\gamma_1)/A(z/\gamma_2)$. After determining the excitation for the subframe, the initial states of these filters are updated by filtering the difference between the LP residual and the excitation. The residual signal $res_{LP}(n)$, which is needed for finding the target vector, is also used in the adaptive codebook search to extend the past excitation buffer. This simplifies the adaptive codebook search procedure for delays less than the subframe size of 40.

2.2.6 Adaptive codebook

2.2.6.1 Adaptive codebook search

Adaptive codebook search is performed on a subframe basis. It consists of performing a closed-loop pitch search and then computing the adaptive codevector by interpolating the past excitation at the selected fractional pitch lag. The adaptive codebook parameters (or pitch parameters) are the delay and gain of the pitch filter. In the adaptive codebook approach for implementing the pitch filter, the excitation is repeated for delays less than the subframe length. In the search stage, the excitation is extended by the LP residual to simplify the closed-loop search.

Closed-loop pitch analysis is performed around the open-loop pitch estimates on a subframe basis. In the first (and third) subframe the range $T_{op} \pm 3$ is searched. For the other subframes, closed-loop pitch analysis is performed around the integer pitch selected in the previous subframe. The closed-loop pitch search is performed by minimizing the mean-square weighted error between the original and synthesized speech. This is achieved by maximizing the term [9]:

$$R(k) = \frac{\sum_{n=0}^{39} x(n)\, y_k(n)}{\sqrt{\sum_{n=0}^{39} y_k(n)\, y_k(n)}}, \tag{2.15}$$

where $x(n)$ is the target signal and $y_k(n)$ is the past filtered excitation at delay $k$ (the past excitation convolved with $h(n)$). Note that the search range is limited around the open-loop pitch. The convolution $y_k(n)$ is computed for the first delay $t_{min}$ in the searched range, and for the other delays in the search range, $k = t_{min}+1,\ldots,t_{max}$, it is updated using the recursive relation:

$$y_k(n) = y_{k-1}(n-1) + u(-k)\, h(n), \tag{2.16}$$

where $u(n)$, $n = -(143+11),\ldots,39$, is the excitation buffer. Note that in the search stage, the samples $u(n)$, $n = 0,\ldots,39$, are not known, and they are needed for pitch delays less than 40. To simplify the search, the LP residual is copied to $u(n)$ in order to make the relation in equation (2.16) valid for all delays.

Once the optimum integer pitch delay is determined, the fractions with a step of 1/6 (or 1/3) around that integer are tested [10]. The fractional pitch search is performed by interpolating the normalized correlation in equation (2.15) and searching for its maximum. The interpolation is performed using an FIR filter based on a Hamming-windowed $\sin(x)/x$ function, truncated and padded with zeros. The filter has its cut-off frequency (-3 dB) at 3600 Hz in the over-sampled domain.

Once the fractional pitch lag is determined, the adaptive codebook vector $v(n)$ is computed by interpolating the past excitation signal $u(n)$ at the given integer delay $k$ and phase (fraction) $t$. The interpolation filter is also based on a Hamming-windowed $\sin(x)/x$ function, truncated and padded with zeros, with a cut-off frequency (-3 dB) at 3600 Hz in the over-sampled domain. The adaptive codebook gain is then found by:

$$g_p = \frac{\sum_{n=0}^{39} x(n)\, y(n)}{\sum_{n=0}^{39} y(n)\, y(n)}, \quad \text{bounded by } 0 \le g_p \le 1.2, \tag{2.17}$$

where $y(n) = v(n) * h(n)$ is the filtered adaptive codebook vector (the zero-state response of $H(z)W(z)$ to $v(n)$). The computed adaptive codebook gain is quantized using non-uniform scalar quantization in the range [0.0, 1.2].

2.2.6.2 Adaptive codebook gain control (all modes)

The average adaptive codebook gain is calculated if the LSP_flag is set and the unquantized adaptive codebook gain exceeds the gain threshold $GP_{th} = 0.95$. The average gain is calculated from the present unquantized gain and the quantized gains of the seven previous subframes. That is, $GP_{ave} = \mathrm{mean}\{g_p(n), \hat{g}_p(n-1), \ldots, \hat{g}_p(n-7)\}$, where $n$ is the current subframe. If the average adaptive codebook gain exceeds $GP_{th}$, the unquantized gain is limited to the threshold value and the GpC_flag is set to indicate the limitation.

2.2.7 Algebraic codebook

The algebraic codebook (innovation codebook) is used for the secondary excitation computation. The vectors contained in the excitation form a very important part of the CELP coding algorithm. They serve two main purposes: first, they provide the start-up information to the LTP memory, including any sudden changes in the speech not adequately tracked by the LTP. Second, they supply the "filling in" information that the LTP omitted. This is especially the case during unvoiced regions. The figure shows the general framework for an innovation codebook driven by algebraic codes. The shaping function F can be fixed or changed dynamically as illustrated.

2.2.7.1 Algebraic codebook structure

The algebraic codebook structure is based on the interleaved single-pulse permutation (ISPP) design. In this codebook, the innovation vector contains a small number of non-zero pulses. All pulses have amplitude +1 or -1. The 40 positions in a subframe are divided into a few tracks, where each track contains one or two pulses. Each pulse position in a track is encoded with some bits, and the sign of the first pulse in the track is encoded with one bit. For two pulses located in the same track, only one sign bit is needed. This sign bit indicates the sign of the first pulse. The sign of the second pulse depends on its position relative to the first pulse: if the position of the second pulse is smaller, it has the opposite sign; otherwise it has the same sign as the first pulse.

2.2.7.2 Algebraic codebook search

The algebraic codebook is searched by minimizing the mean square error between the weighted input speech and the weighted synthesized speech. The target signal used in the closed-loop pitch search is updated by subtracting the adaptive codebook contribution. That is,

$$x_2(n) = x(n) - \hat{g}_p\, y(n), \quad n = 0,\ldots,39, \tag{2.18}$$

where $y(n) = v(n) * h(n)$ is the filtered adaptive codebook vector and $\hat{g}_p$ is the quantized adaptive codebook gain. If $\mathbf{c}_k$ is the algebraic codevector at index $k$, then the algebraic codebook is searched by maximizing the term:

$$A_k = \frac{(C_k)^2}{E_{D_k}} = \frac{(\mathbf{d}^t \mathbf{c}_k)^2}{\mathbf{c}_k^t \boldsymbol{\Phi}\, \mathbf{c}_k}, \tag{2.19}$$

where $\mathbf{d} = \mathbf{H}^t \mathbf{x}_2$ is the correlation between the target signal $x_2(n)$ and the impulse response $h(n)$, $\mathbf{H}$ is a lower triangular Toeplitz convolution matrix with diagonal $h(0)$ and lower diagonals $h(1),\ldots,h(39)$, and $\boldsymbol{\Phi} = \mathbf{H}^t \mathbf{H}$ is the matrix of correlations of $h(n)$. The vector $\mathbf{d}$ (backward filtered target) and the matrix $\boldsymbol{\Phi}$ are computed prior to the codebook search.

To simplify the search procedure, the pulse amplitudes are preset by simple quantization of an appropriate signal $b(n)$. This is done by setting the amplitude of a pulse at a certain position equal to the sign of $b(n)$ at that position, where $b(n)$ is the correlated signal corresponding to $d(n)$. Having preset the pulse amplitudes, the optimal pulse positions are determined using an efficient non-exhaustive analysis-by-synthesis search technique. In this technique, the term in equation (2.19) is tested for a small percentage of position combinations. During the iterations, at least one pulse is located in a position corresponding to the global maximum and one pulse is located in a position corresponding to one of the 4 local maxima.

A special feature incorporated in the codebook is that the selected codevector is filtered through an adaptive pre-filter $F_E(z)$ which enhances special spectral components in order to improve the synthesized speech quality. Here the filter $F_E(z) = 1/(1 - \beta z^{-T})$ is used, where $T$ is the nearest integer pitch lag to the closed-loop fractional pitch lag of the subframe, and $\beta$ is a pitch gain given by the quantized pitch gain bounded by [0.0, 1.0]. Note that prior to the codebook search, the impulse response $h(n)$ must include the pre-filter $F_E(z)$. That is, $h(n) = h(n) + \beta h(n-T)$, $n = T,\ldots,39$. The fixed codebook gain is then found by:

$$g_c = \frac{\mathbf{x}_2^t \mathbf{z}}{\mathbf{z}^t \mathbf{z}}, \tag{2.20}$$

where $\mathbf{x}_2$ is the target vector for the fixed codebook search and $\mathbf{z}$ is the fixed codebook vector convolved with $h(n)$,

$$z(n) = \sum_{i=0}^{n} c(i)\, h(n-i), \quad n = 0,\ldots,39. \tag{2.21}$$

2.2.8 Quantization of adaptive and fixed codebook gains

2.2.8.1 Adaptive codebook gain limitation

If the GpC_flag is set, the limited adaptive codebook gain is used in the gain quantization. The quantization codebook search range is limited to include only adaptive codebook gain values less than $GP_{th}$. This is performed in the quantization search for all modes.

2.2.8.2 Quantization of codebook gains

The fixed codebook gain quantization is performed using MA prediction with fixed coefficients. The 4th-order MA prediction is performed on the innovation energy. Let $E(n)$ be the mean-removed innovation energy (in dB) at subframe $n$, given by:

$$E(n) = 10 \log\left(\frac{1}{N}\, g_c^2 \sum_{i=0}^{N-1} c^2(i)\right) - \bar{E}, \tag{2.22}$$

where $N = 40$ is the subframe size and $c(i)$ is the fixed codebook excitation. $\bar{E}$ (in dB) is the mean of the innovation energy, a pre-defined value. The predicted energy is given by:

$$\tilde{E}(n) = \sum_{i=1}^{4} b_i\, \hat{R}(n-i), \tag{2.23}$$

where $b_i$ are the MA prediction coefficients and $\hat{R}(k)$ is the quantized prediction error at subframe $k$. The predicted energy is used to compute a predicted fixed codebook gain $g_c'$ (by substituting $E(n)$ by $\tilde{E}(n)$ and $g_c$ by $g_c'$ in equation (2.22)). First, the mean innovation energy is found by:

$$E_I = 10 \log\left(\frac{1}{N} \sum_{j=0}^{N-1} c^2(j)\right), \tag{2.24}$$

and then the predicted gain is found by:

$$g_c' = 10^{\,0.05\,(\tilde{E}(n) + \bar{E} - E_I)}. \tag{2.25}$$

A correction factor between the gain $g_c$ and the estimated gain $g_c'$ is given by:

$$\gamma_{gc} = g_c / g_c'. \tag{2.26}$$

Note that the prediction error is given by:

$$R(n) = E(n) - \tilde{E}(n) = 20 \log(\gamma_{gc}). \tag{2.27}$$

The correction factor $\gamma_{gc}$ is computed using the mean energy value $\bar{E}$. It is quantized either with an individual codebook or jointly vector quantized with the adaptive codebook gain. If the correction factor $\gamma_{gc}$ is quantized individually, the quantization table search is performed by minimizing the error

$$E_Q = \left(g_c - \hat{\gamma}_{gc}\, g_c'\right)^2. \tag{2.28}$$

Otherwise, the gain codebook search is performed by minimizing the square of the weighted error between the original and reconstructed speech, which is given by:

$$E = \left\| \mathbf{x} - g_p \mathbf{y} - g_c \mathbf{z} \right\|^2. \tag{2.29}$$
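As a worked illustration of equations (2.23) to (2.27), the following floating-point Python sketch computes the predicted gain and the correction factor for one subframe. It is not the implementation used in this thesis. The function names are hypothetical, and the MA coefficients `ma_coeffs` and the mean energy `mean_energy_db` are supplied by the caller because their values are not stated in this section.

```python
import math

def predicted_fixed_gain(code, past_errors_db, ma_coeffs, mean_energy_db):
    """Predicted fixed codebook gain g_c' for one subframe.

    code: fixed codebook vector c(n); past_errors_db: the four previous
    quantized prediction errors, most recent first; ma_coeffs: b_1..b_4.
    All names are illustrative, not taken from a reference implementation.
    """
    n = len(code)
    # Equation (2.23): MA-predicted energy in dB
    predicted_energy = sum(b * r for b, r in zip(ma_coeffs, past_errors_db))
    # Equation (2.24): mean innovation energy in dB
    innovation_energy = 10.0 * math.log10(sum(c * c for c in code) / n)
    # Equation (2.25): predicted gain
    return 10.0 ** (0.05 * (predicted_energy + mean_energy_db - innovation_energy))

def gain_correction(g_c, g_c_pred):
    """Return (gamma_gc, R) following equations (2.26) and (2.27)."""
    gamma = g_c / g_c_pred
    return gamma, 20.0 * math.log10(gamma)
```

For example, a 40-sample codevector with four unit pulses has $E_I = -10$ dB. With zero past prediction errors and an assumed mean energy of 30 dB, the predicted gain is $10^{0.05 \cdot 40} = 100$, so an optimal gain of 50 gives a correction factor of 0.5, or about -6.02 dB of prediction error.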

An adaptor based on the coding gain in the adaptive codebook decides if the coding gain is low. If this is the case, the correction factor codebook is searched once more, minimizing a modified criterion in order to find a new quantized fixed codebook gain. The modified criterion is given by:

$$E_{mod} = (1 - \alpha) \cdot c \cdot \left(g_c - \hat{\gamma}_{gc} \cdot g_c'\right)^2 + \alpha \cdot \left(E_{res} - E_{exc}\right)^2, \tag{2.30}$$

where $E_{res}$ and $E_{exc}$ are the energies (the squared norms) of the LP residual and the total excitation, respectively. The criterion is searched with the already quantized adaptive codebook gain, and the correction factor $\hat{\gamma}_{gc}$ that minimizes (2.30) is selected. The balance factor $\alpha$ decides the amount of energy matching in the modified criterion. This factor is adaptively decided based on the coding gain in the adaptive codebook, computed by:

$$ag = 10 \cdot \log_{10} \frac{\left\| res_{LP} \right\|^2}{\left\| res_{LP} - v \right\|^2}. \tag{2.31}$$

If the coding gain $ag$ is less than 1 dB, the modified criterion is employed, except when an onset is detected. An onset is said to be detected if the fixed codebook gain in the current subframe is more than twice the value of the fixed codebook gain in the previous subframe. A hangover of 8 subframes is used in the onset detection so that, once an onset is detected, the modified criterion is not used for the following subframes either. The balance factor $\alpha$ is computed from the median-filtered adaptive coding gain. The current $ag$ value and the $ag$ values of the previous 4 subframes are median filtered to get $ag_m$. The $\alpha$ factor is computed by:

$$\alpha = \begin{cases} 0, & ag_m > 2 \\ 0.5 \cdot (1 - 0.5 \cdot ag_m), & 0 < ag_m < 2 \\ 0.5, & ag_m < 0 \end{cases} \tag{2.32}$$

2.2.9 Memory update (all modes)

An update of the states of the synthesis and weighting filters is needed in order to compute the target signal in the next subframe. After the two gains are quantized, the excitation signal $u(n)$ in the present subframe is found by:

$$u(n) = \hat{g}_p\, v(n) + \hat{g}_c\, c(n), \quad n = 0,\ldots,39. \tag{2.33}$$

The states of the filters can be updated by filtering the signal $res_{LP}(n) - u(n)$ (the difference between the residual and the excitation) through the filters $1/\hat{A}(z)$ and $A(z/\gamma_1)/A(z/\gamma_2)$ for the 40-sample subframe and saving the states of the filters. A simpler approach, which requires only one filtering, is as follows. The output of the filter $1/\hat{A}(z)$ due to the input $res_{LP}(n) - u(n)$ is equivalent to $e(n) = s(n) - \hat{s}(n)$, so the states of the synthesis filter are given by $e(n)$, $n = 30,\ldots,39$. Updating the states of the filter $A(z/\gamma_1)/A(z/\gamma_2)$ can be done by filtering the error signal $e(n)$ through this filter to find the perceptually weighted error $e_w(n)$. However, $e_w(n)$ can equivalently be found by $e_w(n) = x(n) - \hat{g}_p\, y(n) - \hat{g}_c\, z(n)$. Since the signals $x(n)$, $y(n)$, and $z(n)$ are available, the states of the weighting filter are updated by computing $e_w(n)$ for $n = 30,\ldots,39$.

2.3 Functional description of the decoder

The function of the decoder consists of decoding the transmitted parameters (LP parameters, adaptive codebook vector, adaptive codebook gain, fixed codebook vector, fixed codebook gain) and performing synthesis to obtain the reconstructed speech. The reconstructed speech is then post-filtered and up-scaled. The signal flow at the decoder is shown in figure 2.5.

2.3.1 Decoding and speech synthesis

The received indices of the LSP quantization are used to reconstruct the quantized LSP vectors. Interpolation is performed to obtain 4 interpolated LSP vectors (corresponding to the 4 subframes). For each subframe, the interpolated LSP vector is converted to the LP filter coefficient domain $a_k$, which is used for synthesizing the reconstructed speech in the subframe.

Figure 2.4: Simplified block diagram of the adaptive multi-rate decoder. (Blocks shown: decode LSP, interpolation of LSP for the 4 subframes, decode adaptive codebook, decode innovative codebook, decode gains, construct excitation, synthesis filter, post filter.)

The following steps are repeated for each subframe:

1. Decoding of the adaptive codebook vector: The received pitch index (adaptive codebook index) is used to find the integer and fractional parts of the pitch lag. The adaptive codebook vector $v(n)$ is found by interpolating the past excitation $u(n)$ (at the pitch delay) using the FIR filter.

2. Decoding of the innovative codebook vector: The received algebraic codebook index is used to extract the positions and amplitudes (signs) of the excitation pulses and to find the algebraic codevector $c(n)$. If the integer part of the pitch lag, $T$, is less than the subframe size 40, the pitch sharpening procedure is applied, which translates into modifying $c(n)$ by $c(n) = c(n) + \beta c(n-T)$, where $\beta$ is the decoded pitch gain, $\hat{g}_p$, bounded by [0.0, 1.0] or [0.0, 0.8], depending on the mode.

3. Decoding of the adaptive and fixed codebook gains: In the case of scalar quantization of the gains, the received indices are used to find the quantized adaptive codebook gain, $\hat{g}_p$, and the quantized fixed codebook gain correction factor, $\hat{\gamma}_{gc}$, from the corresponding quantization tables. In the case of vector quantization of the gains, the received index gives both the quantized adaptive codebook gain, $\hat{g}_p$, and the quantized fixed codebook gain correction factor, $\hat{\gamma}_{gc}$.

4. Smoothing of the fixed codebook gain: An adaptive smoothing of the fixed codebook gain is performed to avoid unnatural fluctuations in the energy contour. The smoothing is based on a measure of the stationarity of the short-term spectrum in the $q$ domain.

5. Anti-sparseness processing: An adaptive anti-sparseness post-processing procedure is applied to the fixed codebook vector $c(n)$ in order to reduce perceptual artifacts arising from the sparseness of the algebraic fixed codebook vectors, which have only a few non-zero samples per subframe. The anti-sparseness processing consists of circular convolution of the fixed codebook vector with an impulse response. The impulse response is selected adaptively from the adaptive and fixed codebook gains [3].

6. Computing the reconstructed speech: Before the speech synthesis, a post-processing of the excitation elements is performed. This means that the total excitation is modified by emphasizing the contribution of the adaptive codebook vector. Adaptive gain control (AGC) is used to compensate for the gain difference between the non-emphasized excitation $u(n)$ and the emphasized excitation $\hat{u}(n)$.

7. Additional instability protection: An additional instability protection is implemented in the speech decoder, which monitors overflows in the synthesis filter. If an overflow has occurred in the synthesis part, the whole adaptive codebook memory, $v(n)$, $n = -(143+11),\ldots,39$, is scaled down by a factor of 4, and the synthesis filtering is repeated using this down-scaled memory.
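The pitch sharpening in step 2 can be illustrated with a short Python sketch. This is a floating-point illustration under stated assumptions rather than the decoder code of this thesis: the function name `sharpen_codevector` and its parameters are hypothetical, and the update is assumed to run in place, so that already-sharpened samples feed later ones, which matches the pre-filter $1/(1 - \beta z^{-T})$ used on the encoder side.

```python
def sharpen_codevector(code, pitch_lag, pitch_gain, beta_max=1.0, subframe=40):
    """Apply c(n) = c(n) + beta * c(n - T) for n = T..subframe-1 when T < subframe.

    beta is the decoded pitch gain clipped to [0.0, beta_max]; beta_max is 1.0
    or 0.8 depending on the mode. Names are illustrative assumptions.
    """
    beta = min(max(pitch_gain, 0.0), beta_max)
    out = list(code)
    if pitch_lag < subframe:
        # Assumed in-place recursion: out[n - T] may itself be sharpened already.
        for n in range(pitch_lag, subframe):
            out[n] += beta * out[n - pitch_lag]
    return out
```

With a single unit pulse at position 0, a lag of 10, and a gain of 0.5, this sketch produces echoes of 0.5, 0.25, and 0.125 at positions 10, 20, and 30. For a lag of 40 or more the codevector is returned unchanged.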

2.3.2 Post-processing

2.3.2.1 Adaptive post-filtering (all modes)

As the encoding rate goes down, the SNR drops and the noise floor of the white coding noise is elevated to such an extent that it is very difficult to keep it below the threshold of audibility. In speech perception, the formants of speech are perceptually much more important than the spectral valley regions. A good strategy is therefore to sacrifice the valley regions and preserve the formants. An important feature of the frequency response of the adaptive post-filter is that the spectral envelope peaks corresponding to the formants have roughly the same height. This feature ensures that the relative intensity of the formants remains roughly unchanged after post-filtering [12].

The adaptive post-filter is the cascade of two filters: a formant post-filter and a tilt compensation filter. The post-filter is updated every subframe of 5 ms. The formant post-filter is given by:

$$H_f(z) = \frac{\hat{A}(z/\gamma_n)}{\hat{A}(z/\gamma_d)}, \tag{2.34}$$

where $\hat{A}(z)$ is the received quantized (and interpolated) LP inverse filter (LP analysis is not performed at the decoder), and the factors $\gamma_n$ and $\gamma_d$ control the amount of formant post-filtering. To further reduce the low-pass effect, a first-order filter with transfer function $H_t(z)$ is added to compensate for the tilt in the formant post-filter $H_f(z)$. It is given by:

$$H_t(z) = 1 - \mu z^{-1}, \tag{2.35}$$

where $\mu = \gamma_t k_1'$ is a tilt factor, with $k_1'$ being the first reflection coefficient calculated on the truncated ($L_h = 22$) impulse response, $h_f(n)$, of the filter $\hat{A}(z/\gamma_n)/\hat{A}(z/\gamma_d)$. $k_1'$ is given by:

$$k_1' = \frac{r_h(1)}{r_h(0)}; \qquad r_h(i) = \sum_{j=0}^{L_h - i - 1} h_f(j)\, h_f(j+i). \tag{2.36}$$

Adaptive gain control (AGC) is used to compensate for the gain difference between the synthesized speech signal $\hat{s}(n)$ and the post-filtered signal $\hat{s}_f(n)$. The gain scaling factor $\gamma_{sc}$ for the present subframe is computed by:

$$\gamma_{sc} = \sqrt{\frac{\sum_{n=0}^{39} \hat{s}^2(n)}{\sum_{n=0}^{39} \hat{s}_f^2(n)}}. \tag{2.37}$$

The gain-scaled post-filtered signal $\hat{s}'(n)$ is given by:

$$\hat{s}'(n) = \beta_{sc}(n)\, \hat{s}_f(n), \tag{2.38}$$

where $\beta_{sc}(n)$ is updated on a sample-by-sample basis and given by:

$$\beta_{sc}(n) = \alpha\, \beta_{sc}(n-1) + (1 - \alpha)\, \gamma_{sc}, \tag{2.39}$$

where $\alpha$ is the AGC factor.

2.3.2.2 High-pass filtering and up-scaling

The high-pass filter serves as a precaution against undesired low-frequency components. A filter cut-off frequency of 60 Hz is used. Up-scaling consists of multiplying the post-filtered speech by a factor of 2 to compensate for the down-scaling by 2 that is applied to the input signal.
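The adaptive gain control of the post-filter can be sketched as follows. This Python fragment is a floating-point illustration, not the DSP implementation of this thesis. The function name `agc_subframe` is hypothetical, the gain scaling factor is taken as the square root of the energy ratio so that it scales amplitudes, and the default `alpha=0.9` and the handling of a silent subframe are assumptions, since neither is specified in the text above.

```python
import math

def agc_subframe(synth, postfiltered, beta_prev, alpha=0.9):
    """Gain-scale one post-filtered subframe.

    synth: synthesized speech for the subframe; postfiltered: the same
    subframe after post-filtering; beta_prev: smoothed gain carried over from
    the previous sample. alpha=0.9 is an assumed default, not a value from the text.
    Returns the scaled samples and the updated smoothed gain.
    """
    energy_in = sum(x * x for x in synth)
    energy_out = sum(x * x for x in postfiltered)
    # Gain scaling factor for the subframe (amplitude ratio).
    # Assumption: leave a silent post-filtered subframe unscaled.
    gamma_sc = math.sqrt(energy_in / energy_out) if energy_out > 0.0 else 1.0
    beta = beta_prev
    out = []
    for x in postfiltered:
        # Sample-by-sample smoothing of the gain, then scaling
        beta = alpha * beta + (1.0 - alpha) * gamma_sc
        out.append(beta * x)
    return out, beta
```

Because the gain is smoothed sample by sample, the output level approaches that of the synthesized speech gradually instead of jumping at subframe boundaries. The returned `beta` would be passed as `beta_prev` for the next subframe.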

2.4 Bit Allocation

The bit allocation of the AMR codec modes is shown in Table 2.1. In each 20 ms speech frame, 95, 103, 118, 134, 148, 159, 204, or 244 bits are produced, corresponding to a bit rate of 4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, or 12.2 kbit/s, respectively. Note that the most significant bits (MSB) are always sent first [3].

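Table 2.1: Bit allocation of the AMR coding algorithm for a 20 ms frame.

Mode                     Parameter        1st       2nd       3rd       4th       Total
                                          subframe  subframe  subframe  subframe  per frame
---------------------------------------------------------------------------------------------
12.2 kbit/s (GSM EFR)    2 LSP sets                                               38
                         Pitch delay      9         6         9         6         30
                         Pitch gain       4         4         4         4         16
                         Algebraic code   35        35        35        35        140
                         Codebook gain    5         5         5         5         20
                         Total                                                    244
---------------------------------------------------------------------------------------------
10.2 kbit/s              LSP set                                                  26
                         Pitch delay      8         5         8         5         26
                         Algebraic code   31        31        31        31        124
                         Gains            7         7         7         7         28
                         Total                                                    204
---------------------------------------------------------------------------------------------
7.95 kbit/s              LSP set                                                  27
                         Pitch delay      8         6         8         6         28
                         Pitch gain       4         4         4         4         16
                         Algebraic code   17        17        17        17        68
                         Codebook gain    5         5         5         5         20
                         Total                                                    159
---------------------------------------------------------------------------------------------
7.40 kbit/s (TDMA EFR)   LSP set                                                  26
                         Pitch delay      8         5         8         5         26
                         Algebraic code   17        17        17        17        68
                         Gains            7         7         7         7         28
                         Total                                                    148
---------------------------------------------------------------------------------------------
6.70 kbit/s (PDC EFR)    LSP set                                                  26
                         Pitch delay      8         4         8         4         24
                         Algebraic code   14        14        14        14        56
                         Gains            7         7         7         7         28
                         Total                                                    134
---------------------------------------------------------------------------------------------
5.90 kbit/s              LSP set                                                  26
                         Pitch delay      8         4         8         4         24
                         Algebraic code   11        11        11        11        44
                         Gains            6         6         6         6         24
                         Total                                                    118
---------------------------------------------------------------------------------------------
5.15 kbit/s              LSP set                                                  23
                         Pitch delay      8         4         4         4         20
                         Algebraic code   9         9         9         9         36
                         Gains            6         6         6         6         24
                         Total                                                    103
---------------------------------------------------------------------------------------------
4.75 kbit/s              LSP set                                                  23
                         Pitch delay      8         4         4         4         20
                         Algebraic code   9         9         9         9         36
                         Gains            8                   8                   16
                         Total                                                    95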
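The relation between the per-frame bit counts and the mode bit rates can be checked with a few lines of Python. The sketch below copies the per-frame parameter totals from Table 2.1; the dictionary and function names are only illustrative.

```python
FRAME_MS = 20  # frame duration in milliseconds

# Per-frame parameter totals (bits) copied from Table 2.1, keyed by mode in kbit/s.
PARAMETER_BITS = {
    12.2: [38, 30, 16, 140, 20],
    10.2: [26, 26, 124, 28],
    7.95: [27, 28, 16, 68, 20],
    7.40: [26, 26, 68, 28],
    6.70: [26, 24, 56, 28],
    5.90: [26, 24, 44, 24],
    5.15: [23, 20, 36, 24],
    4.75: [23, 20, 36, 16],
}

def frame_bits(mode):
    """Total number of bits produced per 20 ms frame in the given mode."""
    return sum(PARAMETER_BITS[mode])

def bit_rate_kbps(mode):
    """Bits per millisecond equals kbit/s."""
    return frame_bits(mode) / FRAME_MS
```

For instance, the 12.2 kbit/s mode produces 38 + 30 + 16 + 140 + 20 = 244 bits per frame, and 244 bits / 20 ms = 12.2 kbit/s. The same check holds for every mode in the table.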

Chapter 3 Overview of IEEE 802.16a FEC Scheme

3.1 Introduction to IEEE 802.16a Standard

The IEEE 802.16a standard amends IEEE Standard 802.16 by enhancing the medium access control layer and providing additional physical layer specifications in support of broadband wireless access at frequencies from 2 to 11 GHz. The resulting standard specifies the air interface of fixed (stationary) broadband wireless access systems providing multiple services. The medium access control layer is capable of supporting multiple physical layer specifications optimized for the frequency bands of application. The standard includes a set of particular physical layer specifications applicable to systems operating between 2 and 66 GHz. It supports point-to-multipoint and optional mesh topologies [14].

This standard is part of a family of standards for local and metropolitan area networks. The relationship between the standard and the other members of the family is shown in Fig. 3.1 (the numbers in the figure refer to IEEE standard designations). The family of standards deals with the Physical and Data Link Layers as defined by the International Organization for Standardization (ISO) Open Systems Interconnection Basic Reference Model. The access standards define several types of medium access technologies and the associated physical media, each appropriate for particular applications or system objectives. Other types are under investigation [14].

This thesis focuses on the acceleration of the Reed-Solomon decoder and on the DSP implementation issues of the Reed-Solomon and Viterbi decoders in the IEEE 802.16a Forward Error Correction (FEC) decoding scheme. Therefore, the next section concentrates on the FEC specifications defined in the physical layer part of IEEE 802.16a. In the last part of this chapter, we show the block diagrams of the conventionally implemented program and briefly describe our modifications, which improve the implementation structure by reducing the computational complexity. The details of our improvements are described in a later chapter.

Figure 3.1: IEEE local and metropolitan area networks standards family.

3.2 IEEE 802.16a FEC Specifications

The overall physical layer structure of the channel coding scheme is shown in Fig. 3.2, where the Reed-Solomon code and the convolutional code are the major parts of the FEC scheme, and the randomizer and the interleaver are additional modules that further improve the error performance of the FEC scheme. The detailed specifications of each part are introduced in the following subsections, excluding the modulator, which is not implemented in our research subproject.

(45) [Figure 3.2 blocks. Top row: Randomizer, Reed-Solomon Encoder, Convolutional Encoder, Interleaver, Modulator. Bottom row: De-randomizer, Reed-Solomon Decoder, Convolutional Decoder, De-interleaver, De-modulator.]

Figure 3.2: Channel coding structure at the transmitter side (top) and the receiver side (bottom).

3.2.1 Randomizer

Data randomization is performed on data transmitted on the downlink (DL) and uplink (UL). The randomization is performed on each allocation (DL or UL), which means that for each allocation of a data block (subchannels in the frequency domain and OFDM symbols in the time domain) the randomizer shall be used independently. If the amount of data to transmit does not exactly match the amount of data allocated, padding bytes of “0xFF” (all “1” bits) are appended to the transmission block until the allocation is filled.

Figure 3.3: PRBS for Data Randomization.

The randomizer is a Pseudo Random Binary Sequence (PRBS) generator depicted in Fig. 3.3. As shown in the figure, source bit randomization is performed by the

(46) modulo-2 adder and the Linear-Feedback Shift Register (LFSR) with characteristic polynomial $1 + X^{14} + X^{15}$. Each data byte to be transmitted enters the randomizer sequentially (msb first), so that the “0” and “1” bits are well distributed in the output data stream, which improves the coding performance. The randomizer sequence is applied only to information bits. Preambles are not randomized. The shift register of the randomizer shall be re-initialized after every 1250 bytes passed through (if the allocation is larger than 1250 bytes). In the downlink, the randomizer shall be re-initialized at the start of each frame with the sequence (msb) 1 0 0 1 0 1 0 1 0 0 0 0 0 0 0 (lsb). In the uplink, the randomizer is initialized with the vector created as shown in Fig. 3.4.

Figure 3.4: Creation of OFDMA Randomizer Initialization Vector.

3.2.2 Forward Error Correction Coding

Forward error correction is used to decrease the bit error rate (BER) on noisy communication channels. This is achieved by channel coding, which adds redundant information to the transmitted data. With forward error correction, transmission errors are corrected at the decoder without requesting a retransmission. Convolutional coding and block coding are the two major forms of channel coding. In our IEEE 802.16a OFDMA project, both a convolutional code and a block code (the Reed-Solomon code) are employed.
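As an illustration of the randomizer of Section 3.2.1, the following is a minimal Python sketch of a PRBS generator with characteristic polynomial $1 + X^{14} + X^{15}$. It is not the thesis implementation. The names `INIT_DL`, `prbs_bits` and `randomize` are hypothetical, and two conventions are assumptions because Fig. 3.3 is not reproduced here: the first bit of the downlink initialization sequence is loaded into stage 1 of the register, and the feedback bit (stage 14 XOR stage 15) is both shifted into stage 1 and added modulo 2 to the data bit. The re-initialization after every 1250 bytes is omitted.

```python
# Downlink initialization sequence from Section 3.2.1, listed (msb) ... (lsb).
# ASSUMPTION: the first listed bit goes into stage 1 of the shift register.
INIT_DL = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]


def prbs_bits(init, count):
    """Return `count` PRBS bits for the polynomial 1 + X^14 + X^15."""
    state = list(init)               # state[0] is stage 1, state[14] is stage 15
    out = []
    for _ in range(count):
        fb = state[13] ^ state[14]   # taps at stages 14 and 15
        out.append(fb)
        state = [fb] + state[:-1]    # shift by one stage, feedback enters stage 1
    return out


def randomize(data, init=INIT_DL):
    """XOR each data byte (msb first) with the PRBS sequence."""
    seq = prbs_bits(init, 8 * len(data))
    out = bytearray()
    for i, byte in enumerate(data):
        mask = 0
        for bit in seq[8 * i: 8 * i + 8]:
            mask = (mask << 1) | bit  # first PRBS bit lines up with the msb
        out.append(byte ^ mask)
    return bytes(out)
```

Because randomization is a modulo-2 addition of a fixed sequence, applying `randomize` twice with the same initialization returns the original data, so the same routine can serve as the de-randomizer at the receiver.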

(47) The Forward Error Correction scheme used in the IEEE 802.16a standard, shown in Fig. 3.5, consists of the concatenation of a Reed-Solomon outer code and a rate-compatible convolutional inner code, and is supported on both UL and DL. The input data stream is first divided into RS (Reed-Solomon) blocks whose size is determined by the parameter k defined in the RS code specification. Each block is encoded by the RS encoder, and each RS coded block is then encoded by the convolutional encoder. A convolutional code is a sequential code, whereas the RS code is a block code. Because the convolutional encoder operates on one RS coded block at a time, the concatenated code as a whole is a block-based coding scheme.

[Figure 3.5 blocks. Concatenated Encoder: Reed-Solomon Encoder, Convolutional Encoder. Concatenated Decoder: Reed-Solomon Decoder, Convolutional Decoder.]

Figure 3.5: Forward Error Correction structure in transmitter side (left) and receiver side (right).

In order to make the system more flexible and adaptable to the channel condition, six coding-modulation schemes are provided in the standard, as shown in Table 3.1 (note that 64-QAM is an optional mode). The different coding rates are obtained by shortening and puncturing the original RS code and by puncturing the original convolutional code. The shortening and puncturing mechanisms of the RS code provide different block sizes, and hence different error-correction capabilities, through the same RS codec (coder/decoder). Similarly, the convolutional code provides variable code rates through the same codec by applying the puncturing rule. The convolutional code rate can thus be matched to the variable block size of the shortened-and-punctured RS code to achieve the desired overall coding rate.
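The way the two code rates combine can be checked with a few lines of arithmetic. The sketch below is only an illustration: it takes the six entries of Table 3.1, multiplies the RS rate K/N by the convolutional code rate, and computes the coded block size. The names `MODES`, `coded_block_bytes` and `overall_rate` are hypothetical.

```python
from fractions import Fraction

# (modulation, uncoded bytes, RS (N, K, T), CC rate), copied from Table 3.1.
MODES = [
    ("QPSK",   18, (24, 18, 3), Fraction(2, 3)),
    ("QPSK",   26, (30, 26, 2), Fraction(5, 6)),
    ("16-QAM", 36, (48, 36, 6), Fraction(2, 3)),
    ("16-QAM", 54, (60, 54, 3), Fraction(5, 6)),
    ("64-QAM", 72, (81, 72, 4), Fraction(3, 4)),
    ("64-QAM", 82, (90, 82, 4), Fraction(5, 6)),
]


def coded_block_bytes(rs, cc_rate):
    """Bytes leaving the convolutional encoder for one RS block of N bytes."""
    n, _k, _t = rs
    return Fraction(n) / cc_rate


def overall_rate(rs, cc_rate):
    """Overall rate of the concatenated code: (K / N) * CC rate."""
    n, k, _t = rs
    return Fraction(k, n) * cc_rate


if __name__ == "__main__":
    for name, uncoded, rs, cc in MODES:
        print(name, uncoded, rs, cc,
              coded_block_bytes(rs, cc), overall_rate(rs, cc))
```

For the second QPSK mode, for example, the overall rate is (26/30)(5/6) = 13/18, which Table 3.1 lists as approximately 3/4, and the 30 RS coded bytes expand to 36 bytes after the rate-5/6 convolutional code.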

(48)
Modulation   Uncoded Block   Overall       Coded Block    RS Code      CC Code
             Size (bytes)    Coding Rate   Size (bytes)                Rate
QPSK         18              1/2           36             (24,18,3)    2/3
QPSK         26              ~3/4          36             (30,26,2)    5/6
16-QAM       36              1/2           72             (48,36,6)    2/3
16-QAM       54              3/4           72             (60,54,3)    5/6
64-QAM       72              2/3           108            (81,72,4)    3/4
64-QAM       82              ~3/4          108            (90,82,4)    5/6

Table 3.1: Mandatory Channel Coding per Modulation.

3.2.2.1 Reed-Solomon Code Specification

The Reed-Solomon encoding is derived from a systematic RS (N = 255, K = 239, T = 8) code using $GF(2^8)$, where N is the number of overall bytes after encoding, K is the number of data bytes before encoding, and T is the number of erroneous bytes that can be corrected. The Galois field used in this code is generated by the field generator polynomial

$p(x) = x^8 + x^4 + x^3 + x^2 + 1$,

and the codeword is generated by the code generator polynomial

$g(x) = (x + \lambda^0)(x + \lambda^1)(x + \lambda^2) \cdots (x + \lambda^{2T-1})$.

This code is shortened and punctured to enable variable block sizes and variable error-correction capability. When a block is shortened to K’ data bytes, the first 239 – K’ bytes of the encoder block are filled with “0”s. When a codeword is punctured to permit T’ bytes to be corrected, only the first 2T’ of the total 16 parity bytes are employed.

3.2.2.2 Convolutional Code Specification

After the RS encoding process, each RS block is encoded by the binary convolutional encoder, which has a native rate of 1/2 and a constraint length K = 7, and uses the following generator polynomials to derive its two code bit outputs:

(49) $G_1 = 171_{OCT}$ for X
$G_2 = 133_{OCT}$ for Y

The generator is depicted in Fig. 3.6.

Figure 3.6: Convolutional Encoder of Rate 1/2.

The puncturing patterns and serialization order used to generate the variable code rates are defined in Table 3.2. In the table, a “1” denotes a transmitted bit and a “0” denotes a removed bit, whereas X and Y correspond to the outputs in Fig. 3.6.

         Code Rates
Rate     2/3       3/4         5/6
dfree    6         5           4
X        10        101         10101
Y        11        110         11010
XY       X1Y1Y2    X1Y1Y2X3    X1Y1Y2X3Y4X5

Table 3.2: The Inner Convolutional Code with Puncturing Configuration.

Furthermore, a tail-biting mechanism is adopted in our convolutional code: the encoder’s memory is initialized with the last data bits of the RS block being encoded.
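The convolutional encoder, the puncturing of Table 3.2 and the tail-biting initialization can be sketched together as follows. This is a minimal Python illustration, not the thesis implementation. The names `PUNCTURE`, `parity` and `cc_encode` are hypothetical, the unpunctured "1/2" entry is added here for convenience and is not part of Table 3.2, and the bit ordering is an assumption: the most significant bit of each octal generator is taken to act on the current input bit, and the most recent past bit sits at the top of the six-bit memory.

```python
G1 = 0o171  # generator for output X
G2 = 0o133  # generator for output Y

# Puncturing patterns (X pattern, Y pattern); 1 = transmitted, 0 = removed.
# The "1/2" entry (no puncturing) is an addition for illustration only.
PUNCTURE = {
    "1/2": ([1], [1]),
    "2/3": ([1, 0], [1, 1]),
    "3/4": ([1, 0, 1], [1, 1, 0]),
    "5/6": ([1, 0, 1, 0, 1], [1, 1, 0, 1, 0]),
}


def parity(value):
    """Modulo-2 sum of the bits of `value`."""
    return bin(value).count("1") & 1


def cc_encode(bits, rate="1/2"):
    """Tail-biting rate-1/2, K = 7 encoder followed by puncturing.

    `bits` is a list of 0/1 values with at least six entries.
    """
    pat_x, pat_y = PUNCTURE[rate]
    # Tail-biting: preload the memory with the last six bits of the block,
    # so that the very last bit ends up as the most recent one.
    state = 0
    for b in bits[-6:]:
        state = (state >> 1) | (b << 5)
    out = []
    for i, b in enumerate(bits):
        reg = (b << 6) | state          # current bit plus six memory bits
        j = i % len(pat_x)
        if pat_x[j]:
            out.append(parity(reg & G1))
        if pat_y[j]:
            out.append(parity(reg & G2))
        state = reg >> 1                # current bit moves into the memory
    return out
```

Emitting X before Y at every position and dropping the punctured bits reproduces the serialization orders X1Y1Y2, X1Y1Y2X3 and X1Y1Y2X3Y4X5 of Table 3.2. With the block sizes of Table 3.1, a 24-byte RS block at rate 2/3 and a 30-byte RS block at rate 5/6 both yield 288 coded bits.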

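As an illustration of the Reed-Solomon specification in Section 3.2.2.1, the following Python sketch builds $GF(2^8)$ from $p(x)$, forms $g(x)$ for T = 8, and encodes a shortened and punctured block. It is a sketch under stated assumptions, not the thesis implementation. The names `EXP`, `LOG`, `gf_mul`, `generator_poly`, `poly_eval` and `rs_encode` are hypothetical. Two points are assumptions: $\lambda$ is taken to be the field element 0x02, and the parity bytes are appended after the data bytes with the highest-degree remainder coefficient first, so that the "first 2T’" bytes are the leading ones of the remainder.

```python
PRIM = 0x11D  # p(x) = x^8 + x^4 + x^3 + x^2 + 1

# Exponential and logarithm tables; ASSUMPTION: lambda is the element 0x02.
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two elements of GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def generator_poly(t=8):
    """Coefficients of g(x) = (x + l^0)...(x + l^(2t-1)), highest degree first."""
    g = [1]
    for i in range(2 * t):
        nxt = g + [0]                      # g(x) * x
        for j, c in enumerate(g):
            nxt[j + 1] ^= gf_mul(c, EXP[i])  # plus g(x) * lambda^i
        g = nxt
    return g


def poly_eval(coeffs, x):
    """Evaluate a polynomial (highest degree first) at x by Horner's rule."""
    acc = 0
    for c in coeffs:
        acc = gf_mul(acc, x) ^ c
    return acc


def rs_encode(data, t_prime=8):
    """Systematic RS(255,239,8) encoding, shortened to len(data) data bytes
    and punctured to 2 * t_prime parity bytes."""
    if not (1 <= t_prime <= 8 and 1 <= len(data) <= 239):
        raise ValueError("need 1 <= T' <= 8 and 1 <= K' <= 239")
    g = generator_poly(8)
    msg = [0] * (239 - len(data)) + list(data)  # shortening: zero prefix
    rem = [0] * 16
    for byte in msg:                            # remainder of msg(x) * x^16 / g(x)
        fb = byte ^ rem[0]
        rem = rem[1:] + [0]
        if fb:
            for j in range(16):
                rem[j] ^= gf_mul(fb, g[j + 1])
    return bytes(data) + bytes(rem[:2 * t_prime])  # puncturing
```

For the first QPSK mode of Table 3.1, `rs_encode(block, 3)` with an 18-byte `block` returns 24 bytes. Without puncturing, the zero-prefixed codeword evaluates to zero at $\lambda^0, \ldots, \lambda^{15}$; these evaluations are the syndromes that a decoder computes first.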
(50) 3.2.3 Interleaver

All encoded data bits are interleaved by a block interleaver whose block size corresponds to the number of coded bits per specified allocation, Ncbps (see Table 3.3). Interleaving protects the convolutional code from the severe impact of burst errors and therefore increases the coding performance. The interleaver is defined by a two-step permutation. The first permutation ensures that adjacent coded bits are mapped onto nonadjacent carriers. The second permutation ensures that adjacent coded bits are mapped alternately onto less and more significant bits of the constellation, thus avoiding long runs of low-reliability bits.

Modulation   Coded Bits per Bit           Modulo Used (d)
             Interleaved Block (Ncbps)
QPSK         288                          16
16-QAM       576                          18
64-QAM       864                          16

Table 3.3: Bit Interleaved Block Sizes and Modulo.

Let Ncpc be the number of coded bits per carrier, i.e. 2, 4 or 6 for QPSK, 16-QAM or 64-QAM, respectively, and let s = Ncpc/2. Let k be the index of the coded bit before the first permutation at transmission, m the index after the first and before the second permutation, j the index after the second permutation, just prior to modulation mapping, and d the modulo used for the permutation.

The first permutation is defined by the rule:

$m = (N_{cbps}/d) \cdot (k \bmod d) + \lfloor k/d \rfloor, \quad k = 0, 1, \ldots, N_{cbps} - 1$

The second permutation is defined by the rule:

$j = s \cdot \lfloor m/s \rfloor + \left(m + N_{cbps} - \lfloor d \cdot m / N_{cbps} \rfloor\right) \bmod s, \quad m = 0, 1, \ldots, N_{cbps} - 1$
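The two permutation rules can be sketched directly in Python. This is only an illustration of the formulas above with the parameters of Table 3.3, not the thesis implementation. The names `INTERLEAVER`, `interleave_map`, `interleave` and `deinterleave` are hypothetical, and the second rule is read with the quotient $\lfloor d \cdot m / N_{cbps} \rfloor$.

```python
# (Ncbps, d, Ncpc) per modulation, from Table 3.3 and the definition of Ncpc.
INTERLEAVER = {
    "QPSK":   (288, 16, 2),
    "16-QAM": (576, 18, 4),
    "64-QAM": (864, 16, 6),
}


def interleave_map(ncbps, d, ncpc):
    """Return the list j[k]: output position of the coded bit with index k."""
    s = ncpc // 2
    mapping = []
    for k in range(ncbps):
        m = (ncbps // d) * (k % d) + k // d                    # first permutation
        j = s * (m // s) + (m + ncbps - (d * m) // ncbps) % s  # second permutation
        mapping.append(j)
    return mapping


def interleave(bits, mapping):
    """Move the bit at index k to position mapping[k]."""
    out = [0] * len(bits)
    for k, j in enumerate(mapping):
        out[j] = bits[k]
    return out


def deinterleave(bits, mapping):
    """Inverse of interleave(): fetch the bit for index k from mapping[k]."""
    return [bits[j] for j in mapping]
```

For QPSK, s = 1 and the second permutation leaves every index unchanged, so the coded bits 0, 1, 2 are sent to positions 0, 18, 36, that is, Ncbps/d = 18 positions apart. The de-interleaver at the receiver can reuse the same mapping table, read in the opposite direction.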