Soft Variable Length Decoding for the Wireless Video Transmission

Full Text

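National Chiao Tung University, Department of Electronics Engineering and Institute of Electronics, Master's Program. Master's Thesis. Soft Variable Length Decoding for the Wireless Video Transmission. Student: Tsu-Ming Liu. Advisor: Dr. Chen-Yi Lee. June 2004.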

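Soft Variable Length Decoding for the Wireless Video Transmission. Student: Tsu-Ming Liu. Advisor: Dr. Chen-Yi Lee. Master's Program, Department of Electronics Engineering and Institute of Electronics, National Chiao Tung University.

Abstract (Chinese version)

Variable length coding has been widely used in recent video and image coding. However, conventional decoding may lose synchronization because of transmission errors and may even propagate errors. To improve error tolerance, more and more researchers have turned to joint source and channel coding design. A new type of variable length decoder has gradually emerged that can resist transmission errors in band-limited and broadcast systems. Such decoding usually needs to keep many states, especially when the code table is large. Implementing it therefore incurs high complexity and requires a large memory. To reduce the table size and the number of memory reads, we propose a low-complexity, low-memory approach. Moreover, we propose a "symbol-alias" measurement to improve the prediction of decoding performance. With the proposed Black-Box model, we can reach the best balance between performance and complexity. Finally, using the proposed performance model, we have realized a high-performance, low-complexity variable length decoder. Under an allowable performance loss, it reduces not only memory usage but also the table size. The whole system is simulated on an MPEG-4/UDP-Lite/UEP/AWGN platform. Compared with either conventional decoding or RVLC decoding with error-correction capability, we improve picture quality by 0.4~2.9 dB on average and provide better subjective quality.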

Soft Variable Length Decoding for the Wireless Video Transmission. Student: Tsu-Ming Liu. Advisor: Dr. Chen-Yi Lee. Department of Electronics Engineering & Institute of Electronics, National Chiao Tung University.

Abstract

Variable Length Codes (VLCs) are extensively used in recent video and image coding standards. However, traditional hard decoding by table look-up may lose synchronization and induce error propagation over a noisy channel. To improve the error resilience of VLCs, more and more researchers have turned their attention to joint source and channel design. The soft VLC decoding method has emerged to resist channel disturbances in band-limited and broadcast systems. Such a design generally needs to maintain many states as the table size grows. Hence, soft VLC decoders suffer from high complexity and a high number of memory accesses. To reduce the table size and the number of memory accesses, we propose a soft VLC decoder with a low-memory-access, low-complexity approach. Further, a novel "symbol-alias" measurement is presented to provide more accurate performance estimation. With the proposed Black-Box model, we can achieve the optimal trade-off between performance and complexity. Finally, a memory-efficient and low-complexity soft VLC decoder using performance modeling is proposed. It exploits not only a modified sorting scheme to reduce the memory accesses, but also table redundancy to reduce the table size at the cost of a minor performance loss. The system is evaluated on an MPEG-4/UDP-Lite/UEP/AWGN model. On average, we improve the PSNR by 0.4~2.9 dB (i.e. a 40~80% improvement) and offer better subjective quality compared with traditional VLC decoding and standard-supported RVLC decoding.

Acknowledgements

I would like to express my deepest gratitude to my advisor, Dr. Chen-Yi Lee, for his enthusiastic guidance and encouragement throughout the research, and I sincerely give him and his family my best wishes. I especially appreciate my senior Mr. Wen-Hsiao Peng and my junior Mr. Sheng-Zen Wang for their fruitful discussions and comments during my research. I would also like to thank my senior Dr. Bai-Jue Shieh and my SI2 group mate Mr. Cheng-Hung Liu for their great help during my research. In addition, I want to thank all members of the SI2 group of NCTU for their plentiful and worthwhile assistance during my graduate life.

Finally, I give the greatest respect and love to my family and my girlfriend, Yu-Fen Chuang. I greatly admire her thoughtfulness, and I want to express my highest appreciation and dedicate this thesis to her for helping me through the most important stage of my life. I will never let her down, and I hope that she and my family are happy now and forever.

Contents

Chapter 1. Introduction
  1.1. Motivation
  1.2. Joint Source and Channel Design
  1.3. Contributions of this Thesis
  1.4. Thesis Organization
Chapter 2. Soft Decoding of Variable Length Codes
  2.1. Background
  2.2. Soft Input Soft Output Algorithm
    2.2.1. Algorithm Translation
    2.2.2. Algorithm Modification
Chapter 3. Memory Efficient Design Approach
  3.1. Adaptive Selection Algorithm
    3.1.1. Modified Sorting Scheme
    3.1.2. Performance Comparison
  3.2. Complexity Analysis
  3.3. Summary
Chapter 4. Low Complexity Design Approach
  4.1. Symbol Merging Algorithm
    4.1.1. Metric Formulation of "Balance Degree"
  4.2. Table Merging Algorithm
    4.2.1. Code-Word Merging
    4.2.2. Prefix Merging
    4.2.3. Merged Table
  4.3. Performance Evaluation
  4.4. Summary
Chapter 5. Performance Modeling
  5.1. Black-Box Model
    5.1.1. Algorithm-Sensitive Parameters
    5.1.2. Application-Sensitive Parameters
    5.1.3. Table-Sensitive Parameters
      5.1.3.1. Intra Alias
      5.1.3.2. Inter Alias
  5.3. Performance Estimation
  5.4. Summary
Chapter 6. Performance Evaluation on MPEG-4
  6.1. Environment Setup
    6.1.1. Source Model
    6.1.2. Channel Model
  6.2. Performance Evaluation on MPEG-4/UDP-Lite/UEP/AWGN
Chapter 7. Conclusions and Future Work
Bibliography
Appendix A. Symbol-Merging Algorithm
Appendix B. Table-Merging Algorithm
About the Author

List of Figures

Figure 1.1. The on-going tree of error handling
Figure 1.2. The categories of implementation and representation in JSC design
Figure 1.3. Symbol-constrained directed graph representation for VLC decoding
Figure 1.4. Bit-constrained directed graph representation for VLC decoding
Figure 2.1. High-level description of the decoding procedure with algorithm translation
Figure 2.2. The algorithm translation between the symbol-constrained directed graph and the SISO algorithm
Figure 2.3. The original (a) and real case (b) of the VLC table
Figure 2.4. The algorithm modification due to the constraint change
Figure 2.5. High-level description of the decoding procedure with algorithm modification
Figure 3.1. The graph representation in approximated decoding
Figure 3.2. The comparison between the AMAP-2 (a) and the proposed Adaptive AMAP-2 (b)
Figure 3.3. The comparison of performance (a) and memory access (b) vs. SNR
Figure 3.4. Complexity analysis in terms of each symbol's state numbers
Figure 3.5. The complexity comparison in terms of state numbers (a) and total state numbers (b) using the VLC table of Figure 2.3
Figure 4.1. The tree-structured VLC (a) and scalable algorithm with hard and soft decoding (b)
Figure 4.2. A simple VLC table with merge-0 (a), merge-1 (b) and merge-2 (c)
Figure 4.3. The evaluation of execution time (a) and performance (b) with the different symbol-merging tables in Figure 4.2
Figure 4.4. The formulation of "Improved Ratio"
Figure 5.1. The B-B model (a) and the evaluation of the source table (b)
Figure 5.2. The relationship between performance and each parameter
Figure 5.3. The complexity (a) and performance (b) for different 'z'
Figure 5.4. The performance with convergence and saturation point in AMAP-2 (a) and A-AMAP-2 (b)
Figure 5.5. The optimization of performance for different 'N*'
Figure 5.6. The symbol-alias of the VLC table (a)(b)
Figure 5.7. The performance evaluation of 'intra alias' (a) and 'inter alias' (b) for different 'T'
Figure 5.8. VLC tables (a) and measurements (b) for the same source
Figure 5.9. Coding performance with different VLC coding tables
Figure 5.10. The simulated parameters (a) and PSNR comparison (b) within 50 frames
Figure 5.11. The comparison on the 1st frames of the video sequence
Figure 6.1. The proposed overall simulation environment of the soft VLC decoder
Figure 6.2. The data partition mode in MPEG-2 (a) and MPEG-4 (b)
Figure 6.3. The high-level description of the ESCAPE code handler in MPEG-2 and MPEG-4
Figure 6.4. The content (a) and ratio (b) of one video packet in MPEG-4
Figure 6.5. The soft input of the VLC decoder
Figure 6.6. The performance improvement (a)(b) with different quantization levels
Figure 6.7. Overlooking bit errors in the application layer
Figure 6.8. Average PSNR of the Y-component for the proposed soft and TLU VLD
Figure 6.9. PSNR vs. AWGN channel performance
Figure 6.10. PSNR vs. burst error performance
Figure 6.11. The comparison between the proposed soft VLD and the RVLD
Figure 6.12. The comparison between the proposed soft VLC decoder and the standardized VLC decoder for the table-merging algorithm

List of Tables

Table 4.1. The reduction of table size by the symbol-merging scheme
Table 4.2. The comparison with existing designs
Table 6.1. The trade-off between error correction and channel bandwidth
Table 6.2. The PSNR improvement within different video characteristics

Chapter 1 Introduction

1.1. Motivation

Variable Length Codes (VLCs), such as Huffman codes [1], are commonly used to approach the entropy rate of a given data source. They are extensively used in recent image and video coding standards, including JPEG, MPEG-1/2/4 and the newly designed H.264 [2]. However, most VLC designs are highly sensitive to errors. Table look-up decoding is extremely vulnerable and may lose synchronization over a noisy channel. Although conventional methods such as automatic repeat request (ARQ) and forward error correction (FEC) reduce the effect of channel errors, they have been found to be expensive for band-limited communication of delay-sensitive video signals [3]. In particular, ARQ-based designs are inadequate for broadcast transmission because they need a backward channel, and they may induce significant delay that can result in network congestion. FEC designs may be bandwidth-inefficient when channel conditions are fairly mild, and they are tuned to a particular error rate, so they fit poorly when the channel condition differs. It is therefore of strong interest to look for an alternative design that reduces the error sensitivity of variable-length-encoded video sources. In recent years, more and more researchers have paid attention to designing the source and channel coders jointly. To improve the error resilience of VLCs, joint source and channel (JSC) design has emerged to resist channel disturbances in

band-limited systems and broadcast transmission. Several JSC designers have concentrated on variable-length-encoded data, since most video applications use VLC-based compression. However, the main problems of JSC design are the complicated computation and the heavy memory usage of sequence estimation during decoding. Reduced-complexity or sub-optimal JSC designs [24][25][26] have been proposed to lower the decoding complexity for VLC-based source transmission. These designs are still inadequate for large source tables and for multiple separate source tables. In this thesis, we focus on the implementation of JSC design. Low-complexity and memory-efficient design approaches are proposed to contain error propagation and to outperform traditional VLC decoding designs.

1.2. Joint Source and Channel Design

In the past, source and channel coders have been designed separately. This approach often makes excellent sense and is justified by Shannon's separation theorem [4]. However, Shannon's theorem effectively assumes that the source coder removes all data redundancy and that the channel coder inserts additional redundancy to protect the source data against impairments of the physical channel. In practice, this separation makes less sense. It has been shown that the separation theorem does not hold for all channel conditions [5]. Even when it does hold, it requires an optimal source and channel coder pair that may not be suitable for a practical system.

The solutions for improving the error robustness of VLCs can be classified into three types (cf. Figure 1.1): error resilience, error concealment and error recovery. Error resilience methods are performed on the encoder side, and the respective decoding procedures are defined by the video standard. To make the compressed video data more robust to channel errors, the MPEG-4 standard incorporates several error resilience tools, including data partitioning (DP), the header extension code (HEC) and the re-synchronization marker (RM) [6]. The decoder, on the other hand, provides error concealment and error recovery to improve the video quality. Error concealment methods are designed to hide errors, but they have their limitations [7]. They often assume that video errors have been correctly located; otherwise error concealment cannot be properly applied.
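Why correct error localization is hard with hard table look-up decoding can be seen in a toy example. The sketch below is illustrative only: it uses the three-codeword set {0, 10, 11} that also appears in the graph examples of this chapter, and the symbol names A, B, C as well as the helper names `encode`, `hard_decode` and `flip` are assumptions made for this illustration, not part of the thesis. A single flipped bit changes the number of decoded symbols, so the codeword boundaries after the error no longer line up with the transmitted ones.

```python
# Toy prefix code (symbol names are hypothetical; codewords are {0, 10, 11}).
CODEBOOK = {"A": "0", "B": "10", "C": "11"}


def encode(symbols):
    """Concatenate the codewords of a symbol sequence."""
    return "".join(CODEBOOK[s] for s in symbols)


def hard_decode(bits):
    """Bit-by-bit table look-up decoding of a hard bit string."""
    inverse = {cw: s for s, cw in CODEBOOK.items()}
    decoded, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:      # a complete codeword has been read
            decoded.append(inverse[buffer])
            buffer = ""
    return decoded


def flip(bits, pos):
    """Return a copy of the bit string with the bit at index pos inverted."""
    inverted = "1" if bits[pos] == "0" else "0"
    return bits[:pos] + inverted + bits[pos + 1:]


sent = encode(["B", "A", "C", "A"])      # four symbols, six bits
print(hard_decode(sent))                 # error-free decoding
print(hard_decode(flip(sent, 0)))        # one bit error: symbol count changes
```

In this example the corrupted stream still decodes without any syntax violation, which is why a decoder that relies only on the hard bits cannot tell where the error occurred.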

Error recovery can be partitioned into three levels: source level, channel level and joint source-channel level. For source-level error recovery, reversible variable length codes (RVLCs) [8] are adopted in MPEG-4 and the newly designed H.264. The source-level error recovery methods that have been suggested include RVLC, error resilient entropy coding (EREC) [9] and self-synchronizing VLC (SSVLC) [10]. These methods use the syntax and codeword structure to reconstruct the source data and do not consider any channel behavior, so the improvement from source-level error recovery is still insufficient. By contrast, channel-level error recovery, such as the well-known Viterbi decoder or turbo decoder, offers significant improvement, but it is very expensive for band-limited systems. JSC design, realized here as a soft VLC decoder, is proposed as a trade-off between the source and channel levels. The idea of JSC design has been gaining increasing attention in recent years because of the significant growth of multimedia wireless communication over noisy and band-limited channels. In addition, broadcasting on the DVB system [11] faces channel conditions without backward notification.

[Figure 1.1: The on-going tree of error handling. Encoder side: error resilience. Decoder side: error concealment and error recovery. Error recovery divides into source level (RVLC, SSVLC), joint source and channel level (proposed soft VLC decoder) and channel level (Viterbi, turbo code).]

Based on the derivation or formulation of the intermediate metric, JSC designs are classified into three categories in [16] (e.g. [18] [21] [26]). We omit the

complicated derivation of the algorithmic metrics. Instead, the behaviors of these three categories are discussed here and compared with each other. Performance and complexity are the crucial criteria for our final choice of implementation method.

Maximum Likelihood / Soft-Input Soft-Output Decoding Method

The first category is the Maximum Likelihood (ML) decoding method, which has been investigated in the area of joint source and channel design. The Viterbi decoder, which uses the ML decoding algorithm, has been well known for decades and is regarded as a decoding process for fixed length codes. Most applications use variable length codes to compress the source data, which leads to a loss of error resilience. A modified version of the Viterbi algorithm [17] can be used to perform maximum likelihood decoding of VLCs and improve the error robustness [18]. The main problem in applying the Viterbi algorithm directly is that a state transition consumes a variable number of bits. It is therefore necessary to keep track of the position of each transition, which leads to a great number of surviving states. In [19], the authors introduced the Soft-Input Soft-Output (SISO) approach to improve the coding performance when the source data has been corrupted by additive white Gaussian noise (AWGN). The SISO VLC decoder requires no modification on the encoder side. It simply receives a packet of known length containing corrupted VLC data, and estimates the codeword sequence that was most likely the input of the VLC encoder. It behaves as an ML decoding process for VLCs, using the Hamming distance for hard input and the cumulative squared error for soft input as the intermediate metric. In addition, the SISO decoding algorithm is similar to the soft output Viterbi algorithm (SOVA) [20] in that it provides soft output information as a confidence level or reliability for the back-end decoding process.
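The two metrics named above can be made concrete with a brute-force search. The following Python sketch is a minimal illustration under stated assumptions, not the algorithm of [19]: it assumes the toy codeword set {0, 10, 11}, a BPSK mapping of bit 0 to +1.0 and bit 1 to -1.0, and a packet whose symbol count and bit length are both known. All function names are hypothetical. It enumerates every codeword sequence that satisfies the length constraints and picks the one with the smallest cumulative squared error.

```python
from itertools import product

CODEWORDS = ["0", "10", "11"]   # toy VLC table (assumed for illustration)


def candidates(n_symbols, n_bits):
    """Yield every sequence of n_symbols codewords that spans exactly n_bits."""
    for combo in product(CODEWORDS, repeat=n_symbols):
        if sum(len(cw) for cw in combo) == n_bits:
            yield combo


def hard_decision(soft):
    """Threshold soft values at zero (assumed mapping: +1 -> '0', -1 -> '1')."""
    return "".join("0" if y >= 0 else "1" for y in soft)


def hamming(bits, hard_bits):
    """Hard-input metric: number of differing bit positions."""
    return sum(a != b for a, b in zip(bits, hard_bits))


def squared_error(bits, soft):
    """Soft-input metric: cumulative squared error against the BPSK levels."""
    return sum((y - (1.0 if b == "0" else -1.0)) ** 2 for b, y in zip(bits, soft))


def ml_decode_soft(soft, n_symbols):
    """Exhaustive ML decoding with known symbol count and packet length."""
    return min(candidates(n_symbols, len(soft)),
               key=lambda combo: squared_error("".join(combo), soft))


# Transmitted sequence 10 | 0 | 11; the third bit arrives weak and inverted.
received = [-0.9, 0.8, -0.1, -1.1, -0.7]
print(hard_decision(received))
print(ml_decode_soft(received, 3))
```

For this received vector, two valid candidates lie at Hamming distance 1 from the hard decision, so the hard metric alone cannot separate them, while the squared-error metric prefers the transmitted sequence. The enumeration grows exponentially with the number of symbols, which is the complexity problem the graph representations below address.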
Maximum A Posteriori Decoding Method

Maximum A Posteriori (MAP) sequence estimation, termed MAP decoding, has also been investigated for VLCs. The metric derivation discussed in the previous subsection is classified as ML decoding.

By contrast, the new metric derivation is classified as MAP decoding. The Viterbi algorithm was re-derived with a priori or a posteriori information for MAP decoding [21]. More detailed formulations of the intermediate metric have been published in the literature. The main difference between the ML and MAP decoding algorithms is the derivation of the intermediate metric. In practice, MAP decoding outperforms ML decoding in terms of decoding performance, but it requires a more complicated metric computation to obtain the more accurate sequence estimate. Many researchers have focused on complexity reduction at the algorithmic level [23]-[25]. However, this is still insufficient for long input sequences and large symbol tables. Comparing the ML/SISO and MAP decoding processes, SISO decoding with the ML algorithm closely approximates the MAP decoding algorithm while providing reliability output at lower complexity [16] [22].

Sequential Decoding Method

Sequential decoding predates Viterbi decoding. It was discovered by Wozencraft in 1960. The decoding process traverses a tree to find the possible paths that could have been taken given the input data. Transition paths are followed or eliminated through likelihood comparison, thresholds or other criteria. Although the average decoding complexity is reasonable, there is a high chance of repeated computation, and the complexity varies widely depending on where errors occur. For a practical communication system, it is difficult to provision the complexity for every channel behavior. Besides, the performance of sequential decoding relies strongly on the instantaneous error events. To reduce the coding complexity, a fast sequential decoding algorithm using a stack has been proposed [26], but the improvement is still limited compared with the ML or MAP decoding methods.
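The data-dependent complexity of sequential decoding can be illustrated with a small best-first search. The sketch below is not the algorithm of [26]; it is a generic stack-style (priority queue) search under the same assumptions as before: the toy codeword set {0, 10, 11}, a BPSK mapping of bit 0 to +1.0 and bit 1 to -1.0, and known symbol count and packet length. The function name `stack_decode` and the expansion counter are hypothetical. Because the squared-error branch metric is non-negative, the first complete path taken from the queue is also the minimum-metric one.

```python
import heapq


def stack_decode(soft, codewords, n_symbols):
    """Best-first search over partial codeword paths.

    Returns (decoded codewords, number of expanded tree nodes), or
    (None, count) if no path satisfies both length constraints.
    """
    n_bits = len(soft)
    # Heap entries: (accumulated metric, bits consumed, symbols decoded, path)
    heap = [(0.0, 0, 0, ())]
    expanded = 0
    while heap:
        metric, pos, count, path = heapq.heappop(heap)
        if count == n_symbols:
            if pos == n_bits:
                return list(path), expanded   # best complete path
            continue                          # wrong length: dead end
        expanded += 1
        for cw in codewords:
            end = pos + len(cw)
            if end > n_bits:
                continue
            branch = sum((soft[pos + t] - (1.0 if bit == "0" else -1.0)) ** 2
                         for t, bit in enumerate(cw))
            heapq.heappush(heap, (metric + branch, end, count + 1, path + (cw,)))
    return None, expanded


clean = [-1.0, 1.0, 1.0, -1.0, -1.0]     # noiseless 10 | 0 | 11
noisy = [-0.9, 0.8, -0.1, -1.1, -0.7]    # third bit weak and inverted
print(stack_decode(clean, ["0", "10", "11"], 3))
print(stack_decode(noisy, ["0", "10", "11"], 3))
```

With the noiseless input the search walks straight down the correct path; with noise it may back up and expand additional nodes, so the work per packet depends on the error pattern rather than being fixed in advance.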
Considering the large number of codewords in MPEG-4, the coding complexity of JSC design becomes a critical bottleneck. The performance and complexity of sequential decoding depend on the channel condition, which makes it unsuitable for practical VLSI implementation. Further, the MAP decoding algorithm provides slightly more error-correction capability than ML/SISO [22], but it pays for this with the high complexity of the

computation of the intermediate metric. Consequently, we use the ML/SISO decoding algorithm for our implementation of the VLC decoder.

[Figure 1.2: The categories of implementation and representation in JSC design. Implementation methods: ML/SISO decoding, MAP decoding, sequential decoding. Representation methods: symbol-constrained directed graph, bit-constrained directed graph, tree-stack structure.]

Figure 1.2 shows the relation between the implementation and representation methods in JSC design. The implementation methods have been briefly discussed above; the representation methods are based on tree or trellis structures. A trellis can represent paths with fixed-length labels, as in the Viterbi decoder. The Viterbi decoder keeps only one of the paths entering a state as the survivor path and prunes the others. In the case of VLCs, however, different paths entering a state have consumed different numbers of bits from the received sequence and can be extended differently. VLC decoding therefore cannot use a traditional trellis representation and needs more complicated graph representations. The first works in this area of graph decoding were proposed by Demir & Sayood [13] and Park & Miller [23]. The new graph representations are summarized in [28]. They are the symbol-constrained and the bit-constrained directed graph. In this thesis, we focus on soft VLC decoding by performing the ML/SISO algorithm on symbol-based VLC trellis decoding [13]-[15].

Symbol-Constrained Directed Graph

The Symbol-Constrained Directed Graph representation, called SCDG here, is introduced in [13] [28]. The SCDG representation retains several survivors when paths with different numbers of symbols arrive at the considered state for a

given bit position. An example of the SCDG decoding representation is shown in Figure 1.3 for a VLC of dimension T = 3 with the codeword set {0, 10, 11}. There are three axes to note in Figure 1.3: the symbol-step i, the codeword-step j and the bit-step k. Each symbol-step represents the number of decoded symbols. In Figure 1.3, the total number of decoded symbols N is equal to 3. This information can be retrieved from the syntax or from the coding behavior of the JSC decoding process. Each codeword-step stands for a different code symbol in the pre-defined VLC table, and each codeword-step contains different bit-steps depending on the symbol-step i. Each square is a bit-state, and the number of decoded bits is shown in its center. Each dotted square or rectangle groups the states with the same codeword j for a given symbol-step i. Each transition path from one square to another carries a transition probability. Pruning is performed when two arrows point to the same bit-state. To obtain the final solution, the decoder stops constructing the graph at symbol-step 3 because N is known. We then choose the three bit-states marked with dotted circles as our candidates because they satisfy the known constraint (i.e. 3-symbol, 6-bit). After comparing the intermediate metrics, we choose the state with the smallest metric as our decision state and trace back to decode the remaining symbols. The decoding process is described in more detail in Chapter 2.

[Figure 1.3: Symbol-constrained directed graph representation for VLC decoding. Axes: symbol-step i (trellis symbol step, decoded symbol-number), codeword-step j over {0}, {10}, {11}, and bit-step k (decoded bit-number). Figure annotation: N = 3-symbol, L = 4-bit. Legend: S = symbol-state, B = bit-state.]
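The construction, pruning and trace-back steps described above can be sketched as a small dynamic program. This is a minimal illustration under stated assumptions rather than the thesis implementation: states are keyed by (codeword index j, decoded bit-number) at each symbol-step, the branch metric is the squared error against an assumed BPSK mapping (bit 0 to +1.0, bit 1 to -1.0), and the function name `scdg_decode` is hypothetical.

```python
def scdg_decode(soft, codewords, n_symbols):
    """Symbol-step trellis decoding with known symbol count and bit length."""
    n_bits = len(soft)

    def branch_metric(cw, start):
        # Squared error between the codeword's BPSK levels and the soft input.
        return sum((soft[start + t] - (1.0 if bit == "0" else -1.0)) ** 2
                   for t, bit in enumerate(cw))

    steps = []                               # one survivor table per symbol-step
    previous = {(None, 0): (0.0, None)}      # start state: no symbol, 0 bits
    for _ in range(n_symbols):
        current = {}
        for (prev_j, pos), (metric, _) in previous.items():
            for j, cw in enumerate(codewords):
                end = pos + len(cw)
                if end > n_bits:
                    continue
                candidate = metric + branch_metric(cw, pos)
                key = (j, end)
                # Pruning: two arrows into the same bit-state, keep the smaller.
                if key not in current or candidate < current[key][0]:
                    current[key] = (candidate, (prev_j, pos))
        steps.append(current)
        previous = current

    # Candidates: final states that satisfy the known bit-length constraint.
    finals = [key for key in previous if key[1] == n_bits]
    if not finals:
        return None
    key = min(finals, key=lambda k: previous[k][0])

    decoded = []
    for table in reversed(steps):            # trace back through the survivors
        decoded.append(codewords[key[0]])
        key = table[key][1]
    return decoded[::-1]


received = [-0.9, 0.8, -0.1, -1.1, -0.7]
print(scdg_decode(received, ["0", "10", "11"], 3))
```

Each symbol-step keeps at most one survivor per (codeword, bit-number) pair, so the number of stored states grows with the table size and the packet length instead of with the number of candidate sequences.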

Bit-Constrained Directed Graph

In addition to the SCDG representation, the Bit-Constrained Directed Graph, called BCDG here, is introduced in [23] [28]. The BCDG representation retains several survivors when paths with different numbers of bits arrive at the considered state for a given symbol position. An example of the BCDG decoding representation is shown in Figure 1.4 for a VLC of dimension T = 3 with the codeword set {0, 10, 11}. The decoding process of BCDG is similar to that of SCDG except that the roles of bit and symbol are exchanged, so the JSC decoding process for VLCs can likewise be performed with the BCDG representation. However, the transition paths in BCDG are more complicated than in SCDG. In terms of coding complexity, two-dimensional pointers are needed to address where the arrows point. This complexity becomes more prominent when implementing large VLC tables, such as the AC-coefficient table with 103 symbols in MPEG-4 [30]. Therefore, we choose the SCDG representation for our implementation in this thesis. Although the SCDG representation may lose a little performance when a sub-optimal solution is imposed, it is of great value when dealing with large VLC tables.

[Figure 1.4: Bit-constrained directed graph representation for VLC decoding. Axes: bit-step k (trellis bit step, decoded bit-number), codeword-step j over {0}, {10}, {11}, and symbol-step i (decoded symbol-number). Figure annotation: N = 3-symbol, L = 4-bit. Legend: S = symbol-state, B = bit-state.]

These representation methods work for both hard and soft input; in this thesis we demonstrate their error-correction capability with soft input. With the ML/SISO decoding algorithm, the improvement over the classical table look-up decoding method is significant, but the complexity is prohibitive. In this thesis, we focus on implementation for practical applications such as MPEG-4 and H.264. We use the Soft-Input Soft-Output (SISO) decoding algorithm as the basis of our metric derivation. Compared with the Maximum A Posteriori (MAP) decoding algorithm and the sequential decoding algorithm, the SISO algorithm offers the best trade-off between performance and complexity. Further, it uses a simpler metric (i.e. absolute difference) to improve the error resilience of the VLC decoding process. For the graph representation of the SISO decoding algorithm, we choose the SCDG. The next section outlines the contributions of this thesis, which include algorithm simplification and complexity reduction. Further, a memory-efficient design and a performance model are proposed to achieve low memory utilization and optimal performance.

1.3. Contributions of this Thesis

As stated above, the JSC design algorithm chosen is the SCDG-based ML/SISO VLC decoding method. This decoding technique for variable length codes provides channel protection without requiring extra bandwidth. The proposed VLC decoder can be considered an add-on module to the existing structure, so it is compliant with present video decoders. To improve the error resilience, soft VLC decoders with joint source and channel design have been proposed [23]-[25]. Such algorithms generally need to maintain many states as the table size grows. Hence, soft VLC decoders suffer from high complexity. Reduced-complexity algorithms with sub-optimal solutions have been developed [24]. However, the improvement in [24] is not significant for larger VLC tables. In this thesis, we propose a scalable soft VLC decoder (Scalable Soft VLD) to reduce the complexity. Our approach includes algorithm translation and table size reduction. To simplify the algorithm, we translate the metric derivation of the Soft-Input Soft-Output algorithm [19] into the symbol-constrained directed graph (SCDG) for soft VLC

decoding. With the help of the graph representation, we develop a modified sorting scheme that achieves the same decoding performance with fewer states. It also requires fewer memory accesses, which suits low-power demands. To reduce the table size, we propose a symbol-merging algorithm that merges two symbols with the same prefix into one symbol. With the symbol-merging algorithm, we can greatly reduce the table size as well as the complexity at the cost of a minor performance loss. To deal with the different tables (intra and non-intra) used for different frame types in the MPEG standards, we propose a table-merging method that integrates the different tables into one. The proposed soft VLC decoder can employ this single merged table to serve the different VLC coding tables (i.e. intra or non-intra) instead of duplicating the configuration for each VLC table. In summary, compared with [29][31], the proposed symbol-merging and table-merging algorithms achieve a high degree of integration and flexibility.

In [16], the authors used the minimal Hamming distance (dH) to quantify the relation between table and performance. However, this measure is inaccurate when different tables have the same dH. We propose a novel measurement to improve the accuracy of performance estimation. Further, we reduce the penalty of over-design and observe the performance trend through the proposed Black-Box model. The proposed model thus reaches the optimal trade-off between performance and complexity.

The proposed scalable soft VLC decoder using performance modeling is verified with both a simple table and a practical MPEG-4 table. In the analysis with simple VLC source data, our algorithm saves 15% of the memory accesses on average in comparison with state-of-the-art algorithms. Further, we can obtain the optimal parameters for a given table and decoding algorithm through the Black-Box model. Finally, our scheme shows more than 1 dB of PSNR improvement compared with straightforward table look-up decoding in an AWGN or bursty channel. The proposed scheme is also compared with other coding configurations such as SSVLC [10] and RVLC [30]. Compared with the standard-supported RVLC decoding method, our algorithm achieves more than 0.5 dB of improvement at SNR = 10 dB. Further, VLC coding is more efficient than RVLC in terms of coding efficiency. No side information needs to

be transmitted, so the proposed decoder is bandwidth-efficient.

1.4. Thesis Organization

The rest of this thesis is organized as follows. Chapter 2 briefly introduces the SISO algorithm [19]. Chapter 3 presents our proposed adaptive AMAP-2 for reducing the number of memory accesses. Chapter 4 presents our symbol-merging and table-merging methods for complexity reduction. Chapter 5 describes the proposed Black-Box model for the optimal trade-off between performance and complexity. Chapter 6 presents the complexity and performance evaluation on MPEG-4. Finally, Chapter 7 summarizes our work and discusses some topics for future research.

Chapter 2 Soft Decoding of Variable Length Codes

2.1. Background

In most image and video compression systems, VLC decoding is implemented as a table look-up and performed bit by bit. The input of the entropy decoder is assumed to be a sequence of "hard" bits for which no soft information is available. In a noisy environment, however, soft information can be associated with each information bit. It can be obtained either from the channel observations in the case of uncoded transmission, or from soft-output channel decoders (e.g. SOVA or a turbo decoder) when channel coding is employed. Based on the soft input of the VLC decoder, many publications [24][31] have shown a noticeable performance improvement over traditional VLC decoders. Compared with the FEC and ARQ methods, the soft VLC decoder is bandwidth-efficient and robust in noisy channels. We choose the SISO/ML algorithm as the core algorithm of the soft VLC decoder because of its implementation cost and real-time considerations. Applying the SISO/ML algorithm to a practical system (e.g. MPEG-4, H.264) requires some modifications. In the following, we address the translation between the conventional SISO algorithm and the modified SISO algorithm. Further, we modify the traditional source VLC table by introducing some symbol information, which facilitates the system integration of the soft VLC decoder.
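What "soft information from the channel observations" means for uncoded transmission can be sketched in a few lines. The example below is an assumption-laden illustration, not the simulation setup of the thesis: it assumes BPSK with bit 0 mapped to +1.0 and bit 1 to -1.0, unit symbol energy, and a noise standard deviation derived from Eb/N0 in dB. The names `bpsk_awgn` and `hard_decision` are hypothetical.

```python
import math
import random


def bpsk_awgn(bits, snr_db, rng):
    """Return soft channel observations for a bit string sent over AWGN.

    Assumes BPSK (0 -> +1.0, 1 -> -1.0) with unit energy per bit, so the
    noise standard deviation is sqrt(1 / (2 * Eb/N0)).
    """
    ebn0 = 10.0 ** (snr_db / 10.0)
    sigma = math.sqrt(1.0 / (2.0 * ebn0))
    return [(1.0 if b == "0" else -1.0) + rng.gauss(0.0, sigma) for b in bits]


def hard_decision(soft):
    """What a conventional table look-up decoder sees: only the sign."""
    return "".join("0" if y >= 0 else "1" for y in soft)


rng = random.Random(0)                    # fixed seed for repeatability
sent = "100110"
soft = bpsk_awgn(sent, snr_db=3.0, rng=rng)
print([round(y, 2) for y in soft])        # magnitudes carry the reliability
print(hard_decision(soft))                # the hard decoder discards them
```

The magnitude of each observation indicates how reliable the corresponding bit is; a soft VLC decoder can use it in its metric, whereas the hard decision keeps only the sign.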

(22) 2.2. Soft-Input Soft-Output Algorithm

The SISO decoding technique [19] is an exhaustive decoding procedure that resists error disturbances in a noisy channel. It searches for the best path through a tree-like structure in the presence of additive white Gaussian noise (AWGN). The input sequence is transmitted packet by packet. We do not exploit the soft output for iterative decoding because of the real-time requirement of video transmission. The a priori information of one packet is its length of L bits and, equivalently, its N symbols. Specifically, the SISO algorithm chooses the estimated sequence X as the one that maximizes the joint probability with the observed sequence Y. The estimated sequence that maximizes the joint probability Pr(X,Y) is denoted X* = {x*(1), x*(2), …, x*(N)}. The optimal codewords are given by Equation 2.1, where P* is the probability attained by the sequence of codewords that maximizes Pr(X,Y), l_i is the length of codeword i, p_i is its a priori probability, and k is the number of codewords in the table. A more detailed derivation and description are given in [19]. Based on a similar estimation, we perform an algorithm translation that simplifies the SISO algorithm when the table size or the number of decoded symbols grows.

\[
x^{*}(N) = \arg\max_{i \in [1,k]} \left\{ P^{*}(N-1,\, L-l_i) \cdot \Pr\{ y_{L-l_i+1}, y_{L-l_i+2}, \ldots, y_L \mid x(N)=i \} \cdot p_i \right\}
\]
\[
X(1,L) = \arg\max_{i \in [1,k]} \left\{ \Pr\{ y_1, y_2, \ldots, y_L \mid x(1)=i \} \cdot p_i \right\} \tag{2.1}
\]

2.2.1. Algorithm Translation

To help the understanding of our simplified algorithm, we use a symbol-constrained directed graph representation [13][24] for the symbol-based VLC trellis decoding [14][15]. Figure 2.1 gives the high-level description of the decoding procedure. The overall algorithm translation can be partitioned into two main parts. The first is the state-trellis construction. Because the SISO algorithm is an exhaustive search, its complexity grows exponentially with the sequence length or the table size.
The state-trellis construction requires an adder, a shifter and a multiplexer to perform a function similar to that of the add-compare-select (ACS) unit in a Viterbi decoder. The second part is the trace-back decoding procedure. First, it searches the best candidates

(23) conforming to the matching criterion. This criterion is fed forward from the packet header and provides the a priori or soft information to the back-end VLC decoding procedure.

SoftVLD_Procedure ( ) {
    // Step 1 : Initialization.
    for(j=0;j<LUT_size;j++) {
        for(L=0;L<VLC_CL;L++)
            // Step 1.1 : assign the intermediate metric of each state in the first symbol step.
    }
    // Step 2 : Generating state trellis.
    for(i=0;i<N;i++) {
        while( search the minimal metric from the previous state ) {
            // Step 2.1 : [Add] – add the previous metric to form the present state metric.
            // Step 2.2 : [Compare & Select] – compare with the other state metrics and select the minimal one as the final candidate in the present state.
        }
    }
    // Step 3 : Trace back to decode symbols.
    while( search the final states (i.e. i==N-1) ) {
        if(state pointer==input size)
            // Step 3.1 : label the start point in the trace-back process.
    }
    for(i=N-1;i>=0;i--) {
        // Step 3.2 : look up the previous states of the present state.
        // Step 3.3 : decode each codeword and look up the symbol-information.
    }
}

Figure 2.1：High-level description of the decoding procedure with algorithm translation.

We use a simple example to illustrate the algorithm translation. Assume a simple VLC table with only 3 symbols {0,10,11} and a packet of 3 bits (equivalently 2 symbols) with content ‘0 10’. After BPSK modulation, the modulated sequence is {-1,+1,-1}. When the packet is transmitted over the AWGN channel, the received packet may become {-0.8, -0.05, -0.2} (i.e. an error occurs in the second bit). Figure 2.2 depicts the graph representation for this example. The intermediate metric D*(i,j) denotes the cumulative squared error at the i-th symbol and the j-th bit in each symbol state. S(m,n) is the symbol state decoded with m symbols and has the index of

(24) n among the states with the same value of m. The number inside each square is the same as the ‘j’ of D*(i,j). The ‘minimum’ operation is performed in state S(2,1), which is entered by more than one arrow. The minimal metric survives the comparison and the others are pruned. The state metrics D* of S(2,3) and S(2,5) need not be calculated; they return the null value (φ) because the decoded bit pointer exceeds the a priori bit information (4 > 3 bits). Therefore, the shaded squares are the final candidates. S(2,2) has the minimal metric among them, survives, and traces back to S(1,0) to decode the bitstream correctly as {0,10}.

Metrics shown in Figure 2.2 (a priori information: 3 bits, 2 symbols):
1st symbol:
    {0}:  D*(1,1) = (-0.8-(-1))^2 = 0.04
    {10}: D*(1,2) = (-0.8-1)^2 + (-0.05-(-1))^2 = 4.1425
    {11}: D*(1,2) = (-0.8-1)^2 + (-0.05-1)^2 = 4.3425
2nd symbol:
    D*(2,2) = D*(1,1) + (-0.05-(-1))^2 = 0.9425
    D*(2,3) = D*(1,1) + (-0.05-1)^2 + (-0.2-(-1))^2 = 1.7825
    D*(2,3) = D*(1,1) + (-0.05-1)^2 + (-0.2-1)^2 = 2.5825
    S(2,1): D*(2,3) = min{4.1425+(-0.2-(-1))^2, 4.3425+(-0.2-(-1))^2} = 4.7825
    D*(2,4) = φ

Figure 2.2：The algorithm translation between symbol-constrained directed graph and the SISO algorithm.

2.2.2. Algorithm Modification

Source Table Modification

To apply our algorithm to the MPEG-4 standard, we introduce the ‘sign’ and ‘LAST’

(25) fields from the original Huffman table. The extra ‘sign’ and ‘LAST’ fields are essential for the SISO decoding procedure in MPEG-4. We modify the simple VLC table of Figure 2.3(a) into the table of Figure 2.3(b). In our proposed approach, the number of ‘LAST’ symbols in one packet represents the modified a priori information. This number is defined by the MPEG-4 standard and extracted from the packet header. To deal with the “s” parameter appended to each symbol, we use simple hard decoding with a table look-up method. The ‘sign’ field introduced in Figure 2.3(b) represents the number of “s” bits in each symbol. The ‘sign’ field is 1 when a 1-bit “s” is appended to the symbol. The ‘sign’ field is discussed further with the scalable soft VLC decoder in Chapter 4.

(a) Simple VLC table
Code length   Code word
1             1
3             011
4             0100
4             0101

(b) Simplified MPEG VLC table
Code length   Code word   sign   LAST
2             1s          1      1
4             011s        1      0
5             0100s       1      0
5             0101s       1      0

Figure 2.3：The original (a) and real case (b) of VLC table.

Priori-Info. Modification

After modifying the VLC coding tables, it is crucial to modify the a priori information as well, since the original information cannot be extracted during decoding in practical applications such as MPEG-4 or H.264. However, it is feasible to obtain the number of blocks contained in the texture partition when decoding the headers and the motion partition. This information can easily be exploited by counting the number of occurrences of the LAST field being equal to 1. Thus, the number of blocks can be treated as a priori information and used, in place of the number of symbols, to select a likely path. Further, it is available to the user without requiring any side information to be transmitted. This modification induces a little performance loss due to the additional candidates to

(26) be selected in Step 3.1 of Figure 2.1. The difference made by this modification is illustrated in Figure 2.4. The traditional soft VLC decoder [13][19] uses the known number of symbols as the algorithmic a priori information. However, this information has to be transformed into the number of specified symbols. We can use the “EOB” symbol of MPEG-2 and the “LAST” symbol of MPEG-4 (i.e. the specified symbol) as the algorithmic constraint within the trace-back procedure. This modification, however, induces extra candidates between the start point and the end point of the LAST-number constraint in Figure 2.4. To achieve a standard-compliant and bandwidth-efficient design, this modification is essential and the induced performance loss is inevitable.

[Figure 2.4 diagram: two decoders, each consisting of a VLC table, trellis construction and trace-back procedure, are compared. One uses the LAST number (no side information), with a constrained range extending from a start point to an end point; the other uses the symbol constraint (additional side information).]

Figure 2.4：The algorithm modification due to the constraint change.

In summary, Figure 2.5 shows the high-level description modified according to the above algorithm modification. The modifications are labeled with shaded regions. First, we have to introduce the additional term “SIGN” to perform the metric calculation in Step 1.2 and Step 2.1.1. This term is calculated by absolute difference and decoded with a hard decoding scheme. Second, the constrained range (see Figure 2.4) is extended and re-calculated in Step 3.1. Therefore, we can easily apply this SISO/ML soft VLC decoding algorithm to practical VLC coding tables such as the AC TCOEF tables in MPEG-2 or MPEG-4. More simulation and discussion will be

(27) addressed in the following chapters.

SoftVLD_Procedure ( ) {
    // Step 1 : Initialization.
    for(j=0;j<LUT_size;j++) {
        for(L=0;L<VLC_CL;L++)
            // Step 1.1 : assign the intermediate metric of each state in the first symbol step.
            // Step 1.2 : add the extra sign bit into the formulation of the metric.
    }
    // Step 2 : Generating state trellis.
    for(i=0;i<N;i++) {
        while( search the minimal metric from the previous state ) {
            // Step 2.1 : [Add] – add the previous metric to form the present state metric.
            // Step 2.1.1 : add the extra sign bit into the formulation of the metric.
            // Step 2.2 : [Compare & Select] – compare with the other state metrics and select the minimal one as the final candidate in the present state.
        }
    }
    // Step 3 : Trace back to decode symbols.
    while( search the final states (i.e. LAST start point <= i <= LAST end point) ) {
        if(state pointer==input size)
            // Step 3.1 : label the start point in the trace-back process.
    }
    for(i=N-1;i>=0;i--) {
        // Step 3.2 : look up the previous states of the present state.
        // Step 3.3 : decode each codeword and look up the symbol-information.
    }
}

Figure 2.5：High-level description of the decoding procedure with algorithm modification.
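The three-symbol example of Section 2.2.1 can be reproduced with a short sketch. The Python code below is only an illustration under stated assumptions, not the thesis implementation: the function name `soft_vlc_decode` and the dictionary-based trellis are hypothetical, the sign and LAST fields of Section 2.2.2 are left out, and all paths that reach the same symbol count and bit position are merged into one state (which does not change the maximum-likelihood result, since later branch metrics depend only on the bit position).

```python
import math


def soft_vlc_decode(received, codewords, num_symbols):
    """Return (symbols, metric) of the parse with the smallest squared error.

    Only parses that use exactly num_symbols codewords and consume exactly
    len(received) soft values are accepted (the a priori constraint).
    """
    total_bits = len(received)
    # best[(n, pos)] = (metric, previous pos, last codeword) of the survivor
    # that has decoded n symbols and consumed pos bits.
    best = {(0, 0): (0.0, None, None)}
    for n in range(num_symbols):
        frontier = [(key, val) for key, val in best.items() if key[0] == n]
        for (_, pos), (metric, _, _) in frontier:
            for cw in codewords:
                end = pos + len(cw)
                if end > total_bits:
                    continue  # bit pointer exceeds the packet: null state
                # Add: squared distance to the BPSK image (0 -> -1, 1 -> +1).
                branch = sum(
                    (received[pos + k] - (1.0 if bit == "1" else -1.0)) ** 2
                    for k, bit in enumerate(cw)
                )
                candidate = metric + branch
                key = (n + 1, end)
                # Compare & select: keep the smaller metric per state.
                if key not in best or candidate < best[key][0]:
                    best[key] = (candidate, pos, cw)
    final = (num_symbols, total_bits)
    if final not in best:
        return None, math.inf
    # Trace back from the final state to recover the codewords.
    symbols, (n, pos) = [], final
    while n > 0:
        _, prev, cw = best[(n, pos)]
        symbols.append(cw)
        n, pos = n - 1, prev
    return symbols[::-1], best[final][0]


symbols, metric = soft_vlc_decode([-0.8, -0.05, -0.2], ["0", "10", "11"], 2)
print(symbols, round(metric, 4))
```

For the received packet {-0.8, -0.05, -0.2} with the 2-symbol constraint, the sketch selects {0,10} with a cumulative squared error of 1.7825. Hard decisions on the same values would give the bits 0, 0, 0 and hence three symbols, which violates the a priori symbol count.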

(28) Chapter 3 Memory Efficient Design Approach

3.1. Algorithm with Adaptive Selection

The SISO algorithm requires many states because practical MPEG-4 tables have many entries. It becomes unsuitable for VLSI implementation when the number of surviving states grows. To reduce the number of states as well as the number of memory accesses, we propose an adaptive AMAP-2 (A-AMAP-2).

3.1.1. Modified Sorting Scheme

In [24], the authors introduced the approximated decoding method 2 (AMAP-2) to improve the coding performance with low complexity. However, their approach is not robust to variations of the channel condition. It keeps more states in order to retrieve the metric in error-prone regions, which adds a penalty in error-free regions. It selects a fixed number M of states with the smallest state metrics D* and sorts among them in each symbol step. To cope with variations of the channel condition, we propose to select the states adaptively and thus reduce the number of surviving states. Our adaptive scheme is more robust to the channel conditions and keeps a variable number of states in each symbol step to select the best states. To show our improvement and the differences from AMAP-2 [24], we use the simple VLC table in Figure 2.3(b) as an example. The corresponding graph representation is shown in Figure 3.1(a). To show the metric variation in each state clearly, we omit the arrows and the indication of ‘LAST’. Figure 3.1(b) shows the sorting algorithm via the number of states in AMAP-2. By pruning the

(29) squares that share the same bit position in Figure 3.1(b), we obtain Figure 3.1(c), which can be compared with our proposed A-AMAP-2.

[Figure 3.1 diagram: state metrics D* at the 1st, 2nd and 3rd symbol steps for the codewords {1s}, {011s}, {0100s} and {0101s}. (a) Full graph representation; (b) states sorted by metric, number of states versus symbol step; (c) the same after pruning states at the same bit position.]

Figure 3.1：The graph representation in approximated decoding 2 [24].

From Figure 3.2, we can see that the main difference between AMAP-2 and A-AMAP-2 is the sorting scheme on the Y-axis. Figure 3.2(a) shows that AMAP-2 requires at least 3 states (M_AMAP-2 = 3) for correct decoding given the specified threshold. The correct states are labeled with the shaded region. In Figure 3.2(b), the sorting algorithm uses D* instead of the number of states, and the state metric range above the minimal metric required for correct decoding is 4 (M_A-AMAP-2 = 6-2). As a result, 9 states survive in AMAP-2 and 8 states survive in A-AMAP-2 for correct decoding. This reduction in the number of states increases when errors occur infrequently. More simulation results are provided in Figure 3.3. This novel scheme adaptively selects the number of surviving states in each symbol step,

(30) and that is why we call it Adaptive AMAP-2 (A-AMAP-2).

[Figure 3.2 diagram: (a) AMAP-2, number of states versus symbol step, threshold M_AMAP-2 = 3; (b) A-AMAP-2, state metric D* versus symbol step, M_A-AMAP-2 = 6-2 = 4. State metrics shown: 1st symbol step 3.858, 3.022, 2.009; 2nd symbol step 9.671, 9.12, 8.96, 5.458, 5.404, 2.453; 3rd symbol step 11.538, 10.72, 10.56, 5.404, 3.538.]

Figure 3.2：The comparison between the AMAP-2 (a) and the proposed Adaptive AMAP-2 (b).

In Figure 3.2(b), the proposed A-AMAP-2 keeps more states where errors occur frequently, such as in the 1st symbol step, and fewer states in regions with fewer errors, such as in the 3rd symbol step. It is more robust to the channel conditions and keeps a variable number of surviving states in each symbol step.

3.1.2. Performance Comparison

The proposed A-AMAP-2 adaptively selects the number of states in each symbol step and thereby reduces the number of memory accesses. Many states survive where errors occur, and fewer states survive in error-free regions. In Figure 3.3, we assume that one surviving state costs one access to a memory element. In addition, we choose a specified threshold for AMAP-2 and for A-AMAP-2 individually. This threshold can be optimized with the proposed B-B model in Chapter 5. We then use the optimal threshold as the simulation parameter and compare the performance of the proposed A-AMAP-2 and AMAP-2 versus the channel condition. Given the same performance, our algorithm requires less memory space and fewer memory accesses at high SNR. On average, our algorithm saves 15% of the memory accesses compared with AMAP-2 [24].
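The difference between the two pruning rules can be sketched with the state metrics of Figure 3.2. The Python code below is a hedged illustration: the function names are hypothetical, and reading M_A-AMAP-2 = 4 as a metric window above the minimum of each symbol step is an assumption made here (an absolute threshold of D* ≤ 6 gives the same counts for this data).

```python
def amap2_prune(metrics, m):
    """AMAP-2: keep the m states with the smallest metric in a symbol step."""
    return sorted(metrics)[:m]


def adaptive_prune(metrics, window):
    """A-AMAP-2 as read here: keep states within `window` of the step minimum."""
    floor = min(metrics)
    return sorted(d for d in metrics if d <= floor + window)


# State metrics D* per symbol step, taken from Figure 3.2.
steps = [
    [3.858, 3.022, 2.009],
    [9.671, 9.12, 8.96, 5.458, 5.404, 2.453],
    [11.538, 10.72, 10.56, 5.404, 3.538],
]

amap2_total = sum(len(amap2_prune(s, 3)) for s in steps)
adaptive_total = sum(len(adaptive_prune(s, 4.0)) for s in steps)
print(amap2_total, adaptive_total)  # 9 8
```

With these inputs the fixed rule keeps 3 states in every step (9 in total), while the adaptive rule keeps 3, 3 and 2 states (8 in total), which agrees with the 9-state and 8-state counts quoted for Figure 3.2.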

(31) [Figure 3.3 plots, N=50, M at the convergence point: (a) symbol error rate versus SNR for Hard VLD, AMAP-2 and A-AMAP-2; (b) number of memory accesses versus SNR for AMAP-2 and A-AMAP-2.]

Figure 3.3：The comparison of performance (a) and memory access (b) vs. SNR.

3.2. Complexity Analysis

For soft decoding of variable length codes, comparators, multiplexers and storage elements are essential in a VLSI implementation. Moreover, each storage element (i.e. each state) also requires corresponding modules, including an adder, a multiplexer and a shifter. Therefore, reducing the number of storage elements reduces not only the implementation cost but also the number of memory accesses, which helps to meet power-saving demands. To formulate the complexity of the soft VLC decoder, we introduce some parameters and analyze the overall complexity in terms of the number of states. Since the number of states grows with the sequence length and with the number of code-length types in the VLC coding tables, we introduce the codewords T = {CW0, CW1, …, CWT} and the code-length types S = {CL0, CL1, …, CLS} of the pre-defined VLC coding table. In the equations below, T is used as the total number of codewords and CLS − CL0 as the spread between the longest and the shortest code length. Moreover, the symbol number N is the received symbol constraint. If this constraint N is not known before decoding, we can use the decoded number of specified symbols (i.e. the number of LASTs) instead in practical applications. The “optimal” soft VLC decoding, in which no states are pruned, has the best performance at the price of high complexity and many memory accesses. The number of states in the optimal soft VLC decoder is given in Equation 3.1.

\[
\text{Optimal}: \quad \mathrm{States\_per\_Stage}(N) = (CL_S - CL_0) \times (N-1) \times T, \quad N > 1 \tag{3.1}
\]

(32)
\[
\text{AMAP-1}: \quad \mathrm{States\_per\_Stage}(N) = (CL_S - CL_0) \times (N-1), \quad N > 1 \tag{3.2}
\]
\[
\text{AMAP-2}: \quad \mathrm{States\_per\_Stage}(N) = b_{max} \times N \tag{3.3}
\]

Because large VLC tables have to be implemented, we pay more attention to the complexity formulation of sub-optimal solutions for the soft VLC decoder. In [24], the authors presented AMAP-1 to reduce the number of states. The performance of AMAP-1 is almost the same as that of the optimal soft decoding method, since the pruning algorithm does not affect the optimal sequence selection in the trace-back decoding procedure. However, Equation 3.2 shows that the number of states in AMAP-1 is still too large for an implementation with large VLC tables. Another sub-optimal solution in [24] is AMAP-2, which keeps the bmax best states at each trellis symbol step; its formulation is given in Equation 3.3. Equations 3.1 to 3.3 are approximations that assume the code lengths of the input source table are continuous. Figure 3.4 gives an example of the complexity of the optimal decoding algorithm. After the analysis of the number of states in each symbol step, we address the total number of states required in the soft decoding procedure. In Figure 3.5, the numbers of states of the optimal and AMAP-1 decoding methods increase dramatically with the received symbol number (or received sequence length) N. However, the sequence length is decided by the system-level controller or by the packet size in a realistic application. From the algorithmic point of view, the parameters T and S determine how fast the algorithmic complexity grows. These parameters are fixed by the pre-defined VLC coding table. Therefore, reducing the number of entries in the coding table or the number of tables can greatly reduce the number of states as well as the overall complexity. AMAP-2 is a sub-optimal, low-complexity solution for the realization of a soft VLC decoder.
However, its complexity reduction is still not enough when the table size or sequence length grows. We focus on the reduction of the table size in the next chapter.
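Equations 3.1 to 3.3 can be evaluated directly. The Python sketch below is illustrative only: the function names are hypothetical, the table parameters are those of the simplified table in Figure 2.3(b) (code lengths from 2 to 5, four codewords), and the value b_max = 3 is an assumed example, not a figure from the thesis.

```python
def states_optimal(n, cl_min, cl_max, num_codewords):
    """Equation 3.1: states per stage of the optimal decoder, for n > 1."""
    if n <= 1:
        raise ValueError("Equation 3.1 is stated for N > 1")
    return (cl_max - cl_min) * (n - 1) * num_codewords


def states_amap1(n, cl_min, cl_max):
    """Equation 3.2: states per stage of AMAP-1, for n > 1."""
    if n <= 1:
        raise ValueError("Equation 3.2 is stated for N > 1")
    return (cl_max - cl_min) * (n - 1)


def states_amap2(n, b_max):
    """Equation 3.3: states per stage of AMAP-2."""
    return b_max * n


for n in range(2, 6):
    print(n, states_optimal(n, 2, 5, 4), states_amap1(n, 2, 5), states_amap2(n, 3))
```

For N = 2 to 5 the optimal decoder yields 12, 24, 36 and 48 states per stage, the values listed for Example 1 of Figure 3.4, while AMAP-1 drops the factor T and AMAP-2 depends only on the chosen b_max.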

(33)
(a) Example 1: T = {1s, 011s, 0100s, 0101s}, S = {2, 4, 5}
Codeword   Code length
1s         2
011s       4
0100s      5
0101s      5

Symbol step N                      1    2    3    4    5
# of states in each symbol step    4    12   24   36   48
Optimal => (5-2)x(N-1)x4, N>1

(b) Example 2: T = {0s, 10s, 11s}, S = {2, 3}
Codeword   Code length
0s         2
10s        3
11s        3

Symbol step N                      1    2    3    4    5
# of states in each symbol step    3    6    9    12   15
Optimal => (3-2)x(N-1)x3

Figure 3.4：Complexity analysis in terms of each symbol state numbers.

[Figure 3.5 plots: (a) number of states versus symbol step and (b) total number of states versus symbol step, for Optimal, AMAP-1 and AMAP-2.]

Figure 3.5：The comparison with complexity issue in terms of state numbers (a) and total state numbers (b) using the VLC table of Figure 2.3.

3.3. Summary

In this chapter, the memory-efficient algorithm and the complexity analysis of the adaptive soft VLC decoder have been presented. Based on the modified sorting scheme, the proposed Adaptive AMAP-2 is more channel-robust than the traditional AMAP-2. On average, our proposed algorithm saves 15% of the memory accesses at identical coding performance. Further, we introduced some parameters to analyze the overall complexity in terms of the number of states. The advanced analysis and formulation of performance are described in Chapter 5.

(34) Chapter 4 Low-Complexity Design Approach

4.1. Symbol-Merging Algorithm

The main problem of soft VLC decoding is the large number of states and the complicated metric computation when the sequence length or table size grows. To apply the SISO algorithm to the MPEG-4 system, it is essential to reduce the table size. Thus, we propose a scalable scheme with a symbol-merging algorithm, which exploits the redundancy among different symbols. We consider a simple VLC table as a tree structure in Figure 4.1(a). The proposed symbol-merging scheme searches for symbols with an identical prefix and merges them into a single merged symbol. In Figure 4.1(b), the original SISO decoding algorithm is the special case in which z is equal to 0 (i.e. Base T0). In other words, no hard decoding is performed except for the ‘sign’ bit. This case achieves the highest performance at the penalty of the largest complexity. As the index ‘z’ increases, the code length of the prefix symbol that is soft decoded decreases, while the number of bits that are hard decoded increases. As a result, the scheme can be considered a hybrid that combines hard decoding and soft decoding.

(35) [Figure 4.1 diagram: (a) binary code tree from the root, in which the symbols 0110s and 0111s share the prefix 011 and are merged (Merge-1, Merge-2); (b) scalable tables Base T0, Scalable T1, Scalable T2, ..., Scalable Tz, each symbol split into a soft-decoded prefix and a hard-decoded suffix s'.]

Figure 4.1：The tree-structured VLC (a) and scalable scheme with hard and soft decoding (b).

The symbol-merging scheme can be applied only under certain conditions. Two codeword symbols can be merged only if they carry the same symbol information, i.e. identical “LAST” and “SIGN” fields. In addition, the two codeword symbols must have the same prefix code and differ only in a one-bit suffix. The detailed high-level description is given in Appendix A. The AC TCOEF tables in MPEG-2 and MPEG-4 are reduced to a reasonable size by the symbol-merging scheme. Because the merging conditions also depend on the symbol information “SIGN” and “LAST”, the MPEG-4 intra and non-intra tables yield different merging results even though their codewords are identical (see Table 4.1). We use a simple example to illustrate the proposed scheme in Figure 4.2, where ‘Ti’ represents the number of symbols after the Merge-i operation. As shown, the ‘Merge-1’ operation decreases the table size by 2. With the further ‘Merge-2’ operation, the total number of symbols becomes 3. The introduced ‘sign’ field represents the number of “s” bits appended to the corresponding symbol. The ‘sign’ field increases when two symbols with identical “SIGN” and “LAST” are merged into one.
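The merging conditions above can be sketched as one pass over a table. The Python code below is a hypothetical illustration, not the procedure of Appendix A: the tuple layout `(prefix_bits, sign, last)` and the function name `merge_once` are assumptions, and the input is the six-symbol example table of Figure 4.2.

```python
from collections import defaultdict


def merge_once(table):
    """One Merge-i pass. table: list of (prefix_bits, sign, last) tuples."""
    groups = defaultdict(list)
    for code, sign, last in table:
        # Merge candidates share SIGN, LAST and all bits except the last one.
        groups[(code[:-1], sign, last)].append(code)
    merged = []
    for (stem, sign, last), codes in groups.items():
        if len(codes) == 2 and stem:
            # Both one-bit suffixes exist: keep the common prefix and let
            # one more bit be hard decoded (sign grows by one).
            merged.append((stem, sign + 1, last))
        else:
            merged.extend((code, sign, last) for code in codes)
    return merged


t0 = [("10", 1, 1), ("111", 1, 0), ("0100", 1, 0),
      ("0101", 1, 0), ("0110", 1, 0), ("0111", 1, 0)]
t1 = merge_once(t0)
t2 = merge_once(t1)
print(len(t0), len(t1), len(t2))  # 6 4 3
```

The first pass merges 0100/0101 and 0110/0111 into 010 and 011 with a sign count of 2, and the second pass merges those into 01 with a sign count of 3, giving table sizes of 6, 4 and 3 as in the example.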

(36)
(a) T0 = 6
Code Length   Code Word   Sign   LAST
2             10 S        1      1
3             111 S       1      0
4             0100 S      1      0
4             0101 S      1      0
4             0110 S      1      0
4             0111 S      1      0

(b) Merge-1, T1 = 4
Code Length   Code Word   Sign   LAST
2             10 S        1      1
3             111 S       1      0
3             010 S’      2      0
3             011 S’      2      0

(c) Merge-2, T2 = 3
Code Length   Code Word   Sign   LAST
2             10 S        1      1
3             111 S       1      0
3             01 S”       3      0

Figure 4.2：A VLC table with Merge-0 (a), Merge-1(b) and Merge-2(c) operation.

4.1.1. Metric Formulation of “Balance Degree”

The more symbols are merged, the greater the merging efficiency. Therefore, to quantify the number of symbols remaining after the symbol-merging scheme, we propose the metric ‘Balance Degree’ (B.D.) in Equation 4.1. The value of B.D. lies between 0 and 1. In Equation 4.1, the denominator represents the maximal value, which corresponds to a special table with a complete tree structure. It equals “z×0.5” after z summations, since the ratio of Ti+1 over Ti is then fixed at 0.5. For example, the Balance Degree of Figure 4.2 is 58% for “z=2”. To show that B.D. is a meaningful number for our merging scheme, we measure the B.D. of the AC TCOEF tables in MPEG-2 and MPEG-4. As shown in Table 4.1, the higher the B.D., the larger the reduction of the table size. The B.D. value of the non-intra table is larger than that of the intra table, because more redundancy can be exploited in its symbol structure. In other words, the symbol-merging scheme is more efficient on the non-intra table than on the intra table.

\[
\mathrm{B.D.}(z) = \frac{\text{real reduction}}{\text{complete reduction}}
= \frac{\sum_{i=0}^{z-1} \left( 1 - \frac{T_{i+1}}{T_i} \right)}{\sum_{i=0}^{z-1} (1 - 0.5)}
= \frac{\sum_{i=0}^{z-1} \left( 1 - \frac{T_{i+1}}{T_i} \right)}{z \times 0.5} \tag{4.1}
\]
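Equation 4.1 is easy to check numerically. The following Python sketch (the function name `balance_degree` is a hypothetical helper) evaluates B.D.(2) for the example of Figure 4.2 and for the table sizes reported in Table 4.1.

```python
def balance_degree(sizes):
    """B.D.(z) of Equation 4.1 for table sizes [T0, T1, ..., Tz]."""
    z = len(sizes) - 1
    real_reduction = sum(1 - sizes[i + 1] / sizes[i] for i in range(z))
    complete_reduction = z * 0.5  # every merge halves a complete tree
    return real_reduction / complete_reduction


tables = {
    "Figure 4.2 example": [6, 4, 3],
    "MPEG-2 intra": [113, 65, 45],
    "MPEG-2 non-intra": [114, 60, 34],
    "MPEG-4 intra": [103, 61, 48],
    "MPEG-4 non-intra": [103, 56, 38],
}
for name, sizes in tables.items():
    print(f"{name}: {100 * balance_degree(sizes):.1f}%")
```

The printed values round to the 58% quoted for Figure 4.2 and to the B.D.(2) row of Table 4.1 (73.2%, 90.7%, about 62% and 77.8%).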

(37) Table 4.1：The reduction of table size by symbol-merging algorithm.

Standard   Table               T0    Scalable T1   Scalable T2   B.D.(2)
MPEG-2     INTRA TB-15         113   65            45            73.2%
MPEG-2     NON-INTRA TB-14     114   60            34            90.7%
MPEG-4     INTRA TB-15         103   61            48            62%
MPEG-4     NON-INTRA TB-14     103   56            38            77.8%

4.2. Table-Merging Algorithm

Switching tables during decoding is essential for the soft VLC decoder, since the AC partition of the bit-stream contains both intra and non-intra AC coefficients. Further, a table-merging method is needed when the VLC decoder must switch tables quickly, as with the context-adaptive VLC in H.264. Consequently, to share the same soft VLC decoder among different VLC tables, we propose a novel soft VLC decoder with a table-merging algorithm that reduces the implementation cost and the number of memory accesses. We propose a codeword-merging method and a prefix-merging method to realize the table-merging scheme. These merging methods are lossless and do not harm the performance of the soft VLC decoder, whereas the symbol-merging algorithm in Section 4.1 is a lossy scheme, since the performance of the decoder degrades with the number of merging operations (see Figure 4.3). A more detailed high-level description is given in Appendix B, and the merging algorithm is elaborated in the following subsections.

4.2.1. Code-Word Merging

Although most VLC coding tables are generated with the Huffman procedure, a given codeword is still likely to appear in many coding tables. In this case, it is unnecessary to duplicate the codeword information in memory for every

(38) table that uses this codeword. Codeword merging sets such a codeword as a merged codeword and reuses the codeword information whenever the coding tables are required. In this way, the information redundancy among coding tables is exploited, and the stored data are reduced from many identical codewords to one merged codeword.

4.2.2. Prefix Merging

According to the Huffman property, no codeword can be the prefix of another codeword in the same table, but this rule does not hold across different tables. Frequently, a short codeword in one table is the prefix of a long codeword in another table. When such codewords are found, prefix merging stores the long codeword as a merged VLC codeword together with the lengths of the VLC codewords in each table. As a result, the information redundancy among tables is exploited further.

4.2.3. Merged Table

The table-merging process applies both codeword merging and prefix merging to the codewords of all AC TCOEF tables. The table information required to recover the VLC coding tables from the merged table has to be a superset of the data stored by the two merging methods, since it is hard to tell which method generated a given merged codeword. Hence, every VLC code length of every table has to be stored individually and is not reused, even when codeword merging is performed. To select the merged codewords of a VLC table quickly, an additional valid bit indicates whether a merged codeword belongs to the table. Thus, the table information of a coding table consists of the valid bit and the VLC code length of every merged codeword (see Appendix B). The overall memory requirement is reduced because merged codewords are stored once and reused by all AC TCOEF tables.

4.3. Performance Evaluation

We propose the symbol-merging method to reduce the complexity at the expense of

(39) little performance loss. There are trade-offs between the complexity reduction and the performance loss. In Figure 4.3, the complexity in terms of execution time is reduced greatly at the cost of little performance degradation. Figure 4.3(b) shows that the performance loss dominates the overall system performance (i.e. the symbol error rate) when ‘i’ is larger than 2 (i.e. beyond Merge-2).

[Figure 4.3 plots: (a) execution time (sec) versus SNR for Scalable Soft VLD, Scalable Soft VLD with Merge-1 and Scalable Soft VLD with Merge-2; (b) symbol error rate versus SNR for Hard VLD and the same three soft decoders.]

Figure 4.3：The evaluation of execution time (a) and performance (b) with different symbol-merging table in Figure 4.2.

To improve the flexibility of the soft VLC decoder across the different AC TCOEF coding tables (i.e. intra and non-intra), we apply the table-merging scheme to reduce the implementation cost and computational complexity. Regarding the table configuration of the soft VLC decoder (see Table 4.2), [31] uses two soft VLC decoders with MAP decoding that operate on intra and non-intra blocks respectively. This is not convenient for hardware implementation and system integration, and it may require additional information to partition the intra and non-intra blocks into different channels. The integration overhead and implementation cost make it unsuitable for a cost-effective design. In [29], the authors implement a soft VLC decoder with a sequential algorithm. It uses a single soft VLC decoder to serve the different VLC tables. However, the AC TCOEF tables have many entries, which leads to a large number of memory accesses and high computational complexity. To resolve these complexity problems, we propose a novel merging scheme that reduces the table size and merges the different tables into one table.
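The codeword-merging and prefix-merging steps of Section 4.2 can be sketched on two toy tables. The Python code below is only one possible reading, since Appendix B is not reproduced here: the data layout (one merged codeword plus a per-table code length, with length 0 standing for a cleared valid bit), the function names and the two example tables are all assumptions made for illustration.

```python
def merge_tables(tables):
    """Merge prefix-free codeword lists into shared entries.

    Returns a list of (merged_codeword, lengths), where lengths[t] is the
    code length used by table t, or 0 when table t does not use the entry.
    """
    unique = sorted({c for t in tables for c in t}, key=lambda c: (-len(c), c))
    entries = []
    for code in unique:  # longest first, so short codes attach to long ones
        for merged, lengths in entries:
            if merged.startswith(code):
                break  # prefix merging: reuse the stored long codeword
        else:
            lengths = [0] * len(tables)
            entries.append((code, lengths))
        for t, table in enumerate(tables):
            if code in table:
                lengths[t] = len(code)  # codeword merging: stored only once
    return entries


def recover_table(entries, t):
    """Rebuild table t from the merged entries and its per-table lengths."""
    return sorted(merged[:lengths[t]] for merged, lengths in entries if lengths[t] > 0)


table_a = ["0", "10", "11"]          # hypothetical table A
table_b = ["0", "100", "101", "11"]  # hypothetical table B
entries = merge_tables([table_a, table_b])
print(len(entries), recover_table(entries, 0), recover_table(entries, 1))
```

In this toy case the seven codewords of the two tables are stored as four merged codewords: "0" and "11" are shared by codeword merging, and "10" of table A is absorbed by "100" of table B through prefix merging. Both tables are recovered exactly, which reflects the lossless nature of the scheme.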

(40) Table 4.2 compares our proposed soft VLC decoder with existing designs. We implement the soft VLC decoder with the SISO/ML algorithm. Because the anchor configurations and source characteristics differ among the designs, we additionally list the “Improved Ratio” (Equation 4.2), which relates the performance of the upper bound (i.e. no error), of the soft VLC decoder, and of the anchor. The “improved ratio” is illustrated in Figure 4.4. In general, a soft VLD improves on the anchor by x dB. However, the value of x is an absolute, local metric, since it varies with the source (e.g. bit rate) and the channel (e.g. channel condition) environment. To achieve a fair comparison, we propose the “improved ratio” as a normalized measurement. It takes into account not only the lower bound (i.e. anchor) but also the upper bound (i.e. no error). Based on the “improved ratio”, Table 4.2 shows that our proposed design achieves about 80% of the possible error recovery. Finally, we propose a low-complexity soft VLC decoder that supports the large VLC tables of the MPEG standards at the expense of a minor performance loss.

\[
\text{Improved Ratio} = \frac{\mathrm{Perf}(\text{Soft VLC Decoder}) - \mathrm{Perf}(\text{Anchor})}{\mathrm{Perf}(\text{No Error}) - \mathrm{Perf}(\text{Anchor})} \tag{4.2}
\]

4.4. Summary

In this chapter, the algorithm and system implementation of a scalable soft VLC decoder with a novel symbol-merging and multi-table-merging approach have been presented. The symbol-merging algorithm greatly reduces the table size at the price of a minor performance loss. Further, to improve the table configuration when switching tables during decoding, we present a table-merging scheme that improves the efficiency of the soft VLC decoder when it operates on multiple tables.
For practical applications, an efficient and low-complexity soft VLC decoder is thus realized within the joint source and channel design.
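Equation 4.2 amounts to a one-line computation. In the Python sketch below, the function name and the PSNR values are hypothetical and chosen only to show the arithmetic; they are not measurements from this thesis.

```python
def improved_ratio(perf_soft, perf_anchor, perf_no_error):
    """Equation 4.2: share of the anchor-to-error-free gap that is recovered."""
    return (perf_soft - perf_anchor) / (perf_no_error - perf_anchor)


# Hypothetical PSNR values in dB: anchor 30.0, soft VLD 31.2, error-free 31.5.
ratio = improved_ratio(31.2, 30.0, 31.5)
print(f"{100 * ratio:.1f}%")
```

Here x = 1.2 dB and y = 1.5 dB, so the soft decoder recovers 80% of the gap to the error-free bound. The same x would give a much smaller ratio if the error-free bound were far above the anchor, which is why the ratio is a fairer basis for comparison than x alone.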

(41) [Figure 4.4 diagram: PSNR versus SNR, with three curves: the error-free upper bound, the soft VLD and the anchor. The soft VLD lies x dB above the anchor, the upper bound lies y dB above the anchor, and Improved Ratio = x/y.]

Figure 4.4：The formulation of “Improved Ratio”

Table 4.2：The comparison with existing design.

Soft VLC Decoder         | Proposed                                      | [29]                                            | [31]
Implementation Method    | MPEG-4+SISO/ML                                | MPEG-4+Sequential                               | MPEG-4+MAP
Table Configuration      | Reduced-Single                                | Single                                          | Separated
Anchor: RM (1)           | Enable                                        | Enable                                          | Enable
Anchor: DP (2)           | Enable                                        | N/A                                             | N/A
Anchor: EC (3)           | Disable                                       | Enable                                          | Disable
Source Characteristics   | Foreman, QCIF, 64kbps, I-P-P, 300bits/packet  | Foreman, CIF, 800kbps, I-P-P, 4000bits/packet   | Foreman, 0.164bits/pel, QCIF, I-P-P
Testing Environment      | AWGN+BPSK                                     | AWGN+BPSK                                       | AMC (4)
Improvement              | 1.2dB                                         | 8dB                                             | 6dB
Improved Ratio           | 79.28%                                        | 80%                                             | 52.72%

(1) Resynchronization Markers.
(2) Data Partition.
(3) Error Concealment.
(4) Additive-Markov-Channel model for slow fading wireless channel.