Context Adaptive Variable Length Coding of Dual Standards for Digital TV

National Chiao Tung University

Department of Electronics Engineering and Institute of Electronics

Master's Thesis

Student: Jiun-Yan Yang

Advisor: Dr. Chen-Yi Lee

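Context Adaptive Variable Length Coding of Dual Standards for Digital TV

Student: Jiun-Yan Yang Advisor: Dr. Chen-Yi Lee

Department of Electronics Engineering and Institute of Electronics, National Chiao Tung University

Abstract (Chinese)

In this thesis, we first study the properties of context adaptive variable length coding, which is based on Huffman coding. Based on the properties of Huffman coding, we propose a variable length decoder for two standards, MPEG-2 and H.264/AVC, that uses table partition by the number of prefix zeros and table realization with an arithmetic method. The proposed design reduces power consumption and hardware cost. Furthermore, considering the system view and the required throughput, we use an improved group-based variable length codec algorithm to realize the proposed variable length codec system. In addition to the group-based algorithm, we propose level efficient coding, run_before efficient coding, symbols construction, and run_before zero-skipping for the proposed variable length codec design. With a parallel input bitstream, the proposed system can perform real-time encoding and decoding based on these methods. The proposed design therefore satisfies the throughput requirement specified in the H.264/AVC main profile.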

Context Adaptive Variable Length Coding of Dual Standards for Digital TV Applications

Student: Jiun-Yan Yang Advisor: Dr. Chen-Yi Lee

College of Electrical & Computer Engineering, National Chiao Tung University

Abstract

In this thesis, we first study the features of context adaptive variable length coding (CAVLC), which is based on Huffman coding. Building on the features of Huffman coding, we present a VLC decoder for dual standards, MPEG-2 and H.264/AVC, that uses prefix-zero table partition (PZTP) and realizes tables with an arithmetic method. The proposed design reduces power consumption and hardware cost.

Furthermore, considering the system view and the throughput requirement, we use an improved group-based VLC codec algorithm to realize the proposed VLC codec system. In addition to the group-based algorithm, we present further approaches used in the proposed VLC codec design: level efficient coding, run_before efficient coding, symbols construction, and run_before zero-skipping. With a parallel input bitstream, the proposed VLC codec system can perform real-time encoding and decoding based on these approaches. The presented design therefore satisfies the throughput requirement specified in the H.264/AVC main profile.

Acknowledgments

This dissertation could not have been written without Prof. Chen-Yi Lee, who not only served as my supervisor but also encouraged and challenged me throughout my academic program. He and the group leader, Mr. Tsu-Ming Liu, patiently guided me through the dissertation process, never accepting less than my best efforts. In addition, Yi-Hong Huang and Kang-Cheng Hou gave me valuable advice throughout this work, and I thank them all.

Contents

Chapter 1. Introduction
1.1. Overview of H.264/AVC System
1.2. CAVLC Algorithm
1.2.1. Huffman Coding
1.2.2. Context Adaptive Variable Length Coding
1.3. Designs of CAVLC Encoders and Decoders
1.3.1. Designs of CAVLC Encoders
1.3.2. Designs of CAVLC Decoders
1.4. Motivation
1.5. Organization of This Thesis

Chapter 2. A Low Power VLC decoder design
2.1. Overview of CAVLC Encoder and Decoder
2.1.1. Encoding Process Flow
2.1.2. Decoding Process Flow
2.2. Overview of the Proposed Architecture
2.3. Table Partition
2.4. Table Realization with Arithmetic Method
2.4.2. Level Decoding
2.5. Summary

Chapter 3. A VLC Codec System for dual standards
3.1. The Architecture of the Proposed VLC Codec System
3.2. Conventional Group-based VLC Codec System
3.2.1. Definition of Codeword Groups
3.2.2. Intra-Group Decoding Procedure
3.2.3. Group-searching Scheme
3.3. The Proposed Group-Based VLC Encoding and Decoding
3.3.1. The Definition of Decoding Codeword Groups
3.3.2. The Definition of the Encoding Symbol Groups
3.3.3. Intra-Group Decoding Procedure
3.3.4. Intra-Group Encoding Procedure
3.3.5. Decoding Group-Searching Scheme and overall group-based decoding processes
3.3.6. Encoding Group-Searching Scheme and overall group-based decoding processes
3.3.7. Group-Based VLC Coded System Architecture
3.4. Summary

Chapter 4. ... System
4.1. Efficient Coding
4.1.1. Level Efficient Coding
4.1.2. Run_before Efficient Coding
4.2. Zero skipping and proposed symbols constructor
4.3. Summary

Chapter 5. Implementation Results and Conclusion
5.1. Implementation Results
5.2. Conclusion
5.3. Future Work

List of Figures

Figure 1-1 : The block diagram of H.264/AVC encoder
Figure 1-2 : The block diagram of H.264/AVC decoder
Figure 1-3 : An example of VLC code construction
Figure 1-4 : An example of equivalent Huffman trees for Figure 1-3
Figure 1-5 : An example of CAVLC code construction
Figure 2-1 : The encoding process flow of CAVLC
Figure 2-2 : An example of CAVLC coefficients scanning
Figure 2-3 : How to calculate the value of nC
Figure 2-4 : The decoding process of CAVLC
Figure 2-5 : Overview of the proposed low power architecture
Figure 2-6 : An example of proposed table partition
Figure 2-7 : The PZTP VLC decoder architecture of coeff_token
Figure 2-8 : An example of NUM_FLC
Figure 2-9 : The architecture of proposed NUM_FLC
Figure 2-10 : Algorithm of level decoding
Figure 2-11 : The proposed architecture of level decoding
Figure 2-12 : The throughput of foreman.yuv with the proposed VLD
Figure 2-13 : The throughput of mobile.yuv with the proposed VLD
Figure 3-1 : The architecture of our proposed system
Figure 3-2 : The usage of memory of CABAD in our proposed H.264/AVC decoder
Figure 3-3 : Block diagram of the proposed VLC codec design
Figure 3-4 : Example of VLC table and codeword groups
Figure 3-5 : Example of intra-group symbol memory mapping and group information
Figure 3-6 : Process of decoding a symbol
Figure 3-7 : Process of encoding a symbol address
Figure 3-8 : Block diagram of conventional group-based VLC decoder architecture
Figure 3-9 : An example of intra-group memory map and group information
Figure 3-10 : An example of the special case of suffix length
Figure 3-11 : An example of the complete group information
Figure 3-12 : An example of intra-group codeword memory map and group information
Figure 3-13 : CAVLC decoding processes and corresponding examples
Figure 3-14 : The memory usage for conventional symbol memory
Figure 3-15 : MPEG-2 decoding processes and corresponding examples
Figure 3-16 : CAVLC encoding processes and corresponding examples
Figure 3-17 : MPEG-2 encoding processes and corresponding examples
Figure 3-18 : Block diagram of the proposed VLC codec system for MPEG applications
Figure 3-19 : Architecture of codeword group address generator
Figure 3-20 : Architecture of symbol address generator
Figure 3-21 : Architecture of symbol group address generator
Figure 3-22 : Architecture of codeword address generator
Figure 3-23 : Formats of all kinds of memories
Figure 3-24 : The memory usage of each memory
Figure 4-1 : Algorithm of level encoding and decoding
Figure 4-2 : Calculations of level encoding and decoding
Figure 4-3 : Level decoding procedures and the corresponding examples
Figure 4-4 : Level encoding procedures and the corresponding examples
Figure 4-6 : The numerical calculations of run_before encoding and decoding
Figure 4-7 : Architecture of run_before codec system
Figure 4-8 : An example of decoding procedures of CAVLC
Figure 4-9 : The proposed symbols construction for example in Figure 4-8
Figure 4-10 : An example of the proposed symbols construction with zero-skipping
Figure 4-11 : The run_before table mapping to zero run_before

List of Tables

Table 1-1 : The basic profiles of H.264/AVC standard
Table 1-2 : Maximum throughput requirement of H.264/AVC main profile
Table 2-1 : The result of encoding the example in Figure 2-2
Table 2-2 : An example of CAVLC decoding from the result of Table 2-1
Table 2-3 : Hardware cost evaluation of proposed low power design
Table 2-4 : The post layout power consumption under 0.18um CMOS Tech
Table 3-1 : Example of inter-group symbol memory mapping
Table 3-2 : Group information for Table 3-1
Table 3-3 : An example of CAVLC code and codeword grouping
Table 3-4 : An example of Huffman code and codeword grouping in MPEG-2 table B15
Table 3-5 : An example of MPEG-2 encoding symbol groups
Table 3-6 : The symbol groups for MPEG-2 VLC tables
Table 3-7 : The symbol groups of CAVLC coeff_token
Table 3-8 : The codeword groups of MPEG-2
Table 3-9 : The codeword groups of CAVLC coeff_token
Table 3-10 : CAVLC PZGS table and intra-/inter-group symbol memory map
Table 3-11 : CAVLC group information of the coding table shown in Table 3-10
Table 3-12 : An example of MPEG-2 symbol group information
Table 3-13 : MPEG-2 PZGS table and intra-/inter-group symbol memory map
Table 3-14 : An example of MPEG-2 group information of the coding table
Table 3-15 : MPEG-2 symbol groups and group information
Table 3-16 : CAVLC coeff_token symbol groups and group information
Table 4-2 : Throughput improvement of each proposed method, foreman QP = 10
Table 4-3 : Throughput improvement of each proposed method, mobile QP = 28

Chapter 1. Introduction

1.1. Overview of H.264/AVC System

Figure 1-1 : The block diagram of H.264/AVC encoder

Figure 1-1 shows the block diagram of the H.264/AVC encoder. When a frame is input, the encoder first performs prediction, choosing intra or inter prediction according to the frame type. The predicted result is then subtracted from the original input to obtain the residual data, which undergo the discrete cosine transform (DCT) and quantization to reduce the amount of data transmitted. Finally, the entropy encoder encodes the quantized DCT coefficients into a bitstream and sends it out. An additional path reconstructs F’n to serve as the reference for motion estimation (ME); this is the same step that the H.264/AVC decoder uses to generate the reconstructed frame. To obtain the same result in the decoder, the encoder and the decoder must use the same reference. Therefore, F’n-1, not Fn-1, is used as the reference for ME.

Figure 1-2 : The block diagram of H.264/AVC decoder

Figure 1-2 shows the block diagram of the H.264/AVC decoder. The architecture of the decoder is much simpler than that of the encoder, because the encoder must also carry out the decoding process. In the decoder, the input bitstream is first decoded by the entropy decoder, whose output is the quantized DCT coefficients. Through de-quantization and the inverse DCT (IDCT), the residual data are recovered; finally, the residual data are added to the result of motion compensation (MC) or intra prediction to reconstruct one frame.

Table 1-1 shows the three basic profiles of the H.264/AVC standard. Applications of H.264/AVC cover digital storage media, television broadcasting, and real-time communications. For example, the baseline profile targets low-bit-rate applications such as multimedia communication and suits portable multimedia players because of its low computational complexity; the main profile meets the demands of HDTV because it supports interlaced content; the extended profile contains error-resilience tools for IPTV or multimedia on demand (MOD). However, the small block sizes and the fixed quantization matrix in these profiles cannot fully preserve high-frequency image information, so H.264/AVC adds the Fidelity Range Extensions, which contain the high, high 10, high 4:2:2, and high 4:4:4 profiles, built on the main profile, for high-definition multimedia applications.

Coding tool                              Baseline  Main  Extended
I slice                                  ○         ○     ○
P slice                                  ○         ○     ○
CAVLC                                    ○         ○     ○
Slice Group and Adaptive Slice Ordering  ○         -     ○
Redundant Slice                          ○         -     ○
Weighted Prediction                      -         ○     ○
Interlace                                -         ○     -
CABAC                                    -         ○     -
SI and SP slice                          -         -     ○
Data Partition                           -         -     ○
B slice                                  -         ○     ○

Table 1-1 : The basic profiles of H.264/AVC standard

From Table 1-1, we can see that there are two approaches to entropy coding: context adaptive variable length coding (CAVLC) and context adaptive binary arithmetic coding (CABAC). Although CABAC achieves a better compression ratio than CAVLC, its structure is far more complex, which limits its throughput relative to CAVLC. Moreover, CAVLC is available in all three profiles of the H.264/AVC system, which makes it more flexible for different applications. Therefore, we discuss CAVLC further in the following sections.

1.2. CAVLC Algorithm

1.2.1. Huffman Coding

Huffman coding uses a specific method for choosing the representations for each symbol, resulting in a prefix-free code (that is, no bit string of any symbol is a prefix of the bit string of any other symbol) that expresses the most common characters in the shortest way possible. It has been proven that Huffman coding is the most effective compression method of this type; i.e. no other mapping of source symbols to strings of bits will produce a smaller output when the actual symbol frequencies agree with those used to create the code. However, for a set of symbols whose cardinality is a power of two and a uniform probability distribution, Huffman coding is equivalent to simple binary block encoding. A Huffman code can be built in the following manner:

1. Rank all symbols in order of probability of occurrence.

2. Successively combine the two symbols of lowest probability to form a new composite symbol; eventually this builds a binary tree in which each node holds the total probability of all nodes beneath it.

3. Trace a path to each leaf, noting the direction taken at each node, and define the code bit for each tracing direction. For example, a ‘0’ represents following the left child and a ‘1’ represents following the right child.

An example of building a Huffman tree with a binary code is shown in Figure 1-3. There are five symbols, SA, SB, SC, SD, and SE, with occurrence probabilities 0.5, 0.25, 0.125, 0.0625, and 0.0625, respectively. At each step, the two smallest probabilities are grouped together, and their sum replaces them as the probability of the new composite symbol. A branch that traces up is given the binary code 0; otherwise, it is given the binary code 1. Reading the labels (0 or 1) of the branches along the path to each symbol gives its variable length codeword.

Symbol  Probability  Codeword
SA      0.5          1
SB      0.25         01
SC      0.125        001
SD      0.0625       0001
SE      0.0625       0000

Up-tracing is defined as 0. Down-tracing is defined as 1.

Average bits = 0.5x1 + 0.25x2 + 0.125x3 + 0.0625x4 + 0.0625x4 = 1.875

Figure 1-3 : An example of VLC code construction
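The construction steps above can be sketched in Python. This is a minimal illustration, not code from this thesis: the function name `huffman_code_lengths` is hypothetical, and the sketch computes only the codeword lengths, since the actual bit labels depend on the up/down tracing convention chosen.

```python
import heapq
from itertools import count


def huffman_code_lengths(probabilities):
    """Return {symbol: codeword length} for a binary Huffman code."""
    tie = count()  # tie-breaker so the heap never has to compare symbol lists
    heap = [(p, next(tie), [s]) for s, p in probabilities.items()]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probabilities}
    while len(heap) > 1:
        # Combine the two composite symbols of lowest probability.
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for s in group1 + group2:
            lengths[s] += 1  # every merge adds one bit to the symbols below it
        heapq.heappush(heap, (p1 + p2, next(tie), group1 + group2))
    return lengths


# Probabilities of the example in Figure 1-3.
probs = {"SA": 0.5, "SB": 0.25, "SC": 0.125, "SD": 0.0625, "SE": 0.0625}
lengths = huffman_code_lengths(probs)
average_bits = sum(probs[s] * lengths[s] for s in probs)
print(lengths)
print(average_bits)
```

For these probabilities the lengths are 1, 2, 3, 4, and 4 bits, which agrees with the codewords and the average of 1.875 bits listed for Figure 1-3.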

For a given frequency distribution there are many possible Huffman codes, but the total compressed length is the same for all of them, as Figure 1-4 illustrates: the example in Figure 1-3 can also be represented by several alternative binary trees. It is possible to define a ‘canonical’ Huffman tree, that is, to pick one of these alternative trees. Such a canonical tree can then be represented very compactly by transmitting only the bit length of each code. This technique is used in most archivers, such as PKZIP, LHA, ZOO, and ARJ. Huffman coding is optimal when the probability of each input symbol is a negative power of two. Prefix-free codes tend to be slightly inefficient on small alphabets, where probabilities often fall between powers of two. Expanding the alphabet by coalescing multiple symbols into “words” before Huffman coding can help somewhat.


Figure 1-4 : An example of equivalent Huffman trees for Figure 1-3

Encoding symbols into a bitstream is very simple: we concatenate the codewords associated with the symbols. For example, to encode the sequence SA, SB, SE, SD using the lookup table in Figure 1-3, we pick the codewords of SA, SB, SE, and SD, which are 1, 01, 0000, and 0001, and concatenate them into 10100000001. To decode 10100000001 back into symbols, we traverse the binary tree in Figure 1-3 bit by bit from the root through the branches to the leaf nodes. When a leaf node is reached, its symbol is output and the rest of the bitstream is traversed again from the root of the tree. This continues until no bits are left in the bitstream. Traversing the tree in this way decodes 10100000001 into SA, SB, SE, and SD.
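The encoding and decoding of this example can be reproduced with a short Python sketch. It is only an illustration under stated assumptions: `CODE_TABLE`, `encode`, and `decode` are hypothetical names, and the tree traversal is emulated by growing a bit string until it matches a codeword, which is equivalent for a prefix-free code.

```python
# Codewords of Figure 1-3.
CODE_TABLE = {"SA": "1", "SB": "01", "SC": "001", "SD": "0001", "SE": "0000"}


def encode(symbols, table=CODE_TABLE):
    """Concatenate the codewords of the given symbols."""
    return "".join(table[s] for s in symbols)


def decode(bits, table=CODE_TABLE):
    """Decode a bit string by matching one prefix-free codeword at a time."""
    inverse = {code: sym for sym, code in table.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # a leaf of the code tree has been reached
            symbols.append(inverse[current])
            current = ""        # restart from the root
    if current:
        raise ValueError("bitstream ends in the middle of a codeword")
    return symbols


bitstream = encode(["SA", "SB", "SE", "SD"])
print(bitstream)
print(decode(bitstream))
```

Because no codeword is a prefix of another, the first match found while reading bits is always the correct symbol, so no separator between codewords is needed.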


1.2.2. Context Adaptive Variable Length Coding

Huffman coding is widely used in multimedia standards such as the MPEG and JPEG series. CAVLC also adopts Huffman coding, but it adds one technique on top of it. This technique, called “context adaptive”, brings a higher compression ratio than traditional VLC. In the previous section, the occurrence probability of each symbol was calculated over all cases. However, some symbols appear often under some conditions and seldom under others. Therefore, we build a different Huffman code for each condition from the occurrence probabilities of the symbols under that condition. A CAVLC code can be built in the following steps:

1. Separate the different conditions and obtain the occurrence probabilities of all symbols under each condition.

2. Rank all symbols in order of probability of occurrence in each condition.

3. Successively combine the two symbols of lowest probability to form a new composite symbol; eventually this builds a binary tree in which each node holds the total probability of all nodes beneath it.

4. Trace a path to each leaf, noting the direction taken at each node, and define the code bit for each tracing direction. For example, a ‘0’ represents following the left child and a ‘1’ represents following the right child.

Apart from the first step, the steps are the same as for a Huffman code. The purpose of CAVLC is to separate the occurrence probability of each symbol by condition, which yields a better compression ratio than traditional VLC: a more specific description of the probabilities gives higher coding efficiency.

Condition 1

Symbol  Probability  Codeword
SA      0.5          1
SB      0.4          01
SC      0.1          00
SD      0            N.A.
SE      0            N.A.

Condition 2

Symbol  Probability  Codeword
SA      0.5          1
SB      0.1          000
SC      0.15         010
SD      0.125        001
SE      0.125        011

Average bits = 0.5 x (1x0.5 + 2x0.4 + 2x0.1) + 0.5 x (1x0.5 + 3x0.1 + 3x0.15 + 3x0.125 + 3x0.125) = 1.75

Figure 1-5 : An example of CAVLC code construction

Figure 1-5 shows an example of CAVLC code construction. The overall occurrence probabilities of all symbols are the same as in the example of Figure 1-3, and condition 1 and condition 2 each occur with probability 50%. Each symbol has a different occurrence probability under each condition, so we obtain two code tables for mapping the symbols. As mentioned, the only distinction between CAVLC and traditional VLC is the step that separates the conditions; within each condition the Huffman coding process still applies. Different coding approaches are generally compared by their average number of bits. The example of Figure 1-3 gives an average of 1.875 bits, whereas here the average is 1.75 bits. Although CAVLC has a more complex code construction and more VLC tables than traditional VLC, it achieves a clear improvement in compression ratio.
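The comparison of average bits can be checked numerically. The following Python sketch is a minimal illustration; the helper name `average_bits` and the table layout are assumptions made here, while the probabilities and codewords are those of Figure 1-3 and Figure 1-5.

```python
import math


def average_bits(table):
    """table: {symbol: (probability, codeword)}; symbols with probability 0 have no codeword."""
    return sum(p * len(code) for p, code in table.values() if p > 0)


# Single table of Figure 1-3 (traditional VLC).
single = {"SA": (0.5, "1"), "SB": (0.25, "01"), "SC": (0.125, "001"),
          "SD": (0.0625, "0001"), "SE": (0.0625, "0000")}

# Two tables of Figure 1-5 (one per condition).
cond1 = {"SA": (0.5, "1"), "SB": (0.4, "01"), "SC": (0.1, "00"),
         "SD": (0.0, None), "SE": (0.0, None)}
cond2 = {"SA": (0.5, "1"), "SB": (0.1, "000"), "SC": (0.15, "010"),
         "SD": (0.125, "001"), "SE": (0.125, "011")}
conditions = [(0.5, cond1), (0.5, cond2)]  # each condition occurs 50% of the time

vlc_avg = average_bits(single)
cavlc_avg = sum(weight * average_bits(table) for weight, table in conditions)

# The per-condition probabilities add up to the overall ones of Figure 1-3.
for symbol, (p, _) in single.items():
    marginal = sum(weight * table[symbol][0] for weight, table in conditions)
    assert math.isclose(marginal, p)

print(vlc_avg, cavlc_avg)
```

The loop confirms that the two conditions describe the same source as Figure 1-3, so the difference in average bits comes only from conditioning the code tables.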

1.3. Designs of CAVLC Encoders and Decoders

1.3.1. Designs of CAVLC Encoders

CAVLC is a lossless code, so the design of a CAVLC encoder cannot change the quality of a frame. The design of a CAVLC encoder therefore targets performance, namely throughput and hardware cost. Table 1-2 shows the maximum throughput requirement of the H.264/AVC main profile. Level denotes the tier within each profile, and the levels considered for the H.264/AVC main profile range from 4 to 5.1. Level 4 is the basic demand of the main profile and supports HD 1080i at a frame rate of 30 frames per second. From Figure 1-1 we can observe that the encoding speed of the entropy encoder greatly affects the throughput of the entire system. For this reason, existing papers on CAVLC encoders concentrate on the throughput problem.


Level 4 4.1 4.2/Lo 4.2/Hi 5 5.1

MB/sec 245760 245760 491520 522240 589824 983040

Table 1-2 : Maximum throughput requirement of H.264/AVC main profile

The two major parts of a CAVLC encoder are coefficient scanning and symbol encoding. The direct approach to designing a CAVLC encoder is to input a set of coefficients and perform the encoding steps serially; repeating these steps yields the wanted results. However, a block contains up to 16 input coefficients, and symbol encoding has five steps, each of which needs at least one cycle. If everything is done serially, the throughput of this simple CAVLC encoder cannot meet the requirement.

One way to solve this problem is to perform coefficient scanning and symbol encoding in parallel [1], because there is no dependency between encoding the symbols of one block and scanning the coefficients of the following block. Executing these two steps in parallel therefore improves the encoding throughput.
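The benefit of overlapping the two steps can be estimated with a simple cycle-count model. The Python sketch below is an assumption-laden illustration, not the design of [1]: the function names are hypothetical, the cycle counts are invented for the example, and the two stages are assumed to advance in lockstep, so each stage waits for the slower one before moving to the next block.

```python
def serial_cycles(scan, encode):
    """Cycles when every block is scanned and then encoded, one after another."""
    return sum(scan) + sum(encode)


def pipelined_cycles(scan, encode):
    """Cycles when scanning block i+1 overlaps with encoding block i."""
    if not scan:
        return 0
    total = scan[0]  # the first block must be scanned before anything is encoded
    for i in range(len(scan) - 1):
        # Encoding block i and scanning block i+1 run at the same time.
        total += max(encode[i], scan[i + 1])
    return total + encode[-1]  # the last block is encoded with nothing left to scan


# Hypothetical per-block cycle counts for three 4x4 blocks.
scan_cycles = [16, 16, 16]
encode_cycles = [10, 20, 10]
print(serial_cycles(scan_cycles, encode_cycles))
print(pipelined_cycles(scan_cycles, encode_cycles))
```

In this model the pipelined schedule is bounded by the slower of the two stages for each block, which is why reducing the cycles of symbol encoding is the natural next optimization.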

Another way is to reduce the number of cycles spent on symbol encoding, because each encoding step often takes multiple cycles. If multiple inputs are sent to one step, that step can encode them in a single cycle [2]. This method achieves better performance than the previous one.


1.3.2. Designs of CAVLC Decoders

CAVLC decoders have been discussed more than encoders, because a CAVLC decoder has to handle all of the bitstream transmitted from the H.264/AVC encoder. Large data variation results in high power consumption, so power saving is an important issue for CAVLC decoders. Another major issue is throughput; Table 1-2 shows the throughput requirement of the H.264/AVC main profile. Because the input bitstream of a CAVLC decoder depends on previously decoded information, extra effort is needed to accelerate the decoding speed.

The major part of a CAVLC decoder is again the VLC tables, and most papers realize those tables with a finite state machine (FSM). The FSM is built according to the codewords in the VLC tables, and looking up these tables yields the decoded symbols [3]. However, building the FSM directly from the codewords of the VLC tables is inefficient in hardware cost and throughput, so the size of the FSM has to be reduced. By separating the VLC tables according to codeword length and looking up the divided tables serially, the FSM can be built with the same number of entries as the VLC tables [4]. This approach lowers the hardware cost and improves the throughput enough to support level 4.1. Techniques such as zero-skipping and multi-symbol decoding improve the throughput further [5]. The above papers do not discuss power consumption. With a good table partition that controls table switching, power consumption can be reduced significantly. Many papers have proposed other ways to realize VLC, such as RAM-based methods [6], [7], but existing papers on CAVLC decoders use only ROM-based methods. More approaches to designing CAVLC encoders and decoders therefore remain to be tried.


1.4. Motivation

Recently, human life has been changed greatly by multimedia applications such as cellular phones, digital cameras, DVD, and digital television. New technologies such as high-definition television (HDTV), Blu-ray Disc (BD), and high-definition DVD (HD DVD) are appearing and will become popular in the future. A novel video compression standard, H.264/AVC, can be adopted for these uses because of its high compression ratio.

... design a decoder for mobile devices, the most important thing is power reduction. The advent of H.264/AVC provides a high compression ratio, but it is not backward compatible with the prevalent MPEG-x and H.26x video coding standards. MPEG-2 and H.264/AVC processors have been reported at ISSCC. However, these solutions used separate modules, each processing only a single type of video content. To support different system requirements such as DVB-H or HD-DVD, a scalable pipeline is exploited to integrate both MPEG-2 and H.264/AVC efficiently in a single chip. In addition, we believe that a table partition different from those mentioned above, combined with other approaches, can reduce power further. Therefore, we propose a VLD with a new table partition and realize some tables with an arithmetic method.

Furthermore, when our entire system [8], [9] has to provide higher throughput for applications such as HD 1080, the entropy decoder cannot meet the throughput requirements of the H.264/AVC main profile. We need a VLD that supports MPEG-2 and H.264/AVC with sufficient throughput, ideally one that can be integrated with the context adaptive binary arithmetic decoder (CABAD). CABAD uses a large amount of SRAM for its context model, which suggests a direction for integrating these two entropy decoders. The three decoders in our system, CAVLD, MPEG-2 VLD, and CABAD, do not work at the same time, so we make the SRAM programmable. This approach has been proposed before [6], [7], but the way the groups are divided there is not efficient in memory usage and group mapping. From Figure 1-1 and Figure 1-2, we can see that the H.264/AVC encoder already contains most parts of the H.264/AVC decoder; adding an entropy decoder to the decoding part of the encoder yields a complete H.264/AVC decoder. Therefore, we propose a new group-based VLC codec system with efficient coding and zero-skipping to improve throughput and memory usage.

1.5. Organization of This Thesis

In this thesis, we propose a new low power, table partition VLD for dual standards; a new group-based, high throughput VLC codec system with full programmability for dual standards; and a new soft VLD to handle the error resilience problem. The thesis is organized as follows. Chapter 2 presents an overview of CAVLC and the new low power, table partition VLD for dual standards. Chapter 3 describes the algorithm and architectures of the proposed group-based, high throughput VLC codec system with full programmability for dual standards. Chapter 4 introduces the proposed error resilient CAVLD. Finally, Chapter 5 gives the conclusions and future work.

Chapter 2. A Low Power VLC decoder design

2.1. Overview of CAVLC Encoder and Decoder

2.1.1. Encoding Process Flow

Figure 2-1 : The encoding process flow of CAVLC

Figure 2-1 shows the encoding process flow; the detailed steps are as follows.

1. When a 2x2 or 4x4 block is received, the coefficient-scanning procedure records the symbols to be encoded. There are six symbols: TotalCoeff, TrailingOnes, trailing_ones_sign_flag, level, total_zeros, and run_before. TotalCoeff is the total number of non-zero coefficients; TrailingOnes is the number of trailing +/-1 coefficients, and its value must be smaller than four; level is the value of a non-zero coefficient; total_zeros is the number of all zeros before the last non-zero coefficient in zigzag-scan order; run_before is the number of zeros immediately before each non-zero coefficient in zigzag-scan order. Figure 2-2 shows the results derived by the coefficient-scanning procedure.

Figure 2-2 : An example of CAVLC coefficients scanning

2. Encode TotalCoeff and TrailingOnes (coeff_token). There are five look-up tables to choose from for encoding coeff_token. The choice of table depends on a variable named nC; Figure 2-3 shows how the value of nC is calculated.

Figure 2-3 : How to calculate the value of nC

3. Encode the sign of each trailing one in reverse order.

4. Encode each level in reverse order. There are 7 VLC tables to choose from, Level_VLC0 to Level_VLC6.

5. Encode total_zeros.

6. Encode run_before.

Table 2-1 lists the result of encoding the example in Figure 2-2 and the transmitted bitstream for this block is 000010001110010111101101.

Element         Value                             Code
coeff_token     TotalCoeff = 5, TrailingOnes = 3  0000100
T1 sign (4)     +                                 0
T1 sign (3)     -                                 1
T1 sign (2)     -                                 1
Level (1)       +1                                1
Level (0)       +3                                0010
total_zeros     3                                 111
run_before (4)  zerosLeft = 3; run_before = 1     10
run_before (3)  zerosLeft = 2; run_before = 0     1
run_before (2)  zerosLeft = 2; run_before = 0     1
run_before (1)  zerosLeft = 2; run_before = 1     01
run_before (0)  zerosLeft = 1; run_before = 1     No code required; last coefficient

Table 2-1 : The result of encoding the example in Figure 2-2
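The coefficient-scanning step that produces the values in Table 2-1 can be sketched in Python. This is a minimal illustration rather than the hardware design of this thesis: the function name `scan_symbols` is hypothetical, and the block is assumed to be the zigzag-ordered sequence 0, 3, 0, 1, -1, -1, 0, 1 followed by zeros. The sketch extracts only the symbols; mapping them to codewords needs the VLC tables of the standard and is not shown.

```python
def scan_symbols(coeffs):
    """Extract the CAVLC symbols from coefficients given in zigzag-scan order."""
    nonzero_pos = [i for i, c in enumerate(coeffs) if c != 0]
    total_coeff = len(nonzero_pos)
    # Non-zero values in reverse scan order (highest frequency first).
    reverse_values = [coeffs[i] for i in reversed(nonzero_pos)]

    # Count trailing +/-1 coefficients, at most three.
    trailing_ones = 0
    for value in reverse_values:
        if abs(value) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break

    t1_signs = ["+" if v > 0 else "-" for v in reverse_values[:trailing_ones]]
    levels = reverse_values[trailing_ones:]
    # Zeros located before the last non-zero coefficient.
    total_zeros = nonzero_pos[-1] + 1 - total_coeff if nonzero_pos else 0

    # Zeros immediately before each non-zero coefficient, in reverse order.
    runs, previous = [], -1
    for pos in nonzero_pos:
        runs.append(pos - previous - 1)
        previous = pos
    run_before = runs[::-1]

    return {"TotalCoeff": total_coeff, "TrailingOnes": trailing_ones,
            "T1_signs": t1_signs, "levels": levels,
            "total_zeros": total_zeros, "run_before": run_before}


block = [0, 3, 0, 1, -1, -1, 0, 1] + [0] * 8  # assumed 4x4 block in zigzag order
symbols = scan_symbols(block)
print(symbols)
```

The last entry of `run_before` corresponds to run_before (0) in Table 2-1, which needs no codeword because it equals the remaining zerosLeft.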


2.1.2. Decoding Process Flow

Figure 2-4 : The decoding process of CAVLC

Figure 2-4 shows the decoding process flow of CAVLC; the decoding procedures are similar to the encoding steps. The only differences are that the decoding process performs no coefficient scanning and that each remaining step decodes the bitstream instead of encoding symbols. Table 2-2 shows an example of CAVLC decoding; the final output array is 0, 3, 0, 1, -1, -1, 0, 1.

Code     Element      Value                             Output array
0000100  coeff_token  TotalCoeff = 5, TrailingOnes = 3  Empty
0        T1 sign      +                                 1
1        T1 sign      -                                 -1,1
1        T1 sign      -                                 -1,-1,1
1        level        +1                                1,-1,-1,1
0010     level        +3                                3,1,-1,-1,1
111      total_zeros  3                                 3,1,-1,-1,1
10       run_before   1                                 3,1,-1,-1,0,1
1        run_before   0                                 3,1,-1,-1,0,1
1        run_before   0                                 3,1,-1,-1,0,1
01       run_before   1                                 3,0,1,-1,-1,0,1

Table 2-2 : An example of CAVLC decoding from the result of Table 2-1
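The way the decoded symbols of Table 2-2 are placed back into the coefficient array can be sketched in Python. This is a minimal illustration under stated assumptions: the function name `rebuild_coefficients` is hypothetical, the symbols are assumed to be already decoded from the bitstream, and the run of the last coefficient is inferred from the remaining zeros because no codeword is sent for it.

```python
def rebuild_coefficients(t1_signs, levels, total_zeros, run_before, size=16):
    """Rebuild the zigzag-ordered coefficient array from decoded CAVLC symbols."""
    # Non-zero values in reverse scan order: trailing ones first, then levels.
    values = [1 if s == "+" else -1 for s in t1_signs] + list(levels)
    coeffs = [0] * size
    if not values:
        return coeffs

    pos = len(values) + total_zeros - 1  # index of the last non-zero coefficient
    zeros_left = total_zeros
    runs = iter(run_before)              # only the run_before values that were coded
    for i, value in enumerate(values):
        coeffs[pos] = value
        if i == len(values) - 1:
            run = zeros_left             # last coefficient: remaining zeros, not coded
        elif zeros_left > 0:
            run = next(runs)
        else:
            run = 0                      # no zeros left, so nothing was coded
        zeros_left -= run
        pos -= run + 1
    return coeffs


decoded = rebuild_coefficients(["+", "-", "-"], [1, 3], 3, [1, 0, 0, 1])
print(decoded[:8])
```

Here the four coded run_before values are 1, 0, 0, and 1, as in Table 2-2, and the leading zero of the output array follows from the one zero that is still left when the last coefficient is placed.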


2.2. Overview of the Proposed Architecture

Figure 2-5 shows the functional diagram of the proposed CAVLC decoder architecture. As introduced in Section 2.1.2, decoding the symbols involves five major parts. To support MPEG-2 VLC decoding, the MPEG-2 VLC tables are built into the coeff_token part, because the two decoding procedures work in a similar manner; this part is described in a later section. The prefix-zero buffer and the bitstream buffer are used for table partition and for table realization with the arithmetic method. The coeffNum unit calculates the position in the coefficient buffer of the level currently held in the level buffer. To reduce power, all function units are controlled by enable signals, because they do not all work at the same time. There is also a hold signal for the prefix-zero buffer, which prevents it from counting zeros that do not belong to the prefix zeros. A function unit without an enable or hold signal would dissipate power unnecessarily.

Figure 2-5 : Overview of the proposed low power architecture

(Blocks shown: bitstream buffer [11:0]; prefix-zero buffer; coeff_token & MPEG-2 VLC tables; trailing_ones_sign_flag; level; total_zeros; run_before; controller; level buffers; run buffers; coefficients buffers; coeffNum. Signals shown: is_cavlc, MPEG-2, is_b14, is_b15, maxNumCoeff, hold_prefix0, Enable.)

2.3. Table Partition

In VLSI design, an efficient way to reduce dynamic power consumption is to decrease data switching. However, most CAVLC decoder designs use an FSM to look up the VLC tables. When the input bitstream that accesses the look-up table changes frequently, much power is dissipated. Moreover, a change in a large look-up table dissipates more power than the same change in a small one. A good table partition therefore reduces both the size of the look-up table and the data switching, and so decreases power consumption.

Figure 2-6 : An example of proposed table partition

Figure 2-6 shows an example of the proposed table partition. Although the original codeword table has only 10 entries, the longest length of the codeword is 6, so we have to build a look-up table with 32 entries for this codeword table by FSM method. That is, the longest length of the codeword dominates the entries of the codeword table not the real entries. However, if we adopt the proposed table partition method to build the look-up table, the entries of the first time to access the table are 4, and other entries are equal to the relative suffix entries. Because this approach divide the tables according to the prefix zeros, we call it prefix-zero table partition (PZTP).


When we access the look-up table with PZTP, the number of entries searched in each cycle is much smaller than the number of entries in the original table. The longer the longest codeword is, the greater the difference between the entries searched with PZTP and the original entries.

The way to build the look-up table is as follows:

1. Build the first layer of the look-up table according to the leading zeros, which we call the prefix (the prefix item in Figure 2-6).

2. Build the second layer of the look-up table from the suffix, which is the codeword without the leading zeros and the first 1.

The steps to look up the VLC tables are as follows:

1. Count the leading zeros until the first 1 appears, and choose the corresponding suffix table by the prefix.

2. Look up the suffix table with the input bitstream to find the required symbols.
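The two look-up steps above can be sketched in software as follows. This is a minimal illustration rather than the proposed hardware: the table contents, the symbol names "A" to "I", and the function names are hypothetical, and the bitstream is modeled as a string of '0' and '1' characters.

```python
# Hypothetical toy VLC table, partitioned by the number of leading zeros (prefix).
# First layer: prefix -> (suffix length, second-layer suffix table).
PZTP_TABLE = {
    0: (0, {"": "A"}),
    1: (1, {"0": "B", "1": "C"}),
    2: (1, {"0": "D", "1": "E"}),
    3: (2, {"00": "F", "01": "G", "10": "H", "11": "I"}),
}


def decode_one(bits, pos=0):
    """Decode one symbol starting at bits[pos]; return (symbol, next position)."""
    # Step 1: count the leading zeros until the first 1 appears.
    prefix = 0
    while bits[pos + prefix] == "0":
        prefix += 1
    pos += prefix + 1  # skip the zeros and the first 1
    # Step 2: the prefix selects one suffix table; only that table is searched.
    suffix_len, suffix_table = PZTP_TABLE[prefix]
    suffix = bits[pos:pos + suffix_len]
    return suffix_table[suffix], pos + suffix_len


def decode_all(bits):
    """Decode a whole string of concatenated codewords."""
    pos, symbols = 0, []
    while pos < len(bits):
        symbol, pos = decode_one(bits, pos)
        symbols.append(symbol)
    return symbols
```

In this toy table the largest suffix table has 4 entries, so no access searches more than 4 entries even though the longest codeword is 6 bits long.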

Figure 2-7 : The PZTP VLD architecture of coeff_token (blocks: prefix zeros decoder, TotalCoeff decoder & Table B14 or B15 selector; enable signals: is_cavlc, MPEG-2)

Figure 2-7 shows the PZTP VLC decoder (VLD) architecture of coeff_token. Five tables belong to CAVLD (NUM_VLC0, NUM_VLC1, NUM_VLC2, NUM_VLC3, and NUM_FLC), and the other two tables, Table B14 and Table B15, belong to MPEG-2 VLD. The implementation of NUM_FLC is introduced in the next section. If neither of the two enable signals, is_cavlc and MPEG-2, is active, the entire PZTP VLD is shut down to avoid the dynamic power dissipation caused by data switching. If either of them is active, the controller (TotalCoeff Decoder & Table B14 or B15 Selector) opens only one of those tables to save power. The two signals must never be active at the same time.

Assume that we are executing H.264/AVC decoding. Even if the current decoding procedure is coeff_token, the enable signal is_cavlc is not active at the beginning. To avoid unnecessary power consumption, we activate the enable signal only when we receive the first one of the codeword, that is, the boundary of the prefix. Therefore, only the accumulator consumes power while the prefix is being received. MPEG-2 VLD works the same way.

As shown in Figure 2-5, we store the value of the prefix in the prefix-zero buffer. When we begin to receive the suffix of the codeword and look up the suffix table, the value of the prefix is fixed. Therefore, the output of the prefix zeros decoder in Figure 2-7 can serve as the enable signal of the corresponding suffix table while that table is being searched. At this time, the number of entries searched in the entire codeword table is equal to the number of entries of the suffix table. The largest suffix table has 8 entries for coeff_token and 16 entries for MPEG-2 VLD.

PZTP takes advantage of the structure of Huffman coding to decrease both the data switching when accessing the look-up table and the hardware cost of the VLC tables. It is also easy to implement, so total_zeros and run_before adopt the same method in the proposed CAVLD.


2.4. Table Realization with Arithmetic Method


2.4.1. NUM_FLC of coeff_token

Every codeword in this look-up table is 6 bits long, and the table has 62 entries in total. Building the table with the FSM method seems reasonable at first. However, if we analyze the relationship between the codewords and the symbols, we find some arithmetic rules.

Figure 2-8 : An example of NUM_FLC

Figure 2-8 shows an example of NUM_FLC. The left table is the original NUM_FLC table, and the right table is derived by separating the codeword into two fields. Except for the first row, the following arithmetic relationship holds for every entry of NUM_FLC. The first row is the only exception, and it is also the only codeword whose prefix (number of leading zeros) is 4, so it is easy to detect.

\[
\text{TotalCoeff} = \text{codeword}[5{:}2] + 1, \qquad \text{TrailingOnes} = \text{codeword}[1{:}0]
\]

Figure 2-9 shows the proposed architecture of NUM_FLC. To save power, we access this part only when we receive the sixth bit of the codeword. With this method, the look-up table is replaced by simple arithmetic logic, which greatly reduces the hardware cost and the power consumption.
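The arithmetic rule can be checked with a short software sketch. The function name is hypothetical, and the handling of the exceptional first row (codeword 000011 mapped to TotalCoeff = 0 and TrailingOnes = 0) is taken from the coeff_token table of H.264/AVC for the fixed-length case; the sketch does not reject the codewords that the table leaves unused.

```python
def decode_num_flc(codeword):
    """Map a 6-bit NUM_FLC codeword (integer 0..63) to (TotalCoeff, TrailingOnes)."""
    if not 0 <= codeword < 64:
        raise ValueError("NUM_FLC codewords are 6 bits long")
    if codeword == 0b000011:
        # The single row that does not follow the arithmetic rule.
        return 0, 0
    total_coeff = (codeword >> 2) + 1   # codeword[5:2] + 1
    trailing_ones = codeword & 0b11     # codeword[1:0]
    return total_coeff, trailing_ones
```

One comparison, one increment, and two bit selections replace the 62-entry table, which mirrors the multiplexer and adder structure described for Figure 2-9.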


Figure 2-9 : The architecture of proposed NUM_FLC


2.4.2. Level Decoding

Level coding is built on seven VLC tables, VLC0 to VLC6. However, implementing the level decoder with VLC tables costs much hardware and power, because the longest codeword is 28 bits long: a 16-bit prefix and a 12-bit suffix. Even if we use PZTP to construct the VLC tables of the level decoder, they are still huge. To meet the low-power demand, we have to realize the level decoder in another way, and here we implement it with the arithmetic approach whose algorithm is specified in [10].

Figure 2-10 shows the algorithm of level decoding. In effect, suffixLength decides which VLC table is chosen. According to this algorithm, if we pipeline the level decoding and the suffixLength calculation well, we can decode levels with the minimum number of function units. As a result, we obtain good power and hardware cost.

suffixLength (initial value):
    if (TotalCoeff > 10 && TrailingOnes < 3) suffixLength = 1
    else suffixLength = 0

level decoding:
    level_prefix = leading 0s
    if (level_prefix == 15) levelSuffixSize = 12
    else if (level_prefix == 14 && suffixLength == 0) levelSuffixSize = 4
    else levelSuffixSize = suffixLength
    level_suffix = bitstream [levelSuffixSize-1 : 0]

    levelCode = (level_prefix << suffixLength)
    if (suffixLength > 0 || level_prefix >= 14)
        levelCode += level_suffix
    if (level_prefix == 15 && suffixLength == 0) levelCode += 15
    if (first_level && TrailingOnes < 3) levelCode += 2
    if (levelCode % 2 == 0)
        level = (levelCode + 2) >> 1
    else
        level = (-levelCode - 1) >> 1

suffixLength (update after each decoded level):
    if (suffixLength == 0) suffixLength = 1
    if (|level| > (3 << (suffixLength - 1)) && suffixLength < 6) suffixLength++

Figure 2-10 : Algorithm of level decoding
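The algorithm of Figure 2-10 can be written as a runnable sketch. It follows only the steps listed in the figure and does not try to cover any other provision of the standard; the function name and the modeling of the bitstream as a string of '0' and '1' characters are assumptions made for illustration.

```python
def decode_levels(bits, total_coeff, trailing_ones):
    """Decode the levels that follow the trailing ones of one block.

    bits is a '0'/'1' string starting at the first level codeword.
    Returns (levels, number of bits consumed).
    """
    pos = 0
    suffix_length = 1 if (total_coeff > 10 and trailing_ones < 3) else 0
    levels = []
    for i in range(total_coeff - trailing_ones):
        # level_prefix: number of leading zeros before the first 1.
        level_prefix = 0
        while bits[pos] == "0":
            level_prefix += 1
            pos += 1
        pos += 1  # consume the terminating 1

        if level_prefix == 15:
            level_suffix_size = 12
        elif level_prefix == 14 and suffix_length == 0:
            level_suffix_size = 4
        else:
            level_suffix_size = suffix_length
        level_suffix = 0
        if level_suffix_size:
            level_suffix = int(bits[pos:pos + level_suffix_size], 2)
        pos += level_suffix_size

        level_code = level_prefix << suffix_length
        if suffix_length > 0 or level_prefix >= 14:
            level_code += level_suffix
        if level_prefix == 15 and suffix_length == 0:
            level_code += 15
        if i == 0 and trailing_ones < 3:  # first_level
            level_code += 2

        if level_code % 2 == 0:
            level = (level_code + 2) >> 1
        else:
            level = (-level_code - 1) >> 1
        levels.append(level)

        # Update suffixLength for the next level.
        if suffix_length == 0:
            suffix_length = 1
        if abs(level) > (3 << (suffix_length - 1)) and suffix_length < 6:
            suffix_length += 1
    return levels, pos
```

For a block with TotalCoeff = 5 and TrailingOnes = 3, the bit string "10010" decodes to the two levels +1 and +3: the first codeword "1" is decoded with suffixLength 0 and the second codeword "0010" with suffixLength 1.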


Figure 2-11 shows the proposed architecture of level decoding. It has two major parts: the left part calculates suffixLength and the right part decodes the level codeword. The gray rectangles represent registers. The level_prefix buffer is 10 bits wide, the bitstream buffer, which is shared by all modules, is 12 bits wide, and suffixLength needs 3 bits. The level_prefix is the number of leading zeros produced by the leading-zeros counter shown in Figure 2-5, which is shared by four decoding modules: coeff_token, level, total_zeros, and run_before. The barrel shifter that rearranges level_prefix works only when we receive the first one of a level codeword. It also handles the special case in which level_prefix is 15 and suffixLength is 0, so no separate adder is needed to add 15 to levelCode; this shortens the critical path of level decoding and reduces the hardware cost. The whole level decoding architecture is controlled by an enable signal that turns it off while another procedure is executing. The inverter performs the step (-levelCode - 1): in 2's complement, -levelCode is equal to (~levelCode + 1), so -levelCode - 1 is equal to (~levelCode + 1 - 1), that is, ~levelCode.
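The inverter identity can be confirmed numerically. The short check below is only an illustration with a hypothetical function name; it relies on Python integers behaving like two's complement numbers under the ~ and >> operators.

```python
def negative_level(level_code):
    """Odd-levelCode branch, (-levelCode - 1) >> 1, computed with an inverter."""
    return (~level_code) >> 1


# The identity ~x == -x - 1 holds for every odd levelCode tried here.
for level_code in range(1, 64, 2):
    assert ~level_code == -level_code - 1
    assert negative_level(level_code) == (-level_code - 1) >> 1
```

Replacing the negation and subtraction by a bitwise inversion removes an adder from the path that produces negative levels.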

The part that calculates suffixLength is needed even if level decoding is implemented with look-up tables. As mentioned above, table searching depends on suffixLength to choose the correct VLC table, so this part cannot be omitted in any approach to level decoding. Therefore, our contribution is to replace the VLC tables with the arithmetic method, which removes the large tables described above.


2.5. Summary

Figure 2-12 : The throughput of foreman.yuv with the proposed VLD


Figure 2-12 and Figure 2-13 show the throughput of two test sequences with the proposed VLD. The simulation environment is JM 9.2, the C reference software of H.264/AVC. We use nine different QP values to obtain the simulation results. In the two figures, the blue line is the throughput requirement of baseline@3.1 specified in the H.264/AVC standard at a clock frequency of 100 MHz, and the black line is the requirement of baseline@3.2. In Figure 2-12, the throughput of foreman meets the requirement of baseline@3.2 when QP is 20, and the I-frames of the same sequence also meet it when QP is 28. In Figure 2-13, the throughput of mobile meets the requirement when QP is 28. Therefore, the proposed design can support the H.264/AVC baseline profile.

| | [3] | [4] | Proposed Design |
|---|---|---|---|
| Tech. | 0.25 um | 0.18 um | 0.18 um |
| Gate-count | 6100 | 4720 | CAVLC : 3267, MPEG2 : 945 |
| Target Spec. | Baseline Profile | Main Profile @4.1 | Main Profile @4.2 & MPEG-2 |
| Buffer | N.A. | 696 bits RAM | 3471 gate-count |
| Clock Constraint | 125 MHz | 125 MHz | 125 MHz |

Table 2-3 : Hardware cost evaluation of proposed low power design

Table 2-3 compares the hardware cost. Although Figure 2-12 and Figure 2-13 show the throughput of the two sequences at a clock frequency of 100 MHz, the maximum speed of the proposed design is 180 MHz in a 0.18 um CMOS technology. This is fast enough to meet the real-time processing requirement of CAVLC decoding for main profile @4.2. Compared with the design proposed in [4], the CAVLC part of the proposed design reduces the hardware cost by about 30% (3267 versus 4720 gates), and the total design including MPEG-2 (4212 gates) still costs less. To save power, the proposed design does not use RAM as storage.


| Spec. | MPEG-2 I-frame | H.264 I-frame | H.264 P-frame |
|---|---|---|---|
| power (mW) | 1.719 | 1.302 | 1.376 |

Table 2-4 : The post layout power consumption under 0.18um CMOS Tech.

Table 2-4 shows the post-layout power consumption in 0.18 um CMOS technology. The proposed design consumes less than 2 mW in all three cases, and it is used in our dual-standard system [8], [9].


Chapter 3. A VLC Codec System for Dual Standards


Figure 3-1 : The architecture of our proposed system

Figure 3-1 shows the architecture of our proposed system for the H.264/AVC main profile. The entropy decoder contains CABAD, UVLD, and CAVLD. UVLD and CAVLD belong to the same entropy-decoding option: UVLD decodes the syntax elements for the parser, and CAVLD decodes the residual data. Therefore, the output of UVLD controls the decoding mode of the H.264/AVC decoder, and the results of CAVLD are the DCT coefficients of the residual data. After the IDCT, the data are added to the predicted data to complete a unit block.

In Figure 3-1, CABAD has to use the slice memory to store the context model and the row storage. Figure 3-2 shows the memory usage of CABAD in our proposed H.264/AVC decoder system. The context model of CABAD uses 349.1 bytes of the slice memory.


Figure 3-2 : The usage of memory of CABAD in our proposed H.264/AVC decoder

The context model of CABAD uses much memory, which suggests integrating CABAD and CAVLD: the same memory can provide space to store the VLC tables of CAVLD. In addition, our proposed H.264/AVC decoder receives the bitstream as a parallel input, so we have to find another approach to implement CAVLD. Besides, as mentioned in the motivation, if we add the CAVLC encoder to the entropy decoder, it can be integrated with an H.264/AVC encoder into an H.264/AVC codec system. Therefore, we looked for a method to implement a memory-based VLC codec system, and finally we propose a new group-based VLC codec system with reference to [6] and [7].

3.1. The Architecture of the Proposed VLC Codec System

Here, we describe the architecture of the proposed VLC codec system. We focus on the design of the CAVLC encoder/decoder and do not describe the MPEG-2 VLC codec in detail, because the major difference of the proposed MPEG-2 VLC codec lies in the group-based algorithm and its hardware implementation, while its other parts are basically similar to a conventional VLC codec design. Therefore, for the MPEG-2 VLC codec system we discuss only the proposed group-based changes, and we concentrate on the CAVLC encoder/decoder.

(Figure 3-3 diagram. Blocks: codeword boundary detector, coefficients scanner, group-based VLC codec system, Trailing_ones_sign_flag, Level codec with efficient coding, Total_zeros codec with efficient coding, Run_before codec with efficient coding, symbols constructor, bitstream concatenater. Signals: bitstream (8), maxNum (4), nC (4), coefficients (16), is_decoding, is_encoding, FIFO_full, leading 0s, suffix, enable, suffixLength, TC & T1s, T1s sign flags, levels, Total_zeros, Run_before, Run_skip, MPEG-2 levels and runs, MPEG-2 sign.)

Figure 3-3 : Block diagram of the proposed VLC codec design

The block diagram of the proposed VLC codec design is shown in Figure 3-3. To fit the specification of our proposed H.264/AVC decoder system, the input bitstream is an 8-bit parallel input. The decoder is controlled by the enable signal is_decoding, and the encoder by is_encoding. The maxNum signal determines the type of block being decoded or encoded, and nC, introduced in Section 2.1, chooses the correct VLC table for coeff_token. The serial input, coefficients, carries the DCT coefficients for the encoder in reverse order. The codeword boundary detector has a FIFO to store the input bitstream, and its output signal FIFO_full indicates whether the bitstream FIFO is full. The symbols constructor sends out the arranged DCT coefficients, and the bitstream concatenater links the encoded codewords. The components are explained as follows.

The major functions of the codeword boundary detector are counting the leading ones and zeros, and fetching the required suffix for each decoding function unit according to the recorded bitstream boundary. It also acts as a controller that decides the activity of each decoding component, and it calculates the number of skipped run_before symbols and sends this information to the symbols constructor. For MPEG-2 VLC, it has to detect special cases such as escape mode and end of block.

After the coefficients scanner receives the serial input data, the DCT coefficients, it calculates and sends the necessary data to each encoding component. In MPEG-2 VLC encoding, it only counts the levels and runs; after sending an MPEG-2 level and run, it can receive the following coefficients. CAVLC encoding needs more information, so this unit has to calculate TotalCoeff, TrailingOnes, the T1s sign flags, the levels, and the run_befores. Unlike in the MPEG-2 process, the coefficients scanner has to receive all coefficients of one block before it can request the coefficients of the next block to encode.
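The quantities that the coefficients scanner derives for CAVLC encoding can be sketched as follows. This is only a software illustration: the function name is hypothetical, and the block is passed in forward scan order and reversed inside the function, whereas the hardware receives the coefficients serially in reverse order.

```python
def scan_block(coeffs):
    """Derive the CAVLC symbols of one block from its coefficients in scan order.

    Returns (TotalCoeff, TrailingOnes, T1s sign flags, levels, total_zeros,
    run_befores); all lists are in reverse scan order (highest frequency first).
    """
    nonzero_pos = [i for i, c in enumerate(coeffs) if c != 0]
    total_coeff = len(nonzero_pos)
    rev = [coeffs[i] for i in reversed(nonzero_pos)]

    # TrailingOnes: up to three consecutive +/-1 values at the high-frequency end.
    trailing_ones = 0
    for c in rev[:3]:
        if abs(c) != 1:
            break
        trailing_ones += 1
    t1_signs = [1 if c < 0 else 0 for c in rev[:trailing_ones]]  # 1 means negative
    levels = rev[trailing_ones:]

    # total_zeros: zeros located before the last nonzero coefficient.
    total_zeros = nonzero_pos[-1] + 1 - total_coeff if nonzero_pos else 0

    # run_before: zeros immediately preceding each nonzero coefficient.
    run_befores = []
    prev = -1
    for p in nonzero_pos:
        run_befores.append(p - prev - 1)
        prev = p
    run_befores.reverse()
    return total_coeff, trailing_ones, t1_signs, levels, total_zeros, run_befores
```

Because total_zeros and the run_befores depend on the position of every nonzero coefficient, the whole block has to be seen before the symbols are complete, which matches the behavior described above for the CAVLC case.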

The group-based VLC codec system uses the proposed group-based VLC codec algorithm to implement the MPEG-2 and CAVLC coeff_token encoder/decoder. It also contains the NUM_FLC of CAVLC coeff_token and the MPEG-2 escape case. The detailed design contribution is described in the following section.

Trailing_ones_sign_flag encodes and decodes the signs of all trailing ones.

The level codec with efficient coding handles the information about levels. The details of efficient coding are described in the next chapter.

The total_zeros codec with efficient coding deals with the coding process of total_zeros.

The run_before codec with efficient coding encodes and decodes the run_befores.

The symbols constructor is used in the decoding process. It arranges the decoded levels according to the decoded runs. In the CAVLC decoding process, it works at the same time as run_before decoding to increase the decoding throughput.
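The arrangement performed by the symbols constructor can be sketched as follows. The function and parameter names are hypothetical; the levels (trailing ones included) and the run_befores are assumed to arrive in reverse scan order, and the run of the last coefficient may be omitted because it equals the zeros that are left.

```python
def construct_block(levels_rev, total_zeros, run_befores, max_num_coeff=16):
    """Place decoded levels into a block of coefficients using the decoded runs."""
    coeffs = [0] * max_num_coeff
    total_coeff = len(levels_rev)
    pos = total_coeff + total_zeros - 1  # index of the highest-frequency coefficient
    zeros_left = total_zeros
    for i, level in enumerate(levels_rev):
        coeffs[pos] = level
        if i < len(run_befores):
            run = run_befores[i]
        else:
            run = zeros_left  # all remaining zeros precede the last level
        zeros_left -= run
        pos -= run + 1
    return coeffs
```

Each level can be placed as soon as its run_before is known, so this step can overlap with run_before decoding as described above.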

The bitstream concatenater collects the encoded bit streams and links them. First, it receives the codeword value and length from each encoding process and assembles the bitstream that belongs to that process. Then, it concatenates the separate bit streams into the transmitted bitstream.
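A minimal software sketch of linking codewords from their values and lengths is shown below. The function names are hypothetical, and the sketch packs everything into one integer instead of shifting fixed-width words into an output FIFO.

```python
def concatenate(codewords):
    """Pack (value, length) codewords into one bitstream.

    Returns (bitstream as an integer, total length); the first codeword ends up
    in the most significant bits.
    """
    stream, total_len = 0, 0
    for value, length in codewords:
        if value >> length:
            raise ValueError("value does not fit in the given length")
        stream = (stream << length) | value
        total_len += length
    return stream, total_len


def to_bit_string(stream, total_len):
    """Render the packed bitstream as a '0'/'1' string of total_len bits."""
    return format(stream, "0{}b".format(total_len)) if total_len else ""
```

The length must accompany each value because leading zeros are part of a codeword, for example the 4-bit codeword 0010.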

The CAVLC decoder has to decode the bitstream step by step, because the bit streams have data dependencies: without the information decoded in one step, the next step cannot start. Therefore, the key to increasing the decoding throughput is to reduce the decoding cycles of each component. The CAVLC decoding steps are as follows:

1. Count the leading zeros until the first one of the input bitstream is detected, and then send the leading zeros and the suffix to the group-based VLC codec system. If nC selects NUM_FLC, only the suffix is sent.

2. Decode the coeff_token according to the group-based VLC algorithm. The component outputs the suffix length, which is used to calculate the boundary of the consumed bitstream.

3. After decoding the coeff_token, we obtain TrailingOnes, which determines the suffix length transmitted to Trailing_ones_sign_flag. While decoding Trailing_ones_sign_flag, we also count the leading zeros that belong to the level decoding process.

4. While decoding levels, we count the leading zeros of the next level or of total_zeros. When the number of decoded levels is equal to TotalCoeff, level decoding stops.

5. While decoding total_zeros, we count the leading zeros used for the run_before symbol and the leading ones used for zero skipping.

6. While decoding run_before, we keep counting the leading zeros used for the next run_before symbol and the leading ones used for zero skipping. Then, according to the previously decoded run_before, we can begin arranging the DCT coefficients into the correct positions of the decoded block. When zerosLeft is equal to 0 or the last run_before is decoded, the run_before process ends.

The encoding process of the CAVLC encoder could be designed with the same serial steps as the decoding procedure, but the throughput of the encoder would then be poor. We observe that there is no data dependency between the symbols encoded by different encoding components, so the encoding steps can be executed in parallel. For example, even if the coeff_token step has not finished, we can still execute the level encoding step, because the data for level encoding do not depend on the results of coeff_token encoding. Therefore, all components of our proposed design work together during encoding. This increases the encoding throughput, because the throughput of the proposed CAVLC encoder depends on the longest encoding step instead of the sum of the cycles of all encoding components.

To support the proposed encoder design, the design of the bitstream concatenater is important. The bitstream concatenater has to link the encoded codewords as fast as possible; otherwise, the cycles saved in the encoding process would be spent again on concatenating the encoded codewords. This design is described in Chapter 4, and here we first introduce the proposed group-based VLC codec system.

3.2. Conventional Group-based VLC Codec System

This work was previously developed and verified by Bai-Jue Hsieh in [6], [7]. This section briefly explains what a conventional group-based VLC codec system is and how it works.


3.2.1. Definition of Codeword Groups

An example of Huffman code and codeword grouping is illustrated in Figure 3-4. Based on this result, the conventional codeword group is a set of codewords whose source symbols are combined to perform the Huffman procedure and receive the same codeword length. According to this definition, the codeword groups have the following properties:

In a group, each codeword can be treated as a binary number whose width equals the codeword length, called VLC_codenum, since all codewords in the group have the same length.

The codeword that has the smallest VLC_codenum in a group is denoted VLC_mincode.

A VLC_codeoffset is the offset value between the VLC_mincode and the VLC_codenum.


Figure 3-4 : Example of VLC table and codeword groups

In Figure 3-4, the symbols C4, C5, and C6 belong to the codeword group G3. In this group, the codewords have the same codeword length, 4 bits, and the same prefix, 2'b11. The suffixes are 2 bits long. Therefore, the 4-bit VLC_codenums are 13, 14, and 15; the VLC_mincode is 4'b1101; and the 2-bit VLC_codeoffsets are 0, 1, and 2. Source symbols that are not combined belong to different groups even if their codeword lengths are identical, such as C7, C8, and C9 in G0 and C4, C5, and C6 in G3. Moreover, group G1 contains only one symbol, since C1 is the only codeword with a length of 2 bits.


3.2.2. Intra-Group Decoding Procedure

Besides grouping codewords, mapping symbols onto memories and extracting the codeword group information are necessary for VLC decoding. The memory address of a symbol in a group is calculated from the VLC_codeoffset of the symbol and the base address of the VLC_mincode in that group; i.e., the symbol address is the sum of the VLC_codeoffset and the base address of the group. With this arithmetic relationship, the decoded symbol address can be found by numerical calculation rather than by pattern matching. Thus, the group information to be stored is composed of the codeword length, the VLC_mincode, and the base address. Based on the group information in Figure 3-5, the intra-group decoding/encoding procedure is performed as follows.

Assuming we are decoding codeword 5'b10011:

VLC_codeoffset = VLC_codenum (5'b10011) - VLC_mincode (5'b10000) = 5'b00011 = 3;

symbol_address = VLC_codeoffset (3) + base_address (50) = 53;

the decoded symbol C4 is retrieved from memory address 53.

Assuming the encoded symbol address is 103:

VLC_codeoffset = symbol_address (103) - base_address (100) = 3;

VLC_codenum = VLC_codeoffset (3) + VLC_mincode (32) = 35;

the encoded 8-bit codeword is 8'b00100011 = 35.

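| symbol | prefix | suffix | VLC_codenum | VLC_codeoffset | symbol address |
|---|---|---|---|---|---|
| C1 | 10 | 000 | 16 | 0 | 50 |
| C2 | 10 | 001 | 17 | 1 | 51 |
| C3 | 10 | 010 | 18 | 2 | 52 |
| C4 | 10 | 011 | 19 | 3 | 53 |
| C5 | 10 | 100 | 20 | 4 | 54 |
| C6 | 10 | 101 | 21 | 5 | 55 |
| C7 | 10 | 110 | 22 | 6 | 56 |

Group Information : codeword length = 5, VLC_mincode = 5'b10000, base address = 50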
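The two worked examples above reduce to one subtraction and one addition each, as the following sketch shows. The type and function names are hypothetical; the numbers are the ones used in the examples.

```python
from collections import namedtuple

# Group information stored for one codeword group (field names follow the text).
Group = namedtuple("Group", "codeword_length vlc_mincode base_address")


def decode_symbol_address(group, vlc_codenum):
    """Decoding: symbol address = VLC_codeoffset + base address."""
    vlc_codeoffset = vlc_codenum - group.vlc_mincode
    return vlc_codeoffset + group.base_address


def encode_codeword(group, symbol_address):
    """Encoding: VLC_codenum = VLC_codeoffset + VLC_mincode, as a bit string."""
    vlc_codeoffset = symbol_address - group.base_address
    vlc_codenum = vlc_codeoffset + group.vlc_mincode
    return format(vlc_codenum, "0{}b".format(group.codeword_length))


dec_group = Group(codeword_length=5, vlc_mincode=0b10000, base_address=50)
enc_group = Group(codeword_length=8, vlc_mincode=32, base_address=100)
```

With these definitions, decode_symbol_address(dec_group, 0b10011) gives 53 and encode_codeword(enc_group, 103) gives "00100011", matching the examples.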


3.2.3. Group-searching Scheme

An economical group-searching scheme with a high operation rate and low complexity determines the performance of a group-based VLC decoder, because the decoding procedure can be performed only after the group information is obtained. We use inter-group symbol memory mapping and the Pseudo-Constant-Length-Code (PCLC) to achieve such a group-searching scheme. If all codeword lengths are the same, the numerical properties of the codewords in a group can be applied to the whole coding table. We apply a procedure, namely the PCLC procedure, that equalizes the codeword lengths by appending redundant binary digits, 00…0, to the VLC codewords. Therefore, PCLC codewords, which have the same length as the longest VLC codeword, can be treated as binary numbers, PCLC_codenums.
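The PCLC procedure can be sketched in a few lines. The function name is hypothetical, and codewords are modeled as strings of '0' and '1' characters.

```python
def to_pclc(codewords):
    """Pad each VLC codeword with trailing zeros to the longest codeword length.

    Returns a dict that maps each original codeword to its PCLC_codenum.
    """
    max_len = max(len(cw) for cw in codewords)
    return {cw: int(cw.ljust(max_len, "0"), 2) for cw in codewords}
```

For example, an 8-bit codeword 00100100, a 6-bit codeword 001100, and a 4-bit codeword 0110 become the PCLC_codenums 36, 48, and 96, so codewords of different lengths can be compared as ordinary 8-bit numbers.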

| group | symbol | PCLC_codeword | PCLC_codenum | symbol address | PCLC_codeoffset | is PCLC_mincode |
|---|---|---|---|---|---|---|
| G0 | S00 | 00100100 | 36 | 0 | 0 | o |
| G0 | S01 | 00100101 | 37 | 1 | 1 | |
| G0 | S02 | 00100110 | 38 | 2 | 2 | |
| G0 | S03 | 00100111 | 39 | 3 | 3 | |
| G1 | S10 | 00110000 | 48 | 4 | 0 | o |
| G2 | S20 | 01100000 | | 5 | 0 | o |
| G2 | S21 | 01110000 | | 6 | 1 | |
| G1 | S11 | 01111100 | 56 | 7 | 3 | |
| … | … | … | … | … | … | … |

Table 3-1 : Example of inter-group symbol memory mapping

| Group | Valid | codelength | PCLC_mincode | base address |
|---|---|---|---|---|
| G0 | 1 | 8 | 00100100 | 0 |
| G1 | 1 | 6 | 00110000 | 4 |
| G2 | 1 | 4 | 01100000 | 5 |


It is easy to distinguish PCLC codewords and PCLC_codenums from each other because the VLC code is a prefix code. As a result, a PCLC table is established with the PCLC_codenums placed in ascending order, i.e. codenum_0 < codenum_1 < … < codenum_n. This results in ascending PCLC_mincodes as well, i.e. mincode_0 < mincode_1 < … < mincode_n. Based on the PCLC table, the base addresses have to be assigned in PCLC_mincode order, i.e. base_addr_0 < base_addr_1 < … < base_addr_n, for inter-group symbol memory mapping. An example of the PCLC table and its intra/inter-group symbol memory mapping is shown in Table 3-1, and the group information of this PCLC table is given in Table 3-2, where the valid bit indicates whether the group information is used. We can see in Table 3-1 that G2 is inserted in the middle of G1. This placement is specific to decoding and saves space in the symbol memory.

According to the PCLC tables and symbol memory maps, the conventional decoding group-searching scheme is realized by applying numerical properties to the bitstream and the symbol addresses. Similar to PCLC codewords, a decoded bitstream that has the same length as the PCLC codewords is treated as a binary number, bitstream_num. Because the bitstream is a sequence of concatenated codewords, such as codeword_i, codeword_j, etc., a relation between the bitstream and the PCLC table can be expressed by PCLC_codenum_i ≦ bitstream_num < PCLC_codenum_(i+1), so the group can be found by numerical comparisons. The decoded codeword belongs to group Gx when the hit condition, PCLC_mincode_x ≦ bitstream_num < PCLC_mincode_(x+1), is encountered. Let's see the process of decoding one symbol from the bitstream "001111010110…".
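The hit condition can be sketched in software with the three groups whose information is listed in Table 3-2. The names below are hypothetical, the sequential loop stands in for comparisons that hardware would perform in parallel, and only the listed groups are modeled, so the sketch is valid only for bitstreams that fall into one of them.

```python
from collections import namedtuple

Group = namedtuple("Group", "name codelength pclc_mincode base_address")

# Group information of Table 3-2, in ascending PCLC_mincode order.
GROUPS = [
    Group("G0", 8, 0b00100100, 0),
    Group("G1", 6, 0b00110000, 4),
    Group("G2", 4, 0b01100000, 5),
]
PCLC_LENGTH = 8  # length of the longest codeword


def find_group(bitstream_num):
    """Return Gx with PCLC_mincode_x <= bitstream_num < PCLC_mincode_(x+1)."""
    hit = None
    for group in GROUPS:
        if group.pclc_mincode <= bitstream_num:
            hit = group  # the last group whose mincode is not larger wins
    return hit


def decode_symbol_address(bits):
    """Decode one symbol address from the front of a '0'/'1' string.

    Returns (group name, symbol address, codeword length).
    """
    bitstream_num = int(bits[:PCLC_LENGTH].ljust(PCLC_LENGTH, "0"), 2)
    group = find_group(bitstream_num)
    if group is None:
        raise ValueError("no group matches this bitstream")
    shift = PCLC_LENGTH - group.codelength  # drop the bits after the codeword
    offset = (bitstream_num - group.pclc_mincode) >> shift
    return group.name, group.base_address + offset, group.codelength
```

For the bitstream "001111010110…", the first 8 bits give bitstream_num = 61, which lies between the PCLC_mincodes 48 and 96, so the hit group is G1; the 6-bit codeword has offset 3 and the symbol address is 4 + 3 = 7, the address and offset that Table 3-1 lists for S11.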


Figure 3-6 : Process of decoding a symbol

Figure 3-7 : Process of encoding a symbol address

According to the relation between the PCLC tables and the symbol addresses, the conventional encoding group-searching scheme is realized by applying numerical properties to codewords and symbol addresses. Based on the symbol to be encoded, the corresponding symbol address can be fetched. A relation between the symbol address and the PCLC table can be expressed by base_addr_i ≦ symbol address < base_addr_(i+1), so the group can again be found by numerical comparisons. The encoded codeword belongs to group Gy when the hit condition, base_addr_y ≦ symbol address < base_addr_(y+1), is encountered. Let's see the process of encoding one symbol from the symbol address "19 (5'b10011)".

Figure 3-8 : Block diagram of conventional group-based VLC decoder architecture

The conventional VLC codec system is designed for MPEG applications with coding tables of up to 256 entries, 12-bit symbols, and 16-bit codewords. This system performs concurrent encoding and decoding by accessing the same group information, and it achieves table programmability by loading data into on-chip memories. To complete the VLC codec processes of MPEG videos, this design includes the operations on the sign bits and the escaped run-levels (escRL) that follow VLC codewords. With efficient symbol conversion, the memory requirement is reduced to (2^5 x 8 + 2^8 x 8 + 2^8 x 12 + 32 x 29) bits for a CBS-LUT, a symbol address memory, a symbol memory, and the 32-entry group information. The block diagram of the conventional VLC codec system is shown in Figure 3-8. It mainly consists of the following components.

The group-based VLC encoder/decoder is composed of group detectors and combinational logic circuits to realize the VLC codec processes.


results, the Dec_bitstream selector transmits codewords bitstream to the VLC decoder. Besides, this selector detects sign bits and escRLs when VLC codewords are decoded.

The Enc_bitstream concatenater adds sign bits or escRL’s behind VLC codewords and concatenates encoded results into a single bitstream. Then, every 32 bits of the encoded bitstream in the concatenater is shifted into the Output FIFO.

The special code detector recognizes special codes, such as escape and EOB, by checking decoded symbol addresses instead of decoded symbols. Without waiting for symbol fetching, this detector can determine the length of the additional bits following a VLC codeword. Hence, the next codeword bitstream can be found by the Dec_bitstream selector immediately and the decoding throughput can be increased.

The Enc_en and Dec_en Ctrls determine the operations of the VLC_encoder and decoder according to the condition of input data and FIFOs.

Both symbol address and symbol memories are the on-chip memory modules for storing symbol information.

The symbol converter performs symbol conversion and detects escaped RLP’s and EOB symbols. On the other hand, the symbol recoverer finds correct runs and signed levels based on decoded results.

3.3. The Proposed Group-Based VLC Encoding and Decoding

3.3.1. The Definition of Decoding Codeword Groups

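| group | symbol | prefix | suffix | suffix_num | suffix_offset | attribute |
|---|---|---|---|---|---|---|
| G2 | S3 | 001 | N.A. | N.A. | 0 | suffix_min |
| G3 | S4 | 0001 | 00 | 0 | 0 | suffix_min |
| G3 | S5 | 0001 | 01 | 1 | 1 | |
| G3 | S6 | 0001 | 1 | 2 | 2 | |
| G4 | S7 | 00001 | 00 | 0 | 0 | suffix_min |
| G4 | S8 | 00001 | 01 | 1 | 1 | |
| G4 | S9 | 00001 | 1 | 2 | 2 | |
| G5 | S10 | 000001 | 00 | 0 | 0 | suffix_min |
| G5 | S11 | 000001 | 01 | 1 | 1 | |
| G5 | S12 | 000001 | 10 | 2 | 2 | |
| G5 | S13 | 000001 | 11 | 3 | 3 | |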

Table 3-3 : An example of CAVLC code and codeword grouping

An example of CAVLC code and codeword grouping is illustrated in Table 3-3. CAVLC code is also constructed on the basis of Huffman code, and as introduced in Chapter 1, Huffman code is a prefix code; that is, no codeword is a prefix of any other codeword. For example, the symbol S3 listed in Table 3-3 corresponds to the codeword 001, and in the entire VLC codeword table no other codeword starts with 001. Based on this property, the proposed codeword group is a set of codewords whose source symbols share the same number of leading zeros. Besides, the group number is equal to the corresponding number of leading zeros. For example, when the number of leading zeros is 5, the group number is also 5. This greatly simplifies the process of group searching. According to this definition, the codeword groups have the following
