
Chapter 1 Introduction

1.6 Organization of the Thesis

Chapter 2 first discusses previous work on VLC decoders with different implementations and target applications, and then reviews previous work on error robustness, either on the decoder side only or with JSCD. Chapter 3 presents the CAVLC operation and the design of a memory-based VLC decoder that supports multiple standards; the decoder uses the multi-table merging algorithm and takes memory allocation into account. Chapter 4 shows the hardware architecture and implementation results. Chapter 5 proposes an algorithm that finds the block boundaries in frames and stops error propagation under the condition that channel information is known. Chapter 6 gives the simulation results. Chapter 7 concludes the whole design of the multi-mode, memory-based VLC decoder with error robustness.

Chapter 2

Previous Work

2.1 CAVLC Decoding Process

There are six syntax elements in CAVLC: Coefficient_Token (Coeff_Token), TrailingOnes_Sign (T1s_Sign), Level_Prefix, Level_Suffix, Total_Zeros and Run_Before. They are decoded in the order defined by the following rules, and the block data composed of these syntax elements is shown in Fig. 4.

1. The first decoded syntax element is Coeff_Token, which includes two symbols: Total_Coeff and TrailingOnes. Total_Coeff represents the number of non-zero coefficients in the block, and TrailingOnes represents the number of trailing coefficients with magnitude one, which is 3 at most. The sub-table is selected by the nC parameter from the system. nC is non-negative for luma and -1 for chroma.

2. TrailingOnes_Sign is decoded by reading TrailingOnes bits from the bitstream.

3. Level_Prefix is decoded by a leading-one detector and is equal to the number of zeros before the leading one.

4. Then, a parameter called SuffixLength is initialized to 0, or to 1 if Total_Coeff is greater than 10 and TrailingOnes is less than 3. LevelSuffixSize is set to SuffixLength, with two exceptions: (1) Level_Prefix is equal to 14 and SuffixLength is equal to 0; (2) Level_Prefix is equal to 15. LevelSuffixSize is set to 4 in case 1 and to 12 in case 2. Next, Level_Suffix is decoded by reading LevelSuffixSize bits from the bitstream, and is set to 0 if LevelSuffixSize is 0.

5. The Total_Zeros sub-table is selected according to Total_Coeff. If Total_Zeros is 0, the decoding process is finished.

6. Zeros_Left is set to Total_Zeros. Each decoded Run_Before is subtracted from Zeros_Left and the result is assigned back to Zeros_Left, until Zeros_Left is 0.

Fig. 4 Sequential syntax elements decoding in CAVLC
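Steps 3 and 4 above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this section, not a complete CAVLC level decoder; the function names and the use of a '0'/'1' string for the bitstream are assumptions made for readability.

```python
def level_prefix(bits: str) -> int:
    """Number of zeros before the leading one (step 3).

    `bits` is a string of '0'/'1' characters (an assumption for this sketch).
    """
    return bits.index("1")


def initial_suffix_length(total_coeff: int, trailing_ones: int) -> int:
    """Initial SuffixLength: 1 if Total_Coeff > 10 and TrailingOnes < 3, else 0."""
    return 1 if total_coeff > 10 and trailing_ones < 3 else 0


def level_suffix_size(prefix: int, suffix_length: int) -> int:
    """LevelSuffixSize with the two exceptions listed in step 4."""
    if prefix == 14 and suffix_length == 0:
        return 4
    if prefix == 15:
        return 12
    return suffix_length


def level_suffix(bits: str, size: int) -> int:
    """Read `size` bits as Level_Suffix; the value is 0 when size is 0."""
    return int(bits[:size], 2) if size > 0 else 0
```

For instance, with the bit string 0001 followed by further bits, the prefix is 3, and the number of suffix bits to read is simply the current SuffixLength.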

2.2 VLC Decoder

There are several ways to implement a VLC decoder, such as memory-based techniques and hardwired implementations. [1] proposed a group-based algorithm that classifies VLC codewords into groups so that the memory stores only group information. In [1], the symbol addresses are calculated from the input bitstream and the group information. Finally, the symbol memory, which stores all symbols, is accessed to output the decoded symbols. The codec can support a coding table with 256 entries, 12-bit symbols and 16-bit codewords. Furthermore, [2] proposed the multi-table merging algorithm to reduce memory space, and its codec can support JPEG, MPEG-2 and MPEG-4 coding tables. [3] used a cache and table partitioning on the group-based VLC decoder to achieve power reduction for MPEG-2. The decoding method in [4] decodes some short codewords by arithmetic operations and maps the others into memory to reduce memory access. However, [4] covered only the Coeff_Token tables, and its sequential search in the memory leads to low throughput. The scheme proposed in [5] and [6] decodes short codewords with arithmetic operations, while the other codewords are decoded conventionally, to save memory accesses.

Fig. 5 Examples of short codewords that can be decoded arithmetically with the equation in [6].

The hardware implementation proposed in [7] is ROM-based and uses HLLT (hierarchical logic for look-up table, Fig. 6) to improve speed and PCCF (partial combinational component freezing) to reduce power consumption.

Fig. 6 The HLLT implementation in [7] partitions the original large LUT into many small LUTs.

The VLC decoder in [8] targets MPEG-1/2/4 decoding, and its LUTs are hardwired. The codewords are separated into groups across several look-up tables, and one address generator is used to calculate the symbol address.

In [9], multi-symbol level decoding in CAVLC is proposed to reduce the operating frequency while maintaining enough throughput for real-time requirements. [10] proposed a modified SuffixLength detector to shorten the critical path in level decoding.

2.3 Error Robustness on Wireless Video Transmission

Much research has been done on improving error robustness in order to reduce the effect of error propagation in the video decoder, compensate for erroneous data and correct errors. This work can be divided into two categories: source decoder side only and joint source-channel design.

The error robustness mechanisms on the source side only include error detection, error concealment and error resynchronization. Error detection finds the location of erroneous data or bits in the blocks. The simplest form is syntax-based error detection, which checks for violations of the regular decoding process, for example, a codeword that is not found or a variable whose value overflows. [11] defined some rules for syntax-based error detection and analyzed the detection performance. However, there is a delay between the block in which the error is detected and the block that is actually erroneous, as shown in Fig. 7.

Fig. 7 Organization of macroblock level concealment delay and detection delay in [11]

[12] detected erroneous blocks by computing the difference across block boundaries and comparing it with a threshold. In Fig. 8, L means “Left” and can be replaced by T (Top), R (Right) or B (Bottom), and K2 is the number of available


Fig. 8 (a) Pixels for average inter-sample difference across boundary (AIDB) calculation and equation in [12] with N=16. (b) Final equation of the AIDB

Fig. 9 Pixels for average difference across frames (ADF) calculation with N = 16

After error detection, error concealment can be activated to compensate for the corrupted blocks. Error resynchronization can be achieved by inserting markers in the bitstream that indicate the boundary of the next decoding unit. In H.264/AVC, one frame can be separated into slices, and the decoding of one slice does not reference data in other slices. Therefore, if the data of one slice is corrupted, the error is confined to that slice and resynchronization is achieved. Another method is to insert refresh frames, slices or macroblocks so that temporal error propagation can be stopped.

[13], [14] and [15] are joint source-channel designs for the MPEG-4 video format. [13] and [14] simulated the performance of a Maximum A Posteriori (MAP) decoder under additive Markov channels (AMC); the simulation environments are shown in Fig. 10. [15] combined the source state space and the channel state space into one finite state machine (Fig. 11), on which the corresponding trellis decoding can be defined. [16] is a JSCD scheme for H.264 motion vector data that improves video quality.


Fig. 10 Experimental setup for evaluating the performance of the MAP decoder in (a) [13], (b) [14]

Fig. 11 Combining source and channel state spaces. (a) Source state space. (b) Channel state space. (c) Integrated state space.

Chapter 3

Algorithm of Memory-based VLC Decoding

3.1 Conventional Group-based VLC Algorithm and Decoding Flow

The algorithm in this section was previously developed and verified by Bai-Jue Hsieh in [2]. The intention of this section is to briefly introduce the concept of the conventional group-based VLC decoder system and how it works.

3.1.1 Definition of Codeword Groups

For a coding table, we separate the codewords into groups. Codewords in a group have the following properties:

1. Since all codewords in a group have the same length, each codeword can be treated as a binary number with as many bits as the codeword length, called VLC_codenum.

2. The codeword with the smallest VLC_codenum in a group is denoted VLC_mincode.

3. VLC_codeoffset is the difference between the VLC_codenum and the VLC_mincode.

For the example shown in Fig. 12, the VLC table has 8 codewords, and the codewords with the same length and prefix are classified into the same group. The codewords in G0 have 4 bits, with a 2-bit prefix and a 2-bit suffix. Therefore, the VLC_codenum values are 0, 1, 2, 3, and thus the VLC_codeoffset of Sym5, Sym6, Sym7 are 1,

Fig. 12 Grouping of codewords in the table

3.1.2 Intra-Group Decoding Procedure

Within a group, the codewords are related arithmetically through VLC_codenum, VLC_codeoffset and VLC_mincode. Thus, only the VLC_mincode information of every group is stored in memory, and the information about the other codewords can be found by computing the offset. In other words, if the symbols of the same group are allocated at contiguous locations in the symbol memory, the decoded symbol address can be obtained by adding the offset to a base address.

Fig. 13 shows the information within one group. For example, if 0000011 is received, the offset equals 3 and thus the symbol address is 3 + 60 = 63, so that S3 is decoded.

Symbol Prefix Suffix VLC_codenum VLC_codeoffset Address

S1 000 0 0 60

Fig. 13 One group with address assignment
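The intra-group address computation can be illustrated with a short Python sketch. The function name and the string representation of codewords are assumptions for illustration; the VLC_mincode of 0000000 is inferred from the example above, where the received codeword 0000011 has an offset of 3.

```python
def intra_group_address(codeword: str, vlc_mincode: str, base_addr: int) -> int:
    """Symbol address = base address + VLC_codeoffset.

    Both codewords are '0'/'1' strings of the same length, because all
    codewords of one group share the same codeword length.
    """
    if len(codeword) != len(vlc_mincode):
        raise ValueError("codewords in one group must have the same length")
    vlc_codeoffset = int(codeword, 2) - int(vlc_mincode, 2)
    if vlc_codeoffset < 0:
        raise ValueError("codeword is smaller than VLC_mincode")
    return base_addr + vlc_codeoffset


# Example from the text: offset 3 on top of base address 60.
address = intra_group_address("0000011", "0000000", 60)
```

Only the VLC_mincode and the base address have to be stored per group; every other symbol address in the group follows from one subtraction and one addition.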

3.1.3 Group Searching Scheme

To find the group in which the correct symbol is located, the Pseudo Constant Length Codeword (PCLC) is used. In the table, all codewords are extended to the length of the longest codeword by appending 0s to them. All PCLCs have the same length and can be viewed as binary numbers, PCLC_codenum. All PCLCs are organized in ascending order so that PCLC_codenum_0 < PCLC_codenum_1 < PCLC_codenum_2 < ... < PCLC_codenum_n, and thus PCLC_mincode_0 < PCLC_mincode_1 < PCLC_mincode_2 < ... < PCLC_mincode_n. Next, the base addresses are assigned to the PCLC_mincodes such that base_addr_0 < base_addr_1 < base_addr_2 < ... < base_addr_n. An example of the intra-/inter-group symbol memory mapping is shown in Fig. 14, and the group information of the table is shown in Fig. 15, where the valid bit indicates whether the table contains the group or not.

Fig. 15 Group information of the table in Fig. 14

Like the PCLC_codenum, a segment of the bitstream with the same length as the PCLC can be treated as a binary number, bitstream_num. The group search is carried out by computing (bitstream_num - PCLC_mincode_i). The hit condition for the decoded symbol being located in group G_n is PCLC_mincode_n <= bitstream_num < PCLC_mincode_(n+1).

The overall decoding process of the group-based algorithm is as follows. Assume the bitstream input is 001111100110...

1. Do group searching

→ PCLC_mincode1 (8'b00110000) < bitstream_num < PCLC_mincode2 (8'b01000000)

→ The matching group: G0

2. Send group information

→ code length = 6-bit, PCLC_mincode = 8'b00110000, base_addr (5-bit) = 5'b00100.

3. Find the valid VLC_codeoffset, which consists of the code-length most significant bits of the result of subtracting the PCLC_mincode from the bitstream_num

→ bitstream_num (8'b00111110) - PCLC_mincode (8'b00110000) = 8'b00001110.

→ The valid VLC_codeoffset = 6'b000011 = 3.

4. Extract the VLC_codeoffset operand, which has the same word length as the symbol address

→ VLC_codeoffset = 5'b00011 = 3.

5. Calculate the decoded symbol address

→ symbol_addr = base_addr (5'b00100) + VLC_codeoffset (5'b00011) = 5'b00111 = 7.

6. Fetch the decoded symbol

→ sym_memory[7] = S11.
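The six steps above can be reproduced with the following Python sketch. It is a software illustration under stated assumptions, not the hardware design of [2]: the `Group` record, the binary search via `bisect`, and the contents of the second group and of the symbol memory (apart from entry 7) are hypothetical; only the values of the matching group and PCLC_mincode2 are taken from the worked example.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Tuple

PCLC_BITS = 8  # length of the longest codeword in the example


class Group(NamedTuple):
    pclc_mincode: int  # VLC_mincode padded with 0s to PCLC_BITS
    code_length: int   # codeword length of the group, in bits
    base_addr: int     # symbol address assigned to VLC_mincode


def decode_symbol(bits: str, groups: List[Group],
                  sym_memory: List[str]) -> Tuple[str, int]:
    """Decode one symbol; return (symbol, number of bits consumed).

    `groups` must be sorted by ascending pclc_mincode.
    """
    # bitstream_num: the next PCLC_BITS bits (zero-padded at stream end).
    bitstream_num = int(bits[:PCLC_BITS].ljust(PCLC_BITS, "0"), 2)
    # Step 1: group search, i.e. the last group with mincode <= bitstream_num.
    mincodes = [g.pclc_mincode for g in groups]
    index = bisect_right(mincodes, bitstream_num) - 1
    if index < 0:
        raise ValueError("bitstream_num is below every PCLC_mincode")
    group = groups[index]  # Step 2: group information
    # Step 3: keep the code_length most significant bits of the difference.
    diff = bitstream_num - group.pclc_mincode
    vlc_codeoffset = diff >> (PCLC_BITS - group.code_length)
    # Steps 4-6: add the offset to the base address and fetch the symbol.
    return sym_memory[group.base_addr + vlc_codeoffset], group.code_length


groups = [
    Group(0b00110000, 6, 0b00100),  # matching group of the worked example
    Group(0b01000000, 4, 8),        # hypothetical neighbor (only mincode is from the text)
]
sym_memory = ["?"] * 16             # placeholder contents
sym_memory[7] = "S11"               # entry named in the worked example

symbol, used = decode_symbol("001111100110", groups, sym_memory)
```

The returned bit count tells the caller how far to advance the bitstream before decoding the next symbol.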

3.2 Conventional Multi-Table Merging Algorithm and Decoding Flow

The algorithm in this section was previously developed and verified by Bai-Jue Hsieh in [2]. The intention of this section is to briefly introduce the concept of the conventional multi-table merged VLC decoder system and how it works.

3.2.1 Collection of Group Information of All Coding Tables

According to the group-based decoding algorithm, the group information of all tables is known, and the PCLC of each group of a table can be viewed as a codeword in that table. Therefore, all PCLCs are collected in ascending order, as Fig. 16 shows. In this figure, all group information items are ordered according to the magnitude of PCLC_mincode, and there are 13 items.

Fig. 16 Part of group information of several tables.

Codeword merging combines the groups with the same PCLC into one group, and only one PCLC is stored. This reduces storage space. In the example, there are 13 items, and after codeword merging the number of items is reduced to 4.

Fig. 17 One portion of the groups after codeword merging process.
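As a rough software illustration of codeword merging, the following Python sketch collects group information items from several tables and keeps one entry per distinct PCLC. The item values and the table/group labels are hypothetical and are not taken from Fig. 16 or Fig. 17.

```python
from typing import Dict, List, Tuple


def codeword_merge(items: List[Tuple[int, str]]) -> Dict[int, List[str]]:
    """Merge group items that share the same PCLC_mincode.

    Each item is (pclc_mincode, "table.group" label). The result maps every
    distinct PCLC_mincode, in ascending order, to the groups that share it.
    """
    merged: Dict[int, List[str]] = {}
    for pclc_mincode, owner in sorted(items):
        merged.setdefault(pclc_mincode, []).append(owner)
    return merged


# Hypothetical group information items from three tables.
items = [
    (0b1000, "VLC0.G0"),
    (0b1000, "VLC1.G0"),
    (0b0100, "VLC0.G1"),
    (0b0100, "VLC2.G0"),
    (0b0010, "VLC1.G1"),
]
merged = codeword_merge(items)
```

Here five items collapse into three stored PCLCs; the per-table differences, such as validity and codeword length, are what the table information described in Section 3.2.4 has to record.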

3.2.3 Prefix Merging

Prefix merging checks every pair of neighboring groups after codeword merging. When the longest VLC_mincode in a group is a prefix of the PCLC_mincode of the adjacent codeword group, the two groups can be merged into one. In the case of Fig. 17, no prefix merging can be performed.

3.2.4 Set Table Information

After the merging process, the merged groups and PCLC_mincodes are called MTM groups and MTM_PCLC_mincodes, respectively. The table information of a coding table includes the valid bit and the codeword length. Because codeword lengths range from 1 bit to 16 bits, we store (length-1), i.e. 0 to 15, to save memory space. After this shift, the smallest (length-1) in an MTM group is defined as MTM_CL-1 and stored in the group information memory. The difference between each larger (length-1) and MTM_CL-1, defined as CL_diff, is stored in the table information memory. Memory space is further saved because the redundancy among the lengths in an MTM group is exploited. The table information and group information are shown in Fig. 18.

Fig. 18 Table information and group information

3.2.5 Base Address Merging

Although separate base addresses can be stored for the different tables under a given group, the required memory space becomes large as the number of tables grows. [2] proposed a method that classifies base addresses into categories according to the number of table entries. For example, if table1 has 28 entries and table2 has 136 entries, their base addresses are classified into two categories: base_addr1 is 5-bit and base_addr2 is 8-bit. With base address adjustment, different tables in the same category can use a common set of base addresses.
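The address width of each category follows directly from the number of table entries. The helper below is a small Python sketch (the function name is an assumption) that reproduces the widths quoted in the example.

```python
def base_addr_bits(num_entries: int) -> int:
    """Smallest address width that can index `num_entries` symbols."""
    if num_entries < 1:
        raise ValueError("a table needs at least one entry")
    # Addresses run from 0 to num_entries - 1.
    return max(1, (num_entries - 1).bit_length())


widths = {entries: base_addr_bits(entries) for entries in (28, 136)}
```

With these inputs, the 28-entry table falls into the 5-bit category and the 136-entry table into the 8-bit category, as stated above.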

The group information recovery is shown in Fig. 19. In the first step, (length-1) of the VLC_mincode is computed by adding MTM_CL-1 and CL_diff. Second, the most significant length bits of the MTM_PCLC are assigned to PCLC_mincode, while the remaining bits are set to 0. Finally, the base address is accessed according to the base address selection.

Fig. 19 Example of group information recovery
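The first two recovery steps can be sketched in Python as follows. The function name, the 16-bit PCLC width and the example operands are assumptions for illustration and do not come from Fig. 19; the base address selection of the third step is not modeled.

```python
from typing import Tuple


def recover_group_info(mtm_pclc: int, mtm_cl_minus1: int, cl_diff: int,
                       pclc_bits: int = 16) -> Tuple[int, int]:
    """Recover (codeword length, PCLC_mincode) for one table in an MTM group."""
    # Step 1: (length - 1) = MTM_CL-1 + CL_diff.
    length = mtm_cl_minus1 + cl_diff + 1
    if not 1 <= length <= pclc_bits:
        raise ValueError("recovered length is out of range")
    # Step 2: keep the `length` most significant bits, zero the remaining bits.
    mask = ((1 << length) - 1) << (pclc_bits - length)
    return length, mtm_pclc & mask


# Hypothetical MTM group: MTM_PCLC = 0011_0100_0000_0000, MTM_CL-1 = 3.
short_table = recover_group_info(0x3400, 3, 0)  # table with a 4-bit codeword
long_table = recover_group_info(0x3400, 3, 2)   # table with a 6-bit codeword
```

In this example the table with the shorter codeword sees the PCLC_mincode 0x3000, while the table with the longer codeword sees 0x3400, although both share one stored MTM_PCLC.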

Finally, Table 3 and Table 4 show the number of groups of every table and the number of MTM groups in CAVLC and MPEG-2, respectively. The number of items is reduced greatly in both standards.

Table 3 The number of groups of tables in CAVLC and the number of MTM groups

Table                        # of groups   # of symbols
Coeff_Token (0 <= nC < 2)    17            62
Coeff_Token (2 <= nC < 4)    16            62
Coeff_Token (4 <= nC < 8)    11            62
Coeff_Token (8 <= nC)        7             62
Coeff_Token (nC = -1)        8             14
Total_Zeros (TC = 1)         9             16
Total_Zeros (TC = 2)         8             15
Total_Zeros (TC = 3)         8             14
Total_Zeros (TC = 4)         7             13
Total_Zeros (TC = 5)         7             12
Total_Zeros (TC = 6)         7             11
Total_Zeros (TC = 7)         8             10
Total_Zeros (TC = 8)         7             9
Total_Zeros (TC = 9)         7             8
Total_Zeros (TC = 10)        6             7
Total_Zeros (TC = 11)        5             6
Total_Zeros (TC = 12)        5             5
Total_Zeros (TC = 13)        4             4
Total_Zeros (TC = 14)        3             3
Total_Zeros (TC = 15)        2             2
Total_Zeros_ch (TC = 1)      4             4
Total_Zeros_ch (TC = 2)      3             3
Total_Zeros_ch (TC = 3)      2             2
Run_Before (ZL = 1)          2             2
Run_Before (ZL = 2)          3             3
Run_Before (ZL = 3)          3             4
Run_Before (ZL = 4)          4             5
Run_Before (ZL = 5)          4             6
Run_Before (ZL = 6)          5             7
Run_Before (ZL > 6)          11            15

# of groups after MTM: 23

Table 4 The number of groups of tables in MPEG2 and the number of MTM groups

Table   # of groups   # of symbols
TB14    13            111
TB15    19            111*

# of groups after MTM: 21

*: Nine locations are unused because the VLC_codenum values in one group do not increase continuously.

3.3 Modified MTM Algorithm for Improvement of Memory Efficiency

Based on the basic concept of the MTM algorithm, we applied the algorithm to all coding tables in CAVLC to achieve programmability. The tables include Coeff_Token

adjustment will shift the base address to the maximum value of the group within the same category, and hence increases the number of unused locations in the symbol memory. Take the Coeff_Token tables (0 <= nC < 2, 2 <= nC < 4, 4 <= nC < 8, 8 <= nC) as an example: these four tables have 62 entries each and belong to the 6-bit address category. Fig. 20 shows that most base addresses are adjusted to meet the requirement, and the total shift amount is 83 + 59 + 53 + 52 = 247. Because the base address determines the symbol address, 247 entries in the symbol memory are unused after the adjustment procedure. The symbol length of Coeff_Token is 7 bits, so 247 × 7 = 1729 bits are unused.

====== Group0: MTM_CL-1=0 1000_0000_0000_0000 ======

VLC3.G4 2 +56 +9

====== Group15: MTM_CL-1=12 0000_0000_0011_0000 ======

VLC1.G12 9 +5 +7 +4

====== Group16: MTM_CL-1=10 0000_0000_0010_0000 ======

VLC0.G12 21

VLC1.G13 5 +5 +7 +4

VLC0.G14 5 VLC1.G15 0 +5

====== Group19: MTM_CL-1=15 0000_0000_0000_0100 ======

VLC0.G15 1

====== Group20: MTM_CL-1=14 0000_0000_0000_0010 ======

VLC0.G16 0

====== Group21: MTM_CL-1=0 0000_0000_0000_0000 ======

VLC3.G6 0

Fig. 20 Base address adjustment procedure. The first number after VLCi.Gn is the base address, and +K is the amount by which the base address is adjusted.
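The cost of base address adjustment quoted above is simple arithmetic, shown here as a small Python check. The four shift totals and the 7-bit symbol length are the figures stated in the text; the variable and function names are illustrative.

```python
from typing import List, Tuple

# Shift totals quoted in the text for the Coeff_Token example.
SHIFT_AMOUNTS: List[int] = [83, 59, 53, 52]
COEFF_TOKEN_SYMBOL_BITS = 7


def wasted_symbol_memory(shifts: List[int], symbol_bits: int) -> Tuple[int, int]:
    """Return (unused entries, unused bits) caused by base address shifts."""
    unused_entries = sum(shifts)
    return unused_entries, unused_entries * symbol_bits


unused_entries, unused_bits = wasted_symbol_memory(
    SHIFT_AMOUNTS, COEFF_TOKEN_SYMBOL_BITS)
```

Every shifted base address leaves a gap of the same size in the symbol memory, which is why the unused entries equal the sum of the shifts.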

For this reason, base address adjustment is not adopted as a method of reducing memory space. Nonetheless, directly recording the base addresses as MTM group information also leaves many unused locations in the group information memory, because CAVLC has many small tables with short codewords and few entries. Usually, a small table uses just one or two groups, and the other groups are invalid for it. In other words, many entries of group information are unused if the table is small, as shown in Fig. 21. In Fig. 21, “TB” represents large tables, which have long codewords and many entries, while “tb” represents small tables, which mostly have short codewords and few entries. The blank blocks are the fields that are unused by the table, i.e., valid bit = 0 for that group of the table. For example, TBN does not contain GP0, so its valid bit for GP0 is 0 while its valid bits for the other groups are 1. Also, many fields in Fig. 21 for tb0~tbm are unused.

Fig. 21 Unused locations for the information memory.

Because of this, a modified organization of the group information and table information is proposed. First, we use the length of the longest codeword of a table to determine whether the table is large or small, with 4 as the threshold. Accordingly, there are 14 small tables and 16 large tables in CAVLC. Next, the physical storage is separated into two parts: one for small tables and one for large tables. With this partitioning, the unused locations can be reduced. Originally, the size of the memory space is (n + m) × k; it is reduced to n × k + m × ks, where ks is the number of groups used by small tables, k is the number of groups used by large tables, m is the number of small tables, and n is the number of large tables. In addition, the PCLC of the small tables is shorter than the longest PCLC among all tables, so the size of the group information can also be reduced. Table 5 shows the size comparison between the conventional MTM without base address adjustment and the modified memory allocation.

Table 5 Space of the conventional MTM without base address adjustment and the modified memory allocation
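The saving from partitioning can be estimated with the two expressions given in the text. In the Python sketch below, n = 16 and m = 14 are the table counts stated above, whereas k = 23 and ks = 4 are hypothetical placeholder values chosen only to make the arithmetic concrete; they are not the values behind Table 5.

```python
from typing import Tuple


def group_info_entries(n_large: int, m_small: int,
                       k_large: int, k_small: int) -> Tuple[int, int]:
    """Return (unified size, partitioned size) in table-group entries.

    unified     = (n + m) * k
    partitioned = n * k + m * ks
    """
    unified = (n_large + m_small) * k_large
    partitioned = n_large * k_large + m_small * k_small
    return unified, partitioned


# n and m from the text; k and ks are hypothetical.
unified, partitioned = group_info_entries(16, 14, 23, 4)
saved = unified - partitioned
```

The saving equals m × (k - ks), so it grows with the number of small tables and with the gap between the group counts of large and small tables.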

3.4 Symbol Memory Allocation

Generally, the symbols are stored in the symbol memory, and the symbol lengths of different tables differ. If only one symbol memory is used, the word length must be the length of the longest symbol among all tables. This allocation wastes some space for shorter symbols. In [2], there are several symbol memories with different word lengths in order to save space. The symbol length is

That is, the symbols in Coeff_Token and Total_Zeros/Run_Before can be concatenated into 11-bit words and stored in the 256 × 11 symbol memory. The overhead includes a mask and a multiplexer that choose the format of the symbol according to the standard and the decoding table, as shown in Fig. 22. In addition, the start positions of the tables are stored in a small register file. When CAVLC is used, we can select the most significant 7 bits of the memory output for Coeff_Token symbols or the least significant 4 bits for Total_Zeros/Run_Before symbols; the whole word is assigned to

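The shared 11-bit word layout for CAVLC can be sketched in Python as follows. The field widths (7 bits for Coeff_Token, 4 bits for Total_Zeros/Run_Before) come from the text; the function names and the example symbol values are assumptions, and the mask and multiplexer of Fig. 22 are modeled here simply as shift and AND operations.

```python
COEFF_TOKEN_BITS = 7   # most significant field of the word
SMALL_SYMBOL_BITS = 4  # least significant field (Total_Zeros / Run_Before)
WORD_BITS = COEFF_TOKEN_BITS + SMALL_SYMBOL_BITS  # 11-bit memory word


def pack_word(coeff_token_symbol: int, small_symbol: int) -> int:
    """Concatenate a 7-bit and a 4-bit symbol into one 11-bit word."""
    if not 0 <= coeff_token_symbol < (1 << COEFF_TOKEN_BITS):
        raise ValueError("Coeff_Token symbol does not fit in 7 bits")
    if not 0 <= small_symbol < (1 << SMALL_SYMBOL_BITS):
        raise ValueError("Total_Zeros/Run_Before symbol does not fit in 4 bits")
    return (coeff_token_symbol << SMALL_SYMBOL_BITS) | small_symbol


def coeff_token_field(word: int) -> int:
    """Select the most significant 7 bits of the memory output."""
    return word >> SMALL_SYMBOL_BITS


def small_symbol_field(word: int) -> int:
    """Select the least significant 4 bits of the memory output."""
    return word & ((1 << SMALL_SYMBOL_BITS) - 1)


word = pack_word(0b1010101, 0b0110)  # hypothetical symbol values
```

Both fields are read from the same memory location, and the decoding table in use determines which field is forwarded as the decoded symbol.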
