Chapter 3 Binary Arithmetic Decoder Engine

3.2.3 Terminal decoding flow

The final AD mode is the terminal decoding flow, which is used by the syntax elements mb_type and end_of_slice_flag. Because the terminal decoding flow judges whether the current slice is complete, it runs only once or twice per macroblock and is therefore seldom used in the CABAD system. The pie chart in Figure 16 shows the percentage of usage of each AD mode. The terminal decoding flow accounts for approximately 0% of the usage, so its efficiency is not a major concern.

According to the flow chart in Figure 11, the terminal decoding flow needs only one comparator, which compares codIOffset with codIRange, and the renormalization part. Hence, it shares the comparator and the renormalization part with the normal decoding flow, as shown in Figure 18.

Based on the algorithm of the terminal decoding flow, codIOffset and codIRange do not need to be renormalized when the interval is sub-divided to the LPS. This is the purpose of the “skip renormalization” control signal in Figure 18. “Skip renormalization” is activated in two cases: when the bypass decoding flow is used, and when the LPS condition occurs while the terminal decoding flow is applied. The signal controls one 9-bit 2-to-1 multiplexer for codIRange and one 10-bit 2-to-1 multiplexer for codIOffset to skip the renormalization part. The simple logic function is shown in Figure 24.

Figure 24 Simple logic function for “skip renormalization” control signal
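The terminal decoding flow and the “skip renormalization” condition described above can be sketched in software as follows. This is a minimal sketch, not the hardware design of Figure 18: it assumes the H.264/AVC behaviour in which codIRange is reduced by 2 before the comparison and renormalization shifts while codIRange is below 256, and it treats the "bin = 1" branch as the LPS condition of the text. The names `read_bit`, `renormalize`, `decode_terminate`, and `skip_renormalization` are hypothetical.

```python
def renormalize(cod_i_range, cod_i_offset, read_bit):
    """Shift range and offset left until the range is at least 256.

    read_bit is a hypothetical callable returning the next bitstream bit.
    """
    while cod_i_range < 256:
        cod_i_range <<= 1
        cod_i_offset = (cod_i_offset << 1) | read_bit()
    return cod_i_range, cod_i_offset


def decode_terminate(cod_i_range, cod_i_offset, read_bit):
    """Return (bin, codIRange, codIOffset) for one terminal decoding step."""
    cod_i_range -= 2
    if cod_i_offset >= cod_i_range:
        # Taken as the LPS condition of the text: renormalization is skipped.
        return 1, cod_i_range, cod_i_offset
    cod_i_range, cod_i_offset = renormalize(cod_i_range, cod_i_offset, read_bit)
    return 0, cod_i_range, cod_i_offset


def skip_renormalization(mode, bin_value):
    """Control signal: active for bypass, or for terminal mode with bin = 1."""
    return mode == "bypass" or (mode == "terminal" and bin_value == 1)
```

In this sketch a single comparison (`cod_i_offset >= cod_i_range`) decides the bin, which mirrors the single shared comparator mentioned in the text.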

3.3 2nd Decoding Flow - Architecture of the Binarization Engine

The binarization engine is the second-level decoding flow of the CABAD architecture. We also treat it as the top module of our proposed architecture. It schedules the reading and writing back of the context model and selects the arithmetic decoding flow. It also controls the syntax element buffer (SEB) to record the required coefficients in the row-storage SRAM (RS SRAM). In addition to these tasks, the main work of the binarization engine is to read the bin string and find the matching syntax element value by means of the unary, truncated unary, fixed-length, Exp-Golomb, and specially defined codes.

State 0: wait for the request of the syntax parser
State 1: check the AD mode
    If (normal decoding): get the context model
    If (bypass decoding or terminal decoding): skip getting the context model
State 2: decode the bin string
    Match with a defined code: binIdx = 0
    No match: binIdx = binIdx + 1, request AD to produce the next bin
State 3: generate the SE value
State 2 to State 3: bin string matches the defined code

Figure 25 finite state machine of the binarization engine

Figure 25 shows the finite state machine (FSM) of the binarization engine. The first state (state 0) is the stand-by state: the binarization engine waits for a request from the syntax parser and, once the CABAD system is activated, jumps to “state 1”. “State 1” checks the type of AD. If it is the normal decoding, the binarization engine reads the neighbor information from the RS SRAM, generates the context model index, and reads the context model from the context SRAM. Then the FSM jumps to “state 2”. “State 2” traverses the binary tree that we defined in Section 2.3. Based on the bin index (binIdx), the bin string is compared with the binary tree. If the bin string does not match any entry, the binarization engine increases binIdx and requests the AD to produce the next bin value, and the comparison is repeated until a match is found. When a match is found, the suitable value of the syntax element is generated in “state 3”, binIdx is reset to “0”, and the FSM waits for the request of the next syntax element.
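The four states of Figure 25 can be illustrated with a small software model. This is a sketch under stated assumptions rather than the proposed hardware: the binary tree of Section 2.3 is replaced by a plain unary code as a stand-in, the context-model read is reduced to a boolean, and the names `State`, `unary_lookup`, `decode_syntax_element`, and `next_bin` are hypothetical.

```python
from enum import Enum


class State(Enum):
    WAIT_REQUEST = 0   # state 0: stand-by
    CHECK_MODE = 1     # state 1: check the AD mode
    DECODE_BINS = 2    # state 2: match the bin string
    GENERATE_SE = 3    # state 3: generate the SE value


def unary_lookup(bins):
    """Stand-in for the binary tree: a unary code word is 1...1 followed by 0."""
    return len(bins) - 1 if bins[-1] == 0 else None


def decode_syntax_element(mode, next_bin, lookup=unary_lookup):
    """Return (value, context_loaded, visited_states) for one syntax element.

    next_bin(bin_idx) is a hypothetical callable that asks the AD for a bin.
    """
    trace = [State.WAIT_REQUEST, State.CHECK_MODE]
    # Only the normal decoding flow fetches a context model.
    context_loaded = mode == "normal"

    trace.append(State.DECODE_BINS)
    bins, bin_idx = [], 0
    while True:
        bins.append(next_bin(bin_idx))
        value = lookup(bins)
        if value is not None:
            bin_idx = 0          # match: reset binIdx
            break
        bin_idx += 1             # no match: request the next bin

    trace.append(State.GENERATE_SE)
    return value, context_loaded, trace
```

For example, feeding the bins 1, 1, 1, 0 through `decode_syntax_element("normal", ...)` yields the value 3 with the context model loaded, while the same call in bypass mode leaves the context model untouched.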

0 1 5 6

2 4 7 12

3 8 11 13

9 10 14 15

Figure 26. Zig-Zag scan
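The grid of Figure 26 gives, for each position of a 4x4 block, its index in the zig-zag scan. A minimal sketch of how a list of coefficients in scan order maps back to block positions is shown here; the helper names `scan_to_block` and `scan_positions` are hypothetical.

```python
# Scan index stored at each (row, column) position, as in Figure 26.
ZIGZAG_4X4 = [
    [0, 1, 5, 6],
    [2, 4, 7, 12],
    [3, 8, 11, 13],
    [9, 10, 14, 15],
]


def scan_to_block(coeffs):
    """Place 16 coefficients given in zig-zag order into a 4x4 block."""
    if len(coeffs) != 16:
        raise ValueError("expected 16 coefficients")
    return [[coeffs[ZIGZAG_4X4[r][c]] for c in range(4)] for r in range(4)]


def scan_positions():
    """Return the (row, column) position visited at each scan index."""
    positions = [None] * 16
    for r in range(4):
        for c in range(4):
            positions[ZIGZAG_4X4[r][c]] = (r, c)
    return positions
```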

In the residual decoding, the binarization engine decodes the residual data into the residual coefficients. Table 4 lists the five types of residual data. The coefficients are ordered by the zig-zag scan shown in Figure 26. The five types of residual coefficients are as follows.

The first type is the luminance DC block. For luminance, the 16x16 MB is composed of sixteen sub-macroblocks of 4x4 pixels. Thus, there are sixteen DC values in each MB, as shown in Figure 27.

Figure 27 Hadamard transform for Luma DC

The second type is the luminance AC block. For luminance, the 16x16 MB is composed of sixteen sub-macroblocks of 4x4 pixels. Each 4x4 sub-macroblock is composed of one DC value and fifteen AC values. Thus, the binarization engine has to compute the fifteen coefficients of the AC block sixteen times, as shown in Figure 28.

DC 0 4 5

1 3 6 11

2 7 10 12

8 9 13 14

Figure 28 Hadamard transform for Luma AC
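The fifteen AC indices of Figure 28 appear to be the zig-zag positions of Figure 26 with the DC position removed and the remaining indices shifted down by one. The sketch below derives them on that assumption; the name `ac_scan_grid` is hypothetical.

```python
# Scan index at each (row, column) position, as in Figure 26.
ZIGZAG_4X4 = [
    [0, 1, 5, 6],
    [2, 4, 7, 12],
    [3, 8, 11, 13],
    [9, 10, 14, 15],
]


def ac_scan_grid():
    """Return the 4x4 grid of AC scan indices; the DC position holds None."""
    return [[None if idx == 0 else idx - 1 for idx in row] for row in ZIGZAG_4X4]


# Reading the grid row by row and dropping the DC position gives the
# fifteen AC indices in raster order.
AC_RASTER = [idx for row in ac_scan_grid() for idx in row if idx is not None]
```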

The third type is the luminance block without separate DC and AC blocks. Thus, the binarization engine computes the coefficients sub-macroblock by sub-macroblock, as shown in Figure 29.

Figure 29 Hadamard transform for Luma

The fourth type is the chrominance DC block. For chrominance, the 8x8 MB is composed of four sub-macroblocks of 4x4 pixels. Thus, there are four DC values in each MB, as shown in Figure 30.

Figure 30 Hadamard transform for Chroma DC

The fifth type is the chrominance AC block. For chrominance, the 8x8 MB is composed of four sub-macroblocks of 4x4 pixels. Each 4x4 sub-macroblock is composed of one DC value and fifteen AC values. Thus, the binarization engine has to compute the fifteen coefficients of the AC block four times, as shown in Figure 31.

DC 0 4 5

1 3 6 11

2 7 10 12

8 9 13 14

Figure 31 Hadamard transform for Chroma AC
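The coefficient counts stated for the five residual types can be collected in a small table. This is only a bookkeeping sketch: the type names and the function `coefficients_per_mb` are hypothetical, the counts follow the text above, and the value of sixteen coefficients per sub-macroblock for the third type is an assumption based on a 4x4 block with DC and AC kept together.

```python
# (number of blocks per MB, coefficients per block) for each residual type.
RESIDUAL_TYPES = {
    "luma_dc": (1, 16),     # sixteen DC values per MB
    "luma_ac": (16, 15),    # fifteen AC values, sixteen times
    "luma_4x4": (16, 16),   # assumed: full 4x4 block, DC and AC together
    "chroma_dc": (1, 4),    # four DC values
    "chroma_ac": (4, 15),   # fifteen AC values, four times
}


def coefficients_per_mb(kind):
    """Return the number of coefficient slots of one residual type per MB."""
    blocks, per_block = RESIDUAL_TYPES[kind]
    return blocks * per_block
```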

Figure 32. Zero-skip architecture

According to our statistics, decoding the residual data takes more than 80% of the execution time of CABAD, so we focus on reducing this share. Hence, we propose the zero-skip method to improve the throughput.

Before the coefficients are decoded, significant_coeff_flag must be decoded first. The significant_coeff_flag buffer holds the flags of the sub-macroblock in zig-zag scan order. Figure 32 shows the zero-skip architecture for decoding the residual data. In the significant_coeff_flag buffer, a flag equal to ‘1’ marks a non-zero coefficient and a flag equal to ‘0’ marks a zero coefficient. Thus, we extract the indices of all non-zero significant_coeff_flag entries and schedule them. The scheduled indices are passed to the residual decoding operation, which produces the corresponding coefficient values, and each completed value is sent to its location through the 1-to-16 de-multiplexer. The indices whose significant_coeff_flag is zero consume no cycles when the current residual coefficients are decoded, which saves cycles in the residual decoding.
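The zero-skip idea can be illustrated with a short software model. It is a sketch under assumptions, not the architecture of Figure 32: one decoded coefficient is counted as one cycle, the actual coefficient decoding is abstracted into a hypothetical callable `decode_coeff`, and the function names are illustrative.

```python
def zero_skip_schedule(sig_flags):
    """Return the scan indices whose significant_coeff_flag equals 1."""
    return [idx for idx, flag in enumerate(sig_flags) if flag == 1]


def decode_residual_block(sig_flags, decode_coeff):
    """Decode only the scheduled indices; return (coefficients, cycles).

    decode_coeff(idx) is a hypothetical callable producing the coefficient
    value for scan index idx. Zero-flag indices are never visited.
    """
    coeffs = [0] * len(sig_flags)
    cycles = 0
    for idx in zero_skip_schedule(sig_flags):
        # Acts like the 1-to-16 de-multiplexer: route the value to slot idx.
        coeffs[idx] = decode_coeff(idx)
        cycles += 1
    return coeffs, cycles
```

With a flag buffer that has only two entries set to 1, this model spends two cycles instead of sixteen, which is the saving the zero-skip method targets.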

3.4 Summary

In this chapter, we have discussed the architecture of CABAD in the H.264/AVC system, where it is the entry point of the H.264/AVC decoder. We then proposed the two-level hierarchical decoding flow of the CABAD system, which consists of the arithmetic decoder as the first-level decoding and the binarization engine as the second-level decoding, and presented the corresponding hardware architecture. In summary, we have proposed three main methods to enhance throughput: the pipelined architecture for the normal decoding flow of the AD, the multi-symbol organization for the bypass decoding mode of the AD, and the zero-skip method for the residual decoding of the binarization engine.

In the document: Context-Adaptive Binary Arithmetic Decoder of H.264/AVC for Digital TV (pages 61-68)