
Chapter 1 Introduction

1.1 Motivation

H.264/AVC is the new video compression standard developed by the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group (MPEG). It promises to outperform the earlier MPEG-4 and H.263 standards by employing innovative techniques such as multiple reference frames, variable block size motion estimation, an in-loop de-blocking filter, and context-based adaptive binary arithmetic decoding (CABAD). At the same quality, an H.264/AVC system can reduce the bit-rate by up to 50% compared with previous video standards such as H.263 and MPEG-4.

Because of its high quality and compression gain, more and more consumer products, such as digital cameras, video telephony, and portable DVD players, adopt H.264/AVC as their video standard.

H.264/AVC contains two entropy decoders: context-based adaptive variable length decoding (CAVLD) and context-based adaptive binary arithmetic decoding (CABAD). CAVLD is the simpler entropy coding method, intended for the simpler profile. Compared with CABAD, it saves about 10% of the execution time at the cost of a bit-rate increase of about 7%. Because of its bit-rate saving, CABAD is the advanced choice for the newest video applications, which demand massive capacity.

Profiling the H.264/AVC reference software (JM9.2) shows that, for a main profile QCIF video stream at 30 fps, the run time in CABAD entropy decoding mode is about 10% longer than in CAVLD mode. Accelerating the CABAD architecture is therefore necessary for the H.264/AVC main profile. We propose a novel architecture for a high-speed RAM-based CABAD, which keeps the low bit-rate of this entropy decoder while improving its speed.

The bottleneck of our CABAD design is the throughput of each decoding mode. Pipelining the arithmetic decoder is the first task for the CABAD architecture. To apply a pipelined architecture in CABAD, scheduling the RAM-based context model so that fetching and write-back can take place at the same time becomes an important issue. Our proposed implementation overcomes these pipeline problems.

1.2 Organization of this thesis

This thesis is organized as follows. Chapter 2 presents the algorithm of CABAD, which contains the arithmetic decoder for the first-level decoding and the binarization engine for the second-level decoding. It also presents the criteria for the memory system in our design. Chapter 3 shows the proposed architecture of our CABAD design and gives an in-depth discussion of the arithmetic decoder and the binarization engine. Chapter 4 describes the realization of the memory system in detail. Chapter 5 presents the verification method and the simulation results. The last chapter gives a brief conclusion and discusses future work.

Chapter 2

Algorithm of CABAD for H.264/AVC

In this chapter, we present the algorithm of CABAD. CABAD is composed of the arithmetic decoding process, the binarization process, and the context model. The arithmetic decoding process reads the bit-stream compressed by the H.264 encoder and computes the bins, which the binarization process uses to decode the corresponding syntax elements (SEs). Because the context model records the historical probability, CABAD needs fewer bits to decode all the syntax elements.

This chapter is organized as follows. Section 2.1 gives an overview of the CABAD decoding flow and shows the two-level decoding process. Section 2.2 describes the binary arithmetic coding algorithm in more detail: we briefly introduce the binary arithmetic encoding process and then explain our topic, the binary arithmetic decoding process, in detail. Section 2.3 introduces all kinds of binarization processes: the unary, the truncated unary, the fixed-length, the Exp-Golomb, and the defined code organization. Section 2.4 presents the context models associated with the different SEs. The final section shows how to obtain the neighboring SEs used to index the suitable context model.

2.1 Overview of CABAD flow

Figure 1. Block diagram of H.264/AVC for baseline profile

Figure 2. Block diagram of H.264/AVC for main profile

The entropy decoder is the first stage of the H.264/AVC system. Two entropy decoders are available: variable length decoding (VLD) and context-based adaptive binary arithmetic decoding (CABAD).

Figure 1 shows the block diagram of H.264/AVC for the baseline profile. The baseline profile adopts VLD, which consists of the universal variable length decoder (UVLD) and the context-based adaptive variable length decoder (CAVLD), to decode the MB information and the pixel coefficients. UVLD decodes not only MB information such as mb_type, coded_block_pattern, and intra_prediction_mode, but also MB coefficients such as mvd.

Because residual data decoding occupies over 50% of the entire execution time, the residual coefficients are decoded by the more efficient CAVLD architecture.

Figure 2 shows the block diagram of H.264/AVC for the main profile. The main profile offers an advanced alternative to VLD: CABAD can be used in place of UVLD and CAVLD. Thus, when the entropy decoding flag selects CABAD, the H.264 system needs only CABAD to decode all MB information and pixel data.

Figure 3 Bit-stream structure of H.264/AVC (figure labels: NAL layer, NAL unit, syntax element)

In our system architecture, the syntax parser block in Figure 2 decodes the bit-stream at the NAL layer, picture layer, and slice layer, as shown in Figure 3.

The syntax parser is also the top module that controls all sub-systems such as CABAD, VLD, intra-prediction, inter-prediction, and IDCT. Hence, CABAD is a passive unit: it is invoked by the syntax parser and decodes the bit-stream of the macroblock layer in Figure 3. The bit-stream is also supplied through the syntax parser, which fetches it from the bit-stream SRAM.

In this section, we introduce each building block of CABAD and the execution flow of the CABAD system.

Before introducing the decoder, we have to explain how the bit-stream to be decoded is encoded. Figure 4 shows the block diagram of the CABAC encoder. Starting from the left side of the figure, every SE of H.264/AVC is converted into binary symbols called “bins” when it enters the CABAC encoding process. Except for SEs of the fixed-length coding type, all SEs are coded by the binarization process defined in Section 2.3. The resulting bin string is then encoded into the bit-stream by the binary arithmetic coder, which has three modes: the normal, bypass, and terminal encoding processes. The terminal encoding process is seldom applied in the CABAC system; it is executed only once per macroblock (MB), when the current MB is complete, so we ignore its influence.

Figure 4. CABAC encoder block diagram [3] (figure labels: bin value for context model update, bit stream)

The normal and bypass encoding processes are the two main modes of the binary arithmetic coder. The bypass encoding process does not need to refer to the context model, because the value is equally likely to be logical “1” or “0”. The normal encoding process has to refer to the associated context model, which depends on the SE type and the bin index.

In the H.264/AVC decoder, the decoding sequence of CABAD is the reverse of the CABAC encoder. Figure 5 shows the CABAD block diagram. First, the binary arithmetic decoder reads the bit-stream and converts it into a bin string. The binarization process then reads the bin string and decodes it into the SE using one of five kinds of decoding flows, which are defined in Section 2.3. Although the execution sequences of CABAC and CABAD are the reverse of each other, the context modeler is still determined by the binarization and the SE.

Figure 5. CABAD block diagram
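The two-level flow of Figure 5 can be sketched in a few lines. This is a minimal illustration, not the architecture proposed in this thesis: it assumes the first level (the arithmetic decoder) is available as a callable that returns one bin per call, and it uses unary debinarization (a value N coded as N ones followed by a zero) as the second level. The names `decode_unary` and `next_bin` are hypothetical.

```python
def decode_unary(next_bin):
    """Second-level decoding: rebuild a syntax element from unary-coded bins.

    `next_bin` is a hypothetical stand-in for the first-level arithmetic
    decoder; each call returns the next decoded bin (0 or 1).
    """
    value = 0
    # Count the leading ones; the first zero terminates the code word.
    while next_bin() == 1:
        value += 1
    return value


# Usage: a fixed bin string stands in for the arithmetic decoder output.
bins = iter([1, 1, 1, 0])
syntax_element = decode_unary(lambda: next(bins))
```

The binarization engine pulls bins one at a time, so the arithmetic decoder only runs as often as the current syntax element requires.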

2.2 Algorithm of arithmetic code

In this section, we first introduce the basic arithmetic encoding algorithm to explain how the arithmetic code is organized. We then describe the basic arithmetic decoding algorithm and present the advanced algorithm for H.264/AVC, which is more efficient because it uses integer operations with a multiplication-free, table-based architecture.

2.2.1 Basic binary arithmetic encoding algorithm

This section introduces the basic arithmetic encoding algorithm in order to explain binary arithmetic coding and how to decode the bit-stream generated by the encoder. According to the probability, the binary arithmetic encoder divides the current range into two sub-intervals, named MPS (Most Probable Symbol) and LPS (Least Probable Symbol). Figure 6(a) shows the definition of the sub-intervals: the lower part is MPS and the upper part is LPS. The range of MPS is denoted rMPS and that of LPS is denoted rLPS; both are defined in Eq. 1, where ρMPS and ρLPS are the probabilities of MPS and LPS. The two probabilities sum to one because the probability of the whole current interval is one.

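Figure 6 (a) Definition of MPS and LPS, (b) Sub-divided interval of MPS, and (c) Sub-divided interval of LPS

$$
r_{MPS} = \mathit{range} \times \rho_{MPS}, \qquad r_{LPS} = \mathit{range} \times \rho_{LPS} \tag{Eq. 1}
$$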

Each bin is identified as either MPS or LPS. If the bin is equal to “1”, the next interval belongs to MPS. Figure 6(b) shows this case: the lower part of the current interval becomes the next interval, its range is re-defined as rMPS, and ρMPS is increased. Conversely, the next interval belongs to LPS when the bin is equal to “0”. Figure 6(c) shows this case: the upper part of the current interval becomes the next interval, its range is re-defined as rLPS, and ρMPS is decreased.
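A small numerical sketch may help make the subdivision concrete. It assumes that the two sub-ranges are the current range scaled by the respective probabilities (rMPS = range × ρMPS and rLPS = range × ρLPS), as the surrounding text implies; the function name `subdivide` is hypothetical.

```python
def subdivide(cur_range, p_mps):
    """Split the current range into the MPS and LPS sub-ranges.

    Assumes r_mps = range * p_mps and r_lps = range * p_lps,
    with p_mps + p_lps = 1.
    """
    if not 0.0 < p_mps < 1.0:
        raise ValueError("p_mps must lie strictly between 0 and 1")
    p_lps = 1.0 - p_mps
    r_mps = cur_range * p_mps   # lower part of the interval
    r_lps = cur_range * p_lps   # upper part of the interval
    return r_mps, r_lps


# Usage: a unit interval with an MPS probability of 0.75.
r_mps, r_lps = subdivide(1.0, 0.75)
```

With these inputs the MPS sub-range covers three quarters of the interval and the LPS sub-range the remaining quarter. Both values require a multiplication, which is the cost that the table-based method of the standard avoids.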

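We arrange the algorithm in Eq. 2 and Eq. 3 as follows.

Most probable symbol (MPS) condition:

$$
\begin{aligned}
\text{MPS probability of the next interval:}\quad & \rho_{MPS\_NEXT} = \rho_{MPS} + \rho_{Inc} \\
\text{Range of the next interval:}\quad & \mathit{range}_{NEXT} = r_{MPS} \\
\text{Value of the next interval:}\quad & \mathit{codlOffset}_{NEXT} = r_{MPS} \times \rho_{MPS\_NEXT}
\end{aligned}
\tag{Eq. 2}
$$

ρInc: The increment of ρMPS.

Least probable symbol (LPS) condition:

$$
\begin{aligned}
\text{MPS probability of the next interval:}\quad & \rho_{MPS\_NEXT} = \rho_{MPS} - \rho_{Dec} \\
\text{Range of the next interval:}\quad & \mathit{range}_{NEXT} = r_{LPS} \\
\text{Value of the next interval:}\quad & \mathit{codlOffset}_{NEXT} = \mathit{codlOffset} + r_{LPS} \times \rho_{MPS\_NEXT}
\end{aligned}
\tag{Eq. 3}
$$

ρDec: The decrement of ρMPS.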

codlOffset is located at the boundary between the current MPS and LPS ranges. Depending on codlOffset, the arithmetic encoder produces the bit-stream that achieves the compression.

2.2.2 Binary arithmetic decoding algorithm for H.264/AVC

In this section, we first introduce the basic algorithm in Section 2.2.2.1. Following the H.264/AVC standard [1], we then present the advanced algorithm in Section 2.2.2.2, which executes the binary arithmetic decoder efficiently by means of table-based probability and range computation.

2.2.2.1 Basic binary arithmetic decoding algorithm

The binary arithmetic decoder decompresses the bit-stream into bin values, which the binarization process uses to restore the syntax elements. The decoding process is similar to the encoding process: both are executed by recursive interval subdivision. Their coding flows nevertheless differ, as described below.

The initial range and the MPS probability ρMPS must be defined when the binary arithmetic decoder starts. The value of codlOffset is taken from the bit-stream and compared with rMPS. The MPS and LPS conditions differ from the definitions used in the encoder; Figure 7 illustrates the subdivision for both conditions.

If codlOffset is less than rMPS, the MPS condition holds. The range of the next interval is equal to rMPS, the probability of MPS (ρMPS) is increased, and the bin value “1” is output. The next value of codlOffset is the same as the current one. Figure 7(a) illustrates the MPS condition.

If codlOffset is greater than or equal to rMPS, the LPS condition holds. The range of the next interval is rLPS, the probability of MPS (ρMPS) is decreased, and the bin value “0” is output. The next value of codlOffset is obtained by subtracting rMPS from the current codlOffset. Figure 7(b) illustrates the LPS condition.

Figure 7 (a) Result of MPS subdivision and (b) Result of LPS subdivision

We arrange the algorithm in Eq. 4 and Eq. 5 as follows.

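Most probable symbol (MPS) condition: if codlOffset < rMPS

$$
\begin{aligned}
\text{Bin value:}\quad & \mathit{bin} = 1 \\
\text{Value of the next codlOffset:}\quad & \mathit{codlOffset}_{NEXT} = \mathit{codlOffset} \\
\text{MPS probability of the next interval:}\quad & \rho_{MPS\_NEXT} = \rho_{MPS} + \rho_{Inc} \\
\text{Range of the next interval:}\quad & \mathit{range}_{NEXT} = r_{MPS}
\end{aligned}
\tag{Eq. 4}
$$

ρInc: The increment of ρMPS.

Least probable symbol (LPS) condition: if codlOffset >= rMPS

$$
\begin{aligned}
\text{Value of the next codlOffset:}\quad & \mathit{codlOffset}_{NEXT} = \mathit{codlOffset} - r_{MPS} \\
\text{MPS probability of the next interval:}\quad & \rho_{MPS\_NEXT} = \rho_{MPS} - \rho_{Dec} \\
\text{Range of the next interval:}\quad & \mathit{range}_{NEXT} = r_{LPS}
\end{aligned}
\tag{Eq. 5}
$$

ρDec: The decrement of ρMPS.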
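The basic decoding step described above can be sketched as a single function. This is a floating-point illustration only, not the hardware algorithm: the step sizes `p_inc` and `p_dec` and the clamping bounds are assumptions chosen for the example, since the basic algorithm does not fix them, and the function name is hypothetical.

```python
def basic_decode_step(offset, cur_range, p_mps, p_inc=0.02, p_dec=0.02):
    """One step of the basic (floating-point) binary arithmetic decoder.

    Returns (bin_val, next_offset, next_range, next_p_mps).
    p_inc, p_dec and the clamp to [0.01, 0.99] are illustrative assumptions.
    """
    r_mps = cur_range * p_mps        # range of the MPS sub-interval
    r_lps = cur_range - r_mps        # range of the LPS sub-interval
    if offset < r_mps:
        # MPS condition: offset unchanged, probability of MPS grows.
        return 1, offset, r_mps, min(p_mps + p_inc, 0.99)
    # LPS condition: offset moves into the upper part, probability shrinks.
    return 0, offset - r_mps, r_lps, max(p_mps - p_dec, 0.01)


# Usage: an offset of 0.2 in a unit range with p_mps = 0.5 falls in the MPS part.
bin_val, offset, cur_range, p_mps = basic_decode_step(0.2, 1.0, 0.5, 0.25, 0.25)
```

Each call needs a multiplication and floating-point comparisons, which motivates the integer, table-based variant adopted by the standard.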

2.2.2.2 Advanced binary arithmetic decoding algorithm for H.264/AVC

Section 2.2.2.1 introduced the basic algorithm of the binary arithmetic decoder. Although it achieves a high compression gain, the algorithm relies on floating-point operations, so hardware complexity becomes a problem when the binary arithmetic decoder is implemented. Eq. 1 requires two multipliers to compute rMPS and rLPS, and the next values of codlOffset, range, and the probability require floating-point adders and comparators. The multipliers and floating-point operations make the circuit complex and costly in hardware. We therefore adopt the low-complexity algorithm of the H.264/AVC standard [1] to implement the CABAD circuit.

To improve the coding efficiency, the H.264/AVC system provides three binary arithmetic decoding modes: the normal, bypass, and termination decoding flows. We describe each algorithm in turn.

The first algorithm is the normal decoding process, shown in Figure 8. Two main factors dominate the hardware efficiency. One is the multiplication range×ρLPS, which gives rLPS, and the other is the calculation of the probability ρLPS.

For the first factor, Eq. 1 uses one multiplier to find the range of LPS (rLPS). In the H.264/AVC standard, a table-based method replaces the multiplication. In the normal decoding flowchart, codlRangeLPS is looked up in the table rangeTabLPS using two indexes, pStateIdx and qCodlRangeIdx. pStateIdx represents the probability of MPS (ρMPS) and is obtained from the context model. qCodlRangeIdx is the quantized value of the current range (codlRange), which this table divides into four parts.

The second factor is the estimation of ρMPS. From Section 2.2.2.1, we know that ρMPS is increased when the MPS condition occurs and decreased when the LPS condition occurs, but that section gives no rule for how much the value should be increased or decreased. The flowchart in Figure 8 also uses a table-based method for the probability estimation. Depending on the sub-interval, the next probability is computed with the transIdxLPS table when the LPS condition occurs and with the transIdxMPS table when the MPS condition occurs. Both tables approximate the probability with sixty-four quantized values indexed by the probability of the current interval.

Figure 8 Flowchart of the normal decoding flow [1]. Operations recovered from the flowchart:

qCodIRangeIdx = (codIRange>>6) & 3

codIRangeLPS = rangeTabLPS[pStateIdx][qCodIRangeIdx]

codIRange = codIRange - codIRangeLPS

Decision: codIOffset >= codIRange? If yes: binVal = !valMPS, codIOffset = codIOffset - codIRange, codIRange = codIRangeLPS

Decision: pStateIdx == 0? If yes: valMPS = 1 - valMPS

pStateIdx = transIdxLPS[pStateIdx]
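The normal decoding flow can be sketched in software as follows. This is a hedged illustration of the table-based step, not the proposed hardware: the MPS branch (binVal = valMPS, pStateIdx = transIdxMPS[pStateIdx]) follows the standard's decoding process, renormalization is deliberately left out, and the two-state tables below are toy values for demonstration only, not the sixty-four-entry tables of the standard. The function and dictionary key names are hypothetical.

```python
def decode_decision(engine, ctx, range_tab_lps, trans_idx_lps, trans_idx_mps):
    """One normal-mode decoding step (renormalization omitted).

    engine: dict with integer 'cod_range' and 'cod_offset'.
    ctx:    dict with 'p_state_idx' and 'val_mps' (the context model entry).
    """
    q_idx = (engine["cod_range"] >> 6) & 3             # quantize range to 4 cells
    range_lps = range_tab_lps[ctx["p_state_idx"]][q_idx]
    engine["cod_range"] -= range_lps                   # range of the MPS part
    if engine["cod_offset"] >= engine["cod_range"]:
        # LPS condition
        bin_val = 1 - ctx["val_mps"]
        engine["cod_offset"] -= engine["cod_range"]
        engine["cod_range"] = range_lps
        if ctx["p_state_idx"] == 0:
            ctx["val_mps"] = 1 - ctx["val_mps"]        # swap MPS meaning
        ctx["p_state_idx"] = trans_idx_lps[ctx["p_state_idx"]]
    else:
        # MPS condition
        bin_val = ctx["val_mps"]
        ctx["p_state_idx"] = trans_idx_mps[ctx["p_state_idx"]]
    return bin_val


# Toy 2-state tables, hypothetical values for illustration only.
TOY_RANGE_TAB_LPS = [[128, 176, 208, 240], [100, 120, 140, 160]]
TOY_TRANS_IDX_LPS = [0, 0]
TOY_TRANS_IDX_MPS = [1, 1]

engine = {"cod_range": 510, "cod_offset": 100}
ctx = {"p_state_idx": 0, "val_mps": 0}
first_bin = decode_decision(engine, ctx, TOY_RANGE_TAB_LPS,
                            TOY_TRANS_IDX_LPS, TOY_TRANS_IDX_MPS)
```

Passing the tables as parameters keeps the sketch independent of the exact table contents; a real decoder would follow each call with the renormalization step.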

In the basic binary arithmetic decoder, the interval subdivision uses floating-point operations, which increases circuit complexity in a practical implementation. The advanced algorithm for H.264/AVC adopts integer operations instead. Because the range of the next interval is smaller than that of the current interval, a renormalization step keeps codlRange and codlOffset at a fixed scale. Figure 9 shows the flowchart of renormalization. To make the integer operation possible, the MSB of codlRange is always kept at “1”. If the MSB of codlRange is “0”, codlRange is shifted left until its MSB is “1”. For each bit position by which codlRange is shifted, codlOffset is also shifted left and its LSB is filled with the next bit from the bit-stream.

Figure 9 Flowchart of renormalization [1]
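Renormalization can be sketched as a short loop. The sketch assumes the 9-bit range representation of the H.264/AVC standard, so that "MSB equal to 1" means the range is at least 0x100; the bit-stream is modelled as a plain Python iterator of bits, which is an assumption made for illustration, and the function name is hypothetical.

```python
def renorm_d(cod_range, cod_offset, bits):
    """Shift range and offset left until the 9-bit range has its MSB set.

    bits: iterator yielding the next bit-stream bits (0 or 1).
    Returns the renormalized (cod_range, cod_offset).
    """
    while cod_range < 0x100:                        # MSB (bit 8) is still 0
        cod_range <<= 1
        cod_offset = (cod_offset << 1) | next(bits)  # fill LSB from the stream
    return cod_range, cod_offset


# Usage: a range of 0x40 needs two shifts, consuming two bits (1 then 0).
cod_range, cod_offset = renorm_d(0x40, 0x05, iter([1, 0, 1]))
```

The number of loop iterations equals the number of leading zeros of the range, so a hardware design can replace the loop with a leading-zero detector and a single shifter.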

The second algorithm is the bypass decoding process, which is applied to specific syntax elements such as abs_mvd, significant_coeff_flag, last_significant_coeff_flag, and coeff_abs_level_minus1. The probabilities of MPS and LPS are equal, that is, both are 0.5, so it is unnecessary to refer to the context model during decoding. Figure 10 shows the flowchart of the bypass decoding flow. Compared with Figure 8, the bypass decoding process does not estimate the probability of the next interval, so no probability computation appears in it. codlRange does not change, which means that the bypass decoding needs no renormalization. The process is implemented with a single subtraction. Because this algorithm is very simple, we will exploit it in the architecture to speed up the CABAD system.

Figure 10 Flowchart of the bypass decoding flow [1]
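A minimal sketch of the bypass decoding flow is given below. It follows the standard's flow, in which codIOffset is first shifted left by one bit taken from the bit-stream and is then compared with the unchanged codIRange; the bit-stream is modelled as a Python iterator of bits and the function name is hypothetical.

```python
def decode_bypass(cod_range, cod_offset, bits):
    """One bypass decoding step; the range is left unchanged.

    bits: iterator yielding the next bit-stream bits (0 or 1).
    Returns (bin_val, next_cod_offset).
    """
    cod_offset = (cod_offset << 1) | next(bits)   # take one bit from the stream
    if cod_offset >= cod_range:
        return 1, cod_offset - cod_range          # the single subtraction
    return 0, cod_offset


# Usage: offset 200 with next bit 1 becomes 401, which exceeds a range of 256.
bin_val, next_offset = decode_bypass(256, 200, iter([1]))
```

No context model entry is read or written here, which is why this mode is the natural candidate for decoding several bins per cycle.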

Figure 11 Flowchart of the terminal decoding flow [1]. Operations recovered from the flowchart:

DecodeTerminate: codIRange = codIRange-2

Decision: codIOffset >= codIRange? If yes: binVal = 1. If no: binVal = 0, then RenormD

Done

The third algorithm is the termination decoding process. Figure 11 shows the flowchart of the terminal decoding flow. This process is also very simple, although it has more decoding steps than the bypass decoding process, and it does not need the context model for the probability. The next codlRange is always the current codlRange minus two, regardless of whether the subdivision condition belongs to MPS. When codlOffset is smaller than codlRange (the MPS condition), the final values of codlRange and codlOffset must be renormalized by RenormD, as shown in the figure. The architecture for this flowchart consists of one constant subtraction, one comparator, and one renormalization. The termination decoding process is used to detect whether the current slice has ended. It occurs only once per macroblock, so it is seldom used over the whole decoding process, and its efficiency has only a slight effect on the CABAD system. We therefore focus on the first two algorithms in this section.
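For completeness, the termination decoding flow can be sketched as follows. The sketch assumes a 9-bit range (renormalize while the range is below 0x100) and models the bit-stream as a Python iterator of bits; both the function name and this bit-source interface are hypothetical.

```python
def decode_terminate(cod_range, cod_offset, bits):
    """One termination decoding step.

    Returns (bin_val, next_cod_range, next_cod_offset).
    """
    cod_range -= 2                                  # constant subtraction
    if cod_offset >= cod_range:
        return 1, cod_range, cod_offset             # end reached, no renormalization
    # Offset below range: output 0 and renormalize (RenormD).
    while cod_range < 0x100:
        cod_range <<= 1
        cod_offset = (cod_offset << 1) | next(bits)
    return 0, cod_range, cod_offset


# Usage: a range of 257 drops to 255, so one renormalization shift is needed.
result = decode_terminate(257, 10, iter([1]))
```

The step uses exactly the three components listed above: a constant subtraction, a comparison, and the renormalization loop.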

2.3 Binarization decoding flow

In this section, we focus on the decoding process of the binarization, which reads the bin string and looks up the corresponding syntax element. For H.264/AVC, CABAD adopts five kinds of binarization methods to decode all syntax elements. This section is organized as follows. Section 2.3.1 first shows the decoding flow of the unary code, which is the basic coding method. Section 2.3.2 shows the truncated unary code, which is an advanced unary coding method. It is applied in

