H.264/AVC Standard Overview - Design and Implementation of a Dual-Standard Video Decoder for Digital TV

CHAPTER 1 INTRODUCTION


[Figure: hierarchy diagram. The slice layer (Slice N) carries a quantization value and a sequence of macroblocks (Macroblock N); the macroblock layer carries the address, modes, motion vectors, and a sequence of blocks.]

Fig. 1.3 Hierarchical bit-stream structure of MPEG-2 video

1.3 H.264/AVC Standard Overview

H.264/AVC is a video-only standard. It achieves very low bit rates through several complex techniques: motion vectors with up to 1/4-sample resolution for luma and 1/8-sample resolution for chroma, variable block sizes from 4x4 to 16x16, multiple inter and intra prediction modes, and context-adaptive entropy coding with either CAVLC or CABAC.
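The fractional motion vector resolution mentioned above can be illustrated with a minimal sketch. It assumes that a luma motion vector component is stored as an integer in quarter-sample units, and that for 4:2:0 video (chroma at half the luma resolution) the same stored value is read in 1/8 chroma-sample units. The function names are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def split_luma_mv(mv_quarter: int) -> Tuple[int, int]:
    """Split a luma MV component (quarter-sample units) into
    (integer sample offset, fractional part in quarters, 0..3)."""
    # Arithmetic shift floors toward minus infinity, so the fraction
    # is always non-negative, also for negative vectors.
    return mv_quarter >> 2, mv_quarter & 3


def split_chroma_mv(mv_quarter: int) -> Tuple[int, int]:
    """Interpret the same stored value for 4:2:0 chroma (assumption):
    (integer chroma sample offset, fractional part in eighths, 0..7)."""
    return mv_quarter >> 3, mv_quarter & 7


if __name__ == "__main__":
    for mv in (9, -5):
        print(mv, split_luma_mv(mv), split_chroma_mv(mv))
```

For example, a stored value of 9 corresponds to 2 full luma samples plus a 1/4-sample fraction, and to 1 full chroma sample plus a 1/8-sample fraction.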

1.3.1 Profiles and Levels

H.264/AVC defines three profiles: baseline, main, and extended. A new profile, the "high profile" (Fidelity Range Extensions, FRExt), is currently being standardized and will be included as well. As Fig. 1.4 shows, I slices, P slices, and CAVLC are the basic parts of the H.264/AVC system. CABAC and interlace are supported in the main profile, while extra slice types such as SP and SI slices, together with data partitioning, are supported in the extended profile.

[Figure: diagram of the coding tools covered by the baseline, main, and extended profiles. Labeled tools: I slices, P slices, CAVLC, slice groups and ASO, redundant slices, B slices, weighted prediction, interlace, CABAC, SP and SI slices, data partitioning.]

Fig. 1.4 H.264 baseline, main, and extended profiles

The H.264 standard defines many more levels than MPEG-2. From level 1 to level 5.1, the maximum frame size ranges from 99 to 36,864 macroblocks, the maximum video bit rate ranges from 64 kbit/s to 240,000 kbit/s, and the motion vector range grows from +/-64 to +/-512 samples.
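To relate these frame-size limits to picture dimensions, the following sketch counts the 16x16 macroblocks in a frame and compares the count with the two limits quoted above. Only the level 1 and level 5.1 endpoints from the text are tabulated; the intermediate levels are deliberately omitted. The function and table names are hypothetical. A real conformance check would also have to consider the other level constraints, such as bit rate.

```python
MB_SIZE = 16  # a macroblock covers 16x16 luma samples

# Frame-size limits in macroblocks, taken from the text above.
# Intermediate levels are not listed here.
MAX_FRAME_SIZE_MBS = {"1": 99, "5.1": 36864}


def frame_size_in_macroblocks(width: int, height: int) -> int:
    """Number of macroblocks needed to cover a width x height frame."""
    mbs_wide = (width + MB_SIZE - 1) // MB_SIZE   # round up
    mbs_high = (height + MB_SIZE - 1) // MB_SIZE  # round up
    return mbs_wide * mbs_high


def fits_frame_size_limit(width: int, height: int, level: str) -> bool:
    """Check the frame-size limit only (not bit rate or other limits)."""
    return frame_size_in_macroblocks(width, height) <= MAX_FRAME_SIZE_MBS[level]


if __name__ == "__main__":
    print(frame_size_in_macroblocks(176, 144))    # QCIF
    print(frame_size_in_macroblocks(4096, 2304))
    print(fits_frame_size_limit(352, 288, "1"))   # CIF against level 1
```

A QCIF frame (176x144) needs 11 x 9 = 99 macroblocks, which is exactly the level 1 limit, and a 4096x2304 frame needs 256 x 144 = 36,864 macroblocks, which matches the level 5.1 limit.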

1.3.2 Encoder/Decoder Block Diagram

The encoding process for H.264/AVC video is more complex than that of MPEG-2 video. Fig. 1.5 shows a simple block diagram of the H.264/AVC encoder. As in an MPEG-2 encoder, a decoder is embedded inside the encoder to compute the results of motion compensation and intra prediction exactly as they will appear at the decoder side.

With this embedded decoder, the encoder can foresee the decoded result and calculate the residual pixel values precisely, without mismatch to the decoder. Besides inter prediction (motion compensation), intra prediction is also an important part, which reduces spatial redundancy to increase coding efficiency. The intra predictor can use several intra prediction modes, and the mode is chosen by a mode decision block that precedes the intra predictor. The motion compensator has many choices as well: the block sizes, the multiple reference frames, short-term or long-term prediction, and the motion vectors are all decided by the motion estimation block. With these two strong prediction paths, the residual pixel values, obtained by subtracting the predicted pixel values from the input video, are closer to zero. After the DCT transform and quantization, the entropy encoder finally reduces the coding redundancy effectively and outputs the coded pictures.
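The role of the embedded decoder can be shown with a deliberately simplified sketch. It is not the H.264/AVC algorithm: it works on a one-dimensional list of samples, predicts each sample from the previous reconstructed sample, and uses a plain scalar quantizer with no transform. The start value of 128 and all function names are assumptions made for illustration. The point is only that the encoder predicts from its own reconstruction, so its prediction is identical to the decoder's and the quantization error does not accumulate.

```python
from typing import List

START_VALUE = 128  # assumed initial predictor shared by both sides


def quantize(residual: int, step: int) -> int:
    """Round the residual to the nearest multiple of step (as a level)."""
    sign = -1 if residual < 0 else 1
    return sign * ((abs(residual) + step // 2) // step)


def dequantize(level: int, step: int) -> int:
    return level * step


def encode(samples: List[int], step: int) -> List[int]:
    levels = []
    recon = START_VALUE
    for sample in samples:
        prediction = recon                      # predict from reconstruction
        level = quantize(sample - prediction, step)
        levels.append(level)
        # Embedded decoder: rebuild exactly what the decoder will see.
        recon = prediction + dequantize(level, step)
    return levels


def decode(levels: List[int], step: int) -> List[int]:
    output = []
    recon = START_VALUE
    for level in levels:
        recon = recon + dequantize(level, step)  # same prediction as encoder
        output.append(recon)
    return output


if __name__ == "__main__":
    original = [130, 135, 150, 149, 90, 200]
    decoded = decode(encode(original, step=8), step=8)
    print(decoded)
    print(max(abs(a - b) for a, b in zip(original, decoded)))
```

In this sketch every decoded sample stays within half a quantization step of the original. If the encoder predicted from the original samples instead of the reconstructed ones, the two sides would use different predictions and the error would drift over time.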

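[Figure: encoder block diagram. The input video feeds motion estimation, motion compensation, intra mode decision, and intra prediction; the selected inter or intra prediction is subtracted from the input, and the residual passes through DCT, quantization, reorder, entropy encoder, and NAL. The embedded decoder consists of inverse quantization, IDCT, addition of the prediction, the loop filter, and the reference frame.]

Fig. 1.5 A simple block diagram of the H.264/AVC video encoder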

Compared with the encoder, the decoder is simpler because it lacks the decision parts, namely the motion estimator and the intra mode decision. Fig. 1.6 shows a simple block diagram of the H.264/AVC video decoder. After entropy decoding of the input bit-stream, inverse quantization and the IDCT transform the bit-stream data into residual pixel values. The predicted pixel values from the intra predictor or the motion compensator are added to these residuals, and an in-loop filter then smooths the blocking effects. The filtered result is sent both to the output buffer and to the frame buffer for future reference. The details of the decoding process are described in Section 2.2.

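[Figure: decoder block diagram. The input bit-stream passes through the entropy decoder, reorder, inverse quantization, and IDCT; the inter prediction from motion compensation or the intra prediction is added to the residual; the loop filter output goes to the output video and to the frame buffer.]

Fig. 1.6 A simple block diagram of the H.264/AVC video decoder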

1.3.3 Bit-stream structure

As with the MPEG-2 bit-stream, the H.264 bit-stream is structured hierarchically, from the block level up to the video sequence level. Unlike MPEG-2, which is an 8x8-block based system, the smallest block in the H.264/AVC system is a group of 4x4 pixels.

According to Annex B of the H.264 standard [7], and as Fig. 1.7 shows, all data are packed into NAL units. A NAL syntax element is attached to the front of each NAL unit. Each NAL unit contains a NAL unit header, which indicates the NAL unit type of the data that follows and therefore the type of RBSP (Raw Byte Sequence Payload) the unit contains.
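The NAL unit packing can be sketched as follows. The sketch assumes that the element attached in front of each NAL unit is the Annex B start code prefix (0x000001, optionally preceded by a zero byte), and that the one-byte NAL unit header holds a 1-bit forbidden_zero_bit, a 2-bit nal_ref_idc, and a 5-bit nal_unit_type. The helper names and the payload bytes in the example are made up for illustration, and emulation prevention bytes are not removed here.

```python
from typing import Dict, List

START_CODE = b"\x00\x00\x01"

# A few common nal_unit_type values; the table is intentionally partial.
NAL_TYPE_NAMES = {1: "non-IDR slice", 5: "IDR slice", 7: "SPS", 8: "PPS"}


def split_annexb(stream: bytes) -> List[bytes]:
    """Split an Annex B byte stream into NAL units (header + payload)."""
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = len(stream) if nxt == -1 else nxt
        # A NAL unit never ends in 0x00, so trailing zero bytes belong to
        # padding or to the leading zero byte of the next start code.
        unit = stream[begin:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        pos = nxt
    return units


def parse_nal_header(unit: bytes) -> Dict[str, int]:
    """Decode the first byte of a NAL unit."""
    first = unit[0]
    return {
        "forbidden_zero_bit": first >> 7,
        "nal_ref_idc": (first >> 5) & 0x03,
        "nal_unit_type": first & 0x1F,
    }


if __name__ == "__main__":
    # Dummy payload bytes; only the header bytes 0x67, 0x68, 0x65 matter.
    demo = (b"\x00\x00\x00\x01\x67\x42\x00\x1e"
            b"\x00\x00\x00\x01\x68\xce\x38\x80"
            b"\x00\x00\x01\x65\x88\x84")
    for nal in split_annexb(demo):
        header = parse_nal_header(nal)
        print(header, NAL_TYPE_NAMES.get(header["nal_unit_type"], "other"))
```

In this example the three header bytes decode to an SPS, a PPS, and an IDR slice, which is the order a decoder typically needs: the parameter sets arrive before the slice data that refers to them.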

There are several types of RBSP, for example the SPS (sequence parameter set), the PPS (picture parameter set), and the slice layer RBSP. The slice layer RBSP includes the slice header, the slice data, and sometimes the slice ID or the redundant picture count of the partitioned slice layer. The slice data is composed of macroblocks. Each macroblock consists of prediction modes (in an intra macroblock) or sub-macroblock types and motion vectors (in an inter macroblock), followed by the 4x4-block based residual data, which contributes the most to the size of the H.264 bit-stream.
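The containment described in this paragraph can be written down as a small data-structure sketch. It mirrors only the hierarchy named in the text (slice layer, slice header, slice data, macroblocks, 4x4 residual blocks). All class and field names are hypothetical and do not correspond to syntax element names in the standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Macroblock:
    is_intra: bool
    # Used by intra macroblocks (hypothetical representation).
    intra_pred_modes: List[int] = field(default_factory=list)
    # Used by inter macroblocks (hypothetical representation).
    sub_mb_types: List[int] = field(default_factory=list)
    motion_vectors: List[Tuple[int, int]] = field(default_factory=list)
    # One entry per coded 4x4 block, each holding up to 16 coefficients.
    residual_blocks: List[List[int]] = field(default_factory=list)


@dataclass
class SliceLayer:
    header: Dict[str, int]
    macroblocks: List[Macroblock]  # the slice data


def residual_coefficient_count(slice_layer: SliceLayer) -> int:
    """Total number of residual coefficients stored in a slice."""
    return sum(len(block)
               for mb in slice_layer.macroblocks
               for block in mb.residual_blocks)


if __name__ == "__main__":
    intra_mb = Macroblock(is_intra=True, intra_pred_modes=[2],
                          residual_blocks=[[0] * 16 for _ in range(16)])
    inter_mb = Macroblock(is_intra=False, sub_mb_types=[0],
                          motion_vectors=[(9, -5)],
                          residual_blocks=[[0] * 16 for _ in range(4)])
    demo_slice = SliceLayer(header={"slice_type": 0},
                            macroblocks=[intra_mb, inter_mb])
    print(residual_coefficient_count(demo_slice))
```

Counting the stored coefficients in this way makes it easy to see why the residual data dominates: a macroblock carries only a handful of modes or motion vectors, but potentially sixteen 4x4 luma blocks of coefficients.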

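[Figure: hierarchy diagram. The bit-stream is a sequence of NAL units, each preceded by a NAL syntax element and starting with a NAL unit header; a NAL unit carries an SPS-RBSP, a PPS-RBSP, or a slice layer RBSP; the slice layer RBSP consists of a slice header and slice data; the slice data is a sequence of macroblocks; each macroblock contains macroblock prediction or sub-macroblock prediction information and macroblock residual data.]

Fig. 1.7 Hierarchical structure of H.264 video bit-stream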
