CHAPTER 1 INTRODUCTION
[Figure: slice layer (Slice N, quantization value, Macroblock N) and macroblock layer (address, modes, motion vectors, blocks)]
Fig.1.3 Hierarchical bit-stream structure of MPEG-2 video
1.3 H.264/AVC Standard Overview
H.264/AVC is a video-only coding standard. Its extremely low data rate is achieved by several complex techniques and algorithms: motion vectors with up to 1/4-sample resolution for luma and 1/8-sample resolution for chroma, several block sizes from 4x4 to 16x16, several inter and intra prediction modes, and context-adaptive entropy coding with either CAVLC or CABAC.
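As a small illustration of the fractional motion vector resolution mentioned above, the following sketch splits one motion vector component into an integer sample offset and a fractional index. It assumes the component is stored as an integer in quarter-sample units for luma and that, for 4:2:0 video, the same integer is read in eighth-sample units for chroma. The function names are hypothetical and are not taken from the standard.

```python
def split_luma_mv(mv_quarter):
    """Split a luma MV component (quarter-sample units) into
    (integer sample offset, fractional index 0..3)."""
    return mv_quarter >> 2, mv_quarter & 3


def split_chroma_mv(mv_eighth):
    """Split a chroma MV component (eighth-sample units, 4:2:0 assumed)
    into (integer sample offset, fractional index 0..7)."""
    return mv_eighth >> 3, mv_eighth & 7


def to_samples(integer_part, frac_index, denominator):
    """Convert the split form back to a displacement in samples."""
    return integer_part + frac_index / denominator


if __name__ == "__main__":
    mv = -5  # hypothetical motion vector component
    luma_int, luma_frac = split_luma_mv(mv)
    chroma_int, chroma_frac = split_chroma_mv(mv)
    print(to_samples(luma_int, luma_frac, 4))      # -1.25 luma samples
    print(to_samples(chroma_int, chroma_frac, 8))  # -0.625 chroma samples
```

The shift and mask use floor semantics, so negative components also yield a non-negative fractional index, which is convenient for selecting an interpolation position.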
1.3.1 Profiles and Levels
H.264/AVC defines three profiles: baseline, main, and extended. A new profile, the high profile (Fidelity Range Extensions, FRExt), will be included as well and is currently being standardized. As Fig. 1.4 shows, I slices, P slices, and CAVLC are the basic parts of the H.264/AVC system. CABAC and interlace are supported in the main profile, and extra slice types such as SP and SI slices, together with data partitioning, are supported in the extended profile.
[Figure: overlapping feature sets of the Baseline, Main, and Extended profiles. Labeled features: I slices, P slices, CAVLC, slice groups and ASO, redundant slices, B slices, weighted prediction, interlace, CABAC, SP and SI slices, data partitioning]
Fig.1.4 H.264 baseline, main, and extended profile
The H.264 standard defines many more levels than MPEG-2. From level 1 to level 5.1, the maximum frame size ranges from 99 to 36,864 macroblocks, the maximum video bit rate ranges from 64k to 240,000k bits/s, and the motion vector range grows from +/-64 to +/-512 samples.
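To relate the frame-size limits above to picture dimensions, the sketch below counts the 16x16 macroblocks needed to cover a frame. The two resolutions are examples chosen here because they meet the quoted limits exactly; the function name is hypothetical.

```python
MB_SIZE = 16  # a macroblock covers 16x16 luma samples


def frame_size_in_macroblocks(width, height):
    """Number of macroblocks needed to cover a width x height luma frame."""
    mbs_wide = (width + MB_SIZE - 1) // MB_SIZE   # round up to whole macroblocks
    mbs_high = (height + MB_SIZE - 1) // MB_SIZE
    return mbs_wide * mbs_high


if __name__ == "__main__":
    print(frame_size_in_macroblocks(176, 144))    # QCIF: 11 x 9 = 99
    print(frame_size_in_macroblocks(4096, 2304))  # 256 x 144 = 36864
```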
1.3.2 Encoder/Decoder Block Diagram
The encoding process for H.264/AVC video is more complex than that of MPEG-2 video. Fig. 1.5 shows a simple block diagram of the H.264/AVC encoder. As in the MPEG-2 encoder, an embedded decoder inside the encoder calculates the results of motion compensation and intra prediction as they will appear at the decoder side.
With this embedded decoder, the encoder can foresee the decoded result and calculate the residual pixel values precisely, without mismatch with the decoder. Besides inter prediction (motion compensation), intra prediction is also an important part; it reduces spatial redundancy to increase coding efficiency. Several intra prediction modes are available to the intra predictor, and the prediction mode is chosen by a mode decision block preceding the intra predictor. The motion compensator has many choices as well: the block sizes, multiple reference frames, short-term or long-term prediction, and the motion vectors are all decided by the motion estimation block. With these two strong prediction paths, the residual pixel values, calculated by subtracting the predicted pixel values from the input video, are closer to zero. After the DCT and quantization, the entropy encoder finally reduces the coding redundancy and outputs the coded pictures.
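The role of the embedded decoder can be sketched with a toy closed-loop codec. This is not the H.264 algorithm: a plain scalar quantizer stands in for the transform and quantization stages, prediction is a zero-motion copy of the previous reconstructed frame, and frames are short one-dimensional lists. All names are hypothetical. The point is only that the encoder predicts from its own reconstruction, so its reference always equals the decoder's.

```python
def quantize(residual, step):
    """Scalar quantizer standing in for transform + quantization (assumption)."""
    return [(r + step // 2) // step for r in residual]


def reconstruct(levels, prediction, step):
    """Decoder-side reconstruction: rescale the levels, add the prediction,
    and clip to the 8-bit sample range."""
    return [min(255, max(0, p + l * step)) for l, p in zip(levels, prediction)]


def encode_frame(frame, reference, step):
    """Encode one frame against the reconstructed reference.
    Returns (levels to transmit, reconstruction kept as the next reference)."""
    prediction = reference  # zero-motion inter prediction, for simplicity
    residual = [x - p for x, p in zip(frame, prediction)]
    levels = quantize(residual, step)
    # Embedded decoder: the same reconstruction the real decoder will perform.
    return levels, reconstruct(levels, prediction, step)


def decode_frame(levels, reference, step):
    """Decoder: it has no access to the original frame, only levels and reference."""
    return reconstruct(levels, reference, step)


if __name__ == "__main__":
    frames = [[10, 20, 30, 40], [13, 26, 31, 47], [19, 22, 38, 55]]
    step = 4
    enc_ref = dec_ref = [0, 0, 0, 0]
    for frame in frames:
        levels, enc_ref = encode_frame(frame, enc_ref, step)
        dec_ref = decode_frame(levels, dec_ref, step)
        assert enc_ref == dec_ref  # no encoder/decoder mismatch
        print(levels, dec_ref)
```

If the encoder instead predicted from the original previous frame, the quantization error would differ between the two sides and accumulate from frame to frame.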
[Figure: encoder block diagram. Blocks: input video, motion estimation, motion compensation (inter prediction), intra mode decision, intra prediction, subtractor, DCT, quantization, reorder, entropy encoder, NAL. Embedded decoder: inverse quantization, IDCT, adder, loop filter, reference frame]
Fig.1.5 A simple block diagram of H.264/AVC video encoder
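The intra mode decision block in Fig. 1.5 can be illustrated with a minimal sketch. It builds three 4x4 intra predictions (vertical, horizontal, and DC) from the reconstructed neighbors and picks the one with the smallest sum of absolute differences (SAD). The SAD criterion and all function names are assumptions made for illustration; the source does not state which cost measure the mode decision uses, and only three of the available modes are shown.

```python
def predict_4x4(mode, top, left):
    """Build a 4x4 prediction from the 4 reconstructed samples above (top)
    and the 4 samples to the left (left) of the block."""
    if mode == "vertical":      # copy the row above downwards
        return [list(top) for _ in range(4)]
    if mode == "horizontal":    # copy the left column rightwards
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":            # rounded mean of the 8 neighbors
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unknown mode: {mode}")


def sad(block, prediction):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(b - p)
               for block_row, pred_row in zip(block, prediction)
               for b, p in zip(block_row, pred_row))


def choose_intra_mode(block, top, left):
    """Return (best mode, cost of every candidate mode)."""
    modes = ("vertical", "horizontal", "dc")
    costs = {m: sad(block, predict_4x4(m, top, left)) for m in modes}
    return min(costs, key=costs.get), costs


if __name__ == "__main__":
    # Hypothetical block with vertical stripes: the row above predicts it exactly.
    block = [[10, 50, 90, 130] for _ in range(4)]
    top, left = [10, 50, 90, 130], [12, 12, 12, 12]
    print(choose_intra_mode(block, top, left))
```

A practical encoder would usually also weigh the bits needed to signal the mode, which this sketch leaves out.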
Compared with the encoder, the decoder is simpler because it lacks decision parts such as the motion estimator and the intra mode decision. Fig. 1.6 shows a simple block diagram of the H.264/AVC video decoder. After the input bit-stream is entropy decoded, inverse quantization and the IDCT convert the bit-stream data into residual pixel values. The predicted pixel values from the intra predictor or the motion compensator are added to the residual, an in-loop filter smooths the blocking effects, and the result is sent to both the output buffer and the frame buffer for future reference. The details of the decoding process will be described in section 2.2.
[Figure: decoder block diagram. Blocks: input bit-stream, entropy decoder, reorder, inverse quantization, IDCT, adder, motion compensation (inter prediction), intra prediction, loop filter, output video, frame buffer]
Fig.1.6 A simple block diagram of H.264/AVC video decoder
1.3.3 Bit-stream structure
Like the MPEG-2 bit-stream, the H.264 bit-stream is structured hierarchically, from the block level up to the video sequence level. Unlike MPEG-2, which is an 8x8-block-based system, the smallest block in the H.264/AVC system is a group of 4x4 pixels.
According to Annex B of the H.264 standard [7], and as Fig. 1.7 shows, all data are packed into NAL units. An NAL syntax element is attached to the front of each NAL unit. Each NAL unit contains an NAL unit header, which indicates the NAL unit type of the data that follows and therefore the type of RBSP (Raw Byte Sequence Payload) the unit contains.
There are several types of RBSP, for example the SPS (sequence parameter set), the PPS (picture parameter set), and the slice layer RBSP. A slice layer RBSP includes a slice header, slice data, and, for a partitioned slice layer, sometimes a slice ID or redundant picture count. Slice data is composed of macroblocks. Each macroblock consists of prediction modes (in an intra macroblock) or sub-macroblock types and motion vectors (in an inter macroblock), plus the 4x4-block-based residual data, which contributes the most to the size of the H.264 bit-stream.
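The NAL packaging described above can be sketched as follows. The sketch assumes that the syntax element in front of each NAL unit is the Annex B start code prefix (0x000001, optionally preceded by a zero byte) and that the NAL unit header is a single byte holding forbidden_zero_bit (1 bit), nal_ref_idc (2 bits), and nal_unit_type (5 bits). The payload bytes in the example are made up, the type table is only a subset, and emulation prevention bytes inside the payload are not handled.

```python
START_CODE = b"\x00\x00\x01"
NAL_TYPE_NAMES = {1: "non-IDR slice", 5: "IDR slice", 7: "SPS", 8: "PPS"}  # subset


def split_annexb(stream):
    """Split a byte stream into NAL units, dropping the start code prefixes."""
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        start = pos + len(START_CODE)
        nxt = stream.find(START_CODE, start)
        end = len(stream) if nxt == -1 else nxt
        # Zero bytes before the next start code are padding, not payload:
        # the last byte of a NAL unit is never 0x00.
        unit = stream[start:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        pos = nxt
    return units


def parse_nal_header(unit):
    """Decode the one-byte NAL unit header."""
    first = unit[0]
    return {
        "forbidden_zero_bit": first >> 7,
        "nal_ref_idc": (first >> 5) & 0x3,
        "nal_unit_type": first & 0x1F,
    }


if __name__ == "__main__":
    # Hypothetical stream: three NAL units with invented payload bytes.
    stream = (b"\x00\x00\x00\x01\x67\x42"
              b"\x00\x00\x01\x68\xce"
              b"\x00\x00\x01\x65\x88")
    for unit in split_annexb(stream):
        header = parse_nal_header(unit)
        name = NAL_TYPE_NAMES.get(header["nal_unit_type"], "other")
        print(name, header)  # SPS, then PPS, then IDR slice
```

A real parser would go on to remove emulation prevention bytes from each unit and then parse the RBSP according to its type.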
[Figure: sequence of NAL units, each preceded by an NAL syntax element and beginning with an NAL unit header. Payloads: SPS RBSP, PPS RBSP, slice layer RBSP. Slice layer: slice header and slice data. Slice data: macroblocks, each with macroblock prediction, sub-macroblock prediction, and residual data]
Fig. 1.7 Hierarchical structure of H.264 video bit-stream