Video Coding Layer (VCL) - Introduction to H.264/AVC

Chapter 2 Background

2.2 Introduction to H.264/AVC

2.2.2 Video Coding Layer (VCL)

Figure 2-4 shows a typical structure of an H.264/AVC encoder. Intra prediction or inter motion compensation is the first step, followed by the DCT transform of the residual data, quantization of the transformed data, and entropy coding of the quantized data. The coded data are then passed to the NAL, where they are packed into packets for transmission. At the decoder side, the inverse operations are applied to reconstruct the data. In the following sections, we describe some details of the H.264/AVC VCL.

Figure 2-4 Basic coding structure for H.264/AVC for a macroblock [3]

a. Pictures, Frames, and Fields

A video sequence consists of a series of pictures. A picture can be either a frame or a field. In H.264/AVC, a macroblock can be coded in either frame mode or field mode.

b. YCbCr Color Space and 4:2:0 Sampling

The color space of H.264/AVC is YCbCr. A typical sampling pattern of YCbCr is 4:2:0.

However, in the later High profiles, which were introduced by the Fidelity Range Extensions (FRExt), 4:4:4 sampling is also supported.

c. Division of a Picture into Macroblocks

With 4:2:0 sampling, the luma macroblock size is 16×16 and the chroma macroblock size is 8×8. For the FRExt profiles, the chroma macroblock size depends on the chroma sampling format.
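The relation between the sampling format and the block sizes can be sketched as follows. This is a minimal illustration, not part of the standard text: the names SUBSAMPLING, chroma_block_size, and macroblock_grid are hypothetical, and the picture dimensions in the usage lines are only examples.

```python
# Chroma subsampling factors (horizontal, vertical) relative to luma.
# These are the usual meanings of the three format names.
SUBSAMPLING = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}

MB_SIZE = 16  # a luma macroblock covers 16x16 samples


def chroma_block_size(fmt):
    """Return (width, height) of each chroma block that belongs to one macroblock."""
    sx, sy = SUBSAMPLING[fmt]
    return MB_SIZE // sx, MB_SIZE // sy


def macroblock_grid(width, height):
    """Return (columns, rows) of macroblocks, rounding up to whole macroblocks."""
    cols = (width + MB_SIZE - 1) // MB_SIZE
    rows = (height + MB_SIZE - 1) // MB_SIZE
    return cols, rows


if __name__ == "__main__":
    for fmt in SUBSAMPLING:
        print(fmt, chroma_block_size(fmt))
    print(macroblock_grid(352, 288))  # a CIF-sized picture
```

For a 352×288 picture the grid is 22 macroblocks wide and 18 macroblocks high. A picture whose height is not a multiple of 16, such as 1920×1080, is rounded up to whole macroblocks.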

d. Slices and Slice Groups

When FMO (Flexible Macroblock Ordering) is not used, a slice is a sequence of macroblocks that are processed in raster-scan order. Two examples are illustrated in Figure 2-5. When FMO is used, a picture is partitioned into slice groups. Each slice group is a set of macroblocks defined by a macroblock-to-slice-group map. Two examples of FMO are illustrated in Figure 2-6.

Figure 2-5 Possible subdivisions of a picture into slices [3]

Figure 2-6 Possible subdivisions of a picture into slices with FMO.
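The two ways of assigning macroblocks can be sketched with small maps. This is an illustrative sketch only: the function names are hypothetical, the fixed number of macroblocks per slice is an assumption made for simplicity, and the checkerboard pattern is just one possible macroblock-to-slice-group map, not necessarily the one drawn in Figure 2-6.

```python
def raster_slices(cols, rows, mbs_per_slice):
    """Without FMO: assign macroblocks to slices in raster-scan order.

    Returns a rows x cols map whose entries are slice indices.
    """
    return [[(r * cols + c) // mbs_per_slice for c in range(cols)]
            for r in range(rows)]


def checkerboard_slice_groups(cols, rows):
    """With FMO: an example macroblock-to-slice-group map with two groups.

    Neighbouring macroblocks alternate between slice group 0 and 1.
    """
    return [[(r + c) % 2 for c in range(cols)] for r in range(rows)]


if __name__ == "__main__":
    for row in raster_slices(4, 2, 3):
        print(row)
    for row in checkerboard_slice_groups(4, 2):
        print(row)
```

In the raster-scan map, each slice covers a contiguous run of macroblocks. In the checkerboard map, every macroblock has neighbours in the other slice group.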

Whether or not FMO is used, each slice is coded as one of the following types.

• I slice: All macroblocks of the slice are coded by intra prediction.

• P slice: Each macroblock of the slice is coded either by motion-compensated prediction or by intra prediction.

• B slice: Similar to a P slice, but some macroblocks may use bi-directional motion compensation for prediction.

In addition to these three kinds of slices, there are two special types of slices: the SI slice and the SP slice. They are used for switching between bitstreams. An illustration is shown in Figure 2-7.

Figure 2-7 Switching streams using I-slice and SP-slices [4]

e. Intra-Frame Prediction

In intra prediction, there are two modes that differ in block size: the 4×4 prediction mode and the 16×16 prediction mode. Usually, the 4×4 mode is used in complex regions, while the 16×16 mode is used in smooth regions. The 4×4 prediction mode is further divided into nine sub-modes to handle different edge directions. Some examples are shown in Figure 2-8. The 16×16 mode has four sub-modes, shown in Figure 2-9.

Figure 2-8 Five of nine Intra 4×4 prediction modes [3]

Figure 2-9 Four Intra 16×16 prediction modes [5]
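Three of the nine Intra 4×4 sub-modes (vertical, horizontal, and DC) are simple enough to sketch directly. The code below is a simplified illustration under stated assumptions: it assumes the four neighbouring samples above (top) and to the left (left) of the block are both available, it omits the six directional sub-modes, and the mode selection by sum of absolute differences (SAD) is a common encoder heuristic rather than something the standard prescribes. All function names are hypothetical.

```python
def intra4x4_predict(mode, top, left):
    """Predict a 4x4 block from its reconstructed neighbours.

    top  -- the four samples directly above the block
    left -- the four samples directly to the left of the block
    """
    if mode == "vertical":      # copy the row above into every row
        return [list(top) for _ in range(4)]
    if mode == "horizontal":    # copy the left column into every column
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":            # rounded mean of the eight neighbours
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")


def sad(a, b):
    """Sum of absolute differences between two 4x4 blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def best_mode(block, top, left):
    """Pick the sub-mode whose prediction is closest to the block (by SAD)."""
    modes = ("vertical", "horizontal", "dc")
    return min(modes, key=lambda m: sad(block, intra4x4_predict(m, top, left)))


if __name__ == "__main__":
    top, left = [10, 20, 30, 40], [25, 25, 25, 25]
    block = [[10, 20, 30, 40] for _ in range(4)]  # vertical stripes
    print(best_mode(block, top, left))
```

A block with vertical stripes that continue the row above is predicted exactly by the vertical sub-mode, so only a zero residual remains to be coded.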

f. Inter-Frame Prediction

In inter prediction, H.264/AVC provides eight macroblock modes, which are illustrated in Figure 2-10. First, there are four choices of macroblock type: 16×16, 16×8, 8×16 and 8×8. If the 8×8 mode is chosen, each 8×8 block has four further modes to select from: 8×8, 8×4, 4×8 and 4×4. Hence, different modes can be chosen according to the complexity of the image content. For example, we may choose the 16×16 mode for regions with global motion, but use the 8×8 mode for regions that contain individual moving objects.

Figure 2-10 Decomposition of macroblock for motion compensation [3]
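One consequence of this decomposition is that the number of motion vectors per macroblock depends on the chosen modes. The sketch below counts them for a single reference list; the names MB_PARTITIONS, SUB_PARTITIONS, and motion_vector_count are hypothetical and chosen only for this illustration.

```python
MB_PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8)]   # macroblock types
SUB_PARTITIONS = [(8, 8), (8, 4), (4, 8), (4, 4)]      # modes of each 8x8 block


def blocks_in(region, part):
    """Number of part-sized blocks that tile a region (both as (width, height))."""
    return (region[0] // part[0]) * (region[1] // part[1])


def motion_vector_count(mb_part, sub_parts=None):
    """Count motion vectors of one macroblock, one vector per partition.

    sub_parts lists the mode of each of the four 8x8 blocks and is only
    needed when mb_part is (8, 8).
    """
    if mb_part not in MB_PARTITIONS:
        raise ValueError(f"unknown macroblock partition: {mb_part}")
    if mb_part != (8, 8):
        return blocks_in((16, 16), mb_part)
    if sub_parts is None or len(sub_parts) != 4:
        raise ValueError("the 8x8 mode needs four sub-partition choices")
    return sum(blocks_in((8, 8), sp) for sp in sub_parts)


if __name__ == "__main__":
    print(motion_vector_count((16, 16)))
    print(motion_vector_count((8, 8), [(4, 4)] * 4))
```

A 16×16 macroblock carries one motion vector, while a macroblock split entirely into 4×4 blocks carries sixteen. This shows why small partitions suit detailed motion but cost more bits for motion data.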

In H.264/AVC, the accuracy of motion compensation is in units of one quarter of the distance between luma samples. The prediction values at half-sample positions are obtained by applying a one-dimensional 6-tap FIR filter horizontally and vertically. Prediction values at quarter-sample positions are generated by averaging samples at integer- and half-sample positions. This is illustrated in Figure 2-11.

Figure 2-11 Filtering for fractional-sample accurate motion compensation [3]
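The interpolation described above can be sketched in one dimension. The sketch assumes 8-bit samples and uses the filter taps (1, −5, 20, 20, −5, 1) with a normalization by 32; the function names are hypothetical. It is a simplification: it covers half-sample positions that lie between two integer samples in a row or column, and does not reproduce the handling of the centre half-sample position, which is filtered from intermediate values.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # 6-tap FIR filter, coefficients sum to 32


def clip255(x):
    """Clip a value to the 8-bit sample range (assumes 8-bit video)."""
    return max(0, min(255, x))


def half_sample(samples):
    """Half-sample value between the 3rd and 4th of six integer samples."""
    if len(samples) != 6:
        raise ValueError("exactly six neighbouring samples are required")
    acc = sum(t * s for t, s in zip(TAPS, samples))
    return clip255((acc + 16) >> 5)  # divide by 32 with rounding


def quarter_sample(a, b):
    """Quarter-sample value: rounded average of the two nearest samples
    at integer or half-sample positions."""
    return (a + b + 1) >> 1


if __name__ == "__main__":
    row = [0, 0, 0, 255, 255, 255]      # a sharp edge
    h = half_sample(row)                # value halfway across the edge
    q = quarter_sample(row[2], h)       # value a quarter of the way across
    print(h, q)
```

In a flat region the filter returns the sample value unchanged, and across a sharp edge the half-sample value falls midway between the two levels.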

g. Transform and Quantization

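H.264/AVC uses a 4×4 integer DCT, whose transform matrix is

$$
H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}
$$

Because the transform uses only integer arithmetic, its inverse is specified exactly. Hence, the transform does not cause any mismatch between the encoder and the decoder in the inverse transform.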
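The integer nature of the transform can be checked with a few lines of code. The sketch below assumes the commonly cited forward core matrix H with rows (1, 1, 1, 1), (2, 1, −1, −2), (1, −1, −1, 1), (1, −2, 2, −1), and computes Y = H·X·Hᵀ with integer arithmetic only. The helper names are hypothetical, and the scaling that the standard folds into quantization is left out.

```python
# Forward core transform matrix (assumed here; integer entries only).
H = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Plain integer matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core_transform(x):
    """Y = H * X * H^T for a 4x4 residual block X, without scaling."""
    return matmul(matmul(H, x), transpose(H))


if __name__ == "__main__":
    # Rows of H are orthogonal but not of equal norm: H * H^T is diagonal.
    print(matmul(H, transpose(H)))
    # A flat residual block puts all of its energy into the DC coefficient.
    print(forward_core_transform([[5] * 4 for _ in range(4)]))
```

The product H·Hᵀ is diagonal with entries 4, 10, 4, 10, so the rows are orthogonal but not normalized. The normalization is absorbed into the quantization stage, which is why the transform itself needs only additions, subtractions, and shifts.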

In addition, after the DCT, the DC values of the 16 luma blocks of a macroblock coded in Intra 16×16 mode are packed together, as shown in Figure 2-12, and then transformed by the 4×4 Hadamard transform. Similarly, the DC values of the 4 chroma blocks are packed and then transformed by the 2×2 Hadamard transform.

Figure 2-12 Package of the DCT DC values [5]
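The second-stage transform of the packed DC values can be sketched as follows. The matrices W2 and W4 are the standard 2×2 and 4×4 Hadamard matrices; the function names and the sample DC values are made up for illustration, and the scaling and rounding that the standard applies after the transform are omitted.

```python
W2 = [[1, 1],
      [1, -1]]

W4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def matmul(a, b):
    """Plain integer matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def hadamard(dc, w):
    """Y = W * DC * W (both Hadamard matrices here are symmetric), unscaled."""
    return matmul(matmul(w, dc), w)


if __name__ == "__main__":
    chroma_dc = [[10, 12], [11, 9]]            # DC values of 4 chroma blocks
    print(hadamard(chroma_dc, W2))
    luma_dc = [[8] * 4 for _ in range(4)]      # DC values of 16 luma blocks
    print(hadamard(luma_dc, W4))
```

When the DC values of neighbouring blocks are similar, as in smooth regions, most of the output coefficients are zero or small, which is the benefit of this extra transform stage.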

There are two major reasons for the use of the 4×4 DCT. First, prediction is greatly improved in H.264/AVC, so a smaller DCT can still achieve reasonable performance. Second, the computational complexity is lower when a small DCT is used.

The QP values of H.264/AVC range from 0 to 51. An increase of 6 in the QP value will double the quantization step size.
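The relation between QP and the quantization step size can be sketched as a small lookup. The six base step sizes for QP 0 to 5 below are the commonly cited values and should be treated as an assumption of this sketch; the names BASE_QSTEP and qstep are hypothetical.

```python
# Commonly cited quantization step sizes for QP = 0..5 (assumed here).
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantization step size for a QP in the range 0..51.

    The step size doubles for every increase of 6 in QP.
    """
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


if __name__ == "__main__":
    for qp in (0, 6, 12, 28, 51):
        print(qp, qstep(qp))
```

With these base values the step size grows from 0.625 at QP 0 to 224 at QP 51, so each increase of 1 in QP raises the step size by roughly 12%.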

h. Entropy Coding

There are two modes of entropy coding in H.264/AVC: Context-Adaptive Variable Length Coding (CAVLC) and Context-Adaptive Binary Arithmetic Coding (CABAC). CAVLC adapts its code tables to the context, which makes it more efficient than conventional VLC, but with higher complexity. CABAC, on the other hand, uses arithmetic coding and is also more complex than VLC.

i. In-Loop Deblocking Filter

H.264/AVC specifies an in-loop deblocking filter as part of the standard. This means that decoders must apply the same deblocking filter as the encoder. The purpose of the deblocking filter is to reduce blocking artifacts in the reference image. This improves the performance of motion compensation and thus increases the coding efficiency.

Source document: A Study of Rate Control Techniques for H.264/AVC (pages 16-22)