OVERVIEW OF H.264/AVC STANDARD - Motion Estimation Design for High-Definition Video

2.1. Overview

Image and video compression has been a very active field of research and development for over twenty years, and many different systems and algorithms for compression and decompression have been proposed and developed. To achieve interoperability, encourage industrial competition, and allow wide adoption, it is necessary to define standard methods for decoding so that products from different manufacturers can communicate with each other effectively. The standardization process has therefore contributed to the prevalence of broadcast television and home entertainment today. More recently, the ISO (International Organization for Standardization) MPEG4 standard has been enabling a new generation of internet‐based video applications, while the ITU‐T (International Telecommunication Union, Telecommunication Standardization Sector) H.263 standard for video compression is now widely used in videoconferencing systems.

MPEG4 and H.263 are standards based on video compression technology dating from about 1995. The two groups responsible for these standards, the Moving Picture Experts Group (MPEG) and the Video Coding Experts Group (VCEG), are in the final stages of developing a new standard that promises to significantly outperform MPEG4 and H.263. It provides better compression of video images by adopting a variety of tools that support high‐quality, low bit rate streaming video. On the VCEG side, after finishing the original H.263 standard in 1995, the group started work on two further development areas: a short‐term effort to add extra features to H.263 and a long‐term effort to develop a new standard for low bit rate visual communications. The long‐term effort led to the draft “H.26L” standard, which offers significantly better video compression efficiency than previous ITU‐T standards.

In 2001, MPEG recognized the potential benefits of H.26L, and the Joint Video Team (JVT) was formed, including experts from both MPEG and VCEG. The JVT’s main task is to develop the draft H.26L model into a full international standard. In fact, the outcome will be two identical standards: ISO MPEG4 Part 10 and ITU‐T H.264. The official title of the new standard is Advanced Video Coding (AVC); however, it is widely known by its old working title, H.26L, and by its ITU document number, H.264 [1].

H.264 consists of numerous tools. Compared with prior video coding standards, it employs many important techniques that bring significant improvements in coding performance. Some details of these techniques can be found in [2]. Here, we give a brief introduction to the basic concepts behind these tools, which have existed for some time but have been carefully tuned and well integrated to form a good compression scheme in H.264.

2.2. Coding Structure

In common with earlier standards, the H.264 standard does not explicitly define a CODEC (encoder / decoder pair). Instead, the standard defines the syntax of an encoded video bit stream together with the method of decoding it. In practice, a compliant encoder and decoder are likely to include the functional elements shown in Figure 2 and Figure 3. Although the functions shown in these figures are likely to be necessary for compliance, there is scope for considerable variation in the structure of the CODEC. The figures also show that the decoding path is contained within the encoder.

In general, most video coding systems are based on motion estimation and motion compensation, along with other tools, to reduce the redundancy between neighboring frames. The basic functional elements (prediction, transform, quantization, entropy encoding) differ little from those of previous standards (MPEG1, MPEG2, MPEG4, H.261, H.263, etc.).
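The reason the decoding path sits inside the encoder can be illustrated with a one-dimensional predictive coder. The following is only a minimal sketch under stated assumptions: the sample-by-sample prediction, the uniform quantizer, and all function names are hypothetical simplifications, not part of H.264. It shows that when the encoder predicts from the reconstructed signal (as the decoder must), the reconstruction error stays bounded by half the quantizer step.

```python
def quantize(value, step):
    """Uniform quantizer (illustrative): map a residual to an integer level."""
    return int(round(value / step))


def dequantize(level, step):
    """Inverse quantizer: map an integer level back to a residual value."""
    return level * step


def encode(samples, step):
    """Closed-loop predictive encoder.

    Each sample is predicted from the previous *reconstructed* sample,
    so the encoder carries out the same reconstruction the decoder will.
    """
    levels = []
    recon_prev = 0
    for s in samples:
        residual = s - recon_prev
        level = quantize(residual, step)
        levels.append(level)
        # Embedded decoder step: rebuild the sample from the quantized residual.
        recon_prev = recon_prev + dequantize(level, step)
    return levels


def decode(levels, step):
    """Decoder: accumulate dequantized residuals onto the running prediction."""
    recon = []
    prev = 0
    for level in levels:
        prev = prev + dequantize(level, step)
        recon.append(prev)
    return recon
```

Because both sides use identical predictions, quantization error does not accumulate from sample to sample; predicting from the original samples instead would let the decoder drift away from the encoder.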

Figure 2. The basic structure of the encoder (blocks shown: Reference Frame, Reconstructed Frame, Motion Compensation, Intra Prediction, Inverse Discrete Cosine Transform, Inverse Quantization, Reorder, Entropy Coding)

2.3. Intra Prediction

If a block or macroblock is encoded in intra mode, a prediction block is formed based on previously encoded and reconstructed blocks. This prediction block is subtracted from the current block prior to encoding. In H.264 [1], the prediction for luminance (luma) samples may be formed for each 4x4 subblock or for a 16x16 macroblock. There are a total of 9 optional prediction modes for each 4x4 luma block, 4 optional modes for a 16x16 luma block, and one mode that is always applied to each 4x4 chrominance (chroma) block.
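As an illustration of the idea, the sketch below forms a 4x4 prediction from neighboring reconstructed samples and subtracts it from the current block. It is a simplified example, not the full scheme: only three of the nine 4x4 luma modes (vertical, horizontal, DC) are included, both neighbors are assumed to be available, the mode is chosen by sum of absolute differences (an encoder-side choice the standard does not mandate), and the function names are hypothetical.

```python
def predict_4x4(top, left, mode):
    """Build a 4x4 prediction block.

    top:  the 4 reconstructed samples directly above the block
    left: the 4 reconstructed samples directly to the left of the block
    """
    if mode == "vertical":      # copy the row above downwards
        return [list(top) for _ in range(4)]
    if mode == "horizontal":    # copy the left column across
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":            # rounded mean of the 8 neighboring samples
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("unknown mode: " + mode)


def sad(a, b):
    """Sum of absolute differences between two 4x4 blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def best_intra_mode(block, top, left):
    """Pick the candidate mode with the smallest SAD and return its residual."""
    candidates = ("vertical", "horizontal", "dc")
    best = min(candidates, key=lambda m: sad(block, predict_4x4(top, left, m)))
    pred = predict_4x4(top, left, best)
    residual = [[block[r][c] - pred[r][c] for c in range(4)] for r in range(4)]
    return best, residual
```

For a block made of vertical stripes that continue the row above, the vertical mode predicts the block exactly and the residual is all zeros, which is the case the subsequent transform and entropy coding stages handle most cheaply.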

2.4. Inter Prediction

Inter prediction creates a prediction model from one or more previously encoded video frames. The model is formed by shifting samples in the reference frame(s) (motion‐compensated prediction). The AVC CODEC uses block‐based motion compensation, the same principle adopted by every major coding standard since H.261. Important differences from earlier standards include support for a range of block sizes (down to 4x4) and fine sub‐pixel motion vectors (1/4 pixel in the luma component).
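Block‐based motion estimation can be sketched as an exhaustive search for the displacement that best matches the current block in the reference frame. The example below is a minimal illustration under stated assumptions: it searches integer‐pixel positions only (no 1/4‐pixel interpolation), uses the sum of absolute differences as the matching cost, and represents frames as lists of rows; the function names and the default search range are hypothetical.

```python
def sad_block(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in `cur` and the block
    displaced by (dx, dy) in `ref`."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n=4, search=2):
    """Exhaustive integer-pixel search within +/- `search` samples.

    Returns ((dx, dy), cost) for the lowest-cost candidate that lies
    entirely inside the reference frame.
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + n <= height and
                      0 <= bx + dx and bx + dx + n <= width)
            if not inside:
                continue  # skip candidates that leave the frame
            cost = sad_block(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

The number of candidates grows with the square of the search range, and supporting several block sizes multiplies the work further, which is why practical encoders usually replace the exhaustive search with faster strategies.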

2.5. In‐loop Filter

In H.264, a filter is applied to every decoded macroblock in order to reduce the blocking distortion caused by block‐based transformation. In the encoder, the deblocking filter is applied after the inverse transform and before the macroblock is reconstructed and stored for future predictions. In the decoder, it is applied before the macroblock is reconstructed and displayed. The filter has two benefits. First, block edges are smoothed, improving the appearance of decoded images, especially at higher compression ratios. Second, the filtered macroblock is used for motion‐compensated prediction of further frames in the encoder, resulting in a smaller residual after prediction.
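The behavior of such a filter can be illustrated on a single line of samples crossing a block edge. The sketch below is a simplified illustration, not the normative H.264 filter: it omits boundary‐strength selection and the stronger filtering modes, and the thresholds `alpha`, `beta`, and `tc` are passed in as hypothetical parameters rather than derived from the quantization parameter. It smooths small steps that are likely to be blocking artifacts and leaves large steps, which are likely to be real image edges, untouched.

```python
def clip(lo, hi, v):
    """Clamp v to the inclusive range [lo, hi]."""
    return max(lo, min(hi, v))


def filter_edge(p1, p0, q0, q1, alpha, beta, tc):
    """Filter the two samples adjacent to a block edge.

    Sample layout along the line:  p1 p0 | q0 q1   ('|' is the block edge).
    alpha, beta, tc are illustrative thresholds supplied by the caller.
    Returns the (possibly modified) pair (p0, q0).
    """
    looks_like_artifact = (abs(p0 - q0) < alpha and
                           abs(p1 - p0) < beta and
                           abs(q1 - q0) < beta)
    if not looks_like_artifact:
        return p0, q0  # large step: treat as a genuine edge, do not smooth
    # Correction derived from the step across the edge, limited to +/- tc.
    delta = clip(-tc, tc, ((q0 - p0) * 4 + (p1 - q1) + 4) >> 3)
    return clip(0, 255, p0 + delta), clip(0, 255, q0 - delta)
```

For example, with flat regions of 100 and 108 on either side of the edge and thresholds `alpha=20`, `beta=5`, `tc=3`, the step across the edge shrinks from 8 to 2, while a step from 100 to 200 is left unchanged.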

2.6. Context‐based Adaptive Binary Arithmetic Coding (CABAC)

An arithmetic coding system is used to encode and decode H.264 syntax elements. The arithmetic coding scheme selected for H.264, Context‐based Adaptive Binary Arithmetic Coding (CABAC), achieves good compression performance by selecting probability models for each syntax element according to the element’s context, by adapting probability estimates based on local statistics, and by using arithmetic coding.
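The benefit of context selection and adaptation can be illustrated without implementing a full arithmetic coder. The sketch below is an illustration under stated assumptions: it estimates probabilities from simple frequency counts (CABAC itself uses table‐driven state machines), and instead of producing a bit stream it sums the ideal code length, -log2(p), that an arithmetic coder would approach. The class and function names are hypothetical.

```python
import math


class AdaptiveBinaryModel:
    """Adaptive probability estimate for one context (count-based sketch)."""

    def __init__(self):
        self.counts = [1, 1]  # smoothed counts of observed 0s and 1s

    def prob(self, bit):
        """Current estimated probability of `bit`."""
        return self.counts[bit] / (self.counts[0] + self.counts[1])

    def update(self, bit):
        """Adapt the estimate to the bit that was just coded."""
        self.counts[bit] += 1


def ideal_bits(bits, contexts):
    """Total ideal code length when each bit is coded with the adaptive
    model selected by its context label."""
    models = {}
    total = 0.0
    for bit, ctx in zip(bits, contexts):
        model = models.setdefault(ctx, AdaptiveBinaryModel())
        total += -math.log2(model.prob(bit))
        model.update(bit)
    return total
```

If the value of each bit is strongly correlated with its context, calling `ideal_bits` with the true context labels yields a far smaller total than calling it with a single shared context, because each per-context model quickly converges on a skewed probability.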
