
1. Introduction

1.2. Video Encoder Architecture

Fig 1. Video Encoder Architecture. (Pipeline: Source Picture → ME/MC → DCT → Quantize → VLC to Bitstream, with an Inverse Quantize / IDCT / MC loop reconstructing the Reference Frame. Rate control points: 1. Calculate frame-level target bits; 2. Apply R-D model to get QP; 3. Adapt R-D model. ME: Motion Estimation, MC: Motion Compensation, QP: Quantization Parameter.)

Fig 1 shows the flow diagram of a video encoder generating the output bitstream. The blocks with solid lines are the essential processing elements, and the blocks with dashed lines are the rate control points.

1.2.1. Introduction to Video Coding modules

In this section, we give a brief introduction to the essential processing modules of a video encoder. The input video data to an encoder is usually in YCbCr format. In this representation, the luminance information (Y channel, a.k.a. luma) is separated from the chrominance information (Cb and Cr channels, a.k.a. chroma).

1.2.1.1. Motion Estimation/Motion Compensation

Each video frame is encoded using either inter-coding or intra-coding modes.

Inter-coding modes (P-frame or B-frame) employ motion prediction to reduce temporally correlated data, while the intra-coding mode (I-frame) operates directly on the pixel data of the current frame. For the first source frame, the only choice is intra-coding because there are no previous frames to support inter-coding. For subsequent frames, all three coding types (I-, P-, and B-frame) can be used. These coding modes also exist at the macroblock level, with some restrictions: for I-frames, only I-MBs are allowed; for P-frames, both I- and P-MBs are permissible; and for B-frames, all coding modes are possible. The key to the performance of a video encoder is the mode decision algorithm (which can be part of the RC module) and the motion estimation algorithm.
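The motion estimation step above can be illustrated with a minimal full-search block matcher. This is a sketch only: the function names and the tiny 8x8 frames are illustrative, and real encoders use fast search patterns rather than exhaustive search.

```python
# Sketch of full-search block-matching motion estimation using SAD.
# All names and parameters here are illustrative, not from a standard.

def sad(block_a, block_b):
    """Sum of absolute differences between two equally-sized blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
                          for a, b in zip(row_a, row_b))

def full_search(cur, ref, bx, by, size=4, search=2):
    """Find the motion vector minimizing SAD within +/-search pixels."""
    cur_blk = [row[bx:bx + size] for row in cur[by:by + size]]
    best = (0, 0, float('inf'))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + size > len(ref) or x + size > len(ref[0]):
                continue
            cand = [row[x:x + size] for row in ref[y:y + size]]
            cost = sad(cur_blk, cand)
            if cost < best[2]:
                best = (dx, dy, cost)
    return best

# Reference frame has a bright 4x4 patch at (2, 2); the current frame
# shows the same patch shifted one pixel right and down, at (3, 3).
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for y in range(4):
    for x in range(4):
        ref[2 + y][2 + x] = 200
        cur[3 + y][3 + x] = 200

dx, dy, cost = full_search(cur, ref, bx=3, by=3)
print(dx, dy, cost)   # best match at displacement (-1, -1) with SAD 0
```

The motion-compensated residual for this block would then be all zeros, which is exactly the temporal redundancy reduction inter-coding exploits.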

1.2.1.2. DCT

The DCT is used to transform pixel data (I-MBs) or residual error data (P- and B-MBs) into the frequency domain. An 8x8 block size is used for the transform; that is, each MB is split into six 8x8 blocks (four luma and two chroma) for the transform.
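For reference, the 8x8 transform can be sketched directly from the 2-D DCT-II formula. This naive quadruple loop is for clarity only; production encoders use fast butterfly implementations.

```python
# Minimal 2-D DCT-II for an 8x8 block, written straight from the
# textbook formula (illustrative sketch, not an optimized transform).
import math

N = 8

def dct2(block):
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

# A flat block concentrates all energy in the DC coefficient.
flat = [[100] * N for _ in range(N)]
coeffs = dct2(flat)
print(round(coeffs[0][0]))   # 800 (all energy in DC); AC terms are ~0
```

This energy compaction is why quantization after the DCT discards so little visible information for smooth image regions.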

1.2.1.3. Quantization

The quantization process reduces the entropy of the source data and is the key technique in lossy coding. A properly designed non-uniform quantizer can produce much better results than a uniform one; however, for real-time video applications, a uniform quantizer is often used. The RC module determines the quantization step size (QP) as a tradeoff between rate and distortion.
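The rate/distortion tradeoff controlled by QP can be seen in a minimal uniform quantizer sketch (illustrative rounding convention; real codecs add dead zones and per-frequency scaling):

```python
# Sketch of a uniform quantizer: larger QP (step size) means fewer
# levels to transmit (fewer bits) but larger reconstruction error.

def quantize(coeff, qp):
    """Map a coefficient to an integer level with uniform step qp."""
    return int(round(coeff / qp))

def dequantize(level, qp):
    """Reconstruct the coefficient from its quantized level."""
    return level * qp

coeffs = [127.0, -45.5, 3.2, 0.4]
for qp in (4, 16):
    recon = [dequantize(quantize(c, qp), qp) for c in coeffs]
    err = max(abs(c - r) for c, r in zip(coeffs, recon))
    print(qp, recon, err)   # reconstruction error never exceeds qp/2
```

Note how at QP=16 the small coefficients quantize to zero entirely, which is precisely where the bit savings of coarse quantization come from.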

1.2.1.4. IDCT and Inverse Quantization

In order to perform predictive coding for the next frame, the reconstructed frame must be stored in the reference frame buffer. Hence, a video decoder (minus the entropy decoder) is embedded inside the encoder. Inverse quantization and the IDCT are used to reconstruct spatial data from the DCT coefficients, which differ from the original coefficients due to quantization loss.
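Why the encoder must predict from its own reconstruction rather than the original source can be shown with a toy scalar example (frames reduced to single numbers, an assumed coarse quantizer as the only loss):

```python
# Toy illustration of why the encoder embeds a decoder: the decoder
# only ever sees quantized residuals, so if the encoder predicted from
# the original frames instead of its reconstructions, the two would
# drift apart. Scalar "frames" keep the sketch minimal.

def q(x, step=10):
    """Coarse uniform quantizer: the only source of loss here."""
    return round(x / step) * step

def final_decoder_value(frames, predict_from_reconstruction):
    ref_enc = 0             # encoder's reference value
    ref_dec = 0             # decoder's reference value
    for f in frames:
        residual = q(f - ref_enc)        # the transmitted data
        ref_dec = ref_dec + residual     # decoder reconstruction
        # Correct encoders update their reference from the same
        # reconstruction; predicting from the original causes drift.
        ref_enc = ref_enc + residual if predict_from_reconstruction else f
    return ref_dec

frames = [13, 27, 41, 58, 72]
print(final_decoder_value(frames, True))    # 70: decoder tracks encoder
print(final_decoder_value(frames, False))   # 60: error has accumulated
```

With reconstruction-based prediction the decoder ends within one quantization step of the source; with source-based prediction the rounding errors accumulate frame after frame.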

1.2.2. Rate Control Illustration

In Fig 1, the blocks drawn in dashed lines are an example of rate control points in a video encoder. As mentioned in the previous section, there are two types of rate control, namely frame-level RC and macroblock-level RC.

Fig 2. Rate Control Flow

Fig 2 shows the flow of rate control. The top path of Fig 2 is for frame-level control, so the rate-distortion curve is modeled using the entire frame, while the bottom path is for macroblock-level control, which uses MBs as modeling units and adapts the quantization parameter for each MB. The main tasks of a typical rate control algorithm are described in the following sections.

1.2.2.1. Calculate Frame Level Target Bits

The essential goal of rate control is to control the output bitrate, so the first task of RC is to compute the bit budget for the current frame. The bit budget (a.k.a. bit allocation) is obtained according to the buffer status and possibly a pre-analysis of the current frame content. In general, pre-analysis is employed to extract the frame complexity and other critical information using techniques such as image processing and probability modeling, and it can be an essential factor in improving the performance of rate control.
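A frame-level bit budget computation can be sketched as follows. The feedback rule and all constants are illustrative assumptions, not any particular standard's algorithm:

```python
# Sketch of frame-level bit allocation with buffer feedback: spend
# less than average when the buffer is filling, more when it drains.
# The half-buffer target and the budget floor are assumed heuristics.

def target_bits(bitrate, framerate, buffer_fullness, buffer_size):
    """Per-frame bit budget, adjusted by the current buffer state."""
    average = bitrate / framerate          # budget if spread evenly
    feedback = (buffer_size / 2 - buffer_fullness) / framerate
    return max(average + feedback, average / 4)   # keep a sane floor

budget = target_bits(bitrate=1_000_000, framerate=25,
                     buffer_fullness=300_000, buffer_size=500_000)
print(budget)   # 38000.0: below the 40000-bit average, buffer > half full
```

The floor term prevents the budget from collapsing to zero when the buffer is nearly full, which would otherwise force unusably coarse quantization.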

It is important for an RC algorithm to estimate the complexity of a frame/macroblock before the encoding loop starts, so that the RC algorithm can control the quantization step size (QP) to meet the bit budget constraint. For a fixed QP, the number of coded bits grows with the sum of absolute differences (SAD) between the motion-compensated frame and the current frame; consequently, SAD is often used to represent the degree of complexity.

A good bit allocation algorithm can improve the overall visual quality even if the same rate-distortion model is used; therefore, it is a key differentiator among video encoder implementations.

1.2.2.2. Apply R-D model and Get QP

Given the target number of bits (i.e., the data rate), the QP can be calculated from the rate-distortion model at the macroblock or frame level. Many algorithms approximate R-D curves with polynomial functions, logarithmic functions, or functions in a transformed domain (e.g., the Rho-domain). Selecting a QP that fulfills the bit-budget constraint while maintaining consistent visual quality across macroblocks and video frames is one of the biggest challenges.
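As an example of the polynomial family mentioned above, a widely used second-order model is R(QP) = mad * (x1/QP + x2/QP^2), where mad is the complexity measure from pre-analysis. Solving it for QP given a bit budget reduces to a quadratic; the parameter values below are illustrative assumptions:

```python
# Sketch of solving a second-order R-D model for QP.
# Model: target = mad * (x1 / QP + x2 / QP**2); rearranged as
#   (target / mad) * QP**2 - x1 * QP - x2 = 0,
# whose positive root gives the QP. x1, x2, mad are assumed values.
import math

def qp_from_model(target_bits, mad, x1, x2, qp_min=1, qp_max=31):
    a = target_bits / mad
    disc = x1 * x1 + 4 * a * x2          # discriminant of the quadratic
    qp = (x1 + math.sqrt(disc)) / (2 * a)
    return min(max(round(qp), qp_min), qp_max)

qp = qp_from_model(target_bits=20000, mad=2000.0, x1=50.0, x2=200.0)
print(qp)   # 8: a larger bit budget would yield a smaller QP
```

The clamp to [qp_min, qp_max] mirrors the limited QP range codecs allow; doubling the target budget in this sketch drops the QP to 5, showing the expected monotonic rate/QP relationship.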

1.2.2.3. Update R-D model

Because the R-D model is content-dependent, an RC algorithm must adapt the model parameters progressively during the encoding of a video sequence. After the QP is determined and used to encode the current frame, the actual number of bits used to code the frame becomes available, and this information can help correct the R-D model parameters for the next encoding iteration (the next frame or macroblock).
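This adaptation step can be sketched as a least-squares re-fit of the two parameters of the second-order model R = mad*(x1/QP + x2/QP^2) from recently observed (QP, bits, mad) triples. Multiplying through by QP^2/mad makes the model linear in QP, so ordinary least squares applies; the sample values below are synthetic:

```python
# Sketch of adapting R-D model parameters from encoded-frame feedback.
# Each frame yields (qp, bits, mad); with y = bits * qp^2 / mad the
# model becomes the line y = x1 * qp + x2, fit by least squares.

def fit_model(samples):
    """samples: list of (qp, bits, mad) from recently encoded frames."""
    n = len(samples)
    us = [qp for qp, _, _ in samples]
    ys = [bits * qp * qp / mad for qp, bits, mad in samples]
    su, sy = sum(us), sum(ys)
    suu = sum(u * u for u in us)
    suy = sum(u * y for u, y in zip(us, ys))
    x1 = (n * suy - su * sy) / (n * suu - su * su)
    x2 = (sy - x1 * su) / n
    return x1, x2

# Noise-free observations generated from x1=50, x2=200, mad=2000,
# so the fit should recover the parameters exactly.
mad = 2000.0
samples = [(qp, mad * (50.0 / qp + 200.0 / qp ** 2), mad)
           for qp in (4, 8, 16)]
x1, x2 = fit_model(samples)
print(x1, x2)   # 50.0 200.0
```

In practice the observations are noisy, so implementations typically fit over a sliding window of recent frames and discard outliers.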

When a scene change occurs in a video sequence, the model parameters may need to be reset. A common practice is to encode the first frame at the scene change position as an I-frame and restart the RC modeling process from that point.
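One simple way to trigger such a reset is to watch for a jump in the per-frame SAD relative to its recent average. The threshold factor and the running-mean rule below are illustrative heuristics, not a prescribed algorithm:

```python
# Sketch of a scene-change trigger for resetting the R-D model: flag
# a frame whose SAD far exceeds the recent average, then forget the
# accumulated statistics (the factor of 3 is an assumed threshold).

def detect_scene_changes(frame_sads, factor=3.0):
    """Return indices of frames whose SAD exceeds factor * running mean."""
    changes, history = [], []
    for i, sad in enumerate(frame_sads):
        if history and sad > factor * (sum(history) / len(history)):
            changes.append(i)
            history = []          # model reset: discard old statistics
        history.append(sad)
    return changes

sads = [100, 110, 95, 105, 900, 120, 115]   # SAD spike at index 4
print(detect_scene_changes(sads))   # [4]
```

At each flagged index the encoder would insert an I-frame and restart the R-D modeling, as described above.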
