Introduction - Video Coding Based on Human Visual System Properties

Video compression is an important topic in multimedia applications nowadays.

Since the data volume of digitized video is too large for most storage and transmission systems, compression is essential for video applications. Over the past decade, video compression technology has enabled many new multimedia applications, such as DVD movies, digital television, and video conferencing. Popular video coding standards adopt a lossy approach in which some data from the video source are discarded during compression because of limited bandwidth or storage constraints. The process that determines which part of the source data should be discarded is called rate control.

A rate control algorithm allocates target bits to each coding unit and adjusts coding parameters to achieve the target bitrate. Rate control schemes fall into two groups: variable-bit-rate (VBR) control and constant-bit-rate (CBR) control. VBR rate control attempts to maintain constant video quality by allocating different amounts of bits to different segments of video data based on their entropy. As a result, the data rate of the compressed bitstream varies over time. CBR rate control, on the other hand, tries to minimize variations in the data rate of the compressed bitstream in order to satisfy the bandwidth constraints of the video delivery channel or the capability constraints of the playback device (i.e., its processing speed in bits per second). A side effect of this approach is that the resulting video quality may vary over time. In practice, the rate control algorithm selects a proper quantization parameter in order to produce a bitstream that fulfills the application constraints.
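The idea of allocating target bits and then selecting a quantization parameter can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the model used in this thesis: the function names are hypothetical, the bit budget is simply split evenly across frames, the rate model assumes that the bit cost is inversely proportional to the quantization step, and the quantization parameter (QP) range of 0 to 51 with a step that doubles every 6 QP values follows the H.264 convention.

```python
def allocate_target_bits(bitrate_bps, frame_rate):
    """Split a constant bit budget evenly across frames (toy CBR allocation)."""
    return bitrate_bps / frame_rate


def estimated_bits(complexity, qp):
    """Hypothetical rate model: bits = complexity / quantization step.

    The step size doubles for every increase of 6 in QP, which mirrors
    the H.264 convention (assumed here, not taken from the thesis).
    """
    qstep = 2.0 ** ((qp - 4) / 6.0)
    return complexity / qstep


def select_qp(target_bits, complexity, qp_min=0, qp_max=51):
    """Return the smallest QP whose estimated bit cost fits the target."""
    for qp in range(qp_min, qp_max + 1):
        if estimated_bits(complexity, qp) <= target_bits:
            return qp
    # Even the coarsest quantizer exceeds the budget; clamp to qp_max.
    return qp_max


# Example: a 1 Mbit/s channel at 25 frames per second.
target = allocate_target_bits(1_000_000, 25)
qp_simple_frame = select_qp(target, complexity=80_000)
qp_complex_frame = select_qp(target, complexity=160_000)
```

Under this toy model, a frame with higher complexity is assigned a larger quantization parameter so that it still fits the same bit budget, which is the quality variation over time that CBR control trades for a steady data rate.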

Most existing encoding models analyze the complexity of the video data for bit allocation and rate control. The complexity (or, loosely speaking, the entropy) is computed either as the mean absolute difference (MAD) of the residual data for inter-predicted frames or as the standard deviation for intra-predicted frames.
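As a minimal sketch of these two complexity measures, the code below computes the MAD between a block and its prediction, and the standard deviation of a block's pixel values. The function names and the representation of a block as a list of pixel rows are assumptions made for illustration; the population (divide-by-N) form of the standard deviation is also an assumption.

```python
import math


def mean_absolute_difference(current, predicted):
    """MAD of the residual between a block and its inter prediction."""
    total = 0.0
    count = 0
    for row_cur, row_pred in zip(current, predicted):
        for cur, pred in zip(row_cur, row_pred):
            total += abs(cur - pred)
            count += 1
    return total / count


def standard_deviation(block):
    """Population standard deviation of the pixel values in a block."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return math.sqrt(variance)


# A 2x2 block and a hypothetical motion-compensated prediction of it.
block = [[10, 12], [14, 16]]
prediction = [[9, 12], [15, 18]]

mad = mean_absolute_difference(block, prediction)  # residuals: 1, 0, 1, 2
sigma = standard_deviation(block)
```

Both values depend only on pixel statistics, which is why they can describe coding complexity without saying anything about how visible the resulting distortion is.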

Although MAD can reasonably represent the coding complexity of a region of video data, it does not sufficiently capture the perceptual importance of the data. Since the human eye is the final judge of quality in most video compression applications, perceptual models of the human visual system must be considered for better bit allocation.

Although human perceptual models have been used successfully for audio coding [28], they have not been popular for either still image coding or video coding. Still image coding arguably does not need a complicated model, since the amount of data is small. For motion picture coding, however, perceptual models are not used mainly because the behavior of the human visual system is very difficult to put into equations. Some coding schemes have been proposed for content-based video compression since, conceptually, high-level human vision works on objects instead of pixels [29][31]. A general method along this line of thinking is to decompose a video sequence into foreground and background representations and reduce the coding bitrate of the background data. However, we do not agree with this approach, since the regions that attract human attention are most likely related to personal experience and differ from person to person. Therefore, our proposed algorithm focuses on human early vision processes [7] and tries to formulate a distortion measure based on low-level vision behavior because of its generality.

In this thesis, a rate control algorithm based on human visual system properties is proposed. The proposed rate control model of visual complexity is composed of visual texture complexity and temporal complexity. A modified contrast sensitivity function is proposed to estimate the visual texture complexity. The visual texture complexity map represents the visual distortion sensitivity of each macroblock and is incorporated into the proposed rate control model.

One critical issue in developing a perceptual model-based rate control algorithm is how to judge the quality of a coded bitstream. The most common objective quality measure for lossy compressed video is the peak signal-to-noise ratio (PSNR). Nevertheless, research shows that PSNR does not completely agree with the perceptual quality evaluated by human eyes [30]. That is why the Moving Picture Experts Group (MPEG) of the International Organization for Standardization (ISO) uses only subjective viewing tests to evaluate the performance of proposals submitted to its various new-technology calls for proposals (CfPs) and for the final verification tests of a prospective new standard. Unfortunately, subjective viewing tests are difficult to conduct and are subject to bias if they are not done properly. In this thesis, we investigate an objective distortion measure called the structural similarity index (SSIM), which approximates the perceived image distortion.

From our experiments, SSIM is clearly more consistent with perceptual quality than PSNR is. The proposed visual-based rate control algorithm is evaluated using the SSIM measure.
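To make the difference between the two measures concrete, the sketch below computes PSNR and a simplified SSIM for a tiny example. It is only an illustration under stated assumptions, not the evaluation procedure of this thesis: the function names are hypothetical, 8-bit pixels (peak value 255) are assumed, and the SSIM is evaluated once over the whole image with population statistics, whereas practical SSIM implementations average the index over local windows. The stabilizing constants use the commonly quoted values K1 = 0.01 and K2 = 0.03.

```python
import math


def _flatten(image):
    """Turn a list of pixel rows into a flat list of floats."""
    return [float(p) for row in image for p in row]


def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB (infinite for identical images)."""
    x, y = _flatten(reference), _flatten(distorted)
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


def global_ssim(reference, distorted, peak=255.0):
    """Single-window SSIM computed over the whole image (simplified)."""
    x, y = _flatten(reference), _flatten(distorted)
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * peak) ** 2  # stabilizes the luminance term
    c2 = (0.03 * peak) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


reference = [[50, 100], [150, 200]]
shifted = [[60, 110], [160, 210]]  # uniform brightness shift of +10
noisy = [[60, 90], [160, 190]]     # alternating error of +10 / -10

# Both distortions have the same mean squared error, hence the same PSNR,
# but the structure-preserving brightness shift gets the higher SSIM.
same_psnr = math.isclose(psnr(reference, shifted), psnr(reference, noisy))
shift_scores_higher = global_ssim(reference, shifted) > global_ssim(reference, noisy)
```

In this toy case PSNR cannot separate the two distortions, while the simplified SSIM ranks the uniform brightness shift as closer to the reference than the alternating error of the same energy.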

The thesis is organized as follows. Chapter 2 reviews previous work on rate control schemes, including both non-visual model-based and visual model-based ones. Some theories and models of the human visual system are also presented in that chapter. Chapter 3 presents the mathematical and theoretical foundations of the proposed solutions. The details of the proposed algorithm are derived and presented in Chapter 4. The experimental results are shown in Chapter 5. Finally, some discussions and conclusions are given in Chapter 6.
