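Motion Uncertainty Analysis for Spatially-Variant Temporal Prediction

National Chiao Tung University

Institute of Computer Science and Engineering

Master's Thesis

Motion Uncertainty Analysis for Spatially-Variant Temporal Prediction

Student: Chia-Hsin Chan

Advisor: Assistant Professor Wen-Hsiao Peng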

Motion Uncertainty Analysis for Spatially-Variant Temporal Prediction

Student: Chia-Hsin Chan

Advisor: Wen-Hsiao Peng

National Chiao Tung University

Institute of Multimedia Engineering

Master's Thesis

A Thesis

Submitted to Institute of Multimedia Engineering

College of Computer Science

National Chiao Tung University

in Partial Fulfillment of the Requirements

for the Degree of Master

in

Computer Science

September 2009

Hsinchu, Taiwan, Republic of China

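Motion Uncertainty Analysis for Spatially-Variant Temporal Prediction

Student: Chia-Hsin Chan

Advisor: Wen-Hsiao Peng

Master's Program, Institute of Multimedia Engineering, National Chiao Tung University

Abstract

This thesis proposes how to use spatially-variant motion-compensated filters to improve the efficiency of motion-compensated prediction. Several recent studies have pointed out that sub-pel interpolation filters, originally designed to address aliasing, can also alleviate the motion uncertainty problem to some extent. Experimental results even indicate that, in some cases, these filters achieve better coding efficiency on ultra-high-definition (UHD) videos than on low-resolution ones. As a coding tool for alleviating motion uncertainty, however, the current approach has two major problems: (1) the filter design is still aimed at removing aliasing, and (2) pixels with different degrees of motion uncertainty all use the same fixed filter. In this thesis, we conduct several experiments on motion uncertainty to provide a detailed analysis. From the experimental results, we draw several important observations and use them to propose design guidelines for spatially-variant motion-compensated filters.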


Motion Uncertainty Analysis for Spatially-Variant

Temporal Prediction

Student: Chia-Hsin Chan

Advisor: Wen-Hsiao Peng

Institute of Multimedia Engineering

National Chiao Tung University

ABSTRACT

This thesis addresses the problem of improving temporal prediction efficiency through spatially-variant motion-compensated filtering. A few recent studies pointed out that sub-pel interpolation filters, originally designed to address aliasing effects, can overcome the problem of motion uncertainty to some extent. Experimental results even indicated that, in terms of coding efficiency, they can in some cases bring larger improvements for UHD videos than for low-resolution ones. As a coding tool for alleviating motion uncertainty, however, the current approach poses two major problems: (1) the filter design is still aimed at removing aliasing effects, and (2) pixels having motion uncertainty of different degrees are forced to use the same filter. In this thesis, we carry out a few experiments to provide a detailed analysis of motion uncertainty. From the experimental results, we draw some important observations that form guidelines for designing a spatially-variant temporal prediction filter.

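Acknowledgments

Looking back on my two years of graduate study, I would first like to thank my advisor, Dr. Wen-Hsiao Peng, for his guidance in my research. His insistence on rigor, his habit of analyzing problems in depth, and his persistent, get-to-the-bottom-of-it style of mentoring have become my model in study and research. Next, I would like to thank my junior labmate 王澤瑋, who tirelessly discussed my work with me, offered many valuable suggestions, and stepped in at the right moments to help correct my research direction when it had drifted, so that these two years of master's study were no longer a struggle. I offer both of them my sincere gratitude. I was fortunate to join the Multimedia Architecture and Processing Laboratory, where I could keep learning in an excellent environment and exchange ideas with warm and helpful lab members; this has been the most fulfilling period of my post-baccalaureate years. I thank my seniors, Dr. 陳渏紋, Dr. 李志鴻, Dr. 林鴻志, 黃雪婷, 林岳進, and 陳敏正, for leading me into graduate life; my classmates 林哲永, 陳俊吉, and 陳建穎, who, in coursework and research alike, could always pinpoint the heart of a problem and give the most direct help; and my junior labmates 楊復堯, 吳思賢, 王澤瑋, 蔡閏旭, and 吳崇豪, for their selfless help during this final year. Finally, I thank my parents, Ms. 賴明珠 and Mr. 詹前亮, for raising me and for giving me their full support on the road to this master's degree, sparing me many worries. I thank my younger brother 詹家銘 for his brotherly care. To my teachers, family, and friends: your support gave me the confidence to earn this degree. Thank you.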


Motion Uncertainty Analysis for Spatially-Variant

Temporal Prediction

Advisor: Prof. Wen-Hsiao Peng

Student: Chia-Hsin Chan

Institute of Multimedia Engineering

National Chiao-Tung University

1001 Ta-Hsueh Rd., 30010 HsinChu, Taiwan

September 2009

List of Figures

2.1 Average residual signal energy calculated from Football sequence using block size of 16x16.

2.2 The filter design flow suggested by T. Wedi et al.

2.3 Average residual signal energy calculated from Football sequence using block size of 16x16.

2.4 One example of OBMC. The shaded block is the current MB. The distances from pixel p to the block MVs (v1, v2, v3, v4) are (r1, r2, r3, r4).

2.5 MB partition in 32x32 MBs.

3.1 The 4 distinct pixel position groups to be analyzed. These groups are marked as "corner," "near-corner," "near-center" and "center" according to their distance to the block center.

3.2 Example of how the true motion indicated by TMV search and OBMC is distributed for different sequences.

3.3 True motion distributions for different pixel positions indicated by TMV search and OBMC. The axis marked "Range" is the distance from the block MV. The axis marked "cdf" is the cumulative distribution function of the probability of true motion occurrence.

3.4 True motion distributions for different MB sizes indicated by TMV search and OBMC. The axis marked "Range" is the distance from the block MV. The axis marked "cdf" is the cumulative distribution function of the probability of true motion occurrence.

3.5 Wiener filter with: (a) 9 taps, (b) 25 taps and (c) 49 taps. The shaded blocks indicate filter supports and the block marked "X" indicates the pixel pointed to by the block MV.

3.6 SSE for sequences at MB size 16x16.

4.1 Proposed 14 Wiener filter supports.

4.2 SSE Comparison for MB size 16x16.

4.3 SSE Comparison for MB size 32x32.



CHAPTER 1 Research Overview

1.1 Introduction

Technology evolution in both capture and display devices will soon make possible the creation and presentation of Ultra High Definition (UHD) videos. The video bit-rate is expected to go up faster than the transmission bandwidth. Although H.264/AVC [2][9] has been a successful video coding standard, it was reported to have poor efficiency in coding UHD videos. Part of the reason is the lack of coding tools for dealing with motion uncertainty. It is thus necessary to design a video codec that is specifically optimized for UHD applications.

A few recent studies pointed out that the sub-pel interpolation filters, mainly designed to address aliasing effects, can overcome the problem of motion uncertainty to some extent. Experimental results even indicated that, in terms of coding efficiency, they can in some cases improve more in UHD videos than in low-resolution ones. This result may appear to contradict intuition, since aliasing effects are supposed to be less severe when a higher sampling rate is in use. When viewed from the perspective of reducing motion uncertainty, however, the observation can indeed be justified.


1.2 Problem Statement

As a coding tool for alleviating motion uncertainty, however, the current approach poses two major problems:

1. The filter design is still aimed at removing aliasing effects.

2. Pixels having motion uncertainty of different degrees are forced to use the same filter.

The former can be solved by using the least-squares method to update the filter on a frame-by-frame basis, whereas the latter requires adapting the filter in a spatially-variant manner. This thesis aims to design a spatially-variant temporal prediction filter. In the course of the design process, we carried out a few experiments to understand

1. how the level of motion uncertainty may vary with the pixel location within a macroblock,

2. how the distribution of motion uncertainty may vary with video content,

3. what contexts may be useful in predicting the distribution of motion uncertainty,

4. how the filter should be adapted in response to the varying distribution of motion uncertainty.

1.3 Contributions and Organization

The main contributions of this thesis include the following:

1. A detailed analysis on motion uncertainty.

2. A number of guidelines for designing spatially-variant temporal prediction filters.

The remainder of this thesis is organized as follows: Chapter 2 contains a review of known prediction filter designs and several issues that relate to prediction efficiency. Chapter 3 introduces the concept of motion uncertainty and contains a number of experiments for discovering motion uncertainty characteristics. In Chapter 4, a spatially-variant Wiener prediction filter is designed and its performance is tested in comparison with well-known filter designs. Chapter 5 concludes our study with a summary of this work and provides a list of future work.

CHAPTER 2 Background

2.1 H.264/AVC Interpolation Filter

The interpolation process in H.264/AVC is shown in Figure 2.1, where $A_1, A_2, \ldots, F_6$ represent integer-pel pixels. Each half-pel pixel ($b$, $h$, $j$, $aa$, $bb$, ..., $jj$) is interpolated from specific integer-pel pixels with a fixed 6-tap filter. For example,

$$b = (C_1 - 5C_2 + 20C_3 + 20C_4 - 5C_5 + C_6 + 16) / 32$$

$$h = (A_3 - 5B_3 + 20C_3 + 20D_3 - 5E_3 + F_3 + 16) / 32$$

Each quarter-pel pixel is interpolated by applying a bilinear filter to two neighboring half-pel or integer-pel pixels.

During sub-pel motion compensation, each motion vector (MV) is refined to sub-pel precision within a certain range. The refined MV then points to a sub-pel position, and the sub-pel pixels at the corresponding positions are gathered to form the predictor.
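As an illustration of the two example equations and the bilinear quarter-pel step, the following Python sketch applies the 6-tap filter (1, -5, 20, 20, -5, 1) to one row of integer-pel samples. The function names, the clipping to an 8-bit range, and the use of a right shift for the division by 32 are assumptions of this sketch and are not taken from the thesis.

```python
def clip8(x):
    """Clamp a value to the 8-bit sample range [0, 255] (assumed bit depth)."""
    return max(0, min(255, x))


def half_pel(samples):
    """Apply the 6-tap filter (1, -5, 20, 20, -5, 1) to six integer-pel
    samples and return the rounded half-pel value between the 3rd and 4th."""
    assert len(samples) == 6
    taps = (1, -5, 20, 20, -5, 1)
    acc = sum(t * s for t, s in zip(taps, samples))
    return clip8((acc + 16) >> 5)  # +16 rounds, >> 5 divides by 32


def quarter_pel(p, q):
    """Bilinear average (with rounding) of two neighboring samples."""
    return (p + q + 1) >> 1


# Row C of the sample grid: C1 ... C6 (illustrative values).
row_c = [10, 20, 30, 40, 50, 60]
b = half_pel(row_c)            # half-pel sample between C3 and C4
a = quarter_pel(row_c[2], b)   # quarter-pel sample between C3 and b
```

A flat row is left unchanged by the filter because the taps sum to 32, which is a quick sanity check on the coefficients.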

[Figure 2.1 image: 6x6 grid of integer-pel samples A1 to F6, with the sub-pel positions a to n and aa to jj marked between them.]

Figure 2.1: Average residual signal energy calculated from Football sequence using block size of 16x16.

2.2 Related Works

2.2.1 Motion- and Aliasing-Compensated Prediction

Experimental results show that, even for purely translational motion, perfect motion compensation is not achieved for images. In addition to camera noise, Werner [8] suggests that aliasing also contributes to the prediction error.

T. Wedi et al. [7] formulate the motion-compensated prediction process for both the sampled discrete image and the natural continuous image, and derive the prediction error for each. The derivation shows that aliasing does exist in the discrete motion-compensated prediction process. Moreover, the results indicate two main features of the prediction error caused by aliasing:

1. the impact of aliasing on the prediction error vanishes at integer-pel motion displacements;

2. the impact of aliasing on the prediction error is maximal at half-pel motion displacements.

To reduce the impact of aliasing, T. Wedi et al. [7] suggest applying an interpolation filter during the motion-compensated prediction process. The proposed Wiener interpolation filter design is shown in Figure 2.2.

First, the original image $S$ is filtered by a low-pass filter, resulting in $S_{LP}$. Then $S_{LP}$ is downsampled into a smaller image $S_d$. Using $S$ and $S_d$ as the training set, the Wiener filter $h_w$ is obtained by solving the Wiener-Hopf equation. Finally, $h_w$ is used to upsample $S_d$ to obtain the interpolated image $S_w$.

[Figure 2.2 image: block diagram with the signals S, SLP, Sd, Sw and hw and the stages low-pass filter, downsampling, filter training and upsampling.]

Figure 2.2: The filter design flow suggested by T. Wedi et al.
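The filter training step of Figure 2.2 amounts to a least-squares fit. The sketch below shows this on a deliberately simplified 1-D, same-rate problem: it builds the autocorrelation matrix and cross-correlation vector from an observed signal and a desired signal and solves the resulting normal equations (the discrete Wiener-Hopf equation). The function names, the 1-D setting, and the small Gauss-Jordan solver are assumptions made to keep the example self-contained; the thesis itself trains 2-D filters on images.

```python
def solve(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def train_wiener(observed, desired, taps):
    """Least-squares FIR taps h such that
    sum_k h[k] * observed[n + k] approximates desired[n]."""
    rows = [observed[n:n + taps] for n in range(len(observed) - taps + 1)]
    target = desired[:len(rows)]
    # Autocorrelation matrix R and cross-correlation vector p.
    r = [[sum(row[i] * row[j] for row in rows) for j in range(taps)]
         for i in range(taps)]
    p = [sum(row[i] * t for row, t in zip(rows, target)) for i in range(taps)]
    return solve(r, p)  # Wiener-Hopf: R h = p
```

When the desired signal really is a fixed linear combination of the observed samples, the trained taps recover that combination; with real image data the solution is the best fit in the mean-square sense.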

For interpolation filters that deal with images at higher downsampling rates, T. Wedi et al. [7] suggest a top-down filter design. Instead of directly designing a filter for the higher downsampling rate, the interpolation filter for the lower downsampling rate is obtained first. The filter for the higher downsampling rate is then obtained by using the image produced by the lower-rate filter, together with the original image, as the training set. The sub-pel interpolation filter of the H.264/AVC standard described in Section 2.1 follows this design principle.

2.2.2 AIF

To improve motion-compensated prediction efficiency, the concept of the adaptive interpolation filter (AIF) was introduced. Instead of the fixed interpolation filter of the H.264/AVC standard [2][9], interpolation filters trained adaptively for each frame were suggested. Y. Vatis et al. [6] proposed a 2-D non-separable adaptive Wiener interpolation filter that achieves a bit-rate saving of 5.77% on average.

Later, Y. Ye et al. [10] proposed an Enhanced AIF (E-AIF) that achieves an average bit-rate saving of 12.53% on HD (720p, 1080p) sequences.

2.2.3 OBMC

M. T. Orchard et al. [5] presented an estimation-theoretic analysis of motion compensation. The result indicates that, in block-based motion compensation, the residual variance increases as the pixel position moves away from the block center. Figure 2.3 shows the residual variance for the Football sequence.

Figure 2.3: Average residual signal energy calculated from Football sequence using block size of 16x16.

To improve prediction accuracy, overlapped block motion compensation (OBMC) was also proposed in [5]. The concept of OBMC is to utilize the predictors indicated by the MVs of neighboring MBs. For example, in Figure 2.4, v2, v3 and v4 represent the MVs of neighboring blocks. The current block uses a linear combination of the predictors indicated by v1, v2, v3 and v4 as its final predictor.

Figure 2.4: One example of OBMC. The shaded block is the current MB. The distances from pixel p to the block MVs (v1, v2, v3, v4) are (r1, r2, r3, r4).

Initially, OBMC was implemented only in H.263 [1]. Y. W. Chen et al. [4] proposed a parametric window design for OBMC and extended H.264/AVC with the proposed algorithm under a quad-tree MB partition coding framework, obtaining an average bit-rate saving of 5%.

2.2.4 Extended Macro Block Sizes

Since high-resolution video content is the trend for next-generation video coding, its characteristics have been analyzed. One such characteristic is that smooth areas become larger, and extended MB sizes [3] were proposed to improve coding efficiency in this case. Extended macroblock sizes allow the MB partition to be 32x32 or even 64x64, while quad-tree based partitioning is retained, as shown in Figure 2.5. On average, a bit-rate saving of 15.10% is obtained on HD sequences.

32x32  32x16  16x32  16x16

16x16  16x8   8x16   8x8

8x8    8x4    4x8    4x4

Figure 2.5: MB partition in 32x32 MBs.

2.3 Comparison of Well-Known Prediction Filter Designs

The H.264/AVC interpolation filter is designed to address the aliasing problem. OBMC uses the MVs of neighboring MBs to generate the motion-compensated predictor and thereby deals with the motion uncertainty problem to some degree; however, its coding gain is not as high as expected. AIF offers an impressive coding gain over both the H.264/AVC interpolation filter and OBMC, but its design principle still follows that of the H.264/AVC interpolation filter.


CHAPTER 3 Analysis of Motion Uncertainty

3.1 Motion Uncertainty

According to one of the conclusions in [5], the block MV found by block motion estimation tends to be the MV of the block center. Thus, for pixels away from the block center, the block MV may not be their true motion vector. This phenomenon is called "motion uncertainty." Two factors related to motion uncertainty come to mind immediately: (1) the distance of the pixel from the block center and (2) the motion compensation block size. In the following sections, we analyze motion uncertainty by observing the distribution of the true motion vector (TMV). The true motion indicated by OBMC is then compared with the TMV distribution in order to verify how OBMC compensates for motion uncertainty. We also perform a simple experiment that applies Wiener filters with different numbers of taps to support our conclusions on the characteristics of motion uncertainty.

3.2 Observation of Motion Uncertainty

To analyze motion uncertainty, the following experiments are performed:

1. For pixels at different positions relative to the block center, find the characteristics of their true motion vectors.

2. For pixels at those positions, observe the range of their motion uncertainty.

Figure 3.1: The 4 distinct pixel position groups to be analyzed. These groups are marked as "corner," "near-corner," "near-center" and "center" according to their distance to the block center.

3.2.1 Experiment Setting

Here are the experiment settings:

• CIF, 720p and 2560x1600 sequences are tested, using the first 200 frames of each.

• Only one previous original frame is referenced for MC, in order to avoid other encoding effects such as quantization noise.

• Fixed MB sizes of 16x16, 32x32 and 64x64 are applied for MC, with a search range of 128.

• Minimization of the sum of squared differences (SSD) is used as the MC criterion.

• Only MBs whose SSD magnitude lies between the first 5% and the first 60% are picked as analysis candidates, since MBs with too large or too small an SSD would confound the experiment results.

• For simplicity, 4 distinct pixel position groups are analyzed, as shown in Figure 3.1.

The first 5% of MBs are excluded because MBs with a large SSD are likely to be occluded; in this case spatial prediction is usually preferred. In addition, on average 50% of the MBs in each sequence are encoded in skip mode under the H.264/AVC framework. In skip mode the MV is derived from neighboring MBs and the residual is set to zero. There are also some non-skip MBs with no residual. In both cases the motion-compensated prediction is already good enough, so the 60% mark of SSD magnitude is an acceptable lower threshold. Additionally, the candidate MBs account for more than 70% of the total SSE in each sequence.
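The candidate selection rule can be sketched as a rank-based filter over the per-MB SSD values. In the Python sketch below, the function names and the interpretation of "first 5% to 60%" as ranks in descending SSD order are assumptions of this illustration; the thesis does not give an implementation.

```python
def select_candidates(mb_ssd, upper_pct=5, lower_pct=60):
    """Return indices of MBs whose SSD rank (largest SSD first) falls
    between the top upper_pct percent and the top lower_pct percent."""
    order = sorted(range(len(mb_ssd)), key=lambda i: mb_ssd[i], reverse=True)
    start = len(order) * upper_pct // 100   # skip likely-occluded MBs
    stop = len(order) * lower_pct // 100    # skip well-predicted MBs
    return order[start:stop]


def sse_share(mb_ssd, candidates):
    """Fraction of the total squared error carried by the candidate MBs."""
    return sum(mb_ssd[i] for i in candidates) / sum(mb_ssd)
```

With `sse_share`, one can check for a given sequence how much of the total error the selected candidates account for.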

3.2.2 True Motion Distribution

To find the true motion vector (TMV) of each pixel group, the experiment settings described above are applied with the following modifications:

• The MC criterion is changed to weighted SSD.

• ME is modified to use an MB size of 9x9 with a search range of 17.

• The TMV search starts from the block motion.

The weighted SSD is realized by applying a Gaussian weighting kernel. Weighted SSD is used because the TMV of a pixel is desired rather than the TMV of a block. For the same reason, the smaller MB size and search range are also chosen to suit the TMV search.
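A minimal Python sketch of such a TMV search is given below: a 9x9 Gaussian kernel weights the squared differences so that pixels near the target dominate the cost, and a full search is run around the block MV. The kernel width `sigma`, the function names, and the reading of "search range 17" as a radius of 8 around the block MV are assumptions of this sketch.

```python
import math


def gaussian_kernel(size=9, sigma=2.0):
    """size x size Gaussian weights centered on the target pixel, summing to 1."""
    c = size // 2
    k = [[math.exp(-((y - c) ** 2 + (x - c) ** 2) / (2.0 * sigma ** 2))
          for x in range(size)] for y in range(size)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def weighted_ssd(cur, ref, cy, cx, dy, dx, kernel):
    """Weighted SSD between the window centered at (cy, cx) in cur and the
    window displaced by (dy, dx) in ref."""
    c = len(kernel) // 2
    total = 0.0
    for y in range(-c, c + 1):
        for x in range(-c, c + 1):
            d = cur[cy + y][cx + x] - ref[cy + dy + y][cx + dx + x]
            total += kernel[y + c][x + c] * d * d
    return total


def tmv_search(cur, ref, cy, cx, block_mv, radius, kernel):
    """Full search within +/-radius of block_mv; returns the displacement
    (dy, dx) with the smallest weighted SSD."""
    best = None
    for dy in range(block_mv[0] - radius, block_mv[0] + radius + 1):
        for dx in range(block_mv[1] - radius, block_mv[1] + radius + 1):
            cost = weighted_ssd(cur, ref, cy, cx, dy, dx, kernel)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best[1]
```

The caller is responsible for keeping the search window inside the frame; boundary padding is omitted here for brevity.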

3.2.3 OBMC Motion Distribution

OBMC, described in Section 2.2.3, also uses neighboring MVs as MVs of the current MB to generate the predictor, and in this way compensates for motion uncertainty to some degree. To visualize its behavior, the true motion distributions indicated by OBMC are also acquired.

To achieve a fair comparison between OBMC and the true motion search of Section 3.2.2, the true motion probability measures are modified. Each MB uses the MVs of 3 neighboring blocks as well as its own MV, as shown in Figure 2.4.

For each pixel position, a specific weighting $w_i$ is computed and normalized by

$$w_i = \frac{r_i^{-2}}{\sum_{j=1}^{4} r_j^{-2}}$$

where $r_i$ denotes the distance between the target pixel and the center of the corresponding block. For example, at corner pixels the distances to the 4 MVs of interest are about the same, so the weightings are computed as (0.25, 0.25, 0.25, 0.25).

The weightings are normalized to sum to 1 because OBMC references 4 MVs as its TMV, while the true motion search of Section 3.2.2 finds only one MV as the TMV of each pixel.
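The inverse-square-distance weighting can be written in a few lines of Python. The function names are hypothetical, and the sketch assumes all four distances are strictly positive (a pixel never coincides exactly with a block center).

```python
def obmc_weights(distances):
    """Normalized weights w_i = r_i^-2 / sum_j r_j^-2 for the four MVs."""
    inv_sq = [1.0 / (r * r) for r in distances]
    total = sum(inv_sq)
    return [v / total for v in inv_sq]


def obmc_predict(predictors, distances):
    """Blend the predictors pointed to by the four MVs with those weights."""
    return sum(w * p for w, p in zip(obmc_weights(distances), predictors))
```

For a corner pixel with four equal distances, `obmc_weights([8, 8, 8, 8])` gives equal weights of 0.25, matching the example in the text.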

3.2.4 Experiment Results of True Motion Distribution

Following the above experiment settings, the true motion distribution is first tested for different pixel positions.

Figure 3.2 shows the true motion distribution for different sequences. The immediate observations are:

1. The spread of true motion depends only on the sequence.

2. The probability that the block motion indicates the TMV always appears as the peak.

3. The peak probability hardly ever exceeds 0.5.

Figure 3.3 shows the true motion distribution for different pixel positions. The observations are:

1. Most of the TMVs appear within 1 to 2 pixels of the block motion.

2. The TMV spread becomes wider as the target pixel moves from the corner to the block center.

3. For OBMC, the peak value is much higher than in the TMV results, and the spread is also narrower.

4. The shape of the spread is alike for TMV and OBMC. The peak value also becomes larger as the target pixel moves from the corner to the center.

5. OBMC is sensitive to pixel position, whereas TMV is not.

6. In the TMV curves, the difference in concentration speed between positions is not obvious.

Next, the true motion distribution is tested for different MB sizes.

Figure 3.4 shows the true motion distribution of the different sequences for different MB sizes. The observations are:

1. The shape of the true motion spread and the characteristics of the peak value are the same as above.

2. Most of the TMVs still appear within 1 to 2 pixels of the block motion.

3. The TMV spread becomes wider when the MB size increases.
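The cdf curves of Figures 3.3 and 3.4 accumulate the fraction of pixels whose TMV lies within a given range of the block MV. A possible way to compute such a curve is sketched below in Python. Measuring the range as the larger of the two component differences (so that range r means "within ±r in both directions") is an assumption of this sketch, as is the function name.

```python
from collections import Counter


def tmv_range_cdf(tmvs, block_mvs, max_range=8):
    """cdf[r] = fraction of pixels whose TMV lies within +/-r of the
    block MV in both components."""
    counts = Counter(
        max(abs(t[0] - b[0]), abs(t[1] - b[1]))
        for t, b in zip(tmvs, block_mvs)
    )
    total = len(tmvs)
    cdf, running = [], 0
    for r in range(max_range + 1):
        running += counts.get(r, 0)
        cdf.append(running / total)
    return cdf
```

A curve that rises quickly toward 1.0 indicates a narrow spread, that is, low motion uncertainty for that pixel group.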

[Figure 3.2 image: probability ("Prob.") plotted over "X Data" and "Y Data" (each from -3 to 3) for panels mobile_tmv16_corner, mobile_obmc16_corner, parkrun_tmv16_corner, parkrun_obmc16_corner, traffic2560_tmv16_corner and traffic2560_obmc16_corner.]

Figure 3.2: Example of how the true motion indicated by TMV search and OBMC is distributed for different sequences.

[Figure 3.3 image: cdf (0.0 to 1.0) versus Range (0, ±1, ..., ±8) for panels football_CIF_16x16, Mobile_CIF_16x16, crowdrun_720p_16x16, parkrun_720p_16x16, crowdrun_2560_16x16 and traffic_2560_16x16, with curves TMV_corner, TMV_near-corner, TMV_near-center, TMV_center, OBMC_corner, OBMC_near-corner, OBMC_near-center and OBMC_center.]

Figure 3.3: True motion distributions for different pixel positions indicated by TMV search and OBMC. The axis marked "Range" is the distance from the block MV. The axis marked "cdf" is the cumulative distribution function of the probability of true motion occurrence.

[Figure 3.4 image: cdf (0.0 to 1.0) versus Range (0, ±1, ..., ±8) for panels football_CIF, Mobile_CIF, crowdrun_720p, parkrun_720p, crowdrun_2560 and traffic_2560, with curves TMV_16x16corner, TMV_32x32corner, TMV_64x64corner, OBMC_16x16corner, OBMC_32x32corner and OBMC_64x64corner.]

Figure 3.4: True motion distributions for different MB sizes indicated by TMV search and OBMC. The axis marked "Range" is the distance from the block MV. The axis marked "cdf" is the cumulative distribution function of the probability of true motion occurrence.

[Figure 3.5 image: three filter supports, (a), (b) and (c), each drawn around the pixel marked "X."]

Figure 3.5: Wiener filter with: (a) 9 taps, (b) 25 taps and (c) 49 taps. The shaded blocks indicate filter supports and the block marked "X" indicates the pixel pointed to by the block MV.

3.3 First Experiment

After observing the true motion distribution, the following experiment is carried out to investigate how the spread of motion uncertainty influences prediction efficiency. For each pixel position in the candidate MBs, the three Wiener filters of Figure 3.5 are generated for each sequence. The filters differ only in their number of taps.

Figure 3.6 shows the experiment results. Applying the 9-tap filter achieves an obvious decline in the sum of squared errors (SSE). However, the 25-tap and 49-tap filters yield only marginal further improvement over the 9-tap filter: at most about 2% and 1% of additional SSE decline, respectively. This experiment shows that increasing the number of filter taps beyond a certain point does not contribute a worthwhile gain in prediction efficiency. Moreover, filters with similar supports suffer from an averaging effect when their training sets are not carefully classified.
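To make the notion of filter support concrete, the Python sketch below enumerates a square support around the pixel pointed to by the block MV and applies a set of coefficients over it. Treating the 9-, 25- and 49-tap supports as 3x3, 5x5 and 7x7 squares is an assumption of this sketch (the tap counts match, but the exact shapes are those of Figure 3.5), and the function names are hypothetical.

```python
def square_support(radius):
    """Offsets (dy, dx) of a (2*radius+1) x (2*radius+1) square support
    centered on the pixel pointed to by the block MV."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)]


def apply_filter(ref, y, x, mv, support, coeffs):
    """Predict pixel (y, x) as a weighted sum of reference pixels around
    the position (y + mv[0], x + mv[1])."""
    return sum(c * ref[y + mv[0] + dy][x + mv[1] + dx]
               for (dy, dx), c in zip(support, coeffs))
```

A coefficient set that is 1 at offset (0, 0) and 0 elsewhere reduces to ordinary motion compensation, so the trained filters can be read as corrections around that baseline.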

[Figure 3.6 image: SSE versus pixel position (corner, near-corner, near-center, center) for panels football_MSE16, mobile_MSE16, crowdrun_720p_MSE16 and parkrun_720p_MSE16, with curves Candidates, 9taps, 25taps and 49taps.]

Figure 3.6: SSE for sequences at MB size 16x16.

3.4 Motion Uncertainty Characteristics

The experiments above confirm that the motion uncertainty phenomenon exists and can be observed. Its characteristics are summarized as follows.

1. The most likely TMV for a pixel is its block MV.

2. Motion uncertainty can occur at every pixel in an MB.

3. The distribution of motion uncertainty depends on the sequence.

4. Motion uncertainty occurs mostly within 1 or 2 pixels of the block MV.

5. Motion uncertainty is smaller for pixels near the block center than for distant pixels.

6. Motion uncertainty becomes stronger when the MB size increases.

7. The true motion distribution indicated by OBMC also fits the motion uncertainty distribution, although the magnitude is not exact.

CHAPTER 4 Design of Spatially-Variant Wiener Prediction Filter

In Chapter 3, the phenomenon of motion uncertainty was investigated and its characteristics were analyzed. Based on these characteristics, we make some assumptions and develop a set of filters to decrease the prediction error. Finally, the design principle of the spatially-variant Wiener prediction filter is introduced.

4.1 Proposed Filters

Wiener filters with the same shape but different sizes were tested in Section 3.3. However, the SSE reduction is only slight as the number of filter taps increases. One reason is that motion uncertainty mostly occurs within 1 or 2 pixels of the block MV. Moreover, a Wiener filter has a blind spot in that it is obtained by training; the selection of the training set is therefore important.

To choose proper training sets and avoid the averaging problem during computation, a set of filters is designed with different filter supports. The 14 Wiener filter supports are shown in Figure 4.1.

[Figure 4.1 image: 14 filter supports, each containing a block marked "X."]

Figure 4.1: Proposed 14 Wiener filter supports.

Each filter is trained separately for the 4 pixel positions defined in Section 3.2.1. Each pixel can freely choose among the 14 filters and the original motion-compensated pixel to obtain the minimal prediction error. Although the signaling overhead would be huge in a real application, this experiment is worthwhile for investigating the efficiency of the proposed filter design.
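The per-pixel free choice described above is an oracle selection: for every pixel, the candidate prediction with the smallest squared error is kept. A minimal Python sketch follows; the function name and the flat-list data layout are assumptions of this illustration, with `candidates` standing for the outputs of the 14 filters plus the original motion-compensated prediction.

```python
def best_choice_sse(original, candidates):
    """For each pixel pick the candidate prediction with the least squared
    error; return the total SSE and the chosen candidate index per pixel."""
    total, choices = 0, []
    for i, target in enumerate(original):
        errs = [(target - cand[i]) ** 2 for cand in candidates]
        k = min(range(len(errs)), key=errs.__getitem__)
        choices.append(k)
        total += errs[k]
    return total, choices
```

The list of chosen indices is exactly the side information that would have to be signaled, which is why the overhead of this scheme is so high.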

4.2 Performance Comparison

Several well-known prediction filter designs are compared with the proposed filter in this section. The experiment results for different MB sizes are shown in Figures 4.2, 4.3 and 4.4.

The line marked "Candidates" is the SSE of the candidate MBs at each pixel position under the selection criterion of Section 3.2.1. "AVC" (labeled "AVC(fix)" in the figures) indicates the H.264/AVC standard[] interpolation filter with quarter-pel precision (16 refinement selections in total). "AVC(position free)" indicates the H.264/AVC interpolation filter with quarter-pel precision, where each pixel position in an MB can freely choose the quarter-pel interpolation that minimizes the prediction error. In "AVC(all free)" there is no limitation on sub-pel selection: every pixel can freely choose the best quarter-pel location among the 16 selections. The standard H.264/AVC interpolation filter greatly decreases the SSE, and a further decrease can be obtained by loosening the limits on quarter-pel selection.

The line marked "OBMC" indicates the SSE generated by the OBMC algorithm of Section 3.2.3. "OBMC Adaptive" indicates the SSE generated by the same algorithm when each MB can choose between the OBMC predictor and the original motion-compensated predictor.

OBMC does a good job of decreasing the SSE at corner pixels. Unfortunately, the OBMC algorithm implemented in this experiment does not perform well overall, since only integer-pel precision MVs are used. OBMC is expected to perform better if combined with sub-pel precision MVs.

The proposed filter design, the line marked "Proposed," always performs best for every sequence at every MB size. Although the overhead is high, since the filter choice needs to be signaled for each pixel, the proposed filter still outperforms the H.264/AVC interpolation filter with the same overhead (the line marked "AVC(all free)").

[Figure 4.2 image: SSE versus pixel position for panels football_MSE16, mobile_MSE16, crowdrun_MSE16, parkrun_MSE16, crowdrun2560_MSE16 and traffic2560_MSE16, with curves Candidates, AVC(fix), AVC(all free), AVC (position free), OBMC, OBMC Adaptive and Proposed.]

Figure 4.2: SSE Comparison for MB size 16x16.

[Figure 4.3 image: SSE versus pixel position for panels football_MSE32, mobile_MSE32, crowdrun_MSE32, parkrun_MSE32, crowdrun2560_MSE32 and traffic2560_MSE32, with curves Candidates, AVC(fix), AVC(all free), AVC (position free), OBMC, OBMC Adaptive and Proposed.]

Figure 4.3: SSE Comparison for MB size 32x32.


Sec 4.2. Performance Comparison

[Plots for Figure 4.4. Each panel shows SSE (vertical axis) against pixel position (horizontal axis) for the candidates AVC(fix), AVC(all free), AVC(position free), OBMC, OBMC Adaptive, and Proposed. Panel titles: football_MSE64, mobile_MSE64, crowdrun_MSE64, parkrun_MSE64, crowdrun2560_MSE64, traffic2560_MSE64.]

Figure 4.4: SSE Comparison for MB size 64x64.


Besides the performance comparison, the trend of the SSE also needs to be addressed. In the candidate blocks, the SSE decreases from the corner pixels toward the center pixels, so the curves show a slope. OBMC reduces the SSE most at the corner pixels, producing a flatter line. The H.264/AVC filters with different constraints offer a significant SSE reduction but keep the slope. The proposed filter also keeps the slope, but the slope becomes gentler in some sequences such as foreman, crowdrun and parkjoy. One explanation is that motion uncertainty within one pixel is also an aliasing problem, which the H.264/AVC interpolation compensates for. The proposed filter design, however, handles motion uncertainty of more than one pixel and therefore performs better.

Finally, the slope of the SSE indicates one more fact: just as there is no perfect motion compensation, there may be no perfect compensation for motion uncertainty. However, with a properly designed filter, the effect of motion uncertainty is expected to be reduced.
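The per-position SSE curves discussed above can be illustrated with a minimal sketch. It assumes that the original blocks and their motion-compensated predictions are already available as arrays of shape (num_blocks, B, B). The function names and the grouping of positions by Chebyshev distance to the block center are hypothetical choices made for illustration; they are not taken from the experiments in this thesis.

```python
import numpy as np


def sse_per_position(original_blocks, predicted_blocks):
    """Accumulate the squared prediction error over all blocks,
    separately for each pixel position inside the block.

    Both inputs have shape (num_blocks, B, B); the result has shape (B, B).
    """
    # Convert to float first so that uint8 inputs do not wrap around.
    diff = original_blocks.astype(np.float64) - predicted_blocks.astype(np.float64)
    return (diff ** 2).sum(axis=0)


def sse_by_center_distance(sse_map):
    """Average the per-position SSE over all positions that share the same
    Chebyshev distance to the block center (assumed grouping, for illustration).

    Returns (distances, mean_sse), both 1-D arrays sorted by distance.
    """
    b = sse_map.shape[0]
    center = (b - 1) / 2.0
    rows, cols = np.indices(sse_map.shape)
    dist = np.maximum(np.abs(rows - center), np.abs(cols - center))
    levels = np.unique(dist)
    means = np.array([sse_map[dist == d].mean() for d in levels])
    return levels, means
```

If the mean SSE returned by `sse_by_center_distance` grows with the distance, the blocks exhibit the corner-to-center slope described in the text; a flatter profile corresponds to the behaviour reported for OBMC.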

4.3 Design Principle of Spatially-Variant Prediction Filter

In the last section, the proposed filter outperformed the well-known filter designs. Summarizing the experimental results of Chapter 3 and the last section, we suggest the following design principles for a spatially-variant prediction filter that improves prediction efficiency.

• Filters for different pixels should differ according to each pixel's distance to the block center.

• A filter support covering the pixels within 3 pixels of the position given by the block MV is sufficient.

• Training set selection is important for Wiener filter design.

• Context information can be introduced to reduce the amount of signaling overhead.
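The first three principles can be sketched as follows. This is only an illustration under stated assumptions: integer block MVs, a single reference frame, one filter per pixel position inside the block, and an ordinary least-squares fit standing in for the Wiener solution. The function name `train_position_filters`, the `mvs` dictionary layout, and the edge padding are hypothetical; the training procedure actually used for the proposed filter is not reproduced here.

```python
import numpy as np


def train_position_filters(cur, ref, mvs, block=16, radius=3):
    """Fit one least-squares (Wiener-style) filter per pixel position in a block.

    cur, ref : 2-D arrays of equal shape (current and reference frame).
    mvs      : dict mapping (block_row, block_col) -> integer block MV (dy, dx).
    radius   : the support covers reference pixels within `radius` pixels of
               the position pointed to by the block MV.
    Returns an array of shape (block, block, (2 * radius + 1) ** 2).
    """
    size = 2 * radius + 1
    h, w = cur.shape
    # Pad the reference so that every support window is fully defined.
    padded = np.pad(ref.astype(np.float64), radius, mode="edge")
    filters = np.zeros((block, block, size * size))
    for i in range(block):
        for j in range(block):
            supports, targets = [], []
            for (br, bc), (dy, dx) in mvs.items():
                y, x = br * block + i, bc * block + j
                ry, rx = y + dy, x + dx
                if not (0 <= ry < h and 0 <= rx < w):
                    continue  # skip samples whose MV points outside the frame
                # In padded coordinates this window is centered at (ry, rx).
                window = padded[ry:ry + size, rx:rx + size]
                supports.append(window.ravel())
                targets.append(float(cur[y, x]))
            if supports:
                a = np.asarray(supports)
                t = np.asarray(targets)
                # Minimize ||a @ f - t||^2 over the filter taps f.
                filters[i, j] = np.linalg.lstsq(a, t, rcond=None)[0]
    return filters
```

Because each position (i, j) is trained only on samples from that position, corner pixels and center pixels are free to receive different filters, which is the sense of "spatially variant" in the first principle. The set of blocks passed in through `mvs` is the training set, so its selection directly shapes the resulting taps.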


CHAPTER 5 Conclusions

In this work, we attempt to analyze the motion uncertainty problem resulting from block-based motion compensation. We first design a number of experiments to analyze the distribution of motion uncertainty and its characteristics. Based on the experimental results, some important observations were made to help develop a spatially-variant temporal prediction filter. Preliminary experiments show that such an approach can overcome the problem of motion uncertainty in temporal prediction more effectively than the existing filter designs. We summarize our observations on motion uncertainty again here:

1. The most likely TMV for a pixel is its block MV.

2. Motion uncertainty can occur at every pixel in an MB.

3. The distribution of motion uncertainty occurrences depends on the sequence.

4. Motion uncertainty occurs mostly within 1 or 2 pixels of the block MV.

5. Motion uncertainty is smaller for pixels near the block center than for distant pixels.

6. Motion uncertainty becomes stronger as the MB size increases.

7. The true motion distribution indicated by OBMC also fits the motion uncertainty distribution, although the magnitudes do not match exactly.


Our work is still in its early stage. We plan to extend our investigation in several directions: (1) the tradeoff between the performance of the spatially-variant prediction filter and its signaling overhead, (2) the implementation of the spatially-variant prediction filter in the H.264/AVC framework, and (3) the relationship between motion uncertainty and context information such as neighboring MVs.


Bibliography

[1] “Recommendation ITU-T H.263: Video Coding for Low Bit Rate Communication,” International Telecommunication Union, 1998.

[2] “Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC),” Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6), March 2003.

[3] P. Chen, Y. Ye, and M. Karczewicz, “Video Coding Using Extended Block Sizes,” ITU-T Q.6/SG16 VCEG, COM16-C123, January 2009.

[4] Y. Chen, T. Wang, Y. Tseng, W. Peng, and S. Lee, “A Parametric Window Design for OBMC with Variable Block Size Motion Estimates,” 2009.

[5] M. Orchard and G. Sullivan, “Overlapped Block Motion Compensation: An Estimation-Theoretic Approach,” IEEE Transactions on Image Processing, vol. 3, pp. 693-699, September 1994.

[6] Y. Vatis and J. Ostermann, “Adaptive Interpolation Filter for H.264/AVC,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, pp. 179-192, February 2009.


[7] T. Wedi and H. Musmann, “Motion- and Aliasing-Compensated Prediction for Hybrid Video Coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, pp. 577-586, July 2003.

[8] O. Werner, “Drift Analysis and Drift Reduction for Multiresolution Hybrid Video Coding,” Signal Processing: Image Communication, vol. 8, July 1996.

[9] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC Video Coding Standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, pp. 560-576, July 2003.

[10] Y. Ye and M. Karczewicz, “Enhanced Adaptive Interpolation Filter,” ITU-T SG16/Q.6 Doc. T05-SG16-C-0464, April 2008.


