Advanced Motion-Compensated Prediction (MCP) for High-Efficiency Video Coding

National Chiao Tung University

Institute of Computer Science and Engineering

Doctoral Dissertation

Student: Yi-Wen Chen

Advisors: Prof. Wen-Hsiao Peng

Prof. Suh-Yin Lee

Advanced Motion-Compensated Prediction (MCP) for High-Efficiency Video Coding

Student: Yi-Wen Chen

Advisors: Wen-Hsiao Peng

Suh-Yin Lee

A Dissertation Submitted to the Institute of Computer Science and Engineering, College of Computer Science, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Computer Science

October 2011

Hsinchu, Taiwan, Republic of China

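Acknowledgments

First and foremost, I am most grateful to my advisors, Prof. Wen-Hsiao Peng and Prof. Suh-Yin Lee. Throughout my doctoral studies, their patient guidance and instruction in research made this dissertation possible. What Prof. Lee taught me about attitude toward life, dealing with people, and conducting myself in all situations will benefit me for a lifetime. I thank Prof. Hsueh-Ming Hang (杭學鳴) for his valuable suggestions and encouragement during the proposal defense, the internal oral defense, and the external oral defense. I thank all the committee members, Prof. 蔡文錦, Prof. 陳宏銘, Prof. 張寶基, Prof. 陳美娟, and Prof. 郭天穎, for generously sharing their many years of research experience during the defense, which enriched the depth and breadth of this dissertation and made it more complete. Every committee member is a role model for me in academic research. Finally, I thank my family for the care and support that freed me from worry during my doctoral studies. Along the way, their warmth gave me the strength to keep moving forward. I am who I am today because of their hard work and support. I am grateful to my family and other relatives and friends for their blessings and encouragement. There are many people to thank; to everyone who has ever helped or cared about me, I offer my sincerest gratitude. This dissertation is dedicated to all of you.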

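Advanced Motion-Compensated Prediction (MCP) for High-Efficiency Video Coding

Student: Yi-Wen Chen

Advisors: Prof. Wen-Hsiao Peng

Prof. Suh-Yin Lee

Department of Computer Science, National Chiao Tung University

Chinese Abstract

Motion-compensated prediction (MCP) removes the temporal redundancy of video signals and is therefore a common technique in many video coding standards. Although MCP has been proposed and studied for more than 20 years, this dissertation reexamines it from the perspectives of theory, application, and implementation. First, we reinterpret the operation of MCP from a new viewpoint, treating it as two steps: the first is motion vector sampling, and the second is estimating the pixel predictor from the sampled motion vectors. We also present a theoretical analysis to support this viewpoint and use it to examine common existing MCP methods such as block motion compensation (BMC), SKIP prediction, and template matching prediction (TMP). Experimental results confirm that the proposed framework accurately analyzes these MCP methods.

Following this viewpoint, we propose parametric overlapped block motion compensation (POBMC) to improve the efficiency of MCP. Conventional overlapped block motion compensation (OBMC) addresses the motion uncertainty of BMC by taking into account the block motion estimation (BME) results of neighboring blocks when estimating intensity values. OBMC has been shown to provide better coding efficiency than BMC. However, since H.264/AVC adopts variable block-size motion compensation (VBSMC), combining OBMC with VBSMC makes the computation and storage of the OBMC weights a major challenge. Using theoretical models of the intensity and motion autocorrelation functions, together with the assumption that the motion vector produced by BME approximates the motion at the block center, we propose the POBMC technique. For each pixel, this technique assigns optimal weights based on the pixel's neighboring motion vectors and the distances from the pixel to the sampling points (block centers) of those motion vectors, so as to achieve the best MCP performance.

Finally, we use the proposed POBMC framework to combine template matching prediction with block motion compensation. Because the motion vector produced by template matching costs no bits to transmit, the proposed bi-prediction mode needs to send only one block motion vector to achieve the effect of bi-prediction with two motion vectors. Extending the proposed motion sampling framework, once the motion vector found by template matching is approximated as the motion at the template centroid, the combined prediction can reach its best efficiency by finding the optimal sampling location for the block motion vector. In addition, because template matching prediction suffers from high computational complexity, the proposed bi-prediction framework can flexibly replace the template matching motion vector with any motion vector derivable at the decoder, thereby reducing complexity. Experimental results show that the proposed bi-prediction mode effectively improves the performance of current video coding.

In this dissertation, we first view the structure of MCP as two parts: motion vector sampling and reconstruction of the predicted intensity field. From this viewpoint, we then propose POBMC to improve the efficiency of MCP. Building on the POBMC framework, we further develop a bi-prediction method that combines the template matching motion vector with the conventional block motion vector to increase prediction efficiency. We believe that the MCP analysis framework proposed in this dissertation will benefit future improvements of motion-compensated prediction techniques and further increase video coding efficiency.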


Advanced Motion-Compensated Prediction (MCP) for

High-Efficiency Video Coding

Student: Yi-Wen Chen

Advisor: Prof. Wen-Hsiao Peng

Prof. Suh-Yin Lee

Department of Computer Science,

National Chiao Tung University

Abstract

Motion-Compensated Prediction (MCP) has been the most popular approach, in the block-based hybrid video coding framework, for removing temporal redundancy. This dissertation attempts to reexamine its design from a theoretical perspective, with an aim to expose undisclosed details that crucially determine its performance and to seek further improvements.

Firstly, we introduce an analytical interpretation of MCP by viewing its process as consisting of motion sampling followed by the reconstruction of a temporal predictor. In this context, block-based motion estimation acts as a motion sampler taking samples at block centers, while block-based motion compensation (BMC) interpolates between motion samples using the nearest-neighbor rule to reconstruct the motion field. Such an interpretation clearly reveals the essence of various MCP schemes. We have shown that the distinction between BMC, SKIP prediction and template matching prediction (TMP) lies in the choice of motion sampling structure and, likewise, that the celebrated control grid interpolation (CGI) and overlapped block motion compensation (OBMC) outperform BMC because they use more sophisticated motion interpolation algorithms.

The second part of this dissertation addresses the problem of adapting OBMC to variable block-size motion partitioning, which was done heuristically in the H.263 standard. We cast this problem as forming a linear estimate of a pixel's intensity from motion samples taken on an irregular grid. To circumvent the difficulties arising from the least-squares solution, we express the optimal OBMC weights in closed form based on parametric signal assumptions. The computation of this parametric OBMC (POBMC) solution requires only the geometric relations between the prediction pixel and its nearby block centers, offering a generic framework capable of reconstructing a temporal predictor from any irregularly sampled motion vectors.

The last part of this dissertation proposes a novel bi-prediction scheme combining BMC and TMP, the design of which is another highlight of the motion sampling and reconstruction concept. This scheme attains bi-prediction performance with only one set of motion parameters. Specifically, we transform the problem of finding an optimized block motion vector based on the contribution from the template motion vector into that of searching for its optimal sampling location. The result is a particular type of geometry motion partitioning. This notion is further extended to enable a low-complexity, template-matching-free implementation.

The techniques above have been evaluated in several core experiments of the JCT-VC committee, showing very promising results. This demonstrates that, by looking deeper into the underlying principles, it is possible to make further improvements to existing designs or to bring completely new ideas. We believe the other components of the hybrid video coding framework can also be improved with the same philosophy.

List of Tables

2.1 Comparison of Mean-Square Prediction Error

3.1 Encoder Configurations

3.2 BD-rate Saving Comparison of Various OBMC Schemes

3.3 Comparison of In-Loop Prediction Filters

3.4 BD-rate Saving Comparison of Various In-Loop Filters

3.5 BD-rate Saving Comparison of Various Combinations of In-Loop Filters

3.6 Runtime Comparison of Various In-Loop Filters

3.7 BD-rate and Runtime Comparisons of POBMCH-II+MD and Its Simplified Version

List of Figures

1.1 The relationship between the proposed works and the signal models.

1.2 The operations of block motion compensation (BMC).

1.3 The operations of variable block size motion compensation (VBSMC).

1.4 The operations of template matching prediction (TMP).

2.1 Various MCP schemes in the 1-D case: (a) MCP based on the true motion field, (b) BMC, (c) CGI and (d) OBMC.

2.2 Mean-square prediction error surface of TMP using the sequence "Football".

2.3 Template Matching Prediction.

2.4 Mean-square prediction error surfaces of block B produced with BMC by (a) empirical results, (b) Tao's and (c) Zheng's model, respectively. The sequence is Football and the block size used for motion compensation is 16x16.

2.5 Mean-square prediction error surfaces of block B produced with TMP by (a) empirical results, (b) Tao's and (c) Zheng's model, respectively. The sequence is Football and the block size used for motion compensation is 16x16.

2.6 Mean-square prediction error surfaces of block B produced with SKIP by (a) empirical results, (b) Tao's and (c) Zheng's model, respectively. The sequence is Football and the block size used for motion compensation is 16x16.

3.1 The distribution of a block MV's location when the block size used for motion search is varied: (a) 16x16 and (b) 32x32. The MV location is approximated by the centroid position of the first ten pixels, in a block, having relatively smaller prediction error.

3.2 The effect of the clipping threshold value on the shape of the proposed parametric windows with (a)(c)(e) non-diagonal and (b)(d)(f) diagonal D matrices. From top to bottom, the clipping threshold values are 10, 15, 35, respectively.

3.3 Window functions along the slice of Y=16 based on a (a) non-diagonal or (b) diagonal D matrix.

3.4 Comparisons of window functions and their MSE surfaces using testing sequence "S04": (a) parametric windows versus optimal least-squares windows; the MSE surfaces of the proposed parametric solution with a (b) non-diagonal or (c) diagonal D matrix.

3.5 Comparisons of window functions and their MSE surfaces using testing sequence "S03": (a) parametric windows versus optimal least-squares windows; the MSE surfaces of the proposed parametric solution with a (b) non-diagonal or (c) diagonal D matrix.

3.6 An irregular motion sampling grid due to the use of variable block-size motion partitioning.

3.7 Window functions overlaid on the irregular motion sampling grid shown in Fig. 3.6: the proposed parametric windows with (a) a non-diagonal D matrix (Clip=17, δ=121), (b) a diagonal D matrix (Clip=19, δ=25).

3.8 Window functions overlaid on the irregular motion sampling grid shown in Fig. 3.6: the proposed parametric windows with (a) a diagonal D matrix plus a MB-adaptive adjustment of δ (Clip=19, δ=16 for 8x8 MVs and δ=36 otherwise), and (b) the H.263 windows.

3.9 Mode distribution comparison of POBMCH-II and POBMCH-II+MD using testing sequence "BQSquare".

4.1 Mean-square prediction error surface of TMP using the sequence "Football".

4.2 Joint application of TMP and BMC.

4.3 SMSE surface as a function of Sb's location, and the optimal window functions associated with vt and vb, respectively.

4.4 Window functions for typical template designs.

4.5 Adaptive motion merging and the approximation of vt.

4.6 Window functions for different merge candidates.

4.7 Partitioning of a 2Nx2N PU due to the application of the proposed window functions. In regions A1, A2, B and C, wm,n(i, j) = 0.75, 0.875, 0.5 and 0.125, respectively.

4.8 Simplified window functions set 1 for different merge candidates.


Chapter 1 Introduction

1.1 Overview of Dissertation

Advances in video production technology and growing consumer demand have created an ever-increasing need for video coding standards that support higher resolutions (4Kx2K and above) and, in particular, better video quality. After the success of the H.264/AVC video coding standard, the ITU-T Video Coding Experts Group (VCEG) targeted a new generation of video compression technology with substantially higher compression capability than H.264/AVC. To this end, ITU-T VCEG and MPEG joined forces again and formed the Joint Collaborative Team on Video Coding (JCT-VC) in January 2010. A joint Call for Proposals (CfP) for High Efficiency Video Coding (HEVC) was issued [2] to collect promising coding tools as a starting point for developing the next-generation video coding design.

Most modern video coding standards, such as MPEG-1, MPEG-2, MPEG-4, H.263, H.264 and HEVC, still rely on a hybrid architecture that combines block-based motion compensation with DCT-like transform coding. Motion-compensated prediction (MCP) is the key to the success of these standards, as it removes the temporal redundancy in video signals and significantly reduces the size of bit-streams. Although MCP has been studied for over twenty years, we believe a deeper understanding of the principles behind its design would bring a fundamental breakthrough in improving coding efficiency. In this dissertation, we therefore focus on improving MCP efficiency to provide better coding performance at a reasonable computational complexity overhead.

Figure 1.1: The relationship between the proposed works and the signal models.

As shown in Fig. 1.1, to gain more insight into MCP, we first view it as a two-stage process: it takes motion samples at block centers and then generates a predictor from the sampled motion vectors. From this point of view, and based on the presumed signal models for intensity and motion vectors, we then propose a parametric window design to tackle the problem of adapting overlapped block motion compensation (OBMC) windows for use with variable block-size motion compensation (VBSMC). Lastly, we demonstrate how template and block motion estimates can be jointly applied in a parametric overlapped block motion compensation (POBMC) framework to form an efficient bi-prediction scheme that further improves temporal prediction. The following summarizes our major contributions to developing MCP coding tools and some reduced-complexity approaches.

Figure 1.2: The operations of block motion compensation (BMC).

1.2 Motion-Compensated Prediction (MCP): An Analytical Perspective

An insightful perspective on MCP is to view its process as consisting of sparse motion sampling followed by the reconstruction of temporal predictors. In this context, as illustrated in Fig. 1.2, block-based motion estimation acts as a motion sampler taking samples at block centers, while BMC interpolates between motion samples, using the nearest-neighbor rule, to construct the motion field. This interpretation facilitates a better understanding of various MCP schemes from a unified framework. For example, if we take such a view, VBSMC is merely an enhancement of BMC in motion sampling, as shown in Fig. 1.3.

The models are then applied to the analysis of the prediction efficiency of various MCP schemes. To justify our theoretical analysis, we also show that template matching prediction (TMP), which estimates the motion of the current block from its surrounding pixels (cf. Fig. 1.4), consistently outperforms SKIP prediction, but hardly competes with block motion compensation (BMC) unless both the motion and intensity fields are less random or have high spatial correlation.

Figure 1.3: The operations of variable block size motion compensation (VBSMC).

To facilitate the analysis of various MCP schemes, we adopt signal models that assume the autocorrelation functions of the intensity and motion fields follow quadratic and exponential forms, respectively. Given these assumptions, we then examine the prediction error for BMC, CGI, OBMC, TMP and SKIP prediction. It is interesting that the mean-square error (MSE) of OBMC exhibits the same form as that of CGI, suggesting that OBMC and CGI have identical prediction efficiency and that they outperform the other MCP schemes. Nevertheless, OBMC is generally preferable to CGI. The reasons are twofold. First, the true motion for every pixel is not easily accessible, which makes it difficult to estimate the weighting coefficients for CGI. Second, OBMC not only alleviates motion uncertainty but also attenuates quantization noise in reference pictures. These arguments also explain why OBMC normally outperforms CGI in practice.

With the above observations, we focus on improving OBMC. However, window design for OBMC becomes difficult when variable block-size motion compensation (VBSMC) is incorporated. In an effort to adapt OBMC for use with VBSMC, we approach the problem using parametric solutions, as detailed in the next subsection.

1.3 Parametric OBMC

This work adapts overlapped block motion compensation (OBMC) to suit variable block-size motion partitioning. The motion vectors (MVs) for various partitions are formalized as motion samples taken on an irregular grid. From this viewpoint, determining the OBMC weights associated with these samples becomes an under-determined problem, since a distinct solution has to be sought for each prediction pixel.

We tackle this problem by expressing the optimal weights in closed form based on parametric signal assumptions. The computation of this solution requires only the geometric relationship between the prediction pixel and its nearby block centers, leading to a generic framework for reconstructing temporal predictors from any irregularly sampled MVs. A modified implementation is also proposed to address MV location uncertainty and to reduce computational complexity.

Extensive experiments have been conducted using the KTA software. Experimental results demonstrate that our scheme performs better than similar previous works and provides about 5% BD-rate savings compared to the H.264/AVC anchor. When compared to the recently proposed Quadtree-based Adaptive Loop Filter (QALF) [12] and Enhanced Adaptive Interpolation Filter (EAIF) [29], POBMC also shows a comparable gain. Furthermore, combining it with either filter gives a combined effect that is almost the sum of their separate improvements.

Along with other promising coding tools in KTA2.4, the proposed POBMC [8] was submitted, for subjective viewing tests, in response to the HEVC Call for Proposals issued jointly by MPEG and VCEG in April 2010 [2]. It was ranked 12th overall and 10th with low-delay configurations among 27 proposals, in terms of the average mean opinion score [3]. After the CfP competition at the 1st JCT-VC meeting, the TMuC (Test Model under Consideration) was constructed mainly from the best performer's codebase and the other top-performing HEVC proposals.


1.4 Bi-Prediction Combining TMP and BMC

An efficient bi-prediction scheme combining TMP with BMC using POBMC is proposed. Template matching prediction (TMP), which estimates the motion for a target block by using its surrounding pixels, has been observed to perform efficiently in inter-frame coding. In this work, we show how template and block motion estimates can be jointly applied in a parametric overlapped block motion compensation (POBMC) framework to further improve temporal prediction. When integrated in HM3.0, the reference software of HEVC, the combined technique is observed to achieve a Y BD-rate saving of 2%. The notion is further extended to allow the template MV to be replaced with one of the MVs of neighboring prediction units, enabling a low-complexity, template-matching-free implementation. Experiments show that this reduced-complexity approach can provide competitive coding gain with lower computational complexity and memory access bandwidth.

Currently, the JCT-VC committee has finished HEVC Working Draft 4 and the HEVC test model (HM) [23]. With its promising results and compatibility with existing tool features, the proposed bi-prediction scheme, which combines the implicitly inferred motion and block motion with POBMC, is being evaluated in several formal core experiments at JCT-VC meetings [10][16][7][6].

1.5 Organization and Contribution

For more details on each part of the proposed advanced MCP schemes for High-Efficiency Video Coding, the rest of this dissertation is organized as follows:

• Chapter 2 introduces a new viewpoint that treats MCP as motion sampling followed by the reconstruction of a prediction signal [26].

– We have analyzed, both theoretically and empirically, the prediction efficiency of BMC, CGI, OBMC, TMP and SKIP prediction.

– We have shown that although TMP hardly competes with BMC, it outperforms SKIP prediction, which explains why the bit rate can be significantly reduced when TMP is efficiently combined with SKIP prediction.

– We have shown that OBMC and CGI outperform other MCP schemes.

• Chapter 3 details the algorithm of Parametric Overlapped Block Motion Compensation (POBMC) [9].

– Our scheme requires only the geometric relations to compute the weight vector for OBMC.

– Compared to EAIF and QALF, our scheme shows a comparable gain. Furthermore, combining it with either filter gives a combined effect that is almost the sum of their separate improvements.

– Our scheme is a suboptimal yet computationally efficient implementation, which need not solve the Wiener-Hopf equation and thus requires no matrix inversion.

– By integrating POBMC into the H.264/AVC reference software KTA, our codec [8], submitted for subjective testing in response to the Call for Proposals for Video Compression Technology issued jointly by ITU-T VCEG and ISO/IEC MPEG, ranks 12th overall (and 10th with low-delay settings) among 27 proposals.

• Chapter 4 introduces a bi-prediction scheme that requires only a single set of motion overhead, as for uni-directional prediction.

– It combines motion vectors found by template and block matching with OBMC.

– The concept of adaptive motion merging is incorporated to enable a template-matching-free implementation.

– The proposed bi-prediction scheme is being evaluated using the HEVC reference software and provides the best coding efficiency in Core Experiment 1 of the 6th JCT-VC meeting.

• Lastly, Chapter 5 summarizes our work and outlines future research activities.

Chapter 2 Motion-Compensated Prediction: An Analytical Perspective

2.1 Introduction

Motion-compensated prediction (MCP) is a technique employed in the encoding and decoding of video data to remove temporal redundancy. In hybrid video coding schemes such as the MPEG and H.264/AVC standards, pictures are predicted from previous pictures, or bidirectionally from previous and future pictures, by a block-based motion compensation (BMC) scheme. BMC uses a single motion vector (MV) (two MVs for bi-prediction schemes) as an estimate of the true motion field for a block of pixels, trading the accuracy of motion representation for less overhead.

An insightful perspective on MCP is to view its process as consisting of sparse motion sampling followed by the reconstruction of temporal predictors. In this context, block-based motion estimation acts as a motion sampler taking samples at block centers while BMC interpolates, using the nearest-neighbor rule, between motion samples to construct the motion field. This interpretation facilitates a better understanding of various MCP schemes from a unified framework. For example, if we take such a view, VBSMC is merely an enhancement of BMC in motion sampling. The various MB partitionings are assimilated to different sampling structures, and choosing a specific block partitioning can be thought of as determining a local sampling pattern. By a similar reasoning, the difference between BMC and CGI is easily seen to be a different choice of motion interpolator. Somewhat less intuitive is OBMC, which does not directly reconstruct the motion field. Nevertheless, it was shown in [25] that an optimal OBMC window is also an optimal motion interpolation function, with which CGI can achieve the same mean-square prediction error as OBMC. This result furnishes another view of OBMC from the standpoint of motion interpolation. As an illustration, Fig. 2.1 contrasts graphically these techniques for the 1-D case.
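As a rough illustration of the three reconstruction rules contrasted above, the following Python sketch treats the 1-D case. It is a minimal sketch under stated assumptions, not the scheme evaluated in this dissertation: the function name `mcp_1d`, the triangular (linear) weights used for both CGI and OBMC, and the linear interpolation used to read the reference signal at fractional positions are hypothetical choices made only for the example.

```python
import numpy as np

def mcp_1d(ref, centers, mvs, n):
    """Predict n pixels from `ref` using motion samples `mvs` taken at `centers`.

    Returns the BMC, CGI and OBMC predictors and the per-pixel weights.
    All modelling choices here (linear weights, linear sub-pel fetch) are
    illustrative assumptions.
    """
    ref = np.asarray(ref, dtype=float)
    centers = np.asarray(centers, dtype=float)
    mvs = np.asarray(mvs, dtype=float)
    x = np.arange(n, dtype=float)
    grid = np.arange(ref.size, dtype=float)

    def fetch(v):
        # Read the reference signal at the displaced positions x + v.
        return np.interp(x + v, grid, ref)

    # BMC: nearest-neighbor interpolation of the motion samples.
    nearest = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
    pred_bmc = fetch(mvs[nearest])

    # CGI: interpolate the motion field itself, then compensate once.
    pred_cgi = fetch(np.interp(x, centers, mvs))

    # OBMC: compensate with every motion sample, then blend the predictions.
    weights = np.stack(
        [np.interp(x, centers, row) for row in np.eye(centers.size)], axis=1
    )
    pred_obmc = sum(weights[:, i] * fetch(mvs[i]) for i in range(centers.size))
    return pred_bmc, pred_cgi, pred_obmc, weights
```

With a constant motion field the three predictors coincide; they differ only where neighboring motion samples disagree, which is where the choice of interpolator matters.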

With these ideas in mind, in the following sections we first show that, when chosen to minimize the mean-square block matching error, the MV approximates the true motion of the block center, based on models of the motion and intensity fields of video signals. We then apply the statistical motion distribution model to the analysis of the prediction efficiency of various MCP schemes such as Template Matching Prediction (TMP), BMC and SKIP prediction. The analytical results are justified by empirical experiments with typical image sequences.

2.2 Motion and Intensity Models

In this section, we review two statistical models used to characterize the motion and intensity fields of video signals. These models will serve as the basis for analyzing the motion compensation error of motion-compensated prediction (MCP) in this dissertation.

Figure 2.1: Various MCP schemes in the 1-D case: (a) MCP based on the true motion field, (b) BMC, (c) CGI and (d) OBMC.

To analyze the distribution of motion-compensated residuals, Tao et al. [24] assume that the autocorrelation functions of the intensity and motion fields can be approximated with a quadratic function and an exponential function, respectively:

$$E[I_k(s_1) I_k(s_2)] = \sigma_I^2 \left(1 - \frac{\|s_1 - s_2\|_2^2}{K}\right), \qquad E[v_x(s_1) v_x(s_2)] = E[v_y(s_1) v_y(s_2)] = \sigma_m^2 \, \rho_m^{\|s_1 - s_2\|_1}, \tag{2.1}$$

where $I_k(s)$ represents the intensity value of pixel $s = (x(s), y(s))^T$ of frame $k$; $v(s) = (v_x(s), v_y(s))^T$ denotes its motion vector; and $\{\sigma_I^2, K\}$ and $\{\sigma_m^2, \rho_m\}$ are their respective variance and correlation coefficient. Likewise, in [30] Zheng et al. introduce a motion distribution model assuming that the difference between the motion at two pixels obeys the normal distribution:

$$v_x(s_1) - v_x(s_2) \ \text{or} \ v_y(s_1) - v_y(s_2) \sim N(0, \alpha \|s_1 - s_2\|_2^2), \tag{2.2}$$

where $\alpha$ is a constant indicating the degree of motion variation in the horizontal or vertical direction.

Given these models, both works show that the block-based motion estimate tends to be the motion of the block center $s_c$, with the mean-square value of the prediction error for pixel $s$, $d(s; v(s_c)) \equiv I_k(s) - I_{k-1}(s + v(s_c))$, given respectively by

$$E[d^2(s; v(s_c))] = \frac{8 \sigma_I^2 \sigma_m^2}{K} \left(1 - \rho_m^{\|s - s_c\|_1}\right) \tag{2.3}$$

and

$$E[d^2(s; v(s_c))] = \epsilon \, \|s - s_c\|_2^2, \tag{2.4}$$

where $\epsilon$ is a factor related to the randomness of the motion and intensity fields (the randomness increases with increasing $\epsilon$). According to these equations, the prediction error is larger for boundary pixels, which agrees with the general observation.
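To make the two error models concrete, the sketch below evaluates (2.3) and (2.4) on a square block. It is only an illustration: the helper names are hypothetical, the block center is assumed to sit at ((n-1)/2, (n-1)/2) in pixel-index coordinates, the randomness factor of (2.4) is passed as `eps`, and any parameter values supplied by the caller are placeholders rather than values fitted in this work.

```python
import numpy as np

def mse_tao(n, sigma_i2, sigma_m2, K, rho_m):
    """Per-pixel MSE of (2.3) for an n x n block whose MV is taken at the block center."""
    c = (n - 1) / 2.0                          # assumed block-center coordinate
    y, x = np.mgrid[0:n, 0:n]
    d1 = np.abs(x - c) + np.abs(y - c)         # L1 distance to the block center
    return 8.0 * sigma_i2 * sigma_m2 / K * (1.0 - rho_m ** d1)

def mse_zheng(n, eps):
    """Per-pixel MSE of (2.4) for an n x n block whose MV is taken at the block center."""
    c = (n - 1) / 2.0
    y, x = np.mgrid[0:n, 0:n]
    d2 = (x - c) ** 2 + (y - c) ** 2           # squared L2 distance to the block center
    return eps * d2
```

Both surfaces are smallest next to the block center and largest at the corners, which is the boundary-pixel behavior noted above.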

2.3 Analysis of Various MCP Schemes

Given these statistical models, we next examine the prediction error for various MCP schemes including BMC, CGI [21], OBMC [19], TMP [13] and SKIP modes. Assume at first that the sampling structure is a square lattice. Such is the case when an image is divided into equally spaced square blocks for motion estimation.

2.3.1 Error Variance Distribution of BMC, CGI and OBMC

The prediction errors of pixel $s$, $s \in B$, for BMC, CGI and OBMC can be expressed respectively as

$$d_{BMC}(s) = I_k(s) - I_{k-1}(s + v(s_0)),$$

$$d_{CGI}(s) = I_k(s) - I_{k-1}\left(s + \sum_{i=0}^{3} w_i^{(c)}(s)\, v(s_i)\right),$$

$$d_{OBMC}(s) = I_k(s) - \sum_{i=0}^{3} w_i^{(o)}(s)\, I_{k-1}(s + v(s_i)),$$

where $\{w_i^{(c)}(s)\}$ are chosen such that $\sum_i w_i^{(c)}(s) v(s_i)$ forms a vector LMMSE estimate of $v(s)$ subject to the unit gain constraint.¹ By a similar approach, the weighting coefficients $\{w_i^{(o)}(s)\}$ and $\{w_i^{(ig)}(s)\}$ are derived to linearly estimate $I_k(s)$ based on the data sets $\{I_{k-1}(s + v(s_i))\}$ and $\{I_{k-1}(s + v(t_i))\}$, respectively. Particularly, in computing $\{w_i^{(ig)}(s)\}$ the motion vectors at $t_i$, $i = 1, 2, 3$, are taken to be known, while during actual motion compensation they are interpolated from those of nearby block centers (with the results denoted by $\tilde{v}(t_i)$).

The mean-square prediction error (MSE) for the four MCP schemes can be evaluated by using (2.1), although the algebra is a bit tedious. We shall thus use CGI as an example to indicate the main idea without going into formal details. To start off, the vector LMMSE estimator for $v(s)$ is found by combining the scalar estimators for each of its components. As such, $w_i^{(c)}(s)$ is a matrix-valued function (of dimension 2x2). However, a great simplification can be made since (a) the horizontal and vertical motion fields are independent of each other and (b) they share an identical signal model as hinted in (2.1). The former makes the matrix diagonal while the latter further equalizes the diagonal elements. Together the two conditions reduce $w_i^{(c)}(s)$ to a scalar, with its value given by the $i$th element of

$$w^{(c)}(s) = R^{-1}\left(P - U\,\frac{U^T R^{-1} P - 1}{U^T R^{-1} U}\right), \tag{2.5}$$

where $U$ is the all-ones vector (so that $U^T w^{(c)}(s) = 1$ expresses the unit gain constraint), $R_{ij} = E[v_x(s_i) v_x(s_j)]$ and $P_j = E[v_x(s) v_x(s_j)]$ for $0 \le i, j \le 3$.

¹ We consider this Wiener filter rather than the bilinear filter [21] since our interest is in

To complete the evaluation of $E[d_{CGI}(s)^2]$, we still need to know $E[I_k^2(s)]$, $E[I_{k-1}^2(s + \sum_i w_i^{(c)}(s) v(s_i))]$, and $E[I_k(s) I_{k-1}(s + \sum_i w_i^{(c)}(s) v(s_i))]$. The first two terms, according to (2.1), are simply $\sigma_I^2$, while the last one can be computed by substituting (2.1) and (2.5) into (2.6):

$$E\left[I_{k-1}(s + v(s))\, I_{k-1}\left(s + \sum_{i=0}^{3} w_i^{(c)}(s)\, v(s_i)\right)\right] = \sigma_I^2\, E\left[1 - 2K^{-1}\left(\sum_{i=0}^{3} w_i^{(c)}(s)\,(v_x(s) - v_x(s_i))\right)^2\right], \tag{2.6}$$

where we have used the fact that $\sum_i w_i^{(c)}(s) = 1$. A straightforward computation then gives

$$E\left[d_{CGI}(s)^2\right] = \frac{f}{2} \sum_{0 \le i, j \le 3} w_i^{(c)}(s)\, w_j^{(c)}(s) \left(1 - \rho_m^{\|s - s_i\|_1} - \rho_m^{\|s - s_j\|_1} + \rho_m^{\|s_i - s_j\|_1}\right), \tag{2.7}$$

with the scaling factor $f = 8 \sigma_I^2 \sigma_m^2 K^{-1}$. Following similar derivations to those for CGI, we can calculate the MSE for the other schemes as

$$E\left[d_{BMC}(s)^2\right] = f \left(1 - \rho_m^{\|s - s_0\|_1}\right),$$

$$E\left[d_{OBMC}(s)^2\right] = E\left[d_{CGI}(s)^2\right]\Big|_{w^{(c)}(s) = w^{(o)}(s)},$$

where $I_\Delta(i) = I_k(s) - I_{k-1}(s + \tilde{v}(t_i))$ and $E[I_\Delta(i) I_\Delta(j)]$ can be expanded and evaluated term by term through a calculation similar to (2.6).

It is interesting that the MSE of OBMC exhibits the same form as that of CGI, with $w^{(o)}(s)$ substituting for $w^{(c)}(s)$. Somewhat surprisingly, $w^{(o)}(s)$ is found to be equal to $w^{(c)}(s)$, suggesting that OBMC and CGI have identical prediction efficiency and that the OBMC filter $w^{(o)}(s)$ is also a good motion interpolator. Nevertheless, OBMC is generally preferable to CGI. The reasons are twofold. First, the true motion for every pixel is not easily accessible, which makes it difficult to estimate $w^{(c)}(s)$ for CGI. Second, OBMC not only alleviates motion uncertainty but also attenuates quantization noise in reference pictures. These arguments also explain why OBMC normally outperforms CGI in practice.
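The unit-gain LMMSE weights of (2.5) and the resulting error of (2.7) are straightforward to evaluate numerically. The following is a minimal sketch, assuming the exponential motion model of (2.1); the function names, the block-center coordinates and the value of the correlation coefficient used by a caller are hypothetical. Because the motion variance cancels in (2.5), the correlations are normalized by it, and the scaling factor of (2.7) is exposed as the argument `f`.

```python
import numpy as np

def _l1_corr(a, b, rho_m):
    """rho_m raised to the L1 distance between every point in a and every point in b."""
    return rho_m ** np.abs(a[:, None, :] - b[None, :, :]).sum(axis=2)

def lmmse_weights(s, centers, rho_m):
    """Weights of (2.5): LMMSE estimate of v(s) from v(s_i) under a unit-gain constraint."""
    centers = np.asarray(centers, dtype=float)
    s = np.asarray(s, dtype=float)[None, :]
    R = _l1_corr(centers, centers, rho_m)      # R_ij, normalized by the motion variance
    P = _l1_corr(centers, s, rho_m)[:, 0]      # P_j, normalized by the motion variance
    U = np.ones(len(centers))                  # all-ones vector of the constraint
    Rinv_P = np.linalg.solve(R, P)
    Rinv_U = np.linalg.solve(R, U)
    lam = (U @ Rinv_P - 1.0) / (U @ Rinv_U)
    return Rinv_P - lam * Rinv_U

def mse_from_weights(w, s, centers, rho_m, f=1.0):
    """MSE of (2.7) for an arbitrary weight vector w that sums to one."""
    centers = np.asarray(centers, dtype=float)
    s = np.asarray(s, dtype=float)[None, :]
    w = np.asarray(w, dtype=float)
    a = _l1_corr(centers, s, rho_m)[:, 0]      # correlation between s and each s_i
    B = _l1_corr(centers, centers, rho_m)      # correlation between s_i and s_j
    terms = 1.0 - a[:, None] - a[None, :] + B
    return 0.5 * f * (w @ terms @ w)
```

Setting the weight vector to (1, 0, 0, 0) reproduces the BMC expression, and at a block center the solver returns exactly that vector, so the gain of CGI and OBMC over BMC comes entirely from pixels away from the motion sampling points.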

2.3.2 Error Variance Distribution of TMP

Template Matching Prediction (TMP) is a decoder-side motion derivation scheme. As shown in Fig. 2.3, TMP finds the predictor for a target block B by minimizing the prediction error over the pixels in its immediate inverse-L-shaped neighborhood, the template T. To gain some insight into TMP, Fig. 2.2 plots the mean-square prediction error surface obtained when a TMP MV is used for motion compensation of the target block. It is seen that this MV tends to minimize the prediction error in the upper left quarter, a result that is intuitively agreeable since it approximates the true motion associated with the template centroid. Although TMP has been observed to provide coding gain [13], there has been little theoretical basis that satisfactorily explains its behavior. In the following, we first analyze the prediction efficiency of TMP and then compare TMP, BMC and SKIP prediction. This section provides a theoretical analysis to expose the factors that determine the prediction efficiency of TMP. The analysis is based on the statistical models introduced in the previous section.

Figure 2.2: Mean-square prediction error surface of TMP using the sequence "Football".

We begin by examining the distribution of the TMP error variance. Doing so requires modeling the template motion estimate. Following the approach described in [30], we obtain

$$s_t = \arg\min_{t} \sum_{s \in T} E[d^2(s; v(t))] = \left(\frac{\sum_{s \in T} x(s)}{|T|},\ \frac{\sum_{s \in T} y(s)}{|T|}\right)^T. \tag{2.8}$$

Thus, the motion estimate found by minimizing the template matching error is likely to be the motion associated with the centroid of the template, a result that is intuitively agreeable and is a direct extension of that for (rectangular) block matching.
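For a concrete feel of (2.8), the sketch below computes the centroid of an inverse-L template. The geometry is an assumption made for illustration only: the template is taken to be a band of thickness `t` pixels above and to the left of an n x n block, including the top-left corner, with coordinates measured from the block's top-left pixel. The function name is hypothetical.

```python
import numpy as np

def template_centroid(n, t):
    """Centroid (2.8) of an inverse-L template of thickness t around an n x n block.

    The block occupies coordinates 0..n-1 in both directions; the assumed
    template occupies every pixel of the enlarged (n+t) x (n+t) square that
    lies above or to the left of the block.
    """
    ys, xs = np.mgrid[-t:n, -t:n]
    in_template = (xs < 0) | (ys < 0)
    return float(xs[in_template].mean()), float(ys[in_template].mean())
```

For a 16x16 block with a 4-pixel-thick template, the centroid under these assumptions falls inside the upper left quarter of the block rather than at the block center (7.5, 7.5).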

As shown in Fig. 2.3, the centroid of the template, $s_t$, is obviously not at the block center when the template straddles the top and the left of the target block B. Thus we can expect TMP to yield a higher prediction error than BMC for block B. A little computation using $s_t$ in place of $s_c$ in (2.3) and (2.4) further shows that the prediction error is largest in the lower right quarter. This result is well supported by the empirical data displayed in Fig. 2.4, Fig. 2.5 and Fig. 2.6, where the actual error surface and the ones predicted by the two models are compared. For clarity we have rotated the error surfaces counterclockwise by 135°. From the figures, we also observe that Zheng's model seems to perform better in estimating error variances.

Figure 2.3: Template Matching Prediction.

In summary, although TMP does not require extra motion information, its prediction efficiency is generally much worse than that of BMC in the mean-square error (MSE) sense. An exception is when both the intensity and motion fields are less random or have high spatial correlation, that is, when $\sigma_I^2$ and $\sigma_m^2$ are smaller or $\rho_m$ and $K$ are larger in Tao's model, and when $\epsilon$ is small in Zheng's model. It is then natural to question how TMP can achieve a bit-rate saving of 10%. The answer becomes clear when its performance is compared with that of SKIP prediction.

2.3.3 Error Variance Distribution of SKIP Prediction

Figure 2.4: Mean-square prediction error surfaces of block B produced with BMC by (a) empirical results, (b) Tao’s model and (c) Zheng’s model, respectively. The sequence is Football and the block size used for motion compensation is 16x16.

Figure 2.5: Mean-square prediction error surfaces of block B produced with TMP by (a) empirical results, (b) Tao’s model and (c) Zheng’s model, respectively. The sequence is Football and the block size used for motion compensation is 16x16.

Figure 2.6: Mean-square prediction error surfaces of block B produced with SKIP by (a) empirical results, (b) Tao’s model and (c) Zheng’s model, respectively. The sequence is Football and the block size used for motion compensation is 16x16.

We shall now derive formulae that enable us to estimate the error variance for SKIP prediction. Recall that if a block is coded in SKIP mode, its motion vector is determined by the median of those in its neighborhood. Using the example shown in Fig. 2.3, the inferred vector $\hat{\mathbf{v}}$ for block B is

$$\hat{v}_x = \mathrm{Median}\{v_x(\mathbf{s}_1), v_x(\mathbf{s}_2), v_x(\mathbf{s}_3)\}, \qquad \hat{v}_y = \mathrm{Median}\{v_y(\mathbf{s}_1), v_y(\mathbf{s}_2), v_y(\mathbf{s}_3)\}, \tag{2.9}$$

where $(v_x(\mathbf{s}_i), v_y(\mathbf{s}_i))^T$, $i = 1, 2, 3$, are the motion vectors associated with blocks $B_i$ and are approximated by the motion of their centers. The corresponding mean-square prediction error for pixel $\mathbf{s}$, $\mathbf{s} \in B$, then becomes

$$E\left[d^2(\mathbf{s};\hat{\mathbf{v}})\right] = E\left[\left(I_k(\mathbf{s}) - I_{k-1}(\mathbf{s}+\hat{\mathbf{v}})\right)^2\right] = E\left[\left(I_{k-1}(\mathbf{s}+\mathbf{v}(\mathbf{s})) - I_{k-1}(\mathbf{s}+\hat{\mathbf{v}})\right)^2\right]. \tag{2.10}$$

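Computing the expectation in (2.10), which involves order statistics, is in general a difficult task. To circumvent the difficulties, we take a simpler approach by assuming that $\hat{\mathbf{v}}(i,j) \equiv (\hat{v}_x, \hat{v}_y) = (v_x(\mathbf{s}_i), v_y(\mathbf{s}_j))$, $i, j = 1, 2, 3$, with each ordered pair being equally likely. Hence, we can replace (2.10) with

$$E\left[d^2(\mathbf{s};\hat{\mathbf{v}})\right] = \frac{1}{9}\sum_{i=1}^{3}\sum_{j=1}^{3} E\left[\left(I_{k-1}(\mathbf{s}+\mathbf{v}(\mathbf{s})) - I_{k-1}(\mathbf{s}+\hat{\mathbf{v}}(i,j))\right)^2\right], \tag{2.11}$$

which can readily be evaluated by incorporating Tao’s model. A straightforward calculation then gives

$$E\left[d^2(\mathbf{s};\hat{\mathbf{v}})\right] = \frac{8\sigma_I^2\sigma_m^2}{3K}\sum_{i=1}^{3}\left(1 - \rho_m^{\|\mathbf{s}-\mathbf{s}_i\|_1}\right). \tag{2.12}$$

Similarly, repeating the procedure in [30], we obtain the result for Zheng’s model as

$$E\left[d^2(\mathbf{s};\hat{\mathbf{v}})\right] \approx \frac{\epsilon}{3}\sum_{i=1}^{3}\|\mathbf{s}-\mathbf{s}_i\|_2^2, \tag{2.13}$$

where the approximation is due to the use of Taylor’s expansion in computing the prediction error $I_{k-1}(\mathbf{s}+\mathbf{v}(\mathbf{s})) - I_{k-1}(\mathbf{s}+\hat{\mathbf{v}}(i,j))$.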
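The SKIP predictor and its model-predicted error can be sketched in a few lines. The snippet below takes the component-wise median of three neighboring MVs and evaluates a sum of squared distances to the neighboring block centers scaled by one third, which is the form of the Zheng-model result above. The neighbor positions (left, top and top-right of a 16x16 block), the scale constant `eps` and all function names are hypothetical choices for illustration only.

```python
from statistics import median


def skip_mv(neighbor_mvs):
    """Component-wise median of the neighboring block MVs."""
    vx = median(mv[0] for mv in neighbor_mvs)
    vy = median(mv[1] for mv in neighbor_mvs)
    return vx, vy


def skip_error_zheng(s, centers, eps=1.0):
    """Model-predicted SKIP error at pixel s: (eps / 3) * sum of squared
    Euclidean distances from s to the neighboring block centers."""
    total = sum((s[0] - cx) ** 2 + (s[1] - cy) ** 2 for cx, cy in centers)
    return eps * total / 3.0


# Assumed neighbor centers for a 16x16 block whose pixels span 0..15:
# left, top and top-right neighbors (hypothetical layout).
CENTERS = [(-8.5, 7.5), (7.5, -8.5), (23.5, -8.5)]

if __name__ == "__main__":
    print(skip_mv([(1, 5), (3, 2), (2, 9)]))
    print(skip_error_zheng((15, 0), CENTERS))   # upper right pixel
    print(skip_error_zheng((0, 15), CENTERS))   # lower left pixel
```

With this assumed layout the predicted error is lower at the upper right corner of the block than at the lower left corner, because two of the three neighbors sit above the block.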


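Table 2.1: Comparison of Mean-Square Prediction Error. Each cell lists Emp. / T. / Z. (empirical value / Tao’s model / Zheng’s model).

| Scheme | Football QP22 | Foreman QP22 | Football QP38 | Foreman QP38 |
|---|---|---|---|---|
| BMC(8) | 112 / 109 / 113 | 19 / 17 / 19 | 141 / 134 / 141 | 43 / 40 / 43 |
| TMP L2(8) | 372 / 302 / 342 | 41 / 29 / 31 | 398 / 307 / 360 | 70 / 48 / 64 |
| TMP L4(8) | 382 / 346 / 369 | 39 / 33 / 34 | 405 / 351 / 385 | 70 / 55 / 66 |
| BMC(16) | 238 / 232 / 238 | 28 / 27 / 28 | 256 / 246 / 256 | 59 / 55 / 59 |
| TMP L2(16) | 590 / 530 / 609 | 54 / 48 / 34 | 600 / 516 / 597 | 85 / 67 / 66 |
| TMP L4(16) | 588 / 555 / 620 | 55 / 50 / 37 | 596 / 539 / 607 | 86 / 70 / 69 |
| SKIP(16) | 913 / 916 / 887 | 129 / 136 / 140 | 913 / 914 / 885 | 329 / 340 / 339 |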

It is interesting to note that both (2.12) and (2.13) are merely a weighted sum of the mean-square prediction errors, i.e., $\sum_{i=1}^{3} E[d^2(\mathbf{s};\mathbf{v}(\mathbf{s}_i))]/3$, obtained when $\mathbf{v}(\mathbf{s}_i)$, $i = 1, 2, 3$, are separately utilized for motion compensation of pixel $\mathbf{s}$. In fact, this is a direct consequence of our assumption made about $\hat{\mathbf{v}}$. Its validity is justified by the empirical data given in Fig. 2.4, 2.5 and 2.6, where it is seen that the error surfaces predicted by (2.12) and (2.13) closely resemble the actual one. Also, as expected, with the help of $\mathbf{v}(\mathbf{s}_2)$, SKIP prediction tends to minimize the error at the upper part of the block, especially at the upper right quarter.

2.3.4 Comparison of BMC, TMP and SKIP Prediction

Table 2.1 compares the MSE of residual signals for different schemes. Both the empirical values and those predicted by the models are listed. For the experiments, we use the CIF Football and Foreman sequences, each 50 frames long. The search range for block or template matching is ±32 pixels, with quarter-pel accuracy. To simulate quantization effects, the reference frame and the template region (of size 2 or 4) are coded by H.264/AVC. In addition, the model parameters $\sigma_I^2\sigma_m^2/K$, $\rho_m$ and $\epsilon$ are estimated by a least-squares fit to empirical data.

From the table, several observations can be made: (a) the models are consistent with experimental results (at least qualitatively); (b) with explicit motion information, BMC yields the minimum MSE among all the schemes; (c) TMP consistently outperforms SKIP prediction regardless of the template or target block size; and (d) the MSE of TMP increases as the template or target block size is increased. The third explains why the bit rate can be significantly reduced when TMP is applied to SKIP macroblocks as an alternative prediction source [13]. The last is due to the fact that the template centroid deviates more from the center of the target block. Note that these results hold in an average sense, meaning that a hybrid of TMP and BMC may outperform either one alone, as reported in [13].
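Observations (b) and (c) can be checked mechanically against the empirical columns of Table 2.1. The short sketch below transcribes those empirical values and tests whether one scheme has a lower MSE than another in all four test conditions; the dictionary layout and the helper name `dominates` are illustrative choices, not part of the original experiments.

```python
# Empirical MSE values transcribed from Table 2.1, ordered as
# (Football QP22, Foreman QP22, Football QP38, Foreman QP38).
EMPIRICAL_MSE = {
    "BMC(8)": (112, 19, 141, 43),
    "TMP L2(8)": (372, 41, 398, 70),
    "TMP L4(8)": (382, 39, 405, 70),
    "BMC(16)": (238, 28, 256, 59),
    "TMP L2(16)": (590, 54, 600, 85),
    "TMP L4(16)": (588, 55, 596, 86),
    "SKIP(16)": (913, 129, 913, 329),
}


def dominates(better, worse):
    """True if `better` has a strictly lower MSE than `worse` in every
    test condition."""
    pairs = zip(EMPIRICAL_MSE[better], EMPIRICAL_MSE[worse])
    return all(b < w for b, w in pairs)


if __name__ == "__main__":
    # Observation (b): BMC beats TMP at the same block size.
    print(dominates("BMC(8)", "TMP L2(8)"), dominates("BMC(16)", "TMP L2(16)"))
    # Observation (c): TMP beats SKIP for 16x16 blocks.
    print(dominates("TMP L2(16)", "SKIP(16)"), dominates("TMP L4(16)", "SKIP(16)"))
```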

2.4 Summary

We have re-examined the prediction efficiency of motion-compensated prediction (MCP) and interpreted it as a motion sampler followed by the reconstruction of the prediction signal. We have also shown that, in a statistical sense, block-matching-based motion estimation results in motion vectors that are most likely to be the motion vectors sampled at block centers. With the help of motion and intensity models, BMC, CGI, OBMC, TMP and SKIP prediction have been compared both theoretically and empirically. Although TMP hardly competes with BMC, TMP is shown to outperform SKIP prediction, which explains why the bit rate can be significantly reduced when TMP is efficiently combined with SKIP prediction. Based on this theoretical framework, we next apply some of these results to design a parametric solution for OBMC that suits irregular motion sampling structures.


Chapter 3 Parametric Overlapped Block Motion Compensation

3.1 Introduction

As discussed in Chapter 2, various algorithms have been proposed to improve BMC. The most straightforward technique is variable block-size motion compensation (VBSMC), which increases motion sampling density in areas with complex motion to compensate for the inefficiency of BMC. By contrast, Control-Grid Interpolation (CGI) [21] and Overlapped Block Motion Compensation (OBMC) [19] use more sophisticated algorithms to reconstruct the motion field without additional samples. The former improves motion interpolation by employing a triangular filter function, while the latter directly gives a linear estimate of each pixel’s intensity based on predictors derived from the current and nearby block MVs. Both are able to alleviate blocking artifacts effectively, but in practice, OBMC is preferred to CGI since the averaging of predictors also helps to reduce quantization noise [25]. To reduce and equalize prediction error within blocks, two approaches have been proposed: overlapped block motion compensation (OBMC) [3] and variable block-size motion compensation (VBSMC) [4][5]. OBMC improves the motion compensation accuracy for every pixel by considering nearby motion estimates as different plausible hypotheses for its true motion. VBSMC, on the other hand, extends BMC naturally to allow the use of subblocks of varying size in motion compensation. While OBMC requires no extra side information, VBSMC must additionally signal the choice of block size and motion vector. Each method has its merits and faults, and this dissertation seeks to form an optimized hybrid of the two techniques.

Motivated by the preceding observations, we are led to seek an optimized hybrid of VBSMC and OBMC, aiming to trade better prediction for fewer MVs while retaining the flexibility to adapt motion sampling structure according to variations in image statistics. However, determining OBMC weights to associate with MVs on an irregular grid poses a challenging problem. This is because the variable block-size partitioning yields spatially varying geometric relations between a prediction pixel and its nearby block centers. In this case, solving for the weights with the least-squares method would become an under-determined problem since a distinct solution has to be sought for each possible context. Clearly, there may be more parameters to be estimated than there are data points.

This problem is not new. A similar situation occurred in the development of H.263 [1]. At that time, it was resolved by treating larger blocks as a collection of smaller blocks, each carrying the same MV as the larger aggregate block, and by applying a fixed window function to all MVs. In an attempt to extend the notion to H.264/AVC, Wang et al. [27] additionally proposed to weight more heavily those MVs from smaller aggregate blocks, which they believed can more reliably represent the motion of neighboring blocks, although no justification was given. Both methods suffer from the same problem that inner pixels in larger blocks are not properly compensated. Essentially, the MVs utilized for OBMC of those pixels are replicated from the same (aggregate) block MVs, producing a net effect like BMC. A third method that has recently been proposed is irregular-grid OBMC [11], which circumvents this deficiency by an adaptive window support that scales with local motion sampling density. It, however, remains unclear how to choose a proper scaling factor for each MV.

This dissertation departs from heuristic methods to approach the problem from a theoretical perspective. We formalize the notion of motion-compensated prediction (MCP) as a two-stage process consisting of sparse motion sampling followed by the reconstruction of temporal predictors. Within such a framework, OBMC in its generalized form is seen to find an LMMSE estimate for every pixel’s intensity based on motion-compensated signals derived from MVs sampled at nearby block centers. This viewpoint allows us to derive a parametric solution, termed POBMC, for determining the optimal weights in closed form. In doing so, the signal models in [30] are adopted to describe the probabilistic structures of the underlying intensity and motion fields. One important result of our POBMC is that its parameters include only the $\ell_2$ distances between the locations of the prediction pixel and the MVs involved, i.e., their geometric relations are all that are needed to determine the weights. This leads to a generic method of reconstructing temporal predictors from any sparsely and irregularly sampled motion data.

Although our approach has some parallels with the other parametric solution [24], the unique features that distinguish this work from it include

1. Our focus is to adapt OBMC to suit variable block-size motion partitioning, while [24] concentrates on adjusting OBMC windows, based on the use of fixed block-size partitioning, in response to variations in sequence statistics;

2. We adopt an alternative signal model [30], which not only better represents the reality but also gives a result that is considerably more intuitive and tractable;

3. We address the uncertainty associated with a block MV’s location by introducing a compensation term to reflect its dispersion around the block center;

4. We propose a suboptimal yet computationally efficient implementation, which need not solve the Wiener-Hopf equation and thus eliminates the need to compute a matrix inverse.

In addition, we implement the proposed scheme with KTA 2.4r1 [20] and provide a performance comparison with the recently proposed Enhanced Adaptive Interpolation Filter (EAIF) [29] and Quadtree-based Adaptive Loop Filter (QALF) [12] together with an analysis on how they interact with each other.

In the common test conditions, our POBMC delivers better rate-distortion (R-D) performance than both the H.263 OBMC [1] and the parametric solution [24]. Relative to an H.264/AVC anchor with extended macroblock (MB) size, it achieves 3.1% (0.7-13.6%) BD-rate reductions, compared to 4.6% (0.5-10.1%) and 7.2% (1.3-18.0%) with the single use of EAIF and of QALF, respectively. Although POBMC has the least gain among these filters, it can be combined efficiently with either of the other two filters. The result is an improvement that is almost the sum of their separate effects. In particular, the combination of POBMC and QALF performs very close to or better than that of EAIF and QALF, even in cases where the single use of EAIF outperforms that of POBMC.

The rest of this dissertation is organized as follows: Section II revisits the notion of motion-compensated prediction from a perspective based on motion sampling and reconstruction. Section III presents in detail the derivation of our parametric solutions. Section IV examines their properties by contrasting theoretical predictions with empirical data. Section V evaluates the compression performance of POBMC from various aspects and provides a runtime analysis. Section VI concludes this dissertation with a summary of our observations and a list of future works. Finally, the implementation details of POBMC are elaborated in the Appendix.

3.2 Parametric Overlapped Block Motion Compensation (POBMC)

3.2.1 Review of OBMC

This section briefly reviews the basics of OBMC, to aid the understanding of our POBMC. In words, OBMC finds an LMMSE estimate of a pixel’s intensity value $I_k(\mathbf{s})$ based on motion-compensated signals $\{I_{k-1}(\mathbf{s}+\mathbf{v}(\mathbf{s}_i))\}_{i=1}^{L}$ derived from its nearby block MVs $\{\mathbf{v}(\mathbf{s}_i)\}_{i=1}^{L}$. From an estimation-theoretic perspective, these MVs are plausible hypotheses for its true motion, and to maximize coding efficiency, their weights $\mathbf{w} = [w_1, w_2, ..., w_L]^T$ are chosen to minimize the mean-square prediction error subject to the unit-gain constraint [19]:

$$\mathbf{w}^* = \arg\min_{\mathbf{w}} \xi(\mathbf{w}) \quad \text{s.t.} \quad \sum_{i=1}^{L} w_i = 1, \tag{3.1}$$

where

$$\xi(\mathbf{w}) = E\left\{\left(I_k(\mathbf{s}) - \sum_{i=1}^{L} w_i I_{k-1}(\mathbf{s}+\mathbf{v}(\mathbf{s}_i))\right)^2\right\}.$$

Applying the Lagrangian method to (3.1) then gives

$$\mathbf{w}^* = \mathbf{R}^{-1}\left(\mathbf{P} - \frac{\mathbf{U}^T\mathbf{R}^{-1}\mathbf{P} - 1}{\mathbf{U}^T\mathbf{R}^{-1}\mathbf{U}}\,\mathbf{U}\right), \tag{3.2}$$

where $[\mathbf{R}]_{ij} = E[I_{k-1}(\mathbf{s}+\mathbf{v}(\mathbf{s}_i))I_{k-1}(\mathbf{s}+\mathbf{v}(\mathbf{s}_j))]$, $[\mathbf{P}]_j = E[I_k(\mathbf{s})I_{k-1}(\mathbf{s}+\mathbf{v}(\mathbf{s}_j))]$, and $\mathbf{U}$ is a vector with all elements equal to one [19]. Given that the underlying intensity and motion fields are stationary and that motion samples are taken on a square lattice (such is the case when an image is divided into a group of square blocks for motion search), the optimal weights $\mathbf{w}^*$ for pixel $\mathbf{s}$ depend solely on its relative position within a block. They are often obtained using the least-squares method due to lack of knowledge of the probabilistic models of real data.
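As a numerical illustration of the unit-gain constrained solution, the sketch below solves the Lagrangian conditions for a small, made-up correlation matrix R and cross-correlation vector P. The values of R and P, the helper `solve` (a plain Gaussian elimination) and the function name `obmc_weights` are hypothetical; in practice R and P would be estimated from data.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with
    partial pivoting (a is a list of rows)."""
    n = len(b)
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def obmc_weights(R, P):
    """Unit-gain LMMSE weights: w = R^-1 (P - lam * U), where U is the
    all-ones vector and lam enforces sum(w) == 1."""
    ones = [1.0] * len(P)
    r_inv_p = solve(R, P)      # R^-1 P
    r_inv_u = solve(R, ones)   # R^-1 U
    lam = (sum(r_inv_p) - 1.0) / sum(r_inv_u)
    return [p - lam * u for p, u in zip(r_inv_p, r_inv_u)]


if __name__ == "__main__":
    R = [[2.0, 1.0], [1.0, 2.0]]   # made-up auto-correlation values
    P = [1.5, 1.2]                 # made-up cross-correlation values
    print(obmc_weights(R, P))
```

The multiplier `lam` is what distinguishes this from the unconstrained Wiener solution: it shifts the weights along the direction R^-1 U until they sum to one.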

The concept of OBMC can be generalized to the case where the motion sampling structure is irregular. The challenge, however, becomes how to compute for each pixel its optimal weights to associate with nearby MVs, given that both auto- and cross-correlation functions are spatially varying. The least-squares solution, although feasible in theory, is impractical because storing weighting coefficients optimized for different contexts demands a huge amount of memory. To tackle this problem, we resort to a parametric solution.

3.2.2 Signal Models

POBMC aims to give a closed-form formula for the optimal weights. To do so, it usually needs to assume signal models for the intensity and motion fields. The choice of the models often involves a trade-off between accuracy, simplicity and tractability, and can sometimes be quite subtle. For instance, Tao et al. [24] model the auto-correlation functions of the intensity and motion fields using quadratic and exponential functions, respectively. These models are so chosen that R and P can be expressed in closed form. In general, different models have their merits and faults, and what model best represents reality is normally justified by empirical simulations.

In this dissertation we aim to give a direct estimate of the optimal weights $\mathbf{w}^*$. This is accomplished by adopting the motion model proposed in [30], which assumes that the difference between the true motion of any two pixels, e.g., $\mathbf{s}_1$ and $\mathbf{s}_2$, has a normal distribution of the form

$$v_x(\mathbf{s}_1) - v_x(\mathbf{s}_2) \ \text{or} \ v_y(\mathbf{s}_1) - v_y(\mathbf{s}_2) \sim \mathcal{N}(0, \alpha r^2(\mathbf{s}_1, \mathbf{s}_2)), \tag{3.3}$$

where $\alpha$ is a positive number indicating the degree of motion randomness in the horizontal or vertical direction, and $r(\mathbf{s}_1, \mathbf{s}_2)$ is the $\ell_2$ distance (measured in the unit of pixel) between $\mathbf{s}_1$ and $\mathbf{s}_2$. Caution, however, must be exercised when using (3.3) because it is an incomplete specification. The variance $\alpha r^2(\mathbf{s}_1, \mathbf{s}_2)$ must be bounded from above for the model to be proper. To see this, let us assume the motion field is stationary and symmetric. It then follows from (3.3) that

$$E\{v_x(\mathbf{s}_1)v_x(\mathbf{s}_2)\} = E\{v_y(\mathbf{s}_1)v_y(\mathbf{s}_2)\} = \sigma_m^2 + \mu_m^2 - \frac{\alpha r^2(\mathbf{s}_1, \mathbf{s}_2)}{2}, \tag{3.4}$$

where $\mu_m$ and $\sigma_m^2$ are the mean and the variance of the motion field, respectively. Using the Cauchy-Schwarz inequality, we have $4\sigma_m^2 \geq \alpha r^2(\mathbf{s}_1, \mathbf{s}_2) \geq 0$. The lower bound is obvious, but the upper bound deserves more attention. According to (3.4), it implies that the MVs of two far-away pixels are negatively correlated. A tighter bound that agrees more with the general observation is $2\sigma_m^2$, which makes them uncorrelated. We can equivalently define a clipper function for $r(\mathbf{s}_1, \mathbf{s}_2)$ to have the property $\hat{r}(\mathbf{s}_1, \mathbf{s}_2) = \mathrm{Clip}(0, r(\mathbf{s}_1, \mathbf{s}_2), \tau)$, where the clipping threshold $\tau = \sqrt{2\sigma_m^2/\alpha}$. Hereafter we shall omit the tedious repetition of this constraint by using $\hat{r}(\mathbf{s}_1, \mathbf{s}_2)$ in place of $r(\mathbf{s}_1, \mathbf{s}_2)$.
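The role of the clipping threshold can be illustrated numerically. The sketch below computes the threshold from the motion variance and the randomness parameter, clips the Euclidean distance accordingly, and evaluates the covariance of one MV component at two pixels (the second moment of the motion model minus the squared mean). Parameter values and function names are assumptions for illustration.

```python
import math


def clip_threshold(sigma_m2, alpha):
    """Clipping threshold tau = sqrt(2 * sigma_m^2 / alpha)."""
    return math.sqrt(2.0 * sigma_m2 / alpha)


def clipped_distance(s1, s2, tau):
    """Euclidean distance between two pixels, clipped to [0, tau]."""
    r = math.hypot(s1[0] - s2[0], s1[1] - s2[1])
    return min(max(r, 0.0), tau)


def motion_covariance(s1, s2, sigma_m2, alpha):
    """Covariance of one MV component at s1 and s2 under the clipped
    model: sigma_m^2 - alpha * r_hat^2 / 2."""
    tau = clip_threshold(sigma_m2, alpha)
    r_hat = clipped_distance(s1, s2, tau)
    return sigma_m2 - alpha * r_hat ** 2 / 2.0


if __name__ == "__main__":
    sigma_m2, alpha = 8.0, 0.25          # hypothetical model parameters
    print(clip_threshold(sigma_m2, alpha))
    for other in [(0, 0), (4, 0), (100, 0)]:
        print(other, motion_covariance((0, 0), other, sigma_m2, alpha))
```

With clipping in place the covariance decays from the motion variance at zero distance to exactly zero at and beyond the threshold, so far-away MVs are uncorrelated rather than negatively correlated.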

3.2.3 Optimal Weights in Parametric Form

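With the signal model in (3.3), we next proceed to determine the optimal weights $\mathbf{w}^*$ using calculus. To begin with, we rewrite, by noting that $\sum_{i=1}^{L} w_i = 1$, the mean-square prediction error $\xi(\mathbf{w})$ in Eq. (3.1) as

$$\xi(\mathbf{w}) = E\left\{\left(\sum_{i=1}^{L} w_i\, d(\mathbf{s}; \mathbf{v}(\mathbf{s}_i))\right)^2\right\}, \tag{3.5}$$

where $d(\mathbf{s}; \mathbf{v}(\mathbf{s}_i)) = I_k(\mathbf{s}) - I_{k-1}(\mathbf{s}+\mathbf{v}(\mathbf{s}_i))$ denotes the residual signal when $I_k(\mathbf{s})$ is predicted from the motion-compensated signal $I_{k-1}(\mathbf{s}+\mathbf{v}(\mathbf{s}_i))$ using the MV $\mathbf{v}(\mathbf{s}_i)$ for block $i$. (3.5) can be written more compactly in matrix notation as

$$\xi(\mathbf{w}) = \mathbf{w}^T E\{\mathbf{d}\mathbf{d}^T\}\mathbf{w} = \mathbf{w}^T\mathbf{D}\mathbf{w}, \tag{3.6}$$

where $\mathbf{d} = [d(\mathbf{s}; \mathbf{v}(\mathbf{s}_1)), d(\mathbf{s}; \mathbf{v}(\mathbf{s}_2)), ..., d(\mathbf{s}; \mathbf{v}(\mathbf{s}_L))]^T$.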

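To continue, we borrow a result in [30], which shows that if (3.3) is valid, then $E\{d^2(\mathbf{s}; \mathbf{v}(\mathbf{s}_i))\}$ has a closed-form formula given by

$$E\{d^2(\mathbf{s}; \mathbf{v}(\mathbf{s}_i))\} = E\{(I_{k-1}(\mathbf{s}+\mathbf{v}(\mathbf{s})) - I_{k-1}(\mathbf{s}+\mathbf{v}(\mathbf{s}_i)))^2\} = \epsilon\,\hat{r}^2(\mathbf{s}, \mathbf{s}_i), \tag{3.7}$$

where $\epsilon$ is a constant indicating the joint randomness of the motion and intensity fields; $I_k(\mathbf{s}) = I_{k-1}(\mathbf{s}+\mathbf{v}(\mathbf{s}))$ with $\mathbf{v}(\mathbf{s})$ denoting the true motion of pixel $\mathbf{s}$; and the block MV $\mathbf{v}(\mathbf{s}_i)$ is approximated as the motion associated with the block center $\mathbf{s}_i$. What remain to be determined in $\mathbf{D}$ are the off-diagonal entries, i.e., $E\{d(\mathbf{s}; \mathbf{v}(\mathbf{s}_i))\,d(\mathbf{s}; \mathbf{v}(\mathbf{s}_j))\}$, $i \neq j$; in fact, their derivations are merely an application of (3.7). With a little bit of algebra², we obtain

$$\begin{aligned}
E\{d(\mathbf{s}; \mathbf{v}(\mathbf{s}_i))\,d(\mathbf{s}; \mathbf{v}(\mathbf{s}_j))\}
&= E\{(I_k(\mathbf{s}) - I_{k-1}(\mathbf{s}+\mathbf{v}(\mathbf{s}_i)))(I_k(\mathbf{s}) - I_{k-1}(\mathbf{s}+\mathbf{v}(\mathbf{s}_j)))\} \\
&= \tfrac{1}{2}E\{(I_k(\mathbf{s}) - I_{k-1}(\mathbf{s}+\mathbf{v}(\mathbf{s}_i)))^2\} + \tfrac{1}{2}E\{(I_k(\mathbf{s}) - I_{k-1}(\mathbf{s}+\mathbf{v}(\mathbf{s}_j)))^2\} \\
&\quad - \tfrac{1}{2}E\{(I_{k-1}(\mathbf{s}+\mathbf{v}(\mathbf{s}_i)) - I_{k-1}(\mathbf{s}+\mathbf{v}(\mathbf{s}_j)))^2\} \\
&= \tfrac{\epsilon}{2}\left(\hat{r}^2(\mathbf{s}, \mathbf{s}_i) + \hat{r}^2(\mathbf{s}, \mathbf{s}_j) - \hat{r}^2(\mathbf{s}_i, \mathbf{s}_j)\right).
\end{aligned} \tag{3.8}$$

² $(a - b)(a - c) = \tfrac{1}{2}(a - b)^2 + \tfrac{1}{2}(a - c)^2 - \tfrac{1}{2}(b - c)^2$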


The astute reader may feel a sense of misgiving about the approximation $E\{(I_{k-1}(\mathbf{s}+\mathbf{v}(\mathbf{s}_i)) - I_{k-1}(\mathbf{s}+\mathbf{v}(\mathbf{s}_j)))^2\} \approx \epsilon\,\hat{r}^2(\mathbf{s}_i, \mathbf{s}_j)$, as it does not seem to be a direct extension of (3.7). The subtle difference is the replacement of $\mathbf{v}(\mathbf{s})$ with $\mathbf{v}(\mathbf{s}_i)$. However, assuming that $\mathbf{v}(\mathbf{s}_i)$ represents the true motion of the block center $\mathbf{s}_i$, its proof can be carried out in the same manner as for (3.7). Another testament to its mathematical correctness is that (3.8) includes (3.7) as a special case where $\mathbf{s}_i = \mathbf{s}_j$.

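Returning to (3.6), we are now ready to find the optimal weights. Since $\xi(\mathbf{w})$ is to be minimized subject to $\sum_{i=1}^{L} w_i = 1$, the solution space has only a dimension of $L - 1$. To simplify the computation, we define a reduced-dimension weight vector $\tilde{\mathbf{w}} = [\tilde{w}_1, \tilde{w}_2, ..., \tilde{w}_{L-1}]^T$, the elements of which are free variables and are related to the weight vector $\mathbf{w}$ by

$$\mathbf{w} = \mathbf{e} - \mathbf{M}\tilde{\mathbf{w}}, \tag{3.9}$$

where $\mathbf{e} = [0, 0, ..., 1]^T_{L\times 1}$ and

$$\mathbf{M} = \begin{bmatrix} -\mathbf{I} \\ \mathbf{U}^T \end{bmatrix} = \begin{bmatrix} -1 & 0 & 0 & \cdots & 0 \\ 0 & -1 & 0 & \cdots & 0 \\ 0 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & -1 \\ 1 & 1 & 1 & \cdots & 1 \end{bmatrix}_{L\times(L-1)}.$$

When spelled out, (3.9) simply states that $w_i = \tilde{w}_i$, $1 \leq i \leq L - 1$, and $w_L = 1 - \sum_{i=1}^{L-1}\tilde{w}_i$. Substituting (3.9) into (3.6), setting the gradient of $\xi$ with respect to $\tilde{\mathbf{w}}$ to $\mathbf{0}$, and solving the resulting system of equations then yields

$$\tilde{\mathbf{w}}^* = (\mathbf{M}^T\mathbf{D}\mathbf{M})^{-1}\mathbf{M}^T\mathbf{D}\mathbf{e}. \tag{3.10}$$

The result of $\tilde{\mathbf{w}}^*$ immediately gives that of $\mathbf{w}^*$ by (3.9):

$$\mathbf{w}^* = \mathbf{e} - \mathbf{M}(\mathbf{M}^T\mathbf{D}\mathbf{M})^{-1}\mathbf{M}^T\mathbf{D}\mathbf{e}. \tag{3.11}$$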


Figure 3.1: The distribution of a block MV’s location when the block size used for motion search is varied: (a) 16x16 and (b) 32x32. The MV location is approximated by the centroid position of the first ten pixels, in a block, having relatively smaller prediction error.

Inspection of (3.11) reveals that the optimal weights depend solely on the distances between the prediction pixel $\mathbf{s}$ and the block centers involved $\{\mathbf{s}_i\}_{i=1}^{L}$. The term $\epsilon$ is absent in the final result. This remarkable property allows MVs sampled on a possibly irregular grid to be incorporated for OBMC, providing a reconstruction method applicable to any sampling structure.
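The dependence on distances alone can be checked with a small numerical sketch. The code below builds the matrix D from clipped squared distances, solves the reduced system (3.10) and maps the result back through (3.9). The three block centers, the pixel position, the threshold value and all function names are hypothetical; a plain Gaussian elimination stands in for the matrix inverse.

```python
def clipped_sq_dist(a, b, tau):
    """Squared Euclidean distance, clipped at tau**2."""
    return min((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2, tau * tau)


def error_matrix(s, centers, tau, eps=1.0):
    """D[i][j] = eps/2 * (r2(s,si) + r2(s,sj) - r2(si,sj)); the diagonal
    reduces to eps * r2(s,si)."""
    r2 = [clipped_sq_dist(s, c, tau) for c in centers]
    n = len(centers)
    return [[0.5 * eps * (r2[i] + r2[j]
                          - clipped_sq_dist(centers[i], centers[j], tau))
             for j in range(n)] for i in range(n)]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a x = b."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def pobmc_weights(s, centers, tau, eps=1.0):
    """Weights from (3.10) and (3.9), with M = [-I; 1^T] and e = [0..0 1]^T."""
    D = error_matrix(s, centers, tau, eps)
    last = len(centers) - 1
    # Entries of M^T D M and M^T D e written out element by element.
    A = [[D[k][l] - D[k][last] - D[last][l] + D[last][last]
          for l in range(last)] for k in range(last)]
    b = [D[last][last] - D[k][last] for k in range(last)]
    w_tilde = solve(A, b)
    return w_tilde + [1.0 - sum(w_tilde)]


if __name__ == "__main__":
    centers = [(8, 8), (24, 8), (8, 24)]   # hypothetical block centers
    print(pobmc_weights((12, 12), centers, tau=100.0))
    print(pobmc_weights((12, 12), centers, tau=100.0, eps=7.0))
```

Changing `eps` rescales D but leaves the weights unchanged. Note that in this sketch, without clipping, more than three centers in the plane make the reduced system singular, so the example deliberately uses three non-collinear centers.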

3.2.4 Optimal Weights in a Special Case

An interesting special case occurs by considering $\mathbf{D}$ as a diagonal matrix. In this case, the prediction errors $\{d(\mathbf{s}; \mathbf{v}(\mathbf{s}_i))\}_{i=1}^{L}$ are uncorrelated with each other, i.e., $E\{d(\mathbf{s}; \mathbf{v}(\mathbf{s}_i))\,d(\mathbf{s}; \mathbf{v}(\mathbf{s}_j))\} = 0$, $\forall i \neq j$, and $\mathbf{w}^*$ becomes

$$\mathbf{w}^* = \left(\sum_{i=1}^{L}\frac{1}{\hat{r}^2(\mathbf{s}, \mathbf{s}_i)}\right)^{-1}\left[\frac{1}{\hat{r}^2(\mathbf{s}, \mathbf{s}_1)}, \frac{1}{\hat{r}^2(\mathbf{s}, \mathbf{s}_2)}, ..., \frac{1}{\hat{r}^2(\mathbf{s}, \mathbf{s}_L)}\right]^T. \tag{3.12}$$

The proof of this result requires some work but involves only straightforward computations. (3.12) is a great simplification of (3.11): the optimal weights $w_i^*$ are simply the normalized inverses of the corresponding squared distances between $\mathbf{s}$ and $\mathbf{s}_i$. It has the interpretation that, prior to normalization, the contribution of each MV $\mathbf{v}(\mathbf{s}_i)$ to estimating its nearby pixel intensities is a function of pixel $\mathbf{s}$ that decays quadratically with $\hat{r}(\mathbf{s}, \mathbf{s}_i)$. If we take such a view, other functions can be substituted for $1/\hat{r}^2(\mathbf{s}, \mathbf{s}_i)$. For example, it may be just as well to adopt the raised cosine or bilinear function of various supports, or to change the power of $1/\hat{r}(\mathbf{s}, \mathbf{s}_i)$. As an afterthought, each of these functions may correspond to making some specific assumptions about the motion and intensity fields. Due to its simplicity, (3.12) will be included in the following sections as an alternative to (3.11).
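The diagonal special case amounts to normalized inverse-squared-distance weighting, which is easy to sketch. In the snippet below, the block centers and pixel position are hypothetical, and the handling of a pixel that coincides with a block center (all weight assigned to that MV, taken as the limiting behavior) is an assumption added to avoid division by zero.

```python
def diagonal_pobmc_weights(s, centers, tau):
    """Normalized inverses of the clipped squared distances between
    pixel s and each block center."""
    r2 = [min((s[0] - cx) ** 2 + (s[1] - cy) ** 2, tau * tau)
          for cx, cy in centers]
    if any(v == 0 for v in r2):
        # Assumed limiting behavior: a pixel located exactly at a block
        # center takes that block's MV with full weight.
        hits = [1.0 if v == 0 else 0.0 for v in r2]
        return [h / sum(hits) for h in hits]
    inv = [1.0 / v for v in r2]
    total = sum(inv)
    return [v / total for v in inv]


if __name__ == "__main__":
    centers = [(8, 8), (24, 8), (8, 24)]   # hypothetical block centers
    print(diagonal_pobmc_weights((12, 12), centers, tau=100.0))
    print(diagonal_pobmc_weights((8, 8), centers, tau=100.0))
```

Unlike the general solution, no linear system has to be solved here; the cost per pixel is one reciprocal per neighboring MV plus a normalization.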

3.2.5 MV Location Uncertainty

In the preceding derivation, we have always assumed that a block MV represents the true motion of the block center. However, this is an approximation; in fact, it may correspond to the motion of any pixel around the center. To see this, consider a small group of pixel locations in a block where prediction errors are relatively smaller. We think of the block MV as the motion connected to their centroid. Although not precise, this expedient provides a rough estimate of the MV location without having to acquire the true motion field. Fig. 3.1 presents two plots showing the centroid distributions when the block size used for motion search is varied. Two observations are immediate: (a) the means of both distributions are close to the block center, which justifies the widely accepted approximation, and (b) the variance is non-zero and increases with the block size, which suggests that the locations of $\mathbf{s}_i$, $\mathbf{s}_j$ in (3.7) and (3.8) should be modeled probabilistically.

We now generalize both equations to consider their random effects. To conform with our previous notation, we denote by $\tilde{\mathbf{s}}_i = \mathbf{s}_i + \mathbf{n}_i$ (respectively, $\tilde{\mathbf{s}}_j = \mathbf{s}_j + \mathbf{n}_j$) their true locations, which are characterized by an independent, additive noise vector $\mathbf{n}_i$ (respectively, $\mathbf{n}_j$) with mean zero and covariance matrix

$$\mathbf{K}_{\mathbf{n}_i\mathbf{n}_i} = \begin{bmatrix} \delta_i^{(x)} & \rho_i\sqrt{\delta_i^{(x)}\delta_i^{(y)}} \\ \rho_i\sqrt{\delta_i^{(x)}\delta_i^{(y)}} & \delta_i^{(y)} \end{bmatrix}.$$

Substituting $\tilde{\mathbf{s}}_i$ for $\mathbf{s}_i$ in (3.7) and applying the law of iterated expectations, we get

$$\begin{aligned}
E\{d^2(\mathbf{s}; \mathbf{v}(\tilde{\mathbf{s}}_i))\} &= E\{E\{d^2(\mathbf{s}; \mathbf{v}(\tilde{\mathbf{s}}_i)) \mid \tilde{\mathbf{s}}_i\}\} = \epsilon\,E\{\hat{r}^2(\mathbf{s}, \tilde{\mathbf{s}}_i)\} \\
&\simeq \epsilon\,E\left\{(s^{(x)} - s_i^{(x)} - n_i^{(x)})^2 + (s^{(y)} - s_i^{(y)} - n_i^{(y)})^2\right\} \\
&\simeq \epsilon\left(\hat{r}^2(\mathbf{s}, \mathbf{s}_i) + (\delta_i^{(x)} + \delta_i^{(y)})\right),
\end{aligned} \tag{3.13}$$

where the superscripts $x$, $y$ indicate the two components of a point or a vector. In (3.13), the locations of pixel $\mathbf{s}$ and the block center $\mathbf{s}_i$ are treated as known variables because we know exactly what MVs will be utilized for the motion compensation of pixel $\mathbf{s}$. As such, they are deterministic quantities and the expectation in the penultimate approximation is taken with respect to $\mathbf{n}_i$ only. In the course, we have tacitly ignored the clipping effect on $r(\mathbf{s}, \tilde{\mathbf{s}}_i)$, which however is crucial for our signal models to be proper (Section 3.2.2). A way out of this difficulty is to assume that $\mathbf{s}_i$ is close enough to $\mathbf{s}$ so that the result in (3.13) is a good approximation. This assumption can be justified to some extent since in practical implementation of our schemes, we use only those neighboring MVs that are closer to a pixel for its motion compensation. From (3.13), the consequence of MV location uncertainty is an increase in the mean-square prediction error. Of particular interest is that the penalty depends only on the variances of $\mathbf{n}_i$ (or equivalently,


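A similar calculation leads us to

$$\begin{aligned}
E\{d(\mathbf{s}; \mathbf{v}(\tilde{\mathbf{s}}_i))\,d(\mathbf{s}; \mathbf{v}(\tilde{\mathbf{s}}_j))\}
&= \tfrac{\epsilon}{2}E\left\{\hat{r}^2(\mathbf{s}, \tilde{\mathbf{s}}_i) + \hat{r}^2(\mathbf{s}, \tilde{\mathbf{s}}_j) - \hat{r}^2(\tilde{\mathbf{s}}_i, \tilde{\mathbf{s}}_j)\right\} \\
&\simeq \tfrac{\epsilon}{2}\left(\hat{r}^2(\mathbf{s}, \mathbf{s}_i) + \delta_i^{(x)} + \delta_i^{(y)}\right) + \tfrac{\epsilon}{2}\left(\hat{r}^2(\mathbf{s}, \mathbf{s}_j) + \delta_j^{(x)} + \delta_j^{(y)}\right) \\
&\quad - \tfrac{\epsilon}{2}\left(\hat{r}^2(\mathbf{s}_i, \mathbf{s}_j) + \delta_i^{(x)} + \delta_i^{(y)} + \delta_j^{(x)} + \delta_j^{(y)}\right) \\
&= \tfrac{\epsilon}{2}\left(\hat{r}^2(\mathbf{s}, \mathbf{s}_i) + \hat{r}^2(\mathbf{s}, \mathbf{s}_j) - \hat{r}^2(\mathbf{s}_i, \mathbf{s}_j)\right),
\end{aligned} \tag{3.14}$$

where $\mathbf{n}_i$ and $\mathbf{n}_j$ are assumed to be independent. As shown, the variance terms in (3.14) cancel each other out, leading to the same result as in (3.8). Simply substituting (3.13) into the matrix $\mathbf{D}$ in (3.11) gives the modified optimal weights with consideration of MV location uncertainty. These results also apply to the case where $\mathbf{D}$ is a diagonal matrix.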

In concluding this section, we want to point out that the proposed scheme has two parameters to be determined: the clipping threshold $\tau$ and the degree of MV location uncertainty $\delta = \delta_i^{(x)} + \delta_i^{(y)}$. The latter actually denotes a set of parameters, one for each distinct block size. As will be discussed later, they can be determined by off-line training.
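The effect of the two parameters can be illustrated for the diagonal case, where adding the location-uncertainty term to each diagonal entry of D turns the weights into normalized inverses of the clipped squared distance plus the uncertainty. The sketch below assumes strictly positive uncertainty values; the centers, parameter values and function name are hypothetical.

```python
def uncertain_diagonal_weights(s, centers, deltas, tau):
    """Diagonal-case weights with MV location uncertainty: each weight is
    proportional to 1 / (r_hat^2(s, s_i) + delta_i), with delta_i > 0."""
    inv = []
    for (cx, cy), delta in zip(centers, deltas):
        r2 = min((s[0] - cx) ** 2 + (s[1] - cy) ** 2, tau * tau)
        inv.append(1.0 / (r2 + delta))
    total = sum(inv)
    return [v / total for v in inv]


if __name__ == "__main__":
    centers = [(8, 8), (24, 8), (8, 24)]   # hypothetical block centers
    deltas = [4.0, 4.0, 4.0]               # hypothetical uncertainty values
    # Pixel located exactly at the first block center.
    print(uncertain_diagonal_weights((8, 8), centers, deltas, tau=100.0))
    # A smaller clipping threshold caps the distances of far-away MVs.
    print(uncertain_diagonal_weights((8, 8), centers, deltas, tau=10.0))
```

Even at a block center the weight of the block's own MV stays below one once the uncertainty is positive, and a smaller threshold shifts more weight toward the distant MVs.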

3.3 Analysis of Window Functions

While (3.11) characterizes the contributions of a set of MVs to estimating the intensity of a pixel, an equivalent yet more insightful perspective is to see the window function of each MV, which specifies its weights used to estimate pixel intensities in a neighborhood [19]. In this section, we shall gain further insights into the proposed solutions from this viewpoint. To ease comprehension, we first consider the simpler case of fixed block-size motion partitioning, followed by the more sophisticated one involving variable block-size partitioning.

Figure 3.2: The effect of the clipping threshold value on the shape of the proposed parametric windows with (a)(c)(e) non-diagonal and (b)(d)(f) diagonal D matrices. From top to bottom, the clipping threshold values are 10, 15, 35, respectively.

Figure 3.3: Window functions along the slice Y = 16 based on a (a) non-diagonal or (b) diagonal D matrix.

3.3.1 Theoretical Window Functions

Fig. 3.2(a), (c) and (e) plot the window functions for various clipping threshold values $\tau$. Their counterparts in Fig. 3.2(b), (d) and (f) show the results when the off-diagonal entries of $\mathbf{D}$ are set to zero. In the former case, we observe that the window shape inflates with increasing $\tau$, and eventually converges to a bilinear function. This trend of inflation continues in the latter case although the change in the window shape is not that radical, especially when the value of $\tau$ becomes high enough. These phenomena can be explained by noting that a higher clipping threshold implies a stronger correlation between the motion of different pixels (smaller $\alpha$) or a larger motion variance (larger $\sigma_m^2$). Under these circumstances, it is intuitive to expect that the influence of a block MV will extend to more pixels. To gain a better appreciation of how the window shape evolves, Fig. 3.3 further displays the cross sections of these windows along the slice Y = 16. There are several points to be noted here. First, the weights around the block center (X = 16) are seen to be smaller than 1. This result is a manifestation of MV location uncertainty. As expected, their values tend to approach 1 if we have $\delta = \delta_i^{(x)} + \delta_i^{(y)} = 0$ (cf. (3.13)). Some other interesting observations follow from comparing the window values at X = 16.5 (current block center) and at X = 0.5 or 32.5 (neighboring block centers). The windows with a diagonal $\mathbf{D}$ resemble normal functions in shape, and exhibit an upward trend in magnitude near the block center (respectively, a downward trend at the neighboring block centers) as the value of $\tau$ increases. In the general case, however, the behavior is more intricate: the peak value escalates first and then declines. Both cases nevertheless have one thing in common: their windows converge to a function dependent only on $\delta$ when $\tau$ is large enough.

3.3.2 Comparison with Empirical Window Functions

The different results above lead us to wonder which model is more reasonable and how large the penalty is for keeping only the diagonal entries of $\mathbf{D}$. In this section, we provide empirical justifications by contrasting the parametric windows with those obtained by the least-squares method. Results of [24] are also included for comparison. In particular, to demonstrate the best achievable performance of our parametric schemes, the values of both $\tau$ and $\delta$ are searched exhaustively to minimize the mean-square prediction error, and so is the parameter $\rho_m$ in [24]³.

From the results presented in Fig. 3.4(a) and Fig. 3.5(a), we see that the proposed windows with a non-diagonal $\mathbf{D}$ closely match the least-squares ones. The other windows, although showing similar magnitudes at block boundaries (X = 8.5 or 24.5), have much higher weights near the block center (X = 16.5). Despite their distinct appearances, the penalties in MSE are, surprisingly, not as high as expected. We find that there are actually several window functions

³ The parametric solution in [24] originally has four parameters to be determined. However, a little algebra shows that the resulting window depends only on the correlation coefficient of the motion field, $\rho_m$.

Figure 3.4: Comparisons of window functions and their MSE surfaces using testing sequence "S04": (a) parametric windows versus optimal least-squares windows; the MSE surfaces of the proposed parametric solution with a (b) non-diagonal or (c) diagonal D matrix.

Figure 3.5: Comparisons of window functions and their MSE surfaces using testing sequence "S03": (a) parametric windows versus optimal least-squares windows; the MSE surfaces of the proposed parametric solution with a (b) non-diagonal or (c) diagonal D matrix.


