National Chiao Tung University

Institute of Multimedia Engineering

Master's Thesis

Analytical Mode-Dependent Rate and Distortion Models

for H.264/SVC Coarse Grain Scalability

Student: Yu-Chen Tseng

Advisor: Prof. Wen-Hsiao Peng

October 2011

Analytical Mode-Dependent Rate and Distortion Models for H.264/SVC

Coarse Grain Scalability

Student: Yu-Chen Tseng

Advisor: Wen-Hsiao Peng

National Chiao Tung University

Institute of Multimedia Engineering

Master's Thesis

A Thesis

Submitted to the Institute of Multimedia Engineering, College of Computer Science,

National Chiao Tung University, in Partial Fulfillment of the Requirements

for the Degree of Master

in

Computer Science

October 2011

Hsinchu, Taiwan, Republic of China

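Analytical Mode-Dependent Rate and Distortion Models for H.264/SVC Coarse Grain Scalability

Student: Yu-Chen Tseng Advisor: Wen-Hsiao Peng

Master's Program, Institute of Multimedia Engineering, National Chiao Tung University

Abstract (Chinese Version)

In scalable video coding, inter-layer prediction and the different block partition modes used for motion estimation in motion-compensated prediction lead to differences in rate and distortion. However, only a few existing models can explain the rate and distortion behavior of scalable video coding, let alone any method for analyzing the rate and distortion relationship of different block partition modes in scalable video coding. For the coarse-grain scalability of scalable video coding, this thesis derives an analytical, mode-dependent rate and distortion model. Considering that the enhancement layer may be coded with inter-layer residual prediction, we provide, for both the base layer and the enhancement layer, a rate and distortion model that depends on the block partition mode and the sequence characteristics. The derivation of the proposed models adopts a forward channel model and a temporally stationary process assumption; we interpret the reconstructed block through a motion prediction trajectory and model the residual variance as a statistic. Experimental results show that the proposed models accurately estimate the actual coded rate and distortion curves of the base and enhancement layers for different block partition modes. Finally, in a performance analysis of inter-layer residual prediction for different block partitions, the proposed models also show rate and distortion trends similar to those of actual coding.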


Analytical Mode-Dependent Rate and Distortion

Models for H.264/SVC Coarse Grain Scalability

Student : Yu-Chen Tseng Advisor : Wen-Hsiao Peng

Institute of Multimedia Engineering

National Chiao Tung University

ABSTRACT

In Scalable Video Coding (SVC), inter-layer prediction and the variable block partition modes used for motion-compensated prediction (MCP) cause differences in rate and distortion behavior. However, few models can explain the rate and distortion behavior of SVC, let alone methods that analyze the rate and distortion of different partition mode pairs in SVC. In this thesis, we derive analytical mode-dependent rate and distortion models for coarse-grain scalable (CGS) video coding. The rate and distortion models for the base and enhancement layers both depend on the partition mode and the sequence characteristics, and they account for inter-layer residual prediction in the enhancement layer. Adopting a forward channel model and a temporal-stationarity assumption in the derivation, we interpret the reconstructed block through a motion prediction trajectory and model the transformed residual variance as a mode-dependent statistic. Our experimental results show that the proposed models estimate the actual coded R-D curves of different partition modes in the base and enhancement layers with high accuracy. In addition, the model and the actual coded curves show similar tendencies in the performance of different mode pairs encoded with inter-layer residual prediction.

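Acknowledgements

Two years ago I had the honor of joining the Multimedia Architecture and Processing Laboratory. First, I would like to thank my advisor, Dr. Wen-Hsiao Peng, who has guided my research since my junior year as an undergraduate. Professor Peng's diligence matches the brilliance of his thinking, and his habit of getting to the bottom of every problem with care and rigor has become my model in study and research. Second, I would like to thank my classmate 吳崇豪, who, despite a heavy course load, tirelessly answered my questions, offered many valuable suggestions, and constantly urged me on and corrected my research thinking; his help with coursework and study made these two years of my master's program far less of a struggle. For the chance to keep learning in the excellent environment of the laboratory, I also thank the senior students I did research with, 陳渏紋 and 王澤偉, and my classmate 李宗霖, who sat next to me; they always made me feel the warmth of the laboratory, and their support in both study and daily life is something I will never forget from my time as a graduate student. Finally, I would like to thank my family, Mr. 曾正義 and Ms. 劉培良, for raising me, for giving me the freedom to pursue my master's degree, and for not blaming me when my own shortcomings delayed my graduation. I thank my roommates, 朱庭玉 and 單師涵, who were always the first to care for me when I met setbacks; after two years of supporting one another in daily life, we are as close as family. I am especially grateful to 單師涵's mother, Ms. 單懷靈, who took time out of her busy schedule to correct the English grammar of my thesis. To my teachers, family, and friends: your support accompanied me along the hard road to this degree. Thank you.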

List of Tables

5.1 Testing conditions and encoder parameters

5.2 PSNR error between real and model

5.3 Entropy rate and modified rate error

5.4 Base-layer usage with base-layer QP equal to 28, Bus (CIF)

5.5 Base-layer usage with base-layer QP equal to 28, Football (CIF)

List of Figures

2.1 Hybrid coding diagram

2.2 H.264 quantization scheme

3.1 Forward channel model and models in matrix notation

3.2 3-D model of hybrid coder

3.3 Motion trajectory for a 16x16 predicted block along the time axis

3.4 Single-layer residual signal generation for a 16x8 predicted block

4.1 3-D model of SVC hybrid coder with inter-layer residual prediction

4.2 EL and BL motion trajectory starting from $f^E_{k-m}$ for a 16x16 predicted block

4.3 Four relativities of $f^E_{k-n}$ and $f^B_{k-m,k-m-p}$ for a 16x16 predicted block

4.4 Multi-layer residual signal generation for a BL 16x16 predicted block and an EL 16x8 predicted block

4.5 Laplace distortion and its approximation

5.1 PSNR vs. QP curves of BL and EL applying different configuration regression results (Foreman)

5.2 Real vs. model D-Q curves applying one regression result (Mobile and Foreman sequences)

5.3 Entropy curves compared with actual curves (Foreman)

5.4 Linear relationship between $\ln(R)$ and $H^*$: (a) base mode 16x16, (b) base mode 16x8, (c) base mode 8x8, from Foreman (CIF)

5.5 Modified rate ($R^*$) compared with actual rate, with entropy ($H$) as a contrast. Blue lines with solid squares are BL R-Q curves. Red lines with hollow squares are the entropy vs. QP curves. Green lines with hollow triangles are the modified rate vs. QP curves

5.6 SVC performance (Football); the dotted lines indicate the distance between simulcast and base layer, as well as the distance between simulcast and inter-layer residual prediction

CHAPTER 1 Research Overview

1.1 Introduction

Scalable Video Coding (SVC) approaches have been investigated for more than 20 years to meet the demands of diverse video transmission channels and heterogeneous viewing devices.

Among traditional non-scalable video coding techniques, hybrid coding has drawn most of the attention over the past several decades and led to the well-known H.264/AVC standard. In hybrid coding, motion-compensated prediction (MCP) exploits temporal similarities between successive video frames (inter-frame coding). Transform coding is then performed in two steps: first, spatial values are converted into transform coefficients; second, the coefficients are quantized to achieve lossy compression. In block-based MCP, each macroblock is split into one or more partitions, referred to as the partition mode, for motion compensation. Different partition modes yield different coding efficiency; for mode decision, the rate and distortion behavior of each partition mode is therefore a desirable decision criterion.


Instead of independently encoding consecutive spatial layers with MCP-based coders, SVC adopts additional inter-layer prediction to exploit the statistical dependences between layers. In contrast to simulcasting different qualities or resolutions, with inter-layer prediction the pictures at higher quality or resolution levels use information from the lower levels to improve coding efficiency. How to analyze the performance of inter-layer prediction thus becomes a critical issue.

1.2 Problem Statement

In transform coding of images and video, two important quantities are the coding bit rate R and the picture distortion D. Analyzing and estimating R-D performance is central to image and video coding. For example, given rate and distortion models, optimum bit allocation and other R-D optimization procedures can be used to improve coding efficiency and, consequently, image or video presentation quality. In typical hybrid video coding, the rate and distortion behaviors depend on the motion estimation partition mode of MCP and on the quantization method.

Many efforts have been made to derive rate and distortion models for non-scalable hybrid coding [3][4][5]. These methods provide analytical or empirical approaches to the rate and distortion of whole video sequences. Among the non-scalable models, [4] proposes a quantization-distortion model for H.264/AVC that specifically accounts for the effect of motion-compensated prediction; however, the non-linear numerical computation it requires is impractical, and it cannot model the R-D variation between different partition modes at the block level. Beyond non-scalable coding, the rate distortion analysis in [6] gives a framework for evaluating the rate-distortion theoretic lower bound for spatially scalable video coding in general. The approach in [6] is an extension of earlier work by B. Girod [7], and its idealized assumptions, made for theoretical analysis, are far from adequate to describe real SVC codecs. For practical applications, an operational and analytical rate distortion model is still needed. In this thesis we derive rate and distortion models that depend on the block partition mode and approximate the behavior of H.264/AVC coding and of its SVC extension with inter-layer residual prediction for coarse-grain scalable coding. With inter-layer residual prediction taken into account, rate and distortion modeling in H.264/SVC is very challenging. This study aims to answer the following questions:

1. How do we determine the single-layer mode-dependent rate and distortion models based on the prediction and quantization schemes of H.264/AVC?

2. How do we extend the single-layer mode-dependent rate and distortion models to SVC inter-layer residual prediction, given quantization parameters $q_B$ for the base layer and $q_E$ for the enhancement layer?

3. How do the rate and distortion behave with different partition modes and different characteristic factors extracted from an input sequence?

4. How does inter-layer residual prediction perform when applied to different partition mode pairs for MCP?

Since R-D behavior is affected by the features of the input sequence and by the quantization parameter, this thesis provides an in-depth study of the relationship between rate distortion and video content, as well as the relationship between rate distortion and the quantization parameter in SVC, in order to characterize the rate-quantization (R-Q) and distortion-quantization (D-Q) models.

1.3 Contributions and Organization of Thesis

Specifically, our main contributions in this work are:

• Two mode-dependent distortion-quantization (D-Q) models are proposed for non-scalable and scalable video coders.

• Two mode-dependent rate-quantization (R-Q) models are proposed for non-scalable and scalable video coders.

• Our analysis can evaluate the R-D performance of different mode pairs with SVC inter-layer prediction.

The remainder of this thesis is organized as follows. Chapter 2 reviews hybrid coding, coarse-grain scalability in SVC, and a rate distortion model based on the Laplace distribution. Chapter 3 presents the derivation of the single-layer rate and distortion models. The rate distortion model for the multi-layer encoder is introduced in Chapter 4. Chapter 5 provides simulation results that examine the accuracy of the proposed rate and distortion models and analyze the performance of inter-layer prediction. Finally, the thesis is concluded with a summary.


CHAPTER 2 Background

2.1 Overview of Hybrid Video Coding

In hybrid video coding, depicted in Fig. 2.1, a video sequence is temporally segmented into groups of pictures (GOPs). Each picture is divided into macroblocks (MBs); each MB is split into one or more MB partitions, and intra or inter prediction is applied to each partition. The difference between the predictor and the current block is called the motion-compensated prediction (MCP) residual.

Then a fixed M × M block transform, commonly a DCT, is applied to the prediction residual of the inter and intra prediction modes; the prediction residual blocks are thereby transformed into DCT coefficients. Note that the size of an M × M transform block is always less than or equal to the partition sizes in an MB. After that, scalar quantization followed by entropy coding is applied to the DCT coefficients. The quantizer causes the main quality loss of compression, namely the quantization distortion D.

2.2 Overview of Scalable Video Coding

2.2.1 Concept

The Scalable Video Coding (SVC) standard [2][8][1] is a scalable extension of the H.264/AVC standard developed by the Joint Video Team (JVT). It allows a single bitstream to provide multiple frame sizes, frame rates, and quality levels while achieving reasonable coding efficiency. An SVC bitstream is organized into one base layer (BL) and one or more enhancement layers (EL) in each dimension in which it provides scalability. A subset of an SVC bitstream can be extracted to form another valid bitstream for a given decoder and decoded to produce playback with reduced reconstruction quality compared to the original bitstream.

SVC supports three types of scalability: spatial, temporal, and quality. Subsets of a spatially scalable bitstream represent the source content with a reduced picture size (spatial resolution). Temporal scalability is provided by hierarchical temporal prediction structures for each coding layer. Quality scalability is achieved by two approaches: coarse-grain scalable coding (CGS), which can be considered a special case of spatial scalability with identical frame sizes for the base and enhancement layers, and medium-grain scalable coding (MGS), which provides quality refinement layers inside each spatial layer and enables packet-based quality scalable coding.

2.2.2 Coarse-grain Scalable Coding (CGS)

SVC performs CGS by encoding a series of quality layers that have the same spatial and temporal resolutions. First, the texture information is encoded into an AVC-compatible bitstream to provide a base layer (BL) with the minimum quality among the layers at a given quantization level. At the enhancement layers (EL), CGS decreases the quantization step size and encodes successive refinements of the transform coefficients. For residual information, inter-layer prediction is employed: the base-layer signal of the co-located block is used as the prediction for the residual signal of the current enhancement-layer macroblock, so that only the corresponding difference signal is coded.

Figure 2.2: H.264 quantization scheme.

2.3 Rate and Distortion Model Based on Laplace Distribution

The Laplace distribution [3][4][9][10] is well known to resemble the distribution of the DCT coefficients of images. Because of its low computational complexity and high accuracy, we choose the Laplace distribution as the base distribution of the transform coefficients in the proposed derivation. The probability density function (pdf) of a zero-mean Laplace-distributed random variable is

$$f(x) = \frac{1}{2\Lambda} e^{-|x|/\Lambda}, \qquad \Lambda = \frac{\sigma}{\sqrt{2}},$$

where $x$ represents the transformed residual and the Laplace parameter $\Lambda$ is a function of $\sigma$, the standard deviation of the transformed residuals, which reflects the properties of the input sequence. Recent coding standards usually adopt the uniform quantizer depicted in Fig. 2.2. The probability that a transform coefficient $x$ falls inside quantization bin $i$ is

$$P(i) = \begin{cases} \displaystyle\int_{iq-\frac{q}{2}+\alpha}^{iq+\frac{q}{2}+\alpha} f(x)\,dx & \text{if } i \ge 1, \\[2ex] \displaystyle\int_{-\frac{q}{2}-\alpha}^{\frac{q}{2}+\alpha} f(x)\,dx & \text{if } i = 0, \end{cases} \tag{2.1}$$

where $q$ is the quantization step size and $\alpha$ is the quantizer dead-zone parameter. For H.264/AVC inter-frame coding, $\alpha = q/3$. Weighting the same integrals by the squared reconstruction error, $(x - iq)^2$ for $i \ge 1$ and $x^2$ for $i = 0$, and summing over all bins gives the distortion in closed form:

$$D = \sigma^2 - \frac{(2\alpha + \sqrt{2}\sigma)\exp\left(-\frac{q+2\alpha}{\sqrt{2}\sigma}\right)}{1 - \exp\left(-\frac{\sqrt{2}q}{\sigma}\right)}\, q. \tag{2.2}$$

In addition to the distortion model, the entropy of the quantized transformed residuals can be computed from the entropy definition and the probability function:

$$H = -\sum_{i} P(i)\log P(i) = -P(0)\log P(0) - 2\sum_{i=1}^{\infty} P(i)\log P(i).$$

The closed form of the entropy is

$$H = -p_0\log p_0 - (1-p_0)\log c - \frac{2cp\log p}{(1-p)^2}, \tag{2.3}$$

where

$$p = \exp\left(-\frac{\sqrt{2}q}{\sigma}\right), \qquad p_0 = 1 - \exp\left(-\frac{\sqrt{2}\alpha}{\sigma}\right)\sqrt{p}, \qquad c = \frac{1}{2}\exp\left(-\frac{\sqrt{2}\alpha}{\sigma}\right)\frac{1-p}{\sqrt{p}}.$$
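The closed forms (2.2) and (2.3) can be checked numerically. The following is a minimal sketch, not the thesis implementation: the function names are hypothetical, the logarithm is taken in base 2 (an assumption, since the text only writes log), and the bin edges follow the dead-zone quantizer described above.

```python
import math

SQRT2 = math.sqrt(2.0)

def bin_probability(i, sigma, q, alpha):
    """Probability of quantization bin i for a zero-mean Laplace source.

    For i >= 1 this is the mass of the positive bin only; the mirrored
    negative bin has the same mass.
    """
    lam = SQRT2 / sigma                      # 1 / Lambda
    upper = abs(i) * q + q / 2.0 + alpha
    if i == 0:
        return 1.0 - math.exp(-lam * upper)
    lower = upper - q
    return 0.5 * (math.exp(-lam * lower) - math.exp(-lam * upper))

def laplace_distortion(sigma, q, alpha):
    """Closed-form quantization distortion, as in (2.2)."""
    num = (2.0 * alpha + SQRT2 * sigma) * math.exp(-(q + 2.0 * alpha) / (SQRT2 * sigma))
    den = 1.0 - math.exp(-SQRT2 * q / sigma)
    return sigma ** 2 - q * num / den

def laplace_entropy(sigma, q, alpha):
    """Closed-form entropy in bits, as in (2.3) with log base 2 (assumed)."""
    p = math.exp(-SQRT2 * q / sigma)
    e = math.exp(-SQRT2 * alpha / sigma)
    p0 = 1.0 - e * math.sqrt(p)
    c = 0.5 * e * (1.0 - p) / math.sqrt(p)
    return (-p0 * math.log2(p0)
            - (1.0 - p0) * math.log2(c)
            - 2.0 * c * p * math.log2(p) / (1.0 - p) ** 2)

def entropy_by_summation(sigma, q, alpha, max_bin=200):
    """Entropy computed directly from the bin probabilities (truncated sum)."""
    p_zero = bin_probability(0, sigma, q, alpha)
    total = -p_zero * math.log2(p_zero)
    for i in range(1, max_bin + 1):
        p_i = bin_probability(i, sigma, q, alpha)
        if p_i <= 0.0:                       # remaining bins are negligible
            break
        total -= 2.0 * p_i * math.log2(p_i)
    return total
```

For example, `laplace_distortion(1.0, 1.2, 0.4)` evaluates (2.2) for unit variance with $\alpha = q/3$, and `entropy_by_summation` should agree with `laplace_entropy` up to the truncation of the sum.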

In the next chapter, the proposed rate distortion model will be discussed in detail based on Laplace distribution.


CHAPTER 3 Rate Distortion Model: Single Layer

This chapter presents the derivation of our operational rate distortion model for a single layer, or so-called base layer in SVC. Single layer, which is equivalent to the well-known AVC coding, means that the encoded bitstream does not contain scalable resolutions or scalable bit-rates.

3.1 Derivation Outline

Based on the Laplace distribution and the H.264 quantization scheme, the distortion of a coefficient can be written as a function of the residual variance $\sigma^2$ and the quantization step size $q$, as in (2.2). The entropy function is likewise obtained from (2.3).

Given a quantization step size $q$ in the distortion model, the influence of the MCP method appears only in the variance of the residual transform coefficients, $\sigma^2$. To model the impact of MCP schemes on distortion and rate behavior, we introduce a forward channel model that lets us conveniently construct a hybrid coding flow. We then formulate a closed-form residual variance function $\sigma^2(q)$ and use it to obtain the rate and distortion models.

The MCP prediction error of a block is the difference between the prediction block in the coded reference frame and the current block; the current residual is then encoded, which produces the current distortion. Since the distortion of the current block also depends on the distortion of the reference block, in this thesis we assume that the video sequence is a locally temporal-stationary process.

The statistical models proposed by Tao et al. [11] are used to characterize the motion and intensity fields of video signals. These models provide motion, intensity, and block-partition-mode parameters for analyzing the block-level motion-compensated predictor; therefore, our closed-form residual variance function $\sigma^2(q)$ can also be controlled by those parameters. Eventually, a rate distortion model that reacts to the MCP method is obtained.

3.2 Distortion Model for Single Layer

H.264/AVC is based on the block-based hybrid coding approach. Motion estimation is performed to find the prediction of each macroblock (MB) partition, and a DCT followed by quantization is applied to each M × M block inside a macroblock individually. It follows that the rate distortion model of an MB can be reduced to modeling the M × M transform blocks it covers. Therefore, the derivation of the distortion model of an inter mode is presented on the basis of an M × M transformed block. To model the distortion of a whole MB, we only need to model each M × M transform block separately.

To evaluate the transform-domain residual variance $\sigma^2$ for the distortion function of a Laplace-distributed source, we first formulate the prediction error by subtracting the reconstructed reference frame from the current $k$th original frame. Let $f_k$ be the vectorization of an $M \times M$ intensity block of an MB to be coded in the (current) frame $k$, and let $\hat{f}_{k-1}$ be the vectorized motion-compensated prediction of $f_k$ in the reference frame $k-1$. The corresponding residual vector $e_k$ is

$$e_k = f_k - \hat{f}_{k-1}. \tag{3.1}$$

Figure 3.1: Forward channel model and models in matrix notation, panels (a) and (b).

Let $e_k^T$, $f_k^T$, and $\hat{f}_{k-1}^T$ represent the transformed vectors of $e_k$, $f_k$, and $\hat{f}_{k-1}$, respectively. If $H$ denotes the $M \times M$ transform matrix applied to the block corresponding to $e_k$, $f_k$, or $\hat{f}_{k-1}$, the equivalent transform after vectorization is $H \otimes H$, where $\otimes$ is the Kronecker product. Since the equivalent transform is linear, (3.1) implies

$$e_k^T = f_k^T - \hat{f}_{k-1}^T, \tag{3.2}$$

where $e_k^T$ contains $M^2$ transform coefficients in column-major order.
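The equivalence between the 2-D block transform and the Kronecker-product transform of the vectorized block, and the linearity used to obtain (3.2), can be illustrated as follows. This is a sketch under stated assumptions: $H$ is taken to be the orthonormal DCT-II matrix with $M = 4$, and all function names are hypothetical.

```python
import math

def dct_matrix(m):
    """Orthonormal DCT-II matrix of size m x m (an assumed choice for H)."""
    rows = []
    for u in range(m):
        scale = math.sqrt((1.0 if u == 0 else 2.0) / m)
        rows.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * m))
                     for x in range(m)])
    return rows

def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def vec(a):
    """Stack the columns of a matrix into one list (column-major order)."""
    return [a[r][c] for c in range(len(a[0])) for r in range(len(a))]

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def transform_2d(h, block):
    """vec(H F H^t): transform the block first, then vectorize."""
    return vec(matmul(matmul(h, block), transpose(h)))

def transform_vectorized(h, block):
    """(H kron H) vec(F): vectorize the block first, then transform."""
    big = kron(h, h)
    f = vec(block)
    return [sum(row[j] * f[j] for j in range(len(f))) for row in big]
```

Both functions return the same $M^2$ coefficients in the same order, so a residual block can be transformed either way.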

In hybrid video coding the transform is followed by a specific quantization and entropy coding stage; in the following derivations, however, it is instead followed by the forward channel model shown in Fig. 3.1 [12]. It is well known that, given a Gaussian source with zero mean and finite variance $\sigma^2$ and an additive Gaussian noise, scaling the channel input by $\beta = 1 - D/\sigma^2$ and connecting the source directly to the channel yields a system that achieves the ideal rate distortion function of the source under the squared-error criterion, where $D$ is the squared-error distortion between input and output. Although the forward channel needs a Gaussian source as input to be ideal and is therefore not entirely suitable for the transformed residual signal, we adopt it in our framework for mathematical tractability.
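A scalar version of the forward channel makes the role of the scaling $\beta$ concrete. The sketch below assumes the noise variance $\beta D$ that this chapter uses later for $r_N^T(0;i)$, and checks analytically that the channel output $\beta x + n$ has mean squared error $D$; the function names are hypothetical.

```python
def forward_channel(variance, distortion):
    """Return (beta, noise_variance) of a scalar forward channel.

    beta = 1 - D / sigma^2 is the input scaling; the additive noise is
    assumed to be zero-mean with variance beta * D and independent of x.
    """
    if not 0.0 < distortion <= variance:
        raise ValueError("distortion must lie in (0, variance]")
    beta = 1.0 - distortion / variance
    return beta, beta * distortion

def reconstruction_mse(variance, beta, noise_variance):
    """E[(x - (beta * x + n))^2] for zero-mean x and independent zero-mean n."""
    return (1.0 - beta) ** 2 * variance + noise_variance
```

With `variance = 4.0` and `distortion = 1.0`, the scaling is 0.75, the noise variance is 0.75, and the resulting mean squared error is again 1.0.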

Based on the optimum forward channel shown in Fig. 3.1(a), we model the hybrid coder as in Fig. 3.2, using the $M^2$-dimensional extension in Fig. 3.1(b) so that the input $e_k^T$ can be applied to it. Thus, the reconstruction $\hat{e}_k^T$ of the prediction error $e_k^T$ is obtained by

$$\hat{e}_k^T = B_k e_k^T + n_k^T, \tag{3.3}$$

where $n_k^T$ is the additive noise vector, which is independent of the input vector, and $B_k$ is an $M^2 \times M^2$ diagonal scaling matrix whose diagonal entries are the scaling factors $\beta$ of the individual coefficients. Let $\hat{f}_k^T$ be the reconstruction of $f_k^T$; then $\hat{e}_k^T$ can be rewritten as

$$\hat{e}_k^T = \hat{f}_k^T - \hat{f}_{k-1}^T. \tag{3.4}$$

Substituting (3.2) and (3.4) into (3.3) gives

$$\hat{f}_k^T = B_k f_k^T + (I - B_k)\hat{f}_{k-1}^T + n_k^T, \tag{3.5}$$

which shows an affine relation between $\hat{f}_k^T$ and $\hat{f}_{k-1}^T$. In general, the $M \times M$ predicted block corresponding to $\hat{f}_{k-1}^T$ is not aligned with an $M \times M$ transformed block in the reference frame, so (3.5) is only valid for $\hat{f}_k^T$. However, we adopt the alignment assumption that (3.5) is valid for every nonnegative integer $k$, and thus the reconstructed intensity sequence $\{\hat{f}_k^T\}_{k \ge 0}$ satisfies a recurrence relation and shows a motion trajectory, as illustrated in Fig. 3.3.

Figure 3.2: 3-D Model of Hybrid Coder.

Figure 3.3: Motion trajectory for a 16x16 predicted block along the time axis.

Substituting (3.5) into (3.2) recursively gives the closed form

$$\begin{aligned} e_k^T = f_k^T &- \sum_{n=1}^{k-1}\left[\prod_{i=0}^{n-1}(I - B_{k-1-i})\right] B_{k-1-n} f_{k-1-n}^T - \sum_{n=1}^{k-1}\left[\prod_{i=0}^{n-1}(I - B_{k-1-i})\right] n_{k-1-n}^T \\ &- B_{k-1} f_{k-1}^T - n_{k-1}^T - \left[\prod_{i=0}^{k-1}(I - B_{k-1-i})\right]\hat{f}_{-1}^T. \end{aligned} \tag{3.6}$$

Because the characteristics differ between frames, the scaling matrices $B_k$ may be unequal. Since the characteristics do not vary greatly between frames except in special cases such as scene changes, we introduce a temporal-stationary assumption, under which the scaling matrices are all equal, i.e., $B_k = B$ for every $k$, to simplify (3.6):

$$\begin{aligned} e_k^T &= f_k^T - \sum_{n=0}^{k-1}(I-B)^n B f_{k-1-n}^T - \sum_{n=0}^{k-1}(I-B)^n n_{k-1-n}^T - (I-B)^k \hat{f}_{-1}^T \\ &= f_k^T - \sum_{n=0}^{k-1} A_n f_{k-1-n}^T - \sum_{n=0}^{k-1} C_n n_{k-1-n}^T, \end{aligned} \tag{3.7}$$

where $A_n = (I-B)^n B$, $C_n = (I-B)^n$, $(I-B)^0 = I$, and $\hat{f}_{-1}^T = \mathbf{0}$. That $\hat{f}_{-1}^T$ equals $\mathbf{0}$ indicates that the $M \times M$ block corresponding to $\hat{f}_0^T$ along the motion trajectory in the first frame ($k = 0$) is intra-coded:

$$\hat{f}_0^T = B_0 f_0^T + (I - B_0)\hat{f}_{-1}^T + n_0^T = B f_0^T + n_0^T. \tag{3.8}$$
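For a single coefficient, the recursion (3.5) and the closed form (3.7) can be compared directly. The sketch below is a scalar illustration with a constant scaling $\beta$ and arbitrary input and noise sequences; the function names are hypothetical.

```python
def residual_by_recursion(f, noise, beta):
    """Residuals e_k = f_k - fhat_{k-1}, with fhat_k updated as in (3.5).

    The trajectory starts from fhat_{-1} = 0, i.e. an intra-coded first block.
    """
    f_hat_prev = 0.0
    residuals = []
    for f_k, n_k in zip(f, noise):
        residuals.append(f_k - f_hat_prev)
        f_hat_prev = beta * f_k + (1.0 - beta) * f_hat_prev + n_k
    return residuals

def residual_closed_form(f, noise, beta, k):
    """Residual e_k from the closed form (3.7) with A_n and C_n as scalars."""
    a = [(1.0 - beta) ** n * beta for n in range(k)]    # A_n
    c = [(1.0 - beta) ** n for n in range(k)]           # C_n
    return (f[k]
            - sum(a[n] * f[k - 1 - n] for n in range(k))
            - sum(c[n] * noise[k - 1 - n] for n in range(k)))
```

Evaluating `residual_closed_form` for each frame index reproduces the list returned by `residual_by_recursion`.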

In general, $e_k$ is a zero-mean random vector and so is $e_k^T$. Thus, the covariance matrix of $e_k^T$ can be computed as follows:

$$\begin{aligned} E\left\{e_k^T (e_k^T)^t\right\} = {} & R_f^T(0) - \sum_{n=0}^{k-1} A_n R_f^T(n+1) - \left[\sum_{n=0}^{k-1} A_n R_f^T(n+1)\right]^t \\ & + \sum_{n=0}^{k-1}\sum_{m=0}^{k-1} A_n R_f^T(n-m)(A_m)^t + \sum_{n=0}^{k-1} C_n R_N^T(0)(C_n)^t, \end{aligned} \tag{3.9}$$

where

$$R_f^T(n-m) = E\left\{f_{k-n}^T (f_{k-m}^T)^t\right\} \quad \text{for } k \ge n, m \ge 0,$$

and

$$R_N^T(n-m) = E\left\{n_{k-n}^T (n_{k-m}^T)^t\right\} = \begin{cases} R_N^T(0) & \text{for } n = m, \\ \mathbf{0} & \text{otherwise}, \end{cases} \quad \text{for } k \ge n, m \ge 0,$$

where $\mathbf{0}$ is a zero matrix.

If the $M \times M$ blocks along the motion trajectory, regarded as a vector-valued random process $\{f_k^T\}$, are assumed to form a vector-valued wide-sense stationary process, the autocovariance $R_f^T(n-m)$ depends only on the frame interval $n - m$. Therefore, the two autocovariance functions are independent of the specific frame number $k$. Noting that any intensity vector and noise vector are statistically independent, we have $E\{f_{k-n}^T (n_{k-m}^T)^t\} = \mathbf{0}$ for $k \ge n, m \ge 0$. The covariance matrix of $e_k^T$ can be seen as a generalization of the scalar-valued variance $\sigma_k^2(i)$ of the $i$th coefficient of $e_k^T$, which is the $i$th diagonal element of $E\{e_k^T(e_k^T)^t\}$:

$$\begin{aligned} \sigma_k^2(i) = {} & r_f^T(0;i) - 2\beta_i\sum_{n=0}^{k-1}(1-\beta_i)^n r_f^T(n+1;i) \\ & + \beta_i^2\sum_{n=0}^{k-1}\sum_{m=0}^{k-1}(1-\beta_i)^{n+m} r_f^T(n-m;i) + \sum_{n=0}^{k-1}(1-\beta_i)^{2n} r_N^T(0;i), \end{aligned} \tag{3.10}$$

where $r_f^T(n;i)$ and $r_N^T(0;i)$ are the $i$th diagonal elements of $R_f^T(n)$ and $R_N^T(0)$, respectively.

Instead of deriving the variance in a specific frame $k$, i.e., $\sigma_k^2(i)$, it is more useful to consider the convergent behavior of hybrid coding. Let $k$ go to infinity and adopt the following Markov-like assumption:

$$r_f^T(n;i) = \begin{cases} r_f^T(0;i) & \text{for } n = 0, \\ (\alpha_i)^{|n|-1} r_f^T(1;i) & \text{otherwise}, \end{cases} \qquad \text{where } 1 \ge \alpha_i \ge 0.$$

We then obtain

$$\sigma_i^2 = \lim_{k\to\infty}\sigma_k^2(i) = \frac{2}{2-\beta_i}\, r_f^T(0;i) - \frac{2}{2-\beta_i}\cdot\frac{\beta_i}{1-\alpha_i(1-\beta_i)}\, r_f^T(1;i) + \frac{r_N^T(0;i)}{1-(1-\beta_i)^2}. \tag{3.11}$$

Equation (3.11) gives the convergent form of the variance of the $i$th coefficient. Before substituting (3.11) into the Laplace distortion function (2.2), we need to specify the parameters that appear in (3.11), i.e., $\alpha_i$, $\beta_i$, $r_f^T(0;i)$, $r_f^T(1;i)$, and $r_N^T(0;i)$.

The parameters $\alpha_i$ capture the temporal correlation between frames that motion-compensated prediction (MCP) tries to exploit for better prediction efficiency. MCP usually shows better prediction efficiency when the $\alpha_i$ are all close to 1 than when they are close to 0. To model single-layer video coding with a good MCP scheme, each $\alpha_i$ is approximated as 1, i.e., $\alpha_i \approx 1$ for all $i$. Following the forward channel result of T. Berger [12], $\beta_i \approx 1 - D_i/\sigma_i^2$ and $r_N^T(0;i) \approx \beta_i D_i$, so (3.11) reduces to

$$\sigma_i^2 = 2 r_f^T(0;i) - 2 r_f^T(1;i). \tag{3.12}$$
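The limit (3.11) and its reduction to (3.12) can be verified numerically for one coefficient. The sketch below evaluates the finite sums of (3.10) under the Markov-like assumption and compares them with the closed form; all names and the numeric values in the usage note are hypothetical.

```python
def variance_finite(k, beta, alpha, r0, r1, rn):
    """sigma_k^2(i) from (3.10) with the Markov-like autocovariance model."""
    def r(n):
        # r_f(n; i): r0 at lag 0, alpha^(|n| - 1) * r1 at any other lag
        return r0 if n == 0 else alpha ** (abs(n) - 1) * r1
    c = 1.0 - beta
    cross = sum(c ** n * r(n + 1) for n in range(k))
    double = sum(c ** (n + m) * r(n - m) for n in range(k) for m in range(k))
    noise = sum(c ** (2 * n) for n in range(k)) * rn
    return r0 - 2.0 * beta * cross + beta ** 2 * double + noise

def variance_limit(beta, alpha, r0, r1, rn):
    """Convergent variance for k -> infinity, as in (3.11)."""
    gain = 2.0 / (2.0 - beta)
    return (gain * r0
            - gain * beta / (1.0 - alpha * (1.0 - beta)) * r1
            + rn / (1.0 - (1.0 - beta) ** 2))

def variance_fixed_point(r0, r1):
    """Residual variance for alpha = 1, as in (3.12)."""
    return 2.0 * r0 - 2.0 * r1
```

For instance, with `r0 = 10.0` and `r1 = 8.0`, (3.12) gives a variance of 4.0; choosing `D = 1.0` yields `beta = 0.75` and noise variance 0.75, and `variance_limit(0.75, 1.0, 10.0, 8.0, 0.75)` returns the same value up to rounding.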

From (3.12), we conclude that the transformed residual variance $\sigma_i^2$ is a function of $r_f^T(0;i)$ and $r_f^T(1;i)$. According to the definition in (3.9), $r_f^T(0;i)$ and $r_f^T(1;i)$ are the $i$th diagonal elements of $R_f^T(0)$ and $R_f^T(1)$, respectively. Independently of the frame number $k$, $R_f^T(0)$ is the autocovariance of the $M \times M$ transform block $f_k^T$, and $R_f^T(1)$ is the covariance between $f_k^T$ and the motion-compensated reference block $f_{k-1}^T$.

To analyze the distribution of motion-compensated residuals, Tao et al. [11] assume that the autocorrelation functions of the intensity and motion fields can be approximated by a quadratic function and an exponential function, respectively:

$$E\{I_r(s_i) I_r(s_j)\} = \sigma_I^2\left(1 - \frac{\|s_i - s_j\|_2^2}{K}\right),$$

$$E\{v_x(s_i) v_x(s_j)\} = E\{v_y(s_i) v_y(s_j)\} = \sigma_m^2\, \rho_m^{\|s_i - s_j\|_1}, \qquad \text{and} \qquad E\{v(s_i)\} = E\{v(s_j)\},$$

where $I_r(s)$ represents the intensity value of pixel $s = (x(s), y(s))$ in the reference frame; $v(s) = (v_x(s), v_y(s))$ denotes the motion of $s$, and $\{\sigma_I^2, \sigma_m^2\}$ and $\{K, \rho_m\}$ are their (respective) variances and correlation coefficients. In [13], these models are further extended to address the motion sampling efficiency of MCP. In the temporal dimension, it is further assumed that $I_k(s) = I_{k-1}(s + v(s))$, where $I_k(s)$ represents the intensity value of pixel $s$ in the current frame; moreover, a block motion vector $v_c$ is approximated as the motion at the block center, i.e., $v_c \approx v(s_c)$, and in that regard block-based motion estimation is seen as a motion sampler.

Figure 3.4: Single-layer residual signal generating for a 16x8 predicted block

The vectors $f_k^T$ and $f_{k-1}^T$ are the transformed intensity vectors of the current block and the reference block; however, Tao's model [11] approximates the intensity fields in the spatial domain. To derive the transform-domain autocovariance matrices $R_f^T(0)$ and $R_f^T(1)$, we first write the current intensity block in matrix form $F_k$, each element of which corresponds to one pixel intensity of the block. To work with covariance matrices, we need the current intensity vector $f_k = \mathrm{vec}(F_k)$ in column-major order:

$$F_k = \begin{bmatrix} I_k(s_1) & I_k(s_{M+1}) & \cdots & I_k(s_{M^2-M+1}) \\ I_k(s_2) & I_k(s_{M+2}) & \cdots & I_k(s_{M^2-M+2}) \\ \vdots & \vdots & \ddots & \vdots \\ I_k(s_M) & I_k(s_{2M}) & \cdots & I_k(s_{M^2}) \end{bmatrix} \longrightarrow f_k = \begin{bmatrix} I_k(s_1) & I_k(s_2) & \cdots & I_k(s_{M^2}) \end{bmatrix}^t$$

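The transformed block $F_k^T$ in matrix form is the 2-D transformation of $F_k$ by the $M \times M$ DCT matrix $H$, from which we obtain the transformed vector in column-major order, $f_k^T = \mathrm{vec}(F_k^T) = (H \otimes H) f_k$. The transform-domain covariance matrices then become

$$R_f^T(0) = E\left\{f_k^T (f_k^T)^t\right\} = E\left\{(H\otimes H) f_k \left((H\otimes H) f_k\right)^t\right\} = (H\otimes H)\, E\left\{f_k f_k^t\right\}(H\otimes H)^t,$$

$$R_f^T(1) = E\left\{f_{k-1}^T (f_k^T)^t\right\} = E\left\{(H\otimes H) f_{k-1} \left((H\otimes H) f_k\right)^t\right\} = (H\otimes H)\, E\left\{f_{k-1} f_k^t\right\}(H\otimes H)^t.$$

We define the spatial-domain covariance matrices $E\{f_k f_k^t\}$ and $E\{f_{k-1} f_k^t\}$ to be $R_f(0)$ and $R_f(1)$. The $(i,j)$th element of $R_f(0)$, $[E\{f_k f_k^t\}]_{ij}$, is then computed with the help of Tao's model [11]:

$$\begin{aligned} E\{I_k(s_i) I_k(s_j)\} &= E\{I_{k-1}(s_i + v(s_i))\, I_{k-1}(s_j + v(s_j))\} \\ &= E\left\{\sigma_I^2\left(1 - \frac{\|(s_i - s_j) + (v(s_i) - v(s_j))\|_2^2}{K}\right)\right\} \\ &= \sigma_I^2\left(1 - \frac{\|s_i - s_j\|_2^2}{K} - \frac{4\sigma_m^2\left(1 - \rho_m^{\|s_i - s_j\|_1}\right)}{K}\right). \end{aligned} \tag{3.13}$$

The $(i,j)$th element of $R_f(1)$, $[E\{f_{k-1} f_k^t\}]_{ij}$, is

$$\begin{aligned} E\{I_{k-1}(s_i + v_c)\, I_k(s_j)\} &\approx E\{I_{k-1}(s_i + v(s_c))\, I_{k-1}(s_j + v(s_j))\} \\ &= \sigma_I^2\left(1 - \frac{\|s_i - s_j\|_2^2}{K} - \frac{4\sigma_m^2\left(1 - \rho_m^{\|s_c - s_j\|_1}\right)}{K}\right). \end{aligned} \tag{3.14}$$

Finally, we estimate the parameters $\sigma_I^2$, $K$, $\sigma_m^2$, and $\rho_m$, which account for the sequence characteristics, and the distortion model is obtained by substituting the residual variance (3.12) into the Laplace distortion function (2.2). The algorithm is summarized in Section 3.4, and the estimation of the parameters is described with the experiments in Chapter 5.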
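The chain from (3.13) and (3.14) to the per-coefficient variances of (3.12) can be sketched as follows. This is an illustration under stated assumptions, not the thesis code: $H$ is taken to be the orthonormal DCT-II matrix, pixel coordinates are measured inside the transform block in column-major order, the block center $s_c$ is supplied by the caller, and the parameter values in the usage note are hypothetical.

```python
import math

def dct_matrix(m):
    """Orthonormal DCT-II matrix of size m x m (an assumed choice for H)."""
    rows = []
    for u in range(m):
        scale = math.sqrt((1.0 if u == 0 else 2.0) / m)
        rows.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * m))
                     for x in range(m)])
    return rows

def l1(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def l2_squared(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def residual_variances(m, center, var_i, k_param, var_m, rho_m):
    """Per-coefficient residual variances 2 r(0; i) - 2 r(1; i), as in (3.12)."""
    h = dct_matrix(m)
    # Pixel positions in column-major order: index i maps to (column, row).
    pts = [(i // m, i % m) for i in range(m * m)]

    def rf0(i, j):  # element (i, j) of R_f(0), as in (3.13)
        return var_i * (1.0 - l2_squared(pts[i], pts[j]) / k_param
                        - 4.0 * var_m * (1.0 - rho_m ** l1(pts[i], pts[j])) / k_param)

    def rf1(i, j):  # element (i, j) of R_f(1), as in (3.14)
        return var_i * (1.0 - l2_squared(pts[i], pts[j]) / k_param
                        - 4.0 * var_m * (1.0 - rho_m ** l1(center, pts[j])) / k_param)

    n = m * m
    variances = []
    for u in range(m):
        for v in range(m):
            # Row of (H kron H) that produces coefficient (u, v).
            w = [h[u][x] * h[v][y] for (x, y) in pts]
            r0 = sum(w[i] * w[j] * rf0(i, j) for i in range(n) for j in range(n))
            r1 = sum(w[i] * w[j] * rf1(i, j) for i in range(n) for j in range(n))
            variances.append(2.0 * r0 - 2.0 * r1)
    return variances
```

A call such as `residual_variances(4, (7.5, 7.5), 2000.0, 500.0, 1.0, 0.95)` returns the 16 variances of a 4x4 transform block whose prediction block is centered at the hypothetical position (7.5, 7.5). Only the diagonal of each transformed covariance matrix is computed, since (3.12) needs nothing else.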


3.3 Rate Model for Single Layer

The entropy $H$ in (2.3) of the quantized transformed residuals is actually far from the true coding rate, because it measures the rate of independent coding. In hybrid coding, quantized transform residuals are always coded dependently at the block level, for example by run-length coding, and the inaccuracy caused by dependent coding is extremely difficult to compensate. The authors of [9] noticed a stable relationship between the actual coded rate $R$ and the entropy $H$ and modified the rate model with correction factors. We, however, observe a steadier and more significant linear relationship between the natural logarithm of the actual rate, $\ln(R)$, and $\ln\left[\left(\frac{1}{q}\right)^{1+\frac{\sqrt{2}}{\sigma}} H^{\frac{\sqrt{2}}{\sigma}} \exp\left(-\frac{\sqrt{2}}{\sigma}\right)\right]$. We therefore propose a new and more accurate rate model:

$$\ln R \approx a \ln\left[\left(\frac{1}{q}\right)^{1+\frac{\sqrt{2}}{\sigma}} H^{\frac{\sqrt{2}}{\sigma}} \exp\left(-\frac{\sqrt{2}}{\sigma}\right)\right] + b,$$

$$R \approx q^{-\left(1+\frac{\sqrt{2}}{\sigma}\right)a}\, H^{\frac{\sqrt{2}a}{\sigma}} \exp\left(b - \frac{\sqrt{2}a}{\sigma}\right), \tag{3.15}$$

where $a$ and $b$ are both constants at the prediction-mode level, and $\sigma^2$ is the residual variance estimated by (3.12).
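The two forms of (3.15) are algebraically equivalent, which the following sketch checks numerically. The function names are hypothetical, and the constants $a$ and $b$ in the usage note are placeholder values rather than fitted results.

```python
import math

def log_rate(q, sigma, entropy, a, b):
    """ln R from the logarithmic form of (3.15)."""
    s = math.sqrt(2.0) / sigma
    inner = (1.0 / q) ** (1.0 + s) * entropy ** s * math.exp(-s)
    return a * math.log(inner) + b

def modified_rate(q, sigma, entropy, a, b):
    """R from the exponential form of (3.15)."""
    s = math.sqrt(2.0) / sigma
    return q ** (-(1.0 + s) * a) * entropy ** (s * a) * math.exp(b - s * a)
```

For example, with `q = 10.0`, `sigma = 5.0`, `entropy = 1.2`, `a = 0.8`, and `b = 0.3`, `math.exp(log_rate(...))` and `modified_rate(...)` agree up to rounding.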

3.4 Rate and Distortion Summary for Single Layer

In this section, we summarize the single-layer modeling algorithm based on the rate and distortion models proposed above. Some external parameters need to be provided as inputs.

Input:

• Variance of the motion field, $\sigma_m^2$

• Correlation coefficient of the motion field, $\rho_m$

• Variance of the intensity field, $\sigma_I^2$

• Positive number, $K$

• Quantization step size, $q$

• Block partition mode, or prediction block center, $s_c$

Output:

• Coding bit rate, $R(q)$

• Quantization distortion, $D(q)$

1. Residual variance of the coefficients of an $M \times M$ transform block:

1.1 Compute the spatial-domain parameters: $R_f(0)$ by (3.13), $R_f(1)$ by (3.14).

1.2 Transform the covariance matrices: $R_f^T(0) = (H\otimes H) R_f(0) (H\otimes H)^t$, $R_f^T(1) = (H\otimes H) R_f(1) (H\otimes H)^t$.

1.3 Compute the single-layer variance of the $i$th coefficient: $\sigma_i^2$ by (3.12).

2. R-D for an $M \times M$ transform block:

2.1 Compute the quantization distortion of the $i$th coefficient: $D_i(q)$ by substituting $\sigma_i^2$ into (2.2).

2.2 Compute the entropy of the $i$th coefficient: $H_i(q)$ by (2.3).

2.3 Output:

Average variance per pixel, $\sigma^2 = \frac{1}{M^2}\sum_{i=1}^{M^2}\sigma_i^2$.

Average entropy per pixel, $H(q) = \frac{1}{M^2}\sum_{i=1}^{M^2} H_i(q)$.

Quantization distortion, $D(q) = \frac{1}{M^2}\sum_{i=1}^{M^2} D_i(q)$.
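Step 2 of the algorithm can be sketched as below, taking the per-coefficient variances from step 1 as given. This is a sketch under assumptions: the dead-zone parameter is fixed to $\alpha = q/3$, the entropy is measured in bits, and the rate is obtained by evaluating (3.15) with the averaged variance and entropy, which is one possible reading of the summary. The constants `a` and `b` and all function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def distortion(var, q, alpha):
    """Quantization distortion (2.2) of one coefficient with variance var."""
    sigma = math.sqrt(var)
    num = (2.0 * alpha + SQRT2 * sigma) * math.exp(-(q + 2.0 * alpha) / (SQRT2 * sigma))
    return var - q * num / (1.0 - math.exp(-SQRT2 * q / sigma))

def entropy(var, q, alpha):
    """Entropy (2.3) of one coefficient in bits (log base 2 is assumed)."""
    sigma = math.sqrt(var)
    p = math.exp(-SQRT2 * q / sigma)
    if p == 0.0:                 # numerically, every sample is in the zero bin
        return 0.0
    e = math.exp(-SQRT2 * alpha / sigma)
    p0 = 1.0 - e * math.sqrt(p)
    c = 0.5 * e * (1.0 - p) / math.sqrt(p)
    h = -(1.0 - p0) * math.log2(c) - 2.0 * c * p * math.log2(p) / (1.0 - p) ** 2
    if p0 > 0.0:
        h -= p0 * math.log2(p0)
    return h

def block_rd(variances, q, a, b):
    """Averages of step 2.3 plus the rate from (3.15) for one transform block."""
    if any(v <= 0.0 for v in variances):
        raise ValueError("all coefficient variances must be positive")
    alpha = q / 3.0
    count = len(variances)
    avg_var = sum(variances) / count
    avg_h = sum(entropy(v, q, alpha) for v in variances) / count
    avg_d = sum(distortion(v, q, alpha) for v in variances) / count
    s = SQRT2 / math.sqrt(avg_var)
    rate = q ** (-(1.0 + s) * a) * avg_h ** (s * a) * math.exp(b - s * a)
    return {"rate": rate, "distortion": avg_d, "entropy": avg_h, "variance": avg_var}
```

Sweeping `q` with fixed variances traces model R-Q and D-Q curves for one partition mode; for example, `block_rd([120.0, 60.0, 30.0, 15.0], 8.0, 1.0, 0.0)` evaluates a single point.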


CHAPTER 4 Rate Distortion Model: Multiple Layers

In the preceding single-layer derivation, which uses the forward channel model to build a model of the single-layer hybrid coder, we obtained the D-Q and R-Q functions from an estimated residual variance function. In this chapter, we present the rate distortion model of the enhancement layer in coarse-grain scalable video coding, which can use inter-layer residual prediction to achieve better prediction efficiency. For simplicity, and without loss of generality, a two-layer scenario is studied in this thesis.

4.1 Distortion Model for Multiple Layers

The framework of a two-layer CGS coder based on the forward channel model is depicted in Fig. 4.1.

We denote the transform-domain base-layer signal with a superscript B and the transform-domain enhancement-layer signal with a superscript E. For example, $e_k^B$ and $e_k^E$ represent the transform-domain prediction residual vectors for the base layer and the enhancement layer, respectively, of an $M \times M$ transform block of the current $k$th original frame. Note that in CGS, the base-layer input vector $f_k^B$ and the enhancement-layer input vector $f_k^E$ are vectorizations of co-located $M \times M$ transform blocks of the current $k$th original frame, which are identical because the base and enhancement layers have the same resolution.

Figure 4.1: 3-D model of SVC hybrid coder with inter-layer residual prediction.

Fig. 4.1 illustrates the framework of a two-layer scalable coder with inter-layer residual prediction. The dashed lines in Fig. 4.1 indicate the inter-layer prediction propagation path of the reconstructed base-layer residual $\hat{e}_k^B$. In that way, the enhancement-layer prediction residual $e_k^E$ to be coded exploits the spatial redundancies by subtracting the base-layer coded residual $\hat{e}_k^B$. The enhancement-layer transformed residual vector is

$$e_k^E = f_k^E - \hat{f}_{k-1}^E - \hat{e}_k^B. \tag{4.1}$$

The reconstruction $\hat{e}_k^E$ of the inter-layer prediction error $e_k^E$ is then made through the forward channel model:

$$\hat{e}_k^E = B_k^E e_k^E + n_k^E, \tag{4.2}$$

or it can also be written as:

$$\hat{e}_k^E = \hat{f}_k^E - \hat{f}_{k-1}^E - \hat{e}_k^B. \tag{4.3}$$

Substituting (4.1) and (4.3) into (4.2) gives the transform-domain representation of the enhancement-layer coded frame with inter-layer residual prediction based on one base layer:

$$\hat{f}_k^E = B_k^E f_k^E + \left(I - B_k^E\right)\hat{f}_{k-1}^E + \left(I - B_k^E\right)\hat{e}_k^B + n_k^E. \tag{4.4}$$

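Adopting the same alignment assumption as in the single-layer derivation and the temporal-stationary assumption, which assumes $B_k^E = B^E$ for every $k$, we substitute (4.4) into (4.1) recursively and obtain the closed form

$$\begin{aligned}
e_k^E ={}& f_k^E - \sum_{n=1}^{k-1}\Big[\prod_{i=0}^{n-1}\left(I - B_{k-1-i}^E\right)\Big] B_{k-1-n}^E f_{k-1-n}^E - \sum_{n=1}^{k-1}\Big[\prod_{i=0}^{n-1}\left(I - B_{k-1-i}^E\right)\Big] n_{k-1-n}^E \\
& - B_{k-1}^E f_{k-1}^E - n_{k-1}^E - \Big[\prod_{i=0}^{k-1}\left(I - B_{k-1-i}^E\right)\Big]\hat{f}_{-1}^E - \sum_{n=0}^{k-1}\Big[\prod_{i=0}^{n}\left(I - B_{k-1-i}^E\right)\Big]\hat{e}_{k-1-n}^B - \hat{e}_k^B \\
={}& \underbrace{f_k^E - \sum_{n=0}^{k-1} A_n^E f_{k-1-n}^E - \sum_{n=0}^{k-1} C_n^E n_{k-1-n}^E}_{\omega_k^E} - \underbrace{\sum_{m=0}^{k-1} C_m^E \hat{e}_{k-m}^B}_{\omega_k^B},
\end{aligned} \tag{4.5}$$

where $A_n^E = \left(I - B^E\right)^n B^E$, $C_n^E = \left(I - B^E\right)^n$, and $f_{k-n}^E$, $1 \le n \le k$, are reference blocks in the EL motion trajectory starting from $f_k^E$, with $\hat{e}_{k-n}^B$, $1 \le n \le k$, denoting their co-located BL coded residual blocks defined as:

$$\hat{e}_{k-m}^B = B^B e_{k-m}^B + n_{k-m}^B.$$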

To explain the reconstructed block in the first intra-coded frame ($k = 0$), we define $\hat{f}_{-1}^E = 0$ and $\hat{e}_0^B = 0$. The definition $\hat{e}_0^B = 0$ indicates that there is no inter-layer residual prediction in intra coding. Then $\hat{f}_0^E$ is obtained by:

$$\hat{f}_0^E = B_0^E f_0^E + \left(I - B_0^E\right)\hat{f}_{-1}^E + \left(I - B_0^E\right)\hat{e}_0^B + n_0^E = B_0^E f_0^E + n_0^E.$$
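The step from the recursion (4.4) to the closed form (4.5) can be checked numerically. The sketch below uses a scalar stand-in for the vectors and matrices (one coefficient, a constant gain `beta` in place of $B^E$), with arbitrary test sequences; the function names are hypothetical and the code is only a consistency check, not part of the thesis.

```python
def el_residual_recursive(f, e_b_hat, noise, beta):
    """EL residuals e_k^E from (4.1), iterating the scalar form of (4.4).

    Assumes f_hat_{-1} = 0 and e_b_hat[0] = 0 (intra-coded first frame).
    """
    f_hat_prev = 0.0
    residuals = []
    for k in range(len(f)):
        residuals.append(f[k] - f_hat_prev - e_b_hat[k])          # (4.1)
        f_hat_prev = (beta * f[k] + (1.0 - beta) * f_hat_prev
                      + (1.0 - beta) * e_b_hat[k] + noise[k])     # (4.4)
    return residuals


def el_residual_closed_form(f, e_b_hat, noise, beta, k):
    """Scalar form of the last line of (4.5): omega_k^E - omega_k^B."""
    def a(n):
        return (1.0 - beta) ** n * beta   # A_n^E

    def c(n):
        return (1.0 - beta) ** n          # C_n^E

    omega_e = (f[k]
               - sum(a(n) * f[k - 1 - n] for n in range(k))
               - sum(c(n) * noise[k - 1 - n] for n in range(k)))
    omega_b = sum(c(m) * e_b_hat[k - m] for m in range(k))
    return omega_e - omega_b
```

For any input sequences with `e_b_hat[0] == 0`, both functions should return the same residual for every frame index, which mirrors the algebra behind (4.5).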

The original residual signal of the $k$th frame in a single layer, $e_k^T$, is modeled as (3.7); each $M \times M$ transform block $f_{k-n}^T$, $1 \le n \le k$, lies along an independent single-layer motion trajectory predicted starting from $f_k^T$. In inter-layer prediction, however, $e_{k-m}^B$ is the BL original residual vector of the block that corresponds to the EL block $f_{k-m}^E$ on the EL motion trajectory starting from $f_k^E$. Therefore, we define $e_{k-m}^B$ as:

$$e_{k-m}^B = f_{k-m}^E - \sum_{p=0}^{k-m-1} A_p^B f_{k-m,k-m-1-p}^B - \sum_{p=0}^{k-m-1} C_p^B n_{k-m,k-m-1-p}^B,$$

where $A_p^B = \left(I - B^B\right)^p B^B$, $C_p^B = \left(I - B^B\right)^p$, and $f_{k-m,k-m-1-p}^B$, $0 \le p \le k-m-1$, are reference blocks in the BL motion trajectory starting from $f_{k-m}^E$, as illustrated in Fig. 4.2. In other words, the starting block of the BL prediction trajectory is the co-located block $f_{k-m}^E$ in the EL. The vectors $n_{k-m,k-m-1-p}^B$, $0 \le p \le k-m-1$, are the corresponding Gaussian noise vectors. Hence, $\hat{e}_{k-m}^B$ can be rewritten as:

$$\hat{e}_{k-m}^B = B^B\left(f_{k-m}^E - \sum_{p=0}^{k-m-1} A_p^B f_{k-m,k-m-1-p}^B - \sum_{p=0}^{k-m-1} C_p^B n_{k-m,k-m-1-p}^B\right) + n_{k-m}^B. \tag{4.6}$$

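For tractability, we separate the right-hand side of (4.5) into two terms, denoted $\omega_k^E$ and $\omega_k^B$. According to (4.6), the last term in (4.5) is obtained by:

$$\omega_k^B = \sum_{m=0}^{k-1} C_m^E \hat{e}_{k-m}^B = \sum_{m=0}^{k-1} C_m^E\left[B^B\left(f_{k-m}^E - \sum_{p=0}^{k-m-1} A_p^B f_{k-m,k-m-1-p}^B - \sum_{p=0}^{k-m-1} C_p^B n_{k-m,k-m-1-p}^B\right) + n_{k-m}^B\right]. \tag{4.7}$$

Since $e_k^E$ is a zero-mean random vector, the transform-domain EL residual covariance is:

$$E\left\{e_k^E \left(e_k^E\right)^t\right\} = E\left\{\omega_k^E \left(\omega_k^E\right)^t\right\} - E\left\{\omega_k^E \left(\omega_k^B\right)^t\right\} - E\left\{\omega_k^E \left(\omega_k^B\right)^t\right\}^t + E\left\{\omega_k^B \left(\omega_k^B\right)^t\right\}, \tag{4.8}$$

and we can observe that the first expectation term is arithmetically identical to the single-layer covariance matrix of $e_k^T$ derived in (3.9):

$$E\left\{\omega_k^E \left(\omega_k^E\right)^t\right\} = R_f^E(0) - \sum_{n=0}^{k-1} A_n^E R_f^E(n+1) - \left(\sum_{n=0}^{k-1} A_n^E R_f^E(n+1)\right)^t + \sum_{n=0}^{k-1}\sum_{m=0}^{k-1} A_n^E R_f^E(n-m)\left(A_m^E\right)^t + \sum_{n=0}^{k-1} C_n^E R_N^E(0)\left(C_n^E\right)^t. \tag{4.9}$$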


Figure 4.2: EL and BL motion trajectories starting from $f_{k-m}^E$ for a 16x16 predicted block.

In (4.9), $R_f^E(n-m) = E\left\{f_{k-n}^E \left(f_{k-m}^E\right)^t\right\}$ for $k \ge n, m \ge 0$, and

$$R_N^E(n-m) = E\left\{n_{k-n}^E \left(n_{k-m}^E\right)^t\right\} = \begin{cases} R_N^E(0) & \text{for } n = m, \\ O & \text{otherwise,} \end{cases} \qquad k \ge n, m \ge 0.$$

The autocovariances $R_f^E(n-m)$ and $R_N^E(n-m)$ depend only on the frame interval $n - m$, according to the assumption of a wide-sense stationary process as in the single-layer case. Therefore, we focus on the cross expectation term

$$E\left\{\omega_k^E \left(\omega_k^B\right)^t\right\} = E\left\{\left(f_k^E - \sum_{n=0}^{k-1} A_n^E f_{k-1-n}^E - \sum_{n=0}^{k-1} C_n^E n_{k-1-n}^E\right)\left(\sum_{n=0}^{k-1} C_n^E \hat{e}_{k-n}^B\right)^t\right\}. \tag{4.10}$$

Since intensity vectors and noise vectors are statistically independent, by substituting (4.7), we expand (4.10) into:

$$E\left\{\omega_k^E \left(\omega_k^B\right)^t\right\} = E\left\{\Phi_k^f\right\} + E\left\{\Phi_k^N\right\}, \tag{4.11}$$

with

$$\Phi_k^f = \left(f_k^E - \sum_{n=0}^{k-1} A_n^E f_{k-1-n}^E\right)\left(\sum_{n=0}^{k-1} C_n^E B^B f_{k-n}^E - \sum_{n=0}^{k-1}\sum_{p=0}^{k-n-1} C_n^E B^B A_p^B f_{k-n,k-n-1-p}^B\right)^t,$$

$$\Phi_k^N = \left(\sum_{n=0}^{k-1} C_n^E n_{k-1-n}^E\right)\left(\sum_{n=0}^{k-1}\sum_{p=0}^{k-n-1} C_n^E B^B C_p^B n_{k-n,k-n-1-p}^B - \sum_{n=0}^{k-1} C_n^E n_{k-n}^B\right)^t.$$

Note that $n_k^B$ and $n_k^E$ are memoryless additive Gaussian noise vectors, which can be deemed uncorrelated with the input sequence and independent of any other signals. We therefore conclude that $E\left\{n_{k-n}^E \left(n_{k-m}^B\right)^t\right\} \approx O$ for $k \ge n, m \ge 0$.

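The expected value of $\Phi_k^N$ can be easily worked out:

$$E\left\{\Phi_k^N\right\} = \sum_{n=0}^{k-1}\sum_{m=0}^{k-1}\sum_{p=0}^{k-m-1} C_n^E\, E\left\{n_{k-1-n}^E \left(n_{k-m,k-m-1-p}^B\right)^t\right\}\left(C_m^E B^B C_p^B\right)^t - \sum_{n=0}^{k-1}\sum_{m=0}^{k-1} C_n^E\, E\left\{n_{k-1-n}^E \left(n_{k-m}^B\right)^t\right\}\left(C_m^E\right)^t \approx O.$$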


The cross expectation term (4.11) is then computed as follows:

$$\begin{aligned}
E\left\{\omega_k^E \left(\omega_k^B\right)^t\right\} ={}& E\left\{\Phi_k^f\right\} + O \\
={}& \sum_{m=0}^{k-1} R_f^E(m)\left(C_m^E B^B\right)^t - \sum_{m=0}^{k-1}\sum_{p=0}^{k-m-1} R_f^{BE}(m+1, p)\left(C_m^E B^B A_p^B\right)^t \\
& - \sum_{n=0}^{k-1}\sum_{m=0}^{k-1} A_n^E R_f^E(m-n-1)\left(C_m^E B^B\right)^t + \sum_{n=0}^{k-1}\sum_{m=0}^{k-1}\sum_{p=0}^{k-m-1} A_n^E R_f^{BE}(m-n, p)\left(C_m^E B^B A_p^B\right)^t,
\end{aligned} \tag{4.12}$$

where $R_f^{BE}(m-n, p) = E\left\{f_{k-n}^E \left(f_{k-m,k-m-p}^B\right)^t\right\}$ for $k \ge m, n, p \ge 0$. The matrix $R_f^{BE}(m-n, p)$ is the covariance between the EL block $f_{k-n}^E$ and the BL reference block $f_{k-m,k-m-p}^B$ in the BL motion trajectory starting from $f_{k-m}^E$, where the argument $m - n$ signifies the EL motion trajectory distance between $f_{k-m}^E$ and $f_{k-n}^E$.

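The last expected value of (4.8) is:

$$E\left\{\omega_k^B \left(\omega_k^B\right)^t\right\} = E\left\{\left(\sum_{n=0}^{k-1} C_n^E \hat{e}_{k-n}^B\right)\left(\sum_{n=0}^{k-1} C_n^E \hat{e}_{k-n}^B\right)^t\right\} = \sum_{n=0}^{k-1}\sum_{m=0}^{k-1} C_n^E\, E\left\{\hat{e}_{k-n}^B \left(\hat{e}_{k-m}^B\right)^t\right\}\left(C_m^E\right)^t. \tag{4.13}$$

Here we assume that the coded BL residuals of different frames are uncorrelated, that is,

$$E\left\{\hat{e}_{k-n}^B \left(\hat{e}_{k-m}^B\right)^t\right\} = O \quad \text{for } m \ne n.$$

Therefore, (4.13) becomes:

$$E\left\{\omega_k^B \left(\omega_k^B\right)^t\right\} \approx \sum_{n=0}^{k-1} C_n^E\, E\left\{\hat{e}_{k-n}^B \left(\hat{e}_{k-n}^B\right)^t\right\}\left(C_n^E\right)^t = \sum_{n=0}^{k-1} C_n^E B^B\, E\left\{e_{k-n}^B \left(e_{k-n}^B\right)^t\right\}\left(C_n^E B^B\right)^t + \sum_{n=0}^{k-1} C_n^E R_N^B(0)\left(C_n^E\right)^t, \tag{4.14}$$

where $E\left\{e_{k-n}^B \left(n_{k-m}^B\right)^t\right\} = O$ for $k \ge n, m \ge 0$ and

$$R_N^B(n-m) = E\left\{n_{k-n}^B \left(n_{k-m}^B\right)^t\right\} = \begin{cases} R_N^B(0) & \text{for } n = m, \\ O & \text{otherwise,} \end{cases} \qquad k \ge n, m \ge 0.$$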

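We substitute (4.9), (4.12), and (4.14) into (4.8) and consequently obtain the covariance matrix $E\left\{e_k^E \left(e_k^E\right)^t\right\}$ of the transformed EL residual. The EL variance $\sigma_{E,k}^2(i)$ of the $i$th coefficient is extracted from the $i$th diagonal element of $E\left\{e_k^E \left(e_k^E\right)^t\right\}$. The diagonal elements of $E\left\{\omega_k^E \left(\omega_k^B\right)^t\right\}$ equal the diagonal elements of its transpose, that is, $\left[E\left\{\omega_k^E \left(\omega_k^B\right)^t\right\}\right]_{ii} = \left[E\left\{\omega_k^E \left(\omega_k^B\right)^t\right\}^t\right]_{ii}$. The residual variance $\sigma_E^2(i)$ is then obtained by letting $k$ go to infinity:

$$\sigma_E^2(i) = \lim_{k\to\infty}\sigma_{E,k}^2(i) = \lim_{k\to\infty}\left[E\left\{e_k^E \left(e_k^E\right)^t\right\}\right]_{ii} = \lim_{k\to\infty}\left[E\left\{\omega_k^E \left(\omega_k^E\right)^t\right\}\right]_{ii} - 2\lim_{k\to\infty}\left[E\left\{\omega_k^E \left(\omega_k^B\right)^t\right\}\right]_{ii} + \lim_{k\to\infty}\left[E\left\{\omega_k^B \left(\omega_k^B\right)^t\right\}\right]_{ii},$$

and we adopt the following assumptions:

$$r_f^B(n; i) = \begin{cases} r_f^B(0; i) & \text{for } n = 0, \\ \left(\alpha_i^B\right)^{|n|-1} r_f^B(1; i) & \text{otherwise,} \end{cases} \qquad 1 \ge \alpha_i^B \ge 0, \tag{4.15}$$

$$r_f^E(n; i) = \begin{cases} r_f^E(0; i) & \text{for } n = 0, \\ \left(\alpha_i^E\right)^{|n|-1} r_f^E(1; i) & \text{otherwise,} \end{cases} \qquad 1 \ge \alpha_i^E \ge 0, \tag{4.16}$$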


$$r_f^{BE}(m-n, p; i) = \begin{cases} r_f^{BE}(-1, 1; i) & \text{for } m - n = -1,\ p = 1, \\ \left(\alpha_i^B\right)^{|p|} r_f^B(1; i) & \text{for } m - n = 0, \\ \left(\alpha_i^B\right)^{|p|}\left(\alpha_i^E\right)^{|m-n|-1} r_f^E(1; i) & \text{for } m - n > 0, \\ 0 & \text{otherwise,} \end{cases} \qquad 1 \ge \alpha_i^B, \alpha_i^E \ge 0, \tag{4.17}$$

where $r_f^B(n; i)$ and $r_f^E(n; i)$ are the $i$th diagonal elements of $R_f^B(n)$ and $R_f^E(n)$. In (4.17), $r_f^{BE}(m-n, p; i)$ is the $i$th diagonal element of $R_f^{BE}(m-n, p)$, which is the cross-covariance matrix between the EL block $f_{k-n}^E$ and the BL reference block $f_{k-m,k-m-p}^B$ in the BL motion trajectory starting from $f_{k-m}^E$. Based on the parameters $m$, $n$, and $p$, the relationship between $f_{k-n}^E$ and $f_{k-m,k-m-p}^B$ is classified into four cases. A Markov-like assumption is also used in the calculation.

1. When $m - n = -1$ and $p = 1$ (that is, $n = m + 1$, $p = 1$), the matrix $R_f^{BE}(-1, 1)$ can be evaluated with the assistance of Tao's model [11], and $r_f^{BE}(-1, 1; i)$ is the $i$th diagonal element of $R_f^{BE}(-1, 1)$. This trajectory situation is depicted in Fig. 4.3(a).

2. Since $f_{k-m}^E$ and $f_{k-m,k-m}^B$ are co-located blocks, we can conclude that $f_{k-m}^E = f_{k-m,k-m}^B$. In the case of $m - n = 0$, $R_f^{BE}(0, p) = E\left\{f_{k-m}^E \left(f_{k-m,k-m-p}^B\right)^t\right\}$ can be rewritten as $R_f^{BE}(0, p) = E\left\{f_{k-m,k-m}^B \left(f_{k-m,k-m-p}^B\right)^t\right\}$. According to the assumption of a wide-sense stationary process, $R_f^{BE}(0, p)$ is equivalent to $R_f^B(p) = E\left\{f_k^B \left(f_{k-p}^B\right)^t\right\}$, where $p$ is the frame interval between $f_{k-m,k-m-p}^B$ and $f_{k-m}^E$. This trajectory situation is illustrated in Fig. 4.3(b).

3. Fig. 4.3(c) shows the trajectory situation of $m - n > 0$. We suppose that the target reference block depends only on the last block and not on the entire past trajectory.

4. The remaining case, shown in Fig. 4.3(d), is approximated as zero.
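The correlation models (4.15) to (4.17) and the four cases above translate directly into piecewise functions. The following sketch works on one coefficient index $i$ at a time; the function and argument names are hypothetical, and the value of $r_f^{BE}(-1, 1; i)$ is taken as an input because its evaluation relies on Tao's model [11], which is not reproduced here.

```python
def r_f(n, r0, r1, alpha):
    """Temporal correlation of one coefficient, following (4.15)/(4.16).

    r0 = r_f(0; i), r1 = r_f(1; i), alpha = impact factor in [0, 1].
    """
    if n == 0:
        return r0
    return alpha ** (abs(n) - 1) * r1


def r_f_be(d, p, r_be_m1_1, r1_b, r1_e, alpha_b, alpha_e):
    """Cross-layer correlation of one coefficient, following (4.17).

    d is the EL trajectory distance m - n, p the BL trajectory offset.
    """
    if d == -1 and p == 1:          # case 1, Fig. 4.3(a)
        return r_be_m1_1
    if d == 0:                      # case 2, Fig. 4.3(b)
        return alpha_b ** abs(p) * r1_b
    if d > 0:                       # case 3, Fig. 4.3(c)
        return alpha_b ** abs(p) * alpha_e ** (abs(d) - 1) * r1_e
    return 0.0                      # case 4, Fig. 4.3(d)
```

With `alpha = 1`, `r_f` returns `r1` for every nonzero lag, which is the simplification used later when the impact factors are approximated as one.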

Figure 4.3: Four relationships between $f_{k-n}^E$ and $f_{k-m,k-m-p}^B$ for a 16x16 predicted block, panels (a) to (d).
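As a sanity check on the limit $k \to \infty$, the scalar (single-coefficient) form of (4.9) can be summed numerically under the correlation model (4.16) and compared with the value that the geometric series give in closed form, $\frac{2}{2-\beta}\left[r(0) - \frac{\beta\, r(1)}{1-\alpha(1-\beta)}\right] + \frac{r_N(0)}{1-(1-\beta)^2}$. The sketch below is an illustration under that scalar assumption, with hypothetical function names and arbitrary test values.

```python
def omega_e_variance_sum(r0, r1, r_noise, alpha, beta, k):
    """Scalar form of (4.9), summed explicitly over k past frames."""
    def r(n):
        # Correlation model (4.16) for a single coefficient.
        return r0 if n == 0 else alpha ** (abs(n) - 1) * r1

    a = [beta * (1.0 - beta) ** n for n in range(k)]   # A_n^E
    c = [(1.0 - beta) ** n for n in range(k)]          # C_n^E
    cross = sum(a[n] * r(n + 1) for n in range(k))
    double = sum(a[n] * a[m] * r(n - m)
                 for n in range(k) for m in range(k))
    noise = sum(c[n] ** 2 * r_noise for n in range(k))
    return r0 - 2.0 * cross + double + noise


def omega_e_variance_limit(r0, r1, r_noise, alpha, beta):
    """Closed form of the same quantity as k goes to infinity."""
    signal = 2.0 / (2.0 - beta) * (
        r0 - r1 * beta / (1.0 - alpha * (1.0 - beta)))
    return signal + r_noise / (1.0 - (1.0 - beta) ** 2)
```

Because the terms decay as $(1-\beta)^n$, a few hundred frames are more than enough for the explicit sum to agree with the closed form to numerical precision when $\beta$ is not close to zero.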

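After computation, we obtain

$$\begin{aligned}
\sigma_E^2(i) ={}& \frac{2}{2-\beta_i^E}\left[r_f^E(0; i) - r_f^E(1; i)\,\frac{\beta_i^E}{1-\alpha_i^E\left(1-\beta_i^E\right)}\right] + \frac{r_N^E(0; i)}{1-\left(1-\beta_i^E\right)^2} - \frac{\beta_i^B}{\beta_i^E\left(\beta_i^E-2\right)}\,\sigma_B^2(i) \\
& - \frac{2\beta_i^B}{2-\beta_i^E}\left[r_f^E(0; i) - \frac{1}{1-\alpha_i^E\left(1-\beta_i^E\right)}\left(\beta_i^E + \frac{\beta_i^B \alpha_i^B\left(1-\beta_i^E\right)}{1-\alpha_i^B\left(1-\beta_i^B\right)}\right) r_f^E(1; i)\right] \\
& + \frac{2\left(\beta_i^B\right)^2}{2-\beta_i^E}\left[\frac{1}{1-\alpha_i^B\left(1-\beta_i^B\right)}\, r_f^B(1; i) - r_f^{BE}(-1, 1; i)\right],
\end{aligned}$$

where $r_f^B(n; i)$, $r_f^E(n; i)$, $r_f^{BE}(-1, 1; i)$ and $r_N^E(0; i)$ are the $i$th diagonal elements of $R_f^B(n)$, $R_f^E(n)$, $R_f^{BE}(-1, 1)$ and $R_N^E(0)$, respectively. The parameters $\alpha_i^B$ and $\alpha_i^E$ are the impact factors of the temporal correlation between frames of the base and enhancement layers. For the same reasons as in the single-layer derivation, $\alpha_i^B$ and $\alpha_i^E$ are approximated as one for all coefficients, i.e., $\alpha_i^B = \alpha_i^E \approx 1$ for all $i$. Following the definition in the theorem of T. Berger [12], $\beta_i^E \approx 1 - \frac{D_E(i)}{\sigma_E^2(i)}$ and $r_N^E(0; i) \approx \beta_i^E D_E(i)$, so the EL residual variance of the $i$th coefficient is

$$\begin{aligned}
\sigma_E^2(i) ={}& \frac{\sigma_E^2(i)}{\sigma_E^2(i) + D_E(i)} \cdot \frac{\sigma_E^2(i)}{\sigma_E^2(i) - D_E(i)}\left(\beta_i^B \sigma_B^2(i) + 2\beta_i^B r_f^E(1; i)\right) \\
& + \frac{\sigma_E^2(i)}{\sigma_E^2(i) + D_E(i)}\left(2\left(1-\beta_i^B\right) r_f^E(0; i) - 2 r_f^E(1; i)\right) \\
& + \frac{\sigma_E^2(i)}{\sigma_E^2(i) + D_E(i)}\left(2\beta_i^B r_f^B(1; i) - 2\left(\beta_i^B\right)^2 r_f^{BE}(-1, 1; i) + D_E(i)\right),
\end{aligned}$$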

which is a function of the enhancement-layer distortion $D_E(i)$ and the base-layer information $\beta_i^B = 1 - \frac{D_B(i)}{\sigma_B^2(i)}$; we rewrite it as a quadratic equation:

$$(A - B)\, D_E(i) + \left(C - A + B + D_E(i)\right)\sigma_E^2(i) - \left(\sigma_E^2(i)\right)^2 = 0, \tag{4.18}$$

where

$$A = \left(2 r_f^E(0; i) + 2 r_f^{BE}(-1, 1; i)\,\beta_i^B - 2 r_f^B(1; i)\right)\beta_i^B, \qquad B = 2 r_f^E(0; i) - 2 r_f^E(1; i), \qquad C = \left(\sigma_B^2(i) + 2 r_f^E(1; i)\right)\beta_i^B.$$

The parameters $r_f^E(0; i)$, $r_f^E(1; i)$, $r_f^B(0; i)$, and $r_f^B(1; i)$ have been explicated in the previous chapter as (3.13) and (3.14). Now $r_f^{BE}(-1, 1; i)$, which is the $(i, i)$th element of $R_f^{BE}(-1, 1)$, is obtained by the same method as in the previous chapter. First, we compute the spatial domain covariance matrix. Then we transform the covariance matrix into the DCT domain to obtain $R_f^{BE}(-1, 1)$. Note that $f^B$ and $f^E$ are co-located $M \times M$ transform blocks of the current original frame, i.e., $f^B = f^E$; however, for different prediction modes in the BL and EL, they have different corresponding motion vectors $v_B(s_B)$ and $v_E(s_E)$, since a block motion vector is approximated as the motion at the block center.

Figure 4.4: Multi-layer residual signal generation for a BL 16x16 predicted block and an EL 16x8 predicted block.

The spatial domain covariance matrix is

$$E\left\{I_{k-1}\left(s_i + v_E(s_E)\right) I_{k-1}\left(s_j + v_B(s_B)\right)\right\} = E\left\{\sigma_I^2\left(1 - \frac{\left\|(s_i - s_j) + \left(v_E(s_E) - v_B(s_B)\right)\right\|_2^2}{K}\right)\right\} = \sigma_I^2\left(1 - \frac{\left\|s_i - s_j\right\|_2^2}{K} - \frac{4\sigma_m^2\left(1 - \rho_m^{\left\|s_E - s_B\right\|_1}\right)}{K}\right). \tag{4.19}$$
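The right-hand side of (4.19) is straightforward to evaluate for a pair of pixel positions. The sketch below is a direct transcription for illustration; the function and argument names are hypothetical, and positions are assumed to be integer (row, column) pairs.

```python
def inter_layer_intensity_covariance(s_i, s_j, s_e, s_b,
                                     var_i, var_m, rho_m, big_k):
    """Right-hand side of (4.19) for pixel positions s_i and s_j.

    s_e, s_b : block centers of the EL and BL prediction blocks
    var_i    : variance of the intensity field (sigma_I^2)
    var_m    : variance of the motion field (sigma_m^2)
    rho_m    : correlation coefficient of the motion field
    big_k    : the positive constant K of the intensity model
    """
    # Squared Euclidean distance between the two pixel positions.
    dist2 = (s_i[0] - s_j[0]) ** 2 + (s_i[1] - s_j[1]) ** 2
    # L1 distance between the two block centers.
    l1 = abs(s_e[0] - s_b[0]) + abs(s_e[1] - s_b[1])
    motion = 4.0 * var_m * (1.0 - rho_m ** l1)
    return var_i * (1.0 - dist2 / big_k - motion / big_k)
```

When the BL and EL use the same partition mode, the block centers coincide, the motion term vanishes, and the expression reduces to the intensity correlation between the two pixel positions alone.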

4.2 Approximate Distortion Solution for Multiple Layers

A problem that arises in solving the EL variance equation (4.18) is that it contains an unknown parameter, the EL distortion $D_E(i)$. In this thesis, we assume that the residual coefficients of inter-layer residual prediction obey a Laplace distribution. Therefore, the distortion and entropy functions are the same as in the single-layer case, and we can obtain the EL variance by solving the simultaneous equations consisting of the EL variance function (4.18) and the Laplace distortion function (2.2). These equations are very hard to solve because the Laplace distortion (2.2) is a nonlinear function. The Laplace distortion for five different QP values is plotted in Fig. 4.5; the curves for other QP values show the same tendency. We can observe that, for each QP, there is a distortion upper bound that is quickly reached as $\sigma^2$ grows.

Figure 4.5: Laplace distortion and its approximation.

The bounding value of the distortion for a given QP is

$$D_{upper} = \frac{q^2}{12} + \alpha^2,$$

where $q$ is the corresponding quantization step size. In the case of a very small error variance $\sigma^2$, however, the distortion differs greatly from its upper bound $D_{upper}$. We use parabolas through the origin to approximate the distortion curves in the region of small $\sigma^2$:

$$D_{para} = m\left(\sigma^2\right)^2 + n\,\sigma^2 = m\left(\sigma^2 + \frac{n}{2m}\right)^2 - \frac{n^2}{4m},$$

where $m$ and $n$ depend on the vertex $(a, b)$ of the parabola. The vertex $(a, b)$ is chosen as the point at which the Laplace distortion is 0.7 times the distortion upper bound $D_{upper}$, as sketched in Fig. 4.5. Thus, $m$ and $n$ are given as:

$$n = \frac{2b}{a}, \qquad m = -\frac{b}{a^2}.$$
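The relation between the vertex $(a, b)$ and the coefficients $m$ and $n$ can be verified with a short sketch. It treats the parabola as a function of the abscissa of Fig. 4.5 (the variance), which is the reading under which substituting $D_{para}$ into (4.18) yields the quadratic (4.21); the function names are hypothetical.

```python
def d_upper(q, alpha):
    """Upper bound of the distortion for quantization step q."""
    return q * q / 12.0 + alpha * alpha


def parabola_coefficients(a, b):
    """Coefficients (m, n) of a parabola through the origin with vertex (a, b)."""
    n = 2.0 * b / a
    m = -b / (a * a)
    return m, n


def d_para(x, m, n):
    """Parabolic approximation of the distortion at variance x."""
    return m * x * x + n * x
```

For a vertex at `(4.0, 10.0)` this gives `m = -0.625` and `n = 5.0`; the parabola passes through the origin, reaches 10.0 at `x = 4.0`, and lies below that value on either side.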


The approximation curves are illustrated in Fig. 4.5.

Instead of nonlinearly solving the simultaneous equations (4.18) and (2.2), we substitute $D_{upper}$ or $D_{para}$ into (4.18) to solve for the EL residual variance. A table of the change points $\delta(q)$ for each QP is built beforehand. If the solution for the residual variance $\sigma_E^2(i)$ obtained by substituting $D_{upper}$ into (4.18), which is

$$(A - B)\left(\frac{q^2}{12} + \alpha^2\right) + \left(C - A + B + \frac{q^2}{12} + \alpha^2\right)\sigma_E^2(i) - \left(\sigma_E^2(i)\right)^2 = 0, \tag{4.20}$$

is smaller than the $\delta(q)$ looked up from the change-point table, i.e., $\sigma_E^2(i) < \delta(q)$, we re-solve it by substituting $D_{para}$:

$$C - (A - B) + n\,(A - B) + \left(n + m\,(A - B) - 1\right)\sigma_E^2(i) + m\left(\sigma_E^2(i)\right)^2 = 0, \tag{4.21}$$

where

$$A = \left(2 r_f^E(0; i) + 2 r_f^{BE}(-1, 1; i)\,\beta_i^B - 2 r_f^B(1; i)\right)\beta_i^B, \qquad B = 2 r_f^E(0; i) - 2 r_f^E(1; i), \qquad C = \left(\sigma_B^2(i) + 2 r_f^E(1; i)\right)\beta_i^B.$$

The final residual variance $\sigma_E^2(i)$ is then substituted into the Laplace distortion function (2.2) to obtain the enhancement-layer distortion.
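The two-stage solution of (4.20) and (4.21) can be sketched as below. The text does not state which root of each quadratic is taken, so the sketch exposes the choice through a `pick` argument and defaults to the largest real root; that default is an assumption of this illustration, as are all function names. The coefficient `m` of (4.21) is assumed to be nonzero.

```python
import math


def real_roots(c2, c1, c0):
    """Real roots of c2*x^2 + c1*x + c0 = 0 (c2 != 0), sorted ascending."""
    disc = c1 * c1 - 4.0 * c2 * c0
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted([(-c1 - root) / (2.0 * c2), (-c1 + root) / (2.0 * c2)])


def roots_4_20(a_term, b_term, c_term, d_up):
    """(4.20): (A-B)*D + (C-A+B+D)*x - x^2 = 0 with D = D_upper."""
    return real_roots(-1.0, c_term - a_term + b_term + d_up,
                      (a_term - b_term) * d_up)


def roots_4_21(a_term, b_term, c_term, m, n):
    """(4.21): m*x^2 + (n + m*(A-B) - 1)*x + C - (A-B) + n*(A-B) = 0."""
    diff = a_term - b_term
    return real_roots(m, n + m * diff - 1.0, c_term - diff + n * diff)


def el_variance(a_term, b_term, c_term, d_up, m, n, delta, pick=max):
    """Solve (4.20); if the result is below the change point, re-solve (4.21).

    The root selection `pick` is an assumption, not taken from the text.
    """
    roots = roots_4_20(a_term, b_term, c_term, d_up)
    if not roots:
        return None
    x = pick(roots)
    if x < delta:
        alt = roots_4_21(a_term, b_term, c_term, m, n)
        if alt:
            x = pick(alt)
    return x
```

The returned variance would then be passed to the Laplace distortion function (2.2), which is defined in an earlier chapter and is not reproduced here.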

4.3 Rate Model for Multiple Layers

The rate model for multiple layers is identical to the single-layer rate function in Sec. 5.1.2; we substitute the EL residual variance obtained with inter-layer residual prediction into that rate function.

4.4 Rate and Distortion Summary for Multiple Layers

To summarize, we present an algorithm for modeling two-layer coding with inter-layer residual prediction. The single-layer model is used as the base-layer model in SVC.


Input:
- Variance of motion field, $\sigma_m^2$
- Correlation coefficient of motion field, $\rho_m$
- Variance of intensity field, $\sigma_I^2$
- Positive number, $K$
- Quantization step of BL, $q_B$, and of EL, $q_E$
- Turning point, $\delta(q_E)$
- Block partition mode pair, $s_B$ and $s_E$

Output:
- Coding bit rate of BL and EL, $R_B(q_B)$, $R_E(q_E)$
- Quantization distortion of BL and EL, $D_B(q_B)$, $D_E(q_E)$

1. Compute the base-layer rate and distortion: $R_B(q_B)$, $D_B(q_B)$, and $\sigma_B^2(i)$ for each coefficient by Sec. 3.4.

2. Compute the residual variance for an EL $M \times M$ transform block:
   2.1 Spatial domain EL parameters: by (3.13), (3.14), and (4.19).
   2.2 Transform residual covariance matrix: $R_f^T = (H \otimes H)\, R_f\, (H \otimes H)^t$.
   2.3 Obtain the EL residual variance of the $i$th coefficient: $\sigma_E^2(i)$ = solution of (4.20); if $\sigma_E^2(i) < \delta(q_E)$, then $\sigma_E^2(i)$ = solution of (4.21).

3. Rate distortion for an EL $M \times M$ transform block:
   3.1 Compute the quantization distortion of the $i$th coefficient: $D_E(q_E, i)$ by (2.2).
   3.2 Compute the entropy of the EL $i$th coefficient: $H_E(q_E, i)$ by (2.3).
   3.3 Output:
       Average entropy per pixel, $H_E(q_E) = \frac{1}{M^2}\sum_{i=1}^{M^2} H_E(q_E, i)$.
       Quantization distortion, $D_E(q_E) = \frac{1}{M^2}\sum_{i=1}^{M^2} D_E(q_E, i)$.


CHAPTER 5 Experiments and Analyses

Having derived our rate distortion model for SVC inter-layer residual prediction, we conduct extensive experiments in this chapter to evaluate the accuracy of the proposed R-D estimation method and to analyze the coding efficiency of inter-layer residual prediction using our model. We compare our model against eight common test video sequences in CIF and 4CIF format, encoded into two quality layers by the SVC reference software JSVM 9.19.8 [14]. The proposed model is based on the analysis of MCP coding for different partition modes. In our experiments, we test the proposed method with the IPPP coding structure. Table 5.1 details the encoder settings. Modes 16 × 8 and 8 × 16 are treated as the same mode because of their symmetry.

5.1 Comparison of Estimation Accuracy

As shown in the R-D function summary for a single layer in Sec. 3.4 and for multiple layers in Sec. 4.4, we need to estimate the sequence characteristics $\sigma_I^2$, $K$, $\sigma_m^2$, and $\rho_m$. These parameters originate from the statistical models, and their estimation is described by Tao et al. [11]. Instead of the estimation methods in [11], we measure each parameter through a PSNR curve regression, described in the following, to assess the accuracy of the proposed rate distortion model under favorable conditions.

Sequence: CIF@30Hz, 4CIF@30Hz (120 frames)
DCT transform size: 4 × 4
Prediction structure: 1 Reference Frame + IPPP...
Intra period: -1
ME partition mode pair (BL/EL): 16 × 16/16 × 16, 16 × 16/16 × 8, 16 × 16/8 × 8, 16 × 8/16 × 16, 16 × 8/16 × 8, 16 × 8/8 × 16, 16 × 8/8 × 8, 8 × 8/16 × 16, 8 × 8/16 × 8, 8 × 8/8 × 8
Base layer QP: 24, 26, 28, 30, 32, 34, 36, 38, 40
Enhancement layer QP: BL_QP−4, BL_QP−6, BL_QP−8
Inter-layer residual prediction: On

Table 5.1: Testing conditions and encoder parameters.

The designed experiment can be described by the following steps:

1. Encode the sequence into a two-layer SVC bitstream with inter-layer residual prediction, based on the settings in Table 5.1.

2. PSNR curve regression for the variables $\sigma_I^2$, $K$, $\sigma_m^2$, and $\rho_m$: apply the conditions BL mode 16 × 16, EL mode 16 × 16, and QP difference 4 to the algorithms in Sec. 3.4 and Sec. 4.4. Then sum up the PSNR values over every BL QP and find the parameter set that gives the least summed PSNR difference from the real values.

3. Take the solutions for $\sigma_I^2$, $K$, $\sigma_m^2$, and $\rho_m$ from step 2 as inputs to the algorithms in Sec. 3.4 and Sec. 4.4 to generate the rate distortion model of every configuration in Table 5.1.
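Step 2 can be pictured as a search over candidate parameter sets. The sketch below is only one possible reading: the text does not say how the search is performed, so an exhaustive grid search is assumed, the "summed PSNR difference" is taken as a sum of absolute differences, and `model_psnr` is a hypothetical callable standing in for the algorithms of Sec. 3.4 and Sec. 4.4.

```python
from itertools import product


def fit_sequence_parameters(model_psnr, real_psnr, qps, grid):
    """Pick the parameter set with the least summed PSNR difference.

    model_psnr : callable (qp, **params) -> modeled PSNR in dB (hypothetical)
    real_psnr  : dict mapping each BL QP to the measured PSNR in dB
    qps        : iterable of BL QP values used in the regression
    grid       : dict mapping parameter name -> list of candidate values
    """
    best, best_err = None, float("inf")
    for values in product(*grid.values()):
        candidate = dict(zip(grid.keys(), values))
        err = sum(abs(model_psnr(qp, **candidate) - real_psnr[qp])
                  for qp in qps)
        if err < best_err:
            best, best_err = candidate, err
    return best, best_err
```

In the setting of this chapter, `grid` would hold candidate values for $\sigma_I^2$, $K$, $\sigma_m^2$, and $\rho_m$, and `qps` the base-layer QP values of Table 5.1; the returned parameter set is then reused for every mode pair in step 3.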

As our rate distortion model is a combination of a distortion function of QP (D-Q) and a rate function of QP (R-Q), we examine the accuracy of distortion and rate separately, and the following experiments consider only the luma component. In addition, to account for simulcast coding of the enhancement layer, we extend the QP range of the BL PSNR curve regression to 16, and present the experimental results over this QP range.

5.1.1 Distortion Model Accuracy

The proposed rate distortion model can be used to estimate the D-Q curves of the encoded video sequences. The encoding distortion is measured in terms of PSNR between the encoded video and the original one.
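For reference, a modeled distortion $D(q)$ expressed as a mean squared error can be converted to PSNR with the standard definition. The sketch below assumes 8-bit video (peak value 255), which is not stated explicitly in the text, and uses hypothetical function names; the second helper corresponds to an average absolute PSNR error between modeled and measured curves.

```python
import math


def psnr_from_mse(mse, peak=255.0):
    """PSNR in dB for a mean squared error, assuming the given peak value."""
    return 10.0 * math.log10(peak * peak / mse)


def mean_abs_psnr_error(model_db, real_db):
    """Average absolute PSNR difference (dB) between two curves."""
    return sum(abs(m - r) for m, r in zip(model_db, real_db)) / len(model_db)
```

Comparing `psnr_from_mse(D(q))` with the PSNR measured from the encoder output at each QP yields the D-Q comparison discussed in this section.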

As stated before, the proposed experiment requires estimates of $\sigma_I^2$, $K$, $\sigma_m^2$, and $\rho_m$. Although different mode pairs have slightly different regression parameters, we only apply the regression result from the case of BL mode 16 × 16, EL mode 16 × 16, and QP difference 4. It is reasonable to adopt this simple and convenient method since the parameters are associated with the sequence characteristics, not with the ME mode pair. The D-Q curves of the mode pair BL 16 × 16 and EL 16 × 16, obtained with the regression results of different configurations (mode pair and QP difference), are shown in Fig. 5.1.

Figure 5.1: PSNR vs. QP curves of BL and EL applying the regression results of different configurations (Foreman), panels (a) and (b).

The example is the result for Foreman (CIF). We can observe some divergence among the curves under different regression results in both the BL and the EL.

The D-Q curves estimated with the single regression result of the mode pair BL 16 × 16 and EL 16 × 16 are compared with the actual D-Q curves in Fig. 5.2. It can be seen that, for the two test sequences with distinct characteristics, the D-Q curves estimated by the proposed model fit the actual curves very well; the precision is maintained at other QP differences and for other sequences. To substantiate this claim, Table 5.2 provides the results of all test sequences at different QP differences; the numbers represent the average PSNR error over every mode. For both the base and enhancement layers, the PSNR error does not exceed 0.5 dB across the sequences and QP differences, except for the worst case, the BL of Mobile (CIF).

Figure 5.2: Real vs. model D-Q curves applying one regression result (Mobile and Foreman sequences). Columns: BL 16 × 16, BL 16 × 8, BL 8 × 8; each panel shows the EL curves.


PSNR error (ΔdB)

Resolution   Sequence   BL     EL, Qpd=4   EL, Qpd=6   EL, Qpd=8
CIF          Bus        0.36   0.32        0.32        0.34
CIF          Football   0.40   0.24        0.18        0.16
CIF          Foreman    0.33   0.31        0.25        0.20
CIF          Mobile     0.67   0.49        0.50        0.50
4CIF         City       0.22   0.29        0.24        0.22
4CIF         Crew       0.41   0.24        0.19        0.16
4CIF         Harbour    0.25   0.30        0.28        0.27
4CIF         Soccer     0.33   0.26        0.20        0.17
Average (all sequences) 0.37   0.31        0.27        0.25

Table 5.2: PSNR error between real and model.

Figure 5.3: Entropy curves compared with actual curves (Foreman), panels (a) and (b).

5.1.2 Rate Model Accuracy

In JSVM, the entropy coding design includes context-adaptive binary arithmetic coding (CABAC) and context-adaptive variable-length coding (CAVLC). As the bit rate in video compression is highly related to the entropy coding method and to the dependency of quantized coefficients at the block level, deriving a rate model is a very difficult problem.

Based on the assumption of a Laplace distribution, the entropy can be obtained as (2.3). Since entropy measures the rate for independent coding, we can observe wide discrepancies between the actual R-Q curves and the entropy versus QP curves, as shown for Foreman (CIF) in Fig. 5.3. Note that we only present the residual rates in the actual R-Q curves.

Figure 5.4: Linear relationship between $\ln(R)$ and $H^*$ for (a) base mode 16x16, (b) base mode 16x8, and (c) base mode 8x8, from Foreman (CIF).

the same parameter set of distortion model, which is the regression result by the mode pair of BL 16 × 16 and EL 16 × 16.

To compensate for this inaccuracy of the entropy, we exploit a relationship between the real coded rate $R$ and the entropy $H$. A linear relationship between the natural logarithm of the real rate, $\ln(R)$, and $\ln\!\left[\frac{1}{q}\left(1+\frac{\sqrt{2}}{\sigma}\right) H^{\frac{\sqrt{2}}{\sigma}} \exp\!\left(-\frac{\sqrt{2}}{\sigma}\right)\right]$, which is denoted $H^*$, is observed, as shown for Foreman (CIF) in Fig. 5.4. The blue lines with solid squares represent the linear relationship in single-layer coding, and the lines with hollow markers indicate the linear relationship for inter-layer prediction coding of different modes.
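The linear relationship described above suggests estimating the constants $a$ and $b$ of (3.15) by fitting a line to the pairs $(H^*, \ln R)$. The sketch below uses an ordinary least-squares fit, which is an assumption of this illustration (the text does not name the fitting method), and the function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b; returns (a, b)."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def modified_rate(h_star, a, b):
    """Modified rate R* = exp(a * H* + b), the exponential form of (3.15)."""
    return math.exp(a * h_star + b)
```

Here `xs` would be the values of $H^*$ computed from the model at each QP and `ys` the logarithms of the measured residual bit rates; one pair $(a, b)$ is fitted per mode pair and sequence.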

The other test sequences show a similar relationship between $\ln(R)$ and $H^*$ as in Fig. 5.4; we can therefore obtain a modified rate model, which is denoted $R^*$ and is described before in Sec.. The constants $a$ and $b$ in (3.15) are the coefficients of the approximating line between $\ln(R)$ and $H^*$, and they vary according to the mode pair and the coded sequence. We provide some example curves of the modified rate model compared with real rate curves in Fig.??. Contrasting the modified rate with the entropy $H$, it can be observed that the rate modification greatly improves the fit to the real rate curves across different sequences. For comprehensive analysis, the results of all test sequences at different QP differences appear in Table 5.5; the enormous errors of the entropy $H$ are corrected by the proposed rate model $R^*$ throughout the difference

Figure 5.5: Modified rate ($R^*$) compared with actual rate, with entropy ($H$) as a contrast. Panels: Foreman (CIF) (a), (b); Football (CIF) (c), (d); Soccer (4CIF) (e), (f). Blue lines with solid squares are the BL R-Q curves. Red lines with hollow squares are the entropy vs. QP curves. Green lines with hollow triangles are the modified rate vs. QP curves.
