
Chung Hua University Master's Thesis

Title: A Coding Method for Arbitrarily Shaped Video Objects in MPEG-4

A New Coding Algorithm for Arbitrarily Shaped Video Object in MPEG-4

Department: Master's Program, Department of Computer Science and Information Engineering
Student ID and Name: M09002031 黃正岱
Advisor: Dr. 李建興

July 2003


Abstract

The ability to code arbitrarily shaped images is an important feature of MPEG-4.

In this thesis, a new texture coding algorithm for arbitrarily shaped video objects, called the boundary pixel scanning (BPS) technique, is presented. The main idea behind the algorithm is to re-order the object pixels so that pixels with similar gray levels are aligned in the same column. As a result, the magnitudes of the AC coefficients will also be insignificant after applying the 1-D SA-DCT first along the vertical direction and then along the horizontal direction. Moreover, we integrate the BPS technique into the boundary block grouping and merging (BBGM) technique. Experimental results have shown the effectiveness of the proposed technique.


Contents

Abstract
List of Figures
List of Tables

Chapter 1 Introduction
1.1 Motivation
1.2 An Overview of the MPEG-4 Video Coding Standard
1.2.1 Shape coding
1.2.2 Motion estimation and compensation
1.2.3 Texture coding
1.3 Outline of the Thesis

Chapter 2 A Review of the MPEG-4 Texture Coding Methods
2.1 Extrapolation-Based Technique
2.1.1 Fixed value padding technique
2.1.2 Low-pass extrapolation (LPE) technique
2.1.3 Macroblock padding technique
2.1.4 Extension-interpolation padding technique
2.1.5 Interlaced transform padding technique
2.2 Shape-Adaptive Transforms
2.2.1 Shape-adaptive DCT (SA-DCT)
2.2.2 ∆DC-SA-DCT
2.3 Block-Merging Technique
2.3.1 Boundary block-merging technique

Chapter 3 The Proposed Texture Coding Technique for Arbitrarily Shaped Video Object
3.1 Boundary Pixel Scanning SA-DCT (BPS-SA-DCT)
3.2 Boundary Pixel Scanning and Boundary Block Grouping and Merging (BPS-BBGM)

Chapter 4 Experimental Results
4.1 Experiments on BPS-SA-DCT, SA-DCT, and LPE
4.2 Experiments on BPS-BBGM, SA-DCT, BBM, and BBGM

Chapter 5 Conclusion

References


List of Figures

Fig. 1.1 Structure of MPEG-4 VOP encoder
Fig. 1.2 Rectangular bounding box
Fig. 1.3 (a) Original frame (b) A segmented VOP (c) Corresponding binary shape
Fig. 1.4 Gray scale shape coding technique
Fig. 1.5 An example to show the block types of an arbitrarily shaped VOP
Fig. 1.6 Structure of the VOP decoder
Fig. 2.1 (a) The texture encoder (b) The texture decoder
Fig. 2.2 A boundary block
Fig. 2.3 An example of the macroblock padding technique
Fig. 2.4 A 1-D example of the EI algorithm
Fig. 2.5 An example of the 2-D EI algorithm. (a) Encoding (b) Decoding
Fig. 2.6 The process of 2-D padding
Fig. 2.7 The steps for implementing SA-DCT
Fig. 2.8 Block diagram of ∆DC-SA-DCT. (a) Encoder (b) Decoder
Fig. 2.9 (a) MPEG-4 texture coder (b) BBM integrated into the MPEG-4 texture coder
Fig. 2.10 Block merging types. (a) Horizontal merging (b) Vertical merging (c) Diagonal merging
Fig. 2.11 An example of the BBM process. (a) Before BBM (b) After BBM
Fig. 2.12 Types of block grouping and merging
Fig. 2.13 The process of boundary block-merging
Fig. 3.1 (a) Original boundary block (b) Left-side scanning (c) Right-side scanning (d) Top-side scanning (e) Bottom-side scanning
Fig. 3.2 The process of the BPS-BBGM method
Fig. 4.1 Intra quantizer matrix
Fig. 4.2 Akiyo sequence: (a) VOP (b) Corresponding binary shape
Fig. 4.3 Weather sequence: (a) VOP (b) Corresponding binary shape
Fig. 4.4 News sequence: (a) VOP (b) Corresponding binary shape
Fig. 4.5 Mother and daughter sequence: (a) VOP (b) Corresponding binary shape
Fig. 4.6 R-D curves for boundary blocks: (a) Akiyo (b) Weather (c) News (d) Mother and daughter
Fig. 4.7 R-D curves for boundary blocks: (a) Akiyo (b) Weather (c) News (d) Mother and daughter


List of Tables

Table 3.1 Extra codes for the five coding modes
Table 3.2 Coding results for the boundary block in Fig. 3.1(a)
Table 4.1 Code mode distribution for the VOP of Akiyo
Table 4.2 Code mode distribution for the VOP of Weather
Table 4.3 Code mode distribution for the VOP of News
Table 4.4 Code mode distribution for the VOP of Mother and daughter
Table 4.5 Code mode distribution for the VOP of Akiyo
Table 4.6 Code mode distribution for the VOP of Weather
Table 4.7 Code mode distribution for the VOP of News
Table 4.8 Code mode distribution for the VOP of Mother and daughter


Chapter 1 Introduction

1.1 Motivation

The objective of the MPEG-4 audio-visual standard is to provide the core technology allowing efficient content-based storage, transmission, and manipulation of video, graphics, audio, and other multimedia data [1-5]. Applications of MPEG-4 include digital television, streaming video, mobile multimedia, and games. To facilitate flexible coding of audio-visual data, MPEG-4 supports many important features, such as improved coding efficiency, robustness to transmission errors, and interactivity with users.

Traditional video coding standards such as MPEG-1 and MPEG-2 represent the moving pictures as a single intrinsic entity; that is, they are frame-based coding techniques. MPEG-4, however, is an object-based coding technique: the moving pictures are treated as an organized collection of visual objects. The MPEG-4 visual specification supports several types of visual objects. Among these, the video object can be thought of as a sequence of two-dimensional arbitrarily shaped images. Therefore, a video object is described by texture, shape, and motion vector information. Efficient methods for coding the shape, texture, and motion information are important to MPEG-4. Texture coding is especially important, since the data required to represent the texture information is much larger than that for the shape or motion information. In this thesis, an efficient arbitrarily shaped texture coding technique is proposed. A survey of arbitrarily shaped texture coding techniques is presented in Chapter 2. In the following section, we first give an overview of the MPEG-4 video coding standard.

1.2 An Overview of the MPEG-4 Video Coding Standard

In this section, an overview of the MPEG-4 video coding standard is given. Since MPEG-4 is used to encode arbitrarily shaped video objects, each visual scene must first be segmented into a set of arbitrarily shaped regions called video object planes (VOPs). Note that the MPEG-4 standard does not specify how the VOPs are generated. The VOP is the basic input to the MPEG-4 encoder. Fig. 1.1 shows the structure of the VOP encoder in MPEG-4. The encoder is mainly composed of three parts: a motion coder, a texture coder, and a shape coder. The motion coder computes the motion vector information of a macroblock or a block. Block-based DCT is used for coding the texture information. The shape coder represents the shape information of the VOP efficiently. These coders must be adapted to work with arbitrarily shaped VOPs.

Fig. 1.1 Structure of MPEG-4 VOP encoder.

1.2.1 Shape coding

The shape information of arbitrarily shaped video objects is very useful in object-based video coding. According to the shape information, we can know where the object is located in a VOP. The shape information is of two types: binary shape and gray scale shape.

In MPEG-4, each arbitrarily shaped VOP is circumscribed by a rectangular bounding box. Fig. 1.2 shows an arbitrarily shaped VOP and its corresponding bounding box. The binary shape information of a VOP is represented as a matrix of binary values. Each element in this matrix has one of two possible values, which indicates whether a pixel is located within the VOP or not. Each 16×16 block within the bounding box is called a binary alpha block (BAB). The BABs can be classified into three types: transparent blocks, opaque blocks, and alpha blocks. A transparent block lies entirely outside the VOP, so it does not contain any information about the video object. An opaque block is located entirely within the VOP. For an alpha block, only a part of the pixels is located in the VOP.

Fig. 1.2 Rectangular bounding box.

In MPEG-4, the shape information is encoded by modified content-based arithmetic encoding (CAE) [6]. Fig. 1.3(a) shows a frame in a video sequence. The segmented VOP composited onto a black background is shown in Fig. 1.3(b), and Fig. 1.3(c) shows the binary shape information for this VOP.

Fig. 1.3 (a) Original frame (b) A segmented VOP (c) Corresponding binary shape.

The gray scale shape information is similar to the binary shape, but each pixel can take on a range of values instead of binary values. The gray scales are used to represent the degree of transparency of the object and thus the aliasing effect can be reduced. The gray scale shape information is encoded by separate encoding of the shape and transparency information as shown in Fig. 1.4. The coding method of the shape information is similar to the binary shape coding technique. The transparency values are encoded by the texture coding method.

Fig. 1.4 Gray scale shape coding technique: the support is encoded by the binary shape coder and the transparency values by the texture coder.

1.2.2 Motion Estimation and Compensation

In general, motion estimation and compensation are used to reduce the temporal redundancy in video sequences. In MPEG-4, these techniques are similar to those used in MPEG-1 and MPEG-2. There are three encoding modes for an input VOP: Intra VOP (I-VOP), Predicted VOP (P-VOP), and Bidirectionally Interpolated VOP (B-VOP). An I-VOP is encoded independently, without reference to other VOPs. A P-VOP is predicted from the previously decoded VOP, and a B-VOP is predicted from both past and future VOPs. Of these three coding modes, only the P-VOP and B-VOP need to code motion vector information. First, a VOP is divided into macroblocks of 16×16 pixels. If a macroblock is entirely outside the VOP, no motion estimation is needed. If it is entirely inside the VOP, motion estimation is performed in the same way as in the MPEG-1 or MPEG-2 standard. For boundary macroblocks, in which only a part of the pixels is inside the VOP, a modified block matching method is used. Since the motion vector may point outside the reference VOP, a padding technique is used to extrapolate the values of pixels outside the reference VOP. In the block matching process, the sum of absolute differences (SAD) is computed only over those pixels that are within the VOP and their corresponding pixels in the reference VOP. To improve the prediction quality, MPEG-4 also supports an overlapped motion compensation technique similar to that of the H.263 standard.

1.2.3 Texture Coding

Only the texture data within the VOP have to be encoded. First, a VOP is divided into a set of 8×8 blocks. There are three types of blocks: the interior block, which is located completely inside the VOP; the boundary block, in which only a part of the block pixels is located in the VOP; and the exterior block, which is located completely outside the VOP. Fig. 1.5 gives an example of these three types of blocks.

Fig. 1.5 An example to show the block types of an arbitrarily shaped VOP.

The exterior blocks contain no texture information, so no texture coding is required for them. The interior blocks can be encoded using conventional block-based DCT coding methods. For boundary blocks, only those pixels within the VOP are defined, so traditional block-based DCT coding methods are not suitable. In general, the texture coding methods can be classified into two major types: extrapolation padding techniques [9-12] and shape-adaptive DCT (SA-DCT) techniques [13, 19, 20]. In Chapter 2, we describe these methods in detail.

The structure of a VOP decoder is shown in Fig. 1.6. The decoding process is similar to the encoding process but in reverse order.

Fig. 1.6 Structure of the VOP decoder.

1.3 Outline of the Thesis

In Chapter 2, we review some MPEG-4 texture coding methods. The proposed texture coding algorithm is described in Chapter 3. Experimental results are given in Chapter 4 to show the effectiveness of the proposed algorithm. Finally, a conclusion is given in Chapter 5.


Chapter 2

A Review of the MPEG-4 Texture Coding Methods

The texture information includes one luminance and two chrominance components. There are two coding modes for the texture information: the inter mode and the intra mode. In the inter mode, the prediction error values are transform coded; in the intra mode, the original texture values are transform coded directly. In this thesis, we focus on intra-mode texture coding only. The block diagram of intra-mode texture coding is shown in Fig. 2.1.

Fig. 2.1 (a) The texture encoder (DCT, quantization, DC/AC prediction, coefficient scan, variable length encoding) (b) The texture decoder (the inverse operations in reverse order).

For coding the texture information of a VOP, the VOP is first divided into non-overlapping 8×8 blocks and each block is DCT transformed. The transform coefficients are then quantized. The quantized DC and AC values are predicted to reduce inter-block correlation. A scanning method is then applied to the prediction errors to convert them into a 1-D vector. Finally, variable length Huffman coding is performed to produce the coded bitstream. The decoder performs these processes in reverse order.

The major problem with block-based DCT is that it cannot be applied directly to the boundary blocks, since not all pixels in such a block belong to the object. Therefore, many well-known texture coding methods have been proposed for coding the boundary blocks, including extrapolation-based techniques [9-12], shape-adaptive DCT (SA-DCT) methods [13-20], boundary block merging methods [21, 22], and wavelet-transform based methods [7, 8]. In the following sections, we review these texture coding methods.

2.1 Extrapolation-Based Technique

In signal extrapolation methods, a padding process is used to assign values to the transparent pixels in a boundary block so that rectangular block-based DCT coding can be applied. Fig. 2.2 is an example of a boundary block in which the blank pixels indicate the transparent pixels.

Fig. 2.2 A boundary block: an 8×8 block of gray-level values in which the blank positions indicate transparent pixels.

2.1.1 Fixed value padding technique

The simplest padding method is to place zero values on the transparent pixels. This zero padding method, however, produces many nonzero high-frequency coefficients, and the coding performance suffers. A better method is to fill the transparent pixels with the mean value of the object data. The discontinuity at the border of the object boundary is thereby reduced and some high-frequency information is eliminated. Another way to diminish the discontinuities at the border is to extend the object pixels with their mirror reflection outside the object region. However, a separable implementation of the 2-D mirror reflection is sensitive to the order of operations (horizontal followed by vertical, or vice versa). The major advantage of these methods is their low computational complexity; the obvious drawback is the increase in the number of nonzero high-frequency coefficients. Therefore, the MPEG-4 video standard adopted another padding method, called the low-pass extrapolation (LPE) method [9, 24], to reduce the number of nonzero high-frequency coefficients.

2.1.2 Low-pass extrapolation (LPE) technique

In the MPEG-4 standard, the LPE technique is performed in three steps. First, the arithmetic mean value m of all pixels within the object region is calculated. Secondly, the value m is assigned to each transparent pixel. Finally, the following low-pass filtering operation is applied to each transparent pixel:

p(i, j) = [ p(i, j-1) + p(i-1, j) + p(i, j+1) + p(i+1, j) ] / 4

If one or more of the four pixels used for low-pass filtering lie outside the block, the corresponding pixels are not considered in the averaging operation and the averaging factor 1/4 is modified accordingly. The LPE method has two advantages. First, the padded signal is rather smooth because of its low-pass characteristic. Secondly, it avoids discontinuities at the border of the object boundary.
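To make the three steps concrete, the following is a minimal Python/numpy sketch of LPE padding under the rules above; block (an 8×8 array) and mask (a boolean alpha map, True for object pixels) are assumed inputs for illustration, not names from the thesis:

    import numpy as np

    def lpe_pad(block, mask):
        """LPE sketch: fill transparent pixels with the object mean, then
        replace each transparent pixel by the average of its available
        4-neighbours inside the block (the 1/4 factor adapts at borders)."""
        n = block.shape[0]
        out = np.where(mask, block.astype(float), block[mask].mean())
        for r in range(n):
            for c in range(n):
                if not mask[r, c]:
                    neigh = [out[rr, cc]
                             for rr, cc in ((r, c - 1), (r - 1, c),
                                            (r, c + 1), (r + 1, c))
                             if 0 <= rr < n and 0 <= cc < n]
                    out[r, c] = sum(neigh) / len(neigh)
        return out

The single raster-order pass shown here uses already-updated values for earlier neighbours, which is what gives the padded region its smooth, low-pass character.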

2.1.3 Macroblock padding technique

In the macroblock padding technique [10, 24], the padding process consists of two steps: horizontal repetitive padding and vertical repetitive padding. The padding operations for these two steps are the same. For horizontal repetitive padding, a binary value (0 or 1) is first assigned to every pixel in the macroblock. A value of 0 indicates a transparent pixel and a value of 1 indicates an object pixel or an already padded transparent pixel; that is, once a transparent pixel is padded, its binary value is set to 1. The padding operation is repeated until all pixels have been checked and their binary values set to 1. The padded value of a pixel is determined by the values of its left and/or right neighbors. If the binary values of both the left and right neighbors equal 1, the padded value is the average of these two neighboring pixels; if only one of the two neighbors has a binary value of 1, the padded value is set to the value of that neighbor. Fig. 2.3 shows an example of the boundary macroblock padding technique; a 4×4 block instead of a 16×16 block is illustrated for simplicity.

Fig. 2.3 An example of the macroblock padding technique (a 4×4 example showing the shape data, the horizontal padding pass, and the vertical padding pass).
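A compact numpy sketch of the two repetitive-padding passes is given below, assuming block and mask inputs as before; the exact update order of the standard is simplified here, but the nearest-neighbour/averaging rule matches the description above:

    import numpy as np

    def repetitive_pad(block, mask):
        """Horizontal then vertical repetitive padding: each transparent pixel
        takes its nearest defined neighbour in the row (average if defined
        pixels exist on both sides); the vertical pass then does the same per
        column, treating horizontally padded pixels as defined."""
        out = block.astype(float)
        filled = mask.copy()
        for a, f in ((out, filled), (out.T, filled.T)):  # horizontal, vertical
            for i in range(a.shape[0]):
                idx = np.flatnonzero(f[i])
                if idx.size == 0:
                    continue                 # line still has no defined pixel
                for j in np.flatnonzero(~f[i]):
                    left, right = idx[idx < j], idx[idx > j]
                    if left.size and right.size:
                        a[i, j] = (a[i, left[-1]] + a[i, right[0]]) / 2
                    elif left.size:
                        a[i, j] = a[i, left[-1]]
                    else:
                        a[i, j] = a[i, right[0]]
                f[i] = True                  # whole line now counts as defined
        return out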

2.1.4 Extension-interpolation padding technique

The extension-interpolation padding method was proposed by Yi et al. [11] and is based on the extension-interpolation (EI) scheme. The interpolation operation of the EI scheme is performed in the transform domain. Assume that the vector length is N. If the length of an object segment is K, a K-point 1-D DCT is performed on the K object pixels, producing K DCT coefficients. To extend the length of the coefficient vector to N, the EI method pads the N-K high-frequency positions with zeros. An N-point inverse DCT is then performed on the padded coefficient vector. The padding process is illustrated in Fig. 2.4.

Fig. 2.4 A 1-D example of the EI algorithm.
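The 1-D EI step is short enough to state directly in code. The sketch below uses scipy's orthonormal DCT; scaling conventions for EI vary in the literature, so treat the normalization as an assumption:

    import numpy as np
    from scipy.fft import dct, idct

    def ei_pad_1d(u, n):
        """1-D extension-interpolation (cf. Fig. 2.4): K-point DCT of the K
        object pixels, zero-pad the high frequencies to length n, then take
        the n-point inverse DCT to obtain the extended signal."""
        coeff = dct(np.asarray(u, dtype=float), norm='ortho')   # K coefficients
        padded = np.concatenate([coeff, np.zeros(n - len(u))])  # zero high bands
        return idct(padded, norm='ortho')                       # length-n signal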

To implement 2-D interpolation, the EI method is performed first in the vertical direction and then in the horizontal direction, or vice versa. Fig. 2.5 illustrates an example in which EI is performed first horizontally and then vertically.

Fig. 2.5 An example of the 2-D EI algorithm. (a) Encoding (b) Decoding.


2.1.5 Interlaced transform padding technique

This method was proposed by Shen et al. [12]. It provides good performance and requires low computational complexity. In this method, the number of nonzero transform coefficients after the 2-D DCT is equal to the number of object pixels. Since the human visual system is less sensitive to high-frequency components, the method retains the same number of low-frequency coefficients as there are object pixels and sets the remaining high-frequency coefficients to zero.

The padding operation is performed on the transform coefficients. Assume that the input vector u has an arbitrary length K (1 ≤ K ≤ N), so the padding vector v has length N-K. The N-point DCT is applied to the padded vector [u^T v^T]^T and can be expressed by the following matrix multiplication:

[U^T V^T]^T = C · [u^T v^T]^T    (1)

The matrix C = [ c(p, q) ]N×N is the DCT transform matrix with

c(p, q) = √(2/N) · c_p · cos( p(2q + 1)π / (2N) ),  p, q = 0, …, N-1,

where c_p = 1/√2 if p = 0 and c_p = 1 otherwise. Partitioning C into four blocks, Eq. (1) can be rewritten as

[U; V] = [C00, C01; C10, C11] · [u; v]    (2)

where u and U are of dimension K, v and V are of dimension N-K, C00 = [c(p, q)] for 0 ≤ p, q ≤ K-1, C01 = [c(p, q)] for 0 ≤ p ≤ K-1 and K ≤ q ≤ N-1, C10 = [c(p, q)] for K ≤ p ≤ N-1 and 0 ≤ q ≤ K-1, and C11 = [c(p, q)] for K ≤ p, q ≤ N-1.

The objective of the padding technique is to force the N-K high-frequency components to be zero, that is, V = 0. From (2) we get V = C10·u + C11·v. Setting V = 0 and solving this equation yields the padding vector

v = -C11^(-1)·C10·u.    (3)

A 1-D example illustrates this idea. Consider an input vector u of size 5, u = [2, 4, 6, 8, 10]. Three extra values must be padded in order to perform the 8-point DCT. From Eq. (3), the three padded values are v = [7.7251, -1.8216, -11.8763], giving the padded vector [u^T v^T]^T = [2, 4, 6, 8, 10, 7.7251, -1.8216, -11.8763]. Transforming this vector with the 8-point DCT gives [U^T V^T]^T = [8.4949, 8.5505, -15.0866, 6.6025, -2.7506, 0, 0, 0]. As expected, the last three high-frequency coefficients are zero. However, the other nonzero AC coefficients have relatively large magnitudes, and thus the compression efficiency is reduced. To solve this problem, an interlaced padding technique is exploited to reduce the magnitudes of the nonzero AC coefficients.

If we substitute Eq. (3) into Eq. (2), we derive U = B·u, where B = C00 - C01·C11^(-1)·C10. From matrix theory we know that ||U||_p ≤ ||B||_p·||u||_p. When p = 2, this inequality shows that the energy of the transformed vector U is bounded through the L2-norm of the transform kernel matrix B. To achieve higher compression efficiency, the energy of U should be as small as possible. Therefore, we can interlace the original vector u with the padding vector v, choosing the interlacing order so that the matrix B has the smallest L2-norm. This interlacing strategy makes the magnitudes of the DCT coefficients as small as possible. According to the strategy, the interlacing order can be determined off-line for K = 2, 3, …, 7. Another example illustrates the interlacing operation. Let the input vector u be of size 5, u = [2, 4, 6, 8, 10]. According to the interlacing strategy, the three padded values are v = [2.5670, 5.4894, 6.3555], which are interlaced with u to form the new padded vector [2, 2.5670, 4, 5.4894, 6, 6.3555, 8, 10]. If this vector is transformed using the 8-point DCT, the transformed coefficient vector becomes [15.7020, -6.8860, 0.2763, -1.4990, 0.9076, 0, 0, 0]. Obviously, the magnitudes of the nonzero AC coefficients are minimized.
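Eq. (3) is easy to verify numerically. The following numpy check builds the DCT matrix from the definition of c(p, q), solves for the padding vector, and reproduces the first 1-D example above (without the interlacing permutation):

    import numpy as np

    def dct_matrix(n):
        """DCT matrix C with c(p,q) = sqrt(2/N)*c_p*cos(p(2q+1)pi/(2N))."""
        p = np.arange(n).reshape(-1, 1)
        q = np.arange(n).reshape(1, -1)
        c = np.sqrt(2.0 / n) * np.cos(p * (2 * q + 1) * np.pi / (2 * n))
        c[0, :] /= np.sqrt(2.0)              # c_p = 1/sqrt(2) for p = 0
        return c

    def zero_tail_padding(u, n):
        """Eq. (3): pad u to length n so the last n-K DCT coefficients vanish."""
        k = len(u)
        c = dct_matrix(n)
        c10, c11 = c[k:, :k], c[k:, k:]
        v = -np.linalg.solve(c11, c10 @ u)   # v = -C11^{-1} C10 u
        return np.concatenate([u, v])

    u = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
    x = zero_tail_padding(u, 8)
    print(np.round(x[5:], 4))                # approx. [7.7251, -1.8216, -11.8763]
    print(np.round(dct_matrix(8) @ x, 4))    # last three coefficients are 0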


The padding operation can be extended to the 2-D case. The extension scheme is similar to that of the EI algorithm: either vertical or horizontal padding can be performed first. The detailed procedure of the 2-D padding operation is shown in Fig. 2.6; in this example, the vertical padding is performed first.

Fig. 2.6 The process of 2-D padding: vertical padding and vertical 1-D DCT, followed by horizontal padding and horizontal 1-D DCT.

2.2 Shape-Adaptive Transforms

Instead of coding a full rectangular block, shape-adaptive transforms encode only those pixels within the object. The 2-D shape-adaptive transform is implemented by successively applying two 1-D transforms, first in the vertical direction and then in the horizontal direction.


2.2.1 Shape-adaptive DCT (SA-DCT)

SA-DCT [13] was proposed by Sikora and Makai. The objective of SA-DCT is to find a reasonable tradeoff among computational complexity, coding efficiency, and full backward compatibility with traditional DCT techniques. The basic idea of SA-DCT is to separate the 2-D DCT into two 1-D DCT transforms. Initially, the pixels in every column are aligned by shifting them to the upper border. An L-point 1-D DCT is performed on every column vector x_j with L = N_j elements according to

b_j = α_L · DCT_L · x_j    (4)

where α_L is a scaling factor depending on N_j for the column vector x_j and DCT_L denotes the L×L DCT kernel matrix

DCT_L(i, j) = γ_i · cos[ i(j + 1/2)π / L ]    (5)

with γ_i = 1/√2 for i = 0 and γ_i = 1 otherwise. Next, the M_i coefficients belonging to the same row i are shifted to the left border. Finally, a 1-D SA-DCT is performed on every row vector c_i with L = M_i elements:

d_i = α_L · DCT_L · c_i    (6)

where d_i is the i-th row vector of the final 2-D SA-DCT coefficients.

For the inverse transform, two inverse 1-D SA-DCTs are performed successively:

c_i = (2 / (α_L · L)) · DCT_L^T · d_i    (7)

x_j = (2 / (α_L · L)) · DCT_L^T · b_j    (8)

Note that the DC value is affected by the vector length L. This causes a severe problem: a spatially uniform object may generate nonzero horizontal AC coefficients. To solve this problem, the scaling factor is set to α_L = 4/L.

The detailed steps for implementing SA-DCT are illustrated in Fig. 2.7. Fig. 2.7(a) shows an 8×8 boundary block, where the black region indicates the object region and the white region indicates the transparent region. In the first step, all pixels in each column are shifted to the upper border (Fig. 2.7(b)). Next, the 1-D SA-DCT is applied to the object pixels in each column (Fig. 2.7(c)). Then the transform coefficients of each row are shifted horizontally to the left border (Fig. 2.7(d)). Finally, a horizontal 1-D SA-DCT is applied to each row to obtain the final DCT coefficients (Fig. 2.7(e)).

Fig. 2.7 The steps for implementing SA-DCT. (a) Boundary block (b) Columns shifted to the upper border (c) Vertical 1-D SA-DCT (d) Rows shifted to the left border (e) Horizontal 1-D SA-DCT.
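The five steps translate almost directly into code. The sketch below uses scipy's orthonormal variable-length DCT, which corresponds to the pseudo-orthonormal scaling α_L = √(2/L) discussed in the next subsection rather than to Eqs. (4) and (6) verbatim; block and mask are assumed inputs as before:

    import numpy as np
    from scipy.fft import dct

    def forward_sa_dct(block, mask):
        """Forward SA-DCT sketch (orthonormal per-length scaling): shift the
        object pixels of each column to the top and transform them, then
        shift the resulting coefficients of each row to the left and
        transform again. NaN marks positions holding no coefficient."""
        n = block.shape[0]
        tmp = np.full((n, n), np.nan)
        for j in range(n):                       # vertical stage
            col = block[mask[:, j], j].astype(float)
            if col.size:
                tmp[:col.size, j] = dct(col, norm='ortho')
        out = np.full((n, n), np.nan)
        for i in range(n):                       # horizontal stage
            row = tmp[i, ~np.isnan(tmp[i])]
            if row.size:
                out[i, :row.size] = dct(row, norm='ortho')
        return out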


Since the column vectors may be of different sizes, after the first stage of 1-D SA-DCT the coefficients located on the same row may represent different frequency components. The cross-correlation among these row coefficients is therefore very low, and the second stage of 1-D SA-DCT produces many AC coefficients with large magnitudes, reducing the coding efficiency. To solve this problem, some algorithms [14-17] try to align coefficients with similar frequencies or higher cross-correlation into the same row. Simulation results show that improvements over the conventional SA-DCT can be achieved.

2.2.2 ∆DC-SA-DCT

The 1-D SA-DCT formulas in Eqs. (4) and (6) are called the non-orthonormal SA-DCT (NO-SA-DCT) because the transformation matrix is not normalized. To normalize the basis functions of the SA-DCT, the scaling factor is set to α_L = √(2/L). This version of the transform is called the pseudo-orthonormal SA-DCT (PO-SA-DCT). However, the PO-SA-DCT is not DC-preserving and is therefore not suitable for intraframe coding. Kauff et al. proposed an extension of PO-SA-DCT called ∆DC-SA-DCT [19]. In ∆DC-SA-DCT, the DC value of a block is separated before the PO-SA-DCT is applied; at the decoder, a ∆DC correction procedure is applied while performing the inverse PO-SA-DCT. The block diagram of ∆DC-SA-DCT is shown in Fig. 2.8. First, a mean value µ is calculated by averaging the gray levels of all object pixels. Secondly, zero-mean image data Y_{i,j} are derived by subtracting the mean value from X_{i,j}, that is, Y_{i,j} = X_{i,j} - µ. Then, the PO-SA-DCT is applied to Y_{i,j}. The mean value µ is scaled in the same way as DC values in the standard DCT. Finally, the DC coefficient of the PO-SA-DCT is overwritten with this scaled mean value. The decoding process is the same as the encoding process but in reverse order. While performing the inverse PO-SA-DCT, an appropriate ∆DC correction procedure is required to enhance the reconstructed image quality. In the texture coding method proposed by Kaup [20], the NO-SA-DCT is applied for intraframe coding and the PO-SA-DCT for interframe coding.

Fig. 2.8 Block diagram of ∆DC-SA-DCT. (a) Encoder (b) Decoder.

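A minimal sketch of the encoder side of Fig. 2.8(a) follows, assuming a forward PO-SA-DCT routine is available (e.g., the forward_sa_dct sketch above). The √n scaling of µ is our assumption for illustration, chosen so that a flat object of n pixels maps entirely onto the DC coefficient under an orthonormal transform; it is not taken from [19]:

    import numpy as np

    def delta_dc_encode(block, mask, po_sa_dct):
        """Encoder side of the ∆DC-SA-DCT sketch: subtract the object mean,
        transform the zero-mean data, then overwrite the DC coefficient
        with the scaled mean."""
        mu = block[mask].mean()                      # mean of all object pixels
        zero_mean = np.where(mask, block - mu, 0.0)  # Y = X - mu on the object
        coeff = po_sa_dct(zero_mean, mask)           # PO-SA-DCT of zero-mean data
        coeff[0, 0] = mu * np.sqrt(mask.sum())       # assumed DC scaling of mu
        return coeff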

2.3 Block-Merging Technique

The block-merging technique was originally proposed by Moon et al. [21]. The main idea behind the method is that if one block can be merged into another, the number of blocks to be encoded is reduced, and the bit rate is therefore reduced accordingly.

2.3.1 Boundary block-merging technique

In the MPEG-4 texture coder, the padding process is first performed in the background region, and then conventional rectangular DCT texture coding is applied to the padded block. Fig. 2.9(a) shows the texture coder in MPEG-4. Moon et al. proposed a method called the boundary block-merging (BBM) technique [21], in which one block is merged into another. Fig. 2.9(b) shows the MPEG-4 texture coder with the BBM technique integrated; merging is applied only to a predefined pair of neighboring blocks. The improvement in coding efficiency is obtained by reducing the number of blocks to be encoded.

Fig. 2.9 (a) MPEG-4 texture coder (b) BBM integrated into the MPEG-4 texture coder.

In BBM, there are three types of block pairs for merging (see Fig. 2.10): horizontal merging, vertical merging, and diagonal merging. In the case of horizontal merging, block 2 is merged into block 1, and block 4 can also be merged into block 3. In the case of vertical merging, blocks 3 and 4 are merged into blocks 1 and 2, respectively. Similarly, in the case of diagonal merging, blocks 3 and 4 are merged into blocks 2 and 1, respectively.

Fig. 2.10 Block merging types. (a) Horizontal merging (b) Vertical merging (c) Diagonal merging.

The steps of integrating the BBM technique into SA-DCT for boundary macroblocks are as follows.

Step 1) Horizontal Merging: Block 2 is first rotated 180° and merged into block 1 if the object pixels do not overlap. An example of the horizontal merging process is shown in Fig. 2.11. The merging process for blocks 3 and 4 is similar to that for blocks 1 and 2. If a merging occurs in this horizontal merging step, the vertical and diagonal merging steps are not performed.

Step 2) Vertical Merging: Block 3 is first rotated 180° and merged into block 1 if the object pixels do not overlap. The merging process for blocks 2 and 4 is similar to that for blocks 1 and 3. If a merging occurs in this vertical merging step, the diagonal merging step is not performed.

Step 3) Diagonal Merging: Block 4 is first rotated 180° and merged into block 1 if the object pixels do not overlap. The merging process for blocks 2 and 3 is similar to that for blocks 1 and 4.

Fig. 2.11 An example of the BBM process. (a) Before BBM (b) After BBM. (In the merged block, positions where the two blocks' padded values a and b coincide take the value (a+b)/2; the merged-away boundary block becomes an exterior block.)

Ng et al. proposed an improved block merging method called the boundary block grouping and merging (BBGM) technique [22]. In BBGM, in addition to merging two blocks, three or four neighboring blocks can be grouped into one block. The BBGM technique first determines which blocks are to be merged and then how they are merged into one block. There are eight types of block grouping or merging, as shown in Fig. 2.12.

Fig. 2.12 Types of block grouping and merging.

The steps of applying BBGM to a boundary macroblock are as follows; a decision sketch is given after this list.

Step 1) Four-Block Grouping (see Fig. 2.12(a)): If the total number of object pixels in blocks 1, 2, 3, and 4 is less than the number of pixels in one block, these four blocks are grouped into one block. If a grouping occurs here, the following steps are not performed.

Step 2) Three-Block Grouping (see Fig. 2.12(b)~(e)): If the total number of object pixels in the three blocks is less than the number of pixels in one block, these three blocks are grouped into one block. If a grouping occurs here, the following steps are not performed.

Step 3) Horizontal Merging (see Fig. 2.12(f)): If the total number of object pixels in blocks 1 and 2 (respectively, blocks 3 and 4) is less than the number of pixels in one block, blocks 1 and 2 (respectively, blocks 3 and 4) are merged into one block. If a merging occurs here, the following steps are not performed.

Step 4) Vertical Merging (see Fig. 2.12(g)): If the total number of object pixels in blocks 1 and 3 (respectively, blocks 2 and 4) is less than the number of pixels in one block, blocks 1 and 3 (respectively, blocks 2 and 4) are merged into one block. If a merging occurs here, the diagonal merging step is not performed.

Step 5) Diagonal Merging (see Fig. 2.12(h)): If the total number of object pixels in blocks 1 and 4 (respectively, blocks 2 and 3) is less than the number of pixels in one block, blocks 1 and 4 (respectively, blocks 2 and 3) are merged into one block.
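The decision cascade of Steps 1-5 can be sketched as a small hypothetical helper that maps the object-pixel counts of the four 8×8 blocks to the groups encoded together; it mirrors the steps above but is not the reference implementation:

    def bbgm_grouping(counts, block_pixels=64):
        """counts = [n1, n2, n3, n4]: object-pixel counts of the four 8x8
        blocks of a boundary macroblock (0-based indices in the output)."""
        if sum(counts) < block_pixels:                             # Step 1
            return [(0, 1, 2, 3)]
        for trio in [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]:  # Step 2
            if sum(counts[i] for i in trio) < block_pixels:
                lone = ({0, 1, 2, 3} - set(trio)).pop()
                return [trio, (lone,)]
        # Steps 3-5: horizontal, then vertical, then diagonal pairings.
        for pairs in [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]:
            if any(counts[a] + counts[b] < block_pixels for a, b in pairs):
                groups = []
                for a, b in pairs:
                    if counts[a] + counts[b] < block_pixels:
                        groups.append((a, b))         # merge this pair
                    else:
                        groups.extend([(a,), (b,)])   # encode separately
                return groups
        return [(0,), (1,), (2,), (3,)]               # no grouping possible

For example, bbgm_grouping([30, 20, 10, 5]) returns [(0, 1, 2), (3,)]: the first three blocks hold 60 object pixels and are grouped, while block 4 is coded on its own.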


Fig. 2.13 gives a detailed process for merging two boundary blocks using the BBGM technique. The blocks to be merged are transferred into a large block of size 16×16, as shown in Fig. 2.13(b). A modified SA-DCT technique is applied to this large block. Then the coefficients are moved into a smaller 8×8 block using zigzag scanning. Fig. 2.13(d) shows the result of merging the boundary blocks.

Fig. 2.13 The process of boundary block-merging: (a) two vertical 8×8 boundary blocks; (b) a 16×16 block containing the two boundary blocks after block extension; (c) DCT coefficients after the modified SA-DCT; (d) coefficients re-arranged into one 8×8 block by zigzag scanning.


At the decoder, the above procedure is reversed. The DCT coefficients in the 8×8 block are transferred to the larger 16×16 block by using the zigzag scanning order and the shape information. Then the inverse modified SA-DCT is performed to recover the image pixels, which are then re-distributed back into the two 8×8 blocks.


Chapter 3

The Proposed Texture Coding Technique for Arbitrarily Shaped Video Object

In this chapter, we describe the proposed texture coding technique for arbitrarily shaped video objects. The proposed algorithm, called the boundary pixel scanning (BPS) method, first re-orders the pixels within a boundary block so that the texture coding performance can be improved. In the following sections, we first describe the proposed technique integrated with SA-DCT, and then integrated with BBGM.

3.1 Boundary Pixel Scanning SA-DCT (BPS-SA-DCT)

The main idea behind the proposed method is to re-order the object pixels so that pixels with similar gray levels are aligned in the same column. If the pixels in each column have similar values, the magnitudes of the AC coefficients after applying the vertical 1-D DCT will be small. Moreover, the magnitudes of the AC coefficients will remain insignificant after applying the second stage of 1-D DCT along the horizontal direction. Therefore, the coding efficiency is improved. The detailed scanning process is described as follows.

In general, the pixels on the border of a boundary block will have the same or similar gray levels. Therefore, the scanning operation is performed along the border of the object in a layer by layer fashion. The boundary pixels located on the outermost layer of the object are put on the first column of the re-ordered block. The pixels on the second outermost layer are put on the second column, and so on. After all pixels in a boundary block have been scanned, the pixels located on the same layer along the border will be aligned into the same column of the re-ordered block.

In fact, the scanning operation can be performed on the top side, bottom side, left side, or right side. In the case of left-side scanning, each row is scanned from left to right. The first object pixel in each row is shifted horizontally to the first column and the second object pixel is shifted to the second column, etc. In the case of right-side scanning, each row is scanned from right to left. Similarly, each column is scanned from top to bottom or from bottom to top in the case of top-side or bottom-side scanning, respectively. Fig. 3.1 shows one example of these four types of scanning for a boundary block. Note that the pixels labeled with the same index value are put on the same column.

Fig. 3.1 (a) Original boundary block (b) Left-side scanning (c) Right-side scanning (d) Top-side scanning (e) Bottom-side scanning. (Each panel shows the index values assigned to the object pixels and the corresponding re-ordered result; pixels labeled with the same index value are placed in the same column.)


The scanning algorithm consists of two steps: index value assignment and object pixel re-ordering. The first step assigns an index value to every object pixel according to the scanning sequence. The second step re-orders the object pixels so that object pixels with the same index value are aligned in the same column. The detailed algorithms for the four types of scanning are listed below; a compact implementation sketch follows the listings.

Algorithm 3.1 Left-Side Scanning
Given: a binary alpha map B(r, c) with B(r, c) = 1 for an object pixel and B(r, c) = 0 for a transparent pixel. Let BO(·,·) and BR(·,·) denote the original and re-ordered block, respectively.
Step 1: for r = 1 to N
            index = 1;
            for c = 1 to N
                if (B(r, c) == 1) { I(r, c) = index; index++; }
                else I(r, c) = 0;
Step 2: for r = 1 to N
            for c = 1 to N
                if ((i = I(r, c)) ≠ 0)
                    BR(r, i) = BO(r, c);

Algorithm 3.2 Right-Side Scanning
Given: as in Algorithm 3.1.
Step 1: for r = 1 to N
            index = 1;
            for c = N down to 1
                if (B(r, c) == 1) { I(r, c) = index; index++; }
                else I(r, c) = 0;
Step 2: for r = 1 to N
            for c = N down to 1
                if ((i = I(r, c)) ≠ 0)
                    BR(r, i) = BO(r, c);

Algorithm 3.3 Top-Side Scanning
Given: as in Algorithm 3.1.
Step 1: for c = 1 to N
            index = 1;
            for r = 1 to N
                if (B(r, c) == 1) { I(r, c) = index; index++; }
                else I(r, c) = 0;
Step 2: for c = 1 to N
            for r = 1 to N
                if ((i = I(r, c)) ≠ 0)
                    BR(c, i) = BO(r, c);

Algorithm 3.4 Bottom-Side Scanning
Given: as in Algorithm 3.1.
Step 1: for c = 1 to N
            index = 1;
            for r = N down to 1
                if (B(r, c) == 1) { I(r, c) = index; index++; }
                else I(r, c) = 0;
Step 2: for c = 1 to N
            for r = N down to 1
                if ((i = I(r, c)) ≠ 0)
                    BR(c, i) = BO(r, c);
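The four algorithms differ only in the scan line (row or column) and direction, so they collapse into one routine. The following numpy sketch is an illustrative re-implementation of Algorithms 3.1-3.4, not the thesis code; block and mask are assumed 2-D inputs:

    import numpy as np

    def bps_scan(block, mask, side='left'):
        """Boundary pixel scanning: pack the object pixels of each row
        ('left'/'right') or column ('top'/'bottom') so that the k-th object
        pixel along each scan line lands in column k of the re-ordered
        block. Transparent positions of the output are left as zero."""
        n = block.shape[0]
        out = np.zeros_like(block)
        for line in range(n):
            if side in ('left', 'right'):
                vals, flags = block[line, :], mask[line, :]
            else:                                    # 'top' or 'bottom'
                vals, flags = block[:, line], mask[:, line]
            order = range(n) if side in ('left', 'top') else range(n - 1, -1, -1)
            k = 0
            for pos in order:
                if flags[pos]:
                    out[line, k] = vals[pos]         # index value k+1 -> column k
                    k += 1
        return out

The decoder inverts the re-ordering by walking the mask in the same scan order and writing the decoded values back to the original positions.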


After all the object pixels in a boundary block have been scanned and re-ordered, SA-DCT is applied to the re-ordered block. 1-D SA-DCT is first performed on every column and then on every row.

No single type of scanning gives the best coding performance for all boundary blocks. That is, some blocks perform best when coded directly by SA-DCT, while others achieve better performance when re-ordered and then coded by SA-DCT. Therefore, there are five coding modes for a boundary block: SA-DCT, Left-Side Scanning SA-DCT (LS-SA-DCT), Right-Side Scanning SA-DCT (RS-SA-DCT), Top-Side Scanning SA-DCT (TS-SA-DCT), and Bottom-Side Scanning SA-DCT (BS-SA-DCT). To distinguish these five coding modes, an extra code is needed to indicate the coding mode for each boundary block. Table 3.1 shows the extra codes for the five coding modes. To encode a boundary block, we first apply SA-DCT to the original block and derive the initial result in terms of bit rate and PSNR; the default coding mode is SA-DCT. Each type of scanning operation is then applied and SA-DCT is performed to obtain the corresponding result, in the order LS-SA-DCT, RS-SA-DCT, TS-SA-DCT, BS-SA-DCT. If a re-ordered block performs best in terms of both bit rate and PSNR, that coding mode is selected as the final coding mode for the boundary block.

Table 3.1 Extra codes for the five coding modes.

Coding mode    Code
SA-DCT         0
LS-SA-DCT      100
RS-SA-DCT      101
TS-SA-DCT      110
BS-SA-DCT      111

The detailed steps for coding a boundary block are described as follows; a selection sketch is given after the steps.

Step 1) Apply SA-DCT to the original block BO(·,·) and obtain the initial peak signal-to-noise ratio, PSNR, and bit rate, BR.

Step 2) Apply SA-DCT to the re-ordered block BL(·,·) corresponding to left-side scanning and compute PSNRLS and BRLS. If PSNRLS ≥ PSNR and BRLS ≤ BR, LS-SA-DCT is chosen as the current coding mode and the values of PSNR and BR are updated accordingly.

Step 3) Apply SA-DCT to the re-ordered block BR(·,·) corresponding to right-side scanning and compute PSNRRS and BRRS. If PSNRRS ≥ PSNR and BRRS ≤ BR, RS-SA-DCT is chosen as the current coding mode and the values of PSNR and BR are updated accordingly.

Step 4) Apply SA-DCT to the re-ordered block BT(·,·) corresponding to top-side scanning and compute PSNRTS and BRTS. If PSNRTS ≥ PSNR and BRTS ≤ BR, TS-SA-DCT is chosen as the current coding mode and the values of PSNR and BR are updated accordingly.

Step 5) Apply SA-DCT to the re-ordered block BB(·,·) corresponding to bottom-side scanning and compute PSNRBS and BRBS. If PSNRBS ≥ PSNR and BRBS ≤ BR, BS-SA-DCT is chosen as the current coding mode and the values of PSNR and BR are updated accordingly.
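As a sketch of this mode selection, the loop below keeps a re-ordering only if it is at least as good in PSNR and no worse in bit rate. Here evaluate is an assumed stand-in for the full SA-DCT/quantize/entropy-code/reconstruct loop, returning (psnr, bits); bps_scan is the sketch given after the algorithms above, and reordered_mask is an assumed helper:

    import numpy as np

    def reordered_mask(mask, side):
        """Mask of the re-ordered block: each scan line's object pixels are
        packed into the leading columns (assumed helper)."""
        counts = mask.sum(axis=1) if side in ('left', 'right') else mask.sum(axis=0)
        out = np.zeros_like(mask)
        for line, k in enumerate(counts):
            out[line, :k] = True
        return out

    def select_bps_mode(block, mask, evaluate):
        """Mode selection for BPS-SA-DCT (Steps 1-5)."""
        best, (psnr, bits) = 'SA-DCT', evaluate(block, mask)       # Step 1
        for side, mode in (('left', 'LS-SA-DCT'), ('right', 'RS-SA-DCT'),
                           ('top', 'TS-SA-DCT'), ('bottom', 'BS-SA-DCT')):
            p, b = evaluate(bps_scan(block, mask, side),
                            reordered_mask(mask, side))
            if p >= psnr and b <= bits:                            # Steps 2-5
                best, psnr, bits = mode, p, b
        return best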

Table 3.2 shows the coding results for the boundary block shown in Fig. 3.1(a). From this table, we see that BS-SA-DCT outperforms the other coding modes in terms of both PSNR and bit rate.

Table 3.2 Coding results for the boundary block in Fig. 3.1(a).

              SA-DCT    LS-SA-DCT    RS-SA-DCT    TS-SA-DCT    BS-SA-DCT
Bit number    124       124          117          147          77
PSNR (dB)     31.823    31.823       31.401       30.201       35.723


At the decoder, the inverse SA-DCT is first performed. Then, according to the shape information and the extra code indicating the coding mode, the decoded gray levels are re-distributed to their original pixel locations.

3.2 Boundary Pixel Scanning and Boundary Block Grouping and Merging (BPS-BBGM)

In the boundary block grouping and merging (BBGM) technique [22], adjacent boundary blocks are grouped or merged into one block and then encoded using SA-DCT. The number of encoded blocks, and hence the bit rate, is thus reduced. However, the merging process produces many high-frequency AC coefficients. In addition, the quality of the reconstructed block is sacrificed because some high-frequency information is eliminated. To solve this problem, we exploit the merits of both the proposed BPS technique and the BBGM method. To encode a 16×16 boundary macroblock, the proposed BPS-BBGM technique first determines which blocks are to be merged and then how they are merged into one block. This first step is the same as in the BBGM method, in which there are eight types of block pairs or groups for merging. In the merging process, however, the boundary pixel scanning technique is first adopted to re-order the pixels in the selected blocks so that pixels with similar gray levels are aligned in the same column of the merged block. Fig. 3.2 gives a detailed process for merging two boundary blocks using the proposed BPS-BBGM technique. The merging blocks selected by the BBGM technique are transferred into a large block of size 16×16, as shown in Fig. 3.2(b). The BPS technique is applied to re-order the pixels in this 16×16 block. The modified SA-DCT is then conducted and the coefficients are re-distributed to a smaller 8×8 block using zigzag scanning. Fig. 3.2(d) shows the result of merging the boundary blocks.

Fig. 3.2 The process of the BPS-BBGM method: (a) two horizontal 8×8 boundary blocks; (b) a 16×16 block containing the two boundary blocks after block extension and top-side scanning; (c) DCT coefficients after the modified SA-DCT; (d) coefficients re-arranged into a single 8×8 block by zigzag scanning.


At the decoder, the above procedure is reversed. The DCT coefficients in the 8×8 block are transferred to the larger 16×16 block by using the zigzag scanning order and the shape information. Then the inverse SA-DCT and inverse BPS are performed to recover the image pixels, which are then re-distributed back into the two smaller 8×8 blocks.
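A simplified sketch of the coefficient re-arrangement used in Figs. 2.13 and 3.2 is given below. It keeps the first 64 coefficients of the 16×16 block in zigzag order and packs them into an 8×8 block in the same order; the actual method uses the shape information to pick exactly the occupied coefficient positions, so this is an approximation for illustration:

    import numpy as np

    def zigzag_order(n):
        """Positions (r, c) of an n x n block in conventional zigzag order."""
        return sorted(((r, c) for r in range(n) for c in range(n)),
                      key=lambda p: (p[0] + p[1],
                                     p[0] if (p[0] + p[1]) % 2 else -p[0]))

    def zigzag_pack(coeff16):
        """Move the leading 64 zigzag coefficients of a 16x16 block into an
        8x8 block, preserving zigzag order."""
        kept = [coeff16[r, c] for r, c in zigzag_order(16)][:64]
        out = np.zeros((8, 8))
        for (r, c), v in zip(zigzag_order(8), kept):
            out[r, c] = v
        return out

The decoder-side unpacking walks the two zigzag orders in the opposite direction, which is why the shape information must be transmitted alongside the coefficients.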

References
