Object-oriented video coding algorithm for very low bit-rate system


Liang-Gee Chen, You-Ming Chiu, Tzi-Dar Chiueh, Her-Ming Jong

Department of Electrical Engineering

National Taiwan University

Taipei, Taiwan, R.O.C.

Abstract

In this paper, a video coding method based on region segmentation is proposed. It is designed for very low bit-rate systems such as video-phone and video-conferencing. Instead of block-based coding, we adopt an object-oriented approach, which first calculates the motion vector of each pixel using a modified optical flow approach, then segments the motion field into circumscribed rectangular regions that can be efficiently coded. Simulation results on several typical image sequences show that the proposed algorithm is effective and performs well.

1 Introduction

In many standard video coding systems, motion compensation (MC) with block-oriented motion estimation (ME) algorithms is widely used. However, disadvantages such as the block effect degrade the subjective performance, and block-based coding is inefficient for the kind of motion found in typical video-phone and video-conference applications.

To enhance coding efficiency, several approaches such as model-based coding [1] and analysis-synthesis coding [2] have been proposed. Nevertheless, these methods require complex procedures in computation and system control. Therefore, an efficient and simpler algorithm is still lacking for very low bit-rate applications.

An algorithm proposed years ago, the pel-recursive algorithm (PRA) [3], is a gradient-descent approach that minimizes $|E_x u + E_y v - E_t|$, where $E$ indicates the luminance of the picture, $(u, v)$ is the motion vector viewed from the current frame to the previous frame, and the subscripts $x$ and $y$ represent partial differentiation; for example, $E_x$ indicates $\partial E / \partial x$. PRA makes a very good estimation for each pixel, i.e. the residual error, or the displaced frame difference (DFD), is small.

However, because each vector is calculated locally, a dense and randomly distributed motion field is generated, and it is impossible to transmit these motion vectors efficiently. The optical flow algorithm (OFA) [4] is similar to PRA except that the cost function to be minimized is now

$$\mathcal{E} = \iint \left( (E_x u + E_y v - E_t)^2 + \lambda (u_x^2 + u_y^2 + v_x^2 + v_y^2) \right) dx\,dy. \tag{1}$$

The equation is composed of two terms: the former, called the constraint term, is the same as that in PRA; the latter is a smoothness term.

In this paper, an efficient video coding method based on OFA is proposed. By combining it with region segmentation, the proposed method can provide very low bit-rate coding and still preserve high subjective quality. The derivation of the algorithm from OFA is shown in Sec. 2. Details of the proposed algorithm are described in Sec. 3. Sec. 4 presents the simulation results and some discussion. The conclusion is given in Sec. 5.

2 Solution to Optical Flow

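When the minimum of Eq. (1) is reached, $\delta\mathcal{E}/\delta u = \delta\mathcal{E}/\delta v = 0$. That is, with $\mathcal{E}$ inside the derivatives denoting the integrand of Eq. (1), we have

$$\frac{\partial \mathcal{E}}{\partial u} - \frac{\partial}{\partial x}\frac{\partial \mathcal{E}}{\partial u_x} - \frac{\partial}{\partial y}\frac{\partial \mathcal{E}}{\partial u_y} = 0,$$

$$\frac{\partial \mathcal{E}}{\partial v} - \frac{\partial}{\partial x}\frac{\partial \mathcal{E}}{\partial v_x} - \frac{\partial}{\partial y}\frac{\partial \mathcal{E}}{\partial v_y} = 0.$$

Therefore,

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = \frac{1}{\lambda}(E_x u + E_y v - E_t)E_x,$$

$$\frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} = \frac{1}{\lambda}(E_x u + E_y v - E_t)E_y.$$

By replacing $u_x$ and $u_y$ with $u_{i,j} - u_{i-1,j}$ and $u_{i,j} - u_{i,j-1}$ respectively, we have

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 4\bar{u} - 4u.$$

Similarly,

$$\frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} = 4\bar{v} - 4v.$$

A clearer description of $\bar{u}$ and $\bar{v}$ is shown in Fig. 1(a). Therefore, the motion vector at position $(i,j)$ becomes

$$u_{i,j} = \bar{u} - \frac{E_x \bar{u} + E_y \bar{v} - E_t}{4\lambda + E_x^2 + E_y^2}\,E_x, \tag{2}$$

$$v_{i,j} = \bar{v} - \frac{E_x \bar{u} + E_y \bar{v} - E_t}{4\lambda + E_x^2 + E_y^2}\,E_y. \tag{3}$$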

Figure 1: (a) Weights to compute $\bar{u}$ and $\bar{v}$ (b) Weights to compute $\tilde{u}$ and $\tilde{v}$
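The step from the discretized Laplacian to the update formulas of Eqs. (2) and (3) can be checked with the short derivation below. It is a sketch that assumes $\lambda$ is the smoothness weight of Eq. (1) and that the discretized optimality conditions read $4\lambda(\bar{u} - u) = (E_x u + E_y v - E_t)E_x$, and likewise for $v$. The shorthand $r$ and $\bar{r}$ is introduced here for illustration only.

```latex
\begin{align*}
r &= E_x u + E_y v - E_t, \qquad \bar{r} = E_x \bar{u} + E_y \bar{v} - E_t, \\
u &= \bar{u} - \frac{r E_x}{4\lambda}, \qquad v = \bar{v} - \frac{r E_y}{4\lambda}
   && \text{(discretized conditions solved for } u, v\text{)} \\
r &= \bar{r} - \frac{r\,(E_x^2 + E_y^2)}{4\lambda}
   && \text{(substitute } u, v \text{ into the definition of } r\text{)} \\
r &= \frac{4\lambda\,\bar{r}}{4\lambda + E_x^2 + E_y^2}, \\
u &= \bar{u} - \frac{\bar{r}}{4\lambda + E_x^2 + E_y^2}\,E_x, \qquad
v = \bar{v} - \frac{\bar{r}}{4\lambda + E_x^2 + E_y^2}\,E_y .
\end{align*}
```

The right-hand sides depend only on the neighbourhood averages, which is what allows the formulas to be used as an iteration.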

In order to code the motion vectors more efficiently, we need a motion field with less spatial variation. Instead of using an extra estimation filtering scheme [5], we propose a modified cost function:

$$\mathcal{E} = \iint \left( (E_x u + E_y v - E_t)^2 + \alpha (u_x^2 + u_y^2 + v_x^2 + v_y^2) + \beta (u_{xy}^2 + v_{xy}^2) \right) dx\,dy. \tag{4}$$

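The additional third term penalizes convoluted edges. In other words, the cost function favors motion fields with smoother edges. To minimize the cost function, the following relation must be satisfied:

$$\frac{\partial \mathcal{E}}{\partial u} - \frac{\partial}{\partial x}\frac{\partial \mathcal{E}}{\partial u_x} - \frac{\partial}{\partial y}\frac{\partial \mathcal{E}}{\partial u_y} + \frac{\partial^2}{\partial x\,\partial y}\frac{\partial \mathcal{E}}{\partial u_{xy}} = 0.$$

By similar derivation, we have

$$4\left(\alpha\bar{u} + \beta\tilde{u} - (\alpha + \beta)u\right) = (E_x u + E_y v - E_t)E_x,$$

$$4\left(\alpha\bar{v} + \beta\tilde{v} - (\alpha + \beta)v\right) = (E_x u + E_y v - E_t)E_y,$$

where $\tilde{u}$ and $\tilde{v}$ are the weighted sums shown in Fig. 1(b). Let

$$\alpha\bar{u} + \beta\tilde{u} = (\alpha + \beta)u^*, \qquad \alpha\bar{v} + \beta\tilde{v} = (\alpha + \beta)v^*,$$

then

$$u_{i,j} = u^* - \frac{E_x u^* + E_y v^* - E_t}{4(\alpha + \beta) + E_x^2 + E_y^2}\,E_x, \tag{5}$$

$$v_{i,j} = v^* - \frac{E_x u^* + E_y v^* - E_t}{4(\alpha + \beta) + E_x^2 + E_y^2}\,E_y. \tag{6}$$

Eqs. (5) and (6) have a form similar to that of Eqs. (2) and (3).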
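One pass of the update in Eqs. (5) and (6) could be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The weights of Fig. 1(b) are not recoverable from the text, so `tilde` assumes the weights obtained by discretizing the mixed-derivative term (1/2 on the four edge neighbours, -1/4 on the four diagonal neighbours). Border pixels are assumed to reuse their nearest neighbour. The names `mofa_step`, `bar`, `tilde` and `clamp` are hypothetical.

```python
def clamp(k, n):
    """Clamp index k into [0, n-1] (assumed border rule: reuse nearest pixel)."""
    return max(0, min(n - 1, k))


def mofa_step(u, v, Ex, Ey, Et, alpha, beta):
    """One pass of Eqs. (5)-(6) over the whole frame.

    u, v, Ex, Ey, Et are equally sized 2-D lists. Every pixel is updated from
    the previous iterate only, so the pixels are independent of each other.
    """
    h, w = len(u), len(u[0])

    def at(f, i, j):
        return f[clamp(i, h)][clamp(j, w)]

    def bar(f, i, j):
        # Mean of the four edge neighbours (Laplacian ~ 4*bar - 4*f).
        return (at(f, i - 1, j) + at(f, i + 1, j)
                + at(f, i, j - 1) + at(f, i, j + 1)) / 4.0

    def tilde(f, i, j):
        # ASSUMED Fig. 1(b) weights: 1/2 on edge neighbours, -1/4 on diagonals.
        edges = (at(f, i - 1, j) + at(f, i + 1, j)
                 + at(f, i, j - 1) + at(f, i, j + 1))
        diags = (at(f, i - 1, j - 1) + at(f, i - 1, j + 1)
                 + at(f, i + 1, j - 1) + at(f, i + 1, j + 1))
        return edges / 2.0 - diags / 4.0

    new_u = [[0.0] * w for _ in range(h)]
    new_v = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Blend the two neighbourhood sums into u*, v*.
            u_star = (alpha * bar(u, i, j) + beta * tilde(u, i, j)) / (alpha + beta)
            v_star = (alpha * bar(v, i, j) + beta * tilde(v, i, j)) / (alpha + beta)
            ex, ey, et = Ex[i][j], Ey[i][j], Et[i][j]
            gain = (ex * u_star + ey * v_star - et) / (
                4.0 * (alpha + beta) + ex * ex + ey * ey)
            new_u[i][j] = u_star - gain * ex
            new_v[i][j] = v_star - gain * ey
    return new_u, new_v
```

With `beta = 0` the step reduces to the form of Eqs. (2) and (3) with `alpha` in the role of the smoothness weight. On a frame with constant gradients the iterate converges to the vector that satisfies the constraint term exactly.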

3 The Proposed Coding Algorithm

The proposed algorithm, called the modified optical flow algorithm (MOFA), consists of two phases: the first finds the interframe motion flow $(u, v)$; the second segments the flow and codes the information.

To obtain the motion flow, we apply Eqs. (5) and (6) iteratively. Note that in each iteration the motion vector of each pixel is calculated independently, so all pixels can be updated in parallel.

After the motion field is obtained, we segment the motion vectors and record them in a coding linked list (CLL). Each element in the linked list represents one “object” in the frame. An object is defined as a region whose pixels share the same nonzero motion vector. To encode an object, we have to record its shape and its motion vector. First, determine the circumscribed rectangle of the “object” by traversing its contour, as shown in Fig. 2(a), and mark the pixels in the rectangle with the nonzero vector as “1” and the others as “0”. Next, transform the bit-map into a bit-stream by a spiral-like scan (Fig. 2(b)), which tends to generate runs of contiguous 1s or 0s that are as long as possible. For example, the bit-stream in Fig. 2(a) is

000011100001!000011110011l11111111011101111111~1~.

Therefore, the information of an object contains the location and size of the rectangle, the vector, and the bit-stream, as shown in Fig. 3. The whole encoding algorithm is listed below:

1. Clear the CLL and initialize all $(u, v)$ to zero.

2. For $k = 1$ to the number of iterations: for each $(i, j)$ in raster-scan order, find $(u_{i,j}, v_{i,j})$ by Eqs. (5) and (6).

3. Check the pel. If its motion vector is nonzero and it does not belong to any traversed object, go to step 4; otherwise check the next pel.

4. Find the circumscribed rectangle of this “object” and append the information of the object to the CLL.

5. Go to step 3 until all pixels in the current frame are scanned.

6. Transmit the whole CLL, possibly after a further lossless coding scheme such as VLC or RLC.
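Steps 3 to 5 above can be sketched as a raster scan that grows each object and records its circumscribed rectangle. The sketch makes several assumptions that are not in the text: vectors are already quantized to integer pairs, an object is a 4-connected region, and a flood fill stands in for the contour traversal. A plain Python list stands in for the linked list, and `CodedObject` and `segment_objects` are hypothetical names.

```python
from collections import deque, namedtuple

# One CLL element: rectangle location and size, the shared vector, the shape map.
CodedObject = namedtuple("CodedObject", "top left height width vector bitmap")


def segment_objects(field):
    """field[i][j] is an integer (u, v) tuple; returns the list of coded objects."""
    h, w = len(field), len(field[0])
    seen = [[False] * w for _ in range(h)]
    cll = []
    for i in range(h):                       # raster scan over the frame (step 3)
        for j in range(w):
            vec = field[i][j]
            if vec == (0, 0) or seen[i][j]:
                continue                     # stationary or already traversed
            # Flood fill over 4-connected pixels that share this vector.
            seen[i][j] = True
            queue, pixels = deque([(i, j)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and field[ny][nx] == vec):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Circumscribed rectangle and its 0/1 shape map (step 4).
            top = min(y for y, _ in pixels)
            left = min(x for _, x in pixels)
            height = max(y for y, _ in pixels) - top + 1
            width = max(x for _, x in pixels) - left + 1
            bitmap = [[0] * width for _ in range(height)]
            for y, x in pixels:
                bitmap[y - top][x - left] = 1
            cll.append(CodedObject(top, left, height, width, vec, bitmap))
    return cll
```

Each pixel is visited once by the scan and at most once by a fill, so the cost is linear in the number of pixels.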

In the proposed algorithm, the residual error is not sent. Consequently, to enhance the performance we employ an extra stationary region test. If the DFD with MC is larger than the non-compensated difference, the motion vector of the pixel is set to zero. In other words, if $\mathrm{DFD}(u, v) > \mathrm{DFD}(0, 0)$, let $(u, v) = (0, 0)$.
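The stationary region test can be sketched per pixel as below. The sketch assumes integer vectors, takes $u$ as the horizontal and $v$ as the vertical displacement into the previous frame, uses the absolute luminance difference as the DFD, and clamps displaced positions to the frame. None of these conventions is fixed by the text, and `dfd` and `stationary_test` are hypothetical names.

```python
def dfd(cur, prev, i, j, u, v):
    """Absolute displaced frame difference at pixel (i, j) for vector (u, v)."""
    h, w = len(prev), len(prev[0])
    y = max(0, min(h - 1, i + v))   # clamp the displaced row to the frame
    x = max(0, min(w - 1, j + u))   # clamp the displaced column to the frame
    return abs(cur[i][j] - prev[y][x])


def stationary_test(cur, prev, field):
    """Zero every vector whose compensated error exceeds the uncompensated one."""
    out = []
    for i, row in enumerate(field):
        out_row = []
        for j, (u, v) in enumerate(row):
            if dfd(cur, prev, i, j, u, v) > dfd(cur, prev, i, j, 0, 0):
                out_row.append((0, 0))      # compensation made things worse
            else:
                out_row.append((u, v))
        out.append(out_row)
    return out


prev = [[10, 20, 30]]
cur = [[10, 10, 30]]
field = [[(0, 0), (-1, 0), (-1, 0)]]
print(stationary_test(cur, prev, field))    # [[(0, 0), (-1, 0), (0, 0)]]
```

In the example the middle pixel keeps its vector because compensation removes the error, while the last pixel is reset because its uncompensated difference is already zero.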

Figure 3: Data structure of the coding linked list (fields of each element: location, size, vector, bit-stream)
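The element layout of Fig. 3 and the spiral scan that produces its bit-stream field might look like the sketch below. The exact scan order of Fig. 2(b) is not recoverable from the text, so a clockwise inward spiral starting at the top-left corner is assumed. `CLLRecord`, `spiral_scan`, `make_record` and `run_lengths` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class CLLRecord:
    location: tuple   # (top, left) corner of the circumscribed rectangle
    size: tuple       # (height, width) of the rectangle
    vector: tuple     # motion vector shared by the object's pixels
    bitstream: str    # shape bit-map after spiral scanning


def spiral_scan(bitmap):
    """Read a rectangular 0/1 bitmap in clockwise inward spiral order (assumed)."""
    top, bottom = 0, len(bitmap) - 1
    left, right = 0, len(bitmap[0]) - 1
    bits = []
    while top <= bottom and left <= right:
        bits.extend(bitmap[top][j] for j in range(left, right + 1))        # top row
        bits.extend(bitmap[i][right] for i in range(top + 1, bottom + 1))  # right column
        if top < bottom and left < right:
            bits.extend(bitmap[bottom][j] for j in range(right - 1, left - 1, -1))  # bottom row
            bits.extend(bitmap[i][left] for i in range(bottom - 1, top, -1))        # left column
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return "".join(str(b) for b in bits)


def make_record(top, left, vector, bitmap):
    """Pack one object into the four fields of a CLL element."""
    size = (len(bitmap), len(bitmap[0]))
    return CLLRecord((top, left), size, vector, spiral_scan(bitmap))


def run_lengths(bits):
    """(symbol, count) pairs, the kind of input a run-length coder would take."""
    return [(sym, len(list(run))) for sym, run in groupby(bits)]


shape = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
record = make_record(2, 5, (1, -1), shape)
print(record.bitstream)               # 0000000000001111
print(run_lengths(record.bitstream))  # [('0', 12), ('1', 4)]
```

For this shape a raster scan would give `0000011001100000`, which has five runs instead of two. The shape is chosen only to make the run-length effect visible.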

The parameters $\alpha$ and $\beta$ must be chosen properly. If $\alpha$ is too large, the motion vectors tend to group together, which degrades the performance badly. On the other hand, if $\alpha$ is too small, the motion vectors are distributed randomly and are difficult to code. The parameter $\beta$ should be chosen much smaller than $\alpha$; if it is larger than $\alpha$, an oscillating motion field may be generated.

4 Simulation Result and Discussion

Similar vectors in the motion field tend to group together because of the smoothness term in Eqs. (5) and (6). Fig. 4 shows the amplitude of the motion vectors between two successive frames in the sequence Claire. Most of the frame is stationary, while the parts to which human eyes are sensitive, such as the mouth and the eyes, are coded accurately because their vectors are calculated precisely. The picture reconstructed with our approach is therefore of high quality.

Performances and bit-rates of the sequence Claire using different approaches are shown in Fig. 5 and Fig. 6, respectively. OFA16 denotes OFA with 16 iterations; in the same manner, MOFA16 and MOFA8 denote MOFA with 16 and 8 iterations, respectively. According to the results, MOFA16 and OFA16 are almost the same in PSNR, while the former requires less bandwidth when the bit-rate is relatively high. It is worth noting that MOFA8 degrades a little in PSNR but requires a much lower bit-rate. Therefore, we choose this approach as the final proposed algorithm.

Figure 4: The amplitude of motion flow of Claire

Figure 5: The performance of sequence Claire (horizontal axis: frame number)

Table 1 lists the average PSNRs of 4 sequences for the proposed algorithm in comparison with the 16 × 16 full-search block matching algorithm (FSBMA). We use MOFA with 8 iterations ($\alpha = 1024$, $\beta = 64$). The simulation results show that MOFA is better than or close to FSBMA in PSNR except for susie, which is a sequence with fast and global motion. Nevertheless, our reconstructed images may look better than those of FSBMA subjectively because of the object-oriented characteristic of the proposed approach. More iterations would raise the PSNR but may entail a higher bit-rate.
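The text does not define PSNR. Assuming the usual definition for 8-bit luminance (peak value 255), the per-frame figure could be computed as in the sketch below. The function name `psnr` and the list-of-lists frame format are illustrative choices.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    squared_error = 0.0
    count = 0
    for row_o, row_r in zip(original, reconstructed):
        for a, b in zip(row_o, row_r):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf              # identical frames
    return 10.0 * math.log10(peak * peak / mse)
```

An average over a sequence, as reported in Table 1, would then be the mean of the per-frame values.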

The average bit-rates (bits per pixel) are also listed in Table 1. The results for miss and Claire are very good, while that for salesman is moderate. However, the result for the sequence susie is not satisfactory, because the distribution of the motion flow is sparse due to the zooming and global movement in this sequence. Furthermore, because the initial guesses are zero, 8 iterations are not enough to find the true motion vectors. Besides, our algorithm is vulnerable to sequences with highly detailed or fast motion. Highly detailed motion implies so many local movements with different vectors that a large amount of side information is needed for the many segmented regions. On the other hand, the motion of fast-moving objects is difficult to calculate within a limited number of iterations, and this causes poor performance.

Figure 6: The bit-rate of sequence Claire (horizontal axis: frame number)

MOFA8 (bpp) | 0.133 | 0.040 | 0.014 | 0.023

Table 1: Average performance and bit-rate for different sequences

Due to the smoothness term, the motion vectors are distributed much more regularly than those in PRA, so we can segment them into “objects” and code them efficiently. The additional penalty term results in fewer and more regular regions without degrading the performance, because a little shape distortion is not noticed by human eyes. In addition, the larger residual errors occur on the edges of moving objects, where they are neglected by human eyes. Therefore, the subjective quality of the frames reconstructed by the proposed algorithm should be better than that of block-based methods even if the PSNRs are the same. Fig. 7 shows the reconstructed image of susie using MOFA. Although the PSNR is as low as 28.5 dB, it looks almost as good as the original image shown in Fig. 8.

Figure 7: The reconstructed image of susie with PSNR = 28.5 dB

Figure 8: The original image of susie

5 Conclusion

Optical flow was originally applied in image analysis because it makes a precise estimation of object motion. The proposed algorithm is inspired by the ease with which its motion vectors can be segmented. Because of the high quality of the motion-compensated pictures, no residual errors are transmitted. To code the motion vectors efficiently, we also propose a “circumscribed rectangular” segmentation method, and the required bit-rate is quite low.

The proposed approach also provides a multi-functional system. The transmitted information can be used not only for image reconstruction but also for image analysis because it is coded with object orientation, whereas traditional coding schemes such as the block-based method provide much more limited information for analysis. Therefore, to meet different requirements, all we have to do at the transmitter is adjust some parameters within the same architecture.

When the proposed algorithm is applied to a video-phone of 180 × 120 pixels at 10 frames per second, a bit-rate of 0.04 bpp results in a transmission rate of 8.64 Kbps, which is feasible for the current telephone network. Considering wider applications in real-world systems, some improvements remain for further study. One is adaptability: to provide a more uniform bit-rate that does not depend severely on the content of the frames, bit-rate control is needed; that is, the system should be able to adjust the amount of coded information. The other is complexity: although the proposed algorithm is much simpler than previously reported object-oriented approaches, the computation is still too heavy for practical applications. To reduce the system cost, a simplified algorithm that is feasible for hardware implementation while preserving the good performance is being developed.
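The transmission-rate figure quoted in this section follows from multiplying frame size, frame rate and bits per pixel. A minimal check, using a hypothetical helper name:

```python
def transmission_rate_bps(width, height, frames_per_second, bits_per_pixel):
    """Bits per second needed for the given frame format and coding rate."""
    return width * height * frames_per_second * bits_per_pixel


rate = transmission_rate_bps(180, 120, 10, 0.04)
print(f"{rate / 1000:.2f} Kbps")   # 8.64 Kbps
```

The same helper applied to the other bit-rates of Table 1 shows how strongly the channel requirement depends on the sequence content.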

References

[1] Y. Nakaya, Y.C. Chuah, and H. Harashima, “Model-based/waveform hybrid coding for videophone images”, ICASSP, Vol. 4, pp. 2741-2744, 1991.

[2] H. Schiller, M. Hotter, “Investigations on color coding in an object-oriented analysis-synthesis coder”, Signal Processing: Image Communication, Vol. 5, pp. 319-326, 1993.

[3] A. Netravali, J. Robbins, “Motion compensated television coding: Part I”, Bell Syst. Tech. J., Vol. 58, pp. 631-670, March 1979.

[4] B.K.P. Horn, B.G. Schunck, “Determining optical flow”, Artificial Intelligence, Vol. 17, pp. 185-203, 1981.

[5] A.V. Brandt, “Object tracking and motion estimation with a moving camera”, Proc. ASST, 1990,