Building a pseudo object-oriented very low bit-rate video coding system from a modified optical flow motion estimation algorithm


Chung-Wei Ku, You-Ming Chiu, Liang-Gee Chen, and Yung-Pin Lee

DSP/IC Design Lab., Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, R.O.C.

email: william@video.ee.ntu.edu.tw

ABSTRACT

In this paper, a modified optical flow algorithm (MOFA) is proposed for the development of a very low bit-rate video coding system. An additional edge-preserving constraint and a pyramid approach are suggested to generate an accurate motion field and to reduce the possibility of being trapped in a local minimum, respectively. Post-processing schemes are also designed to eliminate the problems in special regions. Compared with other motion estimation algorithms, the proposed method gives more accurate estimation in terms of both PSNR and subjective quality. An arbitrarily shaped transform of the motion field is then used to further remove spatial redundancy. According to preliminary simulation results, most of the test sequences are compressed to 16 Kbps or lower with excellent picture quality.

1. INTRODUCTION

Although H.261 defines a standard for videophone and videoconferencing at p x 64 Kbps, a lower bit-rate is needed to use the current telephone network. Because the channel bandwidth is small, a very high compression ratio must be achieved. According to the modem standard V.34, the transmission rate is 28.8 Kbps. As early as May 1991, MPEG raised the issue of an audio-visual standard targeted at bit-rates of 4.8-64 Kbps, motivated by limited channel bandwidth and limited storage capacity. This effort was approved in July 1993 under the MPEG-4 nickname and the title “Very Low Bit-Rate Coding of Moving Pictures and Associated Audio”. Recently, ITU-T announced a draft on “Video Coding for Narrow Telecommunication Channels at below 64 Kbit/s”, or H.263. It is still essentially based on a block-wise coding algorithm. In fact, it is a widespread belief that only substantial innovations in coding algorithms will produce a lasting standard that can satisfy users.

0-7803-3192-3/96 $5.00 ©1996 IEEE

To compress video signals efficiently, motion compensation (MC) is widely adopted in many standards, where a fixed-size block matching algorithm is applied. However, for very low bit-rate video coding, visible block effects degrade the performance. To tackle this problem, many quite different approaches have been proposed, such as model-based coding [1] and analysis-synthesis coding algorithms [2]. Basically speaking, object-oriented coding is free of block effects and its visual performance is good enough. In addition, it is an efficient approach, since the head and shoulders are the major parts on screen in conversation applications. It also provides more meaningful information than traditional waveform coding schemes. In this paper, we propose a modified optical flow algorithm (MOFA) for motion estimation. Compared with other motion estimation algorithms, MOFA gives an accurate motion field that is efficient for video coding or image understanding. Because the motion field generated by MOFA is homogeneous, the “pseudo objects” are extracted and segmented easily. An arbitrarily shaped transform (AST) is then applied to these objects to achieve further data compression.

2. MODIFIED OPTICAL FLOW ALGORITHM

Although the optical flow algorithm can generate a reasonable motion field, some drawbacks still exist. To code the motion vectors more efficiently, we would like to generate a motion field with less spatial variation but still with good quality. Instead of using an extra estimation filtering scheme, we propose a modified cost function, which is shown as follows:


The additional third term indicates the penalty on convoluted edges. In other words, the cost function favors motion fields with straight edges. To minimize the cost function, the following relation must be satisfied:

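$$\frac{\partial \varepsilon}{\partial u} - \frac{\partial}{\partial x}\frac{\partial \varepsilon}{\partial u_x} - \frac{\partial}{\partial y}\frac{\partial \varepsilon}{\partial u_y} + \frac{\partial^2}{\partial x\,\partial y}\frac{\partial \varepsilon}{\partial u_{xy}} = 0.$$

By similar derivation, we have

$$4\left(\alpha\bar{u} + \beta\tilde{u} - (\alpha+\beta)u\right) = (E_x u + E_y v - E_t)E_x,$$

$$4\left(\alpha\bar{v} + \beta\tilde{v} - (\alpha+\beta)v\right) = (E_x u + E_y v - E_t)E_y,$$

where $\bar{u}$, $\tilde{u}$, $\bar{v}$, $\tilde{v}$ indicate local averages. Let

$$\alpha\bar{u} + \beta\tilde{u} = (\alpha+\beta)u^*, \qquad \alpha\bar{v} + \beta\tilde{v} = (\alpha+\beta)v^*,$$

then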
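Substituting u* and v* turns the two relations into a pair of linear equations in u and v at each pixel, which can be solved in closed form in the same way as in the Horn-Schunck iteration [4]. The Python sketch below is only an illustration under stated assumptions: it takes the relations in the form 4(α+β)(u* - u) = (E_x u + E_y v - E_t)E_x and 4(α+β)(v* - v) = (E_x u + E_y v - E_t)E_y, and the function name `mofa_update` and its arguments are hypothetical, not taken from the authors' implementation.

```python
def mofa_update(u_star, v_star, ex, ey, et, alpha, beta):
    """Solve the per-pixel 2x2 linear system for (u, v).

    Assumed relations (sign convention of E_t is an assumption):
        kappa * (u_star - u) = (ex*u + ey*v - et) * ex
        kappa * (v_star - v) = (ex*u + ey*v - et) * ey
    with kappa = 4 * (alpha + beta) > 0.
    """
    kappa = 4.0 * (alpha + beta)
    # Data-term residual evaluated at the weighted local averages.
    r_star = ex * u_star + ey * v_star - et
    denom = kappa + ex * ex + ey * ey
    u = u_star - ex * r_star / denom
    v = v_star - ey * r_star / denom
    return u, v
```

In an iterative scheme, u* and v* would be recomputed from the neighboring vectors of the previous iteration before each call.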

Another problem is that the solution may be trapped in a local minimum because of the gradient descent approach. To reduce the probability of this situation, a pyramid approach is suggested. Each higher level of the pyramid is composed of subsampled pixels of the level below it. Motion estimation is executed from the top level to the bottom level. The initial guess on each level is the result of its adjacent higher level. The scale of adjustment at higher levels is larger than that at lower ones. At higher levels, the local details are filtered out, so the possibility of being trapped in a local minimum is reduced; at lower levels, the motion vectors are adjusted locally to make the estimation more precise. Therefore, at lower levels we do not need to execute too many iterations. Generally, most local minima are removed while the global minimum is preserved.
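The coarse-to-fine procedure can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, subsampling is done by keeping every second pixel without pre-filtering, and doubling the vectors when moving to a finer level is an assumption about how the "initial guess" is scaled. The per-level estimator is passed in as `refine`.

```python
def subsample(img):
    """Keep every second pixel in each direction (no pre-filtering assumed)."""
    return [row[::2] for row in img[::2]]

def build_pyramid(img, levels):
    """Return [bottom, ..., top]; each level subsamples the one below it."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(subsample(pyramid[-1]))
    return pyramid

def expand_field(field, height, width):
    """Project a coarse motion field onto the next finer level.

    Each fine pixel inherits the vector of its coarse parent, doubled
    because the pixel grid is twice as dense (an assumption).
    """
    return [[(2 * field[y // 2][x // 2][0], 2 * field[y // 2][x // 2][1])
             for x in range(width)] for y in range(height)]

def coarse_to_fine(prev, curr, levels, refine):
    """Estimate motion from the top level down to the bottom level.

    refine(prev_level, curr_level, initial_field, level) is any per-level
    estimator; the result of each level seeds the next lower one.
    """
    prev_pyr = build_pyramid(prev, levels)
    curr_pyr = build_pyramid(curr, levels)
    field = [[(0.0, 0.0) for _ in row] for row in prev_pyr[-1]]
    for level in range(levels - 1, -1, -1):
        p, c = prev_pyr[level], curr_pyr[level]
        if level != levels - 1:
            field = expand_field(field, len(p), len(p[0]))
        field = refine(p, c, field, level)
    return field
```

Because the coarse levels already supply a good initial guess, `refine` can run fewer iterations at the lower levels, which matches the remark above.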

For gradient descent approaches, large prediction errors occur on edges because of the large gradients there, so the motion vectors in edge regions must be processed in an alternative way. If the compensated error is too large, the motion vector of the pixel is replaced by the motion vector of one of its neighbors. Figure 1 illustrates an example: suppose v0 denotes the motion vector of the current pixel, and v1, v2, v3, v4 are the motion vectors of its 4 neighbors. If the residual error of v0 is larger than a predefined threshold, the 4 residues r1, r2, r3, r4, which are the residual errors obtained when v1, v2, v3, v4 are applied to the current pixel, are compared with one another. If r1 is the minimum, the motion vector of the current pixel is replaced by v1. Since the original pixel with motion vector v1 is not in the edge region, its estimated vector should in general be more accurate than the others. It is therefore reasonable that many motion vectors in edge regions are replaced by better vectors found in the homogeneous regions. We find that this substitution scheme alleviates most of the performance degradation due to edges.

On the other hand, in flat stationary regions such as the background, there may be some non-zero motion vectors close to the moving edges, because the motion vectors of the moving part “propagate” to neighboring stationary pixels through the smoothness constraint. These non-zero vectors must be filtered out to generate a more homogeneous and practical motion field. We convolve a large Gaussian window with the residual error; if the result is less than the window convolved with the frame difference (the error without motion compensation), the motion vector of the central pixel is set to zero. In other words, given a Gaussian window G and residual error R, if G · R < G · R0, where R0 is the frame difference, then the motion vector of the central pixel of G is forced to zero because the region around the pixel is stationary.
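The edge substitution rule described above can be sketched in Python as follows. The function name, the threshold value, and the example vectors are hypothetical; only the residues are taken from Figure 1, where they read r0 = 12, r1 = 3, r2 = 11, r3 = 10, r4 = 8.

```python
def substitute_vector(v0, r0, neighbors, threshold):
    """Return the motion vector to keep for the current pixel.

    v0, r0:    current vector and its compensated residual
    neighbors: list of (vector, residual) pairs, where each residual is the
               error obtained when that neighbor's vector is applied to the
               current pixel
    threshold: hypothetical residual level above which v0 is replaced
    """
    if r0 <= threshold:
        return v0  # residual is acceptable, keep the original vector
    best_vector, _ = min(neighbors, key=lambda pair: pair[1])
    return best_vector

# Residues as read from Figure 1; the vectors themselves are made up.
neighbors = [((2, 0), 3), ((0, 0), 11), ((0, 1), 10), ((1, 1), 8)]
chosen = substitute_vector((0, 0), 12, neighbors, threshold=10)
```

Here `chosen` is the first neighbor's vector, because its residual is the smallest of the four.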

We call all of the above the modified optical flow algorithm (MOFA). To overcome the drawbacks of optical flow algorithms, MOFA uses an extra edge-preserving constraint and a pyramid approach to generate an edge-preserving motion field and to alleviate the risk of local minima. Furthermore, the post-processing of edges and stationary regions makes the motion field more realistic for both video coding and object extraction, as the later simulation results show.

3. SIMULATION RESULTS OF THE MOTION ESTIMATION ALGORITHMS

To understand how the mentioned motion estimation algorithms perform, the 176 x 144 sequence “Miss America” at 10 frames/sec is selected as the test sequence. Four motion estimation algorithms, namely the block matching algorithm (BMA), the pel-recursive algorithm (PRA), the optical flow algorithm (OFA), and the modified optical flow algorithm (MOFA), are applied to successive frames to generate the motion fields. The block matching algorithm is a full search over 16 x 16 blocks with a search range of -8 to +7. The motion vectors within a block are obviously all the same for the block matching algorithm. For the pel-recursive algorithm, the distribution of motion vectors is too random to be encoded efficiently. In addition, most of the motion vectors around edge regions are trapped in local minima. The original optical flow algorithm with a smoothness constraint can generate a more reasonable motion field, but some vectors are still trapped in local minima. Besides, the estimated motion vectors seem not to be large enough for such fast movement. The proposed modified optical flow algorithm with an edge-preserving constraint and pyramid-search approach generates the best motion field: the movement of the object can be observed and the motion field can be encoded most efficiently, as the example in Figure 2 shows. Table 1 lists a summary of these algorithms. For all the test sequences, MOFA gives the best estimation, and the generated motion field is efficient for both coding and understanding. In the sequences “Elsa” and “Caisy”, which are also “head and shoulders” sequences with small motion, the movement is not very sharp, so the improvement of MOFA over OFA is only about 1 dB. However, for sequences with large motion such as “Miss America” and “Jian”, the improvement in performance is significant.
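For reference, the full-search block matching baseline used in this comparison can be sketched in Python as below. The block size (16) and search range (-8 to +7) follow the text; the sum of absolute differences (SAD) as the matching criterion and all function names are assumptions, since the matching criterion is not stated here.

```python
def sad(prev, curr, by, bx, dy, dx, size):
    """Sum of absolute differences between a block of curr and a displaced block of prev."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(curr[by + y][bx + x] - prev[by + y + dy][bx + x + dx])
    return total

def full_search(prev, curr, by, bx, size=16, lo=-8, hi=7):
    """Exhaustively test every displacement in [lo, hi] x [lo, hi].

    Returns ((dy, dx), cost) for the block of curr whose top-left corner is
    (by, bx). Displacements that leave the previous frame are skipped.
    """
    height, width = len(prev), len(prev[0])
    best, best_cost = (0, 0), sad(prev, curr, by, bx, 0, 0, size)
    for dy in range(lo, hi + 1):
        for dx in range(lo, hi + 1):
            inside = (0 <= by + dy and by + dy + size <= height and
                      0 <= bx + dx and bx + dx + size <= width)
            if not inside:
                continue
            cost = sad(prev, curr, by, bx, dy, dx, size)
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost
```

Every pixel of the block receives the single vector returned by `full_search`, which is why the block matching motion field is piecewise constant.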

4. PROPOSED PSEUDO OBJECT-ORIENTED VIDEO CODING SYSTEM

The first frame is encoded in intra mode; this part is similar to H.261 and is not displayed in the figure. For inter-frame coding, motion estimation between the current and previous frames is performed with the modified optical flow algorithm. As previously mentioned, the proposed MOFA generates a dense motion field. Because of the homogeneity of the motion field, we can easily segment it into several “pseudo objects”; in fact, there is only one major “pseudo object” in our applications, and we call this object the Motion Compensated Object (MCO). Of course, some areas on screen cannot be encoded exactly by motion compensation. These areas are named Motion Off Objects (MFOs). The coding scheme for MFOs is still under development.

All the motion vectors generated by MOFA are segmented into regions, which are further compressed by the arbitrarily shaped transform (AST) [6]. AST is very similar to the DCT except that the transformed region is not restricted to rectangles. The price paid is that information about the shape must be transmitted, because the transformation kernels depend on the shape of the region. The operation of AST includes: find the circumscribed rectangle of the encoded region, orthogonalize the transformation kernels according to the shape of the region, and apply AST to the x and y parts of the motion field respectively. Finally, the shape parameters and the two groups of transformed coefficients are sent to the variable length coder for lossless compression. A feedback loop reconstructs the picture for synchronization with the decoder. Gathering these coded elements, a control buffer arranges the data stream and sets the priority flag for each object. In the decoder, the reverse of the above operations is applied.

Because the proposed coding method is based on applying AST to the motion field of the “pseudo object” extracted by MOFA, the whole system is called a “pseudo object-oriented video coding system” [5]. However, it is different from previous object-oriented approaches. Currently the whole system is written in C and built in the X-Window environment. We developed all the interfaces and made the system user-friendly and interactive. According to the current simulation results of our system, we can easily compress the test sequences to 16 Kbps or even lower while the quality is still maintained. We are now trying to improve the operations in AST and the processing of MFOs. The preliminary results indicate that the proposed methods will be very useful for very low bit-rate video coding applications. Table 2 gives the average performance of the proposed system for several test sequences.
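The AST steps listed in this section (circumscribed rectangle, shape-dependent orthogonalization of the kernels, separate transforms for the x and y parts of the motion field) can be sketched in Python as follows. This is a minimal illustration in the spirit of [6], not the system's implementation: starting from DCT cosine kernels, ordering them by increasing frequency, using Gram-Schmidt for the orthogonalization, and all function names are assumptions.

```python
import math

def dot(a, b):
    return sum(s * t for s, t in zip(a, b))

def dct_kernel(p, q, height, width, pixels):
    """2-D DCT cosine kernel (p, q) of the bounding box, sampled on the region only."""
    return [math.cos((2 * y + 1) * p * math.pi / (2 * height)) *
            math.cos((2 * x + 1) * q * math.pi / (2 * width))
            for (y, x) in pixels]

def shape_adapted_basis(pixels, tol=1e-9):
    """Orthonormal basis for a region given as a list of (y, x) pixels."""
    y0 = min(y for y, _ in pixels)
    x0 = min(x for _, x in pixels)
    height = max(y for y, _ in pixels) - y0 + 1   # circumscribed rectangle
    width = max(x for _, x in pixels) - x0 + 1
    local = [(y - y0, x - x0) for y, x in pixels]
    order = sorted(((p, q) for p in range(height) for q in range(width)),
                   key=lambda pq: (pq[0] + pq[1], pq[0]))
    basis = []
    for p, q in order:
        vec = dct_kernel(p, q, height, width, local)
        for b in basis:                           # Gram-Schmidt step
            c = dot(vec, b)
            vec = [v - c * t for v, t in zip(vec, b)]
        norm = math.sqrt(dot(vec, vec))
        if norm > tol:                            # skip kernels dependent on earlier ones
            basis.append([v / norm for v in vec])
        if len(basis) == len(pixels):
            break
    return basis

def ast_forward(values, basis):
    """Transform coefficients of one motion component over the region."""
    return [dot(values, b) for b in basis]

def ast_inverse(coeffs, basis):
    """Reconstruct the region values from the coefficients."""
    return [sum(c * b[i] for c, b in zip(coeffs, basis))
            for i in range(len(basis[0]))]
```

The x and y components of the motion vectors in a region would each be passed through `ast_forward` with the same basis. Since the basis depends only on the pixel list, the decoder can rebuild it from the transmitted shape, which is why the shape information has to be sent.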

5. CONCLUSION

In this paper, we propose a modified optical flow based motion estimation algorithm. To eliminate the block effect of block matching approaches, a pixel-based motion estimation algorithm is suggested. To avoid the weaknesses of the other pixel-based algorithms, we design several schemes in our modified optical flow algorithm. The suggested edge-preserving term in the cost function gives a more realistic motion field. The pyramid approach reduces the possibility of being trapped in a local minimum. All these strategies make the estimation faster and more accurate, even for large movements. We also designed a post-processing method which improves the accuracy of motion vectors in both edge and smooth regions. Compared with other motion estimation algorithms, its performance is the best in terms of both subjective quality and PSNR. Besides, the generated motion field is more practical for video coding or image understanding.

In the proposed system, MOFA is chosen as the motion estimation algorithm to remove the temporal redundancy in the video sequence. The generated motion field is segmented into regions, to which the arbitrarily shaped transform is applied to remove the spatial redundancy in the motion field. For conversation applications, there is usually only a single object on screen and most of the contents on screen are well compensated by the above method; this “pseudo object” is called the motion compensated object (MCO). The processing of MFOs should be further improved by using some VQ methods. Generally speaking, our system is a “pseudo object-oriented” approach for very low bit-rate video coding, but it is different from the object-oriented methods. Since the patterns that appear on screen are less restricted in our system, we believe the proposed system is more suitable and practical for videophone or videoconferencing.

In the future, we will combine this motion estimation algorithm with the AST technique, which is currently being modified and improved, to achieve more data compression. To optimize the system, it is interesting to investigate the possibility of optimizing MOFA and AST together rather than individually. In addition, for some detailed operations such as the actions of the eyes and mouth, several fine compensation methods can also be appended to improve the acceptability of the picture. The short-term goal of our group is to build a prototype of a very low bit-rate video coding system. In the long term, we will try to build a multimedia email system or a multimedia telephone system.

6. REFERENCES

[1] K. Aizawa, H. Harashima and T. Saito, “Model-based analysis-synthesis image coding (MBASIC) system for a person’s face”, Signal Processing: Image Communication, vol. 1, pp. 139-152, 1989.

[2] H. Schiller and M. Hotter, “Investigations on color coding in an object-oriented analysis-synthesis coder”, Signal Processing: Image Communication, vol. 5, pp. 319-326, 1993.

[3] D.R. Walker and K.R. Rao, “Improved pel-recursive motion compensation”, IEEE Trans. Communication, vol. COM-32, pp. 1128-1134, Oct. 1984.

[4] B.K.P. Horn and B.G. Schunck, “Determining optical flow”, Artificial Intelligence, vol. 17, pp. 185-203, 1981.

[5] C.-W. Ku, L.-G. Chen, and Y.-M. Chiu, “A Very Low Bit-Rate Video Coding System based on Optical Flow and Region Segmentation Algorithms”, Proceeding of the SPIE Visual Communication and Image Processing, Taipei, vol. 3, pp. 1318-1327, May 1995.

[6] M. Gilge, “Coding of arbitrarily shaped image segments based on a generalized orthogonal transform”, Signal Processing: Image Communication, vol. 1, pp. 153-180, 1989.

[Figure 1 content: motion field near an edge. Compensated residues: r0 = 12, r1 = 3 (winner), r2 = 11, r3 = 10, r4 = 8.]

Figure 1: Example of vector substitution on the edge (v0 is replaced by v1).


Figure 2: The motion field generated by MOFA.

Table 1: The performance of several algorithms.

Sequence name BMA PRA

Miss America 34.525 37.353 3;.El 36.666 38.751 41.149 42.662 32.705 36.919 38.693 40.856 37.779 39.488 43.019 Elsa Jian Caisy
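Assuming the figures in Table 1 are PSNR values in dB, as the comparison "in terms of PSNR" in the text suggests, such a value can be computed between an original frame and its motion-compensated prediction as sketched below. The 8-bit peak value of 255 and the function name are assumptions.

```python
import math

def psnr(reference, test, peak=255.0):
    """PSNR in dB between two equally sized grey-level images (lists of rows)."""
    squared_error = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for a, b in zip(ref_row, test_row):
            squared_error += (a - b) ** 2
            count += 1
    if squared_error == 0.0:
        return float("inf")  # identical images
    mse = squared_error / count
    return 10.0 * math.log10(peak * peak / mse)
```

A gain of 1 dB corresponds to a reduction of the mean squared prediction error by a factor of about 1.26.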

Table 2: Primary results of the proposed system.

Sequence name    Sequence length    [illegible]    [illegible]
Miss America     47                 [illegible]    16 Kbps
Elsa             51                 [illegible]    9 Kbps
Caisy            32                 [illegible]    9 Kbps
CMJ              530                34 dB          7 Kbps
