An Object-Oriented Video Coding Algorithm for Very Low Bit-Rate System
Liang-Gee Chen, You-Ming Chiu, Tzi-Dar Chiueh, Her-Ming Jong
Department of Electrical Engineering
National Taiwan University
Taipei, Taiwan, R.O.C.
Abstract
In this paper, a video coding method based on region segmentation is proposed. It is designed for very low bit-rate systems such as video-phone and video-conferencing. Instead of block-based coding, we adopt an object-oriented approach, which first calculates the motion vector of each pixel using a modified optical flow approach, then segments the motion field into circumscribed rectangular regions that can be efficiently coded. Simulation results on several typical image sequences show that the proposed algorithm is effective and performs well.
1
Introduction
In many standard video coding systems, motion compensation (MC) with block-oriented motion estimation (ME) algorithms is widely used. However, disadvantages such as the block effect degrade the subjective performance, and block-based coding is inefficient for typical video-phone and video-conference applications. To enhance coding efficiency, several approaches such as model-based coding [1] and analysis-synthesis coding [2] have been proposed. Nevertheless, these methods require complex procedures in computation and system control. Therefore, an efficient and simpler algorithm is still lacking for very low bit-rate applications.
An algorithm proposed years ago, the pel-recursive algorithm (PRA) [3], is a gradient-descent approach that minimizes $|E_x u + E_y v - E_t|$, where $E$ indicates the luminance of the picture, $(u, v)$ is the motion vector viewed from the current frame to the previous frame, and the subscripts $x$ and $y$ represent partial differentiation; for example, $E_x$ indicates $\partial E / \partial x$. PRA makes a very good estimation for each pixel, i.e. the residual error, or the displaced frame difference (DFD), is small.
However, because each vector is calculated locally, a densely and randomly distributed motion field is generated, and it is impossible to transmit these motion vectors efficiently. The optical flow algorithm (OFA) [4] is similar to PRA except that the cost function to be minimized is now
$$E = \iint \left( (E_x u + E_y v - E_t)^2 + \lambda (u_x^2 + u_y^2 + v_x^2 + v_y^2) \right) dx\,dy. \tag{1}$$
The equation is composed of two terms: the former, called the constraint term, is the same as that in PRA; the latter is a smoothness term.
In this paper, an efficient video coding method based on OFA is proposed. By combining it with region segmentation, the proposed method can provide very low bit-rate coding and still preserve high subjective quality. The derivation of the algorithm from OFA is shown in Sec. 2. Details of the proposed algorithm are described in Sec. 3. Sec. 4 presents the simulation results and some discussion. The conclusion is given in Sec. 5.
2
Solution to Optical Flow
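When the minimum of Eq. (1) is reached, $\partial E / \partial u = \partial E / \partial v = 0$. That is, with $\mathcal{E}$ denoting the integrand of Eq. (1), we have
$$\frac{\partial \mathcal{E}}{\partial u} - \frac{\partial}{\partial x}\frac{\partial \mathcal{E}}{\partial u_x} - \frac{\partial}{\partial y}\frac{\partial \mathcal{E}}{\partial u_y} = 0,$$
$$\frac{\partial \mathcal{E}}{\partial v} - \frac{\partial}{\partial x}\frac{\partial \mathcal{E}}{\partial v_x} - \frac{\partial}{\partial y}\frac{\partial \mathcal{E}}{\partial v_y} = 0.$$
Therefore,
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = \frac{1}{\lambda}(E_x u + E_y v - E_t)E_x,$$
$$\frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} = \frac{1}{\lambda}(E_x u + E_y v - E_t)E_y.$$
By replacing $u_x$ and $u_y$ with $u_{i,j} - u_{i-1,j}$ and $u_{i,j} - u_{i,j-1}$, respectively, we have
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 4\bar{u} - 4u.$$
Similarly,
$$\frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} = 4\bar{v} - 4v.$$
A clearer description of $\bar{u}$ and $\bar{v}$ is shown in Fig. 1.(a). Therefore, the motion vector at position $(i, j)$ becomes
$$u_{i,j} = \bar{u} - \frac{E_x \bar{u} + E_y \bar{v} - E_t}{4\lambda + E_x^2 + E_y^2}\,E_x, \tag{2}$$
$$v_{i,j} = \bar{v} - \frac{E_x \bar{u} + E_y \bar{v} - E_t}{4\lambda + E_x^2 + E_y^2}\,E_y. \tag{3}$$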
Figure 1: (a) Weights to compute $\bar{u}$ and $\bar{v}$ (b) Weights to compute $\tilde{u}$ and $\tilde{v}$
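As a hedged illustration of the update in Eqs. (2)(3), the sketch below performs one pass over the whole frame with NumPy. It assumes that the local averages are plain four-neighbour means (weight 1/4 each), that frame borders are handled by replicating edge pixels, and that the gradient images are already available; the function names `neighbor_mean` and `ofa_step` are hypothetical and not taken from the paper.

```python
import numpy as np

def neighbor_mean(f):
    """Mean of the four edge-adjacent neighbours (assumed weights of 1/4)."""
    p = np.pad(f, 1, mode="edge")  # replicate borders: an assumption
    return 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])

def ofa_step(u, v, Ex, Ey, Et, lam):
    """One pass of Eqs. (2)(3); lam must be positive to avoid a zero divisor."""
    u_bar, v_bar = neighbor_mean(u), neighbor_mean(v)
    # Constraint residual at the averaged vector, scaled by the common denominator.
    r = (Ex * u_bar + Ey * v_bar - Et) / (4.0 * lam + Ex**2 + Ey**2)
    return u_bar - r * Ex, v_bar - r * Ey
```

A field that already satisfies the constraint term and is spatially constant is left unchanged by `ofa_step`, which is a quick sanity check on the signs.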
In order to code the motion vectors more efficiently, we have to come up with a motion field with less spatial variation. Instead of using an extra estimation filtering scheme [5], we propose a modified cost function, which is shown as follows:
$$E = \iint \left( (E_x u + E_y v - E_t)^2 + \alpha (u_x^2 + u_y^2 + v_x^2 + v_y^2) + \beta (u_{xy}^2 + v_{xy}^2) \right) dx\,dy. \tag{4}$$
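The additional third term indicates the penalty on convoluted edges. In other words, the cost function favors motion fields with smoother edges. To minimize the cost function, the following relation must be satisfied:
$$\frac{\partial \mathcal{E}}{\partial u} - \frac{\partial}{\partial x}\frac{\partial \mathcal{E}}{\partial u_x} - \frac{\partial}{\partial y}\frac{\partial \mathcal{E}}{\partial u_y} + \frac{\partial^2}{\partial x\,\partial y}\frac{\partial \mathcal{E}}{\partial u_{xy}} = 0,$$
where $\mathcal{E}$ denotes the integrand of Eq. (4), and the same relation holds with $v$ in place of $u$. By similar derivation, we have
$$4\left(\alpha\bar{u} + \beta\tilde{u} - (\alpha + \beta)u\right) = (E_x u + E_y v - E_t)E_x,$$
$$4\left(\alpha\bar{v} + \beta\tilde{v} - (\alpha + \beta)v\right) = (E_x u + E_y v - E_t)E_y,$$
where $\tilde{u}$ and $\tilde{v}$ are as shown in Fig. 1.(b). Let
$$\alpha\bar{u} + \beta\tilde{u} = (\alpha + \beta)u^*, \qquad \alpha\bar{v} + \beta\tilde{v} = (\alpha + \beta)v^*,$$
then
$$u_{i,j} = u^* - \frac{E_x u^* + E_y v^* - E_t}{4(\alpha + \beta) + E_x^2 + E_y^2}\,E_x, \tag{5}$$
$$v_{i,j} = v^* - \frac{E_x u^* + E_y v^* - E_t}{4(\alpha + \beta) + E_x^2 + E_y^2}\,E_y. \tag{6}$$
We can see that Eqs. (5)(6) have a form similar to that of Eqs. (2)(3).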
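As a hedged aside, the weights behind the second local average can be recovered if one assumes the standard central-difference stencil for the mixed fourth derivative. Fig. 1.(b) is not reproduced in this text, so the weights below are an assumption that is consistent with the discretised condition above, not a quotation from the figure. Here $\tilde{u}$ denotes that second average and $\bar{u}$ the four-neighbour mean.

```latex
\begin{aligned}
u_{xxyy}\big|_{i,j} &\approx 4u_{i,j}
  - 2\,(u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1}) \\
  &\quad + (u_{i-1,j-1} + u_{i-1,j+1} + u_{i+1,j-1} + u_{i+1,j+1}), \\
\tilde{u} &:= \tfrac{1}{2}\,(u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1})
  - \tfrac{1}{4}\,(u_{i-1,j-1} + u_{i-1,j+1} + u_{i+1,j-1} + u_{i+1,j+1}), \\
-\beta\,u_{xxyy} &\approx 4\beta\,(\tilde{u} - u), \qquad
\alpha\,(u_{xx} + u_{yy}) \approx 4\alpha\,(\bar{u} - u), \\
\alpha\,(u_{xx} + u_{yy}) - \beta\,u_{xxyy}
  &\approx 4\left(\alpha\bar{u} + \beta\tilde{u} - (\alpha + \beta)u\right)
  = (E_x u + E_y v - E_t)\,E_x .
\end{aligned}
```

Under this assumption the weights of $\tilde{u}$ sum to one (four times 1/2 minus four times 1/4), so a constant motion field is reproduced exactly, as it is by $\bar{u}$.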
3
The Proposed Coding Algorithm
The proposed algorithm, which is called the modified optical flow algorithm (MOFA), consists of two phases: the first is to find the interframe motion flow $(u, v)$; the second is to segment the flow and to code the information.
To obtain the motion flow, we can iteratively use Eqs. (5)(6). It should be noticed that in each iteration, the motion vector of each pixel is calculated independently and in parallel.
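The iteration of Eqs. (5)(6) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the second average uses weights of 1/2 on edge neighbours and -1/4 on diagonal neighbours as a stand-in for Fig. 1.(b), the temporal derivative is taken as current minus previous frame (matching a vector that points from the current frame to the previous one), spatial gradients come from `np.gradient` on the mean of the two frames, and borders are replicated. The names `local_means`, `mofa_step` and `mofa_flow` are hypothetical.

```python
import numpy as np

def local_means(f):
    """Return (f_bar, f_tilde): 4-neighbour mean and the ASSUMED Fig. 1(b) average."""
    p = np.pad(f, 1, mode="edge")
    edge = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
    diag = p[:-2, :-2] + p[:-2, 2:] + p[2:, :-2] + p[2:, 2:]
    return 0.25 * edge, 0.5 * edge - 0.25 * diag

def mofa_step(u, v, Ex, Ey, Et, alpha, beta):
    """One pass of Eqs. (5)(6); each pixel reads only the previous iterate."""
    u_bar, u_til = local_means(u)
    v_bar, v_til = local_means(v)
    u_star = (alpha * u_bar + beta * u_til) / (alpha + beta)
    v_star = (alpha * v_bar + beta * v_til) / (alpha + beta)
    r = (Ex * u_star + Ey * v_star - Et) / (4.0 * (alpha + beta) + Ex**2 + Ey**2)
    return u_star - r * Ex, v_star - r * Ey

def mofa_flow(prev, cur, alpha=1024.0, beta=64.0, n_iter=8):
    """Estimate (u, v) from two luminance frames, starting from zero vectors."""
    prev = np.asarray(prev, dtype=float)
    cur = np.asarray(cur, dtype=float)
    Ey, Ex = np.gradient(0.5 * (prev + cur))  # axis 0 is y (rows), axis 1 is x
    Et = cur - prev                           # assumed sign convention
    u = np.zeros_like(cur)
    v = np.zeros_like(cur)
    for _ in range(n_iter):
        u, v = mofa_step(u, v, Ex, Ey, Et, alpha, beta)
    return u, v
```

Because every pixel is updated from the previous iterate only, the pass maps directly onto array operations. The default parameter values are the ones the paper reports for its simulations (alpha = 1024, beta = 64, 8 iterations).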
After the motion field is obtained, we segment these motion vectors and record them in a coding linked list (CLL). Each element in the linked list represents one “object” in the frame. An object is defined as a region that contains the same nonzero motion vector. To encode an object, we have to record its shape and its motion vector. First, we determine the circumscribed rectangle of the “object” by traversing its contour, as shown in Fig. 2.(a), and mark the pixels in the rectangle that carry the nonzero vector as “1” and the others as “0”. Next, we transform the bit-map into a bit-stream by a spiral-like scan (Fig. 2.(b)), which tends to generate runs of contiguous 1s or 0s that are as long as possible. For example, the bit-stream in Fig. 2.(a) is
000011100001!000011110011l11111111011101111111~1~.
Therefore, the information of an object contains the location and size of the rectangle, the vector, and the bit-stream, as shown in Fig. 3. The whole encoding algorithm is listed below:
1. Clear the CLL and initialize all $(u, v)$ to zero.
2. For k = 1 to number of iterations, for i, j in raster scanning, find $(u_{i,j}, v_{i,j})$ by Eqs. (5)(6).
3. Check the pel. If its motion vector is nonzero and it doesn’t belong to any traversed object, then go to step 4; else check the next pel.
4. Find the circumscribed rectangle of this “object” and append the information of the object to the CLL.
5. Go to step 3 until all pixels in the current frame are scanned.
6. Transmit the whole CLL, possibly after a further lossless coding scheme such as VLC and RLC.
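Steps 3 to 5 above can be sketched as follows. Several choices here are assumptions made for illustration: the motion field is taken to be already quantised to integers, a 4-connected flood fill stands in for the contour traversal described in the paper, the spiral is read clockwise and inward from the top-left corner (Fig. 2.(b) is not reproduced here), and a Python list of dictionaries stands in for the linked list. The names `spiral_scan` and `build_cll` are hypothetical.

```python
import numpy as np
from collections import deque

def spiral_scan(bitmap):
    """Read a 2-D 0/1 array along an inward clockwise spiral (assumed order)."""
    top, bottom = 0, bitmap.shape[0] - 1
    left, right = 0, bitmap.shape[1] - 1
    out = []
    while top <= bottom and left <= right:
        out += [bitmap[top, c] for c in range(left, right + 1)]
        out += [bitmap[r, right] for r in range(top + 1, bottom + 1)]
        if top < bottom:
            out += [bitmap[bottom, c] for c in range(right - 1, left - 1, -1)]
        if left < right:
            out += [bitmap[r, left] for r in range(bottom - 1, top, -1)]
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return "".join(str(int(b)) for b in out)

def build_cll(u, v):
    """Raster-scan an integer motion field and record one entry per object."""
    h, w = u.shape
    seen = np.zeros((h, w), dtype=bool)
    cll = []
    for i in range(h):
        for j in range(w):
            vec = (int(u[i, j]), int(v[i, j]))
            if vec == (0, 0) or seen[i, j]:
                continue  # stationary pel, or already part of an object
            # Flood fill over 4-connected pixels that share this vector.
            queue, pixels = deque([(i, j)]), []
            seen[i, j] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny, nx]
                            and (int(u[ny, nx]), int(v[ny, nx])) == vec):
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            ys, xs = zip(*pixels)
            y0, x0 = min(ys), min(xs)  # top-left corner as (row, column)
            bitmap = np.zeros((max(ys) - y0 + 1, max(xs) - x0 + 1), dtype=np.uint8)
            for y, x in pixels:
                bitmap[y - y0, x - x0] = 1
            cll.append({"location": (y0, x0), "size": bitmap.shape,
                        "vector": vec, "bits": spiral_scan(bitmap)})
    return cll
```

Each entry carries the four fields named in the text (location, size, vector, bit-stream), so a run-length or variable-length coder could be applied to the `bits` strings afterwards.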
In the proposed algorithm, the residual error is not sent. Consequently, to enhance the performance we employ an extra stationary region test. If the DFD with MC is larger than the non-compensated difference, then the motion vector of the pixel is set to zero. In other words, if $DFD(u, v) > DFD(0, 0)$, let $(u, v) = (0, 0)$.
location | size | vector | bit-stream
Figure 3: Data structure of coding linked list
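The stationary region test described above can be sketched per pixel as below. The sketch assumes that the DFD is the absolute luminance difference, that vectors are rounded to whole pixels before the previous frame is sampled, and that displaced coordinates are clamped at the frame border; none of these details is specified in the text. The function name `stationary_test` is hypothetical.

```python
import numpy as np

def stationary_test(prev, cur, u, v):
    """Zero every vector whose compensated error exceeds the plain frame difference."""
    prev = np.asarray(prev, dtype=float)
    cur = np.asarray(cur, dtype=float)
    h, w = cur.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Vectors point from the current frame to the previous one (u along x, v along y).
    ys = np.clip(yy + np.rint(v).astype(int), 0, h - 1)
    xs = np.clip(xx + np.rint(u).astype(int), 0, w - 1)
    dfd_mc = np.abs(cur - prev[ys, xs])  # DFD(u, v)
    dfd_0 = np.abs(cur - prev)           # DFD(0, 0)
    keep = dfd_mc <= dfd_0
    return np.where(keep, u, 0), np.where(keep, v, 0)
```

Since no residual is transmitted, discarding a vector that makes the prediction worse than simply repeating the previous pixel can only lower the reconstruction error at that pixel.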
The parameters $\alpha$ and $\beta$ must be chosen properly. If $\alpha$ is too large, the motion vectors tend to group together, which degrades the performance badly. On the other hand, if $\alpha$ is too small, the motion vectors are distributed randomly and are difficult to code. The parameter $\beta$ should be chosen much smaller than $\alpha$; if it is larger than $\alpha$, an oscillating motion field may be generated.
4
Simulation Results and Discussion
Similar vectors in the motion field tend to group together because of the smoothness term in Eqs. (5)(6). Fig. 4 shows the amplitude of motion vectors between 2 successive frames in the sequence Claire. Most of the frame is stationary, and the parts to which human eyes are sensitive, such as the mouth and the eyes, are coded accurately because their vectors are calculated precisely. The picture reconstructed with our approach is therefore of high quality.
The performance and bit-rate of the sequence Claire using different approaches are shown in Fig. 5 and Fig. 6, respectively. OFA16 indicates OFA with 16 iterations; in the same manner, MOFA16 and MOFA8 indicate MOFA with 16 and 8 iterations, respectively. According to the results, MOFA16 and OFA16 are almost the same in PSNR, while the former requires less bandwidth when the bit-rate is relatively high. It is worth noticing that MOFA8 degrades a little in PSNR but requires a much lower bit-rate. Therefore, we choose this approach as the final proposed algorithm.
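For reference, the PSNR used in these comparisons can be computed as in the sketch below. It uses the conventional definition with a peak value of 255 for 8-bit luminance; the paper does not state the peak value it used, so that default is an assumption, and the function name `psnr` is hypothetical.

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two frames of equal size."""
    diff = np.asarray(original, dtype=float) - np.asarray(reconstructed, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)
```

Averaging this value over all frames of a sequence gives a per-sequence figure of the kind reported in Table 1.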
Figure 4: The amplitude of motion flow of Claire
Figure 5: The performance of sequence Claire (horizontal axis: frame number)
We list the average PSNRs of 4 sequences in Table 1 for the proposed algorithm in comparison with the 16 x 16 full-search block matching algorithm (FSBMA). We use MOFA with 8 iterations ($\alpha = 1024$, $\beta = 64$). From the simulation results we find that the performance of MOFA is better than or close to that of FSBMA in PSNR except for the case of susie, which is a fast and globally moving sequence. Nevertheless, our reconstructed images may look better than those of FSBMA subjectively because of the object-oriented characteristic of the proposed approach. To enhance the PSNR, more iterations are needed, but this may entail a higher bit-rate.
The average bit-rates (bits per pixel) are also listed in Table 1. The results of miss and Claire are very good, while that of salesman is moderate. However, the result for the sequence susie is not satisfactory. This is because the distribution of motion flow is sparse due to the zooming and global movement in this sequence. Furthermore, because the initial guesses are zero, 8 iterations are not enough to find the true motion vectors. Besides, our algorithm is vulnerable to high-detailed-moving or fast-moving sequences. High-detailed-moving implies that there are many local movements with different motion vectors, which leads to a large amount of side information due to the many segmented regions. On the other hand, the motion of fast-moving objects is difficult to calculate within a limited number of iterations, and this causes poor performance.
Figure 6: The bit-rate of sequence Claire (horizontal axis: frame number)
MOFA8 (bpp) | 0.133 | 0.040 | 0.014 | 0.023
Table 1: Average performance and bit-rate for different sequences
Figure 7: The reconstructed image of susie with PSNR= 28.5 dB
Due to the smoothness term, the motion vectors are distributed much more regularly than those in PRA, so we can segment them into “objects” and code them efficiently. The additional penalty term results in fewer and more regular regions without degrading the performance, because a little shape distortion won’t be noticed by human eyes. In addition, the larger residual errors occur on the edges of the moving objects, where they are neglected by human eyes. Therefore, the subjective performance of the reconstructed frames using the proposed algorithm should be better than that of block-based methods even if the PSNRs are the same. Fig. 7 shows the reconstructed image of susie using MOFA. Although the PSNR is as low as 28.5 dB, it looks almost as good as the original image, shown in Fig. 8.
Figure 8: The original image of susie
5
Conclusion
Optical flow was originally applied in image analysis because it gives a precise estimate of object motion. The proposed algorithm is inspired by its suitability for segmenting motion vectors. Because of the high quality of the motion-compensated pictures, no residual errors are transmitted. To code these motion vectors efficiently, we also propose a “circumscribed rectangular” segmentation method, and the required bit-rate is quite low.
The proposed approach also provides a multi-functional system. The transmitted information can be used not only for image reconstruction but also for image analysis, because it is coded with object orientation. Some traditional coding schemes, such as the block-based method, provide much more limited information for analysis. Therefore, for different requirements, all we have to do at the transmitter is adjust some parameters within the same architecture.
When applying the proposed algorithm to a video-phone of 180 x 120 pixels at 10 frames per second, the bit-rate of 0.04 bpp results in a transmission rate of 8.64 Kbps, which is feasible for the current telephone network. Considering wider applications in real-world systems, there are some improvements left for further study. One is adaptability: to provide a more uniform bit-rate that doesn’t depend severely on the content of the frames, bit-rate control is needed; that is, the system should be able to adjust the amount of coded information. The other is complexity: although the proposed algorithm is much simpler than other previously reported object-oriented approaches, the computation is still too heavy for practical applications. To reduce the system cost, a simplified algorithm, which is feasible for hardware implementation while preserving the good performance, is being developed.
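The transmission-rate figure quoted in this section follows from a single product, which the short check below reproduces. The function name `transmission_rate_bps` is hypothetical; the frame size, frame rate and bits per pixel are the values given in the text.

```python
def transmission_rate_bps(width, height, fps, bpp):
    """Bits per second = pixels per frame * frames per second * bits per pixel."""
    return width * height * fps * bpp

rate = transmission_rate_bps(180, 120, 10, 0.04)
print(f"{rate / 1000:.2f} kbit/s")  # prints "8.64 kbit/s"
```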
References
[1] Y. Nakaya, Y. C. Chuah, and H. Harashima, “Model-based/waveform hybrid coding for video-phone images”, ICASSP, Vol. 4, pp. 2741-2744, 1991.
[2] Harald Schiller, Michael Hotter, “Investigations on color coding in an object-oriented analysis-synthesis coder”, Signal Processing: Image Communication, Vol. 5, pp. 319-326, 1993.
[3] A. Netravali, J. Robbins, “Motion compensated television coding part I”, Bell Syst. Tech. J., Vol. 58, pp. 631-670, March 1979.
[4] B. K. P. Horn, B. G. Schunck, “Determining optical flow”, Artificial Intelligence, Vol. 17, pp. 185-203, 1981.
[5] V. Brandt, “Object tracking and motion estimation with a moving camera”, Proc. ASST, 1990,