A robust visual servo system for tracking an arbitrary-shaped object by a new active contour method


Proceedings of the 2004 American Control Conference, Boston, Massachusetts, June 30 - July 2, 2004

WeP06.5


Pei-Bng Chen¹, Cheng-Ming Huang¹, Li-Chen Fu¹,²

¹Department of Electrical Engineering
²Department of Computer Science and Information Engineering
National Taiwan University, Taipei, Taiwan, R.O.C.

E-mail: lichen@ccms.ntu.edu.tw

Abstract

This paper presents a real-time, highly reliable, open-field visual tracking system that can automatically detect an arbitrary-shaped object in 3-D space and locate it, so that the camera platform can be controlled to keep the target centered in the monitor image. The system is designed to keep working properly even when the object passes through a highly cluttered environment or is occluded by other objects, and the total processing period is less than 34 ms. The overall system consists of a motion detector, a Snake-based outline extraction module, a hybrid tracking method, and a VPDA filter derived from the Probabilistic Data Association (PDA) filter. Finally, the effectiveness of the visual servo system is confirmed by a series of experiments.

1 Introduction

Visual tracking has been an important topic in the computer vision and robotics fields. Practical visual tracking systems require some basic functionalities: real-time operation, automation, and robustness to nonideal situations such as occlusion and cluttered environments. Surveys show that each of these problems has been solved individually. For example, the CONDENSATION algorithm [5] is highly regarded because it solves both the real-time and the robustness problems, but not the automation problem. Up to now, however, no work in the literature appears to cope with all of these problems together. Therefore, this paper presents an integrated algorithm that overcomes all three problems and, in particular, deals with occlusion by combining template matching and Snake. The proposed algorithm also overcomes the disadvantages of the classical Snake.

2 Tracking System

The architecture of our visual tracking system is shown in the block diagram of Fig. 1. It consists of four subsystems: motion detection, Snake-based outline extraction, a hybrid tracking algorithm, and a visual probabilistic data association (VPDA) filter. On-line image sequences are grabbed by a camera mounted on a pan-tilt servo platform.

2.1 Motion Detection

There are several advantages to performing motion detection before tracking an object. First, it provides two important clues to an automatic tracking system. Second, the position of the moving object in the image, obtained by motion detection with a stationary camera, allows a smaller search range to be determined, which reduces computation time. Third, the segmentation of the moving object can be the basis for automatic initialization of the Snake.

Fig. 1 Overview of the overall system architecture.

Fig. 2 Block diagram of the motion detection.

As shown in Fig. 2, two consecutive image frames $I(k)$ and $I(k-1)$ are subtracted pixel by pixel, and the result is then binarized. If the total number of white pixels is less than the moving threshold $Thr_{moving}$, the motion detection unit continues to examine the next two consecutive frames. Otherwise, if the number exceeds the moving threshold, a target is present and is to be detected. The concept of a “moving edge” is introduced by a logical AND operation between the subtracted image and the edge image of the current frame. This keeps only the edges within the moving-pixel region, yielding the moving edges in the latest image, which carry information about the target’s outline. Finally, the moving-edge image is used to generate a properly sized initial Snake for outline extraction, followed by the tracking algorithm.
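The motion-detection steps of Fig. 2 can be sketched in Python as follows, using plain nested lists so the example has no dependencies. The binarization level, the edge detector (a simple forward-difference gradient), and all function names are assumptions for illustration; the paper does not specify them.

```python
DIFF_LEVEL = 30   # hypothetical binarization level for |I(k) - I(k-1)|
EDGE_LEVEL = 40   # hypothetical threshold for the edge image


def binarize_difference(curr, prev, level=DIFF_LEVEL):
    """Subtract I(k) and I(k-1) pixel by pixel and binarize (1 = white)."""
    return [[1 if abs(c - p) > level else 0 for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(curr, prev)]


def edge_image(img, level=EDGE_LEVEL):
    """Binary edge map from |dx| + |dy| forward differences (an assumption)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            g = abs(img[y][x + 1] - img[y][x]) + abs(img[y + 1][x] - img[y][x])
            out[y][x] = 1 if g > level else 0
    return out


def detect_moving_edges(curr, prev, thr_moving):
    """Return the moving-edge image, or None when too few pixels changed."""
    diff = binarize_difference(curr, prev)
    white = sum(map(sum, diff))
    if white < thr_moving:
        return None   # no target yet: examine the next two frames
    edges = edge_image(curr)
    # logical AND keeps only the edges inside the moving-pixel region
    return [[d & e for d, e in zip(row_d, row_e)]
            for row_d, row_e in zip(diff, edges)]
```

For a bright square that shifts by two pixels between frames, the difference image marks the uncovered and newly covered columns, and the AND step keeps only the object edges that lie inside that moving region; edges in static parts of the scene are discarded.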

2.2 Snake-Based Outline Extraction

The objective of this section is to extract the outline of the target with active contour models (also called Snakes) [1]. A modified Snake based on the Greedy algorithm [2] is presented here. The proposed Snake has an external constraint force that speeds up convergence to the desired feature of the object, and it provides an easy way to determine whether a point is noise, in order to avoid wrong convergence.

If $v_i = (x_i, y_i)^T$ for $i = 0, 1, \ldots, N$ represents the $N$-length discrete contour, the modified Snake energy is

$$E_{snake} = \sum_{i=1}^{N} \left[ \alpha E_{cont}(v_i) + \beta E_{curv}(v_i) + \gamma E_{image}(v_i) + \nu E_{dis}(v_i) \right]$$

where $E_{cont}$ is the continuity energy, $E_{curv}$ is the curvature energy, $E_{image}$ is the image energy, and $E_{dis}$ is the constraint energy, called the distance energy. The parameters $\alpha$, $\beta$, $\gamma$, and $\nu$ balance the relative influence of the four terms.

In the formulas of the individual energy terms, $\bar{d}$ is the average distance between contour points $v_i$, and $v_i(j)$, $j = 0, 1, \ldots, 8$, denotes the positions in the 3x3 neighborhood of $v_i$ (the point itself and its eight neighbors).

$\nabla I(v_i)$ denotes the intensity of the edge image at the current position, and $D(v_i)$ denotes the absolute distance between the current position and the center of the object; $\nabla I_{max}$ ($D_{max}$) and $\nabla I_{min}$ ($D_{min}$) denote the maximum and minimum edge-image intensity (distance from the center of the object) in the neighborhood, respectively.

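The moving direction of the Snake can be decided by adding the distance energy to, or subtracting it from, the Snake energy: adding it favors positions closer to the object center, so the contour shrinks, whereas subtracting it favors positions farther away, so the contour expands. Hence, automatic initialization of the Snake can be achieved by combining the modified Snake with the motion detector, as described below. The concept of the Greedy algorithm [2] is to obtain the minimum of the total Snake energy as the sum of the individually minimized point energies $E_{snake}(v_i)$, i.e., the summands of the Snake energy, that is,

$$\min E_{snake} = \sum_{i=1}^{N} \min E_{snake}(v_i) \qquad (3)$$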

Figure 3(a) demonstrates how the iterative Greedy algorithm works. The energy function is computed for the current position $v_i$ and each of its neighbors, and the location with the smallest value is chosen as the new position $v_i'$. Repeating this process point by point moves every contour point to its new position, forming a new contour; this completes one iteration of the deformation loop. The Snake repeats the deformation loop until it converges to the desired feature of the object.
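To make the deformation loop concrete, here is a minimal Python sketch of one Greedy iteration in the style of Williams and Shah [2], extended with a distance energy. The paper's exact formulas for the individual energy terms are not reproduced in this text, so the forms below are assumptions: continuity and curvature follow [2] and are divided by their neighborhood maximum, while the image and distance energies use min/max normalization over the 3x3 neighborhood, matching the quantities defined above. The function names and the sign convention for `nu` are hypothetical.

```python
import math


def _scale_by_max(vals):
    """Divide by the neighborhood maximum (Williams-Shah style)."""
    m = max(vals)
    return [v / m if m > 0 else 0.0 for v in vals]


def _scale_to_unit(vals):
    """Map values to [0, 1] using the neighborhood minimum and maximum."""
    lo, hi = min(vals), max(vals)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in vals]


def greedy_iteration(contour, edge, center, alpha, beta, gamma, nu):
    """Run one deformation loop over a closed contour, updating it in place.

    contour: list of (x, y) integer tuples, all inside the image
    edge:    edge image as rows of gradient magnitudes, edge[y][x]
    center:  (cx, cy) object center used by the distance energy
    nu > 0 pulls points toward the center (shrink), nu < 0 pushes them out.
    Returns the number of points that moved.
    """
    n = len(contour)
    h, w = len(edge), len(edge[0])
    # d_bar: average distance between consecutive contour points
    d_bar = sum(math.dist(contour[i], contour[i - 1]) for i in range(n)) / n
    moved = 0
    for i in range(n):
        prev_pt, next_pt = contour[i - 1], contour[(i + 1) % n]
        x0, y0 = contour[i]
        # 3x3 neighborhood: the point itself plus its eight neighbors
        cands = [(x0 + dx, y0 + dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                 if 0 <= x0 + dx < w and 0 <= y0 + dy < h]
        e_cont = _scale_by_max([abs(d_bar - math.dist(c, prev_pt)) for c in cands])
        e_curv = _scale_by_max([(prev_pt[0] - 2 * c[0] + next_pt[0]) ** 2
                                + (prev_pt[1] - 2 * c[1] + next_pt[1]) ** 2
                                for c in cands])
        # strong edges give low (negative) image energy
        e_img = [-v for v in _scale_to_unit([edge[c[1]][c[0]] for c in cands])]
        e_dis = _scale_to_unit([math.dist(c, center) for c in cands])
        energy = [alpha * a + beta * b + gamma * g + nu * d
                  for a, b, g, d in zip(e_cont, e_curv, e_img, e_dis)]
        # keep the current point unless a neighbor has strictly lower energy
        best = cands.index(contour[i])
        for k, e in enumerate(energy):
            if e < energy[best]:
                best = k
        if cands[best] != contour[i]:
            contour[i] = cands[best]
            moved += 1
    return moved
```

Calling `greedy_iteration` repeatedly until it returns 0 (or only a few points move) mirrors the convergence test of the deformation loop. Ties are resolved in favor of the current position so that the contour does not drift when the energy landscape is flat.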

Fig. 3 (a) Demonstration of the Greedy algorithm. (b) Deformation of the Snake.


Fig. 4 In the moving-edge image, (a) shows the bounding box of the object, and (b) shows the search area and its center.

From the result of the motion detector, a moving object is detected. Assuming that the whole body of the object lies completely inside a 140x140 area, which serves as the search area, we choose a properly sized ellipse (including circles) that encloses the whole moving object as the initial Snake. The center of the ellipse is located at the center of the bounding box that contains the object, and the axis lengths of the initial contour are determined according to the width and height of the bounding box. At the same time, we also use the center of the minimized rectangle as the center of the
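The initialization just described can be sketched as follows: the bounding box of the moving-edge pixels gives the ellipse center and axis lengths, and a 140x140 search window is placed around the same center. The enlargement factor `margin`, the number of contour samples, and the function names are hypothetical choices, not values from the paper.

```python
import math


def bounding_box(moving_edges):
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) around nonzero pixels."""
    pts = [(x, y) for y, row in enumerate(moving_edges)
           for x, v in enumerate(row) if v]
    if not pts:
        return None
    xs, ys = zip(*pts)
    return min(xs), min(ys), max(xs), max(ys)


def initial_ellipse(box, n_points=32, margin=math.sqrt(2)):
    """Sample an ellipse centered on the box, with semi-axes from its size.

    margin = sqrt(2) makes the ellipse (before rounding to pixels) pass
    through the corners of the box, so the whole box lies inside it.
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    a = margin * (x1 - x0 + 1) / 2   # semi-axis along x, from the box width
    b = margin * (y1 - y0 + 1) / 2   # semi-axis along y, from the box height
    contour = []
    for k in range(n_points):
        t = 2 * math.pi * k / n_points
        p = (round(cx + a * math.cos(t)), round(cy + b * math.sin(t)))
        if not contour or p != contour[-1]:   # drop repeated pixels
            contour.append(p)
    return (cx, cy), contour


def search_area(center, size=140):
    """Inclusive bounds of a size x size window centered on the box center."""
    cx, cy = round(center[0]), round(center[1])
    half = size // 2
    return cx - half, cy - half, cx + half - 1, cy + half - 1
```

Starting from an ellipse that encloses the object and running the Greedy iterations with a positive distance weight lets the contour shrink onto the outline, which is why the initial contour does not need to lie close to the target boundary.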


Unlike traditional Snake-based tracking algorithms [3]-[10], our system uses the Snake only to extract the outline of the target, not to perform tracking, because Snake-based tracking algorithms assume a slowly moving object, which unfortunately imposes serious constraints on general-purpose tracking. This paper presents a contour matching method that uses the extracted contour model to track an arbitrary-shaped object. However, contour matching is also restricted in low-contrast environments, just as Snake-based tracking algorithms are. Therefore, we integrate the most commonly used method in visual tracking, template matching, with contour matching.

The details of contour matching are as follows. To highlight the contour of an object, the edge image is used for object tracking (see Fig. 5). As in template matching, we sum the gradient values of the edge image pixel by pixel along a pre-extracted contour model. After summing over the contour model, we shift the center of the contour model to the next pixel and compute the sum along its perimeter again. After the whole search area has been scanned, the largest value corresponds to the distribution of edge pixels that best resembles the object's contour, so the true contour of the object is located. The computation in each search loop can be summarized by the following normalized sum:

$$\phi_g(s) = \frac{1}{N_c} \sum_{j=1}^{N_c} g_j \qquad (4)$$

where $s = (x, y)$ represents the position of the center of the contour model, $N_c$ is the total number of pixels along the perimeter of the contour model, and $g_j$ denotes the gradient value of each pixel along the perimeter of the contour model.

Fig. 5 Searching for the object by matching the contour model in the edge image.

Combined with sum of absolute differences (SAD) template matching, the best object position is

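$$s^* = \arg\max_{s_i \in S} \left\{ \bar{\phi}_g(s_i) + \bar{\phi}_{SAD}(s_i) \right\} \qquad (5)$$

where

$$\bar{\phi}_g(s) = \frac{\phi_g(s) - \min_{s_j \in S} \phi_g(s_j)}{\max_{s_j \in S} \phi_g(s_j) - \min_{s_j \in S} \phi_g(s_j)} \quad \text{and} \quad \bar{\phi}_{SAD}(s) = \frac{\max_{s_j \in S} \phi_{SAD}(s_j) - \phi_{SAD}(s)}{\max_{s_j \in S} \phi_{SAD}(s_j) - \min_{s_j \in S} \phi_{SAD}(s_j)}$$

are the normalized forms, $s$ is the center position of the contour model as well as of the template, and $S$ is the search area.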
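The matching in (4) and (5) can be sketched in Python as follows. Assumptions: the contour model is stored as a list of (dx, dy) offsets from its center, the template is centered at the same position s, and the SAD score is inverted during min-max normalization so that a small SAD counts as a good match and the best position is the maximizer of the summed normalized scores. All function names are hypothetical.

```python
def contour_score(edge, model, s):
    """phi_g(s): mean edge value along the contour model centered at s, as in (4)."""
    x, y = s
    return sum(edge[y + dy][x + dx] for dx, dy in model) / len(model)


def sad_score(image, template, s):
    """phi_SAD(s): sum of absolute differences with the template centered at s."""
    th, tw = len(template), len(template[0])
    x0, y0 = s[0] - tw // 2, s[1] - th // 2
    return sum(abs(image[y0 + r][x0 + c] - template[r][c])
               for r in range(th) for c in range(tw))


def min_max_normalize(scores, invert=False):
    """Map scores to [0, 1]; invert=True makes the smallest raw score map to 1."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {s: 0.0 for s in scores}
    if invert:
        return {s: (hi - v) / (hi - lo) for s, v in scores.items()}
    return {s: (v - lo) / (hi - lo) for s, v in scores.items()}


def hybrid_match(edge, image, model, template, search_positions):
    """Return the position maximizing the normalized contour + SAD scores."""
    g = min_max_normalize({s: contour_score(edge, model, s) for s in search_positions})
    d = min_max_normalize({s: sad_score(image, template, s) for s in search_positions},
                          invert=True)
    return max(search_positions, key=lambda s: g[s] + d[s])
```

Normalizing each score over the search area before adding them keeps either cue from dominating merely because of its raw scale; the contour term helps when the template is unreliable, and the SAD term helps in low-contrast scenes where edges are weak.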

3 Trajectory Design

Highly cluttered environments may contain many background regions that closely resemble the target and thus cause false alarms, which may in turn lead to error accumulation in subsequent tracking. Not only can a bad template or contour be introduced, but the trajectory detection can also be misled, which seriously affects the position of the search window. Thus, tracking is very likely to be lost once the best match is a false match. For that reason, the Visual Probabilistic Data Association (VPDA) filter is adopted to meet the challenge of real cluttered environments. It provides a more reliable way to predict the next position of the target and enhances the robustness of the tracking system, even in cluttered environments. The VPDA filter is the original probabilistic data association (PDA) filter [7] integrated with visual information, as introduced in [8].

The concept of the PDA filter is to take all possible targets into account, instead of only the best match, which may be mimicked by parts of the cluttered environment, and then to produce a weighted-average output from all possible candidates. The weights are probabilities: the filter computes the posterior association probabilities of all current candidate measurements and uses them to form a weighted sum of innovations for updating the target’s state in a suitably modified version of the Kalman filter.
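For reference, the standard PDA update of [7] can be sketched for a one-dimensional constant-velocity target with a scalar position measurement. An image tracker would use two-dimensional positions, and the visual extensions of the VPDA filter [8] are not modeled here. The clutter density, detection and gating probabilities, noise levels, and function names are hypothetical.

```python
import math


def predict(x, P, dt=1.0, q=1.0):
    """Constant-velocity Kalman prediction; q is a hypothetical noise level."""
    F = [[1.0, dt], [0.0, 1.0]]
    xp = [x[0] + dt * x[1], x[1]]
    FP = [[sum(F[i][k] * P[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    # P_pred = F P F^T + Q, with a simple diagonal Q = q*dt*I (assumption)
    Pp = [[sum(FP[i][k] * F[j][k] for k in range(2)) for j in range(2)]
          for i in range(2)]
    Pp[0][0] += q * dt
    Pp[1][1] += q * dt
    return xp, Pp


def pda_update(xp, Pp, measurements, r=1.0, clutter_density=0.01,
               p_detect=0.9, p_gate=0.99, gate=6.63):
    """PDA update with H = [1, 0]; gate 6.63 ~ chi-square 99% for 1 dof."""
    s = Pp[0][0] + r                       # innovation variance S
    k = [Pp[0][0] / s, Pp[1][0] / s]       # Kalman gain K = P H^T / S
    nus = [z - xp[0] for z in measurements]
    nus = [v for v in nus if v * v / s <= gate]   # validation gate
    e = [math.exp(-0.5 * v * v / s) for v in nus]
    # b weights the hypothesis that none of the measurements is the target
    b = clutter_density * math.sqrt(2 * math.pi * s) * (1 - p_detect * p_gate) / p_detect
    denom = b + sum(e)
    beta0 = b / denom
    betas = [ei / denom for ei in e]       # posterior association probabilities
    nu = sum(bi * vi for bi, vi in zip(betas, nus))   # combined innovation
    x = [xp[0] + k[0] * nu, xp[1] + k[1] * nu]
    spread = sum(bi * vi * vi for bi, vi in zip(betas, nus)) - nu * nu
    # P = beta0*P_pred + (1 - beta0)*(P_pred - K S K^T) + K*spread*K^T
    P = [[beta0 * Pp[i][j] + (1 - beta0) * (Pp[i][j] - k[i] * s * k[j])
          + k[i] * spread * k[j] for j in range(2)] for i in range(2)]
    return x, P, beta0, betas
```

With two candidates placed symmetrically around the prediction, the combined innovation cancels and the state stays at the prediction while the covariance grows to reflect the ambiguity; a single nearby candidate pulls the estimate almost as far as a plain Kalman update would.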

4 Experimental Results

The Snake-based outline extraction performs well and converges quickly even when the object appears in a highly cluttered environment; as shown in Fig. 6, only 9 iterations are needed to extract the tank’s outline from an initial contour 84 pixels in length. Figures 7-9 demonstrate tracking with the hybrid method, which combines contour matching and SAD template matching. The experimental results show that our tracking system is highly robust against occlusion, low-contrast environments, and rapid motion.

5 Conclusion

Unlike traditional video surveillance systems, the major contribution of this work is an integrated visual servo system that can track an arbitrary-shaped object in highly noisy environments, even when the target is partially occluded. The visual servo system performs tracking on 320x240 images under a near-real-time constraint (less than 34 ms) and keeps the target centered within a 15-pixel square tolerance range. Furthermore, we made three improvements to the Snake: i) efficient automatic initialization of the Snake through cooperation with motion detection, ii) modification of the Snake energy function so that the initial Snake contour need not be close to the desired target outline, and iii) a guideline for updating the Snake parameters for more robust performance. To make the proposed visual tracking method more robust, a hybrid tracking method that integrates contour matching and SAD template matching is developed. This method is shown to track high-speed moving objects at very low computational cost. Finally, this work further develops a strengthened VPDA filter to enhance the robustness of tracking.

References

[1] M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: Active Contour Models,” Int. J. Comput. Vis., Vol. 1, pp. 321-331, 1987.

[2] D. J. Williams and M. Shah, “A Fast Algorithm for Active Contours and Curvature Estimation,” CVGIP: Image Understanding, Vol. 55, No. 1, pp. 14-26, Jan. 1992.

[3] C. Xu and J. L. Prince, “Snakes, Shapes, and Gradient Vector Flow,” IEEE Trans. Image Processing, Vol. 7, No. 3, pp. 359-369, 1998.

[4] J. Denzler and H. Niemann, “A Two Stage Real-Time Object Tracking System,” In Pavešić et al.

[5] M. Isard and A. Blake, “CONDENSATION - Conditional Density Propagation for Visual Tracking,” Int. J. Computer Vision, pp. 1-36, 1998.

[6] J. Denzler and H. Niemann, “A New Energy Term Combining Kalman Filter and Active Contour Models for Object Tracking,” Machine Graphics & Vision, Vol. 5(1/2), pp. 157-165, 1996.

[7] Y. Bar-Shalom and E. Tse, “Tracking in a Cluttered Environment with Probabilistic Data Association,” Automatica, Vol. 11, pp. 451-460, 1975.

[8] D. Liu, “Real-Time Visual Tracking in Cluttered Environment with a Pan-Tilt Camera,” Master Thesis, Dept. of Electrical Eng., National Taiwan University, 2001.

Fig. 6 Only 9 iterations are needed to extract the tank’s outline from an initial contour 84 pixels in length.


Fig. 9 Tracking quick motion in a cluttered background.