Proceedings of the 2004 American Control Conference, Boston, Massachusetts, June 30 - July 2, 2004
WeP06.5
A Robust Visual Servo System for Tracking an Arbitrary-Shaped Object by a New Active Contour Method
Pei-Bng Chen¹, Cheng-Ming Huang¹, Li-Chen Fu¹,²
¹Department of Electrical Engineering
²Department of Computer Science and Information Engineering
National Taiwan University, Taipei, Taiwan, R.O.C.
E-mail: lichen@ccms.ntu.edu.tw

Abstract
This paper presents a real-time, highly reliable, open-field visual tracking system that automatically detects an arbitrary-shaped object in 3-D space and locates it so that the camera platform can be controlled to keep the target centered in the monitor image. The system continues to work properly even if the object passes through a highly cluttered environment or is occluded by other objects, and the total processing period is less than 34 ms. The overall system consists of a motion detector, a Snake-based outline extraction, a hybrid tracking methodology, and a VPDA filter, which evolves from the Probabilistic Data Association (PDA) filter. Finally, the effectiveness of the visual servo system is confirmed by a series of experiments.

1 Introduction
Visual tracking has been an important topic in the computer vision and robotics fields. Practical visual tracking systems require some basic functionalities: real-time operation, automation, and robustness to nonideal situations such as occlusion and cluttered environments. Surveys show that each of these problems has been solved individually. For example, the CONDENSATION algorithm [5] is highly regarded because it solves both the real-time and the robustness problems, but not the automation one. To date, however, no work in the literature appears to cope with all of these problems together. Therefore, this paper presents an integrated algorithm that addresses all three problems and also deals with occlusion by combining template matching and Snake. The proposed algorithm also overcomes the disadvantages of the classical Snake.
2 Tracking System
The architecture of our visual tracking system is shown in the block diagram of Fig. 1. It consists of four subsystems: motion detection, Snake-based outline extraction, a hybrid tracking algorithm, and a visual probabilistic data association (VPDA) filter. The on-line image sequences are grabbed by a camera mounted on a pan-tilt servo platform.

Fig. 1 Overview of the overall system architecture.

2.1 Motion Detection
There are several advantages to using motion detection before tracking an object. First, it provides two important cues to an automatic tracking system. Second, the position of the moving object in the image, obtained by motion detection with a stationary camera, lets us restrict the search range and thus reduce computation time. Third, the segmentation of the moving object can serve as the basis for the automatic initialization of the Snake.

Fig. 2 Block diagram of the motion detection.
As shown in Fig. 2, two consecutive image frames $I(k)$ and $I(k-1)$ are subtracted pixel by pixel, and the result is then binarized. If the total number of white pixels is less than the moving threshold $Thr_{moving}$, the motion detection unit goes on to examine the next two consecutive frames. If, on the contrary, the number exceeds the moving threshold, a target has been detected. The concept of the "moving edge" is incorporated by applying a logical AND operation between the subtracted image and the edge image of the current frame. This highlights the edges within the moving-pixel region and yields the moving edges in the latest image, which carry the information of the target's outline. Finally, the moving edge image is used to generate a proper-sized initialization of the Snake for outline extraction, which is followed by a tracking algorithm.
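The moving-edge computation described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: the function name `moving_edges`, the three threshold values, and the use of a central-difference gradient magnitude as the edge image are all hypothetical choices, since the paper does not specify the edge detector or the threshold values.

```python
import numpy as np

def moving_edges(prev_frame, curr_frame, diff_thresh=25,
                 moving_thresh=50, edge_thresh=40):
    """Return a boolean moving-edge mask, or None if no target is detected.

    All threshold values are illustrative assumptions, not values from the paper.
    """
    prev = prev_frame.astype(np.int32)
    curr = curr_frame.astype(np.int32)

    # Pixel-by-pixel subtraction of consecutive frames, then binarization.
    motion = np.abs(curr - prev) > diff_thresh

    # Too few "white" pixels: no moving target, examine the next frame pair.
    if motion.sum() < moving_thresh:
        return None

    # Edge image of the current frame (assumed here: gradient magnitude
    # from central differences, thresholded to a binary edge map).
    gy, gx = np.gradient(curr.astype(np.float64))
    edges = np.hypot(gx, gy) > edge_thresh

    # Logical AND keeps only the edges that lie inside the moving region.
    return motion & edges
```

A caller would feed each new frame pair to `moving_edges` and proceed to Snake initialization only when the result is not `None`.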
2.2 Snake-Based Outline Extraction
The objective of this section is to extract the outline of the target with active contour models (also called Snakes) [1]. A modified Snake is presented here on the basis of the Greedy algorithm [2]. The proposed Snake has an external constraint force that speeds up convergence to the desired feature of the object, and it provides an easy way to determine whether a point is a noise point so that wrong convergence can be avoided.
If $v_i = (x_i, y_i)$ for $i = 0, 1, \ldots, N$ represents the $N$-length discrete contour, the modified Snake energy is

$$E^{*}_{snake} = \sum_{i=1}^{N} \left\{ \alpha E_{cont}(v_i) + \beta E_{curv}(v_i) + \gamma E_{img}(v_i) + \eta E_{dis}(v_i) \right\} \qquad (1)$$

where $E_{cont}$ is the continuity energy, $E_{curv}$ is the curvature energy, $E_{img}$ is the image energy and $E_{dis}$ is the constraint energy, called the distance energy. The parameters $\alpha$, $\beta$, $\gamma$ and $\eta$ are used to balance the relative influence of the four terms. The mathematical formula of each energy term is given in (2), where $\bar{d}$ is the average distance between contour points and $v_i(j)$ represents the eight neighbors of a point $v_i$ for $j = 0, 1, \ldots, 8$. $\nabla I(v_i)$ denotes the image intensity of the edge image at the current position, and $D(v_i)$ denotes the absolute value of the distance between the current position and the center of the object; $\nabla I_{max}$ ($D_{max}$) and $\nabla I_{min}$ ($D_{min}$) denote the maximum and minimum image intensity (distance from the center of the object) in the neighborhood, respectively.
The moving direction of the Snake can be decided by adding the distance energy to, or subtracting it from, the Snake energy. Hence, the automatic initialization of the Snake can be solved by combining the modified Snake and the motion detector, which will be described in the next section. The concept of the Greedy algorithm [2] is to take the minimized $E^{*}_{snake}$ as the sum of each single minimized $E_{snake}(v_i)$, that is

$$\min E^{*}_{snake} = \sum_{i=1}^{N} \min E_{snake}(v_i) \qquad (3)$$
Figure 3(a) demonstrates how the iterative Greedy algorithm works. The energy function is computed for the current position $v_i$ and each of its neighbors. The location with the smallest value is then chosen as the new position $v_i'$. By repeating this process point by point, all points of the contour move to their corresponding new positions and form a new contour, which completes one iteration of the deformation loop. The Snake repeats the deformation loop until it converges to the desired feature of the object.

Fig. 3 (a) Demonstration of the Greedy algorithm. (b) Deformation of the Snake.

Fig. 4 In the moving edge image, (a) shows the bounding box of the object, and (b) shows the searching area and its center.
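One deformation loop of a greedy Snake with an added distance term can be sketched as follows. This is a minimal sketch, not the authors' implementation: the per-term formulas follow the general style of the Greedy algorithm [2] with min-max normalization over the 3x3 neighborhood, and the names `greedy_step` and `_normalize`, the default weights, and the (row, column) point convention are all assumptions made for illustration.

```python
import numpy as np

# 3x3 neighbourhood: the point itself plus its eight neighbours (row, col).
OFFSETS = np.array([(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)])

def _normalize(values):
    """Scale values to [0, 1]; a flat neighbourhood maps to all zeros."""
    span = values.max() - values.min()
    return (values - values.min()) / span if span > 0 else np.zeros_like(values)

def greedy_step(contour, edge_img, center, weights=(1.0, 1.0, 1.2, 0.5)):
    """Move every contour point to its lowest-energy neighbour once.

    `weights` = (alpha, beta, gamma, eta); the defaults are assumed values.
    """
    alpha, beta, gamma, eta = weights
    pts = contour.astype(float).copy()
    n = len(pts)
    h, w = edge_img.shape
    # Average spacing between consecutive points of the closed contour.
    mean_d = np.mean(np.linalg.norm(pts - np.roll(pts, 1, axis=0), axis=1))

    for i in range(n):
        prev_pt, next_pt = pts[i - 1], pts[(i + 1) % n]
        cand = pts[i] + OFFSETS
        cand[:, 0] = np.clip(cand[:, 0], 0, h - 1)
        cand[:, 1] = np.clip(cand[:, 1], 0, w - 1)

        # Continuity: keep spacing close to the average spacing.
        e_cont = np.abs(mean_d - np.linalg.norm(cand - prev_pt, axis=1))
        # Curvature: squared second difference along the contour.
        e_curv = np.sum((prev_pt - 2 * cand + next_pt) ** 2, axis=1)
        # Image: strong edge response should give low energy.
        rows, cols = cand[:, 0].astype(int), cand[:, 1].astype(int)
        e_img = -edge_img[rows, cols].astype(float)
        # Distance: adding this term pulls points toward the object center.
        e_dis = np.linalg.norm(cand - center, axis=1)

        energy = (alpha * _normalize(e_cont) + beta * _normalize(e_curv)
                  + gamma * _normalize(e_img) + eta * _normalize(e_dis))
        pts[i] = cand[np.argmin(energy)]
    return pts
```

Calling `greedy_step` repeatedly until the contour stops changing corresponds to the repeated deformation loop; flipping the sign of `eta` would make the distance term push the contour outward instead of inward.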
From the result of the motion detector, a moving object is detected. Assume that the whole body of the object lies completely inside a 140x140 area, which is the searching area. As the initialization of the Snake, we choose a proper-sized ellipse (circles included) that encloses the whole moving object. The center of the ellipse is located at the center of the bounding box that contains the object, and the axis lengths of the initial contour are determined according to the width and height of the bounding box. At the same time, we also use the center of this minimal rectangle as the center of the searching area.

Unlike traditional Snake-based tracking algorithms [3]-[10], our system uses the Snake only to extract the outline of the target rather than to perform tracking, because Snake-based tracking algorithms rely on the assumption of a slowly moving object, which unfortunately imposes serious constraints on the general use of tracking. This paper presents a contour matching method that uses the extracted contour model to track an arbitrary-shaped object. This method, however, is also limited in low-contrast environments, just like Snake-based tracking algorithms. Therefore, we integrate contour matching with template matching, the most commonly used method in visual tracking.
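Returning to the Snake initialization described earlier in this section, the following sketch shows one way an enclosing ellipse could be derived from the bounding box of the moving edges. The function name `init_ellipse`, the number of contour points, and the choice of scaling the half-width and half-height by $\sqrt{2}$ (so that the ellipse passes through the corners of the bounding box and therefore encloses it) are assumptions; the paper only states that the axis lengths depend on the width and height of the bounding box.

```python
import numpy as np

def init_ellipse(moving_edge_mask, n_points=40):
    """Build an initial elliptical contour around the moving-edge pixels.

    Returns (center, contour) with points given as (row, col).
    Assumes the mask contains at least one True pixel.
    """
    rows, cols = np.nonzero(moving_edge_mask)
    top, bottom = rows.min(), rows.max()
    left, right = cols.min(), cols.max()

    # The center of the bounding box is also used as the ellipse center.
    center = np.array([(top + bottom) / 2.0, (left + right) / 2.0])

    # Assumed rule: semi-axes = sqrt(2) * half-size, so the ellipse passes
    # through the four corners of the bounding box and encloses it.
    a = np.sqrt(2.0) * (bottom - top) / 2.0   # vertical semi-axis
    b = np.sqrt(2.0) * (right - left) / 2.0   # horizontal semi-axis

    theta = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    contour = np.stack([center[0] + a * np.sin(theta),
                        center[1] + b * np.cos(theta)], axis=1)
    return center, contour
```

The returned `center` can double as the center of the searching area, and `contour` can be passed to a Snake deformation routine as its starting contour.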
The details of contour matching are described as follows. To highlight the contour of an object in an image, the edge image is used for object tracking (see Fig. 5). Just as in template matching, we sum the gradient values pixel by pixel in the edge image along a pre-extracted contour model. After summing over the contour model, we shift the center of the contour model to the next pixel and compute the total sum along the perimeter of the contour model again. After going through the whole search area, we obtain the largest value, which corresponds to the distribution of edge pixels that most resembles the object's contour, whereby the true object's contour is located. The numerical process in each searching loop can be summarized by the following normalized sum equation

$$\phi_g(s) = \frac{1}{N} \sum_{j=1}^{N} g_j \qquad (4)$$

where $s = (x, y)$ represents the position of the center of the contour model, $N$ is the total number of pixels along the perimeter of the contour model, and $g_j$ denotes the gradient value of each pixel along the perimeter of the contour model.

Fig. 5 Searching for the object by matching the contour model in the edge image.

Combined with sum of absolute difference (SAD) template matching, the best object position is
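$$s^{*} = \arg\max_{s_i \in S} \left\{ \hat{\phi}_g(s_i) + \hat{\phi}_{SAD}(s_i) \right\} \qquad (5)$$

where

$$\hat{\phi}_g(s_i) = \frac{\phi_g(s_i) - \min_{s_j \in S} \phi_g(s_j)}{\max_{s_j \in S} \phi_g(s_j) - \min_{s_j \in S} \phi_g(s_j)} \quad \text{and} \quad \hat{\phi}_{SAD}(s_i) = \frac{\max_{s_j \in S} \phi_{SAD}(s_j) - \phi_{SAD}(s_i)}{\max_{s_j \in S} \phi_{SAD}(s_j) - \min_{s_j \in S} \phi_{SAD}(s_j)}$$

are the normalized forms, $s_i$ is the center position of the contour model as well as the template, and $S$ is the search area.

3 Trajectory Design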
A highly cluttered environment may contain many background regions that closely resemble the target and thus produce false alarms, which may in turn cause errors to accumulate in the subsequent tracking. Not only will a bad template or contour be adopted, but the trajectory detection will also be misled, which seriously affects the position of the searching window. Consequently, tracking is very likely to be lost once the best match is a false match. For this reason, Visual Probabilistic Data Association (VPDA) is adopted to meet the challenge posed by real cluttered environments. It provides a more reliable way to predict the next position of the target and enhances the robustness of the tracking system, even in a cluttered environment. The VPDA filter is the original probabilistic data association (PDA) filter [7] integrated with visual information, as introduced in [8].
The concept of the PDA filter is to take all possible targets into account, instead of only the best match, which may be mimicked by parts of the cluttered environment, and then to produce a weighted-average output from all possible candidates. The weights are computed as probabilities: the filter computes the posterior association probabilities for all current candidate measurements and uses them to form a weighted sum of innovations for updating the target's state in a suitably modified version of the Kalman filter.
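The weighted-innovation update can be illustrated with the standard PDA measurement update of [7]. The sketch below covers only the generic PDA step and does not include the visual information that distinguishes the VPDA filter, which this paper does not detail. The function name `pda_update` and the default detection probability, gate probability, and clutter density are assumed values for illustration.

```python
import numpy as np

def pda_update(x_pred, P_pred, H, R, measurements,
               p_detect=0.9, p_gate=0.99, clutter_density=1e-4):
    """Standard PDA measurement update (parameter defaults are assumptions).

    measurements: array of shape (m, d) holding the m candidate measurements.
    Returns the updated state, its covariance, and the association weights.
    """
    z = np.asarray(measurements, dtype=float).reshape(-1, H.shape[0])
    S = H @ P_pred @ H.T + R                 # innovation covariance
    S_inv = np.linalg.inv(S)
    K = P_pred @ H.T @ S_inv                 # Kalman gain

    nu = z - H @ x_pred                      # one innovation per candidate
    # Gaussian likelihood term of each candidate.
    e = np.exp(-0.5 * np.einsum('ij,jk,ik->i', nu, S_inv, nu))
    # Term accounting for "no candidate is the target" (Poisson clutter).
    b = (clutter_density * np.sqrt(np.linalg.det(2.0 * np.pi * S))
         * (1.0 - p_detect * p_gate) / p_detect)

    beta = e / (b + e.sum())                 # posterior association weights
    beta0 = b / (b + e.sum())                # weight of "all are clutter"

    nu_comb = beta @ nu                      # weighted sum of innovations
    x_new = x_pred + K @ nu_comb

    # Covariance: mix of prediction, Kalman update, and spread of innovations.
    P_kalman = P_pred - K @ S @ K.T
    spread = (nu.T * beta) @ nu - np.outer(nu_comb, nu_comb)
    P_new = beta0 * P_pred + (1.0 - beta0) * P_kalman + K @ spread @ K.T
    return x_new, P_new, beta
```

Because every candidate contributes in proportion to its association probability, a single false match cannot drag the estimate as far as it would in a best-match-only Kalman update.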
4 Experimental Results
The Snake-based outline extraction performs well and is very fast, even when the object appears in a highly cluttered environment, as shown in Fig. 6. Figures 7-9 demonstrate tracking with the hybrid method, which combines contour matching with SAD template matching. Experimental results show that our tracking system is highly robust against occlusion, low-contrast environments and rapid motion.

5 Conclusion
Unlike a traditional video surveillance system, the major contribution of this work is an integrated visual servo system that can track an arbitrary-shaped object in a highly noisy environment, even when the target is partially occluded. The visual servo system performs the tracking process on 320x240 images in nearly real time (less than 34 ms) and centers the target to within a 15-pixel square tolerance range. Furthermore, we improved the Snake in three ways: i) efficient automatic initialization of the Snake through cooperation with motion detection, ii) a modification of the Snake energy function so that the initial Snake contour need not be close to the desired target outline, and iii) a guideline for updating the Snake parameters for more robust performance. To make the proposed visual tracking method more robust, a hybrid tracking method that integrates contour matching and SAD template matching is developed. This method is shown to track a high-speed moving object at a very economical computation cost. Finally, this work further develops a strengthened VPDA filter to enhance the robustness of the tracking capability.
References
[1] M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active Contour Models," Int. J. Comput. Vis., Vol. 1, pp. 321-331, 1987.
[2] D.J. Williams and M. Shah, "A Fast Algorithm for Active Contours and Curvature Estimation," CVGIP: Image Understanding, Vol. 55, No. 1, pp. 14-26, Jan. 1992.
[3] C. Xu and J.L. Prince, "Snakes, Shapes, and Gradient Vector Flow," IEEE Trans. Image Processing, Vol. 7, No. 3, pp. 359-369, 1998.
[4] J. Denzeler and H. Niemann, "A Two Stage Real-Time Object Tracking System," In Pavesi'c et al.
[5] M. Isard and A. Blake, "CONDENSATION - Conditional Density Propagation for Visual Tracking," Int. J. Computer Vision, pp. 1-36, 1998.
[6] J. Denzeler and H. Niemann, "A New Energy Term Combining Kalman Filter and Active Contour Models for Object Tracking," Machine Graphics & Vision, Vol. 5(1/2), pp. 157-165, 1996.
[7] Y. Bar-Shalom and E. Tse, "Tracking in a Cluttered Environment with Probabilistic Data Association," Automatica, Vol. 11, pp. 451-460, 1975.
[8] D. Liu, "Real-Time Visual Tracking in Cluttered Environment with a Pan-Tilt Camera," Master Thesis, Dept. of Electrical Eng., National Taiwan University, 2001.
Fig. 6 It takes only 9 iterations to extract the tank's outline from an initial contour of 84-pixel length.
Fig. 9 Tracking quick motion in a cluttered background.