On the Distribution-Based Tracking Systems


Proceedings of the 2004 IEEE International Conference on Networking, Sensing & Control, Taipei, Taiwan, March 21-23, 2004


Hwann-Tzong Chen$^{1,2}$, Tyng-Luh Liu$^{1}$, Chiou-Shann Fuh$^{2}$

$^{1}$Institute of Information Science, Academia Sinica, Nankang, Taipei 115, Taiwan

$^{2}$Department of CSIE, National Taiwan University, Taipei 106, Taiwan

Abstract

We investigate the issues of object representation and search techniques for distribution-based tracking systems.

While representing objects by color distributions has the advantage of capturing the essential portion of a tracked object, it generally does not handle scale changes appropriately. We thus adopt a new object representation by integrating color and edge information via two coupled weighting schemes derived from a covariance ellipse model. The representation allows a system to perform optimization over a continuous space and to yield better tracking performance. On the aspect of search techniques, we discuss two popular iterative optimization approaches: line-search and trust-region methods. We demonstrate the differences between the two by analyzing the quality of their respective solutions through numerical and real tracking examples.

Keywords: Vision, tracking, trust-region methods.

1. Introduction

Visual tracking is an important topic with practical applications in vision research. Quite a number of methods have been proposed over the years. Nevertheless, our review focuses only on distribution-based tracking systems.

1.1. Related work

If the objects to be tracked are non-rigid, it is convenient to represent them with probability distributions of some salient features of the objects. The most straightforward way to derive a distribution model is through histogram analysis [1], [2], [3], [4]. Birchfield [1] has proposed an algorithm for tracking a person's head by modeling it as a vertical ellipse with a fixed aspect ratio. In [2], Bradski presents a CAMSHIFT (continuously adaptive mean shift) system for use in a perceptual user interface to track faces.

Comaniciu et al. [4] apply mean shift analysis to real-time tracking of non-rigid objects. They model objects by color distributions, which are constructed via kernel density estimation, and then measure the similarity between the target and candidate distributions using the Bhattacharyya coefficient. The best location that maximizes the similarity measure is found iteratively by moving along the direction of a mean shift vector, which approximates the density gradient of the Bhattacharyya coefficient.
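As a minimal sketch of the similarity measure mentioned above, the snippet below evaluates the discrete Bhattacharyya coefficient $\sum_u \sqrt{p(u)\,q(u)}$ between two normalized histograms. The function name and the toy histograms are illustrative assumptions; this is not code from [4].

```python
import numpy as np

def bhattacharyya_coefficient(p, q):
    """Discrete Bhattacharyya coefficient of two normalized histograms."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Sum over bins of the geometric mean of the two bin probabilities.
    return float(np.sum(np.sqrt(p * q)))

# Toy histograms (hypothetical values, each summing to 1).
target = [0.2, 0.3, 0.5]
candidate = [0.1, 0.4, 0.5]
print(bhattacharyya_coefficient(target, target))     # close to 1 for identical histograms
print(bhattacharyya_coefficient(target, candidate))  # smaller for differing histograms
```

The coefficient lies between 0 (no overlapping bins) and 1 (identical distributions), which is why a tracker based on it maximizes rather than minimizes.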

More recently, Perez et al. [7] use the Bhattacharyya coefficient to compare two color distributions. They incorporate this similarity measurement of color distributions into the observation likelihood of a probabilistic framework, and can then apply the particle filter technique to color-based tracking. Also, in [9], Zhang and Freedman derive a PDE-based curve flow that describes the evolution of an object's contour. The curve flow is guided by the likeness between the candidate and target color distributions. The likeness can be measured by either the Bhattacharyya coefficient or the Kullback-Leibler divergence.

2. Distribution-Based Tracking Systems

Tracking objects by distributions is efficient but not necessarily sufficient if appreciable changes in scale and shape are present. Suppose that the scale of a target object of monotone color is enlarged. Then it is not always guaranteed that the appropriate scale will be recovered since, in this case, any sub-portion of the object has a distribution similar to that of the object, i.e., the optimal solutions are not unique. Therefore other tracking cues are needed to elevate the performance of a distribution-based tracker, e.g., [8]. In our system, the representation model consists of two elements: the first characterizes the RGB color distribution, and the second estimates the edge point density near the boundary of an object. Since the latter is contributed only by samples near the boundary, and is more prone to be affected by the scene background, especially in clutter, its significance is weighted less to make the color distribution the primary cue for determining a target object's whereabouts.


2.1. Representation models

Color distribution: We divide the RGB color space into $n$ bins to model the color distribution. A single-valued bin assignment function $b$ is defined uniquely by a pixel's RGB value as $b : x_i \mapsto \{1, \ldots, n\}$, where $x_i$ is any pixel in an image. To account for non-rigidity, a weighting scheme based on the bivariate normal distribution is adopted so that color features at different locations within an ellipse are treated differently. In particular, we use $E(x; \zeta) = 1$ to represent an ellipse centered at $\mu = (\mu_1, \mu_2)$, where $\zeta = (\mu, \sigma, \rho)$ is a five-dimensional vector, and $\sigma = (\sigma_1, \sigma_2)$ and $\rho$ are related to the lengths of the principal semi-diameters and the rotation angle with respect to the horizontal axis. Then, the area within the ellipse can be represented as $A_1(\zeta) = \{x \mid E(x; \zeta) \le 1\}$, where the subscript 1 emphasizes that each such ellipse is indeed the covariance ellipse of a bivariate normal distribution.
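To make the elliptic region concrete, the sketch below evaluates one plausible form of $E(x; \zeta)$: the squared distance of a point from the center, measured in the ellipse's principal-axis frame. It assumes that $\sigma_1, \sigma_2$ are the semi-diameter lengths themselves and that $\rho$ is the rotation angle in radians; the text only says these quantities are "related to" the parameters, so this parameterization and the function names are assumptions.

```python
import numpy as np

def ellipse_value(points, mu, sigma, rho):
    """Assumed form of E(x; zeta) for an (N, 2) array of points.

    mu    : center (mu_1, mu_2)
    sigma : semi-diameter lengths (sigma_1, sigma_2)  [assumption]
    rho   : rotation angle in radians w.r.t. the horizontal axis  [assumption]
    """
    points = np.atleast_2d(np.asarray(points, dtype=float))
    d = points - np.asarray(mu, dtype=float)
    c, s = np.cos(rho), np.sin(rho)
    # Rotate the offsets into the ellipse's principal-axis frame.
    u = c * d[:, 0] + s * d[:, 1]
    v = -s * d[:, 0] + c * d[:, 1]
    return (u / sigma[0]) ** 2 + (v / sigma[1]) ** 2

def inside_ellipse(points, mu, sigma, rho):
    """Membership test for A_1(zeta) = {x | E(x; zeta) <= 1}."""
    return ellipse_value(points, mu, sigma, rho) <= 1.0

# Hypothetical ellipse: center (10, 20), semi-diameters 4 and 2, no rotation.
pts = [[10, 20], [13, 20], [10, 23]]
print(inside_ellipse(pts, mu=(10, 20), sigma=(4, 2), rho=0.0))
```

With this form, $E = 0$ at the center and $E = 1$ exactly on the ellipse, so the same function can later drive both the membership test and a distance-dependent weight.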

To compute a probability distribution of color for tracking, let $I_0$ be the first image frame and $\zeta_0 = (\mu_0, \sigma_0, \rho_0)$. Furthermore, a target object initially centered at $\mu_0$ can be associated with $A_1(\zeta_0) = \{x \mid E(x; \zeta_0) \le 1\}$, the area enclosed by the corresponding covariance ellipse, $E_1(\zeta_0)$. We then define the target's color distribution within $A_1(\zeta_0)$, denoted as $p(u; \zeta_0)$, by

$$p(u; \zeta_0) = \frac{1}{C_{\zeta_0}} \sum_{x_i \in A_1(\zeta_0)} w_c(x_i; \zeta_0)\, \delta(b(x_i) - u),$$

where $\delta$ is the Kronecker delta function, and $w_c$ is the weighting function for color distribution derived from the bivariate normal, i.e.,

That $p(u; \zeta_0)$ is a probability implies that the total weight $C_{\zeta_0} = \sum_{x_i \in A_1(\zeta_0)} w_c(x_i; \zeta_0)$. For convenience, the notation $p(u; \zeta_0)$ will be abbreviated to $p(u)$ since $\zeta_0$ only describes the target's initial state. Analogously, during tracking, for an image area enclosed by some $A_1(\zeta)$, its color probability distribution, denoted as $q(u; \zeta)$, will be

$$q(u; \zeta) = \frac{1}{C_\zeta} \sum_{x_i \in A_1(\zeta)} w_c(x_i; \zeta)\, \delta(b(x_i) - u),$$

where $C_\zeta$ is the total weight such that $\sum_u q(u; \zeta) = 1$.

Edge density: The motivation for estimating the edge density near the loci of a covariance ellipse is to help the system determine a reasonable solution if there are several competing covariance ellipses of similar color distributions. We first use a high-pass filter with a typical $5 \times 5$ Laplacian kernel to perform convolution and to generate a binary edge map $E$ for each image frame [6].
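The weighted color histogram $q(u; \zeta)$ defined earlier in this subsection can be sketched as follows. Two things here are assumptions rather than the paper's definitions: the bin assignment uses a uniform quantization of each RGB channel, and the weight is taken as $w_c = \exp(-E/2)$, a Gaussian-shaped stand-in for the bivariate-normal weighting. Bins are 0-based in the code, whereas the text numbers them from 1 to $n$.

```python
import numpy as np

def rgb_to_bin(pixels, bins_per_channel=8):
    """Map (N, 3) RGB pixels in 0..255 to 0-based bin indices.

    ASSUMPTION: uniform quantization; bins_per_channel must divide 256.
    """
    pixels = np.asarray(pixels, dtype=np.int64)
    step = 256 // bins_per_channel
    r, g, b = (pixels[:, k] // step for k in range(3))
    return (r * bins_per_channel + g) * bins_per_channel + b

def weighted_color_histogram(bin_idx, e_vals, n_bins):
    """q(u; zeta) from per-pixel bin indices and per-pixel values E(x_i; zeta)."""
    bin_idx = np.asarray(bin_idx)
    e_vals = np.asarray(e_vals, dtype=float)
    inside = e_vals <= 1.0                    # pixels in A_1(zeta)
    if not inside.any():
        raise ValueError("no pixels inside the ellipse")
    weights = np.exp(-0.5 * e_vals[inside])   # ASSUMED form of w_c
    hist = np.bincount(bin_idx[inside], weights=weights, minlength=n_bins)
    return hist / hist.sum()                  # divide by the total weight

# Hypothetical pixels and E values; the third pixel lies outside the ellipse.
pixels = np.array([[250, 10, 10], [245, 5, 12], [10, 10, 250]])
e_vals = np.array([0.1, 0.8, 1.7])
q = weighted_color_histogram(rgb_to_bin(pixels), e_vals, n_bins=8 ** 3)
print(q.sum(), np.flatnonzero(q))
```

Passing precomputed $E$ values keeps the histogram independent of how the ellipse is parameterized, so a different weighting function can be swapped in by changing a single line.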

Just like the use of a color weighting function $w_c$ for the color distribution, an edge weighting function $w_e$ is called for to derive an appropriate edge density estimation near an ellipse's loci. Conveniently, for every $w_c$ defined by a covariance ellipse in (1), we can define a corresponding $w_e$ for edge weights using a coupled crater function, i.e.,

UJ&;C) = ~ E ( Xi ;C ~e x P { - - ' ( x i ; C~ } 7 ¹ (2) 2

where $\gamma$ is the parameter that adjusts the shape of the crater function and the size of the crater's opening. In practice, we find that better tracking performance can be achieved by using a slightly larger $\gamma$, say $\gamma = 4$, so that significant values of edge weight are within the covariance ellipse. Finally, we adopt the notation $e(\zeta)$ to represent the edge density within and near the boundary of a covariance elliptic region $A_1(\zeta)$.

The scale-invariant definition of $e(\zeta)$ is as follows.

where $E(x_i) = 1$ if $x_i$ is an edge point, and 0 otherwise.

2.2. Iterative optimization techniques

Iterative algorithms for optimization can be divided into two classes, line-search and trust-region, depending on how they determine the iterates. For a line-search method, the iterates are determined along some specific directions; e.g., steepest descent locates its iterates by considering the gradient directions. A trust-region method, however, derives its iterates in a more general manner by solving the corresponding optimization problem in a bounded region iteratively, so that there are more options for selecting the iterates.

In fact, line-search methods can be considered special cases of trust-region methods. Essentially, any trust-region method has three elements: (i) the trust-region radius, which determines the size of a trust region, (ii) the trust-region subproblem, which approximates a minimizer in the region, and (iii) the trust-region fidelity, which evaluates the accuracy of an approximating solution. (Interested readers may refer to [5] for a detailed discussion on trust-region methods.)

Trust-region vs. line-search: Both trust-region and line-search methods are guaranteed to converge to a local minimum.

However, not all local minima are of interest in real applications. It can be shown that a typical line-search method (e.g., steepest descent) or a trust-region method with a linear model may often converge to a local minimum that is even inferior to a nearer one. Unlike steepest descent, the mean-shift technique in [4] is a more conservative line-search that, instead of taking the largest/steepest steps along gradients, usually progresses by small steps computed from the information within fixed-size windows. Such an approach tends to converge to a nearby local minimum whether it is significant or not.

Figure 1. Optimizing with steepest descent, TR+linear model, and TR+quadratic model on $f(x) = (x-60)(x-65)(x-70)(x-75)(x-78)(x-80)$. There are three local minima in total, $x_1$, $x_2$, and $x_3$. Out of 1000 runs, with initial positions $x_0$ sampled uniformly from [56.96, 66.95], we record in each entry the number of times that a method converges to a local minimum. Note that even though the $x_0$'s are near the best local minimum $x_1$, a well-known line-search method like steepest descent fails to converge to $x_1$ 258 times.

Thus both a more sophisticated model approximation and a mechanism to adjust the regions of interest iteratively are needed to reduce the chance of converging to a local minimum not of interest.

In Fig. 1, we construct an objective function with three local minima, $x_1$, $x_2$, and $x_3$. Among them, $x_1$ is clearly the global minimum. We then test the three schemes, steepest descent, TR with linear model, and TR with quadratic model, using 1000 different initial positions $x_0$ sampled uniformly from [56.96, 66.95]. Though the $x_0$'s are near the best local minimum $x_1$, a well-known line-search method like steepest descent fails to converge to $x_1$ 258 times.

On the other hand, trust-region methods are more successful in converging to $x_1$. For trust-region with a linear model, since its performance relies on the ability to adjust trust regions adaptively, the outcomes depend more on the value of the initial trust-region radius $\Delta_0$. With a quadratic model approximation, in contrast, a trust-region method gains additional information from a better approximation to the objective function, and is thus less sensitive to the value of $\Delta_0$.
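The three trust-region elements (radius, subproblem, fidelity) can be illustrated with a one-dimensional sketch using a quadratic model. The objective is the degree-6 polynomial with roots 60, 65, 70, 75, 78, and 80, which is how the Fig. 1 plot title reads and is therefore an assumption; the thresholds 0.25 and 0.75, the shrink and growth factors, and all function names are generic textbook-style choices, not the settings used in the experiments. The sketch does not attempt to reproduce the 1000-run counts.

```python
import math

# Roots of the Fig. 1 objective as read from the plot title (assumption).
ROOTS = (60.0, 65.0, 70.0, 75.0, 78.0, 80.0)

def f(x):
    """f(x) = prod_i (x - r_i), evaluated in factored form."""
    return math.prod(x - r for r in ROOTS)

def df(x):
    """First derivative via the product rule."""
    n = len(ROOTS)
    return sum(math.prod(x - ROOTS[j] for j in range(n) if j != i)
               for i in range(n))

def d2f(x):
    """Second derivative: sum over ordered pairs (i, j) with i != j."""
    n = len(ROOTS)
    return sum(math.prod(x - ROOTS[k] for k in range(n) if k not in (i, j))
               for i in range(n) for j in range(n) if i != j)

def trust_region_1d(f, df, d2f, x0, radius=1.0, max_radius=2.0,
                    eta=0.1, step_tol=1e-6, max_iter=200):
    """1-D trust-region iteration with the model m(s) = g*s + 0.5*h*s^2."""
    x = float(x0)
    for _ in range(max_iter):
        g, h = df(x), d2f(x)
        if g == 0.0:                      # exactly stationary
            break
        # (ii) Subproblem: minimize m(s) subject to |s| <= radius.
        if h > 0.0:
            s = max(-radius, min(radius, -g / h))
        else:                             # non-convex model: go to the boundary
            s = math.copysign(radius, -g)
        if abs(s) < step_tol:
            break
        # (iii) Fidelity: actual reduction relative to predicted reduction.
        predicted = -(g * s + 0.5 * h * s * s)
        actual = f(x) - f(x + s)
        rho = actual / predicted
        # (i) Radius: shrink on poor agreement, grow on good boundary steps.
        if rho < 0.25:
            radius *= 0.25
        elif rho > 0.75 and abs(s) >= radius:
            radius = min(2.0 * radius, max_radius)
        if rho > eta:                     # accept only steps that reduce f enough
            x += s
    return x

x_star = trust_region_1d(f, df, d2f, x0=58.0)
print(x_star, f(x_star))
```

Because a step is accepted only when the measured reduction is a positive fraction of the predicted one, the objective never increases along the iterates; the radius update is what lets the method take bold steps where the model is trustworthy and cautious ones where it is not.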

3. Distribution-Based Objective Functions

Since there are two rather distinct features included in the representation model: color and edge, the resulting objective function must address the two factors justifiably.

3.1. KL distance and BH coefficient

First, for measuring the similarity between two color distributions, we consider the Kullback-Leibler (KL) distance,

$$f_c(\zeta) = \sum_{u=1}^{n} p(u) \log \frac{p(u)}{q(u; \zeta)}, \qquad (4)$$

where $p(u)$ is the target (true) color distribution and $q(u; \zeta)$ is the one for the covariance ellipse $E_1(\zeta)$.

Second, to estimate whether the boundary edge density of a candidate covariance ellipse is comparable to that of the target, we embed the edge density ratio, denoted as $h(\zeta)$, into a sigmoid function to derive the following,

where $h(\zeta) = e(\zeta)/e(\zeta_0)$; $\alpha$ and $\beta$ are parameters for setting up the initial sigmoid function. (We have used $\alpha = 5$ and $\beta = 1$ for all the experiments.) Finally, with the definitions in (4) and (5), the underlying optimization problem for image frame $I_t$ can be formally written as

$$\zeta_t = \arg\min_{\zeta \in \Omega_t} f(\zeta), \quad f(\zeta) = f_c + \lambda f_e, \qquad (6)$$

where $\lambda$ is a parameter to weigh the relative importance of the two terms, and $\Omega_t$ denotes the space consisting of all possible $\zeta$'s for any combinations of translation, scale, and orientation. The implication of (6) is that using color distribution alone, the tracking problem may not always be well-posed, and by coupling with edge density estimation around the boundary, the additional tracking cue functions as an auxiliary term to help yield more appropriate results.

In [4], the Bhattacharyya (BH) coefficient, defined by $h(x) = \sum_u \sqrt{p(u)\, q(u; x)}$, is chosen to be the objective function to be maximized. Since the values of the Bhattacharyya coefficient do not vary much (between 0 and 1) and are derived by taking square roots, it generally gives a smoother level surface than that of the Kullback-Leibler distance. However, as we will discuss in the next section, a level surface with fewer variations may cause an iterative optimizer to get stuck in a flat region and fail to perform the optimization properly. In fact, the Kullback-Leibler distance and the Bhattacharyya coefficient are closely related, as can be seen by observing their respective first derivatives,
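The relation can be sketched by differentiating the two measures directly. Assuming the standard discrete forms $f_c(\zeta) = \sum_u p(u) \log\frac{p(u)}{q(u;\zeta)}$ for the KL distance and $h(\zeta) = \sum_u \sqrt{p(u)\, q(u;\zeta)}$ for the BH coefficient (the notation is an assumption and may differ from the authors' equations (7) and (8)), the chain rule gives:

```latex
\frac{\partial f_c}{\partial \zeta}
  = -\sum_{u} \frac{p(u)}{q(u;\zeta)} \,
    \frac{\partial q(u;\zeta)}{\partial \zeta},
\qquad
\frac{\partial h}{\partial \zeta}
  = \frac{1}{2} \sum_{u} \sqrt{\frac{p(u)}{q(u;\zeta)}} \,
    \frac{\partial q(u;\zeta)}{\partial \zeta}.
```

Both derivatives are sums of $\partial q/\partial \zeta$ weighted by a function of the ratio $p/q$: the ratio itself for KL, and its square root for BH. The opposite signs reflect that one measure is minimized and the other maximized, and the square root compresses large ratios, which is consistent with the smoother BH level surface described above.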

3.2. How to compute the KL distance

Each time our tracker is to determine the object's status, it seeks an optimal elliptic region by minimizing the objective function $f = f_c + \lambda f_e$. While the computation of $f_e$ is straightforward, it requires some effort to compute the Kullback-Leibler distance $f_c$ for measuring the correlation between two color distributions and the derivatives of $f$ for solving the optimization problem.

In practice, many of the color bins would have null distributions in either $p(u)$ or $q(u; \zeta)$. These null values can cause singularities in the denominators of (4). A common way to avoid such singularities is to simply add a small positive number to both $p(u)$ and $q(u; \zeta)$ for all $u \in \{1, \ldots, n\}$, and then re-normalize them into probabilities. However, we find that such an approach is very unreliable and often gives misleading results; in particular, it affects the derivatives of the Kullback-Leibler distance considerably. In our implementation, we compute the Kullback-Leibler distance by the following approximation:

Note that $\epsilon_p = \epsilon \times p_{\min}$, $\epsilon_q = \epsilon \times q_{\min}$, and we have $\epsilon$ = for all experiments. The value of $\epsilon$ should be considered together with $n$, the total number of bins divided for modeling the color distribution, to ensure that

$$n_p \times \epsilon \times p_{\min} \ll 1 \implies \tilde{p}(u) \approx p(u), \qquad n_q \times \epsilon \times q_{\min} \ll 1 \implies \tilde{q}(u; \zeta) \approx q(u; \zeta).$$

It follows that $\sum_u \tilde{p}(u) = 1$ and $\sum_u \tilde{q}(u; \zeta) = 1$, and more importantly, the modified color probability distributions $\tilde{p}(u)$ and $\tilde{q}(u; \zeta)$ differ negligibly from the true distributions $p(u)$ and $q(u; \zeta)$, respectively. The main advantage of the approximation in (9) is that the computation of the Kullback-Leibler distance and its partial derivatives involves only those color bins for which either $p(u)$ or $q(u; \zeta)$ is nonzero.
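To make the singularity discussed in this subsection concrete, the sketch below evaluates the standard discrete KL distance $\sum_u p(u)\log\frac{p(u)}{q(u)}$ and the common add-a-constant-and-renormalize fix that the text describes as unreliable. It is not an implementation of the approximation in (9); the function names and the value of `delta` are illustrative assumptions.

```python
import numpy as np

def kl_distance(p, q):
    """Standard discrete KL distance; infinite if q(u) = 0 where p(u) > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0                       # bins with p(u) = 0 contribute nothing
    with np.errstate(divide="ignore"):    # silence the division-by-zero warning
        return float(np.sum(p[support] * np.log(p[support] / q[support])))

def add_and_renormalize(dist, delta):
    """The naive fix: add a small constant to every bin, then renormalize."""
    dist = np.asarray(dist, dtype=float) + delta
    return dist / dist.sum()

p = [0.5, 0.5, 0.0]
q = [1.0, 0.0, 0.0]                       # null bin where p is nonzero

print(kl_distance(p, q))                  # singular case
for delta in (1e-3, 1e-6, 1e-9):          # hypothetical smoothing constants
    print(delta, kl_distance(add_and_renormalize(p, delta),
                             add_and_renormalize(q, delta)))
```

Running the loop shows that the smoothed value changes with the arbitrary choice of `delta`, which gives one intuition for why the naive fix can distort both the distance and its derivatives.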

4. Comparisons and Experimental Results

In this section, we demonstrate the efficiency and the reliability of a trust-region tracker by (i) making comparisons with a tracker based on line-search, e.g., the mean-shift tracker by Comaniciu et al. [4], and (ii) investigating the need for a combined objective function (6) and the advantages of using a quadratic model.

4.1. Trust-region vs. mean-shift

In [4], color distribution is used as the only cue for tracking, and the Bhattacharyya coefficient, defined by $\sum_u \sqrt{p(u)\, q(u; x)}$, is chosen to be the objective function to be maximized. Since a mean-shift vector simply approximates the gradient of an objective function, for the sake of comparison we implement a trust-region tracker with a linear model approximation, and also use the exact color representation model described in [4] for all comparisons here. This implies that we are dealing with two trackers, trust-region (TR) and mean-shift (MS), and two objective functions, Kullback-Leibler distance (KL) and Bhattacharyya coefficient (BH). In total there are four possible combinations: MS+BH, TR+BH, MS+KL, and TR+KL.

We test the two trackers with the Car sequence (see Fig. 2). The catch of this sequence is that the toy car will later run over a portion of the carpet, which is of similar color to the car's, and then complicates the tracking task.

The experiments are done first with the Bhattacharyya objective function, where the results for both trackers are somewhat less satisfactory, mostly due to a flat and smooth level surface, say, when processing the 100th frame of the Car sequence. Alternatively, the corresponding KL level surface, shown in Fig. 2f, is more informative in the sense that, though the level surface is less smooth, it provides more variations and responses owing to the KL formula for correlating two distributions. When the KL is coupled to a trust-region tracker, it again produces consistent and satisfactory tracking results. Using Fig. 2e to explain pictorially, the two points, 1 and 4, inside the small rectangle are the car’s positions at the previous and current frames, respectively. For a system to track the target correctly, its tracker should move/iterate from point 1 to point 4. In Fig. 2g, the KL level curves within the small rectangular area are plotted, and a magnified portion is provided in Fig. 2h. When a trust-region tracker is used, it takes 3 iterations to converge to point 4 (1 → 2 → 3 → 4). Instead, a mean-shift tracker will first reach point A (see Fig. 2h), then find that the KL value there is higher and iterate backward by A → B → ... → 1, where in this example all the backward iterates happen to have higher KL level values. As a result, the mean-shift tracker will stay at point 1 and miss the target.

Figure 2. TR vs. MS. Panels: (a) Car #000, (b) Car #110, (c) Car #119, (d) Car #121, (e) A car (#034), (f) KL level surface, (g) KL level curves, (h) Locally enlarged. (a)-(d) Tracking outcomes of TR+KL and MS+BH are plotted together, top and bottom, respectively. (e)-(f) Comparisons between TR+KL and MS+KL.

From the first-derivative equations in (7) and (8) and the above comparisons, we conclude that the two functions, KL and BH, are closely related and, in most cases, appropriate for the tracking application. However, the KL distance tends to be more responsive in that it can detect the differences in the correlated signals and yield a level surface that reflects the variations. Such a property is especially useful for an optimization-based tracker, as we have demonstrated in the experiments.

4.2. Tracking by edge density

Apparently, tracking by edge density alone is not going to work very well, but it is instructive to understand its characteristics and limitations. In the first experiment, the task is to track an object’s contour and to investigate the impact of the speed of the object’s movement on the tracking outcomes. We have learned that when the background is simple, i.e., edges from the background are insignificant, and the object’s motion is slow, a trust-region tracker using only the edge density function $f_e$, defined in (5), can track the contour successfully (see Fig. 3a). The performance then starts to deteriorate once the object speeds up its movement. A typical scenario is shown in Fig. 3b, where the tracker fails to capture the whole contour but just part of it. In terms of optimization, every such solution corresponds to a mediocre local minimum since it intersects with some portion of the object’s boundary owing to the crater-like weighting scheme (2). Thus the resulting level surface of $f_e$ should be mostly smooth with a concentration of local minima near the object’s location at the previous frame. As to the global minimum, i.e., the one that captures the whole contour, it could be either nearby when the object’s motion is slow, or comparatively farther away than many of the local minima when the motion is fast.

Figure 3. Tracking with edge density information only. Panels: (a) Moving slowly, (b) Moving fast, (c) Synthesis #060, (d) Synthesis #154.

For the second experiment, we create a synthesized image sequence of a monotone-color elliptic object undergoing various changes in its scale and orientation (without any translation). In this case an algorithm that tracks solely by comparing color distributions will fail to capture the right scale because of the uniform color of the target, i.e., the underlying tracking problem using only color distribution is ill-posed. Nevertheless, it is handled accurately by tracking with edge density, where we have shown the results for two image frames in Fig. 3c and Fig. 3d. The results and the analysis of the experiments justify the use of the proposed objective function defined in (6), which integrates both color distribution and edge information for tracking.

Overall, we have discussed the issues of object representation and search techniques for distribution-based tracking systems. By arguing that the local minimum a tracker converges to should be not only significant but also of interest, we show that trust-region methods provide the techniques and controls for this aspect of consideration. This is also the main reason that we think trust-region is more appropriate for tracking than line-search. We have demonstrated this point by comparing with a well-known mean-shift tracker via a number of experiments. Our other efforts have been made to design a good representation model. Since tracking by using color distribution only is not a well-posed problem, we consider a representation that integrates the object’s geometry and its color distribution. While computing a target’s characteristics within an area enclosed by a boundary of arbitrary shape is rather complicated, we content ourselves with using the bivariate covariance ellipse to describe them. The resulting representation integrates color distribution and edge density information via two coupled weighting schemes, and it enables the system to perform optimization over a continuous space to yield more accurate results.

References

[1] S.T. Birchfield, “Elliptical Head Tracking Using Intensity Gradients and Color Histograms,” Proc. Conf. Computer Vision and Pattern Recognition, pp. 232-237, Santa Barbara, CA, 1998.

[2] G.R. Bradski, “Computer Vision Face Tracking for Use in a Perceptual User Interface,” Intel Technology Journal, 1998.

[3] H.T. Chen and T.L. Liu, “Trust-Region Methods for Real-Time Tracking,” Proc. Eighth IEEE Int’l Conf. Computer Vision, vol. 2, pp. 717-722, Vancouver, Canada, 2001.

[4] D. Comaniciu, V. Ramesh, and P. Meer, “Real-Time Tracking of Non-Rigid Objects using Mean Shift,” Proc. Conf. Computer Vision and Pattern Recognition, vol. 2, pp. 142-149, Hilton Head Island, South Carolina, 2000.

[5] A.R. Conn, N.I.M. Gould, and P.L. Toint, Trust-Region Methods, SIAM, 2000.

[6] Intel Corporation, Intel Image Processing Library Reference Manual, 2000, Document Number 663791-005.

[7] P. Perez, C. Hue, J. Vermaak, and M. Gangnet, “Color-Based Probabilistic Tracking,” Proc. Seventh European Conf. Computer Vision, vol. 1, p. 661 ff., Copenhagen, Denmark, 2002.

[8] C. Rasmussen and G.D. Hager, “Probabilistic Data Association Methods for Tracking Complex Visual Objects,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 560-576, June 2001.

[9] T. Zhang and D. Freedman, “Tracking objects using density matching and shape priors,” Proc. Ninth IEEE Int’l Conf. Computer Vision, pp. 1056-1062, Nice, France, 2003.