Real-Time Tracking Using Trust-Region Methods


Tyng-Luh Liu, Hwann-Tzong Chen

Abstract: Optimization methods based on iterative schemes can be divided into two classes: line-search methods and trust-region methods. While line-search techniques are commonly found in various vision applications, not much attention is paid to trust-region ones. Motivated by the fact that line-search methods can be considered as special cases of trust-region methods, we propose to establish a trust-region framework for real-time tracking. Our approach is characterized by three key contributions. First, since a trust-region tracking system searches more effectively, it often outperforms other trackers that rely on iterative optimization to perform tracking, e.g., a line-search based mean-shift tracker. Second, we have formulated a representation model that uses two coupled weighting schemes derived from the covariance ellipse to integrate an object’s color probability distribution and edge density information. As a result, the system can address rotation and non-uniform scaling in a continuous space, rather than working on some presumably possible discrete values of rotation angle and scale. Third, the framework is very flexible in that a variety of distance functions can be adapted easily. Experimental results and comparative studies are provided to demonstrate the efficiency of the proposed method.

Keywords: Tracking, vision, iterative optimization, trust-region methods.

I. Introduction

A key component of a successful tracking system is its ability to search efficiently for the target. Focusing on this goal, we propose a new approach for tracking using trust-region methods [6]. Previous uses of trust-region methods have been in areas other than real-time tracking, e.g., [12], [13]. While the applications are different, the efficiency of trust-region methods as an optimization tool has been demonstrated.

Recently, Chen and Liu [4] have applied trust-region methods to tracking, and Sminchisescu and Triggs [16] have used them for 3-D body tracking.

A. Our Approach

We view a tracking process as a sequence of iterative optimization problems: For each image frame the task is to find an optimal solution that best describes the status of a target object. It requires an effective method to solve the underlying optimization problem appropriately. Most of the iterative optimization techniques used in tracking as well as other vision research are line-search methods, in that the iterates are restricted to some iteration-dependent directions, e.g., the gradient. We instead use trust-region methods for their efficiency and reliability.

In addition, motivated by [5], we formulate a flexible object representation. It integrates both color and edge information via two coupled weighting schemes derived from a covariance ellipse model. Unlike previous related works [2], [5], where the values of scale are limited to a few pre-determined ones, the representation allows a system to perform optimization over a continuous space to yield better performance.

T.-L. Liu and H.-T. Chen are with the Institute of Information Science (IIS), Academia Sinica, Taipei, Taiwan. E-mail: {liutyng, pras}@iis.sinica.edu.tw. H.-T. Chen is also a Ph.D. student at the Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan.

B. Previous Work

Methods based on the Bayesian framework have been playing an important role in tracking, e.g., [11], [15], [18]. Among them, the CONDENSATION algorithm [11], introduced by Isard and Blake to track contours in clutter via factored sampling, is perhaps the best known. Its main idea is to pinpoint the inappropriateness of the Gaussian state density assumption for tracking in clutter when multiple competing observations exist.

It is also possible to track objects of more complicated shapes using a learning approach [1], [7], [8], [9], [17]. Different from CONDENSATION, Freedman and Brandstein [7], [8] consider the contour-tracking problem without assuming any dynamical model. They establish, via learning, a subset tracker to perform tracking through minimization. Exemplar-based methods [9], [17] require an off-line learning phase to generate object representations from examples, and then use distance measures to perform template matching.

If the objects to be tracked are non-rigid, it is convenient to represent them with probability distributions. A straightforward way to derive a distribution model is through histogram analysis [2], [3], [4], [5]. Birchfield [2] has proposed an algorithm to track a person’s head by modeling it as a vertical ellipse with a fixed aspect ratio. In [3], Bradski presents a CAMSHIFT (continuously adaptive mean shift) system for use in a perceptual user interface to track faces. Comaniciu et al. [5] have used the mean shift to track non-rigid objects. They model objects by color distributions, and then measure the similarity between the target and candidate distributions using a Bhattacharyya coefficient. Note that the mean-shift technique is indeed a line search, and later we will discuss the comparisons between the mean-shift and our approach.

II. Trust-Region Methods

Iterative algorithms for optimization can be divided into two classes: line-search and trust-region. For a line-search one, the iterates are determined along some specific directions, e.g., steepest descent locates its iterates by considering the gradient directions. A trust-region method, however, derives its iterates by solving the corresponding optimization problem in a bounded region iteratively. So, there are more options to select the iterates. In fact, line-search methods can be considered special cases of trust-region methods [6].

The concept of trust-region methods can be better understood by considering a typical unconstrained minimization problem,

$$\min_{x \in V} f(x), \tag{1}$$

where $V$ is a vector space, and $f$ is some objective function to be minimized.

Essentially, there are three elements to any trust-region method: (i) trust-region radius, to determine the size of a trust region; (ii) trust-region subproblem, to approximate a minimizer in the region; and (iii) trust-region fidelity, to evaluate the accuracy of an approximating solution.

To illustrate, suppose an initial guess $x_0$ and an initial trust-region radius $\Delta_0 > 0$ are given, and let $\eta_1$ and $\eta_2$ be some constants satisfying $0 < \eta_1 \le \eta_2 < 1$. For each iteration $k \ge 0$, we first define, for the vector space $V$, an iteration-dependent norm $\|\cdot\|_k$ and an iteration-dependent inner product $\langle \cdot, \cdot \rangle_k$ by

$$\|s\|_k^2 = \langle s, s \rangle_k \overset{\text{def}}{=} \langle s, M_k s \rangle, \quad \text{for any } s \in V,$$

where $\langle \cdot, \cdot \rangle$ is the inner product, and $M_k$ is an iteration-dependent matrix. (We will discuss how to determine $M_k$ later.) Then at iteration $k$, with iterate $x_k$ and trust-region radius $\Delta_k$, the following three steps are performed within the trust region $B_k = \{x \in V \mid \|x - x_k\|_k \le \Delta_k\}$.

1. Trust-region subproblem: We first construct a model $m_k$ to approximate $f$ in $B_k$. In our system, a quadratic model is used for the approximation, i.e.,

$$m_k(x_k + s) = m_k(x_k) + \langle g_k, s \rangle + \tfrac{1}{2} \langle s, H_k s \rangle, \tag{2}$$

where $m_k(x_k) = f(x_k)$, $g_k = \nabla_x f(x_k)$, and $H_k$ is the Hessian of $f$ at $x_k$. When $H_k \ne 0$, $m_k$ is said to be a second-order model. A trust-region subproblem is then to compute an $s_k$, where $\|s_k\|_k \le \Delta_k$, such that the model $m_k$ is “sufficiently reduced,” that is,

$$s_k = \arg\min_{\|s\|_k \le \Delta_k} \psi_k(s), \qquad \psi_k(s) \overset{\text{def}}{=} \langle g_k, s \rangle + \tfrac{1}{2} \langle s, H_k s \rangle. \tag{3}$$

2. Trust-region fidelity: After solving a subproblem, the trial point $x_k + s_k$ will be tested to see if it is a good candidate for the next iterate. This is evaluated explicitly by

$$r_k = \frac{f(x_k) - f(x_k + s_k)}{m_k(x_k) - m_k(x_k + s_k)}.$$

If $r_k \ge \eta_1$, then the trial point is accepted, i.e., $x_{k+1} = x_k + s_k$. Otherwise, $x_{k+1} = x_k$. Since $\eta_1$ is a small positive number, the above rule favors a trial point only when the value of the objective function $f$ is also reduced. When $m_k$ approximates $f$ well and yields a large $r_k$, the trust-region radius will be expanded for the next iteration. On the other hand, if $r_k$ is smaller than $\eta_1$ or $r_k$ is negative, it suggests that the objective function $f$ is not well approximated by the model $m_k$ within the current trust region $B_k$. Therefore, the iterate remains unchanged, and the trust-region radius will be shrunk to derive a more appropriate model and subproblem for the next iteration.

3. Trust-region radius: More specifically, the trust-region radius can be updated as follows.

$$\Delta_{k+1} = \begin{cases} \max\{\alpha_1 \|s_k\|_k, \Delta_k\} & \text{if } r_k \ge \eta_2, \\ \Delta_k & \text{if } r_k \in [\eta_1, \eta_2), \\ \alpha_2 \|s_k\|_k & \text{if } r_k < \eta_1, \end{cases}$$

where, following [6, p.782], we have $\eta_1 = 0.05$, $\eta_2 = 0.9$, and $\alpha_1 = 2.5$, $\alpha_2 = 0.25$. The iterative optimization process for (1) will be repeated until the sequence of iterates $\{x_k\}$ converges.
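The three steps above can be sketched for a one-dimensional problem, where the norm reduces to |s| and the quadratic subproblem has an exact solution. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the gradient tolerance, and the iteration cap are hypothetical, and the test function is the polynomial shown in Fig. 1.

```python
ROOTS = (60.0, 65.0, 70.0, 75.0, 78.0, 80.0)   # roots of the Fig. 1 polynomial

def f(x):
    prod = 1.0
    for r in ROOTS:
        prod *= (x - r)
    return prod

def df(x):
    # Product rule: sum over i of prod_{j != i} (x - r_j).
    total = 0.0
    for i in range(len(ROOTS)):
        term = 1.0
        for j, r in enumerate(ROOTS):
            if j != i:
                term *= (x - r)
        total += term
    return total

def d2f(x):
    # Sum over ordered pairs i != j of prod_{k not in {i, j}} (x - r_k).
    total = 0.0
    for i in range(len(ROOTS)):
        for j in range(len(ROOTS)):
            if i == j:
                continue
            term = 1.0
            for k, r in enumerate(ROOTS):
                if k != i and k != j:
                    term *= (x - r)
            total += term
    return total

def trust_region_1d(f, df, d2f, x0, delta0,
                    eta1=0.05, eta2=0.9, alpha1=2.5, alpha2=0.25,
                    grad_tol=1e-3, max_iter=200):
    """Basic trust-region loop with a quadratic model (1-D, so ||s|| = |s|)."""
    x, delta = x0, delta0
    for _ in range(max_iter):
        g, h = df(x), d2f(x)
        if abs(g) < grad_tol or delta < 1e-12:   # assumed stopping rule
            break
        # Subproblem: minimize psi(s) = g*s + 0.5*h*s^2 subject to |s| <= delta.
        if h > 0:
            s = max(-delta, min(delta, -g / h))  # Newton step clipped to the region
        else:
            s = -delta if g > 0 else delta       # non-convex model: step to the boundary
        predicted = -(g * s + 0.5 * h * s * s)   # m_k(x_k) - m_k(x_k + s_k), positive here
        actual = f(x) - f(x + s)
        r = actual / predicted                   # trust-region fidelity r_k
        if r >= eta1:
            x = x + s                            # accept the trial point
        if r >= eta2:
            delta = max(alpha1 * abs(s), delta)  # expand the radius
        elif r < eta1:
            delta = alpha2 * abs(s)              # shrink the radius
    return x

x_star = trust_region_1d(f, df, d2f, x0=60.0, delta0=4.0)
print(x_star)
```

Starting from x0 = 60 with ∆0 = 4, the first steps are unclipped Newton steps with fidelity close to 1, so the radius is never the limiting factor and the iterate settles at the nearby minimum x1.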

A. Trust-Region Scaled Norm

An objective function $f(x)$ may have variables whose typical values are of different orders of magnitude. For example, in real-time tracking, the values of spatial variables are often much larger than the values of scale variables. Without re-scaling the variables properly, the contributions from variables of small values tend to be dominated by those from variables of large values, which can lead to unexpected optimization results. To deal with such issues, the re-scaling is done for each iteration $k$ through a nonsingular matrix $S_k$ to ensure every trust-region subproblem is solved in a reasonably scaled space. In particular, we have used nonsingular diagonal matrices $S_k$, where the diagonal entries correspond to typical values of the respective variables. It follows that the new variables, say $\tilde{x}$, in the scaled space are derived by $\tilde{x} = S_k^{-1} x$. As a result, the components of $\tilde{x}$ will be of comparable scales after the re-scaling. Moreover, as is proved in [6], it is not necessary to reformulate a trust-region subproblem using the new variables since re-scaling the variables is equivalent to using an iteration-dependent scaled norm defined by $\|s\|_k^2 = \langle s, M_k s \rangle = \langle s, S_k^{-T} S_k^{-1} s \rangle$, where $M_k = S_k^{-T} S_k^{-1}$ is an iteration-dependent matrix.
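For a diagonal S_k the scaled norm is simply the Euclidean norm of the component-wise ratios s_i / (S_k)_ii. The sketch below illustrates this with the typical values (10, 10, 1, 1, 0.1) quoted in Section IV-B; the function name `scaled_norm` is a hypothetical helper, not part of the paper.

```python
import math

def scaled_norm(s, typical_values):
    """||s||_k for M_k = S_k^{-T} S_k^{-1} with S_k = diag(typical_values)."""
    return math.sqrt(sum((si / ti) ** 2 for si, ti in zip(s, typical_values)))

TYPICAL = (10.0, 10.0, 1.0, 1.0, 0.1)

# A 10-pixel shift and a 0.1 change in the last variable have the same scaled length.
print(scaled_norm((10.0, 0.0, 0.0, 0.0, 0.0), TYPICAL))
print(scaled_norm((0.0, 0.0, 0.0, 0.0, 0.1), TYPICAL))
```

Under the unscaled Euclidean norm the first step would be 100 times longer than the second, so a single trust-region radius would effectively freeze the small-valued variable.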

B. Trust-Region vs. Line-Search

In our approach, we have used a quadratic model $m_k$ for the implementation. If, instead, a linear model is used, then the RHS of (2) is reduced to the first two terms. This implies that a trust-region method with a linear model approximation is almost like gradient descent, but it often achieves better performance owing to its ability to adjust trust regions adaptively throughout the iterations. This is why line-search methods can be considered special cases of trust-region.
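To make the comparison concrete, the linear-model subproblem has a closed-form solution. The derivation below is a standard result stated here as a sketch, not taken from the paper; it assumes $g_k \ne 0$ and a symmetric positive definite $M_k$.

```latex
\begin{aligned}
s_k &= \arg\min_{\|s\|_k \le \Delta_k} \langle g_k, s \rangle
     = -\,\Delta_k \, \frac{M_k^{-1} g_k}{\sqrt{\langle g_k, M_k^{-1} g_k \rangle}}, \\
\|s_k\|_k^2 &= \langle s_k, M_k s_k \rangle
     = \Delta_k^2 \, \frac{\langle g_k, M_k^{-1} g_k \rangle}{\langle g_k, M_k^{-1} g_k \rangle}
     = \Delta_k^2 .
\end{aligned}
```

With $M_k = I$ this is a step of length $\Delta_k$ along $-g_k$: the steepest-descent direction, with the step length set by the trust-region radius rather than by a line search.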

Both trust-region and line-search are guaranteed to converge to a local minimum. However, not all local minima are of interest for real applications. A typical line-search method, e.g., steepest descent, or even trust-region with a linear model approximation, may often converge to a local minimum that is inferior to a nearby one. In Fig. 1, we construct an objective function with three local minima, x₁, x₂, and x₃. Among them, x₁ is clearly the global minimum. We test the three schemes, using 1000 different initial positions x₀, sampled uniformly from [56.96, 66.95]. Though the x₀s are indeed close to the global minimum x₁, steepest descent fails to converge to x₁ 258 times. Trust-region methods are tested with different ∆₀, and are more successful in converging to x₁.

[Figure 1: plot of f(x) = (x−60)(x−65)(x−70)(x−75)(x−78)(x−80) against x, vertical axis in units of 10⁵, with the local minima x₁ = 61.64, x₂ = 72.24, x₃ = 79.22 and the sampling interval [56.96, 66.95] for x₀ marked.]

Method                               x₁     x₂    x₃
Steepest Descent                     742    12    246
TR+Linear (∆₀ = 2, 4, 6, 8)          1000   0     0
TR+Linear (∆₀ = 10)                  773    23    204
TR+Quadratic (∆₀ = 2, 4, ..., 22)    1000   0     0

Fig. 1. Optimizing with Steepest Descent, TR+linear model, and TR+quadratic model. Out of 1000 runs, with initial positions x₀ sampled uniformly from [56.96, 66.95], we record in each entry the number of times that a method converges to a local minimum.

In passing, note that the mean-shift technique in [5] is a more conservative line-search. Instead of taking largest/steepest steps along gradients, it usually progresses by small steps, computed from the information within fixed-size windows. Such an approach tends to converge to a nearby local minimum regardless of its significance. Thus, both a more sophisticated model approximation and a mechanism to iteratively adjust the regions of interest are needed to reduce the chance of converging to a local minimum not of interest.

III. Representations and Objective Functions

Motivated by the work of Comaniciu et al. [5], we also use probability distributions to represent targets. But unlike [5], where the analysis relies on kernel properties, we simply treat the color distribution as a weighted color histogram to account for the possible non-rigidity of objects.

A. Representation Models for Tracking

Tracking objects by distribution is efficient but not necessarily sufficient. Suppose the scale of a target object of monotone color is enlarged. Then, it is not guaranteed that the appropriate scale will always be recovered since, in this case, any sub-portion of the object has a similar distribution to that of the object. Other tracking cues are needed to elevate the performance, e.g., [14]. In our system, the representation model consists of two elements: the first is to characterize the RGB color distribution, and the second is to estimate the edge density near the object boundary. Since the edge density is contributed mainly from samples near the boundary, and is more prone to be affected by the background, we choose color distribution as the primary cue for tracking.

A.1 Color Distribution

For computing the weighted color histogram, the RGB color space is first divided into $n$ bins, and a bin assignment function $b$ is defined uniquely by each pixel $x_i$'s RGB value as $b : x_i \to \{1, \ldots, n\}$. We then formulate a color weighting scheme based on the bivariate normal distribution, defined by

$$\phi(x, \mu, \Sigma) = \frac{1}{2\pi |\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right),$$

where $x = (x_1, x_2)^T$, $\mu = (\mu_1, \mu_2)^T$ is the mean vector, and $\Sigma$ is the covariance matrix.

Let the correlation coefficient $\rho = \sigma_{12} / (\sigma_1 \sigma_2)$. Then, when $|\rho| < 1$, the bivariate normal distribution can be rewritten as

$$\phi(x; \zeta) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\!\left(-\frac{\varepsilon(x; \zeta)}{2}\right), \tag{4}$$

where, to simplify the notations, we have $\sigma = (\sigma_1, \sigma_2)^T$, $\zeta = (\mu, \sigma, \rho) = (\mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$, and

$$\varepsilon(x; \zeta) = \frac{1}{1 - \rho^2} \left[ \frac{(x_1 - \mu_1)^2}{\sigma_1^2} - \frac{2\rho (x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1 \sigma_2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2} \right].$$

From (4), contours of constant $\phi$ correspond to constant exponents, i.e., $\varepsilon(x; \zeta) = \text{constant}$ represents an ellipse centered at $\mu$. We focus only on the covariance ellipses, which satisfy $\varepsilon(x; \zeta) = 1$ (denoted as $\varepsilon_1(\zeta)$), to construct the color weighting scheme.

[Figure 2 panels: (a) Color weights; (b) Edge weights; (c) Covariance ellipse; (d) 1-D plot of the bivariate normal and of crater functions with γ = 2 and γ = 4.]

Fig. 2. (a) Bivariate normal for color weights. (b) Crater function for edge weights. (c) A covariance ellipse can be represented either by (p₁, p₂, θ) or by (σ₁, σ₂, ρ), where p₁, p₂ are lengths of the principal semi-diameters, and θ is the angle between the p₁ semi-diameter and the x₁ axis. (d) Peaks of a crater function occur near the loci of the coupled covariance ellipse.
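The exponent ε(x; ζ) and the density in (4) can be evaluated directly from the five parameters in ζ. The following is a minimal sketch; the function names are hypothetical, and ζ is passed as a plain tuple (µ₁, µ₂, σ₁, σ₂, ρ) with |ρ| < 1 assumed.

```python
import math

def ellipse_exponent(x, zeta):
    """epsilon(x; zeta) for zeta = (mu1, mu2, sigma1, sigma2, rho), |rho| < 1."""
    mu1, mu2, s1, s2, rho = zeta
    u = (x[0] - mu1) / s1
    v = (x[1] - mu2) / s2
    return (u * u - 2.0 * rho * u * v + v * v) / (1.0 - rho * rho)

def bivariate_normal(x, zeta):
    """phi(x; zeta) as in (4)."""
    _, _, s1, s2, rho = zeta
    norm = 2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - rho * rho)
    return math.exp(-ellipse_exponent(x, zeta) / 2.0) / norm

def inside_covariance_ellipse(x, zeta):
    """True when x lies in A_1(zeta), i.e. epsilon(x; zeta) <= 1."""
    return ellipse_exponent(x, zeta) <= 1.0

zeta = (0.0, 0.0, 2.0, 3.0, 0.0)
print(ellipse_exponent((2.0, 0.0), zeta))            # a point on the covariance ellipse
print(inside_covariance_ellipse((1.0, 1.0), zeta))
```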

Now, let $I^0$ be the first image frame and $\zeta^0 = (\mu^0, \sigma^0, \rho^0)$. Then, a target object initially centered at $\mu^0$ can be associated with $A_1(\zeta^0) = \{x \mid \varepsilon(x; \zeta^0) \le 1\}$, the area enclosed by $\varepsilon_1(\zeta^0)$. We define the target’s color distribution within $A_1(\zeta^0)$, denoted as $p(u; \zeta^0)$, by

$$p(u; \zeta^0) = \frac{1}{C_p} \sum_{x_i \in A_1(\zeta^0)} w_c(x_i; \zeta^0)\, \delta(b(x_i) - u), \qquad w_c(x_i; \zeta^0) = \exp\!\left(-\frac{\varepsilon(x_i; \zeta^0)}{2}\right), \tag{5}$$

where $\delta$ is the Kronecker delta function, and $w_c$ is the derived color weighting function. That $p(u; \zeta^0)$ is a probability implies $C_p = \sum_{x_i \in A_1(\zeta^0)} w_c(x_i; \zeta^0)$. For convenience, the notation $p(u; \zeta^0)$ will be abbreviated into $p(u)$ since $\zeta^0$ only describes the target’s initial state. Analogously, during tracking, the color distribution of some $A_1(\zeta)$, denoted as $q(u; \zeta)$, is

$$q(u; \zeta) = \frac{1}{C_q} \sum_{x_i \in A_1(\zeta)} w_c(x_i; \zeta)\, \delta(b(x_i) - u),$$

where $C_q$ is the total weight such that $\sum_{u=1}^{n} q(u; \zeta) = 1$.
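The weighted histograms p(u) and q(u; ζ) can be sketched as follows. This is an illustration under assumptions: pixels are given as (position, RGB) pairs with 8-bit channels, bins are indexed from 0 rather than from 1, the uniform 16 × 16 × 16 binning follows the setting reported in Section IV-B, and all function names are hypothetical.

```python
import math

def ellipse_exponent(x, zeta):
    """epsilon(x; zeta) for zeta = (mu1, mu2, sigma1, sigma2, rho)."""
    mu1, mu2, s1, s2, rho = zeta
    u = (x[0] - mu1) / s1
    v = (x[1] - mu2) / s2
    return (u * u - 2.0 * rho * u * v + v * v) / (1.0 - rho * rho)

def bin_index(rgb, bins_per_channel=16):
    """Bin assignment b: map an 8-bit RGB triple to 0 .. bins_per_channel**3 - 1."""
    step = 256 // bins_per_channel
    r, g, b = (c // step for c in rgb)
    return (r * bins_per_channel + g) * bins_per_channel + b

def color_distribution(pixels, zeta, bins_per_channel=16):
    """Weighted color histogram over the pixels inside A_1(zeta)."""
    hist = {}
    for position, rgb in pixels:
        eps = ellipse_exponent(position, zeta)
        if eps <= 1.0:                              # keep pixels inside the covariance ellipse
            u = bin_index(rgb, bins_per_channel)
            hist[u] = hist.get(u, 0.0) + math.exp(-eps / 2.0)   # color weight w_c
    total = sum(hist.values())                      # normalizer C_p (or C_q)
    return {u: w / total for u, w in hist.items()} if total > 0 else {}

zeta = (0.0, 0.0, 2.0, 2.0, 0.0)
pixels = [((0.0, 0.0), (255, 0, 0)),    # at the center: weight 1
          ((1.0, 0.0), (0, 0, 255)),    # inside the ellipse: smaller weight
          ((5.0, 5.0), (0, 255, 0))]    # outside the ellipse: ignored
print(color_distribution(pixels, zeta))
```

A sparse dictionary is used here only because most of the 4096 bins are empty for a small region; a dense array would work equally well.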

A.2 Edge Density

For every $w_c$ in (5), we can use a crater function to define a coupled edge-point weighting function $w_e$. Specifically, we have

$$w_e(x_i; \zeta) = \gamma\, \varepsilon(x_i; \zeta) \exp\!\left(-\frac{\gamma}{2}\, \varepsilon(x_i; \zeta)\right),$$

where $\gamma$ is the parameter to adjust the shape of a crater function and the size of a crater’s opening. It can be verified by a straightforward calculation that for $\gamma = 2$, the peaks of the level surface of a crater function occur at the loci of the associated covariance ellipse as shown in Fig. 2d. In practice, we find better tracking performance can be achieved by using a slightly larger $\gamma$, say $\gamma = 4$, in that significant values of edge weight are within the covariance ellipse.
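The claim about the peak location can be checked by treating the crater weight as a function of ε alone. The short derivation below is a sketch; the shorthand $w_e(\varepsilon)$ and $\varepsilon^{*}$ is introduced here for illustration only.

```latex
\begin{aligned}
w_e(\varepsilon) &= \gamma\, \varepsilon\, e^{-\gamma \varepsilon / 2}, \\
\frac{d w_e}{d \varepsilon} &= \gamma\, e^{-\gamma \varepsilon / 2} \left( 1 - \frac{\gamma \varepsilon}{2} \right) = 0
\;\Longrightarrow\; \varepsilon^{*} = \frac{2}{\gamma}, \\
w_e(\varepsilon^{*}) &= 2 e^{-1} \approx 0.736 .
\end{aligned}
```

For $\gamma = 2$ the peak lies on $\varepsilon = 1$, the covariance ellipse itself; for $\gamma = 4$ it lies on $\varepsilon = 1/2$, inside the ellipse, which is consistent with the remark that the significant edge weights then fall within the covariance ellipse.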

Finally, we adopt the notation $e(\zeta)$ to represent the edge density within and near the boundary of a covariance elliptic region $A_1(\zeta)$. The scale-invariant definition of $e(\zeta)$ is as follows.

$$e(\zeta) = \frac{1}{\sigma_1 \sigma_2} \sum_{x_i \in A_1(\zeta)} w_e(x_i; \zeta)\, E(x_i), \tag{6}$$

where $E(x_i)$ is a binary edge map, derived from a high-pass 5 × 5 Laplacian filter in [10].
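Equation (6) can be sketched as below. Since E is binary, the sum only needs the positions where E(x_i) = 1, so the hypothetical `edge_density` helper takes a list of edge-pixel positions; how the edge map itself is computed (the Laplacian filter of [10]) is outside this sketch.

```python
import math

def ellipse_exponent(x, zeta):
    """epsilon(x; zeta) for zeta = (mu1, mu2, sigma1, sigma2, rho)."""
    mu1, mu2, s1, s2, rho = zeta
    u = (x[0] - mu1) / s1
    v = (x[1] - mu2) / s2
    return (u * u - 2.0 * rho * u * v + v * v) / (1.0 - rho * rho)

def crater_weight(eps, gamma=4.0):
    """Edge weight w_e as a function of the exponent epsilon."""
    return gamma * eps * math.exp(-0.5 * gamma * eps)

def edge_density(edge_points, zeta, gamma=4.0):
    """e(zeta) in (6); edge_points lists the positions x_i with E(x_i) = 1."""
    _, _, s1, s2, _ = zeta
    total = 0.0
    for x in edge_points:
        eps = ellipse_exponent(x, zeta)
        if eps <= 1.0:                 # sum over pixels in A_1(zeta)
            total += crater_weight(eps, gamma)
    return total / (s1 * s2)           # normalization by sigma1 * sigma2

zeta = (0.0, 0.0, 2.0, 2.0, 0.0)
print(edge_density([(1.0, 0.0), (0.0, 1.5), (5.0, 5.0)], zeta))
```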

B. Objective Functions for Tracking

In view of the object representation model just described, a tracking process for an arbitrary target can then be characterized by the evolution dynamics of a covariance ellipse, ε₁(ζ^t), where we simply denote the process as ζ⁰ → ζ¹ → ζ² → · · · . To decide an optimal ζ^t for each frame I^t, we still need to formulate an appropriate objective function to complete the framework.

Since there are two rather distinct features included in the representation model, the resulting objective function must address these two factors justifiably. First, to measure the similarity between two color distributions, we consider the Kullback-Leibler distance, i.e.,

$$f_c(\zeta) = \sum_{u=1}^{n} p(u) \log \frac{p(u)}{q(u; \zeta)}, \tag{7}$$

where $p(u)$ is the target (true) color distribution and $q(u; \zeta)$ is the one for the covariance ellipse $\varepsilon_1(\zeta)$. Second, to estimate whether the boundary edge density of a candidate covariance ellipse is comparable to that of the target, we embed the edge density ratio, denoted as $h(\zeta) = e(\zeta) / e(\zeta^0)$, into a sigmoid function to derive the following,

$$f_e(\zeta) = 1 - \frac{1}{1 + \exp\{-\alpha (h(\zeta) - \beta)\}} = \frac{1}{1 + \exp\{\alpha (h(\zeta) - \beta)\}}, \tag{8}$$

where $\alpha$ and $\beta$ are parameters for setting up the initial sigmoid function. (We use $\alpha = 5$ and $\beta = 1$ for all the experiments.) Finally, with the definitions in (7) and (8), the underlying optimization problem for each image frame $I^t$ can be formally written as

$$\zeta^t = \arg\min_{\zeta \in \Omega^t} f(\zeta), \qquad f(\zeta) = f_c(\zeta) + \lambda f_e(\zeta), \tag{9}$$

where $\lambda$ is a parameter to weigh the relative importance of the two terms, and $\Omega^t$ is the space consisting of all possible $\zeta$’s for any combinations of translation, scale, and orientation.

[Figure 3 panels, rows MS+BH and TR+KL: (a) Magnet #080; (b) Magnet #110; (c) Magnet #190; (d) Magnet #228.]

Fig. 3. TR: Trust-Region, MS: Mean-Shift, BH: Bhattacharyya, and KL: Kullback-Leibler. In each image, the final convergent circle is plotted in white and the intermediate ones in yellow.
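The objective in (9) can be assembled from (7) and (8) in a few lines. The sketch below assumes the histograms are given as equal-length lists over the n bins and that the edge density ratio h has already been computed; the function names and the small floor `tiny` that guards against empty candidate bins are assumptions, not part of the paper. The defaults α = 5, β = 1, and λ = 0.2 are the values reported in the text.

```python
import math

def kl_distance(p, q, tiny=1e-12):
    """f_c in (7); bins with p(u) = 0 contribute nothing, q(u) is floored at tiny."""
    return sum(pu * math.log(pu / max(qu, tiny)) for pu, qu in zip(p, q) if pu > 0.0)

def edge_term(h, alpha=5.0, beta=1.0):
    """f_e in (8), written in a form that avoids overflow for large |h|."""
    z = alpha * (h - beta)
    if z >= 0.0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def objective(p, q, h, lam=0.2):
    """f = f_c + lambda * f_e as in (9)."""
    return kl_distance(p, q) + lam * edge_term(h)

p = [0.5, 0.3, 0.2]
print(objective(p, p, h=1.0))                 # identical colors, matching edge density
print(objective(p, [0.2, 0.3, 0.5], h=0.4))   # different colors, weaker edges
```

The edge term decreases as h grows, so candidates whose boundary edge density falls below that of the initial target are penalized, while the KL term vanishes only when the two color distributions agree.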

IV. Experimental Results and Discussion

We demonstrate the efficiency of our method by (i) making comparisons with a mean-shift tracker [5], and (ii) carrying out a variety of experiments in different scenarios.

A. Trust-Region vs. Mean-Shift

In [5], the color distribution is used as the only cue for tracking, and the Bhattacharyya coefficient, defined by $\sum_{u=1}^{n} \sqrt{p(u)\, q(u; x)}$, is chosen to be the objective function to be maximized. Since a mean-shift vector simply approximates the gradient of an objective function, for the sake of comparison we implement a trust-region tracker with a linear model approximation, and use the exact color representation model described in [5] for all comparisons. This implies we are dealing with two trackers, trust-region (TR) and mean-shift (MS), and two objective functions, Kullback-Leibler distance (KL) and Bhattacharyya coefficient (BH). In total there are four possible combinations: MS+BH, TR+BH, MS+KL, and TR+KL.
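The two objective functions pull in opposite directions numerically: BH is maximized and KL is minimized. A small sketch of the Bhattacharyya coefficient makes this explicit; the function name is hypothetical and the histograms are assumed to be equal-length lists that each sum to one.

```python
import math

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two discrete distributions."""
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))

print(bhattacharyya([0.5, 0.5], [0.5, 0.5]))   # identical distributions
print(bhattacharyya([1.0, 0.0], [0.0, 1.0]))   # disjoint support
print(bhattacharyya([0.5, 0.5], [0.9, 0.1]))   # partial overlap
```

The coefficient reaches its maximum of 1 for identical distributions and drops to 0 when the supports are disjoint.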

In Fig. 3, we show some of the results obtained by using MS+BH and TR+KL, respectively. The main advantage of experimenting with the Magnet sequence is that the resulting level surfaces are mostly smooth but with sporadic local extrema. Thus, it is easier to pinpoint the causes of different outcomes. We also examine the values of objective functions explicitly. To do so, we randomly generate 500 initial positions for an arbitrary image frame from the Magnet sequence, then perform optimizations from each position using MS+BH and TR+BH, respectively. The same process is repeated for MS+KL and TR+KL. We then count the number of times that trust-region converges to a better objective function value. This quantitative analysis is also performed for the other three sequences shown in Fig. 5. Our results, in Fig. 4, indicate that no matter which objective function is used, BH or KL, the trust-region tracker is the more effective one in about 90% of the cases, converging to better values 3675 times out of 4000 tests. Note that the efficiency can be further improved by using a quadratic model approximation.

B. Tracking by TR+KL+Edge

We now turn our attention to experimenting with the complete algorithm, i.e., using trust-region with a quadratic model to perform tracking via optimizing with (9). In all our experiments, the RGB space is divided into 16 × 16 × 16 = 4096 bins. Other parameters used include: λ = 0.2, initial trust-region radius ∆₀ = 4, and typical values of the diagonal of S_k are (10, 10, 1, 1, 0.1). The experiments are carried out on a Pentium-4 2.4GHz PC.

The first sequence shows the tracker’s ability to pursue a fast-moving object (a kid jumping around), accounting for the 2-D translation factor only. Note that, in Fig. 5a - 5d, the intermediate iterates/ellipses are plotted in green to illustrate the underlying optimization process. In the second experiment, we demonstrate, in Fig. 5e - 5h, the effectiveness of optimizing over a 5-dimensional continuous space to capture various changes in the object’s scale, shape, and orientation. We emphasize that if a system has a status variable ζ limited to just some pre-determined discrete values of scale and orientation, it generally could not deliver a comparable performance. The third test is performed using a pan/tilt/zoom camera where the target person in the scene moves back and forth to bring about rapid and substantial changes in the size of the face appearing in the Face sequence. While most tracking-by-distribution systems cannot handle such difficulties, our method addresses the issues of scale robustly as shown in Fig. 5i - 5l.

C. Complexity Analysis

Since the algorithm is iterative, and it typically takes just a few iterations to converge, it suffices to analyze the time complexity for one iteration. For frame $t$, let $m$ be the number of pixels within $A_1(\zeta^t)$ and $d$ be the dimensionality of $\zeta$. (In our formulation, $d = 5$.) We first need to compute the color histogram $q(u; \zeta^t)$, the edge density $e(\zeta^t)$ in (6), the gradient $g_k$, and the Hessian matrix $H_k$, which take $O(m)$, $O(m)$, $O(dm)$, and $O(d^2 m)$ time, respectively. Next, it takes $O(n)$ time to evaluate $f_c$ by summing up $p(u) \log \frac{p(u)}{q(u)}$, and $O(1)$ time for $f_e$. (Recall that $n$ is the number of color bins.) Finally, for solving a trust-region subproblem, since the number of iterations is assumed to be less than a fixed number, the time complexity only depends on dimensionality $d$. In particular, to find a minimizer of the subproblem, we have to compute $\psi_k$ and $\|s_k\|_k$ in (3), or find the intersection on the region boundary. The first requires $O(d^2)$ time, and the latter two take $O(d)$ computation time. When the quadratic model is non-convex, we need extra $O(d^2)$ time to find the other possible iterate. Therefore, the time complexity for one iteration of the complete TR tracking algorithm is $O(d^2 m + n)$.

[Figure 4 panels, each plotting 500 sorted trials for the Magnet, Kid, Hand, and Face sequences: (a) TR+Linear+BH vs. MS+BH, vertical axis $f^*_{MS+BH} - f^*_{TR+BH}$; (b) TR+Linear+KL vs. MS+KL, vertical axis $f^*_{TR+KL} - f^*_{MS+KL}$.]

Fig. 4. Sorted differences in the objective function values derived by TR+Linear and MS.

D. Discussion

Our approach focuses mainly on two important issues: optimization and representation. Specifically, we have discussed three choices for optimization: line-search, trust-region with a linear-model approximation, and trust-region with a quadratic-model approximation. While the three all have the desired property to converge to a local minimum, we investigate the quality of a solution. To emphasize, we note that a line-search method may fail to converge to a better, nearby extremum due to a crude approximation to the local shape of an objective function. We then compare our method with a well-known mean-shift tracker to demonstrate the advantages of being able to find the iterates in a region and to adjust the size of the region adaptively. Nonetheless, it is difficult to evaluate quantitatively the performances of two different tracking methods because when testing with a video sequence, they often start at different initial positions for each intermediate image frame. Thus, we instead do the quantitative analysis for one arbitrary image frame with randomly generated starting positions. This is equivalent to solving an iterative optimization problem using the two methods, respectively, for each initial value. Such modifications make it possible to analyze the results explicitly, and to further verify that a trust-region implementation for tracking is often more reliable and effective than a line-search one.

Other efforts have been made to design a good representation model. We have formulated a covariance-ellipse representation to integrate color and edge density information. It enables the system to perform optimization over a continuous space to yield more accurate results. Our future work includes extending the framework for multiple-object tracking, and exploiting other possible applications in computer vision using trust-region methods.

Acknowledgments

This work was supported by NSC grants 90-2213-E-001-016 and 91-2213-E-001-023, and in part by the Institute of Information Science, Academia Sinica of Taiwan. H.-T. Chen would like to thank the Foundation for the Advancement of Outstanding Scholarship for a student travel grant.

References

[1] S. Avidan, “Support Vector Tracking,” Proc. Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 184–191, Kauai, Hawaii, 2001.

[2] S.T. Birchfield, “Elliptical Head Tracking Using Intensity Gradients and Color Histograms,” Proc. Conf. Computer Vision and Pattern Recognition, pp. 232–237, Santa Barbara, CA, 1998.

[3] G.R. Bradski, “Computer Vision Face Tracking for Use in a Perceptual User Interface,” Intel Technology Journal, 1998.

[4] H.T. Chen and T.L. Liu, “Trust-Region Methods for Real-Time Tracking,” Proc. Eighth IEEE Int’l Conf. Computer Vision, vol. 2, pp. 717–722, Vancouver, Canada, 2001.

[5] D. Comaniciu, V. Ramesh, and P. Meer, “Real-Time Tracking of Non-Rigid Objects using Mean Shift,” Proc. Conf. Computer Vision and Pattern Recognition, vol. 2, pp. 142–149, Hilton Head Island, South Carolina, 2000.

[Figure 5 panels: (a) Kid #000; (b) Kid #096; (c) Kid #152; (d) Kid #348; (e) Hand #000; (f) Hand #461; (g) Hand #577; (h) Hand #588; (i) Face #000; (j) Face #136; (k) Face #185; (l) Face #415.]

Fig. 5. (a)-(d) Kid sequence: track a target with rapid motion (frame rate: > 200 fps). (e)-(h) Hand sequence: track a target with substantial changes in size, shape, and orientation (frame rate: 35 fps). (i)-(l) Face sequence: the strength of a trust-region tracker is even more appreciable where a wide range of scales of target face are tracked properly (frame rate: 20 fps).

[6] A.R. Conn, N.I.M. Gould, and P.L. Toint, Trust-Region Methods, SIAM, 2000.

[7] D. Freedman and M.S. Brandstein, “Contour Tracking in Clutter: A Subset Approach,” Int’l J. Computer Vision, vol. 38, no. 2, pp. 173–186, July 2000.

[8] D. Freedman and M.S. Brandstein, “Provably Fast Algorithms for Contour Tracking,” Proc. Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 139–144, Hilton Head Island, South Carolina, 2000.

[9] D. Gavrila and V. Philomin, “Real-Time Object Detection for Smart Vehicles,” Proc. Seventh IEEE Int’l Conf. Computer Vision, pp. 87–93, Corfu, Greece, 1999.

[10] Intel Corporation, Intel Image Processing Library Reference Manual, 2000, Document Number 663791-005.

[11] M. Isard and A. Blake, “Contour Tracking by Stochastic Propagation of Conditional Density,” Proc. Fourth European Conf. Computer Vision, vol. 1, pp. 343–356, Cambridge, England, 1996.

[12] M. Jagersand, O. Fuentes, and R.C. Nelson, “Experimental Evaluation of Uncalibrated Visual Servoing for Precision Manipulation,” Proc. 1997 Int’l Conf. Robotics and Automation, Albuquerque, NM, 1997.

[13] T.Q. Phong, R. Horaud, A. Yassine, and P.D. Tao, “Object Pose from 2-D to 3-D Point and Line Correspondences,” Int’l J. Computer Vision, vol. 15, no. 3, pp. 225–243, July 1995.

[14] C. Rasmussen and G.D. Hager, “Probabilistic Data Association Methods for Tracking Complex Visual Objects,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 560–576, June 2001.

[15] H. Sidenbladh and M.J. Black, “Learning Image Statistics for Bayesian Tracking,” Proc. Eighth IEEE Int’l Conf. Computer Vision, vol. 2, pp. 709–716, Vancouver, Canada, 2001.

[16] C. Sminchisescu and B. Triggs, “Covariance Scaled Sampling for Monocular 3D Body Tracking,” Proc. Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 447–454, Kauai, Hawaii, 2001.

[17] K. Toyama and A. Blake, “Probabilistic Tracking in a Metric Space,” Proc. Eighth IEEE Int’l Conf. Computer Vision, vol. 2, pp. 50–57, Vancouver, Canada, 2001.

[18] Y. Wu and T.S. Huang, “A Co-inference Approach to Robust Visual Tracking,” Proc. Eighth IEEE Int’l Conf. Computer Vision, vol. 2, pp. 26–33, Vancouver, Canada, 2001.