**C**

**ONDENSATION**

**—Conditional Density Propagation for Visual Tracking**

MICHAEL ISARD AND ANDREW BLAKE

*Department of Engineering Science, University of Oxford, Oxford OX1 3PJ, UK*
*Received July 16, 1996; Accepted March 3, 1997*

**Abstract.** The problem of tracking curves in dense visual clutter is challenging. Kalman filtering is inadequate
because it is based on Gaussian densities which, being unimodal, cannot represent simultaneous alternative hypothe-
ses. The Condensation algorithm uses “factored sampling”, previously applied to the interpretation of static
images, in which the probability distribution of possible interpretations is represented by a randomly generated set.

Condensation uses learned dynamical models, together with visual observations, to propagate the random set over time. The result is highly robust tracking of agile motion. Notwithstanding the use of stochastic methods, the algorithm runs in near real-time.

**1.** **Tracking Curves in Clutter**

The purpose of this paper^{1} is to establish a stochas-
tic framework for tracking curves in visual clutter, us-
ing a sampling algorithm. The approach is rooted in
ideas from statistics, control theory and computer vi-
sion. The problem is to track outlines and features of
foreground objects, modelled as curves, as they move
*in substantial clutter, and to do it at, or close to, video*
frame-rate. This is challenging because elements in
the background clutter may mimic parts of foreground
features. In the most severe case of camouflage, the
background may consist of objects similar to the fore-
ground object, for instance, when a person is moving
past a crowd. Our approach aims to dissolve the result-
ing ambiguity by applying probabilistic models of ob-
ject shape and motion to analyse the video-stream. The
degree of generality of these models is pitched care-
fully: sufficiently specific for effective disambiguation
but sufficiently general to be broadly applicable over
entire classes of foreground objects.

*1.1.* *Modelling Shape and Motion*

Effective methods have arisen in computer vision for modelling shape and motion. When suitable geometric

models of a moving object are available, they can be matched effectively to image data, though usually at considerable computational cost (Hogg, 1983; Lowe, 1991; Sullivan, 1992; Huttenlocher et al., 1993). Once an object has been located approximately, tracking it in subsequent images becomes more efficient computa- tionally (Lowe, 1992), especially if motion is modelled as well as shape (Gennery, 1992; Harris, 1992). One important facility is the modelling of curve segments which interact with images (Fischler and Elschlager, 1973; Yuille and Hallinan, 1992) or image sequences (Kass et al., 1987; Dickmanns, and Graefe, 1988).

This is more general than modelling entire objects but more clutter-resistant than applying signal-processing to low-level corners or edges. The methods to be dis- cussed here have been applied at this level, to segments of parametric B-spline curves (Bartels et al., 1987) tracking over image sequences (Menet et al., 1990;

Cipolla and Blake, 1990). The B-spline curves could,
in theory, be parameterised by their control points. In
practice, this allows too many degrees of freedom for
stable tracking and it is necessary to restrict the curve
**to a low-dimensional parameter x, for example, over**
an affine space (Koenderink and Van Doorn, 1991;

Ullman and Basri, 1991; Blake et al., 1993), or more generally allowing a “shape-space” of non-rigid mo- tion (Cootes et al., 1993).

6 *Isard and Blake*

Finally, prior probability densities can be defined
over the curves (Cootes et al., 1993) represented
**by appropriate parameter vectors x, and also over**
their motions (Terzopoulos and Metaxas, 1991; Blake
et al., 1993), and this constitutes a powerful facility
for tracking. Reasonable defaults can be chosen for
those densities. However, it is obviously more satisfac-
tory to measure or estimate them from data-sequences
* (x*1

*2*

**, x***, . . .). Algorithms to do this, assuming Gaussian*densities, are known in the control-theory literature (Goodwin and Sin, 1984) and have been applied in computer vision (Blake and Isard, 1994; Baumberg and

*Hogg, 1995). Given the learned prior, and an observa-*

*tion density that characterises the statistical variability*

**of image data z given a curve state x, a posterior distri-**

**bution can, in principle, be estimated for x**

*t*

**given z**

*t*at

*successive times t.*

*1.2.* *Kalman Filters and Data-Association*

Spatio-temporal estimation, the tracking of shape and
position over time, has been dealt with thoroughly by
Kalman filtering, in the relatively clutter-free case in
*which p (x*

*t*

*) can satisfactorily be modelled as Gaussian*(Dickmanns and Graefe, 1988; Harris, 1992; Gennery, 1992; Rehg and Kanade, 1994; Matthies et al., 1989)

*Figure 1.* Kalman filter as density propagation: in the case of Gaussian prior, process and observation densities, and assuming linear dynamics,
the propagation process of Fig. 2 reduces to a diffusing Gaussian state density, represented completely by its evolving (multivariate) mean and
variance—precisely what a Kalman filter computes.

and can be applied to curves (Terzopoulos and Szeliski,
1992; Blake et al., 1993). These solutions work rela-
**tively poorly in clutter which causes the density for x*** _{t}*
to be multi-modal and therefore non-Gaussian. With
simple, discrete features such as points or corners com-
binatorial data-association methods can be effective
with clutter but combinatorial methods to do not ap-
ply naturally to curves. There remains a need for an
appropriately general probabilistic mechanism to han-
dle multi-modal density functions.

*1.3.* *Temporal Propagation of Conditional Densities*
The Kalman filter as a recursive linear estimator is a
special case, applying only to Gaussian densities, of
a more general probability density propagation pro-
cess. In continuous time this can be described in terms
of diffusion, governed by a “Fokker-Planck” equation
**(Astrom, 1970), in which the density for x*** _{t}* drifts and
spreads under the action of a stochastic model of its
dynamics. In the simple Gaussian case, the diffusion
is purely linear and the density function evolves as
a Gaussian pulse that translates, spreads and is rein-
forced, remaining Gaussian throughout, as in Fig. 1, a
process that is described analytically and exactly by the
Kalman filter. The random component of the dynamical

*Figure 2.* Probability density propagation: propagation is depicted here as it occurs over a discrete time-step. There are three phases: drift due
to the deterministic component of object dynamics; diffusion due to the random component; reactive reinforcement due to observations.

model leads to spreading—increasing uncertainty—

while the deterministic component causes the density
function to drift bodily. The effect of an external obser-
**vation z***t*is to superimpose a reactive effect on the dif-
fusion in which the density tends to peak in the vicinity
of observations. In clutter, there are typically several
competing observations and these tend to encourage a
non-Gaussian state-density (Fig. 2).

The Condensation algorithm is designed to ad- dress this more general situation. It has the striking property that, generality notwithstanding, it is a consid- erably simpler algorithm than the Kalman filter. More- over, despite its use of random sampling which is often thought to be computationally inefficient, the Condensation algorithm runs in near real-time.

This is because tracking over time maintains relatively tight distributions for shape at successive time-steps, and particularly so given the availability of accurate, learned models of shape and motion.

**2.** **Discrete-Time Propagation of State Density**
For computational purposes, the propagation process
*must be set out in terms of discrete time t. The state of*
**the modelled object at time t is denoted x***t*and its history

is*X**t* **= {x**1**, . . . , x***t*}. Similarly, the set of image fea-
**tures at time t is z***t*with history*Z**t***= {z**1**, . . . , z***t*}. Note
that no functional assumptions (linearity, Gaussianity,
unimodality) are made about densities in the general
treatment, though particular choices will be made in
due course in order to demonstrate the approach.

*2.1.* *Stochastic Dynamics*

A somewhat general assumption is made for the prob- abilistic framework that the object dynamics form a temporal Markov chain so that

*p (x*

*t*

*| X*

*t*−1

**) = p(x***t*

**| x**

*t*−1

*)*(1)

—the new state is conditioned directly only on the im-
mediately preceding state, independent of the earlier
history. This still allows quite general dynamics, in-
cluding stochastic difference equations of arbitrary or-
der; we use second order models and details are given
later. The dynamics are entirely determined therefore
*by the form of the conditional density p (x*

*t*

**| x**

*t*−1

*). For*instance,

*p (x*

*t*

**| x**

*t*−1

*) ∝ exp*µ

−1

2**(x***t***− x***t*−1*− 1)*^{2}

¶

8 *Isard and Blake*

represents a one-dimensional random walk (discrete diffusion) whose step length is a standard normal vari- ate, superimposed on a rightward drift at unit speed.

**Of course, for realistic problems, the state x is multi-**
dimensional and the density is more complex (and, in
the applications presented later, learned from training
sequences).

*2.2.* *Measurement*

**Observations z***t* are assumed to be independent, both
mutually and with respect to the dynamical process.

This is expressed probabilistically as follows:

*p(Z**t*−1**, x***t**| X**t*−1**) = p(x***t**| X**t*−1*)*

*t*−1

Y

*i*=1

*p (z*

*i*

**| x**

*i*

*). (2)*

**Note that integrating over x*** _{t}*implies the mutual condi-
tional independence of observations:

*p(Z**t**| X**t**) =*
Y*t*
*i*=1

*p (z*

*i*

**| x**

*i*

*).*(3)

The observation process is therefore defined by speci-
*fying the conditional density p (z*

*t*

**| x**

*t*

*) at each time t,*and later, in computational examples, we take this to

*be a time-independent function p*say for now that, in clutter, the observation density is multi-modal. Details will be given in Section 6.

**(z | x). Suffice it to***2.3.* *Propagation*

Given a continuous-valued Markov chain with inde-
*pendent observations, the conditional state-density p**t*

*at time t is defined by*

*p*_{t}**(x***t***) ≡ p(x***t**| Z**t**).*

*This represents all information about the state at time t*
that is deducible from the entire data-stream up to that
time. The rule for propagation of state density over
time is

*p (x*

*t*

*| Z*

*t*

*) = k*

*t*

*p*

**(z***t*

**| x**

*t*

**)p(x***t*

*| Z*

*t*−1

*),*(4) where

*p (x*

*t*

*| Z*

*t*−1

*) =*Z

**x**_{t−1}

*p (x*

*t*

**| x**

*t*−1

**)p(x***t*−1

*| Z*

*t*−1

*) (5)*

*and k**t*is a normalisation constant that does not depend
**on x***t*. The validity of the rule is proved in the appendix.

The propagation rule (4) should be interpreted sim-
ply as the equivalent of the Bayes’ rule (6) for inferring
posterior state density from data, for the time-varying
*case. The effective prior p (x*

*t*

*| Z*

*t*−1

*) is actually a pre-*

*diction taken from the posterior p*

**(x***t*−1

*| Z*

*t*−1

*) from the*previous time-step, onto which is superimposed one time-step from the dynamical model (Fokker-Planck drift plus diffusion as in Fig. 2), which is expressed in (5). Multiplication in (4) by the observation density

*p*

**(z***t*

**| x**

*t*

*) in the Bayesian manner then applies the re-*active effect expected from observations. Because the observation density is non-Gaussian, the evolving state

*density p*

**(x***t*

*| Z*

*t*

*) is also generally non-Gaussian. The*

*problem now is how to apply a nonlinear filter to eval-*uate the state density over time, without incurring ex- cessive computational load. Inevitably this means ap- proximating. Numerous approaches, including “multi- ple hypothesis tracking”, have been proposed but prove unsuitable for use with curves as opposed to discrete features—details are given in the appendix. In this pa- per we propose a sampling approach which is described in the following two sections.

**3.** **Factored Sampling**

This section describes first the factored sampling algo- rithm dealing with non-Gaussian observations in single images. Then factored sampling is extended in the fol- lowing section to deal with temporal image sequences.

A standard problem in statistical pattern recognition
**is to find an object parameterised as x with prior p****(x),****using data z from a single image. The posterior density**
*p (x | z) represents all the knowledge about x that is de-*
ducible from the data. It can be evaluated, in principle,
by applying Bayes’ rule (Papoulis, 1990) to obtain

*p (x | z) = kp(z | x)p(x)* (6)

*where k is a normalisation constant that is independent*

**of x. In cases where p****(z | x) is sufficiently complex***that p*iterative sampling techniques can be used (Geman and Geman, 1984; Ripley and Sutherland, 1990; Grenander et al., 1991; Storvik, 1994). The factored sampling algorithm (Grenander et al., 1991) generates a random

**(x | z) cannot be evaluated simply in closed form,****variate x from a distribution**

**˜p(x) that approximates the***posterior p*

**(x | z). First, a sample-set {s**

^{(1)}

**, . . . , s***} is*

^{(N)}*generated from the prior density p*

**(x) and then an index***Figure 3.* **Factored sampling: a set of points s**^{(n)}*, the centres of the blobs in the figure, is sampled randomly from a prior density p(x). Each*
sample is assigned a weight*π**i* *(depicted by blob area) in proportion to the value of the observation density p(z | x = s*^{(n)}*). The weighted*
*point-set then serves as a representation of the posterior density p** (x | z), suitable for sampling. The one-dimensional case illustrated here extends*
naturally to the practical case that the density is defined over several position and shape variables.

*n∈ {1, . . . , N} is chosen with probability π**n*, where
*π**n*= *p** _{z}*¡

**s*** ^{(n)}*¢
P

*N*

*j*=1*p**z*

¡**s*** ^{( j)}*¢

and

*p*_{z}**(x) = p(z | x),**

**the conditional observation density. The value x**^{0}**= x***n*

chosen in this fashion has a distribution which approx-
*imates the posterior p (x | z) increasingly accurately as*

*N increases (Fig. 3).*

Note that posterior mean properties* E[g(x) | z] can*
be generated directly from the samples

**{s**

*} by weight-*

^{(n)}*ing with p*

_{z}

**(x) to give:*** E[g(x) | z] ≈*
P

*N*

*n*=1*g*¡
**s*** ^{(n)}*¢

*p**z*

¡**s*** ^{(n)}*¢
P

*N*

*n*=1 *p**z*

¡**s*** ^{(n)}*¢

*.*(7)

*For example, the mean can be estimated using g (x) = x*

*(illustrated in Fig. 4) and the variance using g*

**(x) =****xx**

^{T}*. In the case that p*

**(x) is a spatial Gauss-Markov***process, Gibbs sampling from p*to generate the random variates

**(x) has been used****{s**

^{(1)}

**, . . . , s***}. Oth- erwise, for low-dimensional parameterisations as in this paper, standard, direct methods can be used for Gaussians*

^{(N)}^{2}(Press et al., 1988). Note that, in the case

*that the density p*by factored sampling is consistent with an estimate ob- tained more conventionally, and efficiently, from linear least squares estimation. For multi-modal distributions which cannot be approximated as normal, so that linear

**(z | x) is normal, the mean obtained****estimators are unusable, estimates of mean x by fac-**
tored sampling continue to apply.

**4.** **The C****ONDENSATION****Algorithm**

The Condensation algorithm is based on factored sampling but extended to apply iteratively to successive images in a sequence. The same sampling strategy has been developed elsewhere (Gordon, et al., 1993;

Kitagawa, 1996), presented as developments of Monte- Carlo methods. Jump-diffusion tracking (Miller et al., 1995) may also be related to the approach described here.

Given that the process at each time-step is a self-
contained iteration of factored sampling, the out-
put of an iteration will be a weighted, time-stamped
sample-set, denoted**{s**^{(n)}*t* *, n = 1, . . . , N} with weights*
*π**t** ^{(n)}*, representing approximately the conditional state-

*density p*

**(x***t*

*| Z*

*t*

*) at time t. How is this sample-set*obtained? Clearly, the process must begin with a prior

*density and the effective prior for time-step t should*

*be p*

**(x***t*

*| Z*

*t*−1

*). This prior is of course multi-modal in*general and no functional representation of it is avail- able. It is derived from the sample set representation

**{(s**

^{(n)}*t*−1

*, π*

*t*

*−1*

^{(n)}

**), n = 1, . . . , N} of p(x***t*−1

*| Z*

*t*−1

*), the*output from the previous time-step, to which predic- tion (5) must then be applied.

The iterative process as applied to sample-sets, de-
picted in Fig. 5, mirrors the continuous diffusion pro-
cess in Fig. 2. At the top of the diagram, the out-
*put from time-step t* − 1 is the weighted sample-set
**{(s**^{(n)}*t*−1*, π**t** ^{(n)}*−1

*), n = 1, . . . , N}. The aim is to maintain,*

*at successive time-steps, sample sets of fixed size N ,*

10 *Isard and Blake*

*Figure 4.* Sample-set representation of shape distributions: the sample-set representation of probability distributions, illustrated in one dimen-
**sion in Fig. 3, is illustrated here (a) as it applies to the distribution of a multi-dimensional curve parameter x. Each sample s*** ^{(n)}*is shown as a
curve (of varying position and shape) with a thickness proportional to the weight

*π*

*n*. The weighted mean of the sample set (b) serves as an estimator of the distribution mean.

*Figure 5.* One time-step in the Condensation algorithm: Each of the three steps—drift-diffuse-measure—of the probabilistic propagation
process of Fig. 2 is represented by steps in the Condensation algorithm.

so that the algorithm can be guaranteed to run within
a given computational resource. The first operation
*therefore is to sample (with replacement) N times from*
the set**{s**^{(n)}_{t}_{−1}}, choosing a given element with probabil-
ity*π**t** ^{(n)}*−1. Some elements, especially those with high
weights, may be chosen several times, leading to iden-
tical copies of elements in the new set. Others with
relatively low weights may not be chosen at all.

Each element chosen from the new set is now sub-
jected to the predictive steps. First, an element un-
dergoes drift and, since this is deterministic, identical
elements in the new set undergo the same drift. This
is apparent in the Fig. 5. The second predictive step,
diffusion, is random and identical elements now split
because each undergoes its own independent Brownian
motion step. At this stage, the sample set**{s**^{(n)}*t* } for the
new time-step has been generated but, as yet, without its
weights; it is approximately a fair random sample from
*the effective prior density p (x*

*t*

*| Z*

*t*−1

*) for time-step t.*

Finally, the observation step from factored sampling is
applied, generating weights from the observation den-
*sity p (z*

*t*

**| x**

*t*

*) to obtain the sample-set representation*

**{(s**

^{(n)}*t*

*, π*

*t*

^{(n)}*)} of state-density for time t.*

Figure 6 gives a synopsis of the algorithm. Note the
*use of cumulative weights c*^{( j)}_{t}_{−1}(constructed in step 3)
to achieve efficient sampling in step 1. After any time-
step, it is possible to “report” on the current state, for
example, by evaluating some moment of the state den-
sity as shown.

One of the striking properties of the Condensa- tion algorithm is its simplicity, compared with the Kalman filter, despite its generality. Largely, this is due to the absence of the Riccati equation which ap- pears in the Kalman filter for the propagation of co- variance. The Riccati equation is relatively complex computationally but is not required in the Conden- sation algorithm which instead deals with variability by sampling, involving the repeated computation of a relatively simple propagation formula.

**5.** **Stochastic Dynamical Models**
**for Curve Motion**

In order to apply the Condensation algorithm,
which is general, to tracking curves in image-streams,
specific probability densities must be established both
for the dynamics of the object and for the observation
*process. In the examples described here, x is a linear*
parameterisation of the curve and allowed transforma-
tions of the curve are represented by linear transfor-

*mations of x. The Condensation algorithm itself*
*does not demand necessarily a linear parameterisation*
though linearity is an attraction for another reason—the
availability of algorithms to learn object dynamics. The
algorithm could also be used, in principle, with non-
linear parameterised kinematics—for instance, repre-
senting an articulated hand in terms of joint angles
(Rehg and Kanade, 1994).

*5.1.* *Linear Parameterisations of Splines*
*for Tracking*

We represent the state of a tracked object following
methods established for tracking using a Kalman filter
(Blake et al., 1995). Objects are modelled as a curve
(or set of curves), typically though not necessarily the
*occluding contour, and represented at time t by a pa-*
**rameterised image curve r***(s, t). The parameterisation*
is in terms of B-splines, so

**r****(s, t) = (B(s) · Q**^{x}**(t), B(s) · Q**^{y}*(t)),*

for 0*≤ s ≤ L, (8)*
**where B***(s) is a vector (B*1*(s), . . . , B**N*_{B}*(s))** ^{T}* of

**B-spline basis functions, Q**

^{x}**and Q**

*are vectors of B-*

^{y}*spline control point coordinates and L is the number*of spans. It is usually desirable (Blake et al., 1993) to restrict the configuration of the spline to a shape-space

**of vectors X defined by**

Ã**Q**^{x}**Q**^{y}

!

* = WX +*
Ã

**¯Q**

^{x}**¯Q**^{y}

!

*,* (9)

*where the matrix W is a matrix of rank N**X*considerably
*lower than the 2N**B* degrees of freedom of the uncon-
strained spline. Typically the shape-space may allow
affine deformations of the template shape ¯**Q, or more**
generally a space of rigid and non-rigid deformations.

The space is constructed by applying an appropriate
*combination of three methods to build a W -matrix:*

1. determining analytically combinations of contours derived from one or more views (Ullman and Basri, 1991; Koenderink and Van Doorn, 1991; Blake et al., 1993), a method that is usable both for affine spaces and for certain classes of articulated object;

2. capturing sequences of key frames of the object in different poses (Blake et al., 1995);

3. performing principal components analysis on a set of outlines of the deforming object (Cootes et al.,

12 *Isard and Blake*

*Figure 6.* The Condensation algorithm.

1993; Baumberg and Hogg, 1994) to derive a small set of representative contours.

*5.2.* *Dynamical Model*

Exploiting earlier work on dynamical modelling (Blake
et al., 1993, 1995), object dynamics are modelled as a
second order process, conveniently represented in dis-
*crete time t as a second order linear difference equation:*

**x***t***− ¯x = A(x***t*−1**− ¯x) + Bw***t* (10)

**where w***t*are independent vectors of independent stan-
dard normal variables, the state-vector

**x***t* =
µ**X**_{t}_{−1}

**X***t*

¶

*,* (11)

and where* ¯x is the mean value of the state and A, B are*
matrices representing the deterministic and stochastic
components of the dynamical model, respectively. The
system is a set of damped oscillators, whose modes,
natural frequencies and damping constants are deter-

*mined by A, driven by random accelerations coupled*

**into the dynamics via B from the noise term Bw. While***it is possible to set sensible defaults for A,* * ¯x and B,*
it is more satisfactory and effective to estimate them
from input data taken while the object performs typical
motions. Methods for doing this via Maximum Like-
lihood Estimation are essential to the work described
here and are described fully elsewhere (Blake et al.,
1995; Reynard et al., 1996).

The dynamical model can be re-expressed in such a way as to make quite clear that it is a temporal Markov chain:

*p (x*

*t*

**| x**

*t*−1

*)*

∝ exp µ

−1

2*kB*^{−1}**((x***t***− ¯x) − A(x***t*−1**− ¯x))k**^{2}

¶ (12)

wherek · · · k is the Euclidean norm. It is therefore clear that the learned dynamical models are appropriate for use in the Condensation algorithm.

*5.3.* *Initial Conditions*

Initial conditions for tracking can be determined by
*specifying the prior density p (x*0

*), and if this is*Gaussian, direct sampling can be used to initialise the Condensation algorithm. Alternatively, it is pos-

*sible simply to allow the density p*

**(x***t*

*) to settle to a*

*steady state p*

**(x**_{∞}

*), in the absence of object measure-*ments. Provided the learned dynamics are stable (free of undamped oscillations) a unique steady state exists.

*Furthermore, if p (x*0

**) is Gaussian, p(x**_{∞}

*) is Gaussian*with parameters that can be computed by iterating the Riccati equation (Gelb, 1974). At this point the density function represents an envelope of possible configura- tions of the object, as learned during the training phase.

(Background clutter, if present, will modify and bias this envelope to some extent.) Then, as soon as the foreground object arrives and is measured, the density

*p (x*

*t*

*) begins to evolve appropriately.*

**6.** **Observation Model**

*The observation process defined by p (z*

*t*

**| x**

*t*

*) is as-*sumed here to be stationary in time (though the Condensation algorithm does not necessarily de-

*mand this) so a static function p*specified. As yet we have no capability to estimate it from data, though that would be ideal, so some reasonable assumptions must be made. First, a mea- surement model for one-dimensional data with clutter

**(z | x) needs to be**is suggested. Then an extension is proposed for two- dimensional observations that is also used later in com- putational experiments.

*6.1.* *One-Dimensional Observations in Clutter*
In one dimension, observations reduce to a set of
scalar positions * {z = (z*1

*, z*2

*, . . . , z*

*M*

*)} and the ob-*

*servation density has the form p*one-dimensional position. The multiplicity of measure- ments reflects the presence of clutter so either one of the events

**(z | x) where x is***φ**m**= {true measurement is z**m**}, m = 1, . . . , M*
occurs, or else the target object is not visible with prob-
*ability q* = 1 −P

*m**P(φ**m**). Such reasoning about*
clutter and false alarms is commonly used in target
tracking (Bar-Shalom and Fortmann, 1988). Now the
observation density can be expressed as

*p (z | x) = qp(z | clutter) +*
X

*M*

*m*=1

*p (z | x, φ*

*m*

*)P(φ*

*m*

*).*

A reasonable functional form for this can be obtained
by making some specific assumptions: that^{3} *P(φ**m**) =*
*p, ∀m, that the clutter is a Poisson process along the line*
with spatial density*λ and that any true target measure-*
ment is unbiased and normally distributed with stan-
dard deviation*σ. This leads to*

*p (z | x) ∝ 1 +* 1

√2*πσα*
X

*m*

exp µ

−*ν**m*^{2}

2*σ*^{2}

¶
(13)
where*α = qλ and ν**m* *= z**m**− x, and is illustrated*
in Fig. 7. Peaks in the density function correspond to
measured features and the state density will tend to be
reinforced in the Condensation algorithm at such
points. The background level reflects the possibility
that the true target has not been detected at all. The
effect on tracking behaviour is to provide for the possi-
bility of “tunneling”: a good hypothesis should survive
a transitory failure of observations due, for example,
to occlusion of the tracked object. The parameters*σ*
(units of distance) and *α (units of inverse distance)*
must be chosen, though, in principle, they could be
estimated from data by observing measurement error
*σ and both the density of clutter λ and probability of*
*non-detection q.*

Considerable economy can be applied, in practice, in
the evaluation of the observation density. Given a hy-
*pothesised position x in the “observation” step (Fig. 6)*
*it is not necessary to attend to all features z*1*, . . . , z**M*.

14 *Isard and Blake*

*Figure 7.* One-dimensional observation model: a probabilistic observation model allowing for clutter and the possibility of missing the target
*altogether is specified here as a conditional density p(z | x).*

Any*ν**m*for which

√ 1

2*πσα*exp
µ

−*ν**m*^{2}

2*σ*^{2}

¶

¿ 1

can be neglected and this sets a search window around
*the position x outside which measurements can be ig-*
nored. For practical values of the constants the search
window will have a width of a few*σ . In practice, the*
clutter is sufficiently sparse and*σ is sufficiently small*
that the search window rarely contains more than one
feature.

*Note that the density p (z | x) represents the informa-*

*tion about x given a fixed number M of measurements.*

*Potentially, the event* *ψ**M* *that there are M measure-*
*ments, regardless of the actual values of those measure-*
*ments, also constitutes information about x. However,*
we can reasonably assume here that

*P(ψ**M**| x) = P(ψ**M**),*

*for instance because x is assumed to lie always within*
the image window. In that case, by Bayes’ theorem,

*p(x | ψ**M**) = p(x)*

—the event *ψ**M* provides no additional informa-
*tion about the position x. (If x is allowed also to fall*
outside the image window then the event*ψ**M**is infor-*
*mative: a value of M well above the mean value for*
*the background clutter enhances the probability that x*
lies within the window.)

*6.2.* *Two-Dimensional Observations*

**In a two-dimensional image, the set of observations z**
is, in principle, the entire set of features visible in the
image. However, an important aspect of earlier sys-
tems in achieving real-time performance (Lowe, 1992;

Harris, 1992; Blake et al., 1993) has been the restriction of measurement to a sparse set of lines normal to the tracked curve. These two apparently conflicting ideas can be resolved as follows.

*The observation density p (z | x) in two dimensions*
describes the distribution of a (linearly) parameterised

**image curve z**

*(s), given a hypothetical shape in the*

**form of a curve r**

*(s), 0 ≤ s ≤ 1, represented by a*

**shape parameter x. The two-dimensional density be**derived as an extension of the one-dimensional case. It

*is assumed that a mapping g(s) is known that associates*

**each point z**

*on the shape. In practice, this mapping is set up by*

**(s) on the image curve with a point r(g(s))**

**tracing normals from the curve r. Note that g**(s) is not**necessarily injective because z**

*(s) includes clutter as*well as foreground features. Next, the one-dimensional density (13) is approximated in a more amenable form that neglects the possibility of more than one feature lying inside the search interval:

*p (z | x) ∝ exp*
µ

− 1

2*σ*^{2} *f(ν*1*; µ)*

¶

*where f(ν; µ) = min(ν*^{2}*, µ*^{2}*),* (14)

*Figure 8.* Observation process: the thick line is a hypothesised shape, represented as a parametric spline curve. The spines are curve normals
along which high-contrast features (white crosses) are sought.

*µ =*√

2*σ log(1/*√

2*πασ ) is a spatial scale constant,*
and*ν*1is the*ν**m*with smallest magnitude, representing
*the feature lying closest to the hypothesised position x.*

A natural extension to two dimensions is then

*p (z | x) = Zexp*
Ã

−1
*2r*

Z _{L}

0

*f (z*1

**(s) − r(s); µ) ds**!

(15)
* in which r is a variance constant and z*1

*(s) is the closest*

**associated feature to r**

*(s):*

**z**1**(s) = z(s**^{0}*) where s*^{0}= arg min

*s*^{0}*∈g*^{−1}*(s)***|r(s) − z(s**^{0}*)|.*

Note that the constant of proportionality (“partition
*function”) Z (x) is an unknown function. We make the*

**assumption that the variation of Z with x is slow com-***pared with the other term in (15) so that Z can be treated*as constant. It remains to establish whether this as- sumption is justified.

The observation density (15) can be computed via a discrete approximation, the simplest being:

*p (z | x) ∝ exp*
Ã

−
X*M*
*m*=1

1

*2r M* *f (z*1

*(s*

*m*

**) − r(s***m*

*); µ)*

!
*,*

(16)
*where s**m* *= m/M. This is simply the product of*
one-dimensional densities (14) with*σ =*√

*r M, eval-*
*uated independently along M curve normals as in*
Fig. 8.

**7.** **Applying the C****ONDENSATION****Algorithm**
**to Video-Streams**

Four examples are shown here of the practical efficacy of the Condensation algorithm. Movie (MPEG) versions of some results are available on the web at http://www.robots.ox.ac.uk/~ab/.

16 *Isard and Blake*

*Figure 9.* Tracking three people in a cluttered room: the first frame of a sequence in which one figure moves from right to left in front of two
stationary figures.

*7.1.* *Tracking a Multi-Modal Distribution*

The ability of the Condensation algorithm to rep- resent multi-modal distributions was tested using a 70 frame (2.8 s) sequence of a cluttered room containing three people each facing the camera (Fig. 9). One of the people moves from right to left, in front of the other two. The shape-space for tracking is built from a hand- drawn template of head and shoulders (Fig. 8) which is then allowed to deform via planar affine transforma- tions. A Kalman filter contour-tracker (Blake et al., 1993) with default motion parameters is able to track a single moving person just well enough to obtain a sequence of outline curves that is usable as training data. Given the high level of clutter, adequate perfor- mance with the Kalman filter is obtained here by means of background modelling (Rowe and Blake, 1996), a statistical form of background subtraction, which ef- fectively removes clutter from the image data before it is tracked. It transpires, for this particular training set, that the learned motions comprise primarily horizontal translation, with vertical translation and horizontal and vertical shear present to a lesser degree.

The learned shape and motion model can now
*be installed as p (x*

*t*

**| x**

*t*−1

*) in the Condensation*

*algorithm which is run on a test sequence but with-*

*out the benefit of background modelling, so that the*

background clutter is now visible to the tracker.

Figure 10 shows how the state-density evolves as track-
ing progresses. Initialisation is performed simply by
iterating the stochastic model, in the absence of mea-
surements, to its steady state and it can be seen that this
corresponds, at time 0, to a roughly Gaussian distribu-
tion, as expected. The distribution rapidly collapses
down to three peaks which are then maintained appro-
priately even during temporary occlusion. Although the
tracker was designed to track just one person, the Con-
densation algorithm allows the tracking of all three,
for free; the ability to represent multi-modal distribu-
tions effectively provides multiple hypothesis capabil-
ity. Tracking is based on frame rate (40 ms) sampling
in this experiment and distributions are plotted in the
figure for alternate frames. The experiment was run
*using a distribution of N* = 1000 samples per time-
step.

*7.2.* *Tracking Rapid Motions Through Clutter*
The ability to track more agile motion, still against
clutter, was tested using a 500 field (10 s) sequence
of a girl dancing vigorously to a Scottish reel. The
shape-space for tracking was planar affine, based on a
hand-drawn template curve for the head outline. The
training sequence consisted of dancing against a largely

*Figure 10.* Tracking with multi-modal state-density: an approximate depiction of the state-density is shown, computed by smoothing the
**distribution of point masses s**^{(1)}_{t}**, s**^{(2)}*t* *, . . . in the Condensation algorithm. The density is, of course, multi-dimensional; its projection onto the*
horizontal translation axis is shown here. The initial distribution is roughly Gaussian but this rapidly evolves to acquire peaks corresponding to
each of the three people in the scene. The right-most peak drifts leftwards, following the moving person, coalescing with and separating from
the other two peaks as it moves. Having specified a tracker for one person we effectively have, for free, a multi-person tracker, owing to the
innate ability of the Condensation algorithm to maintain multiple hypotheses.

uncluttered background, tracked by a Kalman filter contour-tracker with default dynamics to record 140 fields (2.8 s) of tracked head positions, the most that could be tracked before losing lock. Those 140 fields were sufficient to learn a bootstrap motion model which then allowed the Kalman filter to track the training data for 800 fields (16 s) before loss of lock. The motion model obtained from these 800 fields was used in exper- iments with the Condensation tracker and applied to the test data, now including clutter.

Figure 11 shows some stills from the test sequence, with a trail of preceding head positions to indicate mo- tion. The motion is primarily translation, with some horizontal shear apparent as the dancer turns her head.

*Representing the state density with N* = 100 samples
at each time-step proves just sufficient for successful
tracking. As in the previous example, a prior density
can be computed as the steady state of the motion model
and, in this case, that yields a prior for position that
spreads across most of the image area, as might be

expected given the range of the dance. Such a broad
distribution cannot effectively be represented by just
*N* *= 100 samples. One alternative is to increase N in*
the early stages of tracking, and this is done in a later
experiment. Alternatively, the prior can be based on
a narrower distribution whose centre is positioned by
hand over the object at time 0, and that is what was done
here. Observation parameters were*µ = 24, σ = 7*
*with M*= 18 normals.

Figure 12 shows the motion of the centroid of the estimated head position as tracked both by the Con- densation algorithm and by a Kalman filter using the same motion model. The Condensation tracker correctly estimated head position throughout the se- quence, but after about 40 fields (0.80 s), the Kalman filter was distracted by clutter, never to recover.

Given that there is only one moving person in this experiment, unlike the previous one in which there were three, it might seem that a unimodal repre- sentation of the state density should suffice. This is

18 *Isard and Blake*

*Figure 11.* Tracking agile motion in clutter: the test sequence consists of 500 fields (10 s) of agile dance against a cluttered background. The
dancer’s head is tracked through the sequence. Several representative fields are shown here, each with a trail of successive mean tracked head
*position at intervals of 40 ms. The Condensation algorithm used N = 100 sample per time-step to obtain these results.*

emphatically not the case. The facility to represent multiple modes is crucial to robustness as Fig. 13 illus- trates. The figure shows how the distribution becomes misaligned (at 900 ms), reacting to the distracting form of the computer screen. After a further 20 ms the distri- bution splits into two distinct peaks, one correspond- ing to clutter (the screen) and the other to the dancer’s head. At this point the clutter peak actually has the higher posterior probability—a unimodal tracker, for instance, a Kalman filter, would almost certainly dis- card the lower peak, rendering it unable to recover.

The Condensation algorithm however, capable as it is of carrying several hypotheses simultaneously, does recover rapidly as the clutter peak decays for lack of confirmatory observation, leaving just one peak corre- sponding to the dancer at 960 ms.

*7.3.* *Tracking an Articulated Object*

The preceding sequences show motion taking place in affine shape-spaces of just six dimensions. High di- mensionality is one of the factors, in addition to agility and clutter, that makes tracking hard (Blake et al., 1993). In order to investigate tracking performance in higher dimensions, we used a 500 field (10 s) test sequence of a hand translating, rotating, and flexing its fingers independently, over a highly cluttered desk scene (Fig. 14). Figure 15 shows just how severe the clutter problem is—the hand is immersed in a dense field of edges.

A model of shape and motion model was learned from training sequences of hand motion against a plain background, tracked by Kalman filter (using signed

*Figure 12.* The Condensation tracker succeeds where a Kalman filter fails: the estimated centroid for the sequence shown in Fig. 11 is plotted
against time for the entire 500 field sequence, as tracked first by the Condensation tracker, then by a comparable Kalman filter tracker.

The Condensation algorithm correctly estimates the head position throughout the sequence. The Kalman filter tracks briefly, but is soon distracted by clutter and never recovers.

edges to help to disambiguate finger boundaries). The procedure comprised several stages, creative assembly of methods from the available “toolkit” for learning (Blake et al., 1995).

**1. Shape-space was constructed from six templates**
drawn around the hand with the palm in a fixed
orientation and with the fingers and thumb in var-
ious configurations. The six templates combined
linearly to form a five-dimensional space of defor-
mations which were then added to the space of trans-
lations to form a seven-dimensional shape-space.

**2. Default dynamics in the shape-space above were**
adequate to track a clutter-free training sequence of
600 frames in which the palm of the hand main-
tained an approximately fixed attitude.

**3. Principal components analysis: the sequence of**
600 hand outlines was replicated with each hand
contour rotated through 90^{◦}and the sequences con-
catenated to give a sequence of 1200 deformations.

Projecting out the translational component of mo- tion, the application of Principal Component Anal- ysis (PCA) to the sequence of residual deformations of the 1200 contours established a 10-dimensional space that accounted almost entirely for deforma- tion. This was then combined with the translational space to form a 12-dimensional shape-space that

accounted both for the flexing of fingers and thumb and also for rotations of the palm.

**4. Bootstrapping: a Kalman filter with default dy-**
namics in the 12-dimensional shape-space was suf-
ficient to track a training sequence of 800 fields of
the hand translating, rotating, and flexing fingers
and thumb slowly. This was used to learn a model of
motion.

**5. Relearning: that motion model was installed in a**
Kalman filter used to track another, faster training-
sequence of 800 fields. This allowed a model for
more agile motion to be learned, which was then
used in experiments with the Condensation
tracker.

Figure 16 shows detail of a series of images from a
tracked, 500 field test-sequence. The initial state den-
sity was simply the steady state of the motion model,
obtained by allowing the filter to iterate in the absence
of observations. Tracker initialisation was facilitated
*by using more samples per time-step (N* = 1500) at
*time t* = 0, falling gradually to 500 over the first
4 fields. The rest of the sequence was tracked using
*N* = 500. As with the previous example of the dancer,
clutter can distract the tracker but the ability to repre-
sent multi-modal state density means that tracking can
recover.

20 *Isard and Blake*

*Figure 13.* Recovering from tracking failure: detail from four consecutive fields of the sequence illustrated in Fig. 11. Each sample from the
*distribution is plotted on the image, with intensity scaled to indicate its posterior probability. (Most of the N*= 100 samples have too low a
probability to be visible in this display.) In field 45, the distribution is misaligned, and has begun to diverge. In fields 46 and 47 it has split into
two distinct peaks, the larger attracted to background clutter, but converges back onto the dancer in field 48.

*7.4.* *Tracking a Camouflaged Object*

Finally, we tested the ability of the algorithm to track rapid motion against background distraction in the extreme case that background objects actually mim- iced the tracked object. A 12 s (600 field) sequence

showed a bush blowing in the wind, the task being to track one particular leaf. A template was drawn by hand around a still of one chosen leaf and allowed to undergo affine deformations during tracking. Given that a clutter-free training sequence was not available, the motion model was again learned by means of a

*Figure 14.* A hand moving over a cluttered desk: Field 0 of a 500 field (10 s) sequence in which the hand translates, rotates, and the fingers
and thumb flex independently.

*Figure 15.* Severe clutter: detail of one field (Fig. 14) from the test-sequence shows the high level of potential ambiguity. Output from a
directional Gaussian edge detector shows that there are many clutter edges present as potential distractors.

bootstrap procedure. A tracker with default dynamics proved capable of tracking the first 150 fields of a train- ing sequence before losing the leaf, and those tracked positions allowed a first approximation to the model to be learned. Installing that in a Condensation tracker, the entire sequence could be tracked, though with occasional misalignments. Finally, a third learned

model was sufficient to track accurately the entire 12-
second training sequence. Despite occasional violent
gusts of wind and temporary obscuration by another
leaf, the Condensation algorithm successfully fol-
lowed the object, as Fig. 17 shows. In fact, tracking
*is accurate enough using N* = 1200 samples to sepa-
rate the foreground leaf from the background reliably,

22 *Isard and Blake*

*Figure 16.* Tracking a flexing hand across a cluttered desk: representative stills from a 500 field (10 s) sequence show a hand moving over a
highly cluttered desk scene. The fingers and thumb flex independently, and the hand translates and rotates. Here the Condensation algorithm
*uses N**= 1500 samples per time-step initially, dropping gradually over 4 fields to N = 500 for the tracking of the remainder of the sequence.*

The mean configuration of the contour is displayed.

*Figure 17.* Tracking with camouflage: the aim is to track a single camouflaged moving leaf in this 12-s sequence of a bush blowing in the wind.

Despite the heavy clutter of distractors which actually mimic the foreground object, and occasional violent gusts of wind, the chosen foreground leaf is successfully tracked throughout the sequence. Representative stills depict mean contour configurations, with preceding tracked leaf positions plotted at 40 ms intervals to indicate motion.

an effect which can otherwise only be achieved using

“blue-screening”. Having obtained the model itera-
tively as above, independent test sequences could be
*tracked without further training. With N* = 1200 sam-
ples per time-step the tracker runs at 6.5 Hz on a SGI
Indy SC4400 200 MHz workstation. Reducing this to
*N* = 200 increases processing speed to video frame-
rate (25 Hz), at the cost of occasional misalignments
in the mean configuration of the contour. Observation
parameters were*µ = 8, σ = 3 with M = 21 normals.*

**8.** **Conclusions**

Tracking in clutter is hard because of the essential
multi-modality of the conditional observation den-
*sity p (z | x). In the case of curves multiple-hypothesis*
tracking is inapplicable and a new approach is needed.

The Condensation algorithm is a fusion of the sta-
tistical factored sampling algorithm for static, non-
Gaussian problems with a stochastic model for object
motion. The result is an algorithm for tracking rigid
and non-rigid motion which has been demonstrated to
be far more effective in clutter than comparable Kalman
filters. Performance of the Condensation algorithm
*improves as the sample size parameter N increases;*

*formally computational complexity is O(N log N), al-*
*though this can be made O(N) with a minor modifi-*
cation to the sampling procedure. Impressive results
have been demonstrated for models with between 6
*and 12 degrees of freedom, even when N is as low*
as 100–200. Performance in several cases was im-
*proved still further with an increased value N*≈ 1000.

In a six-dimensional shape-space, the system currently
*runs with N* = 100 in real-time (50 Hz) on a desk-top
graphics workstation (SGI Indy R4400SC, 200 MHz).

The new approach raises a number of questions.

First, alternative observation models could be explored in order to make greater use of image intensity varia- tions, though without sacrificing too much in the way of photometric invariance. It is to be hoped in the in- terests of efficiency that, as happens with the search window in the edge-based case, computational atten- tion could be concentrated in a band around the hy- pothesised curve without significant loss of accuracy in the model. Such a model would have echoes of cor- relation matching but of course without the exhaustive search characteristic of correlation matchers which is quite infeasible in more than two or three dimensions.

Secondly, the availability of general state densities suggests the need for more general representations of

those densities. When the density is approximately unimodal, first and second moments may be adequate to convey the likely states, but in the multi-modal case, as for example when several people are tracked simul- taneously, the mean configuration is not a particularly useful statistic—it meaninglessly combines the config- urations of the three people. An alternative is to attempt to develop a mode finder capable of pin-pointing sev- eral modes when present. More generally, there is a need for “operators” to interrogate densities: for in- stance, an operator to find a person moving to the right, or to find the tallest person. Perhaps such op- erators could be formulated as hypothesis tests applied to sample sets.

A third question concerns the random sampling
scheme and its efficiency. Factored sampling can be
*inefficient as the modes of p (z | x) become narrow.*

One approach is “importance sampling” (Ripley, 1987)
in which a heuristically chosen distribution, approxi-
*mating p (z | x), is used to concentrate random sam-*
pling around modes. However, this has the drawback

*that the prior p*whereas, in temporal propagation, the prior (predic-

**(x) must be repeatedly evaluated***tion) p*

**(x***t*

**| z**

*t*−1

*) cannot be evaluated pointwise, only*sampled.

Finally, it is striking that the density propagation
equation (4) in the Condensation algorithm is a
continuous form of the propagation rule of the “for-
ward algorithm” for Hidden Markov Models (HMMs)
(Rabiner and Bing-Hwang, 1993). The integral over
continuous states in (5) becomes a summation over
*discrete states in the HMM, with p (x*

*t*

**| x**

*t*−1

*) repre-*sented by a transition matrix. This suggests a natu- ral opportunity to combine the two so that mixed dis- crete/continuous states could be propagated over time.

This would allow switching between multiple mod- els, for instance, walk-trot-canter-gallop, each model represented by a stochastic differential equation, with transitions governed by a discrete conditional proba- bility matrix. It seems likely that such a system could be executed as a Condensation tracker. A further challenge is to develop a learning algorithm for mixed dynamical models.

**Appendix A:** **Non-Linear Filtering**

There are four distinct probability distributions rep- resented in a non-linear Bayesian filter. Three of them form part of the problem specification and the fourth constitutes the solution. The three specified

24 *Isard and Blake*

distributions are:

*1. the prior density p (x) for the state x*

*2. the process density p (x*

*t*

**| x**

*t*−1

*) that describes the*stochastic dynamics

*3. the observation density p (z | x)*

and the filter evolves over time to generate, as the so-
*lution at each time-step, the state-density p*_{t}**(x) where***p*_{t}**(x***t***) ≡ p(x***t**| Z**t**). Only when all of the three spec-*
*ified distributions are Gaussian is the state-density p**t*

*also Gaussian. Otherwise, for non-Gaussian p**t*, it is
possible to use one of a number of approximate filters,
depending on which of the specified densities it is that
is non-Gaussian.

*A.1.* *Non-Gaussian Prior Density*

The case that the prior density is non-Gaussian is the simplest to deal with provided only that it can ade- quately be represented (or approximated) as an additive Gaussian mixture:

*p*_{0}* (x) =*
X

*M*

*m*=1

*w*^{(m)}*G*¡

**x***; µ*^{(m)}*, P*_{0}* ^{(m)}*¢

*.*

In that case, provided that other specified densities are Gaussian, the state density can also be represented as a corresponding mixture

*p*_{t}* (x) =*
X

*M*

*m*=1

*w*^{(m)}*G*¡

**x***; µ*^{(m)}*t* *, P**t*^{(m)}

¢

in which the means*µ*^{(m)}*t* *and variances P*_{t}* ^{(m)}*vary over
time but the weights

*w*

^{(m)}*are fixed. Each of the M mix-*ture components evolves as an independent Gaussian so that, in fact, the state density is just a sum of densities

*from M independent linear Kalman filters.*

*A.2.* *Non-Gaussian Process Density*

Non-Gaussian state densities can arise from the nature of the process either because the dynamics are driven by non-Gaussian process noise, or, more generally, be- cause the deterministic dynamics are non-linear. One approach to filtering is then to approximate the dynam- ics by Taylor expansion as a linear process with time- varying coefficients and proceed as for linear Kalman filters. This generates a Gaussian representation of the

evolving state-density which may be a good approx- imation depending on the nature of the non-linearity.

This is the basis of the “Extended Kalman Filter” (EKF)
(Gelb, 1974; Bar-Shalom and Fortmann, 1988). Al-
ternatively, one can attempt a mixture representation,
as earlier, but now allowing the weights*w** ^{(m)}* also to
vary over time. Unfortunately, even allowing dynamic
re-weighting (Sorenson and Alspach, 1971) does not

*produce exact solutions for p*

*t*

*vidual Gaussian components do not remain Gaussian over time. For example, consider the case in which the*

**(x), because the indi-***process density p*

**(x***t*

**| x**

*t*−1

*) is itself an additive mix-*

*ture of k> 1 Gaussian components. According to the*Bayesian propagation equation (5) each component of

*p*

*t*

*splits into k separate components in the transition*

*from time n to time n*+ 1; the total number of compo-

*nents in p*

_{t}*grows exponentially as k*

^{t}*. Clearly, p*

*must be approximated at each time-step to prune back the number of components (Anderson and Moore, 1979)*

_{t}*within some resource-limited bound M. Effectively,*

*there are Mk full Kalman filters running at each time-*step, each bringing the computational expense of its own Riccati equation step. Clearly, the success of this

*approach depends on how well the densities p*

*t*and

*p*

**(x***t*

**| x**

*t*−1

*) can be approximated with a modest num-*

*ber Mk of components.*

*A.3.* *Non-Gaussian Observation Density*

In the case of visual tracking in clutter, non-linearity
of the tracking filter arises, as we have seen, because
*the observation density p (z | x) is non-Gaussian and,*
furthermore, is multi-modal so that it cannot be well
approximated by a single Gaussian. Each of the meth-
ods just mentioned for handling non-Gaussian process
density, the EKF and Gaussian mixtures, are relevant
also to non-Gaussian observation density but continue
to have the same drawbacks. Note that, in the case of
Gaussian mixtures, the number of mixture components
again proliferates at each time-step of (4), albeit via a
different mechanism involving products of Gaussians
rather than convolutions. Even this assumes that the
observation density can be approximated as a mixture
but in clutter this becomes rather inefficient, requiring
at least one component per visible feature.

There is an additional class of techniques which ap-
plies to this case when the non-Gaussian state den-
sity arises from clutter of a particular sort. In the
simplest case, one of a finite set of measurements
**z***t* *= {z**t**,1**, . . . , z**t**,k**} at time n is to be associated with*