**Advanced Topics in Learning and Vision**

Ming-Hsuan Yang

mhyang@csie.ntu.edu.tw

**Announcements**

• Term project presentation: Dec 28 and Dec 29.

• All critiques due on Jan 9, 2006 (midnight, Taipei local time). No overdue critiques will be accepted.

• Final term project report: Due on Jan 16, 2006 (midnight, Taipei local time). No overdue term reports will be accepted.

• Supplementary reading:

- M. Black and A. Jepson. Eigentracking: Robust matching and tracking of articulated objects using view-based representation. *International Journal of Computer Vision*, vol. 26, no. 1, pp. 63–84, 1998.

- A. Jepson, D. Fleet, and T. El-Maraghi. Robust online appearance models for visual tracking. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, vol. 25, no. 10, pp. 1296–1311, 2003.

**Overview**

• Particle filters

• 3D human tracking

• On-line visual tracking

• Mean shift algorithm and applications

**Problems with Particle Filter**

• Sampling in high dimensional space is difficult and inefficient.

• Need a lot of particles.

• Curse of dimensionality: particle filters usually do not scale well (e.g., they are practical only up to about a 10-dimensional state space).
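The scaling problem above can be illustrated with a small numerical experiment. The sketch below is only an illustration under assumed settings (an isotropic Gaussian prior, an isotropic Gaussian likelihood centred at the origin, and the hypothetical names `effective_sample_size` and `ess_for_dimension`); it is not taken from the lecture. It measures how many particles carry meaningful weight as the state dimension grows.

```python
import numpy as np

def effective_sample_size(weights):
    """Effective sample size 1 / sum(w_i^2) of the normalized weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def ess_for_dimension(dim, n_particles=1000, prior_std=1.0, obs_std=0.5, seed=0):
    """Draw particles from an isotropic Gaussian prior (assumed) and weight
    them by an isotropic Gaussian likelihood centred at the origin (assumed)."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, prior_std, size=(n_particles, dim))
    log_w = -0.5 * np.sum(particles ** 2, axis=1) / obs_std ** 2
    log_w -= log_w.max()  # stabilize before exponentiating
    return effective_sample_size(np.exp(log_w))

if __name__ == "__main__":
    # With a fixed particle budget, the effective sample size shrinks
    # rapidly as the dimension of the state space increases.
    for dim in (1, 2, 5, 10, 20):
        print(dim, round(ess_for_dimension(dim), 1))
```

With the particle count held fixed, the effective sample size drops quickly with dimension in this toy setting, which is one way to see why many more particles are needed in high-dimensional spaces.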

**Multiple Hypothesis Particle Filter**

• The multiple hypothesis filter (MHF) is a classical approach to representing multi-modal distributions with Kalman filters.

• In the mode-based multiple hypothesis filter [Cham and Rehg CVPR99], each mode is modeled by a truncated Gaussian (as opposed to the set of discrete samples used in the condensation algorithm).

• Sample weights:

$$p(x) = k \max_{i=1,\dots,N} \left\{ p_i \exp\left(-\tfrac{1}{2}(x - m_i)^{T} S_i^{-1} (x - m_i)\right) \right\}$$

where $p_i$ is the Gaussian mixture weight and $m_i$ is the center of the $i$-th Gaussian with covariance $S_i$.

• Use 2D scaled prismatic models with 19 degrees of freedom.

• Observation model:

$$p(z_t \mid x_t) \propto \prod_{u} \exp\left(-\frac{(I(u) - T(u, x_t))^{2}}{2\sigma^{2}}\right) \qquad (1)$$

where $u$ represents image pixel coordinates, $I(u)$ are the image pixel values at $u$, and $T(u, x_t)$ are the overlapping template pixel values at $u$ for state vector $x_t$.

• T.-J. Cham and J. Rehg. A multiple hypothesis approach to figure tracking. *Proceedings of IEEE Conference on Computer Vision and Pattern Recognition*, pp. 239–245, 1999.
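The two formulas in this section can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the constant `k` is left as a plain argument, and `template` is assumed to be the template $T(u, x_t)$ already rendered for a given state and aligned with the image patch.

```python
import numpy as np

def piecewise_gaussian_density(x, weights, means, covs, k=1.0):
    """Mode-based density k * max_i p_i * exp(-0.5 * d_i^2), where d_i is
    the Mahalanobis distance from x to mode i."""
    x = np.asarray(x, dtype=float)
    values = []
    for p_i, m_i, S_i in zip(weights, means, covs):
        diff = x - np.asarray(m_i, dtype=float)
        # Solve S_i y = diff instead of forming the inverse explicitly.
        maha = diff @ np.linalg.solve(np.asarray(S_i, dtype=float), diff)
        values.append(p_i * np.exp(-0.5 * maha))
    return k * max(values)

def template_log_likelihood(image, template, sigma):
    """Log of Eq. (1) up to an additive constant: the product over pixels
    becomes a sum of squared differences divided by 2 * sigma^2."""
    residual = np.asarray(image, dtype=float) - np.asarray(template, dtype=float)
    return -np.sum(residual ** 2) / (2.0 * sigma ** 2)

if __name__ == "__main__":
    modes = [[0.0, 0.0], [3.0, 3.0]]
    covs = [np.eye(2), np.eye(2)]
    print(piecewise_gaussian_density([0.0, 0.0], [0.7, 0.3], modes, covs))
    print(template_log_likelihood([[1.0, 2.0]], [[0.0, 0.0]], sigma=1.0))
```

Working with the log-likelihood avoids numerical underflow when the product in Eq. (1) runs over many pixels.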

**Annealed Particle Filter**

• Incorporate the concept of simulated annealing with particle filter [Deutscher et al. CVPR00]

• Based on the Markov chain method of simulated annealing used in optimization [Kirkpatrick et al. 83].

• Handling multiple modes.

• Multi-layer search, akin to coarse to fine search.

• The 3D human model is based on a kinematic chain with 29 degrees of freedom.

• Each limb is represented by conical sections with elliptical cross-sections.

• Use edge feature and background subtraction with calibrated cameras.

$$\Sigma^{e}(x, z) = \frac{1}{N} \sum_{i=1}^{N} \left(1 - p^{e}_{i}(x, z)\right)^{2} \qquad (2)$$

where $x$ is the model state vector and $z$ is the image from which the pixel map is derived. $p^{e}_{i}(x, z)$ are the values of the edge pixel map at the $N$ sampling points taken along the model's silhouette.

• Likewise,

$$\Sigma^{r}(x, z) = \frac{1}{N} \sum_{i=1}^{N} \left(1 - p^{r}_{i}(x, z)\right)^{2} \qquad (3)$$

where $p^{r}_{i}(x, z)$ are the values of the foreground pixel map at the $N$ sampling points taken from the interior of the conical sections.

• Sample weight:

$$w(x, z) = \exp\left(-\left(\Sigma^{e}(x, z) + \Sigma^{r}(x, z)\right)\right)$$

where $x$ is the model configuration vector and $z$ is the image observation (edge $e$ and region $r$).

• Can incorporate observations from multiple cameras.

• J. Deutscher, A. Blake, and I. Reid. Articulated body motion capture by annealed particle filtering. *Proceedings of IEEE Conference on Computer Vision and Pattern Recognition*, vol. 2, pp. 126–133, 2000.
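The weighting function built from Eqs. (2) and (3) is simple to compute once the pixel-map values at the sampling points are available. The sketch below assumes those values have already been read out of the edge and foreground maps; the function names are hypothetical. The exponent `beta` is an added assumption used to illustrate the annealing idea (a small `beta` flattens the weighting function, a larger one sharpens it); `beta = 1` reproduces the sample weight as written above.

```python
import numpy as np

def sum_squared_difference(pixel_map_values):
    """Eqs. (2)/(3): mean of (1 - p_i)^2 over the N sampling points."""
    p = np.asarray(pixel_map_values, dtype=float)
    return np.mean((1.0 - p) ** 2)

def sample_weight(edge_values, region_values, beta=1.0):
    """w = exp(-beta * (Sigma^e + Sigma^r)).

    edge_values:   edge pixel-map values along the model silhouette.
    region_values: foreground pixel-map values inside the limb sections.
    beta:          assumed annealing exponent (1.0 gives the plain weight).
    """
    cost = sum_squared_difference(edge_values) + sum_squared_difference(region_values)
    return np.exp(-beta * cost)

if __name__ == "__main__":
    edges = [0.9, 0.8, 1.0, 0.4]
    regions = [1.0, 1.0, 0.0, 1.0]
    for beta in (0.25, 0.5, 1.0):
        print(beta, sample_weight(edges, regions, beta=beta))
```

Observations from several cameras could be combined by summing the per-camera costs before exponentiating, which is one plausible reading of the multi-camera remark above.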

**Tracking Loose Limbed People**

• A body is represented by a graphical model [Sigal et al. NIPS03]:

- node: body part (torso, upper leg, etc.)

- edge: spatial and angular constraints between two adjacent body parts

• Each body part is modeled by 5 fixed parameters (lengths and widths at proximal and distal ends, and offsets) and 6 estimated parameters (3 global positions and 3 angular orientations).

• Each directed edge between parts i and j is modeled by a conditional
distribution ψ_{ij}(x_{i}, x_{j}).

• Conditional distributions capture physical constraints and can be learned from motion capture data or constructed by hand:

$$\psi_{ij}(x_i, x_j) = \lambda^{0} N(x_j; \mu_{ij}, \lambda_{ij}) + (1 - \lambda^{0}) \sum_{m=1}^{M_{ij}} \delta_{ijm} N(x_j; F_{ijm}(x_i), G_{ijm}(x_i)) \qquad (4)$$

where $\lambda^{0}$ is a fixed outlier probability, and $\mu_{ij}$ and $\lambda_{ij}$ are the mean and covariance of the Gaussian outlier process. $F_{ijm}(\cdot)$ and $G_{ijm}(\cdot)$ give the mean and covariance of the $m$-th Gaussian mixture component, which has weight $\delta_{ijm}$.

• Image likelihood is based on multi-scale edge and ridge filter responses.

• Because each configuration vector $x_i$ has 6 continuous dimensions, approximating it with a discrete set of states makes the traditional belief propagation algorithm inefficient.

• Non-parametric belief propagation [Isard CVPR03] [Sudderth et al. CVPR 03] is a generalized particle filter algorithm that operates on continuous valued random variables.

• Non-parametric belief propagation treats each particle set as an approximation to a distribution and replaces that distribution with the product of the incoming messages:

$$m_{ij}(x_j) = \int \psi_{ij}(x_i, x_j)\, \phi(x_i) \prod_{k \in A_i,\, k \neq j} m_{ki}(x_i)\, dx_i \qquad (5)$$

where $A_i$ is the set of neighbors of node $i$ and $\phi(x_i)$ is the local likelihood associated with node $i$.

• The message $m_{ij}(x_j)$ can be approximated by importance sampling from a proposal function $f(x_i)$.

• See [Isard CVPR03] [Sudderth et al. CVPR 03] for details on non-parametric belief propagation.

• Extended for body tracking over time [Sigal et al. CVPR04].

• Able to detect body parts automatically.

• Tracking body parts over time.

• L. Sigal, M. Isard, B. Sigelman, and M. Black. Attractive people: Assembling loose-limbed models using non-parametric belief propagation. *Advances in Neural Information Processing Systems*, pp. 1539–1546, MIT Press, 2004.

• L. Sigal, S. Bhatia, S. Roth, M. Black, and M. Isard. Tracking loose-limbed people. *Proceedings of IEEE Conference on Computer Vision and Pattern Recognition*, pp. 421–428, 2004.
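The message integral in Eq. (5) can be estimated by plain importance sampling, as the bullet on proposal functions suggests. The sketch below is a deliberately simplified, scalar-state illustration and not the full non-parametric belief propagation algorithm of the cited papers: all function names are hypothetical, the states are one-dimensional, and messages are passed in as ordinary Python callables rather than kernel density estimates.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of a scalar Gaussian N(x; mean, var)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def psi(x_i, x_j, outlier_prob, outlier_mean, outlier_var, components):
    """Eq. (4) for scalar states. `components` is a list of (delta, F, G)
    where F and G are callables returning the mean and variance given x_i."""
    inlier = sum(delta * gaussian_pdf(x_j, F(x_i), G(x_i))
                 for delta, F, G in components)
    outlier = gaussian_pdf(x_j, outlier_mean, outlier_var)
    return outlier_prob * outlier + (1.0 - outlier_prob) * inlier

def message(x_j, psi_fn, phi, incoming, proposal_sample, proposal_pdf,
            n_samples=5000, seed=0):
    """Importance-sampling estimate of Eq. (5) at a single value x_j.

    psi_fn(x_i, x_j): pairwise potential, vectorized over x_i.
    phi(x_i):         local likelihood of node i.
    incoming:         list of callables, the messages into node i
                      from its neighbors other than j.
    """
    rng = np.random.default_rng(seed)
    x_i = proposal_sample(rng, n_samples)   # draws from f(x_i)
    integrand = psi_fn(x_i, x_j) * phi(x_i)
    for m_ki in incoming:
        integrand = integrand * m_ki(x_i)
    return np.mean(integrand / proposal_pdf(x_i))
```

As a sanity check under these assumptions: with no outliers, a single component with mean $x_i + 1$ and unit variance, a standard normal $\phi$, and no other incoming messages, the exact message is a Gaussian in $x_j$ with mean 1 and variance 2, and the estimate should be close to it.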

**Scaled Covariance Sampling**

• Instead of sampling from an isotropic Gaussian distribution, scale each Gaussian along the eigenvectors of its covariance.

• Reminiscent of the Mahalanobis distance as a metric.

• Figure (not reproduced): comparison of sampling strategies.

1. Condensation (dashed circle) randomizes each sample by dynamic noise.

2. MHF (solid circle) samples within the covariance support (dashed circle) and applies the same noise model.

3. Covariance scaled sampling (patterned ellipse) focuses on good cost minima (flat filled ellipses) by inflating the highly uncertain region (dashed ellipse).

• Scaled covariance sampling algorithm:

• C. Sminchisescu and B. Triggs. Covariance scaled sampling for monocular 3D body tracking. *Proceedings of IEEE Conference on Computer Vision and Pattern Recognition*, pp. 447–454, 2001.
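The core sampling step described above, scaling a Gaussian along its covariance eigenvectors, can be sketched as follows. This covers only that one step, not the complete algorithm from the paper; the function name and the default `scale` value are hypothetical choices made for illustration.

```python
import numpy as np

def covariance_scaled_samples(mean, cov, n_samples, scale=4.0, seed=0):
    """Draw samples from N(mean, scale^2 * cov) using the eigen-decomposition
    of cov, so that the spread along each eigenvector follows its eigenvalue.
    `scale` (an assumed inflation factor) widens the search in the most
    uncertain directions while keeping well-constrained ones tight."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(np.asarray(cov, dtype=float))
    # Standard deviation along each principal axis (clip guards tiny negatives).
    std_along_axes = scale * np.sqrt(np.clip(eigvals, 0.0, None))
    unit = rng.standard_normal((n_samples, mean.size))
    # Stretch unit Gaussian draws along the axes, then rotate into state space.
    return mean + (unit * std_along_axes) @ eigvecs.T

if __name__ == "__main__":
    cov = np.array([[9.0, 0.0], [0.0, 0.01]])
    samples = covariance_scaled_samples([0.0, 0.0], cov, n_samples=5)
    print(samples)
```

Compared with isotropic noise of the same overall magnitude, far fewer samples are spent in directions where the cost function is already tightly constrained.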

**3D Human Tracking with Coordinated Mixture of Factor Analyzers**

• Instead of drawing samples directly from the high-dimensional space, learn the nonlinear manifold structure from motion capture data.

• Learn the bi-directional nonlinear projection function using mixture of factor analyzers within a global coordinate [Teh and Roweis NIPS02].

• Mixture of factor analyzers concurrently carries out dimensionality reduction and clustering.

• Traverse sample trajectory along the nonlinear manifold.

• Able to draw samples using multiple hypothesis filter akin to [Cham and Rehg CVPR99].

• Compared with results from GPLVM [Urtasun ICCV05], the annealed particle filter [Deutscher et al. CVPR00], and a simple particle filter.
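To make the remark about simultaneous clustering and dimensionality reduction concrete, the sketch below samples from a plain mixture of factor analyzers: the component label plays the role of the cluster, and the low-dimensional latent vector plays the role of the reduced coordinates. It is an illustration under assumptions, with hypothetical names and hand-set parameters, and it omits the global coordination step of [Teh and Roweis NIPS02].

```python
import numpy as np

def sample_mfa(mix_weights, means, loadings, noise_vars, n_samples, seed=0):
    """Sample from a mixture of factor analyzers.

    For each sample: pick component k, draw a latent z ~ N(0, I), and emit
    x = means[k] + loadings[k] @ z + noise, where the noise is Gaussian with
    per-dimension variances noise_vars[k]."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)            # (K, D)
    loadings = np.asarray(loadings, dtype=float)      # (K, D, d)
    noise_vars = np.asarray(noise_vars, dtype=float)  # (K, D)
    n_components, data_dim, latent_dim = loadings.shape
    labels = rng.choice(n_components, size=n_samples, p=mix_weights)
    latents = rng.standard_normal((n_samples, latent_dim))
    samples = np.empty((n_samples, data_dim))
    for n, k in enumerate(labels):
        noise = rng.standard_normal(data_dim) * np.sqrt(noise_vars[k])
        samples[n] = means[k] + loadings[k] @ latents[n] + noise
    return samples, labels, latents

if __name__ == "__main__":
    # Two components in 3-D, each with a 1-D latent space (values assumed).
    means = [[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]
    loadings = [[[1.0], [0.0], [0.0]], [[0.0], [1.0], [0.0]]]
    noise_vars = [[0.01, 0.01, 0.01], [0.01, 0.01, 0.01]]
    x, labels, z = sample_mfa([0.5, 0.5], means, loadings, noise_vars, 5)
    print(x.shape, labels, z.shape)
```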

**Other Particle Filters**

• Joint probabilistic data association filter (JPDAF).

• Unscented particle filter.

• ...