Advanced Topics in Learning and Vision

(1)

Ming-Hsuan Yang

mhyang@csie.ntu.edu.tw

(2)

Announcements

• Term project presentation: Dec 28 and Dec 29.

• All critiques due on Jan 9, 2006 (midnight, Taipei local time). No overdue critiques will be accepted.

• Final term project report: due on Jan 16, 2006 (midnight, Taipei local time). No overdue term reports will be accepted.

• Supplementary reading:

- M. Black and A. Jepson. EigenTracking: Robust matching and tracking of articulated objects using a view-based representation. International Journal of Computer Vision, vol. 26, no. 1, pp. 63–84, 1998.

- A. Jepson, D. Fleet, and T. El-Maraghi. Robust online appearance models for visual tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 10, pp. 1296–1311, 2003.

(3)

Overview

• Particle filters

• 3D human tracking

• On-line visual tracking

• Mean shift algorithm and applications

(4)

Problems with Particle Filter

• Sampling in high dimensional space is difficult and inefficient.

• Need a lot of particles.

• Curse of dimensionality: particle filters usually do not scale well; in practice they work for state spaces of up to about 10 dimensions.
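
The scaling problem can be illustrated with a small experiment. The sketch below is an assumption-laden toy, not taken from the lecture: particles are drawn from a standard normal prior, weighted by a Gaussian likelihood centered at the origin, and the effective sample size (ESS) is reported for several dimensions. The function name, the noise level `obs_sigma`, and the particle count are all hypothetical choices.

```python
import math
import random


def effective_sample_size(dim, n_particles=2000, obs_sigma=0.5, seed=0):
    """ESS of prior samples weighted by a Gaussian likelihood at the origin.

    Particles come from a standard normal prior in `dim` dimensions and are
    weighted by exp(-||x||^2 / (2 * obs_sigma^2)).
    """
    rng = random.Random(seed)
    log_w = []
    for _ in range(n_particles):
        sq_norm = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dim))
        log_w.append(-sq_norm / (2.0 * obs_sigma ** 2))
    # Normalize stably by subtracting the largest log-weight first.
    max_log_w = max(log_w)
    w = [math.exp(lw - max_log_w) for lw in log_w]
    total = sum(w)
    w = [wi / total for wi in w]
    # ESS = 1 / sum(w_i^2): n_particles for uniform weights, 1 if one particle dominates.
    return 1.0 / sum(wi * wi for wi in w)


ess_by_dim = {d: effective_sample_size(d) for d in (1, 5, 10, 20)}
```

With the same number of particles, the ESS should fall sharply as the dimension grows, which is one way to see why many more particles are needed in high-dimensional spaces.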

(5)

Multiple Hypothesis Particle Filter

• Multiple hypothesis filter (MHF) is a classical approach to representing multi-modal distributions with Kalman filters.

• In the mode-based multiple hypothesis filter [Cham and Rehg CVPR99], each mode is modeled by a truncated Gaussian (as opposed to the set of discrete samples used in the Condensation algorithm).

• Sample weights:

$$p(x) = k \max_{i=1,\dots,N} \left\{ p_i \exp\left(-\tfrac{1}{2}(x - m_i)^T S_i^{-1} (x - m_i)\right) \right\}$$

where $p_i$ is the Gaussian mixture weight and $m_i$ is the i-th Gaussian center with covariance $S_i$.
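
As a rough illustration of the sample-weight formula, the sketch below evaluates the max-over-modes density for a two-mode example. It assumes diagonal covariances only to stay dependency-free, and the function name `mode_density`, the default `k=1.0`, and the example modes are hypothetical.

```python
import math


def mode_density(x, modes, k=1.0):
    """Evaluate k * max_i p_i * exp(-0.5 * (x - m_i)^T S_i^{-1} (x - m_i)).

    Each mode is (p_i, m_i, s_diag_i), where s_diag_i holds the diagonal of S_i.
    Diagonal covariance is a simplifying assumption of this sketch.
    """
    best = 0.0
    for weight, mean, s_diag in modes:
        maha = sum((xj - mj) ** 2 / sj for xj, mj, sj in zip(x, mean, s_diag))
        best = max(best, weight * math.exp(-0.5 * maha))
    return k * best


modes = [
    (0.7, (0.0, 0.0), (1.0, 1.0)),
    (0.3, (4.0, 0.0), (0.5, 0.5)),
]
at_first_mode = mode_density((0.0, 0.0), modes)   # the first mode dominates here
at_second_mode = mode_density((4.0, 0.0), modes)  # the second mode dominates here
```

Because of the max, each point is scored by its single best-matching mode rather than by a sum over all modes.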

(6)

• Uses 2D scaled prismatic models with 19 degrees of freedom.

• Observation model:

$$p(z_t \mid x_t) \propto \prod_u \exp\left(-\frac{(I(u) - T(u, x_t))^2}{2\sigma^2}\right) \tag{1}$$

where $u$ ranges over image pixel coordinates, $I(u)$ is the image pixel value at $u$, and $T(u, x_t)$ is the overlapping template pixel value at $u$ for state vector $x_t$.
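
A minimal sketch of how the observation model might be evaluated, assuming the template has already been rendered for a given state $x_t$ (that rendering step is not shown and is hypothetical here). It works in the log domain, since a product of many small exponentials underflows quickly. The dictionary-based image representation is an illustrative choice only.

```python
def log_likelihood(image, template, sigma):
    """Unnormalized log p(z_t | x_t) = -sum_u (I(u) - T(u, x_t))^2 / (2 sigma^2).

    `image` and `template` map pixel coordinates u to intensities; only pixels
    covered by both (the overlap) contribute to the sum.
    """
    total = 0.0
    for u, t_val in template.items():
        if u in image:
            total += (image[u] - t_val) ** 2
    return -total / (2.0 * sigma ** 2)


image = {(0, 0): 10.0, (0, 1): 12.0, (1, 0): 9.0}
good_template = {(0, 0): 10.0, (0, 1): 12.0}  # matches the image exactly
bad_template = {(0, 0): 14.0, (0, 1): 8.0}    # off by 4 at both pixels
log_good = log_likelihood(image, good_template, sigma=2.0)
log_bad = log_likelihood(image, bad_template, sigma=2.0)
```

States whose rendered templates match the image better receive a higher log-likelihood, and exponentiating the differences recovers the relative weights.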

(7)

• T.-J. Cham and J. Rehg. A multiple hypothesis approach to figure tracking. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 239–245, 1999.

(8)

Annealed Particle Filter

• Incorporates the concept of simulated annealing into the particle filter [Deutscher et al. CVPR00].

• Based on the Markov chain method of simulated annealing used in optimization [Kirkpatrick et al. 83].

• Handles multiple modes.

• Multi-layer search, akin to coarse to fine search.
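
The layered search can be sketched in one dimension as follows. This is a simplified illustration under stated assumptions, not the procedure of [Deutscher et al. CVPR00]: the weighting function `weight` is a made-up bimodal function, and the exponents `betas` and noise levels `noise_stds` follow a fixed hand-picked schedule.

```python
import math
import random


def weight(x):
    """Hypothetical bimodal weighting function: strong peak at x=3, weak peak at x=-2."""
    return math.exp(-(x - 3.0) ** 2 / 0.5) + 0.2 * math.exp(-(x + 2.0) ** 2 / 0.5)


def annealed_step(particles, betas, noise_stds, rng):
    """Run the annealing layers of one time step over a 1D particle set."""
    for beta, noise_std in zip(betas, noise_stds):
        # Smoothed weights w(x)^beta: a small beta flattens the peaks (coarse search).
        w = [weight(x) ** beta for x in particles]
        # Resample in proportion to the smoothed weights.
        particles = rng.choices(particles, weights=w, k=len(particles))
        # Diffuse with noise that shrinks from layer to layer (fine search).
        particles = [x + rng.gauss(0.0, noise_std) for x in particles]
    return particles


rng = random.Random(0)
initial = [rng.uniform(-6.0, 6.0) for _ in range(500)]
final = annealed_step(initial, betas=[0.1, 0.3, 0.6, 1.0],
                      noise_stds=[1.0, 0.5, 0.25, 0.1], rng=rng)
near_strong_peak = sum(1 for x in final if abs(x - 3.0) < 1.0) / len(final)
```

Early layers keep both modes alive because the weights are nearly flat; later layers sharpen the weights, so most particles should end up around the stronger peak.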

(9)

• The 3D human model is based on a kinematic chain with 29 degrees of freedom.

• Each limb is represented by conical sections with elliptical cross-sections.

• Uses edge features and background subtraction with calibrated cameras.

$$\Sigma^e(x, z) = \frac{1}{N} \sum_{i=1}^{N} \left(1 - p^e_i(x, z)\right)^2 \tag{2}$$

where $x$ is the model state vector and $z$ is the image from which the edge pixel map is derived. $p^e_i(x, z)$ are the values of the edge pixel map at the $N$ sampling points taken along the model's silhouette.

(10)

• Likewise,

$$\Sigma^r(x, z) = \frac{1}{N} \sum_{i=1}^{N} \left(1 - p^r_i(x, z)\right)^2 \tag{3}$$

where $p^r_i(x, z)$ are the values of the foreground pixel map at the $N$ sampling points taken from the interior of the conical sections.

• Sample weight:

$$w(x, z) = \exp\left(-\left(\Sigma^e(x, z) + \Sigma^r(x, z)\right)\right)$$

where x is the model configuration vector and z is the image observation (edge e and region r).

• Can incorporate observations from multiple cameras.
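
The edge and region terms and the resulting sample weight can be sketched as below. The sketch assumes the pixel-map values at the sampling points have already been looked up for a given configuration (projecting the model into the image is not shown). The helper names are hypothetical.

```python
import math


def mean_squared_miss(map_values):
    """(1/N) * sum_i (1 - p_i)^2 for pixel-map values p_i in [0, 1]."""
    return sum((1.0 - p) ** 2 for p in map_values) / len(map_values)


def sample_weight(edge_values, region_values):
    """w(x, z) = exp(-(Sigma^e(x, z) + Sigma^r(x, z)))."""
    sigma_e = mean_squared_miss(edge_values)    # edge map sampled along the silhouette
    sigma_r = mean_squared_miss(region_values)  # foreground map sampled inside the limbs
    return math.exp(-(sigma_e + sigma_r))


perfect = sample_weight([1.0, 1.0, 1.0], [1.0, 1.0])  # model lies on edges and foreground
poor = sample_weight([0.0, 0.0, 0.0], [0.0, 0.0])     # model misses both completely
```

A configuration that explains both edges and foreground gets weight 1, and the weight decays toward exp(-2) as both maps are missed.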

(11)

• J. Deutscher, A. Blake, and I. Reid. Articulated body motion capture by annealed particle filtering. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 126–133, 2000.

(12)

Tracking Loose Limbed People

• A body is represented by a graphical model [Sigal et al. NIPS03]:

- node: body part (torso, upper leg, etc.)

- edge: spatial and angular constraints between two adjacent body parts

• Each body part is modeled by 5 fixed parameters (lengths and widths at proximal and distal ends, and offsets) and 6 estimated parameters (3 global positions and 3 angular orientations).

(13)

• Each directed edge between parts i and j is modeled by a conditional distribution ψ_ij(x_i, x_j).

• Conditional distributions capture physical constraints and can be learned from motion capture data or constructed by hand.

$$\psi_{ij}(x_i, x_j) = \lambda^0 \, \mathcal{N}(x_j; \mu_{ij}, \lambda_{ij}) + (1 - \lambda^0) \sum_{m=1}^{M_{ij}} \delta_{ijm} \, \mathcal{N}(x_j; F_{ijm}(x_i), G_{ijm}(x_i)) \tag{4}$$

where $\lambda^0$ is a fixed outlier probability, and $\mu_{ij}$ and $\lambda_{ij}$ are the mean and covariance of the Gaussian outlier process. $F_{ijm}(\cdot)$ and $G_{ijm}(\cdot)$ give the mean and covariance of the m-th Gaussian mixture component, which has weight $\delta_{ijm}$.
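
To make the structure of the conditional distribution concrete, the sketch below evaluates it for scalar states. It is a toy under stated assumptions: real part states are 6-dimensional, and the single mixture component, the outlier settings, and the function names are all hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def psi(x_i, x_j, outlier_prob, outlier_mean, outlier_var, components):
    """Evaluate psi_ij(x_i, x_j) for scalar states.

    `components` is a list of (delta, F, G): the mixture weight, the mean
    function F(x_i), and the variance function G(x_i).
    """
    outlier = outlier_prob * gaussian_pdf(x_j, outlier_mean, outlier_var)
    mixture = sum(delta * gaussian_pdf(x_j, F(x_i), G(x_i))
                  for delta, F, G in components)
    return outlier + (1.0 - outlier_prob) * mixture


# Toy constraint: part j is expected about 1 unit beyond part i.
components = [(1.0, lambda x_i: x_i + 1.0, lambda x_i: 0.04)]
likely = psi(0.0, 1.0, 0.1, 0.0, 100.0, components)
unlikely = psi(0.0, 3.0, 0.1, 0.0, 100.0, components)
```

The broad outlier Gaussian keeps the value strictly positive everywhere, so a part that violates the constraint is penalized but never ruled out.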

(14)

• Image likelihood is based on multi-scale edge and ridge filter responses.

• Because each configuration vector x_i has 6 continuous dimensions, approximating it with a discrete set of states may be too inefficient for the traditional (discrete) belief propagation algorithm.

• Non-parametric belief propagation [Isard CVPR03] [Sudderth et al. CVPR 03] is a generalized particle filter algorithm that operates on continuous valued random variables.

• Non-parametric belief propagation treats the particle set as an approximation of each message and replaces the conditional distribution used in a standard particle filter with the product of the incoming message sets.

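$$m_{ij}(x_j) = \int \psi_{ij}(x_i, x_j) \, \phi(x_i) \prod_{k \in A_i,\, k \neq j} m_{ki}(x_i) \, dx_i \tag{5}$$

where $A_i$ is the set of neighbors of node $i$, $\phi(x_i)$ is the local likelihood associated with node $i$, and $m_{ki}$ is the message sent from neighbor $k$ to node $i$.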

(15)

• The message m_ij(x_j) can be approximated by importance sampling from a proposal function f(x_i).

• See [Isard CVPR03] [Sudderth et al. CVPR 03] for details on non-parametric belief propagation.
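
The importance-sampling idea can be sketched for scalar states as below. This only illustrates a Monte Carlo estimate of the message integral at a single point x_j; it is not the full non-parametric belief propagation algorithm of the cited papers, which represents messages as kernel density estimates. The Gaussian proposal, the toy potentials, and all function names are assumptions.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1D Gaussian with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def message(x_j, psi, phi, incoming, proposal_std, n_samples, rng):
    """Estimate m_ij(x_j) = integral of psi(x_i, x_j) phi(x_i) prod_k m_ki(x_i) dx_i.

    Draws x_i from the proposal f = N(0, proposal_std^2) and averages
    integrand(x_i) / f(x_i).
    """
    total = 0.0
    for _ in range(n_samples):
        x_i = rng.gauss(0.0, proposal_std)
        integrand = psi(x_i, x_j) * phi(x_i)
        for m_ki in incoming:
            integrand *= m_ki(x_i)
        total += integrand / gaussian_pdf(x_i, 0.0, proposal_std)
    return total / n_samples


def psi(x_i, x_j):
    # Toy constraint: part j sits about 1 unit from part i.
    return gaussian_pdf(x_j, x_i + 1.0, 0.5)


def phi(x_i):
    # Toy local likelihood: image evidence places part i near 0.
    return gaussian_pdf(x_i, 0.0, 1.0)


rng = random.Random(0)
estimate = message(1.0, psi, phi, incoming=[], proposal_std=2.0,
                   n_samples=20000, rng=rng)
# For this all-Gaussian toy case the integral has a closed form.
exact = gaussian_pdf(1.0, 1.0, math.sqrt(1.0 + 0.5 ** 2))
```

The proposal is deliberately wider than the integrand so that the importance ratios stay bounded; a proposal that is too narrow would make the estimate unreliable.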

(16)

• Extended to body tracking over time [Sigal et al. CVPR04].

• Able to detect body parts automatically.

(17)

• Tracking body parts over time.

• L. Sigal, M. Isard, B. Sigelman, and M. Black. Attractive people: Assembling loose-limbed models using non-parametric belief propagation. Advances in Neural Information Processing Systems, pp. 1539–1546, MIT Press, 2004.

• L. Sigal, S. Bhatia, S. Roth, M. Black, and M. Isard. Tracking loose-limbed people. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 421–428, 2004.

(18)

Scaled Covariance Sampling

• Instead of sampling from an isotropic Gaussian distribution, scale each Gaussian along its covariance eigenvectors.

• Reminiscent of the Mahalanobis distance as a metric.

• Figure: (1) Condensation (dashed circle) randomizes each sample by dynamic noise. (2) MHF (solid circle) samples within the covariance support (dashed circle) and applies the same noise model. (3) Covariance scaled sampling (patterned ellipse) focuses on good cost minima (flat filled ellipses) by inflating the highly uncertain region (dashed ellipse).
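
The core sampling step can be sketched in two dimensions as drawing from a Gaussian whose covariance has been inflated by a scale factor. This covers only that one step under simplifying assumptions. A hand-written 2x2 Cholesky factor stands in for an eigen-decomposition (both produce samples from the same Gaussian), and the function name, covariance, and scale are hypothetical.

```python
import math
import random


def sample_scaled(mean, cov, scale, rng):
    """Draw one 2D sample from N(mean, scale^2 * cov).

    Uses the Cholesky factor L of the 2x2 covariance (cov = L L^T), so the
    spread follows the shape of the covariance rather than being isotropic.
    """
    (a, b), (_, c) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (mean[0] + scale * l11 * z1,
            mean[1] + scale * (l21 * z1 + l22 * z2))


rng = random.Random(0)
mean = (0.0, 0.0)
cov = ((4.0, 1.0), (1.0, 1.0))  # more uncertain along the first axis
samples = [sample_scaled(mean, cov, 2.0, rng) for _ in range(20000)]
```

Compared with isotropic noise of the same total variance, more samples land along the poorly constrained direction, which is where alternative minima are most likely to be found.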

(19)

• Scaled covariance sampling algorithm: given in the following reference.

• C. Sminchisescu and B. Triggs. Covariance scaled sampling for monocular 3D body tracking. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 447–454, 2001.

(21)

3D Human Tracking with Coordinated Mixture of Factor Analyzers

• Instead of drawing samples directly from the high-dimensional space, learn the nonlinear manifold structure from motion capture data.

• Learn the bi-directional nonlinear projection function using a mixture of factor analyzers within a global coordinate system [Teh and Roweis NIPS02].

• Mixture of factor analyzers concurrently carries out dimensionality reduction and clustering.

(22)

• Traverse sample trajectory along the nonlinear manifold.

• Able to draw samples using multiple hypothesis filter akin to [Cham and Rehg CVPR99].

• Compared with results obtained using GPLVM [Urtasun ICCV05], the annealed particle filter [Deutscher et al. CVPR00], and a simple particle filter.

(24)

Other Particle Filters

• Joint probabilistic data association filter (JPDAF).

• Unscented particle filter.

• ...