Advanced Topics in Learning and Vision
Ming-Hsuan Yang
mhyang@csie.ntu.edu.tw
Overview
• Unsupervised Learning
• Multivariate Gaussian
• EM Algorithm
• Mixture of Gaussians
• Mixture of Factor Analyzers
• Mixture of Probabilistic Principal Component Analyzers
• Isometric Mapping
• Locally Linear Embedding
• Global coordination of local representation
Announcements
• Required and supplementary material available on the course web page
• Send your critiques by Oct 18
• Term project: start tinkering with your ideas as early as possible
Unsupervised Learning
• Goal:
- dimensionality reduction
- finding clusters in data
- finding hidden causes or sources of data (i.e., factors, principal components)
- modeling the data density
• Application:
- data compression
- denoising, outlier detection
- classification
- efficient computation
- explaining human learning and perception
- ...
PCA Application: Web Search
• PageRank: Suppose we have a set of four web pages, A, B, C, and D, where B, C, and D all link to A. If B has 2 outbound links, C has 1, and D has 3, the PageRank (PR) of A is

PR(A) = PR(B)/2 + PR(C)/1 + PR(D)/3

and in general, with L(p) denoting the number of outbound links of page p,

PR(A) = PR(B)/L(B) + PR(C)/L(C) + PR(D)/L(D)   (1)
• Random surfer: Markov process

PR(p_i) = q/N + (1 − q) ∑_{p_j ∈ NE(p_i)} PR(p_j)/L(p_j)   (2)

where NE(p_i) is the set of pages that link to p_i, N is the total number of pages, and q is the probability of jumping to a random page.
• The PR values are the entries of the dominant eigenvector of the modified adjacency matrix (see the power-iteration sketch below). The dominant eigenvector is

R = [PR(p_1), PR(p_2), ..., PR(p_N)]^T   (3)

the solution of

R = [q/N, q/N, ..., q/N]^T + (1 − q) [l(p_i, p_j)]_{N×N} R   (4)

where l(p_i, p_j) is an adjacency function.
• Related to random walks, Markov processes, and spectral clustering
• L. Page and S. Brin, "An eigenvector based ranking approach for hypertext," In 21st Annual ACM/SIGIR International Conference on Research and Development in Information Retrieval, 1998.
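As a concrete illustration of Eqs. (2)-(4), here is a minimal power-iteration sketch in Python; the toy 4-page graph and all variable names are illustrative assumptions, not from the paper:

import numpy as np

# Toy 4-page graph (A=0, B=1, C=2, D=3); links[j] lists the pages that page j links to.
# The link structure is an assumption chosen to match Eq. (1): L(B)=2, L(C)=1, L(D)=3.
links = {0: [], 1: [0, 2], 2: [0], 3: [0, 1, 2]}
N, q = 4, 0.15  # q: probability of jumping to a random page

# Adjacency function consistent with Eq. (2): l(p_i, p_j) = 1/L(p_j) if p_j links to p_i, else 0.
M = np.zeros((N, N))
for j, outs in links.items():
    for i in outs:
        M[i, j] = 1.0 / len(outs)

R = np.full(N, 1.0 / N)      # start from a uniform vector
for _ in range(100):         # iterate Eq. (4): R <- q/N + (1 - q) M R
    R = q / N + (1 - q) * M @ R

print(R)  # entries approximate the PageRank values PR(p_i)

Note that page A has no outbound links in this toy graph; the q/N teleport term keeps the iteration well-behaved anyway.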
PCA Application: Account for Illumination Change [Belhumeur and Kriegman 97]
• What is the set of images of an object under all possible illumination conditions?
• The illumination cone lies near a low-dimensional linear PCA subspace of the image space
• Can be used for object recognition
PCA Application: Appearance Compression and Synthesis [Nishino et al. 99]
• Given a 3D model (can be obtained by various vision algorithms or range sensors), how to capture the variation of object appearance under different viewing and illumination conditions?
• Take a sequence of images of the same patch under different viewing conditions
• Under different viewing angles (left: input images, right: synthesized images)
• Under different lighting conditions
Review
p(x, y) = p(x) p(y|x) = p(y) p(x|y)

p(y|x) = p(x|y) p(y) / p(x)   (5)
• The joint probability of x and y is p(x, y)
• The marginal probability of x is p(x) = ∑_y p(x, y)
• The conditional probability of x given y is: p(x|y)
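A tiny numeric sketch of these identities; the joint table below is made up for illustration:

import numpy as np

# Hypothetical joint distribution p(x, y) for binary x (rows) and y (columns).
p_xy = np.array([[0.1, 0.3],
                 [0.2, 0.4]])

p_x = p_xy.sum(axis=1)              # marginal: p(x) = sum_y p(x, y)
p_y = p_xy.sum(axis=0)              # marginal: p(y)
p_y_given_x = p_xy / p_x[:, None]   # conditional: p(y|x) = p(x, y) / p(x)
p_x_given_y = p_xy / p_y[None, :]   # conditional: p(x|y)

# Bayes rule, Eq. (5): p(y|x) = p(x|y) p(y) / p(x)
bayes = p_x_given_y * p_y[None, :] / p_x[:, None]
assert np.allclose(bayes, p_y_given_x)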
Bayesian Learning
• M are the models (or model parameters): unknown
• D is the data: known
p(M|D) = p(D|M) p(M) / p(D)   (6)
• p(D|M) is the likelihood.
• p(M) is the prior probability of M
• p(M|D) is the posterior probability of M.
• p(D) = ∫ p(D|M) p(M) dM is the marginal likelihood or evidence.
• Given D, we want to find M
- Maximum likelihood (ML): the M that gives the highest likelihood, p(D|M)
- Maximum a posteriori (MAP): the M that gives the highest posterior probability, p(M|D)
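As a toy contrast between the two estimates, consider inferring a coin's heads probability θ; the counts and the Beta prior below are illustrative assumptions:

# ML vs. MAP for a Bernoulli parameter θ (all numbers are made up).
# Data D: 7 heads in 10 flips. Prior p(θ): Beta(5, 5), which favors θ near 0.5.
heads, flips = 7, 10
alpha, beta = 5.0, 5.0

theta_ml = heads / flips                                      # argmax p(D|θ) = 0.7
theta_map = (heads + alpha - 1) / (flips + alpha + beta - 2)  # argmax p(θ|D) ≈ 0.61

print(theta_ml, theta_map)  # the prior pulls the MAP estimate toward 0.5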
Multivariate Gaussian
p(x|µ, Σ) = (2π)^{-N/2} |Σ|^{-1/2} exp{-(1/2)(x − µ)^T Σ^{-1} (x − µ)}   (7)

where µ is the mean and Σ is the covariance matrix.
• Given a data set X = {x_1, . . . , x_N}, the likelihood is p(data|model) = ∏_{i=1}^N p(x_i|µ, Σ)
• Goal: find µ and Σ that maximize log likelihood:
L = log ∏_{i=1}^N p(x_i|µ, Σ) = -(N/2) log |2πΣ| - (1/2) ∑_{i=1}^N (x_i − µ)^T Σ^{-1} (x_i − µ)   (8)
• Maximum likelihood estimates:

∂L/∂µ = 0 ⇒ µ̂ = (1/N) ∑_i x_i   (sample mean)

∂L/∂Σ = 0 ⇒ Σ̂ = (1/N) ∑_i (x_i − µ̂)(x_i − µ̂)^T   (sample covariance)   (9)
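A short sketch of Eqs. (8)-(9) on synthetic data; the true mean and covariance below are arbitrary choices for illustration:

import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[1.0, -2.0],
                            cov=[[2.0, 0.5], [0.5, 1.0]], size=1000)
N = X.shape[0]

mu_hat = X.mean(axis=0)          # Eq. (9): sample mean
diff = X - mu_hat
sigma_hat = diff.T @ diff / N    # Eq. (9): sample covariance

# Log likelihood at the estimates, Eq. (8)
logL = -0.5 * N * np.linalg.slogdet(2 * np.pi * sigma_hat)[1] \
       - 0.5 * np.sum((diff @ np.linalg.inv(sigma_hat)) * diff)
print(mu_hat, sigma_hat, logL)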
Limitations of Gaussian, FA and PCA
• Linear methods: easy to understand and use in practice.
• Efficient way to find structure in high dimensional data, e.g., as a preprocessing step
• All based on a Gaussian assumption: only the mean and covariance of the data are taken into account
• Based on second order statistics