Advanced Topics in Learning and Vision
Ming-Hsuan Yang
mhyang@csie.ntu.edu.tw
Overview
• Unsupervised Learning
• Multivariate Gaussian
• EM Algorithm
• Mixture of Gaussians
• Mixture of Factor Analyzers
• Mixture of Probabilistic Principal Component Analyzers
• Isometric Mapping
• Locally Linear Embedding
• Global coordination of local representation
Announcements
• Required and supplementary material available on the course web page
• Send your critiques by Oct 18
• Term project: start tinkering with your ideas as early as possible
Unsupervised Learning
• Goal:
- dimensionality reduction
- finding clusters in data
- finding hidden causes or sources of data (i.e., factors, principal components)
- modeling the data density
• Application:
- data compression
- denoising, outlier detection
- classification
- efficient computation
- explaining human learning and perception
- ...
PCA Application: Web Search
• PageRank: Suppose we have a set of four web pages, A, B, C, and D, where B, C, and D all link to A. If B has 2 outbound links, C has 1, and D has 3, the PageRank (PR) of A is

PR(A) = PR(B)/2 + PR(C)/1 + PR(D)/3

and in general, with L(p) denoting the number of outbound links of page p,

PR(A) = PR(B)/L(B) + PR(C)/L(C) + PR(D)/L(D)   (1)
• Random surfer: Markov process

PR(p_i) = q/N + (1 − q) ∑_{p_j ∈ NE(p_i)} PR(p_j)/L(p_j)   (2)

where NE(p_i) is the set of pages that link to p_i, N is the total number of pages, and q is the probability of jumping to a random page.
• The PR values are the entries of the dominant eigenvector of the modified adjacency matrix (see the power-iteration sketch below). The dominant eigenvector is

R = [PR(p_1), PR(p_2), ..., PR(p_N)]^T   (3)

the solution of

R = [q/N, q/N, ..., q/N]^T + (1 − q) [l(p_i, p_j)]_{N×N} R   (4)

where l(p_i, p_j) is an adjacency function.
• Related to random walks, Markov processes, and spectral clustering
• L. Page and S. Brin, "An eigenvector based ranking approach for hypertext," In 21st Annual ACM/SIGIR International Conference on Research and Development in Information Retrieval, 1998.
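As a concrete illustration of Eqs. (2)-(4), here is a minimal power-iteration sketch in Python; the toy 4-page graph and all variable names are illustrative assumptions, not from the paper:

import numpy as np

# Toy 4-page graph (A=0, B=1, C=2, D=3); links[j] lists the pages that page j links to.
# The link structure is an assumption chosen to match Eq. (1): L(B)=2, L(C)=1, L(D)=3.
links = {0: [], 1: [0, 2], 2: [0], 3: [0, 1, 2]}
N, q = 4, 0.15  # q: probability of jumping to a random page

# Adjacency function consistent with Eq. (2): l(p_i, p_j) = 1/L(p_j) if p_j links to p_i, else 0.
M = np.zeros((N, N))
for j, outs in links.items():
    for i in outs:
        M[i, j] = 1.0 / len(outs)

R = np.full(N, 1.0 / N)      # start from a uniform vector
for _ in range(100):         # iterate Eq. (4): R <- q/N + (1 - q) M R
    R = q / N + (1 - q) * M @ R

print(R)  # entries approximate the PageRank values PR(p_i)

Note that page A has no outbound links in this toy graph; the q/N teleport term keeps the iteration well-behaved anyway.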
PCA Application: Account for Illumination Change [Belhumeur and Kriegman 97]
• What is the set of images of an object under all possible illumination conditions?
• The illumination cone lies near a low-dimensional linear PCA subspace of the image space
• Can be used for object recognition
PCA Application: Appearance Compression and Synthesis [Nishino et al. 99]
• Given a 3D model (can be obtained by various vision algorithms or range sensors), how to capture the variation of object appearance under different viewing and illumination conditions?
• Take a sequence of images of the same patch under different viewing conditions
• Under different viewing angles (left: input images, right: synthesized images)
• Under different lighting conditions
Review
p(x, y) = p(x) p(y|x) = p(y) p(x|y)

p(y|x) = p(x|y) p(y) / p(x)   (5)
• The joint probability of x and y is p(x, y)
• The marginal probability of x is p(x) = ∑_y p(x, y)
• The conditional probability of x given y is: p(x|y)
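A tiny numeric sketch of these identities; the joint table below is made up for illustration:

import numpy as np

# Hypothetical joint distribution p(x, y) for binary x (rows) and y (columns).
p_xy = np.array([[0.1, 0.3],
                 [0.2, 0.4]])

p_x = p_xy.sum(axis=1)              # marginal: p(x) = sum_y p(x, y)
p_y = p_xy.sum(axis=0)              # marginal: p(y)
p_y_given_x = p_xy / p_x[:, None]   # conditional: p(y|x) = p(x, y) / p(x)
p_x_given_y = p_xy / p_y[None, :]   # conditional: p(x|y)

# Bayes rule, Eq. (5): p(y|x) = p(x|y) p(y) / p(x)
bayes = p_x_given_y * p_y[None, :] / p_x[:, None]
assert np.allclose(bayes, p_y_given_x)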
Bayesian Learning
• M are the models (or model parameters): unknown
• D is the data: known
p(M|D) = p(D|M) p(M) / p(D)   (6)
• p(D|M) is the likelihood.
• p(M) is the prior probability of M
• p(M|D) is the posterior probability of M.
• p(D) = ∫ p(D|M) p(M) dM is the marginal likelihood or evidence.
• Given D, we want to find M
- Maximum likelihood (ML): the M that gives the highest likelihood, p(D|M)
- Maximum a posteriori (MAP): the M that gives the highest posterior probability, p(M|D)
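As a toy contrast between the two estimates, consider inferring a coin's heads probability θ; the counts and the Beta prior below are illustrative assumptions:

# ML vs. MAP for a Bernoulli parameter θ (all numbers are made up).
# Data D: 7 heads in 10 flips. Prior p(θ): Beta(5, 5), which favors θ near 0.5.
heads, flips = 7, 10
alpha, beta = 5.0, 5.0

theta_ml = heads / flips                                      # argmax p(D|θ) = 0.7
theta_map = (heads + alpha - 1) / (flips + alpha + beta - 2)  # argmax p(θ|D) ≈ 0.61

print(theta_ml, theta_map)  # the prior pulls the MAP estimate toward 0.5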
Multivariate Gaussian
p(x|µ, Σ) = (2π)^{-N/2} |Σ|^{-1/2} exp{-(1/2)(x − µ)^T Σ^{-1} (x − µ)}   (7)

where µ is the mean and Σ is the covariance matrix.
• Given a data set X = {x_1, . . . , x_N}, the likelihood is p(data|model) = ∏_{i=1}^N p(x_i|µ, Σ)
• Goal: find µ and Σ that maximize log likelihood:
L = log ∏_{i=1}^N p(x_i|µ, Σ) = -(N/2) log |2πΣ| - (1/2) ∑_{i=1}^N (x_i − µ)^T Σ^{-1} (x_i − µ)   (8)
• Maximum likelihood estimates:

∂L/∂µ = 0 ⇒ µ̂ = (1/N) ∑_i x_i   (sample mean)

∂L/∂Σ = 0 ⇒ Σ̂ = (1/N) ∑_i (x_i − µ̂)(x_i − µ̂)^T   (sample covariance)   (9)
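A short sketch of Eqs. (8)-(9) on synthetic data; the true mean and covariance below are arbitrary choices for illustration:

import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[1.0, -2.0],
                            cov=[[2.0, 0.5], [0.5, 1.0]], size=1000)
N = X.shape[0]

mu_hat = X.mean(axis=0)          # Eq. (9): sample mean
diff = X - mu_hat
sigma_hat = diff.T @ diff / N    # Eq. (9): sample covariance

# Log likelihood at the estimates, Eq. (8)
logL = -0.5 * N * np.linalg.slogdet(2 * np.pi * sigma_hat)[1] \
       - 0.5 * np.sum((diff @ np.linalg.inv(sigma_hat)) * diff)
print(mu_hat, sigma_hat, logL)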
Limitations of Gaussian, FA and PCA
• Linear methods: easy to understand and use in practice.
• Efficient way to find structure in high dimensional data, e.g., as a preprocessing step
• All based on a Gaussian assumption: only the mean and covariance of the data are taken into account
• Based on second order statistics