(1)

Advanced Topics in Learning and Vision

Ming-Hsuan Yang

mhyang@csie.ntu.edu.tw

(2)

Overview

• Unsupervised Learning

• Multivariate Gaussian

• EM Algorithm

• Mixture of Gaussians

• Mixture of Factor Analyzers

• Mixture of Probabilistic Principal Component Analyzers

• Isometric Mapping

• Locally Linear Embedding

• Global coordination of local representation

(3)

Announcements

• Required and supplementary material available on the course web page

• Send your critiques by Oct 18

• Term project: start tinkering with your ideas as early as possible

(4)

Unsupervised Learning

• Goals:

- dimensionality reduction

- finding clusters in data

- finding hidden causes or sources of data (e.g., factors, principal components)

- modeling data density

• Applications:

- data compression

- denoising, outlier detection

- classification

- efficient computation

- explaining human learning and perception

- ...

(5)

PCA Application: Web Search

• PageRank: Suppose we have a set of four web pages, A, B, C, and D as depicted above. The PageRank (PR) of A is

$$PR(A) = \frac{PR(B)}{2} + \frac{PR(C)}{1} + \frac{PR(D)}{3}$$

or, in general, with $L(p)$ denoting the number of outbound links on page $p$,

$$PR(A) = \frac{PR(B)}{L(B)} + \frac{PR(C)}{L(C)} + \frac{PR(D)}{L(D)} \quad (1)$$

• Random surfer: Markov process

$$PR(p_i) = \frac{q}{N} + (1 - q) \sum_{p_j \in NE(p_i)} \frac{PR(p_j)}{L(p_j)} \quad (2)$$

where $NE(p_i)$ is the set of pages linking to $p_i$.

(6)

• The PR values are the entries of the dominant eigenvector of the modified adjacency matrix. The dominant eigenvector is

$$\mathbf{R} = \begin{bmatrix} PR(p_1) \\ PR(p_2) \\ \vdots \\ PR(p_N) \end{bmatrix} \quad (3)$$

which solves

$$\mathbf{R} = \begin{bmatrix} q/N \\ q/N \\ \vdots \\ q/N \end{bmatrix} + (1 - q) \begin{bmatrix} l(p_1, p_1) & l(p_1, p_2) & \cdots & l(p_1, p_N) \\ l(p_2, p_1) & & & \vdots \\ \vdots & & \ddots & \\ l(p_N, p_1) & \cdots & & l(p_N, p_N) \end{bmatrix} \mathbf{R} \quad (4)$$

where $l(p_i, p_j)$ is an adjacency function.

• Related to random walks, Markov processes, and spectral clustering (see the power-iteration sketch below)
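To make Eqs. (2)-(4) concrete, here is a minimal power-iteration sketch in Python. The 4-page graph (chosen so that B, C, and D have 2, 1, and 3 outbound links, matching Eq. (1)), the damping factor q = 0.15, and the iteration count are illustrative assumptions, not values from the lecture:

```python
import numpy as np

# Hypothetical 4-page graph: pages 0=A, 1=B, 2=C, 3=D.
# links[j] = pages that page j links to.
links = [[1, 2], [0, 2], [0], [0, 1, 2]]
N, q = len(links), 0.15

# Modified adjacency matrix of Eq. (4):
# l(p_i, p_j) = 1/L(p_j) if p_j links to p_i, else 0.
l = np.zeros((N, N))
for j, outgoing in enumerate(links):
    for i in outgoing:
        l[i, j] = 1.0 / len(outgoing)

# Power iteration on Eq. (2)/(4): R <- q/N + (1 - q) l R
pr = np.full(N, 1.0 / N)          # start from the uniform distribution
for _ in range(100):
    pr = q / N + (1 - q) * l @ pr

# pr converges to the dominant eigenvector of the affine map, summing to 1
print(pr, pr.sum())
```

The fixed iteration count stands in for a convergence test; in practice one iterates until the change in PR falls below a tolerance.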

(7)

• L. Page and S. Brin, "PageRank: An eigenvector based ranking approach for hypertext," In Proceedings of the 21st Annual ACM/SIGIR International Conference on Research and Development in Information Retrieval, 1998.

(8)

PCA Application: Account for Illumination Change [Belhumeur and Kriegman 97]

• What is the set of images of an object under all possible illumination conditions?

• Illumination cone lies near a low dimensional linear PCA subspace of the image space

• Can be used for object recognition

(9)
(10)

PCA Application: Appearance Compression and Synthesis [Nishino et al. 99]

• Given a 3D model (can be obtained by various vision algorithms or range sensors), how to capture the variation of object appearance under different viewing and illumination conditions?

• Take a sequence of the same image patch under different viewing conditions

• Under different view angles (left: input images, right: synthesized images)

(11)

• Under different lighting conditions

(12)

Review

$$p(x, y) = p(x)\,p(y|x) = p(y)\,p(x|y), \qquad p(y|x) = \frac{p(x|y)\,p(y)}{p(x)} \quad (5)$$

• The joint probability of x and y is p(x, y)

• The marginal probability of x is $p(x) = \sum_y p(x, y)$

• The conditional probability of x given y is p(x|y)

These identities are checked numerically below.
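A quick numerical check of Eq. (5), using an assumed 2x2 joint probability table (all values made up for illustration):

```python
import numpy as np

p_xy = np.array([[0.10, 0.30],
                 [0.20, 0.40]])   # assumed joint table p(x, y); rows index x

p_x = p_xy.sum(axis=1)            # marginal: p(x) = sum_y p(x, y)
p_y = p_xy.sum(axis=0)            # marginal: p(y) = sum_x p(x, y)

p_y_given_x = p_xy / p_x[:, None] # conditional: p(y|x) = p(x, y) / p(x)
p_x_given_y = p_xy / p_y[None, :] # conditional: p(x|y) = p(x, y) / p(y)

# Bayes' rule, Eq. (5): p(y|x) = p(x|y) p(y) / p(x)
assert np.allclose(p_y_given_x, p_x_given_y * p_y[None, :] / p_x[:, None])
print(p_x, p_y)
```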

(13)

Bayesian Learning

• M are the models (or model parameters): unknown

• D is the data: known

$$p(M|D) = \frac{p(D|M)\,p(M)}{p(D)} \quad (6)$$

• p(D|M) is the likelihood.

• p(M) is the prior probability of M

• p(M|D) is the posterior probability of M.

• $p(D) = \int p(D|M)\,p(M)\,dM$ is the marginal likelihood or evidence.

• Given D, we want to find the M that best explains it (a toy comparison follows below):

- Maximum likelihood (ML): the M that gives the highest likelihood, p(D|M)

- Maximum a posteriori (MAP): the M that gives the highest posterior probability, p(M|D)
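A toy sketch of the ML/MAP contrast, assuming a Bernoulli model (M = theta, the probability of heads) with a Beta prior; the data and prior parameters are hypothetical, not from the lecture:

```python
# ML picks the theta maximizing p(D|theta); MAP also weighs the Beta prior.
heads, flips = 7, 10                 # hypothetical data D: 7 heads in 10 flips
a, b = 2.0, 2.0                      # hypothetical Beta(a, b) prior on theta

theta_ml = heads / flips                            # argmax of p(D|M)
theta_map = (heads + a - 1) / (flips + a + b - 2)   # argmax of p(M|D): mode of Beta(a+7, b+3)

print(theta_ml, theta_map)           # 0.7 vs. ~0.667: the prior pulls MAP toward 0.5
```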

(14)

Multivariate Gaussian

p(x|µ, Σ) = |2π|N2 |Σ|12 exp{−1

2(x − µ)TΣ−1(x − µ)} (7) where µ is the mean and Σ is the covariance matrix.

• Given a data set X = {x1, . . . , xN}, the likelihood is p(data|model) = QN

i=1 p(xi|µ, Σ)

• Goal: find µ and Σ that maximize log likelihood:

L = log

N

Y

i=1

p(xi|µ, Σ) = −N

2 log |2πΣ| − 1 2

N

X

i=1

(xi − µ)TΣ−1(xi − µ) (8)

• Maximum likelihood estimate:

∂L

∂µ = 0 ⇒ µ =ˆ N1 P

i xi (sample mean)

∂L

∂Σ = 0 ⇒ Σ =ˆ N1 P

i(xi − ˆµ)(xi − ˆµ)T (sample covariance) (9)
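A minimal sketch of Eqs. (7)-(9) in Python; the 2-D data, true parameters, and sample size are synthetic assumptions for illustration:

```python
import numpy as np

# Synthetic data from an assumed 2-D Gaussian; true_mu/true_Sigma are made up.
rng = np.random.default_rng(0)
true_mu = np.array([1.0, -2.0])
true_Sigma = np.array([[2.0, 0.5],
                       [0.5, 1.0]])
X = rng.multivariate_normal(true_mu, true_Sigma, size=1000)

N = X.shape[0]
mu_hat = X.mean(axis=0)              # Eq. (9): sample mean
diff = X - mu_hat
Sigma_hat = diff.T @ diff / N        # Eq. (9): sample covariance (1/N, not 1/(N-1))

# Log likelihood of Eq. (8) evaluated at the ML estimates
_, logdet = np.linalg.slogdet(2 * np.pi * Sigma_hat)
quad = np.einsum('ij,jk,ik->', diff, np.linalg.inv(Sigma_hat), diff)
L = -0.5 * N * logdet - 0.5 * quad
print(mu_hat, Sigma_hat, L)
```

With 1000 samples the estimates should land close to the assumed true_mu and true_Sigma, illustrating the consistency of the ML estimator.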

(15)

Limitations of Gaussian, FA and PCA

• Linear methods: easy to understand and use in practice; an efficient way to find structure in high-dimensional data, e.g., as a preprocessing step

• All based on a Gaussian assumption: only the mean and covariance of the data are taken into account

• Based on second-order statistics
