Part 2: Unsupervised Learning
Machine Learning Techniques for Computer Vision
Microsoft Research Cambridge
Christopher M. Bishop
Overview of Part 2
• Mixture models
• EM
• Variational Inference
• Bayesian model complexity
• Continuous latent variables
The Gaussian Distribution
• Multivariate Gaussian
$$\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right\}$$
• Maximum likelihood
$$\text{mean:} \quad \boldsymbol{\mu}_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_n \qquad \text{covariance:} \quad \boldsymbol{\Sigma}_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} (\mathbf{x}_n - \boldsymbol{\mu}_{\mathrm{ML}})(\mathbf{x}_n - \boldsymbol{\mu}_{\mathrm{ML}})^{\mathrm{T}}$$
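A minimal numpy sketch of these maximum likelihood estimates (not from the original slides; the function name is illustrative):

```python
import numpy as np

def gaussian_ml_fit(X):
    """Maximum likelihood fit of a multivariate Gaussian.

    X: (N, D) array of data points.
    Returns the ML estimates of the mean (D,) and covariance (D, D).
    """
    mu = X.mean(axis=0)                 # mu_ML = (1/N) sum_n x_n
    diff = X - mu
    Sigma = diff.T @ diff / len(X)      # Sigma_ML = (1/N) sum_n (x_n - mu)(x_n - mu)^T
    return mu, Sigma
```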
Gaussian Mixtures
• Linear super-position of Gaussians
$$p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$
• Normalization and positivity require
$$0 \leq \pi_k \leq 1, \qquad \sum_{k=1}^{K} \pi_k = 1$$
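As a sketch, the mixture density can be evaluated directly from this definition (this code is illustrative and not from the slides; it assumes scipy is available):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, pis, mus, Sigmas):
    """Evaluate p(x) = sum_k pi_k N(x | mu_k, Sigma_k) at a single point x."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=Sigma)
               for pi, mu, Sigma in zip(pis, mus, Sigmas))
```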
Example: Mixture of 3 Gaussians
[Figure: mixture of 3 Gaussians; panels (a) and (b), both axes running from 0 to 1]
Maximum Likelihood for the GMM
• Log likelihood function
$$\ln p(\mathbf{X} \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \right\}$$
• Sum over components appears inside the log
– no closed form ML solution
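Although there is no closed form solution, the log likelihood itself is easy to evaluate. A minimal sketch (illustrative names, not from the slides); since the sum over components sits inside the log, logsumexp is used for numerical stability:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, pis, mus, Sigmas):
    """ln p(X) = sum_n ln { sum_k pi_k N(x_n | mu_k, Sigma_k) }."""
    # log_probs[n, k] = ln pi_k + ln N(x_n | mu_k, Sigma_k)
    log_probs = np.stack(
        [np.log(pi) + multivariate_normal.logpdf(X, mean=mu, cov=Sigma)
         for pi, mu, Sigma in zip(pis, mus, Sigmas)], axis=1)
    return logsumexp(log_probs, axis=1).sum()
```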
EM Algorithm – Informal Derivation
• Setting the derivatives of the log likelihood with respect to the parameters to zero gives equations coupled through the responsibilities
• M step equations
$$\boldsymbol{\mu}_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, \mathbf{x}_n \qquad \boldsymbol{\Sigma}_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (\mathbf{x}_n - \boldsymbol{\mu}_k^{\text{new}})(\mathbf{x}_n - \boldsymbol{\mu}_k^{\text{new}})^{\mathrm{T}}$$
$$\pi_k^{\text{new}} = \frac{N_k}{N}, \qquad N_k = \sum_{n=1}^{N} \gamma(z_{nk})$$
• E step equation
$$\gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}$$
• Can interpret the mixing coefficients as prior probabilities
• Corresponding posterior probabilities (responsibilities) are the $\gamma(z_{nk})$; a code sketch of one full iteration follows below
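A minimal sketch of one EM iteration implementing these E and M step equations (illustrative names, no initialization or convergence check):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, pis, mus, Sigmas):
    """One EM iteration for a GMM. X: (N, D) data; K components."""
    N, D = X.shape
    K = len(pis)
    # E step: gamma[n, k] = pi_k N(x_n|mu_k,Sigma_k) / sum_j pi_j N(x_n|mu_j,Sigma_j)
    gamma = np.stack([pi * multivariate_normal.pdf(X, mean=mu, cov=Sigma)
                      for pi, mu, Sigma in zip(pis, mus, Sigmas)], axis=1)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # M step: re-estimate parameters using the responsibilities
    Nk = gamma.sum(axis=0)                    # effective number of points per component
    mus_new = (gamma.T @ X) / Nk[:, None]     # mu_k = (1/N_k) sum_n gamma_nk x_n
    Sigmas_new = []
    for k in range(K):
        diff = X - mus_new[k]
        Sigmas_new.append((gamma[:, k, None] * diff).T @ diff / Nk[k])
    pis_new = Nk / N                          # pi_k = N_k / N
    return pis_new, mus_new, Sigmas_new
```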
Old Faithful Data Set
[Figure: Old Faithful data set; vertical axis: time between eruptions (minutes)]
Latent Variable View of EM
• To sample from a Gaussian mixture:
– first pick one of the components with probability $\pi_k$
– then draw a sample from that component
– repeat these two steps for each new data point (a code sketch follows below)
[Figure (a): samples drawn from the mixture; axes from 0 to 1]
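A sketch of this two-step ancestral sampling procedure (illustrative names, not from the slides):

```python
import numpy as np

def sample_gmm(n_samples, pis, mus, Sigmas, rng=None):
    """Draw samples from a GMM by ancestral sampling.

    Returns the samples and the latent component labels (the 'colours').
    """
    rng = rng or np.random.default_rng()
    # step 1: pick a component k with probability pi_k
    z = rng.choice(len(pis), size=n_samples, p=pis)
    # step 2: draw a sample from the chosen Gaussian component
    X = np.stack([rng.multivariate_normal(mus[k], Sigmas[k]) for k in z])
    return X, z
```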
Latent Variable View of EM
• Goal: given a data set, find the maximum likelihood parameters $\boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}$
• Suppose we knew the colours
– maximum likelihood would involve fitting each component to the corresponding cluster
• Problem: the colours are latent (hidden) variables
Incomplete and Complete Data
[Figure: (a) complete data, points colour-coded by the component that generated them; (b) incomplete data, the same points without component labels; axes from 0 to 1]
Latent Variable Viewpoint
• Binary latent variables $z_{nk} \in \{0, 1\}$, with $\sum_k z_{nk} = 1$, describing which component generated each data point
• Conditional distribution of observed variable
$$p(\mathbf{x} \mid \mathbf{z}) = \prod_{k=1}^{K} \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)^{z_k}$$
• Prior distribution of latent variables
$$p(\mathbf{z}) = \prod_{k=1}^{K} \pi_k^{z_k}$$
• Marginalizing over the latent variables we obtain
$$p(\mathbf{x}) = \sum_{\mathbf{z}} p(\mathbf{z}) \, p(\mathbf{x} \mid \mathbf{z}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$
Graphical Representation of GMM
[Figure: directed graphical model with latent variable $\mathbf{z}_n$ and observed variable $\mathbf{x}_n$ inside a plate over $N$ data points; parameters $\boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}$ outside the plate]
Latent Variable View of EM
• Suppose we knew the values of the latent variables
– maximize the complete-data log likelihood
$$\ln p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \left\{ \ln \pi_k + \ln \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \right\}$$
– trivial closed-form solution: fit each component to the corresponding set of data points (see the sketch below)
• We don’t know the values of the latent variables
– however, for given parameter values we can compute the expected values of the latent variables
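If the latent labels were known, the complete-data ML fit would decompose into one Gaussian fit per component, as this sketch illustrates (names are illustrative, not from the slides):

```python
import numpy as np

def fit_complete_data(X, z, K):
    """ML fit of a GMM when the component label z_n of each point is known.

    X: (N, D) data; z: (N,) integer labels in {0, ..., K-1}.
    Each component is simply fit to its own set of data points.
    """
    pis, mus, Sigmas = [], [], []
    for k in range(K):
        Xk = X[z == k]                        # points generated by component k
        pis.append(len(Xk) / len(X))          # pi_k = N_k / N
        mu = Xk.mean(axis=0)
        mus.append(mu)
        diff = Xk - mu
        Sigmas.append(diff.T @ diff / len(Xk))
    return pis, mus, Sigmas
```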
Posterior Probabilities (colour coded)
[Figure: data points colour coded by their posterior responsibilities]