TOO MANY CYCLES!
Intractable local computations
Even if the graph is a tree, the local functions (conditional probabilities, potentials) may not
yield tractable sum-product computations
• E.g., non-Gaussian pdfs
Active contour model
(Blake and Isard, Springer-Verlag 1998)
• Unobserved state P(ut|ut-1): ut = control points of the spline (contour); the dynamics are LINEAR (GAUSSIAN)
• Observation P(ot|ut): ot = for all measurement lines, the # of edges and the distance of the edges from the contour; the observation model is NONLINEAR
• Measurement lines are placed at fixed intervals along the contour
[Figure: Markov chain u1 → u2 → … → ut-1 → ut with observations o1, o2, …, ot-1, ot]
Approximate inference
• Monte Carlo
• Markov chain Monte Carlo
• Variational techniques
• Local probability propagation
• Alternating maximizations
Monte Carlo inference
• u = unobserved vars; o = observed vars
• Obtain random sample u(1) , u(2), …, u(R) and use it to
– Represent P(u|o)
– Estimate an expectation,
E[f] = Σu f(u) P(u|o)
E.g., P(ui=1|o) = Σu I(ui=1) P(u|o), where I(expr) = 1 if expr is true and I(expr) = 0 otherwise
Expectations from a sample
• From the sample u(1), u(2), …, u(R), we can estimate
E[f] ≅ (1/R) Σr f(u(r))
• If u(1), u(2), …, u(R) are independent draws from P(u|o), this estimate
– is unbiased
– has variance ∝ 1/R
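As a concrete illustration (not from the slides), here is a minimal Python sketch of this estimator, using a made-up Bernoulli stand-in for P(u|o):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior for illustration: P(u=1|o) = 0.3.
p_u1 = 0.3
R = 10_000

# Draw u(1), ..., u(R) independently from P(u|o).
u = rng.random(R) < p_u1

# E[f] ~= (1/R) sum_r f(u(r)); with f(u) = I(u=1) this is just the sample mean.
print(u.mean())  # close to 0.3; the error shrinks like 1/sqrt(R)
```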
Rejection sampling
• Goal: Hold o constant, draw u from P(u|o)
• Given P*(u) ∝ P(u,o): can evaluate P*(u)
• Given B(u) ≥ P*(u): can evaluate B(u) and can “sample” B(u)
• Draw u from the normalized form of B(u)
• Randomly accept u with prob P*(u)/B(u)
• Otherwise reject and draw again
• Efficiency is measured by the rejection rate
[Figure: an envelope B(u) lying everywhere above the unnormalized target P*(u)]
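A minimal sketch of this recipe, assuming a made-up unnormalized target (Beta(3,2)-shaped) and a constant envelope B(u); none of these choices come from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_star(u):
    # Unnormalized target P*(u) ∝ P(u,o): a Beta(3,2)-shaped density on
    # [0,1], made up for this example.
    return u**2 * (1.0 - u)

B_MAX = 0.15  # constant envelope: B(u) >= P*(u) everywhere on [0,1]

def rejection_sample(n):
    samples, proposals = [], 0
    while len(samples) < n:
        u = rng.random()                      # draw u from normalized B (uniform)
        proposals += 1
        if rng.random() < p_star(u) / B_MAX:  # accept with prob P*(u)/B(u)
            samples.append(u)                 # otherwise reject and draw again
    print(f"acceptance rate: {n / proposals:.2f}")  # efficiency of the sampler
    return np.array(samples)

draws = rejection_sample(10_000)
print(draws.mean())  # the Beta(3,2) mean is 3/5 = 0.6
```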
Rejection sampling in the shifter net
• Choose P*(d,m,z) = P(d,m,z,yt,yt+1)
• Choose B(d,m,z) = 1 ≥ P*(d,m,z)
• Draw d,m,z from the uniform distribution
• Randomly accept with probability P*(d,m,z)/B(d,m,z) = P(d,m,z,yt,yt+1)
[Figure: a draw of d, m, z that explains the observed images yt, yt+1 poorly. Reject!]
Rejection sampling in the shifter net
• Choose P*(d,m,z) = P(d,m,z,yt,yt+1)
• Choose B(d,m,z) = 1 ≥ P*(d,m,z)
• Draw d,m,z from the uniform distribution
• Randomly accept with probability P*(d,m,z)/B(d,m,z) = P(d,m,z,yt,yt+1)
[Figure: this time the draw of d, m, z explains the observed images well. Accept!]
Importance sampling
• Goal: Holding o fixed, represent P(u|o) by a weighted sample
• Find P*(u) ∝ P(u,o) and Q*(u), such that can evaluate P*(u)/Q*(u) and can “sample” Q*(u)
• Sample u(1), u(2), …, u(R), from Q*(u)
• Compute weights w(r) = P*(u(r))/Q*(u(r))
• Represent P(u|o) by { u(r), w(r)/(Σj w(j)) }
• E.g., E[f] ≅ Σr f(u(r)) w(r)/(Σj w(j))
• Accuracy is given by the “effective sample size”
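A sketch of the weighted-sample recipe, reusing the same made-up Beta-shaped target with a uniform Q*(u); the effective-sample-size diagnostic shown is the usual 1/Σ(normalized weight)², an assumption rather than something stated on the slide:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_star(u):
    # Unnormalized target P*(u) ∝ P(u,o): the same made-up Beta(3,2) shape.
    return u**2 * (1.0 - u)

R = 10_000
u = rng.random(R)        # sample u(1), ..., u(R) from Q*(u) = 1 (uniform on [0,1])
w = p_star(u)            # weights w(r) = P*(u(r)) / Q*(u(r))
w_norm = w / w.sum()     # normalized weights w(r) / (sum_j w(j))

# E[f] ~= sum_r f(u(r)) w_norm(r); here f(u) = u.
print((u * w_norm).sum())  # close to the Beta(3,2) mean, 0.6

# Effective sample size: near R when the weights are even,
# much smaller when a few particles dominate.
print(1.0 / (w_norm**2).sum())
```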
Importance sampling in the shifter net
• Choose P*(d,m,z) = P(d,m,z,yt,yt+1)
• Choose Q*(d,m,z) = 1
• Draw (d,m,z)(r) from the uniform distribution
• Weight (d,m,z)(r) by P(d(r),m(r),z(r),yt,yt+1)
[Figure: a draw that explains the observed images poorly receives a low weight]
Importance sampling in the shifter net
• Choose P*(d,m,z) = P(d,m,z,yt,yt+1)
• Choose Q*(d,m,z) = 1
• Draw (d,m,z)(r) from the uniform distribution
• Weight (d,m,z)(r) by P(d(r),m(r),z(r),yt,yt+1)
[Figure: a draw that explains the observed images well receives a high weight]
A better Q-distribution
• Choose P*(d,m,z) = P(d,m,z,yt,yt+1)
• Choose Q*(d,m,z) = P(d,m,z)
• Draw (d,m,z)(r) from P(d,m,z)
• Weight (d,m,z)(r) by P(yt,yt+1|d(r),m(r),z(r))
• This is called “likelihood weighting”
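A likelihood-weighting sketch on a made-up two-node Bayes net (the numbers are invented; this is not the shifter net):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up Bayes net d -> y: P(d=1) = 0.5,
# P(y=1|d=1) = 0.9, P(y=1|d=0) = 0.2. We observe y = 1.
R = 10_000
d = rng.random(R) < 0.5        # draw the unobserved d(r) from its prior P(d)
w = np.where(d, 0.9, 0.2)      # weight each draw by the likelihood P(y=1|d(r))

# Posterior estimate: P(d=1|y=1) = (sum of weights where d=1) / (sum of weights).
print(w[d].sum() / w.sum())    # exact answer: 0.45/0.55 ~= 0.818
```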
Particle filtering (condensation)
(Isard, Blake, et al, et al, et al,…)
• Goal: Use a sample S = ut(1), …, ut(R) from P(ut|o1,…,ot-1) to sample from P(ut|o1,…,ot)
• Weight each “particle” ut(r) in S by P(ot|ut(r))
• Redraw a sample S’ from the weighted sample
• For each particle ut(r) in S’, draw ut+1 from P(ut+1|ut(r))
• Exact for infinite-size samples; for finite-size samples, it may lose modes
• This is importance sampling at each step: P(ut|o1,…,ot) ∝ P(ot|ut) P(ut|o1,…,ot-1), so the unnormalized target is P*(ut) = P(ot|ut) P(ut|o1,…,ot-1) and the proposal is Q*(ut) = P(ut|o1,…,ot-1)
[Figure: Markov chain u1 → u2 → … → ut-1 → ut with observations o1, o2, …, ot-1, ot]
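A bootstrap particle-filter sketch on a made-up 1-D nonlinear state-space model (the dynamics, noise levels, and sequence length are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
R = 1_000  # number of particles

def transition(u):
    # P(u_t|u_{t-1}): made-up nonlinear dynamics with Gaussian noise.
    return 0.9 * u + np.sin(u) + rng.normal(0.0, 0.5, size=u.shape)

def likelihood(o, u):
    # P(o_t|u_t), unnormalized: Gaussian observation of the state.
    return np.exp(-0.5 * (o - u) ** 2 / 0.5**2)

# Simulate a short observation sequence from the model itself.
true_u, obs = np.array([0.0]), []
for _ in range(20):
    true_u = transition(true_u)
    obs.append(true_u[0] + rng.normal(0.0, 0.5))

particles = rng.normal(0.0, 1.0, R)       # rough initial sample for u_1
for o_t in obs:
    w = likelihood(o_t, particles)        # weight each particle by P(o_t|u_t(r))
    w /= w.sum()
    idx = rng.choice(R, size=R, p=w)      # redraw S' from the weighted sample
    filtered = particles[idx]             # approx draws from P(u_t|o_1,...,o_t)
    particles = transition(filtered)      # draw u_{t+1} from P(u_{t+1}|u_t(r))

print(f"true state {true_u[0]:.2f}, filtered mean {filtered.mean():.2f}")
```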
Condensation for active contours
(Blake and Isard, Springer-Verlag 1998)
• Unobserved state P(ut|ut-1): ut = control points of the spline (contour); LINEAR (GAUSSIAN) dynamics
• Observation P(ot|ut): ot = for all measurement lines, the # of edges and the distance of the edges from the contour; NONLINEAR observation model
• Sampling from P(ut|o1,…,ot) performs tracking
[Figure: Markov chain u1 → u2 → … → ut-1 → ut with observations o1, o2, …, ot-1, ot]
3D Body Tracking Model
(Sidenbladh, Black, Fleet, ECCV 2000)
Goal:
• 3D articulated model
• Perspective projection
• Monocular sequence
• Unknown environment
• Motion-based likelihood
State:
– joint angles and body pose φt
– joint/pose velocities Vt
– appearance model At
Observations:
– image It at time t
Dynamic Bayes net
[Figure: dynamic Bayes net with chains At-1 → At, Vt-1 → Vt, φt-1 → φt, and an image It generated at each time step]
3D body tracking using particle filtering
(Sidenbladh, Black, Fleet, ECCV 2000)
[Video: the estimated mean of the posterior distribution, under significant changes in view and depth]
Markov chain Monte Carlo (MCMC)
• MCMC simulates a separate Markov chain on u so that the stationary
distribution of the chain is P(u|o)
• MCMC is NOT just Monte Carlo in a Markov chain (e.g., condensation is not MCMC, but is an importance-type sampler for a Markov model)
Achieving a stationary distribution
• Goal: Sample from P(u|o), holding o fixed
• Start with any configuration u
• Repeatedly draw u’ from a transition distribution T(u’|u); this converges to the stationary distribution Q(u) defined by
Q(u’) = Σu Q(u) T(u’|u)
• Q(u) is stationary if detailed balance holds:
Q(u) T(u’|u) = Q(u’) T(u|u’)
• So, pick T(u’|u) so that
P(u|o) T(u’|u) = P(u’|o) T(u|u’)
or, multiplying both sides by P(o),
P(u,o) T(u’|u) = P(u’,o) T(u|u’)
Choosing the sample
• Approach 1:
– Simulate 1 chain for a long time and store configurations periodically
• Approach 2:
– Simulate several chains for moderate
amounts of time and collect configurations
Gibbs sampling
• Goal: Sample from P(u|o), holding o fixed
• Require it to be easy to sample from P(ui|all other vars)
– easy in discrete graphical models
– for real-valued models, may need local MC or MCMC
• Start with any configuration u
• Visit variables in u in order or at random
• Draw ui from P(ui|all other vars)
Stationary distribution = P(u|o) (by detailed balance)
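A Gibbs-sampling sketch on a small made-up Markov net (an Ising-style grid, not the shifter net), where P(ui|all other vars) depends only on the neighbors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up Markov net: a 5x5 Ising-style grid of ±1 variables whose
# pairwise potentials (coupling J) favor agreement between neighbors.
N, J, SWEEPS, BURN_IN = 5, 0.3, 2_000, 500
u = rng.choice([-1, 1], size=(N, N))  # start with any configuration

def neighbor_sum(u, i, j):
    total = 0
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < N and 0 <= nj < N:
            total += u[ni, nj]
    return total

stats = []
for sweep in range(SWEEPS):
    for i in range(N):                # visit the variables in order
        for j in range(N):
            # P(u_ij = +1 | all other vars) depends only on the neighbors.
            s = neighbor_sum(u, i, j)
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * J * s))
            u[i, j] = 1 if rng.random() < p_plus else -1
    if sweep >= BURN_IN:              # store configurations after burn-in
        stats.append(u.mean())

print(np.mean(stats))  # E[mean spin] ~ 0 by symmetry at this coupling
```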
Gibbs sampling in the shifter net
[Figure: the shifter net over variables d, m and z0, z1, …, z6, shown at times t and t+1]
• Randomly initialize the unobserved variables
• Sample the variables one at a time, using
P(var|others) ∝ P(var|parents of var) × Πchildren c of var P(c|parents of c)
(var is in the parent set of each of its children)
[Successive frames show individual variables being resampled given all the others]
The Metropolis algorithm
• MCMC version of importance sampling
• As before, find P*(u) ∝ P(u,o)
• Find a proposal distribution, Q*(u’|u), such that can evaluate [P*(u’)Q*(u|u’)]/[P*(u)Q*(u’|u)] and can “sample” from Q*(u’|u)
MCMC step:
• Based on the old u, sample a new u’ from Q*(u’|u)
• Compute a = [P*(u’) Q*(u|u’)] / [P*(u) Q*(u’|u)]
• Accept u’ with probability min(a,1)
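A sketch of one long chain on a made-up two-mode target, with a symmetric random-walk proposal so that a reduces to P*(u’)/P*(u):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_star(u):
    # Unnormalized target P*(u) ∝ P(u,o): a made-up two-mode density.
    return np.exp(-0.5 * (u - 2.0) ** 2) + np.exp(-0.5 * (u + 2.0) ** 2)

STEP = 1.0  # random-walk proposal Q*(u'|u) = Normal(u, STEP); it is
            # symmetric, so Q*(u|u')/Q*(u'|u) = 1 and a = P*(u')/P*(u)

u, samples = 0.0, []
for _ in range(50_000):
    u_new = u + rng.normal(0.0, STEP)  # sample new u' from Q*(u'|u)
    a = p_star(u_new) / p_star(u)      # a = [P*(u')Q*(u|u')]/[P*(u)Q*(u'|u)]
    if rng.random() < min(a, 1.0):     # accept u' with probability min(a,1)
        u = u_new
    samples.append(u)

chain = np.array(samples[5_000:])      # discard burn-in
print(chain.mean())                    # near 0: the target is symmetric
```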
Metropolis for active contours
(Kristjansson and Frey, submitted to NIPS 2000)
• Problem: Highly flexible contours are difficult to track, because the probability of picking the right ut from P(ut|ut-1) is very small
• Fix: For each new observation, apply Metropolis to the control points to “tweak” them back on track
[Figure: Markov chain u1 → u2 → … → ut-1 → ut with observations o1, o2, …, ot-1, ot]
Variational methods
• Suppose P(u|o) is intractable
• Idea: Approximate P(u|o) with a distribution Q that is tractable
• Use Q to compute expectations, etc
• Parameterize Q: Q(u|Φ)
• Choose “distance” D(Q,P)
• Minimize D(Q,P) wrt Φ
Choosing the “distance”
D = Σu Q(u|Φ) log[Q(u|Φ)/P(u|o)]
• For unimodal Q, minimizing this D favors a Q that locks onto a single mode of P
D = Σu P(u|o) log[P(u|o)/Q(u|Φ)]
• For unimodal Q, minimizing this D favors a Q that spreads its mass over all the modes of P
[Figure: a bimodal P with the two resulting unimodal fits of Q]
Making the distance tractable
• Usually, the distance can’t be computed directly:
D = Σu Q(u|Φ) log[Q(u|Φ)/P(u|o)] uses P(u|o)
• Instead, use F = D − log P(o):
F = Σu Q(u|Φ) log[Q(u|Φ)/P(u,o)]
• P(u,o) factorizes for a Bayes net!
What’s needed to minimize F
F = Σu Q(u|Φ) log[Q(u|Φ)/P(u,o)]
= Σu Q(u|Φ) log Q(u|Φ) − Σu Σi Q(u|Φ) log P(xi|parents of xi)
• The first term is the negative entropy of Q
• The second term requires the solution of local log-P equations
• We want Σu to break into smaller sums

Types of Q-distribution
• Factorized: Q(u|Φ) = Πi Q(ui|Φi), aka mean field
• Q can be a graphical model that is a substructure of P
• Q can be a graphical model with a different structure from P
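A mean-field sketch on a tiny made-up model: two binary unobserved variables with P(u,o) given as a table, a factorized Q, and coordinate updates that each decrease F:

```python
import numpy as np

# Made-up model: two binary unobserved variables u1, u2; o is fixed, so
# P(u1,u2,o) is just a 2x2 table of unnormalized probabilities.
log_p = np.log(np.array([[0.30, 0.10],
                         [0.05, 0.55]]))  # rows index u1, columns index u2

# Factorized (mean-field) approximation Q(u|Φ) = Q1(u1) Q2(u2).
q1 = np.array([0.5, 0.5])
q2 = np.array([0.5, 0.5])

def free_energy(q1, q2):
    # F = sum_u Q(u) log[Q(u)/P(u,o)]
    Q = np.outer(q1, q2)
    return (Q * (np.log(Q) - log_p)).sum()

for _ in range(50):
    # Coordinate updates: Q_i(u_i) ∝ exp( E_{Q of the others}[ log P(u,o) ] );
    # each update can only decrease F.
    q1 = np.exp(log_p @ q2); q1 /= q1.sum()
    q2 = np.exp(q1 @ log_p); q2 /= q2.sum()

exact = np.exp(log_p) / np.exp(log_p).sum()
print("F =", round(free_energy(q1, q2), 4), " -log P(o) =",
      round(-np.log(np.exp(log_p).sum()), 4))  # F upper-bounds -log P(o)
print("Q1 =", q1.round(3), " exact P(u1|o) =", exact.sum(axis=1).round(3))
```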
Variational inference in mixed-state dynamic net
(Pavlovic, Frey and Huang, CVPR 1999)
[Figure: mixed-state dynamic Bayes net with discrete switch states s0, s1, s2, continuous states x0, x1, x2, and observations y0, y1, y2]
The model couples an HMM over the discrete states st with a linear dynamical system (LDS) over the continuous states xt. The variational Q-distribution decouples the two chains, with variational parameters qt acting as soft evidence on the HMM and ut acting as inputs to the LDS:
Q(S, X | U, Y) = [ P(s0) Π(t=1..T-1) P(st|st-1) Π(t=0..T-1) P(qt|st) ] (HMM part)
× [ P(x0) Π(t=1..T-1) P(xt|xt-1, ut) Π(t=0..T-1) P(yt|xt) ] (LDS part)
Variational inference in mixed-state dynamic net
(Pavlovic, Frey and Huang, CVPR 1999)
[Figure: P(st|Y) plotted against t, the time step; the input trace and LDS state vectors are colored by the most probable HMM state]
More examples
• An introduction to variational methods (Jordan et al., 1999)
• Variational methods based on convexity analysis (Jaakkola, 1997)
• Derivation of a simple variational technique for nonlinear Gaussian Bayes nets (Frey and Hinton, 1999)
Sum-product algorithm in graphs with cycles
• To heck with the cycles - just apply the algorithm! (Even though it’s not exact…)
• Due to the cycles, the algorithm iterates (passes messages around loops) until we stop it
• Impressive: error-correcting decoding
– within 0.15 dB of the Shannon limit
– now in 2 telecom standards
– Lucent is producing a chip
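A loopy sum-product sketch on a made-up three-node binary cycle (the potentials are invented), iterating messages until we stop it:

```python
import numpy as np

# Made-up pairwise Markov net: three binary nodes in a cycle, with the
# same attractive pairwise potential on every edge plus local evidence.
psi = np.array([[1.2, 0.6],
                [0.6, 1.2]])
phi = {0: np.array([0.7, 0.3]),
       1: np.array([0.5, 0.5]),
       2: np.array([0.4, 0.6])}
directed = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 0), (0, 2)]

# Initialize all messages to uniform, then iterate.
msgs = {e: np.ones(2) for e in directed}
for _ in range(50):
    new = {}
    for (i, j) in directed:
        # m_ij(x_j) = sum_{x_i} phi_i(x_i) psi(x_i,x_j) prod_{k!=j} m_ki(x_i)
        prod = phi[i].copy()
        for (k, l) in directed:
            if l == i and k != j:
                prod = prod * msgs[(k, l)]
        m = psi.T @ prod
        new[(i, j)] = m / m.sum()
    msgs = new

for n in phi:
    b = phi[n].copy()
    for (k, l) in directed:
        if l == n:
            b = b * msgs[(k, l)]
    print(f"node {n}: belief {b / b.sum()}")  # approximate marginals
```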
Iterative sum-product WORKS in network for layered appearance model
(Frey, CVPR 2000)
[Figure: layers ordered from far away to the camera; the variables are the index of the object in layer l, the intensity of ray n at layer l, and the intensity of ray n at the camera]
Sum-product in layered appearance model
[Figure: the input image is decomposed into Layer 1 … Layer L; beliefs over Obj 1 … Obj 4 in Layers 1 to 4 are shown after iteration 1, iteration 2, …]
Markov network for image and scene patch modeling
(from Freeman and Pasztor, ICCV 1999)
[Figure: observed image patches connected to hidden scene patches in a pairwise Markov network]
Super-resolution by iterative probability propagation
(from Freeman and Pasztor, ICCV 1999)
[Figure panels: Original 70x70; Cubic spline 280x280; Markov network result 280x280; True 280x280]
Alternating maximizations
• Pick an assignment for all variables
• Repeatedly select a tractable substructure (e.g., a tree) and find the most probable configuration in the substructure
• This is guaranteed to find a sequence of global configurations that are increasingly probable
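A sketch where the tractable substructure is the simplest possible one, a single variable at a time (iterated conditional modes rather than a tree), on a made-up Ising-style model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up Ising-style grid with a random local field h pulling each
# variable toward +1 or -1, plus couplings J that favor agreement.
N, J = 5, 0.5
h = rng.normal(0.0, 1.0, size=(N, N))
u = rng.choice([-1, 1], size=(N, N))  # pick an assignment for all variables

def local_field(u, i, j):
    s = sum(u[ni, nj]
            for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1))
            if 0 <= ni < N and 0 <= nj < N)
    return J * s + h[i, j]  # the most probable u[i,j] is its sign

changed = True
while changed:          # every accepted flip makes the global configuration
    changed = False     # more probable, so the loop terminates
    for i in range(N):
        for j in range(N):
            best = 1 if local_field(u, i, j) >= 0 else -1
            if best != u[i, j]:
                u[i, j] = best
                changed = True

print(u)  # a local maximum of the configuration's probability
```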
Tracking self-occluding objects in disparity maps
(Jojic, Turk and Huang, ICCV 1999)