
TOO MANY CYCLES!


Intractable local computations

Even if the graph is a tree, the local functions (conditional probabilities, potentials) may not yield tractable sum-product computations

• Eg, non-Gaussian pdfs

Active contour model

(Blake and Isard, Springer-Verlag 1998)

Unobserved state: P(ut|ut-1), ut = control points of the spline (contour)

Observation: P(ot|ut), ot = for all measurement lines: # edges, distance of edges from the contour

Measurement lines are placed at fixed intervals along the contour

[Figure: state chain u1 → u2 → … → ut-1 → ut with observations o1, o2, …, ot-1, ot; the dynamics P(ut|ut-1) are labeled LINEAR (GAUSSIAN) and the observation model P(ot|ut) is labeled NONLINEAR]


Approximate inference

• Monte Carlo

• Markov chain Monte Carlo

• Variational techniques

• Local probability propagation

• Alternating maximizations

Monte Carlo inference

• u = unobserved vars; o = observed vars

• Obtain random sample u(1) , u(2), …, u(R) and use it to

– Represent P(u|o)

– Estimate an expectation,

E[f] = Σu f(u) P(u|o)

Eg, P(ui=1|o) = Σu I(ui=1) P(u|o)

where I(expr) = 1 if expr is true, I(expr) = 0 otherwise


Expectations from a sample

• From the sample u(1), u(2), …, u(R), we can estimate

E[f] ≅ (1/R) Σr f(u(r))

• If u(1), u(2), …, u(R) are independent draws from P(u|o), this estimate

– is unbiased

– has variance proportional to 1/R
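To make the sample-average estimator concrete, here is a minimal Python sketch (not from the tutorial); the Bernoulli posterior with P(u=1|o) = 0.3 and the choice f(u) = I(u=1) are made-up placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: suppose P(u|o) is Bernoulli over one binary unobserved
# variable u, with P(u=1|o) = 0.3 (a made-up number).
p_true = 0.3
R = 10_000

# Draw R independent samples u(1), ..., u(R) from P(u|o).
samples = rng.random(R) < p_true

# Estimate E[f] by (1/R) * sum_r f(u(r)); with f(u) = I(u=1) this
# approximates P(u=1|o), and its variance shrinks like 1/R.
estimate = samples.mean()
print(f"Monte Carlo estimate of P(u=1|o): {estimate:.3f} (true value {p_true})")
```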

Rejection sampling

• Goal: Hold o constant, draw u from P(u|o)

• Given:

– P*(u) ∝ P(u,o): can eval P*(u)

– B(u) ≥ P*(u): can eval B(u), can “sample” B(u)

• Draw u from the normalized form of B(u)

• Randomly accept u with prob P*(u)/B(u)

• Otherwise reject and draw again

Efficiency measured by rejection rate

[Figure: the unnormalized target P*(u) lying beneath the envelope B(u), plotted against u]
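A minimal rejection-sampling sketch in Python, under assumed placeholder densities: the unnormalized target p_star and the constant envelope below are invented for illustration, not taken from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unnormalized target P*(u) on [0, 4]: a bimodal shape.
def p_star(u):
    return np.exp(-(u - 1.0) ** 2) + 0.5 * np.exp(-(u - 3.0) ** 2 / 0.1)

# Envelope B(u) >= P*(u): here a constant that upper-bounds p_star on [0, 4].
B_CONST = 1.6

def rejection_sample(n):
    """Draw n samples from the normalized form of P* by rejection."""
    accepted = []
    while len(accepted) < n:
        u = rng.uniform(0.0, 4.0)                 # draw from normalized B (uniform here)
        if rng.random() < p_star(u) / B_CONST:    # accept with prob P*(u)/B(u)
            accepted.append(u)
        # otherwise reject and draw again
    return np.array(accepted)

samples = rejection_sample(5000)
print("estimated mean under P:", samples.mean())
```

The fraction of rejected draws is the efficiency measure mentioned above: the tighter B(u) hugs P*(u), the fewer draws are wasted.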


Rejection sampling in the shifter net

[Figure: shifter net with hidden variables d, z, m and observed images yt, yt+1]

• Choose P*(d,m,z) = P(d,m,z,yt,yt+1)

• Choose B(d,m,z) = 1 ≥ P*(d,m,z)

• Draw d,m,z from the uniform distribution

• Randomly accept with probability P*(d,m,z)/B(d,m,z) = P(d,m,z,yt,yt+1)

[Figure: a drawn configuration of d, z, m inconsistent with the observed yt, yt+1. Reject!]

Rejection sampling in the shifter net

[Figure: shifter net with hidden variables d, z, m and observed images yt, yt+1]

• Choose P*(d,m,z) = P(d,m,z,yt,yt+1)

• Choose B(d,m,z) = 1 ≥ P*(d,m,z)

• Draw d,m,z from the uniform distribution

• Randomly accept with probability P*(d,m,z)/B(d,m,z) = P(d,m,z,yt,yt+1)

[Figure: a drawn configuration of d, z, m consistent with the observed yt, yt+1. Accept!]


Importance sampling

• Goal: Holding o fixed, represent P(u|o) by a weighted sample

• Find P*(u) ∝ P(u,o) and Q*(u), such that we can evaluate P*(u)/Q*(u) and can “sample” Q*(u)

• Sample u(1), u(2), …, u(R) from Q*(u)

• Compute weights w(r) = P*(u(r))/Q*(u(r))

• Represent P(u|o) by { u(r), w(r)/(Σj w(j)) }

• Eg, E[f] ≅ Σr f(u(r)) w(r)/(Σj w(j))

Accuracy given by “effective sample size”
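A minimal importance-sampling sketch in Python with made-up densities: it draws from a Gaussian proposal Q*, weights each draw by P*/Q*, and forms the self-normalized estimate and the effective sample size described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up unnormalized target P*(u): a Gaussian bump centered at 2.
def p_star(u):
    return np.exp(-0.5 * (u - 2.0) ** 2)

# Proposal Q*(u): a broad zero-mean Gaussian that we can sample and evaluate.
Q_STD = 3.0
def q_pdf(u):
    return np.exp(-0.5 * (u / Q_STD) ** 2) / (Q_STD * np.sqrt(2.0 * np.pi))

R = 20_000
u = rng.normal(0.0, Q_STD, R)      # u(1), ..., u(R) drawn from Q*
w = p_star(u) / q_pdf(u)           # w(r) = P*(u(r)) / Q*(u(r))

# Self-normalized estimate of E[f] for f(u) = u, and the effective sample size.
estimate = np.sum(u * w) / np.sum(w)
ess = np.sum(w) ** 2 / np.sum(w ** 2)
print(f"E[u] estimate: {estimate:.3f} (true value 2), effective sample size: {ess:.0f}")
```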

Importance sampling in the shifter net

[Figure: shifter net with hidden variables d, z, m and observed images yt, yt+1]

• Choose P*(d,m,z) = P(d,m,z,yt,yt+1)

• Choose Q*(d,m,z) = 1

• Draw (d,m,z)(r) from the uniform distribution

• Weight (d,m,z)(r) by P(d(r),m(r),z(r),yt,yt+1)

[Figure: a drawn configuration of d, z, m inconsistent with the observed yt, yt+1. Low weight]


Importance sampling in the shifter net

[Figure: shifter net with hidden variables d, z, m and observed images yt, yt+1]

• Choose P*(d,m,z) = P(d,m,z,yt,yt+1)

• Choose Q*(d,m,z) = 1

• Draw (d,m,z)(r) from the uniform distribution

• Weight (d,m,z)(r) by P(d(r),m(r),z(r),yt,yt+1)

[Figure: a drawn configuration of d, z, m consistent with the observed yt, yt+1. High weight]

A better Q-distribution

[Figure: shifter net with hidden variables d, z, m and observed images yt, yt+1]

• Choose P*(d,m,z) = P(d,m,z,yt,yt+1)

• Choose Q*(d,m,z) = P(d,m,z)

• Draw (d,m,z)(r) from P(d,m,z)

• Weight (d,m,z)(r) by P(yt,yt+1|d(r),m(r),z(r))

This is called “likelihood weighting”
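A minimal likelihood-weighting sketch in Python for a hypothetical two-node Bayes net (hidden binary d, observed binary y; all the numbers are invented): the hidden variable is drawn from its prior and weighted by the likelihood of the observation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny Bayes net: hidden binary d with prior P(d=1) = 0.4, and an
# observed binary y with likelihood P(y=1|d) given below (made-up numbers).
P_D1 = 0.4
P_Y1_GIVEN_D = {0: 0.2, 1: 0.9}
y_obs = 1                                   # the observed value of y

R = 50_000
d = (rng.random(R) < P_D1).astype(int)      # draw d(r) from the prior Q* = P(d)
lik_y1 = np.where(d == 1, P_Y1_GIVEN_D[1], P_Y1_GIVEN_D[0])
w = lik_y1 if y_obs == 1 else 1.0 - lik_y1  # weight by P(y_obs | d(r))

# Self-normalized estimate of the posterior P(d=1 | y_obs).
post = np.sum(w * (d == 1)) / np.sum(w)
exact = (P_D1 * 0.9) / (P_D1 * 0.9 + (1 - P_D1) * 0.2)
print(f"likelihood-weighting estimate: {post:.3f}, exact: {exact:.3f}")
```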


Particle filtering (condensation)

(Isard, Blake, et al, et al, et al,…)

• Goal: Use sample S = ut(1)…ut(R) from P(ut|o1,…, ot-1) to sample from P(ut|o1,…, ot)

• Weight each “particle” ut(r) in S by P(ot|ut(r))

• Redraw a sample S’ from the weighted sample

• For each particle ut(r) in S’, draw ut+1 from P(ut+1|ut)

Exact, for infinite-size samples; for finite-size samples, it may lose modes

[Figure: state chain u1 → u2 → … → ut-1 → ut with observations o1, o2, …, ot-1, ot]

P(ut|o1,…,ot) ∝ P(ot|ut) P(ut|o1,…,ot-1)

Here P(ut|o1,…,ot-1) plays the role of Q*(ut) and P(ut|o1,…,ot) the role of P*(ut), so the importance weights are P(ot|ut)
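A minimal bootstrap particle-filter sketch in Python for a made-up one-dimensional model (random-walk dynamics, Gaussian observation noise); it follows the weight / resample / propagate steps above.

```python
import numpy as np

rng = np.random.default_rng(0)

R = 1000                         # number of particles
SIGMA_U, SIGMA_O = 0.5, 1.0      # made-up dynamics and observation noise levels
T = 50

# Simulate a hypothetical trajectory and noisy observations of it.
true_u = np.cumsum(rng.normal(0.0, SIGMA_U, T))
obs = true_u + rng.normal(0.0, SIGMA_O, T)

particles = rng.normal(0.0, 1.0, R)          # initial sample (a rough prior guess)
for t in range(T):
    # 1. Weight each particle ut(r) by P(ot | ut(r)).
    w = np.exp(-0.5 * ((obs[t] - particles) / SIGMA_O) ** 2)
    w /= w.sum()
    # 2. Redraw a sample S' from the weighted sample (resampling).
    particles = particles[rng.choice(R, size=R, p=w)]
    if t % 10 == 0:
        print(f"t={t:2d}  true={true_u[t]:6.2f}  estimate={particles.mean():6.2f}")
    # 3. For each particle, draw u(t+1) from P(u(t+1)|u(t)) (random-walk dynamics).
    particles = particles + rng.normal(0.0, SIGMA_U, R)
```

With finitely many particles, the resampling step can discard all particles near a minor mode, which is the mode-loss issue noted above.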

Condensation for active contours

(Blake and Isard, Springer-Verlag 1998)

Unobserved state: P(ut|ut-1), ut = control points of the spline (contour)

Observation: P(ot|ut), ot = for all measurement lines: # edges, distance of edges from the contour

[Figure: state chain u1 → u2 → … → ut-1 → ut with observations o1, o2, …, ot-1, ot; the dynamics P(ut|ut-1) are labeled LINEAR (GAUSSIAN) and the observation model P(ot|ut) is labeled NONLINEAR]

• Sampling from P(ut|o1,…, ot) performs tracking


3D Body Tracking Model

(Sidenbladh, Black, Fleet, ECCV 2000)

Goal:

• 3D articulated model

• Perspective projection

• Monocular sequence

• Unknown environment

• Motion-based likelihood

State:

• joint angles and body pose (φt)

• joint/pose velocities (Vt)

• appearance model (At)

Observations:

• image at time t (It)

Dynamic Bayes net

[Figure: dynamic Bayes net unrolled over time, with nodes φt-1, Vt-1, At-1, It-1 and φt, Vt, At, It, continuing over further time steps]

3D body tracking using particle filtering

(Sidenbladh, Black, Fleet, ECCV 2000)

Shows the estimated mean of the posterior distribution (with significant changes in view and depth)


Markov chain Monte Carlo (MCMC)

• MCMC simulates a separate Markov chain on u so that the stationary distribution of the chain is P(u|o)

• MCMC is NOT just Monte Carlo in a Markov chain (eg, condensation is not MCMC, but is an importance-type sampler for a Markov model)

Achieving a stationary distribution

• Goal: Sample from P(u|o), holding o fixed

• Start with any configuration u

• Repeatedly draw u’ from a transition distribution T(u’|u) – this converges to the stationary distribution Q(u) defined by

Q(u’) = Σu Q(u) T(u’|u)

• Q(u) is stationary if detailed balance holds:

Q(u)T(u’|u) = Q(u’)T(u|u’)

• So, pick T(u’|u) so that P(u|o) T(u’|u) = P(u’|o) T(u|u’)

Since P(u|o) = P(u,o)/P(o) and the P(o) factors cancel, it is enough that P(u,o) T(u’|u) = P(u’,o) T(u|u’)
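As a small numerical check of these conditions (not part of the tutorial), the sketch below builds a Metropolis-style transition matrix for a made-up 3-state target and verifies both detailed balance and convergence to the stationary distribution.

```python
import numpy as np

# Made-up unnormalized target P*(u) over 3 states (stands in for P(u,o)).
p_star = np.array([1.0, 3.0, 6.0])
target = p_star / p_star.sum()               # the normalized P(u|o)

# Transition matrix T[u', u]: propose u' uniformly, accept with prob min(1, P*(u')/P*(u)).
n = len(p_star)
T = np.zeros((n, n))
for u in range(n):
    for v in range(n):
        if v != u:
            T[v, u] = (1.0 / n) * min(1.0, p_star[v] / p_star[u])
    T[u, u] = 1.0 - T[:, u].sum()            # stay put when a proposal is rejected

# Detailed balance: P(u|o) T(u'|u) == P(u'|o) T(u|u') for every pair.
flow = target[None, :] * T                   # flow[u', u] = P(u|o) T(u'|u)
print("detailed balance holds:", np.allclose(flow, flow.T))

# Stationarity: repeatedly applying T converges to the target from any start.
q = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    q = T @ q
print("after 200 steps:", np.round(q, 3), "target:", np.round(target, 3))
```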


Choosing the sample

• Approach 1:

– Simulate 1 chain for a long time and store configurations periodically

• Approach 2:

– Simulate several chains for moderate amounts of time and collect configurations

Gibbs sampling

• Goal: Sample from P(u|o), holding o fixed

• Require it to be easy to sample from P(ui|all other vars)

– easy in discrete graphical models

– for real-valued models, may need local MC or MCMC

• Start with any configuration u

• Visit variables in u in order or at random

• Draw ui from P(ui|all other vars)

Stationary distribution = P(u|o) (det balance)
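A minimal Gibbs-sampling sketch in Python for a made-up joint over two binary unobserved variables; the probability table below stands in for P(u|o) with o already held fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up joint P(u1, u2 | o) over two binary variables, indexed [u1, u2].
joint = np.array([[0.30, 0.10],
                  [0.15, 0.45]])

def sample_conditional(which, other_val):
    """Draw u_which from P(u_which | other variable = other_val)."""
    cond = joint[:, other_val] if which == 0 else joint[other_val, :]
    cond = cond / cond.sum()
    return int(rng.random() < cond[1])

# Start with any configuration and visit the variables in order, repeatedly.
u = [0, 0]
count_u1 = 0
n_sweeps = 20_000
for _ in range(n_sweeps):
    u[0] = sample_conditional(0, u[1])
    u[1] = sample_conditional(1, u[0])
    count_u1 += u[0]

print("Gibbs estimate of P(u1=1|o):", count_u1 / n_sweeps)
print("exact value:", joint[1, :].sum())
```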


Gibbs sampling in the shifter net

[Figure: shifter net with hidden variables d, m and z0, z1, …, z6, shown over Time t and Time t+1]

Randomly initialize unobserved variables


Sample vars one at a time:

P(var|others) ∝ P(var|parents) × Πchildren P(child|parents), where var is in the child’s parent set


[Figure: successive Gibbs sweeps through the shifter net; the sampled binary values are updated one variable at a time]

The Metropolis algorithm

• MCMC version of importance sampling

• As before, find P*(u) ∝ P(u,o)

• Find a proposal distribution, Q*(u’|u), such that we can evaluate [P*(u’)Q*(u|u’)]/[P*(u)Q*(u’|u)] and can “sample” from Q*(u’|u)

MCMC step:

• Based on old u, sample new u’ from Q*(u’|u)

• Compute a = [P*(u’)Q*(u|u’)] / [P*(u)Q*(u’|u)]

• Accept u’ with probability min(a,1)
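A minimal random-walk Metropolis sketch in Python with a made-up target; because the Gaussian proposal is symmetric, the Q*(u|u')/Q*(u'|u) factor cancels and a reduces to P*(u')/P*(u).

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up unnormalized target P*(u) (stands in for P(u,o) with o fixed):
# a two-bump mixture shape.
def p_star(u):
    return np.exp(-0.5 * (u + 2.0) ** 2) + 2.0 * np.exp(-0.5 * (u - 2.0) ** 2)

STEP = 1.0                # width of the symmetric Gaussian proposal Q*(u'|u)
n_steps = 50_000

u = 0.0                   # start with any configuration
samples = []
for _ in range(n_steps):
    u_new = u + rng.normal(0.0, STEP)      # sample u' from Q*(u'|u)
    a = p_star(u_new) / p_star(u)          # proposal terms cancel (symmetric Q*)
    if rng.random() < min(a, 1.0):         # accept u' with probability min(a, 1)
        u = u_new
    samples.append(u)

samples = np.array(samples[5000:])         # discard an initial burn-in period
print(f"estimated mean under the target: {samples.mean():.3f}")
```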


Metropolis for active contours

(Kristjansson and Frey, submitted to NIPS 2000)

Problem: Highly flexible contours are difficult to track, because the prob of picking the right ut from P(ut|ut-1) is very small

Fix: For each new observation, apply Metropolis to the control points to “tweak” them back on track

[Figure: state chain u1 → u2 → … → ut-1 → ut with observations o1, o2, …, ot-1, ot]

Variational methods

• Suppose P(u|o) is intractable

• Idea: Approximate P(u|o) with a distribution Q that is tractable

• Use Q to compute expectations, etc

• Parameterize Q: Q(u|Φ)

• Choose “distance” D(Q,P)

• Minimize D(Q,P) wrt Φ


Choosing the “distance”

D = Σu Q(u|Φ) log[Q(u|Φ)/P(u|o)]

[Figure: multimodal P with a unimodal Q fitted to one mode; for unimodal Q, minimizing this D favors a Q that locks onto a single mode of P]

D = Σu P(u|o) log[P(u|o)/Q(u|Φ)]

[Figure: multimodal P with a broad unimodal Q; for unimodal Q, minimizing this D favors a Q that spreads across the modes of P]

Making the distance tractable

Usually, the distance can’t be computed directly

D = Σu Q(u|Φ) log[Q(u|Φ)/P(u|o)]

• This distance uses P(u|o)

• Instead, use F = D – log P(o):

F = Σu Q(u|Φ) log[Q(u|Φ)/P(u,o)]

• P(u,o) factorizes for a Bayes net!
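Writing out the step from D to F (a standard identity, stated here in LaTeX rather than copied from the slide):

```latex
F = \sum_u Q(u|\Phi)\log\frac{Q(u|\Phi)}{P(u,o)}
  = \sum_u Q(u|\Phi)\log\frac{Q(u|\Phi)}{P(u|o)\,P(o)}
  = \underbrace{\sum_u Q(u|\Phi)\log\frac{Q(u|\Phi)}{P(u|o)}}_{D} \;-\; \log P(o)
```

using Σu Q(u|Φ) = 1. Since log P(o) does not depend on Φ, minimizing F over Φ minimizes D, and F only requires the factorized joint P(u,o).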


What’s needed to minimize F

F = Σu Q(u|Φ) log[Q(u|Φ)/P(u,o)]

= Σu Q(u|Φ) log[Q(u|Φ)] − Σu Σi Q(u|Φ) log[P(xi|parents of xi)]

• Entropy of Q

• Solution of local log-P equations

• We want Σu to break into smaller sums

Types of Q-distribution

• Factorized: Q(u|Φ) = Πi Q(ui|Φi), aka mean field

• Q can be a graphical model that is a substructure of P

• Q can be a graphical model with a different structure from P
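A minimal mean-field (fully factorized Q) sketch in Python: it fits Q(u1)Q(u2) to a made-up two-variable joint by coordinate updates, where each factor is set to the exponentiated expected log-joint under the other factor; this is the update that minimizes F for a factorized Q.

```python
import numpy as np

# Made-up joint P(u1, u2, o) with o already fixed, indexed [u1, u2].
log_p = np.log(np.array([[0.30, 0.10],
                         [0.15, 0.45]]))

# Fully factorized Q(u|Phi) = Q(u1) Q(u2), initialized to uniform.
q1 = np.array([0.5, 0.5])
q2 = np.array([0.5, 0.5])

def normalize_exp(log_q):
    log_q = log_q - log_q.max()       # subtract the max for numerical stability
    q = np.exp(log_q)
    return q / q.sum()

# Coordinate updates: q1(u1) is set proportional to exp(E_{q2}[log P(u1,u2,o)]),
# and symmetrically for q2.
for _ in range(100):
    q1 = normalize_exp(log_p @ q2)
    q2 = normalize_exp(log_p.T @ q1)

p = np.exp(log_p)
print("mean-field Q(u1=1):", round(float(q1[1]), 3),
      " exact P(u1=1|o):", round(float(p[1].sum() / p.sum()), 3))
```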


Variational inference in mixed-state dynamic net

(Pavlovic, Frey and Huang, CVPR 1999)

[Figure: mixed-state dynamic net with switching (HMM) states s0, s1, s2, continuous (LDS) states x0, x1, x2 and observations y0, y1, y2]

The joint distribution factorizes over time into switch transitions P(st|st-1), switch-dependent state dynamics P(xt|xt-1, st) and observations P(yt|xt)

The variational distribution Q couples a tractable HMM-like chain over S and an LDS-like chain over X through variational parameters

Variational inference in mixed-state dynamic net

(Pavlovic, Frey and Huang, CVPR 1999)

[Figure: the mixed-state dynamic net (s0, s1, s2; x0, x1, x2; y0, y1, y2) with a color key for the HMM states]

[Plot: P(st|Y), and the input trace plus LDS state vectors colored by the most probable HMM state, against t, the time step]


More examples

• An introduction to variational methods (Jordan et al 1999)

• Variational methods based on convexity analysis (Jaakkola 1997)

• Derivation of a simple variational technique for nonlinear Gaussian Bayes nets (Frey and Hinton 1999)

Sum-product algorithm in graphs with cycles

• To heck with the cycles - just apply the algorithm! (Even though it’s not exact…)

• Due to cycles, algorithm iterates (passes messages around loops) until we stop it

• Impressive: Error-correcting decoding

– 0.15 dB from Shannon limit

– now in 2 telecom standards

– Lucent is producing a chip
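A minimal loopy sum-product sketch in Python on a made-up pairwise Markov network with a single cycle (three binary nodes): the messages are simply iterated for a fixed number of sweeps, and the resulting beliefs are compared against brute-force marginals.

```python
import numpy as np
from itertools import product

# Made-up pairwise Markov network on a 3-node cycle, binary variables.
edges = [(0, 1), (1, 2), (2, 0)]
psi = {e: np.array([[2.0, 1.0], [1.0, 2.0]]) for e in edges}   # favors agreement
phi = [np.array([1.0, 1.5]), np.array([1.0, 1.0]), np.array([2.0, 1.0])]
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

def pair_pot(i, j):
    """Pairwise potential oriented so rows index x_i and columns index x_j."""
    return psi[(i, j)] if (i, j) in psi else psi[(j, i)].T

# Messages m[i -> j], initialized uniformly, one per directed edge.
msgs = {}
for (i, j) in edges:
    msgs[(i, j)] = np.ones(2)
    msgs[(j, i)] = np.ones(2)

# Iterate the sum-product updates (messages keep circulating around the loop
# until we stop; here we just run a fixed number of sweeps).
for _ in range(50):
    new = {}
    for (i, j) in msgs:
        incoming = np.prod([msgs[(k, i)] for k in neighbors[i] if k != j], axis=0)
        m = pair_pot(i, j).T @ (phi[i] * incoming)
        new[(i, j)] = m / m.sum()
    msgs = new

for i in range(3):
    # Belief: b_i(x_i) proportional to phi_i(x_i) times the incoming messages.
    belief = phi[i] * np.prod([msgs[(k, i)] for k in neighbors[i]], axis=0)
    belief /= belief.sum()
    # Exact marginal by brute-force enumeration, for comparison.
    exact = np.zeros(2)
    for x in product([0, 1], repeat=3):
        w = np.prod([phi[n][x[n]] for n in range(3)])
        w *= np.prod([pair_pot(a, c)[x[a], x[c]] for (a, c) in edges])
        exact[x[i]] += w
    exact /= exact.sum()
    print(f"node {i}: loopy belief {np.round(belief, 3)}, exact {np.round(exact, 3)}")
```

Even though the graph has a cycle, the iterated beliefs typically land close to the exact marginals, which is the behavior the decoding and vision results rely on.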


Iterative sum-product WORKS in network for layered appearance model

(Frey, CVPR 2000)

[Figure: layered appearance model: index of the object in layer l; intensity of ray n at layer l and at the camera; layers ordered from far away to the camera]

Sum-product in layered appearance model

[Figure: inference results after iterations 1 and 2, showing Layer 1 … Layer 4 and Obj 1 … Obj 4 alongside the input image]



Markov network for image and scene patch modeling

(from Freeman and Pasztor, ICCV 1999)

[Figure: Markov network linking observed image patches to underlying scene patches]


Super-resolution by iterative probability propagation

(from Freeman and Pasztor, ICCV 1999)

[Figure: original 70x70 image; Markov network result, 280x280; cubic spline, 280x280; true 280x280]

Alternating maximizations

• Pick an assignment for all variables

• Repeatedly select a tractable substructure (eg, a tree) and find the most probable configuration in the substructure

• This is guaranteed to find a sequence of global configurations that are increasingly probable
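A minimal Python sketch of this idea in its simplest form, where the tractable substructure is just a single variable at a time (the potentials below are invented); each update sets one variable to its most probable value given the rest, so P*(u) never decreases.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up unnormalized joint P*(u) over 4 binary variables (stands in for P(u,o)):
# a chain of pairwise agreement potentials plus per-variable biases.
unary = np.array([[1.0, 2.0], [1.5, 1.0], [1.0, 1.0], [3.0, 1.0]])
pair = np.array([[2.0, 1.0], [1.0, 2.0]])

def p_star(u):
    val = np.prod([unary[i, u[i]] for i in range(4)])
    val *= np.prod([pair[u[i], u[i + 1]] for i in range(3)])
    return val

# Pick an assignment for all variables.
u = [int(v) for v in rng.integers(0, 2, size=4)]
print("start:", u, " P* =", p_star(u))

# Repeatedly pick a (trivially tractable) substructure -- here one variable --
# and set it to the most probable configuration given everything else.
for sweep in range(5):
    for i in range(4):
        scores = []
        for v in (0, 1):
            u[i] = v
            scores.append(p_star(u))
        u[i] = int(np.argmax(scores))
    print(f"after sweep {sweep + 1}:", u, " P* =", p_star(u))
```

Using a larger substructure, such as a spanning tree maximized exactly by max-product, gives bigger moves per step with the same guarantee.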


Tracking self-occluding objects in disparity maps

(Jojic, Turk and Huang, ICCV 1999)

PART IV

LEARNING THE
