# Vision by Inference and Learning in Graphical Models


### Inference and Learning in Graphical Models

Brendan Frey

(www.cs.uwaterloo.ca/~frey)

University of Waterloo

University of Illinois at Urbana-Champaign · Microsoft Research

• P. Anandan

• D. Fleet

• D. Heckerman

• T. Huang

• N. Jojic

• R. Szeliski


### Approaching vision as probabilistic inference

• Input: x = vector of pixel intensities

• Class: c = class index 1, 2, …, C

Graphics: compute P(x|c). Vision: compute P(c|x) using Bayes rule:

P(c|x) = α P(x|c)P(c), where α = 1 / Σ_c P(x|c)P(c)

### Example: P(x|c) Gaussian

• P(x|c) = (2πσ_c²)^(−1/2) exp[−(x − µ_c)² / (2σ_c²)]

[Plot: P(x|c=1)P(c=1) and P(x|c=2)P(c=2) as functions of x; the value of c that maximizes P(c|x) = α P(x|c)P(c) switches where the two curves cross.]
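A minimal sketch of this computation in Python (the priors, means, and variances below are made-up values for illustration, not ones from the lecture):

```python
import numpy as np

# Hypothetical 1-D example with two classes and Gaussian class-conditionals.
prior = np.array([0.5, 0.5])   # P(c)
mu = np.array([0.0, 3.0])      # class means
sigma = np.array([1.0, 1.5])   # class standard deviations

def class_posterior(x):
    """P(c|x) = P(x|c)P(c) / sum_c P(x|c)P(c)."""
    likelihood = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
    joint = likelihood * prior              # P(x|c) P(c)
    return joint / joint.sum()              # normalize: alpha = 1 / sum_c

print(class_posterior(1.0))                 # [P(c=1|x), P(c=2|x)]
```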


### Examples: Image input

• P(Fred | Image)

• P(Happy | Image)

• P(Happy | Image, Fred)

• P(Fred | Image, Happy)

• …

### Examples: Video input

• P(User wants mouse click | Video)

• P(Pixel i from layer L at time t | Video)

• P(Shape of objects at time t | Video)

• P(Appearance of objects at time t | Video)

• P(Position of objects at time t | Video)

• P(Shape, appearance, positions of objects at time t | Video)

• …


### Optimal decision making

• If P(x|c) and P(c) are correct, picking c_MAP = argmax_c P(x|c)P(c) minimizes the number of classification errors

• If U(c,c*) is the utility of picking class c* when the true class is c, use

c_MEU = argmax_{c*} Σ_c U(c,c*) P(x|c)P(c)

• A probabilistic inference P(xi | observed x's) gives optimal decisions for xi
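A sketch of the maximum-expected-utility rule under an assumed 2-class utility matrix (the numbers are invented; the unnormalized products P(x|c)P(c) work as well as the posterior, since the normalizer does not affect the argmax):

```python
import numpy as np

# Hypothetical utilities U[c, c_star]: picking class 2 when the truth is
# class 1 is penalized much more heavily than the reverse error.
U = np.array([[ 1.0, -10.0],
              [-1.0,   1.0]])

def meu_decision(joint):
    """Pick c* maximizing sum_c U[c, c*] P(x|c) P(c); joint[c] = P(x|c)P(c)."""
    expected_utility = joint @ U            # one expected utility per c*
    return int(np.argmax(expected_utility))
```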

### Generative models

• We suppose that observations are the result of a structured generative process on system variables x1, x2, …, xN

• A generative model is a density model P(x1, x2, …, xN)


### Burglar alarm problem (Pearl 86)

Burglar: b=0 no burglar; b=1 burglar. Alarm: a=0 no alarm; a=1 alarm rings.

P(a,b) = P(b)P(a|b); e.g., P(a=1|b=1) = 0.8

Earthquakes also trigger the alarm. Earthquake: e=0 no quake; e=1 quake.

P(a,b,e) = P(e)P(b)P(a|b,e)

• Under P, are b and e independent?

• Under P, are b and e independent given a?

• Probabilistic inferences: P(b=1|a=1) = ?, P(b=1|a=1,e=0) = ?, P(b=1|a=1,e=1) = ?


### Shifter problem: Patches in motion

[Figure: examples of 6×1 patches from a video sequence at times t and t+1, labeled Still/Right and Sparse/Dense; the corresponding noise-free patches are easier to explain.]

• d = density (0=sparse, 1=dense)

• zi = noise-free intensity of pixel i at time t

• m = motion (0=still, 1=right)

• yit = noisy, observed intensity of pixel i at time t

P(d, m, z, y^t, y^{t+1}) = P(d) P(m) Π_i P(zi|d) Π_i P(yi^t|zi) Π_i P(yi^{t+1}|zi-1, zi, m)

• Σ_d Σ_m Σ_z Σ_{y^t} Σ_{y^{t+1}} P(d, m, z, y^t, y^{t+1}) = 1 ?

• Under P, does y^t depend on m?

• Probabilistic inferences: P(m | y^t, y^{t+1}) = ?, P(d | y^t, y^{t+1}) = ?, P(y^{t+1} | y^t) = ?


## BAYESIAN NETWORKS

### (directed graphical models)

• MAY be constructed using causal relationships between variables

• Quickly conveys the factorization of a distribution

• By construction, implies the distribution is normalized

• Clearly expresses dependencies and independencies between variables

• Can be used to derive fast inference algorithms


### Causal construction of burglar net

[Graph: e and b are parents of a; no edge between e and b.]

Assuming earthquakes don't cause burglaries, e is not connected to b. Earthquakes and burglars trigger the alarm, so e and b are connected to a.

### Causal construction of shifter net

[Graph: d is a parent of every zi; each zi is a parent of yi^t (time t); m, zi-1, and zi are the parents of yi^{t+1} (time t+1).]


### Definition of a Bayes net

• Directed graph: no cycles when following arrows (a "DAG")

• Unique variable associated with each node

• For each node, a conditional distribution:

P(child variable | parent variables)

• Defines a joint distribution:

P(x1, …, xN) = Π_i P(xi | parents of xi)

### Conditional probabilities in burglar net

[Graph: e and b are parents of a.]

P(e=1) = .01, P(b=1) = .1

P(a,b,e) = P(e)P(b)P(a|e,b)

P(a=1|e=0,b=0) = .001
P(a=1|e=0,b=1) = .8
P(a=1|e=1,b=0) = .9
P(a=1|e=1,b=1) = .98
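With these numbers, the inferences posed earlier can be computed by summing the joint, as in this minimal sketch:

```python
from itertools import product

P_e = {0: 0.99, 1: 0.01}                    # P(e)
P_b = {0: 0.9, 1: 0.1}                      # P(b)
P_a1 = {(0, 0): 0.001, (0, 1): 0.8,         # P(a=1 | e, b)
        (1, 0): 0.9, (1, 1): 0.98}

def joint(a, b, e):
    """P(a,b,e) = P(e) P(b) P(a|e,b)."""
    pa1 = P_a1[(e, b)]
    return P_e[e] * P_b[b] * (pa1 if a == 1 else 1.0 - pa1)

# P(b=1 | a=1): marginalize e, then normalize over b.
num = sum(joint(1, 1, e) for e in (0, 1))
den = sum(joint(1, b, e) for b, e in product((0, 1), repeat=2))
print(num / den)                            # approx 0.90

# Explaining away: also observing a quake (e=1) lowers the burglar posterior.
print(joint(1, 1, 1) / sum(joint(1, b, 1) for b in (0, 1)))  # approx 0.11
```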


### A distribution in the shifter net

P(y2^{t+1}=1 | z1=0, z2=0, m) = .01
P(y2^{t+1}=1 | z1, z2=1, m=0) = .99
P(y2^{t+1}=1 | z1, z2=0, m=0) = .01
P(y2^{t+1}=1 | z1=1, z2, m=1) = .99
P(y2^{t+1}=1 | z1=0, z2, m=1) = .01

### Direct bonuses of Bayes nets

• The Markov blanket MB[xi] for variable xi can be read off the graph, where

P(xi | MB[xi], other vars) = P(xi | MB[xi])

• Simulating P(x1,…, xN) is easy

• Normalization:

Σ_{x1} … Σ_{xN} [ Π_i P(xi | parents of xi) ] = 1


### Simulating the shifter net

• Sample d from P(d) and m from P(m)

• Sample each zi from P(zi|d) (e.g., z0 from P(z0|d))

• Sample each yi^t from P(yi^t|zi) (e.g., y1^t from P(y1^t|z1))

• Sample each yi^{t+1} from P(yi^{t+1}|zi-1, zi, m) (e.g., y1^{t+1} from P(y1^{t+1}|z0, z1, m))
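A sketch of this ancestral-sampling recipe in general: sample each variable from its conditional once all of its parents have been sampled. It is shown here for the burglar net, whose conditional probabilities were given above; the shifter net is simulated the same way, parents before children.

```python
import random

P_A1 = {(0, 0): 0.001, (0, 1): 0.8, (1, 0): 0.9, (1, 1): 0.98}  # P(a=1|e,b)

def sample_burglar_net():
    """Ancestral sampling: visit nodes in a topological order of the DAG."""
    e = int(random.random() < 0.01)          # e ~ P(e)
    b = int(random.random() < 0.1)           # b ~ P(b)
    a = int(random.random() < P_A1[(e, b)])  # a ~ P(a|e,b)
    return e, b, a
```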

### Markov blankets

• What is the smallest set of variables that "isolates" a variable xi from the other variables in the network?

• The Markov blanket, MB[xi]:

P(xi | MB[xi], X \ {xi} \ MB[xi]) = P(xi | MB[xi])

• If a set S does not contain MB[xi], P(xi | S, X \ {xi} \ S) ≠ P(xi | S)


### A Markov blanket in the shifter net

MB[z6] = {d, y6^t, y6^{t+1}, m, z5}

[Graph: the shifter net, with the members of MB[z6] highlighted.]

### Pruning Bayes nets

• For variables x1,…, xN, b, suppose b does not have children

• If we delete node b and its edges, the resulting network describes

P(x1, …, xN) = Σ_b P(x1, …, xN, b)

So,

Σ_{x1} … Σ_{xN} [ Π_i P(xi | parents of xi) ] = 1

that is, the pruned network is still a normalized Bayes net.


### Pruning the shifter net

[Figure: pruning the shifter net one childless node at a time: y6^{t+1}, then y5^{t+1}, …, then y1^{t+1}, and finally m, leaving d, the zi, and the yi^t.]

P(d, z, y^t) = P(d) Π_i P(zi|d) Π_i P(yi^t|zi)

Use this simpler net to make inferences about d, z, and y^t

### Noncausal constructions

Reasons for noncausal constructions

• System is not causal

• System too complex for causal construction

• For computational efficiency, a noncausal net is preferable


Procedure for noncausal construction

• Order the variables (eg, at random)

• Add the variables, one at a time

• Make the current variable a child of all previously added variables

• Delete as many edges as possible, reducing the number of parents for the current variable

The last step requires probing the distribution for conditional independencies, as in the example that follows.

### A noncausal construction of the burglar net, order a, e, b

P(a,b,e) = P(a)P(e|a)P(b|a,e)

P(e|a) ≠ P(e), so keep a→e.

P(b|a,e) ≠ P(b), P(b|a,e) ≠ P(b|a), and P(b|a,e) ≠ P(b|e), so keep a→b and e→b.

Causal construction: P(a,b,e) = P(e)P(b)P(a|e,b) (e and b are parents of a, with no edge between them).

Non-causal construction: P(a,b,e) = P(a)P(e|a)P(b|a,e) (a, e, and b are fully connected).

### Conditional independencies

• Is xA independent of xB given xS?

Is P(xA, xB | xS) = P(xA | xS) P(xB | xS)?

• YES, if every path from xA to xB is BLOCKED

• A path can be blocked in 3 ways:

1. Head-to-tail: the path passes through a variable in xS (… → xS → …)

2. Tail-to-tail: the path diverges at a variable in xS (… ← xS → …)

3. Head-to-head: the path converges at a variable (… → v ← …) such that neither v nor any of its descendants is in xS ("xS is not a descendent")


### Independencies in the shifter net

• P(y^t, m) = P(y^t) P(m)

• P(d, m) = P(d) P(m)

• P(d, m | y^t) = P(d | y^t) P(m | y^t)

• P(d, m | y^{t+1}) ≠ P(d | y^{t+1}) P(m | y^{t+1})

### “Extreme” Bayes nets

Factorized model: P(x) = Π_i P(xi)

Unstructured model: P(x) = Π_i P(xi | x1, …, xi-1)

(always true, from the chain rule of probability)


### Mixture model (Naive Bayes)

[Graph: c is a parent of each of x1, …, x6.]

P(x, c) = P(c) Π_i P(xi|c),  P(x) = Σ_c P(x, c)

Shorthand (c discrete, x a vector): P(x, c) = P(c) P(x|c)

### Mixture of Gaussians

P(c) = π_c,  P(z|c) = N(z; µ_c, Φ_c)

[Figure: an example with π_1 = 0.6, π_2 = 0.4; the means µ_1, µ_2, the diagonal covariances diag(Φ_1), diag(Φ_2), and a sample z are shown as images.]


### Transformed mixture of Gaussians

(Frey and Jojic, CVPR 1999)

P(c) = π_c,  P(z|c) = N(z; µ_c, Φ_c),  P(s) = prior over shifts,  P(x|z,s) = N(x; shift(z,s), Ψ) with Ψ diagonal

[Figure: an example with π_1 = 0.6, π_2 = 0.4; the means, diagonal covariances, a latent image z, a shift s, and the observed image x are shown as images.]

### Layered appearance model

(Frey, CVPR 2000)

[Figure: layers ordered from far to near; the variables are the index of the object in layer l, the intensity of ray n at layer l, and the intensity of ray n at the camera.]


### Multiview layered model

(variant of Torr, Szeliski, Anandan, CVPR 1999)

Θ: parameters of the layer planes

z(x,y): depth, in the 1st view, of the pixel at (x,y) in the 1st view

L(x,y): layer of the pixel at (x,y) in the 1st view

I(x,y): vector of pixel-intensity differences between the other views and the pixel intensity in the 1st view at (x,y)

### Dynamic Bayes nets

• Just a Bayes net for time-series data


### Markov model

[Graph: chain z1 → z2 → … → zt-1 → zt → zt+1]

MB[zt] = {zt-1, zt+1}

P(zt | z1, z2, …, zt-1, zt+1, …) = P(zt | zt-1, zt+1)

P(zt | z1, z2, …, zt-1) = P(zt | zt-1)

### Hidden Markov model

[Graph: hidden chain z1 → z2 → … → zt → zt+1, with each zt a parent of its observation xt]

zt discrete, P(zt | zt-1) = "transition matrix"

xt discrete or continuous; e.g., P(xt | zt) = Normal(xt; µ_{zt}, C_{zt})


### Linear dynamic system (Kalman filter model)

[Graph: same chain structure as the hidden Markov model]

zt continuous, P(zt | zt-1) Gaussian; xt continuous, P(xt | zt) Gaussian


### Transformed hidden Markov model

(Jojic, Petrovic, Frey and Huang, CVPR 2000)

[Graph: two time slices, each containing the transformed-mixture-of-Gaussians variables c, z, s, x, with links between the hidden variables of successive slices.]


### Active contour model

(Blake and Isard, Springer-Verlag 1998)

Unobserved state: P(ut|ut-1), with ut = control points of a spline (the contour); the dynamics are LINEAR (GAUSSIAN).

Observation: P(ot|ut), with ot = for all measurement lines, the number of edges and the distances of the edges from the contour; the observation model is NONLINEAR. Measurement lines are placed at fixed intervals along the contour.

[Graph: chain u1 → u2 → … → ut, with each ut a parent of ot]

### 3D body tracking model

Goal: track a 3D articulated model under perspective projection, from a monocular sequence, in an unknown environment, using a motion-based likelihood.

State: φt = joint angles and body pose; Vt = joint/pose velocities; At = appearance model.

Observations: It = image at time t.

[Dynamic Bayes net: chains At-1 → At, Vt-1 → Vt, φt-1 → φt, repeated over time, with each image It depending on the appearance and pose at time t.]


### Switching mixture of state-space models

(Ghahramani and Hinton, 1997)

[Graph: a discrete system "switch" chain s0 → s1 → s2 and two continuous state chains x0 → x1 → x2 (system A and system B); the measurements y0, y1, y2 depend on the switch and the states.]

### Mixed-state dynamic Bayes net

(Pavlovic, Frey and Huang, CVPR 1999)

• Uses a discrete-state HMM to drive continuous dynamics (Kalman filtering)

[Graph: a discrete chain s0 → s1 → s2 (decision/action/mode) driving a continuous chain x0 → x1 → x2 (state of dynamics); the measurements y0, y1, y2 depend on the continuous state.]


## GRAPHICAL MODELS


### Markov random field (MRF)

• Undirected graph on variables

• Graph gives Markov blankets:

The Markov blanket of a variable is the variable’s neighbors

[Figure: an undirected graph; the shaded nodes indicate the variables in the Markov blanket of z.]

### The distribution for an MRF

• If P(x1, …, xN) ≠ 0 for all configurations of x1, …, xN, then P(x1, …, xN) can be expressed as

P(x1, …, xN) = α Π_c φ_c({xi : i ∈ Q_c})

• c indexes the maximal cliques

• Qc is the set of the variables in clique c

• φ_c(·) is a strictly positive function (potential) on the variables in clique c

• α is a normalizing constant


### Burglar MRF

[Graph: e, b, and a pairwise connected by undirected edges.]

1 maximal clique: Q_1 = {e, b, a}

Clique potential: φ_1(e, b, a)

Distribution: P(e, b, a) = α φ_1(e, b, a)

Are e and b independent? CAN'T TELL!

### Line processes

(Geman and Geman)

[Figure: binary patterns on a maximal clique of pixels; patterns consistent with smooth regions or a single straight edge get high φ, while patterns with isolated or inconsistent pixels get low φ.]


### Markov network for image and scene patch modeling

(from Freeman and Pasztor, ICCV 1999)

[Figure: a Markov network with a row of observed image patches, each connected to a hidden scene patch; neighboring scene patches are connected.]

### Bayes net – MRF hybrids

• Suppose we have an MRF for x, with distribution P_MRF(x)

• Suppose we have a Bayes net for z, with distribution P_BN(z)

• Then, we can add directed edges connecting variables in x to variables in z, creating a modified Bayes net P_BN(z|x)

• The joint distribution is P_BN(z|x) P_MRF(x)


### BN-MRF hybrid: multiview layered model

(variant of Torr, Szeliski, Anandan, CVPR 1999)

Θ: parameters of the layer planes

z(x,y): depth, in the 1st view, of the pixel at (x,y) in the 1st view

L(x,y): layer of the pixel at (x,y) in the 1st view

I(x,y): vector of pixel-intensity differences between the other views and the pixel intensity in the 1st view at (x,y)

### Factor graphs

(Kschischang, Frey, Loeliger, submitted to IEEE Trans. Information Theory)

• Bipartite graph: variable nodes and function nodes

• A local function is associated with each function node – this function depends on the neighboring variables

• The global function is given by the product of the local functions


### Burglar factor graphs

[Two factor graphs for the burglar net. Left: a single function node P(e,a,b) connected to e, a, and b; this expresses no independencies (like the MRF). Right: function nodes P(e), P(b), and P(a|e,b); this expresses the same independencies as the Bayes net.]

### Converting an MRF to a factor graph

• Create variable nodes

• Create one function node for each maximal clique in the MRF

• Connect each function node to the variables in the corresponding clique

• Set the function associated with each function node to the corresponding clique potential

• Global function = MRF distribution

• Each MRF has a unique factor graph

• Different factor graphs may have the same MRF


### Converting a Bayes net to a factor graph

• Create variable nodes

• For each variable, create one function node and connect it to the variable

• Connect each function node to the parents of the corresponding variable

• Set the function associated with each function node to the corresponding conditional pdf in the Bayes net

• Global function = Bayes net distribution

• If the child of each local function is indicated (eg, with an arrow), the resulting factor graph has the same semantics as a Bayes net

## INFERENCE


### Probabilistic inference

Recall that, for a correct generative model P(x1, x2, …, xN), a probabilistic inference P(xi | observed x's) gives optimal decisions for xi.

### Inference: Mixture of Gaussians

P(c) = π_c,  P(z|c) = N(z; µ_c, Φ_c)

P(c|z) = P(z|c)P(c) / Σ_c P(z|c)P(c)

[Figure: with π_1 = 0.6, π_2 = 0.4 and the pictured means and diagonal covariances, one observed z gives P(c=1|z) = .9, P(c=2|z) = .1, while another gives P(c=1|z) = .2, P(c=2|z) = .8.]
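A sketch of this posterior ("responsibility") computation for diagonal Gaussians; the parameter values are placeholders, not the images from the slide:

```python
import numpy as np

pi = np.array([0.6, 0.4])                    # P(c) = pi_c
mu = np.array([[0.0, 0.0], [2.0, 2.0]])      # means mu_c (toy 2-D "images")
var = np.array([[1.0, 1.0], [0.5, 0.5]])     # diag(Phi_c)

def responsibilities(z):
    """P(c|z) = P(z|c) P(c) / sum_c P(z|c) P(c), with diagonal Gaussians."""
    log_lik = -0.5 * np.sum((z - mu) ** 2 / var + np.log(2 * np.pi * var), axis=1)
    joint = np.exp(log_lik) * pi             # P(z|c) P(c)
    return joint / joint.sum()

print(responsibilities(np.array([1.8, 2.1])))
```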


### Inference: Transformed mixture of Gaussians

P(x, z, s, c) = P(x|z,s) P(s) P(z|c) P(c)

c and s are discrete; z and x are linear-Gaussian given their parents.

[Figure: given the observed image x and the pictured parameters (π_1 = 0.6, π_2 = 0.4), inference yields c_MAP = 1, the MAP shift s_MAP, and the MAP latent image z_MAP.]

### General “brute force” inference

• Suppose x1, x2, …, xN are binary

P(x1) = Σ_{x2} Σ_{x3} … Σ_{xN} P(x1, x2, …, xN)

• This takes about 2^N operations

• Generally, computing P(xi | observed x's) takes about 2^(N − #observed x's) operations
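A literal sketch of the brute-force sum, for a joint supplied as a function of N binary variables; the loop visits all 2^N configurations, which is exactly why this does not scale:

```python
from itertools import product

def marginal_x1(joint, n):
    """P(x1) by summing joint(x1, ..., xn) over the other n-1 binary vars."""
    p = {0: 0.0, 1: 0.0}
    for xs in product((0, 1), repeat=n):     # 2**n terms in total
        p[xs[0]] += joint(*xs)
    return p
```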


### Inference in Bayes nets

[Figure: the shifter net with y^t and y^{t+1} observed; inference yields P(m=1|Obs) = 0.8, P(d=1|Obs) = 0.2, and a posterior P(zi|Obs) for each noise-free pixel.]

### Observed parents “abandon” children

• We can remove the edges connecting observed parents to their children

[Graphs: a1 and a2 are parents of o; o is a parent of c1 and c2. After observing o = o′, the factors P(o|a1,a2), P(c1|o), P(c2|o) become P(o=o′|a1,a2), P(c1|o=o′), P(c2|o=o′), and the edges from o to its children are removed.]


### Sum-product algorithm

(probability propagation, forward-backward algorithm) (Gallager 1963; Pearl 1986; Lauritzen & Spiegelhalter 1988; …)

• Suppose we have a graphical model for discrete variables x1, x2, …, xN

• If the graphical model is a tree (or "close" to being a tree), the sum-product algorithm can compute P(xi | observed x's) for all xi in LINEAR TIME

### Example: Discrete Markov model

[Graph: chain A → B → C → D → E]

P(A,B,C,D,E) = P(E|D) P(D|C) P(C|B) P(B|A) P(A)

P(E) = Σ_D Σ_C Σ_B Σ_A P(E|D) P(D|C) P(C|B) P(B|A) P(A)

= Σ_D P(E|D) Σ_C P(D|C) Σ_B P(C|B) [Σ_A P(B|A) P(A)]

= Σ_D P(E|D) Σ_C P(D|C) [Σ_B P(C|B) f(B)]

= Σ_D P(E|D) [Σ_C P(D|C) g(C)]

= Σ_D P(E|D) h(D)

where f(B) = Σ_A P(B|A) P(A), g(C) = Σ_B P(C|B) f(B), and h(D) = Σ_C P(D|C) g(C).
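The same elimination in code: pushing each sum inside the product replaces the exponential sum with a sequence of small matrix-vector products. The CPTs here are random placeholders, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(k=2):
    """Random P(child|parent) as a k x k matrix; row = parent value."""
    t = rng.random((k, k))
    return t / t.sum(axis=1, keepdims=True)

PA = np.array([0.3, 0.7])                    # P(A)
PBA, PCB, PDC, PED = (random_cpt() for _ in range(4))

f = PA @ PBA       # f(B) = sum_A P(A) P(B|A)
g = f @ PCB        # g(C) = sum_B f(B) P(C|B)
h = g @ PDC        # h(D) = sum_C g(C) P(D|C)
PE = h @ PED       # P(E) = sum_D h(D) P(E|D)
print(PE, PE.sum())                          # P(E) sums to 1
```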


### General sum-product algorithm

• Messages: short vectors of numbers, interpreted as functions of discrete variables

• Messages flow in both directions on each edge

• Initially, all messages are set to 1

• Messages are updated randomly or in a given order

• Messages are fused to compute (or approximate) P(xi|Observed x’s)

### Passing messages in Bayes nets

[Graph: a and b are parents of c; c is a parent of d and e. Messages flow in both directions along every edge.]

Against an edge (from c up to its parent a):

f(a) = Σ_{c,b} h(c) q(c) P(c|a,b) g(b)

With an edge (from c down to a child):

f(c) = Σ_{a,b} q(c) P(c|a,b) g(a) h(b)

Fusion (combining all messages at c):

P(c|o) ∝ Σ_{a,b} h(c) q(c) P(c|a,b) f(a) g(b)

Each message is a function of the edge's parent variable.


### Propagating observations in Bayes nets

Observation: c = c′

Against an edge: f(a) = Σ_b h(c′) q(c′) P(c′|a,b) g(b)

With an edge: f(c′) = Σ_{a,b} q(c′) P(c′|a,b) g(a) h(b), and f(c) = 0 for c ≠ c′

### Passing messages in factor graphs

• Much simpler than Bayes nets!!!!

• Bayes nets and MRFs can be converted to factor graphs really easily

Out of a variable: f(a) = g(a) h(a), the product of the messages arriving on the variable's other edges.

Out of a function with local function φ(a,b,c): f(a) = Σ_b Σ_c φ(a,b,c) g(b) h(c)

Fusion: P(a|o) ∝ f(a) g(a) h(a)

Each message is a function of its neighboring variable.


### Result of fusion

• Unobserved variables: u1,…,uK

• Observed variables: o

• Fusion at ui estimates

P(ui, o) = Σ_{u1,…,ui-1,ui+1,…,uK} P(u1, …, uK, o)

• Local normalization:

P(ui|o) = P(ui, o) / Σ_{ui} P(ui, o)

### Properties of sum-product

• Exact in trees

• Computationally efficient even for

– linear Gaussian variables without discrete children
– observed real variables with discrete parents

• Some applications:

– Error-correcting decoding (trellis codes)
– Speech recognition (the HMM is a tree)
– Kalman tracking (the LDS net is a tree)
– Multiscale smoothing (tree on an image)


### Max-product (Viterbi) algorithm

• Replace SUM with MAX in the sum-product algorithm

• Max-product computes

Φ(ui) = max_{u1,…,ui-1,ui+1,…,uK} P(u1, …, uK, o)

• MAP configuration:

ui^MAP = argmax_{ui} Φ(ui)
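A sketch of max-product on a chain (Viterbi decoding for an HMM): each Σ becomes a max, and back-pointers recover the MAP configuration. The interface below is an assumption, with the tables passed in log-space:

```python
import numpy as np

def viterbi(log_prior, log_trans, log_obs):
    """MAP state path; log_obs[t, z] = log P(x_t|z_t), log_trans[i, j] = log P(z_j|z_i)."""
    T, K = log_obs.shape
    delta = log_prior + log_obs[0]           # max-product "message" at t=0
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # scores[i, j]: arrive at j from i
        back[t] = scores.argmax(axis=0)      # best predecessor of each state j
        delta = scores.max(axis=0) + log_obs[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):            # follow back-pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```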

### What if graph is not a tree?

• Can “cluster” variables

• Can convert the graph to a “join tree”

(Lauritzen and Spiegelhalter 1988)

• Can use “bucket elimination” (Dechter 1999)

BUT, THESE METHODS ONLY WORK WHEN THE NUMBER OF CYCLES IS TRACTABLE


### Quite a few cycles

[Figure: the shifter net, whose graph contains many cycles.]


### Approximate inference

• Monte Carlo

• Markov chain Monte Carlo

• Variational techniques

• Local probability propagation

• Alternating maximizations

### Monte Carlo inference

• u = unobserved vars; o = observed vars

• Obtain a random sample u^(1), u^(2), …, u^(R) and use it to

– Represent P(u|o)

– Estimate an expectation, E[f] = Σ_u f(u) P(u|o)

E.g., P(ui=1|o) = Σ_u I(ui=1) P(u|o), where I(expr) = 1 if expr is true and I(expr) = 0 otherwise.


### Expectations from a sample

• From the sample u^(1), u^(2), …, u^(R), we can estimate

E[f] ≅ (1/R) Σ_r f(u^(r))

• If u^(1), u^(2), …, u^(R) are independent draws from P(u|o), this estimate

– is unbiased

– has variance proportional to 1/R
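A minimal sketch, assuming a sampler that returns independent draws from P(u|o):

```python
def mc_expectation(f, sampler, R=10_000):
    """Estimate E[f] = sum_u f(u) P(u|o) by (1/R) sum_r f(u^(r))."""
    return sum(f(sampler()) for _ in range(R)) / R

# e.g. P(u_i=1 | o) via the indicator f(u) = 1 if u[i] == 1 else 0
```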

### Rejection sampling

• Goal: hold o constant, draw u from P(u|o)

• Given:

– P*(u) ∝ P(u,o); can evaluate P*(u)
– B(u) ≥ P*(u); can evaluate B(u) and can "sample" from B(u)

• Draw u from the normalized form of B(u)

• Randomly accept u with probability P*(u)/B(u)

• Otherwise reject and draw again

[Figure: P*(u) lying under the envelope B(u).]

Efficiency is measured by the rejection rate.
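A generic sketch of the loop, assuming p_star and the envelope b can be evaluated pointwise and that b_sampler draws from the normalized form of B(u):

```python
import random

def rejection_sample(p_star, b, b_sampler):
    """Draw u with density proportional to p_star, given an envelope b >= p_star."""
    while True:
        u = b_sampler()                          # propose from normalized B(u)
        if random.random() < p_star(u) / b(u):   # accept with prob P*(u)/B(u)
            return u
```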


### Rejection sampling in the shifter net

• Choose P*(d,m,z) = P(d,m,z,y^t,y^{t+1})

• Choose B(d,m,z) = 1 ≥ P*(d,m,z)

• Draw d, m, z from the uniform distribution

• Randomly accept with probability P*(d,m,z)/B(d,m,z) = P(d,m,z,y^t,y^{t+1})

[Figures: two draws of (d, m, z) with y^t and y^{t+1} observed; one draw is rejected, and one is accepted.]


### Importance sampling

• Goal: holding o fixed, represent P(u|o) by a weighted sample

• Find P*(u) ∝ P(u,o) and Q*(u), such that P*(u)/Q*(u) can be evaluated and Q*(u) can be "sampled"

• Sample u^(1), u^(2), …, u^(R) from Q*(u)

• Compute weights w^(r) = P*(u^(r))/Q*(u^(r))

• Represent P(u|o) by { u^(r), w^(r)/Σ_j w^(j) }

• E.g., E[f] ≅ Σ_r f(u^(r)) w^(r) / Σ_j w^(j)

Accuracy is given by the "effective sample size".
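A sketch of the weighted-sample estimate, assuming q_sampler draws from the normalized form of Q*(u) and that P*(u)/Q*(u) can be evaluated:

```python
def importance_estimate(f, p_star, q_star, q_sampler, R=10_000):
    """Estimate E[f] under P(u|o) from draws u^(r) ~ Q* with weights P*/Q*."""
    us = [q_sampler() for _ in range(R)]
    ws = [p_star(u) / q_star(u) for u in us]     # w^(r) = P*(u^(r)) / Q*(u^(r))
    return sum(f(u) * w for u, w in zip(us, ws)) / sum(ws)
```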

### Importance sampling in the shifter net

• Choose P*(d,m,z) = P(d,m,z,y^t,y^{t+1})

• Choose Q*(d,m,z) = 1

• Draw (d,m,z)^(r) from the uniform distribution

• Weight (d,m,z)^(r) by P(d^(r),m^(r),z^(r),y^t,y^{t+1})

[Figures: two weighted draws with y^t and y^{t+1} observed.]

### A better Q-distribution

• Choose P*(d,m,z) = P(d,m,z,y^t,y^{t+1})

• Choose Q*(d,m,z) = P(d,m,z)

• Draw (d,m,z)^(r) from P(d,m,z)

• Weight (d,m,z)^(r) by P(y^t,y^{t+1}|d^(r),m^(r),z^(r))

This is called "likelihood weighting".


### Particle filtering (condensation)

(Isard, Blake, et al, et al, et al,…)

• Goal: use a sample S = {ut^(1), …, ut^(R)} from P(ut | o1,…,ot-1) to sample from P(ut | o1,…,ot)

• Weight each "particle" ut^(r) in S by P(ot | ut^(r))

• Redraw a sample S′ from the weighted sample

• For each particle ut in S′, draw ut+1 from P(ut+1 | ut)

Exact for infinite-size samples; for finite-size samples, it may lose modes.

[Graph: chain u1 → u2 → … → ut with observations o1, …, ot]

This is importance sampling at each step: P(ut | o1,…,ot) ∝ P(ot | ut) P(ut | o1,…,ot-1), with Q*(ut) = P(ut | o1,…,ot-1) and P*(ut) the right-hand side.
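A sketch of one update step, with obs_lik and dynamics_sample standing in for P(ot|ut) and P(ut+1|ut) (the names are assumptions for illustration):

```python
import random

def particle_filter_step(particles, o_t, obs_lik, dynamics_sample):
    """One condensation step: weight, resample, then propagate each particle."""
    weights = [obs_lik(o_t, u) for u in particles]   # w^(r) = P(o_t | u_t^(r))
    resampled = random.choices(particles, weights=weights, k=len(particles))
    return [dynamics_sample(u) for u in resampled]   # u_{t+1} ~ P(. | u_t)
```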

### Condensation for active contours

(Blake and Isard, Springer-Verlag 1998)

Particle filtering applied to the active contour model: unobserved state P(ut|ut-1) with ut = control points of the spline (contour), and observations P(ot|ut) given, for all measurement lines, by the number of edges and the distances of the edges from the contour.
