# Vision by Inference and Learning in Graphical Models


### Inference and Learning in Graphical Models

Brendan Frey

(www.cs.uwaterloo.ca/~frey)

University of Waterloo

University of Illinois at Urbana-Champaign · Microsoft Research

• P. Anandan

• D. Fleet

• D. Heckerman

• T. Huang

• N. Jojic

• R. Szeliski


### Approaching vision as probabilistic inference

• Input: x = vector of pixel intensities

• Class: c = class index 1, 2, …, C

Graphics: compute P(x|c). Vision: compute P(c|x) using Bayes rule:

P(c|x) = α P(x|c)P(c), where α = 1 / Σ_c P(x|c)P(c)

### Example: P(x|c) Gaussian

• P(x|c) = (2πσ_c²)^(−1/2) exp[−(x − µ_c)² / (2σ_c²)]

[Plot: P(x|c=1)P(c=1) and P(x|c=2)P(c=2) as functions of x; the value of c that maximizes P(c|x) = α P(x|c)P(c) switches where the two curves cross.]
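A minimal sketch of this computation in Python (the priors, means, and variances below are made-up values for illustration, not ones from the lecture):

```python
import numpy as np

# Hypothetical 1-D example with two classes and Gaussian class-conditionals.
prior = np.array([0.5, 0.5])   # P(c)
mu = np.array([0.0, 3.0])      # class means
sigma = np.array([1.0, 1.5])   # class standard deviations

def class_posterior(x):
    """P(c|x) = P(x|c)P(c) / sum_c P(x|c)P(c)."""
    likelihood = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
    joint = likelihood * prior              # P(x|c) P(c)
    return joint / joint.sum()              # normalize: alpha = 1 / sum_c

print(class_posterior(1.0))                 # [P(c=1|x), P(c=2|x)]
```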


### Examples: Image input

• P(Fred | Image)

• P(Happy | Image)

• P(Happy | Image, Fred)

• P(Fred | Image, Happy)

• …

### Examples: Video input

• P(User wants mouse click | Video)

• P(Pixel i from layer L at time t | Video)

• P(Shape of objects at time t | Video)

• P(Appearance of objects at time t | Video)

• P(Position of objects at time t | Video)

• P(Shape, appearance, positions of objects at time t | Video)

• …


### Optimal decision making

• If P(x|c) and P(c) are correct, picking c_MAP = argmax_c P(x|c)P(c) minimizes the number of classification errors

• If U(c,c*) is the utility of picking class c* when the true class is c, use

c_MEU = argmax_{c*} Σ_c U(c,c*) P(x|c)P(c)

• A probabilistic inference P(xi | observed x's) gives optimal decisions for xi
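A sketch of the maximum-expected-utility rule under an assumed 2-class utility matrix (the numbers are invented; the unnormalized products P(x|c)P(c) work as well as the posterior, since the normalizer does not affect the argmax):

```python
import numpy as np

# Hypothetical utilities U[c, c_star]: picking class 2 when the truth is
# class 1 is penalized much more heavily than the reverse error.
U = np.array([[ 1.0, -10.0],
              [-1.0,   1.0]])

def meu_decision(joint):
    """Pick c* maximizing sum_c U[c, c*] P(x|c) P(c); joint[c] = P(x|c)P(c)."""
    expected_utility = joint @ U            # one expected utility per c*
    return int(np.argmax(expected_utility))
```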

### Generative models

• We suppose that observations are the result of a structured generative process on system variables x1, x2, …, xN

• A generative model is a density model P(x1, x2, …, xN)


### Burglar alarm problem (Pearl 86)

Burglar: b=0 no burglar; b=1 burglar. Alarm: a=0 no alarm; a=1 alarm rings.

P(a,b) = P(b)P(a|b); e.g., P(a=1|b=1) = 0.8

Earthquakes also trigger the alarm. Earthquake: e=0 no quake; e=1 quake.

P(a,b,e) = P(e)P(b)P(a|b,e)

• Under P, are b and e independent?

• Under P, are b and e independent given a?

• Probabilistic inferences: P(b=1|a=1) = ?, P(b=1|a=1,e=0) = ?, P(b=1|a=1,e=1) = ?


### Shifter problem: Patches in motion

[Figure: examples of 6×1 patches from a video sequence at times t and t+1, labeled Still/Right and Sparse/Dense; the corresponding noise-free patches are easier to explain.]

• d = density (0=sparse, 1=dense)

• zi = noise-free intensity of pixel i at time t

• m = motion (0=still, 1=right)

• yit = noisy, observed intensity of pixel i at time t

P(d, m, z, y^t, y^{t+1}) = P(d) P(m) Π_i P(zi|d) Π_i P(yi^t|zi) Π_i P(yi^{t+1}|zi-1, zi, m)

• Σ_d Σ_m Σ_z Σ_{y^t} Σ_{y^{t+1}} P(d, m, z, y^t, y^{t+1}) = 1 ?

• Under P, does y^t depend on m?

• Probabilistic inferences: P(m | y^t, y^{t+1}) = ?, P(d | y^t, y^{t+1}) = ?, P(y^{t+1} | y^t) = ?


## BAYESIAN NETWORKS

### (directed graphical models)

• MAY be constructed using causal relationships between variables

• Quickly conveys the factorization of a distribution

• By construction, implies the distribution is normalized

• Clearly expresses dependencies and independencies between variables

• Can be used to derive fast inference algorithms


### Causal construction of burglar net

[Graph: e and b are parents of a; no edge between e and b.]

Assuming earthquakes don't cause burglaries, e is not connected to b. Earthquakes and burglars trigger the alarm, so e and b are connected to a.

### Causal construction of shifter net

[Graph: d is a parent of every zi; each zi is a parent of yi^t (time t); m, zi-1, and zi are the parents of yi^{t+1} (time t+1).]


### Definition of a Bayes net

• Directed graph: no cycles when following arrows (a "DAG")

• Unique variable associated with each node

• For each node, a conditional distribution:

P(child variable | parent variables)

• Defines a joint distribution:

P(x1, …, xN) = Π_i P(xi | parents of xi)

### Conditional probabilities in burglar net

[Graph: e and b are parents of a.]

P(e=1) = .01, P(b=1) = .1

P(a,b,e) = P(e)P(b)P(a|e,b)

P(a=1|e=0,b=0) = .001
P(a=1|e=0,b=1) = .8
P(a=1|e=1,b=0) = .9
P(a=1|e=1,b=1) = .98
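With these numbers, the inferences posed earlier can be computed by summing the joint, as in this minimal sketch:

```python
from itertools import product

P_e = {0: 0.99, 1: 0.01}                    # P(e)
P_b = {0: 0.9, 1: 0.1}                      # P(b)
P_a1 = {(0, 0): 0.001, (0, 1): 0.8,         # P(a=1 | e, b)
        (1, 0): 0.9, (1, 1): 0.98}

def joint(a, b, e):
    """P(a,b,e) = P(e) P(b) P(a|e,b)."""
    pa1 = P_a1[(e, b)]
    return P_e[e] * P_b[b] * (pa1 if a == 1 else 1.0 - pa1)

# P(b=1 | a=1): marginalize e, then normalize over b.
num = sum(joint(1, 1, e) for e in (0, 1))
den = sum(joint(1, b, e) for b, e in product((0, 1), repeat=2))
print(num / den)                            # approx 0.90

# Explaining away: also observing a quake (e=1) lowers the burglar posterior.
print(joint(1, 1, 1) / sum(joint(1, b, 1) for b in (0, 1)))  # approx 0.11
```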


### A distribution in the shifter net

P(y2^{t+1}=1 | z1=0, z2=0, m) = .01
P(y2^{t+1}=1 | z1, z2=1, m=0) = .99
P(y2^{t+1}=1 | z1, z2=0, m=0) = .01
P(y2^{t+1}=1 | z1=1, z2, m=1) = .99
P(y2^{t+1}=1 | z1=0, z2, m=1) = .01

### Direct bonuses of Bayes nets

• The Markov blanket MB[xi] for variable xi can be read off the graph, where

P(xi | MB[xi], other vars) = P(xi | MB[xi])

• Simulating P(x1,…, xN) is easy

• Normalization:

Σ_{x1} … Σ_{xN} [ Π_i P(xi | parents of xi) ] = 1


### Simulating the shifter net

• Sample d from P(d) and m from P(m)

• Sample each zi from P(zi|d) (e.g., z0 from P(z0|d))

• Sample each yi^t from P(yi^t|zi) (e.g., y1^t from P(y1^t|z1))

• Sample each yi^{t+1} from P(yi^{t+1}|zi-1, zi, m) (e.g., y1^{t+1} from P(y1^{t+1}|z0, z1, m))
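A sketch of this ancestral-sampling recipe in general: sample each variable from its conditional once all of its parents have been sampled. It is shown here for the burglar net, whose conditional probabilities were given above; the shifter net is simulated the same way, parents before children.

```python
import random

P_A1 = {(0, 0): 0.001, (0, 1): 0.8, (1, 0): 0.9, (1, 1): 0.98}  # P(a=1|e,b)

def sample_burglar_net():
    """Ancestral sampling: visit nodes in a topological order of the DAG."""
    e = int(random.random() < 0.01)          # e ~ P(e)
    b = int(random.random() < 0.1)           # b ~ P(b)
    a = int(random.random() < P_A1[(e, b)])  # a ~ P(a|e,b)
    return e, b, a
```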

### Markov blankets

• What is the smallest set of variables that "isolates" a variable xi from the other variables in the network?

• The Markov blanket, MB[xi]:

P(xi | MB[xi], X \ {xi} \ MB[xi]) = P(xi | MB[xi])

• If a set S does not contain MB[xi], P(xi | S, X \ {xi} \ S) ≠ P(xi | S)


### A Markov blanket in the shifter net

MB[z6] = {d, y6^t, y6^{t+1}, m, z5}

[Graph: the shifter net, with the members of MB[z6] highlighted.]

### Pruning Bayes nets

• For variables x1,…, xN, b, suppose b does not have children

• If we delete node b and its edges, the resulting network describes

P(x1, …, xN) = Σ_b P(x1, …, xN, b)

So,

Σ_{x1} … Σ_{xN} [ Π_i P(xi | parents of xi) ] = 1

that is, the pruned network is still a normalized Bayes net.


### Pruning the shifter net

[Figure: pruning the shifter net one childless node at a time: y6^{t+1}, then y5^{t+1}, …, then y1^{t+1}, and finally m, leaving d, the zi, and the yi^t.]

P(d, z, y^t) = P(d) Π_i P(zi|d) Π_i P(yi^t|zi)

Use this simpler net to make inferences about d, z, and y^t

### Noncausal constructions

Reasons for noncausal constructions

• System is not causal

• System too complex for causal construction

• For computational efficiency, a noncausal net is preferable


Procedure for noncausal construction

• Order the variables (eg, at random)

• Add the variables, one at a time

• Make the current variable a child of all previously added variables

• Delete as many edges as possible, reducing the number of parents for the current variable

The last step requires probing the distribution for conditional independencies, as in the example that follows.

### A noncausal construction of the burglar net, order a, e, b

P(a,b,e) = P(a)P(e|a)P(b|a,e)

P(e|a) ≠ P(e), so keep a→e.

P(b|a,e) ≠ P(b), P(b|a,e) ≠ P(b|a), and P(b|a,e) ≠ P(b|e), so keep a→b and e→b.

Causal construction: P(a,b,e) = P(e)P(b)P(a|e,b) (e and b are parents of a, with no edge between them).

Non-causal construction: P(a,b,e) = P(a)P(e|a)P(b|a,e) (a, e, and b are fully connected).

### Conditional independencies

• Is xA independent of xB given xS?

Is P(xA, xB | xS) = P(xA | xS) P(xB | xS)?

• YES, if every path from xA to xB is BLOCKED

• A path can be blocked in 3 ways:

1. Head-to-tail: the path passes through a variable in xS (… → xS → …)

2. Tail-to-tail: the path diverges at a variable in xS (… ← xS → …)

3. Head-to-head: the path converges at a variable (… → v ← …) such that neither v nor any of its descendants is in xS ("xS is not a descendent")


### Independencies in the shifter net

• P(y^t, m) = P(y^t) P(m)

• P(d, m) = P(d) P(m)

• P(d, m | y^t) = P(d | y^t) P(m | y^t)

• P(d, m | y^{t+1}) ≠ P(d | y^{t+1}) P(m | y^{t+1})

### “Extreme” Bayes nets

Factorized model: P(x) = Π_i P(xi)

Unstructured model: P(x) = Π_i P(xi | x1, …, xi-1)

(always true, from the chain rule of probability)


### Mixture model (Naive Bayes)

[Graph: c is a parent of each of x1, …, x6.]

P(x, c) = P(c) Π_i P(xi|c),  P(x) = Σ_c P(x, c)

Shorthand (c discrete, x a vector): P(x, c) = P(c) P(x|c)

### Mixture of Gaussians

P(c) = π_c,  P(z|c) = N(z; µ_c, Φ_c)

[Figure: an example with π_1 = 0.6, π_2 = 0.4; the means µ_1, µ_2, the diagonal covariances diag(Φ_1), diag(Φ_2), and a sample z are shown as images.]


### Transformed mixture of Gaussians

(Frey and Jojic, CVPR 1999)

P(c) = π_c,  P(z|c) = N(z; µ_c, Φ_c),  P(s) = prior over shifts,  P(x|z,s) = N(x; shift(z,s), Ψ) with Ψ diagonal

[Figure: an example with π_1 = 0.6, π_2 = 0.4; the means, diagonal covariances, a latent image z, a shift s, and the observed image x are shown as images.]

### Layered appearance model

(Frey, CVPR 2000)

[Figure: layers ordered from far to near; the variables are the index of the object in layer l, the intensity of ray n at layer l, and the intensity of ray n at the camera.]


### Multiview layered model

(variant of Torr, Szeliski, Anandan, CVPR 1999)

Θ: parameters of the layer planes

z(x,y): depth, in the 1st view, of the pixel at (x,y) in the 1st view

L(x,y): layer of the pixel at (x,y) in the 1st view

I(x,y): vector of pixel-intensity differences between the other views and the pixel intensity in the 1st view at (x,y)

### Dynamic Bayes nets

• Just a Bayes net for time-series data


### Markov model

[Graph: chain z1 → z2 → … → zt-1 → zt → zt+1]

MB[zt] = {zt-1, zt+1}

P(zt | z1, z2, …, zt-1, zt+1, …) = P(zt | zt-1, zt+1)

P(zt | z1, z2, …, zt-1) = P(zt | zt-1)

### Hidden Markov model

[Graph: hidden chain z1 → z2 → … → zt → zt+1, with each zt a parent of its observation xt]

zt discrete, P(zt | zt-1) = "transition matrix"

xt discrete or continuous; e.g., P(xt | zt) = Normal(xt; µ_{zt}, C_{zt})


### Linear dynamic system (Kalman filter model)

[Graph: same chain structure as the hidden Markov model]

zt continuous, P(zt | zt-1) Gaussian; xt continuous, P(xt | zt) Gaussian


### Transformed hidden Markov model

(Jojic, Petrovic, Frey and Huang, CVPR 2000)

[Graph: two time slices, each containing the transformed-mixture-of-Gaussians variables c, z, s, x, with links between the hidden variables of successive slices.]


### Active contour model

(Blake and Isard, Springer-Verlag 1998)

Unobserved state: P(ut|ut-1), with ut = control points of a spline (the contour); the dynamics are LINEAR (GAUSSIAN).

Observation: P(ot|ut), with ot = for all measurement lines, the number of edges and the distances of the edges from the contour; the observation model is NONLINEAR. Measurement lines are placed at fixed intervals along the contour.

[Graph: chain u1 → u2 → … → ut, with each ut a parent of ot]

### 3D body tracking model

Goal: track a 3D articulated model under perspective projection, from a monocular sequence, in an unknown environment, using a motion-based likelihood.

State: φt = joint angles and body pose; Vt = joint/pose velocities; At = appearance model.

Observations: It = image at time t.

[Dynamic Bayes net: chains At-1 → At, Vt-1 → Vt, φt-1 → φt, repeated over time, with each image It depending on the appearance and pose at time t.]


### Switching mixture of state-space models

(Ghahramani and Hinton, 1997)

[Graph: a discrete system "switch" chain s0 → s1 → s2 and two continuous state chains x0 → x1 → x2 (system A and system B); the measurements y0, y1, y2 depend on the switch and the states.]

### Mixed-state dynamic Bayes net

(Pavlovic, Frey and Huang, CVPR 1999)

• Uses a discrete-state HMM to drive continuous dynamics (Kalman filtering)

[Graph: a discrete chain s0 → s1 → s2 (decision/action/mode) driving a continuous chain x0 → x1 → x2 (state of dynamics); the measurements y0, y1, y2 depend on the continuous state.]


## GRAPHICAL MODELS


### Markov random field (MRF)

• Undirected graph on variables

• Graph gives Markov blankets:

The Markov blanket of a variable is the variable’s neighbors

[Figure: an undirected graph; the shaded nodes indicate the variables in the Markov blanket of z.]

### The distribution for an MRF

• If P(x1, …, xN) ≠ 0 for all configurations of x1, …, xN, then P(x1, …, xN) can be expressed as

P(x1, …, xN) = α Π_c φ_c({xi : i ∈ Q_c})

• c indexes the maximal cliques

• Qc is the set of the variables in clique c

• φ_c(·) is a strictly positive function (potential) on the variables in clique c

• α is a normalizing constant


### Burglar MRF

[Graph: e, b, and a pairwise connected by undirected edges.]

1 maximal clique: Q_1 = {e, b, a}

Clique potential: φ_1(e, b, a)

Distribution: P(e, b, a) = α φ_1(e, b, a)

Are e and b independent? CAN'T TELL!

### Line processes

(Geman and Geman)

[Figure: binary patterns on a maximal clique of pixels; patterns consistent with smooth regions or a single straight edge get high φ, while patterns with isolated or inconsistent pixels get low φ.]


### Markov network for image and scene patch modeling

(from Freeman and Pasztor, ICCV 1999)

[Figure: a Markov network with a row of observed image patches, each connected to a hidden scene patch; neighboring scene patches are connected.]

### Bayes net – MRF hybrids

• Suppose we have an MRF for x, with distribution P_MRF(x)

• Suppose we have a Bayes net for z, with distribution P_BN(z)

• Then, we can add directed edges connecting variables in x to variables in z, creating a modified Bayes net P_BN(z|x)

• The joint distribution is P_BN(z|x) P_MRF(x)


### BN-MRF hybrid: multiview layered model

(variant of Torr, Szeliski, Anandan, CVPR 1999)

Θ: parameters of the layer planes

z(x,y): depth, in the 1st view, of the pixel at (x,y) in the 1st view

L(x,y): layer of the pixel at (x,y) in the 1st view

I(x,y): vector of pixel-intensity differences between the other views and the pixel intensity in the 1st view at (x,y)

### Factor graphs

(Kschischang, Frey, Loeliger, submitted to IEEE Trans. Information Theory)

• Bipartite graph: variable nodes and function nodes

• A local function is associated with each function node – this function depends on the neighboring variables

• The global function is given by the product of the local functions


### Burglar factor graphs

[Two factor graphs for the burglar net. Left: a single function node P(e,a,b) connected to e, a, and b; this expresses no independencies (like the MRF). Right: function nodes P(e), P(b), and P(a|e,b); this expresses the same independencies as the Bayes net.]

### Converting an MRF to a factor graph

• Create variable nodes

• Create one function node for each maximal clique in the MRF

• Connect each function node to the variables in the corresponding clique

• Set the function associated with each function node to the corresponding clique potential

• Global function = MRF distribution

• Each MRF has a unique factor graph

• Different factor graphs may have the same MRF


### Converting a Bayes net to a factor graph

• Create variable nodes

• For each variable, create one function node and connect it to the variable

• Connect each function node to the parents of the corresponding variable

• Set the function associated with each function node to the corresponding conditional pdf in the Bayes net

• Global function = Bayes net distribution

• If the child of each local function is indicated (eg, with an arrow), the resulting factor graph has the same semantics as a Bayes net

## INFERENCE


### Probabilistic inference

Recall that, for a correct generative model P(x1, x2, …, xN), a probabilistic inference P(xi | observed x's) gives optimal decisions for xi.

### Inference: Mixture of Gaussians

P(c) = π_c,  P(z|c) = N(z; µ_c, Φ_c)

P(c|z) = P(z|c)P(c) / Σ_c P(z|c)P(c)

[Figure: with π_1 = 0.6, π_2 = 0.4 and the pictured means and diagonal covariances, one observed z gives P(c=1|z) = .9, P(c=2|z) = .1, while another gives P(c=1|z) = .2, P(c=2|z) = .8.]
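A sketch of this posterior ("responsibility") computation for diagonal Gaussians; the parameter values are placeholders, not the images from the slide:

```python
import numpy as np

pi = np.array([0.6, 0.4])                    # P(c) = pi_c
mu = np.array([[0.0, 0.0], [2.0, 2.0]])      # means mu_c (toy 2-D "images")
var = np.array([[1.0, 1.0], [0.5, 0.5]])     # diag(Phi_c)

def responsibilities(z):
    """P(c|z) = P(z|c) P(c) / sum_c P(z|c) P(c), with diagonal Gaussians."""
    log_lik = -0.5 * np.sum((z - mu) ** 2 / var + np.log(2 * np.pi * var), axis=1)
    joint = np.exp(log_lik) * pi             # P(z|c) P(c)
    return joint / joint.sum()

print(responsibilities(np.array([1.8, 2.1])))
```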


### Inference: Transformed mixture of Gaussians

P(x, z, s, c) = P(x|z,s) P(s) P(z|c) P(c)

c and s are discrete; z and x are linear-Gaussian given their parents.

[Figure: given the observed image x and the pictured parameters (π_1 = 0.6, π_2 = 0.4), inference yields c_MAP = 1, the MAP shift s_MAP, and the MAP latent image z_MAP.]

### General “brute force” inference

• Suppose x1, x2, …, xN are binary

P(x1) = Σ_{x2} Σ_{x3} … Σ_{xN} P(x1, x2, …, xN)

• This takes about 2^N operations

• Generally, computing P(xi | observed x's) takes about 2^(N − #observed x's) operations
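A literal sketch of the brute-force sum, for a joint supplied as a function of N binary variables; the loop visits all 2^N configurations, which is exactly why this does not scale:

```python
from itertools import product

def marginal_x1(joint, n):
    """P(x1) by summing joint(x1, ..., xn) over the other n-1 binary vars."""
    p = {0: 0.0, 1: 0.0}
    for xs in product((0, 1), repeat=n):     # 2**n terms in total
        p[xs[0]] += joint(*xs)
    return p
```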


### Inference in Bayes nets

[Figure: the shifter net with y^t and y^{t+1} observed; inference yields P(m=1|Obs) = 0.8, P(d=1|Obs) = 0.2, and a posterior P(zi|Obs) for each noise-free pixel.]

### Observed parents “abandon” children

• We can remove the edges connecting observed parents to their children

[Graphs: a1 and a2 are parents of o; o is a parent of c1 and c2. After observing o = o′, the factors P(o|a1,a2), P(c1|o), P(c2|o) become P(o=o′|a1,a2), P(c1|o=o′), P(c2|o=o′), and the edges from o to its children are removed.]


### Sum-product algorithm

(probability propagation, forward-backward algorithm) (Gallager 1963; Pearl 1986; Lauritzen & Spiegelhalter 1988; …)

• Suppose we have a graphical model for discrete variables x1, x2, …, xN

• If the graphical model is a tree (or "close" to being a tree), the sum-product algorithm can compute P(xi | observed x's) for all xi in LINEAR TIME

### Example: Discrete Markov model

[Graph: chain A → B → C → D → E]

P(A,B,C,D,E) = P(E|D) P(D|C) P(C|B) P(B|A) P(A)

P(E) = Σ_D Σ_C Σ_B Σ_A P(E|D) P(D|C) P(C|B) P(B|A) P(A)

= Σ_D P(E|D) Σ_C P(D|C) Σ_B P(C|B) [Σ_A P(B|A) P(A)]

= Σ_D P(E|D) Σ_C P(D|C) [Σ_B P(C|B) f(B)]

= Σ_D P(E|D) [Σ_C P(D|C) g(C)]

= Σ_D P(E|D) h(D)

where f(B) = Σ_A P(B|A) P(A), g(C) = Σ_B P(C|B) f(B), and h(D) = Σ_C P(D|C) g(C).
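The same elimination in code: pushing each sum inside the product replaces the exponential sum with a sequence of small matrix-vector products. The CPTs here are random placeholders, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(k=2):
    """Random P(child|parent) as a k x k matrix; row = parent value."""
    t = rng.random((k, k))
    return t / t.sum(axis=1, keepdims=True)

PA = np.array([0.3, 0.7])                    # P(A)
PBA, PCB, PDC, PED = (random_cpt() for _ in range(4))

f = PA @ PBA       # f(B) = sum_A P(A) P(B|A)
g = f @ PCB        # g(C) = sum_B f(B) P(C|B)
h = g @ PDC        # h(D) = sum_C g(C) P(D|C)
PE = h @ PED       # P(E) = sum_D h(D) P(E|D)
print(PE, PE.sum())                          # P(E) sums to 1
```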


### General sum-product algorithm

• Messages: short vectors of numbers, interpreted as functions of discrete variables

• Messages flow in both directions on each edge

• Initially, all messages are set to 1

• Messages are updated randomly or in a given order

• Messages are fused to compute (or approximate) P(xi|Observed x’s)

### Passing messages in Bayes nets

[Graph: a and b are parents of c; c is a parent of d and e. Messages flow in both directions along every edge.]

Against an edge (from c up to its parent a):

f(a) = Σ_{c,b} h(c) q(c) P(c|a,b) g(b)

With an edge (from c down to a child):

f(c) = Σ_{a,b} q(c) P(c|a,b) g(a) h(b)

Fusion (combining all messages at c):

P(c|o) ∝ Σ_{a,b} h(c) q(c) P(c|a,b) f(a) g(b)

Each message is a function of the edge's parent variable.


### Propagating observations in Bayes nets

Observation: c = c′

Against an edge: f(a) = Σ_b h(c′) q(c′) P(c′|a,b) g(b)

With an edge: f(c′) = Σ_{a,b} q(c′) P(c′|a,b) g(a) h(b), and f(c) = 0 for c ≠ c′

### Passing messages in factor graphs

• Much simpler than Bayes nets!!!!

• Bayes nets and MRFs can be converted to factor graphs really easily

Out of a variable: f(a) = g(a) h(a), the product of the messages arriving on the variable's other edges.

Out of a function with local function φ(a,b,c): f(a) = Σ_b Σ_c φ(a,b,c) g(b) h(c)

Fusion: P(a|o) ∝ f(a) g(a) h(a)

Each message is a function of its neighboring variable.


### Result of fusion

• Unobserved variables: u1,…,uK

• Observed variables: o

• Fusion at ui estimates

P(ui, o) = Σ_{u1,…,ui-1,ui+1,…,uK} P(u1, …, uK, o)

• Local normalization:

P(ui|o) = P(ui, o) / Σ_{ui} P(ui, o)

### Properties of sum-product

• Exact in trees

• Computationally efficient even for

– linear Gaussian variables without discrete children
– observed real variables with discrete parents

• Some applications:

– Error-correcting decoding (trellis codes)
– Speech recognition (the HMM is a tree)
– Kalman tracking (the LDS net is a tree)
– Multiscale smoothing (tree on an image)


### Max-product (Viterbi) algorithm

• Replace SUM with MAX in the sum-product algorithm

• Max-product computes

Φ(ui) = max_{u1,…,ui-1,ui+1,…,uK} P(u1, …, uK, o)

• MAP configuration:

ui^MAP = argmax_{ui} Φ(ui)
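A sketch of max-product on a chain (Viterbi decoding for an HMM): each Σ becomes a max, and back-pointers recover the MAP configuration. The interface below is an assumption, with the tables passed in log-space:

```python
import numpy as np

def viterbi(log_prior, log_trans, log_obs):
    """MAP state path; log_obs[t, z] = log P(x_t|z_t), log_trans[i, j] = log P(z_j|z_i)."""
    T, K = log_obs.shape
    delta = log_prior + log_obs[0]           # max-product "message" at t=0
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # scores[i, j]: arrive at j from i
        back[t] = scores.argmax(axis=0)      # best predecessor of each state j
        delta = scores.max(axis=0) + log_obs[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):            # follow back-pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```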

### What if graph is not a tree?

• Can “cluster” variables

• Can convert the graph to a “join tree”

(Lauritzen and Spiegelhalter 1988)

• Can use “bucket elimination” (Dechter 1999)

BUT, THESE METHODS ONLY WORK WHEN THE NUMBER OF CYCLES IS TRACTABLE


### Quite a few cycles

[Figure: the shifter net, whose graph contains many cycles.]


### Approximate inference

• Monte Carlo

• Markov chain Monte Carlo

• Variational techniques

• Local probability propagation

• Alternating maximizations

### Monte Carlo inference

• u = unobserved vars; o = observed vars

• Obtain a random sample u^(1), u^(2), …, u^(R) and use it to

– Represent P(u|o)

– Estimate an expectation, E[f] = Σ_u f(u) P(u|o)

E.g., P(ui=1|o) = Σ_u I(ui=1) P(u|o), where I(expr) = 1 if expr is true and I(expr) = 0 otherwise.


### Expectations from a sample

• From the sample u^(1), u^(2), …, u^(R), we can estimate

E[f] ≅ (1/R) Σ_r f(u^(r))

• If u^(1), u^(2), …, u^(R) are independent draws from P(u|o), this estimate

– is unbiased

– has variance proportional to 1/R
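A minimal sketch, assuming a sampler that returns independent draws from P(u|o):

```python
def mc_expectation(f, sampler, R=10_000):
    """Estimate E[f] = sum_u f(u) P(u|o) by (1/R) sum_r f(u^(r))."""
    return sum(f(sampler()) for _ in range(R)) / R

# e.g. P(u_i=1 | o) via the indicator f(u) = 1 if u[i] == 1 else 0
```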

### Rejection sampling

• Goal: hold o constant, draw u from P(u|o)

• Given:

– P*(u) ∝ P(u,o); can evaluate P*(u)
– B(u) ≥ P*(u); can evaluate B(u) and can "sample" from B(u)

• Draw u from the normalized form of B(u)

• Randomly accept u with probability P*(u)/B(u)

• Otherwise reject and draw again

[Figure: P*(u) lying under the envelope B(u).]

Efficiency is measured by the rejection rate.
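A generic sketch of the loop, assuming p_star and the envelope b can be evaluated pointwise and that b_sampler draws from the normalized form of B(u):

```python
import random

def rejection_sample(p_star, b, b_sampler):
    """Draw u with density proportional to p_star, given an envelope b >= p_star."""
    while True:
        u = b_sampler()                          # propose from normalized B(u)
        if random.random() < p_star(u) / b(u):   # accept with prob P*(u)/B(u)
            return u
```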


### Rejection sampling in the shifter net

• Choose P*(d,m,z) = P(d,m,z,y^t,y^{t+1})

• Choose B(d,m,z) = 1 ≥ P*(d,m,z)

• Draw d, m, z from the uniform distribution

• Randomly accept with probability P*(d,m,z)/B(d,m,z) = P(d,m,z,y^t,y^{t+1})

[Figures: two draws of (d, m, z) with y^t and y^{t+1} observed; one draw is rejected, and one is accepted.]


### Importance sampling

• Goal: holding o fixed, represent P(u|o) by a weighted sample

• Find P*(u) ∝ P(u,o) and Q*(u), such that P*(u)/Q*(u) can be evaluated and Q*(u) can be "sampled"

• Sample u^(1), u^(2), …, u^(R) from Q*(u)

• Compute weights w^(r) = P*(u^(r))/Q*(u^(r))

• Represent P(u|o) by { u^(r), w^(r)/Σ_j w^(j) }

• E.g., E[f] ≅ Σ_r f(u^(r)) w^(r) / Σ_j w^(j)

Accuracy is given by the "effective sample size".
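A sketch of the weighted-sample estimate, assuming q_sampler draws from the normalized form of Q*(u) and that P*(u)/Q*(u) can be evaluated:

```python
def importance_estimate(f, p_star, q_star, q_sampler, R=10_000):
    """Estimate E[f] under P(u|o) from draws u^(r) ~ Q* with weights P*/Q*."""
    us = [q_sampler() for _ in range(R)]
    ws = [p_star(u) / q_star(u) for u in us]     # w^(r) = P*(u^(r)) / Q*(u^(r))
    return sum(f(u) * w for u, w in zip(us, ws)) / sum(ws)
```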

### Importance sampling in the shifter net

• Choose P*(d,m,z) = P(d,m,z,y^t,y^{t+1})

• Choose Q*(d,m,z) = 1

• Draw (d,m,z)^(r) from the uniform distribution

• Weight (d,m,z)^(r) by P(d^(r),m^(r),z^(r),y^t,y^{t+1})

[Figures: two weighted draws with y^t and y^{t+1} observed.]

### A better Q-distribution

• Choose P*(d,m,z) = P(d,m,z,y^t,y^{t+1})

• Choose Q*(d,m,z) = P(d,m,z)

• Draw (d,m,z)^(r) from P(d,m,z)

• Weight (d,m,z)^(r) by P(y^t,y^{t+1}|d^(r),m^(r),z^(r))

This is called "likelihood weighting".


### Particle filtering (condensation)

(Isard, Blake, et al, et al, et al,…)

• Goal: use a sample S = {ut^(1), …, ut^(R)} from P(ut | o1,…,ot-1) to sample from P(ut | o1,…,ot)

• Weight each "particle" ut^(r) in S by P(ot | ut^(r))

• Redraw a sample S′ from the weighted sample

• For each particle ut in S′, draw ut+1 from P(ut+1 | ut)

Exact for infinite-size samples; for finite-size samples, it may lose modes.

[Graph: chain u1 → u2 → … → ut with observations o1, …, ot]

This is importance sampling at each step: P(ut | o1,…,ot) ∝ P(ot | ut) P(ut | o1,…,ot-1), with Q*(ut) = P(ut | o1,…,ot-1) and P*(ut) the right-hand side.
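A sketch of one update step, with obs_lik and dynamics_sample standing in for P(ot|ut) and P(ut+1|ut) (the names are assumptions for illustration):

```python
import random

def particle_filter_step(particles, o_t, obs_lik, dynamics_sample):
    """One condensation step: weight, resample, then propagate each particle."""
    weights = [obs_lik(o_t, u) for u in particles]   # w^(r) = P(o_t | u_t^(r))
    resampled = random.choices(particles, weights=weights, k=len(particles))
    return [dynamics_sample(u) for u in resampled]   # u_{t+1} ~ P(. | u_t)
```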

### Condensation for active contours

(Blake and Isard, Springer-Verlag 1998)

Particle filtering applied to the active contour model: unobserved state P(ut|ut-1) with ut = control points of the spline (contour), and observations P(ot|ut) given, for all measurement lines, by the number of edges and the distances of the edges from the contour.
