Vision by Inference and Learning in Graphical Models

(1)

Vision by Inference and Learning in Graphical Models

Brendan Frey

(www.cs.uwaterloo.ca/~frey)

University of Waterloo

University of Illinois at Urbana-Champaign
Microsoft Research

Acknowledgements for comments and suggestions

• P. Anandan

• G. Bradski

• D. Fleet

• D. Heckerman

• T. Huang

• N. Jojic

• R. Szeliski

(2)

Approaching vision as probabilistic inference

• Input: x = vector of pixel intensities
• Class: c = class index 1, 2, …, C

Graphics: compute P(x|c). Vision: compute P(c|x) using Bayes rule:

P(c|x) = a P(x|c)P(c),   a = 1 / Σ_c P(x|c)P(c)

Example: P(x|c) Gaussian

• P(x|c) = (2πσ_c^2)^(-1/2) exp[ -(x - µ_c)^2 / 2σ_c^2 ]

[Figure: P(x|c=1)P(c=1) and P(x|c=2)P(c=2) plotted against x; the values of c that maximize P(c|x) = a P(x|c)P(c) divide the x axis into regions labeled 1, 2, 1.]
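To make the Bayes-rule computation concrete, here is a minimal Python sketch (not from the slides) that evaluates the class posterior P(c|x) for scalar Gaussian class-conditionals; the priors, means and variances are made-up illustrative numbers.

```python
import math

def gaussian(x, mu, var):
    """Class-conditional density P(x|c) for a scalar Gaussian."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior(x, priors, means, variances):
    """Vision step: P(c|x) = a P(x|c) P(c), with a = 1 / sum_c P(x|c) P(c)."""
    joint = [gaussian(x, m, v) * p for p, m, v in zip(priors, means, variances)]
    a = 1.0 / sum(joint)
    return [a * j for j in joint]

# Made-up example: two classes with equal priors and different means.
print(posterior(x=0.8, priors=[0.5, 0.5], means=[0.0, 1.0], variances=[0.25, 0.25]))
```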

(3)


Examples: Image input

• P(Fred | Image)

• P(Happy | Image)

• P(Happy | Image, Fred)

• P(Fred | Image, Happy)

• …

Examples: Video input

• P(User wants mouse click | Video)

• P(Pixel i from layer L at time t | Video)

• P(Shape of objects at time t | Video)

• P(Appearance of objects at time t | Video)

• P(Position of objects at time t | Video)

• P(shape, appearance, positions of objects at time t | Video)

• …

(4)

Optimal decision making

• If P(x|c) and P(c) are correct, picking cMAP = argmax_c P(x|c)P(c) minimizes the number of classification errors

• If U(c,c*) is the utility of picking class c* when the true class is c, use cMEU = argmax_c* Σ_c U(c,c*) P(x|c)P(c)

• A probabilistic inference P(xi | observed x’s) gives optimal decisions for xi
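As a hedged illustration of the difference between the two rules, the sketch below picks cMAP = argmax_c P(x|c)P(c) and cMEU = argmax_c* Σ_c U(c,c*)P(x|c)P(c) for a made-up two-class problem; the joint values and the utility matrix are invented for illustration only.

```python
def map_decision(joint):
    """cMAP = argmax_c P(x|c) P(c); joint[c] holds P(x|c) P(c)."""
    return max(range(len(joint)), key=lambda c: joint[c])

def meu_decision(joint, utility):
    """cMEU = argmax_{c*} sum_c U(c, c*) P(x|c) P(c); utility[c][c_star] = U(c, c*)."""
    n = len(joint)
    expected = [sum(utility[c][c_star] * joint[c] for c in range(n)) for c_star in range(n)]
    return max(range(n), key=lambda c_star: expected[c_star])

# Made-up numbers: class 1 is more probable, but mistaking class 0 for class 1 is very costly.
joint = [0.4, 0.6]
utility = [[0.0, -10.0],
           [-1.0, 0.0]]
print(map_decision(joint))            # 1
print(meu_decision(joint, utility))   # 0: expected utility overrides raw probability
```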

Generative models

• We suppose that observations are the result of a structured generative process on system variables x1, x2, …, xN

• A generative model is a density model P(x1, x2, …, xN)

(5)

Burglar alarm problem (Pearl 86)

Burglar: b=0 no burglar; b=1 burglar
Alarm: a=0 no alarm; a=1 alarm rings

P(a,b) = P(b)P(a|b)    Eg, P(a=1|b=1) = 0.8

EARTHQUAKES also trigger the alarm
Earthquake: e=0 no quake; e=1 quake

P(a,b,e) = P(e)P(b)P(a|b,e)

Useful questions about P(a,b,e) = P(e)P(b)P(a|b,e)

• Under P, are b and e independent?
• Under P, are b and e independent given a?
• Probabilistic inferences: P(b=1|a=1) = ?, P(b=1|a=1,e=0) = ?, P(b=1|a=1,e=1) = ?

(6)

P(d,m,z,yt,yt+1) = P(d) P(m) Π_i P(zi|d) Π_i P(yit|zi) Π_i P(yit+1|zi-1,zi,m)

Shifter problem: Patches in motion

[Figure: examples of 6 x 1 patches from a video sequence at times t and t+1, labeled Still / Right and Sparse / Dense, together with the corresponding noise-free patches, which are easier to explain.]

• d = density (0=sparse, 1=dense)

• zi = noise-free intensity of pixel i at time t

• m = motion (0=still, 1=right)

• yit = noisy, observed intensity of pixel i at time t

Useful questions about P(d,m,z,yt,yt+1) = P(d) P(m) Π_i P(zi|d) Π_i P(yit|zi) Π_i P(yit+1|zi-1,zi,m)

• Σ_d Σ_m Σ_z Σ_yt Σ_yt+1 P(d,m,z,yt,yt+1) = 1 ?
• Under P, does yt depend on m ?
• Probabilistic inferences: P(m|yt,yt+1) = ?, P(d|yt,yt+1) = ?, P(yt+1|yt) = ?

(7)


PART I

BAYESIAN NETWORKS

(directed graphical models)

• MAY be constructed using causal relationships between variables

• Quickly conveys the factorization of a distribution

• By construction, implies the distribution is normalized

• Clearly expresses dependencies and independencies between variables

• Can be used to derive fast inference algorithms

Bayesian network

(8)

Causal construction of burglar net

e → a ← b

• Assuming earthquakes don’t cause burglaries, e is not connected to b
• Earthquakes and burglars trigger the alarm, so e and b are connected to a

Causal construction of shifter net

[Figure: d points to the noise-free pixels z0,…, z6; each zi points to the observed pixel yit at time t; zi-1, zi and m point to the observed pixel yit+1 at time t+1.]

(9)

Definition of a Bayes net

• Directed graph: no cycles when following arrows (“DAG”)
• Unique variable associated with each node
• For each node, a conditional distribution: P(child variable | parent variables)
• Defines a joint distribution: P(x1,…, xN) = Π_i P(xi | parents of xi)

Conditional probabilities in burglar net

e → a ← b

P(a,b,e) = P(e)P(b)P(a|e,b)

P(e=1) = .01
P(b=1) = .1
P(a=1|e=0,b=0) = .001
P(a=1|e=0,b=1) = .8
P(a=1|e=1,b=0) = .9
P(a=1|e=1,b=1) = .98
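The conditional probabilities above are enough to answer the earlier queries by brute-force enumeration. A minimal sketch (not from the slides; it simply sums the joint P(a,b,e) = P(e)P(b)P(a|e,b) over the unobserved variables):

```python
from itertools import product

P_e1 = 0.01
P_b1 = 0.1
P_a1 = {(0, 0): 0.001, (0, 1): 0.8, (1, 0): 0.9, (1, 1): 0.98}   # P(a=1 | e, b)

def joint(a, b, e):
    """P(a,b,e) = P(e) P(b) P(a|e,b)."""
    p = (P_e1 if e else 1 - P_e1) * (P_b1 if b else 1 - P_b1)
    return p * (P_a1[(e, b)] if a else 1 - P_a1[(e, b)])

def prob_b1(evidence):
    """P(b=1 | evidence), where evidence maps variable names to observed values."""
    num = den = 0.0
    for a, b, e in product([0, 1], repeat=3):
        if any({'a': a, 'b': b, 'e': e}[k] != v for k, v in evidence.items()):
            continue
        den += joint(a, b, e)
        num += joint(a, b, e) if b == 1 else 0.0
    return num / den

print(prob_b1({'a': 1}))           # about 0.90
print(prob_b1({'a': 1, 'e': 1}))   # much smaller: the earthquake "explains away" the alarm
```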

(10)

[Figure: the shifter Bayes net, with d pointing to z0,…, z6, each zi pointing to yit, and zi-1, zi, m pointing to yit+1.]

A distribution in the shifter net

P(y2t+1=1 | z1=0, z2=0, m) = .01
P(y2t+1=1 | z1, z2=1, m=0) = .99
P(y2t+1=1 | z1, z2=0, m=0) = .01
P(y2t+1=1 | z1=1, z2, m=1) = .99
P(y2t+1=1 | z1=0, z2, m=1) = .01

Direct bonuses of Bayes nets

• The Markov blanket MB[xi] for variable xi can be read off the graph, where P(xi | MB[xi], other vars) = P(xi | MB[xi])
• Simulating P(x1,…, xN) is easy
• Normalization: Σ_x1 … Σ_xN [ Π_i P(xi | parents of xi) ] = 1

(11)

Simulating the shifter net

• Sample d from P(d)
• Sample z0 from P(z0|d)
• Sample y1t from P(y1t|z1)
• Sample m from P(m)
• Sample y1t+1 from P(y1t+1|z0,z1,m)
• …

Work “down” the network (the shifter net shown earlier).
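A hedged sketch of this “work down the network” (ancestral) sampling, using the burglar-net conditionals given earlier rather than the shifter net, simply to keep the code short; the shifter net would be sampled the same way, parents before children.

```python
import random

P_a1 = {(0, 0): 0.001, (0, 1): 0.8, (1, 0): 0.9, (1, 1): 0.98}   # P(a=1 | e, b)

def sample_burglar_net(rng=random):
    """Ancestral sampling: draw each variable from P(variable | already-sampled parents)."""
    e = 1 if rng.random() < 0.01 else 0            # e ~ P(e)
    b = 1 if rng.random() < 0.1 else 0             # b ~ P(b)
    a = 1 if rng.random() < P_a1[(e, b)] else 0    # a ~ P(a | e, b)
    return e, b, a

samples = [sample_burglar_net() for _ in range(10000)]
print(sum(a for _, _, a in samples) / len(samples))   # Monte Carlo estimate of P(a=1)
```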

Markov blankets

• What is the smallest set of variables that

“isolates” a variable xi from the other variables in the network?

• The Markov blanket, MB[xi]:

P(xi| MB[xi], X \ {xi} \ MB[xi]) = P(xi| MB[xi])

• If set S does not contain MB[xi], P(xi| S, X \ {xi} \ S) ≠ P(xi| S)

(12)


A Markov blanket in the shifter net

MB[z6] = {d, y6t, y6t+1, m, z5}

[Figure: the shifter net with the Markov blanket of z6 highlighted.]

Pruning Bayes nets

• For variables x1,…, xN, b, suppose b does not have children
• If we delete node b and its edges, the resulting network describes P(x1,…, xN) = Σ_b P(x1,…, xN, b)

So, Σ_x1 … Σ_xN [ Π_i P(xi | parents of xi) ] = 1

(13)

Pruning the shifter net

[Figure: the observed pixels y1t+1,…, y6t+1 are childless and are pruned one at a time; then m and z0 become childless and are pruned, leaving a net over d, the zi, and y1t,…, y6t.]

P(d,z,yt) = P(d) Π_i P(zi|d) Π_i P(yit|zi)

Use this simpler net to make inferences about d, z and yt

Noncausal constructions

Reasons for noncausal constructions

• System is not causal

• System too complex for causal construction

• For computational efficiency, a noncausal net is preferable

(14)


Procedure for noncausal construction

• Order the variables (eg, at random)

• Add the variables, one at a time

• Make the current variable a child of all previously added variables

• Delete as many edges as possible, reducing the number of parents for the current variable

The last step requires probing the

physical system or answering queries

A noncausal construction of the burglar net, order a, e, b

P(a,b,e) = P(a)P(e|a)P(b|a,e)

• P(e|a) ≠ P(e), so leave a→e
• P(b|a,e) ≠ P(b), P(b|a,e) ≠ P(b|a), P(b|a,e) ≠ P(b|e), so leave e→b and a→b

(15)

Causal construction: P(a,b,e) = P(e)P(b)P(a|e,b). Are e and b independent? YES.
Non-causal construction: P(a,b,e) = P(a)P(e|a)P(b|a,e). Are e and b independent? CAN’T TELL!

Conditional independencies

• Is xA independent of xB given xS?

Is P(xA, xB| xS) = P(xA | xS) P(xB | xS) ?

• YES, if every path from xA to xB is BLOCKED

• A path can be blocked in 3 ways:

[Figure: the three blocking patterns; in cases 1 and 2 the path passes through an observed node xS, and in case 3 the path meets head-to-head at a node of which xS is not a descendent.]

(16)

Independencies in the shifter net

• P(yt,m) = P(yt) P(m)
• P(d,m) = P(d) P(m)
• P(d,m|yt) = P(d|yt) P(m|yt)
• P(d,m|yt+1) ≠ P(d|yt+1) P(m|yt+1)

[Figure: the shifter net.]

“Extreme” Bayes nets

• Factorized model (no edges): P(x) = Π_i P(xi)
• Unstructured model (fully connected): P(x) = Π_i P(xi | x1,…, xi-1), always true from the chain rule of probability

(17)

Mixture model (Naive Bayes)

c → x1,…, x6, with c discrete

P(x,c) = P(c) Π_i P(xi|c)
P(x) = Σ_c P(x,c)

SHORTHAND: c → x, P(x,c) = P(c)P(x|c)

Mixture of Gaussians

c → z, with c discrete and z continuous

P(c) = πc
P(z|c) = N(z; µc, Φc)

[Figure: example with π1 = 0.6, π2 = 0.4; the means µ1, µ2 and variances diag(Φ1), diag(Φ2) are shown as images, along with a sampled image z.]

(18)

Transformed mixture of Gaussians
(Frey and Jojic, CVPR 1999)

c → z → x ← s, with c and the shift s discrete, z and x continuous

P(c) = πc
P(s) = prior over shifts
P(x|z,s) = N(x; shift(z,s), Ψ), with Ψ diagonal

[Figure: example with π1 = 0.6, π2 = 0.4; the means µ1, µ2, variances diag(Φ1), diag(Φ2), a latent image z, a shift s, and an observed image x are shown as images.]

Layered appearance model
(Frey, CVPR 2000)

[Figure: for each layer l, ordered from far to near, the variables are the index of the object in layer l and the intensity of ray n at layer l; the intensity of ray n at the camera is observed.]

(19)

Multiview layered model
(variant of Torr, Szeliski, Anandan, CVPR 1999)

• Θ: params of the layer planes
• L(x,y): layer of the pixel at x,y in the 1st view
• z(x,y): depth in the 1st view of the pixel at x,y in the 1st view
• I(x,y): vector of pixel intensity differences between the other views and the pixel intensity in the 1st view at x,y

Dynamic Bayes nets

• Just a Bayes net for time-series data

(20)

Markov model

z1 → z2 → … → zt-1 → zt → zt+1

MB[zt] = {zt-1, zt+1}
P(zt | z1, z2,…, zt-1, zt+1,…) = P(zt | zt-1, zt+1)
P(zt | z1, z2,…, zt-1) = P(zt | zt-1)

Hidden Markov model

z1 → z2 → … → zt-1 → zt → zt+1, with xt observed from zt at each step

• zt discrete, P(zt|zt-1) = “transition matrix”
• xt discrete or continuous, eg P(xt|zt) = Normal(xt; µzt, Czt)

(21)

Linear dynamic system (Kalman filter model)

z1 → z2 → … → zt-1 → zt → zt+1, with xt observed from zt at each step

• zt continuous, P(zt|zt-1) Gaussian
• xt continuous, P(xt|zt) Gaussian

Transformed hidden Markov model

(Jojic, Petrovic, Frey and Huang, CVPR 2000)

[Figure: two time slices (t-1 and t), each containing the transformed mixture of Gaussians variables c, z, s and x, with the hidden variables linked across time.]

(22)

Active contour model
(Blake and Isard, Springer-Verlag 1998)

• Unobserved state: ut = control points of the spline (contour); the dynamics P(ut|ut-1) are linear (Gaussian)
• Observation: ot = for each measurement line, the number of edges and the distance of the edges from the contour; the observation model P(ot|ut) is nonlinear
• Measurement lines are placed at fixed intervals along the contour

u1 → u2 → … → ut-1 → ut, with ot observed from ut at each step

3D body tracking model

(Sidenbladh, Black, Fleet, ECCV 2000)

• Goal: track a 3D articulated model under perspective projection from a monocular sequence in an unknown environment, using a motion-based likelihood
• State: φt = joint angles and body pose, Vt = joint/pose velocities, At = appearance model
• Observations: It = image at time t

[Figure: dynamic Bayes net linking At-1, Vt-1, φt-1 to At, Vt, φt across time, with the image It generated at each time step.]

(23)

Switching mixture of state-space models
(Ghahramani and Hinton, 1997)

[Figure: a discrete switch st selects between system A and system B; the states x0, x1, x2 of each system evolve over time, and the measurements y0, y1, y2 are generated from the selected system.]

Mixed-state dynamic Bayes net

(Pavlovic, Frey and Huang, CVPR 1999)

• Uses a discrete-state HMM to drive continuous dynamics (Kalman filtering)

[Figure: the discrete chain s0 → s1 → s2 (decision/action/mode) drives the continuous chain x0 → x1 → x2 (state of dynamics), which generates the measurements y0, y1, y2.]

(24)


Easy-living net

Microsoft, 2002

PART II

UNDIRECTED

GRAPHICAL MODELS

(25)

Markov random field (MRF)

• Undirected graph on variables
• Graph gives Markov blankets: the Markov blanket of a variable is the variable’s neighbors

[Figure: an undirected grid of variables; the highlighted nodes indicate the variables in the Markov blanket for z.]

The distribution for an MRF

• If P(x1,…, xN) ≠ 0 for all configs of x1,…, xN, then P(x1,…, xN) can be expressed as

P(x1,…, xN) = α Π_c φc({xi : i ∈ Qc})

• c indexes the maximal cliques
• Qc is the set of the variables in clique c
• φc( ) is a strictly positive function (potential) on the variables in clique c
• α is a normalizing constant
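A minimal sketch (not from the slides) of this definition on three binary variables with two maximal cliques; the agreement-favouring potentials are made-up, and α is computed by summing over every configuration.

```python
from itertools import product

def phi(a, b):
    """Made-up clique potential that favours neighbouring variables agreeing."""
    return 2.0 if a == b else 1.0

def unnormalized(x1, x2, x3):
    """Product of potentials over the maximal cliques {x1,x2} and {x2,x3}."""
    return phi(x1, x2) * phi(x2, x3)

alpha = 1.0 / sum(unnormalized(*x) for x in product([0, 1], repeat=3))

def p(x1, x2, x3):
    """P(x1,x2,x3) = alpha * phi(x1,x2) * phi(x2,x3)."""
    return alpha * unnormalized(x1, x2, x3)

print(p(0, 0, 0), sum(p(*x) for x in product([0, 1], repeat=3)))   # second number is 1.0
```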

(26)

Burglar MRF

[Figure: fully connected undirected graph on e, b and a.]

• 1 maximal clique: Q1 = {e,b,a}
• Clique potential: φ1(e,b,a)
• Distribution: P(e,b,a) = α φ1(e,b,a)
• Are e and b independent? CAN’T TELL!

Line processes

(Geman and Geman)

[Figure: 2 x 2 maximal cliques of binary pixels. Patterns that are consistent with a continuing line are given high potential φ, and patterns that break or isolate a line are given low φ. Under P(), lines are probable.]

(27)

Markov network for image and scene patch modeling
(from Freeman and Pasztor, ICCV 1999)

[Figure: a grid MRF over scene patches, with each scene patch connected to the corresponding observed image patch.]

Bayes net – MRF hybrids

• Suppose we have an MRF for x, with distribution PMRF(x)

• Suppose we have a Bayes net for z, with distribution PBN(z)

• Then, we can add directed edges connecting variables in x to variables in z, creating a modified Bayes net, PBN(z|x)
• The joint distribution is PBN(z|x) PMRF(x)

(28)

BN-MRF hybrid multiview layered model
(variant of Torr, Szeliski, Anandan, CVPR 1999)

• Θ: params of the layer planes
• L(x,y): layer of the pixel at x,y in the 1st view
• z(x,y): depth in the 1st view of the pixel at x,y in the 1st view
• I(x,y): vector of pixel intensity differences between the other views and the pixel intensity in the 1st view at x,y

Factor graphs

(Kschischang, Frey, Loeliger, subm to IEEE Trans IT)

• Bipartite graph: variable nodes and function nodes

• A local function is associated with each function node – this function depends on the neighboring variables

• The global function is given by the product of the local functions

(29)

Burglar factor graphs

[Figure: two factor graphs on e, b and a. Left: a single function node P(e,a,b) connected to all three variables, giving no independencies (like the MRF). Right: function nodes P(e), P(b) and P(a|e,b), giving the same independencies as the Bayes net.]

Converting an MRF to a factor graph

• Create variable nodes

• Create one function node for each maximal clique in the MRF

• Connect each function node to the variables in the corresponding clique

• Set the function associated with each function node to the corresponding clique potential

• Global function = MRF distribution

• Each MRF has a unique factor graph

• Different factor graphs may have the same MRF

(30)


Converting a Bayes net to a factor graph

• Create variable nodes

• For each variable, create one function node and connect it to the variable

• Connect each function node to the parents of the corresponding variable

• Set the function associated with each function node to the corresponding conditional pdf in the Bayes net

• Global function = Bayes net distribution

• If the child of each local function is indicated (eg, with an arrow), the resulting factor graph has the same semantics as a Bayes net

PART III

INFERENCE

(31)

Probabilistic inference

Recall: for a correct generative model P(x1, x2, …, xN), the probabilistic inference P(xi | observed x’s) gives optimal decisions for xi

Inference: Mixture of Gaussians

P(c) = πc
P(z|c) = N(z; µc, Φc)
P(c|z) = P(z|c)P(c) / Σ_c P(z|c)P(c)

[Figure: the mixture of Gaussians from before (π1 = 0.6, π2 = 0.4, means µ1, µ2, variances diag(Φ1), diag(Φ2)); for one observed image z the posterior is P(c=1|z) = .9, P(c=2|z) = .1, and for another it is P(c=1|z) = .2, P(c=2|z) = .8.]

(32)

Inference: Transformed mixture of Gaussians

P(x,z,s,c) = P(x|z,s) P(s) P(z|c) P(c)

[Figure: the transformed mixture of Gaussians net (c and s discrete; z and x linear Gaussian), with π1 = 0.6, π2 = 0.4 and the means µ1, µ2 and variances diag(Φ1), diag(Φ2) shown as images; for an observed image x, the MAP values cMAP = 1, sMAP and zMAP are shown.]

General “brute force” inference

• Suppose x1, x2, …, xN are binary

P(x1) = Σ_x2 Σ_x3 … Σ_xN P(x1, x2, …, xN)

• This takes about 2^N operations
• Generally, computing P(xi | observed x’s) takes 2^(N - #observed x’s) operations

(33)

Inference in Bayes nets

[Figure: the shifter net with the pixels at times t and t+1 observed; inference gives P(m=1|Obs) = 0.8, P(d=1|Obs) = 0.2, and the posteriors P(zi|Obs).]

Observed parents “abandon” children

• We can remove the edges connecting observed parents to their children

[Figure: node o with parents a1, a2 and children c1, c2. After the observation o = o’, the edges from o to c1 and c2 are removed: P(o|a1,a2) becomes the fixed factor P(o=o’|a1,a2), and P(c1|o), P(c2|o) become P(c1|o=o’), P(c2|o=o’).]

(34)


Sum-product algorithm

(probability propagation, forward-backward algorithm) (Gallager 1963; Pearl 1986; Lauritzen & Spiegelhalter 1986; …)

• Suppose we have a graphical model for discrete variables x1, x2, …, xN

• If the graphical model is a tree (or

“close” to being a tree), the sum-product algorithm can compute

P(xi|Observed x’s) for all xi in LINEAR TIME

Example: Discrete Markov model

A → B → C → D → E

P(A,B,C,D,E) = P(E|D)P(D|C)P(C|B)P(B|A)P(A)

P(E) = Σ_D Σ_C Σ_B Σ_A P(E|D)P(D|C)P(C|B)P(B|A)P(A)
     = Σ_D P(E|D) Σ_C P(D|C) Σ_B P(C|B) [ Σ_A P(B|A)P(A) ]
     = Σ_D P(E|D) Σ_C P(D|C) [ Σ_B P(C|B) f(B) ]       where f(B) = Σ_A P(B|A)P(A)
     = Σ_D P(E|D) [ Σ_C P(D|C) g(C) ]                   where g(C) = Σ_B P(C|B) f(B)
     = Σ_D P(E|D) h(D)                                  where h(D) = Σ_C P(D|C) g(C)
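The same computation as code: a hedged sketch that pushes each sum inward for a 2-state chain; the prior and the (shared) transition matrix are made-up numbers.

```python
# T[x][y] = P(next = y | current = x); the same made-up 2-state matrix is reused on every edge.
P_A = [0.6, 0.4]
T = [[0.9, 0.1],
     [0.2, 0.8]]

def push(message, trans):
    """One sum-product step: out[y] = sum_x trans[x][y] * message[x]."""
    return [sum(trans[x][y] * message[x] for x in range(len(message)))
            for y in range(len(trans[0]))]

f = push(P_A, T)    # f(B) = sum_A P(B|A) P(A)
g = push(f, T)      # g(C) = sum_B P(C|B) f(B)
h = push(g, T)      # h(D) = sum_C P(D|C) g(C)
P_E = push(h, T)    # P(E) = sum_D P(E|D) h(D)
print(P_E, sum(P_E))   # P(E) is a proper distribution: the entries sum to 1
```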

(35)


General sum-product algorithm

• Messages: short vectors of numbers;

interpret as functions of discrete vars

• Messages flow in both directions on each edge

• Initially, all messages are set to 1

• Messages are updated randomly or in a given order

• Messages are fused to compute (or approximate) P(xi|Observed x’s)

Passing messages in Bayes nets

[Figure: node c with parents a and b and children d and e; messages flow in both directions on every edge.]

Against the edge (message from c to parent a): f(a) = Σ_{c,b} h(c) q(c) P(c|a,b) g(b)

With the edge (message from c toward a child): f(c) = Σ_{a,b} q(c) P(c|a,b) g(a) h(b)

Fusion: P(c|o) ∝ Σ_{a,b} h(c) q(c) P(c|a,b) f(a) g(b)

Each message is a function of its parent

(36)

Propagating observations in Bayes nets

Observation: c = c’

Against the edge: f(a) = Σ_b h(c’) q(c’) P(c’|a,b) g(b)

With the edge: f(c’) = Σ_{a,b} q(c’) P(c’|a,b) g(a) h(b), and f(c) = 0 for c not equal to c’

Passing messages in factor graphs

• Much simpler than in Bayes nets!
• Bayes nets and MRFs can be converted to factor graphs really easily
• Each message is a function of its neighboring variable

Out of a variable: f(a) = g(a) h(a)

Out of a function, with local function φ(a,b,c): f(a) = Σ_b Σ_c φ(a,b,c) g(b) h(c)

Fusion: P(a|o) ∝ f(a) g(a) h(a)
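A hedged sketch (not from the slides) of the two factor-graph update rules for binary variables, with a single made-up local function φ(a,b,c) and made-up incoming messages:

```python
from itertools import product

def out_of_variable(incoming):
    """Message out of a variable node: elementwise product of the other incoming messages."""
    out = [1.0] * len(incoming[0])
    for msg in incoming:
        out = [o * m for o, m in zip(out, msg)]
    return out

def out_of_function(phi, g_b, h_c):
    """Message out of a function node: f(a) = sum_b sum_c phi(a,b,c) g(b) h(c)."""
    return [sum(phi(a, b, c) * g_b[b] * h_c[c] for b, c in product([0, 1], repeat=2))
            for a in [0, 1]]

phi = lambda a, b, c: 1.5 if a == b == c else 1.0    # made-up local function
f_a = out_of_function(phi, g_b=[0.7, 0.3], h_c=[0.4, 0.6])
belief = out_of_variable([f_a, [0.5, 0.5]])          # fuse with another incoming message
print(f_a, [b / sum(belief) for b in belief])        # unnormalized message, normalized belief
```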

(37)

Result of fusion

• Unobserved variables: u1,…, uK
• Observed variables: o
• Fusion at ui estimates P(ui,o) = Σ_{u1,…,ui-1,ui+1,…,uK} P(u1,…, uK, o)
• Local normalization: P(ui|o) = P(ui,o) / Σ_ui P(ui,o)

Properties of sum-product

• Exact in trees

• Computationally efficient even for
  – linear Gaussian vars, without discrete children
  – observed real vars with discrete parents
• Some applications:
  – Error-correcting decoding (trellis codes)
  – Speech recognition (HMM is a tree)
  – Kalman tracking (LDS net is a tree)
  – Multiscale smoothing (tree on image)
  – …

(38)

Max-product (Viterbi) algorithm

• Replace SUM with MAX in the sum-product algorithm
• Max-product computes Φ(ui) = max_{u1,…,ui-1,ui+1,…,uK} P(u1,…, uK, o)
• MAP configuration: uiMAP = argmax_ui Φ(ui)
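A hedged sketch of the max-product idea on a small 2-state chain with observations (a Viterbi-style pass with back-pointers); all numbers are made-up.

```python
# Replace SUM with MAX and keep back-pointers so the MAP configuration can be read off.
prior = [0.6, 0.4]
trans = [[0.9, 0.1], [0.2, 0.8]]   # P(z_t | z_{t-1})
emit = [[0.8, 0.2], [0.3, 0.7]]    # P(x_t | z_t)
obs = [0, 0, 1, 1]                 # made-up observation sequence

phi = [prior[z] * emit[z][obs[0]] for z in range(2)]
backptr = []
for x in obs[1:]:
    step, new_phi = [], []
    for z in range(2):
        best = max(range(2), key=lambda zp: phi[zp] * trans[zp][z])
        step.append(best)
        new_phi.append(phi[best] * trans[best][z] * emit[z][x])
    backptr.append(step)
    phi = new_phi

path = [max(range(2), key=lambda z: phi[z])]   # best final state
for step in reversed(backptr):
    path.append(step[path[-1]])
path.reverse()
print(path)                                    # MAP state sequence
```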

What if graph is not a tree?

• Can “cluster” variables

• Can convert the graph to a “join tree”

(Lauritzen and Spiegelhalter 1988)

• Can use “bucket elimination” (Dechter 1999)

BUT, THESE METHODS ONLY WORK WHEN THE NUMBER OF

CYCLES IS TRACTABLE

(39)

Lots of systems are best described by nets with an intractable number of cycles

[Figure: the shifter net, whose underlying undirected structure already contains quite a few cycles.]

(40)

TOO MANY CYCLES!

(41)

Intractable local computations

Even if the graph is a tree, the local functions (conditional probabilities, potentials) may not yield tractable sum-product computations

• Eg, non-Gaussian pdfs

Active contour model
(Blake and Isard, Springer-Verlag 1998)

• Unobserved state: ut = control points of the spline (contour); the dynamics P(ut|ut-1) are linear (Gaussian)
• Observation: ot = for each measurement line, the number of edges and the distance of the edges from the contour; the observation model P(ot|ut) is nonlinear
• Measurement lines are placed at fixed intervals along the contour

u1 → u2 → … → ut-1 → ut, with ot observed from ut at each step

(42)


Approximate inference

• Monte Carlo

• Markov chain Monte Carlo

• Variational techniques

• Local probability propagation

• Alternating maximizations

Monte Carlo inference

• u = unobserved vars; o = observed vars
• Obtain a random sample u(1), u(2), …, u(R) and use it to
  – Represent P(u|o)
  – Estimate an expectation, E[f] = Σ_u f(u) P(u|o)
    Eg, P(ui=1|o) = Σ_u I(ui=1) P(u|o), where I(expr) = 1 if expr is true and 0 otherwise

(43)

Expectations from a sample

• From the sample u(1), u(2), …, u(R), we can estimate E[f] ≅ (1/R) Σ_r f(u(r))
• If u(1), u(2), …, u(R) are independent draws from P(u|o), this estimate
  – is unbiased
  – has variance proportional to 1/R

Rejection sampling

• Goal: hold o constant, draw u from P(u|o)
• Given P*(u) ∝ P(u,o): can evaluate P*(u)
• Given B(u) ≥ P*(u): can evaluate B(u) and can “sample” from the normalized form of B(u)
• Draw u from the normalized form of B(u)
• Randomly accept u with prob P*(u)/B(u)
• Otherwise reject and draw again

[Figure: a 1-D example with the bound B(u) lying above P*(u).]

Efficiency measured by rejection rate
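A hedged 1-D illustration of the recipe (not from the slides): P*(u) is a made-up unnormalized bimodal density on [0, 1], the bound B(u) is a constant, and a draw is kept with probability P*(u)/B(u).

```python
import math
import random

def p_star(u):
    """Made-up unnormalized target density on [0, 1]."""
    return math.exp(-((u - 0.2) ** 2) / 0.005) + 0.5 * math.exp(-((u - 0.7) ** 2) / 0.01)

B = 1.1   # constant bound with B >= p_star(u) everywhere on [0, 1]

def rejection_sample(rng=random):
    while True:
        u = rng.random()                      # draw u from the normalized form of B(u) (uniform)
        if rng.random() < p_star(u) / B:      # accept with probability P*(u)/B(u)
            return u
        # otherwise reject and draw again

samples = [rejection_sample() for _ in range(5000)]
print(sum(samples) / len(samples))            # Monte Carlo estimate of E[u]
```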

(44)

Rejection sampling in the shifter net

• Choose P*(d,m,z) = P(d,m,z,yt,yt+1)
• Choose B(d,m,z) = 1 ≥ P*(d,m,z)
• Draw d,m,z from the uniform distribution
• Randomly accept with probability P*(d,m,z)/B(d,m,z) = P(d,m,z,yt,yt+1)

[Figure: two draws for an observed pair of patches; a configuration that explains the data poorly is rejected, while one that explains it well is accepted.]

(45)

Importance sampling

• Goal: holding o fixed, represent P(u|o) by a weighted sample
• Find P*(u) ∝ P(u,o) and Q*(u), such that we can evaluate P*(u)/Q*(u) and can “sample” from Q*(u)
• Sample u(1), u(2), …, u(R) from Q*(u)
• Compute weights w(r) = P*(u(r))/Q*(u(r))
• Represent P(u|o) by { u(r), w(r)/(Σ_j w(j)) }
• Eg E[f] ≅ Σ_r f(u(r)) w(r)/(Σ_j w(j))

Accuracy given by “effective sample size”
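A hedged 1-D sketch of the same recipe with a uniform proposal Q*(u) = 1 (so the weight is just P*(u)); the target is a made-up unnormalized density, and the last line reports the usual effective-sample-size diagnostic.

```python
import math
import random

def p_star(u):
    """Made-up unnormalized target density on [0, 1]."""
    return math.exp(-((u - 0.2) ** 2) / 0.005) + 0.5 * math.exp(-((u - 0.7) ** 2) / 0.01)

R = 5000
draws = [random.random() for _ in range(R)]    # samples from Q*(u) = 1 (uniform)
weights = [p_star(u) for u in draws]           # w(r) = P*(u(r)) / Q*(u(r))
total = sum(weights)

mean_u = sum(u * w for u, w in zip(draws, weights)) / total   # E[f] with f(u) = u
ess = total ** 2 / sum(w ** 2 for w in weights)               # effective sample size
print(mean_u, ess)
```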

Importance sampling in the shifter net

• Choose P*(d,m,z) = P(d,m,z,yt,yt+1)
• Choose Q*(d,m,z) = 1
• Draw (d,m,z)(r) from the uniform distribution
• Weight (d,m,z)(r) by P(d(r),m(r),z(r),yt,yt+1)

[Figure: a sampled configuration that explains the observed patches poorly gets low weight.]

(46)

Importance sampling in the shifter net (continued)

[Figure: with the same P* and Q*, a sampled configuration that explains the observed patches well gets high weight.]

A better Q-distribution

• Choose P*(d,m,z) = P(d,m,z,yt,yt+1)
• Choose Q*(d,m,z) = P(d,m,z)
• Draw (d,m,z)(r) from P(d,m,z)
• Weight (d,m,z)(r) by P(yt,yt+1|d(r),m(r),z(r))

Called “likelihood weighting”

(47)

Particle filtering (condensation)
(Isard, Blake, et al, et al, et al,…)

• Goal: use a sample S = ut(1),…, ut(R) from P(ut|o1,…, ot-1) to sample from P(ut|o1,…, ot)
• Weight each “particle” ut(r) in S by P(ot|ut(r))
• Redraw a sample S’ from the weighted sample
• For each particle ut in S’, draw ut+1 from P(ut+1|ut)

Exact for infinite-size samples; for finite-size samples, it may lose modes

u1 → u2 → … → ut-1 → ut, with ot observed from ut at each step

P(ut|o1,…, ot) ∝ P(ot|ut) P(ut|o1,…, ot-1)

Here Q*(ut) = P(ut|o1,…, ot-1) and P*(ut) ∝ P(ut|o1,…, ot), so the importance weight is P(ot|ut).
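A hedged sketch of one condensation step on a 1-D state (weight, resample, propagate); the Gaussian likelihood standing in for P(ot|ut), the random-walk dynamics standing in for P(ut+1|ut), and the observation sequence are all made-up.

```python
import math
import random

def likelihood(o, u):
    """Stand-in for P(o_t | u_t): Gaussian observation noise (made-up)."""
    return math.exp(-((o - u) ** 2) / (2 * 0.5 ** 2))

def condensation_step(particles, o, rng=random):
    """Weight each particle by P(o_t|u_t), redraw a sample, then propagate through the dynamics."""
    weights = [likelihood(o, u) for u in particles]
    resampled = rng.choices(particles, weights=weights, k=len(particles))
    return [u + rng.gauss(0.0, 0.3) for u in resampled]   # made-up random-walk P(u_t+1 | u_t)

particles = [random.gauss(0.0, 1.0) for _ in range(500)]
for o in [0.1, 0.4, 0.9, 1.3]:                            # made-up observation sequence
    particles = condensation_step(particles, o)
print(sum(particles) / len(particles))   # mean of the propagated particles (prediction for the next step)
```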

Condensation for active contours

(Blake and Isard, Springer-Verlag 1998)

• Unobserved state: ut = control points of the spline (contour); dynamics P(ut|ut-1) linear (Gaussian)
• Observation: ot = for each measurement line, the number of edges and the distance of the edges from the contour; observation model P(ot|ut) nonlinear
• Sampling from P(ut|o1,…, ot) performs tracking
