(1)

Undirected graphical models

(Markov Random Field or Markov Network)

References:

Books: Bishop Ch8, PGM Ch4

Slides & notes:

Cédric Archambeau's slides at PASCAL Bootcamp 2010; Tibério Caetano's slides at MLSS 2008

CMU graphical models lecture notes

Prof. Shou-de Lin

sdlin@csie.ntu.edu.tw CSIE/GINM


(2)

A Recommendation Letter Example

• The chance of getting a job (J) depends on the recommendation letter; however, only one of the two letters (L1, L2) is provided to the company, and which one is determined by the 'choice' (C).

• How about the dependency (L1 ⊥ J | C = 2)?

– A BN makes independence assertions only at the level of variables, so it cannot express this context-specific independence.

• How about (L1 ⊥ L2 | C, J)?

• There is no perfect map using a BN for such a distribution.

[Figure: BN over the nodes choice, letter1, letter2, job]


(3)

A Friendship Example

• Each random variable represents whether a person has heard a piece of news.

• A goes out with B and D; C goes out with B and D (see the graph on the right). A and C don't know each other, and neither do B and D.

• The left graph implies (A ⊥ C | {B, D}) ✓, but it also implies (B ⊥ D | A) ✗

• The left graph implies that B and D are dependent given C and A ✗

• The middle graph implies (A ⊥ C | {B, D}) ✓, but it also implies (B ⊥ D) ✗

• There is no perfect map using a BN for such a distribution.

[Figure: candidate BN structures (left, middle) and the social graph over A, B, C, D (right)]


(4)

Undirected graphical models

• Ideally we would like to have more freedom in the graph

• Markov random fields allow for the specification of a different class of Conditional Independence (CI) statements

• The class of CI statements for MRFs can be easily defined by graphical means in undirected graphs.

– The absence of edges implies CI statements

• Local potential functions and the cliques in the graph completely determine the joint distribution.

• MRFs give correlations between variables, but no explicit way to generate samples


(5)

In the friendship example

• $P(a,b,c,d) = \psi_1(A,B)\,\psi_2(B,C)\,\psi_3(C,D)\,\psi_4(D,A)\,/\,Z$

• Z is a normalization term
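As a concrete illustration, here is a minimal Python sketch (not from the slides; the potential values are made up) that builds this joint for binary A, B, C, D and computes Z by brute-force enumeration:

```python
import itertools

# Made-up pairwise potentials psi_1..psi_4 over binary variables; any
# nonnegative values work -- they need not sum to one.
edge_potential = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}
psi = {("A", "B"): edge_potential, ("B", "C"): edge_potential,
       ("C", "D"): edge_potential, ("D", "A"): edge_potential}

def unnormalized(assign):
    """Product psi_1(A,B) * psi_2(B,C) * psi_3(C,D) * psi_4(D,A)."""
    prod = 1.0
    for (u, v), table in psi.items():
        prod *= table[(assign[u], assign[v])]
    return prod

# Z sums the unnormalized product over all 2^4 assignments.
assignments = [dict(zip("ABCD", bits))
               for bits in itertools.product((0, 1), repeat=4)]
Z = sum(unnormalized(a) for a in assignments)

def joint(assign):
    return unnormalized(assign) / Z

print(joint({"A": 0, "B": 0, "C": 0, "D": 0}))
```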


(6)

Def: Markov Network (MN) or Markov Random Field (MRF)

• How about a sequence of multiplications?

$p(x_1,\ldots,x_n) = \frac{1}{Z}\prod_{i=1}^{K} g_{C_i}(x_{C_i}), \quad \text{where } Z = \sum_{x_1,\ldots,x_n}\prod_{i=1}^{K} g_{C_i}(x_{C_i})$

and each $g_{C_i}(x_{C_i})$ is a function of a subset of $x$.

• A distribution $P_\Phi$ is a Gibbs distribution parameterized by a set of factors $\Phi = \{\phi_1(D_1),\ldots,\phi_k(D_k)\}$ if it is defined as $P_\Phi(X_1,\ldots,X_n) = \phi_1(D_1)\,\phi_2(D_2)\cdots\phi_k(D_k)\,/\,Z$, where $Z$ is a normalization factor.

(7)

What are the proper Factors for MRF?

• Ip(H): every pair of nodes that are not connected are conditionally independent given all other nodes:

$P(x_1, x_2 \mid X \setminus \{x_1, x_2\}) = P(x_1 \mid X \setminus \{x_1, x_2\})\, P(x_2 \mid X \setminus \{x_1, x_2\})$

• Therefore, nodes that are not connected should NOT be put in the same factor ⇒ nodes that ARE connected should be put in the same factor

• How can we identify nodes that ARE connected?

– Maximal clique


(8)

Cliques and Maximal Cliques

• A clique of a graph is a complete subgraph (each pair of nodes have an edge)

• A maximal clique of a graph is a clique which is not a subset of another clique

• {A, C} forms a clique

• {A, B, D} and {B, E} are maximal cliques

[Figure: a graph over the nodes A, B, C, D, E]
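To make the definition concrete, here is a small Bron–Kerbosch sketch that enumerates maximal cliques; the edge list below is a guess, since the figure's exact edges are not recoverable here:

```python
def bron_kerbosch(R, P, X, adj, out):
    """Append every maximal clique extending R to out; P holds the
    remaining candidate nodes, X the nodes already ruled out."""
    if not P and not X:
        out.append(set(R))
        return
    for v in list(P):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, out)
        P.remove(v)
        X.add(v)

# Guessed edge list consistent with the bullets above.
edges = [("A", "B"), ("A", "D"), ("B", "D"), ("A", "C"), ("B", "E")]
nodes = {u for e in edges for u in e}
adj = {u: set() for u in nodes}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

cliques = []
bron_kerbosch(set(), set(nodes), set(), adj, cliques)
print(cliques)  # under this guessed edge set: {A,B,D}, {B,E}, and {A,C}
```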


(9)

Factorization in MRF

• For a graph with K maximal cliques, we can decompose the joint as

$p(x_1,\ldots,x_n) = \frac{1}{Z}\prod_{i=1}^{K} g_{C_i}(x_{C_i}), \quad \text{where } Z = \sum_{x_1,\ldots,x_n}\prod_{i=1}^{K} g_{C_i}(x_{C_i})$

($x_{C_i}$ are the nodes belonging to the maximal clique $C_i$)

• The factorization is in terms of local potential functions $\{g_{C_i}(\cdot)\}$, with $g_{C_i}(\cdot) \ge 0$ for all $i$

• This can also be stated as: a Gibbs distribution $p$ factorizes over an MRF H if each $x_{C_i}$ is a set of nodes forming a clique $C_i$ of H

• The potential functions do not necessarily have a probabilistic interpretation


(10)

Example of potential function

• Let $C_1 = \{B, E\}$, with $B \in \{0, 1\}$ and $E \in \{0, 1\}$; the potential over this clique is:

C

B E g(B,E)

0 0 0.4

0 1 0.8

1 0 3.0

1 1 2.5

• The values need not be probabilities

• A CPD can be seen as a special case of a factor
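A tiny sketch of that last point (mine, not from the slides): normalizing the table row-wise turns the potential into a valid CPD p(E | B):

```python
# Potential values from the slide's table, indexed by (B, E).
g = {(0, 0): 0.4, (0, 1): 0.8, (1, 0): 3.0, (1, 1): 2.5}

# A CPD is a factor with one extra property: for each fixed b, the
# entries sum to one. Row-wise normalization enforces exactly that.
p_E_given_B = {(b, e): g[(b, e)] / (g[(b, 0)] + g[(b, 1)])
               for b in (0, 1) for e in (0, 1)}
print(p_E_given_B)  # entries for each fixed b now sum to 1
```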

(11)

Separation, CI and Factorization in MRF (1/2)

• Factorization ⇒ CI: If a probability distribution factorizes according to an undirected graph, and if X, Y and Z are disjoint subsets of nodes such that Z separates X from Y (every path from X to Y passes through Z), then the distribution satisfies X ⊥ Y | Z

Proof:

• We start with the case where $X \cup Y \cup Z = S$ (the full node set). As Z separates X from Y, there are no direct edges between X and Y. Hence, any clique in H is fully contained either in $X \cup Z$ or in $Y \cup Z$. Let $I_X$ be the indices of the cliques contained in $X \cup Z$, and let $I_Y$ be the indices of the remaining cliques. Therefore

$P(X_1,\ldots,X_n) = \prod_{i \in I_X} \phi_i(D_i) \prod_{i \in I_Y} \phi_i(D_i)\,/\,K$, where K is the normalization factor, so

$P(X_1,\ldots,X_n) = f(X,Z)\, g(Y,Z)\,/\,K \;\Rightarrow\; X \perp Y \mid Z$

• Next consider the case where $X \cup Y \cup Z \subsetneq S$. Let $U = S - (X \cup Y \cup Z)$; it is possible to partition U into two disjoint sets $U_1$ and $U_2$ such that Z separates $X \cup U_1$ from $Y \cup U_2$. By the same argument as above, we can conclude $P(X_1,\ldots,X_n) = f(X, U_1, Z)\, g(Y, U_2, Z)\,/\,K \;\Rightarrow\; (X, U_1 \perp Y, U_2 \mid Z) \;\Rightarrow\; X \perp Y \mid Z$


(12)

Separation, CI and Factorization in MRF (2/2)

CI properties and factorization are equivalent in an MRF:

• Factorization ⇒ CI: If a probability distribution factorizes according to an undirected graph, and if A, B and C are disjoint subsets of nodes such that C separates A from B, then the distribution satisfies A ⊥ B | C

• CI ⇒ Factorization: If a positive probability distribution satisfies the CI statements implied by graph separation over the undirected graph, then it also factorizes according to this graph.

– This is known as the Hammersley-Clifford theorem (proof: section 4.4 in PGM, or section 4.2.3 in our textbook)


(13)

Local Dependency for Markov Random Field

• Weakest (pairwise) dependency: two nodes that are not connected are conditionally independent given all other nodes

– The set of all CI statements satisfying this condition is called Ip(H)

• Less weak (local) dependency: a node is conditionally independent of every other node in the network given its direct neighbors

– The set of all CI statements satisfying this condition is called Il(H)

[Figure: an example MRF over the nodes X1-X8]


(14)

Global Graph Separation

• If every path from A to B includes at least one node from C, then C is said to separate A from B in G.

• Every path is blocked by C ⇒ A ⊥ B | C

• The set of all CI statements satisfying this condition is denoted I(H)

• For ANY MRF, I(H) ⇒ Il(H) ⇒ Ip(H) (global implies local implies pairwise)

• For a positive joint probability distribution P, the following three statements are equivalent (proof: see 4.3.2.2 in PGM):

– P |= I(H)

– P |= Il(H)

– P |= Ip(H)
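Separation is easy to test mechanically: delete the nodes of C and check whether any node of A can still reach a node of B. A minimal sketch (function and variable names are mine):

```python
from collections import deque

def separates(adj, A, B, C):
    """True iff C separates A from B in the undirected graph adj:
    every path from A to B must pass through some node of C."""
    blocked, seen = set(C), set()
    queue = deque(a for a in A if a not in blocked)
    seen.update(queue)
    while queue:
        u = queue.popleft()
        if u in B:
            return False  # reached B without touching C
        for v in adj[u]:
            if v not in blocked and v not in seen:
                seen.add(v)
                queue.append(v)
    return True

# Example: in the chain a - c - b, {c} separates {a} from {b}.
adj = {"a": {"c"}, "c": {"a", "b"}, "b": {"c"}}
print(separates(adj, {"a"}, {"b"}, {"c"}))  # True
```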


(15)

A canonical example: image denoising

This figure is from Tibério Caetano’s slide at MLSS 2008


(16)

Image denoising:

Ising model

• yi is the observed noisy pixel and xi is an unknown noise-free pixel

• there is a strong correlation between xi and yi

• neighboring pixels xi and xj in an image are strongly correlated.

• Cliques: $\{x_i, y_i\}$ (noisy nodes are correlated with denoised nodes) and $\{x_i, x_j\}$, where i and j are indices of neighboring pixels

• $E(x,y) = h\sum_i x_i - \beta\sum_{\{i,j\}} x_i x_j - \eta\sum_i x_i y_i$, and $p(x,y) = e^{-E(x,y)}/Z$

– The $h$ term biases the pixel values toward one of the two states, the $\beta$ term says neighboring nodes should take the same value, and the $\eta$ term couples each $x_i$ to its observation $y_i$

• The lower the energy, the better (i.e., the higher the probability)


(17)

Optimization in Ising Model

• $E(x,y) = h\sum_i x_i - \beta\sum_{\{i,j\}} x_i x_j - \eta\sum_i x_i y_i$, $\quad p(x,y) = e^{-E(x,y)}/Z$

• Iterated conditional modes (ICM):

– A coordinate-wise greedy ascent method (one variable updated at a time).

– First initialize $x_i = y_i$ for all $i$.

– Then take one node $x_j$ at a time, evaluate the total energy for the two states $x_j = +1$ and $x_j = -1$, and keep the better assignment.

– The algorithm converges to a local optimum (not necessarily global).
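Here is a compact sketch of ICM for this model, assuming pixels take values in {−1, +1}; the coefficient values and function name are made up:

```python
import numpy as np

def icm_denoise(y, h=0.0, beta=1.0, eta=2.0, sweeps=10):
    """Iterated conditional modes for the Ising denoising model above.
    y: 2-D array of observed pixels in {-1, +1}."""
    x = y.copy()                      # initialize x_i = y_i
    H, W = x.shape
    for _ in range(sweeps):
        for i in range(H):
            for j in range(W):
                # Only the terms touching x_ij change, so score the two
                # states by their local energy and keep the lower one.
                nb = sum(x[a, b]
                         for a, b in ((i - 1, j), (i + 1, j),
                                      (i, j - 1), (i, j + 1))
                         if 0 <= a < H and 0 <= b < W)
                energy = {s: h * s - beta * s * nb - eta * s * y[i, j]
                          for s in (-1, +1)}
                x[i, j] = min(energy, key=energy.get)
    return x

# Example: flip ~10% of the pixels of a constant image, then restore it.
rng = np.random.default_rng(0)
clean = np.ones((16, 16), dtype=int)
noisy = clean * np.where(rng.random(clean.shape) < 0.1, -1, 1)
restored = icm_denoise(noisy)
```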


(18)

MRF versus Bayesian network

Similarities:

• CI properties of the joint are encoded into the graph structure and define families of structured probability distributions.

• CI properties are related to concepts of separation of (groups of) nodes in the graph.

• Local entities in the graph imply the simplified algebraic structure (factorization) of the joint.


(19)

MRF versus Bayesian network

Differences:

• The set of probability distributions represented by MRFs is different from the set represented by Bayesian networks.

• MRFs have a normalization constant that couples all factors, whereas Bayesian networks do not.

• Factors in Bayesian networks are probability distributions, while factors in MRFs are nonnegative potentials.


(20)

Markov Blanket in BN and MRF

• The Markov blanket of a node $x_i$ in either a BN or an MRF is the smallest set of nodes $A$ such that $p(x_i \mid x_{\sim i}) = p(x_i \mid x_A)$

• BN: parents, children and co-parents of children of the node

• MRF: neighbors of the node

[Figure: Markov blankets in a BN (left) and an MRF (right)]
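Both definitions translate directly into code. A short sketch, assuming the BN is given as a dict mapping each node to its parent list and the MRF as a dict of neighbor sets:

```python
def markov_blanket_bn(parents, node):
    """BN blanket: parents, children, and co-parents of children."""
    children = {c for c, ps in parents.items() if node in ps}
    coparents = {p for c in children for p in parents[c]}
    return (set(parents[node]) | children | coparents) - {node}

def markov_blanket_mrf(neighbors, node):
    """MRF blanket: just the node's neighbors."""
    return set(neighbors[node])

# Example DAG: a -> c <- b, c -> d.
parents = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
print(markov_blanket_bn(parents, "a"))  # {'b', 'c'}: child c, co-parent b
```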


(21)

Mapping a Linear Bayesian Network into an MRF

• Bayesian network: $x_1 \to x_2 \to x_3 \to \cdots \to x_n$, with

$p(x_1,\ldots,x_n) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_n \mid x_{n-1})$

• MRF: $x_1 - x_2 - x_3 - \cdots - x_n$, with

$p(x_1,\ldots,x_n) = \frac{1}{Z}\, g_{1,2}(x_1,x_2)\, g_{2,3}(x_2,x_3) \cdots g_{n-1,n}(x_{n-1},x_n)$

• The mapping here is straightforward: let

$g_{1,2}(x_1,x_2) = p(x_1)\, p(x_2 \mid x_1),\quad g_{2,3}(x_2,x_3) = p(x_3 \mid x_2),\quad \ldots,\quad g_{n-1,n}(x_{n-1},x_n) = p(x_n \mid x_{n-1}),\quad \text{and } Z = 1$


(22)

Mapping a Head-to-Head Bayesian Network into an MRF

• When there are head-to-head nodes one has to add edges to convert the Bayesian network into the undirected graph.

• This process of ‘marrying the parents’ has become known as moralization, and the resulting undirected graph, after

dropping the arrows, is called the moral graph.

$x_1 \rightarrow x_3 \leftarrow x_2$ becomes the moral graph with undirected edges $x_1 - x_3$, $x_2 - x_3$, and the added edge $x_1 - x_2$

$p(x_1,x_2,x_3) = p(x_1)\, p(x_2)\, p(x_3 \mid x_1,x_2) \quad\Longrightarrow\quad p(x_1,x_2,x_3) = \frac{1}{Z}\, g_{1,2,3}(x_1,x_2,x_3)$

(the single potential involves all three variables)


(23)

Mapping a Bayesian Network into an MRF: General Process

• Add additional undirected links between all pairs of parents of each node in the graph.

• Initialize all of the clique potentials of the moral graph to 1.

• Take each conditional distribution factor in the original directed graph and multiply it into one of the clique potentials.
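The structural steps can be sketched directly; a hypothetical `moralize` helper taking the DAG as a dict from node to its parents (multiplying the CPDs into the clique potentials is omitted):

```python
from itertools import combinations

def moralize(parents):
    """Moral graph of a DAG given as {node: list_of_parents}:
    connect every pair of parents of each node, then drop directions."""
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    und = {v: set() for v in nodes}
    for child, ps in parents.items():
        for p in ps:                      # parent-child edges, undirected
            und[child].add(p)
            und[p].add(child)
        for p, q in combinations(ps, 2):  # 'marry' the parents
            und[p].add(q)
            und[q].add(p)
    return und

# The head-to-head example from two slides back: x1 -> x3 <- x2.
print(moralize({"x1": [], "x2": [], "x3": ["x1", "x2"]}))
# x1 and x2 get married, so {x1, x2, x3} becomes a clique.
```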


(24)

Dependency Revisited

• P: the set of all distributions over a given set of variables.

• D: the set of distributions that can be represented by a BN.

• U: the set of distributions that can be represented by an MRF.


(25)

Factor graph

Two types of nodes

• The circles in a factor graph represent random variables

• The squares represent factors in the joint distribution

Two nodes are neighbors if they share a common factor.

$p(x_1,x_2,x_3) = f_a(x_1,x_2)\, f_b(x_1,x_2)\, f_c(x_2,x_3)\, f_d(x_3)$
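A factor graph is just a bipartite structure: variable nodes on one side, factor nodes (each with its scope) on the other. A minimal sketch for the distribution above, with made-up binary factor functions:

```python
# Each factor pairs its scope (its variable neighbors in the bipartite
# graph) with a nonnegative function; the values here are invented.
factors = {
    "f_a": (("x1", "x2"), lambda x1, x2: 1.0 + x1 * x2),
    "f_b": (("x1", "x2"), lambda x1, x2: 2.0 - x2),
    "f_c": (("x2", "x3"), lambda x2, x3: 0.5 + x2 + x3),
    "f_d": (("x3",),      lambda x3: 1.5),
}

def joint(assign):
    """The (unnormalized) joint is the product of all factor values."""
    prod = 1.0
    for scope, fn in factors.values():
        prod *= fn(*(assign[v] for v in scope))
    return prod

# Two variable nodes are neighbors iff they share a common factor.
neighbors = {}
for scope, _ in factors.values():
    for u in scope:
        neighbors.setdefault(u, set()).update(set(scope) - {u})
print(neighbors)  # x1-x2 and x2-x3 are neighbors; x1-x3 are not
```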


(26)

Factor graph

• Factor graphs incorporate explicit details about the

factorization, but factor nodes do not correspond to CIs.

• An edge between a circle and a square states that the

corresponding function has the corresponding variable as an argument.

• The joint distribution is the product of all functions (so, the functions are factors).

• They preserve the tree structure of DAGs and undirected graphs.


(27)

Converting a BN into a factor graph

• Create nodes corresponding to the original DAG.

• Create factor nodes corresponding to the conditional distributions.

• Conversion is not unique:

For the BN $p(x_1,x_2,x_3) = p(x_1)\, p(x_2)\, p(x_3 \mid x_1,x_2)$, one can use a single factor

$f(x_1,x_2,x_3) = p(x_1)\, p(x_2)\, p(x_3 \mid x_1,x_2)$

or three factors

$f_a(x_1) = p(x_1),\quad f_b(x_2) = p(x_2),\quad f_c(x_1,x_2,x_3) = p(x_3 \mid x_1,x_2)$


(28)

Converting an MRF into a factor graph

• Create nodes corresponding to the original undirected graph.

• Create factor nodes corresponding to the potential functions.

• Conversion is not unique:

For an MRF with the single clique potential $\psi_{1,2,3}(x_1,x_2,x_3)$, one can use a single factor

$f(x_1,x_2,x_3) = \psi_{1,2,3}(x_1,x_2,x_3)$

or two factors with

$f_a(x_1,x_2,x_3)\, f_b(x_2,x_3) = \psi_{1,2,3}(x_1,x_2,x_3)$


(29)

Why Factor Graphs?

• Now we have two completely different types of graphs: BNs and MRFs.

– This is a problem because their inference strategies can be very different.

• We probably want a unified framework.

– How about converting BNs to MRFs?

– Problem: loops can be created.

• With factor graphs, it is possible to convert both BNs and MRFs into trees without loops.

