4 Markov chain characterizations

4.1 Compatibility by the Gibbs sampler

Suppose that X and Y are two random variables taking values in {x1, . . . , xI} and {y1, . . . , yJ}, respectively. Consider two conditional probability matrices A = (Aij) = (P{X = xi|Y = yj}) and B = (Bij) = (P{Y = yj|X = xi}). Arnold, Castillo and Sarabia (1999) treated the matrix A′ (the transpose of A) as a transition matrix from Y to X and the matrix B as a transition matrix from X to Y, and then applied the Gibbs sampler to obtain stationary distributions. We describe the method as follows. For ease of discussion, we assume Aij > 0 and Bij > 0 for all i, j.

We begin with an initial X(1). Conditioning on X(1), we draw Y(1) from B.

Next, conditioning on Y(1), we draw X(2) from A′. So we have the following transitions:

X(1) −B→ Y(1) −A′→ X(2) −B→ Y(2) −A′→ X(3) −B→ Y(3) → . . .

This is a Markov chain, but not a homogeneous one. We then combine every two transitions into a single one, so that we obtain the following two homogeneous chains:

X(1) → X(2) → X(3) → . . .
Y(1) → Y(2) → Y(3) → . . .

The transition matrix of the first chain is BA′, and the transition matrix of the second chain is A′B. Each chain determines a stationary distribution, say τ = (τi) and η = (ηj), where τi = P(X = xi) and ηj = P(Y = yj). That is, τ and η are solutions of the following systems:

τBA′ = τ, (4.1)

ηA′B = η. (4.2)

Note that both transition matrices BA′ and A′B are irreducible (indeed, all their entries are positive since Aij, Bij > 0), so that the respective stationary distributions τ and η are unique.
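To make the construction concrete, the following minimal NumPy sketch (ours, not part of the original text; the helper names stationary and tau_eta are assumptions) computes τ and η from (4.1) and (4.2) by power iteration, assuming A is stored with columns summing to one and B with rows summing to one:

```python
import numpy as np

def stationary(P, tol=1e-12, max_iter=100_000):
    # Stationary row vector pi of an irreducible stochastic matrix P,
    # i.e. pi P = pi, found by power iteration from the uniform vector.
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(max_iter):
        nxt = pi @ P
        if np.max(np.abs(nxt - pi)) < tol:
            return nxt
        pi = nxt
    return pi

def tau_eta(A, B):
    # A[i, j] = P(X = x_i | Y = y_j)  (each column of A sums to 1),
    # B[i, j] = P(Y = y_j | X = x_i)  (each row of B sums to 1).
    # tau solves (4.1): tau B A' = tau;  eta solves (4.2): eta A' B = eta.
    tau = stationary(B @ A.T)
    eta = stationary(A.T @ B)
    return tau, eta
```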

τ and B together determine a joint distribution f(xi, yj) = τiBij, and η and A together determine a joint distribution g(xi, yj) = ηjAij.

Let f(xi, +) = Σj f(xi, yj) and f(+, yj) = Σi f(xi, yj), so that f(xi, +) and f(+, yj) are the marginal distributions of f. Arnold, Castillo and Sarabia (1999) obtained the following theorem.

Theorem 4.1.1.

(i) Whether A and B are compatible or not, both joint distributions f and g have the same marginal distributions. That is, f(xi, +) = g(xi, +) and f(+, yj) = g(+, yj) for all i, j.

(ii) A and B are compatible if and only if the stationary distributions τ and η of the respective transition matrices BA′ and A′B satisfy τiBij = ηjAij for all i, j, i.e., f(xi, yj) = g(xi, yj) for all i, j.

Proof:

(i) Note that

f(+, yj) = Σi f(xi, yj) = Σi τiBij = (τB)j.

So the row vector τB corresponds to the Y-marginal distribution of f. Similarly, ηA′ corresponds to the X-marginal distribution of g.

Multiplying equation (4.1) on the right by B yields (τB)A′B = (τB), which together with (4.2) and the uniqueness of the stationary distribution implies

τB = η.

So the Y-marginal distribution of f = τB = η = the Y-marginal distribution of g.

Multiplying equation (4.2) on the right by A′ yields (ηA′)BA′ = (ηA′). From equation (4.1) and uniqueness, we have

ηA′ = τ.

So the X-marginal distribution of g = ηA′ = τ = the X-marginal distribution of f. This proves that both joint distributions have the same marginal distributions.

(ii) Suppose that A and B are compatible, implying that there exists a joint distribution h(xi, yj) such that

h(xi, yj)/h(+, yj) = Aij and h(xi, yj)/h(xi, +) = Bij.

Let

hX = (h(x1, +), . . . , h(xI, +)) and hY = (h(+, y1), . . . , h(+, yJ)),

which correspond to the X- and Y-marginal distributions of h. So

hX B = hY, (4.3)

hY A′ = hX. (4.4)

Multiplying equation (4.3) on the right by A′ and equation (4.4) on the right by B yields

hX BA′ = hY A′ = hX,
hY A′B = hX B = hY.

From (4.1), (4.2) and uniqueness, we have

τ = hX and η = hY.

It follows that f(xi, yj) = g(xi, yj) = h(xi, yj) for all i, j.

Conversely, suppose that f(xi, yj) = g(xi, yj) for all i, j. Since A is the conditional distribution of X given Y under g and B is the conditional distribution of Y given X under f, it follows that f = g has A and B as its two conditional distributions. This proves that A and B are compatible.
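Part (ii) of the theorem translates into a direct numerical compatibility test. Continuing the sketch above (again our illustration, not code from the source):

```python
def check_compatible(A, B, tol=1e-9):
    # Theorem 4.1.1(ii): A and B are compatible iff
    # tau_i B_ij = eta_j A_ij for all i, j.
    tau, eta = tau_eta(A, B)
    f = tau[:, None] * B      # f(x_i, y_j) = tau_i B_ij
    g = A * eta[None, :]      # g(x_i, y_j) = eta_j A_ij
    return np.allclose(f, g, atol=tol), f, g
```

When the test succeeds, f (= g) is the unique joint distribution having A and B as its conditional distributions.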

Example 4.1.1. Consider two conditional distribution matrices A and B. Solving (4.1) and (4.2) yields

τ = (0.29546, 0.70454), η = (0.47727, 0.52273).

τ and B together determine a joint distribution (f(xi, yj)) = (τiBij), while η and A together determine a joint distribution (g(xi, yj)) = (ηjAij). The two joint distributions are different, so A and B are incompatible. However, they have the same marginal distributions, τ and η.

Arnold, Castillo and Sarabia (1999) only considered Markov chain characterizations involving two random variables. We now consider the three-dimensional case where X, Y and Z are discrete random variables with I, J and K possible values, respectively. Three conditional distributions are given by

Aijk = P(X = xi | Y = yj, Z = zk),
Bijk = P(Y = yj | X = xi, Z = zk),
Cijk = P(Z = zk | X = xi, Y = yj).

Again for ease of discussion, we assume Aijk, Bijk and Cijk are all positive.

We generate a Markov chain X(1), Y(1), Z(1), X(2), Y(2), Z(2), . . . as follows.

We start with (X(1), Y(1)). Then we generate Z(1) using C together with (X(1), Y(1)). Thus we move from (X(1), Y(1)) to (Y(1), Z(1)). Next, we generate X(2) using A together with (Y(1), Z(1)), resulting in a movement from (Y(1), Z(1)) to (Z(1), X(2)). Note that in each transition, one of the two components remains the same. So we have the following transitions:

(X(1), Y(1)) → (Y(1), Z(1)) → (Z(1), X(2)) → (X(2), Y(2)) → (Y(2), Z(2)) → . . .

This is a Markov chain, but not a homogeneous one. We then combine every three transitions into a single one, so that we obtain three homogeneous chains:

(X(1), Y(1)) → (X(2), Y(2)) → (X(3), Y(3)) → . . .
(Y(1), Z(1)) → (Y(2), Z(2)) → (Y(3), Z(3)) → . . .
(Z(1), X(2)) → (Z(2), X(3)) → (Z(3), X(4)) → . . .

Let Ā be the transition matrix from (Y, Z) to (Z, X):

Ā((j, k), (h, i)) = P(Z = zh, X = xi | Y = yj, Z = zk) = { Aijk if h = k; 0 if h ≠ k },

B̄ the transition matrix from (Z, X) to (X, Y):

B̄((k, i), (h, j)) = P(X = xh, Y = yj | Z = zk, X = xi) = { Bijk if h = i; 0 if h ≠ i },

and C̄ the transition matrix from (X, Y) to (Y, Z):

C̄((i, j), (h, k)) = P(Y = yh, Z = zk | X = xi, Y = yj) = { Cijk if h = j; 0 if h ≠ j }.

The transition matrix of the first chain is C̄ĀB̄, the transition matrix of the second chain is ĀB̄C̄, and the transition matrix of the third chain is B̄C̄Ā. Each chain has a unique stationary distribution, say τ = (τ(i, j)) of dimension IJ, η = (η(j, k)) of dimension JK and θ = (θ(k, i)) of dimension KI. That is, τ, η and θ satisfy

τC̄ĀB̄ = τ, (4.5)

ηĀB̄C̄ = η, (4.6)

θB̄C̄Ā = θ. (4.7)

τ and C together determine a joint distribution, f(xi, yj, zk) = τ(i, j)Cijk; η and A together determine a joint distribution, g(xi, yj, zk) = η(j, k)Aijk; and θ and B together determine a joint distribution, h(xi, yj, zk) = θ(k, i)Bijk.
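The lifted matrices Ā, B̄ and C̄ are sparse and can be built explicitly. A sketch of ours, assuming A, B and C are stored as I × J × K arrays (e.g., A[i, j, k] = P(X = xi | Y = yj, Z = zk)) and pair states are enumerated in row-major order:

```python
import numpy as np

def lift_C(C):
    # C_bar((i,j),(h,k)) = C[i,j,k] if h == j, else 0:
    # transition from (X, Y) to (Y, Z); Y is carried over, Z is redrawn.
    I, J, K = C.shape
    Cbar = np.zeros((I * J, J * K))
    for i in range(I):
        for j in range(J):
            for k in range(K):
                Cbar[i * J + j, j * K + k] = C[i, j, k]
    return Cbar

def lift_A(A):
    # A_bar((j,k),(h,i)) = A[i,j,k] if h == k, else 0:
    # transition from (Y, Z) to (Z, X); Z is carried over, X is redrawn.
    I, J, K = A.shape
    Abar = np.zeros((J * K, K * I))
    for j in range(J):
        for k in range(K):
            for i in range(I):
                Abar[j * K + k, k * I + i] = A[i, j, k]
    return Abar

def lift_B(B):
    # B_bar((k,i),(h,j)) = B[i,j,k] if h == i, else 0:
    # transition from (Z, X) to (X, Y); X is carried over, Y is redrawn.
    I, J, K = B.shape
    Bbar = np.zeros((K * I, I * J))
    for k in range(K):
        for i in range(I):
            for j in range(J):
                Bbar[k * I + i, i * J + j] = B[i, j, k]
    return Bbar
```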

We have the following result.

Theorem 4.1.2.

(i) The (Y, Z)-distribution under f is the same as that under g, the (X, Z)-distribution under g is the same as that under h, and the (X, Y)-distribution under h is the same as that under f. That is,

f(+, yj, zk) = g(+, yj, zk) for all j, k,
g(xi, +, zk) = h(xi, +, zk) for all i, k,
h(xi, yj, +) = f(xi, yj, +) for all i, j.

Consequently, f, g and h have the same X-, Y- and Z-marginal distributions.

(ii) A, B and C are compatible if and only if the stationary distributions τ, η and θ of the respective transition matrices C̄ĀB̄, ĀB̄C̄ and B̄C̄Ā satisfy τ(i, j)Cijk = η(j, k)Aijk = θ(k, i)Bijk for all i, j, k, i.e., f(xi, yj, zk) = g(xi, yj, zk) = h(xi, yj, zk) for all i, j, k.

Proof:

(i) The distribution of (Y, Z) under f is

f(+, yj, zk) = Σi τ(i, j)Cijk = Σ(i,h) τ(i, h)C̄((i, h), (j, k)) = the (j, k) component of τC̄.

So τC̄ corresponds to the (Y, Z)-distribution under f. Similarly, ηĀ and θB̄ correspond respectively to the (Z, X)- and (X, Y)-distributions under g and h. Multiplying equation (4.5) on the right by C̄ yields

(τC̄)ĀB̄C̄ = (τC̄), which together with (4.6) implies

τC̄ = η.

So the (Y, Z)-distribution under f = τC̄ = η = the (Y, Z)-distribution under g. Multiplying equation (4.6) on the right by Ā yields

(ηĀ)B̄C̄Ā = (ηĀ).

From equation (4.7), we have

ηĀ = θ.

So the (Z, X)-distribution under g = ηĀ = θ = the (Z, X)-distribution under h. Multiplying equation (4.7) on the right by B̄ yields

(θB̄)C̄ĀB̄ = (θB̄).

From equation (4.5), we have

θB̄ = τ.

So the (X, Y)-distribution under h = θB̄ = τ = the (X, Y)-distribution under f.

We have shown that the (Y, Z)-distribution under f is the same as that under g, the (X, Z)-distribution under g is the same as that under h, and the (X, Y)-distribution under h is the same as that under f. That is,

f(+, yj, zk) = g(+, yj, zk) for all j, k,
g(xi, +, zk) = h(xi, +, zk) for all i, k,
h(xi, yj, +) = f(xi, yj, +) for all i, j.

Consequently, f, g and h have the same X-, Y- and Z-marginal distributions.

(ii) Suppose that A, B and C are compatible, implying that there exists a joint distribution d(xi, yj, zk) such that

Aijk = d(xi, yj, zk)/d(+, yj, zk),
Bijk = d(xi, yj, zk)/d(xi, +, zk),
Cijk = d(xi, yj, zk)/d(xi, yj, +).

Let

dX,Y = (d(x1, y1, +), . . . , d(xI, yJ, +)),
dY,Z = (d(+, y1, z1), . . . , d(+, yJ, zK)),
dZ,X = (d(x1, +, z1), . . . , d(xI, +, zK)).

So

dX,Y C̄ = dY,Z, (4.8)

dY,Z Ā = dZ,X, (4.9)

dZ,X B̄ = dX,Y. (4.10)

Multiplying equation (4.8) on the right by ĀB̄, equation (4.9) by B̄C̄ and equation (4.10) by C̄Ā yields

dX,Y C̄ĀB̄ = dX,Y, dY,Z ĀB̄C̄ = dY,Z, dZ,X B̄C̄Ā = dZ,X.

From (4.5), (4.6), (4.7) and uniqueness, we have

τ = dX,Y, η = dY,Z, θ = dZ,X.

It follows that f(xi, yj, zk) = g(xi, yj, zk) = h(xi, yj, zk) = d(xi, yj, zk) for all i, j, k.

Conversely, suppose f(xi, yj, zk) = g(xi, yj, zk) = h(xi, yj, zk) for all i, j, k. Since A is the conditional distribution of X given (Y, Z) under g, B is the conditional distribution of Y given (Z, X) under h and C is the conditional distribution of Z given (X, Y) under f, it follows that f = g = h has A, B and C as its three conditional distributions. This proves that A, B and C are compatible.
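As in Theorem 4.1.1, part (ii) yields a numerical compatibility test. A sketch of ours, reusing stationary and the lift functions above:

```python
def check_compatible3(A, B, C, tol=1e-9):
    # Theorem 4.1.2(ii): A, B, C are compatible iff
    # tau(i,j) C_ijk = eta(j,k) A_ijk = theta(k,i) B_ijk for all i, j, k.
    I, J, K = A.shape
    Abar, Bbar, Cbar = lift_A(A), lift_B(B), lift_C(C)
    tau = stationary(Cbar @ Abar @ Bbar).reshape(I, J)    # dist. of (X, Y)
    eta = stationary(Abar @ Bbar @ Cbar).reshape(J, K)    # dist. of (Y, Z)
    theta = stationary(Bbar @ Cbar @ Abar).reshape(K, I)  # dist. of (Z, X)
    f = tau[:, :, None] * C        # f(x_i, y_j, z_k) = tau(i, j) C_ijk
    g = eta[None, :, :] * A        # g(x_i, y_j, z_k) = eta(j, k) A_ijk
    h = theta.T[:, None, :] * B    # h(x_i, y_j, z_k) = theta(k, i) B_ijk
    ok = np.allclose(f, g, atol=tol) and np.allclose(g, h, atol=tol)
    return ok, f, g, h
```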

Example 4.1.2. (Example 3.2.2 continued)

Consider three random variables X, Y and Z with possible values (x1, x2), (y1, y2) and (z1, z2), and three matrices A, B and C:

A =
            x1    x2
  y1, z1   0.1   0.9
  y1, z2   0.9   0.1
  y2, z1   0.2   0.8
  y2, z2   0.8   0.2

B =
            y1    y2
  z1, x1   0.3   0.7
  z1, x2   0.7   0.3
  z2, x1   0.4   0.6
  z2, x2   0.6   0.4

C =
            z1    z2
  x1, y1   0.4   0.6
  x1, y2   0.6   0.4
  x2, y1   0.5   0.5
  x2, y2   0.5   0.5

Suppose that our generation sequence is X(1), Y(1), Z(1), X(2), Y(2), Z(2), . . . Then C̄ is the following transition matrix from (X, Y) to (Y, Z):

            y1,z1  y1,z2  y2,z1  y2,z2
  x1, y1     0.4    0.6    0      0
  x1, y2     0      0      0.6    0.4
  x2, y1     0.5    0.5    0      0
  x2, y2     0      0      0.5    0.5

Ā is the following transition matrix from (Y, Z) to (Z, X):

            z1,x1  z1,x2  z2,x1  z2,x2
  y1, z1     0.1    0.9    0      0
  y1, z2     0      0      0.9    0.1
  y2, z1     0.2    0.8    0      0
  y2, z2     0      0      0.8    0.2

B̄ is the following transition matrix from (Z, X) to (X, Y):

            x1,y1  x1,y2  x2,y1  x2,y2
  z1, x1     0.3    0.7    0      0
  z1, x2     0      0      0.7    0.3
  z2, x1     0.4    0.6    0      0
  z2, x2     0      0      0.6    0.4

Then C̄ĀB̄ is the following transition matrix from (X, Y) to (X, Y):

            x1,y1  x1,y2  x2,y1  x2,y2
  x1, y1   0.228  0.352  0.288  0.132
  x1, y2   0.164  0.276  0.384  0.176
  x2, y1   0.195  0.305  0.345  0.155
  x2, y2   0.190  0.310  0.340  0.160

ĀB̄C̄ is the following transition matrix from (Y, Z) to (Y, Z):

            y1,z1  y1,z2  y2,z1  y2,z2
  y1, z1   0.327  0.333  0.177  0.163
  y1, z2   0.174  0.246  0.344  0.236
  y2, z1   0.304  0.316  0.204  0.176
  y2, z2   0.188  0.252  0.328  0.232

B̄C̄Ā is the following transition matrix from (Z, X) to (Z, X):

            z1,x1  z1,x2  z2,x1  z2,x2
  z1, x1   0.096  0.444  0.386  0.074
  z1, x2   0.065  0.435  0.435  0.065
  z2, x1   0.088  0.432  0.408  0.072
  z2, x2   0.070  0.430  0.430  0.070

Suppose that τ, η and θ satisfy the following systems:

τC̄ĀB̄ = τ, ηĀB̄C̄ = η, θB̄C̄Ā = θ.

We find

τ = (0.1910322, 0.3058966, 0.3452520, 0.1578192),
η = (0.2490389, 0.2872453, 0.2624476, 0.2012682),
θ = (0.0773934, 0.4340930, 0.4195354, 0.0689782).
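These values can be reproduced with the sketches above (our illustration; the arrays below encode the matrices A, B and C of this example):

```python
import numpy as np

# A[i, j, k] = P(X = x_i | Y = y_j, Z = z_k), and similarly for B, C.
A = np.zeros((2, 2, 2)); B = np.zeros((2, 2, 2)); C = np.zeros((2, 2, 2))
A[:, 0, 0] = [0.1, 0.9]; A[:, 0, 1] = [0.9, 0.1]   # rows (y1,z1), (y1,z2)
A[:, 1, 0] = [0.2, 0.8]; A[:, 1, 1] = [0.8, 0.2]   # rows (y2,z1), (y2,z2)
B[0, :, 0] = [0.3, 0.7]; B[1, :, 0] = [0.7, 0.3]   # rows (z1,x1), (z1,x2)
B[0, :, 1] = [0.4, 0.6]; B[1, :, 1] = [0.6, 0.4]   # rows (z2,x1), (z2,x2)
C[0, 0, :] = [0.4, 0.6]; C[0, 1, :] = [0.6, 0.4]   # rows (x1,y1), (x1,y2)
C[1, 0, :] = [0.5, 0.5]; C[1, 1, :] = [0.5, 0.5]   # rows (x2,y1), (x2,y2)

ok, f, g, h = check_compatible3(A, B, C)
print(ok)    # False: f, g and h differ, so A, B and C are incompatible
print(stationary(lift_C(C) @ lift_A(A) @ lift_B(B)))
# ~ (0.1910322, 0.3058966, 0.3452520, 0.1578192), i.e. tau above;
# f.ravel() reproduces the f(x_i, y_j, z_k) values listed below.
```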

From τ and C, we can determine a joint distribution

(f(xi, yj, zk)) = (τ(i, j)Cijk)
= (f(x1, y1, z1), f(x1, y1, z2), f(x1, y2, z1), f(x1, y2, z2), f(x2, y1, z1), f(x2, y1, z2), f(x2, y2, z1), f(x2, y2, z2))
= (0.07641288, 0.1146193, 0.183538, 0.1223586, 0.172626, 0.172626, 0.0789096, 0.0789096).

Then

the X-marginal distribution of f = (f(x1, +, +), f(x2, +, +)) = (0.4969288, 0.5030712),
the Y-marginal distribution of f = (f(+, y1, +), f(+, y2, +)) = (0.5362842, 0.4637158),
the Z-marginal distribution of f = (f(+, +, z1), f(+, +, z2)) = (0.5114865, 0.4885135).

From η and A, we can determine a joint distribution

(g(xi, yj, zk)) = (η(j, k)Aijk)
= (g(x1, y1, z1), g(x1, y1, z2), g(x1, y2, z1), g(x1, y2, z2), g(x2, y1, z1), g(x2, y1, z2), g(x2, y2, z1), g(x2, y2, z2))
= (0.02490389, 0.2585208, 0.05248952, 0.1610146, 0.224135, 0.02872453, 0.2099581, 0.04025364).

Then

the X-marginal distribution of g = (g(x1, +, +), g(x2, +, +)) = (0.4969288, 0.5030712),
the Y-marginal distribution of g = (g(+, y1, +), g(+, y2, +)) = (0.5362842, 0.4637158),
the Z-marginal distribution of g = (g(+, +, z1), g(+, +, z2)) = (0.5114865, 0.4885135).

From θ and B, we can determine a joint distribution

(h(xi, yj, zk)) = (θ(k, i)Bijk)
= (h(x1, y1, z1), h(x1, y1, z2), h(x1, y2, z1), h(x1, y2, z2), h(x2, y1, z1), h(x2, y1, z2), h(x2, y2, z1), h(x2, y2, z2))
= (0.02321802, 0.1678142, 0.05417538, 0.2517212, 0.3038651, 0.04138691, 0.1302279, 0.02759127).

Then

the X-marginal distribution of h = (h(x1, +, +), h(x2, +, +)) = (0.4969288, 0.5030712),
the Y-marginal distribution of h = (h(+, y1, +), h(+, y2, +)) = (0.5362842, 0.4637158),
the Z-marginal distribution of h = (h(+, +, z1), h(+, +, z2)) = (0.5114864, 0.4885136).

Thus

f(xi, +, +) = g(xi, +, +) = h(xi, +, +) for all i,
f(+, yj, +) = g(+, yj, +) = h(+, yj, +) for all j,
f(+, +, zk) = g(+, +, zk) = h(+, +, zk) for all k.

All three joint distributions are different, so A, B and C are incompatible. However, they have the same marginal distributions.

In fact, we can consider an alternative Markov chain X(1), Z(1), Y(1), X(2), Z(2), Y(2), . . . Specifically, we start with (X(1), Z(1)) and generate Y(1) using B together with (X(1), Z(1)). Thus we move from (X(1), Z(1)) to (Z(1), Y(1)). Next, we generate X(2) using A together with (Z(1), Y(1)), resulting in a movement from (Z(1), Y(1)) to (Y(1), X(2)). Note that in each transition, one of the two components remains the same. So we have the following transitions:

(X(1), Z(1)) → (Z(1), Y(1)) → (Y(1), X(2)) → (X(2), Z(2)) → (Z(2), Y(2)) → . . .

This is a Markov chain, but not a homogeneous one. We then combine every three transitions into a single one, so that we obtain three homogeneous chains:

(X(1), Z(1)) → (X(2), Z(2)) → (X(3), Z(3)) → . . .
(Z(1), Y(1)) → (Z(2), Y(2)) → (Z(3), Y(3)) → . . .
(Y(1), X(2)) → (Y(2), X(3)) → (Y(3), X(4)) → . . .

Let Ã be the transition matrix from (Z, Y) to (Y, X):

Ã((k, j), (h, i)) = P(Y = yh, X = xi | Z = zk, Y = yj) = { Aijk if h = j; 0 if h ≠ j },

B̃ the transition matrix from (X, Z) to (Z, Y):

B̃((i, k), (h, j)) = P(Z = zh, Y = yj | X = xi, Z = zk) = { Bijk if h = k; 0 if h ≠ k },

and C̃ the transition matrix from (Y, X) to (X, Z):

C̃((j, i), (h, k)) = P(X = xh, Z = zk | Y = yj, X = xi) = { Cijk if h = i; 0 if h ≠ i }.

The transition matrix of the first chain is B̃ÃC̃, the transition matrix of the second chain is ÃC̃B̃ and the transition matrix of the third chain is C̃B̃Ã. Each chain has a unique stationary distribution, say τ̃ = (τ̃(i, k)) of dimension IK, η̃ = (η̃(k, j)) of dimension KJ and θ̃ = (θ̃(j, i)) of dimension JI. That is, τ̃, η̃ and θ̃ are solutions of the following systems:

τ̃B̃ÃC̃ = τ̃, (4.11)

η̃ÃC̃B̃ = η̃, (4.12)

θ̃C̃B̃Ã = θ̃. (4.13)

τ̃ and B together determine a joint distribution, f̃(xi, yj, zk) = τ̃(i, k)Bijk; η̃ and A together determine a joint distribution, g̃(xi, yj, zk) = η̃(k, j)Aijk; and θ̃ and C together determine a joint distribution, h̃(xi, yj, zk) = θ̃(j, i)Cijk.

Following the proof of Theorem 4.1.2, we obtain the following theorem.

Theorem 4.1.3.

(i) The (Y, Z)-distribution under f̃ is the same as that under g̃, the (X, Y)-distribution under g̃ is the same as that under h̃, and the (X, Z)-distribution under h̃ is the same as that under f̃. That is,

f̃(+, yj, zk) = g̃(+, yj, zk) for all j, k,
g̃(xi, yj, +) = h̃(xi, yj, +) for all i, j,
h̃(xi, +, zk) = f̃(xi, +, zk) for all i, k.

Consequently, f̃, g̃ and h̃ have the same X-, Y- and Z-marginal distributions.

(ii) A, B and C are compatible if and only if the stationary distributions τ̃, η̃ and θ̃ of the respective transition matrices B̃ÃC̃, ÃC̃B̃ and C̃B̃Ã satisfy τ̃(i, k)Bijk = η̃(k, j)Aijk = θ̃(j, i)Cijk for all i, j, k. That is, f̃(xi, yj, zk) = g̃(xi, yj, zk) = h̃(xi, yj, zk) for all i, j, k.

Example 4.1.3. (Example 4.1.2 continued)

A =
            x1    x2
  z1, y1   0.1   0.9
  z1, y2   0.2   0.8
  z2, y1   0.9   0.1
  z2, y2   0.8   0.2

B =
            y1    y2
  x1, z1   0.3   0.7
  x1, z2   0.4   0.6
  x2, z1   0.7   0.3
  x2, z2   0.6   0.4

C =
            z1    z2
  y1, x1   0.4   0.6
  y1, x2   0.5   0.5
  y2, x1   0.6   0.4
  y2, x2   0.5   0.5

Suppose that our generation sequence is X(1), Z(1), Y(1), X(2), Z(2), Y(2), . . . Then C̃ is the following transition matrix from (Y, X) to (X, Z):

            x1,z1  x1,z2  x2,z1  x2,z2
  y1, x1     0.4    0.6    0      0
  y1, x2     0      0      0.5    0.5
  y2, x1     0.6    0.4    0      0
  y2, x2     0      0      0.5    0.5

Ã is the following transition matrix from (Z, Y) to (Y, X):

            y1,x1  y1,x2  y2,x1  y2,x2
  z1, y1     0.1    0.9    0      0
  z1, y2     0      0      0.2    0.8
  z2, y1     0.9    0.1    0      0
  z2, y2     0      0      0.8    0.2

B̃ is the following transition matrix from (X, Z) to (Z, Y):

            z1,y1  z1,y2  z2,y1  z2,y2
  x1, z1     0.3    0.7    0      0
  x1, z2     0      0      0.4    0.6
  x2, z1     0.7    0.3    0      0
  x2, z2     0      0      0.6    0.4

Then B̃ÃC̃ is the following transition matrix from (X, Z) to (X, Z):

            x1,z1  x1,z2  x2,z1  x2,z2
  x1, z1   0.096  0.074  0.415  0.415
  x1, z2   0.432  0.408  0.080  0.080
  x2, z1   0.064  0.066  0.435  0.435
  x2, z2   0.408  0.452  0.070  0.070

ÃC̃B̃ is the following transition matrix from (Z, Y) to (Z, Y):

            z1,y1  z1,y2  z2,y1  z2,y2
  z1, y1   0.327  0.163  0.294  0.216
  z1, y2   0.316  0.204  0.272  0.208
  z2, y1   0.143  0.267  0.246  0.344
  z2, y2   0.214  0.366  0.188  0.232

C̃B̃Ã is the following transition matrix from (Y, X) to (Y, X):

            y1,x1  y1,x2  y2,x1  y2,x2
  y1, x1   0.228  0.132  0.344  0.296
  y1, x2   0.305  0.345  0.190  0.160
  y2, x1   0.162  0.178  0.276  0.384
  y2, x2   0.305  0.345  0.190  0.160

Suppose that τ̃, η̃ and θ̃ satisfy the following systems:

τ̃B̃ÃC̃ = τ̃, η̃ÃC̃B̃ = η̃, θ̃C̃B̃Ã = θ̃.

We find

τ̃ = (0.25, 0.25, 0.25, 0.25),
η̃ = (0.25, 0.25, 0.25, 0.25),
θ̃ = (0.25, 0.25, 0.25, 0.25).
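The uniform stationary distributions can be verified directly: every column of B̃ÃC̃ sums to one (the matrix is doubly stochastic), and the same holds for ÃC̃B̃ and C̃B̃Ã. A quick check of ours:

```python
import numpy as np

# B~A~C~ from above, on the (X, Z) states (x1,z1), (x1,z2), (x2,z1), (x2,z2).
M = np.array([[0.096, 0.074, 0.415, 0.415],
              [0.432, 0.408, 0.080, 0.080],
              [0.064, 0.066, 0.435, 0.435],
              [0.408, 0.452, 0.070, 0.070]])

u = np.full(4, 0.25)
print(np.allclose(u @ M, u))   # True: tau~ = (0.25, 0.25, 0.25, 0.25)
```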

From τ̃ and B, we can determine a joint distribution

(f̃(xi, yj, zk)) = (τ̃(i, k)Bijk)
= (f̃(x1, y1, z1), f̃(x1, y1, z2), f̃(x1, y2, z1), f̃(x1, y2, z2), f̃(x2, y1, z1), f̃(x2, y1, z2), f̃(x2, y2, z1), f̃(x2, y2, z2))
= (0.075, 0.100, 0.175, 0.150, 0.175, 0.150, 0.075, 0.100).

Then

the X-marginal distribution of f̃ = (f̃(x1, +, +), f̃(x2, +, +)) = (0.5, 0.5),
the Y-marginal distribution of f̃ = (f̃(+, y1, +), f̃(+, y2, +)) = (0.5, 0.5),
the Z-marginal distribution of f̃ = (f̃(+, +, z1), f̃(+, +, z2)) = (0.5, 0.5).

From η̃ and A, we can determine a joint distribution

(g̃(xi, yj, zk)) = (η̃(k, j)Aijk)
= (g̃(x1, y1, z1), g̃(x1, y1, z2), g̃(x1, y2, z1), g̃(x1, y2, z2), g̃(x2, y1, z1), g̃(x2, y1, z2), g̃(x2, y2, z1), g̃(x2, y2, z2))
= (0.025, 0.225, 0.050, 0.200, 0.225, 0.025, 0.200, 0.050).

Then

the X-marginal distribution of g̃ = (g̃(x1, +, +), g̃(x2, +, +)) = (0.5, 0.5),
the Y-marginal distribution of g̃ = (g̃(+, y1, +), g̃(+, y2, +)) = (0.5, 0.5),
the Z-marginal distribution of g̃ = (g̃(+, +, z1), g̃(+, +, z2)) = (0.5, 0.5).

From θ̃ and C, we can determine a joint distribution

(h̃(xi, yj, zk)) = (θ̃(j, i)Cijk)
= (h̃(x1, y1, z1), h̃(x1, y1, z2), h̃(x1, y2, z1), h̃(x1, y2, z2), h̃(x2, y1, z1), h̃(x2, y1, z2), h̃(x2, y2, z1), h̃(x2, y2, z2))
= (0.100, 0.150, 0.150, 0.100, 0.125, 0.125, 0.125, 0.125).

Then

the X-marginal distribution of h̃ = (h̃(x1, +, +), h̃(x2, +, +)) = (0.5, 0.5),
the Y-marginal distribution of h̃ = (h̃(+, y1, +), h̃(+, y2, +)) = (0.5, 0.5),
the Z-marginal distribution of h̃ = (h̃(+, +, z1), h̃(+, +, z2)) = (0.5, 0.5).

Although all three joint distributions are different, they have the same marginal distributions.

4.2 Simulations

We applied the Gibbs sampler to generate simulations for Example 4.1.1. The results are given in Table 4.2.1, where the second column is the empirical distribution of (X(i), Y(i)), i = 1, . . . , n, while the third column is the empirical distribution of (Y(i), X(i)), i = 1, . . . , n.

Table 4.2.1: Empirical distributions for the Gibbs sampler in Example 4.1.1

  Sample size   Sampling sequence                      Sampling sequence
  n             X(1) → Y(1) → X(2) → Y(2) → . . .      Y(1) → X(1) → Y(2) → X(2) → . . .

From Table 4.2.1, we find that the Gibbs sampler has two different empirical joint distributions, one based on (X(i), Y(i)), i = 1, 2, . . . , n, and the other based on (Y(i), X(i)), i = 1, 2, . . . , n. Each empirical joint distribution is very close to its stationary joint distribution when the sample size n is large.
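The sampler behind Table 4.2.1 is only a few lines. A minimal sketch of ours (the function name gibbs_2d and the seed are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_2d(A, B, n):
    # Systematic-scan Gibbs sampler X -> Y -> X -> ...; returns the
    # empirical distribution of the pairs (X(i), Y(i)), i = 1, ..., n.
    # A[i, j] = P(X = x_i | Y = y_j), B[i, j] = P(Y = y_j | X = x_i).
    I, J = A.shape
    counts = np.zeros((I, J))
    x = rng.integers(I)               # arbitrary initial X(1)
    for _ in range(n):
        y = rng.choice(J, p=B[x])     # draw Y(i) given X(i) from B
        counts[x, y] += 1
        x = rng.choice(I, p=A[:, y])  # draw X(i+1) given Y(i) from A'
    return counts / n
```

Starting each scan with a draw of X given Y instead gives the empirical distribution of (Y(i), X(i)) in the third column of the table.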

We also applied the Gibbs sampler to generate simulations for Example 4.1.2. The results are given in Tables 4.2.2−4.2.4. Table 4.2.2 is the empirical distribution of (X(i), Y(i), Z(i)), i = 1, . . . , n, Table 4.2.3 is the empirical distribution of (Y(i), Z(i), X(i)), i = 1, . . . , n, and Table 4.2.4 is the empirical distribution of (Z(i), X(i), Y(i)), i = 1, . . . , n.

Table 4.2.2: Empirical distribution of (X(i), Y(i), Z(i)) for the Gibbs sampler in Example 4.1.2
Sampling sequence: X(1) → Y(1) → Z(1) → X(2) → Y(2) → Z(2) → . . .
Cells ordered (x1, y1, z1), (x1, y1, z2), (x1, y2, z1), (x1, y2, z2), (x2, y1, z1), (x2, y1, z2), (x2, y2, z1), (x2, y2, z2).

  n = 1000                 0.07200 0.09400 0.18200 0.09200 0.20800 0.18200 0.09600 0.07400
  n = 10,000               0.07740 0.11740 0.18880 0.12640 0.16720 0.17140 0.08060 0.07080
  n = 100,000              0.07602 0.11520 0.18406 0.12286 0.17150 0.17018 0.07986 0.08032
  n = 1,000,000            0.07636 0.11502 0.18346 0.12226 0.17279 0.17265 0.07890 0.07856
  Stationary distribution  0.07641 0.11462 0.18354 0.12236 0.17263 0.17263 0.07891 0.07891

Table 4.2.3: Empirical distribution of (Y(i), Z(i), X(i)) for the Gibbs sampler in Example 4.1.2
Sampling sequence: Y(1) → Z(1) → X(1) → Y(2) → Z(2) → X(2) → . . .
Cells ordered (x1, y1, z1), (x1, y1, z2), (x1, y2, z1), (x1, y2, z2), (x2, y1, z1), (x2, y1, z2), (x2, y2, z1), (x2, y2, z2).

  n = 1000                 0.02800 0.25200 0.05600 0.12400 0.23600 0.03600 0.22400 0.04400
  n = 10,000               0.02540 0.25440 0.05820 0.16200 0.22300 0.02500 0.21240 0.03960
  n = 100,000              0.02570 0.25268 0.05308 0.16290 0.22450 0.02902 0.21228 0.03984
  n = 1,000,000            0.02489 0.25862 0.05267 0.16052 0.22425 0.02902 0.21015 0.03988
  Stationary distribution  0.02490 0.25852 0.05249 0.16101 0.22414 0.02872 0.20996 0.04025

Table 4.2.4: Empirical distribution of (Z(i), X(i), Y(i)) for the Gibbs sampler in Example 4.1.2
Sampling sequence: Z(1) → X(1) → Y(1) → Z(2) → X(2) → Y(2) → . . .
Cells ordered (x1, y1, z1), (x1, y1, z2), (x1, y2, z1), (x1, y2, z2), (x2, y1, z1), (x2, y1, z2), (x2, y2, z1), (x2, y2, z2).

  n = 1000                 0.01800 0.14400 0.07000 0.23400 0.33400 0.03000 0.14400 0.02600
  n = 10,000               0.02280 0.17720 0.05940 0.25300 0.28620 0.04260 0.13120 0.02760
  n = 100,000              0.02232 0.16818 0.05386 0.25374 0.30480 0.04170 0.12954 0.02586
  n = 1,000,000            0.02340 0.16777 0.05399 0.25179 0.30357 0.04152 0.13074 0.02722
  Stationary distribution  0.02322 0.16781 0.05418 0.25172 0.30387 0.04139 0.13023 0.02759

From Tables 4.2.2−4.2.4, we find that the Gibbs sampler has three different empirical joint distributions, one based on (X(i), Y(i), Z(i)), i = 1, 2, . . . , n, another based on (Y(i), Z(i), X(i)), i = 1, 2, . . . , n, and the third based on (Z(i), X(i), Y(i)), i = 1, 2, . . . , n. Each empirical joint distribution is very close to its stationary joint distribution when the sample size n is large. The result is consistent with Theorem 4.1.2.

Finally, we applied the Gibbs sampler to generate simulations for Example 4.1.3. The results are given in Tables 4.2.5−4.2.7. Table 4.2.5 is the empirical distribution of (X(i), Z(i), Y(i)), i = 1, . . . , n, Table 4.2.6 is the empirical distribution of (Z(i), Y(i), X(i)), i = 1, . . . , n, and Table 4.2.7 is the empirical distribution of (Y(i), X(i), Z(i)), i = 1, . . . , n.

Table 4.2.5: Empirical distribution of (X(i), Z(i), Y(i)) for the Gibbs sampler in Example 4.1.3
Sampling sequence: X(1) → Z(1) → Y(1) → X(2) → Z(2) → Y(2) → . . .
Cells ordered (x1, y1, z1), (x1, y1, z2), (x1, y2, z1), (x1, y2, z2), (x2, y1, z1), (x2, y1, z2), (x2, y2, z1), (x2, y2, z2).

  n = 1000                 0.08000 0.11000 0.17000 0.12600 0.20200 0.14600 0.08400 0.08200
  n = 10,000               0.07280 0.10340 0.17660 0.15100 0.17300 0.14640 0.07160 0.10520
  n = 100,000              0.07400 0.10106 0.17426 0.14968 0.17366 0.14996 0.07692 0.10005
  n = 1,000,000            0.07515 0.09987 0.17536 0.15041 0.17523 0.14979 0.07442 0.09976
  Stationary distribution  0.07500 0.10000 0.17500 0.15000 0.17500 0.15000 0.07500 0.10000

Table 4.2.6: Empirical distribution of (Z(i), Y(i), X(i)) for the Gibbs sampler in Example 4.1.3
Sampling sequence: Z(1) → Y(1) → X(1) → Z(2) → Y(2) → X(2) → . . .
Cells ordered (x1, y1, z1), (x1, y1, z2), (x1, y2, z1), (x1, y2, z2), (x2, y1, z1), (x2, y1, z2), (x2, y2, z1), (x2, y2, z2).

  n = 1000                 0.03000 0.21600 0.05200 0.19200 0.23600 0.02000 0.19000 0.06400
  n = 10,000               0.02100 0.23200 0.05200 0.19340 0.22480 0.02560 0.20080 0.05040
  n = 100,000              0.02494 0.22412 0.05098 0.20044 0.22496 0.02638 0.19998 0.04820
  n = 1,000,000            0.02537 0.22567 0.04998 0.19914 0.22503 0.02507 0.19967 0.05004
  Stationary distribution  0.02500 0.22500 0.05000 0.20000 0.22500 0.02500 0.20000 0.05000

Table 4.2.7: Empirical distribution of (Y(i), X(i), Z(i)) for the Gibbs sampler in Example 4.1.3
Sampling sequence: Y(1) → X(1) → Z(1) → Y(2) → X(2) → Z(2) → . . .
Cells ordered (x1, y1, z1), (x1, y1, z2), (x1, y2, z1), (x1, y2, z2), (x2, y1, z1), (x2, y1, z2), (x2, y2, z1), (x2, y2, z2).

  n = 1000                 0.01100 0.13400 0.12000 0.10000 0.14400 0.13200 0.14600 0.11200
  n = 10,000               0.09900 0.15080 0.15140 0.10560 0.12160 0.12260 0.12640 0.12260
  n = 100,000              0.10006 0.15218 0.15052 0.10040 0.12500 0.12386 0.12300 0.12498
  n = 1,000,000            0.09994 0.15049 0.14981 0.10042 0.12487 0.12390 0.12494 0.12563
  Stationary distribution  0.10000 0.15000 0.15000 0.10000 0.12500 0.12500 0.12500 0.12500

From Tables 4.2.5−4.2.7, we find that the Gibbs sampler has three different empirical joint distributions, one based on (X(i), Z(i), Y(i)), i = 1, 2, . . . , n, another based on (Z(i), Y(i), X(i)), i = 1, 2, . . . , n, and the third based on (Y(i), X(i), Z(i)), i = 1, 2, . . . , n. Each empirical joint distribution is very close to its stationary joint distribution when the sample size n is large. The result is consistent with Theorem 4.1.3.

5 Conclusions

Although the ratio matrix approach can deal with the compatibility of discrete conditional distributions, it can only be applied to the two-dimensional case. Our graphical representation approach, using basic ideas from graph theory, can be extended to higher-dimensional cases. This approach can not only check compatibility but also find the set of all compatible joint distributions when the given conditional distributions are compatible. It works for general n-variate cases and allows for zero elements. Moreover, when the graph is connected, we can use a spanning tree to check the compatibility and to find the unique probability distribution if the given conditional distributions are compatible.

In the present paper, we restrict attention to the case where each random variable takes values in a finite set and the given conditional distributions are full. If a random variable takes values in an infinite set, e.g., a Poisson variate, the compatibility problem should be extended to the infinite setting. However, little has been done in the literature for the infinite setting, so our graphical representation approach cannot be readily extended to it.

Consider random variables X1, . . . , Xn. A conditional distribution pS|T, where S ≠ ∅, S ∩ T = ∅ and S ∪ T = {1, . . . , n}, is called a full conditional distribution because all variables are involved. For instance, if n = 3, then p12|3 is a full conditional distribution but p1|2 is not. Since specifying a full conditional distribution pS|T amounts to specifying the probability ratio p(x) : p(x′) for all x = (xS, xT) and x′ = (x′S, x′T) with xT = x′T, where xS denotes x restricted to the subset S of {1, . . . , n}, the given conditional distributions can be equivalently described in terms of probability ratios between vertices. However, this is not so for general conditional distributions as considered by Gelman and Speed (1993). For example, if n = 3 and the given conditional distributions are p1|2, p2|3 and p3|1, then we cannot find any probability ratios between vertices. This is a major limitation of the approach.

In practical applications, since specified conditional distributions are typically subject to errors, it is unlikely for them to be exactly compatible. An issue of practical relevance is to find a probability distribution that is "most nearly compatible" with the given conditional distributions, which has been addressed by Arnold, Castillo and Sarabia (2002) and Chen, Ip and Wang (2011). It will be of great interest to formulate and solve this problem in terms of a graphical representation.

We also present the relation of compatibility to the Gibbs sampler in the higher-dimensional case. We transform a nonhomogeneous Markov chain into several homogeneous Markov chains by combining successive transitions into a single one. We prove that a given set of conditional distributions is compatible if and only if all the homogeneous Markov chains give rise to the same joint distribution. This result can be extended to general n-variate cases.

References

[1] Arnold, B. C., Castillo, E., and Sarabia, J. M. (1999). Conditional specification of statistical models. Springer, New York.

[2] Arnold, B. C., Castillo, E., and Sarabia, J. M. (2001). Conditionally specified distributions: an introduction (with discussions). Statistical Science, 16, 249-274.

[3] Arnold, B. C., Castillo, E., and Sarabia, J. M. (2002). Exact and near compatibility of discrete distributions. Computational Statistics and Data Analysis, 40, 231-252.

[4] Arnold, B. C., Castillo, E., and Sarabia, J. M. (2004). Compatibility of partial or complete conditional probability specifications. Journal of Statistical Planning and Inference, 123, 133-159.

[5] Arnold, B. C. and Gokhale, D. V. (1998). Distributions most nearly compatible with given families of conditional distributions. Test, 7, 377-390.

[6] Arnold, B. C. and Press, S. J. (1989). Compatible conditional distributions. Journal of the American Statistical Association, 84, 152-156.

[7] Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B, 36, 192-236.

[8] Gelman, A. and Speed, T. P. (1993). Characterizing a joint probability distribution by conditionals. Journal of the Royal Statistical Society, Series B, 55, 185-188.

[9] Gourieroux, C. and Monfort, A. (1979). On the characterization of a joint probability distribution by conditional distributions. Journal of Econometrics, 10, 115-118.

[10] Hobert, J. P. and Casella, G. (1998). Functional compatibility, Markov chains, and Gibbs sampling with improper posteriors. Journal of Computational and Graphical Statistics, 7, 42-60.

[11] Ip, E. H. and Wang, Y. J. (2009). Canonical representation of conditionally specified multivariate discrete distributions. Journal of Multivariate Analysis, 100, 1282-1290.

[12] Kuo, K.-L. and Wang, Y. J. (2011). A simple algorithm for checking compatibility among discrete conditional distributions. Computational Statistics and Data Analysis, 55, 2457-2462.

[13] Liu, J. S. (1996). Discussion of "Statistical inference and Monte Carlo algorithms" by G. Casella. Test, 5, 305-310.

[14] Slavkovic, A. B. and Sullivant, S. (2006). The space of compatible full conditionals is a unimodular toric variety. Journal of Symbolic Computation, 41, 196-209.

[15] Song, C. C., Li, L. A., Chen, C. H., Jiang, T. J. and Kuo, K. L. (2010). Compatibility of finite discrete conditional distributions. Statistica Sinica, 20, 423-440.

[16] Tian, G. L., Tan, M., Ng, K. W. and Tang, M. L. (2009). A unified method for checking compatibility and uniqueness for finite discrete conditional distributions. Communications in Statistics - Theory and Methods, 38, 115-129.

[17] Toffoli, E., Cecchin, E., Corona, G., Russo, A., Buonadonna, A., D'Andrea, M., Pasetto, L., Pessa, S., Errante, D., De Pangher, V., Giusto, M., Medici, M., Gaion, F., Sandri, P., Galligioni, E., Bonura, S., Boccalon, M., Biason, P. and Frustaci, S. (2006). The role of UGT1A1*28 polymorphism in the pharmacodynamics and pharmacokinetics of irinotecan in patients with metastatic colorectal cancer. Journal of Clinical Oncology, 24, 3061-3068.

[18] Wang, Y. J. and Kuo, K.-L. (2010). Compatibility of discrete conditional distributions with structural zeros. Journal of Multivariate Analysis, 101, 191-199.

[19] Yao, Y. C., Chen, S. C. and Wang, S. H. (2014). On compatibility of discrete full conditional distributions: A graphical representation approach. Journal of Multivariate Analysis, 124, 1-9.
