4 Markov chain characterizations

4.1 Compatibility by the Gibbs sampler

Suppose that X and Y are two random variables taking values in {x1, . . . , xI} and {y1, . . . , yJ}, respectively. Consider two conditional probability matrices A = (Aij) = (P{X = xi|Y = yj}) and B = (Bij) = (P{Y = yj|X = xi}). Arnold, Castillo and Sarabia (1999) treated the matrix A′ (the transpose of A) as a transition matrix from Y to X and the matrix B as a transition matrix from X to Y, and then applied the Gibbs sampler to obtain stationary distributions. We describe the method as follows. For ease of discussion, we assume Aij > 0 and Bij > 0 for all i, j.

We begin with an initial X(1). Conditioning on X(1), we draw Y(1) from B.

Next, conditioning on Y(1), we draw X(2) from A′. So we have the following transitions:

X(1) −B→ Y(1) −A′→ X(2) −B→ Y(2) −A′→ X(3) −B→ Y(3) → . . .

This is a Markov chain, but not a homogeneous one. We then combine every two transitions into a single one, so that we obtain the following two homogeneous chains:

X(1) → X(2) → X(3) → . . .
Y(1) → Y(2) → Y(3) → . . .

The transition matrix of the first chain is BA′, and the transition matrix of the second chain is A′B. Each chain determines a stationary distribution, say τ = (τi) and η = (ηj), where τi = P(X = xi) and ηj = P(Y = yj). That is, τ and η are solutions of the following systems:

τBA′ = τ, (4.1)

ηA′B = η. (4.2)

Note that both transition matrices BA′ and A′B are irreducible (indeed, all their entries are positive since Aij, Bij > 0), so that the respective stationary distributions τ and η are unique.
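To make the construction concrete, the following minimal NumPy sketch (ours, not part of the original text; the helper names stationary and tau_eta are assumptions) computes τ and η from (4.1) and (4.2) by power iteration, assuming A is stored with columns summing to one and B with rows summing to one:

```python
import numpy as np

def stationary(P, tol=1e-12, max_iter=100_000):
    # Stationary row vector pi of an irreducible stochastic matrix P,
    # i.e. pi P = pi, found by power iteration from the uniform vector.
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(max_iter):
        nxt = pi @ P
        if np.max(np.abs(nxt - pi)) < tol:
            return nxt
        pi = nxt
    return pi

def tau_eta(A, B):
    # A[i, j] = P(X = x_i | Y = y_j)  (each column of A sums to 1),
    # B[i, j] = P(Y = y_j | X = x_i)  (each row of B sums to 1).
    # tau solves (4.1): tau B A' = tau;  eta solves (4.2): eta A' B = eta.
    tau = stationary(B @ A.T)
    eta = stationary(A.T @ B)
    return tau, eta
```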

τ and B together determine a joint distribution f(xi, yj) = τiBij, and η and A together determine a joint distribution g(xi, yj) = ηjAij.

Let f(xi, +) = Σj f(xi, yj) and f(+, yj) = Σi f(xi, yj), so that f(xi, +) and f(+, yj) are the marginal distributions of f. Arnold, Castillo and Sarabia (1999) obtained the following theorem.

Theorem 4.1.1.

(i) Whether A and B are compatible or not, both joint distributions f and g have the same marginal distributions. That is, f(xi, +) = g(xi, +) and f(+, yj) = g(+, yj) for all i, j.

(ii) A and B are compatible if and only if the stationary distributions τ and η of the respective transition matrices BA′ and A′B satisfy τiBij = ηjAij for all i, j, i.e., f(xi, yj) = g(xi, yj) for all i, j.

Proof:

(i) Note that

f(+, yj) = Σi f(xi, yj) = Σi τiBij = (τB)j.

So the row vector τB corresponds to the Y-marginal distribution of f. Similarly, ηA′ corresponds to the X-marginal distribution of g.

Multiplying equation (4.1) on the right by B yields (τB)A′B = (τB), which together with (4.2) and the uniqueness of the stationary distribution implies

τB = η.

So the Y-marginal distribution of f = τB = η = the Y-marginal distribution of g.

Multiplying equation (4.2) on the right by A′ yields (ηA′)BA′ = (ηA′). From equation (4.1) and uniqueness, we have

ηA′ = τ.

So the X-marginal distribution of g = ηA′ = τ = the X-marginal distribution of f. This proves that both joint distributions have the same marginal distributions.

(ii) Suppose that A and B are compatible, implying that there exists a joint distribution h(xi, yj) such that

h(xi, yj)/h(+, yj) = Aij and h(xi, yj)/h(xi, +) = Bij.

Let

hX = (h(x1, +), . . . , h(xI, +)) and hY = (h(+, y1), . . . , h(+, yJ)),

which correspond to the X- and Y-marginal distributions of h. So

hX B = hY, (4.3)

hY A′ = hX. (4.4)

Multiplying equation (4.3) on the right by A′ and equation (4.4) on the right by B yields

hX BA′ = hY A′ = hX,
hY A′B = hX B = hY.

From (4.1), (4.2) and uniqueness, we have

τ = hX and η = hY.

It follows that f(xi, yj) = g(xi, yj) = h(xi, yj) for all i, j.

Conversely, suppose that f(xi, yj) = g(xi, yj) for all i, j. Since A is the conditional distribution of X given Y under g and B is the conditional distribution of Y given X under f, it follows that f = g has A and B as its two conditional distributions. This proves that A and B are compatible.
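Part (ii) of the theorem translates into a direct numerical compatibility test. Continuing the sketch above (again our illustration, not code from the source):

```python
def check_compatible(A, B, tol=1e-9):
    # Theorem 4.1.1(ii): A and B are compatible iff
    # tau_i B_ij = eta_j A_ij for all i, j.
    tau, eta = tau_eta(A, B)
    f = tau[:, None] * B      # f(x_i, y_j) = tau_i B_ij
    g = A * eta[None, :]      # g(x_i, y_j) = eta_j A_ij
    return np.allclose(f, g, atol=tol), f, g
```

When the test succeeds, f (= g) is the unique joint distribution having A and B as its conditional distributions.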

Example 4.1.1. Consider two conditional distribution matrices A and B. Solving (4.1) and (4.2) yields

τ = (0.29546, 0.70454), η = (0.47727, 0.52273).

τ and B together determine a joint distribution (f(xi, yj)) = (τiBij), while η and A together determine a joint distribution (g(xi, yj)) = (ηjAij). The two joint distributions are different, so A and B are incompatible. However, they have the same marginal distributions, τ and η.

Arnold, Castillo and Sarabia (1999) only considered Markov chain characterizations involving two random variables. We now consider the three-dimensional case where X, Y and Z are discrete random variables with I, J and K possible values, respectively. Three conditional distributions are given by

Aijk = P(X = xi | Y = yj, Z = zk),
Bijk = P(Y = yj | X = xi, Z = zk),
Cijk = P(Z = zk | X = xi, Y = yj).

Again for ease of discussion, we assume Aijk, Bijk and Cijk are all positive.

We generate a Markov chain X(1), Y(1), Z(1), X(2), Y(2), Z(2), . . . as follows.

We start with (X(1), Y(1)). Then we generate Z(1) using C together with (X(1), Y(1)). Thus we move from (X(1), Y(1)) to (Y(1), Z(1)). Next, we generate X(2) using A together with (Y(1), Z(1)), resulting in a movement from (Y(1), Z(1)) to (Z(1), X(2)). Note that in each transition, one of the two components remains the same. So we have the following transitions:

(X(1), Y(1)) → (Y(1), Z(1)) → (Z(1), X(2)) → (X(2), Y(2)) → (Y(2), Z(2)) → . . .

This is a Markov chain, but not a homogeneous one. We then combine every three transitions into a single one, so that we obtain three homogeneous chains:

(X(1), Y(1)) → (X(2), Y(2)) → (X(3), Y(3)) → . . .
(Y(1), Z(1)) → (Y(2), Z(2)) → (Y(3), Z(3)) → . . .
(Z(1), X(2)) → (Z(2), X(3)) → (Z(3), X(4)) → . . .

Let Ā be the transition matrix from (Y, Z) to (Z, X):

Ā((j, k), (h, i)) = P(Z = zh, X = xi | Y = yj, Z = zk) = { Aijk if h = k; 0 if h ≠ k },

B̄ the transition matrix from (Z, X) to (X, Y):

B̄((k, i), (h, j)) = P(X = xh, Y = yj | Z = zk, X = xi) = { Bijk if h = i; 0 if h ≠ i },

and C̄ the transition matrix from (X, Y) to (Y, Z):

C̄((i, j), (h, k)) = P(Y = yh, Z = zk | X = xi, Y = yj) = { Cijk if h = j; 0 if h ≠ j }.

The transition matrix of the first chain is C̄ĀB̄, the transition matrix of the second chain is ĀB̄C̄, and the transition matrix of the third chain is B̄C̄Ā. Each chain has a unique stationary distribution, say τ = (τ(i, j)) of dimension IJ, η = (η(j, k)) of dimension JK and θ = (θ(k, i)) of dimension KI. That is, τ, η and θ satisfy

τC̄ĀB̄ = τ, (4.5)

ηĀB̄C̄ = η, (4.6)

θB̄C̄Ā = θ. (4.7)

τ and C together determine a joint distribution, f(xi, yj, zk) = τ(i, j)Cijk; η and A together determine a joint distribution, g(xi, yj, zk) = η(j, k)Aijk; and θ and B together determine a joint distribution, h(xi, yj, zk) = θ(k, i)Bijk.
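The lifted matrices Ā, B̄ and C̄ are sparse and can be built explicitly. A sketch of ours, assuming A, B and C are stored as I × J × K arrays (e.g., A[i, j, k] = P(X = xi | Y = yj, Z = zk)) and pair states are enumerated in row-major order:

```python
import numpy as np

def lift_C(C):
    # C_bar((i,j),(h,k)) = C[i,j,k] if h == j, else 0:
    # transition from (X, Y) to (Y, Z); Y is carried over, Z is redrawn.
    I, J, K = C.shape
    Cbar = np.zeros((I * J, J * K))
    for i in range(I):
        for j in range(J):
            for k in range(K):
                Cbar[i * J + j, j * K + k] = C[i, j, k]
    return Cbar

def lift_A(A):
    # A_bar((j,k),(h,i)) = A[i,j,k] if h == k, else 0:
    # transition from (Y, Z) to (Z, X); Z is carried over, X is redrawn.
    I, J, K = A.shape
    Abar = np.zeros((J * K, K * I))
    for j in range(J):
        for k in range(K):
            for i in range(I):
                Abar[j * K + k, k * I + i] = A[i, j, k]
    return Abar

def lift_B(B):
    # B_bar((k,i),(h,j)) = B[i,j,k] if h == i, else 0:
    # transition from (Z, X) to (X, Y); X is carried over, Y is redrawn.
    I, J, K = B.shape
    Bbar = np.zeros((K * I, I * J))
    for k in range(K):
        for i in range(I):
            for j in range(J):
                Bbar[k * I + i, i * J + j] = B[i, j, k]
    return Bbar
```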

We have the following result.

Theorem 4.1.2.

(i) The (Y, Z)-distribution under f is the same as that under g, the (X, Z)-distribution under g is the same as that under h, and the (X, Y)-distribution under h is the same as that under f. That is,

f(+, yj, zk) = g(+, yj, zk) for all j, k,
g(xi, +, zk) = h(xi, +, zk) for all i, k,
h(xi, yj, +) = f(xi, yj, +) for all i, j.

Consequently, f, g and h have the same X-, Y- and Z-marginal distributions.

(ii) A, B and C are compatible if and only if the stationary distributions τ, η and θ of the respective transition matrices C̄ĀB̄, ĀB̄C̄ and B̄C̄Ā satisfy τ(i, j)Cijk = η(j, k)Aijk = θ(k, i)Bijk for all i, j, k, i.e., f(xi, yj, zk) = g(xi, yj, zk) = h(xi, yj, zk) for all i, j, k.

Proof:

(i) The distribution of (Y, Z) under f is

f(+, yj, zk) = Σi τ(i, j)Cijk = Σ(i,h) τ(i, h)C̄((i, h), (j, k)) = the (j, k) component of τC̄.

So τC̄ corresponds to the (Y, Z)-distribution under f. Similarly, ηĀ and θB̄ correspond respectively to the (Z, X)- and (X, Y)-distributions under g and h. Multiplying equation (4.5) on the right by C̄ yields

(τC̄)ĀB̄C̄ = (τC̄), which together with (4.6) implies

τC̄ = η.

So the (Y, Z)-distribution under f = τC̄ = η = the (Y, Z)-distribution under g. Multiplying equation (4.6) on the right by Ā yields

(ηĀ)B̄C̄Ā = (ηĀ).

From equation (4.7), we have

ηĀ = θ.

So the (Z, X)-distribution under g = ηĀ = θ = the (Z, X)-distribution under h. Multiplying equation (4.7) on the right by B̄ yields

(θB̄)C̄ĀB̄ = (θB̄).

From equation (4.5), we have

θB̄ = τ.

So the (X, Y)-distribution under h = θB̄ = τ = the (X, Y)-distribution under f.

We have shown that the (Y, Z)-distribution under f is the same as that under g, the (X, Z)-distribution under g is the same as that under h, and the (X, Y)-distribution under h is the same as that under f. That is,

f(+, yj, zk) = g(+, yj, zk) for all j, k,
g(xi, +, zk) = h(xi, +, zk) for all i, k,
h(xi, yj, +) = f(xi, yj, +) for all i, j.

Consequently, f, g and h have the same X-, Y- and Z-marginal distributions.

(ii) Suppose that A, B and C are compatible, implying that there exists a joint distribution d(xi, yj, zk) such that

Aijk = d(xi, yj, zk)/d(+, yj, zk),
Bijk = d(xi, yj, zk)/d(xi, +, zk),
Cijk = d(xi, yj, zk)/d(xi, yj, +).

Let

dX,Y = (d(x1, y1, +), . . . , d(xI, yJ, +)),
dY,Z = (d(+, y1, z1), . . . , d(+, yJ, zK)),
dZ,X = (d(x1, +, z1), . . . , d(xI, +, zK)).

So

dX,Y C̄ = dY,Z, (4.8)

dY,Z Ā = dZ,X, (4.9)

dZ,X B̄ = dX,Y. (4.10)

Multiplying equation (4.8) on the right by ĀB̄, equation (4.9) by B̄C̄ and equation (4.10) by C̄Ā yields

dX,Y C̄ĀB̄ = dX,Y, dY,Z ĀB̄C̄ = dY,Z, dZ,X B̄C̄Ā = dZ,X.

From (4.5), (4.6), (4.7) and uniqueness, we have

τ = dX,Y, η = dY,Z, θ = dZ,X.

It follows that f(xi, yj, zk) = g(xi, yj, zk) = h(xi, yj, zk) = d(xi, yj, zk) for all i, j, k.

Conversely, suppose f(xi, yj, zk) = g(xi, yj, zk) = h(xi, yj, zk) for all i, j, k. Since A is the conditional distribution of X given (Y, Z) under g, B is the conditional distribution of Y given (Z, X) under h and C is the conditional distribution of Z given (X, Y) under f, it follows that f = g = h has A, B and C as its three conditional distributions. This proves that A, B and C are compatible.
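As in Theorem 4.1.1, part (ii) yields a numerical compatibility test. A sketch of ours, reusing stationary and the lift functions above:

```python
def check_compatible3(A, B, C, tol=1e-9):
    # Theorem 4.1.2(ii): A, B, C are compatible iff
    # tau(i,j) C_ijk = eta(j,k) A_ijk = theta(k,i) B_ijk for all i, j, k.
    I, J, K = A.shape
    Abar, Bbar, Cbar = lift_A(A), lift_B(B), lift_C(C)
    tau = stationary(Cbar @ Abar @ Bbar).reshape(I, J)    # dist. of (X, Y)
    eta = stationary(Abar @ Bbar @ Cbar).reshape(J, K)    # dist. of (Y, Z)
    theta = stationary(Bbar @ Cbar @ Abar).reshape(K, I)  # dist. of (Z, X)
    f = tau[:, :, None] * C        # f(x_i, y_j, z_k) = tau(i, j) C_ijk
    g = eta[None, :, :] * A        # g(x_i, y_j, z_k) = eta(j, k) A_ijk
    h = theta.T[:, None, :] * B    # h(x_i, y_j, z_k) = theta(k, i) B_ijk
    ok = np.allclose(f, g, atol=tol) and np.allclose(g, h, atol=tol)
    return ok, f, g, h
```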

Example 4.1.2. (Example 3.2.2 continued)

Consider three random variables X, Y and Z with possible values (x1, x2), (y1, y2) and (z1, z2), and three matrices A, B and C:

A =
            x1    x2
  y1, z1   0.1   0.9
  y1, z2   0.9   0.1
  y2, z1   0.2   0.8
  y2, z2   0.8   0.2

B =
            y1    y2
  z1, x1   0.3   0.7
  z1, x2   0.7   0.3
  z2, x1   0.4   0.6
  z2, x2   0.6   0.4

C =
            z1    z2
  x1, y1   0.4   0.6
  x1, y2   0.6   0.4
  x2, y1   0.5   0.5
  x2, y2   0.5   0.5

Suppose that our generation sequence is X(1), Y(1), Z(1), X(2), Y(2), Z(2), . . . Then C̄ is the following transition matrix from (X, Y) to (Y, Z):

            y1,z1  y1,z2  y2,z1  y2,z2
  x1, y1     0.4    0.6    0      0
  x1, y2     0      0      0.6    0.4
  x2, y1     0.5    0.5    0      0
  x2, y2     0      0      0.5    0.5

Ā is the following transition matrix from (Y, Z) to (Z, X):

            z1,x1  z1,x2  z2,x1  z2,x2
  y1, z1     0.1    0.9    0      0
  y1, z2     0      0      0.9    0.1
  y2, z1     0.2    0.8    0      0
  y2, z2     0      0      0.8    0.2

B̄ is the following transition matrix from (Z, X) to (X, Y):

            x1,y1  x1,y2  x2,y1  x2,y2
  z1, x1     0.3    0.7    0      0
  z1, x2     0      0      0.7    0.3
  z2, x1     0.4    0.6    0      0
  z2, x2     0      0      0.6    0.4

Then C̄ĀB̄ is the following transition matrix from (X, Y) to (X, Y):

            x1,y1  x1,y2  x2,y1  x2,y2
  x1, y1   0.228  0.352  0.288  0.132
  x1, y2   0.164  0.276  0.384  0.176
  x2, y1   0.195  0.305  0.345  0.155
  x2, y2   0.190  0.310  0.340  0.160

ĀB̄C̄ is the following transition matrix from (Y, Z) to (Y, Z):

            y1,z1  y1,z2  y2,z1  y2,z2
  y1, z1   0.327  0.333  0.177  0.163
  y1, z2   0.174  0.246  0.344  0.236
  y2, z1   0.304  0.316  0.204  0.176
  y2, z2   0.188  0.252  0.328  0.232

B̄C̄Ā is the following transition matrix from (Z, X) to (Z, X):

            z1,x1  z1,x2  z2,x1  z2,x2
  z1, x1   0.096  0.444  0.386  0.074
  z1, x2   0.065  0.435  0.435  0.065
  z2, x1   0.088  0.432  0.408  0.072
  z2, x2   0.070  0.430  0.430  0.070

Suppose that τ, η and θ satisfy the following systems:

τC̄ĀB̄ = τ, ηĀB̄C̄ = η, θB̄C̄Ā = θ.

We find

τ = (0.1910322, 0.3058966, 0.3452520, 0.1578192),
η = (0.2490389, 0.2872453, 0.2624476, 0.2012682),
θ = (0.0773934, 0.4340930, 0.4195354, 0.0689782).
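These values can be reproduced with the sketches above (our illustration; the arrays below encode the matrices A, B and C of this example):

```python
import numpy as np

# A[i, j, k] = P(X = x_i | Y = y_j, Z = z_k), and similarly for B, C.
A = np.zeros((2, 2, 2)); B = np.zeros((2, 2, 2)); C = np.zeros((2, 2, 2))
A[:, 0, 0] = [0.1, 0.9]; A[:, 0, 1] = [0.9, 0.1]   # rows (y1,z1), (y1,z2)
A[:, 1, 0] = [0.2, 0.8]; A[:, 1, 1] = [0.8, 0.2]   # rows (y2,z1), (y2,z2)
B[0, :, 0] = [0.3, 0.7]; B[1, :, 0] = [0.7, 0.3]   # rows (z1,x1), (z1,x2)
B[0, :, 1] = [0.4, 0.6]; B[1, :, 1] = [0.6, 0.4]   # rows (z2,x1), (z2,x2)
C[0, 0, :] = [0.4, 0.6]; C[0, 1, :] = [0.6, 0.4]   # rows (x1,y1), (x1,y2)
C[1, 0, :] = [0.5, 0.5]; C[1, 1, :] = [0.5, 0.5]   # rows (x2,y1), (x2,y2)

ok, f, g, h = check_compatible3(A, B, C)
print(ok)    # False: f, g and h differ, so A, B and C are incompatible
print(stationary(lift_C(C) @ lift_A(A) @ lift_B(B)))
# ~ (0.1910322, 0.3058966, 0.3452520, 0.1578192), i.e. tau above;
# f.ravel() reproduces the f(x_i, y_j, z_k) values listed below.
```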

From τ and C, we can determine a joint distribution

(f(xi, yj, zk)) = (τ(i, j)Cijk)
= (f(x1, y1, z1), f(x1, y1, z2), f(x1, y2, z1), f(x1, y2, z2), f(x2, y1, z1), f(x2, y1, z2), f(x2, y2, z1), f(x2, y2, z2))
= (0.07641288, 0.1146193, 0.183538, 0.1223586, 0.172626, 0.172626, 0.0789096, 0.0789096).

Then

the X-marginal distribution of f = (f(x1, +, +), f(x2, +, +)) = (0.4969288, 0.5030712),
the Y-marginal distribution of f = (f(+, y1, +), f(+, y2, +)) = (0.5362842, 0.4637158),
the Z-marginal distribution of f = (f(+, +, z1), f(+, +, z2)) = (0.5114865, 0.4885135).

From η and A, we can determine a joint distribution

(g(xi, yj, zk)) = (η(j, k)Aijk)
= (g(x1, y1, z1), g(x1, y1, z2), g(x1, y2, z1), g(x1, y2, z2), g(x2, y1, z1), g(x2, y1, z2), g(x2, y2, z1), g(x2, y2, z2))
= (0.02490389, 0.2585208, 0.05248952, 0.1610146, 0.224135, 0.02872453, 0.2099581, 0.04025364).

Then

the X-marginal distribution of g = (g(x1, +, +), g(x2, +, +)) = (0.4969288, 0.5030712),
the Y-marginal distribution of g = (g(+, y1, +), g(+, y2, +)) = (0.5362842, 0.4637158),
the Z-marginal distribution of g = (g(+, +, z1), g(+, +, z2)) = (0.5114865, 0.4885135).

From θ and B, we can determine a joint distribution

(h(xi, yj, zk)) = (θ(k, i)Bijk)
= (h(x1, y1, z1), h(x1, y1, z2), h(x1, y2, z1), h(x1, y2, z2), h(x2, y1, z1), h(x2, y1, z2), h(x2, y2, z1), h(x2, y2, z2))
= (0.02321802, 0.1678142, 0.05417538, 0.2517212, 0.3038651, 0.04138691, 0.1302279, 0.02759127).

Then

the X-marginal distribution of h = (h(x1, +, +), h(x2, +, +)) = (0.4969288, 0.5030712),
the Y-marginal distribution of h = (h(+, y1, +), h(+, y2, +)) = (0.5362842, 0.4637158),
the Z-marginal distribution of h = (h(+, +, z1), h(+, +, z2)) = (0.5114864, 0.4885136).

Thus

f(xi, +, +) = g(xi, +, +) = h(xi, +, +) for all i,
f(+, yj, +) = g(+, yj, +) = h(+, yj, +) for all j,
f(+, +, zk) = g(+, +, zk) = h(+, +, zk) for all k.

All three joint distributions are different, so A, B and C are incompatible. However, they have the same marginal distributions.

In fact, we can consider an alternative Markov chain X(1), Z(1), Y(1), X(2), Z(2), Y(2), . . . Specifically, we start with (X(1), Z(1)) and generate Y(1) using B together with (X(1), Z(1)). Thus we move from (X(1), Z(1)) to (Z(1), Y(1)). Next, we generate X(2) using A together with (Z(1), Y(1)), resulting in a movement from (Z(1), Y(1)) to (Y(1), X(2)). Note that in each transition, one of the two components remains the same. So we have the following transitions:

(X(1), Z(1)) → (Z(1), Y(1)) → (Y(1), X(2)) → (X(2), Z(2)) → (Z(2), Y(2)) → . . .

This is a Markov chain, but not a homogeneous one. We then combine every three transitions into a single one, so that we obtain three homogeneous chains:

(X(1), Z(1)) → (X(2), Z(2)) → (X(3), Z(3)) → . . .
(Z(1), Y(1)) → (Z(2), Y(2)) → (Z(3), Y(3)) → . . .
(Y(1), X(2)) → (Y(2), X(3)) → (Y(3), X(4)) → . . .

Let Ã be the transition matrix from (Z, Y) to (Y, X):

Ã((k, j), (h, i)) = P(Y = yh, X = xi | Z = zk, Y = yj) = { Aijk if h = j; 0 if h ≠ j },

B̃ the transition matrix from (X, Z) to (Z, Y):

B̃((i, k), (h, j)) = P(Z = zh, Y = yj | X = xi, Z = zk) = { Bijk if h = k; 0 if h ≠ k },

and C̃ the transition matrix from (Y, X) to (X, Z):

C̃((j, i), (h, k)) = P(X = xh, Z = zk | Y = yj, X = xi) = { Cijk if h = i; 0 if h ≠ i }.

The transition matrix of the first chain is B̃ÃC̃, the transition matrix of the second chain is ÃC̃B̃ and the transition matrix of the third chain is C̃B̃Ã. Each chain has a unique stationary distribution, say τ̃ = (τ̃(i, k)) of dimension IK, η̃ = (η̃(k, j)) of dimension KJ and θ̃ = (θ̃(j, i)) of dimension JI. That is, τ̃, η̃ and θ̃ are solutions of the following systems:

τ̃B̃ÃC̃ = τ̃, (4.11)

η̃ÃC̃B̃ = η̃, (4.12)

θ̃C̃B̃Ã = θ̃. (4.13)

τ̃ and B together determine a joint distribution, f̃(xi, yj, zk) = τ̃(i, k)Bijk; η̃ and A together determine a joint distribution, g̃(xi, yj, zk) = η̃(k, j)Aijk; and θ̃ and C together determine a joint distribution, h̃(xi, yj, zk) = θ̃(j, i)Cijk.

Following the proof of Theorem 4.1.2, we obtain the following theorem.

Theorem 4.1.3.

(i) The (Y, Z)-distribution under f̃ is the same as that under g̃, the (X, Y)-distribution under g̃ is the same as that under h̃, and the (X, Z)-distribution under h̃ is the same as that under f̃. That is,

f̃(+, yj, zk) = g̃(+, yj, zk) for all j, k,
g̃(xi, yj, +) = h̃(xi, yj, +) for all i, j,
h̃(xi, +, zk) = f̃(xi, +, zk) for all i, k.

Consequently, f̃, g̃ and h̃ have the same X-, Y- and Z-marginal distributions.

(ii) A, B and C are compatible if and only if the stationary distributions τ̃, η̃ and θ̃ of the respective transition matrices B̃ÃC̃, ÃC̃B̃ and C̃B̃Ã satisfy τ̃(i, k)Bijk = η̃(k, j)Aijk = θ̃(j, i)Cijk for all i, j, k. That is, f̃(xi, yj, zk) = g̃(xi, yj, zk) = h̃(xi, yj, zk) for all i, j, k.

Example 4.1.3. (Example 4.1.2 continued)

A =
            x1    x2
  z1, y1   0.1   0.9
  z1, y2   0.2   0.8
  z2, y1   0.9   0.1
  z2, y2   0.8   0.2

B =
            y1    y2
  x1, z1   0.3   0.7
  x1, z2   0.4   0.6
  x2, z1   0.7   0.3
  x2, z2   0.6   0.4

C =
            z1    z2
  y1, x1   0.4   0.6
  y1, x2   0.5   0.5
  y2, x1   0.6   0.4
  y2, x2   0.5   0.5

Suppose that our generation sequence is X(1), Z(1), Y(1), X(2), Z(2), Y(2), . . . Then C̃ is the following transition matrix from (Y, X) to (X, Z):

            x1,z1  x1,z2  x2,z1  x2,z2
  y1, x1     0.4    0.6    0      0
  y1, x2     0      0      0.5    0.5
  y2, x1     0.6    0.4    0      0
  y2, x2     0      0      0.5    0.5

Ã is the following transition matrix from (Z, Y) to (Y, X):

            y1,x1  y1,x2  y2,x1  y2,x2
  z1, y1     0.1    0.9    0      0
  z1, y2     0      0      0.2    0.8
  z2, y1     0.9    0.1    0      0
  z2, y2     0      0      0.8    0.2

B̃ is the following transition matrix from (X, Z) to (Z, Y):

            z1,y1  z1,y2  z2,y1  z2,y2
  x1, z1     0.3    0.7    0      0
  x1, z2     0      0      0.4    0.6
  x2, z1     0.7    0.3    0      0
  x2, z2     0      0      0.6    0.4

Then B̃ÃC̃ is the following transition matrix from (X, Z) to (X, Z):

            x1,z1  x1,z2  x2,z1  x2,z2
  x1, z1   0.096  0.074  0.415  0.415
  x1, z2   0.432  0.408  0.080  0.080
  x2, z1   0.064  0.066  0.435  0.435
  x2, z2   0.408  0.452  0.070  0.070

ÃC̃B̃ is the following transition matrix from (Z, Y) to (Z, Y):

            z1,y1  z1,y2  z2,y1  z2,y2
  z1, y1   0.327  0.163  0.294  0.216
  z1, y2   0.316  0.204  0.272  0.208
  z2, y1   0.143  0.267  0.246  0.344
  z2, y2   0.214  0.366  0.188  0.232

C̃B̃Ã is the following transition matrix from (Y, X) to (Y, X):

            y1,x1  y1,x2  y2,x1  y2,x2
  y1, x1   0.228  0.132  0.344  0.296
  y1, x2   0.305  0.345  0.190  0.160
  y2, x1   0.162  0.178  0.276  0.384
  y2, x2   0.305  0.345  0.190  0.160

Suppose that τ̃, η̃ and θ̃ satisfy the following systems:

τ̃B̃ÃC̃ = τ̃, η̃ÃC̃B̃ = η̃, θ̃C̃B̃Ã = θ̃.

We find

τ̃ = (0.25, 0.25, 0.25, 0.25),
η̃ = (0.25, 0.25, 0.25, 0.25),
θ̃ = (0.25, 0.25, 0.25, 0.25).
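The uniform stationary distributions can be verified directly: every column of B̃ÃC̃ sums to one (the matrix is doubly stochastic), and the same holds for ÃC̃B̃ and C̃B̃Ã. A quick check of ours:

```python
import numpy as np

# B~A~C~ from above, on the (X, Z) states (x1,z1), (x1,z2), (x2,z1), (x2,z2).
M = np.array([[0.096, 0.074, 0.415, 0.415],
              [0.432, 0.408, 0.080, 0.080],
              [0.064, 0.066, 0.435, 0.435],
              [0.408, 0.452, 0.070, 0.070]])

u = np.full(4, 0.25)
print(np.allclose(u @ M, u))   # True: tau~ = (0.25, 0.25, 0.25, 0.25)
```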

From τ̃ and B, we can determine a joint distribution

(f̃(xi, yj, zk)) = (τ̃(i, k)Bijk)
= (f̃(x1, y1, z1), f̃(x1, y1, z2), f̃(x1, y2, z1), f̃(x1, y2, z2), f̃(x2, y1, z1), f̃(x2, y1, z2), f̃(x2, y2, z1), f̃(x2, y2, z2))
= (0.075, 0.100, 0.175, 0.150, 0.175, 0.150, 0.075, 0.100).

Then

the X-marginal distribution of f̃ = (f̃(x1, +, +), f̃(x2, +, +)) = (0.5, 0.5),
the Y-marginal distribution of f̃ = (f̃(+, y1, +), f̃(+, y2, +)) = (0.5, 0.5),
the Z-marginal distribution of f̃ = (f̃(+, +, z1), f̃(+, +, z2)) = (0.5, 0.5).

From η̃ and A, we can determine a joint distribution

(g̃(xi, yj, zk)) = (η̃(k, j)Aijk)
= (g̃(x1, y1, z1), g̃(x1, y1, z2), g̃(x1, y2, z1), g̃(x1, y2, z2), g̃(x2, y1, z1), g̃(x2, y1, z2), g̃(x2, y2, z1), g̃(x2, y2, z2))
= (0.025, 0.225, 0.050, 0.200, 0.225, 0.025, 0.200, 0.050).

Then

the X-marginal distribution of g̃ = (g̃(x1, +, +), g̃(x2, +, +)) = (0.5, 0.5),
the Y-marginal distribution of g̃ = (g̃(+, y1, +), g̃(+, y2, +)) = (0.5, 0.5),
the Z-marginal distribution of g̃ = (g̃(+, +, z1), g̃(+, +, z2)) = (0.5, 0.5).

From θ̃ and C, we can determine a joint distribution

(h̃(xi, yj, zk)) = (θ̃(j, i)Cijk)
= (h̃(x1, y1, z1), h̃(x1, y1, z2), h̃(x1, y2, z1), h̃(x1, y2, z2), h̃(x2, y1, z1), h̃(x2, y1, z2), h̃(x2, y2, z1), h̃(x2, y2, z2))
= (0.100, 0.150, 0.150, 0.100, 0.125, 0.125, 0.125, 0.125).

Then

the X-marginal distribution of h̃ = (h̃(x1, +, +), h̃(x2, +, +)) = (0.5, 0.5),
the Y-marginal distribution of h̃ = (h̃(+, y1, +), h̃(+, y2, +)) = (0.5, 0.5),
the Z-marginal distribution of h̃ = (h̃(+, +, z1), h̃(+, +, z2)) = (0.5, 0.5).

Although all three joint distributions are different, they have the same marginal distributions.

4.2 Simulations

We applied the Gibbs sampler to generate simulations for Example 4.1.1. The results are given in Table 4.2.1, where the second column is the empirical distribution of (X(i), Y(i)), i = 1, . . . , n, while the third column is the empirical distribution of (Y(i), X(i)), i = 1, . . . , n.

Table 4.2.1: Empirical distributions for the Gibbs sampler in Example 4.1.1

  Sample size   Sampling sequence                      Sampling sequence
  n             X(1) → Y(1) → X(2) → Y(2) → . . .      Y(1) → X(1) → Y(2) → X(2) → . . .

From Table 4.2.1, we find that the Gibbs sampler has two different empirical joint distributions, one based on (X(i), Y(i)), i = 1, 2, . . . , n, and the other based on (Y(i), X(i)), i = 1, 2, . . . , n. Each empirical joint distribution is very close to its stationary joint distribution when the sample size n is large.
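The sampler behind Table 4.2.1 is only a few lines. A minimal sketch of ours (the function name gibbs_2d and the seed are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_2d(A, B, n):
    # Systematic-scan Gibbs sampler X -> Y -> X -> ...; returns the
    # empirical distribution of the pairs (X(i), Y(i)), i = 1, ..., n.
    # A[i, j] = P(X = x_i | Y = y_j), B[i, j] = P(Y = y_j | X = x_i).
    I, J = A.shape
    counts = np.zeros((I, J))
    x = rng.integers(I)               # arbitrary initial X(1)
    for _ in range(n):
        y = rng.choice(J, p=B[x])     # draw Y(i) given X(i) from B
        counts[x, y] += 1
        x = rng.choice(I, p=A[:, y])  # draw X(i+1) given Y(i) from A'
    return counts / n
```

Starting each scan with a draw of X given Y instead gives the empirical distribution of (Y(i), X(i)) in the third column of the table.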

We also applied the Gibbs sampler to generate simulations for Example 4.1.2. The results are given in Tables 4.2.2−4.2.4. Table 4.2.2 is the empirical distribution of (X(i), Y(i), Z(i)), i = 1, . . . , n, Table 4.2.3 is the empirical distribution of (Y(i), Z(i), X(i)), i = 1, . . . , n, and Table 4.2.4 is the empirical distribution of (Z(i), X(i), Y(i)), i = 1, . . . , n.

Table 4.2.2: Empirical distribution of (X(i), Y(i), Z(i)) for the Gibbs sampler in Example 4.1.2
Sampling sequence: X(1) → Y(1) → Z(1) → X(2) → Y(2) → Z(2) → . . .
Cells ordered (x1, y1, z1), (x1, y1, z2), (x1, y2, z1), (x1, y2, z2), (x2, y1, z1), (x2, y1, z2), (x2, y2, z1), (x2, y2, z2).

  n = 1000                 0.07200 0.09400 0.18200 0.09200 0.20800 0.18200 0.09600 0.07400
  n = 10,000               0.07740 0.11740 0.18880 0.12640 0.16720 0.17140 0.08060 0.07080
  n = 100,000              0.07602 0.11520 0.18406 0.12286 0.17150 0.17018 0.07986 0.08032
  n = 1,000,000            0.07636 0.11502 0.18346 0.12226 0.17279 0.17265 0.07890 0.07856
  Stationary distribution  0.07641 0.11462 0.18354 0.12236 0.17263 0.17263 0.07891 0.07891

Table 4.2.3: Empirical distribution of (Y(i), Z(i), X(i)) for the Gibbs sampler in Example 4.1.2
Sampling sequence: Y(1) → Z(1) → X(1) → Y(2) → Z(2) → X(2) → . . .
Cells ordered (x1, y1, z1), (x1, y1, z2), (x1, y2, z1), (x1, y2, z2), (x2, y1, z1), (x2, y1, z2), (x2, y2, z1), (x2, y2, z2).

  n = 1000                 0.02800 0.25200 0.05600 0.12400 0.23600 0.03600 0.22400 0.04400
  n = 10,000               0.02540 0.25440 0.05820 0.16200 0.22300 0.02500 0.21240 0.03960
  n = 100,000              0.02570 0.25268 0.05308 0.16290 0.22450 0.02902 0.21228 0.03984
  n = 1,000,000            0.02489 0.25862 0.05267 0.16052 0.22425 0.02902 0.21015 0.03988
  Stationary distribution  0.02490 0.25852 0.05249 0.16101 0.22414 0.02872 0.20996 0.04025

Table 4.2.4: Empirical distribution of (Z(i), X(i), Y(i)) for the Gibbs sampler in Example 4.1.2
Sampling sequence: Z(1) → X(1) → Y(1) → Z(2) → X(2) → Y(2) → . . .
Cells ordered (x1, y1, z1), (x1, y1, z2), (x1, y2, z1), (x1, y2, z2), (x2, y1, z1), (x2, y1, z2), (x2, y2, z1), (x2, y2, z2).

  n = 1000                 0.01800 0.14400 0.07000 0.23400 0.33400 0.03000 0.14400 0.02600
  n = 10,000               0.02280 0.17720 0.05940 0.25300 0.28620 0.04260 0.13120 0.02760
  n = 100,000              0.02232 0.16818 0.05386 0.25374 0.30480 0.04170 0.12954 0.02586
  n = 1,000,000            0.02340 0.16777 0.05399 0.25179 0.30357 0.04152 0.13074 0.02722
  Stationary distribution  0.02322 0.16781 0.05418 0.25172 0.30387 0.04139 0.13023 0.02759

From Tables 4.2.2−4.2.4, we find that the Gibbs sampler has three different empirical joint distributions, one based on (X(i), Y(i), Z(i)), i = 1, 2, . . . , n, another based on (Y(i), Z(i), X(i)), i = 1, 2, . . . , n, and the third based on (Z(i), X(i), Y(i)), i = 1, 2, . . . , n. Each empirical joint distribution is very close to its stationary joint distribution when the sample size n is large. The result is consistent with Theorem 4.1.2.

Finally, we applied the Gibbs sampler to generate simulations for Example 4.1.3. The results are given in Tables 4.2.5−4.2.7. Table 4.2.5 is the empirical distribution of (X(i), Z(i), Y(i)), i = 1, . . . , n, Table 4.2.6 is the empirical distribution of (Z(i), Y(i), X(i)), i = 1, . . . , n, and Table 4.2.7 is the empirical distribution of (Y(i), X(i), Z(i)), i = 1, . . . , n.

Table 4.2.5: Empirical distribution of (X(i), Z(i), Y(i)) for the Gibbs sampler in Example 4.1.3
Sampling sequence: X(1) → Z(1) → Y(1) → X(2) → Z(2) → Y(2) → . . .
Cells ordered (x1, y1, z1), (x1, y1, z2), (x1, y2, z1), (x1, y2, z2), (x2, y1, z1), (x2, y1, z2), (x2, y2, z1), (x2, y2, z2).

  n = 1000                 0.08000 0.11000 0.17000 0.12600 0.20200 0.14600 0.08400 0.08200
  n = 10,000               0.07280 0.10340 0.17660 0.15100 0.17300 0.14640 0.07160 0.10520
  n = 100,000              0.07400 0.10106 0.17426 0.14968 0.17366 0.14996 0.07692 0.10005
  n = 1,000,000            0.07515 0.09987 0.17536 0.15041 0.17523 0.14979 0.07442 0.09976
  Stationary distribution  0.07500 0.10000 0.17500 0.15000 0.17500 0.15000 0.07500 0.10000

Table 4.2.6: Empirical distribution of (Z(i), Y(i), X(i)) for the Gibbs sampler in Example 4.1.3
Sampling sequence: Z(1) → Y(1) → X(1) → Z(2) → Y(2) → X(2) → . . .
Cells ordered (x1, y1, z1), (x1, y1, z2), (x1, y2, z1), (x1, y2, z2), (x2, y1, z1), (x2, y1, z2), (x2, y2, z1), (x2, y2, z2).

  n = 1000                 0.03000 0.21600 0.05200 0.19200 0.23600 0.02000 0.19000 0.06400
  n = 10,000               0.02100 0.23200 0.05200 0.19340 0.22480 0.02560 0.20080 0.05040
  n = 100,000              0.02494 0.22412 0.05098 0.20044 0.22496 0.02638 0.19998 0.04820
  n = 1,000,000            0.02537 0.22567 0.04998 0.19914 0.22503 0.02507 0.19967 0.05004
  Stationary distribution  0.02500 0.22500 0.05000 0.20000 0.22500 0.02500 0.20000 0.05000

Table 4.2.7: Empirical distribution of (Y(i), X(i), Z(i)) for the Gibbs sampler in Example 4.1.3
Sampling sequence: Y(1) → X(1) → Z(1) → Y(2) → X(2) → Z(2) → . . .
Cells ordered (x1, y1, z1), (x1, y1, z2), (x1, y2, z1), (x1, y2, z2), (x2, y1, z1), (x2, y1, z2), (x2, y2, z1), (x2, y2, z2).

  n = 1000                 0.01100 0.13400 0.12000 0.10000 0.14400 0.13200 0.14600 0.11200
  n = 10,000               0.09900 0.15080 0.15140 0.10560 0.12160 0.12260 0.12640 0.12260
  n = 100,000              0.10006 0.15218 0.15052 0.10040 0.12500 0.12386 0.12300 0.12498
  n = 1,000,000            0.09994 0.15049 0.14981 0.10042 0.12487 0.12390 0.12494 0.12563
  Stationary distribution  0.10000 0.15000 0.15000 0.10000 0.12500 0.12500 0.12500 0.12500

From Tables 4.2.5−4.2.7, we find that the Gibbs sampler has three different empirical joint distributions, one based on (X(i), Z(i), Y(i)), i = 1, 2, . . . , n, another based on (Z(i), Y(i), X(i)), i = 1, 2, . . . , n, and the third based on (Y(i), X(i), Z(i)), i = 1, 2, . . . , n. Each empirical joint distribution is very close to its stationary joint distribution when the sample size n is large. The result is consistent with Theorem 4.1.3.

5 Conclusions

Although the ratio matrix approach can deal with the compatibility of discrete conditional distributions, it can only be applied to the two-dimensional case. Our graphical representation approach, using basic ideas from graph theory, can be extended to higher-dimensional cases. This approach can not only check compatibility but also find the set of all compatible joint distributions when the given conditional distributions are compatible. It works for general n-variate cases and allows for zero elements. Moreover, when the graph is connected, we can use a spanning tree to check the compatibility and to find the unique probability distribution if the given conditional distributions are compatible.

In the present paper, we restrict attention to the case where each random variable takes values in a finite set and the given conditional distributions are full. If a random variable takes values in an infinite set, e.g., a Poisson variate, the compatibility problem should be extended to the infinite setting. However, little has been done in the literature for the infinite setting, so our graphical representation approach cannot be readily extended to it.

Consider random variables X1, . . . , Xn. A conditional distribution pS|T, where S ≠ ∅, S ∩ T = ∅ and S ∪ T = {1, . . . , n}, is called a full conditional distribution because all variables are involved. For instance, if n = 3, then p12|3 is a full conditional distribution but p1|2 is not. Since specifying a full conditional distribution pS|T amounts to specifying the probability ratio p(x) : p(x′) for all x = (xS, xT) and x′ = (x′S, x′T) with xT = x′T, where xS denotes x restricted to the subset S of {1, . . . , n}, the given conditional distributions can be equivalently described in terms of probability ratios between vertices. However, this is not so for general conditional distributions as considered by Gelman and Speed (1993). For example, if n = 3 and the given conditional distributions are p1|2, p2|3 and p3|1, then we cannot find any probability ratios between vertices. This is a major limitation of the approach.

In practical applications, since specified conditional distributions are typically subject to errors, it is unlikely for them to be exactly compatible. An issue of practical relevance is to find a probability distribution that is "most nearly compatible" with the given conditional distributions, which has been addressed by Arnold, Castillo and Sarabia (2002) and Chen, Ip and Wang (2011). It will be of great interest to formulate and solve this problem in terms of a graphical representation.

We also present the relation of compatibility to the Gibbs sampler in the higher-dimensional case. We transform a nonhomogeneous Markov chain into several homogeneous Markov chains by combining successive transitions into a single one. We prove that a given set of conditional distributions is compatible if and only if all the homogeneous Markov chains give rise to the same joint distribution. This result can be extended to general n-variate cases.

References

[1] Arnold, B. C., Castillo, E., and Sarabia, J. M. (1999). Conditional specification of statistical models. Springer, New York.

[2] Arnold, B. C., Castillo, E., and Sarabia, J. M. (2001). Conditionally specified distributions: an introduction (with discussions). Statistical Science, 16, 249-274.

[3] Arnold, B. C., Castillo, E., and Sarabia, J. M. (2002). Exact and near compatibility of discrete distributions. Computational Statistics and Data Analysis, 40, 231-252.

[4] Arnold, B. C., Castillo, E., and Sarabia, J. M. (2004). Compatibility of partial or complete conditional probability specifications. Journal of Statistical Planning and Inference, 123, 133-159.

[5] Arnold, B. C. and Gokhale, D. V. (1998). Distributions most nearly compatible with given families of conditional distributions. Test, 7, 377-390.

[6] Arnold, B. C. and Press, S. J. (1989). Compatible conditional distributions. Journal of the American Statistical Association, 84, 152-156.

[7] Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B, 36, 192-236.

[8] Gelman, A. and Speed, T. P. (1993). Characterizing a joint probability distribution by conditionals. Journal of the Royal Statistical Society, Series B, 55, 185-188.

[9] Gourieroux, C. and Monfort, A. (1979). On the characterization of a joint probability distribution by conditional distributions. Journal of Econometrics, 10, 115-118.

[10] Hobert, J. P. and Casella, G. (1998). Functional compatibility, Markov chains, and Gibbs sampling with improper posteriors. Journal of Computational and Graphical Statistics, 7, 42-60.

[11] Ip, E. H. and Wang, Y. J. (2009). Canonical representation of conditionally specified multivariate discrete distributions. Journal of Multivariate Analysis, 100, 1282-1290.

[12] Kuo, K.-L. and Wang, Y. J. (2011). A simple algorithm for checking compatibility among discrete conditional distributions. Computational Statistics and Data Analysis, 55, 2457-2462.

[13] Liu, J. S. (1996). Discussion of "Statistical inference and Monte Carlo algorithms" by G. Casella. Test, 5, 305-310.

[14] Slavkovic, A. B. and Sullivant, S. (2006). The space of compatible full conditionals is a unimodular toric variety. Journal of Symbolic Computation, 41, 196-209.

[15] Song, C. C., Li, L. A., Chen, C. H., Jiang, T. J. and Kuo, K. L. (2010). Compatibility of finite discrete conditional distributions. Statistica Sinica, 20, 423-440.

[16] Tian, G. L., Tan, M., Ng, K. W. and Tang, M. L. (2009). A unified method for checking compatibility and uniqueness for finite discrete conditional distributions. Communications in Statistics - Theory and Methods, 38, 115-129.

[17] Toffoli, E., Cecchin, E., Corona, G., Russo, A., Buonadonna, A., D'Andrea, M., Pasetto, L., Pessa, S., Errante, D., De Pangher, V., Giusto, M., Medici, M., Gaion, F., Sandri, P., Galligioni, E., Bonura, S., Boccalon, M., Biason, P. and Frustaci, S. (2006). The role of UGT1A1*28 polymorphism in the pharmacodynamics and pharmacokinetics of irinotecan in patients with metastatic colorectal cancer. Journal of Clinical Oncology, 24, 3061-3068.

[18] Wang, Y. J. and Kuo, K.-L. (2010). Compatibility of discrete conditional distributions with structural zeros. Journal of Multivariate Analysis, 101, 191-199.

[19] Yao, Y. C., Chen, S. C. and Wang, S. H. (2014). On compatibility of discrete full conditional distributions: A graphical representation approach. Journal of Multivariate Analysis, 124, 1-9.
