Joint Posterior Distributions

We summarize the steps for obtaining σ_ii in (30). Let −∇²`ˆ_t be minus the observed information matrix. For i = 1, 2, . . . , p,

In this section we try to extend the marginal posterior density (23) to the joint posterior density. Some notations are needed. Let

h⁽⁰⁾ = h, h⁽¹⁾ = U h ={h⁽¹⁾_i , i = 1,· · · , p},

and we can obtain a more general form:

E_ξ^t{h(Zt)} = Φph +∑

‧

provided all the expectations exist. Take h(z) = 1_{Z_i_≤z∗

i, i=1,··· ,p}, where z_i^∗ ∈ R. Then the left hand side of (32) becomes the joint cumulative distribution of Zt, and the right hand side gives an expansion for the joint distribution. To obtain an expansion similar to (23), we need to extend the results in (19) and (20).

Q1: How does E_ξ^t

Since the generalization requires heavy calculations, in this thesis we only look at 2-dimensional cases. For Q1, from (16) and (17) we know that

E_ξ^t

involve the ﬁrst and the second posterior moments of Zt; and (16) and (17) are derived by taking h(z) in (14) and (15) as z_i and z_iz_j. To generalize these results, note that if

‧

By (16) and (17) it follows that Φ₂h⁽²⁾₁₂ + Φ₂h⁽²⁾₂₁ =

‧

together with (16) and (17) it follows that Φ₂h⁽³⁾₁₁₂+ Φ₂h⁽³⁾₁₂₁+ Φ₂h⁽³⁾₂₁₁ =

‧

write them out separately. Similarly, we can obtain

Φ2h⁽³⁾₁₂₂+ Φ2h⁽³⁾₂₁₂+ Φ2h⁽³⁾₂₂₁ =

In the following we shall write down the joint posterior density of θ.

Let A, B, and h(z) be as in (33). So, θ∈ A if and only if z ∈ B; and the joint probability and joint density of θ are

P_ξ^t(θi≤ ai, i = 1, 2) = P_ξ^t(θ∈ A) = Pξ^t(Zt∈ B),

‧

With the results in previous paragraphs, we have ξ^t(a1, a2) = ∂²P_ξ^t(θ∈ A)

‧ 國

立政治大學

‧

N a tio na

l C h engchi U ni ve rs it y

= |Σt|φ2(z^∗)[1 + q1(z₁^∗)E_ξ^tZt1+ q1(z₂^∗)E_ξ^tZt2 +1

2q2(z₁^∗)(E_ξ^tZ_t1² − 1) +q₁(z₁^∗)q₁(z₂^∗)(E_ξ^tZ_t1Z_t2) + 1

2q₂(z₂^∗)(E_ξ^tZ_t2² − 1) +1

6q₃(z^∗₁)(E_ξ^tZ_t1³ − 3Eξ^tZ_t1) +1

2q₂(z₁^∗)q₁(z₂^∗)(E_ξ^tZ_t1²Z_t2− Eξ^tZ_t2) (35) +1

2q₁(z^∗₁)q₂(z₂^∗)(E_ξ^tZ_t1Z_t2² − Eξ^tZ_t1) +1

6q₃(z^∗₂)(E_ξ^tZ_t2³ − 3Eξ^tZ_t2)]

+ ∑

i1,...,i4

E_ξ^t {

h⁽⁴⁾_i

1...i4(Z_t)f_t,i⁽⁴⁾

1...i4(Zt) ft(Zt)

} .

4 Experimental Results

In Section 4.1 we compare the second order approximations for posterior densities by John-son (1970), Tierney and Kadane (1986), and Weng (2010a). For Section 4.2, we evaluate the performance of (37) when data comes in two stages. In Section 4.3-4.5, we use some GLM examples to study the use of implied density (see (29) in Section 3.1) to diagnose convergence of simulation series. In Section 4.6, we use the expansion (23) to diagnose the posterior density with multi modes, but the result is not well. All computations here are done in R (2010).

4.1 Comparison of Second Order Approximations 4.1.1 Comparison with Johnson (1970)

Johnson (1970) obtained expansions for marginal posterior distributions through Taylor expansions. Weng (2010a) showed that the marginal posterior distribution of Ztp can be expanded as

P_ξ^t(Ztp ≤ zp^∗) = Φ(z_p^∗)−

∑m i=1

Rit(z^∗_p)φ(z^∗_p) + O(t⁻^m+1² ), (36) where

R_it(z_p^∗) = ∑

v∈Ji

v!q_v₋₁(z_p^∗)E_ξ^t(q_v(Z_tp)) = O(t⁻ⁱ²).

Here q_v are Hermite polynomials. Weng (2010b) applied a version of Stein’s identity to approximate the posterior moments in (23) and derived the marginal posterior density for

‧ 國

立政治大學

‧

N a tio na

l C h engchi U ni ve rs it y

θ_p:

ξ^t_p(ap) = [Σt]pp{φ(z^∗p) +

∑m i=1

Qˆit(z_p^∗)φ(z^∗_p) + O(t⁻^m+1² )}, (37)

where m = 1, 2, which gives approximations accurate to O(t⁻¹) and O(t^−3/2), respectively.

Here

Qit(z_p^∗) =∑

j∈Jⁱ

j!qj(z_p^∗)E_ξ^t(qj(Ztp)) = O(t⁻²ⁱ),

and ˆQit(z_p^∗) is obtained by replacing E_ξ^t(qj(Ztp)) with analytic approximations. These ana-lytic approximations involve the loglikelihood and the prior and their derivatives. See Weng (2010b) and Weng and Hsu (2011). Note that ˆQ1t is of order O(t^−1/2) and ˆQ2t is O(t⁻¹).

Weng and Hsu (2011) found that the O(t⁻¹) terms in Weng (2010b) and Johnson (1970) do not agree and further compared these two expansions by simulation study. The simulations conﬁrmed this ﬁnding and revealed that our O(t⁻¹) term gives better performance than Johnson’s. The materials below are taken from Weng (2010b) and Weng and Hsu (2011).

Johnson (1970) considered the posterior distribution of a centered and scaled variable (see his Eq. (2.1), p. 853) in 1-dimensional case:

ψ = (θ− ˆθt)b(ˆθ_t), (38)

where t is the sample size and b(ˆθ_t) =

[−1 t

∑t i=1

∂²

∂θ²logf (x_i, θ)|_θ=ˆ_θ_t]1/2

Denote the posterior cdf of t^1/2ψ by F_t. He showed that the posterior distribution of F_t possesses an asymptotic expansion in powers of t^−1/2 (see his Theorem 2.1):

|Ft(w)− Φ(w) −

∑K j=1

γj(w, x)t^−j/2| ≤ D1t⁻¹²^(K+1), (39)

and his Proposition 2.1 shows that each γj(w, x) is a polynomial in w having coeﬃcients bounded in x multiplied by the standard normal density. Here we use two examples for simulation comparison. The ﬁrst example is a Binomial-Beta model. Suppose that X ∼ Bin(t, θ), where the prior of θ is assumed to be Beta(a, b). Take a = 0.5, b = 4, t = 5, x = 2 and a = 0.5, b = 4, t = 30, x = 12. Thus, the posterior distributions of θ are Beta(2.5,7) and Beta(12.5,22) respectively, which are shown in Figure 1.

‧ 國

立政治大學

‧

N a tio na

l C h engchi U ni ve rs it y

We compare the approximate posterior density of θ by Johnson’s formulas and (37) to orders O(t⁻¹) and O(t^−3/2). Here Johnson’s approximation to O(t⁻¹) is obtained by taking K = 1 in (39):

pt(w)≡ dFt(w)

dw = φ(w) +dγj(w, x)

dw t^−1/2+ O(t⁻¹);

and the approximation to O(t^−3/2) is by taking K = 2 in (39). Figures 1(a1) and (a2) give the true density and (37) to O(t⁻¹) and O(t^−3/2); Figures 1(b1) and (b2) give the true density and Johnson’s approximations to O(t⁻¹) and O(t^−3/2); and Figures 1(c1) and (c2) contain the two O(t⁻¹) approximations. We have some observations. First, Figures 1(c1) and (c2) show that the two O(t⁻¹) approximations are quite close. Secondly, Figure 1(a1) shows that our approximation to O(t^−3/2) is closer to the true density than approximation to O(t⁻¹), but Figure 1(b1) reveals that Johnson’s formula to O(t^−3/2) does not improve upon O(t⁻¹). Thirdly, Figures 1(a2) and (b2) show that the negative value of the two approximations has improved for θ (ranges between 0.5 and 1).

Next we consider a Poisson-Gamma example. Let y₁, ..., y_t be an i.i.d. sample from Poisson(θ), where the prior of θ is assumed to be Gamma(a, b). Suppose that (y1, y2, y3, y4, y5) = (3, 5, 7, 10, 3) and that (a, b) = (30, 5). Thus, the MLE of θ is 5.6, the prior mean of θ is 6 and the posterior distribution of θ follows Gamma(a +∑_t

i=1yi, b + t)=Gamma(58,10).

We have some observations from Figure 2(a1) to Figure 2(c1). First, Figure 2(c1) indicates that the two O(t⁻¹) approximations are fairly close; secondly, Figure 2(a1) shows that (37) to O(t^−3/2) improves upon O(t⁻¹), but Figure 2(b1) shows that Johnson’s does not.

Now we try diﬀerent prior distributions to see its eﬀect on the approximations. Suppose that (a, b) = (15, 5). So, the prior mean of θ is 3 and θ|y ∼ Gamma(43,10). The results are in Figures 3(a1)∼ 3(c1). As before, Figure 3(c1) indicates that the two O(t⁻¹) approximations are close. However, due to the fact that the prior mean of θ may be farther from the MLE, from Figures 3(a1) and 3(b1) we found that both O(t^−3/2) approximations are worse than O(t⁻¹). A closer look at these two O(t^−3/2) curves show that Johnson’s approximation (ranges between -2 and 1.5) ﬂuctuates more widely than ours (ranges between -1 and 1).

Next we try diﬀerent sample sizes for Poisson-Gamma example. We do (y₁, y₂, y₃, y₄, y₅) = (3, 5, 7, 10, 3) for three times to obtain a sample size 15 with (a, b) = (30, 5) and (15, 5).

When (a, b) = (30, 5), the MLE of θ is 5.6, the prior mean of θ is 6 and θ|y follows

‧ 國

立政治大學

‧

N a tio na

l C h engchi U ni ve rs it y

Gamma(114,20). Figure 2(c2) indicates that the two O(t⁻¹) approximations are close.

Figure 2(a2) shows that the negative value of our O(t^−3/2) approximation has improved for θ ranging between 2 and 4 while Figure 2(b2) shows that the negative value of John-son’s O(t^−3/2) approximation has also improved for θ ranging between 7 and 9. When (a, b) = (15, 5), the prior mean of θ is 3 and θ|y follows Gamma(99,20). Figure 3(c2) in-dicates that the two O(t⁻¹) approximations are close. Figure 3(a2) shows that (37) to O(t^−3/2) shortens the range of negative value according to Figure 3(a1) while Figure 3(b2) shows that Johnson’s also shortens the range of negative value according to Figure 3(b1).

In Figure 3, the prior mean of θ is farther from the MLE; the larger the sample size is, the better the result will be.

sample size: t = 5

(a1) (b1) (c1)

0.0 0.2 0.4 0.6 0.8 1.0

−0.50.51.52.5

marginal posterior density

0.0 0.2 0.4 0.6 0.8 1.0

−10123

marginal posterior density

0.0 0.2 0.4 0.6 0.8 1.0

−0.50.51.52.5

marginal posterior density

sample size: t = 30

(a2) (b2) (c2)

0.0 0.2 0.4 0.6 0.8 1.0

012345

marginal posterior density

0.0 0.2 0.4 0.6 0.8 1.0

012345

marginal posterior density

0.0 0.2 0.4 0.6 0.8 1.0

012345

marginal posterior density

‧ 國

立政治大學

‧

N a tio na

l C h engchi U ni ve rs it y

Figure 1: Marginal posterior pdf of θ. Beta-Binomial model.

(a1) and (a2) Solid: Exact; Dashed: Eq (37) to O(t⁻¹); Dotted: Eq (37) to O(t^−3/2).

(b1) and (b2) Solid: Exact; Dashed: Johnson’s O(t⁻¹); Dotted: Johnson’s O(t^−3/2).

(c1) and (c2) Solid: Eq (37) to O(t⁻¹); Dashed: Johnson’s O(t⁻¹).

sample size: t = 5

(a1) (b1) (c1)

0 2 4 6 8 10

0.00.20.4

marginal posterior density

0 2 4 6 8 10

0.00.20.40.6

marginal posterior density

0 2 4 6 8 10

0.00.10.20.3

marginal posterior density

sample size: t = 15

(a2) (b2) (c2)

3 4 5 6 7 8

0.00.20.40.6

marginal posterior density

3 4 5 6 7 8

0.00.20.40.60.8

marginal posterior density

3 4 5 6 7 8

0.00.20.40.6

marginal posterior density

Figure 2: Marginal posterior pdf of θ. Poisson model with prior Gamma(30,5).

(a1) and (a2) Solid: Exact; Dashed: Eq (37) to O(t⁻¹); Dotted: Eq (37) to O(t^−3/2).

(b1) and (b2) Solid: Exact; Dashed: Johnson’s O(t⁻¹); Dotted: Johnson’s O(t^−3/2).

(c1) and (c2) Solid: Eq (37) to O(t⁻¹); Dashed: Johnson’s O(t⁻¹).

‧ 國

立政治大學

‧

N a tio na

l C h engchi U ni ve rs it y

sample size: t = 5

(a1) (b1) (c1)

0 2 4 6 8 10

−1.0−0.50.00.51.0

marginal posterior density

0 2 4 6 8 10

−2.0−1.00.01.0

marginal posterior density

0 2 4 6 8 10

−0.40.00.40.8

marginal posterior density

sample size: t = 15

(a2) (b2) (c2)

3 4 5 6 7 8

0.00.51.0

marginal posterior density

3 4 5 6 7 8

−0.50.00.51.0

marginal posterior density

3 4 5 6 7 8

−0.20.20.61.0

marginal posterior density

Figure 3: Marginal posterior pdf of θ. Poisson model with prior Gamma(15,5).

(a1) and (a2) Solid: Exact; Dashed: Eq (37) to O(t⁻¹); Dotted: Eq (37) to O(t^−3/2).

(b1) and (b2) Solid: Exact; Dashed: Johnson’s O(t⁻¹); Dotted: Johnson’s O(t^−3/2).

(c1) and (c2) Solid: Eq (37) to O(t⁻¹); Dashed: Johnson’s O(t⁻¹).

4.1.2 Comparison with Tierney and Kadane (1986)

In this section we compare (37) with (24) by Tierney and Kadane. We consider a data taken from Mendenhall et al. (1989); see also Tanner (1993). The explanatory variable is the number of days of radiotherapy received by each of 24 patients, and the response variable is the absence (1) and presence (0) of disease at a site three years after treatment.

A problem of interest is to use the covariate (days) to predict outcome.

‧ 國

立政治大學

‧

N a tio na

l C h engchi U ni ve rs it y

We ﬁt the data using the logistic regression model log

( pi

1− pi

)

= θ1+ θ2xi,

where xi is the covariate for patient i and pi is the probability of success (no disease). So, p_i= exp(θ₁+ θ₂x_i)/(1 + exp(θ₁+ θ₂x_i)). The intercept θ₁ represents the log-odds of success for zero days, while the slope θ2 represents the change in the log-odds of success for every unit increase in the covariate. The loglikelihood and the second partial derivatives are

`_t(θ) =

∑t i=1

[y_ilogp_i+ (1− yi)log(1− pi)] =

∑t i=1

[y_i(θ₁+ θ₂x_i)− log(1 + exp(θ1+ θ₂x_i))],

`⁽²⁾₁₁ =−∑

p_i(1− pi), `⁽²⁾₁₂ =−∑

x_ip_i(1− pi), `⁽²⁾₂₂ =−∑

x²_ip_i(1− pi). (40) Now we take ﬂat priors on both θ1 and θ2. The comparison of the density in (25) and (37) with m = 2 and moments replaced by approximations are in Figure 4(a); the two methods and the true density are very close. Next we take the standard normal density priors on both θ₁ and θ₂. The results are in Figure 4(b); the result of (25) is close to the exact density, but (37) performs poorly. Here the posterior means of qi(Zt2), i = 1, ..., 4 are (1.485, 1.460,−0.045, −3.333) and (3.134, 7.207, −0.336, −3.874), respectively. The reason why analytic approximation (37) fails may be that the prior mean is farther from the MLE.

If we change priors to N (0, 2), a less informative prior than N (0, 1), then the result of (37) improves a bit; see Figure 4(c). Since the posterior standard error of θ2 is around 1/23.25 = 0.043, the prior mean 0 is about two standard errors away from ˆθ_t2. Figure 4 showed that the less informative the prior is (i.e. larger variance), the more accurate the approximate density is.

Next we consider a data ﬁrst analyzed by Finney (1947); see also Albert and Chib (1993) for illustrating a sampling method for marginal posterior densities, and Myers (1990)[p.330-332]. A probit model was ﬁt to the data. The model is

pi= Φ(θ1+ θ2c1i+ θ3c2i), i = 1, ..., 39,

where Φ is the cdf of the standard normal density, c_1i is the volume of air inspired, c_2i is the rate of air inspired, and the binary outcome is the occurrence or non-occurrence on a transient vasorestriction on the skin of the digits. Now we take ﬂat priors on both θ₁ and

‧ 國

立政治大學

‧

N a tio na

l C h engchi U ni ve rs it y

θ₂ and obtain the results are in Figure 5, where (37) (with m = 2) is very close to the exact density, but (25) shifts to left slightly.

(a) (b) (c)

−0.3 −0.2 −0.1 0.0 0.1

02468

θ2

marginal posterior density

−0.3 −0.2 −0.1 0.0 0.1

−150−50050100

θ2

marginal posterior density

−0.3 −0.2 −0.1 0.0 0.1

−30−100102030

θ2

marginal posterior density

Figure 4: Marginal posterior pdf of θ2. Logit2p model.

Solid: Equation (25); Dashed: Equation (37); Dotted: Exact distribution.

(a) ﬂat-prior. (b) N(0,1) prior. (c) N(0,2) prior.

0 1 2 3

0.00.20.40.60.8

θ3

marginal posterior density

Figure 5: Marginal posterior pdf of θ₃. Probit model with ﬂat prior. Solid: Equation (25);

Dashed: Equation (37); Dotted: Exact distribution by numerical integration.

在文檔中馬可夫鏈蒙地卡羅收斂的研究與貝氏漸進的表現 - 政大學術集成 (頁 20-34)

‧

‧

‧

‧

‧

‧

‧ 國

立 政 治 大 學

‧

N a tio na

l C h engchi U ni ve rs it y

4 Experimental Results

‧ 國

立 政 治 大 學

‧

N a tio na

l C h engchi U ni ve rs it y

‧ 國

立 政 治 大 學

‧

N a tio na

l C h engchi U ni ve rs it y

‧ 國

立 政 治 大 學

‧

N a tio na

l C h engchi U ni ve rs it y

‧ 國

立 政 治 大 學

‧

N a tio na

l C h engchi U ni ve rs it y

‧ 國

立 政 治 大 學

‧

N a tio na

l C h engchi U ni ve rs it y

‧ 國

立 政 治 大 學

‧

N a tio na

l C h engchi U ni ve rs it y

‧ 國

立 政 治 大 學

‧

N a tio na

l C h engchi U ni ve rs it y

立政治大學

立政治大學

立政治大學

立政治大學

立政治大學

立政治大學

立政治大學

立政治大學