• 沒有找到結果。

We summarize the steps for obtaining σii in (30). Let −∇2`ˆt be minus the observed information matrix. For i = 1, 2, . . . , p,

In this section we try to extend the marginal posterior density (23) to the joint posterior density. Some notations are needed. Let

h(0) = h, h(1) = U h ={h(1)i , i = 1,· · · , p},

and we can obtain a more general form:

Eξt{h(Zt)} = Φph +

provided all the expectations exist. Take h(z) = 1{Zi≤z

i, i=1,··· ,p}, where zi ∈ R. Then the left hand side of (32) becomes the joint cumulative distribution of Zt, and the right hand side gives an expansion for the joint distribution. To obtain an expansion similar to (23), we need to extend the results in (19) and (20).

Q1: How does Eξt

Since the generalization requires heavy calculations, in this thesis we only look at 2-dimensional cases. For Q1, from (16) and (17) we know that

Eξt

involve the first and the second posterior moments of Zt; and (16) and (17) are derived by taking h(z) in (14) and (15) as zi and zizj. To generalize these results, note that if

By (16) and (17) it follows that Φ2h(2)12 + Φ2h(2)21 =

together with (16) and (17) it follows that Φ2h(3)112+ Φ2h(3)121+ Φ2h(3)211 =

write them out separately. Similarly, we can obtain

Φ2h(3)122+ Φ2h(3)212+ Φ2h(3)221 =

In the following we shall write down the joint posterior density of θ.

Let A, B, and h(z) be as in (33). So, θ∈ A if and only if z ∈ B; and the joint probability and joint density of θ are

Pξti≤ ai, i = 1, 2) = Pξt(θ∈ A) = Pξt(Zt∈ B),

With the results in previous paragraphs, we have ξt(a1, a2) = 2Pξt(θ∈ A)

‧ 國

立 政 治 大 學

N a tio na

l C h engchi U ni ve rs it y

= t2(z)[1 + q1(z1)EξtZt1+ q1(z2)EξtZt2 +1

2q2(z1)(EξtZt12 − 1) +q1(z1)q1(z2)(EξtZt1Zt2) + 1

2q2(z2)(EξtZt22 − 1) +1

6q3(z1)(EξtZt13 − 3EξtZt1) +1

2q2(z1)q1(z2)(EξtZt12Zt2− EξtZt2) (35) +1

2q1(z1)q2(z2)(EξtZt1Zt22 − EξtZt1) +1

6q3(z2)(EξtZt23 − 3EξtZt2)]

+ ∑

i1,...,i4

Eξt {

h(4)i

1...i4(Zt)ft,i(4)

1...i4(Zt) ft(Zt)

} .

4 Experimental Results

In Section 4.1 we compare the second order approximations for posterior densities by John-son (1970), Tierney and Kadane (1986), and Weng (2010a). For Section 4.2, we evaluate the performance of (37) when data comes in two stages. In Section 4.3-4.5, we use some GLM examples to study the use of implied density (see (29) in Section 3.1) to diagnose convergence of simulation series. In Section 4.6, we use the expansion (23) to diagnose the posterior density with multi modes, but the result is not well. All computations here are done in R (2010).

4.1 Comparison of Second Order Approximations 4.1.1 Comparison with Johnson (1970)

Johnson (1970) obtained expansions for marginal posterior distributions through Taylor expansions. Weng (2010a) showed that the marginal posterior distribution of Ztp can be expanded as

Pξt(Ztp ≤ zp) = Φ(zp)

m i=1

Rit(zp)φ(zp) + O(tm+12 ), (36) where

Rit(zp) = ∑

v∈Ji

1

v!qv−1(zp)Eξt(qv(Ztp)) = O(ti2).

Here qv are Hermite polynomials. Weng (2010b) applied a version of Stein’s identity to approximate the posterior moments in (23) and derived the marginal posterior density for

‧ 國

立 政 治 大 學

N a tio na

l C h engchi U ni ve rs it y

θp:

ξtp(ap) = [Σt]pp{φ(zp) +

m i=1

Qˆit(zp)φ(zp) + O(tm+12 )}, (37)

where m = 1, 2, which gives approximations accurate to O(t−1) and O(t−3/2), respectively.

Here

Qit(zp) =∑

j∈Ji

1

j!qj(zp)Eξt(qj(Ztp)) = O(t2i),

and ˆQit(zp) is obtained by replacing Eξt(qj(Ztp)) with analytic approximations. These ana-lytic approximations involve the loglikelihood and the prior and their derivatives. See Weng (2010b) and Weng and Hsu (2011). Note that ˆQ1t is of order O(t−1/2) and ˆQ2t is O(t−1).

Weng and Hsu (2011) found that the O(t−1) terms in Weng (2010b) and Johnson (1970) do not agree and further compared these two expansions by simulation study. The simulations confirmed this finding and revealed that our O(t−1) term gives better performance than Johnson’s. The materials below are taken from Weng (2010b) and Weng and Hsu (2011).

Johnson (1970) considered the posterior distribution of a centered and scaled variable (see his Eq. (2.1), p. 853) in 1-dimensional case:

ψ = (θ− ˆθt)b(ˆθt), (38)

where t is the sample size and b(ˆθt) =

[1 t

t i=1

2

∂θ2logf (xi, θ)|θ=ˆθt]1/2

.

Denote the posterior cdf of t1/2ψ by Ft. He showed that the posterior distribution of Ft possesses an asymptotic expansion in powers of t−1/2 (see his Theorem 2.1):

|Ft(w)− Φ(w) −

K j=1

γj(w, x)t−j/2| ≤ D1t12(K+1), (39)

and his Proposition 2.1 shows that each γj(w, x) is a polynomial in w having coefficients bounded in x multiplied by the standard normal density. Here we use two examples for simulation comparison. The first example is a Binomial-Beta model. Suppose that X Bin(t, θ), where the prior of θ is assumed to be Beta(a, b). Take a = 0.5, b = 4, t = 5, x = 2 and a = 0.5, b = 4, t = 30, x = 12. Thus, the posterior distributions of θ are Beta(2.5,7) and Beta(12.5,22) respectively, which are shown in Figure 1.

‧ 國

立 政 治 大 學

N a tio na

l C h engchi U ni ve rs it y

We compare the approximate posterior density of θ by Johnson’s formulas and (37) to orders O(t−1) and O(t−3/2). Here Johnson’s approximation to O(t−1) is obtained by taking K = 1 in (39):

pt(w)≡ dFt(w)

dw = φ(w) +dγj(w, x)

dw t−1/2+ O(t−1);

and the approximation to O(t−3/2) is by taking K = 2 in (39). Figures 1(a1) and (a2) give the true density and (37) to O(t−1) and O(t−3/2); Figures 1(b1) and (b2) give the true density and Johnson’s approximations to O(t−1) and O(t−3/2); and Figures 1(c1) and (c2) contain the two O(t−1) approximations. We have some observations. First, Figures 1(c1) and (c2) show that the two O(t−1) approximations are quite close. Secondly, Figure 1(a1) shows that our approximation to O(t−3/2) is closer to the true density than approximation to O(t−1), but Figure 1(b1) reveals that Johnson’s formula to O(t−3/2) does not improve upon O(t−1). Thirdly, Figures 1(a2) and (b2) show that the negative value of the two approximations has improved for θ (ranges between 0.5 and 1).

Next we consider a Poisson-Gamma example. Let y1, ..., yt be an i.i.d. sample from Poisson(θ), where the prior of θ is assumed to be Gamma(a, b). Suppose that (y1, y2, y3, y4, y5) = (3, 5, 7, 10, 3) and that (a, b) = (30, 5). Thus, the MLE of θ is 5.6, the prior mean of θ is 6 and the posterior distribution of θ follows Gamma(a +t

i=1yi, b + t)=Gamma(58,10).

We have some observations from Figure 2(a1) to Figure 2(c1). First, Figure 2(c1) indicates that the two O(t−1) approximations are fairly close; secondly, Figure 2(a1) shows that (37) to O(t−3/2) improves upon O(t−1), but Figure 2(b1) shows that Johnson’s does not.

Now we try different prior distributions to see its effect on the approximations. Suppose that (a, b) = (15, 5). So, the prior mean of θ is 3 and θ|y ∼ Gamma(43,10). The results are in Figures 3(a1)∼ 3(c1). As before, Figure 3(c1) indicates that the two O(t−1) approximations are close. However, due to the fact that the prior mean of θ may be farther from the MLE, from Figures 3(a1) and 3(b1) we found that both O(t−3/2) approximations are worse than O(t−1). A closer look at these two O(t−3/2) curves show that Johnson’s approximation (ranges between -2 and 1.5) fluctuates more widely than ours (ranges between -1 and 1).

Next we try different sample sizes for Poisson-Gamma example. We do (y1, y2, y3, y4, y5) = (3, 5, 7, 10, 3) for three times to obtain a sample size 15 with (a, b) = (30, 5) and (15, 5).

When (a, b) = (30, 5), the MLE of θ is 5.6, the prior mean of θ is 6 and θ|y follows

‧ 國

立 政 治 大 學

N a tio na

l C h engchi U ni ve rs it y

Gamma(114,20). Figure 2(c2) indicates that the two O(t−1) approximations are close.

Figure 2(a2) shows that the negative value of our O(t−3/2) approximation has improved for θ ranging between 2 and 4 while Figure 2(b2) shows that the negative value of John-son’s O(t−3/2) approximation has also improved for θ ranging between 7 and 9. When (a, b) = (15, 5), the prior mean of θ is 3 and θ|y follows Gamma(99,20). Figure 3(c2) in-dicates that the two O(t−1) approximations are close. Figure 3(a2) shows that (37) to O(t−3/2) shortens the range of negative value according to Figure 3(a1) while Figure 3(b2) shows that Johnson’s also shortens the range of negative value according to Figure 3(b1).

In Figure 3, the prior mean of θ is farther from the MLE; the larger the sample size is, the better the result will be.

sample size: t = 5

(a1) (b1) (c1)

0.0 0.2 0.4 0.6 0.8 1.0

−0.50.51.52.5

θ

marginal posterior density

0.0 0.2 0.4 0.6 0.8 1.0

−10123

θ

marginal posterior density

0.0 0.2 0.4 0.6 0.8 1.0

−0.50.51.52.5

θ

marginal posterior density

sample size: t = 30

(a2) (b2) (c2)

0.0 0.2 0.4 0.6 0.8 1.0

012345

θ

marginal posterior density

0.0 0.2 0.4 0.6 0.8 1.0

012345

θ

marginal posterior density

0.0 0.2 0.4 0.6 0.8 1.0

012345

θ

marginal posterior density

‧ 國

立 政 治 大 學

N a tio na

l C h engchi U ni ve rs it y

Figure 1: Marginal posterior pdf of θ. Beta-Binomial model.

(a1) and (a2) Solid: Exact; Dashed: Eq (37) to O(t−1); Dotted: Eq (37) to O(t−3/2).

(b1) and (b2) Solid: Exact; Dashed: Johnson’s O(t−1); Dotted: Johnson’s O(t−3/2).

(c1) and (c2) Solid: Eq (37) to O(t−1); Dashed: Johnson’s O(t−1).

sample size: t = 5

(a1) (b1) (c1)

0 2 4 6 8 10

0.00.20.4

θ

marginal posterior density

0 2 4 6 8 10

0.00.20.40.6

θ

marginal posterior density

0 2 4 6 8 10

0.00.10.20.3

θ

marginal posterior density

sample size: t = 15

(a2) (b2) (c2)

3 4 5 6 7 8

0.00.20.40.6

θ

marginal posterior density

3 4 5 6 7 8

0.00.20.40.60.8

θ

marginal posterior density

3 4 5 6 7 8

0.00.20.40.6

θ

marginal posterior density

Figure 2: Marginal posterior pdf of θ. Poisson model with prior Gamma(30,5).

(a1) and (a2) Solid: Exact; Dashed: Eq (37) to O(t−1); Dotted: Eq (37) to O(t−3/2).

(b1) and (b2) Solid: Exact; Dashed: Johnson’s O(t−1); Dotted: Johnson’s O(t−3/2).

(c1) and (c2) Solid: Eq (37) to O(t−1); Dashed: Johnson’s O(t−1).

‧ 國

立 政 治 大 學

N a tio na

l C h engchi U ni ve rs it y

sample size: t = 5

(a1) (b1) (c1)

0 2 4 6 8 10

−1.0−0.50.00.51.0

θ

marginal posterior density

0 2 4 6 8 10

−2.0−1.00.01.0

θ

marginal posterior density

0 2 4 6 8 10

−0.40.00.40.8

θ

marginal posterior density

sample size: t = 15

(a2) (b2) (c2)

3 4 5 6 7 8

0.00.51.0

θ

marginal posterior density

3 4 5 6 7 8

−0.50.00.51.0

θ

marginal posterior density

3 4 5 6 7 8

−0.20.20.61.0

θ

marginal posterior density

Figure 3: Marginal posterior pdf of θ. Poisson model with prior Gamma(15,5).

(a1) and (a2) Solid: Exact; Dashed: Eq (37) to O(t−1); Dotted: Eq (37) to O(t−3/2).

(b1) and (b2) Solid: Exact; Dashed: Johnson’s O(t−1); Dotted: Johnson’s O(t−3/2).

(c1) and (c2) Solid: Eq (37) to O(t−1); Dashed: Johnson’s O(t−1).

4.1.2 Comparison with Tierney and Kadane (1986)

In this section we compare (37) with (24) by Tierney and Kadane. We consider a data taken from Mendenhall et al. (1989); see also Tanner (1993). The explanatory variable is the number of days of radiotherapy received by each of 24 patients, and the response variable is the absence (1) and presence (0) of disease at a site three years after treatment.

A problem of interest is to use the covariate (days) to predict outcome.

‧ 國

立 政 治 大 學

N a tio na

l C h engchi U ni ve rs it y

We fit the data using the logistic regression model log

( pi

1− pi

)

= θ1+ θ2xi,

where xi is the covariate for patient i and pi is the probability of success (no disease). So, pi= exp(θ1+ θ2xi)/(1 + exp(θ1+ θ2xi)). The intercept θ1 represents the log-odds of success for zero days, while the slope θ2 represents the change in the log-odds of success for every unit increase in the covariate. The loglikelihood and the second partial derivatives are

`t(θ) =

t i=1

[yilogpi+ (1− yi)log(1− pi)] =

t i=1

[yi1+ θ2xi)− log(1 + exp(θ1+ θ2xi))],

`(2)11 =

i

pi(1− pi), `(2)12 =

i

xipi(1− pi), `(2)22 =

i

x2ipi(1− pi). (40) Now we take flat priors on both θ1 and θ2. The comparison of the density in (25) and (37) with m = 2 and moments replaced by approximations are in Figure 4(a); the two methods and the true density are very close. Next we take the standard normal density priors on both θ1 and θ2. The results are in Figure 4(b); the result of (25) is close to the exact density, but (37) performs poorly. Here the posterior means of qi(Zt2), i = 1, ..., 4 are (1.485, 1.460,−0.045, −3.333) and (3.134, 7.207, −0.336, −3.874), respectively. The reason why analytic approximation (37) fails may be that the prior mean is farther from the MLE.

If we change priors to N (0, 2), a less informative prior than N (0, 1), then the result of (37) improves a bit; see Figure 4(c). Since the posterior standard error of θ2 is around 1/23.25 = 0.043, the prior mean 0 is about two standard errors away from ˆθt2. Figure 4 showed that the less informative the prior is (i.e. larger variance), the more accurate the approximate density is.

Next we consider a data first analyzed by Finney (1947); see also Albert and Chib (1993) for illustrating a sampling method for marginal posterior densities, and Myers (1990)[p.330-332]. A probit model was fit to the data. The model is

pi= Φ(θ1+ θ2c1i+ θ3c2i), i = 1, ..., 39,

where Φ is the cdf of the standard normal density, c1i is the volume of air inspired, c2i is the rate of air inspired, and the binary outcome is the occurrence or non-occurrence on a transient vasorestriction on the skin of the digits. Now we take flat priors on both θ1 and

‧ 國

立 政 治 大 學

N a tio na

l C h engchi U ni ve rs it y

θ2 and obtain the results are in Figure 5, where (37) (with m = 2) is very close to the exact density, but (25) shifts to left slightly.

(a) (b) (c)

−0.3 −0.2 −0.1 0.0 0.1

02468

θ2

marginal posterior density

−0.3 −0.2 −0.1 0.0 0.1

−150−50050100

θ2

marginal posterior density

−0.3 −0.2 −0.1 0.0 0.1

−30−100102030

θ2

marginal posterior density

Figure 4: Marginal posterior pdf of θ2. Logit2p model.

Solid: Equation (25); Dashed: Equation (37); Dotted: Exact distribution.

(a) flat-prior. (b) N(0,1) prior. (c) N(0,2) prior.

0 1 2 3

0.00.20.40.60.8

θ3

marginal posterior density

Figure 5: Marginal posterior pdf of θ3. Probit model with flat prior. Solid: Equation (25);

Dashed: Equation (37); Dotted: Exact distribution by numerical integration.

相關文件