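Applications of Stein's Method in Bayesian Analysis

National Science Council, Executive Yuan: Research Project Report

Research Results Report (Condensed Version)

Project type: Individual project
Project number: NSC 96-2118-M-004-002-
Project period: August 1, 2007 to August 31, 2008
Institution: Department of Statistics, National Chengchi University
Principal investigator: Ruby C. Weng (翁久幸)
Attachments: Report on attending an international conference, and the paper presented
Disposition: This project is open to public inquiry

May 13, 2009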


Applications of Stein’s method in Bayesian analysis

NSC96-2118-M-004-002

96.8.1-97.7.31

Ruby C. Weng

Department of Statistics, National Chengchi University

May 11, 2009

Abstract

This project describes applications of a version of Stein’s Identity in Bayesian asymptotics. We show that Stein’s Identity provides an alternative to the traditional Laplace method for obtaining approximations of marginal posterior densities.

Key words: Laplace method; posterior distributions; Stein’s identity.

1 Introduction

Let $g(\theta)$ be a smooth function on the parameter space $\Theta$. We are interested in estimating the posterior mean of $g(\theta)$ given a sample of observations $x_t$; that is,

$$E_\xi^t[g(\theta)] = E_\xi[g(\theta) \mid x_t] = \frac{\int_\Theta g(\theta)\exp(\ell_t(\theta))\,\xi(\theta)\,d\theta}{\int_\Theta \exp(\ell_t(\theta))\,\xi(\theta)\,d\theta}, \tag{1}$$

where $\ell_t$ is the log-likelihood function and $\xi$ the prior. Modern computing techniques such as Markov chain Monte Carlo and importance sampling have made many such computations possible. Still, these methods are computationally intensive, and the sampling schemes vary from distribution to distribution. It is therefore important to have good analytic approximations that are simpler to compute. A traditional analytic approach to (1) starts from a Taylor series expansion at the maximum likelihood estimator (or at the modes of the integrands), develops expansions of both the numerator and the denominator, and then obtains approximations by formal division of the two series. For example, Johnson [1, 2] derived expansions associated with the posterior distribution of some pivotal quantity; Lindley [3, 4] and Mosteller and Wallace [5] obtained second-order approximations for the integral by applying the standard Laplace method to both numerator and denominator and taking the ratio. Tierney and Kadane [6] renewed interest in the Laplace method by applying it in a special form in which $g$ is assumed to be positive.
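To make the ratio of integrals in (1) concrete, the following minimal sketch evaluates it by plain quadrature in a one-parameter case where the answer is known in closed form. The Bernoulli data (`s` successes in `t` trials), the uniform prior, and the helper names `simpson` and `posterior_mean` are assumptions made for illustration only; they are not part of the report.

```python
import math

def simpson(func, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def posterior_mean(g, loglik, prior, lo, hi):
    """Ratio of integrals in (1): int g*exp(l)*prior / int exp(l)*prior."""
    num = simpson(lambda th: g(th) * math.exp(loglik(th)) * prior(th), lo, hi)
    den = simpson(lambda th: math.exp(loglik(th)) * prior(th), lo, hi)
    return num / den

# Hypothetical data: s successes in t Bernoulli trials, uniform prior on (0, 1).
t, s = 20, 6
theta_hat = s / t  # maximum likelihood estimator

def loglik(theta):
    # Log-likelihood shifted by its maximum; the shift cancels in the ratio (1).
    return (s * math.log(theta) + (t - s) * math.log(1 - theta)
            - s * math.log(theta_hat) - (t - s) * math.log(1 - theta_hat))

eps = 1e-9  # stay strictly inside (0, 1) so the logarithms are finite
approx = posterior_mean(lambda th: th, loglik, lambda th: 1.0, eps, 1 - eps)
exact = (s + 1) / (t + 2)  # mean of the Beta(s + 1, t - s + 1) posterior
print(approx, exact)
```

Direct quadrature of this kind is feasible only in low dimensions, which is one reason analytic approximations to (1) remain of interest.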

In related work, Woodroofe [10, 11] developed a version of Stein’s Identity, which can be used to write posterior expectations in a particular form. Though this identity has a close Bayesian connection, the main focus of Woodroofe [10, 11] and some follow-up work is on developing frequentist confidence regions. The first study of this tool in a Bayesian context is Weng [7], which showed asymptotic posterior normality for a nonhomogeneous Poisson model. Recently, Weng [8] further applied this identity to estimating predictive densities and to approximating marginal posterior distributions and posterior quantiles for individual parameters. Some of the formulas obtained are new, and some are shown to be equivalent to existing ones.

2 Stein’s Identity and the Model

Stein’s Identity. Let $\Phi_p$ denote the standard $p$-variate normal distribution and write

$$\Phi_p h = \int h \, d\Phi_p$$

for functions $h$ for which the integral is finite. For $s > 0$, denote by $H_s$ the collection of all measurable functions $h : \mathbb{R}^p \to \mathbb{R}$ for which $|h(z)|/b \le 1 + \|z\|^s$ for some $b > 0$. Given $h \in H_s$, let $h_0 = \Phi_p h$, $h_p = h$,

$$h_k(y_1, \ldots, y_k) = \int_{\mathbb{R}^{p-k}} h(y_1, \ldots, y_k, w)\, \Phi_{p-k}(dw), \tag{2}$$

$$g_k(y_1, \ldots, y_p) = e^{\frac{1}{2} y_k^2} \int_{y_k}^{\infty} \left[ h_k(y_1, \ldots, y_{k-1}, w) - h_{k-1}(y_1, \ldots, y_{k-1}) \right] e^{-\frac{1}{2} w^2}\, dw, \tag{3}$$

for $-\infty < y_1, \ldots, y_p < \infty$ and $k = 1, \ldots, p$. Then let $Uh = (g_1, \ldots, g_p)^T$ and $Vh = (U^2h + U^2h^T)/2$, where $U^2h$ is the $p \times p$ matrix whose $k$-th column is $Ug_k$ and $g_k$ is as in (3). For example, for $z \in \mathbb{R}^p$, if $h(z) = z_1$, then $Uh(z) = (1, 0, \ldots, 0)^T$ and $Vh(z)$ is the zero matrix. Taking $f$ in Lemma 2.1 below as $z_i$ and $z_i z_j$ yields

$$\Phi_p(Uh) = \int_{\mathbb{R}^p} z\, h(z)\, \Phi_p(dz), \tag{4}$$

$$\Phi_p(U^2h) = \int_{\mathbb{R}^p} \tfrac{1}{2}\,(zz^T - I_p)\, h(z)\, \Phi_p(dz). \tag{5}$$

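Lemma 2.1 (Stein’s Identity) Let $r$ be a nonnegative integer. Suppose that $f$ is a differentiable function on $\mathbb{R}^p$, and

$$\int_{\mathbb{R}^p} |f| \, d\Phi_p + \int_{\mathbb{R}^p} (1 + \|z\|^r)\, \|\nabla f(z)\| \, \Phi_p(dz) < \infty;$$

then

$$\Phi_p(fh) = \Phi_p f \cdot \Phi_p h + \int_{\mathbb{R}^p} (Uh(z))^T \nabla f(z)\, \Phi_p(dz)$$

for all $h \in H_r$. If $\partial f/\partial z_j$, $j = 1, \ldots, p$, are differentiable, and

$$\int_{\mathbb{R}^p} (1 + \|z\|^r)\, \|\nabla^2 f(z)\| \, \Phi_p(dz) < \infty,$$

then

$$\Phi_p(fh) = \Phi_p f \cdot \Phi_p h + (\Phi_p Uh)^T \int_{\mathbb{R}^p} \nabla f(z)\, \Phi_p(dz) + \int_{\mathbb{R}^p} \mathrm{tr}\left[ (Vh(z))\, \nabla^2 f(z) \right] \Phi_p(dz)$$

for all $h \in H_r$.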
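As an informal numerical illustration of the first identity in Lemma 2.1, the sketch below treats the one-dimensional case $p = 1$, where $Uh(y) = e^{y^2/2}\int_y^\infty [h(w) - \Phi h]\, e^{-w^2/2}\, dw$. The choices $h(z) = z^2$ and $f(z) = e^{z/2}$, the truncation widths, and all helper names are assumptions made only for this example. For this $h$ a direct calculation gives $Uh(y) = y$, and both sides of the identity equal $1.25\, e^{1/8}$.

```python
import math

def simpson(func, a, b, n=400):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def normal_mean(func, width=8.0):
    """Phi func: integral of func against N(0, 1), truncated to [-width, width]."""
    return simpson(lambda z: func(z) * phi(z), -width, width)

def U(h, length=10.0):
    """One-dimensional version of (3), computed by quadrature."""
    h0 = normal_mean(h)
    def Uh(y):
        # exp(y^2/2) is folded into the integrand to avoid overflow.
        kernel = lambda w: (h(w) - h0) * math.exp(0.5 * (y * y - w * w))
        if y >= 0:
            return simpson(kernel, y, y + length)
        # The centered integrand integrates to zero over the real line, so the
        # upper tail equals minus the lower tail; this is stabler for y < 0.
        return -simpson(kernel, y - length, y)
    return Uh

h = lambda z: z * z
f = lambda z: math.exp(0.5 * z)
f_prime = lambda z: 0.5 * math.exp(0.5 * z)

Uh = U(h)
lhs = normal_mean(lambda z: f(z) * h(z))
rhs = normal_mean(f) * normal_mean(h) + normal_mean(lambda z: Uh(z) * f_prime(z))
print(lhs, rhs)
```

The two printed values should agree up to quadrature error; the check says nothing about the integrability conditions of the lemma, which hold trivially for these smooth choices.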

The model. Let $X_t$ be a random vector distributed according to a family of probability densities $p_t(x_t \mid \theta)$, where $t$ is a discrete or continuous parameter and $\theta \in \Theta$, an open subset of $\mathbb{R}^p$. Consider a Bayesian model in which $\theta$ has a prior density $\xi$ which is twice differentiable in $\mathbb{R}^p$ and vanishes off of $\Theta$. Assume that the log-likelihood function $\ell_t(\theta)$ is twice differentiable with respect to $\theta$. Let $B_t$ denote the set of sample points for which the maximum likelihood estimator $\hat\theta_t$ exists and satisfies $\nabla \ell_t(\hat\theta_t) = 0$, where $\nabla$ indicates differentiation with respect to $\theta$; therefore, $-\nabla^2 \ell_t(\hat\theta_t)$ is positive definite in $B_t$. The expressions for posterior expansions in (11) and (12) below are valid on $B_t$.


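Define $\Sigma_t$ and $Z_t$ as

$$\Sigma_t^T \Sigma_t = -\nabla^2 \ell_t(\hat\theta_t), \tag{6}$$

$$Z_t = \Sigma_t(\theta - \hat\theta_t). \tag{7}$$

Then the posterior density of $\theta$ given data $x_t$ is $\xi_t(\theta) \propto \exp(\ell_t(\theta))\,\xi(\theta)$, and the posterior density of $Z_t$ is

$$\zeta_t(z) \propto \xi_t(\theta(z)) \propto \exp[\ell_t(\theta) - \ell_t(\hat\theta_t)]\,\xi(\theta), \tag{8}$$

where the relation of $\theta$ and $z$ is given in (7). Now define

$$u_t(\theta) = \ell_t(\theta) - \ell_t(\hat\theta_t) + \tfrac{1}{2}\|z_t\|^2, \tag{9}$$

with $z_t = \Sigma_t(\theta - \hat\theta_t)$ as in (7). So, (8) can be rewritten as

$$\zeta_t(z) \propto \phi_p(z)\, f_t(z), \tag{10}$$

where $f_t(z) = \xi(\theta(z))\exp[u_t(\theta)]$ and $\phi_p(z)$ denotes the standard $p$-variate normal density.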
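The change of variables in (6)-(10) can be traced in a one-parameter example. The sketch below assumes Bernoulli data with a uniform prior, both hypothetical choices made for illustration, and checks that the posterior kernel in (8) and the product $\phi_p(z) f_t(z)$ in (10) differ only by the constant $\sqrt{2\pi}$. In one dimension $\Sigma_t$ is simply the square root of the observed information.

```python
import math

# Hypothetical data: s successes in t Bernoulli trials, uniform prior on (0, 1).
t, s = 50, 15
theta_hat = s / t  # maximum likelihood estimator

def loglik(theta):
    return s * math.log(theta) + (t - s) * math.log(1 - theta)

def prior(theta):
    return 1.0 if 0 < theta < 1 else 0.0

# (6): Sigma_t^2 = -l''(theta_hat), the observed information.
sigma_t = math.sqrt(s / theta_hat**2 + (t - s) / (1 - theta_hat)**2)

def theta_of(z):
    """Inverse of (7): theta = theta_hat + z / Sigma_t."""
    return theta_hat + z / sigma_t

def u_t(theta):
    """(9): log-likelihood ratio plus half the squared standardized distance."""
    z = sigma_t * (theta - theta_hat)
    return loglik(theta) - loglik(theta_hat) + 0.5 * z * z

def f_t(z):
    theta = theta_of(z)
    return prior(theta) * math.exp(u_t(theta))

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def posterior_kernel(z):
    """Right-hand side of (8) as a function of z."""
    theta = theta_of(z)
    return math.exp(loglik(theta) - loglik(theta_hat)) * prior(theta)

# The ratio should be the same constant, sqrt(2*pi), at every z.
ratios = [posterior_kernel(z) / (phi(z) * f_t(z)) for z in (-2.0, -0.5, 0.0, 1.0, 2.5)]
print(ratios)
```

Because $u_t$ removes the quadratic part of the log-likelihood, $f_t$ equals 1 at $z = 0$ and is flat there, which is what makes (10) a perturbation of the normal density.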

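Observe that the posterior distribution of $Z_t$ in (10) is of a form suitable for Stein’s Identity. Since $\xi$ is twice differentiable in $\mathbb{R}^p$ and vanishes off of $\Theta$, $f_t(z) = \xi(\theta(z))\exp[u_t(\theta)]$ also has these properties. So, by Lemma 2.1,

$$E_\xi^t\{h(Z_t)\} = \Phi_p h + E_\xi^t\left\{ [Uh(Z_t)]^T \frac{\nabla f_t(Z_t)}{f_t(Z_t)} \right\}, \tag{11}$$

$$E_\xi^t\{h(Z_t)\} = \Phi_p h + (\Phi_p Uh)^T E_\xi^t\left[ \frac{\nabla f_t(Z_t)}{f_t(Z_t)} \right] + E_\xi^t\left\{ \mathrm{tr}\left[ Vh(Z_t)\, \frac{\nabla^2 f_t(Z_t)}{f_t(Z_t)} \right] \right\}. \tag{12}$$

Throughout, $\nabla\xi$ and $\nabla^2\xi$ denote the gradient and Hessian of $\xi$ with respect to $\theta$, $\nabla f$ and $\nabla^2 f$ the gradient and Hessian of $f$ with respect to $Z$, and $E_\xi^t$ and $V_\xi^t$ the posterior expectation and variance given data $x_t$. Some calculations are useful for later reference:

$$\frac{\nabla f_t(Z_t)}{f_t(Z_t)} = (\Sigma_t^T)^{-1}\left[ \frac{\nabla\xi(\theta)}{\xi(\theta)} + \nabla u_t(\theta) \right], \tag{13}$$

$$\frac{\nabla^2 f_t(Z_t)}{f_t(Z_t)} = (\Sigma_t^T)^{-1}\left[ \frac{\nabla^2\xi}{\xi} + \frac{\nabla\xi}{\xi}\,\nabla u_t^T + \nabla u_t\,\frac{\nabla\xi^T}{\xi} + \nabla^2 u_t + \nabla u_t \nabla u_t^T \right] \Sigma_t^{-1}, \tag{14}$$

where by (9) we can derive

$$\nabla u_t(\theta) = \nabla \ell_t(\theta) - \nabla^2 \ell_t(\hat\theta_t)(\theta - \hat\theta_t), \tag{15}$$

$$\nabla^2 u_t(\theta) = \nabla^2 \ell_t(\theta) - \nabla^2 \ell_t(\hat\theta_t). \tag{16}$$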

3 Marginal Posterior Distributions

All asymptotic posterior expansions in this section are valid for sample points which lie in $B_t$ (the set on which the maximum likelihood estimator $\hat\theta_t$ exists and satisfies $\nabla \ell_t(\hat\theta_t) = 0$; see Section 2) and for which the following lemma holds.

Lemma 3.2 Let $M_t(r; r_1, \ldots, r_p)$ denote the $r$-th joint posterior moments of $Z_t$ with $r > 0$; that is, $M_t(r; r_1, \ldots, r_p) = E_\xi^t h(Z_t)$, where $h(z) = \prod_{i=1}^p z_i^{r_i}$ with $\sum_i r_i = r$. Then

(i) $E_\xi^t h(Z_t) = O(t^{-1/2})$ for odd $r$;

(ii) $E_\xi^t h(Z_t) = \Phi_p h + O(t^{-1})$ for even $r$.

The above lemma is well known and we state it here for later use. The proof is in, for instance, Johnson [2]. We can also establish it using Stein’s Identity.
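The rates in Lemma 3.2 can be observed in a conjugate example where the posterior moments are available exactly. The following sketch assumes Bernoulli data with a uniform prior and a fixed maximum likelihood estimate of 0.3 (hypothetical choices made for illustration), so that the posterior is a Beta distribution. It prints $\sqrt{t}\, E_\xi^t Z_t$ and $t\,(E_\xi^t Z_t^2 - 1)$, both of which should stay bounded as $t$ grows if (i) and (ii) hold with $r = 1$ and $r = 2$.

```python
import math

def z_moments(t, theta_hat=0.3):
    """Exact first two posterior moments of Z_t for Bernoulli data, uniform prior.

    With s = theta_hat * t successes the posterior is Beta(s + 1, t - s + 1).
    """
    s = theta_hat * t
    a, b = s + 1, t - s + 1
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    # Square root of the observed information t / (theta_hat * (1 - theta_hat)).
    sigma_t = math.sqrt(t / (theta_hat * (1 - theta_hat)))
    m1 = sigma_t * (mean - theta_hat)                     # E[Z_t]
    m2 = sigma_t ** 2 * (var + (mean - theta_hat) ** 2)   # E[Z_t^2]
    return m1, m2

for t in (100, 1000, 10000):
    m1, m2 = z_moments(t)
    # Rescaled moments: both columns should remain bounded as t increases.
    print(t, math.sqrt(t) * m1, t * (m2 - 1.0))
```

This is only a consistency check in one model; it does not replace the general argument given below.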

Recall that $X_t$ is a random vector from $p_t(x_t \mid \theta)$, where $\theta$ is chosen according to the prior density $\xi$. Let $\theta_0$ denote the true underlying parameter. All asymptotic posterior expansions below are valid for sample points which lie in $B_t$ (see Section 2) and satisfy the following conditions:

(C0) $\lim_{t\to\infty} -t^{-1}\nabla^2\hat\ell_t$ is positive definite,

(C1) $t^{-1}\hat\ell_t^{(k)} = O(1)$ for $k > 0$,

(C2) $t^2 E_\xi^t\left[ a(\theta) - a(\hat\theta_t) - \sum_{s=1}^{3} (s!)^{-1} a^{(s)}(\hat\theta_t; \theta - \hat\theta_t) \right]^2 = O(1)$,

(C3) $E_\xi^t \|Z_t\|^n = O(1)$ for $n > 0$,

where a hat denotes evaluation at $\hat\theta_t$, $a(\theta)$ is $\ell_i^{(1)}$ or $\ell_{ij}^{(2)}$, $a^{(s)}(\hat\theta_t; \theta - \hat\theta_t) = \sum_{i_1 \cdots i_s} a^{(s)}_{i_1 \cdots i_s}(\hat\theta_t)\, \delta_{i_1} \cdots \delta_{i_s}$ with $\delta = \theta - \hat\theta_t$, and $O(1)$ means convergence of a sequence of real numbers. So, the integrand in (C2) is the square of the remainder terms in a Taylor expansion. Condition (C1) is easy to check. Conditions (C1) and (C2) can be guaranteed by assuming some tail properties of $\ell_t$ and the local behavior that $\ell^{(k)}(\theta)$ is bounded in a small neighborhood of $\theta_0$.

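In the following we prove Lemma 3.2 using Stein’s Identity. It should always be remembered that the derivatives of $f_t$ are in (13) and (14), and $\nabla u_t$ and $\nabla^2 u_t$ are in (15) and (16). First note that if $h$ is a polynomial of order $r$, then $Uh$ and $Vh$ are of orders $r - 1$ and $r - 2$ (see Weng and Woodroofe [9, Lemma 8]); and that by (4), $\Phi_p Uh = 0$ for even $r$. Then, by Taylor expansions,

$$[\nabla u_t(\theta)]_i = \tfrac{1}{2}\,\delta_t^T D_i\, \delta_t + (\mathrm{Rem}_1) = \tfrac{1}{2}\, Z_t^T V_i Z_t + (\mathrm{Rem}_1), \tag{17}$$

$$[\nabla^2 u_t(\theta)]_{ij} = [D_i]_{j\cdot}\, \Sigma_t^{-1} Z_t + \tfrac{1}{2} \sum_{k,s} \hat\ell^{(4)}_{ijks} \left[ Z_t^T (\Sigma_t^T)^{-1} e_k e_s^T \Sigma_t^{-1} Z_t \right] + (\mathrm{Rem}_2), \tag{18}$$

where $\delta_t = \theta - \hat\theta_t$ and, as the expansion of (15) about $\hat\theta_t$ implies, $D_i$ is the $p \times p$ matrix with $(j,k)$ entry $\hat\ell^{(3)}_{ijk}$ and $V_i = (\Sigma_t^T)^{-1} D_i \Sigma_t^{-1}$. Here

$$(\mathrm{Rem}_1) = \frac{1}{6} \sum_{jks} \hat\ell^{(4)}_{ijks}\, \delta_{tj}\delta_{tk}\delta_{ts} + \frac{1}{24} \sum_{jksq} \ell^{(5)}_{ijksq}(\tilde\theta_t)\, \delta_{tj}\delta_{tk}\delta_{ts}\delta_{tq},$$

$\tilde\theta_t$ lies between $\theta$ and $\hat\theta_t$, and $(\mathrm{Rem}_2)$ has a similar form. So, $E_\xi^t\{[Uh(Z_t)]_i\, \mathrm{Rem}_1\}$ is bounded by (C1)-(C3) and the Cauchy-Schwarz inequality.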

Next, let $q_k$ denote the Hermite polynomials, given by $q_k(z)\phi(z) = (-d/dz)^k \phi(z)$. For instance, for $k = 1, \ldots, 4$ the Hermite polynomials are $q_1(z) = z$, $q_2(z) = z^2 - 1$, $q_3(z) = z^3 - 3z$, and $q_4(z) = z^4 - 6z^2 + 3$.
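The defining relation $q_k(z)\phi(z) = (-d/dz)^k\phi(z)$ implies the recursion $q_{k+1}(z) = z\,q_k(z) - q_k'(z)$ with $q_0 = 1$, since $\phi'(z) = -z\phi(z)$. The short sketch below uses this recursion to generate coefficient lists; the function name `hermite` and the lowest-degree-first coefficient layout are conventions chosen here for illustration.

```python
def hermite(k):
    """Coefficients (lowest degree first) of q_k, via q_{k+1}(z) = z*q_k(z) - q_k'(z)."""
    q = [1]  # q_0(z) = 1
    for _ in range(k):
        shifted = [0] + q                              # coefficients of z * q(z)
        deriv = [i * c for i, c in enumerate(q)][1:]   # coefficients of q'(z)
        deriv += [0] * (len(shifted) - len(deriv))     # pad to equal length
        q = [a - b for a, b in zip(shifted, deriv)]
    return q

for k in range(1, 7):
    print(k, hermite(k))
```

For example, `hermite(4)` returns `[3, 0, -6, 0, 1]`, which encodes $z^4 - 6z^2 + 3$ as listed above.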

Theorem 3.1 Take $h^*(z_{tp})$ in (??) as the indicator function $1(z_{tp} \le w)$, where $w \in \mathbb{R}$. Then, the marginal posterior distribution for the individual parameter $\theta_p$ is

$$P_\xi^t(\theta_p \le a) = P_\xi^t(Z_{tp} \le w) = \Phi(w) - \sum_{i=1,\, i \ne 5}^{6} \frac{1}{i!}\, q_{i-1}(w)\,\phi(w)\, E_\xi^t(q_i(Z_{tp})) + O(t^{-3/2}), \tag{19}$$

where $w = [\Sigma_t]_{pp}(a - \hat\theta_{tp})$.

By taking the derivative of (19) with respect to $a$, we obtain the marginal posterior density

$$\xi_p^t(a) = [\Sigma_t]_{pp} \left\{ \phi(w) + \sum_{i=1,\, i \ne 5}^{6} \frac{1}{i!}\, q_i(w)\,\phi(w)\, E_\xi^t(q_i(Z_{tp})) + O(t^{-3/2}) \right\}. \tag{20}$$

Observe that no renormalization is needed for this approximation, as $\int_{\mathbb{R}} q_i(w)\phi(w)\,dw = 0$.
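The leading terms of (19) and (20) are straightforward to evaluate once the posterior expectations $E_\xi^t(q_i(Z_{tp}))$ are available. The sketch below drops the $O(t^{-3/2})$ remainder and works on the $w$ scale. The numerical values in `moments` are placeholders invented for illustration, not results from the report, and the function names are likewise assumptions. It checks two properties stated above: the density term is the derivative of the distribution term, and it integrates to one without renormalization.

```python
import math

def simpson(func, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def phi(w):
    return math.exp(-0.5 * w * w) / math.sqrt(2 * math.pi)

def Phi(w):
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2)))

def hermite_value(k, z):
    """Evaluate q_k(z) using q_0 = 1, q_1 = z, q_{k+1} = z*q_k - k*q_{k-1}."""
    prev, cur = 1.0, z
    if k == 0:
        return prev
    for j in range(1, k):
        prev, cur = cur, z * cur - j * prev
    return cur

ORDERS = (1, 2, 3, 4, 6)  # i = 1, ..., 6 with i = 5 omitted, as in (19)

def marginal_cdf(w, hermite_moments):
    """Right-hand side of (19) without the O(t^{-3/2}) term."""
    corr = sum(hermite_value(i - 1, w) * hermite_moments[i] / math.factorial(i)
               for i in ORDERS)
    return Phi(w) - phi(w) * corr

def marginal_density_w(w, hermite_moments):
    """Bracketed term of (20) on the w scale, without the factor [Sigma_t]_pp."""
    corr = sum(hermite_value(i, w) * hermite_moments[i] / math.factorial(i)
               for i in ORDERS)
    return phi(w) * (1.0 + corr)

# Placeholder values standing in for E[q_i(Z_tp)]; purely illustrative.
moments = {1: 0.09, 2: -0.015, 3: 0.05, 4: 0.01, 6: 0.002}

total_mass = simpson(lambda w: marginal_density_w(w, moments), -10.0, 10.0)
step = 1e-5
slope = (marginal_cdf(0.7 + step, moments) - marginal_cdf(0.7 - step, moments)) / (2 * step)
print(total_mass, slope, marginal_density_w(0.7, moments))
```

To obtain the density of $\theta_p$ at a point $a$, the $w$-scale value would be multiplied by $[\Sigma_t]_{pp}$ with $w = [\Sigma_t]_{pp}(a - \hat\theta_{tp})$, as in (20).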

References

[1] R. Johnson. An asymptotic expansion for posterior distributions. Ann. Math. Statist., 38:1899–1906, 1967.


[2] R. Johnson. Asymptotic expansions associated with posterior distributions. Ann. Math. Statist., 41:851–864, 1970.

[3] D. V. Lindley. The use of prior probability distributions in statistical inference and decisions. Proc. 4th. Berkeley Symp., 1:453–468, 1961.

[4] D. V. Lindley. Approximate Bayesian methods. In J. M. Bernardo, M. H. DeGroot, D. V. Lindley, and A. F. M. Smith, editors, Bayesian Statistics. University Press, 1980.

[5] F. Mosteller and D. L. Wallace. Inference and Disputed Authorship: The Federalist Papers. Addison-Wesley, Reading, Mass., 1964.

[6] L. Tierney and J. B. Kadane. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association, 81:82–86, 1986.

[7] R. C. Weng. On Stein’s identity for posterior normality. Statistica Sinica, 13:495–506, 2003.

[8] R. C. Weng. Stein’s identity for Bayesian inference. Manuscript, 2006.

[9] R. C. Weng and M. Woodroofe. Integrable expansions for posterior distributions for multiparameter exponential families with applications to sequential confidence levels. Statistica Sinica, 10:693–713, 2000.

[10] M. Woodroofe. Very weak expansions for sequentially designed experiments: linear models. The Annals of Statistics, 17:1087–1102, 1989.

[11] M. Woodroofe. Integrable expansions for posterior distributions for one-parameter exponential families. Statistica Sinica, 2:91–111, 1992.


Report on attending the 7th World Congress in Probability and Statistics, Singapore

The 7th World Congress in Probability and Statistics was jointly sponsored by the Bernoulli Society and the Institute of Mathematical Statistics, two of the major international statistical societies. This year the conference was held in Singapore from July 14 to 19, 2008. This meeting is a major international event in probability and statistics held every four years. It covers a wide range of topics and features the latest scientific developments in the fields of probability and statistics and their applications.

I arrived on July 13 and stayed for 6 days. I presented my recent work on Stein’s Identity and its applications in Bayesian analysis. My talk was scheduled in a session with other Bayesian studies, which gave me a good chance to see other Bayesian-related work. I attended several other presentations and was impressed by some interesting talks, such as “A picture is worth a thousand numbers: communicating uncertainties following statistical analysis” by David Spiegelhalter, “Probability and statistics in internet information retrieval” by Zhi-Ming Ma, and Luke Tierney’s talk on statistical computing.

I also browsed books displayed in the book stand and purchased one book relevant to my research.
