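National Science Council, Executive Yuan: Research Project Report
Applications of Stein's Method in Bayesian Analysis
Research Results Report (Condensed Version)
Project type: Individual. Project number: NSC 96-2118-M-004-002-. Project period: August 1, 2007 to August 31, 2008 (ROC years 96 to 97). Institution: Department of Statistics, National Chengchi University. Principal investigator: 翁久幸 (Ruby C. Weng). Attachments: report on attending an international conference, and the paper presented. Availability: this project report is open to public access. Date: May 13, 2009 (ROC year 98).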
Applications of Stein’s method in Bayesian analysis
NSC96-2118-M-004-002
96.8.1-97.7.31
Ruby C. Weng
Department of Statistics, National Chengchi University
May 11, 2009
Abstract
This project describes applications of a version of Stein’s Identity in Bayesian asymptotics. We show that Stein’s Identity provides an alternative to the traditional Laplace method for obtaining approximations of marginal posterior densities.
Key words: Laplace method; posterior distributions; Stein’s identity.
1 Introduction
Let $g(\theta)$ be a smooth function on the parameter space $\Theta$. We are interested in estimating the posterior mean of $g(\theta)$ given a sample of observations $x_t$; that is,
\[ E_\xi^t[g(\theta)] = E_\xi[g(\theta) \mid x_t] = \frac{\int_\Theta g(\theta)\exp(\ell_t(\theta))\,\xi(\theta)\,d\theta}{\int_\Theta \exp(\ell_t(\theta))\,\xi(\theta)\,d\theta}, \tag{1} \]
where $\ell_t$ is the log-likelihood function and $\xi$ the prior. Modern computing techniques such as Markov chain Monte Carlo and importance sampling have made many such computations possible. Still, these methods are computationally intensive, and the sampling schemes vary from distribution to distribution. It is therefore important to have good analytic approximations that are simpler to compute. A traditional analytic approach to (1) starts from a Taylor series expansion at the maximum likelihood estimator (or at the modes of the integrands), develops expansions of both the numerator and the denominator, and then obtains approximations by formal division of the two series. For example, Johnson [1, 2] derived expansions associated with the posterior distribution of some pivotal quantity; Lindley [3, 4] and Mosteller and Wallace [5] obtained second-order approximations for the integral by applying the standard Laplace method to both numerator and denominator and taking the ratio. Tierney and Kadane [6] renewed interest in the Laplace method by applying it in a special form in which $g$ is assumed to be positive.
In related work, Woodroofe [10, 11] developed a version of Stein’s Identity that can be used to write posterior expectations in a particular form. Though this identity has a close Bayesian connection, the main focus of Woodroofe [10, 11] and some follow-up work is on developing frequentist confidence regions. The first study of this tool in a Bayesian context is Weng [7], which showed asymptotic posterior normality for a nonhomogeneous Poisson model. Recently, Weng [8] further applied this identity to estimating predictive densities and to approximating marginal posterior distributions and posterior quantiles for individual parameters. Some of the formulas obtained are new, and some are shown to be equivalent to existing ones.
2 Stein’s Identity and the Model
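Stein’s Identity. Let $\Phi_p$ denote the standard $p$-variate normal distribution and write
\[ \Phi_p h = \int h \, d\Phi_p \]
for functions $h$ for which the integral is finite. For $s > 0$, denote by $H_s$ the collection of all measurable functions $h : \mathbb{R}^p \to \mathbb{R}$ for which $|h(z)|/b \le 1 + \|z\|^s$ for some $b > 0$. Given $h \in H_s$, let $h_0 = \Phi_p h$, $h_p = h$,
\[ h_k(y_1, \dots, y_k) = \int_{\mathbb{R}^{p-k}} h(y_1, \dots, y_k, w)\, \Phi_{p-k}(dw), \tag{2} \]
\[ g_k(y_1, \dots, y_p) = e^{\frac{1}{2} y_k^2} \int_{y_k}^{\infty} \left[ h_k(y_1, \dots, y_{k-1}, w) - h_{k-1}(y_1, \dots, y_{k-1}) \right] e^{-\frac{1}{2} w^2} \, dw, \tag{3} \]
for $-\infty < y_1, \dots, y_p < \infty$ and $k = 1, \dots, p$. Then let $Uh = (g_1, \dots, g_p)^T$ and $Vh = (U^2h + U^2h^T)/2$, where $U^2h$ is the $p \times p$ matrix whose $k$-th column is $Ug_k$ and $g_k$ is as in (3). For example, for $z \in \mathbb{R}^p$, if $h(z) = z_1$, then $Uh(z) = (1, 0, \dots, 0)^T$ and [...]. Taking $f$ in Lemma 2.1 below as $z_i$ and $z_i z_j$ yields
\[ \Phi_p(Uh) = \int_{\mathbb{R}^p} z\, h(z)\, \Phi_p(dz), \tag{4} \]
\[ \Phi_p(U^2h) = \int_{\mathbb{R}^p} \frac{1}{2}\left(zz^T - I_p\right) h(z)\, \Phi_p(dz), \tag{5} \]
where $I_p$ is the $p \times p$ identity matrix.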
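Lemma 2.1 (Stein’s Identity) Let $r$ be a nonnegative integer. Suppose that $f$ is a differentiable function on $\mathbb{R}^p$ and
\[ \int_{\mathbb{R}^p} |f| \, d\Phi_p + \int_{\mathbb{R}^p} (1 + \|z\|^r)\, \|\nabla f(z)\| \, \Phi_p(dz) < \infty. \]
Then
\[ \Phi_p(fh) = \Phi_p f \cdot \Phi_p h + \int_{\mathbb{R}^p} (Uh(z))^T \nabla f(z)\, \Phi_p(dz) \]
for all $h \in H_r$. If $\partial f / \partial z_j$, $j = 1, \dots, p$, are differentiable and
\[ \int_{\mathbb{R}^p} (1 + \|z\|^r)\, \|\nabla^2 f(z)\| \, \Phi_p(dz) < \infty, \]
then
\[ \Phi_p(fh) = \Phi_p f \cdot \Phi_p h + (\Phi_p Uh)^T \int_{\mathbb{R}^p} \nabla f(z)\, \Phi_p(dz) + \int_{\mathbb{R}^p} \mathrm{tr}\left[ (Vh(z))\, \nabla^2 f(z) \right] \Phi_p(dz) \]
for all $h \in H_r$.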
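The first identity in Lemma 2.1 can be checked numerically in the one-dimensional case. The sketch below is only an illustration under stated assumptions: $p = 1$, the test functions $h(z) = z^2$ and $f(z) = e^{z/2}$ are chosen here for convenience, the integrals are truncated to $[-8, 8]$, and the helper names (`simpson`, `normal_mean`, `make_U`) are hypothetical. For $p = 1$, (3) reduces to $Uh(y) = e^{y^2/2} \int_y^\infty [h(w) - \Phi_1 h]\, e^{-w^2/2}\, dw$.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def simpson(func, lo, hi, n=400):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    step = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(lo + i * step)
    return total * step / 3.0

BOUND = 8.0  # assumption: the normal mass outside [-8, 8] is negligible here

def normal_mean(func):
    """Phi_1 func: the integral of func against the standard normal law."""
    return simpson(lambda z: func(z) * phi(z), -BOUND, BOUND)

def make_U(h):
    """Return y -> U h(y) for p = 1, following (3) with h_1 = h, h_0 = Phi_1 h."""
    h0 = normal_mean(h)

    def centred(w):
        return (h(w) - h0) * math.exp(-0.5 * w * w)

    def Uh(y):
        if y >= 0.0:
            tail = simpson(centred, y, BOUND)
        else:
            # The centred integrand integrates to zero over the whole line,
            # so the upper tail equals minus the lower tail (more stable).
            tail = -simpson(centred, -BOUND, y)
        return math.exp(0.5 * y * y) * tail

    return Uh

def h(z):
    return z * z                      # h lies in H_2

def f(z):
    return math.exp(0.5 * z)          # smooth test function

def df(z):
    return 0.5 * math.exp(0.5 * z)    # derivative of f

Uh = make_U(h)
lhs = normal_mean(lambda z: f(z) * h(z))
rhs = normal_mean(f) * normal_mean(h) + normal_mean(lambda z: Uh(z) * df(z))
print(lhs, rhs)
```

For this choice of $h$ the integral in (3) has the closed form $Uh(y) = y$, so `Uh` can also be compared against that directly.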
The model. Let $X_t$ be a random vector distributed according to a family of probability densities $p_t(x_t|\theta)$, where $t$ is a discrete or continuous parameter and $\theta \in \Theta$, an open subset of $\mathbb{R}^p$. Consider a Bayesian model in which $\theta$ has a prior density $\xi$ which is twice differentiable in $\mathbb{R}^p$ and vanishes off of $\Theta$. Assume that the log-likelihood function $\ell_t(\theta)$ is twice differentiable with respect to $\theta$. Let $B_t$ denote the set of sample points for which the maximum likelihood estimator $\hat\theta_t$ exists and satisfies $\nabla \ell_t(\hat\theta_t) = 0$, where $\nabla$ indicates differentiation with respect to $\theta$; therefore, $-\nabla^2 \ell_t(\hat\theta_t)$ is positive definite in $B_t$. The expressions for posterior expansions in (11) and (12) below are valid on $B_t$.
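Define $\Sigma_t$ and $Z_t$ as
\[ \Sigma_t^T \Sigma_t = -\nabla^2 \ell_t(\hat\theta_t), \tag{6} \]
\[ Z_t = \Sigma_t (\theta - \hat\theta_t). \tag{7} \]
Then the posterior density of $\theta$ given data $x_t$ is $\xi_t(\theta) \propto \exp(\ell_t(\theta))\, \xi(\theta)$, and the posterior density of $Z_t$ is
\[ \zeta_t(z) \propto \xi_t(\theta(z)) \propto \exp[\ell_t(\theta) - \ell_t(\hat\theta_t)]\, \xi(\theta), \tag{8} \]
where the relation between $\theta$ and $z$ is given in (7). Now define
\[ u_t(\theta) = \ell_t(\theta) - \ell_t(\hat\theta_t) + \frac{1}{2} \|z_t\|^2. \tag{9} \]
So, (8) can be rewritten as
\[ \zeta_t(z) \propto \phi_p(z)\, f_t(z), \tag{10} \]
where $f_t(z) = \xi(\theta(z)) \exp[u_t(\theta)]$ and $\phi_p(z)$ denotes the standard $p$-variate normal density.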
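Observe that the posterior distribution of $Z_t$ in (10) is of a form suitable for Stein’s Identity. Since $\xi$ is twice differentiable in $\mathbb{R}^p$ and vanishes off of $\Theta$, $f_t(z) = \xi(\theta(z)) \exp[u_t(\theta)]$ also has these properties. So, by Lemma 2.1,
\[ E_\xi^t\{h(Z_t)\} = \Phi_p h + E_\xi^t\left\{ [Uh(Z_t)]^T \frac{\nabla f_t(Z_t)}{f_t(Z_t)} \right\}, \tag{11} \]
\[ E_\xi^t\{h(Z_t)\} = \Phi_p h + (\Phi_p Uh)^T E_\xi^t\left[ \frac{\nabla f_t(Z_t)}{f_t(Z_t)} \right] + E_\xi^t\left\{ \mathrm{tr}\left[ Vh(Z_t) \frac{\nabla^2 f_t(Z_t)}{f_t(Z_t)} \right] \right\}. \tag{12} \]
Throughout, $\nabla\xi$ and $\nabla^2\xi$ denote the gradient and Hessian of $\xi$ with respect to $\theta$, $\nabla f$ and $\nabla^2 f$ the gradient and Hessian of $f$ with respect to $Z$, and $E_\xi^t$ and $V_\xi^t$ the posterior expectation and variance given data $x_t$. Some calculations are useful for later reference:
\[ \frac{\nabla f_t(Z_t)}{f_t(Z_t)} = (\Sigma_t^T)^{-1} \left[ \frac{\nabla \xi(\theta)}{\xi(\theta)} + \nabla u_t(\theta) \right], \tag{13} \]
\[ \frac{\nabla^2 f_t(Z_t)}{f_t(Z_t)} = (\Sigma_t^T)^{-1} \left[ \frac{\nabla^2 \xi}{\xi} + \frac{\nabla \xi}{\xi} \nabla u_t^T + \nabla u_t \frac{\nabla \xi^T}{\xi} + \nabla^2 u_t + \nabla u_t \nabla u_t^T \right] \Sigma_t^{-1}, \tag{14} \]
where by (9) we can derive
\[ \nabla u_t(\theta) = \nabla \ell_t(\theta) - \nabla^2 \ell_t(\hat\theta_t)(\theta - \hat\theta_t), \tag{15} \]
\[ \nabla^2 u_t(\theta) = \nabla^2 \ell_t(\theta) - \nabla^2 \ell_t(\hat\theta_t). \tag{16} \]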
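Identity (11), together with (13) and (15), can be verified numerically on a toy one-parameter model. The following sketch rests on assumptions that are not part of the report: $n$ i.i.d. exponential observations with rate $\theta$ and sum $S$, so that $\ell_t(\theta) = n \log\theta - S\theta$, the prior $\xi(\theta) = \theta e^{-\theta}$ on $(0, \infty)$, and the hypothetical values $n = 20$, $S = 16$. It checks (11) for $h(z) = z$ (where $Uh = 1$, $\Phi_1 h = 0$) and for $h(z) = z^2$ (where $Uh(z) = z$, $\Phi_1 h = 1$).

```python
import math

n, S = 20, 16.0                      # hypothetical sample size and data sum
theta_hat = n / S                    # MLE of the exponential rate
Sigma = math.sqrt(n) / theta_hat     # (6): Sigma^2 = -l''(theta_hat) = n / theta_hat^2

def kernel(theta):
    """exp[l(theta) - l(theta_hat)] * xi(theta): unnormalised posterior, as in (8)."""
    log_lik_diff = n * math.log(theta / theta_hat) - S * (theta - theta_hat)
    return math.exp(log_lik_diff) * theta * math.exp(-theta)

def grad_f_over_f(theta):
    """(13) for p = 1: Sigma^{-1} [xi'/xi + u'], with u' taken from (15)."""
    d_log_prior = 1.0 / theta - 1.0
    d_u = (n / theta - S) + (n / theta_hat ** 2) * (theta - theta_hat)
    return (d_log_prior + d_u) / Sigma

def simpson(func, lo, hi, m=2000):
    """Composite Simpson rule on [lo, hi] with m (even) subintervals."""
    step = (hi - lo) / m
    total = func(lo) + func(hi)
    for i in range(1, m):
        total += (4.0 if i % 2 else 2.0) * func(lo + i * step)
    return total * step / 3.0

# Assumption: the posterior mass outside [LO, HI] is negligible.
LO, HI = 1e-9, theta_hat + 12.0 / Sigma
NORM = simpson(kernel, LO, HI)

def post_mean(g):
    """Posterior expectation of g(theta) by numerical integration."""
    return simpson(lambda th: g(th) * kernel(th), LO, HI) / NORM

def z_of(theta):
    return Sigma * (theta - theta_hat)   # (7)

# (11) with h(z) = z
lhs1 = post_mean(z_of)
rhs1 = 0.0 + post_mean(grad_f_over_f)

# (11) with h(z) = z^2
lhs2 = post_mean(lambda th: z_of(th) ** 2)
rhs2 = 1.0 + post_mean(lambda th: z_of(th) * grad_f_over_f(th))

print(lhs1, rhs1)
print(lhs2, rhs2)
```

Working on the $\theta$ scale avoids an explicit change of variables: posterior expectations of functions of $Z_t$ are computed as expectations of functions of $\theta$ through (7).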
3 Marginal Posterior Distributions
All asymptotic posterior expansions in this section are valid for sample points that lie in $B_t$ (the set on which the maximum likelihood estimator $\hat\theta_t$ exists and satisfies $\nabla \ell_t(\hat\theta_t) = 0$; see Section 2) and that satisfy the following lemma.
Lemma 3.2 Let $M_t(r; r_1, \dots, r_p)$ denote the $r$th joint posterior moments of $Z_t$ with $r > 0$; that is, $M_t(r; r_1, \dots, r_p) = E_\xi^t h(Z_t)$, where $h(z) = \prod_{i=1}^{p} z_i^{r_i}$ with $\sum_i r_i = r$. Then
(i) $E_\xi^t h(Z_t) = O(t^{-1/2})$ for odd $r$;
(ii) $E_\xi^t h(Z_t) = \Phi_p h + O(t^{-1})$ for even $r$.
The above lemma is well known and we state it here for later use. The proof is in, for instance, Johnson [2]. We can also establish it using Stein’s Identity.
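The rates in Lemma 3.2 can be seen in closed form for a simple conjugate example. The sketch below is an illustration under assumptions made here, not a model from the report: $n$ i.i.d. exponential observations with rate $\theta$ (so $t = n$), a prior proportional to $\theta^{a-1} e^{-b\theta}$, and hence a Gamma$(n + a,\, S + b)$ posterior, with the data sum $S$ fixed so that $\hat\theta_t = n/S$ stays constant. The function name `z_moments` is hypothetical.

```python
import math

def z_moments(n, theta_hat=1.0, a=2.0, b=1.0):
    """First two posterior moments of Z_t = Sigma_t (theta - theta_hat).

    Assumed model: n i.i.d. exponential(rate theta) observations with sum
    S = n / theta_hat and prior proportional to theta^(a-1) exp(-b theta),
    so the posterior is Gamma(shape n + a, rate S + b).
    """
    S = n / theta_hat
    shape, rate = n + a, S + b
    mean = shape / rate                   # posterior mean of theta
    var = shape / rate ** 2               # posterior variance of theta
    Sigma = math.sqrt(n) / theta_hat      # Sigma^2 = -l''(theta_hat)
    m1 = Sigma * (mean - theta_hat)                       # E[Z_t], r = 1
    m2 = Sigma ** 2 * (var + (mean - theta_hat) ** 2)     # E[Z_t^2], r = 2
    return m1, m2

# Rescale by the rates claimed in Lemma 3.2; both columns should stay bounded.
for n in (25, 100, 400, 1600):
    m1, m2 = z_moments(n)
    print(n, math.sqrt(n) * m1, n * (m2 - 1.0))
```

Here $\Phi_1 h = 1$ for $h(z) = z^2$, so the second column is $t\,(E_\xi^t Z_t^2 - \Phi_1 h)$, matching part (ii), while the first column is $t^{1/2} E_\xi^t Z_t$, matching part (i).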
Recall that $X_t$ is a random vector from $p_t(x_t|\theta)$, where $\theta$ is chosen according to the prior density $\xi$. Let $\theta_0$ denote the true underlying parameter. All asymptotic posterior expansions below are valid for sample points that lie in $B_t$ (see Section 2) and satisfy the following conditions:
(C0) $\lim_{t \to \infty} t^{-1} \nabla^2 \hat\ell_t$ is positive definite,
(C1) $t^{-1} \hat\ell_t^{(k)} = O(1)$ for $k > 0$,
(C2) $t^2 E_\xi^t \left[ a(\theta) - a(\hat\theta_t) - \sum_{s=1}^{3} (s!)^{-1} a^{(s)}(\hat\theta_t; \theta - \hat\theta_t) \right]^2 = O(1)$,
(C3) $E_\xi^t \|Z_t\|^n = O(1)$ for $n > 0$,
where $a(\theta)$ is $\ell_i^{(1)}$ or $\ell_{ij}^{(2)}$, $a^{(s)}(\hat\theta_t; \theta - \hat\theta_t) = \sum_{i_1 \cdots i_s} a^{(s)}_{i_1 \cdots i_s}(\hat\theta_t)\, \delta_{i_1} \cdots \delta_{i_s}$, and $O(1)$ means convergence of a sequence of real numbers. So, the integrand in (C2) is the square of the remainder terms in a Taylor expansion. Condition (C1) is easy to check. Conditions (C1) and (C2) can be guaranteed by assuming some tail properties of $\ell_t$ and the local behavior that $\ell^{(k)}(\theta)$ is bounded in a small neighborhood of $\theta_0$.
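In the following we prove Lemma 3.2 using Stein’s Identity. Recall that the derivatives of $f_t$ are given in (13) and (14), and $\nabla u_t$ and $\nabla^2 u_t$ in (15) and (16). First note that if $h$ is a polynomial of order $r$, then $Uh$ and $Vh$ are of orders $r - 1$ and $r - 2$ (see Weng and Woodroofe [9, Lemma 8]), and that by (4), $\Phi_p Uh = 0$ for even $r$. Then, by Taylor expansions,
\[ [\nabla u_t(\theta)]_i = \frac{1}{2} \delta_t^T D_i \delta_t + (\mathrm{Rem}_1) = \frac{1}{2} Z_t^T V_i Z_t + (\mathrm{Rem}_1), \tag{17} \]
\[ [\nabla^2 u_t(\theta)]_{ij} = [D_i]_{j\cdot}\, \Sigma_t^{-1} Z_t + \frac{1}{2} \sum_{k,s} \hat\ell^{(4)}_{ijks} \left[ Z_t^T (\Sigma_t^T)^{-1} e_k e_s^T \Sigma_t^{-1} Z_t \right] + (\mathrm{Rem}_2), \tag{18} \]
where $(\mathrm{Rem}_1) = (1/6) \sum_{jks} \hat\ell^{(4)}_{ijks}\, \delta_{tj} \delta_{tk} \delta_{ts} + (1/24) \sum_{jksq} \ell^{(5)}_{ijksq}(\tilde\theta_t)\, \delta_{tj} \delta_{tk} \delta_{ts} \delta_{tq}$, $\tilde\theta_t$ lies between $\theta$ and $\hat\theta_t$, and $(\mathrm{Rem}_2)$ has a similar form. So, $E_\xi^t\{[Uh(Z_t)]_i\, \mathrm{Rem}_1\}$ is bounded by (C1)-(C3) and the Cauchy-Schwarz inequality.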
Next, let $q_k$ denote the Hermite polynomials, given by $q_k(z)\phi(z) = (-d/dz)^k \phi(z)$. For instance, for $k = 1, \dots, 4$ the Hermite polynomials are $q_1(z) = z$, $q_2(z) = z^2 - 1$, $q_3(z) = z^3 - 3z$, and $q_4(z) = z^4 - 6z^2 + 3$.
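The defining relation above implies a simple recurrence: differentiating $q_k(z)\phi(z)$ and using $\phi'(z) = -z\phi(z)$ gives $q_{k+1}(z) = z\,q_k(z) - q_k'(z)$, with $q_0 = 1$. A minimal sketch that generates the coefficients this way follows; the function names and the coefficient-list representation are choices made here for illustration.

```python
def hermite_coeffs(k_max):
    """Return [q_0, ..., q_{k_max}], each as a list of integer coefficients
    in increasing degree, using q_{k+1}(z) = z q_k(z) - q_k'(z)."""
    polys = [[1]]                                   # q_0(z) = 1
    for _ in range(k_max):
        q = polys[-1]
        shifted = [0] + q                           # coefficients of z * q(z)
        deriv = [i * q[i] for i in range(1, len(q))]  # coefficients of q'(z)
        nxt = [c - (deriv[i] if i < len(deriv) else 0)
               for i, c in enumerate(shifted)]
        polys.append(nxt)
    return polys

def poly_eval(coeffs, z):
    """Evaluate a polynomial given in increasing-degree coefficients (Horner)."""
    value = 0
    for c in reversed(coeffs):
        value = value * z + c
    return value

q = hermite_coeffs(6)
print(q[4])   # coefficients of q_4 in increasing degree
```

The list `q[4]` can be compared with $q_4(z) = z^4 - 6z^2 + 3$ given in the text; the same routine supplies $q_5$ and $q_6$, which are needed for the six-term sums in Theorem 3.1.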
Theorem 3.1 Take $h^*(z_{tp})$ in (??) as the indicator function $1(z_{tp} \le w)$, where $w \in \mathbb{R}$. Then, the marginal posterior distribution for the individual parameter $\theta_p$ is
\[ P_\xi^t(\theta_p \le a) = P_\xi^t(Z_{tp} \le w) = \Phi(w) - \sum_{i=1,\, i \ne 5}^{6} \frac{1}{i!}\, q_{i-1}(w)\, \phi(w)\, E_\xi^t(q_i(Z_{tp})) + O(t^{-3/2}), \tag{19} \]
where $w = [\Sigma_t]_{pp}(a - \hat\theta_{tp})$.
By taking the derivative of (19) with respect to $a$, we obtain the marginal posterior density
\[ \xi_p^t(a) = [\Sigma_t]_{pp} \left\{ \phi(w) + \sum_{i=1,\, i \ne 5}^{6} \frac{1}{i!}\, q_i(w)\, \phi(w)\, E_\xi^t(q_i(Z_{tp})) + O(t^{-3/2}) \right\}. \tag{20} \]
Observe that no renormalization is needed for this approximation, as $\int_{\mathbb{R}} q_i(w)\phi(w)\, dw = 0$.
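To see how (19) and (20) behave in practice, the sketch below evaluates them for a one-parameter toy posterior and compares (19) with the posterior distribution function obtained by numerical integration. Everything specific here is an assumption made for illustration: the exponential model with $n = 50$ and data sum $S = 50$, the prior $\xi(\theta) = \theta e^{-\theta}$, and the posterior expectations $E_\xi^t(q_i(Z_t))$ computed by quadrature rather than by any further expansion. With $p = 1$, $[\Sigma_t]_{pp}$ is simply $\Sigma_t$.

```python
import math

n, S = 50, 50.0                      # hypothetical sample size and data sum
theta_hat = n / S
Sigma = math.sqrt(n) / theta_hat     # Sigma^2 = -l''(theta_hat)

def kernel(theta):
    """Unnormalised posterior exp[l(theta) - l(theta_hat)] * xi(theta)."""
    log_lik_diff = n * math.log(theta / theta_hat) - S * (theta - theta_hat)
    return math.exp(log_lik_diff) * theta * math.exp(-theta)

def simpson(func, lo, hi, m=2000):
    """Composite Simpson rule on [lo, hi] with m (even) subintervals."""
    step = (hi - lo) / m
    total = func(lo) + func(hi)
    for i in range(1, m):
        total += (4.0 if i % 2 else 2.0) * func(lo + i * step)
    return total * step / 3.0

LO, HI = 1e-9, theta_hat + 12.0 / Sigma   # assumed effective support
NORM = simpson(kernel, LO, HI)

def hermite(k, z):
    """q_k(z) via the three-term recurrence q_{j+1} = z q_j - j q_{j-1}."""
    prev, cur = 1.0, z
    if k == 0:
        return prev
    for j in range(1, k):
        prev, cur = cur, z * cur - j * prev
    return cur

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

TERMS = (1, 2, 3, 4, 6)              # i = 1, ..., 6 with i = 5 omitted
COEF = {}
for i in TERMS:
    moment = simpson(lambda th, i=i: hermite(i, Sigma * (th - theta_hat)) * kernel(th),
                     LO, HI) / NORM  # E[q_i(Z_t)] by quadrature
    COEF[i] = moment / math.factorial(i)

def cdf_expansion(w):
    """Right-hand side of (19) without the O(t^{-3/2}) remainder."""
    return Phi(w) - phi(w) * sum(COEF[i] * hermite(i - 1, w) for i in TERMS)

def density_expansion(a):
    """Right-hand side of (20) without the O(t^{-3/2}) remainder."""
    w = Sigma * (a - theta_hat)
    return Sigma * phi(w) * (1.0 + sum(COEF[i] * hermite(i, w) for i in TERMS))

def cdf_exact(w):
    """Posterior P(Z_t <= w) by direct numerical integration."""
    return simpson(kernel, LO, theta_hat + w / Sigma) / NORM

grid = (-2.0, -1.0, 0.0, 1.0, 2.0)
err_expansion = max(abs(cdf_expansion(w) - cdf_exact(w)) for w in grid)
err_normal = max(abs(Phi(w) - cdf_exact(w)) for w in grid)
print(err_expansion, err_normal)
```

The comparison with `err_normal` shows how much the correction terms improve on the plain normal approximation $\Phi(w)$ for this example; integrating `density_expansion` over the real line gives a check on the remark that no renormalization is needed.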
References
[1] R. Johnson. An asymptotic expansion for posterior distributions. Ann. Math. Statist., 38:1899–1906, 1967.
[2] R. Johnson. Asymptotic expansions associated with posterior distributions. Ann. Math. Statist., 41:851–864, 1970.
[3] D. V. Lindley. The use of prior probability distributions in statistical inference and decisions. Proc. 4th. Berkeley Symp., 1:453–468, 1961.
[4] D. V. Lindley. Approximate Bayesian methods. In J. M. Bernardo, M. H. DeGroot, D. V. Lindley, and A. F. M. Smith, editors, Bayesian Statistics. University Press, 1980.
[5] F. Mosteller and D. L. Wallace. Inference and Disputed Authorship: The Federalist Papers. Addison-Wesley, Reading, Mass., 1964.
[6] L. Tierney and J. B. Kadane. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association, 81:82–86, 1986.
[7] R. C. Weng. On Stein’s identity for posterior normality. Statistica Sinica, 13:495–506, 2003.
[8] R. C. Weng. Stein’s identity for Bayesian inference. Manuscript, 2006.
[9] R. C. Weng and M. Woodroofe. Integrable expansions for posterior distributions for multiparameter exponential families with applications to sequential confidence levels. Statistica Sinica, 10:693–713, 2000.
[10] M. Woodroofe. Very weak expansions for sequentially designed experiments: linear models. The Annals of Statistics, 17:1087–1102, 1989.
[11] M. Woodroofe. Integrable expansions for posterior distributions for one-parameter exponential families. Statistica Sinica, 2:91–111, 1992.
Report on attending the 7th World Congress in Probability and Statistics, Singapore
The 7th World Congress in Probability and Statistics was jointly sponsored by the Bernoulli Society and the Institute of Mathematical Statistics, two of the major international statistical societies. This year the conference was held in Singapore from July 14 to 19, 2008. This meeting is a major international event in probability and statistics held every four years. It covers a wide range of topics and features the latest scientific developments in the fields of probability and statistics and their applications.
I arrived on July 13 and stayed for 6 days. I presented my recent work on Stein’s Identity and its applications in Bayesian analysis. My talk was scheduled alongside other Bayesian studies, which gave me a good chance to see other Bayesian-related work. I attended several other presentations and was impressed by some interesting talks, such as “A picture is worth a thousand numbers: communicating uncertainties following statistical analysis” by David Spiegelhalter, “Probability and statistics in internet information retrieval” by Zhi-Ming Ma, and Luke Tierney’s talk on statistical computing.
I also browsed books displayed in the book stand and purchased one book relevant to my research.