
National Science Council, Executive Yuan — Research Project Final Report

Approximate Computations for Posterior Distributions (2/2)

Project type: individual project. Project number: NSC93-2118-M-004-001. Project period: August 1, 2004 – July 31, 2005. Institution: Department of Statistics, National Chengchi University. Principal investigator: Ruby C. Weng. Report type: complete report. Attachments: report on attending an international conference and the paper presented. Availability: this project report is open to public inquiry.

October 18, 2005


Approximate Computations for Posterior

Distributions

NSC93-2118-M-004-001

August 1, 2003 – July 31, 2005

Ruby C. Weng

Department of Statistics, National Chengchi University

October 18, 2005

Abstract

This project describes a method for approximating posterior expectations of functions of the parameter. First, the posterior density of a data-dependent transformation Z_t of the parameter is expressed in a form close to a normal density. Next, a version of Stein's Identity is applied to the posterior distribution to obtain posterior moments of Z_t. Then the results are converted to derive second-order approximations to posterior expectations of functions (not necessarily positive) of the parameter.

Key words: maximum likelihood estimator; posterior distributions; Stein's identity.

1 Introduction

Let g(θ) be a smooth function on the parameter space Θ. The estimation of the posterior mean of g(θ), given a sample of observations x(t), requires integration over Θ of ratios of the form

E_t[g(θ)] = E[g(θ) | x(t)] = ∫_Θ g(θ) e^{ℓ_t(θ)} π(θ) dθ ⁄ ∫_Θ e^{ℓ_t(θ)} π(θ) dθ,   (1)

where ℓ_t is the log-likelihood function. If the likelihood function has a dominant mode, the Laplace method is suitable for approximating the integrals. Many authors have applied the Laplace method to find approximations to the ratio of integrals in (1). For example, Lindley [2] derived a second-order approximation for the integral.


Tierney and Kadane [5] applied the Laplace method in a special form in which g is assumed to be positive, the integrand of the numerator in (1) is expressed as exp[ℓ_t(θ) + log g(θ) + log π(θ)] (called the fully exponential Laplace approximation), and the expansion is taken at the mode of the integrand itself, rather than at the posterior mode. For a general function g (possibly non-positive), Tierney, Kass, and Kadane [6] obtained a second-order expansion of the posterior expectation by applying the fully exponential method to approximate the moment generating function E_t[exp(s g(θ))] and then differentiating (called the MGF method).
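As a concrete illustration of the fully exponential approximation just described, the sketch below applies it to the genetic-linkage posterior used later in Section 4 (counts 125, 18, 20, 34; uniform prior on (0, 1)), with g(θ) = θ. All helper names are ours, not from the report.

```python
import math

def log_post(t):
    # h(t) = l_t(t) + log(prior); unnormalized log posterior of the linkage model
    return 125 * math.log(2 + t) + 38 * math.log(1 - t) + 34 * math.log(t)

def argmax(f, lo=1e-6, hi=1 - 1e-6):
    # golden-section search for the mode of a unimodal function on (lo, hi)
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > 1e-10:
        if f(c) < f(d):
            a = c
            c, d = d, a + g * (b - a)
        else:
            b = d
            c, d = b - g * (b - a), c
    return (a + b) / 2

def d2(f, x, eps=1e-5):
    # numerical second derivative
    return (f(x + eps) - 2 * f(x) + f(x - eps)) / eps ** 2

def tk_posterior_mean():
    # E[g] ~ (sigma*/sigma) * exp[h*(m*) - h(m)], with h* = h + log g
    h = log_post
    hstar = lambda t: log_post(t) + math.log(t)
    m, mstar = argmax(h), argmax(hstar)
    sigma = (-d2(h, m)) ** -0.5
    sigma_star = (-d2(hstar, mstar)) ** -0.5
    return (sigma_star / sigma) * math.exp(hstar(mstar) - h(m))

print(tk_posterior_mean())  # close to the exact posterior mean 0.6228 of Table 1
```

Note that the expansion point changes with g: the mode of h* is located by a fresh search, which is what distinguishes the fully exponential form from expanding at the posterior mode.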

In this project we present a method, based on a version of Stein's Identity, for the problem of estimating the posterior mean of a smooth function of the parameter. First, the posterior density of a data-dependent transformation Z_t of the parameter (see (2)) is converted into a form close to a normal density. Next, a version of Stein's Identity is applied to the posterior distribution to obtain posterior moments of Z_t. Then the results are converted to derive second-order approximations to posterior expectations of functions (not necessarily positive) of the parameter.

2 The Model and Stein's Identity

Let X_t be a random vector distributed according to a family of probability densities p_t(x_t | θ), where t is a discrete or continuous parameter and θ ∈ Θ, an open subset of ℝ^p. Assume that the log-likelihood function, denoted by ℓ_t(θ), is twice continuously differentiable with respect to θ. Throughout, let θ̂_t be a root of the likelihood equation, satisfying ∇ℓ_t(θ̂_t) = 0, where ∇ indicates differentiation with respect to θ. Whenever such a root exists and −∇²ℓ_t(θ̂_t) is positive definite, we define Σ_t and the data-dependent transformation Z_t by

Σ_t Σ_t^T = −∇²ℓ_t(θ̂_t),   Z_t = Σ_t^T (θ − θ̂_t);   (2)

otherwise, define Σ_t and Z_t arbitrarily (in a measurable way).
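Definition (2) can be carried out numerically by a Cholesky factorization of the negative Hessian, taking Σ_t as the lower-triangular factor; a small pure-Python sketch, where the 2×2 Hessian and the points are made-up illustrative numbers:

```python
import math

def cholesky2(a):
    # lower-triangular L with L L^T = a, for a 2x2 positive definite matrix
    l11 = math.sqrt(a[0][0])
    l21 = a[1][0] / l11
    l22 = math.sqrt(a[1][1] - l21 ** 2)
    return [[l11, 0.0], [l21, l22]]

neg_hess = [[400.0, 60.0], [60.0, 250.0]]  # -grad^2 l_t(theta_hat), assumed
theta_hat = [0.3, 1.2]
theta = [0.35, 1.1]

L = cholesky2(neg_hess)                    # Sigma_t = L
d = [theta[0] - theta_hat[0], theta[1] - theta_hat[1]]
z = [L[0][0] * d[0] + L[1][0] * d[1],      # first row of Sigma_t^T times d
     L[1][1] * d[1]]                       # Sigma_t^T is upper triangular

# sanity check: ||z||^2 equals the quadratic form d^T (-Hessian) d
print(z, z[0] ** 2 + z[1] ** 2)
```

The printed check reflects the role of (2) below: in the z-scale, the quadratic term of the log-likelihood expansion becomes −‖z‖²/2.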

Consider a Bayesian model in which θ has a prior density π. Then the posterior density of θ given data x_t is π_t(θ) ∝ e^{ℓ_t(θ)} π(θ), and the posterior density of Z_t is

π_t(z) ∝ e^{ℓ_t(θ)} π(θ),   (3)

where the relation between θ and z is given in (2). Now a Taylor expansion gives

ℓ_t(θ) = ℓ_t(θ̂_t) + ½ (θ − θ̂_t)^T ∇²ℓ_t(θ_t*)(θ − θ̂_t),

where θ_t* lies between θ and θ̂_t. Let

u_t(θ) = −½ (θ − θ̂_t)^T [∇²ℓ_t(θ̂_t) − ∇²ℓ_t(θ_t*)](θ − θ̂_t);

it follows that

ℓ_t(θ) = ℓ_t(θ̂_t) − ½ ‖z_t‖² + u_t(θ)   (4)

and (3) can be rewritten as

π_t(z) ∝ φ_p(z) f_t(z),   (5)

where f_t(z) = π(θ(z)) exp[u_t(θ(z))] and φ_p(z) denotes the standard p-variate normal density.

Throughout, ∇π and ∇²π denote the gradient and Hessian of π with respect to θ; ∇f and ∇²f the gradient and Hessian of f with respect to z; and E_t and V_t the conditional expectation and variance given the data x_t.

Stein's Identity. Let Φ_p denote the standard p-variate normal distribution and write

Φ_p h = ∫ h dΦ_p

for functions h for which the integral is finite. Next, let Q denote a finite signed measure of the form dQ = f dΦ_p, where f is a real-valued function defined on ℝ^p satisfying Φ_p|f| = ∫ |f| dΦ_p < ∞. For k > 0, let H_k denote the collection of all measurable functions h : ℝ^p → ℝ for which |h(z)|/b ≤ 1 + ‖z‖^k for some b > 0, and let H = ∪_{k≥0} H_k. Given h ∈ H_k, let h_0 = Φ_p h, h_p = h,

h_j(y_1, …, y_j) = ∫_{ℝ^{p−j}} h(y_1, …, y_j, w) Φ_{p−j}(dw),   (6)

and

g_j(y_1, …, y_p) = e^{y_j²/2} ∫_{y_j}^{∞} [h_j(y_1, …, y_{j−1}, w) − h_{j−1}(y_1, …, y_{j−1})] e^{−w²/2} dw,   (7)

for −∞ < y_1, …, y_p < ∞ and j = 1, …, p. Then let Uh = (g_1, …, g_p)^T. Note that U may be iterated. Let Vh = (U²h + (U²h)^T)/2, where U²h is the p × p matrix whose j-th column is U g_j and g_j is as in (7). Then Vh is a symmetric matrix. For example, for z ∈ ℝ^p, if h(z) = z…

(5)

Lemma 2.1 (Stein's Identity) Let r be a nonnegative integer. Suppose that dQ = f dΦ_p as above, where f is a differentiable function on ℝ^p for which

∫_{ℝ^p} |f| dΦ_p + ∫_{ℝ^p} (1 + ‖z‖^r) ‖∇f(z)‖ Φ_p(dz) < ∞;

then

Qh = (Q1) Φ_p h + ∫_{ℝ^p} (Uh(z))^T ∇f(z) Φ_p(dz)

for all h ∈ H_r. If ∂f/∂z_j, j = 1, …, p, are differentiable, and

∫_{ℝ^p} (1 + ‖z‖^r) ‖∇²f(z)‖ Φ_p(dz) < ∞,

then

Qh = (Q1) Φ_p h + (Φ_p Uh)^T ∫_{ℝ^p} ∇f(z) Φ_p(dz) + ∫_{ℝ^p} tr[(Vh(z)) ∇²f(z)] Φ_p(dz)

for all h ∈ H_r.
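The one-dimensional case of the lemma is easy to check numerically. For p = 1 and h(z) = z, the construction in (6)–(7) gives Uh ≡ 1 (and Φ_1 h = 0), so the first identity reduces to the classical Stein equation ∫ z f dΦ = ∫ f′ dΦ. Below, the weight f is an arbitrary smooth, integrable choice of ours, not from the report.

```python
import math

def f(z):
    # an arbitrary smooth, integrable weight
    return math.exp(0.3 * z - 0.1 * z * z)

def fprime(z):
    return (0.3 - 0.2 * z) * f(z)

def phi(z):
    # standard normal density
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def integrate(g, lo=-10.0, hi=10.0, n=50_000):
    # composite midpoint rule
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

lhs = integrate(lambda z: z * f(z) * phi(z))   # Qh, the left-hand side
rhs = integrate(lambda z: fprime(z) * phi(z))  # integral of (Uh) f' dPhi
print(lhs, rhs)  # the two sides agree
```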

Observe from (5) that the posterior distribution of Z_t is of a form suitable for Stein's Identity. Let B_t denote the event {∇ℓ_t(θ̂_t) = 0, −∇²ℓ_t(θ̂_t) is positive definite}. Suppose that π has a compact support Θ_1 ⊂ Θ and ∇π is continuous. Then ‖∇π‖ is bounded on Θ_1 and we can verify that

∫_{ℝ^p} |f_t| dΦ_p + ∫_{ℝ^p} (1 + ‖z‖^r) ‖∇f_t(z)‖ Φ_p(dz) < ∞.

Hence, by Lemma 2.1,

E_t{h(Z_t)} = Φ_p h + E_t{[Uh(Z_t)]^T ∇f_t(Z_t)/f_t(Z_t)},   (8)

a.e. on B_t, for all h ∈ H. If ∇²π is also continuous, then similar arguments lead to

E_t{h(Z_t)} = Φ_p h + (Φ_p Uh)^T E_t[∇f_t(Z_t)/f_t(Z_t)] + E_t{tr[Vh(Z_t) ∇²f_t(Z_t)/f_t(Z_t)]},   (9)

a.e. on B_t, for all h ∈ H.

3 Main Results

In this section we present approximations to the posterior moments of Z_t and use them to derive approximations to posterior expectations of functions of the parameter.


Lemma 3.2 If h(z) = z_i z_j, 1 ≤ i ≤ j ≤ p, then

(i) g_j(z) = z_i and g_k(z) = 0 for k ≠ j;

(ii) Φ_p Uh = (0, …, 0)^T;

(iii) tr[Vh(z) ∇²f_t(z)/f_t(z)] = [∇²f_t(z)/f_t(z)]_{ij}.

Moreover,

E_t(∇π/π) = ∇π̂/π̂ + O(t^{−1}) and E_t(∇²π/π) = ∇²π̂/π̂ + O(t^{−1}).   (10)

Theorem 3.1

(i) E_t Z_t = (Σ_t^T)^{−1}[(∇π̂/π̂) + ½ U] + O(t^{−3/2});

(ii) V_t Z_t = I_p + (Σ_t^T)^{−1}[(∇²π̂/π̂) + (∇π̂/π̂)(∇π̂/π̂)^T + W] Σ_t^{−1} + O(t^{−2}),

where π̂ = π(θ̂_t), ∇π̂ = ∇π(θ̂_t), ∇²π̂ = ∇²π(θ̂_t), and U is a vector and W a matrix involving higher-order derivatives of ℓ_t.
where U is a vector and W a matrix involving higher order derivatives of `.

4 Applications

4.1 Linkage example

Here we consider an example presented in Rao [3] and reexamined by Tanner and Wong [4] and references therein. From a genetic linkage model, it is believed that 197 animals are distributed multinomially into four categories, y = (y1; y2; y3; y4) =

(125; 18; 20; 34), with cell probabilities speci ed by (1

2 + 4;1 4 ;1 4 ;4).

Tanner and Wong [4] also consider a second version of the sata in which the sample size is reduced by a factor of 10, y = (125; 18; 20; 34). As suggested in their paper, here we choose the uniform prior for  2 (0; 1) and assess the performance of our method using both the large sample and small sample data. Table 1 reports the exact posterior means and variances of  (carried out by matlab), and the approximations using our approach.

Table 1: Linkage example

                 Large sample                        Small sample
Method     posterior mean  posterior var.     posterior mean  posterior var.
Exact          0.6228          0.0026             0.5704          0.0225
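The "Exact" row for the large-sample data can be reproduced by one-dimensional quadrature of the unnormalized posterior (2 + θ)^125 (1 − θ)^38 θ^34 on (0, 1); a minimal pure-Python sketch (function names are ours):

```python
import math

def log_post(t):
    # unnormalized log posterior of the large-sample linkage data, uniform prior
    return 125 * math.log(2 + t) + 38 * math.log(1 - t) + 34 * math.log(t)

def posterior_moments(n=200_000):
    # composite midpoint rule; subtract an interior log value for stability
    c = log_post(0.62)
    h = 1.0 / n
    z = m1 = m2 = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        w = math.exp(log_post(t) - c)
        z += w
        m1 += w * t
        m2 += w * t * t
    mean = m1 / z
    return mean, m2 / z - mean ** 2

mean, var = posterior_moments()
print(mean, var)  # should be close to the tabled exact values 0.6228 and 0.0026
```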


5 Conclusions

In conclusion, we use Stein's Identity to approximate posterior moments of a suitably normalized quantity. These moments are useful in the evaluation of posterior means and variances of g(). Unlike Laplace method (for positive function), ours requires

third derivatives of `t, but we need only posterior mode for all g. Some formulas

presented here are new, while some agree with results in earlier approaches such as Johnson [1] and Tierney, Kass, and Kadane [6].

References

[1] R. Johnson. Asymptotic expansions associated with posterior distributions. Ann. Math. Statist., 41:851–864, 1970.

[2] D. V. Lindley. The use of prior probability distributions in statistical inference and decisions. Proc. 4th Berkeley Symp., 1:453–468, 1961.

[3] C. R. Rao. Linear Statistical Inference and its Applications. John Wiley, New York, second edition, 2001.

[4] M. A. Tanner and W. H. Wong. The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82:528–540, 1987.

[5] L. Tierney and J. B. Kadane. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association, 81:82–86, 1986.

[6] L. Tierney, R. E. Kass, and J. B. Kadane. Fully exponential Laplace approximations to expectations and variances of nonpositive functions. Journal of the American Statistical Association, 84:710–716, 1989.
