
National Science Council, Executive Yuan — Research Project Final Report

Approximate Computations for Posterior Distributions (2/2)

Project type: individual project. Project number: NSC93-2118-M-004-001. Project period: August 1, 2004 – July 31, 2005. Institution: Department of Statistics, National Chengchi University. Principal investigator: Ruby C. Weng. Report type: complete report. Attachments: report on attending an international conference and the paper presented. Availability: this project report is open to public inquiry.

October 18, 2005


Approximate Computations for Posterior

Distributions

NSC93-2118-M-004-001

August 1, 2003 – July 31, 2005

Ruby C. Weng

Department of Statistics, National Chengchi University

October 18, 2005

Abstract

This project describes a method for approximating posterior expectations of functions of the parameter. First, the posterior density of a data-dependent transformation Z_t of the parameter is expressed in a form close to a normal density. Next, a version of Stein's Identity is applied to the posterior distribution to obtain posterior moments of Z_t. Then the results are converted to derive second-order approximations to posterior expectations of functions (not necessarily positive) of the parameter.

Key words: maximum likelihood estimator; posterior distributions; Stein's identity.

1 Introduction

Let g(θ) be a smooth function on the parameter space Θ. The estimation of the posterior mean of g(θ), given a sample of observations x(t), requires integration over Θ of ratios of the form

E_t[g(θ)] = E[g(θ) | x(t)] = ∫_Θ g(θ) e^{ℓ_t(θ)} π(θ) dθ ⁄ ∫_Θ e^{ℓ_t(θ)} π(θ) dθ,   (1)

where ℓ_t is the log-likelihood function. If the likelihood function has a dominant mode, the Laplace method is suitable for approximating the integrals. Many authors have applied the Laplace method to find approximations to the ratio of integrals in (1). For example, Lindley [2] derived a second-order approximation for the integral.


Tierney and Kadane [5] applied the Laplace method in a special form in which g is assumed to be positive, the integrand of the numerator in (1) is expressed as exp[ℓ_t(θ) + log g(θ) + log π(θ)] (called the fully exponential Laplace approximation), and the expansion is taken at the mode of the integrand itself, rather than at the posterior mode. For a general function g (possibly non-positive), Tierney, Kass, and Kadane [6] obtained a second-order expansion of the posterior expectation by applying the fully exponential method to approximate the moment generating function E_t[exp(s g(θ))] and then differentiating (called the MGF method).
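As a concrete illustration of the fully exponential approximation just described, the sketch below applies it to the genetic-linkage posterior used later in Section 4 (counts 125, 18, 20, 34; uniform prior on (0, 1)), with g(θ) = θ. All helper names are ours, not from the report.

```python
import math

def log_post(t):
    # h(t) = l_t(t) + log(prior); unnormalized log posterior of the linkage model
    return 125 * math.log(2 + t) + 38 * math.log(1 - t) + 34 * math.log(t)

def argmax(f, lo=1e-6, hi=1 - 1e-6):
    # golden-section search for the mode of a unimodal function on (lo, hi)
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > 1e-10:
        if f(c) < f(d):
            a = c
            c, d = d, a + g * (b - a)
        else:
            b = d
            c, d = b - g * (b - a), c
    return (a + b) / 2

def d2(f, x, eps=1e-5):
    # numerical second derivative
    return (f(x + eps) - 2 * f(x) + f(x - eps)) / eps ** 2

def tk_posterior_mean():
    # E[g] ~ (sigma*/sigma) * exp[h*(m*) - h(m)], with h* = h + log g
    h = log_post
    hstar = lambda t: log_post(t) + math.log(t)
    m, mstar = argmax(h), argmax(hstar)
    sigma = (-d2(h, m)) ** -0.5
    sigma_star = (-d2(hstar, mstar)) ** -0.5
    return (sigma_star / sigma) * math.exp(hstar(mstar) - h(m))

print(tk_posterior_mean())  # close to the exact posterior mean 0.6228 of Table 1
```

Note that the expansion point changes with g: the mode of h* is located by a fresh search, which is what distinguishes the fully exponential form from expanding at the posterior mode.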

In this project we present a method, based on a version of Stein's Identity, for the problem of estimating the posterior mean of a smooth function of the parameter. First, the posterior density of a data-dependent transformation Z_t of the parameter (see (2)) is converted into a form close to a normal density. Next, a version of Stein's Identity is applied to the posterior distribution to obtain posterior moments of Z_t. Then the results are converted to derive second-order approximations to posterior expectations of functions (not necessarily positive) of the parameter.

2 The Model and Stein's Identity

Let X_t be a random vector distributed according to a family of probability densities p_t(x_t | θ), where t is a discrete or continuous parameter and θ ∈ Θ, an open subset of ℝ^p. Assume that the log-likelihood function, denoted by ℓ_t(θ), is twice continuously differentiable with respect to θ. Throughout, let θ̂_t be a root of the likelihood equation, satisfying ∇ℓ_t(θ̂_t) = 0, where ∇ indicates differentiation with respect to θ. Whenever such a root exists and −∇²ℓ_t(θ̂_t) is positive definite, we define Σ_t and the data-dependent transformation Z_t by

Σ_t Σ_t^T = −∇²ℓ_t(θ̂_t),   Z_t = Σ_t^T (θ − θ̂_t);   (2)

otherwise, define Σ_t and Z_t arbitrarily (in a measurable way).
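Definition (2) can be carried out numerically by a Cholesky factorization of the negative Hessian, taking Σ_t as the lower-triangular factor; a small pure-Python sketch, where the 2×2 Hessian and the points are made-up illustrative numbers:

```python
import math

def cholesky2(a):
    # lower-triangular L with L L^T = a, for a 2x2 positive definite matrix
    l11 = math.sqrt(a[0][0])
    l21 = a[1][0] / l11
    l22 = math.sqrt(a[1][1] - l21 ** 2)
    return [[l11, 0.0], [l21, l22]]

neg_hess = [[400.0, 60.0], [60.0, 250.0]]  # -grad^2 l_t(theta_hat), assumed
theta_hat = [0.3, 1.2]
theta = [0.35, 1.1]

L = cholesky2(neg_hess)                    # Sigma_t = L
d = [theta[0] - theta_hat[0], theta[1] - theta_hat[1]]
z = [L[0][0] * d[0] + L[1][0] * d[1],      # first row of Sigma_t^T times d
     L[1][1] * d[1]]                       # Sigma_t^T is upper triangular

# sanity check: ||z||^2 equals the quadratic form d^T (-Hessian) d
print(z, z[0] ** 2 + z[1] ** 2)
```

The printed check reflects the role of (2) below: in the z-scale, the quadratic term of the log-likelihood expansion becomes −‖z‖²/2.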

Consider a Bayesian model in which θ has a prior density π. Then the posterior density of θ given data x_t is π_t(θ) ∝ e^{ℓ_t(θ)} π(θ), and the posterior density of Z_t is

π_t(z) ∝ e^{ℓ_t(θ)} π(θ),   (3)

where the relation between θ and z is given in (2). Now a Taylor expansion gives

ℓ_t(θ) = ℓ_t(θ̂_t) + ½ (θ − θ̂_t)^T ∇²ℓ_t(θ_t*)(θ − θ̂_t),

where θ_t* lies between θ and θ̂_t. Let

u_t(θ) = −½ (θ − θ̂_t)^T [∇²ℓ_t(θ̂_t) − ∇²ℓ_t(θ_t*)](θ − θ̂_t);

it follows that

ℓ_t(θ) = ℓ_t(θ̂_t) − ½ ‖z_t‖² + u_t(θ)   (4)

and (3) can be rewritten as

π_t(z) ∝ φ_p(z) f_t(z),   (5)

where f_t(z) = π(θ(z)) exp[u_t(θ(z))] and φ_p(z) denotes the standard p-variate normal density.

Throughout, ∇π and ∇²π denote the gradient and Hessian of π with respect to θ; ∇f and ∇²f the gradient and Hessian of f with respect to z; and E_t and V_t the conditional expectation and variance given the data x_t.

Stein's Identity. Let Φ_p denote the standard p-variate normal distribution and write

Φ_p h = ∫ h dΦ_p

for functions h for which the integral is finite. Next, let Q denote a finite signed measure of the form dQ = f dΦ_p, where f is a real-valued function defined on ℝ^p satisfying Φ_p|f| = ∫ |f| dΦ_p < ∞. For k > 0, let H_k denote the collection of all measurable functions h : ℝ^p → ℝ for which |h(z)|/b ≤ 1 + ‖z‖^k for some b > 0, and let H = ∪_{k≥0} H_k. Given h ∈ H_k, let h_0 = Φ_p h, h_p = h,

h_j(y_1, …, y_j) = ∫_{ℝ^{p−j}} h(y_1, …, y_j, w) Φ_{p−j}(dw),   (6)

and

g_j(y_1, …, y_p) = e^{y_j²/2} ∫_{y_j}^{∞} [h_j(y_1, …, y_{j−1}, w) − h_{j−1}(y_1, …, y_{j−1})] e^{−w²/2} dw,   (7)

for −∞ < y_1, …, y_p < ∞ and j = 1, …, p. Then let Uh = (g_1, …, g_p)^T. Note that U may be iterated. Let Vh = (U²h + (U²h)^T)/2, where U²h is the p × p matrix whose j-th column is U g_j and g_j is as in (7). Then Vh is a symmetric matrix. For example, for z ∈ ℝ^p, if h(z) = z…

(5)

Lemma 2.1 (Stein's Identity) Let r be a nonnegative integer. Suppose that dQ = f dΦ_p as above, where f is a differentiable function on ℝ^p for which

∫_{ℝ^p} |f| dΦ_p + ∫_{ℝ^p} (1 + ‖z‖^r) ‖∇f(z)‖ Φ_p(dz) < ∞;

then

Qh = (Q1) Φ_p h + ∫_{ℝ^p} (Uh(z))^T ∇f(z) Φ_p(dz)

for all h ∈ H_r. If ∂f/∂z_j, j = 1, …, p, are differentiable, and

∫_{ℝ^p} (1 + ‖z‖^r) ‖∇²f(z)‖ Φ_p(dz) < ∞,

then

Qh = (Q1) Φ_p h + (Φ_p Uh)^T ∫_{ℝ^p} ∇f(z) Φ_p(dz) + ∫_{ℝ^p} tr[(Vh(z)) ∇²f(z)] Φ_p(dz)

for all h ∈ H_r.
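The one-dimensional case of the lemma is easy to check numerically. For p = 1 and h(z) = z, the construction in (6)–(7) gives Uh ≡ 1 (and Φ_1 h = 0), so the first identity reduces to the classical Stein equation ∫ z f dΦ = ∫ f′ dΦ. Below, the weight f is an arbitrary smooth, integrable choice of ours, not from the report.

```python
import math

def f(z):
    # an arbitrary smooth, integrable weight
    return math.exp(0.3 * z - 0.1 * z * z)

def fprime(z):
    return (0.3 - 0.2 * z) * f(z)

def phi(z):
    # standard normal density
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def integrate(g, lo=-10.0, hi=10.0, n=50_000):
    # composite midpoint rule
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

lhs = integrate(lambda z: z * f(z) * phi(z))   # Qh, the left-hand side
rhs = integrate(lambda z: fprime(z) * phi(z))  # integral of (Uh) f' dPhi
print(lhs, rhs)  # the two sides agree
```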

Observe from (5) that the posterior distribution of Z_t is of a form suitable for Stein's Identity. Let B_t denote the event {∇ℓ_t(θ̂_t) = 0, −∇²ℓ_t(θ̂_t) is positive definite}. Suppose that π has a compact support Θ_1 ⊂ Θ and ∇π is continuous. Then ‖∇π‖ is bounded on Θ_1 and we can verify that

∫_{ℝ^p} |f_t| dΦ_p + ∫_{ℝ^p} (1 + ‖z‖^r) ‖∇f_t(z)‖ Φ_p(dz) < ∞.

Hence, by Lemma 2.1,

E_t{h(Z_t)} = Φ_p h + E_t{[Uh(Z_t)]^T ∇f_t(Z_t)/f_t(Z_t)},   (8)

a.e. on B_t, for all h ∈ H. If ∇²π is also continuous, then similar arguments lead to

E_t{h(Z_t)} = Φ_p h + (Φ_p Uh)^T E_t[∇f_t(Z_t)/f_t(Z_t)] + E_t{tr[Vh(Z_t) ∇²f_t(Z_t)/f_t(Z_t)]},   (9)

a.e. on B_t, for all h ∈ H.

3 Main Results

In this section we present approximations to the posterior moments of Z_t and use them to derive approximations to posterior expectations of functions of the parameter.


Lemma 3.2 If h(z) = z_i z_j, 1 ≤ i ≤ j ≤ p, then

(i) g_j(z) = z_i and g_k(z) = 0 for k ≠ j;

(ii) Φ_p Uh = (0, …, 0)^T;

(iii) tr[Vh(z) ∇²f_t(z)/f_t(z)] = [∇²f_t(z)/f_t(z)]_{ij}.

Moreover,

E_t(∇π/π) = ∇π̂/π̂ + O(t^{−1}) and E_t(∇²π/π) = ∇²π̂/π̂ + O(t^{−1}).   (10)

Theorem 3.1

(i) E_t Z_t = (Σ_t^T)^{−1}[(∇π̂/π̂) + ½ U] + O(t^{−3/2});

(ii) V_t Z_t = I_p + (Σ_t^T)^{−1}[(∇²π̂/π̂) + (∇π̂/π̂)(∇π̂/π̂)^T + W] Σ_t^{−1} + O(t^{−2}),

where π̂ = π(θ̂_t), ∇π̂ = ∇π(θ̂_t), ∇²π̂ = ∇²π(θ̂_t), and U is a vector and W a matrix involving higher-order derivatives of ℓ_t.
where U is a vector and W a matrix involving higher order derivatives of `.

4 Applications

4.1 Linkage example

Here we consider an example presented in Rao [3] and reexamined by Tanner and Wong [4] and references therein. From a genetic linkage model, it is believed that 197 animals are distributed multinomially into four categories, y = (y1; y2; y3; y4) =

(125; 18; 20; 34), with cell probabilities speci ed by (1

2 + 4;1 4 ;1 4 ;4).

Tanner and Wong [4] also consider a second version of the sata in which the sample size is reduced by a factor of 10, y = (125; 18; 20; 34). As suggested in their paper, here we choose the uniform prior for  2 (0; 1) and assess the performance of our method using both the large sample and small sample data. Table 1 reports the exact posterior means and variances of  (carried out by matlab), and the approximations using our approach.

Table 1: Linkage example

                 Large sample                        Small sample
Method     posterior mean  posterior var.     posterior mean  posterior var.
Exact          0.6228          0.0026             0.5704          0.0225
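The "Exact" row for the large-sample data can be reproduced by one-dimensional quadrature of the unnormalized posterior (2 + θ)^125 (1 − θ)^38 θ^34 on (0, 1); a minimal pure-Python sketch (function names are ours):

```python
import math

def log_post(t):
    # unnormalized log posterior of the large-sample linkage data, uniform prior
    return 125 * math.log(2 + t) + 38 * math.log(1 - t) + 34 * math.log(t)

def posterior_moments(n=200_000):
    # composite midpoint rule; subtract an interior log value for stability
    c = log_post(0.62)
    h = 1.0 / n
    z = m1 = m2 = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        w = math.exp(log_post(t) - c)
        z += w
        m1 += w * t
        m2 += w * t * t
    mean = m1 / z
    return mean, m2 / z - mean ** 2

mean, var = posterior_moments()
print(mean, var)  # should be close to the tabled exact values 0.6228 and 0.0026
```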


5 Conclusions

In conclusion, we use Stein's Identity to approximate posterior moments of a suitably normalized quantity. These moments are useful in the evaluation of posterior means and variances of g(). Unlike Laplace method (for positive function), ours requires

third derivatives of `t, but we need only posterior mode for all g. Some formulas

presented here are new, while some agree with results in earlier approaches such as Johnson [1] and Tierney, Kass, and Kadane [6].

References

[1] R. Johnson. Asymptotic expansions associated with posterior distributions. Ann. Math. Statist., 41:851–864, 1970.

[2] D. V. Lindley. The use of prior probability distributions in statistical inference and decisions. Proc. 4th Berkeley Symp., 1:453–468, 1961.

[3] C. R. Rao. Linear Statistical Inference and its Applications. John Wiley, New York, second edition, 2001.

[4] M. A. Tanner and W. H. Wong. The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82:528–540, 1987.

[5] L. Tierney and J. B. Kadane. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association, 81:82–86, 1986.

[6] L. Tierney, R. E. Kass, and J. B. Kadane. Fully exponential Laplace approximations to expectations and variances of nonpositive functions. Journal of the American Statistical Association, 84:710–716, 1989.
