A study on the filtered-variate Dirichlet distributions



   

 

NSC 89-2118-M-004-011

NSC 90-2118-M-004-014



Project period: 89/8/1 to 91/7/31 (ROC calendar)
jiangt@math.nccu.edu.tw

Abstract

The filtered-variate Dirichlet prior distributions have been used extensively for histogram smoothing, a problem that had troubled Bayesians for decades (see Dickey and Jiang, 1998). The assessment of these distributions is therefore very important in Bayesian applications. In this paper, we develop easy assessment methods for these distributions. In addition, the roughness of a random vector of histogram probabilities is usually related to the filtered-variate (generalized) Dirichlet distributions. We study the distribution of the roughness and give easy methods to compute its expectation.

Keywords: Bayesian inference, roughness, filtered-variate (generalized) Dirichlet, prior assessment

§1 Introduction

It is well known among Bayesian statisticians that the class of Dirichlet distributions is a natural conjugate prior family for multinomial sampling. In practice, however, one often encounters sampling probabilities whose values are close for adjacent categories. In such cases it is highly desirable to have "smooth" prior distributions. Unfortunately, under the Dirichlet distribution the random quantities corresponding to adjacent categories are negatively correlated, which violates the smoothness assumption. Dickey and Jiang (1998) resolved this decades-old problem by replacing the Dirichlet distribution with the filtered-variate Dirichlet distribution.

Although many promising methods are available for assessing the filtered-variate Dirichlet distribution, some non-statistician users (experts) may have difficulty providing the information required for prior assessment. For example, providing typical smooth probability vectors whose empirical moments match the prior means and variances may not be an easy task for some users (experts). In Section 2, we propose alternative assessment methods that make it easy to elicit prior smoothing information.


Achieving an accurate estimate is usually more important than portraying the true smoothness. This is why most researchers focus on the posterior mean (or the mode) when the posterior distribution must be summarized in the form of a vector estimate. However, there may be occasions when it is reasonable to quote an estimate that exhibits a roughness equal to the posterior expected roughness. In Section 3, we investigate this problem of roughness. Finally, we give conclusions in Section 4.

§2 Prior assessment

Let $v_1, v_2, \ldots, v_I$ be the unknown cell probabilities for multinomial sampling. It is well known that the corresponding natural conjugate family is the Dirichlet distribution. Under this prior the random quantities $v_i$ are nearly independent, with a slight negative association because of the constraint on their sum. In practice, however, the probabilities of adjacent cells are frequently judged to be positively correlated; that is, the cell probabilities are expected to be smooth.

Here, we perform a linear transformation of a Dirichlet vector so that the new probability vector is smooth. Consider the filtered-variate Dirichlet vector $v \sim \mathrm{FAD}(b)$, that is, $v = Au$ with $u \sim D(b)$, where $b = (b_1, b_2, \ldots, b_I)'$ and $A$ is an $I \times I$ circular matrix: in the first row, the first $k$ ($k < I$) entries have the value $1/k$ and the remaining entries have the value 0; in the second row, the second through $(k+1)$-th entries have the value $1/k$ and the remaining entries have the value 0; and so on.

The major issue is then the prior assessment. Setting $b_+ = \sum_{i=1}^{I} b_i$ and $e = kAb$, an assessment procedure for the prior distribution can be as follows:

Step 1. Elicit the prior mean of the $i$-th categorical probability, say $m_i$. We then have

$$m_i = \frac{1}{k} \cdot \frac{e_i}{b_+}, \quad i = 1, \ldots, I.$$

Step 2. Elicit the prior variance of the first categorical probability, say $s_1^2$. We then have

$$b_+ = \frac{m_1 (1 - m_1 k)}{k \cdot s_1^2} - 1.$$

Step 3. Compute $e_i$,

$$e_i = k \cdot b_+ \cdot m_i, \quad i = 1, 2, \ldots, I.$$

Step 4. Compute $b$,

$$b = \frac{1}{k} \cdot A^{-1} \cdot e.$$

Note that the inverse of a circular matrix $A$ can be computed easily (see Marcus and Minc (1964)). Hence, the prior distribution can be easily assessed.
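Steps 1 to 4 can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the function names `circular_filter` and `assess_prior` are hypothetical, and the sketch solves the linear system with a general-purpose solver instead of a dedicated circular-matrix inverse. It also assumes that $A$ is invertible (for this filter that requires $k$ and $I$ to share no common factor) and that the elicited values lead to positive $b_i$.

```python
import numpy as np

def circular_filter(I, k):
    """I x I circular matrix: row i holds k entries equal to 1/k,
    starting at column i and wrapping around (hypothetical helper)."""
    A = np.zeros((I, I))
    for i in range(I):
        for j in range(k):
            A[i, (i + j) % I] = 1.0 / k
    return A

def assess_prior(m, s1_sq, k):
    """Recover the Dirichlet parameter vector b from elicited prior
    means m (length I) and the prior variance s1_sq of the first cell."""
    m = np.asarray(m, dtype=float)
    A = circular_filter(m.size, k)
    # Step 2: b_+ from the mean and variance of the first cell.
    b_plus = m[0] * (1.0 - m[0] * k) / (k * s1_sq) - 1.0
    # Step 3: e_i = k * b_+ * m_i.
    e = k * b_plus * m
    # Step 4: b = (1/k) A^{-1} e, computed by solving A x = e.
    b = np.linalg.solve(A, e) / k
    return b, b_plus

# Usage: elicit means and one variance, then inspect the implied b.
m = np.array([0.175, 0.225, 0.2125, 0.1875, 0.2])
b, b_plus = assess_prior(m, s1_sq=0.000625, k=4)
print(b, b_plus)
```

Elicited means that are not themselves smooth enough can give negative entries in `b`, so a practical routine would check the result before using it.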

§3 Roughness

Define the $r$-roughness of a vector $v$ as the average of its squared $r$-distant differences,

$$R_r(v) = \sum_{i=1}^{I-r} \frac{(v_i - v_{i+r})^2}{I - r},$$

with special interest in adjacent differences, $r = 1$. That is,

$$R_1(v) = \sum_{i=1}^{I-1} \frac{(v_i - v_{i+1})^2}{I - 1}.$$

If we further define $d = (d_1, d_2, \ldots, d_I)'$, where $d_i = v_i - v_{i+1}$ for $i = 1, 2, \ldots, I - 1$, and $d_I = v_I - v_1$, then $R_1(v) = \sum_{i=1}^{I-1} d_i^2 / (I - 1)$. It can be shown that $d = Bv$, where $B$ is the $I \times I$ matrix

$$B = \begin{pmatrix}
1 & -1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & -1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & -1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & -1 \\
-1 & 0 & 0 & 0 & \cdots & 0 & 1
\end{pmatrix}.$$
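The definitions of $R_r(v)$ and $d = Bv$ can be checked on a small vector. The sketch below is illustrative only; the helper names `roughness` and `difference_matrix` are hypothetical and do not come from the paper.

```python
import numpy as np

def roughness(v, r=1):
    """r-roughness: average of the squared r-distant differences of v."""
    v = np.asarray(v, dtype=float)
    I = v.size
    diffs = v[:I - r] - v[r:]          # v_i - v_{i+r}, i = 1, ..., I - r
    return np.sum(diffs ** 2) / (I - r)

def difference_matrix(I):
    """Circular first-difference matrix B, so that d = B v has
    d_i = v_i - v_{i+1} for i < I and d_I = v_I - v_1."""
    eye = np.eye(I)
    return eye - np.roll(eye, 1, axis=1)

v = np.array([0.10, 0.20, 0.30, 0.25, 0.15])
d = difference_matrix(v.size) @ v
# R_1 uses only the first I - 1 components of d.
r1_from_d = np.sum(d[:-1] ** 2) / (v.size - 1)
print(roughness(v, r=1), r1_from_d)
```

Both printed values are the same quantity computed two ways, once from the definition and once through the matrix form.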

Consider the filtered-variate generalized Dirichlet distribution $v \sim \mathrm{FAD}(b, H, c)$, where $v = Au$, $A$ is an $I \times I$ matrix, $u$ is an $I \times 1$ random vector, and $u \sim D(b, H, c)$. Note that this generalized Dirichlet distribution of $u$ was first defined by Dickey (1983). In addition, $D(b, H, c)$ reduces to $D(b)$ if $H = 0$ or $c = 0$. Since $d = Bv$, we have $d = BAu$. Note that $d$ is not a probability vector and does not follow a regular filtered-variate generalized Dirichlet distribution. However, the moments of $d$ can be expressed similarly to those of the regular filtered-variate generalized Dirichlet distribution.

Before studying the distribution of $R_1$, we first re-express $R_1$. Let $D$ be the submatrix of $B$ obtained by removing the last row of $B$. Hence, $D$ is an $(I - 1) \times I$ matrix, and $R_1$ can be expressed as $[1/(I - 1)](u'A'D'DAu)$. Since $A'D'$ is the transpose of $DA$, the $I \times I$ matrix $C = A'D'DA$ is symmetric. Therefore,

$$R_1 = \frac{1}{I - 1}(u'Cu),$$

where $u$ is an $I \times 1$ random vector and $C$ is an $I \times I$ symmetric constant matrix. Now, by Theorem 1.7 of Seber (1977, p. 13), we have

$$E(R_1) = \frac{1}{I - 1}\left[\mathrm{tr}(C\Sigma) + E(u)'CE(u)\right], \tag{1}$$

where $\Sigma$ is the variance-covariance matrix of the random vector $u$. Equation (1) gives us the expected roughness.

To apply equation (1) in detail, consider the special case $I = 5$ and $u \sim D(b)$. If $A$ is a $5 \times 5$ matrix and $k = 4$, then $d = BAu$, where $u \sim D(b)$ and $B$ is a $5 \times 5$ matrix. Since $D$ is the submatrix of $B$ obtained by removing the last row of $B$,

$$D = \begin{pmatrix}
1 & -1 & 0 & 0 & 0 \\
0 & 1 & -1 & 0 & 0 \\
0 & 0 & 1 & -1 & 0 \\
0 & 0 & 0 & 1 & -1
\end{pmatrix}.$$

It can be shown that

$$C = A'D'DA = \frac{1}{16}\begin{pmatrix}
2 & -1 & 0 & 0 & -1 \\
-1 & 2 & -1 & 0 & 0 \\
0 & -1 & 2 & -1 & 0 \\
0 & 0 & -1 & 1 & 0 \\
-1 & 0 & 0 & 0 & 1
\end{pmatrix}.$$

Hence, $R_1 = (1/4)(u'Cu)$. If we define $G = 16C$, then the expected 1-roughness $E(R_1)$ can be computed by equation (1), that is

$$E(R_1) = \frac{1}{4^3}\left[\mathrm{tr}(G\Sigma) + E(u)'GE(u)\right],$$

where $\Sigma$ is the variance-covariance matrix of $u$. It can be shown that

$$\mathrm{tr}(G\Sigma) = \frac{1}{b_+^2 (b_+ + 1)}\,(6b_1b_2 + 4b_1b_3 + 3b_1b_4 + 5b_1b_5 + 6b_2b_3 + 3b_2b_4 + 3b_2b_5 + 5b_3b_4 + 3b_3b_5 + 2b_4b_5).$$

In addition,

$$E(u)'GE(u) = \frac{1}{b_+^2}\,(2b_1^2 + 2b_2^2 + 2b_3^2 + b_4^2 + b_5^2 - 2b_1b_2 - 2b_1b_5 - 2b_2b_3 - 2b_3b_4).$$

Therefore, we have

$$E(R_1) = \frac{1}{4^3\, b_+ (b_+ + 1)}\left\{(2b_1 + 2b_2 + 2b_3 + b_4 + b_5) + (2b_1^2 + 2b_2^2 + 2b_3^2 + b_4^2 + b_5^2 - 2b_1b_2 - 2b_1b_5 - 2b_2b_3 - 2b_3b_4)\right\}.$$
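The closed form for $I = 5$, $k = 4$ can be compared numerically with equation (1). The sketch below is a check under stated assumptions, not the authors' code: the function names are hypothetical, and the Dirichlet mean and covariance are the standard ones, $E(u_i) = b_i / b_+$ and $\Sigma = [\mathrm{diag}(E(u)) - E(u)E(u)'] / (b_+ + 1)$.

```python
import numpy as np

def expected_roughness(b, k):
    """E(R_1) from equation (1) for v = A u with u ~ D(b)."""
    b = np.asarray(b, dtype=float)
    I = b.size
    # Circular filter A: row i has k entries 1/k starting at column i.
    A = np.zeros((I, I))
    for i in range(I):
        for j in range(k):
            A[i, (i + j) % I] = 1.0 / k
    # D is the circular difference matrix B with its last row removed.
    eye = np.eye(I)
    D = (eye - np.roll(eye, 1, axis=1))[:-1]
    C = A.T @ D.T @ D @ A
    # Standard Dirichlet mean vector and covariance matrix.
    mean = b / b.sum()
    cov = (np.diag(mean) - np.outer(mean, mean)) / (b.sum() + 1.0)
    return (np.trace(C @ cov) + mean @ C @ mean) / (I - 1)

def closed_form_I5_k4(b):
    """Closed-form E(R_1) for the special case I = 5, k = 4."""
    b1, b2, b3, b4, b5 = (float(x) for x in b)
    b_plus = b1 + b2 + b3 + b4 + b5
    linear = 2 * b1 + 2 * b2 + 2 * b3 + b4 + b5
    quad = (2 * b1**2 + 2 * b2**2 + 2 * b3**2 + b4**2 + b5**2
            - 2 * b1 * b2 - 2 * b1 * b5 - 2 * b2 * b3 - 2 * b3 * b4)
    return (linear + quad) / (4**3 * b_plus * (b_plus + 1.0))

b = [2.0, 3.0, 4.0, 5.0, 6.0]
print(expected_roughness(b, k=4), closed_form_I5_k4(b))
```

The general routine works for any $I$ and $k$, so it can also serve where no closed form has been derived.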

If $u$ has a generalized Dirichlet distribution (i.e., $u \sim D(b, H, c)$), then the probability density function of $u$ is

$$f(u) = \frac{1}{B(b)} \left(\prod_{i=1}^{I} u_i^{b_i - 1}\right) \left[\prod_{j=1}^{J} \left(\sum_{i=1}^{I} u_i h_{ij}\right)^{c_j}\right] \Big/ R(b, H, -c), \tag{2}$$

where the normalizing constant $R$ is Carlson's multiple hypergeometric function (see Carlson (1977)). It can be seen that the moments of $u$ are ratios of $R$'s. These ratios can be computed by the methods given by Jiang, Kadane, and Dickey (1992). We can also approximate the moments by the quasi-Bayes method. Note that the numerator of the right-hand side of equation (2) can be regarded as the product of a Dirichlet prior probability density function and a likelihood of the censored data, $\left[\prod_{j=1}^{J} \left(\sum_{i=1}^{I} u_i h_{ij}\right)^{c_j}\right]$, where $c_j$ is the number of observations reported as the $j$-th report set. The total number of observations is then $n \equiv c_+ = \sum_{j=1}^{J} c_j$. For convenience, we shall assume that these data are received sequentially. Therefore, the posterior probability density function, after receipt of the first report $R_1 = r_1$, has the form

$$f(u) = \sum_{m=1}^{I} \frac{B_m^{(1)} h_{m r_1}}{B_+^{(1)}} \left\{ \left(\prod_{i=1}^{I} u_i^{b_i + \delta_i^m - 1}\right) \Big/ B_m^{(1)} \right\}, \tag{3}$$

where $B_m^{(1)} = B(b + \delta^m)$, $\delta^m = (\delta_1^m, \ldots, \delta_I^m)$, and $B_+^{(1)} = \sum_{m=1}^{I} B_m^{(1)} h_{m r_1}$. If we were further informed of the true category of the first subject, then the posterior density (3) would become

$$\frac{1}{B(b + c^{(1)})} \left(\prod_{i=1}^{I} u_i^{b_i + c_i^{(1)} - 1}\right), \tag{4}$$

where $c^{(1)} = (c_1^{(1)}, c_2^{(1)}, \ldots, c_I^{(1)})$ and each $c_i^{(1)}$ is 1 if the true category of the first subject is $i$, and is 0 otherwise. However, we are not further informed of the true category of the first subject. Instead, we base our decision upon the quasi-datum $d_i^{(1)} = E[C_i^{(1)} \mid R_1 = r_1] = \Pr(C_i^{(1)} = 1 \mid r_1)$, the expected value of $C_i^{(1)}$ posterior to the datum $r_1$. Now, we have

$$d_i^{(1)} = \frac{\Pr(R_1 = r_1 \mid C_i^{(1)} = 1)\,\Pr(C_i^{(1)} = 1)}{\sum_{k=1}^{I} \left[\Pr(R_1 = r_1 \mid C_k^{(1)} = 1)\,\Pr(C_k^{(1)} = 1)\right]}.$$

Let

$$\hat{d}_i^{(1)} = \frac{h_{i r_1}\, \hat{u}_i^{(0)}}{\sum_{k=1}^{I} h_{k r_1}\, \hat{u}_k^{(0)}},$$

where $\hat{u}_i^{(0)}$ is the prior mean of $u_i$. We use $\hat{d}_i^{(1)}$ to estimate $d_i^{(1)}$. This provides the key to the procedure. We approximate (3) by the Dirichlet density (4),

$$\hat{f}_1(u) = \frac{1}{B(b + \hat{d}^{(1)})} \prod_{i=1}^{I} u_i^{b_i + \hat{d}_i^{(1)} - 1} = \frac{1}{B(b^{(1)})} \prod_{i=1}^{I} u_i^{b_i^{(1)} - 1}, \tag{5}$$

where $b^{(1)}$ is the updated parameter vector at the first step. That is, $b_i^{(1)} = b_i + \hat{d}_i^{(1)}$ for all $i$. This approximate posterior distribution is within the Dirichlet distribution family. Subsequent updating proceeds in the identical manner. For the $n$-th step, after receiving the $n$-th report $R_n = r_n$, our approximate posterior distribution of $u$ is

$$\hat{f}_n(u \mid R_1 = r_1, \ldots, R_n = r_n) = \frac{1}{B(b^{(n-1)} + \hat{d}^{(n)})} \prod_{i=1}^{I} u_i^{b_i^{(n-1)} + \hat{d}_i^{(n)} - 1},$$

where $b^{(n-1)}$ is the updated parameter vector at the $(n-1)$st step. The approximate general posterior moment of the $u_i$'s is

$$\hat{E}\left(\prod_{i=1}^{I} u_i^{b_i'}\right) = B(b^{(n)} + b') \Big/ B(b^{(n)}).$$
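The sequential quasi-Bayes update described above can be sketched in a few lines. This is an illustration under assumptions, not the authors' implementation: the function name `quasi_bayes_update` is hypothetical, `H[i, j]` is assumed to hold $h_{ij}$ (the weight linking true category $i$ to report set $j$), reports are given as zero-based column indices, and at each step after the first the current approximate posterior mean plays the role that the prior mean plays in the first step.

```python
import numpy as np

def quasi_bayes_update(b, H, reports):
    """Sequentially update Dirichlet parameters with quasi-data.

    b       : prior parameter vector, length I
    H       : I x J array with H[i, j] = h_ij (assumed layout)
    reports : sequence of report indices r_1, ..., r_n (zero-based)
    """
    b = np.asarray(b, dtype=float).copy()
    H = np.asarray(H, dtype=float)
    for r in reports:
        u_hat = b / b.sum()        # current (approximate) mean of u
        w = H[:, r] * u_hat        # h_{i r} * u_hat_i
        d_hat = w / w.sum()        # estimated quasi-datum, sums to 1
        b = b + d_hat              # stay within the Dirichlet family
    return b

# Three categories; report 0 means "category 0 or 1", report 1 means "1 or 2".
H = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
b_n = quasi_bayes_update([1.0, 1.0, 1.0], H, reports=[0, 1, 1])
print(b_n, b_n / b_n.sum())       # updated parameters and approximate means
```

Because each quasi-datum sums to one, the parameter total grows by exactly one per report, which matches the denominator $b_+^{(0)} + n$ of the approximate posterior mean.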

Hence, for example, the posterior mean of $u_i$, based on $r_1, r_2, \ldots, r_n$, can be approximated by

$$\hat{u}_i^{(n)} = \frac{b_i^{(n)}}{b_+^{(n)}} = \frac{b_i^{(n-1)} + \widehat{\Pr}(C_i^{(n)} = 1 \mid R_n = r_n)}{b_+^{(0)} + n}.$$

§4 Conclusions

The filtered-variate Dirichlet distribution family is important for problems of Bayesian local smoothness. We provide a new prior assessment method for the filtered-variate Dirichlet distribution that is easy enough for even non-statisticians to apply. Although we are usually more interested in accurate estimation than in the true smoothness, there are occasions when the roughness is a concern. We also provide an easy method to compute the expected value of the roughness for the filtered-variate (generalized) Dirichlet distribution.

References

[1] Carlson, B. C. (1977), Special Functions of Applied Mathematics, Academic Press, New York.

[2] Dickey, J. M. (1983), “Multiple hypergeometric functions: Probabilistic interpretations and statistical uses,” Journal of the American Statistical Association, 78, 628–637.

[3] Dickey, J. M. and Jiang, T. J. (1998), “Filtered-Variate Prior Distributions for Histogram Smoothing,” Journal of the American Statistical Association, 93, 651–662.

[4] Jiang, T. J., Kadane, J. B., and Dickey, J. M. (1992), “Computation of Carlson’s multiple hypergeometric function R for Bayesian applications,” Journal of Computational and Graphical Statistics, 1, 231–251.

[5] Marcus, Marvin and Minc, Henryk (1964), A Survey of Matrix Theory and Matrix Inequalities, Allyn and Bacon, Boston.

[6] Seber, G. A. F. (1977), Linear
