
Objective Bayes Factors for Inequality Constrained Hypotheses

Herbert Hoijtink

1,2

1Department of Methods and Statistics, Utrecht University, P. O. Box 80140, 3508 TC Utrecht, The Netherlands

2CITO Institute for Educational Measurement, P. O. Box 1034, 6801 MG Arnhem, The Netherlands E-mail: H.Hoijtink@uu.nl or herbert.hoijtink@cito.nl

Summary

This paper will present a Bayes factor for the comparison of an inequality constrained hypothesis with its complement or an unconstrained hypothesis. Equivalent sets of hypotheses form the basis for the quantification of the complexity of an inequality constrained hypothesis. It will be shown that the prior distribution can be chosen such that one of the terms in the Bayes factor is the quantification of the complexity of the hypothesis of interest. The other term in the Bayes factor represents a measure of the fit of the hypothesis. Using a vague prior distribution this fit value is essentially determined by the data. The result is an objective Bayes factor.

Key words: Bayes factor; equivalent hypotheses; inequality constraints; model complexity.

1 Inequality Constrained Hypotheses

The majority of researchers using statistics for the analysis of their data are familiar with the concept of the null hypothesis. Let x denote an N × J matrix containing the responses of i = 1, . . . , N persons to j = 1, . . . , J variables, θp for p = 1, . . . , P the structural parameters, and φ the nuisance parameters of a statistical model with density f(x | θ, φ). Two well-known appearances of the null hypothesis are

H0 : θ1 = · · · = θP, (1)

and

H0 : θ1 = 0, . . . , θP = 0. (2)

The first occurs, for example, in a one-way ANOVA (Gelman et al., 2004, pp. 406–411) where the null states that P population means are equal; the second in a linear regression (Gelman et al., 2004, Chapter 14) where the null states that none of the P predictors have an effect on the dependent variable. In both models the nuisance parameter φ is the residual variance. Another familiar concept is that of the alternative hypothesis

H1 : not H0. (3)

In exploratory statistical analyses H0 and H1 have a central role. The main research questions are “which elements of θ are not equal” (related to (1)) or “which elements of θ are not equal

 


to zero” (related to (2)). These questions are usually not answered after a hypothesis like (1) or (2) is evaluated. If, for example, (1) is rejected using a null-hypothesis significance test, it is clear that “something is going on”, but it is still unclear “what is going on”. To clarify the latter, follow-up tests focussing on specific elements of θ are usually executed in combination with evaluation of the estimate of θ to support the interpretation of significant p-values.

There is by now a wealth of literature criticizing the use of p-values for the evaluation of hypotheses. A comprehensive overview is given by Wagenmakers (2007). Some of the arguments against the use of p-values are: that they cannot be used to quantify support in favour of the null hypothesis; that they overstate the amount of evidence against the null hypothesis; and that usually multiple p-values are needed to evaluate omnibus hypotheses like (1) and (2). The latter leads to multiple testing problems that, if properly addressed, seriously impede the statistical power of the tests executed. Furthermore, there is criticism with respect to the null hypothesis itself (Cohen, 1994; Van de Schoot et al., 2011). The main argument is that research hypotheses are almost never formulated in terms of the null hypothesis “nothing is going on” or the alternative hypothesis “something is going on but I don’t know what”. This does not address the needs of researchers, who usually have rather clear expectations about “what is going on” in the population in which they are interested.

In this paper both problems will be addressed. Instead of the p-value, the Bayes factor (Kass & Raftery, 1995) will be used to evaluate hypotheses. Furthermore, H0 and H1 as used in exploratory statistical analysis will be replaced by Hi, Hc, and Hu. As will be illustrated in the four examples that follow, Hi is a hypothesis that reflects the researcher’s expectations about what is going on in the population of interest by means of inequality constraints among the parameters θ; Hc is the complement of Hi, that is, it is “not the expectation of the researcher”; and Hu is a hypothesis without constraints on θ, which is also of importance for the computation of the Bayes factor.

Example 1: In item response theory (De Boeck & Wilson, 2004) person fit analysis (see Meijer & Sijtsma (2001) for a comprehensive overview) is used to investigate whether a person shows test taking behaviour as intended or not. Examples of unintended test taking behaviour are cheating (copying answers from your neighbour) and fatigue (giving random responses to the items presented towards the end of a test). Let ξ denote the ability of a person and δp the difficulty of item p = 1, . . . , P. The smaller p, the easier the item, that is, the items can be ordered on a latent trait from easy to difficult. Let xp ∈ {0, 1} denote the response to item p, where 0 denotes an incorrect and 1 a correct response. Let θp denote the probability of responding correctly to item p. This probability is a function of ξ, δp, and possibly other item parameters. In, for example, the Rasch model, θp = exp(ξ − δp)/(1 + exp(ξ − δp)), that is, the larger δp, the smaller θp. Here θp will be treated as a latent variable without specifying the underlying item response model. Intended behaviour results if θp decreases as p increases, that is, θ1 > θ2 > · · · > θP. For a test consisting of six items (e.g., an arithmetic ability test with items like: (1) 5 + 12 =; (2) 19 + 23 =; (3) 81 + 112 =; (4) 213 + 712 =; (5) 7332 + 832 =;

and (6) 8312 + 1222 =), the inequality constrained hypothesis is

Hi : θ1 > θ2 > θ3 > θ4 > θ5 > θ6. (4)

This hypothesis will be evaluated for two hypothetical persons: one responding x = [111000] to these items and one responding x = [000111]. The first person shows test taking behaviour as intended because the easier items are responded to correctly up to a certain level. In contrast, the second person, who only responds correctly to the most difficult items, does not show test taking behaviour as intended.
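The decreasing success probabilities of the Rasch model mentioned above can be sketched numerically. The ability and difficulty values below are hypothetical, chosen only to illustrate that a larger δp implies a smaller θp, so that the ordering of (4) holds for items sorted from easy to difficult:

```python
import math

def rasch_prob(xi, delta):
    """Probability of a correct response under the Rasch model."""
    return math.exp(xi - delta) / (1.0 + math.exp(xi - delta))

# Hypothetical ability and increasing item difficulties (easy -> hard).
xi = 0.5
deltas = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]

thetas = [rasch_prob(xi, d) for d in deltas]

# Larger delta_p implies smaller theta_p, so theta_1 > ... > theta_6 as in (4).
assert all(thetas[p] > thetas[p + 1] for p in range(5))
```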


Table 1

Mean confidence ratings for the data of Hasel and Kassin (2009).

Condition Phase a Phase b N

1 5.95 8.34 41

2 5.63 4.47 43

3 4.48 3.63 43

4 5.61 2.65 46

Example 2: Hasel & Kassin (2009) investigated whether “confessions corrupt eyewitness identifications”. Participants in the experiment witnessed a theft. Each participant had to identify the thief from a six-person target-absent photographic lineup. The participants who made an identification gave a confidence rating (Phase a) for their identification. After 2 days these participants were randomly assigned to four conditions. In Condition 1 they were told that the person they identified confessed, in Condition 2 that all suspects denied, in Condition 3 that the person they identified denied, and in Condition 4 that another person confessed.

Hereafter these participants gave another confidence rating (Phase b) for their identification (mean confidence ratings are presented in Table 1).

Hasel & Kassin (2009) write that “. . . we tested the provocative hypothesis that a confession will lead eyewitnesses to change. . . their confidence in those [identification] decisions”. The inequality constrained hypothesis (Hi) implied in their paper is:

Hi : θ1b > θ1a, θ2b < θ2a, θ3b < θ3a, θ4b < θ4a, θ1b > θ2b > θ3b > θ4b, (5)

where θ denotes the mean in the condition-phase combination indicated. This hypothesis states that if the person identified confesses the confidence increases, while in the other three conditions the confidence decreases. Furthermore, the confidence rating in Phase b decreases from Condition 1 to Condition 4. This is logical because the amount of secondary evidence provided for the identification made by a person decreases from Condition 1 to Condition 4.
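As a quick descriptive check (a sketch only; evaluating Hi properly requires the Bayes factor, not point estimates), the sample means in Table 1 can be compared against the constraints of (5):

```python
# Phase a and Phase b mean confidence ratings from Table 1 (Hasel & Kassin, 2009).
phase_a = {1: 5.95, 2: 5.63, 3: 4.48, 4: 5.61}
phase_b = {1: 8.34, 2: 4.47, 3: 3.63, 4: 2.65}

def satisfies_h5(a, b):
    """Check whether a set of means is in agreement with hypothesis (5)."""
    increase_1 = b[1] > a[1]                        # theta_1b > theta_1a
    decrease_234 = all(b[c] < a[c] for c in (2, 3, 4))
    ordered_b = b[1] > b[2] > b[3] > b[4]           # theta_1b > ... > theta_4b
    return increase_1 and decrease_234 and ordered_b

print(satisfies_h5(phase_a, phase_b))  # True: the sample means agree with (5)
```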

Example 3: Norton & Dunn (1985) investigated the relation between snoring and various conditions such as having a heart disease or not. Their data are displayed in Table 2. Their main expectation is that there is a positive association between the degree of snoring and having a heart disease or not. Degree of snoring was scored as N, O, NE, and E, denoting Never, Occasionally, Nearly Every Night, and Every Night, respectively; let N and Y denote No Heart Disease and Heart Disease, respectively; and let πO,N denote the probability that a person is a member of group O, N. Then their main expectation can be written as

Hi : (πN,N πO,Y)/(πN,Y πO,N) > 1, (πO,N πNE,Y)/(πO,Y πNE,N) > 1, (πNE,N πE,Y)/(πNE,Y πE,N) > 1, (6)

that is, each of the three adjacent odds ratios (Agresti, 2002, p. 44) is larger than 1. An additional expectation could be that there is a decreasing effect of increasing levels of snoring on having a heart disease or not. Let ρO,NE = log (πO,N πNE,Y)/(πO,Y πNE,N). In terms of log odds ratios (the reason for this

Table 2

Frequencies presented by Norton and Dunn (1985).

Heart Disease

Snoring No Yes

Never 1355 24

Occasionally 603 35

Nearly Every Night 192 21

Every Night 224 30


Table 3

Item responses presented by bock and lieberman (1970).

Item Responses Frequency Item Responses Frequency

00000 12 10000 7

00001 19 10001 39

00010 1 10010 11

00011 7 10011 34

00100 3 10100 14

00101 19 10101 51

00110 3 10110 15

00111 17 10111 90

01000 10 11000 6

01001 5 11001 25

01010 3 11010 7

01011 7 11011 35

01100 7 11100 18

01101 23 11101 136

01110 8 11110 32

01111 28 11111 308

will be elaborated below (10)), this can be formulated as:

Hi : ρN,O > ρO,NE > ρNE,E > 0. (7)

Example 4: Bock & Lieberman (1970) present the responses (see Table 3) of 1000 students applying for admission to law school to five items with respect to debate from Section 7 of the Law School Admission Test. The main question for many tests is whether they can be used to order respondents from less to more able with respect to the trait measured by the test. Let p = 1, . . . , 3 denote three latent classes (McLachlan & Peel, 2000, p. 166) with class weights φp denoting the proportion of students in each of these classes. Let θpj denote the probability that a student in class p provides the correct response to item j = 1, . . . , J. Then three latent classes ordered with respect to ability in the domain debate are obtained if:

Hi : θ1j < θ2j < θ3j for j = 1, . . . , 5, (8)

see Croon (1990) and Hoijtink (1998) for an elaboration of the latent class item response models that result from the restriction (8).

As can be seen, (4), (5), (7), and (8) are inequality constrained hypotheses. In this paper the Bayes factor (Kass & Raftery, 1995) will be used to compare the inequality constrained hypothesis of interest with its complement Hc or an unconstrained hypothesis Hu. Let R be a K × P matrix of full rank containing real numbers (a further specification of R will follow in the sequel), then an inequality constrained hypothesis can be formulated as

Hi : Rθ > 0, (9)

where 0 denotes a vector of length K, its complement as

Hc : not Hi, (10)

and an unconstrained hypothesis as

Hu : θ1, . . . , θP, (11)

that is, there are no constraints on the parameters. Note that hypothesis (7) can be formulated using (9) only because it is formulated in terms of log odds ratios, that is, θ = [θN,N, . . . , θE,N] = [log πN,N, . . . , log πE,N].
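For illustration, the formulation (9) can be sketched for the simple hypothesis θ1 > θ2 > θ3, with one row of R per inequality constraint (a minimal sketch; the θ values below are arbitrary):

```python
import numpy as np

# R for H_i: theta_1 > theta_2 > theta_3, one row per adjacent constraint.
R = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])   # K = 2, P = 3

def in_agreement(theta, R):
    """Indicator of H_i: all K constraints R theta > 0 hold."""
    return bool(np.all(R @ theta > 0))

print(in_agreement(np.array([3.0, 2.0, 1.0]), R))  # True
print(in_agreement(np.array([1.0, 3.0, 2.0]), R))  # False: theta_1 < theta_2

assert np.linalg.matrix_rank(R) == 2               # R is of full rank
```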


Table 4

Contrasting approaches.

Authors      Models       Complexity   Hypotheses   Prior Distribution

Klugkist     ANOVA        —            Rθ >= 0      Standard/Data Based

Laudy        CT           —            Rθ >= r      Standard/Uninformative

Kato         Multi-Level  —            Rθ >= 0      Standard/Data Based

Mulder       MNLM         Definition   Rθ >= r      Motivated/Data Based

This Paper   General      Motivation   Rθ > 0       Motivated/Uninformative

The remainder of this paper will be used to introduce objective Bayes factors for the evaluation of inequality constrained hypotheses. In Section 2 the state of the art with respect to the evaluation of inequality constrained hypotheses will be presented. In Section 3 it will be shown that the Bayes factor is a function of the complexity and the fit of an inequality constrained hypothesis.

In Sections 4 and 5, complexity and fit, respectively, will be elaborated. In Section 6 the computation of the Bayes factor will be discussed. In Section 7 the examples introduced in this section will be analysed and the paper is concluded with a short resume in Section 8.

2 Evaluating Inequality Constrained Hypotheses

Approaches based on hypothesis testing can be used to compare H0 and Hi (Silvapulle & Sen, 2005). Furthermore, as is elaborated by Silvapulle & Sen (2005, p. 61), there are two equivalent formulations for the alternative hypothesis if Hi is the null hypothesis: Hi versus Hu; and Hi versus Hc. However, since the likelihood ratio test is based on the ratio of the likelihood maximized under Hi and the likelihood maximized under Hu (Silvapulle & Sen, 2005, e.g., pp. 38–40, 90–91), the first formulation seems to be more appropriate. As elaborated in the introduction, this paper will not focus on the situation where H0 is a hypothesis of interest. This paper will focus on a comparison of Hi with either Hc or Hu. The author has two arguments to prefer a comparison of Hi with Hc over a comparison of Hi with Hu. First of all, the sample sizes needed to distinguish Hi from Hc will be smaller than the sample sizes needed to distinguish Hi from Hu (because Hi is contained in Hu). Secondly, as elaborated in the four examples given in the previous section, researchers want to know whether their expectation is correct or not, that is, they want to compare Hi to Hc. However, the Bayes factors for both comparisons are rather similar (this will be elaborated in the next section), which suggests that neither comparison is scientifically more relevant than the other. This paper will not develop a philosophical argument pro or contra either approach. When the four examples are revisited towards the end of this paper, both comparisons will be presented and compared.

There is increasing attention for the evaluation of inequality constrained hypotheses using the Bayes factor. See, for example, Casella & Berger (1987), Berger & Mortera (1999), Klugkist et al. (2005), Kato & Hoijtink (2006), Klugkist & Hoijtink (2007), Laudy & Hoijtink (2007), various chapters in Hoijtink et al. (2008), Mulder et al. (2009, 2010), Klugkist et al. (2010), and Hoijtink (2011). Table 4 summarizes the main features of the approaches presented in these books and papers and highlights the added value of this paper.

The first column of Table 4 displays the names of the main authors of four approaches that can be found in the literature. The second column lists the statistical models discussed by these authors: ANOVA, models for contingency tables (CTs), multi-level models, and the multivariate normal linear model (MNLM). This paper addresses all statistical models with density f(x | θ, φ) and provides a treatment of unidentified models. As such it can be seen as a unifying framework for all previous work in this area and a generalization to statistical models for which the evaluation of inequality constrained hypotheses using the Bayes factor has not yet been discussed.


As will be shown in the next section, the Bayes factor can be written as a function of two quantities that are usually called the complexity (to be denoted by ci) and the fit (to be denoted by fi) of an inequality constrained hypothesis. Complexity can be computed using the prior distribution and fit using the posterior distribution of the parameters of the statistical model at hand. The third and fifth columns highlight that Klugkist, Laudy, and Kato choose a standard prior distribution for their models and do not use complexity to motivate their choice of prior distributions. Mulder gives a definition of complexity and specifies prior distributions such that they render complexity values in agreement with his definition. In this paper, for statistical models in general, it will be motivated what the complexity of inequality constrained hypotheses belonging to an equivalent set of hypotheses should be. Prior distributions will be chosen such that they render the required complexity. In Section 4.1, in the proof of Theorem 1, examples of prior distributions that render appropriate complexity values are given. Among these distributions are the prior distributions used by Klugkist, Laudy, Kato, and Mulder. As such, this paper provides a unifying framework for the specification of prior distributions for statistical models in general, and it provides a motivation for the prior distributions that have been used in previous work.

The fourth column gives the functional form of the hypotheses discussed by the respective authors. Note that >= denotes that a hypothesis may be specified using inequality and equality constraints among the model parameters and > denotes that only inequality constraints may be used. Note furthermore that r is a vector of length K containing constants. As can be seen, the hypotheses considered in this paper are less flexible than the hypotheses considered by previous authors. However, this has three advantages:

• Theory and practice for the evaluation of these hypotheses are available for all statistical models with density f(x | θ, φ).

• As is shown in the fifth column of Table 4, uninformative prior distributions can be used to evaluate these hypotheses. As will be elaborated later on, this renders objective Bayes factors in the sense that neither the data at hand (as is the case in the approaches of Klugkist, Kato, and Mulder) nor subjective input from the researcher are needed to specify the prior distributions.

• As exemplified by the four examples in the previous section, researchers often have a specific expectation in the form of an inequality constrained hypothesis and want to evaluate whether this hypothesis is supported by the data or not. Using the approach proposed in this paper this can be done objectively.

Summarizing, this paper presents an approach for the evaluation of inequality constrained hypotheses for statistical models in general using a Bayes factor that is objective. Together with a simulation based estimate of the Bayes factor and an estimate of its variance due to Monte Carlo error, this renders theory and practice for the evaluation of inequality constrained hypotheses.

In Section 3, a simple form of the Bayes factor comparing Hi with Hc and Hu will be derived. In Section 4 the prior distribution of θ and φ will be chosen such that the Bayes factor adequately accounts for the complexity of Hi. In Section 5 the fit of a hypothesis will be elaborated. Section 6 elaborates how the Bayes factor and its variance due to Monte Carlo error can be estimated. In Section 7 the approach proposed will be further illustrated using the examples. Section 8 concludes with a short discussion.

3 A Simple Form for the Bayes Factor

The Bayes factor (see Jeffreys (1961) for an early discussion and Kass & Raftery (1995) for a contemporary overview of the state of the art) is a measure of relative support for two hypotheses that accounts for fit and complexity. The first step in the derivation of the Bayes factor of Hi with respect to Hc is the derivation of the Bayes factor of Hi with respect to Hu


using the reformulation of Chib (1995):

BFiu = ∫θ,φ f(x | θ, φ) h(θ, φ | Hi) dθ dφ / ∫θ,φ f(x | θ, φ) h(θ, φ | Hu) dθ dφ

     = [f(x | θ, φ) h(θ, φ | Hi) / g(θ, φ | x, Hi)] / [f(x | θ, φ) h(θ, φ | Hu) / g(θ, φ | x, Hu)], (12)

where h(·) denotes the prior distribution and g(·) the posterior distribution of θ and φ for the hypothesis indicated. According to Leucari & Consonni (2003) and Roverato & Consonni (2004),

If nothing was elicited to indicate that the two priors should be different, then it is sensible to specify [the prior of the inequality constrained hypothesis] to be,. . . , as close as possible to [the prior of the alternative hypothesis]. In this way the resulting Bayes factor should be least influenced by dissimilarities between the two priors due to differences in the construction processes, and could thus more faithfully represent the strength of the support that the data lend to each [hypothesis].

This quote motivates why we derive the prior distribution for the inequality constrained hypothesis from the prior distribution for the alternative hypothesis:

h(θ, φ | Hi) = h(θ, φ | Hu) Iθ∈Hi / ∫θ,φ h(θ, φ | Hu) Iθ∈Hi dθ dφ, (13)

where the indicator function I(·) equals 1 if θ is in accordance with Hi and 0 otherwise. If, for example, for hypothesis (4), θ = [0.75, 0.65, 0.55, 0.45, 0.35, 0.25], I(·) = 1 because this value of θ is in agreement with Hi : θ1 > θ2 > θ3 > θ4 > θ5 > θ6. If, however, θ = [0.75, 0.85, 0.55, 0.45, 0.70, 0.25], I(·) = 0 because this value of θ is not in agreement with Hi. Note that the denominator of (13) is the proportion of the prior distribution of Hu in agreement with Hi. In the sequel this proportion will be denoted by ci.

It also holds that

g(θ, φ | x, Hi) = g(θ, φ | x, Hu) Iθ∈Hi / ∫θ,φ g(θ, φ | x, Hu) Iθ∈Hi dθ dφ, (14)

where the denominator denotes the proportion of the posterior distribution of Hu in agreement with Hi. In the sequel this proportion will be denoted by fi.

Substitution of (13) and (14) in (12) for a value of θ in agreement with Hi renders

BFiu = fi/ci. (15)

Since Hc is the complement of Hi, the proportion of the prior distribution of Hu in agreement with Hc is 1 − ci. Similarly, the proportion of the posterior distribution of Hu in agreement with Hc is 1 − fi. Using this result it follows that

BFic = BFiu/BFcu = (fi/ci)/((1 − fi)/(1 − ci)). (16)

The question of how to choose the prior distribution for Hu will be addressed in the next section, in which complexity is further discussed. In the subsequent section fit will be further elaborated.
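Once ci and fi are available, (15) and (16) are simple to compute; a minimal sketch with hypothetical values (complexity 1/2, as for Hi : θ1 > θ2, and a posterior proportion of 0.9 in agreement with Hi):

```python
def bayes_factors(f_i, c_i):
    """Bayes factors (15) and (16) from fit f_i and complexity c_i."""
    bf_iu = f_i / c_i                              # H_i versus H_u
    bf_ic = bf_iu / ((1.0 - f_i) / (1.0 - c_i))    # H_i versus H_c
    return bf_iu, bf_ic

# Hypothetical values for illustration.
bf_iu, bf_ic = bayes_factors(0.9, 0.5)
print(bf_iu)  # 1.8
print(bf_ic)  # 9.0 (approximately; floating point)
```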


4 Complexity

4.1 Sets of Equivalent Hypotheses

As elaborated in the examples, θ is a vector of inequality constrained parameters. Examples given were a vector of means, a vector of cell probabilities, and a vector of class-specific probabilities. In the sequel it will be assumed that, as in the examples, the elements of θ are of the same nature. Two other examples are regression coefficients and factor loadings.

Consider the hypothesis θ1 > θ2 > θ3. There are in total 3! = 6 hypotheses with an equivalent structure. Another of these is θ3 > θ1 > θ2. Each of these hypotheses is of the same complexity, that is, neither is more or less parsimonious or simple than another. In the Bayesian approach θ and φ are considered to be random variables with a prior and a posterior distribution. For Hi : θ1 = θ2 = θ3, both

∫θ,φ h(θ, φ | Hu) Iθ∈Hi dθ dφ = P(θ1 = θ2 = θ3 | h(θ, φ | Hu)) = 0, (17)

and

∫θ,φ g(θ, φ | x, Hu) Iθ∈Hi dθ dφ = P(θ1 = θ2 = θ3 | g(θ, φ | x, Hu)) = 0. (18)

In general, the probability, with respect to the prior and posterior distribution, of any hypothesis constructed using equality constraints is zero. In this sense it can be said that the union of the six inequality constrained hypotheses encompasses 100% of the parameter space. Stated otherwise, the complexity of each hypothesis is 1/6-th of the parameter space.

Another example is the hypothesis θ1 > {θ2, θ3, θ4}. Three equivalent hypotheses can be obtained if θ1 is exchanged with each θ to the right of the inequality constraint. Since the union of these four hypotheses encompasses 100% of the parameter space, the complexity of each should be 1/4-th of the parameter space.

Definition 1: An Equivalent Set consists of equivalent hypotheses Hi1, . . . , Hiq, . . . , HiQ for which Hi1 ∪ . . . ∪ Hiq ∪ . . . ∪ HiQ encompasses 100% of the parameter space.

Let Rk = {Rk1, . . . , RkP} denote the k-th row of R. The following restrictions on R render a hypothesis that is a member of an equivalent set:

(1) Each Rkp ∈ {−1, 0, 1}.

(2) For k = 1, . . . , K, Σp Rkp = 0.

(3) The elements of R1 can be divided into D = P/M subsets of the same size (note that M denotes the number of 1’s in the first row of R), such that for k = 2, . . . , K, Rk is a permutation of these subsets.

To illustrate these requirements, consider the hypothesis θ1 > θ2 > θ3. For this hypothesis

R = [ 1 −1  0
      0  1 −1 ]. (19)

It is immediately clear that the first two requirements hold. If the first row of R, that is, R1, is divided in D = 3 subsets containing one element each, it is clear that R2 is a permutation of these subsets.

The number of hypotheses in an equivalent set can be determined as follows:


(1) Obtain all D! permutations of the D subsets in θ, that is, divide θ1, . . . , θP into the same subsets as R11, . . . , R1P.

(2) Denote by B the number of permutations for which Rθ > 0 holds, that is, which are in agreement with Hi.

(3) Then the number of hypotheses in the equivalent set is Q = D!/B, and the complexity of each hypothesis is 1/Q.

To illustrate this, consider the hypothesis θ1 > {θ2, θ3, θ4}. For this hypothesis

R = [ 1 −1  0  0
      1  0 −1  0
      1  0  0 −1 ]. (20)

If R1 is divided into D = 4 subsets each containing one element, and θ is analogously divided, there are in total 24 permutations of the four elements of θ. For the six permutations with θ1 in the first place it holds that Rθ > 0 is in agreement with Hi. For the 18 permutations where θ1 is not in the first place, this does not hold. Consequently, the number of hypotheses in the equivalent set is 24/6 = 4.

As can be seen in (15) and (16), one of the components in BFic is ci, the proportion of the prior distribution for Hu in agreement with Hi. The prior distribution for Hu is a measure over the parameter space according to which the proportion of the parameter space in agreement with Hi can be computed. The prior distribution for Hu can be chosen such that ci = 1/Q for each hypothesis in an equivalent set. This leads to the following definition of complexity:

Definition 2: The Complexity of an Inequality Constrained Hypothesis is the proportion of the prior distribution in agreement with Hi.

THEOREM 1. Let

h(θ, φ | Hu) = h(θ | Hu) h(φ | Hu). (21)

If h(θ1, . . . , θP | Hu) = h(θ(1), . . . , θ(P) | Hu) for all permutations (1), . . . , (P) of 1, . . . , P, then ci = 1/Q for each hypothesis in an equivalent set.

Proof of Theorem 1: If R1 is divided in D subsets of the same size, and analogously θ = [θ1, . . . , θD], there are D! permutations of θ: θ^1, . . . , θ^{D!}. If the number of equivalent hypotheses is Q, D!/Q = B of these permutations are in agreement with each equivalent hypothesis, that is, [θ^1, . . . , θ^{D!}] = [. . . , θ^{q1}, . . . , θ^{qB}, . . .], leading to Hiq : Rθ^{q1} > 0 ∪ . . . ∪ Rθ^{qB} > 0. Because each θ ∈ Hiq can be permuted to a θ′ ∈ Hiq′ for which it holds that h(θ | Hu) = h(θ′ | Hu), it holds that

ciq = ∫θ∈Hiq h(θ | Hu) dθ = ∫θ′∈Hiq′ h(θ′ | Hu) dθ′ = ciq′. (22)

Since Σq ciq = 1 this implies that ciq = 1/Q for q = 1, . . . , Q.
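Theorem 1 can be checked by simulation: under an exchangeable prior (here i.i.d. standard normals, one admissible choice), every hypothesis in the equivalent set of complete orderings of three parameters receives complexity close to 1/Q = 1/6. A Monte Carlo sketch:

```python
from itertools import permutations

import numpy as np

rng = np.random.default_rng(0)

# Exchangeable prior h(theta | H_u): i.i.d. normals satisfy Theorem 1.
draws = rng.normal(loc=0.0, scale=1.0, size=(200_000, 3))

# The equivalent set of theta_a > theta_b > theta_c: all 3! = 6 orderings.
for a, b, c in permutations(range(3)):
    c_iq = np.mean((draws[:, a] > draws[:, b]) & (draws[:, b] > draws[:, c]))
    assert abs(c_iq - 1 / 6) < 0.01   # each hypothesis gets complexity 1/Q
```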

The requirement formulated in Theorem 1 holds, for example, if:

(1) h(θ | Hu) = Πp h(θp | Hu) with the restriction that h(θ1 | Hu) = · · · = h(θP | Hu). This prior distribution was used by Klugkist and Kato in Table 4.

(2) h(θ | Hu) = D(α0, . . . , α0), where D denotes a Dirichlet distribution. This prior distribution was used by Laudy in Table 4.


(3) h(θ | Hu) = NP(μ0, Σ0) with the restriction that μ0 = [μ0, . . . , μ0], that all diagonal elements of Σ0 have the value σ0², and that all off-diagonal elements have the value τ0. This prior distribution was used by Mulder in Table 4.

(4) h(θ | Hu) = W−1(ν0, Σ0), where W−1 denotes an inverse Wishart distribution, with the same restrictions on Σ0 as in item 3.

In this paper the prior distribution will not be used to formalize prior knowledge with respect to θ and φ. Instead it will be used to quantify the complexity of a hypothesis such that ci = 1/Q for each hypothesis in an equivalent set. For the computation of ci a specification of h(θ | Hu) beyond the requirement formulated in Theorem 1 is not necessary. In Section 6 this will be elaborated.

The main component needed for the computation of fi is g(θ, φ | x, Hu) ∝ f(x | θ, φ) h(θ | Hu) h(φ | Hu). If the prior distributions of θ and φ are vague (e.g., a multivariate normal prior for θ with very large prior variances), fi is essentially determined by the data. Section 6 will elaborate on the computation of fi.

Consequently, Bayes factors computed for hypotheses from an equivalent set using ci and fi can be called objective in the sense that the prior distribution (and thus ci) results from the logic underlying the principle of equivalent sets of hypotheses, and the posterior distribution (and thus fi) is essentially determined by the data.
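Putting the pieces together, a Monte Carlo sketch of the objective Bayes factor for Hi : θ1 > θ2 > θ3 follows. The "posterior" draws below are simulated stand-ins whose means happen to obey Hi, not output of any real analysis; the prior draws are exchangeable as required by Theorem 1:

```python
import numpy as np

rng = np.random.default_rng(1)

def proportion_in_agreement(samples, R):
    """Proportion of draws theta with R theta > 0."""
    return float(np.mean(np.all(samples @ R.T > 0, axis=1)))

R = np.array([[1., -1., 0.],
              [0., 1., -1.]])          # H_i: theta_1 > theta_2 > theta_3

# c_i from the exchangeable prior; Theorem 1 gives exactly 1/6 here.
prior = rng.normal(size=(100_000, 3))
c_i = proportion_in_agreement(prior, R)

# f_i from posterior draws; here a stand-in posterior in agreement with H_i.
posterior = rng.normal(loc=[1.0, 0.0, -1.0], scale=0.3, size=(100_000, 3))
f_i = proportion_in_agreement(posterior, R)

bf_iu = f_i / c_i                                  # (15)
bf_ic = bf_iu / ((1 - f_i) / (1 - c_i))            # (16)
print(c_i, f_i, bf_iu, bf_ic)
```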

4.2 Hypotheses that Do Not Belong to an Equivalent Set: Σp Rkp = 0

Not all hypotheses that can be constructed using (9) belong to an equivalent set. The hypothesis

θ1 − θ2 > θ3 − θ4 > θ5 − θ6
θ1 > θ2
θ3 > θ4
θ5 > θ6 (23)

is not a member of an equivalent set. The three elements on the first line can be permuted in six ways. The elements within each of the last three lines can be permuted in two ways. Although the combination of all 6 × 2 × 2 × 2 = 48 permutations covers 100% of the parameter space, it cannot be claimed that each permutation is of equal complexity. Consider, for example,

θ1 − θ2 > θ3 − θ4 > θ5 − θ6
θ2 > θ1
θ3 > θ4
θ5 > θ6. (24)

For this hypothesis ci = 0 for any h(θ, φ | Hu) that is continuous on the domain of θ and φ, because the constraints in (24) are mutually inconsistent: θ1 − θ2 > θ3 − θ4 together with θ3 > θ4 implies θ1 > θ2, which contradicts θ2 > θ1. For the hypothesis in (23) ci is always strictly larger than zero.

The R-matrix corresponding to (23) is

R = [ 1 −1 −1  1  0  0
      0  0  1 −1 −1  1
      1 −1  0  0  0  0
      0  0  1 −1  0  0
      0  0  0  0  1 −1 ]. (25)

As can be seen, the first row of this matrix cannot be divided into subsets such that a permutation of these subsets renders the other rows in the matrix (see the third requirement for hypotheses belonging to equivalent sets in the previous section). Therefore this matrix does not specify a hypothesis that is a member of an equivalent set.

Note that hypothesis (23) is a combination of hypotheses that do belong to an equivalent set: both the expression on the first line and the expressions on the last three lines belong to an equivalent set.

Definition 3: A combination of hypotheses belonging to equivalent sets is obtained if Hi : Hi1 ∩ . . . ∩ HiS, for which His for s = 1, . . . , S are members of an equivalent set.

Consider also the hypothesis θ1 − θ2 > θ2 − θ3. This hypothesis might be relevant in a repeated measures context where the expectation is that the difference in means between the measurements before and after an intervention is larger than the difference between the means obtained for the after and follow-up measurements. For this hypothesis R = [1 −2 1]. This hypothesis is not a member of an equivalent set, because some elements of R have other values than −1, 0, and 1.

This section will focus on hypotheses that do not belong to an equivalent set for which Σp Rkp = 0 for k = 1, . . . , K, and R is of full rank. The latter ensures that none of the rows in R are linearly dependent. If, for example, the first row is [1 −1] and the second row [−1 1], the hypothesis formulated would be θ1 = θ2, a case that will not be discussed in this paper. The next section will focus on hypotheses for which Σp Rkp ≠ 0 for at least one row k of R and R is of full rank.

THEOREM 2: If h(θ | Hu) = Πp h(θp | Hu), h(θp | Hu) = N(μ0, σ0²) for p = 1, . . . , P, and Σp Rkp = 0 for k = 1, . . . , K, then ci is independent of μ0 and σ0².

Proof of Theorem 2:

ci =



θ∈Hi

h(θ | Hu)dθ

= P(R1θ > 0, . . . , RKθ > 0 | θp ∼ N (μ0, σ02) for p= 1, . . . , P). (26) Note that

E( Rkθ) = Rk1Eh(.)θ1+ · · · + Rk PEh(.)θP = μ0



p

Rkp. (27)

For the hypotheses discussed in this section Σp Rkp = 0, hence E(Rkθ) = 0. Note furthermore that

Var(Rkθ) = Rk1² Varh(.)θ1 + · · · + RkP² Varh(.)θP = σ0² Σp Rkp², (28)

and

Cov(Rkθ, Rk′θ) = Cov(Σp Rkpθp, Σp Rk′pθp). (29)

Since, under the conditions of Theorem 2, the elements of θ are mutually independent, (29) reduces to

Cov(Rkθ, Rk′θ) = Σp Cov(Rkpθp, Rk′pθp) = σ0² Σp RkpRk′p. (30)
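The moment identities (27), (28), and (30) can be checked numerically. The sketch below (Python with NumPy; the prior mean, standard deviation, and R matrix are illustrative choices, not taken from the paper) draws θ from independent normal priors and compares the sample moments of Rθ with the closed forms:

```python
import numpy as np

rng = np.random.default_rng(1)
mu0, s0 = 2.0, 3.0                      # hypothetical prior mean and sd
R = np.array([[1.0, -1.0, 0.0],         # row 1: theta1 - theta2
              [0.0, 1.0, -1.0]])        # row 2: theta2 - theta3 (both rows sum to 0)

theta = rng.normal(mu0, s0, size=(1_000_000, 3))
Rt = theta @ R.T                        # draws of (R1*theta, R2*theta)

means = Rt.mean(axis=0)                 # (27): mu0 * sum_p Rkp = 0 here
variances = Rt.var(axis=0)              # (28): s0^2 * sum_p Rkp^2 = 18 here
covariance = np.cov(Rt.T)[0, 1]         # (30): s0^2 * sum_p R1p*R2p = -9 here
print(means, variances, covariance)
```

Because both rows of this R sum to zero, the sample means of Rkθ stay near zero regardless of μ0, illustrating the key step of the proof.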


Figure 1. Simple illustration of Theorem 2.

Using (27), (28), and (30), (26) can be rewritten as

ci = P(R1θ > 0, . . . , RKθ > 0 | (R1θ, . . . , RKθ)′ ∼ N(M, σ0²C)), (31)

where

M = (0, . . . , 0)′, (32)

and

C =
  [ Σp R1p²    . . .  Σp R1pRKp ]
  [   . . .    . . .    . . .   ]
  [ Σp RKpR1p  . . .  Σp RKp²   ]. (33)

Due to the fact that P(A > 0 | A ∼ N(0, σ0²B)) = P(A > 0 | A ∼ N(0, B)), where A and 0 are vectors of length P, and B is a P × P matrix, (31) reduces to

ci = P(R1θ > 0, . . . , RKθ > 0 | (R1θ, . . . , RKθ)′ ∼ N(M, C)), (34)

that is, independent of μ0 and σ0².

A simple illustration of Theorem 2 is given in Figure 1. Using the hypothesis Hi : θ1 > θ2 it is shown that ci = 0.50 (the proportion of the prior distribution below the diagonal line) if h(θp | Hu) = N(μ0, σ0²) for p = 1, 2, independent of the values of μ0 and σ0².
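This invariance can be sketched by Monte Carlo (Python with NumPy; the particular values of μ0 and σ0² are arbitrary): ci for Hi : θ1 > θ2 is estimated as the proportion of prior draws satisfying the constraint.

```python
import numpy as np

rng = np.random.default_rng(0)

def complexity(mu0, s0, n=1_000_000):
    """Monte Carlo estimate of ci for Hi: theta1 > theta2 under
    independent N(mu0, s0^2) priors on both parameters."""
    theta = rng.normal(mu0, s0, size=(n, 2))
    return float(np.mean(theta[:, 0] > theta[:, 1]))

c_default = complexity(0.0, 1.0)
c_shifted = complexity(5.0, 10.0)
print(c_default, c_shifted)   # both close to 0.50, as Theorem 2 states
```

Whatever mean and variance are chosen, the estimate stays at 0.50, because shifting or scaling both priors equally leaves the proportion of mass with θ1 > θ2 unchanged.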

A normal prior distribution can be used if θp ∈ R. Using a very large prior variance, g(θ, φ | x, Hu) ∝ f(x | θ, φ)h(φ | Hu), that is, with a noninformative prior for the nuisance parameters, the posterior distribution is essentially determined by the data. If θp ∈ R+ a normal prior can be used for log θp. This implies that g(θ, φ | x, Hu) ∝ f(x | θ, φ) Πp 1/θp h(φ | Hu), because for σ0² → ∞, h(θp | Hu) → 1/θp if h(log θp | Hu) ∼ N(μ0, σ0²). One example where the latter is applicable is the case of restrictions on variances. Let π1, . . . , πP denote the parameters of a multinomial distribution, that is, Σp πp = 1. Then, if πp = γp/Σp γp, another example is a normal prior distribution for log γp. Note that for σ0² → ∞ a normal prior for log γp corresponds to a D(α, . . . , α) prior for π with α → 0. This parametrization will be used to evaluate the hypothesis presented in Example 2.

As was shown above, ci is independent of μ0 and σ0². Furthermore, with σ0² → ∞ the posterior distribution and, consequently, fi, are essentially determined by the data. On these grounds BFic can be called objective. The prior distribution used for hypotheses that do not belong to an equivalent set is in agreement with the requirement specified in Theorem 1, that is, h(θ1, . . . , θP | Hu) = h(θ(1), . . . , θ(P) | Hu) for all permutations (1), . . . , (P) of 1, . . . , P. Here too it is a measure over the parameter space with which the proportion of the parameter space in agreement with Hi can be computed. This supports the interpretation of ci as a measure of model complexity for nonequivalent hypotheses for which Σp Rkp = 0 for k = 1, . . . , K.

4.3 Hypotheses That Do Not Belong to an Equivalent Set: Σp Rkp ≠ 0

Let, as in the previous section, h(θp | Hu) = N(μ0, σ0²) for p = 1, . . . , P. Then, for the hypothesis

θ1 > 0, (35)

ci increases if μ0 increases for any fixed value of σ0². Consider also the hypothesis

θ1 > θ2 + θ3. (36)

This hypothesis might be relevant in a situation where θ denotes the effect of a treatment, group 1 gets a treatment containing components A and B, and groups 2 and 3 get components A and B, respectively. This hypothesis states that the joint effect of A and B is larger than the sum of the individual effects. Here too ci is not independent of μ0 and σ0². Both (35) and (36) are characterized by the fact that for one or more rows in the R matrix Σp Rkp ≠ 0, that is, these hypotheses are not members of an equivalent set.

THEOREM 3: If h(θp | Hu) = N(μ0, σ0²) for p = 1, . . . , P and Σp Rkp ≠ 0 for at least one row of R, then ci is independent of μ0 for σ0² → ∞.

Proof of Theorem 3: Using (27), (28), and (30), (26) can be rewritten as

ci = P(R1θ > 0, . . . , RKθ > 0 | (R1θ, . . . , RKθ)′ ∼ N(μ0M, σ0²C)), (37)

where

M = (Σp R1p, . . . , Σp RKp)′, (38)


Figure 2. Simple illustration of Theorem 3.

and C as in (33). Since limσ0²→∞ P(A > 0 | A ∼ N(M, σ0²B)) = P(A > 0 | A ∼ N(0, σ0²B)) = P(A > 0 | A ∼ N(0, B)), where A, 0, and M are vectors of length P, and B is a P × P covariance matrix, limσ0²→∞ ci as defined in (37) reduces to (34).

A simple illustration of Theorem 3 is given in Figure 2. Using the hypothesis θ1 > 0 it is shown that ci → 0.50 if h(θ1 | Hu) = N(μ0, σ0²) for μ0 = 25 and σ0² → ∞.
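The limiting behaviour can likewise be sketched by Monte Carlo (Python with NumPy; the sequence of σ0 values is illustrative): for a fixed μ0 = 25, the estimate of ci = P(θ1 > 0) moves from nearly 1 toward 0.50 as the prior variance grows.

```python
import numpy as np

rng = np.random.default_rng(0)
mu0 = 25.0                        # the prior mean used in Figure 2

# ci = P(theta1 > 0) under h(theta1 | Hu) = N(mu0, s0^2): for small s0
# the prior sits almost entirely above zero, but as s0^2 grows the
# mean becomes negligible relative to the spread and ci tends to 0.50.
cs = [float(np.mean(rng.normal(mu0, s0, size=1_000_000) > 0))
      for s0 in (10.0, 100.0, 10_000.0)]
print(cs)
```

The decreasing sequence of estimates mirrors Theorem 3: independence of μ0 holds only in the limit σ0² → ∞.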

Like before, fi is independent of the prior distribution for large values of σ0². A vague prior distribution is often called uninformative or objective. In this sense BFic for hypotheses with Σp Rkp ≠ 0 for at least one row of R is objective. Here too, the prior distribution used is in agreement with the requirement formulated in Theorem 1, and it is a measure over the parameter space with which the proportion of the parameter space in agreement with Hi can be computed.

However, the support for the interpretation of ci as a measure of complexity is weaker than before, because ci is neither independent of h(θp | Hu), as was the case for hypotheses from equivalent sets, nor independent of μ0 and σ0², as was the case for hypotheses with Σp Rkp = 0 for each row of R.

4.4 The Prior Distribution

In this section it was shown that the principle of equivalent sets of inequality constrained hypotheses can be used to quantify the complexity of those hypotheses. The requirement formulated in Theorem 1 renders a prior distribution such that one of the terms in the Bayes factor for an inequality constrained hypothesis versus its complement is this complexity. With the added requirement that the prior distribution is normal, the same logic was applicable to hypotheses not belonging to equivalent sets for which the sum of each row in R is equal to zero. In all these situations the prior distribution was not used to reflect prior knowledge with respect to the parameters of the model of interest, but to quantify the complexity of the inequality constrained hypothesis under investigation. Since the prior distribution follows from the logic underlying equivalent sets of hypotheses, and may be vague, the resulting Bayes factor can be called objective. As was shown, the logic breaks down for hypotheses not belonging to equivalent sets for which the sum of one or more rows in R is not equal to zero. For these hypotheses complexity depends on the mean and variance of the prior distribution, that is, a subjective choice has to be made. As was shown, one option is to use a vague prior distribution.


Figure 3. Illustration of the fit of Hi : θ1 > θ2.

5 Fit

From (14) in Section 3 it follows that the fit of an hypothesis is equal to

fi = ∫θ,φ g(θ, φ | x, Hu) Iθ∈Hi dθ dφ, (39)

that is, the proportion of the posterior distribution of the statistical model at hand in agreement with Hi. As was elaborated in the previous section, for the evaluation of inequality constrained hypotheses using the Bayes factor, the prior distribution can be chosen such that it is uninformative. This implies that the posterior distribution is proportional to the density of the data:

g(θ, φ | x, Hu) ∝ f(x | θ, φ). (40)

Stated otherwise, the fit of an hypothesis is objective in the sense that it does not depend on the prior distribution but, through the density of the data, is completely determined by the data. Consequently, the Bayes factor (15) or (16) is objective in the sense that it is determined by a complexity resulting from a vague prior distribution and a fit determined by the density of the data.

In Figure 3 fit is illustrated using the simple hypothesis Hi : θ1 > θ2. Presented in the figure are the 95% iso-density contours of four hypothetical posterior distributions. Going from the left-hand side to the right-hand side of the figure it can be seen that the proportion of these posteriors in the region (below the line θ1 = θ2) in agreement with Hi : θ1 > θ2 is increasing from about 0.20 to about 0.99. Stated otherwise, the larger the proportion of the posterior distribution in agreement with an inequality constrained hypothesis, the larger the fit of that hypothesis.
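Assuming posterior draws are available, fi in (39) reduces to the proportion of draws satisfying the constraints. The sketch below (Python with NumPy) synthesizes draws from a hypothetical bivariate normal posterior, standing in for an actual MCMC run:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws for (theta1, theta2); in practice these
# would come from an MCMC sampler such as OpenBugs.
post = rng.multivariate_normal([1.0, 0.4], [[0.04, 0.0], [0.0, 0.04]],
                               size=500_000)

# Fit (39): the proportion of the posterior in agreement with
# Hi: theta1 > theta2.
f_i = float(np.mean(post[:, 0] > post[:, 1]))
print(f_i)
```

The farther the posterior mass sits inside the constrained region, the closer f_i gets to 1, exactly as the four panels of Figure 3 illustrate.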

6 Computation of the Bayes Factor

In this section and Appendices A through D, Example 1 will be used to illustrate and elaborate how OpenBugs (http://www.openbugs.info/w/) can be used to compute the required Bayes factors. The main steps will be presented in this section. The details and corresponding OpenBugs code can be found in Appendices A through D. Note that WinBUGS (http://www.mrc-bsu.cam.ac.uk/bugs/) was the predecessor of OpenBugs and that an excellent book elaborating the use of WinBUGS has been written by Ntzoufras (2009). In the next section the result of the evaluation of the hypotheses presented in Examples 2–4 will be presented.

The hypothesis of interest in Example 1 was:

Hi : θ1 > θ2 > θ3 > θ4 > θ5 > θ6. (41)

The two response vectors for which this hypothesis has to be evaluated were [111000] and [000111]. First of all the complexity of this hypothesis has to be computed. Independent standard normal prior distributions for each element of θ are in agreement with the requirement formulated in Theorem 1 and can thus be used to estimate ci for hypotheses belonging to equivalent sets, and for hypotheses not belonging to equivalent sets for which the sum of the elements in each row of R is equal to zero. As was shown in the proof of Theorem 3, standard normal priors can also be used to estimate ci for hypotheses not belonging to equivalent sets for which the sum of the elements in at least one row of R is not equal to zero, because for these hypotheses limσ0²→∞ ci is identical to (34). Note that OpenBugs is not really needed to compute the complexity in this example, because six θ's can be permuted in 6! ways, which implies that each permutation (including the one presented in (41)) has a complexity of 1/6!. However, in Appendix A it is illustrated and elaborated how it can be done using OpenBugs. Using a sample of T = 100 000 parameter vectors from the prior distribution, the resulting estimate of ci (the proportion of parameter vectors in agreement with (41)) was 0.0013, which is rather close to the true value of 1/6! = 0.0014.
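The complexity computation for (41) can also be sketched without OpenBugs (Python with NumPy; the number of draws T is increased here beyond the T = 100 000 in the text for a tighter estimate):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
T = 1_000_000                     # prior draws

# Independent standard normal priors for theta_1, ..., theta_6,
# in agreement with the requirement of Theorem 1.
theta = rng.normal(0.0, 1.0, size=(T, 6))

# Proportion of prior draws satisfying theta1 > theta2 > ... > theta6
c_i = float(np.mean(np.all(np.diff(theta, axis=1) < 0, axis=1)))
print(c_i, 1 / math.factorial(6))   # estimate vs. true value 1/6!
```

The estimate lands close to 1/6! ≈ 0.0014, the exact complexity of any single full ordering of six exchangeable parameters.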

Secondly, the fit of hypothesis (41) has to be computed. The density of the data for the item response data is

f(x | θ) = Π_{p=1}^{P} θp^{xp} (1 − θp)^{1−xp}, (42)

where θp denotes the probability of correctly responding to item p and xp is 1/0 if an item is responded to correctly/incorrectly. Note that (41) is an hypothesis belonging to an equivalent set. According to Theorem 1 it is therefore appropriate to use a B(1, 1) beta prior distribution for each element of θ. Appendix B presents an algorithm and annotated OpenBugs code that can be used to estimate fi for (41). Using a sample of T = 100 000 parameter vectors from the posterior distribution, the resulting estimate of fi (the proportion of parameter vectors in agreement with (41)) was 0.0109 for [111000] and 0.00002 for [000111]. Appendix C illustrates the format of the data file used by OpenBugs for this example.
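For this simple model the posterior can even be sampled in closed form, which gives a quick cross-check of the Appendix B computation. The sketch below (Python with NumPy; not the author's OpenBugs code) uses the standard beta-Bernoulli conjugacy:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1_000_000
x = np.array([1, 1, 1, 0, 0, 0])      # response vector [111000]

# With a B(1, 1) prior and a single Bernoulli response x_p per item,
# the posterior of theta_p is Beta(1 + x_p, 2 - x_p) by conjugacy, so
# it can be sampled directly instead of running an MCMC sampler.
theta = rng.beta(1 + x, 2 - x, size=(T, 6))

# f_i: proportion of posterior draws with theta1 > theta2 > ... > theta6
f_i = float(np.mean(np.all(np.diff(theta, axis=1) < 0, axis=1)))
print(f_i)                            # close to the 0.0109 reported in the text
```

Repeating the computation with x = [0, 0, 0, 1, 1, 1] gives a far smaller fit, matching the 0.00002 reported for [000111].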

In the third and last step, estimates of Bayes factors and credible intervals representing the uncertainty in these estimates due to sampling are obtained. The estimate of BFiu is

BF̂iu = f̂i/ĉi, (43)

and the estimate of BFic is

BF̂ic = (f̂i/ĉi)/((1 − f̂i)/(1 − ĉi)). (44)

Using the estimates of ci and fi obtained using OpenBugs, BF̂iu was 8.38 and 0.02 for [111000] and [000111], respectively. BF̂ic was 8.46 and 0.02 for [111000] and [000111], respectively.
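Plugging the reported point estimates into (43) and (44) reproduces these numbers (plain Python arithmetic):

```python
# Point estimates reported in the text for response vector [111000]
c_i, f_i = 0.0013, 0.0109

BF_iu = f_i / c_i                                  # equation (43)
BF_ic = (f_i / c_i) / ((1 - f_i) / (1 - c_i))      # equation (44)
print(round(BF_iu, 2), round(BF_ic, 2))            # prints 8.38 8.47
```

This yields 8.38 and 8.47; the 8.46 reported in the text for BF̂ic presumably comes from the unrounded OpenBugs estimates of ci and fi.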

Appendix D presents the annotated OpenBugs code that was used to compute the variance and
