National Chiao Tung University

Institute of Statistics

Master's Thesis

貝氏方法在多選題排序上的應用

Bayesian Ranking Responses in Multiple-Choice Questions

Student: Shao-Yuan Chang (張少源)

Advisor: Prof. Hsiuying Wang (王秀瑛)

A Thesis Submitted to the Institute of Statistics, College of Science, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master in Statistics

June 2010 (ROC year 99)

Hsinchu, Taiwan

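Bayesian Ranking Responses in Multiple-Choice Questions

Student: Shao-Yuan Chang

Advisor: Prof. Hsiuying Wang

Master's Program, Institute of Statistics, National Chiao Tung University

CHINESE ABSTRACT

In many survey studies, the questionnaire is an important tool. In the literature, questions that allow multiple selections have not been analyzed as thoroughly as single-choice questions. Wang (2008a) proposed methods for ranking the responses of multiple-choice questions under the frequentist framework. In practice, however, prior distributions may exist for the individual responses, so building new methods that combine past data with new data for ranking is a necessary topic in questionnaire surveys. In this study, based on Bayesian multiple testing methods, we obtain rankings under the Bayesian framework by controlling the posterior error rate. In addition, we use simulation to compare these methods and to find appropriate rejection regions.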

Bayesian Ranking Responses in Multiple-Choice Questions

Student: Shao-Yuan Chang

Advisor: Prof. Hsiuying Wang

Institute of Statistics

National Chiao Tung University

ABSTRACT

In many studies, the questionnaire is an important survey tool. In the literature, the analysis of multiple-choice questions is not as well established as that of single-choice questions. Wang (2008a) proposed several methods for ranking the responses in multiple-choice questions under the usual frequentist setup. In many situations, however, prior information on the ranks of the responses may exist, so a methodology that combines the current survey data with past information for ranking the responses is an essential issue in questionnaire data analysis. In this paper, we build on several Bayesian multiple testing procedures to develop Bayesian ranking methods that control the posterior expected false discovery rate. In addition, a simulation study is conducted to compare these approaches and to derive an appropriate rejection region for the tests.

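Acknowledgments

I thank my family for their support and encouragement during these two years of the master's program, and I thank my advisor for her care and for her guidance and assistance with the thesis. Finally, I thank a few good friends; I will remember these days.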

Contents

Chinese Abstract
English Abstract
Acknowledgments
Contents
List of Tables
List of Figures
1 Introduction
2 Model
2.1 Prior Selection
3 Testing Approach
3.1 Multiple Testing
3.2 Testing Procedures
4 Ranking Approach and Ranking Consistency
4.1 Penalty Score
5 Simulation Result
5.1 Rejection Rate
6 A Real Data Example
7 Conclusion
8 Appendix

List of Tables

Table 1. Outcomes of multiple tests.
Table 2. The rejection rates for the three methods corresponding to each hypothesis in (4) for 1000 replicates.
Table 3. The rejection rates for the three methods corresponding to each hypothesis in (4) for 1000 replicates.

List of Figures

Figure 1. The expected penalty scores of the three methods under the condition of Example 1.
Figure 2. The expected penalty scores of the three methods under the condition of Example 2.

1 Introduction

The questionnaire method is a widely used tool for researchers in many fields to collect information, especially in marketing and management studies. There are two kinds of questions: single response questions and multiple responses questions. The analysis of multiple responses questions is not as well established as that of single response questions, and approaches for analyzing them were lacking until recently. Umesh (1995) first discussed the problem of analyzing multiple responses questions. Loughin and Scherer (1998), Decady and Thomas (1999) and Bilder, Loughin and Nettleton (2000) proposed several methods for testing marginal independence between a single response question and a multiple responses question. Agresti and Liu (1999, 2001) discussed the modeling of multiple responses questions. These studies mainly focus on the dependence between a single response question and a multiple responses question. However, most researchers are also interested in ranking the responses in a question according to the probabilities of the responses being chosen. In fact, ranking the responses may be the primary issue in a survey study.

Wang (2008a) proposed several approaches to solve this problem. However, these methodologies are derived under the frequentist setup and cannot be adopted in the Bayesian framework. In real applications, empirical information may exist for the probabilities of responses being chosen; for related applications see Pammer, Fong and Arnold (2000). A methodology that combines the current data with past information can provide a more objective ranking strategy than an approach based only on current data. Thus, this study proposes several methods for ranking the responses in a multiple responses question under the Bayesian framework. The methodologies extend the methods proposed in Muller, Parmigiani, Robert and Rousseau (2004). More details about Bayesian multiple testing and its applications are discussed by Gopalan and Berry (1998), Do et al. (2005), Gonen, Westfall and Johnson (2003), Scott and Berger (2006), Muller, Parmigiani and Rice (2007) and Scott (2009).

A related study about Bayesian ranking was carried out by Berger and Deely (2008). Their approach is to rank the items based on the posterior probability of the null hypothesis or Bayes factor. Although the methodology provides a rule for ranking, it does not set up the error tolerance. In the methods used in this study, the statistic used for ranking is similar to the one proposed by Berger and Deely (2008). Furthermore, we also propose the FDR criterion to measure the testing error. In the Bayesian framework, the conventional approach does not associate a criterion to set up the error tolerance. Based on Muller’s approach, we can control the testing error within a tolerance level. From this viewpoint, using the Bayesian FDR approach to rank responses is more informative and useful than the conventional approach.

In addition, Berger and Deely's approach cannot be applied directly to multiple responses questions. In this study, we illustrate the use of the Bayesian model for analyzing multiple responses questions and derive the exact and approximate forms of the Bayes estimator. The proposed method gives researchers formulas that can be adopted directly for ranking the responses of multiple responses questions.

First, we use the example described in Wang (2008a) to illustrate the problem. A company is designing a marketing survey to help develop an insect killer. The researchers list several factors that could affect sales, including high quality, price, packaging and smell. They want to know how these factors rank in importance so that they can design a product with lower cost and higher profit. To obtain the data, a group of individuals is surveyed about purchasing an insect killer. Each respondent is asked to fill out a questionnaire that lists all the questions. The following is the multiple responses question in the questionnaire:

Question 1: Which factors are important to you when considering the purchase of an indoor insect killer? (1) price (2) high quality (3) packaging (4) smell (5) others.

In this multiple responses question, there are a total of $2^5 - 1 = 31$ possible answers, because we exclude the case in which a respondent does not select any response. The 31 random variables follow a multinomial distribution with multinomial proportions

$$p \in P = \Big\{ p_{i_1 i_2 i_3 i_4 i_5} : i_j = 0 \text{ or } 1 \text{ and } 0 < \sum_{j=1}^{5} i_j \le 5 \Big\},$$

where the $i_j$ cannot all be equal to 0 simultaneously. Note that a multiple responses question requires that at least one response is selected, so it is not equivalent to a true-false question with five items. If we allowed respondents not to select any item or to select all items, it would be equivalent to a question with five true-false items. The method developed in this study can be extended to this situation. For the parameter space under the frequentist framework instead of the Bayesian framework, Wang (2008a) provides examples showing that the conventional testing approaches do not possess the property of ranking consistency. This property is a reasonable criterion for the validity of a testing approach. Under the frequentist framework, it is still unknown whether a satisfactory approach for ranking responses with the property of ranking consistency exists. In this study, in addition to proposing a ranking approach under the Bayesian framework, a Bayesian ranking consistency property is introduced and the proposed method is shown to be Bayesian ranking consistent.
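The count of 31 admissible answers for the five-response question can be checked by brute-force enumeration. The following is a minimal Python sketch; the function name `answer_patterns` is an illustrative choice and not part of the original analysis.

```python
from itertools import product

def answer_patterns(k):
    """All 0/1 selection patterns (i_1, ..., i_k) with at least one response selected."""
    return [bits for bits in product((0, 1), repeat=k) if sum(bits) > 0]

patterns = answer_patterns(5)
print(len(patterns))   # 31, i.e. 2**5 - 1
print(patterns[:3])    # [(0, 0, 0, 0, 1), (0, 0, 0, 1, 0), (0, 0, 0, 1, 1)]
```

Each tuple plays the role of one index $i_1 i_2 i_3 i_4 i_5$ of the multinomial proportions.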

In the Bayesian framework, assume that we have prior information on the parameter space P and that we rank the responses based on a survey study under this prior information. This problem is related to the usual Bayesian multiple testing problem if we consider a single response question. However, the application is more complicated when analyzing multiple responses questions. Muller et al. (2004) proposed several criteria for Bayesian multiple testing. Miranda-Moreno, Labbe and Fu (2007) applied the methods to hotspot identification in an engineering study. Wang (2008b) carried out a related study estimating the proportions in a multinomial distribution. In this paper, we investigate these Bayesian multiple testing procedures and extend the approaches to rank the responses of multiple responses questions.

The paper is organized as follows. In Section 2, we describe a Bayesian model for multiple responses questions. Section 3 proposes several Bayesian multiple testing procedures for testing an order of the responses. In Section 4, a ranking criterion is proposed to rank the responses, and the Bayesian multiple testing procedures discussed in Section 3 are shown to be ranking consistent. In Section 5, we present simulation studies to compare the rejection rates of the methodologies and the appropriate false discovery rate tolerances for the different testing procedures. Section 6 provides a data example that is ranking inconsistent under the frequentist framework but ranking consistent under the Bayesian framework. Finally, a conclusion is given in Section 7.

2 Model

For the general case, assume that a multiple responses question has k responses, $v_1, \ldots, v_k$, and we interview n respondents. Each respondent is asked to choose at least one and at most s answers for this question, where $0 < s \le k$. If s = 1, it is a single response question. There are a total of $c = C^k_1 + \cdots + C^k_s$ possible kinds of answers that respondents can choose. Let $n_{i_1\cdots i_k}$ denote the number of respondents selecting the responses $v_h$ and not selecting $v_{h'}$ if $i_h = 1$ and $i_{h'} = 0$, and let $p_{i_1\cdots i_k}$ denote the corresponding probability. For example, when k = 7, $n_{0100100}$ denotes the number of respondents selecting the second and the fifth responses and not selecting the other responses. Thus, the pmf of $n^* = \{n_{i_1\cdots i_k} : i_j = 0 \text{ or } 1 \text{ and } 0 < \sum_{j=1}^{k} i_j \le s\}$ is

$$f_s(n^*) = I\Big(0 < \sum_{j=1}^{k} i_j \le s\Big) \frac{n!}{\prod_{i_j = 0 \text{ or } 1} n_{i_1\cdots i_k}!} \prod_{i_j = 0 \text{ or } 1} p_{i_1\cdots i_k}^{\,n_{i_1\cdots i_k}}, \tag{1}$$

where $I(\cdot)$ denotes the indicator function. Let $m_j$ denote the sum of the numbers $n_{i_1\cdots i_k}$ such that the jth response is selected, and let $\pi_j$ denote the corresponding probability, that is, $m_j = \sum_{i_j = 1} n_{i_1\cdots i_k}$ and $\pi_j = \sum_{i_j = 1} p_{i_1\cdots i_k}$, which is the probability of response j. Also let $m_{jl}$ denote the sum of the numbers $n_{i_1\cdots i_k}$ such that the jth and lth responses are selected, and let $\pi_{jl}$ denote the corresponding probability. Then $m_{jl} = \sum_{i_j = i_l = 1} n_{i_1\cdots i_k}$ and $\pi_{jl} = \sum_{i_j = i_l = 1} p_{i_1\cdots i_k}$.
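The counting quantities defined above are simple sums over answer patterns. The sketch below illustrates them in Python for a small hypothetical data set with k = 3; the function names and the counts are assumptions made for illustration only, and patterns are stored as strings such as "110" for $i_1 i_2 i_3$.

```python
from math import comb

def number_of_answers(k, s):
    """c = C(k,1) + ... + C(k,s): answers with at least one and at most s selections."""
    return sum(comb(k, r) for r in range(1, s + 1))

def marginal_count(counts, j):
    """m_j: respondents whose answer pattern selects response j (1-based index)."""
    return sum(n for bits, n in counts.items() if bits[j - 1] == "1")

def pairwise_count(counts, j, l):
    """m_jl: respondents whose answer pattern selects both responses j and l."""
    return sum(n for bits, n in counts.items()
               if bits[j - 1] == "1" and bits[l - 1] == "1")

# Hypothetical counts n_{i1 i2 i3} for k = 3 (not data from the thesis).
counts = {"100": 4, "010": 3, "001": 2, "110": 1, "101": 0, "011": 2, "111": 1}

print(number_of_answers(3, 3))                          # 7
print([marginal_count(counts, j) for j in (1, 2, 3)])   # [6, 7, 5]
print(pairwise_count(counts, 1, 2))                     # 2
```

Note that the $m_j$ sum to more than the number of respondents, because a respondent who selects several responses is counted once for each of them.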

Assume that we have a prior on the parameter space. Here we consider the conjugate prior

$$\pi(p) = I\Big(0 < \sum_{j=1}^{k} i_j \le s\Big) \frac{\Gamma\big(\sum_{i_j = 0 \text{ or } 1} \alpha_{i_1\cdots i_k}\big)}{\prod_{i_j = 0 \text{ or } 1} \Gamma(\alpha_{i_1\cdots i_k})} \prod_{i_j = 0 \text{ or } 1} p_{i_1\cdots i_k}^{\,\alpha_{i_1\cdots i_k}}, \tag{2}$$

which is a Dirichlet distribution with $C^k_s(2^s - 1)$ parameters.

Under this setup, the posterior distribution satisfies $\pi(p \mid n^*) \propto f(n^* \mid p)\,\pi(p)$, which gives

$$\pi(p \mid n^*) = \frac{\Gamma\big(\sum_{i_j = 0 \text{ or } 1} (\alpha_{i_1\cdots i_k} + n_{i_1\cdots i_k})\big)}{\prod_{i_j = 0 \text{ or } 1} \Gamma(\alpha_{i_1\cdots i_k} + n_{i_1\cdots i_k})} \prod_{i_j = 0 \text{ or } 1} p_{i_1\cdots i_k}^{\,\alpha_{i_1\cdots i_k} + n_{i_1\cdots i_k}}. \tag{3}$$

From the form of the posterior distribution, we can derive the Bayes estimator of each $p_{i_1\cdots i_k}$ under the squared error loss function. The Bayes estimator $\hat{\pi}_j$ of $\pi_j$ is equal to the sum of the Bayes estimators of those $p_{i_1\cdots i_k}$ with $i_j = 1$. We can rank the $\pi_j$ by their Bayes estimators. However, a ranking based only on the Bayes estimators may not carry enough confidence to convince people to accept it. Therefore, establishing a multiple testing approach with a specified error tolerance to certify that the resulting rank is accurate is an essential issue.
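As an illustration of the Bayes estimators described above, the following Python sketch computes posterior means under squared error loss. It assumes that the posterior is a Dirichlet distribution with concentration parameters $\alpha_{i_1\cdots i_k} + n_{i_1\cdots i_k}$ in the standard parametrization, so that each cell mean is $(\alpha + n)/\sum(\alpha + n)$; the function name and the numbers are hypothetical.

```python
def bayes_estimates(alpha, counts):
    """Posterior means of the cell probabilities and of the marginals pi_j.

    Assumption: the posterior is Dirichlet with concentration alpha + n
    (standard parametrization), so the posterior mean of a cell is
    (alpha + n) / sum(alpha + n).
    """
    post = {bits: a + counts.get(bits, 0) for bits, a in alpha.items()}
    total = sum(post.values())
    p_hat = {bits: a / total for bits, a in post.items()}
    k = len(next(iter(alpha)))
    # pi_hat_j is the sum of the cell estimates whose pattern selects response j.
    pi_hat = [sum(p for bits, p in p_hat.items() if bits[j] == "1") for j in range(k)]
    return p_hat, pi_hat

# Hypothetical prior parameters and survey counts for k = 2.
alpha = {"10": 2, "01": 1, "11": 1}
counts = {"10": 3, "01": 5, "11": 2}
p_hat, pi_hat = bayes_estimates(alpha, counts)
print(pi_hat)  # approximately [0.571, 0.643], i.e. 8/14 and 9/14
```

Ranking by `pi_hat` alone would place response 2 first here, which is exactly the kind of conclusion the testing approach is meant to back with an error tolerance.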

2.1 Prior selection

To specify the Dirichlet prior (2), we have to select appropriate values for its parameters. If empirical experience provides a model for the prior, we can use this prior directly. If we do not have a specific prior but have past survey data for this multiple responses question, we can choose an appropriate prior based on those data. When the past data are complete, meaning that they record the count of each of the $C^k_s(2^s - 1)$ possible answers, we can set the value of the parameter $\alpha_{i_1\cdots i_k}$ in the Dirichlet distribution to be $n\,m_{i_1\cdots i_k}/m$, where m denotes the number of respondents in the past data and $m_{i_1\cdots i_k}$ denotes the number of past respondents selecting the responses $v_h$ and not selecting $v_{h'}$ if $i_h = 1$ and $i_{h'} = 0$. In this way, the sum of the $\alpha_{i_1\cdots i_k}$ is equal to n, so that the past data and the current survey data contribute equally. This equal weighting balances the past information and the current survey data in the statistical inference.
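The rule $\alpha_{i_1\cdots i_k} = n\,m_{i_1\cdots i_k}/m$ for complete past data can be sketched in a few lines of Python. The function name and the past counts below are hypothetical and serve only to show that the resulting parameters sum to n.

```python
def prior_from_past_data(past_counts, n):
    """Set alpha = n * m_pattern / m, so that the alphas sum to the current sample size n."""
    m = sum(past_counts.values())
    return {bits: n * count / m for bits, count in past_counts.items()}

# Hypothetical complete past data for k = 2 with m = 200 respondents.
past_counts = {"10": 100, "01": 60, "11": 40}
alpha = prior_from_past_data(past_counts, n=100)
print(alpha)                # {'10': 50.0, '01': 30.0, '11': 20.0}
print(sum(alpha.values()))  # 100.0
```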

On the other hand, the past data may be incomplete: they may record only how many times each response was selected, not how many times each of the $C^k_s(2^s - 1)$ possible answers was selected. In this case, it is hard to estimate all the parameters of the prior. Instead of estimating each parameter directly, for each response we can assign an equal weight to the parameters $\alpha_{i_1\cdots i_k}$ whose corresponding $p_{i_1\cdots i_k}$ select this response, and then take the sum of the weights assigned to an answer as the value of its parameter.

The selection of a prior may be a key issue for analyzing the data. The approach for prior estimation suggested above may not approximate the true prior very well. Thus, to achieve more accurate estimation, a more careful investigation to determine the prior selection is necessary. Since this study does not focus on prior selection, we do not provide a comprehensive discussion of this issue herein.

3 Testing approach

3.1 Multiple testing

In this section, we propose several multiple testing methods to compare the $\pi_j$. Assume that there are k responses and we are interested in testing

$$\begin{aligned} H_{01}&: \pi_2 \le \pi_1 \quad \text{vs} \quad H_{11}: \pi_2 > \pi_1 \\ H_{02}&: \pi_3 \le \pi_2 \quad \text{vs} \quad H_{12}: \pi_3 > \pi_2 \\ &\;\;\vdots \\ H_{0,k-1}&: \pi_k \le \pi_{k-1} \quad \text{vs} \quad H_{1,k-1}: \pi_k > \pi_{k-1}. \end{aligned} \tag{4}$$

Note that it may be reasonable to first test the equality $\pi_1 = \cdots = \pi_k$, and then proceed to the one-sided tests when the equality of the $\pi_i$, $i = 1, \ldots, k$, is rejected. The approach for testing a point null hypothesis is discussed by Berger (1985). It requires assigning a probability $\xi_0$ to the case $H_0: \pi_1 = \cdots = \pi_k$ and spreading the probability $1 - \xi_0$ over the alternative hypothesis $H_0^c$. Since the probability that the $\pi_i$, $i = 1, \ldots, k$, are all equal may be low, we do not investigate testing the point null hypothesis in depth in this study. In addition, according to the ranking criterion (9) used in this study, for ranking two responses $\pi_i$ and $\pi_j$, both one-sided hypotheses $H_0: \pi_i > \pi_j$ and $H_0: \pi_j > \pi_i$ are considered, which may reflect the information obtained from the point null hypothesis.

For testing (4), the decision rules considered here control the posterior expected false discovery rate. The concept of the false discovery rate (FDR) was proposed by Benjamini and Hochberg (1995) to determine optimal thresholds in a multiple testing setting. When we test multiple hypotheses, the possible outcomes (over the l tests) may be summarized as in Table 1.

Table 1. Outcomes of multiple tests. Here l is the total number of hypotheses, l0 is the unknown number of true null hypotheses, l1 is the unknown number of false null hypotheses, V is the number of false positives, T is the number of false negatives, S is the number of rejected null hypotheses that are false, U is the number of accepted null hypotheses that are true, and D is the number of rejected null hypotheses.

Real state                 H0i accepted    H0i rejected    Total
Number of true H0i         U               V               l0
Number of false H0i        T               S               l1
Total                      l − D           D               l

Following the literature, we define the false discovery rate and false negative rate for the frequentist setting and the posterior false discovery rate and posterior false negative rate for the Bayesian setting as follows.

First, some notation and definitions are given. Let $z_i$ denote an indicator that the ith hypothesis $H_{0i}$ is false and let $v_i = P(z_i = 1 \mid n^*)$ denote the marginal posterior probability of $\pi_{i+1} > \pi_i$. The rejection of $H_{0i}$ is denoted by $d_i = 1$; otherwise $d_i = 0$. Let $z = (z_1, \ldots, z_{k-1})$ and $d = (d_1, \ldots, d_{k-1})$. Under the frequentist setup, the false discovery rate and the false negative rate are the expectations $E\big[\frac{V}{D + \epsilon}\big]$ and $E\big[\frac{T}{n - D + \epsilon}\big]$, respectively, where $D = \sum d_i$ and $\epsilon$ is a small constant to avoid a zero denominator. Let

$$FDR(d, z) = \frac{\sum d_i (1 - z_i)}{D + \epsilon} \quad \text{and} \quad FNR(d, z) = \frac{\sum (1 - d_i) z_i}{n - D + \epsilon}.$$

Under a Bayesian setting, these error rates are defined as the posterior expected false discovery rate, denoted by $FDR(d, n^*)$, and the posterior expected false negative rate, denoted by $FNR(d, n^*)$, where

$$FDR(d, n^*) = \int FDR(d, z)\, dp(z \mid n^*) = \frac{\sum d_i (1 - v_i)}{D + \epsilon}$$

and

$$FNR(d, n^*) = \int FNR(d, z)\, dp(z \mid n^*) = \frac{\sum (1 - d_i) v_i}{n - D + \epsilon}.$$

The posterior expected false discovery count $FD(d, n^*)$ and the posterior expected false negative count $FN(d, n^*)$ are defined as

$$FD(d, n^*) = \sum d_i (1 - v_i)$$

and

$$FN(d, n^*) = \sum (1 - d_i) v_i.$$
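The four posterior summaries above depend only on the decision vector d and the posterior probabilities v. The following Python sketch evaluates them for a hypothetical set of three hypotheses. The function name is illustrative, and the n in the denominator of the false negative rate is taken here to be the number of hypotheses, which is an assumption about the notation.

```python
def posterior_error_summaries(d, v, eps=1e-9):
    """Posterior expected FD, FN, FDR and FNR for decisions d and probabilities v.

    Assumption: the n in the FNR denominator is the number of hypotheses, len(d).
    """
    n = len(d)
    D = sum(d)  # number of rejected null hypotheses
    fd = sum(di * (1 - vi) for di, vi in zip(d, v))
    fn = sum((1 - di) * vi for di, vi in zip(d, v))
    return {"FD": fd, "FN": fn, "FDR": fd / (D + eps), "FNR": fn / (n - D + eps)}

# Hypothetical posterior probabilities v_i and decisions d_i.
v = [0.95, 0.80, 0.30]
d = [1, 1, 0]
print(posterior_error_summaries(d, v))
# FD is about 0.25, FN is 0.30, FDR is about 0.125 and FNR is about 0.30
```

Rejecting a hypothesis with a small $v_i$ raises FD, while accepting one with a large $v_i$ raises FN, which is the trade-off that the loss functions in Section 3.2 weigh.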

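By definition, $v_l$ in the model for the multiple responses questionnaire can be expressed as

$$v_l \propto \int \cdots \int \Big( I\Big(0 < \sum_{j=1}^{k} i_j \le s\Big)\, I(\pi_{l+1} > \pi_l)\, \frac{\Gamma\big(\sum_{i_j = 0 \text{ or } 1} (\alpha_{i_1\cdots i_k} + n_{i_1\cdots i_k})\big)}{\prod_{i_j = 0 \text{ or } 1} \Gamma(\alpha_{i_1\cdots i_k} + n_{i_1\cdots i_k})} \prod_{i_j = 0 \text{ or } 1} p_{i_1\cdots i_k}^{\,\alpha_{i_1\cdots i_k} + n_{i_1\cdots i_k}} \Big) \prod_{i_j = 0 \text{ or } 1} dp_{i_1\cdots i_k}, \tag{5}$$

which may be difficult to derive directly because it is a multiple integral. Instead of deriving its exact value, we can approximate it by simulation or by the normal approximation.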

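Theorem 1. By the normal approximation, the multiple integral (5) can be approximated by

$$\Phi\Big(\frac{B}{\sqrt{C}}\Big),$$

where $\Phi(x)$ denotes the cumulative distribution function of the standard normal distribution,

$$A = \sum_{i_j = 0 \text{ or } 1} (\alpha_{i_1 i_2 \ldots i_k} + n_{i_1 i_2 \ldots i_k}),$$

$$B = \frac{\sum_{i_{l+1} = 1,\, i_l = 0} (\alpha_{i_1 i_2 \ldots i_k} + n_{i_1 i_2 \ldots i_k}) - \sum_{i_l = 1,\, i_{l+1} = 0} (\alpha_{i_1 i_2 \ldots i_k} + n_{i_1 i_2 \ldots i_k})}{A}$$

and

$$\begin{aligned} C = \frac{1}{A^2 (A + 1)} \Big( &\sum_{i_{l+1} = 1,\, i_l = 0} (\alpha_{i_1 i_2 \ldots i_k} + n_{i_1 i_2 \ldots i_k})(A - \alpha_{i_1 i_2 \ldots i_k} - n_{i_1 i_2 \ldots i_k}) \\ &+ \sum_{i_l = 1,\, i_{l+1} = 0} (\alpha_{i_1 i_2 \ldots i_k} + n_{i_1 i_2 \ldots i_k})(A - \alpha_{i_1 i_2 \ldots i_k} - n_{i_1 i_2 \ldots i_k}) \\ &+ 2 \sum_{i'_{l+1} = 1,\, i'_l = 0} \; \sum_{i''_l = 1,\, i''_{l+1} = 0} (\alpha_{i'_1 i'_2 \ldots i'_k} + n_{i'_1 i'_2 \ldots i'_k})(\alpha_{i''_1 i''_2 \ldots i''_k} + n_{i''_1 i''_2 \ldots i''_k}) \Big). \end{aligned} \tag{6}$$

The proof is given in the Appendix.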
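The approximation in Theorem 1 is straightforward to evaluate once the posterior parameters $\alpha + n$ are available. The Python sketch below implements A, B and C as written in (6) and evaluates $\Phi$ through the error function. The function name and the example parameters are hypothetical; patterns are strings such as "01" for $i_1 i_2$, and the sketch assumes that at least one pattern separates the two responses so that C is positive.

```python
import math

def normal_approx_v(post, l):
    """Approximate P(pi_{l+1} > pi_l | data) by Phi(B / sqrt(C)) as in Theorem 1.

    post maps each pattern string i_1...i_k to alpha + n; l is 1-based.
    Assumes at least one pattern selects exactly one of the two responses (C > 0).
    """
    A = sum(post.values())
    # Patterns selecting response l+1 but not l, and patterns selecting l but not l+1.
    up = [a for bits, a in post.items() if bits[l] == "1" and bits[l - 1] == "0"]
    down = [a for bits, a in post.items() if bits[l - 1] == "1" and bits[l] == "0"]
    B = (sum(up) - sum(down)) / A
    C = (sum(a * (A - a) for a in up)
         + sum(a * (A - a) for a in down)
         + 2 * sum(up) * sum(down)) / (A ** 2 * (A + 1))  # double sum factorizes
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))) with x = B / sqrt(C).
    return 0.5 * (1 + math.erf(B / math.sqrt(2 * C)))

# Hypothetical posterior parameters alpha + n for k = 2.
print(normal_approx_v({"10": 5.0, "01": 5.0, "11": 3.0}, l=1))   # 0.5 by symmetry
print(normal_approx_v({"10": 2.0, "01": 10.0, "11": 3.0}, l=1))  # close to 1
```

Patterns that select both responses or neither of them cancel in $\pi_{l+1} - \pi_l$, which is why only the two sets `up` and `down` enter B and C.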

3.2 Testing procedures

We introduce several multiple testing procedures from Berger (1985) and Muller et al. (2004) for testing (4).

Method 1. The decision to accept or reject the null hypothesis is based on the loss function proposed by Berger (1985), which is defined as

$$\begin{cases} 0 & \text{if the decision taken is right} \\ c & \text{if we reject } H_{0i} \text{ when it is true} \\ 1 & \text{if we accept } H_{0i} \text{ when it is false} \end{cases} \tag{7}$$

where $c\ (\ge 0)$ and 1 represent the losses for making a wrong decision due to a false positive and a false negative error, respectively. Under this criterion, the loss function can be written as

$$L_N(d, n^*) = c\,FD + FN.$$

Method 2. The second method considers the loss function

$$L_R(d, n^*) = c\,FDR + FNR.$$

Method 3. We also consider bivariate loss functions that explicitly acknowledge the two competing goals, leading to the following posterior expected losses:

$$L_{2R}(d, n^*) = (FDR, FNR).$$

We can define the optimal decisions under $L_{2R}$ as the minimization of $FNR$ subject to $FDR \le \alpha_{2R}$.

By Muller et al. (2004), under the three loss functions, the optimal decision that minimizes the loss takes the form

$$d_i = I(v_i \ge t^*), \tag{8}$$

where $t^*$ is $t^*_N = c/(c + 1)$, $t^*_R(n^*) = v_{(n - D^*)}$ and $t^*_{2R}(n^*) = \min\{s : FDR(s, n^*) \le \alpha_{2R}\}$ under the loss functions $L_N$, $L_R$ and $L_{2R}$, respectively. In the expressions for $t^*_R$ and $t^*_{2R}$, $v_{(i)}$ is the ith order statistic of $\{v_1, \ldots, v_n\}$, and $D^*$ is the optimal number of discoveries, which is found by minimizing the function (A.1) in Muller et al. (2004).
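The thresholding rule (8) can be sketched for two of the three cutoffs. The Python code below computes $t^*_N = c/(c+1)$ and searches for $t^*_{2R}$ by scanning the ordered $v_i$; the cutoff $t^*_R$ is omitted because it requires minimizing the function (A.1) of Muller et al. (2004), which is not reproduced here. The function names and the example probabilities are hypothetical, and the search uses the fact that the posterior expected FDR of the top D hypotheses cannot decrease as D grows.

```python
def threshold_LN(c):
    """Cutoff under L_N: t*_N = c / (c + 1)."""
    return c / (c + 1.0)

def threshold_L2R(v, alpha_2r, eps=1e-9):
    """Smallest cutoff s among the v_i whose rejection set {v_i >= s} has FDR <= alpha_2r.

    Returns None when even rejecting only the largest v_i exceeds the bound.
    """
    t_star, fd = None, 0.0
    for D, vi in enumerate(sorted(v, reverse=True), start=1):
        fd += 1 - vi                    # posterior expected false discoveries so far
        if fd / (D + eps) <= alpha_2r:  # FDR when the top D hypotheses are rejected
            t_star = vi
    return t_star

def decide(v, t):
    """Rule (8): d_i = 1 exactly when v_i >= t; t = None means no rejection."""
    return [int(t is not None and vi >= t) for vi in v]

v = [0.99, 0.90, 0.70, 0.40]             # hypothetical posterior probabilities
print(threshold_LN(1.0))                  # 0.5
print(threshold_L2R(v, 0.15))             # 0.7
print(decide(v, threshold_L2R(v, 0.15)))  # [1, 1, 1, 0]
```

With these numbers, rejecting the first three hypotheses gives a posterior expected FDR of about 0.137, while rejecting all four would raise it to about 0.25, above the tolerance of 0.15.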

A simulation study for comparing the three methods is given in Section 5.

4 Ranking approach and ranking consistency

If not all of the hypotheses in (4) are rejected, there is not enough evidence to rank all responses. An objective way to rank the responses is to test the hypothesis $\pi_i > \pi_j$ for each i and j. There are in total $C^k_2$ hypotheses for k responses. The rank of the ith response can be defined as

$$R_i = k - \sum_{j = 1,\, j \ne i}^{k} I(\pi_i > \pi_j). \tag{9}$$

Using criterion (9), we define a response to be the most significant if it has the smallest $R_i$ value, and we rank it first. The response with the second smallest $R_i$ value is defined to be the second most significant response, and so on.
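Criterion (9) can be computed directly from a table of pairwise conclusions. In the Python sketch below, `wins[i][j]` is 1 when the comparison concludes $\pi_i > \pi_j$ and 0 otherwise; the function name and the example probabilities are hypothetical, and the indicators are built from known values of $\pi$ purely for illustration.

```python
def ranks_from_decisions(wins):
    """R_i = k - sum over j != i of I(pi_i > pi_j), following criterion (9)."""
    k = len(wins)
    return [k - sum(wins[i][j] for j in range(k) if j != i) for i in range(k)]

# Hypothetical selection probabilities for k = 3 responses.
pi = [0.2, 0.5, 0.3]
wins = [[int(pi[i] > pi[j]) for j in range(3)] for i in range(3)]
print(ranks_from_decisions(wins))  # [3, 1, 2]
```

When the indicators come from tests rather than from the true probabilities, some pairs may remain undecided, in which case several responses can share the same $R_i$ value.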

By Wang (2008a), a reasonable ranking approach may need to satisfy the ranking consistency property. The property is modified here to fit the Bayesian setup as follows.

Bayesian ranking consistency property: ... should also be rejected by the test with the same level if the Bayes estimator of $I_{\pi_j - \pi_i > 0}$ is less than the Bayes estimator of $I_{\pi_j - \pi_g > 0}$.

From the examples given in Wang (2008a), under the frequentist framework, the tests derived by the conventional approaches do not possess the property of frequentist ranking consistency. It is still unknown whether ranking consistent tests exist under the frequentist framework. Under the Bayesian framework, it is easier to find ranking consistent tests.

Theorem 2. The three testing procedures (8) considered in Section 3, for the different $t^*$ values under the loss functions $L_N$, $L_R$ and $L_{2R}$, respectively, are ranking consistent.

Proof. For the three tests in Section 3, the decision rules are based on (8). From this form, for a fixed cutoff $t^*$, the decision rule depends only on the Bayes estimator $v_i$ for $H_{0i}$. If a hypothesis $H_{0i}$ with a smaller $v_i$ is rejected, then a hypothesis $H_{0j}$ with a larger $v_j$ is also rejected by the rule. Thus, the proof is complete.

4.1 Penalty score

In this section, we set up a penalty score to evaluate the three methods from the viewpoint of ranking error. To rank the responses with a given method, we calculate their $R_i$ values using this method and then rank the responses by $R_i$. The penalty score is defined as the sum of the absolute differences between the true ranks and the ranks derived by the method. For example, in the case of k = 5, suppose the true rank of the first response is 1, the true rank of the second response is 2, and so on; we use the notation (1, 2, 3, 4, 5) to denote the true rank. If the rank derived by a method for an observation is (2, 1, 3, 5, 4), the penalty score for the method given by the observation is |1−2|+|2−1|+|3−3|+|4−5|+|5−4| = 4. We conduct a simulation with 1000 replicates to compare the expected penalty scores of the three methods. The simulation procedure is as follows.

Step 1. Set up the prior parameter values α.

Step 2. Generate a set of p from the prior distribution with the α values in Step 1. From this p, obtain the true rank of the π.

Step 3. Using the pmf (1) with the p in Step 2, generate a set n∗.

Step 4. Set up the $2C^k_2$ null hypotheses for any two different $\pi_i$. Based on n∗ in Step 3, calculate the Bayes estimator of the indicator function of each hypothesis. Then apply the three methods in Section 3 to test each null hypothesis. Use (9) to rank the k responses and calculate the penalty score from the derived rank and the true rank in Step 2 for each method.

Step 5. Repeat Steps 2-4 1000 times. The average of the penalty scores in Step 4 for each method gives the approximate expected penalty score for that method.
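Steps 1 to 5 can be sketched as follows in Python for Method 1 only. This is a simplified illustration, not the simulation code used in the study: it assumes the standard Dirichlet parametrization, estimates each posterior probability $P(\pi_i > \pi_j \mid n^*)$ by Monte Carlo draws from the posterior instead of the normal approximation, and uses hypothetical function names, prior values and replicate counts.

```python
import random
from itertools import product

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]

def marginals(p, patterns):
    """pi_j = sum of the cell probabilities whose pattern selects response j."""
    k = len(patterns[0])
    return [sum(pc for pc, bits in zip(p, patterns) if bits[j]) for j in range(k)]

def ranks(wins):
    """Criterion (9): R_i = k minus the number of responses that response i beats."""
    k = len(wins)
    return [k - sum(wins[i][j] for j in range(k) if j != i) for i in range(k)]

def penalty_score(true_rank, derived_rank):
    """Sum of absolute differences between true and derived ranks."""
    return sum(abs(t - r) for t, r in zip(true_rank, derived_rank))

def expected_penalty(alpha, n, c, reps=100, draws=200, seed=0):
    """Approximate expected penalty score of Method 1 for the prior alpha (Step 1)."""
    rng = random.Random(seed)
    patterns = list(alpha)
    a = [alpha[bits] for bits in patterns]
    k = len(patterns[0])
    t = c / (c + 1.0)  # cutoff of Method 1
    total = 0
    for _ in range(reps):
        p = sample_dirichlet(a, rng)                               # Step 2
        pi = marginals(p, patterns)
        true_wins = [[int(pi[i] > pi[j]) for j in range(k)] for i in range(k)]
        cells = rng.choices(range(len(patterns)), weights=p, k=n)  # Step 3
        post = list(a)
        for cell in cells:
            post[cell] += 1
        tally = [[0] * k for _ in range(k)]                        # Step 4
        for _ in range(draws):
            q = marginals(sample_dirichlet(post, rng), patterns)
            for i in range(k):
                for j in range(k):
                    tally[i][j] += q[i] > q[j]
        wins = [[int(tally[i][j] / draws >= t) for j in range(k)] for i in range(k)]
        total += penalty_score(ranks(true_wins), ranks(wins))
    return total / reps                                            # Step 5

patterns = [bits for bits in product((0, 1), repeat=3) if sum(bits) > 0]
alpha = {bits: 2.0 for bits in patterns}  # hypothetical prior for k = 3
alpha[(0, 0, 1)] = 10.0
print(penalty_score((1, 2, 3, 4, 5), (2, 1, 3, 5, 4)))  # 4, as in the text
print(expected_penalty(alpha, n=50, c=1.0, reps=20, draws=100))
```

Repeating the last call over a grid of c values gives a curve of expected penalty scores against c for Method 1 under this hypothetical prior.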

Remark 1. When we test the $2C^k_2$ hypotheses to calculate the $R_i$, $i = 1, \ldots, k$, the hypotheses are not exactly of the form (4), but we can still apply the methods to the general testing of the $2C^k_2$ hypotheses.

5 Simulation result

5.1 Rejection rate

A simulation study is conducted in this section to evaluate the performance of the three methods. We first set up a known prior of the form (2) on the parameter space. Let $w_j = \sum_{i_j = 1} \alpha_{i_1\cdots i_k}$, $j = 1, \ldots, k$. A condition for the prior setting is given in Section 2.1. The simulation procedure is to generate a set of p and then use this p in (1) to generate a set of n∗. Next, we calculate the $v_i$ conditional on n∗ and use the $v_i$ under the three different loss criteria. For testing the k − 1 hypotheses of (4), we can count the number of rejections of each of the k − 1 hypotheses for the three methods. Although the truth of the k − 1 hypotheses depends on p, by the property of the Dirichlet distribution we have $E(\pi_i) < E(\pi_j)$ if $w_i < w_j$. If we repeat the simulation many times, the rejection rate of $H_{0i}: \pi_{i+1} \le \pi_i$ for a good test should be close to $P(\pi_{i+1} > \pi_i)$. Thus, we can use this criterion to evaluate the testing methods. We repeat the simulation process 1000 times, and the results are shown in Examples 1 and 2. The $t^*$ values for the first two tests in these examples are selected such that their corresponding c values in $L_N$ and $L_R$ minimize the penalty score presented in Section 5.2. The $t^*$ values for the third test in these examples are selected such that their corresponding $\alpha_{2R}$ values in $L_{2R}$ minimize the penalty score presented in Section 5.2.

Example 1. Consider the case of k = 5 and a Dirichlet prior distribution on the parameter space with $\alpha_{00000} = 0$, $\alpha_{00001} = 98$, $\alpha_{00010} = 63$, $\alpha_{00100} = 42$, $\alpha_{01000} = 28$, $\alpha_{10000} = 28$ and all the others equal to 7. In this case, $w_1 = w_2 = 133 < w_3 = 147 < w_4 = 168 < w_5 = 203$. Under this setup, we have $P(\pi_2 > \pi_1) = 0.500$, $P(\pi_3 > \pi_2) = 0.859$, $P(\pi_4 > \pi_3) = 0.930$ and $P(\pi_5 > \pi_4) = 0.986$. To test (4), we compare the three methods introduced in Section 3. The rejection rates for each method are listed in Table 2, where the c values are 1 and 0.33 for $L_N$ and $L_R$ and the $\alpha_{2R}$ value is 0.15 for $L_{2R}$.

Table 2: The rejection rates of the three methods corresponding to each hypothesis in (4) for 1000 replicates.

Loss    H01: π2 ≤ π1    H02: π3 ≤ π2    H03: π4 ≤ π3    H04: π5 ≤ π4
LN      0.473           0.939           0.981           1
LR      0.188           0.798           0.951           0.996
L2R     0.415           0.899           0.965           0.998
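The prior weights $w_j$ quoted in Example 1 can be reproduced by summing the prior parameters over the answer patterns that select each response. The Python sketch below does this; the function name `prior_weights` is illustrative, and the pattern 00000 is left out because its parameter is 0.

```python
from itertools import product

def prior_weights(alpha, k):
    """w_j = sum of alpha over all patterns (i_1, ..., i_k) with i_j = 1."""
    return [sum(a for bits, a in alpha.items() if bits[j] == 1) for j in range(k)]

k = 5
# Every admissible pattern gets 7, then the five single-response patterns are overwritten.
alpha = {bits: 7 for bits in product((0, 1), repeat=k) if sum(bits) > 0}
alpha[(0, 0, 0, 0, 1)] = 98
alpha[(0, 0, 0, 1, 0)] = 63
alpha[(0, 0, 1, 0, 0)] = 42
alpha[(0, 1, 0, 0, 0)] = 28
alpha[(1, 0, 0, 0, 0)] = 28

print(prior_weights(alpha, k))  # [133, 133, 147, 168, 203]
```

Each response is selected by 16 of the 31 patterns, so every $w_j$ consists of its single-response parameter plus 15 × 7 = 105.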

Example 2. Consider the case of k = 5 and a Dirichlet prior distribution on the parameter space with $\alpha_{00000} = 0$, $\alpha_{00001} = 56$, $\alpha_{00010} = 49$, $\alpha_{00100} = 42$, $\alpha_{01000} = 35$, $\alpha_{10000} = 70$ and all the others equal to 7. In this case, $w_2 = 140 < w_3 = 147 < w_4 = 154 < w_5 = 161 < w_1 = 170$. We have $P(\pi_2 > \pi_1) = 0.006$, $P(\pi_3 > \pi_2) = 0.703$, $P(\pi_4 > \pi_3) = 0.696$ and $P(\pi_5 > \pi_4) = 0.688$. To test (4), we compare the three methods introduced in Section 3. The rejection rates for each method are listed in Table 3, where the c values are 1 and 0.54 for $L_N$ and $L_R$ and the $\alpha_{2R}$ value is 0.2 for $L_{2R}$.

Table 3: The rejection rates of the three methods corresponding to each hypothesis in (4) for 1000 replicates.

Loss    H01: π2 ≤ π1    H02: π3 ≤ π2    H03: π4 ≤ π3    H04: π5 ≤ π4
LN      0               0.754           0.759           0.747
LR      0.001           0.903           0.911           0.907
L2R     0               0.582           0.570           0.565

From Tables 2 and 3, the method under the loss function $L_R$ seems to perform worse than the other two methods, because its rejection rate is not close to the probability of the alternative hypothesis in most cases. In Example 1, Method 3 seems better than Method 1. However, in Example 2, Method 1 is better than Method 3. Different situations result in different performances by these two methods. Overall, Methods 1 and 3 may be superior to Method 2 in many cases, as shown in the simulation study.

Following the above procedures, the approximate expected penalty score for the three methods can be derived. Note that the scores for Methods 1 and 2 depend on the value of c, and the score for Method 3 depends on the value of $\alpha_{2R}$. In a real application, the selection of c in Methods 1 and 2 may depend on the true cost of a wrong decision, and the selection of $\alpha_{2R}$ in Method 3 may depend on the allowed error tolerance. However, from a theoretical viewpoint, we can investigate the values of c and $\alpha_{2R}$ at which the three methods have the smallest penalty score.

Based on the simulation procedures, the expected penalty scores for different c and $\alpha_{2R}$ corresponding to the $\alpha_{i_1 \ldots i_k}$ in Examples 1 and 2 are presented in Figures 1 and 2.

... 1.173 in Figure 1, which occur at c = 1, c = 0.33 and $\alpha_{2R} = 0.15$, respectively. The minimum expected penalty scores for Methods 1-3 are 2.274, 3.012 and 2.534 in Figure 2, which occur at c = 1, c = 0.54 and $\alpha_{2R} = 0.2$, respectively. Basically, Figures 1 and 2 show that Method 1 has the smallest minimum expected penalty score, followed by Method 3. Method 2 has the largest minimum expected penalty score, which makes it the worst of the three methods. From the viewpoint of ranking, this conclusion coincides with the results in Section 5.1.

6 A Real Data Example

In this section, we use a real data example to illustrate the methods and present a case that is ranking inconsistent under the frequentist framework but ranking consistent under the Bayesian framework. The example is a survey of 49609 first-year college students in Taiwan about their preferences in their college study. The data set can be accessed at http://srda.sinica.edu.tw and is available upon request from the first author. We list one of the multiple responses questions in the questionnaire as follows.

Question: What kind of experience do you expect to receive during the period of college study? (Select at least one response)

1. Read over the Chinese and foreign classic literature
2. Travel around Taiwan
3. Present academic papers in conferences
4. Lead large-scale activities
5. Be on a school team
6. Be a cadre of student associations
7. Participate in internship programs
8. Fall in love
9. Have sexual experience
10. Travel around the world
11. Make many friends
12. Others

We are interested in ranking the responses of this multiple responses question according to students’ preference. To make a clear illustration, we do not consider the problem of ranking all responses, but the problem of ranking the five responses: read Chinese and foreign classics, present academic papers in conferences, lead large-scale activities, be on a school team and be a student association cadre.

The population of the survey is the whole data set of 49609 interviews. Since we have all the data, we can obtain the true ranks of the five responses. To illustrate the methods, suppose that we do not have the whole data set but only the interview data of 100 randomly selected respondents. Note that in the whole data set, the numbers of respondents selecting the five responses are 8858, 5358, 10578, 6823 and 12145. The first, second and third ranks show that the students prefer to "be a student association cadre", "lead large-scale activities" and "read Chinese and foreign classics".

In this example, $i_1 = 1$, $i_2 = 1$, $i_3 = 1$, $i_4 = 1$ and $i_5 = 1$ in $n_{i_1 i_2 i_3 i_4 i_5}$ correspond to selection of the responses "read Chinese and foreign classics", "present academic papers in conferences", "lead large-scale activities", "be on a school team" and "be a cadre of student associations", respectively.

For the 100 randomly selected respondents, we have $n_{10000} = 19$, $n_{01000} = 5$, $n_{00100} = 7$, $n_{00010} = 6$, $n_{00001} = 10$, $n_{11000} = 3$, $n_{10100} = 0$, $n_{10010} = 0$, $n_{10001} = 5$, $n_{01100} = 1$, $n_{01010} = 0$, $n_{01001} = 1$, $n_{00110} = 0$, $n_{00101} = 8$, $n_{00011} = 2$, $n_{11100} = 0$, $n_{11010} = 1$, $n_{11001} = 0$, $n_{01110} = 0$, $n_{01101} = 3$, $n_{01011} = 0$, $n_{00111} = 8$, $n_{10110} = 0$, $n_{10101} = 7$, $n_{10011} = 0$, $n_{11110} = 0$, $n_{11101} = 3$, $n_{11011} = 1$, $n_{10111} = 3$, $n_{01111} = 3$ and $n_{11111} = 4$. This leads to $m_1 = 46$, $m_2 = 24$, $m_3 = 47$, $m_4 = 29$, $m_5 = 58$. From the data, the most selected response is "be a cadre of student associations". Next is "lead large-scale activities", followed by "read Chinese and foreign classics". Consequently, we have $m_{(5)} = 58$, $m_{(4)} = 47$, $m_{(3)} = 46$, $m_{(2)} = 29$, $m_{(1)} = 24$. Now we are interested in testing

$$H_{01}: \pi_{(5)} \le \pi_{(4)} \quad \text{vs} \quad H_{11}: \pi_{(5)} > \pi_{(4)},$$

$$H_{02}: \pi_{(5)} \le \pi_{(3)} \quad \text{vs} \quad H_{12}: \pi_{(5)} > \pi_{(3)}. \qquad (10)$$

In this case, the likelihood ratio test does not lead to the rejection of the hypotheses. Thus, we use the Wald and generalized score tests to illustrate the ranking inconsistency property. When testing $H_{01}$, the values of the Wald and generalized score test statistics under the frequentist framework are 2.17 and 2.12. The upper 0.025 cutoff point of the standard normal distribution is 1.96, so $H_{01}$ is rejected by both tests with type I error 0.025. However, when testing $H_{02}$, the values of the Wald and generalized score test statistics are 1.59 and 1.57, so $H_{02}$ is not rejected by either test. Since $|\pi_{(5)} - \pi_{(3)}| > |\pi_{(5)} - \pi_{(4)}|$, the above result leads to ranking inconsistency for the Wald and score tests under the frequentist framework.
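The sample quantities in this example can be recomputed from the 31 pattern counts. The following is a minimal sketch, not the authors' code. It assumes that the Wald and generalized score statistics take the usual paired-proportion form (the difference of the two sample proportions divided by its standard error, estimated without and with the null restriction); this form is an assumption, since the formulas are not restated in this section, and the helper names are hypothetical. Under that assumption the sketch reproduces 2.17 and 2.12 for $H_{01}$ and gives values for $H_{02}$ that are close to the reported ones and below 1.96. With the counts exactly as printed, the tally reproduces $m_1$, $m_3$ and $m_5$, and gives 25 and 28 for the second and fourth responses, one off from the reported 24 and 29, which does not affect the ordering.

```python
from math import sqrt

# Pattern counts n_{i1 i2 i3 i4 i5} for the 100 sampled respondents, keyed by
# the 0/1 selection pattern exactly as printed in the text.
counts = {
    "10000": 19, "01000": 5, "00100": 7, "00010": 6, "00001": 10,
    "11000": 3, "10100": 0, "10010": 0, "10001": 5, "01100": 1,
    "01010": 0, "01001": 1, "00110": 0, "00101": 8, "00011": 2,
    "11100": 0, "11010": 1, "11001": 0, "01110": 0, "01101": 3,
    "01011": 0, "00111": 8, "10110": 0, "10101": 7, "10011": 0,
    "11110": 0, "11101": 3, "11011": 1, "10111": 3, "01111": 3,
    "11111": 4,
}
n = sum(counts.values())  # sample size


def marginal(j):
    """Number of respondents selecting response j (1-based index)."""
    return sum(c for pat, c in counts.items() if pat[j - 1] == "1")


def discordant(j, k):
    """Counts of respondents selecting j but not k, and k but not j."""
    b = sum(c for pat, c in counts.items()
            if pat[j - 1] == "1" and pat[k - 1] == "0")
    d = sum(c for pat, c in counts.items()
            if pat[k - 1] == "1" and pat[j - 1] == "0")
    return b, d


def wald_and_score(j, k):
    """Assumed paired-proportion statistics for H0: pi_j <= pi_k."""
    b, d = discordant(j, k)
    diff = (b - d) / n
    # Wald: variance of the difference estimated without the null restriction.
    wald = diff / sqrt(((b + d) / n - diff ** 2) / n)
    # Score: variance of the difference estimated under pi_j = pi_k.
    score = diff / sqrt((b + d) / n ** 2)
    return wald, score


m = {j: marginal(j) for j in range(1, 6)}
w01, s01 = wald_and_score(5, 3)  # H01: top response vs second-ranked response
w02, s02 = wald_and_score(5, 1)  # H02: top response vs third-ranked response
print(m)
print(round(w01, 2), round(s01, 2))
print(round(w02, 2), round(s02, 2))
```

Only the discordant counts enter the two statistics, because respondents who select both responses or neither contribute nothing to the difference of the two sample proportions.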

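Now we consider the Bayesian framework and implement Method 1, Method 2 and Method 3 for this example. According to the whole data, we assume a prior for $p_{i_1 i_2 i_3 i_4 i_5}$, which corresponds to $\alpha_{10000} = 13$, $\alpha_{01000} = 4$, $\alpha_{00100} = 8$, $\alpha_{00010} = 5$, $\alpha_{00001} = 11$, $\alpha_{11000} = 3$, $\alpha_{10100} = 2$, $\alpha_{10010} = 1$, $\alpha_{10001} = 3$, $\alpha_{01100} = 1$, $\alpha_{01010} = 0$, $\alpha_{01001} = 1$, $\alpha_{00110} = 1$, $\alpha_{00101} = 10$, $\alpha_{00011} = 3$, $\alpha_{11100} = 0$, $\alpha_{11010} = 0$, $\alpha_{11001} = 0$, $\alpha_{01110} = 0$, $\alpha_{01101} = 2$, $\alpha_{01011} = 0$, $\alpha_{00111} = 6$, $\alpha_{10110} = 0$, $\alpha_{10101} = 3$, $\alpha_{10011} = 1$, $\alpha_{11110} = 0$, $\alpha_{11101} = 1$, $\alpha_{11011} = 0$, $\alpha_{10111} = 2$, $\alpha_{01111} = 1$ and $\alpha_{11111} = 4$.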

In real applications, we can estimate the prior or derive a prior from past experience.

To implement Method 1 and Method 2, we select $c = 1$ and $\alpha_{2R} = 0.15$, respectively, resulting in $t^* = 0.5$ and $t^* = 0.9917$ for the two methods. For testing (10) under the given prior, we have $v_1 = 0.9919$ and $v_2 = 0.9947$. Consequently, by (8), $H_{01}$ and $H_{02}$ are both rejected by the two methods.
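The posterior probabilities above can be sketched numerically with the normal approximation derived in the Appendix. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes that the unlisted pattern 00000 contributes nothing to the posterior total $A$, and that the rule in (8) rejects a hypothesis when its posterior probability exceeds $t^*$. The function name and the list layout are hypothetical. With the counts and prior parameters as printed, the sketch reproduces $v_1 = 0.9919$ and gives a value for $v_2$ slightly above the reported 0.9947; both exceed both thresholds.

```python
from math import sqrt
from statistics import NormalDist

# Selection patterns in the order in which the text lists them.
PATTERNS = (
    "10000 01000 00100 00010 00001 11000 10100 10010 10001 01100 "
    "01010 01001 00110 00101 00011 11100 11010 11001 01110 01101 "
    "01011 00111 10110 10101 10011 11110 11101 11011 10111 01111 11111"
).split()
# Sample counts n and prior parameters alpha, aligned with PATTERNS.
N_COUNTS = [19, 5, 7, 6, 10, 3, 0, 0, 5, 1, 0, 1, 0, 8, 2,
            0, 1, 0, 0, 3, 0, 8, 0, 7, 0, 0, 3, 1, 3, 3, 4]
ALPHA = [13, 4, 8, 5, 11, 3, 2, 1, 3, 1, 0, 1, 1, 10, 3,
         0, 0, 0, 0, 2, 0, 6, 0, 3, 1, 0, 1, 0, 2, 1, 4]


def posterior_prob_greater(j, k):
    """Normal approximation to P(pi_j > pi_k | data) for a Dirichlet posterior."""
    post = [a + c for a, c in zip(ALPHA, N_COUNTS)]  # posterior parameters
    total = sum(post)  # A in the Appendix (assumes pattern 00000 adds nothing)
    # Posterior weight on "j selected, k not" and on "k selected, j not".
    b = sum(w for p, w in zip(PATTERNS, post)
            if p[j - 1] == "1" and p[k - 1] == "0")
    d = sum(w for p, w in zip(PATTERNS, post)
            if p[k - 1] == "1" and p[j - 1] == "0")
    mean = (b - d) / total
    # Dirichlet variances and covariance combined for the difference.
    var = (total * (b + d) - (b - d) ** 2) / (total ** 2 * (total + 1))
    return NormalDist().cdf(mean / sqrt(var))


# Thresholds reported in the text for the two methods.
T_STAR = {"Method 1": 0.5, "Method 2": 0.9917}

v1 = posterior_prob_greater(5, 3)  # H01: top response vs second-ranked response
v2 = posterior_prob_greater(5, 1)  # H02: top response vs third-ranked response
# Assumed decision rule: reject when the posterior probability exceeds t*.
rejections = {name: (v1 > t, v2 > t) for name, t in T_STAR.items()}
print(round(v1, 4), round(v2, 4))
print(rejections)
```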

In this case, the results show that, for these data, the conventional tests under the frequentist framework are ranking inconsistent, whereas the proposed methods are ranking consistent under the Bayesian framework.

7 Conclusion

Several methods for ranking the responses in a multiple responses question under the Bayesian framework are proposed in this paper. The specified ranking criterion and ranking error penalty are established. Compared with the methods under the frequentist framework, these methods are more convincing because they have the property of Bayesian ranking consistency.

From the simulation study, the methods using the loss functions $L_N$ and $L_{2R}$ are better than the method using the loss function $L_R$ if we consider the cases where $c$ and $\alpha_{2R}$ are selected such that the minimum expected penalty score occurs. However, in real applications, the selection of the constant $c$ in $L_N$ and $L_R$ may depend on the economic cost. The same is true of the selection of $\alpha_{2R}$: since $\alpha_{2R}$ provides a tolerance for the false discovery rate, its setup may depend on the allowed tolerance error. For that reason, researchers may need to take cost into account when selecting the most useful approach.

8 Appendix

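Proof of Theorem 1. Note that $\pi_{l+1} - \pi_l = (\pi_{l+1} - \pi_{(l+1)l}) - (\pi_l - \pi_{(l+1)l})$. For a given $n^*$, let

$$A = \sum_{i_j = 0 \text{ or } 1} (\alpha_{i_1 i_2 \dots i_k} + n_{i_1 i_2 \dots i_k}).$$

From the property of the Dirichlet distribution, the expectation and variance of $p_{i'_1 i'_2 \dots i'_k}$ are equal to

$$\frac{\alpha_{i'_1 i'_2 \dots i'_k} + n_{i'_1 i'_2 \dots i'_k}}{A} \quad \text{and} \quad \frac{(\alpha_{i'_1 i'_2 \dots i'_k} + n_{i'_1 i'_2 \dots i'_k})(A - \alpha_{i'_1 i'_2 \dots i'_k} - n_{i'_1 i'_2 \dots i'_k})}{A^2 (A + 1)},$$

respectively. The covariance of $p_{i'_1 i'_2 \dots i'_k}$ and $p_{i''_1 i''_2 \dots i''_k}$ is equal to

$$-\frac{(\alpha_{i'_1 i'_2 \dots i'_k} + n_{i'_1 i'_2 \dots i'_k})(\alpha_{i''_1 i''_2 \dots i''_k} + n_{i''_1 i''_2 \dots i''_k})}{A^2 (A + 1)}.$$

Therefore, from the above facts and straightforward calculation, the expectation of $\pi_{l+1} - \pi_l$ can be rewritten as

$$E(\pi_{l+1} - \pi_l) = E\big((\pi_{l+1} - \pi_{(l+1)l}) - (\pi_l - \pi_{(l+1)l})\big) = B$$

and the variance of $\pi_{l+1} - \pi_l$ can be rewritten as

$$\begin{aligned}
\mathrm{Var}(\pi_{l+1} - \pi_l) &= \mathrm{Var}\big((\pi_{l+1} - \pi_{(l+1)l}) - (\pi_l - \pi_{(l+1)l})\big) \\
&= \mathrm{Var}(\pi_{l+1} - \pi_{(l+1)l}) + \mathrm{Var}(\pi_l - \pi_{(l+1)l}) - 2\,\mathrm{Cov}\big(\pi_{l+1} - \pi_{(l+1)l},\ \pi_l - \pi_{(l+1)l}\big) \\
&= C. \qquad (11)
\end{aligned}$$

By normal approximation, we have

$$\begin{aligned}
v_l &= P(\pi_{l+1} - \pi_l > 0) \\
&= P\left( \frac{\pi_{l+1} - \pi_l - E(\pi_{l+1} - \pi_l)}{\sqrt{\mathrm{Var}(\pi_{l+1} - \pi_l)}} > \frac{-E(\pi_{l+1} - \pi_l)}{\sqrt{\mathrm{Var}(\pi_{l+1} - \pi_l)}} \right) \\
&= P\left( Z > \frac{-E(\pi_{l+1} - \pi_l)}{\sqrt{\mathrm{Var}(\pi_{l+1} - \pi_l)}} \right) \\
&= \Phi\left( \frac{E(\pi_{l+1} - \pi_l)}{\sqrt{\mathrm{Var}(\pi_{l+1} - \pi_l)}} \right). \qquad (12)
\end{aligned}$$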
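As a rough numerical check of the normal approximation in (12), the posterior probability can also be estimated by simulation. The sketch below is illustrative only and rests on two standard facts about the Dirichlet distribution: sums of components over disjoint sets of cells have Gamma representations with summed shape parameters, and the ordering of two such sums is unchanged by the common normalisation. The values of `b`, `d` and `total` are illustrative aggregated posterior parameters (the weight on "first response selected, second not", the reverse, and the sum of all parameters), and the function names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def normal_approx(b, d, total):
    """Phi(E / sqrt(Var)) for the difference of two aggregated Dirichlet sums."""
    mean = (b - d) / total
    var = (total * (b + d) - (b - d) ** 2) / (total ** 2 * (total + 1))
    return NormalDist().cdf(mean / sqrt(var))


def monte_carlo(b, d, draws=100_000, seed=0):
    """Estimate P(X_b > X_d) by comparing independent Gamma(b) and Gamma(d) draws."""
    rng = random.Random(seed)
    hits = sum(
        rng.gammavariate(b, 1.0) > rng.gammavariate(d, 1.0)
        for _ in range(draws)
    )
    return hits / draws


# Illustrative aggregated posterior parameters (assumed values).
b, d, total = 38, 20, 186
approx = normal_approx(b, d, total)
simulated = monte_carlo(b, d)
print(round(approx, 4), round(simulated, 4))
```

The simulated probability does not depend on `total`, whereas the approximation does through the variance, so comparing the two indicates how well (12) performs for a given posterior.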


Acknowledgements: The authors thank the editor and referees for helpful comments.

References

[1] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289-300.

[2] Agresti, A. and Liu, I.M. (1999). Modeling a categorical variable allowing arbitrarily many category choices. Biometrics 55, 936-943.

[3] Agresti, A. and Liu, I.M. (2001). Strategies for modeling a categorical variable allowing multiple category choices. Sociological Methods and Research 29, 403-434.

[4] Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis (2nd ed.), New York: Springer-Verlag.

[5] Bilder, C. R., Loughin, T. M. and Nettleton, D. (2000). Multiple marginal independence testing for pick any/c variables. Comm. Statist. Simulation Comput. 29(4), 1285-1316.

[6] Decady, Y. J. and Thomas, D. H. (2000). A simple test of association for contingency tables with multiple column responses. Biometrics 56, 893-896.

[7] Gopalan, R. and Berry, D. A. (1998). Bayesian multiple comparisons using Dirichlet process priors. Journal of the American Statistical Association 93, 1130-1139.

[8] Do, K., Muller, P. and Tang, F. (2005). A Bayesian mixture model for differential gene expression. J. R. Stat. Soc. C 54, 627-644.

[9] Pammer, S., Fong, D. K. H. and Arnold, S. F. (2000). Forecasting the Penetration of a New Product: A Bayesian Approach. Journal of Business and Economic Statistics 18, no. 4, 428-435.

[10] Gonen, M., Westfall, P. H. and Johnson, W. O. (2003). Bayesian Multiple Testing for Two-Sample Multivariate Endpoints. Biometrics 59, 76-82.

[11] Loughin, T. M. and Scherer, P. N. (1998). Testing for association in contingency tables with multiple column responses. Biometrics 54, 630-637.

[12] Muller, P., Parmigiani, G. and Rice, K. (2007). "FDR and Bayesian decision rules." In Bayesian Statistics 8 (Bernardo, J. et al. ed.). Oxford University Press.

[13] Miranda-Moreno, L. F., Labbe, A. and Fu, L. (2007). Bayesian multiple testing procedures for hotspot identification. Accident Analysis and Prevention 39, 1192-1201.

[14] Muller, P., Parmigiani, G., Robert, C. and Rousseau, J. (2004). Optimal sample size for multiple testing: The case of gene expression microarrays. Journal of the American Statistical Association 99, no. 468, 990-1001.

[15] Scott, J. (2009). "Nonparametric Bayesian multiple testing for longitudinal performance stratification." Annals of Applied Statistics.

[16] Scott, J. G. and Berger, J. O. (2006). An exploration of aspects of Bayesian multiple testing. J. Stat. Plann. Inference 136, no. 7, 2144-2162.

[17] Umesh, U. N. (1995). Predicting nominal variable relationships with multiple responses. Journal of Forecasting 14, 585-596.

[18] Wang, H. (2008a). Ranking responses in multiple responses questions. Journal of Applied Statistics 35, 465-474.

[19] Wang, H. (2008b). Exact confidence coefficients of simultaneous confidence intervals for multinomial proportions. Journal of Multivariate Analysis 99, 896-911.

Figure 1: The expected penalty scores of the three methods under the condition of Example 1. [Three panels showing the expected penalty score against $c$ for loss function $L_N$, against $c$ for loss function $L_R$, and against $\alpha_{2R}$ for loss function $L_{2R}$; the marked points are (1, 1.162), (0.33, 1.646) and (0.15, 1.173), respectively.]

Figure 2: The expected penalty scores of the three methods under the condition of Example 2. [Three panels showing the expected penalty score against $c$ for loss function $L_N$, against $c$ for loss function $L_R$, and against $\alpha_{2R}$ for loss function $L_{2R}$; the marked points are (1, 2.274), (0.54, 3.012) and (0.2, 2.534), respectively.]

