
Hypothesis Testing

In document: Highest Density Significance Test (pp. 11-14)

In hypothesis testing, there are two important ways to specify the hypotheses: the significance test and the Neyman-Pearson formulation. The Neyman-Pearson formulation considers a decision problem composed of two components, a null hypothesis and an alternative hypothesis. The significance test, on the other hand, considers only one hypothesis, the null hypothesis H0. A significance test may arise when H0 is drawn from a scientific guess and we have no basis for assuming an alternative hypothesis when the null hypothesis is not true. Another case is that the model is developed to be checked with new data by a selection process on a subset when H0 is true. The problem for the significance test is thus more general than the Neyman-Pearson formulation, in that when H0 is not true there are many possibilities for the true alternative.

1.2.1 Neyman-Pearson Formulation

Hypothesis testing under the Neyman-Pearson formulation considers a null hypothesis H0 and an alternative hypothesis H1, and so differs from the significance test, which considers only the null hypothesis. The Neyman-Pearson lemma uses the ratio of two likelihoods, one computed under H0 and one under H1. This leads to the most powerful test when H0 and H1 are both simple, and to uniformly most powerful tests for some composite hypotheses and some specific distributions. Hacking (1965) stated the law of likelihood as follows:

If one hypothesis, H1, implies that a random variable X takes the value x with probability f1(x), while another hypothesis, H2, implies that the probability is f2(x), then the observation X = x is evidence supporting H1 over H2 if f1(x) > f2(x), and the likelihood ratio, f1(x)/f2(x), measures the strength of the evidence.

The law of likelihood combines the likelihoods under H0 and under H1, and the resulting likelihood ratio helps statisticians decide whether to accept or reject the null hypothesis.
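As an illustration of the law of likelihood, the following minimal sketch computes the likelihood ratio f1(x)/f2(x) for two simple hypotheses. The normal densities, the observation `x_obs` and the helper names are assumptions chosen for illustration only; they do not come from the text.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def likelihood_ratio(x, f1, f2):
    """Return f1(x) / f2(x), read as the strength of evidence for H1 over H2."""
    return f1(x) / f2(x)

# Hypothetical simple hypotheses: H1 says X ~ N(0, 1), H2 says X ~ N(1, 1).
def f1(x):
    return normal_pdf(x, 0.0, 1.0)

def f2(x):
    return normal_pdf(x, 1.0, 1.0)

x_obs = 0.2  # hypothetical observation
lr = likelihood_ratio(x_obs, f1, f2)
supports_h1 = lr > 1.0  # the observation favours H1 when f1(x) > f2(x)
print(f"likelihood ratio = {lr:.4f}; supports H1 over H2: {supports_h1}")
```

A ratio above 1 is read as evidence for H1 over H2, and a ratio below 1 as evidence in the other direction.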

Besides its derived optimality, Neyman-Pearson hypothesis testing allows the sample size to be determined so as to achieve a desired power. The significance test cannot justify a sample size, because anything is possible when H0 is not true. Neyman-Pearson hypothesis testing has one further advantage: the likelihood ratio automatically yields the test statistic. This desirable property is not shared by most other hypothesis testing problems. With these results, namely the optimality property, sample size determination and test statistic derivation, the Neyman-Pearson lemma provides a desirable test when an alternative hypothesis can be specified. However, in many practical situations it is not appropriate to specify an alternative hypothesis. In this case, what can we do?
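To make the sample size determination concrete, here is a minimal sketch for one standard case: a one-sided test of H0: mu = mu0 against the simple alternative H1: mu = mu1 with mu1 > mu0, normal data and known standard deviation sigma. The setting, the function names and the numerical values are assumptions for illustration, not taken from the text.

```python
import math
from statistics import NormalDist

def sample_size_one_sided_z(mu0, mu1, sigma, alpha, power):
    """Smallest n for which the level-alpha one-sided z test reaches the given power.

    Assumes mu1 > mu0 and a known standard deviation sigma.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_power = NormalDist().inv_cdf(power)
    n = ((z_alpha + z_power) * sigma / (mu1 - mu0)) ** 2
    return math.ceil(n)

def power_one_sided_z(mu0, mu1, sigma, alpha, n):
    """Power at mu1 of the level-alpha test that rejects H0 for large sample means."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    shift = (mu1 - mu0) * math.sqrt(n) / sigma
    return 1.0 - NormalDist().cdf(z_alpha - shift)

# Hypothetical design: detect a shift of 0.5 standard deviations.
n_required = sample_size_one_sided_z(mu0=0.0, mu1=0.5, sigma=1.0, alpha=0.05, power=0.8)
achieved = power_one_sided_z(0.0, 0.5, 1.0, 0.05, n_required)
print(f"n = {n_required}, achieved power = {achieved:.4f}")
```

The calculation needs the alternative value mu1 as an input, which is why no analogous computation is available when only H0 is specified.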

1.2.2 Significance Hypothesis Test

Having developed over more than 200 years, the significance test is widely used in many branches of applied science. Early applications include the germ of the idea that Armitage (1983) claimed to find in a medical discussion from 1662, and Arbuthnot's (1710) observation that male births exceed female births. Important significance tests developed later include Karl Pearson's (1900) χ2 test and W. S. Gosset's (1908) Student's t test, the first small-sample test. Significance tests were given their modern justification and then popularized by Fisher, who derived most of the broadly adopted test statistics in a series of papers and books during the 1920s and 1930s. Traditionally, a significance test examines whether the given data are in concordance with H0. Practitioners generally formulate a null hypothesis of interest and specify a test statistic to judge whether the observation provides evidence against H0. The p-value is then defined as the probability, when H0 is true, that the test statistic is at least as extreme as its observed value:

p = Pr(T at least as extreme as the value observed | H0), where T represents the test statistic.
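The definition above can be sketched for a simple discrete case. Assume, purely for illustration, that T counts successes in n independent trials, that H0 states a success probability p0, and that "at least as extreme" means "at least as large"; the function name and the data are hypothetical.

```python
from math import comb

def binomial_upper_tail_p_value(t_obs, n, p0=0.5):
    """P(T >= t_obs | H0) for T ~ Binomial(n, p0), treating large T as extreme."""
    return sum(
        comb(n, k) * p0**k * (1.0 - p0) ** (n - k)
        for k in range(t_obs, n + 1)
    )

# Hypothetical data: 9 successes in 10 trials, with H0: p0 = 0.5.
p_value = binomial_upper_tail_p_value(t_obs=9, n=10, p0=0.5)
print(f"p-value = {p_value:.6f}")
```

A different choice of test statistic, or of what counts as extreme, generally gives a different p-value for the same data.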

A significance test always draws its conclusion in terms of the p-value. It interprets the p-value as evidence of whether the data are consistent with the null hypothesis, and concludes that the result is or is not significant. This differs from the Neyman-Pearson framework, which always concludes with acceptance or rejection of the null hypothesis. The significance test is often criticized because it is hard to give an acceptable reason for the chosen test statistic, although a sufficient statistic is usually recommended.

Another interpretation of the p-value is that it represents the strength of evidence against the null hypothesis. From this point of view, the extreme set has to contain the sample points at which the test statistic, or its absolute value, is at least as large as the observed value (see Schervish (1996) and Sackrowitz and Samuel-Cahn (1999)).

Schervish (1996), Royall (1997) and Donahue (1999) argued that the typical p-value cannot completely represent the statistical evidence. With this concern, the p-value has been redefined as the probability of the extreme points determined by the joint density function.
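One possible reading of a density-based p-value is sketched below for a discrete model: the extreme set consists of all sample points whose probability under H0 is no larger than that of the observed point. This is an assumption about how "extreme points determined by the joint density function" might be made concrete, not necessarily the definition used by the cited authors; the binomial model and the function names are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p0):
    """Probability of k successes in n trials under H0: success probability p0."""
    return comb(n, k) * p0**k * (1.0 - p0) ** (n - k)

def density_based_p_value(t_obs, n, p0):
    """Total H0 probability of the points that are no more probable than t_obs."""
    pmf_obs = binomial_pmf(t_obs, n, p0)
    # A small relative tolerance keeps exact ties inside the extreme set
    # despite floating-point rounding.
    threshold = pmf_obs * (1.0 + 1e-9)
    return sum(
        binomial_pmf(k, n, p0)
        for k in range(n + 1)
        if binomial_pmf(k, n, p0) <= threshold
    )

# Hypothetical data: 9 successes in 10 trials, with H0: p0 = 0.5.
p_density = density_based_p_value(t_obs=9, n=10, p0=0.5)
print(f"density-based p-value = {p_density:.6f}")
```

In this sketch the extreme set is fixed by the H0 probabilities alone, so no separate test statistic has to be chosen; for the symmetric example it contains the outcomes 0, 1, 9 and 10.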

In empirical studies, Hung et al. (1997) and Donahue (1999) proposed modifications of the significance test based on the Neyman-Pearson approach, where two hypotheses are assumed. They discussed the p-value within the Neyman-Pearson formulation even though the significance test assumes no alternative. In other words, they connected the Neyman-Pearson formulation to the significance test. This argument is different from the approach that we will introduce.

The likelihood function has been recognized as a mathematical representation of the evidence (Birnbaum 1962). However, without a consistent technique for defining evidence, the classical approaches to the significance test can confuse users, because a hypothesis may be significant under one test statistic and insignificant under another. An approach is therefore needed that interprets the statistical evidence more convincingly than the existing ones, ideally with interesting properties of its own.
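The dependence on the test statistic can be seen in a small sketch. Assume, for illustration only, that H0 states the observations are independent N(0, 1), and compare two test statistics on the same hypothetical data: the sample mean and the sum of squares. The function names and data are assumptions, and the chi-square tail formula used here is the closed form that holds only for an even number of observations.

```python
import math
from statistics import NormalDist

def p_value_mean(xs):
    """Two-sided p-value from the sample mean; under H0, sqrt(n) * mean ~ N(0, 1)."""
    n = len(xs)
    z = math.sqrt(n) * sum(xs) / n
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def p_value_sum_of_squares(xs):
    """Upper-tail p-value from sum(x^2), which is chi-square with n d.f. under H0.

    Uses the closed-form chi-square tail that is valid for even n only.
    """
    n = len(xs)
    if n % 2 != 0:
        raise ValueError("this closed form needs an even number of observations")
    half = sum(x * x for x in xs) / 2.0
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(n // 2))

# Hypothetical data: centred at zero but far more spread out than N(0, 1).
data = [-2.0, 2.0, -2.0, 2.0]
p_mean = p_value_mean(data)
p_ss = p_value_sum_of_squares(data)
print(f"mean statistic: p = {p_mean:.4f}; sum of squares: p = {p_ss:.4f}")
```

At the 0.05 level the sum-of-squares statistic declares these data significant while the mean statistic does not, although both tests address the same H0.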
