
Concept of the p-value

The p-value measures the evidence we have against the null hypothesis. For the observed sample data, the p-value is defined as the smallest significance level at which the null hypothesis can be rejected. The significance level is the largest probability of a type I error we are willing to accept in a hypothesis test; the smaller the level, the narrower the rejection region.

If the null hypothesis can be rejected at a specific significance level, it may also be rejected at a smaller level. However, once the level becomes too low, the test can no longer reject the null hypothesis. As mentioned for the Neyman-Pearson criterion, a significance level is set in advance to perform the hypothesis test, and the best test is chosen according to the outcomes.

The problem, however, is how large the level should be set. The significance level is a subjective parameter for any test. That is why we treat the p-value as evidence for deciding whether the null hypothesis is rejected or not.

How do we use the p-value to implement a hypothesis test? The p-value is the lowest significance level at which the null hypothesis is rejected. Thus, if the p-value is greater than the significance level, the null hypothesis cannot be rejected, because the chosen level lies below the critical threshold for rejecting H0; conversely, if the p-value is smaller than the significance level, the null hypothesis is rejected. The reason is that if the rejection region corresponding to the significance level already rejects at the critical point, then the smaller rejection region corresponding to the p-value certainly rejects it as well.
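As a minimal sketch of this decision rule in Python (the numeric values below are placeholders for illustration, not taken from any particular test):

```python
def decide(p_value: float, alpha: float) -> str:
    """Apply the p-value decision rule: reject H0 iff p-value < alpha."""
    return "reject H0" if p_value < alpha else "accept H0"

# Hypothetical values, for illustration only.
print(decide(0.01, 0.05))  # p-value below the level: reject H0
print(decide(0.20, 0.05))  # p-value at or above the level: accept H0
```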

Example

Before the example, we translate the above description of the p-value into mathematical form for convenience. The detection rule based on the p-value is

if p-value < α, then reject H0;
if p-value ≥ α, then accept H0,

where α denotes the significance level. The above rule is based on a right-tailed test, which means

H0 : p ≤ α,

H1 : p > α. (1)

See Fig. 1. The right-tailed test is adopted when the analyst expects the true value to be greater than the critical point (the assumed value). Besides the right-tailed test, there are other kinds of tailed tests in statistics. The left-tailed test is used when we assume the true value is smaller than the critical point; it can be written as

H0 : p ≥ α,

H1 : p < α, (2)

and the two-tailed test is applied when the direction of the deviation is difficult to predict, so that the test is

H0 : p = α,

H1 : p ≠ α. (3)

Figure 1: Right-tailed test.
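The three tailed tests determine how the p-value is computed from a test statistic. The sketch below assumes a standard normal test statistic z (as in the z-test used in the example that follows) and uses only the Python standard library, with the normal CDF Φ obtained from the error function:

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_test_p_value(z: float, tail: str) -> float:
    """p-value of a z statistic for a 'right', 'left', or 'two'-tailed test."""
    if tail == "right":   # H1: true value above the assumed one
        return 1.0 - phi(z)
    if tail == "left":    # H1: true value below the assumed one
        return phi(z)
    if tail == "two":     # H1: true value differs in either direction
        return 2.0 * (1.0 - phi(abs(z)))
    raise ValueError("tail must be 'right', 'left', or 'two'")
```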

Let us look at the example. Company A has a new food product Y whose weight must be printed on the tag outside the package. According to the factory's report, the standard weight of the product is 3 kg. The quality department surveys the weights to ensure that the tags correspond to the product. From the factory output, which is normally distributed, 49 products are randomly sampled. The sample mean (X̄) and the sample standard deviation (S) are 2.95 kg and 0.38 kg, respectively. The significance level is α = 0.05. The quality department wants to test whether the average weight printed on the tag is reliable.

Because X is normally distributed and the sample size is fairly large, we use the normal distribution to test the average. The problem can be written as

H0 : µ ≥ 3 (the product is standard and acceptable for consumers),

H1 : µ < 3 (the product is not standard). (4)

We use the left-tailed test for calculating the p-value because the alternative hypothesis concerns whether µ < 3. The p-value is

p = P(X̄ ≤ X̄0 | µ = 3) = P(X̄ ≤ 2.95 | µ = 3)
  = P( (X̄ − 3)/0.0542 ≤ (2.95 − 3)/0.0542 )
  = P(Z ≤ −0.92) = 0.178, (5)

where X̄ and X̄0 represent the sample mean as a random variable and its observed value (2.95 kg), respectively, and P(·) denotes the probability function. The number 0.0542 comes from S/√n = 0.38/√49. Finally, the p-value can be found from a standard normal table. Clearly p > α, so the null hypothesis cannot be rejected: the company maintains the product quality well, and the tag is reliable.
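The computation in (5) can be checked numerically. This sketch plugs in the numbers stated above (n = 49, X̄0 = 2.95, S = 0.38, µ0 = 3, α = 0.05) and computes the left-tailed p-value with the standard normal CDF:

```python
import math

n, xbar0, s, mu0, alpha = 49, 2.95, 0.38, 3.0, 0.05

se = s / math.sqrt(n)        # standard error S/sqrt(n) ≈ 0.0543 (0.0542 in the text, truncated)
z = (xbar0 - mu0) / se       # ≈ -0.92
p = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # left-tailed p-value ≈ 0.18

print(f"z = {z:.2f}, p-value = {p:.3f}")
print("reject H0" if p < alpha else "accept H0")  # accept H0 here, since p > alpha
```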

