
Modern One-Way ANOVA F Methods: Trimmed Means, One Step M-estimators and Bootstrap Methods



National Science Council (Executive Yuan) Research Project Final Report


Introduction

One of the most frequently used research methods in educational and psychological studies is comparing groups on measures that reflect the typical response. The sample mean is the most common way of representing the typical response of each group, and the analysis of variance (ANOVA) F test is a well-known method for comparing groups based on means. As is well known, the ANOVA F test rests on two crucial assumptions: normality and equal variances. In reality, however, violations of normality or equal variances are common (see Grissom, 2000; Micceri, 1989; Walberg, Strykowski, Rovai, & Hung, 1984; Wilcox, 1987, 1990).

During the past 40 years, a number of studies have indicated that violating these assumptions can cause serious problems in realistic situations, such as distorted Type I error rates, a loss of statistical power to detect effects, and inaccurate probability coverage (see Hampel, Ronchetti, Rousseeuw, & Stahel, 1986; Huber, 1981; Staudte & Sheather, 1990; Tukey, 1960; Wilcox, 2001, 2003). At the same time, studies of modern alternatives to the ANOVA F test have been published in statistics and quantitative psychology journals (see Alexander & Govern, 1994; Lix & Keselman, 1998; Luh & Guo, 1999; Wilcox, 1988, 1989, 1995a, 1997, 2002). Among these modern alternatives, Yuen's trimmed means and the one-step M-estimator based on Huber's Ψ are particularly significant in applied research. Both possess three desirable properties: qualitative robustness, quantitative robustness, and infinitesimal robustness (see Hampel et al., 1986; Huber, 1981). That is, these two measures of location are relatively insensitive to slight departures from normality.


… Wilcox, 2001, 2003, 2005). Also, theory, simulations, and experience with actual data show that power can be greatly increased by comparing trimmed means rather than means (e.g., Lix & Keselman, 1998; Rosenberger & Gasko, 2000; Wilcox, 1994, 1995b, 1995c; Wilcox, Keselman, & Kowalchuk, 1998).

There are, however, practical concerns regarding trimmed means. The first general concern is that the amount of trimming is fixed before analyzing the data, and 20% trimming is recommended (Rosenberger & Gasko, 2000; Wilcox, 1998a, 1998b, 2003). Yet when sampling is from a symmetric distribution with heavy tails (where outliers are common) or with light tails (where outliers are rare), 20% trimming can be unsatisfactory in terms of Type I error rates or power. The second general concern is that trimming is assumed to be symmetric; that is, the same numbers of smallest and largest observations are removed. Problems arise when sampling is from a highly skewed distribution, where outliers are more common in the right tail than in the left, or the reverse (see Wu, 2004, for results based on actual data). These two negative features of trimmed means can be obviated by using a robust M-estimator instead (e.g., Wilcox, 1992, 1993, 2001).

A one-step M-estimator removes observations that are empirically identified as outliers (see Huber, 1981; Staudte & Sheather, 1990). The outlier detection technique is based on the median and a measure of variation called the median absolute deviation (MAD). Consequently, one-step M-estimators allow asymmetric trimming: no observations at all may be removed, or a greater number may be removed from the right tail than from the left, or the reverse. In addition, the estimator has a relatively high breakdown point of .5. Because of these features, one-step M-estimators outperform trimmed means in certain situations (e.g., Wilcox, 2001, 2003; Wu, 2004). One concern, however, is that there is no known way of getting reasonably good control over the probability of a Type I error when samples are very small (Wilcox, 1992, 1993, 1997).
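As a concrete illustration, the following sketch (Python with NumPy, an illustrative stand-in for the SAS code used in the study; the function name is ours) flags outliers with the median/MAD rule just described, rescaling the MAD by .6745 and using k = 1.28 as in the one-step M-estimator detailed under Test Statistics.

    import numpy as np

    def mad_outlier_flags(x, k=1.28):
        # Flag X_i as an outlier when |X_i - M| / (MAD/.6745) > k,
        # where M is the sample median and MAD the median absolute deviation.
        x = np.asarray(x, dtype=float)
        m = np.median(x)
        madn = np.median(np.abs(x - m)) / 0.6745   # rescaled MAD
        z = (x - m) / madn
        return z < -k, z > k                       # lower-tail, upper-tail flags

    low, high = mad_outlier_flags([1, 3, 4, 6, 30])
    # Only the value 30 is flagged, and only in the upper tail,
    # illustrating how asymmetric trimming arises automatically.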

Recently, bootstrap resampling has been developed as a computer-intensive method (see Efron & Tibshirani, 1993; Westfall & Young, 1993; Wilcox, 2001, 2003). Bootstrap samples are drawn with replacement from the sample itself. The bootstrap is then used to estimate the sampling distribution of the test statistic under the null hypothesis and to calculate critical values. Theoretically, combining bootstrap methods with any measure of location (such as trimmed means or one-step M-estimators) can result in smaller standard errors and better Type I error rates (Wilcox, 1997). There are many variations of bootstrap methods (see Chernick, 1999; …).
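The core resampling step is easy to sketch (again Python/NumPy as an illustrative stand-in; the statistic and sample here are placeholders): draw B samples with replacement and collect the statistic from each to approximate its sampling distribution.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=30)          # an observed sample (placeholder data)
    B = 1000
    boot_stats = np.empty(B)
    for b in range(B):
        xb = rng.choice(x, size=x.size, replace=True)  # resample with replacement
        boot_stats[b] = xb.mean()    # any location statistic could be used here
    # boot_stats now approximates the sampling distribution of the statistic,
    # from which standard errors or critical values can be read off.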


Studies have demonstrated that improvement in Type I error control is possible by using bootstrap methods with any measure of location. Westfall and Young (1993) suggested that by combining bootstrap methods with trimmed means, researchers can obtain better Type I error control when testing for treatment group equality. They also found that the bootstrap-t is preferred over the percentile bootstrap when comparing groups based on means. Related asymptotic results are given by Hall and Padmanabhan (1992). Wilcox, Keselman, and Kowalchuk (1998) found empirical support for the use of robust estimators with bootstrapping in the one-way independent groups design. In addition, Wilcox (2003) indicated that when using measures of location relatively insensitive to outliers (such as trimmed means), the percentile bootstrap has distinct advantages over the bootstrap-t. However, bootstrap methods with one-step M-estimators, and the relative benefits of bootstrapping with trimmed means versus one-step M-estimators, have yet to be investigated thoroughly. Therefore, the main research questions of this study are: (a) under which situations does each method provide good control of Type I error rates; (b) under which situations do trimmed means outperform one-step M-estimators; (c) under which situations are one-step M-estimators preferred to trimmed means; and (d) what are the benefits of bootstrapping with these two measures of location?

Methods

Procedure

20% trimmed means, one-step M-estimators based on Huber's Ψ, and bootstrap methods (a bootstrap-t method and a percentile bootstrap method) with trimmed means and one-step M-estimators were used to compare groups under six distribution shapes. For each distribution shape, 48 simulation conditions with different sample sizes, different numbers of groups, and unequal variances were investigated (the characteristics of the simulated samples are described below). The methods were compared in terms of Type I error rates across these simulation cases.

Samples

In order to examine distributions spanning a wide range of skewness and kurtosis values, including extreme ones, observations were generated from the g-and-h distribution (Hoaglin, 1985). An observation X from the g-and-h distribution is generated by first generating Z from a standard normal distribution and setting

X = \frac{\exp(gZ) - 1}{g} \exp(hZ^2/2),

where g controls skewness and h controls the heaviness of the tails (for g = 0 the first factor is taken to be its limit, Z).
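A minimal generator following this formula (Python/NumPy as an illustrative stand-in for the SAS RANNOR code actually used; the function name is ours):

    import numpy as np

    def rgh(n, g, h, rng=np.random.default_rng()):
        # Draw n observations from the g-and-h distribution (Hoaglin, 1985).
        z = rng.standard_normal(n)
        if g == 0.0:
            x = z                          # limit of (exp(gZ) - 1)/g as g -> 0
        else:
            x = (np.exp(g * z) - 1.0) / g
        return x * np.exp(h * z**2 / 2.0)

    # The six shapes studied: (g, h) = (0, 0), (0, .5), (.5, 0), (.5, .5), (1, 0), (1, .5)
    sample = rgh(25, g=0.5, h=0.5)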


Four factors were manipulated: (a) distribution shape (six shapes), (b) number of groups (4, 6, and 10), (c) sample sizes, and (d) degree of variance heterogeneity. The six distribution shapes were (g=0, h=0), (g=0, h=0.5), (g=0.5, h=0), (g=0.5, h=0.5), (g=1, h=0), and (g=1, h=0.5); the corresponding skewness and kurtosis values of the g-and-h distribution are given in Wilcox, Keselman, and Kowalchuk (1998). The simulation data were generated with the SAS RANNOR function (SAS Institute, 1999). Table 1 contains the sample size and variance conditions.

Table 1. Investigated sample size and variance conditions

Sample sizes                                          Variances

(10, 15, 20, 25), (15, 20, 25, 30)                    (16, 9, 4, 1)
(10, 15, 20, 25), (15, 20, 25, 30)                    (1, 4, 9, 16)
(10, 15, 20, 25), (15, 20, 25, 30)                    (36, 1, 1, 1)
(10, 15, 20, 25), (15, 20, 25, 30)                    (1, 1, 1, 36)
(10, 15, 15, 20, 20, 25), (15, 20, 20, 25, 25, 30)    (16, 9, 9, 4, 1, 1)
(10, 15, 15, 20, 20, 25), (15, 20, 20, 25, 25, 30)    (1, 1, 4, 9, 9, 16)
(10, 15, 15, 20, 20, 25), (15, 20, 20, 25, 25, 30)    (36, 1, 1, 1, 1, 1)
(10, 15, 15, 20, 20, 25), (15, 20, 20, 25, 25, 30)    (1, 1, 1, 1, 1, 36)
(10, 10, 15, 15, 15, 20, 20, 20, 25, 25), (15, 15, 20, 20, 20, 25, 25, 25, 30, 30)    (16, 9, 9, 9, 4, 4, 4, 1, 1, 1)
(10, 10, 15, 15, 15, 20, 20, 20, 25, 25), (15, 15, 20, 20, 20, 25, 25, 25, 30, 30)    (1, 1, 1, 4, 4, 4, 9, 9, 9, 36)
(10, 10, 15, 15, 15, 20, 20, 20, 25, 25), (15, 15, 20, 20, 20, 25, 25, 25, 30, 30)    (36, 1, 1, 1, 1, 1, 1, 1, 1, 1)
(10, 10, 15, 15, 15, 20, 20, 20, 25, 25), (15, 15, 20, 20, 20, 25, 25, 25, 30, 30)    (1, 1, 1, 1, 1, 1, 1, 1, 1, 36)

Test Statistics

Comparing trimmed means

Let X_1, ..., X_n be a random sample from a single group and let X_{(1)} \le \cdots \le X_{(n)} be the order statistics. Let g = [\gamma n], where [\gamma n] denotes the greatest integer less than or equal to \gamma n and \gamma is the proportion trimmed from each tail, and let h = n - 2g be the effective sample size. The sample trimmed mean is

\bar{X}_t = \frac{1}{h} \sum_{i=g+1}^{n-g} X_{(i)},

and the sample Winsorized mean is

\bar{X}_w = \frac{1}{n} \sum_{i=1}^{n} Y_i,

where

Y_i = \begin{cases} X_{(g+1)} & \text{if } X_i \le X_{(g+1)}, \\ X_i & \text{if } X_{(g+1)} < X_i < X_{(n-g)}, \\ X_{(n-g)} & \text{if } X_i \ge X_{(n-g)}. \end{cases}

The sample Winsorized variance is

S_w^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{X}_w)^2.

Let n_j, h_j, S_{wj}^2, and \bar{X}_{tj} be the values of n, h, S_w^2, and \bar{X}_t for the jth group (j = 1, ..., J), and let

q_j = \frac{(n_j - 1) S_{wj}^2}{h_j (h_j - 1)}, \qquad w_j = \frac{1}{q_j}, \qquad U = \sum_j w_j, \qquad \tilde{X} = \frac{1}{U} \sum_j w_j \bar{X}_{tj},

A = \frac{1}{J - 1} \sum_j w_j (\bar{X}_{tj} - \tilde{X})^2,

B = \frac{2(J - 2)}{J^2 - 1} \sum_j \frac{(1 - w_j/U)^2}{h_j - 1},

and

F_t = \frac{A}{1 + B}.    (1)

The null hypothesis is rejected when F_t exceeds the 1-\alpha quantile of an F distribution with

\nu_1 = J - 1  and  \nu_2 = \left[ \frac{3}{J^2 - 1} \sum_j \frac{(1 - w_j/U)^2}{h_j - 1} \right]^{-1}.
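The statistic in equation (1) is straightforward to implement. Below is a sketch in Python (NumPy and SciPy assumed; the function names are ours, not from the report), with 20% trimming as the default.

    import numpy as np
    from scipy.stats import f

    def trim_stats(x, gamma=0.2):
        # Return n, h, the trimmed mean, and the Winsorized variance of one group.
        x = np.sort(np.asarray(x, dtype=float))
        n = x.size
        g = int(gamma * n)                     # observations trimmed per tail
        h = n - 2 * g                          # effective sample size
        xbar_t = x[g:n - g].mean()             # trimmed mean
        y = np.clip(x, x[g], x[n - g - 1])     # Winsorized sample
        return n, h, xbar_t, y.var(ddof=1)

    def yuen_welch_F(groups, gamma=0.2, alpha=0.05):
        # Heteroscedastic F-type test for equal trimmed means, equation (1).
        J = len(groups)
        n, h, xt, s2w = map(np.array, zip(*(trim_stats(x, gamma) for x in groups)))
        w = h * (h - 1) / ((n - 1) * s2w)      # w_j = 1/q_j
        U = w.sum()
        xtilde = (w * xt).sum() / U
        A = (w * (xt - xtilde) ** 2).sum() / (J - 1)
        C = ((1 - w / U) ** 2 / (h - 1)).sum()
        B = 2 * (J - 2) / (J ** 2 - 1) * C
        Ft = A / (1 + B)
        v1, v2 = J - 1, 1.0 / (3.0 * C / (J ** 2 - 1))
        return Ft, f.ppf(1 - alpha, v1, v2)    # reject when Ft exceeds the critical value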

Bootstrap-t method for trimmed means

The strategy is to use the shifted empirical distributions to estimate an appropriate critical value for the test statistic. First, for the jth group, subtract the sample trimmed mean from each of the observed values, Y_{ij} = X_{ij} - \bar{X}_{tj}. Next, for the jth group, generate a bootstrap sample of size n_j by sampling with replacement from the Y_{ij} values; the bootstrap observations are denoted Y_{ij}^*, i = 1, ..., n_j, j = 1, ..., J. The value of the test statistic F_t given by equation (1) and based on the Y_{ij}^* is labeled F_t^*. Repeat this process B times, yielding F_{t1}^*, ..., F_{tB}^*. Next, put these B values in ascending order, yielding F_{t(1)}^* \le \cdots \le F_{t(B)}^*, and let u be the value of (1 - \alpha)B rounded to the nearest integer. The null hypothesis of equal trimmed means is rejected when F_t \ge F_{t(u)}^*.
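A sketch of this procedure (Python/NumPy; it reuses the yuen_welch_F function from the earlier sketch, and B = 599 is a common choice, not a value prescribed by the report):

    import numpy as np

    def bootstrap_t_trimmed(groups, gamma=0.2, B=599, alpha=0.05,
                            rng=np.random.default_rng()):
        # Bootstrap-t test of equal trimmed means across J groups.
        Ft, _ = yuen_welch_F(groups, gamma)    # observed statistic, equation (1)
        centered = []
        for x in groups:                       # shift each group so H0 holds
            x = np.sort(np.asarray(x, dtype=float))
            g = int(gamma * x.size)
            centered.append(x - x[g:x.size - g].mean())
        boot = np.empty(B)
        for b in range(B):
            star = [rng.choice(y, size=y.size, replace=True) for y in centered]
            boot[b] = yuen_welch_F(star, gamma)[0]
        u = int(round((1 - alpha) * B))        # index of F*_{t(u)}
        crit = np.sort(boot)[u - 1]
        return Ft, crit, Ft >= crit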

Percentile bootstrap method for trimmed means

There are many variations of the percentile bootstrap method, but the test statistic used here is derived from Schrader and Hettmansperger (1980). Let \theta denote the population trimmed mean (\mu_t) and let \hat{\theta}_j be the estimate of \theta based on the data from the jth group (j = 1, ..., J). The test statistic is

H = \frac{1}{N} \sum_{j=1}^{J} n_j (\hat{\theta}_j - \bar{\theta})^2,

where N = \sum n_j and \bar{\theta} = \frac{1}{J} \sum \hat{\theta}_j. To determine a critical value, set Y_{ij} = X_{ij} - \hat{\theta}_j. Then generate bootstrap samples from the Y_{ij} values of each group and compute the test statistic based on the bootstrap samples, yielding H^*. Repeat this process B times, resulting in H_1^*, ..., H_B^*, and put these B values in ascending order, yielding H_{(1)}^* \le \cdots \le H_{(B)}^*. An estimate of the critical value is H_{(u)}^*, where u = (1 - \alpha)B rounded to the nearest integer. The null hypothesis is rejected when H \ge H_{(u)}^*.
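A sketch of the percentile bootstrap with this H statistic (Python/NumPy with SciPy's trim_mean; the function names are ours). Passing a different location estimator, such as the one-step M-estimator described next, gives the corresponding percentile bootstrap test for that estimator:

    import numpy as np
    from scipy.stats import trim_mean

    def schrader_hett_H(groups, est):
        # H = (1/N) * sum_j n_j * (theta_hat_j - theta_bar)^2
        theta = np.array([est(x) for x in groups])
        n = np.array([len(x) for x in groups])
        return (n * (theta - theta.mean()) ** 2).sum() / n.sum()

    def percentile_boot(groups, est, B=599, alpha=0.05,
                        rng=np.random.default_rng()):
        H = schrader_hett_H(groups, est)
        # Shift each group by its own estimate so the null hypothesis holds.
        centered = [np.asarray(x, dtype=float) - est(x) for x in groups]
        boot = np.sort([schrader_hett_H(
            [rng.choice(y, size=y.size, replace=True) for y in centered], est)
            for _ in range(B)])
        u = int(round((1 - alpha) * B))
        return H, boot[u - 1], H >= boot[u - 1]

    # Example with the 20% trimmed mean as the measure of location:
    rng0 = np.random.default_rng(2)
    groups = [rng0.normal(size=20) for _ in range(4)]
    H, crit, reject = percentile_boot(groups, est=lambda x: trim_mean(x, 0.2))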

Comparing one-step M-estimators

Let X_1, ..., X_n be a random sample and let \Psi be an odd, nondecreasing function. An M-estimate of location is any quantity \hat{\mu}_m satisfying

\sum_{i=1}^{n} \Psi\!\left( \frac{X_i - \hat{\mu}_m}{\hat{\tau}} \right) = 0,

where \hat{\tau} = MAD/.6745 is the scale estimate based on the median absolute deviation. One step of Newton's method is employed to determine \hat{\mu}_m, using the sample median M as a starting value (see Huber, 1981; Staudte & Sheather, 1990). With Huber's \Psi and k = 1.28, the resulting estimate of location is

\hat{\mu}_{os} = \frac{k \hat{\tau} (i_2 - i_1) + \sum_{i = i_1 + 1}^{n - i_2} X_{(i)}}{n - i_1 - i_2},

where i_1 is the number of observations X_i satisfying (X_i - M)/\hat{\tau} < -k and i_2 is the number of observations satisfying (X_i - M)/\hat{\tau} > k. For each pair j < h (where there are J independent groups), the test statistic is

H_{jh} = \frac{\hat{\theta}_j - \hat{\theta}_h}{(\hat{\sigma}_j^2 + \hat{\sigma}_h^2)^{1/2}},

where \hat{\sigma}_j and \hat{\sigma}_h are the estimated standard errors of \hat{\theta}_j and \hat{\theta}_h, respectively, and the null hypothesis is rejected when H exceeds the critical value q_{1-\alpha}.

Bootstrap-t method for one-step M-estimators

The procedure is the same as the bootstrap-t method for trimmed means, except that the measure of location is the one-step M-estimator.

Percentile bootstrap method for one-step M-estimators

The procedure is the same as the percentile bootstrap method for trimmed means, except that the measure of location is the one-step M-estimator.

Results


Tables 2-4 report estimated Type I error probabilities when sampling from the g-and-h distributions, including the normal distribution as a special case. … obtained, especially when sampling from skewed distributions.

Regarding bootstrapping versus non-bootstrapping methods, the results show that all of the bootstrapping methods control Type I error rates in most simulated conditions. In contrast, the non-bootstrapping methods perform well except under the skewed distributions. With respect to the benefits of bootstrapping with trimmed means and one-step M-estimators, the bootstrap-t with trimmed means outperforms the percentile bootstrap with trimmed means in most simulated situations, while the reverse holds for bootstrapping with one-step M-estimators. Moreover, because of the symmetric-trimming limitation of trimmed means, bootstrapping (bootstrap-t and percentile bootstrap) with one-step M-estimators provides better control of Type I error than bootstrapping with trimmed means under skewed conditions.

Conclusion


References

Alexander, R. A., & Govern, D. M. (1994). A new and simpler approximation for ANOVA under variance heterogeneity. Journal of Educational Statistics, 19, 91-101.

Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144-152.

Chernick, M. R. (1999). Bootstrap Methods: A Practitioner's Guide. New York: Wiley.

Davison, A. C., & Hinkley, D. V. (1997). Bootstrap Methods and Their Application. Cambridge: Cambridge University Press.

Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7(1), 1-26.

Efron, B., & Tibshirani, R. (1993). An Introduction to the Bootstrap. New York: Chapman & Hall.

Grissom, R. J. (2000). Heterogeneity of variance in clinical data. Journal of Consulting and Clinical Psychology, 68, 155-165.

Hall, P., & Padmanabhan, A. R. (1992). On the bootstrap and the trimmed mean. Journal of Multivariate Analysis, 41, 132-153.

Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., & Stahel, W. A. (1986). Robust Statistics: The Approach Based on Influence Functions. New York: Wiley.

Hoaglin, D. C. (1985). Summarizing shape numerically: The g-and-h distributions. In D. Hoaglin, F. Mosteller, & J. Tukey (Eds.), Exploring Data Tables, Trends, and Shapes (pp. 461-513). New York: Wiley.

Huber, P. J. (1981). Robust Statistics. New York: Wiley.

Lix, L. M., & Keselman, H. J. (1998). To trim or not to trim: Tests of location equality under heteroscedasticity and nonnormality. Educational and Psychological Measurement, 58(3), 409-429.

Lunneborg, C. E. (2000). Data Analysis by Resampling: Concepts and Applications. Pacific Grove, CA: Duxbury.

Luh, W. M., & Guo, J. H. (1999). A powerful transformation trimmed mean method for one-way fixed effects ANOVA model under non-normality and inequality of variances. British Journal of Mathematical and Statistical Psychology, 52, 303-320.

Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105(1), 156-166.

Quenouille, M. H. (1949). Approximate tests of correlation in time-series. Journal of the Royal Statistical Society Series B, 11, 69-84.


Rosenberger, J. L., & Gasko, M. (2000). Comparing location estimators: Trimmed means, medians, and trimean. In D. C. Hoaglin (Ed.), Understanding Robust and Exploratory Data Analysis (pp. 297-336). New York: John Wiley & Sons.

SAS Institute, Inc. (1999). SAS Basic Software, Version 8, Cary, NC: SAS.

Schrader, R. M., & Hettmansperger, T. P. (1980). Robust analysis of variance. Biometrika, 67, 93-101.

Staudte, R. G., & Sheather, S. J. (1990). Robust Estimation and Testing. New York: Wiley.

Tukey, J. W. (1958). Bias and confidence in not-quite large samples. Annals of Mathematical Statistics, 29, 614.

Tukey, J. W. (1960). A survey of sampling from contaminated distributions. In I. Olkin et al. (Eds.), Contributions to Probability and Statistics (pp.448-485). Stanford, CA: Stanford University Press.

Tukey, J. W., & McLaughlin, D. H. (1963). Less vulnerable confidence and significance procedures for location based on a single sample: Trimming/Winsorization 1. Sankhya A, 25, 331-352.

Walberg, H. J., Strykowski, B. F., Rovai, E., & Hung, S. S. (1984). Exceptional performance. Review of Educational Research, 54(1), 87-112.

Westfall, P. H., & Young, S. S. (1993). Resampling-Based Multiple Testing. New York: Wiley.

Wilcox, R. R. (1987). New Statistical Procedures for the Social Sciences: Modern Solutions to Basic Problems. Hillsdale, NJ: Lawrence Erlbaum.

Wilcox, R. R. (1988). A new alternative to the ANOVA F and new results on James's second-order method. British Journal of Mathematical and Statistical Psychology, 41, 109-117.

Wilcox, R. R. (1989). Adjusting for unequal variances when comparing means in one-way and two-way fixed effects ANOVA models. Journal of Educational Statistics, 14, 269-278.

Wilcox, R. R. (1990). Comparing the means of two independent groups. Biometrical Journal, 32, 771-780.

Wilcox, R. R. (1992). Comparing one-step M-estimators of location corresponding to two independent groups. Psychometrika, 57(1), 141-154.

Wilcox, R. R. (1993). Comparing one-step M-estimators of location when there are more than two groups. Psychometrika, 58(1), 71-78.

(13)

Wilcox, R. R. (1995a). ANOVA: The practical importance of heteroscedastic methods, using trimmed means versus means, and designing simulation studies. British Journal of Mathematical and Statistical Psychology, 48, 99-114.

Wilcox, R. R. (1995b). Comparing two independent groups via multiple quantiles. Statistician, 44(1), 91-99.

Wilcox, R. R. (1995c). Three multiple comparison procedures for trimmed means. Biometrical Journal, 37, 643-656.

Wilcox, R. R. (1996). Statistics for the Social Sciences. San Diego, CA: Academic Press.

Wilcox, R. R. (1997). Introduction to Robust Estimation and Hypothesis Testing. San Diego, CA: Academic Press.

Wilcox, R. R. (1998a). How many discoveries have been lost by ignoring modern statistical methods? American Psychologist, 53(3), 300-314.

Wilcox, R. R. (1998b). The goals and strategies of robust methods. British Journal of Mathematical and Statistical Psychology, 51, 1-39.

Wilcox, R. R. (2001). Fundamentals of Modern Statistical Methods: Substantially Improving Power and Accuracy. New York: Springer-Verlag.

Wilcox, R. R. (2002). Understanding the practical advantages of modern ANOVA methods. Journal of Clinical Child and Adolescent Psychology, 31(3), 399-412.

Wilcox, R. R. (2003). Applied Contemporary Statistical Methods. New York: Academic Press.

Wilcox, R. R. (2005). Introduction to Robust Estimation and Hypothesis Testing (2nd ed.). San Diego, CA: Academic Press.

Wilcox, R. R., Keselman, H. J., & Kowalchuk, R. K. (1998). Can tests for treatment group equality be improved? The bootstrap and trimmed means conjecture. British Journal of Mathematical and Statistical Psychology, 51, 123-134.

Wu, P.-C. (2004). Measure of location: Comparing means, trimmed means, one-step M-estimators and modified one-step M-estimators under non-normality. Chinese Journal of Psychology, 46(1), 29-47.

