National Science Council (Executive Yuan) Research Project Final Report
Modern One-Way ANOVA F Methods: Trimmed Means, One Step
M-estimators and Bootstrap Methods
Introduction
One of the most frequently used research methods in educational and psychological studies is to compare groups on measures that reflect the typical response. The sample mean of each group is the most common way of representing the typical response, and the analysis of variance (ANOVA) F test is a well-known method for comparing groups based on means. As is well known, the ANOVA F test rests on two crucial assumptions: normality and equal variances. In reality, however, violations of the assumption of normality or of equal variances are common (see Grissom, 2000; Micceri, 1989; Walberg, Strykowski, Rovai, & Hung, 1984; Wilcox, 1987, 1990).
During the past 40 years, a number of studies have indicated that violating these assumptions can cause serious problems in realistic situations, such as distorted Type I error rates, a loss of statistical power to detect effects, and inaccurate probability coverage (see Hampel, Ronchetti, Rousseeuw, & Stahel, 1986; Huber, 1981; Staudte & Sheather, 1990; Tukey, 1960; Wilcox, 2001, 2003). At the same time, studies of modern alternatives to the ANOVA F test have appeared in statistics and quantitative psychology journals (see Alexander & Govern, 1994; Lix & Keselman, 1998; Luh & Guo, 1999; Wilcox, 1988, 1989, 1995a, 1997, 2002). Among these modern alternatives, Yuen's trimmed means and the one-step M-estimator based on Huber's ψ are particularly significant in applied research. Both have three desirable properties: qualitative robustness, quantitative robustness, and infinitesimal robustness (see Hampel et al., 1986; Huber, 1981). That is, these two measures of location are relatively insensitive to slight departures from normality.
These robustness properties are discussed at length elsewhere (Wilcox, 2001, 2003, 2005). Also, theory, simulations, and experience with actual data show that power can be greatly increased by comparing trimmed means rather than means (e.g., Lix & Keselman, 1998; Rosenberger & Gasko, 2000; Wilcox, 1994, 1995b, 1995c; Wilcox, Keselman, & Kowalchuk, 1998).
There are, however, practical concerns regarding trimmed means. The first is that the amount of trimming is fixed before analyzing the data, with 20% trimming recommended (Rosenberger & Gasko, 2000; Wilcox, 1998a, 1998b, 2003). Yet when sampling is from a symmetric distribution with heavy tails (where outliers are common) or with light tails (where outliers are rare), 20% trimming can be unsatisfactory in terms of Type I error rates or power. The second is that trimming is assumed to be symmetric; that is, the same number of smallest and largest observations is removed. Problems arise when sampling is from a highly skewed distribution, where outliers are more common in the right tail than in the left, or the reverse (see Wu, 2004, for results using actual data). These two negative features of trimmed means can be avoided by using a robust M-estimator instead (e.g., Wilcox, 1992, 1993, 2001).
A one-step M-estimator removes observations that are empirically identified as outliers (see Huber, 1981; Staudte & Sheather, 1990). The outlier detection technique is based on the median and a measure of variation called the median absolute deviation (MAD). Thus, one-step M-estimators allow asymmetric trimming: no trimming at all, or a greater amount of trimming in the right tail than in the left, or the reverse. In addition, the estimator has a relatively high breakdown point of .5. Because of these features, one-step M-estimators outperform trimmed means in certain situations (e.g., Wilcox, 2001, 2003; Wu, 2004). One concern, however, is that there is no known way of getting reasonably good control over the probability of a Type I error when the sample size is very small (Wilcox, 1992, 1993, 1997).
Recently, bootstrap resampling has been developed as a computer-intensive method (see Efron & Tibshirani, 1993; Westfall & Young, 1993; Wilcox, 2001, 2003). Bootstrap samples are drawn with replacement from the sample itself. The bootstrap is then used to estimate the sampling distribution of the test statistic under the null hypothesis and to compute critical values. Theoretically, combining bootstrap methods with any measure of location (such as trimmed means or one-step M-estimators) can yield smaller standard errors and better Type I error rates (Wilcox, 1997). There are many variations of bootstrap methods (see, e.g., Chernick, 1999).
Several studies have demonstrated that improvement in Type I error control is possible by combining bootstrap methods with a robust measure of location. Westfall and Young (1993) suggest that by combining bootstrap methods with trimmed means, researchers can obtain better Type I error control when testing for treatment group equality. They also found that the bootstrap-t is preferred over the percentile bootstrap when comparing groups based on means. Asymptotic results are given by Hall and Padmanabhan (1992). Wilcox, Keselman, and Kowalchuk (1998) found empirical support for the use of robust estimators with bootstrapping in one-way independent-groups designs. In addition, Wilcox (2003) indicated that, when using measures of location that are relatively insensitive to outliers (such as trimmed means), the percentile bootstrap has distinct advantages over the bootstrap-t. However, bootstrap methods combined with one-step M-estimators, and the relative benefits of bootstrapping with trimmed means versus one-step M-estimators, need further investigation. The main research questions of this study are therefore: (a) under which situations does each method provide good control of Type I error rates; (b) under which situations do trimmed means outperform one-step M-estimators; (c) under which situations are one-step M-estimators preferred to trimmed means; and (d) what are the benefits of bootstrapping with these two measures of location?
Methods

Procedure
20% trimmed means, one-step M-estimators based on Huber's ψ, and bootstrap methods (a bootstrap-t method and a percentile bootstrap method) combined with trimmed means and with one-step M-estimators were used to compare groups under six distribution shapes. For each distribution shape, 48 simulation conditions with different sample sizes, different numbers of groups, and unequal variances were investigated (the characteristics of the simulated samples are described below). The outcomes were compared in terms of Type I error rates under these simulation conditions.
Samples
In order to select distributions spanning a wide range of skewness and kurtosis, observations were generated from the g-and-h distribution (Hoaglin, 1985), which includes extreme values of both skewness and kurtosis. An observation X from the g-and-h distribution is generated by first generating Z from a standard normal distribution and setting
$$X = \frac{\exp(gZ) - 1}{g}\,\exp\!\left(\frac{hZ^2}{2}\right),$$

with the limiting case $X = Z\exp(hZ^2/2)$ when $g = 0$.
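As an illustration, the transformation above can be coded as follows. This is a sketch in Python with NumPy; the report's own simulations were run in SAS, so the function name and interface here are illustrative only.

```python
import numpy as np

def g_and_h(g, h, size, rng=None):
    """Generate observations from the g-and-h distribution (Hoaglin, 1985).

    For g != 0:  X = ((exp(g*Z) - 1) / g) * exp(h * Z**2 / 2)
    For g == 0 the limiting form is:  X = Z * exp(h * Z**2 / 2)
    where Z is standard normal.
    """
    rng = np.random.default_rng(rng)
    z = rng.standard_normal(size)
    tail = np.exp(h * z**2 / 2.0)   # heavy-tail component controlled by h
    if g == 0:
        return z * tail
    return ((np.exp(g * z) - 1.0) / g) * tail

# (g=0, h=0) recovers the standard normal; (g=1, h=0.5) is highly
# skewed and heavy-tailed, the most extreme shape used in this study.
x = g_and_h(g=0.5, h=0.5, size=10_000, rng=1)
```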
Four variables were manipulated: (a) distribution shape (six shapes); (b) number of groups (4, 6, and 10); (c) sample size; and (d) degree of variance heterogeneity. The six distribution shapes were (g=0, h=0), (g=0, h=0.5), (g=0.5, h=0), (g=0.5, h=0.5), (g=1, h=0), and (g=1, h=0.5). The corresponding skewness and kurtosis of the g-and-h distribution are reported in Wilcox, Keselman, and Kowalchuk (1998). The simulation data were generated with the SAS RANNOR function (SAS Institute, 1999). Table 1 contains the sample size and variance conditions.
Table 1. Investigated sample size and variance conditions
Sample sizes                                              Variances
(10, 15, 20, 25) and (15, 20, 25, 30)                     (16, 9, 4, 1); (1, 4, 9, 16)
(10, 15, 20, 25) and (15, 20, 25, 30)                     (36, 1, 1, 1); (1, 1, 1, 36)
(10, 15, 15, 20, 20, 25) and (15, 20, 20, 25, 25, 30)     (16, 9, 9, 4, 1, 1); (1, 1, 4, 9, 9, 16)
(10, 15, 15, 20, 20, 25) and (15, 20, 20, 25, 25, 30)     (36, 1, 1, 1, 1, 1); (1, 1, 1, 1, 1, 36)
(10, 10, 15, 15, 15, 20, 20, 20, 25, 25) and (15, 15, 20, 20, 20, 25, 25, 25, 30, 30)     (16, 9, 9, 9, 4, 4, 4, 1, 1, 1); (1, 1, 1, 4, 4, 4, 9, 9, 9, 36)
(10, 10, 15, 15, 15, 20, 20, 20, 25, 25) and (15, 15, 20, 20, 20, 25, 25, 25, 30, 30)     (36, 1, 1, 1, 1, 1, 1, 1, 1, 1); (1, 1, 1, 1, 1, 1, 1, 1, 1, 36)

Test Statistics
Comparing trimmed means
Let $X_1, \ldots, X_n$ be a random sample from a single group and $X_{(1)} \le \ldots \le X_{(n)}$ the order statistics. Let $g = [\gamma n]$, where $[\gamma n]$ is the greatest integer less than or equal to $\gamma n$, and let $h = n - 2g$ be the effective sample size. The sample trimmed mean is

$$\bar{X}_t = \frac{1}{h}\sum_{i=g+1}^{n-g} X_{(i)},$$

and the sample Winsorized mean is $\bar{X}_w = \frac{1}{n}\sum_{i=1}^{n} Y_i$, where

$$Y_i = \begin{cases} X_{(g+1)} & \text{if } X_i \le X_{(g+1)} \\ X_i & \text{if } X_{(g+1)} < X_i < X_{(n-g)} \\ X_{(n-g)} & \text{if } X_i \ge X_{(n-g)}. \end{cases}$$

The sample Winsorized variance is

$$S_w^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(Y_i - \bar{X}_w\right)^2.$$

Let $n_j$, $h_j$, $S_{wj}^2$, and $\bar{X}_{tj}$ be the values of $n$, $h$, $S_w^2$, and $\bar{X}_t$ for the $j$th group ($j = 1, \ldots, J$), and let

$$q_j = \frac{(n_j-1)S_{wj}^2}{h_j(h_j-1)}, \quad w_j = \frac{1}{q_j}, \quad U = \sum_j w_j, \quad \tilde{X} = \frac{1}{U}\sum_j w_j \bar{X}_{tj},$$

$$A = \frac{1}{J-1}\sum_j w_j\left(\bar{X}_{tj} - \tilde{X}\right)^2, \quad B = \frac{2(J-2)}{J^2-1}\sum_j \frac{(1-w_j/U)^2}{h_j-1},$$

$$F_t = \frac{A}{1+B}. \tag{1}$$

The null hypothesis is rejected when $F_t$ exceeds the $1-\alpha$ quantile of an F distribution with $\nu_1 = J-1$ and

$$\nu_2 = \left[\frac{3}{J^2-1}\sum_j \frac{(1-w_j/U)^2}{h_j-1}\right]^{-1}.$$

Bootstrap-t method for trimmed means
The strategy is to use shifted empirical distributions to estimate an appropriate critical value for the test statistic. First, for the $j$th group, subtract the sample trimmed mean from each of the observed values ($Y_{ij} = X_{ij} - \bar{X}_{tj}$). Next, for the $j$th group, generate a bootstrap sample of size $n_j$ by resampling with replacement from the $Y_{ij}$ values; denote it $Y_{ij}^*$, $i = 1, \ldots, n_j$, $j = 1, \ldots, J$. The value of the test statistic $F_t$ given by equation (1), computed from the $Y_{ij}^*$, is labeled $F_t^*$. Repeat this process $B$ times, yielding $F_{t1}^*, \ldots, F_{tB}^*$. Next, put these $B$ values in ascending order, yielding $F_{t(1)}^* \le \ldots \le F_{t(B)}^*$, and let $u$ be the value of $(1-\alpha)B$ rounded to the nearest integer. The null hypothesis of equal trimmed means is rejected when $F_t \ge F_{t(u)}^*$.
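The trimmed-means statistic of equation (1) and the bootstrap-t step above can be sketched as follows. This is a minimal Python/NumPy illustration under my own function names, not the original SAS code; B = 599 is a commonly used default, not a value stated in this report.

```python
import numpy as np

def trim_stats(x, gamma=0.2):
    """Return (trimmed mean, Winsorized variance, n, effective n) for one group."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    g = int(gamma * n)                      # g = [gamma * n]
    h = n - 2 * g                           # effective sample size
    xt = x[g:n - g].mean()                  # trimmed mean
    w = np.clip(x, x[g], x[n - g - 1])      # Winsorized sample
    s2w = w.var(ddof=1)                     # Winsorized variance
    return xt, s2w, n, h

def F_t(groups, gamma=0.2):
    """Heteroscedastic trimmed-means F test statistic (equation 1)."""
    J = len(groups)
    stats = [trim_stats(x, gamma) for x in groups]
    xt = np.array([s[0] for s in stats])
    q = np.array([(s[2] - 1) * s[1] / (s[3] * (s[3] - 1)) for s in stats])
    hh = np.array([s[3] for s in stats])
    w = 1.0 / q
    U = w.sum()
    xtilde = (w * xt).sum() / U
    A = (w * (xt - xtilde) ** 2).sum() / (J - 1)
    B = (2 * (J - 2) / (J ** 2 - 1)) * ((1 - w / U) ** 2 / (hh - 1)).sum()
    return A / (1 + B)

def bootstrap_t_crit(groups, alpha=0.05, B=599, gamma=0.2, rng=None):
    """Bootstrap-t critical value: resample the empirically centered groups."""
    rng = np.random.default_rng(rng)
    centered = [np.asarray(x) - trim_stats(x, gamma)[0] for x in groups]
    boot = [F_t([rng.choice(y, size=len(y), replace=True) for y in centered], gamma)
            for _ in range(B)]
    boot.sort()
    u = round((1 - alpha) * B)
    return boot[u - 1]   # F_t*(u); reject when F_t >= this value
```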
The percentile bootstrap method for trimmed means
There are many variations of the percentile bootstrap method; the test statistic used here is derived from Schrader and Hettmansperger (1980). Let $\theta$ be the population trimmed mean and $\hat{\theta}_j$ its estimate based on the data from the $j$th group ($j = 1, \ldots, J$). The test statistic is

$$H = \frac{1}{N}\sum_{j=1}^{J} n_j\left(\hat{\theta}_j - \bar{\theta}\right)^2,$$

where $N = \sum n_j$ and $\bar{\theta} = \frac{1}{J}\sum_{j=1}^{J}\hat{\theta}_j$. To determine a critical value, set $Y_{ij} = X_{ij} - \hat{\theta}_j$. Then generate bootstrap samples from each group from the $Y_{ij}$ values and compute the test statistic on the bootstrap samples, yielding $H^*$. Repeat this process $B$ times, resulting in $H_1^*, \ldots, H_B^*$, and put these $B$ values in order, yielding $H_{(1)}^* \le \ldots \le H_{(B)}^*$. An estimate of the critical value is $H_{(u)}^*$, where $u = (1-\alpha)B$ rounded to the nearest integer. The null hypothesis is rejected when $H \ge H_{(u)}^*$.

Comparing one step M-estimators
Let $X_1, \ldots, X_n$ be a random sample, and let $\psi$ be an odd, nondecreasing function. An M-estimate of location is any quantity $\hat{\mu}_m$ satisfying

$$\sum_{i=1}^{n}\psi\!\left(\frac{X_i - \hat{\mu}_m}{\hat{\omega}}\right) = 0,$$

where $\hat{\omega}$ is an estimate of scale. One step of Newton's method, with the sample median $M$ as starting value, is employed to determine $\hat{\mu}_m$ (see Huber, 1981; Staudte & Sheather, 1990). The resulting estimate of location is

$$\hat{\mu}_{os} = \frac{k\hat{\omega}(i_2 - i_1) + \sum_{i=i_1+1}^{n-i_2} X_{(i)}}{n - i_1 - i_2},$$

where $\hat{\omega} = \mathrm{MAD}/.6745$, $i_1$ is the number of observations satisfying $(X_i - M)/\hat{\omega} < -k$, $i_2$ is the number of observations satisfying $(X_i - M)/\hat{\omega} > k$, and $k = 1.28$. For each pair of groups $j < h$ (where there are $J$ independent groups), the test statistic is

$$H_{jh} = \frac{\hat{\mu}_j - \hat{\mu}_h}{\left(\hat{\sigma}_j^2 + \hat{\sigma}_h^2\right)^{1/2}},$$

where $\hat{\sigma}_j$ and $\hat{\sigma}_h$ are the estimated standard errors of $\hat{\mu}_j$ and $\hat{\mu}_h$, respectively. The null hypothesis is rejected when the largest $|H_{jh}|$ exceeds the critical value $q_{1-\alpha}$.
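A brief sketch of the one-step M-estimator itself (Python/NumPy, an assumed implementation rather than the report's code; .6745 rescales the MAD so that omega estimates the standard deviation under normality):

```python
import numpy as np

def one_step_m(x, k=1.28):
    """One-step M-estimator of location based on Huber's psi with k = 1.28.

    Observations with (X_i - M)/omega < -k or > k are flagged as outliers,
    where M is the sample median and omega = MAD/.6745.
    """
    x = np.sort(np.asarray(x, dtype=float))
    M = np.median(x)
    omega = np.median(np.abs(x - M)) / 0.6745      # rescaled MAD
    z = (x - M) / omega
    i1 = int(np.sum(z < -k))                       # outliers in the left tail
    i2 = int(np.sum(z > k))                        # outliers in the right tail
    core = x[i1:len(x) - i2]                       # observations retained
    return (k * omega * (i2 - i1) + core.sum()) / (len(x) - i1 - i2)
```

Note that, unlike the trimmed mean, i1 and i2 are determined by the data and may differ, so the trimming can be asymmetric or absent entirely.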
Bootstrap-t methods for one step M-estimators
The procedure is the same as the bootstrap-t method for trimmed means, except that the measure of location is replaced by the one-step M-estimator.

Percentile methods for one step M-estimators

The procedure is the same as the percentile bootstrap method for trimmed means, except that the measure of location is replaced by the one-step M-estimator.
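Because the percentile bootstrap scheme is identical for any plug-in measure of location, it can be sketched generically. The following Python/NumPy sketch is illustrative only: the function and its `loc` argument are my own names, and any location estimator (a trimmed mean, a one-step M-estimator, or here the median for simplicity) can be passed in.

```python
import numpy as np

def percentile_boot_test(groups, loc, alpha=0.05, B=599, rng=None):
    """Percentile bootstrap test of location equality for J independent groups.

    `loc` is any measure of location; H follows Schrader and
    Hettmansperger (1980). Returns (H, critical value); reject when H >= crit.
    """
    rng = np.random.default_rng(rng)
    groups = [np.asarray(x, dtype=float) for x in groups]
    N = sum(len(x) for x in groups)

    def H(gs):
        ests = np.array([loc(x) for x in gs])
        nj = np.array([len(x) for x in gs])
        return (nj * (ests - ests.mean()) ** 2).sum() / N

    # Center each group at its own location estimate, then resample.
    centered = [x - loc(x) for x in groups]
    boot = sorted(
        H([rng.choice(y, size=len(y), replace=True) for y in centered])
        for _ in range(B)
    )
    u = round((1 - alpha) * B)
    return H(groups), boot[u - 1]
```

For example, `h, crit = percentile_boot_test(groups, loc=np.median)` tests equality of medians; substituting a one-step M-estimator for `loc` gives the percentile method described above.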
Results
obtained, especially when sampling from skewed distributions.
Regarding bootstrapping versus non-bootstrapping methods, the results show that all bootstrapping methods control Type I error rates in most simulated conditions. In contrast, the non-bootstrapping methods perform well except under the skewed distributions. With respect to the benefits of bootstrapping with trimmed means and one-step M-estimators, the bootstrap-t with trimmed means outperforms the percentile bootstrap with trimmed means in most simulated situations, while the reverse holds for bootstrapping with one-step M-estimators. Moreover, because of the drawbacks of trimmed means under skewness, bootstrapping (bootstrap-t and percentile bootstrap) with one-step M-estimators provides better control of Type I errors than bootstrapping with trimmed means under skewed conditions.
Conclusion
References
Alexander, R. A., & Govern, D. M. (1994). A new and simpler approximation for ANOVA under variance heterogeneity. Journal of Educational Statistics, 19, 91-101.
Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144-152.
Chernick, M. R. (1999). Bootstrap Methods: A Practitioner's Guide. New York: Wiley.
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap Methods and Their Application. Cambridge: Cambridge University Press.
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7(1), 1-26.
Efron, B., & Tibshirani, R. (1993). An Introduction to the Bootstrap. New York: Chapman & Hall.
Grissom, R. J. (2000). Heterogeneity of variance in clinical data. Journal of Consulting and Clinical Psychology, 68, 155-165.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., & Stahel, W. A. (1986). Robust Statistics: The Approach Based on Influence Functions. New York: Wiley.
Hall, P., & Padmanabhan, A. R. (1992). On the bootstrap and the trimmed mean. Journal of Multivariate Analysis, 41, 132-153.
Hoaglin, D. C. (1985). Summarizing shape numerically: The g-and-h distributions. In D. Hoaglin, F. Mosteller, & J. Tukey (Eds.), Exploring Data Tables, Trends, and Shapes (pp. 461-513). New York: Wiley.
Huber, P. J. (1981). Robust Statistics. New York: Wiley.
Lix, L. M., & Keselman, H. J. (1998). To trim or not to trim: Tests of location equality under heteroscedasticity and nonnormality. Educational and Psychological Measurement, 58(3), 409-429.
Lunneborg, C. E. (2000). Data Analysis by Resampling: Concepts and Applications. Pacific Grove, CA: Duxbury.
Luh, W. M., & Guo, J. H. (1999). A powerful transformation trimmed mean method for one-way fixed effects ANOVA model under non-normality and inequality of variances. British Journal of Mathematical and Statistical Psychology, 52, 303-320.
Micceri, T. (1989). The unicorn, the normal curve and other improbable creatures. Psychological Bulletin, 105(1), 156-166.
Quenouille, M. H. (1949). Approximate tests of correlation in time-series. Journal of the Royal Statistical Society Series B, 11, 69-84.
Rosenberger, J. L., & Gasko, M. (2000). Comparing location estimators: Trimmed means, medians and trimean. In D. C. Hoaglin (Ed.), Understanding Robust and Exploratory Data Analysis (pp. 297-336). New York: John Wiley & Sons, Inc.
SAS Institute, Inc. (1999). SAS Basic Software, Version 8, Cary, NC: SAS.
Staudte, R. G., & Sheather, S. J. (1990). Robust Estimation and Testing. New York: Wiley.
Schrader, R. M., & Hettmansperger, T. P. (1980). Robust analysis of variance. Biometrika, 67, 93-101.
Tukey, J. W. (1958). Bias and confidence in not-quite large samples. Annals of Mathematical Statistics, 29, 614.
Tukey, J. W. (1960). A survey of sampling from contaminated distributions. In I. Olkin et al. (Eds.), Contributions to Probability and Statistics (pp.448-485). Stanford, CA: Stanford University Press.
Tukey, J. W., & McLaughlin, D. H. (1963). Less vulnerable confidence and significance procedures for location based on a single sample:
Trimming/winsorization 1. Sankhya A, 25, 331-352.
Walberg, H. J., Strykowski, B. F., Rovai, E., & Hung, S. S. (1984). Exceptional performance. Review of Educational Research, 54(1), 87-112.
Westfall, P. H., & Young, S. S. (1993). Resampling-based multiple testing. New York: Wiley.
Wilcox, R. R. (1987). New Statistical Procedures for the Social Sciences: Modern Solutions to Basic Problems. Hillsdale, NJ: Lawrence Erlbaum.
Wilcox, R. R. (1988). A new alternative to the ANOVA F and new results on James's second-order method. British Journal of Mathematical and Statistical Psychology, 41, 109-117.
Wilcox, R. R. (1989). Adjusting for unequal variances when comparing means in one-way and two-way fixed effects ANOVA models. Journal of Educational Statistics, 14, 269-278.
Wilcox, R. R. (1990). Comparing the means of two independent groups. Biometrical Journal, 32, 771-780.
Wilcox, R. R. (1992). Comparing one-step M-estimators of location corresponding to two independent groups. Psychometrika, 57(1), 141-154.
Wilcox, R. R. (1993). Comparing one-step M-estimators of location when there are more than two groups. Psychometrika, 58(1), 71-78.
Wilcox, R. R. (1995a). ANOVA: The practical importance of heteroscedastic methods, using trimmed means versus means, and designing simulation studies. British Journal of Mathematical and Statistical Psychology, 48, 99-114.
Wilcox, R. R. (1995b). Comparing two independent groups via multiple quantiles. Statistician, 44(1), 91-99.
Wilcox, R. R. (1995c). Three multiple comparison procedures for trimmed means.
Wilcox, R. R. (1996). Statistics for the Social Sciences. San Diego, CA: Academic Press.
Wilcox, R. R. (1997). Introduction to Robust Estimation and Hypothesis Testing. San Diego, CA: Academic Press.
Wilcox, R. R. (1998a). How many discoveries have been lost by ignoring modern statistical methods? American Psychologist, 53(3), 300-314.
Wilcox, R. R. (1998b). The goals and strategies of robust methods. British Journal of Mathematical and Statistical Psychology, 51, 1-39.
Wilcox, R. R. (2001). Fundamentals of Modern Statistical Methods: Substantially improving power and accuracy. New York: Springer-Verlag.
Wilcox, R. R. (2002). Understanding the practical advantages of Modern ANOVA methods. Journal of Clinical Child and Adolescent Psychology, 31(3), 399-412.
Wilcox, R. R. (2003). Applied Contemporary Statistical Methods. New York: Academic Press.
Wilcox, R. R. (2005). Introduction to Robust Estimation and Hypothesis Testing (2nd ed.). San Diego, CA: Academic Press.
Wilcox, R. R., Keselman, H. J., & Kowalchuk, R. K. (1998). Can tests for treatment group equality be improved? The bootstrap and trimmed means conjecture. British Journal of Mathematical and Statistical Psychology, 51, 123-134.
Wu, P.-C. (2004). Measure of location: Comparing means, trimmed means, one step M-estimators and modified one step M-estimators under non-normality.
Chinese Journal of Psychology, 46(1), 29-47.