
Confidence intervals and sample size calculations for the weighted eta-squared effect sizes in one-way heteroscedastic ANOVA



Gwowen Shieh

Published online: 18 July 2012 © Psychonomic Society, Inc. 2012

Abstract Effect size reporting and interpreting practices have been extensively recommended in academic journals when primary outcomes of all empirical studies have been analyzed. This article presents an alternative approach to constructing confidence intervals of the weighted eta-squared effect size within the context of one-way heteroscedastic ANOVA models. It is shown that the proposed interval procedure has advantages over an existing method in its theoretical justification, computational simplicity, and numerical performance. For design planning, the corresponding sample size procedures for precise interval estimation of the weighted eta-squared association measure are also delineated. Specifically, the developed formulas compute the necessary sample sizes with respect to the considerations of expected confidence interval width and tolerance probability of interval width within a designated value. Supplementary computer programs are provided to aid the implementation of the suggested techniques in practical applications of ANOVA designs when the assumption of homogeneous variances is not tenable.

Keywords Heteroscedasticity · Precision · Welch’s statistic

The analysis of variance (ANOVA) compares the impact of categorical design factors on a continuous response variable in order to determine whether differences exist among the treatment groups. To indicate how much knowledge of the treatment group improves prediction of the response variable, several strength-of-association measures have been suggested in the literature, such as the estimators η̂², ε̂², and ω̂² (Grissom & Kim, 2005; Hays, 1994; Keppel, 1991; Kline, 2004; Maxwell & Delaney, 2004). They can be interpreted as a proportion that reflects how much variability in the response variable is associated with the variation in the treatment levels. The underlying rationales of, and discrepancies among, these three association measures have been discussed in Fern and Monroe (1996), Glass and Hakstian (1969), Maxwell, Camp, and Arvey (1981), and Richardson (1996). Among them, the sample eta-squared η̂² is one of the most commonly reported association indices in practical applications of ANOVA. Detailed discussion and related issues can be found in Cohen (1973), Haase (1983), Levine and Hullett (2002), Olejnik and Algina (2003), Pierce, Block, and Aguinis (2004), and Richardson (2011).

One important assumption underlying ANOVA designs is that of equal population variances. Violation of the homogeneity-of-variance assumption has been the target of criticism in applications of ANOVA. For example, Grissom (2000) emphasized that there are theoretical reasons to expect, and empirical results to document, the existence of heteroscedasticity in clinical studies. Moreover, Grissom and Kim (2005, pp. 10–14) provided additional explanations for the intrinsic causes of variance heterogeneity in real data. The practical importance and methodological complexity of the problem have prompted numerous attempts to develop parametric and nonparametric alternative procedures to counter the effects of heteroscedasticity (Keselman et al., 1998; Lix, Keselman, & Keselman, 1996). The comprehensive reviews of Grissom (2000), Harwell, Rubinstein, Hayes, and Olds (1992), and Tomarken and Serlin (1986) indicate that the Welch (1951) procedure is a widely accepted technique for correcting for variance heterogeneity. Specifically, it has advantages over other contending methods in its overall performance, computational ease, and general availability in statistical computer packages.

Electronic supplementary material The online version of this article (doi:10.3758/s13428-012-0228-7) contains supplementary material, which is available to authorized users.

G. Shieh (*)

Department of Management Science, National Chiao Tung University,

Hsinchu, Taiwan 30010, Republic of China e-mail: gwshieh@mail.nctu.edu.tw

According to the general discussions of Breaugh (2003), Ferguson (2009), Fern and Monroe (1996), Kirk (1996), Richardson (1996), and Vacha-Haase and Thompson (2004), group difference and strength of association (or correlation ratio) are two of the major classes of effect sizes in practical applications. It should be noted that these conventional measures and conversion formulas of effect size make the standard assumption of homoscedasticity. However, Grissom and Kim (2001) were concerned by the frequent occurrence of variance heterogeneity in many areas of research. Therefore, they advised caution regarding the robustness and appropriateness, under heteroscedasticity, of current effect size measures that assume homogeneous variances. Ultimately, the prevailing heteroscedastic phenomenon has prompted different conceptions and definitions of effect size. This issue has important implications for interpreting the meaning of effect sizes, but it has received relatively limited attention in the methodological literature. In the case of comparing the means of two groups, Keselman, Algina, Lix, Wilcox, and Deering (2008) emphasized that Cohen’s (1988) well-known standardized mean difference is not appropriate when the homogeneity-of-variance assumption is violated, and they discussed several alternative definitions of a standardized mean difference effect size that circumvent the untenable assumption of equal variances. The diversity of measures suggested in Grissom and Kim (2001) and Keselman et al. (2008) thus implies that there is no firm consensus on the definition of a standardized mean difference effect size in the presence of heteroscedasticity. The various indices of standardized mean difference simply represent different quantities, each with its unique features, and each may prove useful in a given application. Also, see Bonett (2008) for a discussion of standardized linear contrasts of means with different standardizers in heteroscedastic ANOVA.

Although numerous approaches have been suggested to tackle the practical and complex issue of heteroscedasticity, Keselman et al. (2008) presented a unified formulation of approximate degrees of freedom (ADF) procedures within the context of general linear models. Essentially, the aforementioned Welch (1951) method for testing mean equality can be obtained from the general ADF perspective. Thorough treatment and related applications of the Welch statistic and other ADF methods are also described in Lix and Keselman (1995). To further reduce the sensitivity of traditional methods for testing mean equality to nonnormality, in addition to heteroscedasticity, Keselman et al. (2008) emphasized the application of ADF procedures with robust estimators of both central tendency and variability. As was noted earlier, Keselman et al. (2008) pointed out the apparently problematic outcome of Cohen’s standardized mean difference when variance heterogeneity is present. More importantly, they also explicated the vital differences and merits of various definitions and estimators of standardized mean difference effect size. It is important to note that Bonett (2008) has suggested several useful standardized linear contrasts of means within the heteroscedastic ANOVA setting. However, no explicit formulations were provided for a population strength-of-association effect size measure, such as the counterpart of eta-squared η² in traditional ANOVA designs.

In contrast, a weighted formulation of effect size θ was proposed in Kulinskaya and Staudte (2006, Equation 2) to accommodate the underlying characteristics of possibly unequal error variances and unbalanced group sizes in a one-way heteroscedastic ANOVA. Kulinskaya and Staudte also noted that when variances are equal, the weighted effect size θ reduces to the widely recognized effect size index f² of traditional one-way ANOVA (Cohen, 1988, p. 274). Moreover, the weighted effect size can be readily transformed to a weighted coefficient of determination ρ² = θ/(1 + θ), just as the prevalent strength-of-association measure eta-squared η² = f²/(1 + f²) is a one-to-one function of the effect size f². Hence, the coefficient of determination ρ² resembles the eta-squared index η² in representing the proportion of explained variance within the context of a one-way heteroscedastic ANOVA. In view of the appealing features and versatile usefulness of the weighted effect size θ and the weighted coefficient of determination ρ², Kulinskaya and Staudte presented an approximate interval estimation procedure for θ on the basis of a shifted and rescaled chi-square transformation of Kulinskaya, Staudte, and Gao (2003). Because ρ² = θ/(1 + θ) is a monotone increasing transformation, a confidence interval of ρ² can be immediately constructed from the obtained interval estimate of θ. Moreover, several simulation studies were conducted in Kulinskaya and Staudte to examine the performance of the suggested technique. According to the numerical results, they concluded that the interval procedure is surprisingly accurate in terms of the nominal coverage probability, except for very small sample sizes. Also, the coverage probability tends to exceed the nominal level when the magnitude of the weighted effect size is small.
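Because ρ² = θ/(1 + θ) is strictly increasing for θ ≥ 0, the endpoints of any interval for θ map directly onto endpoints for ρ², and the coverage probability carries over unchanged. A minimal Python sketch of that step, assuming an interval for θ is already available; the function names and the illustrative endpoints are hypothetical and are not taken from Kulinskaya and Staudte:

```python
import math


def theta_to_rho2(theta):
    """Weighted coefficient of determination rho^2 = theta / (1 + theta)."""
    if math.isinf(theta):
        return 1.0  # limit of theta / (1 + theta) as theta grows without bound
    return theta / (1.0 + theta)


def rho2_interval(theta_lower, theta_upper):
    """Map a confidence interval for theta (>= 0) to one for rho^2.

    The map is strictly increasing, so endpoint order and coverage are kept.
    """
    if not 0.0 <= theta_lower <= theta_upper:
        raise ValueError("expected 0 <= theta_lower <= theta_upper")
    return theta_to_rho2(theta_lower), theta_to_rho2(theta_upper)


# Hypothetical interval for theta, used only to illustrate the mapping.
print(rho2_interval(0.25, 3.0))  # prints (0.2, 0.75)
```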

Despite the aforementioned arguments and findings in Kulinskaya and Staudte (2006), the following four caveats to their interval estimation method should be noted. First, their confidence interval of θ is constructed from a shifted and rescaled chi-square approximate distribution for an estimator of the explained sum of squares (Kulinskaya & Staudte, 2006, Equation 11), as an alternative method for computing the distribution function of Welch’s statistic (Kulinskaya et al., 2003, Equation 6). Since they did not obtain a pivotal quantity with the shifted and rescaled chi-square distribution, further approximations are made to the shifted and rescaled parameters in order to compute the involved confidence limits. Notably, the statistical presentations and algebraic expressions of their interval estimators of θ are fairly complicated, and the calculation of confidence intervals requires a special-purpose computer program. This complexity may limit the acceptance of the method in applications. Second, the exact interval procedure for the association strength effect size η² in homoscedastic ANOVA has been repeatedly described in Fleishman (1980), Kelley (2007), Kline (2004), Odgaard and Fowler (2010), Smithson (2001), and Steiger (2004). Specifically, the exact approach employs a noncentrality inversion technique for F distributions and is called the “cumulative distribution function” pivotal method in Casella and Berger (2002, Section 9.2.3) and Mood, Graybill, and Boes (1974, Section 4.2). Corresponding routines and scripts for computing noncentral F distributions and exact confidence intervals are available in popular software packages such as R, SAS, SPSS, and STATISTICA. Kulinskaya and Staudte’s approximate interval estimation method deals with the more general target effect size of the weighted coefficient of determination ρ², which subsumes the association strength eta-squared η² as a special case. However, the shifted and rescaled chi-square transformation of Kulinskaya and Staudte does not conform to the established noncentrality inversion procedure. Thus, the failure to embed the confidence intervals of ρ² and η² in a unified principle is an obvious limitation of the existing method of Kulinskaya and Staudte.

Third, the empirical investigation in Kulinskaya and Staudte (2006) seems to give practically acceptable results for a wide range of two-sample settings in their Tables 1–4. But a closer inspection of their numerical performance for three-group situations in their Tables 5–7 suggests that the coverage probability tends to increase with decreasing weighted effect size θ. In other words, the resulting two-sided confidence interval may be too wide when the population weighted effect size is small, whereas the reported interval estimate is probably not wide enough to attain the desired confidence level if the magnitude of the underlying weighted effect size is large. Consequently, the unknown magnitude of the underlying population weighted effect size could distort the coverage performance of the interval estimates. Potential users should be aware of the robustness problem associated with the approximate formula of Kulinskaya and Staudte. Fourth, they particularly remarked that the actual distribution of the principal statistic proposed in Kulinskaya et al. (2003) is highly skewed and does not converge rapidly enough to a noncentral chi-square distribution. This implies that their interval procedure gives rise to asymmetric confidence intervals for θ; that is, the resulting two-sided interval estimates are not equidistant around the principal statistic. However, the accuracy of the one-sided confidence intervals and the sensitivity of the Kulinskaya and Staudte procedure to heteroscedasticity and unbalanced structures are essentially unknown. The existing results for two-sided confidence intervals in Kulinskaya and Staudte are not detailed enough to elucidate these fundamental issues. It therefore seems prudent to clarify the properties of their technique before it is adopted as a general procedure.

According to the editorial guidelines and methodological recommendations of several prominent educational and psychological journals, it is necessary to include some measures of effect size and confidence intervals for all primary outcomes (Alhija & Levy, 2009; Odgaard & Fowler, 2010; Sun, Pan, & Wang, 2010). Furthermore, Maxwell, Kelley, and Rausch (2008) advocated the desirability of achieving the required precision in parameter estimation and emphasized the importance of sample size planning in constructing precise confidence intervals.

Table 1 Simulated coverage probability, error, and average width of the approximate confidence intervals for the weighted signal-to-noise ratio λ* when (σ1², σ2², σ3²) = (1, 1/2, 4), (N1, N2, N3) = (10, 10, 10), and (μ1, μ2, μ3) = (0, 1, 1), (0, 1, 2), (0, 1, 3), and (0, 1, 4)

Upper 95 % CI, lower 95 % CI, and two-sided 90 % CI (Error = simulated coverage minus nominal level):
λ* | Proposed approach: Upper, Error, Lower, Error, Two-sided, Error, Average width | Kulinskaya and Staudte (2006): Upper, Error, Lower, Error, Two-sided, Error, Average width
0.23 | .9597 .0097 .9485 −.0015 .9082 .0082 0.7202 | .9625 .0125 .9870 .0370 .9495 .0495 0.8637
0.36 | .9542 .0042 .9522 .0022 .9064 .0064 0.9460 | .9541 .0041 .9817 .0317 .9358 .0358 1.1230
0.64 | .9461 −.0039 .9538 .0038 .8999 −.0001 1.3907 | .9383 −.0117 .9816 .0316 .9199 .0199 1.6316
1.08 | .9387 −.0113 .9517 .0017 .8904 −.0096 2.0274 | .9199 −.0301 .9790 .0290 .8989 −.0011 2.3555

Upper 97.5 % CI, lower 97.5 % CI, and two-sided 95 % CI:
λ* | Proposed approach: Upper, Error, Lower, Error, Two-sided, Error, Average width | Kulinskaya and Staudte (2006): Upper, Error, Lower, Error, Two-sided, Error, Average width
0.23 | .9799 .0049 .9744 −.0006 .9543 .0043 0.8609 | .9760 .0010 .9999 .0249 .9759 .0259 1.0393
0.36 | .9774 .0024 .9764 .0014 .9538 .0038 1.1287 | .9718 −.0032 .9939 .0189 .9657 .0157 1.3461
0.64 | .9724 −.0026 .9772 .0022 .9496 −.0004 1.6606 | .9575 −.0175 .9934 .0184 .9509 .0009 1.9604
1.08 | .9672 −.0078 .9753 .0003 .9425 −.0075 2.4228 | .9383 −.0367 .9919 .0169 .9302 −.0198 2.8437

It is worthwhile to note that the counterpart of the coefficient of determination ρ² of multiple linear regression is more commonly referred to as the eta-squared index η² in ANOVA settings, where it represents the strength of association. For clarity, the weighted coefficient of determination in Kulinskaya and Staudte (2006) is therefore referred to as the weighted eta-squared in the remainder of this article. In an effort to improve the quality of research analysis and design, this article presents interval estimation and sample size procedures for the weighted eta-squared effect sizes in one-way heteroscedastic ANOVAs. On the basis of the approximate noncentral F distribution for Welch’s statistic in Levy (1978), we apply the cumulative distribution function pivotal method to construct well-supported confidence intervals for the weighted eta-squared effect sizes. The proposed general methodology not only enables a transparent and concise exposition of the inherent statistical arguments and properties, but also combines the interval procedures for both homoscedastic and heteroscedastic ANOVA designs into one unified framework. The accuracy of the suggested approach is evaluated by comparing the nominal coverage probability of the computed confidence intervals with the actual coverage probability they achieve. Extensive numerical examinations were conducted to reveal the advantages in coverage probability and interval width of the proposed approach over the approximate transformation method of Kulinskaya and Staudte under a variety of group mean configurations, variance patterns, and sample size structures. Moreover, sample size calculations for precise interval estimation of weighted eta-squared effect sizes are also demonstrated from two different perspectives. One approach gives the minimum sample size such that the expected confidence interval width is within the designated bound. The other approach provides the sample size needed to guarantee, with a given tolerance probability, that the width of a confidence interval will not exceed the planned range. To facilitate the recommended procedures in empirical applications, SAS computer programs are developed for computing the confidence intervals of the weighted eta-squared association strength and the necessary sample sizes for designated interval precision criteria in planning research designs.

Table 2 Simulated coverage probability, error, and average width of the approximate confidence intervals for weighted eta-squared η²* = 1/6 when (σ1², σ2², σ3², σ4²) = (1, 4, 9, 16), (N1, N2, N3, N4) = (15, 15, 15, 15), (6, 12, 18, 24), and (24, 18, 12, 6), and mean structures (μ1, μ2, μ3, μ4) = {−1, 0, 0, 1}/c, {−3, −1, 1, 3}/c, and {−1, −1, 1, 1}/c, denoted by μ = 1, 2, and 3, respectively

Upper 95 % CI, lower 95 % CI, and two-sided 90 % CI (Error = simulated coverage minus nominal level):
μ | Proposed approach: Upper, Error, Lower, Error, Two-sided, Error, Average width | Kulinskaya and Staudte (2006): Upper, Error, Lower, Error, Two-sided, Error, Average width
(N1, N2, N3, N4) = (15, 15, 15, 15)
1 | .9550 .0050 .9526 .0026 .9076 .0076 0.2771 | .9442 −.0058 .9892 .0392 .9334 .0334 0.3168
2 | .9555 .0055 .9517 .0017 .9072 .0072 0.2779 | .9454 −.0046 .9892 .0392 .9346 .0346 0.3192
3 | .9509 .0009 .9529 .0029 .9038 .0038 0.2769 | .9468 −.0032 .9903 .0403 .9371 .0371 0.3229
(N1, N2, N3, N4) = (6, 12, 18, 24)
1 | .9453 −.0047 .9493 −.0007 .8946 −.0054 0.2778 | .9381 −.0119 .9892 .0392 .9273 .0273 0.3276
2 | .9461 −.0039 .9512 .0012 .8973 −.0027 0.2778 | .9364 −.0136 .9894 .0394 .9258 .0258 0.3238
3 | .9558 .0058 .9501 .0001 .9059 .0059 0.2783 | .9439 −.0061 .9880 .0380 .9319 .0319 0.3191
(N1, N2, N3, N4) = (24, 18, 12, 6)
1 | .9579 .0079 .9532 .0032 .9111 .0111 0.2977 | .9384 −.0116 .9918 .0418 .9302 .0302 0.3409
2 | .9497 −.0003 .9529 .0029 .9026 .0026 0.2984 | .9357 −.0143 .9926 .0426 .9283 .0283 0.3506
3 | .9389 −.0111 .9494 −.0006 .8883 −.0117 0.2987 | .9282 −.0218 .9930 .0430 .9212 .0212 0.3601

Table 3 Simulated coverage probability, error, and average width of the approximate confidence intervals for weighted eta-squared η²* = 1/6 when (σ1², σ2², σ3², σ4²) = (1, 4, 9, 16), (N1, N2, N3, N4) = (15, 15, 15, 15), (6, 12, 18, 24), and (24, 18, 12, 6), and mean structures (μ1, μ2, μ3, μ4) = {−1, 0, 0, 1}/c, {−3, −1, 1, 3}/c, and {−1, −1, 1, 1}/c, denoted by μ = 1, 2, and 3, respectively

Upper 97.5 % CI, lower 97.5 % CI, and two-sided 95 % CI (Error = simulated coverage minus nominal level):
μ | Proposed approach: Upper, Error, Lower, Error, Two-sided, Error, Average width | Kulinskaya and Staudte (2006): Upper, Error, Lower, Error, Two-sided, Error, Average width
(N1, N2, N3, N4) = (15, 15, 15, 15)
1 | .9787 .0037 .9774 .0024 .9561 .0061 0.3212 | .9697 −.0053 .9976 .0226 .9673 .0173 0.3647
2 | .9795 .0045 .9740 −.0010 .9535 .0035 0.3222 | .9709 −.0041 .9973 .0223 .9682 .0182 0.3675
3 | .9728 −.0022 .9775 .0025 .9503 .0003 0.3210 | .9674 −.0076 .9974 .0224 .9648 .0148 0.3711
(N1, N2, N3, N4) = (6, 12, 18, 24)
1 | .9716 −.0034 .9749 −.0001 .9465 −.0035 0.3221 | .9642 −.0108 .9971 .0221 .9613 .0113 0.3759
2 | .9719 −.0031 .9740 −.0010 .9459 −.0041 0.3221 | .9624 −.0126 .9970 .0220 .9594 .0094 0.3718
3 | .9804 .0054 .9736 −.0014 .9540 .0040 0.3226 | .9709 −.0041 .9963 .0213 .9672 .0172 0.3673
(N1, N2, N3, N4) = (24, 18, 12, 6)
1 | .9789 .0039 .9754 .0004 .9543 .0043 0.3437 | .9605 −.0145 .9987 .0237 .9592 .0092 0.3895
2 | .9750 .0000 .9769 .0019 .9519 .0019 0.3447 | .9577 −.0173 .9987 .0237 .9564 .0064 0.3992
3 | .9664 −.0086 .9744 −.0006 .9408 −.0092 0.3452 | .9495 −.0255 .9989 .0239 .9484 −.0016 0.4090

Table 4 Simulated coverage probability, error, and average width of the approximate confidence intervals for weighted eta-squared η²* = 1/6 when (σ1², σ2², σ3², σ4²) = (1, 1, 1, 1), (N1, N2, N3, N4) = (15, 15, 15, 15), (6, 12, 18, 24), and (24, 18, 12, 6), and mean structures (μ1, μ2, μ3, μ4) = {−1, 0, 0, 1}/c, {−3, −1, 1, 3}/c, and {−1, −1, 1, 1}/c, denoted by μ = 1, 2, and 3, respectively

Upper 95 % CI, lower 95 % CI, and two-sided 90 % CI (Error = simulated coverage minus nominal level):
μ | Proposed approach: Upper, Error, Lower, Error, Two-sided, Error, Average width | Kulinskaya and Staudte (2006): Upper, Error, Lower, Error, Two-sided, Error, Average width
(N1, N2, N3, N4) = (15, 15, 15, 15)
1 | .9484 −.0016 .9482 −.0018 .8966 −.0034 0.2737 | .9446 −.0054 .9868 .0368 .9314 .0314 0.3174
2 | .9509 .0009 .9531 .0031 .9040 .0040 0.2747 | .9429 −.0071 .9894 .0394 .9323 .0323 0.3170
3 | .9528 .0028 .9552 .0052 .9080 .0080 0.2748 | .9444 −.0056 .9907 .0407 .9351 .0351 0.3151
(N1, N2, N3, N4) = (6, 12, 18, 24)
1 | .9339 −.0161 .9521 .0021 .8860 −.0140 0.2887 | .9252 −.0248 .9918 .0418 .9170 .0170 0.3428
2 | .9364 −.0136 .9524 .0024 .8888 −.0112 0.2895 | .9256 −.0244 .9929 .0429 .9185 .0185 0.3443
3 | .9468 −.0032 .9510 .0010 .8978 −.0022 0.2896 | .9301 −.0199 .9893 .0393 .9194 .0194 0.3392
(N1, N2, N3, N4) = (24, 18, 12, 6)
1 | .9397 −.0103 .9512 .0012 .8909 −.0091 0.2888 | .9308 −.0192 .9914 .0414 .9222 .0222 0.3423
2 | .9367 −.0133 .9509 .0009 .8876 −.0124 0.2891 | .9285 −.0215 .9919 .0419 .9204 .0204 0.3445
3 | .9469 −.0031 .9512 .0012 .8981 −.0019 0.2896 | .9315 −.0185 .9921 .0421 .9236 .0236 0.3388

Table 5 Simulated coverage probability, error, and average width of the approximate confidence intervals for weighted eta-squared η²* = 1/6 when (σ1², σ2², σ3², σ4²) = (1, 1, 1, 1), (N1, N2, N3, N4) = (15, 15, 15, 15), (6, 12, 18, 24), and (24, 18, 12, 6), and mean structures (μ1, μ2, μ3, μ4) = {−1, 0, 0, 1}/c, {−3, −1, 1, 3}/c, and {−1, −1, 1, 1}/c, denoted by μ = 1, 2, and 3, respectively

Upper 97.5 % CI, lower 97.5 % CI, and two-sided 95 % CI (Error = simulated coverage minus nominal level):
μ | Proposed approach: Upper, Error, Lower, Error, Two-sided, Error, Average width | Kulinskaya and Staudte (2006): Upper, Error, Lower, Error, Two-sided, Error, Average width
(N1, N2, N3, N4) = (15, 15, 15, 15)
1 | .9716 −.0034 .9740 −.0010 .9456 −.0044 0.3175 | .9650 −.0100 .9968 .0218 .9618 .0118 0.3653
2 | .9773 .0023 .9772 .0022 .9545 .0045 0.3187 | .9699 −.0051 .9972 .0222 .9671 .0171 0.3651
3 | .9775 .0025 .9780 .0030 .9555 .0055 0.3187 | .9695 −.0055 .9979 .0229 .9674 .0174 0.3630
(N1, N2, N3, N4) = (6, 12, 18, 24)
1 | .9621 −.0129 .9757 .0007 .9378 −.0122 0.3341 | .9476 −.0274 .9979 .0229 .9455 −.0045 0.3910
2 | .9621 −.0129 .9757 .0007 .9378 −.0122 0.3351 | .9482 −.0268 .9991 .0241 .9473 −.0027 0.3928
3 | .9723 −.0027 .9738 −.0012 .9461 −.0039 0.3351 | .9569 −.0181 .9978 .0228 .9547 .0047 0.3881
(N1, N2, N3, N4) = (24, 18, 12, 6)
1 | .9662 −.0088 .9757 .0007 .9419 −.0081 0.3342 | .9521 −.0229 .9981 .0231 .9502 .0002 0.3906
2 | .9628 −.0122 .9752 .0002 .9380 −.0120 0.3346 | .9496 −.0254 .9980 .0230 .9476 −.0024 0.3928
3 | .9733 −.0017 .9753 .0003 .9486 −.0014 0.3351 | .9555 −.0195 .9982 .0232 .9537 .0037 0.3875

Interval estimation of weighted eta-squared

Consider the one-way heteroscedastic ANOVA model in which the observations $X_{ij}$ are assumed to be independent and normally distributed with expected values $\mu_i$ and variances $\sigma_i^2$:

$$X_{ij} \sim N(\mu_i, \sigma_i^2), \tag{1}$$

where $\mu_i$ and $\sigma_i^2$ are unknown parameters, $i = 1, \ldots, g\ (\geq 2)$, and $j = 1, \ldots, N_i$. For testing the hypothesis that all treatment means are equal, the classic $F^*$ statistic is the most widely used procedure under the assumption of homogeneity of variance ($\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_g^2 = \sigma^2$):

$$F^* = \frac{SSR/(g-1)}{SSE/(N_T-g)}, \tag{2}$$

where SSR is the treatment sum of squares, SSE is the error sum of squares, and $N_T = \sum_{i=1}^{g} N_i$. It follows that

$$F^* \sim F(g-1,\, N_T-g,\, \Lambda), \tag{3}$$

where $F(g-1, N_T-g, \Lambda)$ is the noncentral F distribution with $g-1$ and $N_T-g$ degrees of freedom and noncentrality parameter $\Lambda = N_T\lambda$,

$$\lambda = \sum_{i=1}^{g} q_i\left\{\frac{\mu_i - \bar\mu}{\sigma}\right\}^2, \tag{4}$$

$q_i = N_i/N_T$, and $\bar\mu = \sum_{i=1}^{g} q_i\mu_i$. Furthermore, $\lambda$ can be alternatively expressed as $\lambda = f^2 = \sigma_\mu^2/\sigma^2$ with $\sigma_\mu^2 = \sum_{i=1}^{g} q_i(\mu_i - \bar\mu)^2$, and it is called the signal-to-noise ratio (Fleishman, 1980). The measure of strength of association, or correlation ratio, $\eta^2$ is a one-to-one function of $\lambda$:

$$\eta^2 = \frac{\lambda}{1+\lambda}. \tag{5}$$
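As a minimal illustration of Equations 3 to 5, the following Python sketch computes λ, η², and the noncentrality Λ = N_Tλ for a hypothesized homoscedastic design. The function name and the example design are assumptions made for illustration only.

```python
def signal_to_noise(mu, sigma2, n):
    """Return (lambda, eta^2, Lambda) of Equations 3-5 for a common variance sigma2."""
    n_total = sum(n)
    q = [ni / n_total for ni in n]                       # allocation ratios q_i
    mu_bar = sum(qi * mi for qi, mi in zip(q, mu))       # weighted grand mean
    lam = sum(qi * (mi - mu_bar) ** 2 for qi, mi in zip(q, mu)) / sigma2
    return lam, lam / (1.0 + lam), n_total * lam


# Hypothetical design: four groups of 15 with means (-1, 0, 0, 1) and sigma^2 = 1.
print(signal_to_noise([-1, 0, 0, 1], 1.0, [15, 15, 15, 15]))
# prints (0.5, 0.3333333333333333, 30.0)
```

Here λ = 0.5 gives η² = 1/3, and the noncentrality of the F* distribution in Equation 3 is Λ = 60 × 0.5 = 30.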

The widely used sample estimator of the association effect size $\eta^2$ is the sample eta-squared:

$$\hat\eta^2 = \frac{SSR}{SSR+SSE} = \frac{F^*}{F^* + (N_T-g)/(g-1)}, \tag{6}$$

where $F^*$ is defined in Equation 2. Moreover, exact confidence intervals of $\eta^2$ can be constructed with the noncentrality inversion technique applied to the noncentral F distribution of $F^*$ given in Equation 3 (e.g., Odgaard & Fowler, 2010).
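The identity in Equation 6 can be checked numerically. This Python sketch (hypothetical function name, made-up data) computes η̂² once from the sums of squares and once from F*; the two values agree up to rounding.

```python
from statistics import mean


def sample_eta2(groups):
    """Sample eta-squared of Equation 6, from sums of squares and from F*."""
    g = len(groups)
    n_total = sum(len(x) for x in groups)
    means = [mean(x) for x in groups]
    grand = sum(sum(x) for x in groups) / n_total
    ssr = sum(len(x) * (m - grand) ** 2 for x, m in zip(groups, means))
    sse = sum(sum((v - m) ** 2 for v in x) for x, m in zip(groups, means))
    f_star = (ssr / (g - 1)) / (sse / (n_total - g))          # Equation 2
    eta2_ss = ssr / (ssr + sse)
    eta2_f = f_star / (f_star + (n_total - g) / (g - 1))
    return eta2_ss, eta2_f, f_star


# Hypothetical data: three groups of three observations.
print(sample_eta2([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))  # prints (0.5, 0.5, 3.0)
```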

However, numerous studies have shown that the conventional $F^*$ test is sensitive to the heteroscedastic formulation defined in Equation 1. Of the many alternatives to the ANOVA F test, we focus on the approach proposed in Welch (1951) in the form of

$$W = \frac{\sum_{i=1}^{g} W_i(\bar X_i - \tilde X)^2/(g-1)}{1 + 2(g-2)Q/(g^2-1)}, \tag{7}$$

where $W_i = N_i/S_i^2$, $S_i^2 = \sum_{j=1}^{N_i}(X_{ij} - \bar X_i)^2/(N_i-1)$, $\bar X_i = \sum_{j=1}^{N_i} X_{ij}/N_i$, $\tilde X = \sum_{i=1}^{g} W_i\bar X_i/U$, $U = \sum_{i=1}^{g} W_i$, and $Q = \sum_{i=1}^{g}(1 - W_i/U)^2/(N_i-1)$.

In contrast to the well-documented results of $F^*$ under homoscedasticity, the statistical properties of Welch’s statistic are more complex, and no explicit analytic form of the corresponding distribution is available.

Table 6 Computed sample size, expected width, and tolerance probability for the 95 % two-sided confidence interval of weighted eta-squared η²* = .15 with interval bound b = ω = .3 and tolerance probability 1 − γ = .90, when (σ1², σ2², σ3², σ4²) = (1, 4, 9, 16), (q1, q2, q3, q4) = (1/4, 1/4, 1/4, 1/4), (1/10, 2/10, 3/10, 4/10), and (4/10, 3/10, 2/10, 1/10), and mean structures (μ1, μ2, μ3, μ4) = {−1, 0, 0, 1}/c, {−3, −1, 1, 3}/c, and {−1, −1, 1, 1}/c, denoted by μ = 1, 2, and 3, respectively

Error = simulated minus approximate value:
μ | Expected width: sample sizes, simulated E[H], approximate E[H], error | Tolerance probability: sample sizes, simulated P{H < ω}, approximate P{H < ω}, error
(q1, q2, q3, q4) = (1/4, 1/4, 1/4, 1/4)
1 | (17, 17, 17, 17) 0.2942 0.2941 0.0001 | (24, 24, 24, 24) .9244 .9170 0.0074
2 | (17, 17, 17, 17) 0.2939 0.2941 −0.0002 | (24, 24, 24, 24) .9166 .9170 −0.0004
3 | (17, 17, 17, 17) 0.2955 0.2941 0.0013 | (24, 24, 24, 24) .9187 .9170 0.0017
(q1, q2, q3, q4) = (1/10, 2/10, 3/10, 4/10)
1 | (7, 14, 21, 28) 0.2914 0.2905 0.0009 | (10, 20, 30, 40) .9782 .9733 0.0049
2 | (7, 14, 21, 28) 0.2916 0.2905 0.0011 | (10, 20, 30, 40) .9779 .9733 0.0046
3 | (7, 14, 21, 28) 0.2910 0.2905 0.0005 | (10, 20, 30, 40) .9739 .9733 0.0006
(q1, q2, q3, q4) = (4/10, 3/10, 2/10, 1/10)
1 | (32, 24, 16, 8) 0.2920 0.2923 −0.0003 | (48, 36, 24, 12) .9743 .9593 0.0150
2 | (32, 24, 16, 8) 0.2931 0.2923 0.0008 | (48, 36, 24, 12) .9708 .9593 0.0115
3 | (32, 24, 16, 8) 0.2925 0.2923 0.0002 | (48, 36, 24, 12) .9629 .9593 0.0036

Levy (1978) showed that an approximate noncentral F distribution can be obtained by replacing the sample means and variances in Welch’s statistic with the corresponding population parameters. Levy’s numerical comparisons of estimated and simulated power suggest that the noncentral F distribution yields an adequate approximation to the underlying distribution of Welch’s statistic. Specifically, the approximate distribution for W in Levy is

$$W \mathrel{\dot\sim} F(g-1,\, \nu,\, \Lambda^*), \tag{8}$$

where the denominator degrees of freedom are $\nu = (g^2-1)/(3\tau)$, with $\tau = \sum_{i=1}^{g}(1 - w_i/u)^2/(N_i-1)$, $w_i = N_i/\sigma_i^2$, and $u = \sum_{i=1}^{g} w_i$; the noncentrality parameter is $\Lambda^* = N_T\lambda^*$,

$$\lambda^* = \sum_{i=1}^{g} q_i\left\{\frac{\mu_i - \bar\mu^*}{\sigma_i}\right\}^2, \tag{9}$$

and $\bar\mu^* = \sum_{i=1}^{g} w_i\mu_i/u$. It is essential to note that $\lambda^*$ is the direct extension of the signal-to-noise ratio $\lambda$ given in Equation 4 to the heterogeneity-of-variance setting. For ease of reference, $\lambda^*$ is termed the weighted signal-to-noise ratio because of its recognizable relationship with $\lambda$. By analogy with the monotone transformation between $\lambda$ and $\eta^2$ in Equation 5, a weighted eta-squared effect size can be defined from $\lambda^*$ in Equation 9 under the heterogeneity-of-variance setting:

$$\eta^{2*} = \frac{\lambda^*}{1+\lambda^*}. \tag{10}$$

This weighted eta-squared effect size $\eta^{2*}$ was presented in Kulinskaya and Staudte (2006) as a weighted coefficient of determination with the notation $\rho^2$. In addition to the supporting arguments in Kulinskaya and Staudte, the weighted eta-squared effect size $\eta^{2*}$ provides a natural generalization of the simple index $\eta^2$, and it reflects the proportion of total variance accounted for by the effect of treatment means, heterogeneous variance components, and sample size allocation ratios. Notably, the alternative expressions

$$\frac{w_i}{u} = \frac{q_i/\sigma_i^2}{\sum_{j=1}^{g} q_j/\sigma_j^2} \quad\text{and}\quad \bar\mu^* = \frac{\sum_{i=1}^{g} q_i\mu_i/\sigma_i^2}{\sum_{j=1}^{g} q_j/\sigma_j^2}$$

imply that both $\lambda^*$ and $\eta^{2*}$ depend not on the group sizes themselves but, rather, on the allocation ratios among the groups.
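The following Python sketch illustrates Equations 7 to 10 under stated assumptions. The function names are hypothetical. `welch_statistic` computes W from raw samples together with the plug-in degrees of freedom (g² − 1)/(3Q), which is the estimate ν̂ used later in Equation 12; `levy_parameters` computes λ*, ν, and η²* from assumed population means, variances, and group sizes. The mean vectors in the usage lines are unscaled illustrations, not the scaled structures of the tables.

```python
from statistics import mean, variance


def welch_statistic(groups):
    """Welch's W (Equation 7) and the plug-in df (g^2 - 1) / (3Q) from raw samples."""
    g = len(groups)
    n = [len(x) for x in groups]
    xbar = [mean(x) for x in groups]
    w = [ni / variance(x) for ni, x in zip(n, groups)]         # W_i = N_i / S_i^2
    u = sum(w)                                                 # U
    x_tilde = sum(wi * xi for wi, xi in zip(w, xbar)) / u      # weighted grand mean
    q = sum((1 - wi / u) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    numerator = sum(wi * (xi - x_tilde) ** 2 for wi, xi in zip(w, xbar)) / (g - 1)
    denominator = 1 + 2 * (g - 2) * q / (g ** 2 - 1)
    return numerator / denominator, (g ** 2 - 1) / (3 * q)


def levy_parameters(mu, sigma2, n):
    """Population quantities of Equations 8-10: (lambda*, nu, eta^2*)."""
    g, n_total = len(mu), sum(n)
    w = [ni / s2 for ni, s2 in zip(n, sigma2)]                 # w_i = N_i / sigma_i^2
    u = sum(w)
    mu_star = sum(wi * mi for wi, mi in zip(w, mu)) / u        # precision-weighted mean
    tau = sum((1 - wi / u) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    lam_star = sum((ni / n_total) * (mi - mu_star) ** 2 / s2
                   for ni, mi, s2 in zip(n, mu, sigma2))       # Equation 9
    return lam_star, (g ** 2 - 1) / (3 * tau), lam_star / (1 + lam_star)


if __name__ == "__main__":
    print(welch_statistic([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
    # Doubling every N_i keeps lambda* and eta^2* but changes nu.
    print(levy_parameters([-3, -1, 1, 3], [1, 4, 9, 16], [6, 12, 18, 24]))
    print(levy_parameters([-3, -1, 1, 3], [1, 4, 9, 16], [12, 24, 36, 48]))
```

Doubling every group size leaves λ* and η²* unchanged while ν increases, which mirrors the statement that λ* and η²* depend only on the allocation ratios.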

To indicate the actual level of the strength of association in a study, a sample estimate of the weighted eta-squaredη2* may be obtained as

bη2*¼ W

Wþ Nð T gÞ g  1=ð Þ;

ð11Þ where W is the Welch statistic given in Equation7. Clearly, bη2* is a heteroscedastic extension of the common effect size

measure bη2 given in Equation 6. Unlike the degrees of

freedom for the distribution of F*, the denominator degrees of freedomν in the noncentral F distribution for W given in Equation8depends on the unknown variances. For inferen-tial purposes, a further modification of the noncentral F distribution can be obtained by substituting the respective sample estimates for the variances in ν, and the resulting approximation is

W F gð  1;bv; Λ*Þ; ð12Þ

where the denominator degrees of freedom ν̂ = (g² − 1)/(3Q) and Q is defined in Equation 7. Ultimately, we propose to compute the confidence intervals of η²* with the noncentrality inversion principle through the approximate noncentral F distribution presented in Equation 12. This works because Λ* = NTλ* is a one-to-one function of η²*, namely Λ* = NTη²*/(1 − η²*), by the relation between λ* and η²* in Equation 10. Explicitly, the upper 100(1 − α1)% confidence interval of η²* is of the form $[\hat{\eta}^{2*}_L, 1)$, in which $\hat{\eta}^{2*}_L$ satisfies

$$P\left\{F\left(g-1,\ \hat{\nu},\ \frac{N_T\,\hat{\eta}^{2*}_L}{1-\hat{\eta}^{2*}_L}\right) < W_{OL}\right\} = 1-\alpha_1, \tag{13}$$

where $W_{OL} = \max\{W_O,\ F_{(g-1),\hat{\nu},1-\alpha_1}\}$, $F_{(g-1),\hat{\nu},p}$ denotes the 100p-th percentile of the central F distribution with g − 1 and ν̂ degrees of freedom, and $W_O$ is the observed value of the W statistic defined in Equation 7. Likewise, the lower 100(1 − α2)% confidence interval of η²* is of the form $[0, \hat{\eta}^{2*}_U]$, in which $\hat{\eta}^{2*}_U$ satisfies

$$P\left\{F\left(g-1,\ \hat{\nu},\ \frac{N_T\,\hat{\eta}^{2*}_U}{1-\hat{\eta}^{2*}_U}\right) > W_{OU}\right\} = 1-\alpha_2, \tag{14}$$

where $W_{OU} = \max\{W_O,\ F_{(g-1),\hat{\nu},\alpha_2}\}$. Typically, a 100(1 − α)% two-sided confidence interval $(\hat{\eta}^{2*}_L, \hat{\eta}^{2*}_U)$ of the weighted eta-squared effect size η²* is obtained by jointly applying Equations 13 and 14 with α1 = α2 = α/2. Since the noncentrality parameter of a noncentral F distribution is always nonnegative, it is necessary to use $W_{OL}$ and $W_{OU}$, instead of $W_O$, to obtain proper confidence limits: when $W_O$ falls below the corresponding central F percentile, no nonnegative noncentrality solves Equation 13 or 14 with $W_O$ itself, and the adjusted value yields a limit of 0. These adjustments not only have theoretical implications but also prevent computational errors. Although the noncentrality inversion procedure has also been used to compute confidence intervals for η², for example by Odgaard and Fowler (2010), their algorithm did not include these subtle modifications of the observed F* statistic. Note that the confidence limits $(\hat{\eta}^{2*}_L, \hat{\eta}^{2*}_U)$ need to be computed only for the value $W_O$ actually observed. In addition, even though Equations 13 and 14 cannot be solved analytically, they are readily solved numerically, and attaining the 100(1 − α)% confidence level does not depend on a closed-form solution. In short, given the desired confidence level, the observed value $W_O$, and the estimated degrees of freedom ν̂, computing the confidence limits $\hat{\eta}^{2*}_L$ and $\hat{\eta}^{2*}_U$ involves evaluating the noncentrality function of a noncentral F variable, such as the SAS function FNONCT. Accordingly, a SAS/IML (SAS Institute, 2011) program has been developed to perform the confidence interval calculations and is available as supplementary material.

In contrast, Kulinskaya and Staudte (2006) presented an approximate confidence interval procedure for η²* based on a shifted and rescaled chi-square transformation. It should be emphasized, however, that their method differs markedly from the noncentrality inversion technique. More important, their analytical arguments and derived formulas are noticeably more involved than the justification and methodology described here. Thus, it is of both practical value and theoretical interest to explicate the underlying properties of the two interval procedures. Because of the complex nature of the interval estimation formulas under study, a complete analytical treatment is not possible. Hence, a detailed simulation study is performed next to evaluate and compare their accuracy under a variety of treatment effect configurations, heterogeneous variance patterns, and sample size allocation structures.
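The inversion in Equations 13 and 14 can be sketched in Python as follows. This is a minimal pure-Python illustration under stated assumptions, not the author's SAS/IML program: the noncentral F CDF is evaluated with the standard Poisson mixture of regularized incomplete beta functions, and the search runs over the noncentrality Λ* by bracketing and bisection, after which Λ* is mapped back through η²* = Λ*/(NT + Λ*). All function names are hypothetical. Instead of forming $W_{OL}$ and $W_{OU}$ explicitly, the sketch checks the central F CDF at $W_O$ against each target, which yields the same zero limits as the max adjustments.

```python
import math


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b), Lentz continued fraction."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    # The continued fraction converges fast only for x < (a + 1) / (a + b + 2);
    # otherwise use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - betainc_reg(b, a, 1.0 - x)
    tiny = 1e-300
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 1000):
        # Even- and odd-indexed coefficients of the continued fraction at step m.
        for num in (m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-14:
            break
    return math.exp(log_front) * h / a


def ncf_cdf(x, d1, d2, nc):
    """P{F <= x} for noncentral F(d1, d2, nc): Poisson(nc/2) mixture of beta CDFs."""
    if x <= 0.0:
        return 0.0
    y = d1 * x / (d1 * x + d2)
    half = nc / 2.0
    if half == 0.0:
        return betainc_reg(d1 / 2.0, d2 / 2.0, y)
    total = 0.0
    for j in range(int(half + 12.0 * math.sqrt(half) + 30.0) + 1):
        weight = math.exp(-half + j * math.log(half) - math.lgamma(j + 1.0))
        if weight > 1e-17:  # skip numerically negligible Poisson terms
            total += weight * betainc_reg(d1 / 2.0 + j, d2 / 2.0, y)
    return min(total, 1.0)


def solve_noncentrality(w_obs, d1, d2, target):
    """Lambda >= 0 with P{F(d1, d2, Lambda) <= w_obs} = target; the CDF falls as Lambda grows."""
    lo, hi = 0.0, 1.0
    while ncf_cdf(w_obs, d1, d2, hi) > target and hi < 1e5:
        lo, hi = hi, 2.0 * hi  # the root lies above hi, so move the bracket up
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if ncf_cdf(w_obs, d1, d2, mid) > target:
            lo = mid
        else:
            hi = mid
        if hi - lo < 1e-10 * max(1.0, hi):
            break
    return 0.5 * (lo + hi)


def weighted_eta2_ci(w_obs, g, nu_hat, n_total, alpha=0.05):
    """Equal-tailed 100(1 - alpha)% interval for weighted eta-squared (alpha1 = alpha2 = alpha/2)."""
    d1 = g - 1
    a1 = a2 = alpha / 2.0
    cdf_central = ncf_cdf(w_obs, d1, nu_hat, 0.0)
    # Equivalent to the W_OL / W_OU adjustment: when the central F CDF at W_O is at or
    # below the target, no positive noncentrality can reach it and the limit is 0.
    lam_l = 0.0 if cdf_central <= 1.0 - a1 else solve_noncentrality(w_obs, d1, nu_hat, 1.0 - a1)
    lam_u = 0.0 if cdf_central <= a2 else solve_noncentrality(w_obs, d1, nu_hat, a2)
    # Lambda* = N_T * eta2 / (1 - eta2)  <=>  eta2 = Lambda* / (N_T + Lambda*)
    return lam_l / (n_total + lam_l), lam_u / (n_total + lam_u)


# Illustrative inputs only (not from the article): W_O = 5.0, g = 4, nu_hat = 30, N_T = 60.
print(weighted_eta2_ci(5.0, g=4, nu_hat=30.0, n_total=60))
```

Bisecting on Λ* rather than on η²* keeps the bracket small, because Λ* grows without bound as η²* approaches 1.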

Numerical comparison of interval estimation procedures

To compare the performance of the two procedures under ANOVA settings, the following empirical examination consists of two studies. The first reexamines the interval estimation of the weighted signal-to-noise ratio λ* for the three-group case in Kulinskaya and Staudte (2006), and the second evaluates the confidence intervals of the weighted eta-squared η²* for four-group cases that were not considered by Kulinskaya and Staudte.

First, we consider the model settings in Table 6 of Kulinskaya and Staudte (2006) with g = 3. Specifically, the sample sizes and error variances are chosen as (N1, N2, N3) = (10, 10, 10) and (σ1², σ2², σ3²) = (1, 1/2, 4), respectively. Moreover, four mean effect settings are considered: (μ1, μ2, μ3) = (0, 1, 1), (0, 1, 2), (0, 1, 3), and (0, 1, 4), and the resulting weighted signal-to-noise ratio λ* values are 0.23, 0.36, 0.64, and 1.08, respectively. With the given sample sizes and parameter configurations, estimates of the true coverage probability are computed through Monte Carlo simulation of 10,000 independent data sets. For each replicate, the confidence limits associated with one-sided upper and lower 100(1 − α/2)% confidence intervals are computed for both (1 − α/2) = .95 and .975. These confidence limits are also employed to construct the two-sided 90 % and 95 % confidence intervals. Accordingly, a total of six different sets of confidence intervals are obtained. Thus, our simulations cover a much broader range of situations than those considered in Kulinskaya and Staudte, which examined only the performance of two-sided 95 % confidence intervals. In each case, the simulated coverage probability is the proportion of the 10,000 replicates whose intervals contain the population weighted effect size λ*. The accuracy of the examined procedure is determined by the difference between the simulated and nominal coverage probabilities: error = simulated coverage probability − nominal coverage probability. In addition, the average interval width for λ* is computed over the 10,000 replicated widths of both the 90 % and 95 % two-sided confidence intervals. The simulated coverage probabilities, errors, and average widths for Kulinskaya and Staudte's method and the suggested approach are presented in Table 1. For a concise visualization of these results, the simulated coverage probabilities of one-sided upper and lower 95 % confidence intervals and two-sided 90 % confidence intervals are plotted for the proposed approach and Kulinskaya and Staudte's method in Figs. 1 and 2, respectively. It appears that the discrepancy between simulated and nominal coverage probabilities of Kulinskaya and Staudte's two-sided confidence intervals tends to decrease for larger λ*. Although this general pattern agrees with the findings of Kulinskaya and Staudte, the notable errors of the associated one-sided confidence intervals reveal that their two-sided interval estimates remain problematic even for large values of λ*. Specifically, the simulated coverage probability of their 90 % two-sided confidence interval is .8989, with error −.0011, for λ* = 1.08, but the coverage probabilities of the corresponding upper and lower 95 % one-sided confidence intervals are .9199 and .9790, with substantial errors of −.0301 and .0290, respectively. In addition, the best performance of their 95 % two-sided confidence intervals occurs at λ* = 0.64, with a simulated coverage probability of .9509 and error .0009. In this case, the corresponding upper and lower 97.5 % one-sided confidence intervals have simulated coverage probabilities of .9575 and .9934, with sizable errors of −.0175 and .0184, respectively. Note that the limits of the 100(1 − α)% two-sided confidence interval are the lower limit of the one-sided upper 100(1 − α/2)% interval and the upper limit of the one-sided lower 100(1 − α/2)% interval. Thus, it is misleading to report that a two-sided interval procedure is accurate on the basis of a combination of noticeably under- and overestimated one-sided coverage probabilities. Consequently, assessing only the coverage probability of two-sided confidence intervals may obscure a systematic overestimation of the confidence limits that might exist in the shifted and rescaled chi-square transformation of Kulinskaya and Staudte. In contrast, the simulated coverage probabilities of the suggested one- and two-sided confidence intervals closely agree with the nominal confidence levels for all 24 cases in Table 1. Although the upper 95 % confidence interval for λ* = 1.08 yields a coverage probability of .9387 and the largest error, −.0113, this result still outperforms that of Kulinskaya and Staudte, which yields a coverage probability of .9199 and error −.0301. Moreover, the average width of the proposed two-sided confidence intervals for the weighted signal-to-noise ratio λ* is consistently smaller than that of Kulinskaya and Staudte's method for each of the eight combinations of two confidence levels (1 − α) and four values of the weighted effect size λ*.

Fig. 1 Simulated coverage probabilities of the proposed confidence intervals [plot: coverage probability versus weighted effect size λ* (0.23, 0.36, 0.64, 1.08); series: upper 95 % CI (U), lower 95 % CI (L), two-sided 90 % CI (T)]
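The λ* values quoted for the first study can be reproduced with a short computation. The sketch assumes that the weighted signal-to-noise ratio defined earlier in the article has the form λ* = Σ qi(μi − μ̃)²/σi², with qi = Ni/NT and μ̃ = Σ(qiμi/σi²)/Σ(qi/σi²), and that η²* = λ*/(1 + λ*); under these assumptions it returns 0.23, 0.36, 0.64, and 1.08 for the four mean settings. The function name is hypothetical.

```python
def weighted_effect_sizes(means, variances, sizes):
    """Return (lambda*, eta2*) under the assumed weighted definitions."""
    n_total = sum(sizes)
    # Weights q_i / sigma_i^2, with allocation proportions q_i = N_i / N_T.
    w = [n / n_total / v for n, v in zip(sizes, variances)]
    # Precision-weighted grand mean.
    mu_tilde = sum(wi * m for wi, m in zip(w, means)) / sum(w)
    lam = sum(wi * (m - mu_tilde) ** 2 for wi, m in zip(w, means))
    # eta2* = lambda* / (1 + lambda*), the inverse of lambda* = eta2* / (1 - eta2*).
    return lam, lam / (1.0 + lam)


variances = (1.0, 0.5, 4.0)
sizes = (10, 10, 10)
for means in [(0, 1, 1), (0, 1, 2), (0, 1, 3), (0, 1, 4)]:
    lam, eta2 = weighted_effect_sizes(means, variances, sizes)
    print(means, round(lam, 2), round(eta2, 4))
```

For the first setting the exact value is λ* = 39/169, which rounds to the reported 0.23.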

To demonstrate that the contrasting behaviors of the two interval procedures persist in other heteroscedastic ANOVA situations, further numerical investigations were conducted with a wide range of model configurations. In the second study, we focus on the interval estimation of the weighted eta-squared η²* with g = 4 under both heterogeneous variances (σ1², σ2², σ3², σ4²) = (1, 4, 9, 16) and homogeneous variances (σ1², σ2², σ3², σ4²) = (1, 1, 1, 1). For sample size structures, three allocation schemes are examined to represent diverse patterns: (N1, N2, N3, N4) = (15, 15, 15, 15), (6, 12, 18, 24), and (24, 18, 12, 6). These three settings include both balanced and unbalanced designs and also create direct and inverse pairing with the heteroscedastic structure. Moreover, the three sample size allocation schemes are cross-combined with three different mean variability settings: (μ1, μ2, μ3, μ4) = {−1, 0, 0, 1}, {−3, −1, 1, 3}, and {−1, −1, 1, 1}. For ease of comparison, the actual mean structure is further modified as (μ1, μ2, μ3, μ4)/c, with a constant c chosen so that the weighted eta-squared remains η²* = 1/6 (λ* = 1/5) in each of the 18 model configurations. Similar variance structures, mean variability patterns, and sample size allocations were considered in Cohen (1988) and Tomarken and Serlin (1986). These combinations of model configurations are selected to reflect the range of characteristics likely to be encountered in actual applications. General guidelines for the design and implementation of Monte Carlo experiments can be found in Paxton, Curran, Bollen, Kirby, and Chen (2001). As in the preceding examination, the simulated coverage probabilities, errors, and average widths of the 100(1 − α/2)% one-sided and 100(1 − α)% two-sided confidence intervals under heterogeneous variances (σ1², σ2², σ3², σ4²) = (1, 4, 9, 16) are presented in Tables 2 and 3 for α = .10 and .05, respectively. In addition, Tables 4 and 5 contain the corresponding numerical results under the homogeneous structure (σ1², σ2², σ3², σ4²) = (1, 1, 1, 1) for α = .10 and .05, respectively.
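Because λ* is a quadratic form in the means, dividing the means by c divides λ* by c², so the adjustment constant is c = √(λ*(μ)/λtarget) with λtarget = η²*/(1 − η²*) = 1/5. A minimal sketch for the nine heterogeneous-variance configurations follows. It assumes the same form of λ* as in the first study, λ* = Σ qi(μi − μ̃)²/σi² with qi = Ni/NT and μ̃ the precision-weighted mean; the function names are hypothetical.

```python
import math


def weighted_lambda(means, variances, sizes):
    """Weighted signal-to-noise ratio lambda* (assumed form, see lead-in)."""
    n_total = sum(sizes)
    w = [n / n_total / v for n, v in zip(sizes, variances)]  # q_i / sigma_i^2
    mu_tilde = sum(wi * m for wi, m in zip(w, means)) / sum(w)
    return sum(wi * (m - mu_tilde) ** 2 for wi, m in zip(w, means))


def scaling_constant(pattern, variances, sizes, eta2_target):
    """Constant c such that the mean pattern divided by c has weighted eta-squared eta2_target."""
    lam_target = eta2_target / (1.0 - eta2_target)
    # lambda* is quadratic in the means: dividing them by c divides lambda* by c^2.
    return math.sqrt(weighted_lambda(pattern, variances, sizes) / lam_target)


variances = (1, 4, 9, 16)
allocations = [(15, 15, 15, 15), (6, 12, 18, 24), (24, 18, 12, 6)]
patterns = [(-1, 0, 0, 1), (-3, -1, 1, 3), (-1, -1, 1, 1)]
results = []
for sizes in allocations:
    for pattern in patterns:
        c = scaling_constant(pattern, variances, sizes, 1 / 6)
        lam = weighted_lambda([m / c for m in pattern], variances, sizes)
        results.append((sizes, pattern, c, lam / (1.0 + lam)))

for row in results:
    print(row)  # every configuration ends with eta2* equal to 1/6 up to rounding
```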

According to the extensive numerical results in Tables 2–5, the coverage probabilities of the proposed interval procedure stay close to the nominal levels. Although some of the absolute errors are slightly larger than .01, the performance seems generally acceptable. It is noteworthy that the suggested approach is developed under the assumption of possibly unequal variances. In view of the stable and adequate performance of the resulting confidence intervals under both homogeneous and heterogeneous variance settings, the proposed interval procedure has great potential usefulness in practical situations where the extent of underlying variance heterogeneity is rarely known and may be nearly trivial. Unfortunately, the interval method of Kulinskaya and Staudte (2006) does not provide satisfactory results, even though the coverage performance of some two-sided confidence intervals in Tables 3 and 5 is reasonably good. The sign and magnitude of the errors associated with the upper and lower one-sided confidence intervals show that their simulated coverage probabilities are consistently lower or higher than the nominal levels throughout Tables 2–5. This poor performance implies that their approach fails to produce accurate confidence limits under most of the conditions examined here. Furthermore, all of the average widths of the two-sided confidence intervals of the suggested procedure are smaller than those computed by the method of Kulinskaya and Staudte. Consequently, the noncentrality inversion approach is recommended over the existing shifted and rescaled transformation of Kulinskaya and Staudte for its overall performance, in terms of both the accuracy of coverage probability and the tightness of interval width at the nominal confidence level.

Fig. 2 Simulated coverage probabilities of Kulinskaya and Staudte's (2006) confidence intervals [plot: coverage probability versus weighted effect size λ* (0.23, 0.36, 0.64, 1.08); series: upper 95 % CI (U), lower 95 % CI (L), two-sided 90 % CI (T)]

Sample size determination for precise interval estimation

With the emphasis on greater use of summary measures and confidence intervals in the sixth edition of the Publication Manual of the American Psychological Association (American Psychological Association, 2009), it is prudent to facilitate this research practice by determining, at the planning stage of research design, the sample sizes necessary to achieve the desired precision of interval estimation. Hence, sample size determination for precise confidence intervals of the weighted eta-squared effect size is considered next.

Based on the approximate noncentral F distribution of Welch's statistic and the noncentrality inversion technique, an approximate 100(1 − α)% two-sided interval estimate $(\hat{\eta}^{2*}_L, \hat{\eta}^{2*}_U)$ of η²* can be computed from Equations 13 and 14 with equal-tail confidence probabilities α1 = α2 = α/2. To ensure that the confidence interval is narrow enough to produce meaningful findings, researchers must recognize the stochastic nature of confidence intervals due to the inherent randomness in Welch's statistic W and the degrees of freedom estimator ν̂. However, ν̂ depends jointly on all g heterogeneous sample variances, and consequently the exact distribution of ν̂ can be overwhelmingly complex. To provide a feasible solution, the randomness of ν̂ is ignored in the proposed sample size calculations. This simplification is a small price to pay for a sample size framework that is informative and useful for precise interval estimation.
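To make the two random quantities concrete, the sketch below computes Welch's W and ν̂ from raw group data. Equation 7 is not reproduced in this section, so the code assumes the standard Welch (1951) forms: weights wi = Ni/si², weighted mean x̃ = Σ wi x̄i / w with w = Σ wi, Q = Σ (1 − wi/w)²/(Ni − 1), W = [Σ wi(x̄i − x̃)²/(g − 1)] / [1 + 2(g − 2)Q/(g² − 1)], and ν̂ = (g² − 1)/(3Q). The function name welch_statistic and the toy data are hypothetical.

```python
import statistics


def welch_statistic(groups):
    """Return (W, nu_hat) for a list of groups, using the standard Welch (1951) forms."""
    g = len(groups)
    sizes = [len(x) for x in groups]
    means = [statistics.mean(x) for x in groups]
    # Precision weights N_i / s_i^2 (sample variances use the N_i - 1 divisor).
    weights = [n / statistics.variance(x) for n, x in zip(sizes, groups)]
    w_sum = sum(weights)
    grand = sum(w * m for w, m in zip(weights, means)) / w_sum
    q = sum((1.0 - w / w_sum) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    numerator = sum(w * (m - grand) ** 2 for w, m in zip(weights, means)) / (g - 1)
    denominator = 1.0 + 2.0 * (g - 2) * q / (g ** 2 - 1)
    return numerator / denominator, (g ** 2 - 1) / (3.0 * q)


groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
w_obs, nu_hat = welch_statistic(groups)
print(w_obs, nu_hat)  # approximately 18/7 and 4 for this toy data
```

Both outputs change from sample to sample; the sample size procedure keeps the randomness of W but treats ν̂ as a fixed ν.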

The empirical examinations presented later reveal that sampling fluctuations in ν̂ are minimal and the associated effects may be negligible. Hence, the width of a confidence interval $(\hat{\eta}^{2*}_L, \hat{\eta}^{2*}_U)$, denoted by $H = \hat{\eta}^{2*}_U - \hat{\eta}^{2*}_L$, can be viewed as a function of the Welch statistic, the degrees of freedom ν, the weighted eta-squared η²*, the sample sizes (N1, ..., Ng), and the confidence coefficient (1 − α). Specifically, the approximate noncentral F distribution suggested in Levy (1978) is used to determine the sample sizes required to achieve the specified precision of a confidence interval. Two useful principles are presented here: controlling the expected width, and controlling the tolerance probability that the width stays within a preassigned value. First, one may determine the sample size required such that the expected width E[H] of a 100(1 − α)% confidence interval $(\hat{\eta}^{2*}_L, \hat{\eta}^{2*}_U)$ is within the given bound

$$E[H] \le b, \tag{15}$$

where b (> 0) is a constant. Second, one may compute the sample size needed to guarantee, with a given tolerance probability (1 − γ), that the width H of a 100(1 − α)% interval estimate $(\hat{\eta}^{2*}_L, \hat{\eta}^{2*}_U)$ will not exceed the planned value ω:

$$P\{H \le \omega\} \ge 1 - \gamma, \tag{16}$$

where ω (> 0) is a constant. Both the expectation E[H] and the probability P{H ≤ ω} are evaluated with respect to the approximate distribution of W presented in Equation 8.
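One way to write these two criteria out, assuming (as described above) that ν̂ is replaced by a fixed ν and that H is treated as a function H(w) of the observed Welch statistic, is shown below. Here f(w; g − 1, ν, Λ*) denotes the noncentral F density with noncentrality Λ* = NTη²*/(1 − η²*), and 1{·} is the indicator function; this is a sketch of the computation, not necessarily the exact formulation used in the supplementary programs.

$$
E[H] = \int_0^\infty H(w)\, f(w;\, g-1, \nu, \Lambda^*)\, dw,
\qquad
P\{H \le \omega\} = \int_0^\infty \mathbf{1}\{H(w) \le \omega\}\, f(w;\, g-1, \nu, \Lambda^*)\, dw,
$$

$$
H(w) = \hat{\eta}^{2*}_U(w) - \hat{\eta}^{2*}_L(w),
$$

where the limits come from Equations 13 and 14 evaluated at $W_O = w$. Each evaluation of the integrand therefore requires one noncentrality inversion, which is why the computation combines numerical integration with noncentrality inversion.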

For ease of numerical computation, the sample size allocation ratios (q1, ..., qg) are rewritten as $q_i = r_i / \sum_{j=1}^{g} r_j$, where ri = Ni/N1 for i = 1, ..., g. Equivalently, ri = qi/q1, i = 1, ..., g. Thus, with the initially specified sample size allocation ratios (q1, ..., qg) or sample size ratios (r1, ..., rg), the task reduces to finding the minimum sample size N1 (with Ni = N1ri, i = 2, ..., g) required to attain the desired precision level. With the computational formulas of expected width and tolerance probability in Equations 15 and 16, the sample sizes (NEW1, ..., NEWg) needed for the expected width of a 100(1 − α)% two-sided confidence interval $(\hat{\eta}^{2*}_L, \hat{\eta}^{2*}_U)$ to fall within the designated bound b are the minimum integers (N1, ..., Ng) = N1(r1, ..., rg) such that E[H] ≤ b. On the other hand, the sample sizes (NTP1, ..., NTPg) required to guarantee with a given tolerance probability (1 − γ) that the width of a 100(1 − α)% two-sided confidence interval $(\hat{\eta}^{2*}_L, \hat{\eta}^{2*}_U)$ will not exceed the planned range ω are the smallest integers (N1, ..., Ng) = N1(r1, ..., rg) such that P{H ≤ ω} ≥ 1 − γ. Computing the expected width and tolerance probability requires numerical integration and noncentrality inversion with respect to a noncentral F probability distribution function. To enhance the applicability of these sample size methodologies, supplementary SAS/IML (SAS Institute, 2011) computer programs have been written to aid researchers with the suggested techniques, and empirical illustrations are presented next to demonstrate their usefulness in sample size calculations.
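The allocation bookkeeping and the search for the minimum N1 can be sketched as follows, using exact fractions so that the integer constraint on Ni = N1ri is explicit. The precision check is passed in as a placeholder function (meets_precision, hypothetical); in practice it would evaluate E[H] ≤ b or P{H ≤ ω} ≥ 1 − γ, which this sketch does not implement. The requirement Ni ≥ 2 is an added assumption so that every sample variance is defined, and the toy criterion in the usage lines is for illustration only.

```python
from fractions import Fraction


def ratios_from_proportions(q):
    """r_i = q_i / q_1, so that r_1 = 1 and N_i = N_1 * r_i."""
    return [Fraction(qi) / Fraction(q[0]) for qi in q]


def proportions_from_ratios(r):
    """q_i = r_i / sum_j r_j."""
    total = sum(Fraction(ri) for ri in r)
    return [Fraction(ri) / total for ri in r]


def minimum_sizes(r, meets_precision, n1_max=100_000):
    """Smallest integer vector N_1 * (r_1, ..., r_g) for which meets_precision(sizes) holds.

    meets_precision is a hypothetical placeholder for a check such as
    E[H] <= b or P{H <= omega} >= 1 - gamma.
    """
    for n1 in range(2, n1_max + 1):
        sizes = [n1 * ri for ri in r]
        # Skip N_1 values that make some N_i non-integer or smaller than 2 (assumption).
        if any(s.denominator != 1 or s < 2 for s in sizes):
            continue
        sizes = [int(s) for s in sizes]
        if meets_precision(sizes):
            return sizes
    raise ValueError("no N_1 up to n1_max satisfies the criterion")


q = [Fraction(4, 10), Fraction(3, 10), Fraction(2, 10), Fraction(1, 10)]
r = ratios_from_proportions(q)  # 1, 3/4, 1/2, 1/4
# Toy criterion for illustration only: total sample size of at least 70.
print(minimum_sizes(r, lambda sizes: sum(sizes) >= 70))  # [28, 21, 14, 7]
```

With r = (1, 3/4, 1/2, 1/4), only multiples of 4 are admissible for N1, which is consistent with sample size vectors such as (32, 24, 16, 8) and (48, 36, 24, 12) reported below.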

Numerical investigation of sample size procedures

Due to the approximate nature of the suggested sample size procedures for precise interval estimation of the weighted eta-squared effect sizes, their features and performance need to be delineated and examined through numerical investigations. To demonstrate the sample size methodology, an empirical study was conducted in two steps. The first step involved extensive sample size calculations for the two precision measures of expected width and tolerance probability across a wide range of model configurations. In the second step, a Monte Carlo simulation study was performed to provide insights into the precision behavior of the recommended sample size formulas under the design characteristics specified in the first step.

Note that determining the sample sizes needed for the chosen precision of the confidence interval procedures requires detailed specification of the confidence level, the sample size allocation ratio, and the magnitudes of the mean effects and variance components. To demonstrate the range of characteristics that applied heteroscedastic ANOVA research may cover, a systematic numerical investigation of a four-group design is conducted by fixing the confidence level (1 − α) = .95 and the heterogeneous error variances (σ1², σ2², σ3², σ4²) = (1, 4, 9, 16), and varying the other two factors, sample size allocation ratio and mean variability pattern, for the selected magnitude of weighted eta-squared η²* = .15. Accordingly, to represent balanced and unbalanced patterns, three sample size allocation settings are considered: (q1, q2, q3, q4) = (1/4, 1/4, 1/4, 1/4), (1/10, 2/10, 3/10, 4/10), and (4/10, 3/10, 2/10, 1/10). As in the empirical illustration presented above, the sample size allocation schemes are cross-combined with three different mean spread settings: (μ1, μ2, μ3, μ4) = {−1, 0, 0, 1}/c, {−3, −1, 1, 3}/c, and {−1, −1, 1, 1}/c. Note that different values of the constant c are used so that the weighted eta-squared is kept constant at η²* = .15 throughout this numerical study. Moreover, the width bounds b = ω = 0.3 and the tolerance probability (1 − γ) = .90 are selected for the two precision criteria of expected width and tolerance probability. These levels were selected to reflect common sample sizes used in typical research settings. Accordingly, the necessary sample sizes (NEW1, NEW2, NEW3, NEW4) and (NTP1, NTP2, NTP3, NTP4) are computed with respect to the selected precision requirements of expected width and of tolerance probability, respectively. The resulting sample sizes are presented in Table 6 for all nine joint model configurations of varying sample size allocation and mean dispersion structure.

An inspection of the sample sizes reported in Table 6 shows that, because the weighted eta-squared is held constant at η²* = .15, the computed sample sizes are identical for all three mean patterns when the sample size allocation ratio is fixed. Accordingly, the sample sizes under the expected width consideration are (NEW1, NEW2, NEW3, NEW4) = (17, 17, 17, 17), (7, 14, 21, 28), and (32, 24, 16, 8) for the three sample size allocation settings (q1, q2, q3, q4) = (1/4, 1/4, 1/4, 1/4), (1/10, 2/10, 3/10, 4/10), and (4/10, 3/10, 2/10, 1/10), respectively. On the other hand, the corresponding sample sizes under the tolerance probability criterion are (NTP1, NTP2, NTP3, NTP4) = (24, 24, 24, 24), (10, 20, 30, 40), and (48, 36, 24, 12) for the three allocation schemes, respectively. It is also important to note that the total sample sizes NT of the balanced structure are smaller than those of the unbalanced structures for both types of interval precision. The case with inverse pairing of heterogeneous variances and unbalanced allocation requires the largest total sample sizes. Since the two precision criteria impose distinct precision characteristics on the resulting confidence intervals, the required sample sizes differ. Although the results are not completely comparable, meeting the tolerance probability criterion typically requires a larger sample size than controlling a designated expected width, as was noted in Kupper and Hafner (1989). More important, the sample size procedures and empirical results presented here enable researchers to better understand the relationship between the designated interval precision and the required sample size, given the fundamental information on model configurations.
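The comparison of total sample sizes can be checked with a few lines of arithmetic on the Table 6 sizes quoted above. The dictionary keys are hypothetical labels: "direct" stands for the allocation (1/10, 2/10, 3/10, 4/10), which pairs larger groups with larger variances, and "inverse" for (4/10, 3/10, 2/10, 1/10).

```python
# Sample sizes quoted from Table 6 for the expected width (EW) and
# tolerance probability (TP) criteria.
expected_width = {
    "balanced": (17, 17, 17, 17),
    "direct": (7, 14, 21, 28),
    "inverse": (32, 24, 16, 8),
}
tolerance = {
    "balanced": (24, 24, 24, 24),
    "direct": (10, 20, 30, 40),
    "inverse": (48, 36, 24, 12),
}

# Total sample size N_T under each criterion, per allocation scheme.
totals = {name: (sum(expected_width[name]), sum(tolerance[name])) for name in expected_width}
for name, (nt_ew, nt_tp) in totals.items():
    print(f"{name}: N_T(EW) = {nt_ew}, N_T(TP) = {nt_tp}")
```

The totals are 68, 70, and 80 under the expected width criterion and 96, 100, and 120 under the tolerance probability criterion, so the balanced design needs the fewest observations and inverse pairing the most in both cases.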

In the process of sample size determination, the attained precision levels associated with the listed sample sizes (NEW1, NEW2, NEW3, NEW4) and (NTP1, NTP2, NTP3, NTP4) should be no greater than the width bound b = 0.3 and no less than the tolerance probability (1 − γ) = .90, respectively. The achieved expected width E[H] and tolerance probability P{H ≤ ω} computed with the approximate noncentral F distribution in Equation 8 are also summarized in Table 6. The resulting approximate expected widths .2941, .2905, and .2923 for the three sample size allocation schemes are marginally smaller than the selected width b = 0.3. Likewise, the approximate tolerance probability of .9170 for the equal allocation ratio, or balanced design, is slightly greater than the nominal level .90. For the two unbalanced designs, the approximate tolerance probabilities are .9733 and .9593 for direct and inverse pairing of allocation ratio with heteroscedastic structure, respectively. The substantial differences between these tolerance probabilities and the target level (1 − γ) = .90 are presumably due to the integer metric of sample sizes and the constraint of a designated sample size allocation ratio. Since it is not possible to compute the exact expected width or tolerance probability for the specified sample sizes, we evaluate the accuracy of the sample size calculations through the following Monte Carlo simulation study. Under the computed sample sizes, parameter configurations, and precision settings described in Table 6, estimates of the true expected width and tolerance probability are computed through Monte Carlo simulation of 10,000 independent data sets. For each replicate, the confidence limits and the corresponding interval width of the two-sided 95 % confidence interval of η²* are calculated. The simulated expected width is then the mean of the 10,000 replicated interval widths, whereas the simulated tolerance probability is the proportion of the 10,000 replicates whose interval width is less than or equal to the specified bound ω = 0.3.

The adequacy of the sample size procedure for precise interval estimation is determined by one of the following formulas: error = simulated expected width − approximate expected width, or error = simulated tolerance probability − approximate tolerance probability. The simulated values and the corresponding errors of expected width and tolerance probability are also summarized in Table 6. The results show that the performance of the proposed approaches appears to be good for the range of model specifications considered here. In particular, the absolute errors of the expected width are less than .002 for all nine cases examined. Also, the absolute discrepancies in tolerance probability are smaller than .01, with the two exceptions of .0150 and .0115, both associated with inverse pairing of heterogeneous variance and unbalanced allocation. Overall, this empirical evidence demonstrates that the proposed sample size procedures provide feasible and accurate solutions to precise interval estimation of the weighted eta-squared under a wide variety of heteroscedastic model configurations.
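The two simulation summaries and their errors can be computed as below. The widths are hypothetical placeholders, not values from Table 6; in an actual check they would be the 10,000 simulated interval widths, and the approximate values would come from the noncentral F computations. The function names are hypothetical.

```python
def simulated_precision(widths, omega):
    """Mean interval width and proportion of widths <= omega over the replicates."""
    n = len(widths)
    return sum(widths) / n, sum(1 for h in widths if h <= omega) / n


def precision_errors(widths, omega, approx_width, approx_tolerance):
    """error = simulated - approximate, for expected width and tolerance probability."""
    sim_width, sim_tolerance = simulated_precision(widths, omega)
    return sim_width - approx_width, sim_tolerance - approx_tolerance


# Hypothetical widths from five replicates, for illustration only.
widths = [0.26, 0.28, 0.30, 0.31, 0.29]
print(precision_errors(widths, omega=0.3, approx_width=0.29, approx_tolerance=0.80))
```

A width exactly equal to ω counts toward the tolerance probability, matching the "less than or equal to" rule in the text.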

Conclusions

To extend and fortify the use of effect sizes and associated confidence intervals in empirical studies, this article has focused on the interval estimation and sample size determination for the weighted eta-squared effect sizes in one-way heteroscedastic ANOVA. Although existing studies have shown several interesting and fundamental results, this research contributes to the effect size literature by considering three methodological issues with analytical and numerical expositions. First, in connection with the well-known signal-to-noise ratio and eta-squared effect sizes in the homoscedastic ANOVA framework, we have provided enhanced interpretations and supportive usages for the notions of weighted effect size and weighted coefficient of determination in Kulinskaya and Staudte (2006), as the weighted signal-to-noise ratio and weighted eta-squared effect size within the extended context of heteroscedastic ANOVA. Second, for the interval estimation of weighted eta-squared, we have addressed the potential deficiencies of the shifted and rescaled chi-square transformation approach of Kulinskaya and Staudte and have proposed an improved procedure that offers advantages in theoretical justification, computational simplicity, and numerical performance over their existing method. Third, the corresponding sample size procedures for precise interval estimation of weighted eta-squared have been developed for both the expected width and tolerance probability considerations. The suggested sample size calculations appear to be sufficiently accurate for practical purposes within the range of model specifications considered in the present article. Overall, the recommended methodology facilitates the advocated practice of reporting confidence intervals for effect sizes, and it reinforces the potential usefulness of ANOVA models under heterogeneity of variance.

Author Note The authors thank the editor, Gregory Francis, and the two anonymous reviewers for their helpful comments that substantially improved the presentation. Correspondence concerning this article should be addressed to G. Shieh, Department of Management Science, National Chiao Tung University, 1001 Ta Hsueh Road, Hsinchu, Taiwan 30050 (e-mail: gwshieh@mail.nctu.edu.tw).

References

Alhija, F. N. A., & Levy, A. (2009). Effect size reporting practices in published articles. Educational and Psychological Measurement, 69, 245–265.

American Psychological Association. (2009). Publication manual of the American Psychological Association (6th ed.). Washington, DC: Author.

Bonett, D. G. (2008). Confidence intervals for standardized linear contrasts of means. Psychological Methods, 13, 99–109.

Breaugh, J. A. (2003). Effect size estimation: Factors to consider and mistakes to avoid. Journal of Management, 29, 79–97.

Casella, G., & Berger, R. L. (2002). Statistical inference (2nd ed.). Pacific Grove, CA: Duxbury.

Cohen, J. (1973). Eta-squared and partial eta squared in fixed factor ANOVA designs. Educational and Psychological Measurement, 33, 107–112.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Ferguson, C. J. (2009). An effect size primer: A guide for clinicians and researchers. Professional Psychology: Research and Practice, 40, 532–538.

Fern, E. F., & Monroe, K. B. (1996). Effect-size estimates: Issues and problems in interpretation. Journal of Consumer Research, 23, 89–105.

Fleishman, A. I. (1980). Confidence intervals for correlation ratios. Educational and Psychological Measurement, 40, 659–670.


Glass, G. V., & Hakstian, A. R. (1969). Measures of association in comparative experiments: Their development and interpretation. American Educational Research Journal, 6, 403–414.

Grissom, R. J. (2000). Heterogeneity of variance in clinical data. Journal of Consulting and Clinical Psychology, 68, 155–165.

Grissom, R. J., & Kim, J. J. (2001). Review of assumptions and problems in the appropriate conceptualization of effect size. Psychological Methods, 6, 135–146.

Grissom, R. J., & Kim, J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ: Erlbaum.

Haase, R. F. (1983). Classical and partial eta square in multifactor ANOVA designs. Educational and Psychological Measurement, 43, 35–39.

Harwell, M. R., Rubinstein, E. N., Hayes, W. S., & Olds, C. C. (1992). Summarizing Monte Carlo results in methodological research: The one and two-factor fixed effects ANOVA cases. Journal of Educational Statistics, 17, 315–339.

Hays, W. L. (1994). Statistics (5th ed.). Belmont, CA: Wadsworth.

Kelley, K. (2007). Methods for the behavioral, educational, and social sciences: An R package. Behavior Research Methods, 39, 979–984.

Keppel, G. (1991). Design and analysis: A researcher’s handbook (3rd ed.). Upper Saddle River, NJ: Prentice Hall.

Keselman, H. J., Algina, J., Lix, L. M., Wilcox, R. R., & Deering, K. N. (2008). A generally robust approach for testing hypotheses and setting confidence intervals for effect sizes. Psychological Methods, 13, 110–129.

Keselman, H. J., Huberty, C. J., Lix, L. M., Olejnik, S., Cribbie, R. A., Donahue, B., & Levin, J. R. (1998). Statistical practices of educational researchers: An analysis of their ANOVA, MANOVA and ANCOVA analyses. Review of Educational Research, 68, 350–386.

Kirk, R. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56, 746–759.

Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research. Washington, DC: American Psychological Association.

Kulinskaya, E., & Staudte, R. G. (2006). Interval estimates of weighted effect sizes in the one-way heteroscedastic ANOVA. British Journal of Mathematical and Statistical Psychology, 59, 97–111.

Kulinskaya, E., Staudte, R. G., & Gao, H. (2003). Power approximations in testing for unequal means in a one-way ANOVA weighted for unequal variances. Communications in Statistics: Theory and Methods, 32, 2353–2371.

Kupper, L. L., & Hafner, K. B. (1989). How appropriate are popular sample size formulas? The American Statistician, 43, 101–105.

Levine, T. R., & Hullett, C. R. (2002). Eta squared, partial eta squared, and misreporting of effect size in communication research. Human Communication Research, 28, 612–625.

Levy, K. J. (1978). Some empirical power results associated with Welch’s robust analysis of variance technique. Journal of Statistical Computation and Simulation, 8, 43–48.

Lix, L. M., & Keselman, H. J. (1995). Approximate degrees of freedom tests: A unified perspective on testing for mean equality. Psychological Bulletin, 117, 547–560.

Lix, L. M., Keselman, J. C., & Keselman, H. J. (1996). Consequences of assumption violations revisited: A quantitative review of alternatives to the one-way analysis of variance F test. Review of Educational Research, 66, 579–620.

Maxwell, S. E., Camp, C. J., & Arvey, R. D. (1981). Measures of strength of association: A comparative examination. Journal of Applied Psychology, 66, 525–534.

Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison perspective (2nd ed.). Mahwah, NJ: Erlbaum.

Maxwell, S. E., Kelley, K., & Rausch, J. R. (2008). Sample size planning for statistical power and accuracy in parameter estimation. Annual Review of Psychology, 59, 537–563.

Mood, A. M., Graybill, F. A., & Boes, D. C. (1974). Introduction to the theory of statistics (3rd ed.). New York: McGraw-Hill.

Odgaard, E. C., & Fowler, R. L. (2010). Confidence intervals for effect sizes: Compliance and clinical significance in the Journal of Consulting and Clinical Psychology. Journal of Consulting and Clinical Psychology, 78, 287–297.

Olejnik, S., & Algina, J. (2003). Generalized eta and omega squared statistics: Measures of effect sizes for some common research designs. Psychological Methods, 8, 434–447.

Paxton, P., Curran, P. J., Bollen, K. A., Kirby, J., & Chen, F. (2001). Monte Carlo experiments: Design and implementation. Structural Equation Modeling, 8, 287–312.

Pierce, C. A., Block, R. A., & Aguinis, H. (2004). Cautionary note on reporting eta-squared values from multifactor ANOVA designs. Educational and Psychological Measurement, 64, 916–924.

Richardson, J. T. E. (1996). Measures of effect size. Behavior Research Methods, Instruments, & Computers, 28, 12–22.

Richardson, J. T. E. (2011). Eta squared and partial eta squared as measures of effect size in educational research. Educational Research Review, 6, 135–147.

SAS Institute. (2011). SAS/IML user’s guide, version 9.3. Cary, NC: Author.

Smithson, M. (2001). Correct confidence intervals for various regression effect sizes and parameters: The importance of noncentral distributions in computing intervals. Educational and Psychological Measurement, 61, 605–632.

Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9, 164–182.

Sun, S., Pan, W., & Wang, L. L. (2010). A comprehensive review of effect size reporting and interpreting practices in academic journals in education and psychology. Journal of Educational Psychology, 102, 989–1004.

Tomarken, A. J., & Serlin, R. C. (1986). Comparison of ANOVA alternatives under variance heterogeneity and specific noncentrality structures. Psychological Bulletin, 99, 90–99.

Vacha-Haase, T., & Thompson, B. (2004). How to estimate and interpret various effect sizes. Journal of Counseling Psychology, 51, 473–481.

Welch, B. L. (1951). On the comparison of several mean values: An


Table 1 Simulated coverage probability, error, and average width of the approximate confidence intervals for the weighted signal-to-noise ratio $\lambda_*$ when $\sigma^2$

Table 3 Simulated coverage probability, error, and average width of the approximate confidence intervals for the weighted eta-squared $\eta^2_* = 1/6$ when $(\sigma_1^2, \sigma_2^2, \sigma_3^2, \sigma_4^2) = (1, 4, 9, 16)$, $(N_1, N_2, N_3, N_4) =$ (15, 15, 15,

Table 4 Simulated coverage probability, error, and average width of the approximate confidence intervals for the weighted eta-squared $\eta^2_* = 1/6$
Fig. 1 Simulated coverage probabilities of the proposed confidence intervals