Determining Sample Sizes for Precise Contrast Analysis With Heterogeneous Variances

Show-Li Jan and Gwowen Shieh
Journal of Educational and Behavioral Statistics, 2014, 39: 91
DOI: 10.3102/1076998614523069
The online version of this article can be found at: http://jeb.sagepub.com/content/39/2/91
Published on behalf of the American Educational Research Association (http://jebs.aera.net) and SAGE Publications (http://www.sagepublications.com)
Version of Record: Mar 18, 2014



Show-Li Jan
Chung Yuan Christian University

Gwowen Shieh
National Chiao Tung University

The analysis of variance (ANOVA) is one of the most frequently used statistical analyses in practical applications. Accordingly, single and multiple comparison procedures are frequently applied to assess the differences among mean effects. However, the underlying assumption of homogeneous variances may not always be tenable. This study examines sample size procedures for precise interval estimation of linear contrasts within the context of one-way heteroscedastic ANOVA models. The desired precision of both individual and simultaneous confidence intervals is evaluated with respect to the control of the expected half width and to the tolerance probability of the interval half width being within a designated value. Supplementary computer programs are developed to facilitate the implementation of the proposed techniques. The suggested sample size procedures improve upon existing approaches and extend the methodological development in the statistical literature.

Keywords: confidence interval, contrast, precision, sample size

Individual and multiple comparisons of mean effects in homoscedastic analysis of variance (ANOVA) models have received considerable attention in the literature. Bird (2004); Bretz, Hothorn, and Westfall (2010); Cumming (2012); Hahn and Meeker (1991); Hochberg and Tamhane (1987); Hsu (1996); Smithson (2003); Westfall, Tobias, Rom, Wolfinger, and Hochberg (2011); and the references therein provide excellent and thorough accounts of the associated properties and of the construction of confidence intervals in ANOVA and related models. Although the homogeneity of variance formulation provides a convenient and useful setup, it is not unusual for the homoscedasticity assumption to be violated in actual applications. Specifically, Fenstad (1983), Grissom (2000), and Wilcox (1987) emphasized that there are theoretical reasons to expect, and empirical results to document, that heteroscedasticity is more common than most researchers realize. Therefore, it is prudent to recommend employing suitable techniques that are superior to the traditional inferential methods under various conditions of heteroscedasticity.

Journal of Educational and Behavioral Statistics, 2014, Vol. 39, No. 2, pp. 91–116. DOI: 10.3102/1076998614523069. © 2014 AERA. http://jebs.aera.net

The comparison of mean effects requires the formulation of a contrast, which is a linear combination of population means whose coefficients sum to zero. The difference between two group means, or pairwise comparison, is the simplest case of a linear contrast, whereas a complex comparison may involve several treatment means and designated coefficients in order to address theoretically and practically meaningful questions. Under the independence, normality, and homogeneity of variance assumptions in ANOVA, inference for a linear contrast of mean effects can be conducted with a single-degree-of-freedom F statistic or a t statistic. However, it is often desirable and sensible to perform multiple comparisons among means through a family of confidence intervals for contrasts to provide specific answers to critical research questions. Thus, it becomes necessary to consider simultaneous interval procedures that permit the family confidence coefficient to be controlled. Specifically, the Bonferroni procedure is useful when the number of comparisons to be investigated is identified in advance of the study, whereas the Scheffé (1959), Tukey (1994), and Kramer (1956) methods are applicable to multiple comparisons of planned and post hoc contrasts while still maintaining the desired overall confidence level of the joint confidence intervals. Furthermore, comprehensive guidelines and practical implications can be found in Kutner, Nachtsheim, Neter, and Li (2005) and Maxwell and Delaney (2004).

To account for heteroscedastic situations, particular emphasis is devoted to the problem of mean comparisons when the population variances are unknown and cannot be assumed equal. As shown in Rossi (1975), the approximate t solution of Welch (1947) can be readily applied to construct confidence intervals of linear contrasts involving more than two mean effects. The underlying idea generalizes the approach suggested independently by Satterthwaite (1946), Smith (1936), and Welch (1938) for the Behrens–Fisher problem of comparing the means of two populations. For pairwise and general multiple comparisons of means with unequal variances, Dunnett (1980) and Tamhane (1979) described several feasible procedures and compared their confidence levels and interval widths in Monte Carlo simulation studies. In view of their overall behavior and computational requirements, the six methods considered in Brown and Forsythe (1974), Dunnett (1980), Games and Howell (1976), Tamhane (1977), and Ury and Wiggins (1971) are potentially appropriate for practical applications.

The reporting of effect sizes and associated confidence intervals for primary results in all empirical social science research has been recommended by Wilkinson and the American Psychological Association Task Force on Statistical Inference (1999), the American Educational Research Association Task Force on Reporting of Research Methods (2006), and the Publication Manual of the American Psychological Association (American Psychological Association, 2010). According to the editorial guidelines and methodological recommendations of several prominent educational and psychological journals, it is necessary to include some measures of effect size and confidence intervals in all research studies (Alhija & Levy, 2009; Cohen, 1990, 1994; Dunst & Hamby, 2012; Fritz, Morris, & Richler, 2012; Odgaard & Fowler, 2010; Sun, Pan, & Wang, 2010). Within the context of ANOVA, the use of effect sizes in conjunction with confidence intervals has been emphasized in Bird (2002); Levine, Weber, Park, and Hullett (2008); and Robey (2004). Note that a linear contrast between two or more means can be considered an effect size index in individual and multiple comparison investigations. For advance planning of research designs, methods for computing the sample sizes needed to obtain confidence intervals of linear contrasts with desired precision in multiple comparison studies have been presented in Pan and Kupper (1999). However, their methods are confined to the homogeneous variance and balanced design. In view of the continued recommendation for the use of confidence intervals in all empirical studies, this study aims to support this research practice by presenting sample size procedures for precise individual and simultaneous confidence intervals for single and multiple comparisons in fixed-effects heteroscedastic ANOVA designs.

Specifically, individual comparisons of contrasts and six well-known multiple comparison methods are considered under the general framework of heterogeneous variances and unbalanced structures. In addition, the desired precision of a confidence interval is assessed with respect to the control of the expected half width and to the tolerance probability of the interval half width being within a designated value. Hence, the proposed sample size calculations for precise interval estimation are described in terms of two distinct features. One method gives the minimum sample size such that the expected half widths of a family of confidence intervals are within the designated bounds. The other provides the sample size needed to guarantee, with a given tolerance probability, that the half widths of a family of confidence intervals will not exceed the planned ranges. The notion of expected half width for sample size calculations is frequently introduced in standard texts. However, considerable attention has also focused on the criterion of the tolerance probability of the interval half width being within a given value; see, for example, Kelley, Maxwell, and Rausch (2003); Kupper and Hafner (1989); and Liu (2009) for related discussion in the context of estimating the mean difference between two normal populations with homoscedasticity. Consequently, this investigation updates and expands current work on sample size determination for confidence interval estimation of mean comparisons in ANOVA, especially the existing results in Kupper and Hafner (1989), Pan and Kupper (1999), Shieh and Jan (2012), and Wang and Kupper (1997). In addition, the computations of these procedures involve iterative algorithms not currently available in statistical packages; hence, computer code is presented to facilitate the recommended approaches for computing the necessary sample sizes of linear contrast confidence intervals with designated precision in planning research designs.


Individual Comparisons of Means

Consider the one-way heteroscedastic ANOVA model in which the observations $Y_{ij}$ are assumed to be independent and normally distributed with expected values $\mu_i$ and variances $\sigma_i^2$:

$$Y_{ij} \sim N(\mu_i, \sigma_i^2), \quad (1)$$

where $\mu_i$ and $\sigma_i^2$ are unknown parameters, $i = 1, \ldots, g\ (\geq 2)$ and $j = 1, \ldots, N_i$. For inference about linear combinations of the mean parameters, a contrast is defined as

$$\psi = \sum_{i=1}^{g} c_i \mu_i,$$

where $c_i$ are the contrast coefficients with $\sum_{i=1}^{g} c_i = 0$. It follows from the model assumption in Equation 1 that a convenient unbiased estimator $\hat\psi$ of $\psi$ is of the form $\hat\psi = \sum_{i=1}^{g} c_i \bar{Y}_i$, where $\bar{Y}_i = \sum_{j=1}^{N_i} Y_{ij}/N_i$ is the $i$th group sample mean and is an unbiased estimator of $\mu_i$ for $i = 1, \ldots, g$. Moreover, the linear estimator $\hat\psi$ has the distribution

$$\hat\psi \sim N(\psi, \Sigma), \quad (2)$$

where $\Sigma = \mathrm{Var}(\hat\psi) = \sum_{i=1}^{g} c_i^2 \sigma_i^2/N_i$. Also, an unbiased estimator $\hat\Sigma$ of $\Sigma$ can be obtained by replacing the variance $\sigma_i^2$ in $\Sigma$ with its unbiased estimator $S_i^2$ as follows:

$$\hat\Sigma = \sum_{i=1}^{g} \frac{c_i^2 S_i^2}{N_i}, \quad (3)$$

where $S_i^2 = \sum_{j=1}^{N_i} (Y_{ij} - \bar{Y}_i)^2/(N_i - 1)$ is the sample variance for $i = 1, \ldots, g$. Then an approximate and useful pivotal quantity $T$ for interval estimation of $\psi$ can be expressed as

$$T = \frac{\hat\psi - \psi}{\hat\Sigma^{1/2}}. \quad (4)$$
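As a small illustration of Equations 2 and 3, the sketch below computes $\hat\psi$ and $\hat\Sigma$ from raw group data using only the Python standard library. The function name `contrast_estimate` is hypothetical, and the check on the coefficients simply enforces the sum-to-zero definition of a contrast.

```python
from statistics import fmean, variance


def contrast_estimate(groups, c):
    """Return (psi_hat, Sigma_hat) for psi = sum(c_i * mu_i), per Equations 2 and 3."""
    if abs(sum(c)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    # psi_hat = sum of c_i times the group sample means
    psi_hat = sum(ci * fmean(y) for ci, y in zip(c, groups))
    # statistics.variance uses the N_i - 1 divisor, i.e. the unbiased S_i^2
    sigma_hat = sum(ci**2 * variance(y) / len(y) for ci, y in zip(c, groups))
    return psi_hat, sigma_hat
```

For example, `contrast_estimate([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [1, -0.5, -0.5])` compares the first group with the average of the other two.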

Because $\hat\Sigma$ depends on the sample variances $(S_1^2, \ldots, S_g^2)$, the exact distribution of $T$ is fairly complicated. Accordingly, it is of practical interest to consider feasible approximations. Assume that $X = \sum_{i=1}^{g} a_i X_i$ is a positive linear combination, with $a_i > 0$, where $X_i \sim \chi^2(f_i)$ are independent chi-square random variables with $f_i$ degrees of freedom for $i = 1, \ldots, g$. Using a chi-square approximation to the distribution of $X$, Rossi (1975) and Welch (1947) showed that

$$X \mathrel{\dot\sim} (\xi/\nu)\,\chi^2(\nu),$$

where $\xi = \sum_{i=1}^{g} a_i f_i$ and $\nu = \left(\sum_{i=1}^{g} a_i f_i\right)^2 \big/ \sum_{i=1}^{g} a_i^2 f_i$. It is well known that the sample variances $S_i^2$ are distributed independently of each other and $(N_i - 1)S_i^2/\sigma_i^2 \sim \chi^2(N_i - 1)$ for $i = 1, \ldots, g$. Hence, $\hat\Sigma$ has the approximate distribution

$$\hat\Sigma \mathrel{\dot\sim} (\Sigma/\nu)\,\chi^2(\nu), \quad (5)$$

where $\Sigma = \sum_{i=1}^{g} c_i^2 \sigma_i^2/N_i$ and

$$\nu = \left(\sum_{i=1}^{g} \frac{c_i^2 \sigma_i^2}{N_i}\right)^2 \Bigg/ \sum_{i=1}^{g} \frac{c_i^4 \sigma_i^4}{N_i^2 (N_i - 1)}.$$

It readily follows from Equations 2 and 5 that the quantity $T$ given in Equation 4 has a convenient approximate distribution

$$T \mathrel{\dot\sim} t(\nu),$$

where $t(\nu)$ is a t distribution with $\nu$ degrees of freedom. For inferential purposes, $\nu$ is replaced by its counterpart $\hat\nu$, obtained by directly substituting $(S_1^2, \ldots, S_g^2)$ for $(\sigma_1^2, \ldots, \sigma_g^2)$ in $\nu$:

$$\hat\nu = \left(\sum_{i=1}^{g} \frac{c_i^2 S_i^2}{N_i}\right)^2 \Bigg/ \sum_{i=1}^{g} \frac{c_i^4 S_i^4}{N_i^2 (N_i - 1)}. \quad (6)$$

Thus, the adjustment gives the following modified distribution:

$$T \mathrel{\dot\sim} t(\hat\nu). \quad (7)$$
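The degrees-of-freedom formulas for $\nu$ and $\hat\nu$ share one form and differ only in whether population or sample variances are supplied. A minimal Python sketch follows; `welch_df` is a hypothetical helper, not part of the authors' SAS/IML programs.

```python
def welch_df(c, var, n):
    """Approximate degrees of freedom for a contrast (the nu of Equation 5 / nu_hat of Equation 6).

    Pass population variances sigma_i^2 to obtain nu, or sample variances S_i^2 to obtain nu_hat.
    """
    terms = [ci**2 * v / ni for ci, v, ni in zip(c, var, n)]        # c_i^2 var_i / N_i
    # numerator: (sum of terms)^2; denominator: sum of terms^2 / (N_i - 1)
    return sum(terms) ** 2 / sum(tm**2 / (ni - 1) for tm, ni in zip(terms, n))
```

With two groups and $c = (1, -1)$ the expression reduces to the familiar Welch–Satterthwaite degrees of freedom; with equal variances and $N_1 = N_2 = N$ it equals $2(N - 1)$.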

Accordingly, a $100(1 - \alpha)\%$ approximate two-sided confidence interval $(L, U)$ for the contrast effect $\psi$ can be constructed from Equation 7, where

$$L = \hat\psi - t_{\hat\nu,\alpha/2}\,\hat\Sigma^{1/2}, \qquad U = \hat\psi + t_{\hat\nu,\alpha/2}\,\hat\Sigma^{1/2},$$

and $t_{\hat\nu,\alpha/2}$ is the upper $100(\alpha/2)$ percentile of the t distribution $t(\hat\nu)$. For ease of presentation, the half width of the $100(1 - \alpha)\%$ two-sided confidence interval $(L, U)$ is denoted by

$$H = t_{\hat\nu,\alpha/2}\,\hat\Sigma^{1/2}.$$

Clearly, the actual half width $H$ depends on the confidence coefficient $1 - \alpha$, the sample sizes $(N_1, \ldots, N_g)$, and the variance estimates $(S_1^2, \ldots, S_g^2)$.
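The interval $(L, U)$ and its half width $H$ can be computed directly from the group data. The sketch below assumes SciPy is available for the t quantile (`scipy.stats.t.ppf`); `welch_contrast_ci` is a hypothetical name, and the code is an illustration rather than the authors' program.

```python
from statistics import fmean, variance

from scipy import stats


def welch_contrast_ci(groups, c, alpha=0.05):
    """Approximate 100(1 - alpha)% interval (L, U) and half width H based on Equation 7."""
    n = [len(y) for y in groups]
    terms = [ci**2 * variance(y) / ni for ci, y, ni in zip(c, groups, n)]   # c_i^2 S_i^2 / N_i
    sigma_hat = sum(terms)                                                  # Equation 3
    nu_hat = sigma_hat**2 / sum(tm**2 / (ni - 1) for tm, ni in zip(terms, n))  # Equation 6
    psi_hat = sum(ci * fmean(y) for ci, y in zip(c, groups))
    h = stats.t.ppf(1 - alpha / 2, nu_hat) * sigma_hat**0.5                 # half width H
    return psi_hat - h, psi_hat + h, h
```

Because $\hat\nu$ is estimated from the data, the critical value itself varies from sample to sample, which is why $H$ is random even for fixed sample sizes.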


In planning a research design, it is desirable to determine the sample sizes required to achieve the designated precision properties of a confidence interval. Two useful principles concern the control of the expected half width and the tolerance probability of the half width being within a preassigned value. Specifically, the first requires determining the sample sizes such that the expected half width of a $100(1 - \alpha)\%$ confidence interval is within the given bound

$$E[H] \le \delta, \quad (8)$$

where the expectation $E[H]$ is taken with respect to the joint distribution of $(S_1^2, \ldots, S_g^2)$, and $\delta\ (>0)$ is a constant. Alternatively, one may compute the sample sizes needed to guarantee, with a given tolerance probability, that the half width of a $100(1 - \alpha)\%$ confidence interval will not exceed the planned value:

$$P\{H \le \omega\} \ge 1 - \gamma, \quad (9)$$

where $1 - \gamma$ is the specified tolerance level and $\omega\ (>0)$ is a constant.
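To make the two criteria in Equations 8 and 9 concrete, the following brute-force Monte Carlo sketch estimates $E[H]$ and $P\{H \le \omega\}$ for fixed sample sizes by drawing $S_i^2 = \sigma_i^2\chi^2(N_i - 1)/(N_i - 1)$ directly, which is valid under the normal model in Equation 1. It is a plain simulation, not the exact evaluation described next; NumPy and SciPy are assumed, and `half_width_precision` is a hypothetical name.

```python
import numpy as np
from scipy import stats


def half_width_precision(c, sigma2, n, omega, alpha=0.05, reps=100_000, seed=0):
    """Brute-force Monte Carlo estimates of E[H] and P{H <= omega} for sizes n."""
    rng = np.random.default_rng(seed)
    c, sigma2, n = (np.asarray(x, dtype=float) for x in (c, sigma2, n))
    df = n - 1
    # reps x g matrix of sample variances: S_i^2 = sigma_i^2 * chi2(N_i - 1) / (N_i - 1)
    s2 = sigma2 * rng.chisquare(df, size=(reps, len(n))) / df
    terms = c**2 * s2 / n                                   # c_i^2 S_i^2 / N_i
    sigma_hat = terms.sum(axis=1)                           # Equation 3
    nu_hat = sigma_hat**2 / (terms**2 / df).sum(axis=1)     # Equation 6
    h = stats.t.ppf(1 - alpha / 2, nu_hat) * np.sqrt(sigma_hat)
    return h.mean(), np.mean(h <= omega)
```

For instance, `half_width_precision([1, -1/3, -1/3, -1/3], [1, 4, 9, 16], [9, 18, 27, 36], omega=1.0)` evaluates one of the configurations used in the numerical study below.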

Given the complicated nature of the variance estimator $\hat\Sigma$, it may be tempting to adopt a simplified approach to computing the expected half width $E[H]$ and the tolerance probability $P\{H \le \omega\}$ by employing the approximate distribution of $\hat\Sigma$ given in Equation 5 together with the straightforward simplification $t_{\hat\nu,\alpha/2} = t_{\nu,\alpha/2}$.

However, an exact approach is considered here to provide more accurate results. For ease of explication, a detailed description of an alternative formulation for $H$ is presented in Appendix A. With the distributional properties presented in Appendix A for $K$, $\hat\nu$, and $W$, the evaluation of the expected half width $E[H]$ in Equation 8 can be simplified as

$$E[H] = E_K[K^{1/2}]\;E_B[t_{\hat\nu,\alpha/2}W^{1/2}]. \quad (10)$$

The expectation $E_K[K^{1/2}]$ is taken with respect to the distribution of $K$, and it follows from the standard result for a chi-square distribution with $N_T - g$ degrees of freedom, where $N_T = N_1 + \cdots + N_g$ is the total sample size, that $E_K[K^{1/2}] = 2^{1/2}\,\Gamma\{(N_T - g + 1)/2\}/\Gamma\{(N_T - g)/2\}$. On the other hand, the expectation $E_B[t_{\hat\nu,\alpha/2}W^{1/2}]$ is taken with respect to the joint distribution of $(B_1, \ldots, B_{g-1})$ and does not permit a closed-form expression. Since pseudo-random beta variate generators are generally available in major statistical software packages, a Monte Carlo integration approach is used to evaluate $E_B[t_{\hat\nu,\alpha/2}W^{1/2}]$. Similarly, for analytic clarity and computational ease, the probability $P\{H \le \omega\}$ given in Equation 9 is expressed as

$$P\{H \le \omega\} = E_B\!\left[F_K\{\omega^2/(t^2_{\hat\nu,\alpha/2}W)\}\right], \quad (11)$$

where $F_K\{\cdot\}$ is the cumulative distribution function of $K \sim \chi^2(N_T - g)$. Note that the chi-square cumulative distribution function and pseudo-random beta number generators are readily available in standard software systems. As in the case of the expected half width, Monte Carlo integration is used to perform the required computation of the tolerance probability $P\{H \le \omega\}$ in Equation 11 with current computing capabilities.
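Appendix A is not reproduced here, so the sketch below rests on an assumed decomposition that is consistent with the factorization in Equation 10: with $V_i = (N_i - 1)S_i^2/\sigma_i^2$, take $K = \sum_i V_i \sim \chi^2(N_T - g)$ and the proportions $V_i/K$, which are Dirichlet distributed and independent of $K$ (a Dirichlet draw is equivalent to a sequence of beta draws). Then $\hat\Sigma = KW$ and $\hat\nu$ depends on the proportions only, so $E_K[K^{1/2}]$ is available in closed form and only the remaining expectation is averaged by Monte Carlo. The authors' Appendix A construction may differ in detail; `exact_precision` is a hypothetical name, and NumPy and SciPy are assumed.

```python
import math

import numpy as np
from scipy import stats


def exact_precision(c, sigma2, n, omega, alpha=0.05, reps=100_000, seed=1):
    """E[H] (Equation 10) and P{H <= omega} (Equation 11) under an ASSUMED decomposition.

    Assumption (not taken from Appendix A): V_i = (N_i - 1) S_i^2 / sigma_i^2,
    K = sum(V_i) ~ chi2(N_T - g), P = V / K ~ Dirichlet((N_i - 1) / 2), independent of K.
    Then Sigma_hat = K * W and nu_hat depends on P only. Requires every N_i >= 2.
    """
    rng = np.random.default_rng(seed)
    c, sigma2, n = (np.asarray(x, dtype=float) for x in (c, sigma2, n))
    df = n - 1
    k_df = df.sum()                                              # N_T - g
    # Closed form: E_K[K^{1/2}] = 2^{1/2} Gamma{(N_T - g + 1)/2} / Gamma{(N_T - g)/2}
    e_sqrt_k = math.sqrt(2.0) * math.exp(math.lgamma((k_df + 1) / 2) - math.lgamma(k_df / 2))
    p = rng.dirichlet(df / 2, size=reps)                         # Monte Carlo draws of V_i / K
    w_terms = c**2 * sigma2 * p / (n * df)                       # Sigma_hat = K * sum(w_terms)
    w = w_terms.sum(axis=1)
    nu_hat = w**2 / (w_terms**2 / df).sum(axis=1)                # Equation 6; K cancels out
    t_crit = stats.t.ppf(1 - alpha / 2, nu_hat)
    expected_h = e_sqrt_k * np.mean(t_crit * np.sqrt(w))                        # Equation 10
    tol_prob = np.mean(stats.chi2.cdf(omega**2 / (t_crit**2 * w), k_df))        # Equation 11
    return expected_h, tol_prob
```

Compared with simulating $S_i^2$ directly, this form removes the randomness of $K$ from the simulation, which typically reduces Monte Carlo error for the same number of replicates.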

Because several combinations of sample sizes $(N_1, \ldots, N_g)$ may satisfy the chosen precision criterion, it is useful to consider a design with a priori designated sample size ratios, which leads to a unique and optimal result. For ease of illustration, the sample size ratios $(r_1, \ldots, r_g)$ are specified in advance with $r_i = N_i/N_1$; consequently, the group allocation ratios are $q_i = N_i/N_T = r_i/\sum_{j=1}^{g} r_j$ for $i = 1, \ldots, g$. Thus, the process reduces to finding the minimum sample size $N_1$ (with $N_i = N_1 r_i$, $i = 2, \ldots, g$) required to achieve the selected precision level, using the computational formulas for the expected half width and the tolerance probability in Equations 10 and 11, respectively. Specifically, the sample sizes $(N_{EW1}, \ldots, N_{EWg})$ needed for the expected half width of a $100(1 - \alpha)\%$ two-sided confidence interval $(L, U)$ to fall within the designated bound $\delta$ are the minimum integers $(N_1, \ldots, N_g) = N_1(r_1, \ldots, r_g)$ such that $E[H] \le \delta$. On the other hand, the sample sizes $(N_{TP1}, \ldots, N_{TPg})$ required to guarantee, with a given tolerance probability $1 - \gamma$, that the half width of a $100(1 - \alpha)\%$ two-sided confidence interval $(L, U)$ will not exceed the planned range $\omega$ are the smallest integers $(N_1, \ldots, N_g) = N_1(r_1, \ldots, r_g)$ such that $P\{H \le \omega\} \ge 1 - \gamma$.
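The search over $N_1$ with fixed ratios can be sketched as a simple increasing loop. In this Python illustration, sample sizes are taken as integer multiples $m(r_1, \ldots, r_g)$ of the ratio vector (an assumption that matches how ratios such as (4, 3, 2, 1) are used in the examples), and precision is evaluated by brute-force simulation of $H$ rather than by Equations 10 and 11. The same random seed is reused at every step so that successive evaluations share common random numbers. All function names are hypothetical, and this is not the authors' SAS/IML program.

```python
import numpy as np
from scipy import stats


def mc_half_width(c, sigma2, n, alpha=0.05, reps=20_000, seed=0):
    """Simulated half widths H = t_{nu_hat, alpha/2} * Sigma_hat^{1/2} for sizes n."""
    rng = np.random.default_rng(seed)            # same seed each call: common random numbers
    c, sigma2, n = (np.asarray(x, dtype=float) for x in (c, sigma2, n))
    df = n - 1
    terms = c**2 * sigma2 * rng.chisquare(df, size=(reps, len(n))) / (df * n)  # c_i^2 S_i^2 / N_i
    sigma_hat = terms.sum(axis=1)                                   # Equation 3
    nu_hat = sigma_hat**2 / (terms**2 / df).sum(axis=1)             # Equation 6
    return stats.t.ppf(1 - alpha / 2, nu_hat) * np.sqrt(sigma_hat)


def min_sample_sizes(ratios, meets_target, max_multiplier=1000):
    """Smallest N = m * ratios (m = 1, 2, ...) for which meets_target(N) is True."""
    ratios = np.asarray(ratios, dtype=int)
    for m in range(1, max_multiplier + 1):
        n = m * ratios
        if n.min() < 2:                          # S_i^2 needs N_i >= 2
            continue
        if meets_target(n):
            return n
    raise ValueError("no sample sizes found up to max_multiplier")


c, sigma2, ratios = [1, -1/3, -1/3, -1/3], [1, 4, 9, 16], [1, 2, 3, 4]
delta = omega = 1.0
n_ew = min_sample_sizes(ratios, lambda n: mc_half_width(c, sigma2, n).mean() <= delta)             # Eq. 8
n_tp = min_sample_sizes(ratios, lambda n: np.mean(mc_half_width(c, sigma2, n) <= omega) >= 0.90)  # Eq. 9
print("expected half width:", n_ew, "tolerance probability:", n_tp)
```

Because the criteria are estimated by simulation, a sample size that sits right at the boundary of the target may shift by one step with a different seed or number of replicates.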

The determination of optimal sample sizes involves iterative algorithms that are not readily available in standard statistical packages and, therefore, requires a special-purpose computer program for the necessary computations. To enhance the applicability of these sample size techniques, supplementary SAS/IML (SAS Institute, 2011) computer programs are developed to perform the extensive calculations. Moreover, a detailed simulation study is performed next to evaluate the accuracy of the suggested sample size procedures under a variety of model configurations.

Empirical Assessments of Sample Size Calculations for Individual Comparisons

Because of the theoretical complexity of the suggested methodology for precise interval estimation of contrasts under heteroscedastic ANOVA settings, the features and performance of the sample size procedures need to be delineated and examined through numerical investigations. Specifically, the empirical examination was conducted in two stages. The first stage presented sample size calculations for the two precision measures of expected half width and tolerance probability under several model configurations. Then, Monte Carlo simulation was performed to demonstrate the precision behavior of the recommended sample size formulas under the design characteristics specified in the first stage.

Note that determining the sample sizes needed for the chosen precision of the confidence interval procedure requires detailed specification of the confidence level, the magnitudes of the variance components, the contrast coefficients, and the sample size ratios. For illustration, the bounds for the expected half-width criterion are chosen as $\delta = 1$ and 2, and for the tolerance probability criterion the tolerance probability and interval half bound are set to $1 - \gamma = 0.90$ and $\omega = 1$ and 2, respectively. The confidence level is fixed at $1 - \alpha = 0.95$ throughout this numerical study. Moreover, we focus on the situation of $g = 4$ with the heterogeneous variances $(\sigma_1^2, \sigma_2^2, \sigma_3^2, \sigma_4^2) = (1, 4, 9, 16)$ and the contrast coefficients $(c_1, c_2, c_3, c_4) = (1, -1/3, -1/3, -1/3)$. To represent balanced and unbalanced patterns, three different settings of sample size ratios are considered: $(r_1, r_2, r_3, r_4) = (1, 2, 3, 4)$, $(1, 1, 1, 1)$, and $(4, 3, 2, 1)$. The designated frameworks basically follow those in Jan and Shieh (2014) and Tomarken and Serlin (1986), with some modifications for the purpose of interval estimation rather than hypothesis testing. More importantly, the combined configurations were chosen to give a wide range of sample size settings, so that they not only provide practically useful implications but also serve as a benchmark to demonstrate the robustness of the proposed sample size procedures. Accordingly, the necessary sample sizes $(N_{EW1}, N_{EW2}, N_{EW3}, N_{EW4})$ and $(N_{TP1}, N_{TP2}, N_{TP3}, N_{TP4})$ are computed with respect to the selected precision requirements of expected half width and tolerance probability, respectively. The resulting sample sizes are presented in Table 1 for all six joint model configurations of two interval half bounds and three sample size ratio settings.

In particular, when $\delta = \omega = 1$, the computed sample sizes under the expected half-width criterion are $(N_{EW1}, N_{EW2}, N_{EW3}, N_{EW4}) = (9, 18, 27, 36)$, $(17, 17, 17, 17)$, and $(48, 36, 24, 12)$ for the three sample size ratio structures $(r_1, r_2, r_3, r_4) = (1, 2, 3, 4)$, $(1, 1, 1, 1)$, and $(4, 3, 2, 1)$, respectively. Alternatively, the corresponding sample sizes associated with the tolerance probability criterion are $(N_{TP1}, N_{TP2}, N_{TP3}, N_{TP4}) = (12, 24, 36, 48)$, $(21, 21, 21, 21)$, and $(64, 48, 32, 16)$ for the three sets of sample size ratios, respectively. From a practical standpoint, the total sample size $N_T$ of the balanced structure is smaller than those of the unbalanced structures for both types of interval precision. Conversely, the inverse pairing of the heterogeneous variances $(\sigma_1^2, \sigma_2^2, \sigma_3^2, \sigma_4^2) = (1, 4, 9, 16)$ with the sample size ratios $(r_1, r_2, r_3, r_4) = (4, 3, 2, 1)$ incurs the largest total sample size. As expected, the same phenomenon persists for the larger interval half bounds $\delta = \omega = 2$, although the required sample sizes for $\delta = \omega = 2$ are comparatively smaller than those for $\delta = \omega = 1$ under both precision principles. Specifically, the reported sample sizes when $\delta = \omega = 2$ are $(N_{EW1}, N_{EW2}, N_{EW3}, N_{EW4}) = (4, 8, 12, 16)$, $(5, 5, 5, 5)$, and $(16, 12, 8, 4)$, and $(N_{TP1}, N_{TP2}, N_{TP3}, N_{TP4}) = (5, 10, 15, 20)$, $(7, 7, 7, 7)$, and $(24, 18, 12, 6)$ for the three distinct sample size ratio setups. Moreover, it is prudent to note that the two precision criteria impose distinct precision characteristics on the confidence intervals of linear contrasts and lead to fundamentally different magnitudes of the required sample sizes. According to the numerical assessment, meeting the tolerance probability criterion often requires a larger sample size than controlling a designated expected half width. The pattern of results between the two precision principles is similar to those reported in Kupper and Hafner (1989) and Shieh and Jan (2012). In the process of sample size calculations, the precision levels attained with the reported sample sizes $(N_{EW1}, N_{EW2}, N_{EW3}, N_{EW4})$ and $(N_{TP1}, N_{TP2}, N_{TP3}, N_{TP4})$ should be less than the target interval half bound and greater than the target tolerance probability, respectively. The actually achieved values of the exact expected half width $E[H]$ and tolerance probability $P\{H \le \omega\}$ are also summarized in Table 1. The differences between the attained precision and the nominal value are due to the integer constraint on sample sizes and the designated sample size allocation ratio.

TABLE 1
Computed Sample Size, Expected Half Width, and Tolerance Probability of the Proposed Approaches for 95% Two-Sided Confidence Interval of Contrast $\psi$ With Interval Half Bound $\delta = \omega = 1$ and 2, and Tolerance Probability $1 - \gamma = 0.90$, When $(\sigma_1^2, \sigma_2^2, \sigma_3^2, \sigma_4^2) = (1, 4, 9, 16)$ and $(c_1, c_2, c_3, c_4) = (1, -1/3, -1/3, -1/3)$

| Bound | EW sample sizes | Simulated E[H] | Attained E[H] | Relative error (%) | TP sample sizes | Simulated P{H < ω} | Attained P{H < ω} | Relative error (%) |
|---|---|---|---|---|---|---|---|---|
| δ = ω = 1 | (9, 18, 27, 36) | 0.9582 | 0.9573 | 0.0891 | (12, 24, 36, 48) | 0.9542 | 0.9539 | 0.0271 |
| | (17, 17, 17, 17) | 0.9968 | 0.9968 | 0.0007 | (21, 21, 21, 21) | 0.9133 | 0.9125 | 0.0885 |
| | (48, 36, 24, 12) | 0.9625 | 0.9633 | 0.0897 | (64, 48, 32, 16) | 0.9365 | 0.9377 | 0.1257 |
| δ = ω = 2 | (3, 6, 9, 12) | 1.9100 | 1.9074 | 0.1359 | (5, 10, 15, 20) | 0.9649 | 0.9706 | 0.5928 |
| | (5, 5, 5, 5) | 1.9984 | 1.9967 | 0.0826 | (7, 7, 7, 7) | 0.9171 | 0.9199 | 0.3090 |
| | (16, 12, 8, 4) | 1.9086 | 1.9102 | 0.0834 | (24, 18, 12, 6) | 0.9251 | 0.9283 | 0.3479 |

EW = expected half-width criterion; TP = tolerance probability criterion.

As mentioned earlier, one may attempt to simplify the distribution of the interval half width as $H \mathrel{\dot\sim} (t_{\nu,\alpha/2}\Sigma^{1/2}/\nu^{1/2})\{\chi^2(\nu)\}^{1/2}$, where $\nu$ is given in Equation 5. Consequently, the simple approximation gives

$$E[H] \doteq t_{\nu,\alpha/2}\,\Sigma^{1/2}\,\frac{2^{1/2}\,\Gamma\{(\nu + 1)/2\}}{\nu^{1/2}\,\Gamma(\nu/2)} \quad (12)$$

and

$$P\{H < \omega\} \doteq F_{K^*}\!\left\{\frac{\nu\,\omega^2}{t^2_{\nu,\alpha/2}\,\Sigma}\right\}, \quad (13)$$

where $F_{K^*}\{\cdot\}$ is the cumulative distribution function of $K^* \sim \chi^2(\nu)$. The two expressions in Equations 12 and 13 provide alternative formulas for computing the optimal sample sizes for precise interval estimation of contrast effects. For the prescribed design configurations along with the three sample size ratio settings, the computed sample sizes are summarized in Table 2, where the resulting sample sizes $(N_{EW1}, N_{EW2}, N_{EW3}, N_{EW4})$ for the expected half-width criterion are (9, 18, 27, 36), (18, 18, 18, 18), and (44, 33, 22, 11) for $\delta = 1$, and (4, 8, 12, 16), (7, 7, 7, 7), and (20, 15, 10, 5) for $\delta = 2$. Also, the required sample sizes $(N_{TP1}, N_{TP2}, N_{TP3}, N_{TP4})$ associated with the tolerance probability are (11, 22, 33, 44), (22, 22, 22, 22), and (56, 42, 28, 14) for $\omega = 1$, and (5, 10, 15, 20), (10, 10, 10, 10), and (24, 18, 12, 6) for $\omega = 2$. Although the simplified method gives results identical to those of the proposed procedure for two of the six sets of sample sizes under both precision criteria, the two formulations generally behave differently, and the calculated sample sizes can differ substantially in some cases. The precision levels attained under the approximate expected half width and approximate tolerance probability computed with Equations 12 and 13 are also presented in Table 2. More importantly, these approximate precision values differ from the exact expected half width and exact tolerance probability calculated with the recommended Equations 10 and 11, respectively. The adequacy and discrepancy of the competing techniques are further evaluated through the following Monte Carlo simulation.

TABLE 2
Computed Sample Size, Expected Half Width, and Tolerance Probability of the Simplified Methods for 95% Two-Sided Confidence Interval of Contrast $\psi$ With Interval Half Bound $\delta = \omega = 1$ and 2, and Tolerance Probability $1 - \gamma = 0.90$, When $(\sigma_1^2, \sigma_2^2, \sigma_3^2, \sigma_4^2) = (1, 4, 9, 16)$ and $(c_1, c_2, c_3, c_4) = (1, -1/3, -1/3, -1/3)$

| Bound | EW sample sizes | Simulated E[H] | Attained E[H] | Relative error (%) | TP sample sizes | Simulated P{H < ω} | Attained P{H < ω} | Relative error (%) |
|---|---|---|---|---|---|---|---|---|
| δ = ω = 1 | (8, 16, 24, 32) | 1.0189 | 0.9846 | 3.3616 | (9, 18, 27, 36) | 0.6565 | 0.9620 | 46.5323 |
| | (17, 17, 17, 17) | 0.9955 | 0.9790 | 1.6562 | (18, 18, 18, 18) | 0.6521 | 0.9437 | 44.7155 |
| | (44, 33, 22, 11) | 1.0128 | 0.9719 | 4.0328 | (48, 36, 24, 12) | 0.6232 | 0.9396 | 50.7727 |
| δ = ω = 2 | (3, 6, 9, 12) | 1.9174 | 1.6736 | 12.7130 | (4, 8, 12, 16) | 0.8605 | 0.9999 | 16.1962 |
| | (5, 5, 5, 5) | 2.0046 | 1.8581 | 7.3077 | (6, 6, 6, 6) | 0.7710 | 0.9655 | 25.2324 |
| | (16, 12, 8, 4) | 1.9018 | 1.6870 | 11.2959 | (20, 15, 10, 5) | 0.8054 | 0.9967 | 23.7511 |

EW = expected half-width criterion; TP = tolerance probability criterion.
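For comparison, the simplified closed-form approximations in Equations 12 and 13 are cheap to evaluate, because they require only $\Sigma$, $\nu$, a t quantile, and a chi-square distribution function. A minimal sketch, assuming SciPy; `simplified_precision` is a hypothetical name.

```python
import math

from scipy import stats


def simplified_precision(c, sigma2, n, omega, alpha=0.05):
    """Closed-form approximations of Equations 12 and 13.

    Uses Sigma and nu from Equation 5 (population variances) and the
    simplification t_{nu_hat, alpha/2} = t_{nu, alpha/2}.
    """
    terms = [ci**2 * s2 / ni for ci, s2, ni in zip(c, sigma2, n)]
    sigma = sum(terms)
    nu = sigma**2 / sum(tm**2 / (ni - 1) for tm, ni in zip(terms, n))
    t_crit = stats.t.ppf(1 - alpha / 2, nu)
    # E[chi2(nu)^{1/2}] = 2^{1/2} Gamma{(nu+1)/2} / Gamma(nu/2), computed via lgamma for stability
    ratio = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2))
    expected_h = t_crit * math.sqrt(sigma) * math.sqrt(2) * ratio / math.sqrt(nu)  # Equation 12
    tol_prob = stats.chi2.cdf(nu * omega**2 / (t_crit**2 * sigma), nu)            # Equation 13
    return expected_h, tol_prob
```

The approximation ignores the sampling variability of $\hat\nu$, which is the source of the discrepancies discussed in the simulation results.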

With the designated configurations and the respective sample sizes for the exact and approximate methods listed in Tables 1 and 2, estimates of the true expected interval half width and tolerance probability are computed through Monte Carlo simulation of 10,000 independent data sets. For each replicate, the confidence limits and the corresponding interval half width of the two-sided 95% confidence interval of the linear contrast are calculated. The simulated expected half width is then the mean of the 10,000 replicate interval half widths, whereas the simulated tolerance probability is the proportion of the 10,000 replicates whose interval half widths are less than or equal to the specified bound. The adequacy of the sample size procedure for precise interval estimation is assessed by one of the following formulas: relative error = (simulated expected half width − attained expected half width)/simulated expected half width, or relative error = (simulated tolerance probability − attained tolerance probability)/simulated tolerance probability. Both the simulated values of expected half width and tolerance probability and the corresponding percentage relative errors are summarized in Tables 1 and 2. According to the numerical results in Table 1, the precision performance of the proposed sample size procedures stays close to the nominal levels. Specifically, all six absolute relative errors of the expected half width are less than 1%, and the absolute relative errors of the tolerance probability have a maximum of 0.5928%. Thus, the proposed sample size procedures perform well for the range of model specifications considered here. However, the discrepancies between the simulated and approximate half widths in Table 2 indicate that the simplified method is not sufficiently accurate, because the resulting relative errors range from 1.6562% to 12.7130%. Moreover, the corresponding tolerance probability results are also unsatisfactory, with relative errors ranging widely from 16.1962% to 50.7727%. In view of these numerical evaluations, we conclude that the proposed procedures outperform the simplified methods in sample size calculations for precise interval estimation of contrast effects.
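The simulation design described above can be sketched as follows: raw normal observations are generated for each group, the Welch-type half width is computed for every replicate, and the simulated expected half width and tolerance probability can then be compared with an attained value through the relative error (reported in absolute value, as in Tables 1 and 2). The group means are set to zero because $H$ does not depend on them; NumPy and SciPy are assumed, and the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def simulate_intervals(c, sigma2, n, omega, alpha=0.05, reps=10_000, seed=2014):
    """Simulated E[H] and P{H <= omega} from raw normal data (means set to 0; H ignores them)."""
    rng = np.random.default_rng(seed)
    c, sigma2, sizes = (np.asarray(x, dtype=float) for x in (c, sigma2, n))
    # reps x g matrix of sample variances S_i^2 computed from simulated observations
    s2 = np.column_stack([
        rng.normal(0.0, np.sqrt(v), size=(reps, int(ni))).var(axis=1, ddof=1)
        for v, ni in zip(sigma2, sizes)
    ])
    terms = c**2 * s2 / sizes
    sigma_hat = terms.sum(axis=1)
    nu_hat = sigma_hat**2 / (terms**2 / (sizes - 1)).sum(axis=1)
    h = stats.t.ppf(1 - alpha / 2, nu_hat) * np.sqrt(sigma_hat)   # half width of each interval
    return h.mean(), np.mean(h <= omega)


def relative_error_pct(simulated, attained):
    """Absolute relative error in percent: |simulated - attained| / simulated * 100."""
    return abs(simulated - attained) / simulated * 100
```

Using raw observations rather than drawing $S_i^2$ from the chi-square law keeps the check independent of the distributional results used to derive the sample sizes.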

Multiple Comparisons of Means

The methodology for individual comparisons of means is extended in this section to the general context of multiple comparisons involving a family of linear contrasts. Suppose that $L$ linear contrasts of the means are to be estimated, denoted by

$$\psi_l = \sum_{i=1}^{g} c_{li}\mu_i, \qquad \hat\psi_l = \sum_{i=1}^{g} c_{li}\bar{Y}_i,$$

where $c_{li}$ are the contrast coefficients with $\sum_{i=1}^{g} c_{li} = 0$ for $l = 1, \ldots, L$. To control the overall confidence level for the family of simultaneous confidence intervals $\{(L_l, U_l), l = 1, \ldots, L\}$, the construction of proper procedures is more involved than in the case of an individual interval estimate. Notably, several distinct and useful methods have been studied in Brown and Forsythe (1974), Dunnett (1980), Games and Howell (1976), Tamhane (1977), and Ury and Wiggins (1971). In particular, six procedures are considered here for their distinctive features and desirable properties. The approach of Brown and Forsythe is applicable when the researcher is interested in multiple comparisons of complex linear contrasts, whereas the other five methods are appropriate only for multiple comparisons of pairwise mean differences.

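The variance estimator and corresponding distribution for an individual contrast are modified as $\hat\Sigma_l = \sum_{i=1}^{g} c_{li}^2 S_i^2/N_i$ and $\hat\Sigma_l \mathrel{\dot\sim} (\Sigma_l/\nu_l)\,\chi^2(\nu_l)$, where $\Sigma_l = \sum_{i=1}^{g} c_{li}^2 \sigma_i^2/N_i$ and

$$\nu_l = \left(\sum_{i=1}^{g} \frac{c_{li}^2 \sigma_i^2}{N_i}\right)^2 \Bigg/ \sum_{i=1}^{g} \frac{c_{li}^4 \sigma_i^4}{N_i^2 (N_i - 1)}$$

for $l = 1, \ldots, L$. The resulting confidence interval $(L_l, U_l)$ of $\psi_l$ is of the form

$$L_l = \hat\psi_l - Q_{\hat\nu_l,\alpha}\,\hat\Sigma_l^{1/2} \quad \text{and} \quad U_l = \hat\psi_l + Q_{\hat\nu_l,\alpha}\,\hat\Sigma_l^{1/2},$$

where the critical value $Q_{\hat\nu_l,\alpha}$ is suitably chosen to approximate the desired joint confidence level $1 - \alpha$, and the estimated degrees of freedom are

$$\hat\nu_l = \left(\sum_{i=1}^{g} \frac{c_{li}^2 S_i^2}{N_i}\right)^2 \Bigg/ \sum_{i=1}^{g} \frac{c_{li}^4 S_i^4}{N_i^2 (N_i - 1)}$$

for $l = 1, \ldots, L$. The half width of the two-sided confidence interval $(L_l, U_l)$ is denoted by

$$H_l = Q_{\hat\nu_l,\alpha}\,\hat\Sigma_l^{1/2}.$$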

In the particular case of pairwise multiple comparisons, the contrast coefficients are all zero except that $c_{li} = 1$ and $c_{li'} = -1$, $1 \le i < i' \le g$, for $l = 1, \ldots, L = g(g - 1)/2$. Thus, the expressions for $\hat\Sigma_l$ and $\hat\nu_l$ simplify to $\hat\Sigma_l = S_i^2/N_i + S_{i'}^2/N_{i'}$ and $\hat\nu_l = \{S_i^2/N_i + S_{i'}^2/N_{i'}\}^2 / \{S_i^4/[N_i^2(N_i - 1)] + S_{i'}^4/[N_{i'}^2(N_{i'} - 1)]\}$, respectively.
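The pairwise simplification above can be coded directly. This standard-library Python sketch returns $\hat\Sigma_l$ and $\hat\nu_l$ for all $L = g(g - 1)/2$ pairs; `pairwise_terms` is a hypothetical name, and groups are labeled from 1 to match the text.

```python
from itertools import combinations


def pairwise_terms(s2, n):
    """For every pair i < i', return (i, i', Sigma_hat_l, nu_hat_l) from the pairwise formulas."""
    out = []
    for i, k in combinations(range(len(n)), 2):
        a, b = s2[i] / n[i], s2[k] / n[k]                  # S_i^2/N_i and S_i'^2/N_i'
        sigma_hat = a + b
        # S_i^4 / [N_i^2 (N_i - 1)] equals (S_i^2 / N_i)^2 / (N_i - 1)
        nu_hat = sigma_hat**2 / (a**2 / (n[i] - 1) + b**2 / (n[k] - 1))
        out.append((i + 1, k + 1, sigma_hat, nu_hat))      # 1-based group labels
    return out
```

With equal sample variances and equal group sizes $N$, each $\hat\nu_l$ reduces to $2(N - 1)$, the pooled two-sample value.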

The following six multiple comparison procedures for choosing $Q_{\hat\nu_l,\alpha}$ are considered:

1. The approximate confidence intervals of all contrasts proposed in Brown and Forsythe (1974) have

$$Q_{\hat{\nu}_l,\alpha} = \{(g-1)F_{g-1,\hat{\nu}_l,\alpha}\}^{1/2}, \tag{14}$$

where $F_{g-1,\hat{\nu}_l,\alpha}$ is the upper $\alpha$ quantile of an $F$ distribution with degrees of freedom $g - 1$ and $\hat{\nu}_l$.

2. The approximate confidence intervals for all pairwise differences suggested in Ury and Wiggins (1971) use

$$Q_{\hat{\nu}_l,\alpha} = t_{\hat{\nu}_l,\alpha_B} \quad\text{and}\quad \alpha_B = \alpha/(2L). \tag{15}$$

3. Games and Howell (1976) proposed the expression

$$Q_{\hat{\nu}_l,\alpha} = q_{g,\hat{\nu}_l,\alpha}/2^{1/2}, \tag{16}$$

where $q_{g,\hat{\nu}_l,\alpha}$ denotes the upper $\alpha$ quantile of the studentized range distribution with parameters $g$ and $\hat{\nu}_l$.

4. Tamhane (1977) employed the critical value

$$Q_{\hat{\nu}_l,\alpha} = t_{\hat{\nu}_l,\alpha_T} \quad\text{and}\quad \alpha_T = \{1 - (1-\alpha)^{1/L}\}/2. \tag{17}$$

5. Dunnett (1980) modified the notion of Cochran (1964) and suggested the designated quantity

$$Q_{\hat{\nu}_l,\alpha} = q^{*}_{g,\hat{\nu}_l,\alpha}/2^{1/2}, \tag{18}$$

where $q^{*}_{g,\hat{\nu}_l,\alpha} = \{q_{g,N_i-1,\alpha}S_i^2/N_i + q_{g,N_{i'}-1,\alpha}S_{i'}^2/N_{i'}\}/\{S_i^2/N_i + S_{i'}^2/N_{i'}\}$.

6. Dunnett (1980) also considered the alternative form

$$Q_{\hat{\nu}_l,\alpha} = m_{L,\hat{\nu}_l,\alpha}, \tag{19}$$

where $m_{L,\hat{\nu}_l,\alpha}$ denotes the upper $\alpha$ quantile of the studentized maximum modulus distribution with parameters $L$ and $\hat{\nu}_l$.
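The six critical values can be sketched in Python as follows. This assumes NumPy and SciPy 1.7 or later (for `scipy.stats.studentized_range`). We are not aware of a SciPy routine for the studentized maximum modulus, so Equation 19 is approximated here by Monte Carlo simulation; that is an assumption standing in for an exact routine such as SAS PROBMC. Function names are hypothetical.

```python
import numpy as np
from scipy import stats


def q_brown_forsythe(g, nu, alpha):                      # Equation 14
    return np.sqrt((g - 1) * stats.f.ppf(1 - alpha, g - 1, nu))


def q_ury_wiggins(L, nu, alpha):                         # Equation 15 (Bonferroni)
    return stats.t.ppf(1 - alpha / (2 * L), nu)


def q_games_howell(g, nu, alpha):                        # Equation 16
    return stats.studentized_range.ppf(1 - alpha, g, nu) / np.sqrt(2)


def q_tamhane(L, nu, alpha):                             # Equation 17 (Sidak-type)
    alpha_t = (1 - (1 - alpha) ** (1 / L)) / 2
    return stats.t.ppf(1 - alpha_t, nu)


def q_dunnett_c(g, alpha, s2_i, n_i, s2_j, n_j):         # Equation 18
    qi = stats.studentized_range.ppf(1 - alpha, g, n_i - 1)
    qj = stats.studentized_range.ppf(1 - alpha, g, n_j - 1)
    wi, wj = s2_i / n_i, s2_j / n_j                      # S_i^2/N_i and S_i'^2/N_i'
    return (qi * wi + qj * wj) / (wi + wj) / np.sqrt(2)


def q_dunnett_t3_mc(L, nu, alpha, reps=200_000, seed=1):  # Equation 19
    """Monte Carlo upper-alpha quantile of max_j |Z_j| / sqrt(U/nu),
    Z_1..Z_L iid N(0,1) independent of U ~ chi-square(nu)."""
    rng = np.random.default_rng(seed)
    z = np.abs(rng.standard_normal((reps, L))).max(axis=1)
    u = np.sqrt(rng.chisquare(nu, reps) / nu)
    return np.quantile(z / u, 1 - alpha)
```

With $g = 4$, $L = 6$, and $\hat{\nu}_l = 30$, these values fall in the order Brown and Forsythe, Ury and Wiggins, Tamhane, Games and Howell (largest to smallest), which is consistent with the ranking of required sample sizes in the empirical results.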

To enhance the applicability of these multiple comparison procedures, we apply the expected half-width and tolerance probability criteria to determine the required sample sizes for precise interval estimation of both general and pairwise contrasts. It is essential to note that the critical values given in Equations 14 through 19 vary across contrasts. By comparison, Pan and Kupper (1999) focused on multiple comparison methods under homogeneous variances and balanced designs, for which the critical values are identical for the whole family of contrasts. Consequently, the properties of the resulting interval half widths in Pan and Kupper (1999) are simpler than those in heteroscedastic and unbalanced settings. Because of the complex nature of the interval half widths, complete analytical assessments of the joint properties of expected half width and tolerance probability are not feasible. Alternative arguments and simplifications are therefore developed to circumvent these theoretical difficulties and permit useful applications.

In view of the critical implications of desired precision for the resulting confidence intervals, the notion of expected half width for sample size calculations is frequently introduced in standard texts. For multiple comparisons, it is necessary to determine the required sample sizes $\{N_{EW1}, \ldots, N_{EWg}\}$ such that all the expected half widths of the simultaneous confidence intervals $\{(L_l, U_l), l = 1, \ldots, L\}$ for contrast effects are within the given bounds, $E[H_l] \le \delta_l$, where $\delta_l\,(>0)$ are the designated bounds for $l = 1, \ldots, L$. This is equivalent to requiring that

$$\max_{1 \le l \le L}\{E[H_l]/\delta_l\} \le 1. \tag{20}$$

Unfortunately, the joint appraisal of the expected half-width ratios $E[H_l]/\delta_l$ cannot be carried out analytically. It is shown in Appendix B that the condition given in Equation 20 can be alternatively evaluated by

$$E[H_{l^*}] \le \delta_{l^*}, \tag{21}$$

where $l^*$ is the value of $l$ at which $\Delta_l = N_1^{1/2}S_l^{1/2}/\delta_l$ attains its maximum over $l = 1, \ldots, L$. Therefore, the alternative condition given in Equation 21 permits an enormous simplification of the joint appraisal of several bounded interval half widths, and the previous approach for computing the exact expected half width $E[H]$ in Equation 10 for an individual confidence interval can be readily applied to calculate the exact value $E[H_{l^*}]$. More importantly, with the initially specified sample size ratios $(r_1, \ldots, r_g)$, the sample sizes $(N_{EW1}, \ldots, N_{EWg}) = N_1(r_1, \ldots, r_g)$ required to ensure $E[H_l] \le \delta_l$, $l = 1, \ldots, L$, can be determined by the minimum integer $N_1$ such that $E[H_{l^*}] \le \delta_{l^*}$.
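The selection of $l^*$ can be sketched as follows (plain Python; names are illustrative). Because $\Delta_l = (\sum_i c_{li}^2\sigma_i^2/r_i)^{1/2}/\delta_l$ does not involve $N_1$, $l^*$ can be fixed before the sample size search begins. The exact expected half width of Equation 10 is not reproduced here.

```python
import math
from itertools import combinations
from typing import List, Sequence


def pairwise_vectors(g: int) -> List[List[int]]:
    """Coefficient vectors of all g(g-1)/2 pairwise contrasts."""
    vecs = []
    for i, j in combinations(range(g), 2):
        c = [0] * g
        c[i], c[j] = 1, -1
        vecs.append(c)
    return vecs


def select_l_star(contrasts: Sequence[Sequence[float]], sigma2: Sequence[float],
                  ratios: Sequence[float], bounds: Sequence[float]) -> int:
    """Index l* maximizing Delta_l = (sum_i c_li^2 sigma_i^2 / r_i)^{1/2} / delta_l.
    Delta_l does not involve N_1, so l* is fixed before any sample size search."""
    def delta(l: int) -> float:
        var = sum(c ** 2 * s / r for c, s, r in zip(contrasts[l], sigma2, ratios))
        return math.sqrt(var) / bounds[l]
    return max(range(len(contrasts)), key=delta)
```

For the four-group configuration with $(\sigma_1^2, \ldots, \sigma_4^2) = (1, 4, 9, 16)$ and equal bounds, each of the ratios (1, 2, 3, 4), (1, 1, 1, 1), and (4, 3, 2, 1) selects the contrast between groups 3 and 4.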

In addition, the criterion of tolerance probability of interval half width within a given value is of special interest. Hence, it is desirable to find the sample sizes $\{N_{TP1}, \ldots, N_{TPg}\}$ needed to guarantee, with a given joint tolerance probability $1 - \gamma$, that the half widths of the simultaneous confidence intervals $\{(L_l, U_l), l = 1, \ldots, L\}$ for contrast effects will not exceed the planned bounds:

$$P\{H_l \le \omega_l,\ l = 1, \ldots, L\} \ge 1 - \gamma, \tag{22}$$

where $\omega_l\,(>0)$ are the designated bounds for $l = 1, \ldots, L$. To provide a feasible solution for the joint tolerance probability, an approximate expression for the condition given in Equation 22 is derived in Appendix C:

$$P\{H_{l^*} \le \omega_{l^*}\} \ge 1 - \gamma, \tag{23}$$

where $l^*$ is the value of $l$ at which $\Omega_l = N_1^{1/2}S_l^{1/2}/\omega_l$ attains its maximum over $l = 1, \ldots, L$. Notably, the alternative formulation in Equation 23 affords a major simplification of the combined evaluation of several interval half widths, and the prescribed method for computing the exact tolerance probability $P\{H \le \omega\}$ in Equation 11 for an individual confidence interval can be immediately applied to compute the exact value $P\{H_{l^*} \le \omega_{l^*}\}$. In short, for the designated sample size ratios $(r_1, \ldots, r_g)$, the sample sizes $(N_{TP1}, \ldots, N_{TPg}) = N_{TP1}(r_1, \ldots, r_g)$ required to ensure the joint tolerance probability $P\{H_l \le \omega_l, l = 1, \ldots, L\} \ge 1 - \gamma$ are computed from the minimum integer $N_1$ such that $P\{H_{l^*} \le \omega_{l^*}\} \ge 1 - \gamma$.
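A rough sketch of the search for the minimum integer $N_1$ under the tolerance criterion follows. It is not the exact method of Equation 11: it assumes the chi-square approximation used in Appendix C, integer allocation ratios, and the Ury and Wiggins critical value evaluated at the parameter degrees of freedom $\nu_l$. It requires SciPy, and `approx_tp_n1` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple

from scipy import stats


def approx_tp_n1(c: Sequence[float], sigma2: Sequence[float], ratios: Sequence[int],
                 omega: float, L: int, alpha: float = 0.05, gamma: float = 0.10,
                 n1_max: int = 100_000) -> Tuple[int, float]:
    """Smallest N_1 with approximately P{H_l* <= omega} >= 1 - gamma, N_i = N_1 * r_i.

    Assumptions (not the article's exact Equation 11): H_l* is approximated by
    Q * S^{1/2} * (U / nu)^{1/2} with U ~ chi-square(nu), and Q is the
    Ury-Wiggins value t_{nu, alpha/(2L)} evaluated at the parameter df nu.
    """
    for n1 in range(2, n1_max + 1):
        n = [n1 * r for r in ratios]
        terms = [ci ** 2 * s / ni for ci, s, ni in zip(c, sigma2, n)]
        s_l = sum(terms)
        nu = s_l ** 2 / sum(t ** 2 / (ni - 1) for t, ni in zip(terms, n))
        q = stats.t.ppf(1 - alpha / (2 * L), nu)
        # H <= omega  <=>  U <= nu * omega^2 / (q^2 * S)
        prob = stats.chi2.cdf(nu * omega ** 2 / (q ** 2 * s_l), nu)
        if prob >= 1 - gamma:
            return n1, float(prob)
    raise ValueError("no N_1 up to n1_max meets the tolerance requirement")
```

Here `c` should be the coefficient vector of the selected contrast $l^*$; for the balanced four-group example, that is the contrast between groups 3 and 4.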

For the special case of multiple comparisons between each treatment and a control, the three approaches of Ury and Wiggins (1971), Tamhane (1977), and Dunnett (1980) given in Equations 15, 17, and 19 can be readily modified with $L = g - 1$ to construct the simultaneous confidence intervals, and the corresponding sample sizes can be determined with the suggested methods. For practical applications, computer algorithms are required to perform the sample size calculations so that the simultaneous confidence intervals of the six multiple comparison procedures attain the desired precision. Empirical illustrations are presented next to demonstrate the usefulness and accuracy of the proposed sample size procedures and the supplementary SAS/IML (SAS Institute, 2011) computing algorithms.
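As a small arithmetic check of the many-to-one modification, the sketch below compares the per-interval levels $\alpha_B$ and $\alpha_T$ of Equations 15 and 17 for all pairwise comparisons ($L = g(g-1)/2$) and for comparisons with a control ($L = g - 1$). The helper name is illustrative.

```python
from typing import Tuple


def adjusted_levels(alpha: float, L: int) -> Tuple[float, float]:
    """Per-interval upper-tail levels for a family of L two-sided intervals."""
    alpha_b = alpha / (2 * L)                      # Ury and Wiggins, Equation 15
    alpha_t = (1 - (1 - alpha) ** (1 / L)) / 2     # Tamhane, Equation 17
    return alpha_b, alpha_t


g, alpha = 4, 0.05
for label, L in (("all pairs", g * (g - 1) // 2), ("vs. control", g - 1)):
    a_b, a_t = adjusted_levels(alpha, L)
    print(f"{label:<12} L={L}  alpha_B={a_b:.6f}  alpha_T={a_t:.6f}")
```

For $g = 4$ and $\alpha = 0.05$, reducing $L$ from 6 to 3 doubles $\alpha_B$ (0.004167 versus 0.008333), which yields smaller critical values and hence smaller required sample sizes.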

Empirical Assessments of Sample Size Calculations for Multiple Comparisons

The similarities and differences among the proposed sample size procedures for multiple comparisons are demonstrated here through the model formulations of the previous illustration of individual comparisons. In this case, we address the sample size problem for the family of simultaneous confidence intervals for pairwise multiple comparisons. A systematic numerical investigation of a four-group heteroscedastic ANOVA is conducted by fixing the confidence level $1 - \alpha = 0.95$ and the heterogeneous error variances $(\sigma_1^2, \sigma_2^2, \sigma_3^2, \sigma_4^2) = (1, 4, 9, 16)$, and varying the sample size allocation ratio: (1, 2, 3, 4), (1, 1, 1, 1), and (4, 3, 2, 1). As in the preceding examination, this empirical study includes sample size calculation and Monte Carlo simulation.

For the designated multiple comparison procedures, the sample sizes $(N_{EW1}, N_{EW2}, N_{EW3}, N_{EW4})$ required to ensure that all six two-sided confidence intervals of pairwise mean differences have expected half widths within the bound $\delta_l = \delta = 2$, computed with the suggested technique, are summarized in Table 3. Table 3 also presents the sample sizes $(N_{TP1}, \ldots, N_{TP4})$ required to guarantee, with a given tolerance probability $1 - \gamma = 0.90$, that the half widths of all six two-sided confidence intervals for pairwise mean differences will not exceed the planned bound $\omega_l = \omega = 2$. The optimal sample sizes in Table 3 show that the required sample sizes under the tolerance probability consideration are still larger than those under the expected half-width criterion for all six multiple comparison procedures. It is also interesting to note that the three methods of Ury and Wiggins (1971), Tamhane (1977), and Dunnett (1980) given in Equations 15, 17, and 19 yield almost identical sample sizes for all six combinations of sample size ratio and precision criterion. Moreover, Brown and Forsythe's (1974) procedure appears to require the largest sample sizes, while the method of Games and Howell (1976) tends to give the smallest sample sizes for the model configurations considered here. Finally, the sample sizes associated with Dunnett's (1980) modification of Cochran's (1964) approach are slightly smaller than, or equal to, those of the three procedures of Ury and Wiggins (1971), Tamhane (1977), and Dunnett (1980).

TABLE 3
Computed Sample Size, Expected Half Width, and Tolerance Probability of the Proposed Approaches for Simultaneous 95% Two-Sided Confidence Intervals of Pairwise Contrasts With Interval Half-Width Bound $\delta = \omega = 2$ and Tolerance Probability $1 - \gamma = 0.90$, When $(\sigma_1^2, \sigma_2^2, \sigma_3^2, \sigma_4^2) = (1, 4, 9, 16)$

Procedure | Expected half width: sample sizes | Simulated maximum E[H] | Approximate maximum E[H] | Relative error (%) | Tolerance probability: sample sizes | Simulated joint P{H < ω} | Approximate joint P{H < ω} | Relative error (%)
Brown and Forsythe | (15, 30, 45, 60) | 1.9361 | 1.9367 | 0.0309 | (17, 34, 51, 68) | 0.9318 | 0.9367 | 0.5211
 | (51, 51, 51, 51) | 1.9894 | 1.9880 | 0.0691 | (60, 60, 60, 60) | 0.9203 | 0.9126 | 0.8386
 | (168, 126, 84, 42) | 1.9975 | 1.9957 | 0.0876 | (204, 153, 102, 51) | 0.9041 | 0.9066 | 0.2801
Ury and Wiggins | (13, 26, 39, 52) | 1.9762 | 1.9769 | 0.0387 | (15, 30, 45, 60) | 0.9002 | 0.9020 | 0.2022
 | (46, 46, 46, 46) | 1.9871 | 1.9860 | 0.0516 | (54, 54, 54, 54) | 0.8984 | 0.9028 | 0.4856
 | (152, 114, 76, 38) | 1.9977 | 1.9939 | 0.1921 | (188, 141, 94, 47) | 0.9202 | 0.9204 | 0.0199
Games and Howell | (12, 24, 36, 48) | 1.9973 | 1.9980 | 0.0333 | (15, 30, 45, 60) | 0.9597 | 0.9608 | 0.1157
 | (43, 43, 43, 43) | 1.9980 | 1.9942 | 0.1921 | (52, 52, 52, 52) | 0.9264 | 0.9246 | 0.1961
 | (144, 108, 72, 36) | 1.9878 | 1.9870 | 0.0399 | (176, 132, 88, 44) | 0.9032 | 0.9067 | 0.3841
Tamhane | (13, 26, 39, 52) | 1.9706 | 1.9713 | 0.0387 | (15, 30, 45, 60) | 0.9090 | 0.9093 | 0.0303
 | (46, 46, 46, 46) | 1.9814 | 1.9804 | 0.0516 | (54, 54, 54, 54) | 0.9067 | 0.9097 | 0.3316
 | (152, 114, 76, 38) | 1.9919 | 1.9881 | 0.1921 | (188, 141, 94, 47) | 0.9258 | 0.9256 | 0.0240
Dunnett and Cochran | (12, 24, 36, 48) | 1.9509 | 1.9525 | 0.0830 | (14, 28, 42, 56) | 0.9232 | 0.9270 | 0.4151
 | (45, 45, 45, 45) | 1.9836 | 1.9843 | 0.0346 | (53, 53, 53, 53) | 0.9094 | 0.9157 | 0.6902
 | (152, 114, 76, 38) | 1.9854 | 1.9845 | 0.0478 | (184, 138, 92, 46) | 0.9218 | 0.9183 | 0.3748
Dunnett | (13, 26, 39, 52) | 1.9684 | 1.9682 | 0.0089 | (15, 30, 45, 60) | 0.9113 | 0.9127 | 0.1460
 | (46, 46, 46, 46) | 1.9791 | 1.9774 | 0.0869 | (54, 54, 54, 54) | 0.9098 | 0.9136 | 0.4124
 | (152, 114, 76, 38) | 1.9885 | 1.9848 | 0.1858 | (184, 138, 92, 46) | 0.9037 | 0.9074 | 0.4074

Table 3 also reports the corresponding approximate maximum expected half width and approximate joint tolerance probability for all design settings. These values are compared with the respective simulated maximum expected half width and simulated joint tolerance probability obtained from Monte Carlo simulation. For each design configuration and the respective sample sizes of the multiple comparison methods, the simulated maximum expected half width is the largest of the six average interval half widths over 10,000 replicates, whereas the simulated joint tolerance probability is the proportion of the 10,000 replicates in which all six interval half widths are less than or equal to the specified bound. Consequently, the adequacy of the suggested sample size procedures for precise interval estimation is judged by the discrepancy between the simulated and approximate maximum expected half widths, or between the simulated and approximate joint tolerance probabilities. The simulated results and corresponding relative errors listed in Table 3 show that the proposed sample size formulas perform extremely well: all absolute relative errors are less than 1% for the 36 cases examined here.
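The Monte Carlo assessment described above can be sketched as follows, assuming NumPy and SciPy and using only the Ury and Wiggins critical value. Because each $H_l$ depends on the data only through the sample variances, the sketch draws $S_i^2 = \sigma_i^2\chi^2(N_i - 1)/(N_i - 1)$ directly rather than raw observations. The seed, replicate count, and function name are illustrative assumptions, not the settings of the article's SAS/IML programs.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def simulate_pairwise(n, sigma2, omega, alpha=0.05, reps=10_000, seed=2014):
    """Simulated maximum expected half width and simulated joint tolerance
    probability for all pairwise Ury-Wiggins intervals (Equation 15)."""
    rng = np.random.default_rng(seed)
    n = np.asarray(n)
    sigma2 = np.asarray(sigma2, dtype=float)
    g = len(n)
    pairs = list(combinations(range(g), 2))
    L = len(pairs)
    # Sample variances for every replicate: sigma_i^2 * chi2(N_i - 1) / (N_i - 1).
    s2 = sigma2 * rng.chisquare(n - 1, size=(reps, g)) / (n - 1)
    w = s2 / n                                   # S_i^2 / N_i, shape (reps, g)
    h = np.empty((reps, L))
    for k, (i, j) in enumerate(pairs):
        s_hat = w[:, i] + w[:, j]
        nu_hat = s_hat ** 2 / (w[:, i] ** 2 / (n[i] - 1) + w[:, j] ** 2 / (n[j] - 1))
        q = stats.t.ppf(1 - alpha / (2 * L), nu_hat)
        h[:, k] = q * np.sqrt(s_hat)             # half width H_l in each replicate
    max_mean_h = h.mean(axis=0).max()            # largest of the L average half widths
    joint_p = np.mean(np.all(h <= omega, axis=1))  # all L half widths within omega
    return max_mean_h, joint_p
```

Calling `simulate_pairwise([46] * 4, [1, 4, 9, 16], omega=2.0)` gives a maximum expected half width that can be compared with the Ury and Wiggins entry for the balanced allocation in Table 3.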

Note that computing the quantiles of the studentized range distribution and the studentized maximum modulus distribution requires a special function, such as the SAS PROBMC function, which may not be readily available in other software systems. To ease the burden of the extensive and iterative process of sample size determination, the computations of the quantiles $q_{g,\hat{\nu}_l,\alpha}$, $q^{*}_{g,\hat{\nu}_l,\alpha}$, and $m_{L,\hat{\nu}_l,\alpha}$ in Equations 16, 18, and 19, respectively, can be simplified by using the respective values $q_{g,\nu_l,\alpha}$, $q^{*}_{g,\nu_l,\alpha}$, and $m_{L,\nu_l,\alpha}$, in which the estimated degrees of freedom $\hat{\nu}_l$ are replaced by their parameter counterpart $\nu_l$. As explained in the theoretical derivations, the discrepancy should be negligible for moderately large degrees of freedom. More importantly, our numerical results demonstrate that the modification leads to a substantially more efficient algorithm while maintaining sensible accuracy. Although it is prudent to examine the behavior of the suggested techniques in a variety of other situations, this empirical evidence indicates that the proposed sample size procedures provide feasible and accurate solutions for precise simultaneous confidence interval estimation with the six multiple comparison procedures under the heteroscedastic model configurations considered here.


Numerical Illustrations

To demonstrate the features of the suggested procedures in sample size planning, the National Assessment of Educational Progress achievement data considered in Williams, Jones, and Tukey (1999) are used as an example. Specifically, the tabulated values shown in their Table 3 represent the means and standard errors of the eighth-grade mathematics proficiency changes between 1990 and 1992 for the 34 states. However, unlike the demonstration of alternative hypothesis testing procedures in Williams et al., we focus on the sample size calculations for interval estimation of the differences in achievement changes among the eight states in the northeast region. First, for an individual comparison, consider the scenario of comparing the outcome of New Jersey with the average of the other seven states in the region. To illustrate sample size determination for design planning, the reported summary statistics are treated as population mean change and standard deviation parameters. Because the sample standard deviations are not available, for the sake of explication, the standard deviations are set to 5 times the sample standard errors. In the following sample size calculations, the mean changes and associated standard deviations are $\mu$ = (1.565, 1.374, 3.399, 4.893, 4.303, 3.204, 4.422, 5.097) and $\sigma$ = 5 × (1.927, 1.347, 1.923, 2.532, 2.205, 1.534, 1.354, 0.948) for the states of New Jersey, Delaware, Maryland, New York, Pennsylvania, Connecticut, New Hampshire, and Rhode Island, respectively. With the additional settings of confidence coefficient $1 - \alpha = 0.95$, interval half-width bounds $\delta = \omega = 2.5$, and tolerance level $1 - \gamma = 0.90$, the computed sample sizes for a balanced design are 66 and 78 per group under the expected half-width and tolerance probability criteria, respectively. The actual specifications of these configurations are incorporated in the SAS/IML programs presented as supplementary files.
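The individual comparison can be checked roughly with the sketch below (SciPy assumed; names are illustrative). It treats the listed values as parameters and replaces the exact expected half width with $t_{\nu,\alpha/2}S^{1/2}$ at the parameter degrees of freedom, which slightly overstates $E[H]$ because $E[(U/\nu)^{1/2}] < 1$. It is an assumption-laden approximation, not the supplementary SAS/IML program. Only $\sigma$ enters the half width, so $\mu$ is not used.

```python
import math

from scipy import stats

# Standard deviations = 5 x reported standard errors, order NJ, DE, MD, NY, PA, CT, NH, RI.
SIGMA = [5 * se for se in (1.927, 1.347, 1.923, 2.532, 2.205, 1.534, 1.354, 0.948)]
C = [1.0] + [-1.0 / 7] * 7          # New Jersey versus the average of the other seven


def approx_half_width(n: int, c=C, sigma=SIGMA, alpha: float = 0.05) -> float:
    """Crude stand-in for E[H] in a balanced design: t_{nu, alpha/2} * S^{1/2},
    using the parameter df nu and ignoring E[(U/nu)^{1/2}] < 1 (an assumption)."""
    terms = [ci ** 2 * s ** 2 / n for ci, s in zip(c, sigma)]
    s_l = sum(terms)
    nu = s_l ** 2 / sum(t ** 2 / (n - 1) for t in terms)
    return stats.t.ppf(1 - alpha / 2, nu) * math.sqrt(s_l)


n_ew = next(n for n in range(2, 10_000) if approx_half_width(n) <= 2.5)
print(n_ew, round(approx_half_width(n_ew), 4))
```

The resulting per-group size can be compared with the balanced sample size of 66 reported above.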

Second, to ensure that all pairwise confidence intervals between the achievement changes of the eight states are narrow enough to yield meaningful precision, the necessary sample sizes can be calculated with the developed algorithms for the six multiple comparison procedures described earlier. Using the previously mentioned model configurations, the sample sizes required to meet the desired expected half-width bounds for the pairwise differences, $\delta_l = 2.5$, $l = 1, \ldots, 28$, were calculated with the supplementary SAS/IML programs. The resulting sample sizes for the six procedures of Brown and Forsythe (1974), Ury and Wiggins (1971), Games and Howell (1976), Tamhane (1977), and the two procedures of Dunnett (1980) given in Equations 18 and 19 are 637, 443, 417, 441, 419, and 441, respectively. On the other hand, the corresponding sample sizes to guarantee that the joint tolerance probability is at least $1 - \gamma = 0.90$ with the desired half widths $\omega_l = 2.5$, $l = 1, \ldots, 28$, are 669, 470, 442, 468, 444, and 467 for the six multiple comparison methods. Clearly, the required sample sizes are substantially larger than those for the individual comparison. With these numerical illustrations, users can easily identify the statements containing the key values in the computer code and then modify the program to accommodate their own model specifications.
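A comparable rough check for the pairwise family is sketched below, assuming a balanced design, the Ury and Wiggins critical value with $L = 28$, and the same parameter-df approximation as before (SciPy required; names are illustrative). Under equal group sizes, $l^*$ is simply the pair with the largest $\sigma_i^2 + \sigma_{i'}^2$.

```python
import math
from itertools import combinations

from scipy import stats

# Variances sigma^2 = (5 x standard error)^2, order NJ, DE, MD, NY, PA, CT, NH, RI.
SIGMA2 = [(5 * se) ** 2 for se in (1.927, 1.347, 1.923, 2.532, 2.205, 1.534, 1.354, 0.948)]
PAIRS = list(combinations(range(len(SIGMA2)), 2))
L = len(PAIRS)                                    # g(g-1)/2 = 28 pairwise contrasts
# With equal N_i, Delta_l is largest for the pair with the largest sigma_i^2 + sigma_i'^2.
I_STAR, J_STAR = max(PAIRS, key=lambda p: SIGMA2[p[0]] + SIGMA2[p[1]])


def approx_uw_half_width(n: int, alpha: float = 0.05) -> float:
    """t_{nu, alpha/(2L)} * S^{1/2} for pair l*, evaluated at the parameter df nu."""
    a, b = SIGMA2[I_STAR] / n, SIGMA2[J_STAR] / n
    nu = (a + b) ** 2 / ((a ** 2 + b ** 2) / (n - 1))
    return stats.t.ppf(1 - alpha / (2 * L), nu) * math.sqrt(a + b)


n_uw = next(n for n in range(2, 100_000) if approx_uw_half_width(n) <= 2.5)
print(L, (I_STAR, J_STAR), n_uw)
```

The selected pair is New York and Pennsylvania, and the resulting per-group size can be set against the value 443 reported for Ury and Wiggins (1971).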

Conclusion

The editorial policies and statistical guidelines of several prominent educational and psychological journals have called for greater use of confidence intervals for principal effect sizes. Accordingly, it has become the consensus across many scientific disciplines to include appropriate effect size measures and associated confidence intervals when documenting the results of research studies. From a study-planning point of view, researchers may wish to credibly address specific research questions and confirm meaningful treatment effects, so that the resulting confidence interval will meet the designated precision requirements. The general formulation of a linear combination of population means permits a wide range of research questions to be evaluated within the context of ANOVA. Accordingly, a linear contrast between two or more means represents an effect size index in individual and multiple comparison investigations.

In order to enhance the applicability of single and simultaneous confidence intervals within the framework of one-way heteroscedastic ANOVA, this study presents the corresponding sample size techniques under two precision principles. The precision criteria consist of the control of the expected width and the assurance of the tolerance probability of confidence intervals. It is noteworthy that the two principles of expected width and tolerance probability are closely related to the two standard criteria of unbiasedness and consistency, respectively, in statistical point estimation. In other words, these two measures impose distinct aspects of precision on the resulting confidence intervals, and each principle has conceptual and empirical implications in its own right. In most situations, prior knowledge or theory alone enables us to determine the appropriate magnitude of the interval half width because its scale is the same as that of the linear contrast. On the other hand, suitable tolerance levels lie within the range of 0.70 to 0.99, as demonstrated in Kupper and Hafner (1989).

Consequently, the suggested sample size procedures update and expand upon the current work of Pan and Kupper (1999) and related results in the literature. Although the discussion has concentrated on the one-way ANOVA setting, the principles and procedures are also applicable to more complicated factorial and extended formulations. Detailed sample size tables are presented to help researchers better understand the intrinsic relationships that exist among the optimal sample sizes, model characteristics, and precision considerations. Since existing software packages do not accommodate sample size calculations with the same degree of generality as illustrated in this research, computer programs are also developed to aid the use of the suggested procedures. The proposed sample size methodology should be useful for planning individual and multiple comparison studies in which variances differ across groups.


Appendix A

Alternative Formulation of Interval Half Width

In order to conduct exact and efficient computations, the following alternative formulation for $H$ is derived from the expression of $\hat{S}$ given in Equation 3:

$$H = t_{\hat{\nu},\alpha/2}\{KW\}^{1/2},$$

where $K = \sum_{i=1}^{g} K_i \sim \chi^2(N_T - g)$, $K_i \sim \chi^2(N_i - 1)$, $N_T = \sum_{i=1}^{g} N_i$, $W = \sum_{i=1}^{g} b_i A_i$, $b_i = c_i^2\sigma_i^2/\{N_i(N_i - 1)\}$, and $A_i = K_i/K$, $i = 1, \ldots, g$. Note that the approximate degrees of freedom $\hat{\nu}$ given in Equation 6 can also be expressed as

$$\hat{\nu} = \left(\sum_{i=1}^{g} b_i A_i\right)^{2} \Bigg/ \sum_{i=1}^{g} \frac{b_i^2 A_i^2}{N_i - 1}.$$

Moreover, it is computationally simple and relatively stable to rewrite the dependence of $(A_1, \ldots, A_g)$ on the chi-square random variables in terms of beta random variables; see Johnson, Kotz, and Balakrishnan (1995, p. 212). Specifically,

$$A_1 = \prod_{i=1}^{g-1} B_i,\quad A_2 = (1 - B_1)\prod_{i=2}^{g-1} B_i,\quad \ldots,\quad A_{g-1} = (1 - B_{g-2})B_{g-1},\quad A_g = 1 - B_{g-1},$$

where $B_i = \sum_{j=1}^{i} K_j \big/ \sum_{j'=1}^{i+1} K_{j'}$ has a beta distribution, $B_i \sim \text{Beta}\left(\sum_{j=1}^{i}(N_j - 1)/2,\ (N_{i+1} - 1)/2\right)$, for $i = 1, \ldots, g - 1$. An important underlying property of the suggested formulations is that the random variables $B_1, \ldots, B_{g-1}$ and $K$ are mutually independent. Hence, both $\hat{\nu}$ and $W$ can be viewed as functions of the beta random variables $(B_1, \ldots, B_{g-1})$, and they are independent of $K$.
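The beta representation can be checked numerically with the following sketch (NumPy assumed; the function name is illustrative). It draws $(A_1, \ldots, A_g)$ through independent $B_i$ and should reproduce the distribution of $K_i/K$ obtained directly from chi-square draws; for example, $E[A_i] = (N_i - 1)/(N_T - g)$.

```python
import numpy as np


def proportions_via_beta(n, size, rng):
    """Draw (A_1, ..., A_g), A_i = K_i / K with K_i ~ chi-square(N_i - 1),
    via independent B_i ~ Beta(sum_{j<=i} (N_j - 1)/2, (N_{i+1} - 1)/2)."""
    half = (np.asarray(n) - 1) / 2.0
    g = len(half)
    cum = np.cumsum(half)
    # Column k holds B_{k+1} (0-based k = 0, ..., g-2).
    b = np.column_stack([rng.beta(cum[k], half[k + 1], size) for k in range(g - 1)])
    # suffix[:, m] = B_{m+1} * ... * B_{g-1}; the last column is the empty product 1.
    suffix = np.ones((size, g))
    for m in range(g - 2, -1, -1):
        suffix[:, m] = suffix[:, m + 1] * b[:, m]
    a = np.empty((size, g))
    a[:, 0] = suffix[:, 0]                   # A_1 = B_1 * ... * B_{g-1}
    a[:, 1:] = (1.0 - b) * suffix[:, 1:]     # A_{m+1} = (1 - B_m) * B_{m+1} * ... * B_{g-1}
    return a
```

Each row sums to 1 by telescoping, and the column means and spreads can be compared with `k / k.sum(axis=1, keepdims=True)` for chi-square draws `k`.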

Appendix B

Approximate Expression of Equation 20

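Consider the approximate evaluation of the expected half width $E[H_l/\delta_l]$:

$$E[H_l/\delta_l] \approx E\left[\left(Q_{\nu_l,\alpha}S_l^{1/2}/\delta_l\right)\left(U_l/\nu_l\right)^{1/2}\right] \approx Q_{\nu_l,\alpha}S_l^{1/2}/\delta_l,$$

where $U_l \sim \chi^2(\nu_l)$ for $l = 1, \ldots, L$. Assume the maximum of

$$\Delta_l = N_1^{1/2}S_l^{1/2}/\delta_l = \left(\sum_{i=1}^{g} c_{li}^2\sigma_i^2/r_i\right)^{1/2}\Big/\delta_l, \quad l = 1, \ldots, L,$$

occurs when $l = l^*$, with

$$\Delta_{l^*} = \max_{1 \le l \le L}\Delta_l = \left(\sum_{i=1}^{g} c_{l^*i}^2\sigma_i^2/r_i\right)^{1/2}\Big/\delta_{l^*}.$$

Note that the critical values $Q_{\nu_l,\alpha}$ depend on the sample sizes $\{N_1, \ldots, N_g\}$ and are not substantially different from each other for moderately large degrees of freedom $\nu_l$. Essentially, the dominant term in $Q_{\nu_l,\alpha}S_l^{1/2}/\delta_l$ is $S_l^{1/2}/\delta_l = \Delta_l/N_1^{1/2}$. Hence,

$$\max_{1 \le l \le L}\{E[H_l/\delta_l]\} \approx \max_{1 \le l \le L}\{Q_{\nu_l,\alpha}S_l^{1/2}/\delta_l\} = \max_{1 \le l \le L}\{Q_{\nu_l,\alpha}\Delta_l/N_1^{1/2}\} \approx Q_{\nu_{l^*},\alpha}\Delta_{l^*}/N_1^{1/2} = Q_{\nu_{l^*},\alpha}S_{l^*}^{1/2}/\delta_{l^*}.$$

The result implies that $\max_{1 \le l \le L}\{E[H_l/\delta_l]\} \approx E[H_{l^*}/\delta_{l^*}]$, and the condition given in Equation 20 can be alternatively evaluated by

$$E[H_{l^*}] \le \delta_{l^*}.$$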
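The step $E[(U_l/\nu_l)^{1/2}] \approx 1$ can be quantified with the exact moment $E[(U/\nu)^{1/2}] = (2/\nu)^{1/2}\,\Gamma\{(\nu+1)/2\}/\Gamma(\nu/2)$ for $U \sim \chi^2(\nu)$, a standard chi-square result. The plain-Python sketch below evaluates it; the function name is illustrative.

```python
import math


def mean_sqrt_chi2_ratio(nu: float) -> float:
    """Exact E[(U/nu)^{1/2}] for U ~ chi-square(nu), computed on the log scale."""
    return math.sqrt(2.0 / nu) * math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2))


for nu in (5, 10, 50, 100, 500):
    print(nu, round(mean_sqrt_chi2_ratio(nu), 6))
```

The value behaves like $1 - 1/(4\nu)$, roughly 0.9975 at $\nu = 100$, so dropping this factor slightly overstates the expected half width for moderate degrees of freedom.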

Appendix C

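Approximate Expression of Equation 22

Note that a useful approximate formulation of $P\{H_l \le \omega_l\}$ is

$$P\{H_l \le \omega_l\} = P\{H_l/\omega_l \le 1\} \approx P\left\{\frac{Q_{\nu_l,\alpha}S_l^{1/2}}{\omega_l}\left(\frac{U_l}{\nu_l}\right)^{1/2} \le 1\right\},$$

where $U_l \sim \chi^2(\nu_l)$ for $l = 1, \ldots, L$. Assume the maximum of

$$\Omega_l = N_1^{1/2}S_l^{1/2}/\omega_l = \left(\sum_{i=1}^{g} c_{li}^2\sigma_i^2/r_i\right)^{1/2}\Big/\omega_l$$

occurs when $l = l^*$, with

$$\Omega_{l^*} = \max_{1 \le l \le L}\Omega_l = \left(\sum_{i=1}^{g} c_{l^*i}^2\sigma_i^2/r_i\right)^{1/2}\Big/\omega_{l^*}.$$

Also, the probabilities $P\{U_l/\nu_l \le a\}$ are fairly equivalent when the degrees of freedom $\nu_l$ are moderately large and the constant $a$ is substantially greater than 1. In addition, the dominant term in $Q_{\nu_l,\alpha}(S_l^{1/2}/\omega_l)$ is $S_l^{1/2}/\omega_l = \Omega_l/N_1^{1/2}$, because the critical values $Q_{\nu_l,\alpha}$ are relatively close to each other in magnitude for moderately large degrees of freedom $\nu_l$. Therefore,

$$\begin{aligned}
P\{H_l \le \omega_l,\ l = 1, \ldots, L\} &= P\left\{\max_{1 \le l \le L}(H_l/\omega_l) \le 1\right\} \\
&\approx P\left\{\max_{1 \le l \le L}\left(Q_{\nu_l,\alpha}S_l^{1/2}/\omega_l\right)\left(U_l/\nu_l\right)^{1/2} \le 1\right\} \\
&\approx P\left\{Q_{\nu_{l^*},\alpha}\left(\Omega_{l^*}/N_1^{1/2}\right)\left(U_{l^*}/\nu_{l^*}\right)^{1/2} \le 1\right\} \\
&= P\left\{Q_{\nu_{l^*},\alpha}\left(S_{l^*}^{1/2}/\omega_{l^*}\right)\left(U_{l^*}/\nu_{l^*}\right)^{1/2} \le 1\right\} \\
&\approx P\left\{Q_{\hat{\nu}_{l^*},\alpha}\hat{S}_{l^*}^{1/2}/\omega_{l^*} \le 1\right\}.
\end{aligned}$$

Accordingly, $P\{H_l \le \omega_l, l = 1, \ldots, L\} \approx P\{H_{l^*} \le \omega_{l^*}\}$, and the condition given in Equation 22 can be approximately evaluated as

$$P\{H_{l^*} \le \omega_{l^*}\} \ge 1 - \gamma.$$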
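The claim that $P\{U_l/\nu_l \le a\}$ is nearly the same across contrasts when $\nu_l$ is moderately large and $a$ exceeds 1 can be illustrated with a few chi-square probabilities (SciPy assumed; the helper name is illustrative).

```python
from scipy import stats


def prob_ratio_below(a: float, nu: float) -> float:
    """P{U/nu <= a} for U ~ chi-square(nu)."""
    return float(stats.chi2.cdf(a * nu, nu))


for nu in (25, 50, 100, 200):
    print(nu, [round(prob_ratio_below(a, nu), 4) for a in (1.25, 1.5, 2.0)])
```

As $\nu$ grows, these probabilities approach 1 for any fixed $a > 1$, so differences in $\nu_l$ across contrasts matter little once the degrees of freedom are moderately large.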

Acknowledgment

The authors would like to thank the editor, Dr. Sandip Sinharay, and four anonymous reviewers for constructive suggestions that led to improved presentation.


Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research of the first author was partially supported by National Science Council Grant NSC-102-2118-M-033-001.

References

Alhija, F. N. A., & Levy, A. (2009). Effect size reporting practices in published articles. Educational and Psychological Measurement, 69, 245–265.

American Educational Research Association Task Force on Reporting of Research Methods. (2006). Standards for reporting on empirical social science research in AERA publications. Educational Researcher, 35, 33–40.

American Psychological Association. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: Author.

Bird, K. D. (2002). Confidence intervals for effect sizes in analysis of variance. Educational and Psychological Measurement, 62, 197–226.

Bird, K. D. (2004). Analysis of variance via confidence intervals. London, England: Sage.

Bretz, F., Hothorn, T., & Westfall, P. H. (2010). Multiple comparisons using R. Boca Raton, FL: Chapman & Hall/CRC.

Brown, M. B., & Forsythe, A. B. (1974). The ANOVA and multiple comparisons for data with heterogeneous variances. Biometrics, 30, 719–724.

Cochran, W. G. (1964). Approximate significance levels of the Behrens-Fisher test. Biometrics, 20, 191–195.

Cohen, J. (1990). Things I have learned so far. American Psychologist, 45, 1304–1312.

Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.

Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge/Taylor & Francis.

Dunnett, C. W. (1980). Pairwise multiple comparisons in the unequal variance case. Journal of the American Statistical Association, 75, 796–800.

Dunst, C. J., & Hamby, D. W. (2012). Guide for calculating and interpreting effect sizes and confidence intervals in intellectual and developmental disability research studies. Journal of Intellectual & Developmental Disability, 37, 89–99.

Fenstad, G. U. (1983). A comparison between the U and V tests in the Behrens-Fisher problem. Biometrika, 70, 300–302.

Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size estimates: Current use, calculations, and interpretation. Journal of Experimental Psychology: General, 141, 2–18.

Games, P. A., & Howell, J. F. (1976). Pairwise multiple comparison procedures with unequal N’s and/or variances: A Monte Carlo study. Journal of Educational Statistics, 1, 113–125.

Grissom, R. J. (2000). Heterogeneity of variance in clinical data. Journal of Consulting and Clinical Psychology, 68, 155–165.


Hahn, G. J., & Meeker, W. Q. (1991). Statistical intervals: A guide for practitioners. New York, NY: John Wiley.

Hochberg, Y., & Tamhane, A. C. (1987). Multiple comparison procedures. New York, NY: John Wiley.

Hsu, J. (1996). Multiple comparisons: Theory and methods. London, England: Chapman and Hall.

Jan, S. L., & Shieh, G. (2014). Sample size determinations for Welch’s test in one-way heteroscedastic ANOVA. British Journal of Mathematical and Statistical Psychology, 67, 72–93.

Johnson, N. L., Kotz, S., & Balakrishnan, N. (1995). Continuous univariate distributions (2nd ed., Vol. 2). New York, NY: Wiley.

Kelley, K., Maxwell, S. E., & Rausch, J. R. (2003). Obtaining power or obtaining precision: Delineating methods of sample-size planning. Evaluation and the Health Professions, 26, 258–287.

Kramer, C. (1956). Extension of multiple range tests to group means with unequal numbers of replications. Biometrics, 12, 307–310.

Kupper, L. L., & Hafner, K. B. (1989). How appropriate are popular sample size formulas? The American Statistician, 43, 101–105.

Kutner, M. H., Nachtsheim, C. J., Neter, J., & Li, W. (2005). Applied linear statistical models (5th ed.). New York, NY: McGraw-Hill.

Levine, T. R., Weber, R., Park, H. S., & Hullett, C. R. (2008). A communication researchers' guide to null hypothesis significance testing and alternatives. Human Communication Research, 34, 188–209.

Liu, X. S. (2009). Sample size and the width of the confidence interval for mean difference. British Journal of Mathematical and Statistical Psychology, 62, 201–215.

Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison perspective (2nd ed.). Mahwah, NJ: Erlbaum.

Odgaard, E. C., & Fowler, R. L. (2010). Confidence intervals for effect sizes: Compliance and clinical significance in the Journal of Consulting and Clinical Psychology. Journal of Consulting and Clinical Psychology, 78, 287–297.

Pan, Z., & Kupper, L. L. (1999). Sample size determination for multiple comparison studies treating confidence interval width as random. Statistics in Medicine, 18, 1475–1488.

Robey, R. R. (2004). Reporting point and interval estimates of effect-size for planned contrasts: Fixed within effect analyses of variance. Journal of Fluency Disorders, 29, 307–341.

Rossi, J. A. D. (1975). An application of Welch’s approximate t-solution of the Behrens-Fisher problem to confidence intervals. Technometrics, 17, 57–60.

SAS Institute. (2011). SAS/IML User’s Guide (Version 9.2) [Computer software]. Cary, NC: SAS Institute.

Satterthwaite, F. E. (1946). An approximate distribution of estimate of variance components. Biometrics Bulletin, 2, 110–114.

Scheffé, H. (1959). The analysis of variance. New York, NY: Wiley.

Shieh, G., & Jan, S. L. (2012). Optimal sample sizes for precise interval estimation of Welch’s procedure under various allocation and cost considerations. Behavior Research Methods, 44, 202–212.


Smith, H. F. (1936). The problem of comparing the results of two experiments with unequal errors. Journal of the Council for Scientific and Industrial Research, 9, 211–212.

Smithson, M. (2003). Confidence intervals. Thousand Oaks, CA: Sage.

Sun, S., Pan, W., & Wang, L. L. (2010). A comprehensive review of effect size reporting and interpreting practices in academic journals in education and psychology. Journal of Educational Psychology, 102, 989–1004.

Tamhane, A. C. (1977). Multiple comparisons in model I one-way ANOVA with unequal variances. Communications in Statistics, A6, 15–32.

Tamhane, A. C. (1979). A comparison of procedures for multiple comparisons of means with unequal variances. Journal of the American Statistical Association, 74, 471–480.

Tomarken, A. J., & Serlin, R. C. (1986). Comparison of ANOVA alternatives under variance heterogeneity and specific noncentrality structures. Psychological Bulletin, 99, 90–99.

Tukey, J. W. (1994). The problem of multiple comparisons. In H. Braun (Ed.), The collected works of John W. Tukey VIII. Multiple comparisons: 1948–1983 (pp. 1–300). New York: Chapman and Hall.

Ury, H. K., & Wiggins, A. D. (1971). Large sample and other multiple comparisons among means. British Journal of Mathematical and Statistical Psychology, 24, 174–194.

Wang, Y., & Kupper, L. L. (1997). Optimal sample sizes for estimating the difference in means between two normal populations treating confidence interval length as a random variable. Communications in Statistics: Theory and Methods, 26, 727–741.

Welch, B. L. (1938). The significance of the difference between two means when the population variances are unequal. Biometrika, 29, 350–362.

Welch, B. L. (1947). The generalization of "Student's" problem when several different population variances are involved. Biometrika, 34, 28–35.

Westfall, P. H., Tobias, R. D., Rom, D., Wolfinger, R. D., & Hochberg, Y. (2011). Multiple comparisons and multiple tests: Using the SAS system (2nd ed.). Cary, NC: SAS Institute.

Wilcox, R. R. (1987). New designs in analysis of variance. Annual Review of Psychology, 38, 29–60.

Wilkinson, L., & American Psychological Association Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.

Williams, V. S. L., Jones, L. V., & Tukey, J. W. (1999). Controlling error in multiple comparisons, with examples from state-to-state differences in educational achievement. Journal of Educational and Behavioral Statistics, 24, 42–69.

Authors

SHOW-LI JAN is a professor of applied mathematics, Chung Yuan Christian University, Chungli, Taiwan. Her research focuses on nonparametric methods and multiple test procedures.


GWOWEN SHIEH is a professor of management science, National Chiao Tung University, Hsinchu, Taiwan. His current research interests include sample size methodology and research methods.

Manuscript received June 04, 2013 Revision received October 19, 2013 Accepted January 09, 2014