On tests of treatment-covariate interactions: An illustration of appropriate power and sample size calculations


Gwowen Shieh*

Department of Management Science, National Chiao Tung University, Hsinchu, Taiwan

Abstract

The appraisals of treatment-covariate interaction have theoretical and substantial implications in all scientific fields. Methodologically, the detection of interaction between categorical treatment levels and continuous covariate variables is analogous to the homogeneity of regression slopes test in the context of ANCOVA. A fundamental assumption of ANCOVA is that the regression slopes associating the response variable with the covariate variable are presumed constant across treatment groups. The validity of homogeneous regression slopes accordingly is the most essential concern in traditional ANCOVA and inevitably determines the practical usefulness of research findings. In view of the limited results in current literature, this article aims to present power and sample size procedures for tests of heterogeneity between two regression slopes with particular emphasis on the stochastic feature of covariate variables. Theoretical implications and numerical investigations are presented to explicate the utility and advantage for accommodating covariate properties. The exact approach has the distinct feature of accommodating the full distributional properties of normal covariates whereas the simplified approximate methods only utilize the partial information of covariate variances. According to the overall accuracy and robustness, the exact approach is recommended over the approximate methods as a reliable tool in practical applications. The suggested power and sample size calculations can be implemented with the supplemental SAS and R programs.

OPEN ACCESS

Citation: Shieh G (2017) On tests of treatment-covariate interactions: An illustration of appropriate power and sample size calculations. PLoS ONE 12(5): e0177682. https://doi.org/10.1371/journal.pone.0177682

Editor: Jake Olivier, University of New South Wales, AUSTRALIA

Received: January 22, 2017 Accepted: April 30, 2017 Published: May 17, 2017

Copyright: © 2017 Gwowen Shieh. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: Data are from the book of Fleiss, J. L. (2011). Design and analysis of clinical experiments (Vol. 73). New York, NY: Wiley.

Funding: This study was funded by Ministry of Science and Technology of Taiwan with the grant: MOST 105-2410-H-009-035-MY2.

Competing interests: The author has declared that

Introduction

The existence of interactive phenomena between predictor variables on the response variable is an essential issue in all scientific studies. The detection of interactions between categorical treatment levels and continuous covariate variables is equivalent to the test of homogeneity of regression slopes in ANCOVA designs. Notably, ANCOVA represents a constructive synthesis of analysis of variance and multiple linear regression to account for the relationship between the response variable and the concomitant or covariate variables in treatment comparisons. In addition to the fundamental assumptions of independence, normality, and constant variance, the within-group regression coefficients of the criterion variable on the covariate variable are presumed to be equal in ANCOVA. Violation of the ANCOVA assumptions has been the target of attention in the literature such as Glass, Peckham, and Sanders [1] and Harwell [2]. Naturally, the actual significance level and power of the regular test for treatment effects can be distorted to some extent under nonparallel regression settings. Hence, the validity of the homogeneity of regression slopes plays a crucial role in applying the traditional ANCOVA or generalized alternatives. As a general guideline, a test for nonparallel regression lines is required as the preliminary procedure for use of traditional ANCOVA. If the test for heterogeneity of regression slopes is significant, then the standard ANCOVA is no longer an appropriate technique. Accordingly, Fleiss [3], Huitema [4], and Maxwell and Delaney [5] provide comprehensive exposition and general strategy under heterogeneity of regression.

The statistical perspectives and appropriate strategies of covariate selection are presented in Hauck, Anderson, and Marcus [6], Hernandez, Steyerberg, and Habbema [7], Pocock et al. [8], Raab, Day, and Sales [9], and references therein. Moreover, the impact of omitted covariates on the statistical inferences has been demonstrated in Hauck et al. [10], Gail, Wieand, and Piantadosi [11], and Negassa and Hanley [12]. However, there is no related exploration about the direct consequence of excluding covariate characteristics in power and sample size calculations. In view of the potential applicability in practice, this article focuses on the most fundamental ANCOVA designs for two treatment groups and a single covariate. For the purposes of planning research designs and validating crucial interactions, power and sample size procedures were considered in Dupont and Plummer [13]. Their formula is very attractive from a computational standpoint and has been implemented in statistical packages. However, it is important to note that the particular method involves several convenient approximations including the use of a shifted t distribution for a noncentral t distribution and the substitution of fixed parameters for random covariates. The inherent nature and implications of accuracy were not addressed in Dupont and Plummer [13]. Accordingly, the existing illustrations were not detailed enough to elucidate the potential deficiency of their approximate technique. Because of the limited results in the literature, the current article aims to contribute to the development of power and sample size methodology for the tests for heterogeneity of two regression slopes. The emphasis is placed on the practical situation in which not only are the values of the response variable available only after the observations are made, but the levels of the covariate variables also cannot be predetermined before data collection.

It is noteworthy that a different and prominent situation of interactive research involves interactions between two continuous covariates. Although the model formulations and test procedures of the interactive analysis are rather similar for the two types of covariate combination, continuous by continuous and categorical by continuous, their test statistics and associated distribution properties are considerably different. Therefore, the power and sample size calculations of Shieh [14] for detecting interactions between two continuous variables in multiple regression settings are not appropriate for assessing interactions between grouping and continuous variables within the context of ANCOVA. In a continual effort to support the analytical development and improve the quality of research findings in interaction studies, this investigation updates and expands the previous work of Dupont and Plummer [13] in such a way that the findings not only expose the fundamental deficiency of the existing procedure, but also reinforce the usefulness of interaction designs in applications.

The present study has three key aspects. First, to account for the stochastic nature of covariate variables, the covariates are assumed to follow a normal distribution. Both exact and approximate power functions and sample size procedures for detecting heterogeneity of regression slopes are derived. Second, extensive numerical examinations were conducted to examine the deficiency of the approximate methods and the advantage of the exact approach under a wide range of model settings. The performance and robustness of the described techniques with respect to non-normality of the covariates are also investigated. Third, in view of the limited features of existing software packages, both SAS [15] and R [16] computer algorithms are developed to facilitate the implementation of the suggested power and sample size computations.

Methods

The two-group nonparallel simple linear regression model is of the form

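$$Y_{1j} = \beta_{01} + X_{1j}\beta_{11} + \varepsilon_{1j} \quad \text{and} \quad Y_{2k} = \beta_{02} + X_{2k}\beta_{12} + \varepsilon_{2k}, \tag{1}$$

where $\varepsilon_{1j}$ and $\varepsilon_{2k}$ are iid $N(0, \sigma^2)$ random variables, $j = 1, \ldots, N_1$, and $k = 1, \ldots, N_2$. It is often informative to rewrite the regression model with heterogeneous slopes in Eq 1 as the following interactive multiple regression model using a dummy variable $M$:

$$Y_i = \beta_{02} + M_i\beta_{0D} + X_i\beta_{12} + M_iX_i\beta_{1D} + \varepsilon_i, \quad i = 1, \ldots, N, \quad N = N_1 + N_2, \tag{2}$$

where $\beta_{0D} = \beta_{01} - \beta_{02}$, $\beta_{1D} = \beta_{11} - \beta_{12}$,

$$Y_i = Y_{1j},\ X_i = X_{1j},\ \varepsilon_i = \varepsilon_{1j},\ \text{and } M_i = 1 \text{ if } i = j,\ j = 1, \ldots, N_1;$$

$$Y_i = Y_{2k},\ X_i = X_{2k},\ \varepsilon_i = \varepsilon_{2k},\ \text{and } M_i = 0 \text{ if } i = N_1 + k,\ k = 1, \ldots, N_2.$$

Note that a traditional ANCOVA model assumes that the regression slopes are equivalent, $\beta_{11} = \beta_{12} = \beta_1$, and it postulates the parallel regression formulation

$$Y_i = \beta_{02} + M_i\beta_{0D} + X_i\beta_1 + \varepsilon_i, \quad i = 1, \ldots, N. \tag{3}$$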

Because the strategy and procedure for treatment comparisons differ for the nonparallel and parallel regression frameworks, the equality of covariate regression coefficients is viewed as the most crucial assumption in ANCOVA. Accordingly, a test for heterogeneity of regression slopes is generally required to justify the use of ANCOVA. When the assumption of equal within-group covariate regression coefficients is not tenable, the standard procedures of ANCOVA are no longer appropriate and alternative methods such as Johnson-Neyman and Picked-Point solutions for heterogeneous regression should be adopted. More conceptual and thorough discussions of alternative solutions to traditional ANCOVA can be found in Rogosa [17] and Rutherford [18].

In order to facilitate the detection of heterogeneous regression slopes, this article describes and examines the corresponding procedures for power and sample size determinations. Under the heterogeneous linear model assumption defined in Eq 1, it follows from standard results that the least squares estimators $\hat{\beta}_{11}$ and $\hat{\beta}_{12}$ of slope coefficients $\beta_{11}$ and $\beta_{12}$ have the following distributions

$$\hat{\beta}_{11} \sim N(\beta_{11}, \sigma^2/SSX_1) \quad \text{and} \quad \hat{\beta}_{12} \sim N(\beta_{12}, \sigma^2/SSX_2),$$

where $SSX_1 = \sum_{j=1}^{N_1}(X_{1j} - \bar{X}_1)^2$ and $SSX_2 = \sum_{k=1}^{N_2}(X_{2k} - \bar{X}_2)^2$, and $\bar{X}_1$ and $\bar{X}_2$ are the respective sample means of the $X_{1j}$ and $X_{2k}$ observations. Accordingly, $\hat{\beta}_{1D} = \hat{\beta}_{11} - \hat{\beta}_{12} \sim N\{\beta_{1D}, \sigma^2(1/SSX_1 + 1/SSX_2)\}$. On the other hand, $\hat{\sigma}^2 = SSE/\nu$ is the usual unbiased estimator of $\sigma^2$, where $SSE$ is the error sum of squares and $\nu = N - 4$. Moreover, $SSE/\sigma^2 \sim \chi^2(\nu)$, where $\chi^2(\nu)$ denotes a chi-square distribution with $\nu$ degrees of freedom. For testing the equality of the two slope coefficients in terms of $H_0: \beta_{11} = \beta_{12}$ versus $H_1: \beta_{11} \neq \beta_{12}$, the test statistic has the form

$$T = \frac{\hat{\beta}_{1D}}{\{\hat{\sigma}^2(1/SSX_1 + 1/SSX_2)\}^{1/2}}. \tag{4}$$

Under the null hypothesis $H_0: \beta_{11} = \beta_{12}$, the statistic has the distribution

$$T \sim t(\nu), \tag{5}$$

where $t(\nu)$ is a t distribution with degrees of freedom $\nu$. The null hypothesis is rejected at the significance level $\alpha$ if

$$|T| > t_{\nu, \alpha/2}, \tag{6}$$

where $t_{\nu, \alpha/2}$ is the $100(1 - \alpha/2)$th percentile of the distribution $t(\nu)$. Note that the inference setting is discussed here only from the perspective of a two-sided test. The same concepts may be readily extended to one-sided situations.
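As a minimal sketch of how the statistic in Eq 4 can be computed from raw data, the following Python fragment fits the two within-group regression lines separately and pools their error sums of squares. The function names and the toy data in the usage note are illustrative assumptions; this is not the supplemental SAS or R program.

```python
import math


def fit_line(x, y):
    """Least-squares fit of one group; returns (slope, SSX, SSE)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    ssx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    slope = sxy / ssx
    sse = syy - slope * sxy  # residual sum of squares of this group's line
    return slope, ssx, sse


def slope_difference_t(x1, y1, x2, y2):
    """T statistic of Eq 4 and its degrees of freedom nu = N - 4."""
    b11, ssx1, sse1 = fit_line(x1, y1)
    b12, ssx2, sse2 = fit_line(x2, y2)
    nu = len(x1) + len(x2) - 4
    sigma2_hat = (sse1 + sse2) / nu  # pooled SSE equals the SSE of the model in Eq 2
    t = (b11 - b12) / math.sqrt(sigma2_hat * (1 / ssx1 + 1 / ssx2))
    return t, nu
```

For example, `slope_difference_t([0, 1, 2, 3], [1, 3, 4, 8], [0, 1, 2, 3], [2, 2, 3, 3])` returns the statistic together with ν = 4; the value of T is then compared with the critical value of Eq 6.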

The statistical inferences about the heterogeneous slope effect are based on the conditional distribution of the continuous covariates. Therefore, the corresponding results would be specific to the particular values of the covariates. However, before conducting a research study, the actual values of the covariates, just like the primary responses, cannot be known in advance. Under such circumstances, it is more suitable to employ the random or unconditional setup as explicated in Sampson [19]. The underlying similarities and differences between fixed and random models have also been thoroughly illuminated in Cramer and Appelbaum [20] and Raudenbush [21]. Despite the complexity associated with the unconditional properties of the test procedure, the tests of hypotheses and estimates of parameters remain the same under both conditional and unconditional frameworks. Hence, the usual rejection rule and critical value remain unchanged. The distinction between the two modeling approaches becomes important only when power and sample size calculations are to be made. Thus, it is vital to recognize the stochastic nature of the covariate variables and to evaluate the distribution of the test statistic over possible values of the covariates. In order to elucidate the critical notion of accommodating the distributional properties of the covariate variables, the continuous covariate variables $\{X_{1j}, j = 1, \ldots, N_1\}$ and $\{X_{2k}, k = 1, \ldots, N_2\}$ are assumed to have the independent normal distributions $N(\theta_1, \tau_1^2)$ and $N(\theta_2, \tau_2^2)$, respectively. It should be noted that the normality setting is commonly employed to provide a convenient framework for analytical derivation and theoretical discussion in interaction studies, for example, see Harwell [2], McClelland and Judd [22], O'Connor [23], and Shieh [14].

To help justify the contribution of the current investigation, a brief review of the simple interaction model with two continuous covariates is presented here:

$$Y_i = \beta_I + X_i\beta_X + Z_i\beta_Z + X_iZ_i\beta_{XZ} + \xi_i, \tag{7}$$

where $Y_i$ is the value of the response variable $Y$, $X_i$ and $Z_i$ are the known constants of the continuous covariates $X$ and $Z$, $\xi_i$ are iid $N(0, \omega^2)$ random errors for $i = 1, \ldots, N$, and $\beta_I$, $\beta_X$, $\beta_Z$, and $\beta_{XZ}$ are unknown parameters. For the purpose of detecting the interaction effect in terms of the hypotheses $H_0: \beta_{XZ} = 0$ versus $H_1: \beta_{XZ} \neq 0$, it is important to examine the distributional property of the least squares estimator $\hat{\beta}_{XZ}$ of $\beta_{XZ}$:

$$\hat{\beta}_{XZ} \sim N(\beta_{XZ}, V(\hat{\beta}_{XZ})), \tag{8}$$

where $V(\hat{\beta}_{XZ}) = \omega^2 M$, $M$ is the (3, 3) element of $(\mathbf{X}_C^T\mathbf{X}_C)^{-1}$, and $\mathbf{X}_C = [\mathbf{X}_1 - \bar{\mathbf{X}}, \ldots, \mathbf{X}_N - \bar{\mathbf{X}}]^T$ is the centered design matrix whose $i$th row is built from $X_i$, $Z_i$, and their cross product $X_iZ_i$ for $i = 1, \ldots, N$. The corresponding test statistic $T_{XZ}$ is of the form

$$T_{XZ} = \frac{\hat{\beta}_{XZ}}{\{\hat{\omega}^2 M\}^{1/2}}, \tag{9}$$

where $\hat{\omega}^2$ is the usual unbiased estimator of $\omega^2$. When the null hypothesis $H_0: \beta_{XZ} = 0$ is true, the statistic $T_{XZ}$ is distributed as $t(\nu)$, and $H_0$ is rejected at the significance level $\alpha$ if $|T_{XZ}| > t_{\nu, \alpha/2}$. At first sight, the model structure, tested hypothesis, and decision rule are all similar to the prescribed results given in Eqs 4–6 for detecting the treatment by covariate interaction. However, the two test statistics $T_{XZ}$ and $T$ have different forms and distribution properties under the alternative hypothesis. Specifically, an alternative expression for the centered design matrix $\mathbf{X}_C$ is $\mathbf{X}_C = [\mathbf{x}_C, \mathbf{z}_C, \mathbf{w}_C]$ where $\mathbf{x}_C$, $\mathbf{z}_C$, and $\mathbf{w}_C$ are the three $N \times 1$ column vectors of $\mathbf{X}_C$. Then, it can be shown that $M = (\mathbf{w}_C^T\mathbf{M}_{AC}\mathbf{w}_C)^{-1}$, $\mathbf{M}_{AC} = \mathbf{I}_N - \mathbf{X}_{AC}(\mathbf{X}_{AC}^T\mathbf{X}_{AC})^{-1}\mathbf{X}_{AC}^T$, and $\mathbf{X}_{AC} = [\mathbf{x}_C, \mathbf{z}_C]$. The complex expression of $M$ generally does not have a simple analytic distribution even though the two covariate variables $X$ and $Z$ may have a bivariate normal distribution. It should be obvious that the product $XZ$ of two normally distributed variables does not have a normal distribution. Hence, it is not feasible to obtain a tractable nonnull distribution for the test statistic $T_{XZ}$ under the random or unconditional framework with a given joint distribution of $X$ and $Z$. Instead, Shieh [14] adopted a large-sample viewpoint and considered the asymptotic distribution of $M$. The resulting nonnull distribution and associated power function of the statistic $T_{XZ}$ are considerably more complicated than the explications presented later for the $T$ test of treatment by covariate interactions. Consequently, the power and sample size calculations of Shieh [14] for detecting interactions between two continuous variables in multiple regression analysis are not applicable for assessing interactions between grouping and continuous variables within the context of ANCOVA. In the following, particular attention is given to developing useful and specialized statistical techniques for power and sample size computations in assessing the difference between two regression slopes.

In general, the statistic $T$ has the following nonnull distribution for the given values of $SSX_1$ and $SSX_2$:

$$T \mid [SSX_1, SSX_2] \sim t(\nu, \Delta), \tag{10}$$

where $t(\nu, \Delta)$ is a noncentral t distribution with degrees of freedom $\nu$ and noncentrality parameter

$$\Delta = \frac{\delta}{(1/SSX_1 + 1/SSX_2)^{1/2}}, \tag{11}$$

where $\delta = \beta_{1D}/\sigma$. It follows from Johnson, Kotz, and Balakrishnan [24] that the first moment of a noncentral t distribution is $E[T] = (\nu/2)^{1/2}\Gamma\{(\nu - 1)/2\}\Delta/\Gamma\{\nu/2\}$, where $\Gamma\{\cdot\}$ is the gamma function. Hence, an unbiased estimator of the effect size $\delta$ is

$$\hat{\delta}_{UE} = (1/SSX_1 + 1/SSX_2)^{1/2}\,\frac{\Gamma\{\nu/2\}}{(\nu/2)^{1/2}\,\Gamma\{(\nu - 1)/2\}}\,T = \frac{\Gamma\{\nu/2\}}{(\nu/2)^{1/2}\,\Gamma\{(\nu - 1)/2\}} \cdot \frac{\hat{\beta}_{1D}}{\hat{\sigma}}.$$
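The correction factor in the unbiased estimator is a ratio of gamma functions that can overflow for moderate ν if evaluated directly. A small sketch, assuming the estimates of the slope difference and error standard deviation are already available (the function name is hypothetical), evaluates it on the log scale:

```python
import math


def unbiased_effect_size(b1d_hat, sigma_hat, nu):
    """Bias-corrected estimate of delta = beta_1D / sigma.

    The factor Gamma(nu/2) / {(nu/2)^(1/2) Gamma((nu-1)/2)} is computed
    with log-gamma so that large nu does not overflow.
    """
    log_factor = (
        math.lgamma(nu / 2)
        - 0.5 * math.log(nu / 2)
        - math.lgamma((nu - 1) / 2)
    )
    return math.exp(log_factor) * b1d_hat / sigma_hat
```

The factor is below one and approaches one as ν grows, so the correction matters mainly in small samples.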

To derive the nonnull distribution of $T$, an exact and sophisticated approach is to utilize the full distribution associated with $SSX_1$ and $SSX_2$. With the prescribed normal covariate assumptions, it can be readily established that $K_1 = SSX_1/\tau_1^2 \sim \chi^2(\kappa_1)$ and $K_2 = SSX_2/\tau_2^2 \sim \chi^2(\kappa_2)$ where $\kappa_1 = N_1 - 1$ and $\kappa_2 = N_2 - 1$. For ease of illustration, the two random variables $K_1$ and $K_2$ are transformed to obtain $K = K_1 + K_2 \sim \chi^2(\kappa)$ and $B = K_1/K \sim \text{Beta}\{\kappa_1/2, \kappa_2/2\}$, where $\text{Beta}\{\kappa_1/2, \kappa_2/2\}$ is a beta distribution with parameters $\kappa_1/2$ and $\kappa_2/2$, and $K$ and $B$ are independent. Under the prescribed stochastic considerations of $SSX_1$ and $SSX_2$ in terms of $K$ and $B$, the $T$ statistic has the following two-stage distribution

$$T \mid [K, B] \sim t(\nu, \Delta_{KB}), \quad K \sim \chi^2(\kappa), \quad \text{and} \quad B \sim \text{Beta}\{\kappa_1/2, \kappa_2/2\}, \tag{12}$$

where

$$\Delta_{KB} = \frac{\delta}{\{[1/(B_1\tau_1^2) + 1/(B_2\tau_2^2)]/K\}^{1/2}},$$

$B_1 = B$, and $B_2 = 1 - B$. Hence, the resulting power function for comparing nonparallel regression lines is

$$\Psi_{KB}(\beta_{1D}) = E_K E_B[P\{|t(\nu, \Delta_{KB})| > t_{\nu, \alpha/2}\}], \tag{13}$$

where the expectations $E_K[\cdot]$ and $E_B[\cdot]$ are taken with respect to the distributions of $K$ and $B$, respectively.
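The two-stage structure of Eq 12 lends itself to a direct Monte Carlo evaluation of the power in Eq 13. The sketch below is one illustrative way to do so with the Python standard library and is not the supplemental SAS or R program: it draws K and B, forms the noncentrality parameter, and then draws a noncentral t variate as (Z + Δ)/(W/ν)^{1/2}. The critical value is obtained from a Cornish-Fisher series, which is an approximation adopted here for convenience and is adequate for moderate to large ν; all function names are assumptions.

```python
import math
import random
from statistics import NormalDist


def t_critical(nu, alpha=0.05):
    """Approximate 100(1 - alpha/2)th percentile of t(nu), Cornish-Fisher series."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    return z + g1 / nu + g2 / nu**2 + g3 / nu**3


def exact_power_mc(n1, n2, delta, tau1_sq, tau2_sq, alpha=0.05, reps=40000, seed=1):
    """Monte Carlo estimate of the exact power function in Eq 13."""
    rng = random.Random(seed)
    nu = n1 + n2 - 4
    k1, k2 = n1 - 1, n2 - 1
    crit = t_critical(nu, alpha)
    rejections = 0
    for _ in range(reps):
        # Stage 1: K ~ chi-square(k1 + k2) and B ~ Beta(k1/2, k2/2), independent.
        k = rng.gammavariate((k1 + k2) / 2, 2.0)
        b = rng.betavariate(k1 / 2, k2 / 2)
        ncp = delta / math.sqrt((1 / (b * tau1_sq) + 1 / ((1 - b) * tau2_sq)) / k)
        # Stage 2: noncentral t draw given the noncentrality parameter.
        z = rng.gauss(0.0, 1.0)
        w = rng.gammavariate(nu / 2, 2.0)  # chi-square(nu)
        t = (z + ncp) / math.sqrt(w / nu)
        rejections += abs(t) > crit
    return rejections / reps
```

A call such as `exact_power_mc(67, 67, 0.5, 1.0, 1.0)` corresponds to the first exact-approach entry of Table 1, which reports an estimated power of 0.8026; the Monte Carlo estimate carries sampling error on the order of 0.002 with 40,000 replicates.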

Alternatively, a simple and naive method to obtain an unconditional distribution of $T$ is to substitute the two sums of squares $SSX_1$ and $SSX_2$ in $\Delta$ with the corresponding expected values $E[SSX_1] = \kappa_1\tau_1^2$ and $E[SSX_2] = \kappa_2\tau_2^2$. Consequently, the distribution of $T$ can be approximated by a noncentral t distribution as

$$T \mathrel{\dot\sim} t(\nu, \Delta_A), \tag{14}$$

where

$$\Delta_A = \frac{\delta}{\{[1/(b_1\tau_1^2) + 1/(b_2\tau_2^2)]/\kappa\}^{1/2}},$$

$b_1 = \kappa_1/\kappa$, $b_2 = \kappa_2/\kappa$, and $\kappa = \kappa_1 + \kappa_2$. The corresponding power function for the test for heterogeneity of regression slopes can be expressed as

$$\Psi_A(\beta_{1D}) = P\{|t(\nu, \Delta_A)| > t_{\nu, \alpha/2}\}. \tag{15}$$

On the other hand, Dupont and Plummer [13] presented a further simplified power function for the test of difference between two regression slopes:

$$\Psi_{DP}(\beta_{1D}) = P\{t(\nu) < \Delta_{DP} - t_{\nu, \alpha/2}\} + P\{t(\nu) < -\Delta_{DP} - t_{\nu, \alpha/2}\}, \tag{16}$$

where

$$\Delta_{DP} = \frac{\delta}{\{[1/(p_1\tau_1^2) + 1/(p_2\tau_2^2)]/N\}^{1/2}},$$

$p_1 = N_1/N$ and $p_2 = 1 - p_1$. Although the two noncentrality parameters $\Delta_A$ and $\Delta_{DP}$ are quite similar, especially when the sample size $N$ is large, the two approximate power functions $\Psi_A$ and $\Psi_{DP}$ have a crucial difference. Note that the power function $\Psi_A$ involves a noncentral t distribution $t(\nu, \Delta_A)$, whereas $\Psi_{DP}$ is formulated through a shifted t distribution $t(\nu) + \Delta_{DP}$. It is well known that if $Z \sim N(0, 1)$ then $X = (Z + \mu) \sim N(\mu, 1)$ where $\mu$ is a constant. However, the result does not generalize to the case of the t distribution, i.e., if $t \sim t(df)$ then $Y = (t + \mu)$ does not follow a noncentral t distribution $t(df, \mu)$ with noncentrality parameter $\mu$ and degrees of freedom $df$. A random variable $Y$ is said to have a noncentral t distribution $t(df, \mu)$ if and only if $Y = (Z + \mu)/(W/df)^{1/2}$ where $Z \sim N(0, 1)$, $W \sim \chi^2(df)$, and $Z$ and $W$ are independent.
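The distinction between a shifted t variate and a noncentral t variate can be checked numerically. The following sketch, which is an illustration added here and uses hypothetical function names, generates both quantities from the same normal and chi-square draws and compares their sample means with the first-moment formula of the noncentral t distribution quoted earlier from Johnson, Kotz, and Balakrishnan [24].

```python
import math
import random


def noncentral_t_mean(df, mu):
    """First moment of t(df, mu): (df/2)^(1/2) Gamma((df-1)/2) mu / Gamma(df/2)."""
    log_ratio = math.lgamma((df - 1) / 2) - math.lgamma(df / 2)
    return mu * math.sqrt(df / 2) * math.exp(log_ratio)


def simulate_means(df, mu, reps=100000, seed=7):
    """Sample means of a shifted t variate and a noncentral t variate."""
    rng = random.Random(seed)
    shifted_sum = 0.0
    noncentral_sum = 0.0
    for _ in range(reps):
        z = rng.gauss(0.0, 1.0)
        scale = math.sqrt(rng.gammavariate(df / 2, 2.0) / df)  # (W / df)^(1/2)
        shifted_sum += z / scale + mu       # central t plus a constant
        noncentral_sum += (z + mu) / scale  # noncentral t(df, mu)
    return shifted_sum / reps, noncentral_sum / reps
```

With small degrees of freedom, for instance `simulate_means(10, 2.0)`, the shifted variate has mean μ whereas the noncentral variate has a visibly larger mean given by `noncentral_t_mean(10, 2.0)`; the gap shrinks as the degrees of freedom increase.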

Essentially, Dupont and Plummer [13] extended the results under normal theory in Dupont and Plummer [25] to the case of noncentral t distributions in the comparison of two regression slopes. The resulting formulation lacks a proper theoretical justification. Despite the computational appeal of the approximate power function $\Psi_{DP}$, the prescribed analytic issue raises a fundamental question about its general adequacy as a reliable procedure.

It is essential to note that all the power functions $\Psi_{DP}$, $\Psi_A$, and $\Psi_{KB}$ depend on the difference between two coefficients $\{\beta_{11}, \beta_{12}\}$ and error variance $\sigma^2$ through the standardized effect $\delta$. Under the prescribed stochastic assumptions for the covariate variables, these power functions rely on the covariate variances $\{\tau_1^2, \tau_2^2\}$ through the associated noncentrality parameter, but not the mean values of covariate variables $\{\theta_1, \theta_2\}$. Moreover, the approximate formulations of $\Psi_{DP}$ and $\Psi_A$ only involve the central t and noncentral t distributions, whereas the normal covariate distributions lead to the unique and more complex conditional property of $\Psi_{KB}$ on the chi-square distribution and beta distribution. It can be shown that the noncentrality terms $\Delta_{DP}$, $\Delta_A$, and $\Delta_{KB}$ are asymptotically equivalent as sample size goes to infinity. Therefore, the three power functions $\Psi_{DP}$, $\Psi_A$, and $\Psi_{KB}$ have the same large sample properties. Despite the close resemblance between the three power formulas, the corresponding behaviors for finite samples obviously differ. Their relative performance in power calculations will be appraised in the numerical investigations.
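As a brief numerical illustration of how close the two approximate noncentrality parameters can be, consider a balanced design with $N_1 = N_2 = 65$, $\tau_1^2 = \tau_2^2 = 1$, and $\delta = 0.50$ (a configuration chosen here for illustration), so that $N = 130$, $\kappa = 128$, and $p_1 = p_2 = b_1 = b_2 = 1/2$:

$$\Delta_{DP} = \frac{0.50}{\{(2 + 2)/130\}^{1/2}} \approx 2.850, \qquad \Delta_A = \frac{0.50}{\{(2 + 2)/128\}^{1/2}} \approx 2.828.$$

The ratio is $\Delta_A/\Delta_{DP} = (128/130)^{1/2} \approx 0.992$. In any balanced design the ratio equals $\{(N - 2)/N\}^{1/2}$, which tends to one as $N$ increases, in line with the asymptotic equivalence noted above.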

For planning research design, the power formulas can be employed to determine the sample sizes $N_1$ and $N_2$ needed to attain the specified power $(1 - \beta)$ through a simple iterative search for the chosen significance level $\alpha$ and parameter settings. In practice, a research study requires adequate statistical power and sufficient sample size to detect scientifically credible effects. It is sensible that the corresponding power calculations and sample size determinations must be considered in the planning stage of a study. Consequently, it is of theoretical importance to evaluate the potential discrepancy between the three procedures in power and sample size calculations. In view of the wide variety of practical situations, the presumed normal covariate distribution merely provides a convenient and important reference case. Evidently, the degree of robustness to nonnormal covariates for the resulting power and sample size procedures is also an essential issue and requires further sensitivity assessments.
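The iterative search can be sketched as follows for the approximate power function of Eq 15. The fragment is an illustration under stated assumptions and is not the supplemental SAS or R program: the noncentral t probability is obtained by integrating the conditional normal probability over the chi-square density with Simpson's rule, the critical value comes from a Cornish-Fisher series (an approximation that is rough for very small ν), the allocation ratio $N_2/N_1$ is assumed to be an integer, and all function names are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def t_critical(nu, alpha=0.05):
    """Approximate 100(1 - alpha/2)th percentile of t(nu), Cornish-Fisher series."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    return z + g1 / nu + g2 / nu**2 + g3 / nu**3


def two_sided_power(nu, ncp, crit, steps=400):
    """P{|t(nu, ncp)| > crit} by Simpson integration over W ~ chi-square(nu)."""
    sd = math.sqrt(2 * nu)
    lo, hi = max(0.0, nu - 10 * sd), nu + 12 * sd
    h = (hi - lo) / steps
    log_norm = (nu / 2) * math.log(2) + math.lgamma(nu / 2)
    total = 0.0
    for i in range(steps + 1):
        w = lo + i * h
        if w <= 0.0:
            continue  # the chi-square density is zero at w = 0 when nu > 2
        density = math.exp((nu / 2 - 1) * math.log(w) - w / 2 - log_norm)
        s = math.sqrt(w / nu)
        reject = 1 - PHI(crit * s - ncp) + PHI(-crit * s - ncp)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * density * reject
    return total * h / 3


def approx_power(n1, n2, delta, tau1_sq, tau2_sq, alpha=0.05):
    """Approximate power of Eq 15, with SSX replaced by its expected value."""
    nu = n1 + n2 - 4
    ncp = delta / math.sqrt(1 / ((n1 - 1) * tau1_sq) + 1 / ((n2 - 1) * tau2_sq))
    return two_sided_power(nu, ncp, t_critical(nu, alpha))


def required_sizes(delta, tau1_sq, tau2_sq, ratio=1, power=0.80, alpha=0.05):
    """Smallest (N1, N2 = ratio * N1) whose approximate power reaches the target."""
    for n1 in range(3, 100000):
        n2 = ratio * n1
        if approx_power(n1, n2, delta, tau1_sq, tau2_sq, alpha) >= power:
            return n1, n2
    raise ValueError("target power not reached within the search range")
```

As a reference point, Table 1 reports the sample sizes {65, 65} with estimated power 0.8015 for the approximate method when $\delta = 0.50$ and $\{\tau_1^2, \tau_2^2\} = \{1, 1\}$, which corresponds to the call `required_sizes(0.5, 1.0, 1.0)`. The same search applies to the exact approach once the conditional power is averaged over $K$ and $B$.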

Simulation study

To justify the distinct advantage of the suggested exact approach and the potential deficiency of the approximate methods, numerical examinations of power and sample size calculations were conducted in two studies under a wide variety of model configurations. The first investigation focuses on the situations with normal covariate variables, whereas several notable scenarios of non-normal covariates are examined in the subsequent appraisal.

Study I

For the purpose of explicating the critical discrepancy between the three power functions $\Psi_{DP}$, $\Psi_A$, and $\Psi_{KB}$ in using covariate information, the two covariates $X_1$ and $X_2$ are assumed to have normal distributions with variances $\{\tau_1^2, \tau_2^2\}$ = {1, 1} and {1, 3} for balanced design with $N_1 = N_2$, and $\{\tau_1^2, \tau_2^2\}$ = {1, 1}, {1, 3}, and {3, 1} for unbalanced design with $N_2 = 3N_1$. As noted earlier, the power functions do not depend on the covariate means $\theta_1$ and $\theta_2$. Without loss of generality, they are set as $\theta_1 = \theta_2 = 0$. In addition, the selected configurations of slope coefficients and error variance are $\beta_{11}$ = 0.50 and 0.75, $\beta_{12} = 0$, and $\sigma^2 = 1$. Hence, the resulting standardized effect size has two different values $\delta$ = 0.50 and 0.75. Overall these considerations result in a total of 10 different combined arrangements. These combinations of different covariate structures, effect magnitudes, and sample size allocations were chosen to represent as much as possible the extent of characteristics that are likely to be encountered in actual applications.

With the prescribed specifications, the required sample sizes were computed for the three procedures with the chosen power value and significance level. Throughout this empirical investigation, the significance level and nominal power are fixed as $\alpha = 0.05$ and $1 - \beta = 0.80$, respectively. The computed sample sizes associated with the effect sizes $\delta$ = 0.50 and 0.75 are presented in Tables 1 and 2, respectively. For ease of illustration, the total sample sizes of the exact approach for $\delta$ = 0.50 and 0.75 are plotted in Figs 1 and 2, respectively.

The graphs show that, for fixed values of sample size ratio $r$ and covariate variance $\tau_1^2$, the total sample sizes $N$ decrease with increasing covariate variance $\tau_2^2$. It is clear that the computed sample sizes in Table 1 are larger than those in Table 2 when all other characteristics are the same. More importantly, the results show that the calculated sample sizes of the exact approach differ from those of the two approximate procedures for all ten cases. The sample sizes of the approximate methods are smaller than those of the exact approach. Also, the discrepancies are slightly larger for $\delta = 0.75$ in Table 2 than those for $\delta = 0.50$ in Table 1. In order to evaluate the accuracy of the power functions, the estimated power or computed power are also listed. Because of the underlying metric of integer sample sizes, the attained values are marginally larger than the nominal level for all three procedures.

Table 1. Computed sample size, estimated power, and simulated power when δ = 0.50, Type I error α = 0.05, and nominal power 1 – β = 0.80. (DP: Dupont and Plummer (1998); Approx.: approximate method; Exact: exact approach.)

| Covariate variance | DP sample sizes | DP estimated power | DP simulated power | DP error | Approx. sample sizes | Approx. estimated power | Approx. simulated power | Approx. error | Exact sample sizes | Exact estimated power | Exact simulated power | Exact error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| {1, 1} | {64, 64} | 0.8013 | 0.7773 | 0.0240 | {65, 65} | 0.8015 | 0.7859 | 0.0156 | {67, 67} | 0.8026 | 0.8000 | 0.0026 |
| {1, 3} | {43, 43} | 0.8011 | 0.7804 | 0.0207 | {44, 44} | 0.8015 | 0.7844 | 0.0171 | {46, 46} | 0.8037 | 0.8065 | –0.0028 |
| {1, 1} | {43, 129} | 0.8059 | 0.7894 | 0.0165 | {44, 132} | 0.8076 | 0.7977 | 0.0099 | {45, 135} | 0.8033 | 0.8014 | 0.0019 |
| {1, 3} | {36, 108} | 0.8068 | 0.7831 | 0.0237 | {37, 111} | 0.8007 | 0.7943 | 0.0134 | {38, 114} | 0.8015 | 0.8011 | 0.0004 |
| {3, 1} | {22, 66} | 0.8103 | 0.7674 | 0.0429 | {23, 69} | 0.8165 | 0.7920 | 0.0245 | {24, 72} | 0.8122 | 0.8169 | –0.0047 |

https://doi.org/10.1371/journal.pone.0177682.t001

Table 2. Computed sample size, estimated power, and simulated power when δ = 0.75, Type I error α = 0.05, and nominal power 1 – β = 0.80. (DP: Dupont and Plummer (1998); Approx.: approximate method; Exact: exact approach.)

| Covariate variance | DP sample sizes | DP estimated power | DP simulated power | DP error | Approx. sample sizes | Approx. estimated power | Approx. simulated power | Approx. error | Exact sample sizes | Exact estimated power | Exact simulated power | Exact error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| {1, 1} | {29, 29} | 0.8008 | 0.7682 | 0.0326 | {30, 30} | 0.8014 | 0.7745 | 0.0269 | {32, 32} | 0.8045 | 0.8072 | –0.0027 |
| {1, 3} | {20, 20} | 0.8068 | 0.7467 | 0.0601 | {21, 21} | 0.8080 | 0.7715 | 0.0365 | {23, 23} | 0.8135 | 0.8156 | –0.0021 |
| {1, 1} | {20, 60} | 0.8180 | 0.7687 | 0.0493 | {20, 60} | 0.8016 | 0.7719 | 0.0297 | {22, 66} | 0.8125 | 0.8145 | –0.0020 |
| {1, 3} | {17, 51} | 0.8236 | 0.7710 | 0.0526 | {17, 51} | 0.8020 | 0.7736 | 0.0284 | {19, 57} | 0.8124 | 0.8169 | –0.0045 |
| {3, 1} | {10, 30} | 0.8068 | 0.7194 | 0.0874 | {11, 33} | 0.8211 | 0.7713 | 0.0498 | {12, 36} | 0.8126 | 0.8181 | –0.0055 |

https://doi.org/10.1371/journal.pone.0177682.t002

Then, Monte Carlo simulation studies were performed to evaluate the accuracy of the sample size calculations. With the computed sample sizes, parameter configurations, and nominal power, estimates of the true power were computed via Monte Carlo simulation of 10,000 independent data sets. For each replicate, $N_1$ and $N_2$ covariate values were generated from the selected normal distributions. The resulting values of covariate variables in turn determined the mean responses for generating $N_1$ and $N_2$ normal outcomes with the designated ANCOVA designs. Next, the test statistic $T$ was computed and the simulated power was the proportion of the 10,000 replicates whose test statistics $|T|$ exceeded the corresponding critical value $t_{\nu, 0.025}$. Therefore, the adequacy of the approximate and exact sample size procedures is determined by the error (= estimated power – simulated power) between the estimated power computed from analytic formulas and the simulated power of the Monte Carlo study. The simulated power and error are also summarized in Tables 1 and 2 for all 10 design schemes.
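The simulation steps described above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the code used to produce Tables 1 and 2: the intercepts and covariate means are set to zero, which leaves the slope test unaffected, the critical value is taken from a Cornish-Fisher series rather than an exact t quantile, and the function names and seed are assumptions.

```python
import math
import random
from statistics import NormalDist


def t_critical(nu, alpha=0.05):
    """Approximate 100(1 - alpha/2)th percentile of t(nu), Cornish-Fisher series."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    return z + g1 / nu + g2 / nu**2 + g3 / nu**3


def simulate_power(n1, n2, b11, b12, sigma, tau1_sq, tau2_sq,
                   alpha=0.05, reps=10000, seed=2017):
    """Proportion of simulated data sets in which |T| exceeds the critical value."""
    rng = random.Random(seed)
    nu = n1 + n2 - 4
    crit = t_critical(nu, alpha)
    groups = [(n1, b11, math.sqrt(tau1_sq)), (n2, b12, math.sqrt(tau2_sq))]
    rejections = 0
    for _ in range(reps):
        slopes, inv_ssx, sse = [], 0.0, 0.0
        for n, beta, tau in groups:
            # Random normal covariates, then normal responses given the covariates.
            x = [rng.gauss(0.0, tau) for _ in range(n)]
            y = [beta * xi + rng.gauss(0.0, sigma) for xi in x]
            xbar, ybar = sum(x) / n, sum(y) / n
            ssx = sum((xi - xbar) ** 2 for xi in x)
            sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
            syy = sum((yi - ybar) ** 2 for yi in y)
            slopes.append(sxy / ssx)
            inv_ssx += 1 / ssx
            sse += syy - sxy**2 / ssx  # within-group residual sum of squares
        t = (slopes[0] - slopes[1]) / math.sqrt(sse / nu * inv_ssx)
        rejections += abs(t) > crit
    return rejections / reps
```

For instance, `simulate_power(67, 67, 0.5, 0.0, 1.0, 1.0, 1.0)` mirrors the first exact-approach configuration of Table 1. With 10,000 replicates, the standard error of a simulated power near 0.80 is about 0.004, which is useful to keep in mind when reading the error columns.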

It is noticeable from the results that there exists a close agreement between the estimated power and the simulated power for the proposed exact sample size procedure regardless of the model configurations. Specifically, the incurred errors of the 10 designs are all within the small range of –0.0055 to 0.0026. In contrast, the estimated powers for the two approximate methods are consistently larger than the simulated powers for all 10 settings in Tables 1 and 2. In particular, the errors associated with Dupont and Plummer's [13] procedure are {0.0240, 0.0207, 0.0165, 0.0237, 0.0429} and {0.0326, 0.0601, 0.0493, 0.0526, 0.0874} for $\delta$ = 0.50 and 0.75 in Tables 1 and 2, respectively. For the approximate method with power function $\Psi_A$, the corresponding errors of the ten cases in Tables 1 and 2 are {0.0156, 0.0171, 0.0099, 0.0134, 0.0245} and {0.0269, 0.0365, 0.0297, 0.0284, 0.0498} for $\delta$ = 0.50 and 0.75, respectively. Although some of the differences are not substantial, the results delineate a clear pattern that the accuracy of the approximate power functions deteriorates to some degree for smaller sample sizes, especially for the simple method of Dupont and Plummer [13]. Furthermore, the magnitudes of errors corresponding to the direct-pairing cases (when larger covariate variance is paired with larger sample size) are relatively smaller than those of the inverse-pairing situations (when larger covariate variance is paired with smaller sample size). Note that the resulting errors of Dupont and Plummer's [13] procedure associated with $\{\tau_1^2, \tau_2^2\}$ = {1, 3} and $\{N_1, N_2\}$ = {36, 108} and {17, 51} under direct-pairing are 0.0237 and 0.0526 in Tables 1 and 2, respectively. However, the counterparts of the inverse-pairing setting with $\{\tau_1^2, \tau_2^2\}$ = {3, 1} and $\{N_1, N_2\}$ = {22, 66} and {10, 30} are much larger, with 0.0429 and 0.0874 for $\delta$ = 0.50 and 0.75, respectively. These findings imply that the magnitude of sample sizes plays an essential role in the performance of the approximate methods. More importantly, the adequacy of the approximate power formulas and sample size procedures varies with model configurations. In contrast, the numerical performance suggests that the exact methodology performs fairly well for the range of model specifications considered here.

Fig 1. Computed sample size for effect size δ = 0.50. https://doi.org/10.1371/journal.pone.0177682.g001

Fig 2. Computed sample size for effect size δ = 0.75. https://doi.org/10.1371/journal.pone.0177682.g002

Study II

The described exact power function is obtained under the essential framework that the covariate variables have normal distributions. Instead of using the full features, the approximate power formula $\Psi_A$ only relies on the partial information of second moments or variances of the covariates. At first sight, the simplified method may be more robust than the exact approach to the violation of the normality assumption of the covariates. To further illuminate the sensitivity issues and profound implications of the two distinct techniques, power and sample size calculations were also conducted for the scenarios with non-normal covariates. Due to the inferior performance of Dupont and Plummer's [13] technique, their method is not considered in this examination.

Specifically, the two covariates are assumed to have five different sets of distributions: Beta, Exponential, Gamma, Laplace, and Uniform. For ease of comparison, the designated distributions were constructed to have variances {τ₁², τ₂²} = {1, 1} and {1, 3}. Moreover, only balanced designs were considered, and the slope coefficients and error variance were fixed as β₁₁ = 0.50, β₁₂ = 0, and σ² = 1. Hence, for a given pair of covariate variances, the required sample sizes and estimated powers associated with the exact procedure remain identical across the five different distributions. Unlike the previous study, the estimated powers and related evaluations of the approximate method were computed with the sample sizes determined by the exact approach. Table 3 summarizes the empirical results of the ten combined structures of covariate distribution and associated variance. In the case of the Beta distribution, the two pairs of Beta covariates are X₁ ~ Beta(2, 5)/c₁ and X₂ ~ Beta(2, 5)/c₁, and X₁ ~ Beta(2, 5)/c₁ and X₂ ~ Beta(2, 5)/c₂, where c₁ and c₂ are selected such that the resulting variances are 1 and 3, respectively. The parameter specifications of the other four types of distribution can be found in Table 3. Similar to the numerical assessments in Study I, Table 3 presents the computed sample sizes, estimated powers, simulated powers, and associated errors of the two competing procedures.

| Covariate distributions | Sample sizes | Approximate method: estimated power | Approximate method: simulated power | Approximate method: error | Exact approach: estimated power | Exact approach: simulated power | Exact approach: error |
|---|---|---|---|---|---|---|---|
| Beta(2, 5)* and Beta(2, 5)* | {67, 67} | 0.8135 | 0.7973 | 0.0162 | 0.8026 | 0.7973 | 0.0053 |
| Beta(2, 5)* and Beta(2, 5)** | {46, 46} | 0.8194 | 0.7960 | 0.0234 | 0.8037 | 0.7960 | 0.0077 |
| Exponential(1) and Exponential(1) | {67, 67} | 0.8135 | 0.7775 | 0.0360 | 0.8026 | 0.7775 | 0.0251 |
| Exponential(1) and Exponential(3^(1/2)) | {46, 46} | 0.8194 | 0.7697 | 0.0497 | 0.8037 | 0.7697 | 0.0340 |
| Gamma(2, 1/2^(1/2)) and Gamma(2, 1/2^(1/2)) | {67, 67} | 0.8135 | 0.7905 | 0.0230 | 0.8026 | 0.7905 | 0.0121 |
| Gamma(2, 1/2^(1/2)) and Gamma(2, (3/2)^(1/2)) | {46, 46} | 0.8194 | 0.7830 | 0.0364 | 0.8037 | 0.7830 | 0.0207 |
| Laplace(2^(1/2)) and Laplace(2^(1/2)) | {67, 67} | 0.8135 | 0.7927 | 0.0208 | 0.8026 | 0.7927 | 0.0099 |
| Laplace(2^(1/2)) and Laplace((2/3)^(1/2)) | {46, 46} | 0.8194 | 0.7814 | 0.0380 | 0.8037 | 0.7814 | 0.0223 |
| Uniform(-1/2, 1/2) and Uniform(-1/2, 1/2) | {67, 67} | 0.8135 | 0.8115 | 0.0020 | 0.8026 | 0.8115 | -0.0089 |
| Uniform(-1/2, 1/2) and Uniform(-3, 3) | {46, 46} | 0.8194 | 0.8095 | 0.0099 | 0.8037 | 0.8095 | -0.0058 |

*Beta(2, 5) is scaled to have a variance of 1.

**Beta(2, 5) is scaled to have a variance of 3.
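The scaling constants for the Beta covariates are not reported numerically, but they follow from the variance of a Beta(2, 5) variable. The following is a minimal sketch in Python, assuming the covariate is formed as X = B/c with B ~ Beta(2, 5), so that Var(X) = Var(B)/c². The function names are hypothetical and are not part of the supplemental SAS and R programs.

```python
import math


def beta_variance(a, b):
    """Variance of a Beta(a, b) random variable."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


def scaling_constant(a, b, target_variance):
    """Constant c such that Beta(a, b) / c has the requested variance.

    Uses Var(B / c) = Var(B) / c**2 (assumed form of the rescaling).
    """
    return math.sqrt(beta_variance(a, b) / target_variance)


c1 = scaling_constant(2, 5, 1.0)  # rescales Beta(2, 5) to variance 1
c2 = scaling_constant(2, 5, 3.0)  # rescales Beta(2, 5) to variance 3

if __name__ == "__main__":
    print(f"c1 = {c1:.6f}, c2 = {c2:.6f}")
```

Because the target variances differ by a factor of 3, the two constants differ by a factor of √3, whatever the Beta shape parameters.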

A detailed inspection of the findings in Table 3 reveals that the performance of both contending procedures is affected by the non-normal covariate settings, especially in the Exponential cases. However, it is important to note that the approximate technique yields larger estimated powers, and larger errors between estimated and simulated power, than the exact approach. The only exceptions occur with the Uniform covariate distribution, for which the exact procedure does not have a clear advantage over the approximate method. Conceivably, the robustness of the suggested exact technique depends on how severely the covariate distributions deviate from normality. Nonetheless, this empirical evidence shows that the exact procedure gives acceptable results even for non-normal covariates. In view of the potentially diverse treatment and covariate configurations of ANCOVA studies, the exact approach appears more consistent and accurate than the approximate method and is better suited as a general tool.
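The comparison described above can be reproduced from the tabulated values, since each error is the estimated power minus the simulated power. The sketch below transcribes the ten rows of Table 3 and counts the settings in which the approximate method has the larger absolute error. The row labels and variable names are illustrative choices, not notation from the article.

```python
# (label, approximate estimate, exact estimate, simulated power), from Table 3
TABLE3 = [
    ("Beta, variances {1, 1}", 0.8135, 0.8026, 0.7973),
    ("Beta, variances {1, 3}", 0.8194, 0.8037, 0.7960),
    ("Exponential, variances {1, 1}", 0.8135, 0.8026, 0.7775),
    ("Exponential, variances {1, 3}", 0.8194, 0.8037, 0.7697),
    ("Gamma, variances {1, 1}", 0.8135, 0.8026, 0.7905),
    ("Gamma, variances {1, 3}", 0.8194, 0.8037, 0.7830),
    ("Laplace, variances {1, 1}", 0.8135, 0.8026, 0.7927),
    ("Laplace, variances {1, 3}", 0.8194, 0.8037, 0.7814),
    ("Uniform, variances {1, 1}", 0.8135, 0.8026, 0.8115),
    ("Uniform, variances {1, 3}", 0.8194, 0.8037, 0.8095),
]


def errors(row):
    """Return (approximate error, exact error) = estimated minus simulated power."""
    _, approx, exact, simulated = row
    return approx - simulated, exact - simulated


# Settings in which the approximate method is further from the simulated power
approx_worse = [row[0] for row in TABLE3
                if abs(errors(row)[0]) > abs(errors(row)[1])]

if __name__ == "__main__":
    for row in TABLE3:
        e_approx, e_exact = errors(row)
        print(f"{row[0]:32s} approx {e_approx:+.4f}  exact {e_exact:+.4f}")
    print(f"approximate method worse in {len(approx_worse)} of {len(TABLE3)} settings")
```

The single setting in which the exact approach has the larger absolute error is the Uniform case with equal variances, which matches the exception noted in the text.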

Results

The implementation of the suggested power and sample size calculations involves specialized programs not currently available in prevailing statistical packages. To exemplify the computational aspects of the developed algorithms for design planning, the numerical demonstration of evaluating two treatments for gingivitis in Fleiss [3, Section 7.3] is reexamined here. The data consist of measurements of patients before and after treatment on a modification of the Loe and Silness [26] index of gingivitis. A higher value indicates a more severe level of gingivitis. Accordingly, the response variable of the ANCOVA is the post-treatment measurement, with the pretreatment value serving as the covariate. It should be noted that the illustration in Fleiss [3] does not address power and sample size issues. Moreover, the emphasis of this numerical demonstration is on the typical research scenario most frequently encountered in the planning stage of an ANCOVA study.

Due to the prospective nature of advance research planning, general guidelines suggest that typical sources such as published findings or expert opinion can offer plausible and reasonable planning values for the model characteristics, such as treatment effects, variance component, and covariate properties. To explicate the essential process, the data comparing two treatments of gingivitis are employed to provide planning values of the model parameters and covariate configurations for related gingivitis studies. Specifically, the summary statistics yield the designated treatment effects and variance component: β₁₁ = 0.8502, β₁₂ = 0.4008, and σ² = 0.04. In addition, the covariate variances are obtained from the reported pretreatment values as τ₁² = 0.0646 and τ₂² = 0.0526. With the sample sizes {N₁, N₂} = {74, 64} and significance level α = 0.05, the achieved power can be readily computed with the supplemental programs (Programs A and C). The result shows that the achieved power of this particular unbalanced design is CKB = 0.8650, which falls between the two fairly common levels of 0.80 and 0.90. Therefore, the power calculation suggests that the designated configurations warrant a decent chance of detecting the slope difference between the two treatment groups.
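A reported power value of this kind can be cross-checked by simulation. The sketch below is a Monte Carlo illustration in Python and is not the supplemental SAS or R program. It assumes normal covariates, separate intercepts and slopes per group with a pooled error variance on N₁ + N₂ − 4 degrees of freedom, and a two-sided t test of equal slopes. Intercepts and covariate means are set to zero because the slope test does not depend on them. The t critical value uses a Cornish-Fisher approximation to stay within the standard library, and the function names, replication count, and seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist


def t_quantile(p, df):
    """Cornish-Fisher approximation to the Student t quantile (for moderate to large df)."""
    z = NormalDist().inv_cdf(p)
    return (z + (z ** 3 + z) / (4 * df)
            + (5 * z ** 5 + 16 * z ** 3 + 3 * z) / (96 * df ** 2))


def fit_slope(x, y):
    """Least squares slope, covariate sum of squares, and residual sum of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    sse = sum((yi - my - slope * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    return slope, sxx, sse


def simulated_power(n1, n2, beta1, beta2, sigma2, tau2_1, tau2_2,
                    alpha=0.05, reps=2000, seed=1):
    """Monte Carlo rejection rate of the two-sided test of equal slopes."""
    rng = random.Random(seed)
    df = n1 + n2 - 4
    crit = t_quantile(1 - alpha / 2, df)
    sigma = math.sqrt(sigma2)
    rejections = 0
    for _ in range(reps):
        fits = []
        for n, beta, tau2 in ((n1, beta1, tau2_1), (n2, beta2, tau2_2)):
            tau = math.sqrt(tau2)
            x = [rng.gauss(0.0, tau) for _ in range(n)]          # random normal covariate
            y = [beta * xi + rng.gauss(0.0, sigma) for xi in x]  # response with group slope
            fits.append(fit_slope(x, y))
        (b1, sxx1, sse1), (b2, sxx2, sse2) = fits
        s2 = (sse1 + sse2) / df                                  # pooled error variance
        t_stat = (b1 - b2) / math.sqrt(s2 * (1 / sxx1 + 1 / sxx2))
        if abs(t_stat) > crit:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    # Planning values of the gingivitis illustration
    print(simulated_power(74, 64, 0.8502, 0.4008, 0.04, 0.0646, 0.0526))
```

With the gingivitis planning values, the simulated rejection rate should fall near the reported 0.8650, up to Monte Carlo error that shrinks as `reps` grows.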

Alternatively, under the notion of a balanced design, the presented algorithms (Programs B and D) reveal that the equal sample sizes {N₁, N₂} = {69, 69} yield a power of 0.8694. It is interesting to note that, although the two sample size schemes {74, 64} and {69, 69} have the identical total sample size of 138, the balanced design has a slight advantage over the unbalanced structure in power performance. As an illustration of sample size determination for planning a balanced study, detailed computations show that the balanced sample sizes {N₁, N₂} = {58, 58} and {77, 77} are needed to achieve the target powers of 0.80 and 0.90, respectively. As noted above, because sample sizes must be integer values in practice, the attained power is marginally greater than the nominal power level. Here, the corresponding actual powers of the two sample size designs are 0.8043 and 0.9038, respectively. These configurations are incorporated in the user specifications of the SAS/IML [15] and R [16] programs presented in the supplemental files. With the prescribed explications, users can easily identify the statements containing the exemplifying values in the computer code and then modify the program to accommodate their own model specifications.
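The slight advantage of the balanced design over the {74, 64} allocation can be anticipated from the approximate sampling variance of the slope difference. The sketch below assumes the standard fixed-covariate result that the variance of the estimated slope difference is σ²(1/SSX₁ + 1/SSX₂), and replaces each covariate sum of squares SSXᵢ by its expectation (Nᵢ − 1)τᵢ². This is a planning heuristic written for illustration, not the exact power function, and the function name is hypothetical.

```python
def approx_slope_diff_variance(n1, n2, sigma2, tau2_1, tau2_2):
    """Approximate variance of the estimated slope difference.

    Each covariate sum of squares is replaced by its expected value
    (n - 1) * tau2, which ignores the randomness of the covariates.
    """
    return sigma2 * (1.0 / ((n1 - 1) * tau2_1) + 1.0 / ((n2 - 1) * tau2_2))


# Gingivitis planning values
SIGMA2, TAU2_1, TAU2_2 = 0.04, 0.0646, 0.0526

v_unbalanced = approx_slope_diff_variance(74, 64, SIGMA2, TAU2_1, TAU2_2)
v_balanced = approx_slope_diff_variance(69, 69, SIGMA2, TAU2_1, TAU2_2)

if __name__ == "__main__":
    print(f"unbalanced {{74, 64}}: {v_unbalanced:.6f}")
    print(f"balanced   {{69, 69}}: {v_balanced:.6f}")
```

Under this heuristic the balanced design gives the smaller variance for the same total of 138 observations, which is in line with its slightly higher power.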

Conclusions and discussion

Within the context of ANCOVA, an underlying assumption is the parallelism of the regression lines associating the criterion variable with the covariate. It has been emphasized that the homogeneity of covariate regression slopes is the most important statistical assumption in ANCOVA. However, there are theoretical reasons and empirical evidence documenting nonparallel regression lines across many scientific fields. Although the test of the hypothesis of parallel regression lines is a simple and straightforward procedure, the corresponding analytic derivations and computational algorithms for power and sample size determination have not been examined in the literature. Conceivably, the corresponding power analysis and sample size determination must also be considered before the test can be adopted as a general methodology in practice. To facilitate proper use and interpretation of traditional ANCOVA and extended alternatives, this article presents both a pedagogical explication and a numerical appraisal of power and sample size procedures for the detection of heterogeneity between two covariate regression coefficients. Despite its simplicity, this scenario embodies all the essential notions and critical features of ANCOVA, which can be useful in undertaking similar considerations for the more involved multi-group situations.

The existing method of Dupont and Plummer [13] seems to provide a simple solution and maintains reasonable accuracy for some model configurations. However, no research to date has properly examined its properties both analytically and empirically. The presented analytic explication and empirical results show that the approximate formula of Dupont and Plummer [13] is not guaranteed to give accurate power and sample size calculations. The proposed exact approach has the distinct feature of accommodating the full distributional properties of normal covariates, whereas the simplified approximate methods only utilize the partial information of covariate variances. It is important to note that although Glueck and Muller [27] and Shieh [28] considered the problem of adjusting power for random covariates in multivariate linear models, their model formulations do not cover the interaction effects between treatment groups and continuous covariates. Hence, the corresponding power and sample size procedures do not apply to the detection of slope heterogeneity considered here. Moreover, due to the complexity of multivariate settings, only moments of the covariate variables are employed in the power formulas presented in Glueck and Muller [27] and Shieh [28]. Consequently, their methods do not take into account the full distributional features of covariate variables. In view of the overall accuracy and robustness, the exact approach is recommended over the approximate methods as a reliable tool in practical applications. The supporting SAS/IML [15] and R [16] computer algorithms will yield accurate power calculations and sample size determinations provided that all the required information is properly specified.

Supporting information

S1 File. SAS programs.

S2 File. R programs. (DOCX)

Author Contributions

Conceptualization: GS. Data curation: GS. Formal analysis: GS. Funding acquisition: GS. Investigation: GS. Methodology: GS. Project administration: GS. Resources: GS. Software: GS. Supervision: GS. Validation: GS. Visualization: GS.

Writing – original draft: GS. Writing – review & editing: GS.

References

1. Glass G. V., Peckham P. D., & Sanders J. R. (1972). Consequences of failure to meet assumptions underlying the analysis of variance and covariance. Review of Educational Research, 42, 237–288.

2. Harwell M. (2003). Summarizing Monte Carlo results in methodological research: The single-factor, fixed-effects ANCOVA case. Journal of Educational and Behavioral Statistics, 28, 45–70.

3. Fleiss J. L. (2011). Design and analysis of clinical experiments (Vol. 73). New York, NY: Wiley.

4. Huitema B. (2011). The analysis of covariance and alternatives: Statistical methods for experiments, quasi-experiments, and single-case studies (Vol. 608). New York, NY: Wiley.

5. Maxwell S. E., & Delaney H. D. (2004). Designing experiments and analyzing data: A model comparison perspective (2nd ed.). Mahwah, NJ: Erlbaum.

6. Hauck W. W., Anderson S., & Marcus S. M. (1998). Should we adjust for covariates in nonlinear regression analyses of randomized trials? Controlled Clinical Trials, 19, 249–256.

7. Hernandez A. V., Steyerberg E. W., & Habbema J. D. F. (2004). Covariate adjustment in randomized controlled trials with dichotomous outcomes increases statistical power and reduces sample size requirements. Journal of Clinical Epidemiology, 57, 454–460. https://doi.org/10.1016/j.jclinepi.2003.09.014 PMID: 15196615

8. Pocock S. J., Assmann S. E., Enos L. E., & Kasten L. E. (2002). Subgroup analysis, covariate adjustment and baseline comparison in clinical trial reporting: Current practice and problems. Statistics in Medicine, 21, 2917–2930. https://doi.org/10.1002/sim.1296 PMID: 12325108

9. Raab G. M., Day S., & Sales J. (2000). How to select covariates to include in the analysis of a clinical trial. Controlled Clinical Trials, 21, 330–342. PMID: 10913808

10. Hauck W. W., Neuhaus J. M., Kalbfleisch J. D., & Anderson S. (1991). A consequence of omitted covariates when estimating odds ratios. Journal of Clinical Epidemiology, 44, 77–81. PMID: 1986061

11. Gail M. H., Wieand S., & Piantadosi S. (1984). Biased estimates of treatment effect in randomized experiments with non-linear regression and omitted covariates. Biometrika, 71, 431–444.

12. Negassa A., & Hanley J. A. (2007). The effect of omitted covariates on confidence interval and study power in binary outcome analysis: A simulation study. Contemporary Clinical Trials, 28, 242–248. https://doi.org/10.1016/j.cct.2006.08.007 PMID: 17011835

13. Dupont W. D., & Plummer W. D. (1998). Power and sample size calculations for studies involving linear regression. Controlled Clinical Trials, 19, 589–601. PMID: 9875838

14. Shieh G. (2009). Detecting interaction effects in moderated multiple regression with continuous variables: Power and sample size considerations. Organizational Research Methods, 12, 510–528.

15. SAS Institute. SAS/IML User’s Guide, Version 9.3. Cary, NC: SAS Institute Inc; 2014.

16. R Development Core Team. R: A language and environment for statistical computing [Computer software and manual]; 2016. Retrieved from http://www.r-project.org.

17. Rogosa D. (1980). Comparing nonparallel regression lines. Psychological Bulletin, 88, 307–321.

18. Rutherford A. (1992). Alternatives to traditional analysis of covariance. British Journal of Mathematical and Statistical Psychology, 45, 197–223.

19. Sampson A. R. (1974). A tale of two regressions. Journal of the American Statistical Association, 69, 682–689.

20. Cramer E. M., & Appelbaum M. I. (1978). The validity of polynomial regression in the random regression model. Review of Educational Research, 48, 511–515.

21. Raudenbush S. W. (1997). Statistical analysis and optimal design for cluster randomized trials. Psychological Methods, 2, 173–185.

22. McClelland G. H., & Judd C. M. (1993). Statistical difficulties of detecting interactions and moderator effects. Psychological Bulletin, 114, 376–390. PMID: 8416037

23. O’Connor B. P. (2006). Programs for problems created by continuous variable distributions in moderated multiple regression. Organizational Research Methods, 9, 554–567.

24. Johnson N. L., Kotz S., & Balakrishnan N. (1995). Continuous univariate distributions (2nd ed., Vol. 2). New York, NY: Wiley.

25. Dupont W. D., & Plummer W. D. (1990). Power and sample size calculations: A review and computer program. Controlled Clinical Trials, 11, 116–128. PMID: 2161310

26. Loe H., & Silness J. (1963). Periodontal disease in pregnancy. Acta Odontologica Scandinavica, 21, 533–551. PMID: 14121956

27. Glueck D. H., & Muller K. E. (2003). Adjusting power for a baseline covariate in linear models. Statistics in Medicine, 22, 2535–2551. https://doi.org/10.1002/sim.1341 PMID: 12898543

28. Shieh G. (2005). Power and sample size calculations for multivariate linear models with random explanatory variables. Psychometrika, 70, 347–358.