Power and sample size calculations for multivariate linear models with random explanatory variables

DOI:10.1007/s11336-003-1094-0

Gwowen Shieh

National Chiao Tung University

This article considers the problem of power and sample size calculations for normal outcomes within the framework of multivariate linear models. The emphasis is placed on the practical situation in which not only are the values of the response variables for each subject available only after the observations are made, but the levels of the explanatory variables also cannot be predetermined before data collection. Using analytic justification, it is shown that the proposed methods extend the existing approaches to accommodate the extra variability and arbitrary configurations of the explanatory variables. The major modification involves the noncentrality parameters associated with the F approximations to the transformations of the Wilks likelihood ratio, Pillai trace and Hotelling-Lawley trace statistics. A treatment of multivariate analysis of covariance models is employed to demonstrate the distinct features of the proposed extension. Monte Carlo simulation studies are conducted to assess the accuracy using a child’s intellectual development model. The results update and expand upon current work in the literature.

Key words: MANCOVA, MANOVA, multivariate regression, noncentral F distribution.

1. Introduction

This paper studies the multivariate linear model, which provides a basic and convenient framework for repeated measures and longitudinal research. The general setup of the multivariate linear model encompasses many common statistical models as special cases. When all explanatory variables are quantitative or continuous covariates, the models are called multivariate regression models. When the design matrix contains only indicator variables taking values of zero or one, the models are called multivariate analysis of variance (MANOVA) models; when the design matrix contains both continuous covariates and indicator variables, the class of models is called multivariate analysis of covariance (MANCOVA) models. Methods for analyzing data from repeated measures designs have recently received considerable attention in the literature; see Keselman, Algina, and Kowalchuk (2001) for a comprehensive review and their references for related discussions. Traditionally, the values of the explanatory variables in these models are treated as fixed and known, and the only variability in the model pertains to the response variables. These multivariate studies are referred to as fixed (conditional) models. The results are specific to the particular values of the explanatory variables that are observed or preset by the researcher. However, it is quite common in many multivariate linear model applications that the levels of the explanatory variables for each subject cannot be controlled and are available only after the observations are made, such as the quantitative or continuous covariates in the multivariate regression and multivariate analysis of covariance models. These models are usually referred to as random (unconditional) models. Sampson (1974) studied the problem with normal explanatory variables for both the univariate multiple regression model and the multivariate regression model. Also, Mendoza and Stafford (2001) emphasized the important differences between the random and fixed models when performing confidence interval computation, power and sample size calculations for the squared multiple correlation coefficient.

The author wishes to thank the associate editor and the referees for comments which improved the paper considerably. This research was partially supported by a grant from the Natural Science Council of Taiwan.

Requests for reprints should be sent to: Gwowen Shieh, Department of Management Science, National Chiao Tung University, Hsinchu, Taiwan 30050, R.O.C., E-mail: [email protected]

To calculate power and sample size in the context of multivariate linear models, Muller and Peterson (1984) and O’Brien and Shieh (1992) suggested several promising approaches using noncentral F approximations to the functions of multivariate test statistics. The methods of Muller and Peterson (1984) seem to have been widely accepted; see Keselman (1998), Muller, LaVange, Ramey, and Ramey (1992), O’Brien and Muller (1993), Rencher (1998, Section 4.4) and Timm (2002, Section 4.16). In contrast, the work of O’Brien and Shieh (1992) is less well known. Nevertheless, it was pointed out in O’Brien and Muller (1993) that the methods of Muller and Peterson (1984) are not invariant to sample size and thus the noncentrality cannot be defined. In general, O’Brien and Shieh’s (1992) approaches appear to be more accurate than the competing methods of Muller and Peterson (1984) according to their numerical examination. Thus, the formulas of O’Brien and Shieh (1992) are of great potential use and should be properly recognized. However, the existing approaches of Muller and Peterson (1984) and O’Brien and Shieh (1992) do not take into account the variability of the explanatory variables. Therefore, these methods are only applicable to fixed models. As pointed out in Sampson (1974), the random explanatory variables do not affect the analysis if the analysis is performed conditionally. However, the resulting power functions for random models are fundamentally different from, and usually more complicated than, those of fixed models. Furthermore, Timm (2002, Section 4.17) cautioned about the unknown effect of treating random explanatory variables as fixed in power and sample size calculations. Under these circumstances, it is questionable to apply the existing approaches to multivariate linear models with random explanatory variables without properly recognizing and accounting for the extra variability of the explanatory variables.

Glueck and Muller (2003) recently considered the problem of adjusting power for a single baseline covariate with Gaussian distribution in multivariate linear models. They proposed both univariate and multivariate methods based on small sample F and large sample chi-square approximations of the Hotelling-Lawley trace statistic. Although their small sample approximations give accurate results, they are computationally intensive. In contrast, their multivariate approaches appear to be less accurate. Moreover, a single normal covariate in the model would be inadequate if several key random covariates affect the response variable in important and distinctive ways. A natural generalization to incorporate multiple random covariates and non-normal distributions is essential for their approach to be used for power and sample size calculations in practice. Accordingly, Shieh (2003) suggested several extensions to accommodate arbitrary configurations of the explanatory variables. The noncentral chi-square approximations proposed in Shieh (2003) are based on the Pillai trace and Hotelling-Lawley trace statistics, respectively. In fact, it can be shown that the multivariate method in Glueck and Muller (2003) is a special case of the noncentral chi-square approximation of the Hotelling-Lawley trace statistic in Shieh (2003). More importantly, it is found in Shieh (2003) that the noncentral chi-square approximations have a potential problem associated with the control of the Type I error rate or the sensitivity to unbalanced designs for the most common research paradigm of multivariate general linear models. Furthermore, they are outperformed by the F transforms of the Hotelling-Lawley trace statistic. Although the focus of Shieh (2003) is on the differences between generalized estimating equations and likelihood-based approaches, the F transformations of the Hotelling-Lawley trace statistic have a clear advantage over their chi-square approximation counterparts. As mentioned above, the available methods are restricted to the F transformations of the Hotelling-Lawley trace statistic. Consequently, extensions of similar methodology have not been addressed for the other two commonly used test statistics, the Wilks likelihood ratio and the Pillai trace. This is partially due to the fact that the analytical properties associated with F approximations of the Hotelling-Lawley trace statistic are comparatively easier to justify than those of the Wilks likelihood ratio and Pillai trace criteria.

In order to improve the practical usefulness and to overcome the analytical difficulties of power and sample size calculation procedures, this research provides systematic solutions to power analyses and sample size determinations for multivariate linear models with random explanatory variables. The ideas of O’Brien and Shieh (1992) and Shieh (2003) are synthesized to present natural modifications of the existing approaches so that the random setting of the explanatory variables is embedded in the procedures. The proposed methodology extends and combines various considerations into one unified framework for the prominent F-type transformations of the Wilks likelihood ratio, Pillai trace and Hotelling-Lawley trace statistics. To demonstrate the versatility of the proposed approaches, detailed discussion is provided for a class of MANCOVA models. Consequently, the aforementioned multivariate linear setting presented in Glueck and Muller (2003) can be viewed as a special case. The distinct features of the proposed approach are the accommodation of an arbitrary number of fixed, random and mixed components of explanatory variables, the flexibility of the joint distributions of explanatory variables, and the coverage of the major multivariate test statistics within the framework of multivariate linear models.

In the next section, the usual multivariate linear model with fixed explanatory variables and the F-based approaches to power and sample size calculations of O’Brien and Shieh (1992) are described. Section 3 provides the analytical justification and important details of the proposed extension and illustrates the results within the framework of MANCOVA models with both fixed and random explanatory variables. In Section 4, a Monte Carlo simulation study is conducted to assess the finite sample adequacy of the proposed methods using a child’s intellectual development model. Finally, Section 5 contains some final remarks.

2. The F-based Approaches for Fixed Models

Consider the standard multivariate linear model with all the levels of explanatory variables fixed a priori

$$Y = XB + \varepsilon, \tag{1}$$

where $Y = (Y_1, \ldots, Y_N)^T$ is an $N \times p$ matrix with $Y_i$ as the $p \times 1$ vector of the observed sequence of measurements for the $i$th subject; $X = (x_1, \ldots, x_N)^T$ is an $N \times r$ design matrix with full column rank $r < N$, where $x_i$ is the $r \times 1$ vector of explanatory variables associated with the $i$th subject; $B$ is the $r \times p$ matrix of unknown regression coefficients; and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_N)^T$ is an $N \times p$ matrix with $\varepsilon_i$ as the $p \times 1$ vector of random errors associated with the $i$th subject, for $i = 1, \ldots, N$. The errors $\varepsilon_i$ are assumed to have independent and identical normal distributions $N_p(0, \Sigma)$, where $\Sigma$ is a $p \times p$ positive-definite covariance matrix. The general linear hypothesis $H_0\colon CBA = \Theta_0$ is considered here, where $C$ is the $c \times r$ matrix of between-subject contrasts with full row rank $c \le r$, and $A$ is the $p \times a$ matrix of within-subject contrasts with full column rank $a \le p$. The maximum likelihood estimators for $B$ and $\Sigma$ are $\hat{B} = (X^T X)^{-1} X^T Y$ and $\hat{\Sigma} = (Y - X\hat{B})^T (Y - X\hat{B})/N$, respectively. The common statistics for $H_0\colon CBA = \Theta_0$ are obtained from the eigenvalues of $E^{-1}H$, where

$$E = (YA - X\hat{B}A)^T (YA - X\hat{B}A) \quad \text{and} \quad H = (C\hat{B}A - \Theta_0)^T [C(X^T X)^{-1} C^T]^{-1} (C\hat{B}A - \Theta_0).$$

With standard results, it can be easily established that $E$ and $H$ have independent Wishart distributions:

$$E \sim W_a(N - r, A^T \Sigma A) \quad \text{and} \quad H \sim W_a(c, A^T \Sigma A, \Delta),$$

where $N - r \ge a$, $c \ge a$ and $\Delta = (A^T \Sigma A)^{-1} (CBA - \Theta_0)^T [C(X^T X)^{-1} C^T]^{-1} (CBA - \Theta_0)$ is the noncentrality parameter matrix. More specifically, the major test statistics related to the eigenvalues of $E^{-1}H$ for testing the general linear hypothesis are the Wilks likelihood ratio test statistic $\Lambda = |E(E + H)^{-1}|$, the Pillai trace $V = \mathrm{tr}[H(E + H)^{-1}]$ and the Hotelling-Lawley trace $T = \mathrm{tr}(E^{-1}H)$, where $\mathrm{tr}(\cdot)$ is the trace of a matrix, and Roy’s (1953) largest eigenvalue criterion $\theta = \lambda_1/(1 + \lambda_1)$, where $\lambda_1$ is the largest of the nonzero eigenvalues of $E^{-1}H$. It is in general difficult to compute the noncentral distributions of the test statistics since most results involve asymptotic expansions and zonal polynomials; see Muirhead (1982, Section 10.4). Therefore, more tractable F-transformations have been proposed for practical use. For Wilks’ $\Lambda = |E(E + H)^{-1}|$, Rao (1951) showed that

$$F_\Lambda = \frac{1 - \Lambda^{1/t}}{\Lambda^{1/t}} \cdot \frac{\mathrm{df2}_\Lambda}{ca} \tag{2}$$

is approximately distributed as an F random variable with $ca$ and $\mathrm{df2}_\Lambda$ degrees of freedom under the null hypothesis, where $t = 1$ for $ca \le 3$ and $t = \{(c^2 a^2 - 4)/(c^2 + a^2 - 5)\}^{1/2}$ for $ca \ge 4$, and $\mathrm{df2}_\Lambda = t\{N - r - (a - c + 1)/2\} - (ca - 2)/2$. Regarding the Pillai trace $V = \mathrm{tr}[H(E + H)^{-1}]$, an approximate F-statistic of Pillai (1956) is given by

$$F_V = \frac{V}{s - V} \cdot \frac{\mathrm{df2}_V}{ca}. \tag{3}$$

Under the null hypothesis, $F_V$ has an approximate F distribution with $ca$ and $\mathrm{df2}_V$ degrees of freedom, where $s = \min(c, a)$ and $\mathrm{df2}_V = s(N - r + s - a)$. Two common transformations of the Hotelling-Lawley trace $T = \mathrm{tr}(E^{-1}H)$ are considered here. Pillai and Samson (1959) proposed

$$F_{T1} = T \cdot \frac{\mathrm{df2}_{T1}}{s \cdot ca}, \tag{4}$$

where $\mathrm{df2}_{T1} = s(N - r - a - 1) + 2$. Moreover, McKeon (1974) presented

$$F_{T2} = T \cdot \frac{\mathrm{df2}_{T2}}{h \cdot ca}, \tag{5}$$

where $\mathrm{df2}_{T2} = (ca + 2)g + 4$, $g = \{(N - r)^2 - (N - r)(2a + 3) + a(a + 3)\}/\{(N - r)(c + a + 1) - (c + 2a + a^2 - 1)\}$, and $h = (\mathrm{df2}_{T2} - 2)/(N - r - a - 1)$. Under the null hypothesis, both $F_{T1}$ and $F_{T2}$ are compared to an F distribution with numerator degrees of freedom $ca$, and denominator degrees of freedom $\mathrm{df2}_{T1}$ and $\mathrm{df2}_{T2}$, respectively. Since no accurate approximation using a single F transformation is available for Roy’s largest eigenvalue criterion, it will not be considered here. In summary, the F-type tests defined in (2)–(5) are carried out by rejecting $H_0$ if the F value is greater than $F_{ca,\,\mathrm{df2},\,\alpha}$, where $F_{ca,\,\mathrm{df2},\,\alpha}$ is the upper $\alpha$ percentage point of the central F distribution $F(ca, \mathrm{df2})$ and $\mathrm{df2}$ represents the corresponding denominator degrees of freedom ($\mathrm{df2}_\Lambda$, $\mathrm{df2}_V$, $\mathrm{df2}_{T1}$, $\mathrm{df2}_{T2}$).
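The transformations (2)–(5) can be sketched in a few lines of Python. This is only an illustrative sketch, not the SAS/IML program mentioned later in the article: the function name `f_transforms`, its argument layout and the dictionary keys are assumptions made here, and NumPy is assumed to be available.

```python
import numpy as np

def f_transforms(E, H, N, r, c):
    """Return {name: (F value, numerator df, denominator df)} for (2)-(5).

    E, H : a x a error and hypothesis matrices; N : sample size;
    r : number of columns of X; c : number of rows of C.
    """
    E = np.asarray(E, dtype=float)
    H = np.asarray(H, dtype=float)
    a = E.shape[0]
    ca, s = c * a, min(c, a)
    EH_inv = np.linalg.inv(E + H)
    lam = np.linalg.det(E @ EH_inv)        # Wilks likelihood ratio
    V = np.trace(H @ EH_inv)               # Pillai trace
    T = np.trace(np.linalg.solve(E, H))    # Hotelling-Lawley trace

    # Equation (2): Rao (1951)
    t = 1.0 if ca <= 3 else np.sqrt((ca**2 - 4) / (c**2 + a**2 - 5))
    df_lam = t * (N - r - (a - c + 1) / 2) - (ca - 2) / 2
    F_lam = (1 - lam ** (1 / t)) / lam ** (1 / t) * df_lam / ca

    # Equation (3): Pillai (1956)
    df_V = s * (N - r + s - a)
    F_V = V / (s - V) * df_V / ca

    # Equation (4): Pillai and Samson (1959)
    df_T1 = s * (N - r - a - 1) + 2
    F_T1 = T * df_T1 / (s * ca)

    # Equation (5): McKeon (1974)
    ne = N - r
    g = (ne**2 - ne * (2 * a + 3) + a * (a + 3)) / (
        ne * (c + a + 1) - (c + 2 * a + a**2 - 1))
    df_T2 = (ca + 2) * g + 4
    h = (df_T2 - 2) / (ne - a - 1)
    F_T2 = T * df_T2 / (h * ca)

    return {"wilks": (F_lam, ca, df_lam), "pillai": (F_V, ca, df_V),
            "hl_ps": (F_T1, ca, df_T1), "hl_mckeon": (F_T2, ca, df_T2)}

# Toy inputs (made up for illustration): a = 2, c = 3, N = 110, r = 4
res = f_transforms(100 * np.eye(2), 10 * np.eye(2), N=110, r=4, c=3)
```

Each returned F value would then be compared with the upper α point of the central F distribution with the listed degrees of freedom.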

Under the alternative hypothesis, it is more difficult to determine the distributions of the test statistics $\Lambda$, $V$ and $T$ due to their complexity. Among various approximations to the noncentral distribution of the test statistics, Muller and Peterson (1984) proposed noncentral F approximations to the F-type statistics (2)–(4). Along the same line of power and sample size calculations within the framework of fixed multivariate linear models, O’Brien and Shieh (1992) considered slightly simplified approximations to the four F-type statistics (2)–(5). Suppose that $X = (x_1, \ldots, x_N)^T$ has $m$ ($< N$) distinct components $x_{uj}$ with corresponding proportions $\pi_j$, $j = 1, \ldots, m$, where $\sum_{j=1}^{m} \pi_j = 1$. Then $X^T X$ can be expressed as $X^T X = NK$, where $K = \sum_{j=1}^{m} \pi_j x_{uj} x_{uj}^T$. Hence,

$$\Delta = N (A^T \Sigma A)^{-1} D,$$

where $D = (CBA - \Theta_0)^T (C K^{-1} C^T)^{-1} (CBA - \Theta_0)$. It follows that the distribution of $H$ can be expressed as $H \sim W_a(c, A^T \Sigma A, N\bar{\Delta})$, where $\bar{\Delta} = (A^T \Sigma A)^{-1} D$. Essentially, O’Brien and Shieh (1992) considered the following approximations

$$F_\Lambda \mathrel{\dot\sim} F(ca, \mathrm{df2}_\Lambda, \delta_\Lambda), \quad F_V \mathrel{\dot\sim} F(ca, \mathrm{df2}_V, \delta_V), \quad F_{T1} \mathrel{\dot\sim} F(ca, \mathrm{df2}_{T1}, \delta_{T1}) \quad \text{and} \quad F_{T2} \mathrel{\dot\sim} F(ca, \mathrm{df2}_{T2}, \delta_{T2}), \tag{6}$$

where $\delta_\Lambda = N \cdot t(\Lambda_{OS}^{-1/t} - 1)$, $\delta_V = N \cdot s V_{OS}/(s - V_{OS})$, $\delta_{T1} = \delta_{T2} = N \cdot T_{OS}$, with $\Lambda_{OS} = |(I_a + \bar{\Delta})^{-1}|$, $V_{OS} = \mathrm{tr}[\bar{\Delta}(I_a + \bar{\Delta})^{-1}]$, $T_{OS} = \mathrm{tr}(\bar{\Delta})$, and $I_a$ is the $a \times a$ identity matrix.

3. The Proposed Methods

Suppose that the explanatory variables $\{X_i^* = x_i^*, i = 1, \ldots, N\}$ follow a distribution $f(X_i^*)$ with finite moments, which can be discrete, continuous or mixed, such as the interaction between a random continuous covariate and a fixed discrete predictor. The form of $f(X_i^*)$ is assumed to depend on none of the unknown parameters $B$ and $\Sigma$. Thus, the notation $x_i$ and $X$ for the observed values in the multivariate linear model (1) of the previous section is replaced with the corresponding random variables $X_i^*$ and $X^*$ here, where $X^* = (X_1^*, \ldots, X_N^*)^T$. Accordingly, $\hat{B}$, $H$ and $E$ are expressed as $\hat{B}^* = (X^{*T} X^*)^{-1} X^{*T} Y$, $H^* = (C\hat{B}^*A - \Theta_0)^T [C(X^{*T} X^*)^{-1} C^T]^{-1} (C\hat{B}^*A - \Theta_0)$ and $E^* = (YA - X^*\hat{B}^*A)^T (YA - X^*\hat{B}^*A)$, respectively. It follows from the standard asymptotic result that $X^{*T} X^*/N = \sum_{i=1}^{N} X_i^* X_i^{*T}/N$ converges in probability to $K^*$, where $K^* = E_{X^*}[X_i^* X_i^{*T}]$ and $E_{X^*}[\cdot]$ denotes the expectation taken with respect to the distribution of $X_i^*$. Under local alternatives to $H_0$ of the form $H_1\colon CBA = \Theta_0 + \Psi/N^{1/2}$ for a constant matrix $\Psi$, it follows from the application of Slutsky’s Theorem that $H^*$ converges in distribution to the Wishart distribution $W_a(c, A^T \Sigma A, \Delta^*)$, where $\Delta^* = (A^T \Sigma A)^{-1} \Psi^T (C K^{*-1} C^T)^{-1} \Psi$. For the purpose of relating asymptotic power function calculations to the fixed values of $B$ and $\Sigma$ in terms of the local alternatives to $H_0$, the following operational and asymptotically equivalent Wishart distribution for $H^*$ is considered: $H^* \mathrel{\dot\sim} W_a(c, A^T \Sigma A, N\bar{\Delta}^*)$, where $\bar{\Delta}^* = (A^T \Sigma A)^{-1} D^*$ and $D^* = (CBA - \Theta_0)^T (C K^{*-1} C^T)^{-1} (CBA - \Theta_0)$. It can be shown that the distribution of $E^*$ is the Wishart distribution $W_a(N - r, A^T \Sigma A)$ under both hypotheses for both fixed and random models. Therefore, $E^{*-1} H^* = (E^*/N)^{-1}(H^*/N)$ converges in probability to $\bar{\Delta}^*$.

Let $(F_\Lambda^*, F_V^*, F_{T1}^*, F_{T2}^*)$ represent $(F_\Lambda, F_V, F_{T1}, F_{T2})$ in which the matrices $E$ and $H$ are replaced by their counterparts $E^*$ and $H^*$, respectively. Likewise, the statistics $\Lambda$, $V$ and $T$ are written as $\Lambda^*$, $V^*$ and $T^*$, respectively. These test statistics depend on the eigenvalues of $E^{*-1} H^*$, or equivalently the roots $\varphi^*$ of $|H^* - \varphi^* E^*| = 0$. Again, under the notion of large sample approximation, the matrices $E^*$ and $H^*$ are replaced by their (approximate) expected values $E[E^*] = (N - r) A^T \Sigma A$ and $E[H^*] \doteq c A^T \Sigma A + N D^*$. This leads to a population version of $|H^* - \varphi^* E^*| = 0$, namely,

$$0 = |(c A^T \Sigma A + N D^*) - \gamma^* (N - r) A^T \Sigma A| \doteq |D^* - \gamma^* A^T \Sigma A| \quad \text{for large } N,$$

where $\gamma^*$ denotes the eigenvalues of $\bar{\Delta}^*$. Consequently, the noncentrality parameters of the statistics $(F_\Lambda^*, F_V^*, F_{T1}^*, F_{T2}^*)$ are functions of the eigenvalues $\gamma^*$ of $\bar{\Delta}^*$.

In order to provide a unified formulation of the approximate noncentrality parameter for the F-type statistic as in O’Brien and Shieh (1992), we extract the factor $N$ from the F-type statistic by evaluating the limiting value of $(ca \cdot F^*/N)$, namely,

$$f^* = \lim_{N \to \infty} \frac{ca \cdot F^*}{N}.$$

Then, the result is equated to the asymptotic form of the expected value of $(ca \cdot F^*/N)$. It can be shown that

$$E\left[\frac{ca \cdot F^*}{N}\right] = \frac{\mathrm{df2}}{\mathrm{df2} - 2} \cdot \frac{ca + \delta^*}{N} \doteq \frac{\delta^*}{N} \quad \text{for large } N,$$

where $\delta^*$ is the noncentrality parameter of the $F^*$ statistic. It follows that the noncentrality parameter $\delta^*$ can be approximated by $\delta^* = N \cdot f^*$, where $f^*$ is the effect size. Hence, the distributions of the F-type statistics $(F_\Lambda^*, F_V^*, F_{T1}^*, F_{T2}^*)$ can be approximated as follows:

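$$F_\Lambda^* \mathrel{\dot\sim} F(ca, \mathrm{df2}_\Lambda, \delta_\Lambda^*), \quad F_V^* \mathrel{\dot\sim} F(ca, \mathrm{df2}_V, \delta_V^*), \quad F_{T1}^* \mathrel{\dot\sim} F(ca, \mathrm{df2}_{T1}, \delta_{T1}^*) \quad \text{and} \quad F_{T2}^* \mathrel{\dot\sim} F(ca, \mathrm{df2}_{T2}, \delta_{T2}^*), \tag{7}$$

where the corresponding noncentrality parameters are of the form $\delta_\Lambda^* = N \cdot t(\Lambda^{*-1/t} - 1)$, $\delta_V^* = N \cdot s V^*/(s - V^*)$, $\delta_{T1}^* = \delta_{T2}^* = N \cdot T^*$, with $\Lambda^* = |(I_a + \bar{\Delta}^*)^{-1}|$, $V^* = \mathrm{tr}[\bar{\Delta}^*(I_a + \bar{\Delta}^*)^{-1}]$, and $T^* = \mathrm{tr}(\bar{\Delta}^*)$. Accordingly, the proposed approximate statistical power achieved for testing the hypothesis $H_0\colon CBA = \Theta_0$ with specified significance level $\alpha$ against the alternative $H_1\colon CBA \ne \Theta_0$ is the probability

$$P\{F(ca, \mathrm{df2}, \delta^*) > F_{ca,\,\mathrm{df2},\,\alpha}\}, \tag{8}$$

where $\mathrm{df2} = (\mathrm{df2}_\Lambda, \mathrm{df2}_V, \mathrm{df2}_{T1}, \mathrm{df2}_{T2})$ and $\delta^* = (\delta_\Lambda^*, \delta_V^*, \delta_{T1}^*, \delta_{T2}^*)$ for the approximate F statistics $(F_\Lambda^*, F_V^*, F_{T1}^*, F_{T2}^*)$.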
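The power probability in (8) can be evaluated with any routine for the noncentral F distribution. The sketch below is one possible illustration using SciPy (`scipy.stats.f` and `scipy.stats.ncf`); the helper names `wilks_df2` and `approx_power` are hypothetical and introduced only for this example. It uses the Wilks-based effect size 0.1288 from Table 1 with c = 3, a = 2 and r = 4, for which Table 1 reports a nominal power of 0.8042 at N = 110.

```python
from scipy.stats import f as f_dist, ncf

def wilks_df2(N, r, c, a):
    """Denominator degrees of freedom of Rao's F transformation, equation (2)."""
    ca = c * a
    t = 1.0 if ca <= 3 else ((ca**2 - 4) / (c**2 + a**2 - 5)) ** 0.5
    return t * (N - r - (a - c + 1) / 2) - (ca - 2) / 2

def approx_power(N, effect_size, dfn, dfd, alpha=0.05):
    """Equation (8): P{F(dfn, dfd, N * effect_size) > upper-alpha critical value}."""
    crit = f_dist.isf(alpha, dfn, dfd)      # upper alpha point of the central F
    return ncf.sf(crit, dfn, dfd, N * effect_size)

# Child development example: c = 3, a = 2, r = 4, Wilks effect size from Table 1
dfd = wilks_df2(110, r=4, c=3, a=2)
power = approx_power(110, 0.1288, dfn=6, dfd=dfd)
```

The same `approx_power` call applies to the other three statistics once their effect size and denominator degrees of freedom are substituted.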

It is important to note that the proposed approximations given in (7) resemble the noncentral F approximate counterparts of O’Brien and Shieh (1992) given in (6) for fixed models. Conceptually, the resulting formulas and computations for both situations are identical if the probabilities of a discrete distribution of $X_i^*$ are viewed as the weights for the distinct configurations of $x_i^*$. With this conceptual and straightforward modification of notation, the proposed methods apply to the usual multivariate linear models with fixed levels of explanatory variables as well. In general, numerical integration may be needed to evaluate the expectation $K^*$ in $\bar{\Delta}^*$ for continuous or complex explanatory variables. Note that the proposed approximations for $F_{T1}^*$ and $F_{T2}^*$ here are identical to those described in Shieh (2003) based on heuristic extensions. However, not only is the analytical justification presented here more in-depth, the systematic generalization also covers more F transformations ($F_\Lambda^*$ and $F_V^*$) than Shieh (2003). The approximate power functions defined in (8) can be inverted to calculate the sample size needed to test the hypothesis $H_0\colon CBA = \Theta_0$ against the alternative $H_1\colon CBA \ne \Theta_0$ in order to attain the specified power $1 - \beta$ for the chosen significance level $\alpha$, parameter values $B$ and $\Sigma$, and probability distribution $f(X_i^*)$. However, it usually involves an iterative process to find the solution because both $F(ca, \mathrm{df2}, \delta^*)$ and $F_{ca,\,\mathrm{df2},\,\alpha}$ depend on the sample size $N$.
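The iterative inversion of the power function can be illustrated with a simple upward search over N. This is a minimal sketch under stated assumptions: the function `required_n`, the starting value of the search and the two degrees-of-freedom helpers are choices made here for illustration, not the article's program, and SciPy is assumed to be available.

```python
from scipy.stats import f as f_dist, ncf

def wilks_df2(N, r, c, a):
    """Denominator df of the Wilks-based F transformation, equation (2)."""
    ca = c * a
    t = 1.0 if ca <= 3 else ((ca**2 - 4) / (c**2 + a**2 - 5)) ** 0.5
    return t * (N - r - (a - c + 1) / 2) - (ca - 2) / 2

def pillai_df2(N, r, c, a):
    """Denominator df of the Pillai-based F transformation, equation (3)."""
    s = min(c, a)
    return s * (N - r + s - a)

def required_n(effect_size, r, c, a, df2_fn, target=0.80, alpha=0.05,
               n_max=100_000):
    """Smallest N whose approximate power (8) reaches the target."""
    ca = c * a
    for N in range(r + a + 2, n_max + 1):   # start where the df are positive
        dfd = df2_fn(N, r, c, a)
        crit = f_dist.isf(alpha, ca, dfd)   # critical value changes with N
        if ncf.sf(crit, ca, dfd, N * effect_size) >= target:
            return N
    raise ValueError("target power not reached for N <= n_max")

# Effect sizes taken from Table 1 (standard normal MSIQ), rounded to 4 decimals
n_wilks = required_n(0.1288, r=4, c=3, a=2, df2_fn=wilks_df2)
n_pillai = required_n(0.1248, r=4, c=3, a=2, df2_fn=pillai_df2)
```

Because the effect sizes are entered in rounded form, the resulting N may differ by a unit or so from the values listed in Table 1 (110 and 113 for power 0.80).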

To demonstrate the general setup of the proposed approaches, the class of MANCOVA models is considered. Let the design matrix $X^*$ in the multivariate linear model be of the form $X_{FG} = [F \;\, G]$, where $F$ represents the $N \times r_F$ matrix of fixed explanatory variables and $G$ denotes the $N \times r_G$ matrix of random explanatory variables with $r = r_F + r_G$. As illustrated in Section 2 for the fixed setting, suppose that $F = (f_1, \ldots, f_N)^T$ has a finite number $m$ of distinct values $f_{uj}$ with corresponding proportions $\pi_j$, $j = 1, \ldots, m$, where $\sum_{j=1}^{m} \pi_j = 1$. Moreover, the random component $G = (G_1, \ldots, G_N)^T$ is composed of random vectors $G_i$, where $G_i$ follows a distribution $f(G_i)$ with finite moments, $i = 1, \ldots, N$. Specifically, assume that the mean vector is $E_{G_i}[G_i] = \mu_G$ and the covariance matrix is $\mathrm{Var}[G_i] = \Sigma_G$. It follows that $X_{FG}^T X_{FG}/N$ is adequately approximated by $K_{FG}^*$ under large sample consideration, where

$$K_{FG}^* = \begin{bmatrix} K_{11}^* & K_{12}^* \\ K_{12}^{*T} & K_{22}^* \end{bmatrix}, \quad K_{11}^* = \sum_{j=1}^{m} \pi_j f_{uj} f_{uj}^T, \quad K_{12}^* = \sum_{j=1}^{m} \pi_j f_{uj} \mu_G^T \quad \text{and} \quad K_{22}^* = \mu_G \mu_G^T + \Sigma_G.$$

As a result, with $X^* = X_{FG}$ and $K^* = K_{FG}^*$, the proposed general formulas for the four statistics given in (7) can immediately be applied to performing power and sample size calculations in the MANCOVA models. It is noteworthy that the proposed approaches do not require a full specification of the distribution form for the explanatory variables; only the mean vector and covariance matrix of the explanatory variables are needed. If the investigator is unable or unwilling to specify a distribution for $G$, the observed $n \times r_G$ matrix $G_{obs}$ from a pilot study can be employed as an empirical approximation to the underlying distribution $f(G_i)$. Specifically, one can proceed to approximate the distribution of $G_i$ with $p(G_i = G_{uj}) = 1/n$, where $G_{uj}$, $j = 1, \ldots, n$, are the $n$ distinct columns of $G_{obs}^T$. Hence, the proposed approach is still applicable in such situations.
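As a small illustration of how the block matrix above can be assembled from the design proportions and the first two moments of the random covariates, consider the following NumPy sketch. The function name `k_fg` and the two-group cell-means coding in the usage lines are hypothetical choices made only for this example.

```python
import numpy as np

def k_fg(f_levels, props, mu_g, sigma_g):
    """Assemble K*_FG from fixed-design levels and covariate moments.

    f_levels : m x rF array whose rows are the distinct fixed vectors f_uj
    props    : length-m proportions pi_j (summing to one)
    mu_g     : length-rG mean vector of the random covariates
    sigma_g  : rG x rG covariance matrix of the random covariates
    """
    f_levels = np.atleast_2d(np.asarray(f_levels, dtype=float))
    props = np.asarray(props, dtype=float)
    mu_g = np.asarray(mu_g, dtype=float).reshape(-1)
    sigma_g = np.atleast_2d(np.asarray(sigma_g, dtype=float))

    k11 = (f_levels * props[:, None]).T @ f_levels   # sum_j pi_j f_uj f_uj^T
    k12 = np.outer(props @ f_levels, mu_g)           # sum_j pi_j f_uj mu_G^T
    k22 = np.outer(mu_g, mu_g) + sigma_g             # mu_G mu_G^T + Sigma_G
    return np.block([[k11, k12], [k12.T, k22]])

# Hypothetical design: two equally sized groups (cell-means coding)
# and one random covariate with mean 2 and variance 4.
K = k_fg([[1, 0], [0, 1]], [0.5, 0.5], mu_g=[2.0], sigma_g=[[4.0]])
```

With a pilot sample, the same function could be fed the sample mean and the (divisor n) sample covariance of the observed covariates, which corresponds to the empirical approximation described above.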

For the special case of a single random covariate ($r_G = 1$) with mean 0 and variance $\sigma_G^2$, it is obvious that $K_{12}^*$ and $K_{22}^*$ reduce to the $r_F \times 1$ null vector $0$ and $\sigma_G^2$, respectively. Hence, $K_{FG}^*$ is simplified to $K_{FG1}^*$, where

$$K_{FG1}^* = \begin{bmatrix} K_{11}^* & 0 \\ 0^T & \sigma_G^2 \end{bmatrix}.$$

Essentially, the proposed formulas can be readily established under this particular specification. For this particular model, Glueck and Muller (2003) suggested both small sample F and large sample $\chi^2$ approximations to estimate the power of the Hotelling-Lawley trace statistic. Although their small sample approximations are very accurate, they require the integration of conditional power with the density function of the noncentrality. The computation appears to be considerably more complicated than the proposed F approximations. In addition, their large sample noncentral $\chi^2$ approximation has the same noncentrality as the $F_{T2}^*$ approximation given in (7), with $\delta_{T2}^* = N \cdot \mathrm{tr}(\bar{\Delta}_{FG1}^*)$ and $\bar{\Delta}_{FG1}^* = (A^T \Sigma A)^{-1} (CBA - \Theta_0)^T (C K_{FG1}^{*-1} C^T)^{-1} (CBA - \Theta_0)$. It is interesting to note that the illustration and argument for the derivation of the noncentrality parameter $\bar{\Delta}_{FG1}^*$ presented here are much simpler than those in Lemma 3 and Theorem 3 of Glueck and Muller (2003). More importantly, it is well known that the chi-square approximation is too liberal in controlling the Type I error under finite-sample assessment. According to the findings in Shieh (2003), the phenomenon continues to exist in the case of random explanatory variables. Hence, its practical use is problematic.

4. Simulation Study

In this section, a Monte Carlo simulation study is conducted to evaluate the finite sample properties of the proposed approaches. For illustration, the model formulation of the child’s intellectual development example considered in Muller et al. (1992, Section 3) is used as the basis for the numerical examinations. The example involves a longitudinal study of a child’s intellectual performance as a function of the mother’s estimated verbal intelligence. With child IQ measurements at 12, 24, and 36 months ($p = 3$), and with intercept, linear, quadratic, and cubic trends in the mother’s standardized IQ (MSIQ) as explanatory variables ($r = 4$), this yields

$$Y_i = \begin{bmatrix} \mathrm{IQ}_{12} \\ \mathrm{IQ}_{24} \\ \mathrm{IQ}_{36} \end{bmatrix}, \quad X_i^* = \begin{bmatrix} 1 \\ \mathrm{MSIQ} \\ \mathrm{MSIQ}^2 \\ \mathrm{MSIQ}^3 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} \beta_{I.12} & \beta_{I.24} & \beta_{I.36} \\ \beta_{L.12} & \beta_{L.24} & \beta_{L.36} \\ \beta_{Q.12} & \beta_{Q.24} & \beta_{Q.36} \\ \beta_{C.12} & \beta_{C.24} & \beta_{C.36} \end{bmatrix},$$

where $\mathrm{IQ}_t$ is the child’s IQ measurement at time $t$, $\beta_{I.t}$ is the intercept, while $\beta_{L.t}$, $\beta_{Q.t}$, and $\beta_{C.t}$ are the corresponding coefficients of the linear, quadratic, and cubic values of MSIQ for time $t = 12$, 24, and 36, respectively. In an attempt to demonstrate the usefulness of the proposed methods for studies with random explanatory variables, it is assumed that the mother’s standardized IQ has a standard normal distribution, $\mathrm{MSIQ} \sim N(0, 1)$. It follows that

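$$K^* = E_{X^*}[X_i^* X_i^{*T}] = \begin{bmatrix} 1 & m_1 & m_2 & m_3 \\ m_1 & m_2 & m_3 & m_4 \\ m_2 & m_3 & m_4 & m_5 \\ m_3 & m_4 & m_5 & m_6 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 3 \\ 1 & 0 & 3 & 0 \\ 0 & 3 & 0 & 15 \end{bmatrix},$$

where $m_i$ is the $i$th moment of a standard normal distribution. According to the description in Muller et al. (1992), the model parameter matrices are set as

$$B = \begin{bmatrix} 114.46 & 104.66 & 98.83 \\ 2.88 & 8.77 & 10.67 \\ -0.71 & -0.90 & -1.30 \\ -0.21 & -0.54 & -0.72 \end{bmatrix} \quad \text{and} \quad \Sigma = \begin{bmatrix} 218.48 & 83.66 & 72.19 \\ 83.66 & 251.92 & 158.60 \\ 72.19 & 158.60 & 244.58 \end{bmatrix}.$$

The hypothesized relationship between mother and child competence of interest corresponds to a test of the time × mother’s IQ interaction $H_0\colon CBA = 0$ with the between-subject and within-subject contrast matrices ($c = 3$ and $a = 2$)

$$C = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad A = \begin{bmatrix} -1/\sqrt{2} & 1/\sqrt{6} \\ 0 & -2/\sqrt{6} \\ 1/\sqrt{2} & 1/\sqrt{6} \end{bmatrix},$$

respectively.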

With the specifications described above, the simulation study is conducted in three steps. First, the effect sizes and estimates of sample sizes required for testing the specified hypothesis with significance level α= 0.05 and power levels (0.80, 0.90) are calculated. The resulting effect sizes and sample sizes correspond to the proposed statistics and are presented in Table 1.
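The effect sizes in the first step follow directly from the noncentrality formulas in (7) divided by N. The NumPy sketch below is an illustration under the parameter values stated above for the standard normal MSIQ case; the function name `effect_sizes` and the dictionary keys are assumptions of this example. Under these inputs the results should be close to the effect sizes reported in Table 1 for the N(0, 1) case.

```python
import numpy as np

def effect_sizes(B, Sigma, C, A, K, Theta0=None):
    """Effect sizes f* = delta*/N for the Wilks, Pillai and Hotelling-Lawley forms."""
    theta = C @ B @ A
    if Theta0 is not None:
        theta = theta - Theta0
    c, a = C.shape[0], A.shape[1]
    ca, s = c * a, min(c, a)
    t = 1.0 if ca <= 3 else np.sqrt((ca**2 - 4) / (c**2 + a**2 - 5))

    # D* = (CBA - Theta0)^T (C K*^{-1} C^T)^{-1} (CBA - Theta0)
    D = theta.T @ np.linalg.solve(C @ np.linalg.solve(K, C.T), theta)
    Delta = np.linalg.solve(A.T @ Sigma @ A, D)       # Delta-bar*
    I_a = np.eye(a)
    lam = 1.0 / np.linalg.det(I_a + Delta)            # Lambda*
    V = np.trace(Delta @ np.linalg.inv(I_a + Delta))  # V*
    T = np.trace(Delta)                               # T*
    return {"wilks": t * (lam ** (-1 / t) - 1),
            "pillai": s * V / (s - V),
            "hotelling_lawley": T}

B = np.array([[114.46, 104.66, 98.83], [2.88, 8.77, 10.67],
              [-0.71, -0.90, -1.30], [-0.21, -0.54, -0.72]])
Sigma = np.array([[218.48, 83.66, 72.19], [83.66, 251.92, 158.60],
                  [72.19, 158.60, 244.58]])
C = np.array([[0.0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
A = np.array([[-1 / np.sqrt(2), 1 / np.sqrt(6)],
              [0.0, -2 / np.sqrt(6)],
              [1 / np.sqrt(2), 1 / np.sqrt(6)]])
K = np.array([[1.0, 0, 1, 0], [0, 1, 0, 3], [1, 0, 3, 0], [0, 3, 0, 15]])

sizes = effect_sizes(B, Sigma, C, A, K)
```

Multiplying each effect size by a candidate N gives the noncentrality parameter used in the power formula (8).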

Table 1.

Calculated sample sizes and estimates of actual power at specified sample size for the child development model. In each cell with two entries, the first corresponds to nominal power 0.80 and the second to nominal power 0.90.

| | $F_\Lambda^*$ | $F_V^*$ | $F_{T1}^*$ | $F_{T2}^*$ |
|---|---|---|---|---|
| MSIQ ∼ N(0, 1) | | | | |
| Effect size | 0.1288 | 0.1248 | 0.1328 | 0.1328 |
| N for power 0.80 and 0.90 | 110, 139 | 113, 143 | 106, 135 | 108, 137 |
| Estimated α | 0.0523, 0.0488 | 0.0508, 0.0485 | 0.0530, 0.0499 | 0.0523, 0.0492 |
| Nominal power at N | 0.8042, 0.9013 | 0.7896, 0.8905 | 0.8181, 0.9111 | 0.8112, 0.9074 |
| Estimated power | 0.8024, 0.8996 | 0.7961, 0.8980 | 0.8070, 0.9017 | 0.8051, 0.9001 |
| MSIQ ∼ standardized Gamma(5, 2) | | | | |
| Effect size | 0.1216 | 0.1184 | 0.1248 | 0.1248 |
| N for power 0.80 and 0.90 | 116, 147 | 119, 151 | 113, 143 | 115, 145 |
| Estimated α | 0.0506, 0.0468 | 0.0491, 0.0455 | 0.0514, 0.0481 | 0.0506, 0.0470 |
| Estimated power | 0.7872, 0.8819 | 0.7815, 0.8790 | 0.7917, 0.8844 | 0.7889, 0.8833 |
| MSIQ ∼ standardized Gamma(10, 2) | | | | |
| Effect size | 0.1220 | 0.1186 | 0.1254 | 0.1254 |
| N for power 0.80 and 0.90 | 115, 146 | 119, 151 | 112, 143 | 114, 144 |
| Estimated α | 0.0502, 0.0467 | 0.0496, 0.0452 | 0.0510, 0.0478 | 0.0497, 0.0464 |

In the second step, the sample sizes in the following simulations are unified by choosing the sample size estimate associated with $F_\Lambda^*$ in step 1, say $N$, as the benchmark. Under the null hypothesis $H_0\colon CBA = 0$ with the given sample sizes and model configurations, the estimates of the actual Type I error rate for the nominal significance level $\alpha = 0.05$ are computed through Monte Carlo simulation using 10,000 replicate data sets. For each replicate, $N$ values of MSIQ are generated from the standard normal distribution. In turn, these values determine the mean functions for generating the $N$ children’s intellectual performance outcomes. The computations are conducted by comparing the simulated values of the test statistics with their corresponding critical values $F_{6,\,\mathrm{df2},\,0.05}$. The estimated $\alpha$ is the proportion of the 10,000 replicates whose test statistic values exceed the critical value.
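The Type I error step can be mimicked with a short simulation. The sketch below is an assumption-laden illustration rather than the article's SAS/IML program: it covers only the Wilks-based statistic, uses 2,000 rather than 10,000 replicates to keep the run short, fixes an arbitrary random seed, and enforces the null hypothesis with a hypothetical coefficient matrix `B0` that keeps the intercept row and sets the linear, quadratic and cubic rows to zero.

```python
import numpy as np
from scipy.stats import f as f_dist

def wilks_f(Y, X, C, A):
    """Rao's F transformation (2) of Wilks' statistic for H0: CBA = 0."""
    N, r = X.shape
    c, a = C.shape[0], A.shape[1]
    ca = c * a
    XtX = X.T @ X
    B_hat = np.linalg.solve(XtX, X.T @ Y)
    resid = (Y - X @ B_hat) @ A
    E = resid.T @ resid
    theta = C @ B_hat @ A
    H = theta.T @ np.linalg.solve(C @ np.linalg.solve(XtX, C.T), theta)
    lam = np.linalg.det(E) / np.linalg.det(E + H)
    t = 1.0 if ca <= 3 else np.sqrt((ca**2 - 4) / (c**2 + a**2 - 5))
    df2 = t * (N - r - (a - c + 1) / 2) - (ca - 2) / 2
    return (1 - lam ** (1 / t)) / lam ** (1 / t) * df2 / ca, ca, df2

def estimate_alpha(N, B0, Sigma, C, A, reps=2000, alpha=0.05, seed=0):
    """Proportion of null replicates whose F value exceeds the critical value."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(Sigma)
    crit, rejections = None, 0
    for _ in range(reps):
        z = rng.standard_normal(N)                    # random MSIQ values
        X = np.column_stack([np.ones(N), z, z**2, z**3])
        Y = X @ B0 + rng.standard_normal((N, Sigma.shape[0])) @ chol.T
        F, dfn, dfd = wilks_f(Y, X, C, A)
        if crit is None:                              # df do not change across replicates
            crit = f_dist.isf(alpha, dfn, dfd)
        rejections += int(F > crit)
    return rejections / reps

Sigma = np.array([[218.48, 83.66, 72.19], [83.66, 251.92, 158.60],
                  [72.19, 158.60, 244.58]])
C = np.array([[0.0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
A = np.array([[-1 / np.sqrt(2), 1 / np.sqrt(6)],
              [0.0, -2 / np.sqrt(6)],
              [1 / np.sqrt(2), 1 / np.sqrt(6)]])
B0 = np.zeros((4, 3))
B0[0] = [114.46, 104.66, 98.83]    # intercepts only, so that C B0 A = 0
alpha_hat = estimate_alpha(110, B0, Sigma, C, A)
```

Replacing `B0` by the alternative parameter matrix B turns the same loop into an estimate of actual power, as in the third step of the study.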

The third and last step studies the power approximations under the alternative hypothesis $H_1\colon CBA \ne 0$. For a fair comparison among these approaches, the nominal powers at $N$ are recalculated for all four competing methods. The SAS/IML (SAS Institute, 2003) program used to perform power calculations of the proposed methods is provided in the Appendix. As expected, the nominal powers of $F_\Lambda^*$ are almost identical to 0.80 or 0.90, while those of the other three tests ($F_V^*$, $F_{T1}^*$, $F_{T2}^*$) deviate slightly from the values of 0.80 or 0.90. As in the previous step, the estimates of true power are computed using 10,000 replicate data sets. The adequacy of the sample size formula is determined by the difference between the estimated and nominal values of power. All calculations are performed using programs written with SAS/IML (2003).

As suggested by a referee, the vector of explanatory variables composed of powers of a standardized Gamma variable for MSIQ is also considered. Specifically, $X_i^* = [1 \;\, Z \;\, Z^2 \;\, Z^3]^T$, where $Z = (X - E[X])/[\mathrm{Var}(X)]^{1/2}$ and $X$ has a Gamma($g_1$, $g_2$) distribution with shape and scale parameters $g_1$ and $g_2$, respectively. For illustrative purposes, the parameters ($g_1$, $g_2$) are set as (5, 2) and (10, 2). It can be shown that the matrices $K^*$ for the standardized Gamma(5, 2) and standardized Gamma(10, 2) distributions of MSIQ are

$$K^* = \begin{bmatrix} 1 & 0 & 1 & 0.8944 \\ 0 & 1 & 0.8944 & 4.2 \\ 1 & 0.8944 & 4.2 & 11.0909 \\ 0.8944 & 4.2 & 11.0909 & 45.8 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 & 0 & 1 & 0.6325 \\ 0 & 1 & 0.6325 & 3.6 \\ 1 & 0.6325 & 3.6 & 7.0835 \\ 0.6325 & 3.6 & 7.0835 & 29.2 \end{bmatrix},$$

respectively. Ultimately, the simulation process was repeated with these two standardized Gamma settings and the corresponding empirical results are also presented in Table 1.
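The entries of these moment matrices can be reproduced from the cumulants of the Gamma distribution. The sketch below uses two standard facts: the standardized cumulant of order n ≥ 3 of a Gamma variable with shape g1 is (n − 1)!·g1^(1 − n/2), so the scale parameter drops out after standardization, and raw moments follow from cumulants through the usual recursion. The function names are hypothetical and only the Python standard library is used.

```python
from math import comb, factorial

def standardized_gamma_moments(shape, max_order=6):
    """Moments m_0..m_max_order of Z = (X - E[X]) / sd(X), X ~ Gamma(shape, scale)."""
    # Cumulants of Z: kappa_1 = 0, kappa_2 = 1, kappa_n = (n-1)! * shape**(1 - n/2)
    kappa = [0.0, 0.0, 1.0] + [factorial(n - 1) * shape ** (1 - n / 2)
                               for n in range(3, max_order + 1)]
    m = [1.0]
    for n in range(1, max_order + 1):
        # Moment-cumulant recursion: m_n = sum_k C(n-1, k-1) * kappa_k * m_{n-k}
        m.append(sum(comb(n - 1, k - 1) * kappa[k] * m[n - k]
                     for k in range(1, n + 1)))
    return m

def moment_matrix(m, r=4):
    """K* = E[X* X*^T] for X* = (1, Z, ..., Z^(r-1))^T, i.e. entry (i, j) = m_{i+j}."""
    return [[m[i + j] for j in range(r)] for i in range(r)]

m5 = standardized_gamma_moments(5)     # shape g1 = 5
m10 = standardized_gamma_moments(10)   # shape g1 = 10
K_gamma5 = moment_matrix(m5)
K_gamma10 = moment_matrix(m10)
```

The resulting matrices can be passed to the noncentrality formulas in place of the standard normal moment matrix to obtain the Gamma-based effect sizes.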

Several observations can be made from the results summarized in Table 1. First, the computed sample sizes allow comparison of the relative efficiencies of the four approaches. It is interesting to note that the ordering of sample size is consistently $F_V^* > F_\Lambda^* > F_{T2}^* > F_{T1}^*$ in all cases considered here. However, the differences in magnitude appear to be small. Among the three MSIQ distributions, the two standardized Gamma distributions incur slightly larger sample size estimates than the standard normal situation for all four F transformations. Next, the resulting values of estimated $\alpha$ are surprisingly accurate for all four statistics throughout Table 1. However, it should be noted that the errors between nominal powers and estimated powers associated with the skewed Gamma distributions are relatively larger than those of the symmetric normal distribution. This situation is more prominent for the standardized Gamma(5, 2) distribution due to its larger moments. In general, the computed power approximations for the competing methods maintain a close agreement between the estimated power and nominal power. Specifically, the method $F_{T1}^*$ gives the largest errors among the four competing formulas. Thus, for the two methods based on F transforms of the Hotelling-Lawley trace statistic, the $F_{T2}^*$ approach consistently provides more accurate results. This finding is in accordance with those of Shieh (2003). More importantly, both of the proposed extensions $F_\Lambda^*$ and $F_V^*$ perform very well in achieving the nominal levels in all standard normal and standardized Gamma settings for MSIQ. Overall, the accuracy of the proposed approaches increases slightly with the sample size, and varies with the structure of the model parameters. Nevertheless, the results are sufficiently accurate for most purposes. In practical situations, it is a difficult task to assess the robustness of the proposed approaches to the underlying distribution of the explanatory variables. For cases with complicated and unbalanced distributions of the explanatory variables, it is advisable to consider a range of design variations to provide guidance about the sample sizes required for the study.

5. Conclusions

For power and sample size calculations within the framework of multivariate linear models, this article provides an updated description of the existing methods of O’Brien and Shieh (1992). However, their approaches apply only to fixed models, whose results are specific to the particular realization of the explanatory variables. As in the child’s development example, the explanatory variables typically cannot be fixed in advance and therefore introduce additional variability into the paradigm. Similar applications are frequently encountered in behavioral and social studies. For practical purposes, the work of O’Brien and Shieh (1992) and Shieh (2003) is exploited thoroughly to obtain useful methods and promising results. The simple structure of the F transformations permits computational simplifications that are explicitly recognized in the statistical procedures for analysis of variance. The essence of the proposed approaches is the modification of the noncentrality for the F-type statistics to accommodate the characteristics of random explanatory variables. For implementation, the only difference is that the designation of particular values for the design matrix is replaced by the specification of the joint distribution of the explanatory variables. In fact, the general formulation allows fixed, random and mixed components of explanatory variables. The numerical assessment suggests that the discrepancies between the estimated and nominal levels of type I error rate and power are acceptable, given the many unknowns in study planning. The proposed methods are efficient, accurate and simpler than the other approximations in similar studies. According to these findings, it is concluded that the proposed methods expand the current literature and should be valuable in many applications.

Appendix

SAS/IML program for performing power calculations of the proposed methods

PROC IML;
*INPUTS: ALPHA, N, COEFFICIENTS B, COVARIANCE SIGMA, MOMENT MATRIX KSTAR, CONTRASTS CMAT AND AMAT;
ALPHA=0.05; N=110;
B={114.46 104.66 98.83, 2.88 8.77 10.67, -0.71 -0.90 -1.30, -0.21 -0.54 -0.72};
SIGMA={218.48 83.66 72.19, 83.66 251.92 158.60, 72.19 158.60 244.58};
KSTAR={1 0 1 0, 0 1 0 3, 1 0 3 0, 0 3 0 15};
CMAT={0 1 0 0, 0 0 1 0, 0 0 0 1};
AMAT=({-1,0,1}/SQRT(2))||({1,-2,1}/SQRT(6));
R=NROW(B); P=NCOL(B); C=NROW(CMAT); A=NCOL(AMAT); S=MIN(A,C); CA=C#A;
SMALLN=N-R;
*ERROR AND HYPOTHESIS MATRICES (THE BACKQUOTE IS THE IML TRANSPOSE OPERATOR);
EPM=AMAT`*SIGMA*AMAT;
HPM=(CMAT*B*AMAT)`*INV(CMAT*INV(KSTAR)*CMAT`)*(CMAT*B*AMAT);
EIHPM=INV(EPM)*HPM;
*L: WILKS LIKELIHOOD RATIO;
IF CA<4 THEN T=1; ELSE T=SQRT((CA##2-4)/(C##2+A##2-5));
DF2L=T#(SMALLN-(A-C+1)/2)-(CA-2)/2;
FCRITL=FINV(1-ALPHA,CA,DF2L,0);
LPM=DET(INV(I(A)+EIHPM)); EFSL=T#(LPM##(-1/T)-1);
POWERL=1-PROBF(FCRITL,CA,DF2L,N#EFSL);
*V: PILLAI TRACE;
DF2V=S#(SMALLN+S-A);
FCRITV=FINV(1-ALPHA,CA,DF2V,0);
VPM=TRACE(EIHPM*INV(I(A)+EIHPM)); EFSV=S#VPM/(S-VPM);
POWERV=1-PROBF(FCRITV,CA,DF2V,N#EFSV);
*T1: HOTELLING-LAWLEY TRACE, FIRST F TRANSFORMATION;
DF2T1=S#(SMALLN-A-1)+2;
FCRITT1=FINV(1-ALPHA,CA,DF2T1,0);
T1PM=TRACE(EIHPM); EFST1=T1PM;
POWERT1=1-PROBF(FCRITT1,CA,DF2T1,N#EFST1);
*T2: HOTELLING-LAWLEY TRACE, SECOND F TRANSFORMATION;
G=(SMALLN##2-SMALLN#(2#A+3)+A#(A+3))/(SMALLN#(C+A+1)-(C+2#A+A##2-1));
DF2T2=4+(CA+2)#G;
FCRITT2=FINV(1-ALPHA,CA,DF2T2,0);
T2PM=TRACE(EIHPM); EFST2=T2PM;
POWERT2=1-PROBF(FCRITT2,CA,DF2T2,N#EFST2);
PRINT N[FORMAT=6.0],
EFSL[FORMAT=7.4] EFSV[FORMAT=7.4] EFST1[FORMAT=7.4] EFST2[FORMAT=7.4],
POWERL[FORMAT=7.4] POWERV[FORMAT=7.4] POWERT1[FORMAT=7.4] POWERT2[FORMAT=7.4];
QUIT;
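For readers without access to SAS/IML, the following Python sketch offers a rough cross-check of the intermediate quantities in the program above: the effect-size measures and the denominator degrees of freedom of the four F approximations. It uses only the standard library, so it stops short of the noncentral F probabilities, which would require a statistics package. The helper names (`matmul`, `inverse`, `det`, `trace`) are illustrative and are not part of the original program.

```python
from math import sqrt

def matmul(X, Y):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def inverse(M):
    """Gauss-Jordan inverse with partial pivoting; adequate for small matrices."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for k in range(n):
            if k != col:
                f = aug[k][col]
                aug[k] = [v - f * w for v, w in zip(aug[k], aug[col])]
    return [row[n:] for row in aug]

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

# Inputs copied from the SAS/IML program.
N = 110
B = [[114.46, 104.66, 98.83], [2.88, 8.77, 10.67],
     [-0.71, -0.90, -1.30], [-0.21, -0.54, -0.72]]
SIGMA = [[218.48, 83.66, 72.19], [83.66, 251.92, 158.60], [72.19, 158.60, 244.58]]
KSTAR = [[1, 0, 1, 0], [0, 1, 0, 3], [1, 0, 3, 0], [0, 3, 0, 15]]
CMAT = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
AMAT = [[-1 / sqrt(2), 1 / sqrt(6)], [0.0, -2 / sqrt(6)], [1 / sqrt(2), 1 / sqrt(6)]]

r, c, a = len(B), len(CMAT), len(AMAT[0])
s, ca, n_e = min(a, c), c * a, N - r          # n_e corresponds to SMALLN

# Error and hypothesis matrices (EPM, HPM) and inv(EPM)*HPM (EIHPM).
theta = matmul(matmul(CMAT, B), AMAT)
epm = matmul(matmul(transpose(AMAT), SIGMA), AMAT)
middle = inverse(matmul(matmul(CMAT, inverse(KSTAR)), transpose(CMAT)))
hpm = matmul(matmul(transpose(theta), middle), theta)
eihpm = matmul(inverse(epm), hpm)
i_plus = [[eihpm[i][j] + (i == j) for j in range(a)] for i in range(a)]

# Wilks, Pillai and Hotelling-Lawley measures and their effect sizes.
t = 1.0 if ca < 4 else sqrt((ca ** 2 - 4) / (c ** 2 + a ** 2 - 5))
wilks = 1.0 / det(i_plus)
pillai = trace(matmul(eihpm, inverse(i_plus)))
hlt = trace(eihpm)
efs_l = t * (wilks ** (-1 / t) - 1)
efs_v = s * pillai / (s - pillai)
efs_t = hlt                                    # shared by the T1 and T2 approximations

# Denominator degrees of freedom of the four F approximations.
df2_l = t * (n_e - (a - c + 1) / 2) - (ca - 2) / 2
df2_v = s * (n_e + s - a)
df2_t1 = s * (n_e - a - 1) + 2
g = (n_e ** 2 - n_e * (2 * a + 3) + a * (a + 3)) / (n_e * (c + a + 1) - (c + 2 * a + a ** 2 - 1))
df2_t2 = 4 + (ca + 2) * g

print("effect sizes:", round(efs_l, 4), round(efs_v, 4), round(efs_t, 4))
print("df2:", df2_l, df2_v, df2_t1, round(df2_t2, 4))
```

Multiplying each effect size by N gives the noncentrality parameter that the program passes to PROBF.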

References

Glueck, D.H., & Muller, K.E. (2003). Adjusting power for a baseline covariate in linear models. Statistics in Medicine, 22, 2535–2551.

Keselman, H.J. (1998). Testing treatment effects in repeated measures designs: An update for psychophysiological researchers. Psychophysiology, 35, 470–478.

Keselman, H.J., Algina, J., & Kowalchuk, R.K. (2001). The analysis of repeated measures designs: A review. British Journal of Mathematical and Statistical Psychology, 54, 1–20.

McKeon, J.J. (1974). F approximations to the distribution of Hotelling’s $T_0^2$. Biometrika, 61, 381–383.

Mendoza, J.L., & Stafford, K.L. (2001). Confidence interval, power calculation, and sample size estimation for the squared multiple correlation coefficient under the fixed and random regression models: A computer program and useful standard tables. Educational and Psychological Measurement, 61, 650–667.

Muirhead, R.J. (1982). Aspects of Multivariate Statistical Theory. New York, NY: Wiley.

Muller, K.E., & Peterson, B.L. (1984). Practical methods for computing power in testing the multivariate general linear hypothesis. Computational Statistics and Data Analysis, 2, 143–158.

Muller, K.E., LaVange, L.M., Ramey, S.L., & Ramey, C.T. (1992). Power calculations for general linear multivariate models including repeated measures applications. Journal of the American Statistical Association, 87, 1209–1226.

O’Brien, R.G., & Muller, K.E. (1993). Unified power analysis for t-tests through multivariate hypotheses. In L.K. Edwards (Ed.), Applied Analysis of Variance in Behavioral Science (pp. 297–344). New York, NY: Marcel Dekker.

O’Brien, R.G., & Shieh, G. (1992). Pragmatic, unifying algorithm gives power probabilities for common F tests of the multivariate general linear hypothesis. Paper presented at the Annual Joint Statistical Meetings of the American Statistical Association, Boston, Massachusetts.

Pillai, K.C.S. (1956). On the distribution of the largest or the smallest root of a matrix in multivariate analysis. Biometrika, 43, 122–127.

Pillai, K.C.S., & Samson, P. Jr. (1959). On Hotelling’s generalization of $T^2$. Biometrika, 46, 160–168.

Rao, C.R. (1951). An asymptotic expansion of the distribution of Wilks’ criterion. Bulletin of the International Statistical Institute, 33, 177–180.

Rencher, A.C. (1998). Multivariate Statistical Inference and Applications. New York, NY: Wiley.

Roy, S.N. (1953). On a heuristic method of test construction and its use in multivariate analysis. Annals of Mathematical Statistics, 24, 220–238.

Sampson, A.R. (1974). A tale of two regressions. Journal of the American Statistical Association, 69, 682–689.

SAS Institute (2003). SAS/IML software: Usage and reference. Version 8, Cary, NC.

Shieh, G. (2003). A comparative study of power and sample size calculations for multivariate general linear models. Multivariate Behavioral Research, 38, 285–307.

Timm, N.H. (2002). Applied Multivariate Analysis. New York, NY: Springer.

Manuscript received 28 MAY 2003
Final version received 23 MAR 2004