Computational Statistics & Data Analysis 22 (1996) 345-350
On the computation of the distribution of the square of the sample multiple correlation coefficient
Cherng G. Ding
Institute of Management Science, National Chiao Tung University, 4F, 114 Chung-Hsiao I'K Road, Section 1, Taipei, Taiwan, ROC
Abstract
Two computationally simple methods are proposed to evaluate the distribution function and the density of the square of the sample multiple correlation coefficient. No auxiliary routine is required. The accuracy of the recursive computations can be effectively controlled. The distribution function and the density can be evaluated concurrently because their computing formulas are closely related. This property can enhance the efficiency of Newton's method for computing the quantiles of the distribution. The corresponding algorithms are provided in step-by-step form.
Keywords: Central beta distribution; Error bound; Gamma function; Multiple correlation coefficient; Newton's method; Noncentral beta distribution; Quantile; Series representation
1. Introduction
Let X1, X2, ..., Xm have a multivariate normal distribution with mean vector μ and covariance matrix Σ, and let R be the sample multiple correlation coefficient between X1 and X2, ..., Xm based on a sample of size N > m. The density of Y = R² can be expressed as an infinite weighted sum of central beta densities as follows (see, e.g., Anderson, 1984, p. 145):
f(y; m, N, ρ²) = Σ_{i=0}^∞ [Γ²((N - 1)/2 + i) / (Γ((N - 1)/2) i! Γ((m - 1)/2 + i) Γ((N - m)/2))] (ρ²)^i (1 - ρ²)^((N-1)/2) y^((m-1)/2 + i - 1) (1 - y)^((N-m-2)/2)

  = Σ_{i=0}^∞ q_i f(y; a + i, b),  (1)

where 0 ≤ y < 1, ρ is the population multiple correlation coefficient, Γ(α) is the gamma function, a = (m - 1)/2, b = (N - m)/2, q_i = (Γ(a + b + i)/(Γ(a + b) i!)) (ρ²)^i (1 - ρ²)^(a+b), and f(y; a, b) is the central beta density with shape parameters a and b. The distribution function of Y can, similarly, be expressed as an infinite weighted sum of central beta distribution functions. That is,

P(Y ≤ y) = F(y; m, N, ρ²) = Σ_{i=0}^∞ q_i F(y; a + i, b),  (2)
where F(y; a, b) is the central beta distribution function with shape parameters a and b. A recursive algorithm for evaluating F(y; m, N, ρ²) based on the series representation in (2) was given by Ding and Bargmann (1991a). The algorithm sums the terms in (2) until the derived upper bound for the truncation error is less than some predetermined accuracy. Auxiliary routines for evaluating F(y; a, b) and the natural logarithm of the gamma function are required. Ding and Bargmann
(1991b) also computed the quantile y_p such that F(y_p; m, N, ρ²) = p for given values of m (> 1), N (> m), ρ² (0 ≤ ρ² < 1), and p (0 < p < 1) by using the Illinois method.

In this paper, computationally simple methods are proposed to evaluate F(y; m, N, ρ²) (based on an alternative series representation) and f(y; m, N, ρ²) (based on the series representation in (1)). No auxiliary routine is required. The computational accuracy can be effectively controlled by using the error bounds obtained. Moreover, the recurrence formulas for computing F(y; m, N, ρ²) and f(y; m, N, ρ²) are closely related, and therefore F(y; m, N, ρ²) and f(y; m, N, ρ²) can be evaluated concurrently. This property can enhance the efficiency of Newton's method for computing the quantile y_p. The corresponding algorithms are provided in step-by-step form. The numerical methods and the resultant algorithms basically follow those given in Ding (1994) for computing the noncentral beta distribution, which is an infinite weighted sum of central beta distributions with Poisson weights.
2. Numerical methods
The numerical methods discussed in this section for evaluating F(y; m, N, ρ²) and f(y; m, N, ρ²) are for 0 < y < 1. No computation is needed for y = 0 or y = 1, since f(0; m, N, ρ²) = f(1; m, N, ρ²) = F(0; m, N, ρ²) = 0 and F(1; m, N, ρ²) = 1.

A recursive formula for evaluating F(y; a + i, b) in (2) was given by Ding (1994) as follows:

F(y; a + i, b) = Σ_{k=i}^∞ [(1 - y)/(a + b + k)] f(y; a + k + 1, b) = Σ_{k=i}^∞ t_k,  i = 0, 1, ...,  (3)

where

t_0 = [Γ(a + b)/(Γ(a + 1)Γ(b))] y^a (1 - y)^b,  t_i = t_{i-1} y(a + b + i - 1)/(a + i),  i ≥ 1.  (4)

It follows that F(y; m, N, ρ²)
can be expressed by a new series in terms of central beta densities:

F(y; m, N, ρ²) = Σ_{i=0}^∞ q_i Σ_{k=i}^∞ t_k = Σ_{i=0}^∞ (Σ_{k=0}^i q_k) t_i = Σ_{i=0}^∞ v_i t_i,  (5)

where the terms are evaluated recursively by

v_0 = q_0 = (1 - ρ²)^(a+b),  v_i = v_{i-1} + q_i,  q_i = q_{i-1}(a + b + i - 1)(ρ²)/i,  i ≥ 1,
t_i, i ≥ 0, as in (4).  (6)

Since m and N are both integers, the gamma function in (4) is easy to evaluate, in the sense that Γ(α) = (α - 1)! if α is an integer, and Γ(α) = (α - 1)(α - 2)···(1/2)√π if α is a half-integer. F(y; m, N, ρ²) can be approximated by the finite sum Σ_{i=0}^{n-1} v_i t_i.
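Since a and b are integers or half-integers, Γ in (4) reduces to elementary products, so no log-gamma routine is needed. A few lines of Python illustrate the remark (the helper name is ours, not the paper's):

```python
import math

def gamma_int_or_half(alpha):
    """Gamma(alpha) for positive integer or half-integer alpha:
    Gamma(k) = (k-1)! and Gamma(k + 1/2) = (k - 1/2)(k - 3/2)...(1/2) sqrt(pi)."""
    g = 1.0
    x = alpha - 1.0
    while x > 0.0:          # multiply down the descending factors
        g *= x
        x -= 1.0
    if abs(alpha - round(alpha)) > 0.25:
        g *= math.sqrt(math.pi)   # leftover factor sqrt(pi) for half-integers
    return g
```
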
Let E_{F,n} denote the truncation error. Using the facts that Σ_{i=0}^∞ q_i = 1 (see, e.g., Abramowitz and Stegun, 1965, p. 556) and Σ_{i=n}^∞ t_i < t_{n-1} y(a + b + n - 1)/[(a + n) - (a + b + n)y] if a + n > (a + b + n)y (see Ding, 1994), we have, under the same condition,

E_{F,n} = Σ_{i=n}^∞ v_i t_i ≤ Σ_{i=n}^∞ t_i < t_{n-1} y(a + b + n - 1)/[(a + n) - (a + b + n)y].  (7)

The error bound given above is a decreasing function of n when a + n > (a + b + n)y. To evaluate F(y; m, N, ρ²), accumulate the terms in (5), computed recursively through (6), until the error bound is not greater than a specified accuracy ε. In fact, the evaluation of F(y; m, N, ρ²) requires no auxiliary routine.
The density f(y; m, N, ρ²) of Y expressed in (1) is another series in terms of central beta densities. Its terms can be evaluated recursively, and are related to those of (5) as follows:
f(y; m, N, ρ²) = Σ_{i=0}^∞ q_i f(y; a + i, b) = Σ_{i=0}^∞ q_i s_i,  (8)

where

s_0 = [Γ(a + b)/(Γ(a)Γ(b))] y^(a-1) (1 - y)^(b-1) = a t_0/[y(1 - y)],
s_i = s_{i-1} y(a + b + i - 1)/(a + i - 1) = t_{i-1}(a + b + i - 1)/(1 - y),  i ≥ 1,  (9)

and q_i and t_i, i ≥ 0, are those in (6). Let E_{f,n} be the truncation error at i = n for the series in (8). Since the sequence {s_i} (i ≥ n) is decreasing when a + n > (a + b + n)y, we have, under the same condition,

E_{f,n} = Σ_{i=n}^∞ q_i s_i ≤ s_n Σ_{i=n}^∞ q_i = s_n (1 - Σ_{i=0}^{n-1} q_i) = s_n (1 - v_{n-1}).  (10)

Likewise, the above error bound is a decreasing function of n, and is used to control the accuracy of the evaluation of f(y; m, N, ρ²). Again, the evaluation of f(y; m, N, ρ²) requires no auxiliary routine.
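The density evaluation through (8)-(10) can be sketched the same way (again, the function name and default accuracy are ours):

```python
import math

def sqmultcorr_pdf(y, m, N, rho2, eps=1e-10):
    """f(y; m, N, rho2): series (8) with recursions (6) and (9),
    truncated when the error bound (10) drops below eps; requires 0 < y < 1."""
    a = (m - 1) / 2.0
    b = (N - m) / 2.0
    # s0 = Gamma(a+b)/(Gamma(a)Gamma(b)) y^(a-1) (1-y)^(b-1)
    s = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))
    q = (1.0 - rho2) ** (a + b)   # q_0
    v = q                         # v_0
    pdf = q * s
    n = 1
    while True:
        if a + n > (a + b + n) * y:
            # bound (10): s_n (1 - v_{n-1}), evaluated with s = s_{n-1}, v = v_{n-1}
            bound = s * y * (a + b + n - 1) * (1.0 - v) / (a + n - 1)
            if bound <= eps:
                return pdf
        q *= (a + b + n - 1) * rho2 / n           # q_n
        s *= y * (a + b + n - 1) / (a + n - 1)    # s_n, as in (9)
        pdf += q * s
        v += q                                    # v_n
        n += 1
```

With ρ² = 0 this reduces to the central beta density f(y; a, b), which again gives a closed-form spot check.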
Let G(y) = F(y; m, N, ρ²) - p. The quantile y_p is to be obtained by solving the equation G(y) = 0. Since G(0) = -p, G(1) = 1 - p, and G is strictly increasing on [0, 1], the solution y_p of G(y) = 0 is unique. An efficient root-finding method is Newton's method, which requires the evaluation of both F(y; m, N, ρ²) and G'(y) = f(y; m, N, ρ²). The process is to repeat computing (see, e.g., Kennedy and Gentle, 1980, pp. 72-73)

y_{j+1} = y_j - [F(y_j; m, N, ρ²) - p]/f(y_j; m, N, ρ²),  j = 0, 1, ...,  (11)

until |y_{j+1} - y_j| ≤ δ y_{j+1}, where δ is a specified accuracy. It is obvious that y_p = 0 for p = 0 and y_p = 1 for p = 1; no computation is needed for these cases. For 0 < p < 1, perform the iterations (11) with the starting value y_0 = 0.5. For each iteration, F(y_j; m, N, ρ²) and f(y_j; m, N, ρ²) can be evaluated concurrently rather than independently because their computing formulas (5) and (8) are closely related. This property can greatly enhance the computational efficiency of Newton's method. To ensure the legality of the iterate y_{j+1}, use the same adjustments as in Ding (1994): if y_{j+1} ≤ 0, replace it by y_j/2; if y_{j+1} ≥ 1, replace it by (y_j + 1)/2. Note that the evaluations of F(y_j; m, N, ρ²) and f(y_j; m, N, ρ²) should be precise enough so that the accuracy of Newton's solution y_p can be ensured. Also, the number of iterations should be controlled.
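A Python sketch of this concurrent Newton scheme might read as follows (the function name and the default accuracies are our choices; the original implementation is in FORTRAN):

```python
import math

def sqmultcorr_quantile(p, m, N, rho2, eps=1e-10, delta=1e-8, itrmax=50):
    """Quantile y_p with F(y_p; m, N, rho2) = p, by Newton's method (11).
    CDF and PDF are accumulated concurrently from the shared recursions."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    a = (m - 1) / 2.0
    b = (N - m) / 2.0
    lcoeff = math.lgamma(a + b) - math.lgamma(a + 1) - math.lgamma(b)
    q0 = (1.0 - rho2) ** (a + b)
    y = 0.5                                   # starting value y_0
    for _ in range(itrmax):
        t = math.exp(lcoeff + a * math.log(y) + b * math.log(1.0 - y))
        s = a * t / (y * (1.0 - y))           # s_0 = a t_0 / (y(1-y))
        q, v = q0, q0
        cdf, pdf = v * t, q * s
        n = 1
        while True:                           # concurrent accumulation of (5) and (8)
            if a + n > (a + b + n) * y:
                bndcdf = t * y * (a + b + n - 1) / ((a + n) - (a + b + n) * y)
                bndpdf = t * (a + b + n - 1) * (1.0 - v) / (1.0 - y)
                if bndcdf <= eps and bndpdf <= eps:
                    break
            q *= (a + b + n - 1) * rho2 / n
            v += q
            s = t * (a + b + n - 1) / (1.0 - y)   # s_n from t_{n-1}, as in (9)
            t *= y * (a + b + n - 1) / (a + n)    # t_n, as in (4)
            cdf += v * t
            pdf += q * s
            n += 1
        diff = (cdf - p) / pdf                # Newton step (11)
        if y - diff <= 0.0:                   # pull an illegal iterate back into (0, 1)
            y_new = y / 2.0
        elif y - diff >= 1.0:
            y_new = (y + 1.0) / 2.0
        else:
            y_new = y - diff
        if abs(y_new - y) <= delta * y_new:
            return y_new
        y = y_new
    raise RuntimeError("No convergence after %d iterations" % itrmax)
```

The single inner loop mirrors the "closely related" recursions: s_n is obtained for free from the t_{n-1} already needed for the CDF series.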
3. Algorithms
According to the formulas discussed in Section 2, we provide three effective algorithms in step-by-step form for evaluating F(y; m, N, ρ²), f(y; m, N, ρ²), and the quantile y_p. The algorithm for computing y_p is particularly efficient.
Algorithm A. This algorithm computes the distribution function F(y; m, N, ρ²) of Y = R² for given values of y (0 < y < 1), m (> 1), N (> m), and ρ² (0 ≤ ρ² < 1).

A1 (Specify the accuracy). Set EPS ← ε (some desired accuracy, e.g., 10^-6).
A2 (Set the constants). Set a ← (m - 1)/2, b ← (N - m)/2.
A3 (Initialize). Set n ← 1, t ← (Γ(a + b)/(Γ(a + 1)Γ(b))) y^a (1 - y)^b, q ← (1 - ρ²)^(a+b), v ← q, CDF ← vt.
A4 (Compare a + n and (a + b + n)y). If a + n > (a + b + n)y, go to step A6.
A5 (Update the term and then accumulate; then increase n by 1). Set q ← q(a + b + n - 1)(ρ²)/n, v ← v + q, t ← ty(a + b + n - 1)/(a + n), CDF ← CDF + vt, n ← n + 1, and return to step A4.
A6 (Find the error bound and check for convergence). Set bound ← ty(a + b + n - 1)/((a + n) - (a + b + n)y). If bound ≤ EPS, terminate the algorithm. (CDF is the answer.)
A7 (Update the term and then accumulate; then increase n by 1). Set q ← q(a + b + n - 1)(ρ²)/n, v ← v + q, t ← ty(a + b + n - 1)/(a + n), CDF ← CDF + vt, n ← n + 1, and return to step A6.
Algorithm B. This algorithm computes the density f(y; m, N, ρ²) of Y = R² for given values of y (0 < y < 1), m (> 1), N (> m), and ρ² (0 ≤ ρ² < 1).

B1 (Specify the accuracy). Set EPS ← ε (some desired accuracy, e.g., 10^-6).
B2 (Set the constants). Set a ← (m - 1)/2, b ← (N - m)/2.
B3 (Initialize). Set n ← 1, s ← (Γ(a + b)/(Γ(a)Γ(b))) y^(a-1) (1 - y)^(b-1), q ← (1 - ρ²)^(a+b), v ← q, PDF ← qs.
B4 (Compare a + n and (a + b + n)y). If a + n > (a + b + n)y, go to step B6.
B5 (Update the term and then accumulate; then increase n by 1). Set q ← q(a + b + n - 1)(ρ²)/n, v ← v + q, s ← sy(a + b + n - 1)/(a + n - 1), PDF ← PDF + qs, n ← n + 1, and return to step B4.
B6 (Find the error bound and check for convergence). Set bound ← sy(a + b + n - 1)(1 - v)/(a + n - 1). If bound ≤ EPS, terminate the algorithm. (PDF is the answer.)
B7 (Update the term and then accumulate; then increase n by 1). Set q ← q(a + b + n - 1)(ρ²)/n, v ← v + q, s ← sy(a + b + n - 1)/(a + n - 1), PDF ← PDF + qs, n ← n + 1, and return to step B6.
Algorithm C. This algorithm computes the quantile y_p of the distribution of Y = R² such that F(y_p; m, N, ρ²) = p for given values of m (> 1), N (> m), ρ² (0 ≤ ρ² < 1), and p (0 < p < 1).

C1 (Specify the accuracy and the maximum number of Newton's iterations allowed). Set EPS ← ε (some desired accuracy, e.g., 10^-6, for computing F(y; m, N, ρ²) and f(y; m, N, ρ²)), DELTA ← δ (some desired accuracy, e.g., 10^-4, for computing y_p), ITRMAX ← N_max (an integer, e.g., 10, for controlling the number of Newton's iterations).
C2 (Set the constants). Set a ← (m - 1)/2, b ← (N - m)/2, coeff ← Γ(a + b)/(Γ(a + 1)Γ(b)), q0 ← (1 - ρ²)^(a+b).
C3 (Loop on k, Newton's iteration). First initialize y by setting y ← 0.5. Then perform steps C4-C10 for k = 1, 2, ..., ITRMAX. (Steps C4-C10 constitute one iteration.)
C4 (Initialize within each iteration). Set n ← 1, t ← coeff · y^a (1 - y)^b, s ← at/(y(1 - y)), q ← q0, v ← q, CDF ← vt, PDF ← qs.
C5 (Compare a + n and (a + b + n)y). If a + n > (a + b + n)y, go to step C7.
C6 (Update the terms and then accumulate for both F(y; m, N, ρ²) and f(y; m, N, ρ²); then increase n by 1). Set q ← q(a + b + n - 1)(ρ²)/n, v ← v + q, s ← t(a + b + n - 1)/(1 - y), t ← ty(a + b + n - 1)/(a + n), CDF ← CDF + vt, PDF ← PDF + qs, n ← n + 1, and return to step C5.
C7 (Find the corresponding error bounds). Set bndcdf ← ty(a + b + n - 1)/((a + n) - (a + b + n)y), bndpdf ← t(a + b + n - 1)(1 - v)/(1 - y).
C8 (Check for convergence). If bndcdf ≤ EPS and bndpdf ≤ EPS, go to step C10.
C9 (Update the terms and then accumulate for F(y; m, N, ρ²) and/or f(y; m, N, ρ²); then increase n by 1). Set q ← q(a + b + n - 1)(ρ²)/n, v ← v + q. If bndcdf ≤ EPS, set s ← sy(a + b + n - 1)/(a + n - 1), PDF ← PDF + qs, n ← n + 1, bndpdf ← sy(a + b + n - 1)(1 - v)/(a + n - 1), and return to step C8; otherwise if bndpdf ≤ EPS, set t ← ty(a + b + n - 1)/(a + n), CDF ← CDF + vt, n ← n + 1, bndcdf ← ty(a + b + n - 1)/((a + n) - (a + b + n)y), and return to step C8; otherwise set s ← t(a + b + n - 1)/(1 - y), t ← ty(a + b + n - 1)/(a + n), CDF ← CDF + vt, PDF ← PDF + qs, n ← n + 1, and return to step C7.
C10 (Find new y and check for convergence of Newton's process). Set diff ← (CDF - p)/PDF. If y - diff ≤ 0, set y ← y/2; otherwise if y - diff ≥ 1, set y ← (y + 1)/2; otherwise set y ← y - diff. If |diff|/y ≤ DELTA, terminate the algorithm. (y is the answer.)
C11 (Output error message). Terminate the algorithm with the message "No convergence after N_max iterations".
FORTRAN codes based on the above algorithms are available upon request.
References
Abramowitz, M. and I.A. Stegun, Handbook of mathematical functions (Dover, New York, 1965).
Anderson, T.W., An introduction to multivariate statistical analysis, 2nd Edn. (Wiley, New York, 1984).
Ding, C.G., On the computation of the noncentral beta distribution, Comput. Statist. Data Anal., 18 (1994) 449-455.
Ding, C.G. and R.E. Bargmann, Algorithm AS 260: evaluation of the distribution of the square of the sample multiple-correlation coefficient, Appl. Statist., 40 (1991a) 195-198.
Ding, C.G. and R.E. Bargmann, Algorithm AS 261: quantiles of the distribution of the square of the sample multiple-correlation coefficient, Appl. Statist., 40 (1991b) 199-202.
Kennedy, W.J. and J.E. Gentle, Statistical computing (Marcel Dekker, New York, 1980).