Computational Statistics & Data Analysis 22 (1996) 345-350
On the computation of the distribution of the square of the sample multiple correlation coefficient
Cherng G. Ding
Institute of Management Science, National Chiao Tung University, 4F, 114 Chung-Hsiao I'K Road, Section 1, Taipei, Taiwan, ROC
Abstract
Two computationally simple methods are proposed to evaluate the distribution function and the density of the square of the sample multiple correlation coefficient. No auxiliary routine is required. The accuracy of the recursive computations can be effectively controlled. The distribution function and the density can be evaluated concurrently because their computing formulas are closely related. This property can enhance the efficiency of Newton's method for computing the quantiles of the distribution. The corresponding algorithms are provided in step-by-step form.
Keywords: Central beta distribution; Error bound; Gamma function; Multiple correlation coefficient; Newton's method; Noncentral beta distribution; Quantile; Series representation
1. Introduction
Let X1, X2, ..., Xm have a multivariate normal distribution with mean vector μ and covariance matrix Σ, and let R be the sample multiple correlation coefficient between X1 and X2, ..., Xm based on a sample of size N > m. The density of Y = R² can be expressed as an infinite weighted sum of central beta densities as follows (see, e.g., Anderson, 1984, p. 145):
f(y; m, N, ρ²) = Σ_{i=0}^∞ [Γ²((N - 1)/2 + i) / (Γ((N - 1)/2) i! Γ((m - 1)/2 + i) Γ((N - m)/2))] (ρ²)^i (1 - ρ²)^((N-1)/2) y^((m-1)/2 + i - 1) (1 - y)^((N-m-2)/2)

  = Σ_{i=0}^∞ q_i f(y; a + i, b),  (1)

where 0 ≤ y < 1, ρ is the population multiple correlation coefficient, Γ(α) is the gamma function, a = (m - 1)/2, b = (N - m)/2, q_i = (Γ(a + b + i)/(Γ(a + b) i!)) (ρ²)^i (1 - ρ²)^(a+b), and f(y; a, b) is the central beta density with shape parameters a and b. The distribution function of Y can, similarly, be expressed as an infinite weighted sum of central beta distribution functions. That is,

P(Y ≤ y) = F(y; m, N, ρ²) = Σ_{i=0}^∞ q_i F(y; a + i, b),  (2)
where F(y; a, b) is the central beta distribution function with shape parameters a and b. A recursive algorithm for evaluating F(y; m, N, ρ²) based on the series representation in (2) was given by Ding and Bargmann (1991a). The algorithm sums the terms in (2) until the derived upper bound for the truncation error is less than some predetermined accuracy. Auxiliary routines for evaluating F(y; a, b) and the natural logarithm of the gamma function are required. Ding and Bargmann
(1991b) also computed the quantile y_p such that F(y_p; m, N, ρ²) = p for given values of m (> 1), N (> m), ρ² (0 ≤ ρ² < 1), and p (0 < p < 1) by using the Illinois method.

In this paper, computationally simple methods are proposed to evaluate F(y; m, N, ρ²) (based on an alternative series representation) and f(y; m, N, ρ²) (based on the series representation in (1)). No auxiliary routine is required. The computational accuracy can be effectively controlled by using the error bounds obtained. Moreover, the recurrence formulas for computing F(y; m, N, ρ²) and f(y; m, N, ρ²) are closely related, and therefore F(y; m, N, ρ²) and f(y; m, N, ρ²) can be evaluated concurrently. This property can enhance the efficiency of Newton's method for computing the quantile y_p. The corresponding algorithms are provided in step-by-step form. The numerical methods and the resultant algorithms basically follow those given in Ding (1994) for computing the noncentral beta distribution, which is an infinite weighted sum of central beta distributions with Poisson weights.
2. Numerical methods
The numerical methods discussed in this section for evaluating F(y; m, N, ρ²) and f(y; m, N, ρ²) are for 0 < y < 1. No computation is needed for y = 0 or y = 1, since f(0; m, N, ρ²) = f(1; m, N, ρ²) = F(0; m, N, ρ²) = 0 and F(1; m, N, ρ²) = 1.

A recursive formula for evaluating F(y; a + i, b) in (2) was given by Ding (1994) as follows:

F(y; a + i, b) = Σ_{k=i}^∞ [(1 - y)/(a + b + k)] f(y; a + k + 1, b) = Σ_{k=i}^∞ t_k,  i = 0, 1, ...,  (3)

where

t_0 = [Γ(a + b)/(Γ(a + 1)Γ(b))] y^a (1 - y)^b,  t_i = t_{i-1} y(a + b + i - 1)/(a + i),  i ≥ 1.  (4)

It follows that F(y; m, N, ρ²)
can be expressed by a new series in terms of central beta densities:

F(y; m, N, ρ²) = Σ_{i=0}^∞ q_i Σ_{k=i}^∞ t_k = Σ_{i=0}^∞ (Σ_{k=0}^i q_k) t_i = Σ_{i=0}^∞ v_i t_i,  (5)

where the terms are evaluated recursively by

v_0 = q_0 = (1 - ρ²)^(a+b),  v_i = v_{i-1} + q_i,  q_i = q_{i-1}(a + b + i - 1)(ρ²)/i,  i ≥ 1,
t_i, i ≥ 0, as in (4).  (6)

Since m and N are both integers, the gamma function in (4) is easy to evaluate, in the sense that Γ(α) = (α - 1)! if α is an integer, and Γ(α) = (α - 1)(α - 2)···(1/2)√π if α is a half-integer. F(y; m, N, ρ²) can be approximated by the finite sum Σ_{i=0}^{n-1} v_i t_i.
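Since a and b are integers or half-integers, Γ in (4) reduces to elementary products, so no log-gamma routine is needed. A few lines of Python illustrate the remark (the helper name is ours, not the paper's):

```python
import math

def gamma_int_or_half(alpha):
    """Gamma(alpha) for positive integer or half-integer alpha:
    Gamma(k) = (k-1)! and Gamma(k + 1/2) = (k - 1/2)(k - 3/2)...(1/2) sqrt(pi)."""
    g = 1.0
    x = alpha - 1.0
    while x > 0.0:          # multiply down the descending factors
        g *= x
        x -= 1.0
    if abs(alpha - round(alpha)) > 0.25:
        g *= math.sqrt(math.pi)   # leftover factor sqrt(pi) for half-integers
    return g
```
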
Let E_{F,n} denote the truncation error. Using the facts that Σ_{i=0}^∞ q_i = 1 (see, e.g., Abramowitz and Stegun, 1965, p. 556) and Σ_{i=n}^∞ t_i < t_{n-1} y(a + b + n - 1)/[(a + n) - (a + b + n)y] if a + n > (a + b + n)y (see Ding, 1994), we have, under the same condition,

E_{F,n} = Σ_{i=n}^∞ v_i t_i ≤ Σ_{i=n}^∞ t_i < t_{n-1} y(a + b + n - 1)/[(a + n) - (a + b + n)y].  (7)

The error bound given above is a decreasing function of n when a + n > (a + b + n)y. To evaluate F(y; m, N, ρ²), accumulate the terms in (5), computed recursively through (6), until the error bound is not greater than a specified accuracy ε. In fact, the evaluation of F(y; m, N, ρ²) requires no auxiliary routine.
The density f(y; m, N, ρ²) of Y expressed in (1) is another series in terms of central beta densities. Its terms can be evaluated recursively, and are related to those of (5) as follows:
f(y; m, N, ρ²) = Σ_{i=0}^∞ q_i f(y; a + i, b) = Σ_{i=0}^∞ q_i s_i,  (8)

where

s_0 = [Γ(a + b)/(Γ(a)Γ(b))] y^(a-1) (1 - y)^(b-1) = a t_0/[y(1 - y)],
s_i = s_{i-1} y(a + b + i - 1)/(a + i - 1) = t_{i-1}(a + b + i - 1)/(1 - y),  i ≥ 1,  (9)

and q_i and t_i, i ≥ 0, are those in (6). Let E_{f,n} be the truncation error at i = n for the series in (8). Since the sequence {s_i} (i ≥ n) is decreasing when a + n > (a + b + n)y, we have, under the same condition,

E_{f,n} = Σ_{i=n}^∞ q_i s_i ≤ s_n Σ_{i=n}^∞ q_i = s_n (1 - Σ_{i=0}^{n-1} q_i) = s_n (1 - v_{n-1}).  (10)

Likewise, the above error bound is a decreasing function of n, and is used to control the accuracy of the evaluation of f(y; m, N, ρ²). Again, the evaluation of f(y; m, N, ρ²) requires no auxiliary routine.
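The density evaluation through (8)-(10) can be sketched the same way (again, the function name and default accuracy are ours):

```python
import math

def sqmultcorr_pdf(y, m, N, rho2, eps=1e-10):
    """f(y; m, N, rho2): series (8) with recursions (6) and (9),
    truncated when the error bound (10) drops below eps; requires 0 < y < 1."""
    a = (m - 1) / 2.0
    b = (N - m) / 2.0
    # s0 = Gamma(a+b)/(Gamma(a)Gamma(b)) y^(a-1) (1-y)^(b-1)
    s = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))
    q = (1.0 - rho2) ** (a + b)   # q_0
    v = q                         # v_0
    pdf = q * s
    n = 1
    while True:
        if a + n > (a + b + n) * y:
            # bound (10): s_n (1 - v_{n-1}), evaluated with s = s_{n-1}, v = v_{n-1}
            bound = s * y * (a + b + n - 1) * (1.0 - v) / (a + n - 1)
            if bound <= eps:
                return pdf
        q *= (a + b + n - 1) * rho2 / n           # q_n
        s *= y * (a + b + n - 1) / (a + n - 1)    # s_n, as in (9)
        pdf += q * s
        v += q                                    # v_n
        n += 1
```

With ρ² = 0 this reduces to the central beta density f(y; a, b), which again gives a closed-form spot check.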
Let G(y) = F(y; m, N, ρ²) - p. The quantile y_p is to be obtained by solving the equation G(y) = 0. Since G(0) = -p, G(1) = 1 - p, and G is strictly increasing on [0, 1], the solution y_p of G(y) = 0 is unique. An efficient root-finding method is Newton's method, which requires the evaluation of both F(y; m, N, ρ²) and G'(y) = f(y; m, N, ρ²). The process is to repeat computing (see, e.g., Kennedy and Gentle, 1980, pp. 72-73)

y_{j+1} = y_j - [F(y_j; m, N, ρ²) - p]/f(y_j; m, N, ρ²),  j = 0, 1, ...,  (11)

until |y_{j+1} - y_j| ≤ δ y_{j+1}, where δ is a specified accuracy. It is obvious that y_p = 0 for p = 0 and y_p = 1 for p = 1; no computation is needed for these cases. For 0 < p < 1, perform the iterations (11) with the starting value y_0 = 0.5. For each iteration, F(y_j; m, N, ρ²) and f(y_j; m, N, ρ²) can be evaluated concurrently rather than independently because their computing formulas (5) and (8) are closely related. This property can greatly enhance the computational efficiency of Newton's method. To ensure the legality of the iterate y_{j+1}, use the same adjustments as in Ding (1994): if y_{j+1} ≤ 0, replace it by y_j/2; if y_{j+1} ≥ 1, replace it by (y_j + 1)/2. Note that the evaluations of F(y_j; m, N, ρ²) and f(y_j; m, N, ρ²) should be precise enough so that the accuracy of Newton's solution y_p can be ensured. Also, the number of iterations should be controlled.
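A Python sketch of this concurrent Newton scheme might read as follows (the function name and the default accuracies are our choices; the original implementation is in FORTRAN):

```python
import math

def sqmultcorr_quantile(p, m, N, rho2, eps=1e-10, delta=1e-8, itrmax=50):
    """Quantile y_p with F(y_p; m, N, rho2) = p, by Newton's method (11).
    CDF and PDF are accumulated concurrently from the shared recursions."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    a = (m - 1) / 2.0
    b = (N - m) / 2.0
    lcoeff = math.lgamma(a + b) - math.lgamma(a + 1) - math.lgamma(b)
    q0 = (1.0 - rho2) ** (a + b)
    y = 0.5                                   # starting value y_0
    for _ in range(itrmax):
        t = math.exp(lcoeff + a * math.log(y) + b * math.log(1.0 - y))
        s = a * t / (y * (1.0 - y))           # s_0 = a t_0 / (y(1-y))
        q, v = q0, q0
        cdf, pdf = v * t, q * s
        n = 1
        while True:                           # concurrent accumulation of (5) and (8)
            if a + n > (a + b + n) * y:
                bndcdf = t * y * (a + b + n - 1) / ((a + n) - (a + b + n) * y)
                bndpdf = t * (a + b + n - 1) * (1.0 - v) / (1.0 - y)
                if bndcdf <= eps and bndpdf <= eps:
                    break
            q *= (a + b + n - 1) * rho2 / n
            v += q
            s = t * (a + b + n - 1) / (1.0 - y)   # s_n from t_{n-1}, as in (9)
            t *= y * (a + b + n - 1) / (a + n)    # t_n, as in (4)
            cdf += v * t
            pdf += q * s
            n += 1
        diff = (cdf - p) / pdf                # Newton step (11)
        if y - diff <= 0.0:                   # pull an illegal iterate back into (0, 1)
            y_new = y / 2.0
        elif y - diff >= 1.0:
            y_new = (y + 1.0) / 2.0
        else:
            y_new = y - diff
        if abs(y_new - y) <= delta * y_new:
            return y_new
        y = y_new
    raise RuntimeError("No convergence after %d iterations" % itrmax)
```

The single inner loop mirrors the "closely related" recursions: s_n is obtained for free from the t_{n-1} already needed for the CDF series.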
3. Algorithms
According to the formulas discussed in Section 2, we provide three effective algorithms in step-by-step form for evaluating F(y; m, N, ρ²), f(y; m, N, ρ²), and the quantile y_p. The algorithm for computing y_p is particularly efficient.
Algorithm A. This algorithm computes the distribution function F(y; m, N, ρ²) of Y = R² for given values of y (0 < y < 1), m (> 1), N (> m), and ρ² (0 ≤ ρ² < 1).

A1 (Specify the accuracy). Set EPS ← ε (some desired accuracy, e.g., 10^-6).
A2 (Set the constants). Set a ← (m - 1)/2, b ← (N - m)/2.
A3 (Initialize). Set n ← 1, t ← (Γ(a + b)/(Γ(a + 1)Γ(b))) y^a (1 - y)^b, q ← (1 - ρ²)^(a+b), v ← q, CDF ← vt.
A4 (Compare a + n and (a + b + n)y). If a + n > (a + b + n)y, go to step A6.
A5 (Update the term and then accumulate; then increase n by 1). Set q ← q(a + b + n - 1)(ρ²)/n, v ← v + q, t ← ty(a + b + n - 1)/(a + n), CDF ← CDF + vt, n ← n + 1, and return to step A4.
A6 (Find the error bound and check for convergence). Set bound ← ty(a + b + n - 1)/((a + n) - (a + b + n)y). If bound ≤ EPS, terminate the algorithm. (CDF is the answer.)
A7 (Update the term and then accumulate; then increase n by 1). Set q ← q(a + b + n - 1)(ρ²)/n, v ← v + q, t ← ty(a + b + n - 1)/(a + n), CDF ← CDF + vt, n ← n + 1, and return to step A6.
Algorithm B. This algorithm computes the density f(y; m, N, ρ²) of Y = R² for given values of y (0 < y < 1), m (> 1), N (> m), and ρ² (0 ≤ ρ² < 1).

B1 (Specify the accuracy). Set EPS ← ε (some desired accuracy, e.g., 10^-6).
B2 (Set the constants). Set a ← (m - 1)/2, b ← (N - m)/2.
B3 (Initialize). Set n ← 1, s ← (Γ(a + b)/(Γ(a)Γ(b))) y^(a-1) (1 - y)^(b-1), q ← (1 - ρ²)^(a+b), v ← q, PDF ← qs.
B4 (Compare a + n and (a + b + n)y). If a + n > (a + b + n)y, go to step B6.
B5 (Update the term and then accumulate; then increase n by 1). Set q ← q(a + b + n - 1)(ρ²)/n, v ← v + q, s ← sy(a + b + n - 1)/(a + n - 1), PDF ← PDF + qs, n ← n + 1, and return to step B4.
B6 (Find the error bound and check for convergence). Set bound ← sy(a + b + n - 1)(1 - v)/(a + n - 1). If bound ≤ EPS, terminate the algorithm. (PDF is the answer.)
B7 (Update the term and then accumulate; then increase n by 1). Set q ← q(a + b + n - 1)(ρ²)/n, v ← v + q, s ← sy(a + b + n - 1)/(a + n - 1), PDF ← PDF + qs, n ← n + 1, and return to step B6.
Algorithm C. This algorithm computes the quantile y_p of the distribution of Y = R² such that F(y_p; m, N, ρ²) = p for given values of m (> 1), N (> m), ρ² (0 ≤ ρ² < 1), and p (0 < p < 1).

C1 (Specify the accuracy and the maximum number of Newton's iterations allowed). Set EPS ← ε (some desired accuracy, e.g., 10^-6, for computing F(y; m, N, ρ²) and f(y; m, N, ρ²)), DELTA ← δ (some desired accuracy, e.g., 10^-4, for computing y_p), ITRMAX ← N_max (an integer, e.g., 10, for controlling the number of Newton's iterations).
C2 (Set the constants). Set a ← (m - 1)/2, b ← (N - m)/2, coeff ← Γ(a + b)/(Γ(a + 1)Γ(b)), q0 ← (1 - ρ²)^(a+b).
C3 (Loop on k, Newton's iteration). First initialize y by setting y ← 0.5. Then perform steps C4-C10 for k = 1, 2, ..., ITRMAX. (Steps C4-C10 constitute one iteration.)
C4 (Initialize within each iteration). Set n ← 1, t ← coeff · y^a (1 - y)^b, s ← at/(y(1 - y)), q ← q0, v ← q, CDF ← vt, PDF ← qs.
C5 (Compare a + n and (a + b + n)y). If a + n > (a + b + n)y, go to step C7.
C6 (Update the terms and then accumulate for both F(y; m, N, ρ²) and f(y; m, N, ρ²); then increase n by 1). Set q ← q(a + b + n - 1)(ρ²)/n, v ← v + q, s ← t(a + b + n - 1)/(1 - y), t ← ty(a + b + n - 1)/(a + n), CDF ← CDF + vt, PDF ← PDF + qs, n ← n + 1, and return to step C5.
C7 (Find the corresponding error bounds). Set bndcdf ← ty(a + b + n - 1)/((a + n) - (a + b + n)y), bndpdf ← t(a + b + n - 1)(1 - v)/(1 - y).
C8 (Check for convergence). If bndcdf ≤ EPS and bndpdf ≤ EPS, go to step C10.
C9 (Update the terms and then accumulate for F(y; m, N, ρ²) and/or f(y; m, N, ρ²); then increase n by 1). Set q ← q(a + b + n - 1)(ρ²)/n, v ← v + q. If bndcdf ≤ EPS, set s ← sy(a + b + n - 1)/(a + n - 1), PDF ← PDF + qs, n ← n + 1, bndpdf ← sy(a + b + n - 1)(1 - v)/(a + n - 1), and return to step C8; otherwise if bndpdf ≤ EPS, set t ← ty(a + b + n - 1)/(a + n), CDF ← CDF + vt, n ← n + 1, bndcdf ← ty(a + b + n - 1)/((a + n) - (a + b + n)y), and return to step C8; otherwise set s ← t(a + b + n - 1)/(1 - y), t ← ty(a + b + n - 1)/(a + n), CDF ← CDF + vt, PDF ← PDF + qs, n ← n + 1, and return to step C7.
C10 (Find new y and check for convergence of Newton's process). Set diff ← (CDF - p)/PDF. If y - diff ≤ 0, set y ← y/2; otherwise if y - diff ≥ 1, set y ← (y + 1)/2; otherwise set y ← y - diff. If |diff|/y ≤ DELTA, terminate the algorithm. (y is the answer.)
C11 (Output error message). Terminate the algorithm with the message "No convergence after N_max iterations".
FORTRAN codes based on the above algorithms are available upon request.
References
Abramowitz, M. and I.A. Stegun, Handbook of mathematical functions (Dover, New York, 1965).
Anderson, T.W., An introduction to multivariate statistical analysis, 2nd Edn. (Wiley, New York, 1984).
Ding, C.G., On the computation of the noncentral beta distribution, Comput. Statist. Data Anal., 18 (1994) 449-455.
Ding, C.G. and R.E. Bargmann, Algorithm AS 260: evaluation of the distribution of the square of the sample multiple-correlation coefficient, Appl. Statist., 40 (1991a) 195-198.
Ding, C.G. and R.E. Bargmann, Algorithm AS 261: quantiles of the distribution of the square of the sample multiple-correlation coefficient, Appl. Statist., 40 (1991b) 199-202.
Kennedy, W.J. and J.E. Gentle, Statistical computing (Marcel Dekker, New York, 1980).