
Computational Statistics & Data Analysis 22 (1996) 345-350

On the computation of the distribution of the square of the sample multiple correlation coefficient

Cherng G. Ding

Institute of Management Science, National Chiao Tung University, 4F, 114 Chung-Hsiao W. Road, Section 1, Taipei, Taiwan, ROC

Abstract

Two computationally simple methods are proposed to evaluate the distribution function and the density of the square of the sample multiple correlation coefficient. No auxiliary routine is required. The accuracy of recursive computations can be effectively controlled. The distribution function and the density can be evaluated concurrently because their computing formulas are closely related. This property can enhance the efficiency of Newton's method for computing the quantiles of the distribution. The corresponding algorithms are provided in a step-by-step form.

Keywords: Central beta distribution; Error bound; Gamma function; Multiple correlation coefficient; Newton's method; Noncentral beta distribution; Quantile; Series representation

1. Introduction

Let $X_1, X_2, \ldots, X_m$ have a multivariate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$, and let $R$ be the sample multiple correlation coefficient between $X_1$ and $X_2, \ldots, X_m$ based on a sample of size $N > m$. The density of $Y = R^2$ can be expressed as an infinite weighted sum of central beta densities as

0167-9473/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved


follows (see, e.g., Anderson, 1984, p. 145):

$$
f(y; m, N, \rho^2) = \sum_{i=0}^{\infty} \frac{\Gamma^2[(N-1)/2+i]\,(\rho^2)^i (1-\rho^2)^{(N-1)/2}}{\Gamma[(N-1)/2]\, i!\, \Gamma[(m-1)/2+i]\, \Gamma[(N-m)/2]}\; y^{(m-1)/2+i-1} (1-y)^{(N-m-2)/2} = \sum_{i=0}^{\infty} q_i\, f(y; a+i, b), \tag{1}
$$

where $0 \le y < 1$, $\rho$ is the population multiple correlation coefficient, $\Gamma(\alpha)$ is the gamma function, $a = (m-1)/2$, $b = (N-m)/2$, $q_i = \bigl(\Gamma(a+b+i)/\Gamma(a+b)\,i!\bigr)(\rho^2)^i (1-\rho^2)^{a+b}$, and $f(y; a, b)$ is the central beta density with shape parameters $a$ and $b$. The distribution function of $Y$ can, similarly, be expressed as an infinite weighted sum of central beta distribution functions. That is,

$$
P(Y \le y) = F(y; m, N, \rho^2) = \sum_{i=0}^{\infty} q_i\, F(y; a+i, b), \tag{2}
$$

where $F(y; a, b)$ is the central beta distribution function with shape parameters $a$ and $b$. A recursive algorithm for evaluating $F(y; m, N, \rho^2)$ based on the series representation in (2) was given by Ding and Bargmann (1991a). The algorithm sums the terms in (2) until the derived upper bound for the truncation error is less than some predetermined accuracy. Auxiliary routines for evaluating $F(y; a, b)$ and the natural logarithm of the gamma function are required. Ding and Bargmann (1991b) also computed the quantile $y_p$ such that $F(y_p; m, N, \rho^2) = p$ for given values of $m\,(>1)$, $N\,(>m)$, $\rho^2\,(0 \le \rho^2 < 1)$, and $p\,(0 < p < 1)$ by using the Illinois method.

In this paper, computationally simple methods are proposed to evaluate $F(y; m, N, \rho^2)$ (based on an alternative series representation) and $f(y; m, N, \rho^2)$ (based on the series representation in (1)). No auxiliary routine is required. The computational accuracy can be effectively controlled by using the error bounds obtained. Moreover, the recurrence formulas for computing $F(y; m, N, \rho^2)$ and $f(y; m, N, \rho^2)$ are closely related, and therefore $F(y; m, N, \rho^2)$ and $f(y; m, N, \rho^2)$ can be evaluated concurrently. This property can enhance the efficiency of Newton's method for computing the quantile $y_p$. The corresponding algorithms are provided in a step-by-step form. The numerical methods and the resulting algorithms basically follow those given in Ding (1994) for computing the noncentral beta distribution, which is an infinite weighted sum of central beta distributions with Poisson weights.

2. Numerical methods

The numerical methods discussed in this section for evaluating $F(y; m, N, \rho^2)$ and $f(y; m, N, \rho^2)$ are for $0 < y < 1$. No computation is needed for $y = 0$ or $y = 1$ since $f(0; m, N, \rho^2) = f(1; m, N, \rho^2) = F(0; m, N, \rho^2) = 0$, and $F(1; m, N, \rho^2) = 1$.

A recursive formula for evaluating $F(y; a+i, b)$ in (2) was given by Ding (1994) as follows:

$$
F(y; a+i, b) = \sum_{k=i}^{\infty} \frac{1-y}{a+b+k}\, f(y; a+k+1, b) = \sum_{k=i}^{\infty} t_k, \qquad i = 0, 1, \ldots, \tag{3}
$$

where

$$
t_0 = \frac{\Gamma(a+b)}{\Gamma(a+1)\Gamma(b)}\, y^a (1-y)^b, \qquad t_i = t_{i-1}\, y\,(a+b+i-1)/(a+i), \quad i \ge 1. \tag{4}
$$

It follows that $F(y; m, N, \rho^2)$ can be expressed by a new series in terms of central beta densities:

$$
F(y; m, N, \rho^2) = \sum_{i=0}^{\infty} q_i \sum_{k=i}^{\infty} t_k = \sum_{i=0}^{\infty} \Bigl(\sum_{k=0}^{i} q_k\Bigr) t_i = \sum_{i=0}^{\infty} v_i t_i, \tag{5}
$$
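The recursion (3)-(4) can be checked numerically. The following Python sketch (function names are ours, not the paper's) sums the terms $t_k$ and compares the result against a midpoint-rule integration of the central beta density; it is an illustration of the identity, not the author's FORTRAN code.

```python
import math

def central_beta_cdf(y, a, b, nterms=500):
    # Eq. (3)-(4): F(y; a, b) = sum_k t_k with
    # t_0 = Gamma(a+b)/(Gamma(a+1)Gamma(b)) * y^a * (1-y)^b,
    # t_k = t_{k-1} * y * (a+b+k-1)/(a+k).
    t = math.gamma(a + b) / (math.gamma(a + 1) * math.gamma(b)) * y**a * (1 - y)**b
    total = t
    for k in range(1, nterms):
        t *= y * (a + b + k - 1) / (a + k)
        total += t
    return total

def central_beta_cdf_quad(y, a, b, n=50000):
    # Reference value: midpoint-rule integration of the central beta density,
    # which also behaves well when a < 1 or b < 1 (no endpoint evaluation).
    c = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    h = y / n
    return c * h * sum(((k + 0.5) * h)**(a - 1) * (1 - (k + 0.5) * h)**(b - 1)
                       for k in range(n))
```

For instance, with $a = 1.5$, $b = 8$ (i.e., $m = 4$, $N = 20$), the two agree to quadrature accuracy; for $a = b = 1$ the series reproduces $F(y; 1, 1) = y$.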

where the terms are evaluated recursively by

$$
v_0 = q_0 = (1-\rho^2)^{a+b}, \qquad v_i = v_{i-1} + q_i, \quad q_i = q_{i-1}(a+b+i-1)(\rho^2)/i, \quad i \ge 1, \tag{6}
$$

and $t_i$, $i \ge 0$, as in (4).
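The weights $q_i$ are negative binomial probabilities, so the partial sums $v_i$ increase to 1; this is what makes the truncation bounds below effective. A minimal sketch of the recurrence in (6), with a hypothetical function name:

```python
def cumulative_weight(rho2, a, b, n):
    # v_{n-1} = sum_{i=0}^{n-1} q_i, with q_i updated by the recurrence
    # q_i = q_{i-1} * (a+b+i-1) * rho2 / i from Eq. (6).
    q = (1 - rho2) ** (a + b)   # q_0 = v_0 = (1 - rho^2)^{a+b}
    v = q
    for i in range(1, n):
        q = q * (a + b + i - 1) * rho2 / i
        v += q
    return v
```

Since the term ratio tends to $\rho^2 < 1$, $v_{n-1} \to 1$ geometrically; for $\rho^2 = 0$ the sum is exactly 1 after the first term.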

Since $m$ and $N$ are both integers, the gamma function in (4) is easy to evaluate in the sense that $\Gamma(\alpha) = (\alpha-1)!$ if $\alpha$ is an integer, and $\Gamma(\alpha) = (\alpha-1)(\alpha-2)\cdots\tfrac{1}{2}\sqrt{\pi}$ if $\alpha$ is a half-integer. $F(y; m, N, \rho^2)$ can be approximated by the finite sum $\sum_{i=0}^{n-1} v_i t_i$. Let $E_{F,n}$ denote the truncation error. Using the facts that $\sum_{i=0}^{\infty} q_i = 1$ (see, e.g., Abramowitz and Stegun, 1965, p. 556), and $\sum_{i=n}^{\infty} t_i < t_{n-1}\, y\,(a+b+n-1)/[(a+n) - (a+b+n)y]$ if $a+n > (a+b+n)y$ (see Ding, 1994), we have, under the same condition,

$$
E_{F,n} = \sum_{i=n}^{\infty} v_i t_i \le \sum_{i=n}^{\infty} t_i < t_{n-1}\, y\,(a+b+n-1)/[(a+n) - (a+b+n)y]. \tag{7}
$$

The error bound given above is a decreasing function of $n$ when $a+n > (a+b+n)y$. To evaluate $F(y; m, N, \rho^2)$, accumulate the terms in (5), computed recursively through (6), until the error bound is not greater than a specified accuracy $\varepsilon$. In fact, the evaluation of $F(y; m, N, \rho^2)$ requires no auxiliary routine.
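As a concrete illustration, the Python sketch below (our naming, not the paper's code) evaluates $F(y; m, N, \rho^2)$ by accumulating the series (5) with the recurrences (6) until the bound (7) is no greater than $\varepsilon$. The gamma ratios are formed from the integer/half-integer rule noted above, so no auxiliary routine is called.

```python
import math

def gamma_int_half(x):
    # Gamma at integer or half-integer x > 0, as noted in the text:
    # Gamma(k) = (k-1)! and Gamma(k + 1/2) = (k - 1/2) ... (1/2) * sqrt(pi).
    g = 1.0
    while x > 1.0:
        x -= 1.0
        g *= x
    return g if x == 1.0 else g * math.sqrt(math.pi)   # x == 0.5 case

def sqmc_cdf(y, m, N, rho2, eps=1e-10):
    # F(y; m, N, rho^2) via series (5), terms from (6), tail bound (7);
    # mirrors Algorithm A of the paper.
    a, b = (m - 1) / 2, (N - m) / 2
    t = gamma_int_half(a + b) / (gamma_int_half(a + 1) * gamma_int_half(b)) \
        * y**a * (1 - y)**b
    q = (1 - rho2) ** (a + b)
    v = q
    cdf = v * t
    n = 1
    while True:
        if a + n > (a + b + n) * y:               # bound (7) is valid here
            bound = t * y * (a + b + n - 1) / ((a + n) - (a + b + n) * y)
            if bound <= eps:
                return cdf
        q = q * (a + b + n - 1) * rho2 / n
        v += q
        t = t * y * (a + b + n - 1) / (a + n)
        cdf += v * t
        n += 1
```

With $\rho^2 = 0$ only $q_0$ survives and the series reduces to a central beta distribution function; e.g., for $m = 2$, $N = 3$ it reproduces $F(y) = (2/\pi)\arcsin\sqrt{y}$.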

The density $f(y; m, N, \rho^2)$ of $Y$ expressed in (1) is another series in terms of central beta densities. Its terms can be evaluated recursively, and are related to those of (5) as follows:

$$
f(y; m, N, \rho^2) = \sum_{i=0}^{\infty} q_i\, f(y; a+i, b) = \sum_{i=0}^{\infty} q_i s_i, \tag{8}
$$

where

$$
s_0 = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, y^{a-1}(1-y)^{b-1} = a t_0 / [y(1-y)], \qquad s_i = s_{i-1}\, y\,(a+b+i-1)/(a+i-1) = t_{i-1}(a+b+i-1)/(1-y), \quad i \ge 1, \tag{9}
$$

and $q_i$ and $t_i$, $i \ge 0$, are those in (6).

Let $E_{f,n}$ be the truncation error at $i = n$ for the series in (8). Since the sequence $\{s_i\}$ ($i \ge n$) is decreasing when $a+n > (a+b+n)y$, we have, under the same condition,

$$
E_{f,n} = \sum_{i=n}^{\infty} q_i s_i < s_n \sum_{i=n}^{\infty} q_i = s_n \Bigl(1 - \sum_{i=0}^{n-1} q_i\Bigr) = s_n (1 - v_{n-1}). \tag{10}
$$

Likewise, the above error bound is a decreasing function of $n$, and is used to control the accuracy of the evaluation of $f(y; m, N, \rho^2)$. Again, the evaluation of $f(y; m, N, \rho^2)$ requires no auxiliary routine.
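Analogously, a Python sketch (our naming) for $f(y; m, N, \rho^2)$ via (8)-(9), stopping when the bound (10) is no greater than $\varepsilon$:

```python
import math

def gamma_int_half(x):
    # Same integer/half-integer gamma rule as in the text.
    g = 1.0
    while x > 1.0:
        x -= 1.0
        g *= x
    return g if x == 1.0 else g * math.sqrt(math.pi)

def sqmc_pdf(y, m, N, rho2, eps=1e-10):
    # f(y; m, N, rho^2) via series (8), terms (9), tail bound (10);
    # mirrors Algorithm B of the paper.
    a, b = (m - 1) / 2, (N - m) / 2
    t = gamma_int_half(a + b) / (gamma_int_half(a + 1) * gamma_int_half(b)) \
        * y**a * (1 - y)**b
    s = a * t / (y * (1 - y))                 # s_0 = a t_0 / [y (1-y)], Eq. (9)
    q = (1 - rho2) ** (a + b)
    v = q
    pdf = q * s
    n = 1
    while True:
        if a + n > (a + b + n) * y:
            # bound (10): E_{f,n} <= s_n (1 - v_{n-1}),
            # with s_n = t_{n-1} (a+b+n-1)/(1-y) and v = v_{n-1} here.
            bound = t * (a + b + n - 1) * (1 - v) / (1 - y)
            if bound <= eps:
                return pdf
        q = q * (a + b + n - 1) * rho2 / n
        s = t * (a + b + n - 1) / (1 - y)     # s_n built from t_{n-1}
        t = t * y * (a + b + n - 1) / (a + n)
        pdf += q * s
        v += q
        n += 1
```

For $\rho^2 = 0$ the series collapses to the central beta density; e.g., $m = 3$, $N = 5$ gives $a = b = 1$, the uniform density on $(0, 1)$.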

Let $G(y) = F(y; m, N, \rho^2) - p$. The quantile $y_p$ is to be obtained by solving the equation $G(y) = 0$. Since $G(0) = -p$, $G(1) = 1 - p$, and $G$ is strictly increasing on $[0, 1]$, the solution $y_p$ of $G(y) = 0$ is unique. An efficient root-finding method is Newton's method, which requires the evaluation of both $F(y; m, N, \rho^2)$ and $G'(y) = f(y; m, N, \rho^2)$. The process is to repeat computing (see, e.g., Kennedy and Gentle, 1980, pp. 72-73)

$$
y_{j+1} = y_j - \frac{F(y_j; m, N, \rho^2) - p}{f(y_j; m, N, \rho^2)}, \qquad j = 0, 1, \ldots, \tag{11}
$$

until $|y_{j+1} - y_j| \le \delta y_{j+1}$, where $\delta$ is a specified accuracy. It is obvious that $y_p = 0$ for $p = 0$ and $y_p = 1$ for $p = 1$. No computation is needed for these cases. For $0 < p < 1$, perform iterations (11) with the starting value $y_0 = 0.5$. For each iteration, $F(y_j; m, N, \rho^2)$ and $f(y_j; m, N, \rho^2)$ can be evaluated concurrently rather than independently because their computing formulas (5) and (8) are closely related. This property can greatly enhance the computational efficiency of Newton's method. To ensure the legality of the iterate $y_{j+1}$, use the same adjustments as in Ding (1994): if $y_{j+1} \le 0$, replace it by $y_j/2$; if $y_{j+1} \ge 1$, replace it by $(y_j + 1)/2$. Note that the evaluations of $F(y_j; m, N, \rho^2)$ and $f(y_j; m, N, \rho^2)$ should be precise enough so that the accuracy of Newton's solution $y_p$ can be ensured. Also, the number of iterations should be controlled.
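Putting the pieces together, here is a self-contained Python sketch (our naming throughout) of the safeguarded Newton iteration (11), with $F$ and $f$ evaluated concurrently from the shared recurrences, in the spirit of (5)-(10) and Algorithm C:

```python
import math

def _gamma_ih(x):
    # Gamma at integer/half-integer arguments (the only ones that arise here).
    g = 1.0
    while x > 1.0:
        x -= 1.0
        g *= x
    return g if x == 1.0 else g * math.sqrt(math.pi)

def _cdf_pdf(y, a, b, rho2, eps):
    # Evaluate F and f concurrently from the shared recurrences (5)-(10).
    t = _gamma_ih(a + b) / (_gamma_ih(a + 1) * _gamma_ih(b)) * y**a * (1 - y)**b
    s = a * t / (y * (1 - y))
    q = (1 - rho2) ** (a + b)
    v = q
    cdf, pdf = v * t, q * s
    n = 1
    while True:
        if a + n > (a + b + n) * y:
            bnd_cdf = t * y * (a + b + n - 1) / ((a + n) - (a + b + n) * y)
            bnd_pdf = t * (a + b + n - 1) * (1 - v) / (1 - y)
            if bnd_cdf <= eps and bnd_pdf <= eps:
                return cdf, pdf
        q = q * (a + b + n - 1) * rho2 / n
        s = t * (a + b + n - 1) / (1 - y)
        t = t * y * (a + b + n - 1) / (a + n)
        v += q
        cdf += v * t
        pdf += q * s
        n += 1

def sqmc_quantile(p, m, N, rho2, eps=1e-10, delta=1e-8, itmax=50):
    # Solve F(y_p; m, N, rho^2) = p by the safeguarded Newton iteration (11).
    if p == 0.0:
        return 0.0
    if p == 1.0:
        return 1.0
    a, b = (m - 1) / 2, (N - m) / 2
    y = 0.5                              # starting value, as in the text
    for _ in range(itmax):
        cdf, pdf = _cdf_pdf(y, a, b, rho2, eps)
        diff = (cdf - p) / pdf
        if y - diff <= 0:                # keep the iterate inside (0, 1)
            y = y / 2
        elif y - diff >= 1:
            y = (y + 1) / 2
        else:
            y = y - diff
        if abs(diff) <= delta * y:
            return y
    raise RuntimeError("no convergence after %d iterations" % itmax)
```

With $\rho^2 = 0$ and $m = 3$, $N = 5$ the distribution is uniform, so the $p$-quantile is $p$ itself; for $m = 2$, $N = 3$ the median is $\sin^2(\pi/4) = 0.5$.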


3. Algorithms

According to the formulas discussed in Section 2, we provide three effective algorithms in a step-by-step form for evaluating $F(y; m, N, \rho^2)$, $f(y; m, N, \rho^2)$, and the quantile $y_p$. The algorithm for computing $y_p$ is particularly efficient.

Algorithm A. This algorithm computes the distribution function $F(y; m, N, \rho^2)$ of $Y = R^2$ for given values of $y\,(0 < y < 1)$, $m\,(>1)$, $N\,(>m)$, and $\rho^2\,(0 \le \rho^2 < 1)$.

A1 (Specify the accuracy). Set $EPS \leftarrow \varepsilon$ (some desired accuracy, e.g., $10^{-6}$).

A2 (Set the constants). Set $a \leftarrow (m-1)/2$, $b \leftarrow (N-m)/2$.

A3 (Initialize). Set $n \leftarrow 1$, $t \leftarrow (\Gamma(a+b)/\Gamma(a+1)\Gamma(b))\, y^a (1-y)^b$, $q \leftarrow (1-\rho^2)^{a+b}$, $v \leftarrow q$, $CDF \leftarrow vt$.

A4 (Compare $a+n$, $(a+b+n)y$). If $a+n > (a+b+n)y$, go to step A6.

A5 (Update the term and then accumulate. Then increase $n$ by 1). Set $q \leftarrow q(a+b+n-1)(\rho^2)/n$, $v \leftarrow v+q$, $t \leftarrow ty(a+b+n-1)/(a+n)$, $CDF \leftarrow CDF + vt$, $n \leftarrow n+1$, and return to step A4.

A6 (Find the error bound and check for convergence). Set $bound \leftarrow ty(a+b+n-1)/((a+n)-(a+b+n)y)$. If $bound \le EPS$, terminate the algorithm. ($CDF$ is the answer.)

A7 (Update the term and then accumulate. Then increase $n$ by 1). Set $q \leftarrow q(a+b+n-1)(\rho^2)/n$, $v \leftarrow v+q$, $t \leftarrow ty(a+b+n-1)/(a+n)$, $CDF \leftarrow CDF + vt$, $n \leftarrow n+1$, and return to step A6.

Algorithm B. This algorithm computes the density $f(y; m, N, \rho^2)$ of $Y = R^2$ for given values of $y\,(0 < y < 1)$, $m\,(>1)$, $N\,(>m)$, and $\rho^2\,(0 \le \rho^2 < 1)$.

B1 (Specify the accuracy). Set $EPS \leftarrow \varepsilon$ (some desired accuracy, e.g., $10^{-6}$).

B2 (Set the constants). Set $a \leftarrow (m-1)/2$, $b \leftarrow (N-m)/2$.

B3 (Initialize). Set $n \leftarrow 1$, $s \leftarrow (\Gamma(a+b)/\Gamma(a)\Gamma(b))\, y^{a-1}(1-y)^{b-1}$, $q \leftarrow (1-\rho^2)^{a+b}$, $v \leftarrow q$, $PDF \leftarrow qs$.

B4 (Compare $a+n$, $(a+b+n)y$). If $a+n > (a+b+n)y$, go to step B6.

B5 (Update the term and then accumulate. Then increase $n$ by 1). Set $q \leftarrow q(a+b+n-1)(\rho^2)/n$, $v \leftarrow v+q$, $s \leftarrow sy(a+b+n-1)/(a+n-1)$, $PDF \leftarrow PDF + qs$, $n \leftarrow n+1$, and return to step B4.

B6 (Find the error bound and check for convergence). Set $bound \leftarrow sy(a+b+n-1)(1-v)/(a+n-1)$. If $bound \le EPS$, terminate the algorithm. ($PDF$ is the answer.)

B7 (Update the term and then accumulate. Then increase $n$ by 1). Set $q \leftarrow q(a+b+n-1)(\rho^2)/n$, $v \leftarrow v+q$, $s \leftarrow sy(a+b+n-1)/(a+n-1)$, $PDF \leftarrow PDF + qs$, $n \leftarrow n+1$, and return to step B6.

Algorithm C. This algorithm computes the quantile $y_p$ of the distribution of $Y = R^2$ such that $F(y_p; m, N, \rho^2) = p$ for given values of $m\,(>1)$, $N\,(>m)$, $\rho^2\,(0 \le \rho^2 < 1)$, and $p\,(0 < p < 1)$.

C1 (Specify the accuracy and the maximum number of Newton's iterations allowed). Set $EPS \leftarrow \varepsilon$ (some desired accuracy, e.g., $10^{-6}$, for computing $F(y; m, N, \rho^2)$ and $f(y; m, N, \rho^2)$), $DELTA \leftarrow \delta$ (some desired accuracy, e.g., $10^{-4}$, for computing $y_p$), $ITRMAX \leftarrow N_{\max}$ (an integer, e.g., 10, for controlling the number of Newton's iterations).

C2 (Set the constants). Set $a \leftarrow (m-1)/2$, $b \leftarrow (N-m)/2$, $coeff \leftarrow \Gamma(a+b)/\Gamma(a+1)\Gamma(b)$, $q_0 \leftarrow (1-\rho^2)^{a+b}$.

C3 (Loop on $k$, Newton's iteration). First initialize $y$ by setting $y \leftarrow 0.5$. Then perform steps C4-C10 for $k = 1, 2, \ldots, ITRMAX$. (Steps C4-C10 constitute one iteration.)

C4 (Initialize within each iteration). Set $n \leftarrow 1$, $t \leftarrow coeff\; y^a (1-y)^b$, $s \leftarrow at/y/(1-y)$, $q \leftarrow q_0$, $v \leftarrow q$, $CDF \leftarrow vt$, $PDF \leftarrow qs$.

C5 (Compare $a+n$, $(a+b+n)y$). If $a+n > (a+b+n)y$, go to step C7.

C6 (Update the term and then accumulate for both of $F(y; m, N, \rho^2)$ and $f(y; m, N, \rho^2)$. Then increase $n$ by 1). Set $q \leftarrow q(a+b+n-1)(\rho^2)/n$, $v \leftarrow v+q$, $s \leftarrow t(a+b+n-1)/(1-y)$, $t \leftarrow ty(a+b+n-1)/(a+n)$, $CDF \leftarrow CDF + vt$, $PDF \leftarrow PDF + qs$, $n \leftarrow n+1$, and return to step C5.

C7 (Find the corresponding error bounds). Set $bndcdf \leftarrow ty(a+b+n-1)/((a+n)-(a+b+n)y)$, $bndpdf \leftarrow t(a+b+n-1)(1-v)/(1-y)$.

C8 (Check for convergence). If $bndcdf \le EPS$ and $bndpdf \le EPS$, go to step C10.

C9 (Update the terms and then accumulate for $F(y; m, N, \rho^2)$ and/or $f(y; m, N, \rho^2)$. Then increase $n$ by 1). Set $q \leftarrow q(a+b+n-1)(\rho^2)/n$, $v \leftarrow v+q$. If $bndcdf \le EPS$, set $s \leftarrow sy(a+b+n-1)/(a+n-1)$, $PDF \leftarrow PDF + qs$, $n \leftarrow n+1$, $bndpdf \leftarrow sy(a+b+n-1)(1-v)/(a+n-1)$, and return to step C8; otherwise if $bndpdf \le EPS$, set $t \leftarrow ty(a+b+n-1)/(a+n)$, $CDF \leftarrow CDF + vt$, $n \leftarrow n+1$, $bndcdf \leftarrow ty(a+b+n-1)/((a+n)-(a+b+n)y)$, and return to step C8; otherwise set $s \leftarrow t(a+b+n-1)/(1-y)$, $t \leftarrow ty(a+b+n-1)/(a+n)$, $CDF \leftarrow CDF + vt$, $PDF \leftarrow PDF + qs$, $n \leftarrow n+1$, and return to step C7.

C10 (Find new $y$ and check for convergence of Newton's process). Set $diff \leftarrow (CDF - p)/PDF$. If $y - diff < 0$, set $y \leftarrow y/2$; otherwise if $y - diff > 1$, set $y \leftarrow (y+1)/2$; otherwise set $y \leftarrow y - diff$. If $|diff|/y \le DELTA$, terminate the algorithm. ($y$ is the answer.)

C11 (Output error message). Terminate the algorithm with the message "No convergence after $N_{\max}$ iterations".

FORTRAN codes based on the above algorithms are available upon request.

References

Abramowitz, M. and I.A. Stegun, Handbook of mathematical functions (Dover, New York, 1965).

Anderson, T.W., An introduction to multivariate statistical analysis, 2nd Edn. (Wiley, New York, 1984).

Ding, C.G., On the computation of the noncentral beta distribution, Comput. Statist. Data Anal., 18 (1994) 449-455.

Ding, C.G. and R.E. Bargmann, Algorithm AS 260: evaluation of the distribution of the square of the sample multiple-correlation coefficient, Appl. Statist., 40 (1991a) 195-198.

Ding, C.G. and R.E. Bargmann, Algorithm AS 261: quantiles of the distribution of the square of the sample multiple-correlation coefficient, Appl. Statist., 40 (1991b) 199-202.
