
Communications in Statistics - Theory and Methods

Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lsta20

Trimmed least squares estimator as best trimmed linear conditional estimator for linear regression model

Lin-An Chen (a) and Peter Thompson (b)

(a) Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan

(b) Mathematics Department, Wabash College, Crawfordsville, IN 47933, U.S.A.

Published online: 27 Jun 2007.

To cite this article: Lin-An Chen & Peter Thompson (1998) Trimmed least squares estimator as best trimmed linear conditional estimator for linear regression model, Communications in Statistics - Theory and Methods, 27:7, 1835-1849

To link to this article: http://dx.doi.org/10.1080/03610929808832193


TRIMMED LEAST SQUARES ESTIMATOR AS BEST TRIMMED LINEAR CONDITIONAL ESTIMATOR FOR LINEAR REGRESSION MODEL

Lin-An Chen (1) and Peter Thompson (2)

(1) Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan.

(2) Mathematics Department, Wabash College, Crawfordsville, IN 47933, U.S.A.

Key words: Instrumental variables estimator; linear conditional estimator; linear regression; regression quantile; trimmed least squares estimator.

ABSTRACT

A class of trimmed linear conditional estimators based on regression quantiles for the linear regression model is introduced. This class serves as a robust analogue of non-robust linear unbiased estimators. Asymptotic analysis then shows that the trimmed least squares estimator based on regression quantiles (Koenker and Bassett (1978)) is the best in this estimator class in terms of asymptotic covariance matrices. The class of trimmed linear conditional estimators contains the Mallows-type bounded influence trimmed means (see De Jongh et al. (1988)) and trimmed instrumental variables estimators. A large sample methodology based on the trimmed instrumental variables estimator for confidence ellipsoids and hypothesis testing is also provided.


1. INTRODUCTION

Consider the linear regression model

$$y = X\beta + \epsilon, \qquad (1.1)$$

where $y$ is an $n \times 1$ vector of observations of the dependent variable, $X$ is an $n \times p$ matrix of observations of $p - 1$ independent variables with 1's in the first column, and $\epsilon$ is a vector of i.i.d. disturbance variables. The interest is in estimating the parameter vector $\beta$.

It is well known that the least squares estimator is best in covariance matrix among the unbiased subclass of linear estimators. However, the least squares estimator is highly sensitive to quite small departures from normality and to the presence of outliers. Thus, a great number of papers in the literature develop robust alternatives for analyzing the linear regression model; see, for example, Ruppert and Carroll (1980), Welsh (1987), Koenker and Portnoy (1987), Kim (1992), Chen (1997) and Chen and Chiang (1996). We then consider the question: is there a robust estimator which is best in asymptotic covariance matrix in some class of robust estimators? To be specific, let $y_t$ be the subvector of $y$ after all suspected outliers are trimmed. The vector $y_t$ has a corresponding trimmed model that we take as

$$y_t = X_t\beta + \epsilon_t, \qquad (1.2)$$

where $X_t$ contains the corresponding rows of $X$. It is then natural to ask if there is an estimator which is best in terms of asymptotic covariance matrices in some subclass of linear estimators for this trimmed model. For large sample comparison of covariance matrices, we replace the condition of unbiasedness in linear unbiased estimation with a condition on the trimmed linear estimation. The purpose of this paper is to introduce a class of trimmed linear estimators specified by a trimming procedure and to derive the best estimator in this class.

The trimmed linear regression model is determined by observations removed from model (1.1), which we take to be those lying outside the regression quantiles (see Koenker and Bassett (1978)). We then introduce a class of trimmed linear conditional estimators (LCE) (see (2.2)) as an analogue of linear unbiased estimators for the trimmed regression model. The asymptotic properties of these estimators are then derived, and the trimmed least squares estimator (LSE) based on the regression quantiles, which was proposed by Koenker and Bassett (1978) and studied by Ruppert and Carroll (1980), is shown to be the best trimmed LCE. As a subclass of trimmed LCE's, a class of trimmed instrumental variables estimators (IVE) is also introduced, where instrumental variables are variables independent of the disturbance variables and correlated with the independent variables (see Dhrymes (1970, pp. 296-298)). It is also shown that the best trimmed IVE exists and is also a best trimmed LCE. We also note that the class of Mallows-type bounded influence trimmed means (see De Jongh et al. (1988)) is a subclass of trimmed LCE's. In Section 2, we introduce the class of trimmed LCE's, and their large sample properties are derived in Section 3. In Section 4, we introduce a class of trimmed IVE's. We derive the best trimmed IVE in Section 5. A large sample methodology for confidence ellipsoids and hypothesis testing based on the trimmed IVE is introduced in Section 6. Section 7 gives the proofs of the theorems.

2. THE TRIMMED LINEAR CONDITIONAL ESTIMATORS

Recall that the regression model is $y = X\beta + \epsilon$ from (1.1). Let $y_i$ be the $i$-th element of $y$ and $x_i'$ the $i$-th row of $X$, for $i = 1, \dots, n$. For $0 < \alpha < 1$, the $\alpha$-th regression quantile $\hat\beta(\alpha)$ of $\beta$ under the model with intercept, as defined by Koenker and Bassett (1978), is any vector $b$ that solves

$$\min_{b \in R^p} \sum_{i=1}^{n} \rho_\alpha(y_i - x_i'b), \qquad \alpha \in (0, 1),$$

where $\rho_\alpha(u) = u\,\psi_\alpha(u)$ with $\psi_\alpha(u) = \alpha - I(u < 0)$. Here $I(A)$ is the indicator function of the event $A$. As described in Koenker and Bassett (1978), the process $\hat\beta(\alpha)$ is piecewise constant and uniquely defined between the breakpoints. It successfully generalizes almost all of the properties of one-sample quantiles, and may be computed very quickly using parametric linear programming (see Koenker and d'Orey (1987)).
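As a concrete illustration of the estimator just defined, the sketch below computes $\hat\beta(\alpha)$ with statsmodels' QuantReg. Note that QuantReg uses iteratively reweighted least squares rather than the parametric linear programming of Koenker and d'Orey (1987), and the simulated data, sample size, and quantile levels are illustrative assumptions, not part of the paper.

```python
# A minimal sketch of computing regression quantiles beta-hat(alpha);
# the data-generating process here is an illustrative assumption.
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # 1's in first column
beta_true = np.array([1.0, 2.0, -1.0])
y = X @ beta_true + rng.standard_t(df=3, size=n)  # heavy-tailed i.i.d. disturbances

def regression_quantile(y, X, alpha):
    """alpha-th regression quantile of Koenker and Bassett (1978)."""
    return QuantReg(y, X).fit(q=alpha).params

b_low = regression_quantile(y, X, 0.10)   # beta-hat(alpha_1)
b_high = regression_quantile(y, X, 0.90)  # beta-hat(alpha_2)
```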

For $0 < \alpha_1 < \alpha_2 < 1$, let $\hat\beta(\alpha_1)$ and $\hat\beta(\alpha_2)$ be the regression quantiles. We then define the trimming matrix $A = (a_{ij},\ i, j = 1, \dots, n)$ with $a_{ij} = I(i = j$ and $x_i'\hat\beta(\alpha_1) \le y_i \le x_i'\hat\beta(\alpha_2))$. After outliers are trimmed by the regression quantiles, the submodel (1.2) can be written as

$$Ay = AX\beta + A\epsilon. \qquad (2.1)$$
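Continuing the sketch above (reusing X, y, b_low and b_high), the trimming matrix $A$ is a diagonal 0-1 matrix, so in practice one records its diagonal rather than forming the $n \times n$ matrix:

```python
# Diagonal of the trimming matrix A: keep observation i iff
# x_i' beta-hat(alpha_1) <= y_i <= x_i' beta-hat(alpha_2).
keep = (X @ b_low <= y) & (y <= X @ b_high)   # boolean vector, a_ii = keep[i]
# The trimmed submodel Ay = AX beta + A eps corresponds to the kept rows:
X_trim, y_trim = X[keep], y[keep]
```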

Since $A$ is random, the error vector $A\epsilon$ is now not a set of independent variables. We are now ready to define a subclass of linear trimmed estimators.

Definition 2.1. A statistic $\hat\beta_{tc}$ is called an $(\alpha_1, \alpha_2)$-trimmed LCE if there exist a stochastic $p \times p$ matrix $H$ and a nonstochastic $n \times p$ matrix $H_0$ such that it has the representation

$$\hat\beta_{tc} = H H_0' A y, \qquad (2.2)$$

where the matrices $H$ and $H_0$ satisfy the following two conditions:

(a1) $nH \to \mathcal{H}$ in probability, where $\mathcal{H}$ is a full rank $p \times p$ matrix.

(a2) $H H_0' A X = I_p + o_p(n^{-1/2})$, where $I_p$ is the $p \times p$ identity matrix.

We note that "conditional" means "conditional on the sample being trimmed." Condition (a1) is similar to the usual condition that $n^{-1}X'X$ converges to a positive definite matrix. Condition (a2) plays a role for the trimmed LCE analogous to that of unbiasedness for linear unbiased estimation. Suppose that $By$ is a linear unbiased estimator of $\beta$. Then, using the fact that $BX = I_p$, nonstochastic matrices $H$ and $H_0$ such that $H H_0' = (\alpha_2 - \alpha_1)^{-1} B$ make $\hat\beta_{tc}$ an example of a trimmed LCE. This implies that the class of trimmed LCE's is at least as big as the class of linear unbiased estimators.

Let $\epsilon$ have distribution function $F$ with probability density function $f$. Denote by $h_i'$ the $i$-th row of $H_0$ and by $r_{ij}$ the $j$-th element of the vector $r_i$, for $r = x$ and $h$. The following conditions are similar to the standard ones for linear regression models as given in Ruppert and Carroll (1980) and Koenker and Portnoy (1987), for example:

(a3) $n^{-1}\sum_{i=1}^{n} r_{ij}^4 = O(1)$ for $r = x$ and $h$ and all $j$,

(a4) $n^{-1}X'X = Q_x + o(1)$, $n^{-1}H_0'X = Q_{hx} + o(1)$ and $n^{-1}H_0'H_0 = Q_h + o(1)$, where $Q_x$ and $Q_h$ are positive definite matrices and $Q_{hx}$ is a full rank matrix,

(a5) $n^{-1}\sum_{i=1}^{n} r_i = \theta_r + o(1)$ for $r = x$ and $h$, where $\theta_r$ is a finite vector with first element value 1,

(a6) the probability density function and its derivative are both bounded and bounded away from 0 in a neighborhood of $F^{-1}(\alpha)$ for $\alpha \in (0, 1)$.

For any two positive definite $p \times p$ matrices $Q_1$ and $Q_2$, we say that $Q_1$ is smaller than $Q_2$ if $Q_2 - Q_1$ is positive semidefinite.

Definition 2.2. An estimator in the class of $(\alpha_1, \alpha_2)$-trimmed LCE's is called the best if its asymptotic covariance matrix is smaller than or equal to that of any estimator in this class.

In analogy with the case of the best linear unbiased estimator, we will show that the best $(\alpha_1, \alpha_2)$-trimmed LCE always exists. It can also be seen that the asymptotic covariance matrix of the best $(\alpha_1, \alpha_2)$-trimmed LCE varies with the trimming percentages $(\alpha_1, \alpha_2)$. We then may further expect the existence of a uniformly best one.

Definition 2.3. A trimmed LCE for some trimming percentage is said to be a uniformly best trimming LCE if its asymptotic covariance matrix is smaller than or equal to that of any best $(\alpha_1, \alpha_2)$-trimmed LCE, for all $0 < \alpha_1 < 0.5 < \alpha_2 < 1$.

We are not going to study when a uniformly best trimming LCE exists.

3. ASYMPTOTIC PROPERTIES OF THE TRIMMED LINEAR CONDITIONAL ESTIMATOR

The following theorem gives a "Bahadur" representation of the $(\alpha_1, \alpha_2)$-trimmed LCE.

Theorem 3.1. With assumptions (a1)-(a6), we have

$$n^{1/2}(\hat\beta_{tc} - (\beta + \gamma_{tc})) = \mathcal{H}\, n^{-1/2} \sum_{i=1}^{n} \Big\{ h_i \big[ \epsilon_i I(F^{-1}(\alpha_1) \le \epsilon_i \le F^{-1}(\alpha_2)) - \lambda \big] + Q_{hx} Q_x^{-1} x_i \big[ F^{-1}(\alpha_1) I(\epsilon_i < F^{-1}(\alpha_1)) + F^{-1}(\alpha_2) I(\epsilon_i > F^{-1}(\alpha_2)) - ((1 - \alpha_2) F^{-1}(\alpha_2) + \alpha_1 F^{-1}(\alpha_1)) \big] \Big\} + o_p(1), \qquad (3.1)$$

where $\lambda = \int_{F^{-1}(\alpha_1)}^{F^{-1}(\alpha_2)} \epsilon\, dF(\epsilon)$ and $\gamma_{tc} = \lambda\, \mathcal{H} \theta_h$.

The limiting distribution of the $(\alpha_1, \alpha_2)$-trimmed LCE follows from the central limit theorem (see, e.g., Serfling (1980, p. 30)).

Corollary 3.2. $n^{1/2}(\hat\beta_{tc} - (\beta + \gamma_{tc}))$ has an asymptotic normal distribution with zero mean vector and asymptotic covariance matrix given by the covariance matrix of the summands in representation (3.1).

The $(\alpha_1, \alpha_2)$-trimmed LSE proposed by Koenker and Bassett (1978) is defined by

$$\hat\beta_{tr} = (X'AX)^{-1} X'Ay.$$

From the study of this estimator by Ruppert and Carroll (1980), we have

$$n^{-1} X'AX = (\alpha_2 - \alpha_1) Q_x + o_p(1).$$

By letting $H = (X'AX)^{-1}$ and $H_0 = X$, one can see that condition (a2) also holds for $\hat\beta_{tr}$. So the $(\alpha_1, \alpha_2)$-trimmed LSE is in the class of $(\alpha_1, \alpha_2)$-trimmed LCE's. Moreover, Ruppert and Carroll (1980) provided the result that $n^{1/2}(\hat\beta_{tr} - (\beta + \gamma_{tr}))$, where $\gamma_{tr} = (\alpha_2 - \alpha_1)^{-1} \lambda\, Q_x^{-1} \theta_x$, has an asymptotic normal distribution with zero mean vector and covariance matrix $\sigma^2(\alpha_1, \alpha_2) Q_x^{-1}$, where

$$\sigma^2(\alpha_1, \alpha_2) = (\alpha_2 - \alpha_1)^{-2} \Big[ \int_{F^{-1}(\alpha_1)}^{F^{-1}(\alpha_2)} (\epsilon - \bar\lambda)^2\, dF(\epsilon) + \alpha_1 (F^{-1}(\alpha_1) - \bar\lambda)^2 + (1 - \alpha_2)(F^{-1}(\alpha_2) - \bar\lambda)^2 \Big], \qquad (3.2)$$

with $\bar\lambda = \lambda + \alpha_1 F^{-1}(\alpha_1) + (1 - \alpha_2) F^{-1}(\alpha_2)$.
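For computation, $\hat\beta_{tr} = (X'AX)^{-1}X'Ay$ reduces to ordinary least squares on the kept rows, since $A$ is diagonal with 0-1 entries. A minimal sketch continuing the earlier ones:

```python
# (alpha_1, alpha_2)-trimmed LSE: least squares on the observations kept by A.
beta_tr = np.linalg.solve(X_trim.T @ X_trim, X_trim.T @ y_trim)
```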

The following lemma orders the matrices $\mathcal{H} Q_h \mathcal{H}'$ and $Q_x^{-1}$.

Lemma 3.3. For any matrices $\mathcal{H}$ and $Q_h$ induced from conditions (a1) and (a4), the difference

$$\mathcal{H} Q_h \mathcal{H}' - (\alpha_2 - \alpha_1)^{-2} Q_x^{-1} \qquad (3.3)$$

is positive semidefinite.

The relation in (3.3) then implies the following main theorem.

Theorem 3.4. Under conditions (a3)-(a6), the $(\alpha_1, \alpha_2)$-trimmed LSE $\hat\beta_{tr}$ is the best $(\alpha_1, \alpha_2)$-trimmed LCE.

Since the $(\alpha_1, \alpha_2)$-trimmed LSE always exists, the best $(\alpha_1, \alpha_2)$-trimmed LCE always exists. However, the existence of a uniformly best trimming LCE depends on the underlying distribution. Suppose that there is an $(\alpha_1^*, \alpha_2^*)$ such that $\sigma^2(\alpha_1^*, \alpha_2^*) \le \sigma^2(\alpha_1, \alpha_2)$ for all $(\alpha_1, \alpha_2)$; then the best $(\alpha_1^*, \alpha_2^*)$-trimmed LCE is the uniformly best trimming LCE. Two questions are raised by the above discussion. First, how big is the class of $(\alpha_1, \alpha_2)$-trimmed LCE's? Secondly, are the best $(\alpha_1, \alpha_2)$-trimmed LCE and the uniformly best trimming LCE unique if they exist? We are not going to study the scope of the trimmed LCE's. However, we will introduce a class of $(\alpha_1, \alpha_2)$-trimmed IVE's which is shown to be a subclass of the $(\alpha_1, \alpha_2)$-trimmed LCE's. We will also show that if there is a best $(\alpha_1, \alpha_2)$-trimmed LCE, then it is asymptotically equivalent to the best $(\alpha_1, \alpha_2)$-trimmed IVE. Let $H = (X'WAX)^{-1}$ and $H_0 = WX$ with $W$ a diagonal matrix of weights. This shows that the Mallows-type bounded influence trimmed means also form a subclass of trimmed LCE's (see De Jongh et al. (1988) for their large sample properties). In particular, $\hat\beta_{tr}$ is the one with $W$ the identity matrix and then belongs to this subclass. A direct result from Theorem 3.4 is that $\hat\beta_{tr}$ is the best Mallows-type bounded influence trimmed mean. In the next section, we will introduce the trimmed IVE.

4. TRIMMED INSTRUMENTAL VARIABLES ESTIMATORS

Let $S$ be the $n \times k$, $k \ge p$, observation matrix of instrumental variables. Each instrument is a variable independent of the disturbance variables and correlated with the independent variables. Denote by $s_i'$ the $i$-th row of $S$ and by $s_{ij}$ the $j$-th element of $s_i$. We add the following conditions:

(a7) $n^{-1}\sum_{i=1}^{n} s_{ij}^4 = O(1)$ for all $j$,

(a8) $n^{-1}S'X = Q_{sx} + o(1)$ and $n^{-1}S'S = Q_s + o(1)$, where $Q_s$ is a $k \times k$ positive definite matrix and $Q_{sx}$ is a full rank matrix,

(a9) $n^{-1}\sum_{i=1}^{n} s_i = \theta_s + o(1)$.

We then define the trimmed IVE.

Definition 4.1. Let $P_s$ be the idempotent matrix $S(S'S)^{-1}S'$. The trimmed IVE is defined by

$$\hat\beta_s = ((AX)' P_s AX)^{-1} (AX)' P_s Ay. \qquad (4.1)$$

It will be shown that the trimmed IVE is an $(\alpha_1, \alpha_2)$-trimmed LCE. Even in this trimmed regression model, it is apparent that there may exist many sets of instruments that one might consider using. Thus, we shall be concerned with finding the instruments for which the corresponding trimmed IVE has the smallest covariance matrix.
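A computational sketch of (4.1), continuing the earlier ones. The instrument matrix S below (the regressors plus independent noise, so that each instrument is correlated with X and independent of the disturbances) is purely an illustrative assumption:

```python
# Trimmed IVE of Definition 4.1: beta-hat_s = ((AX)'P_s AX)^{-1} (AX)'P_s Ay.
S = np.column_stack([np.ones(n), X[:, 1:] + 0.5 * rng.normal(size=(n, p - 1))])
Ps = S @ np.linalg.solve(S.T @ S, S.T)        # projection onto the column space of S
AX, Ay = keep[:, None] * X, keep * y          # AX and Ay without forming A
beta_s = np.linalg.solve(AX.T @ Ps @ AX, AX.T @ Ps @ Ay)
```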

Definition 4.2. (1) An estimator in the class of $(\alpha_1, \alpha_2)$-trimmed IVE's is called the best if its asymptotic covariance matrix is smaller than or equal to that of any estimator in this class.

(2) A trimmed IVE for some trimming percentage is called the uniformly best trimming IVE if its asymptotic covariance matrix is smaller than or equal to that of any best $(\alpha_1, \alpha_2)$-trimmed IVE, for $0 < \alpha_1 < 0.5 < \alpha_2 < 1$.

We first show the relation between the IVE and the LCE.

Lemma 4.3. $((AX)' P_s AX)^{-1} (AX)' S (S'S)^{-1}$ converges in probability to the full rank matrix $(\alpha_2 - \alpha_1)^{-1} (Q_{sx}' Q_s^{-1} Q_{sx})^{-1} Q_{sx}' Q_s^{-1}$.

This lemma implies that condition (a1) holds. One can check that condition (a2) also holds. Then the trimmed IVE's form a subclass of trimmed LCE's.

We now state the asymptotic properties of the trimmed IVE. The following theorem gives a "Bahadur" representation of $\hat\beta_s$.

Theorem 4.4. With assumptions (a3)-(a9), the representation of Theorem 3.1 holds for $\hat\beta_s$ with $h_i$ the $i$-th row of $P_s X$ and with bias vector $\gamma_s = (\alpha_2 - \alpha_1)^{-1} \lambda\, (Q_{sx}' Q_s^{-1} Q_{sx})^{-1} Q_{sx}' Q_s^{-1} \theta_s$, where $\theta_s$ has been defined as $\lim_{n\to\infty} n^{-1} \sum_{i=1}^{n} s_i$.

The limiting distribution of the trimmed IVE is stated in the following corollary.

Corollary 4.5. $n^{1/2}(\hat\beta_s - (\beta + \gamma_s))$ has an asymptotic normal distribution with zero mean vector and an asymptotic covariance matrix, (4.2), given by the covariance matrix of the summands in the representation of Theorem 4.4; for symmetric $F$ it takes the explicit form displayed in Corollary 5.2.

5. BEST TRIMMED INSTRUMENTAL VARIABLES ESTIMATOR

Consider the design of instrumental variables with $S = X$, so that $P_s = X(X'X)^{-1}X'$. The asymptotic covariance matrix in this design is

$$\sigma^2(\alpha_1, \alpha_2)\, Q_x^{-1}. \qquad (5.1)$$

The matrix in (4.2) minus the matrix in (5.1) involves the difference matrix $\big((Q_{sx}' Q_s^{-1} Q_{sx})^{-1} - Q_x^{-1}\big)$, which is positive semidefinite. It is also easy to check that the trimmed IVE with $S = X$ satisfies conditions (a1) and (a2). We then have the theorem of the best trimmed IVE.

Theorem 5.1. The trimmed IVE with instruments $S = X$,

$$\hat\beta_x = ((AX)' P_x AX)^{-1} (AX)' P_x Ay, \qquad P_x = X(X'X)^{-1}X',$$

is a best $(\alpha_1, \alpha_2)$-trimmed IVE and also a best $(\alpha_1, \alpha_2)$-trimmed LCE.

This says that the $(\alpha_1, \alpha_2)$-trimmed LSE is asymptotically equivalent to the best $(\alpha_1, \alpha_2)$-trimmed IVE, and hence the best $(\alpha_1, \alpha_2)$-trimmed LCE is not unique. If there is an $(\alpha_1^*, \alpha_2^*)$-trimmed LSE which is a uniformly best LCE, then the best $(\alpha_1^*, \alpha_2^*)$-trimmed IVE is also a uniformly best LCE. This says that if there is a trimmed LSE which is a uniformly best LCE, then the uniformly best trimmed LCE is not numerically unique.
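The identity behind Theorem 5.1 can be checked numerically: with $S = X$, $(AX)'P_xAX = X'AX(X'X)^{-1}X'AX$, so the IVE reduces algebraically to the trimmed LSE. A small check continuing the earlier sketches:

```python
# With S = X the trimmed IVE coincides with the trimmed LSE.
Px = X @ np.linalg.solve(X.T @ X, X.T)
beta_ive_x = np.linalg.solve(AX.T @ Px @ AX, AX.T @ Px @ Ay)
assert np.allclose(beta_ive_x, beta_tr)
```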

For a large sample inference methodology, we here give the limiting distribution of the trimmed IVE when the distribution $F$ is symmetric.

Corollary 5.2. When the distribution $F$ is symmetric and we let $\alpha_1 = 1 - \alpha_2 = \alpha$, $0 < \alpha < 0.5$, then $n^{1/2}(\hat\beta_s - \beta)$ has an asymptotic normal distribution with zero mean vector and the following asymptotic covariance matrix:

$$(1 - 2\alpha)^{-2} \Big[ \int_{F^{-1}(\alpha)}^{F^{-1}(1-\alpha)} \epsilon^2\, dF(\epsilon)\, (Q_{sx}' Q_s^{-1} Q_{sx})^{-1} + 2\alpha\, (F^{-1}(\alpha))^2\, Q_x^{-1} \Big].$$
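To see what Corollary 5.2 costs in efficiency, one can evaluate the scalar part of the covariance for a given $F$. The sketch below does this for standard normal errors in the case $S = X$, where both matrices reduce to $Q_x^{-1}$; the closed form $\int_{-c}^{c} \epsilon^2\, d\Phi(\epsilon) = (1 - 2\alpha) - 2c\phi(c)$ with $c = \Phi^{-1}(1 - \alpha)$ is a standard normal-moment identity:

```python
# Scalar factor of the asymptotic covariance in Corollary 5.2 when F is
# standard normal and S = X (illustrative check of the efficiency loss).
from scipy.stats import norm

def trimmed_var_factor(alpha):
    c = norm.ppf(1 - alpha)                      # F^{-1}(1 - alpha) = -F^{-1}(alpha)
    second_moment = (1 - 2 * alpha) - 2 * c * norm.pdf(c)  # integral of eps^2 dF
    return (second_moment + 2 * alpha * c**2) / (1 - 2 * alpha) ** 2

print(trimmed_var_factor(0.05))  # about 1.026: mild loss versus untrimmed LSE at normal F
```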

6. LARGE SAMPLE INFERENCE

Here we sketch a large-sample methodology for confidence ellipsoids and hypothesis testing based on the trimmed IVE for the case of a symmetric distribution. To do this, we first need to estimate the asymptotic covariance matrix of $\hat\beta_s$. Let $\hat Q_x = n^{-1} \sum_{i=1}^{n} x_i x_i'$, $\hat Q_{sx} = n^{-1} \sum_{i=1}^{n} s_i x_i'$ and $\hat Q_s = n^{-1} \sum_{i=1}^{n} s_i s_i'$, and also $\hat F^{-1}(1 - \alpha) = \theta'(\hat\beta(1 - \alpha) - \hat\beta_s)$, where $\theta$ is a $p$-vector with first element value 1 and zeros elsewhere. Furthermore, let

$$\hat V = (1 - 2\alpha)^{-2} \Big[ n^{-1} \sum_{i=1}^{n} a_{ii} e_i^2\, (\hat Q_{sx}' \hat Q_s^{-1} \hat Q_{sx})^{-1} + 2\alpha\, (\hat F^{-1}(1 - \alpha))^2\, \hat Q_x^{-1} \Big],$$

where $e_i = y_i - x_i' \hat\beta_s$, $i = 1, \dots, n$.

Theorem 6.1. $\hat V \to (1 - 2\alpha)^{-2} \big[ \int_{F^{-1}(\alpha)}^{F^{-1}(1-\alpha)} \epsilon^2\, dF(\epsilon)\, (Q_{sx}' Q_s^{-1} Q_{sx})^{-1} + 2\alpha\, (F^{-1}(\alpha))^2\, Q_x^{-1} \big]$ in probability.

For $0 < u < 1$, let $F_u(r_1, r_2)$ denote the $(1 - u)$ quantile of the F distribution with $r_1$ and $r_2$ degrees of freedom, and let $d_u(r_1, r_2) = r_1 F_u(r_1, r_2)$. Suppose that, for some integer $\ell$, $K$ is a matrix of size $\ell \times p$ and $K$ has rank $\ell$. Let $m$ be the number of observations $y_i$ lying outside the interval $(x_i'\hat\beta(\alpha), x_i'\hat\beta(1 - \alpha))$. Then the region

$$\{\beta : n (K\hat\beta_s - K\beta)' (K \hat V K')^{-1} (K\hat\beta_s - K\beta) \ge d_u(\ell, n - m - p)\}$$

has probability of approximately $u$. If $K = I_p$, the confidence ellipsoid

$$\{\beta : n (\hat\beta_s - \beta)' \hat V^{-1} (\hat\beta_s - \beta) \le d_u(p, n - m - p)\}$$

for $\beta$ has an asymptotic confidence coefficient of approximately $1 - u$. Moreover, the test of $H_0 : K\beta = v$ that rejects $H_0$ whenever

$$n (K\hat\beta_s - v)' (K \hat V K')^{-1} (K\hat\beta_s - v) \ge d_u(\ell, n - m - p)$$

has an asymptotic size of $u$.
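A sketch of the whole Section 6 recipe, continuing the earlier code. Where the original displays are unrecoverable (the exact form of $\hat V$ and of $d_u$), the formulas below follow our reconstructed versions and should be read as assumptions:

```python
# Plug-in covariance estimate V-hat and the Wald-type test of H0: K beta = v.
from scipy.stats import f as f_dist

alpha, u = 0.10, 0.05
e = y - X @ beta_s                                    # residuals from the trimmed IVE
Qx_hat = X.T @ X / n
Qsx_hat = S.T @ X / n
Qs_hat = S.T @ S / n
M_hat = np.linalg.inv(Qsx_hat.T @ np.linalg.solve(Qs_hat, Qsx_hat))
c_hat = (regression_quantile(y, X, 1 - alpha) - beta_s)[0]   # F-hat^{-1}(1 - alpha)
V_hat = ((keep * e**2).sum() / n * M_hat
         + 2 * alpha * c_hat**2 * np.linalg.inv(Qx_hat)) / (1 - 2 * alpha) ** 2

K, v = np.eye(p), beta_true              # H0: K beta = v, true here by construction
m = int(n - keep.sum())                  # observations trimmed away
stat = n * (K @ beta_s - v) @ np.linalg.solve(K @ V_hat @ K.T, K @ beta_s - v)
crit = p * f_dist.ppf(1 - u, p, n - m - p)   # assumed d_u(l, r) = l * F_u(l, r)
print(stat, crit, stat >= crit)
```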

7. APPENDIX

Proof of Theorem 3.1. Inserting (1.1) into equation (2.2), we obtain $\hat\beta_{tc} = HH_0'AX\beta + HH_0'A\epsilon$. Now we consider a representation of $n^{-1/2} H_0' A\epsilon$. The following result, which uses the Jureckova and Sen (1987) extension of Billingsley's theorem, provides the representation (7.1) for the elements of $n^{-1/2} H_0' A\epsilon$, for $j = 1, \dots, p$, with remainder $T_n = O_p(1)$.

To complete the proof of Theorem 3.1, from (7.1) and the representation of $\hat\beta(\alpha)$ (see Ruppert and Carroll (1980)) we obtain (7.2). The theorem then follows from (7.2) and condition (a1).

Proof of Lemma 3.3. Write plim$(B_n) = B$ if $B_n$ converges to $B$ in probability. Let $C = HH_0' - (X'AX)^{-1}X'$. With this, plim$(CX) = $ plim$(HH_0'X) - $ plim$((X'AX)^{-1}X'X) = 0$. The positive semidefiniteness of (3.3) then follows.

Proof of Lemma 4.3. From condition (a8), we need only show statement (7.3). The following result, which uses the Jureckova and Sen (1987) extension of Billingsley's theorem, gives an expansion of the matrix $n^{-1}(S'AX)$, where $q_{jk}$ is the $jk$-th term of the matrix $Q_{sx}$, and $s_{ij}$, $x_{ik}$ are the $ij$-th and $ik$-th terms of $S$ and $X$, respectively. We then have statement (7.3). Theorem 4.4 and Corollary 4.5 follow from the arguments for Theorem 3.1, and their proofs are omitted.

Proof of Theorem 6.1. From the representation of the regression quantile in Ruppert and Carroll (1980) and the definition of the trimmed IVE $\hat\beta_s$, we have $\hat F^{-1}(1 - \alpha) \to F^{-1}(1 - \alpha)$ in probability. Now consider the trimmed average $n^{-1} \sum_{i=1}^{n} I(x_i'\hat\beta(\alpha) < y_i < x_i'\hat\beta(1 - \alpha))$ and its analogues entering $\hat V$. From the fact that $n^{1/2}(\hat\beta(\alpha) - \beta(\alpha)) = O_p(1)$ and condition (a8), the theorem follows from the result that these trimmed averages equal their population counterparts up to $o_p(1)$ terms, which follows from Lemma A.4 of Ruppert and Carroll (1980).

ACKNOWLEDGEMENT

We thank an associate editor and three referees for their insightful comments and suggestions, which led to a much improved version of this article.

REFERENCES

Chen, L.-A. (1997). An efficient class of weighted trimmed means for linear regression models. Statistica Sinica, 7, 669-686.

Chen, L.-A. and Chiang, Y. C. (1996). Symmetric type quantile and trimmed means for location and linear regression model. Journal of Nonparametric Statistics, 7, 171-185.

De Jongh, P. J., De Wet, T. and Welsh, A. H. (1988). Mallows-type bounded-influence-regression trimmed means. Journal of the American Statistical Association, 83, 805-810.

Dhrymes, P. J. (1970). Econometrics: Statistical Foundations and Applications. Harper & Row, New York.

Jureckova, J. and Sen, P. K. (1987). An extension of Billingsley's theorem to higher dimensional M-processes. Kybernetika, 23, 382-387.

Kim, S. J. (1992). The metrically trimmed mean as a robust estimator of location. Annals of Statistics, 20, 1534-1547.

Koenker, R. and Bassett, G. J. (1978). Regression quantiles. Econometrica, 46, 33-50.

Koenker, R. and Portnoy, S. (1987). L-estimation for linear models. Journal of the American Statistical Association, 82, 851-857.

Koenker, R. and d'Orey, V. (1987). Computing regression quantiles. Applied Statistics, 36, 383-393.

Maddala, G. S. (1988). Introduction to Econometrics. Macmillan Publishing Company, New York.

Ruppert, D. and Carroll, R. J. (1980). Trimmed least squares estimation in the linear model. Journal of the American Statistical Association, 75, 828-838.

Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.

Welsh, A. H. (1987). The trimmed mean in the linear model. Annals of Statistics, 15, 20-36.

Received December, 1996; Revised January, 1998.
