Linear trimmed means for the linear regression with AR(1) errors model

Yi-Hsuan Lai^a, Lin-An Chen^b,*, Chau-Shyun Tang^c

a Department of Applied Mathematics, Hsuan Chuang University, Hsinchu, Taiwan
b Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan
c Department of Management Science, National Chiao Tung University, Hsinchu, Taiwan
ARTICLE INFO

Article history:
Received 28 October 2008
Accepted 13 May 2010
Available online 1 June 2010

Keywords:
Gauss–Markov theorem
Generalized least squares estimator
Linear trimmed mean
Robust estimator

ABSTRACT

For the linear regression with AR(1) errors model, the robust generalized and feasible generalized estimators of Lai et al. (2003) for the regression parameters are shown to have the desired property of a robust Gauss–Markov theorem. This is done by showing that these two estimators are the best among classes of linear trimmed means. A Monte Carlo study and a data analysis for this technique have been performed.

© 2010 Elsevier B.V. All rights reserved.
1. Introduction
Consider the linear regression model

y = Xβ + ε,   (1.1)

where y is a vector of observations for the dependent variable, X is a known n × p design matrix with 1's in the first column, and ε is a vector of independent and identically distributed disturbance variables with a distribution of finite variance. We consider the problem of estimating the parameter vector β. From the Gauss–Markov theorem, it is known that the least squares estimator has the smallest covariance matrix in the class of unbiased linear estimators My, where M satisfies MX = I_p. However, the least squares estimator is sensitive to departures from normality and to the presence of outliers, so we need to consider robust estimators. An interesting question in robust regression is whether there is a robust Gauss–Markov theorem, i.e., whether there is a robust estimator that is (asymptotically) more efficient than a class of linear robust estimators. This question was addressed by Chen et al. (2001), who considered a class of estimators based on Winsorized observations and showed that the trimmed mean of Welsh (1987) is asymptotically the best in this class.
Suppose that the error vector ε = (ε_1, ..., ε_n)' has the covariance matrix structure

Cov(ε) = σ²Ω,   (1.2)

where Ω is a positive definite matrix and σ is finite. From the regression theory of the estimation of β, it is known that any estimator having an (asymptotic) covariance matrix of the form

d(X'Ω⁻¹X)⁻¹   (1.3)
[Journal of Statistical Planning and Inference, doi:10.1016/j.jspi.2010.05.015. © 2010 Elsevier B.V. All rights reserved.]
is more efficient than an estimator having an (asymptotic) covariance matrix of the form

d(X'X)⁻¹(X'ΩX)(X'X)⁻¹,   (1.4)

where d is some positive constant. In least squares estimation when the matrix Ω is known, Aitken (1935) introduced the generalized least squares estimator (GLS) and showed that it has a covariance matrix of the form (1.3), while the LSE has a covariance matrix of the form (1.4) with d = σ². It is also well known that, when Ω is unknown, the feasible generalized LSE has an asymptotic covariance matrix of the form (1.3). Thus these two generalized-type estimators are asymptotically more efficient than the LSE.
Although the GLS and feasible GLS are asymptotically more efficient than the LSE in many regression problems, they are highly sensitive to even very small departures from normality and to the presence of outliers. Therefore, developing robust generalized and feasible generalized estimators for each specific regression problem is important. We consider one of the most popular models, the linear regression with AR(1) errors model, which has a covariance structure of the form (1.2):

y_i = x_i'β + ε_i,  i = 1, ..., n,   ε_i = ρε_{i-1} + e_i,   (1.5)

where e_1, ..., e_n are independent and identically distributed (iid) random variables. Suppose that |ρ| < 1 and e_i has distribution function F.
Denote the transformed vector u = Ω^{-1/2}'y. One approach to robust estimation is to construct a weighted observation vector from u and then construct a consistent estimator that is linear in u. In case ρ is unknown, all vectors are replaced by those in which ρ is replaced by an estimator ρ̂; see, for example, Lai et al. (2003). There are two types of weighted observation vectors in this literature. First, u can be replaced by a trimmed observation vector Au, with A a trimming matrix constructed from regression quantiles (see Koenker and Bassett, 1978) or from residuals based on an initial estimator (see Ruppert and Carroll, 1980; Chen, 1997). Second, u can be replaced by a Winsorized observation vector as defined in Welsh (1987). In this paper, we use the trimmed observation vector of Koenker and Bassett (1978) to study classes of linear functions based on u for estimation of β, and we develop a robust version of the Gauss–Markov theorem. Based on regression quantiles, Lai et al. (2003) proposed generalized and feasible generalized trimmed means for estimating the regression parameters β, thereby developing robust generalized and feasible generalized estimation techniques. Given the Gauss–Markov theorem for the linear regression with iid errors model, it is then interesting to ask whether there are robust generalized and feasible generalized estimators for the linear regression with AR(1) errors model that have the desired property of a robust Gauss–Markov theorem. Our aim in this paper is to show that the estimators of Lai et al. (2003) do have this desired property.
In Section 2 we introduce a class of linear trimmed means when ρ is known, and we establish their large sample theory in Section 3. We also establish the theory for a class of linear trimmed means when ρ is unknown in Section 4. In both cases, we show that the generalized and feasible generalized trimmed means are, respectively, the best in these two classes of linear trimmed means in terms of asymptotic covariance matrix. Monte Carlo studies and a data analysis are performed, and their results are displayed in Section 5. Finally, the proofs of the theorems are presented in Appendix A.
2. Linear trimmed mean when ρ is known

For the linear regression with AR(1) errors model (1.5), to obtain a linear trimmed mean we need to specify the quantile that determines the observation trimming and to make a transformation of the linear model to obtain generalized estimators. For a given i-th dependent variable in model (1.5), assuming i ≥ 2, one way to derive a generalized estimator is to consider the one-step Cochrane and Orcutt (1949) (C–O) procedure as y_i = ρy_{i-1} + (x_i - ρx_{i-1})'β + e_i. For the error variable e, we assume that it has distribution function F with probability density function f. With this transformation for generalized estimation, a quantile can be defined through the variable e or through a linear conditional quantile of y_{i-1} and y_i. By the fact that x_i is a vector with first element 1, the following two events determined by two quantiles are equivalent:
e_i ≤ F⁻¹(α)   (2.1)

and

(-ρ, 1)(y_{i-1}, y_i)' ≤ (-ρ, 1)(x_{i-1}', x_i')'β(α),   (2.2)

with

β(α) = β + ((1 - ρ)⁻¹F⁻¹(α), 0'_{p-1})'.

The event in inequality (2.1) specifies the quantile of the error variable e, and through inequality (2.2) it specifies the conditional quantile of the linear function (-ρ, 1)(y_{i-1}, y_i)'. Here β(α) is called the population regression quantile by Koenker and Bassett (1978). For defining the linear trimmed means, we consider the one-step C–O procedure on the matrix form of the linear regression with AR(1) errors model (1.5), which is
y = Xβ + ε,

where it can be seen that Cov(ε) = σ²Ω with (rows separated by semicolons)

Ω = (1 - ρ²)⁻¹[1, ρ, ρ², ..., ρ^{n-1}; ρ, 1, ρ, ..., ρ^{n-2}; ...; ρ^{n-1}, ρ^{n-2}, ρ^{n-3}, ..., 1].   (2.3)

Define the half matrix of Ω⁻¹ as

Ω^{-1/2}' = [(1 - ρ²)^{1/2}, 0, 0, ..., 0, 0; -ρ, 1, 0, ..., 0, 0; 0, -ρ, 1, ..., 0, 0; ...; 0, 0, 0, ..., -ρ, 1].

With the above half matrix of Ω, we consider the model for the one-step C–O procedure, u = Ω^{-1/2}'y, as

u = Zβ + ((1 - ρ²)^{1/2}ε_1, e_2, e_3, ..., e_n)',   (2.4)

where Z = Ω^{-1/2}'X. Note that the vector u and the matrix Z are both functions of the parameter ρ. The usual descriptive statistics, robust or nonrobust, based on model (1.1) carry over straightforwardly to the transformed model (2.4) when ρ is known. However, when ρ is unknown, u and Z need to be replaced by versions in which ρ is replaced by a consistent estimator. Knowing that the GLS is simply the LSE of β for model (2.4), we may consider the linear trimmed mean defined on this transformed model. To justify calling both estimators (with ρ known and unknown) linear trimmed means, we will show that they are asymptotically equivalent in the sense of having the same asymptotic covariance matrix, just as the GLS and feasible GLS are.
For 0 < α < 1, the α-th (sample) regression quantile of Koenker and Bassett (1978) for the linear regression with AR(1) errors model is defined as

β̂(α) = arg min_{b ∈ R^p} Σ_{i=1}^n (u_i - z_i'b)(α - I(u_i ≤ z_i'b)),

where u_i and z_i' are the i-th rows of u and Z, respectively, and I(A) is the indicator function of the event A. Define the trimming matrix as A = diag{a_i = I(z_i'β̂(α_1) ≤ u_i ≤ z_i'β̂(α_2)): i = 1, ..., n} with 0 < α_1 < α_2 < 1, where fractions α_1 and 1 - α_2 of the observations are trimmed from the lower and upper tails, respectively. After the outliers are trimmed by the regression quantiles β̂(α_1) and β̂(α_2), we have the following submodel:

Au = AZβ + A((1 - ρ²)^{1/2}ε_1, e_2, ..., e_n)'.   (2.5)

Since A is random, the error vector in the above transformed model is no longer a set of independent variables. Koenker and Bassett's type generalized trimmed mean (proposed by Lai et al., 2003) is defined as

β̂_tm = (Z'AZ)⁻¹Z'Au.   (2.6)

We now move to define the linear trimmed means. Any linear unbiased estimator defined in model (2.4) has the form
Mu, with M a p × n nonstochastic matrix satisfying MZ = I_p. Since M is a full-rank matrix, there exist matrices H and H_0 such that M = HH_0'. Thus, an estimator is a linear unbiased estimator if there exist a p × p nonsingular matrix H and an n × p full-rank matrix H_0 such that the estimator can be written as

HH_0'u.

We generalize linear unbiased estimators defined on the observation vector u to estimators defined on Au by requiring them to be of the form MAu, where M = HH_0'.
Definition 2.1. A statistic β̂_ltm is called an (α_1, α_2) linear trimmed mean if there exist a stochastic p × p matrix H and a nonstochastic n × p matrix H_0 such that it has the representation

β̂_ltm = HH_0'Au,   (2.7)

where H and H_0 satisfy the following two conditions:

(a1) nH → H̃ in probability, where H̃ is a full-rank p × p matrix.
(a2) HH_0'Z = (α_2 - α_1)⁻¹I_p + o_p(n^{-1/2}), where I_p is the p × p identity matrix.

This is similar to the usual requirements for unbiased estimation, except that we have introduced a trimmed observation vector to allow for robustness and consider an asymptotic property instead of unbiasedness.
Two questions arise for this class of linear trimmed means. First, does the class contain estimators that have already appeared in the literature? The answer is affirmative, because the class of linear trimmed means defined in this paper contains the generalized trimmed mean of Lai et al. (2003) (H = (Z'AZ)⁻¹ and H_0 = Z), and the set of Mallows-type bounded influence trimmed means (H = (Z'WAZ)⁻¹ and H_0' = Z'W, with W a diagonal matrix of weights; see Section 3). Second, is there a best estimator in this class of linear trimmed means, and can we find it if it exists? This question will be answered in the next section.
With the one-step C–O procedure, the half matrix Ω^{-1/2}' has rows with only a finite number (not depending on n) of elements that depend on the parameter ρ. This trick, traditionally used in the econometrics literature for regression with AR(1) errors (see, for example, Fomby et al., 1984, pp. 210–211), makes the study of the asymptotic theory for β̂_ltm(α) similar to that for the classical trimmed mean in linear regression. Large sample representations of the linear trimmed mean and its role as a generalized robust estimator will be introduced in the next section.
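For concreteness, the construction of (2.6) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes SciPy is available, solves the regression-quantile problem through its standard linear-programming formulation, and all function names are ours.

```python
import numpy as np
from scipy.optimize import linprog

def regression_quantile(u, Z, alpha):
    """alpha-th regression quantile of Koenker and Bassett via its LP form:
    minimize sum_i (u_i - z_i'b) * (alpha - I(u_i <= z_i'b))."""
    n, p = Z.shape
    # variables: b+ (p), b- (p), r+ (n), r- (n), all nonnegative
    c = np.concatenate([np.zeros(2 * p),
                        alpha * np.ones(n), (1 - alpha) * np.ones(n)])
    A_eq = np.hstack([Z, -Z, np.eye(n), -np.eye(n)])  # Z b + r+ - r- = u
    res = linprog(c, A_eq=A_eq, b_eq=u, bounds=(0, None), method="highs")
    return res.x[:p] - res.x[p:2 * p]

def generalized_trimmed_mean(u, Z, a1, a2):
    """Trimmed mean of (2.6): LSE on the observations kept by the trimming
    matrix A, whose diagonal marks z_i'b(a1) <= u_i <= z_i'b(a2)."""
    keep = ((u >= Z @ regression_quantile(u, Z, a1)) &
            (u <= Z @ regression_quantile(u, Z, a2)))
    Zk, uk = Z[keep], u[keep]
    return np.linalg.solve(Zk.T @ Zk, Zk.T @ uk)
```

With symmetric trimming (α_1 = 1 - α_2) and a symmetric error distribution, the trimming bias γ vanishes, so the sketch recovers β consistently on simulated data.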
3. Asymptotic properties of linear trimmed mean
Let us denote by h_i' the i-th row of H_0', θ_h = lim_{n→∞} n⁻¹Σ_{i=1}^n h_i, Q_hz = lim_{n→∞} n⁻¹Σ_{i=1}^n h_i z_i', and Q_z = lim_{n→∞} n⁻¹Z'Z. We assume throughout this section that conditions (a3)–(a6) in Appendix A are satisfied; thus, for example, Q_hz is a full-rank matrix and Q_z is a positive definite matrix. It is not difficult to see that these conditions hold in typical analysis of variance designs, and they hold in probability when the rows of X form a random sample from a very wide class of distributions on R^p (see this point in detail in Koenker and Portnoy, 1987). The following theorem gives a "Bahadur" representation of the (α_1, α_2) linear trimmed mean.

Theorem 3.1. With assumptions (a1)–(a6), we have

n^{1/2}(β̂_ltm - (β + γ_ltm)) = n^{-1/2}H̃ Σ_{i=1}^n {h_i(e_i I(F⁻¹(α_1) ≤ e_i ≤ F⁻¹(α_2)) - λ) + [F⁻¹(α_1)I(e_i < F⁻¹(α_1)) + F⁻¹(α_2)I(e_i > F⁻¹(α_2)) - ((1 - α_2)F⁻¹(α_2) + α_1F⁻¹(α_1))]Q_hz Q_z⁻¹z_i} + o_p(1),

where γ_ltm = λH̃θ_h, λ = ∫_{F⁻¹(α_1)}^{F⁻¹(α_2)} e dF(e), and θ_h is defined in assumption (a5).

The limiting distribution of the (α_1, α_2) linear trimmed mean follows from the central limit theorem (see, e.g., Serfling, 1980, p. 30).

Corollary 3.2. Under conditions (a1)–(a6), n^{1/2}(β̂_ltm - (β + γ_ltm)) has an asymptotic normal distribution with zero mean vector and the following asymptotic covariance matrix:

[∫_{F⁻¹(α_1)}^{F⁻¹(α_2)} e² dF(e) - λ²]H̃Q_h H̃' + (α_2 - α_1)⁻²[α_1(F⁻¹(α_1))² + (1 - α_2)(F⁻¹(α_2))² - (α_1F⁻¹(α_1) + (1 - α_2)F⁻¹(α_2))² - 2λ(α_1F⁻¹(α_1) + (1 - α_2)F⁻¹(α_2))]Q_z⁻¹.   (3.1)

The (α_1, α_2) generalized trimmed mean proposed by Lai et al. (2003) is defined by

β̂_tm = (Z'AZ)⁻¹Z'Au.   (3.2)

From the study of this estimator by Ruppert and Carroll (1980), we have

n⁻¹Z'AZ → (α_2 - α_1)Q_z.

By letting H = (Z'AZ)⁻¹ and H_0 = Z, we can see that condition (a2) also holds for β̂_tm, so the (α_1, α_2) generalized trimmed mean is in the class of (α_1, α_2) linear trimmed means. Moreover, Lai et al. (2003) provided the result that n^{1/2}(β̂_tm - (β + γ_tm)) has an asymptotic normal distribution with zero mean vector and asymptotic covariance matrix σ²(α_1, α_2)Q_z⁻¹, where

σ²(α_1, α_2) = (α_2 - α_1)⁻²[∫_{F⁻¹(α_1)}^{F⁻¹(α_2)} (e - λ)² dF(e) + α_1(F⁻¹(α_1))² + (1 - α_2)(F⁻¹(α_2))² - (α_1F⁻¹(α_1) + (1 - α_2)F⁻¹(α_2))² - 2λ(α_1F⁻¹(α_1) + (1 - α_2)F⁻¹(α_2))].   (3.3)

The following lemma orders the matrices H̃Q_h H̃' and Q_z⁻¹.
Lemma 3.3. For any matrices H̃ and Q_h induced from conditions (a1) and (a4), the difference

H̃Q_h H̃' - (α_2 - α_1)⁻²Q_z⁻¹   (3.4)

is positive semidefinite.
The relation in (3.4) then implies the following main theorem.
Theorem 3.4. Under conditions (a3)–(a6), the (α_1, α_2) generalized trimmed mean β̂_tm of (3.2) is the best (α_1, α_2) linear trimmed mean.

Since the (α_1, α_2) generalized trimmed mean always exists, the best (α_1, α_2) linear trimmed mean always exists. A further question is how big the class of (α_1, α_2) linear trimmed means is; we do not, however, study the scope of the linear trimmed means here.
In the literature, consideration has been given to the development of estimators of the regression parameters β that limit the effects of the error variable and the independent variables. Among them, approaches that simultaneously bound the influence of the design points and the residuals for the linear regression model include Krasker and Welsch (1982) and Krasker (1985). On the other hand, the Mallows-type bounded-influence trimmed mean bounds the influence of the design points and the residuals separately, as applied in the AR(1) regression model by De Jongh and De Wet (1985) and in the linear regression model by De Jongh et al. (1988). In a study by Giltinan et al. (1986), these two approaches were found to be competitive, in that neither is preferable to the other. They also note that Mallows-type estimators should theoretically give more stable inference than the Krasker–Welsch approach.
Let w_i, i = 1, ..., n, be real numbers. For 0 < α < 1, the Mallows-type bounded-influence regression quantile, denoted by β̂_w(α), is defined as the solution of the minimization problem

min_{b ∈ R^p} Σ_{i=1}^n w_i(u_i - z_i'b)(α - I(u_i ≤ z_i'b)).

With W the diagonal matrix of {w_i, i = 1, ..., n}, the bounded influence trimmed mean is defined as

β̂_BI = (Z'WA_wZ)⁻¹Z'WA_wu,

where A_w = diag{a_i = I(z_i'β̂_w(α_1) ≤ u_i ≤ z_i'β̂_w(α_2)): i = 1, ..., n}. Let H = (Z'WA_wZ)⁻¹ and H_0 = WZ. This shows that the bounded influence trimmed means also form a subclass of the linear trimmed means (see De Jongh et al., 1988, for their large sample properties).
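Once the diagonals of W and A_w are in hand, the estimator above is a single weighted least squares solve. A sketch, with hypothetical names and a weight choice that is ours (the paper does not prescribe one):

```python
import numpy as np

def bounded_influence_trimmed_mean(u, Z, w, keep):
    """Mallows-type estimator (Z'W A_w Z)^{-1} Z'W A_w u, where `w` holds the
    diagonal of W and the boolean vector `keep` holds the diagonal of A_w."""
    wa = w * keep                        # combined diagonal of W A_w
    M = Z.T @ (wa[:, None] * Z)          # Z'W A_w Z
    return np.linalg.solve(M, Z.T @ (wa * u))
```

With w ≡ 1 and no trimming this reduces to the LSE on (u, Z); a common bounded-influence choice (an assumption here, not from the source) downweights high-leverage rows, e.g. w_i = min(1, c/‖z_i‖).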
Theorem 3.5. If assumptions (a1)–(a5) hold, then

(a) n^{1/2}(β̂_BI - (β + γ_w)) = (α_2 - α_1)⁻¹Q_w⁻¹n^{-1/2} Σ_{i=1}^n w_i z_i[(e_i I(F⁻¹(α_1) ≤ e_i ≤ F⁻¹(α_2)) - λ) + F⁻¹(α_1)I(e_i < F⁻¹(α_1)) + F⁻¹(α_2)I(e_i > F⁻¹(α_2)) - ((1 - α_2)F⁻¹(α_2) + α_1F⁻¹(α_1))] + o_p(1),

where γ_w = (α_2 - α_1)⁻¹λQ_w⁻¹θ_w, Q_w = lim_{n→∞} n⁻¹Σ_{i=1}^n w_i z_i z_i', a positive definite matrix, and θ_w = lim_{n→∞} n⁻¹Σ_{i=1}^n w_i z_i; and

(b) n^{1/2}(β̂_BI - (β + γ_w)) → N(0, σ²(α_1, α_2)Q_w⁻¹Q_ww Q_w⁻¹), where Q_ww = lim_{n→∞} n⁻¹Σ_{i=1}^n w_i²z_i z_i', a positive definite matrix.

In particular, β̂_tm is the special case of β̂_BI with W the identity matrix, and it therefore belongs to this subclass. We may also show that Q_w⁻¹Q_ww Q_w⁻¹ - Q_z⁻¹ is positive semidefinite, which shows that β̂_tm is the best bounded influence trimmed mean.

Theorem 3.6. The (α_1, α_2) generalized trimmed mean is the best bounded influence trimmed mean.

This result is based solely on considerations of the asymptotic variance and ignores the fact that the generalized trimmed mean does not have bounded influence in the space of the independent variables. It confirms the notion that bounded influence is achieved at the cost of efficiency.
4. Linear trimmed means when ρ is unknown

After developing the theory of the linear trimmed means for the case where ρ is known, the next interesting problem is whether, when the parameter ρ is unknown, the linear trimmed mean of (2.7) with ρ replaced by a consistent estimator ρ̂ has the same asymptotic behavior as that displayed by β̂_ltm. If so, the theory of generalized least squares estimation carries over to the theory of robust estimation in this specific linear regression model. Let Ω̂ be the matrix Ω with ρ replaced by its consistent estimator ρ̂, which could be the LSE obtained through the C–O estimation procedure. Define the matrices û = Ω̂^{-1/2}'y, Ẑ = Ω̂^{-1/2}'X and ε̂ = Ω̂^{-1/2}'ε. Let the regression quantile when the parameter ρ is unknown be defined as

β̂(α) = arg min_{b ∈ R^p} Σ_{i=1}^n (û_i - ẑ_i'b)(α - I(û_i ≤ ẑ_i'b)),

where û_i and ẑ_i' are the i-th rows of û and Ẑ, respectively. Define the trimming matrix as Â = diag{a_i = I(ẑ_i'β̂(α_1) ≤ û_i ≤ ẑ_i'β̂(α_2)): i = 1, ..., n}.

Definition 4.1. A statistic β̂*_ltm (the star marks the unknown-ρ version, in contrast to β̂_ltm of Definition 2.1) is called an (α_1, α_2) linear trimmed mean if there exist a stochastic p × p matrix H and a nonstochastic n × p matrix H_0 such that it has the representation

β̂*_ltm = HH_0'Âû,

where H and H_0 satisfy conditions (a1) and (a2) for these H and H_0.
Koenker and Bassett's feasible generalized trimmed mean is defined as

β̂*_tm = (Ẑ'ÂẐ)⁻¹Ẑ'Âû.

From Lai et al. (2003), we can see that n⁻¹Ẑ'ÂẐ → (α_2 - α_1)Q_z in probability. By letting H = (Ẑ'ÂẐ)⁻¹ and H_0 = Ẑ, we see that β̂*_tm is in the class of (α_1, α_2) linear trimmed means. Lai et al. (2003) also showed that β̂*_tm and β̂_tm have the same Bahadur representation, and hence the same asymptotic distribution. The following theorem states that the linear trimmed means for the cases where ρ is known and unknown have the same large sample properties.

Theorem 4.2. n^{1/2}(β̂*_ltm - β̂_ltm) = o_p(1).

We then have the result that the feasible generalized trimmed mean is the best linear trimmed mean when ρ is unknown.

Theorem 4.3. The feasible generalized trimmed mean is the best linear trimmed mean.
For the rest of this section, we consider several related questions. First, is the best linear trimmed mean unique for this linear regression with AR(1) errors model? To address this, we develop an analogous optimality theory for the trimmed mean of Welsh (1987). Let β̂_0 be an initial estimator of β for model (2.4). Let ζ̂(α_1) and ζ̂(α_2) represent, respectively, the α_1-th and α_2-th empirical quantiles of the regression residuals ê_i = u_i - z_i'β̂_0, i = 1, ..., n. The Winsorized observation defined by Welsh (1987) is

u*_i = u_i I(ζ̂(α_1) ≤ ê_i ≤ ζ̂(α_2)) + ζ̂(α_1)(I(ê_i < ζ̂(α_1)) - α_1) + ζ̂(α_2)(I(ê_i > ζ̂(α_2)) - (1 - α_2)).

Let u* = (u*_1, ..., u*_n)' and denote the trimming matrix by B = diag(I(ζ̂(α_1) ≤ ê_i ≤ ζ̂(α_2)), i = 1, ..., n).

Definition 4.4. A statistic β̂_lw is called an (α_1, α_2) Welsh's type linear trimmed mean if there exist a stochastic p × p matrix H and a nonstochastic n × p matrix H_0 such that it has the representation

β̂_lw = HH_0'u*,

where H and H_0 satisfy conditions (a1) and (a2).
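The Winsorization step and the resulting generalized Welsh trimmed mean of (4.1) below can be sketched as follows. This is an illustrative sketch (names are ours); in the usage the initial estimator β̂_0 is taken to be the LSE, which satisfies (a7).

```python
import numpy as np

def welsh_winsorize(u, Z, b0, a1, a2):
    """Winsorized observations u*_i of Welsh (1987), built from residuals of
    an initial estimate b0 and empirical residual quantiles zeta(a1), zeta(a2)."""
    e = u - Z @ b0
    z1, z2 = np.quantile(e, [a1, a2])
    keep = (e >= z1) & (e <= z2)                  # diagonal of B
    ustar = (u * keep
             + z1 * ((e < z1) - a1)
             + z2 * ((e > z2) - (1.0 - a2)))
    return ustar, keep

def welsh_trimmed_mean(u, Z, b0, a1, a2):
    """Generalized Welsh trimmed mean (4.1): (Z'BZ)^{-1} Z'u*."""
    ustar, keep = welsh_winsorize(u, Z, b0, a1, a2)
    Zb = Z[keep]
    return np.linalg.solve(Zb.T @ Zb, Z.T @ ustar)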
Theorem 4.5. With assumptions (a1)–(a7), β̂_lw and β̂_ltm have the same Bahadur representation of Theorem 3.1, and hence the same asymptotic distribution.

If we let H = (Z'BZ)⁻¹ and H_0 = Z, we see that the generalized Welsh (1987) trimmed mean,

β̂_w = (Z'BZ)⁻¹Z'u*,   (4.1)

is a member of the class of (α_1, α_2) Welsh's type linear trimmed means. We then have the following theorem.

Theorem 4.6. The generalized Welsh trimmed mean β̂_w and the generalized trimmed mean β̂_tm have the same asymptotic distribution. Hence, β̂_w is the best (α_1, α_2) Welsh's type linear trimmed mean.

The above theorem shows that the best robust generalized estimator is not unique. As mentioned by one referee, further study of the generalized Huber M-estimator may provide one more example of a best robust generalized estimator (see Jureckova and Sen, 1984, for a representation of Huber's M-estimator). Hence, a further search for other types of best robust generalized estimators is of limited interest. One interesting question is whether there is a best robust generalized estimator with asymptotic variances identical or close to the Cramér–Rao lower bound when the error variables follow heavy-tailed distributions. For the classical linear regression model, the symmetric trimmed mean of Chen and Chiang (1996) and Chiang et al. (2006) has been shown to have asymptotic variances close to the Cramér–Rao lower bound. Hence, an extension of the symmetric trimmed mean to this linear regression with AR(1) errors model may provide the desired solution.
We also note that a best linear trimmed mean is defined only within the comparison of linear trimmed means with fixed trimming percentages (α_1, α_2). Without knowing the distribution F, can we estimate the best percentages (α_1, α_2) in terms of asymptotic covariance matrix so that the result is still a best linear trimmed mean? Attacking this problem involves minimizing the asymptotic covariance matrix (see the approach of Jaeckel, 1971), and the development of the optimality theory would be complicated, requiring further investigation.

5. Monte Carlo study and example
In this section, we first consider a simulation study to compare the feasible GLS β̂_FG and the feasible generalized trimmed mean β̂*_tm. By letting ε̂_i = y_i - x_i'β̂_ls, where β̂_ls is the LSE of β, we estimate ρ by ρ̂ = Σ_{i=2}^n ε̂_i ε̂_{i-1} / Σ_{i=2}^n ε̂_i². With sample size n = 30, we consider the simple linear regression model y_i = β_0 + β_1 x_{i1} + ε_i, where ε_i follows the AR(1) error structure. For this simulation, we let the true parameter values of β_0 and β_1 be 1 and ρ be 0.3. The error variable e_i is generated from the mixed normal distribution (1 - δ)N(0, 1) + δN(0, σ²) with δ = 0.1, 0.2, 0.3 and σ = 3, 5, 10 (the case δ = 0 corresponds to the uncontaminated standard normal), and the x_i are independent normal random variables with mean i/2 and variance 1. A total of 10 000 replications were performed, and we compute the mean squared errors (MSE) for the feasible generalized LSE β̂_FG and the feasible generalized trimmed mean β̂*_tm for α_1 = 1 - α_2 = α = 0.1, 0.2, 0.3, where the total mean squared error is the squared Euclidean distance between the estimator and the true regression parameter β. For convenience, in the rest of this section we denote the feasible generalized trimmed mean by β̂_tm(α). The MSEs are listed in Tables 1 and 2.
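The simulation design just described can be sketched in a few lines. This is illustrative only: it shows the data generation and the feasible-GLS arm (including the residual-based ρ̂ above); the trimmed-mean arm would additionally apply the quantile-based trimming of Section 2. Function names are ours.

```python
import numpy as np

def simulate_once(n=30, beta=(1.0, 1.0), rho=0.3, delta=0.1, sig=3.0, rng=None):
    """One replication of the Section 5 design: AR(1) errors driven by the
    mixture (1-delta)N(0,1) + delta*N(0,sig^2); x_i ~ N(i/2, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    contaminated = rng.random(n) < delta
    e = rng.normal(0.0, np.where(contaminated, sig, 1.0))
    eps = np.empty(n)
    eps[0] = e[0] / np.sqrt(1 - rho ** 2)          # stationary start
    for i in range(1, n):
        eps[i] = rho * eps[i - 1] + e[i]
    x = rng.normal(np.arange(1, n + 1) / 2.0, 1.0)
    X = np.column_stack([np.ones(n), x])
    return X @ np.asarray(beta) + eps, X

def feasible_gls(y, X):
    """Estimate rho from LSE residuals, then C-O transform and refit by LSE."""
    b_ls = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ b_ls
    rho_hat = np.sum(r[1:] * r[:-1]) / np.sum(r[1:] ** 2)
    n = len(y)
    P = np.eye(n)
    P[0, 0] = np.sqrt(max(1 - rho_hat ** 2, 1e-8))
    for i in range(1, n):
        P[i, i - 1] = -rho_hat
    return np.linalg.lstsq(P @ X, P @ y, rcond=None)[0], rho_hat
```

Looping this over replications and accumulating the squared Euclidean distance to the true β reproduces the MSE criterion used in Tables 1 and 2.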
We may draw several conclusions from Tables 1 and 2:

(a) The case δ = 0 indicates that e_i follows a normal distribution. The results in these two tables then conform to the statistical theory that the feasible generalized least squares estimator β̂_FG is more efficient than other consistent estimators. However, the trimmed means are still efficient in this ideal design.

(b) The MSEs of these two estimators both increase when the contamination percentage δ increases or the contamination variance σ² increases. This matches the usual behavior of estimators, robust or non-robust.

(c) The feasible generalized trimmed mean is more efficient than the feasible generalized LSE in all cases of contaminated errors. This result shows that the feasible generalized trimmed mean is indeed a robust member of the class of linear trimmed means.

(d) The simulation results displayed in these two tables show that, in most cases of δ and σ (other than δ = 0 and (δ, σ) = (0.1, 3)), the MSE has a decreasing trend as α increases. Some further simulation results in our experience show that the MSE goes up for α not too far beyond 0.3.

Next we consider a real data regression analysis. Many firms use past sales to forecast future sales. Suppose a wholesale distributor of sporting goods is interested in forecasting its sales revenue for each of the next 5 years. Since an inaccurate forecast may have dire consequences for the distributor, efficiency of the estimation of the regression parameters is an
Table 1
MSEs for β̂_FG and β̂_tm(α) under the contaminated normal distribution (n = 30).

δ     σ    β̂_FG    β̂_tm(0.1)  β̂_tm(0.2)  β̂_tm(0.3)
0          0.2654  0.2663     0.2681     0.2690
0.1   3    0.3746  0.2874     0.2697     0.2698
0.1   5    0.7326  0.3644     0.3075     0.2964
0.1   10   2.3055  0.5463     0.4184     0.3714
0.2   3    0.5543  0.3963     0.3600     0.3300
0.2   5    1.2306  0.5819     0.4530     0.4229
0.2   10   4.4579  1.4236     0.7820     0.6012
0.3   3    0.7075  0.5380     0.4448     0.4101
0.3   5    1.7109  0.9723     0.6503     0.5749
0.3   10   6.5214  2.8921     1.5105     1.0893
important indicator of forecasting accuracy. Data collected on a firm's yearly sales revenue (thousands of dollars), with sample size n = 35, were analyzed by Mendenhall and Sincich (1993). Since the scatter plot of the data revealed a linearly increasing trend, a simple linear regression model y_i = β_0 + β_1 x_i + ε_i, i = 1, ..., 35, seems reasonable for describing the trend. They first analyzed it with the least squares method, which yields R² = 0.98, indicating that a linear regression model formulation is appropriate. They further displayed a plot of the residuals that revealed the existence of AR(1) errors, and the Durbin–Watson test was then performed, rejecting the null hypothesis ρ = 0. They also computed 95% prediction intervals for the yearly revenues for years 36–40; however, the interval estimates are wide, which makes the prediction of future observations less certain (see this point in Mendenhall and Sincich, 1993, p. 481). We expect to obtain a better analysis, in some sense, based on the feasible generalized trimmed mean.

We follow their idea in evaluating the prediction of the yearly revenues. Since the observations for years 33–35 are available, we may compute the following prediction MSE:
MSE = (1/3) Σ_{i=33}^{35} (y_i - (β̂_0 + β̂_1 x_i))²,

Table 2
MSEs for β̂_FG and β̂_tm(α) under the contaminated normal distribution (n = 100).

δ     σ    β̂_FG    β̂_tm(0.1)  β̂_tm(0.2)  β̂_tm(0.3)
0          0.0921  0.0928     0.0934     0.0937
0.1   3    0.1308  0.0964     0.0960     0.0904
0.1   5    0.2539  0.1067     0.1059     0.0988
0.1   10   0.8494  0.1253     0.1154     0.1111
0.2   3    0.1962  0.1257     0.1182     0.1125
0.2   5    0.4249  0.1679     0.1343     0.1270
0.2   10   1.5763  0.2736     0.1574     0.1543
0.3   3    0.2522  0.1670     0.1466     0.1362
0.3   5    0.6266  0.2744     0.1844     0.1643
0.3   10   2.1937  0.7071     0.2683     0.2147

Table 3
MSEs for predictors based on several estimators. The observed revenues for years 33–35 are (146.10, 151.40, 150.90).

Estimator     Estimate (β̂_0, β̂_1)   Prediction (years 33–35)    Prediction MSE
β̂_ls         (1.053, 4.239)          (140.94, 145.17, 149.41)    67.503
β̂_FG         (0.142, 4.319)          (142.67, 146.99, 151.31)    31.336
β̂_ℓ1         (0.531, 4.268)          (141.38, 145.64, 149.91)    56.304
β̂_tm(0.1)    (-0.859, 4.386)         (143.88, 148.27, 152.65)    17.786
β̂_tm(0.2)    (0.072, 4.364)          (144.10, 148.47, 152.83)    16.302
β̂_tm(0.3)    (0.051, 4.336)          (143.15, 147.49, 151.83)    24.775
β̂_FGℓ1       (0.341, 4.258)          (141.06, 145.32, 149.58)    21.343
where (β̂_0, β̂_1) is the estimate of (β_0, β_1) corresponding to the estimator. The MSE, in this design, provides a numerical measure of the performance of future observation prediction. For this example, the estimators considered include the LSE β̂_ls, the feasible GLS β̂_FG, the ℓ1-norm estimator β̂_ℓ1, the feasible generalized trimmed mean β̂_tm(α), and the feasible generalized ℓ1-norm estimator β̂_FGℓ1; their evaluated MSEs are listed in Table 3.
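The prediction criterion can be computed directly from any fitted line. A small helper (the name is ours) implementing the displayed average over the held-out years:

```python
import numpy as np

def prediction_mse(y_holdout, x_holdout, b0, b1):
    """Average squared prediction error of the line b0 + b1*x over the
    held-out points, as in the displayed MSE formula."""
    pred = b0 + b1 * np.asarray(x_holdout, dtype=float)
    return float(np.mean((np.asarray(y_holdout, dtype=float) - pred) ** 2))
```

Feeding in an estimator's (β̂_0, β̂_1) together with the held-out years and observed revenues reproduces the corresponding comparison in Table 3, up to rounding of the reported estimates and the table's scaling convention.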
Several comments can be drawn from Table 3:

(a) Without using the information of the AR(1) errors, the least squares estimate β̂_ls is really not appropriate for prediction, since it not only gives confidence intervals that are too wide for future observations but also leads to a large MSE in our prediction design.

(b) The performance of the ℓ1-norm estimator β̂_ℓ1 also suffers, because it does not incorporate the correlation between the error variables into its estimation.

(c) Although the feasible generalized LSE β̂_FG accounts for the correlation between the error variables, its performance is still poorer than that of the feasible generalized robust estimators.

(d) Remarkably, the feasible generalized trimmed means for several symmetric trimming proportions have MSEs that are all smaller than those of the other three estimators β̂_ls, β̂_FG and β̂_ℓ1. The feasible generalized trimmed mean not only has asymptotic optimality properties in the class of linear trimmed means but also performs well in the prediction of future observations. This result implies that the feasible generalized trimmed means are more capable of detecting the main trend in the data.

(e) The feasible generalized ℓ1-norm estimator is much more efficient than the ℓ1-norm estimator β̂_ℓ1, since it accounts for the correlation between the error variables. This estimator is also competitive with the feasible generalized trimmed mean.
Acknowledgments
The authors would like to express their greatest gratitude to three referees for their valuable comments and suggestions. The constructive and insightful comments from the referees greatly improved the quality of the paper.
Appendix A

Let e have distribution function F with probability density function f, and let z_ij represent the j-th element of the vector z_i. The following conditions are similar to the standard ones for linear regression models as given in Ruppert and Carroll (1980) and Koenker and Portnoy (1987):

(a3) n⁻¹Σ_{i=1}^n z_ij⁴ = O(1).
(a4) n⁻¹Z'Z = Q_z + o(1), n⁻¹H_0'Z = Q_hz + o(1) and n⁻¹H_0'H_0 = Q_h + o(1), where Q_z and Q_h are positive definite matrices and Q_hz is a full-rank matrix.
(a5) n⁻¹Σ_{i=1}^n h_i = θ_h + o(1), where θ_h is a finite vector.
(a6) The probability density function f and its derivative are both bounded and bounded away from 0 in a neighborhood of F⁻¹(α) for α ∈ (0, 1).
(a7) n^{1/2}(β̂_0 - β) = O_p(1).

Proof of Theorem 3.1. From condition (a2) and (A.10) of Ruppert and Carroll (1980), HH_0'A_nZβ = β + o_p(n^{-1/2}). Inserting (1.1) into Eq. (2.7), we have

n^{1/2}(β̂_ltm - β) = n^{1/2}HH_0'Ae,   (A.1)

where we replace (1 - ρ²)^{1/2}ε_1 by e_1, which has the same asymptotic representation. We now develop a representation of n^{-1/2}H_0'Ae. Let U_j(α, T_n) = n^{-1/2}Σ_{i=1}^n h_ij e_i I(e_i < F⁻¹(α) + n^{-1/2}z_i'T_n) and U(α, T_n) = (U_1(α, T_n), ..., U_p(α, T_n))'. Also, let T_n(α) = n^{1/2}[β̂(α) - β(α)]. Then n^{-1/2}H_0'A_ne = U(α_2, T_n(α_2)) - U(α_1, T_n(α_1)). By conditions (a3) and (a6) and from Jureckova and Sen's (1987) extension of Billingsley's theorem (see also Koul, 1992), we have

U_j(α, T_n) - U_j(α, 0) - n⁻¹F⁻¹(α)f(F⁻¹(α)) Σ_{i=1}^n h_ij z_i'T_n = o_p(1)   (A.2)

for j = 1, ..., p and T_n = O_p(1). From Lai et al. (2003), we know that

n^{1/2}(β̂(α) - β(α)) = Q_z⁻¹f⁻¹(F⁻¹(α))n^{-1/2} Σ_{i=1}^n z_i(α - I(e_i ≤ F⁻¹(α))) + o_p(1).   (A.3)

By condition (a4) and from (A.2) and (A.3),

n^{-1/2}H_0'A_ne = n^{-1/2} Σ_{i=1}^n h_i e_i I(F⁻¹(α_1) ≤ e_i ≤ F⁻¹(α_2)) + F⁻¹(α_2)Q_hz Q_z⁻¹n^{-1/2} Σ_{i=1}^n z_i(α_2 - I(e_i ≤ F⁻¹(α_2))) - F⁻¹(α_1)Q_hz Q_z⁻¹n^{-1/2} Σ_{i=1}^n z_i(α_1 - I(e_i ≤ F⁻¹(α_1))).   (A.4)

Then the theorem follows from (A.1) and (A.4). The proof of Theorem 3.5 is analogous and is hence skipped. □
Proof of Lemma 3.3. Write $\mathrm{plim}(B_n) = B$ if $B_n$ converges to $B$ in probability. Let

$C = HH_0' - (Z'A_nZ)^{-1}Z'$.

With this, $\mathrm{plim}(CZ) = \mathrm{plim}(HH_0'Z) - \mathrm{plim}((Z'A_nZ)^{-1}Z'Z) = 0$. Then, since $\mathrm{plim}(CC')$ is nonnegative definite,

$\tilde{H}Q_h\tilde{H}' = \mathrm{plim}(HH_0'(HH_0')') = \mathrm{plim}((C + (Z'A_nZ)^{-1}Z')(C + (Z'A_nZ)^{-1}Z')') = \mathrm{plim}(CC') + \mathrm{plim}((Z'A_nZ)^{-1}Z'Z(Z'A_nZ)^{-1}) = \mathrm{plim}(CC') + (\alpha_2-\alpha_1)^{-2}\,\mathrm{plim}(n^{-1}Z'Z)^{-1} \ge (\alpha_2-\alpha_1)^{-2}Q_z^{-1}$. □

Proof of Theorem 4.2. We only briefly sketch a proof of the theorem; for the details, see Chen et al. (2001) and Lai et al. (2003). With the fact that $n^{1/2}(\hat{\rho} - \rho) = O_p(1)$ and conditions (a1), (a3) and (a6), we may see that
$n^{1/2}(\hat{\beta}_{ltm} - \beta) = n^{-1/2}HH_0'\hat{A}\varepsilon + o_p(1)$.  (A.5)
By letting

$M(t_1, t_2, \alpha) = n^{-1/2}\sum_{i=1}^{n} h_i\varepsilon_i I(\varepsilon_i - n^{-1/2}t_1\varepsilon_{i-1} \le F^{-1}(\alpha) + n^{-1/2}(z_i + n^{-1/2}t_1x_{i-1})'t_2 + n^{-1/2}t_1F^{-1}(\alpha))$,

we see that

$n^{-1/2}\hat{Z}'A_n\varepsilon = M(T_1, T_2(\alpha_2), \alpha_2) - M(T_1, T_2(\alpha_1), \alpha_1)$,  (A.6)

with $T_1 = n^{1/2}(\hat{\rho} - \rho)$ and $T_2(\alpha) = n^{1/2}(\hat{\beta}(\alpha) - \beta(\alpha))$; note that the scalar argument $t_1$ of $M$ carries the perturbation of $\rho$ and the vector argument $t_2$ the perturbation of $\beta$. However, using the same methods as in the proof of Lemma 3.5 and by (a3) and (a6), we can see that

$M(T_1, T_2, \alpha) - M(0, 0, \alpha) = F^{-1}(\alpha)f(F^{-1}(\alpha))\,n^{-1}\sum_{i=1}^{n} h_i(z_i'T_2 + T_1F^{-1}(\alpha)) + o_p(1)$  (A.7)

for any sequences $T_1 = O_p(1)$ and $T_2 = O_p(1)$. Then, from (A.6) and (A.7), $n^{-1/2}H_0'\hat{A}_n\varepsilon$ has the same representation as (A.4), and condition (a1) together with (A.5) implies the theorem. □
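Theorem 4.2 concerns the feasible (pseudo) generalized estimator, in which the unknown AR(1) parameter $\rho$ is replaced by an estimate satisfying $n^{1/2}(\hat{\rho} - \rho) = O_p(1)$. The following is a minimal sketch of that feasible step under our own naming (not the paper's exact procedure): $\hat{\rho}$ is the lag-one autocorrelation of preliminary OLS residuals, and the data are quasi-differenced in Cochrane-Orcutt style with a Prais-Winsten scaling of the first observation.

```python
import numpy as np

def feasible_transform(y, X):
    """Feasible generalized step for the AR(1)-error model (sketch):
    estimate rho by the lag-one autocorrelation of preliminary OLS
    residuals, then quasi-difference y and X, scaling the first
    observation by sqrt(1 - rho_hat**2)."""
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta0
    rho_hat = (r[1:] @ r[:-1]) / (r[:-1] @ r[:-1])
    c = np.sqrt(1.0 - rho_hat**2)
    y_t = np.concatenate([[c * y[0]], y[1:] - rho_hat * y[:-1]])
    X_t = np.vstack([c * X[0], X[1:] - rho_hat * X[:-1]])
    return y_t, X_t, rho_hat

# demonstration: regression with AR(1) errors, rho = 0.6
rng = np.random.default_rng(1)
n, rho = 800, 0.6
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = rng.normal(size=n)
e = np.empty(n)
e[0] = u[0] / np.sqrt(1.0 - rho**2)        # stationary start
for t in range(1, n):
    e[t] = rho * e[t - 1] + u[t]
y = X @ np.array([1.0, 2.0]) + e
y_t, X_t, rho_hat = feasible_transform(y, X)
beta_fg, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)
print(rho_hat, beta_fg)
```

Any root-n-consistent $\hat{\rho}$, such as this one, satisfies the condition $n^{1/2}(\hat{\rho} - \rho) = O_p(1)$ used in the proof; a robust fit of the transformed data then replaces the least squares step above.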
Proof of Theorem 4.5. The proof can be derived similarly to that of Theorem 3.1, applying a representation of $\hat{Z}(\alpha)$ that can be found in Ruppert and Carroll (1980); it is therefore omitted. □
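As a finite-sample illustration (ours, not part of the proofs) of the limit invoked in the proof of Lemma 3.3: when the errors are independent of the design and $A_n$ retains the observations whose errors lie between the $\alpha_1$ and $\alpha_2$ error quantiles, $n^{-1}Z'A_nZ$ converges to $(\alpha_2-\alpha_1)Q_z$, which is what produces the $(\alpha_2-\alpha_1)^{-2}Q_z^{-1}$ factor in the lemma.

```python
import numpy as np

# Check numerically that n^{-1} Z'A_nZ is close to (a2 - a1) * n^{-1} Z'Z
# when the trimming indicator depends only on errors independent of Z.
rng = np.random.default_rng(2)
n, a1, a2 = 50_000, 0.1, 0.9
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
e = rng.normal(size=n)                                  # errors independent of Z
keep = (e >= np.quantile(e, a1)) & (e <= np.quantile(e, a2))
lhs = Z[keep].T @ Z[keep] / n          # n^{-1} Z'A_nZ
rhs = (a2 - a1) * (Z.T @ Z) / n        # (a2 - a1) * n^{-1} Z'Z
print(np.max(np.abs(lhs - rhs)))       # small for large n
```

Inverting this limit twice gives $(Z'A_nZ)^{-1}Z'Z(Z'A_nZ)^{-1} \approx (\alpha_2-\alpha_1)^{-2}(Z'Z)^{-1}$, matching the bound in the lemma.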
References
Aitken, A.C., 1935. On least squares and linear combination of observations. Proceedings of the Royal Society of Edinburgh 55, 42–48.
Chen, L.-A., 1997. An efficient class of weighted trimmed means for linear regression models. Statistica Sinica 7, 669–686.
Chen, L.-A., Chiang, Y.C., 1996. Symmetric type quantile and trimmed means for location and linear regression model. Journal of Nonparametric Statistics 7, 171–185.
Chen, L.-A., Welsh, A.H., Chan, W., 2001. Estimators for the linear regression model based on Winsorized observations. Statistica Sinica 11, 147–172.
Chiang, Y.-C., Chen, L.-A., Yang, H.-C.P., 2006. Symmetric quantiles and their applications. Journal of Applied Statistics 33, 807–817.
Cochrane, D., Orcutt, G.H., 1949. Application of least squares regressions to relationships containing autocorrelated error terms. Journal of the American Statistical Association 44, 32–61.
De Jongh, P.J., De Wet, T., 1985. Trimmed mean and bounded influence estimators for the parameters of the AR(1) process. Communications in Statistics—Theory and Methods 14, 1357–1361.
De Jongh, P.J., De Wet, T., Welsh, A.H., 1988. Mallows-type bounded-influence-regression trimmed means. Journal of the American Statistical Association 83, 805–810.
Fomby, T.B., Hill, R.C., Johnson, S.R., 1984. Advanced Econometric Methods. Springer-Verlag, New York.
Giltinan, D.M., Carroll, R.J., Ruppert, D., 1986. Some new estimation methods for weighted regression when there are possible outliers. Technometrics 28, 219–230.
Jaeckel, L.A., 1971. Some flexible estimates of location. Annals of Mathematical Statistics 42, 1540–1552.
Jureckova, J., Sen, P.K., 1984. On adaptive scale-equivariant M-estimators in linear models. Statistics and Decisions 1, 31–46.
Jureckova, J., Sen, P.K., 1987. An extension of Billingsley's theorem to higher dimension M-processes. Kybernetika 23, 382–387.
Koenker, R.W., Bassett, G.W., 1978. Regression quantiles. Econometrica 46, 33–50.
Koenker, R., Portnoy, S., 1987. L-estimation for linear models. Journal of the American Statistical Association 82, 851–857.
Koul, H.L., 1992. Weighted Empiricals and Linear Models. IMS Lecture Notes 21.
Krasker, W.S., 1985. Two stage bounded-influence estimators for simultaneous equations models. Journal of Business and Economic Statistics 4, 432–444.
Krasker, W.S., Welsch, R.E., 1982. Efficient bounded influence regression estimation. Journal of the American Statistical Association 77, 595–604.
Lai, Y.-H., Thompson, P., Chen, L.-A., 2003. Generalized and pseudo generalized trimmed means for the linear regression with AR(1) error model. Statistics and Probability Letters 67, 203–211.
Mendenhall, W., Sincich, T., 1993. A Second Course in Business Statistics: Regression Analysis. Macmillan Publishing Company, New York.
Ruppert, D., Carroll, R.J., 1980. Trimmed least squares estimation in the linear model. Journal of the American Statistical Association 75, 828–838.
Serfling, R.J., 1980. Approximation Theorems of Mathematical Statistics. Wiley, New York.