Linear trimmed means for the linear regression with AR(1) errors model

Yi-Hsuan Lai^a, Lin-An Chen^b,*, Chau-Shyun Tang^c

a Department of Applied Mathematics, Hsuan Chuang University, Hsinchu, Taiwan
b Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan
c Department of Management Science, National Chiao Tung University, Hsinchu, Taiwan
ARTICLE INFO

Article history:
Received 28 October 2008
Accepted 13 May 2010
Available online 1 June 2010

Keywords:
Gauss–Markov theorem
Generalized least squares estimator
Linear trimmed mean
Robust estimator

ABSTRACT

For the linear regression with AR(1) errors model, the robust generalized and feasible generalized estimators of Lai et al. (2003) for the regression parameters are shown to have the desired property of a robust Gauss–Markov theorem. This is done by showing that these two estimators are the best among classes of linear trimmed means. A Monte Carlo study and a data analysis for this technique have been performed.

© 2010 Elsevier B.V. All rights reserved.
1. Introduction
Consider the linear regression model

y = Xβ + ε,   (1.1)

where y is a vector of observations for the dependent variable, X is a known n × p design matrix with 1's in the first column, and ε is a vector of independent and identically distributed disturbance variables with a distribution of finite variance. We consider the problem of estimating the parameter vector β. From the Gauss–Markov theorem, it is known that the least squares estimator has the smallest covariance matrix in the class of unbiased linear estimators My, where M satisfies MX = I_p. However, the least squares estimator is sensitive to departures from normality and to the presence of outliers, so we need to consider robust estimators. An interesting question in robust regression is whether there is a robust Gauss–Markov theorem, i.e., whether there is a robust estimator that is (asymptotically) more efficient than a class of linear robust estimators. This question was addressed by Chen et al. (2001), who considered a class of estimators based on Winsorized observations and showed that the trimmed mean of Welsh (1987) is asymptotically the best in this class.
Suppose that the error vector ε = (ε_1, ..., ε_n)' has the covariance matrix structure

Cov(ε) = σ²Ω,   (1.2)

where Ω is a positive definite matrix and σ is finite. From the regression theory of the estimation of β, it is known that any estimator having an (asymptotic) covariance matrix of the form

d(X'Ω⁻¹X)⁻¹   (1.3)
[Journal of Statistical Planning and Inference, doi:10.1016/j.jspi.2010.05.015. © 2010 Elsevier B.V. All rights reserved.]
is more efficient than an estimator having an (asymptotic) covariance matrix of the form

d(X'X)⁻¹(X'ΩX)(X'X)⁻¹,   (1.4)

where d is some positive constant. In least squares estimation when the matrix Ω is known, Aitken (1935) introduced the generalized least squares estimator (GLS) and showed that it has a covariance matrix of the form (1.3), while the LSE has a covariance matrix of the form (1.4) with d = σ². It is also well known that, when Ω is unknown, the feasible generalized LSE has an asymptotic covariance matrix of the form (1.3). Thus these two generalized-type estimators are asymptotically more efficient than the LSE.
Although the GLS and feasible GLS are asymptotically more efficient than the LSE in many regression problems, they are highly sensitive to even very small departures from normality and to the presence of outliers. Therefore, developing robust generalized and feasible generalized estimators for each specific regression problem is important. We consider one of the most popular models, the linear regression with AR(1) errors model, which has a covariance structure of the form (1.2):

y_i = x_i'β + ε_i,  i = 1, ..., n,   ε_i = ρε_{i-1} + e_i,   (1.5)

where e_1, ..., e_n are independent and identically distributed (iid) random variables. Suppose that |ρ| < 1 and e_i has distribution function F.
Denote the transformed vector u = Ω^{-1/2}'y. One approach to robust estimation is to construct a weighted observation vector from u and then construct a consistent estimator that is linear in u. In case ρ is unknown, all vectors are replaced by those in which ρ is replaced by an estimator ρ̂; see, for example, Lai et al. (2003). There are two types of weighted observation vectors in this literature. First, u can be replaced by a trimmed observation vector Au, with A a trimming matrix constructed from regression quantiles (see Koenker and Bassett, 1978) or from residuals based on an initial estimator (see Ruppert and Carroll, 1980; Chen, 1997). Second, u can be replaced by a Winsorized observation vector as defined in Welsh (1987). In this paper, we use the trimmed observation vector of Koenker and Bassett (1978) to study classes of linear functions based on u for estimation of β, and we develop a robust version of the Gauss–Markov theorem. Based on regression quantiles, Lai et al. (2003) proposed generalized and feasible generalized trimmed means for estimating the regression parameters β, thereby developing robust generalized and feasible generalized estimation techniques. Given the Gauss–Markov theorem for the linear regression with iid errors model, it is then interesting to ask whether there are robust generalized and feasible generalized estimators for the linear regression with AR(1) errors model that have the desired property of a robust Gauss–Markov theorem. Our aim in this paper is to show that the estimators of Lai et al. (2003) do have this desired property.
In Section 2 we introduce a class of linear trimmed means when ρ is known, and we establish their large sample theory in Section 3. We also establish the theory for a class of linear trimmed means when ρ is unknown in Section 4. In both cases, we show that the generalized and feasible generalized trimmed means are, respectively, the best in these two classes of linear trimmed means in terms of asymptotic covariance matrix. Monte Carlo studies and a data analysis are performed, and their results are displayed in Section 5. Finally, the proofs of the theorems are presented in Appendix A.
2. Linear trimmed mean when ρ is known

For the linear regression with AR(1) errors model (1.5), to obtain a linear trimmed mean we need to specify the quantile that determines the observation trimming and to make a transformation of the linear model to obtain generalized estimators. For a given i-th dependent variable in model (1.5), assuming i ≥ 2, one way to derive a generalized estimator is to consider the one-step Cochrane and Orcutt (1949) (C–O) procedure as y_i = ρy_{i-1} + (x_i - ρx_{i-1})'β + e_i. For the error variable e, we assume that it has distribution function F with probability density function f. With this transformation for generalized estimation, a quantile can be defined through the variable e or through a linear conditional quantile of y_{i-1} and y_i. By the fact that x_i is a vector with first element 1, the following two events determined by two quantiles are equivalent:
e_i ≤ F⁻¹(α)   (2.1)

and

(-ρ, 1)(y_{i-1}, y_i)' ≤ (-ρ, 1)(x_{i-1}', x_i')'β(α),   (2.2)

with

β(α) = β + ((1 - ρ)⁻¹F⁻¹(α), 0'_{p-1})'.

The event in inequality (2.1) specifies the quantile of the error variable e, and through inequality (2.2) it specifies the conditional quantile of the linear function (-ρ, 1)(y_{i-1}, y_i)'. Here β(α) is called the population regression quantile by Koenker and Bassett (1978). For defining the linear trimmed means, we consider the one-step C–O procedure on the matrix form of the linear regression with AR(1) errors model (1.5), which is
y = Xβ + ε,

where it can be seen that Cov(ε) = σ²Ω with (rows separated by semicolons)

Ω = (1 - ρ²)⁻¹[1, ρ, ρ², ..., ρ^{n-1}; ρ, 1, ρ, ..., ρ^{n-2}; ...; ρ^{n-1}, ρ^{n-2}, ρ^{n-3}, ..., 1].   (2.3)

Define the half matrix of Ω⁻¹ as

Ω^{-1/2}' = [(1 - ρ²)^{1/2}, 0, 0, ..., 0, 0; -ρ, 1, 0, ..., 0, 0; 0, -ρ, 1, ..., 0, 0; ...; 0, 0, 0, ..., -ρ, 1].

With the above half matrix of Ω, we consider the model for the one-step C–O procedure, u = Ω^{-1/2}'y, as

u = Zβ + ((1 - ρ²)^{1/2}ε_1, e_2, e_3, ..., e_n)',   (2.4)

where Z = Ω^{-1/2}'X. Note that the vector u and the matrix Z are both functions of the parameter ρ. The usual descriptive statistics, robust or nonrobust, based on model (1.1) carry over straightforwardly to the transformed model (2.4) when ρ is known. However, when ρ is unknown, u and Z need to be replaced by versions in which ρ is replaced by a consistent estimator. Knowing that the GLS is simply the LSE of β for model (2.4), we may consider the linear trimmed mean defined on this transformed model. To justify calling both estimators (with ρ known and unknown) linear trimmed means, we will show that they are asymptotically equivalent in the sense of having the same asymptotic covariance matrix, just as the GLS and feasible GLS are.
For 0 < α < 1, the α-th (sample) regression quantile of Koenker and Bassett (1978) for the linear regression with AR(1) errors model is defined as

β̂(α) = arg min_{b ∈ R^p} Σ_{i=1}^n (u_i - z_i'b)(α - I(u_i ≤ z_i'b)),

where u_i and z_i' are the i-th rows of u and Z, respectively, and I(A) is the indicator function of the event A. Define the trimming matrix as A = diag{a_i = I(z_i'β̂(α_1) ≤ u_i ≤ z_i'β̂(α_2)): i = 1, ..., n} with 0 < α_1 < α_2 < 1, where fractions α_1 and 1 - α_2 of the observations are trimmed from the lower and upper tails, respectively. After the outliers are trimmed by the regression quantiles β̂(α_1) and β̂(α_2), we have the following submodel:

Au = AZβ + A((1 - ρ²)^{1/2}ε_1, e_2, ..., e_n)'.   (2.5)

Since A is random, the error vector in the above transformed model is no longer a set of independent variables. Koenker and Bassett's type generalized trimmed mean (proposed by Lai et al., 2003) is defined as

β̂_tm = (Z'AZ)⁻¹Z'Au.   (2.6)

We now move to define the linear trimmed means. Any linear unbiased estimator defined in model (2.4) has the form
Mu, with M a p × n nonstochastic matrix satisfying MZ = I_p. Since M is a full-rank matrix, there exist matrices H and H_0 such that M = HH_0'. Thus, an estimator is a linear unbiased estimator if there exist a p × p nonsingular matrix H and an n × p full-rank matrix H_0 such that the estimator can be written as

HH_0'u.

We generalize linear unbiased estimators defined on the observation vector u to estimators defined on Au by requiring them to be of the form MAu, where M = HH_0'.
Definition 2.1. A statistic β̂_ltm is called an (α_1, α_2) linear trimmed mean if there exist a stochastic p × p matrix H and a nonstochastic n × p matrix H_0 such that it has the representation

β̂_ltm = HH_0'Au,   (2.7)

where H and H_0 satisfy the following two conditions:

(a1) nH → H̃ in probability, where H̃ is a full-rank p × p matrix.
(a2) HH_0'Z = (α_2 - α_1)⁻¹I_p + o_p(n^{-1/2}), where I_p is the p × p identity matrix.

This is similar to the usual requirements for unbiased estimation, except that we have introduced a trimmed observation vector to allow for robustness and consider an asymptotic property instead of unbiasedness.
Two questions arise for this class of linear trimmed means. First, does the class contain estimators that have already appeared in the literature? The answer is affirmative, because the class of linear trimmed means defined in this paper contains the generalized trimmed mean of Lai et al. (2003) (H = (Z'AZ)⁻¹ and H_0 = Z), and the set of Mallows-type bounded influence trimmed means (H = (Z'WAZ)⁻¹ and H_0' = Z'W, with W a diagonal matrix of weights; see Section 3). Second, is there a best estimator in this class of linear trimmed means, and can we find it if it exists? This question will be answered in the next section.
With the one-step C–O procedure, the half matrix Ω^{-1/2}' has rows with only a finite number (not depending on n) of elements that depend on the parameter ρ. This trick, traditionally used in the econometrics literature for regression with AR(1) errors (see, for example, Fomby et al., 1984, pp. 210–211), makes the study of the asymptotic theory for β̂_ltm(α) similar to that for the classical trimmed mean in linear regression. Large sample representations of the linear trimmed mean and its role as a generalized robust estimator will be introduced in the next section.
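For concreteness, the construction of (2.6) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes SciPy is available, solves the regression-quantile problem through its standard linear-programming formulation, and all function names are ours.

```python
import numpy as np
from scipy.optimize import linprog

def regression_quantile(u, Z, alpha):
    """alpha-th regression quantile of Koenker and Bassett via its LP form:
    minimize sum_i (u_i - z_i'b) * (alpha - I(u_i <= z_i'b))."""
    n, p = Z.shape
    # variables: b+ (p), b- (p), r+ (n), r- (n), all nonnegative
    c = np.concatenate([np.zeros(2 * p),
                        alpha * np.ones(n), (1 - alpha) * np.ones(n)])
    A_eq = np.hstack([Z, -Z, np.eye(n), -np.eye(n)])  # Z b + r+ - r- = u
    res = linprog(c, A_eq=A_eq, b_eq=u, bounds=(0, None), method="highs")
    return res.x[:p] - res.x[p:2 * p]

def generalized_trimmed_mean(u, Z, a1, a2):
    """Trimmed mean of (2.6): LSE on the observations kept by the trimming
    matrix A, whose diagonal marks z_i'b(a1) <= u_i <= z_i'b(a2)."""
    keep = ((u >= Z @ regression_quantile(u, Z, a1)) &
            (u <= Z @ regression_quantile(u, Z, a2)))
    Zk, uk = Z[keep], u[keep]
    return np.linalg.solve(Zk.T @ Zk, Zk.T @ uk)
```

With symmetric trimming (α_1 = 1 - α_2) and a symmetric error distribution, the trimming bias γ vanishes, so the sketch recovers β consistently on simulated data.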
3. Asymptotic properties of linear trimmed mean
Let us denote by h_i' the i-th row of H_0', θ_h = lim_{n→∞} n⁻¹Σ_{i=1}^n h_i, Q_hz = lim_{n→∞} n⁻¹Σ_{i=1}^n h_i z_i', and Q_z = lim_{n→∞} n⁻¹Z'Z. We assume throughout this section that conditions (a3)–(a6) in Appendix A are satisfied; thus, for example, Q_hz is a full-rank matrix and Q_z is a positive definite matrix. It is not difficult to see that these conditions hold in typical analysis of variance designs, and they hold in probability when the rows of X form a random sample from a very wide class of distributions on R^p (see this point in detail in Koenker and Portnoy, 1987). The following theorem gives a "Bahadur" representation of the (α_1, α_2) linear trimmed mean.

Theorem 3.1. With assumptions (a1)–(a6), we have

n^{1/2}(β̂_ltm - (β + γ_ltm)) = n^{-1/2}H̃ Σ_{i=1}^n {h_i(e_i I(F⁻¹(α_1) ≤ e_i ≤ F⁻¹(α_2)) - λ) + [F⁻¹(α_1)I(e_i < F⁻¹(α_1)) + F⁻¹(α_2)I(e_i > F⁻¹(α_2)) - ((1 - α_2)F⁻¹(α_2) + α_1F⁻¹(α_1))]Q_hz Q_z⁻¹z_i} + o_p(1),

where γ_ltm = λH̃θ_h, λ = ∫_{F⁻¹(α_1)}^{F⁻¹(α_2)} e dF(e), and θ_h is defined in assumption (a5).

The limiting distribution of the (α_1, α_2) linear trimmed mean follows from the central limit theorem (see, e.g., Serfling, 1980, p. 30).

Corollary 3.2. Under conditions (a1)–(a6), n^{1/2}(β̂_ltm - (β + γ_ltm)) has an asymptotic normal distribution with zero mean vector and the following asymptotic covariance matrix:

[∫_{F⁻¹(α_1)}^{F⁻¹(α_2)} e² dF(e) - λ²]H̃Q_h H̃' + (α_2 - α_1)⁻²[α_1(F⁻¹(α_1))² + (1 - α_2)(F⁻¹(α_2))² - (α_1F⁻¹(α_1) + (1 - α_2)F⁻¹(α_2))² - 2λ(α_1F⁻¹(α_1) + (1 - α_2)F⁻¹(α_2))]Q_z⁻¹.   (3.1)

The (α_1, α_2) generalized trimmed mean proposed by Lai et al. (2003) is defined by

β̂_tm = (Z'AZ)⁻¹Z'Au.   (3.2)

From the study of this estimator by Ruppert and Carroll (1980), we have

n⁻¹Z'AZ → (α_2 - α_1)Q_z.

By letting H = (Z'AZ)⁻¹ and H_0 = Z, we can see that condition (a2) also holds for β̂_tm, so the (α_1, α_2) generalized trimmed mean is in the class of (α_1, α_2) linear trimmed means. Moreover, Lai et al. (2003) provided the result that n^{1/2}(β̂_tm - (β + γ_tm)) has an asymptotic normal distribution with zero mean vector and asymptotic covariance matrix σ²(α_1, α_2)Q_z⁻¹, where

σ²(α_1, α_2) = (α_2 - α_1)⁻²[∫_{F⁻¹(α_1)}^{F⁻¹(α_2)} (e - λ)² dF(e) + α_1(F⁻¹(α_1))² + (1 - α_2)(F⁻¹(α_2))² - (α_1F⁻¹(α_1) + (1 - α_2)F⁻¹(α_2))² - 2λ(α_1F⁻¹(α_1) + (1 - α_2)F⁻¹(α_2))].   (3.3)

The following lemma orders the matrices H̃Q_h H̃' and Q_z⁻¹.
Lemma 3.3. For any matrices H̃ and Q_h induced from conditions (a1) and (a4), the difference

H̃Q_h H̃' - (α_2 - α_1)⁻²Q_z⁻¹   (3.4)

is positive semidefinite.
The relation in (3.4) then implies the following main theorem.
Theorem 3.4. Under conditions (a3)–(a6), the (α_1, α_2) generalized trimmed mean β̂_tm of (3.2) is the best (α_1, α_2) linear trimmed mean.

Since the (α_1, α_2) generalized trimmed mean always exists, the best (α_1, α_2) linear trimmed mean always exists. A further question is how big the class of (α_1, α_2) linear trimmed means is; we do not, however, study the scope of the linear trimmed means here.
In the literature, consideration has been given to the development of estimators of the regression parameters β that limit the effects of the error variable and the independent variables. Among them, approaches that simultaneously bound the influence of the design points and the residuals for the linear regression model include Krasker and Welsch (1982) and Krasker (1985). On the other hand, the Mallows-type bounded-influence trimmed mean bounds the influence of the design points and the residuals separately, as applied in the AR(1) regression model by De Jongh and De Wet (1985) and in the linear regression model by De Jongh et al. (1988). In a study by Giltinan et al. (1986), these two approaches were found to be competitive, in that neither is preferable to the other. They also note that Mallows-type estimators should theoretically give more stable inference than the Krasker–Welsch approach.
Let w_i, i = 1, ..., n, be real numbers. For 0 < α < 1, the Mallows-type bounded-influence regression quantile, denoted by β̂_w(α), is defined as the solution of the minimization problem

min_{b ∈ R^p} Σ_{i=1}^n w_i(u_i - z_i'b)(α - I(u_i ≤ z_i'b)).

With W the diagonal matrix of {w_i, i = 1, ..., n}, the bounded influence trimmed mean is defined as

β̂_BI = (Z'WA_wZ)⁻¹Z'WA_wu,

where A_w = diag{a_i = I(z_i'β̂_w(α_1) ≤ u_i ≤ z_i'β̂_w(α_2)): i = 1, ..., n}. Let H = (Z'WA_wZ)⁻¹ and H_0 = WZ. This shows that the bounded influence trimmed means also form a subclass of the linear trimmed means (see De Jongh et al., 1988, for their large sample properties).
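Once the diagonals of W and A_w are in hand, the estimator above is a single weighted least squares solve. A sketch, with hypothetical names and a weight choice that is ours (the paper does not prescribe one):

```python
import numpy as np

def bounded_influence_trimmed_mean(u, Z, w, keep):
    """Mallows-type estimator (Z'W A_w Z)^{-1} Z'W A_w u, where `w` holds the
    diagonal of W and the boolean vector `keep` holds the diagonal of A_w."""
    wa = w * keep                        # combined diagonal of W A_w
    M = Z.T @ (wa[:, None] * Z)          # Z'W A_w Z
    return np.linalg.solve(M, Z.T @ (wa * u))
```

With w ≡ 1 and no trimming this reduces to the LSE on (u, Z); a common bounded-influence choice (an assumption here, not from the source) downweights high-leverage rows, e.g. w_i = min(1, c/‖z_i‖).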
Theorem 3.5. If assumptions (a1)–(a5) hold, then

(a) n^{1/2}(β̂_BI - (β + γ_w)) = (α_2 - α_1)⁻¹Q_w⁻¹n^{-1/2} Σ_{i=1}^n w_i z_i[(e_i I(F⁻¹(α_1) ≤ e_i ≤ F⁻¹(α_2)) - λ) + F⁻¹(α_1)I(e_i < F⁻¹(α_1)) + F⁻¹(α_2)I(e_i > F⁻¹(α_2)) - ((1 - α_2)F⁻¹(α_2) + α_1F⁻¹(α_1))] + o_p(1),

where γ_w = (α_2 - α_1)⁻¹λQ_w⁻¹θ_w, Q_w = lim_{n→∞} n⁻¹Σ_{i=1}^n w_i z_i z_i', a positive definite matrix, and θ_w = lim_{n→∞} n⁻¹Σ_{i=1}^n w_i z_i; and

(b) n^{1/2}(β̂_BI - (β + γ_w)) → N(0, σ²(α_1, α_2)Q_w⁻¹Q_ww Q_w⁻¹), where Q_ww = lim_{n→∞} n⁻¹Σ_{i=1}^n w_i²z_i z_i', a positive definite matrix.

In particular, β̂_tm is the special case of β̂_BI with W the identity matrix, and it therefore belongs to this subclass. We may also show that Q_w⁻¹Q_ww Q_w⁻¹ - Q_z⁻¹ is positive semidefinite, which shows that β̂_tm is the best bounded influence trimmed mean.

Theorem 3.6. The (α_1, α_2) generalized trimmed mean is the best bounded influence trimmed mean.

This result is based solely on considerations of the asymptotic variance and ignores the fact that the generalized trimmed mean does not have bounded influence in the space of the independent variables. It confirms the notion that bounded influence is achieved at the cost of efficiency.
4. Linear trimmed means when ρ is unknown

After developing the theory of the linear trimmed means for the case where ρ is known, the next interesting problem is whether, when the parameter ρ is unknown, the linear trimmed mean of (2.7) with ρ replaced by a consistent estimator ρ̂ has the same asymptotic behavior as that displayed by β̂_ltm. If so, the theory of generalized least squares estimation carries over to the theory of robust estimation in this specific linear regression model. Let Ω̂ be the matrix Ω with ρ replaced by its consistent estimator ρ̂, which could be the LSE obtained through the C–O estimation procedure. Define the matrices û = Ω̂^{-1/2}'y, Ẑ = Ω̂^{-1/2}'X and ε̂ = Ω̂^{-1/2}'ε. Let the regression quantile when the parameter ρ is unknown be defined as

β̂(α) = arg min_{b ∈ R^p} Σ_{i=1}^n (û_i - ẑ_i'b)(α - I(û_i ≤ ẑ_i'b)),

where û_i and ẑ_i' are the i-th rows of û and Ẑ, respectively. Define the trimming matrix as Â = diag{a_i = I(ẑ_i'β̂(α_1) ≤ û_i ≤ ẑ_i'β̂(α_2)): i = 1, ..., n}.

Definition 4.1. A statistic β̂*_ltm (the star marks the unknown-ρ version, in contrast to β̂_ltm of Definition 2.1) is called an (α_1, α_2) linear trimmed mean if there exist a stochastic p × p matrix H and a nonstochastic n × p matrix H_0 such that it has the representation

β̂*_ltm = HH_0'Âû,

where H and H_0 satisfy conditions (a1) and (a2) for these H and H_0.
Koenker and Bassett's feasible generalized trimmed mean is defined as

β̂*_tm = (Ẑ'ÂẐ)⁻¹Ẑ'Âû.

From Lai et al. (2003), we can see that n⁻¹Ẑ'ÂẐ → (α_2 - α_1)Q_z in probability. By letting H = (Ẑ'ÂẐ)⁻¹ and H_0 = Ẑ, we see that β̂*_tm is in the class of (α_1, α_2) linear trimmed means. Lai et al. (2003) also showed that β̂*_tm and β̂_tm have the same Bahadur representation, and hence the same asymptotic distribution. The following theorem states that the linear trimmed means for the cases where ρ is known and unknown have the same large sample properties.

Theorem 4.2. n^{1/2}(β̂*_ltm - β̂_ltm) = o_p(1).

We then have the result that the feasible generalized trimmed mean is the best linear trimmed mean when ρ is unknown.

Theorem 4.3. The feasible generalized trimmed mean is the best linear trimmed mean.
For the rest of this section, we consider several related questions. First, is the best linear trimmed mean unique for this linear regression with AR(1) errors model? To address this, we develop an analogous optimality theory for the trimmed mean of Welsh (1987). Let β̂_0 be an initial estimator of β for model (2.4). Let ζ̂(α_1) and ζ̂(α_2) represent, respectively, the α_1-th and α_2-th empirical quantiles of the regression residuals ê_i = u_i - z_i'β̂_0, i = 1, ..., n. The Winsorized observation defined by Welsh (1987) is

u*_i = u_i I(ζ̂(α_1) ≤ ê_i ≤ ζ̂(α_2)) + ζ̂(α_1)(I(ê_i < ζ̂(α_1)) - α_1) + ζ̂(α_2)(I(ê_i > ζ̂(α_2)) - (1 - α_2)).

Let u* = (u*_1, ..., u*_n)' and denote the trimming matrix by B = diag(I(ζ̂(α_1) ≤ ê_i ≤ ζ̂(α_2)), i = 1, ..., n).

Definition 4.4. A statistic β̂_lw is called an (α_1, α_2) Welsh's type linear trimmed mean if there exist a stochastic p × p matrix H and a nonstochastic n × p matrix H_0 such that it has the representation

β̂_lw = HH_0'u*,

where H and H_0 satisfy conditions (a1) and (a2).
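The Winsorization step and the resulting generalized Welsh trimmed mean of (4.1) below can be sketched as follows. This is an illustrative sketch (names are ours); in the usage the initial estimator β̂_0 is taken to be the LSE, which satisfies (a7).

```python
import numpy as np

def welsh_winsorize(u, Z, b0, a1, a2):
    """Winsorized observations u*_i of Welsh (1987), built from residuals of
    an initial estimate b0 and empirical residual quantiles zeta(a1), zeta(a2)."""
    e = u - Z @ b0
    z1, z2 = np.quantile(e, [a1, a2])
    keep = (e >= z1) & (e <= z2)                  # diagonal of B
    ustar = (u * keep
             + z1 * ((e < z1) - a1)
             + z2 * ((e > z2) - (1.0 - a2)))
    return ustar, keep

def welsh_trimmed_mean(u, Z, b0, a1, a2):
    """Generalized Welsh trimmed mean (4.1): (Z'BZ)^{-1} Z'u*."""
    ustar, keep = welsh_winsorize(u, Z, b0, a1, a2)
    Zb = Z[keep]
    return np.linalg.solve(Zb.T @ Zb, Z.T @ ustar)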
Theorem 4.5. With assumptions (a1)–(a7), β̂_lw and β̂_ltm have the same Bahadur representation of Theorem 3.1, and hence the same asymptotic distribution.

If we let H = (Z'BZ)⁻¹ and H_0 = Z, we see that the generalized Welsh (1987) trimmed mean,

β̂_w = (Z'BZ)⁻¹Z'u*,   (4.1)

is a member of the class of (α_1, α_2) Welsh's type linear trimmed means. We then have the following theorem.

Theorem 4.6. The generalized Welsh trimmed mean β̂_w and the generalized trimmed mean β̂_tm have the same asymptotic distribution. Hence, β̂_w is the best (α_1, α_2) Welsh's type linear trimmed mean.

The above theorem shows that the best robust generalized estimator is not unique. As mentioned by one referee, further study of the generalized Huber M-estimator may provide one more example of a best robust generalized estimator (see Jureckova and Sen, 1984, for a representation of Huber's M-estimator). Hence, a further search for other types of best robust generalized estimators is of limited interest. One interesting question is whether there is a best robust generalized estimator with asymptotic variances identical or close to the Cramér–Rao lower bound when the error variables follow heavy-tailed distributions. For the classical linear regression model, the symmetric trimmed mean of Chen and Chiang (1996) and Chiang et al. (2006) has been shown to have asymptotic variances close to the Cramér–Rao lower bound. Hence, an extension of the symmetric trimmed mean to this linear regression with AR(1) errors model may provide the desired solution.
We also note that a best linear trimmed mean is defined only within the comparison of linear trimmed means with fixed trimming percentages (α_1, α_2). Without knowing the distribution F, can we estimate the best percentages (α_1, α_2) in terms of asymptotic covariance matrix so that the result is still a best linear trimmed mean? Attacking this problem involves minimizing the asymptotic covariance matrix (see the approach of Jaeckel, 1971), and the development of the optimality theory would be complicated, requiring further investigation.

5. Monte Carlo study and example
In this section, we first consider a simulation study to compare the feasible GLS β̂_FG and the feasible generalized trimmed mean β̂*_tm. By letting ε̂_i = y_i - x_i'β̂_ls, where β̂_ls is the LSE of β, we estimate ρ by ρ̂ = Σ_{i=2}^n ε̂_i ε̂_{i-1} / Σ_{i=2}^n ε̂_i². With sample size n = 30, we consider the simple linear regression model y_i = β_0 + β_1 x_{i1} + ε_i, where ε_i follows the AR(1) error structure. For this simulation, we let the true parameter values of β_0 and β_1 be 1 and ρ be 0.3. The error variable e_i is generated from the mixed normal distribution (1 - δ)N(0, 1) + δN(0, σ²) with δ = 0.1, 0.2, 0.3 and σ = 3, 5, 10 (the case δ = 0 corresponds to the uncontaminated standard normal), and the x_i are independent normal random variables with mean i/2 and variance 1. A total of 10 000 replications were performed, and we compute the mean squared errors (MSE) for the feasible generalized LSE β̂_FG and the feasible generalized trimmed mean β̂*_tm for α_1 = 1 - α_2 = α = 0.1, 0.2, 0.3, where the total mean squared error is the squared Euclidean distance between the estimator and the true regression parameter β. For convenience, in the rest of this section we denote the feasible generalized trimmed mean by β̂_tm(α). The MSEs are listed in Tables 1 and 2.
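The simulation design just described can be sketched in a few lines. This is illustrative only: it shows the data generation and the feasible-GLS arm (including the residual-based ρ̂ above); the trimmed-mean arm would additionally apply the quantile-based trimming of Section 2. Function names are ours.

```python
import numpy as np

def simulate_once(n=30, beta=(1.0, 1.0), rho=0.3, delta=0.1, sig=3.0, rng=None):
    """One replication of the Section 5 design: AR(1) errors driven by the
    mixture (1-delta)N(0,1) + delta*N(0,sig^2); x_i ~ N(i/2, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    contaminated = rng.random(n) < delta
    e = rng.normal(0.0, np.where(contaminated, sig, 1.0))
    eps = np.empty(n)
    eps[0] = e[0] / np.sqrt(1 - rho ** 2)          # stationary start
    for i in range(1, n):
        eps[i] = rho * eps[i - 1] + e[i]
    x = rng.normal(np.arange(1, n + 1) / 2.0, 1.0)
    X = np.column_stack([np.ones(n), x])
    return X @ np.asarray(beta) + eps, X

def feasible_gls(y, X):
    """Estimate rho from LSE residuals, then C-O transform and refit by LSE."""
    b_ls = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ b_ls
    rho_hat = np.sum(r[1:] * r[:-1]) / np.sum(r[1:] ** 2)
    n = len(y)
    P = np.eye(n)
    P[0, 0] = np.sqrt(max(1 - rho_hat ** 2, 1e-8))
    for i in range(1, n):
        P[i, i - 1] = -rho_hat
    return np.linalg.lstsq(P @ X, P @ y, rcond=None)[0], rho_hat
```

Looping this over replications and accumulating the squared Euclidean distance to the true β reproduces the MSE criterion used in Tables 1 and 2.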
We may draw several conclusions from Tables 1 and 2:

(a) The case δ = 0 indicates that e_i follows a normal distribution. The results in these two tables then conform to the statistical theory that the feasible generalized least squares estimator β̂_FG is more efficient than other consistent estimators. However, the trimmed means are still efficient in this ideal design.

(b) The MSEs of these two estimators both increase when the contamination percentage δ increases or the contamination variance σ² increases. This matches the usual behavior of estimators, robust or non-robust.

(c) The feasible generalized trimmed mean is more efficient than the feasible generalized LSE in all cases of contaminated errors. This result shows that the feasible generalized trimmed mean is indeed a robust member of the class of linear trimmed means.

(d) The simulation results displayed in these two tables show that, in most cases of δ and σ (other than δ = 0 and (δ, σ) = (0.1, 3)), the MSE has a decreasing trend as α increases. Some further simulation results in our experience show that the MSE goes up for α not too far beyond 0.3.

Next we consider a real data regression analysis. Many firms use past sales to forecast future sales. Suppose a wholesale distributor of sporting goods is interested in forecasting its sales revenue for each of the next 5 years. Since an inaccurate forecast may have dire consequences for the distributor, efficiency of the estimation of the regression parameters is an
Table 1
MSEs for β̂_FG and β̂_tm(α) under the contaminated normal distribution (n = 30).

δ     σ    β̂_FG    β̂_tm(0.1)  β̂_tm(0.2)  β̂_tm(0.3)
0          0.2654  0.2663     0.2681     0.2690
0.1   3    0.3746  0.2874     0.2697     0.2698
0.1   5    0.7326  0.3644     0.3075     0.2964
0.1   10   2.3055  0.5463     0.4184     0.3714
0.2   3    0.5543  0.3963     0.3600     0.3300
0.2   5    1.2306  0.5819     0.4530     0.4229
0.2   10   4.4579  1.4236     0.7820     0.6012
0.3   3    0.7075  0.5380     0.4448     0.4101
0.3   5    1.7109  0.9723     0.6503     0.5749
0.3   10   6.5214  2.8921     1.5105     1.0893
important indicator of forecasting accuracy. Data collected on a firm's yearly sales revenue (thousands of dollars), with sample size n = 35, were analyzed by Mendenhall and Sincich (1993). Since the scatter plot of the data revealed a linearly increasing trend, a simple linear regression model y_i = β_0 + β_1 x_i + ε_i, i = 1, ..., 35, seems reasonable for describing the trend. They first analyzed it with the least squares method, which yields R² = 0.98, indicating that a linear regression model formulation is appropriate. They further displayed a plot of the residuals that revealed the existence of AR(1) errors, and the Durbin–Watson test was then performed, rejecting the null hypothesis ρ = 0. They also computed 95% prediction intervals for the yearly revenues for years 36–40; however, the interval estimates are wide, which makes the prediction of future observations less certain (see this point in Mendenhall and Sincich, 1993, p. 481). We expect to obtain a better analysis, in some sense, based on the feasible generalized trimmed mean.

We follow their idea in evaluating the prediction of the yearly revenues. Since the observations for years 33–35 are available, we may compute the following prediction MSE:
MSE = (1/3) Σ_{i=33}^{35} (y_i - (β̂_0 + β̂_1 x_i))²,

Table 2
MSEs for β̂_FG and β̂_tm(α) under the contaminated normal distribution (n = 100).

δ     σ    β̂_FG    β̂_tm(0.1)  β̂_tm(0.2)  β̂_tm(0.3)
0          0.0921  0.0928     0.0934     0.0937
0.1   3    0.1308  0.0964     0.0960     0.0904
0.1   5    0.2539  0.1067     0.1059     0.0988
0.1   10   0.8494  0.1253     0.1154     0.1111
0.2   3    0.1962  0.1257     0.1182     0.1125
0.2   5    0.4249  0.1679     0.1343     0.1270
0.2   10   1.5763  0.2736     0.1574     0.1543
0.3   3    0.2522  0.1670     0.1466     0.1362
0.3   5    0.6266  0.2744     0.1844     0.1643
0.3   10   2.1937  0.7071     0.2683     0.2147

Table 3
MSEs for predictors based on several estimators. The observed revenues for years 33–35 are (146.10, 151.40, 150.90).

Estimator     Estimate (β̂_0, β̂_1)   Prediction (years 33–35)    Prediction MSE
β̂_ls         (1.053, 4.239)          (140.94, 145.17, 149.41)    67.503
β̂_FG         (0.142, 4.319)          (142.67, 146.99, 151.31)    31.336
β̂_ℓ1         (0.531, 4.268)          (141.38, 145.64, 149.91)    56.304
β̂_tm(0.1)    (-0.859, 4.386)         (143.88, 148.27, 152.65)    17.786
β̂_tm(0.2)    (0.072, 4.364)          (144.10, 148.47, 152.83)    16.302
β̂_tm(0.3)    (0.051, 4.336)          (143.15, 147.49, 151.83)    24.775
β̂_FGℓ1       (0.341, 4.258)          (141.06, 145.32, 149.58)    21.343
where (β̂_0, β̂_1) is the estimate of (β_0, β_1) corresponding to the estimator. The MSE, in this design, provides a numerical measure of the performance of future observation prediction. For this example, the estimators considered include the LSE β̂_ls, the feasible GLS β̂_FG, the ℓ1-norm estimator β̂_ℓ1, the feasible generalized trimmed mean β̂_tm(α), and the feasible generalized ℓ1-norm estimator β̂_FGℓ1; their evaluated MSEs are listed in Table 3.
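The prediction criterion can be computed directly from any fitted line. A small helper (the name is ours) implementing the displayed average over the held-out years:

```python
import numpy as np

def prediction_mse(y_holdout, x_holdout, b0, b1):
    """Average squared prediction error of the line b0 + b1*x over the
    held-out points, as in the displayed MSE formula."""
    pred = b0 + b1 * np.asarray(x_holdout, dtype=float)
    return float(np.mean((np.asarray(y_holdout, dtype=float) - pred) ** 2))
```

Feeding in an estimator's (β̂_0, β̂_1) together with the held-out years and observed revenues reproduces the corresponding comparison in Table 3, up to rounding of the reported estimates and the table's scaling convention.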
Several comments can be drawn from Table 3:

(a) Without using the information of the AR(1) errors, the least squares estimate β̂_ls is really not appropriate for prediction, since it not only gives confidence intervals that are too wide for future observations but also leads to a large MSE in our prediction design.

(b) The performance of the ℓ1-norm estimator β̂_ℓ1 also suffers, because it does not incorporate the correlation between the error variables into its estimation.

(c) Although the feasible generalized LSE β̂_FG accounts for the correlation between the error variables, its performance is still poorer than that of the feasible generalized robust estimators.

(d) Remarkably, the feasible generalized trimmed means for several symmetric trimming proportions have MSEs that are all smaller than those of the other three estimators β̂_ls, β̂_FG and β̂_ℓ1. The feasible generalized trimmed mean not only has asymptotic optimality properties in the class of linear trimmed means but also performs well in the prediction of future observations. This result implies that the feasible generalized trimmed means are more capable of detecting the main trend in the data.

(e) The feasible generalized ℓ1-norm estimator is much more efficient than the ℓ1-norm estimator β̂_ℓ1, since it accounts for the correlation between the error variables. This estimator is also competitive with the feasible generalized trimmed mean.
Acknowledgments
The authors would like to express their greatest gratitude to three referees for their valuable comments and suggestions. The constructive and insightful comments from the referees greatly improved the quality of the paper.
Appendix A

Let e have distribution function F with probability density function f, and let z_ij represent the j-th element of the vector z_i. The following conditions are similar to the standard ones for linear regression models as given in Ruppert and Carroll (1980) and Koenker and Portnoy (1987):

(a3) n⁻¹Σ_{i=1}^n z_ij⁴ = O(1).
(a4) n⁻¹Z'Z = Q_z + o(1), n⁻¹H_0'Z = Q_hz + o(1) and n⁻¹H_0'H_0 = Q_h + o(1), where Q_z and Q_h are positive definite matrices and Q_hz is a full-rank matrix.
(a5) n⁻¹Σ_{i=1}^n h_i = θ_h + o(1), where θ_h is a finite vector.
(a6) The probability density function f and its derivative are both bounded and bounded away from 0 in a neighborhood of F⁻¹(α) for α ∈ (0, 1).
(a7) n^{1/2}(β̂_0 - β) = O_p(1).

Proof of Theorem 3.1. From condition (a2) and (A.10) of Ruppert and Carroll (1980), HH_0'A_nZβ = β + o_p(n^{-1/2}). Inserting (1.1) into Eq. (2.7), we have

n^{1/2}(β̂_ltm - β) = n^{1/2}HH_0'Ae,   (A.1)

where we replace (1 - ρ²)^{1/2}ε_1 by e_1, which has the same asymptotic representation. We now develop a representation of n^{-1/2}H_0'Ae. Let U_j(α, T_n) = n^{-1/2}Σ_{i=1}^n h_ij e_i I(e_i < F⁻¹(α) + n^{-1/2}z_i'T_n) and U(α, T_n) = (U_1(α, T_n), ..., U_p(α, T_n))'. Also, let T_n(α) = n^{1/2}[β̂(α) - β(α)]. Then n^{-1/2}H_0'A_ne = U(α_2, T_n(α_2)) - U(α_1, T_n(α_1)). By conditions (a3) and (a6) and from Jureckova and Sen's (1987) extension of Billingsley's theorem (see also Koul, 1992), we have

U_j(α, T_n) - U_j(α, 0) - n⁻¹F⁻¹(α)f(F⁻¹(α)) Σ_{i=1}^n h_ij z_i'T_n = o_p(1)   (A.2)

for j = 1, ..., p and T_n = O_p(1). From Lai et al. (2003), we know that

n^{1/2}(β̂(α) - β(α)) = Q_z⁻¹f⁻¹(F⁻¹(α))n^{-1/2} Σ_{i=1}^n z_i(α - I(e_i ≤ F⁻¹(α))) + o_p(1).   (A.3)

By condition (a4) and from (A.2) and (A.3),

n^{-1/2}H_0'A_ne = n^{-1/2} Σ_{i=1}^n h_i e_i I(F⁻¹(α_1) ≤ e_i ≤ F⁻¹(α_2)) + F⁻¹(α_2)Q_hz Q_z⁻¹n^{-1/2} Σ_{i=1}^n z_i(α_2 - I(e_i ≤ F⁻¹(α_2))) - F⁻¹(α_1)Q_hz Q_z⁻¹n^{-1/2} Σ_{i=1}^n z_i(α_1 - I(e_i ≤ F⁻¹(α_1))).   (A.4)

Then the theorem follows from (A.1) and (A.4). The proof of Theorem 3.5 is analogous and is hence skipped. □
Proof of Lemma 3.3. Write $\mathrm{plim}(B_n) = B$ if $B_n$ converges to $B$ in probability. Let

$C = HH_0' - (Z'A_nZ)^{-1}Z'$.

With this, $\mathrm{plim}(CZ) = \mathrm{plim}(HH_0'Z) - \mathrm{plim}((Z'A_nZ)^{-1}Z'Z) = 0$. Then, since $\mathrm{plim}(CC')$ is nonnegative definite,

$\tilde{H}Q_h\tilde{H}' = \mathrm{plim}(HH_0'(HH_0')') = \mathrm{plim}((C + (Z'A_nZ)^{-1}Z')(C + (Z'A_nZ)^{-1}Z')') = \mathrm{plim}(CC') + \mathrm{plim}((Z'A_nZ)^{-1}Z'Z(Z'A_nZ)^{-1}) = \mathrm{plim}(CC') + (\alpha_2-\alpha_1)^{-2}\,\mathrm{plim}(n^{-1}Z'Z)^{-1} \ge (\alpha_2-\alpha_1)^{-2}Q_z^{-1}$. □

Proof of Theorem 4.2. We only briefly sketch a proof of the theorem; for the details, see Chen et al. (2001) and Lai et al. (2003). With the fact that $n^{1/2}(\hat{\rho} - \rho) = O_p(1)$ and conditions (a1), (a3) and (a6), we may see that
$n^{1/2}(\hat{\beta}_{ltm} - \beta) = n^{-1/2}HH_0'\hat{A}\varepsilon + o_p(1)$.  (A.5)
By letting

$M(t_1, t_2, \alpha) = n^{-1/2}\sum_{i=1}^{n} h_i\varepsilon_i I(\varepsilon_i - n^{-1/2}t_1\varepsilon_{i-1} \le F^{-1}(\alpha) + n^{-1/2}(z_i + n^{-1/2}t_1x_{i-1})'t_2 + n^{-1/2}t_1F^{-1}(\alpha))$,

we see that

$n^{-1/2}\hat{Z}'A_n\varepsilon = M(T_1, T_2(\alpha_2), \alpha_2) - M(T_1, T_2(\alpha_1), \alpha_1)$,  (A.6)

with $T_1 = n^{1/2}(\hat{\rho} - \rho)$ and $T_2(\alpha) = n^{1/2}(\hat{\beta}(\alpha) - \beta(\alpha))$; note that the scalar argument $t_1$ of $M$ carries the perturbation of $\rho$ and the vector argument $t_2$ the perturbation of $\beta$. However, using the same methods as in the proof of Lemma 3.5 and by (a3) and (a6), we can see that

$M(T_1, T_2, \alpha) - M(0, 0, \alpha) = F^{-1}(\alpha)f(F^{-1}(\alpha))\,n^{-1}\sum_{i=1}^{n} h_i(z_i'T_2 + T_1F^{-1}(\alpha)) + o_p(1)$  (A.7)

for any sequences $T_1 = O_p(1)$ and $T_2 = O_p(1)$. Then, from (A.6) and (A.7), $n^{-1/2}H_0'\hat{A}_n\varepsilon$ has the same representation as (A.4), and condition (a1) together with (A.5) implies the theorem. □
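Theorem 4.2 concerns the feasible (pseudo) generalized estimator, in which the unknown AR(1) parameter $\rho$ is replaced by an estimate satisfying $n^{1/2}(\hat{\rho} - \rho) = O_p(1)$. The following is a minimal sketch of that feasible step under our own naming (not the paper's exact procedure): $\hat{\rho}$ is the lag-one autocorrelation of preliminary OLS residuals, and the data are quasi-differenced in Cochrane-Orcutt style with a Prais-Winsten scaling of the first observation.

```python
import numpy as np

def feasible_transform(y, X):
    """Feasible generalized step for the AR(1)-error model (sketch):
    estimate rho by the lag-one autocorrelation of preliminary OLS
    residuals, then quasi-difference y and X, scaling the first
    observation by sqrt(1 - rho_hat**2)."""
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta0
    rho_hat = (r[1:] @ r[:-1]) / (r[:-1] @ r[:-1])
    c = np.sqrt(1.0 - rho_hat**2)
    y_t = np.concatenate([[c * y[0]], y[1:] - rho_hat * y[:-1]])
    X_t = np.vstack([c * X[0], X[1:] - rho_hat * X[:-1]])
    return y_t, X_t, rho_hat

# demonstration: regression with AR(1) errors, rho = 0.6
rng = np.random.default_rng(1)
n, rho = 800, 0.6
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = rng.normal(size=n)
e = np.empty(n)
e[0] = u[0] / np.sqrt(1.0 - rho**2)        # stationary start
for t in range(1, n):
    e[t] = rho * e[t - 1] + u[t]
y = X @ np.array([1.0, 2.0]) + e
y_t, X_t, rho_hat = feasible_transform(y, X)
beta_fg, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)
print(rho_hat, beta_fg)
```

Any root-n-consistent $\hat{\rho}$, such as this one, satisfies the condition $n^{1/2}(\hat{\rho} - \rho) = O_p(1)$ used in the proof; a robust fit of the transformed data then replaces the least squares step above.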
Proof of Theorem 4.5. The proof can be derived similarly to that of Theorem 3.1, applying a representation of $\hat{Z}(\alpha)$ that can be found in Ruppert and Carroll (1980); it is therefore omitted. □
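As a finite-sample illustration (ours, not part of the proofs) of the limit invoked in the proof of Lemma 3.3: when the errors are independent of the design and $A_n$ retains the observations whose errors lie between the $\alpha_1$ and $\alpha_2$ error quantiles, $n^{-1}Z'A_nZ$ converges to $(\alpha_2-\alpha_1)Q_z$, which is what produces the $(\alpha_2-\alpha_1)^{-2}Q_z^{-1}$ factor in the lemma.

```python
import numpy as np

# Check numerically that n^{-1} Z'A_nZ is close to (a2 - a1) * n^{-1} Z'Z
# when the trimming indicator depends only on errors independent of Z.
rng = np.random.default_rng(2)
n, a1, a2 = 50_000, 0.1, 0.9
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
e = rng.normal(size=n)                                  # errors independent of Z
keep = (e >= np.quantile(e, a1)) & (e <= np.quantile(e, a2))
lhs = Z[keep].T @ Z[keep] / n          # n^{-1} Z'A_nZ
rhs = (a2 - a1) * (Z.T @ Z) / n        # (a2 - a1) * n^{-1} Z'Z
print(np.max(np.abs(lhs - rhs)))       # small for large n
```

Inverting this limit twice gives $(Z'A_nZ)^{-1}Z'Z(Z'A_nZ)^{-1} \approx (\alpha_2-\alpha_1)^{-2}(Z'Z)^{-1}$, matching the bound in the lemma.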
References
Aitken, A.C., 1935. On least squares and linear combination of observations. Proceedings of the Royal Society of Edinburgh 55, 42–48.
Chen, L.-A., 1997. An efficient class of weighted trimmed means for linear regression models. Statistica Sinica 7, 669–686.
Chen, L.-A., Chiang, Y.C., 1996. Symmetric type quantile and trimmed means for location and linear regression model. Journal of Nonparametric Statistics 7, 171–185.
Chen, L.-A., Welsh, A.H., Chan, W., 2001. Estimators for the linear regression model based on Winsorized observations. Statistica Sinica 11, 147–172.
Chiang, Y.-C., Chen, L.-A., Yang, H.-C.P., 2006. Symmetric quantiles and their applications. Journal of Applied Statistics 33, 807–817.
Cochrane, D., Orcutt, G.H., 1949. Application of least squares regressions to relationships containing autocorrelated error terms. Journal of the American Statistical Association 44, 32–61.
De Jongh, P.J., De Wet, T., 1985. Trimmed mean and bounded influence estimators for the parameters of the AR(1) process. Communications in Statistics—Theory and Methods 14, 1357–1361.
De Jongh, P.J., De Wet, T., Welsh, A.H., 1988. Mallows-type bounded-influence-regression trimmed means. Journal of the American Statistical Association 83, 805–810.
Fomby, T.B., Hill, R.C., Johnson, S.R., 1984. Advanced Econometric Methods. Springer-Verlag, New York.
Giltinan, D.M., Carroll, R.J., Ruppert, D., 1986. Some new estimation methods for weighted regression when there are possible outliers. Technometrics 28, 219–230.
Jaeckel, L.A., 1971. Some flexible estimates of location. Annals of Mathematical Statistics 42, 1540–1552.
Jureckova, J., Sen, P.K., 1984. On adaptive scale-equivariant M-estimators in linear models. Statistics and Decisions 1, 31–46.
Jureckova, J., Sen, P.K., 1987. An extension of Billingsley's theorem to higher dimension M-processes. Kybernetika 23, 382–387.
Koenker, R.W., Bassett, G.W., 1978. Regression quantiles. Econometrica 46, 33–50.
Koenker, R., Portnoy, S., 1987. L-estimation for linear models. Journal of the American Statistical Association 82, 851–857.
Koul, H.L., 1992. Weighted Empiricals and Linear Models. IMS Lecture Notes 21.
Krasker, W.S., 1985. Two stage bounded-influence estimators for simultaneous equations models. Journal of Business and Economic Statistics 4, 432–444.
Krasker, W.S., Welsch, R.E., 1982. Efficient bounded influence regression estimation. Journal of the American Statistical Association 77, 595–604.
Lai, Y.-H., Thompson, P., Chen, L.-A., 2003. Generalized and pseudo generalized trimmed means for the linear regression with AR(1) error model. Statistics and Probability Letters 67, 203–211.
Mendenhall, W., Sincich, T., 1993. A Second Course in Business Statistics: Regression Analysis. Macmillan Publishing Company, New York.
Ruppert, D., Carroll, R.J., 1980. Trimmed least squares estimation in the linear model. Journal of the American Statistical Association 75, 828–838.
Serfling, R.J., 1980. Approximation Theorems of Mathematical Statistics. Wiley, New York.