
Statistics & Probability Letters 67 (2004) 203–211

Generalized and pseudo-generalized trimmed means for the linear regression with AR(1) error model

Yi-Hsuan Lai^a, Peter Thompson^b, Lin-An Chen^{a,*}

^a Institute of Statistics, National Chiao Tung University, 1001 Ta Hsueh Rd, Hsinchu, Taiwan
^b Mathematics Department, Wabash College, Crawfordsville, IN 47933, USA

Received April 2003

Abstract

We propose generalized and pseudo-generalized trimmed means for the linear regression with AR(1) errors model. These play the role of robust-type generalized and pseudo-generalized estimators for this regression model. Their asymptotic distributions are developed.

© 2003 Elsevier B.V. All rights reserved.

Keywords: Generalized estimation; Linear regression; Trimmed mean

1. Introduction

For some regression models, such as linear regression with AR(1) errors or the seemingly unrelated regression model, the generalized least-squares estimator (GLSE) and the pseudo-generalized least-squares estimator (PGLSE) have the advantage that their variances (or asymptotic variances) are smaller than that of the least-squares estimator (LSE). However, the GLSE and the PGLSE are sensitive to departures from normality and to the presence of outliers. Hence, extending these concepts to robust estimation is an interesting topic in regression analysis. The concept of developing robust-type generalized estimators in regression analysis is not new. Koenker and Portnoy (1990) introduced this interesting idea and developed generalized M-estimators for the estimation of regression parameters of the multivariate regression model. Although considering only generalized estimation, their approach initiated interest in robust-type generalized and pseudo-generalized estimators for estimation of regression parameters.

∗ Corresponding author.
E-mail address: lachen@stat.nctu.edu.tw (L.-A. Chen).

0167-7152/$ - see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/j.spl.2003.08.003

Rather than multivariate regression, we consider the linear regression with AR(1) errors model

$$y_i = x_i'\beta + \epsilon_i, \quad i = 1,\dots,n,$$
$$\epsilon_i = \rho\,\epsilon_{i-1} + e_i, \qquad (1.1)$$

where $|\rho| < 1$, the $e_i$, $i = 1,\dots,n$, are i.i.d. variables with mean zero and variance $\sigma^2$, and $x_i$ is a known design $p$-vector with value 1 in its first element. From the regression theory on the estimation of $\beta$, it is known that, when $\rho$ is known, the GLSE and, when $\rho$ is unknown, the PGLSE have (or asymptotically have) the same covariance matrix, which is smaller than that of the LSE. To see the sensitivity of the GLSE and the PGLSE, let $X = (x_1,\dots,x_n)'$ and $\sigma^2\Sigma = \mathrm{Cov}(\epsilon)$ with $\epsilon = (\epsilon_1,\dots,\epsilon_n)'$; both the GLSE and the PGLSE have an (asymptotic) covariance matrix of the form

$$\sigma^2(X'\Sigma^{-1}X)^{-1}. \qquad (1.2)$$

The sensitivity is clear from the fact that $\sigma^2$ could be arbitrarily large when $e_i$ has a heavy-tailed distribution.
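To make the setting concrete, here is a minimal simulation sketch of model (1.1); the sample size, $\rho$, $\beta$ and the contaminated normal error used to mimic a heavy tail are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Sketch: simulate y_i = x_i' beta + eps_i with AR(1) errors, model (1.1).
# n, rho, beta and the contamination scheme are illustrative assumptions.
rng = np.random.default_rng(0)
n, rho, beta = 200, 0.6, np.array([1.0, 2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # first element of x_i is 1

def draw_e(contaminated):
    # i.i.d. e_i with mean zero; contamination inflates sigma^2 = Var(e)
    s = np.where(rng.random(n) < 0.1, 5.0, 1.0) if contaminated else 1.0
    return s * rng.normal(size=n)

e = draw_e(contaminated=True)
eps = np.empty(n)
eps[0] = e[0] / np.sqrt(1 - rho**2)   # stationary scaling for the start-up value
for i in range(1, n):
    eps[i] = rho * eps[i - 1] + e[i]  # eps_i = rho * eps_{i-1} + e_i
y = X @ beta + eps
```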

The fact that $\sigma^2$ is sensitive to the error distribution motivates us to consider robust estimators that have an (asymptotic) covariance matrix of the form

$$\gamma(X'\Sigma^{-1}X)^{-1}, \qquad (1.3)$$

where robustness means that $\gamma$ is insensitive to heavy-tailed distributions. Based on the regression quantiles of Koenker and Bassett (1978), we will introduce the generalized trimmed mean (GTM) and the pseudo-generalized trimmed mean (PGTM) to play the role of robust-type generalized and pseudo-generalized estimators for the linear regression with AR(1) errors model.

We introduce the concepts of the GTM and PGTM in Section 2 and establish their large sample theory in Section 3. Finally, the proofs of the theorems are given in the Appendix.

2. Generalized and pseudo-generalized trimmed means

For the linear regression with AR(1) errors model (1.1), to obtain a GTM we need to specify the quantile that determines the observation trimming and to make a transformation of the linear model to obtain generalized estimators. For the given $i$th dependent variable of model (1.1), assuming that $i \ge 2$, one way to derive a generalized estimator is to consider the transformation of Cochrane and Orcutt (C–O, 1949), $y_i = \rho y_{i-1} + (x_i - \rho x_{i-1})'\beta + e_i$. For the error variable $e$, we assume that it has distribution function $F$ with probability density function $f$. With the transformation for generalized estimation, a quantile could be defined through the variable $e$ or through a linear conditional quantile of $y_{i-1}$ and $y_i$. By the fact that $x_i$ is a vector with first element 1, the following two events, determined by two quantiles, are equivalent:

$$e_i \le F^{-1}(\alpha) \qquad (2.1)$$

and

$$(-\rho,\,1)\begin{pmatrix} y_{i-1} \\ y_i \end{pmatrix} \le (-\rho,\,1)\begin{pmatrix} x_{i-1}' \\ x_i' \end{pmatrix}\beta(\alpha), \qquad (2.2)$$

with

$$\beta(\alpha) = \beta + \begin{pmatrix} \frac{1}{1-\rho}F^{-1}(\alpha) \\ 0_{p-1} \end{pmatrix}.$$

The event in inequality (2.1) specifies the quantile of the error variable $e$ and, through inequality (2.2), it specifies the conditional quantile of the linear function $(-\rho,\,1)(y_{i-1},\,y_i)'$. Here $\beta(\alpha)$ is called the population regression quantile by Koenker and Bassett (1978).
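To see why (2.1) and (2.2) describe the same event, one can check directly (a sketch of the algebra, using only the C–O transformation and the fact that the first element of $x_i - \rho x_{i-1}$ is $1-\rho$):

$$(-\rho,\,1)\begin{pmatrix} y_{i-1} \\ y_i \end{pmatrix} = y_i - \rho y_{i-1} = (x_i - \rho x_{i-1})'\beta + e_i, \qquad (x_i - \rho x_{i-1})'\beta(\alpha) = (x_i - \rho x_{i-1})'\beta + (1-\rho)\,\frac{1}{1-\rho}F^{-1}(\alpha),$$

so (2.2) reduces to $e_i \le F^{-1}(\alpha)$, which is exactly (2.1).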

With the specification of quantiles and the transformation, we may define the generalized trimmed means. To do so, we consider the C–O transformation on the matrix form of the linear regression with AR(1) errors model (1.1), which is

$$y = X\beta + \epsilon,$$

where $\mathrm{Cov}(\epsilon) = \sigma^2\Sigma$ with

$$\Sigma = \frac{1}{1-\rho^2}\begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1 \end{pmatrix}.$$

Define the half matrix of $\Sigma^{-1}$ as

$$\Sigma^{-1/2} = \begin{pmatrix} (1-\rho^2)^{1/2} & 0 & 0 & \cdots & 0 & 0 \\ -\rho & 1 & 0 & \cdots & 0 & 0 \\ 0 & -\rho & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -\rho & 1 \end{pmatrix}.$$

The C–O transformation is

$$u = Z\beta + ((1-\rho^2)^{1/2}\epsilon_1,\, e_2,\, e_3,\,\dots,\, e_n)', \qquad (2.3)$$

where $u = \Sigma^{-1/2}y$ and $Z = (z_1,\dots,z_n)' = \Sigma^{-1/2}X$. It is known that the GLSE is simply the LSE of $\beta$ for model (2.3).
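As a concrete rendering of the transformation, the following sketch builds $\Sigma^{-1/2}$ as displayed above and computes the GLSE as the LSE of model (2.3); it assumes $\rho$ is known and is an illustration, not code from the paper.

```python
import numpy as np

def half_matrix(rho, n):
    # Sigma^{-1/2} as displayed above: a Prais-Winsten first row, then
    # rows that difference out the AR(1) dependence.
    S = np.eye(n)
    S[0, 0] = np.sqrt(1 - rho**2)
    for i in range(1, n):
        S[i, i - 1] = -rho
    return S

def glse(X, y, rho):
    S = half_matrix(rho, len(y))
    Z, u = S @ X, S @ y                          # transformed model (2.3)
    return np.linalg.lstsq(Z, u, rcond=None)[0]  # GLSE = LSE of (2.3)
```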

For $0 < \alpha < 1$, the $\alpha$th (sample) regression quantile of Koenker and Bassett (1978) for the linear regression with AR(1) errors model is defined as

$$\hat\beta_G(\alpha) = \arg\min_{b\in R^p}\sum_{i=1}^n (u_i - z_i'b)(\alpha - I(u_i \le z_i'b)),$$

where $u_i$ and $z_i'$ are the $i$th rows of $u$ and $Z$, respectively. We are now ready to define a generalized trimmed mean based on regression quantiles.

Definition 2.1. Define the trimming matrix as $A_n = \mathrm{diag}\{a_i = I(z_i'\hat\beta_G(\alpha_1) \le u_i \le z_i'\hat\beta_G(\alpha_2)):\ i = 1,\dots,n\}$. The Koenker and Bassett-type GTM is defined as

$$L_G(\alpha_1,\alpha_2) = (Z'A_nZ)^{-1}Z'A_nu. \qquad (2.4)$$
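A minimal sketch of Definition 2.1, assuming $\rho$ known and using statsmodels' QuantReg as one off-the-shelf way to compute the regression quantiles $\hat\beta_G(\alpha_1)$ and $\hat\beta_G(\alpha_2)$ (the paper does not prescribe a particular algorithm):

```python
import numpy as np
import statsmodels.api as sm

def gtm(X, y, rho, a1=0.1, a2=0.9):
    # Generalized trimmed mean of Definition 2.1 (a sketch).
    S = half_matrix(rho, len(y))        # Sigma^{-1/2} from the previous sketch
    Z, u = S @ X, S @ y
    b1 = sm.QuantReg(u, Z).fit(q=a1).params   # regression quantile at alpha_1
    b2 = sm.QuantReg(u, Z).fit(q=a2).params   # regression quantile at alpha_2
    keep = (Z @ b1 <= u) & (u <= Z @ b2)      # diagonal entries a_i of A_n
    Zk, uk = Z[keep], u[keep]
    return np.linalg.solve(Zk.T @ Zk, Zk.T @ uk)  # (Z'A_nZ)^{-1} Z'A_n u
```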

After the development of the GTM, the next interesting problem is whether, when the parameter $\rho$ is unknown, the trimmed mean of (2.4) with $\rho$ replaced by a consistent estimator $\hat\rho$ will have the same asymptotic behavior as displayed by $L_G(\alpha_1,\alpha_2)$. If yes, the theory of generalized least-squares estimation then carries over to the theory of robust estimation in this specific linear regression model. Let $\hat\Sigma$ be the matrix $\Sigma$ with $\rho$ replaced by its consistent estimator $\hat\rho$. Define the matrices $\hat u = \hat\Sigma^{-1/2}y$, $\hat Z = \hat\Sigma^{-1/2}X$ and $\hat e = \hat\Sigma^{-1/2}\epsilon$. Let the regression quantile when the parameter $\rho$ is unknown be defined as

$$\hat\beta_{PG}(\alpha) = \arg\min_{b\in R^p}\sum_{i=1}^n (\hat u_i - \hat z_i'b)(\alpha - I(\hat u_i \le \hat z_i'b)),$$

where $\hat u_i$ and $\hat z_i'$ are the $i$th rows of $\hat u$ and $\hat Z$, respectively.

Definition 2.2. Define the trimming matrix as $\hat A_n = \mathrm{diag}\{a_i = I(\hat z_i'\hat\beta_{PG}(\alpha_1) \le \hat u_i \le \hat z_i'\hat\beta_{PG}(\alpha_2)):\ i = 1,\dots,n\}$. The Koenker and Bassett-type PGTM is defined as

$$L_{PG}(\alpha_1,\alpha_2) = (\hat Z'\hat A_n\hat Z)^{-1}\hat Z'\hat A_n\hat u.$$
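For the PGTM one only needs an estimator $\hat\rho$ with $n^{1/2}(\hat\rho - \rho) = O_p(1)$ (assumption (a5) in the Appendix). The lag-one autocorrelation of the LSE residuals used below is one standard choice, adopted here purely for illustration:

```python
import numpy as np

def pgtm(X, y, a1=0.1, a2=0.9):
    # Pseudo-generalized trimmed mean of Definition 2.2 (a sketch):
    # estimate rho from the LSE residuals, then reuse the GTM recipe.
    b_ls = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ b_ls
    rho_hat = np.sum(r[1:] * r[:-1]) / np.sum(r**2)  # one consistent choice
    return gtm(X, y, rho_hat, a1, a2)
```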

With the C–O transformation, the half matrix $\Sigma^{-1/2}$ has rows with only a finite number (not depending on $n$) of elements that depend on the unknown parameter $\rho$. This trick, traditionally used in the econometrics literature for regression with AR(1) errors (see, for example, Fomby et al., 1984, pp. 210–211), makes the study of the asymptotic theory for $\hat\beta_{PG}(\alpha)$ and the PGTM $L_{PG}(\alpha_1,\alpha_2)$ similar to that for the classical regression quantile and trimmed mean in linear regression. Large sample representations of the GTM and the PGTM, and their role as generalized and pseudo-generalized robust estimators, are introduced in the next section.

3. Asymptotic theory of the GTM and PGTM

We state a set of assumptions (a1)–(a5) related to the design matrix $X$ and the distribution of the error variable $e$ in the Appendix; they are assumed to hold throughout the paper. Denote the distribution function of $(1-\rho)^{-1}e$ by $F_*$. In the following, we give a Bahadur representation for the generalized regression quantile, which follows in a straightforward way from Theorem 3 of Ruppert and Carroll (1980).

Lemma 3.1. The generalized regression quantile has the representation

$$n^{1/2}(\hat\beta_G(\alpha) - \beta(\alpha)) = Q^{-1}f^{-1}(F_*^{-1}(\alpha))\,n^{-1/2}\sum_{i=1}^n z_i(\alpha - I(e_i \le F_*^{-1}(\alpha))) + o_p(1),$$

where $Q = \lim_{n\to\infty} n^{-1}X'\Sigma^{-1}X$ and $F_*^{-1}(\alpha) = (1-\rho)^{-1}F^{-1}(\alpha)$. Furthermore, $n^{1/2}(\hat\beta_G(\alpha) - \beta(\alpha))$ has a normal asymptotic distribution with mean zero vector and covariance matrix $\alpha(1-\alpha)f^{-2}(F_*^{-1}(\alpha))Q^{-1}$.
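The covariance matrix follows from the representation by a one-step variance computation; as a sketch, write $q_\alpha = F_*^{-1}(\alpha)$ and note that the centering of the representation requires the indicator event to have probability $\alpha$, so that

$$\mathrm{Var}\bigl(\alpha - I(e_i \le q_\alpha)\bigr) = \alpha(1-\alpha), \qquad n^{-1}\sum_{i=1}^n z_iz_i' = n^{-1}X'\Sigma^{-1}X \to Q,$$

and $n^{1/2}(\hat\beta_G(\alpha) - \beta(\alpha))$ has limiting covariance $f^{-2}(q_\alpha)\,Q^{-1}\,\alpha(1-\alpha)Q\,Q^{-1} = \alpha(1-\alpha)f^{-2}(q_\alpha)Q^{-1}$.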

In accordance with (1.3), the quantile estimator $\hat\beta_G(\alpha)$ has an asymptotic covariance matrix of the form $\gamma(X'\Sigma^{-1}X)^{-1}$ with $\gamma = \alpha(1-\alpha)f^{-2}(F_*^{-1}(\alpha))$; it is thus asymptotically a generalized estimator of $\beta(\alpha)$, the population regression quantile for the linear regression with AR(1) error model. The representation of $L_G(\alpha_1,\alpha_2)$ is also a direct result of Theorem 4 of Ruppert and Carroll (1980).

Theorem 3.2. The GTM has the following representation:

$$n^{1/2}\bigl(L_G(\alpha_1,\alpha_2) - (\beta + \delta Q^{-1}\lambda_x)\bigr) = \frac{1}{\alpha_2-\alpha_1}\,Q^{-1}n^{-1/2}\sum_{i=1}^n z_i(\psi(e_i) - E(\psi(e))) + o_p(1),$$

where $\delta = \frac{1}{\alpha_2-\alpha_1}\int_{F_*^{-1}(\alpha_1)}^{F_*^{-1}(\alpha_2)} e\,f(e)\,de$, $\lambda_x = \lim_{n\to\infty} n^{-1}\sum_{i=1}^n x_i$ and

$$\psi(e) = \begin{cases} F_*^{-1}(\alpha_1) & \text{if } e < F_*^{-1}(\alpha_1), \\ e & \text{if } F_*^{-1}(\alpha_1) \le e \le F_*^{-1}(\alpha_2), \\ F_*^{-1}(\alpha_2) & \text{if } e > F_*^{-1}(\alpha_2). \end{cases}$$

The above theorem shows that the GTM is a generalization of the trimmed mean from the linear regression model with i.i.d. errors to that with AR(1) errors.

Corollary 3.3. The normalized GTM $n^{1/2}(L_G(\alpha_1,\alpha_2) - (\beta + \delta(1-\rho)\lambda_x))$ has an asymptotic normal distribution with zero mean vector and asymptotic covariance matrix $\sigma^2(\alpha_1,\alpha_2)Q^{-1}$, where

$$\sigma^2(\alpha_1,\alpha_2) = (\alpha_2-\alpha_1)^{-2}\Bigl[\int_{F_*^{-1}(\alpha_1)}^{F_*^{-1}(\alpha_2)} (e-\delta)^2\,dF(e) + \alpha_1(F_*^{-1}(\alpha_1)-\delta)^2 + (1-\alpha_2)(F_*^{-1}(\alpha_2)-\delta)^2 - (\alpha_1F_*^{-1}(\alpha_1) + (1-\alpha_2)F_*^{-1}(\alpha_2))^2\Bigr].$$

The asymptotic covariance matrix of $L_G(\alpha_1,\alpha_2)$ is also of the form $\gamma(X'\Sigma^{-1}X)^{-1}$, with $\gamma = \sigma^2(\alpha_1,\alpha_2)$, which is the asymptotic variance of the trimmed mean for the location model. If we center the columns of $X$ so that $\lambda_x$ has all but the first element equal to 0, then the asymptotic bias affects the intercept alone and not the slope.

In the case where $F$ is symmetric at 0, the asymptotic distribution of the GTM can be simplified.

Corollary 3.4. If $F$ is symmetric at zero and we let $\alpha = \alpha_1 = 1 - \alpha_2$, then $n^{1/2}(L_G(\alpha, 1-\alpha) - \beta)$ has an asymptotic normal distribution with zero mean vector and asymptotic covariance matrix $\sigma^2(\alpha, 1-\alpha)Q^{-1}$, where

$$\sigma^2(\alpha, 1-\alpha) = (1-2\alpha)^{-2}\Bigl[\int_{F_*^{-1}(\alpha)}^{F_*^{-1}(1-\alpha)} e^2\,dF(e) + 2\alpha(F_*^{-1}(\alpha))^2\Bigr].$$
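As a rough numerical gauge of this constant (not a computation from the paper), one can Monte Carlo the displayed formula. The sketch below takes $\rho = 0$, so that $F_* = F$, uses a 10% contaminated normal error, and compares $\sigma^2(\alpha, 1-\alpha)$ with $\sigma^2 = \mathrm{Var}(e)$, the constant attached to the GLSE:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, m = 0.1, 200_000
s = np.where(rng.random(m) < 0.1, 5.0, 1.0)  # 10% contaminated normal errors
e = s * rng.normal(size=m)

q = np.quantile(e, alpha)            # F^{-1}(alpha); by symmetry -q ~ F^{-1}(1-alpha)
trunc = np.mean(e**2 * ((e >= q) & (e <= -q)))          # integral term
sigma2_trim = (trunc + 2 * alpha * q**2) / (1 - 2 * alpha)**2
print(sigma2_trim, e.var())          # trimmed-mean constant vs Var(e) for the GLSE
```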


How efficient is the GTM compared with the GLSE? Ruppert and Carroll (1980) computed the values of the term $\sigma^2(\alpha, 1-\alpha)$ for $e$ following several contaminated normal distributions. In comparison with $\sigma^2$, the variance of $e$, the GTM is strongly more efficient than the GLSE when the contamination variance is large. Along with the results in Huber (1981) and Welsh (1987), Huber's M-estimator and Welsh's trimmed mean defined on model (2.3) are expected to have the same asymptotic distribution as in Corollary 3.3. These then serve as other types of generalized robust estimators. In general, the parameter $\rho$ is unknown. An interesting question is then whether the PGTM has the same representation as that of the GTM. Before we state this result, we need to give a representation of the regression quantile $\hat\beta_{PG}(\alpha)$.

Lemma 3.5. The regression quantile $\hat\beta_{PG}(\alpha)$ has the representation

$$n^{1/2}(\hat\beta_{PG}(\alpha) - \beta(\alpha)) = Q^{-1}f^{-1}(F_*^{-1}(\alpha))\Bigl[n^{-1/2}\sum_{i=1}^n z_i(\alpha - I(e_i \le F_*^{-1}(\alpha))) + f(F_*^{-1}(\alpha))\lambda_z n^{1/2}(\hat\rho - \rho)F_*^{-1}(\alpha)\Bigr] + o_p(1),$$

where $\lambda_z = \lim_{n\to\infty} n^{-1}\sum_{i=1}^n z_i$.

The asymptotic representation of $\hat\beta_{PG}(\alpha)$ is not the same as that of $\hat\beta_G(\alpha)$. In fact, it relies on the asymptotic representation of $\hat\rho$. In the large sample expansion for the PGTM, we see that the representation for the part $\hat Z'\hat A_n\hat u$ involves $n^{1/2}(\hat\rho - \rho)$ and $n^{1/2}(\hat\beta_{PG}(\alpha) - \beta(\alpha))$ with $\alpha = \alpha_1$ and $\alpha_2$. Since the representation of $\hat\beta_{PG}(\alpha)$ also involves $n^{1/2}(\hat\rho - \rho)$, the terms with $n^{1/2}(\hat\rho - \rho)$ cancel out, so the PGTM has a representation free of $\hat\rho$ in its formulation.

Theorem 3.6. The PGTM has the same representation as that expressed for the GTM in Theorem 3.2.

From Theorem 3.6, the PGTM indeed plays the role of a pseudo-generalized estimator for estimating the regression parameter $\beta$.

Acknowledgements

We are grateful to the editor and one referee for comments which improved the presentation of this paper. This research was partially supported by the National Science Council of Taiwan, Grant No. NSC 91-2118-M-009-005.

Appendix

The following conditions, concerning the design matrix $X$ and the distribution of the error variable $e$, are similar to those of Ruppert and Carroll (1980) and Koenker and Portnoy (1990):

(a1) $n^{-1}\sum_{i=1}^n x_{ij}^4 = O(1)$ for all $j$.
(a2) $n^{-1}X'\Sigma^{-1}X = Q + o(1)$, where $Q$ is a positive definite matrix.
(a3) $n^{-1}\sum_{i=1}^n x_i = \lambda_x + o(1)$, where $\lambda_x$ is a finite vector with first element value 1.
(a4) The probability density function $f$ and its derivative are both bounded and bounded away from 0 in a neighborhood of $F^{-1}(\alpha)$ for $\alpha \in (0,1)$.
(a5) $n^{1/2}(\hat\rho - \rho) = O_p(1)$.

Proof of Lemma 3.5. Let

$$M(t_1,t_2) = n^{-1/2}\sum_{i=1}^n z_i\{\alpha - I(e_i - n^{-1/2}t_1\epsilon_{i-1} \le (z_i - n^{-1/2}t_1x_{i-1})'(n^{-1/2}t_2 + F_*^{-1}(\alpha)))\}.$$

We want to show that

$$\sup_{\|(t_1,t_2')\|\le k}\Bigl|M(t_1,t_2) - M(0,0) - F_*^{-1}(\alpha)f(F_*^{-1}(\alpha))\,n^{-1/2}\sum_{i=1}^n z_i(z_i't_2 - t_1F_*^{-1}(\alpha))\Bigr| = o_p(1). \qquad (4.1)$$

Letting, for $k > 0$, $S_n(t_1,t_2) = M(t_1,t_2) - M(0,0)$, we prove (4.1) in two steps. In the first step, we show that

$$\sup_{\|(t_1,t_2')\|\le k}|S_n(t_1,t_2) - ES_n(t_1,t_2)| = o_p(1) \qquad (4.2)$$

based on Lemma 3.2 in Bai and He (1999).

We now prove (4.2) by checking the three conditions (L1), (L2) and (L3) in the hypothesis of Lemma 3.2 in Bai and He (1999). First, we prove that

$$n^{-1}\sum_{i=1}^n z_i'z_i E\bigl|I(e_i - n^{-1/2}t_1\epsilon_{i-1} \le (z_i - n^{-1/2}t_1x_{i-1})'(n^{-1/2}t_2 + F_*^{-1}(\alpha))) - I(e_i - n^{-1/2}t_1^*\epsilon_{i-1} \le (z_i - n^{-1/2}t_1^*x_{i-1})'(n^{-1/2}t_2^* + F_*^{-1}(\alpha)))\bigr| \le M(|t_1 - t_1^*| + \|t_2 - t_2^*\|) \quad \text{for some } M > 0. \qquad (4.3)$$

Define

$$A = n^{-1}\sum_{i=1}^n z_i'z_i E\bigl|I(e_i - n^{-1/2}t_1\epsilon_{i-1} \le (z_i - n^{-1/2}t_1x_{i-1})'(n^{-1/2}t_2 + F_*^{-1}(\alpha))) - I(e_i - n^{-1/2}t_1^*\epsilon_{i-1} \le (z_i - n^{-1/2}t_1^*x_{i-1})'(n^{-1/2}t_2 + F_*^{-1}(\alpha)))\bigr|$$

and

$$B = n^{-1}\sum_{i=1}^n z_i'z_i E\bigl|I(e_i - n^{-1/2}t_1^*\epsilon_{i-1} \le (z_i - n^{-1/2}t_1^*x_{i-1})'(n^{-1/2}t_2 + F_*^{-1}(\alpha))) - I(e_i - n^{-1/2}t_1^*\epsilon_{i-1} \le (z_i - n^{-1/2}t_1^*x_{i-1})'(n^{-1/2}t_2^* + F_*^{-1}(\alpha)))\bigr|.$$

Represent $A = A_1 + A_2$ as follows:

$$A = n^{-1}\sum_{i=1}^n z_i'z_i E\,I\bigl(e_i - n^{-1/2}t_1\epsilon_{i-1} \le (z_i - n^{-1/2}t_1x_{i-1})'(n^{-1/2}t_2 + F_*^{-1}(\alpha)),\ e_i - n^{-1/2}t_1^*\epsilon_{i-1} > (z_i - n^{-1/2}t_1^*x_{i-1})'(n^{-1/2}t_2 + F_*^{-1}(\alpha))\bigr)$$
$$\quad + n^{-1}\sum_{i=1}^n z_i'z_i E\,I\bigl(e_i - n^{-1/2}t_1\epsilon_{i-1} > (z_i - n^{-1/2}t_1x_{i-1})'(n^{-1/2}t_2 + F_*^{-1}(\alpha)),\ e_i - n^{-1/2}t_1^*\epsilon_{i-1} \le (z_i - n^{-1/2}t_1^*x_{i-1})'(n^{-1/2}t_2 + F_*^{-1}(\alpha))\bigr) = A_1 + A_2.$$

Let $\theta_{2n} = n^{-1/2}t_2 + F_*^{-1}(\alpha)$ and $U_{i-1} = \epsilon_{i-1} - x_{i-1}'\theta_{2n}$. Then

$$A_1 = n^{-1}\sum_{i=1}^n z_i'z_i E\,I(e_i \le z_i'\theta_{2n} - n^{-1/2}t_1U_{i-1},\ e_i > z_i'\theta_{2n} - n^{-1/2}t_1^*U_{i-1}) = n^{-1}\sum_{i=1}^n z_i'z_i E\{f(z_i'\theta_{2n})\,n^{-1/2}|t_1 - t_1^*|\,|U_{i-1}|\} \le Mn^{-1/2}|t_1 - t_1^*|.$$

Similarly, $A_2 \le Mn^{-1/2}|t_1 - t_1^*|$ and $B \le Mn^{-1/2}\|t_2 - t_2^*\|$. Hence (4.3) holds, and so does condition (L1) in the hypothesis of Lemma 3.2 in Bai and He (1999). Condition (L2) is satisfied automatically since the indicator function is bounded.

Next, arguments similar to those used to prove (4.3) show that

$$n^{-1}\sum_{i=1}^n z_i'z_i E\Bigl\{\sup_{|t_1-t_1^*|+\|t_2-t_2^*\|\le d}\bigl|I(e_i - n^{-1/2}t_1\epsilon_{i-1} \le (z_i - n^{-1/2}t_1x_{i-1})'(n^{-1/2}t_2 + F_*^{-1}(\alpha))) - I(e_i - n^{-1/2}t_1^*\epsilon_{i-1} \le (z_i - n^{-1/2}t_1^*x_{i-1})'(n^{-1/2}t_2^* + F_*^{-1}(\alpha)))\bigr|\Bigr\}$$

is bounded by $Mn^{-1/2}d$, which implies that condition (L3) holds. Therefore, from Lemma 3.2 in Bai and He (1999), we obtain

$$\sup_{\|(t_1,t_2')\|\le k}|S_n(t_1,t_2) - ES_n(t_1,t_2)| = o_p(1). \qquad (4.4)$$

On the other hand, following the technique of Chen et al. (2001), we get that

$$\sup_{\|(t_1,t_2')\|\le k}\Bigl|E(S_n(t_1,t_2)) - F_*^{-1}(\alpha)f(F_*^{-1}(\alpha))\,n^{-1/2}\sum_{i=1}^n z_i(z_i't_2 - t_1F_*^{-1}(\alpha))\Bigr| = o_p(1). \qquad (4.5)$$

Combining (4.2) and (4.5), statement (4.1) holds. Using the method of Jurečková (1977, Lemmas 4.2 and 4.1) again, $n^{1/2}(\hat\beta_{PG}(\alpha) - \beta(\alpha)) = O_p(1)$, and the representation of the lemma follows.

Proof of Theorem 3.6. The PGTM can be formulated as

$$n^{1/2}(L_{PG}(\alpha_1,\alpha_2) - \beta) = (n^{-1}\hat Z'\hat A_n\hat Z)^{-1}n^{-1/2}\hat Z'\hat A_n\hat e.$$

Since $n^{1/2}(\hat\rho - \rho) = O_p(1)$, we have $n^{-1/2}\hat Z'\hat A_n\hat e = n^{-1/2}\hat Z'\hat A_ne + o_p(1)$. By letting

$$M(t_1,t_2,\alpha) = n^{-1/2}\sum_{i=1}^n z_ie_iI\bigl(e_i - n^{-1/2}t_1\epsilon_{i-1} \le F_*^{-1}(\alpha) + n^{-1/2}(z_i + n^{-1/2}t_1x_{i-1})'t_2 + n^{-1/2}t_1F_*^{-1}(\alpha)\bigr),$$

we see that

$$n^{-1/2}\hat Z'\hat A_ne = M(T_1, T_2(\alpha_2), \alpha_2) - M(T_1, T_2(\alpha_1), \alpha_1) \qquad (4.6)$$

with $T_1 = n^{1/2}(\hat\rho - \rho)$ and $T_2(\alpha) = n^{1/2}(\hat\beta_{PG}(\alpha) - \beta(\alpha))$.

However, using the same methods as in the proof of Lemma 3.5, we can see that

$$M(T_1,T_2,\alpha) - M(0,0,\alpha) = F_*^{-1}(\alpha)f(F_*^{-1}(\alpha))\,n^{-1/2}\sum_{i=1}^n z_i(z_i'T_2 - T_1F_*^{-1}(\alpha)) + o_p(1) \qquad (4.7)$$

for any sequences $T_1 = O_p(1)$ and $T_2 = O_p(1)$. Then, from Lemma 3.1, (4.6) and (4.7), we have

$$n^{-1/2}\hat Z'\hat A_ne = n^{-1/2}\sum_{i=1}^n z_i\bigl[e_iI(F_*^{-1}(\alpha_1) \le e_i \le F_*^{-1}(\alpha_2)) + F_*^{-1}(\alpha_2)(\alpha_2 - I(e_i \le F_*^{-1}(\alpha_2))) - F_*^{-1}(\alpha_1)(\alpha_1 - I(e_i \le F_*^{-1}(\alpha_1)))\bigr] + o_p(1). \qquad (4.8)$$

Also, a discussion similar to the proof of Lemma 3.5 provides the result

$$n^{-1}\hat Z'\hat A_n\hat Z = (\alpha_2 - \alpha_1)Q + o_p(1). \qquad (4.9)$$

Then (4.8) and (4.9) imply the theorem.

References

Bai, Z.-D., He, X., 1999. Asymptotic distributions of the maximal depth estimators for regression and multivariate location. Ann. Statist. 27, 1616–1637.

Chen, L.-A., Welsh, A.H., Chan, W., 2001. Linear winsorized means for the linear regression model. Statistica Sinica 11, 147–172.

Cochrane, D., Orcutt, G.H., 1949. Application of least squares regressions to relationships containing autocorrelated error terms. J. Amer. Statist. Assoc. 44, 32–61.

Fomby, T.B., Hill, R.C., Johnson, S.R., 1984. Advanced Econometric Methods. Springer, New York.
Huber, P.J., 1981. Robust Statistics. Wiley, New York.

Jurečková, J., 1977. Asymptotic relations of M-estimates and R-estimates in linear regression model. Ann. Statist. 5, 464–472.

Koenker, R.W., Bassett, G.W., 1978. Regression quantiles. Econometrica 46, 33–50.

Koenker, R., Portnoy, S., 1990. M estimation of multivariate regression. J. Amer. Statist. Assoc. 85, 1060–1068.
Ruppert, D., Carroll, R.J., 1980. Trimmed least squares estimation in the linear model. J. Amer. Statist. Assoc. 75, 828–838.
