www.elsevier.com/locate/jspi

### Symmetric regression quantile and its application to robust estimation for the nonlinear regression model

### Lin-An Chen^{a}, Lanh Tat Tran^{b,∗}, Li-Ching Lin^{a}

^{a} Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan

^{b} Department of Mathematics, Indiana University, Rawles Hall, Bloomington, IN 47405-5701, USA
Received 2 July 2001; accepted 4 September 2003

Abstract

Population conditional quantiles, expressed in terms of percentages, are useful as indices for identifying outliers. We propose a class of symmetric quantiles for estimating unknown nonlinear regression conditional quantiles. In large samples, when the underlying distribution is symmetric, symmetric quantiles are more efficient than the regression quantiles considered by Koenker and Bassett (Econometrica 46 (1978) 33) for small or large values of the percentage $\lambda$, in the sense that they have smaller asymptotic variances. Symmetric quantiles play a useful role in identifying outliers. In estimating nonlinear regression parameters by symmetric trimmed means constructed from symmetric quantiles, we show that their asymptotic variances can be very close to (or can even attain) the Cramer–Rao lower bound under symmetric heavy-tailed error distributions, whereas the usual robust and nonrobust estimators cannot.

© 2003 Elsevier B.V. All rights reserved.

Keywords: Nonlinear regression; Regression quantile; Trimmed mean

1. Introduction

Consider the nonlinear regression model with observations

$$y_i = g(x_i, \beta) + \epsilon_i, \quad 1 \le i \le n,$$

where $g(x, b)$ is a given function defined on a subset $\mathcal{X} \times B$ of a Euclidean space, $\beta$ is in the interior of $B$, and the $\epsilon_i$ are independent realizations of a random variable $\epsilon$ with distribution function $F(y)$, $y \in \mathbb{R}$. In the nonlinear regression model with

∗ Corresponding author. Tel.: +1-812-855-7489; fax: +1-812-855-0046.
E-mail address: tran@indiana.edu (L.T. Tran).

0378-3758/$ - see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/j.jspi.2003.09.014

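Table 1

Asymptotic variances of estimates

| $\sigma$ | LSE | $\ell_1$ | TLSE | C–R |
|---|---|---|---|---|
| 3 | 1.80 | 1.803 | 1.295 | 1.256 |
| 10 | 10.9 | 1.896 | 1.431 | 1.229 |
| 25 | 63.4 | 1.922 | 1.466 | 1.171 |
| $\infty$ | $\infty$ | 1.938 | 1.489 | 1.114 |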

normal errors, among all asymptotically normally distributed estimation sequences, the LSE is known to have the best asymptotic covariance matrix (see Bunke and Bunke, 1989). However, the LSE is highly sensitive to small departures from normality and to the presence of outliers. Robust alternatives for analyzing the nonlinear regression model have been investigated in several papers. Oberhofer (1982), Richardson and Bhattacharyya (1987) and Wang (1995) studied the $\ell_1$-norm estimators. The trimmed least squares estimator (TLSE) based on regression quantiles was proposed by Koenker and Bassett (1978). Many aspects of the TLSE have been explored by Procházka (1988), Koenker and Park (1992), and Jurečková and Procházka (1994). The consistency of M-estimators for the nonlinear regression model has been studied by Liese and Vajda (1994).

Are the available nonparametric estimators really efficient when the error variable has a heavy-tailed distribution? The LSE, the $\ell_1$-norm estimator and the TLSE all have asymptotic normal distributions with asymptotic covariance matrices equal to

$$\gamma^2 Q^{-1} \tag{1.1}$$

for some function $\gamma^2$, where

$$Q = \lim_{n\to\infty} n^{-1}\sum_{i=1}^{n} \frac{\partial g(x_i,\beta)}{\partial\beta}\,\frac{\partial g(x_i,\beta)}{\partial\beta'}.$$

Under some regularity conditions imposed on the regression function $g$ and the p.d.f. $f$, the Cramer–Rao (C–R) lower bound for unbiased estimators of $\beta$ also has the form (1.1) with $\gamma^{-2} = E[(f'(\epsilon)/f(\epsilon))^2]$ (see Bunke and Bunke, 1989; Cramer, 1989). Consider the case where $\epsilon$ has the mixed normal distribution $0.9N(0,1) + 0.1N(0,\sigma^2)$ and compare the asymptotic variances of the estimators mentioned above with the C–R bound. Table 1 provides the values $\gamma^2$ of the estimators. The TLSE has the optimal trimming in the sense that it has the smallest asymptotic covariance matrix.

It is obvious that none of these usual robust and nonrobust estimators has an asymptotic variance close to the C–R lower bound. The TLSE under optimal trimming has asymptotic variances relatively closer to the C–R lower bound than the other two estimators. However, the discrepancies are still significant when the contaminated variance $\sigma^2$ is large.

Basically, the efficiency of an estimator depends on the ability of the estimator to deal with good observations and bad observations (outliers). It is well known that the $\ell_1$-norm estimator does not utilize the good observations sufficiently, which results in a decrease in its efficiency, as is shown in the table. On the other hand, although the TLSE, under optimal trimming, improves in efficiency, the discrepancy between its asymptotic variance and the C–R lower bound still shows inadequacy in utilizing observations (see Table 2 in Section 3). This implies that regression quantiles used in detecting outliers cannot precisely classify the observations into groups of good observations and outliers. Thus, in the nonlinear regression problem it is still possible to improve the estimator's efficiency by choosing an adequate construction process for data classification.

The questions of interest are: (1) Is there another method to estimate the population nonlinear regression quantile that can detect outliers more efficiently? (2) Can we construct nonparametric weighted means based on regression quantiles to estimate regression parameters so that their asymptotic variances are close enough to the C–R lower bound when the error variable has a heavy-tailed distribution? The purpose of this paper is to address these questions. To do this, we first introduce the nonlinear symmetric quantile by extending the idea of Kim (1992) and Chen and Chiang (1996). In large sample studies, the representation of the nonlinear symmetric quantile shows that it is consistent as an estimator of the population nonlinear symmetric quantile, which is the population regression quantile of Koenker and Bassett (1978) whenever the underlying distribution is symmetric. Under a heavy-tailed distribution, the asymptotic variances of the symmetric regression quantiles of small and large percentage are smaller than those of the corresponding regression quantiles of Koenker and Bassett (see Jurečková and Procházka (1994) for the nonlinear regression case). This is useful in identifying outliers since they always fall below the small or above the large $\lambda$th nonlinear conditional quantiles. We demonstrate the efficiency of the symmetric quantile by considering two symmetric trimmed means. The asymptotic representation shows that when the underlying distribution is asymmetric the symmetric trimmed mean has an asymptotic bias with a form analogous to that of the trimmed mean in the linear regression model (see (5.2) of Ruppert and Carroll, 1980). However, the asymptotic bias disappears when the distribution is symmetric. The asymptotic variances of the symmetric trimmed means are analyzed using heavy-tailed distributions. We demonstrate that the asymptotic variances can be significantly closer to the C–R lower bounds in comparison with those of robust and nonrobust estimators. The trimmed mean based on symmetric regression quantiles is shown to attain the C–R lower bound when the random errors have a contaminated normal distribution.

The nonlinear symmetric quantile is introduced in Section 2 and its large sample properties are investigated in Section 3. Examples of weighted means constructed from the nonlinear symmetric quantile are studied in Section 4. The proofs of the theorems are presented in the Appendix. Many terms in the paper depend on the sample size n. However, we have suppressed this index n in the notation for simplicity.

2. Symmetric type quantile

Recall that the nonlinear regression model for the observation $(y, x)$ is $y = g(x,\beta) + \epsilon$. For $0 < \lambda < 1$, the $\lambda$th conditional regression quantile of $y$ given $x$ is

$$g(x,\beta) + F^{-1}(\lambda), \tag{2.1}$$

where $F^{-1}(\lambda)$ is the ordinary quantile function of $F$. If the regression function $g$ has a constant additive term, that is, $g(x,\beta) = \beta_0 + g_0(x,\beta_1)$ for some constant $\beta_0$, the vector $\beta(\lambda) = (\beta_0(\lambda), \beta_1')'$ with $\beta_0(\lambda) = \beta_0 + F^{-1}(\lambda)$ is called the population regression quantile. In this case, the vector $\beta(\lambda)$ has been studied by Jurečková and Procházka (1994) using the technique of Koenker and Bassett (1978). When the regression function $g$ does not have a constant additive term, the population regression quantile is $\beta(\lambda) = (F^{-1}(\lambda), \beta')'$. In this case, the vector $\beta(\lambda)$ has been studied by Chen (1988), also using the technique of Koenker and Bassett (1978). The estimator of $\beta(\lambda)$ obtained by the technique of Koenker and Bassett (1978) is called the regression quantile.

For $0 < \lambda < 1$ and $a > 0$, define $\tilde F(a) = P(|\epsilon| \le a)$, where $\epsilon$ is the error variable. Define the $\lambda$th symmetric quantile of $F$ as $\tilde F^{-1}(\lambda) = \inf\{a : \tilde F(a) \ge \lambda\}$, and the $\lambda$th nonlinear symmetric conditional quantile as

$$\{g(x,\beta) - \tilde F^{-1}(\lambda),\; g(x,\beta) + \tilde F^{-1}(\lambda)\}. \tag{2.2}$$

If $F$ is a continuous function, the nonlinear symmetric conditional quantile is easily seen to satisfy

$$P\big(g(x,\beta) - \tilde F^{-1}(\lambda) \le y \le g(x,\beta) + \tilde F^{-1}(\lambda)\big) = \lambda.$$

Furthermore, if $F$ is continuous and symmetric at 0, then, for $0 < \alpha < 0.5$,

$$\tilde F^{-1}(1 - 2\alpha) = F^{-1}(1 - \alpha).$$

In this case, the $\alpha$th and $(1-\alpha)$th nonlinear conditional regression quantiles in (2.1) and the $(1-2\alpha)$th nonlinear symmetric conditional regression quantile in (2.2) all coincide. The following theorem follows from Chen and Chiang (1996).
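As a small numerical illustration of the identity $\tilde F^{-1}(1-2\alpha) = F^{-1}(1-\alpha)$, the sketch below assumes normal errors and uses helper names of our own choosing (they do not come from the paper). For a symmetric continuous $F$ one has $\tilde F(a) = 2F(a) - 1$, so the symmetric quantile is $F^{-1}((1+\lambda)/2)$; the second function is a sample analogue based on the ordered absolute errors.

```python
import math
from statistics import NormalDist


def symmetric_quantile_normal(lam, sigma=1.0):
    """Symmetric quantile of N(0, sigma^2): the a with P(|eps| <= a) = lam."""
    # For symmetric continuous F, P(|eps| <= a) = 2 F(a) - 1,
    # hence a = F^{-1}((1 + lam) / 2).
    return NormalDist(0.0, sigma).inv_cdf((1.0 + lam) / 2.0)


def empirical_symmetric_quantile(errors, lam):
    """Sample analogue: smallest a whose share of |errors| <= a is at least lam."""
    abs_sorted = sorted(abs(e) for e in errors)
    k = max(1, math.ceil(lam * len(abs_sorted)))
    return abs_sorted[k - 1]


alpha = 0.05
lhs = symmetric_quantile_normal(1.0 - 2.0 * alpha)   # symmetric quantile at 1 - 2*alpha
rhs = NormalDist().inv_cdf(1.0 - alpha)              # ordinary quantile at 1 - alpha
print(lhs, rhs)
```

The two printed values agree up to floating-point error, which is the coincidence of (2.1) and (2.2) stated above for symmetric $F$.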

Theorem 2.1. If $0 < \lambda < 1$, then

$$\tilde F^{-1}(\lambda) = \arg\min_{a > 0} E_F\,(|y - g(x,\beta)| - a)(\lambda - I\{|y - g(x,\beta)| \le a\}). \tag{2.3}$$

Let $\hat\beta_I$ be an initial estimator of $\beta$. Following (2.3), consider the estimator of $\tilde F^{-1}(\lambda)$ defined by

$$\hat a(\lambda) = \arg\min_{a > 0} \sum_{i=1}^{n} (|y_i - g(x_i,\hat\beta_I)| - a)(\lambda - I\{|y_i - g(x_i,\hat\beta_I)| \le a\}). \tag{2.4}$$
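The minimisation in (2.4) can be sketched in a few lines. The objective is piecewise linear and convex in $a$ with kinks at the absolute residuals, so a minimiser can be found by scanning those values. The function names below are hypothetical, and the residuals are assumed to have been computed already from some initial estimator.

```python
def check_loss(abs_resid, a, lam):
    """Objective of (2.4): sum of (|r_i| - a) * (lam - 1{|r_i| <= a})."""
    return sum((r - a) * (lam - (1.0 if r <= a else 0.0)) for r in abs_resid)


def a_hat(residuals, lam):
    """Minimise the objective of (2.4) over the observed absolute residuals."""
    abs_resid = [abs(r) for r in residuals]
    # A minimiser of this piecewise-linear convex function is attained at a kink,
    # i.e. at one of the |r_i|; ties are resolved in favour of the smallest value.
    return min(sorted(set(abs_resid)), key=lambda a: check_loss(abs_resid, a, lam))


# Ten residuals with absolute values 1, ..., 10 and lam = 0.75.
residuals = [-1, 2, -3, 4, -5, 6, -7, 8, -9, 10]
print(a_hat(residuals, 0.75))
```

In this toy data set the estimate is the eighth smallest absolute residual, that is, an empirical 0.75-quantile of the $|y_i - g(x_i,\hat\beta_I)|$.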

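The symmetric population quantile is

$$\begin{pmatrix}\beta_0 + \tilde F^{-1}(\lambda)\\ \beta_1\end{pmatrix} \quad\text{or}\quad \begin{pmatrix}\tilde F^{-1}(\lambda)\\ \beta\end{pmatrix},$$

depending on whether the model has a constant additive term or not. It is estimated by the symmetric regression quantile which, respectively, equals

$$\hat\beta_I + \begin{pmatrix}\hat a(\lambda)\\ 0_{p-1}\end{pmatrix} \quad\text{and}\quad \begin{pmatrix}\hat a(\lambda)\\ \hat\beta_I\end{pmatrix}$$

in the first and second case.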

3. Large sample properties of the symmetric quantile

The asymptotic distribution of the estimator of the population regression quantile depends on both $\hat\beta_I$ and $\hat a(\lambda)$. Without loss of generality, we will consider the nonlinear regression model with an additive constant term. The assumptions on the error variable, the design vectors $x_i$, and the nonlinear function $g$ are presented in the Appendix. They are assumed to hold in the rest of this paper. Define $\tilde d_i = \partial g(x_i,\beta)/\partial\beta$. The asymptotic distribution of $\hat a(\lambda)$ will be investigated with $\hat\beta_I$ as an initial estimator.

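Define

$$q_0(\lambda) = n^{-1/2}\sum_{i=1}^{n}\big[\lambda - I\{|\epsilon_i| \le \tilde F^{-1}(\lambda)\}\big] - \big(f(\tilde F^{-1}(\lambda)) - f(-\tilde F^{-1}(\lambda))\big)\,\theta'\, n^{1/2}(\hat\beta_I - \beta),$$

where $\theta = \lim_{n\to\infty} n^{-1}\sum_{i=1}^{n}\tilde d_i$.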

Theorem 3.1. (a) If $0 < \lambda < 1$, then

$$n^{1/2}(\hat a(\lambda) - \tilde F^{-1}(\lambda)) = \big(f(\tilde F^{-1}(\lambda)) + f(-\tilde F^{-1}(\lambda))\big)^{-1} q_0(\lambda) + o_p(1).$$

(b) Suppose that $0 < \alpha < 0.5$ and $F$ is a symmetric distribution. Then

$$n^{1/2}(\hat a(1-2\alpha) - F^{-1}(1-\alpha)) = \big(2f(F^{-1}(1-\alpha))\big)^{-1} n^{-1/2}\sum_{i=1}^{n}\big[1 - 2\alpha - I\{|\epsilon_i| \le F^{-1}(1-\alpha)\}\big] + o_p(1).$$

The theorem implies the consistency of $\hat a(\lambda)$ for $\tilde F^{-1}(\lambda)$, which indicates that the symmetric regression quantile $\hat\beta_I + (\hat a(\lambda), 0_{p-1}')'$ is consistent for the population symmetric regression quantile $\beta + (\tilde F^{-1}(\lambda), 0_{p-1}')'$.

When should symmetric regression quantiles be employed in statistical inference in terms of their efficiencies? We will attempt to answer this by (a) studying the symmetric trimmed means (in Section 4) based on symmetric regression quantiles and by (b) comparing the asymptotic variances of symmetric type quantiles and regression quantiles.

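For simplicity, we study the following linear regression model:

$$y_i = \beta_0 + x_i'\beta_1 + \epsilon_i,$$

where $F$ is symmetric and $\sum_{i=1}^{n} x_i = 0$. Under this design, both quantiles are used to estimate the population quantile $(\beta_0(\lambda), \beta_1')'$. Recall that $\beta_0(\lambda) = \beta_0 + F^{-1}(\lambda)$. The regression quantile, denoted here by $(\hat\beta_0(\lambda), \hat\beta_1(\lambda)')'$, has the following representation:

$$n^{1/2}\left[\begin{pmatrix}\hat\beta_0(\lambda)\\ \hat\beta_1(\lambda)\end{pmatrix} - \begin{pmatrix}\beta_0(\lambda)\\ \beta_1\end{pmatrix}\right] = f^{-1}(F^{-1}(\lambda))\begin{pmatrix}1 & 0\\ 0 & Q_{11}^{-1}\end{pmatrix} n^{-1/2}\sum_{i=1}^{n}\begin{pmatrix}1\\ x_i\end{pmatrix}\big(\lambda - I\{\epsilon_i < F^{-1}(\lambda)\}\big) + o_p(1),$$

where $Q_{11} = \lim_{n\to\infty} n^{-1}\sum_{i=1}^{n} x_i x_i'$. The symmetric regression quantile is $(\hat\beta_{s0}(\lambda), \hat\beta_{s1}(\lambda)')'$ with $\hat\beta_{s0}(\lambda) = \hat\beta_0 + \hat a(\lambda)$ and $\hat\beta_{s1}(\lambda) = \hat\beta_1$. Let the initial estimator $\hat\beta_I = (\hat\beta_0, \hat\beta_1')'$ be the $\ell_1$-norm estimator. Using (b) of Theorem 3.1 and the representation of the $\ell_1$-norm estimator (see Koenker and Bassett, 1978), we have, for $0.5 < \lambda < 1$,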

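$$n^{1/2}(\hat\beta_{s0}(\lambda) - \beta_0(\lambda)) = f^{-1}(0)\, n^{-1/2}\sum_{i=1}^{n}(0.5 - I\{\epsilon_i < 0\}) + 0.5 f^{-1}(F^{-1}(\lambda))\, n^{-1/2}\sum_{i=1}^{n}\big(2\lambda - 1 - I\{-F^{-1}(\lambda) \le \epsilon_i \le F^{-1}(\lambda)\}\big) + o_p(1),$$

and

$$n^{1/2}(\hat\beta_{s1}(\lambda) - \beta_1) = f^{-1}(0)\, n^{-1/2} Q_{11}^{-1}\sum_{i=1}^{n} x_i (0.5 - I\{\epsilon_i < 0\}) + o_p(1).$$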

Symmetric quantiles and regression quantiles employed to estimate $\beta_0(\lambda)$ and $\beta_1$ all have normal asymptotic distributions. Those used to estimate $\beta_1$ have asymptotic covariance matrices equal to $Q_{11}^{-1}$ multiplied by different constants. The efficiencies of the estimators can be compared through these constants. If the nonlinear regression model has a general form, the asymptotic variance and covariance matrices of the symmetric quantile and the regression quantile are quite complicated and a direct comparison of their asymptotic variances is difficult. However, this difficulty does not occur for trimming estimators, as shown in the next section. Consider the case where the error variable has the contaminated normal distribution

$$(1 - \varepsilon)N(0, 1) + \varepsilon N(0, \sigma^2).$$

The efficiency of the symmetric quantile is defined as

$$\frac{\text{Asymptotic variance of regression quantile}}{\text{Asymptotic variance of symmetric regression quantile}}.$$

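Table 2

Efficiencies of symmetric quantiles for estimating the quantile parameter $\beta_0(\lambda)$

| $\varepsilon$ | $\sigma$ | $\lambda$ = 0.60 | 0.65 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 | 0.95 | 0.98 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 1 | 0.87 | 0.84 | 0.84 | 0.84 | 0.87 | 0.92 | 1.01 | 1.21 | 1.47 |
| | 3 | 0.91 | 0.89 | 0.90 | 0.92 | 0.98 | 1.09 | 1.30 | 1.84 | 1.90 |
| | 5 | 0.91 | 0.90 | 0.91 | 0.94 | 1.01 | 1.15 | 1.44 | 2.39 | 2.02 |
| | 10 | 0.90 | 0.89 | 0.90 | 0.93 | 1.01 | 1.16 | 1.49 | 2.70 | 2.03 |
| 0.2 | 3 | 0.88 | 0.87 | 0.88 | 0.92 | 1.00 | 1.14 | 1.40 | 1.78 | 1.98 |
| | 5 | 0.89 | 0.88 | 0.91 | 0.97 | 1.08 | 1.28 | 1.68 | 2.04 | 2.00 |
| | 10 | 0.89 | 0.89 | 0.93 | 1.01 | 1.16 | 1.45 | 1.98 | 2.10 | 2.03 |

Table 3

Efficiencies of symmetric quantiles for estimating the quantile parameter $\beta_1$

| $\varepsilon$ | $\sigma$ | $\lambda$ = 0.60 | 0.65 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 | 0.95 | 0.98 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 1 | 1.02 | 1.05 | 1.10 | 1.18 | 1.30 | 1.50 | 1.86 | 2.87 | 5.38 |
| | 3 | 1.07 | 1.12 | 1.19 | 1.31 | 1.51 | 1.88 | 2.78 | 7.14 | 43.0 |
| | 5 | 1.07 | 1.13 | 1.21 | 1.35 | 1.59 | 2.07 | 3.40 | 15.5 | 331 |
| | 10 | 1.06 | 1.11 | 1.20 | 1.35 | 1.62 | 2.17 | 3.92 | 42.9 | 1318 |
| 0.2 | 3 | 1.04 | 1.09 | 1.19 | 1.34 | 1.61 | 2.16 | 3.73 | 15.2 | 66.6 |
| | 5 | 1.04 | 1.11 | 1.23 | 1.43 | 1.82 | 2.75 | 6.79 | 114 | 178 |
| | 10 | 1.05 | 1.13 | 1.27 | 1.53 | 2.06 | 3.60 | 17.4 | 502 | 681 |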

The parameters to estimate are $\beta_0(\lambda)$ and $\beta_1$, respectively. Tables 2 and 3 list the efficiencies for the cases $\varepsilon = 0.1$ and $0.2$, where $\sigma = 1, 3, 5, 10$ and $\lambda = 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95$ and $0.98$. Note that the results for $\lambda$ and $1 - \lambda$ are identical.
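The efficiency for the intercept quantile can be sketched numerically for the simplest case $\sigma = 1$, where the errors are standard normal. The variance constants below are our own reading of the two representations displayed earlier: $\lambda(1-\lambda)/f^2(F^{-1}(\lambda))$ for the regression quantile and, since the two sums in the representation of the symmetric quantile are uncorrelated under symmetry (our calculation, not a statement from the paper), $1/(4f^2(0)) + (2\lambda-1)(1-\lambda)/(2f^2(F^{-1}(\lambda)))$ for the symmetric quantile. All function names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal errors, i.e. the sigma = 1 rows of the tables


def var_regression_quantile(lam):
    """Variance constant of the regression quantile estimate of beta_0(lam)."""
    f_q = STD.pdf(STD.inv_cdf(lam))
    return lam * (1.0 - lam) / f_q ** 2


def var_symmetric_quantile(lam):
    """Variance constant of the symmetric quantile estimate, for 0.5 < lam < 1.

    Assumes an l1-norm initial estimator and a symmetric error density.
    """
    f0 = STD.pdf(0.0)
    f_q = STD.pdf(STD.inv_cdf(lam))
    return 1.0 / (4.0 * f0 ** 2) + (2.0 * lam - 1.0) * (1.0 - lam) / (2.0 * f_q ** 2)


def efficiency(lam):
    """Ratio of the two variance constants, as in the definition of efficiency."""
    return var_regression_quantile(lam) / var_symmetric_quantile(lam)


for lam in (0.60, 0.75, 0.90, 0.95, 0.98):
    print(lam, efficiency(lam))
```

Under these assumptions the ratios come out close to the $\sigma = 1$ row of Table 2, below one near the median and above one in the tails; small differences in the last digit are to be expected.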

Based on Tables 2 and 3, regarding the estimation of the population regression quantile $(\beta_0(\lambda), \beta_1')'$, we notice the following:

(a) In estimating $\beta_0(\lambda)$, symmetric quantiles are more efficient than regression quantiles when $\lambda$ is small or large. Regression quantiles are more efficient than symmetric quantiles when $\lambda$ is close to 0.5 on either side.

(b) In estimating the slope parameters $\beta_1$, symmetric quantiles are more efficient than regression quantiles uniformly in $\lambda$.

(c) In estimating the population quantile vector $(\beta_0(\lambda), \beta_1')'$, the symmetric quantile is more efficient than the regression quantile when $\lambda$ is either small or large. It seems that symmetric quantiles are more suitable for classifying the data set into groups of good data and outliers. With a suitable choice of the trimming percentage, high efficiency of estimation is attainable by proper weighting of those observations lying outside the estimated symmetric conditional quantile.

Similar to the ordinary quantile function or the regression quantile, the symmetric quantile has many applications in the study of influence functions. In the next section, we consider its application in parameter estimation for the case of two weighted means.

4. Weighted means based on symmetric quantile

The weighted means are defined based on a linearized model. This linear approximation is a single-step Gauss–Newton method (see Kennedy and Gentle, 1980) based on a root-n consistent estimator. Estimation based on linear approximation can be found in Fox et al. (1980) and Cook and Weisberg (1982). The one-step Huber M-estimator of Bickel (1975) is an example of this technique for the linear regression model.

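By the Taylor expansion theorem, there exists a function $\xi_i : \mathbb{R}^p \to \mathbb{R}$, $0 < \xi_i < 1$, such that

$$\beta_0 + g(x_i,\beta_1) = \hat\beta_0 + g(x_i,\hat\beta_1) + (\beta - \hat\beta_I)'d_i + 0.5(\beta_1 - \hat\beta_1)'\left.\frac{\partial^2 g(x_i,b)}{\partial b\,\partial b'}\right|_{b=\eta_i}(\beta_1 - \hat\beta_1),$$

where

$$\hat\beta_I = (\hat\beta_0, \hat\beta_1')', \quad \eta_i = \beta_1 + \xi_i(\hat\beta_1)(\hat\beta_1 - \beta_1), \quad\text{and}\quad d_i = \begin{pmatrix}1\\ \partial g(x_i,\hat\beta_1)/\partial\hat\beta_1\end{pmatrix}.$$

The approximate linearized regression model is

$$y_i = \hat\beta_0 + g(x_i,\hat\beta_1) + d_i'\beta^{*} + \epsilon_i^{*},$$

where $\beta^{*}$ represents the term $\beta - \hat\beta_I$ and

$$\epsilon_i^{*} = \epsilon_i + 0.5(\beta_1 - \hat\beta_1)'\left.\frac{\partial^2 g(x_i,b)}{\partial b\,\partial b'}\right|_{b=\eta_i}(\beta_1 - \hat\beta_1).$$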

The trimmed mean of $\beta$ is defined based on this linearized regression model. Let $0 < \lambda < 1$ and $y_i^{*} = y_i - (\hat\beta_0 + g(x_i,\hat\beta_1))$. Define the trimming matrix

$$A = \mathrm{diag}\big(a_i : a_i = I\{-\hat a(\lambda) < y_i^{*} < \hat a(\lambda)\},\; i = 1, \ldots, n\big).$$

In addition, denote $D_n = (d_1, \ldots, d_n)'$.

The symmetric trimmed mean for estimating $\beta$ is defined as

$$\hat\beta_t(\lambda) = \hat\beta_I + L_n(\lambda) \quad\text{with}\quad L_n(\lambda) = (D_n'AD_n)^{-1}D_n'Ay^{*},$$

where the vector $y^{*} = (y_1^{*}, y_2^{*}, \ldots, y_n^{*})'$. The symmetric trimmed mean, defined through a linearization of the nonlinear regression function and the residuals based on an initial estimator, has some advantages:

(a) This estimator is defined explicitly while most robust and nonrobust estimators are defined implicitly.

(b) This estimator appears to be simpler to compute in comparison with the trimmed means of Jurečková and Procházka (1994) for nonlinear regression and Koenker and Bassett (1978) for multiple linear regression.
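Because the estimator is explicit, it can be computed in one pass. The following is a minimal sketch for a two-parameter model, not the authors' implementation: the function name, its arguments `g` and `grad`, and the toy data are all assumptions made for illustration, and the initial estimate is simply supplied by the caller (the paper uses the $\ell_1$-norm estimator).

```python
def symmetric_trimmed_mean(x, y, beta_init, g, grad, lam):
    """One-step symmetric trimmed mean for a two-parameter regression model.

    g(x, beta) is the regression function and grad(x, beta) returns the row d_i
    (its gradient in beta, evaluated at the initial estimate).
    """
    # Residuals y_i* from the initial fit and the rows d_i of D_n.
    resid = [yi - g(xi, beta_init) for xi, yi in zip(x, y)]
    rows = [grad(xi, beta_init) for xi in x]

    # a_hat(lam): minimise the check-type objective over observed |residuals|.
    abs_r = [abs(r) for r in resid]

    def loss(a):
        return sum((r - a) * (lam - (1.0 if r <= a else 0.0)) for r in abs_r)

    a_hat = min(sorted(set(abs_r)), key=loss)

    # Trimming indicators a_i = 1{-a_hat < y_i* < a_hat} (strict inequalities).
    keep = [1.0 if -a_hat < r < a_hat else 0.0 for r in resid]

    # Solve (D'AD) L = D'A y* for p = 2 by Cramer's rule.
    s11 = sum(k * d[0] * d[0] for k, d in zip(keep, rows))
    s12 = sum(k * d[0] * d[1] for k, d in zip(keep, rows))
    s22 = sum(k * d[1] * d[1] for k, d in zip(keep, rows))
    b1 = sum(k * d[0] * r for k, d, r in zip(keep, rows, resid))
    b2 = sum(k * d[1] * r for k, d, r in zip(keep, rows, resid))
    det = s11 * s22 - s12 * s12
    if det == 0.0:
        raise ValueError("D'AD is singular after trimming")
    step = ((s22 * b1 - s12 * b2) / det, (s11 * b2 - s12 * b1) / det)
    beta_t = (beta_init[0] + step[0], beta_init[1] + step[1])
    return beta_t, a_hat, keep


# Toy data: y = 1 + 2x + error, with one gross outlier in the last observation.
xs = [-2, -1, 0, 1, 2, 1]
errors = [0.125, -0.1875, 0.125, -0.1875, 0.125, 10.0]
ys = [1.0 + 2.0 * xi + ei for xi, ei in zip(xs, errors)]
line = lambda xi, b: b[0] + b[1] * xi
line_grad = lambda xi, b: (1.0, xi)
beta_t, a_hat_value, keep = symmetric_trimmed_mean(xs, ys, (1.0, 2.0), line, line_grad, 0.8)
print(beta_t, a_hat_value, keep)
```

In this toy example the outlying observation receives weight zero, so the slope estimate is not pulled toward it. Note that the strict inequalities in the trimming matrix also drop observations whose absolute residual equals $\hat a(\lambda)$.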

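Denote

$$g(\lambda) = \tilde F^{-1}(\lambda)\big(f(\tilde F^{-1}(\lambda)) + f(-\tilde F^{-1}(\lambda))\big)^{-1}\big(f(\tilde F^{-1}(\lambda)) - f(-\tilde F^{-1}(\lambda))\big);$$

$$q_1(\lambda) = Q\big(f(\tilde F^{-1}(\lambda)) + f(-\tilde F^{-1}(\lambda))\big)\tilde F^{-1}(\lambda) - \big(f(\tilde F^{-1}(\lambda)) - f(-\tilde F^{-1}(\lambda))\big)g(\lambda)\,\theta\theta';$$

$$q_2(\lambda) = g(\lambda)\,\theta\, n^{-1/2}\sum_{i=1}^{n}\big(\lambda - I\{|\epsilon_i| \le \tilde F^{-1}(\lambda)\}\big);$$

$$q_3(\lambda) = n^{-1/2}\sum_{i=1}^{n}\tilde d_i\big(\epsilon_i I\{|\epsilon_i| \le \tilde F^{-1}(\lambda)\} - \mu_\lambda\big).$$

Theorem 4.1. With $q_1(\lambda)$, $q_2(\lambda)$ and $q_3(\lambda)$ as denoted above, we have

$$n^{1/2}\big(\hat\beta_t(\lambda) - (\beta + \varsigma_\lambda)\big) = \lambda^{-1}Q^{-1}\{q_1(\lambda)\, n^{1/2}(\hat\beta_I - \beta) + q_2(\lambda) + q_3(\lambda)\} + o_p(1),$$

where $\varsigma_\lambda = \lambda^{-1}Q^{-1}\theta\mu_\lambda$ with

$$\mu_\lambda = \int_{-\tilde F^{-1}(\lambda)}^{\tilde F^{-1}(\lambda)} \epsilon\, dF(\epsilon).$$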

This result reveals that the symmetric trimmed mean $\hat\beta_t(\lambda)$ is not generally consistent for the regression parameter vector $\beta$; consistency holds only if, under our assumptions, the term $\varsigma_\lambda$ disappears. The following corollary displays the desired results.

Corollary 4.2. Suppose that $F$ is symmetric and $\lambda = 1 - 2\alpha$ with $0 < \alpha < 0.5$. Then

$$n^{1/2}(\hat\beta_t(1-2\alpha) - \beta) = (1-2\alpha)^{-1}\Big[2f(F^{-1}(1-\alpha))F^{-1}(1-\alpha)\, n^{1/2}(\hat\beta_I - \beta) + Q^{-1} n^{-1/2}\sum_{i=1}^{n}\tilde d_i\epsilon_i I\{-F^{-1}(1-\alpha) \le \epsilon_i \le F^{-1}(1-\alpha)\}\Big] + o_p(1). \tag{4.1}$$

Table 4

Asymptotic variances of estimates under $G = N(0, \sigma^2)$

| $\varepsilon$ | $\sigma$ | LS | $\ell_1$ | TLSE | $\hat\beta_t$ | C–R |
|---|---|---|---|---|---|---|
| 0.1 | 3 | 1.8 | 1.803 | 1.295 (0.10) | 1.305 (0.02) | 1.256 |
| | 5 | 3.4 | 1.855 | 1.373 (0.13) | 1.287 (0.03) | 1.253 |
| | 10 | 10.9 | 1.896 | 1.431 (0.15) | 1.229 (0.04) | 1.209 |
| | 25 | 63.4 | 1.922 | 1.466 (0.16) | 1.171 (0.05) | 1.161 |
| | $\infty$ | $\infty$ | 1.938 | 1.489 (0.16) | 1.114 (0.05) | 1.113 |
| 0.2 | 3 | 2.60 | 2.091 | 1.600 (0.16) | 1.632 (0.04) | 1.532 |
| | 5 | 5.8 | 2.226 | 1.770 (0.20) | 1.605 (0.06) | 1.540 |
| | 10 | 20.8 | 2.336 | 1.905 (0.23) | 1.492 (0.08) | 1.455 |
| | 25 | 125 | 2.406 | 1.988 (0.25) | 1.377 (0.09) | 1.356 |
| | $\infty$ | $\infty$ | 2.453 | 2.044 (0.24) | 1.256 (0.10) | 1.255 |

If we further assume that $\hat\beta_I$ is the $\ell_1$-norm estimator of $\beta$, then the representation of the $\ell_1$-norm estimator (see Ruppert and Carroll, 1980) implies that

$$n^{1/2}(\hat\beta_t(1-2\alpha) - \beta) = (1-2\alpha)^{-1}Q^{-1}n^{-1/2}\sum_{i=1}^{n}\tilde d_i\Big[\big(\epsilon_i + \mathrm{sgn}(\epsilon_i)f^{-1}(0)f(F^{-1}(1-\alpha))F^{-1}(1-\alpha)\big)I\{|\epsilon_i| \le F^{-1}(1-\alpha)\} + f^{-1}(0)f(F^{-1}(1-\alpha))F^{-1}(1-\alpha)\,\mathrm{sgn}(\epsilon_i)I\{|\epsilon_i| > F^{-1}(1-\alpha)\}\Big] + o_p(1),$$

which has an asymptotic normal distribution with zero mean and asymptotic covariance matrix $\sigma_s^2(\alpha)Q^{-1}$, where

$$\sigma_s^2(\alpha) = (1-2\alpha)^{-2}\Big[\big(f(F^{-1}(1-\alpha))F^{-1}(1-\alpha)f^{-1}(0)\big)^2 + 2\int_0^{F^{-1}(1-\alpha)}\epsilon^2\,dF + 4f(F^{-1}(1-\alpha))F^{-1}(1-\alpha)f^{-1}(0)\int_0^{F^{-1}(1-\alpha)}\epsilon\,dF\Big].$$

Note that the symmetric trimmed mean also has an asymptotic normal distribution with zero mean and covariance matrix of form (1.1) with $\gamma^2 = \sigma_s^2(\alpha)$. Thus, in comparing the efficiencies of these estimators we only need to compare the values of $\gamma^2$ and the C–R lower bound. Consider the error variable with standard normal distribution contaminated by a distribution $G$ with location parameter 0 and scale parameter $\sigma^2$, that is, the error variable has the distribution

$$(1 - \varepsilon)N(0, 1) + \varepsilon G(0, \sigma^2).$$

We list in Table 4 the asymptotic variances for the $\ell_1$-norm estimator, the TLSE, and the symmetric trimmed mean $\hat\beta_t$. The values in parentheses are the trimming proportions corresponding to the trimmed means which achieve the smallest asymptotic variances.
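Some entries of Table 4 can be approximated with a short calculation. The sketch below assumes $G = N(0,\sigma^2)$ and uses three constants: the mixture variance for least squares, $1/(4f^2(0))$ for the $\ell_1$-norm estimator, and the formula for $\sigma_s^2(\alpha)$ given above for the symmetric trimmed mean. The function names, the bisection routine and the closed-form truncated normal moments are our own additions.

```python
from statistics import NormalDist

STD = NormalDist()


def mix_cdf(x, eps, sigma):
    """Distribution function of (1 - eps) N(0, 1) + eps N(0, sigma^2)."""
    return (1.0 - eps) * STD.cdf(x) + eps * STD.cdf(x / sigma)


def mix_pdf(x, eps, sigma):
    """Density of the contaminated normal distribution."""
    return (1.0 - eps) * STD.pdf(x) + eps * STD.pdf(x / sigma) / sigma


def mix_quantile(p, eps, sigma):
    """F^{-1}(p) for p > 0.5, found by bisection on [0, 50 * max(1, sigma)]."""
    lo, hi = 0.0, 50.0 * max(1.0, sigma)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mix_cdf(mid, eps, sigma) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def var_ls(eps, sigma):
    """Variance of the errors: the constant for least squares."""
    return (1.0 - eps) + eps * sigma ** 2


def var_l1(eps, sigma):
    """Constant 1 / (4 f(0)^2) for the l1-norm estimator."""
    return 1.0 / (4.0 * mix_pdf(0.0, eps, sigma) ** 2)


def var_sym_trimmed(alpha, eps, sigma):
    """sigma_s^2(alpha) for the symmetric trimmed mean with l1 initial estimator."""
    c = mix_quantile(1.0 - alpha, eps, sigma)
    k = mix_pdf(c, eps, sigma) * c / mix_pdf(0.0, eps, sigma)

    def first_moment(s):   # integral of x dN(0, s^2) over (0, c)
        return s * (STD.pdf(0.0) - STD.pdf(c / s))

    def second_moment(s):  # integral of x^2 dN(0, s^2) over (0, c)
        z = c / s
        return s * s * (STD.cdf(z) - 0.5 - z * STD.pdf(z))

    m1 = (1.0 - eps) * first_moment(1.0) + eps * first_moment(sigma)
    m2 = (1.0 - eps) * second_moment(1.0) + eps * second_moment(sigma)
    return (k * k + 2.0 * m2 + 4.0 * k * m1) / (1.0 - 2.0 * alpha) ** 2


print(var_ls(0.1, 3.0), var_l1(0.1, 3.0), var_sym_trimmed(0.02, 0.1, 3.0))
```

For $\varepsilon = 0.1$ and $\sigma = 3$ the three printed constants are close to the LS, $\ell_1$ and $\hat\beta_t$ entries in the first row of Table 4. Exact agreement in the last digit is not expected, since the tabulated trimming proportions are rounded.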

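Table 5

Asymptotic variances of estimates for $G = \mathrm{Cauchy}(0, \sigma^2)$

| $\varepsilon$ | $\sigma$ | $\ell_1$ | TLSE | $\hat\beta_t$ | C–R |
|---|---|---|---|---|---|
| 0.1 | 3 | 1.829 | 1.340 (0.11) | 1.246 (0.02) | 1.216 |
| | 5 | 1.872 | 1.398 (0.13) | 1.244 (0.03) | 1.219 |
| | 10 | 1.905 | 1.443 (0.14) | 1.207 (0.04) | 1.190 |
| | 25 | 1.925 | 1.471 (0.15) | 1.161 (0.04) | 1.153 |
| 0.2 | 3 | 2.157 | 1.694 (0.18) | 1.518 (0.05) | 1.459 |
| | 5 | 2.269 | 1.826 (0.21) | 1.523 (0.06) | 1.472 |
| | 10 | 2.359 | 1.933 (0.23) | 1.449 (0.08) | 1.415 |
| | 25 | 2.415 | 2.000 (0.24) | 1.356 (0.09) | 1.338 |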

In Table 5, we list the asymptotic variances of the estimators considered above, except the LSE, for $G = \mathrm{Cauchy}(0, \sigma^2)$.

From Tables 4 and 5 we draw several conclusions:

(a) For given $\varepsilon$, the TLSE asymptotic variances increase with the variance of the contaminated distribution, whereas the symmetric trimmed mean behaves in nearly the opposite way. This interesting property implies that the power of the symmetric quantiles to detect contaminated data gradually increases with $\sigma^2$.

(b) The symmetric trimmed mean is not only more efficient than the $\ell_1$-norm estimator and the TLSE, but also has an asymptotic variance as small as the C–R lower bound when the contaminated variance goes to infinity.

Similar to the regression quantile of Koenker and Bassett (1978), symmetric quantiles have many applications. We consider here a refined weighted mean based on symmetric quantiles. We will show that the efficiencies of symmetric trimmed means can still be improved.

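Definition 4.3. Let $0 < \lambda < 1$, $0 \le b \le 1$ and let $\hat a(\lambda)$ be the solution of (2.4). The symmetric Winsorized mean indexed by $(\lambda, b)$ is defined as

$$\hat\beta_w(\lambda, b) = \hat\beta_I + \ell_n(\lambda, b),$$

with $\ell_n(\lambda, b) = (D_n'AD_n)^{-1}(D_n'Ay^{*} + b\,\hat a(\lambda)D_n'A^{*}1_{\mathrm{sgn}})$, where $1_{\mathrm{sgn}}$ is the $n$-vector of $\mathrm{sgn}(y_i^{*})$ and $A^{*} = I_n - A$.

The symmetric $(\lambda, b)$th Winsorized mean and the symmetric $\lambda$th trimmed mean have the following relation:

$$\hat\beta_w(\lambda, b) = \hat\beta_t + b\,\hat a(\lambda)(D_n'AD_n)^{-1}D_n'A^{*}1_{\mathrm{sgn}}.$$

Denote by

$$g_1^{*}(\alpha) = 2(1-b)F^{-1}(1-\alpha)f(F^{-1}(1-\alpha))\,Q$$

and

$$g_2^{*}(\alpha) = n^{-1/2}\sum_{i=1}^{n}\tilde d_i\big[\epsilon_i I\{|\epsilon_i| \le F^{-1}(1-\alpha)\} + bF^{-1}(1-\alpha)\,\mathrm{sgn}(\epsilon_i)I\{|\epsilon_i| > F^{-1}(1-\alpha)\}\big].$$

Table 6

Asymptotic variances of estimators based on symmetric quantile

| $\sigma$ | $\hat\beta_t$ ($\varepsilon$ = 0.1) | $\hat\beta_w$ ($\varepsilon$ = 0.1) | C–R ($\varepsilon$ = 0.1) | $\hat\beta_t$ ($\varepsilon$ = 0.2) | $\hat\beta_w$ ($\varepsilon$ = 0.2) | C–R ($\varepsilon$ = 0.2) |
|---|---|---|---|---|---|---|
| 3 | 1.305 | 1.274 | 1.256 | 1.632 | 1.566 | 1.532 |
| 5 | 1.287 | 1.277 | 1.253 | 1.605 | 1.586 | 1.540 |
| 10 | 1.229 | 1.227 | 1.209 | 1.492 | 1.490 | 1.455 |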

Theorem 4.4. Let $\lambda = 1 - 2\alpha$ with $0 < \alpha < 0.5$. If $F$ is symmetric around 0, then

$$n^{1/2}(\hat\beta_w(1-2\alpha, b) - \beta) = (1-2\alpha)^{-1}Q^{-1}\{g_1^{*}(\alpha)\, n^{1/2}(\hat\beta_I - \beta) + g_2^{*}(\alpha)\} + o_p(1).$$

Let $\hat\beta_I$ be the $\ell_1$-norm estimator. From Jurečková and Procházka (1994), it is seen that $n^{1/2}(\hat\beta_w(1-2\alpha, b) - \beta)$ has a normal asymptotic distribution with zero mean and covariance matrix $\sigma_w^2 Q^{-1}$, where

$$\sigma_w^2 = (1-2\alpha)^{-2}\Big[2\alpha\big(bF^{-1}(1-\alpha) + (1-b)f^{-1}(0)F^{-1}(1-\alpha)f(F^{-1}(1-\alpha))\big)^2 + 2\int_0^{F^{-1}(1-\alpha)}\epsilon^2\,dF + 4(1-b)f^{-1}(0)F^{-1}(1-\alpha)f(F^{-1}(1-\alpha))\int_0^{F^{-1}(1-\alpha)}\epsilon\,dF + (1-2\alpha)(1-b)^2\big(f^{-1}(0)F^{-1}(1-\alpha)f(F^{-1}(1-\alpha))\big)^2\Big].$$

We now give the asymptotic variances of two weighted means associated with the C–R lower bound in Table 6.

An inspection of the estimators' asymptotic variances reveals that the efficiencies of the symmetric trimmed means have improved. It is shown that the high efficiencies of the symmetric trimmed mean and the symmetric Winsorized mean depend only on optimal settings of the tuning constants $\alpha$ and $b$. The adaptive estimator selected with the smallest bootstrap estimate of the finite sample variance can also achieve high efficiency (see Léger and Romano (1990) for information on the adaptive trimmed mean for location estimation).

The following theorem shows that the asymptotic variance of the symmetric trimmed mean can attain the C–R lower bound (as indicated in Table 4) when $\epsilon$ has a contaminated normal distribution.

Theorem 4.5. Suppose that the error variable $\epsilon$ has the contaminated normal distribution

$$(1 - \varepsilon)N(0, \sigma^2) + \varepsilon N(0, \varsigma^2) \tag{4.2}$$

for $\varsigma > 0$ and some known $\varepsilon$, $0 < \varepsilon < 1$. In addition, assume that $\hat\beta_I$ has a bounded influence function. Then the asymptotic covariance matrix of $\hat\beta_t(1 - \varepsilon)$ attains the C–R lower bound

$$n^{-1}(1 - \varepsilon)^{-1}\sigma^2 Q^{-1}, \tag{4.3}$$

as $\varsigma \to \infty$.

Theorem 4.5 has a practical meaning only in the rare cases where the level of contamination is known.
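The limit in (4.3) can be explored numerically. The sketch below is our own illustration, not part of the paper: it approximates the Fisher location information $\int (f')^2/f$ of the mixture (4.2) by the trapezoidal rule on a fixed grid (the grid width and step are arbitrary choices) and returns its reciprocal, which plays the role of $\gamma^2$ in (1.1).

```python
import math


def contaminated_normal_cr_constant(eps, sigma, varsigma, half_width=40.0, step=0.002):
    """Reciprocal of the Fisher location information of mixture (4.2).

    The information integral of (f')^2 / f is approximated by the trapezoidal
    rule on [-half_width, half_width]; the tails beyond that are neglected.
    """
    def phi(x, s):
        # Density of N(0, s^2) at x.
        return math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

    n_steps = int(round(2.0 * half_width / step))
    info = 0.0
    for j in range(n_steps + 1):
        x = -half_width + j * step
        p_main = phi(x, sigma)
        p_cont = phi(x, varsigma)
        f = (1.0 - eps) * p_main + eps * p_cont
        df = -(1.0 - eps) * x / sigma ** 2 * p_main - eps * x / varsigma ** 2 * p_cont
        if f > 0.0:  # skip grid points where the density underflows to zero
            weight = 0.5 if j in (0, n_steps) else 1.0
            info += weight * df * df / f * step
    return 1.0 / info


for varsigma in (3.0, 10.0, 100.0, 1000.0):
    print(varsigma, contaminated_normal_cr_constant(0.1, 1.0, varsigma))
print("limit (1 - eps)^(-1) * sigma^2:", 1.0 / (1.0 - 0.1))
```

With $\varepsilon = 0.1$ and $\sigma = 1$, the printed constants can be compared with the C–R column of Table 4; as $\varsigma$ grows they approach $(1-\varepsilon)^{-1}\sigma^2$ from above, in line with (4.3).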

Acknowledgements

The authors would like to express their appreciation to the Associate Editor and two anonymous referees for their valuable comments.

Appendix

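We now list the assumptions employed in the paper.

(a.1) $n^{-1}\sum_{i=1}^{n}\tilde d_i\tilde d_i' = Q + o(1)$ and $n^{-1}\sum_{i=1}^{n}\tilde d_i = \theta + o(1)$, where $Q$ is positive definite and $\theta$ is a finite vector.

(a.2) $n^{-1}\sum_{i=1}^{n}(\partial g(x_i,\beta)/\partial\beta_j)^4 = O(1)$, $n^{-1}\sum_{i=1}^{n}(\partial^2 g(x_i,\beta)/\partial\beta_j\partial\beta_k)^2 = O(1)$.

(a.3) $n^{-1}\max_{\|\beta\| \le b}\sum_{i=1}^{n}|\partial g(x_i,\beta)/\partial\beta_j|^2 = O(1)$ for some $b > 0$;

$$n^{-1/4}\max_{\|\beta\| \le b}|\partial g(x_i,\beta)/\partial\beta_j| = O(1);$$

$$n^{-1/2}\max_{\|\beta\| \le b}|\partial^2 g(x_i,\beta)/\partial\beta_j\partial\beta_k| = O(1);$$

$$n^{-1/2}\max_{\|\beta\| \le b}|\partial^3 g(x_i,\beta)/\partial\beta_j\partial\beta_k\partial\beta_h| = O(1).$$

(a.4) The probability density function $f$ of $\epsilon$ is bounded away from 0 in a neighborhood of $F^{-1}(\lambda)$ and $\tilde F^{-1}(\lambda)$, for some $0 < \lambda < 1$. In addition, $\epsilon$ has a finite fourth population moment.

(a.5) $n^{1/2}(\hat\beta_I - \beta) = O_p(1)$.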

To prove Theorem 3.1, we need several lemmas. Let, by replacing $d_i$ and $\partial^2 g(x_i,b)/\partial b\,\partial b'$ by $d_i(b)$ and $G_i(b)$, respectively,

$$h_i(c; t_1, t_3) = \big(c,\; d_i(\beta + n^{-1/2}t_3)'\big)\,t_1,$$

$$g_i(t_2, t_3) = t_2'\,G_i(\beta + n^{-1/2}t_3)\,t_2,$$

and

$$S(t_1, t_2, t_3) = n^{-1/2}\sum_{i=1}^{n}\big(\lambda - I\{-\tilde F^{-1}(\lambda) + n^{-1/2}h_i(-1; t_1, t_3) - 0.5n^{-1}g_i(t_2, t_3) \le \epsilon_i \le \tilde F^{-1}(\lambda) + n^{-1/2}h_i(1; t_1, t_3) - 0.5n^{-1}g_i(t_2, t_3)\}\big).$$

Lemma A.1. For any $b > 0$,

$$\max_{\|t_j\| \le b,\; j=1,2,3}\Big|S(t_1, t_2, t_3) - S(0, 0, 0) + n^{-1}\sum_{i=1}^{n}\big[f(\tilde F^{-1}(\lambda))h_i(1; t_1, t_3) - f(-\tilde F^{-1}(\lambda))h_i(-1; t_1, t_3) + (f(\tilde F^{-1}(\lambda)) - f(-\tilde F^{-1}(\lambda)))\,0.5n^{-1/2}g_i(t_2, t_3)\big]\Big| = o_p(1).$$

Proof. Let $\lambda_1 = P(\epsilon < -\tilde F^{-1}(\lambda))$ and

$$S_1(t_1, t_2, t_3) = n^{-1/2}\sum_{i=1}^{n}\big(\lambda + \lambda_1 - I\{\epsilon_i \le \tilde F^{-1}(\lambda) + n^{-1/2}h_i(1; t_1, t_3) - 0.5n^{-1}g_i(t_2, t_3)\}\big)$$

and

$$S_2(t_1, t_2, t_3) = n^{-1/2}\sum_{i=1}^{n}\big(\lambda_1 - I\{\epsilon_i \le -\tilde F^{-1}(\lambda) + n^{-1/2}h_i(-1; t_1, t_3) - 0.5n^{-1}g_i(t_2, t_3)\}\big).$$

So, $S(t_1, t_2, t_3) = S_1(t_1, t_2, t_3) - S_2(t_1, t_2, t_3)$. Let $F_0$ satisfy $\lambda_0 = P(\epsilon < F_0)$ and

$$S_a(t_1, t_2, t_3) = n^{-1/2}\sum_{i=1}^{n}\big(\lambda_0 - I\{\epsilon_i \le F_0 + n^{-1/2}h_i(c; t_1, t_3) - 0.5n^{-1}g_i(t_2, t_3)\}\big).$$

From Jurečková (1984) and Chen (1988, pp. 72–75),

$$\max_{\|t_j\| \le b,\; j=1,2,3}\Big|S_a(t_1, t_2, t_3) - S_a(0, 0, 0) + n^{-1}f(F_0)\sum_{i=1}^{n}\big[h_i(c; t_1, t_3) - 0.5n^{-1/2}g_i(t_2, t_3)\big]\Big| = o_p(1). \tag{A.1}$$

Substituting $(\lambda_0, F_0)$ by $(\lambda + \lambda_1, \tilde F^{-1}(\lambda))$ and then by $(\lambda_1, -\tilde F^{-1}(\lambda))$ in (A.1), we obtain the representations of $S_1(t_1, t_2, t_3)$ and $S_2(t_1, t_2, t_3)$, respectively. Combining these two representations, we then have the lemma.

Lemma A.2. $n^{-1/2}\sum_{i=1}^{n}\big(\lambda - I\{|y_i^{*}| < \hat a(\lambda)\}\big) = o_p(1)$.

Proof. We follow Ruppert and Carroll (1980). Let

$$G(c) = \sum_{i=1}^{n}\big(|y_i^{*}| - (\hat a(\lambda) + c)\big)\big(\lambda - I\{|y_i^{*}| \le \hat a(\lambda) + c\}\big).$$

The right derivative of $G$ at $c$ is

$$G^{+}(c) = (1 - \lambda)\sum_{i=1}^{n}I\{|y_i^{*}| \le \hat a(\lambda) + c\} - \lambda\sum_{i=1}^{n}I\{|y_i^{*}| > \hat a(\lambda) + c\}.$$

We need to show that $n^{-1/2}G^{+}(0) = o_p(1)$. Clearly, $G^{+}(c)$ is nondecreasing. So, for small $\varepsilon > 0$,

$$G^{+}(-\varepsilon) \le G^{+}(0) \le G^{+}(\varepsilon).$$

Since $G$ achieves its minimum at 0, we have

$$G^{+}(0) \le \lim_{\varepsilon \to 0^{+}}\big(G^{+}(\varepsilon) - G^{+}(-\varepsilon)\big) = \sum_{i=1}^{n}I\{|y_i^{*}| = \hat a(\lambda)\}. \tag{A.2}$$

The lemma follows since the term on the right-hand side of (A.2) is bounded.
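The bound behind (A.2) is easy to check on a small data set. The sketch below is our own illustration with hypothetical names: it minimises the objective of (2.4) over the observed absolute residuals and compares the unnormalised sum in Lemma A.2 with the number of residuals tied at $\hat a(\lambda)$.

```python
def subgradient_gap(abs_resid, lam):
    """Return a_hat, the sum of (lam - 1{|y_i*| < a_hat}), and the tie count."""
    def loss(a):
        return sum((r - a) * (lam - (1.0 if r <= a else 0.0)) for r in abs_resid)

    # Minimiser of the piecewise-linear objective, searched over its kinks.
    a_hat = min(sorted(set(abs_resid)), key=loss)
    gap = sum(lam - (1.0 if r < a_hat else 0.0) for r in abs_resid)
    ties = sum(1 for r in abs_resid if r == a_hat)
    return a_hat, gap, ties


a_hat_value, gap, ties = subgradient_gap([float(k) for k in range(1, 11)], 0.75)
print(a_hat_value, gap, ties)
```

Here the gap lies between zero and the number of ties, so after multiplying by $n^{-1/2}$ it vanishes as long as the number of ties stays bounded.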
Lemma A.3. If $0 < \lambda < 1$, then $n^{1/2}(\hat a(\lambda) - \tilde F^{-1}(\lambda)) = O_p(1)$.

Proof. The following inequality holds:

$$P\big(|n^{1/2}(\hat a(\lambda) - \tilde F^{-1}(\lambda))| > k\big) \le P\Big(\min_{|t_0| > k}\Big|n^{-1/2}\sum_{i=1}^{n}\big(\lambda - I\{|\epsilon_i - n^{-1/2}d_i(\hat\beta_I)'T_1| \le n^{-1/2}t_0 + 0.5n^{-1}\tilde\beta_I'G_i(\hat\beta_I)\tilde\beta_I\}\big)\Big| < \eta_0\Big) + P\Big(\Big|n^{-1/2}\sum_{i=1}^{n}\big(\lambda - I\{|\epsilon_i - n^{-1/2}d_i(\hat\beta_I)'T_1| \le (\hat a(\lambda) - \tilde F^{-1}(\lambda)) + n^{-1}\tilde\beta_I'G_i(\hat\beta_I)\tilde\beta_I\}\big)\Big| > \eta_0\Big), \tag{A.3}$$

where $T_1$ is an arbitrary sequence of random vectors with $T_1 = O_p(1)$ and $\tilde\beta_I = n^{1/2}(\hat\beta_I - \beta)$. Using the method of Jurečková (1977, Lemma 5.2) and Lemma A.1, one can show that for $\varepsilon > 0$ there exist numbers $\eta_0$, $k$ and $N_0$ such that for $n > N_0$, the first term on the right-hand side of (A.3) is less than or equal to $\varepsilon$. The proof follows by Lemma A.1.

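Proof of Theorem 3.1. From Lemma A.3,

$$n^{1/2}\left[\begin{pmatrix}\hat a(\lambda)\\ \hat\beta_I\end{pmatrix} - \begin{pmatrix}\tilde F^{-1}(\lambda)\\ \beta\end{pmatrix}\right] = O_p(1).$$

Then Lemmas A.1 and A.3 imply that

$$n^{-1/2}\sum_{i=1}^{n}\big(\lambda - I\{-\tilde F^{-1}(\lambda) \le \epsilon_i \le \tilde F^{-1}(\lambda)\}\big) = n^{-1}\sum_{i=1}^{n}\Big[f(\tilde F^{-1}(\lambda))\big(1,\; d_i(\hat\beta_I)'\big) - f(-\tilde F^{-1}(\lambda))\big(-1,\; d_i(\hat\beta_I)'\big)\Big]\, n^{1/2}\begin{pmatrix}\hat a(\lambda) - \tilde F^{-1}(\lambda)\\ \hat\beta_I - \beta\end{pmatrix} + o_p(1),$$

from which the theorem follows.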

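Proof of Theorem 4.1. From the setting of $L_n(\lambda)$, we have

$$n^{1/2}(\hat\beta_t - \beta) = (n^{-1}D_n'AD_n)^{-1}n^{-1/2}D_n'A\epsilon^{*}.$$

Also, the following representation can be found in Ruppert and Carroll (1980) or Jurečková (1984):

$$n^{-1}D_n'AD_n = \lambda Q + o_p(1). \tag{A.4}$$

It is not hard to show that $n^{-1/2}D_n'A\epsilon^{*} = n^{-1/2}\tilde D_n'A\epsilon + o_p(1)$ with $\tilde D_n = (\tilde d_1, \ldots, \tilde d_n)'$. Let

$$U_j(t_1, t_2, t_3) = n^{-1/2}\sum_{i=1}^{n}\tilde d_{ij}\epsilon_i I\{\epsilon_i < a + n^{-1/2}h_i(c; t_1, t_2) - 0.5n^{-1}g_i(t_2, t_3)\},$$

where $\tilde d_{ij}$ represents the $j$th element of $\tilde d_i$ and $(a, c)$ is either $(\tilde F^{-1}(\lambda), 1)$ or $(-\tilde F^{-1}(\lambda), -1)$. Along the lines of Chen (1988) and Jurečková (1984), we see that

$$U_j(T_1, T_2, T_3) - U_j(0, 0, 0) = n^{-1}af(a)\sum_{i=1}^{n}\tilde d_{ij}\big[h_i(c; T_1, T_2) - 0.5n^{-1}g_i(T_2, T_3)\big] + o_p(1) \tag{A.5}$$

for any sequence $T = (T_1', T_2', T_3')'$ with $T = O_p(1)$. Using (A.5), the theorem follows by imposing

$$T_1 = n^{1/2}\begin{pmatrix}\hat a(\lambda) - \tilde F^{-1}(\lambda)\\ \hat\beta_I - \beta\end{pmatrix}, \quad T_2 = n^{1/2}(\hat\beta_1 - \beta_1) \quad\text{and}\quad T_3 = n^{1/2}(\hat\beta_I - \beta)$$

in the following representation:

$$n^{-1/2}\tilde D_n'A\epsilon = n^{-1/2}\sum_{i=1}^{n}\tilde d_i\epsilon_i\big[I\{\epsilon_i < \tilde F^{-1}(\lambda) + n^{-1/2}h_i(1; T_1, T_2) - 0.5n^{-1}g_i(T_2, T_3)\} - I\{\epsilon_i < \tilde F^{-1}(\lambda)\}\big] - n^{-1/2}\sum_{i=1}^{n}\tilde d_i\epsilon_i\big[I\{\epsilon_i < -\tilde F^{-1}(\lambda) + n^{-1/2}h_i(-1; T_1, T_2) - 0.5n^{-1}g_i(T_2, T_3)\} - I\{\epsilon_i < -\tilde F^{-1}(\lambda)\}\big] + n^{-1/2}\sum_{i=1}^{n}\tilde d_i\epsilon_i I\{-\tilde F^{-1}(\lambda) < \epsilon_i < \tilde F^{-1}(\lambda)\}. \tag{A.6}$$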

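Proof of Theorem 4.4. Clearly,

$$n^{1/2}(\hat\beta_w(1-2\alpha, b) - \beta) = (n^{-1}D_n'AD_n)^{-1}\big(n^{-1/2}D_n'A\epsilon^{*} + n^{-1/2}b\,\hat a(1-2\alpha)D_n'A^{*}1_{\mathrm{sgn}}\big).$$

By Theorem 3.1, (A.4) and (A.6), we only need to consider $n^{-1/2}\tilde D_n'A^{*}1_{\mathrm{sgn}}$ with

$$n^{-1/2}\tilde D_n'A^{*}1_{\mathrm{sgn}} = n^{-1/2}\sum_{i=1}^{n}\tilde d_i\big(I\{y_i^{*} > \hat a(1-2\alpha)\} - I\{y_i^{*} < -\hat a(1-2\alpha)\}\big).$$

By using arguments similar to those of (A.5) and (A.6), we obtain

$$n^{-1/2}\tilde D_n'A^{*}1_{\mathrm{sgn}} = -2f(F^{-1}(1-\alpha))\,Q\,n^{1/2}(\hat\beta_I - \beta) + n^{-1/2}\sum_{i=1}^{n}\tilde d_i\,\mathrm{sgn}(\epsilon_i)I\{|\epsilon_i| > F^{-1}(1-\alpha)\} + o_p(1).$$

The proof of Theorem 4.4 follows.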

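Proof of Theorem 4.5. Denote by $\tilde g_\varsigma$ the contaminated distribution (4.2). The C–R bound for $\beta$ is

$$(1 - \varepsilon)^{-1}\left[E_{\tilde g_\varsigma}\left(\frac{\partial \ln\tilde g_\varsigma(\epsilon)}{\partial\epsilon}\right)^{2}\right]^{-1}Q^{-1},$$

which converges to the C–R lower bound given in (4.3) as $\varsigma \to \infty$. On the other hand, the contaminated normal distribution of (4.2) satisfies $f(\epsilon) \to 0$ as $\epsilon \to \infty$. Since $\hat\beta_I$ has a bounded influence function, from (4.1), the asymptotic covariance matrix of $\hat\beta_t(1 - \varepsilon)$ is

$$n^{-1}Q^{-1}(1 - \varepsilon)^{-2}E_g\,\epsilon^2 I\{|\epsilon| \le F_\varsigma^{-1}(1 - \varepsilon/2)\},$$

where $g$ is the distribution of $N(0, \sigma^2)$. However, as $\varsigma \to \infty$, $F_\varsigma^{-1}(1 - \varepsilon/2) \to \infty$. Then the above variance is also the quantity of (4.3). This completes the proof of the theorem.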

References

Bickel, P.J., 1975. One-step Huber estimates in the linear model. J. Amer. Statist. Assoc. 70, 428–433.

Bunke, H., Bunke, O., 1989. Nonlinear Regression, Functional Relations and Robust Methods. Wiley, New York.

Chen, L.A., 1988. Regression quantiles and trimmed least squares estimators for structural equations and nonlinear regression model. Ph.D. Thesis, University of Illinois.

Chen, L.A., Chiang, Y.C., 1996. Symmetric type quantile and trimmed means for location and linear regression model. J. Nonparametric Statist. 7, 171–185.

Cook, R.D., Weisberg, S., 1982. Residuals and Influence in Regression. Chapman & Hall, New York.

Cramer, J.S., 1989. Econometric Applications of Maximum Likelihood Methods. Cambridge University Press, Cambridge.

Fox, T., Hinkley, D., Larntz, K., 1980. Jackknifing in nonlinear regression. Technometrics 22, 29–33.

Jurečková, J., 1977. Asymptotic relations of M-estimates and R-estimates in linear regression model. Ann. Statist. 5, 464–472.

Jurečková, J., 1984. Regression quantile and trimmed least square estimator under general design. Kybernetika 20, 345–357.

Jurečková, J., Procházka, B., 1994. Regression quantiles and trimmed least squares estimator in nonlinear regression model. J. Nonparametric Statist. 3, 201–222.

Kennedy, W., Gentle, J., 1980. Statistical Computing. Dekker, New York.

Kim, S.J., 1992. The metrically trimmed mean as a robust estimator of location. Ann. Statist. 20, 1534–1547.

Koenker, R., Bassett, G.J., 1978. Regression quantiles. Econometrica 46, 33–50.

Koenker, R., Park, B.J., 1992. An interior point algorithm for nonlinear quantile regression. Faculty Working Paper 92-0127. College of Commerce and Business Administration, University of Illinois, Urbana-Champaign.

Léger, C., Romano, J.P., 1990. Bootstrap adaptive estimation: The trimmed-mean example. Canad. J. Statist. 18, 297–314.

Liese, F., Vajda, I., 1994. Consistency of M-estimates in general regression models. J. Multivariate Anal. 50, 93–114.

Oberhofer, W., 1982. The consistency of nonlinear regression minimizing the $\ell_1$-norm. Ann. Statist. 10, 316–319.

Procházka, B., 1988. Regression quantiles and trimmed least squares estimator in the nonlinear regression model. Comput. Statist. Data Anal. 6, 358–391.

Richardson, G.D., Bhattacharyya, B.B., 1987. Consistent $\ell_1$-estimators in nonlinear regression for a noncompact parameter space. Sankhya A 49, 377–387.

Ruppert, D., Carroll, R.J., 1980. Trimmed least squares estimation in the linear model. J. Amer. Statist. Assoc. 75, 828–838.

Wang, J., 1995. Asymptotic normality of $\ell_1$-estimators in nonlinear regression. J. Multivariate Anal. 54, 227–238.