
REGRESSION MODELING FOR NONPARAMETRIC ESTIMATION OF DISTRIBUTION AND QUANTILE FUNCTIONS

Ming-Yen Cheng and Liang Peng

National Taiwan University and Georgia Institute of Technology

Abstract: We propose a local linear estimator of a smooth distribution function.

This estimator applies local linear techniques to observations from a regression model in which the value of the empirical distribution function equals the value of the true distribution function plus an error term. We show that, for most commonly used kernel functions, our local linear estimator has a smaller asymptotic mean integrated squared error than the conventional kernel distribution estimator. Importantly, since this MISE reduction occurs through a constant factor of a second order term, any bandwidth selection procedure for the kernel distribution estimator can be easily adapted for our estimator. For the estimation of a smooth quantile function, we establish a regression model of the empirical quantile function and obtain a local quadratic estimator. It has better asymptotic performance than the kernel quantile estimator in both the interior and boundary cases.

Key words and phrases: Distribution function, empirical quantiles, kernel, local polynomial estimation, nonparametric estimation, quantile, smoothing.

1. Introduction

Suppose that $X_1, \ldots, X_n$ are independent and identically distributed random variables from a common univariate distribution function $F$. One obvious estimator of the distribution function $F$ is the empirical distribution function $F_n(x) = n^{-1}\sum_{i=1}^n I(X_i \le x)$, where $I(A)$ denotes the indicator function of the set $A$. Though $F_n$ has good properties, one may prefer a smooth estimator of $F$.

An example is in the estimation of receiver operating characteristic (ROC) curves for continuous diagnostic tests (see Zou, Hall and Shapiro (1997)). Kernel distribution estimators $\tilde F(x) = n^{-1}\sum_{i=1}^n K\{(x - X_i)/h\}$ are smooth and have been extensively studied in the literature. A potential problem is the boundary effect: one may have substantial bias near the boundaries of the data range.
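To make this construction concrete, here is a minimal Python sketch (our own illustration, not code from the paper) of $\tilde F$ with the integrated Epanechnikov kernel; the function names are hypothetical.

```python
import numpy as np

def epanechnikov_cdf(t):
    """Integrated Epanechnikov kernel: K(t) = int_{-1}^t (3/4)(1 - u^2) du."""
    t = np.clip(t, -1.0, 1.0)
    return 0.5 + 0.75 * (t - t**3 / 3.0)

def kernel_cdf_estimator(x, data, h):
    """Kernel distribution estimator: F~(x) = n^{-1} sum_i K((x - X_i)/h)."""
    return np.mean(epanechnikov_cdf((x - np.asarray(data)) / h))
```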

We introduce smooth distribution estimators that avoid boundary effects. They are derived by writing
$$F_n(X_i) = F(X_i) + \epsilon_i, \quad i = 1, \ldots, n, \tag{1.1}$$
where the $\epsilon_i$ are error terms. Having the 'regression model' (1.1), one can apply nonparametric regression techniques, for example the Nadaraya-Watson (Nadaraya (1964b)) and local polynomial (Fan and Gijbels (1996)) methods, to the data $(X_1, F_n(X_1)), \ldots, (X_n, F_n(X_n))$ to construct smooth estimators of $F$. In this paper we concentrate on local linear smoothing.

The mean integrated squared errors of $\tilde F(x)$ and our local linear distribution estimator both have the same leading term as that of the empirical distribution function. However, for many commonly used kernel functions, the local linear distribution estimator has a smaller second order term, and hence a smaller asymptotic mean integrated squared error, than $\tilde F(x)$. The reduction in the second order term, which arises as an effect of smoothing, is approximately 60% if the Epanechnikov, Biweight, or triangular kernel is employed. Furthermore, the asymptotic mean integrated squared error expressions suggest that any bandwidth selection procedure tailored for $\tilde F$ can be used, with a simple constant adjustment, for implementation of our estimator.

Quantile estimation plays an important role in a wide range of statistical applications: the Q-Q plot, Value-at-Risk in financial risk management, etc. A natural estimator for the quantile function $Q(p)$ is the empirical quantile function. Several smooth quantile estimators have appeared in the literature. Nadaraya (1964a) discussed a kernel estimator defined as the inverse of the kernel distribution estimator. Parzen (1979) proposed kernel quantile estimators, which were subsequently investigated by Yang (1985), Falk (1985), Zelterman (1990), and Sheather and Marron (1990). A unified kernel quantile estimator was given by Cheng and Parzen (1997). We consider local quadratic regression estimation for the quantile function through the relation
$$Q_n(s) = Q(s) + \text{error term}, \tag{1.2}$$
where $Q_n$ is the empirical quantile function. We show that, under stronger smoothness assumptions, the local quadratic estimator of $Q(p)$ is better than the kernel quantile estimator $\hat Q_n^k(p)$, defined in (3.1), both when $p$ is a fixed number in $(0, 1)$ and when $p$ tends to 0 or 1.

While $F$ and $Q$ are necessarily nondecreasing, it is possible that our estimators of these functions are decreasing in some places. However, since the responses $F_n(x)$ in (1.1) and $Q_n(s)$ in (1.2) are nondecreasing, the locally fitted curves are not far from nondecreasing. When a nondecreasing estimated curve is desired, it can be achieved by applying isotonic regression techniques, see for example Mammen, Marron, Turlach and Wand (2001), to our estimator.
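For instance, a pool-adjacent-violators (PAVA) pass over fitted values evaluated on a grid is one standard way to isotonize a nearly monotone fit. The sketch below is our own, self-contained illustration; `local_linear_cdf` in the usage comment refers to the evaluator sketched in Section 2 and is hypothetical here.

```python
import numpy as np

def pava_nondecreasing(y):
    """Pool-adjacent-violators: the L2 projection of the sequence y onto
    the cone of nondecreasing sequences."""
    means, sizes = [], []
    for v in np.asarray(y, dtype=float):
        means.append(v)
        sizes.append(1)
        # merge adjacent blocks while monotonicity is violated
        while len(means) > 1 and means[-2] > means[-1]:
            m2, s2 = means.pop(), sizes.pop()
            m1, s1 = means.pop(), sizes.pop()
            means.append((s1 * m1 + s2 * m2) / (s1 + s2))
            sizes.append(s1 + s2)
    # expand the block means back to a full-length fitted sequence
    return np.concatenate([np.full(s, m) for m, s in zip(means, sizes)])

# usage sketch: evaluate the smooth estimate on a grid, then isotonize
# grid = np.linspace(min(data), max(data), 200)
# fit = np.array([local_linear_cdf(x, data, h) for x in grid])  # see Section 2
# monotone_fit = pava_nondecreasing(fit)
```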


Smoothing regression techniques have been applied to the estimation of distribution and density functions. Cheng, Fan and Marron (1997) obtained local linear density estimators by constructing regression data based on binning the original data. Wei and Chu (1994) regressed responses, obtained by differencing $F_n$, on the design points $i/n$ to estimate a density function. Lejeune and Sarda (1992) minimized a locally kernel-weighted $L_2$ norm between $F_n$ and a polynomial to obtain a distribution estimator, and then used its derivative as an estimator of the density. That distribution estimator is essentially the same as the kernel distribution estimator in the interior. The regression functions in (1.1) and (1.2) are exactly the respective functions to be estimated. Thus our estimation methods are advantageous as they involve only denoising and no other operations, such as differentiation.

This paper is organized as follows. In Section 2, the local linear distribution estimator is derived, and asymptotic results and a few remarks are given. Section 3 discusses local quadratic quantile estimation. Section 4 reports simulation studies on the finite sample performance of the distribution and quantile estimators. Proofs of the main results are given in the Appendix.

2. Distribution Estimation

Kernel distribution estimation was introduced by Nadaraya (1964a); the estimator is defined by
$$\tilde F(x) = \frac{1}{n}\sum_{i=1}^{n} K\Big(\frac{x - X_i}{h}\Big), \tag{2.1}$$
where $K$ is a given distribution function and $h = h(n) > 0$ ($h \to 0$ as $n \to \infty$) is the bandwidth. This estimator is the distribution function corresponding to the kernel density estimator based on the kernel $k(t) = K'(t)$ and bandwidth $h$. Note also that this estimator can be obtained by smoothing the empirical process $F_n(x)$, using the kernel $k$, in the same way we smooth the quantile process; see Section 3. Theoretical properties of $\tilde F(x)$ as an estimator of the unknown true distribution function $F(x)$ have been investigated by several authors, see for example Yamato (1973), Reiss (1981) and Falk (1983). For Edgeworth expansions, we refer to Garcia-Soidan, Gonzalez-Manteiga and Prada-Sanchez (1997). Altman and Léger (1995) and Bowman, Hall and Prvan (1998) investigated the optimal choice of bandwidth.

The optimal choice of the smoothing parameter $h$ is obtained by minimizing the mean integrated squared error, defined by
$$MISE(h, \tilde F) = E\int \{\tilde F(x) - F(x)\}^2\, W(x)\, dF(x) \tag{2.2}$$
in Altman and Léger (1995), and
$$MISE^{\dagger}(h, \tilde F) = E\int \{\tilde F(x) - F(x)\}^2\, dx \tag{2.3}$$
in Bowman, Hall and Prvan (1998). Here, $W$ is a bounded, nonnegative weight function supported on a compact set. As pointed out by Bowman, Hall and Prvan (1998), this kind of optimal choice of $h$ is asymptotic to one that produces second order optimality. More specifically, the choice of bandwidth does not affect the first order expansion of $MISE^{\dagger}(h, \tilde F)$ or $MISE(h, \tilde F)$, i.e., the $n^{-1}$ term, only if $\sqrt{n}\,h^2 \to 0$.

It is known that, under some regularity conditions, the mean integrated squared error of the kernel distribution estimator $\tilde F$ has the expansion
$$MISE(h, \tilde F) = v_1 n^{-1} - 2c_2 v_2 n^{-1} h + \frac{c_1^2 v_3}{4} h^4 + o(n^{-1}h) + o(h^4), \tag{2.4}$$
where $v_1 = \int F(x)[1 - F(x)] W(x) f(x)\, dx$, $v_2 = \int f^2(x) W(x)\, dx$, $v_3 = \int (f'(x))^2 W(x) f(x)\, dx$, $c_1 = \int x^2 k(x)\, dx$, $c_2 = \int x k(x) K(x)\, dx$ and $f(x) = F'(x)$. See, for example, Falk (1983). Obviously, kernel smoothing provides only a second order correction (i.e., deficiency): the mean integrated squared error of $F_n(x)$ has the same leading term $v_1 n^{-1}$, which is independent of the smoothing parameter $h$. The asymptotically optimal bandwidth that minimizes the second order correction induced by smoothing, $-2c_2 v_2 h n^{-1} + c_1^2 v_3 h^4/4$, is $h^*(\tilde F) = [2c_2 v_2/(c_1^2 v_3)]^{1/3} n^{-1/3}$. The optimal bandwidth $h^*(\tilde F)$ gives rise to
$$MISE(h^*(\tilde F), \tilde F) = v_1 n^{-1} - \tfrac{3}{4}(2c_2 v_2)^{4/3}(c_1^2 v_3)^{-1/3} n^{-4/3} + o(n^{-4/3}).$$
The second term in this asymptotic expression is negative. Therefore, kernel smoothing improves on the empirical distribution estimator by a second order effect.

Now we derive our estimator based on (1.1) and local linear smoothing techniques. Let $k$, called a kernel function, be a probability density and $h > 0$ be a bandwidth. For simplicity of notation, we take $k(t) = K'(t)$, with $K(t)$ defined in (2.1). Let $(\hat a, \hat b)$ be the value of $(a, b)$ that minimizes the kernel-weighted sum of squared errors
$$\sum_{j=1}^{n} \{F_n(X_j) - a - b(x - X_j)\}^2\, k\Big(\frac{x - X_j}{h}\Big).$$
Then the local linear distribution estimator is defined as $\hat a$ and has the explicit expression
$$\hat F(x) = \frac{\sum_{j=1}^{n} w_j F_n(X_j)}{\sum_{j=1}^{n} w_j}, \qquad w_j = k\Big(\frac{x - X_j}{h}\Big)\{s_{n,2} - (x - X_j)s_{n,1}\}, \quad j = 1, \ldots, n,$$
with $s_{n,l} = \sum_{j=1}^{n} k\big(\frac{x - X_j}{h}\big)(x - X_j)^l$ for $l = 1, 2$; see Fan and Gijbels (1996).
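A direct Python transcription of this formula may help fix ideas (our own sketch; ties among the $X_j$ are ignored, and the fallback to $F_n$ anticipates the device used in Section 4.1).

```python
import numpy as np

def epanechnikov(t):
    """Kernel density k(t) = (3/4)(1 - t^2) on [-1, 1]."""
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t * t), 0.0)

def local_linear_cdf(x, data, h):
    """Local linear distribution estimator F-hat(x): a weighted average of
    F_n(X_j) with weights w_j = k((x - X_j)/h) [s_{n,2} - (x - X_j) s_{n,1}]."""
    data = np.sort(data)
    n = len(data)
    Fn = np.arange(1, n + 1) / n          # F_n at the order statistics (no ties)
    d = x - data                          # x - X_j
    kv = epanechnikov(d / h)
    sn1 = np.sum(kv * d)                  # s_{n,1}
    sn2 = np.sum(kv * d * d)              # s_{n,2}
    w = kv * (sn2 - d * sn1)
    denom = np.sum(w)
    if denom == 0.0:                      # fall back to F_n, as done in Section 4.1
        return np.searchsorted(data, x, side="right") / n
    return np.sum(w * Fn) / denom
```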


Throughout this paper we assume that the kernel function $k$ is symmetric about zero and has support $[-1, 1]$. The mean integrated squared error of our local linear estimator, under regularity conditions similar to those in Altman and Léger (1995), is
$$MISE(h, \hat F) = v_1 n^{-1} - (4c_2 - c_3) v_2 n^{-1} h + \frac{c_1^2 v_3}{4} h^4 + o(n^{-1}h) + o(h^4), \tag{2.5}$$
for $C_0 n^{-1+\epsilon_0} \le h \le C_1 n^{-\epsilon_1}$, where $\epsilon_0 \in (0, 2/3]$, $\epsilon_1 \in (0, 1/3]$, $C_0$ and $C_1$ are positive constants, and $c_3 = \int x^2 k^2(x)\, dx$. The proof of (2.5) is given in the Appendix. The constant factor $4c_2 - c_3$ ($\ge 2c_2 - c_3$) in (2.5) is positive for most commonly used kernels, see Table 1. So the bandwidth which minimizes $-(4c_2 - c_3)v_2 n^{-1} h + \frac{c_1^2 v_3}{4} h^4$ is $h^*(\hat F) = [(4c_2 - c_3)v_2/(c_1^2 v_3)]^{1/3} n^{-1/3}$, and it results in
$$MISE(h^*(\hat F), \hat F) = v_1 n^{-1} - \tfrac{3}{4}[(4c_2 - c_3)v_2]^{4/3}(c_1^2 v_3)^{-1/3} n^{-4/3} + o(n^{-4/3}).$$

Notice that $MISE(h^*(\hat F), \hat F) \le MISE(h^*(\tilde F), \tilde F)$ asymptotically is equivalent to $2c_2 - c_3 \ge 0$. In addition, comparing the asymptotic expressions of $MISE(h^*(\hat F), \hat F)$ and $MISE(h^*(\tilde F), \tilde F)$, we see that $\{(2c_2 - c_3)/(2c_2)\}^{4/3}$ is the relative improvement of $\hat F$ over $\tilde F$ in terms of their second order performance. Values of $2c_2 - c_3$ and $\{(2c_2 - c_3)/(2c_2)\}^{4/3}$ are tabulated in Table 1 for some kernels that are commonly used in practice. In particular, the improvement is 58% for the Epanechnikov kernel, 62% for the Biweight kernel, and 64% for the triangular kernel. Such an improvement is particularly beneficial when the sample size is small or moderate, and is clearly seen in the simulation study of Section 4.

Table 1. Comparison of the second order terms of $MISE(h^*(\tilde F), \tilde F)$ and $MISE(h^*(\hat F), \hat F)$ for some commonly used kernels.

Kernel                                               2c_2 - c_3    {(2c_2 - c_3)/(2c_2)}^{4/3}
Epanechnikov   k(x) = (3/4)(1 - x^2) I(|x| ≤ 1)        6/35         (2/3)^{4/3} ≈ 0.5824
Biweight       k(x) = (15/16)(1 - x^2)^2 I(|x| ≤ 1)    5/33         (7/10)^{4/3} ≈ 0.6215
Triangular     k(x) = (1 - |x|) I(|x| ≤ 1)             1/6          (5/7)^{4/3} ≈ 0.6385
Uniform        k(x) = (1/2) I(|x| ≤ 1)                 1/6          (1/2)^{4/3} ≈ 0.3969

Observe that the asymptotic expressions of $MISE(h, \hat F)$ and $MISE(h, \tilde F)$ differ only by a constant multiplicative factor in the second order term. This suggests that any bandwidth rule for $\tilde F$, multiplied by a constant, can be readily used for $\hat F$. Hence there is no need to invent a new bandwidth selector in order to implement our local linear distribution estimator.
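For example, the kernel constants and the resulting bandwidth adjustment factor $\{(4c_2 - c_3)/(2c_2)\}^{1/3}$ can be computed by quadrature; the following sketch (our own) reproduces the Epanechnikov entries of Table 1.

```python
import numpy as np
from scipy.integrate import quad

def kernel_constants(k, K):
    """c1 = int x^2 k(x) dx, c2 = int x k(x) K(x) dx, c3 = int x^2 k(x)^2 dx."""
    c1 = quad(lambda x: x**2 * k(x), -1, 1)[0]
    c2 = quad(lambda x: x * k(x) * K(x), -1, 1)[0]
    c3 = quad(lambda x: x**2 * k(x)**2, -1, 1)[0]
    return c1, c2, c3

k = lambda x: 0.75 * (1 - x**2)               # Epanechnikov density on [-1, 1]
K = lambda x: 0.5 + 0.75 * (x - x**3 / 3)     # its distribution function
c1, c2, c3 = kernel_constants(k, K)
print(2 * c2 - c3)                            # 6/35, as in Table 1
print(((2 * c2 - c3) / (2 * c2)) ** (4 / 3))  # ~0.5824, the relative improvement
print(((4 * c2 - c3) / (2 * c2)) ** (1 / 3))  # multiply h*(F-tilde) by this for h*(F-hat)
```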

Remark 1. One can derive a Nadaraya-Watson type estimator of the distribution function based on the model (1.1). It can be shown that this estimator (with an optimal bandwidth) has a greater MISE than $\tilde F$ for large $n$ and any kernel function.

Remark 2. The assumption that the kernel function $k$ has support $[-1, 1]$ may be dropped through a more careful analysis.

3. Quantile Estimation

In this section we discuss estimation of quantile functions. The quantile function corresponding to the distribution function $F$ is defined as $Q(p) = \inf\{x : F(x) \ge p\}$ for $p \in (0, 1]$. The empirical quantile estimator is
$$Q_n(p) = \begin{cases} X_{n,s}, & \text{if } (s-1)/n < p \le s/n, \ s = 1, \ldots, n, \\ X_{n,1}, & \text{if } p = 0, \end{cases}$$
where $X_{n,1} \le \cdots \le X_{n,n}$ denote the order statistics of $X_1, \ldots, X_n$. The kernel quantile estimator, proposed by Parzen (1979), is given by
$$\hat Q_n^k(p) = \int_0^1 h^{-1} k\Big(\frac{s - p}{h}\Big) Q_n(s)\, ds, \tag{3.1}$$
where $k$ is a probability density function and $h > 0$ is the bandwidth. As we will see, this type of kernel quantile estimator has a slower rate of convergence when $p$ is a boundary point than when $p$ is a fixed interior point.

To estimate the quantile $Q(p)$ we utilize (1.2), which establishes a regression relation between the empirical quantile function and the true quantile function, and apply a local quadratic technique as follows. Find the values of $a$, $b$ and $c$ that minimize the weighted integral of squared errors of a quadratic approximation,
$$\int_0^1 \big\{Q_n(s) - a - b(p - s) - c(p - s)^2\big\}^2\, k\Big(\frac{p - s}{h}\Big)\, ds,$$
where $k$ is a density function and $h > 0$ is the bandwidth. Then the local quadratic quantile estimator is defined to be the value of $a$ in the above solution. It has the form
$$\hat Q(p) = \frac{(a_2 a_4 - a_3^2) A_0(p) - (a_1 a_4 - a_2 a_3) A_1(p) + (a_1 a_3 - a_2^2) A_2(p)}{a_0(a_2 a_4 - a_3^2) - a_1(a_1 a_4 - a_2 a_3) + a_2(a_1 a_3 - a_2^2)},$$
where $a_i = \int_0^1 (p - s)^i k\big(\frac{p - s}{h}\big)\, ds$, $i = 0, 1, \ldots, 6$, and $A_i(p) = \int_0^1 (p - s)^i k\big(\frac{p - s}{h}\big) Q_n(s)\, ds$, $i = 0, 1, 2$. Note that the $a_i$ are functions of $p$ and $h$; for simplicity of notation we suppress this dependence. Throughout this section we assume that $k$ is a density, symmetric about zero, with support $[-1, 1]$. In the case that $p$ is a fixed interior point in $(0, 1)$, $\hat Q_n^k(p)$ is the same as a local linear quantile estimator (see Section 3.1), which is defined as the solution in $a$ that minimizes $\int_0^1 [Q_n(s) - a - b(p - s)]^2 k\big(\frac{p - s}{h}\big)\, ds$. This is the reason why we consider a local quadratic estimator instead of a local linear estimator. Another local quadratic estimator is obtained by minimizing $\sum_{i=1}^n [Q_n(X_i) - a - b(p - X_i) - c(p - X_i)^2]^2 k\big(\frac{p - X_i}{h}\big)$; this estimator is asymptotically equivalent to $\hat Q_n^k(p)$. Asymptotic properties of the quantile estimators $\hat Q_n^k(p)$ and $\hat Q(p)$ are considered in the following two subsections.
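The closed form above can be evaluated by numerical integration; the following Python sketch (our own, with a midpoint rule on a grid of $m$ points and hypothetical function names) computes $\hat Q(p)$ from the $a_i$ and $A_i(p)$.

```python
import numpy as np

def epanechnikov(t):
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t * t), 0.0)

def empirical_quantile(s, data):
    """Q_n(s) = X_{n, ceil(ns)} for s in (0, 1], and Q_n(0) = X_{n,1}."""
    xs = np.sort(data)
    idx = np.clip(np.ceil(s * len(xs)).astype(int) - 1, 0, len(xs) - 1)
    return xs[idx]

def local_quadratic_quantile(p, data, h, m=2000):
    """Local quadratic quantile estimator: the intercept of a kernel-weighted
    quadratic fit of Q_n(s) on (p - s), via the paper's closed form."""
    s = (np.arange(m) + 0.5) / m                   # midpoint grid on (0, 1)
    d = p - s
    w = epanechnikov(d / h) / m                    # kernel weight times ds
    Qn = empirical_quantile(s, data)
    a = [np.sum(w * d**i) for i in range(5)]       # a_0, ..., a_4
    A = [np.sum(w * d**i * Qn) for i in range(3)]  # A_0(p), A_1(p), A_2(p)
    num = ((a[2] * a[4] - a[3]**2) * A[0]
           - (a[1] * a[4] - a[2] * a[3]) * A[1]
           + (a[1] * a[3] - a[2]**2) * A[2])
    den = (a[0] * (a[2] * a[4] - a[3]**2)
           - a[1] * (a[1] * a[4] - a[2] * a[3])
           + a[2] * (a[1] * a[3] - a[2]**2))
    return num / den
```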

3.1. Interior quantiles

In this subsection $p$ is a fixed interior point in $(0, 1)$. In this circumstance, for $n$ large enough and $h$ small enough, $a_0 = h$, $a_1 = a_3 = a_5 = 0$, $a_2 = h^3 \int_{-1}^{1} s^2 k(s)\, ds$, $a_4 = h^5 \int_{-1}^{1} s^4 k(s)\, ds$, and $a_6 = h^7 \int_{-1}^{1} s^6 k(s)\, ds$. Hence the local quadratic quantile estimator becomes $\hat Q(p) = \{a_4 A_0(p) - a_2 A_2(p)\}/(a_0 a_4 - a_2^2)$. Also, if we define a kernel function $k_2$ by
$$k_2(u) = \frac{h(a_4 - a_2 u^2 h^2)}{a_0 a_4 - a_2^2}\, k(u),$$
then we can write $\hat Q(p) = \int_0^1 h^{-1} k_2\big(\frac{p - s}{h}\big) Q_n(s)\, ds$, which is a kernel quantile estimator, see (3.1), with the kernel $k_2$.
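The equivalent kernel $k_2$ is easy to compute once the moments of $k$ are known, since the factors of $h$ cancel. The sketch below (our own) builds $k_2$ for the Epanechnikov kernel and checks numerically that it integrates to one and has vanishing second moment; this fourth-order property is what makes the bias in (3.3) involve $Q^{(4)}$ and enter only at order $h^4$.

```python
import numpy as np
from scipy.integrate import quad

k = lambda u: 0.75 * (1 - u**2) * (np.abs(u) <= 1)

mu2 = quad(lambda s: s**2 * k(s), -1, 1)[0]       # int s^2 k = 1/5
mu4 = quad(lambda s: s**4 * k(s), -1, 1)[0]       # int s^4 k = 3/35

def k2(u):
    """Equivalent interior kernel: the h's in h (a4 - a2 h^2 u^2) k(u) /
    (a0 a4 - a2^2) cancel, leaving (mu4 - mu2 u^2) k(u) / (mu4 - mu2^2)."""
    return (mu4 - mu2 * u**2) * k(u) / (mu4 - mu2**2)

print(quad(k2, -1, 1)[0])                         # = 1: k2 integrates to one
print(quad(lambda u: u**2 * k2(u), -1, 1)[0])     # = 0: its second moment vanishes
```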

If the second derivative of $Q$ is continuous in a neighborhood of $p$, then the asymptotic mean squared error of $\hat Q_n^k(p)$ is
$$MSE(\hat Q_n^k(p)) = n^{-1} p(1-p)[Q'(p)]^2 - 2n^{-1}h [Q'(p)]^2 \int_{-1}^{1} s k(s) K(s)\, ds + \frac{1}{4} h^4 [Q''(p)]^2 \Big(\int_{-1}^{1} s^2 k(s)\, ds\Big)^2 + o(n^{-1}h) + o(h^4), \tag{3.2}$$
where $K(u) = \int_{-1}^{u} k(s)\, ds$ (see Sheather and Marron (1990)). If the fourth derivative of $Q$ is continuous in a neighborhood of $p$ and $EX_1^2 < \infty$, then the mean squared error of our local quadratic estimator $\hat Q(p)$ has the asymptotic expression
$$MSE(\hat Q(p)) = n^{-1} p(1-p)[Q'(p)]^2 - 2n^{-1}h [Q'(p)]^2 \sigma_1^2 + \frac{1}{24^2} h^8 [Q^{(4)}(p)]^2 \Big(\int_{-1}^{1} s^4 k_2(s)\, ds\Big)^2 + o(n^{-1}h) + O(n^{-3/2}\log n) + o(h^8), \tag{3.3}$$
where $\sigma_1^2 = \int_{-1}^{1} s k_2(s) K_2(s)\, ds$ with $K_2(s) = \int_{-1}^{s} k_2(t)\, dt$. The proof of (3.3) is given in the Appendix. We remark that the condition $EX_1^2 < \infty$ may be removed by a more careful analysis, similar to that in Falk (1984), for example.


From (3.2) and (3.3) we have the following conclusions. First, $\hat Q(p)$ and $\hat Q_n^k(p)$ have the same leading term in their mean squared errors, and this leading term is independent of the smoothing. Second, the minimal asymptotic mean squared error of $\hat Q_n^k(p)$, with respect to the smoothing parameter $h$, is $n^{-1}p(1-p)[Q'(p)]^2 - C_1 n^{-4/3}$, and that of $\hat Q(p)$ is $n^{-1}p(1-p)[Q'(p)]^2 - C_2 n^{-8/7}$, where $C_1$ and $C_2$ are positive constants. Therefore, $\hat Q(p)$ has better mean squared error performance than $\hat Q_n^k(p)$.

3.2. Boundary quantiles

Throughout this subsection we assume that the distribution function $F$ has a finite left end point, i.e., $Q(0+) \in (-\infty, \infty)$. In order to investigate the boundary effect of the kernel quantile estimator and the local quadratic estimator, we take $p = lh$, where $l \in (0, 1)$. The right boundary case can be investigated similarly by taking $p = 1 - lh$, $l \in (0, 1)$.

In the boundary case, $\hat Q_n^k(p) - Q(p)\int_{-1}^{l} k(s)\, ds \to_p 0$, so $\hat Q_n^k(p)$ is no longer a consistent estimator of $Q(p)$ unless $Q(p) = 0$. Consider the modified kernel estimator
$$\bar Q_n^k(p) = \frac{\int_0^1 k\big(\frac{s - p}{h}\big) Q_n(s)\, ds}{\int_0^1 k\big(\frac{s - p}{h}\big)\, ds}.$$

Then, if $Q'$ is continuous in a neighborhood of zero and $n^{1/2}h/\log n \to \infty$ as $n \to \infty$, we may show, in a way similar to Falk (1984), that
$$h^{-1}\{\bar Q_n^k(p) - Q(p)\} \to_p Q'(0) \int_{-1}^{l} s k(s)\, ds \Big/ \int_{-1}^{l} k(s)\, ds. \tag{3.4}$$
Note that the optimal choice $h = O(n^{-1/3})$ for kernel quantile estimation, in the sense of minimizing the second order error term in (3.2), satisfies the condition $n^{1/2}h/\log n \to \infty$ (see Sheather and Marron (1990)). Hence we may conclude from (3.4) that the kernel quantile estimator does not perform as well at near-boundary points as at interior points.
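As a quick numerical illustration of (3.4) (our own sketch, reusing the `epanechnikov` and `empirical_quantile` helpers from the sketch in the previous section), the simulated bias of $\bar Q_n^k$ at $p = lh$ stays of order $h$:

```python
import numpy as np

def modified_kernel_quantile(p, data, h, m=2000):
    """Boundary-modified kernel quantile estimator Q-bar: a kernel-weighted
    average of Q_n with the weights renormalized over (0, 1)."""
    s = (np.arange(m) + 0.5) / m
    w = epanechnikov((s - p) / h)        # helper from the previous sketch
    return np.sum(w * empirical_quantile(s, data)) / np.sum(w)

rng = np.random.default_rng(0)
h, l = 0.1, 0.5
p = l * h                                # a boundary point
true_q = -np.log(1.0 - p)                # Exponential(1) quantile Q(p)
est = [modified_kernel_quantile(p, rng.exponential(size=100), h)
       for _ in range(200)]
print(np.mean(est) - true_q)             # bias of order h, as (3.4) predicts
```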

Next we study the boundary effect of the local quadratic estimator. In this case, for $p = lh$, $l \in (0, 1)$, with $n$ tending to infinity and $h$ tending to zero, we have $a_i = h^{i+1}\int_{-1}^{l} s^i k(s)\, ds$, $i = 0, 1, \ldots, 5$, and $k_2(u) = h d^{-1}\big[(a_2 a_4 - a_3^2) - (a_1 a_4 - a_2 a_3) h u + (a_1 a_3 - a_2^2) h^2 u^2\big] k(u)$, where $d = a_0(a_2 a_4 - a_3^2) - a_1(a_1 a_4 - a_2 a_3) + a_2(a_1 a_3 - a_2^2)$. If the third derivative of $Q$ is continuous in a neighborhood of zero and $EX_1^2 < \infty$, then the mean squared error of $\hat Q(p)$ has the asymptotic expression
$$MSE(\hat Q(p)) = n^{-1} h l [Q'(p)]^2 - 2 n^{-1} h [Q'(p)]^2 \sigma_2^2 + \frac{1}{36} h^6 [Q^{(3)}(p)]^2 \Big(\int_{-1}^{l} s^3 k_2(s)\, ds\Big)^2 + o(n^{-1}h) + o(h^6), \tag{3.5}$$
where $\sigma_2^2 = \int_{-1}^{l} s k_2(s) K_2(s)\, ds$ with $K_2(s) = \int_{-1}^{s} k_2(t)\, dt$. The proof of (3.5) is similar to that of (3.3). We remark again that the condition $EX_1^2 < \infty$ may be removed by a more careful analysis similar to that in Falk (1984). Note that (3.4) implies $MSE(\bar Q_n^k(p)) = O(n^{-2/3})$, which is much larger than $MSE(\hat Q(p)) = O(n^{-1})$ in this boundary case.

4. Simulation Studies

4.1. Distribution estimation

A Monte Carlo study was conducted to compare the mean integrated squared error performance of the local linear and kernel distribution estimators. A discrete approximation to $MISE(h, \tilde F)$ is $ASE(h, \tilde F) = n^{-1}\sum_{i=1}^n [\tilde F(X_i) - F(X_i)]^2 W(X_i)$.

The Epanechnikov kernel $k(x) = \frac{3}{4}(1 - x^2) I(|x| \le 1)$ was used to construct the two estimators. To compute the average squared errors of $\tilde F(x)$ and $\hat F(x)$, i.e., $ASE(h, \tilde F)$ and $ASE(h, \hat F)$, we generated 500 samples from Weibull$(\theta, \tau)$ distributions $F_{\theta,\tau}(x) = 1 - \exp(-\theta x^{\tau})$ ($x > 0$), where $\theta, \tau > 0$. The weight function was chosen as $W(x) \equiv 1$. The sample size was $n = 10$, 30, 50 or 70. Noting that $h^*(\hat F)$ differs from $h^*(\tilde F)$ only by a constant factor, the plug-in approach proposed by Altman and Léger (1995) was employed to choose the bandwidths for both estimators. In this simulation study we replaced $\hat F(x)$ by $F_n(x)$ whenever the value of the denominator $\sum_{j=1}^n w_j$ in $\hat F(x)$ was zero. Moreover, $\hat F(x)$ can easily be modified to be a distribution function, for example by defining $\hat F^*(x) = 0$ if $x \le \inf\{y : \hat F(y) > 0\}$, $\hat F^*(x) = 1$ if $x \ge \sup\{y : \hat F(y) < 1\}$, and $\hat F^*(x) = \hat F(x)$ otherwise.
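A stripped-down version of this experiment can be sketched as follows (our own code, reusing the hypothetical `kernel_cdf_estimator` and `local_linear_cdf` helpers from the earlier sketches; a fixed bandwidth stands in for the plug-in rule used in the paper).

```python
import numpy as np

def weibull_cdf(x, theta, tau):
    return 1.0 - np.exp(-theta * x**tau)

def ase(fit, data, theta, tau):
    """ASE = n^{-1} sum_i [fit(X_i) - F(X_i)]^2 with W = 1."""
    return np.mean([(fit(x) - weibull_cdf(x, theta, tau))**2 for x in data])

rng = np.random.default_rng(1)
theta, tau, n, h = 6.0, 2.0, 50, 0.3    # fixed h in place of the plug-in rule
ratios = []
for _ in range(100):
    # Weibull(theta, tau) draws by inverse transform
    data = (-np.log(1.0 - rng.uniform(size=n)) / theta) ** (1.0 / tau)
    r = (ase(lambda x: kernel_cdf_estimator(x, data, h), data, theta, tau)
         / ase(lambda x: local_linear_cdf(x, data, h), data, theta, tau))
    ratios.append(r)
print(np.mean(ratios))                  # values above 1 favor the local linear fit
```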

The ratio of the empirical mean of $ASE(h^*(\tilde F), \tilde F)$ to that of $ASE(h^*(\hat F), \hat F)$ is reported in Table 2. In Table 3 we report the average of the ratios of $ASE(h^*(\tilde F), \tilde F)$ to $ASE(h^*(\hat F), \hat F)$, with the corresponding standard errors in parentheses. The figures in Tables 2 and 3 show that our local linear estimator performs better than the kernel distribution estimator in all cases considered. In particular, Table 3 demonstrates clear gains of $\hat F$; the average ratios are all significantly greater than 1.

In Figure 1 we plot the empirical distribution $F_n(x)$, the kernel distribution estimate $\tilde F(x)$, and the local linear estimate $\hat F(x)$ based on one random sample from the Weibull(6,2) or Weibull(3,2) distribution. Observe that $\hat F(x)$ is better than $F_n(x)$ and $\tilde F(x)$ where $F(x)$ is away from zero and one. In another simulation of 500 samples of size 50 from the same two distributions, not reported here, we also observed that $\hat F(x)$ has the smallest mean squared error among the three estimators.

Table 2. Ratio of the empirical mean of $ASE(h^*(\tilde F), \tilde F)$ to the empirical mean of $ASE(h^*(\hat F), \hat F)$.

Distribution    n = 10    n = 30    n = 50    n = 70
Weibull(6,2)    1.371     1.267     1.199     1.156
Weibull(3,2)    1.032     1.092     1.090     1.088
Weibull(6,1)    1.659     1.812     1.933     1.838
Weibull(3,1)    1.133     1.390     1.434     1.392

Table 3. Average of the ratios of $ASE(h^*(\tilde F), \tilde F)$ to $ASE(h^*(\hat F), \hat F)$, with the corresponding standard error in parentheses.

Distribution    n = 10          n = 30          n = 50          n = 70
Weibull(6,2)    2.438 (0.097)   1.635 (0.037)   1.382 (0.024)   1.213 (0.015)
Weibull(3,2)    1.690 (0.064)   1.300 (0.029)   1.185 (0.020)   1.110 (0.013)
Weibull(6,1)    2.100 (0.051)   2.240 (0.058)   2.605 (0.064)   2.792 (0.088)
Weibull(3,1)    1.448 (0.036)   1.877 (0.047)   2.195 (0.079)   2.172 (0.080)

4.2. Quantile estimation

Next we report the results of a Monte Carlo study conducted to compare the performance of the local quadratic estimator $\hat Q(p)$ with the modified kernel quantile estimator $\bar Q_n^k(p)$ for a range of values of $p$. We generated 300 pseudo-random samples of size 100 from the exponential distribution with mean 1 and from the Weibull(3,2) distribution. The Epanechnikov kernel $k(x) = \frac{3}{4}(1 - x^2) I(|x| \le 1)$ was used. The MSEs of the two estimators were calculated for $p = 0.05$ and $0.10$, with values of the bandwidth ranging from 0.002 to 0.4.

Figures 2 and 3 show that the local quadratic estimator $\hat Q(p)$ with its optimal bandwidth behaves better than the modified kernel estimator $\bar Q_n^k(p)$ with its own optimal bandwidth for smaller quantiles. Notice that the optimal bandwidths that achieve the minimal mean squared errors of $\bar Q_n^k(p)$ are much smaller than those for $\hat Q(p)$; thus these $\bar Q_n^k(p)$ estimates have an appearance very similar to the empirical quantile estimate, which is not smooth. By contrast, our estimator $\hat Q(p)$ does not have this problem. More importantly, $\hat Q(p)$ is much less sensitive to the bandwidth choice than $\bar Q_n^k(p)$. This is important if we use a global bandwidth, for example $h^* = \arg\min_h \int_{\alpha}^{1-\alpha} |\hat Q(s) - Q(s)|^2\, ds$, where $\alpha \in (0, 1/2)$.


Figure 1. Distribution Function Estimation Based on One Sample. Solid line represents the true distribution, Weibull(6,2) (top panel; $\theta = 6$, $\tau = 2$, $n = 50$) or Weibull(3,2) (bottom panel; $\theta = 3$, $\tau = 2$, $n = 50$). Dashed line, dot-and-dash line, and dotted line represent the kernel distribution estimate $\tilde F(x)$, the local linear estimate $\hat F(x)$, and the empirical distribution $F_n(x)$, respectively.

Figure 2. Quantile Estimation for the Exponential(1) distribution. Panels: (a) $p = 0.05$; (b) $p = 0.10$. Solid line and broken line plot the mean squared errors, against bandwidth $h$, for the local quadratic estimator $\hat Q(p)$ and the modified kernel quantile estimator $\bar Q_n^k(p)$, respectively. Sample size $n$ is 100.

Figure 3. Quantile Estimation for the Weibull(3,2) distribution. Panels: (a) $p = 0.05$; (b) $p = 0.10$. Solid line and broken line represent the mean squared errors for the local quadratic estimator $\hat Q(p)$ and the modified kernel quantile estimator $\bar Q_n^k(p)$, respectively. Sample size $n$ is 100.

Acknowledgements

We thank the co-editor, an associate editor and a referee for their helpful comments.

Appendix

Proof of (2.5). In order to simplify the proof, we work with
$$\hat F(x) - F(x) = \frac{\sum_{j=1}^n w_j F_n(X_j)}{\sum_{j=1}^n w_j + n^{-q}} - F(x) = \frac{n^{-2}h^{-4}\sum_{j=1}^n w_j [F_n(X_j) - F(x)]}{n^{-2}h^{-4}\sum_{j=1}^n w_j + n^{-q}},$$
where $q > 0$ is some large constant. This trick was used by Fan (1993) in the analysis of the local linear regression estimator. First we have
$$(nh^3)^{-1} s_{n,1} \overset{a.s.}{\longrightarrow} -f'(x)\, c_1, \qquad (nh^3)^{-1} s_{n,2} \overset{a.s.}{\longrightarrow} f(x)\, c_1, \qquad (n^2h^4)^{-1}\sum_{i=1}^n w_i \overset{a.s.}{\longrightarrow} f^2(x)\, c_1. \tag{4.1}$$

Let $U_j = F(X_j)$, $j = 1, \ldots, n$. Put $G_n(u) = n^{-1}\sum_{i=1}^n I(U_i \le u)$ and $\alpha_n(u) = \sqrt{n}\,(G_n(u) - u)$. From Csörgő, Csörgő, Horváth and Mason (1986), there exists a sequence of Brownian bridges $B_n(u)$, $0 \le u \le 1$, $n = 1, 2, \ldots$, such that
$$\limsup_{n\to\infty} \sup_{0\le u\le 1} n^{1/4} |\alpha_n(u) - B_n(u)| \big/ \big((\log n)^{1/2} (\log\log n)^{1/4}\big) = 2^{-1/4} \quad a.s. \tag{4.2}$$

Because of the new version of $\hat F(x)$, we can treat equations (4.1) and (4.2) as holding in the whole space instead of almost surely. Note that
$$\begin{aligned}
(n^2h^4)^{-1}\sum_{j=1}^n w_j [F_n(X_j) - F(x)] &= n^{-2}h^{-4}\sum_{j=1}^n k\Big(\frac{x - X_j}{h}\Big)\big[s_{n,2} - (x - X_j)s_{n,1}\big][F_n(X_j) - F(x)] \\
&= (nh)^{-1}\sum_{j=1}^n k\Big(\frac{x - X_j}{h}\Big)\big[n^{-1}h^{-3}s_{n,2} - f(x)c_1\big][F_n(X_j) - F(x)] \\
&\quad - (nh)^{-1}\sum_{j=1}^n (x - X_j)\, k\Big(\frac{x - X_j}{h}\Big)\big[n^{-1}h^{-3}s_{n,1} + f'(x)c_1\big][F_n(X_j) - F(x)] \\
&\quad + (nh)^{-1} c_1 \sum_{j=1}^n k\Big(\frac{x - X_j}{h}\Big)\big[f(x) + (x - X_j)f'(x)\big][F_n(X_j) - F(x)] \\
&:= I_1 + I_2 + I_3.
\end{aligned}$$

Write
$$\begin{aligned}
I_3 &= h^{-1} c_1 \int_{-\infty}^{\infty} [F_n(s) - F(x)]\, k\Big(\frac{x - s}{h}\Big)\big[f(x) + (x - s)f'(x)\big]\, dF_n(s) \\
&= -(2h)^{-1} c_1 \int_{-1}^{1} k(y)\big[f(x) + hyf'(x)\big]\, d\big[F_n(x - yh) - F(x)\big]^2 \\
&= (2h)^{-1} c_1 \int_{-1}^{1} \big[F_n(x - yh) - F(x)\big]^2 \big[k'(y)f(x) + yk'(y)hf'(x) + k(y)hf'(x)\big]\, dy \\
&= (2h)^{-1} c_1 \int_{-1}^{1} \big[G_n(F(x - yh)) - F(x)\big]^2 g(y)\, dy \\
&= (2nh)^{-1} c_1 \int_{-1}^{1} \big[\alpha_n(F(x - yh)) - B_n(F(x - yh))\big]^2 g(y)\, dy \\
&\quad + (nh)^{-1} c_1 \int_{-1}^{1} \big[\alpha_n(F(x - yh)) - B_n(F(x - yh))\big]\big[B_n(F(x - yh)) + n^{1/2}(F(x - yh) - F(x))\big] g(y)\, dy \\
&\quad + (2nh)^{-1} c_1 \int_{-1}^{1} B_n^2(F(x - yh))\, g(y)\, dy \\
&\quad + (n^{1/2}h)^{-1} c_1 \int_{-1}^{1} B_n(F(x - yh))\big[F(x - yh) - F(x)\big] g(y)\, dy \\
&\quad + (2h)^{-1} c_1 \int_{-1}^{1} \big[F(x - yh) - F(x)\big]^2 g(y)\, dy \\
&= II_1 + \cdots + II_5,
\end{aligned}$$
where $g(y) = k'(y)f(x) + yk'(y)hf'(x) + k(y)hf'(x)$. By (4.2) we have
$$II_1 + II_2 = O\big(n^{-3/2}h^{-1}(\log n)^2 + n^{-5/4}h^{-1}\log n + n^{-3/4}\log n\big),$$
$E(II_3^2) + E(II_3 II_4) = O(n^{-2}h^{-1} + n^{-3/2})$ and $II_5 = 2^{-1}h^2 c_1^2 f^2(x) f'(x) + O(h^3)$. Furthermore,

$$\begin{aligned}
E(II_4^2) &= c_1^2 n^{-1} h^{-2} \int_{-1}^{1}\int_{-1}^{1} E\big[B_n(F(x - y_1 h))\, B_n(F(x - y_2 h))\big]\, [F(x - y_1 h) - F(x)][F(x - y_2 h) - F(x)]\, g(y_1) g(y_2)\, dy_1\, dy_2 \\
&= c_1^2 n^{-1} f^4(x) \big[F(x) - F^2(x)\big] + c_1^2 c_3 n^{-1} h f^5(x) - 4 c_1^2 c_2 n^{-1} h f^5(x) + O(n^{-1}h^2),
\end{aligned}$$
where the last equality follows from the moments of Brownian bridges, Taylor expansions and integration by parts. Thus we may show that

$$E\left(\frac{I_3}{n^{-2}h^{-4}\sum_{j=1}^n w_j}\right)^2 = b_n^2(x) + \sigma_n^2(x) + O\big(n^{-2}h^{-2}\, n^{-1/2}(\log n)^2\big),$$
where $b_n^2(x) = 4^{-1}(f'(x))^2 c_1^2 h^4$ and $\sigma_n^2(x) = (F(x) - F^2(x))n^{-1} - 4f(x)c_2 n^{-1}h + f(x)c_3 n^{-1}h$. The above result and the fact that $E(I_1 + I_2 + I_3)^2 = E(I_3^2)(1 + o(1))$ yield $E(\hat F(x) - F(x))^2 = \big(b_n^2(x) + \sigma_n^2(x)\big)(1 + o(1))$. Further calculations lead to (2.5).

Proof of (3.3). Let $U_i = F(X_i)$, $i = 1, 2, \ldots$. Csörgő et al. (1986) constructed a probability space carrying $U_1, U_2, \ldots$ and a sequence of Brownian bridges $B_n(s)$, $0 \le s \le 1$, $n = 1, 2, \ldots$, such that, for the quantile process $\beta_n(s) = n^{1/2}\{s - U_n(s)\}$, $0 \le s \le 1$, where
$$U_n(s) = \begin{cases} U_{n,k}, & \text{if } (k-1)/n < s \le k/n, \ k = 1, \ldots, n, \\ U_{n,1}, & \text{if } s = 0, \end{cases}$$
with $U_{n,1} \le \cdots \le U_{n,n}$ denoting the order statistics of $U_1, \ldots, U_n$, we have
$$\sup_{0\le s\le 1} n^{1/2} |\beta_n(s) - B_n(s)| = O(\log n) \quad a.s., \qquad \sup_{0\le s\le 1} |\beta_n(s)| = O(\log n) \quad a.s. \tag{4.3}$$
Since $EX_1^2 < \infty$, we can treat (4.3) as holding in the whole space instead of almost surely.

Write
$$\begin{aligned}
(a_0 a_4 - a_2^2)\big(\hat Q(p) - Q(p)\big) &= n^{-1/2}\int_0^1 \big[-a_4 + a_2(p - s)^2\big]\, k\Big(\frac{p - s}{h}\Big)\, Q'(s)\, [\beta_n(s) - B_n(s)]\big[1 + O(n^{-1/2}\log n)\big]\, ds \\
&\quad + n^{-1/2}\int_0^1 \big[-a_4 + a_2(p - s)^2\big]\, k\Big(\frac{p - s}{h}\Big)\, Q'(s)\, B_n(s)\big[1 + O(n^{-1/2}\log n)\big]\, ds \\
&\quad + \int_0^1 \big[a_4 - a_2(p - s)^2\big]\, k\Big(\frac{p - s}{h}\Big)\, [Q(s) - Q(p)]\, ds \\
&= I_1 + I_2 + I_3.
\end{aligned}$$
One can check that
$$|I_1| = O\big(h^6 n^{-1}\log n\big), \qquad I_3 = \frac{1}{24}\, Q^{(4)}(p)\,(a_0 a_4 - a_2^2)\, h^4 \int_{-1}^{1} s^4 k_2(s)\, ds + o(h^{10}),$$
and
$$E I_2^2 = n^{-1}\big[1 + O(n^{-1/2}\log n)\big]\, [Q'(p)]^2 \Big\{ (a_0 a_4 - a_2^2)^2 (p - p^2) - 2h^3 \int_{-1}^{1}\int_{-1}^{y_1} \big[a_4^2 y_1 + a_2^2 h^4 y_1^3 y_2^2 - a_2 a_4 h^2 (y_1 y_2^2 + y_1^3)\big] k(y_1) k(y_2)\, dy_2\, dy_1 + o(h^{13}) \Big\}.$$
Hence (3.3) follows from the above results and the identity
$$2(a_0 a_4 - a_2^2)^2 \sigma_1^2 = 2h^2 \int_{-1}^{1}\int_{-1}^{y_1} \big(a_4 y_1 - a_2 h^2 y_1^3\big)\big(a_4 - a_2 h^2 y_2^2\big) k(y_1) k(y_2)\, dy_2\, dy_1.$$
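As a sanity check, this identity can be verified numerically; the sketch below is our own, taking $h = 1$ and the Epanechnikov kernel, so that $a_0 = 1$, $a_2 = \mu_2$ and $a_4 = \mu_4$.

```python
import numpy as np
from scipy.integrate import quad, dblquad

k = lambda u: 0.75 * (1 - u**2)                 # Epanechnikov on [-1, 1]
mu2 = quad(lambda s: s**2 * k(s), -1, 1)[0]     # a_2 when h = 1
mu4 = quad(lambda s: s**4 * k(s), -1, 1)[0]     # a_4 when h = 1

k2 = lambda u: (mu4 - mu2 * u**2) * k(u) / (mu4 - mu2**2)
K2 = lambda s: quad(k2, -1, s)[0]
sigma1_sq = quad(lambda s: s * k2(s) * K2(s), -1, 1)[0]

lhs = 2 * (mu4 - mu2**2)**2 * sigma1_sq
rhs = 2 * dblquad(                              # integrate over -1 <= y2 <= y1 <= 1
    lambda y2, y1: (mu4 * y1 - mu2 * y1**3) * (mu4 - mu2 * y2**2) * k(y1) * k(y2),
    -1, 1, lambda y1: -1.0, lambda y1: y1)[0]
print(lhs, rhs)                                 # the two sides agree
```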

References

Altman, N. and Léger, C. (1995). Bandwidth selection for kernel distribution function estimation. J. Statist. Plann. Inference 46, 195-214.

Bowman, A., Hall, P. and Prvan, T. (1998). Bandwidth selection for the smoothing of distribution functions. Biometrika 85, 799-808.

Cheng, M.-Y., Fan, J. and Marron, J. S. (1997). On automatic boundary corrections. Ann. Statist. 25, 1691-1708.

Cheng, C. and Parzen, E. (1997). Unified estimators of smooth quantile and quantile density functions. J. Statist. Plann. Inference 59, 291-307.

Csörgő, M., Csörgő, S., Horváth, L. and Mason, D. M. (1986). Weighted empirical and quantile processes. Ann. Probab. 14, 31-85.

Falk, M. (1983). Relative efficiency and deficiency of kernel type estimators of smooth distribution functions. Statist. Neerlandica 37, 73-83.

Falk, M. (1984). Relative deficiency of kernel type estimators of quantiles. Ann. Statist. 12, 261-268.

Falk, M. (1985). Asymptotic normality of kernel type estimators of quantiles. Ann. Statist. 13, 428-433.

Fan, J. (1992). Design-adaptive nonparametric regression. J. Amer. Statist. Assoc. 87, 998-1004.

Fan, J. (1993). Local linear regression smoothers and their minimax efficiencies. Ann. Statist. 21, 196-216.

Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman and Hall, London.

Garcia-Soidan, P. H., Gonzalez-Manteiga, W. and Prada-Sanchez, J. M. (1997). Edgeworth expansions for nonparametric distribution estimation with applications. J. Statist. Plann. Inference 65, 213-231.

Lejeune, M. and Sarda, P. (1992). Smooth estimators of distribution and density functions. Comput. Statist. Data Anal. 14, 457-471.

Mammen, E., Marron, J. S., Turlach, B. A. and Wand, M. P. (2001). A general projection framework for constrained smoothing. Statist. Sci. 16, 232-248.

Nadaraya, E. A. (1964a). Some new estimators for distribution functions. Theory Probab. Appl. 9, 497-500.

Nadaraya, E. A. (1964b). On estimating regression. Theory Probab. Appl. 9, 141-142.

Parzen, E. (1979). Nonparametric statistical data modeling. J. Amer. Statist. Assoc. 74, 105-131.

Reiss, R.-D. (1981). Nonparametric estimation of smooth distribution functions. Scand. J. Statist. 8, 116-119.

Sheather, S. J. and Marron, J. S. (1990). Kernel quantile estimators. J. Amer. Statist. Assoc. 85, 410-416.

Wei, C. Z. and Chu, C. K. (1994). A regression point of view toward density estimation. J. Nonparametr. Statist. 4, 191-201.

Yamato, H. (1973). Uniform convergence of an estimator of a distribution function. Bull. Math. Statist. 15, 69-78.

Yang, S. S. (1985). A smooth nonparametric estimator of a quantile function. J. Amer. Statist. Assoc. 80, 1004-1011.

Zelterman, D. (1990). Smooth nonparametric estimation of the quantile function. J. Statist. Plann. Inference 26, 339-352.

Zou, K. H., Hall, W. J. and Shapiro, D. E. (1997). Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests. Statist. Medicine 16, 2143-2156.

Department of Mathematics, National Taiwan University, Taipei 106, Taiwan. E-mail: cheng@math.ntu.edu.tw

School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332-0160, U.S.A. E-mail: peng@math.gatech.edu
