Bandwidth Selection for Kernel Quantile Estimation

Ming-Yen Cheng¹ and Shan Sun²

Abstract

In this article, we summarize some quantile estimators and related bandwidth selection methods, and give two new bandwidth selection methods. Using four distributions (standard normal, exponential, double exponential and lognormal), we simulated the methods and compared their efficiencies to that of the empirical quantile. It turns out that kernel smoothed quantile estimators, regardless of which bandwidth selection method is used, are more efficient than the empirical quantile estimator in most situations, and especially so when the sample size is relatively small. However, no single method beats all the others for all distributions.

Keywords. Bandwidth, kernel, quantile, nonparametric smoothing.

Short Title. Quantile Estimation.

JEL subject classification: C14, C13.

¹ Department of Mathematics, National Taiwan University, Taipei 106, Taiwan. Email: cheng@math.ntu.edu.tw

² Department of Mathematics and Statistics, Texas Tech University, Lubbock, Texas 79409-1042,

1 Introduction

The estimation of population quantiles is of great interest when a parametric form for the underlying distribution is not available. In addition, quantiles often arise as the natural quantities to estimate when the underlying distribution is skewed. Let $X_1, X_2, \ldots, X_n$ be an independent and identically distributed random sample drawn from an absolutely continuous distribution function F with density f. Let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ denote the corresponding order statistics. The quantile function Q of the population is defined as $Q(p) = \inf\{x : F(x) \ge p\}$, $0 < p < 1$. Note that Q is the left-continuous inverse of F. Denote, for each $0 < p < 1$, the pth quantile of F by $\xi_p$, that is, $\xi_p = Q(p)$.

A traditional nonparametric estimator of the distribution function is the empirical distribution function $F_n(x)$, defined as

$$F_n(x) = \frac{1}{n}\sum_{i=1}^n I_{(-\infty,x]}(X_i),$$

where $I_A(x) = 1$ if $x \in A$ and 0 otherwise. Accordingly, a nonparametric estimator of $\xi_p$ is the empirical quantile

$$Q_n(p) = \inf\{x : F_n(x) \ge p\} = X_{([np]+1)},$$

where $[np]$ denotes the integer part of $np$. Let $p_r = r/(n+1)$ and $q_r = 1 - p_r$. If we use $X_{(r)}$ to estimate the $p_r$th quantile, then the asymptotic bias and variance are

$$ABias\{X_{(r)}\} = \frac{p_rq_rQ''(p_r)}{2(n+2)} + \frac{p_rq_r}{(n+2)^2}\left\{\frac{1}{3}(q_r - p_r)Q'''_r + \frac{1}{8}Q''''_r\right\},$$

$$AVar\{X_{(r)}\} = \frac{p_rq_r}{n+2}Q'^2_r + \frac{p_rq_r}{(n+2)^2}\left\{2(q_r - p_r)Q'_rQ''_r + p_rq_r\Big(Q'_rQ'''_r + \frac{1}{2}Q''^2_r\Big)\right\}.$$

The asymptotic mean squared error of $X_{(r)}$ is then $AMSE\{X_{(r)}\} = ABias\{X_{(r)}\}^2 + AVar\{X_{(r)}\}$.
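For concreteness, the empirical quantile $Q_n(p) = X_{([np]+1)}$ defined above can be computed directly from the sorted sample. A minimal Python sketch (the function name is ours, not from the paper):

```python
import numpy as np

def empirical_quantile(data, p):
    """Q_n(p) = X_([np]+1): the ([np]+1)-th order statistic, 0 < p < 1."""
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    return x[int(np.floor(n * p))]  # 0-based index [np] picks X_([np]+1)
```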

When F is continuous, it is more natural to use a smooth random function as an estimator of F, since there is a substantial lack of efficiency caused by the variability of individual order statistics. Indeed, the choice of $F_n$ does not always lead to the best estimator; for instance, Read (1972) showed that $F_n$ is asymptotically inadmissible (with respect to the integrated square loss). Intuitively appealing and easily understood competitors to $Q_n$ are the popular kernel quantile estimators; see Section 2.

Section 2 gives the asymptotic mean squared errors and asymptotically optimal bandwidths for two kernel smoothed quantile estimators. The optimal bandwidths depend on unknown quantities such as density derivatives and quantile derivatives. Kernel estimators and optimal bandwidths for these unknowns are addressed in Section 3. In Section 4, we give four data-based methods to select the bandwidths for the two kernel quantile estimators. In Section 5 we implement these methods on four specific distributions and report the results of the simulation. The Appendix gives some proofs.

2 Kernel smoothed quantile estimation

2.1 Inverse of the kernel distribution function estimator

A popular kernel quantile estimator is based on the Nadaraya (1964) type estimator of F, defined as

$$\hat F_n(x) = \frac{1}{n}\sum_{i=1}^n K_h(x - X_i), \qquad K_h(x) = \int_{-\infty}^x \frac{1}{h}k\Big(\frac{t}{h}\Big)\,dt,$$

where k is a kernel function satisfying $k \ge 0$ and $\int_{-\infty}^{\infty}k(x)\,dx = 1$. Here $h = h_n > 0$ is called the smoothing parameter or bandwidth, since it controls the amount of smoothness in the estimator for a given sample of size n. We assume that $h \to 0$ as $n \to \infty$. The corresponding estimator of the quantile function $Q = F^{-1}$ is then defined by

$$\hat Q_n(p) = \inf\{x : \hat F_n(x) \ge p\}, \quad 0 < p < 1. \tag{1}$$
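As an illustration, with the Gaussian kernel the estimator (1) can be evaluated by numerically inverting $\hat F_n$. The following is a minimal sketch (function names and the bracketing interval are our choices; root-finding applies because $\hat F_n$ is continuous and nondecreasing):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def F_hat(x, data, h):
    """Kernel CDF estimator: mean of K((x - X_i)/h), Gaussian K = Phi."""
    return norm.cdf((x - data) / h).mean()

def Q_hat(data, p, h):
    """\\hat Q_n(p) = inf{x : \\hat F_n(x) >= p}, found by root-finding."""
    data = np.asarray(data, dtype=float)
    lo, hi = data.min() - 6 * h, data.max() + 6 * h  # F_hat ~ 0 and ~ 1 here
    return brentq(lambda x: F_hat(x, data, h) - p, lo, hi)
```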

Nadaraya (1964) showed that, under some assumptions on k, f and h, $\hat Q_n(p)$ (appropriately normalized) has an asymptotic standard normal distribution. Another notable property of $\hat Q_n(p)$, namely almost sure consistency, was obtained by Yamato (1973). Ralescu and Sun (1993) obtained necessary and sufficient conditions for the asymptotic normality of $\hat Q_n(p)$. Azzalini (1981) and an unpublished report used heuristic arguments based on second order approximations and performed some numerical comparisons of $\hat Q_n(p)$ with the classical sample quantile for estimating the 95th percentile of the Gamma(1) distribution. These studies provided a considerable amount of empirical evidence to support the superiority of $\hat Q_n(p)$ for a variety of smooth distribution functions.

Azzalini (1981) considered second order properties of $\hat F_n$ under the following assumptions: (i) $h \to 0$ as $n \to \infty$; (ii) the kernel has finite support, that is, $k(t) = 0$ if $|t| > t_0$ for some positive $t_0$; (iii) the density f is continuous in the interval $(x - t_0h, x + t_0h)$; and (iv) $f'(x)$ exists. He pointed out that the asymptotically optimal bandwidth for $\hat F$ is of the form

$$h_{opt} = \Big(\frac{u}{4vn}\Big)^{1/3} \tag{2}$$

where

$$u = f(x)\Big\{t_0 - \int_{-t_0}^{t_0}K^2(t)\,dt\Big\}, \qquad v = \Big\{\frac{1}{2}f'(x)\int_{-t_0}^{t_0}t^2k(t)\,dt\Big\}^2.$$

Also, Azzalini (1981) suggested, without offering a proof, that (2) is again the asymptotically optimal choice of h for $\hat Q_n(p)$. We state the result in the following theorem; its proof can be found in Shankar (1998). We make the following assumptions.

Assumption A

(1) f is differentiable with a bounded derivative f′;

(2) f′ is continuous in a neighborhood of $\xi_p$ and $f'(\xi_p) \ne 0$;

(3) $\int_{-\infty}^{\infty}xk(x)\,dx = 0$ and $\int_{-\infty}^{\infty}x^2k(x)\,dx < \infty$.

Theorem 1. Under assumptions (1)-(3), the asymptotic mean squared error of $\hat Q_n(p)$ is

$$AMSE\{\hat Q_n(p)\} = \frac{p(1-p)}{nf(\xi_p)^2} + \frac{h^4}{4}\,\frac{f'(\xi_p)^2}{f(\xi_p)^2}\,\mu_2(k)^2 - \frac{h}{n}\,\frac{1}{f(\xi_p)}\,\psi(k),$$

and the asymptotically optimal choice of bandwidth for the smoothed empirical quantile function $\hat Q_n(p)$ is

$$h_{opt,1} = \left[\frac{f(\xi_p)\psi(k)}{n\{f'(\xi_p)\}^2\mu_2(k)^2}\right]^{1/3}, \tag{3}$$

where $\mu_2(k) = \int_{-\infty}^{\infty}t^2k(t)\,dt$ and $\psi(k) = 2\int yk(y)K(y)\,dy$. If we take k as the standard normal density, then $\psi(k) = 1/\sqrt{\pi}$, $\mu_2(k) = 1$ and

$$h_{opt,1} = \left[\frac{f(\xi_p)}{\sqrt{\pi}\,n\{f'(\xi_p)\}^2}\right]^{1/3}.$$

2.2 Kernel smoothing the order statistics

Another type of smooth quantile estimator, provided by Yang (1985) and also traced to Parzen (1979), is

$$\tilde Q_n(p) = \sum_{i=1}^n X_{(i)}\int_{(i-1)/n}^{i/n}\frac{1}{h}k\Big(\frac{p-x}{h}\Big)\,dx. \tag{4}$$

It is clear that when $i/n$ is close to p, $\tilde Q_n(p)$ puts more weight on the order statistic $X_{(i)}$. The asymptotic normality and mean squared consistency of $\tilde Q_n(p)$ were provided by Yang (1985), while Falk (1984) showed that the asymptotic performance of $\tilde Q_n(p)$ is better than that of the empirical sample quantile $Q_n(p)$, in the sense of relative deficiency, for appropriately chosen kernels and sufficiently smooth distribution functions.
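With the Gaussian kernel, the weights in (4) integrate in closed form, since $\int_{(i-1)/n}^{i/n}h^{-1}k((p-x)/h)\,dx = \Phi((p-(i-1)/n)/h) - \Phi((p-i/n)/h)$. A minimal sketch under that choice (our own naming):

```python
import numpy as np
from scipy.stats import norm

def Q_tilde(data, p, h):
    """Yang's kernel quantile estimator (4) with the Gaussian kernel."""
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    # weight of X_(i): integral of k_h(p - u) over ((i-1)/n, i/n]
    w = norm.cdf((p - (i - 1) / n) / h) - norm.cdf((p - i / n) / h)
    return float(np.sum(w * x))
```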

Building on Falk (1984), Sheather and Marron (1990) gave the asymptotic mean squared error (AMSE) of $\tilde Q_n(p)$ as follows, when f is not symmetric, or f is symmetric but $p \ne 0.5$:

$$AMSE\{\tilde Q_n(p)\} = \frac{p(1-p)}{n}q(p)^2 + \frac{1}{4}h^4q'(p)^2\mu_2(k)^2 - \frac{h}{n}q(p)^2\psi(k), \tag{5}$$

where $q = Q'$ and $q' = Q''$. If $q = Q' > 0$, then

$$h_{opt,2} = \left[\frac{Q'(p)^2\psi(k)}{nQ''(p)^2\mu_2(k)^2}\right]^{1/3}. \tag{6}$$

Remark 2.1. When F is symmetric and $p = 0.5$, then

$$AMSE\{\tilde Q_n(p)\} = n^{-1}[q(0.5)]^2\{0.25 - 0.5\,h\,\psi(k) + n^{-1}h^{-1}R(k)\},$$

where $R(k) = \int k^2(x)\,dx$. In this case, there is no single optimal bandwidth minimizing the AMSE.

Remark 2.2. If $q = 0$, we need higher order terms. The AMSE of $\tilde Q_n(p)$ can be shown to be

$$AMSE\{\tilde Q_n(p)\} = \Big(\frac{1}{4} - \frac{1}{n}\Big)h^4Q''(q)^2\mu_2^2(k) + 2n^{-1}h^2Q''(q)^2\int(q - ht)\,t\,k(t)\,j(t)\,dt,$$

where $j(t) = \int_{-\infty}^{t}xk(x)\,dx$. The proof is provided in the Appendix.

3 Density and quantile derivative estimation

The asymptotically optimal bandwidths $h_{opt,1}$ and $h_{opt,2}$ for $\hat Q_n(p)$ and $\tilde Q_n(p)$ depend on $f(\xi_p)$, $f'(\xi_p)$, $Q'(p)$ and $Q''(p)$. This section provides nonparametric estimators of these quantities and the corresponding asymptotically optimal bandwidths.

3.1 Density derivative estimation

From (3) we know that we need to estimate f′. A natural estimator of the rth derivative ($r \ge 1$) of f can be obtained by differentiating the estimator

$$\hat f_{g_n}(x) = \frac{d}{dx}\hat F_n(x) = \frac{d}{dx}\Big\{\frac{1}{n}\sum_{i=1}^nK_{g_n}(x - X_i)\Big\} = \frac{1}{n}\sum_{i=1}^nk_{g_n}(x - X_i) \tag{7}$$

of the density f(x), giving

$$\hat f^{(r)}_{g_n}(x) = \frac{d^r}{dx^r}\,\frac{1}{ng_n}\sum_{i=1}^nk\Big(\frac{x - X_i}{g_n}\Big) = \frac{1}{ng_n^{r+1}}\sum_{i=1}^nk^{(r)}\Big(\frac{x - X_i}{g_n}\Big), \tag{8}$$

where $g_n$ is the smoothing parameter (Wand and Jones, 1995). The asymptotic mean squared error of $\hat f^{(r)}_{g_n}(x)$ can then be derived straightforwardly (Wand and Jones, 1995):

$$AMSE\{\hat f^{(r)}_{g_n}(x)\} = \frac{1}{ng_n^{2r+1}}R(k^{(r)})f(x) + \frac{1}{4}g_n^4\{\mu_2(k)\}^2f^{(r+2)}(x)^2, \tag{9}$$

where $R(\eta) = \int\eta^2(x)\,dx$ for any square-integrable function η. It follows that the AMSE-optimal bandwidth for estimating $f^{(r)}(x)$ is of order $n^{-1/(2r+5)}$. The asymptotically optimal bandwidth for $\hat f_{g_n}(x)$ is

$$g_n^* = \left[\frac{R(k)f(x)}{n\{\mu_2(k)\}^2f''(x)^2}\right]^{1/5} \tag{10}$$

and the asymptotically optimal bandwidth for $\hat f'_{g_n}(x)$ is

$$g_n^{**} = \left[\frac{3R(k')f(x)}{n\{\mu_2(k)\}^2f'''(x)^2}\right]^{1/7}. \tag{11}$$

When k is the standard normal density,

$$g_n^* = \left[\frac{f(x)}{n\sqrt{\pi}f''(x)^2}\right]^{1/5}, \qquad g_n^{**} = \left[\frac{3f(x)}{4n\sqrt{\pi}f'''(x)^2}\right]^{1/7}.$$
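For reference, here is a minimal sketch of the Gaussian-kernel estimators (7) and (8) for r = 1, using $k'(u) = -u\,k(u)$ (function names are ours):

```python
import numpy as np
from scipy.stats import norm

def f_hat(x, data, g):
    """Kernel density estimate (7) at a point x, bandwidth g."""
    u = (x - np.asarray(data, dtype=float)) / g
    return norm.pdf(u).mean() / g

def f_hat_prime(x, data, g):
    """Kernel estimate (8) of f'(x) (r = 1); Gaussian k'(u) = -u k(u)."""
    u = (x - np.asarray(data, dtype=float)) / g
    return (-u * norm.pdf(u)).mean() / g**2
```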

3.2 Quantile derivative estimation

Next, we estimate $Q' = q$ and $Q'' = q'$ in the following ways. From (4), an estimator of $Q' = q$ can be constructed as

$$\tilde q(p) = \tilde Q'_n(p) = \sum_{i=1}^nX_{(i)}\Big[k_a\Big(p - \tfrac{i-1}{n}\Big) - k_a\Big(p - \tfrac{i}{n}\Big)\Big] = \sum_{i=2}^n(X_{(i)} - X_{(i-1)})k_a\Big(p - \tfrac{i-1}{n}\Big) - X_{(n)}k_a(p - 1) + X_{(1)}k_a(p), \tag{12}$$

where $k_a(x) = \frac{1}{a}k(\frac{x}{a})$ and $a = a_n$ is the bandwidth for $\tilde q$. Jones (1992) derived the asymptotic MSE of $\tilde q(p)$:

$$AMSE\{\tilde q(p)\} = \frac{a^4}{4}q''(p)^2\mu_2(k)^2 + \frac{1}{na}q^2(p)\int k^2(y)\,dy. \tag{13}$$

Minimizing (13) with respect to a, we obtain the asymptotically optimal bandwidth for $\tilde q(p)$:

$$a^*_{opt} = \left[\frac{Q'(p)^2\int k^2(y)\,dy}{nQ'''(p)^2\mu_2(k)^2}\right]^{1/5}. \tag{14}$$

To estimate $Q'' = q'$ in (6), note that

$$\tilde Q''_n(p) = \frac{d}{dp}\tilde Q'_n(p) = \frac{1}{a^2}\sum_{i=1}^nX_{(i)}\Big\{k'\Big(\frac{p - \frac{i-1}{n}}{a}\Big) - k'\Big(\frac{p - \frac{i}{n}}{a}\Big)\Big\}. \tag{15}$$

Similarly, we obtain the asymptotically optimal bandwidth for $\tilde Q''_n(p)$:

$$a^{**}_{opt} = \left[\frac{3\int k'(x)^2\,dx\;Q'(p)^2}{n\mu_2(k)^2Q^{(4)}(p)^2}\right]^{1/7}. \tag{16}$$
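A minimal sketch of (12) and (15) with the Gaussian kernel (our own naming; the first form of (12) is used directly):

```python
import numpy as np
from scipy.stats import norm

def Q_tilde_prime(data, p, a):
    """\\tilde Q'_n(p) from (12), with k_a(u) = k(u/a)/a."""
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    k_a = lambda u: norm.pdf(u / a) / a
    return float(np.sum(x * (k_a(p - (i - 1) / n) - k_a(p - i / n))))

def Q_tilde_2prime(data, p, a):
    """\\tilde Q''_n(p) from (15); Gaussian k'(u) = -u k(u)."""
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    kp = lambda u: -u * norm.pdf(u)
    s = np.sum(x * (kp((p - (i - 1) / n) / a) - kp((p - i / n) / a)))
    return float(s) / a**2
```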

4 Bandwidth selection

In this section, we consider several data-based methods to find the asymptotically optimal bandwidths for the estimators $\hat Q_n(p)$ and $\tilde Q_n(p)$. The bandwidth plays a critical role in practical estimation: it determines the trade-off between the amount of smoothing and the closeness of the estimate to the true distribution (see Wand and Jones, 1995).

4.1 Method 1. Approximate $h_{opt,1}$ for $\hat Q_n(p)$ using density derivative estimators

Note that the asymptotically optimal bandwidth $h_{opt,1}$ for $\hat Q_n(p)$, given in (3), involves $f(\xi_p)$ and $f'(\xi_p)$, which can be estimated by $\hat f_{g_n}(\hat\xi_p)$ and $\hat f'_{g_n}(\hat\xi_p)$ respectively. Here, $\hat\xi_p$ is the empirical pth quantile $Q_n(p)$. Using $g_n^*$ in (10) with $f(\hat\xi_p)$ and $f''(\hat\xi_p)$ replaced by their Normal(μ, σ²) reference values, we obtain $\hat f_{g_n^*}(x)$. Using $g_n^{**}$ in (11) with $f(\hat\xi_p)$ and $f'''(\hat\xi_p)$ replaced by their Normal(μ, σ²) reference values, we obtain $\hat f'_{g_n^{**}}(x)$. Plugging these into (3), we have a data-based bandwidth

$$\hat h_{opt,1} = \left[\frac{\hat f_{g_n^*}(\hat\xi_p)\,\psi(k)}{n\{\hat f'_{g_n^{**}}(\hat\xi_p)\}^2\mu_2(k)^2}\right]^{1/3} \tag{17}$$

for $\hat Q_n(p)$. If k is the standard normal density, then

$$\hat h_{opt,1} = \left[\frac{\hat f_{g_n^*}(\hat\xi_p)}{n\sqrt{\pi}\{\hat f'_{g_n^{**}}(\hat\xi_p)\}^2}\right]^{1/3}. \tag{18}$$

Remark 4.1. The expression for $h_{opt,1}$ has the derivative of f in the denominator. If f′ has zeros, then its estimates near those zeros are also very small, and hence the estimator $\hat h_{opt,1}$ of $h_{opt,1}$ is very unstable there. For example, if f is standard normal, then $f' = -xf$ has a zero at $x = 0$, which corresponds to $p = 0.5$; hence, when $p = 0.5$, the estimator $\hat h_{opt,1}$ is very unstable. Similarly, the first derivative of the double exponential density has a zero at $x = 0$, and the first derivative of the lognormal density has a zero at $x = e^{-1}$.
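Putting the pieces together, one possible implementation of Method 1 is sketched below. The normal-reference step (fitting μ and σ by the sample mean and standard deviation, then evaluating f, f″, f‴ of the fitted Normal(μ, σ²) at $\hat\xi_p$) is our reading of the text, not a verbatim algorithm from the paper; the pilot bandwidths use the standard normal kernel forms of (10)-(11).

```python
import numpy as np
from scipy.stats import norm

def h_hat_opt1(data, p):
    """Data-based bandwidth (18) via normal-reference pilot bandwidths.

    Sketch: mu and sigma are fitted by sample moments (our assumption);
    f, f'', f''' of the fitted Normal(mu, sigma^2) at the empirical
    quantile feed the pilots g*_n and g**_n.
    """
    data = np.asarray(data, dtype=float)
    n = len(data)
    xi = np.sort(data)[int(np.floor(n * p))]           # empirical p-th quantile
    mu, sigma = data.mean(), data.std(ddof=1)
    z = (xi - mu) / sigma
    phi = norm.pdf(z)
    f0 = phi / sigma                                    # reference f(xi)
    f2 = (z**2 - 1) * phi / sigma**3                    # reference f''(xi)
    f3 = (3 * z - z**3) * phi / sigma**4                # reference f'''(xi)
    # note: unstable where the reference f'' or f''' vanishes (cf. Remark 4.1)
    g1 = (f0 / (n * np.sqrt(np.pi) * f2**2)) ** (1 / 5)          # g*_n, (10)
    g2 = (3 * f0 / (4 * n * np.sqrt(np.pi) * f3**2)) ** (1 / 7)  # g**_n, (11)
    u1 = (xi - data) / g1
    fhat = norm.pdf(u1).mean() / g1                     # \hat f_{g*}(xi)
    u2 = (xi - data) / g2
    fphat = (-u2 * norm.pdf(u2)).mean() / g2**2         # \hat f'_{g**}(xi)
    return (fhat / (n * np.sqrt(np.pi) * fphat**2)) ** (1 / 3)   # (18)
```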

4.2 Method 2. Approximate $h_{opt,2}$ for $\tilde Q_n(p)$ using quantile derivative estimators

The asymptotically optimal bandwidth $h_{opt,2}$, given in (6), for $\tilde Q_n(p)$ involves the unknown quantities $Q'(p)$ and $Q''(p)$, which can be estimated by $\tilde Q'_n(p)$ and $\tilde Q''_n(p)$ in (12) and (15), respectively. The asymptotically optimal bandwidths $a^*_{opt}$ and $a^{**}_{opt}$, given in (14) and (16), for $\tilde Q'_n(p)$ and $\tilde Q''_n(p)$ depend on $Q'(p)$, $Q'''(p)$ and $Q^{(4)}(p)$. We replace these unknowns by their Normal(μ, σ²) reference values. Then, using $\tilde Q'_n(p)$ with $a = a^*_{opt}$ and $\tilde Q''_n(p)$ with $a = a^{**}_{opt}$, we have the data-based bandwidth

$$\hat h_{opt,2} = \left\{\frac{\tilde Q'_n(p)^2\,\psi(k)}{n\tilde Q''_n(p)^2\mu_2(k)^2}\right\}^{1/3} \tag{19}$$

for $\tilde Q_n(p)$.

4.3 Method 3. Approximate $h_{opt,1}$ for $\hat Q_n(p)$ using quantile derivative estimators

We introduce an alternative way of estimating $f(\xi_p)$ and $f'(\xi_p)$ in $h_{opt,1}$ (see (3)), which uses estimators of the quantile derivatives. Note that

$$Q'(p) = \frac{1}{f(F^{-1}(p))} = \frac{1}{f(Q(p))} = \frac{1}{f(\xi_p)}, \tag{20}$$

$$Q''(p) = -\frac{f'(Q(p))}{f^3(Q(p))} = -\frac{f'(\xi_p)}{f^3(\xi_p)}. \tag{21}$$

Hence, (3) becomes

$$h_{opt,1} = \left[\frac{Q'(p)^5\psi(k)}{nQ''(p)^2\mu_2(k)^2}\right]^{1/3}.$$

Similar to Method 2, we first replace the unknowns in $a^*_{opt}$ and $a^{**}_{opt}$ by their Normal reference values, and then use $\tilde Q'_n(p)$ with $a = a^*_{opt}$ and $\tilde Q''_n(p)$ with $a = a^{**}_{opt}$ to get

$$\bar h_{opt,1} = \left\{\frac{\tilde Q'_n(p)^5\,\psi(k)}{n\tilde Q''_n(p)^2\mu_2(k)^2}\right\}^{1/3}. \tag{22}$$

If we take k as the standard normal density, then

$$\bar h_{opt,1} = \left\{\frac{\tilde Q'_n(p)^5}{n\sqrt{\pi}\,\tilde Q''_n(p)^2}\right\}^{1/3}.$$
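Method 3 differs from the Method 2 sketch only in the exponent on $\tilde Q'_n(p)$; a one-line variant (again reusing the Section 3.2 functions):

```python
import numpy as np

def h_bar_opt1(data, p, a1, a2):
    """Method 3 bandwidth (22): fifth power of Q~'_n(p) replaces the square."""
    n = len(data)
    q1 = Q_tilde_prime(data, p, a1)
    q2 = Q_tilde_2prime(data, p, a2)
    return (q1**5 / (n * np.sqrt(np.pi) * q2**2)) ** (1 / 3)
```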

4.4 Method 4. Approximate $h_{opt,2}$ for $\tilde Q_n(p)$ using density derivative estimators

From (20) and (21), we have

$$h_{opt,2} = \left[\frac{f(\xi_p)^4\psi(k)}{nf'(\xi_p)^2\mu_2(k)^2}\right]^{1/3}. \tag{23}$$

Then, plugging in the estimators of $f(\xi_p)$ and $f'(\xi_p)$ from Method 1 (see (17)), we obtain

$$\bar h_{opt,2} = \left\{\frac{\hat f_{g_n^*}(\hat\xi_p)^4\,\psi(k)}{n\{\hat f'_{g_n^{**}}(\hat\xi_p)\}^2\mu_2(k)^2}\right\}^{1/3}. \tag{24}$$

When k is the standard normal density, $\bar h_{opt,2}$ becomes

$$\bar h_{opt,2} = \left\{\frac{\hat f_{g_n^*}(\hat\xi_p)^4}{n\sqrt{\pi}\{\hat f'_{g_n^{**}}(\hat\xi_p)\}^2}\right\}^{1/3}.$$
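Similarly, Method 4 reuses the density-based plug-ins of Method 1 with the fourth power from (23). A sketch, assuming the quantities fhat and fphat are computed exactly as in the Method 1 sketch:

```python
import numpy as np

def h_bar_opt2(fhat, fphat, n):
    """Method 4 bandwidth (24) from precomputed \\hat f_{g*} and \\hat f'_{g**}."""
    return (fhat**4 / (n * np.sqrt(np.pi) * fphat**2)) ** (1 / 3)
```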

5 Numerical Performance

We implement the methods in Section 4. Four distributions are selected: exponential, double exponential, lognormal, and standard normal. We use the standard normal density as the kernel k, i.e. $k(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}$. Then $k'(x) = -xk(x)$, and we can find

$$\mu_2(k) = \int x^2k(x)\,dx = 1, \qquad \psi(k) = 2\int xk(x)K(x)\,dx = \frac{1}{\sqrt{\pi}},$$

$$R(k) = \int k^2(x)\,dx = \frac{1}{2\sqrt{\pi}}, \qquad R(k') = \int\{k'(x)\}^2\,dx = \int x^2k^2(x)\,dx = \frac{1}{4\sqrt{\pi}}.$$
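These constants are easy to verify numerically; a quick quadrature check (our own snippet, using scipy):

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

k, K = norm.pdf, norm.cdf
mu2 = integrate.quad(lambda x: x**2 * k(x), -np.inf, np.inf)[0]          # -> 1
psi = integrate.quad(lambda x: 2 * x * k(x) * K(x), -np.inf, np.inf)[0]  # -> 1/sqrt(pi)
Rk = integrate.quad(lambda x: k(x)**2, -np.inf, np.inf)[0]               # -> 1/(2 sqrt(pi))
Rkp = integrate.quad(lambda x: (x * k(x))**2, -np.inf, np.inf)[0]        # -> 1/(4 sqrt(pi))
print(mu2, psi, Rk, Rkp)
```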

5.1 True values

In the following we compute the asymptotically optimal bandwidths and the AMSEs for the four distributions. First, we have the following relationships between the derivatives of Q(p) and $f(\xi_p)$:

$$Q'(p) = \frac{1}{f(\xi_p)}, \qquad Q''(p) = -\frac{f'(\xi_p)}{f(\xi_p)^3}, \qquad Q'''(p) = \frac{3f'(\xi_p)^2 - f(\xi_p)f''(\xi_p)}{f(\xi_p)^5},$$

$$Q^{(4)}(p) = \frac{10f(\xi_p)f'(\xi_p)f''(\xi_p) - f(\xi_p)^2f'''(\xi_p) - 15f'(\xi_p)^3}{f(\xi_p)^7}.$$

Using the above results, the asymptotic MSE of $\tilde Q_n(p)$ is

$$AMSE\{\tilde Q_n(p)\} = \frac{p(1-p)}{nf(\xi_p)^2} + \frac{h^4f'(\xi_p)^2}{4f(\xi_p)^6} - \frac{h}{n\sqrt{\pi}\,f(\xi_p)^2}.$$

Also,

$$a^* = \left[\frac{f(\xi_p)^8}{2n\sqrt{\pi}\,\big(3f'(\xi_p)^2 - f(\xi_p)f''(\xi_p)\big)^2}\right]^{1/5}, \qquad a^{**} = \left[\frac{3f(\xi_p)^{12}}{4n\sqrt{\pi}\,\big(10f(\xi_p)f'(\xi_p)f''(\xi_p) - f(\xi_p)^2f'''(\xi_p) - 15f'(\xi_p)^3\big)^2}\right]^{1/7}.$$

Case 1. f is the standard normal density. We have

$$f(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}, \quad f'(x) = -xf(x), \quad f''(x) = (x^2 - 1)f(x), \quad f'''(x) = (3x - x^3)f(x).$$

Hence, with $x = \xi_p$,

$$g_n^* = \left\{\frac{\sqrt{2}\,e^{x^2/2}}{n(x^2 - 1)^2}\right\}^{1/5}, \qquad g_n^{**} = \left\{\frac{3\sqrt{2}\,e^{x^2/2}}{4n(3x - x^3)^2}\right\}^{1/7},$$

$$AMSE\{\hat Q_n(p)\} = \frac{2\pi p(1-p)}{n}e^{x^2} - \frac{\sqrt{2}\,h}{n}e^{x^2/2} + \frac{h^4}{4}x^2,$$

$$a^* = \frac{1}{\sqrt{2\pi}}\left[\frac{e^{-2x^2}}{\sqrt{2}\,n(2x^2 + 1)^2}\right]^{1/5}, \qquad a^{**} = \frac{1}{\sqrt{2\pi}}\left\{\frac{3e^{-3x^2}}{2\sqrt{2}\,n(6x^3 + 7x)^2}\right\}^{1/7},$$

$$AMSE\{\tilde Q_n(p)\} = \frac{2\pi p(1-p)}{n}e^{x^2} + \pi^2h^4x^2e^{2x^2} - \frac{2\sqrt{\pi}\,h}{n}e^{x^2}.$$

Case 2. f is the density of the Exponential(1) distribution. We have

$$f(x) = e^{-x} = -f'(x) = f''(x) = -f'''(x).$$

Hence

$$g_n^* = \left[\frac{e^x}{n\sqrt{\pi}}\right]^{1/5}, \qquad g_n^{**} = \left[\frac{3e^x}{4n\sqrt{\pi}}\right]^{1/7},$$

$$AMSE\{\hat Q_n(p)\} = \frac{p(1-p)}{n}e^{2x} - \frac{h}{n\sqrt{\pi}}e^x + \frac{h^4}{4},$$

$$a^* = \left[\frac{e^{-4x}}{8\sqrt{\pi}\,n}\right]^{1/5}, \qquad a^{**} = \left[\frac{3e^{-6x}}{144\sqrt{\pi}\,n}\right]^{1/7},$$

$$AMSE\{\tilde Q_n(p)\} = \frac{p(1-p)}{n}e^{2x} + \frac{h^4}{4}e^{4x} - \frac{h}{n\sqrt{\pi}}e^{2x}.$$

Case 3. f is the lognormal density. We have

$$f(x) = \frac{1}{\sqrt{2\pi}\,x}e^{-\log^2x/2}, \qquad f'(x) = -\frac{f(x)}{x}(1 + \log x),$$

$$f''(x) = \frac{f(x)}{x^2}(1 + 3\log x + \log^2x), \qquad f'''(x) = -\frac{f(x)}{x^3}(8\log x + 6\log^2x + \log^3x).$$

Hence

$$g_n^* = \left[\frac{x^4}{n\sqrt{\pi}\,(1 + 3\log x + \log^2x)^2f(x)}\right]^{1/5}, \qquad g_n^{**} = \left[\frac{3x^6}{4n\sqrt{\pi}\,(8\log x + 6\log^2x + \log^3x)^2f(x)}\right]^{1/7},$$

$$AMSE\{\hat Q_n(p)\} = \frac{2\pi p(1-p)}{n}x^2e^{\log^2x} - \frac{\sqrt{2}\,h}{n}xe^{\log^2x/2} + \frac{h^4}{4}\,\frac{(1 + \log x)^2}{x^2},$$

$$a^* = \frac{1}{\sqrt{2\pi}}\left\{\frac{e^{-2\log^2x}}{n\sqrt{2}\,(2 + 3\log x + 2\log^2x)^2}\right\}^{1/5}, \qquad a^{**} = \frac{1}{\sqrt{2\pi}}\left\{\frac{3e^{-3\log^2x}}{2n\sqrt{2}\,(5 + 13\log x + 11\log^2x + 6\log^3x)^2}\right\}^{1/7},$$

$$AMSE\{\tilde Q_n(p)\} = \pi^2h^4x^2(1 + \log x)^2e^{2\log^2x} + \frac{2\pi x^2e^{\log^2x}}{n}\Big\{p(1-p) - \frac{h}{\sqrt{\pi}}\Big\}.$$

Case 4. f is the double exponential density. We have $f(x) = \frac{1}{2}e^{-|x|} = f''(x)$ except at $x = 0$, and

$$f'(x) = \begin{cases}-\frac{1}{2}e^{-x}, & x > 0\\ \phantom{-}\frac{1}{2}e^{x}, & x < 0\end{cases} \;=\; -\frac{1}{2}\,\mathrm{sign}(x)\,e^{-|x|} = f'''(x).$$

Hence

$$g_n^* = \left[\frac{2e^{|x|}}{n\sqrt{\pi}}\right]^{1/5}, \qquad g_n^{**} = \left[\frac{3e^{|x|}}{2n\sqrt{\pi}}\right]^{1/7},$$

$$AMSE\{\hat Q_n(p)\} = \frac{4p(1-p)}{n}e^{2|x|} - \frac{2h}{n\sqrt{\pi}}e^{|x|} + \frac{h^4}{4},$$

$$a^* = \left[\frac{e^{-4|x|}}{2^7n\sqrt{\pi}}\right]^{1/5}, \qquad a^{**} = \left[\frac{e^{-6|x|}}{2^{10}\cdot3\,n\sqrt{\pi}}\right]^{1/7},$$

$$AMSE\{\tilde Q_n(p)\} = 4h^4e^{4|x|} + \frac{4p(1-p)}{n}e^{2|x|} - \frac{4h}{n\sqrt{\pi}}e^{2|x|}.$$

5.2 Simulation results

We sampled from the four distributions with sample sizes 50, 100, 500, and 1000, and computed the bandwidths and AMSEs at values of p from 0.05 to 0.95 with step size 0.05. However, by Remark 2.1, we omitted p = 0.5 for the normal and double exponential distributions, and p = 0.35 for the lognormal. We repeated the computation 100 times. In the first several rounds of simulations, we obtained some extremely large or small bandwidths, which certainly resulted in extremely large asymptotic MSEs. Hence we adopted the strategy of Sheather and Marron (1990) to adjust bandwidths that are too small or too large. For example, in Method 1, we forced $\hat f'(\xi_p)^{-2}$ to lie in the interval [0.05, 1.5] as follows: if it is not in the interval, we replace it by the closest endpoint of the interval. Simulation results are displayed in the figures, where the relative efficiency, i.e. the ratio of the AMSE of each method to the AMSE of the empirical quantile, is plotted against p. Figures 1–4 summarize the performance of the different methods at the same sample size for each of the four distributions. Figures 5–8 show the performance of each method at different sample sizes.

From Figures 1–4 we can see that the solid line, which corresponds to sample size n = 50, is almost the lowest in each plot. This is because when the sample size is small, the empirical quantile has a relatively large MSE; hence the kernel estimators are relatively more efficient.

Generally speaking, the four methods did a better job than the empirical quantile. For example, in Figure 6 we can see that when n = 50 only Method 2 gave an efficiency of more than 1, for p values between 0.75 and 0.95; the efficiencies of all other methods are below 1 for all p values. But, unfortunately, no method works better than all the other methods for all distributions and all sample sizes. In Figure 8, for example, Method 2 sometimes works better than the others, but sometimes worse. From that figure it seems that Method 1 is always more efficient than Method 3, but if we look at Figure 6, Method 3 is more efficient than Method 1 for many p values at each sample size. We can also see from Figures 5–8 that the plots of Method 1 (respectively, Method 2) are similar to the plots of Method 3 (respectively, Method 4). This is not coincidental, because we use the same formula to compute their asymptotic MSEs. From Figures 1–4, we observe that another common behavior of Methods 2 and 4 is that they perform badly near the boundaries, i.e. when p is close to 0 or 1.

In summary, the kernel quantile estimators, regardless of which bandwidth selection method is used, are more efficient than the empirical quantile estimator in most situations. When the sample size n is relatively small, say n = 50, they are significantly more efficient than the empirical quantile estimator. But no single method is the most efficient in all situations.

References

[1] Azzalini, A. (1981). A note on the estimation of a distribution function and quantiles by a kernel method. Biometrika, 68, 326-328.

[2] Falk, M. (1984). Relative deficiency of kernel type estimators of quantiles. Ann. Statist., 12, 261-268.

[3] Jones, M. C. (1992). Estimating densities, quantiles, quantile densities and density quantiles. Ann. Inst. Statist. Math., 44, 721-727.

[4] Nadaraya, E. A. (1964). Some new estimates for distribution functions. Theory Probab. Appl., 9, 497-500.

[5] Parzen, E. (1979). Nonparametric statistical data modeling. J. Amer. Stat. Assoc., 74, 105-131.


[6] Read, R.R. (1972). The asymptotic inadmissibility of the sample distribution function. Ann. Math. Statist., 43, 89-95.

[7] Ralescu, S. S. and Sun, S. (1993). Necessary and sufficient conditions for the asymptotic normality of perturbed sample quantiles. J. Statist. Plann. Inference, 35, 55-64.

[8] Shankar, B. (1998). An optimal choice of bandwidth for perturbed sample quantiles. Master's thesis.

[9] Sheather, S. J. and Marron, J. S. (1990). Kernel quantile estimators. J. Amer. Statist. Assoc., 85, 410-416.

[10] Wand, M. P. and Jones, M. C. (1995). Kernel smoothing. Chapman and Hall, London.

[11] Yamato, H. (1973). Uniform convergence of an estimator of a distribution function. Bull. Math. Statist., 15, 69-78.

[12] Yang, S. S. (1985). A smooth nonparametric estimation of a quantile function. J. Amer. Stat. Assoc., 80, 1004-1011.

Appendix

We now provide the proof of the AMSE formula in Remark 2.2. Here we follow the notation of Falk (1984). Since $F^{-1\prime}(q) = Q'(q) = 0$, we have

$$\begin{aligned}
Var\{\tilde Q_n(p)\} &= n^{-1}\int_0^1\Big\{\int k(x)\big(q - \alpha_nx - 1_{(0,\,q-\alpha_nx)}(y)\big)F^{-1\prime}(q - \alpha_nx)\,dx\Big\}^2dy\\
&= n^{-1}\int_0^1\Big\{\int k(x)\big(q - \alpha_nx - 1_{(0,\,q-\alpha_nx)}(y)\big)\big[F^{-1\prime}(q) - \alpha_nxF^{-1\prime\prime}(q) + O(\alpha_n^2)\big]\,dx\Big\}^2dy\\
&= n^{-1}\int_0^1\Big\{\int k(x)\big(q - \alpha_nx - 1_{(0,\,q-\alpha_nx)}(y)\big)\big(-\alpha_nxF^{-1\prime\prime}(q)\big)\,dx\Big\}^2dy + O(n^{-1}\alpha_n^2)\\
&= b\int_0^1\Big\{\int k(x)\big(q - \alpha_nx - 1_{(0,\,q-\alpha_nx)}(y)\big)x\,dx\Big\}^2dy + O(n^{-1}\alpha_n^2)\\
&= b\int_0^1\Big\{q\int xk(x)\,dx - \alpha_n\int x^2k(x)\,dx - \int xk(x)1_{(0,\,q-\alpha_nx)}(y)\,dx\Big\}^2dy + O(n^{-1}\alpha_n^2)\\
&= b\int_0^1\Big[\alpha_n\mu_2(k) + \int xk(x)1_{(0,\,q-\alpha_nx)}(y)\,dx\Big]^2dy + O(n^{-1}\alpha_n^2)\\
&= b\alpha_n^2\mu_2^2(k) + 2b\alpha_n\mu_2(k)S_1 + bS_2 + O(n^{-1}\alpha_n^2),
\end{aligned}$$

where $b = n^{-1}\alpha_n^2F^{-1\prime\prime}(q)^2$. But

$$S_1 = \int_0^1\int xk(x)1_{(0,\,q-\alpha_nx)}(y)\,dx\,dy = \int xk(x)\Big(\int_0^1 1_{(0,\,q-\alpha_nx)}(y)\,dy\Big)dx = \int xk(x)(q - \alpha_nx)\,dx = -\alpha_n\mu_2(k)$$

and

$$\begin{aligned}
S_2 &= \int_0^1\Big[\int xk(x)1_{(0,\,q-\alpha_nx)}(y)\,dx\Big]^2dy = \int_0^1\Big[\int_{(q-1)/\alpha_n}^{(q-y)/\alpha_n}xk(x)\,dx\Big]^2dy\\
&= \Big\{y\Big[\int_{(q-1)/\alpha_n}^{(q-y)/\alpha_n}xk(x)\,dx\Big]^2\Big\}\Big|_0^1 - \int_0^1 y\,d\Big\{\Big[\int_{(q-1)/\alpha_n}^{(q-y)/\alpha_n}xk(x)\,dx\Big]^2\Big\}\\
&= -2\int_0^1 y\Big[\int_{(q-1)/\alpha_n}^{(q-y)/\alpha_n}xk(x)\,dx\Big]\,\frac{q-y}{\alpha_n}\,k\Big(\frac{q-y}{\alpha_n}\Big)\Big(-\frac{1}{\alpha_n}\Big)dy\\
&= 2\int_{(q-1)/\alpha_n}^{q/\alpha_n}(q - \alpha_nt)\,t\,k(t)\Big[\int_{(q-1)/\alpha_n}^{t}xk(x)\,dx\Big]dt = 2\int_{(q-1)/\alpha_n}^{q/\alpha_n}(q - \alpha_nt)\,t\,k(t)\,j(t)\,dt,
\end{aligned}$$

where $j(t) \triangleq \int_{-c}^{t}xk(x)\,dx$ and c is such that k is finitely supported on $[-c, c]$. Then

$$Var\{\tilde Q_n(p)\} = -n^{-1}\alpha_n^2F^{-1\prime\prime}(q)^2\,\alpha_n^2\mu_2^2(k) + 2n^{-1}\alpha_n^2F^{-1\prime\prime}(q)^2\int(q - \alpha_nt)\,t\,k(t)\,j(t)\,dt + O(n^{-1}\alpha_n^2).$$

If we replace $\alpha_n$ by h and $F^{-1\prime\prime}(q)$ by $Q''(q)$, then

$$Var\{\tilde Q_n(p)\} = -n^{-1}h^4Q''(q)^2\mu_2^2(k) + 2n^{-1}h^2Q''(q)^2\int(q - ht)\,t\,k(t)\,j(t)\,dt + O(n^{-1}h^2).$$

But the bias of $\tilde Q_n(p)$ is

$$bias = \frac{1}{2}h^2\mu_2(k)Q''(q) + o(h^2) + O(n^{-1}).$$

Hence the MSE of $\tilde Q_n(p)$ is

$$MSE\{\tilde Q_n(p)\} = \frac{h^4}{4}\mu_2^2(k)Q''(q)^2 + o(h^4) + O(n^{-1}h^2) - n^{-1}h^4Q''(q)^2\mu_2^2(k) + 2n^{-1}h^2Q''(q)^2\int(q - ht)\,t\,k(t)\,j(t)\,dt.$$

That is,

$$AMSE\{\tilde Q_n(p)\} = \Big(\frac{1}{4} - \frac{1}{n}\Big)h^4Q''(q)^2\mu_2^2(k) + 2n^{-1}h^2Q''(q)^2\int(q - ht)\,t\,k(t)\,j(t)\,dt.$$

Figure 1: Efficiency under double exponential. Different panels correspond to different methods.

Figure 2: Efficiency under exponential. Different panels correspond to different methods.

Figure 3: Efficiency under Log Normal. Different panels correspond to different methods.

Figure 4: Efficiency under standard Normal. Different panels correspond to different methods.

Figure 5: Efficiency under double exponential. Different panels correspond to different sample sizes.

Figure 6: Efficiency under exponential. Different panels correspond to different sample sizes.

Figure 7: Efficiency under Log Normal. Different panels correspond to different sample sizes.

Figure 8: Efficiency under standard Normal. Different panels correspond to different sample sizes.
