
Annals of Statistics, 27 (1999), 1294-1315.

MODE TESTING IN DIFFICULT CASES

Ming-Yen Cheng, Peter Hall

Centre for Mathematics and its Applications, Australian National University

ABSTRACT. Usually, when testing the null hypothesis that a distribution has one mode, against the alternative that it has two, the null hypothesis is interpreted as entailing that the density of the sampling distribution has a unique point of zero slope, which is a local maximum. In this paper we argue that a more appropriate null hypothesis is that the density has two points of zero slope, of which one is a local maximum and the other is a shoulder. We show that when a test for a mode-with-shoulder is properly calibrated, so that it has asymptotically correct level, it is generally conservative when applied to the case of a mode without a shoulder.

We suggest methods for calibrating both the bandwidth and dip/excess mass tests in the setting of a mode with a shoulder. We also provide evidence in support of the converse: a test calibrated for a single mode without a shoulder tends to be anticonservative when applied to a mode with a shoulder. The calibration method involves resampling from a `template' density with exactly one mode and one shoulder. It exploits the following asymptotic factorisation property for both the sample and resample forms of the test statistic: all dependence of these quantities on the sampling distribution cancels asymptotically from their ratio. In contrast to other approaches, the method has very good adaptivity properties.

KEYWORDS. Bandwidth, bootstrap, calibration, curve estimation, level accuracy, local maximum, shoulder, smoothing, turning point.

SHORT TITLE. Mode testing.

AMS SUBJECT CLASSIFICATION. Primary 62G07, Secondary 62G09.


1. INTRODUCTION

Testing for modality is one way of finding evidence of sub-populations in the population from which data are drawn. Early tests were often based on parametric mixture models (e.g. Cox 1966), but during the last two decades several nonparametric methods have been developed. They are generally conservative, however, and increasing interest is being shown in ways of calibrating them so that their levels are closer to those prescribed. Heuristically, it is to be expected that improving the level accuracy of a conservative test would lead to increased power.

It is usually necessary to have at least an approximate model for densities $f$ representing the "null hypothesis" that is being tested, since we need to calibrate the test under the null. For example, in the case of testing for unimodality against the alternative of multimodality, the null hypothesis is generally that $f$ has one local maximum, no local minima, and no places of zero gradient that do not correspond to turning points. We shall call this the "classic null hypothesis", $H_0^{\mathrm{class}}$; it is tested against the alternative, $H_1$, that $f$ has two or more modes.

Such alternative hypotheses are generally relatively easy to distinguish from the null, however. We argue that a test of modality will have better performance if it works well against distributions that are 'marginal', or 'most difficult' to tell apart from the null; this is the sense in which we use the term 'difficult' in our paper. The difficult cases are densities that represent the boundary between one and two modes, that is, those where $f$ has one local maximum, no local minima, and exactly one point $x$ for which $f'(x) = 0$ but $x$ is a shoulder point (defined by $f''(x) = 0$ and $f'''(x) \ne 0$) rather than a local maximum or local minimum. We term this the 'boundary null hypothesis', $H_0^{\mathrm{bound}}$. The issue of which null hypothesis is employed determines the type of theory which best describes properties of tests for modality, and affects the tests' level accuracy and power.
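To make the notion of a shoulder concrete (this expansion is our illustration, implied by the definition rather than stated explicitly here): if $f'(x_1) = f''(x_1) = 0$ and $f'''$ is continuous at $x_1$, then
$$f(x) = f(x_1) + \tfrac16 f'''(x_1)(x - x_1)^3 + o\bigl(|x - x_1|^3\bigr),$$
so $f$ is locally flat at $x_1$ but its slope does not change sign there; an arbitrarily small perturbation can convert the flat point into an extra local maximum and local minimum, which is why such densities sit on the boundary between one and two modes.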

Figure 1.1 illustrates some of these issues. Panels (a) and (c) depict densities that are unimodal and bimodal, satisfying $H_0^{\mathrm{class}}$ and $H_1$ respectively, and panel (b) shows a "shoulder" density which in a sense is midway between the other two, and satisfies $H_0^{\mathrm{bound}}$. Intuitively, when an empirical test finds it hard to distinguish between panels (a) and (c), the problem really arises because the test can't solve the more difficult problem of deciding between panels (b) and (c). To optimise


performance in these difficult cases the test should be constructed so that it addresses the harder problem, not the easier one.

[ Put Figure 1.1 about here, please ]

It is helpful to consider the related, parametric problem of testing composite, one-sided hypotheses, of the form $\theta \le 0$ versus $\theta > 0$, where $\theta$ denotes a scalar parameter. There it is common to construct first a test of the simple null hypothesis, $\theta = 0$, against the alternative hypothesis $\theta > 0$, and then use the same test in the case of the composite one-sided null hypothesis. When the likelihood ratio is monotone, this approach is optimal and gives uniformly most powerful tests; see Kendall and Stuart (1979, Chapter 23). The null hypothesis $\theta = 0$ is more difficult than $\theta < 0$ to distinguish from $\theta > 0$, and the optimal approach is to construct the test in the more difficult case.

In the context of the mode testing problem, $H_0^{\mathrm{bound}}$ represents the simple null hypothesis $\theta = 0$ at the boundary, and $H_0^{\mathrm{class}}$ plays the role of the null hypothesis $\theta < 0$. Following the line suggested in the previous paragraph, we argue that the test should be developed for the more difficult null hypothesis, $H_0^{\mathrm{bound}}$. Section 2.4 establishes that, analogously to the conclusions reached in the previous paragraph for the parametric case, our test is also appropriate for $H_0^{\mathrm{class}}$; Figure 3.3 indicates the conservatism of a test of $H_0^{\mathrm{bound}}$ when applied to $H_0^{\mathrm{class}}$, and Figure 3.4 illustrates the anticonservatism of a test for $H_0^{\mathrm{class}}$ when applied to $H_0^{\mathrm{bound}}$.

In this paper we suggest methods, and develop theory, pertaining to this view of testing for modality. We employ two particular tests as examples, the bandwidth test of Silverman (1981) and the dip/excess mass test of Hartigan and Hartigan (1985) and Müller and Sawitzki (1991). Both involve rejecting the null hypothesis if the test statistic exceeds a certain critical point. For either test we discuss a bootstrap calibration method that produces the asymptotically correct level under $H_0^{\mathrm{bound}}$, and is slightly conservative under $H_0^{\mathrm{class}}$. Related methods, inspired by work of Hartigan (1997), will also be noted. Importantly, the level of the test under $H_0^{\mathrm{class}}$ does not converge to zero as sample size increases, and so the bootstrap procedure is relatively adaptive to both null hypotheses. In comparison, alternative methods for calibrating tests of $H_0^{\mathrm{bound}}$ have a level which converges to zero under $H_0^{\mathrm{class}}$. Our theoretical description of mode testing under the boundary null hypothesis


is in contradistinction to existing accounts in the literature, which seem always to assume the classic null hypothesis. Examples include Silverman (1983), Mammen, Marron and Fisher (1992) and Cheng and Hall (1998). The results in the two cases are quite different, with respect to order of magnitude as well as asymptotic distribution. For example, under $H_0^{\mathrm{class}}$ the critical value for the bandwidth test is of size $n^{-1/5}$, where $n$ is the number of data values (Mammen, Marron and Fisher 1992), but under $H_0^{\mathrm{bound}}$ it is of size $n^{-1/7}$. The analogues for critical points in the case of the dip/excess mass tests are $n^{-3/5}$ and $n^{-4/7}$, respectively. The limiting distributions in the four cases are all different and non-Normal. These facts alone demonstrate that calibration methods developed specifically for $H_0^{\mathrm{class}}$ can be inappropriate for $H_0^{\mathrm{bound}}$, and so can suffer problems when $H_0^{\mathrm{class}}$ is only "just true", unless they have the adaptivity property noted in the previous paragraph.

Specifically, suppose $H_0^{\mathrm{class}}$ is true, but only just true (that is, $H_0^{\mathrm{bound}}$ is "almost" true), and the test is constructed so as to reject the null hypothesis when the test statistic exceeds a critical point whose asymptotic size is appropriate to $H_0^{\mathrm{class}}$. (Therefore, the critical point is of size $n^{-1/5}$ if the bandwidth test is used, and of size $n^{-3/5}$ for the excess mass test.) Then the test will tend to incorrectly reject the null hypothesis, for the simple reason that $n^{-1/5} < n^{-1/7}$ and $n^{-3/5} < n^{-4/7}$. Our adaptive tests based on bootstrap calibration do not suffer from this problem.

Because of the light which these theoretical results shed on the importance of distinguishing between the two types of null hypothesis, we shall discuss our theoretical work first, in Section 2. Section 3 will summarise the results of a simulation study that assesses the performance of our adaptive tests. Section 2.1 will describe alternative, non-adaptive approaches. Technical arguments for Section 2 will be placed into Section 4. For simplicity we shall consider only the case of testing for unimodality. There is no technical difficulty in stating and deriving analogues of our theory for testing the hypothesis of $m$ modes against that of $m+1$ modes, where $m \ge 1$, although notation becomes rather complex in that case. The versions of our adaptive tests in that general setting seem prohibitively complex, however. In this multimodal setting, recent work of Hartigan (1997) is particularly deserving of mention. There, a novel sequential (in $m$) approach to using the excess mass test is suggested.


2. THEORETICAL PROPERTIES OF TEST STATISTICS

2.1. Summary and conclusions. The bandwidth test, which will be introduced and discussed in Section 2.2, involves rejecting the null hypothesis if a critical bandwidth, $\hat h_{crit}$, is too large, and the dip/excess mass test, to be described in Section 2.3, rejects the null hypothesis if a test statistic $\omega$ is too large. When the sampling density $f$ satisfies the null hypothesis $H_0^{\mathrm{class}}$, and appropriate regularity conditions hold, $n^{1/5}\hat h_{crit}$ has a proper limiting distribution that may be written as that of a random variable $C_1R_1$, where the nonzero constant $C_1$ depends only on $f$, and the distribution of the random variable $R_1$ does not depend on $f$. See Mammen, Marron and Fisher (1992). By way of contrast, we shall point out in Section 2.2 that under $H_0^{\mathrm{bound}}$ and appropriate conditions on $f$, $n^{1/7}\hat h_{crit} \to C_2R_2$ in distribution, where (here and below) $C_j$ and $R_j$ have the properties ascribed to $C_1$ and $R_1$ above.

Analogous results hold for the dip/excess mass test, where, under $H_0^{\mathrm{class}}$ and regularity conditions on $f$, $n^{3/5}\omega \to C_3R_3$ in distribution (see Cheng and Hall 1998) and, under $H_0^{\mathrm{bound}}$ and regularity conditions, $n^{4/7}\omega \to C_4R_4$ in distribution (see Section 2.3).

The formulae for $C_1, \dots, C_4$ are very different from one another, as too are the distributions of $R_1, \dots, R_4$. However, in each case the principle is the same: the distribution of the test statistic factorises, asymptotically, into a constant that depends only on $f$ and a random variable whose distribution is continuous and is in principle known. Note particularly that even the order of magnitude of the critical points, let alone the constants $C_j$ and the random variables $R_j$, depends not only on the type of test but also on the particular form of null hypothesis that is chosen.

For both the bandwidth and dip/excess mass tests, the factorisation property may be exploited to construct a test that adapts itself well to either $H_0^{\mathrm{class}}$ or $H_0^{\mathrm{bound}}$. It amounts to computing the ratio of the test statistic (either $\hat h_{crit}$ or $\omega$) and its bootstrap form, and rejecting the null hypothesis if the bootstrap distribution of the ratio assumes values that are too large. On account of the factorisation, the unknown constants $C_j$ cancel from the ratio in all four cases, and so the bootstrap distribution function of the ratio (a stochastic process) does not depend asymptotically on any unknowns. Unlike the case of more standard statistical problems (such as percentile-$t$ statistics) where scale parameters cancel, the bootstrap versions of the distributions of the variables $R_j$ are not particularly close to those of the respective $R_j$'s, and so the stochastic process noted just above is not degenerate. Nevertheless, its properties may be determined by Monte Carlo methods, and after suitable calibration it has asymptotically correct level under both $H_0^{\mathrm{bound}}$ and $H_0^{\mathrm{class}}$. Adaptive tests will be introduced in Sections 2.2 (for the bandwidth method) and 2.3 (dip/excess mass method), and Section 2.4 will discuss their properties.
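To spell out the cancellation in the bandwidth case (our paraphrase of the factorisation just described, writing $R_2^*$ for the bootstrap analogue of $R_2$ that is defined formally in Section 2.2): under $H_0^{\mathrm{bound}}$,
$$n^{1/7}\hat h_{crit} \approx C_2R_2, \qquad n^{1/7}\hat h_{crit}^* \approx C_2R_2^* \quad\Longrightarrow\quad \hat h_{crit}^*/\hat h_{crit} \approx R_2^*/R_2,$$
so the unknown constant $C_2$, and with it all dependence on $f$, drops out of the ratio whose bootstrap distribution is used for calibration.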

An alternative way to proceed would be to directly estimate that one of the unknown constants $C_1, \dots, C_4$ which is appropriate to the context (e.g. $C_3$ if we were using the excess mass test under $H_0^{\mathrm{class}}$), use Monte Carlo methods to calculate the distribution of the respective variable $R_j$, and thereby approximate the asymptotic distribution of the test statistic under the null hypothesis. If the bootstrap method described in the previous paragraph is likened to Studentizing so as to cancel the effects of scale, then this approach is similar to using standard asymptotic approximations after "plugging in" an estimate of scale. However, by its very construction the latter approach is highly sensitive to choice of null hypothesis, be it $H_0^{\mathrm{class}}$ or $H_0^{\mathrm{bound}}$, and in particular it does not enjoy the adaptivity of the bootstrap approach. If it is constructed so that it gives an asymptotically correct test under $H_0^{\mathrm{class}}$ [respectively, $H_0^{\mathrm{bound}}$], then the level of the test under $H_0^{\mathrm{bound}}$ [or $H_0^{\mathrm{class}}$] will be 0 [or 1].

Moreover, even if these problems are overcome, it is likely that the bootstrap approach captures at least some of the first-order features of the distribution of the test statistic that a purely asymptotic method misses. In the context of bootstrap versus asymptotic approximations to critical points for Silverman's (1981) bandwidth test, York (1998) has demonstrated this numerically. The bootstrap approach, through taking the resample size equal to the sample size, $n$, offers a significantly better approximation than does taking $n = \infty$, even if the template density is not the true density.

2.2. Bandwidth test. To introduce the test, let $X = \{X_1, \dots, X_n\}$ denote a random sample drawn from a distribution with unknown density $f$, and construct the kernel estimator
$$\hat f_h(x) = (nh)^{-1}\sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right), \quad (2.1)$$
where $h$ is a bandwidth and $K$ a kernel function. As in Silverman (1981) we take $K$ to be the standard Normal density, for which the number of modes of $\hat f_h$ on the


whole line is a nonincreasing function of $h$. Furthermore, $\hat f_h$ is unimodal for all sufficiently large $h$. Let $\hat h_{crit}$ denote the infimum of bandwidths such that $\hat f_h$ has only one mode. A test of the null hypothesis of unimodality consists of rejecting unimodality if $\hat h_{crit}$ is too large.
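As a computational aside (our sketch, not part of the paper): because the number of modes of a Gaussian-kernel estimate is nonincreasing in $h$, $\hat h_{crit}$ can be approximated by bisection on $h$, counting modes of $\hat f_h$ on a grid. The function names, grid resolution and bracketing interval below are illustrative assumptions.

```python
import numpy as np

def kde(grid, data, h):
    # Gaussian-kernel estimate \hat f_h of (2.1), evaluated on a grid.
    z = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(data) * h * np.sqrt(2.0 * np.pi))

def n_modes(grid, data, h):
    # Count local maxima of \hat f_h over the grid.
    d = np.diff(kde(grid, data, h))
    return int(np.sum((d[:-1] > 0) & (d[1:] <= 0)))

def critical_bandwidth(data, tol=1e-4):
    # Smallest h for which \hat f_h is unimodal; the mode count is nonincreasing
    # in h for the Gaussian kernel (Silverman 1981), which justifies bisection.
    data = np.asarray(data, dtype=float)
    s = data.std()
    grid = np.linspace(data.min() - s, data.max() + s, 512)
    lo, hi = 1e-3 * s, 5.0 * s   # assumed bracket: multimodal at lo, unimodal at hi
    while hi - lo > tol * s:
        mid = 0.5 * (lo + hi)
        if n_modes(grid, data, mid) <= 1:
            hi = mid
        else:
            lo = mid
    return hi
```

Section 3 notes that in the authors' study only modes within 1.5 standard deviations of the mean were counted; the sketch omits that refinement.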

Mammen, Marron and Fisher (1992) proved that under $H_0^{\mathrm{class}}$, and assuming appropriate regularity conditions on $f$, $\hat h_{crit}$ is of size $n^{-1/5}$. We show next that it is of size $n^{-1/7}$ under $H_0^{\mathrm{bound}}$. First we state an analogue of Mammen, Marron and Fisher's (1992) regularity conditions (corresponding also to the conditions of Silverman (1983)) in the case of $H_0^{\mathrm{bound}}$:

$f$ is supported on a compact interval $[a, b]$, and has two derivatives there; $f' = 0$ at distinct points $x_0, x_1 \in (a, b)$, and $f' \ne 0$ at all other points in $(a, b)$; $f$ has respectively two and three Hölder-continuous derivatives in neighbourhoods of $x_0$ and $x_1$; $f''(x_0) < 0$, $f''(x_1) = 0$, $f'''(x_1) \ne 0$, $f'(a+) > 0$, $f'(b-) < 0$. \quad (2.2)

For $0 < r < \infty$ and $-\infty < s < \infty$, define
$$Z(r, s) = r^{-4}\int K''(s+u)\,W(ru)\,du + \tfrac12\bigl(1 + s^2\bigr),$$
where $W$ is a standard Wiener process. Put $C_2 = \{f(x_1)/|f'''(x_1)|^2\}^{1/7}$, where $x_1$ is the shoulder point noted in (2.2), and let $R_2$ denote the infimum of all values of $r$ such that the function $Z(r, \cdot)$ does not change sign on $(-\infty, \infty)$. (In view of total positivity properties of $K''$ (see Schoenberg, 1950), if $Z(r, \cdot)$ does not change sign on $(-\infty, \infty)$ then, with probability 1, neither does $Z(r', \cdot)$ for any $r' > r$.)

Theorem 2.1.

Assume condition (2.2). Then $n^{1/7}\hat h_{crit} \to C_2R_2$ in distribution as $n \to \infty$.

We should comment on the nature of condition (2.2), which asks that $f$ decrease linearly to zero at the ends of its support. This ensures that the likelihood of spurious bumps in the tails of the density estimator $\hat f_h$ is very small. Therefore, the size of $\hat h_{crit}$ is determined by properties of $f$ at points of zero slope interior to $(a, b)$. More generally, when $f$ might not satisfy (2.2), one would either confine attention to testing for unimodality away from the tails, or use larger bandwidths in the tails so as to suppress bumps that arise from data sparseness.


Next we define the bootstrap version of $\hat h_{crit}$, and show that it satisfies a limit law similar to that in Theorem 2.1. Conditional on $X$, let $X^* = \{X_1^*, \dots, X_n^*\}$ denote a resample drawn randomly, with replacement, from the distribution with density $\hat f_{crit} = \hat f_{\hat h_{crit}}$, and define $\hat f_h^*$ by (2.1) except that $X_i$ there is replaced by $X_i^*$. Write $\hat h_{crit}^*$ for the infimum of bandwidths such that $\hat f_h^*$ is unimodal.

Our proof of Theorem 2.1 in Section 4 will involve constructing $W$ (depending on $n$) such that
$$n^{1/7}\hat h_{crit} \to C_2R_2 \text{ in probability}. \quad (2.3)$$
For this $W$, let $W^*$ be a standard Wiener process independent of $W$, and let $S$ be the unique point at which $Z(R_2, \cdot)$ vanishes. Define
$$Z^*(r, s) = (rR_2)^{-2}\int K''(s+u)\,W^*(ru)\,du + \int Z\bigl(R_2,\, S - R_2^{-1}ru\bigr)K(u)\,du,$$
and let $R_2^*$ denote the infimum of all values of $r$ such that the function $Z^*(r, \cdot)$ does not change sign on $(-\infty, \infty)$. It is straightforward to prove that $R_2^*$ is strictly positive with probability 1.

Theorem 2.2.

Assume condition (2.2), and that $W$ is constructed so that (2.3) holds. Then
$$\sup_{0 \le x < \infty}\bigl|P\bigl(n^{1/7}\hat h_{crit}^* \le C_2x \,\big|\, X\bigr) - P\bigl(R_2^* \le x \,\big|\, W\bigr)\bigr| \to 0$$
in probability as $n \to \infty$.

Theorem 2.2 and (2.3) together imply that, under $H_0^{\mathrm{bound}}$,
$$\sup_{0 \le x < \infty}\bigl|P\bigl(\hat h_{crit}^* \le \hat h_{crit}\,x \,\big|\, X\bigr) - P\bigl(R_2^*/R_2 \le x \,\big|\, W\bigr)\bigr| \to 0 \quad (2.4)$$
in probability. It follows that the distribution of the stochastic process $\hat G(x) = P(R_2^*/R_2 \le x \,|\, W)$ does not depend on $f$, which makes it possible to develop an asymptotically correct test of $H_0^{\mathrm{bound}}$. This could be based on tabulation of the distribution of $\hat G$, and applying an asymptotic test, but alternatively it may be accomplished by Monte Carlo methods, as follows. Put $\hat G_n(x) = P(\hat h_{crit}^*/\hat h_{crit} \le x \,|\, X)$, let $f_0$ denote a `template' density with a shoulder, and let $\hat G_{0n}$ denote the version of $\hat G_n$ that results from an $n$-sample drawn randomly from $f_0$. Using Monte Carlo methods we may compute to arbitrary accuracy the value of a constant $t = t(n)$ such that $P\{\hat G_{0n}(t) \ge 1 - \alpha\} = \alpha$, where $\alpha$ is the desired significance level of the test. Then the test with the form: reject $H_0^{\mathrm{bound}}$ in favour of $H_1$ if $\hat G_n(t) \ge 1 - \alpha$, has asymptotically correct level under $H_0^{\mathrm{bound}}$.
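The Monte Carlo recipe just described might be organised as in the following sketch. It is ours, not the authors' code; it builds on the `critical_bandwidth` sketch given earlier in this section, the helper names and replication counts are illustrative, and `sample_template` stands for a sampler from the template density $f_0$ (one is sketched for density (3.3) in Section 3).

```python
import numpy as np

def sample_from_kde(data, h, size, rng):
    # Smoothed bootstrap: draw from \hat f_crit by resampling the data and
    # adding N(0, h^2) noise (h = critical bandwidth of the sample).
    data = np.asarray(data, dtype=float)
    return data[rng.integers(0, len(data), size)] + h * rng.standard_normal(size)

def bootstrap_ratio_quantile(data, alpha, n_boot, rng):
    # (1 - alpha) quantile of \hat h*_crit / \hat h_crit, conditional on the sample;
    # critical_bandwidth is the sketch given in Section 2.2 above.
    data = np.asarray(data, dtype=float)
    h_crit = critical_bandwidth(data)
    ratios = [critical_bandwidth(sample_from_kde(data, h_crit, len(data), rng)) / h_crit
              for _ in range(n_boot)]
    return np.quantile(ratios, 1 - alpha)

def calibrated_bandwidth_test(data, sample_template, alpha=0.05,
                              n_templates=200, n_boot=200, seed=0):
    # G_0n(t) >= 1 - alpha holds iff t exceeds the sample's bootstrap quantile,
    # so the calibration constant t(n) is the alpha-quantile of those quantiles
    # over Monte Carlo samples drawn from the template f0.
    rng = np.random.default_rng(seed)
    n = len(data)
    q0 = [bootstrap_ratio_quantile(sample_template(n, rng), alpha, n_boot, rng)
          for _ in range(n_templates)]
    t = np.quantile(q0, alpha)
    # Reject H0(bound) when G_n(t) >= 1 - alpha, i.e. when the data's quantile <= t.
    return bootstrap_ratio_quantile(data, alpha, n_boot, rng) <= t
```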

One would expect the template approach to capture second-order effects better than a purely asymptotic argument. This may be confirmed by simulation. To capture second-order effects even more accurately one could use a skewed template (for example) if there was evidence that the sampling distribution was skewed, although it is difficult to ensure both the right degree of skewness and the right value of $C_2$.

2.3. Dip/excess mass test. It suffices to consider the excess mass test statistic, $\omega$, which equals twice the dip test statistic. Let $\hat F$ be the empirical distribution function of the $n$-sample $X$ introduced in Section 2.2, and for $m \ge 1$ and $\lambda > 0$ define
$$E_{n,m}(\lambda) = \sup_{C_1, \dots, C_m} \sum_{j=1}^{m}\bigl\{\hat F(C_j) - \lambda\|C_j\|\bigr\},$$
where the supremum is over disjoint intervals $C_1, \dots, C_m$, $\hat F(C)$ is the $\hat F$-measure of $C$, and $\|C\|$ equals the length of $C$. Put $D_{n,m}(\lambda) = E_{n,m}(\lambda) - E_{n,m-1}(\lambda)$ and $\omega = \sup_\lambda D_{n,2}(\lambda)$. We reject the null hypothesis of unimodality if $\omega$ is too large.
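For concreteness, here is a sketch (ours, not the authors' implementation) of the computation of $\omega$. For fixed $\lambda$ the suprema defining $E_{n,1}(\lambda)$ and $E_{n,2}(\lambda)$ can be restricted to intervals whose endpoints are order statistics, since an optimal interval can always be shrunk to the nearest enclosed data points without changing its $\hat F$-measure while reducing its length; the identity $\hat F([x_{(i)}, x_{(j)}]) - \lambda(x_{(j)} - x_{(i)}) = \{(j+1)/n - \lambda x_{(j)}\} - \{i/n - \lambda x_{(i)}\}$ (indices from 0) then allows an $O(n)$ scan per $\lambda$. The supremum over $\lambda$ is approximated by a crude grid, so the routine slightly understates $\omega$; ties among observations are assumed absent.

```python
import numpy as np

def excess_mass_1_and_2(xs_sorted, lam):
    # E_{n,1}(lam) and E_{n,2}(lam), with candidate interval endpoints
    # restricted to the sorted data points.
    n = len(xs_sorted)
    i = np.arange(n)
    left = i / n - lam * xs_sorted          # subtracted term for a left endpoint
    right = (i + 1) / n - lam * xs_sorted   # added term for a right endpoint
    best_end = right - np.minimum.accumulate(left)      # best interval ending at j
    prefix_best = np.maximum.accumulate(best_end)       # best interval inside x[0..j]
    best_start = np.maximum.accumulate(right[::-1])[::-1] - left  # best starting at i
    suffix_best = np.maximum.accumulate(best_start[::-1])[::-1]   # best inside x[i..]
    e1 = prefix_best[-1]
    e2 = e1 if n < 2 else max(e1, float(np.max(prefix_best[:-1] + suffix_best[1:])))
    return e1, e2

def excess_mass_statistic(data, n_lambda=200):
    # omega = sup_lam {E_{n,2}(lam) - E_{n,1}(lam)}, sup taken over a crude grid
    # of lambda values on the rough scale of the density (distinct data assumed).
    xs = np.sort(np.asarray(data, dtype=float))
    n = len(xs)
    lam_max = 3.0 / (n * np.median(np.diff(xs)))
    lam_grid = np.linspace(lam_max / n_lambda, lam_max, n_lambda)
    return max(e2 - e1 for e1, e2 in (excess_mass_1_and_2(xs, lam) for lam in lam_grid))
```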

Cheng and Hall (1998) established that under $H_0^{\mathrm{class}}$, $\omega$ is of size $n^{-3/5}$. We show next that under $H_0^{\mathrm{bound}}$ it is of size $n^{-4/7}$, for which purpose we augment (2.2) by the condition:

$f'$ is Hölder-continuous within a neighbourhood of the unique point $x_2 \ne x_1$ satisfying $f(x_2) = f(x_1)$. \quad (2.5)

Let $W$ be as in Section 2.2, and define $C_4 = \{f(x_1)^4/|f'''(x_1)|\}^{1/7}$,
$$\omega(t_1, t_2, u) = \{W(t_1) - W(t_2)\} - (t_2^4 - t_1^4) - u(t_2 - t_1)$$
and
$$R_4 = 24^{1/7}\sup_{-\infty < u < \infty}\Bigl[\sup_{-\infty < t_1 < t_2 < t_3 < \infty}\bigl\{\omega(0, t_1, u) + \omega(t_2, t_3, u)\bigr\} - \sup_{-\infty < t_1 < \infty}\omega(0, t_1, u)\Bigr]. \quad (2.6)$$

It may be proved that $R_4$ is finite and positive with probability one, and that its distribution has no atoms.

Theorem 2.3.

Assume conditions (2.2) and (2.5). Then $n^{4/7}\omega \to C_4R_4$ in distribution as $n \to \infty$.


The bootstrap setting for Theorem 2.3 is similar to that for Theorem 2.1. Let $\omega^*$ be the bootstrap version of $\omega$, computed using the resample $X^*$ drawn by sampling from the distribution with density $\hat f_{crit}$. For a suitable construction of $W$, Theorem 2.3 may be stated in the stronger sense that $n^{4/7}\omega \to C_4R_4$ in probability. We assume this construction below. Let $W^*$ be another Wiener process, independent of $W$; define
$$U(r, s) = r^{-4}\int K''(s+u)\,W(ru)\,du,$$
let $R$ denote the infimum of all $r > 0$ such that $U(r, s) + \tfrac12(1+s^2)$, as a function of $s$, does not change sign on the real line, and let $S$ be the unique point at which $U(R, s) + \tfrac12(1+s^2)$ vanishes. Put
$$\varepsilon(y_1, y_2, u) = W^*(y_1) - W^*(y_2) - R^2\int_0^1 t\Bigl[y_2^2\,U\bigl\{R,\,S + R^{-1}(1-t)y_2\bigr\} - y_1^2\,U\bigl\{R,\,S + R^{-1}(1-t)y_1\bigr\}\Bigr]dt - \tfrac12\bigl(1 + S^2\bigr)\bigl(y_2^2 - y_1^2\bigr) - \tfrac16\,RS\bigl(y_2^3 - y_1^3\bigr) - \tfrac{1}{24}\bigl(y_2^4 - y_1^4\bigr) - u(y_2 - y_1)$$
and, with $\varepsilon/24^{1/7}$ replacing $\omega$, define $R_4^*$ by (2.6). With probability one, $R_4^*$ is finite and positive, and its distribution has no atoms.

Theorem 2.4.

Assume conditions (2.2) and (2.5), and that $W$ is constructed so that $n^{4/7}\omega \to C_4R_4$ in probability. Then
$$\sup_{0 \le x < \infty}\bigl|P\bigl(n^{4/7}\omega^* \le C_4x \,\big|\, X\bigr) - P\bigl(R_4^* \le x \,\big|\, W\bigr)\bigr| \to 0$$
in probability as $n \to \infty$.

Theorem 2.4 is directly analogous to Theorem 2.2, and implies the obvious analogue of (2.4):
$$\sup_{0 \le x < \infty}\bigl|P\bigl(\omega^* \le \omega x \,\big|\, X\bigr) - P\bigl(R_4^*/R_4 \le x \,\big|\, W\bigr)\bigr| \to 0. \quad (2.7)$$
Therefore, bootstrap calibration applied to the ratio $\omega^*/\omega$ produces tests of $H_0^{\mathrm{bound}}$ with asymptotically correct level. Specifically, if $f_0$ is the template density introduced in Section 2.2, if $\hat H_n(x) = P(\omega^*/\omega \le x \,|\, X)$, if $\hat H_{0n}$ is the version of $\hat H_n$ when the $n$-sample is drawn from $f_0$ rather than $f$, and if the constant $u$ is defined by $P\{\hat H_{0n}(u) \ge 1 - \alpha\} = \alpha$, then the test which rejects $H_0^{\mathrm{bound}}$ if $\hat H_n(u) \ge 1 - \alpha$ has asymptotically correct level under $H_0^{\mathrm{bound}}$.
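The analogous calibrated test based on $\omega^*/\omega$ can be sketched in the same way (again ours, not the authors'; it reuses `critical_bandwidth`, `sample_from_kde`, `excess_mass_statistic` and `sample_template` from the other sketches, and the replication counts are illustrative).

```python
import numpy as np

def bootstrap_omega_quantile(data, alpha, n_boot, rng):
    # (1 - alpha) quantile of omega*/omega, with resamples drawn from \hat f_crit;
    # critical_bandwidth, sample_from_kde and excess_mass_statistic are the
    # sketches given earlier.
    data = np.asarray(data, dtype=float)
    h_crit = critical_bandwidth(data)
    omega = excess_mass_statistic(data)
    ratios = [excess_mass_statistic(sample_from_kde(data, h_crit, len(data), rng)) / omega
              for _ in range(n_boot)]
    return np.quantile(ratios, 1 - alpha)

def calibrated_excess_mass_test(data, sample_template, alpha=0.05,
                                n_templates=200, n_boot=200, seed=0):
    # Same template calibration as for the bandwidth test, applied to omega*/omega:
    # u is chosen so that P_f0{H_0n(u) >= 1 - alpha} = alpha.
    rng = np.random.default_rng(seed)
    n = len(data)
    q0 = [bootstrap_omega_quantile(sample_template(n, rng), alpha, n_boot, rng)
          for _ in range(n_templates)]
    u = np.quantile(q0, alpha)
    return bootstrap_omega_quantile(data, alpha, n_boot, rng) <= u
```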

Hartigan (1997) has suggested an asymptotic test based on the results in Theorem 2.4, normalising the test statistic using the square root of the number of data values interior to the shoulder segment. If one calibrates via the asymptotic distribution then this ingenious approach avoids using the template density. In order to better capture second-order effects, however, one could compute the template density and then, simulating from that distribution (taking the Monte Carlo sample size equal to the actual sample size), compute an approximation to the distribution of the test statistic under the null hypothesis.

2.4. Adaptivity of bootstrap calibration methods. The factorisation which forms the basis for our bootstrap calibration method is also valid under $H_0^{\mathrm{class}}$, where instead of (2.4) and (2.7) it produces results of the form:
$$\sup_{0 \le x < \infty}\bigl|P\bigl(\hat h_{crit}^* \le \hat h_{crit}\,x \,\big|\, X\bigr) - P\bigl(R_1^*/R_1 \le x \,\big|\, W\bigr)\bigr| \to 0, \quad (2.8)$$
$$\sup_{0 \le x < \infty}\bigl|P\bigl(\omega^* \le \omega x \,\big|\, X\bigr) - P\bigl(R_3^*/R_3 \le x \,\big|\, W\bigr)\bigr| \to 0. \quad (2.9)$$
A suitable regularity condition for each of these results is the following version of (2.2), where the shoulder point $x_1$ is no longer permitted, thereby ensuring that $H_0^{\mathrm{class}}$ (rather than $H_0^{\mathrm{bound}}$) obtains:

$f$ is supported on a compact interval $[a, b]$, and has two derivatives there; $f' = 0$ at $x_0 \in (a, b)$, and $f' \ne 0$ at all other points in $(a, b)$; $f$ has two Hölder-continuous derivatives in a neighbourhood of $x_0$; $f''(x_0) < 0$, $f'(a+) > 0$, $f'(b-) < 0$.

Result (2.8) is discussed in an ANU PhD thesis by M. York (1998), and (2.9) appears in Cheng and Hall (1996). As in the case of $R_2$ and $R_4$, the variables $R_1$ and $R_3$ are functionals of a standard Wiener process $W$; $R_1^*$ and $R_3^*$ are functionals of $W$ and an independent Wiener process $W^*$; and all variables $R_j$ and $R_j^*$ have continuous distributions. It follows from (2.8) and (2.9) that if $H_0^{\mathrm{class}}$ holds instead of $H_0^{\mathrm{bound}}$, yet we apply the bootstrap test suggested when $H_0^{\mathrm{bound}}$ is valid, the asymptotic level of the test lies strictly between 0 and 1. In this sense, the tests suggested in Sections 2.2 and 2.3 are adaptive; other approaches to calibration, such as that


discussed towards the end of Section 2.1, do not enjoy this property. Moreover, bootstrap calibration under $H_0^{\mathrm{bound}}$ turns out to be conservative when $H_0^{\mathrm{class}}$ is true, as we shall show in the next section.

3. NUMERICAL STUDY

The bandwidth and dip/excess mass tests for $H_0^{\mathrm{bound}}$ were applied to three Normal mixture densities: the two unimodal-with-shoulder densities given by
$$8e^{9/8}\bigl(1 + 8e^{9/8}\bigr)^{-1}\,N(0, 1) + \bigl(1 + 8e^{9/8}\bigr)^{-1}\,N\bigl(-9\sqrt{3}/8,\; 0.0625\bigr) \quad (3.1)$$
$$(100/109)\,N(0, 1) + (9/109)\,N(1.3,\; 0.09) \quad (3.2)$$
and illustrated in panels (a) and (b), respectively, of Figure 3.1; and the unimodal-without-shoulder standard Normal density, depicted in panel (d) of that figure. In all cases the bandwidth and dip/excess mass tests for $H_0^{\mathrm{bound}}$ were calibrated using the methods suggested in Sections 2.2 and 2.3. The template density $f_0$ employed for calibration was taken as
$$(16/17)\,N(0, 1) + (1/17)\,N(-1.25,\; 0.0625) \quad (3.3)$$
and is unimodal with a shoulder. It is illustrated in panel (c) of Figure 3.1.

[ Put Figure 3.1 about here, please ]

The sample sizes used were 50 and 100. In each setting, 500 samples were simulated and, conditional on each of these, 500 resamples were drawn. Then, all the required conditional and unconditional probabilities were approximated by their corresponding empirical values. To obtain values of $\hat h_{crit}$ and $\hat h_{crit}^*$, kernel density estimates were computed over an equally-spaced grid of 512 points. To avoid problems arising from data sparseness in the tails, only modes that occurred within 1.5 standard deviations of the mean were counted. The same rule was followed when evaluating the dip/excess mass statistics.
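A sampler from the template density (3.3) used for calibration is immediate; this sketch is ours, and it takes the second argument of $N(\cdot, \cdot)$ to be a variance, so the narrow component has standard deviation 0.25. It is the `sample_template` argument assumed by the calibration sketches in Section 2.

```python
import numpy as np

def sample_template(n, rng):
    # Draw n observations from the template density (3.3):
    # (16/17) N(0, 1) + (1/17) N(-1.25, 0.0625).
    x = rng.standard_normal(n)              # N(0, 1) component
    narrow = rng.random(n) < 1.0 / 17.0     # mixture indicator
    x[narrow] = -1.25 + 0.25 * rng.standard_normal(narrow.sum())
    return x

# Example: data = sample_template(100, np.random.default_rng(1))
```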

Figure 3.2 illustrates the actual versus nominal levels when the two tests for $H_0^{\mathrm{bound}}$ (calibrated using the density at (3.3) as the template) were applied to data generated from the two shoulder densities given by (3.1) and (3.2), respectively.

Note that the actual versus nominal curves are close to the diagonal line, especially in the cases illustrated by panels (b), (c) and (d). This indicates that both tests


have accurate levels. The figure also suggests that, overall, the excess mass test has better level accuracy than the bandwidth test.

[ Put Figure 3.2 about here, please ]

Figure 3.3 depicts, for both the bandwidth and dip/excess mass tests, the actual versus nominal levels when the true density is standard Normal and the shoulder density $f_0$ is used to provide calibration. Note particularly that all the curves always lie below the diagonal line, illustrating the conservatism of a method calibrated for $H_0^{\mathrm{bound}}$ when it is applied to test $H_0^{\mathrm{class}}$.

[ Put Figure 3.3 about here, please ]

Figure 3.4 is essentially the obverse of Figure 3.3: in the latter, the sampling density was standard Normal and we calibrated using $f_0$, but in Figure 3.4 the sampling density is $f_0$ and we calibrate using the standard Normal. The fact that the dashed and dotted lines in both panels of Figure 3.4 lie above the diagonal line demonstrates that, as expected, calibrating a test of $H_0^{\mathrm{bound}}$ using a template for $H_0^{\mathrm{class}}$ results in an anticonservative procedure.

[ Put Figure 3.4 about here, please ]

4. TECHNICAL ARGUMENTS

4.1. Proof of Theorem 2.1. Let $\delta = n^{-1/7}$ and write $C$, $R$ for $C_2$, $R_2$, respectively. We shall prove that

there exist $\epsilon_1, \epsilon_2 > 0$ such that, if $\hat h_{crit} = \hat h_{crit}(\epsilon_1, \epsilon_2)$ is re-defined to be the supremum of the set $\mathcal H$ of values $h \le n^{-(1/7)+\epsilon_1}$ such that $\hat f(\cdot\,|\,h)$ has at least one turning point in $I(\epsilon_2) = (x_1 - \delta n^{\epsilon_2},\, x_1 + \delta n^{\epsilon_2})$, then with probability tending to one, $\mathcal H$ is nonempty and $n^{1/7}\hat h_{crit}$ has the claimed limit distribution. \quad (4.1)

Arguments similar to those of Mammen, Marron and Fisher (1992) may be employed to prove that (a) for each $\epsilon_1 \in (0, 1/7)$, the probability that for some $h \ge n^{-(1/7)+\epsilon_1}$ the function $\hat f(\cdot\,|\,h)$ has more than one turning point in $\mathbb R$ converges to 0; (b) for each $c > 0$ and $\epsilon_2 > 0$, the probability that for some $h > cn^{-1/7}$ the function $\hat f(\cdot\,|\,h)$ has more than one turning point in $\mathbb R \setminus I(\epsilon_2)$ converges to 0; and (c) with probability 1, $\hat f(\cdot\,|\,h)$ has at least one turning point in $I(\epsilon_2)$ for each $h < \hat h_{crit}$. The theorem follows from (4.1) and (a)-(c).


The embedding of Komlós, Major and Tusnády (1975) ensures the existence of a standard Wiener process $W_1$ such that, with $W_0(t) = W_1(t) - tW_1(1)$, the empirical distribution function $\hat F$ of $X$ may be written as $\hat F(x) = F(x) + n^{-1/2}W_0\{F(x)\} + O_p(n^{-1}\log n)$ uniformly in $x$. It follows that
$$\hat f'(x\,|\,h) - E\hat f'(x\,|\,h) = -\bigl(n^{1/2}h^2\bigr)^{-1}\int \bigl[W_1\{F(x - hz)\} - W_1\{F(x_1)\}\bigr]K''(z)\,dz + O_p\bigl\{(nh^2)^{-1}\log n\bigr\}$$
uniformly in $-\infty < x < \infty$ and $h > 0$. Writing $x = x_1 + \delta y$ and $h = r_1\delta$, and using standard results on the modulus of continuity of a Wiener process, we deduce that if $\epsilon_1, \epsilon_2 > 0$ are sufficiently small then for some $\epsilon_3 > 0$,
$$\hat f'(x_1 + \delta y\,|\,r_1\delta) - E\hat f'(x_1 + \delta y\,|\,r_1\delta) = -\bigl(n^{1/2}\delta^2 r_1^2\bigr)^{-1}\int \bigl[W_1\{F(x_1) + \delta(y - r_1z)f(x_1)\} - W_1\{F(x_1)\}\bigr]K''(z)\,dz + O_p\bigl(\delta^2 n^{-\epsilon_3}r_1^{-2}\bigr)$$
uniformly in $0 < r_1 \le \mathrm{const.}\,n^{\epsilon_1}$ and $|y| \le \mathrm{const.}\,n^{\epsilon_1}$, for all values of the constants.

Therefore, defining
$$W_2(t) = -\{\delta f(x_1)\}^{-1/2}\bigl[W_1\{F(x_1) + \delta f(x_1)t\} - W_1\{F(x_1)\}\bigr],$$
we find that, uniformly in the same values of $r_1$ and $y$,
$$\delta^{-2}r_1^2\bigl[\hat f'(x_1 + \delta y\,|\,r_1\delta) - E\hat f'(x_1 + \delta y\,|\,r_1\delta)\bigr] = f(x_1)^{1/2}\int W_2(y - r_1z)K''(z)\,dz + O_p\bigl(n^{-\epsilon_3}\bigr). \quad (4.2)$$
Using the fact that $f'''$ is Hölder continuous in a neighbourhood of $x_1$ we see that, for $\epsilon_1, \epsilon_2, \epsilon_3 > 0$ chosen sufficiently small,
$$E\hat f'(x_1 + \delta y\,|\,r_1\delta) = \int f'\{x_1 + \delta(y - r_1z)\}K(z)\,dz = \tfrac12\delta^2\bigl(y^2 + r_1^2\bigr)f'''(x_1) + O\bigl\{\delta^2\bigl(y^2 + r_1^2\bigr)n^{-\epsilon_3}\bigr\} \quad (4.3)$$
uniformly in $0 < r_1 \le \mathrm{const.}\,n^{\epsilon_1}$ and $|y| \le \mathrm{const.}\,n^{\epsilon_2}$. Combining (4.2) and (4.3) we deduce that
$$\hat f'(x_1 + \delta y\,|\,r_1\delta) = \delta^2\Bigl[r_1^{-2}f(x_1)^{1/2}\int W_2(y - r_1z)K''(z)\,dz + \tfrac12\bigl(y^2 + r_1^2\bigr)f'''(x_1) + O_p\bigl\{\bigl(r_1^{-2} + y^2 + r_1^2\bigr)n^{-\epsilon_3}\bigr\}\Bigr] \quad (4.4)$$


uniformly in $0 < r_1 \le \mathrm{const.}\,n^{\epsilon_1}$ and $|y| \le \mathrm{const.}\,n^{\epsilon_2}$.

Let $T = \mathrm{sgn}\{f'''(x_1)\}$, $C = \{f(x_1)/|f'''(x_1)|^2\}^{1/7}$, $C' = \{f(x_1)^2|f'''(x_1)|^3\}^{1/7}$, $y = Crs$, $r_1 = Cr$ and $W_2(Ct) = C^{1/2}TW(-t)$. Then $W$ is a standard Wiener process, and (4.4) implies that, for different values of $\epsilon_1, \epsilon_2, \epsilon_3 > 0$ chosen sufficiently small,
$$\hat f'(x_1 + \delta Crs\,|\,\delta Cr) = \delta^2 C'T\Bigl[r^{-2}\int W\{r(z - s)\}K''(z)\,dz + \tfrac12 r^2\bigl(1 + s^2\bigr) + O_p\bigl[\bigl\{r^{-2} + r^2\bigl(1 + s^2\bigr)\bigr\}n^{-\epsilon_3}\bigr]\Bigr] = \delta^2 C'Tr^2 Z(r, s) + O_p\bigl\{\delta^2\bigl(r^{-4} + 1 + s^2\bigr)n^{-\epsilon_3}\bigr\} \quad (4.5)$$
uniformly in $0 < r \le \mathrm{const.}\,n^{\epsilon_1}$ and $|y| \le \mathrm{const.}\,n^{\epsilon_2}$. Result (4.1) follows from this formula.

4.2. Proof of Theorem 2.2. We give the proof only in outline, noting the analogues of steps in the proof of Theorem 2.1 and not pausing to give detailed bounds for remainder terms. In the derivation of Theorem 2.1 we should replace $(\hat f(\cdot\,|\,h), f)$ by $(\hat f^*(\cdot\,|\,h), \hat f_{crit})$. Let $\hat x_1$ denote the shoulder of $\hat f_{crit}$. (Thus, $\hat f_{crit}'(\hat x_1) = \hat f_{crit}''(\hat x_1) = 0$.) In place of (4.2) we have, conditional on $X$ and for a standard Wiener process $W_2^*$ independent of $W$,

$$\delta^{-2}r_1^2\bigl[\hat f^{*\prime}(\hat x_1 + \delta y\,|\,r_1\delta) - E\{\hat f^{*\prime}(\hat x_1 + \delta y\,|\,r_1\delta)\,|\,X\}\bigr] = f(x_1)^{1/2}\int W_2^*(y - r_1z)K''(z)\,dz + o_p(1). \quad (4.6)$$
By (4.5) and since $\hat h_{crit} - \delta CR = o_p(\delta)$ we have, in notation from the proof of Theorem 2.1,
$$\hat f_{crit}'(x_1 + \delta CRs) = \hat f'(x_1 + \delta CRs\,|\,\hat h_{crit}) = \delta^2 C'TR^2 Z(R, s) + o_p\bigl(\delta^2\bigr).$$
Furthermore, $\hat x_1 - (x_1 + \delta CRS) = o_p(\delta)$, and so

$$E\{\hat f^{*\prime}(\hat x_1 + \delta y\,|\,r_1\delta)\,|\,X\} = \int \hat f_{crit}'\{\hat x_1 + \delta(y - r_1z)\}K(z)\,dz$$
$$= \delta^2 C'TR^2\int Z\bigl\{R,\,(C\delta R)^{-1}(\hat x_1 - x_1) + (CR)^{-1}(y - r_1z)\bigr\}K(z)\,dz + o_p\bigl(\delta^2\bigr)$$
$$= \delta^2 C'TR^2\int Z\bigl\{R,\,S + (CR)^{-1}(y - r_1z)\bigr\}K(z)\,dz + o_p\bigl(\delta^2\bigr). \quad (4.7)$$


Combining (4.6) and (4.7) we deduce that
$$\hat f^{*\prime}(\hat x_1 + \delta y\,|\,r_1\delta) = \delta^2\Bigl[r_1^{-2}f(x_1)^{1/2}\int W_2^*(y - r_1z)K''(z)\,dz + C'TR^2\int Z\bigl\{R,\,S + (CR)^{-1}(y - r_1z)\bigr\}K(z)\,dz\Bigr] + o_p\bigl(\delta^2\bigr). \quad (4.8)$$
Making the changes of variable $y = Crs$, $r_1 = Cr$ and $W_2^*(Ct) = C^{1/2}W^*(-t)$, the right-hand side of (4.8) becomes
$$\delta^2 C'TR^2 Z^*(r, s) + o_p\bigl(\delta^2\bigr).$$
The theorem follows from this approximation.

4.3. Proof of Theorem 2.3. Let $a = f(x_1)$ and $b = \frac{1}{24}|f'''(x_1)|$. Given $\lambda_0, \epsilon_1 \in (0, \min(a, 1/7))$, define $J_1 = (0, a - \lambda_0]$, $J_2 = (a - \lambda_0,\, a - n^{-(3/7)+3\epsilon_1}]$ and $J_3 = (a - n^{-(3/7)+3\epsilon_1}, \infty)$. Arguing as in the proof of Theorem 2 of Müller and Sawitzki (1991) we may show that
$$\sup_{\lambda \in J_1} D_{n,2}(\lambda) = O_p\bigl\{(n^{-1}\log n)^{2/3}\bigr\}, \qquad \sup_{\lambda \in J_2} D_{n,2}(\lambda) = O_p\bigl\{n^{-(4/7)-(1/5)}\bigr\}.$$
Therefore,

$$\sup_{\lambda \in J_1 \cup J_2} D_{n,2}(\lambda) = o_p\bigl(n^{-4/7}\bigr). \quad (4.9)$$
We prove the theorem in the case $f'''(x_1) > 0$. The case $f'''(x_1) < 0$ may be treated similarly. Since $f'''(x_1) > 0$ and condition (2.2) holds, $x_1 < x_0$ and there exists a point $x_2$ such that $x_0 < x_2$, $f(x_2) = f(x_1)$ and $f'(x_2) < 0$. Let $\delta = n^{-1/7}$, $\eta = n^{-\epsilon_3}$ with $\epsilon_3 \le 1/7$, $I_0 = (x_1 - \delta n^{\epsilon_1},\, x_1 + \delta n^{\epsilon_1})$, $I_1 = (x_2 - \delta n^{\epsilon_1},\, x_2 + \delta n^{\epsilon_1})$ and $I_2 = (-n^{\epsilon_1}, n^{\epsilon_1})$. Given $t_1, \dots, t_3 \in I_0$, put $y_j = (t_j - x_1)/\delta \in I_2$, $j = 1, \dots, 3$.

Let $\sup_{(1)}, \dots, \sup_{(7)}$ denote suprema over, respectively, (1) $-\infty < t_1 < t_2 < \infty$; (2) $t_1 \in I_0$, $t_2 \in I_1$ such that $t_1 < t_2$; (3) $y_1 \in I_2$; (4) $-\infty < t_1 < \dots < t_4 < \infty$; (5) $t_1, \dots, t_3 \in I_0$, $t_4 \in I_1$ such that $t_1 < \dots < t_4$; (6) $t_1 \in I_0$, $t_2, \dots, t_4 \in I_1$ such that $t_1 < \dots < t_4$; and (7) $y_1, \dots, y_3 \in I_2$ such that $y_1 < \dots < y_3$. Write $\lambda = a - b\,u\,\delta^3$, where $-\infty < u < \infty$. Given a standard Wiener process $W_1$, define
$$W(y) = (a\delta)^{-1/2}\bigl[W_1\{F(x_1) + a\delta y\} - W_1\{F(x_1)\}\bigr].$$
