Annals of Statistics, 27 (1999), 1294-1315.

### MODE TESTING IN DIFFICULT CASES

Ming-Yen Cheng, Peter Hall

Centre for Mathematics and its Applications, Australian National University

### ABSTRACT

Usually, when testing the null hypothesis that a distribution has one mode, against the alternative that it has two, the null hypothesis is interpreted as entailing that the density of the sampling distribution has a unique point of zero slope, which is a local maximum. In this paper we argue that a more appropriate null hypothesis is that the density has two points of zero slope, of which one is a local maximum and the other is a shoulder. We show that when a test for a mode-with-shoulder is properly calibrated, so that it has asymptotically correct level, it is generally conservative when applied to the case of a mode without a shoulder. We suggest methods for calibrating both the bandwidth and dip/excess mass tests in the setting of a mode with a shoulder. We also provide evidence in support of the converse: a test calibrated for a single mode without a shoulder tends to be anticonservative when applied to a mode with a shoulder. The calibration method involves resampling from a 'template' density with exactly one mode and one shoulder. It exploits the following asymptotic factorisation property for both the sample and resample forms of the test statistic: all dependence of these quantities on the sampling distribution cancels asymptotically from their ratio. In contrast to other approaches, the method has very good adaptivity properties.

### KEYWORDS

Bandwidth, bootstrap, calibration, curve estimation, level accuracy, local maximum, shoulder, smoothing, turning point.

### SHORT TITLE

Mode testing.

### AMS SUBJECT CLASSIFICATION

Primary 62G07, Secondary 62G09.

### 1. INTRODUCTION

Testing for modality is one way of finding evidence of sub-populations in the population from which data are drawn. Early tests were often based on parametric mixture models (e.g. Cox 1966), but during the last two decades several nonparametric methods have been developed. They are generally conservative, however, and increasing interest is being shown in ways of calibrating them so that their levels are closer to those prescribed. Heuristically, it is to be expected that improving the level accuracy of a conservative test would lead to increased power.

It is usually necessary to have at least an approximate model for densities $f$ representing the "null hypothesis" that is being tested, since we need to calibrate the test under the null. For example, in the case of testing for unimodality against the alternative of multimodality, the null hypothesis is generally that $f$ has one local maximum, no local minima, and no places of zero gradient that do not correspond to turning points. We shall call this the "classic null hypothesis", $H_0^{\mathrm{class}}$; it is tested against the alternative, $H_1$, that $f$ has two or more modes.

Such alternative hypotheses are generally relatively easy to distinguish from the null, however. We argue that a test of modality will have better performance if it works well against distributions that are 'marginal', or 'most difficult' to tell apart from the null; this is the sense in which we use the term 'difficult' in our paper. The difficult cases are densities that represent the boundary between one and two modes, that is, those where $f$ has one local maximum, no local minima, and exactly one point $x$ for which $f'(x) = 0$ but $x$ is a shoulder point (defined by $f''(x) = 0$ and $f'''(x) \neq 0$) rather than a local maximum or local minimum. We term this the 'boundary null hypothesis', $H_0^{\mathrm{bound}}$. The issue of which null hypothesis is employed determines the type of theory which best describes properties of tests for modality, and affects the tests' level accuracy and power.

Figure 1.1 illustrates some of these issues. Panels (a) and (c) depict densities that are unimodal and bimodal, satisfying $H_0^{\mathrm{class}}$ and $H_1$ respectively, and panel (b) shows a "shoulder" density which in a sense is midway between the other two, and satisfies $H_0^{\mathrm{bound}}$. Intuitively, when an empirical test finds it hard to distinguish between panels (a) and (c), the problem really arises because the test cannot solve the more difficult problem of deciding between panels (b) and (c). To optimise performance in these difficult cases the test should be constructed so that it addresses the harder problem, not the easier one.

[ Put Figure 1.1 about here, please ]

It is helpful to consider the related, parametric problem of testing composite, one-sided hypotheses, of the form $\theta \leq \theta_0$ versus $\theta > \theta_0$, where $\theta$ denotes a scalar parameter. There it is common to construct first a test of the simple null hypothesis, $\theta = \theta_0$, against the alternative hypothesis $\theta > \theta_0$, and then use the same test in the case of the composite one-sided null hypothesis. When the likelihood ratio is monotone, this approach is optimal and gives uniformly most powerful tests; see Kendall and Stuart (1979, Chapter 23). The null hypothesis $\theta = \theta_0$ is more difficult than $\theta < \theta_0$ to distinguish from $\theta > \theta_0$, and the optimal approach is to construct the test in the more difficult case.

In the context of the mode testing problem, $H_0^{\mathrm{bound}}$ represents the simple null hypothesis $\theta = \theta_0$ at the boundary, and $H_0^{\mathrm{class}}$ plays the role of the null hypothesis $\theta < \theta_0$. Following the line suggested in the previous paragraph, we argue that the test should be developed for the more difficult null hypothesis, $H_0^{\mathrm{bound}}$. Section 2.4 establishes that, analogously to the conclusions reached in the previous paragraph for the parametric case, our test is also appropriate for $H_0^{\mathrm{class}}$; Figure 3.3 indicates the conservatism of a test of $H_0^{\mathrm{bound}}$ when applied to $H_0^{\mathrm{class}}$, and Figure 3.4 illustrates the anticonservatism of a test for $H_0^{\mathrm{class}}$ when applied to $H_0^{\mathrm{bound}}$.

In this paper we suggest methods, and develop theory, pertaining to this view of testing for modality. We employ two particular tests as examples, the bandwidth test of Silverman (1981) and the dip/excess mass test of Hartigan and Hartigan (1985) and Müller and Sawitzki (1991). Both involve rejecting the null hypothesis if the test statistic exceeds a certain critical point. For either test we discuss a bootstrap calibration method that produces the asymptotically correct level under $H_0^{\mathrm{bound}}$, and is slightly conservative under $H_0^{\mathrm{class}}$. Related methods, inspired by work of Hartigan (1997), will also be noted. Importantly, the level of the test under $H_0^{\mathrm{class}}$ does not converge to zero as sample size increases, and so the bootstrap procedure is relatively adaptive to both null hypotheses. In comparison, alternative methods for calibrating tests of $H_0^{\mathrm{bound}}$ have a level which converges to zero under $H_0^{\mathrm{class}}$.
Our theoretical description of mode testing under the boundary null hypothesis is in contradistinction to existing accounts in the literature, which seem always to assume the classic null hypothesis. Examples include Silverman (1983), Mammen, Marron and Fisher (1992) and Cheng and Hall (1998). The results in the two cases are quite different, with respect to order of magnitude as well as asymptotic distribution. For example, under $H_0^{\mathrm{class}}$ the critical value for the bandwidth test is of size $n^{-1/5}$, where $n$ is the number of data values (Mammen, Marron and Fisher 1992), but under $H_0^{\mathrm{bound}}$ it is of size $n^{-1/7}$. The analogues for critical points in the case of the dip/excess mass tests are $n^{-3/5}$ and $n^{-4/7}$, respectively. The limiting distributions in the four cases are all different and non-Normal. These facts alone demonstrate that calibration methods developed specifically for $H_0^{\mathrm{class}}$ can be inappropriate for $H_0^{\mathrm{bound}}$, and so can suffer problems when $H_0^{\mathrm{class}}$ is only "just true", unless they have the adaptivity property noted in the previous paragraph.

Specifically, suppose $H_0^{\mathrm{class}}$ is true, but only just true (that is, $H_0^{\mathrm{bound}}$ is "almost" true), and the test is constructed so as to reject the null hypothesis when the test statistic exceeds a critical point whose asymptotic size is appropriate to $H_0^{\mathrm{class}}$. (Therefore, the critical point is of size $n^{-1/5}$ if the bandwidth test is used, and of size $n^{-3/5}$ for the excess mass test.) Then the test will tend to incorrectly reject the null hypothesis, for the simple reason that $n^{-1/5} < n^{-1/7}$ and $n^{-3/5} < n^{-4/7}$. Our adaptive tests based on bootstrap calibration do not suffer from this problem.

Because of the light which these theoretical results shed on the importance of distinguishing between the two types of null hypothesis, we shall discuss our theoretical work first, in Section 2. Section 3 will summarise the results of a simulation study that assesses the performance of our adaptive tests. Section 2.1 will describe alternative, non-adaptive approaches. Technical arguments for Section 2 will be placed into Section 4. For simplicity we shall consider only the case of testing for unimodality. There is no technical difficulty in stating and deriving analogues of our theory for testing the hypothesis of $m$ modes against that of $m + 1$ modes, where $m \geq 1$, although notation becomes rather complex in that case. The versions of our adaptive tests in that general setting seem prohibitively complex, however. In this multimodal setting, recent work of Hartigan (1997) is particularly deserving of mention. There, a novel sequential (in $m$) approach to using the excess mass test is suggested.

### 2. THEORETICAL PROPERTIES OF TEST STATISTICS

2.1. Summary and conclusions. The bandwidth test, which will be introduced and discussed in Section 2.2, involves rejecting the null hypothesis if a critical bandwidth, $\hat h_{\mathrm{crit}}$, is too large; and the dip/excess mass test, to be described in Section 2.3, rejects the null hypothesis if a test statistic $\Delta$ is too large. When the sampling density $f$ satisfies the null hypothesis $H_0^{\mathrm{class}}$, and appropriate regularity conditions hold, $n^{1/5}\hat h_{\mathrm{crit}}$ has a proper limiting distribution that may be written as that of a random variable $C_1 R_1$, where the nonzero constant $C_1$ depends only on $f$, and the distribution of the random variable $R_1$ does not depend on $f$. See Mammen, Marron and Fisher (1992). By way of contrast, we shall point out in Section 2.2 that under $H_0^{\mathrm{bound}}$ and appropriate conditions on $f$, $n^{1/7}\hat h_{\mathrm{crit}} \to C_2 R_2$ in distribution, where (here and below) $C_j$ and $R_j$ have the properties ascribed to $C_1$ and $R_1$ above.

Analogous results hold for the dip/excess mass test, where, under $H_0^{\mathrm{class}}$ and regularity conditions on $f$, $n^{3/5}\Delta \to C_3 R_3$ in distribution (see Cheng and Hall 1998) and, under $H_0^{\mathrm{bound}}$ and regularity conditions, $n^{4/7}\Delta \to C_4 R_4$ in distribution (see Section 2.3).

The formulae for $C_1, \ldots, C_4$ are very different from one another, as too are the distributions of $R_1, \ldots, R_4$. However, in each case the principle is the same: the distribution of the test statistic factorises, asymptotically, into a constant that depends only on $f$ and a random variable whose distribution is continuous and is in principle known. Note particularly that even the order of magnitude of the critical points, let alone the constants $C_j$ and the random variables $R_j$, depends not only on the type of test but also on the particular form of null hypothesis that is chosen.

For both the bandwidth and dip/excess mass tests, the factorisation property may be exploited to construct a test that adapts itself well to either $H_0^{\mathrm{class}}$ or $H_0^{\mathrm{bound}}$. It amounts to computing the ratio of the test statistic (either $\hat h_{\mathrm{crit}}$ or $\Delta$) and its bootstrap form, and rejecting the null hypothesis if the bootstrap distribution of the ratio assumes values that are too large. On account of the factorisation, the unknown constants $C_j$ cancel from the ratio in all four cases, and so the bootstrap distribution function of the ratio (a stochastic process) does not depend asymptotically on any unknowns. Unlike the case of more standard statistical problems (such as percentile-$t$ statistics) where scale parameters cancel, the bootstrap versions of the distributions of the variables $R_j$ are not particularly close to those of the respective $R_j$'s, and so the stochastic process noted just above is not degenerate. Nevertheless, its properties may be determined by Monte Carlo methods, and after suitable calibration it has asymptotically correct level under both $H_0^{\mathrm{bound}}$ and $H_0^{\mathrm{class}}$. Adaptive tests will be introduced in Sections 2.2 (for the bandwidth method) and 2.3 (dip/excess mass method), and Section 2.4 will discuss their properties.

An alternative way to proceed would be to directly estimate whichever of the unknown constants $C_1, \ldots, C_4$ is appropriate to the context (e.g. $C_3$ if we were using the excess mass test under $H_0^{\mathrm{class}}$), use Monte Carlo methods to calculate the distribution of the respective variable $R_j$, and thereby approximate the asymptotic distribution of the test statistic under the null hypothesis. If the bootstrap method described in the previous paragraph is likened to Studentizing so as to cancel the effects of scale, then this approach is similar to using standard asymptotic approximations after "plugging in" an estimate of scale. However, by its very construction the latter approach is highly sensitive to choice of null hypothesis, be it $H_0^{\mathrm{class}}$ or $H_0^{\mathrm{bound}}$, and in particular it does not enjoy the adaptivity of the bootstrap approach. If it is constructed so that it gives an asymptotically correct test under $H_0^{\mathrm{class}}$ [respectively, $H_0^{\mathrm{bound}}$], then the level of the test under $H_0^{\mathrm{bound}}$ [or $H_0^{\mathrm{class}}$] will be 1 [or 0].

Moreover, even if these problems are overcome, it is likely that the bootstrap approach captures at least some of the first-order features of the distribution of the test statistic that a purely asymptotic method misses. In the context of bootstrap versus asymptotic approximations to critical points for Silverman's (1981) bandwidth test, York (1998) has demonstrated this numerically. The bootstrap approach, through taking the resample size equal to the sample size, $n$, offers a significantly better approximation than does taking $n = \infty$, even if the template density is not the true density.

2.2. Bandwidth test. To introduce the test, let $\mathcal{X} = \{X_1, \ldots, X_n\}$ denote a random sample drawn from a distribution with unknown density $f$, and construct the kernel estimator
$$\hat f_h(x) = (nh)^{-1}\sum_{i=1}^n K\Bigl(\frac{x - X_i}{h}\Bigr), \qquad (2.1)$$
where $h$ is a bandwidth and $K$ a kernel function. As in Silverman (1981) we take $K$ to be the standard Normal density, for which the number of modes of $\hat f_h$ on the whole line is a nonincreasing function of $h$. Furthermore, $\hat f_h$ is unimodal for all sufficiently large $h$. Let $\hat h_{\mathrm{crit}}$ denote the infimum of bandwidths such that $\hat f_h$ has only one mode. A test of the null hypothesis of unimodality consists of rejecting unimodality if $\hat h_{\mathrm{crit}}$ is too large.

Mammen, Marron and Fisher (1992) proved that under $H_0^{\mathrm{class}}$, and assuming appropriate regularity conditions on $f$, $\hat h_{\mathrm{crit}}$ is of size $n^{-1/5}$. We show next that it is of size $n^{-1/7}$ under $H_0^{\mathrm{bound}}$. First we state an analogue of Mammen, Marron and Fisher's (1992) regularity conditions (corresponding also to the conditions of Silverman (1983)) in the case of $H_0^{\mathrm{bound}}$:

$f$ is supported on a compact interval $[a, b]$, and has two derivatives there; $f' = 0$ at distinct points $x_0, x_1 \in (a, b)$, and $f' \neq 0$ at all other points in $(a, b)$; $f$ has respectively two and three Hölder-continuous derivatives in neighbourhoods of $x_0$ and $x_1$;
$$f''(x_0) < 0, \quad f''(x_1) = 0, \quad f'''(x_1) \neq 0, \quad f'(a+) > 0, \quad f'(b-) < 0. \qquad (2.2)$$

For $0 < r < \infty$ and $-\infty < s < \infty$, define
$$Z(r, s) = r^{-4}\int K''(s + u)\,W(ru)\,du + \tfrac12(1 + s^2),$$
where $W$ is a standard Wiener process. Put $C_2 = \{f(x_1)/|f'''(x_1)|^2\}^{1/7}$, where $x_1$ is the shoulder point noted in (2.2), and let $R_2$ denote the infimum of all values of $r$ such that the function $Z(r, \cdot)$ does not change sign on $(-\infty, \infty)$. (In view of total positivity properties of $K''$ (see Schoenberg, 1950), if $Z(r, \cdot)$ does not change sign on $(-\infty, \infty)$ then, with probability 1, neither does $Z(r', \cdot)$ for any $r' > r$.)
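
Although $R_2$ is defined through a functional of a Wiener process, it can be approximated by direct simulation. The sketch below is an independent discretisation (finite grids for $r$ and $s$, a Riemann sum for the integral, a random-walk approximation to $W$; none of these choices come from the paper): it returns the smallest $r$ on the grid at which $Z(r, \cdot)$ is everywhere positive, which suffices because $\tfrac12(1+s^2)$ forces $Z > 0$ for large $|s|$.

```python
import numpy as np

def simulate_R2(rng, dt=0.01, du=0.05):
    """One approximate draw of R_2 = inf{r : Z(r, .) does not change sign}.

    Z(r, s) = r^{-4} * Int K''(s + u) W(ru) du + (1 + s^2)/2, with K the
    standard Normal density, so K''(t) = (t^2 - 1) * phi(t).  All grids
    below are discretisation choices, not part of the definition.
    """
    r_grid = np.linspace(0.2, 5.0, 49)
    s_grid = np.linspace(-8.0, 8.0, 161)
    w = np.arange(-6.0, 6.0 + du, du)          # effective support of K''
    K2 = (w**2 - 1.0) * np.exp(-0.5 * w**2) / np.sqrt(2.0 * np.pi)

    # two-sided Wiener path, fine enough for all arguments r*(w - s)
    tmax = r_grid[-1] * (w.max() + abs(s_grid).max())
    m = int(np.ceil(tmax / dt)) + 1
    pos = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), m))])
    neg = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), m))])

    def W(t):
        idx = np.minimum(np.abs(t / dt).astype(int), m)
        return np.where(t >= 0.0, pos[idx], neg[idx])

    for r in r_grid:
        # substitute v = s + u, so the integral becomes Int K''(v) W(r(v-s)) dv
        args = r * (w[None, :] - s_grid[:, None])
        Z = r**-4 * (K2[None, :] * W(args)).sum(axis=1) * du \
            + 0.5 * (1.0 + s_grid**2)
        if np.all(Z > 0.0):                    # no sign change in s
            return r
    return np.inf
```

Repeating this over many Wiener paths gives a Monte Carlo approximation to the distribution of $R_2$.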

### Theorem 2.1.

Assume condition (2.2). Then $n^{1/7}\hat h_{\mathrm{crit}} \to C_2 R_2$ in distribution as $n \to \infty$.

We should comment on the nature of condition (2.2), which asks that $f$ decrease linearly to zero at the ends of its support. This ensures that the likelihood of spurious bumps in the tails of the density estimator $\hat f_h$ is very small. Therefore, the size of $\hat h_{\mathrm{crit}}$ is determined by properties of $f$ at points of zero slope interior to $(a, b)$. More generally, when $f$ might not satisfy (2.2), one would either confine attention to testing for unimodality away from the tails, or use larger bandwidths in the tails so as to suppress bumps that arise from data sparseness.

Next we define the bootstrap version of $\hat h_{\mathrm{crit}}$, and show that it satisfies a limit law similar to that in Theorem 2.1. Conditional on $\mathcal{X}$, let $\mathcal{X}^* = \{X_1^*, \ldots, X_n^*\}$ denote a resample drawn randomly, with replacement, from the distribution with density $\hat f_{\mathrm{crit}} = \hat f_{\hat h_{\mathrm{crit}}}$, and define $\hat f_h^*$ by (2.1), except that $X_i$ there is replaced by $X_i^*$. Write $\hat h^*_{\mathrm{crit}}$ for the infimum of bandwidths such that $\hat f_h^*$ is unimodal.

Our proof of Theorem 2.1 in Section 4 will involve constructing $W$ (depending on $n$) such that
$$n^{1/7}\hat h_{\mathrm{crit}} \to C_2 R_2 \quad\text{in probability.} \qquad (2.3)$$
For this $W$, let $W^*$ be a standard Wiener process independent of $W$, and let $S$ be the unique point at which $Z(R_2, \cdot)$ vanishes. Define
$$Z^*(r, s) = (rR_2)^{-2}\int K''(s + u)\,W^*(ru)\,du + \int Z(R_2,\, S - R_2^{-1}ru)\,K(u)\,du,$$
and let $R_2^*$ denote the infimum of all values of $r$ such that the function $Z^*(r, \cdot)$ does not change sign on $(-\infty, \infty)$. It is straightforward to prove that $R_2^*$ is strictly positive with probability 1.

### Theorem 2.2.

Assume condition (2.2), and that $W$ is constructed so that (2.3) holds. Then
$$\sup_{0 \leq x < \infty}\bigl|P\bigl(n^{1/7}\hat h^*_{\mathrm{crit}} \leq C_2 x \mid \mathcal{X}\bigr) - P(R_2^* \leq x \mid W)\bigr| \to 0$$
in probability as $n \to \infty$.

Theorem 2.2 and (2.3) together imply that, under $H_0^{\mathrm{bound}}$,
$$\sup_{0 \leq x < \infty}\bigl|P\bigl(\hat h^*_{\mathrm{crit}}/\hat h_{\mathrm{crit}} \leq x \mid \mathcal{X}\bigr) - P(R_2^*/R_2 \leq x \mid W)\bigr| \to 0 \qquad (2.4)$$
in probability. It follows that the distribution of the stochastic process $\hat G(x) = P(R_2^*/R_2 \leq x \mid W)$ does not depend on $f$, which makes it possible to develop an asymptotically correct test of $H_0^{\mathrm{bound}}$. This could be based on tabulation of the distribution of $\hat G$, and applying an asymptotic test, but alternatively it may be accomplished by Monte Carlo methods, as follows. Put $\hat G_n(x) = P(\hat h^*_{\mathrm{crit}}/\hat h_{\mathrm{crit}} \leq x \mid \mathcal{X})$, let $f_0$ denote a "template" density with a shoulder, and let $\hat G_n^0$ denote the version of $\hat G_n$ that results from an $n$-sample drawn randomly from $f_0$. Using Monte Carlo methods we may compute to arbitrary accuracy the value of a constant $t_\alpha = t_\alpha(n)$ such that $P\{\hat G_n^0(t_\alpha) \leq 1 - \alpha\} = \alpha$, where $\alpha$ is the desired significance level of the test. Then the test with the form: reject $H_0^{\mathrm{bound}}$ in favour of $H_1$ if $\hat G_n(t_\alpha) \leq 1 - \alpha$, has asymptotically correct level under $H_0^{\mathrm{bound}}$.
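
This Monte Carlo recipe can be made concrete in a few lines. In the sketch below the helper names (`bootstrap_ratios`, `calibrate_t`, `shoulder_template`) are invented for illustration, resampling from the density $\hat f_{\mathrm{crit}}$ is implemented as a smoothed bootstrap (resample the data, then add $N(0, \hat h_{\mathrm{crit}}^2)$ noise), and the Monte Carlo sizes are far smaller than the 500 samples and resamples used in Section 3.

```python
import numpy as np

def _n_modes(data, h, grid_size=256):
    """Local maxima of the Gaussian KDE with bandwidth h (grid approximation;
    the normalising constant is dropped since it cannot affect mode counts)."""
    x = np.linspace(data.min() - 3 * h, data.max() + 3 * h, grid_size)
    u = (x[:, None] - data[None, :]) / h
    f = np.exp(-0.5 * u**2).sum(axis=1)
    d = np.diff(f)
    return int(np.sum((d[:-1] > 0) & (d[1:] <= 0)))

def _h_crit(data, lo=1e-3, tol=1e-3):
    """Critical bandwidth by bisection (mode count is monotone in h)."""
    hi = 2.0 * data.std()
    while _n_modes(data, hi) > 1:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if _n_modes(data, mid) > 1 else (lo, mid)
    return hi

def bootstrap_ratios(data, B, rng):
    """B draws of h*_crit / h_crit, resampling via the smoothed bootstrap."""
    h = _h_crit(data)
    out = np.empty(B)
    for b in range(B):
        star = rng.choice(data, size=len(data)) + h * rng.normal(size=len(data))
        out[b] = _h_crit(star) / h
    return out

def calibrate_t(template_sampler, n, alpha, M, B, rng):
    """Approximate t_alpha: for each template sample, the largest t with
    G0_n(t) <= 1 - alpha is the (1 - alpha) quantile of its ratios; t_alpha
    is then chosen so that P{G0_n(t_alpha) <= 1 - alpha} is about alpha."""
    qs = [np.quantile(bootstrap_ratios(template_sampler(n, rng), B, rng),
                      1.0 - alpha) for _ in range(M)]
    return float(np.quantile(qs, 1.0 - alpha))

def shoulder_template(n, rng):
    """Hypothetical stand-in for the template f0: a Normal mixture with a
    shoulder, in the spirit of (3.3)."""
    k = rng.random(n) < 1.0 / 17.0
    return np.where(k, rng.normal(-1.25, 0.25, n), rng.normal(0.0, 1.0, n))
```

With real data, one would then reject $H_0^{\mathrm{bound}}$ at level $\alpha$ when the empirical $\hat G_n(t_\alpha)$, here `np.mean(bootstrap_ratios(data, B, rng) <= t)`, is at most $1 - \alpha$.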

One would expect the template approach to capture second-order effects better than a purely asymptotic argument. This may be confirmed by simulation. To capture second-order effects even more accurately one could use a skewed template (for example) if there was evidence that the sampling distribution was skewed, although it is difficult to ensure both the right degree of skewness and the right value of $C_2$.
2.3. Dip/excess mass test. It suffices to consider the excess mass test statistic, $\Delta$, which equals twice the dip test statistic. Let $\hat F$ be the empirical distribution function of the $n$-sample $\mathcal{X}$ introduced in Section 2.2, and for $m \geq 1$ and $\lambda > 0$ define
$$E_{n,m}(\lambda) = \sup_{C_1, \ldots, C_m}\sum_{j=1}^m\bigl\{\hat F(C_j) - \lambda\|C_j\|\bigr\},$$
where the supremum is over disjoint intervals $C_1, \ldots, C_m$, $\hat F(C)$ is the $\hat F$-measure of $C$, and $\|C\|$ equals the length of $C$. Put $D_{n,m}(\lambda) = E_{n,m}(\lambda) - E_{n,m-1}(\lambda)$ and $\Delta = \sup_\lambda D_{n,2}(\lambda)$. We reject the null hypothesis of unimodality if $\Delta$ is too large.
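
Because the optimising intervals may be taken with endpoints at order statistics, $E_{n,1}$ and $E_{n,2}$ are computable by finite search, and $\Delta$ follows by maximising over $\lambda$. The sketch below is an illustrative implementation, not the authors' algorithm: it costs $O(n^2)$ per $\lambda$, via a suffix-maximum sweep for the best disjoint pair, and the $\lambda$ grid is a heuristic.

```python
import numpy as np

def excess_mass_delta(data, n_lambda=50):
    """Delta = sup_lambda {E_{n,2}(lambda) - E_{n,1}(lambda)}.

    Candidate intervals are [X_(i), X_(j)], with empirical excess mass
    g(i, j) = (j - i + 1)/n - lambda * (X_(j) - X_(i)).  E_{n,1} is the best
    single interval; E_{n,2} pairs each interval with the best disjoint
    interval to its right.  The lambda grid is an ad hoc choice.
    """
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    lambdas = np.linspace(0.05, 2.0, n_lambda) / (x[-1] - x[0])
    i_idx = np.arange(n)[:, None]
    j_idx = np.arange(n)[None, :]
    counts = (j_idx - i_idx + 1) / n
    widths = x[None, :] - x[:, None]
    delta = 0.0
    for lam in lambdas:
        g = np.where(j_idx >= i_idx, counts - lam * widths, -np.inf)
        e1 = g.max()
        start_best = g.max(axis=1)                 # best interval starting at i
        suffix = np.maximum.accumulate(start_best[::-1])[::-1]
        end_best = g.max(axis=0)                   # best interval ending at j
        # pair each right endpoint j with the best interval starting at >= j+1
        e2 = max(e1, (end_best[:-1] + suffix[1:]).max())
        delta = max(delta, e2 - e1)
    return delta
```

On clearly bimodal data the statistic is large, since a second interval recovers the mass lost to the dip between the modes; on unimodal data it is small, reflecting only sampling noise.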

Cheng and Hall (1998) established that under $H_0^{\mathrm{class}}$, $\Delta$ is of size $n^{-3/5}$. We show next that under $H_0^{\mathrm{bound}}$ it is of size $n^{-4/7}$, for which purpose we augment (2.2) by the condition:

$f'$ is Hölder-continuous within a neighbourhood of the unique point $x_2 \neq x_1$ satisfying $f(x_2) = f(x_1)$. $\qquad$ (2.5)

Let $W$ be as in Section 2.2, and define $C_4 = \{f(x_1)^4/|f'''(x_1)|\}^{1/7}$, $\omega(t_1, t_2, u) = \{W(t_1) - W(t_2)\} - (t_2^4 - t_1^4) - u(t_2 - t_1)$ and
$$R_4 = 24^{1/7}\sup_{-\infty < u < \infty}\Bigl[\sup_{-\infty < t_1 < t_2 < t_3 < \infty}\bigl\{\omega(0, t_1, u) + \omega(t_2, t_3, u)\bigr\} - \sup_{-\infty < t_1 < \infty}\omega(0, t_1, u)\Bigr]. \qquad (2.6)$$
It may be proved that $R_4$ is finite and positive with probability one, and that its distribution has no atoms.

### Theorem 2.3.

Assume conditions (2.2) and (2.5). Then $n^{4/7}\Delta \to C_4 R_4$ in distribution as $n \to \infty$.

The bootstrap setting for Theorem 2.3 is similar to that for Theorem 2.1. Let $\Delta^*$ be the bootstrap version of $\Delta$, computed using the resample $\mathcal{X}^*$ drawn by sampling from the distribution with density $\hat f_{\mathrm{crit}}$. For a suitable construction of $W$, Theorem 2.3 may be stated in the stronger sense that $n^{4/7}\Delta \to C_4 R_4$ in probability. We assume this construction below. Let $W^*$ be another Wiener process, independent of $W$; define
$$U(r, s) = r^{-4}\int K''(s + u)\,W(ru)\,du;$$
let $R$ denote the infimum of all $r > 0$ such that $U(r, s) + \frac12(1 + s^2)$, as a function of $s$, does not change sign on the real line; and let $S$ be the unique point at which $U(R, s) + \frac12(1 + s^2)$ vanishes. Put
$$\Psi(y_1, y_2, u) = W^*(y_1) - W^*(y_2) - R^2\int_0^1 t\Bigl[y_2^2\,U\bigl(R,\, S + R^{-1}(1 - t)y_2\bigr) - y_1^2\,U\bigl(R,\, S + R^{-1}(1 - t)y_1\bigr)\Bigr]\,dt - \tfrac12\bigl(1 + S^2\bigr)\bigl(y_2^2 - y_1^2\bigr) - \tfrac16 RS\bigl(y_2^3 - y_1^3\bigr) - \tfrac1{24}\bigl(y_2^4 - y_1^4\bigr) - u(y_2 - y_1),$$
and, with $\Psi/24^{1/7}$ replacing $\omega$, define $R_4^*$ by (2.6). With probability one, $R_4^*$ is finite and positive, and its distribution has no atoms.

### Theorem 2.4.

Assume conditions (2.2) and (2.5), and that $W$ is constructed so that $n^{4/7}\Delta \to C_4 R_4$ in probability. Then
$$\sup_{0 \leq x < \infty}\bigl|P\bigl(n^{4/7}\Delta^* \leq C_4 x \mid \mathcal{X}\bigr) - P(R_4^* \leq x \mid W)\bigr| \to 0$$
in probability as $n \to \infty$.

Theorem 2.4 is directly analogous to Theorem 2.2, and implies the obvious analogue of (2.4):
$$\sup_{0 \leq x < \infty}\bigl|P\bigl(\Delta^*/\Delta \leq x \mid \mathcal{X}\bigr) - P(R_4^*/R_4 \leq x \mid W)\bigr| \to 0. \qquad (2.7)$$
Therefore, bootstrap calibration applied to the ratio $\Delta^*/\Delta$ produces tests of $H_0^{\mathrm{bound}}$ with asymptotically correct level. Specifically, if $f_0$ is the template density introduced in Section 2.2, if $\hat H_n(x) = P(\Delta^*/\Delta \leq x \mid \mathcal{X})$, if $\hat H_n^0$ is the version of $\hat H_n$ when the $n$-sample is drawn from $f_0$ rather than $f$, and if the constant $u_\alpha$ is defined by $P\{\hat H_n^0(u_\alpha) \leq 1 - \alpha\} = \alpha$, then the test which rejects $H_0^{\mathrm{bound}}$ if $\hat H_n(u_\alpha) \leq 1 - \alpha$ has asymptotically correct level under $H_0^{\mathrm{bound}}$.

Hartigan (1997) has suggested an asymptotic test based on the results in Theorem 2.4, normalising the test statistic using the square root of the number of data values interior to the shoulder segment. If one calibrates via the asymptotic distribution then this ingenious approach avoids using the template density. In order to better capture second-order effects, however, one could compute the template density and then, simulating from that distribution (taking the Monte Carlo sample size equal to the actual sample size), compute an approximation to the distribution of the test statistic under the null hypothesis.

2.4. Adaptivity of bootstrap calibration methods. The factorisation which forms the basis for our bootstrap calibration method is also valid under $H_0^{\mathrm{class}}$, where instead of (2.4) and (2.7) it produces results of the form:
$$\sup_{0 \leq x < \infty}\bigl|P\bigl(\hat h^*_{\mathrm{crit}}/\hat h_{\mathrm{crit}} \leq x \mid \mathcal{X}\bigr) - P(R_1^*/R_1 \leq x \mid W)\bigr| \to 0, \qquad (2.8)$$
$$\sup_{0 \leq x < \infty}\bigl|P\bigl(\Delta^*/\Delta \leq x \mid \mathcal{X}\bigr) - P(R_3^*/R_3 \leq x \mid W)\bigr| \to 0. \qquad (2.9)$$
A suitable regularity condition for each of these results is the following version of (2.2), where the shoulder point $x_1$ is no longer permitted, thereby ensuring that $H_0^{\mathrm{class}}$ (rather than $H_0^{\mathrm{bound}}$) obtains:

$f$ is supported on a compact interval $[a, b]$, and has two derivatives there; $f' = 0$ at $x_0 \in (a, b)$, and $f' \neq 0$ at all other points in $(a, b)$; $f$ has two Hölder-continuous derivatives in a neighbourhood of $x_0$; $f''(x_0) < 0$, $f'(a+) > 0$, $f'(b-) < 0$.

Result (2.8) is discussed in an ANU PhD thesis by M. York (1998), and (2.9) appears in Cheng and Hall (1996). As in the case of $R_2$ and $R_4$, the variables $R_1$ and $R_3$ are functionals of a standard Wiener process $W$; $R_1^*$ and $R_3^*$ are functionals of $W$ and an independent Wiener process $W^*$; and all variables $R_j$ and $R_j^*$ have continuous distributions. It follows from (2.8) and (2.9) that if $H_0^{\mathrm{class}}$ holds instead of $H_0^{\mathrm{bound}}$, yet we apply the bootstrap test suggested when $H_0^{\mathrm{bound}}$ is valid, the asymptotic level of the test lies strictly between 0 and 1. In this sense, the tests suggested in Sections 2.2 and 2.3 are adaptive; other approaches to calibration, such as that discussed towards the end of Section 2.1, do not enjoy this property. Moreover, bootstrap calibration under $H_0^{\mathrm{bound}}$ turns out to be conservative when $H_0^{\mathrm{class}}$ is true, as we shall show in the next section.

### 3. NUMERICAL STUDY

The bandwidth and dip/excess mass tests for $H_0^{\mathrm{bound}}$ were applied to three Normal mixture densities: the two unimodal-with-shoulder densities given by
$$8e^{9/8}\bigl(1 + 8e^{9/8}\bigr)^{-1}\,N(0, 1) + \bigl(1 + 8e^{9/8}\bigr)^{-1}\,N\bigl(-9\sqrt{3}/8,\ 0.0625\bigr), \qquad (3.1)$$
$$(100/109)\,N(0, 1) + (9/109)\,N(1.3,\ 0.09), \qquad (3.2)$$
illustrated in panels (a) and (b), respectively, of Figure 3.1; and the unimodal-without-shoulder standard Normal density, depicted in panel (d) of that figure. In all cases the bandwidth and dip/excess mass tests for $H_0^{\mathrm{bound}}$ were calibrated using the methods suggested in Sections 2.2 and 2.3. The template density $f_0$ employed for calibration was taken as
$$(16/17)\,N(0, 1) + (1/17)\,N(-1.25,\ 0.0625), \qquad (3.3)$$
and is unimodal with a shoulder. It is illustrated in panel (c) of Figure 3.1.
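
Reproducing such a study requires only drawing from, and evaluating, two-component Normal mixtures. A minimal sketch follows; the arrays `w0`, `mu0`, `sd0` encode the template (3.3), noting that 0.0625 there is a variance, so the component standard deviation is 0.25.

```python
import numpy as np

def sample_mixture(n, weights, means, sds, rng):
    """Draw n points: pick a component per point, then sample from it."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(np.asarray(means)[comp], np.asarray(sds)[comp])

def mixture_density(x, weights, means, sds):
    """Weighted sum of component Normal densities."""
    z = (np.asarray(x, dtype=float)[..., None] - np.asarray(means)) / np.asarray(sds)
    phi = np.exp(-0.5 * z**2) / (np.asarray(sds) * np.sqrt(2.0 * np.pi))
    return (np.asarray(weights) * phi).sum(axis=-1)

# template density (3.3): (16/17) N(0,1) + (1/17) N(-1.25, 0.0625)
w0, mu0, sd0 = [16 / 17, 1 / 17], [0.0, -1.25], [1.0, 0.25]
```

The same two functions serve for (3.1) and (3.2) with the appropriate weight, mean and standard-deviation arrays.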

[ Put Figure 3.1 about here, please ]

The sample sizes used were 50 and 100. In each setting, 500 samples were simulated and, conditional on each of these, 500 resamples were drawn. Then all the required conditional and unconditional probabilities were approximated by their corresponding empirical values. To obtain values of $\hat h_{\mathrm{crit}}$ and $\hat h^*_{\mathrm{crit}}$, kernel density estimates were computed over an equally-spaced grid of 512 points. To avoid problems arising from data sparseness in the tails, only modes that occurred within 1.5 standard deviations of the mean were counted. The same rule was followed when evaluating the dip/excess mass statistics.

Figure 3.2 illustrates the actual versus nominal levels when the two tests for $H_0^{\mathrm{bound}}$ (calibrated using the density at (3.3) as the template) were applied to data generated from the two shoulder densities given by (3.1) and (3.2), respectively. Note that the actual versus nominal curves are close to the diagonal line, especially in the cases illustrated by panels (b), (c) and (d). This indicates that both tests have accurate levels. The figure also suggests that, overall, the excess mass test has better level accuracy than the bandwidth test.

[ Put Figure 3.2 about here, please ]

Figure 3.3 depicts, for both the bandwidth and dip/excess mass tests, the actual versus nominal levels when the true density is standard Normal and the shoulder density $f_0$ is used to provide calibration. Note particularly that all the curves always lie below the diagonal line, illustrating the conservatism of a method calibrated for $H_0^{\mathrm{bound}}$ when it is applied to test $H_0^{\mathrm{class}}$.

[ Put Figure 3.3 about here, please ]

Figure 3.4 is essentially the obverse of Figure 3.3: in the latter, the sampling density was standard Normal, and we calibrated using $f_0$; but in Figure 3.4 the sampling density is $f_0$ and we calibrate using the standard Normal. The fact that the dashed and dotted lines in both panels of Figure 3.4 lie above the diagonal line demonstrates that, as expected, calibrating a test of $H_0^{\mathrm{bound}}$ using a template for $H_0^{\mathrm{class}}$ results in an anticonservative procedure.

[ Put Figure 3.4 about here, please ]

### 4. TECHNICAL ARGUMENTS

4.1. Proof of Theorem 2.1. Let $\delta = n^{-1/7}$ and write $C$, $R$ for $C_2$, $R_2$, respectively. We shall prove that

there exist $\epsilon_1, \epsilon_2 > 0$ such that, if $\hat h_{\mathrm{crit}} = \hat h_{\mathrm{crit}}(\epsilon_1, \epsilon_2)$ is re-defined to be the supremum of the set $\mathcal{H}$ of values $h \leq n^{-(1/7) + \epsilon_1}$ such that $\hat f(\cdot \mid h)$ has at least one turning point in $\mathcal{I}(\epsilon_2) = (x_1 - \delta n^{\epsilon_2},\ x_1 + \delta n^{\epsilon_2})$, then with probability tending to one, $\mathcal{H}$ is nonempty and $n^{1/7}\hat h_{\mathrm{crit}}$ has the claimed limit distribution. $\qquad$ (4.1)

Arguments similar to those of Mammen, Marron and Fisher (1992) may be employed to prove that (a) for each $\epsilon_1 \in (0, 1/7)$, the probability that for some $h \geq n^{-(1/7) + \epsilon_1}$ the function $\hat f(\cdot \mid h)$ has more than one turning point in $\mathbb{R}$ converges to 0; (b) for each $c > 0$ and $\epsilon_2 > 0$, the probability that for some $h > cn^{-1/7}$ the function $\hat f(\cdot \mid h)$ has more than one turning point in $\mathbb{R} \setminus \mathcal{I}(\epsilon_2)$ converges to 0; and (c) with probability 1, $\hat f(\cdot \mid h)$ has at least one turning point in $\mathcal{I}(\epsilon_2)$ for each $h < \hat h_{\mathrm{crit}}$. The theorem follows from (4.1) and (a)-(c).

The embedding of Komlós, Major and Tusnády (1975) ensures the existence of a standard Wiener process $W_1$ such that, with $W_0(t) = W_1(t) - tW_1(1)$, the empirical distribution function $\hat F$ of $\mathcal{X}$ may be written as $\hat F(x) = F(x) + n^{-1/2}W_0\{F(x)\} + O_p(n^{-1}\log n)$ uniformly in $x$. It follows that
$$\hat f'(x \mid h) - E\hat f'(x \mid h) = -\bigl(n^{1/2}h^2\bigr)^{-1}\int\bigl[W_1\{F(x - hz)\} - W_1\{F(x_1)\}\bigr]K''(z)\,dz + O_p\bigl\{(nh^2)^{-1}\log n\bigr\}$$
uniformly in $-\infty < x < \infty$ and $h > 0$. Writing $x = x_1 + \delta y$ and $h = r_1\delta$, and using standard results on the modulus of continuity of a Wiener process, we deduce that if $\epsilon_1, \epsilon_2 > 0$ are sufficiently small then for some $\epsilon_3 > 0$,
$$\hat f'(x_1 + \delta y \mid r_1\delta) - E\hat f'(x_1 + \delta y \mid r_1\delta) = -\bigl(n^{1/2}\delta^2 r_1^2\bigr)^{-1}\int\bigl[W_1\{F(x_1) + \delta(y - r_1 z)f(x_1)\} - W_1\{F(x_1)\}\bigr]K''(z)\,dz + O_p\bigl(\delta^2 n^{-\epsilon_3}r_1^{-2}\bigr)$$
uniformly in $0 < r_1 \leq \mathrm{const.}\,n^{\epsilon_1}$ and $|y| \leq \mathrm{const.}\,n^{\epsilon_1}$, for all values of the constants.

Therefore, dening

W^{2}(t) =^{;f}f(x^{1})^{g}^{;1}^{=}^{2}^{}W^{1}^{f}F(x^{1}) + f(x^{1})t^{g}^{;}W^{1}^{f}F(x^{1})^{g}^{}
we nd that, uniformly in the same values of r^{1} and y,

^{;2}r^{1}^{2}^{}f^{^}^{0}(x^{1} +y^{j}r^{1})^{;}Ef^{^}^{0}(x^{1}+y^{j}r^{1})^{}

=f(x^{1})^{1}^{=}^{2}^{Z} W^{2}(y^{;}r^{1}z)K^{00}(z)dz+Op^{;}n^{;}^{}^{3}^{}: (4:2)
Using the fact that $f''$ is Hölder continuous in a neighbourhood of $x_1$ we see that, for $\epsilon_1,\epsilon_2,\epsilon_3>0$ chosen sufficiently small,
\[
E\hat f'(x_1+\delta y\mid\delta r_1)=\int f'\{x_1+\delta(y-r_1z)\}K(z)\,dz
=\tfrac12\delta^2\bigl(y^2+r_1^2\bigr)f'''(x_1)+O\bigl\{\delta^2\bigl(y^2+r_1^2\bigr)n^{-\epsilon_3}\bigr\} \tag{4.3}
\]
uniformly in $0<r_1\le\mathrm{const}\cdot n^{\epsilon_1}$ and $|y|\le\mathrm{const}\cdot n^{\epsilon_2}$. Combining (4.2) and (4.3) we deduce that

\[
\hat f'(x_1+\delta y\mid\delta r_1)=\delta^2\Bigl[r_1^{-2}f(x_1)^{1/2}\int W_2(y-r_1z)K''(z)\,dz
+\tfrac12\bigl(y^2+r_1^2\bigr)f'''(x_1)+O_p\bigl\{\bigl(r_1^{-2}+y^2+r_1^2\bigr)n^{-\epsilon_3}\bigr\}\Bigr] \tag{4.4}
\]
uniformly in $0<r_1\le\mathrm{const}\cdot n^{\epsilon_1}$ and $|y|\le\mathrm{const}\cdot n^{\epsilon_2}$.
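The bias expansion (4.3) can be checked directly in the idealised case where $f'$ is exactly quadratic near the shoulder, $f'(u)=\tfrac12 f'''(x_1)(u-x_1)^2$, and the kernel is Gaussian (so that $\int z^2K(z)\,dz=1$, the normalisation under which the coefficient of $r_1^2$ is 1). All numerical values below are hypothetical:

```python
import numpy as np

x1, f3 = 0.0, 6.0                                  # shoulder at x1, with f'''(x1) = 6 (hypothetical)
fprime = lambda u: 0.5 * f3 * (u - x1) ** 2        # local form: f'(x1) = f''(x1) = 0
K = lambda z: np.exp(-0.5 * z * z) / np.sqrt(2 * np.pi)   # Gaussian kernel, int z^2 K(z) dz = 1

delta, y, r1 = 0.05, 1.3, 0.7
z = np.linspace(-10.0, 10.0, 20001)
g = fprime(x1 + delta * (y - r1 * z)) * K(z)
lhs = np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(z))  # trapezoidal quadrature of (4.3)'s integral
rhs = 0.5 * delta ** 2 * (y ** 2 + r1 ** 2) * f3   # leading term of (4.3)
print(lhs, rhs)   # agree (exactly, up to quadrature error, since f' is exactly quadratic here)
```

With a general kernel the $r_1^2$ coefficient picks up the factor $\int z^2K(z)\,dz$, which the unit-variance normalisation makes 1.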

Let $T=\operatorname{sgn}\{f'''(x_1)\}$, $C=\{f(x_1)/|f'''(x_1)|^2\}^{1/7}$, $C'=\{f(x_1)^2|f'''(x_1)|^3\}^{1/7}$, $y=Crs$, $r_1=Cr$ and $W_2(Ct)=C^{1/2}TW(-t)$. Then $W$ is a standard Wiener process, and (4.4) implies that for different values of $\epsilon_1,\epsilon_2,\epsilon_3>0$, chosen sufficiently small,
\[
\begin{split}
\hat f'(x_1+\delta Crs\mid\delta Cr)&=\delta^2C'T\Bigl[r^{-2}\int W\{r(z-s)\}K''(z)\,dz
+\tfrac12 r^2(1+s^2)+O_p\bigl[\{r^{-2}+r^2(1+s^2)\}n^{-\epsilon_3}\bigr]\Bigr]\\
&=\delta^2C'Tr^2\bigl[Z(r,s)+O_p\bigl\{\bigl(r^{-4}+1+s^2\bigr)n^{-\epsilon_3}\bigr\}\bigr]
\end{split}
\tag{4.5}
\]
uniformly in $0<r\le\mathrm{const}\cdot n^{\epsilon_1}$ and $|s|\le\mathrm{const}\cdot n^{\epsilon_2}$. Result (4.1) follows from this formula.
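The critical bandwidth $\hat h_{\mathrm{crit}}$ appearing throughout is the smallest $h$ for which $\hat f(\cdot\mid h)$ has a single turning point. For the Gaussian kernel the number of modes is non-increasing in $h$ (Silverman's monotonicity property), so $\hat h_{\mathrm{crit}}$ can be located by bisection. A rough numerical sketch, not the paper's calibration procedure; the sample, grid and initial bracket are ours:

```python
import numpy as np

def n_modes(x, h, grid):
    """Local maxima of the (unnormalised) Gaussian-kernel estimate on a grid;
    normalisation does not affect the mode count."""
    g = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2).sum(axis=1)
    return int(((g[1:-1] > g[:-2]) & (g[1:-1] > g[2:])).sum())

rng = np.random.default_rng(2)
x = np.sort(np.concatenate([rng.normal(-2, 1, 150), rng.normal(2, 1, 150)]))
grid = np.linspace(x[0] - 1, x[-1] + 1, 2000)

h_lo, h_hi = 0.05, 5.0          # multimodal at h_lo, unimodal at h_hi
for _ in range(40):             # bisect for the critical bandwidth
    h = 0.5 * (h_lo + h_hi)
    if n_modes(x, h, grid) == 1:
        h_hi = h
    else:
        h_lo = h
print(h_hi)                     # numerical approximation to the critical bandwidth
```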

4.2. Proof of Theorem 2.2. We give the proof only in outline, noting the analogues of steps in the proof of Theorem 2.1 and not pausing to give detailed bounds for remainder terms. In the derivation of Theorem 2.1 we should replace $(\hat f(\cdot\mid h),f)$ by $(\hat f^*(\cdot\mid h),\hat f_{\mathrm{crit}})$. Let $\hat x_1$ denote the shoulder of $\hat f_{\mathrm{crit}}$. (Thus, $\hat f'_{\mathrm{crit}}(\hat x_1)=\hat f''_{\mathrm{crit}}(\hat x_1)=0$.) In place of (4.2) we have, conditional on $\mathcal{X}$ and for a standard Wiener process $W_2^*$ independent of $W$,

\[
\delta^{-2}r_1^2\bigl[\hat f^{*\prime}(\hat x_1+\delta y\mid\delta r_1)-E\bigl\{\hat f^{*\prime}(\hat x_1+\delta y\mid\delta r_1)\mid\mathcal{X}\bigr\}\bigr]
=f(x_1)^{1/2}\int W_2^*(y-r_1z)K''(z)\,dz+o_p(1). \tag{4.6}
\]
By (4.5) and since $\hat h_{\mathrm{crit}}-\delta CR=o_p(\delta)$ we have, in notation from the proof of Theorem 2.1,
\[
\hat f'_{\mathrm{crit}}(x_1+\delta CRs)=\hat f'(x_1+\delta CRs\mid\hat h_{\mathrm{crit}})=\delta^2C'TR^2Z(R,s)+o_p\bigl(\delta^2\bigr).
\]
Furthermore, $\hat x_1-(x_1+\delta CRS)=o_p(\delta)$, and so

\[
\begin{split}
E\bigl\{\hat f^{*\prime}(\hat x_1+\delta y\mid\delta r_1)\mid\mathcal{X}\bigr\}
&=\int\hat f'_{\mathrm{crit}}\{\hat x_1+\delta(y-r_1z)\}K(z)\,dz\\
&=\delta^2C'TR^2\int Z\bigl\{R,(\delta CR)^{-1}(\hat x_1-x_1)+(CR)^{-1}(y-r_1z)\bigr\}K(z)\,dz+o_p\bigl(\delta^2\bigr)\\
&=\delta^2C'TR^2\int Z\bigl\{R,S+(CR)^{-1}(y-r_1z)\bigr\}K(z)\,dz+o_p\bigl(\delta^2\bigr).
\end{split}
\tag{4.7}
\]

Combining (4.6) and (4.7) we deduce that
\[
\hat f^{*\prime}(\hat x_1+\delta y\mid\delta r_1)=\delta^2\Bigl[r_1^{-2}f(x_1)^{1/2}\int W_2^*(y-r_1z)K''(z)\,dz
+C'TR^2\int Z\bigl\{R,S+(CR)^{-1}(y-r_1z)\bigr\}K(z)\,dz\Bigr]+o_p\bigl(\delta^2\bigr). \tag{4.8}
\]

Making the changes of variable $y=Crs$, $r_1=Cr$ and $W_2^*(Ct)=C^{1/2}W^*(-t)$, the right-hand side of (4.8) becomes
\[
\delta^2C'TR^2Z^*(r,s)+o_p\bigl(\delta^2\bigr).
\]
The theorem follows from this approximation.

4.3. Proof of Theorem 2.3. Let $a=f(x_1)$ and $b=\frac{1}{24}|f'''(x_1)|$. Given $\epsilon_0,\epsilon_1\in(0,\min(a,1/7))$, define $\mathcal{J}_1=(0,a-\epsilon_0]$, $\mathcal{J}_2=(a-\epsilon_0,a-n^{-(3/7)+3\epsilon_1}]$ and $\mathcal{J}_3=(a-n^{-(3/7)+3\epsilon_1},\infty)$. Arguing as in the proof of Theorem 2 of Müller and Sawitzki (1991) we may show that
\[
\sup_{\lambda\in\mathcal{J}_1}D_n^2(\lambda)=O_p\bigl\{(n^{-1}\log n)^{2/3}\bigr\},\qquad
\sup_{\lambda\in\mathcal{J}_2}D_n^2(\lambda)=O_p\bigl(n^{-(4/7)-(\epsilon_1/5)}\bigr).
\]
Therefore,
\[
\sup_{\lambda\in\mathcal{J}_1\cup\mathcal{J}_2}D_n^2(\lambda)=o_p\bigl(n^{-4/7}\bigr). \tag{4.9}
\]
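The statistic $D_n(\lambda)$ bounded in (4.9) is the Müller–Sawitzki excess-mass difference: $E_{n,m}(\lambda)$ maximises $\sum_j\{F_n(I_j)-\lambda\|I_j\|\}$ over $m$ disjoint closed intervals, and $D_n(\lambda)=E_{n,2}(\lambda)-E_{n,1}(\lambda)\ge 0$. A brute-force sketch over order-statistic endpoints ($O(n^4)$, workable only for small $n$; the function name, sample and level are ours, not the paper's implementation):

```python
import numpy as np
from itertools import combinations

def excess_mass(x, lam, m):
    """E_{n,m}(lam): maximal total excess mass over m disjoint intervals
    with order-statistic endpoints (brute force; small n only)."""
    x = np.sort(x)
    n = len(x)
    ivs = [(i, j, (j - i + 1) / n - lam * (x[j] - x[i]))
           for i in range(n) for j in range(i, n)]
    if m == 1:
        return max(0.0, max(v for _, _, v in ivs))
    best = 0.0
    for (i1, j1, v1), (i2, j2, v2) in combinations(ivs, 2):
        if j1 < i2 or j2 < i1:           # disjoint intervals
            best = max(best, v1 + v2)
    return best

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(-3, 1, 15), rng.normal(3, 1, 15)])
lam = 0.1
d = excess_mass(x, lam, 2) - excess_mass(x, lam, 1)   # D_n(lam) >= 0
print(d)
```

For this well-separated two-component sample a second interval captures the second cluster, so $D_n(\lambda)$ is visibly positive; under a unimodal density it is small, which is what (4.9) quantifies on $\mathcal{J}_1\cup\mathcal{J}_2$.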
We prove the theorem in the case $f'''(x_1)>0$. The case $f'''(x_1)<0$ may be treated similarly. Since $f'''(x_1)>0$ and condition (4.2) holds, $x_1<x_0$ and there exists a point $x_2$ such that $x_0<x_2$, $f(x_2)=f(x_1)$ and $f'(x_2)<0$. Let $\delta=n^{-1/7}$, $\eta=n^{-\epsilon_3}$ with $\epsilon_3\le 1/7$, $\mathcal{I}_0=(x_1-\delta n^{\epsilon_1},x_1+\delta n^{\epsilon_1})$, $\mathcal{I}_1=(x_2-\delta n^{\epsilon_1},x_2+\delta n^{\epsilon_1})$ and $\mathcal{I}_2=(-n^{\epsilon_1},n^{\epsilon_1})$. Given $t_1,\ldots,t_3\in\mathcal{I}_0$, put $y_j=(t_j-x_1)/\delta\in\mathcal{I}_2$, $j=1,\ldots,3$.

Let $\sup^{(1)},\ldots,\sup^{(7)}$ denote suprema over, respectively, (1) $-\infty<t_1<t_2<\infty$, (2) $t_1\in\mathcal{I}_0$, $t_2\in\mathcal{I}_1$ such that $t_1<t_2$, (3) $y_1\in\mathcal{I}_2$, (4) $-\infty<t_1<\ldots<t_4<\infty$, (5) $t_1,\ldots,t_3\in\mathcal{I}_0$, $t_4\in\mathcal{I}_1$ such that $t_1<\ldots<t_4$, (6) $t_1\in\mathcal{I}_0$, $t_2,\ldots,t_4\in\mathcal{I}_1$ such that $t_1<\ldots<t_4$, and (7) $y_1,\ldots,y_3\in\mathcal{I}_2$ such that $y_1<\ldots<y_3$. Write $\lambda=a-b\delta^3\zeta$, where $-\infty<\zeta<\infty$. Given a standard Wiener process $W_1$, define
\[
W(y)=(a\delta)^{-1/2}\bigl[W_1\{F(x_1)+a\delta y\}-W_1\{F(x_1)\}\bigr]
\]