
Kernel methods for optimal change-points estimation in derivatives

Ming-Yen Cheng and Marc Raimondo
National Taiwan University and University of Sydney

16 June 2007

Abstract

In this paper we propose an implementation of the so-called zero-crossing-time detection technique, specifically designed for estimating the location of jump-points in the first derivative (kinks) of a regression function f. Our algorithm relies on a new class of kernel functions having a second derivative with vanishing moments and an asymmetric first derivative steep enough near the origin. We provide a software package which, for a sample of size n, produces estimators with an accuracy of order, at least, O(n^{−2/5}).

This contrasts with current algorithms for kink estimation, which at best provide an accuracy of order O(n^{−1/3}). In the software, the kernel statistic is standardised and compared to the universal threshold to test the existence of a kink. A simulation study shows that our algorithm enjoys very good finite sample properties even for small sample sizes. The method reveals kink features in real data sets with high noise levels at places where traditional smoothers tend to oversmooth the data.

1 Introduction

1.1 Aims and motivations

Our aim in this paper is to detect and estimate the location of a jump in the first derivative of a regression function f. While there is a large literature on the estimation of a jump-point in the function f itself, only a few methods are available for estimating the location of a jump in the derivative of f, referred to below as a kink. Yet in many real data sets the kink scenario is relevant. In some applications there is a particular interest in knowing that a change has occurred in the derivative; for example, in time series analysis one can think of f as a general underlying trend in the model, where a kink corresponds to a change in the trend: from an upward trend to a downward trend or vice-versa. For generic scatter plot smoothing, the detection and precise location of kinks can be used to improve the visual and numerical performance of traditional smoothing methods, see e.g. Figure 1 and Figure 2.

Key Words and Phrases. Beluga whale nursing time, change-points, derivatives, jump-points, kernel, kink, Legendre polynomials, motorcycle data, optimal rate, zero-crossing time.

Short Title: Change-points estimation in derivatives.

AMS 2000 Subject Classification: primary 62G05; secondary 62G08

Department of Mathematics, National Taiwan University, Taipei 106, Taiwan. cheng@math.ntu.edu.tw

School of Mathematics and Statistics, The University of Sydney, NSW 2006, Australia.

marcr@maths.usyd.edu.au


Figure 1: Scatter plot of the nursing time of beluga whale calf (Simonoff, 1996), with kernel estimates superimposed. The plain curve is a kernel estimate after kink detection and location; the dashed curve is a kernel estimate with the local bandwidth of Herrmann (1997).

1.2 Model and assumptions.

Suppose we observe an unknown function f in white noise,

dY(x) = f(x) dx + σ n^{−1/2} dW(x),   x ∈ I = [0, 1],   (1)

where W(·) is the standard Wiener process on I, σ > 0 is a constant and n > 0. This model is the continuous equivalent of the nonparametric regression model where one observes Y(x_i) on a regular grid x_i = i/n, i = 1, ..., n, c.f. Brown and Low (1996).
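For intuition, model (1) corresponds to the regression sampling scheme just described. A minimal simulation of that discrete analogue, y_i = f(x_i) + σ ε_i, is sketched below (Python; our own illustration with an arbitrary kinked test function, not code from the paper):

```python
import numpy as np

def simulate_model(f, n=1000, sigma=0.2, seed=0):
    """Discrete analogue of the white-noise model (1): y_i = f(x_i) + sigma * eps_i."""
    rng = np.random.default_rng(seed)
    x = np.arange(1, n + 1) / n                 # regular grid x_i = i/n
    y = f(x) + sigma * rng.standard_normal(n)   # i.i.d. N(0, sigma^2) noise
    return x, y

# a test function with a kink (jump in the first derivative) at theta = 0.75
theta = 0.75
f = lambda x: np.where(x < theta, x, theta - 3.0 * (x - theta))
x, y = simulate_model(f)
```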

Throughout this paper f^{(l)}(x) = ∂^l f(x)/∂x^l denotes the derivative of order l of f. The functional class we consider is F_s(a), defined as follows. Let a be a constant (jump size) and s ≥ 2 be an integer (smoothness away from the kink). We say that f ∈ F_s(a) if f^{(1)}(x) has a single change-point (jump) at θ, θ ∈ (0, 1):

[f^{(1)}](θ) = f^{(1)}(θ+) − f^{(1)}(θ−) = a,   (2)

and

f^{(1)}(θ± + x±) − f^{(1)}(θ±) = x± f^{(2)}(θ±) + (x±²/2) f^{(3)}(θ±) + ... + (x±^{s−1}/(s − 1)!) f^{(s)}(θ±) + O(x±^s),   (3)

where f^{(1)}(θ+) = lim_{x→θ, x>θ} f^{(1)}(x), f^{(1)}(θ−) = lim_{x→θ, x<θ} f^{(1)}(x), x+ > 0 and x− < 0.

An element f of the class F_s(a) is a function whose first derivative has a jump of size a at θ. Away from θ, f has derivatives up to order s. In the proof we further assume that the derivative of order s of f is bounded by a positive constant L. Condition (3) states that we can approximate f^{(1)}(x) from the left- or right-hand side of θ using Taylor's formula.


Figure 2: Scatter plot of the motorcycle data (Silverman, 1985), with kernel estimates superim- posed. The plain curve is a kernel estimate after kink detection and location; the dashed curve is a kernel estimate with the local bandwidth of Herrmann (1997).

1.3 Results and contribution

Recent theoretical results, derived in a general indirect setting, show that the optimal rate for estimating the location of a kink is

r(n, s) ≍ n^{−s/(2s+1)},   (4)

where s ≥ 2 indicates the smoothness of f away from the kink, see Goldenshluger et al. (2006) ([GTZ] in the sequel). While the rate-optimal estimator of [GTZ] can be applied in general indirect problems, its computation in a direct setting is not straightforward and it has not been implemented in practice.

Our main contribution is to present a direct approach specifically aimed at finding the location of a jump in the derivative from direct observations (1). Our method builds on the optimal construction of [GTZ]; while less general, our estimator is better suited for applications. By working with direct observations we use compactly supported high order kernels to estimate the third order derivative of f. This offers significant computational advantages while preserving the optimal asymptotic property (4). It is worth noting that even in the case of the slowest possible rate, s = 2, our estimator achieves an accuracy of order O(n^{−2/5}), which is much faster than the O(n^{−1/3}) accuracy of existing direct competitive methods. Our kernel approach is explicit, simple and fast to compute. The software package used to prepare most figures and tables in this paper is available from the authors' home page. In the software the kernel statistic is standardised and compared to the universal threshold to test the existence of a kink. Our method extends naturally to multiple kink detection and estimation.

1.4 Related work

When estimating the position of a single jump in f from observations (1), Korostelev (1987) constructed an optimal estimator which converges at the rate O(n^{−1}). Wang (1995) considered the estimation of cusps of order α, 0 ≤ α < 1, deriving a wavelet-based method which converges at the rate O((n^{−1}(log n^{−1})^{−(1+η)})^{1/(1+2α)}), η > 1. Later Raimondo (1998) showed that the log-term could be removed from the previous rate by using a two-step method, deriving a jump and sharp-cusp estimator which converges at the rate O(n^{−1/(1+2α)}). The method and rate result of Raimondo (1998) also extend to α ≥ 1, i.e. the estimation of a jump or a cusp in the derivative of f of order ⌊α⌋, the largest integer smaller than α. For example, when estimating the location of a kink from direct observations (1), the estimator of Raimondo (1998) achieves an accuracy of order O(n^{−1/3}). For s ≥ 2 this is slower than the optimal rate (4).

Other work on change-points in non-parametric regression includes Müller (1992), Korostelev and Tsybakov (1993), Cline et al. (1995), Gijbels et al. (1999) and Huh and Carrière (2002). The detection of change-points in non-parametric regression has recently received a lot of attention, see e.g. Antoniadis and Gijbels (2002), Gijbels and Goderniaux (2004a,b), Raimondo and Tajvidi (2004) and Huh and Park (2004). Work on change-points in non-parametric density estimation includes Neumann (1997) and Huh (2002). Some recent references on change-point methods include Park and Kim (2004) and Reiss (2004).

1.5 Paper organisation

Section 2 introduces a new class of kernel functions designed for optimal kink estimation.

Section 3 gives a step-by-step description of our detection and estimation method. Section 4 is concerned with numerical performance and application to real data sets. A mathematical appendix is given in Section 5.

2 Kernels for kink estimation

We propose a new class of kernel functions specially designed for optimal kink estimation.

First, we derive a set of sufficient conditions on the kernel to ensure optimal properties of our kink estimator. Secondly, we give an explicit formula to construct general high order polynomial kernels that can be used in practice.

Our method builds on the zero-crossing-time detection technique of [GTZ] and uses approximations to high order derivatives of f. For example, in the indirect setting [GTZ] use an approximation to the second derivative of f to locate the position of a jump in f. Here we adapt this method and use an approximation to the third derivative of f to estimate the location of a kink in the direct setting.


2.1 Kernel estimation of derivatives

Let K be a compactly supported kernel function such that K^{(3)} exists and is bounded. A generic kernel approximation of the third derivative of f may be written as

k_h(t) = h^{−4} ∫_0^1 K^{(3)}((x − t)/h) f(x) dx,   (5)

where h > 0 is a bandwidth. Under some mild smoothness assumptions we have, for small h, k_h(t) ≈ f^{(3)}(t). In the case of noisy observations (1) we define the estimator

k̂_h(t) = h^{−4} ∫_0^1 K^{(3)}((x − t)/h) dY(x).   (6)

The following result can be found in Wand and Jones (1995), p. 49.

Proposition 1. The Gaussian random process Z_h(t) = k̂_h(t) − k_h(t), t ∈ I, satisfies

E Z_h(t) = 0,   σ_Z² = sup_{t∈I} E[(Z_h(t))²] ≤ c_1 n^{−1} h^{−7},   (7)

where c_1 is a constant.

2.2 A separation rate lemma

A key step of the zero-crossing-time technique is the so-called separation rate lemma, c.f. [GTZ]. Here we propose a version of the separation rate lemma under a set of conditions (assumption C_{1,s} below) milder than the Fourier domain conditions given in [GTZ]. This allows a simple implementation of the zero-crossing-time technique using compactly supported polynomial kernels, as described in the next section. Of particular importance to kink estimation are the behaviour of the first derivative of the kernel K near the origin and the number of vanishing moments of the second derivative of K. Some (sufficient) conditions which lead to the optimal rate of convergence for kink estimation are given below, where K′ denotes the first derivative of K. These conditions depend on the smoothness of f away from the kink.

Assumption C_{1,s}

1. supp(K) = [−1, 1] and K^{(3)} is (at least) Lipschitz continuous.

2. K′(x) = −K′(−x).

3. K′(0) = K′(−1) = K′(1) = 0 and K^{(2)}(−1) = K^{(2)}(1) = 0.

4. If s ≥ 3: ∫_{−1}^{1} x^j K^{(2)}(x) dx = 0, j = 1, ..., s − 2. (Note that condition 2 implies that ∫_{−1}^{1} K^{(2)}(x) dx = 0.)

5. |K′(x)| ≥ c_2 |x| for all x ∈ [−q, q], for some constants q, c_2 with 0 < q < 1, c_2 > 0.

6. K′(x) > 0 on [−b, 0], 0 < b < 1, and K′(x) has a unique global maximum at −q, −q ∈ [−b, 0]. (By condition 2, K′(x) < 0 on [0, b] and has a unique global minimum at q, q ∈ [0, b].)

The separation rate lemma given next shows that the accuracy with which one can approximate the location of a kink using (5) depends on the smoothness of the underlying function away from the kink, provided that the kernel K is properly chosen.


Lemma 1. (δ-separation rate) Let f ∈ F_s(a) and let K(·) be a kernel satisfying C_{1,s}. In what follows the constants q and b depend only on the kernel K(·). Let h > 0, δ > 0 be such that δ < qh. Let A_{δ,h} = {t : δ < |t − θ| < qh} and let c_5 = c_5(a, K) be a sufficiently large constant. Then, there exist constants c_3, c_4 such that:

(a) |k_h(θ)| ≤ c_3 h^{s−3};

(b) for all t ∈ A_{δ,h} and δ ≥ c_5 h^s, |k_h(t)| ≥ c_4 δ h^{−3};

(c) for all t ∈ I such that |θ − t| > bh, |k_h(t)| ≤ c_3 h^{s−3}.

Remark 1. Lemma 1 extends naturally to functions with a jump-point in derivatives of order α ≥ 1. Define assumption C_{α,s} similarly to C_{1,s} with K^{(α+2)} replacing K^{(3)}, K^{(α+1)} replacing K^{(2)} and K^{(α)} replacing K^{(1)}. Then Lemma 1 holds with k_h(t) = k_h^α(t) = h^{−(α+3)} ∫_0^1 K^{(α+2)}((x − t)/h) f(x) dx.

2.3 General order kernel formula


Figure 3: Top left: K(3) defined with the degree 5 polynomial (12). Top right: K(3) defined with the degree 7 polynomial (13). Middle and bottom plots give respectively the second, K(2), and first, K(1), derivatives.

We provide an explicit formula for a class of general order kink estimation kernels by modification of the Legendre polynomials. First, K^{(2)} is constructed such that it satisfies condition C_{1,s}. Technical details are given in Section 5. We begin with cases where s is an odd number: for s = 3, 5, ..., and −1 ≤ x ≤ 1, let

K^{(2)}(x) = γ_s Σ_{j=[s/2]}^{s+1} α_{j,s} x^{2j−s+1},   (8)

where the multiplicative constant γ_s = (2s + 3)!/(2^{2s+3}(s − 1)!(s + 1)!) and the coefficients

α_{j,s} = (−1)^{[s/2]+j+1} (2j)! / (j! (s − j + 1)! (2j − s + 1)!).   (9)


The kink estimation kernel K^{(3)} is simply the derivative of K^{(2)},

K^{(3)}(x) = γ_s Σ_{j=[s/2]+1}^{s+1} β_{j,s} x^{2j−s},   (10)

where

β_{j,s} = (−1)^{[s/2]+j+1} (2j)! / (j! (s − j + 1)! (2j − s)!).   (11)

Remark 2. Kernels (8) and (10) also work for cases where s is an even number. Indeed, conditions C_{1,s} and C_{1,s+1} are equivalent. To see this note that, for s = 2, 4, ..., any kernel K satisfying C_{1,s+1} meets all the requirements of C_{1,s}. Conversely, suppose that K^{(2)} satisfies C_{1,s}; then condition 2 implies that K^{(2)} is an even function and we readily have ∫_{−1}^{1} x^{s−1} K^{(2)}(x) dx = 0. Hence K^{(2)} satisfies C_{1,s+1}. In addition, we note that C_{1,1} is exactly the same as C_{1,3}.

Example 1. Taking s = 3 in (8) and (10) we get, after simplification of the constant factor with the polynomial coefficients, for any x, −1 ≤ x ≤ 1,

K^{(3)}(x) = (945/32)(7x⁵ − 10x³ + 3x).   (12)

The polynomial K(x) with third derivative (12) satisfies C_{1,3}. By Remark 2, this kernel also satisfies C_{1,1} and C_{1,2}. The third derivative (12) as well as the corresponding second and first derivatives are depicted in the LHS plots of Figure 3.

Example 2. Taking s = 5 in (8) and (10) we get, after simplification of the constant factor with the polynomial coefficients, for any x, −1 ≤ x ≤ 1,

K^{(3)}(x) = (45045/256)(−33x⁷ + 63x⁵ − 35x³ + 5x).   (13)

The polynomial K(x) with third derivative (13) satisfies C_{1,5}. By Remark 2, this kernel also satisfies C_{1,4}. The third derivative (13) as well as the corresponding second and first derivatives are depicted in the RHS plots of Figure 3.
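The coefficient formulas (8)-(11) translate directly into code. The following sketch (Python; an illustrative helper of our own, not part of the authors' package) generates the coefficients of K^{(3)} for any odd s and reproduces the simplified forms (12) and (13):

```python
from math import factorial

def gamma_s(s):
    """Multiplicative constant in (8): (2s+3)! / (2^{2s+3} (s-1)! (s+1)!)."""
    return factorial(2 * s + 3) / (2 ** (2 * s + 3) * factorial(s - 1) * factorial(s + 1))

def K3_coeffs(s):
    """Coefficients of K^{(3)} in (10)-(11), as a map exponent (2j - s) -> gamma_s * beta_{j,s}."""
    m = s // 2                                  # [s/2] for odd s
    coeffs = {}
    for j in range(m + 1, s + 2):
        beta = (-1) ** (m + j + 1) * factorial(2 * j) / (
            factorial(j) * factorial(s - j + 1) * factorial(2 * j - s))
        coeffs[2 * j - s] = gamma_s(s) * beta
    return coeffs

# s = 3 reproduces (12): (945/32) * (7x^5 - 10x^3 + 3x)
print(K3_coeffs(3))   # {1: 88.59375, 3: -295.3125, 5: 206.71875}
```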

3 Kink detection and estimation

The polynomial kernels introduced in the previous section have specific properties with respect to kink estimation. We start with the case of noiseless data, where the convolution formula defining k_h(t) at (5) approximates the third derivative of f. For a kernel K(·) satisfying C_{1,s}, we have the following properties: i) a kink in f corresponds to the zero-crossing time of k_h(t); ii) the zero-crossing time of k_h(t) is located between its maximum and minimum. In the case of noisy data we take (6) as an estimator of k_h(t) and use the bandwidth parameter h to smooth the noise. We illustrate the step-by-step implementation of our method using simulated examples depicted in Figures 4, 5 and 6.

(a) Localisation step. Using C_{1,s}, the approximation (5) to f^{(3)}(t) may be written as

k_h(t) = L_h(t) + O(h^{s−3}),   (14)

where

L_h(t) = (±a) h^{−2} K′((θ − t)/h).   (15)


Figure 4: Illustration of model (1) with xi = i/n, i = 1, . . . , n, with n = 200 and σ = 0 (top), σ = 0.2 (bottom). Left plots: a smooth function with no kink illustrates a function under H0. Right plots: a smooth function with one kink at θ = 0.75 illustrates a function under H1.

Suppose that f has a unique kink of size a > 0. Then, by the properties of K′, L_h(t) has a unique maximum at t = θ + hq and a unique minimum at t = θ − hq (vice-versa if a < 0). By exploiting the properties of K′ near the origin, at these two extrema

min_{t∈I} L_h(t) ≤ −C h^{−2},   max_{t∈I} L_h(t) ≥ C h^{−2},   (16)

for some positive constant C which depends on a and q (see C_{1,s}, condition 5). From Proposition 1, with large probability,

k̂_h(t) = L_h(t) + O(h^{s−3}) + O(n^{−1/2} h^{−7/2}).   (17)

Combining (16) and (17) we conclude that, if the bandwidth h is chosen large enough so that

h ≍ n^{−1/3+η} for some η > 0,   (18)

then k̂_h(t) has a unique maximum and a unique minimum near those of L_h(t), and we set

t̂_min = arg min_{t∈I} k̂_h(t),   t̂_max = arg max_{t∈I} k̂_h(t).   (19)

By construction, the interval Â_h with endpoints t̂_min and t̂_max has size O(h) and contains θ with high probability. This is illustrated in the RHS middle plot of Figure 5.

(b) Kink detection with universal thresholding. The maximum and the minimum at (19) can also be used to support the kink scenario. More formally, we can test the hypothesis H_0: f^{(1)} is smooth (continuous) against H_1: f^{(1)} has at least one jump. To do so we standardise the kernel estimate (6) and compare its extrema to the so-called universal threshold, in a fashion similar to that of Wang (1995), see also Donoho et al. (1995) and Raimondo and Tajvidi (2004). The idea is that, under H_0, f^{(1)} is continuous so that for small h our estimator k̂_h(t) is dominated by noise, see (17) when L_h(t) = 0. It is well known that the maximum of a standardised Gaussian sequence goes to infinity no


Figure 5: Illustration of the zero-crossing-time technique with the noisy data of Figure 4. Top plots: raw data. On the RHS top plot a thick vertical line indicates the location of a kink. Middle plots, plain curve: standardised kernel estimate (20) with h = 0.3; dashed lines: universal threshold ±λ = ±√(2 log n) = ±3.25. Using the test (22) on the LHS data no kink is detected; on the RHS data, one kink is detected and located at θ̂ = 0.733 by minimisation (23) within the interval Â_h.

faster than O(√(2 log n)), see e.g. Wang (1995). Introducing a standardised version of our kernel estimate,

T_h(t) = σ^{−1} n^{1/2} h^{7/2} k̂_h(t) / (∫ (K^{(3)}(x))² dx)^{1/2},   (20)

and letting I_n = {i/n, i = 1, ..., n}, we have, for any h small enough and under H_0,

lim_{n→∞} P( max_{t∈I_n} T_h(t) ≥ √(2 log n) ) = 0.   (21)

On the other hand, if f^{(1)} has a jump, we see from (16) and (17) that, at the extrema of L_h, the standardised statistic is of order at least C n^{1/2} h^{3/2}; hence for any bandwidth h satisfying (18) we have max_t T_h(t) > λ = √(2 log n) and, by symmetry, min_t T_h(t) < −λ. We detect the presence of a kink when

Φ = 1( max_{t∈I_n} |T_h(t)| ≥ √(2 log n) )   (22)

takes the value 1. In practice, the test (22) and the localisation step (19) are done simultaneously, as illustrated in the middle plots of Figure 5 and Figure 6.

(c) Search for zero-crossing time. If the test (22) takes the value 1, we derive an estimate of the kink location using the zero-crossing-time technique (23), as illustrated in the RHS bottom plot of Figure 5. The idea is that within the interval Â_h between the minimiser and the maximiser of k̂_h found at (19), k̂_h(t) ≈ k_h(t), so that we can apply the separation rate lemma to locate the zero-crossing time of k_h(t). By construction this occurs at t = θ, see (15). Hence, applying the separation rate lemma, we can locate θ with an accuracy of order δ, δ < h. This is done by minimising |k̂_h(t)| within Â_h:

θ̂ = arg min_{t∈Â_h} |k̂_h(t)| = arg min_{t∈Â_h} |T_h(t)|.   (23)

Comparing the bounds (17) with those of Lemma 1, we see that the minimum at (23) is well defined provided that

δ h^{−3} ≥ C′ h^{s−3}   and   δ h^{−3} ≥ C″ n^{−1/2} h^{−7/2},   (24)


where C′, C″ are constants. The best rate is obtained by choosing δ as small as possible such that the two bounds at (24) hold. The LHS inequality at (24) gives δ ≍ h^s which, once substituted into the RHS inequality, gives the optimal choice of the bandwidth,

h_opt ≍ n^{−1/(2s+1)},   (25)

deriving an accuracy of order

δ_opt = h_opt^s ≍ n^{−s/(2s+1)}.   (26)

Note that the optimal bandwidth (25) satisfies condition (18).

Remark 3. From h to h². It is worth noting that the improvement in the estimation accuracy achieved from step (a) to step (c) is quite substantial even for small s. For example, if s = 2 the accuracy derived in step (a) is O(h) whereas the accuracy derived in step (c) is O(h²). This is confirmed by the simulation results of Section 4.

Remark 4. Asymptotic properties. It follows from the minimax theory of [GTZ] that our kink estimator is rate-optimal, since the rate (26) cannot be improved by any estimator. This result extends naturally to functions with a jump-point in higher derivatives by substituting k̂_h in (19) and (23) with k̂_h^α as in Remark 1 of Section 2. In the latter case, we can estimate the jump-point with an accuracy of order O(n^{−s/(2s+2α−1)}). A derivation of the multiplicative constant term in (26) may be guided by the preliminary work of [GTZ].

Remark 5. Multiple kink detection and estimation. An attractive feature of our estimation method is that it extends naturally to the multiple kink scenario, thanks to the test (22). This is illustrated in Figure 6 with two different noise levels. The first step is to compute the standardised statistic (20) and locate (any) exceedances over the universal threshold λ = √(2 log n). The second step is to search for the zero-crossing time locally within each interval Â_h where a kink has been detected.

4 Numerical performance

4.1 Implementation

In the software the convolution (6) is computed in O(n) steps for direct observations (1) on a regular grid x_i = i/n, i = 1, ..., n. It is anticipated that future versions of the software will include an extension to irregular or random grid designs using standard kernel techniques, such as including the design density in (6). As in any change-points estimation problem one needs to deal with edge effects for time points x near 0 or 1. This should be done prior to the localisation/minimisation steps (19) and (23). In the software the kernel estimator (20) is computed for h/2 ≤ x ≤ 1 − h/2. Better results may be possible by using appropriate boundary kernels, but this is outside the scope of the paper.

4.2 Tuning

The main tuning parameter is the bandwidth h, which is chosen so that signal dominates noise in (17). As usual this is a matter of trade-off between bias and variance: too large a bandwidth reduces the variance but increases the bias, and vice-versa. Differentiation is known to amplify noise effects (see Proposition 1), hence the bandwidths used to estimate derivatives are typically larger than those used to estimate the function f. There are some


Figure 6: Illustration of model (1) with a smooth function with two kinks located at θ_1 = 0.18 and θ_2 = 0.75; x_i = i/n, i = 1, ..., n, n = 200 (left plots) and n = 1000 (right plots); top plots with σ = 0; mid-row LHS with σ = 0.2; mid-row RHS with σ = 0.5. Bottom plots, plain curve: standardised kernel estimate (20) with h = 0.2; dashed lines: universal threshold ±λ = ±3.25 (left), ±λ = ±3.72 (right). All the kinks are detected by the test (22) and located by searching for the zero-crossing time (23) within each of the corresponding intervals Â_h. This yields θ̂_1 = 0.185, θ̂_2 = 0.74 (LHS) and θ̂_1 = 0.192, θ̂_2 = 0.722 (RHS).

data-driven methods for choosing bandwidths in derivative estimation, e.g. the adaptive plug-in method of Herrmann (1997) and Herrmann (2003). These methods can be used to set the bandwidth in kink estimation, as the statistic (6) estimates the third derivative of f. There is no universal answer as to which bandwidth is best in practice; this of course depends on the sample size, noise level and signal-to-noise ratio (SNR). Our simulation results indicate that, when the design range is scaled to x ∈ [0, 1], any value of h within the range 0.2 ≤ h ≤ 0.3 produces good results for data sets where the SNR is greater than 2dB, where

SNR(dB) = 10 log_{10}(||f||² / σ²).   (27)

Larger bandwidths may be used for smaller SNR and vice-versa.
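For completeness, the SNR in (27) can be computed from samples of f on the unit grid (a minimal sketch; approximating ||f||² by the average of f(x_i)², which is our reading of the L²[0, 1] norm in (27)):

```python
import numpy as np

def snr_db(f_values, sigma):
    """Signal-to-noise ratio in decibels, eq. (27), with ||f||^2 ~ mean(f(x_i)^2)."""
    return 10.0 * np.log10(np.mean(np.asarray(f_values) ** 2) / sigma**2)

# example: ||f||^2 = 4 and sigma = 0.2 give 10 * log10(100) = 20 dB
print(snr_db(2.0 * np.ones(100), 0.2))
```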

4.3 Simulation study

We study finite sample properties of our kink detection and estimation method using the degree 5 kernel (12) and the two test functions depicted in Figure 4. LHS plot: a smooth function with no kink represents a function under H_0; RHS plot: a smooth function with one kink at θ = 0.75 represents a function under H_1. In each scenario (H_0 or H_1) we use two sample sizes (n = 200, n = 1000) and three noise levels (σ = 0.2, 0.35 and 0.5). In this experiment the SNR (27) is between 2dB and 5dB. For lower SNR it becomes very difficult to detect and estimate kinks, whereas for larger SNR it is quite easy. For this experiment (2dB ≤ SNR ≤ 5dB) any bandwidth h within the range 0.2 ≤ h ≤ 0.3 gave very good results. Smaller bandwidths yield smaller power and larger bandwidths yield a larger false alarm rate. Using 1000 replications of model (1) under H_1 and under H_0 (as previously described) we computed the Monte Carlo approximation to the false alarm rate, P_{H_0}(accept H_1), of the test (22) and the Monte Carlo approximation to


Table 1: Monte Carlo approximations with 1000 replications of model (1) with f as in Figure 4.

bandwidth  sample size  sd (SNR)         P_H0(accept H1)  P_H1(accept H1)  √(E_H1(θ̂ − θ)²)
h = 0.2    n = 1000     σ = 0.2 (10dB)   0.043            0.992            0.0052
h = 0.2    n = 1000     σ = 0.35 (5dB)   0.017            0.436            0.0998
h = 0.2    n = 1000     σ = 0.5 (2dB)    0.014            0.14             0.0456
h = 0.2    n = 200      σ = 0.2 (10dB)   0.07             0.43             0.0443
h = 0.2    n = 200      σ = 0.35 (5dB)   0.07             0.14             0.1024
h = 0.2    n = 200      σ = 0.5 (2dB)    0.06             0.1              0.16
h = 0.3    n = 1000     σ = 0.2 (10dB)   0.84             1                0.008
h = 0.3    n = 1000     σ = 0.35 (5dB)   0.17             1                0.0103
h = 0.3    n = 1000     σ = 0.5 (2dB)    0.07             0.87             0.0178
h = 0.3    n = 200      σ = 0.2 (10dB)   0.185            0.995            0.0108
h = 0.3    n = 200      σ = 0.35 (5dB)   0.080            0.583            0.0282
h = 0.3    n = 200      σ = 0.5 (2dB)    0.063            0.246            0.0611

the power, P_{H_1}(accept H_1), of the test (22). Under the H_1 scenario we also computed the Monte Carlo approximation to the Root Mean Square Error (RMSE) of the kink estimator (23). The results are summarised in Table 1.

Analysis of the results. The pattern seen in Table 1 is consistent with the theoretical properties of the bandwidths (18) and (25). First, we see that larger bandwidths give better results at lower SNR and vice-versa. Secondly, we see that the RMSE decreases as the sample size increases. Importantly, we see that the accuracy of the method is far superior to the size of the bandwidth O(h), and that relatively large bandwidths produce very good results. This property allows us to detect and estimate kinks at rather high noise levels, as illustrated in Figure 5. We also see that the larger bandwidth h = 0.3 yields better power but produces more false alarms than h = 0.2.

4.4 Application to real data

We illustrate our kink detection and estimation method using some benchmark examples borrowed from the smoothing literature: (a) the nursing time of beluga whale data set (Simonoff, 1996), and (b) the motorcycle data set (Silverman, 1985). The two data sets are depicted in Figure 1 and Figure 2 as well as in the top plots of Figure 7 where they have been analysed for kink detection and location. For comparison purposes we depicted in Figure 1 and Figure 2 a kernel smoothing of the data after kink detection and estimation (plain curve) as well as a kernel smoothing of the data using the lokern software of Herrmann (2003) which produces non-parametric fits with a locally varying bandwidth (dashed curve).

A. Nursing time of beluga whale calf. The successful nursing of a beluga whale calf by its natural mother in captivity is an important concern for zoological parks and for the continued survival of the captive beluga population, c.f. Russell et al. (1997). A male beluga calf named Hudson was born at the Aquarium for Wildlife Conservation in Brooklyn, New York on August 7, 1991. The data examined here, see Figure 1 and RHS of Figure 7, correspond to roughly the first 55 days of Hudson's life. In this study the nursing time, i.e. total suckling time in seconds, was recorded every 6 hours, providing us with a sample of


size n = 224. A rough estimate of the SNR of these data shows that it is greater than 2dB.

We applied our kink detection and estimation method with h = 0.2, which yields a small false alarm rate. The data set tested positive, with one kink located at θ̂ = 0.1614 (36th observation). This suggests that a sharp change in Hudson's nursing behaviour occurred around the 9th day of his life. An examination of the scatter plot in Figure 1 shows that the nursing time rises smoothly up until the 9th day, where there is a sharp peak followed by a sudden drop. These results are consistent with the studies on beluga whales of Russell et al. (1997) and Simonoff (1996) (p. 160), where it was noted that the nursing time typically peaked at around 7-10 days postpartum and then declined over time. An examination of the scatter plot smoothers in Figure 1 shows a second dip in Hudson's nursing time at around 30-35 days (0.55 ≤ t ≤ 0.6); this was later revealed to correspond to a bacterial infection, see Russell et al. (1997). A close inspection of our third derivative estimate k̂_h(t) in the LHS bottom plot of Figure 7 shows that the zero-crossing time following the second largest value of k̂_h(t) is around t = 0.6, corresponding to the 35th day of Hudson's life, which is consistent with the previous findings of Russell et al. (1997) and Simonoff (1996), p. 160. Our test statistic indicates that this second drop in nursing time is clearly not as significant as the first one, and the fits in Figure 1 show that it is well captured by kernel smoothing with a local variable bandwidth.


Figure 7: Kink analysis of the whale nursing time data, n = 224 (left), and the motorcycle data, n = 133 (right). Top plots: raw data with kink locations indicated by thick vertical lines. Middle plots, plain curves: standardised kernel estimates (20); dashed lines: universal threshold ±λ = ±3.29 (left), ±λ = ±3.12 (right). For the nursing data (LHS) the test (22) with h = 0.2 detects one kink located at θ̂ = 0.1614 (36th observation). For the motorcycle data (RHS) the test (22) with h = 0.3 detects two kinks located at θ̂_1 = 0.4887 (65th observation) and θ̂_2 = 0.6992 (93rd observation).

B. Motorcycle data. These data correspond to n = 133 observations of acceleration readings taken through time in an experiment on the efficiency of helmets during impact, see Figure 2 and RHS of Figure 7. The experiment simulates a motorcycle crash and is described in detail in Schmidt et al. (1981). In this study the observations are all subject to errors


and it is of interest both to discern the general shape of the underlying acceleration curve and to draw inference about its minimum and maximum values. In a seminal paper, Silverman (1985) investigated this data set using spline smoothing methods. Spline smoothers capture the general shape of the acceleration curve very well and enjoy nice data-driven tuning. However, as noted in the discussion of Silverman (1985), spline smoothers tend to oversmooth the data over extremely high curvature regions. The resulting estimates have a large bias at places where the local curvature is unusually large or, worse, where the curve has a kink. This phenomenon renders statistical inference about the minimum and maximum difficult. Modern software, e.g. lokern of Herrmann (2003), which produces kernel estimators with locally varying bandwidths, can accommodate more curvature around sharp turns than traditional smoothers. Nevertheless, locally varying the bandwidth in a data-driven fashion, as was done in Figure 1 and in Figure 2, fails to capture extreme curvature features or kinks.

More recently, Gijbels and Goderniaux (2004b) have studied the motorcycle data set using a change-point model which reveals kink features in this data set.

We applied our kink detection and estimation method with h = 0.2 and h = 0.3.

For h = 0.3 two kinks, located at θ̂₁ = 0.4887 (65th observation) and θ̂₂ = 0.6992 (93rd observation), were detected; see the RHS plots of Figure 7 and Figure 2. These results are consistent with the fit of Gijbels and Goderniaux (2004b). The cross-validation method of Gijbels and Goderniaux (2004b) also suggests that there might be a third kink at the time of impact, which can also be seen as the first zero-crossing time on the RHS bottom plot of Figure 7, although it is much less significant than the other two. From the physical point of view the underlying acceleration curve may not have kinks, but sharp turns can be oversmoothed by traditional smoothers whereas they can be captured by fitting kinks to the data, as illustrated in Figure 2.

4.5 Concluding remarks

These preliminary results on kink estimation are quite promising as the method enjoys good computational and theoretical properties. The zero-crossing-time technique produces fine location estimates of sharp curvature features in real data sets with high noise levels.

Acknowledgement. We would like to thank the Editor for helpful suggestions which im- proved the motivations and clarity of this paper. This project began while Marc Raimondo visited National Taiwan University, funded by the Mathematics Division of the National Center of Theoretical Sciences (Taipei).

5 Mathematical appendix

Construction of K^{(2)} via Legendre polynomials. To find a general order K^{(2)} satisfying condition C1,s for every s = 3, 5, …, we restrict attention to polynomials of degree s + 3, supported on [−1, 1] and having K^{(3)}(−1) = K^{(3)}(1) = 0. These extra constraints are very mild considering condition C1,s and the fact that K^{(3)} is the kernel used in kink estimation. Our construction uses the Legendre polynomials on [−1, 1]:
\[
P_i(x) = \frac{d^i}{dx^i}(1-x^2)^i \, I_{-1\le x\le 1} = \sum_{j=0}^{i} p_{i,j}\, x^j \, I_{-1\le x\le 1}, \qquad i = 0, 1, 2, \ldots
\]
Denote μ_j = ∫ x^j K^{(2)}(x) dx, j = 0, 1, 2, …, and write
\[
K^{(2)}(x) = \sum_{i=0}^{s+3} a_i P_i(x).
\]

Without loss of generality, assume that |μ_{s−1}| = 1. Then K^{(2)} has a unique global minimum at zero by taking μ_{s−1} = (−1)^{[s/2]+1}. From the moment conditions,
\[
a_i \|P_i\|^2 = \int_{-1}^{1} P_i(x)\, K^{(2)}(x)\, dx =
\begin{cases}
0, & 0 \le i \le s-2,\\[2pt]
(-1)^{[s/2]+1}\, p_{i,s-1}, & i = s-1,\\[2pt]
(-1)^{[s/2]+1}\, p_{i,s-1} + \displaystyle\sum_{j=s}^{i} p_{i,j}\,\mu_j, & i = s,\, s+1,\, s+2,\, s+3,
\end{cases}
\]
whence
\[
K^{(2)}(x) = \sum_{i=s-1}^{s+3} \frac{1}{\|P_i\|^2} \Big\{ (-1)^{[s/2]+1}\, p_{i,s-1} + \sum_{s \le j \le i} p_{i,j}\,\mu_j \Big\}\, P_i(x). \tag{28}
\]

Combining (28), the conditions that K^{(2)}(−1) = K^{(2)}(1) = K^{(3)}(−1) = K^{(3)}(1) = 0 and the facts that
\[
P_i(1) = i!\,(-2)^i, \qquad P_i(-1) = i!\,2^i,
\]
\[
P_i'(1) = -i\,(i+1)!\,(-2)^{i-1}, \qquad P_i'(-1) = i\,(i+1)!\,2^{i-1},
\]
we obtain a system of linear equations for μ_s, …, μ_{s+3}:

\[
0 = \sum_{i=s-1}^{s+3} \frac{1}{\|P_i\|^2} \Big\{ (-1)^{[s/2]+1}\, p_{i,s-1}\, i!\,(-2)^i + \sum_{s \le j \le i} p_{i,j}\,\mu_j\, i!\,(-2)^i \Big\},
\]
\[
0 = \sum_{i=s-1}^{s+3} \frac{1}{\|P_i\|^2} \Big\{ (-1)^{[s/2]+1}\, p_{i,s-1}\, i!\,2^i + \sum_{s \le j \le i} p_{i,j}\,\mu_j\, i!\,2^i \Big\},
\]
\[
0 = \sum_{i=s-1}^{s+3} \frac{1}{\|P_i\|^2} \Big\{ (-1)^{[s/2]+1}\, p_{i,s-1}\, i(i+1)!\,(-2)^{i-1} + \sum_{s \le j \le i} p_{i,j}\,\mu_j\, i(i+1)!\,(-2)^{i-1} \Big\},
\]
\[
0 = \sum_{i=s-1}^{s+3} \frac{1}{\|P_i\|^2} \Big\{ (-1)^{[s/2]+1}\, p_{i,s-1}\, i(i+1)!\,2^{i-1} + \sum_{s \le j \le i} p_{i,j}\,\mu_j\, i(i+1)!\,2^{i-1} \Big\},
\]

which implies that
\[
0 = \mu_s = \mu_{s+2}, \tag{29}
\]
\[
\mu_{s+1} = -(-1)^{[s/2]+1}\,\frac{p_{s+1,s-1}}{p_{s+1,s+1}} - (-1)^{[s/2]+1}\,\frac{p_{s-1,s-1}(2s+3)(s-1)!\,\|P_{s+1}\|^2}{2\, p_{s+1,s+1}(2s+5)(s+1)!\,\|P_{s-1}\|^2}, \tag{30}
\]
\[
\mu_{s+3} = (-1)^{[s/2]+1}\,\frac{\|P_{s+3}\|^2\, p_{s-1,s-1}(2s+1)(s-1)!}{16\, p_{s+3,s+3}\,\|P_{s-1}\|^2 (2s+5)(s+3)!} - \frac{p_{s+3,s-1} + p_{s+3,s+1}\,\mu_{s+1}}{p_{s+3,s+3}}. \tag{31}
\]

Substituting (29), (30) and (31) back into (28) yields
\[
K^{(2)}(x) = (-1)^{[s/2]+1}\,\frac{p_{s-1,s-1}}{\|P_{s-1}\|^2}\, \Big\{ P_{s-1}(x) - \frac{(2s+3)\,(s-1)!}{2(2s+5)(s+1)!}\, P_{s+1}(x) + \frac{(2s+1)\,(s-1)!}{16(2s+5)(s+3)!}\, P_{s+3}(x) \Big\}.
\]
Expanding P_{s−1}(x), P_{s+1}(x) and P_{s+3}(x) in the above display using
\[
P_i(x) = \sum_{j=[(i+1)/2]}^{i} \frac{i!\,(2j)!}{j!\,(i-j)!\,(2j-i)!}\,(-1)^j\, x^{2j-i}\, I_{-1\le x\le 1}, \qquad i = 0, 1, 2, \ldots,
\]
we obtain, for −1 ≤ x ≤ 1,

\[
\begin{aligned}
K^{(2)}(x) &= (-1)^{[s/2]+1}\,\frac{p_{s-1,s-1}}{\|P_{s-1}\|^2}\, \bigg\{ \sum_{j=[s/2]}^{s-1} \frac{(-1)^j\,(s-1)!\,(2j)!}{j!\,(s-j-1)!\,(2j-s+1)!}\, x^{2j-s+1} \\
&\qquad - \frac{(2s+3)\,(s-1)!}{2(2s+5)(s+1)!} \sum_{j=[s/2]}^{s} \frac{(-1)^{j+1}\,(s+1)!\,(2j+2)!}{(j+1)!\,(s-j)!\,(2j-s+1)!}\, x^{2j-s+1} \\
&\qquad + \frac{(2s+1)\,(s-1)!}{16(2s+5)(s+3)!} \sum_{j=[s/2]}^{s+1} \frac{(-1)^{j+2}\,(s+3)!\,(2j+4)!}{(j+2)!\,(s-j+1)!\,(2j-s+1)!}\, x^{2j-s+1} \bigg\} \\
&= \sum_{j=[s/2]}^{s-1} \frac{(-1)^{j+[s/2]+1}\, p_{s-1,s-1}\,(s-1)!\,(2j)!}{\|P_{s-1}\|^2\, j!\,(s-j-1)!\,(2j-s+1)!}\, \bigg[ 1 + \frac{(2s+3)(2j+1)}{(2s+5)(s-j)} \Big\{ 1 + \frac{(2s+1)(2j+3)}{4(2s+3)(s-j+1)} \Big\} \bigg]\, x^{2j-s+1} \\
&\qquad + (-1)^{[s/2]+1}\,\frac{p_{s-1,s-1}\,(s-1)!\, x^{s+1}}{8\,\|P_{s-1}\|^2}\, \Big\{ \frac{(2s+3)!}{(s+1)!^2} - \frac{(2s+1)(2s+6)!}{2(2s+5)(s+3)!^2}\, x^2 \Big\}.
\end{aligned}
\]
Since
\[
\|P_i\|^2 \equiv \int_{-1}^{1} P_i(x)^2\, dx = \frac{2^{2i+1}\, i!^2}{2i+1},
\]
the last expression of K^{(2)}(x) reduces to, for −1 ≤ x ≤ 1,
\[
K^{(2)}(x) = (-1)^{[s/2]+1}\,\frac{(2s+3)!}{2^{2s+3}\,(s-1)!\,(s+1)!} \sum_{j=[s/2]}^{s+1} \frac{(-1)^j\,(2j)!}{j!\,(s-j+1)!\,(2j-s+1)!}\, x^{2j-s+1},
\]
as given in (8).

It remains to check that this K^{(2)} fulfills conditions 3, 5 and 6 of assumption C1,s. By the construction and the fact that K^{(2)} is a polynomial of even degree, condition 3 obviously holds. Condition 5 is also satisfied since K^{(2)}(0) ≠ 0. In addition, one can see that the unique global minimum of K^{(2)} occurs at zero; this together with K^{(2)}(−1) = K^{(2)}(1) = 0 ensures condition 6.
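The closed form (8) and the conditions just checked can also be verified numerically. The snippet below is an independent check, using exact monomial moments on [−1, 1] rather than quadrature: it rebuilds K^{(2)} for s = 3 from the displayed formula and confirms the boundary zeros, the vanishing moments μ₀ = μ₁ = 0 with |μ_{s−1}| = |μ₂| = 1, and the negative global minimum at the origin.

```python
from math import factorial

def K2_coeffs(s):
    """Monomial coefficients of K^(2) in (8):
    c * sum_{j=[s/2]}^{s+1} (-1)^j (2j)! / (j! (s-j+1)! (2j-s+1)!) x^(2j-s+1)."""
    c = (-1) ** (s // 2 + 1) * factorial(2 * s + 3) / (
        2 ** (2 * s + 3) * factorial(s - 1) * factorial(s + 1))
    return {2 * j - s + 1:
            c * (-1) ** j * factorial(2 * j)
            / (factorial(j) * factorial(s - j + 1) * factorial(2 * j - s + 1))
            for j in range(s // 2, s + 2)}

def evaluate(coef, x):
    """K^(2)(x) on its support [-1, 1]."""
    return sum(a * x ** k for k, a in coef.items()) if abs(x) <= 1 else 0.0

def moment(coef, m):
    """mu_m = int_{-1}^1 x^m K^(2)(x) dx, computed exactly per monomial."""
    return sum(a * 2.0 / (k + m + 1) for k, a in coef.items() if (k + m) % 2 == 0)

coef = K2_coeffs(3)
print(evaluate(coef, -1.0), evaluate(coef, 1.0))          # both ~ 0
print(moment(coef, 0), moment(coef, 1), moment(coef, 2))  # ~ 0, 0, 1
print(evaluate(coef, 0.0))                                # negative minimum at 0
```

For s = 3 the kernel is 945/64 · (−1/3 + 3x² − 5x⁴ + (7/3)x⁶), and the same routine works for any admissible odd s.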

Proof of Lemma 1. We start with the case where f ∈ F_2(a) (i.e. s = 2). Changing variables in (5) and integrating by parts using assumption C1,2 gives
\[
k_h(t) = -h^{-2}\int_{-1}^{1} f'(t+hx)\, K^{(2)}(x)\, dx. \tag{32}
\]

(a) For t such that |t − θ| < h, let τ = (θ − t)/h, −1 ≤ τ ≤ 1, and write (32) in two parts:
\[
k_h(t) = -h^{-2}\int_{-1}^{\tau} f'(t+hx)\, K^{(2)}(x)\, dx - h^{-2}\int_{\tau}^{1} f'(t+hx)\, K^{(2)}(x)\, dx. \tag{33}
\]
Introducing f′(θ−) and f′(θ+) inside the first (resp. second) integral on the RHS of (33) gives
\[
k_h(t) = -h^{-2}\int_{-1}^{\tau} K^{(2)}(x)\, f'(\theta^-)\, dx - h^{-2}\int_{\tau}^{1} K^{(2)}(x)\, f'(\theta^+)\, dx + J_h(t), \tag{34}
\]
where
\[
J_h(t) = -h^{-2}\int_{-1}^{\tau} \big(f'(t+hx) - f'(\theta^-)\big)\, K^{(2)}(x)\, dx - h^{-2}\int_{\tau}^{1} \big(f'(t+hx) - f'(\theta^+)\big)\, K^{(2)}(x)\, dx.
\]
It follows from (34) that
\[
k_h(t) = -h^{-2}\, f'(\theta^-)\, \Big[K^{(1)}(x)\Big]_{-1}^{\tau} - h^{-2}\, f'(\theta^+)\, \Big[K^{(1)}(x)\Big]_{\tau}^{1} + J_h(t);
\]
since f′ has a jump at θ, we have
\[
k_h(t) = h^{-2}\, K^{(1)}(\tau)\, [f'](\theta) + J_h(t). \tag{35}
\]
First, we bound J_h(t) using (3) with s = 2 and |f^{(2)}(θ)| ≤ L: for all t with |t − θ| < h,
\[
|J_h(t)| \le 2Lh^{-1}\int_{-1}^{1} |1+x|\, |K^{(2)}(x)|\, dx = c_3\, h^{-1}.
\]
At t = θ we have τ = 0 and, since K^{(1)}(0) = 0, k_h(θ) = J_h(θ), which proves bound (a) with s = 2.

(b) For t ∈ A_{δ,h} we have δ < |t − θ| < qh and |τ| ∈ [δ/h, q]. Using this, (2) and assumption C1,2, we see that the first term on the RHS of (35) satisfies
\[
h^{-2}\, |K^{(1)}(\tau)|\, |[f'](\theta)| \ge h^{-2}\, |\tau|\, |a| \ge |a|\, \delta\, h^{-3}. \tag{36}
\]
For any δ such that |a|δh^{−3} = (c_3 + c_4)h^{−1}, applying the triangle inequality to (35) gives
\[
|k_h(t)| \ge |a|\, \delta\, h^{-3} - c_3\, h^{-1} \ge c_4\, \delta\, h^{-3}, \tag{37}
\]
which proves bound (b) with s = 2.

(c) Starting from (32) and using assumption C1,2 we write
\[
k_h(t) = -h^{-2}\int_{-1}^{1} \big(f'(t+hx) - f'(t)\big)\, K^{(2)}(x)\, dx. \tag{38}
\]
As with |J_h(t)|, it follows that |k_h(t)| ≤ c_3 h^{−1}, as had to be proved.

If f ∈ F_s(a) with s ≥ 3, the same bounds follow by exploiting the extra smoothness assumption (3), provided that the kernel satisfies the vanishing-moment property 4 of C1,s.
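The key expansion (35) can also be checked numerically for a concrete kink. For f(x) = |x − θ| the second derivative vanishes away from θ, so J_h(t) = 0 for |t − θ| < h and (35) should hold up to discretisation error: k_h(t) = h^{−2} K^{(1)}(τ)[f′](θ) with [f′](θ) = 2. The sketch below is illustrative only; it uses the s = 3 kernel of (8), a midpoint rule for the integral in (32), and made-up values of θ and h.

```python
import numpy as np

theta, h = 0.5, 0.1  # kink location and bandwidth (illustrative values)

def K2(u):
    """Kernel K^(2) of (8) for s = 3, evaluated on [-1, 1]."""
    return (945.0 / 64.0) * (-1.0 / 3.0 + 3.0 * u**2 - 5.0 * u**4 + (7.0 / 3.0) * u**6)

def K1(tau):
    """Antiderivative of K^(2) with K^(1)(-1) = 0."""
    F = lambda u: (945.0 / 64.0) * (-u / 3.0 + u**3 - u**5 + u**7 / 3.0)
    return F(tau) - F(-1.0)

def kh(t, m=200000):
    """k_h(t) = -h^(-2) int_{-1}^{1} f'(t + hx) K^(2)(x) dx by the midpoint
    rule, with f(x) = |x - theta| so that f'(x) = sign(x - theta)."""
    x = -1.0 + (np.arange(m) + 0.5) * (2.0 / m)  # midpoints of [-1, 1]
    return -h**-2 * np.mean(np.sign(t + h * x - theta) * K2(x)) * 2.0

for t in (0.45, 0.48, 0.52, 0.55):
    tau = (theta - t) / h
    print(round(kh(t), 2), round(h**-2 * K1(tau) * 2.0, 2))  # two sides of (35)
```

The two columns agree to the accuracy of the quadrature, and k_h changes sign across θ, which is exactly the zero-crossing property the estimator exploits.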

References

Antoniadis, A. and Gijbels, I. (2002). Detecting abrupt changes by wavelet methods. J. Nonparametr. Statist. 14, 7–29. Statistical models and methods for discontinuous phenomena (Oslo, 1998).
