
Kernel methods for optimal change-points estimation in derivatives

Ming-Yen Cheng and Marc Raimondo
National Taiwan University and University of Sydney

16 June 2007

Abstract

In this paper we propose an implementation of the so-called zero-crossing-time detection technique, specifically designed for estimating the location of jump-points in the first derivative (kinks) of a regression function f. Our algorithm relies on a new class of kernel functions having a second derivative with vanishing moments and an asymmetric first derivative steep enough near the origin. We provide a software package which, for a sample of size n, produces estimators with an accuracy of order, at least, O(n^{−2/5}).

This contrasts with current algorithms for kink estimation, which at best provide an accuracy of order O(n^{−1/3}). In the software, the kernel statistic is standardised and compared to the universal threshold to test the existence of a kink. A simulation study shows that our algorithm enjoys very good finite sample properties even for small sample sizes. The method reveals kink features in real data sets with high noise levels at places where traditional smoothers tend to oversmooth the data.

1 Introduction

1.1 Aims and motivations

Our aim in this paper is to detect and estimate the location of a jump in the first derivative of a regression function f. While there is a large literature on the estimation of a jump-point in the function f itself, only a few methods are available for estimating the location of a jump in the derivative of f, referred to below as a kink. Yet in many real data sets the kink scenario is relevant. In some applications there is a particular interest in knowing that a change has occurred in the derivative; for example, in time series analysis one can think of f as a general underlying trend in the model, where a kink corresponds to a change in the trend: from an upward trend to a downward trend or vice-versa. For generic scatter plot smoothing, the detection and precise location of kinks can be used to improve the visual and numerical performance of traditional smoothing methods, see e.g. Figure 1 and Figure 2.

Key Words and Phrases. Beluga whale nursing time, change-points, derivatives, jump-points, kernel, kink, Legendre polynomials, motorcycle data, optimal rate, zero-crossing time.

Short Title: Change-points estimation in derivatives.

AMS 2000 Subject Classification: primary 62G05; secondary 62G08

Department of Mathematics, National Taiwan University, Taipei 106, Taiwan. cheng@math.ntu.edu.tw

School of Mathematics and Statistics, The University of Sydney, NSW 2006, Australia.

marcr@maths.usyd.edu.au


Figure 1: Scatter plot of the nursing time of beluga whale calf (Simonoff, 1996), with kernel estimates superimposed. The plain curve is a kernel estimate after kink detection and location; the dashed curve is a kernel estimate with the local bandwidth of Herrmann (1997).

1.2 Model and assumptions.

Suppose we observe an unknown function f in white noise,

dY(x) = f(x) dx + σ n^{−1/2} dW(x),   x ∈ I = [0, 1],   (1)

where W(·) is the standard Wiener process on I, σ > 0 is a constant and n > 0. This model is the continuous equivalent of the nonparametric regression model where one observes Y(x_i) on a regular grid x_i = i/n, i = 1, ..., n, c.f. Brown and Low (1996).
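For intuition, model (1) corresponds to the regression sampling scheme just described. A minimal simulation of that discrete analogue, y_i = f(x_i) + σ ε_i, is sketched below (Python; our own illustration with an arbitrary kinked test function, not code from the paper):

```python
import numpy as np

def simulate_model(f, n=1000, sigma=0.2, seed=0):
    """Discrete analogue of the white-noise model (1): y_i = f(x_i) + sigma * eps_i."""
    rng = np.random.default_rng(seed)
    x = np.arange(1, n + 1) / n                 # regular grid x_i = i/n
    y = f(x) + sigma * rng.standard_normal(n)   # i.i.d. N(0, sigma^2) noise
    return x, y

# a test function with a kink (jump in the first derivative) at theta = 0.75
theta = 0.75
f = lambda x: np.where(x < theta, x, theta - 3.0 * (x - theta))
x, y = simulate_model(f)
```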

Throughout this paper f^{(l)}(x) = ∂^l f(x)/∂x^l denotes the derivative of order l of f. The functional class we consider is F_s(a), defined as follows. Let a be a constant (jump size) and s ≥ 2 be an integer (smoothness away from the kink). We say that f ∈ F_s(a) if f^{(1)}(x) has a single change-point (jump) at θ, θ ∈ (0, 1):

[f^{(1)}](θ) = f^{(1)}(θ+) − f^{(1)}(θ−) = a,   (2)

and

f^{(1)}(θ± + x±) − f^{(1)}(θ±) = x± f^{(2)}(θ±) + (x±²/2) f^{(3)}(θ±) + ... + (x±^{s−1}/(s − 1)!) f^{(s)}(θ±) + O(x±^s),   (3)

where f^{(1)}(θ+) = lim_{x→θ, x>θ} f^{(1)}(x), f^{(1)}(θ−) = lim_{x→θ, x<θ} f^{(1)}(x), x+ > 0 and x− < 0.

An element f of the class F_s(a) is a function whose first derivative has a jump of size a at θ. Away from θ, f has derivatives up to order s. In the proof we further assume that the derivative of order s of f is bounded by a positive constant L. Condition (3) states that we can approximate f^{(1)}(x) from the left- or right-hand side of θ using Taylor's formula.


Figure 2: Scatter plot of the motorcycle data (Silverman, 1985), with kernel estimates superim- posed. The plain curve is a kernel estimate after kink detection and location; the dashed curve is a kernel estimate with the local bandwidth of Herrmann (1997).

1.3 Results and contribution

Recent theoretical results, derived in a general indirect setting, show that the optimal rate for estimating the location of a kink is

r(n, s) ≍ n^{−s/(2s+1)},   (4)

where s ≥ 2 indicates the smoothness of f away from the kink, see Goldenshluger et al. (2006) ([GTZ] in the sequel). While the rate-optimal estimator of [GTZ] can be applied in general indirect problems, its computation in a direct setting is not straightforward and it has not been implemented in practice.

Our main contribution is to present a direct approach specifically aimed at finding the location of a jump in the derivative from direct observations (1). Our method builds on the optimal construction of [GTZ]; while less general, our estimator is better suited for applications. By working with direct observations we use compactly supported high order kernels to estimate the third order derivative of f. This offers significant computational advantages while preserving the optimal asymptotic property (4). It is worth noting that even in the case of the slowest possible rate, s = 2, our estimator achieves an accuracy of order O(n^{−2/5}), which is much faster than the O(n^{−1/3}) accuracy of existing direct competitive methods. Our kernel approach is explicit, simple and fast to compute. The software package used to prepare most figures and tables in this paper is available from the authors' home page. In the software the kernel statistic is standardised and compared to the universal threshold to test the existence of a kink. Our method extends naturally to multiple kink detection and estimation.

1.4 Related work

When estimating the position of a single jump in f from observations (1), Korostelev (1987) constructed an optimal estimator which converges at the rate O(n^{−1}). Wang (1995) considered the estimation of cusps of order α, 0 ≤ α < 1, deriving a wavelet-based method which converges at the rate O((n^{−1}(log n^{−1})^{−(1+η)})^{1/(1+2α)}), η > 1. Later Raimondo (1998) showed that the log-term could be removed from the previous rate by using a two-step method, deriving a jump and sharp-cusp estimator which converges at the rate O(n^{−1/(1+2α)}). The method and rate result of Raimondo (1998) also extend to α ≥ 1, i.e. the estimation of a jump or a cusp in the derivative of f of order ⌊α⌋, the largest integer smaller than α. For example, when estimating the location of a kink from direct observations (1), the estimator of Raimondo (1998) achieves an accuracy of order O(n^{−1/3}). For s ≥ 2 this is slower than the optimal rate (4).

Other work on change-points in non-parametric regression includes Müller (1992), Korostelev and Tsybakov (1993), Cline et al. (1995), Gijbels et al. (1999) and Huh and Carrière (2002). The detection of change-points in non-parametric regression has recently received a lot of attention, see e.g. Antoniadis and Gijbels (2002), Gijbels and Goderniaux (2004a,b), Raimondo and Tajvidi (2004) and Huh and Park (2004). Work on change-points in non-parametric density estimation includes Neumann (1997) and Huh (2002). Some recent references on change-point methods include Park and Kim (2004) and Reiss (2004).

1.5 Paper organisation

Section 2 introduces a new class of kernel functions designed for optimal kink estimation.

Section 3 gives a step-by-step description of our detection and estimation method. Section 4 is concerned with numerical performance and application to real data sets. A mathematical appendix is given in Section 5.

2 Kernels for kink estimation

We propose a new class of kernel functions specially designed for optimal kink estimation.

First, we derive a set of sufficient conditions on the kernel to ensure optimal properties of our kink estimator. Secondly, we give an explicit formula to construct general high order polynomial kernels that can be used in practice.

Our method builds on the zero-crossing-time detection technique of [GTZ] and uses approximations to high order derivatives of f. For example, in the indirect setting [GTZ] use an approximation to the second derivative of f to locate the position of a jump in f. Here we adapt this method and use an approximation to the third derivative of f to estimate the location of a kink in the direct setting.


2.1 Kernel estimation of derivatives

Let K be a compactly supported kernel function such that K^{(3)} exists and is bounded. A generic kernel approximation of the third derivative of f may be written as

k_h(t) = h^{−4} ∫_0^1 K^{(3)}((x − t)/h) f(x) dx,   (5)

where h > 0 is a bandwidth. Under some mild smoothness assumptions we have, for small h, k_h(t) ≈ f^{(3)}(t). In the case of noisy observations (1) we define the estimator

k̂_h(t) = h^{−4} ∫_0^1 K^{(3)}((x − t)/h) dY(x).   (6)

The following result can be found in Wand and Jones (1995), p. 49.

Proposition 1. The Gaussian random process Z_h(t) = k̂_h(t) − k_h(t), t ∈ I, satisfies

E Z_h(t) = 0,   σ_Z² = sup_{t∈I} E[(Z_h(t))²] ≤ c_1 n^{−1} h^{−7},   (7)

where c_1 is a constant.

2.2 A separation rate lemma

A key step of the zero-crossing-time technique is the so-called separation rate lemma, c.f. [GTZ]. Here we propose a version of the separation rate lemma under a set of conditions (assumption C_{1,s} below) milder than the Fourier domain conditions given in [GTZ]. This allows a simple implementation of the zero-crossing-time technique using compactly supported polynomial kernels, as described in the next section. Of particular importance to kink estimation are the behaviour of the first derivative of the kernel K near the origin and the number of vanishing moments of the second derivative of K. Some (sufficient) conditions which lead to the optimal rate of convergence for kink estimation are given below, where K′ denotes the first derivative of K. These conditions depend on the smoothness of f away from the kink.

Assumption C_{1,s}

1. supp(K) = [−1, 1] and K^{(3)} is (at least) Lipschitz continuous.

2. K′(x) = −K′(−x).

3. K′(0) = K′(−1) = K′(1) = 0 and K^{(2)}(−1) = K^{(2)}(1) = 0.

4. If s ≥ 3: ∫_{−1}^{1} x^j K^{(2)}(x) dx = 0, j = 1, ..., s − 2. (Note that condition 2 implies that ∫_{−1}^{1} K^{(2)}(x) dx = 0.)

5. |K′(x)| ≥ c_2 |x| for all x ∈ [−q, q], for some constants q, c_2 with 0 < q < 1, c_2 > 0.

6. K′(x) > 0 on [−b, 0], 0 < b < 1, and K′(x) has a unique global maximum at −q, −q ∈ [−b, 0]. (By condition 2, K′(x) < 0 on [0, b] and has a unique global minimum at q, q ∈ [0, b].)

The separation rate lemma given next shows that the accuracy with which one can approximate the location of a kink using (5) depends on the smoothness of the underlying function away from the kink, provided that the kernel K is properly chosen.


Lemma 1. (δ-separation rate) Let f ∈ F_s(a) and let K(·) be a kernel satisfying C_{1,s}. In what follows the constants q and b depend only on the kernel K(·). Let h > 0, δ > 0 be such that δ < qh. Let A_{δ,h} = {t : δ < |t − θ| < qh} and let c_5 = c_5(a, K) be a sufficiently large constant. Then, there exist constants c_3, c_4 such that:

(a) |k_h(θ)| ≤ c_3 h^{s−3};

(b) for all t ∈ A_{δ,h} and δ ≥ c_5 h^s, |k_h(t)| ≥ c_4 δ h^{−3};

(c) for all t ∈ I such that |θ − t| > bh, |k_h(t)| ≤ c_3 h^{s−3}.

Remark 1. Lemma 1 extends naturally to functions with a jump-point in derivatives of order α ≥ 1. Define assumption C_{α,s} similarly to C_{1,s} with K^{(α+2)} replacing K^{(3)}, K^{(α+1)} replacing K^{(2)} and K^{(α)} replacing K^{(1)}. Then Lemma 1 holds with k_h(t) = k_h^α(t) = h^{−(α+3)} ∫_0^1 K^{(α+2)}((x − t)/h) f(x) dx.

2.3 General order kernel formula


Figure 3: Top left: K(3) defined with the degree 5 polynomial (12). Top right: K(3) defined with the degree 7 polynomial (13). Middle and bottom plots give respectively the second, K(2), and first, K(1), derivatives.

We provide an explicit formula for a class of general order kink estimation kernels by modification of the Legendre polynomials. First, K^{(2)} is constructed such that it satisfies condition C_{1,s}. Technical details are given in Section 5. We begin with cases where s is an odd number: for s = 3, 5, ..., and −1 ≤ x ≤ 1, let

K^{(2)}(x) = γ_s Σ_{j=[s/2]}^{s+1} α_{j,s} x^{2j−s+1},   (8)

where the multiplicative constant γ_s = (2s + 3)!/(2^{2s+3}(s − 1)!(s + 1)!) and the coefficients

α_{j,s} = (−1)^{[s/2]+j+1} (2j)! / (j! (s − j + 1)! (2j − s + 1)!).   (9)


The kink estimation kernel K^{(3)} is simply the derivative of K^{(2)},

K^{(3)}(x) = γ_s Σ_{j=[s/2]+1}^{s+1} β_{j,s} x^{2j−s},   (10)

where

β_{j,s} = (−1)^{[s/2]+j+1} (2j)! / (j! (s − j + 1)! (2j − s)!).   (11)

Remark 2. Kernels (8) and (10) also work for cases where s is an even number. Indeed, conditions C_{1,s} and C_{1,s+1} are equivalent. To see this note that, for s = 2, 4, ..., any kernel K satisfying C_{1,s+1} meets all the requirements of C_{1,s}. Conversely, suppose that K^{(2)} satisfies C_{1,s}; then condition 2 implies that K^{(2)} is an even function and we readily have ∫_{−1}^{1} x^{s−1} K^{(2)}(x) dx = 0. Hence K^{(2)} satisfies C_{1,s+1}. In addition, we note that C_{1,1} is exactly the same as C_{1,3}.

Example 1. Taking s = 3 in (8) and (10) we get, after simplification of the constant factor with the polynomial coefficients, for any x, −1 ≤ x ≤ 1,

K^{(3)}(x) = (945/32)(7x⁵ − 10x³ + 3x).   (12)

The polynomial K(x) with third derivative (12) satisfies C_{1,3}. By Remark 2, this kernel also satisfies C_{1,1} and C_{1,2}. The third derivative (12) as well as the corresponding second and first derivatives are depicted in the LHS plots of Figure 3.

Example 2. Taking s = 5 in (8) and (10) we get, after simplification of the constant factor with the polynomial coefficients, for any x, −1 ≤ x ≤ 1,

K^{(3)}(x) = (45045/256)(−33x⁷ + 63x⁵ − 35x³ + 5x).   (13)

The polynomial K(x) with third derivative (13) satisfies C_{1,5}. By Remark 2, this kernel also satisfies C_{1,4}. The third derivative (13) as well as the corresponding second and first derivatives are depicted in the RHS plots of Figure 3.
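The coefficient formulas (8)-(11) translate directly into code. The following sketch (Python; an illustrative helper of our own, not part of the authors' package) generates the coefficients of K^{(3)} for any odd s and reproduces the simplified forms (12) and (13):

```python
from math import factorial

def gamma_s(s):
    """Multiplicative constant in (8): (2s+3)! / (2^{2s+3} (s-1)! (s+1)!)."""
    return factorial(2 * s + 3) / (2 ** (2 * s + 3) * factorial(s - 1) * factorial(s + 1))

def K3_coeffs(s):
    """Coefficients of K^{(3)} in (10)-(11), as a map exponent (2j - s) -> gamma_s * beta_{j,s}."""
    m = s // 2                                  # [s/2] for odd s
    coeffs = {}
    for j in range(m + 1, s + 2):
        beta = (-1) ** (m + j + 1) * factorial(2 * j) / (
            factorial(j) * factorial(s - j + 1) * factorial(2 * j - s))
        coeffs[2 * j - s] = gamma_s(s) * beta
    return coeffs

# s = 3 reproduces (12): (945/32) * (7x^5 - 10x^3 + 3x)
print(K3_coeffs(3))   # {1: 88.59375, 3: -295.3125, 5: 206.71875}
```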

3 Kink detection and estimation

The polynomial kernels introduced in the previous section have specific properties with respect to kink estimation. We start with the case of noiseless data, where the convolution formula defining k_h(t) at (5) approximates the third derivative of f. For a kernel K(·) satisfying C_{1,s}, we have the following properties: i) a kink in f corresponds to the zero-crossing time of k_h(t); ii) the zero-crossing time of k_h(t) is located between its maximum and minimum. In the case of noisy data we take (6) as an estimator of k_h(t) and use the bandwidth parameter h to smooth the noise. We illustrate the step-by-step implementation of our method using simulated examples depicted in Figures 4, 5 and 6.

(a) Localisation step. Using C_{1,s}, the approximation (5) to f^{(3)}(t) may be written as

k_h(t) = L_h(t) + O(h^{s−3}),   (14)

where

L_h(t) = (±a) h^{−2} K′((θ − t)/h).   (15)


Figure 4: Illustration of model (1) with xi = i/n, i = 1, . . . , n, with n = 200 and σ = 0 (top), σ = 0.2 (bottom). Left plots: a smooth function with no kink illustrates a function under H0. Right plots: a smooth function with one kink at θ = 0.75 illustrates a function under H1.

Suppose that f has a unique kink of size a > 0. Then, by the properties of K′, L_h(t) has a unique maximum at t = θ + hq and a unique minimum at t = θ − hq (vice-versa if a < 0). By exploiting the properties of K′ near the origin, at these two extrema

min_{t∈I} L_h(t) ≤ −C h^{−2},   max_{t∈I} L_h(t) ≥ C h^{−2},   (16)

for some positive constant C which depends on a and q (see C_{1,s}, condition 5). From Proposition 1, with large probability,

k̂_h(t) = L_h(t) + O(h^{s−3}) + O(n^{−1/2} h^{−7/2}).   (17)

Combining (16) and (17) we conclude that, if the bandwidth h is chosen large enough so that

h ≍ n^{−1/3+η} for some η > 0,   (18)

then k̂_h(t) has a unique maximum and a unique minimum near those of L_h(t), and we set

t̂_min = arg min_{t∈I} k̂_h(t),   t̂_max = arg max_{t∈I} k̂_h(t).   (19)

By construction, the interval Â_h with endpoints t̂_min and t̂_max has size O(h) and contains θ with high probability. This is illustrated in the RHS middle plot of Figure 5.

(b) Kink detection with universal thresholding. The maximum and the minimum at (19) can also be used to support the kink scenario. More formally, we can test the hypothesis H_0: f^{(1)} is smooth (continuous) against H_1: f^{(1)} has at least one jump. To do so we standardise the kernel estimate (6) and compare its extrema to the so-called universal threshold, in a fashion similar to that of Wang (1995), see also Donoho et al. (1995) and Raimondo and Tajvidi (2004). The idea is that, under H_0, f^{(1)} is continuous so that for small h our estimator k̂_h(t) is dominated by noise, see (17) when L_h(t) = 0. It is well known that the maximum of a standardised Gaussian sequence goes to infinity no


Figure 5: Illustration of the zero-crossing-time technique with the noisy data of Figure 4. Top plots: raw data. On the RHS top plot a thick vertical line indicates the location of a kink. Middle plots, plain curve: standardised kernel estimate (20) with h = 0.3; dashed lines: universal threshold ±λ = ±√(2 log n) = ±3.25. Using the test (22) on the LHS data no kink is detected; on the RHS data, one kink is detected and located at θ̂ = 0.733 by minimisation (23) within the interval Â_h.

faster than O(√(2 log n)), see e.g. Wang (1995). Introducing a standardised version of our kernel estimate,

T_h(t) = σ^{−1} n^{1/2} h^{7/2} k̂_h(t) / (∫ (K^{(3)}(x))² dx)^{1/2},   (20)

and letting I_n = {i/n, i = 1, ..., n}, we have, for any h small enough and under H_0,

lim_{n→∞} P( max_{t∈I_n} T_h(t) ≥ √(2 log n) ) = 0.   (21)

On the other hand, if f^{(1)} has a jump, we see from (16) and (17) that, at the extrema of L_h, the standardised statistic is of order at least C n^{1/2} h^{3/2}; hence for any bandwidth h satisfying (18) we have max_t T_h(t) > λ = √(2 log n) and, by symmetry, min_t T_h(t) < −λ. We detect the presence of a kink when

Φ = 1( max_{t∈I_n} |T_h(t)| ≥ √(2 log n) )   (22)

takes the value 1. In practice, the test (22) and the localisation step (19) are done simultaneously, as illustrated in the middle plots of Figure 5 and Figure 6.

(c) Search for zero-crossing time. If the test (22) takes the value 1, we derive an estimate of the kink location using the zero-crossing-time technique (23), as illustrated in the RHS bottom plot of Figure 5. The idea is that within the interval Â_h between the minimiser and the maximiser of k̂_h found at (19), k̂_h(t) ≈ k_h(t), so that we can apply the separation rate lemma to locate the zero-crossing time of k_h(t). By construction this occurs at t = θ, see (15). Hence, applying the separation rate lemma, we can locate θ with an accuracy of order δ, δ < h. This is done by minimising |k̂_h(t)| within Â_h:

θ̂ = arg min_{t∈Â_h} |k̂_h(t)| = arg min_{t∈Â_h} |T_h(t)|.   (23)

Comparing the bounds (17) with those of Lemma 1, we see that the minimum at (23) is well defined provided that

δ h^{−3} ≥ C′ h^{s−3}   and   δ h^{−3} ≥ C″ n^{−1/2} h^{−7/2},   (24)


where C′, C″ are constants. The best rate is obtained by choosing δ as small as possible such that the two bounds at (24) hold. The LHS inequality at (24) gives δ ≍ h^s which, once substituted into the RHS inequality, gives the optimal choice of the bandwidth,

h_opt ≍ n^{−1/(2s+1)},   (25)

deriving an accuracy of order

δ_opt = h_opt^s ≍ n^{−s/(2s+1)}.   (26)

Note that the optimal bandwidth (25) satisfies condition (18).

Remark 3. From h to h². It is worth noting that the improvement in the estimation accuracy achieved from step (a) to step (c) is quite substantial even for small s. For example, if s = 2 the accuracy derived in step (a) is O(h) whereas the accuracy derived in step (c) is O(h²). This is confirmed by the simulation results of Section 4.

Remark 4. Asymptotic properties. It follows from the minimax theory of [GTZ] that our kink estimator is rate-optimal, since the rate (26) cannot be improved by any estimator. This result extends naturally to functions with a jump-point in higher derivatives by substituting k̂_h in (19) and (23) with k̂_h^α as in Remark 1 of Section 2. In the latter case, we can estimate the jump-point with an accuracy of order O(n^{−s/(2s+2α−1)}). A derivation of the multiplicative constant term in (26) may be guided by the preliminary work of [GTZ].

Remark 5. Multiple kink detection and estimation. An attractive feature of our estimation method is that it extends naturally to the multiple kink scenario, thanks to the test (22). This is illustrated in Figure 6 with two different noise levels. The first step is to compute the standardised statistic (20) and locate (any) exceedances over the universal threshold λ = √(2 log n). The second step is to search for the zero-crossing time locally within each interval Â_h where a kink has been detected.

4 Numerical performance

4.1 Implementation

In the software the convolution (6) is computed in O(n) steps for direct observations (1) on a regular grid x_i = i/n, i = 1, ..., n. It is anticipated that future versions of the software will include an extension to irregular or random grid designs using standard kernel techniques, such as including the design density in (6). As in any change-points estimation problem one needs to deal with edge effects for time points x near 0 or 1. This should be done prior to the localisation/minimisation steps (19) and (23). In the software the kernel estimator (20) is computed for h/2 ≤ x ≤ 1 − h/2. Better results may be possible by using appropriate boundary kernels, but this is outside the scope of the paper.

4.2 Tuning

The main tuning parameter is the bandwidth h, which is chosen so that signal dominates noise in (17). As usual this is a matter of trade-off between bias and variance: too large a bandwidth reduces the variance but increases the bias, and vice-versa. Differentiation is known to amplify noise effects (see Proposition 1), hence the bandwidths used to estimate derivatives are typically larger than those used to estimate the function f. There are some


Figure 6: Illustration of model (1) with a smooth function with two kinks located at θ_1 = 0.18 and θ_2 = 0.75; x_i = i/n, i = 1, ..., n, n = 200 (left plots) and n = 1000 (right plots); top plots with σ = 0; mid-row LHS with σ = 0.2; mid-row RHS with σ = 0.5. Bottom plots, plain curve: standardised kernel estimate (20) with h = 0.2; dashed lines: universal threshold ±λ = ±3.25 (left), ±λ = ±3.72 (right). All the kinks are detected by the test (22) and located by searching for the zero-crossing time (23) within each of the corresponding intervals Â_h. This yields θ̂_1 = 0.185, θ̂_2 = 0.74 (LHS) and θ̂_1 = 0.192, θ̂_2 = 0.722 (RHS).

data-driven methods for choosing bandwidths in derivative estimation, e.g. the adaptive plug-in method of Herrmann (1997) and Herrmann (2003). These methods can be used to set the bandwidth in kink estimation, as the statistic (6) estimates the third derivative of f. There is no universal answer as to which bandwidth is best in practice; this of course depends on the sample size, noise level and signal-to-noise ratio (SNR). Our simulation results indicate that, when the design range is scaled to x ∈ [0, 1], any value of h within the range 0.2 ≤ h ≤ 0.3 produces good results for data sets where the SNR is greater than 2dB, where

SNR(dB) = 10 log_{10}(||f||² / σ²).   (27)

Larger bandwidths may be used for smaller SNR and vice-versa.
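For completeness, the SNR in (27) can be computed from samples of f on the unit grid (a minimal sketch; approximating ||f||² by the average of f(x_i)², which is our reading of the L²[0, 1] norm in (27)):

```python
import numpy as np

def snr_db(f_values, sigma):
    """Signal-to-noise ratio in decibels, eq. (27), with ||f||^2 ~ mean(f(x_i)^2)."""
    return 10.0 * np.log10(np.mean(np.asarray(f_values) ** 2) / sigma**2)

# example: ||f||^2 = 4 and sigma = 0.2 give 10 * log10(100) = 20 dB
print(snr_db(2.0 * np.ones(100), 0.2))
```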

4.3 Simulation study

We study finite sample properties of our kink detection and estimation method using the degree 5 kernel (12) and the two test functions depicted in Figure 4. LHS plot: a smooth function with no kink represents a function under H_0; RHS plot: a smooth function with one kink at θ = 0.75 represents a function under H_1. In each scenario (H_0 or H_1) we use two sample sizes (n = 200, n = 1000) and three noise levels (σ = 0.2, 0.35 and 0.5). In this experiment the SNR (27) is between 2dB and 5dB. For lower SNR it becomes very difficult to detect and estimate kinks, whereas for larger SNR it is quite easy. For this experiment (2dB ≤ SNR ≤ 5dB) any bandwidth h within the range 0.2 ≤ h ≤ 0.3 gave very good results. Smaller bandwidths yield smaller power and larger bandwidths yield a larger false alarm rate. Using 1000 replications of model (1) under H_1 and under H_0 (as previously described) we computed the Monte Carlo approximation to the false alarm rate, P_{H_0}(accept H_1), of the test (22) and the Monte Carlo approximation to


Table 1: Monte Carlo approximations with 1000 replications of model (1) with f as in Figure 4.

bandwidth  sample size  sd (SNR)         P_H0(accept H1)  P_H1(accept H1)  √(E_H1(θ̂ − θ)²)
h = 0.2    n = 1000     σ = 0.2 (10dB)   0.043            0.992            0.0052
h = 0.2    n = 1000     σ = 0.35 (5dB)   0.017            0.436            0.0998
h = 0.2    n = 1000     σ = 0.5 (2dB)    0.014            0.14             0.0456
h = 0.2    n = 200      σ = 0.2 (10dB)   0.07             0.43             0.0443
h = 0.2    n = 200      σ = 0.35 (5dB)   0.07             0.14             0.1024
h = 0.2    n = 200      σ = 0.5 (2dB)    0.06             0.1              0.16
h = 0.3    n = 1000     σ = 0.2 (10dB)   0.84             1                0.008
h = 0.3    n = 1000     σ = 0.35 (5dB)   0.17             1                0.0103
h = 0.3    n = 1000     σ = 0.5 (2dB)    0.07             0.87             0.0178
h = 0.3    n = 200      σ = 0.2 (10dB)   0.185            0.995            0.0108
h = 0.3    n = 200      σ = 0.35 (5dB)   0.080            0.583            0.0282
h = 0.3    n = 200      σ = 0.5 (2dB)    0.063            0.246            0.0611

the power, P_{H_1}(accept H_1), of the test (22). Under the H_1 scenario we also computed the Monte Carlo approximation to the Root Mean Square Error (RMSE) of the kink estimator (23). The results are summarised in Table 1.

Analysis of the results. The pattern seen in Table 1 is consistent with the theoretical properties of the bandwidths (18) and (25). First, we see that larger bandwidths give better results at lower SNR and vice-versa. Secondly, we see that the RMSE decreases as the sample size increases. Importantly, we see that the accuracy of the method is far superior to the size of the bandwidth O(h), and that relatively large bandwidths produce very good results. This property allows us to detect and estimate kinks at rather high noise levels, as illustrated in Figure 5. We also see that the larger bandwidth h = 0.3 yields better power but produces more false alarms than h = 0.2.

4.4 Application to real data

We illustrate our kink detection and estimation method using some benchmark examples borrowed from the smoothing literature: (a) the nursing time of beluga whale data set (Simonoff, 1996), and (b) the motorcycle data set (Silverman, 1985). The two data sets are depicted in Figure 1 and Figure 2 as well as in the top plots of Figure 7 where they have been analysed for kink detection and location. For comparison purposes we depicted in Figure 1 and Figure 2 a kernel smoothing of the data after kink detection and estimation (plain curve) as well as a kernel smoothing of the data using the lokern software of Herrmann (2003) which produces non-parametric fits with a locally varying bandwidth (dashed curve).

A. Nursing time of beluga whale calf. The successful nursing of a beluga whale calf by its natural mother in captivity is an important concern for zoological parks and for the continued survival of the captive beluga population, c.f. Russell et al. (1997). A male beluga calf named Hudson was born at the Aquarium for Wildlife Conservation in Brooklyn, New York on August 7, 1991. The data examined here, see Figure 1 and RHS of Figure 7, correspond to roughly the first 55 days of Hudson's life. In this study the nursing time, i.e. total suckling time in seconds, was recorded every 6 hours, providing us with a sample of


size n = 224. A rough estimate of the SNR of these data shows that it is greater than 2dB.

We applied our kink detection and estimation method with h = 0.2, which yields a small false alarm rate. The data set tested positive, with one kink located at θ̂ = 0.1614 (36th observation). This suggests that a sharp change in Hudson's nursing behaviour occurred around the 9th day of his life. An examination of the scatter plot in Figure 1 shows that the nursing time rises smoothly up until the 9th day, where there is a sharp peak followed by a sudden drop. These results are consistent with the studies on beluga whales of Russell et al. (1997) and Simonoff (1996) (p. 160), where it was noted that the nursing time typically peaked at around 7-10 days postpartum and then declined over time. An examination of the scatter plot smoothers in Figure 1 shows a second dip in Hudson's nursing time at around 30-35 days (0.55 ≤ t ≤ 0.6); this was later revealed to correspond to a bacterial infection, see Russell et al. (1997). A close inspection of our third derivative estimate k̂_h(t) in the LHS bottom plot of Figure 7 shows that the zero-crossing time following the second largest value of k̂_h(t) is around t = 0.6, corresponding to the 35th day of Hudson's life, which is consistent with the previous findings of Russell et al. (1997) and Simonoff (1996), p. 160. Our test statistic indicates that this second drop in nursing time is clearly not as significant as the first one, and the fits in Figure 1 show that it is well captured by kernel smoothing with a local variable bandwidth.


Figure 7: Kink analysis of the whale nursing time data, n = 224 (left), and the motorcycle data, n = 133 (right). Top plots: raw data with kink locations indicated by thick vertical lines. Middle plots, plain curves: standardised kernel estimates (20); dashed lines: universal threshold ±λ = ±3.29 (left), ±λ = ±3.12 (right). For the nursing data (LHS) the test (22) with h = 0.2 detects one kink located at θ̂ = 0.1614 (36th observation). For the motorcycle data (RHS) the test (22) with h = 0.3 detects two kinks located at θ̂_1 = 0.4887 (65th observation) and θ̂_2 = 0.6992 (93rd observation).

B. Motorcycle data. These data correspond to n = 133 observations of acceleration readings taken through time in an experiment on the efficiency of helmets during impact, see Figure 2 and RHS of Figure 7. The experiment simulates a motorcycle crash and is described in detail in Schmidt et al. (1981). In this study the observations are all subject to errors


and it is of interest both to discern the general shape of the underlying acceleration curve and to draw inference about its minimum and maximum values. In a seminal paper, Silverman (1985) investigated this data set using spline smoothing methods. Spline smoothers capture the general shape of the acceleration curve very well and enjoy nice data-driven tuning. However, as noted in the discussion of Silverman (1985), spline smoothers tend to oversmooth the data over extremely high curvature regions. The resulting estimates have a large bias at places where the local curvature is unusually large or, worse, where the curve has a kink. This phenomenon renders statistical inference about the minimum and maximum difficult. Modern software, e.g. lokern of Herrmann (2003), which produces kernel estimators with locally varying bandwidths, can accommodate more curvature around sharp turns than traditional smoothers. Nevertheless, locally varying the bandwidth in a data-driven fashion, as was done in Figure 1 and in Figure 2, fails to capture extreme curvature features or kinks.

More recently, Gijbels and Goderniaux (2004b) have studied the motorcycle data set using a change-point model which reveals kink features in this data set.

We applied our kink detection and estimation method with h = 0.2 and h = 0.3.

For h = 0.3 two kinks, located at θ̂₁ = 0.4887 (65th observation) and θ̂₂ = 0.6992 (93rd observation), were detected; see the RHS plots of Figure 7 and Figure 2. These results are consistent with the fit of Gijbels and Goderniaux (2004b). The cross-validation method of Gijbels and Goderniaux (2004b) also suggests that there might be a third kink at the time of impact, which can also be seen as the first zero-crossing time on the RHS bottom plot of Figure 7, although it is much less significant than the other two. From the physical point of view the underlying acceleration curve may not have kinks, but sharp turns can be oversmoothed by traditional smoothers whereas they can be captured by fitting kinks to the data, as illustrated in Figure 2.

4.5 Concluding remarks

These preliminary results on kink estimation are quite promising as the method enjoys good computational and theoretical properties. The zero-crossing-time technique produces fine location estimates of sharp curvature features in real data sets with high noise levels.

Acknowledgement. We would like to thank the Editor for helpful suggestions which im- proved the motivations and clarity of this paper. This project began while Marc Raimondo visited National Taiwan University, funded by the Mathematics Division of the National Center of Theoretical Sciences (Taipei).

5 Mathematical appendix

Construction of K^{(2)} via Legendre polynomials. To find a general order K^{(2)} satisfying condition C1,s for every s = 3, 5, …, we restrict attention to polynomials of degree s + 3, supported on [−1, 1] and having K^{(3)}(−1) = K^{(3)}(1) = 0. These extra constraints are very mild considering condition C1,s and the fact that K^{(3)} is the kernel used in kink estimation. Our construction uses the Legendre polynomials on [−1, 1]:
\[
P_i(x) = \frac{d^i}{dx^i}(1-x^2)^i \, I_{-1\le x\le 1} = \sum_{j=0}^{i} p_{i,j}\, x^j \, I_{-1\le x\le 1}, \qquad i = 0, 1, 2, \ldots
\]
Denote μ_j = ∫ x^j K^{(2)}(x) dx, j = 0, 1, 2, …, and write
\[
K^{(2)}(x) = \sum_{i=0}^{s+3} a_i P_i(x).
\]

Without loss of generality, assume that |μ_{s−1}| = 1. Then K^{(2)} has a unique global minimum at zero by taking μ_{s−1} = (−1)^{[s/2]+1}. From the moment conditions,
\[
a_i \|P_i\|^2 = \int_{-1}^{1} P_i(x)\, K^{(2)}(x)\, dx =
\begin{cases}
0, & 0 \le i \le s-2,\\[2pt]
(-1)^{[s/2]+1}\, p_{i,s-1}, & i = s-1,\\[2pt]
(-1)^{[s/2]+1}\, p_{i,s-1} + \displaystyle\sum_{j=s}^{i} p_{i,j}\,\mu_j, & i = s,\, s+1,\, s+2,\, s+3,
\end{cases}
\]
whence
\[
K^{(2)}(x) = \sum_{i=s-1}^{s+3} \frac{1}{\|P_i\|^2} \Big\{ (-1)^{[s/2]+1}\, p_{i,s-1} + \sum_{s \le j \le i} p_{i,j}\,\mu_j \Big\}\, P_i(x). \tag{28}
\]

Combining (28), the conditions that K^{(2)}(−1) = K^{(2)}(1) = K^{(3)}(−1) = K^{(3)}(1) = 0 and the facts that
\[
P_i(1) = i!\,(-2)^i, \qquad P_i(-1) = i!\,2^i,
\]
\[
P_i'(1) = -i\,(i+1)!\,(-2)^{i-1}, \qquad P_i'(-1) = i\,(i+1)!\,2^{i-1},
\]
we obtain a system of linear equations for μ_s, …, μ_{s+3}:

\[
0 = \sum_{i=s-1}^{s+3} \frac{1}{\|P_i\|^2} \Big\{ (-1)^{[s/2]+1}\, p_{i,s-1}\, i!\,(-2)^i + \sum_{s \le j \le i} p_{i,j}\,\mu_j\, i!\,(-2)^i \Big\},
\]
\[
0 = \sum_{i=s-1}^{s+3} \frac{1}{\|P_i\|^2} \Big\{ (-1)^{[s/2]+1}\, p_{i,s-1}\, i!\,2^i + \sum_{s \le j \le i} p_{i,j}\,\mu_j\, i!\,2^i \Big\},
\]
\[
0 = \sum_{i=s-1}^{s+3} \frac{1}{\|P_i\|^2} \Big\{ (-1)^{[s/2]+1}\, p_{i,s-1}\, i(i+1)!\,(-2)^{i-1} + \sum_{s \le j \le i} p_{i,j}\,\mu_j\, i(i+1)!\,(-2)^{i-1} \Big\},
\]
\[
0 = \sum_{i=s-1}^{s+3} \frac{1}{\|P_i\|^2} \Big\{ (-1)^{[s/2]+1}\, p_{i,s-1}\, i(i+1)!\,2^{i-1} + \sum_{s \le j \le i} p_{i,j}\,\mu_j\, i(i+1)!\,2^{i-1} \Big\},
\]

which implies that
\[
0 = \mu_s = \mu_{s+2}, \tag{29}
\]
\[
\mu_{s+1} = -(-1)^{[s/2]+1}\,\frac{p_{s+1,s-1}}{p_{s+1,s+1}} - (-1)^{[s/2]+1}\,\frac{p_{s-1,s-1}(2s+3)(s-1)!\,\|P_{s+1}\|^2}{2\, p_{s+1,s+1}(2s+5)(s+1)!\,\|P_{s-1}\|^2}, \tag{30}
\]
\[
\mu_{s+3} = (-1)^{[s/2]+1}\,\frac{\|P_{s+3}\|^2\, p_{s-1,s-1}(2s+1)(s-1)!}{16\, p_{s+3,s+3}\,\|P_{s-1}\|^2 (2s+5)(s+3)!} - \frac{p_{s+3,s-1} + p_{s+3,s+1}\,\mu_{s+1}}{p_{s+3,s+3}}. \tag{31}
\]

Substituting (29), (30) and (31) back into (28) yields
\[
K^{(2)}(x) = (-1)^{[s/2]+1}\,\frac{p_{s-1,s-1}}{\|P_{s-1}\|^2}\, \Big\{ P_{s-1}(x) - \frac{(2s+3)\,(s-1)!}{2(2s+5)(s+1)!}\, P_{s+1}(x) + \frac{(2s+1)\,(s-1)!}{16(2s+5)(s+3)!}\, P_{s+3}(x) \Big\}.
\]
Expanding P_{s−1}(x), P_{s+1}(x) and P_{s+3}(x) in the above display using
\[
P_i(x) = \sum_{j=[(i+1)/2]}^{i} \frac{i!\,(2j)!}{j!\,(i-j)!\,(2j-i)!}\,(-1)^j\, x^{2j-i}\, I_{-1\le x\le 1}, \qquad i = 0, 1, 2, \ldots,
\]
we obtain, for −1 ≤ x ≤ 1,

\[
\begin{aligned}
K^{(2)}(x) &= (-1)^{[s/2]+1}\,\frac{p_{s-1,s-1}}{\|P_{s-1}\|^2}\, \bigg\{ \sum_{j=[s/2]}^{s-1} \frac{(-1)^j\,(s-1)!\,(2j)!}{j!\,(s-j-1)!\,(2j-s+1)!}\, x^{2j-s+1} \\
&\qquad - \frac{(2s+3)\,(s-1)!}{2(2s+5)(s+1)!} \sum_{j=[s/2]}^{s} \frac{(-1)^{j+1}\,(s+1)!\,(2j+2)!}{(j+1)!\,(s-j)!\,(2j-s+1)!}\, x^{2j-s+1} \\
&\qquad + \frac{(2s+1)\,(s-1)!}{16(2s+5)(s+3)!} \sum_{j=[s/2]}^{s+1} \frac{(-1)^{j+2}\,(s+3)!\,(2j+4)!}{(j+2)!\,(s-j+1)!\,(2j-s+1)!}\, x^{2j-s+1} \bigg\} \\
&= \sum_{j=[s/2]}^{s-1} \frac{(-1)^{j+[s/2]+1}\, p_{s-1,s-1}\,(s-1)!\,(2j)!}{\|P_{s-1}\|^2\, j!\,(s-j-1)!\,(2j-s+1)!}\, \bigg[ 1 + \frac{(2s+3)(2j+1)}{(2s+5)(s-j)} \Big\{ 1 + \frac{(2s+1)(2j+3)}{4(2s+3)(s-j+1)} \Big\} \bigg]\, x^{2j-s+1} \\
&\qquad + (-1)^{[s/2]+1}\,\frac{p_{s-1,s-1}\,(s-1)!\, x^{s+1}}{8\,\|P_{s-1}\|^2}\, \Big\{ \frac{(2s+3)!}{(s+1)!^2} - \frac{(2s+1)(2s+6)!}{2(2s+5)(s+3)!^2}\, x^2 \Big\}.
\end{aligned}
\]
Since
\[
\|P_i\|^2 \equiv \int_{-1}^{1} P_i(x)^2\, dx = \frac{2^{2i+1}\, i!^2}{2i+1},
\]
the last expression of K^{(2)}(x) reduces to, for −1 ≤ x ≤ 1,
\[
K^{(2)}(x) = (-1)^{[s/2]+1}\,\frac{(2s+3)!}{2^{2s+3}\,(s-1)!\,(s+1)!} \sum_{j=[s/2]}^{s+1} \frac{(-1)^j\,(2j)!}{j!\,(s-j+1)!\,(2j-s+1)!}\, x^{2j-s+1},
\]
as given in (8).

It remains to check that this K^{(2)} fulfills conditions 3, 5 and 6 of assumption C1,s. By the construction and the fact that K^{(2)} is a polynomial of even degree, condition 3 obviously holds. Condition 5 is also satisfied since K^{(2)}(0) ≠ 0. In addition, one can see that the unique global minimum of K^{(2)} occurs at zero; this together with K^{(2)}(−1) = K^{(2)}(1) = 0 ensures condition 6.
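The closed form (8) and the conditions just checked can also be verified numerically. The snippet below is an independent check, using exact monomial moments on [−1, 1] rather than quadrature: it rebuilds K^{(2)} for s = 3 from the displayed formula and confirms the boundary zeros, the vanishing moments μ₀ = μ₁ = 0 with |μ_{s−1}| = |μ₂| = 1, and the negative global minimum at the origin.

```python
from math import factorial

def K2_coeffs(s):
    """Monomial coefficients of K^(2) in (8):
    c * sum_{j=[s/2]}^{s+1} (-1)^j (2j)! / (j! (s-j+1)! (2j-s+1)!) x^(2j-s+1)."""
    c = (-1) ** (s // 2 + 1) * factorial(2 * s + 3) / (
        2 ** (2 * s + 3) * factorial(s - 1) * factorial(s + 1))
    return {2 * j - s + 1:
            c * (-1) ** j * factorial(2 * j)
            / (factorial(j) * factorial(s - j + 1) * factorial(2 * j - s + 1))
            for j in range(s // 2, s + 2)}

def evaluate(coef, x):
    """K^(2)(x) on its support [-1, 1]."""
    return sum(a * x ** k for k, a in coef.items()) if abs(x) <= 1 else 0.0

def moment(coef, m):
    """mu_m = int_{-1}^1 x^m K^(2)(x) dx, computed exactly per monomial."""
    return sum(a * 2.0 / (k + m + 1) for k, a in coef.items() if (k + m) % 2 == 0)

coef = K2_coeffs(3)
print(evaluate(coef, -1.0), evaluate(coef, 1.0))          # both ~ 0
print(moment(coef, 0), moment(coef, 1), moment(coef, 2))  # ~ 0, 0, 1
print(evaluate(coef, 0.0))                                # negative minimum at 0
```

For s = 3 the kernel is 945/64 · (−1/3 + 3x² − 5x⁴ + (7/3)x⁶), and the same routine works for any admissible odd s.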

Proof of Lemma 1. We start with the case where f ∈ F_2(a) (i.e. s = 2). Changing variables in (5) and integrating by parts using assumption C1,2 gives
\[
k_h(t) = -h^{-2}\int_{-1}^{1} f'(t+hx)\, K^{(2)}(x)\, dx. \tag{32}
\]

(a) For t such that |t − θ| < h, let τ = (θ − t)/h, −1 ≤ τ ≤ 1, and write (32) in two parts:
\[
k_h(t) = -h^{-2}\int_{-1}^{\tau} f'(t+hx)\, K^{(2)}(x)\, dx - h^{-2}\int_{\tau}^{1} f'(t+hx)\, K^{(2)}(x)\, dx. \tag{33}
\]
Introducing f′(θ−) and f′(θ+) inside the first (resp. second) integral on the RHS of (33) gives
\[
k_h(t) = -h^{-2}\int_{-1}^{\tau} K^{(2)}(x)\, f'(\theta^-)\, dx - h^{-2}\int_{\tau}^{1} K^{(2)}(x)\, f'(\theta^+)\, dx + J_h(t), \tag{34}
\]
where
\[
J_h(t) = -h^{-2}\int_{-1}^{\tau} \big(f'(t+hx) - f'(\theta^-)\big)\, K^{(2)}(x)\, dx - h^{-2}\int_{\tau}^{1} \big(f'(t+hx) - f'(\theta^+)\big)\, K^{(2)}(x)\, dx.
\]
It follows from (34) that
\[
k_h(t) = -h^{-2}\, f'(\theta^-)\, \Big[K^{(1)}(x)\Big]_{-1}^{\tau} - h^{-2}\, f'(\theta^+)\, \Big[K^{(1)}(x)\Big]_{\tau}^{1} + J_h(t);
\]
since f′ has a jump at θ, we have
\[
k_h(t) = h^{-2}\, K^{(1)}(\tau)\, [f'](\theta) + J_h(t). \tag{35}
\]
First, we bound J_h(t) using (3) with s = 2 and |f^{(2)}(θ)| ≤ L: for all t with |t − θ| < h,
\[
|J_h(t)| \le 2Lh^{-1}\int_{-1}^{1} |1+x|\, |K^{(2)}(x)|\, dx = c_3\, h^{-1}.
\]
At t = θ we have τ = 0 and, since K^{(1)}(0) = 0, k_h(θ) = J_h(θ), which proves bound (a) with s = 2.

(b) For t ∈ A_{δ,h} we have δ < |t − θ| < qh and |τ| ∈ [δ/h, q]. Using this, (2) and assumption C1,2, we see that the first term on the RHS of (35) satisfies
\[
h^{-2}\, |K^{(1)}(\tau)|\, |[f'](\theta)| \ge h^{-2}\, |\tau|\, |a| \ge |a|\, \delta\, h^{-3}. \tag{36}
\]
For any δ such that |a|δh^{−3} = (c_3 + c_4)h^{−1}, applying the triangle inequality to (35) gives
\[
|k_h(t)| \ge |a|\, \delta\, h^{-3} - c_3\, h^{-1} \ge c_4\, \delta\, h^{-3}, \tag{37}
\]
which proves bound (b) with s = 2.

(c) Starting from (32) and using assumption C1,2 we write
\[
k_h(t) = -h^{-2}\int_{-1}^{1} \big(f'(t+hx) - f'(t)\big)\, K^{(2)}(x)\, dx. \tag{38}
\]
As with |J_h(t)|, it follows that |k_h(t)| ≤ c_3 h^{−1}, as had to be proved.

If f ∈ F_s(a) with s ≥ 3, the same bounds follow by exploiting the extra smoothness assumption (3), provided that the kernel satisfies the vanishing-moment property 4 of C1,s.
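The key expansion (35) can also be checked numerically for a concrete kink. For f(x) = |x − θ| the second derivative vanishes away from θ, so J_h(t) = 0 for |t − θ| < h and (35) should hold up to discretisation error: k_h(t) = h^{−2} K^{(1)}(τ)[f′](θ) with [f′](θ) = 2. The sketch below is illustrative only; it uses the s = 3 kernel of (8), a midpoint rule for the integral in (32), and made-up values of θ and h.

```python
import numpy as np

theta, h = 0.5, 0.1  # kink location and bandwidth (illustrative values)

def K2(u):
    """Kernel K^(2) of (8) for s = 3, evaluated on [-1, 1]."""
    return (945.0 / 64.0) * (-1.0 / 3.0 + 3.0 * u**2 - 5.0 * u**4 + (7.0 / 3.0) * u**6)

def K1(tau):
    """Antiderivative of K^(2) with K^(1)(-1) = 0."""
    F = lambda u: (945.0 / 64.0) * (-u / 3.0 + u**3 - u**5 + u**7 / 3.0)
    return F(tau) - F(-1.0)

def kh(t, m=200000):
    """k_h(t) = -h^(-2) int_{-1}^{1} f'(t + hx) K^(2)(x) dx by the midpoint
    rule, with f(x) = |x - theta| so that f'(x) = sign(x - theta)."""
    x = -1.0 + (np.arange(m) + 0.5) * (2.0 / m)  # midpoints of [-1, 1]
    return -h**-2 * np.mean(np.sign(t + h * x - theta) * K2(x)) * 2.0

for t in (0.45, 0.48, 0.52, 0.55):
    tau = (theta - t) / h
    print(round(kh(t), 2), round(h**-2 * K1(tau) * 2.0, 2))  # two sides of (35)
```

The two columns agree to the accuracy of the quadrature, and k_h changes sign across θ, which is exactly the zero-crossing property the estimator exploits.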

References

Antoniadis, A. and Gijbels, I. (2002). Detecting abrupt changes by wavelet methods. J. Nonparametr. Statist. 14, 7–29. Statistical models and methods for discontinuous phenomena (Oslo, 1998).
