UNDER DEPENDENT TRUNCATION

Ming-Yen Cheng* ^{1,2}* Peter Hall

^{1}You-Jun Yang

^{2}

*Dedicated to S´andor Cs¨org˝o on the occasion of his 60th Anniversary*

ABSTRACT. Data truncation is a problem in scientific investigations. So far, sta- tistical models and inferences are mostly based on the assumption that the survival and truncation times are independent, which can be unrealistic in applications.

In a nonparametric setting, we discuss identifiability of the conditional and un- conditional survival and hazard functions when the survival times are subject to dependent truncation, namely, the survival time is dependent on the truncation time. Nonparametric kernel estimators of these unknowns are proposed. Useful- ness of the nonparametric estimators are demonstrated through their theoretical properties, an application and a simulation study.

KEYWORDS. Conditional distribution, dependent truncation, hazard rate, iden- tifiability, kernel, nonparametric estimation, survival function, truncation.

AMS SUBJECT CLASSIFICATION: Primary 62N02, Secondary 62F12.

SHORT TITLE. Dependent Truncation.

1 Centre for Mathematics and its Applications, Australian National University, Canberra, ACT 0200, Australia

2 Department of Mathematics, National Taiwan University, Taipei 106, Taiwan

1. INTRODUCTION

We consider nonparametric inference in truncated survival-data problems, where
the time of onset of a disease, and the time of death, are observable if and only
*if the onset time falls to the left of a time-point t, and the time of death lies to*
*the right of t. Models of this type address settings, commonly seen in prevalent*
*cohort studies, where a population is surveyed at time t, and only individuals who*
have experienced the disease and still have the disease at that time are followed up.

Under this setting, the survival time between the two events is left truncated by the truncation time from the initiating event to enrollment.

In this context there exists a literature on parametric approaches to inference,
and also on nonparametric methods, in cases where the onset time and survival
time are stochastically independent or quasi-independent. We shall discuss shortly
this work. However, this quasi-independent, or quasi-stationary, assumption may
be violated in practice. For example, advances in medical treatment and patient
management may reduce the risk of death for patients with later onset times. To
cope with such dependence, Wang et al. (1993) included a parametric component
in their hazard regression models. Nevertheless, the more general nonparametric
*setting, where t is fixed and no assumptions are made about the relationship between*
onset time and survival time, is not well understood.

At first sight it might appear that there is little hope of making progress with
such a completely general nonparametric approach, since the distributions of on-
set time and survival time are distorted so much by the operation of truncation
*at t. However, a closer inspection reveals that a surprising amount of detail can be*
recovered.

For example, it is possible to nonparametrically identify, up to a constant of
*proportionality and to the right of t, the survival-time density and survival-time*

distribution function. This may be achieved without making any structural as-
sumptions, such as the independence hypothesis mentioned earlier. However, the
*constant of proportionality, and the density to the left of t, are not identifiable from*
a nonparametric viewpoint. Nevertheless, a “working value” of the constant of pro-
portionality is readily obtained by either fitting a model, or proceeding nonpara-
metrically under the assumption of independence. In this way, fully nonparametric
methods can be used to provide a check on the validity of parametric or structural
approaches. For example, a model, or the independence assumption, can be used to
estimate both the constant of proportionality and the density and distribution to
*the right of t. Then, nonparametric estimators of these quantities can be compared*
with the “predictions” of the model.

*Moreover, the conditional hazard rate for survival time s, given that the onset*
*time lies to the right of t − s, is fully identifiable in nonparametric terms, as too*
is a version of the cumulative hazard rate. We shall suggest fully nonparametric
estimators of these quantities and those discussed in the previous paragraph, and
outline their properties.

*A related setting is that where t is random. Here, significantly more information*
*is available for inference than in the fixed-t case, and greater progress can be made*
*under relatively weak assumptions. See, for example, the work of Wang et al. (1993),*
which addressed models for prevalent cohort data where patients are recruited at
different, random time points. In this research the survival-time distribution was
modelled under both independence and dependence assumptions on the initiating
time.

In many contexts the term “survival time” should be replaced by “incubation time,” or “induction time,” or “lag time.” In statistical and mathematical terms these quantities have the same role in the model as survival time. Thus, the descrip-

*tion given by Lagakos et al. (1988) of nonparametric inference for induction time in*
the setting of HIV infection, involves an assumption that is functionally equivalent
to the independence of onset time and survival time in a survival-data problem.

Kalbfleisch and Lawless (1989) presented parametric and nonparametric method- ology under the same independence condition. The nonparametric induction-time distribution estimators in both papers are in the spirit of methods for nonparametric maximum-likelihood estimation discussed by Woodroofe (1985); see also Li (1995).

*If a time-reversal argument is used (see e.g. Lagakos et al., 1988; Kalbfleisch and*
Lawless, 1989) then, after appropriate adjustments, our methods for the conditional
survival and hazard functions can be applied to problems arising from retrospec-
tive studies, where only individuals who have experienced the second event before
*time t are observed. Another closely related situation is that of estimating age-*
specific incidence, where the time horizon is age instead of calendar time. Outside
the realm of biomedical and epidemiological studies, truncated data frequently oc-
cur in actuarial, demographic and physical research. Therefore, our results about
identifiability, and our nonparametric methods, have a wide range of applications.

Keiding (1991) considered estimating age-specific incidence of certain diseases
under independent truncation. Kalbfleisch and Lawless (1991) treated regression of
the survival distribution on other covariates. There further exist many related para-
metric, semiparametric and nonparametric approaches, based on the independence
assumption, for various problems of inference about the survival distribution, cumu-
lative hazard and hazard functions, etc. See, for example, Becker and Marschner
*(1993), Pagano et al. (1994), Gross and Lai (1996), Cui (1999), Grigoletto and*
*Akritas (1999), Sun and Zhou (2001) and Li et al. (2002). Tests for the indepen-*
dence assumption have been proposed by, for example, Tsai (1990), Kalbfkeisch
and Lawless (1991), Efron and Petrosian (1994) and Martin and Betensky (2005).

Since our estimator of the conditional survival distribution conveys the shape of the true curve correctly, it can be used to check the quasi-independence condition and to offer a guild for parametric modeling of the dependence. Further, the proposed methods may be incorporated in semiparametric hazard or survival regression when nonparametric modeling of the dependence is desired.

Section 2 discusses identifiability of the conditional and unconditional survival and hazard functions for both the truncated and original populations. Estimators of these quantities are proposed in section 3. In section 4, an application to a real-data set arising from a cancer study and a simulation study are presented. There, data arise from the right truncation model mentioned two paragraphs before. Section 5 gives theoretical properties of estimators of the conditional distribution, hazard and density functions in the truncated population. Appendix I contains some notation and Appendix II gives a proof of the main theorem.

2. IDENTIFIABILITY ISSUES

*2.1. Model, main problems, and definitions. Assume the random vector (X*^{∗}*, Y** ^{∗}*)

*is distributed in the triangular region 0 < x < y < ∞. We observe (X*

^{∗}*, Y*

*) only if*

^{∗}*X*

^{∗}*≤ t < Y*

^{∗}*, where t > 0 is fixed. Thus, we have no access to information about*

*the distribution of (X*

^{∗}*, Y*

^{∗}*) conditional on X*

^{∗}*> t, and so we shall condition on*

*X*

^{∗}*≤ t. That is, we shall assume that X*

^{∗}*≤ t. This mathematical model describes*

*data resulting from a medical survey, at time t, of a population, where X*

^{∗}*and Y*

*denote respectively the time of onset, and time of cessation, of a particular medical condition in a given individual. Each member of the population who is observed*

^{∗}*at time t to exhibit the condition, is monitored until the condition ceases to be*present.

*Let (X, Y ) have the distribution of (X*^{∗}*, Y*^{∗}*) conditional on Y*^{∗}*> t, and let*

*(X*_{1}*, Y*_{1}*), . . . , (X*_{n}*, Y*_{n}*) be independent random two-vectors distributed as (X, Y ).*

*Using these data we wish to estimate the distribution of the time duration, S** ^{∗}* =

*Y*

^{∗}*− X*

^{∗}*, of the medical condition. Below, we shall refer to S*

*as the survival time;*

^{∗}*if the medical condition is fatal, in which case Y** ^{∗}* denotes the time of death, then
this terminology is standard. We also seek estimators of the hazard function and
cumulative hazard function, each of them conditional on onset time.

*Put S = Y − X, and let f**S*^{∗}*X*^{∗}*(s, x) and f**S*^{∗}*|X*^{∗}*(s | x) denote, respectively, the*
*joint density of S*^{∗}*and X*^{∗}*and the conditional density of S*^{∗}*given X** ^{∗}*. Define

*f*

_{X}

^{∗}*(x), f*

_{X}*(x), etc, analogously. We shall replace f by F when referring to the*corresponding distribution function. In this notation, the survival time distribution

*is F*

*S*

*, and the conditional hazard function and its cumulative are, respectively,*

^{∗}*λ(s | x) =* *f*_{S}^{∗}_{|X}^{∗}*(s | x)*

*1 − F*_{S}^{∗}_{|X}^{∗}*(s | x),* *Λ(s | x) =*
Z _{s}

0

*λ(u | x) du .* *(2.1)*

*2.2. Formulae for the probability of observing (X*^{∗}*, Y*^{∗}*). The definition of the*
*distribution of X, and the fact that f**S*^{∗}*X*^{∗}*= f**X*^{∗}*f*_{S}^{∗}_{|X}* ^{∗}*, entail, for a constant

*q ≥ 1,*

*f**X**(x) = q*
Z _{∞}

*t−x*

*f**S*^{∗}*X*^{∗}*(s, x) ds = q f**X*^{∗}*(x)*
Z _{∞}

*t−x*

*f*_{S}^{∗}_{|X}^{∗}*(s | x) ds*

*= q f**X*^{∗}*(x) {1 − F*_{S}^{∗}_{|X}^{∗}*(t − x | x)} ,*

*for 0 < x ≤ t. Therefore, provided*

*P (S*^{∗}*> t − x | X*^{∗}*= x) > 0 for all 0 < x ≤ t ,* *(2.2)*

*we have, for 0 < x ≤ t,*

*f**X*^{∗}*(x) = p* *f**X**(x)*

*1 − F*_{S}^{∗}_{|X}^{∗}*(t − x | x),* *(2.3)*
*where p = q** ^{−1}* is determined by the fact thatR

*0<x≤t**f**X*^{∗}*(x) dx = 1.*

Note that

*p =*
Z _{t}

0

*{1 − F**S*^{∗}*|X*^{∗}*(t − x | x)} f**X*^{∗}*(x) dx = P {(X*^{∗}*, Y*^{∗}*) is observed} .*

*That is, p equals the proportion of data (X*^{∗}*, Y*^{∗}*) that are observed as pairs (X, Y ).*

*In general the value of p is not identifiable merely from the distribution of (X, Y ).*

*Indeed, we may make p as close to zero, or as close to 1, as we like, by altering*
*the distribution of (X*^{∗}*, Y*^{∗}*) but without affecting the (X, Y ) distribution. However,*
*under more stringent conditions on the distribution of (X*^{∗}*, Y*^{∗}*), p can be identified.*

In particular, if

*S*^{∗}*and X** ^{∗}* are statistically independent

*(2.4)*

*then we may estimate F**S*^{∗}*root-n consistently from n data on (X, Y ) (see e.g.*

Woodroofe, 1985), and thence we may estimate

*p =*

½ Z _{t}

0

*f**X**(x)*

*1 − F*_{S}^{∗}*(t − x)* *dx*

¾_{−1}

*(2.5)*

at the same rate, from the same data. Result (2.5) follows from (2.3), provided (2.2)
*and (2.4) hold. In the context of independence, (2.2) is equivalent to P (S*^{∗}*> t) > 0.*

*He and Yang (1998) suggested an estimator of the truncation probability 1−p using*
a representation different from (2.5).

An alternative approach, more parametric in that it demands particular dis-
*tributions for S*^{∗}*and X** ^{∗}*, but less structured from the viewpoint that it does not

*assume independence, is to fit a model to (X*

^{∗}*, Y*

*) and estimate model parameters*

^{∗}*from the data (X*

*i*

*, Y*

*i*

*). The value of p can then be computed for the model with*parameters fitted.

*In practice, estimating p under either (2.4) or a parametric model provides*
*a “working guide” to choice of p, even if (2.4), or the model, may be false. A*

*graph of nonparametric estimators of F*_{S}^{∗}*, for a range of values of p centred around*
*the working estimator of p, provides information where none might otherwise be*
available.

*2.3. Identifying conditional hazard rate. It follows from the definition of (X, Y )*
*that, provided that s ≥ t − x,*

*P (S > s | X = x) = P (S*^{∗}*> s | S*^{∗}*> t − x , X*^{∗}*= x) ,*
or equivalently, that

*F*¯_{S|X}*(s | x) ≡ 1 − F*_{S|X}*(s | x) =* *1 − F**S*^{∗}*|X*^{∗}*(s | x)*

*1 − F*_{S}^{∗}_{|X}^{∗}*(t − x | x).* *(2.6)*
*Noting the definitions of λ(s | x) and Λ(s | x) at (2.1), and that ¯F*_{S|X}*(t − x | x) = 1,*
*we may deduce from (2.6) that for s ≥ t − x,*

*λ(s | x) = −∂ ¯F*_{S|X}*(s | x)/∂s*

*F*¯_{S|X}*(s | x)* *,* *(2.7)*

*L(s, x) ≡ Λ(s | x) − Λ(t − x | x) = − log*©*F*¯_{S|X}*(s | x)*ª

*.* *(2.8)*

*Since, for this range of values of s, ¯F*_{S|X}*(s | x) is identifiable from data on (X, Y ),*
*then it follows from (2.7) and (2.8) that within the same range, λ(s | x) and Λ(s | x)−*

*Λ(t − x | x) are also identifiable from data on (X, Y ). However, since we can alter*
*the distribution of (X*^{∗}*, S*^{∗}*) in the range s < t − x without affecting the distribution*
*of (X, Y ), then neither λ(s | x) nor Λ(s | x) − Λ(t − x | x) is identifiable for s < t − x,*
*and Λ(s | x) is not identifiable for any s > 0.*

*Of course, λ(s | x) and Λ(s | x) are both identifiable, for s > 0 and 0 < x ≤ t,*
*under the independence assumption, (2.4). In this context the conditioning on x is*
*irrelevant, and λ and Λ are functionals of F**S** ^{∗}*.

*2.4. Identifying overall hazard and distribution functions. Assuming (2.2) holds,*
we may use (2.3) and (2.6) to obtain the formulae,

*F**S*^{∗}*(s) = p*
Z _{t}

0

*f**X**(x) F**S*^{∗}*|X*^{∗}*(s | x)*

*1 − F*_{S}^{∗}_{|X}^{∗}*(t − x | x)dx for s > 0* *(2.9)*

*= 1 − p*
Z _{t}

0

*f**X**(x) ¯F*_{S|X}*(s | x) dx for s ≥ t ,* *(2.10)*
*f*_{S}^{∗}*(s) = p*

Z _{t}

0

*f*_{X}*(x) f*_{S|X}*(s | x) dx* *for s ≥ t .* *(2.11)*

*Let us assume, for the time being, that p is known. Then, in view of (2.9),*
*F**S*^{∗}*is identifiable from data on (X, Y ), if and only if the integral on the right-hand*
*side of (2.9) is identifiable. Since knowing (X, Y ) is equivalent to knowing (S, X),*
*which is constrained by S ≥ t − X, then we cannot, in general, identify F*_{S}^{∗}_{|X}^{∗}*(s | x)*
*outside the range s ≥ t − x. However, if s < t then the integral at (2.9) requires*
*knowledge of F*_{S}^{∗}_{|X}^{∗}*(s | x) for s to the left of t−x. Therefore, if the only information*
*we have is in terms of the distribution of (X, Y ), we cannot identify F**S*^{∗}*(s) outside*
*the range s ≥ t. Indeed, for the reasons just given it is possible to construct two*
*examples of smooth, non-pathological distributions of (X*^{∗}*, Y*^{∗}*) for which p, and the*
*distribution of (X, Y ), are identical, but the values F**S*^{∗}*(s) differ for s < t. Within*
*the identifiable range s ≥ t, (2.10) provides an explicit formula for F*_{S}* ^{∗}* in terms of

*p and the identifiable functions f*

_{X}*and F*

*.*

_{S|X}Formula (2.10) may equivalently be written as

*F**S*^{∗}*(s) = 1 − π P (S > s | S > t) for s ≥ t ,* *(2.12)*

*where the function P (S > s | S > t) is of course identifiable for s ≥ t, and the*
*constant π ≡ P (S*^{∗}*> t) is not identifiable from data on (X, Y ) alone. We argued in*
*section 2.2 that a “working approximation” to p might be constructed by estimating*
*p under the assumption that S*^{∗}*and X** ^{∗}*are independent, or via a parametric model.

*The same approach can be taken for π, and the function P (S > s | S > t) estimated*
*root-n consistently from data on (X, Y ).*

Note particularly that (2.11), and also (2.12), imply that we can identify, up
*to a constant of proportionality and to the right of t, the survival-time density f*_{S}* ^{∗}*.

In addition, from (2.10) and (2.11), the overall hazard rate

*λ*_{S}^{∗}*(s) =* *f*_{S}^{∗}*(s)*
*1 − F**S*^{∗}*(s)*
*is identifiable from data on (X, Y ) for all s ≥ t.*

3. METHODOLOGY

*3.1. Outline of methodology. We shall suggest kernel-based methods for estimating*
*F*_{S|X}*(s | x), using the data (X**i**, Y**i*). The estimators are of Nelson–Aalen or product-
*limit type. They can be applied directly to estimate L(s, x), given at (2.8), and*
*indirectly to estimate λ(s | x) and F**S*^{∗}*(s). Our methods allow these quantities to be*
*estimated throughout the range where they are identifiable, which for L(s, x) and*
*λ(s | x) is s ≥ t − x and, for F*_{S}^{∗}*(s), is s ≥ t.*

*The estimator of F*_{S}^{∗}*(s) may be smoothed, and then differentiated, to give an*
*estimator of f**S*^{∗}*(s) for s ≥ t. An alternative approach to estimating F**S*^{∗}*and f**S** ^{∗}* is
to proceed via the representation (2.12). That method too will be discussed. Our
estimators enjoy optimal nonparametric convergence rates.

*3.2. Estimators of F*_{S|X}*and L. We shall treat only second-order methods, designed*
for estimating conditional distributions or densities with two bounded derivatives.

Higher-order estimators may be treated similarly.

*Let K, a kernel function, be a bounded, compactly supported, symmetric prob-*
*ability density, and let h denote a bandwidth. Put K**h**(u) = h*^{−1}*K(u/h). Two*
*estimators of F*_{S|X}*, consistent in the range s ≥ t − x and 0 < x ≤ t, are:*

*F*b_{S|X}*(s | x) = 1 − exp*

½

*−*
X*n*

*i=1*

*I(S*_{i}*≤ s) K*_{h}*(x − X** _{i}*)
P

*j≤n**I(t − X**j* *≤ S**i* *≤ S**j**) K**h**(x − X**j*)

¾

*,* *(3.1)*

*F*e*S|X**(s | x) = 1 −*
Y*n*
*i=1*

½

*1 −* *K*_{h}*(x − X** _{i}*)

P

*j≤n**I(t − X**j* *≤ S**i* *≤ S**j**) K**h**(x − X**j*)

¾_{I(S}_{i}_{≤s)}*.*

*The first estimator, based on (2.8), derives from a Nelson-Aalen estimator of L(s, x):*

*L(s, x) =*b
X*n*
*i=1*

*I(S*_{i}*≤ s) K*_{h}*(x − X** _{i}*)
P

*j≤n**I(t − X**j* *≤ S**i* *≤ S**j**) K**h**(x − X**j*)*.*

The second estimator, e*F*_{S|X}*(s | x), is of product-limit type, and may be viewed as an*
approximation by Taylor expansion to b*L(s, x). It is analogous to another estimator*
*of L(s, x), eL(s, x) say.*

Result (5.5), below, will show that b*L converges to L at rate (nh)*^{−1/2}*+ h*^{2},
where the first term derives from error-about-the-mean and the second from bias.

A central limit theorem for b*L is obtainable from a bivariate form of (5.7).*

*3.3. Smoothing bF*_{S|X}*and eF*_{S|X}*, and estimating λ. We cannot immediately differ-*
entiate either b*F*_{S|X}*(s | x) or eF*_{S|X}*(s | x) in s, since they are not smooth functions of*
that variable. However, this difficulty can be eliminated by replacing the indica-
*tor function I(S**i* *≤ s), in the definitions of both estimators, by M {(s − S**i**)/H},*
*where M denotes the distribution function corresponding to the density K, and*
*H is another bandwidth. Provided K is continuous, the functions bF**S|X**(s | x) and*
*F*e*S|X**(s | x) now have continuous derivatives with respect to s, and these derivatives*
*can be viewed as estimators of f*_{S|X}*(s | x).*

If we were interested only in smoothing b*F** _{S|X}* and e

*F*

*to remove visually distracting jumps in graphs of function estimates, such as b*

_{S|X}*L(s, x), we would choose*

*H by eye. If we wished to estimate f*

_{S|X}*(s | x) or λ*

_{S|X}*(s | x), we would generally*

*take H = h, so that only one bandwidth had to be chosen. This approach gives*estimators, ˆ

*f*

*and ˆ*

_{S|X}*λ, say, of f*

_{S|X}*and λ, respectively, with optimal convergence*

*rates. Indeed, under the assumption that f*

*SX*has two bounded derivatives, using

*the above smoothing technique, and choosing H = h = h(n) → 0 and nh*

^{2}

*→ ∞*

*as n → ∞, the pointwise convergence rates of ˆf*

*S|X*

*≡ ∂ bF*

*S|X*

*/∂s and ∂ eF*

*S|X*

*/∂s to*

*f*

_{S|X}*(s | x) equal O*

_{p}*{(nh*

^{2})

^{−1/2}*+ h*

^{2}

*}. Taking h to be of size n*

*, this gives the*

^{−1/6}*optimal mean-square convergence rate of n** ^{−2/3}*, for functions with two bounded
derivatives. In the case of ˆ

*f*

*these results follow from (5.6) below, which also implies a central limit theorem for the estimator.*

_{S|X}In more detail, substitution into (2.7) gives

*λ(s | x) =*ˆ *∂ bF*_{S|X}*(s | x)/∂s*
*1 − bF*_{S|X}*(s | x)* =

X*n*
*i=1*

*K**h**(s − S**i**) K**h**(x − X**i*)
P

*j≤n**I(t − X**j* *≤ S**i**≤ S**j**) K**h**(x − X**j*)*, (3.2)*
*f*ˆ_{S|X}*(s | x) =* ©

*1 − bF*_{S|X}*(s | x)*ª*λ(s | x) .*ˆ

*Here, different values of the smoothing parameter h could be used to construct*
*λ(s | x) and b*ˆ *F*_{S|X}*(s | x); the bandwidths would be of sizes n*^{−1/6}*and n** ^{−1/5}*, respec-
tively, reflecting the fact that the curve estimation problems are bivariate for the
former and univariate for the latter.

*3.4. Alternative estimator of F*_{S|X}*, and edge effects. Another approach to estimat-*
*ing F**S|X* is to proceed via a kernel estimator, ˇ*f**SX* *say, of f**SX*:

*f*ˇ*SX**(s, x) =* 1
*nh*^{2}

X*n*
*i=1*

*K*

µ*s − S**i*

*h*

¶
*K*

µ*x − X**i*

*h*

¶
*.*

*Since, for s ≥ t − x,*

*F**S|X**(s | x) =*
R

*t−x<u<s**f**SX**(u, x) du*
R

*u>t−x**f**SX**(u, x) du* *,*
*then, for appropriate choice of the bandwidth h,*

*F*ˇ_{S|X}*(s | x) =*
R

*t−x<u<s**f*ˇ*SX**(u, x) du*
R

*u>t−x**f*ˇ*SX**(u, x) du*
*is a consistent estimator of F**S|X**(s | x).*

A disadvantage of this approach, however, is that it is relatively sensitive to
the impact of edge effects, for example those that occur near the diagonal boundary
*defined by s+x = t. These effects can have substantial influence on the bias of ˇF** _{S|X}*,

with the result that ˇ*F*_{S|X}*(s | x) is, in general, not consistent at either the diagonal*
*boundary or the vertical boundaries defined by x = 0 and x = t. Corrections for*
edge effects need to be incorporated, in particular if ˇ*F** _{S|X}* is used as the basis for

*an estimator of F*

*S*

*. Those adjustments can be tedious and cumbersome.*

^{∗}The estimators b*F** _{S|X}* and e

*F*

*are substantially more robust than ˇ*

_{S|X}*F*

*against edge effects due to bias. See, for example, the first part of the theorem in section 5,*

_{S|X}*where the order of bias, O(h*

^{2}), in (5.5) is valid along the diagonal boundary and

*within h of the vertical boundary. Along the vertical boundaries the order of bias*

*is O(h), rather than O(1) as in the case of ˇF*

*, and so the estimators b*

_{S|X}*F*

*and*

_{S|X}*F*e

*are consistent along the vertical boundaries.*

_{S|X}However, b*F** _{S|X}* and e

*F*

*can suffer problems in the upper tail of the distribu-*

_{S|X}*tion, in particular if it happens that for some (s, x),*

*F*_{S|X}*(s | x) = 1 .* *(3.3)*

Condition (5.1), in our discussion below of theoretical properties of b*F*_{S|X}*(s | x),*
*reflects this point. However, provided the distribution of S given X is unbounded*
*to the right, the identity (3.3) will never arise, and the upper limit of values of s for*
which methods, based on b*F** _{S|X}* or e

*F*

_{S|X}*, can estimate F*

*effectively, will steadily increase with sample size.*

_{S|X}*3.5. Estimators of F**S*^{∗}*, f**S*^{∗}*and λ**S*^{∗}*. One approach to estimating F**S** ^{∗}* is to construct

*an estimator of F*

*S|X*, as suggested in section 3.2, then compute an estimator of ˆ

*f*

*X*

*of f*_{X}*using standard kernel methods, and combine them into an estimator of F*_{S}* ^{∗}*,

*for a particular value of p, by substituting into (2.10):*

*F*b_{S}^{∗}*(s) = 1 − p*
Z _{t}

0

*f*ˆ_{X}*(x) {1 − bF*_{S|X}*(s | x)} dx*

*for s ≥ t. Provided we appropriately undersmooth when constructing ˆf** _{X}* and b

*F*

*,*

_{S|X}*which means choosing a bandwidth h satisfying n*^{−1+δ}*≤ h = O(n** ^{−1/4}*) for some

1

2 *< δ ≤* ^{3}_{4}, the estimator b*F**S*^{∗}*is root-n consistent for F**S** ^{∗}*.
It is simpler, however, to work from (2.12), and take

*F*b*S*^{∗}*(s) = 1 − π bG(s, t)* *(3.4)*

*for s ≥ t, where*

*G(s, t) =*b
P

*i* *I(S**i* *> s)*
P

*i* *I(S*_{i}*> t)* *(3.5)*

*estimates G(s, t) ≡ P (S > s | S > t). Provided P (S > s) > 0, the estimator bG is*
*root-n consistent for G.*

*Discontinuities in s can be removed using the method suggested in section 3.3.*

*Here this amounts to replacing the indicator function I(S*_{i}*> s) in (3.5) by 1 −*
*M {(s − S*_{i}*)/H}, where M is a distribution function and H denotes a bandwidth.*

As in section 3.3, if our reason for smoothing was to remove jumps in b*G( · , t) then*
*H would be small, of order n** ^{−1}*, and would be chosen by eye. On the other hand, if

*our aim was to estimate f*

*S*

^{∗}*then we would use a larger bandwidth. Taking H = h*

*in this case, we differentiate (3.4) with respect to s, obtaining a kernel estimator of*

*f*

*S*

^{∗}*(s):*

*f*ˆ*S*^{∗}*(s) = π*
P

*i* *K**h**(s − S**i*)
P

*i* *I(S**i* *> t)* *(3.6)*

*for s ≥ t, where K = M*^{0}*. Choosing h to be of size n** ^{−1/5}* gives an estimator

*f*ˆ

*S*

^{∗}*with optimal mean-square convergence rate n*

^{−4/5}*, provided π assumes its true*

*value. At the very least, without any attempt to evaluate π, (3.6) gives, for s > t,*

*a completely nonparametric estimator of a function, f*

*S*

^{∗}*(s)/π, that is proportional*

*to the true survival density, f*

_{S}

^{∗}*(s).*

*As suggested in section 2.4, one approach to choosing π would be to use either*
*a parametric model, or the structural, but nonparametric, assumption that S** ^{∗}* and

*X** ^{∗}* are independent, to produce a working value of the non-identifiable quantity

*π = P (S*

^{∗}*> t); and to plot the estimates ˆf*

*S*

^{∗}*(s), s ≥ t, for values of π on either*side of the working value.

Forming ratio of ˆ*f**S*^{∗}*(s) and 1 − bF**S*^{∗}*(s) we have an estimator*
*λ*ˆ*S*^{∗}*(s) =*

P

*i* *K**h**(s − S**i*)
P

*i* *I(S**i* *> s)* *(3.7)*

*of the overall hazard rate λ**S*^{∗}*(s) for all s to the right of t.*

The estimators at (3.6) and (3.7) can suffer edge effects at the left- hand bound-
*ary, i.e. near s = t. These can be removed by conventional means, for example*
through using a boundary kernel there or by binning the data and fitting a local
linear smoother.

4. NUMERICAL STUDY

*4.1. An application. We present here an application to a dataset from a breast*
cancer study conducted by the National Cancer Institute of Canada Clinical Trials
Group. Between April 1998 and July 1999, 305 eligible patients, who had inoperable
metastatic or recurrent breast cancer with an Eastern Cooperative Oncology Group
score of 0 to 2, were entered onto the study from 38 centres in Eastern Europe
*(n = 192), Canada (n = 46), South Africa (n = 34), Western Europe (n = 24), and*
*Australia (n = 9).*

Let e*X** ^{∗}*, e

*Y*

^{∗}*and t (= 355), all expressed in months and calculated from 1st*January 1970, respectively denote the time a patient was first diagnosed with breast cancer, the time the patient had the first recurrence of breast cancer, and the time to the end of the study. Of major interest is the survival time, or progression free time, e

*S*

*= e*

^{∗}*Y*

^{∗}*− eX*

*from the first incidence of breast cancer to the first recurrence.*

^{∗}In this study, only individuals with e*Y*^{∗}*≤ t can be observed. Equivalently,*
the survival time e*S*^{∗}*is subject to right truncation by t − eX** ^{∗}*. Let ( e

*X, eS) denote*

the truncated version of ( e*X*^{∗}*, eS** ^{∗}*), i.e. ( e

*X, eS) equals ( eX*

^{∗}*, eS*

*) conditional on e*

^{∗}*S*

^{∗}*≤*

*t − eX*

*. The left panel of Figure 1 plots e*

^{∗}*X + eS against eX for the 287 recurrent*breast cancer patients, among the total of 305 metastatic or recurrent breast cancer cases.

If the time-reversal transformation

*X*^{∗}*= t − eX*^{∗}*,* *S*^{∗}*= t − eS*^{∗}*,* *X = t − eX ,* *S = t − eS ,* *(4.1)*

*is employed, then S*^{∗}*is subject to left truncation and (X, S) has the distribution*
*of (X*^{∗}*, S*^{∗}*) conditional on S*^{∗}*≥ t − X** ^{∗}*. Hence, the methods proposed in section 3
can be applied to analyse this data set. The right panel of Figure 1 shows the

*transformed data on the truncated population of (X, S). Note that one isolated*point, corresponding to a much earlier initial time of original breast cancer than the others, has been deleted.

In section 4.1.1, the conditional survival and hazard functions of e*S given eX*
are estimated using (4.1) and estimators of the conditional survival and hazard
*functions of S given X given in section 3.2. Section 4.1.2 studies the original*
population ( e*X*^{∗}*, eS*^{∗}*). Under the model in section 2, F**S*^{∗}*(s) is identifiable up to*
*a constant of proportionality and λ**S*^{∗}*(s) is identifiable for s ≥ t; F*_{S}^{∗}_{|X}^{∗}*(s|x) is*
*identifiable up to a constant of proportionality and λ*_{S}^{∗}_{|X}^{∗}*(s|x) is identifiable for*
*s > t − x. Under the model here, neither Fe**S** ^{∗}*(e

*s) nor λe*

*S*

*(e*

^{∗}*s) is identifiable for*

*any 0 < es < t; Fe*

*S*

^{∗}*|*

*X*e

*(e*

^{∗}*s|ex) is identifiable up to a constant of proportionality and*

*λe*

*S*

^{∗}*|*

*X*e

*(e*

^{∗}*s|ex) is identifiable up to a constant for all 0 < es < t − ex.*

*4.1.1. Truncated distribution. Following from the time-reversal transformation*
*(4.1), for all 0 ≤ es ≤ t − ex,*

*Fe**S|**X*e(e*s | ex) = 1 − F*_{S | X}*(s*^{−}*| x) ,* *fe**S|**X*e(e*s | ex) = f*_{S|X}*(s | x) ,*

*where s = t − es and x = t − ex. Substituting into this formula our estimators of*
*F*_{S|X}*(s | x) and f*_{S|X}*(s | x), given in (3.1), we obtain estimators of Fe**S|**X*e(e*s | ex) and*
*fe**S|**X*e(e*s | ex).*

Observe from Figure 1 that truncation is heavy for large e*x values, and that there*
are very few points with small e*x values. Also, there are edge effects. Therefore, in*
the analysis of conditional survival distributions, e*x was fixed at ten middle values*
*220, 230,. . . , 310. The kernel function was K(u) =* ^{15}_{16}*(1 − u*^{2})^{2}*I(|u| ≤ 1). The*
*bandwidth h was taken as 15 for the first five ex values, and 7.5 for the others.*

Figure 2 depicts the estimates of the truncated conditional distribution func-
*tion, Fe**S|**X*e(e*s | ex), over the range 0 ≤ es ≤ t− ex where it is identifiable. The truncated*
conditional distributions are noticeably different. Therefore, analysis based on the
overall distribution of the true survival time e*S** ^{∗}*, instead of on its conditional distri-
bution on e

*X*

*, could be quite misleading.*

^{∗}*Note from (4.1) that the conditional hazard rate λe**S|**X*e(e*s | ex) is*
*λe**S|**X*e(e*s | ex) =* *f**S|X**(s | x)*

*F*_{S|X}*(s*^{−}*| x)* *= λ*_{S | X}*(s | x)* *1 − F**S|X**(s | x)*
*F*_{S|X}*(s*^{−}*| x)* *,*

*for all 0 ≤ es ≤ t− ex. Hence, to estimate λe**S|**X*e(e*s | ex), one could replace the unknowns*
*F*_{S|X}*(s | x) and λ*_{S|X}*(s | x) on the right-hand side by their estimators provided above*
and in (3.2).

*4.1.2. Original distribution. The survival distribution in the original population*
( e*X*^{∗}*, eS** ^{∗}*) is of significant interest. Lagakos et al. (1988) derived the product-limit

*estimator of Fe*

*S*

*(e*

^{∗}*s) using (4.1) and assuming independence between eS*

*and e*

^{∗}*X*

*. Then one can obtain the corresponding Nelson-Aalen estimator of the cumulative*

^{∗}*hazard. The product-limit estimate of Fe*

*S*

*(e*

^{∗}*s) is shown in Figure 3. On the other*

*hand, for all 0 < es < t,*

*Fe**S** ^{∗}*(e

*s) = P (S*

^{∗}*≥ s) = p*Z

_{t}0

*f**X**(x) ¯F**S*^{∗}*|X*^{∗}*(s*^{−}*| x)*

*1 − F*_{S}^{∗}_{|X}^{∗}*(t − x | x)dx .* *(4.2)*

*Hence, without the independence condition, neither Fe**S** ^{∗}*(e

*s) nor λe*

*S*

*(e*

^{∗}*s) is identifiable*using data on ( e

*S, eX) for any 0 < es < t. Even with information on p, this difficulty*remains as the integral in (4.2) cannot be changed to the form in (2.10) for any

*0 < es < t. This in can be seen from the data structure eX*

^{∗}*< eX*

*+ e*

^{∗}*S*

^{∗}*< t where t*is a constant.

*From (2.6) and (4.1), for 0 ≤ es ≤ t − ex,*

*Fe**S*^{∗}*|**X*e* ^{∗}*(e

*s | ex) = ¯F*

_{S}

^{∗}

_{|X}

^{∗}*(s*

^{−}*| x) = ¯F*

_{S|X}*(s*

^{−}*| x) ¯F*

_{S}

^{∗}

_{|X}

^{∗}*(t − x | x) .*

*(4.3)*

*The conditional truncation probability F*_{S}^{∗}_{|X}^{∗}*(t−x | x) is not identifiable from data*
on the truncated version ( e*S, eX). Under the independence assumption, ¯F*_{S}^{∗}_{|X}^{∗}*(t −*
*x | x) equals ¯F**S*^{∗}*(t − x), or equivalently Fe**S*^{∗}*(t − ex), which can be estimated by the*
product-limit method described before. Combining this working value of ¯*F*_{S}^{∗}_{|X}^{∗}*(t−*

*x | x) and our estimator of ¯F*_{S|X}*(s*^{−}*| x) we obtain an estimator of Fe**S*^{∗}*|**X*e* ^{∗}*(e

*s | ex) over*

*the range 0 ≤ es ≤ t − ex. The results are also depicted in Figure 3. It shows that the*conditional distribution of the survival time e

*S*

*given the initial time e*

^{∗}*X*

*varies over the initial time. And the product-limit estimate of the unconditional distribution function differs from the estimates of the conditional distribution functions in shape.*

^{∗}Often, the hazard function gives better insight into the survival distribution
*than the survival function does. From (4.3), for 0 ≤ es ≤ t − ex,*

*λe**S*^{∗}*|**X*e* ^{∗}*(e

*s | ex) = λ*

*S*

^{∗}*|X*

^{∗}*(s | x)*

*F*¯

_{S}

^{∗}

_{|X}

^{∗}*(s | x)*

*F*

_{S}

^{∗}

_{|X}

^{∗}*(s*

^{−}*| x)*

*= λ*_{S}^{∗}_{|X}^{∗}*(s | x)* *F*¯_{S|X}*(s | x) ¯F*_{S}^{∗}_{|X}^{∗}*(t − x | x)*
*1 − ¯F*_{S|X}*(s*^{−}*| x) ¯F*_{S}^{∗}_{|X}^{∗}*(t − x | x).*

*Therefore, while λ*_{S}^{∗}_{|X}^{∗}*(s | x) is identifiable for all s ≥ t − x, λe**S*^{∗}*|**X*e* ^{∗}*(e

*s | ex) is not*

*identifiable in the region 0 ≤ es ≤ t − ex. An estimator is available using our*

*estimators of F*

*S|X*

*(s | x) and λ*

*S*

^{∗}*|X*

^{∗}*(s | x) (= λ*

*S|X*

*(s | x) for s ≥ t − x) given in (3.1)*

*and (3.2), and a working value of F*

_{S}

^{∗}

_{|X}

^{∗}*(t − x | x), e.g. the product-limit estimator*

*of F*_{S}^{∗}*(t − x), by assuming independence. The results are depicted in Figure 4.*

When e*x = 260 or 270, the first peak in the conditional hazard estimates occurs at*
around e*s = 40 or 20, much earlier than the first peak (around es = 65) in the other*
curves. Alternatively, under independence of e*S** ^{∗}* and e

*X*

*, kernel smoothing the*

^{∗}*Nelson-Aalen estimator yields an estimator of the overall hazard function λe*

*S*

*(e*

^{∗}*s).*

This overall hazard rate estimate, shown in Figure 4, does not resemble any of the conditional hazard rate estimates.

The observations made from Figures 3 and 4 suggest that, for this data set,
dependence between the survival time e*S*^{∗}*and the truncation time t − eX** ^{∗}* exists. In
that case, traditional approaches that rely on the independence assumption, such
as the product-limit and Nelson-Aalen estimators, may not be useful. By contrast,
our methods provide a better description of the dependence by illustrating how the
conditional survival and hazard change over the truncation time.

*4.2. Simulation study. Here we use a simulation study to demonstrate that, in*
the case of dependent truncation, our methods indeed recover information about
the original distribution from the truncated data more accurately than methods
based on the independence assumption. We consider a setting similar to the breast-
*cancer data: n = 300 observations were collected on ( eX, eY ), which equals ( eX*^{∗}*, eY** ^{∗}*)
conditional on e

*X*

^{∗}*≤ eY*

^{∗}*≤ t. Let the truncation time t be 350. Take eX*

^{∗}*= 350 − Z,*

*where Z follows an Exponential distribution with rate 0.03 and is constrained by*

*Z ≤ t (= 350). Suppose the conditional distribution of the survival time, eS*

*=*

^{∗}*Y*e

^{∗}*− eX*

*, given e*

^{∗}*X*

*= e*

^{∗}*x is a log-logistic distribution with density function*

*fe**S*^{∗}*|**X*e* ^{∗}*(e

*s | ex) =*

*α(ex)*

^{−1}*β(ex)*© e

*s/α(ex)*ª* _{β(}*e

*h*

^{x)−1}1 +© e

*s/α(ex)*ª* _{β(}*e

*x)*i

_{2}

*,*e

*s > 0 .*

The survival and hazard functions are respectively
*Fe*¯*S*^{∗}*|**X*e* ^{∗}*(e

*s | ex) =*

h 1 +©

e

*s/α(ex)*ª* _{β(}*e

*i*

^{x)}

_{−1}*, λe**S*^{∗}*|**X*e* ^{∗}*(e

*s | ex) =*

*α(ex)*

^{−1}*β(ex)*© e

*s/α(ex)*ª* _{β(}*e

*x)−1*

h 1 +©

e

*s/α(ex)*ª* _{β(}*e

*x)*i

*.*

*We set α(ex) = exp(ex/70) and β(ex) = (ex + 100)/100. The methods described in*section 4.1 were applied to 1000 random samples generate from this model.

*Panel (a) of Figure 5 plots the overall hazard function λe**S** ^{∗}*(e

*s) and the con-*

*ditional hazard function λe*

*S*

^{∗}*|*

*X*e

*(e*

^{∗}*s | ex) for ex = 210, 230, 250 and 270. Panel (b) of*Figure 5 gives the pointwise 10th, 50th and 90th percentiles of the product-limit

*estimator (by independence assumption) of λe*

*S*

*(e*

^{∗}*s) and the true curve. As expected,*this estimator fails here since independence does not hold and the overall hazard function does not resemble any of the conditional hazard functions. Given in panel (c) of Figure 5 are five typical realizations of the product-limit estimator that cor- respond to the 10th, 30th, 50th, 70th and 90th percentiles of the 1000 integrated squared errors, calculated from zero to the 95th quantile of the true distribution of

*S*e

*. All of them depart from the true curve dramatically.*

^{∗}Figure 6 plots the pointwise 10th, 50th and 90th percentiles of our estimator of
*λe**S*^{∗}*|**X*e* ^{∗}*(e

*s | ex), described in section 4.1.2, for ex = 210, 230, 250 and 270. Although the*

*product-limit estimator, used as a working value of F*

_{S}

^{∗}

_{|X}

^{∗}*(t − x | x) in our method,*is biased without the independence assumption, our estimator performs quite well.

Comparing Figures 5 and 6, it is clear that our observation in section 4.1.2 are not just artifacts of random fluctuation. Instead, our estimators indeed recover shapes of the underlying curves accurately.

*We also experimented with replacing the working value of F*_{S}^{∗}_{|X}^{∗}*(t − x | x) by*
*the true value in the estimation of λe**S*^{∗}*|**X*e* ^{∗}*(e

*s | ex). The outcomes are not given here,*but this approach always does better. However, only for large values of e

*s or ex is it*significantly better than our estimator.

*As stated in section 4.1.1, the conditional hazard function λe**S|**X*e(e*s | ex) is iden-*
tifiable; we gave an estimator of it there. Figure 7, presenting performance of our
*estimator of λe**S|**X*e(e*s | ex), is an analog of Figure 6. It validates our estimator for a*
*relatively small sample size, n = 300.*

5. THEORETICAL PROPERTIES

First we discuss the estimator b*F** _{S|X}*, introduced in section 3.2. Results similar to
those below may be derived in the case of e

*F*

*S|X*. Our main theoretical formula, from which our other results will follow, will describe uniform convergence properties of

*F*b

_{S|X}*(s | x) for values of (s, x) satisfying 0 < t − x < s < s*

_{0}

*, where s*

_{0}has the property:

sup

*0<x<t* *F (s*0*| x) < 1 .* *(5.1)*

Since our estimators are based on second-order kernel methods then their biases
*are of order h*^{2}. This is true for both density and distribution estimators. Concise
*expressions for the order h*^{2} bias terms are especially complex, however, and so we
*shall not give them here; they will be represented simply as O(h*^{2}) in asymptotic
approximations. On the other hand, we shall be relatively concise about error-
about-the mean terms.

*Next we introduce notation. Let c*_{j}*and β** _{j}* denote functions, not depending on

*n or h, which are introduced in Appendix I. Put*

*a*1*(s, x ; s*^{0}*, x*^{0}*| h) = K**h**(x − x*^{0}*) c*1*(s, x ; s*^{0}*, x*^{0}*) ,*

*a*2*(s, x ; s*^{0}*, x*^{0}*| h) = K**h**(x − x*^{0}*) K**h**(s − s*^{0}*) c*2*(s, x ; s*^{0}*, x** ^{0}*)

*and b**j**(s, x ; s*^{0}*, x*^{0}*| h) = a**j**(s, x ; s*^{0}*, x*^{0}*| h) − E{a**j**(s, x ; S, X | h)}. We shall make the*
following assumptions:

*the distribution of (S, X) is continuous, with density f**SX*, which has two
*continuous derivatives on its support; the support of f**X*equals the interval
*[0, t], on which f** _{X}* is bounded away from zero;

*(5.2)*

*K is a bounded, symmetric, H¨older-continuous probability density, sup-*

*ported on [−1, 1];* *(5.3)*

*for some δ ∈ (0, 1), h = h(n) satisfies n*^{δ}*h → 0 and n*^{1−δ}*h → ∞.* *(5.4)*

*Theorem 1. Assume (5.2)–(5.4), and that s*0 *satisfies (5.1). Define bF*_{S|X}*by (3.1).*

*Then, uniformly in (s, x) in the range that 0 < t − x < s < s*0 *with h ≤ x ≤ t − h*
*or s ≤ min(t, t − x + h)*

*F*b_{S|X}*(s | x) − F*_{S|X}*(s | x)*

*F*¯_{S|X}*(s | x)* *= h*^{2}*β*_{1}*(s, x) +* 1
*n*

X*n*
*i=1*

*b*_{1}*(s, x ; S*_{i}*, X*_{i}*| h)*
*+ o** _{p}*©

*(nh)*^{−1/2}*+ h*^{2}ª

*.* *(5.5)*

*Theorem 2. Assume the conditions in Theorem 1. If, in the definition at (3.1), we*
*replace I(S*_{i}*≤ s) in the numerator on the right-hand side by M {(s − S*_{i}*)/h}, where*
*M denotes the distribution function corresponding to density K; if we then define*
*f*ˆ_{S|X}*(s | x) as the derivative of bF*_{S|X}*(s | x) with respect to s and ˆλ(s | x) as in (3.2);*

*and if we replace the assumption “n*^{1−δ}*h → ∞” in (5.4) by “n*^{1−δ}*h*^{2} *→ ∞”; then,*
*for (s, x) satisfying 0 < t − x < s < s*0 *with h ≤ x ≤ t − h and s > min(t, t − x + h),*

*λ(s | x) − λ(s | x) = λ(s | x)*ˆ

½

*h*^{2}*β*2*(s, x) +* 1
*n*

X*n*
*i=1*

*b*2*(s, x ; S**i**, X**i**| h)*

¾

*+ o**p*

©*h*^{2}*+ (nh*^{2})* ^{−1/2}*ª

*,* *(5.6)*

*f*ˆ_{S|X}*(s | x) − f*_{S|X}*(s | x) = f*_{S|X}*(s | x)*

½

*h*^{2}*β*3*(s, x) +* 1
*n*

X*n*
*i=1*

*b*2*(s, x ; S**i**, X**i**| h)*

¾

*+ o**p*

©*h*^{2}*+ (nh*^{2})* ^{−1/2}*ª

*.* *(5.7)*

Theorem 1 does not specifically address the order of the bias of b*F*_{S|X}*within h*
*of the vertical boundaries, where x = 0 or x = t. However, the bias there can be*
*shown to be of order h. Formulae for the error-about-the-mean of bF**S|X* in those
places are given in (A.8).

The series on the right-hand side of (5.5) is of course a sum of independent
and identically distributed random variables, and so (5.5) can be used readily to
derive a central limit theorem for b*F*_{S|X}*. Indeed, n** ^{−1}* P

*i* *b*1*(x, s; S**i**, X**i**| h) is asymp-*
*totically normally distributed with zero mean and variance (nh)*^{−1}*σ*1*(s, x)*^{2}, where
*σ*1*(s, x)*^{2} *= κ f**X**(x) E{c*1*(s, x; S, X)*^{2}*} and κ =*R

*K*^{2}. Therefore, (5.5) implies that
*F*b*S|X**(s | x) − F**S|X**(s | x) = ¯F**S|X**(s | x)*©

*(nh)*^{−1/2}*σ*1*(s, x) N**n**(s, x) + h*^{2}*β*1*(s, x)*ª
*+ o**p*

¡*h*^{2}¢

*, (5.8)*
*where the random variable N**n**(s, x) is asymptotically normally distributed with*
zero mean and unit variance.

Standard arguments may be used to prove that, under the conditions of the the-
*orem, n** ^{−1}* P

*i* *b*_{1}*(x, s; S*_{i}*, X*_{i}*| h) is of order (nh)*^{−1/2}*(log n)*^{1/2}*, uniformly in (s, x)*
*in the range 0 < t − x < s < s*0. Hence, (5.5) may also be used to prove that

sup

*(s,x) : 0<t−x<s<s*0*,*
*h≤x≤t−h*

¯¯

¯ b*F*_{S|X}*(s | x) − F*_{S|X}*(s | x)*

¯¯

*¯ = O**p*

©*(nh)*^{−1/2}*(log n)*^{1/2}*+ h*^{2}ª
*.*

Analogously to the way in which (5.8) follows from (5.5), (5.6) and (5.7) imply that ©

*λ*_{S|X}*(s | x)*ª* _{−1}*©

*λ*ˆ

_{S|X}*(s | x) − λ*

_{S|X}*(s | x)*ª

and ©

*f*_{S|X}*(s | x)*ª* _{−1}*©

*f*ˆ

_{S|X}*(s | x) −*

*f*

_{S|X}*(s | x)*ª

are each asymptotically normally distributed with respective asymp-
*totic means h*^{2}*β*2*(s, x) and h*^{2}*β*3*(s, x) and common variance (nh*^{2})^{−1}*σ*_{2}^{2}*(s, x), where*
*σ*_{2}^{2}*(s, x) = κ*^{2}*f*_{SX}*(s, x)E*©

*c*_{2}*(s, x; S, X)*^{2}ª
.

*APPENDIX I: DEFINITIONS OF FUNCTIONS c*_{j}*AND β** _{j}*
Put

*κ**j**(h, s, x) =*
Z

*v : min(0,t−s)<x−vh<t*

*v*^{j}*K(v) dv ,* *ρ**j**(h, s, x) =* *κ**j**(h, s, x)*
*κ*_{0}*(h, s, x),*
*c*_{j}*(s, x ; s*^{0}*, x** ^{0}*) =

*I(j = 1) I(s*

^{0}*≤ s) + I(j = 2)*

*κ*0*(h, s*^{0}*, x) r(s*^{0}*, x)* *−*©

*I(j = 1) I(t − x*^{0}*≤ s) + I(j = 2)*ª

*× I(t − x*^{0}*≤ s** ^{0}*)
Z

_{s}

^{0}*t−x*^{0}

©*I(j = 1) I(u ≤ s) + I(j = 2)*ª

*f*_{SX}*(u, x)*
*κ*0*(h, u, x) r(u, x)*^{2} *du ,*