Chapter 2 Literature Review

2.3 Association Measures and Copula Models Suitable for Truncation Data

For truncation data, we observe $(X,Y)$ only if $X \le Y$. Hence joint analysis has to be restricted to the upper wedge $R_U = \{(x,y): 0 \le x \le y < \infty\}$. Consequently the aforementioned descriptive measures and models may not be directly applicable to describe $(X,Y)$ if they have a truncation relationship.

Kendall’s tau defined in (2.1) is not identifiable for truncation data. Tsai (1990) suggested considering the event $A_{ij} = \{\omega : \breve{X}_{ij}(\omega) \le \tilde{Y}_{ij}(\omega)\}$, where $\breve{X}_{ij} = X_i \vee X_j$ and $\tilde{Y}_{ij} = Y_i \wedge Y_j$. Notice that under the truncation scheme, as long as $(\breve{X}_{ij}, \tilde{Y}_{ij}) \in R_U$, or equivalently $\breve{X}_{ij} \le \tilde{Y}_{ij}$, it follows that $(X_i,Y_i)$ and $(X_j,Y_j)$ are both in $R_U$. By conditioning on the event $A_{ij}$, Tsai proposed the modified version of Kendall’s tau

\[ \tau_a = 2E(\Delta_{ij} \mid A_{ij}) - 1, \tag{2.6} \]

where $(X_i,Y_i)$ and $(X_j,Y_j)$ are two independent replications of $(X,Y)$, which are known to satisfy the truncation scheme with $X_i \le Y_i$ and $X_j \le Y_j$ given $A_{ij}$. The measure $\tau_a$ is well defined for truncation data.

To measure local dependence for truncation data, Chaieb et al. (2006) adopted Tsai’s idea to modify equation (2.3). Specifically, for $x \le y$ they proposed to consider

\[ \theta^*(x,y) = \frac{\Pr(\Delta_{ij}=1 \mid (\breve{X}_{ij},\tilde{Y}_{ij})=(x,y))}{\Pr(\Delta_{ij}=0 \mid (\breve{X}_{ij},\tilde{Y}_{ij})=(x,y))}. \tag{2.7} \]

The value of $\theta^*(x,y)$ can be interpreted in the same way as $\tilde{\theta}(x,y)$. Notice that $\theta^*(x,y)$ in (2.7) and $\tilde{\theta}(x,y)$ in (2.3) differ in the way of choosing the corner position. Specifically, for $\tilde{\theta}(x,y)$ the corner is chosen to be $(\tilde{X}_{ij},\tilde{Y}_{ij}) = (X_i \wedge X_j, Y_i \wedge Y_j)$ while, for truncation data, the corner is $(\breve{X}_{ij},\tilde{Y}_{ij}) = (X_i \vee X_j, Y_i \wedge Y_j)$. The measure $\tilde{\theta}(x,y)$ is not appropriate for truncation data since, given $(\tilde{X}_{ij},\tilde{Y}_{ij}) \in R_U$, it is still possible that $(X_i,Y_i)$ or $(X_j,Y_j)$ falls outside $R_U$. In contrast, by choosing $(\breve{X}_{ij},\tilde{Y}_{ij})$ as the target in making the conditioning arguments, the two points will fall in $R_U$.

For truncation data, Chaieb et al. (2006) suggested imposing the model structure on the “semi-survival” function, defined as $\Pr(X \le x, Y > y)$ $(x \le y)$, which is a more natural descriptive measure than the joint survival function $\Pr(X > x, Y > y)$. Furthermore, since no information is available in the lower wedge $\{(x,y): 0 \le y < x < \infty\}$, the function $\pi(x,y) = \Pr(X \le x, Y > y \mid X \le Y)$ is identifiable nonparametrically while $\Pr(X \le x, Y > y)$ is not. Accordingly, adapting to the nature of truncation, Chaieb et al. (2006) suggested imposing the AC structure on $\pi(x,y)$ such that

\[ \pi(x,y) = \phi_\alpha^{-1}\left[\phi_\alpha\{F_X(x)\} + \phi_\alpha\{S_Y(y)\}\right]/c \quad (x \le y), \tag{2.8} \]

where $F_X(\cdot)$ and $S_Y(\cdot)$ are continuous distribution and survival functions respectively and $c$ is an unknown normalizing constant satisfying

\[ c = \iint_{x<y} -\frac{\partial^2}{\partial x\,\partial y}\,\phi_\alpha^{-1}\left[\phi_\alpha\{F_X(x)\} + \phi_\alpha\{S_Y(y)\}\right] dx\,dy. \tag{2.9} \]

Note that under model (2.8), the normalizing constant $c$ may not be the truncation proportion $\Pr(X \le Y)$, but it makes the model (2.8) have a valid density function. Note that when $\phi_\alpha(t) = -\log(t)$, quasi-independence between $X$ and $Y$ holds.
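The last remark can be checked in one line. The short derivation below is a worked step under model (2.8), using only the fact that the inverse of $\phi(t) = -\log(t)$ is $\phi^{-1}(s) = e^{-s}$:

```latex
\[
\pi(x,y)
  = \frac{1}{c}\,\exp\!\left[-\{-\log F_X(x) - \log S_Y(y)\}\right]
  = \frac{F_X(x)\,S_Y(y)}{c}, \qquad x \le y .
\]
```

In this case the semi-survival function factorizes into a function of $x$ times a function of $y$ on the upper wedge, which is the product form associated with quasi-independence.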

2.4 Statistical Inference for Truncated Data under Quasi-Independence

For truncation data, we observe $(X,Y)$ only if $X \le Y$. Replications of $(X,Y)$ are located in the upper wedge $R_U = \{(x,y): 0 \le x \le y < \infty\}$. The sample consists of $\{(X_j,Y_j)\ (j=1,\dots,n)\}$ subject to $X_j \le Y_j$. We can consider the sample $\{(X_j,Y_j)\ (j=1,\dots,n)\}$ as iid from the cumulative distribution function $H(x,y) = \Pr(X \le x, Y \le y \mid X \le Y)$. Let $X$ and $Y$ be positive independent random variables having the marginal distribution functions $\Pr(X \le x)$ and $\Pr(Y \le y)$. The independence between $X$ and $Y$ cannot be tested from data since the information for the lower wedge is unavailable. Thus, the independence assumption

\[ H(x,y) = \frac{\int_0^x\!\int_0^y I(u \le v)\, d\Pr(X \le u)\, d\Pr(Y \le v)}{\iint I(u \le v)\, d\Pr(X \le u)\, d\Pr(Y \le v)} \]

may not be acceptable unless independence between $X$ and $Y$ is known from prior knowledge. Instead, Wang, Jewell and Tsai (1986) assumed the model

\[ H_0: \ H(x,y) = \int_0^x\!\int_0^y I(u \le v)\, dF_X(u)\, dF_Y(v) \,/\, c_0, \]

where $F_X$ and $F_Y$ are arbitrary distribution functions and $c_0$ is the normalizing constant satisfying

\[ c_0 = \iint_{x \le y} dF_X(x)\, dF_Y(y). \]

Tsai (1990) called the assumption under $H_0$ “quasi-independence”.

Using the semi-survival function, the assumption of quasi-independence can be simplified as

\[ H_0: \ \Pr(X \le x, Y > y \mid X \le Y) = F_X(x)\,S_Y(y)/c_0, \]

where $F_X$ and $S_Y$ are arbitrary right continuous distribution and survival functions, and $c_0$ is the normalizing constant satisfying

\[ c_0 = -\iint_{x \le y} dF_X(x)\, dS_Y(y). \]

Define the support of $X$ as $[x_L, x_U]$, where $x_L = \inf\{u; F_X(u) > 0\}$ and $x_U = \sup\{u; F_X(u) < 1\}$. Similarly define the support of $Y$ as $[y_L, y_U]$, where $y_L = \inf\{u; S_Y(u) < 1\}$ and $y_U = \sup\{u; S_Y(u) > 0\}$. It is usually assumed that $x_L \le y_U$ so that $c_0 > 0$. In general, the true distributions $F_X$ and $S_Y$ cannot be estimated nonparametrically without further assumptions. However, the following conditional distributions are estimable:

\[ F_X^0(x) = \Pr(X \le x \mid X \le y_U, Y \ge x_L), \qquad S_Y^0(y) = \Pr(Y > y \mid X \le y_U, Y \ge x_L). \]

Under the assumption of quasi-independence, Lynden-Bell (1971) derived the nonparametric maximum likelihood estimators (NPMLE) for the two marginal distributions, which can be expressed by the following explicit formulas:

\[ \hat{F}_X(x) = \prod_{u > x}\left\{1 - \frac{R(u,0) - R(u-,0)}{R(u,u)}\right\}, \qquad \hat{S}_Y(y) = \prod_{u \le y}\left\{1 - \frac{R(\infty,u) - R(\infty,u+)}{R(u,u)}\right\}, \tag{2.10} \]

where $R(x,y) = \sum_{j=1}^{n} I(X_j \le x, Y_j \ge y)$. Woodroofe (1985) showed the uniform consistency results

\[ \sup_{x>0} |\hat{F}_X(x) - F_X^0(x)| \xrightarrow{P} 0; \qquad \sup_{y>0} |\hat{S}_Y(y) - S_Y^0(y)| \xrightarrow{P} 0. \]
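The product-limit formulas in (2.10) translate almost literally into code. The following is a minimal Python sketch, not a reference implementation: the function names `risk_count` and `lynden_bell` are hypothetical, and the data are assumed to be plain lists satisfying $X_j \le Y_j$ with no censoring.

```python
def risk_count(x, y, u, v):
    """R(u, v) = number of observations with X_j <= u and Y_j >= v."""
    return sum(1 for xj, yj in zip(x, y) if xj <= u and yj >= v)


def lynden_bell(x, y):
    """Return (F_hat, S_hat) as callables, following the products in (2.10).

    Assumes every pair satisfies x[j] <= y[j], so R(u, u) >= 1 at each
    observed point u and no division by zero occurs.
    """
    xs = sorted(set(x))
    ys = sorted(set(y))

    def F_hat(t):
        prod = 1.0
        for u in xs:
            if u > t:
                d = sum(1 for xj in x if xj == u)  # R(u,0) - R(u-,0)
                prod *= 1.0 - d / risk_count(x, y, u, u)
        return prod

    def S_hat(t):
        prod = 1.0
        for u in ys:
            if u <= t:
                d = sum(1 for yj in y if yj == u)  # R(inf,u) - R(inf,u+)
                prod *= 1.0 - d / risk_count(x, y, u, u)
        return prod

    return F_hat, S_hat
```

For instance, `F_hat, S_hat = lynden_bell([1, 2, 3], [4, 5, 6])` gives step functions that can be evaluated at any time point. The quadratic cost of recomputing `risk_count` is accepted here for readability.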

Wang et al. (1986) derived a simple asymptotic variance for the Lynden-Bell estimator, which turns out to be an analogue of the asymptotic variance of the Kaplan-Meier estimator. A necessary condition for the above Lynden-Bell estimators to be consistent estimators for $F_X$ and $S_Y$ is that $x_U < y_U$ and $x_L < y_L$ so that $F_X = F_X^0$ and $S_Y = S_Y^0$. In other words, there exist two positive numbers $y_L < x_U$ such that $F_X(y_L) > 0$, $S_Y(y_L) = 1$, $F_X(x_U) = 1$ and $S_Y(x_U) > 0$.

2.5 Statistical Inference for Dependent Truncation Data

Recall the modified version of Kendall’s tau proposed by Tsai in (2.6):

\[ \tau_a = 2E(\Delta_{ij} \mid A_{ij}) - 1. \]

Based on the sample $\{(X_j,Y_j)\ (j=1,\dots,n)\}$ subject to $X_j \le Y_j$, Tsai (1990) proposed to estimate $\tau_a$ by

\[ \hat{\tau}_a = \frac{\sum_{i<j} I\{A_{ij}\}\,\mathrm{sgn}\{(X_i - X_j)(Y_i - Y_j)\}}{\sum_{i<j} I\{A_{ij}\}} = \frac{\sum_{i<j} I\{A_{ij}\}\cdot\{2\Delta_{ij} - 1\}}{\sum_{i<j} I\{A_{ij}\}}. \tag{2.11} \]
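The estimator in (2.11) is an average of concordance signs over the comparable pairs, that is, the pairs for which $A_{ij}$ holds. A minimal Python sketch follows; the function name `conditional_kendall_tau` is hypothetical, and the comparability event is taken to be $\max(X_i,X_j) \le \min(Y_i,Y_j)$ as described for Tsai's construction.

```python
from itertools import combinations


def conditional_kendall_tau(x, y):
    """Average sign of (Xi - Xj)(Yi - Yj) over pairs satisfying A_ij."""
    num = 0
    den = 0
    for i, j in combinations(range(len(x)), 2):
        # A_ij: the larger X does not exceed the smaller Y
        if max(x[i], x[j]) <= min(y[i], y[j]):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            num += (prod > 0) - (prod < 0)  # sgn of the product
            den += 1
    if den == 0:
        raise ValueError("no comparable pairs")
    return num / den
```

The explicit error for an empty set of comparable pairs is a design choice of this sketch; the ratio in (2.11) is undefined in that case.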

Under the semi-survival AC assumption in (2.8), Chaieb et al. (2006) proposed to estimate $\alpha$ by utilizing the concordance information provided by $\Delta_{ij}$, since its (conditional) expected value reveals the information of $\alpha$. Their idea can be viewed as an extension of the methods by Clayton and Cuzick (1985) for bivariate right censored data and by Fine et al. (2001) for semi-competing risks data. Specifically, under the semi-survival AC model assumption and with $\breve{X}_{ij} = X_i \vee X_j$, $\tilde{Y}_{ij} = Y_i \wedge Y_j$, it follows that

\[ E(\Delta_{ij} \mid \breve{X}_{ij} = x, \tilde{Y}_{ij} = y, A_{ij}) = \frac{\theta_\alpha\{c\,\pi(x,y)\}}{1 + \theta_\alpha\{c\,\pi(x,y)\}}, \]

where the relationship between $\theta_\alpha(\cdot)$ and $\phi_\alpha(\cdot)$ is given in equation (2.5). Accordingly they proposed the following estimating function:

\[ U_{\tilde{w}}(\alpha,c) = \sum_{i<j} I\{A_{ij}\}\,\tilde{w}_{\alpha,c}(\breve{X}_{ij},\tilde{Y}_{ij})\left[\Delta_{ij} - \frac{\theta_\alpha\{c\,\hat{\pi}(\breve{X}_{ij},\tilde{Y}_{ij})\}}{1 + \theta_\alpha\{c\,\hat{\pi}(\breve{X}_{ij},\tilde{Y}_{ij})\}}\right], \tag{2.12} \]

where $\tilde{w}_{\alpha,c}(x,y)$ is a weight function and $\hat{\pi}(x,y) = \sum_{i=1}^{n} I(X_i \le x, Y_i > y)/n$.

Note that when $\tilde{w}_{\alpha,c}(x,y) = 1$, the above estimating function is equivalent to

\[ \hat{\tau}_a = \frac{\sum_{i<j} I\{A_{ij}\}\,[\theta_\alpha\{c\,\hat{\pi}(\breve{X}_{ij},\tilde{Y}_{ij})\} - 1]\,/\,[\theta_\alpha\{c\,\hat{\pi}(\breve{X}_{ij},\tilde{Y}_{ij})\} + 1]}{\sum_{i<j} I\{A_{ij}\}}, \tag{2.13} \]

where the right-hand side can be viewed as a model-based estimator of $\tau_a$. Notice that $U_{\tilde{w}}(\alpha,c)$ involves the truncation proportion parameter $c$, which is unknown. In the special case of the Clayton model with $\phi_\alpha(t) = t^{-(\alpha-1)} - 1$ $(\alpha > 1)$ and $\theta_\alpha(v) = \alpha$, $U_{\tilde{w}}(\alpha,c)$ depends only on $\alpha$. Outside this special case, $U_{\tilde{w}}(\alpha,c)$ alone is not enough for estimation of $\alpha$, since $c$ must be estimated as well. Chaieb et al. (2006) proposed their second estimating procedure, which was motivated by the paper of Rivest and Wells (2001) on marginal estimation for dependent censored data. Their idea was inspired by the paper of Zheng and Klein (1995).

Now we describe the second estimation procedure proposed by Chaieb et al. (2006). Let $t_1 < \cdots < t_{2n}$ be the ordered observed points of $(X_1,\dots,X_n,Y_1,\dots,Y_n)$ and $t_0 = 0$. Define $R(t,t+) = \sum_j I(X_j \le t, Y_j > t)$. Replacing $\pi(t,t)$ by $R(t,t+)/n$ in equation (2.8), they obtained a set of estimating equations:

\[ \phi_\alpha\left\{\frac{c\,R(t_i,t_i+)}{n}\right\} = \phi_\alpha\{F_X(t_i)\} + \phi_\alpha\{S_Y(t_i)\} \quad (i=1,\dots,2n-1). \tag{2.14} \]

To solve the above equations, Chaieb et al. (2006) modified the algorithm of Rivest and Wells (2001) originally proposed for dependent censored data. Specifically, they first estimated the jumps, $\phi_\alpha\{S_Y(t_i)\} - \phi_\alpha\{S_Y(t_i-)\}$ and $\phi_\alpha\{F_X(t_i)\} - \phi_\alpha\{F_X(t_i+)\}$, and then summed them up over all the failure times prior to $t$ to obtain the estimators for $\phi_\alpha\{F_X(t)\}$ and $\phi_\alpha\{S_Y(t)\}$. Then, by plugging all the marginal estimators into the equations in (2.14), an estimating function for $c$ can be obtained. In Section 3 and Section 4, we propose different methods for estimating $(\alpha,c)$ and solving the equations in (2.14), respectively.

Chapter 3 The Proposed Approach for Semi-parametric Inference

In this chapter, we develop a new inference approach to analyzing semi-survival AC models of the form in (2.8). Specifically, two types of estimating functions are needed to estimate the unknown parameters $\alpha$, $c$, $F_X(\cdot)$ and $S_Y(\cdot)$. One is for estimating the association parameter and the other is related to marginal estimation. The present method is semiparametric in the sense that we do not specify the form of $F_X(\cdot)$ and $S_Y(\cdot)$, but specify the functional form of $\phi_\alpha(\cdot)$.

3.1 Estimation of Association

3.1.1. Conditional Likelihood Approach

In this section, we consider estimation of α under the semi-survival AC model in (2.8).

To simplify the analysis, we assume that there are no ties and, temporarily, we ignore external censoring. The sample consists of $\{(X_j,Y_j)\ (j=1,\dots,n)\}$ subject to $X_j \le Y_j$. Here we generalize Clayton’s likelihood approach (Clayton, 1978) to truncation data. Define the set of grid points as follows:

\[ \varphi = \left\{(x,y) \,\middle|\, x \le y,\ \sum_{j=1}^{n} I(X_j \le x, Y_j = y) = 1,\ \sum_{j=1}^{n} I(X_j = x, Y_j \ge y) = 1\right\}. \]

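For a point $(x,y)$ in $\varphi$, we can define the “risk set” $\Re(x,y) = \{i;\ X_i \le x, Y_i \ge y\}$. Denote $R(x,y) = \sum_{j=1}^{n} I(X_j \le x, Y_j \ge y)$ as the number of observations in $\Re(x,y)$. Let $\Delta(x,y) = \sum_{j=1}^{n} I(X_j = x, Y_j = y)$, which indicates whether failure occurs at $(x,y)$. Given $R(x,y) = r$ for $(x,y) \in \varphi$ and under model (2.8), the variable $\Delta(x,y)$ follows a Bernoulli distribution with the probability $\theta_\alpha\{c\,\pi(x,y)\}/[r - 1 + \theta_\alpha\{c\,\pi(x,y)\}]$. Since this probability carries information about $\alpha$, we can construct a conditional likelihood function $L(\alpha,c)$ and the corresponding estimating function $U_L(\alpha,c)$ in (3.2).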

However, for other members in the AC family, estimation of $\alpha$ requires the information of $c$. It is important to note that, for most models, $\partial \log L(\alpha,c)/\partial c$ yields the same estimating function as $U_L(\alpha,c)$. This implies that the likelihood function cannot identify $(\alpha,c)$ simultaneously. Joint estimation of $(\alpha,c)$ will be discussed later in Section 3.2.

3.1.2 Estimation based on Two-by-Two Tables

Following the ideas proposed by Day et al. (1997) and Wang (2003), we can construct the following $2 \times 2$ table at an observed failure point $(x,y)$ with $x \le y$. Let $N_1(x,dy) = \sum_{i=1}^{n} I(X_i \le x, Y_i = y)$ and $N_1(dx,y) = \sum_{i=1}^{n} I(X_i = x, Y_i \ge y)$. The table can be represented as follows:

Table: Two-by-two Table for Truncated Data

                Y = y          Y > y      Total
    X = x       Δ(x,y)                    N_1(dx,y)
    X < x
    Total       N_1(x,dy)                 R(x,y)

The odds ratio of the above table is the sample analogue of the cross ratio function $\theta^*(x,y)$ defined in (2.7). Given the marginal counts, the conditional mean of $\Delta(x,y)$ can be derived as a function of $\theta^*(x,y)$ or $\theta_\alpha\{c\,\pi(x,y)\}$ under model (2.8). The nuisance parameter $\pi(x,y)$ can be estimated by $\hat{\pi}(x,y)$. Motivated by the log-rank type statistic, we can combine all the tables at different values of $(x,y)$ and then construct the following estimating function

\[ U_w(\alpha,c) = \iint_{x \le y} w_{\alpha,c}(x,y)\left[\Delta(x,y) - \frac{\theta_\alpha\{c\,\hat{\pi}(x,y)\}}{R(x,y) - 1 + \theta_\alpha\{c\,\hat{\pi}(x,y)\}}\right] N_1(dx,y)\,N_1(x,dy) \]
\[ \phantom{U_w(\alpha,c)} = \sum_{(x,y) \in \varphi} w_{\alpha,c}(x,y)\left[\Delta(x,y) - \frac{\theta_\alpha\{c\,\hat{\pi}(x,y)\}}{R(x,y) - 1 + \theta_\alpha\{c\,\hat{\pi}(x,y)\}}\right], \tag{3.3} \]

where $w_{\alpha,c}(x,y)$ is a weight function. Note that in the derivation of (3.3), we use the assumption that the data have no ties and hence $N_1(dx,y)\,N_1(x,dy) = 1$ if and only if $(x,y) \in \varphi$.

3.1.3 Construction based on Concordance Indicators

Here we review the idea proposed by Chaieb et al. (2006) and present a more general version of their estimating function. Based on (2.7) and for $x \le y$, it follows that

… be located in the identifiable region $R_U$ for certain. The following function can be viewed as a generalization of Oakes’ method (1986):

… $\tilde{w}_{\alpha,c}$ is a weight function. Note that the estimating function proposed by Chaieb et al. (2006) sets $\tilde{w}_{\alpha,c}(x,y) = 1$, and is related to the conditional Kendall’s tau as mentioned in equation (2.13).
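The two-by-two-table estimating function in (3.3) can be evaluated directly on the grid of points $(X_i, Y_j)$ at which both margins of the table equal one. The following is a minimal Python sketch for the Clayton case, where $\theta_\alpha(v) = \alpha$ so that $c$ drops out. The function names `grid_points` and `clayton_estimating_function` are hypothetical, the weight is fixed at $w = 1$ as an assumption of this sketch, and the data are assumed to have no ties and to satisfy $X_j \le Y_j$.

```python
def grid_points(x, y):
    """Index pairs (i, j) such that (X_i, Y_j) lies in the grid.

    With no ties, (X_i, Y_j) is a grid point when
    X_j <= X_i <= Y_j <= Y_i; i == j marks a failure exactly at the point.
    """
    n = len(x)
    return [(i, j) for i in range(n) for j in range(n)
            if x[j] <= x[i] <= y[j] <= y[i]]


def clayton_estimating_function(alpha, x, y):
    """Sum over grid points of Delta - alpha / (R - 1 + alpha), with w = 1."""
    total = 0.0
    for i, j in grid_points(x, y):
        # R(x, y): size of the risk set at the grid point (X_i, Y_j)
        r = sum(1 for xk, yk in zip(x, y) if xk <= x[i] and yk >= y[j])
        delta = 1.0 if i == j else 0.0
        total += delta - alpha / (r - 1.0 + alpha)
    return total
```

An estimate of $\alpha$ under these assumptions would be a root of `clayton_estimating_function` in `alpha`, found for example by bisection; a root need not exist for very small samples.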

3.1.4 Equivalence Condition for Different Approaches

Now we establish the relationship among different estimating functions. This idea was motivated by the analysis of Clayton & Cuzick (1985) who expressed Clayton’s likelihood estimator in terms of concordance/discordance indicators. Consider the truncation setting.

Some algebraic calculations yield the following identity:

∫∫

The above equation provides a unified framework for comparing different estimating functions. Our proposed estimating function UL(α,c) using the conditional likelihood principle, is a special case of Uw(α,c) constructed based on the two-by-two construction with the weight function:

)}

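Furthermore, $U_L(\alpha,c)$ is also a special case of $U_{\tilde{w}}(\alpha,c)$, constructed based on the concordance indicators, with the weight function:

\[ \tilde{w}_{\alpha,c}(x,y) = \frac{\dot{\theta}_\alpha\{c\,\hat{\pi}(x,y)\}\,[1 + \theta_\alpha\{c\,\hat{\pi}(x,y)\}]}{\theta_\alpha\{c\,\hat{\pi}(x,y)\}\,[R(x,y) - 1 + \theta_\alpha\{c\,\hat{\pi}(x,y)\}]}, \tag{3.7} \]

where $\dot{\theta}_\alpha(v) = \partial\theta_\alpha(v)/\partial\alpha$.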

The estimator proposed by Chaieb et al. (2006) is $U_w(\alpha,c)$ with

\[ w_{\alpha,c}(x,y) = \frac{R(x,y) - 1 + \theta_\alpha\{c\,\hat{\pi}(x,y)\}}{1 + \theta_\alpha\{c\,\hat{\pi}(x,y)\}}. \]

Another representation of it is of the form $U_{\tilde{w}}(\alpha,c)$ with $\tilde{w}_{\alpha,c}(x,y) = 1$.

The above analysis implies that the three different estimation procedures yield the same form of estimating functions with different choices of the weight function. The next question is which weight function produces better results. Some authors such as Fine et al. (2001) have suggested practical guidelines for choosing the weight function under the Clayton model but did not provide any theoretical justification. It seems that no simple theory is available for choosing the optimal weight in the estimating function (3.5). Here we recommend using $U_L(\alpha,c)$ since it utilizes some likelihood information. We will see in our simulations that it also produces more efficient results than the weighted concordance estimator with $\tilde{w}_{\alpha,c}(x,y) = 1$.

3.2 Estimation of Marginal Functions and Truncation Probability

3.2.1 The Approach of Chaieb et al. (2006)

Here we adopt the framework of Chaieb et al. (2006) but propose a different estimating algorithm. We briefly describe their setup. Let $t_1 < \cdots < t_{2n}$ be the ordered observed points of $(X_1,\dots,X_n,Y_1,\dots,Y_n)$ and $t_0 = 0$. Replacing $\pi(t,t)$ by $R(t,t+)/n = \sum_j I(X_j \le t, Y_j > t)/n$ in equation (2.8) with $x = y = t$, it follows that

\[ \phi_\alpha\left\{\frac{c\,R(t_i,t_i+)}{n}\right\} = \phi_\alpha\{F_X(t_i)\} + \phi_\alpha\{S_Y(t_i)\} \quad (i=1,\dots,2n-1). \tag{3.8} \]

The idea of constructing the above estimating equations was motivated by the paper of Rivest and Wells (2001), who considered dependent censoring. For solving the equations, Chaieb et al. (2006) mimicked the approach of Rivest and Wells (2001) by estimating the differences $\phi_\alpha\{S_Y(t_i)\} - \phi_\alpha\{S_Y(t_i-)\}$ and $\phi_\alpha\{F_X(t_i)\} - \phi_\alpha\{F_X(t_i+)\}$. Then the estimated differences are summed up to obtain the estimators of $(\phi_\alpha\{F_X(t_j)\}, \phi_\alpha\{S_Y(t_j)\})$. The marginal estimators are plugged into equation (3.8) to obtain an estimating function involving $(\alpha,c)$. We find that it is difficult to understand the algorithm of Chaieb et al. (2006) and hence decide to propose a different algorithm.

3.2.2 Recursive Solution to the Moment Constraints

Here we propose to solve the equations in (3.8) in a different way. Suppose that $\hat{F}_X$ and $\hat{S}_Y$ are step functions with jumps only at observed points. Then, the unknown parameters are

\[ \{\alpha, c, F_X(X_1),\dots,F_X(X_n), S_Y(Y_1-),\dots,S_Y(Y_n-)\} \in R^{2n+2}. \]

A total of $2n+2$ non-homogeneous moment constraints are needed to produce a unique solution to the set of equations. However, (3.8) only contains $2n-1$ equations, which permit numerous solutions. With no prior information at hand, the two boundary conditions $\hat{F}_X(t_{2n-1}) = 1$ and $\hat{S}_Y(t_1) = 1$ would provide reasonable candidates for the additional constraints to be added into (3.8). Together with the constraint $U_L(\alpha,c) = 0$ of the likelihood equation, we obtain the full $2n+2$ equations, giving a unique moment estimator for

\[ \{\alpha, c, F_X(X_1),\dots,F_X(X_n), S_Y(Y_1-),\dots,S_Y(Y_n-)\}. \]

Fixing an arbitrary value for $(\alpha,c)$, we regard an equation in (3.8) as an estimating function for $\{F_X(t_i), S_Y(t_i)\}$. For instance, the initial constraint $\hat{S}_Y(t_1) = 1$ immediately gives the solution $(\hat{F}_X(t_1) = cR(t_1,t_1+)/n,\ \hat{S}_Y(t_1) = 1)$. The proposed procedure can be performed successively for $j = 1, 2, \dots, 2n-1$.

We can show that the solutions to the above algorithm have the following explicit formula:

… (3.11) reduce to the Lynden-Bell estimators and the natural estimator of the truncation proportion (He and Yang, 1998). It is worth noting that the representation of the Lynden-Bell estimator as a solution to the moment equation in (3.8) with $\phi_\alpha(t) = -\log(t)$ is new in the literature. Compared with the traditional expression as a product-limit estimator, our approach provides a more general estimating scheme which allows for dependent truncation.

In principle, any other boundary constraints imposed on $F_X(t_{2n-1})$ and $S_Y(t_1)$ can give a different but unique solution to (3.8) and $U_L(\alpha,c) = 0$. Here, our subjective choice of using $\hat{F}_X(t_{2n-1}) = 1$ and $\hat{S}_Y(t_1) = 1$ facilitates the proposed recursive algorithm that leads to the explicit solutions in (3.9), (3.10) and (3.11). The proposed estimators based on (3.9) and (3.11) differ from those of Chaieb et al. (2006). However, the proposed estimator in (3.10) is identical to the estimator proposed by Chaieb et al. (2006).
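A forward recursion of this kind can be sketched in a few lines. The following Python sketch is only one possible reading of the recursion, written under explicit assumptions that are not taken from the text: at an observed $X$ point only $\hat{F}_X$ changes, at an observed $Y$ point only $\hat{S}_Y$ changes, $(\alpha,c)$ is held fixed, there are no ties, and every risk set $R(t_j,t_j+)$ is nonempty. The function name `solve_margins` and the arguments `phi` and `phi_inv` (the generator $\phi_\alpha$ and its inverse) are hypothetical.

```python
import math


def solve_margins(x, y, c, phi, phi_inv):
    """Solve (3.8) recursively for (F_X(t_j), S_Y(t_j)), j = 1, ..., 2n-1.

    Starts from the boundary condition S_Y(t_1) = 1 and returns a list of
    (t_j, F_j, S_j) triples. This is an illustrative sketch only.
    """
    n = len(x)
    pts = sorted([(v, "x") for v in x] + [(v, "y") for v in y])
    F, S = None, 1.0
    out = []
    for t, kind in pts[:-1]:  # the last point t_{2n} is not used in (3.8)
        # R(t, t+): observations with X_j <= t < Y_j
        r = sum(1 for xj, yj in zip(x, y) if xj <= t and yj > t)
        if r == 0:
            raise ValueError("empty risk set")
        target = phi(c * r / n)
        if kind == "x" or F is None:
            F = phi_inv(target - phi(S))  # X point: update F, carry S forward
        else:
            S = phi_inv(target - phi(F))  # Y point: update S, carry F forward
        out.append((t, F, S))
    return out


def phi_indep(t):
    """Generator for quasi-independence, phi(t) = -log(t)."""
    return -math.log(t)


def phi_indep_inv(s):
    """Inverse of the quasi-independence generator."""
    return math.exp(-s)
```

With the quasi-independence generator, the ratio of successive values of $\hat{S}_Y$ at a $Y$ point equals $1 - 1/R(t,t)$, which is the product-limit factor of the Lynden-Bell estimator. In a full procedure, $c$ would then be tuned so that the boundary condition on $\hat{F}_X$ holds.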

3.3 Asymptotic Analysis

3.3.1 General Results for Asymptotic Properties

Under the regularity conditions (A-I)~(A-V) listed in Appendix 3.A (part I), the estimators $(\hat{\alpha},\hat{c})$ which jointly solve $U_L(\alpha,c) = 0$ in (3.2) and $U_c(\alpha,c) = 0$ in (3.11) are consistent and asymptotically normal. Weak convergence of the marginal estimators is also established. The results are formally stated in the following theorems.

Theorem 3.1 The random vector $(\hat{\alpha},\hat{c})$ is consistent.

Theorem 3.2 The random vector $n^{1/2}(\hat{\alpha} - \alpha_0, \hat{c} - c_0)^T$ converges in distribution to a bivariate normal distribution with mean zero and covariance matrix $A^{-1}B(A^{-1})^T$, where $A = E[\dot{U}_{\alpha_0,c_0}(X,Y)]$, $B = E[U_{\alpha_0,c_0}(X,Y)\,U_{\alpha_0,c_0}(X,Y)^T]$, and the definitions of $\dot{U}_{\alpha_0,c_0}(X,Y)$ and $U_{\alpha_0,c_0}(X,Y)$ are given in (A.4).

Theorem 3.3 The bivariate stochastic process $n^{1/2}(\hat{S}_Y(t) - S_Y(t), \hat{F}_X(t) - F_X(t))^T$ indexed by a single time $t \in [0,\infty)$ converges weakly to the mean-zero Gaussian random field $G(t) = (G_X(t), G_Y(t))^T$ in the space $\{D[0,\infty)\}^2$ with the covariance function given in equation (A.4) for $0 \le t, s < \infty$.

Note that Chaieb et al. (2006) establish similar results for their estimator, which solves $U_{\tilde{w}}(\alpha,c) = 0$ with $\tilde{w}_{\alpha,c}(x,y) = 1$, by applying properties of U-statistics. However, this approach may not be applicable when $\tilde{w}_{\alpha,c}(x,y)$ involves the plugged-in estimator $\hat{\pi}(x,y)$, as in our case. Here we take a different approach which can handle more general weight functions. Specifically, asymptotic linear representations of the proposed estimating functions are obtained. By applying the functional delta method (Van Der Vaart, 1998, Theorem 20.8) and properties of empirical processes, large-sample properties of the proposed estimators can be established. The sketch of the proof is given in Appendix 3.A (part II). Since the analytic derivations involve complicated formulas, we suggest using the jackknife method or other re-sampling tools for variance estimation. This approach is also suggested by Chaieb et al. (2006).
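A leave-one-out jackknife of the kind suggested above can be wrapped around any estimator of the truncated sample. The sketch below is generic and hedged: `jackknife_variance` is a hypothetical helper, `estimator` stands for any user-supplied function of the paired data that returns a scalar, and the standard delete-one formula $(n-1)/n \sum_i (\hat{\theta}_{(i)} - \bar{\theta})^2$ is used.

```python
def jackknife_variance(estimator, x, y):
    """Delete-one jackknife variance of estimator(x, y) for paired data."""
    x = list(x)
    y = list(y)
    n = len(x)
    # Recompute the estimator with the i-th pair removed
    loo = [estimator(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)]
    mean = sum(loo) / n
    return (n - 1) / n * sum((v - mean) ** 2 for v in loo)
```

For the sample mean of `x`, this reproduces the usual variance estimate $s^2/n$, which gives a quick sanity check before applying it to a more involved estimator.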

3.3.2 Asymptotic Behavior under Independence

Given $\phi_\alpha(t) = -\log(t)$, the condition for quasi-independence, the asymptotic expression of $U_{\alpha,c}(X_i,Y_i) = 0$ in Appendix A (part V) reduces to the iid representation obtained in both Stute (1993) and He and Yang (1998). Specifically, it follows that

\[ n^{1/2}\begin{bmatrix} \hat{S}_Y(t) - S_Y(t) \\ \hat{F}_X(t) - F_X(t) \end{bmatrix} = n^{-1/2}\sum_{i=1}^{n}\begin{bmatrix} S_Y(t)\,L_Y(X_i,Y_i;t) \\ F_X(t)\,L_X(X_i,Y_i;t) \end{bmatrix} + o_p(1), \]

where

The linear expression can be estimated by:

) ,

The above expression implies that the variance can be estimated by:

On the other hand, Wang, Jewell & Tsai (1986) suggested the Greenwood-type estimator:

Now we numerically compare the two different approaches for estimating the asymptotic variance. The variables $(X,Y)$ were generated from independent exponential distributions with hazard rates $(\lambda_1,\lambda_2)$ having the support $[0,x_U]$ and $[0,\infty)$ respectively. The point estimate for the variance estimator for $\hat{F}_X$ is compared for $n = 50$ and $n = 1000$. The two point estimates exhibit a little numerical difference in the small sample with $n = 50$. When $n = 1000$, the difference seems negligible.
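A data-generating step of this kind can be sketched with simple rejection sampling. The code below is an illustration under stated assumptions, not the simulation code used for the tables: the function name `sample_truncated` and the `seed` argument are hypothetical, $X$ is restricted to $[0, x_U]$ by discarding draws above `x_upper`, and a pair is kept only when $X \le Y$.

```python
import random


def sample_truncated(n, lam1, lam2, x_upper, seed=0):
    """Draw n truncated pairs with X ~ Exp(lam1) on [0, x_upper], Y ~ Exp(lam2)."""
    rng = random.Random(seed)
    xs, ys = [], []
    while len(xs) < n:
        x = rng.expovariate(lam1)
        if x > x_upper:  # enforce the support [0, x_upper] for X
            continue
        y = rng.expovariate(lam2)
        if x <= y:       # truncation: the pair is observed only if X <= Y
            xs.append(x)
            ys.append(y)
    return xs, ys
```

For example, `sample_truncated(50, 1.5, 0.5, 10.0)` mimics one of the settings with $(\lambda_1,\lambda_2) = (1.5, 0.5)$ and $x_U = 10$. The proportion of accepted draws gives a rough empirical check of the truncation probability.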

Table 3.1. Comparison of Two Variance Estimates based on n=50,xU =10

Table 3.2. Comparison of Two Variance Estimates based on n=1000,xU =10

The asymptotic expression via influence functions has a significant advantage when we study the joint behavior of $(\hat{F}_X,\hat{S}_Y)$. Now we fix a point $(x,y) \in R_U$. Based on the asymptotic linear expression,

\[ \sqrt{n}\begin{bmatrix} \hat{S}_Y(y) - S_Y(y) \\ \hat{F}_X(x) - F_X(x) \end{bmatrix} \xrightarrow{d} N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} V_Y & V_{XY} \\ V_{XY} & V_X \end{pmatrix}\right). \]

    (λ1, λ2)     F_X(t)          Based on influence function    Based on WJT (1986)
                                 V̂(√n F̂_X(t))                  V̂(√n F̂_X(t))
    (1.5, 0.5)   0.259 (t=0.2)   0.1957                         0.1953
                 0.451 (t=0.4)   0.2890                         0.2871
                 0.593 (t=0.6)   0.3057                         0.3119
                 0.698 (t=0.8)   0.3273                         0.3212
    (2.5, 0.5)   0.393 (t=0.2)   0.2567                         0.2549
                 0.632 (t=0.4)   0.2695                         0.2725
                 0.776 (t=0.6)   0.2144                         0.2170
                 0.864 (t=0.8)   0.1861                         0.1878

    (λ1, λ2)     F_X(t)          Based on influence function    Based on WJT (1986)
                                 V̂(√n F̂_X(t))                  V̂(√n F̂_X(t))
    (1.5, 0.5)   0.259 (t=0.2)   0.1318                         0.1279
                 0.451 (t=0.4)   0.2715                         0.2695
                 0.593 (t=0.6)   0.3562                         0.3517
                 0.698 (t=0.8)   0.2884                         0.2858
    (2.5, 0.5)   0.393 (t=0.2)   0.2291                         0.2358
                 0.632 (t=0.4)   0.2693                         0.2817
                 0.776 (t=0.6)   0.2139                         0.2183
                 0.864 (t=0.8)   0.1439                         0.1462

The terms in the covariance matrix can be estimated as follows:

\[ \hat{V}_X = \hat{F}_X(x)^2 \sum_i \hat{L}_X(X_i,Y_i;x)^2/n \ \to\ V_X = F_X(x)^2\,E[L_X(X,Y;x)^2], \]
\[ \hat{V}_Y = \hat{S}_Y(y)^2 \sum_i \hat{L}_Y(X_i,Y_i;y)^2/n \ \to\ V_Y = S_Y(y)^2\,E[L_Y(X,Y;y)^2], \]

and

\[ \hat{V}_{XY} = \hat{F}_X(x)\,\hat{S}_Y(y) \sum_i \hat{L}_X(X_i,Y_i;x)\,\hat{L}_Y(X_i,Y_i;y)/n \ \to\ V_{XY} = F_X(x)\,S_Y(y)\,E[L_X(X,Y;x)\,L_Y(X,Y;y)]. \]

Using the delta method, we obtain

\[ \sqrt{n}\,(\hat{S}_Y(y)\hat{F}_X(x) - S_Y(y)F_X(x)) \xrightarrow{d} N(0,V), \]

where the asymptotic variance is

\[ V = F_X(x)^2\,V_Y + 2\,F_X(x)\,S_Y(y)\,V_{XY} + S_Y(y)^2\,V_X. \]

Simulation studies confirm the satisfactory performance of the proposed estimators of $V_{XY}$ and $V$.
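The delta-method step behind the variance $V$ can be written out explicitly. The derivation below is a worked sketch that uses only the bivariate normal limit of $(\hat{S}_Y(y), \hat{F}_X(x))$ with covariance entries $V_Y$, $V_{XY}$ and $V_X$; the function $g$ is notation introduced here for illustration.

```latex
\[
g(s,f) = s f, \qquad
\nabla g\big(S_Y(y), F_X(x)\big) = \begin{pmatrix} F_X(x) \\ S_Y(y) \end{pmatrix},
\]
\[
V = \nabla g^{T}
\begin{pmatrix} V_Y & V_{XY} \\ V_{XY} & V_X \end{pmatrix}
\nabla g
= F_X(x)^2 V_Y + 2 F_X(x) S_Y(y) V_{XY} + S_Y(y)^2 V_X .
\]
```

The cross term carries the factor 2 because the off-diagonal entry $V_{XY}$ appears twice in the quadratic form.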

Table 3.3: Performance of the estimators for the covariance matrix based on 5000 runs (n=100, xU =20, yL =0.0001)

    (λ1, λ2)     x (F_X(x))     y     nCov(F̂_X, Ŝ_Y)   E(V̂_XY)   nVar(F̂_X Ŝ_Y)   E(V̂)
    (1.5, 0.5)   0.2 (0.259)    0.2   0.0534            0.0425     0.2495           0.2014
                                0.4   0.0888            0.0694     0.2297           0.1901
                                0.6   0.0828            0.0799     0.2009           0.1697
                                0.8   0.0880            0.0820     0.1708           0.1468
                 0.4 (0.451)    0.4   0.0617            0.0651     0.4154           0.3481
                                0.6   0.0879            0.0870     0.4021           0.3226
                                0.8   0.1198            0.0957     0.3802           0.2893
    (2.5, 0.5)   0.2 (0.393)    0.2   0.0374            0.0403     0.3226           0.2836
                                0.4   0.0810            0.0651     0.3269           0.2698
                                0.6   0.0717            0.0708     0.2844           0.2409
                                0.8   0.0806            0.0715     0.2412           0.2109
                 0.4 (0.632)    0.4   0.0580            0.0508     0.5355           0.4029
                                0.6   0.0745            0.0654     0.4845           0.3774
                                0.8   0.0802            0.0697     0.4395           0.3451

Table 3.4: Performance of the estimators for the covariance matrix based on 5000 runs (n=250, xU =20, yL =0.0001)

    (λ1, λ2)     x (F_X(x))     y     nCov(F̂_X, Ŝ_Y)   E(V̂_XY)   nVar(F̂_X Ŝ_Y)   E(V̂)
    (1.5, 0.5)   0.2 (0.259)    0.2   0.0625            0.0475     0.2594           0.2205
                                0.4   0.0661            0.0757     0.2393           0.2049
                                0.6   0.1007            0.0850     0.2080           0.1805
                                0.8   0.0853            0.0874     0.1848           0.1563
                 0.4 (0.451)    0.4   0.0687            0.0703     0.4625           0.3803
                                0.6   0.0979            0.0927     0.4427           0.3565
                                0.8   0.1083            0.1006     0.3860           0.3171
    (2.5, 0.5)   0.2 (0.393)    0.2   0.0493            0.0445     0.3473           0.3080
                                0.4   0.0777            0.0694     0.3288           0.2899
                                0.6   0.0788            0.0750     0.2822           0.2580
                                0.8   0.0730            0.0747     0.2560           0.2256
                 0.4 (0.632)    0.4   0.0618            0.0531     0.5346           0.4494
                                0.6   0.0603            0.0688     0.4750           0.4204
                                0.8   0.0720            0.0719     0.4604           0.3762

Theorem 3.3 describes the weak convergence result of Lynden-Bell’s estimator as a special case. By applying the independence copula, φα(t)=−log(t), to the theorem, we obtain the following corollary:

Corollary 3.2 ( Wang, Jewell & Tsai, 1986)

The above result was first obtained by Wang et al. (1986). They also proved the same weak convergence result by applying the classical empirical distribution theory of Breslow and Crowley (1974). Based on the functional delta method, we provide a different proof given below.

Proof of Corollary 3.2: It follows that

) .

)

3.4. Extension and Modification

3.4.1. Extension under Right Censoring

In addition to the truncation scheme discussed previously, we now allow Y to be censored by another random variable C. Assume that C is independent of (X,Y). Let and ϕ but change their definitions as follows. Let

= = =

In presence of left truncation and right censoring, the proposed estimating function is the estimator proposed by Chaieb et al. (2006) constructed based on concordance indicators.

Let $t_1 < \cdots < t_{2n}$ be the ordered observed points of $(X_1,\dots,X_n,Z_1,\dots,Z_n)$. Letting $x = y = t_i$, we aim to solve the equations:

)}

SC . The following procedure can be performed successively for j=1,2,...,2n−1.

and Explicit formula of the proposed estimators are given by

= ⎪⎭

The estimating function in (Step 2) is equivalent to

⎟⎠ Appendix 3.C, we derive the proposed estimating functions explicitly for selected examples.

3.4.2. Modification for Small Risk Sets

The proposed estimation procedure, as well as that proposed by Chaieb et al. (2006), is based on the implicit assumption that $R(t_j,t_j+) \ge 1$ for all $t_j$. However, an empty risk set may sometimes occur, especially in the tail area. Several remedies have been proposed to handle this problem (Klein & Moeschberger, 2003, p. 122). Here we adopt the idea of Lai and Ying (1991) and propose the following modification:

=

⎥ ≥

⎢⎢

⎪⎭

⎪⎬

⎪⎩

⎪⎨

⎧ −

⎪⎭−

⎪⎬

⎪⎩

⎪⎨

− ⎧

=

1 ,

;

} )

~( { ) ˆ (

1 )

~(

* )

ˆ ( )

~(

* )}

ˆ ( {

j j t z j

a j j

c j

j c

j

Y I R z bn

z S n

z c R z

S n

z c R t

S

δ α α

α φ φ

φ , (3.19)

where $0 < a < 1$ and $b > 0$ are arbitrary tuning parameters. Modifications for $\phi_\alpha\{\hat{F}_X(t)\}$ and $\hat{S}_C(t)$ are obtained in a similar way. Based on our simulation results not reported here, we recommend taking $b = 1$ and $a = 1/10$, for which the estimators are less biased.

3.5 Numerical Analysis

3.5.1 Simulation Studies

The main purposes of the simulation studies are (i) to check the validity of the proposed estimators and (ii) to compare the performance of our method with its competitor proposed by Chaieb et al. (2006). Random replications of $(X,Y)$ were generated from the Clayton and Frank models subject to $X \le Y$ with the marginal distributions following exponential distributions. For the Clayton model, the values of $-\log(\alpha)$ were chosen to be 0.511 and 1.099 and, for the Frank model, the values of $\log(\alpha)$ were set to be 2.380 and 5.746. The former value in each pair corresponds to $\tau = 0.25$ and the latter corresponds to $\tau = 0.5$. The censoring variable $C$ was also exponentially distributed. Denote $c = \Pr(X \le Y)$ and $c^* = \Pr(X \le Y \wedge C)$. For each setting, we report the bias and the MSE based on 500 replications.

Two estimators of the association parameter $\alpha$ were compared under the Clayton model and the Frank model respectively. The proposed method solves $U_L(\alpha,c) = 0$ and the competing estimator proposed by Chaieb et al. (2006) solves $U_{\tilde{w}}(\alpha,c) = 0$ with $\tilde{w}_{\alpha,c}(x,y) = 1$. Explicit formulas for the Clayton and Frank models are available in Appendix 3.C. Tables 3.5.A and 3.5.B summarize the results. We see that both methods are approximately unbiased, and the MSE decreases as the sample size increases. Comparing the two estimators, the MSE of the proposed estimator is uniformly smaller, and the efficiency gain is remarkable in the Clayton model but modest in the Frank model. Notice that the two approaches produce similar results under the Frank model in the absence of external censoring $(c = c^*)$. Via the relationship in equation (3.6), we find that for the uncensored case of the Frank model,

\[ \frac{\dot{\theta}_\alpha\{c\,\hat{\pi}(x,y)\}}{\theta_\alpha\{c\,\hat{\pi}(x,y)\}} \approx \text{const.} \times \frac{R(x,y) - 1 + \theta_\alpha\{c\,\hat{\pi}(x,y)\}}{1 + \theta_\alpha\{c\,\hat{\pi}(x,y)\}}, \]

where $\dot{\theta}_\alpha(v) = \partial\theta_\alpha(v)/\partial\alpha$, which explains why the numerical results are close. When the degree of external censoring increases, the advantage of the proposed estimator becomes more obvious.

The proposed recursive algorithm was evaluated jointly with $U_L(\alpha,c) = 0$ to obtain the estimators of the marginal functions and $c$. The performances of $(\hat{F}_X(t),\hat{S}_Y(t))$ were evaluated at points $t$ with $F_X(t) = 0.2, 0.4, 0.6, 0.8$ and $S_Y(t) = 0.2, 0.4, 0.6, 0.8$. Table 3.6.A and Table 3.6.B report the results for the Clayton model and the Frank model respectively. Denote $P_{CEN} = \Pr(C < Y \mid X \le Z)$, which measures the censoring proportion in the truncated sample. We see that when this value decreases, the performance improves. In all the cases, $(\hat{c}^*, \hat{F}_X(\cdot), \hat{S}_Y(\cdot))$ are fairly unbiased. It is worth noting that the estimated probabilities may perform better in the tail area but worse at middle time points, which differs from the behavior of the Kaplan-Meier estimator when truncation is not considered.

Table 3.5.A: Comparison of two Estimators for the Association Parameter under the Clayton Model

    −log(α) (τ)     (c, c*)        n=250 Proposed   n=250 Chaieb   n=500 Proposed   n=500 Chaieb
    0.5108 (0.25)   (0.80, 0.80)   0.3 (0.17)       5.4 (0.53)     2.5 (0.08)       3.7 (0.26)
                    (0.80, 0.63)   -0.9 (0.44)      1.5 (1.13)     -0.3 (0.18)      2.9 (0.53)
                    (0.66, 0.53)   0.7 (0.29)       0.5 (0.74)     -1.2 (0.12)      -1.5 (0.38)
                    (0.66, 0.45)   1.6 (0.44)       6.1 (1.04)     -0.6 (0.19)      -1.3 (0.49)
                    (0.55, 0.39)   0.5 (0.35)       5.2 (0.80)     0.3 (0.14)       1.2 (0.40)
                    (0.55, 0.34)   3.6 (0.37)       7.8 (0.98)     -2.2 (0.19)      -0.3 (0.51)
    1.0986 (0.5)    (0.86, 0.86)   -6.7 (0.28)      -2.1 (0.86)    0.6 (0.13)       0.6 (0.38)
                    (0.86, 0.66)   2.5 (0.56)       6.2 (1.44)     -0.7 (0.24)      0.2 (0.74)
                    (0.74, 0.58)   -0.2 (0.20)      4.0 (0.54)     -4.6 (0.18)      -2.9 (0.47)
                    (0.74, 0.48)   -5.4 (0.52)      -5.5 (1.27)    -0.1 (0.23)      0.2 (0.62)
                    (0.63, 0.42)   -3.5 (0.44)      -3.2 (0.95)    -1.5 (0.18)      2.7 (0.49)
                    (0.63, 0.36)   3.0 (0.44)       7.1 (1.09)     -0.3 (0.20)      -0.5 (0.52)

Each cell contains the bias (×10³) and MSE (×10²) (in parentheses) of the corresponding estimator based on 500 replications.

Table 3.5.B: Comparison of two Estimators for the Association Parameter under the Frank Model

    −log(α) (τ)     (c, c*)        n=250 Proposed    n=250 Chaieb      n=500 Proposed    n=500 Chaieb
    2.380 (0.25)    (0.81, 0.81)   -68.6 (37.55)     -68.1 (37.62)     -26.1 (20.53)     -25.8 (20.53)
                    (0.81, 0.63)   -53.5 (52.87)     -36.0 (55.31)     -19.4 (26.87)     -24.5 (28.49)
                    (0.63, 0.51)   -162.9 (95.06)    -156.4 (102.07)   -35.9 (44.27)     -34.6 (48.62)
                    (0.63, 0.43)   -102.2 (100.42)   -106.3 (116.84)   -99.2 (51.85)     -88.4 (59.76)
                    (0.50, 0.36)   -294.2 (201.13)   -342.5 (239.21)   -140.5 (94.01)    -141.8 (96.03)
                    (0.50, 0.31)   -371.9 (216.14)   -360.6 (241.92)   -243.3 (131.45)   -257.1 (151.72)
    5.746 (0.5)     (0.88, 0.88)   -128.3 (41.53)    -128.2 (41.56)    -27.8 (21.91)     -27.5 (22.01)
                    (0.88, 0.66)   -57.0 (64.71)     -21.4 (72.73)     -78.0 (33.97)     -78.0 (36.76)
                    (0.69, 0.53)   -136.9 (100.55)   -142.4 (104.41)   -114.0 (49.40)    -100.0 (51.47)
                    (0.69, 0.44)   -182.2 (129.78)   -155.9 (147.13)   -0.1259 (68.20)   -96.1 (73.32)
                    (0.50, 0.34)   -367.3 (223.11)   -368.2 (247.30)   -246.1 (115.74)   -252.5 (129.13)
                    (0.50, 0.29)   -429.6 (293.47)   -411.3 (332.07)   -373.9 (130.48)   -349.2 (146.83)

Each cell contains the bias (×10³) and MSE (×10²) (in parentheses) of the corresponding estimator based on 500 replications.

Table 3.6.A: The proposed estimators of marginal functions and
