
Chapter 4. Inference based on Accelerated Failure Time Model

4.2 Inference based on right censored data

4.3.1 Modified M-estimator


resulting estimator. The choice that leads to the maximum likelihood estimator is $g = f \circ F^{-1}$.
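Here is a quick way to see why $g = f \circ F^{-1}$ reproduces the likelihood. Take one subject with covariate $z$ whose log failure time is known only to lie in $(\log L, \log R]$. The derivative of $\log[F(\log R - \beta z) - F(\log L - \beta z)]$ in $\beta$ is $-z[f(\varepsilon_R) - f(\varepsilon_L)]/[F(\varepsilon_R) - F(\varepsilon_L)]$. Writing $f = g \circ F$ turns the numerator into $g\{F(\varepsilon_R)\} - g\{F(\varepsilon_L)\}$. The Python sketch below checks this numerically. It assumes a scalar covariate and a standard logistic error, for which $f = F(1 - F)$ and hence $g(u) = u(1 - u)$. The function names are illustrative and are not taken from the thesis.

```python
import math


def logistic_cdf(x):
    # Numerically stable standard logistic CDF; also handles x = +/-inf.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def g_logistic(u):
    # g = f o F^{-1}: for the logistic, f(x) = F(x)(1 - F(x)), so g(u) = u(1 - u).
    return u * (1.0 - u)


def score_contribution(beta, z, log_l, log_r):
    """d/dbeta of log[F(log_r - beta*z) - F(log_l - beta*z)], written via g."""
    u_l = logistic_cdf(log_l - beta * z)
    u_r = logistic_cdf(log_r - beta * z)
    return -z * (g_logistic(u_r) - g_logistic(u_l)) / (u_r - u_l)


def loglik(beta, z, log_l, log_r):
    # Log-likelihood contribution of one interval-censored observation.
    return math.log(logistic_cdf(log_r - beta * z) - logistic_cdf(log_l - beta * z))


# Compare the g-based score with a central finite difference of the log-likelihood.
h = 1e-6
numeric = (loglik(0.3 + h, 0.5, 1.0, 2.0) - loglik(0.3 - h, 0.5, 1.0, 2.0)) / (2 * h)
print(score_contribution(0.3, 0.5, 1.0, 2.0), numeric)
```

Setting `log_r = math.inf` handles a right-censored subject. For this logistic error the score then reduces to $z\,F(\log L - \beta z)$.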

$k = 1, \dots, m$, as the mass intervals. The next step is to express the likelihood based on $\{(\varepsilon_{iL}(\beta), \varepsilon_{iR}(\beta)],\ i = 1, \dots, n\}$ in terms of $\{(t_{kL}(\beta), t_{kU}(\beta)],\ k = 1, \dots, m\}$. Define


Given an estimate of $\beta$, the above maximization procedure can be performed. The resulting estimate of $F$, denoted $\hat F$, is then plugged into the following estimating function to solve for the next estimate of $\beta$:

$$S(\beta, \hat F) = 0.$$

4.3.2 Method for a simplified AFT model with univariate covariate

Besides the methods discussed earlier, Li and Pu (2003) developed an interesting way to estimate $\beta$. Consider a simplified AFT model:

$$Y_i = \beta Z_i + \varepsilon_i, \qquad i = 1, \dots, n,$$

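where $Z_i$ is a one-dimensional covariate. The main idea of this paper rests on the assumption that $\varepsilon_i$ and $Z_i$ are independent, so that their Kendall's $\tau$ is zero. Kendall's $\tau$ provides a rank-invariant measure for assessing the association between two variables. Suppose that $(\varepsilon_i, Z_i)$ and $(\varepsilon_j, Z_j)$ are independent realizations of $(\varepsilon, Z)$. The pair is concordant if $I\{(\varepsilon_i - \varepsilon_j)(Z_i - Z_j) > 0\} = 1$ and discordant if $I\{(\varepsilon_i - \varepsilon_j)(Z_i - Z_j) < 0\} = 1$. The population version of Kendall's $\tau$ is defined as

$$\tau = \Pr\{(\varepsilon_i - \varepsilon_j)(Z_i - Z_j) > 0\} - \Pr\{(\varepsilon_i - \varepsilon_j)(Z_i - Z_j) < 0\}.$$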

If complete data are available, one can solve

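$$\sum_{1 \le i < j \le n}\Big[I\{(\varepsilon_i(\beta) - \varepsilon_j(\beta))(Z_i - Z_j) > 0\} - I\{(\varepsilon_i(\beta) - \varepsilon_j(\beta))(Z_i - Z_j) < 0\}\Big] = 0,$$

where $\varepsilon_i(\beta) = Y_i - \beta Z_i$.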
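For complete data the equation above has a closed-form root. Write $s_{ij} = (Y_i - Y_j)/(Z_i - Z_j)$. Each pair with $Z_i \ne Z_j$ satisfies $(\varepsilon_i(\beta) - \varepsilon_j(\beta))(Z_i - Z_j) = (Z_i - Z_j)^2(s_{ij} - \beta)$, so it contributes $\operatorname{sgn}(s_{ij} - \beta)$. The left-hand side is therefore zero at the median of the pairwise slopes, which is the Theil-Sen estimator. The Python sketch below uses made-up data and assumes $\varepsilon_i(\beta) = Y_i - \beta Z_i$.

```python
from itertools import combinations
from statistics import median


def kendall_U(beta, y, z):
    """Concordant minus discordant pairs of (residual, covariate), residual = y - beta*z."""
    total = 0
    for i, j in combinations(range(len(y)), 2):
        d = ((y[i] - beta * z[i]) - (y[j] - beta * z[j])) * (z[i] - z[j])
        total += (d > 0) - (d < 0)   # +1 concordant, -1 discordant, 0 tie
    return total


def theil_sen(y, z):
    """Median of the pairwise slopes over pairs with distinct covariate values."""
    slopes = [(y[i] - y[j]) / (z[i] - z[j])
              for i, j in combinations(range(len(y)), 2) if z[i] != z[j]]
    return median(slopes)


z = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
beta_ts = theil_sen(y, z)
print(beta_ts, kendall_U(beta_ts, y, z))
```

When the number of pairwise slopes is even, any $\beta$ strictly between the two middle slopes solves the equation. The median reports their midpoint.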


Hence the total number of known concordant pairs becomes

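$$\sum_{i<j}\Big[I(Z_i < Z_j)\,I\{\varepsilon_{iR}(\beta) \le \varepsilon_{jL}(\beta)\} + I(Z_i > Z_j)\,I\{\varepsilon_{iL}(\beta) \ge \varepsilon_{jR}(\beta)\}\Big]$$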

and the total number of known discordant pairs is

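$$\sum_{i<j}\Big[I(Z_i < Z_j)\,I\{\varepsilon_{iL}(\beta) \ge \varepsilon_{jR}(\beta)\} + I(Z_i > Z_j)\,I\{\varepsilon_{iR}(\beta) \le \varepsilon_{jL}(\beta)\}\Big]$$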

The modified Kendall's $\tau$ coefficient can be written as

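$$\tilde\tau_n(\beta) = \binom{n}{2}^{-1}\sum_{i<j}\big[I(Z_i > Z_j) - I(Z_i < Z_j)\big]\big[I\{\varepsilon_{iL}(\beta) \ge \varepsilon_{jR}(\beta)\} - I\{\varepsilon_{iR}(\beta) \le \varepsilon_{jL}(\beta)\}\big]$$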

The resulting estimating function is given by

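$$U(\beta) = \sum_{i<j}\big[I(Z_i > Z_j) - I(Z_i < Z_j)\big]\big[I\{\varepsilon_{jR}(\beta) \le \varepsilon_{iL}(\beta)\} - I\{\varepsilon_{jL}(\beta) \ge \varepsilon_{iR}(\beta)\}\big],$$

and $\hat\beta$ is obtained by solving $U(\beta) = 0$.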
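The estimating function $U(\beta)$ counts only pairs whose residual order is known at the current $\beta$. It is a nonincreasing step function of $\beta$, so a root can be located by bisection. The Python sketch below makes the following assumptions:

- log-scale bounds with $\varepsilon_{iL}(\beta) = \log L_i - \beta Z_i$ and $\varepsilon_{iR}(\beta) = \log R_i - \beta Z_i$;
- $\pm\infty$ to represent left or right censoring.

It returns the point where $U$ first drops to zero or below. The data and the search bracket are made up for illustration.

```python
from itertools import combinations


def U(beta, log_l, log_r, z):
    """Modified Kendall estimating function for interval-censored log times.

    Residual bounds are eps_L = log_l - beta*z and eps_R = log_r - beta*z;
    pass -math.inf / math.inf in log_l / log_r for left / right censoring.
    """
    total = 0
    for i, j in combinations(range(len(z)), 2):
        eL_i, eR_i = log_l[i] - beta * z[i], log_r[i] - beta * z[i]
        eL_j, eR_j = log_l[j] - beta * z[j], log_r[j] - beta * z[j]
        i_above = eR_j <= eL_i          # eps_i > eps_j is known for sure
        j_above = eR_i <= eL_j          # eps_j > eps_i is known for sure
        sgn_z = (z[i] > z[j]) - (z[i] < z[j])
        total += sgn_z * (int(i_above) - int(j_above))
    return total


def solve_beta(log_l, log_r, z, lo=-10.0, hi=10.0, tol=1e-8):
    """Bisection on the nonincreasing step function U."""
    if U(lo, log_l, log_r, z) <= 0 or U(hi, log_l, log_r, z) > 0:
        raise ValueError("U must be positive at lo and non-positive at hi")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if U(mid, log_l, log_r, z) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Made-up data: log times 2*z + e, each observed only within +/- 0.05.
z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
e = [0.1, -0.2, 0.05, 0.3, -0.1, 0.0]
log_t = [2.0 * zi + ei for zi, ei in zip(z, e)]
log_l = [v - 0.05 for v in log_t]
log_r = [v + 0.05 for v in log_t]
beta_hat = solve_beta(log_l, log_r, z)
print(beta_hat)
```

Pairs whose residual intervals overlap at the current $\beta$ contribute nothing. This is why the method loses efficiency when few pairs are orderable.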

This method has two major drawbacks. One is the restrictive assumption that $Z$ is univariate. The other is the loss of efficiency when the data contain very few pairs of intervals whose order is known.


Chapter 5. Inference based on Proportional Odds Model

5.1 Model and the likelihood

Besides the PH and AFT models, the proportional odds (PO) model is also a popular choice. The proportional odds model is defined as

$$\operatorname{logit}\{F_Z(t)\} = \log\{B(t)\} + \beta^T Z.$$

The log-likelihood function is

$$l(\beta, B) = \sum_{i=1}^{n}\log\{F_{Z_i}(R_i) - F_{Z_i}(L_i)\}.$$

The presence of $B(t)$ makes it difficult to obtain the M.L.E. directly by maximizing the log-likelihood function. We present two methods, both of which replace $B(t)$ by an approximating function that is easier to handle.
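To make the role of $B(t)$ concrete, the Python sketch below evaluates the interval-censored log-likelihood $\sum_i \log\{F_{Z_i}(R_i) - F_{Z_i}(L_i)\}$ under the PO model, using $F_Z(t) = B(t)e^{\beta^T Z}/\{1 + B(t)e^{\beta^T Z}\}$. The sketch makes these assumptions:

- each observation is a triple $(L_i, R_i, Z_i)$;
- $L_i = 0$ encodes left censoring and $R_i = \infty$ encodes right censoring;
- $B$ is passed in as a known function with $B(0) = 0$, here the hypothetical choice $B(t) = t$.

A known $B$ is exactly what is unavailable in practice.

```python
import math


def po_cdf(t, z, beta, B):
    """F_Z(t) = B(t) e^{beta'z} / (1 + B(t) e^{beta'z}); F_Z(inf) = 1."""
    if t == math.inf:
        return 1.0
    odds = B(t) * math.exp(sum(b * x for b, x in zip(beta, z)))
    return odds / (1.0 + odds)


def po_loglik(beta, B, data):
    """Interval-censored log-likelihood: sum_i log{F_{Z_i}(R_i) - F_{Z_i}(L_i)}."""
    total = 0.0
    for L, R, z in data:
        total += math.log(po_cdf(R, z, beta, B) - po_cdf(L, z, beta, B))
    return total


def baseline(t):
    # Hypothetical baseline odds B(t) = t, i.e. a log-logistic baseline.
    return t


data = [(0.0, 1.0, [0.0]), (1.0, math.inf, [0.0])]   # one left-, one right-censored
print(po_loglik([0.0], baseline, data))
```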

5.2 Smoothing method for approximating the baseline function

Shen (1998) proposed a sieve method to approximate $B(t)$ in the likelihood function. The method can be applied not only to right censored data but also to interval censored data. We briefly describe the approach. The basic idea is to approximate the baseline function by splines with variable orders and knots. An example of such a spline $s(t)$ is depicted in Figure 5.1. Its two knots divide the time axis into three intervals, and on each interval $s(t)$ is a polynomial whose order may differ from one interval to the next. This freedom in both the orders and the knot locations allows $s(t)$ to approximate a wide range of smooth functions.
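The spline of Figure 5.1 can be written out directly. The Python sketch below uses the three polynomials read from that caption, $-t^3 + 6t^2 + t - 24$, $t + 8$ and $t^2 - 15t + 72$, with knots at 4 and 8. Assigning each knot to the piece on its right is an arbitrary choice of the sketch. It is harmless because the adjacent pieces agree at the knots ($s(4) = 12$, $s(8) = 16$).

```python
def s(t):
    """Piecewise polynomial from Figure 5.1 with knots t_(1) = 4 and t_(2) = 8."""
    if t < 4:
        return -t**3 + 6 * t**2 + t - 24   # cubic piece on (-inf, 4)
    if t < 8:
        return t + 8                       # linear piece on [4, 8)
    return t**2 - 15 * t + 72              # quadratic piece on [8, inf)


# The pieces join continuously: s(4) = 12 and s(8) = 16 from either side.
for knot in (4.0, 8.0):
    print(knot, s(knot - 1e-9), s(knot))
```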

The approximated function of $B(t)$ is defined as


unknown, we cannot obtain it directly. For each fixed $h$, let the sieve maximum likelihood estimate be computed without the $i$th observation, and let $\hat P_{-i}(\cdot)$ be the corresponding estimated distribution. Shen (1998) suggests using the statistic

$$R(h) = -\frac{1}{n}\sum_{i=1}^{n}\log \hat P_{-i}(O_i)$$

to estimate the expected loss of the fitted model, where $O_i$ is the $i$th observation. This is the selector value of $h$, and we choose the optimal $h$ as the one that minimizes $R(h)$.

Figure 5.1. An example of $s(t)$. Here $t_{(1)} = 4$ and $t_{(2)} = 8$. The polynomials from left to right are $-x^3 + 6x^2 + x - 24$, $x + 8$ and $x^2 - 15x + 72$, respectively.

We therefore obtain the universal sieve maximum likelihood estimator by estimating the model parameters and $h$ iteratively. The details of the algorithm are as follows.

Step 1: Initial spline

For each fixed order $m \le N_{\max}$, estimate the model parameters by maximum likelihood using a single polynomial. Then choose the optimal order $m$ that minimizes $R(h)$.

Step 2: Adding knots

Consider a candidate knot point $t_{(i)}$ within an interval spanned by existing knots. For each fixed order $M = (m_0, m_1)$, estimate the parameters as in Step 1, and find the $M$ that minimizes $R(h)$. This minimized value is the selector value of the candidate. The optimal $t_{(i)}$ is then found by Fibonacci search, minimizing the selector.

Step 3: Comparison

Compare the original sieve maximum likelihood estimate, based on the spline without $t_{(i)}$, with the new one that includes $t_{(i)}$. If the new estimate has a smaller selector value, split the interval into two and proceed as in Step 2. Otherwise, go to Step 4.

Step 4: Repeat Steps 2-3 for all intervals spanned by existing knots until no new knot can be added.
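Step 2 locates each new knot by Fibonacci search. This method minimizes a unimodal function over an interval and reuses one interior evaluation at every step. The Python sketch below is a generic Fibonacci search. A toy quadratic stands in for the selector as a function of the candidate knot location; in practice each evaluation would require refitting the sieve model. The function name and the evaluation budget `n` are illustrative assumptions. The search is only guaranteed to bracket the minimizer when the selector is unimodal on the interval.

```python
def fibonacci_search(f, a, b, n=30):
    """Minimize a unimodal function f on [a, b] using n evaluations (n >= 3)."""
    fib = [1, 1]
    while len(fib) <= n:
        fib.append(fib[-1] + fib[-2])
    x1 = a + fib[n - 2] / fib[n] * (b - a)
    x2 = a + fib[n - 1] / fib[n] * (b - a)
    f1, f2 = f(x1), f(x2)
    for k in range(n - 1, 1, -1):
        if f1 <= f2:
            # Minimizer lies in [a, x2]; the old x1 becomes the new right point.
            b, x2, f2 = x2, x1, f1
            x1 = a + fib[k - 2] / fib[k] * (b - a)
            f1 = f(x1)
        else:
            # Minimizer lies in [x1, b]; the old x2 becomes the new left point.
            a, x1, f1 = x1, x2, f2
            x2 = a + fib[k - 1] / fib[k] * (b - a)
            f2 = f(x2)
    return 0.5 * (a + b)


def toy_selector(t):
    # Hypothetical stand-in for R(h) as a function of the candidate knot t.
    return (t - 6.2) ** 2 + 1.0


print(fibonacci_search(toy_selector, 4.0, 8.0))
```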

5.3 Sieve method by Huang and Rossini (1997)

The proportional odds model can also be expressed as

$$\operatorname{logit} F_Z(t) = \operatorname{logit} F_0(t) + \beta^T Z, \qquad (5.5)$$

where $F_0(t) = F(t \mid 0)$ is the baseline distribution function. Letting $\alpha_0(t) = \operatorname{logit} F_0(t)$, the distribution function can be written as

$$F_Z(t) = \frac{\exp\{\alpha_0(t) + \beta^T Z\}}{1 + \exp\{\alpha_0(t) + \beta^T Z\}}.$$

Because $\alpha_0(t)$ is difficult to estimate directly, Huang and Rossini (1997) proposed to approximate it by a function with nice analytic properties. The idea of this approximation is similar to that of the previous method.


If the real function0( )t is known, we might choose some knots 0t(1)< <t( )k   and let

0 1 ( 1 ) 1 ( )( 1 ) ( )

1 ( ) ( 1 ) ( ) ( 1 )

ˆ ( ) [ ] ( )

k

j j j j j j

j j

j j j j j

b b b t b t

t t I t t t

t t t t

 

   

 

( 5 . 6 )

where bj 0(t( )j ) and b1 bk. Here we choose k and t which satisfy ( )i 1. k be an integer that grows at rate O n( a) 0 a 1

2. max1 j k(t( )jt(j1))Cna for some constant C

Figure 5.2. The curve of $\alpha_0(t)$ (dashed line) and its approximating function (solid line). Here $\alpha_0(t) = \log(3t)$ and we take $t_{(j)} = 1, 3, 5, 7, 9$.

There is some difficulty in implementing this idea. Since the true function is unknown, $b_j = \alpha_0(t_{(j)})$ is also unknown. If the $b_j$ are treated as unknown parameters, the restriction $b_1 \le \cdots \le b_k$ has to be imposed in the maximization. The estimator of Shen (1998) is easier to implement, since its unknown parameters are not subject to such restrictions.


Chapter 6. Conclusion

Most textbooks on survival analysis focus on right censored data. However, medical data in practice are often interval censored. In this thesis, we review important inference methods that can be applied to interval censored data. We emphasize how the fundamental ideas of inference are extended to this more complicated data structure.

From the discussion, we see that many elegant techniques developed for right censored data no longer apply. Instead, numerical algorithms become very important in the analysis of interval censored data. Because the main purpose of this thesis is to provide a general review of many different methods, we do not investigate specific methods or algorithms in depth. These could be interesting topics for future study.


References

[1] Breslow, N. E. (1975). Analysis of survival data under the proportional hazards model. Internat. Statist. Rev., 43, 45-58.

[2] Finkelstein, D. M. (1986). A proportional hazards model for interval-censored failure time data. Biometrics, 42, 845-854.

[3] Huang, J. and Rossini, A. J. (1997). Sieve estimation for the proportional odds failure-time regression model with interval censoring. J. Amer. Statist. Ass., 92, 960-967.

[4] Kaplan, E. L. and Meier, P. (1958). Nonparametric estimation from incomplete observations. J. Amer. Statist. Ass., 53, 457-481.

[5] Kalbfleisch, J. D. and Prentice, R. L. (2002). The statistical analysis of failure time data, 2nd ed. New York: Wiley.

[6] Li, L. and Pu, Z. (2003). Rank estimation of log-linear regression with interval censored data. Lifetime Data Analysis, 9, 57-70.

[7] Prentice, R. L. (1978). Linear rank tests with right censored data. Biometrika, 65, 167-179.

[8] Rabinowitz, D., Tsiatis, A. and Aragon, J. (1995). Regression with interval censored data. Biometrika, 82, 501-513.

[9] Ritov, Y. (1990). Estimation in a linear regression model with censored data. Ann. Statist., 18, 303-328.

[10] Shen, X. (1998). Proportional odds regression and sieve maximum likelihood estimation. Biometrika, 85, 165-177.

[11] Sun, J. (2006). The statistical analysis of interval-censored failure time data. New York: Springer.

[12] Turnbull, B. W. (1974). Nonparametric estimation of a survivorship function with doubly censored data. J. Amer. Statist. Ass., 69, 169-173.

[13] Turnbull, B. W. (1976). The empirical distribution function with arbitrarily grouped, censored and truncated data. J. Roy. Statist. Soc. Ser. B, 38, 290-295.


Appendix

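Proof of $E[S(\beta, F_{0\varepsilon})] = 0$. Let $X_i = (X_{i0} < X_{i1} < \cdots < X_{i,n_i+1})$ be the $i$th patient's ordered sequence of examination times, where $n_i$ denotes the number of examinations. For convenience, define $X_{i0} = 0$ and $X_{i,n_i+1} = \infty$. Let $L_i$ be the last of the $i$th subject's examination times preceding $T_i$, and let $R_i$ be the first examination time following $T_i$.

For a $p$-dimensional vector $b$, define the bracketing examination times on the time scale of the residual by

$$\varepsilon_{iL}(b) = \log L_i - b^T Z_i, \qquad \varepsilon_{iR}(b) = \log R_i - b^T Z_i,$$

and, in the same way, $\varepsilon_{ik}(b) = \log X_{ik} - b^T Z_i$ for $k = 0, \dots, n_i + 1$, with $\log 0 = -\infty$ and $\log \infty = \infty$.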


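The terms of the sum telescope:

$$\sum_{k=0}^{n_i}\Big(g[F\{\varepsilon_{i,k+1}(b)\}] - g[F\{\varepsilon_{ik}(b)\}]\Big) = g[F\{\varepsilon_{i,n_i+1}(b)\}] - g[F\{\varepsilon_{i0}(b)\}] = g[F\{\infty\}] - g[F\{-\infty\}] = g[1] - g[0] = 0,$$

where the last equality requires $g(0) = g(1)$. This holds for $g = f \circ F^{-1}$ whenever $f(x) \to 0$ as $x \to \pm\infty$, since then $g(0) = g(1) = 0$.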

The proof is complete.
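The telescoping step can be checked numerically. The Python sketch below fixes one subject's examination times and adds $X_{i0} = 0$ and $X_{i,n_i+1} = \infty$. It then averages a per-subject score over the intervals $(X_{ik}, X_{i,k+1}]$, weighted by their probabilities under the true $\beta$. The sketch rests on three assumptions:

- a scalar covariate;
- a standard logistic error, so $g(u) = u(1 - u)$;
- a score of the form $-z[g\{F(\varepsilon_R)\} - g\{F(\varepsilon_L)\}]/[F(\varepsilon_R) - F(\varepsilon_L)]$, which is the likelihood score under $g = f \circ F^{-1}$.

The exact form of $S$ is not restated in this appendix, so the third assumption is illustrative only.

```python
import math


def logistic_cdf(x):
    # Numerically stable standard logistic CDF; also handles x = +/-inf.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def g(u):
    # g = f o F^{-1} for the standard logistic error.
    return u * (1.0 - u)


def expected_score(exam_times, z, beta):
    """E[score | exam times] for one subject, evaluated at the true beta.

    exam_times are the observed X_i1 < ... < X_in_i; the conventions
    X_i0 = 0 and X_i,n_i+1 = inf become residual bounds -inf and +inf.
    """
    eps = [-math.inf] + [math.log(x) - beta * z for x in exam_times] + [math.inf]
    total = 0.0
    for lo, hi in zip(eps[:-1], eps[1:]):
        u_lo, u_hi = logistic_cdf(lo), logistic_cdf(hi)
        p = u_hi - u_lo                          # P(residual falls in (lo, hi])
        score = -z * (g(u_hi) - g(u_lo)) / p     # score if T lands in this interval
        total += p * score
    return total


print(expected_score([0.5, 1.5, 3.0, 6.0], z=1.2, beta=0.4))
```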
