
Chapter 2. Inference based on Nonparametric Analysis

2.1 Complete data

Complete data are denoted as $\{T_i\ (i=1,\ldots,n)\}$, which form a random sample of $T$. Based on the fact that $S(t)=E[I(T>t)]$ and applying the method of moments, one can estimate it by the empirical estimator
$$\hat S(t)=\frac{1}{n}\sum_{i=1}^{n}I(T_i>t).$$
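As a quick illustration, a minimal sketch of this estimator (function and data names are mine, not from the thesis):

```python
import numpy as np

def empirical_survival(times, t):
    """Empirical survival estimator: S_hat(t) = (1/n) * sum_i I(T_i > t)."""
    return float(np.mean(np.asarray(times, dtype=float) > t))

# A small complete sample
sample = [2.1, 3.5, 0.7, 5.2, 1.9]
print(empirical_survival(sample, 2.0))  # proportion of observations exceeding 2.0 -> 0.6
```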

Although the method of moments provides a simple solution, it does not guarantee any optimality property. Formally, the nonparametric likelihood estimator should be pursued as a better option. The first step is to re-write the likelihood function based on grid points; the probabilities at these points are the parameters to be estimated. Let $t_{(1)}<\cdots<t_{(M)}$ denote the distinct observed values, $p_j=\Pr(T=t_{(j)})$ and $d_j$ the number of observations equal to $t_{(j)}$. The likelihood function can be written as
$$L=\prod_{j=1}^{M}p_j^{d_j},\qquad \sum_{j=1}^{M}p_j=1.$$
Applying the theorem of Lagrange multipliers, the maximizer is $\hat p_j=d_j/n$, which reproduces the empirical estimator.

The above product-limit expression is elegant since the censoring effect cancels out in estimation of the hazard function. However, under other types of censoring mechanisms, estimation of the hazards does not provide any advantage. Instead, the techniques of self-consistency and the nonparametric MLE can be generalized to other data structures. We now illustrate the self-consistency algorithm for right censored data. The self-consistency equation can be written as
$$\hat S(t)=\frac{1}{n}\sum_{i=1}^{n}\left\{I(X_i>t)+(1-\delta_i)\,I(X_i\le t)\,\frac{\hat S(t)}{\hat S(X_i)}\right\},$$
which allows us to solve for $\hat S(t_{(1)})$ and then successively for $\hat S(t_{(j)})$, $j=2,\ldots,M$.
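A minimal sketch of this iteration, evaluating $\hat S$ at the distinct uncensored times (the initial values, convergence tolerance and all names are my own choices):

```python
import numpy as np

def self_consistent_survival(x, delta, tol=1e-8, max_iter=1000):
    """Self-consistency iteration for right censored data (X_i, delta_i).

    delta_i = 1 for an observed failure, 0 for a censored observation.
    Returns S_hat evaluated at the distinct uncensored times.
    """
    x, delta = np.asarray(x, float), np.asarray(delta, int)
    grid = np.unique(x[delta == 1])
    S = np.array([np.mean(x > t) for t in grid])      # initial value: ignore censoring

    def S_step(values, pts):
        # right-continuous step-function value of S at each point in pts
        idx = np.searchsorted(grid, pts, side="right") - 1
        return np.where(idx >= 0, values[np.clip(idx, 0, None)], 1.0)

    for _ in range(max_iter):
        S_at_x = S_step(S, x)
        S_new = np.array([
            np.mean((x > t) + (delta == 0) * (x <= t) * S[j] / np.maximum(S_at_x, 1e-12))
            for j, t in enumerate(grid)
        ])
        if np.max(np.abs(S_new - S)) < tol:
            return grid, S_new
        S = S_new
    return grid, S
```

At convergence this fixed point coincides with the Kaplan-Meier estimator discussed next.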

Figure 2.1 Self-consistency for right censored data. (Weight for an observation with $X_i>t$: 1; weight for a censored observation with $X_i\le t$: $\hat S(t)/\hat S(X_i)$; weight for an uncensored observation with $X_i\le t$: 0.)

The likelihood function becomes larger when $S(x_{jl})$, the contribution of a censored observation $x_{jl}\in[t_{(j)},t_{(j+1)})$, is replaced by $S(t_{(j)})$. This suggests that we can instead consider the modified likelihood

$$L^{*}=\prod_{j=1}^{M}\big[S(t_{(j-1)})-S(t_{(j)})\big]^{d_j}\,S(t_{(j)})^{c_j},$$
where $d_j$ is the number of failures at $t_{(j)}$ and $c_j$ is the number of observations censored in $[t_{(j)},t_{(j+1)})$. Direct maximization of $L^{*}$ in terms of $\Pr(t_{(j-1)}<T\le t_{(j)})$ or $S(t_{(j)})$ is not suggested for right censored data. Alternatively, $L^{*}$ can be re-parameterized in terms of hazards. Specifically, writing $\lambda_j=\Pr(T=t_{(j)}\mid T\ge t_{(j)})$ and letting $Y_j$ denote the number at risk at $t_{(j)}$, we have
$$L^{*}=\prod_{j=1}^{M}\lambda_j^{d_j}(1-\lambda_j)^{Y_j-d_j},\qquad \hat\lambda_j=d_j/Y_j,$$
and the product-limit expression
$$\hat S(t)=\prod_{t_{(j)}\le t}\big(1-d_j/Y_j\big)$$
is the Kaplan-Meier estimator.
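A minimal sketch of the hazard-based product-limit computation (illustrative names):

```python
import numpy as np

def kaplan_meier(x, delta):
    """Kaplan-Meier estimator via the hazard parameterization: prod(1 - d_j / Y_j)."""
    x, delta = np.asarray(x, float), np.asarray(delta, int)
    grid = np.unique(x[delta == 1])                                # distinct failure times
    d = np.array([np.sum((x == t) & (delta == 1)) for t in grid])  # failures at t_(j)
    Y = np.array([np.sum(x >= t) for t in grid])                   # number at risk at t_(j)
    S = np.cumprod(1.0 - d / Y)                                    # product-limit expression
    return grid, S
```

Its output agrees with the fixed point of the self-consistency iteration sketched above.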

Figure 2.2: Idea for modifying the likelihood $L$.

Turnbull (1974) suggested the following steps to implement the algorithm. First, partition the observation period into $(0,t_{(1)}],\ldots,(t_{(M-1)},t_{(M)}]$, where $t_{(1)}<\cdots<t_{(M)}$ are ordered time points. Then classify the observations falling in each interval into three types: exact failures, left censored points and right censored points.

Turnbull's idea is to combine the "exact" and "left censored" observations by imputation. Specifically, for a left censored point in $(t_{(k-1)},t_{(k)}]$, the conditional probability that the failure time falls in $(t_{(j-1)},t_{(j)}]$ is
$$\alpha_{kj}=\frac{S(t_{(j-1)})-S(t_{(j)})}{1-S(t_{(k)})},\qquad j\le k.$$
Distributing each left censored observation over the earlier intervals according to these probabilities yields pseudo-data $(N_j^{(1)},N_j^{(0)})$ $(j=1,\ldots,M)$, which contain "pseudo-exact" and "right censored" points.

Turnbull (1974) suggested using them to estimate the hazard probability in the interval $(t_{(j-1)},t_{(j)}]$ by $N_j^{(1)}/Y_j$, where
$$Y_j=\sum_{k=j}^{M}\big(N_k^{(1)}+N_k^{(0)}\big)$$
denotes the number at risk and $N_j^{(1)}$ denotes the number of failures. The product-limit expression gives a direct relationship between the survival function and the hazard probabilities:
$$\hat S(t_{(j)})=\prod_{i\le j}\big\{1-N_i^{(1)}/Y_i\big\}.\qquad(2.7)$$

Note that since both $N_j^{(1)}$ and $Y_j$ are still functions of $\hat S(t_{(1)}),\ldots,\hat S(t_{(M)})$, iterations are needed, which can be summarized by the following steps.

Step 1: Give initial values $\hat S^{(0)}(t_{(1)}),\ldots,\hat S^{(0)}(t_{(M)})$;
Step 2: Compute $\hat\alpha_{kj}^{(0)}$ and hence the pseudo-data $(N_j^{(1)},N_j^{(0)})$ and $Y_j$;
Step 3: Obtain $\hat S^{(1)}(t_{(1)}),\ldots,\hat S^{(1)}(t_{(M)})$ from (2.7);
Step 4: Repeat Steps 2 and 3 until the pre-specified convergence criterion is reached.
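The following is a compact sketch of these steps under one concrete reading of the setup (each observation is recorded as an exact failure, a left censored point or a right censored point; the grid is taken to be the distinct exact failure times; all names are illustrative):

```python
import numpy as np

def turnbull_doubly_censored(x, status, tol=1e-8, max_iter=500):
    """Turnbull (1974)-type iteration for doubly censored data.

    status: 1 = exact failure, 0 = right censored, -1 = left censored.
    S is estimated at the grid of distinct exact failure times t_(1) < ... < t_(M).
    """
    x, status = np.asarray(x, float), np.asarray(status, int)
    grid = np.unique(x[status == 1])
    M = len(grid)
    S = np.array([np.mean(x > t) for t in grid])         # crude starting values

    for _ in range(max_iter):
        p = np.concatenate(([1.0], S[:-1])) - S          # p_j = S(t_(j-1)) - S(t_(j))
        N1, N0 = np.zeros(M), np.zeros(M)                # pseudo-exact / right censored counts
        for v, s in zip(x, status):
            j = min(np.searchsorted(grid, v, side="left"), M - 1)
            if s == 1:
                N1[j] += 1
            elif s == 0:
                N0[j] += 1
            else:                                         # left censored: impute over intervals <= j
                w = p[: j + 1] / p[: j + 1].sum()         # alpha weights
                N1[: j + 1] += w
        Y = (N1 + N0)[::-1].cumsum()[::-1]                # number at risk Y_j
        S_new = np.cumprod(1.0 - N1 / Y)                  # product-limit expression (2.7)
        if np.max(np.abs(S_new - S)) < tol:
            return grid, S_new
        S = S_new
    return grid, S
```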

Figure 2.3 Self-consistency for doubly censored data. (Weight for a right censored $X_i$: $\hat S(t)/\hat S(X_i)$; weight for a left censored $X_i'$: $\Pr(t<T\le X_i')/F(X_i')$.)


We now discuss the NPMLE approach for doubly censored data. The following result is summarized from the work of Mykland and Ren (1996). Let the log-likelihood function of complete data be
$$l=\sum_{k=1}^{M}d_k\log q_k,$$
where $q_k=\Pr(t_{(k-1)}<T\le t_{(k)})$ and $d_k$ is the number of failures falling in $(t_{(k-1)},t_{(k)}]$, as in section 2.1. But under double censoring we cannot observe all the failure times. Hence the EM algorithm is applied. To obtain the iteration, we need to calculate the conditional expected counts $\tilde d_k=E[d_k\mid\text{observed data}]$, evaluated at the current parameter values. Applying the theorem of Lagrange multipliers, the maximization is attained at
$$\hat q_k=\tilde d_k/n,\qquad k=1,\ldots,M.$$

The steps of the iteration are summarized below:
Step 1: set $\hat q_k^{(0)}=1/M$ $(k=1,\ldots,M)$;
Step 2: set $\hat q_k^{(i)}=\tilde d_k^{(i-1)}/n$ $(k=1,\ldots,M)$, where $\tilde d_k^{(i-1)}$ is computed using $\hat q^{(i-1)}$;
Step 3: repeat Step 2 for $i=2,3,\ldots$ until convergence.
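For concreteness, one plausible form of the E-step expectation — not written out explicitly above — is sketched below, where $e_k$, $l_k$ and $r_k$ denote the numbers of exact, left censored and right censored observations recorded in $(t_{(k-1)},t_{(k)}]$:
$$\tilde d_k \;=\; e_k
\;+\;\sum_{j\ge k} l_j\,\frac{q_k}{\sum_{m\le j} q_m}
\;+\;\sum_{j< k} r_j\,\frac{q_k}{\sum_{m> j} q_m},
\qquad k=1,\ldots,M.$$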


2.4 Interval censored data

Recall that we observe $\{(L_i,R_i]\ (i=1,\ldots,n)\}$ and know that $T_i\in(L_i,R_i]$. Turnbull (1976) discussed nonparametric estimation of $\Pr(T\le t)$ based on interval censored data. The first crucial step is to construct ordered disjoint intervals in which the probability can be estimated. To achieve this goal, the data $\{(L_i,R_i]\ (i=1,\ldots,n)\}$ are used to construct the so-called mass intervals $(t_j^{L},t_j^{U}]$ $(j=1,\ldots,m)$, whose left endpoints are taken from $\{L_i\}$ and right endpoints from $\{R_i\}$ and which contain no other observed endpoints in their interior; the NPMLE assigns probability mass only to these intervals. Let $p_j$ denote the mass assigned to the $j$th interval.

The self-consistency equation for an estimator of $p_j$ satisfies
$$\hat p_j=\frac{1}{n}\sum_{i=1}^{n}w_{ij}(\hat p_1,\ldots,\hat p_m),\qquad
w_{ij}(p_1,\ldots,p_m)=\frac{\alpha_{ij}\,p_j}{\sum_{l=1}^{m}\alpha_{il}\,p_l},\qquad
\alpha_{ij}=I\{(t_j^{L},t_j^{U}]\subset(L_i,R_i]\},$$
and the NPMLE is a self-consistency solution. The likelihood based on the original data can be written as
$$L=\prod_{i=1}^{n}\Pr(L_i<T\le R_i),$$
which can be re-expressed as
$$L=\prod_{i=1}^{n}\sum_{j=1}^{m}\alpha_{ij}\,p_j.$$

Let $\delta_{ij}=I\{T_i\in(t_j^{L},t_j^{U}]\}$ indicate which mass interval contains $T_i$; the likelihood based on this "complete" information is given by
$$L^{*}=\prod_{i=1}^{n}\prod_{j=1}^{m}p_j^{\delta_{ij}},$$
and the corresponding log-likelihood is
$$l^{*}=\sum_{i=1}^{n}\sum_{j=1}^{m}\delta_{ij}\log p_j,$$
which reduces to the multinomial problem. Applying the theorem of Lagrange multipliers, maximization is attained at $\hat p_j=\sum_{i=1}^{n}\delta_{ij}/n$.

Since the $\delta_{ij}$ are not observed, the EM algorithm replaces them by their conditional expectations $w_{ij}(p_1,\ldots,p_m)$. Maximization is then attained at
$$\hat p_j=\frac{1}{n}\sum_{i=1}^{n}w_{ij}(\hat p_1,\ldots,\hat p_m),$$
which is equivalent to the self-consistency equation. The steps of the iteration are summarized below.

Step 1: set $\hat p_j^{(0)}=1/m$ $(j=1,\ldots,m)$;
Step 2: set $\hat p_j^{(k)}=\frac{1}{n}\sum_{i=1}^{n}w_{ij}(\hat p_1^{(k-1)},\ldots,\hat p_m^{(k-1)})$ $(j=1,\ldots,m)$;
Step 3: repeat Step 2 for $k=2,3,\ldots$ until convergence.
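A minimal sketch of this EM/self-consistency iteration; for simplicity the mass intervals are supplied directly rather than constructed from the data, and all names are illustrative:

```python
import numpy as np

def turnbull_interval_censored(L, R, mass_intervals, tol=1e-8, max_iter=1000):
    """Self-consistency (EM) iteration for interval censored data (L_i, R_i].

    mass_intervals: list of (tL_j, tU_j] pairs that carry the probability mass.
    Returns the estimated masses p_1, ..., p_m.
    """
    L, R = np.asarray(L, float), np.asarray(R, float)
    tL = np.array([a for a, _ in mass_intervals])
    tU = np.array([b for _, b in mass_intervals])
    # alpha[i, j] = 1 if the j-th mass interval is contained in (L_i, R_i]
    alpha = (tL[None, :] >= L[:, None]) & (tU[None, :] <= R[:, None])
    n, m = alpha.shape
    p = np.full(m, 1.0 / m)                       # Step 1: uniform starting values
    for _ in range(max_iter):
        denom = alpha @ p                          # sum_l alpha_il p_l for each i
        w = alpha * p[None, :] / denom[:, None]    # E-step: w_ij
        p_new = w.mean(axis=0)                     # M-step / self-consistency update
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    return p
```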

Figure 2.4 Construction of the mass intervals for interval censored data and the idea of self-consistency. (Weights for $(L_1,R_1]$ in estimating $\hat P_1$, $\hat P_2$, $\hat P_3$ are $1\cdot P_1/(1\cdot P_1+1\cdot P_2+0\cdot P_3)$, $1\cdot P_2/(1\cdot P_1+1\cdot P_2+0\cdot P_3)$ and $0$, respectively.)

A different algorithm which directly estimates the survival function was suggested by Turnbull (1976). His idea is based on the product-limit expression of the survival function.


Chapter 3. Inference based on Proportional Hazards Model

The proportional hazards model specifies the effect of the covariate on the hazard of $T$ such that
$$\lambda(t\mid Z)=\lambda_0(t)\exp(\beta^{T}Z).$$
The survival function can be written as
$$S(t\mid Z)=S_0(t)^{\exp(\beta^{T}Z)}.$$

3.1 Inference under right censoring

Right censored data can be denoted as $(X_i,\delta_i,Z_i)$ $(i=1,\ldots,n)$, where $X_i=\min(T_i,C_i)$, $\delta_i=I(T_i\le C_i)$ and $Z_i$ is the covariate vector. Inference on $\beta$ is based on the partial likelihood
$$L(\beta)=\prod_{i=1}^{n}\left\{\frac{\exp(\beta^{T}Z_i)}{\sum_{j=1}^{n}I(X_j\ge X_i)\exp(\beta^{T}Z_j)}\right\}^{\delta_i},$$
which yields the score function
$$U(\beta)=\sum_{i=1}^{n}\delta_i\left\{Z_i-\frac{\sum_{j=1}^{n}I(X_j\ge X_i)Z_j\exp(\beta^{T}Z_j)}{\sum_{j=1}^{n}I(X_j\ge X_i)\exp(\beta^{T}Z_j)}\right\}.$$
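A direct sketch of this score function (plain illustrative code; ties are handled by the simple $X_j\ge X_i$ risk-set convention used above):

```python
import numpy as np

def cox_score(beta, X, delta, Z):
    """Partial-likelihood score U(beta) for right censored data (X_i, delta_i, Z_i)."""
    X, delta = np.asarray(X, float), np.asarray(delta, int)
    Z = np.asarray(Z, float)                    # covariate matrix of shape (n, p)
    beta = np.asarray(beta, float)
    eta = np.exp(Z @ beta)                      # exp(beta'Z_j)
    U = np.zeros_like(beta)
    for i in np.where(delta == 1)[0]:
        risk = X >= X[i]                        # risk set at X_i
        denom = np.sum(eta[risk])
        zbar = (eta[risk, None] * Z[risk]).sum(axis=0) / denom
        U += Z[i] - zbar
    return U
```

Solving $U(\beta)=0$, for example by Newton-Raphson, gives the maximum partial likelihood estimator.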

3.2 Inference under interval censoring

Finkelstein (1986) was the first author to address inference under the proportional hazards model with interval censored data. For the $i$th subject the likelihood contribution is $S(L_i\mid Z_i)-S(R_i\mid Z_i)$, so the likelihood can be written as
$$L(\beta,S_0)=\prod_{i=1}^{n}\Big\{S_0(L_i)^{\exp(\beta^{T}Z_i)}-S_0(R_i)^{\exp(\beta^{T}Z_i)}\Big\}.\qquad(3.5)$$

The likelihood function in (3.5) is expressed in terms of the original observations. For the purpose of maximization when the nuisance function $S_0(\cdot)$ is involved, this function will be expressed in terms of grid points. Figure 3.1 shows the construction of grid points based on an example containing four observed intervals $(L_i,R_i]$ $(i=1,2,3,4)$; in this example, grid points $t_{(1)}<t_{(2)}<\cdots<t_{(8)}$ are formed. In general, we let $0=t_{(0)}<t_{(1)}<\cdots<t_{(M)}<t_{(M+1)}=\infty$ be the ordered grid points. The contribution of the $i$th observation to the likelihood (3.5) can then be re-expressed as
$$\sum_{j=1}^{M+1}\alpha_{ij}\Big\{S_0(t_{(j-1)})^{\exp(\beta^{T}Z_i)}-S_0(t_{(j)})^{\exp(\beta^{T}Z_i)}\Big\},$$
where $\alpha_{ij}=I\{(t_{(j-1)},t_{(j)}]\subset(L_i,R_i]\}$.

Figure 3.1 Construction of the grid points from interval censored data.

Note that $S_0(\cdot)$ is a function, but the maximization of $L$ only involves its values at the grid points, namely $S_0(t_{(j)})$ for $j=1,\ldots,M$. Since the parameters $S_0(t_{(j)})$, $j=1,\ldots,M$, obey the natural constraint $1\ge S_0(t_{(1)})\ge\cdots\ge S_0(t_{(M)})\ge0$, the MLE is hard to obtain. Finkelstein (1986) suggested re-parameterizing $S_0(t_{(j)})$ by $\gamma_k=\log[-\log S_0(t_{(k)})]$. This removes the range restriction on the parameters $S_0(t_{(j)})$, but it does not remove the order restriction.

Sun (2006) proposed a corrective method that can remove the natural constraint. The details of the method are as follows. Based on the product-limit expression, one can write
$$S_0(t_{(j)})=\prod_{k\le j}\big(1-\lambda_0(t_{(k)})\big).$$
Thus $L$ becomes a function of $\beta$ and $\lambda_0(t_{(1)}),\ldots,\lambda_0(t_{(M)})$. Note that $\lambda_0(t_{(k)})\in(0,1)$ is a conditional (hazard) probability. To remove the natural constraint, we can re-parameterize $\lambda_0(t_{(k)})$ by setting $\lambda_0(t_{(k)})=1-e^{-\exp(\theta_k)}$.
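A small sketch of this re-parameterization: `theta` is unconstrained, and the mapping automatically produces a valid non-increasing baseline survival function (names are mine):

```python
import numpy as np

def baseline_survival_from_theta(theta):
    """Map unconstrained theta_k to S_0(t_(j)) via lambda_k = 1 - exp(-exp(theta_k))."""
    lam = 1.0 - np.exp(-np.exp(np.asarray(theta, float)))   # each lambda_k lies in (0, 1)
    return np.cumprod(1.0 - lam)                             # S_0(t_(j)) = prod_{k<=j}(1 - lambda_k)

# Any real-valued theta yields a monotone baseline survival:
print(baseline_survival_from_theta([0.3, -1.2, 2.0]))
```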

Thus the log-likelihood becomes a function of $(\beta,\theta_1,\ldots,\theta_M)$ without any constraints. The estimation process is summarized below. Denote the score statistic as
$$U=\left(\frac{\partial l}{\partial\beta^{T}},\ \frac{\partial l}{\partial\theta^{T}}\right)^{T},$$
where the score functions of $\beta$ and $\theta$ are obtained by differentiating the log-likelihood with respect to each parameter. We solve $U=0$ so that the likelihood is maximized. The Newton-Raphson method can be applied to obtain the solution: partition the observed information matrix into blocks $I_{11}$, $I_{12}$, $I_{21}$, $I_{22}$ corresponding to $\beta$ and $\theta$; the algorithm then iterates
$$(\beta^{(k+1)},\theta^{(k+1)})=(\beta^{(k)},\theta^{(k)})+I^{-1}U\big|_{(\beta^{(k)},\theta^{(k)})}$$
until convergence.
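A generic sketch of such a Newton-Raphson iteration, with the score and observed information supplied as callables (purely illustrative):

```python
import numpy as np

def newton_raphson(score, information, init, tol=1e-8, max_iter=100):
    """Solve U(param) = 0 by Newton-Raphson: param <- param + I(param)^{-1} U(param)."""
    param = np.asarray(init, float)
    for _ in range(max_iter):
        step = np.linalg.solve(information(param), score(param))
        param = param + step
        if np.max(np.abs(step)) < tol:
            break
    return param
```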

Chapter 4. Inference based on Accelerated Failure Time Model

The accelerated failure time (AFT) model is a popular alternative to the PH model. An AFT model can be written as
$$Y=\beta^{T}Z+\varepsilon,\qquad(4.1)$$
where $Y=\log T$ and $\varepsilon$ is the error variable with density function $f(\cdot)$. Existing research focuses on semi-parametric estimation of $\beta$ without specifying the form of $f(\cdot)$. We will briefly review inference methods developed for complete data and right censored data, and then focus on interval censored data.

4.1 Inference based on complete data

Consider the transformed variable on the error scale, $\varepsilon_i(\beta)=Y_i-\beta^{T}Z_i$ for $i=1,\ldots,n$. Note that when $\beta_0$ is the true value, $\{\varepsilon_i(\beta_0)\ (i=1,\ldots,n)\}$ form an iid sample, which becomes a useful property for constructing nonparametric inference methods. Let $\varepsilon_{(1)}(\beta)<\varepsilon_{(2)}(\beta)<\cdots<\varepsilon_{(n)}(\beta)$ be the corresponding order statistics.

4.2 Inference based on right censored data

Right censored data are denoted as $\{(X_i,\delta_i,Z_i)\ (i=1,2,\ldots,n)\}$. Let $\varepsilon_i(\beta)=\log X_i-\beta^{T}Z_i$ denote the corresponding residuals. We review methods of modifying the linear rank statistics introduced earlier.

4.2.1 Linear rank statistics

Let $\varepsilon_{(1)}(\beta)<\varepsilon_{(2)}(\beta)<\cdots<\varepsilon_{(k)}(\beta)$ be the distinct uncensored ordered error variables and let $\varepsilon_{i1}(\beta),\ldots,\varepsilon_{im_i}(\beta)$ be the censored error variables falling in $[\varepsilon_{(i)}(\beta),\varepsilon_{(i+1)}(\beta))$. Following the book of Kalbfleisch and Prentice (2002), the modified linear rank statistic can be written as
$$S(\beta)=\sum_{i=1}^{k}\Big\{c_i\,Z_{(i)}+C_i\sum_{l=1}^{m_i}Z_{il}\Big\},$$
where $Z_{(i)}$ and $Z_{il}$ are the covariates attached to the uncensored and censored residuals, and $c_i$ and $C_i$ are the corresponding scores, respectively. The requirement on the scores is
$$\sum_{i=1}^{k}\big(c_i+m_iC_i\big)=0.$$
The efficiency of the resulting estimator depends on the underlying error distribution, the censoring distribution and the forms of $c_i$ and $C_i$. For the Wilcoxon statistic, Prentice (1978) suggested scores constructed from a product-limit type estimate of the error distribution.


4.2.2 Log-rank statistics

As mentioned earlier, the log-rank statistic can be expressed as a special linear rank statistic, but it also has its own representation:
$$S(\beta)=\sum_{i=1}^{n}\delta_i\left\{Z_i-\frac{\sum_{j=1}^{n}I\{\varepsilon_j(\beta)\ge\varepsilon_i(\beta)\}Z_j}{\sum_{j=1}^{n}I\{\varepsilon_j(\beta)\ge\varepsilon_i(\beta)\}}\right\}.$$
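A direct sketch of this estimating function (illustrative names; $\varepsilon_i(\beta)=\log X_i-\beta^{T}Z_i$):

```python
import numpy as np

def logrank_estimating_function(beta, X, delta, Z):
    """Log-rank estimating function S(beta) for the AFT model with right censored data."""
    X, delta = np.asarray(X, float), np.asarray(delta, int)
    Z = np.asarray(Z, float)                       # covariate matrix of shape (n, p)
    e = np.log(X) - Z @ np.asarray(beta, float)    # residuals on the error scale
    S = np.zeros(Z.shape[1])
    for i in np.where(delta == 1)[0]:
        at_risk = e >= e[i]                        # risk set on the residual scale
        S += Z[i] - Z[at_risk].mean(axis=0)
    return S
```

Since $S(\beta)$ is a step function of $\beta$, the estimator is usually defined through a zero-crossing, for example by minimizing $\|S(\beta)\|$, rather than an exact root.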

4.2.3 M-estimator by Ritov (1990)

The idea was motivated by parametric likelihood analysis. The likelihood function and the log-likelihood based on $(\varepsilon_i(\beta),\delta_i)$ $(i=1,\ldots,n)$ can be written as
$$L(\beta)=\prod_{i=1}^{n}f\{\varepsilon_i(\beta)\}^{\delta_i}\,S\{\varepsilon_i(\beta)\}^{1-\delta_i}
\quad\text{and}\quad
l(\beta)=\sum_{i=1}^{n}\big[\delta_i\log f\{\varepsilon_i(\beta)\}+(1-\delta_i)\log S\{\varepsilon_i(\beta)\}\big],$$
respectively. Taking the derivative with respect to $\beta$ gives an estimating function; replacing the parametric score by a general weight function $g(\cdot)$ yields an M-estimating equation in which $S(\cdot)$, the error survival function, can be estimated by the following product-limit estimator based on the residuals:
$$\hat S(e)=\prod_{\varepsilon_{(i)}(\beta)\le e}\left(1-\frac{d_i}{\sum_{j=1}^{n}I\{\varepsilon_j(\beta)\ge\varepsilon_{(i)}(\beta)\}}\right),$$
where $d_i$ is the number of uncensored residuals equal to $\varepsilon_{(i)}(\beta)$. The form of $g(\cdot)$ and the underlying error distribution affect the efficiency of the resulting estimator; the best choice, which leads to the maximum likelihood estimator, is $g=f\circ F^{-1}$. Note that if the error distribution is specified, this optimal $g$ can be computed directly.

4.3 Inference under interval censoring

Consider interval censored data $(L_i,R_i,Z_i)$ $(i=1,\ldots,n)$. We now discuss how the preceding estimation methods can be modified and applied to interval censored data.

4.3.1 Modified M-estimator by Rabinowitz et al. (1995)

Consider the log-likelihood function
$$l(\beta,F)=\sum_{i=1}^{n}\log\big[F\{\varepsilon_i^{R}(\beta)\}-F\{\varepsilon_i^{L}(\beta)\}\big],$$
where $\varepsilon_i^{L}(\beta)=\log L_i-\beta^{T}Z_i$, $\varepsilon_i^{R}(\beta)=\log R_i-\beta^{T}Z_i$ and $F$ is the error distribution function. As in section 2.4, construct $(t_k^{L},t_k^{U}]$, $k=1,\ldots,m$, as the mass intervals on the residual scale. The next step is to express the likelihood based on $\{(\varepsilon_i^{L}(\beta),\varepsilon_i^{R}(\beta)]\ (i=1,\ldots,n)\}$ in terms of $\{(t_k^{L},t_k^{U}],\ k=1,\ldots,m\}$ and to maximize it over the corresponding masses.

Given an estimate of $\beta$, the above maximization procedure can be performed. The resulting estimate of $F$, denoted $\hat F$, is then plugged into the estimating function $S(\beta,\hat F)$ to solve for the next estimate of $\beta$.

4.3.2 Method for a simplified AFT model with a univariate covariate

Besides the methods discussed earlier, Li and Pu (2003) developed an interesting way to estimate $\beta$. Consider a simplified AFT model:
$$Y_i=\beta Z_i+\varepsilon_i,\qquad(i=1,\ldots,n),$$
where $Z_i$ is a one-dimensional covariate. The main idea of this paper is based on the assumption that $\varepsilon_i$ and $Z_i$ are uncorrelated. Kendall's $\tau$ provides a rank-invariant measure for assessing the association between two variables. Suppose that $(\varepsilon_i,Z_i)$ and $(\varepsilon_j,Z_j)$ are independent realizations from $(\varepsilon,Z)$. The pair is concordant if $I\{(\varepsilon_i-\varepsilon_j)(Z_i-Z_j)>0\}$ and discordant if $I\{(\varepsilon_i-\varepsilon_j)(Z_i-Z_j)<0\}$. The population version of Kendall's tau is defined as
$$\tau=\Pr\{(\varepsilon_i-\varepsilon_j)(Z_i-Z_j)>0\}-\Pr\{(\varepsilon_i-\varepsilon_j)(Z_i-Z_j)<0\}.$$
If complete data are available, one can solve
$$\sum_{i<j}\Big[I\{(\varepsilon_i(\beta)-\varepsilon_j(\beta))(Z_i-Z_j)>0\}-I\{(\varepsilon_i(\beta)-\varepsilon_j(\beta))(Z_i-Z_j)<0\}\Big]=0.$$

Hence the total number of known concordant pairs becomes
$$\sum_{i<j}\big\{I(Z_i>Z_j)\,I(\varepsilon_i^{L}(\beta)>\varepsilon_j^{R}(\beta))+I(Z_i<Z_j)\,I(\varepsilon_i^{R}(\beta)<\varepsilon_j^{L}(\beta))\big\},$$
and the total number of known discordant pairs is
$$\sum_{i<j}\big\{I(Z_i>Z_j)\,I(\varepsilon_i^{R}(\beta)<\varepsilon_j^{L}(\beta))+I(Z_i<Z_j)\,I(\varepsilon_i^{L}(\beta)>\varepsilon_j^{R}(\beta))\big\}.$$
The modified Kendall's $\tau$ coefficient can be written as
$$\hat\tau(\beta)=\binom{n}{2}^{-1}\sum_{i<j}\big[I(Z_i>Z_j)-I(Z_i<Z_j)\big]\big[I(\varepsilon_i^{L}(\beta)>\varepsilon_j^{R}(\beta))-I(\varepsilon_i^{R}(\beta)<\varepsilon_j^{L}(\beta))\big].$$
The resulting estimating function is given by
$$U(\beta)=\sum_{i<j}\big[I(Z_i>Z_j)-I(Z_i<Z_j)\big]\big[I(\varepsilon_i^{L}(\beta)>\varepsilon_j^{R}(\beta))-I(\varepsilon_i^{R}(\beta)<\varepsilon_j^{L}(\beta))\big].$$
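A direct sketch of this estimating function for interval censored residuals (names are mine; the residual intervals are $(\varepsilon_i^{L}(\beta),\varepsilon_i^{R}(\beta)]$ with $\varepsilon^{L}=\log L-\beta Z$ and $\varepsilon^{R}=\log R-\beta Z$):

```python
import numpy as np

def kendall_estimating_function(beta, L, R, Z):
    """Modified Kendall's tau estimating function for interval censored AFT data."""
    L, R, Z = (np.asarray(a, float) for a in (L, R, Z))
    eL, eR = np.log(L) - beta * Z, np.log(R) - beta * Z     # residual intervals
    U = 0.0
    n = len(Z)
    for i in range(n):
        for j in range(i + 1, n):
            sz = np.sign(Z[i] - Z[j])
            # the pair is orderable only when the residual intervals do not overlap
            order = (eL[i] > eR[j]) - (eR[i] < eL[j])        # +1, -1 or 0
            U += sz * order
    return U

# The estimate solves U(beta) = 0, e.g. by a grid or bisection search over beta.
```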

This method has two major drawbacks. One is the restrictive assumption that $Z$ is univariate. The other is the lack of efficiency when the data contain very few orderable pairs of intervals.


Chapter 5. Inference based on Proportional Odds Model

5.1 Model and the likelihood

Besides the PH and AFT models, the proportional odds (PO) model is also a popular choice. The proportional odds model is defined as
$$\operatorname{logit}\{F_{Z}(t)\}=\log\{B(t)\}+\beta^{T}Z,$$
where $F_Z(t)$ is the distribution function of $T$ given $Z$ and $B(t)$ is a baseline function. For interval censored data the log-likelihood function is
$$l(\beta,B)=\sum_{i=1}^{n}\log\big[F_{Z_i}(R_i)-F_{Z_i}(L_i)\big].$$
The presence of $B(t)$ makes it difficult to obtain the MLE directly by maximizing the log-likelihood function. We will present two methods, both of which suggest replacing $B(t)$ by an approximating function which is easier to handle.

5.2 Smoothing method for approximating the baseline function

Shen (1998) proposed a sieve method to approximate $B(t)$ in the likelihood function. The method can be applied not only to right censored data but also to interval censored data. We now briefly describe the approach. The basic idea is to approximate the baseline function by splines $s(t)$ with variable orders and knots. An example of $s(t)$ is depicted in Figure 5.1. There are two knots, which form three intervals, and each interval contains a polynomial of a different order. From this figure, we see that $s(t)$ can be used to approximate arbitrary smooth functions.

The approximating function for $B(t)$ is defined through such a spline; the number and locations of the knots and the polynomial orders are collected in a tuning index $h$. Ideally one would choose $h$ according to the expected log-likelihood $E[l(\hat\theta)]$, but since this quantity is unknown, we cannot obtain it directly. For each fixed $h$, let $\hat\theta_{-i}$ be the sieve maximum likelihood estimate of $\theta=(\beta,B)$ without the $i$th observation, and let $\hat P_{-i}(\cdot)$ be the corresponding estimated distribution. Shen (1998) suggests that we can use the statistic

$$R(h)=-\frac{1}{n}\sum_{i=1}^{n}\log\hat P_{-i}(O_i)$$
to estimate $E[l(\hat\theta)]$ (up to sign and scale), where $O_i$ denotes the $i$th observation; this is the selector value of $h$. We then choose the optimal $h$ that minimizes $R(h)$.

Figure 5.1. An example of $s(t)$. Here $t_{(1)}=4$ and $t_{(2)}=8$. The polynomials from left to right are $-x^3+6x^2+x-24$, $x+8$ and $x^2-15x+72$, respectively.

Hence we find the universal sieve maximum likelihood estimator by estimating $\theta$ and $h$ recursively. The details of the algorithm are as follows.

Step 1: Initial spline

For any fixed order $m\le N_{\max}$, estimate $\{\beta,B\}$ using the maximum likelihood method with a single polynomial. Then choose the optimal order $m$ that minimizes $R(h)$.

Step 2: Adding knots

Consider a candidate knot point $t_{(i)}$ within an interval spanned by the existing knots. For any fixed order $M$ in the allowed range, estimate $\{\beta,B\}$ as in Step 1. Find the order $M$ that minimizes $R(h)$; this value is the selector for this candidate. The optimal $t_{(i)}$ is then found using a Fibonacci search to minimize the selector.

Step 3: Comparison

Compare the original sieve maximum likelihood estimate based on the spline without $t_{(i)}$ with the new one including $t_{(i)}$. If the new maximum likelihood estimate has a smaller value of the selector, then split the interval into two and proceed further as in Step 2. Otherwise, go to Step 4.

Step 4: Repeat Steps 2-3 for all intervals spanned by the existing knots until no new knot can be added.

5.3 Sieve method by Huang and Rossini (1997)

The proportional odds model is expressed as
$$\operatorname{logit}F_{Z}(t)=\operatorname{logit}F_{0}(t)+\beta^{T}Z,\qquad(5.5)$$
where $F_0(t)=F(t\mid Z=0)$ is the baseline distribution function. Letting $\alpha_0(t)=\operatorname{logit}F_0(t)$, the distribution function can be written as
$$F_{Z}(t)=\frac{\exp\{\alpha_0(t)+\beta^{T}Z\}}{1+\exp\{\alpha_0(t)+\beta^{T}Z\}}.$$
Since it is difficult to estimate $\alpha_0(t)$ directly, Huang and Rossini (1997) proposed to approximate this function by a function with nice analytic properties. The idea of this approximation is similar to the previous method.


If the real function $\alpha_0(t)$ were known, we might choose some knots $0=t_{(0)}<t_{(1)}<\cdots<t_{(k)}$ and let
$$\hat\alpha_0(t)=\sum_{j=1}^{k}\frac{b_{j-1}\,(t_{(j)}-t)+b_{j}\,(t-t_{(j-1)})}{t_{(j)}-t_{(j-1)}}\,I(t_{(j-1)}<t\le t_{(j)}),\qquad(5.6)$$

where $b_j=\alpha_0(t_{(j)})$ and $b_1\le\cdots\le b_k$. Here we choose $k$ and the knots $t_{(i)}$ which satisfy
1. $k$ is an integer that grows at the rate $O(n^{a})$ for some $0<a<1$;
2. $\max_{1\le j\le k}(t_{(j)}-t_{(j-1)})\le Cn^{-a}$ for some constant $C$.

Figure 5.2. The curve of $\alpha_0(t)$ (dashed line) and its approximating function (solid line). Here $\alpha_0(t)=\log(3t)$ and we take $t_{(j)}=1,3,5,7,9$.

There is some difficulty in implementing the idea. Since the true function is unknown, $b_j=\alpha_0(t_{(j)})$ is also unknown. Treating the $b_j$ as unknown parameters, the restriction $b_1\le\cdots\le b_k$ has to be respected in the maximization. The estimator of Shen (1998) is easier to implement since its unknown parameters have no specific restrictions.


Chapter 6 Conclusion

Most textbooks on survival analysis focus on right censored data. However, empirical medical data are often interval censored. In this thesis, we review important inference methods which can be applied to interval censored data. We emphasize how the fundamental ideas of inference are extended to this complicated data structure.

From the discussions, we see that many elegant techniques adopted for right censored data no longer apply. Instead, numerical algorithms become very important in the analysis of interval censored data. Because the main purpose of the thesis is to provide a general review of many different methods, we do not investigate specific methods or algorithms thoroughly. These can be interesting topics for future study.


References

[1] Breslow, N. E. (1975). Analysis of survival data under the proportional hazards model. Internat. Statist. Rev., 43, 45-58.

[2] Finkelstein, D. M. (1986). A proportional hazards model for interval-censored failure time data. Biometrics, 42, 845-854.

[3] Huang, J. and Rossini, A. J. (1997). Sieve estimation for the proportional odds failure-time regression model with interval censoring. J. Amer. Statist. Ass., 92, 960-967.

[4] Kaplan, E. L. and Meier, P. (1958). Nonparametric estimation from incomplete observations. J. Amer. Statist. Ass., 53, 457-481.

[5] Kalbfleisch, J. D. and Prentice, R. L. (2002). Statistical analysis of failure time data, 2nd ed. New York: Wiley.

[6] Li, L. and Pu, Z. (2003). Rank estimation of log-linear regression with interval censored data. Lifetime Data Analysis, 9, 57-70.

[7] Prentice, R. L. (1978). Linear rank tests with right censored data. Biometrika, 65, 167-179.

[8] Rabinowitz, D., Tsiatis, A. and Aragon, J. (1995). Regression with interval censored data. Biometrika, 82, 501-513.

[9] Ritov, Y. (1990). Estimation in a linear regression model with censored data. Ann. Statist., 18, 303-328.

[10] Shen, X. (1998). Proportional odds regression and sieve maximum likelihood estimation. Biometrika, 85, 165-177.

[11] Sun, J. (2006). The statistical analysis of interval-censored failure time data. USA: Springer.

[12] Turnbull, B. W. (1974). Nonparametric estimation of a survivorship function with doubly censored data. J. Amer. Statist. Ass., 69, 169-173.

[13] Turnbull, B. W. (1976). The empirical distribution function with arbitrarily grouped, censored and truncated data. J. Roy. Statist. Soc. Ser. B, 38, 290-295.


Appendix

Proof of $E[S(\beta,F_0,\varepsilon)]=0$. Let $X_i=(X_{i0}<X_{i1}<\cdots<X_{i,n_i+1})$ be the $i$th patient's ordered sequence of examination times, where $n_i$ denotes the number of examinations. For convenience, define $X_{i0}=0$ and $X_{i,n_i+1}=\infty$. Define $L_i$ to be the last of the $i$th subject's examination times preceding $T_i$, and let $R_i$ be the first examination time following $T_i$. For a $p$-dimensional vector $b$, define the bracketing examination times on the time scale of the residual by
$$\varepsilon_i^{L}(b)=\log L_i-b^{T}Z_i,\qquad \varepsilon_i^{R}(b)=\log R_i-b^{T}Z_i,$$
and, more generally, write $\varepsilon_{ik}(b)=\log X_{ik}-b^{T}Z_i$ for $k=0,1,\ldots,n_i+1$.


Conditioning on the examination times and covariates, the contribution of the $i$th subject to $E[S(\beta,F_0,\varepsilon)]$ telescopes:
$$\sum_{k=1}^{n_i+1}\Big(g\big[F\{\varepsilon_{ik}(b)\}\big]-g\big[F\{\varepsilon_{i,k-1}(b)\}\big]\Big)
=g\big[F\{\varepsilon_{i,n_i+1}(b)\}\big]-g\big[F\{\varepsilon_{i0}(b)\}\big]
=g\big[F\{\infty\}\big]-g\big[F\{-\infty\}\big]
=g[1]-g[0]=0.$$

The proof is complete.
