• 沒有找到結果。

區間設限資料之統計推論-文獻回顧

N/A
N/A
Protected

Academic year: 2021

Share "區間設限資料之統計推論-文獻回顧"

Copied!
43
0
0

加載中.... (立即查看全文)

全文

(1)

國 立 交 通 大 學

統計學研究所

碩 士 論 文

區間設限資料之統計推論-文獻回顧

Statistical Inference based on

Interval Censored Data

A Literature Review

研 究 生:黃祥福

指導教授:王維菁 教授

(2)

區間設限資料之統計推論-文獻回顧

Statistical Inference based on Interval Censored Data

A Literature Review

研 究 生:黃祥福 Student:

Hsiang-FuHuang

指導教授:王維菁 教授 Advisor:Dr. Wei-Jing Wang

國 立 交 通 大 學

統 計 學 研 究 所

碩 士 論 文

A Thesis

Submitted to Institute of Statistics

College of Science

National Chiao Tung University

in partial Fulfillment of the Requirements

for the Degree of

Master

in

Statistics

May 2011

Hsinchu, Taiwan, Republic of China

中華民國一百年五月

(3)

i

區間設限資料之統計推論

研究生:黃祥福 指導教授:王維菁 教授

國立交通大學理學院

統計學研究所

摘要

在此論文中,我們回顧區間設限資料的推論問題,為呈現概

念的建構原則,亦簡述其它類型的不完整資料.論文分兩大部份,

一部份為無母數估計,另一部份為迴歸分析.我們回顧了兩類的

無母數估計法,其中自我一致演算法可視為動差法的延伸.另一

方法為無母數最大概似估計法.我們介紹三種較廣泛使用的迴歸

模型: 包含比例風險模型,加速失敗模型和比例勝算模型.推論

的困難度在於模式存在未知函數,需要利用平滑的技巧處理之.

本論文以介紹點估計的概念為主,並未涵蓋如何由分佈理論推導

信賴區間與統計檢定問題.

關鍵字:區間設限, 無母數

(4)

ii

Statistical Inference based on

Interval Censored Data

Student:Hsiang-Fu Huang Advisor:Dr. Wei-Jing Wang

Institute of Statistics

National Chiao Tung University

Hsinchu, Taiwan

Abstract

We review inference methods for analyzing incomplete data with

focus on interval censored data. For nonparametric analysis, two

estimation approaches are examined. Self-consistency can be viewed

as an extension of the method of moment by imputing incomplete

information by its expected value. The other is the nonparametric

likelihood estimation. We also introduce three popular regression

models, namely the proportional hazards model, accelerated failure

time model, and proportional odds model. These models contain

unknown nuisance functions and different smoothing techniques are

employed to handle them in the estimation procedure. The thesis

focuses on point estimation so that second ordered properties are not

investigated.

(5)

iii

致 謝

很榮幸能夠成為王維菁 教授的指導學生,在跟老師寫論文的這

一年當中,老師總是不餘遺力的教導和督促我們.天性散漫的我受到

老師的影響,態度也變得較積極.在課業方面,老師除了幫忙解惑之外,

還訓練我如何去找問題的核心所在,進而去尋找解決問題的方法.這

讓我從一味只會全盤接受他人的想法,慢慢的進步到去思考別人的想

法的正確性,對我的獨立思考能力有相當的幫助.此外,表達能力也是

老師教學的重點.這些都是我先前相當缺乏的能力.雖然我在碩二這

一年沒有修課,但是有這些訓練,讓我覺得比在課堂上學的更多.相信

在之後的職場上也有很大的幫助.

此外也要特別感謝李博文 學長和蘇健霖 學長對我論文的協助,

我的論文才能在時間內順利完成.

黃祥福 僅誌

中華民國 一百年夏

于交通大學統計學研究所

(6)

iv

Contents

Chapter 1. Introduction ... 1

1.1 Motivation ... 1

1.2 Different types of incomplete data ... 1

1.3 Parametric analysis for interval censored data... 3

1.4 Organization of the Thesis ... 3

Chapter 2. Inference based on Nonparametric Analysis ... 4

2.1 Complete data ... 4

2.2 Right censored data ... 5

2.3 Double censored data ... 8

2.4 Interval censored data ... 12

Chapter 3. Inference based on Proportional Hazards Model ... 16

3.1 Inference under right censoring ... 16

3.2 Inference under interval censoring... 18

Chapter 4. Inference based on Accelerated Failure Time Model ... 22

4.1 Inference based on complete data ... 22

4.2 Inference based on right censored data ... 23

4.2.1 Linear rank statistics ... 23

4.2.2 Log-rank statistics ... 24

4.2.3 M-estimator ... 24

4.3 Inference under interval censoring... 25

4.3.1 Modify M-estimator ... 25

4.3.2 Method for a simplified AFT model with univariate covariate ... 27

Chapter 5. Inference based on Proportional Odds Model ... 29

5.1 Model and the likelihood ... 29

5.2 Smoothing method for approximating the baseline function ... 30

5.3 Sieve method ... 32

Chapter 6 Conclusion ... 34

References ... 35

(7)

1

Chapter 1. Introduction

1.1 Motivation

In survival analysis we are interested in studying the behavior of the time to the event of interest, denoted as T , which often cannot be completely observed. Textbooks on survival analysis focus on right censored data in which many elegant results have been derived. In the thesis, we consider interval censored data. Although such data are commonly seen in practical applications, related statistical methods are not formally taught in the classroom. The lack of an overall review at the introductory level provides the motivation of the thesis.

To enhance a thorough understanding about the methodology, we will still review related materials for complete and right censored data and address how the basic ideas are modified when the data become interval censored. We will consider two types of applications. One is nonparametric analysis of the survival function Pr(Tt). The other is regression analysis. Specifically let Z be the covariate vector which provides individual information. A regression model imposes additional assumptions on how Z affects T . We will review statistical methods for the proportional hazards (PH) model, the accelerated failure time (AFT) model and the proportional odds model when T is subject to interval censoring.

1.2 Different types of incomplete data

It is worthy to introduce different types of incomplete data and compare their differences.

(1). Right censored data

Observed variables can be expressed as

( , )

X

where X  T C and

( )

I T C

   . Thus when

1

, XT and XC; when

0

, XC and

XT (the true failure time T is located in the right-hand side of the observed time

(8)

2

Most literature in survival analysis consider right censored data.

(2) Left censored data:

We use the same notations to denote observed variables which have different definitions. One observes

( , )

X

where X  T C and  I T( C) . When

1

, XT and XC; when

0

, XC and XT (the true failure time T is located in the left-hand side of the observed time X ). Here is an example of left censoring. Denote T as the time until first marijuana use among high school boys in California. If the event had already occurred prior to the recruitment of the study, the subject is said to be left censored.

(3) Doubly censored data

Doubly censored data include three types of observations, namely “observed”, “right censored” and “left censored” ones. Let C be the right censoring variable and r

l

C be the left censoring variable. It is assumed that ClCr. Observed variables are

( , )

X

where X max(min(C T Cr, ), l) and

1

if XT;

0

if XCr; and

 

1

if XCl.

(4) Interval censored data

The observation is a random interval ( , ]L R such that T falls in the interval but its exact value is unknown unless LR. Such data can include right censoring if

R  and left censoring if L0. Interval censored are commonly seen in practice. For example a subject is under study has a regular appointment schedule (say every three months) to the hospital. The left endpoint L refers to the last follow-up time that the event has not occurred and the right endpoint R refers to the first follow-up time that the event is detected.

(9)

3

1.3 Parametric analysis for interval censored data

The thesis focuses on nonparametric and semi-parametric methods. However these methods are still motivated by the parametric approach. Here we briefly summarize the parametric analysis for interval censored data denoted as {( ,L Ri i] (i1,..., )}n . Assume that the distribution function of T is specified as

( )

F  . The likelihood function and the log-likelihood function can be written as

1 ( ) [ ( ) ( )] n i i i LF R F L  

 ( 1 . 1 ) and 1 log ( ) log{ ( ) ( )} n i i i LF R F L  

 . (1.2)

The resulting score function is

1 log ( ) ( ) ( ) ( ) ( ) ( ) n i i i i i L f R f L S F R F L             

 . (1.3)

By solving S( ) 0, we obtain the maximum likelihood estimator of  . Our analysis implies that parametric inference is straightforward for interval censored data. In fact, difficulties arise when the functional form of F( ) is completely unspecified as in the nonparametric setting or partially specified as in the semi-parametric setting.

1.4 Organization of the Thesis

Chapter 2 considers non-parametric inference and Chapter 3 to Chapter 5 discuss semi-parametric regression models, namely the proportional hazards (PH) model, the accelerated failure time (AFT) model and the proportional odds (PO) model respectively. Chapter 6 gives a brief summary of the thesis.

(10)

4

Chapter 2. Inference based on Nonparametric Analysis

The major objective is to estimate the survival function S t( )Pr(Tt)

without specifying the parametric distribution. We will review nonparametric methods based on several data structures, from the simplest to the complicated data settings. This will help us to understand the fundamental techniques more thoroughly.

2.1 Complete data

Complete data are denoted as { (T ii 1,..., )}n which form a random sample of

T . Based on the fact that S t( )E I T[ ( t)] and applying the method of moment, one can estimate it by the empirical estimator

1 1 ( ) ( ) n i i S t I T t n  

 . Although the method of moment provides a simple solution, it does not guarantee any optimality property. Formally the nonparametric likelihood estimator should be pursued as a better option. The first step is to re-write the likelihood function based on grid points and the probabilities at these points are the parameters to be estimated. Suppose that there are only M distinct observed values denoted as t(1) ... t(M). Let

( ) 1 ( ) n j i j i d I T t

 be the number of observations at t( )j and pjS t( ( )j  ) S t(( )j ). The likelihood function can be written as

( ) 1 Pr( ) n j j L T t  

 1 2 1 2 ( ) (d ) ...(d )dM M p p p  . (2.1)

The NPMLE is the solution which maximizes L p( ,...,1 pM) subject to

1 1 M j j p  

. Applying the theorem of Lagrange multipliers, let 1

1 ( ,..., ) log ( 1) M M i i g p p Lp   

and set g p( ,...,1 pM)0 Hence the Lagrange equation system is given by

1 0 ( 1,..., ) and 1 M j j j j d j M p p     

 . ( 2 . 2 )

(11)

5

The above equations can be solved easily and the solution is pˆjdj /n for 1,...,

jM and n.

2.2 Right censored data

For right censored data, observed data become {(Xi, ) (

i i1,..., )}n , where

min( , )

i i i

XC T

iI T( iCi). The survival function can be estimated by the Kaplan-Meier estimator ( , 1) ˆ( ) 1 ( ) i i i u t i i I X u S t I X u              

. The above product-limit expression is elegant since the censoring effect is cancelled out in estimation of the hazard function. However under other types of censoring mechanism, estimation of the hazards does not provide any advantage. Instead, the techniques of self-consistency and nonparametric MLE can be generalized to other data structures. Now we illustrate the self-consistency algorithm for right censored data. The self-consistency equation can be written as

1 ˆ 1 ( ) ˆ( ) ( ) ( , 0) ˆ ( ) n i i i i i S t S t I X t I X t n   S X         

1 1 ( ) (1 ) ( ) n i i i i i I X t I X t w n   

    . (2.3)

Notice that for points with Xit, the full weight 1 is assigned; for points with i

Xt and i 1, zero weight is assigned; and for points with Xit and i 0, partial weight wiS tˆ( ) / (S Xˆ i) is assigned. The estimator can be solved successively and explicitly. Since S Xˆ( ) 1i  for Xit(1) with i 0, we have

(1) ˆ( ) S t

(1) (1) (1)

1 1 ˆ ( ) (1 ) ( ) ( ) n i i i i I X t I X t S t n   

   

(12)

6

which allows us to solve S tˆ( )(1) and then successively for S tˆ( )( )j j2,...,M.

Figure 2.1 Self-consistency for right censored data.

Weight for Xi= 1; weight for Xi = ( ) (S t S Xi); weight for Xi* = 0

Now we discuss the nonparametric MLE approach for right censored data. Define t(1) ... t(M) as distinct observed failure times and {x(j1),...,x(jm j)} as

ordered censored points in the interval [t( )j ,t(j1)). Also let ( ) 1 ( , 1) n j i j i i d I X t   

  and ( ) ( 1) 1 ( [ , ), 0)) n j i j j i i m I X t t   

  . The likelihood function can be written as

( ) ( ) 1 1 Pr( ) ( ) j j d m M j jl j l L T t S x     

. ( 2 . 4 )

Since the survival function S t( ) is an non-increasing function, we can make the likelihood function larger when ( ( ))

l

j

S x is replaced by S t(( )j )¸This suggest that we can instead consider

* ( 1) ( ) ( ) 1 ( ) ( ) ( ) j j d M m j j j j L S t S t S t      

  .

(13)

7

Direct maximization of L in terms of * Pr(t(j1) T t( )j ) or S t(( )j ) is not suggested for right censored data. Alternatively L can be re-parameterized in terms *

of hazards. Specifically we have ( ) ( )

1 ( ) [1 ( )] j j i i S tt  

 and hence 1 ( 1) ( ) ( ) ( ) 1 ( ) ( ) [1 ( )] [ ( )] j j j i j i S t S ttt     

 . It follows that 1 * ( ) ( ) ( ) 1 1 1 { [1 ( )] [ ( )]} {j [1 ( )]} j j j M d m i j i j i i L

t

t

t     

 

( ) ( ) 1 [ ( ) ] [ 1j ( j ) ]j M d n d j i j t t     

 (2.5) where ( ) 1 ( ) n j i j i n I X t

 . Maximization in terms of (t( )j )j, we obtain

ˆ / j dj nj   and accordingly ( ) ( ) ( ) ˆ( ) {1 ˆ( )} {1 } j j j j t t t t j d S t t n    

 

 which is the Kaplan-Meier estimator.

(14)

8

2.3 Double censored data

Recall that observed data can be denoted as: {(Xi, ) (

i i1,..., )}n , where

max{min( , ), }

i ri i li

XC T C and

i  1 if XiCli ;

i 1 if XiTi and 0

i

  if XiCri. The self-consistency equation can be written as

1 1 ˆ( ) ( , 1) n i i i S t I X t n   

  ( , 1) ˆ( ) ˆ( ) ˆ 1 ( ) i i i i S t S X I X t S X        ˆ( ) ( , 0) ˆ ( ) i i i S t I X t S X     . ( 2 . 6 )

Turnbull (1974) suggested the following steps to implement the algorithm. First, partition the observation period: (0,t(1)], …,(t(M1),t(M)], where t(1)  ... t(M) are ordered time points. Then define three types of observations in each interval

(1) ( 1) ( ) 1 ( ( , ], 1) n j i j j i i N I X tt  

   , ( 1) ( 1) ( ) 1 ( ( , ], 1) n j i j j i i NI X tt  

    (0) ( 1) ( ) 1 ( ( , ], 0) n j i j j i i N I X tt  

   .

Turnbull’s idea is to combine “exact” and “left censored” observations by imputation. Specifically for a point in (t(k1),t( )k ] with   1, the conditional probability of falling in (t(j1),t( )j ] (for all kj) is

( ) ( 1) ( 1) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) 1 ( ) j j j j kj k k F t F t S t S t F t S t         .

Thus the imputed value ( 1) ( )

1 ( ( , ]) n i j j i I T t t  

is given by (1) (1) ( 1)ˆ k m j j k kj k j N N N      

.

(15)

9

(1) (0)

(Nj ,Nj )(j1,..., )m which contains “pseudo-exact” and “right censored” points.

Turnbull (1974) suggested to use them to estimate the hazard probability in the interval (t(j1),t( )j ] by N(1)j /Yj where (1) (0) k m j k k k j Y N N  

 denotes the number at risk and N(1)j denotes the number of failure. The product-limit expression gives a direct relationship between the survival function and the hazard probabilities:

(1) ( ) ˆ( )j {1 i / }i i j S t N Y  

 ( 2 . 7 ).

Note that since both N(1)j and Yj are still functions of S tˆ( ),...,(1) S tˆ((M)), iterations are needed which can be summarized by the following steps.

Step 1: Give initial values: Sˆ ( ),...(0) t(1) (0) ( ) ˆ ,S (tM ); Step 2: compute : ˆkj(0) Step 3:obtain : Sˆ ( ),...(1) t(1) (1) ( ) ˆ ,S (tM );

Step 4: repeat (2) and (3) until the pre-specified convergence criteria is reached.

Figure 2.3 Self –consistency for double censored data

(16)

10

We discuss the NPMLE approach for doubly censored data. The following result is summarized form the work of Mykland and Ren (1996). Let the log-likelihood function of complete data be

1 l o g ( ) n c i i l P T t  

 ( 2 . 8 )

When the failure time ( ,...,T1 T is known, the NPMLE is easy to obtain as in n) section 2.1. But under doubly censoring, we cannot observe all the failure time. Hence the EM algorithm is applied. To obtain the iteration, we need to calculate

0 0 ( ) 1 ( | ) [log ( ) | , ] n P i i i i Q P P E P T t X   

where P0 is the initial distribution of T . Here we do not group the observed data as

in Turnbull(1974). Instead, we assume that the distribution of T has mass only at each observations X(1)  X( )n . Then we discuss the following conditions.

(1) 0[log ( ) | ( ), ( ) 1] log ( ( )) P i i i i i E P Tt X    P TX (2) ( ) ( ) 0 (0) ( ) ( ) ( ) (0) [log ( ) | , 0] log ( ) i j X X j P i i i j i E P T t X P P T X

S

     

(3) ( ) () 0 (0) ( ) ( ) (0) ( ) [log ( ) | , 1] log ( )

1

i j X X j P i i i i j E P T t X P P T X

S

      

where (0) i P and (0) i

S is the initial probability and survival at X( )i , respectively. Hence for all distinct points w1w2 wM of the set {X(1)  X( )n},

(0) ( ) 0 ( ) (0) 1 ( | ) ( 1) log ( ) ( 0) log ( ) i j j n i i i i i j i Q P P I P T X I P P T X

S

             

(0) (0) ( ) ( 1) log ( )

1

j i j i i j I P P T X

S

         

(17)

11 ( ) ( ) (0) (0) 1 1 ( 0, ) ( 1) i n n i j i j j i i I X X I P

S

          

( ) ( ) (0) ( ) (0) 1 ( 1, ) log ( ) i n i j j j i i I X X P P T X

S

        

( ) (0) (0) 1 1 ( i 0, ) M n i k k k k k i i I X w P

S

          

( ) (0) (0) 1 ( 1, ) log ( ) i n i k k k k i i I X w P P T w

S

         

where ( ) 1 ( , 1) n k i k i i I X w    

  , ( ) 1 ( ) n k i k i I X w   

 and 0 0 ( ) k k PP Tw . For simply the exception Q P P( | 0) we define

( ) ( ) ( ) (0) (0) (0) (0) 1 1 ( i 0, ) ( i 1, ) n n i k i j k k k k k k i i i i I X w I X X P P

S

S

              

and ( k) k 1,..., P Twq kM

Hence Q P P( | 0) can be re-expressed as

1 log M k k k q  

. Then the M-step is given by 1 m a x l o g M k k k q  

subject to 1 1 M k k q  

(2.9)

Applying the theorem of Lagrange multiplier, the maximization is attained at

ˆ

k k

/

1,...,

q

n k

M

.

The steps of iterations are summarized below:

Step 1: set

q

ˆ

k(0)

1/

M

(

k

1,...,

M

)

; Step 2: set

q

ˆ

k( )i

k

/

n

(

k

1,...,

M

)

;

(18)

12

2.4 Interval censored data

Recall that we observe {( ,L Ri i] (i1,..., )}n and know that Ti( ,L Ri i]. Turnbull (1976) discussed nonparametric estimation of Pr(Tt) based on interval censored data. The first crucial step is to construct ordered disjoint intervals in which the probability can be estimated. To achieve this goal, the data {( ,L Ri i](i1,..., )}n

are re-arranged in an ascending order and the indentify the intervals such that a left-endpoint and a right-endpoint are adjacent to each other. Denote the disjoint intervals as {( , ] (l rj j j1,..., )}m . It can be shown that only these intervals can receive positive mass. Note that an observation ( ,L Ri i] can occupy more than one intervals ( , ]l rj j for some j . Denote the ijI{( , ]l rj j ( ,L Ri i)} which indicates whether ( ,L Ri i] overlaps with ( , ]l rj j . Denote pjF r( )j F l( )j as the mass in

( , ]l rj j .

The self-consistency equation for an estimator of pj satisfies

1 1 1 1 1 1 ... n n ij j j ij i i im m i p p w n p p n         

(j1,..., )m ( 2 . 1 0 )

where w can be viewed as the proportion contributed by ij ( ,L Ri i] in the estimation

of pj and hence 1 1 m ij j w  

.

We discuss the NPMLE for interval censored data and see how it relates to the self-consistency solution. The likelihood based on the original data can be written as

1 ( ) ( ) n i i i L F R F L  

 ( 2 . 1 1 ) which can be re-expressed as

(19)

13 * 1 1 n m ij j j i

L

p

 

where

0

p

j

1

and 1

1

m j j

p

. Notice that the summation in

L

* creates numerical difficulty in the maximization. The idea of EM is employed to solve the problem. If the variable

I

ij

I T

(

i

( , ])

l r

j j can be known, the likelihood based on this “complete” information is given by

* 1 1 ij n m I j i j

L

p

 

 

and the corresponding log-likelihood is

* 1 1

log

log

n m ij j i j

L

I

p

 



.

Maximization of

log L

* subject to

1

1

m j j

p

reduces to the multinomial problem. Applying the theorem of Lagrange multiplier, maximization is attained at

1

ˆ

/

n j ij i

p

I

n

.

However

I

ij is not known and the “E-step” imputes its value by

1 1 1 ( ,..., ) ... ij j ij m i im m p w p p p p       .

Consequently the “M-step” maximizes

1 1 1

( ,...,

) log

n m ij M j i j

w p

p

p

 



. ( 2 . 1 2 )

(20)

14 Maximization is attained at 1

ˆ

(

n j ij i

p

w

p

ˆ ,...,

1

p

ˆ ) /

m

n

which is equivalent to the self-consistency equation. The steps of iterations are summarized below.

Step 1: set

p

ˆ

(0)j

1/

m

(

j

1,..., )

m

; Step 2: set ( ) 1

ˆ

(

n k j ij i

p

w

( 1) 1

ˆ

k

,...,

p

p

ˆ

m(k1)

) /

n

;

Step 3: repeat

k

2,3,...

(Step 2) for until convergence

Figure 2.4 Construction of the mass interval censored data and the idea of self consistency. Weight for( ,L R in estimating1 1] ˆP ,1 ˆP ,2 ˆP3 are 1P1 (1    P1 1 P2 0 P3)

2 1 2 3

1P (1    P 1 P 0 P) and 0 ,respectively.

A different algorithm which directly estimates the survival function was suggested by Turnbull (1976). His idea is based on the product-limit expression of

(21)

15

S(t). The hazard probability in the interval ( , ]l rj j can be estimated by dj/Yj

where k m j k k j Y d   

and 1 n ij j j i k ik k p d p   

 

. The algorithm is given below.

Step 1: set (0) 1 n ij j i k ik

d

 

, (0) (0) k m j k k j Y d   

(

j

1,..., )

m

Step 2: set ( ) ( ) ( )

ˆ

l

{1

jl

}

j l i j j

d

S

Y

;

(22)

16

Chapter 3. Inference based on Proportional Hazards Model

The proportional hazards model specifies the effect of covariate on the hazard of

T such that 0 1 ( ) lim Pr( [ , ) | , ) Z t T t t T t Z         0( ) e x p (t ZT , ) ( 3 . 1 )

where 0( )t is an arbitrary baseline hazard rate and  is the vector of parameter . The survival function can be written as

exp( )

0 0

0 0

( ) exp( t ( ) ) exp( t ( ) exp( T ) ) ( ) Z

Z Z

T

S t  

u du  

u ZduS t

where S tZ( )Pr(Tt Z| ). The main objective is to estimate  and usually estimation of 0( )t is of less interest except for the purpose of prediction.

3.1 Inference under right censoring

Right censored data can be denoted as (Xi, ,i Zi) (i1,..., )n where i i i

X  T C , iI T( iCi), and Zi is the covariate. Due to the expression of the likelihood, 1 1 1 ( ) ( ) ( ) i ( ) i i n n Z i Z i i i i i Z Z x S x L f xS x      

, ( 2 . 2 )

the likelihood function under the PH model can be rewritten as

0 0 0

0 1

) ( ) exp( ) exp( ( ) exp( ) )

( , i i n T T i i x x Z u Z du L

 

      

.

Note that under the semi-parametric setting, the parametric structure of 0( )t is

unspecified.

The partial likelihood approach based on right censored data allows one to estimate  without dealing with 0( )t . Let t(1),...,t(M) be the ordered observed failure times. Let Ai be the set containing the information about who fails at t ( )i

(23)

17

occurs at time t( )i . The likelihood function can be expressed in terms of the sets

{(A Bj, j)(j1,...,M)} such that 1 1 2 2 Pr(B A B A ... BM AM)

L

       ( ) ( 1 ) ( 1 ) ( 1 ) 1 P r ( | ) P r ( | ) M j j j j j j j A B AB BA   

  ( ) ( 1 ) ( 1 ) ( 1 ) 1 1 Pr( | ) Pr( | ) M M j j j j j j j j A B AB BA    

where A( )j (A1 ... Aj) and B( )j (B1 ... Bj). Notice that

( ) ( 1) Pr(A Bj | j ,A j ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) j k j j j j j k R t z z t dt t dt    

( ) 0 ( ) ( ) ( ) 0 ( ) ( ) ( ) ( ) exp( ) ( ) exp( ) j T j j j T j k j k R t t Z dt t Z dt      

( ) ( ) ( ) exp( ) exp( ) j T j T k k R t z z    

in which 0( )t disappears in the last equation. It can be shown that ignoring the

second component of

L

still leads to an unbiased estimating equation. Specifically the partial likelihood

( ) ( 1 ) 1 ( ) P r ( | ) M j j c j i LA B A   

 ( ) 1 (( )) exp( ) exp( ) T M j T j k k Rt j Z Z     

 

( 3 . 3 )

which yields the score function:

( ) ( ) 1 ( ) ( ) ( ) exp( ) ( ) log ( ) exp( ) T k k M k R c j T j k k R j j t t Z Z U L Z Z            

.

The estimator of  is the one which solves U( ) 0. Breslow (1975) suggested to estimate 0 0 0 ( ) ( ) t tu du  

by

(24)

18 1 0 0 1 ( , 1 ) ( ) ˆ ( ) exp( ) n t i i i n T i i i I X u t I X u Z         

. ( 3 . 4 ) 3.2 Inference under interval censoring

Finkelstein (1986) was the first author to address the inference of proportional hazards models for interval censored data. Observed data can be written as ( ,L R Z i i, i) (i1,..., )n . It is assumed that

T

i and ( ,L R are independent given i i)

i

Z

. The likelihood function can be written as

 

0

 

0

 

1 1 [ ( )] { }iT { }iT i i n n Z Z Z i Z i i i i i L S L S R S LS R      

 

( 3 . 5 )

where S tZ( )Pr(Tt Z| ). Unlike the case of right censoring, there is no way to remove S0(.) from (3.5). This means that we need to estimate  and S0(.) jointly. The likelihood function in (3.5) is represented based on original observations. For the purpose of maximization when the nuisance function S0(.) is involved, this function will be expressed in terms of grid points. Figure 3.1 shows the construction of grid points based on an example containing four observed intervals ( ,L R i i] (i1, 2,3, 4). In this example, t(1)t(2)  t(8) are formed. In general, we let

(0) (1) ( 1)

0tt  tM   be ordered grid points. The i th observation to the likelihood (3.5) can be re-expressed as

1 ( 1) ( ) 1 [ ( ) ( )] i i M ij Z j Z j j S t S t     

1 exp( ) exp( ) 0 ( 1) 0 ( ) 1 [ ( ) iT ( ) iT ] M Z Z ij j j j S tS t      

where ij 1 if (t(j1),t( )j ]( ,L Ri i] and 0 otherwise. Accordingly the likelihood in (3.5) can be written as 1 exp( ) exp( ) 0 ( 1) 0 ( ) 1 1 [ ( ) iT ( ) iT ] n M Z Z ij j j j i LS tS t      

 .

(25)

19

Figure 3.1 Construction of the mass interval censored data

Note that S0(.) is a function but the maximization of L will only be evaluated on its value at the grid points, namely S t0(( )j ) for j1,...,M . Since the parameters

0(( )j )

S t for j1,...,M have natural constraint i.e.,0S t0((1)) S t0((M)) 1 , it

make the mle hard to obtain. Finkelstein (1986) suggested that one can re-parametrize the S t0(( )j ) by k log[ log S t0(( )k )]. It can remove the range restriction of the parameters S t0(( )j ), but it doesn’t remove the order restriction.

Sun (2006) proposed a corrective method that can remove the nature constraint. The following is the detail of the method. Based on the product-limit expression, one can write 0(( )j ) S t 0 ( ) 0 (1 ( )) j k k t  

   .

Thus L becomes a function of  and  0( t(1)),..., 0( t(M)) . Note that

0( t( )k ) (0,1)

   is a conditional probability. To remove the natural constraint, we can re-parameterize  0( t( )k ) by setting

exp( ) 0( ( )) 1 k k t e      .

(26)

20

Figure 3.2 shows that by the above transformation,  k  . Thus

0 exp( ) 0 ( ) 0 exp( ) ( ) e j k k k j j k S t e      

  .

Figure 3.2 The relationship between Λ (Δ0 t( )k )andα k

Finally, let 0 exp( ) j j k k     

,

for j1,...,M and 0  and M1 . We have obtained

1 1 1 1 { e x p [ e x p ( ) ] e x p [ e x p ( ) ] } n M T T k ij i j i j j i LZ   Z       

   ( 3 . 6 )

Thus the log of likelihood becomes

1

1

1 1

log {exp[exp( ) ] exp[exp( ) ]} n M T T ij i j i j i j lZ   Z       

 

   .

Note that the unknown parameters are ( , 1,...,M), all of which have no additional constraints. The estimation process is summarized below.

Denote the score statistics as ( , )T T T l l U     

  . The score functions of  and  are given by

(27)

21 1 1 , 1 , 1 1 ( ) n M i i ij i j i j i j l Z gf f          

and 1 , , 1 n i i j i j i j l g b c      

( j1,...,M ) where 1 ( 1) ( ) 1 [ ( ) ( )] i i M i ij Z j Z j j gS t S t    

 , , (( )) log (( )) i i i j Z j Z j fS t S t , fi,0fi M,10, , exp( ) T i j j i b   Z  , 1 , ( 1) i(( )) M i j ik ik Z k k j c   S t   

 and i m2 0. The objective is to

solve U0 so that the likelihood can be maximized. The Newton-Raphson method can be applied to obtain the solution. Let 11 12

21 22

[A A ]

A

A A

 as the observed Fisher information matrix. The elements of A11 include

2 , , , , , , , 2 1 ( ) n i j i k i j i k i j i k i j i j k i i b b c c b b c l g g        

for jk. The elements of 12 21T AA include 2 1 1 , 1 , , 1 , 2 , 1 , 1 1 { [ ( ) ] ( )} n M M i j i i j i i j il il i l il i l i l i l j l j i c l Z b g c f f f g                

.

The elements of A include 22

2 1 1 1 1 , 1 , , 1 , 1 1 1 2 ( ) ( ) n M M T i i i ij i j i j i ij i j i j T j j i l Z Z gf f gh h                 

where , , log i(( )) , i j i j Z j i j hf S tf .

Denote Xn as the estimate of ( , )  in the n -iteration, the Newton-Raphson algorithm can be summarized as follow:

1 1

K K K K

X XA U (K 0,1,...),

where AK and UK are A and U evaluated at the K th estimate of ( , )  . The procedure is stopped when XK1XK is close to zero.

(28)

22

Chapter 4. Inference based on Accelerated Failure Time Model

The accelerated failure time (AFT) is a popular alternative to the PH model. An AFT model can be written as

YZT  ( 4 . 1 )

where Y logT and  is the error variable with the density function f(.). Existing research focuses on semi-parametric estimation of  without specifying the form of f(.). We will briefly review inference methods developed for complete data and right censored data and then focus more on interval censored data.

4.1 Inference based on complete data

Consider the transformed variable under the error scale,  i( ) Yi ZiT for 1,...,

in. Note that when 0 is the true value, { ( i 0) (i1,..., )}n form an iid sample which becomes a useful property to construct nonparametric inference methods. Let  (1)( )(2)( )  ( )n ( ) be the order statistics with the corresponding covariates denoted as Z(1),Z(2),...,Z( )n . Define c as the score of i Z ( )i satisfying 1 0 n i i c  

. Consider the linear rank statistic of the form: ( ) 1 ( ) n i i i vZ c  

. ( 4 . 2 )

Note that when   0, Pr(Z( )iZk)1/n. As a result,

0 ( ) 1 1 ( ( )) ( ) 0 n n i i i i i E vc E Z c Z   

 .

This implies that one can estimate  by solving the equation v( ) 0. The error distribution and the form of c both affect the efficiency of the resulting estimator. i

The Wilcoxon statistic corresponds to 1

2 ( 1) 1

i

ci n   and the log-rank statistic corresponds to 1 1 1 i j j n   

(29)

23

4.2 Inference based on right censored data

Right censored data are denoted as {(Xi, ,i Zi) (i1,2,..., )}n . Let

( ) T

i Yi Zi

    , where Yi log(Xi). Censored data under the error scale can be written as {( ( ), ) (  i i i1,..., )}n . Note that {( (  i 0), ) (i i1,..., )}n form a random sample of { (  0), } which is a censored version of  ( 0). We discuss two methods of modifying the linear rank statistics introduced earlier.

4.2.1 Linear rank statistics

Let  (1)( )(2)( )  ( )k ( ) be the distinct uncensored ordered error variables and let 1( ),..., ( )

i

i im

    be censored error variables in   ( )i ( ),ˆ ( 1)i ( )ˆ

. Following the book of Kalbfleisch and Prentice (2002), the modified linear rank statistics can be written as

( ) 1 1 ( ) ( ) i m k i i i ij i j vc Z C Z   

( 4 . 3 )

where Z( )i and Zij represent the corresponding covariates of  ( )i ( ) and  ij( )

respectively. The requirement is given below:

1 ( ) 0 k i i i i c m C   

.

The efficiency of the resulting estimator depends on the underlying error distribution, the censoring distribution and the forms of c and i Ci. For the Wilcoxon statistics, Prentice (1978) suggested that

1 1 2 1 i j i j j n c n    

and 1 1 1 i j i j j n C n    

,

(30)

24

4.2.2 Log-rank statistics

As mentioned earlier the log-rank statistics can be expressed as a special linear rank statistics but it has its own representation as follows:

1 1 ( ( ) ( ) ) ( ) ( ( ) ( )) n j i j j LR i i n i j i j I Z U Z I                          

. ( 4 . 4 ) 4.2.3 M-estimator by Ritov (1990)

The idea was motivated by the parametric likelihood analysis. The likelihood function and log-likelihood based on ( ( ), ) (  i i i1,..., )n can be written as 1 1 ( , ) ( ( i) ) ( (i ) ) n i i i Lf f    S     

( 4 . 5 ) and 1 ( , ) log ( ( )) (1 ) log ( ( )) n i i i i i lff    S    

 

respectively. Taking derivative respect to , we get

1 ( , ) ( ( )) ( ( )) [ ( ) (1 ) ( )] ( ( )) ( ( ) n i i i i i i i i i l f f f Z Z f S                       

.

The difficulty comes from the fact that f , f and S are all unknown functions. The idea of M-estimator is to replace ( )

( ) f f     

 by g( ) which is a known function.

Accordingly l( , f)    can be replaced by ( ) 1 ( ) ( ) ( , ) [ ( ( )) (1 ) ] ( ( )) i n M i i i i i i g u dS u U S Z g S                

 

( 4 . 6 )

where S( ) can be estimated by the following product-limit estimator

1 1 ( ( ) , 1) ˆ ( ; ) 1 ( ( ) ) n j j j n u t j j I u S t I u                        

.

(31)

25

the resulting estimator. Two common choices of g( ) are g u( )u or

( ) | | . k if u k g u u if u k k if u k        

Note that if the error distribution is specified, we have

( ) t f  

fx dx ( ) ( ) ( ) t f x f x dx f x       

( ) ( ) t g x dF x   

which suggests the best form of g(.) is (.)

(.) f f   

since it yields the maximum

likelihood estimator.

4.3 Inference under interval censoring

Consider interval censored data ( ,L R Zi i, i) (i1,..., )n which can be transformed under the error scale such that  iL( )logLiZiT and

( ) log

R T

i Ri Zi

    . Now we discuss how to extend the ideas of previous methods to interval censored data.

4.3.1 Modify M-estimator by Rabinowitz et al. (1995)

Consider the log-likelihood function

1 ln[ { ( )} { ( )}] n R L i i i L F   F    

 . The MLE of  can be obtained by solving the score function 1 { ( ) } { ( ) } ( ) 0 { ( ) } { ( ) } R L n i i i R L i i i f f S Z F F                  

( 4 . 7 )

and it follows that E S( (0))0. Since the forms of f and F are un-specified, one can replace the middle part of S( ) by

[ { ( )}] [ { ( )}] ( , ) { ( )} { ( )} R L i i i R L i i g F g F c F F F                 

where the function g with domain [0,1] satisfies g(0)g(1)0. Thus one can estimate  by solving

(32)

26 1 ( , ) ( , ) 0 n i i i SFcF Z  

 .

In Appendix, we show why the restrictions on g makes E S[ (0,F)]0. Again the

form of g(.) and the underlying error distribution affect the efficiency of the resulting estimator. The best choice which leads to the maximum likelihood estimator is gf F1.

Recall that in Section 4.2.3, the nuisance function S has an explicit product-limit estimator and hence UM( , Sˆ(. | )) 0 is solved. However for interval censored data, there is no explicit formula for estimating S or F. One way of estimating F is by maxing the likelihood based on

{(   iL( ), iR( ))(i1,..., )}n . The grid intervals in which F receives masses can be constructed similar to the setup in Section 2.4. Denote tk( ) (tkL( ), tkU( )] for

1,...,

km as the mass intervals. The next step is to express the likelihood based on

{(   iL( ), iR( ))(i1,..., )}n in terms of {(tkL( ), tkU( )], k1,..., }m . Define

, ( ) max{ , ( ) : , ( ) ( )}

L

i L j L j L i

t   tt    and ti U, ( ) min{tjR( ) : tjR( )  iR( )} . That is (ti L, ( ), ti R, ( )] becomes a new representation of (   iL( ), iR( )] which will be a union of consecutive tk( ) . The corresponding log-likelihood function becomes , , 1 ln[ { ( )} { ( )}] n i U i L i l F tF t   

 subject to , , {i U( )} {i L( )} [0,1] ( 1,2,..., ) F t bF t bin and { ( )} [0,1] (j 1,2,..., ) F t   jm .

(33)

27

Given an estimate of , the above maximization procedure can be performed. The resulting estimate of F, denoted as ˆF, will be plugged into the following

estimating function to solve for the next estimate of : S( ,Fˆ ) 1 ˆ ( , ) 0 n i i i cF Z  

. ( 4 . 8 )

4.3.2 Method for a simplified AFT model with univariate covariate

Besides the methods we discussed earlier, Li and Pu (2003) developed an interesting way to estimate . Consider a simplified AFT model:

i i i i

Y Z  , (i1,...n).

where Zi is a one-dimensional covariate. The main idea of this paper is based on the assumption that i and Zi are uncorrelated. Kendall’s provides a rank-invariant measure for assessing the association between two variables. Suppose that ( ,i Zi) and ( ,j Zj) are independent realizations from ( , ) Z . The pair is concordant if

{( i j)( i j) 0}

I   ZZ and discordant if I{( ij)(ZiZj)0}. The population version of Kendall’s tau is defined as

Pr{( i j)(Zi Zj) 0} Pr{( i j)(Zi Zj) 0}

           .

The sample estimate of  is

{( )( ) 0} {( )( ) 0} / 2 2 i j i j i j i j i j I Z Z I Z Z n                    

.

If complete data are available, one can solve 1 {( ( ) ( ))( ) 0} {( ( ) ( ))( ) 0} 0 ( 1)i j i j i j i j i j I Z Z I Z Z n n                  

to estimate  . However  i( ) YiiZi is subject to interval censoring such that we only know that  i( )(   iL( ), iR( )]. Some interval observations provide the

(34)

28

complete information for the order of  i( ) and  j( ). Notice that if

( ) ( )

R L

i j

    , then  ij.

Figure 4.1 The relative position of ε and i ε when j εiR( )β < ε βjL( )

Hence the total number of known concordant pairs becomes

{ ( ) ( R( ) L( )) ( ) ( L( ) R( ))} i j i j i j i j i j I Z Z I     I Z Z I          

and the total number of known discordant pairs is

{ ( i j) ( iL( ) jR( )) ( i j) ( iR( ) jL( ))} i j I Z Z I     I Z Z I          

.

The modified Kendall τ coefficient can be write as 1 [ ( ) ( )][ ( ( ) ( )) ( ( ) ( ))] ( 1) R L L R i j i j i j i j i j I Z Z I Z Z I I n n                

.

The resulting estimation function is given by

( ) [ ( ) ( )][ ( R( ) L( )) ( L( ) R( ))] n i j i j i j i j i j KI Z Z I Z Z I     I      

     

This method has two major drawbacks. One is the restrictive assumption that Z is

univariate. The other is the lack of efficiency if the data contains very few orderable paired intervals.

數據

Figure 2.1 Self-consistency for right censored data.
Figure 2.2:Idea for modifying the likelihood  L
Figure 2.3 Self –consistency for double censored data
Figure 2.4 Construction of the mass interval censored data and the idea of self  consistency
+6

參考文獻

相關文件

• Extension risk is due to the slowdown of prepayments when interest rates climb, making the investor earn the security’s lower coupon rate rather than the market’s higher rate.

• A delta-gamma hedge is a delta hedge that maintains zero portfolio gamma; it is gamma neutral.. • To meet this extra condition, one more security needs to be

For obvious reasons, the model we leverage here is the best model we have for first posts spam detection, that is, SVM with RBF kernel trained with dimension-reduced

• Extension risk is due to the slowdown of prepayments when interest rates climb, making the investor earn the security’s lower coupon rate rather than the market’s... Prepayment

• As all the principal cash flows go to the PAC bond in the early years, the principal payments on the support bond are deferred and the support bond extends... PAC

FIGURE 5. Item fit p-values based on equivalence classes when the 2LC model is fit to mixed-number data... Item fit plots when the 2LC model is fitted to the mixed-number

O.K., let’s study chiral phase transition. Quark

We showed that the BCDM is a unifying model in that conceptual instances could be mapped into instances of five existing bitemporal representational data models: a first normal