We now illustrate two forms of $\tilde F(t|Z)$ that correspond to the most popular regression models in survival analysis. We also propose model-checking procedures to verify the model assumption. To simplify the presentation, we focus on the two-sample case with $Z = 0, 1$.
Assume the survival function $\tilde S(t|Z)$ follows the form of a PH model such that
$$\tilde S(t|Z=1) = \tilde S(t|Z=0)^{k}, \tag{15}$$
where $k$ is a pre-determined constant. Notice that
$$S(t|Z) = \exp\{-\theta(z)\tilde F(t|Z)\} = \exp\{\theta(z)[\tilde S(t|Z) - 1]\}. \tag{16}$$
Thus we have
$$\frac{\log S(t|Z)}{\theta(z)} + 1 = \tilde S(t|Z). \tag{17}$$
It follows that
$$\frac{\log S(t|Z=1)}{\theta(1)} + 1 = \tilde S(t|Z=1) = \{\tilde S(t|Z=0)\}^{k} = \left\{\frac{\log S(t|Z=0)}{\theta(0)} + 1\right\}^{k}.$$
Accordingly,
$$\log\left\{\frac{\log S(t|Z=1)}{\theta(1)} + 1\right\} = k \log\left\{\frac{\log S(t|Z=0)}{\theta(0)} + 1\right\}. \tag{18}$$
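As a quick numerical sanity check of equation (18), the following sketch builds two improper survival functions from equation (16) under the PH relation (15) and confirms that the transformed values lie on a line through the origin with slope k. The unit-exponential baseline, the value k = 2, the cure rates 0.3 and 0.5 (so that θ(z) is minus the log of the cure rate), and the function names are illustrative assumptions, not quantities taken from the thesis.

```python
import numpy as np

def improper_survival(s_tilde, theta):
    """S(t|Z) = exp{theta * (S_tilde(t|Z) - 1)}, as in equation (16)."""
    return np.exp(theta * (s_tilde - 1.0))

def ph_transform(surv, theta):
    """log{ log S(t|Z) / theta + 1 }, the quantity on each side of (18)."""
    return np.log(np.log(surv) / theta + 1.0)

# Assumed illustrative setting: unit-exponential baseline, k = 2,
# cure rates 0.3 (Z = 0) and 0.5 (Z = 1).
k = 2.0
theta0, theta1 = -np.log(0.3), -np.log(0.5)
t = np.linspace(0.1, 3.0, 30)
s_tilde0 = np.exp(-t)        # S_tilde(t | Z = 0)
s_tilde1 = s_tilde0 ** k     # PH relation (15)

x = ph_transform(improper_survival(s_tilde1, theta1), theta1)
y = ph_transform(improper_survival(s_tilde0, theta0), theta0)
slope = x / y                # constant and equal to k when (15) holds
```

With estimated quantities in place of the true ones, the ratio is no longer exactly constant, which is why the relation is examined graphically.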
Notice that equation (18) now involves only estimable quantities. The function $S(t|z)$ can be estimated by the Kaplan-Meier estimator
$$\hat S(t|Z=z) = \prod_{0<u\le t}\left\{1 - \frac{\sum_{i=1}^{n} I(X_i = u, \delta_i = 1, Z_i = z)}{\sum_{i=1}^{n} I(X_i \ge u, Z_i = z)}\right\}.$$
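The product-limit formula can be sketched directly in NumPy. This is a minimal illustration rather than the code used for the figures; the function names and the toy data are assumptions made for the example.

```python
import numpy as np

def kaplan_meier(x, delta):
    """Return the distinct failure times and the K-M estimate at each of them.

    x     : observed times (failure or censoring)
    delta : 1 if the time is an observed failure, 0 if censored
    """
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=int)
    times = np.unique(x[delta == 1])      # ordered distinct failure times
    deaths = np.array([np.sum((x == u) & (delta == 1)) for u in times])
    at_risk = np.array([np.sum(x >= u) for u in times])
    surv = np.cumprod(1.0 - deaths / at_risk)
    return times, surv

def kaplan_meier_by_group(x, delta, z, group):
    """K-M estimate of S(t | Z = group), using only subjects with Z == group."""
    keep = np.asarray(z) == group
    return kaplan_meier(np.asarray(x)[keep], np.asarray(delta)[keep])

# Toy data (hypothetical): five subjects, two of them censored.
times, surv = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

In the toy data the risk sets at the failure times 1, 2, 3 have sizes 5, 4, 2, so the successive factors are 4/5, 3/4, and 1/2.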
Let $\hat\theta(z)$ denote the nonparametric estimate of $\theta(z)$. Define
$$X_i = \log\left\{\frac{\log \hat S(t_i|z=1)}{\hat\theta(1)} + 1\right\}$$
and
$$Y_i = \log\left\{\frac{\log \hat S(t_i|z=0)}{\hat\theta(0)} + 1\right\},$$
where $t_1 < t_2 < \cdots < t_D$ are the ordered failure points. If the PH assumption holds, equation (18) indicates that the points $(X_i, Y_i)$ approximately follow the line $X = kY$, a linear relationship passing through the origin. We present plots of $\hat S(t|Z=z)$ for $z = 0, 1$ and the corresponding diagnostic plots of $X_i$ versus $Y_i$ for $i = 1, 2, \ldots, D$, based on 1000 simulated observations.
The diagnostic plots in Figure 5-2 reveal a clear linear pattern for most points. Figure 5-3 presents the plots when the PH assumption is violated; there the relationship between $X_i$ and $Y_i$ is visibly curved.
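The construction of the PH diagnostic points can be sketched as follows. The sketch takes the jump times and values of the two estimated survival step functions, together with the estimates of θ(1) and θ(0), as given inputs. Evaluating both curves on the pooled set of failure times and dropping points where the outer logarithm is undefined are assumptions of this illustration, as are all function names.

```python
import numpy as np

def step_eval(times, surv, t):
    """Evaluate a right-continuous survival step function at the points t.

    The value is 1 before the first jump time."""
    idx = np.searchsorted(times, t, side="right")   # number of jumps <= t
    padded = np.concatenate(([1.0], surv))
    return padded[idx]

def ph_diagnostic_points(times1, surv1, theta1, times0, surv0, theta0):
    """Points (X_i, Y_i) evaluated at the pooled failure times."""
    grid = np.union1d(times1, times0)
    a1 = np.log(step_eval(times1, surv1, grid)) / theta1 + 1.0
    a0 = np.log(step_eval(times0, surv0, grid)) / theta0 + 1.0
    ok = (a1 > 0) & (a0 > 0)     # the outer log needs a positive argument
    return np.log(a1[ok]), np.log(a0[ok])

# Check with exact (not estimated) curves under an assumed PH setting:
# unit-exponential baseline, k = 2, cure rates 0.3 and 0.5.
k = 2.0
theta0, theta1 = -np.log(0.3), -np.log(0.5)
t = np.linspace(0.1, 2.0, 20)
surv0 = np.exp(theta0 * (np.exp(-t) - 1.0))
surv1 = np.exp(theta1 * (np.exp(-k * t) - 1.0))
x, y = ph_diagnostic_points(t, surv1, theta1, t, surv0, theta0)
```

With exact curves the points satisfy x = k y; with estimated curves, scatter around that line is expected, most noticeably in the tail where few subjects remain at risk.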
[Figure 5-2 panels. Each case shows the K-M curves (Survival versus Time) and the PH model diagnostic plot (y versus x). (a) Case I: Gamma(1, 1), cure rate = 0.3, censoring rate = 0.4, and Gamma(1, 2), cure rate = 0.5, censoring rate = 0.58. (b) Case II: Weibull(1, 3), cure rate = 0.3, censoring rate = 0.35, and Weibull(1, 2), cure rate = 0.5, censoring rate = 0.53.]
Figure 5-2: K-M curves and diagnostic plots when the PH assumption holds.
[Figure 5-3 panels. Each case shows the K-M curves (Survival versus Time) and the PH model diagnostic plot (y versus x). (a) Case I: Log-normal(0.3, 5), cure rate = 0.3, censoring rate = 0.36, and Log-normal(1, 0.1), cure rate = 0.4, censoring rate = 0.53. (b) Case II: Log-normal(1, 1), cure rate = 0.3, censoring rate = 0.62, and Log-normal(2.5, 0.25), cure rate = 0.4, censoring rate = 0.79.]
Figure 5-3: K-M curves and diagnostic plots when the PH assumption is violated.
5.3 Short Term Effect: Accelerated Failure Time Model
Under the AFT model, the survival function $\tilde S(t|z)$ for $z = 0, 1$ satisfies
$$\tilde S(t|Z=1) = \tilde S(kt|Z=0)$$
for a prespecified constant $k$. It follows that
$$S(t|z=1) = \exp\{\theta(1)[\tilde S(t|z=1) - 1]\} = \exp\{\theta(1)[\tilde S(kt|z=0) - 1]\}.$$
Accordingly we have
$$\frac{\log S(t|Z=1)}{\theta(1)} + 1 = \tilde S(t|Z=1) = \tilde S(kt|Z=0) = \frac{\log S(kt|Z=0)}{\theta(0)} + 1. \tag{19}$$
To turn equation (19) into a graphical check, we match quantiles of the two groups. Let $p_1 < \cdots < p_M$ be constants in $(0, 1)$. Because $\log S(t|Z)/\theta(z) = -\tilde F(t|Z)$ takes values in $(-1, 0)$, we solve for $x_i$ satisfying
$$\frac{\log S(x_i|Z=1)}{\theta(1)} = -p_i, \quad i = 1, 2, \ldots, M.$$
Thus, for each $p_i$ we have
$$X_i = \hat S_{Z=1}^{-1}\left\{\exp\left(-p_i\hat\theta(1)\right)\right\}.$$
The same steps applied to the right-hand side of equation (19) give
$$Y_i = \hat S_{Z=0}^{-1}\left\{\exp\left(-p_i\hat\theta(0)\right)\right\}$$
for $i = 1, 2, \ldots, M$. When the AFT model holds, $Y_i \approx kX_i$, so the points $(X_i, Y_i)$, $i = 1, 2, \ldots, M$, follow a straight line through the origin. Plots under the AFT model are presented in Figure 5-4, in which $n = 1000$.
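The quantile-matching step can be sketched as below. The sketch reads each level p_i as a value of the proper distribution function, so the target level of the improper survival function in group z is exp(−p_i θ(z)), which lies between the cure rate and 1; this sign convention, the generalized inverse used for a step function, and the function names are assumptions of the illustration.

```python
import numpy as np

def survival_quantile(times, surv, level):
    """Generalized inverse: smallest jump time t with S_hat(t) <= level.

    Returns nan if the step function never falls to the level."""
    hit = np.nonzero(surv <= level)[0]
    return times[hit[0]] if hit.size else np.nan

def aft_diagnostic_points(times1, surv1, theta1, times0, surv0, theta0, p):
    """Matched quantiles (X_i, Y_i) for the levels p_1 < ... < p_M."""
    x = np.array([survival_quantile(times1, surv1, np.exp(-pi * theta1)) for pi in p])
    y = np.array([survival_quantile(times0, surv0, np.exp(-pi * theta0)) for pi in p])
    return x, y

# Check with exact curves on a fine grid under an assumed AFT setting:
# unit-exponential baseline for Z = 0, time scale k = 2, cure rates 0.3 and 0.5.
k = 2.0
theta0, theta1 = -np.log(0.3), -np.log(0.5)
grid = np.linspace(0.001, 10.0, 10000)
surv0 = np.exp(theta0 * (np.exp(-grid) - 1.0))
surv1 = np.exp(theta1 * (np.exp(-k * grid) - 1.0))
p = np.array([0.2, 0.4, 0.6, 0.8])
x, y = aft_diagnostic_points(grid, surv1, theta1, grid, surv0, theta0, p)
```

Up to the grid resolution, the matched quantiles satisfy y = k x, the straight line through the origin that the diagnostic plot looks for.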
The diagnostic plots in Figure 5-4 reveal a clear linear pattern for most points. Figure 5-5 presents the plots when the AFT assumption is violated; there the relationship between $X_i$ and $Y_i$ is visibly curved.
[Figure 5-4 panels. K-M curves (Survival versus Time) with the corresponding diagnostic plots. First case: Gamma(1, 3), cure rate = 0.3, censoring rate = 0.38, and Gamma(1, 3), cure rate = 0.5, censoring rate = 0.57. Second case: Gamma(2, 3), cure rate = 0.3, censoring rate = 0.41, and Weibull(2, 5), cure rate = 0.5, censoring rate = 0.64.]
Figure 5-4: K-M curves and diagnostic plots when the AFT assumption holds.
[Figure 5-5 panels. K-M curves (Survival versus Time) with the corresponding diagnostic plots. First case: Weibull(0.5, 1), cure rate = 0.4, censoring rate = 0.47, and Log-normal(1, 5), cure rate = 0.3, censoring rate = 0.62. Second case: Gamma(5, 2), cure rate = 0.4, censoring rate = 0.47, and Log-normal(1, 5), cure rate = 0.3, censoring rate = 0.48.]
Figure 5-5: K-M curves and diagnostic plots when the AFT assumption is violated.
Chapter 6 Conclusion
In this thesis, we study the non-mixture approach for analyzing survival data in the presence of cure. This formulation has an interesting biological interpretation. In the parametric analysis, we find that outliers in T, which often occur when N = 1, affect estimation of the cure rate. For nonparametric inference, we propose two algorithms to solve the score functions of the nonparametric MLE: one is the classical Lagrange multiplier method and the other uses a change of variables. Two regression models are considered under the simplified two-sample setting: the proportional hazards model and the accelerated failure time model. We propose diagnostic plots that can verify the form of the regression effect.
Appendix: Additional Figures
[Figure A-1 panels: survival function S_{T|Z}(t), density function f_{T|Z}(t), and hazard function h_{T|Z}(t), each plotted against t, for Gamma(1, 1), Gamma(4, 1.5), Gamma(6, 2); Weibull(1, 3), Weibull(2, 10), Weibull(4, 15); and Log-normal(1, 1), Log-normal(2, 0.4), Log-normal(2.5, 0.25).]
Figure A-1: Survival, density, and hazard functions for selected parametric distributions.
[Figure A-2 panels: S(t) versus Time for the True, MLE, and NPMLE curves, with p = 0.3 and censoring rates 0.33, 0.45, and 0.49.]
Figure A-2: Estimated survival functions when $\tilde F$ is correctly specified as Gamma(1,1)
[Figure A-3 panels: S(t) versus Time for the True, MLE, and NPMLE curves, with p = 0.3 and censoring rates 0.34, 0.48, and 0.64.]
Figure A-3: Estimated survival functions when $\tilde F$ is correctly specified as Gamma(4,1.5)
[Figure A-4 panels: S(t) versus Time for the True, MLE, and NPMLE curves, with p = 0.3 and censoring rates 0.3, 0.42, and 0.51.]
Figure A-4: Estimated survival functions when $\tilde F$ is correctly specified as Gamma(6,2)
[Figure A-5 panels: S(t) versus Time for the True, MLE, and NPMLE curves, with p = 0.3 and censoring rates 0.33, 0.38, and 0.53.]
Figure A-5: Estimated survival functions when $\tilde F$ is correctly specified as Weibull(1,3)
[Figure A-6 panels: S(t) versus Time for the True, MLE, and NPMLE curves, with p = 0.3 and censoring rates 0.36, 0.47, and 0.6.]
Figure A-6: Estimated survival functions when $\tilde F$ is correctly specified as Weibull(2,10)
[Figure A-7 panels: S(t) versus Time for the True, MLE, and NPMLE curves, with p = 0.3 and censoring rates 0.33, 0.47, and 0.63.]
Figure A-7: Estimated survival functions when $\tilde F$ is correctly specified as Weibull(4,15)
[Figure A-8 panels: S(t) versus Time for the True, MLE, and NPMLE curves, with p = 0.3 and censoring rates 0.34, 0.43, and 0.51.]
Figure A-8: Estimated survival functions when $\tilde F$ is correctly specified as log-normal(1,1)
[Figure A-9 panels: S(t) versus Time for the True, MLE, and NPMLE curves, with p = 0.3 and censoring rates 0.3, 0.38, and 0.5.]
Figure A-9: Estimated survival functions when $\tilde F$ is correctly specified as log-normal(2,0.4)
[Figure A-10 panels: S(t) versus Time for the True, MLE, and NPMLE curves, with p = 0.3 and censoring rates 0.39, 0.49, and 0.57.]
Figure A-10: Estimated survival functions when $\tilde F$ is correctly specified as log-normal(2.5,0.25)
Appendix: Simulation Results
Table A-1: Relationship between C ∼ Unif[0, k] and Pr(δ = 0) for Gamma distributions
                 k for each distribution
p     Pr(δ=0)    Gamma(1,1)    Gamma(4,1.5)    Gamma(6,2)
0.3   0.3        ×             ×               ×
0.3   0.4        4             11              13
0.3   0.5        2             7               7
0.5   0.5        ×             ×               ×
0.5   0.6        3             8               10
0.5   0.7        2             5               6
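One way to read the link between k and Pr(δ = 0) is through the identity Pr(δ = 0) = (1/k) ∫₀ᵏ S(c) dc, which holds when the censoring time C ∼ Unif[0, k] is independent of T and S(t) = exp{−θ F̃(t)} is the improper survival function. The sketch below evaluates this integral numerically. The independence assumption, the function names, and the use of the unit exponential for Gamma(1,1) are assumptions of this illustration; the thesis may have obtained the k values differently, for example by simulation.

```python
import numpy as np

def censoring_probability(k, theta, f_tilde, n_grid=200001):
    """Pr(delta = 0) = (1/k) * integral_0^k S(c) dc for C ~ Unif[0, k],

    where S(c) = exp{-theta * F_tilde(c)} is the improper survival function
    and C is assumed independent of T."""
    c = np.linspace(0.0, k, n_grid)
    s = np.exp(-theta * f_tilde(c))
    # trapezoidal rule on an equally spaced grid
    integral = np.sum((s[1:] + s[:-1]) / 2.0) * (c[1] - c[0])
    return integral / k

# Gamma(1,1) is the unit exponential, so F_tilde(c) = 1 - exp(-c).
theta = -np.log(0.3)                       # cure rate p = exp(-theta) = 0.3

def unit_exponential_cdf(c):
    return 1.0 - np.exp(-c)

pr_k4 = censoring_probability(4.0, theta, unit_exponential_cdf)
pr_k2 = censoring_probability(2.0, theta, unit_exponential_cdf)
```

Under these assumptions the two values come out slightly above 0.4 and 0.5, in line with integer k values chosen to hit the tabulated targets approximately. As k grows the probability decreases toward the cure rate p, which is consistent with the × entries where Pr(δ = 0) = p.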
Table A-2: Relationship between C ∼ Unif[0, k] and Pr(δ = 0) for Weibull distributions
                 k for each distribution
p     Pr(δ=0)    Weibull(1,3)    Weibull(2,10)    Weibull(4,15)
0.3   0.3        ×               ×                ×
0.3   0.4        10              33               58
0.3   0.5        6               22               36
0.5   0.5        ×               ×                ×
0.5   0.6        8               18               45
0.5   0.7        4               17               26
Table A-3: Relationship between C ∼ Unif[0, k] and Pr(δ = 0) for log-normal distributions
                 k for each distribution
p     Pr(δ=0)    log-normal(1,1)    log-normal(2,0.4)    log-normal(2.5,0.25)
0.3   0.3        ×                  ×                    ×
0.3   0.4        14                 33                   54
0.3   0.5        8                  19                   33
0.5   0.5        ×                  ×                    ×
0.5   0.6        12                 25                   40
0.5   0.7        8                  15                   25
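The censoring rates in these tables can also be checked by simulation. The sketch below generates data from a non-mixture cure model using a latent-count mechanism: N ∼ Poisson(θ) latent causes, each with an independent promotion time drawn from F̃, with T the minimum of those times and T = ∞ (cured) when N = 0. This mechanism yields S(t) = exp{−θ F̃(t)}. The mechanism as coded, the unit-exponential F̃, the seed, and all names are assumptions of the illustration, not the thesis's simulation code.

```python
import numpy as np

def simulate_cure_data(n, theta, k, rng):
    """Draw (X, delta, N) from a non-mixture cure model with Unif[0, k] censoring.

    Assumed mechanism: N ~ Poisson(theta) latent causes, each with an
    independent unit-exponential promotion time; T is their minimum and
    T = infinity (cured) when N = 0."""
    n_causes = rng.poisson(theta, size=n)
    t = np.full(n, np.inf)
    active = n_causes > 0
    # the minimum of N unit exponentials is exponential with rate N
    t[active] = rng.exponential(1.0 / n_causes[active])
    c = rng.uniform(0.0, k, size=n)
    x = np.minimum(t, c)                 # observed time
    delta = (t <= c).astype(int)         # 1 = failure observed, 0 = censored
    return x, delta, n_causes

rng = np.random.default_rng(0)
x, delta, n_causes = simulate_cure_data(200000, -np.log(0.3), 4.0, rng)
cured_fraction = np.mean(n_causes == 0)     # estimates p = exp(-theta)
censoring_rate = 1.0 - delta.mean()         # estimates Pr(delta = 0)
```

Here the cured fraction estimates p = 0.3 and the censoring rate estimates Pr(δ = 0) for k = 4. Cured subjects never fail, so every observed time is finite and every cured subject is censored.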
Table A-4: Maximum likelihood estimators for Gamma distributions with p = 0.3 and Pr(δ = 0) = 0.3
             Gamma(1,1)          Gamma(4,1.5)        Gamma(6,2)
n     param  bias      sd        bias      sd        bias      sd
100   α      0.032     0.159     0.129     0.706     0.252     0.086
      β      0.069     0.285     0.079     0.345     0.082     0.344
      θ      0.014     0.147     0.003     0.165     0.030     0.163
      p      0.001     0.044     0.005     0.049     0.005     0.048
300   α      0.001     0.074     0.042     0.359     0.061     0.512
      β      0.014     0.134     0.022     0.175     0.021     0.206
      θ      0.003     0.091     0.004     0.093     0.012     0.102
      p      0.002     0.027     0.003     0.028     0.005     0.031
Table A-5: Maximum likelihood estimators for Weibull distributions with p = 0.3 and Pr(δ = 0) = 0.3
             Weibull(1,3)        Weibull(2,10)       Weibull(4,15)
n     param  bias      sd        bias      sd        bias      sd
100   k      0.008     0.107     0.028     0.197     0.079     0.461
      λ      0.028     0.483     0.051     0.746     0.068     0.683
      θ      0.013     0.141     0.012     0.154     0.034     0.190
      p      0.007     0.043     0.007     0.046     0.005     0.053
300   k      0.001     0.057     0.024     0.102     0.049     0.240
      λ      0.016     0.287     0.039     0.503     0.048     0.383
      θ      0.007     0.095     0.021     0.101     0.008     0.096
      p      0.001     0.028     0.005     0.030     0.001     0.028
Table A-6: Maximum likelihood estimators for log-normal distributions with p = 0.3 and Pr(δ = 0) = 0.3
             log-normal(1,1)     log-normal(2,0.4)   log-normal(2.5,0.25)
n     param  bias      sd        bias      sd        bias      sd
100   µ      0.007     0.110     0.005     0.058     0.001     0.037
      σ²     0.014     0.094     0.010     0.031     0.001     0.021
      θ      0.010     0.166     0.039     0.173     0.028     0.178
      p      0.001     0.048     0.007     0.050     0.004     0.005
300   µ      0.008     0.091     0.001     0.033     0.001     0.023
      σ²     0.005     0.057     0.005     0.020     0.001     0.015
      θ      0.035     0.101     0.007     0.094     0.001     0.091
      p      0.009     0.029     0.001     0.028     0.001     0.027
Table A-7: Maximum likelihood estimators for Gamma distributions with p = 0.3 and Pr(δ = 0) = 0.4
             Gamma(1,1)          Gamma(4,1.5)        Gamma(6,2)
n     param  bias      sd        bias      sd        bias      sd
100   α      0.036     0.175     0.154     0.768     0.221     1.183
      β      0.095     0.423     0.071     0.384     0.090     0.480
      θ      0.041     0.260     0.035     0.201     0.018     0.194
      p      0.003     0.067     0.005     0.057     <0.001    0.055
300   α      0.001     0.091     0.093     0.426     0.117     0.677
      β      0.022     0.237     0.046     0.215     0.051     0.275
      θ      0.016     0.139     0.005     0.107     0.005     0.112
      p      0.002     0.039     <0.001    0.032     0.003     0.021
Table A-8: Maximum likelihood estimators for Weibull distributions with p = 0.3 and Pr(δ = 0) = 0.4
             Weibull(1,3)        Weibull(2,10)       Weibull(4,15)
n     param  bias      sd        bias      sd        bias      sd
100   k      0.016     0.131     0.049     0.232     2.979     0.134
      λ      51.455    1111.197  0.060     1.314     85.573    2163.867
      θ      0.535     7.980     0.030     0.226     0.375     6.338
      p      0.016     0.085     0.002     0.058     0.008     0.081
300   k      0.001     0.076     0.003     0.127     0.032     0.243
      λ      0.112     0.698     0.001     0.623     0.053     0.377
      θ      0.024     0.169     0.007     0.113     0.002     0.110
      p      0.003     0.047     <0.001    0.033     0.001     0.033
Table A-9: Maximum likelihood estimators for log-normal distributions with p = 0.3 and Pr(δ = 0) = 0.4
             log-normal(1,1)     log-normal(2,0.4)   log-normal(2.5,0.25)
n     param  bias      sd        bias      sd        bias      sd
100   µ      0.035     0.289     0.003     0.063     0.002     0.038
      σ²     0.001     0.149     0.007     0.041     0.003     0.026
      θ      0.067     0.354     0.022     0.182     0.007     0.179
      p      0.007     0.073     0.002     0.052     0.003     0.053
300   µ      0.010     0.152     <0.001    0.037     <0.001    0.023
      σ²     0.004     0.085     0.002     0.025     0.001     0.014
      θ      0.013     0.144     0.001     0.101     0.001     0.114
      p      0.001     0.042     0.002     0.030     0.001     0.032
Table A-10: NPMLE of θ and p for Gamma distributions with p = 0.3 and Pr(δ = 0) = 0.3.
             Gamma(1,1)          Gamma(4,1.5)        Gamma(6,2)
n     param  bias      sd        bias      sd        bias      sd
100   θ      0.158     0.172     0.131     0.143     0.131     0.139
      p      0.040     0.043     0.034     0.036     0.034     0.036
300   θ      0.130     0.142     0.125     0.138     0.12      0.129
      p      0.034     0.036     0.033     0.031     0.032     0.033
Table A-11: NPMLE of θ and p for Weibull distributions with p = 0.3 and Pr(δ = 0) = 0.3.
             Weibull(1,3)        Weibull(2,10)       Weibull(4,15)
n     param  bias      sd        bias      sd        bias      sd
100   θ      0.130     0.142     0.048     0.249     0.026     0.266
      p      0.034     0.036     0.004     0.083     0.004     0.090
300   θ      0.124     0.134     0.039     0.250     0.033     0.258
      p      0.033     0.034     0.001     0.085     0.001     0.087
Table A-12: NPMLE of θ and p for log-normal distributions with p = 0.3 and Pr(δ = 0) = 0.3.
             log-normal(1,1)     log-normal(2,0.4)   log-normal(2.5,0.25)
n     param  bias      sd        bias      sd        bias      sd
100   θ      0.137     0.309     0.141     0.289     0.056     0.246
      p      0.035     0.037     0.036     0.037     0.007     0.082
300   θ      0.136     0.297     0.139     0.279     0.059     0.239
      p      0.035     0.037     0.035     0.037     0.008     0.080
Table A-13: NPMLE of θ and p for Gamma distributions with p = 0.3 and Pr(δ = 0) = 0.4.
             Gamma(1,1)          Gamma(4,1.5)        Gamma(6,2)
n     param  bias      sd        bias      sd        bias      sd
100   θ      0.090     0.337     0.148     0.389     0.096     0.327
      p      0.017     0.073     0.036     0.041     0.019     0.071
300   θ      0.092     0.331     0.147     0.381     0.097     0.321
      p      0.018     0.072     0.036     0.040     0.020     0.070
Table A-14: NPMLE of θ and p for Weibull distributions with p = 0.3 and Pr(δ = 0) = 0.4.
             Weibull(1,3)        Weibull(2,10)       Weibull(4,15)
n     param  bias      sd        bias      sd        bias      sd
100   θ      0.143     0.375     0.079     0.231     0.146     0.405
      p      0.035     0.040     0.015     0.075     0.036     0.041
300   θ      0.144     0.413     0.082     0.227     0.145     0.396
      p      0.035     0.040     0.016     0.073     0.036     0.040
Table A-15: NPMLE of θ and p for log-normal distributions with p = 0.3 and Pr(δ = 0) = 0.4.
             log-normal(1,1)     log-normal(2,0.4)   log-normal(2.5,0.25)
n     param  bias      sd        bias      sd        bias      sd
100   θ      0.140     0.264     0.142     0.274     0.066     0.239
      p      0.036     0.039     0.036     0.038     0.010     0.079
300   θ      0.138     0.259     0.141     0.267     0.069     0.234
      p      0.035     0.039     0.036     0.037     0.012     0.077