Chapter 5 Simulations

5.1 Data Generation

5.1.2 Data Generation with Covariates

We now generate data that include a binary covariate. We assume that the marginal effect follows the proportional hazards model of Cox (1972):

$$S_i(t \mid Z_i) = S_{i,0}(t)^{\exp(Z_i'\beta_i)}, \qquad (5.2)$$

where $Z_i$ is the covariate and $S_{i,0}(t)$ is the baseline survival function at time $t$ ($i = 1, 2$). The general procedure can be stated as follows. Let $S(T) = U$, where $U \sim U(0,1)$. Under the Cox proportional hazards model, we have $S_0(T)^{\exp(Z'\beta)} = U$. Hence it follows that

$$\log(S_0(T)) = \frac{\log(U)}{\exp(Z'\beta)}, \qquad (5.3)$$

which implies that

$$T = S_0^{-1}\left[\exp\left(\frac{\log(U)}{\exp(Z'\beta)}\right)\right]. \qquad (5.4)$$

We still need to specify the form of $S_0(t)$. For most distributions, the inverse of $S_0(t)$ has no explicit expression, which increases the numerical difficulty of the analysis. In our simulations, we specify the baseline survival function as $S_0(t) = \exp(-t)$ and obtain the following explicit expression:

$$T = \frac{-\log(U)}{\exp(Z'\beta)}, \qquad (5.5)$$

where $-\log(U)$ follows exp(1).
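The inversion in (5.5) can be sketched in a few lines of Python. This is only an illustration under the stated baseline $S_0(t) = \exp(-t)$ with a scalar covariate; the function name `sample_cox_exponential`, the seed, and the number of draws are hypothetical choices, not part of the thesis.

```python
import math
import random


def sample_cox_exponential(z, beta, rng):
    """Draw one failure time T with S(t | z) = exp(-t) ** exp(z * beta)."""
    u = 1.0 - rng.random()                      # U in (0, 1], avoids log(0)
    return -math.log(u) / math.exp(z * beta)    # inversion formula (5.5)


rng = random.Random(0)                          # hypothetical seed
draws_z0 = [sample_cox_exponential(0, 0.8, rng) for _ in range(100_000)]
draws_z1 = [sample_cox_exponential(1, 0.8, rng) for _ in range(100_000)]

# With z = 0 the draws are exp(1); with z = 1 the hazard is multiplied
# by exp(beta), so the mean shrinks to exp(-beta).
mean_z0 = sum(draws_z0) / len(draws_z0)
mean_z1 = sum(draws_z1) / len(draws_z1)
```

The two sample means give a quick sanity check: the baseline group should average about 1 and the group with Z = 1 about exp(−0.8).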

The data generation procedure is summarized below.

Step (i) Generate $Z_{ij}$ from Bernoulli(0.5) for $i = 1, 2$.

Step (ii) Generate failure times $(T_{1j}^*, T_{2j}^*)$ for the baseline group with $Z_{ij} = 0$ using the above algorithm, where $T_{ij}^* \sim$ exp(1) ($i = 1, 2$).

Step (iii) Given the value of $\beta_i$, for those with $Z_{ij} = 1$, we set $T_{ij} = T_{ij}^* / \exp(Z_{ij}'\beta_i)$ ($i = 1, 2$); for those with $Z_{ij} = 0$, $T_{ij} = T_{ij}^*$.

Step (iv) Generate $\{(C_{1j}, C_{2j})\ (j = 1, 2, \ldots, n)\}$, both of which are exponentially distributed.
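Steps (i) to (iv), together with the censoring rule $X_{ij} = T_{ij} \wedge C_{ij}$, can be sketched as follows. The "above algorithm" of Step (ii) is not shown in this section, so the sketch assumes a Clayton copula on the survival probabilities with $\theta = \alpha - 1$, sampled by the conditional method; that choice, the helper names `clayton_pair` and `generate_pair`, and the censoring rate `cens_rate` are all assumptions made for illustration.

```python
import math
import random


def clayton_pair(theta, rng):
    """Survival probabilities (U1, U2) from a Clayton copula (assumed model)."""
    u1 = 1.0 - rng.random()
    w = 1.0 - rng.random()
    # Conditional-distribution method: solve dC/du1 (u1, u2) = w for u2.
    u2 = ((w ** (-theta / (1.0 + theta)) - 1.0) * u1 ** (-theta) + 1.0) ** (-1.0 / theta)
    return u1, u2


def generate_pair(alpha, betas, cens_rate, rng):
    """One observation (X, delta, Z) following Steps (i) to (iv)."""
    z = [int(rng.random() < 0.5) for _ in range(2)]            # Step (i)
    u = clayton_pair(alpha - 1.0, rng)                         # Step (ii)
    t_star = [-math.log(ui) for ui in u]                       # exp(1) margins
    t = [ts / math.exp(zi * b)                                 # Step (iii)
         for ts, zi, b in zip(t_star, z, betas)]
    c = [rng.expovariate(cens_rate) for _ in range(2)]         # Step (iv)
    x = [min(ti, ci) for ti, ci in zip(t, c)]                  # X = T ^ C
    delta = [int(ti <= ci) for ti, ci in zip(t, c)]            # I(T <= C)
    return x, delta, z


rng = random.Random(1)                                         # hypothetical seed
data = [generate_pair(3.0, (0.8, 0.8), 0.5, rng) for _ in range(500)]
```

Dividing by $\exp(Z_{ij}\beta_i)$ for every subject is equivalent to Step (iii), because the divisor equals 1 when $Z_{ij} = 0$. Note that `random.expovariate` takes a rate, which may or may not match the parameterization of the censoring distribution used in the thesis.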

Finally, with $\{(T_{1j}, T_{2j}, C_{1j}, C_{2j}, Z_{1j}, Z_{2j})\ (j = 1, \ldots, n)\}$, we can create the observed data $\{(X_{1j}, X_{2j}, \delta_{1j}, \delta_{2j}, Z_{1j}, Z_{2j})\ (j = 1, 2, \ldots, n)\}$ such that $X_{ij} = T_{ij} \wedge C_{ij}$ and $\delta_{ij} = I(T_{ij} \le C_{ij})$.

5.2 Simulation Results

5.2.1 Results without Covariates

In this section, we evaluate two approaches, one based on two-stage estimation and one based on the construction of two-by-two tables. Two sample sizes, n = 100 and n = 500, are considered. The parameter α ranges from $1.\overline{2}$ to 19, which corresponds to τ from 0.1 to 0.9. For each estimator, the average bias and standard deviation of $\hat{\alpha}$ are reported based on 500 replications. To achieve the targeted censoring rates of 30% and 60%, we let $C_i$ follow U(0, 5.5) and U(0, 2.5), respectively, for i = 1, 2.
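The grid of α values can be recovered from the τ grid if one assumes the Clayton-type relation $\tau = (\alpha - 1)/(\alpha + 1)$. This relation is not stated in this section and is an assumption here, but it reproduces the values 1.5, 1.85714, 3, 4, 9 and 19 that appear in the tables. The function name `alpha_from_tau` is hypothetical.

```python
from fractions import Fraction


def alpha_from_tau(tau):
    """Solve tau = (alpha - 1) / (alpha + 1) for alpha (assumed relation)."""
    return (1 + tau) / (1 - tau)


taus = [Fraction(k, 10) for k in range(1, 10)]      # 0.1, 0.2, ..., 0.9
alphas = [alpha_from_tau(t) for t in taus]
# Exact values: 11/9, 3/2, 13/7, 7/3, 3, 4, 17/3, 9, 19
```

Under this assumption the endpoints 11/9 and 19 correspond to τ = 0.1 and τ = 0.9, and 13/7 truncates to the 1.85714 shown in the tables.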

Table 1.1 summarizes the results in the absence of censoring. Note that for the two-stage estimation approach, we also present the results when the first stage of estimation is performed parametrically. Recall that we assume the marginal distribution of the failure time X is exp(λ = 1). Hence, for the parametric two-stage procedure, we use

$$\hat{\lambda} = \frac{\sum_{j=1}^{n} \delta_j}{\sum_{j=1}^{n} x_j} \quad \left(\text{in complete data, } \hat{\lambda} = \frac{1}{\bar{x}}\right) \qquad (5.6)$$

to plug into the second stage for estimating α. This approach yields better results (smaller bias and smaller variation) than the semi-parametric two-stage procedure. The approach based on two-by-two tables produces fairly good results even though it makes no assumption about the marginal distributions. Specifically, it is nearly unbiased and its variance is only slightly larger than that of the parametric two-stage procedure. Note that the variation of all the estimators becomes larger as α increases.
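For an exponential margin, the parametric first stage is commonly taken to be the censored-data maximum likelihood estimate: the number of observed events divided by the total follow-up time. The sketch below assumes that this is what (5.6) denotes; the function name `exp_rate_mle` and the toy data are hypothetical.

```python
def exp_rate_mle(x, delta):
    """Exponential rate estimate: observed events / total follow-up time."""
    return sum(delta) / sum(x)


x = [0.5, 1.2, 0.3, 2.0]          # toy observed times (hypothetical)
delta = [1, 0, 1, 1]              # 1 = failure observed, 0 = censored

lam_censored = exp_rate_mle(x, delta)             # 3 events over 4.0 time units
lam_complete = exp_rate_mle(x, [1] * len(x))      # reduces to 1 / mean(x)
```

With no censoring every indicator equals 1, so the estimate reduces to the reciprocal of the sample mean.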

Tables 1.2 and 1.3 present the results in the presence of external censoring. Although the variation of the estimators is larger than without external censoring, the estimators still perform well. In Tables 1.1 to 1.3, the variation decreases as the sample size increases from 100 to 500. Hence, the simulations suggest that all of the estimators are consistent.

5.2.2 The Method Proposed by Hsu & Prentice

In this section, we examine the performance of the estimator proposed by Hsu and Prentice (1996) under the same settings. The results based on complete data are summarized below.

Strangely, based on Table A, we found that $\hat{\alpha}$ does not seem to be consistent when $\alpha \ge 2.\overline{3}$. To investigate what caused the problem, we checked several things; the details are summarized in the Appendix. We suspect that the problem may be attributed to instability in estimating ψ(t₁,t₂;α) in (3.10) in some regions of (t₁,t₂). The plug-in marginal estimators are usually unstable in the tail area. Hence we trimmed the integration area from [0,∞)×[0,∞) to a bounded region. Because this modification has no theoretical justification, we only evaluate the case without censoring.

bias × 10⁻² (st. error × 10⁻²)

| α | n=500 |
|---|---|
| $1.\overline{2}$ | -2.283 (5.861) |
| 1.5 | -6.701 (7.396) |
| 1.85714 | -15.267 (8.659) |
| $2.\overline{3}$ | -31.961 (8.996) |
| 3.0 | -66.637 (7.981) |
| 4.0 | -138.223 (5.786) |
| $5.\overline{6}$ | -284.485 (3.156) |
| 9.0 | -606.891 (1.162) |
| 19.0 | -1602.436 (0.860) |

Table A: Original version of Hsu & Prentice's estimator with no censoring

Table B below contains the results for the modified estimator, which is evaluated over a bounded region. Specifically, for each margin, we trim 50% of the tail region. Table B shows that both the bias and the standard deviation of $\hat{\alpha}$ improve substantially. As the sample size increases from 100 to 500, the standard deviation of $\hat{\alpha}$ decreases. Despite the improvement, the method still does not perform as well as the previous two approaches. One possible reason is that the model-based expectation ψ(t₁,t₂;α) in (3.10) involves more nuisance parameters than the other two approaches do.

bias × 10⁻² (st. error × 10⁻²)

| α | n=500 | n=100 |
|---|---|---|
| $1.\overline{2}$ | 0.229 (10.741) | 4.398 (24.211) |
| 1.5 | -0.162 (13.234) | 3.805 (31.675) |
| 1.85714 | -0.145 (16.388) | 3.804 (39.416) |
| $2.\overline{3}$ | -0.489 (20.883) | 4.530 (49.133) |
| 3.0 | -0.161 (28.358) | 5.530 (64.276) |
| 4.0 | 0.922 (39.522) | 9.705 (89.874) |
| $5.\overline{6}$ | 2.931 (63.757) | 12.754 (149.819) |
| 9.0 | 13.172 (147.715) | 43.050 (351.349) |
| 19.0 | 104.310 (678.372) | 322.146 (1652.888) |

Table B: Modified version of Hsu and Prentice's estimator with no censoring

5.2.3 Results with Covariates

We have proposed extending two inference approaches to the more complex situation in which covariates affect the marginal distributions. We now check the validity of the extension by simulation. Here we assume the Cox proportional hazards model to describe the marginal heterogeneity. For the two-stage estimation approach, we report only the results in which the marginal distributions are estimated non-parametrically. The parameter α ranges from $1.\overline{2}$ to 19 and β₁ = β₂ = 0.8. We also evaluate the situation in the presence of censoring, with 30% and 60% censoring. To achieve the targeted censoring rates, we let $C_i$ follow exp(λ = 3.5) and exp(λ = 1.5), respectively (i = 1, 2). Two sample sizes, n = 100 and n = 500, are evaluated. For each estimator, the average bias and standard deviation are reported based on 500 replications.
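The summaries reported in the tables can be computed from the replicated estimates as sketched below. The sketch reads the table heading "bias × 10⁻² (st. error × 10⁻²)" as meaning that a table entry b stands for a bias of b × 10⁻²; that reading, the use of the sample standard deviation, and the name `summarize` are assumptions.

```python
import statistics


def summarize(estimates, true_value, scale=1e-2):
    """Return (bias, st. error) of the estimates, expressed in units of `scale`."""
    bias = statistics.mean(estimates) - true_value
    sd = statistics.stdev(estimates)        # sample standard deviation (assumed)
    return bias / scale, sd / scale


# Toy replications around a true value of 3.0 (hypothetical numbers).
est = [2.9, 3.1, 3.0, 3.2, 2.8]
bias_entry, sd_entry = summarize(est, 3.0)
```

In the simulations the list of estimates would hold the 500 replicated values of the estimator for one combination of α, n and censoring rate.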

Table 2.1 summarizes the results with covariates in the absence of censoring. Our focus is on comparing the two methods after adjusting for the effects of β₁ and β₂. The variation of the two approaches is similar. However, the two-by-two table approach seems to produce less biased estimates. Tables 2.2 and 2.3 present the results in the presence of right censoring. The estimators of α have larger variation but still perform well. All of the estimators appear to be consistent as the sample size increases. In simulations not reported here, we find that the estimators of α become invalid if the marginal heterogeneity is ignored.

Chapter 6 Conclusion

In this thesis, we review three inference approaches for estimating the association parameter of copula models. The existing methods were originally developed for analyzing homogeneous data. Here we extend these methods to account for marginal heterogeneity explained by covariates.

The two-stage estimation procedure proposed by Shih and Louis (1995) is easy to implement but is not applicable under more complicated data structures, such as semi-competing risks data that involve dependent censoring. The proposed approach based on two-by-two tables is motivated by the log-rank statistic. In comparison, it is a simple procedure in terms of both analytic derivation and computation. It also performs well in simulations. Since this approach uses only some moment conditions, it can be easily modified for different data structures. The estimator of Hsu and Prentice (1996) performs poorly in our simulations. If our numerical algorithm is correct, the poor performance may be caused by the plug-in estimators of the nuisance functions.

The proposed method and the method of Hsu and Prentice (1996) are both moment-based procedures, but their performance differs greatly. We have found that, for AC models, the odds ratio of the two-by-two table provides a better descriptive measure of the association. In contrast, the covariance function of martingale residuals proposed by Hsu and Prentice is much less natural, which is why it produces an estimating function that involves many nuisance parameters.

Table 1.1 Comparison of two approaches without external censoring.

bias × 10⁻² (st. error × 10⁻²)

| α | n=500: Two-Stage (parametric) | n=500: Two-Stage (semi-parametric) | n=500: Two-by-Two Table | n=100: Two-Stage (parametric) | n=100: Two-Stage (semi-parametric) | n=100: Two-by-Two Table |
|---|---|---|---|---|---|---|
| $1.\overline{2}$ | -0.213 (8.114) | 0.669 (8.382) | -0.026 (8.275) | 0.968 (14.767) | 3.8 (16.271) | 0.383 (15.232) |
| 1.5 | -0.191 (10.047) | 1.179 (10.531) | 0.065 (10.536) | 1.464 (19.040) | 6.054 (20.966) | 0.28 (20.749) |
| 1.85714 | -0.252 (12.514) | 1.192 (13.326) | 0.031 (13.401) | 2.138 (23.801) | 7.592 (26.739) | -0.074 (27.063) |
| $2.\overline{3}$ | -0.393 (15.841) | 0.844 (17.039) | -0.042 (17.170) | 2.793 (30.204) | 7.993 (33.793) | -0.028 (34.737) |
| 3.0 | -0.578 (20.526) | 0.018 (22.080) | -0.208 (22.244) | 3.637 (39.343) | 7.128 (43.565) | -1.11 (45.182) |
| 4.0 | -0.882 (27.588) | -1.561 (29.449) | -0.306 (29.807) | 4.775 (53.258) | 4.507 (57.379) | -2.252 (60.220) |
| $5.\overline{6}$ | -1.438 (39.439) | -5.398 (41.709) | -0.683 (42.362) | 6.37 (76.751) | -2.423 (81.810) | -0.963 (85.493) |
| 9.0 | -2.532 (63.241) | -16.53 (65.670) | -1.721 (66.599) | 9.213 (123.879) | -29.453 (128.361) | 0.159 (136.928) |
| 19.0 | -5.370 (134.733) | -72.600 (138.037) | -2.178 (142.870) | 16.84 (264.620) | -173.33 (271.930) | 9.597 (303.381) |

Table 1.2 Comparison of two approaches with censoring rate 0.3.

bias × 10⁻² (st. error × 10⁻²)

| α | n=500: Two-Stage (parametric) | n=500: Two-Stage (semi-parametric) | n=500: Two-by-Two Table | n=100: Two-Stage (parametric) | n=100: Two-Stage (semi-parametric) | n=100: Two-by-Two Table |
|---|---|---|---|---|---|---|
| $1.\overline{2}$ | 0.073 (9.088) | 0.566 (9.182) | 0.142 (9.068) | 2.061 (16.984) | 4.258 (18.502) | 1.702 (17.562) |
| 1.5 | 0.234 (11.437) | 1.024 (11.641) | 0.34 (11.635) | 2.747 (22.384) | 5.737 (23.998) | 1.252 (23.918) |
| 1.85714 | 0.19 (14.198) | 0.965 (14.721) | 0.334 (14.909) | 3.487 (27.112) | 6.941 (29.197) | 1.423 (30.365) |
| $2.\overline{3}$ | 0.024 (17.916) | 0.395 (18.709) | 0.244 (19.105) | 4.139 (34.322) | 7.02 (36.550) | 1.254 (39.061) |
| 3.0 | -0.314 (22.888) | -0.743 (24.042) | 0.255 (24.793) | 4.853 (44.807) | 5.003 (47.158) | 0.051 (51.647) |
| 4.0 | -1.224 (30.484) | -3.377 (31.640) | 0.078 (33.127) | 3.853 (60.379) | -2.213 (61.451) | 0.297 (69.924) |
| $5.\overline{6}$ | -2.872 (43.457) | -9.567 (44.718) | 0.114 (47.279) | 0.406 (86.207) | -21.197 (83.557) | 5.469 (99.999) |
| 9.0 | -9.223 (68.992) | -31.33 (69.585) | 0.634 (74.918) | -21.92 (141.180) | -88.543 (128.043) | 16.75 (165.275) |
| 19.0 | -70.26 (158.490) | -189.043 (152.155) | 3.405 (161.384) | -212.22 (348.587) | -481.44 (260.105) | 33.59 (369.187) |

Table 1.3 Comparison of two approaches with censoring rate 0.6.

bias × 10⁻² (st. error × 10⁻²)

| α | n=500: Two-Stage (parametric) | n=500: Two-Stage (semi-parametric) | n=500: Two-by-Two Table | n=100: Two-Stage (parametric) | n=100: Two-Stage (semi-parametric) | n=100: Two-by-Two Table |
|---|---|---|---|---|---|---|
| $1.\overline{2}$ | 0.44 (10.714) | 0.584 (10.759) | 0.53 (10.736) | 4.018 (21.149) | 5.067 (22.596) | 0.846 (21.655) |
| 1.5 | 0.323 (13.202) | 0.703 (13.357) | 0.442 (13.401) | 4.091 (27.924) | 6.032 (29.777) | 0.639 (29.280) |
| 1.85714 | 0.317 (16.350) | 0.685 (16.582) | 0.409 (16.822) | 5.072 (34.088) | 7.577 (36.085) | 1.563 (36.934) |
| $2.\overline{3}$ | 0.199 (20.272) | 0.409 (20.862) | 0.416 (21.371) | 6.378 (42.449) | 8.277 (43.924) | 0.798 (46.024) |
| 3.0 | 0.073 (26.114) | -0.205 (27.040) | 0.577 (28.097) | 7.54 (54.756) | 7.428 (55.933) | 0.861 (60.081) |
| 4.0 | -0.299 (34.732) | -1.645 (36.026) | 0.928 (37.965) | 7.942 (72.487) | 2.874 (73.717) | 2.214 (83.575) |
| $5.\overline{6}$ | -1.692 (49.035) | -5.835 (50.423) | 1.059 (53.934) | 6.253 (104.281) | -10.142 (103.610) | 7.152 (119.437) |
| 9.0 | -7.103 (78.580) | -19.496 (79.399) | 2.336 (86.260) | -6.881 (166.990) | -53.666 (159.488) | 25.38 (196.765) |
| 19.0 | -54.14 (175.675) | -124.01 (168.489) | 6.457 (184.836) | -165.24 (373.784) | -377.02 (315.480) | 32.51 (453.817) |

Table 2.1 Comparison of two approaches under marginal heterogeneity without external censoring.

bias × 10⁻² (st. error × 10⁻²); β₁ = β₂ = 0.8

| α | n=500: $\hat{\alpha}$ (Two-Stage) | n=500: $\hat{\alpha}$ (Two-by-Two) | n=500: $\hat{\beta}_1$ | n=500: $\hat{\beta}_2$ | n=100: $\hat{\alpha}$ (Two-Stage) | n=100: $\hat{\alpha}$ (Two-by-Two) | n=100: $\hat{\beta}_1$ | n=100: $\hat{\beta}_2$ |
|---|---|---|---|---|---|---|---|---|
| $1.\overline{2}$ | -0.501 (6.391) | 0.059 (6.288) | -0.385 (9.188) | -0.188 (9.640) | -0.694 (14.900) | -0.396 (14.132) | 0.022 (21.557) | -0.758 (21.390) |
| 1.5 | -1.983 (7.925) | -1.230 (7.934) | -0.257 (9.150) | -0.345 (9.759) | -3.612 (17.381) | -2.002 (17.066) | -0.526 (20.297) | 1.460 (21.300) |
| 1.85714 | -3.055 (9.437) | -1.924 (9.747) | 0.729 (9.110) | 0.194 (9.501) | -7.404 (21.619) | -5.029 (21.922) | 1.015 (21.017) | 0.766 (22.258) |
| $2.\overline{3}$ | -4.229 (11.624) | -2.016 (11.763) | -0.265 (8.924) | 0.400 (9.970) | -11.885 (25.475) | -6.569 (26.423) | 1.496 (21.028) | 1.476 (21.540) |
| 3.0 | -5.615 (15.486) | -2.360 (15.760) | 0.626 (9.976) | 0.494 (9.766) | -15.534 (34.260) | -6.149 (36.794) | 1.836 (21.810) | 2.830 (22.377) |
| 4.0 | -7.863 (20.590) | -2.752 (21.118) | 1.845 (9.404) | 1.580 (9.646) | -25.774 (47.262) | -10.616 (50.432) | 1.717 (22.160) | 2.244 (23.109) |
| $5.\overline{6}$ | -14.769 (26.894) | -5.651 (27.678) | -0.111 (9.259) | 0.027 (9.180) | -42.934 (59.398) | -14.262 (66.131) | -0.876 (21.050) | -0.829 (21.554) |
| 9.0 | -31.589 (42.410) | -11.676 (43.023) | 0.402 (9.888) | 0.364 (9.788) | -98.841 (99.761) | -34.205 (105.528) | 0.519 (21.985) | 0.929 (21.968) |
| 19.0 | -114.648 (89.175) | -41.539 (93.940) | 0.494 (9.542) | 0.473 (9.435) | -341.087 (219.164) | -127.069 (245.475) | 0.420 (23.016) | 0.205 (22.920) |

Table 2.2 Comparison of two approaches under marginal heterogeneity with censoring rate 0.3.

bias × 10⁻² (st. error × 10⁻²); β₁ = β₂ = 0.8

| α | n=500: $\hat{\alpha}$ (Two-Stage) | n=500: $\hat{\alpha}$ (Two-by-Two) | n=500: $\hat{\beta}_1$ | n=500: $\hat{\beta}_2$ | n=100: $\hat{\alpha}$ (Two-Stage) | n=100: $\hat{\alpha}$ (Two-by-Two) | n=100: $\hat{\beta}_1$ | n=100: $\hat{\beta}_2$ |
|---|---|---|---|---|---|---|---|---|
| $1.\overline{2}$ | -0.267 (7.589) | -0.250 (8.413) | -0.345 (10.011) | 0.080 (10.170) | 0.638 (16.577) | 1.978 (18.085) | -0.614 (22.271) | 0.593 (23.457) |
| 1.5 | -1.858 (9.418) | -1.567 (10.379) | 0.083 (9.424) | -0.124 (10.435) | -2.140 (21.365) | -1.309 (23.301) | -0.466 (22.783) | 1.363 (25.114) |
| 1.85714 | -1.963 (10.775) | -1.007 (11.722) | 0.337 (9.852) | 1.079 (10.411) | -4.958 (23.924) | -0.703 (28.945) | 0.933 (22.150) | 0.658 (23.185) |
| $2.\overline{3}$ | -3.272 (14.306) | -0.576 (16.869) | 0.232 (10.123) | 0.704 (10.714) | -10.582 (31.164) | -3.821 (36.515) | -0.872 (23.256) | -0.840 (24.870) |
| 3.0 | -6.165 (17.556) | -1.408 (20.790) | 0.327 (10.057) | 0.704 (10.849) | -21.709 (37.096) | -10.014 (45.931) | 1.367 (24.254) | 1.187 (23.103) |
| 4.0 | -7.834 (23.841) | 0.285 (27.205) | -0.431 (11.057) | -0.273 (10.434) | -24.520 (52.442) | 3.311 (69.246) | 1.304 (24.693) | 1.672 (24.308) |
| $5.\overline{6}$ | -17.583 (31.227) | -1.378 (36.732) | 0.574 (10.675) | 0.352 (10.524) | -57.033 (69.257) | -2.362 (98.199) | 1.592 (23.854) | 1.033 (23.943) |
| 9.0 | -109.124 (55.087) | -58.938 (66.582) | 0.586 (10.134) | 0.349 (10.517) | -259.518 (111.386) | -135.602 (152.304) | 2.719 (23.804) | 0.992 (24.344) |
| 19.0 | -292.511 (127.019) | -97.013 (136.070) | 0.397 (11.115) | 0.562 (11.293) | -668.705 (210.578) | -183.206 (335.635) | 2.082 (24.015) | 2.155 (24.233) |

Table 2.3 Comparison of two approaches under marginal heterogeneity with censoring rate 0.6.

bias × 10⁻² (st. error × 10⁻²); β₁ = β₂ = 0.8

| α | n=500: $\hat{\alpha}$ (Two-Stage) | n=500: $\hat{\alpha}$ (Two-by-Two) | n=500: $\hat{\beta}_1$ | n=500: $\hat{\beta}_2$ | n=100: $\hat{\alpha}$ (Two-Stage) | n=100: $\hat{\alpha}$ (Two-by-Two) | n=100: $\hat{\beta}_1$ | n=100: $\hat{\beta}_2$ |
|---|---|---|---|---|---|---|---|---|
| $1.\overline{2}$ | -0.022 (9.175) | 0.647 (11.317) | 0.480 (11.516) | 0.484 (12.338) | 0.535 (19.467) | 5.426 (27.392) | 0.403 (27.579) | 1.314 (28.132) |
| 1.5 | -0.700 (11.438) | 0.997 (14.396) | -0.311 (11.200) | 0.597 (12.072) | 0.269 (27.556) | 2.762 (34.610) | -0.788 (25.720) | 0.609 (27.642) |
| 1.85714 | -1.103 (13.657) | 0.187 (16.934) | -0.092 (11.824) | 0.619 (11.884) | -2.658 (29.413) | 1.710 (39.691) | 1.143 (26.587) | 4.003 (29.461) |
| $2.\overline{3}$ | -2.726 (18.460) | 0.039 (23.483) | -0.025 (11.456) | 0.230 (12.124) | -4.620 (40.159) | 5.159 (57.367) | -0.679 (27.484) | -0.800 (28.109) |
| 3.0 | -5.260 (21.654) | -1.091 (27.524) | -0.243 (11.857) | 0.166 (11.525) | -14.801 (51.198) | -1.219 (74.896) | 1.004 (28.668) | 2.123 (27.966) |
| 4.0 | -13.133 (28.383) | -5.477 (37.248) | 0.005 (12.170) | 0.090 (11.847) | -34.456 (62.953) | 2.777 (97.260) | 1.595 (27.887) | -0.243 (26.747) |
| $5.\overline{6}$ | -22.755 (42.830) | -6.343 (56.938) | -0.314 (11.934) | 1.115 (13.131) | -70.193 (94.419) | -13.234 (133.771) | 0.490 (27.494) | 0.838 (26.873) |
| 9.0 | -70.929 (68.613) | -24.824 (92.450) | -0.423 (12.367) | 0.509 (11.945) | -196.595 (142.673) | -53.852 (234.286) | 0.169 (29.724) | -1.050 (26.876) |
| 19.0 | -395.364 (166.251) | -185.900 (228.931) | 0.254 (12.735) | 0.040 (12.080) | -840.543 (281.685) | -326.036 (561.663) | 2.275 (29.854) | 0.901 (28.127) |

References

Clayton, D. G. (1978). A Model for Association in Bivariate Life Tables and Its Application in Epidemiological Studies of Familial Tendency in Chronic Disease Incidence. Biometrika, 65, 141-151.

Fleming, T. R. and Harrington, D. P. (1991). Counting Processes and Survival Analysis. Wiley.

Genest, C. and MacKay, J. (1986). The Joy of Copulas: Bivariate Distributions with Uniform Marginals. The American Statistician, 40, No. 4.

Genest, C. and Rivest, L. P. (1993). Statistical Inference Procedures for Bivariate Archimedean Copulas. Journal of the American Statistical Association, 88, No. 423.

Hsu, L. and Prentice, R. L. (1996). On Assessing the Strength of Dependency Between Failure Time Variates. Biometrika, 83, 491-506.

Hsu, L. and Zhao, L. P. (1996). Assessing Familial Aggregation of Age at Onset, by Using Estimating Equations, with Application to Breast Cancer. Am. J. Hum. Genet., 58, 1057-1071.

Nelsen, R. B. (1997). Dependence and Order in Families of Archimedean Copulas. Journal of Multivariate Analysis, 60, 111-122.

Oakes, D. (1989). Bivariate Survival Models Induced by Frailties. Journal of the American Statistical Association, 84, 487-493.

Prentice, R. L. and Cai, J. (1992). Covariance and Survival Function Estimation Using Censored Multivariate Failure Time Data. Biometrika, 79, 495-512.

Shih, J. H. and Louis, T. A. (1995). Inference on the Association Parameter in Copula Models for Bivariate Survival Data. Biometrics, 51, 1384-1399.

Wang, W. (2003). Estimating the Association Parameter for Copula Models under Dependent Censoring. Journal of the Royal Statistical Society: Series B, 65, 257-273.

Wei, L. J., Lin, D. Y. and Weissfeld, L. (1989). Regression Analysis of Multivariate Incomplete Failure Time Data by Modeling Marginal Distributions. Journal of the American Statistical Association, 84, 1065-1073.

Appendix: Checking the Validity of the Method by Hsu and Prentice

Investigation #1: Is the distribution of $\hat{\alpha}$ reasonable?

Figure A.1

Finding: There seems to be a bound on $\hat{\alpha}$.

Investigation #2: Is the above problem caused by the root-finding procedure?

Figure A.2 Plot for α = 9

Finding: The estimating equation has a unique but wrong solution in some situations.

Investigation #3: Are the plug-in estimators for the nuisance functions inaccurate?

Figure A.3 The marginal survival function and its estimator

Figure A.4 The cumulative hazard function and its estimator

Finding: The plug-in estimators perform reasonably only in some regions.
