
Conclusions and Discussions

In this thesis, we consider an innovative and cost-effective sampling design, the two-phase PDS design, for survival data. The advantage of the proposed PDS design is that it allows a continuous variable Y and a vector of available covariates Z to be used in selecting a more informative second-phase data set. We fit a simple linear model to the first-phase sample to obtain parameter estimates, and then fit the AFT model to the resulting biased sample. We conducted simulation studies under various settings and investigated the optimal design by comparing the trace of the asymptotic variance-covariance matrix of our proposed estimator with that of the simple random sample (SRS) estimator.

We concluded that:

(i) the PDS design enables investigators to collect more informative samples under a fixed budget;

(ii) the PDS estimator is more efficient than those from an SRS of the same sample size;

(iii) the PDS estimator is more efficient than the ODS estimator, especially when the censoring rate is high;

(iv) the estimator based on the cutpoint quantiles (0.1, 0.9) has a smaller standard error than the one based on (0.3, 0.7);

(v) the second-phase probability value can be chosen between 80% and 95%;

(vi) the corresponding optimal ρ0 (n0/n) can be selected between 0.7 and 0.75.
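The efficiency comparisons behind these conclusions rest on the trace of a variance-covariance matrix. The following is a minimal sketch of that comparison, not the code used in the thesis: the function names and the two 2x2 matrices are hypothetical and invented purely for illustration.

```python
def trace(matrix):
    """Sum of the diagonal entries of a square matrix stored as nested lists."""
    return sum(matrix[i][i] for i in range(len(matrix)))


def relative_efficiency(cov_reference, cov_candidate):
    """Trace ratio; a value above 1 means the candidate has smaller total variance."""
    return trace(cov_reference) / trace(cov_candidate)


# Hypothetical covariance matrices, invented purely for illustration.
cov_srs = [[0.040, 0.002], [0.002, 0.050]]
cov_pds = [[0.025, 0.001], [0.001, 0.035]]

efficiency = relative_efficiency(cov_srs, cov_pds)
print(f"trace ratio SRS/PDS: {efficiency:.2f}")
```

In a design search, the same ratio could be computed over a grid of candidate values of ρ0 and the cutpoints, keeping the setting with the smallest trace.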

For future work, other possible survival models can be taken into consideration, such as the proportional hazards model or the additive hazards model. It would also be interesting to explore different approaches for estimating the probability (φk) using the first-phase data.

Appendix

We first prove the consistency of $\hat\beta$. Let $\phi(\cdot)$ denote the density function of the standard normal random variable. The convex objective functions corresponding to $U_{n,G}(\beta)$ and $\tilde U_{n,G}(\beta)$, denoted $L_{n,G}(\beta)$ and $\tilde L_{n,G}(\beta)$ respectively, are then

By applying Lemma 2 in Johnson and Strawderman (2009), $\lim_{n\to\infty}\sup_{\beta\in B}|L_{n,G}(\beta) - L_0(\beta)| = 0$, where $L_0(\beta)$ is strictly convex for $\beta \in B$. It can also be shown that $\lim_{n\to\infty}\sup_{\beta\in B}|\tilde L_{n,G}(\beta) - \tilde L(\beta)| = 0$ by the strong law of large numbers for U-statistics, asymptotic convergence results on finite population sampling, and Lemma 1 in Kong, Cai and Sen (2006). Combining these two results and applying the triangle inequality, $\lim_{n\to\infty}\sup_{\beta\in B}|\tilde L_{n,G}(\beta) - L_0(\beta)| = 0$. Condition (iv) ensures that $L_0(\beta)$ is strictly convex at $\beta_0$, the unique minimizer of $L_0(\beta)$. Then, the unique minimizer of $\tilde L_{n,G}(\beta)$, $\hat\beta$, converges to $\beta_0$ almost surely.
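The consistency argument relies on the objective function being convex in β. As a rough illustration, the sketch below evaluates the standard unweighted Gehan-type loss for an AFT model and checks midpoint convexity numerically. It is an assumption-laden stand-in: the weighted and smoothed objective functions of this thesis are not reproduced here, the n^-2 normalization is one common convention, and the simulated data and all names are hypothetical.

```python
import random


def residuals(beta, log_times, covariates):
    """AFT residuals e_i = log(T_i) - x_i' beta."""
    return [y - sum(b * x for b, x in zip(beta, xs))
            for y, xs in zip(log_times, covariates)]


def gehan_loss(beta, log_times, events, covariates):
    """Unweighted Gehan-type loss: n^-2 * sum_i sum_j delta_i * max(e_j - e_i, 0)."""
    e = residuals(beta, log_times, covariates)
    n = len(e)
    total = 0.0
    for i in range(n):
        if events[i]:  # only uncensored subjects i contribute
            total += sum(max(e_j - e[i], 0.0) for e_j in e)
    return total / (n * n)


# Simulated toy data (assumption): log T = 1.0 * x + noise, independent censoring.
rng = random.Random(0)
covariates = [[rng.uniform(0.0, 1.0)] for _ in range(60)]
log_failure = [1.0 * xs[0] + rng.gauss(0.0, 0.5) for xs in covariates]
log_censor = [rng.uniform(0.0, 3.0) for _ in covariates]
log_times = [min(t, c) for t, c in zip(log_failure, log_censor)]
events = [t <= c for t, c in zip(log_failure, log_censor)]

# Midpoint convexity check along a segment of candidate beta values.
beta_a, beta_b = [-2.0], [3.0]
beta_mid = [(a + b) / 2.0 for a, b in zip(beta_a, beta_b)]
loss_a = gehan_loss(beta_a, log_times, events, covariates)
loss_b = gehan_loss(beta_b, log_times, events, covariates)
loss_mid = gehan_loss(beta_mid, log_times, events, covariates)
print(loss_mid <= (loss_a + loss_b) / 2.0 + 1e-12)
```

Each summand is the positive part of a function that is linear in β, so the loss is a sum of convex functions; the numerical check only illustrates this property on one segment.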

To establish the asymptotic normality of $\hat\beta$, we first show the asymptotic normality of $\tilde\beta$, the solution to $U_{n,G}(\beta) = 0$, where $U_{n,G}(\beta)$ is the weighted version of [...]

Then, we show the asymptotic equivalence between the distributions of $\sqrt{n}(\tilde\beta - \beta_0)$ and $\sqrt{n}(\hat\beta - \beta_0)$.

[...] (2006) to the stochastic integral representation of $U_{n,G} = \sum_{i=1}^{n}$ [...], the second term in (6.1) is decomposed into $\sum^{n}$ [...]

These three terms are asymptotically uncorrelated. Moreover, the first term in (6.1) and these three terms are asymptotically uncorrelated. Thus, by applying Lemma 3 in the supplementary materials of Kang and Cai (2009) and the multivariate central limit theorem, we have the desired asymptotic normality of $n^{-1/2}U_{n,G}(\beta_0)$, whose mean is 0 and whose asymptotic covariance function is $\Sigma$ [...]

The consistency of $\tilde\beta$ for $\beta_0$ follows from arguments similar to those used to show the consistency of $\hat\beta$. Using this with the arguments in Theorem 2 of Ying (1993), it can be shown that $\sqrt{n}(\tilde\beta - \beta_0) = -\Sigma_A^{-1}(\beta_0)\, n^{-1/2}U_{n,G}(\beta_0) + o_p(1 + \sqrt{n}\,\|\tilde\beta - \beta_0\|)$. Then, by incorporating the asymptotic normality of $n^{-1/2}U_{n,G}(\beta_0)$, [...]

To establish the asymptotic equivalence of the distributions of $\sqrt{n}(\tilde\beta - \beta_0)$ and $\sqrt{n}(\hat\beta - \beta_0)$, it is sufficient to show that, as $n \to \infty$,

(i) $\partial\tilde U_{n,G}(\beta)/\partial\beta^{\top}$ converges to $\Sigma_{\tilde A}(\beta)$ in probability uniformly in $\beta \in B$;

(ii) $n^{-1/2}(\tilde U_{n,G}(\beta) - U_{n,G}(\beta))$ converges to 0 in probability uniformly in $\beta \in B$.

To show (i), we decompose $\partial\tilde U_{n,G}(\beta)/\partial\beta^{\top} - \Sigma_{\tilde A}(\beta)$ into $(\partial\tilde U_{n,G}(\beta)/\partial\beta^{\top} - \partial U_{n,G}(\beta)/\partial\beta^{\top}) + (\partial U_{n,G}(\beta)/\partial\beta^{\top} - \Sigma_{\tilde A}(\beta))$. The second term converges to 0 in probability uniformly in $\beta \in B$ by Lemma 3 in Johnson and Strawderman (2009). The first term can also be shown to converge to 0 in probability uniformly in $\beta \in B$ by applying the strong law of large numbers for U-statistics, Lemma 1 in Kong, Cai and Sen (2006), and the asymptotic convergence results on finite population sampling.

Combining these two and by applying the triangle inequality, we have the desired result.

Note that, for $u \in \mathbb{R}$, $|u(\Phi(\sqrt{n}u) - I(u \ge 0))| = \operatorname{sign}(u)\,u\,\Phi(-\sqrt{n}|u|)$, where $\operatorname{sign}(u) = 2I(u \ge 0) - 1$. Since $\Phi(-u) \le (\sqrt{2\pi}u)^{-1}\exp(-u^2/2)$ for $u > 0$, $\lim_{n\to\infty}\sup_{u\in\mathbb{R}}|u(\Phi(\sqrt{n}u) - I(u \ge 0))| = 0$. Then, (ii) follows from this result by applying the strong law of large numbers for U-statistics, Lemma 1 in Kong, Cai and Sen (2006), and the asymptotic convergence results on finite population sampling.
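The vanishing supremum used for (ii) can be checked numerically. Applying the tail bound with argument √n|u| gives |u|Φ(−√n|u|) ≤ (2πn)^(−1/2), so the supremum shrinks at rate n^(−1/2). The sketch below evaluates the supremum on a finite grid and compares it with that bound. The grid limits and the helper names are assumptions made for illustration, and a grid maximum only approximates the true supremum.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF computed from the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def smoothing_gap(u, n):
    """|u * (Phi(sqrt(n) * u) - I(u >= 0))| for a single point u."""
    indicator = 1.0 if u >= 0 else 0.0
    return abs(u * (std_normal_cdf(math.sqrt(n) * u) - indicator))


def sup_gap_on_grid(n, half_width=5.0, points=20001):
    """Maximum gap over an evenly spaced grid on [-half_width, half_width]."""
    step = 2.0 * half_width / (points - 1)
    return max(smoothing_gap(-half_width + k * step, n) for k in range(points))


for n in (10, 100, 1000, 10000):
    bound = 1.0 / math.sqrt(2.0 * math.pi * n)
    print(n, sup_gap_on_grid(n), bound)
```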

Bibliography

[1] Brown, B. M., and Wang, Y.-G. Standard errors and covariance matrices for smoothed rank estimators. Biometrika 92, 1 (2005), 149–158.

[2] Brown, B. M., and Wang, Y.-G. Induced smoothing for rank regression with censored survival times. Statistics in Medicine 26, 4 (2007), 828–836.

[3] Chatterjee, N., Chen, Y.-H., and Breslow, N. E. A pseudo-score estimator for regression problems with two-phase sampling. Journal of the American Statistical Association 98, 461 (2003), 158–168.

[4] Chiou, H., Kang, S., and Yan, J. Fitting accelerated failure time models in routine survival analysis with R package aftgee. Journal of Statistical Software 61, 11 (2014), 1–23.

[5] Chiou, H., Kang, S., and Yan, J. Semiparametric accelerated failure time modeling for clustered failure times from stratified sampling. Journal of the American Statistical Association 110 (2015), 621–629.

[6] Ding, J., Zhou, H., Liu, Y., Cai, J., and Longnecker, M. P. Estimating effect of environmental contaminants on women’s subfecundity for the MoBa study data with an outcome-dependent sampling scheme. Biostatistics 15, 4 (2014), 636–650.

[7] Gehan, E. A. A generalized Wilcoxon test for comparing arbitrarily singly-censored samples. Biometrika 52, 1/2 (1965), 203–233.

[8] Hajek, J. Limit theorems for a simple random sampling from a finite population. Publications of the Mathematical Institute of the Hungarian Academy of Sciences 5 (1960), 361–374.

[9] Jin, Z., Lin, D. Y., and Ying, Z. Rank regression analysis of multivariate failure time data based on marginal linear models. Scandinavian Journal of Statistics 33 (2006), 1–23.

[10] Johnson, L. M., and Strawderman, R. L. Induced smoothing for the semiparametric accelerated failure time model: asymptotics and extensions to clustered data. Biometrika 96 (2009), 577–590.

[11] Kang, S., and Cai, J. Marginal hazards model for case-cohort studies with multiple disease outcomes. Biometrika 96 (2009), 887–901.

[12] Kong, L., Cai, J., and Sen, P. K. Asymptotic results for fitting semiparametric transformation models to failure time data from case-cohort studies. Statistica Sinica 16 (2006), 135–151.

[13] Prentice, R. L. Linear rank tests with right censored data. Biometrika 65, 1 (1978), 167–180.

[14] Serfling, R. J. Approximation Theorems of Mathematical Statistics, vol. 162. John Wiley & Sons, 2009.

[15] Song, R., Zhou, H., and Kosorok, M. R. On semiparametric efficient inference for two-stage outcome dependent sampling with a continuous outcome. Biometrika 96, 1 (2009), 221–228.

[16] Weaver, M. A. Semiparametric methods for continuous outcome regression models with covariate data from an outcome dependent subsample.

[17] Ying, Z. A large sample study of rank estimation for censored regression data. The Annals of Statistics 21 (1993), 76–99.

[18] Yu, J., Liu, Y., Sandler, D. P., and Zhou, H. Statistical inference for the additive hazards model under outcome-dependent sampling. Statistical Society of Canada 43, 3 (2001), 436–453.

[19] Yu, J., Liu, Y., Sandler, D. P., and Zhou, H. Statistical inference for the additive hazards model under outcome-dependent sampling. The Canadian Journal of Statistics 43 (2015), 436–453.

[20] Zhou, H., Song, R., Wu, Y., and Qin, J. Statistical inference for a two-stage outcome-dependent sampling design with a continuous outcome. Biometrics 67, 1 (2011), 194–202.

[21] Zhou, H., and Weaver, M. A. An estimated likelihood method for continuous outcome regression models with outcome-dependent sampling. Journal of the American Statistical Association 100, 470 (2005), 459–469.

[22] Zhou, H., Weaver, M. A., Qin, J., Longnecker, M., and Wang, M. C. A semiparametric empirical likelihood method for data from an outcome-dependent sampling scheme with a continuous outcome. Biometrics 58, 2 (2002), 413–421.

[23] Zhou, H., Xu, W., Zeng, D., and Cai, J. Semiparametric inference for data with a continuous outcome from a two-phase probability-dependent sampling scheme. Journal of the Royal Statistical Society: Series B 76 (2014), 197–215.

[24] Zhu, H., and Wang, M.-C. Nonparametric inference on bivariate survival data with interval sampling: association estimation and testing. Biometrika 101, 3 (2014), 519–533.
