Statistical inference for failure time data from a two-phase probability-dependent sampling scheme


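Master's thesis, Master's Program, Department of Mathematics, National Taiwan Normal University
Advisor: Professor 呂翠珊
Statistical inference for failure time data from a two-phase probability-dependent sampling scheme
Graduate student: 陳世驊
July 2020 (Republic of China year 109)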

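Abstract (Chinese version)

Multiphase sampling designs are one way to improve estimation efficiency. This study considers a two-phase probability-dependent sampling design for survival data, in which the first phase is simple random sampling and the second phase is probability-dependent sampling. The aim of this approach is to select more informative samples under limited funding or resources. In simulation studies, the resulting estimates are compared with those from simple random sampling with the same sample size and from outcome-dependent sampling. The simulation results show that the estimator from the two-phase probability-dependent sampling design has better properties than the other two estimators. In addition, we develop the optimal two-phase probability-dependent sampling design for a fixed sample size. Finally, we apply the results to analyze the Busselton Health Study.

Keywords: two-phase probability-dependent sampling design; survival data; accelerated failure time model; optimal design.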

Abstract

Multiphase designs have been shown to be one approach to enhancing study efficiency. In this thesis, we consider a two-phase probability-dependent sampling scheme for failure time data, in which one selects a simple random sample in the first phase and targets more informative subjects based on a certain probability in the second phase. Simulation studies show that the proposed estimator outperforms two competing estimators, one from a simple random sample of the same sample size and the other from the outcome-dependent sampling design. We also develop the optimal allocation of the subsamples for the two-phase probability-dependent sampling scheme under a fixed sample size. We then apply the proposed design and estimator to the Busselton Health Study.

Keywords: two-phase probability-dependent sampling; failure time data; accelerated failure time model; optimal design.

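Acknowledgments

Time flies, and my master's studies, neither long nor short, are coming to an end. Many people deserve my thanks for this period. First, I sincerely thank my advisor, Professor 呂翠珊. When I entered the program, my thesis goal was unclear; with her guidance and advice I entered the field of biostatistics, developed my research objectives, and completed this thesis step by step. From regression analysis and survival analysis to, finally, categorical data analysis, including writing and applying programs and writing reports, I learned a great deal. While I was writing this thesis, she also gave me professional advice that made it more complete. I offer her my deepest respect and gratitude for her teaching. I also thank the other statistics faculty at the university: Professor 程毅豪, whose courses in mathematical statistics and Bayesian statistics gave me a new understanding through his explanations and introduced me to using and programming statistical software; and Professor 張少同, whose statistical computing and special-topics courses gave me my first experience of implementing algorithms and simulation analyses in statistical software. These courses turned what I had studied before into real statistical software analysis and application, and it was then that I understood how graduate study differs from undergraduate study. I also thank the members of my oral examination committee for taking time from their busy schedules: the internal member, Professor 蔡碧紋, and the external member, Professor 王藝華, gave many valuable suggestions during the defense, and their broad perspective was very rewarding. In addition, I thank my classmates; my cohort supported one another and discussed analyses together, and although our fields differed, we still helped each other. Finally, I thank my beloved family for their spiritual support and encouragement throughout my studies, which allowed me to reach my goal; I will work harder in the future to repay their care. It has been my honor and pride to be guided and helped by all of you during these three years. As I move on to the next stage of life, I thank everyone once again for your support and help; this period will be my motivation for the future and my most lasting memory.

陳世驊, Statistics Group, Department of Mathematics, National Taiwan Normal University, July 2020 (Republic of China year 109)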

Contents
1 Introduction
2 Design and Statistical Inference Method
2.1 AFT model and notations
2.2 Probability-dependent sampling
2.3 Statistical inference method
3 Asymptotic Properties
4 Simulation Studies
4.1 Data generation
4.2 Results
4.3 The optimal design
5 The Busselton Health Study Data Analysis
6 Conclusions and Discussions

List of Figures
4.1 The TARE in different sample sizes
5.1 The TARE of the Busselton Health Study

List of Tables
4.1 Results are based on the model $\log T = \beta_1 X + \beta_2 Z$, where $X \sim N(0, 1)$ and $Z \sim B(0.5)$, and the cutpoints for the design were the 0.3 and 0.7 sample quantiles.
4.2 Results are based on the model $\log T = \beta_1 X + \beta_2 Z$, where $X \sim N(0, 1)$ and $Z \sim B(0.5)$, and the cutpoints for the design were the 0.3 and 0.7 sample quantiles.
4.3 Results are based on the model $\log T = \beta_1 X + \beta_2 Z$, where $X \sim N(0, 1)$ and $Z \sim B(0.5)$, the sizes of the supplemental samples are selected by fixed proportions, and the cutpoints for the design were the 0.3 and 0.7 sample quantiles.
4.4 Results are based on the model $\log T = \beta_1 X + \beta_2 Z$, where $X \sim N(0, 1)$ and $Z \sim B(0.5)$, the sizes of the supplemental samples are selected by fixed proportions, and the cutpoints for the design were the 0.3 and 0.7 sample quantiles.
4.5 Results are based on the model $\log T = \beta_1 X + \beta_2 Z$, where $X \sim N(0, 1)$ and $Z \sim B(0.5)$, the sizes of the supplemental samples are selected by fixed proportions, and the cutpoints for the design were the 0.3 and 0.7 sample quantiles.

4.6 Results are based on the model $\log T = \beta_1 X + \beta_2 Z$, where $X \sim N(0, 1)$ and $Z \sim B(0.5)$, and the supplemental samples include all cases.
4.7 Results are based on the model $\log T = \beta_1 X + \beta_2 Z$, where $X \sim N(0, 1)$ and $Z \sim B(0.5)$, and the cutpoints for the design were the 0.1 and 0.7 sample quantiles.
4.8 Results are based on the model $\log T = \beta_1 X + \beta_2 Z$, where $X \sim N(0, 1)$, $Z \sim B(0.5)$ and $\varepsilon \sim \mathrm{Gumbel}(0, 1)$, and the cutpoints for the design were the 0.3 and 0.7 sample quantiles.
4.9 Results are based on the model $\log T = \beta_1 X + \beta_2 Z$, where $X \sim N(0, 1)$ and $Z \sim B(0.5)$, with a fixed total sample size and various proportions of $n_1$, and the cutpoints for the design were the 0.3 and 0.7 sample quantiles.
4.10 Results are based on the model $\log T = \beta_1 X + \beta_2 Z$, where $X \sim N(0, 1)$ and $Z \sim B(0.5)$, with a fixed total sample size and a varying proportion of the SRS size, and the cutpoints for the design were the 0.3 and 0.7 sample quantiles.
4.11 Results are based on the model $\log T = \beta_1 X + \beta_2 Z$, where $X \sim N(0, 1)$ and $Z \sim B(0.5)$, with a fixed total sample size and various proportions of the SRS size, and the cutpoints for the design were the 0.3 and 0.7 sample quantiles.
5.1 Results from the Busselton Health Study

Chapter 1 Introduction

In many epidemiological studies, relating the disease outcome to individual exposures and other characteristics plays a key role in understanding the determinants of human disease. Much of the cost of such studies is spent on measuring the main exposure variable, and large cohort studies can cost hundreds of millions of dollars to conduct, so cost-efficient designs and procedures are always desirable for investigators.

The outcome-dependent sampling (ODS) design (Zhou et al., 2002) is an appealing sampling scheme that enhances efficiency and reduces the cost of a study by allowing investigators to observe the exposure with a probability that depends on the outcome. The ODS design assumes that the values of the response are known for all subjects, whereas the exposure variable may be expensive or difficult to assess. The ODS design can be more efficient than simple random sampling because, when the outcome lies in its two distributional tails, the exposure values are also more likely to lie in their distributional tails. The advantage of the ODS design is that, while providing overall information about the population, it allows investigators to target sampling on regions of the population believed to be more informative. For example, if we are interested in the relationship between coronary heart disease and blood pressure, we are mainly concerned with abnormal blood pressure. For parameter estimation, Weaver (2001) and Zhou et al. (2002) developed semiparametric empirical likelihood inference procedures, Weaver and Zhou (2005) proposed a maximum estimated likelihood approach, and Chatterjee et al. (2003), Song et al. (2009) and Zhou et al. (2011b) developed inferential methods for the two-phase ODS design. Assume that the domain of the continuous outcome $Y$ can be partitioned into three mutually exclusive intervals, $(-\infty, y_L] \cup (y_L, y_U] \cup (y_U, \infty)$, where $y_L$ and $y_U$ are known cutpoints, for example, quantiles of $Y$. An ODS sample is then composed of three samples: (i) an overall simple random sample drawn from the underlying population; (ii) a supplemental sample selected from subjects with $Y < y_L$; and (iii) a supplemental sample selected from subjects with $Y > y_U$.

Zhou et al. (2014) proposed a new two-phase design. In the first phase, a simple random sample is drawn and a model is fitted to the first-phase data. This model predicts the probability that the exposure variable lies in its tails, and the supplemental samples are selected in the second phase with a probability-dependent sampling (PDS) scheme. If an investigator knows the interval in which each individual's exposure value falls, supplemental samples can be drawn directly from those intervals, for example, the upper or lower tail segments. However, the exposure variable may be expensive or difficult to assess. Therefore, the response and other covariates are used to predict the exposure value and to estimate the probability that the predicted value lies in the distributional tails. Their two-phase PDS is outlined as follows. Let $Y$ denote the response variable, $X$ the primary exposure variable and $Z$ the collection of all other covariates. In the first phase of the PDS, a simple random sample is drawn from the underlying population, for which $(X, Y, Z)$ are observed. The domain of $X$ is partitioned into three mutually exclusive intervals, $(-\infty, x_L] \cup (x_L, x_U] \cup (x_U, \infty)$, where $x_L$ and $x_U$ are fixed and known cutpoints. A new $\tilde X$ is predicted conditional on $Y = y$, $Z = z$, and the probabilities that it falls in $(-\infty, x_L]$ and $(x_U, \infty)$ are estimated by $\hat\phi_1(y, z) = \widehat{\Pr}(\tilde X < x_L \mid Y, Z)$ and $\hat\phi_3(y, z) = \widehat{\Pr}(\tilde X > x_U \mid Y, Z)$. A conditional probability model, for example, a linear or logistic regression model, can be fitted to the first-phase data to compute the predicted $\tilde X$. In the second phase, the supplemental samples are drawn from those who are likely to have high or low $X$ values, since they are considered more informative about the exposure. For example, the supplemental sample can be selected from those with $\hat\phi_1(y, z) > 80\%$ or $\hat\phi_3(y, z) > 80\%$. The resulting observed data over-represent individuals who are more likely to lie in the tails of the distribution of $X$. $X$ is ascertained in the second phase of the design, where a subsample is drawn randomly without replacement from the first-phase cohort. The ODS and PDS designs differ in that (a) the second phase of the ODS design depends on the outcome $Y$ but not on $Z$, whereas the PDS design allows $Z$ to enter the second-phase selection; and (b) the two-phase PDS design allows not only a continuous $Y$ but also a $Z$ of any dimension in the second-phase selection.

For failure time data, Ding et al. (2014) proposed an ODS scheme under the proportional hazards model and developed a semiparametric empirical likelihood estimator. Similarly, Yu et al. (2015) considered another survival model, the additive hazards model, and developed a weighted pseudo-score estimator for the regression parameters under their design, together with the optimal design for a fixed total sample size. However, these models were developed for ODS data; statistical procedures and inference for time-to-event data under the PDS design have not yet been developed.

In this thesis, we consider a two-phase PDS design and use a linear regression model to estimate the conditional probabilities from the simple random sample drawn from the underlying cohort in the first phase. We then draw the supplemental samples in the second phase depending not only on the estimated probability $\hat\phi_k$ but also on whether the subject failed. We fit the data obtained from this two-phase PDS scheme with the accelerated failure time (AFT) model. The AFT model is appealing because it is analogous to classical linear regression, directly linking the expected failure time to covariates. Although the AFT model has nice properties, it is less popular in practice than Cox's proportional hazards model and the additive hazards model because efficient and reliable computational methods have been lacking. Fortunately, the R package aftgee (Chiou, Kang, and Yan, 2014c) can now be used to analyze survival data under the AFT model, and computing the variance estimator is no longer time-consuming (Chiou, Kang, and Yan, 2014c). This package provides a comprehensive set of tools for the semiparametric AFT model in practical survival analysis.

The remainder of this thesis is organized as follows. In Chapter 2, we introduce the notation, the AFT model formulation and the data structure for failure times under the two-phase PDS design. The asymptotic properties of the proposed estimator are established in Chapter 3. In Chapter 4, we conduct simulation studies to compare its efficiency with other competing designs and their corresponding estimators, and we develop a formula for calculating the optimal allocation of subsamples. In Chapter 5, we apply the proposed method to analyze the Busselton Health Study data. We conclude with a discussion in Chapter 6. The proof of the theoretical results is outlined in the Appendix.
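The three-component ODS sample described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the thesis: the helper name `ods_sample`, drawing the supplements only from subjects not already in the SRS, and using sample quantiles of a simulated outcome as $y_L$ and $y_U$ are all assumptions.

```python
import random

def ods_sample(y, n0, n1, n3, y_lo, y_hi, rng):
    """Hypothetical helper: draw the three ODS components as cohort indices.

    (i)   an overall simple random sample of size n0,
    (ii)  a supplemental sample of size n1 from subjects with Y < y_lo,
    (iii) a supplemental sample of size n3 from subjects with Y > y_hi.
    Supplements are drawn without replacement from subjects not already in the SRS.
    """
    idx = list(range(len(y)))
    srs = rng.sample(idx, n0)
    in_srs = set(srs)
    rest = [i for i in idx if i not in in_srs]
    lower = [i for i in rest if y[i] < y_lo]
    upper = [i for i in rest if y[i] > y_hi]
    supp1 = rng.sample(lower, min(n1, len(lower)))
    supp3 = rng.sample(upper, min(n3, len(upper)))
    return srs, supp1, supp3

# Example: cutpoints at the 0.3 and 0.7 sample quantiles of a simulated outcome.
rng = random.Random(1)
y = [rng.gauss(0.0, 1.0) for _ in range(2000)]
ys = sorted(y)
y_lo, y_hi = ys[int(0.3 * len(ys))], ys[int(0.7 * len(ys))]
srs, s1, s3 = ods_sample(y, 100, 10, 10, y_lo, y_hi, rng)
```

Because the two tail pools are defined only through $Y$, this selection ignores $Z$, which is exactly the contrast with the PDS design drawn in point (a) above.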

Chapter 2 Design and Statistical Inference Method

2.1 AFT model and notations

Let $\tilde T$ and $C$ denote the failure time and the corresponding censoring time. Because of right censoring, we only observe $T = \min(\tilde T, C)$ and $\delta = I(\tilde T \le C)$, where $I(\cdot)$ is the indicator function. Let $X$ be the exposure of main interest and $Z$ be any continuous or categorical covariates that are easy or cheap to measure (e.g., age, sex, smoking). Given the covariates $X$ and $Z$, $\tilde T$ and $C$ are assumed to be independent. We consider the following accelerated failure time (AFT) model:

$$\log(\tilde T) = \beta_0 + \beta_1 X + \beta_2 Z + \varepsilon, \tag{2.1}$$

where $\beta_0$, $\beta_1$ and $\beta_2$ are unknown regression parameters to be estimated, and $\varepsilon$ is the random error with an unspecified distribution function $F$ and corresponding density function $f$.
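To make the observed-data structure $(T, \delta, X, Z)$ concrete, here is a minimal Python sketch that simulates from model (2.1) under right censoring. It borrows the settings stated in Section 4.1 ($\beta_0 = 2$, $\beta_1 = \beta_2 = 1$, $X \sim N(0,1)$, $Z \sim$ Bernoulli(0.5), standard normal $\varepsilon$, $C \sim U[0, c]$); the function names are hypothetical and the thesis itself does not give this code.

```python
import math
import random

def simulate_aft(n, c, beta=(2.0, 1.0, 1.0), seed=None):
    """Simulate (T, delta, X, Z) under log T~ = b0 + b1*X + b2*Z + eps.

    X ~ N(0, 1), Z ~ Bernoulli(0.5), eps ~ N(0, 1) and C ~ Uniform[0, c],
    mirroring the settings described in Section 4.1.
    """
    rng = random.Random(seed)
    b0, b1, b2 = beta
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = 1 if rng.random() < 0.5 else 0
        t_true = math.exp(b0 + b1 * x + b2 * z + rng.gauss(0.0, 1.0))
        cens = rng.uniform(0.0, c)
        t_obs = min(t_true, cens)             # T = min(T~, C)
        delta = 1 if t_true <= cens else 0    # delta = I(T~ <= C)
        data.append((t_obs, delta, x, z))
    return data

def censoring_rate(data):
    """Fraction of subjects with delta = 0."""
    return 1.0 - sum(d for _, d, _, _ in data) / len(data)

# c = 19.55 and c = 7.18 should give roughly 60% and 80% censoring.
rate_60 = censoring_rate(simulate_aft(20000, c=19.55, seed=2020))
rate_80 = censoring_rate(simulate_aft(20000, c=7.18, seed=2020))
```

With a large simulated cohort, the empirical censoring rates land near the approximately 60% and 80% targets quoted in Section 4.1, which is a quick sanity check on the chosen $c$ values.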

2.2 Probability-dependent sampling

Suppose that there are $N$ subjects in the underlying population. Let $x_L$ and $x_U$ ($x_L < x_U$) be known constants that partition the domain of $X$ into three mutually exclusive intervals: $A_1 \cup A_2 \cup A_3 = (-\infty, x_L] \cup (x_L, x_U] \cup (x_U, \infty)$. The proposed two-phase PDS scheme is as follows. In the first phase, we observe $(\tilde T, \delta, X, Z)$ for a simple random sample (SRS) of size $n_0$ from the underlying study population. A model for $E(X \mid \log\tilde T, Z)$ is then fitted to this SRS, and $\phi_1(\log\tilde T, Z) = \Pr(X \in A_1 \mid \log\tilde T, Z)$ and $\phi_3(\log\tilde T, Z) = \Pr(X \in A_3 \mid \log\tilde T, Z)$ are estimated. In the second phase, we use constants $c_1$ and $c_3$, where $0 < c_1, c_3 < 1$, to draw two supplemental random samples: one from the failures ($\delta = 1$) whose predicted probability $\hat\phi_1 = \widehat{\Pr}(X \in A_1 \mid \log\tilde T, Z)$ satisfies $\hat\phi_1 \ge c_1$, and the other from the failures whose $X$ values are more likely to lie in the upper tail, i.e., those with $\hat\phi_3 = \widehat{\Pr}(X \in A_3 \mid \log\tilde T, Z) \ge c_3$ and $\delta = 1$. The resulting data structure for the proposed two-phase PDS is:

(i) simple random sample: $(\tilde T_{0i}, \delta_{0i}, X_{0i}, Z_{0i})$, $i = 1, \ldots, n_0$;
(ii) supplemental sample I: $(\tilde T_{1i}, \delta_{1i}, X_{1i}, Z_{1i})$ with $\hat\phi_1(\log\tilde T_{1i}, Z_{1i}) \ge c_1$ and $\delta_{1i} = 1$, $i = 1, \ldots, n_1$;
(iii) supplemental sample II: $(\tilde T_{3i}, \delta_{3i}, X_{3i}, Z_{3i})$ with $\hat\phi_3(\log\tilde T_{3i}, Z_{3i}) \ge c_3$ and $\delta_{3i} = 1$, $i = 1, \ldots, n_3$.
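The second-phase draw can be sketched in Python as follows. This is an illustration under stated assumptions, not the thesis's implementation: it assumes $\hat\gamma$ and $\hat\sigma$ have already been estimated from the SRS with the normal linear working model for $X$ given $(\log\tilde T, Z)$ that the thesis uses (equation (2.2)), and the helper names `phi_hat` and `pds_phase_two` are hypothetical.

```python
import math
import random

def norm_cdf(v):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def phi_hat(log_t, z, gamma, sigma, x_lo, x_hi):
    """Estimated tail probabilities (phi1_hat, phi3_hat) under the working model
    X = g0 + g1*log T + g2*Z + error, error ~ N(0, sigma^2)."""
    mean = gamma[0] + gamma[1] * log_t + gamma[2] * z
    phi1 = norm_cdf((x_lo - mean) / sigma)          # Pr(X <= x_L | log T, Z)
    phi3 = 1.0 - norm_cdf((x_hi - mean) / sigma)    # Pr(X >  x_U | log T, Z)
    return phi1, phi3

def pds_phase_two(pool, gamma, sigma, x_lo, x_hi, c1, c3, n1, n3, rng):
    """pool: (subject_id, T, delta, Z) for cohort members outside the SRS whose X
    is not yet measured. Returns the ids for supplemental samples I and II."""
    elig1, elig3 = [], []
    for sid, t, delta, z in pool:
        if delta != 1:                              # only failures are eligible
            continue
        p1, p3 = phi_hat(math.log(t), z, gamma, sigma, x_lo, x_hi)
        if p1 >= c1:
            elig1.append(sid)
        elif p3 >= c3:                              # elif keeps the two pools disjoint
            elig3.append(sid)
    supp1 = rng.sample(elig1, min(n1, len(elig1)))  # without replacement
    supp3 = rng.sample(elig3, min(n3, len(elig3)))
    return supp1, supp3

# Toy pool: gamma = (0, 1, 0), sigma = 0.5, cutpoints x_L = -1, x_U = 1.
rng = random.Random(7)
pool = [(0, math.exp(-3), 1, 0), (1, math.exp(-3), 0, 0),
        (2, math.exp(3), 1, 0), (3, 1.0, 1, 0)]
s1, s3 = pds_phase_two(pool, (0.0, 1.0, 0.0), 0.5, -1.0, 1.0, 0.8, 0.8, 5, 5, rng)
# s1 == [0] (low predicted X), s3 == [2] (high predicted X); subject 1 is censored
```

With $c_1, c_3$ above 0.5, as in the thesis's 80% and 90% settings, a subject cannot qualify for both pools because $\hat\phi_1 + \hat\phi_3 \le 1$; the `elif` only matters for smaller thresholds.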

The total sample size in the PDS design is $n = n_0 + n_1 + n_3$. Without loss of generality, we consider the case where $X$ is a continuous covariate that is approximately normally distributed after a proper transformation. Then a linear regression model can be fitted to the SRS data drawn in the first phase,

$$X = \gamma_0 + \gamma_1 \log\tilde T + \gamma_2 Z + \epsilon, \quad \epsilon \sim N(0, \sigma_1^2). \tag{2.2}$$

Hence $\phi_k(\log\tilde T, Z) = \Pr(X \in A_k \mid \log\tilde T, Z)$ can be estimated from this linear regression model:

$$\hat\phi_1(\log\tilde T, Z) = \Phi\left(\frac{x_L - (\hat\gamma_0 + \hat\gamma_1 \log\tilde T + \hat\gamma_2 Z)}{\hat\sigma_1}\right) \tag{2.3}$$

and

$$\hat\phi_3(\log\tilde T, Z) = 1 - \Phi\left(\frac{x_U - (\hat\gamma_0 + \hat\gamma_1 \log\tilde T + \hat\gamma_2 Z)}{\hat\sigma_1}\right), \tag{2.4}$$

where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution, and the $\hat\gamma$'s and $\hat\sigma_1$ are estimated from the linear regression model.

2.3 Statistical inference method

To simplify notation, we further define $\mathbf{X}_i = (X_i, Z_i)^\top$, $\beta = (\beta_1, \beta_2)^\top$ and $e_i(\beta) = \log T_i - \beta^\top \mathbf{X}_i$, $i = 1, \ldots, N$. When the data are completely observed, the regression parameters in (2.1) can be estimated by the following rank-based estimating equations:

$$U_{n,\varphi}(\beta) = \sum_{i=1}^{n} \varphi_i(\beta)\,\delta_i \left[\mathbf{X}_i - \frac{\sum_{j=1}^{n} I\{e_j(\beta) \ge e_i(\beta)\}\,\mathbf{X}_j}{\sum_{j=1}^{n} I\{e_j(\beta) \ge e_i(\beta)\}}\right] = 0, \tag{2.5}$$

where $\varphi_i(\beta)$ is a possibly data-dependent nonnegative weight function with values between 0 and 1. Common choices of $\varphi_i(\beta)$ are $1$ and $n^{-1}\sum_{j=1}^{n} I\{e_j(\beta) \ge e_i(\beta)\}$, corresponding to the log-rank (Prentice, 1978) and Gehan (Gehan, 1965) weights, respectively. A computationally more efficient approach is the induced smoothing procedure of Brown and Wang (2005, 2007), which replaces $U_{n,\varphi}(\beta)$ with $E_Z\{U_{n,\varphi}(\beta + N^{-1/2}\Gamma_n Z)\}$, where $Z$ here denotes a standard normal random vector (not the covariate) and the expectation is taken with respect to $Z$. With Gehan's weight, the resulting smoothed estimating equation is

$$U_{n,G}(\beta) = \sum_{i=1}^{n}\sum_{j=1}^{n} \delta_i(\mathbf{X}_i - \mathbf{X}_j)\,\Phi\left[\frac{e_j(\beta) - e_i(\beta)}{r_{ij}}\right] = 0, \tag{2.6}$$

where $r_{ij}^2 = (\mathbf{X}_i - \mathbf{X}_j)^\top \Sigma_n (\mathbf{X}_i - \mathbf{X}_j)$, $\Sigma_n$ is typically set to the $p$-dimensional identity matrix (Brown and Wang, 2007), and $\Phi(\cdot)$ is the standard normal cumulative distribution function. The estimating equation (2.6) is monotone and continuously differentiable with respect to $\beta$. Hence, its root can be found with standard numerical methods such as the Barzilai-Borwein spectral method. Even with the smoothed equations, the resampling procedure can still be very time consuming.

For the weights in our proposed two-phase PDS design, we modify the inverse probability weight of Yu et al. (2015). The inverse probability weight for the PDS design is

$$\omega_i = \frac{\xi_i(1-\delta_i)}{\rho_0\rho_V} + \frac{\xi_i\delta_i(1-\zeta_i)}{\rho_0\rho_V} + \xi_i\delta_i\zeta_i + (1-\xi_i)\,\delta_i \sum_{k=1,3} \frac{\pi_k(1-\rho_0\rho_V)\,\zeta_{i,k}\,\eta_{i,k}}{\rho_k\rho_V}, \quad i = 1, \ldots, N,$$

where $\rho_0$, $\rho_V$, $\rho_1$, $\rho_3$ are the limits of $n_0/n$, $n/N$, $n_1/n$, $n_3/n$, respectively; $\zeta_{i,k} = I(X_i \in A_k)$; $\eta_{i,k} = 1$ or $0$ indicates whether the $i$-th subject from stratum $A_k$ is selected into the supplemental failures; and $\pi_k = \Pr(\phi_k(\log\tilde T, Z) \ge c_k, \delta_i = 1)$ is the limit of the number of failed cases in the interval divided by the number of subjects in the second-phase design. We implement the following weights in our PDS design: (i) nonvalidation samples are eliminated by setting $\omega_i = 0$; (ii) the selected censored subjects have the inverse of the sampling probability, $(\rho_0\rho_V)^{-1}$, as their weight; (iii) the cases in the supplemental samples are weighted by $\pi_k(1-\rho_0\rho_V)/(\rho_k\rho_V)$; (iv) the subcohort cases drawn are weighted as 1 if they belong to $A_k$ ($k = 1, \ldots, K$), and as $(\rho_0\rho_V)^{-1}$ otherwise. The weighted version of equation (2.6) can then be written as

$$U_{\tilde n,G}(\beta) = \sum_{i=1}^{n}\sum_{j=1}^{n} \omega_i\,\delta_i(\mathbf{X}_i - \mathbf{X}_j)\,\Phi\left[\frac{e_j(\beta) - e_i(\beta)}{r_{ij}}\right] = 0. \tag{2.7}$$

The regression coefficients $\beta$ can be estimated by solving this equation. We estimate them with the R package aftgee (Chiou, Kang, and Yan, 2014c).
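The weighting rules (i)-(iv) and the smoothed weighted Gehan function (2.7) can be sketched in pure Python, as a reading aid rather than a replacement for aftgee. Several pieces are assumptions: `xi_i` is read as the indicator that subject $i$ is in the first-phase SRS, `zeta_i` as $I(X_i \in A_1 \cup A_3)$, $\Sigma_n$ is the identity, and the root is found by bisection for a single covariate (using that the function is nondecreasing in $\beta$ when $p = 1$) instead of the Barzilai-Borwein method. All function names are hypothetical.

```python
import math

def norm_cdf(v):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def pds_weight(in_srs, delta, stratum, selected, rho0, rho_v, rho, pi):
    """Inverse-probability weight omega_i for one subject (hypothetical helper).

    in_srs   -- xi_i: True if the subject is in the first-phase SRS (assumed meaning)
    stratum  -- k with X_i in A_k (1, 2 or 3); zeta_i assumed to be I(k in {1, 3})
    selected -- eta_{i,k}: True if drawn into a supplemental sample
    rho, pi  -- dicts keyed by k in {1, 3} holding rho_k and pi_k
    """
    inv_srs = 1.0 / (rho0 * rho_v)
    tail = stratum in (1, 3)
    if in_srs:
        if delta == 0:
            return inv_srs                      # (ii) censored SRS subjects
        return 1.0 if tail else inv_srs         # (iv) SRS failures
    if delta == 1 and tail and selected:        # (iii) supplemental failures
        return pi[stratum] * (1.0 - rho0 * rho_v) / (rho[stratum] * rho_v)
    return 0.0                                  # (i) non-validation subjects

def weighted_gehan_u(beta, log_t, delta, x, w):
    """Smoothed weighted Gehan function (2.7) with Sigma_n = identity.
    x is a list of covariate vectors; returns a list of length p."""
    n, p = len(log_t), len(x[0])
    e = [log_t[i] - sum(b * v for b, v in zip(beta, x[i])) for i in range(n)]
    u = [0.0] * p
    for i in range(n):
        if delta[i] == 0 or w[i] == 0.0:
            continue                            # only weighted failures contribute
        for j in range(n):
            d = [x[i][k] - x[j][k] for k in range(p)]
            r = math.sqrt(sum(dk * dk for dk in d))
            if r == 0.0:
                continue                        # X_i == X_j adds nothing
            s = w[i] * norm_cdf((e[j] - e[i]) / r)
            for k in range(p):
                u[k] += s * d[k]
    return u

def solve_one_covariate(log_t, delta, x, w, lo=-10.0, hi=10.0, tol=1e-8):
    """Bisection root of (2.7) when p = 1; U(beta) is nondecreasing in beta."""
    def f(b):
        return weighted_gehan_u([b], log_t, delta, x, w)[0]
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid < 0.0) == (f_lo < 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Noise-free check: log T = 1 + 2 X, all failures, unit weights -> root at beta = 2.
xs = [[-1.5], [-0.5], [0.3], [1.2], [2.0]]
log_t = [1.0 + 2.0 * v[0] for v in xs]
beta_hat = solve_one_covariate(log_t, [1] * 5, xs, [1.0] * 5)
```

The intercept drops out of the rank-based function because only differences $e_j(\beta) - e_i(\beta)$ enter, so the noise-free example recovers the slope 2 even though the data contain an intercept of 1.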

Chapter 3 Asymptotic Properties

In this chapter, we state the consistency and asymptotic normality of the proposed estimator $\hat\beta$. We assume the following conditions:

(i) the parameter space $\mathcal{B}$ containing $\beta_0$ is a compact subset of $\mathbb{R}^p$;
(ii) $\sum_{k=1}^{K}\|Z_i\| + K$ is bounded almost surely by a nonrandom constant ($i = 1, \ldots, N$);
(iii) $\mathrm{Var}(\varepsilon_i) < \infty$;
(iv) the matrix $\tilde A(\beta_0) = \lim_{N \to \infty} \partial U(\beta_0)/\partial\beta^\top$ is nonsingular;
(v) let $f_0(\cdot)$ denote the marginal density associated with the model error term $\varepsilon$; then $f_0(\cdot)$ and $f_0'(\cdot)$ are bounded functions on $\mathbb{R}$ with
$$\int_{\mathbb{R}} \left(\frac{f_0'(t)}{f_0(t)}\right)^2 f_0(t)\,dt < \infty;$$
(vi) the marginal distribution of $C_i$ is absolutely continuous and has a bounded density $g_i(\cdot)$ on $\mathbb{R}$ for $i = 1, \ldots, N$;

(vii) as $N \to \infty$, $p_m \to \tilde p_m$ ($0 < \tilde p_m < 1$) and $r_m = n_m/(N_m - n_{0,m}) \to \tilde r_m$ ($0 < \tilde r_m < 1$) for $m = 0, 1, 2$, where $p_m$ denotes the SRS portion ($m = 0$) and the supplemental portions ($m = 1, 2$), $r_m$ denotes the sampling probabilities for the supplemental components, and $N_m$ and $n_{0,m}$ are the sizes of the full cohort and of the SRS sample in $A_m$.

Conditions (i)-(vi) are identical to those imposed by Chiou, Kang and Yan (2015). Condition (vii) is added to ensure the desired asymptotic convergence of the PDS samples.

Theorem. Under conditions (i)-(vii), $\hat\beta_{PDS}$ is strongly consistent, and $\sqrt{N}(\hat\beta_{PDS} - \beta_0)$ converges in distribution to a normal distribution with mean zero and covariance matrix

$$A^{-1}(\beta_0)\left\{\Sigma_F(\beta_0) + \Sigma_S(\beta_0)\right\}\left\{A^{-1}(\beta_0)\right\}^\top.$$

The proof is provided in the Appendix.
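The limiting covariance in the Theorem has the usual sandwich form. The following Python sketch only shows how such a matrix is assembled and turned into approximate standard errors for $(\beta_1, \beta_2)$; the input matrices are placeholders, since estimating $A(\beta_0)$, $\Sigma_F(\beta_0)$ and $\Sigma_S(\beta_0)$ is deferred to the Appendix, and the helper names are hypothetical.

```python
import math

def mat_inv_2x2(a):
    """Inverse of a 2x2 matrix; A(beta0) must be nonsingular (condition (iv))."""
    (p, q), (r, s) = a
    det = p * s - q * r
    if det == 0:
        raise ValueError("A(beta0) is singular")
    return [[s / det, -q / det], [-r / det, p / det]]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def sandwich_cov(a, sigma_f, sigma_s):
    """A^{-1} (Sigma_F + Sigma_S) A^{-T} for the two parameters (beta1, beta2)."""
    a_inv = mat_inv_2x2(a)
    middle = [[sigma_f[i][j] + sigma_s[i][j] for j in range(2)] for i in range(2)]
    return mat_mul(mat_mul(a_inv, middle), transpose(a_inv))

# Placeholder inputs; the covariance of beta_hat itself is V / N.
v = sandwich_cov([[2.0, 0.0], [0.0, 4.0]], [[0.5, 0.0], [0.0, 0.5]], [[0.5, 0.0], [0.0, 0.5]])
n_cohort = 2000
se = [math.sqrt(v[k][k] / n_cohort) for k in range(2)]
```

Since the Theorem is stated for $\sqrt{N}(\hat\beta_{PDS} - \beta_0)$, the sketch divides by the cohort size $N$ when converting the sandwich matrix into standard errors.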

Chapter 4 Simulation Studies

4.1 Data generation

In this chapter, we conduct simulation studies to assess the finite-sample properties of the proposed estimators. We consider the AFT model $\log\tilde T = \beta_0 + \beta_1 X + \beta_2 Z + \varepsilon$, where the covariate $X$ is generated from a standard normal distribution, $Z$ from a Bernoulli distribution with $P(Z = 1) = 0.5$, and $\varepsilon$ from a standard normal distribution, so that, given $X$ and $Z$, $\tilde T$ follows a log-normal distribution. We set $\beta_0 = 2$, $\beta_1 = 1$ and $\beta_2 = 1$ for data generation. The censoring time $C$ is generated from a uniform distribution on $[0, c]$, with $c$ chosen to achieve the desired percentage of censoring. We consider censoring rates of approximately 60% and 80%, with corresponding $c$ values of 19.55 and 7.18, respectively. For the PDS design, we partition all the cases into three strata by the quantiles $q_1$ and $q_3$ of $X$. We then randomly select an SRS of size $n_0$ from a cohort of $N = 2000$. We consider two pairs of cutpoints, the (0.3, 0.7) and (0.1, 0.9) quantiles, to investigate the impact of different cutpoints on the PDS design. In the second phase, we set $c_1 = c_3 = 80\%$ or $90\%$, and denote the resulting estimators by $\hat\beta_{PDS1}$ (80%) and $\hat\beta_{PDS2}$ (90%). We compare $\hat\beta_{PDS1}$ and $\hat\beta_{PDS2}$ with two competing estimators: (i) $\hat\beta_{SRS}$, from a simple random sample with the same sample size as the PDS design, and (ii) $\hat\beta_{ODS}$, the estimator from the ODS design (Zhou et al., 2002). To obtain the ODS sample, we partition all the cases into three strata, $B_1 \cup B_2 \cup B_3$, by the quantiles of their failure times. The cutpoints are either the (0.3, 0.7) or the (0.1, 0.9) quantiles, as in the PDS design. The supplemental samples of the ODS design are drawn from subjects with $\tilde T \in B_k$, $k = 1, 3$, and $\delta = 1$. We compute the estimators with the R package aftgee. Results are based on 2000 independent simulation runs.

4.2 Results

In Tables 4.1 and 4.2, we fix the size of the simple random sample and consider balanced sizes for the two tails in Table 4.1 and unbalanced sizes in Table 4.2. We summarize our findings below.

(i) $\hat\beta_{PDS1}$, $\hat\beta_{PDS2}$, $\hat\beta_{ODS}$ and $\hat\beta_{SRS}$ are all consistent.
(ii) The sample standard deviations are close to the averages of the estimated standard errors over the 2000 simulations, as expected. We therefore compare the estimators by their estimated standard errors.
(iii) $\hat\beta_{PDS1}$, $\hat\beta_{PDS2}$ and $\hat\beta_{ODS}$ are more efficient than $\hat\beta_{SRS}$, since $\hat\beta_{SRS}$ has the largest standard error.

(iv) As the overall sample size increases, $\hat\beta_{PDS1}$ and $\hat\beta_{PDS2}$ become more efficient than $\hat\beta_{ODS}$, because the standard errors of $\hat\beta_{ODS}$ are larger than those of $\hat\beta_{PDS1}$ and $\hat\beta_{PDS2}$.
(v) As the sample size increases from 100 to 300, all the standard errors decrease, as expected.
(vi) In all designs, the standard error of $\hat\beta_{PDS2}$ is smaller than that of $\hat\beta_{PDS1}$, so $\hat\beta_{PDS2}$ is more efficient than $\hat\beta_{PDS1}$.

Because of the low number of failures, we also consider drawing the supplemental samples by fixed proportions. In Tables 4.3, 4.4 and 4.5, we present results for different balanced and unbalanced sampling proportions of the supplemental samples within the PDS sample. Note that the total sample size is not fixed. We compare the proposed estimators with the simple random sample estimator and summarize our findings below.

(i) The standard errors of $\hat\beta_{PDS1}$ and $\hat\beta_{PDS2}$ are smaller than those of $\hat\beta_{SRS}$, except in some cases with a censoring rate of 60%. Thus, in most settings, $\hat\beta_{PDS1}$ and $\hat\beta_{PDS2}$ are more efficient than $\hat\beta_{SRS}$.
(ii) The bias for the proportions (0.5, 0.8) is smaller than for the other proportions.

In Table 4.6, we consider the situation where the supplemental samples are drawn from those who failed much earlier or much later, i.e., with cutpoints (0.1, 0.9). We note that (i) with cutpoints (0.1, 0.9) and a censoring rate of 80%, the biases of $\hat\beta_{PDS1}$ and $\hat\beta_{PDS2}$ are smaller than those with a censoring rate of 60%; (ii) the efficiency gains are higher when the cutpoints are further out, i.e., (0.1, 0.9);

(iii) as the simple random sample size increases, the number of failures eligible for the supplemental samples becomes much smaller, so cases can rarely be drawn with the (0.1, 0.9) quantile cutpoints. We therefore use the (0.2, 0.8) quantile cutpoints when the simple random sample size is greater than 200; the results are similar to those above; (iv) under some settings, such as an SRS size of 100 and a censoring rate of 60%, $\hat\beta_{PDS}$ has slightly larger standard errors than $\hat\beta_{SRS}$. We discuss this further later.

In Table 4.7, we consider a design with asymmetric cutpoint quantiles, (0.1, 0.7). Comparing the results with Table 4.1, we summarize our findings below.

(i) $\hat\beta_{PDS1}$ and $\hat\beta_{PDS2}$ are still more efficient than $\hat\beta_{SRS}$, since $\hat\beta_{SRS}$ has the largest standard error under both symmetric and asymmetric cutpoints.
(ii) The bias of $\hat\beta_{PDS}$ with the (0.1, 0.7) quantiles is smaller than with the (0.3, 0.7) quantiles.
(iii) The bias of $\hat\beta_{ODS}$ with the (0.1, 0.7) quantiles is larger than with the (0.3, 0.7) quantiles.
(iv) When the simple random sample size is 300, it is hard to draw the supplemental sample of failures with the 90% probability threshold. We therefore use the (0.2, 0.7) quantile cutpoints instead; most results are similar to those with the (0.1, 0.7) quantiles.

We also consider errors $\varepsilon$ following a Gumbel distribution in Table 4.8 and compare the results with those under the normal error distribution. The results are very similar to those in Table 4.1, which suggests that the proposed estimators are robust to the error distribution.
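The efficiency comparisons above are read off the estimated standard errors. As a small worked example, one common way to summarize such a comparison is the variance ratio $(\mathrm{SE}_{SRS}/\mathrm{SE}_{new})^2$; this summary is our assumption and is not necessarily the TARE measure shown in Figure 4.1. The snippet uses the $\beta_1$ ESE values from Table 4.1 for $(n_0, n_1, n_3) = (100, 10, 10)$ and 60% censoring.

```python
# beta_1 ESE values from Table 4.1: (n0, n1, n3) = (100, 10, 10), censoring 60%
ese_beta1 = {"SRS": 0.137, "ODS": 0.119, "PDS1": 0.122, "PDS2": 0.119}

def relative_efficiency(se_ref, se_new):
    """Variance ratio Var(reference) / Var(new); values above 1 favour the new estimator."""
    return (se_ref / se_new) ** 2

for name in ("ODS", "PDS1", "PDS2"):
    print(name, round(relative_efficiency(ese_beta1["SRS"], ese_beta1[name]), 2))
# prints: ODS 1.33, PDS1 1.26, PDS2 1.33 (one per line)
```

A ratio of about 1.3 means the SRS would need roughly 30% more subjects to match the variance of the competing estimator in this particular setting.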

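Table 4.1: Results are based on the model $\log T = \beta_1 X + \beta_2 Z$, where $X \sim N(0, 1)$ and $Z \sim B(0.5)$, and the cutpoints for the design were the 0.3 and 0.7 sample quantiles. True values: $\beta_1 = 1$, $\beta_2 = 1$.

| $(n_0, n_1, n_3)$ | Censoring | Estimator | $\beta_1$ Mean | $\beta_1$ ESE | $\beta_1$ SSD | $\beta_2$ Mean | $\beta_2$ ESE | $\beta_2$ SSD |
|---|---|---|---|---|---|---|---|---|
| (100,10,10) | 0.6 | $\hat\beta_{SRS}$ | 1.017 | 0.137 | 0.141 | 1.017 | 0.249 | 0.248 |
| | | $\hat\beta_{ODS}$ | 0.989 | 0.119 | 0.119 | 1.011 | 0.239 | 0.240 |
| | | $\hat\beta_{PDS1}$ | 1.113 | 0.122 | 0.124 | 1.015 | 0.250 | 0.223 |
| | | $\hat\beta_{PDS2}$ | 1.104 | 0.119 | 0.123 | 1.076 | 0.243 | 0.235 |
| | 0.8 | $\hat\beta_{SRS}$ | 1.026 | 0.189 | 0.210 | 1.021 | 0.337 | 0.376 |
| | | $\hat\beta_{ODS}$ | 0.981 | 0.133 | 0.130 | 1.027 | 0.259 | 0.269 |
| | | $\hat\beta_{PDS1}$ | 1.097 | 0.153 | 0.157 | 1.037 | 0.306 | 0.283 |
| | | $\hat\beta_{PDS2}$ | 1.096 | 0.150 | 0.162 | 1.051 | 0.304 | 0.301 |
| (200,10,10) | 0.6 | $\hat\beta_{SRS}$ | 1.006 | 0.099 | 0.100 | 1.003 | 0.180 | 0.183 |
| | | $\hat\beta_{ODS}$ | 0.983 | 0.106 | 0.105 | 1.011 | 0.211 | 0.209 |
| | | $\hat\beta_{PDS1}$ | 1.099 | 0.093 | 0.091 | 1.027 | 0.182 | 0.169 |
| | | $\hat\beta_{PDS2}$ | 1.085 | 0.087 | 0.089 | 1.077 | 0.173 | 0.171 |
| | 0.8 | $\hat\beta_{SRS}$ | 1.010 | 0.140 | 0.147 | 1.020 | 0.249 | 0.259 |
| | | $\hat\beta_{ODS}$ | 0.978 | 0.114 | 0.113 | 1.017 | 0.219 | 0.231 |
| | | $\hat\beta_{PDS1}$ | 1.086 | 0.118 | 0.119 | 1.045 | 0.231 | 0.218 |
| | | $\hat\beta_{PDS2}$ | 1.079 | 0.115 | 0.117 | 1.074 | 0.222 | 0.215 |
| (300,10,10) | 0.6 | $\hat\beta_{SRS}$ | 1.005 | 0.083 | 0.084 | 1.014 | 0.148 | 0.152 |
| | | $\hat\beta_{ODS}$ | 0.990 | 0.100 | 0.103 | 1.013 | 0.198 | 0.198 |
| | | $\hat\beta_{PDS1}$ | 1.091 | 0.078 | 0.075 | 1.024 | 0.152 | 0.140 |
| | | $\hat\beta_{PDS2}$ | 1.078 | 0.073 | 0.072 | 1.078 | 0.142 | 0.138 |
| | 0.8 | $\hat\beta_{SRS}$ | 1.004 | 0.116 | 0.122 | 1.029 | 0.204 | 0.205 |
| | | $\hat\beta_{ODS}$ | 0.983 | 0.106 | 0.105 | 1.020 | 0.202 | 0.203 |
| | | $\hat\beta_{PDS1}$ | 1.083 | 0.102 | 0.099 | 1.050 | 0.193 | 0.178 |
| | | $\hat\beta_{PDS2}$ | 1.068 | 0.098 | 0.097 | 1.081 | 0.186 | 0.184 |

ESE, the average of the estimated standard errors; SSD, the sample standard deviation.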

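Table 4.2: Results are based on the model $\log T = \beta_1 X + \beta_2 Z$, where $X \sim N(0, 1)$ and $Z \sim B(0.5)$, and the cutpoints for the design were the 0.3 and 0.7 sample quantiles. True values: $\beta_1 = 1$, $\beta_2 = 1$.

| $(n_0, n_1, n_3)$ | Censoring | Estimator | $\beta_1$ Mean | $\beta_1$ ESE | $\beta_1$ SSD | $\beta_2$ Mean | $\beta_2$ ESE | $\beta_2$ SSD |
|---|---|---|---|---|---|---|---|---|
| (100,20,10) | 0.6 | $\hat\beta_{SRS}$ | 1.010 | 0.131 | 0.134 | 1.007 | 0.237 | 0.241 |
| | | $\hat\beta_{ODS}$ | 1.030 | 0.105 | 0.106 | 1.066 | 0.220 | 0.221 |
| | | $\hat\beta_{PDS1}$ | 1.114 | 0.118 | 0.123 | 1.037 | 0.243 | 0.226 |
| | | $\hat\beta_{PDS2}$ | 1.115 | 0.114 | 0.124 | 1.078 | 0.234 | 0.227 |
| | 0.8 | $\hat\beta_{SRS}$ | 1.022 | 0.184 | 0.195 | 1.030 | 0.329 | 0.345 |
| | | $\hat\beta_{ODS}$ | 1.036 | 0.119 | 0.120 | 1.076 | 0.243 | 0.256 |
| | | $\hat\beta_{PDS1}$ | 1.147 | 0.141 | 0.148 | 1.018 | 0.284 | 0.251 |
| | | $\hat\beta_{PDS2}$ | 1.140 | 0.140 | 0.155 | 1.049 | 0.280 | 0.260 |
| (200,20,10) | 0.6 | $\hat\beta_{SRS}$ | 1.007 | 0.097 | 0.098 | 1.009 | 0.175 | 0.180 |
| | | $\hat\beta_{ODS}$ | 1.027 | 0.093 | 0.096 | 1.062 | 0.194 | 0.195 |
| | | $\hat\beta_{PDS1}$ | 1.116 | 0.089 | 0.091 | 1.035 | 0.178 | 0.161 |
| | | $\hat\beta_{PDS2}$ | 1.094 | 0.084 | 0.087 | 1.083 | 0.168 | 0.163 |
| | 0.8 | $\hat\beta_{SRS}$ | 1.014 | 0.137 | 0.149 | 1.025 | 0.242 | 0.247 |
| | | $\hat\beta_{ODS}$ | 1.033 | 0.102 | 0.100 | 1.086 | 0.207 | 0.211 |
| | | $\hat\beta_{PDS1}$ | 1.137 | 0.111 | 0.114 | 1.052 | 0.218 | 0.194 |
| | | $\hat\beta_{PDS2}$ | 1.120 | 0.109 | 0.111 | 1.070 | 0.211 | 0.194 |
| (300,20,10) | 0.6 | $\hat\beta_{SRS}$ | 1.003 | 0.081 | 0.080 | 1.006 | 0.146 | 0.149 |
| | | $\hat\beta_{ODS}$ | 1.024 | 0.088 | 0.089 | 1.056 | 0.182 | 0.185 |
| | | $\hat\beta_{PDS1}$ | 1.101 | 0.075 | 0.076 | 1.031 | 0.147 | 0.138 |
| | | $\hat\beta_{PDS2}$ | 1.103 | 0.070 | 0.070 | 1.074 | 0.139 | 0.134 |
| | 0.8 | $\hat\beta_{SRS}$ | 1.011 | 0.115 | 0.122 | 1.016 | 0.200 | 0.212 |
| | | $\hat\beta_{ODS}$ | 1.033 | 0.093 | 0.093 | 1.082 | 0.190 | 0.195 |
| | | $\hat\beta_{PDS1}$ | 1.123 | 0.095 | 0.093 | 1.047 | 0.183 | 0.166 |
| | | $\hat\beta_{PDS2}$ | 1.124 | 0.090 | 0.090 | 1.064 | 0.173 | 0.157 |

ESE, the average of the estimated standard errors; SSD, the sample standard deviation.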

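Table 4.3: Results are based on the model $\log T = \beta_1 X + \beta_2 Z$, where $X \sim N(0, 1)$ and $Z \sim B(0.5)$, the sizes of the supplemental samples are selected by fixed proportions, and the cutpoints for the design were the 0.3 and 0.7 sample quantiles. True values: $\beta_1 = 1$, $\beta_2 = 1$.

| Proportions | Censoring | Estimator | $\beta_1$ Mean | $\beta_1$ ESE | $\beta_1$ SSD | $\beta_2$ Mean | $\beta_2$ ESE | $\beta_2$ SSD |
|---|---|---|---|---|---|---|---|---|
| (0.5,0.5) | 0.6 | $\hat\beta_{SRS}$ | 1.018 | 0.130 | 0.132 | 1.014 | 0.234 | 0.238 |
| | | $\hat\beta_{PDS1}$ | 1.107 | 0.126 | 0.129 | 1.037 | 0.252 | 0.230 |
| | | $\hat\beta_{SRS}$ | 1.012 | 0.136 | 0.140 | 1.015 | 0.247 | 0.251 |
| | | $\hat\beta_{PDS2}$ | 1.111 | 0.124 | 0.128 | 1.075 | 0.246 | 0.227 |
| | 0.8 | $\hat\beta_{SRS}$ | 1.016 | 0.183 | 0.204 | 1.027 | 0.332 | 0.360 |
| | | $\hat\beta_{PDS1}$ | 1.094 | 0.160 | 0.178 | 1.027 | 0.311 | 0.279 |
| | | $\hat\beta_{SRS}$ | 1.026 | 0.191 | 0.206 | 1.023 | 0.342 | 0.381 |
| | | $\hat\beta_{PDS2}$ | 1.101 | 0.163 | 0.181 | 1.055 | 0.317 | 0.298 |
| (0.8,0.8) | 0.6 | $\hat\beta_{SRS}$ | 1.008 | 0.121 | 0.121 | 1.002 | 0.219 | 0.226 |
| | | $\hat\beta_{PDS1}$ | 1.107 | 0.125 | 0.127 | 1.027 | 0.252 | 0.224 |
| | | $\hat\beta_{SRS}$ | 1.012 | 0.130 | 0.132 | 1.002 | 0.233 | 0.233 |
| | | $\hat\beta_{PDS2}$ | 1.115 | 0.121 | 0.127 | 1.069 | 0.243 | 0.228 |
| | 0.8 | $\hat\beta_{SRS}$ | 1.031 | 0.176 | 0.194 | 1.033 | 0.316 | 0.341 |
| | | $\hat\beta_{PDS1}$ | 1.089 | 0.156 | 0.170 | 1.030 | 0.305 | 0.279 |
| | | $\hat\beta_{SRS}$ | 1.014 | 0.178 | 0.192 | 1.043 | 0.324 | 0.361 |
| | | $\hat\beta_{PDS2}$ | 1.089 | 0.156 | 0.169 | 1.052 | 0.305 | 0.284 |
| (0.8,0.5) | 0.6 | $\hat\beta_{SRS}$ | 1.013 | 0.124 | 0.127 | 1.000 | 0.224 | 0.224 |
| | | $\hat\beta_{PDS1}$ | 1.138 | 0.124 | 0.132 | 1.029 | 0.249 | 0.217 |
| | | $\hat\beta_{SRS}$ | 1.020 | 0.132 | 0.134 | 1.010 | 0.238 | 0.245 |
| | | $\hat\beta_{PDS2}$ | 1.125 | 0.122 | 0.129 | 1.079 | 0.244 | 0.233 |
| | 0.8 | $\hat\beta_{SRS}$ | 1.019 | 0.180 | 0.190 | 1.021 | 0.323 | 0.369 |
| | | $\hat\beta_{PDS1}$ | 1.127 | 0.156 | 0.168 | 1.027 | 0.307 | 0.283 |
| | | $\hat\beta_{SRS}$ | 1.029 | 0.187 | 0.210 | 1.043 | 0.336 | 0.390 |
| | | $\hat\beta_{PDS2}$ | 1.142 | 0.160 | 0.174 | 1.075 | 0.312 | 0.290 |
| (0.5,0.8) | 0.6 | $\hat\beta_{SRS}$ | 1.009 | 0.127 | 0.130 | 1.008 | 0.229 | 0.232 |
| | | $\hat\beta_{PDS1}$ | 1.093 | 0.126 | 0.132 | 1.026 | 0.250 | 0.234 |
| | | $\hat\beta_{SRS}$ | 1.012 | 0.134 | 0.138 | 1.013 | 0.243 | 0.247 |
| | | $\hat\beta_{PDS2}$ | 1.084 | 0.124 | 0.128 | 1.067 | 0.246 | 0.232 |
| | 0.8 | $\hat\beta_{SRS}$ | 1.020 | 0.182 | 0.205 | 1.025 | 0.325 | 0.361 |
| | | $\hat\beta_{PDS1}$ | 1.062 | 0.158 | 0.176 | 1.014 | 0.309 | 0.286 |
| | | $\hat\beta_{SRS}$ | 1.017 | 0.187 | 0.210 | 1.032 | 0.334 | 0.362 |
| | | $\hat\beta_{PDS2}$ | 1.068 | 0.160 | 0.177 | 1.054 | 0.310 | 0.300 |

Values of $n_1 + n_3$ in the original table, in the order printed: 35, 21, 28, 21, 56, 35, 44, 34, 47, 30, 35, 28, 42, 25, 27, 36.
ESE, the average of the estimated standard errors; SSD, the sample standard deviation. The simple random sample size is 100.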

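Table 4.4: Results are based on the model $\log T = \beta_1 X + \beta_2 Z$, where $X \sim N(0, 1)$ and $Z \sim B(0.5)$, the sizes of the supplemental samples are selected by fixed proportions, and the cutpoints for the design were the 0.3 and 0.7 sample quantiles. True values: $\beta_1 = 1$, $\beta_2 = 1$.

| Proportions | Censoring | Estimator | $\beta_1$ Mean | $\beta_1$ ESE | $\beta_1$ SSD | $\beta_2$ Mean | $\beta_2$ ESE | $\beta_2$ SSD |
|---|---|---|---|---|---|---|---|---|
| (0.5,0.5) | 0.6 | $\hat\beta_{SRS}$ | 1.004 | 0.099 | 0.099 | 1.002 | 0.178 | 0.177 |
| | | $\hat\beta_{PDS1}$ | 1.104 | 0.094 | 0.096 | 1.028 | 0.182 | 0.164 |
| | | $\hat\beta_{SRS}$ | 1.012 | 0.101 | 0.102 | 1.017 | 0.183 | 0.181 |
| | | $\hat\beta_{PDS2}$ | 1.101 | 0.092 | 0.093 | 1.076 | 0.177 | 0.168 |
| | 0.8 | $\hat\beta_{SRS}$ | 1.012 | 0.142 | 0.147 | 1.025 | 0.249 | 0.258 |
| | | $\hat\beta_{PDS1}$ | 1.096 | 0.127 | 0.127 | 1.048 | 0.238 | 0.213 |
| | | $\hat\beta_{SRS}$ | 1.010 | 0.142 | 0.152 | 1.024 | 0.251 | 0.262 |
| | | $\hat\beta_{PDS2}$ | 1.089 | 0.127 | 0.126 | 1.080 | 0.235 | 0.222 |
| (0.8,0.8) | 0.6 | $\hat\beta_{SRS}$ | 1.007 | 0.096 | 0.097 | 1.003 | 0.174 | 0.172 |
| | | $\hat\beta_{PDS1}$ | 1.109 | 0.093 | 0.096 | 1.031 | 0.180 | 0.162 |
| | | $\hat\beta_{SRS}$ | 1.003 | 0.099 | 0.101 | 1.014 | 0.178 | 0.176 |
| | | $\hat\beta_{PDS2}$ | 1.104 | 0.090 | 0.094 | 1.077 | 0.175 | 0.162 |
| | 0.8 | $\hat\beta_{SRS}$ | 1.014 | 0.139 | 0.150 | 1.009 | 0.245 | 0.251 |
| | | $\hat\beta_{PDS1}$ | 1.101 | 0.123 | 0.129 | 1.049 | 0.234 | 0.209 |
| | | $\hat\beta_{SRS}$ | 1.008 | 0.141 | 0.151 | 1.017 | 0.249 | 0.258 |
| | | $\hat\beta_{PDS2}$ | 1.084 | 0.122 | 0.124 | 1.081 | 0.231 | 0.215 |
| (0.8,0.5) | 0.6 | $\hat\beta_{SRS}$ | 1.005 | 0.097 | 0.097 | 1.002 | 0.174 | 0.171 |
| | | $\hat\beta_{PDS1}$ | 1.132 | 0.092 | 0.097 | 1.034 | 0.181 | 0.162 |
| | | $\hat\beta_{SRS}$ | 1.004 | 0.100 | 0.102 | 1.007 | 0.179 | 0.179 |
| | | $\hat\beta_{PDS2}$ | 1.107 | 0.090 | 0.096 | 1.077 | 0.175 | 0.168 |
| | 0.8 | $\hat\beta_{SRS}$ | 1.011 | 0.141 | 0.147 | 1.020 | 0.247 | 0.256 |
| | | $\hat\beta_{PDS1}$ | 1.134 | 0.124 | 0.130 | 1.063 | 0.236 | 0.207 |
| | | $\hat\beta_{SRS}$ | 1.008 | 0.141 | 0.149 | 1.019 | 0.251 | 0.254 |
| | | $\hat\beta_{PDS2}$ | 1.118 | 0.125 | 0.132 | 1.082 | 0.233 | 0.214 |
| (0.5,0.8) | 0.6 | $\hat\beta_{SRS}$ | 1.005 | 0.098 | 0.099 | 1.018 | 0.177 | 0.175 |
| | | $\hat\beta_{PDS1}$ | 1.092 | 0.094 | 0.095 | 1.027 | 0.182 | 0.169 |
| | | $\hat\beta_{SRS}$ | 1.007 | 0.100 | 0.103 | 1.008 | 0.181 | 0.179 |
| | | $\hat\beta_{PDS2}$ | 1.076 | 0.091 | 0.093 | 1.072 | 0.174 | 0.169 |
| | 0.8 | $\hat\beta_{SRS}$ | 1.006 | 0.141 | 0.146 | 1.023 | 0.248 | 0.254 |
| | | $\hat\beta_{PDS1}$ | 1.068 | 0.124 | 0.132 | 1.037 | 0.234 | 0.215 |
| | | $\hat\beta_{SRS}$ | 1.008 | 0.142 | 0.150 | 1.012 | 0.248 | 0.254 |
| | | $\hat\beta_{PDS2}$ | 1.057 | 0.123 | 0.127 | 1.066 | 0.231 | 0.218 |

Values of $n_1 + n_3$ in the original table, in the order printed: 25, 15, 17, 12, 40, 24, 26, 18, 35, 21, 22, 22, 29, 17, 14, 20.
ESE, the average of the estimated standard errors; SSD, the sample standard deviation. The simple random sample size is 200.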

Table 4.5: Results are based on the model log T = β1X + β2Z, where X ∼ N(0, 1) and Z ∼ B(0.5); the supplemental sample sizes are determined by fixed proportions, and the cutpoints for the design were the 0.3 and 0.7 sample quantiles.

                                        β1 = 1                 β2 = 1
Proportions  Censoring  Estimator   Mean   ESE    SSD      Mean   ESE    SSD
(0.5,0.5)    0.6        β̂_SRS       1.002  0.083  0.084    1.006  0.148  0.148
                        β̂_PDS1      1.096  0.080  0.082    1.028  0.151  0.138
                        β̂_SRS       1.003  0.083  0.085    1.012  0.149  0.151
                        β̂_PDS2      1.094  0.078  0.077    1.071  0.146  0.138
             0.8        β̂_SRS       1.003  0.118  0.122    1.020  0.207  0.213
                        β̂_PDS1      1.096  0.110  0.110    1.053  0.201  0.181
                        β̂_SRS       0.998  0.117  0.122    1.016  0.205  0.214
                        β̂_PDS2      1.076  0.109  0.108    1.084  0.197  0.183
(0.8,0.8)    0.6        β̂_SRS       1.004  0.081  0.081    1.005  0.145  0.146
                        β̂_PDS1      1.100  0.078  0.078    1.029  0.149  0.133
                        β̂_SRS       1.002  0.082  0.083    1.013  0.148  0.148
                        β̂_PDS2      1.098  0.075  0.079    1.072  0.145  0.137
             0.8        β̂_SRS       1.004  0.117  0.123    1.009  0.205  0.214
                        β̂_PDS1      1.098  0.106  0.110    1.053  0.197  0.176
                        β̂_SRS       1.003  0.117  0.118    1.019  0.205  0.213
                        β̂_PDS2      1.080  0.105  0.108    1.084  0.192  0.184
(0.8,0.5)    0.6        β̂_SRS       1.006  0.082  0.082    1.011  0.147  0.149
                        β̂_PDS1      1.126  0.078  0.081    1.035  0.150  0.132
                        β̂_SRS       1.004  0.082  0.082    1.008  0.148  0.149
                        β̂_PDS2      1.097  0.075  0.079    1.078  0.145  0.137
             0.8        β̂_SRS       1.007  0.118  0.122    1.004  0.204  0.207
                        β̂_PDS1      1.130  0.108  0.113    1.056  0.199  0.176
                        β̂_SRS       1.003  0.118  0.126    1.021  0.206  0.211
                        β̂_PDS2      1.106  0.107  0.112    1.089  0.193  0.177
(0.5,0.8)    0.6        β̂_SRS       1.005  0.083  0.083    1.008  0.148  0.146
                        β̂_PDS1      1.084  0.079  0.080    1.024  0.150  0.136
                        β̂_SRS       1.002  0.083  0.086    1.017  0.149  0.150
                        β̂_PDS2      1.079  0.077  0.079    1.069  0.145  0.139
             0.8        β̂_SRS       1.005  0.118  0.126    1.017  0.205  0.214
                        β̂_PDS1      1.070  0.108  0.108    1.046  0.198  0.182
                        β̂_SRS       0.998  0.118  0.121    1.008  0.206  0.209
                        β̂_PDS2      1.052  0.106  0.106    1.068  0.193  0.191

n1 + n3 by proportions: (0.5,0.5): 20, 12, 13, 9; (0.8,0.8): 33, 20, 20, 14; (0.8,0.5): 29, 19, 18, 13; (0.5,0.8): 25, 14, 11, 15.
ESE, the average of the estimates of standard errors; SSD, sample standard deviation. The simple random sample size is 300.
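The Mean, ESE and SSD columns in Tables 4.3 to 4.5 summarize the simulation replicates: ESE is the average of the estimated standard errors and SSD is the sample standard deviation of the point estimates. A minimal sketch of that summary, assuming the replicate estimates and their estimated standard errors are stored as arrays of shape (replicates, coefficients); the function name `summarize_replicates` and the toy numbers are hypothetical.

```python
import numpy as np


def summarize_replicates(estimates, std_errors):
    """Return (Mean, ESE, SSD) per coefficient, following the table footnotes.

    estimates, std_errors: arrays of shape (n_replicates, n_coefficients).
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    mean = estimates.mean(axis=0)          # Monte Carlo mean of the estimates
    ese = std_errors.mean(axis=0)          # average of the estimated standard errors
    ssd = estimates.std(axis=0, ddof=1)    # sample standard deviation of the estimates
    return mean, ese, ssd


# Toy example: three replicates of (beta1, beta2); values are illustrative only.
est = [[1.0, 0.9], [1.2, 1.1], [1.1, 1.0]]
se = [[0.10, 0.20], [0.12, 0.22], [0.11, 0.21]]
mean, ese, ssd = summarize_replicates(est, se)
print(np.round(mean, 3), np.round(ese, 3), np.round(ssd, 3))
```

When the standard-error estimator is adequate, ESE and SSD should be close, which is the comparison these columns support.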

Table 4.6: Results are based on the model log T = β1 X + β2 Z, where X ∼ N (0, 1) and Z ∼ B(0.5), and the supplemental sample includes all cases. (n0 , n1 , n3 ) Cutpoint (100,44,24) (0.3,0.7). Censoring 0.6. β1 = 1 ESE 0.117 0.124 0.127 0.122. SSD 0.119 0.131 0.133 0.129. Mean 1.015 1.019 1.005 1.069. β2 = 1 ESE 0.211 0.249 0.229 0.245. SSD 0.220 0.227 0.237 0.230. βˆSRS βˆP DS1 βˆSRS ˆ βP DS2. Mean 1.014 1.105 1.008 1.113. 0.8. βˆSRS βˆP DS1 βˆSRS ˆ βP DS2. 1.018 1.091 1.012 1.108. 0.173 0.189 0.158 0.177 0.126 0.132 0.122 0.129. 1.028 1.024 1.001 1.075. 0.309 0.309 0.228 0.244. 0.341 0.286 0.226 0.224. 0.6. βˆSRS βˆP DS1 βˆSRS ˆ βP DS2. 1.007 1.024 1.007 1.021. 0.134 0.118 0.135 0.114. 0.138 0.117 0.134 0.115. 1.015 1.131 1.016 1.156. 0.243 0.238 0.246 0.231. 0.249 0.241 0.249 0.238. 0.8. βˆSRS βˆP DS1 βˆSRS ˆ βP DS2. 1.024 1.047 1.019 1.042. 0.184 0.205 0.162 0.160 0.186 0.204 0.168 0.176. 1.036 1.089 1.035 1.098. 0.332 0.321 0.336 0.328. 0.365 0.309 0.364 0.319. 0.6. βˆSRS βˆP DS1 βˆSRS ˆ βP DS2. 1.006 1.103 1.010 1.097. 0.095 0.094 0.093 0.098 0.098 0.099 0.090 0.096. 1.004 1.028 1.007 1.076. 0.170 0.181 0.177 0.175. 0.169 0.160 0.176 0.168. 0.8. βˆSRS βˆP DS1 βˆSRS ˆ βP DS2. 1.009 1.096 1.010 1.094. 0.138 0.150 0.123 0.130 0.098 0.097 0.090 0.090. 1.017 1.050 1.009 1.080. 0.241 0.232 0.176 0.175. 0.251 0.210 0.180 0.164. 0.6. βˆSRS ˆ βP DS1 βˆSRS ˆ βP DS2. 1.002 1.097 1.002 1.088. 0.080 0.077 0.081 0.075. 1.006 1.027 1.007 1.066. 0.144 0.149 0.147 0.144. 0.147 0.135 0.146 0.138. βˆSRS 1.002 0.117 0.122 1.013 0.204 ˆ βP DS1 1.097 0.105 0.109 1.053 0.196 ˆ (300,19,3) βSRS 1.007 0.082 0.084 1.015 0.148 βˆP DS2 1.089 0.075 0.079 1.069 0.144 0.206 0.174 0.152 0.137. (100,29,12) (100,27,24) (100,29,11) (100,18,5). (0.1,0.9).
(100,15,4) (100,17,10) (100,16,10) (200,34,13). (0.3,0.7). (200,22,5) (200,19,12) (200,22,5) (300,30,9). (0.3,0.7). (300,19,4) (300,15,7). 0.8. 0.080 0.081 0.083 0.077.
The cutpoints for the design were the 0.3 and 0.7 sample quantiles and the 0.1 and 0.9 sample quantiles, respectively. ESE, the average of the estimates of standard errors; SSD, sample standard deviation.

Table 4.7: Results are based on the model log T = β1X + β2Z, where X ∼ N(0, 1) and Z ∼ B(0.5), and the cutpoints for the design were the 0.1 and 0.7 sample quantiles.

                                          β1 = 1                 β2 = 1
(n0, n1, n3)  Censoring  Estimator   Mean   ESE    SSD      Mean   ESE    SSD
(100,10,10)   0.6        β̂_SRS       1.018  0.141  0.145    1.020  0.254  0.254
                         β̂_ODS       1.109  0.117  0.110    1.158  0.239  0.244
                         β̂_PDS1      1.102  0.122  0.123    1.022  0.248  0.245
                         β̂_PDS2      1.066  0.117  0.125    1.035  0.233  0.240
              0.8        β̂_SRS       1.031  0.190  0.206    1.021  0.340  0.389
                         β̂_ODS       1.106  0.145  0.135    1.172  0.284  0.306
                         β̂_PDS1      1.104  0.153  0.156    1.044  0.305  0.303
                         β̂_PDS2      1.087  0.154  0.159    1.053  0.300  0.304
(200,10,10)   0.6        β̂_SRS       1.015  0.103  0.104    1.017  0.183  0.181
                         β̂_ODS       1.110  0.101  0.093    1.156  0.203  0.204
                         β̂_PDS1      1.087  0.093  0.092    1.019  0.178  0.175
                         β̂_PDS2      1.048  0.086  0.088    1.037  0.167  0.174
              0.8        β̂_SRS       1.018  0.142  0.149    1.023  0.248  0.258
                         β̂_ODS       1.096  0.123  0.112    1.149  0.233  0.238
                         β̂_PDS1      1.075  0.119  0.119    1.037  0.223  0.226
                         β̂_PDS2      1.057  0.118  0.118    1.059  0.219  0.229
(300,10,10)   0.6        β̂_SRS       1.011  0.086  0.085    1.007  0.152  0.155
                         β̂_ODS       1.115  0.096  0.088    1.121  0.190  0.189
                         β̂_PDS1      1.074  0.078  0.078    1.007  0.147  0.145
              0.8        β̂_SRS       1.011  0.118  0.122    1.011  0.205  0.206
                         β̂_ODS       1.097  0.111  0.101    1.128  0.210  0.217
                         β̂_PDS1      1.069  0.102  0.094    1.041  0.188  0.189

ESE, the average of the estimates of standard errors; SSD, sample standard deviation.

Table 4.8: Results are based on the model log T = β1X + β2Z + ε, where X ∼ N(0, 1), Z ∼ B(0.5) and ε ∼ Gumbel(0, 1), and the cutpoints for the design were the 0.3 and 0.7 sample quantiles.

                                          β1 = 1                 β2 = 1
(n0, n1, n3)  Censoring  Estimator   Mean   ESE    SSD      Mean   ESE    SSD
(100,10,10)   0.6        β̂_SRS       1.010  0.127  0.134    1.011  0.231  0.228
                         β̂_ODS       0.999  0.106  0.104    1.031  0.210  0.209
                         β̂_PDS1      1.100  0.109  0.106    0.999  0.233  0.208
                         β̂_PDS2      1.099  0.108  0.108    1.044  0.230  0.212
              0.8        β̂_SRS       1.014  0.161  0.172    1.015  0.285  0.312
                         β̂_ODS       0.983  0.112  0.108    1.021  0.213  0.218
                         β̂_PDS1      1.085  0.124  0.129    1.022  0.257  0.227
                         β̂_PDS2      1.079  0.124  0.127    1.050  0.258  0.244
(200,10,10)   0.6        β̂_SRS       1.004  0.093  0.092    1.014  0.170  0.175
                         β̂_ODS       0.991  0.093  0.089    1.022  0.185  0.184
                         β̂_PDS1      1.089  0.083  0.082    1.000  0.171  0.157
                         β̂_PDS2      1.076  0.080  0.078    1.037  0.165  0.157
              0.8        β̂_SRS       1.007  0.117  0.123    1.022  0.206  0.214
                         β̂_ODS       0.982  0.095  0.093    1.016  0.180  0.186
                         β̂_PDS1      1.080  0.099  0.096    1.022  0.195  0.176
                         β̂_PDS2      1.067  0.094  0.094    1.052  0.190  0.181
(300,10,10)   0.6        β̂_SRS       1.002  0.077  0.078    1.016  0.140  0.137
                         β̂_ODS       0.995  0.088  0.089    1.026  0.173  0.170
                         β̂_PDS1      1.082  0.071  0.066    0.998  0.141  0.130
                         β̂_PDS2      1.071  0.068  0.067    1.039  0.136  0.128
              0.8        β̂_SRS       1.008  0.098  0.103    1.014  0.170  0.174
                         β̂_ODS       0.985  0.089  0.085    1.012  0.164  0.168
                         β̂_PDS1      1.072  0.084  0.081    1.019  0.162  0.149
                         β̂_PDS2      1.060  0.081  0.078    1.041  0.157  0.151

ESE, the average of the estimates of standard errors; SSD, sample standard deviation.
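The simulations behind Tables 4.7 and 4.8 draw failure times from log T = β1X + β2Z + ε with X ∼ N(0, 1) and Z ∼ B(0.5). A minimal sketch of such a generator, assuming independent Uniform(0, c_max) censoring times on the original time scale, where the bound `c_max` is a hypothetical tuning knob for the censoring rate (the censoring mechanism is not restated in this chapter). NumPy's `Generator.gumbel` draws the maximum-type Gumbel distribution; whether the thesis used this or the minimum-type version is an assumption.

```python
import numpy as np


def simulate_aft(n, beta=(1.0, 1.0), error="gumbel", c_max=5.0, seed=None):
    """Simulate right-censored data from log T = beta1 * X + beta2 * Z + eps.

    X ~ N(0, 1), Z ~ Bernoulli(0.5). error="gumbel" draws eps from NumPy's
    Gumbel(loc=0, scale=1); error="normal" draws eps from N(0, 1).
    Censoring times C ~ Uniform(0, c_max) are a hypothetical choice.
    Returns the observed log time log(min(T, C)), delta = I(T <= C), X and Z.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, size=n)
    z = rng.binomial(1, 0.5, size=n)
    if error == "gumbel":
        eps = rng.gumbel(0.0, 1.0, size=n)
    elif error == "normal":
        eps = rng.normal(0.0, 1.0, size=n)
    else:
        raise ValueError("error must be 'gumbel' or 'normal'")
    t = np.exp(beta[0] * x + beta[1] * z + eps)   # latent failure times
    c = rng.uniform(0.0, c_max, size=n)           # independent censoring times
    delta = (t <= c).astype(int)                  # 1 = failure observed, 0 = censored
    log_y = np.log(np.minimum(t, c))              # observed log time
    return log_y, delta, x, z


log_y, delta, x, z = simulate_aft(2000, seed=1)
censoring_rate = 1.0 - delta.mean()
print(f"empirical censoring rate: {censoring_rate:.3f}")
```

To target the 0.6 or 0.8 censoring rates used in the tables, one would adjust `c_max` and rerun until the empirical rate is close to the target.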

4.3. The optimal design

We consider the optimal allocation of the supplemental sample sizes for a fixed underlying cohort population. The total sample size is fixed at 140, 200 or 240. We compute the relative efficiency of β̂_SRS versus β̂_PDS, Var(β̂_PDS)/Var(β̂_SRS), denoted ARE, and determine where the sum of the AREs over the regression coefficients, denoted TARE, achieves its minimum. We consider TAREs under different configurations of ρ0, ρ1 and ρ3. The simulation results are based on N = 2000 and 2000 independent simulation runs. In Table 4.9, we fix the total sample size and vary the size of the supplemental sample of subjects who failed early. In Tables 4.10 and 4.11, we fix the total sample size and the proportion of the supplemental sample who failed early, and vary n0. We then use Tables 4.10 and 4.11 to compute the TAREs plotted in Figure 4.1, where the x-axis is ρ0 = n0/n and the y-axis is TARE. From Figure 4.1 we see that: (i) the trace of the asymptotic relative efficiency decreases as ρV increases; (ii) for total sample sizes of 140, 200 and 240, TARE is minimized at ρ0 = 0.47, 0.7 and 0.75, respectively; (iii) the corresponding minimum TAREs are 0.62, 0.72 and 0.83. We therefore recommend choosing the optimal ρ0 between 0.7 and 0.75.
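The ARE and TARE bookkeeping described above can be sketched as follows, assuming each variance is estimated by a squared standard error (for example from the ESE or SSD columns) and reading TARE literally as the sum of the per-coefficient AREs. The function names and the standard errors in the example are hypothetical and are not results from the thesis.

```python
import numpy as np


def are(se_pds, se_srs):
    """Per-coefficient ARE = Var(beta_PDS) / Var(beta_SRS), computed from standard errors."""
    se_pds = np.asarray(se_pds, dtype=float)
    se_srs = np.asarray(se_srs, dtype=float)
    return (se_pds / se_srs) ** 2


def tare(se_pds, se_srs):
    """TARE read as the sum of the per-coefficient AREs."""
    return float(are(se_pds, se_srs).sum())


def optimal_rho0(rho0_grid, tare_values):
    """Return the rho0 on the grid with the smallest TARE, together with that TARE."""
    k = int(np.argmin(tare_values))
    return rho0_grid[k], tare_values[k]


# Hypothetical standard errors for (beta1, beta2) at three design points rho0 = n0 / n.
rho0_grid = [0.5, 0.6, 0.7]
se_pds = [[0.16, 0.30], [0.15, 0.28], [0.14, 0.29]]
se_srs = [[0.20, 0.35], [0.20, 0.35], [0.20, 0.35]]
values = [tare(p, s) for p, s in zip(se_pds, se_srs)]
best_rho0, best_tare = optimal_rho0(rho0_grid, values)
print(best_rho0, round(best_tare, 4))
```

If a normalized criterion is preferred, for example the average ARE so that a value of 1 means no gain over SRS, divide the sum by the number of coefficients; the location of the minimum over ρ0 is unchanged.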

Figure 4.1: TARE for different total sample sizes.

Table 4.9: Results are based on the model log T = β1X + β2Z, where X ∼ N(0, 1) and Z ∼ B(0.5), with a fixed total sample size and various proportions of n1; the cutpoints for the design were the 0.3 and 0.7 sample quantiles.

n = 140, censoring 0.8
                          β1 = 1                 β2 = 1
(n0, n1)   Estimator   Mean   ESE    SSD      Mean   ESE    SSD
(100,8)    β̂_SRS       1.008  0.173  0.188    1.025  0.314  0.330
           β̂_PDS2      0.897  0.139  0.147    0.946  0.266  0.284
(100,12)   β̂_SRS       1.023  0.175  0.192    1.024  0.316  0.335
           β̂_PDS2      0.977  0.138  0.150    0.998  0.274  0.282
(100,16)   β̂_SRS       1.012  0.174  0.187    1.024  0.313  0.322
           β̂_PDS2      1.022  0.140  0.154    1.021  0.279  0.271
(100,20)   β̂_SRS       1.020  0.177  0.193    1.030  0.315  0.338
           β̂_PDS2      1.054  0.142  0.159    1.038  0.282  0.272
(100,24)   β̂_SRS       1.017  0.175  0.187    1.026  0.314  0.334
           β̂_PDS2      1.080  0.141  0.162    1.043  0.284  0.271
(100,28)   β̂_SRS       1.018  0.177  0.190    1.024  0.314  0.347
           β̂_PDS2      1.106  0.144  0.170    1.046  0.285  0.270
(100,32)   β̂_SRS       1.017  0.174  0.189    1.037  0.317  0.341
           β̂_PDS2      1.114  0.145  0.167    1.052  0.288  0.272
(90,10)    β̂_SRS       1.020  0.176  0.186    1.026  0.313  0.332
           β̂_PDS2      0.859  0.141  0.148    0.915  0.268  0.293
(90,15)    β̂_SRS       1.011  0.175  0.189    1.010  0.313  0.331
           β̂_PDS2      0.938  0.141  0.152    0.974  0.275  0.286
(90,20)    β̂_SRS       1.014  0.174  0.186    1.042  0.314  0.353
           β̂_PDS2      0.998  0.141  0.154    1.009  0.279  0.280
(90,25)    β̂_SRS       1.016  0.176  0.190    1.026  0.315  0.337
           β̂_PDS2      1.028  0.141  0.161    1.009  0.281  0.275
(90,30)    β̂_SRS       1.016  0.175  0.196    1.022  0.315  0.332
           β̂_PDS2      1.058  0.142  0.166    1.042  0.283  0.285
(90,35)    β̂_SRS       1.018  0.175  0.187    1.030  0.313  0.339
           β̂_PDS2      1.089  0.143  0.166    1.039  0.287  0.273
(90,40)    β̂_SRS       1.016  0.176  0.193    1.020  0.317  0.331
           β̂_PDS2      1.111  0.145  0.169    1.042  0.292  0.283

ESE, the average of the estimates of standard errors; SSD, sample standard deviation. The supplemental sample proportion of n1 varies from 0.2 to 0.8.

Table 4.10: Results are based on the model log T = β1X + β2Z, where X ∼ N(0, 1) and Z ∼ B(0.5), with a fixed total sample size and varying SRS size n0; the cutpoints for the design were the 0.3 and 0.7 sample quantiles.

n = 140, censoring 0.8
                                   β1 = 1                 β2 = 1
(n0, proportion of n1)  Estimator  Mean   ESE    SSD      Mean   ESE    SSD
(40,0.5)                β̂_SRS      1.017  0.175  0.201    1.043  0.315  0.346
                        β̂_PDS2     0.937  0.162  0.182    0.920  0.318  0.321
(50,0.5)                β̂_SRS      1.027  0.178  0.193    1.019  0.317  0.341
                        β̂_PDS2     0.959  0.156  0.179    0.929  0.308  0.313
(60,0.5)                β̂_SRS      1.016  0.175  0.189    1.034  0.318  0.335
                        β̂_PDS2     0.976  0.150  0.169    0.947  0.293  0.299
(70,0.5)                β̂_SRS      1.020  0.175  0.191    1.024  0.314  0.335
                        β̂_PDS2     0.986  0.148  0.172    0.974  0.290  0.293
(80,0.5)                β̂_SRS      1.017  0.175  0.186    1.029  0.313  0.333
                        β̂_PDS2     1.000  0.146  0.162    0.979  0.287  0.291
(90,0.5)                β̂_SRS      1.016  0.176  0.190    1.026  0.315  0.337
                        β̂_PDS2     1.028  0.141  0.161    1.009  0.281  0.275
(100,0.5)               β̂_SRS      1.020  0.177  0.193    1.030  0.315  0.338
                        β̂_PDS2     1.054  0.142  0.159    1.038  0.282  0.272
(40,0.8)                β̂_SRS      1.020  0.178  0.192    1.032  0.318  0.345
                        β̂_PDS2     1.013  0.163  0.191    0.923  0.330  0.336
(50,0.8)                β̂_SRS      1.019  0.176  0.199    1.016  0.316  0.340
                        β̂_PDS2     1.033  0.157  0.188    0.946  0.318  0.324
(60,0.8)                β̂_SRS      1.017  0.175  0.189    1.027  0.315  0.335
                        β̂_PDS2     1.061  0.151  0.179    0.980  0.305  0.305
(70,0.8)                β̂_SRS      1.027  0.176  0.193    1.033  0.318  0.335
                        β̂_PDS2     1.077  0.149  0.175    1.006  0.299  0.296
(80,0.8)                β̂_SRS      1.014  0.176  0.190    1.030  0.312  0.343
                        β̂_PDS2     1.091  0.146  0.166    1.032  0.296  0.284
(90,0.8)                β̂_SRS      1.016  0.176  0.193    1.020  0.317  0.331
                        β̂_PDS2     1.111  0.145  0.169    1.042  0.292  0.283
(100,0.8)               β̂_SRS      1.017  0.174  0.189    1.037  0.317  0.341
                        β̂_PDS2     1.114  0.145  0.167    1.052  0.288  0.272

ESE, the average of the estimates of standard errors; SSD, sample standard deviation. The total sample size is n = 140 and the supplemental sample proportion of n1 is 0.5 or 0.8.

Table 4.11: Results are based on the model log T = β1X + β2Z, where X ∼ N(0, 1) and Z ∼ B(0.5), with a fixed total sample size and varying SRS size n0; the cutpoints for the design were the 0.3 and 0.7 sample quantiles.

n = 240, censoring 0.8
                                   β1 = 1                 β2 = 1
(n0, proportion of n1)  Estimator  Mean   ESE    SSD      Mean   ESE    SSD
(160,0.5)               β̂_SRS      1.009  0.135  0.145    1.024  0.239  0.245
                        β̂_PDS2     0.943  0.107  0.114    1.017  0.209  0.227
(170,0.5)               β̂_SRS      1.006  0.135  0.144    1.022  0.237  0.242
                        β̂_PDS2     0.959  0.107  0.113    1.030  0.209  0.219
(180,0.5)               β̂_SRS      1.007  0.135  0.147    1.028  0.238  0.250
                        β̂_PDS2     0.982  0.106  0.113    1.050  0.209  0.210
(190,0.5)               β̂_SRS      1.013  0.135  0.140    1.027  0.238  0.243
                        β̂_PDS2     1.001  0.107  0.111    1.060  0.210  0.216
(200,0.5)               β̂_SRS      1.008  0.135  0.140    1.027  0.237  0.239
                        β̂_PDS2     1.026  0.106  0.114    1.059  0.209  0.202
(160,0.8)               β̂_SRS      1.006  0.134  0.143    1.021  0.238  0.245
                        β̂_PDS2     1.033  0.105  0.121    1.037  0.207  0.209
(170,0.8)               β̂_SRS      1.004  0.134  0.143    1.021  0.237  0.246
                        β̂_PDS2     1.055  0.104  0.122    1.068  0.207  0.201
(180,0.8)               β̂_SRS      1.005  0.134  0.139    1.020  0.237  0.245
                        β̂_PDS2     1.066  0.105  0.121    1.061  0.207  0.201
(190,0.8)               β̂_SRS      1.017  0.135  0.139    1.028  0.236  0.242
                        β̂_PDS2     1.083  0.106  0.125    1.073  0.208  0.197
(200,0.8)               β̂_SRS      1.014  0.134  0.143    1.019  0.236  0.247
                        β̂_PDS2     1.100  0.108  0.128    1.076  0.210  0.198

ESE, the average of the estimates of standard errors; SSD, sample standard deviation. The total sample size is n = 240 and the supplemental sample proportion of n1 is 0.5 or 0.8.

Chapter 5 The Busselton Health Study Data Analysis

We applied the proposed methods to analyze data from the Busselton Health Study (Cullen, 1972; Knuiman et al., 2003). The Busselton Health Surveys are a series of cross-sectional health surveys conducted in the town of Busselton in Western Australia. Every three years from 1966 to 1981, general health information on adult participants was collected through questionnaires and clinical visits. The study population consists of 1,612 men and women aged 40-89 years who participated in the 1981 Busselton Health Survey and had no history of diagnosed coronary heart disease or stroke at that time. For both coronary heart disease and stroke events, follow-up started at the 1981 survey and continued until the date of the first coronary heart disease event (or the first stroke event) or December 31, 1998, whichever came first. Subjects who left Western Australia during the study were treated as censored. It has been proposed that stored body iron is positively related to the risk of coronary heart disease (Sullivan, 1996). However, the accumulated epidemiologic evidence has been inconsistent, and it is of interest to examine this hypothesis in this population. There are several measures of stored body iron, and serum ferritin is regarded as the best biochemical measure (Cook et al., 1974).

To reduce cost and preserve stored serum, a case-cohort sampling design was used. The 1,612 cohort members included 285 coronary heart disease cases and 159 stroke cases. A random sample of 610 subjects was selected as the subcohort. Since only about 75% of the cohort members had viable blood serum samples, ferritin assays were conducted for 217 coronary heart disease cases, 118 stroke cases and 450 subcohort members among the about 1,208 subjects with viable blood samples. A total of 43 subjects experienced both coronary heart disease and stroke. Because of the overlap among the coronary heart disease cases, the stroke cases and the random subcohort, the total number of assayed serum samples was 626.

To apply our proposed methods to this dataset, we included the main exposure variable, serum ferritin (Ferr), and three covariates to control for confounding: triglycerides in millimoles per litre (Trigly), age in years (Age) and whether the participant received blood pressure treatment (Rxhyper). The logarithm of the serum ferritin level was used in the model as the main risk factor. We fit the following models for the coronary heart disease event time (ChdT) and the stroke event time (StrT), respectively:

$$\log(\mathrm{ChdT}) = \beta_0 + \beta_1 \log(\mathrm{Ferr}) + \beta_2\,\mathrm{Age} + \beta_3\,\mathrm{Trigly} + \beta_4\,\mathrm{Rxhyper} + \varepsilon_1, \tag{5.1}$$

$$\log(\mathrm{StrT}) = \gamma_0 + \gamma_1 \log(\mathrm{Ferr}) + \gamma_2\,\mathrm{Age} + \gamma_3\,\mathrm{Trigly} + \gamma_4\,\mathrm{Rxhyper} + \varepsilon_2. \tag{5.2}$$

The domain of the logarithm of serum ferritin is partitioned into three intervals by its 0.3 and 0.7 quantiles. We fixed the total sample size at n = 240 from the cohort of 627 subjects with available serum ferritin assays. Based on the second-phase probability values, we randomly selected the supplemental sample of subjects who failed late from the set

$$\big\{(\mathrm{ChdT}, \mathrm{Ferr}, \mathrm{Age}, \mathrm{Trigly}, \mathrm{Rxhyper}, \delta) : \Pr(\log \mathrm{Ferr} \in A_1 \mid \log \mathrm{ChdT}, \mathrm{Age}, \mathrm{Trigly}, \mathrm{Rxhyper}) \ge 95\%,\ \delta = 1\big\}.$$

A similar setting is used for the stroke outcome. The results of the Busselton Health Study analysis are summarized in Table 5.1. The two-phase PDS design yields smaller standard errors than the SRS design of the same sample size. We also applied the optimal design to the Busselton Health Study, using the coronary heart disease model with a fixed sample size of 240 and ρ0 = n0/n varying from 0.4 to 0.9; the result is plotted in Figure 5.1. In Table 5.1, all estimated coefficients under both the PDS and the SRS schemes are negative, and every standard error of β̂_PDS is smaller than the corresponding standard error of β̂_SRS.

Figure 5.1: The TARE of the Busselton Health Study.
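The selection step above needs Pr(log Ferr ∈ A_k | log ChdT, Age, Trigly, Rxhyper), where A_1, A_2 and A_3 are the intervals cut at the 0.3 and 0.7 sample quantiles of log Ferr. Chapter 6 states that a simple linear model fitted to the first-phase data is used for this step. A minimal sketch, assuming that model has normal errors with constant variance, so each interval probability is a difference of normal CDFs; the helper names, the synthetic covariates standing in for (log ChdT, Age, Trigly, Rxhyper), and the cap of 10 selected subjects are all hypothetical.

```python
import numpy as np
from scipy.stats import norm


def quantile_cutpoints(x, probs=(0.3, 0.7)):
    """Cutpoints at the given sample quantiles, defining intervals A_1, A_2, A_3."""
    return np.quantile(x, probs)


def interval_probabilities(x_phase1, w_phase1, w_new, cuts):
    """Pr(X in A_k | W) under a linear model X = W @ gamma + e, e ~ N(0, sigma^2).

    W contains an intercept column; gamma and sigma are fitted by least squares
    on the first-phase sample, where X is observed.
    """
    gamma, *_ = np.linalg.lstsq(w_phase1, x_phase1, rcond=None)
    resid = x_phase1 - w_phase1 @ gamma
    sigma = np.sqrt(resid @ resid / (len(x_phase1) - w_phase1.shape[1]))
    mu = w_new @ gamma
    edges = np.concatenate(([-np.inf], cuts, [np.inf]))
    cdf = norm.cdf((edges[None, :] - mu[:, None]) / sigma)
    return np.diff(cdf, axis=1)                # columns correspond to A_1, A_2, A_3


# Synthetic stand-in data: the first n1 subjects form the first-phase sample.
rng = np.random.default_rng(0)
n1, n_rest = 200, 400
w = np.column_stack([np.ones(n1 + n_rest), rng.normal(size=(n1 + n_rest, 2))])
x = w @ np.array([0.0, 0.8, -0.5]) + rng.normal(scale=0.5, size=n1 + n_rest)
delta = rng.binomial(1, 0.3, size=n1 + n_rest)

cuts = quantile_cutpoints(x[:n1])              # cutpoints from the first-phase sample
probs = interval_probabilities(x[:n1], w[:n1], w[n1:], cuts)
# Candidates: observed failures whose predicted chance of lying in A_1 is at least 95%.
candidates = np.flatnonzero((probs[:, 0] >= 0.95) & (delta[n1:] == 1))
chosen = rng.choice(candidates, size=min(10, len(candidates)), replace=False)
```

The 95% threshold matches the set defined above; Chapter 6 suggests values between 80% and 95%, which would only change the comparison in the `candidates` line.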

Table 5.1: Results from the Busselton Health Study.

                          PDS design          SRS design
ρ0     Covariate      Est.       SE         Est.       SE
0.4    logFerritin    -0.0938    0.1110     -0.1107    0.4004
       Age            -0.0505    0.0228     -0.0816    0.1972
       Triglycer      -0.1450    0.0635     -0.2287    1.4930
       Rxhyper        -0.0814    0.1959     -0.1129    0.3663
0.5    logFerritin    -0.087     0.1054     -0.1075    0.2275
       Age            -0.0509    0.012      -0.0852    0.1666
       Triglycer      -0.1487    0.0596     -0.2174    0.3242
       Rxhyper        -0.0799    0.1834     -0.1104    0.3028
0.6    logFerritin    -0.0948    0.1026     -0.1102    0.1593
       Age            -0.0509    0.0101     -0.0845    0.1592
       Triglycer      -0.1486    0.0574     -0.1966    0.1102
       Rxhyper        -0.0886    0.1769     -0.1084    0.2879
0.7    logFerritin    -0.0951    0.1019     -0.1099    0.1597
       Age            -0.0515    0.00981    -0.081     0.1469
       Triglycer      -0.1518    0.0578     -0.2023    0.116
       Rxhyper        -0.0928    0.1742     -0.1153    0.2861
0.75   logFerritin    -0.093     0.1021     -0.1086    0.1682
       Age            -0.05186   0.0099     -0.0863    0.1759
       Triglycer      -0.1551    0.0589     -0.202     0.131
       Rxhyper        -0.0933    0.1762     -0.1079    0.2939
0.83   logFerritin    -0.0982    0.1059     -0.1101    0.15
       Age            -0.0525    0.0102     -0.0773    0.1189
       Triglycer      -0.1579    0.0621     -0.1977    0.1029
       Rxhyper        -0.095     0.1818     -0.1044    0.2705

Chapter 6 Conclusions and Discussions

In this thesis, we consider an innovative and cost-effective sampling design, the two-phase PDS design, for survival data. The advantage of the proposed PDS design is that it allows a continuous variable Y and a vector of available covariates Z to be used in selecting a more informative second-phase sample. We fit a simple linear model to the first-phase data to obtain parameter estimates, and then fit the AFT model to the resulting biased sample. We conducted simulation studies under various settings and investigated the optimal design by comparing the trace of the asymptotic variance-covariance matrix of our proposed estimator with that of the simple random sample estimator. We conclude that: (i) the PDS design enables investigators to collect more informative samples under a fixed budget; (ii) the PDS estimator is more efficient than the estimator from an SRS of the same sample size; (iii) the PDS estimator is more efficient than the ODS estimator, especially when the censoring rate is high; (iv) the estimator based on the (0.1, 0.9) cutpoint quantiles has smaller standard errors than that based on (0.3, 0.7); (v) the second-phase probability value can be chosen between 80% and 95%; (vi) the corresponding optimal ρ0 = n0/n can be chosen between 0.7 and 0.75. For future work, other survival models can be considered, such as the proportional hazards model or the additive hazards model. It would also be interesting to explore different approaches for estimating the probabilities φk from the first-phase data.

Appendix

We first prove the consistency of $\hat\beta$. Let $\phi(\cdot)$ denote the density function of the standard normal random variable. The convex objective functions corresponding to the unweighted and weighted smoothed Gehan estimating functions, $U_{n,G}(\beta)$ and $\tilde U_{n,G}(\beta)$, are $L_{n,G}(\beta)$ and $\tilde L_{n,G}(\beta)$, respectively:

$$L_{n,G}(\beta) = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n}\delta_i\left[\{e_j(\beta)-e_i(\beta)\}\,\Phi\!\left(\sqrt{n}\,\frac{e_j(\beta)-e_i(\beta)}{r_{ij}}\right)+\frac{r_{ij}}{\sqrt{n}}\,\phi\!\left(\sqrt{n}\,\frac{e_j(\beta)-e_i(\beta)}{r_{ij}}\right)\right],$$

$$\tilde L_{n,G}(\beta) = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n}\omega_i\delta_i\left[\{e_j(\beta)-e_i(\beta)\}\,\Phi\!\left(\sqrt{n}\,\frac{e_j(\beta)-e_i(\beta)}{r_{ij}}\right)+\frac{r_{ij}}{\sqrt{n}}\,\phi\!\left(\sqrt{n}\,\frac{e_j(\beta)-e_i(\beta)}{r_{ij}}\right)\right].$$

By Lemma 2 in Johnson and Strawderman (2009), $\lim_{n\to\infty}\sup_{\beta\in\mathcal B}|L_{n,G}(\beta)-L_0(\beta)|=0$, where $L_0(\beta)$ is strictly convex for $\beta\in\mathcal B$. It can also be shown that $\lim_{n\to\infty}\sup_{\beta\in\mathcal B}|\tilde L_{n,G}(\beta)-L_{n,G}(\beta)|=0$ by the strong law of large numbers for U-statistics, asymptotic convergence results for finite population sampling, and Lemma 1 in Kong, Cai and Sen (2006). Combining these two results through the triangle inequality gives $\lim_{n\to\infty}\sup_{\beta\in\mathcal B}|\tilde L_{n,G}(\beta)-L_0(\beta)|=0$. Condition (iv) ensures that $L_0(\beta)$ is strictly convex at $\beta_0$, the unique minimizer of $L_0(\beta)$. Hence $\hat\beta$, the unique minimizer of $\tilde L_{n,G}(\beta)$, converges to $\beta_0$ almost surely.

To establish the asymptotic normality of $\hat\beta$, we first show the asymptotic normality of $\tilde\beta$, the solution to $U^{*}_{n,G}(\beta)=0$, where $U^{*}_{n,G}(\beta)$ is the weighted version of the (non-smoothed) Gehan estimating function,

$$U^{*}_{n,G}(\beta)=\sum_{i=1}^{n}\sum_{j=1}^{n}\omega_i\delta_i(X_i-X_j)\,I\{e_j(\beta)\ge e_i(\beta)\}.$$

We then show that $\sqrt n(\tilde\beta-\beta_0)$ and $\sqrt n(\hat\beta-\beta_0)$ have the same asymptotic distribution.

Let $M_i(\beta;t)=N_i(\beta;t)-\int_{-\infty}^{t}I\{e_i(\beta)\ge\mu\}\lambda_0(\mu)\,d\mu$, where $\lambda_0(\cdot)$ is the common hazard function of the $\varepsilon_i$. Define $S^{(d)}(\beta;t)=n^{-1}\sum_{i=1}^{n}X_i^{\otimes d}I\{e_i(\beta)\ge t\}$ and $S_c^{(d)}(\beta;t)=n^{-1}\sum_{i=1}^{n}\omega_iX_i^{\otimes d}I\{e_i(\beta)\ge t\}$ for $d=0,1$. Further, define $\bar X(\beta;t)=S^{(1)}(\beta;t)/S^{(0)}(\beta;t)$ and $\bar X_c(\beta;t)=S_c^{(1)}(\beta;t)/S_c^{(0)}(\beta;t)$. The limits of $S^{(d)}(\beta;t)$ and $\bar X(\beta;t)$ are $s^{(d)}(\beta;t)$ and $\bar x(\beta;t)=s^{(1)}(\beta;t)/s^{(0)}(\beta;t)$, respectively. Applying Lemma 1 of Yu et al. (2015) and Lemma 1 of Jin, Lin and Ying (2006) to the stochastic integral representation $U^{*}_{n,G}(\beta)=\sum_{i=1}^{n}\int_{-\infty}^{\infty}\omega_iS_c^{(0)}(\beta;t)\{X_i-\bar X_c(\beta;t)\}\,dN_i(\beta;t)$, it can be shown that

$$U^{*}_{n,G}(\beta_0)=\sum_{i=1}^{n}\mu_i(\beta_0)+\sum_{i=1}^{n}(\omega_i-1)\mu_i(\beta_0)+o_p(\sqrt n),\tag{6.1}$$

where $\mu_i(\beta)=\int_{-\infty}^{\infty}s^{(0)}(\beta;t)\{X_i-\bar x(\beta;t)\}\,dM_i(\beta;t)$. Since

$$\omega_i-1=\Big(\frac{\xi_i}{\rho_0\rho_V}-1\Big)(1-\delta_i)+\Big(\frac{\xi_i}{\rho_0\rho_V}-1\Big)(1-\zeta_i)\delta_i+\Big[\sum_{k=1}^{K}\Big(\frac{\pi_k(1-\rho_0\rho_V)\eta_{ik}}{\rho_k\rho_V}-1\Big)\zeta_{ik}\Big](1-\zeta_i)\delta_i,$$

the second term in (6.1) decomposes into

$$\sum_{i=1}^{n}\Big\{\Big(\frac{\xi_i}{\rho_0\rho_V}-1\Big)(1-\delta_i)\mu_i(\beta_0)+\Big(\frac{\xi_i}{\rho_0\rho_V}-1\Big)(1-\zeta_i)\delta_i\mu_i(\beta_0)+\Big[\sum_{k=1}^{K}\Big(\frac{\pi_k(1-\rho_0\rho_V)\eta_{ik}}{\rho_k\rho_V}-1\Big)\zeta_{ik}\Big](1-\zeta_i)\delta_i\mu_i(\beta_0)\Big\}.$$

These three terms are asymptotically uncorrelated with each other and with the first term in (6.1). Thus, by Lemma 3 in the supplementary materials of Kang and Cai (2009) and the multivariate central limit theorem, $n^{-1/2}U^{*}_{n,G}(\beta_0)$ is asymptotically normal with mean 0 and asymptotic covariance $\Sigma_F(\beta_0)+\Sigma_S(\beta_0)$, where $\Sigma_F(\beta_0)=E[\mu_1(\beta_0)^{\otimes2}]$ and $\Sigma_S(\beta_0)=E[(\omega_1-1)^2\mu_1(\beta_0)^{\otimes2}]$.

The consistency of $\tilde\beta$ for $\beta_0$ follows from arguments similar to those for the consistency of $\hat\beta$. Using this with the arguments in Theorem 2 of Ying (1993), it can be shown that

$$\sqrt n(\tilde\beta-\beta_0)=-\Sigma_A^{-1}(\beta_0)\,n^{-1/2}U^{*}_{n,G}(\beta_0)+o_p\big(1+\sqrt n\,\|\tilde\beta-\beta_0\|\big).$$

Then, by the asymptotic normality of $n^{-1/2}U^{*}_{n,G}(\beta_0)$, $\sqrt n(\tilde\beta-\beta_0)$ is asymptotically normal with mean 0 and covariance $\Sigma_A^{-1}(\beta_0)\{\Sigma_F(\beta_0)+\Sigma_S(\beta_0)\}\{\Sigma_A^{-1}(\beta_0)\}^{\top}$.

To establish the asymptotic equivalence of the distributions of $\sqrt n(\tilde\beta-\beta_0)$ and $\sqrt n(\hat\beta-\beta_0)$, it is sufficient to show that, as $n\to\infty$,
(i) $\partial\tilde U_{n,G}(\beta)/\partial\beta^{\top}$ converges to $\tilde\Sigma_A(\beta)$ in probability uniformly in $\beta\in\mathcal B$;
(ii) $n^{-1/2}\{\tilde U_{n,G}(\beta)-U^{*}_{n,G}(\beta)\}$ converges to 0 in probability uniformly in $\beta\in\mathcal B$.

To show (i), we decompose $\partial\tilde U_{n,G}(\beta)/\partial\beta^{\top}-\tilde\Sigma_A(\beta)$ into $\{\partial\tilde U_{n,G}(\beta)/\partial\beta^{\top}-\partial U_{n,G}(\beta)/\partial\beta^{\top}\}+\{\partial U_{n,G}(\beta)/\partial\beta^{\top}-\tilde\Sigma_A(\beta)\}$. The second term converges to 0 in probability uniformly in $\beta\in\mathcal B$ by Lemma 3 in Johnson and Strawderman (2009). The first term can also be shown to converge to 0 in probability uniformly in $\beta\in\mathcal B$ by the strong law of large numbers for U-statistics, Lemma 1 in Kong, Cai and Sen (2006), and asymptotic convergence results for finite population sampling. Combining the two by the triangle inequality gives (i). For (ii), writing $u_{ij}=\{e_j(\beta)-e_i(\beta)\}/r_{ij}$,

$$n^{-1/2}\{\tilde U_{n,G}(\beta)-U^{*}_{n,G}(\beta)\}=\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n}\omega_i\delta_i(X_i-X_j)\left[\Phi\!\left(\sqrt n\,u_{ij}\right)-I(u_{ij}\ge 0)\right]=\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n}\omega_i\delta_i(X_i-X_j)\,\frac{r_{ij}}{e_j(\beta)-e_i(\beta)}\,u_{ij}\left[\Phi\!\left(\sqrt n\,u_{ij}\right)-I(u_{ij}\ge 0)\right].$$

Note that, for $u\in\mathbb R$, $|u\{\Phi(\sqrt n\,u)-I(u\ge0)\}|=\mathrm{sign}(u)\,u\,\Phi(-\sqrt n\,|u|)$, where $\mathrm{sign}(u)=2I(u\ge0)-1$. Since $\Phi(-u)\le(\sqrt{2\pi}\,u)^{-1}\exp(-u^2/2)$ for $u>0$, $\lim_{n\to\infty}\sup_{u\in\mathbb R}|u\{\Phi(\sqrt n\,u)-I(u\ge0)\}|=0$. Then (ii) follows from this result, the strong law of large numbers for U-statistics, Lemma 1 in Kong, Cai and Sen (2006), and asymptotic convergence results for finite population sampling.
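The proof works with the weighted induced-smoothed Gehan objective $\tilde L_{n,G}(\beta)$, whose gradient is the weighted smoothed estimating function. A minimal numerical sketch, assuming $e_i(\beta)=\log Y_i - X_i^{\top}\beta$ and $r_{ij}=\|X_i-X_j\|$ (an identity matrix in the induced-smoothing variance), neither of which is fixed in this appendix; the function names, the censoring distribution and the simulated data are hypothetical. Because $\tilde L_{n,G}$ is convex and smooth, a quasi-Newton routine such as SciPy's BFGS can locate its minimizer.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def smoothed_gehan_objective(beta, log_y, X, delta, w):
    """Weighted induced-smoothed Gehan objective tilde L_{n,G}(beta) and its gradient.

    Assumes e_i(beta) = log Y_i - X_i^T beta and r_ij = ||X_i - X_j||.
    """
    n = len(log_y)
    e = log_y - X @ beta                     # residuals e_i(beta)
    d = e[None, :] - e[:, None]              # d[i, j] = e_j - e_i
    diff = X[:, None, :] - X[None, :, :]     # diff[i, j] = X_i - X_j
    r = np.sqrt((diff ** 2).sum(axis=2))     # r_ij
    mask = r > 0                             # drop i == j and tied covariate pairs
    z = np.zeros_like(d)
    z[mask] = np.sqrt(n) * d[mask] / r[mask]
    coef = (w * delta)[:, None] * mask       # omega_i * delta_i, zero on dropped pairs
    terms = d * norm.cdf(z) + r / np.sqrt(n) * norm.pdf(z)
    value = (coef * terms).sum() / n
    # Gradient: (1/n) sum_ij omega_i delta_i Phi(sqrt(n) d_ij / r_ij) (X_i - X_j)
    grad = np.einsum("ij,ijk->k", coef * norm.cdf(z), diff) / n
    return value, grad


def fit_smoothed_gehan(log_y, X, delta, w=None, beta0=None):
    """Minimize the convex objective with BFGS, using the analytic gradient."""
    n, p = X.shape
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    beta0 = np.zeros(p) if beta0 is None else beta0
    res = minimize(smoothed_gehan_objective, beta0,
                   args=(log_y, X, delta, w), jac=True, method="BFGS")
    return res.x


# Simulated full-cohort example with unit weights (omega_i = 1) and true beta = (1, 1).
rng = np.random.default_rng(2020)
n = 300
X = np.column_stack([rng.normal(size=n), rng.binomial(1, 0.5, size=n)]).astype(float)
log_t = X @ np.array([1.0, 1.0]) + rng.normal(size=n)
log_c = rng.normal(2.0, 1.0, size=n)         # hypothetical censoring distribution
log_y = np.minimum(log_t, log_c)
delta = (log_t <= log_c).astype(float)
beta_hat = fit_smoothed_gehan(log_y, X, delta)
print(np.round(beta_hat, 3))
```

Passing two-phase sampling weights as `w` turns the same code into a weighted estimator; how those weights are built from $\xi_i$, $\zeta_{ik}$ and $\eta_{ik}$ is left to the design described in the main text.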

Bibliography
[1] Brown, B. M., and Wang, Y.-G. Standard errors and covariance matrices for smoothed rank estimators. Biometrika 92, 1 (2005), 149–158.
[2] Brown, B. M., and Wang, Y.-G. Induced smoothing for rank regression with censored survival times. Statistics in Medicine 26, 4 (2007), 828–836.
[3] Chatterjee, N., Chen, Y.-H., and Breslow, N. E. A pseudo-score estimator for regression problems with two-phase sampling. Journal of the American Statistical Association 98, 461 (2003), 158–168.
[4] Chiou, H., Kang, S., and Yan, J. Fitting accelerated failure time models in routine survival analysis with R package aftgee. Journal of Statistical Software 61, 11 (2014), 1–23.
[5] Chiou, H., Kang, S., and Yan, J. Semiparametric accelerated failure time modeling for clustered failure times from stratified sampling. Journal of the American Statistical Association 110 (2015), 621–629.
[6] Ding, J., Zhou, H., Liu, Y., Cai, J., and Longnecker, M. P. Estimating effect of environmental contaminants on women's subfecundity for the MoBa study data with an outcome-dependent sampling scheme. Biostatistics 15, 4 (2014), 636–650.

[7] Gehan, E. A. A generalized Wilcoxon test for comparing arbitrarily singly-censored samples. Biometrika 52, 1/2 (1965), 203–233.
[8] Hajek, J. Limit theorems for a simple random sampling from a finite population. Publications of the Mathematical Institute of the Hungarian Academy of Sciences 5 (1960), 361–374.
[9] Jin, Z., Lin, D. Y., and Ying, Z. Rank regression analysis of multivariate failure time data based on marginal linear models. Scandinavian Journal of Statistics 33 (2006), 1–23.
[10] Johnson, L. M., and Strawderman, R. L. Induced smoothing for the semiparametric accelerated failure time model: asymptotics and extensions to clustered data. Biometrika 96 (2009), 577–590.
[11] Kang, S., and Cai, J. Marginal hazards model for case-cohort studies with multiple disease outcomes. Biometrika 96 (2009), 887–901.
[12] Kong, L., Cai, J., and Sen, P. K. Asymptotic results for fitting semiparametric transformation models to failure time data from case-cohort studies. Statistica Sinica 16 (2006), 135–151.
[13] Prentice, R. L. Linear rank tests with right censored data. Biometrika 65, 1 (1978), 167–180.
[14] Serfling, R. J. Approximation Theorems of Mathematical Statistics (Vol. 162). John Wiley & Sons, 2009.
[15] Song, R., Zhou, H., and Kosorok, M. R. On semiparametric efficient inference for two-stage outcome dependent sampling with a continuous outcome. Biometrika 96, 1 (2009), 221–228.

[16] Weaver, M. A. Semiparametric methods for continuous outcome regression models with covariate data from an outcome dependent subsample.
[17] Ying, Z. A large sample study of rank estimation for censored regression data. The Annals of Statistics 21 (1993), 76–99.
[18] Yu, J., Liu, Y., Sandler, D. P., and Zhou, H. Statistical inference for the additive hazards model under outcome-dependent sampling. The Canadian Journal of Statistics 43, 3 (2015), 436–453.
[19] Zhou, H., Song, R., Wu, Y., and Qin, J. Statistical inference for a two-stage outcome-dependent sampling design with a continuous outcome. Biometrics 67, 1 (2011), 194–202.
[20] Zhou, H., and Weaver, M. A. An estimated likelihood method for continuous outcome regression models with outcome-dependent sampling. Journal of the American Statistical Association 100, 470 (2005), 459–469.
[21] Zhou, H., Weaver, M. A., Qin, J., Longnecker, M., and Wang, M. C. A semiparametric empirical likelihood method for data from an outcome-dependent sampling scheme with a continuous outcome. Biometrics 58, 2 (2002), 413–421.
[22] Zhou, H., Xu, W., Zeng, D., and Cai, J. Semiparametric inference for data with a continuous outcome from a two-phase probability-dependent sampling scheme. Journal of the Royal Statistical Society: Series B 76 (2014), 197–215.
[23] Zhu, H., and Wang, M.-C. Nonparametric inference on bivariate survival data with interval sampling: association estimation and testing. Biometrika 101, 3 (2014), 519–533.
