Size of SRS Samples - Allocation Selection for Outcome-Dependent Sampling Designs with Continuous Data

In deriving β̂_Z, we found that the proposed method is sensitive to the initial values, especially for ODS designs with ρ near 0. In this section we therefore examine ODS designs with as few SRS samples as possible, i.e., ρ less than 0.1. We conduct simulation studies similar to those in Tables 3.5 and 3.8, based on model (3.1) with (β₀, β₁, β₂) = (1, 0.1, −0.5) and (β₀, β₁, β₂) = (1, 0.5, −0.5), respectively. The results are summarized in Tables 5.8 and 5.9. In addition to the performance of β̂_Z, the #non.conv column of Tables 5.8 and 5.9 reports the number of non-convergent replications out of 2,000.

In Table 5.8, 631 of the 2,000 replications fail to converge for the ODS design with ρ = 0.0025 and cut-points at the (10th, 90th) percentiles, even when suitable initial values are supplied. Table 5.9 shows similar results for ρ = 0.0025, with 524 to 803 non-convergent replications. That is, the estimator is difficult to obtain with the proposed method under an ODS design with a very small proportion of SRS samples.

We are not yet able to give a reasonable explanation for this phenomenon. One speculation is that it is related to the program used to compute the MSELE. However, we are certain that the presence of SRS samples stabilizes the proposed method. We therefore suggest that a substantial number of SRS samples be included when considering an ODS design. This conclusion is consistent with the findings in Section 5.2.

Table 5.8: Additional simulation results: Normal model with (β₀, β₁, β₂) = (1, 0.1, −0.5).

| ρ (n₀/n) | Cut-points | #non.conv | Mean of β̂₁ | SE of β̂₁ |
|---|---|---|---|---|
| 0.01 (2/200) | (10th, 90th) | 152 | 0.103 | 0.046 |
| | (20th, 80th) | 49 | 0.099 | 0.052 |
| | (30th, 70th) | 18 | 0.102 | 0.058 |
| 0.0025 (2/800) | (10th, 90th) | 631 | 0.101 | 0.023 |
| | (20th, 80th) | 617 | 0.100 | 0.026 |
| | (30th, 70th) | 599 | 0.101 | 0.029 |

Table 5.9: Additional simulation results: Normal model with (β₀, β₁, β₂) = (1, 0.5, −0.5).

| ρ (n₀/n) | Cut-points | #non.conv | Mean of β̂₁ | SE of β̂₁ |
|---|---|---|---|---|
| 0.01 (2/200) | (10th, 90th) | 0 | 0.506 | 0.062 |
| | (20th, 80th) | 0 | 0.502 | 0.062 |
| | (30th, 70th) | 2 | 0.502 | 0.067 |
| 0.0025 (2/800) | (10th, 90th) | 524 | 0.504 | 0.029 |
| | (20th, 80th) | 617 | 0.501 | 0.030 |
| | (30th, 70th) | 803 | 0.500 | 0.032 |
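The #non.conv counts are easier to compare across designs as rates. The following Python sketch converts the counts in Tables 5.8 and 5.9 into proportions of the 2,000 replications; the dictionary layout and the names `NON_CONV` and `rates` are illustrative choices, not part of the thesis code.

```python
REPLICATIONS = 2000  # simulation replications per configuration

# #non.conv counts copied from Tables 5.8 and 5.9.
# Key: (table, rho, cut-points); value: number of non-convergent replications.
NON_CONV = {
    ("5.8", 0.01, "(10th, 90th)"): 152,
    ("5.8", 0.01, "(20th, 80th)"): 49,
    ("5.8", 0.01, "(30th, 70th)"): 18,
    ("5.8", 0.0025, "(10th, 90th)"): 631,
    ("5.8", 0.0025, "(20th, 80th)"): 617,
    ("5.8", 0.0025, "(30th, 70th)"): 599,
    ("5.9", 0.01, "(10th, 90th)"): 0,
    ("5.9", 0.01, "(20th, 80th)"): 0,
    ("5.9", 0.01, "(30th, 70th)"): 2,
    ("5.9", 0.0025, "(10th, 90th)"): 524,
    ("5.9", 0.0025, "(20th, 80th)"): 617,
    ("5.9", 0.0025, "(30th, 70th)"): 803,
}

# Proportion of replications in which the estimator failed to converge.
rates = {key: count / REPLICATIONS for key, count in NON_CONV.items()}

for (table, rho, cuts), rate in rates.items():
    print(f"Table {table}, rho={rho}, cut-points {cuts}: {rate:.2%} non-convergent")
```

At ρ = 0.0025, roughly 26% to 40% of the replications fail to converge, whereas at ρ = 0.01 the rate is at most 7.6%.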

6 CONCLUSION AND DIRECTIONS FOR FUTURE RESEARCH

In this thesis, we review the data structure and the likelihood function of the ODS design with a continuous outcome, and derive the estimator, the MSELE, closely following the semiparametric empirical likelihood procedure proposed by Zhou et al. (2002). In our simulation studies, we evaluate the estimator under different regression models, sample sizes, and allocations of supplemental samples, and demonstrate the optimal sampling allocation under a limited total sample size.

Based on our simulation results, we offer more general criteria for sample selection in the ODS design with a limited total sample size. Below we summarize our conclusions and suggestions.

If the conditional density of Y given the covariates X is symmetric, allocating the supplemental samples equally between the two tails gains more efficiency; if it is skewed, it is more appropriate to draw more supplemental samples from the long tail than from the short tail.
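The allocation rule above can be sketched in Python as a split of the n₁ supplemental samples between the two tails. The function name `split_supplemental` and the parameter `long_tail_share` are hypothetical; the text does not specify how much more the long tail should receive, so the 0.7 used below is only an illustrative value.

```python
def split_supplemental(n1, long_tail_share=0.5):
    """Split n1 supplemental samples into (long-tail count, short-tail count).

    long_tail_share=0.5 is the equal allocation for a symmetric conditional
    density of Y given X (either tail may then be called the "long" one);
    a share above 0.5 favours the long tail when the density is skewed.
    The share itself is a hypothetical tuning value, not taken from the text.
    """
    if not 0.0 <= long_tail_share <= 1.0:
        raise ValueError("long_tail_share must lie in [0, 1]")
    n_long = round(n1 * long_tail_share)
    return n_long, n1 - n_long


print(split_supplemental(100))       # (50, 50): symmetric case, equal allocation
print(split_supplemental(100, 0.7))  # (70, 30): skewed case, more from the long tail
```

A share of 1.0 corresponds to drawing supplemental samples from the long tail only.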

We suggest partitioning the domain of the outcome distribution into three intervals by cut-points at the 10th and 90th percentiles of the distribution of Y, because we believe that the extreme intervals carry more information. The supplemental samples can then be drawn from the two tails. Note that when the distribution is skewed, we may add just one supplemental sample, drawn from the long tail.
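To make the sampling scheme concrete, here is a minimal Python sketch of drawing an ODS sample: an SRS of size n₀ from the whole population, followed by supplemental samples from the intervals below the 10th and above the 90th percentile of Y. It works on outcome values only (covariates X would then be measured on the selected units), and drawing the supplemental units from those not already in the SRS is an assumption of this sketch; `draw_ods_sample` and `tail_decile` are hypothetical names.

```python
import random
import statistics


def draw_ods_sample(y, n0, n_lower, n_upper, tail_decile=1, seed=None):
    """Draw an ODS sample from outcome values y; return index lists (srs, lower, upper).

    tail_decile=1 puts the cut-points at the 10th and 90th percentiles of y,
    tail_decile=2 at the 20th and 80th, tail_decile=3 at the 30th and 70th.
    """
    rng = random.Random(seed)
    deciles = statistics.quantiles(y, n=10)  # 9 cut-points: 10th, 20th, ..., 90th
    q_low, q_high = deciles[tail_decile - 1], deciles[-tail_decile]

    all_idx = range(len(y))
    # Stage 1: simple random sample (SRS) of size n0 from the whole population.
    srs = rng.sample(all_idx, n0)
    in_srs = set(srs)
    # Stage 2: supplemental samples from the two extreme intervals. Drawing them
    # from units not already in the SRS is an assumption of this sketch.
    lower_pool = [i for i in all_idx if i not in in_srs and y[i] < q_low]
    upper_pool = [i for i in all_idx if i not in in_srs and y[i] > q_high]
    lower = rng.sample(lower_pool, n_lower)
    upper = rng.sample(upper_pool, n_upper)
    return srs, lower, upper


# Usage on a hypothetical population of 10,000 standard normal outcomes.
pop_rng = random.Random(1)
y = [pop_rng.gauss(0.0, 1.0) for _ in range(10_000)]
srs, lower, upper = draw_ods_sample(y, n0=100, n_lower=50, n_upper=50, seed=2)
print(len(srs), len(lower), len(upper))  # 100 50 50
```

Setting `n_lower=0` gives the skewed-case variant with a single supplemental sample from the upper (long) tail.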

We suggest that the proportion of SRS samples in the total sample (ρ = n₀/n) be between 0.4 and 0.6. In some model settings, especially when the coefficient β is larger than 1, ODS allocations with ρ less than 0.3 do not gain efficiency over the simple random sample design (see Sections 5.1 and 5.2). In addition, in practice the extreme intervals often contain few subjects. Hence, we suggest determining the number of supplemental samples from a ρ between 0.4 and 0.6; in other words, roughly half of the total samples should be drawn at random from the overall population as SRS samples.
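The arithmetic behind this suggestion is simple: for a total sample size n, the SRS size is n₀ = ρn and the remaining n₁ = n − n₀ units are supplemental samples. A small Python sketch, where `ods_sizes` is a hypothetical helper and n = 200 is an illustrative total:

```python
def ods_sizes(n, rho):
    """Return (n0, n1): SRS size and total supplemental size for total sample size n."""
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must lie in (0, 1]")
    n0 = round(rho * n)  # SRS samples, rho = n0 / n
    return n0, n - n0    # remaining units are supplemental samples


RECOMMENDED_RHO = (0.4, 0.6)  # range suggested in the text

for rho in (0.2, 0.4, 0.5, 0.6):
    n0, n1 = ods_sizes(200, rho)
    ok = RECOMMENDED_RHO[0] <= rho <= RECOMMENDED_RHO[1]
    print(f"rho={rho:.1f}: n0={n0}, n1={n1}, within recommended range: {ok}")
```

The same helper reproduces the extreme designs of Tables 5.8 and 5.9: `ods_sizes(200, 0.01)` and `ods_sizes(800, 0.0025)` both leave only two SRS samples.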

When the association between the outcome and the exposure of interest is weak, the ODS design can be more than twice as efficient as the SRS design with the same sample size. Hence, we recommend the ODS design, especially for regressions with small coefficient values.
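"More than twice as efficient" can be read through the usual relative efficiency, the ratio of the estimator's variance under SRS to its variance under ODS; we assume that definition here. In Python, a sketch with hypothetical standard errors (not values from the thesis):

```python
import math


def relative_efficiency(se_srs, se_ods):
    """Relative efficiency of ODS to SRS: Var(SRS estimator) / Var(ODS estimator)."""
    return (se_srs / se_ods) ** 2


se_srs = 0.10  # hypothetical SE of the slope estimate under SRS
print(relative_efficiency(se_srs, 0.06))  # about 2.78, i.e. more than twice as efficient
# RE > 2 requires the ODS standard error to fall below se_srs / sqrt(2).
print(se_srs / math.sqrt(2))
```

Because efficiency compares variances, doubling it only requires the standard error to shrink by a factor of about 1.41, not by half.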

Some phenomena in our simulation results cannot yet be reasonably explained. In the simulation studies of Section 5.2, the rise in the relative efficiency of β̂₁ in Figure 5.3 is larger than that in Figure 5.2; in the simulation studies of Section 5.3, the estimator often fails to converge for ODS designs with very few SRS samples. These issues are left for future research.

In addition, a two-stage ODS design has been proposed, along with more efficient estimation for such a design (Song, Zhou, and Kosorok, 2009; Zhou et al., 2010). The issues discussed in this thesis can be extended to the two-stage ODS design, giving researchers more general criteria for sampling under a limited budget.

References

[1] Liao, D., Myers, R., Hunt, S., Shahar, E., Paton, C., Burke, G., Province, M., and Heiss, G. (1997). Familial history of stroke risk: the family heart study. Stroke 28, 1908-1912.

[2] Longnecker, M., Hoffman, H., Klebanoff, M. A., Brock, J. W., Zhou, H., Needham, L., Adera, T., Guo, X., and Gray, K. A. (2004). In utero exposure to polychlorinated biphenyls and sensorineural hearing loss in 8-year-old children. Neurotoxicology and Teratology 26, 629-637.

[3] Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75, 237-249.

[4] Owen, A. B. (1990). Empirical likelihood for confidence regions. Annals of Statistics 18, 90-120.

[5] Qin, J. and Lawless, J. F. (1994). Empirical likelihood and general estimating equations. Annals of Statistics 22, 300-325.

[6] Song, R., Zhou, H., and Kosorok, M. (2009). A note on semiparametric efficient inference for two-stage outcome-dependent sampling with a continuous outcome. Biometrika 96, 221-228.

[7] Wang, X. and Zhou, H. (2006). A semiparametric empirical likelihood method for biased sampling schemes with auxiliary covariates. Biometrics 62, 1149-1160.

[8] Weaver, M. A. and Zhou, H. (2005). An estimated likelihood method for continuous outcome regression models with outcome-dependent sampling. Journal of the American Statistical Association 100, 459-469.

[9] Zhou, H., Weaver, M. A., Qin, J., Longnecker, M. P., and Wang, M. C. (2002). A semiparametric empirical likelihood method for data from an outcome-dependent sampling scheme with a continuous outcome. Biometrics 58, 413-421.

[10] Zhou, H., Song, R., Wu, Y., and Qin, J. (2010). Statistical inference for a two-stage outcome-dependent sampling design with a continuous outcome. Biometrics 67, 194-202.
