Sample size - Allocation Choices for Outcome-Dependent Sampling Designs with Continuous Data

In this section, we provide the sample sizes required to achieve a given power for different values of β₁ using ODS designs with various allocations, in order to quantify the cost savings. The sample sizes are calculated using the asymptotic normality of the proposed estimators. Suppose the hypothesis we want to test is as follows:

H₀ : β₁ = 0 vs. H₁ : β₁ = b, (3.4)

where b is a specified constant. The estimator ˆβ₁ of β₁ satisfies that the distribution of (ˆβ₁ − β₁) converges to a normal distribution N(0, σ²), where σ can be consistently estimated by ˆσ. In addition, we write ˆβ₁(n) and ˆσ(n) to emphasize that the estimators ˆβ₁ and ˆσ depend on the sample size n. Given the significance level α = 0.05, the power can be calculated by

where Φ(x) is the cumulative distribution function of the standard normal distribution. Hence, for a given power and significance level, we obtain the sample size from the above formula as follows:

Step 1: For a given power, b, and significance level α = 0.05, we obtain the value of ˆσ(n) from equation (3.5).

Step 2: We determine the sample size n according to the standard deviation of the 2000 simulation results (SE).
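The two steps can be sketched in Python as follows. This is only an illustration under stated assumptions: it takes the test to be a two-sided Wald test, so that the power at β₁ = b is Φ(b/σ − z) + Φ(−b/σ − z) with z the 1 − α/2 standard normal quantile, which may differ from the exact form of equation (3.5). The function names and the root-n shortcut in `extrapolate_n` are hypothetical; the procedure described above instead reads the SE off the simulation results.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def wald_power(b, se, alpha=0.05):
    """Power of a two-sided Wald test of H0: beta1 = 0 when the true beta1 is b.

    Assumed form, not necessarily the document's equation (3.5).
    """
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = b / se
    return _STD_NORMAL.cdf(shift - z) + _STD_NORMAL.cdf(-shift - z)

def target_se(b, power, alpha=0.05):
    """Step 1: the standard error at which the test attains the requested power."""
    if b == 0 or not (alpha < power < 1):
        raise ValueError("need b != 0 and alpha < power < 1")
    # Power is close to 1 at lo and close to alpha at hi.
    lo, hi = abs(b) * 1e-6, abs(b) * 1e6
    for _ in range(200):  # bisection: power decreases as se grows
        mid = (lo + hi) / 2
        if wald_power(b, mid, alpha) > power:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def extrapolate_n(n_pilot, se_pilot, se_target):
    """Step 2 (assumed shortcut): scale a pilot n, assuming SE shrinks like 1/sqrt(n)."""
    return math.ceil(n_pilot * (se_pilot / se_target) ** 2)
```

Under these assumptions, `target_se(0.05, 0.9)` is roughly 0.0154; the required n is then the sample size at which the simulated SE of ˆβ₁(n) for a given design falls to that value.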

Table 3.67 shows the sample sizes required to achieve a given power for three values of β₁ (0.05, 0.10, and 0.15) using the two methods β_Z and β_S under model (3.1). In the table, the optimal ODS design, with cut-points at the (10th, 90th) percentiles of Y and an SRS proportion of 0.3, requires only 1800 subjects to reach power 0.9 at true β₁ = 0.05, which is 42% of the number of subjects required under simple random sampling. For a given power and true value of β₁, fewer subjects are needed for ODS designs with more extreme cut-points or a smaller proportion of SRS samples (smaller ρ). Furthermore, as the true value of β₁ increases from 0.05 to 0.15, fewer subjects are needed to achieve the same power.

The same patterns can be observed in Table 3.68, which shows the sample sizes required to achieve a given power for three values of β₁ (0.05, 0.10, and 0.15) using the two methods β_Z and β_S under model (3.3).

Table 3.67: Sample size needed for testing H₀ : β₁ = 0 for a given power for models with Normal

Table 3.68: Sample size needed for testing H₀ : β₁ = 0 for a given power for models with

Note: The results are based on 2000 simulations with the model f(y|x; β) = λ exp{−λy}, for y > 0, and λ = exp{β₀ + β₁x₁ + β₂x₂}, where X₁ ∼ N(0, 1), X₂ ∼ Bernoulli(0.4), β₀ = 1, and β₂ = −0.4.
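For readers who want to reproduce data of this kind, the model in the note can be simulated with a few lines of Python. The following is a minimal sketch using only the standard library; the function name, the seed handling, and the tuple layout of the output are illustrative choices, not part of the original simulation code.

```python
import math
import random

def simulate_exponential_model(n, beta1, beta0=1.0, beta2=-0.4, p_x2=0.4, seed=0):
    """Draw n subjects (y, x1, x2) from the exponential regression model in the note."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)              # X1 ~ N(0, 1)
        x2 = 1 if rng.random() < p_x2 else 0  # X2 ~ Bernoulli(0.4)
        lam = math.exp(beta0 + beta1 * x1 + beta2 * x2)
        y = rng.expovariate(lam)              # density lam * exp(-lam * y), y > 0
        rows.append((y, x1, x2))
    return rows
```

Calling `simulate_exponential_model(2000, beta1=0.05)` gives one simulated population at the smallest effect size considered in the tables; repeating the call with different seeds gives independent replicates.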

4 APPLICATION TO THE COLLABORATIVE PERINATAL PROJECT DATA

In this section, we apply the proposed method to a real data set from the Collaborative Perinatal Project (CPP) to check whether the results agree with our simulation results. We first summarize the CPP data in Section 4.1 and then describe the conditional model used to analyze the data in Section 4.2. In Section 4.3, we present the parameter estimates.

4.1 The CPP data

The Collaborative Perinatal Project (CPP) is a prospective study designed to study the relationship between neurologic disorders and other conditions in children (Niswander and Gordon, 1972; Gray et al., 2000). From 1959 to 1965, 55,908 pregnant women were recruited into the study from 12 U.S. study centers (in Baltimore, Boston, Buffalo, Memphis, Minneapolis, New Orleans, New York [two hospitals], Philadelphia, Portland, Providence, and Richmond). Study data were collected on the mothers at each prenatal visit and at delivery, and on the children when they were 24 hours, 4 and 8 months, and 1, 3, 4, 7, and 8 years old. In addition, the children underwent many tests assessing cognitive, neurological, and motor development.

In a recent environmental epidemiological study, the investigators were interested in the relationship between in utero exposure to polychlorinated biphenyls (PCBs), measured from third-trimester maternal serum, and the audiometric evaluation, which was done when the children were approximately 8 years old (Longnecker et al., 2001, 2004). Because the blood serum assay is expensive and the amount of preserved maternal serum is limited, the PCB level was measured only on a subsample of the CPP population. Eligible children had to meet the following criteria: (1) live-born singleton and (2) a 3-ml third-trimester maternal serum specimen was available. The sample obtained by the investigators includes an SRS of 1,200 subjects from the eligible children, of whom 726 had an 8-year audiometric evaluation, and a supplemental sample of 200 eligible children whose audiometric evaluation showed sensorineural hearing loss (SNHL), defined as a hearing threshold ≥ 13.3 dB averaged across both ears at 1000, 2000, and 4000 Hz.

In our analysis, we were mainly interested in the effect of PCB on the audiometric evaluation; hence we took the hearing level, the average of the measures at frequencies 1000, 2000, and 4000 Hz for both ears, as the continuous outcome variable. The main exposure variable was the third-trimester maternal serum PCB level (PCB), measured in µg/L. Additional confounding variables included maternal age (MAGE), the highest education level attained when giving birth (EDUC), the socioeconomic index (SEI) score, and the race (RACE) and gender (GENDER) of the child. The covariates PCB, MAGE, EDUC, and SEI were continuous; RACE was coded 1 for "White" and 0 for "Black and Others"; GENDER was coded 1 for males and 0 for females.

We consider only the 1,806 subjects with complete observations on these seven variables in our analysis and treat these 1,806 subjects as the available population.

First, we analyze all complete subjects. Then we compare the performance of the parameter estimates under the SRS design with that under ODS designs of the same sample size but different allocations. The ODS samples in our analysis consist of a simple random sample from the 1,806 subjects plus supplemental samples from the two tails of the distribution of the audiometric evaluation.
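The ODS draw described here can be illustrated with a short Python sketch. It assumes that a fraction `rho` of the n subjects forms the SRS, that the remainder is split evenly between the lower and upper tails, that tail cut-points are empirical percentiles of the outcome in the available population, and that supplemental subjects are drawn from those not already in the SRS. The function name, defaults, and the even split are assumptions for illustration and need not match the allocations used in the analysis.

```python
import random

def draw_ods_sample(y, n, rho=0.3, lower_pct=0.10, upper_pct=0.90, seed=0):
    """Return index lists (srs, low_tail, high_tail) for an ODS sample of size n."""
    rng = random.Random(seed)
    population = len(y)
    n_srs = round(rho * n)
    n_low = (n - n_srs) // 2        # assumed even split across the two tails
    n_high = n - n_srs - n_low

    # Simple random sample from the whole available population.
    srs = set(rng.sample(range(population), n_srs))

    # Empirical cut-points of the outcome distribution.
    ordered = sorted(y)
    cut_low = ordered[int(lower_pct * (population - 1))]
    cut_high = ordered[int(upper_pct * (population - 1))]

    # Supplemental samples from each tail, excluding subjects already in the SRS.
    low_pool = [i for i in range(population) if i not in srs and y[i] < cut_low]
    high_pool = [i for i in range(population) if i not in srs and y[i] > cut_high]
    low = rng.sample(low_pool, n_low)    # raises ValueError if the tail is too small
    high = rng.sample(high_pool, n_high)
    return sorted(srs), sorted(low), sorted(high)
```

With a population of 1,806 outcomes, `draw_ods_sample(y, 300)` returns 90 SRS indices and 105 indices from each tail. The call fails when a tail holds fewer subjects than its allocation, which limits how large n or how extreme the cut-points can be for a fixed population.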
