
5 Regression Analysis Based on Familial Data

5.2 Likelihood Analysis Based on Familial data

Table 5.2 summarizes observed case-control familial data in which the probands' times (onset or censored) are matched. Specifically, at time $t_i$ we sample $m_{t_i}$ case probands and their relatives and match them with $n_{t_i}$ control probands and their relatives. Denote by $(X_p, X_r)$ the observed times and by $(\delta_p, \delta_r)$ the corresponding indicators for a proband and his/her relative, respectively.

Table 5.2 Age-matched case-control familial data

To simplify the presentation, assume that each family has two members (one proband and one relative) with age-onset data. The model assumption consists of two stages. In the first stage, the model for the proband, namely $\Pr\{Z_p \mid (X_p, \delta_p)\}$, is assumed to follow the Cox model. In the second stage, the model for $\Pr\{(X_r, \delta_r) \mid (X_p, \delta_p), Z\}$ is constructed, where $Z = (Z_p, Z_r)$. A crucial reproducible assumption that simplifies the analysis is

$$\Pr\{(X_*, \delta_*) \mid Z_p, Z_r\} = \Pr\{(X_*, \delta_*) \mid Z_*\} \quad (* = p, r). \tag{5.8}$$

This assumption holds only when the covariates $Z$ within a family do not depend on any unmeasured variables responsible for the correlation of age-onset. One can write

   

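$$\Pr\{(X_r, \delta_r), Z_r, Z_p \mid (X_p, \delta_p)\} = \Pr\{(X_r, \delta_r), Z_r \mid (X_p, \delta_p), Z_p\}\,\Pr\{Z_p \mid (X_p, \delta_p)\}.$$

Note that

$$\Pr\{(X_r, \delta_r), Z_r \mid (X_p, \delta_p), Z_p\} = \Pr\{(X_r, \delta_r) \mid (X_p, \delta_p), Z_p, Z_r\}\,\Pr\{Z_r \mid (X_p, \delta_p), Z_p\}.$$

By (5.8), given $Z_p$, the pair $(X_p, \delta_p)$ and $Z_r$ are independent, so

$$\Pr\{Z_r \mid (X_p, \delta_p), Z_p\} = \Pr(Z_r \mid Z_p).$$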

Therefore, the likelihood function for familial case-control data can be formulated from the following decomposition:

$$\Pr\{Z_p \mid (X_p, \delta_p)\}\;\Pr\{(X_r, \delta_r) \mid (X_p, \delta_p), Z\}\;\Pr(Z_r \mid Z_p). \tag{5.9}$$

The first component of (5.9) can be analyzed from the probands' data in the first stage. Recall that under the Cox model assumption discussed in Section 5.1, the form of $\Pr\{Z_p \mid (X_p, \delta_p)\}$ can be specified, and equation (5.7) can be applied to estimate $\beta$. The third component $\Pr(Z_r \mid Z_p)$ in (5.9) can be treated as a constant, since it does not involve the parameters of interest.

To specify the form of $\Pr\{(X_r, \delta_r) \mid (X_p, \delta_p), Z\}$ in (5.9), Li et al. (1998) adopted the Clayton model (Clayton, 1978), a popular model for bivariate failure-time data. Recall that $T_p$ and $T_r$ represent the failure times of a proband and his/her relative, respectively. When $(T_p, T_r)$ follows the Clayton model, the joint survival function (5.10) is expressed in terms of the marginal survival functions $S_p(x_p)$ and $S_r(x_r)$ and an association parameter $\theta$, which describes the hazard ratio for an individual if his/her relative has the disease rather than being disease free at a given age.

Accordingly, we obtain the form of $\Pr\{(X_r, \delta_r) \mid (X_p, \delta_p), Z\}$, which involves $D$, the number of individuals with disease among the proband and the relative. The retrospective likelihood function at time $t_i$ is then built from the components of (5.9).

Chapter 6 Simulations for Analysis of Age-onset Data from Case-control

6.1 Data Generation for Individual Data

6.1.1 Prospective data of the true population

First, we generate the population data based on the Cox PH model (5.1) and the aforementioned assumptions. Set the values of $\beta$ and $p$. The algorithm is summarized below:

Step 1: Generate $Z_i \sim \mathrm{Bernoulli}(p)$;
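The data-generation idea can be illustrated with a minimal sketch. Only Step 1 and the Cox PH form of the hazard come from the text; the constant baseline hazard `base_rate`, the independent exponential censoring with rate `censor_rate`, the function name `generate_population`, and the parameter values are assumptions made purely for illustration.

```python
import math
import random


def generate_population(n, beta, p, base_rate=1.0, censor_rate=0.5, seed=0):
    """Simulate (X, delta, Z) for n individuals under a Cox PH model.

    Assumptions (not from the text): constant baseline hazard `base_rate`
    and independent exponential censoring with rate `censor_rate`.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = 1 if rng.random() < p else 0           # Step 1: Z_i ~ Bernoulli(p)
        hazard = base_rate * math.exp(beta * z)    # Cox PH hazard with constant baseline
        t = rng.expovariate(hazard)                # latent failure time
        c = rng.expovariate(censor_rate)           # latent censoring time
        x = min(t, c)                              # observed time
        delta = 1 if t <= c else 0                 # 1 = onset observed, 0 = censored
        data.append((x, delta, z))
    return data


# Illustrative values only.
population = generate_population(n=10000, beta=0.5, p=0.3)
```

Each record is a triple `(x, delta, z)`, so cases and potential controls can be separated afterwards by the value of `delta`.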

6.1.2 Case-control data from the true population

Once a control individual is collected to match a case, he/she will not be picked up again to match another case individual.

6.2 Analysis on Individual Data

We now examine whether the case-control sampling procedure is reliable. We analyze the case-control data by computing the MLE of $\beta$ and then check whether $\hat\beta$ is close to the true value of $\beta$. The results are summarized in Table 6.1. We find that the closer $Q$ is to $\exp(\beta^T Z)$, the closer $\hat\beta$ is to the true value.
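To make the estimation step concrete, the following minimal sketch computes a matched-design estimate of $\beta$ in the simplest possible setting. It assumes 1:1 matching, a single binary covariate, and the standard matched-pair conditional likelihood, for which the maximizer has the closed form $\hat\beta = \log(n_{10}/n_{01})$ with $n_{10}$ and $n_{01}$ the two kinds of discordant pairs. This is not the estimating equation (5.7) of the thesis, the function name `matched_pair_mle` is hypothetical, and the pair counts are toy values rather than simulation results.

```python
import math


def matched_pair_mle(pairs):
    """Conditional MLE of beta from 1:1 matched pairs (z_case, z_control).

    Only discordant pairs carry information: each pair contributes
    exp(beta * z_case) / (exp(beta * z_case) + exp(beta * z_control)).
    """
    n10 = sum(1 for z_case, z_ctrl in pairs if z_case == 1 and z_ctrl == 0)
    n01 = sum(1 for z_case, z_ctrl in pairs if z_case == 0 and z_ctrl == 1)
    if n10 == 0 or n01 == 0:
        raise ValueError("the MLE is not finite without both kinds of discordant pairs")
    beta_hat = math.log(n10 / n01)
    se = math.sqrt(1.0 / n10 + 1.0 / n01)  # large-sample standard error
    return beta_hat, se


# Toy counts for illustration only.
pairs = [(1, 0)] * 30 + [(0, 1)] * 20 + [(1, 1)] * 25 + [(0, 0)] * 25
beta_hat, se = matched_pair_mle(pairs)
```

Repeating such an estimate over many simulated data sets and comparing the average of `beta_hat` with the true $\beta$ is one way to carry out the check described above.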

Table 6.1: Analysis of age-onset data

6.3 Data Generation for Familial Data

6.3.1 Familial prospective data of the true population

We now generate familial data following the Clayton model of the form in (5.10).

Step 6: Generate $(T_{pi}, T_{ri})$.
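One common way to draw a correlated pair of failure times is conditional inversion of the Clayton copula, sketched below. The parametrization $S(x_p, x_r) = \{S_p(x_p)^{-\theta} + S_r(x_r)^{-\theta} - 1\}^{-1/\theta}$ with $\theta > 0$ is an assumption and may differ from the form of (5.10); the exponential margins with hazard `base_rate * exp(beta * z)` and the function names are likewise illustrative assumptions.

```python
import math
import random


def clayton_uniform_pair(theta, rng):
    """Draw (U, V) from the Clayton copula by conditional inversion (theta > 0)."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    w = 1.0 - rng.random()  # uniform on (0, 1]
    # Solve dC(u, v)/du = w for v.
    v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u, v


def generate_family_times(z_p, z_r, beta, theta, rng, base_rate=1.0):
    """Failure times (T_p, T_r) with exponential margins and Clayton dependence.

    U and V play the role of the marginal survival probabilities, so each
    time is obtained by inverting S(t) = exp(-rate * t).
    """
    u, v = clayton_uniform_pair(theta, rng)
    t_p = -math.log(u) / (base_rate * math.exp(beta * z_p))
    t_r = -math.log(v) / (base_rate * math.exp(beta * z_r))
    return t_p, t_r


rng = random.Random(0)
t_p, t_r = generate_family_times(z_p=1, z_r=0, beta=0.5, theta=2.0, rng=rng)
```

Under this parametrization Kendall's tau equals $\theta/(\theta + 2)$, which gives a simple sanity check on simulated pairs.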

6.3.2 Case-control data from the true population

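Suppose that we sample $n$ of the $N$ families in the population, with $n_1 = n/2$ families taken from the case families, that is, families whose probands have $(X_{pi}, \delta_{pi} = 1)$. We then match $n_0 = n_1 = n/2$ control families, whose probands have $(X_{pi}, \delta_{pi} = 0)$, to the case families.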

The procedure is stated as follows.

Step 1: Randomly select $n_1$ probands from the case families and record their values of $Z_{pi}$ and the data on their relatives, $(Z_{ri}, X_{ri}, \delta_{ri})$;

Step 2: Match each case proband with a control proband from the control group such that $|X_1 - X_0| \le 0.1$, where $X_1$ and $X_0$ are the observed times of the case proband and the control proband, respectively. Also record the control probands' values of $Z_{pi}$ and the data on their relatives, $(Z_{ri}, X_{ri}, \delta_{ri})$.
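The matching in Step 2 can be sketched as a simple caliper match on the probands' observed times. The caliper of 0.1 comes from the text; picking at random among eligible controls, using each control at most once (carried over from the individual-data sampling of Section 6.1.2), and the function name `match_controls` are assumptions for illustration.

```python
import random


def match_controls(case_times, control_times, caliper=0.1, seed=0):
    """Pair each case proband with an unused control proband within `caliper`.

    Returns a list of (case_index, control_index). Cases without an
    eligible control are left unmatched.
    """
    rng = random.Random(seed)
    available = set(range(len(control_times)))
    pairs = []
    for i, x1 in enumerate(case_times):
        eligible = [j for j in sorted(available)
                    if abs(x1 - control_times[j]) <= caliper]
        if eligible:
            j = rng.choice(eligible)   # random pick among eligible controls
            available.remove(j)        # each control is used at most once
            pairs.append((i, j))
    return pairs


# Toy observed times for illustration only.
case_times = [0.50, 1.20, 2.00]
control_times = [0.45, 0.58, 1.25, 3.00]
pairs = match_controls(case_times, control_times)
```

In the toy data the third case has no control within the caliper, so only two matched pairs are returned; the relatives' data would then be looked up through the matched proband indices.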

6.4 Analysis on Familial Data

We analyze the simulated familial case-control data by calculating the MLE of the parameters $\beta$ and $\theta$ based on the likelihood function (5.13) and then check whether $\hat\beta$ and $\hat\theta$ are close to their true values. We also check whether the probands' data satisfy equation (5.4). These results are summarized in Table 6.2. We observe that when $Q$ is close to $\exp(\beta^T Z)$, the estimate $\hat\beta$ is also close to the true value, which means that the conditional likelihood function can represent the first part of our decomposition of the likelihood function (5.13).

Table 6.2: Analysis of familial age-onset data based on case-control studies

$N = 10000$, $n_1 = n_0 = 100$, Replications $= 100$, $\beta = 0.5$, $\theta = 0.5$

$p$    Distance of $Q$ from $\exp(\beta^T Z)$    Bias of $\hat\beta$ ($\times 10^3$)    SE of $\hat\beta$    Bias of $\hat\theta$ ($\times 10^3$)    SE of $\hat\theta$
0.3    0.076737    23.221662    0.008890    1.859457    0.004787
0.5    0.023449    10.732187    0.009518    5.252649    0.004348
0.7    0.018496    6.228642    0.009259    12.534723    0.004334

Chapter 7 Concluding Remarks

In this thesis, we review the literature on two regression models, the logistic regression model and the Cox proportional hazards model, under the prospective design and the case-control design, for analyzing familial disease incidence data and familial age-onset data, respectively. For inference, it has been shown that data from a case-control design can be analyzed as if they came from a prospective design, provided that some crucial properties such as (3.7), (5.4) and the reproducible properties hold. We perform simulations to examine these properties and to check the behavior of the parameter estimates. The simulations show how the quality of the generated data affects the subsequent inference results.

The variables of interest include disease incidence and age-onset data. For age-onset data, we treat an individual without the disease as being censored. However, such individuals may be truly censored or may be non-susceptible. In that situation, a cure model can be adopted. This topic deserves future investigation.

References

[1] Bahadur, R. R. (1961). A Representation of the Joint Distribution of Responses to n Dichotomous Outcomes. In Studies in Item Analysis and Prediction, Stanford Mathematical Studies in the Social Sciences IV, Ed. H. Solomon, pp. 158-168. Stanford University Press.

[2] Breslow, N. E. and Day, N. E. (1980). Statistical Methods in Cancer Research, Vol. 1. Lyon: IARC Scientific Publication No. 32.

[3] Clayton, D. G. (1978). A Model for Association in Bivariate Life Tables and Its Application in Epidemiological Studies of Familial Tendency in Chronic Disease Incidence. Biometrika 65, 141-151.

[4] Li, H., Yang, P. and Schwartz, A. G. (1998). Analysis of Onset Data from Case-Control Family Studies. Biometrics 54, 1030-1039.

[5] Liang, K.-Y. (1987). Extended Mantel-Haenszel Estimating Procedure for Multivariate Logistic Regression Models. Biometrics 43, 289-299.

[6] Liang, K.-Y. and Beaty, T. H. (2000). Statistical Designs for Familial Aggregation. Statistical Methods in Medical Research 9, 543-562.

[7] Prentice, R. L. and Breslow, N. E. (1978). Retrospective Studies and Failure Time Models. Biometrika 65, 153-158.

[8] Prentice, R. L. and Cai, J. (1992). Covariance and Survivor Function Estimation Using Censored Multivariate Failure Time Data. Biometrika 79, 495-512.

[9] Prentice, R. L. and Pyke, R. (1979). Logistic Disease Incidence Models and Case-Control Studies. Biometrika 66, 403-411.

[10] Shih, J. H. and Chatterjee, N. (2002). Analysis of Survival Data from Case-Control Family Studies. Biometrics 58, 502-509.

[11] Sturmer, T. and Brenner, H. (2000). Potential Gain in Efficiency and Power to Detect Gene-Environment Interactions by Matching in Case-Control Studies. Genetic Epidemiology 18, 63-80.

[12] Whittemore, A. S. (1995). Logistic Regression of Family Data from Case-Control Studies. Biometrika 82, 55-67.
