5 Regression Analysis Based on Familial Data
5.2 Likelihood Analysis Based on Familial Data
Table 5.2 summarizes observed case-control familial data in which the probands' times (onset or censored) are matched. Specifically, at time $t_i$, we sample $m_{t_i}$ case probands and their relatives, and match them with $n_{t_i}$ control probands and their relatives. Denote $(X_p, X_r)$ as the observed times and $(\delta_p, \delta_r)$ as the corresponding indicators for a proband and his/her relative respectively.
Table 5.2 Age-matched case-control familial data
To simplify the presentation, assume that each family in the age-onset data has two members (one proband and one relative). The model assumption consists of two stages. In the first stage, the model on the proband, namely $\Pr(Z_p \mid X_p, \delta_p)$, is assumed to follow the Cox model. In the second stage, the model on $\Pr(X_r, \delta_r \mid X_p, \delta_p, Z)$ is constructed, where $Z = (Z_p, Z_r)$. It should be mentioned that a crucial reproducible assumption that simplifies the analysis is given by
$$\Pr\{(X_*, \delta_*) \mid Z_p, Z_r\} = \Pr\{(X_*, \delta_*) \mid Z_*\} \quad (* = p, r). \tag{5.8}$$
This assumption only holds when the covariates $Z$ within a family do not depend on any unmeasured variables responsible for the correlation of age-onset. One can write
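$$\Pr(X_r, \delta_r, Z_r, Z_p \mid X_p, \delta_p) = \Pr(X_r, \delta_r, Z_r \mid X_p, \delta_p, Z_p)\,\Pr(Z_p \mid X_p, \delta_p).$$
Note that
$$\Pr(X_r, \delta_r, Z_r \mid X_p, \delta_p, Z_p) = \Pr(X_r, \delta_r \mid X_p, \delta_p, Z_p, Z_r)\,\Pr(Z_r \mid X_p, \delta_p, Z_p).$$
By (5.8), we know that, given $Z_p$, $(X_p, \delta_p)$ and $Z_r$ are independent. So, we have
$$\Pr(Z_r \mid X_p, \delta_p, Z_p) = \Pr(Z_r \mid Z_p).$$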
Therefore, the likelihood function for familial case-control data can be formulated based on the following decomposition:
$$\Pr(Z_p \mid X_p, \delta_p)\,\Pr(X_r, \delta_r \mid X_p, \delta_p, Z)\,\Pr(Z_r \mid Z_p). \tag{5.9}$$
The first component of (5.9) can be analyzed based on the probands' data, which can be performed in the first stage. Recall that under the Cox model assumption discussed in Section 5.1, the form of $\Pr(Z_p \mid X_p, \delta_p)$ can be specified and equation (5.7) can be applied to estimate $\beta$. The third component $\Pr(Z_r \mid Z_p)$ in (5.9) can be treated as a constant.

To specify the form of $\Pr(X_r, \delta_r \mid X_p, \delta_p, Z)$ in (5.9), Li et al. (1998) adopted the Clayton model (Clayton, 1978), which is one of the most popular assumptions for bivariate failure-time data. Recall that $T_p$ and $T_r$ represent the failure times of a proband and his/her relative respectively. When $(T_p, T_r)$ follows the Clayton model, the joint survival function can be written as
individual if his/her relative has the disease rather than being disease free at a given age. Note that [expression in $S_p(x_p)$ and $S_r(x_r)$]. Accordingly we obtain
where D is the number of individuals with disease among the proband and the relative. Let
retrospective likelihood function at time $t_i$ is given as:
Chapter 6 Simulations for Analysis of Age-onset Data from Case-control
6.1 Data Generation for Individual Data
6.1.1 Prospective data of the true population
First of all, we generate the population data based on the Cox PH model (5.1) and the aforementioned assumptions. Set the values of $\beta$ and $p$. The algorithm is summarized below:
Step 1: Generate $Z_i \sim \mathrm{Bernoulli}(p)$;
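As a rough illustration of this kind of data generation, the Python sketch below draws the binary covariate of Step 1 and then a failure time under a Cox PH model. Everything beyond Step 1 is an assumption made for illustration: the constant baseline hazard (`baseline_rate`), the independent uniform censoring on `(0, censor_max)`, and the function name `generate_population` are hypothetical and need not match the algorithm actually used.

```python
import numpy as np

def generate_population(N, beta, p, baseline_rate=1.0, censor_max=2.0, seed=0):
    """Simulate (Z, X, delta) for N individuals under a Cox PH model.

    Assumptions (not taken from the text): constant baseline hazard
    `baseline_rate` and independent Uniform(0, censor_max) censoring.
    """
    rng = np.random.default_rng(seed)
    z = rng.binomial(1, p, size=N)            # Step 1: Z_i ~ Bernoulli(p)
    u = 1.0 - rng.uniform(size=N)             # uniform on (0, 1], avoids log(0)
    # With hazard baseline_rate * exp(beta * z), S(t | z) = exp(-hazard * t);
    # solving S(T | z) = u gives the failure time T.
    t = -np.log(u) / (baseline_rate * np.exp(beta * z))
    c = rng.uniform(0.0, censor_max, size=N)  # assumed censoring times
    x = np.minimum(t, c)                      # observed time
    delta = (t <= c).astype(int)              # 1 = onset observed, 0 = censored
    return z, x, delta

z, x, delta = generate_population(N=10000, beta=0.5, p=0.3)
```

The inverse-transform step is exact only for the assumed exponential baseline; a different baseline hazard would need its own inverse cumulative hazard.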
6.1.2 Case-control data from the true population
Once a control individual is collected to match a case, he/she will not be picked again to match another case individual.
6.2 Analysis on Individual Data
Now we want to know whether the case-control sampling procedure produces reliable
Then we analyze the case-control data by solving for the MLE of $\beta$ and then check whether $\hat\beta$ is close to the true value of $\beta$. The results are summarized in Table 6.1. We observe that when $Q$ is close to $\exp(\beta^{T}Z)$, the estimate $\hat\beta$ is also closer to the true value.
Table 6.1: Analysis of age-onset data
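To make the estimation step concrete, here is a minimal Python sketch of the MLE for 1:1 time-matched case-control pairs with a scalar covariate. It assumes the conditional likelihood has the standard matched-set form of Prentice and Breslow (1978), in which each pair contributes $\exp(\beta z_{\text{case}}) / \{\exp(\beta z_{\text{case}}) + \exp(\beta z_{\text{control}})\}$; whether this coincides exactly with equation (5.7) is an assumption, and the function name `matched_pair_mle` is hypothetical.

```python
import numpy as np

def matched_pair_mle(z_case, z_ctrl, n_iter=50, tol=1e-10):
    """Newton-Raphson MLE of a scalar beta for 1:1 matched pairs.

    Maximizes sum_i log[ e^{beta*z_case_i} / (e^{beta*z_case_i} + e^{beta*z_ctrl_i}) ].
    """
    d = np.asarray(z_case, dtype=float) - np.asarray(z_ctrl, dtype=float)
    beta = 0.0
    for _ in range(n_iter):
        s = 1.0 / (1.0 + np.exp(-beta * d))   # per-pair conditional probability
        score = np.sum(d * (1.0 - s))          # first derivative of the log-likelihood
        info = np.sum(d ** 2 * s * (1.0 - s))  # minus the second derivative
        if info == 0.0:
            raise ValueError("no discordant pairs: beta is not identifiable")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Toy data: three pairs with only the case exposed, one with only the control exposed.
beta_hat = matched_pair_mle([1, 1, 1, 0, 1, 0], [0, 0, 0, 1, 1, 0])
```

For a binary covariate the maximizer has the closed form $\log(n_{10}/n_{01})$, where $n_{10}$ and $n_{01}$ count the discordant pairs in each direction, so the toy data above should give a value close to $\log 3$. If all discordant pairs point the same way, the likelihood has no finite maximizer and the iteration simply stops after `n_iter` steps.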
6.3 Data Generation for Familial Data
6.3.1 Familial prospective data of the true population
We now generate familial data following the Clayton model of the form in (5.10). We
Step 6: Generate $(T_{pi}, T_{ri})$
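One common way to draw a pair of failure times with Clayton dependence is conditional inversion of the Clayton copula, sketched below in Python. This is an illustration under stated assumptions, not necessarily the procedure used here: it uses the copula parameterization $S(t_p, t_r) = \{S_p(t_p)^{-\theta} + S_r(t_r)^{-\theta} - 1\}^{-1/\theta}$ with $\theta > 0$, which may differ from the parameterization in (5.10), and it assumes Cox PH margins with a constant baseline hazard. The name `clayton_pair_times` is hypothetical.

```python
import numpy as np

def clayton_pair_times(z_p, z_r, beta, theta, baseline_rate=1.0, seed=0):
    """Draw (T_p, T_r) with joint survival
    S(t_p, t_r) = (S_p^-theta + S_r^-theta - 1)^(-1/theta), theta > 0,
    and assumed margins S(t | z) = exp(-baseline_rate * exp(beta*z) * t).
    """
    rng = np.random.default_rng(seed)
    z_p = np.asarray(z_p, dtype=float)
    z_r = np.asarray(z_r, dtype=float)
    n = z_p.shape[0]
    u = 1.0 - rng.uniform(size=n)   # value of S_p(T_p), uniform on (0, 1]
    w = 1.0 - rng.uniform(size=n)   # conditional probability level given u
    # Solve dC(u, v)/du = w for v, where C is the Clayton copula.
    v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0) ** (-1.0 / theta)
    # Invert the assumed exponential margins to get the failure times.
    t_p = -np.log(u) / (baseline_rate * np.exp(beta * z_p))
    t_r = -np.log(v) / (baseline_rate * np.exp(beta * z_r))
    return t_p, t_r

zeros = np.zeros(1500)
t_p, t_r = clayton_pair_times(zeros, zeros, beta=0.5, theta=2.0)
```

Under this parameterization Kendall's tau between the two failure times equals $\theta/(\theta + 2)$, which gives a quick sanity check on simulated pairs.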
6.3.2 Case-control data from the true population
Suppose that we generate $n$ families from the population of $N$ families, with $n_1 = n/2$ case families, that is, families with $(X_{pi}, \delta_{pi} = 1)$. Then we match $n_0 = n_1 = n/2$ control families, with $(X_{pi}, \delta_{pi} = 0)$, to the case families. The procedure is stated as follows.
Step 1: Randomly select $n_1$ probands from the case families and record their values of $Z_{pi}$ and the data on his/her relative: $Z_{ri}$, $(X_{ri}, \delta_{ri})$;
Step 2: Each case proband is matched with a control proband from the control group with $|X_1 - X_0| < 0.1$, where $X_1$ and $X_0$ are the observed times for a case proband and a control proband respectively. Also record the control proband's value of $Z_{pi}$ and the data on his/her relative: $Z_{ri}$, $(X_{ri}, \delta_{ri})$.
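The matching in Steps 1 and 2 can be sketched in Python as follows. The caliper of 0.1 on the probands' observed times comes from Step 2; using each control at most once is an assumption carried over from Section 6.1.2, and leaving a case unmatched when no control is eligible is a further assumption. The function name `match_controls` is hypothetical.

```python
import numpy as np

def match_controls(x_case, x_ctrl, caliper=0.1, seed=0):
    """Return (case_index, control_index) pairs with |X_1 - X_0| < caliper.

    Each control is used at most once (assumption); cases with no
    eligible control are skipped (assumption).
    """
    rng = np.random.default_rng(seed)
    x_case = np.asarray(x_case, dtype=float)
    x_ctrl = np.asarray(x_ctrl, dtype=float)
    available = np.ones(x_ctrl.shape[0], dtype=bool)
    pairs = []
    for i, x1 in enumerate(x_case):
        # Controls still unused whose observed time is within the caliper
        eligible = np.flatnonzero(available & (np.abs(x_ctrl - x1) < caliper))
        if eligible.size == 0:
            continue
        j = int(rng.choice(eligible))   # pick one eligible control at random
        available[j] = False            # never reuse this control
        pairs.append((i, j))
    return pairs

pairs = match_controls([1.0, 1.05, 3.0], [1.02, 0.98, 5.0])
```

Once the index pairs are available, the covariates and the relatives' data of each matched family can be looked up by index.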
6.4 Analysis on Familial Data
We analyze the simulated familial case-control data by calculating the MLE of the parameters $\beta$ and $\theta$ based on the likelihood function (5.13), and then check whether $\hat\beta$ and $\hat\theta$ are close to their true values. We also check whether the probands' data satisfy equation (5.4). These results are summarized in Table 6.2. We observe that when $Q$ is close to $\exp(\beta^{T}Z)$, the estimate $\hat\beta$ is also close to the true value, which means that the conditional likelihood function can represent the first part of our decomposition of the likelihood function (5.13).
Table 6.2: Analysis of familial age-onset data based on case-control studies
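$N = 10000$, $n_1 = n_0 = 100$, Replications $= 100$, $\beta = 0.5$, $\theta = 0.5$

$p$    $\exp(\beta^{T}Z)$, $Q$    $\hat\beta$ ($\times 10^{3}$)    SE of $\hat\beta$    $\hat\theta$ ($\times 10^{3}$)    SE of $\hat\theta$
0.3    0.076737                   23.221662                        0.008890             1.859457                          0.004787
0.5    0.023449                   10.732187                        0.009518             5.252649                          0.004348
0.7    0.018496                   6.228642                         0.009259             12.534723                         0.004334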
Chapter 7 Concluding Remarks
In this thesis, we review the literature on two regression models, the logistic regression model and the Cox proportional hazards model, under the prospective design and the case-control design, and use them to analyze familial disease incidence data and familial age-onset data respectively. For inference, it has been shown that data from a case-control design can be analyzed as if they came from a prospective design, provided that some crucial properties such as (3.7), (5.4) and the reproducible properties hold. We perform simulations to examine these properties and to check the behavior of the parameter estimates. This allows us to see how the quality of the generated data affects the subsequent inference results.
The variables that we are interested in include disease incidence and age-onset data. For age-onset data, we treat an individual without the disease as being censored. However, such individuals may be truly censored or may be non-susceptible. In the latter situation, a cure model can be adopted. This topic deserves future investigation.
References
[1] Bahadur, R. R. (1961). A Representation of the Joint Distribution of Responses to n Dichotomous Outcomes. In Studies in Item Analysis and Prediction, Stanford Mathematical Studies in the Social Sciences IV, Ed. H. Solomon, pp.158-168. Stanford University Press.
[2] Breslow, N. E. and Day, N. E. (1980). Statistical Methods in Cancer Research, Vol. 1. Lyon: IARC Scientific Publication No. 32.
[3] Clayton, D. G. (1978). A Model for Association in Bivariate Life Tables and Its Application in Epidemiological Studies of Familial Tendency in Chronic Disease Incidence. Biometrika 65, 141-151.
[4] Li, H., Yang, P. and Schwartz, A. G. (1998). Analysis of Onset Data from Case-Control Family Studies. Biometrics 54, 1030-1039.
[5] Liang, K.-Y. (1987). Extended Mantel-Haenszel Estimating Procedure for Multivariate Logistic Regression Models. Biometrics 43, 289-299.
[6] Liang, K.-Y. and Beaty, T. H. (2000). Statistical Designs for Familial Aggregation. Statistical Methods in Medical Research 9, 543-562.
[7] Prentice, R. L. and Breslow, N. E. (1978). Retrospective Studies and Failure Time Models. Biometrika 65, 153-158.
[8] Prentice, R. L. and Cai, J. (1992). Covariance and Survivor Function Estimation Using Censored Multivariate Failure Time Data. Biometrika 79, 495-512.
[9] Prentice, R. L. and Pyke, R. (1979). Logistic Disease Incidence Models and Case-Control Studies. Biometrika 66, 403-411.
[10] Shih, J. H. and Chatterjee, N. (2002). Analysis of Survival Data from Case-Control Family Studies. Biometrics 58, 502-509.
[11] Stürmer, T. and Brenner, H. (2000). Potential Gain in Efficiency and Power to Detect Gene-Environment Interactions by Matching in Case-Control Studies. Genetic Epidemiology 18, 63-80.
[12] Whittemore, A. S. (1995). Logistic Regression of Family Data from Case-Control Studies. Biometrika 82, 55-67.