• 沒有找到結果。

The Proposed Inference Procedure for the Association Model

Chapter 4. Regression Analysis for Association Based on Three Types of Data

4.4. The Proposed Inference Procedure for the Association Model

We aim to develop a regression model that describes the effect of covariate on the dependence structure. We also want to propose a unified inference approach which can handle the three data structures discussed in Section 2.2.

To achieve the second objective, we need to find an appropriate dependence measure for each data structure. In Section 4.4.1, a flexible way of model formulation is presented.

In Section 4.4.2, we describe the proposed regression model and in Section 4.4.3, we include external censoring in the three data structures. The proposed inference method is discussed in Section 4.4.4. In Section 4.4.5, we present a model checking method for Clayton assumption. We also present the numerical analysis for the proposed inference methods in Section 4.4.6.

4.4.1. Model Formulation

Most methods developed for typical bivariate survival data analyze the joint survival function

Fa(s, t) = Pr(T1 > s, T2 > t),

which seems to be a straightforward extension from the univariate analysis. Mathemat-ically the joint behavior between T1 and T2 can also be described other functions such as

Fb(s, t) = Pr(T1 ≤ s, T2 > t)

Fc(s, t) = Pr(T1 ≤ s, T2 ≤ t)

Fd(s, t) = Pr(T1 > s, T2 ≤ t).

Choosing an appropriate function for further analysis depends not only on its interpre-tation but also the mathematical applicability. Note that θ(s, t) in (5) is defined based on the joint survival function Fa(s, t). Now we denote θ(s, t) = θa(s, t). Oakes (1989) derived another expression of θa(s, t) which is useful for further extensions and statistical inference. Specifically one can write

θa(s, t) = Pr(∆ij = 1| ˜T1,ij = s, ˜T2,ij = t) Pr(∆ij = 0| ˜T1,ij = s, ˜T2,ij = t),

where ∆ij = I[(T1i− T1j)(T2i− T2j) > 0] and ˜Tk,ij = Tki∧ Tkj (k = 1, 2). Based on this representation, θa(s, t) can be viewed as the odds ratio of concordance given the corner value ( ˜T1,ij, ˜T2,ij) = (s, t).

We can extend the ideas of θa(s, t) as follows. The two pairs, (T1i, T2i) and (T1j, T2j), can form four grid points (a, b, c, d) shown in Figure 4-1 with a = ( ˜T1,ij, ˜T2,ij), b = ( ˘T1,ij, ˜T2,ij), c = ( ˘T1,ij, ˘T2,ij), and d = ( ˜T1,ij, ˘T2,ij), where ˜Tk,ij = Tki∧Tkj and ˘Tk,ij = Tki Tkj (k = 1, 2). Selecting a different corner point gives a new odds ratio of concordance.

Define

θ(s, t) = Pr(∆ij = 1|corner = ∗)

Pr(∆ij = 0|corner = ∗) (29)

= F(s, t)DsDtF(s, t)

{DsF(s, t)}{DtF(s, t)} if ∗ = a, c

= {DsF(s, t)}{DtF(s, t)}

F(s, t)DsDtF(s, t) if ∗ = b, d. (30) When T1 and T2 are independent, θ(s, t) = 1. In general θ(s, t) describes local de-pendence at (s, t) such that θ(s, t) > 1 indicates positive association and θ(s, t) < 1 indicates negative association. From the above derivations, we see that imposing a struc-ture on a version of θ(s, t) is associated with model specification on the corresponding joint function F(s, t).

T1

T2

T1

T2

a b

d c

a b

d c

(a) (b)

Figure 4-1: Four grid points formed by two pairs of observation.

(a): (i, j) pairs are concordant; (b): (i, j) pairs are discordant.

The two representations of θ(s, t) in (29) and (30) provide useful insight for further statistical inference. Specifically the first identity is equivalent to

Pr(∆ij = 1|corner ∗ = (s, t)) = θ(s, t)

1 + θ(s, t) (∗ = a, b, c, d).

This implies that the inference of θ(s, t) can be made by applying the method of moment based on data replications of ∆ij. This approach has been adopted by Fine (2001) based on θa(s, t) for semi-competing risks data and by Chaieb et al. (2006) based on θb(s, t) for truncation data. The second identity of θ(s, t) suggests that one can construct the log-rank type of statistics based on a series of two-by-two tables. Figure 4-2 shows four versions of the table construction. This approach has been taken by Day et al. (1997) and Wang (2003) for analyzing semi-competing risks data based on the table in Figure 4-2(a) and by Emura, Wang and Hung (2006) based on Figure 4-2(b) for analyzing truncation data.

To choose an appropriate version of θ(s, t), one should examine whether the biolog-ical meaning is reasonable as well as whether the data provide enough information for the purpose of inference. As mentioned earlier, most existing methods specify the model assumption based on Fa(s, t) which has a direct relationship with θa(s, t). For the first

(c)

Figure 4-2: Four versions of table construction.

data structure, it seems that all four versions are mathematically suitable and corner a is usually adopted due to its straightforward interpretation. For semi-competing risks data, θb(s, t) is an obvious suitable candidate. However θa(s, t) is still appropriate since, from the second identity, the information provided by (T1∧ T2, T2, I(T1 ≤ T2)) is enough for recovering every component of θa(s, t) for (s, t) ∈ R2. However for truncation data, only θb(s, t) is suitable. Note that for modeling dependent truncation data, Chaieb et al (2006) proposes a related measure

θ(s, t) =ˇ Fb(s, t)DsDtFb(s, t)

{DsFb(s, t)}{DtFb(s, t)} = 1/θb(s, t).

Setting θb(s, t) = α is equivalent to assuming the Clayton copula on Fb(s, t):

Pr(T1 ≤ s, T2 > t) = {Pr(T1 ≤ s)1−α+ Pr(T2 > t)1−α− 1}1/(1−α).

4.4.2. The Proposed Regression Model

Consider the situation that the levels of association between (T1, T2) vary in different covariate groups. Let Z = (1, Z1, ..., Zp)T be a vector of common discrete covariates for (T1, T2). We assume that for a chosen corner point ∗ (∗ = a, b, c, d),

θZ(s, t) = Pr(∆ij = 1|corner ∗ = (s, t), Zi = Zj = Z) Pr(∆ij = 0|corner ∗ = (s, t), Zi = Zj = Z)

= exp(ZTβ), (31)

where β = (β0, β1, ..., βp)T. Equivalently the above model assumes that Pr(∆ij = 1|corner ∗ = (s, t), Zi = Zj = Z) = exp(ZTβ)

1 + exp(ZTβ) ≡ η(ZTβ). (32) Note that β0 is the log odds of concordance for the baseline group with Z1 = Z2 = ... = Zp = 0. The slope parameter βk (k = 1, 2, ..., p) can be viewed as the difference of log odds by increasing one unit of Zk with the rest of Zi0s being fixed. The above model assumption is equivalent to assuming that (T1, T2)|Z follows Clayton’s model with

F(s, t|Z) = {F1,∗(s|Z)1−exp(ZTβ)+ F2,∗(t|Z)1−exp(ZTβ)− 1}1/(1−exp(ZTβ)), (33) where F1,∗(s|Z) = Pr(T1 > s|Z) for ∗ = a, d; F1,∗(s|Z) = Pr(T1 ≤ s|Z) for ∗ = b, c, F2,∗(t|Z) = Pr(T2 > t|Z) for ∗ = a, b; F2,∗(t|Z) = Pr(T2 ≤ t|Z) for ∗ = c, d. The main purpose is to estimate β. From previous discussions, we consider the model θaZ(s, t) = exp(ZTβ) for typical bivariate failure-time data and semi-competing risks data. For the truncation data, we consider the model θZb(s, t) = exp(ZTβ).

4.4.3. Three Data Structures with External Censoring

Now we incorporate external censoring. To simplify the presentation, we may use the same notation with different definitions under different data structures. We also discuss the condition under which the value of ∆ij is certain for pair (i, j).

Data Structure 1: Typical bivariate data subject to right censoring

Assume that (T1, T2) is subject to independent censoring by (C1, C2) such that observed variables become X = T1∧C1, Y = T2∧C2, δ1 = I(T1 ≤ C1), and δ2 = I(T2 ≤ C2). Based on observed variables (Xi, δ1i) and (Xj, δ1j), to know the order of T1iand T1j, the smaller variable has to be uncensored. Similar properties can be applied for determination of the order of T2i and T2j based on (Yi, δ2i) and (Yj, δ2j). Formally define ˜Tk,ij = Tki∧ Tkj and ˜Ck,ij = Cki ∧ Ckj (k = 1, 2). As long as ˜T1,ij < ˜C1,ij and ˜T2,ij < ˜C2,ij, the value of

ij is known for sure which means that the (i, j) pair is orderable on the plane.

Data Structure 2: Semi-competing risks data subject to censoring

If often happens that (T1, T2) are subject to a common external censoring variable C.

Observed variables are denoted as X = T1∧ T2 ∧ C, Y = T2∧ C, δ1 = I(T1 ≤ T2∧ C), and δ2 = I(T2 ≤ C). Applying previous arguments, the order of T1i and T1j can be known as long as ˜T1,ij < ˜T2,ij and ˜T1,ij < ˜Cij, where ˜Cij = Ci ∧ Cj. The order of T2i and T2j can be known as long as ˜T2,ij < ˜Cij. Combining both conditions, the orderable condition for semi-competing risks data can be defined as ˜T1,ij < ˜T2,ij < ˜Cij.

Data Structure 3: Truncation data subject to censoring

Recall that T2 is subject to left-truncation by T1 or T1 is subject to right-truncation by T2 so that (T1, T2) can be observed only if T1 < T2. Now we assume that T2 is subject to right censoring by C. Hence, the observed variables become X = T1, Y = T2 ∧ C and δ2 = I(T2 ≤ C). We can set δ1 = 1 which means that T1 is always uncensored. The order of T2i and T2j can be known as long as ˜T2,ij < ˜Cij.

In absence of covariates, observed variables can be denoted as (X, Y, δ1, δ2) for the three data structures. For each data type, statisticians have developed inference proce-dures for investigating the dependent relationship between T1 and T2 based on a random sample of (X, Y, δ1, δ2). There are two approaches which turn out to be applicable to all the three data structures. The first approach utilizes the moment condition of ∆ij in (29). The second approach is developed via constructing a series of two-by-two tables

based on equation (30). This paper extends the first approach to a regression setting in which the dimension of regression parameters may exceed 1 and hence the method of moment is not directly applicable.

4.4.4. The Proposed Inference Procedure

Let (T1i, T2i, Zi) (i = 1, 2, ..., n) be a random sample following the model assumption in (31) or its equivalent versions in (32) or (33). Note that (T1, T2) may be any of the three data types introduced earlier. In presence of external censoring, the observed data are denoted as (Xi, Yi, δ1i, δ2i, Zi) (i = 1, 2, ..., n) which are random replications of (X, Y, δ1, δ2, Z) described in Section 4.3.3.

When the covariates are discrete, we can partition the sample according to distinct values of Z. For a pair of observations in each sub-sample, they need to satisfy two criterion in order to be used in the analysis. Specifically we select a pair (i, j) with Zi = Zj = z such that the corresponding value of ∆ij is known and the chosen corner value is located in the model region. For typical bivariate data and semi-competing risks data, we choose ∗ = a. For truncation data, we set ∗ = b. Since the first type of data falls in R1, we don’t have to impose any restriction. For semi-competing risks data, the restriction for making corner a to fall in R2 is ˜T1,ij ≤ ˜T2,ij. For truncation data, we should set ˘T1,ij < ˜T2,ij for making corner b to fall in R2. Let Dij(z) be the orderable indicator that shows whether pair (i, j) with Zi = Zj = z can be selected in the analysis. For the three types of data structure, Dij(z) is defined as follows. For typical bivariate data, Dij(z) = I( ˜T1,ij < ˜C1,ij, ˜T2,ij < ˜C2,ij, Zi = Zj = z); for semi-competing risks data, Dij(z) = I( ˜T1,ij < ˜T2,ij < ˜Cij, Zi = Zj = z) and, for truncation data, Dij(z) = I( ˘T1,ij < ˜T2,ij < ˜Cij, Zi = Zj = z).

Now we discuss estimation of β. Since the dimension of β usually exceeds 1, we can not directly apply the method of moment based on equation (32) as in existing methods developed for homogeneous data (Fine et al., 2001; Chaieb et al., 2006). Instead, we

apply the least-squares principle to minimize depends on the data type which is given below. For typical bivariate data under right censoring, ˜Xij = Xi ∧ Xj and ˜Yij = Yi ∧ Yj; for semi-competing risks data, ˜Xij = T˜1,ij∧ ˜T2,ij∧ ˜Cij and ˜Yij = ˜T2,ij∧ ˜Cij; and, for truncation data, ˜Xij = ˘T1,ij and ˜Yij = Yi∧Yj. The proposed estimator, denoted as ˆβ, is the one such that U(β) is minimized which can be obtained by solving u(β) = 0, where u(β) = ∂U (β)/∂β. In the simulations, we will evaluate the weight function of the form,

Wz,a,b(x, y) = nz

Pn

i=1I{Xi ≥ min(a, x), Yi ≥ min(b, y), Zi = z},

where nz is the sample size of Z = z; a and b are constants. With a = b = 0, the function reduces to 1 which is the un-weighted case. With a = b = ∞, the weight function becomes nz/Pn

i=1I{Xi ≥ x, Yi ≥ y, Zi = z}. The following theorem provides the asymptotic properties of ˆβ. The proof of Theorem 3 is provided in Appendix 5.

Theorem 3: Let ˆβ denote the solution minimizing (34). We make the following regularity assumptions: (a) The list of possible covariate values is Z = {z1, ..., zK} which spans a non-degenerate linear space. That is, the dimensionaltiy of linear space spanned by Z equals p, the dimensionality of β. (b) nzk/n converge to constants 0 < ck < 1 for k = 1, ..., K. (c) The weight function Wz(u, v) has a uniform bounded limit ˜Wz(u, v).

That is, supz,u,v|Wz(u, v) − ˜Wz(u, v)| → 0 in probability, where ˜Wz is deterministic and bounded for (u, v) in the support of ( ˜Xij, ˜Yij). Let β be the true value of β. Then ˆβ is a consistent estimator and

n( ˆβ − β) converges in distribution to a multivariate normal distribution with variance Σ which is consistently estimated by ˆΣ = ˆI−1J( ˆˆI−1)0, where

Jˆij = n−3X

z

[2 X

k<l<m

( ˆQ(i)klzQˆ(j)kmz+ ˆQ(i)klzQˆ(j)lmz+ ˆQ(i)lmzQˆ(j)kmz) +X

k<l

( ˆQ(i)klzQˆ(j)klz)],

Qˆ(k)ijz = 2Wz( ˜Xij, ˜Yij)Dij(z)[∆ij − η(z0β)](−ˆ exp( ˆβ0+ ... + ˆβpZp)Zk (1 + exp( ˆβ0+ ... + ˆβpZp))2).

4.4.5. Checking the Clayton Assumption

Shih (1998) proposed a testing procedure to verify the Clayton assumption for typical bivariate right censored data. The test statistic is expressed as the difference of two estimators of the association parameter which converges to zero when the Clayton as-sumption holds but converges to a non-zero value when the model asas-sumption is violated.

This idea has been applied to semi-competing risks data by Fine et al. (2001). Now under the current regressing setting, we develop a unified approach of model checking which can handle the three data structures. Note that our result is the first application in the literature which can deal with dependent truncation data.

Let U1(β) and U2(β) follow the same form as U(β) with the weight function Wz being specified as Wz,1 and Wz,2 respectively. The following weight functions are suggested.

For typical bivariate right censored data and semi-competing risks data, we can set Wz,1(x, y) = 1, Wz,2(x, y) = nz

Pn

i=1I(Xi ≥ x, Yi ≥ y, Zi = z), and for truncation data,

Wz,1(x, y) = 1, Wz,2(x, y) = Pn nz

i=1I(Xi ≤ x, Yi ≥ y, Zi = z).

Let ˆβWz,i be the solution to ui(β) = 0 (i = 1, 2). In principle, different weight functions can also be applied and the choice would affect the power of the corresponding test.

Shih (1998) suggested to choose two weight functions such that, under the assumed model, one produces a more efficient estimator while the other results in a less efficient estimator.

The proposed test statistic can be expressed as

T = n( ˆβWz,1− ˆβWz,2)0Γˆ−1( ˆβWz,1− ˆβWz,2), where ˆΓ =

³Γˆij

´

(p+1)×(p+1), Γˆij = n−3X

z

[2 X

k<l<m

( ˆQ∗(i)klzQˆ∗(j)kmz+ ˆQ∗(i)klzQˆ∗(j)lmz + ˆQ∗(i)lmzQˆ∗(j)kmz) +X

k<l

( ˆQ∗(i)klzQˆ∗(j)klz )],

and ˆQ∗(i)klz is defined in Appendix. In Appendix 6, we show that when the regression assumption θZ(s, t) = exp(Z0β) holds, T converges in distribution to χ2p+1. That is, for a γ-level test, we reject the null hypothesis if T > χ2p+1,γ, where Pr(χ2p+1> χ2p+1,γ) = γ.

4.4.6. Numerical Analysis 4.4.6.1. Simulations Results

We performed simulations to assess finite-sample performances of the proposed methods.

Three regression settings were examined: case 1 (two groups): θZ(s, t) = exp(β01Z1), where Z = (1, 0)0 or (1, 1)0; case 2 (three groups): θZ(s, t) = exp(β0 + β1Z1), where Z = (1, 0)0, (1, 1)0 or (1, 2)0 and case 3 (three groups): θZ(s, t) = exp(β0+ β1Z1+ β2Z2), where Z = (1, 0, 0)0, (1, 1, 0)0 or (1, 0, 1)0. The values of parameters were set as follows.

For case 1 and case 2, (β0, β1) = (0.5, 0.5) and (1, 1); and for case 3, (β0, β1, β2) = (0.5, 0.5, 0.5) and (1, 1, 1). For each group, we generated (T1, T2) which follow model (32). For typical bivariate data and semi-competing risks data, we set ∗ = a and, for truncation data, we set ∗ = b. The marginal distributions were generated from T1 ∼ exp(0.8) and T2 ∼ exp(1). Right censoring is incorporated in the three data structures. For all the cases, the censoring distribution was generated from U(0, 6). For bivariate censored data, we set C1 to be independent of C2 and the censoring proportion of Tj (j = 1, 2) is around 0.15. For semi-competing risks data, the censoring rate for T1 which is subject to dependent censoring by T2 varies from 0.35 (τ = 0.76) to 0.48 (τ = 0.25). For truncation data, the missing proportion Pr(T1 > T2) and the censoring

rate Pr(Tj > C|T1 ≤ T2) (j = 1, 2) vary with τ . When τ = 0.25, Pr(T1 > T2) ≈ 0.43, Pr(T1 > C|T1 ≤ T2) ≈ 0.09 and Pr(T2 > C|T1 ≤ T2) ≈ 0.23. When τ = 0.76, Pr(T1 > T2) ≈ 0.27, Pr(T1 > C|T1 ≤ T2) ≈ 0.11 and Pr(T2 > C|T1 ≤ T2) ≈ 0.18.

The sample size was chosen to be 150 and 300. Two weight functions with (a, b) = (0, 0) and (a, b) = (∞, ∞) were evaluated. The results for the three regression settings are summarized in Table 4-1 ∼ 4-3 respectively. Based on 1000 replications, we computed P1000

B=1βˆi(B)/1000 − βi (bias), and the empirical standard deviation of ˆβiσi) and the estimated standard deviationp

n−1Σˆii( ˆσi), and the coverage probability of the nominal 0.95 confidence interval for ˆβi (Cov95).

In all the cases, the proposed estimator ˆβ performs well and the variance estimator ˆ

σ produces confidence intervals with reasonable coverage probabilities. For typical bi-variate right-censored data and semi-competing risks data, the estimator with weight function (a, b) = (∞, ∞) performs better than (a, b) = (0, 0) but, for truncation data, we get the opposite conclusion since there is no information in the wedge T1 > T2. Therefore, we evaluated another weight function,

Wz(x, y) = nz/ Xn

i=1

I{Xi ≤ x, Yi ≥ y, Zi = z}, (35)

and the results are presented in Table 4-4. We see that the new weight function does improve the performances.

Bias σ¯ σˆ Cov95 Bias σ¯ σˆ Cov95

Data (a, b) β0 n = 150 n = 300

β1

Data 1 (0, 0) 0.5 0.0065 0.1855 0.1821 0.957 0.0015 0.1259 0.1283 0.945 0.5 -0.0025 0.2619 0.2593 0.948 -0.0007 0.1935 0.1835 0.953

1 0.0117 0.1960 0.1856 0.957 0.0059 0.1332 0.1308 0.942 1 0.0085 0.2858 0.2693 0.954 0.0067 0.1841 0.1890 0.956 (∞, ∞) 0.5 0.0066 0.1751 0.1656 0.947 -0.0003 0.1127 0.1150 0.951

0.5 0.0050 0.2578 0.2400 0.951 0.0096 0.1669 0.1677 0.945 1 0.0083 0.1787 0.1728 0.953 0.0054 0.1278 0.1222 0.946 1 -0.0079 0.2751 0.2553 0.942 0.0032 0.1862 0.1775 0.949 Data 2 (0, 0) 0.5 -0.0003 0.2343 0.2259 0.954 -0.0015 0.1600 0.1586 0.949

0.5 -0.0094 0.3372 0.3137 0.952 0.0020 0.2229 0.2208 0.954 1 0.0106 0.2282 0.2167 0.945 -0.0070 0.1592 0.1540 0.954 1 -0.0093 0.3030 0.3006 0.955 0.0016 0.2132 0.2121 0.950 (∞, ∞) 0.5 0.0071 0.1996 0.1959 0.953 0.0051 0.1399 0.1382 0.950 0.5 0.0073 0.2760 0.2793 0.956 -0.0028 0.1923 0.1956 0.948

1 0.0034 0.1975 0.1971 0.941 0.0026 0.1346 0.1386 0.946 1 0.0045 0.2889 0.2838 0.947 -0.0011 0.1927 0.1956 0.944 Data 3 (0, 0) 0.5 -0.0015 0.1518 0.1469 0.959 0.0033 0.0999 0.1012 0.956

0.5 0.0078 0.2341 0.2236 0.952 -0.0013 0.1565 0.1540 0.961 1 0.0060 0.1745 0.1691 0.950 -0.0034 0.1180 0.1160 0.945 1 -0.0041 0.3370 0.3205 0.951 0.0023 0.2192 0.2189 0.955 (∞, ∞) 0.5 0.0134 0.2330 0.2155 0.952 0.0065 0.1664 0.1641 0.951 0.5 -0.0012 0.3456 0.3259 0.956 -0.0055 0.2508 0.2372 0.953

1 0.0216 0.2667 0.2437 0.950 0.0091 0.1923 0.1766 0.950 1 0.0160 0.4942 0.4362 0.942 0.0026 0.3575 0.3125 0.945 Table 4-1: Simulation results for case 1. Data 1: Typical bivariate right-censored data;

Data 2: Semi-competing risks data; Data 3: Truncation data.

Bias σ¯ σˆ Cov95 Bias σ¯ σˆ Cov95

Data (a, b) β0 n = 150 n = 300

β1

Data 1 (0, 0) 0.5 0.0036 0.2096 0.2076 0.947 0.0084 0.1456 0.1459 0.946 0.5 0.0061 0.1761 0.1711 0.955 -0.0005 0.1187 0.1192 0.949

1 0.0137 0.2228 0.2197 0.954 0.0079 0.1649 0.1556 0.958 1 0.0095 0.2308 0.2311 0.953 0.0025 0.1709 0.1597 0.953 (∞, ∞) 0.5 0.0060 0.1974 0.1895 0.952 0.0086 0.1353 0.1330 0.950 0.5 0.0046 0.1656 0.1606 0.950 -0.0007 0.1131 0.1109 0.950

1 0.0162 0.2071 0.2083 0.960 0.0053 0.1489 0.1459 0.949 1 -0.0030 0.2271 0.2267 0.960 0.0049 0.1578 0.1543 0.960 Data 2 (0, 0) 0.5 -0.0023 0.2612 0.2531 0.941 -0.0036 0.1868 0.1794 0.951

0.5 0.0102 0.2115 0.2035 0.945 0.0035 0.1418 0.1407 0.948 1 0.0091 0.2832 0.2589 0.946 -0.0067 0.1865 0.1819 0.947 1 0.0077 0.2665 0.2543 0.950 0.0076 0.1838 0.1741 0.948 (∞, ∞) 0.5 -0.0054 0.2315 0.2256 0.951 0.0018 0.1622 0.1565 0.951 0.5 0.0052 0.1874 0.1844 0.946 0.0058 0.1285 0.1273 0.950 1 0.0121 0.2557 0.2385 0.942 0.0068 0.1712 0.1664 0.945 1 -0.0054 0.2587 0.2466 0.948 -0.0058 0.1649 0.1643 0.945 Data 3 (0, 0) 0.5 -0.0045 0.1776 0.1742 0.956 0.0043 0.1204 0.1187 0.948

0.5 0.0070 0.1753 0.1668 0.950 -0.0028 0.1180 0.1114 0.951 1 -0.0078 0.2225 0.2079 0.943 0.0023 0.1421 0.1418 0.947 1 0.0298 0.3234 0.3207 0.948 0.0080 0.2081 0.2064 0.951 (∞, ∞) 0.5 0.0087 0.2668 0.2447 0.947 0.0037 0.1947 0.1825 0.952 0.5 0.0086 0.2560 0.2416 0.943 -0.0005 0.1833 0.1700 0.952

1 0.0138 0.3305 0.2901 0.953 0.0076 0.2313 0.2076 0.949 1 0.0562 0.4820 0.4359 0.940 0.0337 0.3192 0.2919 0.948 Table 4-2: Simulation results for case 2. Data 1: Typical bivariate right-censored data;

Data 2: Semi-competing risks data; Data 3: Truncation data.

Bias σ¯ σˆ Cov95 Bias σ¯ σˆ Cov95

Data (a, b) β0 n = 150 n = 300

β1 β2

Data 1 (0, 0) 0.5 0.0133 0.2351 0.2237 0.945 -0.0012 0.1543 0.1572 0.955 0.5 0.0062 0.3405 0.3194 0.950 0.0089 0.2283 0.2245 0.950 0.5 -0.0068 0.3353 0.3186 0.956 0.0035 0.2232 0.2247 0.948 1 0.0155 0.2379 0.2263 0.950 0.0087 0.1715 0.1603 0.948 1 -0.0065 0.3490 0.3283 0.949 -0.0067 0.2486 0.2319 0.947 1 0.0083 0.3525 0.3268 0.954 0.0001 0.2487 0.2322 0.954 (∞, ∞) 0.5 0.0142 0.2182 0.2020 0.945 0.0062 0.1442 0.1432 0.941 0.5 0.0052 0.3104 0.2943 0.949 0.0047 0.2102 0.2076 0.953 0.5 0.0071 0.3099 0.2949 0.952 0.0044 0.2133 0.2071 0.946 1 0.0194 0.2190 0.2145 0.939 0.0047 0.1526 0.1497 0.956 1 -0.0052 0.3164 0.3178 0.959 0.0022 0.2209 0.2189 0.949 1 -0.0047 0.3247 0.3185 0.954 0.0093 0.2173 0.2193 0.961 Data 2 (0, 0) 0.5 -0.0054 0.2823 0.2780 0.952 -0.0009 0.1995 0.1944 0.949

0.5 0.0101 0.3972 0.3855 0.960 -0.0014 0.2844 0.2706 0.956 0.5 0.0027 0.3956 0.3843 0.945 0.0059 0.2700 0.2704 0.952

1 -0.0012 0.2799 0.2670 0.945 0.0070 0.1926 0.1879 0.947 1 0.0056 0.3881 0.3696 0.953 -0.0025 0.2665 0.2596 0.952 1 0.0123 0.3994 0.3708 0.953 0.0028 0.2623 0.2594 0.949 (∞, ∞) 0.5 0.0153 0.2572 0.2428 0.949 0.0057 0.1744 0.1697 0.947 0.5 -0.0017 0.3676 0.3456 0.949 0.0088 0.2565 0.2404 0.952 0.5 0.0077 0.3674 0.3451 0.946 -0.0025 0.2408 0.2417 0.947

1 0.0149 0.2604 0.2441 0.947 0.0036 0.1732 0.1718 0.961 1 0.0063 0.3724 0.3541 0.953 -0.0024 0.2489 0.2435 0.946 1 0.0092 0.3925 0.3539 0.950 0.0058 0.2544 0.2437 0.963 Data 3 (0, 0) 0.5 -0.0072 0.1863 0.1823 0.940 -0.0005 0.1253 0.1257 0.944

0.5 0.0039 0.3009 0.2807 0.949 -0.0004 0.1900 0.1910 0.959 0.5 0.0060 0.2916 0.2802 0.953 0.0065 0.1939 0.1913 0.955

1 0.0052 0.2242 0.2134 0.943 -0.0039 0.1503 0.1446 0.945 1 0.0358 0.4464 0.4092 0.942 0.0050 0.2768 0.2731 0.941 1 0.0319 0.4357 0.4054 0.948 0.0015 0.2924 0.2729 0.948 (∞, ∞) 0.5 0.0161 0.2934 0.2561 0.947 0.0022 0.2142 0.1908 0.952 0.5 0.0101 0.4348 0.3866 0.949 0.0071 0.3036 0.2834 0.945 0.5 0.0011 0.4551 0.3864 0.947 0.0032 0.3103 0.2841 0.950 1 0.0326 0.3289 0.2856 0.945 0.0064 0.2289 0.2100 0.950 1 0.0443 0.6228 0.5166 0.945 0.0171 0.4321 0.3755 0.952 1 0.0321 0.6232 0.5109 0.953 0.0145 0.4346 0.3750 0.948 Table 4-3: Simulation results for case 3. Data 1: Typical bivariate right-censored data;

Data 2: Semi-competing risks data; Data 3: Truncation data.

Bias σ¯ σˆ Cov95 Bias σ¯ σˆ Cov95

Case β n = 150 n = 300

Case 1 0.5 -0.0036 0.1103 0.1020 0.955 -0.0038 0.0663 0.0658 0.953 0.5 0.0007 0.1632 0.1600 0.947 0.0015 0.1075 0.1021 0.954

1 -0.0017 0.1322 0.1226 0.947 0.0015 0.0802 0.0783 0.957 1 0.0085 0.2529 0.2463 0.946 -0.0071 0.1611 0.1553 0.945 Case 2 0.5 0.0073 0.1369 0.1267 0.947 -0.0043 0.0819 0.0806 0.957 0.5 0.0044 0.1311 0.1226 0.953 0.0008 0.0781 0.0768 0.956

1 -0.0085 0.1284 0.1592 0.948 -0.0019 0.1025 0.0995 0.947 1 0.0167 0.2517 0.2465 0.954 0.0055 0.1453 0.1469 0.955 Case 3 0.5 -0.0088 0.1396 0.1346 0.949 -0.0024 0.0883 0.0848 0.936

0.5 0.0033 0.2239 0.2118 0.949 0.0011 0.1389 0.1323 0.955 0.5 0.0086 0.2111 0.2109 0.956 -0.0063 0.1345 0.1319 0.951

1 -0.0030 0.1686 0.1610 0.950 -0.0038 0.1025 0.1015 0.945 1 -0.0011 0.3502 0.3273 0.945 0.0063 0.2112 0.2035 0.954 1 0.0152 0.3539 0.3313 0.947 -0.0027 0.2114 0.2022 0.945 Table 4-4: Simulation results for truncation data with weight function Wz(x, y) in (35).

4.4.6.2. Robustness in Presence of Marginal Heterogeneity

The proposed methodology requires that observations in each subgroup with the same value of Z are identically distributed. In practice, there may exist covariates which may also affect the marginal distributions. If these covariates are discrete, they can be included in the list of Z and the proposed methods are still valid but may lose some efficiency due to extra grouping. We design two regression settings to examine this effect.

In the first setting, let Zj =0 or 1 (j = 1, 2) such that Z1 affects association and Z2

affects the marginal distribution. We compare two model specifications with model 1:

θZ(s, t) = exp(β0 + β1Z1+ β2Z2) and model 2: θZ(s, t) = exp(β0 + β1Z1). The results are summarized in Table 4-5. The results of model 1 indicate that our method is valid by extra grouping. The results of model 2 show that, if the marginal heterogeneity is ignored, the estimator of the intercept term β0 is biased while the estimator of β1 still has reasonable performance. The purpose of the second setting is to evaluate the loss in efficiency due to unnecessary grouping. Let Z1 affects association and Z2 is a redundant

covariate. Table 4-6 shows that model 1 which includes Z2 produces slightly larger bias and variation.

For the simulation settings, set β0 = 0.5, β1 = 0.5 and β2 = 0, this is Z1 affects the association, Z2 doesn’t affect the association. Let (T1, T2) follows the Clayton copula and the sample size and replications are 200 and 1000, respectively. For the first setting, Z2 affects the marginal distribution. Let Z2 affect the marginal distribution by the proportional hazard relationship, i.e. Si(t|Z2) = Si0(t)exp(ηZ2) (i = 1, 2), where Si0(t) is the baseline survival function, Si(t|Z2) = Pr(Ti > t|Z2) and set η=0.5 and 1. For the baseline survival function, Si0(t), the distribution of T1 follows exp(0.8) and the distribution of T2 follows exp(1). For the second setting, Z2 is a redundant covariate.

Let T1 follow exp(0.8) and T2 follow exp(1). The censoring variables are generated from U(0, 6). For typical bivariate data, we set C1 to be independent of C2.

When there exist continuous covariates that affect the marginal distributions, we can group them into distinct classes. On the other hand, if the marginal covariate effect is ignored, we may suspect that the proposed method can estimate the slope parameters more accurately than the intercept parameter. A possible solution is to mimic the idea of Fine et al. (2000) who construct an association model based on error terms in which is the marginal effects have been removed. Since this is not a straightforward extension for more complicated data structures, we will leave it as future work.

Model 1: with grouping Model 2: without grouping Data PH effect β0 Bias σ¯ σˆ Cov95 Bias σ¯ σˆ Cov95

β1 β2

Data 1 η = 0.5 0.5 0.0010 0.2059 0.1918 0.950 0.0384 0.1640 0.1533 0.949 0.5 0.0089 0.2381 0.2188 0.954 -0.0098 0.2322 0.2200 0.952

0 0.0004 0.2339 0.2265 0.954

η = 1 0.5 -0.0057 0.2037 0.1922 0.957 0.1466 0.1575 0.1507 0.837 0.5 0.0137 0.2255 0.2048 0.948 -0.0357 0.2216 0.2174 0.947

0 0.0045 0.2291 0.2239 0.942

Data 2 η = 0.5 0.5 0.0028 0.2421 0.2389 0.940 0.0364 0.1918 0.1910 0.952 0.5 0.0065 0.2784 0.2625 0.953 -0.0123 0.2675 0.2666 0.950

0 -0.0142 0.2795 0.2802 0.940

η = 1 0.5 0.0025 0.2407 0.2376 0.953 0.1524 0.1878 0.1865 0.877 0.5 -0.0059 0.2677 0.2564 0.948 -0.0533 0.2648 0.2631 0.942

0 -0.0123 0.2869 0.2756 0.948

Data 3 η = 0.5 0.5 0.0015 0.1717 0.1606 0.953 0.0035 0.1253 0.1233 0.956 0.5 0.0089 0.2054 0.1878 0.956 -0.0122 0.1934 0.1879 0.958

0 0.0022 0.2059 0.1985 0.948

η = 1 0.5 -0.0085 0.1644 0.1588 0.951 -0.0066 0.1194 0.1224 0.946 0.5 0.0164 0.1984 0.1789 0.948 -0.0354 0.1827 0.1843 0.944

0 0.0011 0.2052 0.1958 0.950

Table 4-5: Simulations with marginal heterogeneity. Data 1: Typical bivariate right-censored data; Data 2: Semi-competing risks data; Data 3: Truncation data.

Model 1: with gouping Model 2: without grouping

Data β0 Bias σ¯ σˆ Cov95 Bias σ¯ σˆ Cov95

β1 β2

Data 1 0.5 0.0062 0.2012 0.1934 0.950 0.0041 0.1572 0.1572 0.951 0.5 0.0074 0.2371 0.2149 0.956 0.0061 0.2291 0.2245 0.952

0 0.0011 0.2408 0.2322 0.951

Data 2 0.5 -0.0043 0.2571 0.2394 0.951 -0.0018 0.2025 0.1944 0.950 0.5 0.0089 0.2929 0.2766 0.952 0.0025 0.2849 0.2709 0.949

0 0.0003 0.2961 0.2848 0.956

Data 3 0.5 0.0045 0.1709 0.1606 0.946 -0.0014 0.1276 0.1253 0.946 0.5 0.0068 0.2009 0.1968 0.946 0.0081 0.1894 0.1908 0.951

0 -0.0086 0.2037 0.2021 0.959

Table 4-6: Simulations with marginal homogeneity. Data 1: Typical bivariate right-censored data; Data 2: Semi-competing risks data; Data 3: Truncation data.

4.4.6.3. Real Datas

The proposed methods are applied to analyze two data sets: the bone marrow transplan-tation data (Klein and Moeschberger, 2003) which belong to semi-competing risks data and the transfusion-related AIDS data (Kalbfleisch and Lawless, 1989) which belong to truncation data.

For the first data set, the main objective is to investigate how the disease type (ALL, AML low risk, AML high risk) affects the level of association between the survival time (T2) and the time to develop the chronic graft-versus-host disease (T1). In presence of right censoring, we only observe (X, Y, δ1, δ2, Z), where X is the time to chronic graft-versus-host disease or death or the end of study, Y is the time to death or the end of study and δj indicates whether Tj is observed (j = 1, 2). The covariate vector Z = (Z1, Z2)T is coded as follows: (Z1, Z2) = (0, 0) if the disease type is the ALL group; (Z1, Z2) = (1, 0) for the AML low risk group and (Z1, Z2) = (0, 1) for the AML high risk group.

The regression model can be written as: θaZ(s, t) = exp(β0+ β1Z1+ β2Z2), where β0

is the log odds of concordance for the baseline (ALL) group, β1 represents the difference of the log odds by comparing the AML low risk group with the baseline group, β2 is the difference of the log odds by comparing the AML high risk group with the baseline group. Hence β1− β2 measures the difference of the log odds between the AML low risk group and the AML high risk group.

Applying the testing procedure discussed in Section 4.4.5, we obtain that T = 2.9034 with P-value=0.407 which implies that the Clayton model is suitable for this data. We run two analyses with different weight functions. When a = b = ∞, the estimators and the corresponding standard errors given in the parentheses for β and β1 − β2 are ˆβ0 =-0.5355 (0.2756), ˆβ1=1.2188 (0.4770), ˆβ2=0.5629 (0.3666) and ˆβ1− ˆβ2=0.6559 (0.4582).

Accordingly the odds ratios for the above three sets of comparison are exp( ˆβ1)=3.3831,

exp( ˆβ2)=1.7557 and exp( ˆβ1 − ˆβ2)=1.9268. Although there seems to be positive associ-ation between the two failure times for each risk group, only the result by comparing the AML low risk group and the baseline group is statistically significant in which the former reveals larger association. With the weight function a = b = 0, the estimators and the corresponding standard errors in the parentheses for β and β1 − β2 are ˆβ0 =-0.4480 (0.3107), ˆβ1=1.2573 (0.5075), ˆβ2=0.4875 (0.3952) and ˆβ1− ˆβ2=0.7698 (0.4697).

The results of the two analyses are similar.

The second data set is from a study of blood-transfusion related AIDS (Kalbfleisch and Lawless, 1989). There are 293 subjects who infected HIV by contaminated blood transfusions and developed AIDS between January 1, 1978 and July 1, 1986. The data include the infection time (S) measured from the beginning of the study and induction time (T1) in months, and the age in years at the time of transfusion. Subjects are included in the sample only if they developed AIDS within the study period which lasted 102 months. Let T2 = 102 − S. Hence the sampling criteria is T1 ≤ T2. Tsai (1990) analyzed this data and rejected the null hypothesis of quasi-independence. Now we want to investigate whether age affects the level of association. For the age variable, we classify it into three groups: 0-4 years, 5-59 years and ≥ 60 years. Hence, the covariate Z = (Z1, Z2)T is coded as (Z1, Z2) = (0, 0) if the age of subject is in the ≥ 60 years group; (Z1, Z2) = (1, 0) for the 5-59 years group and (Z1, Z2) = (0, 1) for the 0-4 years group. Therefore, we consider the regression model: θZb(s, t) = exp(β0+ β1Z1+ β2Z2).

Since the data contain many ties, we modify the data by adding a uniform random variable between (−0.4, 0.4) to each subject. Applying the testing procedure in Section 4.4.5, we find T = 1.395 with p-value=0.707. Therefore, the Clayton model is suitable

Since the data contain many ties, we modify the data by adding a uniform random variable between (−0.4, 0.4) to each subject. Applying the testing procedure in Section 4.4.5, we find T = 1.395 with p-value=0.707. Therefore, the Clayton model is suitable

相關文件