Chapter 4 Testing quasi-independence

4.6 Data Analysis

We applied the proposed tests to the contaminated blood transfusion AIDS dataset provided in Lagakos et al. (1988). The variables included the infection time T, measured from April 1, 1978, and the induction period X, measured from the infection time. The sample contained 258 adults and 37 children. Only those who developed AIDS within the 8-year study period could be included in the sample, and thus X + T ≤ 8 is the truncation criterion. We set the new variable Y = 8 − T so that the pair (X,Y) is observed subject to X ≤ Y. Lagakos et al. (1988) applied the product-limit estimator for the survival function of X under the quasi-independence of (X,Y) for the adult and children groups separately. Now we examine the validity of this assumption.
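The change of variable can be illustrated with a minimal Python sketch. The function names and the toy (T, X) values below are hypothetical and are not taken from the Lagakos et al. (1988) data; only the 8-year window and the rule X + T ≤ 8 come from the text.

```python
STUDY_LENGTH_YEARS = 8.0  # length of the study window, as stated in the text


def to_truncation_pair(infection_time, induction_period):
    """Map (T, X) to the pair (X, Y) with Y = 8 - T."""
    y = STUDY_LENGTH_YEARS - infection_time
    return induction_period, y


def is_observable(infection_time, induction_period):
    """A subject enters the sample only if X + T <= 8, i.e. X <= Y."""
    x, y = to_truncation_pair(infection_time, induction_period)
    return x <= y


# Hypothetical (T, X) values in years, for illustration only.
toy_subjects = [(1.0, 3.5), (6.0, 1.5), (7.0, 2.0)]
observed = [s for s in toy_subjects if is_observable(*s)]
print(observed)
```

The third toy subject is infected at year 7 and would develop AIDS at year 9, outside the window, so it is truncated and never appears in the sample.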

Applying the proposed Log-rank tests to the adult group, we found that the Z-values of the test statistics L_ρ₌₀, L_ρ₌₁ and L_inv_log, standardized by the jackknife estimators, were -5.012, -2.918 and -3.795, respectively. The negative sign of the Z-values indicates a positive association between X and Y. The corresponding two-sided p-values of the three test statistics were 5.4×10⁻⁷, 3.5×10⁻³ and 1.5×10⁻⁴, respectively. All p-values in the adult group showed a significant deviation from quasi-independence, with the test based on L_ρ₌₀ producing the smallest p-value.

For the children group, the Z-values of the test statistics L_ρ₌₀, L_ρ₌₁ and L_inv_log, after being standardized by the jackknife estimator, were -1.838, -1.379 and -1.373, respectively. A positive association between X and Y is suggested in the children group as well. The two-sided p-values were 0.0661, 0.1679 and 0.1697, respectively. The smallest p-value was again achieved by the L_ρ₌₀ statistic, which was significant at the 10% level. In this case, the other statistics L_ρ₌₁ and L_inv_log could not reveal a significant departure from quasi-independence.
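The reported two-sided p-values can be reproduced from the Z-values with a short Python sketch, assuming a standard normal reference distribution. The dictionary keys are illustrative labels only; the Z-values are those quoted in the text.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value 2 * Phi(-|z|) under a standard normal reference."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Z-values as reported in the text for the two groups.
z_values = {
    "adults": {"L_rho0": -5.012, "L_rho1": -2.918, "L_inv_log": -3.795},
    "children": {"L_rho0": -1.838, "L_rho1": -1.379, "L_inv_log": -1.373},
}

p_values = {
    group: {name: two_sided_p_value(z) for name, z in stats.items()}
    for group, stats in z_values.items()
}

for group, stats in p_values.items():
    for name, p in stats.items():
        print(f"{group:8s} {name:10s} p = {p:.3g}")
```

The identity used is 2Φ(−|z|) = erfc(|z|/√2), which avoids any dependency beyond the standard library.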

In both groups, the statistic L_ρ₌₀ gave the strongest evidence against quasi-independence. The statistic L_ρ₌₁, which is equivalent to Tsai’s test statistic, gave the weakest evidence in the adult group and was nearly tied with L_inv_log for the weakest in the children group. One possible explanation of this result is that the data are better approximated by the Clayton semi-survival model than by the Frank model. As we have seen in the simulation studies, the statistic L_ρ₌₀ has the highest efficiency while the statistic L_ρ₌₁ has the lowest under the Clayton model. This data analysis also indicates that choosing an appropriate weight function is essential for power improvement, especially when the sample size is small.

4.7 Conclusion

In the second project, we have proposed a general class of tests in the form of the weighted Log-rank statistics for testing quasi-independence for truncation data. Tsai’s test (1990) turns out to be a special case of our proposal.
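As an illustration of the special case, the following Python sketch computes the sample conditional Kendall's tau over comparable pairs, in the spirit of Tsai (1990) and Martin & Betensky (2005). It is an unstandardized illustration under the truncation X ≤ Y, not the weighted Log-rank form or the jackknife-standardized statistic used in this chapter; the function name is hypothetical, and the sign convention may differ from that of the reported Z-values.

```python
from itertools import combinations


def conditional_kendall_tau(x, y):
    """Sample conditional Kendall's tau for pairs observed subject to X <= Y.

    A pair (i, j) is comparable when max(X_i, X_j) <= min(Y_i, Y_j);
    only comparable pairs contribute a concordance sign.
    """
    sign_sum = 0
    n_comparable = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        if max(xi, xj) <= min(yi, yj):
            n_comparable += 1
            product = (xi - xj) * (yi - yj)
            sign_sum += (product > 0) - (product < 0)
    if n_comparable == 0:
        raise ValueError("no comparable pairs in the sample")
    return sign_sum / n_comparable


# Toy data, for illustration only: every pair is comparable and concordant.
print(conditional_kendall_tau([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

Values near zero are consistent with quasi-independence, while values away from zero suggest association on the observable region.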

We also utilize the distributional property of the 2×2 tables in constructing the proposed score test. Our results show that the score test belongs to the proposed class of weighted Log-rank tests with an appropriate choice of the weight function. Our simulations confirm that the score test yields a more powerful testing procedure if the pattern of dependence under the alternative hypothesis is correctly specified. It is important to note that optimal properties of the score test cannot be derived by applying the results for parametric models or the efficiency theory under a semi-parametric framework (Van Der Vaart, 1998, Chapter 25). The difficulty comes from the fact that each term in the product of the likelihood function (4.9) is neither a conditional likelihood nor a partial likelihood, since the probabilities are calculated conditional on an un-nested sequence of conditioning events. Further theoretical investigation of the likelihood formulation would be helpful.

For establishing the asymptotic normality, we have applied the functional delta method, which can handle more general situations than the U-statistics or rank statistics approaches. Furthermore, the expression of the proposed statistics as a statistically differentiable functional allows us to verify the consistency of the jackknife estimator. These theoretical justifications allow us to safely use a computationally simpler way of finding the cut-off values.
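The jackknife standardization can be sketched generically in Python. This is a minimal leave-one-out illustration under stated assumptions: the function names are hypothetical, and the sample mean is used only as a stand-in for the statistic, since the weighted Log-rank statistic itself is not reproduced here.

```python
import math
import statistics


def jackknife_variance(sample, statistic):
    """Leave-one-out jackknife estimate of the variance of statistic(sample)."""
    data = list(sample)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    # Recompute the statistic with each observation deleted in turn.
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    center = sum(leave_one_out) / n
    return (n - 1) / n * sum((value - center) ** 2 for value in leave_one_out)


def z_value(sample, statistic):
    """Statistic divided by its jackknife standard error (null mean assumed zero)."""
    return statistic(sample) / math.sqrt(jackknife_variance(sample, statistic))


# Toy data, for illustration only.
toy_sample = [-1.2, 0.4, 0.9, -0.3, 1.6, 0.2]
jk_var = jackknife_variance(toy_sample, statistics.mean)
print(jk_var, z_value(toy_sample, statistics.mean))
```

For the sample mean, the jackknife variance coincides with the usual estimate s²/n, which gives a simple sanity check before plugging in a more complicated statistic.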

Another important and practical problem is how to choose the best weight in real data analysis, where the association pattern of (X,Y) is unknown in a nonparametric setting.

Now we discuss possible approaches based on the literature of survival analysis. A common, but somewhat ad hoc, way of choosing the weight function is to rely on the researchers’ own experience or their knowledge of the association structure. Another, more elaborate approach is to use a combination of several weighted Log-rank statistics (Tarone, 1981; Chapter 7 of Fleming & Harrington, 1991; Kosorok & Lin, 1999). Such an approach is considered robust (Kosorok & Lin, 1999) in that one may avoid using the worst weight choice in data analysis. To implement this methodology, the joint distribution of several weighted Log-rank statistics must be derived, which we leave for future investigation.

Appendices : Project 2

Appendix 4.A. Asymptotic Analysis

Let D{[0,∞)²} be the collection of all right-continuous functions with left-side limits defined on [0,∞)², equipped with the norm ‖f‖_∞ = sup_{x,y} |f(x,y)|. The empirical process on the plane is defined as:

[display equation not recoverable from the source]

The functional delta method is applied based on the weak convergence result for this empirical process, whose limiting covariance structure is given by

[display equation not recoverable from the source]

Part I: Proof of Theorem 4.1

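After some algebraic manipulations involving (6), we obtain

[display equation not recoverable from the source]

The above expression can be written as

[display equation not recoverable from the source]

By direct calculations, we can show the Hadamard differentiability of Φ(⋅). The differential map of Φ(⋅) at π ∈ D{[0,∞)²} with direction h ∈ D{[0,∞)²} is:

[display equation not recoverable from the source]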

By applying the functional delta method (Van Der Vaart, 1998, p. 297), we obtain the following asymptotic linear expression

[display equation not recoverable from the source]

where U(X_j,Y_j), j = 1, …, n, are i.i.d. random variables with mean zero. From the central limit theorem, n^{-1/2} L_w converges weakly to a mean-zero normal distribution with variance σ² = E[U(X_j,Y_j)²].
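The structure of this argument can be sketched as follows. The display is a hedged outline rather than the original derivation: it assumes that L_w = nΦ(π̂), that Φ(π) = 0 under quasi-independence, and that the empirical process is an average π̂ = n⁻¹ Σ_j δ_j, where δ_j is an illustrative symbol for the contribution of (X_j,Y_j).

```latex
% Sketch only. Assumed (not recoverable from the text): L_w = n\,\Phi(\hat\pi),
% \Phi(\pi) = 0 under quasi-independence, and \hat\pi = n^{-1}\sum_j \delta_j,
% where \delta_j denotes the contribution of (X_j, Y_j) to \hat\pi.
\begin{align*}
n^{-1/2} L_w
  &= n^{1/2}\,\{\Phi(\hat\pi) - \Phi(\pi)\} \\
  &= \Phi'_\pi\!\big( n^{1/2}(\hat\pi - \pi) \big) + o_P(1) \\
  &= n^{-1/2} \sum_{j=1}^{n} \Phi'_\pi(\delta_j - \pi) + o_P(1).
\end{align*}
% Under these assumptions one may take U(X_j, Y_j) = \Phi'_\pi(\delta_j - \pi),
% and E[U(X_j, Y_j)] = 0 follows from E[\delta_j] = \pi and the linearity of \Phi'_\pi.
```

The second line uses Hadamard differentiability of Φ together with the weak convergence of n^{1/2}(π̂ − π); the third uses the linearity of the derivative map.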

Part II: Analytic Variance Estimator for the G^ρ Class

The statistics in the G^ρ class are special cases of L_w. For this class, it is relatively easy to obtain an analytic formula for estimating σ² based on asymptotic linear expressions. Specifically, the derivative map is given by

[display equation not recoverable from the source]

The asymptotic expression

[display equation not recoverable from the source]

motivates the empirical estimator:

[display equation not recoverable from the source]

Part III: Proof of Theorem 2

L_w involves the estimator of the truncation probability c. From the result of He and Yang (1998), ĉ has an algebraically equivalent expression

$$\hat{c} = \int_0^\infty \hat{S}_Y(u)\, d\hat{F}_X(u).$$

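The product limit estimators (Lynden-Bell, 1971; Wang, Jewell & Tsai, 1986) for (X,Y) are defined as:

[display equation not recoverable from the source]

These estimators allow ĉ to be written as a composition of Hadamard differentiable maps:

$$\hat{\pi} \mapsto (\hat{F}_X, \hat{S}_Y) \mapsto \int_0^\infty \hat{S}_Y(u)\, d\hat{F}_X(u). \qquad \text{(A.1)}$$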

It is well known for right-censored data that the product limit estimator is a Hadamard differentiable function of the empirical process. For truncation data, we apply the arguments of Example 20.15 of Van Der Vaart (1998) to show the Hadamard differentiability of the maps from D{[0,∞)²} to D{[0,∞)}:

[display equation not recoverable from the source]

To prove the former statement, we decompose the map into three differentiable maps

[display equation not recoverable from the source]

where the Hadamard differentiability of the second map follows from Lemma 20.10 of Van Der Vaart (1998) and that of the last map follows from the Hadamard differentiability of the product integral (Andersen et al., 1993, Proposition II.8.7). The Hadamard differentiability of the other map can be established by the same arguments. The Hadamard differentiability of the second map in (A.1) can be found in Lemma 20.10 of Van Der Vaart (1998). Using the chain rule (Van Der Vaart, 1998, Theorem 20.9), the map g is shown to be Hadamard differentiable. Let g′_π(h) ∈ R be the differential map of g at π ∈ D{[0,∞)²} with expansion

[display equation not recoverable from the source]

By applying the functional delta method, we obtain the following asymptotic linear expression:

[display equation not recoverable from the source]

where the summands are mean-zero i.i.d. random variables. From the central limit theorem, n^{-1/2} L^*_w converges weakly to a mean-zero normal distribution with variance σ^{*2}.

Part IV: Consistency of the Jackknife Estimator

Now we show the consistency of the jackknife estimator for L_w. We have shown that statistics of the form L_w have asymptotic normal distributions with finite variances. According to Theorem 3.1 of Shao (1993), we need to show the continuous Gateaux differentiability of Φ(π) at π ∈ D{[0,∞)²}. Note that Hadamard differentiability is stronger than Gateaux differentiability, and hence the Gateaux derivative map is already available from Part I. We only need to verify the continuity requirement on the derivative map. For sequences π_k ∈ D{[0,∞)²} satisfying ‖π_k − π‖_∞ → 0 and t_k → 0, we need to show

[display equation not recoverable from the source]

The proof of continuous Gateaux differentiability proceeds in essentially the same manner as Example 2.6 in Shao (1993). The continuous differentiability of w(⋅) and the assumption ‖π_k − π‖_∞ → 0 ensure the following expansion

[display equation not recoverable from the source]

uniformly in (u,v). Hence a straightforward but tedious calculation shows that

[display equation not recoverable from the source]

To show the consistency of the jackknife estimator for L^*_w, we only need to check the continuity of the Gateaux differential map of Ψ(π), which is available in Part III. We can obtain the continuity requirement after tedious algebraic operations similar to the arguments above.

Part V: Asymptotic Analysis in Presence of Censoring

Based on the product integral form of Lynden-Bell’s estimator Ŝ_C(y), we obtain the expression

[display equation not recoverable from the source]

From equation (?), (A-1) and (A-2), we obtain the following functional expression:

[display equation not recoverable from the source]

Here, the last equation follows from the property

[display equation not recoverable from the source]

Based on arguments similar to those of Part III, we can express the estimator ĉ^* as a function of Ĥ such that ĉ^* = g^*(Ĥ). A similar algebraic operation can be applied to obtain the functional expression for L^*_w.

Appendix 4.B. Proof of Equivalence Formula

For right censored data, we show the identity:

[display equation not recoverable from the source]

Similar algebraic manipulation shows that

[display equation not recoverable from the source]

Combining these formulae, we obtain

[display equation not recoverable from the source]

The last equation follows from the permutation symmetry of each term with respect to arguments (i,j).

Chapter 5 Future Work

In Chapter 3, we consider semi-parametric inference for semi-survival AC models and propose a likelihood-based approach for estimating the association parameter. A nice equivalence condition for different types of estimating functions is established. A similar idea is used again to construct a score test. Although we have seen efficiency gains or power improvements from choosing an appropriate weight function, optimality results are still not available. As mentioned earlier, each term in the product of the likelihood function is neither a conditional likelihood nor a partial likelihood, since the probabilities are calculated conditional on an un-nested sequence of conditioning events. Further investigation is needed to elucidate the proposed likelihood, and we hope to develop a better understanding of the theoretical properties of the proposed methods.

For establishing the asymptotic normality, the functional delta method is applied to two problems. For the Log-rank statistics in Chapter 4, the expression has been shown to be a statistically differentiable functional, which allows us to verify the consistency of the jackknife estimator. This theoretical justification allows us to safely use a computationally simple way of determining the decision rule of the testing procedure. The consistency of the jackknife estimator has been proven only for the simple case of the Log-rank statistics with no censoring. For other, more complicated cases, the jackknife method is still a useful tool even though it may lack theoretical justification. Nevertheless, finding a tractable and theoretically valid way of constructing confidence intervals or bands still deserves further investigation.

References

ANDERSEN, P. K., BORGAN, O., GILL, R. D. & KEIDING, N. (1993). Statistical Models Based on Counting Processes, New York: Springer-Verlag.

CHAIEB, L., RIVEST, L.-P. & ABDOUS, B. (2006). Estimating survival under a dependent truncation. Biometrika, 93, 655-69.

CLAYTON, D. G. (1978). A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika, 65, 141-51.

CLAYTON, D. G. & CUZICK, J. (1985). Multivariate generalizations of the proportional hazards model (with discussion). Journal of the Royal Statistical Society: Series A, 148, 82-117.

CHEN, C.-H., TSAI, W.-Y. & CHAO, W.-H. (1996). The product-moment correlation coefficient and linear regression for truncated data. Journal of the American Statistical Association, 91, 1181-1186.

CUZICK, J. (1982). Rank tests for association with right censored data. Biometrika, 69, 351-364.

CUZICK, J. (1985). Asymptotic properties of censored linear rank tests. The Annals of Statistics, 13, 133-141.

DABROWSKA, D. M. (1986). Rank tests for independence for bivariate censored data. The Annals of Statistics, 14, 250-264.

DAY, R., BRYANT, J. & LEFKOPOLOU, M. (1997). Adaptation of bivariate frailty models for prediction, with application to biological markers as prognostic indicators. Biometrika 84, 45-56.

EFRON, B. F. (1982). The Jackknife, the Bootstrap, and Other Resampling Plans, Philadelphia: Society for Industrial and Applied Mathematics.

FINE, J. P., JIANG, H. & CHAPPELL, R. (2001). On semi-competing risks data. Biometrika 88, 907-19.

GENEST, C. (1987). Frank’s family of bivariate distributions. Biometrika 74, 549-55.

GENEST, C., GHOUDI, K. & RIVEST, L.-P. (1995). A semi-parametric estimation procedure for dependence parameters in multivariate families of distributions. Biometrika 82, 543-52.

GENEST, C. & MACKAY, R. J. (1986). The joy of Copulas: Bivariate distributions with uniform marginals. The American Statistician, 40, 280-283.

HE, S. & YANG, G. L. (1998). Estimation of the truncation probability in the random truncation model. The Annals of Statistics, 26, 1011-27.

HSU, L. & PRENTICE, R. L. (1996). A generalisation of the Mantel-Haenszel test to bivariate failure time data. Biometrika, 83, 905-911.

KALBFLEISCH, J. D. & LAWLESS, J. F. (1989). Inference based on retrospective ascertainment: an analysis of the data on transfusion-related AIDS. Journal of the American Statistical Association, 84, 360-72.

KLEIN, J. P. & MOESCHBERGER, M. L. (2003). Survival Analysis: Techniques for Censored and Truncated Data. New York: Springer.

KOSOROK, M. R. & LIN, C. (1999). The versatility of functional-indexed weighted log-rank statistics. Journal of the American Statistical Association, 94, 320-332.

LAI, T. L. & YING, Z. (1991). Estimating a distribution function with truncated and censored data. The Annals of Statistics, 19, 417-42.

LYNDEN-BELL, D. (1971). A method of allowing for known observational selection in small samples applied to 3CR quasars. Mon. Not. R. Astr. Soc. 155, 95-118.

LAGAKOS, S. W., BARRAJ, L. M. & DE GRUTTOLA, V. (1988). Non-parametric analysis of truncated survival data, with application to AIDS. Biometrika 75, 515-23.

MARTIN, E. C. & BETENSKY, R. A. (2005). Testing quasi-independence of failure and truncation via Conditional Kendall’s Tau. Journal of the American Statistical Association, 100, 484-92.

NELSEN, R. B. (1999). An Introduction to copulas. New York: Springer-Verlag.

OAKES, D. (1982). A model for association in bivariate survival data. Journal of the Royal Statistical Society: Series B, 44, 414-22.

OAKES, D. (1986). Semi-parametric inference in a model for association in bivariate survival data. Biometrika, 73, 353-61.

OAKES, D. (1989). Bivariate survival models induced by frailties. Journal of the American Statistical Association, 84, 487-93.

RIVEST, L.-P. & WELLS, M. T. (2001). A martingale approach to the copula-graphic estimator for the survival function under dependent censoring. J. Mult. Anal. 79, 138-55.

SHAO, J. (1993). Differentiability of statistical functionals and consistency of the jackknife. The Annals of Statistics, 21, 61-75.

SHIH, J. H. & LOUIS, T. A. (1995). Inference on the association parameter in copula models for bivariate survival data. Biometrics, 51, 1384-99.

SHIH, J. H. & LOUIS, T. A. (1996). Tests of independence for bivariate survival data. Biometrics, 52, 1440-1449.

STUTE, W. (1993). Almost sure representation of the product-limit estimator for truncated data. The Annals of Statistics, 21, 146-56.

TARONE, R. E. (1981). On the distribution of the maximum of the log-rank statistics and the modified Wilcoxon statistics. Biometrics, 37, 79-85.

TSAI, W.-Y. (1990). Testing the assumption of independence of truncation time and failure time. Biometrika 77, 169-177.

VAN DER VAART, A. W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press.

WANG, M. C., JEWELL, N. P. & TSAI, W. Y. (1986). Asymptotic properties of the product limit estimate under random truncation. The Annals of Statistics, 14, 1597-605.

WANG, W. & DING, A. A. (2000). On assessing the association for bivariate current status data. Biometrika 87, 879-93.

WANG, W. (2003). Estimating the association parameter for copula models under dependent censoring. Journal of the Royal Statistical Society: Series B, 65, 257-73.

WOODROOFE, M. (1985). Estimating a distribution function with truncated data. The Annals of Statistics, 13, 163-77.

ZHENG, M. & KLEIN, J. (1995). Estimates of marginal survival for dependent competing risks based on an assumed copula. Biometrika 82, 127-38.
