Chapter 5: Data Generation Algorithms
5.2 Conditional Distribution Approach
5.2.1 Theoretical Background
The idea was proposed by Lee (1993). Given the marginal distribution of X1, if the conditional distribution of X2 | X1 is specified, then X2 can be generated. In general, Xk can be generated provided that the form of Xk | X1, X2, ..., Xk−1 is specified. The algorithm is applied successively for k = 2, ..., p.
Now we apply the above idea to the family of Archimedean copulas, whose joint distribution function takes the form

F(x1, x2, ..., xp) = C(F1(x1), F2(x2), ..., Fp(xp)) = φ^{-1}{φ(F1(x1)) + ... + φ(Fp(xp))}.

Consider the bivariate case. Generate two independent Uniform(0, 1) variables U1 and U2, and let X1 = F1^{-1}(U1). Then X2 is obtained by solving

Pr(X2 ≤ x2 | X1 = x1) = φ'(F1(x1)) / φ'(φ^{-1}{φ(F1(x1)) + φ(F2(x2))}) = U2   (5.2)

for x2, where φ' denotes the derivative of the generator. For the Gumbel generator φ(t) = (−log t)^α, the left-hand side of (5.2) involves nested logarithms and exponentials of F2(x2). Obviously, the above form does not allow an explicit solution. Hence, to solve the equation, we need to do it numerically.
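The conditional distribution approach can be sketched as follows for the bivariate Gumbel case. This is a minimal illustration, not the author's code: the function names are ours, and the conditional distribution Pr(U2 ≤ u2 | U1 = u1) = ∂C/∂u1 is solved by bisection, since it is a monotone function of u2 with no closed-form inverse.

```python
import math
import random

def gumbel_conditional_cdf(u2, u1, alpha):
    """Pr(U2 <= u2 | U1 = u1) for the Gumbel copula with generator
    phi(t) = (-log t)^alpha, i.e. the partial derivative dC(u1, u2)/du1."""
    a = (-math.log(u1)) ** alpha
    b = (-math.log(u2)) ** alpha
    s = a + b
    c = math.exp(-s ** (1.0 / alpha))            # C(u1, u2)
    return c * s ** (1.0 / alpha - 1.0) * (-math.log(u1)) ** (alpha - 1.0) / u1

def generate_gumbel_pair(alpha, rng):
    """One (U1, U2) replicate by the conditional distribution approach:
    draw U1 and W uniform, then solve Pr(U2 <= u2 | U1) = W by bisection."""
    u1, w = rng.random(), rng.random()
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(60):                          # the conditional CDF is monotone in u2
        mid = 0.5 * (lo + hi)
        if gumbel_conditional_cdf(mid, u1, alpha) < w:
            lo = mid
        else:
            hi = mid
    return u1, 0.5 * (lo + hi)
```

The failure times are then obtained through the marginal quantile transforms Xk = Fk^{-1}(Uk). For the Gumbel copula, Kendall's tau equals 1 − 1/α, which gives a quick sanity check on the generated sample.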
5.3 The Proposed Data Generation Method
The idea is based on a theorem in Genest & Rivest (1993). Briefly speaking, for (X, Y) which follow an AC model with generator φ, we can define two random variables (U, V), where

U = φ(S_X(X)) / {φ(S_X(X)) + φ(S_Y(Y))},  V = φ^{-1}{φ(S_X(X)) + φ(S_Y(Y))},

with S_X and S_Y denoting the marginal survival functions. By the theorem, U is uniformly distributed on (0, 1), V has distribution function K(v) = v − φ(v)/φ'(v), and U and V are independent. Reversing this construction, the algorithm can be stated as follows.

Step 1: Generate two independent Uniform(0, 1) variables U and W.
Step 2: Set V = K^{-1}(W).
Step 3: Set S_X(X) = φ^{-1}{U φ(V)} and S_Y(Y) = φ^{-1}{(1 − U) φ(V)}, and recover X and Y from the marginal survival functions.

Hence we have (X, Y) which follows an AC model.
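The three steps above can be sketched for a concrete generator. The following is a minimal illustration for the Clayton family, assuming the generator φ(t) = t^{−α} − 1 (function names and the choice of α are ours); here K^{-1} has no explicit form, so it is obtained by bisection using the monotonicity of K.

```python
import math
import random

ALPHA = 2.0       # Clayton parameter (illustrative); Kendall's tau = ALPHA / (ALPHA + 2)

def phi(t):       # Clayton generator phi(t) = t^(-alpha) - 1
    return t ** (-ALPHA) - 1.0

def phi_inv(s):   # inverse generator
    return (1.0 + s) ** (-1.0 / ALPHA)

def K(v):         # K(v) = v - phi(v)/phi'(v) for the Clayton generator
    return v + v * (1.0 - v ** ALPHA) / ALPHA

def K_inv(w):     # bisection inverse, valid because K is monotone on (0, 1)
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if K(mid) < w:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def generate_pair(rng):
    """Step 1: independent uniforms U, W.  Step 2: V = K^{-1}(W).
    Step 3: split phi(V) into proportions U and 1 - U and invert the generator."""
    u, w = rng.random(), rng.random()
    v = K_inv(w)
    return phi_inv(u * phi(v)), phi_inv((1.0 - u) * phi(v))
```

The output pair lives on the copula (uniform) scale; applying the marginal survival inverses yields (X, Y).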
5.4 Comparisons of the Three Approaches
For the frailty approach, to generate a random replication of (X, Y), we need to generate γ and a pair of uniform random variables. For the latter two approaches, we only need to generate a pair of uniform random variables; hence the frailty approach requires generating at least 50% more random numbers, which is a drawback. For the Clayton model, in which γ follows the Gamma distribution, the algorithm is simple. However, for a situation with an arbitrary distribution of γ, generating a random replicate of γ needs additional work. Moreover, not every AC family can be derived from a frailty model; that is, not every generator φ(⋅) can be expressed as the inverse of the Laplace transform of some random variable.
Although the idea of the conditional distribution approach is straightforward, the solution of Xk in (5.2) usually does not have a closed-form expression even in the bivariate case. Solving such a complicated equation numerically is very time consuming.
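For reference, the frailty approach discussed above is simple in the Clayton case. The following is a minimal sketch (helper names are ours) of the standard gamma-frailty construction: the frailty γ ~ Gamma(1/α, 1) has Laplace transform L(t) = (1 + t)^{−1/α}, and each coordinate is obtained as L(−log W / γ) for an independent uniform W. Note that each replicate consumes γ plus two uniforms, illustrating the extra random-number cost.

```python
import math
import random

def clayton_pair_frailty(alpha, rng):
    """Gamma-frailty (Marshall-Olkin type) construction of a Clayton pair.
    gamma ~ Gamma(shape 1/alpha, scale 1); U_i = (1 - log(W_i)/gamma)^(-1/alpha)."""
    g = rng.gammavariate(1.0 / alpha, 1.0)   # the frailty variable gamma
    w1, w2 = rng.random(), rng.random()
    u1 = (1.0 - math.log(w1) / g) ** (-1.0 / alpha)
    u2 = (1.0 - math.log(w2) / g) ** (-1.0 / alpha)
    return u1, u2
```

The resulting copula is C(u, v) = (u^{−α} + v^{−α} − 1)^{−1/α}, i.e. the Clayton model, with Kendall's tau α/(α + 2).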
The proposed method is friendlier than the previous two. In comparison with the frailty approach, we do not have to generate random numbers, namely γ, that serve only a temporary purpose. Compared with the conditional distribution approach, our method is technically easier to handle. Sometimes the inverse of K(⋅) has an explicit form; if not, we can take advantage of the monotonicity of K(⋅) and obtain its inverse by the bisection method. Despite the simplicity of the proposed method, the result of Genest and Rivest (1993) currently cannot handle higher dimensions with p > 2. This implies that we need more general theoretical results in order to extend the proposed algorithm to general multivariate situations.
In Figure 5.1, we plot data generated by the proposed algorithm. The two models appear increasingly similar as the level of tau decreases.
Fig.5.1. Simulated Data using the Proposed Data Generation Algorithm
Chapter 6: Numerical Analysis
Here we examine the performance of the proposed test by simulations. Since we expect the proposed test to be applicable to any Archimedean copula model, we use the Gumbel model for illustration.
We generate bivariate failure times following the Gumbel model, also called the positive stable frailty model. We evaluate the performance under different values of Kendall's τ equal to 0.3, 0.4, 0.5, 0.6 and 0.7. The marginal distributions of the two variables are both exponential with means equal to 1. The bivariate censoring variables are mutually independent and also follow exponential distributions, chosen so that the probability of censoring in each coordinate ranges from 0 to 0.5.
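The marginal and censoring setup above can be sketched as follows. This is our own illustration (function names are ours): a copula uniform U is mapped to X ~ Exponential(mean 1) through the quantile transform, and an independent Exponential(λ) censoring time is applied with λ calibrated so that Pr(C < X) = λ/(1 + λ) equals the target censoring probability.

```python
import math
import random

def censored_margin(u, censor_prob, rng):
    """Map a copula uniform U to X ~ Exponential(mean 1) and apply independent
    exponential censoring with Pr(censored) = censor_prob.
    Returns (observed time, failure indicator)."""
    x = -math.log(1.0 - u)                   # X = F^{-1}(U) with F(x) = 1 - e^{-x}
    if censor_prob <= 0.0:
        return x, 1                          # no censoring: failure always observed
    lam = censor_prob / (1.0 - censor_prob)  # solves lam / (1 + lam) = censor_prob
    c = -math.log(1.0 - rng.random()) / lam  # censoring time C ~ Exponential(lam)
    return min(x, c), int(x <= c)
```

For example, censoring proportions of 20% and 50% correspond to λ = 0.25 and λ = 1, respectively.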
After estimating the association parameter α, we obtain α̂ and α̂_w, and define γ̂ = log α̂ and γ̂_w = log α̂_w. We then estimate the variance of γ̂_w − γ̂ by the jackknife estimator σ̂²_Jackknife. The Gumbel model is rejected if the test statistic |γ̂_w − γ̂| / σ̂_Jackknife exceeds the 0.975 standard normal quantile. The empirical probabilities of accepting the Gumbel model under different settings are reported below.
Tables 6.1 and 6.2 report the empirical probabilities of choosing the Gumbel model. When the true model is Gumbel's, the nominal probability should be 0.95. When the true model is Clayton's or Frank's, the reported probability estimates the type II error rate. Hence we hope that, when the Gumbel model is correct, the proportion of choosing Gumbel is close to 95/100, and that the power is as large as possible. From Table 6.1, we find that the type I error is a little smaller than 0.05 when τ equals 0.3. This may result from the variance estimator using the jackknife method: the jackknife algorithm tends to overestimate the variance, which lowers the type I error. When the sample size increases to 200, we see some improvement. Specifically, Table 6.2 gives more accurate type I error probabilities, and Tables 6.4 and 6.6 show better power. In Tables 6.3 and 6.4, we evaluate the type II error probabilities when the true model is the Clayton model; in Tables 6.5 and 6.6, when the true model is the Frank model. From Tables 6.3 to 6.6, we find that the power decreases as Kendall's τ decreases. This is reasonable, since all three models reduce to the independence model as Kendall's τ tends to zero. That is,
Pr(X > x, Y > y) = S_X(x) ⋅ S_Y(y). This implies that it becomes more difficult to distinguish two models when they are similar.
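The delete-one jackknife variance estimator mentioned above has the generic form σ̂² = ((n − 1)/n) Σ_i (θ̂_(i) − θ̂_(·))², where θ̂_(i) is the estimate with observation i removed. A minimal sketch (the estimator argument is a placeholder for any statistic, such as γ̂_w − γ̂):

```python
def jackknife_variance(data, estimator):
    """Delete-one jackknife variance estimate of estimator(data):
    sigma^2 = (n - 1)/n * sum_i (theta_(i) - theta_bar)^2,
    where theta_(i) is the estimate with observation i left out."""
    n = len(data)
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    theta_bar = sum(leave_one_out) / n
    return (n - 1) / n * sum((t - theta_bar) ** 2 for t in leave_one_out)
```

As a check on the formula, for the sample mean this reduces exactly to s²/n, the usual variance estimate of the mean.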
Figures 6.1 to 6.4 show the power when the true model is Clayton or Frank, with sample sizes of 100 and 200 respectively.
Table 6.1: Empirical Probabilities of Accepting the Gumbel Model with n =100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 0% Gumbel
Sample Mean -0.038 -0.029 -0.02 0.012 0.047
Sample Standard Deviation 0.88 0.959 1.01 0.993 0.99
Proportion of choosing Gumbel 99/100 97/100 96/100 96/100 95/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 20% Gumbel
Sample Mean -0.115 -0.127 -0.141 -0.133 -0.134
Sample Standard Deviation 0.893 0.968 1.036 1.018 0.987
Proportion of choosing Gumbel 98/100 93/100 95/100 97/100 96/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 50% Gumbel
Sample Mean -0.255 -0.252 -0.234 -0.207 -0.199
Sample Standard Deviation 0.882 0.933 0.941 0.886 0.838
Proportion of choosing Gumbel 97/100 96/100 95/100 96/100 99/100
Table 6.2: Empirical Probabilities of Accepting the Gumbel Model with n =200
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 0% Gumbel
Sample Mean 0.146 0.16 0.15 0.164 0.132
Sample Standard Deviation 0.986 1.033 1.022 1.006 1.001
Proportion of choosing Gumbel 97/100 95/100 92/100 93/100 93/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 20% Gumbel
Sample Mean 0.097 0.109 0.088 0.118 0.114
Sample Standard Deviation 1.031 1.087 1.054 1.021 1.009
Proportion of choosing Gumbel 95/100 94/100 94/100 93/100 96/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 50% Gumbel
Sample Mean -0.012 0.006 -0.034 -0.032 -0.032
Sample Standard Deviation 0.946 0.923 0.856 0.907 0.879
Proportion of choosing Gumbel 95/100 97/100 98/100 97/100 96/100
Table 6.3: Empirical Type II Error Probabilities of Accepting the Gumbel Model when the True Model is Clayton with n = 100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 0% Clayton
Sample Mean -2.458 -3.203 -3.721 -4.118 -4.358
Sample Standard Deviation 1.113 1.226 1.194 1.193 1.246
Proportion of choosing Gumbel 30/100 17/100 6/100 3/100 3/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 20% Clayton
Sample Mean -1.826 -2.397 -2.793 -3.113 -3.382
Sample Standard Deviation 1.034 1.108 1.108 1.13 1.236
Proportion of choosing Gumbel 55/100 39/100 24/100 10/100 12/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 50% Clayton
Sample Mean -1.031 -1.379 -1.64 -1.919 -2.137
Sample Standard Deviation 0.879 0.983 1.059 1.144 1.135
Proportion of choosing Gumbel 83/100 72/100 65/100 58/100 52/100
Table 6.4: Empirical Type II Error Probabilities of Accepting the Gumbel Model when the True Model is Clayton with n = 200
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 0% Clayton
Sample Mean -3.644 -4.76 -5.65 -6.303 -6.78
Sample Standard Deviation 1.217 1.511 1.695 1.832 1.894
Proportion of choosing Gumbel 6/100 2/100 0/100 0/100 0/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 20% Clayton
Sample Mean -2.876 -3.782 -4.501 -5.055 -5.432
Sample Standard Deviation 1.083 1.342 1.525 1.672 1.738
Proportion of choosing Gumbel 23/100 6/100 3/100 0/100 0/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 50% Clayton
Sample Mean -1.771 -2.405 -2.882 -3.314 -3.628
Sample Standard Deviation 0.951 1.218 1.392 1.549 1.579
Proportion of choosing Gumbel 58/100 31/100 28/100 24/100 15/100
Table 6.5: Empirical Type II Error Probabilities of Accepting the Gumbel Model when the True Model is Frank with n = 100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 0% Frank
Sample Mean -1.865 -2.248 -2.547 -2.807 -2.977
Sample Standard Deviation 0.959 0.949 0.944 0.936 0.964
Proportion of choosing Gumbel 52/100 39/100 22/100 17/100 13/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 20% Frank
Sample Mean -1.666 -2.047 -2.293 -2.552 -2.691
Sample Standard Deviation 0.945 0.957 0.928 0.92 0.956
Proportion of choosing Gumbel 67/100 50/100 38/100 21/100 17/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 50% Frank
Sample Mean -1.24 -1.53 -1.729 -1.961 -2.018
Sample Standard Deviation 0.923 1.03 1.029 1.056 0.999
Proportion of choosing Gumbel 78/100 67/100 60/100 58/100 53/100
Table 6.6: Empirical Type II Error Probabilities of Accepting the Gumbel Model when the True Model is Frank with n = 200
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 0% Frank
Sample Mean -2.829 -3.426 -3.882 -4.255 -4.597
Sample Standard Deviation 1.194 1.329 1.309 1.264 1.199
Proportion of choosing Gumbel 21/100 12/100 4/100 2/100 1/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 20% Frank
Sample Mean -2.77 -3.382 -3.837 -4.142 -4.363
Sample Standard Deviation 1.167 1.333 1.345 1.346 1.315
Proportion of choosing Gumbel 24/100 15/100 5/100 2/100 2/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 50% Frank
Sample Mean -2.229 -2.762 -3.139 -3.281 -3.413
Sample Standard Deviation 1.176 1.385 1.442 1.376 1.335
Proportion of choosing Gumbel 39/100 30/100 24/100 21/100 18/100
Fig.6.1: Curves of empirical power for H0: Gumbel vs. Ha: Clayton (n=100)
Fig.6.2: Curves of empirical power for H0: Gumbel vs. Ha: Frank (n=100)
Fig.6.3: Curves of empirical power for H0: Gumbel vs. Ha: Clayton (n=200)
Fig.6.4: Curves of empirical power for H0: Gumbel vs. Ha: Frank (n=200)
Fig.6.5: The local odds ratio functions at different levels of Kendall’s tau for the Gumbel model, the Clayton model and the Frank model
Chapter 7: Conclusion
In this article, we propose a test for checking whether the data follow an AC model.
In our analysis, we use the Gumbel model for illustration. To verify that the proposed test statistic is asymptotically normal, we examine its distribution by simulations, and our conjecture is confirmed. We have also found that the power of the proposed test is satisfactory.
Shih (1998) has analyzed the situation when the null hypothesis is the Clayton model while the alternative hypothesis is Gumbel’s model. In our simulations, we reverse the roles of the two models in setting the hypotheses. Our result is similar to that of Shih.
The power decreases as the censoring proportion increases. When the null hypothesis is the Gumbel model, the power is higher under the Clayton alternative than under the Frank alternative. Recall from Figure 6.5 that the Gumbel model is closer to the Frank model and less similar to the Clayton model. It is easier to distinguish two models that differ more, which results in higher power.
As for future investigation, we may try more model combinations. It may also be interesting to compare the proposed test with the test of Wang and Wells (2000) by simulations.
Appendix
Here, we prove the survival version of the theorem in Genest & Rivest. The proof can be divided into several parts.
Consider the survival AC model:
Pr(X > x, Y > y) = φ^{-1}{φ(S_X(x)) + φ(S_Y(y))},
where S_X = 1 − F_X and S_Y = 1 − F_Y are the marginal survival functions.
(i) Show that U = φ(S_X(X)) / {φ(S_X(X)) + φ(S_Y(Y))} is uniformly distributed on (0, 1).
(ii) Show that V = φ^{-1}{φ(S_X(X)) + φ(S_Y(Y))} has distribution function K(v) = v − φ(v)/φ'(v).
(iii) Show that the conditional survival function of V given U does not depend on U, so that U and V are independent.
Here, we prove the asymptotic normality of γ̂_w − γ̂. The idea is to first prove the asymptotic normality of the untransformed difference α̂_w − α̂, and then utilize the delta method to derive that of γ̂_w − γ̂: since γ̂_w − γ̂ = log α̂_w − log α̂ ≈ (α̂_w − α̂)/α under the null hypothesis, the asymptotic variance is scaled by 1/α². The estimating function S(α) is a sum over (i, j) pairs of observations, so we can utilize U-statistic theory to derive the analytic properties of S(α).
References:
CLAYTON, D. G. (1978). A model for association in bivariate life tables and its application to epidemiological studies of familial tendency in chronic disease incidence. Biometrika 65, 141-51.
DABROWSKA, D. (1988). Kaplan-Meier estimate on the plane. The Annals of Statistics 16, 1475-89.
FREES, E. W. & VALDEZ, E. A. (1998). Understanding relationships using copulas. North American Actuarial Journal 2, 1-25.
GENEST, C. & RIVEST, L.-P. (1993). Statistical inference procedures for bivariate Archimedean copulas. Journal of the American Statistical Association 88, 1034-43.
LEE, A. J. (1993). Generating random binary deviates having fixed marginal distributions and specified degrees of association. The American Statistician 47, 209-215.
OAKES, D. (1986). Semiparametric inference in a model for association in bivariate survival data. Biometrika 73, 353-61.
OAKES, D. (1989). Bivariate survival models induced by frailties. Journal of the American Statistical Association 84, 487-93.
SHIH, J. H. (1998). A goodness-of-fit test for association in a bivariate survival model. Biometrika 85, 189-200.
WANG, W. & WELLS, M. (1997). Nonparametric estimators of the bivariate survival function under simplified censoring conditions. Biometrika 84, 863-880.