Chapter 5: Data Generation Algorithms
5.2 Conditional Distribution Approach
5.2.1 Theoretical Background
The idea was proposed by Lee (1993). Given the marginal distribution of X1, if the conditional distribution of X2 | X1 is specified, then X2 can be generated. In general, Xk can be generated provided that the form of Xk | X1, X2, ..., Xk−1 is specified. The algorithm is applied successively for k = 2, ..., p.
Now we apply the above idea to the family of Archimedean copulas, whose joint distribution function takes the form

F(x1, x2, ..., xp) = C(F1(x1), F2(x2), ..., Fp(xp)) = φ^{-1}{φ(F1(x1)) + ... + φ(Fp(xp))}.

Consider the bivariate case. Generate two independent Uniform(0, 1) variables U1 and U2, and let X1 = F1^{-1}(U1). Then X2 is obtained by solving

Pr(X2 ≤ x2 | X1 = x1) = φ'(F1(x1)) / φ'(φ^{-1}{φ(F1(x1)) + φ(F2(x2))}) = U2   (5.2)

for x2, where φ' denotes the derivative of the generator. For the Gumbel generator φ(t) = (−log t)^α, the left-hand side of (5.2) involves nested logarithms and exponentials of F2(x2). Obviously, the above form does not allow an explicit solution. Hence, to solve the equation, we need to do it numerically.
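The conditional distribution approach can be sketched as follows for the bivariate Gumbel case. This is a minimal illustration, not the author's code: the function names are ours, and the conditional distribution Pr(U2 ≤ u2 | U1 = u1) = ∂C/∂u1 is solved by bisection, since it is a monotone function of u2 with no closed-form inverse.

```python
import math
import random

def gumbel_conditional_cdf(u2, u1, alpha):
    """Pr(U2 <= u2 | U1 = u1) for the Gumbel copula with generator
    phi(t) = (-log t)^alpha, i.e. the partial derivative dC(u1, u2)/du1."""
    a = (-math.log(u1)) ** alpha
    b = (-math.log(u2)) ** alpha
    s = a + b
    c = math.exp(-s ** (1.0 / alpha))            # C(u1, u2)
    return c * s ** (1.0 / alpha - 1.0) * (-math.log(u1)) ** (alpha - 1.0) / u1

def generate_gumbel_pair(alpha, rng):
    """One (U1, U2) replicate by the conditional distribution approach:
    draw U1 and W uniform, then solve Pr(U2 <= u2 | U1) = W by bisection."""
    u1, w = rng.random(), rng.random()
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(60):                          # the conditional CDF is monotone in u2
        mid = 0.5 * (lo + hi)
        if gumbel_conditional_cdf(mid, u1, alpha) < w:
            lo = mid
        else:
            hi = mid
    return u1, 0.5 * (lo + hi)
```

The failure times are then obtained through the marginal quantile transforms Xk = Fk^{-1}(Uk). For the Gumbel copula, Kendall's tau equals 1 − 1/α, which gives a quick sanity check on the generated sample.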
5.3 The Proposed Data Generation Method
The idea is based on a theorem in Genest & Rivest (1993). Briefly speaking, for (X, Y) which follow an AC model with generator φ, we can define two random variables (U, V), where

U = φ(S_X(X)) / {φ(S_X(X)) + φ(S_Y(Y))},  V = φ^{-1}{φ(S_X(X)) + φ(S_Y(Y))},

with S_X and S_Y denoting the marginal survival functions. By the theorem, U is uniformly distributed on (0, 1), V has distribution function K(v) = v − φ(v)/φ'(v), and U and V are independent. Reversing this construction, the algorithm can be stated as follows.

Step 1: Generate two independent Uniform(0, 1) variables U and W.
Step 2: Set V = K^{-1}(W).
Step 3: Set S_X(X) = φ^{-1}{U φ(V)} and S_Y(Y) = φ^{-1}{(1 − U) φ(V)}, and recover X and Y from the marginal survival functions.

Hence we have (X, Y) which follows an AC model.
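The three steps above can be sketched for a concrete generator. The following is a minimal illustration for the Clayton family, assuming the generator φ(t) = t^{−α} − 1 (function names and the choice of α are ours); here K^{-1} has no explicit form, so it is obtained by bisection using the monotonicity of K.

```python
import math
import random

ALPHA = 2.0       # Clayton parameter (illustrative); Kendall's tau = ALPHA / (ALPHA + 2)

def phi(t):       # Clayton generator phi(t) = t^(-alpha) - 1
    return t ** (-ALPHA) - 1.0

def phi_inv(s):   # inverse generator
    return (1.0 + s) ** (-1.0 / ALPHA)

def K(v):         # K(v) = v - phi(v)/phi'(v) for the Clayton generator
    return v + v * (1.0 - v ** ALPHA) / ALPHA

def K_inv(w):     # bisection inverse, valid because K is monotone on (0, 1)
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if K(mid) < w:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def generate_pair(rng):
    """Step 1: independent uniforms U, W.  Step 2: V = K^{-1}(W).
    Step 3: split phi(V) into proportions U and 1 - U and invert the generator."""
    u, w = rng.random(), rng.random()
    v = K_inv(w)
    return phi_inv(u * phi(v)), phi_inv((1.0 - u) * phi(v))
```

The output pair lives on the copula (uniform) scale; applying the marginal survival inverses yields (X, Y).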
5.4 Comparisons of the Three Approaches
For the frailty approach, to generate a random replication of (X, Y), we need to generate γ and a pair of uniform random variables. For the latter two approaches, we only need to generate a pair of uniform random variables; hence the frailty approach requires generating at least 50% more random numbers, which is a drawback. For the Clayton model, in which γ follows the Gamma distribution, the algorithm is simple. However, for a situation with an arbitrary distribution of γ, generating a random replicate of γ needs additional work. Moreover, not every AC family can be derived from a frailty model; that is, not every generator φ(⋅) can be expressed as the inverse of the Laplace transform of some random variable.
Although the idea of the conditional distribution approach is straightforward, the solution of Xk in (5.2) usually does not have a closed-form expression even in the bivariate case. Solving such a complicated equation numerically is very time consuming.
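For reference, the frailty approach discussed above is simple in the Clayton case. The following is a minimal sketch (helper names are ours) of the standard gamma-frailty construction: the frailty γ ~ Gamma(1/α, 1) has Laplace transform L(t) = (1 + t)^{−1/α}, and each coordinate is obtained as L(−log W / γ) for an independent uniform W. Note that each replicate consumes γ plus two uniforms, illustrating the extra random-number cost.

```python
import math
import random

def clayton_pair_frailty(alpha, rng):
    """Gamma-frailty (Marshall-Olkin type) construction of a Clayton pair.
    gamma ~ Gamma(shape 1/alpha, scale 1); U_i = (1 - log(W_i)/gamma)^(-1/alpha)."""
    g = rng.gammavariate(1.0 / alpha, 1.0)   # the frailty variable gamma
    w1, w2 = rng.random(), rng.random()
    u1 = (1.0 - math.log(w1) / g) ** (-1.0 / alpha)
    u2 = (1.0 - math.log(w2) / g) ** (-1.0 / alpha)
    return u1, u2
```

The resulting copula is C(u, v) = (u^{−α} + v^{−α} − 1)^{−1/α}, i.e. the Clayton model, with Kendall's tau α/(α + 2).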
The proposed method is friendlier than the previous two. In comparison with the frailty approach, we do not have to generate random numbers, namely γ, that serve only a temporary purpose. Compared with the conditional distribution approach, our method is technically easier to handle. Sometimes the inverse of K(⋅) has an explicit form; if not, we can take advantage of the monotonicity of K(⋅) and obtain its inverse by the bisection method. Despite the simplicity of the proposed method, the result of Genest and Rivest (1993) currently cannot handle higher dimensions with p > 2. This implies that we need more general theoretical results in order to extend the proposed algorithm to general multivariate situations.
In Figure 5.1, we plot data generated by the proposed algorithm. The two models appear increasingly similar as the level of tau decreases.
Fig.5.1. Simulated Data using the Proposed Data Generation Algorithm
Chapter 6: Numerical Analysis
Here we examine the performance of the proposed test by simulations. Since we expect the proposed test to be applicable to any Archimedean copula model, we use the Gumbel model for illustration.
We generate bivariate failure times following the Gumbel model, also called the positive stable frailty model. We evaluate the performance under different values of Kendall's τ equal to 0.3, 0.4, 0.5, 0.6 and 0.7. The marginal distributions of the two variables are both exponential with means equal to 1. The bivariate censoring variables are mutually independent and also follow exponential distributions, chosen so that the probability of censoring in each coordinate ranges from 0 to 0.5.
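The marginal and censoring setup above can be sketched as follows. This is our own illustration (function names are ours): a copula uniform U is mapped to X ~ Exponential(mean 1) through the quantile transform, and an independent Exponential(λ) censoring time is applied with λ calibrated so that Pr(C < X) = λ/(1 + λ) equals the target censoring probability.

```python
import math
import random

def censored_margin(u, censor_prob, rng):
    """Map a copula uniform U to X ~ Exponential(mean 1) and apply independent
    exponential censoring with Pr(censored) = censor_prob.
    Returns (observed time, failure indicator)."""
    x = -math.log(1.0 - u)                   # X = F^{-1}(U) with F(x) = 1 - e^{-x}
    if censor_prob <= 0.0:
        return x, 1                          # no censoring: failure always observed
    lam = censor_prob / (1.0 - censor_prob)  # solves lam / (1 + lam) = censor_prob
    c = -math.log(1.0 - rng.random()) / lam  # censoring time C ~ Exponential(lam)
    return min(x, c), int(x <= c)
```

For example, censoring proportions of 20% and 50% correspond to λ = 0.25 and λ = 1, respectively.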
After estimating the association parameter α, we obtain α̂ and α̂_w, and define γ̂ = log α̂ and γ̂_w = log α̂_w. We then estimate the variance of γ̂_w − γ̂ by the jackknife estimator σ̂²_Jackknife. The Gumbel model is rejected if the test statistic |γ̂_w − γ̂| / σ̂_Jackknife exceeds the 0.975 standard normal quantile. The empirical probabilities of accepting the Gumbel model under different settings are reported below.
Tables 6.1 and 6.2 report the empirical probabilities of choosing the Gumbel model. When the true model is Gumbel's, the nominal probability should be 0.95. When the true model is Clayton's or Frank's, the reported probability estimates the type II error rate. Hence we hope that, when the Gumbel model is correct, the proportion of choosing Gumbel is close to 95/100, and that the power is as large as possible. From Table 6.1, we find that the type I error is a little smaller than 0.05 when τ equals 0.3. This may result from the variance estimator using the jackknife method: the jackknife algorithm tends to overestimate the variance, which lowers the type I error. When the sample size increases to 200, we see some improvement. Specifically, Table 6.2 gives more accurate type I error probabilities, and Tables 6.4 and 6.6 show better power. In Tables 6.3 and 6.4, we evaluate the type II error probabilities when the true model is the Clayton model; in Tables 6.5 and 6.6, when the true model is the Frank model. From Tables 6.3 to 6.6, we find that the power decreases as Kendall's τ decreases. This is reasonable, since all three models reduce to the independence model as Kendall's τ tends to zero. That is,
Pr(X > x, Y > y) = S_X(x) ⋅ S_Y(y). This implies that it becomes more difficult to distinguish two models when they are similar.
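The delete-one jackknife variance estimator mentioned above has the generic form σ̂² = ((n − 1)/n) Σ_i (θ̂_(i) − θ̂_(·))², where θ̂_(i) is the estimate with observation i removed. A minimal sketch (the estimator argument is a placeholder for any statistic, such as γ̂_w − γ̂):

```python
def jackknife_variance(data, estimator):
    """Delete-one jackknife variance estimate of estimator(data):
    sigma^2 = (n - 1)/n * sum_i (theta_(i) - theta_bar)^2,
    where theta_(i) is the estimate with observation i left out."""
    n = len(data)
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    theta_bar = sum(leave_one_out) / n
    return (n - 1) / n * sum((t - theta_bar) ** 2 for t in leave_one_out)
```

As a check on the formula, for the sample mean this reduces exactly to s²/n, the usual variance estimate of the mean.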
Figures 6.1 to 6.4 show the power when the true model is Clayton or Frank, with sample sizes of 100 and 200 respectively.
Table 6.1: Empirical Probabilities of Accepting the Gumbel Model with n =100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 0% Gumbel
Sample Mean -0.038 -0.029 -0.02 0.012 0.047
Sample Standard Deviation 0.88 0.959 1.01 0.993 0.99
Proportion of choosing Gumbel 99/100 97/100 96/100 96/100 95/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 20% Gumbel
Sample Mean -0.115 -0.127 -0.141 -0.133 -0.134
Sample Standard Deviation 0.893 0.968 1.036 1.018 0.987
Proportion of choosing Gumbel 98/100 93/100 95/100 97/100 96/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 50% Gumbel
Sample Mean -0.255 -0.252 -0.234 -0.207 -0.199
Sample Standard Deviation 0.882 0.933 0.941 0.886 0.838
Proportion of choosing Gumbel 97/100 96/100 95/100 96/100 99/100
Table 6.2: Empirical Probabilities of Accepting the Gumbel Model with n =200
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 0% Gumbel
Sample Mean 0.146 0.16 0.15 0.164 0.132
Sample Standard Deviation 0.986 1.033 1.022 1.006 1.001
Proportion of choosing Gumbel 97/100 95/100 92/100 93/100 93/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 20% Gumbel
Sample Mean 0.097 0.109 0.088 0.118 0.114
Sample Standard Deviation 1.031 1.087 1.054 1.021 1.009
Proportion of choosing Gumbel 95/100 94/100 94/100 93/100 96/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 50% Gumbel
Sample Mean -0.012 0.006 -0.034 -0.032 -0.032
Sample Standard Deviation 0.946 0.923 0.856 0.907 0.879
Proportion of choosing Gumbel 95/100 97/100 98/100 97/100 96/100
Table 6.3: Empirical Type II Error Probabilities of Accepting the Gumbel Model when the True Model is Clayton with n = 100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 0% Clayton
Sample Mean -2.458 -3.203 -3.721 -4.118 -4.358
Sample Standard Deviation 1.113 1.226 1.194 1.193 1.246
Proportion of choosing Gumbel 30/100 17/100 6/100 3/100 3/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 20% Clayton
Sample Mean -1.826 -2.397 -2.793 -3.113 -3.382
Sample Standard Deviation 1.034 1.108 1.108 1.13 1.236
Proportion of choosing Gumbel 55/100 39/100 24/100 10/100 12/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 50% Clayton
Sample Mean -1.031 -1.379 -1.64 -1.919 -2.137
Sample Standard Deviation 0.879 0.983 1.059 1.144 1.135
Proportion of choosing Gumbel 83/100 72/100 65/100 58/100 52/100
Table 6.4: Empirical Type II Error Probabilities of Accepting the Gumbel Model when the True Model is Clayton with n = 200
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 0% Clayton
Sample Mean -3.644 -4.76 -5.65 -6.303 -6.78
Sample Standard Deviation 1.217 1.511 1.695 1.832 1.894
Proportion of choosing Gumbel 6/100 2/100 0/100 0/100 0/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 20% Clayton
Sample Mean -2.876 -3.782 -4.501 -5.055 -5.432
Sample Standard Deviation 1.083 1.342 1.525 1.672 1.738
Proportion of choosing Gumbel 23/100 6/100 3/100 0/100 0/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 50% Clayton
Sample Mean -1.771 -2.405 -2.882 -3.314 -3.628
Sample Standard Deviation 0.951 1.218 1.392 1.549 1.579
Proportion of choosing Gumbel 58/100 31/100 28/100 24/100 15/100
Table 6.5: Empirical Type II Error Probabilities of Accepting the Gumbel Model when the True Model is Frank with n = 100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 0% Frank
Sample Mean -1.865 -2.248 -2.547 -2.807 -2.977
Sample Standard Deviation 0.959 0.949 0.944 0.936 0.964
Proportion of choosing Gumbel 52/100 39/100 22/100 17/100 13/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 20% Frank
Sample Mean -1.666 -2.047 -2.293 -2.552 -2.691
Sample Standard Deviation 0.945 0.957 0.928 0.92 0.956
Proportion of choosing Gumbel 67/100 50/100 38/100 21/100 17/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 50% Frank
Sample Mean -1.24 -1.53 -1.729 -1.961 -2.018
Sample Standard Deviation 0.923 1.03 1.029 1.056 0.999
Proportion of choosing Gumbel 78/100 67/100 60/100 58/100 53/100
Table 6.6: Empirical Type II Error Probabilities of Accepting the Gumbel Model when the True Model is Frank with n = 200
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 0% Frank
Sample Mean -2.829 -3.426 -3.882 -4.255 -4.597
Sample Standard Deviation 1.194 1.329 1.309 1.264 1.199
Proportion of choosing Gumbel 21/100 12/100 4/100 2/100 1/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 20% Frank
Sample Mean -2.77 -3.382 -3.837 -4.142 -4.363
Sample Standard Deviation 1.167 1.333 1.345 1.346 1.315
Proportion of choosing Gumbel 24/100 15/100 5/100 2/100 2/100
tau=0.3 tau=0.4 tau=0.5 tau=0.6 tau=0.7
Censor proportion = 50% Frank
Sample Mean -2.229 -2.762 -3.139 -3.281 -3.413
Sample Standard Deviation 1.176 1.385 1.442 1.376 1.335
Proportion of choosing Gumbel 39/100 30/100 24/100 21/100 18/100
Fig.6.1: Curves of empirical power for H0: Gumbel vs. Ha: Clayton (n=100)
Fig.6.2: Curves of empirical power for H0: Gumbel vs. Ha: Frank (n=100)
Fig.6.3: Curves of empirical power for H0: Gumbel vs. Ha: Clayton (n=200)
Fig.6.4: Curves of empirical power for H0: Gumbel vs. Ha: Frank (n=200)
Fig.6.5: The local odds ratio functions at different levels of Kendall’s tau for the Gumbel model, the Clayton model and the Frank model
Chapter 7: Conclusion
In this article, we propose a test for checking whether the data follow an AC model.
In our analysis, we use the Gumbel model for illustration. To verify that the proposed test statistic is asymptotically normal, we examine its distribution by simulations, and our conjecture is confirmed. We have also found that the power of the proposed test is satisfactory.
Shih (1998) has analyzed the situation when the null hypothesis is the Clayton model while the alternative hypothesis is Gumbel’s model. In our simulations, we reverse the roles of the two models in setting the hypotheses. Our result is similar to that of Shih.
The power decreases as the censoring proportion increases. When the null hypothesis is the Gumbel model, the power is higher under the Clayton alternative than under the Frank alternative. Recall from Figure 6.5 that the Gumbel model is closer to the Frank model and less similar to the Clayton model. It is easier to distinguish two models that differ more, which results in higher power.
As for future investigation, we may try more model combinations. It may also be interesting to compare the proposed test with the test of Wang and Wells (2000) by simulations.
Appendix
Here, we prove the survival version of the theorem in Genest & Rivest. The proof can be divided into several parts.
Consider the survival AC model:
Pr(X > x, Y > y) = φ^{-1}{φ(S_X(x)) + φ(S_Y(y))},
where S_X = 1 − F_X and S_Y = 1 − F_Y are the marginal survival functions.
(i) Show that U = φ(S_X(X)) / {φ(S_X(X)) + φ(S_Y(Y))} is uniformly distributed on (0, 1).
(ii) Show that V = φ^{-1}{φ(S_X(X)) + φ(S_Y(Y))} has distribution function K(v) = v − φ(v)/φ'(v).
(iii) Show that the conditional survival function of V given U does not depend on U, so that U and V are independent.
Here, we prove the asymptotic normality of γ̂_w − γ̂. The idea is to first prove the asymptotic normality of the untransformed difference α̂_w − α̂, and then utilize the delta method to derive that of γ̂_w − γ̂: since γ̂_w − γ̂ = log α̂_w − log α̂ ≈ (α̂_w − α̂)/α under the null hypothesis, the asymptotic variance is scaled by 1/α². The estimating function S(α) is a sum over (i, j) pairs of observations, so we can utilize U-statistic theory to derive the analytic properties of S(α).
References:
CLAYTON, D. G. (1978). A model for association in bivariate life tables and its application to epidemiological studies of familial tendency in chronic disease incidence. Biometrika 65, 141-51.
DABROWSKA, D. (1988). Kaplan-Meier estimate on the plane. The Annals of Statistics 16, 1475-89.
FREES, E. W. & VALDEZ, E. A. (1998). Understanding relationships using copulas. North American Actuarial Journal 2, 1-25.
GENEST, C. & RIVEST, L.-P. (1993). Statistical inference procedures for bivariate Archimedean copulas. Journal of the American Statistical Association 88, 1034-43.
LEE, A. J. (1993). Generating random binary deviates having fixed marginal distributions and specified degrees of association. The American Statistician 47, 209-215.
OAKES, D. (1986). Semiparametric inference in a model for association in bivariate survival data. Biometrika 73, 353-61.
OAKES, D. (1989). Bivariate survival models induced by frailties. Journal of the American Statistical Association 84, 487-93.
SHIH, J. H. (1998). A goodness-of-fit test for association in a bivariate survival model. Biometrika 85, 189-200.
WANG, W. & WELLS, M. (1997). Nonparametric estimators of the bivariate survival function under simplified censoring conditions. Biometrika 84, 863-880.