
Over the last three decades, the inverse Gaussian (IG) distribution has attracted considerable attention for describing and analyzing right-skewed data. It can accommodate a variety of shapes, from highly skewed to almost normal, and data from applied fields are often positive and right-skewed, which explains its wide use in statistical applications. In 1915 Schrödinger derived the probability distribution of the first passage time in Brownian motion, but we are unaware of other references to the distribution until Tweedie (1945) proposed the name inverse Gaussian for this first passage time distribution. Wald (1947) then derived the distribution as a limiting form of the distribution of the sample size in a sequential probability ratio test. Because of this derivation, the distribution is also known as Wald's distribution, particularly in the Russian literature.

In many areas of statistical applications, handling of skewed data is by no means an exception but a fact of life. Hence if possible, it is desirable to analyze the data as observed using statistical methods based on skewed distributions. However, standard statistical methods for the normal distribution are commonly used for the data analysis.

This is primarily due to a lack of alternative methods that are readily available and easy to understand. Although the gamma, Weibull, and lognormal distributions enjoy extensive use in certain special areas, none of them allows for a wide range of statistical methods comparable to those based on the normal distribution.

Comparatively, IG can accommodate a variety of shapes from highly skewed to almost normal. See Chhikara and Folks (1989), Seshadri (1993, 1999) for more details of IG distribution analogies.

As the IG mean is inversely proportional to the drift of Brownian motion, or the growth rate in a Wiener process, comparing two IG means is of interest whenever the associated processes are to be compared. Chhikara (1975, 1989) derived UMP-unbiased tests for the equality of two inverse Gaussian population means, say $\mu_1$ and $\mu_2$, and constructed a confidence interval for the ratio of the two means under the assumption of an identical shape parameter $\lambda$. However, two IG populations do not always share the same shape parameter. Later, Tian and Wilding (2005) adopted the directed likelihood ratio and modified directed likelihood ratio methods (Barndorff-Nielsen, 1986) to provide an approximate confidence interval for $\mu_1/\mu_2$ of two IG populations.

Even so, exact inference on the ratio of two IG means when the scale parameters $\lambda_1$ and $\lambda_2$ are unknown and possibly unequal still needs to be explored. Therefore, in this thesis we propose exact inferences on $\mu_1/\mu_2$ without assuming an identical shape parameter. We develop significance tests and confidence intervals for the general case, without the assumption of equal scale parameters, based on the concepts of generalized p-values and generalized confidence intervals. These concepts were introduced by Tsui and Weerahandi (1989) and Weerahandi (1993), respectively, to solve statistical problems involving nuisance parameters. The generalized p-value and the generalized confidence interval have proved fruitful for problems where conventional frequentist procedures do not exist or are difficult to obtain (see the book by Weerahandi (1995) for a detailed discussion). The lack of exact confidence intervals in many applications can be attributed to the presence of nuisance parameters. For these reasons, we use the generalized p-value approach to construct a pivotal variable that can be used for both hypothesis testing and confidence regions.

The rest of the thesis is organized as follows. Chapter 2 reviews the properties of the inverse Gaussian distribution and the concepts of the generalized p-value and generalized confidence interval. Chapter 3 introduces, for one IG population, our procedures and the methods of Chhikara and Folks (1976) for hypothesis testing and for constructing generalized confidence intervals for $\mu$ and $\lambda$. Chapter 4 presents our procedures for hypothesis testing and for constructing generalized confidence intervals for $\mu_1/\mu_2$ and $\lambda_1/\lambda_2$ for two independent IG populations; the methods presented by Chhikara and Folks (1975) and Tian and Wilding (2005) are addressed in that chapter as well. In Chapter 5 we apply these results to four sets of data and compare our procedure with the other methods with respect to their confidence intervals and interval lengths. Three sets of simulation studies are also presented in Chapter 5 to compare the coverage probabilities, expected lengths, type I errors and powers of these methods. Concluding remarks are given in Chapter 6.

Chapter 2 Properties of IG Distribution and the Generalized Methods

In this chapter we provide some of the properties that play a significant role in the development of statistical methods for the inverse Gaussian distribution and then briefly introduce the theories of the generalized p-value and the generalized confidence interval.

2.1 Properties of Inverse Gaussian distribution

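The probability density function of a random variable $X$ distributed as inverse Gaussian with parameters $\mu$ and $\lambda$, denoted by $X \sim IG(\mu, \lambda)$, is given by

$$f(x;\mu,\lambda) = \sqrt{\frac{\lambda}{2\pi x^{3}}}\,\exp\left\{-\frac{\lambda (x-\mu)^{2}}{2\mu^{2}x}\right\}, \quad x>0,\ \mu>0,\ \lambda>0. \qquad (2.1)$$

When $\mu = 1$, the distribution is often referred to as the standard Wald's distribution.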
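The density can be checked numerically. The following is a minimal sketch, assuming the standard IG parameterization with mean $\mu$ and parameter $\lambda$; the function names `ig_pdf` and `integrate` and the values $\mu = 2$, $\lambda = 3$ are illustrative choices, not part of the thesis. It integrates the density and $x$ times the density with a simple midpoint rule, which should give values close to 1 and to $\mu$.

```python
import math

def ig_pdf(x, mu, lam):
    """Inverse Gaussian density f(x; mu, lam); zero for x <= 0."""
    if x <= 0:
        return 0.0
    return math.sqrt(lam / (2.0 * math.pi * x ** 3)) * math.exp(
        -lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x)
    )

def integrate(f, a, b, steps=200_000):
    """Composite midpoint rule; crude but dependency-free."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

mu, lam = 2.0, 3.0  # illustrative values, not taken from the thesis
total = integrate(lambda x: ig_pdf(x, mu, lam), 0.0, 200.0)     # should be near 1
mean = integrate(lambda x: x * ig_pdf(x, mu, lam), 0.0, 200.0)  # should be near mu
print(round(total, 4), round(mean, 4))
```

The upper limit 200 is an arbitrary truncation point; for these parameter values the tail mass beyond it is negligible.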

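If $X \sim IG(\mu, \lambda)$, then the characteristic function, denoted by $C_X(t)$, is given by

$$C_X(t) = \exp\left\{\frac{\lambda}{\mu}\left[1-\left(1-\frac{2i\mu^{2}t}{\lambda}\right)^{1/2}\right]\right\}.$$

Since all positive and negative moments exist, the moment generating function is

$$M_X(t) = \exp\left\{\frac{\lambda}{\mu}\left[1-\left(1-\frac{2\mu^{2}t}{\lambda}\right)^{1/2}\right]\right\}.$$

For a random sample $X_1, X_2, \dots, X_n$ from $IG(\mu, \lambda)$, the maximum likelihood estimators of $\mu$ and $\lambda^{-1}$ are $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $V = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{X_i} - \frac{1}{\bar{X}}\right)$, respectively, with

$$\bar{X} \sim IG(\mu, n\lambda) \quad\text{and}\quad n\lambda V \sim \chi^{2}_{n-1}, \qquad (2.5)$$

where $IG(\mu, n\lambda)$ is the inverse Gaussian distribution and $\chi^{2}_{n-1}$ is the chi-square distribution with $n-1$ degrees of freedom. It can be shown that $\bar{X}$ and $V$ are statistically independent. The density function (2.1) is seen to be a member of the exponential family, and $\left(\sum_{i=1}^{n} X_i, \sum_{i=1}^{n} X_i^{-1}\right)$ is a complete sufficient statistic for the inverse Gaussian distribution.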

By a simple characteristic function argument, it can be seen that if $X \sim IG(\mu, \lambda)$ then $cX \sim IG(c\mu, c\lambda)$ for $c>0$. So the family of inverse Gaussian distributions is closed under a change of scale. Because of the normal analogy, it would be natural to hope that any linear combination of inverse Gaussian variables would also be inverse Gaussian. Unfortunately, the property of reproducibility does not hold with respect to a change of location. Although this hope is not satisfied, Chhikara (1972) and Shuster and Miura (1972) have shown that under the necessary condition $\lambda_i/\mu_i^{2} = \xi$ for all $i$, inverse Gaussian variables do enjoy a certain additive property. That is, if $X_i \sim IG(\mu_i, \lambda_i)$, $i = 1, 2, \dots, n$, independently, such that $\lambda_i/\mu_i^{2} = \xi$ for all $i$, then

$$\sum_{i=1}^{n} X_i \sim IG\left(\sum_{i=1}^{n}\mu_i,\ \xi\Bigl(\sum_{i=1}^{n}\mu_i\Bigr)^{2}\right).$$

Therefore, in order for the linear combination $\sum c_i X_i$ of independent inverse Gaussian variables to be inverse Gaussian, $\lambda_i/(c_i\mu_i^{2})$ must be positive and constant, $i = 1, 2, \dots, n$. Hence the additive property of the inverse Gaussian is restricted by a required relationship between the two parameters.
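As a quick consistency check on the additive property (a worked verification added for illustration, using the standard IG moments $E(X)=\mu$ and $\mathrm{Var}(X)=\mu^{3}/\lambda$), the variance of the sum of independent variables agrees with the variance of the claimed limiting IG distribution when $\lambda_i/\mu_i^{2}=\xi$:

```latex
\begin{align*}
\operatorname{Var}(X_i) &= \frac{\mu_i^{3}}{\lambda_i}
  = \frac{\mu_i^{3}}{\xi\mu_i^{2}} = \frac{\mu_i}{\xi},
\qquad
\operatorname{Var}\Bigl(\sum_i X_i\Bigr) = \sum_i \frac{\mu_i}{\xi}
  = \frac{1}{\xi}\sum_i \mu_i, \\
\text{while for } IG\Bigl(\sum_i\mu_i,\ \xi\bigl(\textstyle\sum_i\mu_i\bigr)^{2}\Bigr):\quad
\operatorname{Var} &= \frac{\bigl(\sum_i\mu_i\bigr)^{3}}{\xi\bigl(\sum_i\mu_i\bigr)^{2}}
  = \frac{1}{\xi}\sum_i \mu_i .
\end{align*}
```

Matching the first two moments does not prove the distributional result, but it shows why the constant ratio $\lambda_i/\mu_i^{2}$ is the natural condition.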

Furthermore, it is worth noting that

$$\frac{\lambda (X-\mu)^{2}}{\mu^{2}X} \sim \chi^{2}_{1}. \qquad (2.6)$$

This useful property can be easily proved by finding the moment generating function, and we will show how to use this statistic in our generalized method in the next chapter.
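Property (2.6) can be illustrated by simulation. The sketch below is an illustration only: it draws IG variates with the transformation method of Michael, Schucany and Haas (a standard sampler, not something described in the thesis) and compares the statistic $\lambda(X-\mu)^{2}/(\mu^{2}X)$ with the chi-square distribution with 1 degree of freedom. Since that sampler is itself built on relation (2.6), the check mainly confirms the algebra; the sample mean of the draws is an independent sanity check. The names `sample_ig` and `chi_stat` and the parameter values are assumptions made here.

```python
import math
import random

def sample_ig(mu, lam, rng):
    """Draw one IG(mu, lam) variate (Michael-Schucany-Haas transformation method)."""
    y = rng.gauss(0.0, 1.0) ** 2
    x = mu + (mu * mu * y) / (2.0 * lam) - (mu / (2.0 * lam)) * math.sqrt(
        4.0 * mu * lam * y + mu * mu * y * y
    )
    # Keep the smaller root x with probability mu / (mu + x), else use mu^2 / x.
    if rng.random() <= mu / (mu + x):
        return x
    return mu * mu / x

def chi_stat(x, mu, lam):
    """The statistic lam * (x - mu)^2 / (mu^2 * x) appearing in (2.6)."""
    return lam * (x - mu) ** 2 / (mu * mu * x)

rng = random.Random(12345)
mu, lam = 2.0, 3.0  # illustrative values, not taken from the thesis
draws = [sample_ig(mu, lam, rng) for _ in range(50_000)]
stats = [chi_stat(x, mu, lam) for x in draws]

mean_x = sum(draws) / len(draws)        # should be near mu
mean_stat = sum(stats) / len(stats)     # chi-square(1) has mean 1
frac_below = sum(s <= 3.841 for s in stats) / len(stats)  # 3.841 is its 0.95 quantile
print(mean_x, mean_stat, frac_below)
```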

2.2 Generalized p-value and generalized confidence interval

The concept of the generalized p-value was first introduced by Tsui and Weerahandi (1989) to deal with statistical testing problems in which nuisance parameters are present and it is difficult or impossible to obtain a nontrivial test with a fixed level of significance. The setup is as follows. Let $X$ be a random quantity having a density function $f(X \mid \zeta)$, where $\zeta = (\theta, \eta)$ is a vector of unknown parameters, $\theta$ is the parameter of interest, and $\eta$ is a vector of nuisance parameters. Suppose we are interested in testing the null hypothesis

$$H_0: \theta \le \theta_0 \quad\text{versus}\quad H_1: \theta > \theta_0, \qquad (2.7)$$

where $\theta_0$ is a specified value.

Let $x$ denote the observed value of $X$ and consider the generalized test variable $R(X; x, \zeta)$, which depends on the observed value $x$ and the parameters $\zeta$, and satisfies the following requirements:

(i) $r_{obs} = R(x; x, \theta, \eta)$ does not depend on unknown parameters.

(ii) For fixed $x$ and $\zeta = (\theta, \eta)$, the distribution of $R(X; x, \zeta)$ is independent of the nuisance parameters $\eta$.

(iii) For fixed $x$ and $\eta$, $P\left(R(X; x, \zeta) \ge r \mid \theta\right)$ is either increasing or decreasing in $\theta$ for any given $r$. (2.8)

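Under the above conditions, if $R(X; x, \zeta)$ is stochastically increasing in $\theta$, then the generalized p-value for testing the hypothesis in (2.7) can be defined as

$$p = \sup_{\theta \le \theta_0} P\left(R(X; x, \theta, \eta) \ge r_{obs}\right) = P\left(R(X; x, \theta_0, \eta) \ge r_{obs}\right). \qquad (2.9)$$

For interval estimation, suppose a random quantity $R^{*}(X; x, \theta, \eta)$ satisfies the following two conditions:

(i) The distribution of $R^{*}(X; x, \theta, \eta)$ is free of unknown parameters.

(ii) The observed value $r^{*} = R^{*}(x; x, \theta, \eta)$ does not depend on the nuisance parameters. (2.10)

Then, we say $R^{*}(X; x, \theta, \eta)$ is a generalized pivotal quantity. If $r_1$ and $r_2$ are such that $P\left(r_1 \le R^{*}(X; x, \theta, \eta) \le r_2\right) = 1-\alpha$, then $\{\theta : r_1 \le R^{*}(x; x, \theta, \eta) \le r_2\}$ is a $100(1-\alpha)\%$ generalized confidence interval for $\theta$.

For further details and for several applications based on the generalized p-value, we refer to the book by Weerahandi (1995).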

Chapter 3 Inferences on one population of Inverse Gaussian

In this chapter, we will provide inference on parameters μ and λ of inverse Gaussian distribution based on a generalized test variable and generalized pivotal quantity. In addition, Chhikara and Folks’s (1989) method will be briefly introduced in this chapter as well.

3.1 Methods based on the generalized test variable and generalized pivotal quantity

3.1.1 Inferences on μ

Suppose $X_1, X_2, \dots, X_n$ is a random sample from $IG(\mu, \lambda)$, where $\mu$ and $\lambda$ are unknown. The sufficient statistics [...]

Consider the problem of significance testing of the hypotheses

$$H_0: \mu = \mu_0 \quad\text{versus}\quad H_1: \mu \ne \mu_0 \qquad (3.2)$$

when $\lambda$ is unknown. Since a generalized test variable can be a function of all unknown parameters, we can construct the random variable $R(\bar{X}, V; \bar{x}, v, \mu, \lambda)$ based on the independent random quantities [...] as mentioned in (2.5) and (2.6), respectively. For convenience, we define $U$ as

$$U = \frac{n\lambda(\bar{X}-\mu)^{2}}{\mu^{2}\bar{X}},$$

which has a chi-square distribution with 1 degree of freedom. Then the generalized test variable for testing (3.2) can be deduced as follows: [...] dependent on $\lambda$. Besides, for fixed $\bar{x}$, $v$ and $\lambda$, $P\left[R(\bar{X}, V; \bar{x}, v, \mu, \lambda) \ge r\right]$ is increasing in $\mu$. Therefore, $R$ satisfies the three conditions in (2.8), and $R$ is a generalized test variable which can be applied for testing the null hypothesis $H_0: \mu = \mu_0$ [...]

A generalized pivotal quantity in interval estimation can be treated as the counterpart of a generalized test variable in significance testing of hypotheses. Because the distribution of $R(\bar{X}, V; \bar{x}, v, \mu, \lambda)$ does not depend on any unknown parameters, including the nuisance parameter $\lambda$, $R$ is indeed a generalized pivotal quantity satisfying the conditions in (2.10). Therefore, we can construct the $100(1-\alpha)\%$ confidence interval for $\mu$ [...] $F$ distribution with 1 and $n-1$ degrees of freedom.

Thus [...]

3.1.2 Inferences on λ

Now consider the significance test of the hypothesis $H_0: \lambda = \lambda_0$ versus $H_1: \lambda \ne \lambda_0$ [...] the generalized test variable based on

$$W \equiv \lambda V \sim \chi^{2}_{n-1}, \qquad (3.9)$$

with [...] generalized test variable which satisfies the three conditions in (2.8). The generalized p-value for testing the null hypothesis $H_0: \lambda = \lambda_0$ versus $H_1: \lambda \ne \lambda_0$ can be [...]

On the other hand, if we are interested in constructing a confidence interval for $\lambda$, $T(V; v, \lambda)$ can be used as a generalized pivotal quantity. Because the observed value of $T(V; v, \lambda)$ is $\lambda$ and $T(V; v, \lambda)$ satisfies the two conditions in (2.10), the $100(1-\alpha)\%$ equal-tail confidence interval for $\lambda$ is

$$\left\{T(v; \alpha/2),\ T(v; 1-\alpha/2)\right\},$$

[...] the $\gamma$th quantile of the chi-square distribution with $n-1$ degrees of freedom.

3.2 Methods based on Chhikara and Folks (1989)

3.2.1 Inferences on μ

Suppose $X = (X_1, X_2, \dots, X_n)$ is from $IG(\mu, \lambda)$. The joint density function of [...] when $\lambda$ is unknown can equivalently be stated as follows: [...]

Moreover, this critical region is [...] $n-1$ degrees of freedom (Chhikara and Folks, 1976).

It is interesting to note that the critical region in (3.18) is equivalent to [...], which is the same as our result in (3.6). Thus we can conclude that our procedure is easily applicable.

On the other hand, according to Chhikara and Folks (1989), the confidence intervals for the parameter $\mu$ can be obtained by inverting the acceptance regions. Therefore, when $\lambda$ is unknown, it follows from (3.18) that the $100(1-\alpha)$ percent [...]

3.2.2 Inferences on λ

Roy and Wasan (1968) derived the UMP-unbiased test for $H_0: \lambda = \lambda_0$ [...] denotes the chi-square distribution function with $n-1$ degrees of freedom, and $C_1$ and $C_2$ are then uniquely determined using tables of the chi-square distribution. Thus, for the equal-tail test, $C_1$ and $C_2$ can be obtained by solving [...], where $\chi^{2}_{n-1;\,r}$ is the $r$th quantile of the chi-square distribution with $n-1$ degrees of freedom. Therefore the p-value is

$$p = 2\min\left\{P\left(\chi^{2}_{n-1} > \lambda_0 v\right),\ P\left(\chi^{2}_{n-1} < \lambda_0 v\right)\right\}$$

and the $100(1-\alpha)\%$ [...]

We note that these results are equivalent to our results in (3.11) and (3.12).
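The two-sided p-value and the equal-tail interval for $\lambda$ can be sketched in code. This is a minimal illustration under stated assumptions: the statistic `s` below is $\sum_i (1/x_i - 1/\bar{x})$, for which $\lambda s$ has a chi-square distribution with $n-1$ degrees of freedom, and it is assumed to play the role of $v$ in the p-value formula above. Chi-square probabilities and quantiles are approximated by Monte Carlo so that only the standard library is needed. The function names and the data are hypothetical.

```python
import random

def chi2_draws(df, size, rng):
    """Sorted Monte Carlo draws from chi-square(df), via Gamma(shape=df/2, scale=2)."""
    return sorted(rng.gammavariate(df / 2.0, 2.0) for _ in range(size))

def quantile(sorted_draws, p):
    """Empirical p-quantile of pre-sorted draws (simple order-statistic rule)."""
    idx = min(len(sorted_draws) - 1, max(0, int(p * len(sorted_draws))))
    return sorted_draws[idx]

def lambda_inference(xs, lam0, alpha=0.05, size=100_000, seed=0):
    """Two-sided p-value for H0: lam = lam0 and equal-tail interval for lam."""
    n = len(xs)
    xbar = sum(xs) / n
    s = sum(1.0 / x - 1.0 / xbar for x in xs)  # lam * s ~ chi-square(n - 1)
    draws = chi2_draws(n - 1, size, random.Random(seed))
    upper_tail = sum(d > lam0 * s for d in draws) / size
    p_value = 2.0 * min(upper_tail, 1.0 - upper_tail)
    ci = (quantile(draws, alpha / 2.0) / s, quantile(draws, 1.0 - alpha / 2.0) / s)
    return p_value, ci

xs = [0.8, 1.3, 2.1, 0.6, 1.7, 3.2, 1.1, 0.9]  # made-up data for illustration only
p_value, (lo, hi) = lambda_inference(xs, lam0=4.0)
print(p_value, lo, hi)
```

Replacing the Monte Carlo step with exact chi-square quantiles from a statistics library would remove the simulation noise; the structure of the calculation stays the same.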

Chapter 4 Inferences on two populations of Inverse Gaussian

Although the literature on the IG distribution has grown rapidly, inference on the ratio of two IG means still needs to be investigated. When the scale parameters $\lambda_1$ and $\lambda_2$ of two independent populations are the same, i.e. $\lambda_1 = \lambda_2$, the two-sided exact confidence interval for $\theta = \mu_1/\mu_2$ has been discussed by Chhikara and Folks (1989). However, it is not practical to expect two IG populations always to have an identical scale parameter. Recently, Tian and Wilding (2005) presented an approximate approach to construct the confidence interval for $\theta = \mu_1/\mu_2$ of two independent IG populations based on the modified directed likelihood ratio method (Barndorff-Nielsen, 1986). Nevertheless, exact inference on $\theta = \mu_1/\mu_2$ deserves further study. Therefore, in this chapter we provide an exact and convenient method, based on the generalized p-value and generalized confidence interval, to perform hypothesis tests and construct confidence intervals for $\theta = \mu_1/\mu_2$ and for the ratio of the two scale parameters, $\delta = \lambda_1/\lambda_2$. We also briefly introduce some methods from the literature, which are compared with our procedure in the numerical examples and simulation studies.

4.1 Methods based on the generalized test variable and generalized pivotal quantity

4.1.1 Inferences on $\mu_1/\mu_2$

Let $X_{i1}, X_{i2}, \dots, X_{in_i}$, $i = 1, 2$, be independent random samples from $IG(\mu_1, \lambda_1)$ and $IG(\mu_2, \lambda_2)$, respectively, where $\mu_i$ and $\lambda_i$ are unknown and possibly unequal, $i = 1, 2$. The independent sufficient statistics are given by [...]

Suppose we are interested in making inference on the parameter $\theta = \mu_1/\mu_2$. Consider the following hypothesis test: [...] that the generalized test variable for two IG populations parallels that for one IG population in (3.5). In fact, the generalized test variable (3.5) [...] and its observed value [...] Therefore we find another, more flexible generalized test variable $G(\bar{X}_1, \bar{X}_2, V_1, V_2; \bar{x}_1, \bar{x}_2, v_1, v_2, \mu_1, \mu_2, \lambda_1, \lambda_2) \equiv G$, which is constructed from two independent statistics $G_1(\bar{X}_1, V_1; \bar{x}_1, v_1, \mu_1, \lambda_1) \equiv G_1$ and $G_2(\bar{X}_2, V_2; \bar{x}_2, v_2, \mu_2, \lambda_2) \equiv G_2$.

We first consider the statistic $G_i$, based on the independent random quantities [...] mentioned in (2.5) and (2.6), respectively. Since $B_i \sim \chi^{2}_{n_i-1}$ and $U_i \sim \chi^{2}_{1}$, one part of the generalized test variable for testing (4.2) can be deduced as follows: [...] used as a generalized test variable in the one-population case, and the result is equivalent to what we obtained in Chapter 3.

Finally, since $G_1$ and $G_2$ are independent generalized test quantities, the generalized test variable $G$ can be defined as follows: [...] noted that the distribution of $G$ is independent of the nuisance parameters $\lambda_1$ and $\lambda_2$, and the observed value [...] (2.8), $G$ is a generalized test variable which can be applied for testing the hypothesis [...]

We next consider the problem of interval estimation for $\mu_1/\mu_2$ based on a generalized pivotal quantity. Since the observed value of $G$ is $\mu_1/\mu_2$, the parameter of interest, and the properties of $G$ fulfill the requirements in (2.10), $G$ in (4.5) is indeed a generalized pivotal quantity which can be used to construct a generalized confidence interval. Therefore the $100(1-\alpha)\%$ equal-tail confidence interval for $\mu_1/\mu_2$ can be computed by [...] necessary. Since the IG family is not closed under a change of location, it is hard to make inferences on the mean difference without any restriction. In contrast, our procedure is readily applicable and easy to use for the mean difference problem without any restriction.

4.1.2 Inferences on $\lambda_1/\lambda_2$

Inference on the parameter $\delta = \lambda_1/\lambda_2$ is also an interesting problem. Consider the hypothesis [...] employed in the one-population case can be applied to the two-population case as well.

Similarly, since the distribution of the random variable $T^{*}$ is free of nuisance parameters, the observed value [...] variable which satisfies the three conditions in (2.8). Therefore $T^{*}$ is indeed a generalized test variable and can be used to test the hypothesis in (4.8). The generalized p-value for testing (4.8) can be computed by [...]

Furthermore, in order to construct a confidence interval for $\lambda_1/\lambda_2$, [...] where $T^{*}(v_1, v_2; \gamma)$ stands for the $\gamma$th quantile of $T^{*}(V_1, V_2; v_1, v_2, \psi)$, which is defined in (4.10).

4.2 Methods based on Chhikara and Folks (1989)

4.2.1 Inferences on $\mu_1/\mu_2$

Under the restriction $\lambda_1 = \lambda_2 = \lambda$ and $\lambda_1/\mu_1^{2} = \lambda_2/\mu_2^{2} = \xi$, where $\xi$ is a constant, Chhikara and Folks (1989) derived UMP-unbiased tests by constructing the critical points of their rejection regions from percentage points of Student's t distribution.

For the size-$\alpha$ test of $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \ne \mu_2$, [...]

The test can be extended to compare the two inverse Gaussian means in terms of their ratio. This follows from the property of the density function that

$$f(x; \mu, \lambda) = \frac{1}{\mu}\, f\left(\frac{x}{\mu}; 1, \frac{\lambda}{\mu}\right).$$

It is straightforward to express the UMP-unbiased test procedures in terms of $\theta_0$ [...] obtained by inverting the acceptance regions of these tests at level $\alpha$. When $\lambda$ is unknown, the confidence interval for $\theta$ is given by [...] and [...]

For more details, we refer to the paper and the book by Chhikara and Folks (1975, 1989), respectively.

4.2.2 Inferences on $\lambda_1/\lambda_2$

[...] It is straightforward to construct a confidence interval for $\lambda_1/\lambda_2$ [...] degrees of freedom. Let [...] confidence interval for $\lambda_1/\lambda_2$ [...]

It is interesting to note that the result in (4.18) is the same as our result in (4.14). In our procedure, the pivotal quantity (4.10) of $\lambda_1/\lambda_2$ [...], the quantile points which satisfy [...] for $\lambda_1/\lambda_2$ [...]

The signed log-likelihood ratio has been discussed by many authors, including McCullagh (1982), Petersen (1981), Pierce and Schafer (1986), and Barndorff-Nielsen (1986), to obtain a statistic which is asymptotically standard normal with error of order $O(n^{-3/2})$ under repeated sampling. Tian and Wilding (2005) provided an approximate approach for constructing a confidence interval for $\mu_1/\mu_2$ based on the directed likelihood ratio method. The procedure is as follows.

Suppose the ratio of the two means is the parameter of interest, that is $\theta = \mu_1/\mu_2$, the vector of nuisance parameters is $\eta = (\mu_2, \lambda_1, \lambda_2)$, and $\zeta = (\theta, \eta)$.

Let $Y_{ij} = 1/\sqrt{X_{ij}}$, $j = 1, \dots, n_i$; $i = 1, 2$. Then $Y_{1j}$ and $Y_{2j}$ are two independent samples from $RRIG(\mu_1, \lambda_1)$ and $RRIG(\mu_2, \lambda_2)$, respectively, where RRIG means the reciprocal root IG distribution. The log-likelihood function is [...] where [...]

The maximum likelihood estimates of the parameters of (4.19) are

$$\hat{\theta} = \frac{S_1/n_1}{S_2/n_2}, \quad \hat{\mu}_2 = \frac{S_2}{n_2}, \quad \hat{\lambda}_1 = \frac{1}{T_1/n_1 - n_1/S_1} \quad\text{and}\quad \hat{\lambda}_2 = \frac{1}{T_2/n_2 - n_2/S_2}.$$

For a given value of $\theta$, the constrained maximum likelihood estimates of the nuisance parameters $\eta = (\mu_2, \lambda_1, \lambda_2)$ can be obtained by solving [...]

Chapter 5 Numerical Examples and Simulation Studies

Several IG data sets are used to compare our procedure with the other methods with respect to their confidence intervals and interval lengths. Several simulation studies are also presented to compare the performance of three methods, (1) Chhikara and Folks, (2) Tian and Wilding, and (3) the generalized approach, in terms of their coverage probabilities, expected lengths and type I errors.

5.1 Numerical examples

Example 1.

Gacula and Kubala (1975) reported certain sensory failure data for two refrigerated food products, called M and K, and studied their shelf life, which fits the IG distribution well. The summary data are given in Table 1 and the 95% confidence intervals for the three methods are presented in Table 2.

Table 1. Summary data

Product   Size   μ̂        λ̂
M         26     42.885   18.622
K         17     56.941   14.881

Table 2. 95% confidence intervals and lengths for θ = μ1/μ2

Method        θ̂       95% confidence interval   Length
Chhikara      0.771   (0.635, 0.905)            0.270
Directed      0.753   (0.562, 0.962)            0.400
Generalized   0.755   (0.554, 0.977)            0.422

Example 2.

We consider four sets of IG data presented in Folks and Chhikara (1978), who judged that the data are very well described by the inverse Gaussian distribution. The first set, data (1), gives fracture toughnesses of MIG welds. The second set, data (2), gives precipitation (inches) at Jug Bridge, Maryland. The third set, data (3), gives runoff amounts at Jug Bridge, Maryland. Additionally, Gacula and Kubala (1975) gave data (4) on the shelf life of a food product. The summary data for the four sets are shown in Table 3. To investigate the ratio of means of two independent populations when the scale parameters differ more than those in Example 1, we compare the means of these four data sets pairwise and show the results in Table 4.

Table 3. The summary data for four data sets

Data   Size   μ̂        λ̂
(1)    19     74.300   4924.070
(2)    25     2.160    8.080
(3)    25     0.800    1.440
(4)    26     42.885   484.253

Table 4. 95% confidence intervals and lengths for θ = μ1/μ2

Pair      Method        θ̂        95% confidence interval   Length
(2)/(1)   Chhikara      0.0303   (0.020, 0.040)            0.020
          Directed      0.0290   (0.024, 0.036)            0.012
          Generalized   0.0294   (0.024, 0.037)            0.014
(3)/(1)   Chhikara      0.0118   (0.006, 0.017)            0.011
          Directed      0.0108   (0.008, 0.016)            0.008
          Generalized   0.0111   (0.008, 0.016)            0.008
(4)/(1)   Chhikara      0.5857   (0.469, 0.702)            0.233
          Directed      0.5771   (0.509, 0.661)            0.152
          Generalized   0.5796   (0.505, 0.667)            0.162
(3)/(2)   Chhikara      0.4046   (0.154, 0.655)            0.501
          Directed      0.3726   (0.262, 0.558)            0.296
          Generalized   0.3812   (0.258, 0.564)            0.306
(2)/(4)   Chhikara      0.0521   (0.034, 0.070)            0.036
          Directed      0.0503   (0.040, 0.066)            0.026
          Generalized   0.0509   (0.040, 0.065)            0.025
(3)/(4)   Chhikara      0.0201   (0.011, 0.029)            0.018
          Directed      0.0187   (0.014, 0.028)            0.014
          Generalized   0.0193   (0.014, 0.028)            0.014

From Examples 1 and 2, the results show that the confidence intervals obtained by the generalized method are the shortest, or close to the shortest, regardless of the values of the scale parameters when the two IG populations are non-homogeneous. Simulation studies are also worth examining, and they are discussed in the next subsection.
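For orientation, the plug-in ratio of sample means can be recomputed directly from the summary tables. The short sketch below is illustrative only (the helper `mean_ratio` and the dictionary layout are choices made here); it uses the sample means reported in Table 1 and Table 3. The resulting values are close to the point estimates listed for the directed likelihood method, while the other two methods report slightly different point estimates that are not reproduced here.

```python
# Sample means copied from Table 1 and Table 3.
table1 = {"M": 42.885, "K": 56.941}
table3 = {1: 74.300, 2: 2.160, 3: 0.800, 4: 42.885}

def mean_ratio(mu1_hat, mu2_hat):
    """Plug-in estimate of theta = mu1 / mu2."""
    return mu1_hat / mu2_hat

# Example 1: product M relative to product K.
example1 = mean_ratio(table1["M"], table1["K"])

# Example 2: the six pairwise comparisons listed in Table 4.
pairs = [(2, 1), (3, 1), (4, 1), (3, 2), (2, 4), (3, 4)]
example2 = {(a, b): mean_ratio(table3[a], table3[b]) for a, b in pairs}

print("M/K", round(example1, 4))
for pair, value in example2.items():
    print(pair, round(value, 4))
```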

5.2 Simulation studies

Simulation studies are performed to compare the 95% coverage probabilities, expected lengths and type I errors of the three procedures for the ratio of two means, $\theta = \mu_1/\mu_2$. We choose the sample-size combinations $(n_1, n_2) = (5, 10)$, $(10, 5)$ and $(10, 10)$, and various values of the ratio of scale parameters, $\lambda_1/\lambda_2$, with 1,000 replicates for each combination. The results appear in Tables 5-9. In addition, the powers of the tests obtained by the generalized method are presented in Table 10.
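The mechanics of such a coverage study can be sketched on a simpler problem. The code below is not the thesis's simulation for $\theta$: as a hedged illustration it estimates the coverage probability and expected length of the equal-tail chi-square interval for $\lambda$ in a single IG sample, using 1,000 replicates. The sampler (Michael-Schucany-Haas transformation method), the Monte Carlo chi-square quantiles, the function names and the parameter values are all assumptions made for this sketch.

```python
import math
import random

def sample_ig(mu, lam, rng):
    """Draw one IG(mu, lam) variate (Michael-Schucany-Haas transformation method)."""
    y = rng.gauss(0.0, 1.0) ** 2
    x = mu + (mu * mu * y) / (2.0 * lam) - (mu / (2.0 * lam)) * math.sqrt(
        4.0 * mu * lam * y + mu * mu * y * y
    )
    if rng.random() <= mu / (mu + x):
        return x
    return mu * mu / x

def coverage_for_lambda(mu, lam, n, reps=1000, alpha=0.05, seed=2024):
    """Estimate coverage and mean length of the equal-tail interval for lam."""
    rng = random.Random(seed)
    # Chi-square(n - 1) quantiles, estimated once by Monte Carlo.
    ref = sorted(rng.gammavariate((n - 1) / 2.0, 2.0) for _ in range(100_000))
    q_lo = ref[int(alpha / 2.0 * len(ref))]
    q_hi = ref[int((1.0 - alpha / 2.0) * len(ref))]
    hits, total_len = 0, 0.0
    for _ in range(reps):
        xs = [sample_ig(mu, lam, rng) for _ in range(n)]
        xbar = sum(xs) / n
        s = sum(1.0 / x - 1.0 / xbar for x in xs)  # lam * s ~ chi-square(n - 1)
        lo, hi = q_lo / s, q_hi / s
        hits += lo <= lam <= hi
        total_len += hi - lo
    return hits / reps, total_len / reps

cp, avg_len = coverage_for_lambda(mu=1.0, lam=2.0, n=10)
print(cp, avg_len)
```

With 1,000 replicates the Monte Carlo standard error of an estimated coverage near 0.95 is about 0.007, which is worth keeping in mind when comparing methods whose coverages differ by a similar amount.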

Table 5. Coverage probabilities (CP) and expected lengths (length) of 95% confidence intervals of [...]

① Generalized methods  ② Chhikara and Folks (1989)  ③ Directed likelihood ratio statistic

Table 6. Coverage probabilities (CP) and expected lengths (length) of 95% confidence [...]

① Generalized methods  ② Chhikara and Folks (1989)  ③ Directed likelihood ratio statistic

Table 7. Coverage probabilities (CP) and expected lengths (length) of 95% confidence [...]

① Generalized methods  ② Chhikara and Folks (1989)  ③ Directed likelihood ratio statistic

Table 8. Type I error for testing H0: θ = θ0 versus H1: θ ≠ θ0, [...]

(n1, n2)   λ1/λ2   Generalized①   Chhikara②   Directed③
(5, 10)    0.5     0.04           0.05        0.08
[...]

① Generalized methods  ② Chhikara and Folks (1989)  ③ Directed likelihood ratio statistic

Table 9. Type I error for testing H0: θ = θ0 versus H1: θ ≠ θ0, [...]

(n1, n2)   λ1/λ2   Generalized①   Chhikara②   Directed③
(5, 10)    0.5     0.05           0.17        0.09
[...]

① Generalized methods  ② Chhikara and Folks (1989)  ③ Directed likelihood ratio statistic

Table 10. Simulated powers for testing H0: θ = 1 versus H1: θ ≠ 1, [...]

From Table 5 to Table 10, we can conclude that the coverage probabilities obtained by generalized methods are very close to the nominal level 95% and the
