

variable x could be generated with the specified parameters.

Once the thresholds, and hence the probability distribution, have been chosen, a simulated data set of a specified sample size can be generated by sampling via inversion of the cumulative mass function (cmf) of the variable x. Since the cmf is uniformly distributed over the range between 0 and 1 (Allen, et al., 1996), a random number y can be drawn from the uniform distribution on the range from 0 to 1, and x is assigned the category whose interval of cumulative probability contains y. Repeating the draw of y w times generates a variable of sample size w.

With this procedure, a simulated discrete data set with specified parameters can be generated for Monte Carlo research.
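The sampling step can be sketched in a few lines of Python. This is a minimal illustration, assuming categories are labelled 1 to k and that x is the first category whose cumulative probability exceeds y; the function name `sample_by_inversion` and the `seed` argument are hypothetical and not part of the procedure described here.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def sample_by_inversion(probs, w, seed=None):
    """Draw w category labels (1..k) by inverting the cumulative mass function."""
    rng = random.Random(seed)
    cmf = list(accumulate(probs))      # cumulative probabilities F(1), ..., F(k)
    last = len(probs) - 1              # guards against F(k) falling just below 1
    # For each uniform draw y, bisect_right locates the first category
    # whose cumulative probability exceeds y.
    return [min(bisect_right(cmf, rng.random()), last) + 1 for _ in range(w)]

probs = [0.05, 0.05, 0.05, 0.10, 0.75]
sample = sample_by_inversion(probs, w=10_000, seed=1)
share_of_5 = sample.count(5) / len(sample)
```

With a large w, the observed share of each category should be close to its specified probability.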

Section 4 Two Difficulties of the Previous Procedure in Simulating Non-normal Approximated Discrete Data

The procedure introduced in the previous section is the most widely used procedure for simulating non-normal approximated data. Since deciding the thresholds is equivalent to deciding the probabilities, and it is easier to picture a probability distribution and the

國立政治大學 National Chengchi University

corresponding parameters in terms of the probability of each category, the probabilities are used in the following discussion.

As the previous discussion shows, deciding the threshold values, and hence the probability distribution, is the critical step in generating a discrete variable with specified parameters. However, producing a probability distribution with specified parameters by choosing these values is much harder than it appears, and two difficulties are encountered in the process.

First, the probability of each category is hard to choose so that the parameters take precise values. The probability of each category must be chosen carefully to obtain a discrete probability distribution with the specified parameter values, because any slight change in a probability changes the values of the parameters. Take the most frequently specified parameters, skewness and kurtosis, as an example. If the effect of skewness and kurtosis on a statistic is studied through the Monte Carlo method, two discrete probability distributions need to be set. Since there is an inequality relating skewness and kurtosis (K. Pearson, 1916), the values of kurtosis are restricted by the values of skewness, so the two discrete probability distributions would be a skewed and kurtotic one and a non-skewed and kurtotic one. The skewed one is decided first and its kurtosis is calculated; the skewness and kurtosis of the non-skewed one are then set accordingly. Suppose the probabilities of the skewed distribution are set to .05, .05, .05, .10, and .75, so that its skewness is -2.028 and its kurtosis is 2.898. As described above, the parameters of the non-skewed distribution are therefore set to 0 and 2.898. Since kurtosis relates to the height of a distribution, a reasonable guess for the probabilities of the non-skewed distribution is .05, .075, .75, .075, and .05. The skewness of this distribution is 0, but its kurtosis is 2.785 and thus differs from the target value. Therefore, the separate effects of skewness and kurtosis on the statistic remain in question, because both the skewness and the kurtosis differ between the two distributions. This is exactly the case in Muthén and Kaplan's (1985) robustness study, in which they indicated that the effect of the two parameters still needed further research.
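The figures in this example can be checked with a short calculation. The sketch below assumes the five categories are scored 1 to 5 and that kurtosis means excess kurtosis (the fourth standardized moment minus 3), which is the convention under which the quoted values are reproduced; the function name `skewness_and_kurtosis` is hypothetical.

```python
def skewness_and_kurtosis(probs):
    """Skewness and excess kurtosis of categories scored 1..k with the given probabilities."""
    scores = range(1, len(probs) + 1)
    mean = sum(x * p for x, p in zip(scores, probs))

    def central_moment(r):
        return sum((x - mean) ** r * p for x, p in zip(scores, probs))

    variance = central_moment(2)
    skewness = central_moment(3) / variance ** 1.5
    kurtosis = central_moment(4) / variance ** 2 - 3   # excess kurtosis
    return skewness, kurtosis

skewed = skewness_and_kurtosis([0.05, 0.05, 0.05, 0.10, 0.75])
guess = skewness_and_kurtosis([0.05, 0.075, 0.75, 0.075, 0.05])
```

Here `skewed` rounds to (-2.028, 2.898) and `guess` to (0, 2.785), so the guessed symmetric distribution misses the target kurtosis by about 0.11.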

This example shows that choosing a discrete probability distribution with precisely specified parameters is quite difficult. Improperly chosen probabilities result in imprecise parameters of the generated variables, which in turn leads to confounded results and limits their application. Therefore, a procedure that determines the set of probabilities yielding the specified parameters precisely is necessary and useful for robustness research on the effects of the parameters.

Second, there are infinitely many discrete probability distributions with the same parameters, and one of them needs to be selected for robustness research. For a discrete probability distribution with k categories, k-1 probabilities need to be decided. The formulas of the m specified parameters can be expressed as m constraints to be satisfied when deciding the k-1 probabilities. This is equivalent to deciding the k probabilities under m+1 constraints, where $\sum_{i=1}^{k} p_i = 1$ is counted as one of the constraints. With k categories and m+1 constraints, a unique solution for the probabilities is obtained if k=m+1. When k<m+1, a unique solution is still obtained if the information is redundant, but in most cases no solution exists. When k>m+1, a common occurrence in simulation research, there are infinitely many solutions. In this situation, the main problem is which solution to choose. Therefore, a procedure is needed for choosing a proper probability distribution.
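A small case makes the k>m+1 situation concrete. The sketch below assumes k=3 categories scored 1, 2, 3 and a single specified parameter, the mean (m=1), so there are m+1=2 constraints for 3 probabilities; the family (t, 1-2t, t) is an illustrative choice, not one taken from the text.

```python
def mean(probs):
    """Mean of categories scored 1..k with the given probabilities."""
    return sum(x * p for x, p in enumerate(probs, start=1))

# Every member of this one-parameter family sums to 1 and has mean 2,
# so both constraints are met for any t between 0 and 0.5.
family = [(t, 1 - 2 * t, t) for t in (0.1, 0.2, 0.3, 0.4)]
means = [mean(p) for p in family]
totals = [sum(p) for p in family]
```

Since t can take any value in the interval, the two constraints alone do not single out one distribution.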

The discussion above shows that two difficulties are encountered in conducting Monte Carlo robustness research on the effects of parameters. One is to determine a discrete probability distribution with the specified parameters precisely, and the other is to choose one discrete probability distribution from the infinitely many with the same parameters. To overcome these difficulties, a procedure is needed that can determine the discrete probability distribution with the specified parameters precisely and choose the one with attractive characteristics.

Section 5 the Research Proposal

The previous discussion shows that, when the Monte Carlo method is applied to robustness research on the impact of discrete data with specified parameters, not only is the process of deciding the probabilities hard, but in most cases infinitely many probability distributions could be chosen. Therefore, a procedure is needed that overcomes these difficulties: one that determines the probability distribution with the specified parameters precisely and chooses, from the infinitely many candidates, a discrete probability distribution with reasonable characteristics.

A procedure called the Maximum Entropy Procedure (MEP) is introduced to overcome these difficulties. The MEP estimates a univariate discrete probability distribution with specified parameters precisely, and the distribution it estimates has reasonable and attractive characteristics when k>m+1, which makes it a worthwhile procedure for simulating data for robustness research. The definition of this procedure and the rationale for choosing the discrete probability distributions are discussed in the next chapter.


Chapter 2 the Characteristics of the Maximum Entropy Procedure (MEP)

The MEP is proposed to estimate a discrete probability distribution precisely from the infinite set of solutions satisfying the constraints when the number of constraints m+1 is smaller than the number of categories k. The definition of the MEP and the characteristics of the chosen distributions are discussed in this chapter.

Section 1 the Definition of the Maximum Entropy Procedure

The Maximum Entropy Procedure (MEP) is a univariate procedure that estimates the discrete probability distribution with the specified parameters precisely, based on the maximum entropy principle. The definition of information entropy is introduced first, since it is central to the maximum entropy principle and thus to the MEP.

In 1948, Shannon defined the information entropy function to measure the uncertainty of a probability distribution. For a discrete probability function, the information entropy is defined as:

$$H(p_1,\ldots,p_k) = -\sum_{i=1}^{k} p_i \ln p_i, \tag{4}$$

where $p_1,\ldots,p_k$ are the probabilities of the k ordered categories of a variable. When calculating the information entropy, the term $p_i \ln p_i$ is defined as 0 if $p_i$ is 0. The lower bound of the information entropy is 0, and the upper bound depends on the number of categories of the variable. The information entropy is 0 when the probability of one category is 1 and the others are 0, which means there is no uncertainty. The larger the information entropy, the higher the uncertainty of the probability distribution. The maximum entropy principle chooses the probability distribution with the maximum information entropy among those satisfying the specified constraints. The chosen probability distribution is called the maximum entropy distribution. The MEP estimates the maximum entropy distribution with the specified parameters.
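The entropy function and its bounds can be illustrated with a short sketch. It assumes natural logarithms, as in formula (4), and uses the standard result that the upper bound for k categories is ln k, reached by the discrete uniform distribution; the variable names are illustrative.

```python
import math

def entropy(probs):
    """Shannon information entropy in nats, treating 0 * ln 0 as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

k = 5
h_certain = entropy([0, 0, 1, 0, 0])                 # one category has probability 1
h_uniform = entropy([1 / k] * k)                     # all categories equally likely
h_skewed = entropy([0.05, 0.05, 0.05, 0.10, 0.75])   # lies between the two bounds
```

The degenerate distribution gives entropy 0, the uniform one gives ln 5, and any other 5-category distribution falls in between.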

Section 2 the Rationale of Choosing the Maximum Entropy Distributions

When robustness research on the impact of non-normality in discrete variables is conducted, the only information manipulated and discussed is the number of categories and the specified parameters of the variables, such as the values of skewness and kurtosis (e.g. Muthén & Kaplan, 1985; Muthén & Kaplan, 1992; Olsson, 1979; Ory & Mokhtarian, 2010). If this information is all we know, the chosen discrete probability distribution should be the one determined by this information alone. Given this information, the maximum entropy distribution is a worthwhile choice because of the following characteristics.

Uncertainty exists; it is reduced by the information we know and should be reduced only when more information becomes known. If the known information does not fully determine a distribution, choosing the probability distribution with the most uncertainty is prudent, since the remaining uncertainty is due to information that is not available. We have no certainty about that information and should not assume it. We should be maximally uncertain about what we do not know and choose the distribution with the most uncertainty. Therefore, the maximum entropy distribution chosen through the maximum entropy principle, which retains the most uncertainty, is the most prudent and honest choice (Golan, Judge, & Miller, 1996; Jaynes, 1982; Kapur & Kesavan, 1992; Kesavan & Kapur, 1989).

In addition, the maximum entropy distribution is the one that appears most frequently among the probability distributions satisfying the same information. Although infinitely many probability distributions are consistent with the specified parameters, a probability distribution that appears seldom is apparently not the one we have in mind and thus should not be the choice. Intuitively, the more often a probability distribution appears, the more representative it is of the probability distributions satisfying the constraints of the specified parameters. Therefore, the probability distribution that appears most often is a reasonable choice. The probability distribution that appears most frequently under the same specified constraints is the maximum entropy distribution (Golan, et al., 1996; Jaynes, 1982; Wu, 1997; Zellner & Highfield, 1988), and thus it is a reasonable choice given the known information. This can be demonstrated as follows. Let a discrete probability distribution of k categories consist of N trials (in the limit N→∞), and let the numbers of trials falling in each category be $n_1, n_2, \ldots, n_k$. As a constraint,

$$\sum_{i=1}^{k} n_i = N. \tag{5}$$

There are $k^N$ possible sequences of outcomes. The number of sequences that give rise to a particular discrete probability distribution $p_i = n_i/N$ is

$$W = \frac{N!}{n_1!\, n_2! \cdots n_k!}. \tag{6}$$

Therefore, the discrete probability distribution that maximizes W is the one most frequently seen. We can solve the problem by maximizing a monotonically increasing function of W as follows:

$$\lim_{N\to\infty} \frac{1}{N} \ln W = -\sum_{i=1}^{k} p_i \ln p_i. \tag{7}$$

It can be seen that formula (7) is the same as formula (4). Therefore, the maximum entropy distribution, which maximizes formula (4), is the one that can be realized in the greatest number of ways among the probability distributions satisfying the same specified constraints (Golan, et al., 1996; Wu, 1997; Zellner & Highfield, 1988).
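The link between the multiplicity W and the information entropy can be checked numerically. The sketch below assumes the standard multinomial count W = N!/(n_1! ... n_k!) and evaluates ln W through the log-gamma function to avoid overflow; the example probabilities and the function names are illustrative.

```python
import math

def log_multiplicity(counts):
    """ln W for W = N! / (n_1! ... n_k!), using ln(n!) = lgamma(n + 1)."""
    n_total = sum(counts)
    return math.lgamma(n_total + 1) - sum(math.lgamma(n + 1) for n in counts)

def entropy(probs):
    """Shannon information entropy in nats, treating 0 * ln 0 as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

probs = [0.05, 0.075, 0.75, 0.075, 0.05]
gaps = []
for n_total in (1_000, 100_000, 10_000_000):
    counts = [round(p * n_total) for p in probs]
    # Difference between the entropy and (1/N) ln W for this N
    gaps.append(entropy(probs) - log_multiplicity(counts) / n_total)
```

The gap shrinks toward 0 as N grows, which is why maximizing W and maximizing the entropy select the same distribution in the limit.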

In addition, according to the Entropy Concentration Theorem (Jaynes, 1982), any probability distribution other than the maximum entropy distribution appears far less often and is thus highly atypical. The theorem proves that, asymptotically, a fraction F of the probability distributions satisfying the specified constraints yield values of information entropy in the range

$$H_{\max} - \Delta H \le H(p_1, p_2, \ldots, p_k) \le H_{\max}, \tag{8}$$

where $H_{\max}$ is the entropy of the maximum entropy distribution and

$$\Delta H = (2N)^{-1} \chi^2_{k-m-1}(1-F), \tag{9}$$

where N is the number of trials and $\chi^2_{k-m-1}(1-F)$ is the chi-squared value for k-m-1 degrees of freedom at the upper tail area (1-F). For example, when the mean of a 6-category probability distribution of 1000 trials is specified to be 4.5, the information entropy of the maximum entropy distribution is $H_{\max} = 1.61358$. Applying the concentration theorem, we have 6-1-1=4 degrees of freedom; 95% of all probability distributions satisfying the constraint have information entropy in a range of width $\Delta H = (2N)^{-1} \chi^2_{k-m-1}(0.05) = 0.00474$. Thus 95% of the probability distributions satisfying the constraint have information entropy in the range $1.609 \le H \le 1.614$. The possible probability distributions are therefore concentrated strongly near the one of maximum entropy. Moreover, the range of entropy becomes narrower as N becomes larger. As N→∞, any probability distribution other than the one of maximum entropy becomes highly atypical of those satisfying the specified constraints (Jaynes, 1982; Zellner & Highfield, 1988).
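The numbers in this example can be reproduced with a short sketch. It assumes the six categories are scored 1 to 6, uses the fact that the maximum entropy distribution under a mean constraint has probabilities proportional to exp(λi), and uses the closed form exp(-x/2)(1 + x/2) for the upper tail of a chi-squared variable with 4 degrees of freedom; the function names are hypothetical.

```python
import math

def maxent_given_mean(k, target_mean, tol=1e-12):
    """Maximum entropy probabilities on scores 1..k with a fixed mean (bisection on lam)."""
    def probs(lam):
        weights = [math.exp(lam * i) for i in range(1, k + 1)]
        total = sum(weights)
        return [w / total for w in weights]

    def mean(lam):
        return sum(i * p for i, p in enumerate(probs(lam), start=1))

    lo, hi = -20.0, 20.0               # the mean increases with lam
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return probs((lo + hi) / 2)

def chi2_upper_quantile_4df(tail):
    """Chi-squared value with 4 degrees of freedom for a given upper tail area."""
    lo, hi = 0.0, 100.0
    while hi - lo > 1e-10:
        mid = (lo + hi) / 2
        if math.exp(-mid / 2) * (1 + mid / 2) > tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n_trials = 1000
p_max = maxent_given_mean(k=6, target_mean=4.5)
h_max = -sum(p * math.log(p) for p in p_max)
delta_h = chi2_upper_quantile_4df(0.05) / (2 * n_trials)
```

Here `h_max` is about 1.6136 and `delta_h` about 0.00474, matching the interval quoted above; raising `n_trials` narrows the interval in proportion to 1/N.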


Since what we know when simulating random data is not the real probability distribution of the variable but only the constraints, the maximum entropy distribution, which retains the most uncertainty, is the most prudent choice. Moreover, it is the most frequently seen of the infinitely many probability distributions satisfying the same specified constraints and is thus the most typical and representative. In contrast, the Entropy Concentration Theorem proves that probability distributions other than the maximum entropy distribution are highly atypical. Because of these properties of maximum entropy distributions, many researchers (Golan, et al., 1996; Jaynes, 1957a, 1957b, 1982; Kapur & Kesavan, 1992; Theil & Fiebig, 1984; Wu, 1997) have suggested using the maximum entropy principle to choose an unknown distribution under known constraints.

In addition to these properties, the shape of the maximum entropy distribution also makes it a favorable choice for simulating the discrete probability distributions generated by categorized measures. It is believed that a discrete probability distribution generated by well-designed categorized measures in the real world should be neither lumpy nor have an empty category, since the underlying distribution is continuous (Micceri, 1989). The nature of the discrete maximum entropy distribution fits these descriptions, as discussed below.

In a discrete maximum entropy distribution, the information entropy is maximized while the specified constraints are fulfilled. With no constraints, the information entropy achieves its maximum when every $p_i$ is the same, which gives the discrete uniform distribution. When constraints are specified, each $p_i$ changes while the information entropy is still maximized. Each $p_i$ therefore moves to the value closest to that of a discrete uniform distribution that satisfies the specified constraints. Because of this, the discrete maximum entropy distribution is smoother than any other distribution satisfying the same constraints (Jaynes, 1982).

In addition, because an empty category does not raise the information entropy, the probability distribution with the maximum entropy is the one with the fewest empty categories. Therefore, unless the constraints cannot be satisfied without an empty category, the maximum entropy distribution has no empty category (Jaynes, 1957a).
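Both properties can be seen in a toy case. The sketch below assumes three categories scored 1, 2, 3 with the mean fixed at 2, and searches the family (t, 1-2t, t) on a grid; the family and the grid are illustrative choices, not part of the procedure described here.

```python
import math

def entropy(probs):
    """Shannon information entropy in nats, treating 0 * ln 0 as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Every (t, 1 - 2t, t) has mean 2; t = 0.5 empties the middle category.
candidates = [(i / 300, 1 - 2 * i / 300, i / 300) for i in range(151)]
best = max(candidates, key=entropy)   # member of the family with the largest entropy
lumpy = (0.5, 0.0, 0.5)               # satisfies the same constraint with an empty category
```

The search returns the uniform distribution (1/3, 1/3, 1/3), while the member with an empty middle category has a lower entropy of ln 2.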

The smoothness and the absence of empty categories in maximum entropy distributions match what is expected of discrete probability distributions generated by well-designed categorized measures. Therefore, the discrete maximum entropy distribution is a suitable choice for simulating the discrete probability distributions found in psychological research.

In addition, the set of maximum entropy distributions under different constraints includes some commonly seen distributions, such as the continuous and discrete uniform distributions, the beta distribution, the exponential distribution, the gamma distribution, the Poisson distribution, and the normal distribution. All of these are members of the maximum entropy family subject to different constraints (Theil & Fiebig, 1984). This suggests that the maximum entropy principle might be a general principle for probability distributions.

To illustrate the characteristics of the maximum entropy distribution, the 5-category discrete probability distributions in Muthén and Kaplan's research (Muthén & Kaplan, 1985) and the discrete maximum entropy distributions with the same skewness and kurtosis are presented in Figure 1 to Figure 5. In addition, the maximum entropy probability distribution with skewness 0 and kurtosis 2.898 is presented in Figure 6. These figures show that the maximum entropy distributions are smoother than Muthén and Kaplan's distributions.

The discussion above shows that discrete maximum entropy distributions are typical, smooth, and have no empty category. These properties make them a proper choice from among the infinitely many probability distributions satisfying the specified constraints. Therefore, the MEP, which chooses the discrete maximum entropy probability distribution under the specified constraints, is a reasonable and attractive procedure for generating data with specified parameters for Monte Carlo robustness research.

Section 3 the Solution of the Maximum Entropy Procedure

Since the MEP is a reasonable and suitable procedure for choosing discrete probability distributions with specified constraints for psychological data, the next problem in implementing the MEP is how to obtain the maximum entropy distribution. Traditionally, the method of Lagrange multipliers is used to solve this problem (Jaynes, 1957b; Kapur & Kesavan, 1992; Kesavan & Kapur, 1989). The method of Lagrange multipliers provides a strategy for finding the maximum or minimum of a function subject to constraints and is thus suitable for this problem. The Lagrange function is formed from the information entropy of the discrete probability distribution and the specified constraints. One of the constraints is $\sum_{i=1}^{k} p_i = 1$, and thus the total number of constraints is m+1. By setting the partial derivatives of the Lagrange function with respect to each of its variables equal to 0, the solutions $p_1$ to $p_k$ satisfying the m+1 constraints can be obtained. After the probabilities $p_1$ to $p_k$ are obtained, data from the discrete maximum entropy probability distribution with the specified parameters can be generated by sampling via inversion of the cumulative mass function (cmf), as discussed in the previous chapter.
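A minimal sketch of such a Lagrange function and its stationarity conditions is given below. It assumes that each of the m parameter constraints can be written in the linear form $\sum_i p_i\, g_j(x_i) = a_j$; the category scores $x_i$, the functions $g_j$, the targets $a_j$, and the multipliers $\lambda_0, \ldots, \lambda_m$ are notation introduced here for illustration and are not taken from the text.

```latex
\begin{aligned}
L(p_1,\ldots,p_k,\lambda_0,\ldots,\lambda_m)
  &= -\sum_{i=1}^{k} p_i \ln p_i
     - \lambda_0 \Big( \sum_{i=1}^{k} p_i - 1 \Big)
     - \sum_{j=1}^{m} \lambda_j \Big( \sum_{i=1}^{k} p_i\, g_j(x_i) - a_j \Big), \\
\frac{\partial L}{\partial p_i}
  &= -\ln p_i - 1 - \lambda_0 - \sum_{j=1}^{m} \lambda_j\, g_j(x_i) = 0
  \;\Longrightarrow\;
  p_i = \exp\Big( -1 - \lambda_0 - \sum_{j=1}^{m} \lambda_j\, g_j(x_i) \Big), \\
\frac{\partial L}{\partial \lambda_0} &= 0
  \;\Longrightarrow\; \sum_{i=1}^{k} p_i = 1,
\qquad
\frac{\partial L}{\partial \lambda_j} = 0
  \;\Longrightarrow\; \sum_{i=1}^{k} p_i\, g_j(x_i) = a_j .
\end{aligned}
```

Under this assumption the constraints are linear in the probabilities, as happens when the mean, variance, skewness, and kurtosis are all fixed and can be restated as the first four raw moments. Constraints on skewness and kurtosis alone are not linear in $p_i$, so this form would need to be adapted for that case.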

Section 4 the Maximum Entropy Procedures Proposed in this Research

The discussion in the previous sections shows that the properties of discrete maximum entropy distributions make the MEP a reasonable and suitable procedure for choosing a discrete probability distribution when simulating data with specified parameters for Monte Carlo research. The solution of the maximum entropy distribution presented above also makes the implementation of the MEP practical. Therefore, the MEP is proposed for choosing the discrete probability distribution with the specified parameters. Two sets of parameters are chosen for implementing the MEP in this research: the set of 4 parameters (mean, variance, skewness, and kurtosis) and the set of 2 parameters (skewness and kurtosis).

The set of 4 parameters is chosen because most real-world distributions are characterized by mean, variance, skewness, and kurtosis, as noted by Fleishman (1978). In addition, the mean and variance of discrete data cannot simply be transformed in the way these parameters of continuous data can. With different means and variances, the shapes of discrete probability distributions differ even when the skewness and kurtosis are the same. With the 4 parameters specified, the resulting discrete probability distributions are less uncertain since more information is available. Therefore, the Maximum Entropy Procedure with 4 parameters (MEP-4) is proposed in this research for Monte Carlo research in which all four parameters are known.

The set of 2 parameters is chosen because the degree of non-normality is most often evaluated only by the skewness and kurtosis values (Muthén & Kaplan, 1985). Because the mean and variance of a normal distribution can be set arbitrarily, most researchers have manipulated only the skewness and kurtosis in their robustness studies (e.g., Curran et al., 1996; Hampel, 1973; Lei & Lomax, 2005). Therefore, to generate variables for robustness research on non-normality, the Maximum Entropy Procedure with these 2 parameters specified (MEP-2) is proposed.

In this research, the MEP is proposed for generating discrete data with specified parameters for robustness research. The sets of specified parameters are the set of mean, variance, skewness, and kurtosis and the set of skewness and kurtosis. The details of these procedures and the statistical program used to implement
