variable x could be generated with the specified parameters.
After the thresholds, and hence the probability distribution, have been chosen, a simulated data set of the specified sample size can be generated by sampling via inversion of the cumulative mass function (cmf) of the variable x. Since the cmf takes values over the range from 0 to 1 (Allen, et al., 1996), a random number y can be drawn from the uniform distribution over the same range and the value of x decided as follows,
$$x = i \quad \text{if} \quad F_{i-1} \le y < F_i, \qquad F_i = \sum_{j=1}^{i} p_j, \quad F_0 = 0,$$
where $p_j$ is the probability of category j. A variable of sample size w is obtained by drawing y w times.
With this procedure, a simulated discrete data set with the specified parameters can be generated for Monte Carlo studies.
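The inversion step can be sketched in Python as follows. This is a minimal illustration, assuming the categories are labelled 1 to k and using the standard library's `random.random()` as the uniform source on [0, 1); the function name `sample_by_inverse_cmf` is hypothetical.

```python
import bisect
import itertools
import random

def sample_by_inverse_cmf(probs, w, seed=None):
    """Draw w category labels 1..k by inverting the cumulative mass function."""
    cmf = list(itertools.accumulate(probs))   # F_1, ..., F_k
    cmf[-1] = 1.0                             # guard against round-off in F_k
    rng = random.Random(seed)
    sample = []
    for _ in range(w):
        y = rng.random()                      # y ~ Uniform[0, 1)
        i = bisect.bisect_right(cmf, y)       # index of the first F_i with F_i > y
        sample.append(i + 1)                  # so F_{i-1} <= y < F_i; labels start at 1
    return sample

# Usage: 1000 draws from a hypothetical 5-category distribution.
sample = sample_by_inverse_cmf([.05, .05, .05, .10, .75], w=1000, seed=1)
```

Because `bisect_right` skips over repeated cumulative values, a category with probability 0 can never be drawn.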
Section 4 the Difficulties of the Previous Procedure in Simulating Non-normal Approximated Discrete Data
The procedure introduced in the previous section is the most widely used procedure for simulating non-normal approximated data. Since deciding the thresholds is equivalent to deciding the probabilities, and it is easier to picture a probability distribution and its parameters through the probability of each category, the following discussion is framed in terms of probabilities.
國立政治大學 National Chengchi University
The discussion above shows that deciding the threshold values, and thus the probability distribution, is the critical step in generating a discrete variable with specified parameters. However, obtaining a probability distribution with specified parameters by choosing these values by hand is much harder than it may seem, and two difficulties arise in the process.
First, it is hard to choose the probability of each category so that the parameters take precise values. As discussed above, the probability of each category must be decided carefully to obtain a discrete probability distribution with the specified parameter values, because any slight change in one probability changes the values of the parameters. Take the two most frequently specified parameters, skewness and kurtosis, as an example. If the effects of skewness and kurtosis on a statistic are studied with the Monte Carlo method, two discrete probability distributions need to be set. Because of the inequality between skewness and kurtosis (K. Pearson, 1916), the values of kurtosis are restricted by the values of skewness, so the two discrete probability distributions would be a skewed, kurtotic one and a non-skewed, kurtotic one. The skewed one is decided first and its kurtosis is calculated; the skewness and kurtosis of the non-skewed one are then set accordingly. The probabilities of the skewed discrete distribution are set as .05, .05, .05, .10, and .75, giving a skewness of -2.028 and a kurtosis (excess kurtosis, i.e., the fourth standardized moment minus 3) of 2.898. As described above, the skewness and kurtosis of the non-skewed distribution are therefore set to 0 and 2.898. Since kurtosis is related to the peakedness of a distribution, a reasonable guess for the probabilities of the non-skewed distribution is .05, .075, .75, .075, and .05. The skewness of this distribution is 0, but its kurtosis is 2.785, which differs from the target value of 2.898. Because the two distributions then differ in both skewness and kurtosis, the effects of skewness and kurtosis on the statistic remain confounded. This is exactly the case in Muthén and Kaplan's (1985) robustness study, in which they indicated that the effects of the two parameters still needed further research.
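The two parameter values quoted for each distribution in this example can be verified directly. The sketch below assumes the five categories are scored 1 to 5 and that kurtosis is reported as excess kurtosis; under those assumptions it reproduces the values in the text. The helper name is hypothetical.

```python
def skewness_and_excess_kurtosis(probs, values=None):
    """Population skewness and excess kurtosis of a discrete distribution."""
    if values is None:
        values = range(1, len(probs) + 1)      # assumption: categories scored 1..k
    values = list(values)
    mean = sum(p * x for p, x in zip(probs, values))
    m2 = sum(p * (x - mean) ** 2 for p, x in zip(probs, values))
    m3 = sum(p * (x - mean) ** 3 for p, x in zip(probs, values))
    m4 = sum(p * (x - mean) ** 4 for p, x in zip(probs, values))
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

skewed = [.05, .05, .05, .10, .75]             # approx. (-2.028, 2.898)
guessed = [.05, .075, .75, .075, .05]          # approx. (0, 2.785): kurtosis misses 2.898
print(skewness_and_excess_kurtosis(skewed))
print(skewness_and_excess_kurtosis(guessed))
```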
This example shows that deciding the discrete probability distribution with specified parameters precisely is quite difficult. Poorly chosen probabilities can result in imprecise parameters of the generated variables, which in turn lead to confounded results and limit the conclusions that can be drawn. Therefore, a procedure that decides the set of probabilities so as to obtain the specified parameters precisely is necessary and useful for robustness research on the effects of the parameters.
Second, there are infinitely many discrete probability distributions with the same parameters, and one of them must be selected for robustness research. For a discrete probability distribution with k categories, k-1 probabilities need to be decided. The formulas of the m specified parameters can be expressed as m constraints that must be satisfied when deciding the k-1 probabilities. Equivalently, the k probabilities are decided under m+1 constraints, where
$$\sum_{i=1}^{k} p_i = 1$$
is also counted as one of the constraints. With k categories and m+1 constraints, a unique solution for the probabilities is obtained if k=m+1. A unique solution may also be obtained when k<m+1 if the information is redundant, but under most such conditions no solution exists. When k>m+1, a common situation in many simulation studies, there are infinitely many solutions. In this situation, the main problem is which solution to choose. Therefore, a procedure is needed for choosing a proper probability distribution.
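The counting argument can be illustrated for the case m = 4. The sketch below assumes categories scored 1 to k and relies on the fact that fixing the mean, variance, skewness, and kurtosis is equivalent to fixing the first four raw moments, which are linear in the probabilities; the rank of the constraint matrix then gives the number of independent constraints. `moment_constraints` is a hypothetical helper.

```python
import numpy as np

def moment_constraints(k, m=4):
    """(m + 1) x k matrix whose row j holds x_i**j for categories x_i = 1..k.

    Row 0 encodes sum(p_i) = 1; rows 1..m fix the raw moments E[x^j].
    """
    x = np.arange(1, k + 1, dtype=float)
    return np.vstack([x ** j for j in range(m + 1)])

for k in (5, 7):
    A = moment_constraints(k)
    rank = np.linalg.matrix_rank(A)
    print(f"k={k}: {rank} independent constraints, {k - rank} free dimension(s)")

# With k = m + 1 = 5 the system is square, so the probabilities are pinned down
# exactly: solving it recovers the skewed distribution from its four moments.
p_skewed = np.array([.05, .05, .05, .10, .75])
A5 = moment_constraints(5)
p_recovered = np.linalg.solve(A5, A5 @ p_skewed)
print(np.allclose(p_recovered, p_skewed))
```

For k = 7 two directions remain free, so infinitely many distributions share the same four parameters, as long as the probabilities stay nonnegative.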
The discussion above shows that two difficulties are encountered in conducting Monte Carlo robustness research on the effects of parameters. One is to estimate a discrete probability distribution with the specified parameters precisely, and the other is to choose one discrete probability distribution from the infinitely many with the same parameters. To overcome these difficulties, a procedure is needed that estimates the discrete probability distribution with the specified parameters precisely and chooses the one with attractive characteristics.
Section 5 the Research Purpose
The previous discussion shows that not only is the process of deciding the probabilities hard, but in most cases infinitely many probability distributions are available when the Monte Carlo method is applied to robustness research on the impact of discrete data with specified parameters. Therefore, a procedure is needed that overcomes these difficulties: it should estimate the probability distribution with the specified parameters precisely and choose, from the infinitely many candidates, a discrete probability distribution with reasonable characteristics.
A procedure called the Maximum Entropy Procedure (MEP) is introduced to overcome these difficulties. The MEP estimates a univariate discrete probability distribution with specified parameters precisely, and when k>m+1 the distribution it chooses has reasonable and attractive characteristics, which makes it a promising procedure for simulating data for robustness research. The definition of this procedure and the rationale for its choice of discrete probability distributions are discussed in the next chapter.
Chapter 2 the Characteristics of the Maximum Entropy Procedure (MEP)
The MEP is proposed to estimate discrete probability distributions precisely, choosing one from the infinite set of solutions that satisfy the constraints when the number of constraints m+1 is smaller than the number of categories k. The definition of the MEP and the characteristics of the distributions it chooses are discussed in this chapter.
Section 1 the Definition of the Maximum Entropy Procedure
The Maximum Entropy Procedure (MEP) is a univariate procedure that estimates the discrete probability distribution with the specified parameters precisely, based on the maximum entropy principle. The definition of information entropy is introduced first, since it is central to both the maximum entropy principle and the MEP.
In 1948, Shannon defined the information entropy as a function measuring the uncertainty of a probability distribution. For a discrete probability distribution, the information entropy is defined as:
$$H(p_1,\ldots,p_k) = -\sum_{i=1}^{k} p_i \ln p_i \tag{4}$$
where $p_1,\ldots,p_k$ are the probabilities of the k ordered categories of a variable. When calculating the information entropy, the term $p_i \ln p_i$ is defined as 0 if $p_i$ is 0. The lower bound of the information entropy is 0, and the upper bound depends on the number of categories of the variable: it equals $\ln k$, attained by the discrete uniform distribution. The information entropy is 0 when the probability of one category is 1 and the others are 0, which means there is no uncertainty. The larger the information entropy, the more uncertainty the probability distribution has. The maximum entropy principle chooses, among the probability distributions satisfying the specified constraints, the one with the maximum information entropy. The chosen probability distribution is called the maximum entropy distribution. The MEP estimates the maximum entropy distribution with the specified parameters.
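Formula (4) and its bounds can be checked with a few lines of Python. This is a minimal sketch; the function name `information_entropy` is hypothetical, and the `p > 0` filter implements the convention that $0 \ln 0 = 0$.

```python
import math

def information_entropy(probs):
    """Shannon information entropy H = -sum p_i ln p_i, with 0 * ln 0 taken as 0."""
    return sum(-p * math.log(p) for p in probs if p > 0)

k = 5
h_degenerate = information_entropy([1.0, 0.0, 0.0, 0.0, 0.0])  # no uncertainty: H = 0
h_uniform = information_entropy([1 / k] * k)                   # upper bound: H = ln k
h_skewed = information_entropy([.05, .05, .05, .10, .75])      # strictly in between
print(h_degenerate, h_uniform, math.log(k), h_skewed)
```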
Section 2 the Rationale of Choosing the Maximum Entropy Distributions
When robustness studies of the impact of non-normality in discrete variables are conducted, the only information manipulated and discussed is the number of categories and the specified parameters of the variables, such as the values of skewness and kurtosis (e.g. Muthén & Kaplan, 1985; Muthén & Kaplan, 1992; Olsson, 1979; Ory & Mokhtarian, 2010). If this is all the information we have, the chosen discrete probability distribution should be determined by this information alone. Given this information, the maximum entropy distribution is a sensible choice because of the following characteristics.
Uncertainty exists, is reduced by the information we know, and should be reduced only by additional known information. If the known information does not fully determine a distribution, choosing the probability distribution with the most uncertainty is prudent, since the remaining uncertainty is due to information that is not available. We are not certain about that information and should not assume it; we should remain maximally uncertain about what we do not know and choose the distribution with the most uncertainty. Therefore, the maximum entropy distribution chosen through the maximum entropy principle, which retains the most uncertainty, is the most prudent and honest choice (Golan, Judge, & Miller, 1996; Jaynes, 1982; Kapur & Kesavan, 1992; Kesavan & Kapur, 1989).
In addition, the maximum entropy distribution is the one that appears most frequently among the probability distributions satisfying the same information. Although infinitely many probability distributions are consistent with the specified parameters, a distribution that seldom appears is unlikely to be the one we have in mind and thus should not be the choice. Intuitively, the more often a probability distribution appears, the more representative it is of the distributions satisfying the same constraints on the specified parameters. Therefore, the probability distribution that appears most often is a reasonable choice. Among distributions with the same specified constraints, the one that appears most frequently is the maximum entropy distribution (Golan, et al., 1996; Jaynes, 1982; Wu, 1997; Zellner & Highfield, 1988), and thus it is a reasonable choice given the known information. This can be demonstrated as follows. Let a discrete probability distribution of k categories be built from N trials (in the limit N→∞), with $n_1, n_2, \ldots, n_k$ trials falling in each category. As a constraint,
$$\sum_{i=1}^{k} n_i = N \tag{5}$$
There are $k^N$ possible sequences of outcomes, and the number of sequences that yield each discrete probability distribution is
$$W = \frac{N!}{n_1!\,n_2!\cdots n_k!} \tag{6}$$
Therefore, the discrete probability distribution that maximizes W is the one most frequently seen. We can solve the problem by maximizing a monotonically increasing function of W. Using Stirling's approximation $\ln n! \approx n \ln n - n$ and writing $p_i = n_i/N$,
$$\lim_{N\to\infty} \frac{1}{N} \ln W = -\sum_{i=1}^{k} p_i \ln p_i \tag{7}$$
It can be seen that formula (7) is the same as formula (4). Therefore, the maximum entropy distribution, which maximizes formula (4), is the one realized by the greatest number of outcomes satisfying the same specified constraints (Golan, et al., 1996; Wu, 1997; Zellner & Highfield, 1988).
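The limit in formula (7) can be checked numerically. The sketch below computes $(1/N)\ln W$ exactly through `math.lgamma` (since $\ln n! = \mathrm{lgamma}(n+1)$) for a hypothetical 3-category distribution and shows that the gap to the entropy shrinks as N grows.

```python
import math

def log_multiplicity_per_trial(counts):
    """(1/N) ln W with W = N! / (n_1! ... n_k!), via log-gamma to avoid overflow."""
    n_total = sum(counts)
    log_w = math.lgamma(n_total + 1) - sum(math.lgamma(n + 1) for n in counts)
    return log_w / n_total

def entropy(probs):
    return sum(-p * math.log(p) for p in probs if p > 0)

p = [0.2, 0.3, 0.5]                       # hypothetical target distribution
gaps = []
for n_total in (100, 10_000, 1_000_000):
    counts = [round(pi * n_total) for pi in p]
    gaps.append(entropy(p) - log_multiplicity_per_trial(counts))
    print(n_total, gaps[-1])              # the gap shrinks roughly like (ln N) / N
```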
In addition, according to the Entropy Concentration Theorem (Jaynes, 1982), any probability distribution other than the maximum entropy distribution appears far less often and is thus highly atypical. The theorem shows that, asymptotically, a fraction F of the probability distributions satisfying the specified constraints have information entropy in the range
$$H_{\max} - \Delta H \le H(p_1,\ldots,p_k) \le H_{\max} \tag{8}$$
where $H_{\max}$ is the entropy of the maximum entropy distribution and
$$\Delta H = \frac{1}{2N}\,\chi^2_{k-m-1}(1-F) \tag{9}$$
where N is the number of trials and $\chi^2_{k-m-1}(1-F)$ is the chi-squared value for k-m-1 degrees of freedom with upper tail area 1-F. For example, when the mean of a 6-category probability distribution is specified as 4.5 and there are 1000 trials, the information entropy of the maximum entropy distribution is $H_{\max} = 1.61358$. Applying the concentration theorem with 6-1-1=4 degrees of freedom, 95% of all probability distributions satisfying the constraint have information entropy within a range of width $\Delta H = \frac{1}{2N}\chi^2_{4}(0.05) = 0.00474$. Thus 95% of the probability distributions satisfying the constraint have information entropy in the range $1.609 \le H \le 1.614$. The possible probability distributions are therefore strongly concentrated near the maximum entropy distribution. Moreover, the range of entropy becomes narrower as N grows. As N→∞, any probability distribution other than the maximum entropy distribution becomes highly atypical of those satisfying the specified constraints (Jaynes, 1982; Zellner & Highfield, 1988).
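The numbers in this example can be reproduced with a short sketch. It assumes that, under a single mean constraint, the maximum entropy distribution has the form $p_i \propto e^{\lambda i}$ (the standard Lagrange-multiplier solution for this constraint), finds $\lambda$ by bisection, and takes the tabulated value $\chi^2_4(0.05) \approx 9.4877$ as an input rather than computing it. `maxent_given_mean` is a hypothetical helper.

```python
import math

def entropy(p):
    return sum(-pi * math.log(pi) for pi in p if pi > 0)

def maxent_given_mean(values, target_mean, lo=-10.0, hi=10.0, iters=200):
    """Max-entropy pmf on `values` with a fixed mean: p_i proportional to exp(lam * x_i)."""
    def pmf(lam):
        w = [math.exp(lam * x) for x in values]
        z = sum(w)
        return [wi / z for wi in w]

    def mean(p):
        return sum(x * pi for x, pi in zip(values, p))

    for _ in range(iters):                 # the mean increases monotonically in lam
        mid = 0.5 * (lo + hi)
        if mean(pmf(mid)) < target_mean:
            lo = mid
        else:
            hi = mid
    return pmf(0.5 * (lo + hi))

p = maxent_given_mean(range(1, 7), 4.5)    # 6 categories, mean fixed at 4.5
h_max = entropy(p)                         # approx. 1.6136
N = 1000
chi2_4_upper_05 = 9.4877                   # tabulated chi-squared value, df = 6 - 1 - 1 = 4
delta_h = chi2_4_upper_05 / (2 * N)        # formula (9)
print(h_max, delta_h, h_max - delta_h)
```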
Since what we know when simulating random data is not the true probability distribution of the variable but only the constraints, the maximum entropy distribution, which retains the most uncertainty, is the most prudent choice. Moreover, it is the most frequently seen distribution among the infinitely many distributions satisfying the same specified constraints, and is thus the most typical and representative. Conversely, the Entropy Concentration Theorem shows that probability distributions other than the maximum entropy distribution are highly atypical. Because of these properties, many researchers (Golan, et al., 1996; Jaynes, 1957a, 1957b, 1982; Kapur & Kesavan, 1992; Theil & Fiebig, 1984; Wu, 1997) have suggested using the maximum entropy principle to choose an unknown distribution subject to known constraints.
In addition to these properties, the shapes of maximum entropy distributions also make them a favorable choice for simulating the discrete probability distributions generated by categorized measures. It is believed that the discrete probability distribution produced by a well-designed categorized measure in the real world should be neither lumpy nor have empty categories, since the underlying distribution is continuous (Micceri, 1989). The discrete maximum entropy distribution fits these descriptions, as discussed below.
In a discrete maximum entropy distribution, the information entropy is maximized while the other specified constraints are fulfilled. With no constraints, the information entropy reaches its maximum when every $p_i$ is the same, that is, for the discrete uniform distribution. When constraints are specified, each $p_i$ moves away from the uniform value only as far as needed to satisfy them while the information entropy is maximized. Therefore, the $p_i$ take the values closest to those of a discrete uniform distribution that still satisfy the specified constraints. Because of this, the discrete maximum entropy distribution is smoother than any other distribution satisfying the same constraints (Jaynes, 1982).
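One way to make "closest to the discrete uniform distribution" precise, offered here as an illustration rather than as part of the source's argument, uses the Kullback-Leibler divergence from the uniform distribution $u_i = 1/k$:

$$D_{\mathrm{KL}}(\mathbf{p}\,\|\,\mathbf{u}) = \sum_{i=1}^{k} p_i \ln\frac{p_i}{1/k} = \ln k + \sum_{i=1}^{k} p_i \ln p_i = \ln k - H(p_1,\ldots,p_k)$$

Since $\ln k$ is fixed for a given number of categories, maximizing H under the specified constraints is the same as minimizing this divergence from the uniform distribution.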
In addition, because an empty category adds nothing to the information entropy, the probability distribution with the maximum entropy is the one with the fewest empty categories. Therefore, unless the constraints cannot be satisfied without an empty category, the maximum entropy distribution has no empty category (Jaynes, 1957a).
The smoothness and absence of empty categories of maximum entropy distributions fit the belief about discrete probability distributions generated by well-designed categorized measures. Therefore, the discrete maximum entropy distribution is a suitable choice for simulating the discrete probability distributions of psychological research.
In addition, the maximum entropy distributions under different constraints include several commonly seen distributions, such as the continuous and discrete uniform distributions and the beta, exponential, gamma, Poisson, and normal distributions. All of these are members of the maximum entropy family under different constraints (Theil & Fiebig, 1984). This suggests that the maximum entropy principle may be a general principle underlying probability distributions.
To illustrate the characteristics of the maximum entropy distribution, the 5-category discrete probability distributions in Muthén's research (Muthén & Kaplan, 1985) and the discrete maximum entropy distributions with the same skewness and kurtosis are presented in Figure 1 to Figure 5. In addition, the maximum entropy probability distribution with skewness 0 and kurtosis 2.898 is presented in Figure 6. These figures show that the maximum entropy distributions are smoother than Muthén's distributions.
The discussion above shows that discrete maximum entropy distributions are typical, smooth, and free of empty categories. All these properties make them a proper choice among the infinitely many probability distributions satisfying the specified constraints. Therefore, the MEP, which chooses the discrete maximum entropy distribution under the specified constraints, is a reasonable and attractive procedure for generating data with specified parameters in Monte Carlo robustness research.
Section 3 the Solution of the Maximum Entropy Procedure
Since the MEP is a reasonable and suitable procedure for choosing discrete probability distributions with specified constraints for psychological data, the next problem in implementing the MEP is how to obtain the maximum entropy distribution. Traditionally, the method of Lagrange multipliers is used to solve this problem (Jaynes, 1957b; Kapur & Kesavan, 1992; Kesavan & Kapur, 1989). The method of Lagrange multipliers is a strategy for finding the maximum or minimum of a function subject to constraints and is thus suitable for this problem. The Lagrange function for maximizing the information entropy of a discrete probability distribution subject to the specified constraints is defined as
$$L(p_1,\ldots,p_k,\lambda_0,\ldots,\lambda_m) = -\sum_{i=1}^{k} p_i \ln p_i - \sum_{j=0}^{m} \lambda_j \left[ f_j(p_1,\ldots,p_k) - c_j \right]$$
where $f_j(p_1,\ldots,p_k) = c_j$, $j = 1,\ldots,m$, are the m constraints given by the specified parameters, and $f_0(p_1,\ldots,p_k) = \sum_{i=1}^{k} p_i = 1$ (that is, $c_0 = 1$) is also a constraint, so the total number of constraints is m+1. Setting the partial derivatives of the Lagrange function with respect to each variable equal to 0,
$$\frac{\partial L}{\partial p_i} = 0,\; i = 1,\ldots,k; \qquad \frac{\partial L}{\partial \lambda_j} = 0,\; j = 0,\ldots,m,$$
gives k+m+1 equations in the k+m+1 unknowns, from which the solutions $p_1$ to $p_k$ satisfying the m+1 constraints can be obtained. After the probabilities $p_1$ to $p_k$ are obtained, data from the discrete maximum entropy distribution with the specified parameters can be generated by sampling via inversion of the cumulative mass function (cmf), as discussed in the previous chapter.
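These conditions can also be solved numerically. The Python sketch below is one possible approach for the case where the mean, variance, skewness, and kurtosis are all specified; it is an illustration under stated assumptions, not the procedure's actual implementation. It assumes categories scored 1 to k and kurtosis given as excess kurtosis, and instead of solving the Lagrange system directly it minimizes the equivalent convex dual with Newton's method. `mep_four_params` and `four_parameters` are hypothetical names.

```python
import numpy as np

def mep_four_params(values, mean, variance, skewness, excess_kurtosis,
                    max_iter=200, tol=1e-10):
    """Sketch: maximum entropy pmf on `values` with four specified parameters.

    With the mean and variance fixed, the constraints are linear in p:
    E[u] = 0, E[u^2] = 1, E[u^3] = skewness, E[u^4] = excess_kurtosis + 3,
    where u = (x - mean) / sd. The maximizer has the form
    p_i proportional to exp(sum_j lam_j * u_i**j); lam minimizes the convex dual.
    """
    u = (np.asarray(values, dtype=float) - mean) / np.sqrt(variance)
    T = np.vstack([u ** j for j in range(1, 5)])          # 4 x k constraint functions
    target = np.array([0.0, 1.0, skewness, excess_kurtosis + 3.0])

    def pmf(lam):
        logits = lam @ T
        w = np.exp(logits - logits.max())                 # shift for numerical stability
        return w / w.sum()

    def dual(lam):                                        # ln Z(lam) - lam . target
        logits = lam @ T
        top = logits.max()
        return top + np.log(np.exp(logits - top).sum()) - lam @ target

    lam = np.zeros(4)                                     # start from the uniform pmf
    for _ in range(max_iter):
        p = pmf(lam)
        mu = T @ p
        grad = mu - target                                # constraint violations
        if np.max(np.abs(grad)) < tol:
            break
        hess = (T * p) @ T.T - np.outer(mu, mu)           # covariance of T under p
        step = np.linalg.solve(hess, grad)                # Newton direction
        t, f0 = 1.0, dual(lam)
        while dual(lam - t * step) > f0 - 1e-4 * t * (grad @ step) and t > 1e-12:
            t *= 0.5                                      # backtracking line search
        lam = lam - t * step
    return pmf(lam)


def four_parameters(values, probs):
    """Mean, variance, skewness and excess kurtosis of a discrete distribution."""
    x, p = np.asarray(values, dtype=float), np.asarray(probs, dtype=float)
    mu = p @ x
    var = p @ (x - mu) ** 2
    return mu, var, p @ (x - mu) ** 3 / var ** 1.5, p @ (x - mu) ** 4 / var ** 2 - 3.0


# Hypothetical 7-category target: the sketch matches its four parameters
# while attaining at least as much entropy as the target itself.
x7 = np.arange(1, 8)
p_target = np.array([.05, .10, .15, .20, .25, .15, .10])
p_me = mep_four_params(x7, *four_parameters(x7, p_target))
print(np.round(p_me, 4))
```

Working with the dual keeps the unknowns down to the four multipliers, and the exponential form keeps every $p_i$ strictly positive, in line with the no-empty-category property discussed earlier.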
Section 4 the Maximum Entropy Procedures Proposed in this Research
The discussion in the previous sections shows that the properties of discrete maximum entropy distributions make the MEP a reasonable and suitable procedure for choosing the discrete probability distribution when simulating data with specified parameters for Monte Carlo research. The solution method presented above also makes the implementation of the MEP practical. Therefore, the MEP is proposed for choosing the discrete probability distribution with the specified parameters. The sets of parameters chosen for implementing the MEP in this research are the set of 4 parameters (mean, variance, skewness, and kurtosis) and the set of 2 parameters (skewness and kurtosis).
The set of 4 parameters is chosen for the MEP proposed in this research because most real-world distributions are characterized by their mean, variance, skewness, and kurtosis, as noted by Fleishman (1978). In addition, the mean and variance of discrete data cannot simply be transformed in the way those of continuous data can. With different means and variances, discrete probability distributions have different shapes even when their skewness and kurtosis are the same. With all 4 parameters specified, the discrete probability distribution is less uncertain because more information is available. Therefore, the Maximum Entropy Procedure with 4 parameters (MEP-4) is proposed for Monte Carlo research in which all 4 parameters are specified.
The set of 2 parameters is chosen for the MEP proposed in this research because the degree of non-normality is most often evaluated only by the skewness and kurtosis values (Muthén & Kaplan, 1985). Because the mean and variance of normal distributions can be set arbitrarily, most researchers also manipulated only the skewness and kurtosis in their robustness studies (e.g., Curran et al., 1996; Hampel, 1973; Lei & Lomax, 2005). Therefore, to generate variables for robustness research on non-normality, the Maximum Entropy Procedure with these 2 parameters specified (MEP-2) is proposed.
In this research, the MEP is proposed for generating discrete data with specified parameters for robustness research. The sets of specified parameters are the set of mean, variance, skewness, and kurtosis and the set of skewness and kurtosis. The details of these procedures and the statistical program used to implement