
In most cases, we cannot suggest a specific mechanism to account for the observed data.

Hence we do not have a complete pdf to describe the experimental data. All we can do is propose that the hypothetical pdf has a certain mathematical form. That pdf may, however, be a sum of several components, say, three exponential terms. If we do not know how many components the pdf contains, we do not know how many parameters to use when evaluating the log-likelihood. Sometimes we can determine the number of components by visually inspecting a plot of the experimental data, but most of the time it is not that simple. What we then need is a statistical approach: the log likelihood ratio test.

[Figure 2.1 panels: (a) reflection, (b) reflection and expansion, (c) contraction, (d) multiple contraction; the vertices of the initial simplex are labeled "high" and "low".]

Figure 2.1: Four possible actions for a step in the downhill simplex method. The initial simplex at the beginning of the step, in this case a triangle, is shown on top. At the end of the step, the simplex has taken one of the four actions, depending on how the log-likelihood at the highest point changes: (a) a reflection away from the high point, (b) a reflection and expansion away from the high point, (c) a one-dimensional contraction from the high point, or (d) an overall contraction towards the low point.

Figure 2.2: The algorithm flowchart of the downhill simplex method. The terminating condition can be (1) whether the fractional change of the vector distance moved in this step is less than a pre-defined tolerance or (2) whether the fractional difference between the highest and the lowest log-likelihood is less than a pre-defined tolerance. The symbols (a), (b), (c) and (d) denote the four actions.

[Figure 2.3 axes: parameter (horizontal) and −log-likelihood (vertical); minima labeled A, B, C; initial guesses labeled θ1, θ2, θ3.]

Figure 2.3: A schematic plot of the negative log-likelihood function with only one parameter. Point B is the global minimum, while points A and C are local minima. Three initial guesses are shown. When the initial guess is θ1 or θ3, the search is trapped at A or C, respectively. Only when the search starts from θ2 is the global minimum B found.
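The four actions of Figure 2.1, the termination test of Figure 2.2 and the trapping behaviour of Figure 2.3 can be illustrated together in a short Python sketch. It is not the implementation used for the fits in this work: the function name `downhill_simplex`, the coefficients -1, 2 and 0.5, the tolerances, and the toy function `neg_log_like` (a stand-in for a one-parameter negative log-likelihood with three minima) are all assumptions made for illustration. The stopping rule combines condition (2) with a check on the size of the simplex, which is a simplification of condition (1).

```python
import math

def downhill_simplex(f, simplex, ftol=1e-10, xtol=1e-7, max_iter=5000):
    """Minimize f from `simplex`, a list of n + 1 starting points in n dimensions."""
    pts = [list(p) for p in simplex]
    n = len(pts[0])
    vals = [f(p) for p in pts]

    def try_move(ihi, fac):
        # Move the high point along the line through the centroid of the others:
        # fac = -1 reflects it, fac = 2 expands it, fac = 0.5 contracts it.
        centroid = [sum(p[j] for i, p in enumerate(pts) if i != ihi) / n
                    for j in range(n)]
        trial = [c + fac * (x - c) for c, x in zip(centroid, pts[ihi])]
        f_trial = f(trial)
        if f_trial < vals[ihi]:  # keep the trial only if it beats the high point
            pts[ihi], vals[ihi] = trial, f_trial
        return f_trial

    for _ in range(max_iter):
        order = sorted(range(n + 1), key=vals.__getitem__)
        ilo, inhi, ihi = order[0], order[-2], order[-1]
        # Fractional difference between the highest and lowest function values.
        spread = 2.0 * abs(vals[ihi] - vals[ilo]) / (abs(vals[ihi]) + abs(vals[ilo]) + 1e-300)
        # Largest coordinate distance between any vertex and the low point.
        extent = max(abs(x - xl) for p in pts for x, xl in zip(p, pts[ilo]))
        if spread < ftol and extent < xtol:
            break
        f_trial = try_move(ihi, -1.0)              # (a) reflection
        if f_trial <= vals[ilo]:
            try_move(ihi, 2.0)                     # (b) reflection and expansion
        elif f_trial >= vals[inhi]:
            f_save = vals[ihi]
            f_trial = try_move(ihi, 0.5)           # (c) one-dimensional contraction
            if f_trial >= f_save:                  # (d) contraction towards the low point
                for i in range(n + 1):
                    if i != ilo:
                        pts[i] = [0.5 * (x + xl) for x, xl in zip(pts[i], pts[ilo])]
                        vals[i] = f(pts[i])
    ilo = min(range(n + 1), key=vals.__getitem__)
    return pts[ilo], vals[ilo]

def neg_log_like(p):
    # Hypothetical stand-in for -log L: global minimum at 0, local minima near +/-2.99.
    theta = p[0]
    return 0.1 * theta ** 2 - math.cos(2.0 * theta)

starts = [-3.5, 0.5, 3.5]  # play the roles of theta1, theta2, theta3
results = [downhill_simplex(neg_log_like, [[t], [t + 0.1]]) for t in starts]
for t, (p, v) in zip(starts, results):
    print(f"start {t:+.1f} -> theta = {p[0]:+.4f}, -log L = {v:+.4f}")
```

Started from the first or the third guess, the search ends in one of the outer minima; only the middle guess reaches the global minimum, so the result of a single run should not be taken as the global minimum without trying other starting points.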

The log likelihood ratio test is based on the following theorem. Let $l_1(\hat{\theta}_{n_1})$ and $l_2(\hat{\theta}_{n_2})$ be two log-likelihood functions, evaluated at their maximum likelihood estimates, with n1 and n2 parameters, respectively, where n2 > n1. Define a quantity R, called the log likelihood ratio (the logarithm of the ratio of the likelihoods):

\[
R \equiv l_2 - l_1 = \log L_2 - \log L_1 = \log\left(\frac{L_2}{L_1}\right). \tag{2.3.1}
\]

It can be shown [4] that:

If the most appropriate number of parameters is n1 and the number of data points is large, the quantity 2R will have a χ² distribution with n2 − n1 degrees of freedom.

The χ² distribution is defined in the next paragraph. Note that R ≥ 0, i.e. L2/L1 ≥ 1, when the model with n1 parameters is a special case of the one with n2 parameters. This is reasonable, since the more parameters we use to fit the data, the higher the maximized likelihood can be. From the definition, R measures the increase in the ‘goodness of fit’ when the number of parameters is increased.

Before proceeding further, the χ² distribution should be defined. A χ² distribution with ν degrees of freedom is a special case of the Γ distribution with parameters α = ν/2 and β = 2. The cumulative distribution function (cdf) of a Γ distribution is

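\[
F_\Gamma(x; \alpha, \beta) = \frac{\gamma(\alpha, x/\beta)}{\Gamma(\alpha)} = P(\alpha, x/\beta), \tag{2.3.2}
\]

where Γ(α) is the gamma function, γ(α, x) is the incomplete gamma function,

\[
\gamma(\alpha, x) \equiv \int_0^x {x'}^{\alpha-1} e^{-x'}\,dx', \tag{2.3.4}
\]

and P(α, x) is the regularized incomplete gamma function,

\[
P(\alpha, x) \equiv \frac{\gamma(\alpha, x)}{\Gamma(\alpha)}, \tag{2.3.5}
\]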

and ν, in the log likelihood case mentioned above, equals n2 − n1 and is called the number of ‘degrees of freedom’. Therefore, by inserting the parameters α = ν/2 and β = 2 into the cdf of a Γ distribution, Eq. (2.3.2), we have the cdf of a χ² distribution:

\[
F_{\chi^2}(x; \nu) = P\left(\frac{\nu}{2}, \frac{x}{2}\right). \tag{2.3.6}
\]

By differentiating the cdf we obtain the pdf

\[
p_{\chi^2}(x; \nu) = \frac{(1/2)^{\nu/2}\, x^{\nu/2-1} e^{-x/2}}{\Gamma(\nu/2)}. \tag{2.3.7}
\]
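The differentiation step can be written out explicitly. The following sketch writes the χ² cdf as P(ν/2, x/2), which is what substituting α = ν/2 and β = 2 into the Γ cdf gives, and differentiates the integral that defines γ with respect to its upper limit:

```latex
\begin{aligned}
p_{\chi^2}(x;\nu)
  &= \frac{d}{dx}\, P\!\left(\frac{\nu}{2}, \frac{x}{2}\right)
   = \frac{1}{\Gamma(\nu/2)}\, \frac{d}{dx} \int_0^{x/2} {x'}^{\nu/2-1} e^{-x'}\,dx' \\
  &= \frac{1}{\Gamma(\nu/2)} \left(\frac{x}{2}\right)^{\nu/2-1} e^{-x/2} \cdot \frac{1}{2}
   = \frac{(1/2)^{\nu/2}\, x^{\nu/2-1} e^{-x/2}}{\Gamma(\nu/2)} .
\end{aligned}
```

The factor 1/2 comes from the chain rule applied to the upper limit x/2.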

Fig. 2.4 shows the pdf and cdf of a χ² distribution.
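Both curves are straightforward to evaluate numerically. The sketch below uses only the Python standard library; the function names `chi2_pdf`, `regularized_lower_gamma` and `chi2_cdf` are chosen here for illustration, and the power series used for P(α, x) is one common way to evaluate it, not necessarily the one used to draw the figure. The series converges for every x > 0 but needs many terms when x is much larger than α.

```python
import math

def chi2_pdf(x, nu):
    """pdf of the chi-square distribution with nu degrees of freedom (x > 0)."""
    if x <= 0.0:
        return 0.0  # the endpoint x = 0 is not treated in this sketch
    a = 0.5 * nu
    # Work with logarithms so that Gamma(nu/2) cannot overflow for large nu.
    log_p = -a * math.log(2.0) + (a - 1.0) * math.log(x) - 0.5 * x - math.lgamma(a)
    return math.exp(log_p)

def regularized_lower_gamma(a, x, eps=1e-15, max_terms=10000):
    """P(a, x) from the series gamma(a, x) = e^-x x^a sum_n x^n / (a (a+1) ... (a+n))."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a  # n = 0 term of the sum
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < eps * total:
            break
    # Divide by Gamma(a) in log space to obtain the regularized function.
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi2_cdf(x, nu):
    """cdf of the chi-square distribution: P(nu/2, x/2)."""
    return regularized_lower_gamma(0.5 * nu, 0.5 * x)

for nu in (1, 2, 5):
    print(nu, chi2_pdf(3.0, nu), chi2_cdf(3.0, nu))
```

For ν = 2 the cdf reduces to 1 − exp(−x/2), and for ν = 1 to erf(√(x/2)), which gives two simple checks on the routine.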

With the value 2R and its χ² distribution in hand, how do we decide the most appropriate number of parameters? First we must ask what ‘the most appropriate’ means. Since the fit generally keeps improving as more parameters are used, it is natural to say that once the number of parameters has reached the most appropriate one, any further increase will most likely produce only an insignificant increase in the likelihood, and hence a small value of 2R. An insignificant increase is one that has a large probability of being only a result of chance. That probability can be obtained from another function Q(α, x), also confusingly named the regularized incomplete gamma function; to distinguish it from P, we call it the regularized complementary incomplete gamma function. Q(α, x) is defined as:

\[
Q(\alpha, x) \equiv \frac{\Gamma(\alpha, x)}{\Gamma(\alpha)} = 1 - P(\alpha, x), \tag{2.3.8}
\]

where Γ(α, x) is the (complementary) incomplete gamma function

\[
\Gamma(\alpha, x) \equiv \int_x^\infty {x'}^{\alpha-1} e^{-x'}\,dx'.
\]

The so-called χ² probability function, Q(χ²|ν) ≡ Q(ν/2, χ²/2), evaluated at χ² = 2R, is the probability that an increase of 2R in the ‘goodness of fit’, obtained by adding ν parameters, is only a result of chance. So when Q is small, the additional parameters are necessary to give a better description of the data. The value of Q is usually also called the P value, which should not be confused with the function P. See Fig. 2.5 for pictures of P and Q.

In practice, we often require Q to be smaller than 0.05, 0.01 or even 0.001 before accepting the additional parameters. For example, when we increase the number of parameters from four to five and obtain the statistic 2R1, we have Q(2R1|1) = Q(1/2, R1) ≡ Q1. We then increase the number of parameters from five to six and obtain Q2. If we set our threshold to 0.05, and Q1 < 0.05 while Q2 > 0.05, the most appropriate number of parameters is five. However, if Q2 is still less than 0.05 when the number of parameters is six, we need to increase the number of parameters further.
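The procedure of adding one parameter at a time until Q exceeds the threshold can be sketched in Python as follows. The function names `chi2_q` and `select_num_params`, and the log-likelihood values in the example, are hypothetical and serve only as an illustration. For integer ν, Q(ν/2, x) is built here from the closed forms Q(1, x) = exp(−x) and Q(1/2, x) = erfc(√x) together with the recurrence Q(α + 1, x) = Q(α, x) + x^α e^(−x)/Γ(α + 1); a library routine for the incomplete gamma function would serve equally well.

```python
import math

def chi2_q(chi2, nu):
    """Q(chi2 | nu) = Q(nu/2, chi2/2) for integer nu >= 1 and chi2 >= 0."""
    x = 0.5 * chi2
    if nu % 2 == 0:
        a, q = 1.0, math.exp(-x)             # Q(1, x) = e^-x
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))  # Q(1/2, x) = erfc(sqrt(x))
    # Raise a in unit steps up to nu/2 with
    # Q(a + 1, x) = Q(a, x) + x^a e^-x / Gamma(a + 1).
    for _ in range((nu - 1) // 2):
        if x > 0.0:
            q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return q

def select_num_params(max_loglikes, threshold=0.05):
    """Pick the number of parameters from {count: maximized log-likelihood}."""
    counts = sorted(max_loglikes)
    chosen = counts[0]
    for n1, n2 in zip(counts, counts[1:]):
        two_r = 2.0 * (max_loglikes[n2] - max_loglikes[n1])
        q = chi2_q(max(two_r, 0.0), n2 - n1)
        if q >= threshold:  # the gain may well be due to chance: keep n1
            break
        chosen = n2
    return chosen

# Hypothetical maximized log-likelihoods for fits with 4, 5 and 6 parameters.
fits = {4: -1250.0, 5: -1240.0, 6: -1239.5}
print(select_num_params(fits, threshold=0.05))
```

With these made-up numbers the step from four to five parameters gives 2R = 20 and a very small Q, while the step from five to six gives 2R = 1 and Q of about 0.32, so the search stops at five parameters, as in the example in the text.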

[Figure 2.5: two panels, labeled P(α, x) and Q(α, x), each with a horizontal axis starting at 0 and the point x marked.]

Figure 2.5: P(α, x) and Q(α, x) are integrals of the pdf of the χ² distribution. The integration range of P is from 0 to x and that of Q is from x to ∞.
