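Log Likelihood Ratio Test - Applications of the Maximum Likelihood Method to Time Series Data Analysis and Molecular Dynamics Simulation of the General Anesthetic Halothane in a Cell Membrane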

Figure 2.1: Four possible actions for a step in the downhill simplex method. The initial simplex at the beginning of the step, in this case a triangle, is shown on top. At the end of the step, the simplex takes one of four actions, depending on the change in the log-likelihood at the highest point: (a) a reflection away from the high point, (b) a reflection and expansion away from the high point, (c) a one-dimensional contraction from the high point, or (d) an overall (multiple) contraction towards the low point.

Figure 2.2: The algorithm flowchart of the downhill simplex method. The terminating condition can be either (1) the fractional change of the vector distance moved in this step is less than a pre-defined tolerance, or (2) the fractional difference between the highest and the lowest log-likelihood is less than a pre-defined tolerance. The symbols (a), (b), (c), and (d) denote the four actions shown in Figure 2.1.

Figure 2.3: A schematic plot of the negative log-likelihood as a function of a single parameter, with three initial guesses θ₁, θ₂, and θ₃ marked on the parameter axis. Point B is the global minimum, while points A and C are local minima. A search started from θ₁ or θ₃ is trapped at A or C, respectively; only a search started from θ₂ finds the global minimum B.

In most cases, we cannot suggest a specific mechanism to account for the observed data. Hence we do not have a complete pdf to describe the experimental data. All we can do is propose that the hypothetical pdf has a certain mathematical form. That pdf may be a sum of several components, say, three exponential terms. If we do not know how many components the pdf contains, we do not know how many parameters to use when evaluating the log-likelihood. Sometimes we can determine the number of components by visually inspecting a plot of the experimental data, but most of the time it is not that simple. What we then need is a statistical approach: the log likelihood ratio test.
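To make the link between the number of components and the number of parameters concrete, here is a minimal Python sketch of the log-likelihood for a pdf built from exponential terms. It assumes a normalized form, pdf(t) = Σ wᵢ kᵢ exp(−kᵢ t) with weights wᵢ summing to 1; the names `weights` and `rates`, the parameter values, and the data are hypothetical and chosen only for illustration. The log-likelihoods are evaluated at fixed parameter values, not maximized.

```python
import math

def mixture_pdf(t, weights, rates):
    """pdf made of exponential terms: sum_i w_i * k_i * exp(-k_i * t)."""
    return sum(w * k * math.exp(-k * t) for w, k in zip(weights, rates))

def log_likelihood(data, weights, rates):
    """Log-likelihood of independent observations under the mixture pdf."""
    return sum(math.log(mixture_pdf(t, weights, rates)) for t in data)

def n_free_parameters(n_components):
    """n rates plus n weights, minus one because the weights must sum to 1."""
    return 2 * n_components - 1

# Hypothetical observations (arbitrary units), made up for illustration only.
data = [0.12, 0.35, 0.08, 1.90, 0.51, 2.75, 0.22, 0.95]

# Log-likelihood at fixed, hand-picked parameter values (no fitting is done).
ll_one = log_likelihood(data, weights=[1.0], rates=[1.0])
ll_two = log_likelihood(data, weights=[0.6, 0.4], rates=[3.0, 0.5])

print("1 component :", n_free_parameters(1), "parameter(s), log-likelihood", ll_one)
print("2 components:", n_free_parameters(2), "parameter(s), log-likelihood", ll_two)
```

Each extra exponential term adds two parameters in this parameterization, which is why the number of components has to be settled before the log-likelihood can be maximized.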

The log likelihood ratio test is based on the following theorem. Let l₁(θ̂_n₁) and l₂(θ̂_n₂) be the log-likelihoods of two models with n₁ and n₂ parameters, respectively, each evaluated at its maximum likelihood estimate, with n₂ > n₁. Define the log likelihood ratio R (the logarithm of the ratio of the likelihoods) as

$$R \equiv l_2 - l_1 = \log L_2 - \log L_1 = \log \frac{L_2}{L_1}. \tag{2.3.1}$$

It can be shown [4] that:

If the most appropriate number of parameters is n₁ and the number of data points is large, the quantity 2R follows a χ² distribution with n₂ − n₁ degrees of freedom.

The χ² distribution is explained in the next paragraph. Note that we only consider the case L₂/L₁ ≥ 1, i.e. R ≥ 0. This is reasonable: the more parameters we use to fit the data, the higher the maximized likelihood can be. From its definition, R measures the increase in the ‘goodness of fit’ when the number of parameters is increased.

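Before proceeding further, the χ² distribution should be defined. A χ² distribution with ν degrees of freedom is a special case of the Γ distribution, with shape parameter α = ν/2 and scale parameter β = 2. The cumulative distribution function (cdf) of a Γ distribution is

$$F_\Gamma(x; \alpha, \beta) = \frac{\gamma(\alpha, x/\beta)}{\Gamma(\alpha)} = P(\alpha, x/\beta), \tag{2.3.2}$$

where γ(α, x) is the (lower) incomplete gamma function,

$$\gamma(\alpha, x) \equiv \int_0^x x'^{\,\alpha-1} e^{-x'}\, dx', \tag{2.3.4}$$

and P(α, x) is the regularized incomplete gamma function,

$$P(\alpha, x) \equiv \frac{\gamma(\alpha, x)}{\Gamma(\alpha)}, \tag{2.3.5}$$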

and ν, which in the log likelihood ratio case above equals n₂ − n₁, is the number of degrees of freedom. Therefore, inserting α = ν/2 and β = 2 into the cdf of a Γ distribution, Eq. (2.3.2), gives the cdf of a χ² distribution,

$$F_{\chi^2}(x; \nu) = P\!\left(\frac{\nu}{2}, \frac{x}{2}\right). \tag{2.3.6}$$

By differentiating the cdf with respect to x, we obtain the pdf

$$p_{\chi^2}(x; \nu) = \frac{(1/2)^{\nu/2}\, x^{\nu/2-1}\, e^{-x/2}}{\Gamma(\nu/2)}. \tag{2.3.7}$$
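For reference, the differentiation can be written out in a single chain. This is a sketch that assumes the χ² cdf is P(ν/2, x/2), with P and γ as defined in Eqs. (2.3.4) and (2.3.5); the integration variable t is introduced here only for readability.

```latex
\frac{d}{dx}\, P\!\left(\frac{\nu}{2}, \frac{x}{2}\right)
  = \frac{1}{\Gamma(\nu/2)}\, \frac{d}{dx} \int_0^{x/2} t^{\nu/2-1} e^{-t}\, dt
  = \frac{1}{\Gamma(\nu/2)} \left(\frac{x}{2}\right)^{\nu/2-1} e^{-x/2} \cdot \frac{1}{2}
  = \frac{(1/2)^{\nu/2}\, x^{\nu/2-1}\, e^{-x/2}}{\Gamma(\nu/2)}
```

The factor 1/2 comes from the chain rule applied to the upper limit x/2, and it combines with (1/2)^{ν/2−1} to give the prefactor (1/2)^{ν/2}.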

Fig. 2.4 shows the pdf and cdf of a χ² distribution.
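The two curves can also be tabulated numerically. The following Python sketch assumes the χ² cdf is P(ν/2, x/2) and evaluates P with the standard power series for the lower incomplete gamma function; the function names are hypothetical, and the series is one common way to compute P, not necessarily the method used to produce the figure.

```python
import math

def reg_lower_gamma(a, x, tol=1e-14, max_terms=10_000):
    """Regularized lower incomplete gamma function P(a, x), by power series.

    gamma(a, x) = x^a e^{-x} * sum_{n>=0} x^n / (a (a+1) ... (a+n))
    """
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    # Multiply by x^a e^{-x} / Gamma(a), computed in log space for stability.
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def chi2_cdf(x, nu):
    """cdf of the chi-square distribution: P(nu/2, x/2)."""
    return reg_lower_gamma(nu / 2.0, x / 2.0)

def chi2_pdf(x, nu):
    """pdf of the chi-square distribution with nu degrees of freedom."""
    half = nu / 2.0
    return 0.5 ** half * x ** (half - 1.0) * math.exp(-x / 2.0) / math.gamma(half)

# Tabulate pdf and cdf for a few degrees of freedom.
for nu in (1, 2, 4):
    for x in (0.5, 1.0, 2.0, 4.0, 8.0):
        print(f"nu={nu} x={x:4.1f} pdf={chi2_pdf(x, nu):.5f} cdf={chi2_cdf(x, nu):.5f}")
```

A quick sanity check is that a central finite difference of `chi2_cdf` reproduces `chi2_pdf`, and that for ν = 2 the cdf reduces to 1 − exp(−x/2).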

Given the value 2R and its χ² distribution, how do we decide on the most appropriate number of parameters? First we must ask what ‘the most appropriate’ means. Since the fit generally improves as more parameters are used, it is natural to say that once the number of parameters has reached the most appropriate one, any further increase will most probably produce only an insignificant increase in the likelihood, and hence a small value of 2R. An insignificant increase is one that has a large probability of being a result of chance alone. That probability can be obtained from another function, Q(α, x), which is confusingly also called the regularized incomplete gamma function; to distinguish it from P(α, x), we call it the regularized complementary incomplete gamma function. Q(α, x) is defined as:

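$$Q(\alpha, x) \equiv \frac{\Gamma(\alpha, x)}{\Gamma(\alpha)} = 1 - P(\alpha, x), \tag{2.3.8}$$

where Γ(α, x) is the (complementary) incomplete gamma function

$$\Gamma(\alpha, x) \equiv \int_x^\infty x'^{\,\alpha-1} e^{-x'}\, dx'.$$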

This so-called χ² probability function, Q(χ²|ν) ≡ Q(ν/2, χ²/2), is the probability that an increase of the ‘goodness of fit’ at least as large as 2R, obtained by adding ν parameters, is only a result of chance. A small Q therefore means that the additional parameters are needed to give a better description of the data. The value of Q is usually called the p-value. See Fig. 2.5 for pictures of P and Q.
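The interpretation of Q as a chance probability can be checked with a small Monte Carlo experiment. The Python sketch below is an illustration under stated assumptions, not the author's procedure: data are drawn from an exponential pdf with rate 1, the smaller model fixes the rate at 1 (no free parameter), and the larger model fits the rate by maximum likelihood (one free parameter), so ν = 1. For ν = 1 the identity Q(1/2, R) = erfc(√R) is used. The function names, sample sizes, and seed are hypothetical.

```python
import math
import random

def chi2_tail_one_dof(two_r):
    """Q(2R | nu = 1) = Q(1/2, R) = erfc(sqrt(R))."""
    return math.erfc(math.sqrt(two_r / 2.0))

def two_r_exponential(sample):
    """2R for exponential data: rate fixed at 1 versus rate fitted by ML.

    l1 = -n * mean               (rate fixed at 1)
    l2 = -n * log(mean) - n      (ML rate = 1 / mean)
    """
    n = len(sample)
    mean = sum(sample) / n
    return 2.0 * n * (mean - 1.0 - math.log(mean))

def simulate(n_trials=2000, n_points=200, seed=0):
    """Draw data from the smaller model and collect 2R for each data set."""
    rng = random.Random(seed)
    return [
        two_r_exponential([rng.expovariate(1.0) for _ in range(n_points)])
        for _ in range(n_trials)
    ]

values = simulate()
threshold = 3.84  # a value of 2R for which Q(2R | 1) is close to 0.05
empirical = sum(v > threshold for v in values) / len(values)

print("fraction of simulated 2R above threshold:", empirical)
print("Q(threshold | 1) from the chi-square tail:", chi2_tail_one_dof(threshold))
```

Because the data really come from the smaller model, every increase in likelihood here is due to chance, and the fraction of simulated 2R values above a threshold should be close to Q evaluated at that threshold.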

In practice, we often require Q to be smaller than a threshold of 0.05, 0.01 or even 0.001 before accepting the additional parameters. For example, when we increase the number of parameters from four to five and obtain the statistic 2R₁, we have Q(2R₁|1) = Q(1/2, R₁) ≡ Q₁. We then increase the number of parameters from five to six and obtain Q₂. If we set our threshold at 0.05, and Q₁ < 0.05 while Q₂ > 0.05, the most appropriate number of parameters is five. However, if Q₂ is still less than 0.05, we need to increase the number of parameters further.
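The stepwise procedure described here can be sketched in a few lines of Python. This is a minimal illustration that assumes parameters are added one at a time (ν = 1), so that Q(2R|1) = Q(1/2, R) = erfc(√R); the function names and the maximized log-likelihood values are hypothetical and not taken from any real fit.

```python
import math

def q_one_extra_parameter(l_small, l_large):
    """Q(2R | 1) = Q(1/2, R) = erfc(sqrt(R)), with R = l_large - l_small."""
    r = l_large - l_small
    if r <= 0.0:
        return 1.0  # no improvement at all: fully consistent with chance
    return math.erfc(math.sqrt(r))

def select_parameter_count(max_log_likelihoods, threshold=0.05):
    """Return the first parameter count whose successor is not significant.

    max_log_likelihoods maps a parameter count to the maximized
    log-likelihood of that model; counts are assumed to be consecutive.
    Returns None if every increase is significant, meaning that models
    with even more parameters still need to be tried.
    """
    counts = sorted(max_log_likelihoods)
    for n, n_next in zip(counts, counts[1:]):
        q = q_one_extra_parameter(max_log_likelihoods[n],
                                  max_log_likelihoods[n_next])
        if q > threshold:
            return n
    return None

# Hypothetical maximized log-likelihoods for 4, 5 and 6 parameters.
fits = {4: -1050.0, 5: -1040.0, 6: -1039.5}

print("Q1 (4 -> 5):", q_one_extra_parameter(fits[4], fits[5]))
print("Q2 (5 -> 6):", q_one_extra_parameter(fits[5], fits[6]))
print("selected number of parameters:", select_parameter_count(fits))
```

With these made-up numbers the step from four to five parameters is significant at the 0.05 level while the step from five to six is not, so the procedure settles on five parameters, mirroring the example in the text.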

Figure 2.5: P(α, x) and Q(α, x) are integrals of the pdf of the χ² distribution. The integration range of P is from 0 to x, and that of Q is from x to ∞.
