Experiment result : The Mixture Gaussian Selection

Bootstrap for Model Selection

4.3 Experiment Result: The Mixture Gaussian Selection

In subsection 3.3.5, we showed that a mixture Gaussian density P(t) can be expressed as a linear combination of component densities p_r(t|Θ_r) in the form

\[ P(t) = \sum_{r=1}^{R} P_r \, p_r(t \mid \Theta_r), \tag{4.4} \]

and that a histogram vector H(t) can be approximated by P(t) using their cross-entropy with the EM algorithm proposed in subsection 3.3.5.

In general, before using the EM algorithm to estimate a mixture Gaussian, one must first decide the number of components in the mixture. How to determine this number, however, remains an important issue. In this section, we use the form of Eq. (4.4) to construct a mixture Gaussian with R = 3 as follows:

\[ P_3(t) = \tfrac{1}{3} N\!\left(\tfrac{1}{3}, 0.25\right) + \tfrac{1}{3} N\!\left(\tfrac{2}{3}, 0.25\right) + \tfrac{1}{3} N(1.0, 0.25) \quad \text{(true model)}. \tag{4.5} \]

We generate a data set, say B₀, with sample size n = 300 from the true model in Eq. (4.5):

B₀ = (y₁, · · · , y_n), n = 300.
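As an illustration of how a data set such as B₀ could be drawn from the true model in Eq. (4.5), the following is a minimal sketch using only the Python standard library. The function name `sample_true_model` and the seed are hypothetical, and the sketch assumes that the second argument 0.25 of N(·, ·) is a variance; if it is meant as a standard deviation, `STD` should be set to 0.25 instead.

```python
import random

# True model of Eq. (4.5): three equally weighted Gaussian components.
WEIGHTS = [1 / 3, 1 / 3, 1 / 3]
MEANS = [1 / 3, 2 / 3, 1.0]
# Assumption: 0.25 is read as a variance, so the standard deviation is 0.5.
STD = 0.25 ** 0.5


def sample_true_model(n, seed=0):
    """Draw n observations from the three-component mixture."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Pick a component according to the mixing weights, then sample it.
        r = rng.choices(range(3), weights=WEIGHTS)[0]
        data.append(rng.gauss(MEANS[r], STD))
    return data


B0 = sample_true_model(300)  # plays the role of B_0 = (y_1, ..., y_n), n = 300
```

Fixing the seed makes the generated sample reproducible, which is convenient when comparing candidate models on the same data.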

Based on the original data set B₀, we consider the fitted model:

M_m(t) =

Next, we consider the following 10 models, R = 1, · · · , 10, and ask which one is the best.

To show how to select the true model, the following fitted models are considered:

Model M₁(t) = N(1.0, 0.25)

Following the idea of the bootstrap, we use the bootstrap algorithm to select a suitable m.

First, by an argument similar to that of Eq. (4.3), we define the residual for the ith observation as follows:

\[ E_i = y_i - \hat{y}_i, \quad i = 1, \dots, n. \tag{4.7} \]

Table 4.1: Comparison results of OB and WB algorithms with different bootstrap replications B in Neural Model - Mixture Gaussian Selection

| Alg. | B | M1 µ_boot | M1 σ_boot | M2 µ_boot | M2 σ_boot | M3 µ_boot | M3 σ_boot | M4 µ_boot | M4 σ_boot | M5 µ_boot | M5 σ_boot |
|---|---|---|---|---|---|---|---|---|---|---|---|
| WB | 25 | 0.4534 | 0.2443 | 0.5384 | 0.3431 | 0.0717 | 0.0525 | 1.0229 | 0.7667 | 1.2735 | 0.8721 |
| WB | 50 | 0.4326 | 0.2732 | 0.5377 | 0.3427 | 0.0709 | 0.0520 | 1.0108 | 0.7576 | 1.2367 | 0.8714 |
| OB | 50 | 0.4713 | 0.2545 | 0.5736 | 0.3634 | 0.0864 | 0.0557 | 1.0312 | 0.7363 | 1.2778 | 0.8332 |
| OB | 100 | 0.4744 | 0.2617 | 0.5764 | 0.3652 | 0.0858 | 0.0554 | 1.0245 | 0.7269 | 1.2654 | 0.8234 |

| Alg. | B | M6 µ_boot | M6 σ_boot | M7 µ_boot | M7 σ_boot | M8 µ_boot | M8 σ_boot | M9 µ_boot | M9 σ_boot | M10 µ_boot | M10 σ_boot |
|---|---|---|---|---|---|---|---|---|---|---|---|
| WB | 25 | 1.4247 | 0.9314 | 1.5772 | 1.0321 | 1.8023 | 1.1724 | 2.0752 | 1.2544 | 2.2231 | 1.4051 |
| WB | 50 | 1.4027 | 0.9231 | 1.4752 | 1.0232 | 1.7859 | 1.1582 | 2.0109 | 1.2321 | 2.1247 | 1.3833 |
| OB | 50 | 1.4321 | 0.9662 | 1.7652 | 1.0724 | 1.9702 | 1.1892 | 2.1037 | 1.2346 | 2.2646 | 1.3632 |
| OB | 100 | 1.4141 | 0.9526 | 1.7321 | 1.0682 | 1.9622 | 1.1661 | 2.0347 | 1.2266 | 2.2385 | 1.3256 |

We then define the weighted bootstrap (WB) algorithm with the resampling probability:

\[ Q_i = \frac{\exp(-|E_i|)}{\sum_{j=1}^{n} \exp(-|E_j|)}, \quad i = 1, \dots, n. \tag{4.8} \]

For each model, we also compute µ_boot and σ_boot based on the OB and WB algorithms. The results are listed in Table 4.1, which indicates that the best model for the WB algorithm is M₃ for both B = 25 and B = 50. It is natural to pose the question: “Which one is appropriate?”

Since the difference between 0.0717 and 0.0709 is negligible, we choose the smaller number of replications, B = 25, for the WB algorithm. Likewise, the best model for the OB algorithm is M₃, and the difference between 0.0864 and 0.0858 is also negligible. Thus, we choose B = 50 for the OB algorithm in the mixture Gaussian selection models.
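The resampling probability of Eq. (4.8) and the bootstrap summaries µ_boot and σ_boot can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, OB is assumed to denote the ordinary bootstrap with uniform resampling, and each resample is scored by its mean squared residual, which may differ from the criterion built on Eq. (4.3).

```python
import math
import random
import statistics


def wb_probabilities(residuals):
    """Resampling probabilities Q_i of Eq. (4.8)."""
    w = [math.exp(-abs(e)) for e in residuals]
    total = sum(w)
    return [wi / total for wi in w]


def bootstrap_summary(y, y_hat, B, weighted, seed=0):
    """Return (mu_boot, sigma_boot) over B bootstrap resamples.

    Assumption: each resample is scored by its mean squared residual.
    """
    rng = random.Random(seed)
    n = len(y)
    residuals = [yi - fi for yi, fi in zip(y, y_hat)]  # Eq. (4.7)
    # Assumed OB: uniform resampling (weights=None); WB: weights Q_i.
    probs = wb_probabilities(residuals) if weighted else None
    scores = []
    for _ in range(B):
        idx = rng.choices(range(n), weights=probs, k=n)
        scores.append(sum(residuals[i] ** 2 for i in idx) / n)
    return statistics.mean(scores), statistics.stdev(scores)
```

Because Q_i decreases with |E_i|, observations with large residuals are drawn less often under WB than under uniform resampling.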

When deciding the number of Gaussian components in the above process, we use the OB and WB algorithms to resample in order to obtain more samples, and then compare all the candidate Gaussian models. It is well known that a mixture model with too many components may achieve a lower error rate but suffers from high complexity, which is not only impractical to implement but also leads to over-fitting. On the other hand, exhaustively testing all the candidate Gaussian models one by one results in a large computational burden. To avoid this, we apply another application of the bootstrap, the Bootstrap Likelihood Ratio Test [55], [56], to perform a simple hypothesis test that can filter out models with too many components. The likelihood ratio is the ratio of the likelihood function over two different sets or models, and the likelihood ratio test is a statistical test that decides between two hypotheses based on this ratio. In statistical inference, the likelihood function, often denoted l(θ|x), is a function of the parameters of a statistical model. It is defined as:

l(θ|x) = f (x|θ)

A statistical model is a parametrized family of probability density functions (or probability mass functions) f(x|θ), and a hypothesis test specifies models under both the null hypothesis H₀ and the alternative hypothesis H₁, i.e.:

H₀ : θ = θ₀

H₁ : θ = θ₁

Then the likelihood ratio test statistic described above can be written as:

\[ \Omega(x) = \frac{l(\theta_0 \mid x)}{l(\theta_1 \mid x)} = \frac{f(x \mid \theta_0)}{f(x \mid \theta_1)}. \tag{4.9} \]

Generally speaking, the likelihood ratio Ω(x) is small if the alternative model H₁ is better than the null model H₀, and the likelihood ratio test provides the following decision rule:

1. If Ω ≥ ∆, do not reject H₀;
2. If Ω < ∆, reject H₀;

where the threshold ∆ is usually chosen to obtain a specified significance level α. That is, the likelihood ratio test rejects the null hypothesis if the value of this statistic is smaller than the threshold ∆ corresponding to the significance level of the test.
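To make the decision rule concrete, the following minimal sketch evaluates Ω(x) of Eq. (4.9) for two simple hypotheses. It assumes, purely for illustration, i.i.d. Gaussian observations with known unit standard deviation and mean θ; the function names and the threshold passed as `delta` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def likelihood(theta, xs, std=1.0):
    """l(theta | x) = f(x | theta) for i.i.d. Gaussian data with mean theta."""
    return math.prod(gaussian_pdf(x, theta, std) for x in xs)


def likelihood_ratio(xs, theta0, theta1, std=1.0):
    """Omega(x) of Eq. (4.9): likelihood under H0 over likelihood under H1."""
    return likelihood(theta0, xs, std) / likelihood(theta1, xs, std)


def reject_h0(xs, theta0, theta1, delta):
    """Decision rule: reject H0 when Omega < delta."""
    return likelihood_ratio(xs, theta0, theta1) < delta
```

For observations clustered near 1, such as `[1.0, 1.1, 0.9]`, the ratio for θ₀ = 0 against θ₁ = 1 is well below 1, so H₀ is rejected for a threshold of 1. For long samples, a product of densities can underflow, so working with log-likelihoods is usually preferable.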

Now, following Titterington [57], we denote the likelihood ratio statistic for testing between R and R + 1 components as T_R^{R+1}(θ) = 2[L(θ^{R+1}) − L(θ^R)], where L(θ) denotes the log of the likelihood function l(θ), and θ^R denotes the estimated parameters of the R-component mixture. The bootstrap likelihood ratio procedure is described below:

1. For 1 ≤ R ≤ 4, estimate the parameters θ^R and θ^{R+1} associated with the R-component and (R + 1)-component mixtures, evaluate L(θ^R) and L(θ^{R+1}) respectively, and obtain T_R^{R+1}(θ);

2. Starting with R = 1, generate 99 bootstrap samples of size n from the R-component model with the estimated parameters θ̂, and calculate a value of T_R^{R+1} for each of them;

3. If the observed T_R^{R+1} is larger than at least 94 of the bootstrapped values, increase R by 1 and repeat steps 2 and 3 (the maximum value of R is 4);

4. Otherwise, assume that the number of mixture components is R and stop.
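The four steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the one-dimensional EM routine, its initialisation from equal chunks of the sorted data, the variance floor, and all function names are hypothetical, and the cap on R is read as returning the maximum value when every test rejects.

```python
import math
import random


def log_pdf(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_mix(x, w, mu, var):
    """Mixture log density at x (log-sum-exp) and the per-component terms."""
    logs = [math.log(wr) + log_pdf(x, m, v) for wr, m, v in zip(w, mu, var)]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs)), logs


def fit_gmm(xs, R, iters=50):
    """Fit an R-component mixture by EM; return (L, w, mu, var)."""
    n, s = len(xs), sorted(xs)
    mean = sum(xs) / n
    floor = 1e-3 * sum((x - mean) ** 2 for x in xs) / n + 1e-12
    # Assumed initialisation: split the sorted data into R equal chunks.
    chunks = [s[r * n // R:(r + 1) * n // R] for r in range(R)]
    w = [len(c) / n for c in chunks]
    mu = [sum(c) / len(c) for c in chunks]
    var = [max(sum((x - m) ** 2 for x in c) / len(c), floor)
           for c, m in zip(chunks, mu)]
    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            total, logs = log_mix(x, w, mu, var)
            resp.append([math.exp(l - total) for l in logs])
        # M-step: update weights, means and variances.
        for r in range(R):
            nr = sum(g[r] for g in resp) + 1e-12
            w[r] = nr / n
            mu[r] = sum(g[r] * x for g, x in zip(resp, xs)) / nr
            var[r] = max(sum(g[r] * (x - mu[r]) ** 2
                             for g, x in zip(resp, xs)) / nr, floor)
    return sum(log_mix(x, w, mu, var)[0] for x in xs), w, mu, var


def lrt_statistic(xs, R):
    """T = 2 [L(theta^{R+1}) - L(theta^R)] and the fitted R-component model."""
    fit_r = fit_gmm(xs, R)
    return 2.0 * (fit_gmm(xs, R + 1)[0] - fit_r[0]), fit_r[1:]


def sample_gmm(n, w, mu, var, rng):
    """Draw n observations from a fitted mixture (parametric bootstrap)."""
    comps = rng.choices(range(len(w)), weights=w, k=n)
    return [rng.gauss(mu[r], math.sqrt(var[r])) for r in comps]


def select_components(xs, max_R=4, n_boot=99, n_exceed=94, seed=0):
    """Bootstrap likelihood ratio selection of the number of components."""
    rng = random.Random(seed)
    for R in range(1, max_R + 1):
        t_obs, model = lrt_statistic(xs, R)                  # step 1
        t_boot = [lrt_statistic(sample_gmm(len(xs), *model, rng), R)[0]
                  for _ in range(n_boot)]                    # step 2
        if sum(t_obs > t for t in t_boot) < n_exceed:        # steps 3 and 4
            return R
    return max_R
```

With the default settings, each stage refits two mixtures on 100 data sets, so the pure-Python version is slow for large n; smaller values of `n_boot` and `n_exceed` are convenient for quick checks.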

In the next chapter, we introduce the proposed EM-based instance learning for CBIR and the Multiple-Instance Learning Neural Network (MINN) for CBIR. The former, the EM-based CBIR system, uses a feature extraction similar to that in [18], which provides not only color information but also some spatial information. The latter, the MINN CBIR system, uses the newly proposed feature extraction methods, called the Weighted Color Histogram and the Weighted Texture Histogram. It is worth mentioning that in subsection 5.2.2 we apply the WB algorithm proposed in this subsection to determine a suitable number of components in the mixture Gaussian, and then use the EM algorithm to estimate the remaining parameters of the mixture Gaussian that is of interest to the user.

Chapter 5
