Experiments - National University of Kaohsiung Repository System:Item 310360000Q/10632

In this section, we illustrate the performance of our new variable selection procedures through several simulations and a real example. Our procedures are called MCMC forward selection (MCMC FS) when the Gibbs sampler version of the matching pursuit algorithm is used, and MCMC windowed stepwise selection (MCMC WSS) when the Metropolized matching pursuit algorithm is used. Six simulated examples are presented first, followed by a real example from Draper and Smith (1981).

Example 4.1. This example considers a simple variable selection problem, presented in George and McCulloch (1993), with p = 5 regressors of length n = 60. The regressors are generated as independent standard normal vectors, $X_1, \dots, X_5 \overset{iid}{\sim} N_{60}(0, I)$, so that they are practically uncorrelated. The dependent variable is generated according to the model

$$Y = X_4 + 1.2X_5 + \varepsilon,$$

where $\varepsilon \sim N_{60}(0, 2.5^2 I)$, so the true coefficient vector is β = (0, 0, 0, 1, 1.2). We set the parameter vector (ρ, σ, τ) of our procedures to (0.5, 1, 10), and in MCMC WSS we use a single window containing all the regressors. We run each of the two MCMC matching pursuit algorithms for 5000 iterations. The first 1000 iterations are discarded, and the remaining 4000 iterations are used to compute the posterior model probabilities and the marginal probability of each X_i. Table 1 lists the models with the highest posterior probabilities, and Table 2 displays the marginal probability of each X_i.
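The data-generating step and the burn-in rule can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code: the helper names `simulate_linear` and `discard_burn_in`, and the fixed seeds, are hypothetical choices. The same helper also reproduces the designs of Examples 4.2 and 4.3 by changing `beta` and `n`.

```python
import numpy as np


def simulate_linear(beta, n, noise_sd=2.5, seed=0):
    """Hypothetical helper: iid N(0, 1) regressors and Y = X @ beta + eps, eps ~ N(0, noise_sd^2 I)."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    X = rng.standard_normal((n, beta.size))
    y = X @ beta + rng.normal(0.0, noise_sd, size=n)
    return X, y


def discard_burn_in(draws, n_burn=1000):
    """Hypothetical helper: drop the first n_burn MCMC draws (1000 of 5000 leaves 4000)."""
    return np.asarray(draws)[n_burn:]


# Example 4.1: p = 5, n = 60, beta = (0, 0, 0, 1, 1.2)
X41, y41 = simulate_linear([0, 0, 0, 1, 1.2], n=60)

# Example 4.2: p = 10, n = 60
X42, y42 = simulate_linear([2, 3, 0, 0, 4, 5, 0, 0, 6, 7], n=60)

# Example 4.3: p = 100, n = 200, beta_i = 3 for i = 10, 20, ..., 90 (1-based indices)
beta43 = np.zeros(100)
beta43[9:90:10] = 3.0  # 0-based positions 9, 19, ..., 89
X43, y43 = simulate_linear(beta43, n=200)
```

Note the 1-based indexing in the text versus NumPy's 0-based indexing: the slice `9:90:10` marks exactly the nine regressors X₁₀, . . . , X₉₀.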

Table 1. Models with the highest posterior probabilities in Example 4.1

MCMC FS                             MCMC WSS
Model variables   Posterior prob.   Model variables   Posterior prob.
4 5               0.703             4 5               0.715
1 4 5             0.114             2 4 5             0.104
2 4 5             0.083             1 4 5             0.088
3 4 5             0.074             3 4 5             0.074

Table 2. The marginal probability of X_i in Example 4.1

Variable                        X₁      X₂      X₃      X₄   X₅
MCMC FS marginal probability    0.114   0.130   0.084   1    1
MCMC WSS marginal probability   0.127   0.126   0.105   1    1

From Table 1, MCMC FS and MCMC WSS both select the model Y = f(X₄, X₅), which has the highest posterior probability and coincides with the true model. In Table 2, the marginal probabilities of X₄ and X₅ equal 1, well above 0.5, while those of the other regressors are below 0.5.

Hence, according to the median probability criterion, we also select Y = f(X₄, X₅), the same as the true model.
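Both selection rules used above can be computed directly from the retained MCMC output. The sketch below assumes, as a hypothetical representation, that the sampler output is a T × p array of 0/1 inclusion indicators (one row per retained iteration). The highest posterior probability model is the most frequently visited model, and the median probability criterion keeps X_i whenever its marginal inclusion probability exceeds 0.5. The function name `posterior_summaries` is illustrative only.

```python
from collections import Counter

import numpy as np


def posterior_summaries(gammas, threshold=0.5):
    """Summarise post-burn-in draws of 0/1 inclusion indicators (shape T x p).

    Returns (model_probs, hpm, marginal, median_model); models are tuples of
    1-based regressor indices, matching the notation of Tables 1 to 6.
    """
    gammas = np.asarray(gammas, dtype=bool)
    T = gammas.shape[0]
    # Posterior model probabilities estimated by visit frequencies
    counts = Counter(tuple(int(j) + 1 for j in np.flatnonzero(g)) for g in gammas)
    model_probs = {model: c / T for model, c in counts.most_common()}
    hpm = next(iter(model_probs))  # most frequently visited model
    # Marginal inclusion probability of each X_i
    marginal = gammas.mean(axis=0)
    median_model = tuple(int(j) + 1 for j in np.flatnonzero(marginal > threshold))
    return model_probs, hpm, marginal, median_model


# Toy output with p = 5: ten retained draws, mostly visiting {X4, X5}
toy = np.array([[0, 0, 0, 1, 1]] * 7 + [[1, 0, 0, 1, 1]] * 2 + [[0, 1, 0, 1, 1]])
probs, hpm, marginal, median_model = posterior_summaries(toy)
```

In this toy case the two criteria agree, as they do in Example 4.1; Example 4.7 shows they need not agree in general.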

Example 4.2. This example considers a variable selection problem with p = 10 regressors of length n = 60. The regressors X₁, . . . , X₁₀ are drawn independently from N(0, 1). The dependent variable is generated by

$$Y = 2X_1 + 3X_2 + 4X_5 + 5X_6 + 6X_9 + 7X_{10} + \varepsilon,$$

where $\varepsilon \sim N_{60}(0, 2.5^2 I)$, so the coefficients β = (β₁, . . . , β₁₀) are (2, 3, 0, 0, 4, 5, 0, 0, 6, 7). With the same parameter setup as in Example 4.1, we run each procedure for 5000 iterations of posterior sampling and keep the last 4000 iterations to compute the posterior probabilities. The results are shown in Tables 3 and 4.

Table 3. Models with the highest posterior probabilities in Example 4.2

MCMC FS                                MCMC WSS
Model variables     Posterior prob.    Model variables     Posterior prob.
1 2 5 6 9 10        0.376              1 2 5 6 9 10        0.341
1 2 5 6 8 9 10      0.166              1 2 5 6 7 9 10      0.2
1 2 5 6 7 9 10      0.14               1 2 5 6 8 9 10      0.155
1 2 4 5 6 9 10      0.07               1 2 3 5 6 9 10      0.08

Table 4. The marginal probability of X_i in Example 4.2

Marginal probability   X₁  X₂  X₃      X₄      X₅  X₆  X₇      X₈      X₉  X₁₀
MCMC FS                1   1   0.132   0.068   1   1   0.239   0.216   1   1
MCMC WSS               1   1   0.193   0.165   1   1   0.311   0.246   1   1

From Table 3, the most frequently visited model, Y = f(X₁, X₂, X₅, X₆, X₉, X₁₀), appears with frequency 37.6% under MCMC FS and 34.1% under MCMC WSS. Both procedures therefore select exactly the variables in the true model.
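Because the posterior probabilities are computed from the retained draws, each entry of Tables 3 and 4 is a relative frequency over the last 4000 of 5000 iterations. In the notation below (ours, not the source's), M^{(t)} is the model visited at iteration t and γ_i^{(t)} equals 1 if X_i is in that model and 0 otherwise:

$$\hat P(M \mid Y) = \frac{1}{4000}\sum_{t=1001}^{5000} \mathbf{1}\{M^{(t)} = M\}, \qquad \hat P(\gamma_i = 1 \mid Y) = \frac{1}{4000}\sum_{t=1001}^{5000} \gamma_i^{(t)}.$$

For example, the frequency 37.6% reported for MCMC FS corresponds to 0.376 × 4000 = 1504 of the 4000 retained draws, and 34.1% for MCMC WSS to 0.341 × 4000 = 1364 draws.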

In Table 4, the marginal probabilities of X₁, X₂, X₅, X₆, X₉ and X₁₀ equal 1, and those of the other regressors are less than 0.5. Hence, the median probability models chosen by our two procedures are also the same as the true model.

Example 4.3. This example demonstrates the performance of the two MCMC variable selection procedures on a larger data set. We construct 100 regressors X₁, . . . , X₁₀₀ of length n = 200, where $X_1, \dots, X_{100} \overset{iid}{\sim} N(0, 1)$. The coefficients β = (β₁, . . . , β₁₀₀) are set to β_i = 3 for i = 10, 20, 30, 40, 50, 60, 70, 80, 90, and to zero otherwise. Hence the dependent variable is generated according to the model

$$Y = 3X_{10} + 3X_{20} + 3X_{30} + 3X_{40} + 3X_{50} + 3X_{60} + 3X_{70} + 3X_{80} + 3X_{90} + \varepsilon,$$

where $\varepsilon \sim N_{200}(0, 2.5^2 I)$. The parameter vector (ρ, σ, τ) of our procedures is set to (0.5, 2, 10). In MCMC WSS we can choose a window of L regressors: instead of searching for the active regressors among all p regressors at once, each iteration of the MCMC algorithm updates the model using only the regressors in this window. We write MCMC WSS(L) for the procedure with a window of L regressors, and MCMC WSS(off) for the case L = p. We run each procedure for 5000 iterations and keep the last 4000 iterations to compute the posterior probabilities. The results are shown in Tables 5 and 6.
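The windowing idea can be illustrated with a generic Metropolis add/delete sampler whose updates are restricted to a window. This is a minimal sketch, not the authors' Metropolized matching pursuit algorithm. Two substitutions are ours: the model score is a BIC stand-in (−BIC/2 of an OLS fit), and the window is drawn as a uniformly random subset of L indices, which is an assumption since the text does not say how the window is chosen. All function names are hypothetical.

```python
import numpy as np


def bic_log_score(X, y, gamma):
    """Stand-in model score (our substitution): -BIC/2 of an OLS fit with intercept."""
    n = len(y)
    Xg = np.column_stack([np.ones(n), X[:, np.flatnonzero(gamma)]])
    coef, *_ = np.linalg.lstsq(Xg, y, rcond=None)
    rss = float(np.sum((y - Xg @ coef) ** 2))
    return -0.5 * n * np.log(rss / n) - 0.5 * Xg.shape[1] * np.log(n)


def windowed_sweep(gamma, score, X, y, window_size, rng):
    """One iteration: draw a window of L regressors and Metropolis-update only those."""
    p = gamma.size
    # Hypothetical window rule: a uniformly random subset of L indices (L = p mimics WSS(off))
    window = rng.choice(p, size=window_size, replace=False)
    for j in window:
        proposal = gamma.copy()
        proposal[j] = not proposal[j]  # symmetric add/delete proposal for X_j
        new_score = bic_log_score(X, y, proposal)
        # Symmetric proposal, so the acceptance ratio is the score difference
        if np.log(rng.uniform()) < new_score - score:
            gamma, score = proposal, new_score
    return gamma, score


def run_windowed_sampler(X, y, window_size, n_iter=300, seed=0):
    rng = np.random.default_rng(seed)
    gamma = np.zeros(X.shape[1], dtype=bool)  # start from the empty model
    score = bic_log_score(X, y, gamma)
    draws = []
    for _ in range(n_iter):
        gamma, score = windowed_sweep(gamma, score, X, y, window_size, rng)
        draws.append(gamma.copy())
    return np.array(draws)


# Small demo in the spirit of Example 4.3 (p = 20 instead of 100 to keep it fast)
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 20))
beta = np.zeros(20)
beta[[4, 14]] = 3.0  # true regressors X_5 and X_15 (1-based)
y = X @ beta + rng.normal(0.0, 2.5, size=200)
draws = run_windowed_sampler(X, y, window_size=3)
```

With `window_size=X.shape[1]` each iteration sweeps every regressor in random order, playing the role of MCMC WSS(off) in this sketch; a smaller window updates fewer regressors per iteration, so each iteration is cheaper but changes the model less.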

Table 5. Models with the highest posterior probabilities in Example 4.3, MCMC FS

Model variables                   Posterior prob.
10 20 30 40 50 60 70 80 90        0.153
10 20 30 40 50 60 70 79 80 90     0.089
10 20 30 40 44 50 60 70 80 90     0.054
10 20 30 31 40 50 60 70 80 90     0.045

Table 6. Models with the highest posterior probabilities in Example 4.3, MCMC WSS

MCMC WSS(off)
Model variables                   Posterior prob.
10 20 30 40 50 60 70 80 90        0.099
10 20 30 31 40 50 60 70 80 90     0.089
10 20 30 40 44 50 60 70 80 90     0.054
10 20 30 40 48 50 60 70 80 90     0.045

MCMC WSS(3)
Model variables                   Posterior prob.
10 20 30 40 50 60 70 80 90        0.026
10 20 30 40 50 60 70 80 90 95     0.013
10 20 30 31 40 50 60 70 80 90     0.007
10 20 30 40 50 60 70 80 87 90     0.005

From Table 5, the model selected by MCMC FS is

Y = f(X₁₀, X₂₀, X₃₀, X₄₀, X₅₀, X₆₀, X₇₀, X₈₀, X₉₀)

which is the same as the true model. From Table 6, MCMC WSS also selects the true model, whether or not the windowed version is used.

Table 7. The marginal probability of X_i in Example 4.3

MCMC FS
Variable               X₁₀  X₂₀  X₃₀    X₄₀    X₅₀    X₆₀    X₇₀
Marginal probability   1    1    1      1      1      1      1
Variable               X₈₀  X₉₀  X₈₅    X₂     X₂₅    X₇₈    X₈₇
Marginal probability   1    1    0.209  0.205  0.175  0.150  0.148

MCMC WSS(off)
Variable               X₁₀  X₂₀  X₃₀    X₄₀    X₅₀    X₆₀    X₇₀
Marginal probability   1    1    1      1      1      1      1
Variable               X₈₀  X₉₀  X₂     X₅₁    X₈₇    X₇₄    X₉₅
Marginal probability   1    1    0.138  0.127  0.119  0.095  0.067

MCMC WSS(3)
Variable               X₁₀  X₂₀  X₃₀    X₄₀    X₅₀    X₆₀    X₇₀
Marginal probability   1    1    1      1      1      1      1
Variable               X₈₀  X₉₀  X₉₅    X₈₅    X₂     X₅₁    X₈₇
Marginal probability   1    1    0.090  0.080  0.079  0.073  0.060

Table 7 shows, for each procedure, the fourteen variables with the highest marginal posterior probabilities. The marginal posterior probabilities of the nine true regressors all equal 1, and those of the other regressors are all below 0.5. Thus, under the median probability criterion, our procedures still recover the true model.

Example 4.4. Here the regressors X = [X₁, . . . , X_p] are constructed to be dependent.

We construct n = 180 observations on p = 15 potential regressors X_i as follows: let $G, G_1, \dots, G_{15} \overset{iid}{\sim} N(0, I_n)$ and set $X_i = G_i + 2G$ for i = 1, . . . , 15. Under this setup, the sample correlation between X_i and X_j (i ≠ j) is about 0.8.
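The value 0.8 follows from the construction: Cov(X_i, X_j) = Var(2G) = 4 for i ≠ j, and Var(X_i) = 1 + 4 = 5, so the population correlation is 4/5 = 0.8. The sketch below generates this design and checks the average pairwise sample correlation; it is an illustration under stated assumptions, with hypothetical function names and seeds, and it also builds β_i = 2i/15 and Y as described in the next paragraph.

```python
import numpy as np


def correlated_design(n=180, p=15, shared_weight=2.0, seed=0):
    """X_i = G_i + shared_weight * G, with G, G_1, ..., G_p iid N(0, I_n)."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal(n)            # common component shared by every regressor
    G_own = rng.standard_normal((n, p))   # regressor-specific components G_1, ..., G_p
    return G_own + shared_weight * G[:, None]


def mean_offdiag_corr(X):
    """Average sample correlation over all pairs i != j."""
    R = np.corrcoef(X, rowvar=False)
    p = R.shape[0]
    return (R.sum() - p) / (p * (p - 1))


# Population value: Cov(X_i, X_j) = 4 and Var(X_i) = 1 + 4 = 5, so the correlation is 0.8
theoretical_corr = 2.0 ** 2 / (1.0 + 2.0 ** 2)

X44 = correlated_design()
beta44 = 2.0 * np.arange(1, 16) / 15.0  # evenly spaced beta_i = 2i/15
y44 = X44 @ beta44 + np.random.default_rng(1).normal(0.0, 2.5, size=180)
```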

We define evenly spaced values β_i = 2i/15 for i = 1, . . . , 15 and set σ = 2.5. The dependent variable Y is generated by Y ∼ N(Xβ, σ²I). Here (ρ, σ, τ) is fixed at (0.5, 5, 10). We run each procedure for 5000 iterations of posterior sampling and keep the last 2000 iterations. We also make one adjustment to MCMC WSS: at each iteration, if a regressor is not deleted at the death stage, we resample its value of β_i rather than keeping the original value. Table 8 shows the variables selected by the median probability criterion, and Figures 1 to 3 show the marginal posterior probabilities of the regressors under the different methods. From these figures, our two procedures identify a model similar to the one found by SSVS.

Figure 1: The marginal probability of X_i computed by SSVS in Example 4.4

Figure 2: The marginal probability of X_i computed by MCMC FS in Example 4.4

Figure 3: The marginal probability of X_i computed by MCMC WSS in Example 4.4

Table 8. The variables selected by the median probability criterion in Example 4.4

Method     Variables selected
SSVS       X₆, X₇, X₈, X₉, X₁₀, X₁₁, X₁₂, X₁₃, X₁₄, X₁₅
MCMC FS    X₆, X₇, X₈, X₉, X₁₀, X₁₁, X₁₂, X₁₃, X₁₄, X₁₅
MCMC WSS   X₅, X₆, X₇, X₈, X₉, X₁₀, X₁₁, X₁₂, X₁₃, X₁₄, X₁₅

Example 4.5. We construct p = 15 potential regressors with n = 180, $X_1, \dots, X_{15} \overset{iid}{\sim} N(0, I_n)$, and set β_i = 2i/15 for i = 1, . . . , 15. The dependent variable Y is generated from N(Xβ, σ²I) with different values of σ: four values, σ = 5, 10, 15 and 20, are considered in order to show the effect of noise. Here (ρ, σ, τ) is fixed at (0.5, 2, 10). We run each procedure for 5000 iterations of posterior sampling and keep the last 2000 iterations. Figures 4 to 11 show the marginal posterior probabilities of the regressors for the different procedures and values of σ. With MCMC FS, the marginal posterior probability increases with the regressor index i in Figures 4 and 5. For σ = 15 and 20, neither procedure selects the right model.
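One rough way to see why the noise level matters is a back-of-envelope signal-to-noise calculation, which is our heuristic and not part of the source. With iid N(0, 1) regressors the columns are nearly orthogonal, so the standard error of the least-squares estimate of β_i is roughly σ/√n, giving an approximate z-score of β_i√n/σ. The function name `approx_z` is hypothetical, and the noise level is called `noise_sd` to avoid confusion with the procedure parameter σ in (ρ, σ, τ).

```python
import math


def approx_z(i, noise_sd, n=180, p=15):
    """Rough z-score of beta_i = 2i/p for iid N(0, 1) regressors: beta_i * sqrt(n) / noise_sd."""
    beta_i = 2.0 * i / p
    return beta_i * math.sqrt(n) / noise_sd


for noise_sd in (5, 10, 15, 20):
    zs = [approx_z(i, noise_sd) for i in (1, 8, 15)]
    print(noise_sd, [round(z, 2) for z in zs])
```

Under this heuristic, even the largest coefficient β₁₅ = 2 has an approximate z-score of about 5.4 at σ = 5 but only about 1.8 and 1.3 at σ = 15 and 20, which may help explain why no procedure recovers the right model at those noise levels.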

Example 4.6. We construct n = 180 observations on p = 20 potential regressors by generating $X_1, \dots, X_{20} \overset{iid}{\sim} N(0, I_n)$. We set β_i = 0 for i = 1, . . . , 5, and β_i = 2i/15 for i = 6, . . . , 20. The dependent variable Y is generated by Y ∼ N(Xβ, 2.5²I).

Then (ρ, σ, τ) is set to (0.5, 5, 10). We run each procedure for 5000 iterations of posterior sampling and keep the last 2000 iterations. Figures 12 to 14 show that X₁, . . . , X₅ are not selected, whichever procedure is used.

Figure 4: The marginal probability of X_i computed with σ = 5 by MCMC FS

Figure 5: The marginal probability of X_i computed with σ = 10 by MCMC FS

Figure 6: The marginal probability of X_i computed with σ = 15 by MCMC FS

Figure 8: The marginal probability of X_i computed with σ = 5 by MCMC WSS

Figure 9: The marginal probability of X_i computed with σ = 10 by MCMC WSS

Figure 10: The marginal probability of X_i computed with σ = 15 by MCMC WSS

Figure 12: The marginal probability of X_i computed by SSVS

Figure 13: The marginal probability of X_i computed by MCMC FS

Figure 14: The marginal probability of X_i computed by MCMC WSS

Example 4.7. In this example, the response Y_i is the heat produced during the hardening of cement (in calories per gram). There are 4 candidate regressors: X_i1 = percentage of input composed of tricalcium aluminate, X_i2 = percentage of input composed of tricalcium silicate, X_i3 = percentage of input composed of tetracalcium alumino ferrite, and X_i4 = percentage of input composed of dicalcium silicate. The data set contains only 13 observations. In MCMC FS we set ρ = 0.5, σ = 3 or 4, and τ = 1 or 10. In MCMC WSS we set ρ = 0.5, σ = 4, 6 or 7, τ = 1 or 10, and window = 4 for each setting. For each setting, we run 10000 iterations and discard the first 5000 iterations. All results are given in Tables 9 and 10, whose columns are indexed by (σ, τ).

Table 9. The posterior probabilities of models in Example 4.7

                  MCMC FS                           MCMC WSS
Model variables   (3,1)  (4,1)  (3,10)  (4,10)     (4,10)  (6,1)  (7,1)  (6,10)  (7,10)
1 2               0.047  0.312  0.317   0.789      0.026   0.016  0.028  0.072   0.922
1 2 3             0.083  0.229  0.050   0.057      0.958   0.798  0.829  0.783   0.056
1 2 4             0.750  0.370  0.614   0.148      0.010   0.101  0.042  0.011   0.017
1 2 3 4           0.120  0.089  0.018   0.005      0.006   0.092  0.080  -       -

Table 10. The marginal probabilities and median probability models M_Z in Example 4.7

                  MCMC FS                           MCMC WSS
Variable          (3,1)  (4,1)  (3,10)  (4,10)     (4,10)  (6,1)  (7,1)  (6,10)  (7,10)
x₁                1      1      1       1          1       0.996  0.979  0.875   0.997
x₂                1      1      1       1          1       1      1      1       1
x₃                0.203  0.319  0.069   0.063      0.964   0.882  0.918  0.804   0.057
x₄                0.870  0.459  0.632   0.153      0.016   0.183  0.123  0.022   0.018
M_Z               1,2,4  1,2    1,2,4   1,2        1,2,3   1,2,3  1,2,3  1,2,3   1,2

As described by Draper and Smith (1981), three models are favored by conventional selection procedures. The model Y = f(X₁, X₂), with R² = 97.9%, is favored by all-subsets regression, backward elimination and stepwise regression; the model Y = f(X₁, X₄), with R² = 97.2%, is also favored by all-subsets regression; and the model Y = f(X₁, X₂, X₄), with R² = 98.2%, is favored by forward selection.

We apply the two MCMC variable selection procedures to these data. First, we discuss the performance of MCMC FS. From Table 9, under (σ, τ) = (3,1), (4,1) and (3,10), the model with the highest posterior probability is Y = f(X₁, X₂, X₄), and under (σ, τ) = (4,10) it is Y = f(X₁, X₂). The posterior probability of f(X₁, X₂) also increases as σ or τ increases. From Table 10, the marginal probabilities of X₁ and X₂ are always 1, whatever the setting of σ and τ. According to the median probability criterion, the median probability model is Y = f(X₁, X₂, X₄) under (σ, τ) = (3,1) and (3,10), and Y = f(X₁, X₂) under (σ, τ) = (4,1) and (4,10). As σ or τ increases, the marginal probability of X₄ decreases, as does that of X₃, except when σ increases from 3 to 4 with τ = 1. Our results are similar to those of Draper and Smith (1981): the models Y = f(X₁, X₂) and Y = f(X₁, X₂, X₄) are favored by MCMC FS.

Next, we discuss the performance of MCMC WSS. From Table 9, under (σ, τ) = (7,10) the model with the highest posterior probability is Y = f(X₁, X₂), and in the other settings it is Y = f(X₁, X₂, X₃). From Table 10, the marginal probabilities of X₁ and X₂ are close to 1 in every setting. Only under (σ, τ) = (7,10) is the median probability model Y = f(X₁, X₂); in all other settings it is Y = f(X₁, X₂, X₃). Our results suggest that the larger σ or τ is, the more likely we are to choose Y = f(X₁, X₂). In these runs, MCMC WSS favors Y = f(X₁, X₂, X₃) except under (σ, τ) = (7,10). Although this result differs from those of Draper and Smith (1981), the model Y = f(X₁, X₂, X₃) is still reasonable, because X₁ and X₃ are strongly correlated (sample correlation −0.824) and the model attains R² = 98.2%, the same as Y = f(X₁, X₂, X₄).
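The R² values quoted from Draper and Smith (1981) and the correlation between X₁ and X₃ can be recomputed directly. The sketch below assumes that the data are the commonly reproduced 13-observation Hald cement data (columns X₁ to X₄, then Y); the values are not listed in this section, so treat them as an assumption to check against Draper and Smith (1981). The function name `r_squared` is hypothetical.

```python
from itertools import combinations

import numpy as np

# Assumed data: Hald cement data as commonly reproduced (13 observations; columns X1..X4)
X = np.array([
    [7, 26, 6, 60], [1, 29, 15, 52], [11, 56, 8, 20], [11, 31, 8, 47],
    [7, 52, 6, 33], [11, 55, 9, 22], [3, 71, 17, 6], [1, 31, 22, 44],
    [2, 54, 18, 22], [21, 47, 4, 26], [1, 40, 23, 34], [11, 66, 9, 12],
    [10, 68, 8, 12],
], dtype=float)
y = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7,
              72.5, 93.1, 115.9, 83.8, 113.3, 109.4])


def r_squared(model):
    """R^2 of an OLS fit with intercept; model is a tuple of 1-based regressor labels."""
    Xm = np.column_stack([np.ones(len(y)), X[:, [j - 1 for j in model]]])
    coef, *_ = np.linalg.lstsq(Xm, y, rcond=None)
    resid = y - Xm @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


# R^2 for every two- and three-regressor model
for k in (2, 3):
    for model in combinations(range(1, 5), k):
        print(model, f"R^2 = {100 * r_squared(model):.1f}%")

# Sample correlation between X1 and X3
r13 = np.corrcoef(X[:, 0], X[:, 2])[0, 1]
print(f"corr(X1, X3) = {r13:.3f}")
```

On these values, Y = f(X₁, X₂, X₃) and Y = f(X₁, X₂, X₄) reach essentially the same R², which is consistent with the argument that either three-regressor model is a reasonable choice.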
