In this section, we illustrate the performance of our new variable selection procedures with several simulations and a real example. Our procedures are called MCMC forward selection (MCMC FS) when the Gibbs sampler of the matching pursuit algorithm is used, and MCMC windowed stepwise selection (MCMC WSS) when the Metropolized matching pursuit algorithm is used. Six simulated examples are shown first, followed by a real example presented in Draper and Smith (1981).
Example 4.1. This example considers a simple variable selection problem, presented in George and McCulloch (1993), with p = 5 regressors of length n = 60. The regressors are obtained as independent standard normal vectors, X1, . . . , X5 iid ∼ N60(0, I), so that they are practically uncorrelated. The dependent variable is generated according to the model
Y = X4 + 1.2X5 + ε,
where ε ∼ N60(0, 2.5²I), so that β = (0, 0, 0, 1, 1.2). We set the parameter vector (ρ, σ, τ) in our procedures to (0.5, 1, 10), and in MCMC WSS we use a single window containing all the regressors. We run 5000 iterations of each of the two MCMC matching pursuit algorithms. The first 1000 iterations are discarded, and the remaining 4000 iterations are used to compute the posterior model probabilities and the marginal probability of each Xi. Table 1 lists the models with the highest posterior probabilities, and Table 2 gives the marginal probability of each Xi.
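The data-generating step of this example can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `simulate_example_4_1` and the seed are our own assumptions, and the sketch does not reproduce the random draws behind Tables 1 and 2.

```python
import random

def simulate_example_4_1(n=60, p=5, noise_sd=2.5, seed=0):
    """Draw one data set with iid N(0, 1) regressors and Y = X4 + 1.2*X5 + eps."""
    rng = random.Random(seed)
    beta = [0.0, 0.0, 0.0, 1.0, 1.2]  # true coefficients stated in the text
    # X[i][j] holds observation i of regressor X_{j+1}.
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    # eps has standard deviation 2.5, i.e. covariance 2.5^2 * I.
    Y = [sum(b * x for b, x in zip(beta, row)) + rng.gauss(0.0, noise_sd)
         for row in X]
    return X, Y, beta

X, Y, beta = simulate_example_4_1()
print(len(X), len(X[0]), len(Y))  # 60 5 60
```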
Table 1. Higher posterior probability models in Example 4.1

MCMC FS                                MCMC WSS
Model variables   Posterior prob.      Model variables   Posterior prob.
4 5               0.703                4 5               0.715
1 4 5             0.114                2 4 5             0.104
2 4 5             0.083                1 4 5             0.088
3 4 5             0.074                3 4 5             0.074
Table 2. The marginal probability of Xi in Example 4.1

Variable                         X1      X2      X3      X4   X5
MCMC FS marginal probability     0.114   0.130   0.084   1    1
MCMC WSS marginal probability    0.127   0.126   0.105   1    1
From Table 1, MCMC FS and MCMC WSS select the same model, Y = f(X4, X5), which has the highest posterior probability and coincides with the true model. In Table 2, the marginal probabilities of X4 and X5 exceed 0.5, and the others are below 0.5.
Hence, according to the median probability criterion, we also select Y = f(X4, X5), the same as the true model.
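The two summaries used in this example, posterior model probabilities and marginal probabilities, are relative frequencies over the retained iterations. A minimal sketch is given below; the function name `summarize_draws` and the toy chain are our own assumptions for illustration and are not output of MCMC FS or MCMC WSS.

```python
from collections import Counter

def summarize_draws(draws, burn_in):
    """Summarize a chain of 0/1 inclusion-indicator tuples, one per iteration."""
    kept = draws[burn_in:]  # discard the burn-in iterations
    total = len(kept)
    # Posterior model probability: relative frequency of each visited model.
    model_prob = {m: c / total for m, c in Counter(kept).items()}
    # Marginal probability of X_j: share of kept draws that include X_j.
    p = len(kept[0])
    marginal = [sum(d[j] for d in kept) / total for j in range(p)]
    # Median probability model: variables whose marginal probability exceeds 0.5.
    median_model = [j + 1 for j, pr in enumerate(marginal) if pr > 0.5]
    return model_prob, marginal, median_model

# Hypothetical chain: 2 burn-in draws followed by 4 retained draws.
draws = [(1, 1, 1, 1, 1), (0, 0, 0, 0, 0),
         (0, 0, 0, 1, 1), (0, 0, 0, 1, 1), (1, 0, 0, 1, 1), (0, 0, 0, 1, 1)]
model_prob, marginal, median_model = summarize_draws(draws, burn_in=2)
print(median_model)  # [4, 5]
```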
Example 4.2. This example considers a variable selection problem with p = 10 regressors of length n = 60. The regressors X1, . . . , X10 are drawn independently from N(0, 1). The dependent variable is generated by
Y = 2X1 + 3X2 + 4X5 + 5X6 + 6X9 + 7X10 + ε,
where ε ∼ N60(0, 2.5²I), and the coefficients β = (β1, . . . , β10) are set to (2, 3, 0, 0, 4, 5, 0, 0, 6, 7). With the same parameter setup as in Example 4.1, we run each of our procedures for 5000 iterations of posterior sampling and keep the last 4000 iterations for computing the posterior probabilities. The results are shown in Tables 3 and 4.
Table 3. Higher posterior probability models in Example 4.2

MCMC FS                                MCMC WSS
Model variables   Posterior prob.      Model variables   Posterior prob.
1 2 5 6 9 10      0.376                1 2 5 6 9 10      0.341
1 2 5 6 8 9 10    0.166                1 2 5 6 7 9 10    0.2
1 2 5 6 7 9 10    0.14                 1 2 5 6 8 9 10    0.155
1 2 4 5 6 9 10    0.07                 1 2 3 5 6 9 10    0.08
Table 4. The marginal probability of Xi in Example 4.2

Variable                         X1   X2   X3      X4      X5   X6   X7      X8      X9   X10
MCMC FS marginal probability     1    1    0.132   0.068   1    1    0.239   0.216   1    1
MCMC WSS marginal probability    1    1    0.193   0.165   1    1    0.311   0.246   1    1
From Table 3, the most frequent model, Y = f(X1, X2, X5, X6, X9, X10), appears with frequencies of 37.6% in MCMC FS and 34.1% in MCMC WSS. Both procedures select the same variables, which are exactly those in the true model.
In Table 4, the marginal probabilities of X1, X2, X5, X6, X9, X10 are equal to 1, and the others are less than 0.5. Hence, the median probability models chosen by our two procedures are also the same as the true model.
Example 4.3. This example demonstrates the performance of the two MCMC variable selection procedures on a large data set. We construct 100 regressors, X1, . . . , X100, of length n = 200, where X1, . . . , X100 iid ∼ N(0, 1). The coefficients β = (β1, . . . , β100) are set to βi = 3 for i = 10, 20, 30, 40, 50, 60, 70, 80, 90, and to zero otherwise. Hence the dependent variable is generated according to the model
Y = 3X10 + 3X20 + 3X30 + 3X40 + 3X50 + 3X60 + 3X70 + 3X80 + 3X90 + ε,
where ε ∼ N200(0, 2.5²I). The parameter vector (ρ, σ, τ) in our procedures is set to (0.5, 2, 10). In the MCMC WSS procedure, we can choose a window of L regressors: instead of searching for active regressors among all the regressors at once, at each iteration of the MCMC algorithm we update the model using only the regressors in this window. We write MCMC WSS(L) for the procedure with a window of L regressors, and MCMC WSS(off) when L is equal to p. We run 5000 iterations of each of our procedures and keep the last 4000 iterations for computing the posterior probabilities. The results are shown in Tables 5 and 6.
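The windowing idea can be sketched as below. The text of this example does not state how the window is drawn, so the uniform random draw without replacement used here is purely an assumption for illustration, as is the function name `choose_window`.

```python
import random

def choose_window(p, L, rng):
    """Return the 1-based indices of the regressors eligible for an update.

    ASSUMPTION: the window is a uniform random subset of size L. The actual
    window construction in MCMC WSS may differ.
    """
    if not 1 <= L <= p:
        raise ValueError("window size L must satisfy 1 <= L <= p")
    if L == p:
        # MCMC WSS(off): the window contains all p regressors.
        return list(range(1, p + 1))
    return sorted(rng.sample(range(1, p + 1), L))

rng = random.Random(1)
print(choose_window(100, 100, rng)[:3])  # [1, 2, 3]
window = choose_window(100, 3, rng)      # three candidate regressors, as in MCMC WSS(3)
```

With L = 3 and p = 100, each iteration inspects only three candidate regressors, which reduces the cost per iteration compared with searching all regressors.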
Table 5. Higher posterior probability models in Example 4.3

MCMC FS
Model variables                  Posterior probability
10 20 30 40 50 60 70 80 90       0.153
10 20 30 40 50 60 70 79 80 90    0.089
10 20 30 40 44 50 60 70 80 90    0.054
10 20 30 31 40 50 60 70 80 90    0.045

Table 6. Higher posterior probability models in Example 4.3

MCMC WSS(off)
Model variables                  Posterior probability
10 20 30 40 50 60 70 80 90       0.099
10 20 30 31 40 50 60 70 80 90    0.089
10 20 30 40 44 50 60 70 80 90    0.054
10 20 30 40 48 50 60 70 80 90    0.045

MCMC WSS(3)
Model variables                  Posterior probability
10 20 30 40 50 60 70 80 90       0.026
10 20 30 40 50 60 70 80 90 95    0.013
10 20 30 31 40 50 60 70 80 90    0.007
10 20 30 40 50 60 70 80 87 90    0.005
From Table 5, the model selected by MCMC FS is
Y = f(X10, X20, X30, X40, X50, X60, X70, X80, X90),
which is the same as the true model. From Table 6, the true model is still selected by MCMC WSS whether or not the windowed version is used.
Table 7. The marginal probability of Xi in Example 4.3

MCMC FS
Variable               X10   X20   X30     X40     X50     X60     X70
Marginal probability   1     1     1       1       1       1       1
Variable               X80   X90   X85     X2      X25     X78     X87
Marginal probability   1     1     0.209   0.205   0.175   0.150   0.148

MCMC WSS(off)
Variable               X10   X20   X30     X40     X50     X60     X70
Marginal probability   1     1     1       1       1       1       1
Variable               X80   X90   X2      X51     X87     X74     X95
Marginal probability   1     1     0.138   0.127   0.119   0.095   0.067

MCMC WSS(3)
Variable               X10   X20   X30     X40     X50     X60     X70
Marginal probability   1     1     1       1       1       1       1
Variable               X80   X90   X95     X85     X2      X51     X87
Marginal probability   1     1     0.090   0.080   0.079   0.073   0.060

Table 7 shows, for each procedure, the fourteen variables with the highest marginal posterior probabilities. The marginal posterior probabilities of the correct variables are all equal to 1, and the probabilities of the other regressors are all less than 0.5. Thus, based on the median probability criterion, our procedures still work well.
Example 4.4. Here a dependence structure is imposed on X = [X1, . . . , Xp].
We construct n = 180 observations on p = 15 potential regressors Xi by drawing G, G1, . . . , G15 iid ∼ N(0, In) and setting Xi = Gi + 2G for i = 1, . . . , 15. Under this setup, the sample correlation of Xi and Xj is about 0.8 for every pair i ≠ j.
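The value 0.8 follows from the construction: Var(Xi) = 1 + 4 = 5 and Cov(Xi, Xj) = 4 for i ≠ j, so the population correlation is 4/5. A minimal simulation sketch is given below; the function names and the seed are our own assumptions.

```python
import random
from math import sqrt

def correlated_design(n, p, rng):
    """Return p columns X_i = G_i + 2G, with G, G_1, ..., G_p iid N(0, I_n)."""
    G = [rng.gauss(0.0, 1.0) for _ in range(n)]  # shared component
    return [[rng.gauss(0.0, 1.0) + 2.0 * g for g in G] for _ in range(p)]

def sample_corr(a, b):
    """Pearson sample correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / sqrt(s_aa * s_bb)

cols = correlated_design(n=180, p=15, rng=random.Random(0))
# Sample correlation of X1 and X2; it fluctuates around 0.8 for n = 180.
print(round(sample_corr(cols[0], cols[1]), 2))
```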
We define evenly spaced values βi = 2i/15 for i = 1, . . . , 15, and set the noise level σ = 2.5. The dependent variable Y is generated by Y ∼ N(Xβ, σ²I). The parameter vector (ρ, σ, τ) of our procedures is fixed at (0.5, 5, 10). We run each of our procedures for 5000 iterations of posterior sampling and keep the last 2000 iterations. Here we make an adjustment in MCMC WSS: at each iteration, if a regressor is not deleted at the death stage, we sample the value of its βi again rather than keeping the original value. Table 8 shows the variables selected by the median probability criterion. Figures 1 to 3 show the marginal posterior probabilities of the regressors for the different methods. From these figures, our two procedures identify a model similar to the one found by SSVS.

Figure 1: The marginal probability of Xi computed by SSVS in Example 4.4
Figure 2: The marginal probability of Xi computed by MCMC FS in Example 4.4
Figure 3: The marginal probability of Xi computed by MCMC WSS in Example 4.4
Table 8. The variables selected for the median probability criterion in Example 4.4

Method      Variables selected for the median probability criterion
SSVS        X6, X7, X8, X9, X10, X11, X12, X13, X14, X15
MCMC FS     X6, X7, X8, X9, X10, X11, X12, X13, X14, X15
MCMC WSS    X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15
Example 4.5. We construct 15 potential regressors with n = 180 observations, X1, . . . , X15 iid ∼ N(0, In). We set βi = 2i/15 for i = 1, . . . , 15. The dependent variable Y is generated from N(Xβ, σ²I) with four different noise levels, σ = 5, 10, 15, 20, in order to show the effect of noise. The parameter vector (ρ, σ, τ) of our procedures is fixed at (0.5, 2, 10). We run each of our procedures for 5000 iterations of posterior sampling and keep the last 2000 iterations. Figures 4 to 11 show the marginal posterior probabilities of the regressors for the different procedures and values of σ². With MCMC FS, the marginal posterior probability is increasing in i in Figures 4 and 5. For σ = 15 and 20, neither procedure selects the right model.
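A rough way to see why the larger noise levels are difficult is to compare each βi with an approximate standard error. The sketch below uses the standard least squares approximation that, for independent unit-variance regressors, the standard error of a coefficient estimate is about σ/√n. This heuristic is our own addition for illustration and is not part of the MCMC procedures; the function names are hypothetical.

```python
from math import sqrt

def beta_i(i):
    """Evenly spaced coefficient beta_i = 2i/15 from the text."""
    return 2.0 * i / 15.0

def rough_z_ratio(i, noise_sd, n=180):
    """beta_i divided by the approximate standard error noise_sd / sqrt(n).

    HEURISTIC (assumption): valid only for roughly orthogonal regressors
    with unit variance.
    """
    return beta_i(i) * sqrt(n) / noise_sd

for noise_sd in (5, 10, 15, 20):
    ratios = [round(rough_z_ratio(i, noise_sd), 2) for i in (1, 8, 15)]
    print(noise_sd, ratios)  # ratios for the smallest, a middle, and the largest coefficient
```

Under this approximation, even the largest coefficient, β15 = 2, has a ratio below 2 when σ = 20, which is consistent with the difficulty reported for σ = 15 and 20.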
Example 4.6. We construct n = 180 observations on p = 20 potential regressors by generating X1, . . . , X20 iid ∼ N(0, In). We set βi = 0 for i = 1, . . . , 5, and βi = 2i/15 for i = 6, . . . , 20. The dependent variable Y is generated by Y ∼ N(Xβ, 2.5²I).
The parameter vector (ρ, σ, τ) is set to (0.5, 5, 10). We run each of our procedures for 5000 iterations of posterior sampling and keep the last 2000 iterations. From Figures 12 to 14, X1, . . . , X5 are not chosen, whichever procedure is used for selection.
Figure 4: The marginal probability of Xi computed with σ = 5 by MCMC FS
Figure 5: The marginal probability of Xi computed with σ = 10 by MCMC FS
Figure 6: The marginal probability of Xi computed with σ = 15 by MCMC FS
Figure 8: The marginal probability of Xi computed with σ = 5 by MCMC WSS
Figure 9: The marginal probability of Xi computed with σ = 10 by MCMC WSS
Figure 10: The marginal probability of Xi computed with σ = 15 by MCMC WSS
Figure 12: The marginal probability of Xi computed by SSVS
Figure 13: The marginal probability of Xi computed by MCMC FS
Figure 14: The marginal probability of Xi computed by MCMC WSS
Example 4.7. In this example, the response Yi is the heat produced in the hardening of cement (in calories per gram). There are 4 candidate regressors: Xi1 = percentage of input composed of tricalcium aluminate, Xi2 = percentage of input composed of tricalcium silicate, Xi3 = percentage of input composed of tricalcium alumino ferrite, Xi4 = percentage of input composed of dicalcium silicate. There are only 13 observations in the data set. In MCMC FS, we set ρ = 0.5, σ = 3 or 4, and τ = 1 or 10. In MCMC WSS, we set ρ = 0.5, σ = 4, 6 or 7, and τ = 1 or 10, and set window = 4 for each setting. For each setting, we run 10000 iterations and discard the first 5000 iterations. All results are in Tables 9 and 10.

Table 9. The posterior probability models of Example 4.7

                  MCMC FS, (σ, τ) =                  MCMC WSS, (σ, τ) =
Model variables   (3,1)   (4,1)   (3,10)   (4,10)    (4,10)   (6,1)   (7,1)   (6,10)   (7,10)
1 2               0.047   0.312   0.317    0.789     0.026    0.016   0.028   0.072    0.922
1 2 3             0.083   0.229   0.050    0.057     0.958    0.798   0.829   0.783    0.056
1 2 4             0.750   0.370   0.614    0.148     0.010    0.101   0.042   0.011    0.017
1 2 3 4           0.120   0.089   0.018    0.005     0.006    0.092   0.080   -        -

Table 10. The median probability models, MZ, of Example 4.7

                  MCMC FS, (σ, τ) =                  MCMC WSS, (σ, τ) =
Variable          (3,1)   (4,1)   (3,10)   (4,10)    (4,10)   (6,1)   (7,1)   (6,10)   (7,10)
x1                1       1       1        1         1        0.996   0.979   0.875    0.997
x2                1       1       1        1         1        1       1       1        1
x3                0.203   0.319   0.069    0.063     0.964    0.882   0.918   0.804    0.057
x4                0.870   0.459   0.632    0.153     0.016    0.183   0.123   0.022    0.018
MZ                1,2,4   1,2     1,2,4    1,2       1,2,3    1,2,3   1,2,3   1,2,3    1,2

As described by Draper and Smith (1981), three models are favored by conventional selection procedures. The model Y = f(X1, X2), yielding R² = 97.9%, is favored by all subsets regression, backward elimination, and stepwise regression; the model Y = f(X1, X4), yielding R² = 97.2%, is also favored by all subsets regression; and the model Y = f(X1, X2, X4), yielding R² = 98.2%, is favored by forward selection.
We apply the two MCMC versions of variable selection to these data. First, we discuss the performance of MCMC FS. From Table 9, under (σ, τ) = (3,1), (4,1), and (3,10), the model with the highest posterior probability is Y = f(X1, X2, X4), and under (σ, τ) = (4,10) the best model is Y = f(X1, X2). The posterior probability of f(X1, X2) increases as σ or τ increases. From Table 10, whatever the setting of σ and τ, the marginal probabilities of X1 and X2 are always 1. According to the median probability criterion, the median probability model is Y = f(X1, X2, X4) under (σ, τ) = (3,1) and (3,10), and Y = f(X1, X2) under (σ, τ) = (4,1) and (4,10). As σ or τ increases, the marginal probability of X4 decreases; that of X3 decreases as τ increases, although it rises from 0.203 to 0.319 when σ goes from 3 to 4 at τ = 1. Our results are similar to those of Draper and Smith (1981): the models Y = f(X1, X2) and Y = f(X1, X2, X4) are favored by MCMC FS.
Next we discuss the performance of MCMC WSS. From Table 9, under (σ, τ) = (7,10) the model with the highest posterior probability is Y = f(X1, X2), and Y = f(X1, X2, X3) is selected in the other settings. From Table 10, the marginal probabilities of X1 and X2 are close to 1 in each setting. Only under (σ, τ) = (7,10) is the median probability model Y = f(X1, X2); for the other settings the median probability model is Y = f(X1, X2, X3). From our results, it seems that the larger σ or τ is, the higher the chance of choosing Y = f(X1, X2). In this experiment, we prefer Y = f(X1, X2, X3) when σ is less than 7. Although this result differs from that in Draper and Smith (1981), Y = f(X1, X2, X3) is still reasonable, because the sample correlation of X1 and X3 is 0.824 and the corresponding R² = 98.2%.
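The MZ row of Table 10 can be checked directly from the tabulated marginal probabilities by keeping the variables whose marginal probability exceeds 0.5. The sketch below copies the values of Table 10; the dictionary layout and the function name are our own choices for illustration.

```python
# Marginal probabilities (x1, x2, x3, x4) copied from Table 10,
# keyed by (procedure, sigma, tau).
table10 = {
    ("MCMC FS", 3, 1): (1, 1, 0.203, 0.870),
    ("MCMC FS", 4, 1): (1, 1, 0.319, 0.459),
    ("MCMC FS", 3, 10): (1, 1, 0.069, 0.632),
    ("MCMC FS", 4, 10): (1, 1, 0.063, 0.153),
    ("MCMC WSS", 4, 10): (1, 1, 0.964, 0.016),
    ("MCMC WSS", 6, 1): (0.996, 1, 0.882, 0.183),
    ("MCMC WSS", 7, 1): (0.979, 1, 0.918, 0.123),
    ("MCMC WSS", 6, 10): (0.875, 1, 0.804, 0.022),
    ("MCMC WSS", 7, 10): (0.997, 1, 0.057, 0.018),
}

def median_probability_model(marginals, threshold=0.5):
    """Return the 1-based indices of variables with marginal probability > threshold."""
    return [j + 1 for j, pr in enumerate(marginals) if pr > threshold]

for setting, marginals in table10.items():
    print(setting, median_probability_model(marginals))
```

For example, under MCMC FS with (σ, τ) = (4,1) the marginal probability of x4 is 0.459, just below the threshold, so the median probability model is {1, 2} even though the model with the highest posterior probability in Table 9 is {1, 2, 4}.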