4.3 Results
4.3.1 Simulation
In this section, we report the simulation results and the performance of OGA+HDHQ and OGA+HDBIC on three types of models: multiple linear regression, autoregressive time-series, and threshold spiking neuron models. The MATLAB code used for this study is available to interested readers upon request.
Multiple linear regression model

Consider the regression model
$$
y_t = \sum_{j=1}^{q} \beta_j x_{tj} + \sum_{j=q+1}^{p} \beta_j x_{tj} + \epsilon_t, \qquad t = 1, \ldots, n, \tag{4.10}
$$
where β_{q+1} = · · · = β_p = 0, p ≫ n, the ε_t are i.i.d. N(0, σ²) and are independent of the x_{tj}. The regressors are designed as
$$
x_{tj} = d_{tj} + \eta w_t, \qquad j = 1, \ldots, p, \tag{4.11}
$$
where η ≥ 0 tunes the strength of the correlation between input variables, and (d_{t1}, . . . , d_{tp}, w_t)^⊤, 1 ≤ t ≤ n, are i.i.d. normal with zero mean and identity covariance matrix I. A simple calculation gives Corr(x_{tj}, x_{tk}) = η²/(1 + η²) for j ≠ k, which increases with η > 0. The input variables are mutually independent when η = 0.
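As a concrete illustration, the design (4.11) can be sketched as follows; the function name and the seed are illustrative choices, not part of the paper's code.

```python
import numpy as np

# Sketch of the regressor design (4.11): x_tj = d_tj + eta * w_t, where
# (d_t1, ..., d_tp, w_t) are i.i.d. standard normal.
def make_regressors(n, p, eta, rng):
    d = rng.standard_normal((n, p))   # independent components d_tj
    w = rng.standard_normal((n, 1))   # shared component w_t
    return d + eta * w                # x_tj = d_tj + eta * w_t

rng = np.random.default_rng(0)
x = make_regressors(100_000, 2, eta=2.0, rng=rng)
# The empirical correlation should be close to eta^2 / (1 + eta^2) = 0.8.
print(np.corrcoef(x[:, 0], x[:, 1])[0, 1])
```

With a large n the empirical correlation closely matches the theoretical value η²/(1 + η²).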
Consider (4.10) with q = 5, (β_1, . . . , β_5) = (3, −3.5, 4, −3.6, 3.2), and σ = 1. We choose K = D(n/log p)^{1/2} with D = 5 and c = 2.01 for HDHQ. We have also allowed D to vary between 3 and 10 and c to vary among 2.01, 2.51, 3.01, 3.51, and 4.01, but the results are quite similar across these choices. The results are shown in Table 4.1.
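For the settings in Table 4.1, the iteration bound K = D(n/log p)^{1/2} works out as follows; taking the integer part is our assumption, since the text does not specify how K is rounded.

```python
import math

# Iteration bound K = D * (n / log p)^{1/2} with D = 5. Taking the floor
# is an assumption; the text does not specify the rounding convention.
def num_iterations(n, p, D=5):
    return math.floor(D * math.sqrt(n / math.log(p)))

for n, p in [(50, 1000), (100, 2000), (200, 4000)]:   # settings in Table 4.1
    print(n, p, num_iterations(n, p))
```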
For sample sizes n ≥ 100, OGA includes all 5 relevant regressors within K iterations, and in about 99% of the 1000 simulations, HDHQ and HDBIC successfully identify the smallest correct model, irrespective of whether the candidate regressors x_j are uncorrelated (η = 0) or highly correlated (η = 2) with one another. OGA+HDBIC+Trim has the best performance, choosing the smallest correct model in all simulations. For comparison, we have also included the performance of OGA+BIC. Table 4.1 shows that for n ≥ 100, OGA+BIC always chooses the largest model Ĵ_K along the OGA path.
Define the mean squared prediction error as follows:
$$
\text{MSPE} = \frac{1}{1000} \sum_{l=1}^{1000} \Bigl( \sum_{j=1}^{p} \beta_j x_{n+1,j}^{(l)} - \hat{y}_{n+1}^{(l)} \Bigr)^2, \tag{4.12}
$$
where x^{(l)}_{n+1,1}, . . . , x^{(l)}_{n+1,p} are the regressors associated with y^{(l)}_{n+1}, the new outcome in the l-th simulation run, and ŷ^{(l)}_{n+1} denotes the predictor of y^{(l)}_{n+1}. The MSPEs of OGA+BIC are much larger than those of the other methods when n ≥ 100, due to serious overfitting.
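A scaled-down sketch of the prediction experiment behind (4.12): here we use an oracle least-squares fit on the five relevant regressors (the paper's methods select the variables first; the oracle fit, the run count, and the seed are illustrative assumptions).

```python
import numpy as np

# Generate data from model (4.10) with eta = 0 and sigma = 1, fit least
# squares on the five relevant regressors, and average the squared error
# between the true mean beta' x_{n+1} and the fitted prediction over runs.
rng = np.random.default_rng(1)
beta = np.array([3.0, -3.5, 4.0, -3.6, 3.2])
n, runs = 100, 200   # fewer runs than the paper's 1000, for speed

errors = []
for _ in range(runs):
    X = rng.standard_normal((n + 1, 5))        # independent regressors
    y = X[:n] @ beta + rng.standard_normal(n)  # responses with N(0, 1) noise
    bhat, *_ = np.linalg.lstsq(X[:n], y, rcond=None)
    true_mean = X[n] @ beta                    # noiseless target in (4.12)
    errors.append((true_mean - X[n] @ bhat) ** 2)

print(np.mean(errors))  # typically near q * sigma^2 / n = 0.05 in this oracle setting
```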
The traditional BIC can include all relevant variables, but it also retains the irrelevant variables selected along the OGA path. When p is large, many regression coefficients appear spuriously significant if one does not adjust for multiple testing [6]. The factor used in (4.4) of HDIC can be regarded as such an adjustment.
Autoregressive time-series model

Consider the autoregressive model
$$
y_t = 0.95\, y_{t-r_0} + \sum_{j=1}^{q} \beta_j x_{t-r_j,\,j} + \sum_{j=q+1}^{N-1} \sum_{r=1}^{m} \beta_{r,j} x_{t-r,\,j} + \epsilon_t, \qquad t = 1, \ldots, n, \tag{4.13}
$$
where β_{1,q+1} = · · · = β_{m,N−1} = 0, the ε_t are i.i.d. N(0, σ²) and are independent of the x_{tj}. Here, for simplicity, we assume that {y, x_j, j = 1, . . . , q} contribute to y only through the single lags {r_j, j = 0, . . . , q}, respectively. The regressors x_j are designed in the same way as in (4.11). After converting to multiple linear regression form, the total number of candidate variables is p := Nm ≫ n.
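The conversion to regression form can be sketched as follows; the helper name and the convention of placing the response series in column 0 are our illustrative choices.

```python
import numpy as np

# Sketch of converting the AR model (4.13) to multiple linear regression:
# each of the N candidate series (including y itself) contributes m lagged
# copies as regressors, giving p = N * m columns in the design matrix.
def build_lag_matrix(series, m):
    """series: (T, N) array of candidate series, response in column 0.
    Returns the (T - m, N * m) design matrix of lags 1..m and the aligned y."""
    T, N = series.shape
    cols = [series[m - r : T - r, :] for r in range(1, m + 1)]  # lag-r block
    X = np.hstack(cols)            # shape (T - m, N * m)
    y = series[m:, 0]              # y_t aligned with its lagged regressors
    return X, y

series = np.arange(12.0).reshape(6, 2)   # tiny example: T = 6, N = 2, m = 2
X, y = build_lag_matrix(series, m=2)
print(X.shape)  # (4, 4)
```

Each row of X pairs y_t with the lag-1 and lag-2 values of both series, which is the format OGA operates on.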
Table 4.1: Frequency, in 1000 simulations, of including all five relevant variables (Correct), of selecting exactly the relevant variables (E), of selecting all relevant variables and i irrelevant variables (Ei), and of selecting the largest model which includes all relevant variables (E∗).
η n p Method E E1 E2 E3∼5 E∗ Correct MSPE
0 50 1000 OGA+HDHQ 818 78 23 8 1 928 5.416
OGA+HDHQ+Trim 873 45 8 2 0 928 5.408
OGA+HDBIC 864 44 16 0 0 924 5.533
OGA+HDBIC+Trim 923 1 0 0 0 924 5.526
OGA+BIC 0 0 0 0 928 928 10.598
100 2000 OGA+HDHQ 991 9 0 0 0 1000 0.057
OGA+HDHQ+Trim 991 9 0 0 0 1000 0.057
OGA+HDBIC 999 1 0 0 0 1000 0.053
OGA+HDBIC+Trim 1000 0 0 0 0 1000 0.053
OGA+BIC 0 0 0 0 1000 1000 1.214
200 4000 OGA+HDHQ 1000 0 0 0 0 1000 0.023
OGA+HDHQ+Trim 1000 0 0 0 0 1000 0.023
OGA+HDBIC 1000 0 0 0 0 1000 0.023
OGA+HDBIC+Trim 1000 0 0 0 0 1000 0.023
OGA+BIC 0 0 0 0 1000 1000 0.942
2 50 1000 OGA+HDHQ 592 132 55 38 2 819 10.908
OGA+HDHQ+Trim 780 37 1 0 1 819 10.884
OGA+HDBIC 614 116 51 20 0 801 12.247
OGA+HDBIC+Trim 800 1 0 0 0 801 12.230
OGA+BIC 0 0 0 0 819 819 14.682
100 2000 OGA+HDHQ 989 11 0 0 0 1000 0.062
OGA+HDHQ+Trim 995 5 0 0 0 1000 0.062
OGA+HDBIC 994 6 0 0 0 1000 0.059
OGA+HDBIC+Trim 1000 0 0 0 0 1000 0.059
OGA+BIC 0 0 0 0 1000 1000 1.152
200 4000 OGA+HDHQ 1000 0 0 0 0 1000 0.027
OGA+HDHQ+Trim 1000 0 0 0 0 1000 0.027
OGA+HDBIC 1000 0 0 0 0 1000 0.027
OGA+HDBIC+Trim 1000 0 0 0 0 1000 0.027
OGA+BIC 0 0 0 0 1000 1000 1.022
Consider (4.13) with q = 5, (β_1, . . . , β_5) = (3, −3.5, 4, −3.6, 3.2), m = 3, (r_0, r_1, . . . , r_5) = (3, 1, 2, 3, 2, 1), and σ = 1. We choose K = D(n/log p)^{1/2} with D = 5 and c = 2.01 for HDHQ. Define the mean squared prediction error
$$
\text{MSPE} = \frac{1}{1000} \sum_{l=1}^{1000} \bigl( y_{n+1}^{(l)} - \hat{y}_{n+1}^{(l)} \bigr)^2, \tag{4.14}
$$
where y^{(l)}_{n+1} is the new outcome in the l-th simulation run, and ŷ^{(l)}_{n+1} denotes the predictor of y^{(l)}_{n+1}. This version of the MSPE is also used in the next subsection because in practice we only observe the data y and the true model coefficients are unknown. The simulation results are shown in Table 4.2. They are quite similar to those in Table 4.1, except that more samples are needed for the methods to perform well when η > 0. This is mainly due to the structural difference between the two models: the autoregressive model has temporal correlation among the lagged variables, whereas the linear regression model has i.i.d. samples.
Table 4.2: Simulation results for the autoregressive time-series model (see notation in Table 4.1).
η n p Method E E1 E2 E3∼5 E∗ Correct MSPE
0 50 750 OGA+HDHQ 793 94 22 9 1 919 5.491
OGA+HDHQ+Trim 859 44 11 4 1 919 5.490
OGA+HDBIC 847 54 13 0 0 914 6.047
OGA+HDBIC+Trim 912 2 0 0 0 914 6.046
OGA+BIC 0 0 0 0 919 919 10.964
100 1500 OGA+HDHQ 995 5 0 0 0 1000 1.017
OGA+HDHQ+Trim 995 5 0 0 0 1000 1.017
OGA+HDBIC 1000 0 0 0 0 1000 1.019
OGA+HDBIC+Trim 1000 0 0 0 0 1000 1.019
OGA+BIC 0 0 0 0 1000 1000 2.278
1 100 750 OGA+HDHQ 776 178 33 13 0 1000 1.142
OGA+HDHQ+Trim 983 16 1 0 0 1000 1.120
OGA+HDBIC 790 167 30 13 0 1000 1.139
OGA+HDBIC+Trim 999 1 0 0 0 1000 1.118
OGA+BIC 0 0 0 0 1000 1000 2.300
200 1500 OGA+HDHQ 991 9 0 0 0 1000 1.120
OGA+HDHQ+Trim 995 5 0 0 0 1000 1.120
OGA+HDBIC 996 4 0 0 0 1000 1.119
OGA+HDBIC+Trim 1000 0 0 0 0 1000 1.119
OGA+BIC 0 0 0 0 1000 1000 2.077
Threshold spiking neuron model
To illustrate the variable selection method in a neural spiking context, we simulate a large-scale sparse feedforward integrate-and-fire (I&F) neuron network (depicted in Figure 4.1). Briefly, excluding the target neuron, there are N − 1 neurons in total, but only q = 5 neurons provide inputs to the target. Three of the five have excitatory effects on the target and the other two have inhibitory effects. The target neuron is implemented as a direct discrete-time summation of the synaptic inputs {β_1, . . . , β_5} with propagation delays {r_1, . . . , r_5}, respectively. In addition, the background noise input is i.i.d. N(µ, σ) at every time step, with time resolution 1 ms. The neuron produces a spike when the internal potential reaches the threshold V_th = −55 mV, after which the potential is reset to V_reset = −80 mV. During the refractory period, the potential linearly recovers from V_reset to the resting potential V_rest = −70 mV. To model the diffusion of ions, the potential moves 2 mV toward V_rest per unit time, decreasing during depolarization and increasing during hyperpolarization, respectively. The internal potential is forced to lie in the range [E_{K+}, E_{Na+}] = [−90, 60] mV, the equilibrium potentials of K+ and Na+.
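A minimal sketch of the target neuron's update rule follows; the refractory recovery is omitted for brevity, and the update order (input, then drift, then threshold check) is our assumption, since the text leaves it open.

```python
import numpy as np

# Minimal sketch of the threshold (integrate-and-fire) target neuron at
# 1 ms resolution. Parameter values follow the text; refractory recovery
# is omitted, and the update order is an assumption.
V_RESET, V_TH, V_REST = -80.0, -55.0, -70.0   # mV
E_K, E_NA = -90.0, 60.0                       # allowed potential range (mV)
DECAY = 2.0                                   # mV drift toward V_rest per step

def simulate_target(syn_input, mu=0.0, sigma=0.0, rng=None):
    """syn_input[t]: total synaptic input (mV) at step t. Returns spike times."""
    rng = rng or np.random.default_rng()
    v, spikes = V_REST, []
    for t, s in enumerate(syn_input):
        noise = mu + sigma * rng.standard_normal() if sigma > 0 else mu
        v += s + noise
        # drift 2 mV toward the resting potential
        if v > V_REST:
            v = max(v - DECAY, V_REST)
        elif v < V_REST:
            v = min(v + DECAY, V_REST)
        v = min(max(v, E_K), E_NA)            # clip to [E_K+, E_Na+]
        if v >= V_TH:
            spikes.append(t)
            v = V_RESET                       # reset after a spike
    return spikes

# A single strong 20 mV input pulse crosses threshold (-70 + 20 - 2 = -52 >= -55).
print(simulate_target([20.0] + [0.0] * 5))  # [0]
```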
Figure 4.1: The simulated feedforward neuronal network. Red circles represent excitation and green squares represent inhibition. Source neurons 1–5 have synaptic inputs β_1, . . . , β_5 with propagation delays r_1, . . . , r_5, respectively. The target neuron N is implemented as a direct discrete-time summation of the synaptic inputs. The background noise input is i.i.d. N(µ, σ) at every time step, with time resolution 1 ms.
The remaining N − 1 − q neurons are independent of the target neuron, but all N − 1 neurons are modeled as Poisson point processes (X_{tj}) with the same firing rate:
$$
X_{tj} = D_{tj} + W_t, \qquad j = 1, \ldots, N-1, \tag{4.15}
$$
where (D_{t1}, . . . , D_{t,N−1}, W_t) are independent Poisson processes with firing rates (λ, . . . , λ, η) Hz. Thus {X_{tj}, j = 1, . . . , N − 1} have the same firing rate λ + η and are correlated with one another when η ≠ 0. The synaptic weights are fixed at (β_1, . . . , β_5) = (3, −3.5, 4, −3.6, 3.2) × 3 mV, and the associated propagation delays are fixed at (r_1, . . . , r_5) = (1, 2, 3, 2, 1) × 0.5 sec. Let n denote the data length after binning with bin size 0.5 sec, and let p = Nm denote the total number of candidate variables, where m is the chosen AR order (m = 3 here).
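The input design (4.15) together with the binning step can be sketched as follows; the function names are illustrative, and for a homogeneous Poisson process we generate the binned counts directly as Poisson draws.

```python
import numpy as np

# Sketch of the candidate-input design (4.15): each X_j is the superposition
# of an independent Poisson train D_j (rate lambda) and a shared train W
# (rate eta), binned with bin size 0.5 s. Function names are illustrative.
def poisson_counts(rate_hz, duration_s, bin_s, rng):
    """For a homogeneous Poisson process, binned counts are Poisson(rate*bin)."""
    n_bins = int(duration_s / bin_s)
    return rng.poisson(rate_hz * bin_s, size=n_bins)

def make_inputs(n_neurons, lam, eta, duration_s, bin_s, rng):
    shared = poisson_counts(eta, duration_s, bin_s, rng)               # W_t
    own = [poisson_counts(lam, duration_s, bin_s, rng) for _ in range(n_neurons)]
    return [d + shared for d in own]                                   # X_tj = D_tj + W_t

rng = np.random.default_rng(2)
xs = make_inputs(n_neurons=3, lam=30.0, eta=10.0, duration_s=500.0, bin_s=0.5, rng=rng)
# Each binned series has mean count (lam + eta) * bin = 20 per bin.
print(float(np.mean(xs[0])))
```

The shared train W makes the inputs correlated whenever η ≠ 0, mirroring the η = 0 versus η > 0 scenarios in the simulations below.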
We begin with Simulation 1, in which the parameter settings are λ = 40 Hz, η = 0 Hz, and µ = σ = 0; that is, the candidate input processes are uncorrelated with each other, have common firing rate 40 Hz, and there is no background noise input to the target neuron. The simulated firing trajectories and the simulation results are shown in Figure 4.2 and Table 4.3, respectively. In about 99% of the 1000 simulations, both HDHQ and HDBIC successfully identify the smallest correct model as the sample size n increases. For small sample sizes, however, HDBIC is too conservative and does not perform well, whereas HDHQ+Trim identifies the smallest correct set of input neurons in 948 simulations and includes 1–2 extra irrelevant neurons in only 16 simulations. The traditional BIC always chooses the largest set of neurons along the OGA path, causing serious overfitting (largest MSPE).
Figure 4.2: The firing behaviors of the source and target neurons in Simulation 1.
The associated parameter settings are: (β1, . . . , β5) = (3, −3.5, 4, −3.6, 3.2)× 3 mV, (r1, . . . , r5) = (1, 2, 3, 2, 1)× 0.5 sec., N = 250, n = 100, m = 3, p = N m = 750, λ = 40, η = µ = σ = 0.
Table 4.3: Results of Simulation 1 in threshold spiking neuron model (see notation in Table 4.1). The associated parameter settings are: λ = 40 Hz, η = 0 Hz, µ = 0, σ = 0.
n p Method E E1 E2 E3∼5 E∗ Correct MSPE
100 750 OGA+HDHQ 900 57 6 1 0 964 1.221
OGA+HDHQ+Trim 948 15 1 0 0 964 1.222
OGA+HDBIC 760 4 0 0 0 764 1.680
OGA+HDBIC+Trim 761 0 0 0 0 761 1.680
OGA+BIC 0 0 0 0 993 993 2.455
200 1500 OGA+HDHQ 995 5 0 0 0 1000 1.188
OGA+HDHQ+Trim 995 5 0 0 0 1000 1.188
OGA+HDBIC 999 1 0 0 0 1000 1.185
OGA+HDBIC+Trim 999 1 0 0 0 1000 1.185
OGA+BIC 0 0 0 0 1000 1000 2.041
400 2250 OGA+HDHQ 997 3 0 0 0 1000 1.215
OGA+HDHQ+Trim 997 3 0 0 0 1000 1.215
OGA+HDBIC 1000 0 0 0 0 1000 1.217
OGA+HDBIC+Trim 1000 0 0 0 0 1000 1.217
OGA+BIC 0 0 0 0 1000 1000 1.920
In Simulation 2, background noise is present with mean and standard deviation (µ, σ) = (0.15, 0.1). Results are shown in Table 4.4. HDBIC still converges to the smallest set of input neurons as n becomes large enough, but it performs poorly when n is small. HDHQ+Trim performs relatively well when n is small, selecting the correct set of neurons in 969 simulations. However, it tends to include 1–2 irrelevant neurons at all n, since the target neuron in this simulation is driven not only by the five input neurons but also by the unmeasurable background noise. Nevertheless, compared to BIC, including 1–2 irrelevant neurons out of a large candidate set is still acceptable and good enough for practical use.
Table 4.4: Results of Simulation 2 in threshold spiking neuron model (see notation in Table 4.1). The associated parameter settings are: λ = 40 Hz, η = 0 Hz, µ = 0.15, σ = 0.1.
n p Method E E1 E2 E3∼5 E∗ Correct MSPE
100 750 OGA+HDHQ 770 176 22 1 0 969 1.285
OGA+HDHQ+Trim 811 153 5 0 0 969 1.279
OGA+HDBIC 649 14 0 0 0 663 1.937
OGA+HDBIC+Trim 656 3 0 0 0 663 1.934
OGA+BIC 0 0 0 0 997 997 2.469
200 1500 OGA+HDHQ 728 263 9 0 0 1000 1.241
OGA+HDHQ+Trim 728 264 8 0 0 1000 1.241
OGA+HDBIC 997 3 0 0 0 1000 1.227
OGA+HDBIC+Trim 997 3 0 0 0 1000 1.227
OGA+BIC 0 0 0 0 1000 1000 2.128
400 2250 OGA+HDHQ 409 567 24 0 0 1000 1.331
OGA+HDHQ+Trim 409 567 18 0 0 1000 1.330
OGA+HDBIC 994 6 0 0 0 1000 1.299
OGA+HDBIC+Trim 994 6 0 0 0 1000 1.299
OGA+BIC 0 0 0 0 1000 1000 2.023
In Simulation 3, we further set (λ, η) = (30, 10) Hz, making the firing behaviors of the candidate input neurons similar; in other words, the candidate input variables are correlated with each other. The background noise with (µ, σ) = (0.15, 0.1) is retained.
Table 4.5 shows that, compared to Simulation 2, more sample data are needed to ensure good performance. Because the input variables behave similarly, more information (i.e., a larger sample size) is needed for the method to distinguish them using the associated behavior of the target neuron. In practice, if neurons from some brain area behave very similarly, longer recordings will be needed to correctly identify the relevant input neurons among them.
Finally, in Simulation 4, we consider two extra indirect inputs to the original network (Figure 4.3). Neurons 6 and 7 are simulated as independent Poisson processes with firing rate 40 Hz. The point processes of Neurons 1 and 2 are designed as time-shifted versions of Neurons 6 and 7, respectively, with i.i.d. N(0.5, 0.1) propagation delay on each spike
Table 4.5: Results of Simulation 3 in threshold spiking neuron model (see notation in Table 4.1). The associated parameter settings are: λ = 30 Hz, η = 10 Hz, µ = 0.15, σ = 0.1.
n p Method E E1 E2 E3∼5 E∗ Correct MSPE
200 750 OGA+HDHQ 768 225 4 0 0 997 1.347
OGA+HDHQ+Trim 769 226 2 0 0 997 1.347
OGA+HDBIC 918 3 0 0 0 921 1.359
OGA+HDBIC+Trim 915 2 0 0 0 917 1.358
OGA+BIC 0 0 0 0 1000 1000 2.188
400 1500 OGA+HDHQ 473 508 19 0 0 1000 1.376
OGA+HDHQ+Trim 473 512 15 0 0 1000 1.376
OGA+HDBIC 989 11 0 0 0 1000 1.363
OGA+HDBIC+Trim 989 11 0 0 0 1000 1.363
OGA+BIC 0 0 0 0 1000 1000 2.080
event [61]. The two indirect inputs are considered very strong and excitatory, so that Neurons 1 and 2 have spiking behavior very similar to that of Neurons 6 and 7. The simulated firing trajectories of Neurons 1 and 6 are shown in Figure 4.4, and the simulation results are shown in Table 4.6. Comparing Table 4.6 with Table 4.4, we find that, for a small sample size (n = 100), many more mistakes are made in Simulation 4, since the method does not have enough information to distinguish Neuron 1 from Neuron 6 and Neuron 2 from Neuron 7. However, increasing the sample size n rapidly reduces these mistakes, and the selected set converges nearly to the true set of input neurons.
Figure 4.3: Two extra indirect inputs added to the original network. Neurons 6 and 7 are simulated as independent Poisson processes with firing rate 40 Hz. The point processes of Neurons 1 and 2 are designed as time-shifted versions of Neurons 6 and 7, respectively, with i.i.d. N(0.5, 0.1) propagation delay on each spike event.
Figure 4.4: The firing behaviors of Neurons 1 and 6. The indirect input (Neuron 6) is considered very strong and excitatory, so that Neuron 1 has spiking behavior very similar to that of Neuron 6.
Table 4.6: Results of Simulation 4 in threshold spiking neuron model with two extra indirect inputs (see notation in Table 4.1). The associated parameter settings are: λ = 40 Hz, η = 0 Hz, µ = 0.15, σ = 0.1.
n p Method E E1 E2 E3∼5 E∗ Correct MSPE
100 750 OGA+HDHQ 661 144 11 1 0 817 1.427
OGA+HDHQ+Trim 684 130 3 0 0 817 1.434
OGA+HDBIC 555 10 0 0 0 565 1.955
OGA+HDBIC+Trim 558 0 0 0 0 558 1.980
OGA+BIC 0 0 0 0 875 854 2.616
200 1500 OGA+HDHQ 736 210 3 1 0 950 1.301
OGA+HDHQ+Trim 737 210 3 0 0 950 1.301
OGA+HDBIC 946 1 0 0 0 947 1.290
OGA+HDBIC+Trim 946 1 0 0 0 947 1.290
OGA+BIC 0 0 0 0 957 957 2.191
400 2250 OGA+HDHQ 451 527 20 0 0 998 1.233
OGA+HDHQ+Trim 452 531 15 0 0 998 1.234
OGA+HDBIC 993 4 0 0 0 997 1.241
OGA+HDBIC+Trim 993 4 0 0 0 997 1.241
OGA+BIC 0 0 0 0 998 998 1.950