Figure 3.10: The first 2 sec. of simulated spike trains from one of the realizations. The
parameters are T = 10 (sec.), λ = 20 (spikes/sec.), w = 8 (mV), d = 10 (ms), k = 5,
m = 2, and bin width 0.1 (sec.). The source spike train P is labeled neuron #1, the
decomposed trains P_{i} are labeled neurons #2–6, the environment trains U_{i} are labeled
neurons #7–8, and the target Q is labeled neuron #9.

will be included in a separate article in the future.

finite order; the original GCI can reveal the degree of causal influences between neurons while the extended NSI can reveal the degree and the type of synaptic transmission.

Theoretically, the GCI and NSI can only be approximations, or may even be spurious, if either of these two conditions fails to hold. This may seem a very strong constraint for researchers wishing to apply these approaches in practice; however, things are not really that bad. The simulated IF spiking networks in Sections 3.3.2–3.3.3 were verified to be correctly captured by the vector autoregressive model and show consistency between the network structures and the proposed indices. Finally, and also importantly, note that a significant GCI or NSI does not necessarily signify an anatomical connection, since both are fundamentally statistical concepts. Treated with care, the GCI and NSI can help researchers infer possible relationships within a network structure or construct a description of network dynamics in neuroscience.

It should be pointed out that the significance of the NSI can be checked directly via the significance test on the GCI, since the proposed method is GC-based. In addition, if voltage trajectories are available, the method can be applied directly without any ambiguity. For spike train data, however, the role of the voltage trajectory can be played by the firing-rate trajectory estimated using binning or Gaussian kernel filtering. The high firing rates in the simulations ensure significant inhibitory effects in the simple networks; in practice, a high firing rate can be considered a consequence of pooling spike trains. Low firing rates lead to sparse spike trains, for which Gaussian kernel filtering introduces highly artificial signals that hinder the autoregressive modeling used to compute the GCI. The solution is to apply suitable preprocessing [75] or binning [63], which is more stable for converting sparse spike trains.
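The two rate-estimation options mentioned above can be sketched as follows. This is a minimal illustration, not the exact pipeline used in the simulations; the bin width matches the figure caption, but the kernel bandwidth, time step, and function names are illustrative assumptions.

```python
import numpy as np

def binned_rate(spike_times, t_max, bin_width=0.1):
    """Firing-rate trajectory via binning (spikes per second per bin)."""
    n_bins = int(round(t_max / bin_width))
    edges = np.linspace(0.0, t_max, n_bins + 1)
    counts, _ = np.histogram(spike_times, bins=edges)
    return counts / bin_width

def gaussian_rate(spike_times, t_max, dt=0.001, sigma=0.02):
    """Firing-rate trajectory via Gaussian kernel filtering."""
    t = np.arange(0.0, t_max, dt)
    rate = np.zeros_like(t)
    for s in spike_times:  # superpose a normalized Gaussian at each spike
        rate += np.exp(-0.5 * ((t - s) / sigma) ** 2)
    return rate / (sigma * np.sqrt(2 * np.pi))

# Toy train at roughly 20 spikes/sec over 10 sec, as in the simulations.
spikes = np.sort(np.random.default_rng(0).uniform(0.0, 10.0, size=200))
r_bin = binned_rate(spikes, t_max=10.0)
r_gauss = gaussian_rate(spikes, t_max=10.0)
```

Both estimators recover the same mean rate; the Gaussian version gives a smooth trajectory suitable for autoregressive modeling, while binning is the more stable choice for sparse trains.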

In closing, it is worth noting some related articles for possible future work. (i) In Section 3.3.3, the NSI was shown to successfully capture the behavior of synaptic transmission. Although the NSI values are correlated with the underlying synaptic weights, it is not clear what the actual relationship might be. Cadotte et al. found that, under certain settings, the GCI and the synaptic weight have the following nonlinear relationship [12]:

$$
F_{Y \to X} = \frac{1}{1 + 384\, e^{-0.2124\, sw}}, \tag{3.10}
$$

where sw denotes the synaptic weight. Similar results for the NSI may be derived in the future using (3.10). (ii) For most scientific computing problems, a linear model is generally the first, basic, and often winning strategy to try. Neural spiking networks are highly complicated and nonlinear, but a linear model can be considered satisfactory if it approximates the behavior of the nonlinear dynamics well to some degree. In the current study, the behavior of the simulated nonlinear network is well captured by our approach; obtaining the exact ratio of the synaptic weights can be set as an important objective to strive for. Our proposed method and the BLP assumption could be reformulated and generalized to handle nonlinear dynamics by means of nonparametric kernel modelling [48] in the future. (iii) This work did not consider any hidden network structure in either the theoretical or the numerical parts. The partial Granger causality, proposed by Guo et al. [30], may be used to extend the method to deal with the effects

of exogenous inputs or hidden neurons. (iv) Granger causality was originally designed to measure effect, not mechanism [5]. Numerical evidence shows that Granger causality can, to some degree, be used to infer the underlying mechanism. The GCI only uses the information in the residual noises; there is still useful information that could be extracted from the regression coefficients [32]. The NSI uses the summation of regression coefficients (3.7); other forms of information may be developed in the future.
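The general idea of summing cross-regression coefficients can be sketched as below. This is not the exact definition in (3.7), which is not reproduced here; the function name, model order, and lack of normalization are illustrative assumptions.

```python
import numpy as np

def ar_cross_coefficients(x, y, order=5):
    """Fit y(t) on the past of y and the past of x by least squares;
    return the coefficients on the past of x (cross-regression terms)."""
    T = len(y)
    rows, target = [], y[order:]
    for t in range(order, T):
        # lag-1 term first for both series
        rows.append(np.concatenate([y[t - order:t][::-1], x[t - order:t][::-1]]))
    coef, *_ = np.linalg.lstsq(np.asarray(rows), target, rcond=None)
    return coef[order:]  # keep only the cross terms

# Toy example: y is driven positively by x at lag 1, so the summed
# cross coefficients come out positive (an "excitatory" sign).
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
y = 0.8 * np.roll(x, 1) + 0.1 * rng.normal(size=2000)
y[0] = 0.0
nsi_like = float(ar_cross_coefficients(x, y).sum())
```

The sign of the sum distinguishes excitatory from inhibitory influence, which is the qualitative behavior the NSI is designed to capture.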

(v) The Gaussian filtering and binning techniques link the spike train data and the statistical models for continuous signals, leading to both mathematically easy derivations and computationally efficient algorithms. However, distortions may arise after the filtering is applied. A generalized linear model (GLM)-based point process framework was proposed for directly applying the GC on neural spike trains without any filtering [38].

A conceptually similar but more robust measure, called directed information, was also proposed [55]. Its modality-independent nature allows the measure to characterize statistically causal relationships between arbitrary stochastic processes. A more sophisticated coupling model was also proposed [53]; its parameters consist of a bank of stimulus filters, spike-history filters, and coupling filters. Splines can also be used to fit the nonlinearity in the stimulus filter. The mathematics used in these frameworks is more involved than that used in this chapter, but we surmise that the proposed algorithm and assumption can be translated into them to obtain similar or more powerful results in the future. (vi) Recently, a new framework of spatio-temporal Granger causality has been proposed to reliably estimate Granger causality from experimental datasets possessing time-varying properties [44]. The NSI may be extended to a dynamic version for automatically analyzing experimental datasets without laborious windowing.

## Chapter 4

## Large-Scale Sparse Neuronal Networks

### 4.1 Introduction

In neuroscience, identifying causal dependencies between multiple time-series signals is a topic of great interest. Regression methods and time-series models have been extensively employed in attempts to uncover the underlying brain connectivity. One of the most popular and successful methods is Granger causality (GC), which proposes that if the prediction of one time series can be improved with the knowledge of a second time series, then there is a causal influence from the second one to the first. The prediction mentioned above is made using autoregressive models [27]

and a GC index (GCI) is derived to quantify the strength of causal interactions.
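The prediction-improvement idea can be made concrete with a minimal sketch of the classical bivariate GCI, taken here as the log of the ratio between the residual variances of the restricted (own-past) and full autoregressive models; the model order and toy series are illustrative choices.

```python
import numpy as np

def gci(x, y, order=2):
    """GC index from x to y: log(var of restricted residuals / var of full
    residuals), where the full model adds the past of x to the past of y."""
    T = len(y)
    own, full, target = [], [], y[order:]
    for t in range(order, T):
        past_y = y[t - order:t][::-1]
        past_x = x[t - order:t][::-1]
        own.append(past_y)
        full.append(np.concatenate([past_y, past_x]))
    def resid_var(A, b):
        A = np.asarray(A)
        coef, *_ = np.linalg.lstsq(A, b, rcond=None)
        return (b - A @ coef).var()
    return float(np.log(resid_var(own, target) / resid_var(full, target)))

# Toy network: x drives y at lag 1, with no influence in the reverse direction.
rng = np.random.default_rng(0)
x = rng.normal(size=3000)
y = 0.6 * np.roll(x, 1) + 0.5 * rng.normal(size=3000)
y[0] = 0.0
print(gci(x, y))  # substantially above zero: x Granger-causes y
print(gci(y, x))  # near zero: no causal influence from y back to x
```

Because the restricted model is nested in the full one, the index is nonnegative by construction, and values near zero indicate no detectable causal influence.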

The GC is shown to be effective in distinguishing between direct and indirect causal connections and has applications across various domains such as electroencephalography/magnetoencephalography (EEG/MEG) [2, 26, 30], local field potential (LFP) [13, 24], functional magnetic resonance imaging (fMRI) [40, 45, 73], calcium imaging (CI) [21], and multiple spike trains [18, 38, 41, 74]. Recently, based on the GC framework, we developed a new index called the neuron synaptic index (NSI) to measure the synaptic weights between neurons in a neuronal network [62]. The GCI is designed in its original form to measure causal connectivities while the NSI is designed to measure structural connectivities, and in general they differ from each other. The GCI and NSI will be briefly introduced in Sections 4.2.5 and 4.2.6, respectively. More details about the GC can be found in [4, 8, 12, 20, 31].

Nowadays, due to the development of multi-electrode array and optogenetic techniques, more and more large-scale time series data become available. When the number of variables is much larger than the length of the time series, traditional implementations of the GC encounter serious problems such as the solvability of large-scale underdetermined systems, high computational cost/complexity, and multiple statistical testing. The same problems exist in microarray datasets of large-scale gene regulatory networks [15, 49, 76].

To cope with this high-dimensional situation, penalized regression methods (e.g., L1-regularization or LASSO and its variants) have been applied. Although these methods have led to considerable progress in the analysis of high-dimensional sparse data, there is still room for improvement. (1) The regularization parameter is not easy to tune, and time-consuming cross-validation (CV) is needed. (2) CV usually gives solutions that are not very sparse and may contain many irrelevant variables, leading to overfitting. (3) The LASSO can fail to distinguish irrelevant variables that are highly correlated with relevant variables [72]. (4) The LASSO may lead to severe bias for large regression coefficients [22]. The GC has proven effective and powerful when the number of variables is much smaller than the length of the time series. For neural data, short time series are often used to ensure the stationarity of the data and the time-invariance of the underlying structure, since neural data are usually only locally stationary and locally time-invariant. However, the number of variables can be reduced if the underlying connectivity of each neuron is sparse, meaning that the number of connecting edges is quite small compared to the number of possible edges. Therefore, our direction is, for each neuron, to first reduce the large-scale variable set to a small one that may contain a few irrelevant variables but should contain all the relevant ones, and then apply the GC to the associated reduced set.

The assumption of network sparsity has the following important implications. The first is mathematical: an oversized ratio between the number of variables and the series length poses extreme obstacles to any existing statistical method, so limiting the network complexity is currently the best way to overcome them. The second is empirical: many brain areas have been found to have sparse structures; for instance, cortical neurons were found to be quite sparsely connected relative to the population of neurons in a cell's neighborhood [10, 11, 16, 52, 66]. The third is practical: picking out sparse but pivotal variables from a large-scale dataset may be quite satisfactory for most experimental purposes. Based on the sparsity assumption, we propose a computational algorithm (in Section 4.2.4) for automatically and effectively selecting relevant neurons from a large-scale neuronal network. The kernel wrapped in the algorithm is a three-stage model selection approach (Sections 4.2.1–4.2.3), which consists of (1) an orthogonalized forward selection of input variables that circumvents difficulties with inverting high-dimensional matrices, (2) a stopping criterion to terminate the forward inclusion of variables, and (3) a backward elimination of variables according to the criterion to ensure the consistency of the approach. The performance was checked on a large-scale simulated threshold spiking neuron model, and real data from rats executing empathic prosocial behaviors were also analyzed.
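The three stages above can be sketched as follows. This is a rough illustration in the spirit of orthogonal matching pursuit, not the method of Sections 4.2.1–4.2.3: the simple residual-variance threshold used here as a stopping/elimination criterion, and all function names, are illustrative assumptions. Note that the Gram–Schmidt step avoids inverting any large matrix, which is the point of the orthogonalization.

```python
import numpy as np

def forward_select(X, y, tol=1e-3, max_vars=None):
    """Stage 1 + 2: greedily add the column most correlated with the current
    residual, orthogonalize it against the selected basis (Gram-Schmidt),
    and stop when the residual-variance drop falls below `tol`."""
    n, p = X.shape
    max_vars = max_vars or p
    selected, basis = [], []
    r = y - y.mean()
    while len(selected) < max_vars:
        scores = np.abs(X.T @ r)
        scores[selected] = -np.inf           # never re-pick a variable
        j = int(np.argmax(scores))
        q = X[:, j].astype(float)
        for b in basis:                      # orthogonalize, no matrix inverse
            q -= (b @ q) * b
        nq = np.linalg.norm(q)
        if nq < 1e-12:
            break
        q /= nq
        if (q @ r) ** 2 / n < tol:           # stopping criterion
            break
        selected.append(j)
        basis.append(q)
        r = r - (q @ r) * q
    return selected

def backward_eliminate(X, y, selected, tol=1e-3):
    """Stage 3: drop any variable whose removal barely increases the RSS."""
    keep = list(selected)
    for j in list(keep):
        rest = [k for k in keep if k != j]
        if _rss(X, y, rest) - _rss(X, y, keep) < tol * len(y):
            keep.remove(j)
    return keep

def _rss(X, y, cols):
    if not cols:
        return float(((y - y.mean()) ** 2).sum())
    A = X[:, cols]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(((y - A @ coef) ** 2).sum())

# Toy reduction: 50 candidate neurons, 200 samples, only two true drivers.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 50))
y = 2.0 * X[:, 3] - 1.5 * X[:, 17] + 0.1 * rng.normal(size=200)
sel = backward_eliminate(X, y, forward_select(X, y))
```

In the intended pipeline, the surviving small variable set would then be passed to the GC for final causal testing.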