causal strength of complex networks [58, 59]. However, the effects of excitations and inhibitions could not be differentiated in its original form. Based on the GC, we propose a computational algorithm (presented in Section 3.2.3), under the assumption of the best linear predictor (BLP), for analyzing neuronal networks by estimating the synaptic weights among them. The idea of the BLP assumption is that the weighted voltage fluctuation of the pre-synaptic neurons should be the best linear explanation for the voltage fluctuation of the post-synaptic neuron among the network. Using this interpretation, the GC analysis can be extended to measure both excitatory and inhibitory effects between neurons without much extra computational complexity. The appropriateness of the BLP assumption was examined with three types of simulated networks:

those with linear, almost linear, and nonlinear network structures. To illustrate the application of the proposed method, real spike trains from the anterior cingulate cortex (ACC) and the striatum (STR) were analyzed.

It is worth noting that spike trains are non-equally spaced data and are regarded as realizations of a point process. Filtering is usually required to convert them into equally spaced time series for further GC analyses [57]. This study adopted Gaussian kernel filtering or binning (depending on the situation) to convert spike trains into time series data for three main reasons: (1) it reduces the complexity of the analysis and also accounts for the temporal summation of action potentials; (2) under suitable preprocessing, even short, sparse spike trains can be converted, so that standard autoregressive modeling can be applied [75]; (3) most importantly, spike trains can be filtered to form close approximations to the firing rates or the voltage fluctuations of the underlying neurons [42].
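As a rough illustration of the kernel-filtering step, the sketch below converts a list of spike times into an equally spaced series by summing Gaussian kernels centered at the spikes. The function name, grid resolution, and bandwidth are illustrative choices, not part of the method described in this chapter.

```python
import numpy as np

def smooth_spike_train(spike_times, t_end, dt=0.001, sigma=0.01):
    """Convert irregular spike times into an equally spaced series by
    summing Gaussian kernels centered at the spikes (hypothetical helper;
    dt and sigma are illustrative, in seconds)."""
    grid = np.arange(0.0, t_end, dt)              # equally spaced time stamps
    series = np.zeros_like(grid)
    for s in spike_times:                         # one kernel per spike
        series += np.exp(-0.5 * ((grid - s) / sigma) ** 2)
    # normalize so each spike contributes unit area (a firing-rate estimate)
    return series / (sigma * np.sqrt(2.0 * np.pi))

rate = smooth_spike_train([0.5], t_end=1.0)
```

The resulting series is equally spaced and smooth, so the standard autoregressive machinery below applies directly; binning would instead count spikes per fixed-width window.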

The rest of this chapter is organized as follows. In Section 2, we briefly introduce the so-called Granger causality index, and then extend it to measure both excitatory and inhibitory effects between network nodes by using the BLP assumption. Section 3 presents three network models to verify the appropriateness of the proposed algorithm.

In Section 4, we apply the algorithm to analyze real spike train data. Section 5 provides some discussion of the results obtained in Sections 3 and 4, shortcomings of the method, and directions for future work.

### 3.2.1 An introduction to the GC

Let X = (x^{1}, x^{2}, . . . , x^{n}) be a stationary n-dimensional time series process with zero
mean. The p-th order linear autoregressive model for X is given by

$$
\begin{aligned}
x^{1}_{t} &= \sum_{r=1}^{p} a^{1,1}_{r}\, x^{1}_{t-r} + \cdots + \sum_{r=1}^{p} a^{1,n}_{r}\, x^{n}_{t-r} + \epsilon^{1}_{t},\\
x^{2}_{t} &= \sum_{r=1}^{p} a^{2,1}_{r}\, x^{1}_{t-r} + \cdots + \sum_{r=1}^{p} a^{2,n}_{r}\, x^{n}_{t-r} + \epsilon^{2}_{t},\\
&\;\;\vdots\\
x^{n}_{t} &= \sum_{r=1}^{p} a^{n,1}_{r}\, x^{1}_{t-r} + \cdots + \sum_{r=1}^{p} a^{n,n}_{r}\, x^{n}_{t-r} + \epsilon^{n}_{t},
\end{aligned}
\tag{3.1}
$$

where $a^{i,j}_{r}$ is the projection coefficient of the $i$-th time series onto the $j$-th time series at time lag $r$, representing the coupling strength from node $j$ to node $i$ in the network. The residuals $\epsilon^{1}, \epsilon^{2}, \ldots, \epsilon^{n}$ are zero-mean uncorrelated white noises with covariance matrix $\Sigma$. The diagonal entries $\{\Sigma_{ii} = \mathrm{Var}(\epsilon^{i}),\ i = 1, \ldots, n\}$ measure the accuracy of the autoregressive prediction for each node based on the information from time stamps $t-1$ to $t-p$.

To see whether the information contained in time series $x^{j}$ is useful in explaining the state of time series $x^{i}$, namely, the importance of node $j$ to node $i$, we can exclude the variable $x^{j}$ from (3.1) to obtain a reduced$^{1}$ $(n-1)$-dimensional autoregressive model with residual series $\eta^{i,j}$ for $x^{i}$ and corresponding prediction error $\Gamma^{j}_{ii} = \mathrm{Var}(\eta^{i,j})$. Here $\Gamma^{j}_{ii}$ measures the accuracy of the prediction of $x^{i}$ based only on the previous values of $\{x^{1}, \ldots, x^{j-1}, x^{j+1}, \ldots, x^{n}\}$. If $\Sigma_{ii}$ in (3.1) is significantly less than $\Gamma^{j}_{ii}$ in the reduced model, in some suitable statistical sense, then we say that $x^{j}$ Granger-causes $x^{i}$. This causality can be quantified by the GC index from $x^{j}$ to $x^{i}$:

$$
F_{j\to i} = \ln\frac{\Gamma^{j}_{ii}}{\Sigma_{ii}}. \tag{3.2}
$$

It is clear that $F_{j\to i} = 0$ when $\Gamma^{j}_{ii} = \Sigma_{ii}$, i.e., when $x^{j}$ has no causal influence on $x^{i}$, and $F_{j\to i} > 0$ when $x^{j}$ Granger-causes $x^{i}$. Notice that $F_{j\to i}$ is defined only for $j \neq i$ and is always nonnegative, i.e., $\Sigma_{ii}$ is bounded above by $\Gamma^{j}_{ii}$, since the full model defined in (3.1) should fit the data better than the reduced model. Finally, we note that the GC index (3.2) is significant if the corresponding coefficients $a^{i,j}_{r}$ are jointly significantly different from zero. This can be assessed via an $F$-test of the null hypothesis that the $a^{i,j}_{r}$ are zero [29, 60]. The projection coefficients $a^{i,j}_{r}$ and prediction errors $\Sigma_{ii}$ can be obtained

$^{1}$The $i$-th equation of the reduced model reads $x^{i}_{t} = \sum_{r=1}^{p} b^{i,1}_{r} x^{1}_{t-r} + \cdots + \sum_{r=1}^{p} b^{i,j-1}_{r} x^{j-1}_{t-r} + \sum_{r=1}^{p} b^{i,j+1}_{r} x^{j+1}_{t-r} + \cdots + \sum_{r=1}^{p} b^{i,n}_{r} x^{n}_{t-r} + \eta^{i,j}_{t}$, where the $b$'s are the corresponding projection coefficients.

by solving the Yule-Walker equations [39], and an efficient model order can be determined using the Akaike Information Criterion (AIC):

$$
\mathrm{AIC}(p) = 2\log(\det(\Sigma)) + \frac{2n^{2}p}{T}, \tag{3.3}
$$

where $T$ is the total length of the time series. More information about the GC can be found in the author's previous work [61] and also in [4, 13, 20].
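To make the GC index concrete, the following sketch simulates a bivariate system in which node 2 drives node 1, fits the full and reduced autoregressions by ordinary least squares (standing in for the Yule-Walker solution), and evaluates (3.2) in both directions. The simulation parameters are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a bivariate AR(1) system in which node 2 drives node 1
# (coupling 0.4) but node 1 does not drive node 2.
T = 5000
x = np.zeros((T, 2))
for t in range(1, T):
    x[t, 0] = 0.5 * x[t - 1, 0] + 0.4 * x[t - 1, 1] + rng.standard_normal()
    x[t, 1] = 0.5 * x[t - 1, 1] + rng.standard_normal()

def residual_variance(target, predictors, p=1):
    """Least-squares AR(p) fit of `target` on lagged `predictors`;
    returns the prediction-error variance."""
    rows = [np.concatenate([predictors[t - r] for r in range(1, p + 1)])
            for t in range(p, len(target))]
    X, y = np.asarray(rows), target[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.var(y - X @ coef)

# Full model (both series) vs reduced model (own past only), as in (3.2).
sigma11 = residual_variance(x[:, 0], x)          # Sigma_{11}
gamma11 = residual_variance(x[:, 0], x[:, [0]])  # Gamma^{2}_{11}
F_2_to_1 = np.log(gamma11 / sigma11)             # clearly positive

sigma22 = residual_variance(x[:, 1], x)
gamma22 = residual_variance(x[:, 1], x[:, [1]])
F_1_to_2 = np.log(gamma22 / sigma22)             # near zero
```

Because the full model nests the reduced one, the index is nonnegative up to numerical noise, and only the true driving direction yields a clearly positive value.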

### 3.2.2 Synaptic weights estimation

Now consider a multivariate zero-mean time series $X = (x^{1}, x^{2}, \ldots, x^{n})$ consisting of the membrane-voltage trajectories of $n$ distinct neurons. Suppose that the $i$-th neuron is triggered by some other $k$ neurons in the network, say the $\{i_{1}, i_{2}, \ldots, i_{k}\}$-th neurons, with synaptic weights $\{\alpha^{i}_{1}, \alpha^{i}_{2}, \ldots, \alpha^{i}_{k}\}$. For convenience, we assume that $1 \le k := k(i) \le n-1$.

The case $k = 0$ means that the $i$-th neuron is not triggered by others and is thus relatively easy to deal with. The weights are assumed to be nonzero: if some $\alpha^{i}_{s}$ were zero, we could simply remove the corresponding index $i_{s}$ from the trigger set. Positive and negative weights represent excitatory and inhibitory influences on the $i$-th neuron, respectively.

In general, the trigger set $I_{i} := \{i_{1}, i_{2}, \ldots, i_{k}\}$ and the corresponding weights $\alpha^{i} := \{\alpha^{i}_{1}, \alpha^{i}_{2}, \ldots, \alpha^{i}_{k}\}$ cannot be identified and estimated easily due to the underlying complex dynamics. However, under the assumption of the best linear predictor (BLP) (Definition 1 below), $I_{i}$ and $\alpha^{i}$ can be approximated effectively. The results are described in the following proposition.

Definition 1.

Let $x$ and $y$ be two stationary zero-mean time series. We say that $y$ forms the best linear predictor (BLP) of $x$ among a variable set $G$ if $\sigma^{2}(x|\bar{x},\bar{y}) < \sigma^{2}(x|\bar{x},\bar{z})$ for all $z \in G$, where

$$
\sigma^{2}(x|\bar{x},\bar{y}) := \min_{p,\{f_{r}\},\{d_{r}\}} E\Big\{x_{t} - \sum_{r=1}^{p}\big[f_{r}\,y_{t-r} + d_{r}\,x_{t-r}\big]\Big\}^{2}.
$$
Proposition 1.

In the situation described above, if the weighted trajectory $u^{i} := \alpha^{i}_{1}x^{i_{1}} + \alpha^{i}_{2}x^{i_{2}} + \cdots + \alpha^{i}_{k}x^{i_{k}}$, formed by the trigger set and the corresponding weights, is the BLP of the trajectory of the $i$-th neuron (namely, $x^{i}$) among the whole network, then, within the GC framework, $I_{i}$ can be completely identified and $\alpha^{i}$ can be estimated, up to a scale factor, by

$$
\hat{\alpha}^{i} := \Big\{\sum_{r=1}^{p} a^{i,i_{1}}_{r},\ \sum_{r=1}^{p} a^{i,i_{2}}_{r},\ \ldots,\ \sum_{r=1}^{p} a^{i,i_{k}}_{r}\Big\}.
$$

Proof:

Let $u^{i} := \alpha^{i}_{1}x^{i_{1}} + \cdots + \alpha^{i}_{k}x^{i_{k}}$ form the BLP of $x^{i}$. Then there exist a positive integer $p$ and projection coefficients $\{f^{i}_{r},\ r = 1, \ldots, p\}$, $\{d^{i}_{r},\ r = 1, \ldots, p\}$ such that $x^{i}_{t} = \sum_{r=1}^{p}[f^{i}_{r}u^{i}_{t-r} + d^{i}_{r}x^{i}_{t-r}] + \epsilon^{i}_{t}$, where $\epsilon^{i}$ is a stationary white noise possessing the smallest variance among the whole network. Replacing $u^{i}$ with the weighted trajectory, we obtain

$$
\begin{aligned}
x^{i}_{t} &= \sum_{r=1}^{p}\big[f^{i}_{r}u^{i}_{t-r} + d^{i}_{r}x^{i}_{t-r}\big] + \epsilon^{i}_{t}\\
&= \sum_{r=1}^{p}\big[f^{i}_{r}\big(\alpha^{i}_{1}x^{i_{1}}_{t-r} + \cdots + \alpha^{i}_{k}x^{i_{k}}_{t-r}\big) + d^{i}_{r}x^{i}_{t-r}\big] + \epsilon^{i}_{t}\\
&= \sum_{r=1}^{p}\big[\alpha^{i}_{1}f^{i}_{r}x^{i_{1}}_{t-r} + \cdots + \alpha^{i}_{k}f^{i}_{r}x^{i_{k}}_{t-r} + d^{i}_{r}x^{i}_{t-r}\big] + \epsilon^{i}_{t},
\end{aligned}
\tag{3.4}
$$

which represents the underlying but unknown network structure of $\{x^{i}, x^{i_{1}}, x^{i_{2}}, \ldots, x^{i_{k}}\}$.

On the other hand, fitting to data the same equation as the $i$-th equation in (3.1), we obtain the following empirical regression (compare with the theoretical regression (3.4)):

$$
x^{i}_{t} = \sum_{r=1}^{p}\big[a^{i,1}_{r}x^{1}_{t-r} + \cdots + a^{i,n}_{r}x^{n}_{t-r}\big] + \tilde{\epsilon}^{i}_{t}. \tag{3.5}
$$

Let $\bar{I}_{i} := \{1, 2, \ldots, i-1, i+1, \ldots, n\} - I_{i}$ be the complement of the trigger set $I_{i}$. We note that $\tilde{\epsilon}^{i}_{t} \equiv \epsilon^{i}_{t}$ if $a^{i,s}_{r} = 0$ for all $r = 1, \ldots, p$ and $s \in \bar{I}_{i}$; otherwise $\tilde{\epsilon}^{i}$ and $\epsilon^{i}$ differ, but with $\mathrm{Var}(\tilde{\epsilon}^{i}_{t}) = \mathrm{Var}(\epsilon^{i}_{t})$, since (3.4) has the smallest residual variance among the whole network and (3.5) has more degrees of freedom (coefficients) than (3.4).

If the trajectories of the $\bar{I}_{i}$-th neurons are stochastically independent of both the $i$-th and the $I_{i}$-th neurons, then $a^{i,s}_{r} = 0$ for $r = 1, \ldots, p$ and $s \in \bar{I}_{i}$. Comparing (3.5) with (3.4), we then have

$$
\sum_{r=1}^{p} a^{i,i_{1}}_{r} = \alpha^{i}_{1}\sum_{r=1}^{p} f^{i}_{r}, \quad \ldots, \quad \sum_{r=1}^{p} a^{i,i_{k}}_{r} = \alpha^{i}_{k}\sum_{r=1}^{p} f^{i}_{r}. \tag{3.6}
$$

Since the projection coefficients in (3.5) can be obtained by solving the Yule-Walker equations or simply by the least-squares method, (3.6) immediately leads to

$$
\frac{1}{\alpha^{i}_{1}}\sum_{r=1}^{p} a^{i,i_{1}}_{r} = \frac{1}{\alpha^{i}_{2}}\sum_{r=1}^{p} a^{i,i_{2}}_{r} = \cdots = \frac{1}{\alpha^{i}_{k}}\sum_{r=1}^{p} a^{i,i_{k}}_{r}, \tag{3.7}
$$

provided $\sum_{r=1}^{p} f^{i}_{r} \neq 0$. For $\alpha^{i}_{s}$ and $\sum_{r=1}^{p} a^{i,i_{s}}_{r}$ to have the same sign for $s = 1, 2, \ldots, k$, we can, without loss of generality, assume that $\sum_{r=1}^{p} f^{i}_{r} > 0$. If $\sum_{r=1}^{p} f^{i}_{r} < 0$, then $-u^{i}$ is used to replace $u^{i}$.

If some of the trajectories of the $\bar{I}_{i}$-th neurons are linearly dependent on the $i$-th or the $I_{i}$-th neurons, then $a^{i,s}_{r} \neq 0$ for some $r \in \{1, \ldots, p\}$ and $s \in \bar{I}_{i}$; the projection coefficients in (3.5) are thus affected, resulting in a biased estimate in (3.7). However, this predicament can be resolved by virtue of the BLP assumption and the concept of GC. Since $\epsilon^{i}$ in (3.4) has the smallest variance among the whole network, removing any element of $\{x^{s} : s \in \bar{I}_{i}\}$ from the regression (3.5) does not increase the variance of $\tilde{\epsilon}^{i}$. Following the idea of (3.2), we can therefore correct the model coefficients by ruling out all the useless information from the $\bar{I}_{i}$-th neurons.
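The identification claim of Proposition 1 can be checked numerically. The sketch below simulates a three-node linear network in which nodes 1 and 2 drive node 3 with weights 2 and −1 (one excitatory, one inhibitory), fits the full regression (3.5) by least squares, and confirms that the summed coefficients recover the weights up to a common positive scale. All parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Three-node network: nodes 1 and 2 independently drive node 3 with
# synaptic weights alpha = (2, -1); the lag coefficient f = 0.3 sets the scale.
T = 20000
x = np.zeros((T, 3))
for t in range(1, T):
    x[t, 0] = 0.5 * x[t - 1, 0] + rng.standard_normal()
    x[t, 1] = 0.5 * x[t - 1, 1] + rng.standard_normal()
    # weighted input u = 2*x1 - 1*x2 enters node 3 with coefficient 0.3
    x[t, 2] = (0.3 * (2 * x[t - 1, 0] - x[t - 1, 1])
               + 0.2 * x[t - 1, 2] + rng.standard_normal())

# Least-squares fit of the full VAR(1) equation for node 3.
X, y = x[:-1], x[1:, 2]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # (a^{3,1}, a^{3,2}, a^{3,3})

# Summed lag coefficients recover alpha up to the common scale sum_r f_r.
w1, w2 = coef[0], coef[1]  # approx 0.6 and -0.3, i.e. 0.3 * (2, -1)
```

The estimated pair carries the correct signs (excitation vs. inhibition), and the ratio w1/w2 approximates the true weight ratio 2 : −1 regardless of the unknown scale.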

We end this subsection with the following remarks.

Remark 1.

The idea behind the BLP assumption is that the weighted voltage trajectory of the trigger neurons should be the best linear explanation for the voltage trajectory of the target neuron among the whole network. Based on this interpretation, the GC index can be extended to measure both excitatory and inhibitory effects by virtue of the estimated synaptic weights.

Remark 2.

The synaptic weight $\alpha^{i}_{s}$ and the summed projection coefficients $\sum_{r=1}^{p} a^{i,i_{s}}_{r}$ are forced to have the same sign for $s = 1, 2, \ldots, k$, because positive and negative values of $\sum_{r=1}^{p} a^{i,i_{s}}_{r}$ refer to positive and negative correlations between $x^{i}$ and $x^{i_{s}}$, respectively.

Remark 3.

The case $k = 0$ means that the $i$-th neuron is not triggered by any other neuron in the network; therefore $F_{j\to i} = 0$ for all $j$, and there are no synaptic weights to be estimated.

Remark 4.

For readers dealing with sparse networks, $L_{1}$ regularization (LASSO) is a useful technique for fitting a sparse regression to the data, avoiding overfitting and the problem of multiple testing [1, 49]. In this scenario, a computationally much more efficient approach is to first run LASSO to learn the network structure and then use GC to obtain the causal strengths.
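Remark 4 only names the technique, so the following is a generic LASSO sketch (a minimal proximal-gradient/ISTA solver written for this illustration, not taken from the chapter or a specific library) showing how an $l_{1}$ penalty shrinks the coefficients of non-driver predictors toward zero; the regularization level `lam` is a tuning parameter.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=1000):
    """Minimize 0.5*||y - X b||^2 + lam*||b||_1 by proximal gradient (ISTA).
    A minimal sketch for illustration, not a tuned solver."""
    L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the smooth part
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = b - X.T @ (X @ b - y) / L    # gradient step on the squared loss
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return b

# Toy lagged design: only columns 0 and 3 truly drive y.
rng = np.random.default_rng(3)
X = rng.standard_normal((200, 5))
y = X @ np.array([1.5, 0.0, 0.0, -1.0, 0.0]) + 0.1 * rng.standard_normal(200)
beta = lasso_ista(X, y, lam=5.0)
support = np.nonzero(np.abs(beta) > 0.05)[0]  # candidate network structure
```

The recovered support can then be treated as the candidate trigger set, with the GC machinery of Section 3.2.1 supplying the causal strengths on that restricted model.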

Remark 5.

Some arguments in the proof, such as the stochastic independence and the estimation of the projection coefficients, assume that the law of large numbers (LLN) holds. With small samples or limited data, estimation errors come into existence, and some statistical inferences in the proof may fail. However, the proof holds for most large-sample cases.

### 3.2.3 The algorithm

Here, we present a step-by-step algorithm for computing the proposed index (named the neuron synaptic index, NSI) from multiple spike train data.

Step 1 : Properly smooth the spike train data by kernel filtering (Gaussian kernels are commonly used) to acquire approximations to the underlying membrane-voltage trajectories of the neurons in the network.

Step 2 : Subtract the mean value from each trajectory to form zero-mean time series and then fit the vector autoregressive model in (3.1). An appropriate model order can be obtained beforehand using the AIC in (3.3).

Step 3 : Compute the GC indices by (3.2) for all pairs of neurons, i.e., $i, j = 1, 2, \ldots, n$ with $i \neq j$, and perform $F$-tests to ensure statistical significance.

Step 4 : For each node $i = 1, 2, \ldots, n$, refine the autoregressive model by ruling out the information from the $\bar{I}_{i}$-th neurons, i.e., the neurons with insignificant $F_{j\to i}$, to correct the projection coefficients.

Step 5 : For each node, compute the synaptic weights of the trigger neurons by simply summing the projection coefficients up to the model order p as shown in (3.7).

Step 6 : For each node, take the weighted trajectory as a new explanatory time series and then compute the GC index from this weighted time series to that of the node.

Step 7 : Finally, the NSI is defined as the $l_{1}$-normalized$^{2}$ estimated weights obtained in Step 5 multiplied by the GC index obtained in Step 6 (see (B.5) in the Appendix).
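Steps 3-7 above can be sketched end to end on a small simulated network. In this sketch a crude GC threshold stands in for the $F$-test of Step 3, the model order is fixed at $p = 1$ for brevity, and all simulation parameters are illustrative rather than part of the algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated, already-smoothed network (Steps 1-2 assumed done):
# node 3 is driven by nodes 1 and 2 with weights 2 and -1.
T, n, target = 20000, 3, 2
x = np.zeros((T, n))
for t in range(1, T):
    x[t, 0] = 0.5 * x[t-1, 0] + rng.standard_normal()
    x[t, 1] = 0.5 * x[t-1, 1] + rng.standard_normal()
    x[t, 2] = (0.3 * (2 * x[t-1, 0] - x[t-1, 1])
               + 0.2 * x[t-1, 2] + rng.standard_normal())

def res_var(y, X):
    """Least-squares fit; returns (residual variance, coefficients)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.var(y - X @ coef), coef

full_var, coef = res_var(x[1:, target], x[:-1])

# Step 3: GC index F_{j -> target} for each candidate driver j.
gc = {}
for j in range(n):
    if j == target:
        continue
    keep = [s for s in range(n) if s != j]
    red_var, _ = res_var(x[1:, target], x[:-1][:, keep])
    gc[j] = np.log(red_var / full_var)

# Steps 4-5: trigger set and weights (crude threshold replaces the F-test).
triggers = [j for j in gc if gc[j] > 0.01]
weights = np.array([coef[j] for j in triggers])   # summed lag-1 coefficients

# Step 6: GC from the weighted trajectory u to the target node.
u = x[:, triggers] @ weights
w_var, _ = res_var(x[1:, target], np.column_stack([u[:-1], x[:-1, target]]))
own_var, _ = res_var(x[1:, target], x[:-1, [target]])
gc_u = np.log(own_var / w_var)

# Step 7: NSI = l1-normalized weights times the GC index from Step 6.
nsi = weights / np.abs(weights).sum() * gc_u
```

The signs of the NSI entries separate excitatory from inhibitory drivers, while their common scale carries the overall causal strength from Step 6.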