## 國立中央大學 (National Central University)

數學系博士論文 (Doctoral Dissertation, Department of Mathematics)

Causal Connectivity Analysis for Identifying the Information Flow of Neuronal Networks

研究生：邵培強 (Graduate Student: Pei-Chiang Shao)

指導教授：單維彰 博士、嚴健彰 博士 (Advisors)

### Causal Connectivity Analysis for Identifying the Information Flow of Neuronal Networks

### Pei-Chiang Shao

A Dissertation

Submitted to the Department of Mathematics, College of Science,

National Central University

in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

in Mathematics

June 2016

神經元網路訊息的因果分析 (Causal Analysis of the Information Flow of Neuronal Networks)

摘要 (Abstract)

This thesis develops statistical methods for identifying the information flow of neuronal networks. Granger causality (GC) is a popular and effective concept for estimating the causal interactions between time-series signals, and over the past two decades it has become a powerful analytical method for detecting causal relationships among various types of neural activity data. However, when GC is applied in neuroscience, three major problems still require further study and clarification. First, the original GC measure is designed to be a nonnegative real number and therefore lacks an effective feature for distinguishing excitatory from inhibitory causal relationships between neurons. Second, although GC has been used extensively, the relationship between the estimated causal strength and the actual synaptic weights between neurons remains unclear. Third, GC cannot be applied directly to large-scale neuronal data, in which the number of variables often far exceeds the number of recorded time-series samples. For the first two problems, we build on the original GC framework and, under a best linear predictor (BLP) assumption, propose a computational method that can effectively recover the synaptic weights in a neuronal network. Under the BLP assumption, GC can be extended to measure both excitatory and inhibitory effects between neurons. We designed three types of simulated neuronal networks to test the proposed method, ranging from simple linear and almost-linear network structures to complex nonlinear ones. These simulations verify the validity of the BLP assumption and the correctness of the computational method. The method is also demonstrated on real neuronal activity data from the anterior cingulate cortex (ACC) and the striatum (STR). The results show that, under the administration of a D2 dopamine receptor agonist, significant excitatory effects exist within the ACC and from the ACC to the STR, while significant inhibitory effects exist within the STR.

In practice, extracting the causal interactions among large-scale recordings of neural signals in the brain is crucial for a complete analysis and reliable inference of specific brain functions. In large-scale signals, however, the number of neurons is often far greater than the length of the recorded signals; this oversized variable-to-sample ratio severely hinders most existing statistical analysis methods and can even render them computationally infeasible. This thesis therefore introduces a three-stage variable selection method that can effectively reduce a large-scale variable set to a smaller set containing only the relevant variables, so that GC can then be applied to the reduced set. The method uses (1) an orthogonalized forward selection of explanatory variables, (2) a stopping criterion to terminate the forward inclusion of variables, and (3) a further backward trimming to eliminate irrelevant variables. For the third problem mentioned above, we propose a variable selection algorithm built around this three-stage method so that GC can be applied to large-scale time-series data. We designed a large-scale neuronal network model with threshold-triggered spiking to examine the consistency of the proposed method, and applying it to behavioral data from rats also yielded new findings. These frameworks provide a new direction for the analysis of real neuronal networks.

Keywords: Granger causality; Granger causality index; vector autoregressive model; neuronal networks; synaptic weight estimation.

## Abstract

This thesis is devoted to developing statistical methods for identifying the information flow of neuronal networks. Granger causality (GC) is a popular and useful concept for estimating causal interactions between time-series signals. Over the past two decades, GC has emerged as a powerful analytical method for detecting the causal relationships among various types of neural activity data. However, three problems encountered when GC is used in the field of neuroscience remain unclear and require further research. First, the GC measure is designed to be nonnegative in its original form, lacking the means to differentiate the effects of excitation and inhibition between neurons. Second, how is the estimated causality related to the underlying neuronal synaptic weights? Third, GC cannot be applied to large-scale neuronal data, in which the number of variables is far greater than the length of the time series. For the first two problems, we propose, under a best linear predictor (BLP) assumption, a computational algorithm for analyzing neuronal networks by estimating the synaptic weights among them. Under the BLP assumption, the GC analysis can be extended to measure both excitatory and inhibitory effects between neurons. The method was examined on three sorts of simulated networks: those with linear, almost-linear, and nonlinear network structures. The method was also illustrated on real spike train data from the anterior cingulate cortex (ACC) and the striatum (STR). The results showed, under quinpirole administration, the significant existence of excitatory effects inside the ACC, excitatory effects from the ACC to the STR, and inhibitory effects inside the STR.

Extracting causal interactions from a large-scale recording of a neural ensemble in the brain is very important for a comprehensive understanding or a reliable inference of certain brain functions. However, the oversized ratio between the number of neurons and the signal length causes great difficulties for most existing statistical methods. This thesis also introduces a three-stage variable selection approach that can effectively reduce the large-scale variable set to a small but relevant one, enabling Granger causality to be applied to the reduced set. The method uses (1) an orthogonalized forward selection of input variables, (2) a stopping criterion to terminate the forward inclusion of variables, and (3) a backward elimination to further trim irrelevant variables. For the third problem, we propose a computational algorithm which wraps the above selection approach as its core, enabling GC to work on large-scale time-series data. The method was examined on a large-scale simulated threshold spiking neuron model, and real behavioral data from rats were also analyzed. These frameworks give insight into the analysis of real neuronal networks.

Keywords: Granger causality; Granger causality index; Vector autoregressive model; Neuronal networks; Synaptic weight estimation; Neuron Synaptic Index; Variable selection; Large-scale neuronal networks; High-dimensional data analysis.

## Contents

1 Brief Introduction to Neurons
1.1 The neuron
1.2 Action potentials
1.3 Synaptic transmissions

2 Effects of Spike Sorting Error
2.1 Introduction
2.2 Modeling and analysis
2.2.1 A short introduction to the GCI
2.2.2 An explicit formula
2.2.3 Essential GCI factors
2.2.4 GCI vs. variance reduction
2.3 Simulation study
2.3.1 Models
2.3.2 Setup
2.3.3 Simulation results
2.3.4 Supplementary simulations of FPs
2.3.5 Simulation for threshold detection
2.4 Real data analysis
2.4.1 Experimental setup
2.4.2 Experimental results
2.5 Discussion

3 Synaptic Weights Estimation
3.1 Introduction
3.2 Modeling and analysis
3.2.1 An introduction to the GC
3.2.2 Synaptic weights estimation
3.2.3 The algorithm
3.3 Simulation study
3.3.1 Linear network
3.3.2 Almost-linear network
3.3.3 Nonlinear network
3.4 Real data analysis
3.4.1 Setup and results
3.4.2 Implications of the pooled data
3.5 Discussion

4 Large-Scale Sparse Neuronal Networks
4.1 Introduction
4.2 Methods
4.2.1 Forward stepwise regression via the OGA iteration
4.2.2 Stopping rule and consistent model selection
4.2.3 Further trimming to exclude irrelevant variables
4.2.4 The computational algorithm
4.2.5 Granger causality index
4.2.6 Neuron synaptic index
4.3 Results
4.3.1 Simulation
4.3.2 Real data analysis
4.4 Discussion

A Derivation of the Explicit Formula
B Derivation of the NSI using Simple Network

## List of Figures

2.1 The relationship between the Granger Causality Index (GCI) and k. This transforms GCI values to the corresponding percentage of variance reduction in the residual noise process.

2.2 (a) Simulation results of Experiment 1, in which the parameter settings were λ = 2, T = 100, m = 0.1, σ = 0.02. (b) Simulation results of a much more concentrative normal, in which the parameter settings were λ = 2, T = 100, m = 0.1, σ = 0.02, and σ_{NA} = T/64. (c) Simulation results of Experiment 2, in which the parameter settings were λ = 2, T = 100, m = 0.1, σ = 0.04. (d) Simulation results of Experiment 3, in which the parameter settings were λ = 2, T = 100, m = 0.1, σ = 0.06. The error percentages r were all 0 ∼ 0.9, increased in steps of 0.1.

2.3 (a) ξ_{1} to ξ_{4} of the NA model, in which the parameter settings were λ = 2, T = 100, m = 0.1, σ = 0.02, σ_{NA} = T/64. (b) ξ_{1} to ξ_{4} of the SR model, in which the parameter settings were λ = 2, T = 100, m = 0.1, σ = 0.02. The error percentages r were both 0 ∼ 0.9, increased in steps of 0.1.

2.4 (a) Simulation results of Experiments A–C for FP. (b) ξ_{1} to ξ_{4} of Experiment A. (c) ξ_{1} to ξ_{4} of Experiment B. (d) ξ_{1} to ξ_{4} of Experiment C.

2.5 (a) The first 10 APs of the underlying AP sequence A. (b) The detected spikes (marked by ◦) with threshold value 2.5τ. (c) The detected spikes (marked by ◦) with threshold value 3.0τ.

2.6 (a) The relationship between GCI value and threshold value by changing the threshold from 1.5τ to 3.5τ. (b) Number of FPs increases as the threshold value decreases. (c) Number of FNs increases as the threshold value increases.

2.7 (a) Three kinds of Granger Causality Index (GCI) patterns which frequently appeared in real experimental data. (b) ξ_{1} to ξ_{4} of the false-negative (FN)-decrease pattern. (c) ξ_{1} to ξ_{4} of the false-positive (FP)-decrease pattern. (d) ξ_{1} to ξ_{4} of the FP-increase pattern.

3.1 A simple linear network. Red circles represent excitation and green squares represent inhibition. Variable w serves as the trajectory of the post-synaptic neuron; variables x, y, and z serve as the trajectories of pre-synaptic neurons with synaptic weights α > 0, β > 0, and γ < 0, respectively. v_{1}, v_{2}, and v_{3} are collateral variables, consisting of source, target, and independent nodes. A clearer derivation of the NSI using this simple network is given in the Appendix.

3.2 A simple feedforward integrate-and-fire neuron network. Red circles represent excitation and green squares represent inhibition. Neurons #2–5 are modeled by independent Poisson processes with firing rate λ. Neurons #8, 9 were modeled as single strong inputs by independent Poisson processes with firing rate 1.5λ. Neuron #7 is implemented by a direct discrete-time summation of the synaptic inputs α_{1}, …, α_{6} (mV). Neurons #1, 6 are implemented in the same way as Neuron #7, with synaptic inputs 30 (mV) from Neurons #8, 9, respectively. Neurons #10, 11 are modeled as independent nodes by independent Poisson processes with firing rate λ. Neurons #5, 6 are inhibitory, i.e., α_{5}, α_{6} < 0.

3.3 The first 1 sec. of a simulated voltage trajectory of Neuron #7 with input rate λ = 40 Hz. The simulation was done as described in the text with α_{1} = α_{2} = α_{3} = α_{4} = 5 (mV), α_{5} = α_{6} = −2.5 (mV), and 10 ms propagation delay. The subthreshold trajectory is not very regular due to the lack of self-dynamics; in other words, Neuron #7 is completely triggered by Neurons #1–6.

3.4 The first 1 sec. of simulated spike train data of the simple feedforward network with λ = 40 Hz, α_{1} = α_{2} = α_{3} = α_{4} = 5 (mV), α_{5} = α_{6} = −2.5 (mV), and 10 ms propagation delay.

3.5 A Gaussian kernel filtering with bandwidth 5 ms was performed on the spike train data depicted in Figure 3.4 to obtain an approximation of the subthreshold dynamics of each neuron in the network. The computations of the GCI and NSI were based on the filtered results; this figure shows the first 250 data points.

3.6 The first 0.5 sec. of a simulated voltage trajectory of Neuron #7 with k = 1 under the fast spiking (FS) neuron model (a = 0.1, b = 0.25, c = −65, d = 2).

3.7 The first 0.5 sec. of a simulated voltage trajectory of Neuron #7 with k = 1 under the low-threshold spiking (LTS) neuron model (a = 0.02, b = 0.25, c = −65, d = 2).

3.8 The firing trajectories (400 sec. with 2 sec. bins) under the quinpirole administration. (a) Excitatory effects inside the ACC (NSI = 0.1035). (b) Excitatory effects from the ACC to the STR (NSIs = 0.1116, 0.1995). (c) Excitatory effects from the ACC to the STR (NSIs = 0.1445, 0.1121). (d) Inhibitory effects inside the STR (NSI = −0.1058). Positive (negative) NSIs exhibit positive (negative) correlations in the fluctuations of the signals.

3.9 An illustration of pooled data. On the cause side (left, brown), the pooled train (pool 1) can be considered a collective input with respect to the effects of temporal or spatial summation of one of the two types ((i) and (ii) in the text). On the effect side (middle, blue), the pooled train (pool 2) can be considered a collective output, which represents the total discharge of a group of cooperative neurons in function. The collective output (pool 2) can then be treated as a collective input to trigger others (right, purple).

3.10 The first 2 sec. of simulated spike trains of one of the realizations. The parameters are: T = 10 (sec.), λ = 20 (spikes/sec.), w = 8 (mV), d = 10 (ms), k = 5, m = 2, and bin width 0.1 (sec.). The source spike train P is labeled neuron #1, the decomposed trains P_{i} are labeled neurons #2–6, the environment trains U_{i} are labeled neurons #7–8, and the target Q is labeled neuron #9.

4.1 The simulated feedforward neuronal network. Red circles represent excitation and green squares represent inhibition. Source neurons 1–5 have synaptic inputs β_{1}, …, β_{5} with propagation delays r_{1}, …, r_{5}, respectively. The target neuron N is implemented by a direct discrete-time summation of the synaptic inputs. The background noisy input is set to be i.i.d. N(µ, σ) at every time step with time resolution 1 ms.

4.2 The firing behaviors of the source and target neurons in Simulation 1. The associated parameter settings are: (β_{1}, …, β_{5}) = (3, −3.5, 4, −3.6, 3.2) × 3 mV, (r_{1}, …, r_{5}) = (1, 2, 3, 2, 1) × 0.5 sec., N = 250, n = 100, m = 3, p = Nm = 750, λ = 40, η = µ = σ = 0.

4.3 Two extra indirect inputs added to the original network. Neurons 6 and 7 are simulated as independent Poisson processes with firing rate 40 Hz. The point processes of Neurons 1 and 2 are designed as time-shifted versions of Neurons 6 and 7, respectively, with i.i.d. N(0.5, 0.1) propagation delay on each spike event.

4.4 The firing behaviors of Neurons 1 and 6. The indirect input (Neuron 6) is considered very strong and excitatory, so Neuron 1 has a spiking behavior very similar to that of Neuron 6.

4.5 Averages of all projected ACC, InC, and MI unit activity changes when the rat was opening the gate for a conspecific. Values on the Y-axis are normalized Z-scores calculated from ACC neurons which projected to InC (A, n = 8), MI (B, n = 5), and ACC (C, n = 17) units; InC neurons which projected to ACC (D, n = 5), MI (E, n = 4), and InC (F, n = 15) units; MI neurons which projected to ACC (G, n = 4), InC (H, n = 5), and MI (I, n = 15) units. The red line represents the 99% confidence interval. The red arrows above the histograms point to periods where the activity exceeded mean + 2.33 SD for 2 consecutive bins. Bin size = 100 ms.

4.6 Averages of all projected ACC, InC, and MI unit activity changes when the rat was opening the gate for a toy rat or nothing inside the box. Values on the Y-axis are normalized Z-scores calculated from ACC neurons which projected to InC (A, n = 7), MI (B, n = 10), and ACC (C, n = 4) units; InC neurons which projected to ACC (D, n = 7), MI (E, n = 5), and InC (F, n = 11) units; MI neurons which projected to ACC (G, n = 6), InC (H, n = 2), and MI (I, n = 12) units. The red line represents the 99% confidence interval. Bin size = 100 ms.

4.7 Averages of all recipient unit activity changes in the ACC, InC, and MI when the rat was opening the gate for a conspecific. Values on the Y-axis are normalized Z-scores calculated from ACC neurons which received projections from InC (A, n = 5), MI (B, n = 5), and ACC (C, n = 19) units; InC neurons which received projections from ACC (D, n = 9), MI (E, n = 7), and InC (F, n = 18) units; MI neurons which received projections from InC (G, n = 3), ACC (H, n = 6), and MI (I, n = 18) units. The red line represents the 99% confidence interval. Bin size = 100 ms.

4.8 Averages of all recipient unit activity changes in the ACC, InC, and MI when the rat was opening the gate for a toy rat or nothing inside the box. Values on the Y-axis are normalized Z-scores calculated from ACC neurons which received projections from InC (A, n = 8), MI (B, n = 6), and ACC (C, n = 8) units; InC neurons which received projections from ACC (D, n = 7), MI (E, n = 2), and InC (F, n = 14) units; MI neurons which received projections from InC (G, n = 4), ACC (H, n = 9), and MI (I, n = 12) units. The red line represents the 99% confidence interval. Bin size = 100 ms.

## List of Tables

2.1 A schematic summary of Corollaries 1–5. The factors ξ_{i}, i = 1, 2, 3, 4, and the error process δ_{x} are described in Section 2.2. + (−) stands for positive (negative) sign. FP (FN) stands for false positive (negative). ↑ (↓) stands for increasing (decreasing).

2.2 Relative errors of the GCI, (F̃(r) − F)/F × 100%, for r = 0.1 ∼ 0.9 of the PA and PR models in the three experiments.

3.1 The numerical results of Simulation 1 in Section 3.3.2. The effects of excitations and inhibitions can be differentiated directly by the sign of the NS indices, and the ratio of effects between them was close to 5.0 : −2.5 = 1 : −0.5. Numbers in parentheses are the corresponding standard errors.

3.2 The numerical results of Simulation 2 in Section 3.3.2. The input rate λ was fixed at 60 Hz, α_{1} = α_{2} = α_{3} = α_{4} = 5 (mV), and α_{5} = α_{6} = −k × 5 (mV). The ratio of effects between excitatory and inhibitory sources remained close to 1 : −k as the weight ratio changed. Numbers in parentheses are the corresponding standard errors.

3.3 The numerical results of the simulations in Section 3.3.3. The parameter settings are: λ = 60 Hz, α_{1} = α_{2} = α_{3} = α_{4} = 5 (mV), and α_{5} = α_{6} = −k × 5 (mV), all with 10 ms propagation delay. Numbers in parentheses are the corresponding standard errors.

3.4 Groups found to exhibit significant NSIs in Section 3.4.1. The numbers of neurons recorded were: 8 (ACC, Rat#1), 9 (STR, Rat#1), 16 (ACC, Rat#2), and 15 (STR, Rat#2). Each group had 8/2 = 4, 9/3 = 3, 16/4 = 4, and 15/3 = 5 neurons. Hence there were 2 + 3 = 5 pools in Rat#1 and 4 + 3 = 7 pools in Rat#2.

3.5 The NSIs between the pooled data from the combinations summarized in Table 3.4.

4.1 Frequency, in 1000 simulations, of including all five relevant variables (Correct), of selecting exactly the relevant variables (E), of selecting all relevant variables and i irrelevant variables (E_{i}), and of selecting the largest model which includes all relevant variables (E^{∗}).

4.2 Simulation results of the autoregressive time-series model (see notation in Table 4.1).

4.3 Results of Simulation 1 in the threshold spiking neuron model (see notation in Table 4.1). The associated parameter settings are: λ = 40 Hz, η = 0 Hz, µ = 0, σ = 0.

4.4 Results of Simulation 2 in the threshold spiking neuron model (see notation in Table 4.1). The associated parameter settings are: λ = 40 Hz, η = 0 Hz, µ = 0.15, σ = 0.1.

4.5 Results of Simulation 3 in the threshold spiking neuron model (see notation in Table 4.1). The associated parameter settings are: λ = 30 Hz, η = 10 Hz, µ = 0.15, σ = 0.1.

4.6 Results of Simulation 4 in the threshold spiking neuron model with two extra indirect inputs (see notation in Table 4.1). The associated parameter settings are: λ = 40 Hz, η = 0 Hz, µ = 0.15, σ = 0.1.

## Chapter 1

## Brief Introduction to Neurons

The aim of this chapter is to briefly introduce some elementary notions and terminologies of neuroscience, such as action potentials, chemical synapses, neurotransmitters, and postsynaptic potentials. We focus primarily on how neurons generate action potentials and how individual neurons communicate with each other via synapses. Based on these notions, the following chapters take up several topics in computational neuroscience, including spike-sorting errors, synaptic weight estimation, and neuronal network reconstruction.

### 1.1 The neuron

Neurons are specialized cells that can receive and transmit signals. They are the building blocks of the whole nervous system and are responsible for communicating information throughout the body. There are approximately 100 billion neurons in the human brain. Most of them consist of three primary components: the cell body (also called the soma), the axon, and the dendrites. In general, dendrites are short, tree-like protrusions that extend from the cell body. These highly branched structures increase the surface area of a neuron so that it can extensively receive signals transmitted by other neurons. The received signals are then passed on to and integrated at the soma.

The axon originates from the end of the soma at a tapered region called the axon hillock, where action potentials are generated and transmitted down the axon. The generation of action potentials is elaborated in the next section. Unlike dendrites, axons can be very long. Almost all neurons have only one axon, but there can be a considerable number of branches at the axon terminal. A neuron with thousands of branches on its axon terminal is not uncommon, and these branches usually spread close to the dendrites or somata of other neurons. Sometimes the structural features described above are not sufficient to distinguish dendrites from axons; in such cases, functional differences should be considered. Axons conduct action potentials outward from the soma, while dendrites convert chemical signals into small electrical potentials and transmit the impulses inward to the soma. Details are given in the following two sections.

### 1.2 Action potentials

Neurons are surrounded by a membrane that possesses many ion channels, which allow certain ions to pass through the membrane. Some important ions involved in the electrochemical messaging of a neuron are sodium (Na^{+}), potassium (K^{+}), calcium (Ca^{2+}), and chloride (Cl^{−}). The membrane's selective permeability causes different ion concentrations across the membrane; we call the resulting difference in charge between the interior and exterior of the membrane a membrane potential. There are three primary types of ion channels: voltage-gated channels (typically on the axon hillock and the axon), chemically gated channels (typically on the dendrites), and leak channels (which by definition are always open). The potassium leak channels let K^{+} cross the membrane more easily than other ions. Inside the cell there are many negatively charged macromolecules such as nucleic acids, proteins, and ATP; these organic molecules cannot pass through the apolar layer of the membrane. Therefore, positively charged ions outside the cell are attracted toward the negatively charged intracellular environment. Although the ions on both sides of the membrane tend to balance out, equilibrium cannot be reached, for two main reasons: (1) the membrane is selective, allowing only some kinds of ions to pass through; and (2) sodium-potassium pumps actively transport three Na^{+} out of the cell and two K^{+} into the cell simultaneously. Finally, under no external stimulus, when a steady state is reached on both sides of the membrane, the voltage difference between them is called the resting membrane potential. The resting potential of a neuron is about −70 mV on average.

Action potentials are induced by different ions crossing the neuron membrane. First, an excitatory stimulus causes a few sodium channels to open, increasing the membrane permeability to sodium. If several stimuli integrate and bring the membrane potential to about −55 mV, then all the sodium channels open and a large number of Na^{+} ions rush into the cell. We call this critical level the firing threshold of a neuron and the electrical change a depolarization of the membrane. As Na^{+} ions rush in, the membrane potential moves toward 0 mV and then keeps rising rapidly to about +40 mV, but it will not exceed +60 mV, the equilibrium potential of Na^{+}. Then the potassium channels also open and the sodium channels close. More and more K^{+} ions move out of the cell, so the membrane potential declines, falling to about −80 mV in less than 2 milliseconds, but it will not exceed −90 mV, the equilibrium potential of K^{+}. This declining electrical change is called a repolarization of the membrane, and a hyperpolarization if it goes beyond the resting potential. Finally, the potassium channels gradually close and the membrane potential slowly returns to the resting potential.

If the membrane potential does not reach the threshold value, no action potential occurs. On the contrary, when the threshold level is reached, a fixed-size action potential is always generated. This is the so-called "all-or-none" principle. There are two special periods: the absolute refractory period and the relative refractory period. The former begins when the membrane potential hits +40 mV, all sodium channels are closed, and the potassium channels are opened; during this short period of time, no further action potential can occur. The latter occurs when the membrane potential is more negative than at rest; during this period, stronger stimuli are needed to create another action potential.
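The all-or-none behavior described above can be sketched in a few lines (a deliberately crude illustration using the approximate voltages quoted in this section; real membrane dynamics are continuous, and this function is not a model used elsewhere in this thesis):

```python
def membrane_response(stimulus_mv, rest=-70.0, threshold=-55.0, peak=40.0):
    """All-or-none: a stimulus either fails to reach the firing threshold
    (no spike; the potential stays subthreshold) or triggers a full,
    fixed-size action potential peaking near +40 mV."""
    potential = rest + stimulus_mv  # depolarization caused by the stimulus
    if potential >= threshold:      # threshold reached: sodium channels all open
        return peak                 # fixed-size spike, regardless of stimulus size
    return potential                # subthreshold response, decays back to rest

print(membrane_response(10.0))  # -60.0: subthreshold, no spike
print(membrane_response(20.0))  # 40.0: full spike
print(membrane_response(40.0))  # 40.0: stronger stimulus, same fixed-size spike
```

Note that a stronger suprathreshold stimulus does not produce a larger spike; stimulus intensity is instead encoded in the firing rate.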

### 1.3 Synaptic transmissions

Any behavior control in our body involves a neural network with at least two neurons. One neuron passes signals to another neuron through a synapse. Synapses are functional junctions between neurons, usually formed by the axon terminal of the input neuron (called the presynaptic neuron) and the dendrite or soma of the target neuron (called the postsynaptic neuron). Synaptic transmission can be accomplished either electrically or chemically. Electrical synapses conduct impulses faster and bidirectionally but are less modifiable than chemical synapses, which are the most common type of synapse found in the nervous system of higher vertebrates. In this section we focus mainly on chemical synapses. A typical neuron possesses thousands of synapses. The synapse functions as a basic structure and site for messaging between neurons, integrating nerve information received from different sources and ensuring unidirectional nerve conduction. All of these are essential for effective and efficient communication between neurons.

The structure of a chemical synapse consists of three parts: the presynaptic axon terminal, the postsynaptic cell, and the synaptic cleft between them. The cleft precludes the possibility of direct transmission of action potentials from the presynaptic membrane to the postsynaptic membrane. The main function of the presynaptic part is to release so-called neurotransmitters, which act as messaging molecules, in response to the arriving action potentials. The released neurotransmitters then travel (diffuse) across the synaptic cleft to the postsynaptic cell and alter its membrane potential by binding to receptor molecules specifically designated for those neurotransmitters. These can produce either excitatory or inhibitory stimulation, depending on the properties of the receptor molecules, i.e., on the types of ion channels opened on the membrane. If the response results in the activation of sodium channels, making more Na^{+} ions move into the cell, then we call this local depolarization an excitatory postsynaptic potential (EPSP). On the other hand, if the response results in the activation of chloride or potassium channels, making more Cl^{−} or K^{+} ions move into or out of the cell, respectively, then we call this local hyperpolarization an inhibitory postsynaptic potential (IPSP).

It is worth noting that a single EPSP is not sufficient for the postsynaptic membrane to generate an action potential. Moreover, the density of voltage-gated sodium channels is very low at the dendrites and soma, making it even harder to generate one there. In fact, the EPSPs and IPSPs need to be passively delivered to the axon hillock, where the density of sodium channels is high, to facilitate the generation of action potentials.

However, in this passive delivery process, the amplitude of the PSPs decreases gradually with the distance traveled. Therefore, some form of summation is required; in other words, the PSPs need to be added together at the axon hillock to surpass the threshold. The firing threshold plays an important role as a filter, so that noisy random signals are not transmitted too easily. Generally speaking, there are two mechanisms of summation: "temporal summation" and "spatial summation". Temporal summation means that a single presynaptic neuron fires frequently, so that the released neurotransmitter can act successively on the postsynaptic membrane, causing a superposition of potentials. Spatial summation means that different presynaptic neurons, with their synapses located close together, fire simultaneously or nearly simultaneously, so that the PSPs can be superimposed after passive delivery over an extremely short distance.

In fact, not only EPSPs but also IPSPs take part in the summation process. IPSPs cancel out EPSPs, pulling the membrane potential farther below the threshold, so that the initiation of an action potential in the postsynaptic neuron can be prevented. The firing behavior of a neuron usually depends on the stimuli from many different input neurons, and inhibition in general plays an important role in regulating the excitatory stimuli that drive the postsynaptic neuron to fire. Neurons add up these EPSPs and IPSPs continuously in time and over different synapses, and the net voltage at the axon hillock determines whether an action potential should be initiated. All of the signals, both excitatory and inhibitory, are transmitted from the input neurons and then integrated at the axon hillock of the target neuron. This process of gathering information and making a decision is called synaptic integration, which is crucial for neurons to perform meaningful coordinated neuronal activities in the brain.
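As a minimal illustration of synaptic integration, the summation of EPSPs and IPSPs against the firing threshold can be sketched as follows (the PSP amplitudes are invented for illustration; only the resting and threshold voltages come from this chapter):

```python
REST, THRESHOLD = -70.0, -55.0  # mV, approximate values from Section 1.2

def integrates_to_spike(psps_mv):
    """Decide whether a set of PSPs that superimpose at the axon hillock
    (via temporal or spatial summation) lifts the membrane past threshold.
    EPSPs are positive amplitudes (mV), IPSPs negative."""
    net = REST + sum(psps_mv)
    return net >= THRESHOLD

print(integrates_to_spike([6.0, 5.0]))             # False: EPSPs alone too weak
print(integrates_to_spike([6.0, 5.0, 7.0]))        # True: summation crosses threshold
print(integrates_to_spike([6.0, 5.0, 7.0, -4.0]))  # False: an IPSP cancels the excitation
```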

## Chapter 2

## Effects of Spike Sorting Error

### 2.1 Introduction

In neuroscience research, it is important to identify the information flow among multiple neurons in the brain from recorded neural activity data. A powerful method for achieving this is Granger causality (GC), which arose in economics after being introduced by Wiener and Granger [27, 28, 69]. GC is a time-series inference (TSI) type of method; it proposes that if the prediction of one time series can be improved with the knowledge of a second time series, then there is a causal influence from the second time series to the first. This prediction is made using the vector autoregressive (VAR) model: if the variance of the prediction error of one time series at the present time can be reduced by including the past values of another series, then the latter is said to Granger-cause the former. This causality can be quantified by the so-called GC index (GCI), which can be used to determine whether there is any causal interaction between time series. GC has been shown to be effective and has been widely deployed in recent neuroscience research [7, 12–14, 71]. In addition to the time-domain GC, other versions (e.g., frequency- and time-frequency-domain) have been developed as well [3, 18]. The time-domain formulation of the GCI is briefly introduced in the next section, and we refer the reader to [8] for more details about GC.
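The idea behind the GCI can be made concrete with a small sketch (a least-squares bivariate implementation for illustration only; the precise formulation used in this thesis is given in Section 2.2):

```python
import numpy as np

def gci(x, y, p=1):
    """GCI from y to x: ln(restricted residual variance / full residual variance).
    The restricted VAR predicts x[t] from its own p past values; the full
    model also includes the p past values of y. A value near 0 means the
    past of y does not improve the prediction of x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    lags = lambda s: np.column_stack([s[p - 1 - k : n - 1 - k] for k in range(p)])
    target = x[p:]
    resid_var = lambda M: np.var(
        target - M @ np.linalg.lstsq(M, target, rcond=None)[0])
    return np.log(resid_var(lags(x)) / resid_var(np.hstack([lags(x), lags(y)])))

# Toy data: y drives x with a one-step delay, so y should Granger-cause x.
rng = np.random.default_rng(0)
n = 2000
y = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + 0.1 * rng.standard_normal()

print(gci(x, y))  # clearly positive: y Granger-causes x
print(gci(y, x))  # near zero: no causality in the reverse direction
```

Because the restricted model is nested in the full one, the ratio of residual variances is at least 1 and the GCI is nonnegative, which is exactly the limitation for excitation/inhibition discussed in the Abstract.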

Neurons emit action potentials (APs), known as spikes, which play an important role in communication among cells. The temporal sequence of APs produced by a neuron, which reflects its activity, is known as a spike train. In multi-channel recordings [9], the APs of neurons are detected and differentiated from background electrical noise before single-unit spike trains are used to probe neural behaviors. This technical procedure is called spike sorting. However, it is not easy to obtain spike train data that fully agree with the APs, because of noise, superimposed APs, and the difficulty of differentiating AP waveforms from different neurons. Spike sorting often introduces unavoidable errors [19, 43]. These errors can roughly be divided into two types: false positives (FPs) and false negatives (FNs). An FP is an erroneous detection of an event that is not a real spike (just electrical noise) or is a spike from another neuron. Conversely, an FN means that real spikes were not detected or were classified into groups of other neurons. One may be interested in the question: "How do FPs and FNs affect the estimation of functional connectivity among neurons?" This study answers this question analytically and via numerical simulations. The change in the GCI due to spike sorting errors is derived analytically in an explicit formula, which makes a direct discussion of the effects of FPs and FNs possible. Moreover, numerical simulations were used to verify the analyses. We constructed three types of models for sorting errors: those with uniform, random, and concentrative distributions; that is, errors occur uniformly, randomly, or concentratively in spike trains. Changes in the GCI were computed as these types of spike sorting errors were artificially added to the simulated spike trains, and the effects on the directional interactions were also investigated.

Finally, it is worth noting that spike trains are non-equally spaced data and are regarded as point processes. Interpolation or filtering is usually employed to convert point processes to equally spaced time series, and previous studies on spike trains [37, 75] have proposed several methods for this conversion. This study adopts the procedure of binning to convert spike trains into time series data suitable for GC analyses. Although the GCI between two point processes has recently been defined directly in [38], we still cannot abandon binning, because it reduces the complexity of the analysis and also accounts for the effect of temporal summation of action potentials in neuroscience.
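As a concrete illustration, the binning step can be sketched in Python as follows. The function name `bin_spike_train` and the toy spike times are our own, not from the original study; the idea is simply to count spikes per fixed-width bin.

```python
import numpy as np

def bin_spike_train(spike_times, t_end, bin_size):
    """Convert a spike train (a list of event times) into an
    equally spaced time series of spike counts per bin."""
    n_bins = int(round(t_end / bin_size))
    counts, _ = np.histogram(spike_times, bins=n_bins, range=(0.0, t_end))
    return counts

# Five spikes on [0, 1] binned at 0.1 s resolution.
spikes = [0.05, 0.12, 0.13, 0.48, 0.91]
print(bin_spike_train(spikes, t_end=1.0, bin_size=0.1).tolist())
# -> [1, 2, 0, 0, 1, 0, 0, 0, 0, 1]
```

The resulting count series is equally spaced and can be fed directly into an autoregressive model.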

The remainder of this chapter is organized as follows. Section 2 presents an analytic formula, based on a first-order autoregression, showing how error processes affect the GCI.

Section 3 presents some models for sorting errors and probes the proposed formula further via numerical simulations. Section 4 presents a real data evaluation, in which the effects of sorting errors on the GCI are evaluated using real experimental data. Section 5 provides some suggestions for spike sorting and a discussion.

We remark that the material of this chapter has been published in [61].

### 2.2 Modeling and analysis

Based on a first order autoregression, we derived an explicit formula for changes in the GCI in terms of four parameters involving the error process. We also investigated the influences of various types of errors on the GCI indicated by the proposed formula.

### 2.2.1 A short introduction to the GCI

Let x and y be two stationary time series with zero means. The first-order linear autoregressive model for x and y is given by

$$\begin{bmatrix} x(n) \\ y(n) \end{bmatrix} = A \begin{bmatrix} x(n-1) \\ y(n-1) \end{bmatrix} + \begin{bmatrix} \epsilon(n) \\ \eta(n) \end{bmatrix}, \qquad (2.1)$$

where $A$ is the model coefficient matrix, and the residuals $\epsilon$ and $\eta$ are zero-mean uncorrelated white noises with covariance matrix $\Sigma$. Here the variances $\operatorname{Var}(\epsilon)$ and $\operatorname{Var}(\eta)$ are called prediction errors, which measure the accuracy of the autoregressive prediction. More specifically, $\operatorname{Var}(\eta)$ measures the accuracy of the prediction of $y(n)$ based on the previous values $x(n-1)$ and $y(n-1)$.

Now consider the reduced model that excludes the time series variable x

$$y(n) = B\, y(n-1) + \zeta(n), \qquad (2.2)$$

where $B$ is the corresponding model coefficient. The variance $\operatorname{Var}(\zeta)$ measures the accuracy of the prediction of $y(n)$ based only on its previous value $y(n-1)$. For $\eta$ in (2.1) and $\zeta$ in (2.2), if $\operatorname{Var}(\eta)$ is significantly less than $\operatorname{Var}(\zeta)$ in some statistical sense, then we say that $x$ Granger-causes $y$. This causality can be quantified by the GCI from $x$ to $y$, formulated as:

$$F_{x\to y} = \ln\frac{\operatorname{Var}(\zeta)}{\operatorname{Var}(\eta)}. \qquad (2.3)$$

It is clear that $F_{x\to y} = 0$ when $\operatorname{Var}(\eta) = \operatorname{Var}(\zeta)$, i.e., $x$ has no causal influence on $y$, and $F_{x\to y} > 0$ when $x$ Granger-causes $y$. Notice that $F_{x\to y}$ is nonnegative, i.e., $\operatorname{Var}(\eta)$ is bounded above by $\operatorname{Var}(\zeta)$, since the full model defined in (2.1) always has at least the prediction ability of the reduced model defined in (2.2). Finally, we note that the GCI values should be checked for significance by hypothesis testing; more details on the GCI can be found in [20, 27, 28].
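To make the definitions concrete, the following Python sketch estimates $F_{x\to y}$ by fitting the full model (2.1) and the reduced model (2.2) with ordinary least squares. The simulated coupling coefficients in the toy example are illustrative choices of ours, not values from the thesis.

```python
import numpy as np

def gci(x, y):
    """First-order GCI from x to y: fit the full model (2.1) and the
    reduced model (2.2) by least squares, then compare residual variances."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    past = np.column_stack([x[:-1], y[:-1]])   # [x(n-1), y(n-1)]
    cur = y[1:]                                # y(n)
    # Full model residual variance Var(eta)
    beta_full, *_ = np.linalg.lstsq(past, cur, rcond=None)
    var_eta = np.var(cur - past @ beta_full)
    # Reduced model residual variance Var(zeta): y's own past only
    beta_red, *_ = np.linalg.lstsq(past[:, 1:], cur, rcond=None)
    var_zeta = np.var(cur - past[:, 1:] @ beta_red)
    return np.log(var_zeta / var_eta)          # eq. (2.3)

# Toy example: y is driven by the past of x, but not vice versa.
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
y = np.zeros_like(x)
for n in range(1, len(x)):
    y[n] = 0.8 * x[n - 1] + 0.1 * y[n - 1] + 0.1 * rng.standard_normal()
print(gci(x, y), gci(y, x))  # the first is large, the second near zero
```

In practice the two estimated GCI values would still be submitted to a significance test, as noted above.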

### 2.2.2 An explicit formula

When inaccurate spike sorting occurs, the sorting errors can be regarded as a perturbed error process. For simplicity, we assume that only the source process x has a sorting error and the corresponding error process is denoted by δx. We can assume that δx is zero mean and the model in (2.1) is perturbed as follows when δx is superposed on x:

$$\begin{bmatrix} \{x+\delta x\}(n) \\ y(n) \end{bmatrix} = \tilde{A} \begin{bmatrix} \{x+\delta x\}(n-1) \\ y(n-1) \end{bmatrix} + \begin{bmatrix} \tilde{\epsilon}(n) \\ \tilde{\eta}(n) \end{bmatrix}, \qquad (2.4)$$

where $\tilde{A}$ is the corresponding model coefficient matrix, and the residuals $\tilde{\epsilon}$ and $\tilde{\eta}$ have the covariance matrix $\tilde{\Sigma}$. Let $S_y := \operatorname{Var}(\zeta)$, $S := \operatorname{Var}(\eta)$, and $\tilde{S} := \operatorname{Var}(\tilde{\eta})$. Since the perturbed quantity $\delta x$ is superposed only on $x$, the reduced models for (2.1) and (2.4) are the same as (2.2). Then the original GCI from $x$ to $y$ and the perturbed GCI from $x + \delta x$ to $y$ are

$$F = \ln\frac{S_y}{S} \quad\text{and}\quad \tilde{F} = \ln\frac{S_y}{\tilde{S}}, \qquad (2.5)$$

respectively. To investigate the perturbed GCI, we derived an explicit formula for $\tilde{F}$ in terms of four parameters involving $\delta x$: $\xi_1 := E[\delta x_1^2]$, $\xi_2 := E[x_1\,\delta x_1]$, $\xi_3 := E[y_2\,\delta x_1]$, and $\xi_4 := E[y_1\,\delta x_1]$. Further denote $X_0 = E[x_1^2]$, $Y_0 = E[y_1^2]$, $Y_1 = E[y_1 y_2]$, $Z_1 = E[x_1 y_1]$, and $Z_2 = E[x_1 y_2]$. We are now ready to present the formula for $\tilde{F}$.

Proposition 1. In the situation described above, $\tilde{F}$ can be presented explicitly by the following formula (for the calculation see Appendix):

$$\tilde{F} = \ln\frac{S_y}{S + \Theta}, \qquad \Theta = (S_y - S)\, I, \qquad (2.6)$$

where

$$I = \frac{Y_0\,(X_0 + \xi_1 + 2\xi_2) - \dfrac{1}{S_y - S}\Big[\, Y_0\,(\xi_3 + Z_2)^2 + (Y_0 - S)\,(\xi_4 + Z_1)^2 - 2 Y_1\,(\xi_3 + Z_2)(\xi_4 + Z_1) \Big]}{Y_0\,(X_0 + \xi_1 + 2\xi_2) - (\xi_4 + Z_1)^2}. \qquad (2.7)$$

Note that since S + Θ in (2.6) is bounded above by S_{y}, we have that Θ is upper bounded
by S_{y} − S, i.e., I has an upper bound 1.

We end this subsection by the following two remarks.

Remark 1. In the same situation as Proposition 1, the following inequalities hold:

$$Y_0 \ge S_y \ge S \quad\text{and}\quad Y_1 \le 0. \qquad (2.8)$$

According to (2.2), we have $Y_0 = \operatorname{Var}(y_1) \ge \operatorname{Var}(\zeta) = S_y$. The inequality $S_y \ge S$ follows because the prediction error of the full model in (2.1) is always less than or equal to that of the reduced model in (2.2). The inequality $Y_1 \le 0$ holds because of the stationarity assumption: if $Y_1 = E[y_1 y_2] > 0$, then $y$ will not be stationary. Thus $Y_1 \le 0$.

Remark 2. The following result is obtained easily from (2.5) and (2.6):

$$\tilde{F} \begin{cases} > F, & \text{if } I < 0, \\ = F, & \text{if } I = 0, \\ < F, & \text{if } 0 < I < 1, \\ = 0, & \text{if } I = 1. \end{cases} \qquad (2.9)$$

### 2.2.3 Essential GCI factors

The formula defined by (2.6) and (2.7) is complicated, but it reveals the intrinsic property of the perturbed GCI, which depends on the four factors $\xi_1$, $\xi_2$, $\xi_3$, and $\xi_4$. By their definitions, these factors capture the main properties of the error signals, namely, the spike sorting errors.

A systematic characterization of these parameters' influences on the GCI provides a heuristic understanding of the effect of spike sorting errors, on which researchers can base further decisions. Thus, in this subsection we discuss Proposition 1 in five different situations and relate each of them to a biological meaning.

We now present the following corollaries for investigating the term I defined in (2.7), and this is equivalent to investigating the term ˜F in (2.6).

Corollary 1 ($\xi_2 = \xi_3 = \xi_4 = 0$). We start with the simple case in which the error process $\delta x$ is uncorrelated with the underlying processes $x$ and $y$, i.e., $\xi_2 = \xi_3 = \xi_4 = 0$. In this case $I$ simplifies to

$$I(\xi_1) = \frac{Y_0\,\xi_1}{Y_0\,\xi_1 + [X_0 Y_0 - Z_1^2]}. \qquad (2.10)$$

The above equation shows that $I$ increases as $\xi_1$ increases, but it is bounded above by the limit $L := \lim_{\xi_1\to\infty} I(\xi_1) = 1$. Hence, by (2.6), the weakened GCI $\tilde{F}$ is bounded below by $\lim_{\xi_1\to\infty}\tilde{F}(\xi_1) = \ln\frac{S_y}{S+(S_y-S)L} = \ln\frac{S_y}{S_y} = 0$. In reality, this limit cannot be attained because the variance $\xi_1 = E[\delta x_1^2]$ cannot approach infinity. Thus, the GCI never vanishes if the error process $\delta x$ is uncorrelated with the underlying process $x$. This corollary corresponds to FPs composed of electrical noise or spikes of other unconnected neurons during spike sorting.
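Equation (2.10) is easy to explore numerically. In the sketch below, the second moments $X_0$, $Y_0$, $Z_1$ are made-up illustrative values, and the function name is our own; the point is only the monotone approach of $I$ toward its limit 1.

```python
import numpy as np

def I_uncorrelated(xi1, X0, Y0, Z1):
    """Eq. (2.10): the factor I when the error process delta-x is
    uncorrelated with the underlying processes x and y."""
    return Y0 * xi1 / (Y0 * xi1 + (X0 * Y0 - Z1 ** 2))

X0, Y0, Z1 = 1.0, 1.0, 0.5              # hypothetical second moments
xi1 = np.array([0.0, 1.0, 10.0, 1e6])   # increasing error variance
vals = I_uncorrelated(xi1, X0, Y0, Z1)
print(vals)  # monotonically increasing, approaching but never reaching 1
```

This mirrors the statement above: $I$ starts at 0 when $\xi_1 = 0$ and tends to 1 only in the unattainable limit $\xi_1 \to \infty$.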

Corollary 2 ($\xi_2 < 0$, $\xi_3 < 0$, $\xi_4 < 0$). Suppose the error process $\delta x$ is negatively correlated with the underlying processes $x$ and $y$, i.e., $\xi_2$, $\xi_3$, and $\xi_4$ are all negative. Since the perturbed quantity $\delta x$ is considered to be produced by inaccurate spike sorting, the maximal negative quantity which $\delta x$ can be is $-x$. Therefore, we have the following constraint:

$$-X_0 \le \xi_2 < 0, \quad -Z_2 \le \xi_3 < 0, \quad -Z_1 \le \xi_4 < 0, \qquad (2.11)$$

and the equalities are attained when $\delta x = -x$.

Now, according to (2.7), (2.8), and (2.11), $I$ is positive, increasing, and bounded above by 1. In other words, $\Theta \to S_y - S$ and $\tilde{F} \to 0$ as $\xi_1 \to \infty$ or $(\xi_2, \xi_3, \xi_4) \to (-X_0, -Z_2, -Z_1)$. This corollary is related to FNs in spike sorting.

Corollary 3 ($\xi_2 < 0$, $\xi_3 < 0$, $\xi_4 < 0$, simplified). Suppose $\delta x$ is correlated with the underlying processes as in Corollary 2. If $y$ is further completely induced by $x$, i.e., $y$ cannot explain itself ($B = 0$ in (2.2)), then $S_y = Y_0$, $Y_1 = 0$, and (2.7) can be further simplified to

$$I = \frac{Y_0\,(X_0 + \xi_1 + 2\xi_2) - (\xi_4 + Z_1)^2 - \dfrac{Y_0}{S_y - S}\,(\xi_3 + Z_2)^2}{Y_0\,(X_0 + \xi_1 + 2\xi_2) - (\xi_4 + Z_1)^2}. \qquad (2.12)$$

Equation (2.12) still shows that $I \to 1$ as $\xi_1 \to \infty$. On the other hand, if $\xi_1$ is fixed and the negative correlation between $x$ and $\delta x$ increases (i.e., $\xi_2$, $\xi_3$, $\xi_4$ decrease simultaneously to $-X_0$, $-Z_2$, $-Z_1$), then $I$ increases to 1 because of the quadratic convergence

$$(\xi_3 + Z_2)^2 \to 0. \qquad (2.13)$$

Note that when $(\xi_2, \xi_3, \xi_4)$ attains the lower bound $(-X_0, -Z_2, -Z_1)$, i.e., $\delta x = -x$, there is nothing left to analyze. Therefore, we do not consider this case.

Corollary 4 ($\xi_2 = 0$, $\xi_3 > 0$, $\xi_4 > 0$). Suppose the error process $\delta x$ is positively correlated with $y$ and is uncorrelated with $x$, i.e., $\xi_2 = 0$, $\xi_3 > 0$, and $\xi_4 > 0$. Since $I = 0$ when $\xi_1 = \xi_2 = \xi_3 = \xi_4 = 0$, it is easy to conclude that, in this case, $I$ is negative and decreasing, i.e., $\tilde{F} > F$ and $\tilde{F}$ increases as $\xi_3$ or $\xi_4$ increases. This corollary corresponds to FPs composed of spikes of connected neurons that are positively correlated with the target neuron during spike sorting.

Corollary 5 ($\xi_2 > 0$, $\xi_3 > 0$, $\xi_4 > 0$). Suppose the error process $\delta x$ is positively correlated with both $x$ and $y$, i.e., $\xi_2 > 0$, $\xi_3 > 0$, and $\xi_4 > 0$. Since $I$ is increasing in $\xi_2$, we know that $\tilde{F}$ is then decreasing in $\xi_2$. Therefore, $\tilde{F}$ in this case exhibits the same behavior as in Corollary 4 if $(\xi_3, \xi_4)$ dominates $\xi_2$; but $\tilde{F}$ is decreasing if $\xi_2$ dominates $(\xi_3, \xi_4)$. This corollary corresponds to FPs composed of spikes of connected neurons that are positively correlated with both the source and target neurons during spike sorting.

The results of the above five corollaries are schematically summarized in Table 2.1.

Table 2.1: A schematic summary of Corollaries 1–5. The factors ξ_i, i = 1, 2, 3, 4 and the error process δx are described in Section 2.2. + (−) stands for positive (negative) sign. FP (FN) stands for false positive (negative). ↑ (↓) stands for increasing (decreasing).

| ξ_2 | ξ_3 | ξ_4 | Effect on the GCI | Type | Interpretation of δx |
|---|---|---|---|---|---|
| 0 | 0 | 0 | GCI ↓ as ξ_1 ↑ | FP | Electrical noise or spikes of unconnected neurons. |
| − | − | − | GCI ↓ as ξ_1 ↑ | FN | Spike missing. |
| 0 | + | + | GCI ↑ as ξ_3 ↑ or ξ_4 ↑ | FP | Connected neurons positively correlated with the target neuron. |
| + | + | + | GCI ↑ if (ξ_3, ξ_4) dominates ξ_2; GCI ↓ if ξ_2 dominates (ξ_3, ξ_4) | FP | Connected neurons positively correlated with both the source and target neurons. |

### 2.2.4 GCI vs. variance reduction

Because $0 \le S \le S_y$, we set $S = (1-k)S_y$ with $0 \le k \le 1$, and the GCI in (2.5) becomes

$$F = \ln\frac{S_y}{(1-k)S_y} = -\ln(1-k). \qquad (2.14)$$

Inverting this relation gives $k = 1 - \exp(-F)$. Since $S_y - S = kS_y$, the quantity $k = (S_y - S)/S_y$ represents the percentage of variance reduction; more precisely, it represents the relative decrease in prediction error from the reduced model (2.2) to the full model (2.1). For example, $k = 0$ means no (0%) variance reduction, and the GCI is equal to zero. On the contrary, $k = 1$ means a total (100%) variance reduction, and the GCI is infinite. Figure 2.1 shows how the GCI relates to $k$; the variance is almost totally reduced when the GCI equals 5.
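The mapping between the GCI and $k$ can be sketched directly from (2.14); the function name below is our own.

```python
import math

def variance_reduction(F):
    """Percentage of variance reduction corresponding to a GCI value F,
    from k = 1 - exp(-F), the inverse of eq. (2.14)."""
    return 100.0 * (1.0 - math.exp(-F))

for F in (0.0, 1.0, 5.0):
    print(f"GCI = {F}: k = {variance_reduction(F):.1f}%")
# GCI = 5 already corresponds to more than 99% variance reduction.
```

This makes concrete the claim above that the variance is almost totally reduced once the GCI reaches 5.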

[Figure 2.1 appears here: a curve of k × 100 (%) against the GCI, for GCI values from 0 to 5.]

Figure 2.1: The relationship between the Granger Causality Index (GCI) and k. This transforms GCI values to the corresponding percentage of variance reduction in the residual noise process.

### 2.3 Simulation study

Here, we present some models for sorting errors and further probe formula (2.7) via numerical simulations. For the FP case, we first consider FPs made up of electrical noise or spikes of unconnected neurons. Spikes of connected neurons are considered in Section 2.3.4.

### 2.3.1 Models

We first construct a point process $X = \{p_1, p_2, \ldots, p_N\}$ generated by a Poisson process with rate $\lambda$ in the time interval $[0, T]$, where $N$ is the total number of points. The second point process $Y = \{q_1, q_2, \ldots, q_N\}$ is then generated by $Y = X + N(m, \sigma)$, where $N(m, \sigma)$ is a normal random variable with mean $m$ and standard deviation $\sigma$. More precisely, $q_i = p_i + N(m, \sigma)$, $i = 1, \ldots, N$. The point process $Y$ presents a time lag $m$ with respect to $X$ if $m > 0$ and $\sigma = 0$. This study only considers $m > 0$ and $\sigma > 0$.
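A minimal sketch of this generation step follows; the function name is ours, and the Poisson process is realized as a Poisson-distributed number of uniformly placed points on $[0, T]$.

```python
import numpy as np

def coupled_point_processes(lam, T, m, sigma, rng):
    """Generate X as a Poisson process with rate lam on [0, T], and
    Y = X + N(m, sigma) as a jittered, lagged copy of X."""
    n = rng.poisson(lam * T)                    # number of points N
    X = np.sort(rng.uniform(0.0, T, size=n))    # p_1, ..., p_N
    Y = X + rng.normal(m, sigma, size=n)        # q_i = p_i + N(m, sigma)
    return X, Y

rng = np.random.default_rng(1)
X, Y = coupled_point_processes(lam=2.0, T=100.0, m=0.1, sigma=0.02, rng=rng)
print(len(X), float(np.mean(Y - X)))  # about 200 points, mean lag near 0.1
```

With $\sigma > 0$ the lag of each point of $Y$ fluctuates around $m$, exactly as described above.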

Spike sorting errors include two types: fake (FP) and missing (FN) spike events. A fake spike event is an erroneous detection of an event that is not a real spike or is a spike from another neuron. Conversely, a missing spike event means that spikes were not detected or were classified into the groups of other neurons. Therefore, the fake spike case can be regarded as adding extra points to a point process; this type of error is denoted by "A-type". The missing spike case can be regarded as removing points from a point process and is denoted by "R-type". The addition or removal of points is considered for uniform, random, and concentrative distributions. The number of fake or missing spike events is $rN$, where $r (= p\%)$ is the ratio of fake or missing points relative to the original spike train. The following explains the generation of these types in detail.

- **Uniform-partition addition (PA) model:** The extra points $\{\tilde{p}_i : 1 \le i \le rN\}$ on the time interval $[0, T]$ form a uniform partition of $[0, T]$. More precisely, $\tilde{p}_i = i\,\Delta t$, where $\Delta t = \frac{T}{rN - 1}$.
- **Random-uniform addition (UA) model:** The extra points $\tilde{p}_i$, $i = 1, \ldots, rN$, are generated from a uniform distribution $U[0, T]$.
- **Random-normal addition (NA) model:** The extra points $\tilde{p}_i$, $i = 1, \ldots, rN$, are generated from a normal distribution $N(T/2, \sigma_{NA})$. The standard deviation parameter $\sigma_{NA}$ represents different degrees of concentration.
- **Uniform-partition removal (PR) model:** A set of reference points $\{\tilde{p}_j : 1 \le j \le rN\}$, which forms a uniform partition of $[0, T]$, is used to remove spike events from $\{p_i : 1 \le i \le N\}$: fix $\tilde{p}_j$ and then remove the point closest to $\tilde{p}_j$.
- **Random-uniform removal (UR) model:** Points in $\{p_i : 1 \le i \le N\}$ are randomly removed using a discrete uniform $U_d\{1, N\}$ random variable, i.e., all points have the same probability of being removed.
- **Middle-succession removal (SR) model:** The removed points are successive and located near the center $T/2$ of the time interval $[0, T]$. The number of removed spikes is $rN$.
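Three of the six models can be sketched as follows; the function names and implementation details are our own reading of the definitions above (e.g., the PA partition is realized with `np.linspace`).

```python
import numpy as np

def pa_add(spikes, T, r):
    """PA model: add rN fake spikes forming a uniform partition of [0, T]."""
    k = int(round(r * len(spikes)))
    return np.sort(np.concatenate([spikes, np.linspace(0.0, T, max(k, 2))]))

def na_add(spikes, T, r, sigma_na, rng):
    """NA model: add rN fake spikes drawn from N(T/2, sigma_na)."""
    k = int(round(r * len(spikes)))
    return np.sort(np.concatenate([spikes, rng.normal(T / 2, sigma_na, size=k)]))

def ur_remove(spikes, r, rng):
    """UR model: remove rN spikes chosen uniformly at random."""
    n = len(spikes)
    keep = rng.choice(n, size=n - int(round(r * n)), replace=False)
    return np.sort(spikes[keep])

rng = np.random.default_rng(2)
spikes = np.sort(rng.uniform(0.0, 100.0, size=200))
print(len(pa_add(spikes, 100.0, 0.1)),    # 220 points (200 + 20 fake)
      len(ur_remove(spikes, 0.1, rng)))   # 180 points (200 - 20 missing)
```

The remaining models (UA, PR, SR) follow the same pattern of adding or deleting points before the spike trains are binned.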

We note that sorting errors occur only in the source process $X$; $Y$ is assumed to be error-free.

### 2.3.2 Setup

Set the parameters $\lambda = 2$, $T = 100$, and $m = 0.1$ for the rate of the Poisson process, the total time, and the time lag, respectively. To apply autoregressive modeling, we convert the point processes $\{p_i : 1 \le i \le N\}$ and $\{q_i : 1 \le i \le N\}$ to time series by binning, with the bin size equal to the time lag $m$. Results are obtained from the average of 100 simulations for each random case at a fixed error percentage $r$. To investigate the effects of errors on the GCI, we observe the results for various $r$, $\sigma$, and $\sigma_{NA}$.

### 2.3.3 Simulation results

We note that the simulations and the corresponding results are related to Corollaries 1, 2, and 3 of Section 2.2.3.

A-type vs. R-type

Figure 2.2(a) shows the results for Experiment 1, in which $\sigma = 0.02$, $\sigma_{NA} = T/8$, and the error percentage increases in steps of 0.1. In this figure, the GCI of the SR model decreases the fastest among all of the models as $r$ increases. All R-type errors cause information loss and greatly weaken the GCI as the error percentage increases. If $r = 1$, the underlying signal is totally destroyed by the errors, and the causality is undetermined. The PA model produces the largest GCI for a fixed $r$. Figure 2.2(a) also shows that A-type GCIs are greater than R-type GCIs. However, this phenomenon is not always valid. The error in the NA model is uncorrelated with the signal and yet weakens the GCI more than any of the R-type models when $\sigma_{NA}$ is small. Figure 2.2(b) shows that a highly non-stationary process can average out much more of the underlying causality than the others, for the case of $\sigma_{NA} = T/64$ in the NA model.

[Figure 2.2 appears here: four panels of GCI against the error percentage r, each with curves for the PA, UA, NA, PR, UR, and SR models.]

Figure 2.2: (a) Simulation results of Experiment 1, with parameter settings λ = 2, T = 100, m = 0.1, σ = 0.02. (b) Simulation results of a much more concentrative normal, with λ = 2, T = 100, m = 0.1, σ = 0.02, and σ_NA = T/64. (c) Simulation results of Experiment 2, with λ = 2, T = 100, m = 0.1, σ = 0.04. (d) Simulation results of Experiment 3, with λ = 2, T = 100, m = 0.1, σ = 0.06. The error percentages r all range from 0 to 0.9 in steps of 0.1.

Standard deviation factor σ

We now investigate the effects of the standard deviation σ on GCIs. Figure 2.2(c) and 2.2(d) show the results for experiments 2 and 3 in which parameter σ is 0.04 and 0.06, respectively. Observations in Figure 2.2(a), 2.2(c), and 2.2(d) show that the behaviors of the profiles with σ = 0.04, 0.06 closely resemble the behavior of the profile with σ = 0.02.

In the PA and PR models, let $F$ denote the GCI value without a sorting error ($r = 0$), and let $\tilde{F}(r)$ denote the GCI value with a sorting error of error percentage $r$. Table 2.2 presents the relative errors of the GCI for $r$ from 0.1 to 0.9 for the PA and PR models in the three experiments, where

$$\text{Relative error}(r) := \frac{\tilde{F}(r) - F}{F} \times 100\%.$$

Table 2.2 shows that Experiment 3 had the flattest PA-curve and PR-curve of the three experiments, and the underlying causality of the PA models was in all cases around 50% off at $r = 0.9$ because the signal-to-noise ratio $SNR = \operatorname{Var}(x)/\operatorname{Var}(\delta x)$ was close to 1. Although they were all around 50% off, the corresponding decreases in the percentage of variance reduction differ greatly, as can be seen from Figure 2.1. In the PA model, the three $F$ values corresponding to the three experiments were 1.24, 0.64, and 0.38, and the corresponding $\tilde{F}(0.9)$ values were 0.71, 0.33, and 0.18, respectively. Equation (2.14) gives the corresponding relative decreases in the percentage of variance reduction: 28.17%, 40.89%, and 46.85%, respectively.
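These decreases can be checked from (2.14) alone. The sketch below uses the two-decimal GCI values quoted in the text, so its results agree with the reported percentages only up to rounding.

```python
import math

def k(F):
    """Fraction of variance reduction for a GCI value F, eq. (2.14)."""
    return 1.0 - math.exp(-F)

# (F, F_tilde(0.9)) for the PA model in Experiments 1-3, as quoted above.
for F, Ft in [(1.24, 0.71), (0.64, 0.33), (0.38, 0.18)]:
    rel = 100.0 * (k(F) - k(Ft)) / k(F)
    print(f"relative decrease in variance reduction: {rel:.1f}%")
```

The printed values land within roughly one percentage point of the reported 28.17%, 40.89%, and 46.85%; the small discrepancies come from the rounding of the quoted GCI values.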

Table 2.2: Relative errors, $\frac{\tilde{F}(r) - F}{F} \times 100\%$, of the GCI for $r = 0.1 \sim 0.9$ for the PA and PR models in the three experiments.

Relative errors (%) of the PA model

| Experiment | r=0.1 | r=0.2 | r=0.3 | r=0.4 | r=0.5 | r=0.6 | r=0.7 | r=0.8 | r=0.9 |
|---|---|---|---|---|---|---|---|---|---|
| 1 (σ=0.02) | −16.46 | −27.24 | −35.29 | −41.48 | −46.26 | −50.26 | −53.43 | −55.78 | −57.60 |
| 2 (σ=0.04) | −12.27 | −21.87 | −29.41 | −34.36 | −39.84 | −42.63 | −46.28 | −48.93 | −51.20 |
| 3 (σ=0.06) | −11.59 | −19.78 | −26.99 | −32.13 | −36.63 | −40.89 | −43.42 | −46.29 | −48.41 |

Relative errors (%) of the PR model

| Experiment | r=0.1 | r=0.2 | r=0.3 | r=0.4 | r=0.5 | r=0.6 | r=0.7 | r=0.8 | r=0.9 |
|---|---|---|---|---|---|---|---|---|---|
| 1 (σ=0.02) | −17.10 | −30.16 | −41.89 | −51.31 | −59.78 | −68.60 | −75.86 | −84.02 | −92.07 |
| 2 (σ=0.04) | −12.64 | −22.71 | −32.74 | −41.82 | −50.63 | −59.24 | −68.72 | −78.04 | −88.96 |
| 3 (σ=0.06) | −11.58 | −19.21 | −27.60 | −37.45 | −47.44 | −54.15 | −65.07 | −75.34 | −87.68 |

Explanation of the GCI curves

We now discuss the behavior of the curves in Figure 2.2(a). Because the A-type error processes ($\delta x$) are uncorrelated with the underlying process ($x$), the curves of the PA, UA, and NA models can be analyzed and directly explained by (2.6) and (2.10). These three curves decrease as the size of the errors ($\xi_1$) increases. They reach zero only when the error size is infinite, which is infeasible. Therefore, these curves decrease slowly and never reach zero; in other words, the underlying causality remains.

Moreover, the PA-curve is above the UA-curve, and the UA-curve is above the NA-curve. This is because, at the same error percentage, the NA model has the largest error size, followed by the UA model and the PA model. On the other hand, the error processes of the PR, UR, and SR models are correlated with the underlying process, and (2.12) is used instead of (2.10) since $\xi_2$, $\xi_3$, and $\xi_4$ are nonzero.

With approximately the same error size, these curves decrease much more quickly than the uncorrelated case because of the negative correlations. Unlike the former, these curves almost reach zero at the error percentage r = 0.9. In addition, the PR-curve is above the UR-curve, and the UR-curve is above the SR-curve. This is because the SR model has the largest error size, followed by the UR and PR models at the same error percentage.

We now discuss the behavior of the curves in Figure 2.2(b), which shows that A-type errors are not always better than R-type errors. The GCI values of the NA model with $\sigma_{NA} = T/64$ are smaller than those of the SR model when the error percentage is between $r = 0.2$ and $r = 0.6$. To see what occurred, we computed $\xi_1$ to $\xi_4$ at each error percentage for these two cases; the results are shown in Figures 2.3(a) and 2.3(b), respectively. Comparing the two panels indicates that (i) $\xi_1$ of the NA model is much larger than that of the SR model, so by (2.12) the GCI of the NA model is smaller than that of the SR model; and (ii) the SR model has large $\xi_2$ and $\xi_3$, so by (2.13) the GCI of the SR model becomes significantly smaller than that of the NA model when $r$ is quite large, even though $\xi_1$ of the SR model remains relatively smaller than that of the NA model. Figure 2.2(b) verifies this analysis: the error size of the NA model dominates the negative correlation of the SR model when the error percentage is between $r = 0.2$ and $r = 0.6$ (because there the error size of the NA model is much larger than that of the SR model), whereas the negative correlation of the SR model dominates the error size of the NA model when $r > 0.6$. Therefore, both the error size and the negative correlation must be taken into account to explain the behavior of the GCI.

### 2.3.4 Supplementary simulations of FPs

We now present supplementary simulations of FPs by considering the case in which FPs are composed of spikes of connected neurons. There are no supplementary simulations for FNs, since they admit only one situation, as discussed in Corollary 2, and they were simulated in the preceding subsections. The simulations here are devoted to Corollaries 4 and 5 of Section 2.2.3.

The following explains the simulations in detail; the model parameters are fixed at $\lambda = 2$, $T = 100$, $m = 0.1$, and $\sigma = 0.02$.

Experiment A: Suppose $X$ and $Z$ are two independent Poisson processes with equal rate $\lambda$ in $[0, T]$, where $N$ is the total number of points of $X$. Let $Y$ be another point process generated by $Y = Z + N(m, \sigma)$. Hence $Y$ is induced by $Z$, and $X$ is independent of both $Y$ and $Z$. Then $rN$ points of $Z$ are randomly added to $X$, where $r$ denotes the error percentage. After binning with bin size $m$, the GCI from $X$ to $Y$ as a function of $r$ and the corresponding $\xi$'s are shown in Figures 2.4(a) and 2.4(b). This experiment considers the situation that