In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information

(1)

Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party

websites are prohibited.

In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information

regarding Elsevier’s archiving and manuscript policies are encouraged to visit:

http://www.elsevier.com/authorsrights

(2)

Contents lists available atSciVerse ScienceDirect

Neural Networks

journal homepage:www.elsevier.com/locate/neunet

Effects of spike sorting error on the Granger causality index

Pei-Chiang Shao

^a

, Wan-Ting Tseng

^b

, Chung-Chih Kuo

^c

, Wei-Chang Shann

^a

, Meng-Li Tsai

^d

, Chien-Chang Yen

^e,^∗

aDepartment of Mathematics, National Central University, Jhongli 32001, Taiwan

bInstitute of Zoology, National Taiwan University, Taipei 10617, Taiwan

cDepartment of Physiology, School of Medicine, College of Medicine, Tzu Chi University, Hualien 97004, Taiwan

dDepartment of Biomechatronic Engineering, National Ilan University, Ilan 26047, Taiwan

eDepartment of Mathematics, Fu-Jen Catholic University, Xinzhuang 24205, Taiwan

a r t i c l e i n f o

Article history:

Received 8 January 2013

Received in revised form 21 May 2013 Accepted 4 June 2013

Keywords:

Granger causality index Spike sorting

Vector autoregressive model

a b s t r a c t

Accurately sorting individual neurons is a technical challenge and plays an important role in identifying information flow among neurons. Spike sorting errors are almost unavoidable and can roughly be divided into two types: false positives (FPs) and false negatives (FNs). This study investigates how FPs and FNs affect results of the Granger causality (GC) analysis, a powerful method for detecting causal interactions between time series signals. We derived an explicit formula based on a first order vector autoregressive model to analytically study the effects of FPs and FNs. The proposed formula was able to reveal the intrinsic properties of the GC, and was verified by simulation studies. The effects of FPs and FNs were further evaluated using real experimental data from the ventroposterior medial nucleus of the thalamus. Some practical suggestions for spike sorting are also provided in this paper.

1. Introduction

In neuroscience research, it is important to identify information flow among multiple neurons in the brain, according to the recorded neural activity data. A powerful method for achieving this is the Granger causality (GC) which arose in economics after being introduced by Wiener and Granger (Granger,1969,1980;

Wiener, 1956). The GC is a time series inference (TSI) type of method, proposes that if the prediction of one time series can be improved with the knowledge of a second time series, then there is a causal influence from the second time series to the first. This prediction is made by using the vector autoregressive (VAR) model.

In this model, if the variance of the prediction error of one time series at the present time can be reduced by including the past values of another series, then the latter is said to Granger-cause the former. This causality can be quantified by the so-called GC Index (GCI) which can be used to determine whether there is any causal interaction between time series. The GC was shown to be effective and has been widely deployed in recent neuroscience research (Bressler, Richter, Chen, & Ding, 2007;Cadotte, DeMarse, He, & Ding, 2008;Cadotte et al.,2010;Cao, Maran, Dhamala, Jaeger,

& Heck, 2012;Zhang et al.,2012). In addition to the time domain GC, other versions of the GC (e.g., frequency, and time–frequency

∗Corresponding author. Tel.: +886 2 2905 3547.

E-mail address:[email protected](C.-C. Yen).

domain) have been developed as well (Baccala & Sameshima, 2001;

Dhamala, Rangarajan, & Ding, 2008). The time domain formulation of GCI is briefly introduced in the next section, and we refer the reader to an article byBressler and Seth(2011) for more details about the GC.

Neurons emit action potentials (APs) that are known as spikes and play an important role in communicating among cells. The temporal sequence of APs produced by a neuron, which shows its own activity, is also known as a spike train. In multi-channel recordings (Brown, Kass, & Mitra, 2004), the APS of neurons are detected and differentiated from background electrical noise before single-unit spike trains are used to probe neural behaviors.

This technical procedure is called spike sorting. However, it is not easy to obtain spike train data that fully agree with the AP because of noise, superimposed APs, and difficulties of differentiating waveforms of APs from different neurons. Spike sorting often introduces unavoidable errors (Deborah, Won, & Patrick, 2003;

Lewicki,1998). These errors can roughly be divided into two types, false positives (FPs) and false negatives (FNs). An FP means an error detection of an event that is not a real spike (just an electrical noise) or is a spike from another neuron. Conversely, an FN means that real spikes were not detected or were classified into groups of other neurons. One may be interested in the question: ‘‘How do FPs and FNs affect the estimation of functional connectivity among neurons?’’. This study answered this question analytically and also via numerical simulations. The change in the GCI due to spike sorting errors was derived analytically to form an explicit formula,

http://dx.doi.org/10.1016/j.neunet.2013.06.001

(3)

and a direct discussion of the effects of FPs and FNs is possible.

Moreover, numerical simulations were used to verify the analyses.

We constructed three types of models for sorting errors: those with uniform, random, and concentrative distributions. That is, errors occur uniformly, randomly, and concentratively in spike trains.

Changes in the GCI were computed as these types of spike sorting errors were artificially added to the simulated spike trains, and the effects on the directional interactions were also investigated.

Finally, it is worth noting that spike trains are non-equally spaced data and are regarded as a point process. Interpolation or filtering is usually employed to convert point processes to equally spaced time series. Previous studies on spike trains (Kaminski, Ding, Truccolo, & Bressler, 2001; Zhu, Lai, Hoppensteadt, & He, 2003) proposed several methods to convert a time series from being non-equally spaced to equally spaced. This study adopted the procedure of binning to convert spike trains into time series data, which are suitable for GC analyses. Although the GCI between two point processes has been directly defined inKim, Putrino, Ghosh, and Brown(2011) recently, we still cannot abandon binning because it reduces the complexity of analysis, and considers also the effect of temporal summation of action potentials in the neuroscience.

This article is organized as follows. Section2presents an analytic formula based on a first order autoregression to show how error processes affect the GCI. Section3presents some models for sorting errors and probes the proposed formula further via numerical simulations. Section4presents a real data evaluation where the effects of sorting error on the GCI are evaluated using real experimental data. Section5provides some suggestions for spike sorting and the discussion.

2. Modeling and analysis

Based on a first order autoregression, we derived an explicit formula for changes in the GCI in terms of four parameters involving the error process. We also investigated the influences of various types of errors on the GCI indicated by the proposed formula.

2.1. A short introduction to the GCI

Let x and y be two stationary time series with zero means. The first order linear autoregressive model for x and y is given by



x

(

ⁿ

)

y

(

ⁿ

)



=

A



x

(

ⁿ

−

1

)

y

(

ⁿ

−

1

)

 +

 ϵ(

ⁿ

) η(

ⁿ

)



,

⁽¹⁾

where A is the model coefficient matrix, and the residuals

ϵ

^and

η

are zero-mean uncorrelated white noises with covariance matrix 6. Here the variances Var

(ϵ)

^{and Var}

(η)

are called prediction errors, which measure the accuracy of the autoregressive prediction.

More specifically, Var

(η)

measures the accuracy of the prediction of y

(

ⁿ

)

based on the previous values x

(

ⁿ

−

1

)

^{and y}

(

ⁿ

−

1

)

^.

Now consider the reduced model that excludes the time series variable x

y

(

ⁿ

) =

^{B y}

(

ⁿ

−

1

) + ζ(

ⁿ

),

⁽²⁾

where B is the corresponding model coefficient. The variance Var

(ζ )

measures the accuracy of the prediction of y

(

ⁿ

)

^{based only} on its previous value y

(

ⁿ

−

1

)

^{. For}

η

ⁱⁿ⁽¹⁾^and

ζ

ⁱⁿ(2), if Var

(η)

^is significantly less than Var

(ζ )

in some statistical sense, then we say that x Granger-cause y. This causality can be quantified by the GCI from x to y formulated as:

F_x→y

=

_ln^Var

(ζ)

Var

(η) .

⁽³⁾

It is clear that F_x→y

=

_{0 when Var}

(η) =

^Var

(ζ )

, i.e., x has no causal influence on y, and F_x→y

>

0 when x Granger-cause y. Notice that F_x→yis nonnegative, i.e., Var

(η)

is bounded above by Var

(ζ )

^{, since} the full model defined in(1)should have a better prediction ability than the reduced model defined in(2). Finally, we note that the GCI values should be checked for significance by using hypothesis testing, and more details of the GCI can be found inDing, Chen, and Bressler(2006),Granger (1969,1980).

2.2. An explicit formula

When inaccurate spike sorting occurs, the sorting errors can be regarded as a perturbed error process. For simplicity, we assume that only the source process x has a sorting error and the corresponding error process is denoted by

δ

x. We can assume that

δ

x is zero mean and the model in(1)is perturbed as follows when

δ

x is superposed on x:

{

x

+ δ

^x

} (

ⁿ

)

y

(

ⁿ

)



= ˜

A

{

x

+ δ

^x

} (

ⁿ

−

1

)

y

(

ⁿ

−

₁

)

 +  ˜ ϵ(

ⁿ

)

η( ˜

ⁿ

)



,

⁽⁴⁾

whereA is the corresponding model coefficient matrix, and the

˜

residuals

ϵ ˜

^and

η ˜

have the covariance matrix₆

˜

_{. Let S}

y

:=

Var

(ζ ),

S

:=

Var

(η)

^{, and}_S

˜ :=

Var

(˜η)

. Since the perturbed quantity

δ

^{x is} superposed only on x, the reduced models for(1)and(4)are the same as(2). Then the original GCI from x to y and the perturbed GCI from x

+ δ

^{x to y are}

F

=

lnS_y

S and _F

˜ =

lnS_y

S

˜ ,

⁽⁵⁾

respectively. To investigate the perturbed GCI, we derived an explicit formula forF in terms of four parameters involving

˜ δ

^x which are

ξ

1

:=

E



δ

^x²1

 , ξ

2

:=

E



x₁

δ

^x1

 , ξ

3

:=

E



y₂

δ

^x1



, and

ξ

4

:=

E



y₁

δ

^x1



. Further denote X₀

=

E



x²₁



,

^Y0

=

E



y²₁



,

^Y1

=

E



y₁y₂



,

Z₁

=

E



x₁y₁



, and Z₂

=

E



x₁y₂



. We are now ready to present the formula for_{F .}

˜

Proposition 1. In the situation described above,F can be presented

˜

explicitly by the following formula (for calculation see theAppendix):

F

˜ =

ln S_y

S

+

_Θ

,

Θ

= 

S_y

−

S



I

,

⁽⁶⁾

where

I

=

¹

Y₀



X₀

+ ξ

1

+

2

ξ

2

 −  ξ

4

+

Z₁



2



Y₀



X₀

+ ξ

1

+

2

ξ

2



−

¹

S_y

−

S



Y₀



ξ

3

+

Z₂



2

+ 

Y₀

−

S



ξ

4

+

Z₁



2

−

2Y₁



ξ

3

+

Z₂



ξ

4

+

Z₁





.

⁽⁷⁾

Note that since S

+

_Θin(6)is bounded above by S_y, we have thatΘ is upper bounded by S_y

−

S, i.e., I has an upper bound 1.

We end this subsection by the following two remarks.

Remark 1. In the same situation ofProposition 1, the following inequalities hold:

Y₀

≥

S_y

≥

S and Y₁

≤

0

.

⁽⁸⁾

According to(2), we have Y₀

=

Var

(

^y1

) ≥

^Var

(ζ ) =

^Sy. The re- mainder S_y

≥

S just follows by the reason that the prediction error of the reduced model in(4)is always less than or equal to that of the full model in(1). The latter holds because of the stationary assump- tion. If Y₁

=

E

(

^y1y₂

) >

0, then y will not be stationary. Thus Y₁

≤

0.

(4)

Remark 2. The following result can be obtained easily by using(5) and(6).

F

˜ >

^F

,

^{if I}

<

⁰

.

F

˜ =

F

,

^{if I}

=

0

.

F

˜ <

^F

,

^{if 0}

<

^I

<

¹

.

F

˜ =

0

,

^{if I}

=

1

.

(9)

2.3. Essential GCI factors

The formula defined by(6)and(7)is complicated but it reveals the intrinsic property of the perturbed GCI, depending on the four factors

ξ

1

, ξ

2,

ξ

3and

ξ

4which, by the definition, capture the main properties of error signals, namely, the spike sorting errors. A systematic characterization of these parameters’ influences on the GCI would provide a heuristic understanding of the effect of spike sorting error for researchers to make further decisions. Thus, in this subsection we discussProposition 1more by a total of five different situations and relate each of them to a biological meaning.

We now present the following corollaries for investigating the term I defined in(7), and this is equivalent to investigating the term F in

˜

(6).

Corollary 1 (

ξ

2

= ξ

3

= ξ

4

=

0). We start with the simple case in which the error process

δ

x is uncorrelated with the underlying processes x and y, i.e.,

ξ

2

= ξ

3

= ξ

4

=

0. In this case I can be simplified as follows:

I

(ξ

1

) =

^Y⁰

ξ

1

Y₀

ξ

1

+ [

X₀Y₀

−

Z₁²

] .

⁽¹⁰⁾

The above equation shows that I increases as

ξ

1 increases, but it is bounded above by the limit L

:=

lim_ξ₁→∞I

(ξ

1

) =

^{1. Hence,} by(6), the weakened GCIF is bounded below by lim

˜

_ξ₁→∞_F

˜ (ξ

1

) =

ln_S₊₍_S^S^y

y−S)L

=

ln^S_S^y

y

=

0. In reality, this limit cannot be attained because the variance

ξ

1

=

E



δ

^x²1



cannot approach infinity. Thus, the GCI will never vanish if the error process

δ

x is uncorrelated with the underlying process x. We refer to this corollary as FPs being composed of electrical noises or the spikes of other unconnected neurons during spike sorting.

Corollary 2 (

ξ

2

<

⁰

, ξ

3

<

⁰

, ξ

4

<

0). Suppose the error process

δ

x is negatively correlated with the underlying processes x and y, i.e.,

ξ

2

, ξ

3, and

ξ

4are all negative. Since the perturbed quantity

δ

^{x is} considered to be produced from inaccurate spike sorting, the maximal negative quantity which

δ

x can be is

−

x. Therefore, we have the following constraint:

−

X₀

≤ ξ

2

<

⁰

, −

Z₂

≤ ξ

3

<

⁰

, −

Z₁

≤ ξ

4

<

⁰

,

⁽¹¹⁾ and the equalities are attained when

δ

^x

= −

x.

Now, according to(7),(8)and(11), I is positive, increasing, and bounded above by 1. In other words,Θ

→

S_y

−

S and_F

˜ →

0 as

ξ

1

→ ∞

or



ξ

2

, ξ

3

, ξ

4

 → −

X₀

, −

^Z2

, −

^Z1



. This corollary is related to FNs in spike sorting.

Corollary 3 (

ξ

2

<

⁰

, ξ

3

<

⁰

, ξ

4

<

0 Simplified). Suppose

δ

^{x is cor-} related with the underlying processes as inCorollary 2. If y is further completely induced by x, i.e., y cannot explain itself (B

=

0 in(2)), then we obtain S_y

=

Y₀

,

^Y1

=

0 and(7)can be further simplified as:

I

=

Y₀



X₀

+ ξ

1

+

2

ξ

2

 −  ξ

4

+

Z₁



2

−

^Y⁰

Sy−S

 ξ

3

+

Z₂



2

Y₀



X₀

+ ξ

1

+

₂

ξ

2

 −  ξ

4

+

_Z₁



2

.

⁽¹²⁾

Eq.(12)still shows us that I

→

_{1 as}

ξ

1

→ ∞

. On the other hand, if

ξ

1is fixed and the negative correlation between x and

δ

x increases (i.e.,

(ξ

2

, ξ

3

, ξ

4

)

decreases simultaneously to

(−

^X0

, −

^Z2

, −

^Z1

)

^{), then} the value I increases to 1 because of the quadratic convergence:

 ξ

3

+

Z₂



2

→

0

.

⁽¹³⁾

Note that when

 ξ

2

, ξ

3

, ξ

4



attains the lower bound

−

X₀

, −

^Z2

, −

^Z1



, i.e.,

δ

^x

= −

x, there is nothing left to analyze. Therefore, we do not con- sider this case.

Corollary 4 (

ξ

2

=

0

, ξ

3

>

⁰

, ξ

4

>

δ

^x is positively correlated with y, and is uncorrelated with x, i.e.,

ξ

2

=

0

, ξ

3

>

^{0, and}

ξ

4

>

^{0. Since I}

=

0 when

ξ

1

= ξ

2

= ξ

3

= ξ

4

=

0, it is easy to conclude that, in this case, I is negative and decreasing, i.e., F

˜ >

^{F and}F increases as

˜ ξ

3or

ξ

4increase. We refer to this corollary as FPs being composed of spikes of some connected neurons which are positively correlated with the target neuron during spike sorting.

Corollary 5 (

ξ

2

>

⁰

, ξ

3

>

⁰

, ξ

4

>

δ

x is positively correlated with both x and y, i.e.,

ξ

2

>

⁰

, ξ

3

>

^0, and

ξ

4

>

0. Since I is increasing in

ξ

2, we know that _{F is then}

˜

decreasing in

ξ

2. Therefore,F in this case exhibits the same behavior as

˜

that inCorollary 4if



ξ

3

, ξ

4



dominates

ξ

2; butF is decreasing if

˜ ξ

2

dominates

 ξ

3

, ξ

4



. We refer to this corollary as FPs being composed of spikes of some connected neurons which are positively correlated with both of source and target neurons during spike sorting.

The results of above five corollaries are schematically summa- rized inTable 1.

2.4. GCI vs. variance reduction

Because 0

≤

S

≤

S_y, we set S

= (

¹

−

k

)

^Sywith 0

≤

k

≤

1 and the GCI in(5)becomes

F

=

ln S_y

(

¹

−

k

)

^Sy

= −

ln

(

¹

−

k

).

⁽¹⁴⁾

Next, relate F and k through k

=

1

−

exp

(−

^F

)

^{. Since S}y

−

S

=

kS_y

,

k

= 

S_y

−

_S



/

^Syrepresents the percentage of variance reduction.

More precisely, it represents the relative decrease in prediction errors from the reduced model (2) to the full model (1). For example, k

=

0 means no (0%) variance reduction, and the GCI is equal to zero. On the contrary, k

=

1 means a total (100%) variance reduction, and the GCI is equal to infinity.Fig. 1shows how the GCI relates to k, and it is almost totally reduced when the GCI is equal to 5.

3. Simulation study

Here, we present some models for sorting errors and further probe formula(7)via numerical simulations. For the FP case, we first consider that FPs are made up of electrical noises or spikes of unconnected neurons. Spikes of connected neurons are considered in Section3.4.

3.1. Models

We first construct a point process X

= 

p₁

,

^p2

, . . . ,

^pN



generated by a Poisson process with rate

λ

in the time interval

[

0

,

^T

]

, where N is the total number of points. The second point process Y

= 

q₁

,

^q2

, . . . ,

^qN



is then generated by Y

=

X

+

_N



m

, σ

^, where N



m

, σ

is a normal random variable with mean m and standard deviation

σ

. More precisely, q_i

=

p_i

+

_N



m

, σ,

ⁱ

=

1

, . . . ,

N. The point process Y presents a time lag m with respect to X if m

>

^{0 and}

σ =

0. This study only considers m

>

^{0 and}

σ >

^0.

(5)

Table 1

A schematic summary ofCorollaries 1–5. The factorsξi,ⁱ=1,²,³,4 and error processδx are described in Section2.2.+(−) stands for positive (negative) sign. FP (FN) stands for false positive (negative).↑ (↓)stands for increasing (decreasing).

ξ2 ξ3 ξ4 Effects on the GCI Type Interpretations onδ^x

0 0 0 GCI↓asξ1↑ FP Electrical noises or spikes of unconnected neurons.

− − − GCI↓asξ1↑ FN Spike missing.

0 + + GCI↑asξ3↑orξ4↑ FP Connected neurons which are positively correlated with the target neuron.

+ + + GCI↑if (ξ3, ξ4) dominatesξ2

GCI↓ifξ2dominates (ξ3, ξ4)

FP Connected neurons which are positively correlated with both of source and target neurons.

Fig. 1. The relationship between the Granger Causality Index (GCI) and k. This transforms GCI values to the corresponding percentage of variance reduction in the residual noise process.

Spike sorting errors include two types: fake (FP) and missing (FN) spike events. The fake spike event is an erroneous detection of an event that is not a real spike or is a spike from another neuron.

Conversely, a missing spike event means that spikes were not detected or were classified into groups of other neurons. Therefore, the fake spike case can be regarded as adding extra points to a point process. This type of error is denoted by ‘‘A-type’’. The missing spike case can be regarded as removing some points from a point process, and is denoted by ‘‘R-type’’. Forms of the addition or removal of points are considered for uniform, random, and concentrative distributions. The number of fake or missing spike events is rN, where r

(=

^p%

)

is the ratio of fake or missing points compared to the original spike train. The following explains the generation of these types in detail.

Uniform-partition addition (PA) model: The extra points

 ˜

_p_i

:

1

≤

i

≤

rN



on the time interval

[

0

,

^T

]

form a uniform partition of

[

0

,

^T

]

. More precisely,p

˜

_i

=

i1^{t, where}1^t

=

^T

rN−1.

Random-uniform addition (UA) model: The extra pointsp

˜

_i

,

ⁱ

=

1

, . . . ,

rN, on the time interval are generated from a uniform distributionU

[

0

,

^T

] 

random variable.

Random-normal addition (NA) model: The extra points_p

˜

_i

,

ⁱ

=

1

, . . . ,

rN, on the time interval are generated from a normal dis- tributionN



T

/

²

, σ

NA



random variable. The standard deviation parameter

σ

NArepresents different degrees of concentration.

Uniform-partition removal (PR) model: A set of reference points

 ˜

_p_j

:

1

≤

j

≤

rN



, which is a uniform partition of

[

0

,

^T

]

was used to remove spike events from



p_i

:

1

≤

i

≤

N



. First, fixp

˜

_j and then remove the point that is closest top

˜

_j.

Random-uniform removal (UR) model: Points in



p_i

:

1

≤

i

≤

N



are randomly removed using a discrete uniformUd



1

,

^N



random variable, i.e., all points have the same probability of being removed.

Middle-succession removal (SR) model: In this model, the removed points are successive and located near the center T

/

² of the time interval

[

0

,

^T

]

. The number of spikes is rN.

We note that sorting errors only occur in the source process X , and Y is assumed to be inerrable.

3.2. Setup

Set the parameters,

λ =

²

,

^T

=

100

,

^m

=

0

.

1 for the rate of the Poisson process, the total time, and the time lag, respectively. To apply autoregressive modeling, we convert point processes



p_i

:

1

≤

i

≤

N



and



q_i

:

1

≤

i

≤

N



to time series through the procedure of binning with the bin size as the time lag m. Results are obtained from the average of 100 simulations for each random case at fixed error percentage r. To investigate the effects of errors on the GCI, we observe the results from various r

, σ

^{, and}

σ

NA. 3.3. Simulation results

We note that the simulations and the corresponding results are related toCorollaries 1–3of Section2.3.

3.3.1. A-type vs. R-type

Fig. 2(a) shows the results for experiment 1 in which

σ =

⁰

.

⁰²

, σ

NA

=

T

/

8, and the error percentage increased by 0.1. In this figure, the GCI of the SR model decreases the fastest among all of the models as r increases. All R-type errors cause information loss and greatly weaken the GCI when the error percentage increases.

If r

=

1, the underlying signal is totally destroyed by the errors, and the causality is undetermined. The PA model produces the largest GCI for a fixed r.Fig. 2(a) also shows that A-Type GCIs are greater than R-Type GCIs. However, this phenomenon is not always valid. The error in the NA model is uncorrelated with the signal and weakens the GCI more than any of the R-Type models when

σ

NA

is small.Fig. 2(b) shows that a highly non-stationary process can average out much more underlying causality than the others, for the case of

σ

NA

=

T

/

64 in the NA model.

3.3.2. Standard deviation factor

σ

We now investigate the effects of the standard deviation

σ

^on GCIs.Fig. 2(c) and (d) show the results for experiments 2 and 3 in which parameter

σ

is 0.04 and 0.06, respectively. Observations in Fig. 2(a), (c), and (d) show that the behaviors of the profiles with

σ =

⁰

.

⁰⁴

,

⁰

.

06 closely resemble the behavior of the profile with

σ =

⁰

.

02. In the PA and PR models, let F denote the GCI value without a sorting error (r

=

0), and_F

˜ (

^r

)

denote the GCI value with sorting error of error percentage r.Table 2presents the relative errors of the GCI for r from 0.1 to 0.9 of the PA and PR models in the three experiments, where

Relative error

(

^r

) :=

_F

˜ (

^r

) −

^F

F

×

100%

.

Table 2shows that Experiment 3 had the flattest PA-curve and PR- curve of the three experiments, and the underlying causality of the PA models were all around 50% off at r

=

0

.

9 because the signal to noise ratio



SNR

=

^Var⁽^x⁾

Var(δ^x)



was close to 1. Although they were all around 50% off, the corresponding decreases in the percentage of variance reduction are greatly differed, and this can be seen from

(6)

(a) Experiment 1. (b) Concentrative normal.

(c) Experiment 2. (d) Experiment 3.

Fig. 2. (a) Simulation results of Experiment 1, in which the parameter settings wereλ =2,T=100,m=0.1, σ =0.02. (b) Simulation results of a much more concentrative normal, in which the parameter settings wereλ =2,T=₁₀₀,m=₀.1, σ =0.02, andσNA=_T/64. (c) Simulation results of Experiment 2, in which the parameter settings wereλ =²,^T =100,^m=0.¹, σ =⁰.04. (d) Simulation results of Experiment 3, in which the parameter settings wereλ =²,^T=100,^m=0.¹, σ =⁰.06. The error percentages r were all 0–0.9, and increased by 0.1.

Table 2

Relative errors,^˜^F⁽^r_F⁾⁻^F ×100%, of GCI for r=0.^1–0.9 of the PA and PR models in the three experiments.

Experiment r=0.¹ ^r=0.² ^r=0.³ ^r=0.⁴ ^r=0.⁵ ^r=0.⁶ ^r=0.⁷ ^r=0.⁸ ^r=0.⁹ Relative errors (%) of the PA model

1(σ =0.02) −16.46 −27.24 −35.29 −41.48 −46.26 −50.26 −53.43 −55.78 −57.60

2(σ =⁰.⁰⁴) −12.27 −21.87 −29.41 −34.36 −39.84 −42.63 −46.28 −48.93 −51.20

3(σ =⁰.⁰⁶) −11.59 −19.78 −26.99 −32.13 −36.63 −40.89 −43.42 −46.29 −48.41

Relative errors (%) of the PR model

1(σ =⁰.⁰²) −17.10 −30.16 −41.89 −51.31 −59.78 −68.60 −75.86 −84.02 −92.07

2(σ =⁰.⁰⁴) −12.64 −22.71 −32.74 −41.82 −50.63 −59.24 −68.72 −78.04 −88.96

3(σ =⁰.⁰⁶) −11.58 −19.21 −27.60 −37.45 −47.44 −54.15 −65.07 −75.34 −87.68

Fig. 1. In the PA model, the three F values corresponding to the three experiments were 1.24, 0.64, and 0.38, and the corresponding F

˜ (

⁰

.

⁹

)

were 0.71, 0.33, and 0.18, respectively. Eq.(14)can compute the corresponding relative decreases in the percentage of variance reduction, which were 28.17%, 40.89%, and 46.85%, respectively.

3.3.3. Explanation of the GCI curves

We now discuss the behaviors of the curves inFig. 2(a). Because A-type error processes (

δ

x) are uncorrelated with the underlying process (x), curves of the PA, UA, and NA models can be analyzed and directly explained by(6)and(10). These three curves decrease as the size of the errors (

ξ

1) increases. They reach zero only when

the error size is infinity, which is actually unfeasible. Therefore, these curves slowly decrease and never reach zero. In other words, the underlying causality remains.

Moreover, the PA-curve is above the UA-curve, and the UA- curve is above the NA-curve. This is because the NA model has the largest error size, followed by the UA model and the PA model at the same error percentage. On the other hand, error processes of the PR, UR, and SR models are correlated with the underlying process, and (12)is used instead of(10)since

ξ

2

, ξ

3, and

ξ

4 are nonzero.

With approximately the same error size, these curves decrease much more quickly than the uncorrelated case because of the

(7)

(a) Theξ1toξ4of NA model. (b) Theξ1toξ4of SR model.

Fig. 3. (a)ξ1toξ4of the NA model, in which the parameter settings wereλ =²,^T =100,^m=0.^1,σ =⁰.⁰², σNA=T/^{64. (b)}ξ1toξ4of the SR model, in which the parameter settings wereλ =2,T=100,m=0.1, σ =0.02. The error percentages r were both 0–0.9, and increased by 0.1.

negative correlations. Unlike the former, these curves almost reach zero at the error percentage r

=

0

.

9. In addition, the PR-curve is above the UR-curve, and the UR-curve is above the SR-curve. This is because the SR model has the largest error size, followed by the UR and PR models at the same error percentage.

We now discuss the behaviors of the curves inFig. 2(b), which shows that A-type errors are not always better than R-type errors.

GCI values of the NA model with

σ

NA

=

_T

/

64 are smaller than those of the SR model when the error percentage is between r

=

0

.

^{2 and} r

=

0

.

6. To see what occurred, we computed

ξ

1to

ξ

4at each error percentage for these two cases, and results are respectively shown inFig. 3(a) and (b). Comparing these two panels indicates that (i)

ξ

1

of the NA model is much larger than that of the SR model. According to(12), the GCI of the NA model is smaller than that of the SR model.

(ii) The SR model has large

ξ

2and

ξ

3. According to(13), the GCI of the SR model is significantly smaller than that of the NA model when r is quite large, even though the

ξ

1of the SR model is still relatively smaller than that of the NA model.Fig. 2(b) verifies this analysis showing that the error size of the NA model dominates the negative correlation of the SR model when the error percentage is between r

=

0

.

^{2 and r}

=

0

.

6. Contrarily, the negative correlation of the SR model dominates the error size of the NA model when r

>

⁰

.

6. Therefore, it is necessary to integrate the error size and the negative correlation to identify the behavior of the GCI, and the reason is that the error size of the NA model is much larger than that of the SR model when 0

≤

r

≤

0

.

^6.

3.4. Supplementary simulations of FPs

We now present supplementary simulations of FPs by consider- ing the case when FPs are composed of connected neurons. There are no supplementary simulations for FNs since they have only one situation as discussed inCorollary 2, and they were simulated in preceding work. The simulations here are devoted toCorollaries 4 and5of Section2.3. The following explain the simulations in detail and the model parameters are fixed at

λ =

²

,

^T

=

100

,

^m

=

0

.

^1, and

σ =

⁰

.

^02.

Experiment A: Suppose X and Z are two independent Poisson processes with equal rate

λ

ⁱⁿ

[

0

,

^T

]

, where N is the total number of points of X . Let Y be another point process generated by Y

=

Z

+

_N

(

^m

, σ )

. Hence, we have that Y is induced by Z , and X is independent of both Y and Z . Then rN points of Z are randomly added to X , where r denotes the error percentage.

After binning with bin size m, the GCI from X to Y as a function of r and the corresponding

ξ

’s are shown inFig. 4(a) and (b).

This experiment considers the situation that FPs are composed of spikes of connected neurons which are positively correlated with target neurons. Note that the true causality between X and Y is uncorrelated (GCI

=

0), which means that the relationship between neurons may be erroneously identified when a FP occurs during spike sorting.

Experiment B: Here we consider another situation that Y is in- duced by both X and Z , that is, Y

= {

X

+

_N

(

^m

, σ )} ∪ {

^Z

+

N

(

^m

, σ )}

, where X and Z are independent Poisson processes with equal rate

λ

ⁱⁿ

[

0

,

^T

]

. Then rN points of Z are randomly added to X . After binning, the GCI from X to Y and the corre- sponding

ξ

’s are shown inFig. 4(a) and (c). The result shows that the relationship is correctly identified, but the strength (GCI) is overestimated. Experiments A and B enlighten that the risk of an erroneous causal identification should be estimated by two parts: the relationship (causal or noncausal) and the strength (the magnitude of the GCI).

Experiment C: Suppose Z is a Poisson processes with rate

λ

in

[

0

,

^T

]

. Let X and Y be point processes generated by X

=

Z

+

_N

(

^m

, σ )

^{and Y}

=

X

+

_N

(

^m

, σ )

, i.e., X is induced by Z , and Y is induced by X . Then rN points of Z are randomly added to X . After binning, the GCI from X to Y and the corresponding

ξ

^’s are shown inFig. 4(a) and (d). The result shows that the effect of the positive-correlation (

ξ

2

>

0) between Z and X dominates the effect of the positive-correlation (

ξ

3

>

⁰

, ξ

4

>

^{0) between} Z and Y ; thus the GCI decreases. This experiment is devoted to Corollary 5 of Section2.3.

3.5. Simulation for threshold detection

Spike sorting consists of two parts: AP detection and AP classification, which are based on thresholding and clustering methods, respectively. Here we discuss the relationship between the GCI value and the detecting threshold via simulation. The classification part will be discussed in Section4by using real experimental data.

We simulate a sequence of 100 APs, denoted byA, having fixed interspike interval (ISI) length as shown inFig. 5(a). Then we add an independent white noise toAwith SNR

=

0

.

8 for the background noise. Denoting the standard deviation of the observed noisy data by

τ

^,Fig. 5(b) and (c) show the detected spikes with threshold values being 2

.

⁵

τ

^{and 3}

.

⁰

τ

, respectively. It is easy to see that a lower threshold is tending to result in FPs of spike-detection and a higher one is tending to result in FNs. Now, let pX be the point process obtained from perfect-detection, i.e., pX coincides withA. Let pY

=

pX

+

_N



0

.

¹

,

⁰

.

⁰²



represent the point process

(8)

(a) Supplementary simulations for FP. (b) Theξ1toξ4of Experiment A.

(c) Theξ1toξ4of Experiment B. (d) Theξ1toξ4of Experiment C.

Fig. 4. (a) Simulation results of Experiment A–C for FP. (b)ξ1toξ4of the Experiment A. (c)ξ1toξ4of the Experiment B. (d)ξ1toξ4of the Experiment C.

(a) Underlying action potentials.

(b) Spike-detection with threshold value 2.5 s.d.

(c) Spike-detection with threshold value 3 s.d.

Fig. 5. (a) The first 10 APs of the underlying AP sequenceA. (b) The detected spikes (marked by◦) with threshold value 2.5τ. (c) The detected spikes (marked by◦) with threshold value 3.⁰τ^.

(9)

(a) GCI result with varying threshold values.

(b) Number of false positives.

(c) Number of false negatives.

Fig. 6. (a) The relationship between GCI value and threshold value by changing the threshold from 1.⁵τ^{to 3}.⁵τ. (b) Number of FPs increases as threshold value decreases.

(c) Number of FNs increases as threshold value increases.

obtained from a causal sequence of APs, say B. After binning with bin width 0.1, we can infer the causal relationship between these two sequences of APs,AandB, by computing the GCI from the binned data of pX to that of pY . To be consistent with our analyses, the errors created by threshold-detection will be put only on the source sequenceA. We are now ready to investigate the relationship between GCI value and threshold value by changing the threshold from 1

.

⁵

τ

^{to 3}

.

⁵

τ

, and it is shown inFig. 6(a). The result shows that 2

.

⁵

τ

performs the best (i.e., obtaining the largest GCI value), since we know thatBis induced byA. To investigate more deeply into this result, numbers of FPs and FNs are further shown inFig. 6(b) and (c), and these results give us the following findings: (i) Number of FPs increases as threshold value decreases.

(ii) Number of FNs increases as threshold value increases. (iii) FPs affect GCI less than FNs since the number of FPs (

>

^{100 at 1}

.

⁵

τ

^{) is} much larger than that of FNs (

>

^{60 at 3}

.

⁵

τ

) and the decreasing rate (slope) of GCI in

[

2

.

⁵

,

³

.

⁵

] (=

⁰

.

⁴²⁷⁰

)

is larger than the increasing rate (slope) of GCI in

[

1

.

⁵

,

²

.

⁵

] (=

⁰

.

³⁶⁷⁴

)

. (iv) As a result of (iii), we can conclude that choosing a threshold lower than the optimal (

=

2

.

⁵

τ

) is better and preserves more information than choosing a threshold higher than the optimal.

We now discuss in details the result of this simulation through the following four remarks: (1) For a large threshold, the sorting error is only composed of FNs, without FPs. Therefore the GCI increases as threshold decreases. When the threshold decreases to certain value, the error of FPs occurs. The GCI will reach the maximum and then decreases as the threshold decreases to 0.

This result contributes the explanation of the effects of FPs and FNs on the GCI. (2) We can conclude that variations of the GCI are determined by which one of FP and FN to be the dominate sorting error and the total number of FPs and FNs as well for a fixed threshold. (3) Although we cannot choose the optimal threshold in real experiments, (iv) is still useful and it gives a criterion for designing methods of choice of an optimal threshold.

(4) Researchers may imitate the procedure of the simulation by using their ownAand background noise to determine the optimal threshold after examining the number of FPs and FNs.

4. Real data evaluation

Here we design two sorting procedures in real operation and then evaluate the effect of sorting errors on the GCI using real experimental data.

4.1. Experimental setup

Neuronal spikes were recorded from the ventroposterior medial (VPM) nucleus of the thalamus and are the same data set used in our previous study (Tseng, Tsai, Iwata, & Yen, 2012).

The single-unit recording method is described in Tseng’s report (2012). Briefly, spikes were amplified (7000 32,000-fold), filtered (0.25 13 kHz), and digitized at 40 kHz. Recording was performed while a rat was awake. Extracellular single units were recorded in real time using time-voltage windows and a principle component- based template-matching algorithm (Sort Client, Plexon). Wave- forms were saved and re-sorted using Offline Sorter (Plexon), based on principle-component clustering, with a user-defined template.

The sample we used here contained 2 or more distinguishable clusters. To evaluate the effect of sorting errors on the GCI, various percentage errors were created from 20%, 40%, 60%, and 80%

less or more than the data set of a complete cluster. Shrinkage or expansion of the sample size was calculated based on the dif- ference between a waveform of a neuron and a template, computed by the Offline Sorter (tolerance fit function). Note that cluster expansion included the other cluster of a neuron or noises. We used 6 neurons with an averaged firing frequency of 0.199 Hz, and therefore



6 2



×

2

=

30 neuron pairs (i.e., GCIs) were derived (

(

^Neuronⁱ

,

^Neuron^j

)

ⁱ

=

1

, . . . ,

^{6 j}

̸=

i). Being consistent with our analyses, the errors created by shrinkage or expansion were put only on the source (Neuronⁱ). In the sequel, we use the FN and FP-procedures to respectively represent shrinkage and expansion operations.

(10)

(a) Results of real data. (b) Theξ1toξ4of FN-decrease.

(c) Theξ1toξ4of FP-decrease. (d) Theξ1toξ4of FP-increase.

Fig. 7. (a) Three kinds of Granger Causality Index (GCI) patterns which frequently appeared in real experimental data. (b)ξ1toξ4of the false-negative (FN)-decrease pattern.

(c)ξ1toξ4of the false-positive (FP)-decrease pattern. (d)ξ1toξ4of the FP-increase pattern.

4.2. Experimental results

From these real data we found three GCI patterns which frequently appeared under the above-mentioned sorting procedures, one of them was found under the FN-procedure, and the other two were found under the FP-procedure. These patterns are explained in detail.

FN-decrease: The GCIs of all the neuron pairs (24 of 30 pairs) decreased as the error percentage increased under the FN- procedure (Fig. 7(a)) except pairs with a zero GCI (6 of 30 pairs). The four corresponding

ξ

’s are shown inFig. 7(b), and it shows that the error process induced by the FN-procedure was negatively correlated with processes of both the source and target neurons; thus the GCI decreased (resembling that in Fig. 3(b)).

FP-decrease: The GCIs of neuron pairs (6 of 30 pairs) decreased as the error percentage increased under the FP-procedure (Fig. 7(a)) and the four corresponding

ξ

’s of this pattern (Fig. 7(c)) show that the error process was positively correlated with process of the source neuron and was uncorrelated with that of the target neuron. The GCI decreased because the effect of

ξ

2dominated the effect of

ξ

3and

ξ

4. In other words, the error process was composed of spikes of some connected neurons which were positively correlated with the source neuron.

FP-increase: The GCIs of neuron pairs (16 of 30 pairs) increased as the error percentage increased under the FP-procedure (Fig. 7(a)) and the four corresponding

ξ

’s of this pattern

(Fig. 7(d)) show that the error process was positively correlated with processes of both the source and target neurons. The GCI increased because the effect of

ξ

3and

ξ

4dominated the effect of

ξ

2. In other words, the error process was composed of spikes of some connected neurons which were strongly correlated with the target neuron.

Finally, we note that there are 5 neuron pairs with a zero GCI, and 3 neuron pairs with unchanged GCIs under the FP- procedure.

5. Discussion

Because spike sorting errors are almost unavoidable, this study was devoted to investigating how sorting errors affect the identification of information flow among neurons. The analyses of this paper allowed us to directly discuss the effects of FPs and FNs through the proposed formula, and the results also revealed that they do not have the same effect on spike sorting. In Section2, we derived an analytic formula(7)in terms of factors

ξ

1to

ξ

4, and this formula can be used, when incorporating(6), to obtain how the GCI changes according to the error signal. Under the FN type of sorting error, we know the error process is negatively correlated with processes of both the source and target neurons; thus the GCI will always be underestimated. On the other hand, under the FP type of sorting error, the GCI may be underestimated or overestimated depending on the error process. If the error process is only composed of electrical noise or spikes of other unconnected