• 沒有找到結果。

Spike sorting by a minimax reduced feature set based on finite differences

N/A
N/A
Protected

Academic year: 2022

Share "Spike sorting by a minimax reduced feature set based on finite differences"

Copied!
5
0
0

加載中.... (立即查看全文)

全文

(1)

T E C H N I C A L N O T E

Spike sorting by a minimax reduced feature set based on finite differences

Chien-Chang YenÆ Wei-Chang Shann Æ Chen-Tung YenÆ Meng-Li Tsai

Received: 17 September 2008 / Accepted: 12 November 2008 / Published online: 25 December 2008

 The Physiological Society of Japan and Springer 2008

Abstract Spikes are classified according to their finite differences in various orders. The fundamental idea that makes it work is that finite differences can extract and isolate features from spikes. This method showed better sorting quality and involved less labor than the methods of principal component analysis, original reduced feature set, and wavelet-based spike classifiers.

Keywords Spike sorting Multi-channel recording  Reduced feature set

Introduction

While many powerful imaging techniques are being used in neuroscience, extracellular recording remains the only choice that provides time resolution of activities in the brain at the single-neuron level. However, multiple single- unit extracellular recordings are useful only when the

action potentials generated by different neurons can be classified by their particular waveforms, or spikes, from the digital signals detected by electronic instruments.

A well-chosen reduced feature set (RFS) will largely increase both the efficiency and accuracy of classifications.

Suppose that each action potential, once detected, is sam- pled into a digital signal, or spike event. We collect N such events, each with k samples, namely vn¼ v n1; vn2; . . .; vnk for n = 1, 2,…, N. The idea of an RFS is to choose e positions (indices) 1 B s1\ s2\… \ seB k such that e is far less than k and the (reduced feature) set,

vnS1; vnS

2; . . .; vnS

ejn¼ 1; 2; . . .; N

n o

;

is good enough to classify the N events into K clusters. In a pioneering work on RFS, Dinning and Sanderson [1]

selected features that independently maximize a distance between cluster means and minimize within-cluster vari- ances. The features are far apart because of the requirement that sij  sjj  4: They also made the assumption that the clusters have equal a priori probabilities; that is, each cluster has roughly N/K events. The K-means clustering algorithm [3] is applied to a feature set, and the number of clusters is indicated by an F-ratio criterion. They showed in two experiments with N around 50 and k = 34 that two and four clusters (K = 2, 4) were determined by three and four features (e = 3, 4), respectively. Salganicoff et al. [6]

presented an improved criterion that selects features according to their variance weighted by the distance from previously chosen positions. Their experiments were done with N around 100 and k = 64, and three or more clusters were satisfactorily determined by six to eight features.

In a similar spirit with RFS, one may apply clustering algorithms on the first few components of the principal component analysis (PCA) of the events; see, for instance, C.-C. Yen

Department of Mathematics, Fu-Jen Catholic University, Taipei, Taiwan

W.-C. Shann

Department of Mathematics, National Central University, Chung-Li, Taiwan

C.-T. Yen

Institute of Zoology and Department of Life Science, National Taiwan University, Taipei, Taiwan M.-L. Tsai (&)

Department of Biomechatronic Engineering, National Ilan University, 1 Shen Lung Rd., Sec. 1, Ilan 260, Taiwan

e-mail: mltsai@niu.edu.tw DOI 10.1007/s12576-008-0010-x

(2)

the work of Eggermont et al. [2]. In order to make the clustering procedure more manageable, one of the main ideas is to reduce the dimension, and RFS and PCA are the most available means for this purpose. Recently, we have seen wavelet-based spike classifiers (WSCs) make their entrance into this field. For instance, Letelier and Weber [5] selected the few wavelet coefficients of the events that reveal clear signatures of spikes, and Hulata et al. [4] used wavelet packets for the same purpose.

The fundamental reason that WSCs work is that wavelet coefficients extract the signatures of the spikes. The wavelet coefficients are essentially the weighted differ- ences of the averaged input data. But the averaging process would contrarily smooth out the signals. So WSCs actually reveal that the signatures are richer in differentiations rather than in the spikes themselves. This observation motivated us to use the finite differences of signals as the means to directly extract features from the events, that is, take the differentiation without prior smoothing. The result is the minimax reduced feature set (mRFS) method that we present in the next section.

Methods

The mRFS method consists of two steps: (1) calculate the finite differences of the events to a pre-determined order, q-1, and (2) select RFS (two indices only) by minimum versus maximum two-dimensional plots (the mRFS plots).

As only a small number of plots are required to select the RFS, they can be arranged on one screen panel to make effective observations.

To simplify the notation, we drop index n and denote an event by v = (v1, v2,…, vk), of N such events. The finite difference of v of order k is denoted by Dkv = (w1, w2,…, wk) with

wl¼Xk

j¼0

1

ð ÞjCjkvlj; l¼ 1; 2; . . .; k;

where Cjkis the combinatorial number k!= j! kð ð  jÞ!Þ: It is a convolution of event v and the finite difference coefficients

1

ð ÞjCkj: Event v is padded by constants when vl-jis out of range.

In Fig.1, we show a plot of four events along with their differences of order 1, 2, and 3. Observe that the wave- forms become increasingly separated as the order of difference grows. This is what we mean by ‘‘extracting the signatures’’ of the spikes.

Then we prepare for the mRFS plots that allow an operator to select the features. Let pkbe the index where most of the signals in Dkvreach their minimum, and qkbe

make a (Dkv)pk versus (Dlv)ql plot. If we take finite dif- ferences up to order q-1, there are q2plots corresponding to each choice of k, ‘ = 0, 1,…, q-1 with N dots on each plot. A reasonable choice is q = 4, and there are 42= 16 mRFS plots. In this case, one can easily arrange the plots on one screen and make a clear and confident decision based on the scatter patterns in the plots. An operator inspects the mRFS plots and chooses one with the most plausible number of clusters and whose clusters are the best separated. The (two) indices that comprise the ‘‘chosen’’

mRFS plot are the features we are looking for, and the spike templates are determined according to the clusters.

See examples in the next section.

All computations in this article were performed by Matlab 6.5 running on an Intel PC.

Results

For a numerical simulation, we produce three types of test events by contaminating four typical artificial spikes with noise signals cut from real experiments (obtained from a published study by Tsai et al. [7]). The four artificial spikes

origin

order 1

order 2

order 3

Fig. 1 Plots of four events (origin, k = 0) and their finite differences of order 1, 2, and 3 (k = 1, 2, and 3). Observe that the waveforms are separated further by the finite difference plots

(3)

3SD, and 4SD, we scale the noise such that the standard deviation is 1/2, 1/3, and 1/4 of m, respectively. To make a 10-s simulation signal, we add 1,600 artificial spikes (400 each) with random positions to each type of noise signal.

Letting s be the standard deviation of the simulation signal, we run a spike detection algorithm that captures spikes when the ‘‘amplitude’’ is over 2 s. The detected spikes are aligned with their lowest positions. For instance, the algorithm has detected 1,320 spikes for the 3SD case, and their superimposed waveforms are shown on the right of Fig.2. As we can see on the lower part of Fig.2, some artificial spikes are not detected, and some noise signals are mistakenly included.

With q = 4, we present all 16 mRFS plots for each type of test event in Fig.3. For instance, the plot title ‘‘2 vs. 0’’

means the plot ofðD2p2vs.ðD0q0; that is, each event, v, is represented by one dot (x, y), where x is the entry of D2v at position p2and y is the entry of D0v= v at position q0. It

is clear that some scattering patterns stand out from the others with four clear clusters.

Using the test events of the three types described in Fig.2, we compare the performance of mRFS with RFS, WSC, and PCA. For the RFS method with two features, we make all (232

) = 496 scatterplots for each pair of indices and choose those with four clusters by inspecting all of the plots. For the WSC method, we use the so-called Daub4 orthonormal wavelet (with eight coefficients) to perform the discrete wavelet transformation for each event. After the transformation, there are still 32 elements at five scales.

We choose those with four clusters by inspecting all of the 496 scatterplots for each pair of transformed elements.

Again, for the PCA method, we choose those with four clusters by inspecting all of the 496 scatterplots for each pair of principal components. In Table1, we display the number of four-cluster plots out of the total number of scatterplots for each type of test event. Although the mRFS and other methods all failed to separate four-cluster plots under higher noise level (2SD), the mRFS shown the best efficiency among these methods under lower noise levels (3SD and 4SD).

Once the number of classes is determined and the two indices that made the ‘‘best’’ scatterplots are known, any clustering algorithm can be applied to the set of two indices for the actual classification. For instance, the Matlab function fcm() in the Fuzzy Logic Toolbox is a robust implementation of the Fuzzy C-Means clustering algo- rithm. For the test events, we applied fcm() to the two- index sets that produced four-cluster plots and estimated the accurate rates of each case. It can be seen on Table1 that if only two features or components are allowed, the mRFS outperforms the other three methods under 3SD and 4SD noise conditions.

We also used real experimental data to show the prac- ticality of this method. These spikes were recorded from the primary sensorimotor cortex of a rat (also obtained

Type noise Totle (N)

2 SD 390 384 179 153 384 1490

3 SD 393 396 206 225 100 1320

4 SD 400 398 262 275 10 1345

Fig. 2 Upper: 1,600 superimposed noise signals cut from biological experiments (left). Four artificial spikes with typical waveforms (middle). The superimposed detected spikes in the 3SD case, aligned with their lowest positions (right). Lower: For each type of simulation signals, the total number of detected spikes, and a classification of what is really detected

0 vs 3 0 vs 2 0 vs 1 0 vs 0

3 vs 3 2 vs 3 1 vs 3

3 vs 2 2 vs 2 1 vs 2

3 vs 1 2 vs 1 1 vs 1

3 vs 0 2 vs 0 1 vs 0

3 SD 2 SD

0 vs 3 0 vs 2 0 vs 1 0 vs 0

1 vs 3 1 vs 2 1 vs 1 1 vs 0

2 vs 3 2 vs 2 2 vs 1 2 vs 0

3 vs 3 3 vs 2 3 vs 1 3 vs 0

4 SD

0 vs 3 0 vs 2 0 vs 1 0 vs 0

3 vs 3 2 vs 3 1 vs 3

3 vs 2 2 vs 2 1 vs 2

3 vs 1 2 vs 1 1 vs 1

3 vs 0 2 vs 0 1 vs 0 Fig. 3 Sixteen mRFS plots

(all of them) for each type of test event. Those with four clear clusters are marked by grey backgrounds

(4)

from the experimental data of Tsai et al. [7]). As shown in Fig.4, which is similar to the numerical simulations, the best mRFS plot screened from 16 plots shows more distinct clusters than the other three best ones screened from 496 plots of the WSC, RFS, or PCA methods in each case. The test with real data indicates the merits of better sorting quality and less labor using the mRFS method.

Discussion

Since an event consists of spike and noise, the finite dif-

noise. It seems only reasonable that the effects would cancel each other out. In general, this is true. In the par- ticular situation of typical spike sorting, we are fortunate to see that the ‘‘signatures’’ of typical spikes are magnified more than the variances of noise. This is the essential way by which the mRFS method works. We explain this issue by the following mathematical model.

Let v = p ? x where p is the spike and x is the noise modeled by a random variable with a mean of 0 and standard deviation of r. By the linearity of finite differ- ences, we have Dkv= Dkp? Dkx. Here x = (x1, x2,…, xk) is a noise, the component, xi, of which is a sample from a random variable, Xi, the mean of which is 0 and standard deviation r. By the definition of Dkxand by the assumption that random variables Xiand Xjare independent (for i = j), Dkxconsists of samples from a new random variable with a mean still 0 and a standard deviation of rkr. The constant, rk:¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiP

jjð1ÞjCkj

q  ¼2

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi P

j Cjk 2

; r

is considered the magnification factor for the random noise by the finite difference of order k.

As for Dkp, there are no general descriptions for the spikes, and we can at best give a mathematical model for typical cases. For this purpose, we use p0= (…, 0, -1, 1, 0, 0,…) as a simple model that presents the essential pat- tern of spikes. Taking this model, it is easy to see that the nonzero elements of Dkp0 consist of ±Cjk?1, for instance D2p0= (…, 0, 0, -1, 3, -3, 1, 0, 0,…). It is natural to consider the constant,

sk:¼max Dj kp0j

max pj j0 ¼ Ckþ1bðkþ1Þ=2c;

as the magnification factor for the typical spike by the finite difference of order k.

Since the number of samples for an event is usually around 32, we do not need finite differences of arbitrarily high orders. In practice, we think that an order of k B 6 will suffice. In this practical range of k, sk[ rk; that is, spikes are magnified more than the noise by the process of finite differences.

In this representation, we see that two features are enough for the classification. In case one needs more fea- tures that are not too close to the chosen ones, one can exclude the chosen indices together with their neighbors and run the mRFS procedure again to find the next pair of minimax features.

There were no more than four clusters sorted in the three types of simulated tests, suggesting that the mRFS method will not produce over-clustering results. However, the mRFS method cannot deal with the overlapping problem directly. The overlapped spikes will likely be detected by Table 1 For each type of test event, and for each of the mRFS, WSC,

RFS, and PCA methods, we list the total number of scatterplots (on the denominator) and the number of plots that show four clear clusters (on the numerator)

Type Analysis methods

mRFS WSC RFS PCA

2SD 0/16 0/496 0/496 0/496

3SD 3/16 3/496 0/496 0/496

(1 vs. 3, 0.61) (8 vs. 22, 0.59) (1 vs. 1, 0.59) (10 vs. 22, 0.58) (3 vs. 2, 0.57) (8 vs. 21, 0.57)

4SD 9/16 7/496 3/496 0/496

(1 vs. 3, 0.81) (10 vs. 22, 0.80) (17 vs. 19, 0.77) (1 vs. 1, 0.80) (10 vs. 21, 0.79) (17 vs. 18, 0.55) (3 vs. 2, 0.79) (8 vs. 9, 0.78) (17 vs. 20, 0.52) (2 vs. 2, 0.78) (9 vs. 11, 0.77)

(0 vs. 3, 0.71) (8 vs. 21, 0.76) (0 vs. 1, 0.66) (8 vs. 22, 0.76) (2 vs. 0, 0.62) (9 vs. 21, 0.72) (1 vs. 2, 0.59)

(3 vs. 0, 0.51)

For each case with four clear clusters, we also list the rate of correct classifications by the fuzzy c-means clustering algorithm

Fig. 4 A comparison of the performances of mRFS, WSC, RFS, and PCA methods on real neuronal spikes. The superimposed spikes consisted of 8,910 spikes; the width of each waveform is 1.6 ms, which covers the entire spike. The plot of mRFS was chosen from 16 plots, while the plots of the other three methods were chosen from 496 plots. Notice that the mRFS plots display more-distinct clusters in this case

(5)

Acknowledgement This study was supported by a grant (NSC95- 2314-B-197-001) from the National Science Council, Taiwan.

References

1. Dinning GJ, Sanderson AC (1981) Real-time classification of multiunit neural signals using reduced feature sets. IEEE Trans Biomed Eng 28:804–812. doi:10.1109/TBME.1981.324679 2. Eggermont J, Epping W, Aertsen A (1983) Stimulus dependent

neural correlations in the auditory midbrain of the grassfrog (Rana temporaria L.). Biol Cybern 47:103–117. doi:10.1007/

BF00337084

3. Hartigan JA (1975) Clustering algorithms. Wiley, New York

4. Hulata E, Segev R, Shapira Y, Benveniste M, Ben-Jacob E (2000) Detection and sorting of neural spikes using wavelet packets. Phys Rev Lett 85:4637–4640. doi:10.1103/PhysRevLett.85.4637 5. Letelier JC, Weber PP (2000) Spike sorting based on discrete

wavelet transform coefficients. J Neurosci Meth 101:93–106. doi:

10.1016/S0165-0270(00)00250-8

6. Salganicoff M, Sarna M, Sax L, Gerstein GL (1998) Unsupervised waveform classification for multi-neural recordings: a real-time, software-based system I. Algorithms and implementation. J Neu- rosci Meth 25:181–187. doi:10.1016/0165-0270(88)90132-X 7. Tsai ML, Kuo CC, Sun WZ, Yen CT (2004) Differential morphine

effect on short-and long-latency laser evoked cortical responses of the rat. Pain 110:665–674. doi:10.1016/j.pain.2004.05.016

參考文獻

相關文件

Generate a data set of size 2 by the procedure above and run the decision stump algorithm on the data set to get g.. Repeat the experiment 10000 times, each with a different

For each time-step an image view (in which all feature estimates are projected into the estimated camera frame) is shown next to an external 3D view in which the camera can also

The conjugate gradient method is an iterative process which involves a sequence of Hessian-vector products. Past works such as Keerthi et al. [9] have shown that for

The image of the set  is the region  shown in the ­plane, a parallelogram bounded by these four line segments.. The transformation maps the boundary of  to the boundary of

C) protein chains maintained by interactions of peptide backbones D) amino acid sequence maintained by peptide bonds. E) protein structure maintained through multiple hydrogen

By correcting for the speed of individual test takers, it is possible to reveal systematic differences between the items in a test, which were modeled by item discrimination and

From the aforementioned discussion, we know that we can not tell the positive semidefiniteness of a symmetric matrix by its leading principal minors whereas we can do it for

這種既是 open 又是 closed 的集合, 有的書會稱 之為 clopen set.. 以下我們列出一些有關 boundary 的性質, 希望大家不要光記憶其結果,