Chapter 2 Fundamentals of Seizure Analysis Algorithm
2.3 Related Works
A chaotic attractor is an attractor where, on the average, orbits originating from similar initial conditions (nearby points in the phase space) diverge exponentially fast (expansion process); they stay close together only for a short time. If these orbits belong to an attractor of finite size, they will fold back into it as time evolves (folding process). The Lyapunov exponents measure the average rate of expansion and folding that occurs along the local eigen-directions within an attractor in phase space. For an attractor to be chaotic, the largest Lyapunov exponent (LLE) must be positive.
As we mentioned before, a relevant time scale should always be used in order to quantify the physiological changes occurring in the brain. Furthermore, the brain being a nonstationary system, algorithms used to estimate measures of the brain dynamics should be capable of automatically identifying and appropriately weighing existing transients in the data.
Iasemidis et al. developed a method [7] for estimation of short-term Lyapunov exponents (STL), an estimate of LLE for nonstationary data. It is well-known and
widely used in many researches. Here we will take an epileptic seizure prediction system, proposed by L. D. Iasemidis et al. in the recent years, for example to explain the STL in detail.
The short-term Lyapunov exponent (STL) is defined as:
,
based on the reconstruction of attractors from time series, discussed in the last section, where: the evolution of this perturbation after time Δt.
z Δt is the evolution time for δXij, that is, the time one allows δXij to evolve in the phase space.
Fig. 2-9 Displacement vectors in the fiducial trajectory
L1
The crucial parameter is the adaptive estimation in time and phase space of the magnitude bounds of the candidate displacement vector to avoid catastrophic replacements. The improvement in the estimates of L can be achieved by using the proposed modifications.
z For L to be a reliable estimate of STL, the candidate vector ( )X tj should be chosen such that the previously evolved displacement vector
( 1),i j( )
X t
δ − Δ is almost parallel to the candidate displacement vector
, (0) angular separation between two vectors.
z For L to be a reliable estimate of STL, δXi j, (0) should also be small in magnitude in order to avoid computer overflow in the future evolution within very chaotic regions and to reduce the probability of starting up with points on separatrices. This means,
, (0) ( ) ( ) max
i j i j
X X t X t
δ = − < Δ (2.8)
with Δmax assuming small values.
A typical long-term plot of STL versus time, obtained by analysis of continuous EEG, is shown in Fig. 2-10. This figure shows the evolution of STL at a focal electrode site, as the brain progresses from interictal to ictal to postictal states. There is a gradual drop in STL over approximately 2 hours preceding this seizure. The seizure, 2 minutes in duration, is characterized by a sudden drop in STL values with a
consequent steep rise. Postictal STL values exceed preictal values and slowly approach interictal values. This behavior of STL indicates a gradual preictal reduction in chaoticity, reaching a minimum shortly after seizure onset, and a postictal rise in chaoticity that corresponds to the reversal of the preictal pathological state. There will be more discussions about this character in Chapter 3.
Fig. 2-10 Unsmoothed STL over time (140 min), including a 2-min seizure. [7]
Having estimated the STL temporal profiles at each electrode site, and as the brain proceeds toward the ictal state, the temporal evolution of the stability of each cortical site is quantified. However, since the brain is a system of spatial extent, information about the interactions of its spatial components should also be taken in consideration by the relations of the STL between different cortical sites.
The T- index at time t between electrode sites i and j is then defined as:
( ) ( )
{ }
i , j( )
i , j i j
T E STL t STL t t
N
= − ÷σ , (2.9)
where E
{ }
denotes the average of all absolute differences STL ti( )
−STL tj( )
within a moving window wt
( )
λ defined as:( )
1wt λ = for λ∈[t-N-1,t] and w
( )
λ =0 for λ∉[t-N-1,t]where N is the length of the moving window. Then, σi , j
( )
t is the sample standard deviation of the STL differences between electrode sites i and j within the moving window.A dynamical transition toward a seizure is announced at time t when the * T-indexes of sites over time transits from a value above threshold T1 at times t t'< , to a value below threshold T2 at time t , as shown in Fig. 2-11. *
Fig. 2-11 The T-index curves denoting entrainment 55 min before seizure SZ2 [7].
The method presented achieved amazing results with high prediction sensitivity, and pretty low false prediction rate. More details and comparison results with other algorithms and ours will be listed in Chapter 5.
Chapter 3
Wavelet-Correlation Dimension Seizure Prediction
In this chapter a real-time seizure prediction method based on correlation dimension analysis is presented, including the system architecture, data flow, and algorithms.
3.1 Architecture of Seizure Prediction
Before starting the prediction processing, first we observe the EEG signal whether exists any clue around the seizures. For example, as show in Fig. 3-1, for different clinical states including pre-ictal, ictal, and post-ictal states, the corresponding properties of intracranial EEG recordings are different.
Fig. 3-1 Typical EEG waveforms corresponding to epilepsy:
(a) pre-ictal, (b) ictal, and (c) post-ictal
In the pre-ictal state the EEG signal is of the chaotic nature. As we approach the epileptic seizure the signals are less and less chaotic and take the regular shape. These findings imply that seizures may represent spatiotemporal transitions of the epileptic brain from chaos-to-order-to-chaos. Therefore the chaoticity measure of the signal is a good prognostic of the incoming seizure. In fact, this phenomenon is confirmed by STL in the last chapter, but we try to use another method to prove it.
In this chapter, we would propose a Wavelet-Correlation Dimension based Seizure Prediction system, called WCDSP, as shown in Fig. 3-2. The system is consisted of three primary parts:
z Discrete Wavelet Transform analysis: Wavelet is used to decompose the EEG into several sub-bands.
z Chaos analysis: Correlation dimension is used to measure the EEG complexity.
z Feature extraction: The correlation coefficient is used to be the main feature for prediction rules and to decide the seizure states.
Phase Space Reconstruction Correlation Coefficient Prediction Rules
Fig. 3-2 System architecture of seizure prediction
3.2 Discrete Wavelet Transform
We would introduce the discrete wavelet transform (DWT) in this section, and briefly discuss the properties of the discrete Fourier transform (DFT), the short-time Fourier transform (STFT), and the wavelet transform.
3.2.1 Discrete Fourier Transform
In mathematics, the discrete Fourier transform (DFT) is one of the specific forms of Fourier analysis, and is widely employed in signal processing and related fields to analyze the frequencies contained in a sampled signal. As such, it transforms one function into another, which is called the frequency domain representation of the original function (which is often a function in the time domain).
The sequence of N complex numbers x ,...,x0 N−1 is transformed into the
The importance of the Fourier transform stems not only from the significance of their physical interpretations, such as time-frequency analysis of signals, but also from the fact that Fourier analytic techniques are extremely powerful.
While Fourier analysis forces us to choose between time on one side of the transform and frequency on the other, “…our everyday experiences insist on a description in terms of both time and frequency,” Gabor wrote. To analyze a signal in both time and frequency, he used the windowed Fourier transform. The idea is to study the frequencies of a signal segment by segment; the way, one can at least limit the span of time during witch something is happening. The “window” that defines the size of the segment to analyzed — and which remains fixed in size — is a little piece of curve.
One of the downfalls of the STFT is that it has a fixed resolution. The width of the windowing function relates to the how the signal is represented — it determines whether there is good frequency resolution (frequency components close together can be separated) or good time resolution (the time at which frequencies change).
3.2.2 Heisenberg uncertainty principle
We want to construct a function f whose energy is well localized in time and whose Fourier transform ˆf has an energy concentrated in a small frequency neighborhood.
The Heisenberg principle [8] says the following. For every function f t
( )
, suchthat
variance of t variance of
ˆ h
These variances measure to what extent t and τ take values far from their average values, t and m τm. Thus the shorter-lived a function, the wider the band of frequencies given by its Fourier transform; the narrower the band of frequencies of its Fourier transform, the more the function is spread out in time. Time and frequency energy concentrations are restricted by the Heisenberg uncertainty principle.
3.2.3 Wavelet and Multiresolution Analysis
Unlike the Fourier transform, whose basis functions are sinusoids, wavelet transforms are based on small waves, called wavelets [9], of varying frequency and limited duration. This allows them to provide the equivalent of a musical score for a signal, revealing not only what notes (or frequencies) to play but also when to play them. Conventional Fourier transforms, on the other hand, provide only the notes or frequency information; temporal information is lost in the transformation process.
In 1987, wavelets were first shown to be the foundation of a powerful new approach by Mallat to signal processing and analysis called multiresolution theory.
Multiresolution theory incorporates and unifies techniques from a variety of disciplines, including subband coding from signal processing, quadrature mirror filtering from digital speech recognition, and pyramidal image processing. As its name implies, multiresolution theory is concerned with the representation and analysis of signals at more than one resolution. The appeal of such an approach is obvious—features that might go undetected at one resolution may be easy to spot at another.
A. Background
When we look at images, generally we see connected regions of similar texture and gray level that combine to form objects. If the objects are small in size or low in contrast, we normally examine them at high resolutions; if they are large in size or high in contrast, a coarse view is all that is required. If both small and large objects—or low and high contrast objects—are present simultaneously, it can be advantageous to study them at several resolutions. This, of course, is the fundamental motivation for multiresolution processing.
(1) Image Pyramids
An image pyramid is a collection of decreasing resolution images arranged in the shape of a pyramid. As can be seen in Fig. 3-3, the base of the pyramid contains a high-resolution representation of the image being processed; the apex contains a low-resolution approximation. As you move up the pyramid, both size and resolution decrease.
Fig. 3-3 A pyramidal image structure and system block diagram for creating it
The level j− approximation output is used to create approximation pyramids, 1 which contain one or more approximations of the original image. The level j prediction residual output is used to build prediction residual pyramids.
For example, Fig. 3-4 shows one possible approximation (Gaussian) and prediction residual (Laplacian) pyramid for the vase. The Laplacian pyramid contains the prediction residuals needed to compute its Gaussian counterpart. To build the Gaussian pyramid, we begin with the Laplacian pyramid's level j 64 by 64
approximation image, predict the Gaussian pyramid's level j+ 128 by 128 1 resolution approximation (by upsampling and filtering), and add the Laplacian's level
1
j+ prediction residual. This process is repeated using successively computed approximation images until the original 512 by 512 image is generated.
Fig. 3-4 Two image pyramids and their statistics: a approximation (Gaussian) pyramid and a prediction residual (Laplacian) pyramid
(2) Subband Coding
Another important imaging technique with ties to multiresolution analysis (MRA) is subband coding. In subband coding, an image is decomposed into a set of band-limited components, called subbands, which can be reassembled to reconstruct the original image without error. Since the bandwidth of the resulting subbands is smaller than that of the original signal, the subbands can be downsampled without loss of information. Reconstruction of the original signal is accomplished by upsampling, filtering, and summing the individual subbands.
Fig. 3-5 shows the principal components of a two-band subband coding and
decoding system. The input of the system is a one-dimensional, band-limited
Fig. 3-5 (a) A two-band filter bank for one-dimensional subband coding and decoding and (b) its spectrum splitting properties
Filter H is a low-pass filter whose output is an approximation of 0 x n
( )
; filterH is a high-pass filter whose output is the high frequency or detail part of 1 x n
( )
.We wish to select h n0
( )
, h n1( )
, g n0( )
, and g n1( )
so that the input can be reconstructed perfectly. That is, so that ˆx n( ) ( )
=x n .We can express the system's output as: impose the following conditions:
( ) ( ) ( ) ( )
Both can be incorporated into the single matrix expression
( ) ( ) ( ) [ ]
For FIR filters, the determinate of the modulation matrix is a pure delay.
(
m( ) )
(2k 1)Then, 1
( ) ( )
1(
2m( ) )
0( ) ( )
1( )
Filter banks satisfying this condition are called biorthogonal. Moreover, the analysis and synthesis filter impulse responses of all two-band, real-coefficient, perfect reconstruction filter banks are subject to the biorthogonality constraint.
Orthonormal filter banks:
In MRA, a scaling function is used to create a series of approximations of a function, each differing by a factor of 2 from its nearest neighboring approximations.
Additional functions, called wavelets, are then used to encode the difference in information between adjacent approximations.
A function f x
( )
can be decomposed into a linear combination of expansion functions as follows.( )
k k( )
k
f x =
∑
α ϕ xIf the expansion is unique, the ϕk
( )
x are called basis functions. The expressible function forms a function space{
k( ) }
Case 1: Orthonormal basis( ) ( )
0 Case 2: Biorthogonal basis( ) ( )
0j k jk 1
x , x j k
ϕ ϕ δ ⎧ j k≠
< > = = ⎨⎩ =
z For the case of wavelet expansion, we restrict ourselves to forming the basis functions by binary scaling (shrinking by factors of two) and dyadic translation (shifting by the amount k/2j)
z Consider the set of expansion functions composed of integer translations and binary scalings of the real, square-integrable function ϕ
( )
x :( )
22j(
2j)
j ,k x x k
ϕ = ϕ − for all j,k Z∈ and ϕ
( )
x ∈L R2( )
-- ϕ
( )
x is called a scaling function-- L R2
( )
: the set of all measurable, square-integrable functions-- By choosing ϕ
( )
x wisely,{
ϕj ,k( )
x}
can be made to span L R2( )
More generally, we denotej
{
j ,k( ) }
k
V =span ϕ x
Four fundamental requirements of multiresolution analysis:
MRA requirement 1:
The scaling function is orthogonal to its integer translates.
MRA requirement 2:
The subspaces spanned by the scaling function at low scales are nested within those spanned at higher scales.
1 0 1 2
Any function can be represented with arbitrary precision.
{
2( ) }
V∞ = L R
z j ,k
( )
n j 1,n( ) [ ]
2(j 1 2)/(
2j 1)
which is called as refinement equation (MRA equation, dilation equation).
[ ]
h nϕ : scaling function coefficients
z Given a scaling function that meets the MRA requirements, we can define a wavelet function ψ
( )
x .h nψ : wavelet function coefficients C. Wavelet Transform
We can now formally define several closely related wavelet transformations: the generalized wavelet series expansion, the discrete wavelet transform, and a computationally efficient implementation of the discrete wavelet transform called the fast wavelet transform.
(1) The Wavelet Series Expansion
We begin by defining the wavelet series expansion of function f x
( )
∈L R2( )
relative to wavelet ψ
( )
x and scaling function ϕ( )
x .c kj : approximation (or scaling) coefficients
j
[ ]
d k : detail (or wavelet) coefficients
For orthonormal bases and tight frames,
[ ] ( ) ( ) ( ) ( )
(2) Discrete Wavelet Transform
Like the Fourier series expansion, the wavelet series expansion maps a function of a continuous variable into a sequence of coefficients. If the function being expanded is a sequence of numbers, like samples of a continuous function f x
( )
, theresulting coefficients are called the discrete wavelet transform (DWT) of f x
( )
.( ) [ ]
0( ) [ ] ( )
W j ,kϕ : approximation (or scaling) coefficients
[ ]
Wψ j,k : detail (or wavelet) coefficients For orthonormal bases and tight frames,
[
0] [ ]
0( )
For biorthogonal bases,
(3) The Fast Wavelet Transform (FWT)
The fast wavelet transform (FWT) is a computationally efficient implementation of the discrete wavelet transform (DWT) that exploits a surprising but fortunate relationship between the coefficients of the DWT at adjacent scales, also called Mallat's herringbone algorithm (Mallat [1989]).
Consider again the multiresolution refinement equation:
( ) [ ]
2(
2)
Fig. 3-6 reduces these operations to block diagram form. We note that the filter bank can be "iterated" to create multistage structures for computing DWT coefficients. Fig. 3-6 FWT analysis bank
D. Time-Frequency Analysis
Fig. 3-7 shows the time-frequency tiles for (a) a delta function (i.e., conventional time domain) basis, (b) a sinusoidal (FFT) basis, and (c) an FWT basis.
(a) (b) (c)
Fig. 3-7 Time-frequency tilings for (a) sampled data, (b) FFT, and (c) FWT basis functions.
Note that the standard time domain basis pinpoints the instants when events occur but provides no frequency information. A sinusoidal basis, on the other hand, pinpoints the frequencies that are present in events that occur over long periods but provides no time resolution. The time and frequency resolution of the FWT tiles vary.
At low frequencies, the tiles are shorter (i.e., have better frequency resolution) but are wider (which corresponds to poorer time resolution). At high frequencies, tile width is smaller (so the time resolution is improved) and tile height is greater (which means the frequency resolution is poorer). This fundamental difference between the FFT and FWT was noted in the introduction to the section and is important in the analysis of nonstationary functions whose frequencies vary in time.
In this research, a two-level Daubechies 4 (db4) wavelet is used for the EEG recordings. Fig. 3-8 shows (a) the corresponding wavelet structure, and (b) the time-frequency tilings of EEG signal.
LEVEL 1
Daubechies 4th Order Wavelet (db4)
L1
(0~64Hz)
(a)
(b)
Fig. 3-8 (a) Two-level Daubechies 4 wavelet and (b) Time-frequency tilings of EEG
The corresponding analysis and synthesis filters are shown in Fig. 3-9. We list the analysis part, including a low-pass and a high-pass filter, as follows:
1 2 3
0 0 1 2 3
2 1 1
1 3 2 1 0
( )
( ) ,
h z h h z h z h z h z h z h z h h z
− − −
−
= + + +
= − + − (3.14)
where 0 1 3 1 3 3 2 3 3 3 1 3
, , ,
4 2 4 2 4 2 4 2
h = + h = + h = − h = −
Fig. 3-9 Analysis and synthesis filters
We decompose the original EEG into several subbands, including L1, H1, LL2, and LH2 (Fig. 3-10). Each subband may contain some specific characteristic of the brain dynamics. In the next section, we will use these subbands for advanced analysis.
Fig. 3-10 Decompose the EEG into many subbands
3.3 Correlation Dimension
Estimating the fractal dimension of a strange attractor from a corresponding time series has attracted considerable attention in the past few years and has become one of the main tools in the analysis of the underlying dynamics. Of all types of dimensions, most attention has been given to the correlation dimension (D ). This is c mainly because this type of dimension is easier to estimate than others and also because it provides a good measure of the complexity of the dynamics, i.e. of the number of active degrees of freedom.
First, we have to reconstruct the attractor on the phase space, introduced in Chapter 2. We divide the EEG into non-overlapping time blocks which is long enough for a good estimation of D , shown in Fig. 3-11(a). Each block contains 256 states of c the attractor (Na =256), and the distance between each state is 9 (Δ =t 9), shown as Fig. 3-11(b). The dimension of the phase space is p , and the delay τ =1, shown in Fig. 3-11(c).
Fig. 3-11 (a) Time blocks, (b) States, and (c) Embedding dimension
Grassberger and Proccacia suggested a procedure of estimating which became widely used by mathematicians and applied scientists immediately [5]. Fix a point x on the attractor A. Let Nx
( )
ε denote the number of points on A inside a ball of radius ε about x (Fig. 3-12).Ball of radius centered at x
ε
Fig. 3-12 Pointwise dimension
Most of the points in the ball are unrelated to the immediate portion of the trajectory through x; instead they come from later parts that just happen to pass close to x. Thus Nx
( )
ε measures how frequently a typical trajectory visits anε-neighborhood of x.
Now vary ε. As ε increases, the number of points in the ball typically grows as a power law:
( )
dNx ε ∝ε , (3.15)
where d is called the pointwise dimension at x. The pointwise dimension can depend significantly on x; it will be smaller in rarefied regions of the attractor. To get an overall dimension of A, one averages Nx
( )
ε over many x. The resulting quantity( )
C ε is found empirically to scale as
( )
DcC ε ∝ε (3.16)
where D is called the correlation dimension. c
If the relation C
( )
ε ∝εDc were valid for all ε , we'd find a straight line ofFig. 3-13 Correlation dimension with different radius
It is known that when calculating the correlation dimension some restrictions are imposed on the value of ε . We use the standard deviation (std.) of each time block to estimate it as follows:
{
1 1 5 2 2 5 3}
k std of the time block, k= , . , , . ,
ε = ∗ (3.19)
The results are shown in Fig. 3-13. When k is less than two, there is no clear trend of the wave before the seizure onset. With the increasing of k, we can find that there is a long-term decreasing before the seizure and a sudden drop during the ictal, and then a strong rise after the seizure. Recall the feature mentioned in the previous
The results are shown in Fig. 3-13. When k is less than two, there is no clear trend of the wave before the seizure onset. With the increasing of k, we can find that there is a long-term decreasing before the seizure and a sudden drop during the ictal, and then a strong rise after the seizure. Recall the feature mentioned in the previous