Independent Component Analysis (ICA)

2.3. Independent Component Analysis (ICA)

The problem of blind source separation of recorded multi-channel signals into sums of temporally independent sources had been posed some years earlier. In 1994, Comon proposed the first approaches to blind source separation by minimizing the third- and fourth-order correlations among the observed variables, achieving limited success in simulations [65].

In 1996, Cardoso, Bell, and Sejnowski generalized this approach, demonstrating a simple neural network algorithm that used joint information maximization or “infomax” as a training criterion [66-67]. By using a compressive nonlinearity to transform the data and then following the entropy gradient of the resulting mixtures, they were able to demonstrate unmixing of ten recorded voice and music sound sources that had been mixed with different weights in ten simulated microphone channels. Their algorithm used only minimal assumptions about the nature of the sources to be separated. Mixing weights (and thus scalp projections) of individual components were assumed to be fixed, and the time courses of the sources were assumed to be mutually independent. In 1996, Makeig et al. further extended the applications of blind decomposition to biomedical time series analysis by applying the infomax ICA algorithm to the decomposition of EEG and event-related potential (ERP) data, and reported the use of ICA to monitor alertness [49]. This first report demonstrated segregation of eye movements from brain EEG phenomena, and separation of EEG data into constituent components defined by spatial stability and temporal independence. Subsequent technical reports by Ghahremani et al. [68] and Makeig et al. [69] demonstrated successful separation of six simulated EEG sources mixed into six simulated EEG channels using a realistic three-shell head model. Unmixing performance of the ICA algorithm was shown to degrade

other EEG decompositions are based on physically modeling the supposed sources [70-71] or on PCA [72]. Makeig et al. evaluated the relative strengths and limitations of the statistical independence criterion using simulations [69]. ICA was successful in separating behaviorally related ERP components in an auditory detection task [73] and in several complex visual evoked ERP data sets [43-44, 74-76]. Jung et al. also demonstrated that ICA can be used to remove artifacts from continuous or event-related (single-trial) EEG data prior to averaging [48, 77-78]. Vigario et al., using a somewhat different ICA algorithm, supported the use of ICA for identifying artifacts in MEG data [79]. Meanwhile, widespread interest in ICA has led to multiple applications to biomedical data as well as to other fields [49, 80]. Most relevant to EEG analysis, McKeown et al. demonstrated the effectiveness of ICA in separating functionally independent components of functional magnetic resonance imaging (fMRI) data [81].

Four main assumptions underlie ICA decomposition of EEG data: (1) Signal conduction times are equal, and summation of currents at the scalp electrodes is linear, both reasonable assumptions for currents carried to the scalp electrodes by volume conduction at EEG frequencies [82]. (2) Spatial projections of components are fixed across time and conditions. (3) Source activations are temporally independent of one another across the input data. (4) Statistical distributions of the component activation values are not Gaussian. (In contrast, PCA assumes that the sources have a Gaussian distribution.)

The spatial stationarity of the component scalp maps, assumed in ICA, is compatible with the observation made in large numbers of functional imaging reports that performance of particular tasks increases blood flow within small (≈cm³) discrete brain regions [83]. Since functional hemodynamic imaging experiments typically show that metabolic increases during defined tasks occur in relatively small cortical areas, EEG sources reflecting task-related information processing may generally be assumed to sum activity from compact and spatially stationary generators. However, spatial stationarity may not apply to some spontaneously generated EEG phenomena such as spreading depression or sleep spindles [84]. To fulfill the temporal independence assumption used by ICA, response components must be activated with temporally independent time courses. For this to occur, the functional degree of independence of the different regions of synchronous neural activity generating the EEG signals must be expressed in the data. Typically, this means that a sufficient number of time points must be used during training.
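Assumption (4) can be checked empirically on candidate signals. The following is only an illustrative sketch, not part of the original analysis: it uses sample excess kurtosis, one common measure of non-Gaussianity chosen here as an assumption, on synthetic data. The function name `excess_kurtosis` and the three test distributions are hypothetical.

```python
import numpy as np

def excess_kurtosis(u):
    """Sample excess kurtosis: near 0 for Gaussian data,
    negative for sub-Gaussian and positive for super-Gaussian data."""
    u = np.asarray(u, dtype=float)
    centered = u - u.mean()
    return (centered ** 4).mean() / (centered ** 2).mean() ** 2 - 3.0

rng = np.random.default_rng(0)
n = 100_000
k_gauss = excess_kurtosis(rng.standard_normal(n))       # close to 0
k_uniform = excess_kurtosis(rng.uniform(-1.0, 1.0, n))  # negative (sub-Gaussian)
k_laplace = excess_kurtosis(rng.laplace(0.0, 1.0, n))   # positive (super-Gaussian)
```

A component whose excess kurtosis stays close to zero carries little of the non-Gaussian structure that ICA relies on.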

The joint problems of electroencephalographic (EEG) source segregation, identification, and localization are very difficult, since the EEG data collected from any point on the human scalp include activity generated within a large brain area; thus, the problem of determining brain electrical sources from potential patterns recorded on the scalp surface is mathematically underdetermined. In this thesis, an application of the concept of non-stationary ICA to EEG decomposition is proposed. This is a complex problem, both theoretically and computationally, with a tradeoff between the benefits of more sophisticated methods of analysis and their complexity. Normally, more complex methods require more restrictive assumptions to be beneficial. In this thesis, we attempt to completely separate the twin problems of source identification (What) and source localization (Where) by using a generally applicable ICA. Thus, artifacts including eye movements (EOG), eye blinks, heartbeat (EKG), muscle activity (EMG), and line noise can be successfully separated from EEG activities. The ICA algorithm was carried out with the “infomax” principle [85-86]; the beauty of the “infomax” approach to blind separation or ICA is the close fit of its assumptions to the nature of EEG data, which has been demonstrated in many reports [48, 77-81, 85-86]. The ICA is a statistical “latent

independent. The ICA model describes how the observed data are generated by a process of mixing the components $s_i$. The independent components $s_i$ (often abbreviated as ICs) are latent variables, meaning that they cannot be directly observed. The mixing matrix $A$ is also assumed to be unknown. All we observe are the random variables $x_i$, and we must estimate both the mixing matrix and the $s_i$ using the $x_i$.

Therefore, given the time series of the observed data $x(t) = [x_1(t), x_2(t), \ldots, x_N(t)]^T$ in $N$ dimensions, ICA seeks a linear mapping $W$ such that the unmixed signals $u(t) = W x(t)$ are statistically independent.
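The mixing and unmixing model can be illustrated with a small synthetic example. This is a sketch under stated assumptions: the two sources, the 2×2 mixing matrix `A`, and the sample count are hypothetical, and `W` is set to the exact inverse of `A` only to show the relationship $u(t) = W x(t)$. In practice `A` is unknown and `W` must be learned.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000

# Two hypothetical non-Gaussian sources s(t): a sine wave and uniform noise.
t = np.linspace(0.0, 1.0, n_samples)
s = np.vstack([np.sin(2 * np.pi * 5 * t),
               rng.uniform(-1.0, 1.0, n_samples)])

# Hypothetical mixing matrix A (unknown in a real recording).
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
x = A @ s  # observed channels x(t) = A s(t)

# If W equals the inverse of A, the unmixed signals u(t) = W x(t) recover s(t).
W = np.linalg.inv(A)
u = W @ x
```

ICA can recover the sources only up to permutation and scaling of the rows of `u`, so a learned `W` generally differs from `inv(A)` by those ambiguities.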

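Suppose the probability density function of the observations $x$ can be expressed as:

$$p(x) = |\det(W)| \, p(u)$$

where $p(u) = \prod_{i=1}^{N} p_i(u_i)$ is the hypothesized, factorized density of the independent components.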

The learning algorithm can be derived using the maximum likelihood formulation, with the log-likelihood function given by:

$$L(u, W) = \log|\det(W)| + \sum_{i=1}^{N} \log p_i(u_i)$$
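The log-likelihood can be evaluated numerically for a candidate unmixing matrix. The sketch below is illustrative only: the function name `log_likelihood` is hypothetical, and the source density $p_i(u) = 1/(\pi\cosh(u))$ is an assumption made to have a concrete example, not the density used in the thesis.

```python
import numpy as np

def log_likelihood(W, x):
    """Sample average of log|det W| + sum_i log p_i(u_i), with u = W x.

    Assumes, for illustration only, the density p_i(u) = 1 / (pi * cosh(u)).
    """
    u = W @ x
    _, logabsdet = np.linalg.slogdet(W)
    # log cosh(u) computed stably as logaddexp(u, -u) - log 2.
    log_p = -(np.logaddexp(u, -u) - np.log(2.0)) - np.log(np.pi)
    return logabsdet + log_p.sum(axis=0).mean()

W = np.eye(2)
x = np.zeros((2, 5))
L0 = log_likelihood(W, x)  # equals -2 * log(pi) for this trivial input
```

Comparing this value across candidate matrices `W` on the same data gives a simple way to monitor whether training is increasing the likelihood.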

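Thus, an effective learning algorithm using the natural gradient to maximize the log-likelihood with respect to $W$ gives:

$$\Delta W \propto \frac{\partial L(u, W)}{\partial W} W^T W = \left[ I - \phi(u) u^T \right] W$$

where the nonlinearity is $\phi(u) = -\frac{\partial p(u) / \partial u}{p(u)}$, applied element-wise, and $W^T W$ rescales the gradient, simplifies the learning rule, and speeds convergence considerably. It is difficult to know a priori the parametric density function $p(u)$, which plays an essential role in the learning process. If we choose to approximate the estimated probability density function with an Edgeworth expansion or Gram-Charlier expansion, generalizing the learning rule to sources with either sub- or super-Gaussian distributions, the nonlinearity $\phi(u)$ can be derived as:

$$\phi(u) = \begin{cases} u - \tanh(u) & \text{for sub-Gaussian sources} \\ u + \tanh(u) & \text{for super-Gaussian sources} \end{cases}$$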
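The natural-gradient update can be written in a few lines. This is a minimal batch sketch under assumptions: the expectation is replaced by a sample average, and the function name `natural_gradient_step`, the learning rate, and the toy data are hypothetical.

```python
import numpy as np

def natural_gradient_step(W, x, phi, lr=0.1):
    """Return W + lr * [I - E{phi(u) u^T}] W, with u = W x
    and E{.} approximated by the average over samples (columns of x)."""
    n, n_samples = x.shape
    u = W @ x
    grad = (np.eye(n) - phi(u) @ u.T / n_samples) @ W
    return W + lr * grad

# With phi(u) = u (a Gaussian source model), the update vanishes once E{u u^T} = I.
x = np.sqrt(2.0) * np.eye(2)  # two samples whose covariance x x^T / 2 equals I
W_next = natural_gradient_step(np.eye(2), x, phi=lambda u: u)
```

The example also shows why a Gaussian source model is not enough: with `phi(u) = u` the rule stops as soon as the outputs are decorrelated, which is a weaker condition than independence.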

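Since there is no general definition of sub- and super-Gaussian sources, a specific density model is chosen for each case. If we choose

$$p(u) = \frac{1}{2} \left( N(1,1) + N(-1,1) \right)$$

for sub-Gaussian sources, where $N(\mu, \sigma^2)$ denotes a normal density, and a unimodal, heavier-tailed density for super-Gaussian sources, the learning rule becomes

$$\Delta W \propto \left[ I - K \tanh(u) u^T - u u^T \right] W$$

The two cases differ only in the sign before the tanh function, which can be determined using a switching criterion as:

$$k_i = \mathrm{sign}\left( E\{\mathrm{sech}^2(u_i)\} \, E\{u_i^2\} - E\{\tanh(u_i) \, u_i\} \right)$$

where the $k_i$ ($k_i = 1$ for super-Gaussian and $k_i = -1$ for sub-Gaussian sources) are the elements of the $N$-dimensional diagonal matrix $K$. After ICA training, we can obtain 33 ICA components $u(t)$ decomposed from the measured EEG data $x(t)$:

$$x(t) = W^{-1} u(t)$$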
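The switching criterion and the resulting update can be sketched as follows. This is an illustration under assumptions, not the implementation used for the thesis: expectations are replaced by sample averages over one batch, and the function names, learning rate, iteration count, synthetic sources, and mixing matrix `A` are all hypothetical.

```python
import numpy as np

def switching_signs(u):
    """k_i = sign(E{sech^2(u_i)} E{u_i^2} - E{tanh(u_i) u_i}), one sign per row of u."""
    tanh_u = np.tanh(u)
    sech2 = 1.0 - tanh_u ** 2  # sech^2(u) = 1 - tanh^2(u)
    return np.sign(sech2.mean(axis=1) * (u ** 2).mean(axis=1)
                   - (tanh_u * u).mean(axis=1))

def extended_infomax_step(W, x, lr=0.01):
    """One batch update W + lr * [I - K tanh(u) u^T - u u^T] W, averaged over samples."""
    n, n_samples = x.shape
    u = W @ x
    K = np.diag(switching_signs(u))
    grad = np.eye(n) - (K @ np.tanh(u) @ u.T + u @ u.T) / n_samples
    return W + lr * grad @ W

rng = np.random.default_rng(0)
n_samples = 20_000
s = np.vstack([
    rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n_samples),  # unit variance, sub-Gaussian
    rng.laplace(0.0, 1.0 / np.sqrt(2.0), n_samples),      # unit variance, super-Gaussian
])
k = switching_signs(s)  # -1 for the uniform source, +1 for the Laplacian source

A = np.array([[1.0, 0.5], [0.3, 1.0]])  # hypothetical mixing matrix
x = A @ s
W = np.eye(2)
for _ in range(100):  # a short, illustrative training loop
    W = extended_infomax_step(W, x)
```

Practical implementations usually also center and sphere the data, use mini-batches, and anneal the learning rate; none of that is shown here, and the short loop above is not guaranteed to have converged.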

Fig. 2-7 shows the scalp topographies of the ICA mixing matrix $W^{-1}$ corresponding to each ICA component, obtained by spreading each $w_{i,j}$ onto the plane of the scalp. These maps provide spatial information about the contribution of each ICA component (brain source) to the EEG channels; e.g., eye activity projected mainly to frontal sites, and the drowsiness-related potential appeared over the parietal to occipital lobes. We can observe that most artifacts and channel noise included in the EEG recordings are effectively separated into ICA components 1 and 4, as shown in Fig. 2-7, and that ICA components 5, 11, and 13 may be considered effective “sources” related to drowsiness in the VR-based driving experiment.

Figure 2-7. Scalp topography of the ICA mixing matrix $W^{-1}$ for the 33 ICA components trained on EEG data.
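Because each column of $W^{-1}$ is the scalp map of one component, components judged to be artifacts can be removed by back-projecting only the remaining ones. The sketch below illustrates this under assumptions: the function name `remove_components`, the 3×3 matrix `W`, and the random data are hypothetical stand-ins for a trained 33-channel decomposition, and indices are 0-based.

```python
import numpy as np

def remove_components(W, x, remove):
    """Back-project all components except those listed in `remove` (0-based indices)."""
    u = W @ x                 # component activations u(t) = W x(t)
    W_inv = np.linalg.inv(W)  # column j holds the scalp map of component j
    keep = np.ones(W.shape[0], dtype=bool)
    keep[np.asarray(remove, dtype=int)] = False
    return W_inv[:, keep] @ u[keep, :]

# Hypothetical 3-channel example standing in for the 33-channel EEG data.
W = np.array([[1.0, 0.2, 0.0],
              [0.1, 1.0, 0.3],
              [0.0, 0.4, 1.0]])
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 50))
x_clean = remove_components(W, x, remove=[0])
```

With a trained 33-channel decomposition, passing the 0-based indices of the artifact components (for example `[0, 3]` for components 1 and 4) would yield channel data with those contributions subtracted.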