
2.4 Pattern Classification

Machine learning tools can be applied to discriminate emotional states from physiological signals. After the daily and personal correction, we used the estimator $X_{ijkl} = Z_{ijkl} - \bar{Z}_{ijk} + \bar{Z}_{i}$ as the attribute for pattern classification of the emotional state $Y_{jkl}$, which represents the emotional state of the $j$th subject on the $k$th day for the $l$th sample. We let the variable $Y$ represent the emotional state and the variable $X_i$ represent the value of the $i$th feature after the daily and personal correction. Six classifiers were tested for accuracy using leave-one-out cross-validation. All six classification methods were run in Weka (http://www.cs.waikato.ac.nz/ml/weka) with the default options for each classifier. Other classifier options in Weka could be investigated in future work. The classifiers are described below.
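The correction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `correct_feature` and the record layout are hypothetical, $\bar{Z}_{ijk}$ is taken to be the mean of feature $i$ over the samples of subject $j$ on day $k$, and $\bar{Z}_{i}$ is taken to be the mean of feature $i$ over all subjects, days, and samples.

```python
from collections import defaultdict
from statistics import mean


def correct_feature(records):
    """Apply x = z - mean(subject, day) + grand mean to ONE feature.

    records: list of (subject, day, z) tuples (hypothetical layout).
    Returns the corrected values in the same order as the input.
    """
    # Assumed meaning of Z-bar_i: mean over every sample of this feature.
    grand_mean = mean(z for _, _, z in records)

    # Assumed meaning of Z-bar_ijk: mean within each (subject, day) group.
    groups = defaultdict(list)
    for subject, day, z in records:
        groups[(subject, day)].append(z)
    group_mean = {key: mean(values) for key, values in groups.items()}

    return [z - group_mean[(s, d)] + grand_mean for s, d, z in records]


# Toy data: two subjects measured on the same day with different baselines.
records = [("s1", 1, 10.0), ("s1", 1, 12.0), ("s2", 1, 20.0), ("s2", 1, 22.0)]
corrected = correct_feature(records)
```

In the toy data both subjects end up centered on the same grand mean, so only the within-day variation remains as a signal for the classifier.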

2.4.1 Bayesian Network

A Bayesian network, also called a Bayes net, is a model based on a directed acyclic graph (DAG) and consists of two components. The first component $G$ comprises vertices corresponding to a set of variables $V = \{V_1, V_2, \ldots, V_N\}$ and a set of directed edges between variables that satisfy the Markov property. The second component $\theta$ attaches a potential table $P(V_i \mid U_{V_i})$ to each variable $V_i$ in $V$, where $U_{V_i}$ denotes the parent nodes of $V_i$ (Pearl, 1988; Jensen, 2001). Given the structure $G$ and the parameter $\theta$, the joint probability distribution can be written as Eq. (2.3):

$$P(V) = \prod_{i=1}^{N} P(V_i \mid U_{V_i}). \tag{2.3}$$
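As a small illustration of the factorization in Eq. (2.3), the sketch below multiplies one potential-table entry per variable. The three-node network, the table values, and the names `joint_probability`, `parents`, and `cpt` are all made up for this example and are not taken from the study.

```python
def joint_probability(assignment, parents, cpt):
    """Return P(V) as the product of P(V_i | parents of V_i)."""
    p = 1.0
    for var, value in assignment.items():
        # Look up the table row selected by the values of the parent nodes.
        parent_values = tuple(assignment[u] for u in parents[var])
        p *= cpt[var][parent_values][value]
    return p


# Hypothetical DAG: Y is the only parent of X1 and of X2.
parents = {"Y": (), "X1": ("Y",), "X2": ("Y",)}

# Hypothetical potential tables, indexed by parent values and then by value.
cpt = {
    "Y": {(): {0: 0.5, 1: 0.5}},
    "X1": {(0,): {0: 0.75, 1: 0.25}, (1,): {0: 0.25, 1: 0.75}},
    "X2": {(0,): {0: 0.5, 1: 0.5}, (1,): {0: 0.25, 1: 0.75}},
}

# P(Y=1) * P(X1=1 | Y=1) * P(X2=0 | Y=1)
p = joint_probability({"Y": 1, "X1": 1, "X2": 0}, parents, cpt)
```

Summing `joint_probability` over all eight assignments gives 1, which is a quick sanity check that the tables define a valid joint distribution.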

To learn a Bayesian network, we have to reconstruct the network structure and estimate the parameter values. In this study, we applied the hill climbing algorithm and the simple estimator to reconstruct the network and estimate the parameters. After obtaining the network structure, we used the junction tree method, which converts the DAG into a tree by clustering variables (Lauritzen and Spiegelhalter, 1988). An efficient algorithm based on belief propagation can then be applied for inference. In our study, we used the estimators $X_1, X_2, \ldots, X_I$ and $Y$ as the prediction variables $V = \{V_1, V_2, \ldots, V_{I+1}\}$ and calculated the conditional distribution of $Y$ given the observations $X_1, X_2, \ldots, X_I$ in the constructed Bayesian network structure.

[Figure 2.5: The network structure of the naive Bayesian classifier. The class node Y is the parent of X1, X2, ..., Xn, with conditional distributions P(X1|Y), P(X2|Y), ..., P(Xn|Y).]

2.4.2 Naive Bayesian

A naive Bayesian classifier is a simple approach based on Bayes' theorem. The network structure is illustrated in Figure 2.5. The naive Bayesian classifier makes two assumptions (John and Langley, 1995): (i) given the class attribute $Y$, the predictive attributes $X_1, X_2, \ldots, X_I$ are independent; (ii) no other attributes affect the prediction process. By Bayes' theorem,

$$P(Y = y \mid X = x) = \frac{P(Y = y)\, P(X = x \mid Y = y)}{P(X = x)}. \tag{2.4}$$

We can predict the class attribute by finding the $y$ that maximizes $P(Y = y \mid X = x)$ in Eq. (2.4) given the predictive attributes $x$. As the predictive attributes $X_1, X_2, \ldots, X_I$ are assumed to be conditionally independent, we have

$$P(X = x \mid Y = y) = \prod_{i=1}^{I} P(X_i = x_i \mid Y = y). \tag{2.5}$$

For the numeric attributes, we assume that $X_i$ is distributed as $N(\mu_{iy}, \sigma_{iy}^2)$ given the class $Y = y$ for every $i = 1, 2, \ldots, I$. Hence, we can estimate the parameters $\mu_{iy}$ and $\sigma_{iy}^2$ by their maximum likelihood estimates within each class.
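The Gaussian naive Bayesian classifier described by Eqs. (2.4) and (2.5) can be sketched as follows. This is a minimal illustration and not the Weka implementation used in the study: the function names are hypothetical, the posterior is compared in log space for numerical convenience, and the small variance floor is an added assumption that avoids division by zero.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-attribute (mean, variance) by MLE."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    model = {}
    for label, rows in by_class.items():
        prior = len(rows) / len(y)
        params = []
        for column in zip(*rows):
            mu = sum(column) / len(column)
            # Maximum likelihood variance divides by n, not n - 1.
            var = sum((v - mu) ** 2 for v in column) / len(column)
            params.append((mu, max(var, 1e-9)))  # floor is an assumption
        model[label] = (prior, params)
    return model


def predict_gaussian_nb(model, x):
    """Return the class y that maximizes log P(Y=y) + sum_i log P(x_i | y)."""
    def log_posterior(label):
        prior, params = model[label]
        total = math.log(prior)
        for xi, (mu, var) in zip(x, params):
            total += -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
        return total

    return max(model, key=log_posterior)


# Toy data with two well-separated classes.
X_train = [[1.0, 2.0], [1.2, 1.8], [3.0, 4.0], [3.2, 4.2]]
y_train = ["a", "a", "b", "b"]
nb_model = fit_gaussian_nb(X_train, y_train)
pred_low = predict_gaussian_nb(nb_model, [1.1, 2.0])
pred_high = predict_gaussian_nb(nb_model, [3.1, 4.1])
```

The denominator $P(X = x)$ in Eq. (2.4) is the same for every class, so it can be dropped when only the maximizing class is needed.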

2.4.3 Support Vector Machine

The support vector machine (SVM) (Vapnik, 1998) is a popular classification method that is widely used in current emotion recognition research (Kim et al., 2004; Chuang and Shih, 2006). Suppose $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ is the training set, where $y_i$ is $1$ or $-1$, denoting which of the two classes $x_i$ belongs to. The SVM aims to minimize the cost function $\frac{1}{2} w^T w + C \sum_{i=1}^{n} \xi_i$ under the constraints $y_i (w^T x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for $i = 1, 2, \ldots, n$. By using the Lagrange multiplier method, the original problem can be transformed into optimizing the $\alpha_i$'s in Eq. (2.6).

After obtaining the $\alpha_i$, we can apply the following decision function to predict the class of a new predictive attribute $x_{new}$: $f(x_{new}) = \mathrm{sign}\left(\sum_{i=1}^{n} y_i \alpha_i K(x_{new}, x_i) + b\right)$, where $K(\cdot, \cdot)$ is the kernel function. In this study, we used the Gaussian kernel and the sequential minimal optimization (SMO) algorithm (Keerthi et al., 2001).
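The decision function can be illustrated with a short sketch. Several things here are assumptions made for the example: the Gaussian kernel is written as $\exp(-\gamma \lVert u - v \rVert^2)$ with a hypothetical parameter `gamma`, the multipliers `alphas` and the offset `b` are hand-picked values and not the output of SMO, and a score of exactly zero is mapped to the positive class.

```python
import math


def gaussian_kernel(u, v, gamma):
    """K(u, v) = exp(-gamma * ||u - v||^2); this parametrization is assumed."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svm_decision(x_new, vectors, labels, alphas, b, gamma):
    """Evaluate sign(sum_i y_i * alpha_i * K(x_new, x_i) + b)."""
    score = b + sum(
        y * alpha * gaussian_kernel(x_new, x, gamma)
        for x, y, alpha in zip(vectors, labels, alphas)
    )
    return 1 if score >= 0 else -1  # tie at zero goes to +1 (assumption)


# Hand-picked illustrative values, not a trained model.
vectors = [[0.0, 0.0], [2.0, 2.0]]
labels = [-1, 1]
alphas = [1.0, 1.0]

pred_near_positive = svm_decision([1.8, 1.9], vectors, labels, alphas, b=0.0, gamma=0.5)
pred_near_negative = svm_decision([0.1, 0.2], vectors, labels, alphas, b=0.0, gamma=0.5)
```

Only training points with nonzero $\alpha_i$ contribute to the sum, which is why a trained model needs to keep just its support vectors.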

In addition, because our problem has multiple classes (three emotional states), we used pairwise classification with the one-against-one approach in the SVM classification method.
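The one-against-one approach can be sketched as a vote over all pairs of classes. The class labels "A", "B", and "C", the threshold rules standing in for trained binary SVMs, and the tie-breaking rule (first class in the list) are placeholders chosen for this example only.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(x, classes, pairwise):
    """Let every pairwise classifier vote and return the most voted class.

    pairwise maps a pair (a, b) to a function that returns a or b for x.
    Ties go to the class listed first in `classes` (an assumption).
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](x)] += 1
    best = max(votes.values())
    return next(c for c in classes if votes[c] == best)


classes = ["A", "B", "C"]

# Placeholder binary classifiers on a one-dimensional input.
pairwise = {
    ("A", "B"): lambda x: "A" if x < 1 else "B",
    ("A", "C"): lambda x: "A" if x < 2 else "C",
    ("B", "C"): lambda x: "B" if x < 3 else "C",
}

ovo_low = one_vs_one_predict(0.5, classes, pairwise)
ovo_mid = one_vs_one_predict(2.5, classes, pairwise)
ovo_high = one_vs_one_predict(5.0, classes, pairwise)
```

With three classes this requires three binary classifiers, one for each pair.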

2.4.4 Decision Tree of C4.5

The decision tree is also a common classification method (Hunt et al., 1966). C4.5 grows decision trees, a hierarchical data structure, using the divide-and-conquer strategy (Quinlan, 1993). In a decision tree, each decision node uses a test function to partition the original data $D$ into subsets $D_1, D_2, \ldots, D_n$. Suppose the set $D$ consists of $C$ classes and $p(D, j)$ denotes the proportion of cases in $D$ that belong to the $j$th class. We can define the information gain of a test $T$ with $m$ outcomes as Eq. (2.7): [...] there is one case left in each subset $D_i$. The split information is defined as Eq. (2.8):

$$\mathrm{Split}(D, T) = -\sum_{i=1}^{m} \frac{|D_i|}{|D|} \log_2\left(\frac{|D_i|}{|D|}\right). \tag{2.8}$$

For every possible test, the ratio of its information gain over its split information is assessed and the test with maximum gain ratio is selected.
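The gain ratio criterion can be sketched as below. The information gain is assumed here to be the usual entropy reduction, $\mathrm{Info}(D) - \sum_i \frac{|D_i|}{|D|} \mathrm{Info}(D_i)$ with $\mathrm{Info}(D) = -\sum_j p(D, j) \log_2 p(D, j)$, and the split information is assumed to be the entropy of the subset sizes. The function names are hypothetical, and returning 0 when the split information is zero is a convention chosen for the sketch.

```python
import math
from collections import Counter


def entropy(labels):
    """Info(D) = -sum_j p(D, j) * log2 p(D, j) over the classes present."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(labels, subsets):
    """Information gain of a partition divided by its split information."""
    n = len(labels)
    gain = entropy(labels) - sum(len(s) / n * entropy(s) for s in subsets)
    split = -sum(len(s) / n * math.log2(len(s) / n) for s in subsets if s)
    return gain / split if split > 0 else 0.0  # zero split: assumed convention


labels = ["a", "a", "b", "b"]

# A test that separates the two classes perfectly.
ratio_perfect = gain_ratio(labels, [["a", "a"], ["b", "b"]])

# A test whose subsets have the same class mix as the original data.
ratio_useless = gain_ratio(labels, [["a", "b"], ["a", "b"]])
```

Dividing by the split information penalizes tests that produce many small subsets, which would otherwise be favored by the information gain alone.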

2.4.5 Logistic Model

Logistic regression is a classical method for modeling categorical data for classification (Le Cessie and Van Houwelingen, 1992). Suppose there are $n$ samples with $c$ classes and $I$ attributes. The parameter matrix $B$ is an $I \times (c - 1)$ matrix. The probability that the $i$th sample, given the value of $x_i$, belongs to the $j$th class, for any class other than the last ($c$th) class, is shown in Eq. (2.9).

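$$P_j(x_i) = \frac{\exp(x_i B_j)}{\sum_{k=1}^{c-1} \exp(x_i B_k) + 1}, \quad \text{where } j = 1, 2, \ldots, c - 1. \tag{2.9}$$

The probability that the $i$th sample, given the value of $x_i$, belongs to the last ($c$th) class is shown in Eq. (2.10).

$$P_c(x_i) = 1 - \sum_{j=1}^{c-1} P_j(x_i) = \frac{1}{\sum_{k=1}^{c-1} \exp(x_i B_k) + 1}. \tag{2.10}$$

The log-likelihood $l$ of the data $(K, X)$ under this model is shown in Eq. (2.11).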

l(β) =

The indicator variable $K_{ij} = 1$ if the $i$th sample belongs to the $j$th class, where $j \neq c$; otherwise $K_{ij} = 0$, so $K_{ij} = 0$ for every $j$ if the $i$th sample belongs to the last ($c$th) class. The parameter matrix $B$ can be estimated by maximizing the log-likelihood function $l(\beta)$, which gives the maximum likelihood estimates.
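The class probabilities of the logistic model can be computed as in the sketch below. It follows Eq. (2.9) for the first $c - 1$ classes and takes the probability of the last class as one minus their sum. The function name and the example matrix are hypothetical, no intercept term is included because the text does not mention one, and the sketch does not guard against overflow for very large scores.

```python
import math


def class_probabilities(x, B):
    """Return [P_1, ..., P_{c-1}, P_c] for one sample.

    x: list of I attribute values.
    B: I x (c - 1) parameter matrix given as a list of I rows.
    """
    n_columns = len(B[0])
    # exp(x . B_j) for each of the c - 1 non-reference classes.
    scores = [
        math.exp(sum(xi * B[row][j] for row, xi in enumerate(x)))
        for j in range(n_columns)
    ]
    denominator = sum(scores) + 1.0
    # The last (reference) class takes the remaining probability mass.
    return [s / denominator for s in scores] + [1.0 / denominator]


# Hypothetical parameters for I = 2 attributes and c = 3 classes.
B_example = [[0.0, 0.5], [0.0, -0.25]]
probabilities = class_probabilities([1.0, 2.0], B_example)
```

In this toy case both scores equal $\exp(0) = 1$, so the three classes receive equal probability.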

2.4.6 K-Nearest Neighbor (KNN)

The k-nearest neighbor (KNN) algorithm is one of the classical classification methods with wide applications (Aha et al., 1991). KNN compares the similarity between a testing datum and every training datum. It then uses the categories of the top $k$ most similar training data to decide the category of the testing datum by a weighted vote. For any testing datum $H$ and training data $\{G_1, G_2, \ldots, G_n\}$, we classify the category of $H$ as in Eq. (2.12).

$$C(H) = \arg\max_{m} \sum_{G_i \in S} \mathrm{Sim}(H, G_i)\, I(G_i, C_m). \tag{2.12}$$

Here $\mathrm{Sim}(H, G_i)$ is the similarity measure between $H$ and $G_i$. The set $S = \{\tilde{G}_1, \tilde{G}_2, \ldots, \tilde{G}_k\}$ contains the $k$ training data closest to the testing point $H$, and $I(G_i, C_m) \in \{0, 1\}$ indicates whether $G_i$ belongs to $C_m$. If there is a tie in the classification, we assign the testing datum to the tied category with the smallest index. In this study, we used the Euclidean distance as the similarity measure and set the number of nearest neighbors to $k = 3$.
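The voting rule in Eq. (2.12) can be sketched as follows, with neighbors selected by Euclidean distance and ties resolved in favor of the smallest class index. The function name is hypothetical, and the default weight of 1 for every neighbor (an unweighted vote) is an assumption; a different `sim` function can be passed to weight the votes by distance.

```python
import math


def knn_classify(h, training, k=3, sim=lambda distance: 1.0):
    """Classify h by a (weighted) vote among its k nearest training points.

    training: list of (vector, class_index) pairs.
    sim: maps a Euclidean distance to a vote weight (default: unit weight).
    """
    # The k training points closest to h in Euclidean distance form S.
    nearest = sorted(training, key=lambda g: math.dist(h, g[0]))[:k]

    scores = {}
    for vector, cls in nearest:
        scores[cls] = scores.get(cls, 0.0) + sim(math.dist(h, vector))

    best = max(scores.values())
    # Among tied classes, return the one with the smallest index.
    return min(c for c, s in scores.items() if s == best)


training = [
    ([0.0, 0.0], 0), ([0.0, 1.0], 0),
    ([5.0, 5.0], 1), ([5.0, 6.0], 1), ([6.0, 5.0], 1),
]
knn_near_origin = knn_classify([0.2, 0.3], training)
knn_far = knn_classify([5.0, 5.2], training)

# Two equidistant neighbors from different classes produce a tie.
knn_tie = knn_classify([1.0, 0.0], [([0.0, 0.0], 1), ([2.0, 0.0], 0)], k=2)
```

Because KNN stores the training data and defers all computation to prediction time, each classification costs one distance evaluation per training point in this simple form.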

Chapter 3

Data Collection and Analysis on
