Chapter 3. Methodology
3.1 Original Supervised Learning Model
Figure 3-2 Flowchart for generating the base sentiment classifier using supervised machine learning
Figure 3-2 shows the most basic method for sentiment classification with supervised machine learning. The base sentiment classifier it produces is used in both the feature-engineering-based and Markov-transition-based approaches. As the figure shows, we start from the training data and first perform feature engineering, proposing features that are effective for sentiment classification. The features finally found to be effective are called base features in the following sections. We then choose an appropriate supervised machine learning method to train the base sentiment classifier.
The features used in the classification model are listed below:
N-gram based feature
N-grams are the most commonly used features in sentiment detection and usually perform well in related work ([6][8][9][10][11][22]). We therefore use the n-grams of the sentences in the training data as binary features. Here we use unigrams and bigrams, and we filter out words that appear in the data less often than a defined frequency threshold.
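The n-gram feature can be sketched as follows. This is a minimal illustration, not the exact implementation used in this work: the function names, the threshold `min_count`, and the choice to drop rare words before forming bigrams are all assumptions, and sentences are assumed to be already segmented into tokens.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_vocabulary(tokenized_sentences, min_count=2):
    """Collect the unigrams and bigrams of the training sentences.

    Assumption: words seen fewer than `min_count` times are removed
    from each sentence before the n-grams are formed.
    """
    word_counts = Counter(w for sent in tokenized_sentences for w in sent)
    vocab = set()
    for sent in tokenized_sentences:
        kept = [w for w in sent if word_counts[w] >= min_count]
        vocab.update(ngrams(kept, 1))
        vocab.update(ngrams(kept, 2))
    return sorted(vocab), word_counts

def binary_features(tokens, vocab, word_counts, min_count=2):
    """Return a 0/1 vector: 1 if the n-gram occurs in the sentence."""
    kept = [w for w in tokens if word_counts[w] >= min_count]
    present = set(ngrams(kept, 1)) | set(ngrams(kept, 2))
    return [1 if gram in present else 0 for gram in vocab]

if __name__ == "__main__":
    train = [["i", "am", "happy"], ["i", "am", "sad"]]
    vocab, counts = build_vocabulary(train, min_count=2)
    print(vocab)
    print(binary_features(["i", "am", "glad"], vocab, counts))
```

Each feature only records presence, not frequency, which matches the binary encoding described above.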
Emotion dictionary feature
Emotion words are intuitively relevant to the sentiment of a sentence, so we take emotion words as features. Several well-known emotion dictionaries exist, e.g. NTUSD [2] for Chinese and AFINN [3] for English. In addition to direct affective words such as joy and anger, some words tend to arouse human emotion without naming it; we call these indirect affective words, e.g. earthquake and tsunami. We therefore collect natural disaster words from Wikipedia to expand the emotion dictionary. Each word in the dictionary is used as a binary feature.
Emotion word extracted by variance of PMI
Using point-wise mutual information (PMI), we can extract words that are relevant to emotion. Here we use the variance of point-wise mutual information introduced in ([4][17]) to retrieve the top N words of each emotion as binary features. The variance of PMI is defined as follows:
$$\mathrm{Co}(s, w) = C(s, w) \times \log_e \frac{P(s, w)}{P(s)\,P(w)}$$
where C(s,w) is the number of times sentiment s and word w appear together, P(s) is the probability of sentiment s appearing in the data, P(w) is the probability of word w, and P(s,w) is the joint probability of the sentiment and the word. The score is then normalized to the range 0~1 to obtain Co'(s,w):
$$\mathrm{Co}'(s, w) = \frac{\mathrm{Co}(s, w) - \mathrm{Co}_{\min}}{\mathrm{Co}_{\max} - \mathrm{Co}_{\min}}$$
Because we only care about the parts of speech that are relevant to emotion, we segment each sentence and use a POS tagger [16] to keep only adjectives, nouns, and verbs for the PMI-based extraction process.
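The extraction can be sketched as below. Several details are assumptions made only for illustration: probabilities are estimated as fractions of posts, each word is counted at most once per post, the minimum and maximum are taken over all co-occurring (sentiment, word) pairs, and the input tokens are assumed to be already segmented and restricted to adjectives, nouns, and verbs. The function names are hypothetical.

```python
import math
from collections import Counter

def pmi_scores(labeled_posts):
    """Compute Co(s, w) = C(s, w) * ln(P(s, w) / (P(s) * P(w)))
    for every (sentiment, word) pair that co-occurs at least once."""
    n = len(labeled_posts)
    s_count, w_count, sw_count = Counter(), Counter(), Counter()
    for sentiment, tokens in labeled_posts:
        s_count[sentiment] += 1
        for w in set(tokens):  # assumption: count a word once per post
            w_count[w] += 1
            sw_count[(sentiment, w)] += 1
    scores = {}
    for (s, w), c in sw_count.items():
        p_sw = c / n
        p_s = s_count[s] / n
        p_w = w_count[w] / n
        scores[(s, w)] = c * math.log(p_sw / (p_s * p_w))
    return scores

def normalize(scores):
    """Min-max normalize the scores to the range 0~1, giving Co'(s, w)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 0.0 for key in scores}
    return {key: (v - lo) / (hi - lo) for key, v in scores.items()}

def top_words(norm_scores, sentiment, n):
    """Return the n highest-scoring words for one sentiment."""
    ranked = sorted(
        ((v, w) for (s, w), v in norm_scores.items() if s == sentiment),
        key=lambda pair: (-pair[0], pair[1]),  # ties broken alphabetically
    )
    return [w for _, w in ranked[:n]]

if __name__ == "__main__":
    posts = [
        ("happy", ["great", "day"]),
        ("happy", ["great", "fun"]),
        ("sad", ["bad", "day"]),
        ("sad", ["bad", "rain"]),
    ]
    norm = normalize(pmi_scores(posts))
    print(top_words(norm, "happy", 2))
```

In this toy data the word "day" appears equally often with both sentiments, so its score is zero and it is ranked last for each of them.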
Generic feature
In addition to the features above, we use generic features such as the length of the sentence, normalized to 0~1 by dividing by 140, the maximum sentence length allowed in the Micro-blog system. Question marks (?) and exclamation marks (!) are also relevant to emotion, so we use the number of times each appears in the sentence as a feature. Besides, a post may be a shared post containing an HTTP URL; we replace the URL pattern with the meta tag URL and use its presence as a binary feature.
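A minimal sketch of these generic features is given below. The regular expression, the feature names, and the decision to measure length on the original post while counting punctuation after URL replacement are assumptions for illustration only.

```python
import re

MAX_LENGTH = 140  # maximum post length allowed in the Micro-blog system
URL_PATTERN = re.compile(r"https?://\S+")  # assumed URL pattern
URL_TAG = "URL"  # meta tag that replaces each URL

def replace_urls(post):
    """Replace every HTTP(S) URL in the post with the meta tag."""
    return URL_PATTERN.sub(URL_TAG, post)

def generic_features(post):
    """Return the generic features of one post as a dictionary."""
    text = replace_urls(post)
    return {
        # Length in characters, capped so the value stays in 0~1.
        "length": min(len(post), MAX_LENGTH) / MAX_LENGTH,
        # Counted after URL replacement so that '?' inside a query
        # string is not mistaken for a question mark.
        "question_marks": text.count("?"),
        "exclamation_marks": text.count("!"),
        "has_url": 1 if URL_PATTERN.search(post) else 0,
    }

if __name__ == "__main__":
    print(generic_features("Really?! See http://example.com/a?b=1 now!"))
```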
Using the features above, we can train a base sentiment classification model with any appropriate supervised machine learning method, e.g. Naive Bayes or SVM. The model then allows us to produce a probability distribution over sentiments for a given post p, denoted as $S_p$. We will try to find the most effective machine learning model and apply it as the base sentiment classifier in the feature-engineering-based and Markov-transition-based approaches.
Before introducing the three approaches proposed for the three types of information, i.e. temporal context, friendship, and response, we first explain the three factors in detail.
For the temporal context factor, we assume that the sentiment of a Micro-blog post is correlated with the sentiments of the author's previous posts (i.e., the 'context' of the post). For the friendship factor, we assume that friends' emotions are correlated with each other. This is because friends affect each other, and they are more likely to be in the same circumstances and thus enjoy or suffer similarly. Our hypothesis is that the sentiment of a post and the sentiments of the author's friends' recent posts might be correlated. Finally, for the response factor, we believe the sentiment of a post is highly correlated with (but not necessarily similar to) that of the responses to the post. For example, an angry post usually triggers angry responses, but a sad post usually solicits supportive responses.
For the friendship factor, a further question is how to rank friends by how likely the sentiment of their recent posts is to be correlated with that of the user. Generally, friends in the same circumstances are more likely to experience the same events, so their posts in the same period may be correlated, and so may the sentiments those posts contain. For example, students may publish unhappy posts during final exams and happy posts once the exams are over. If a user and his or her friends are classmates, they are in the same circumstances and will have similar emotion patterns. Here we treat sentiment transition as an emotion pattern; in the example, there is a transition from unhappy to happy. Therefore, we assume that the more similar the emotion patterns of two users are, the more correlated the sentiments of their recent posts are. We regard a user's sentiment transition matrix $M_c$ as that user's emotion pattern, where the $(i, j)$th element of $M_c$ is the conditional probability of the transition from the sentiment of the previous post to that of the current post, and we propose to give priority to friends with similar emotion patterns.
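As an illustration of how such a matrix might be estimated, the sketch below counts consecutive sentiment pairs in one user's chronologically ordered posts and normalizes each row. The function name and the additive smoothing are assumptions, not part of the method described here; smoothing is used only so that every row is a valid distribution even when a sentiment never appears as a previous post.

```python
def transition_matrix(sentiment_sequence, sentiments, smoothing=1.0):
    """Estimate P(current sentiment = j | previous sentiment = i).

    sentiment_sequence: one user's post sentiments in time order.
    sentiments: fixed list of sentiment labels; it defines the
        row and column order of the matrix.
    smoothing: assumed additive constant (must be > 0) added to
        every count before normalization.
    """
    if smoothing <= 0:
        raise ValueError("smoothing must be positive")
    index = {s: i for i, s in enumerate(sentiments)}
    k = len(sentiments)
    counts = [[smoothing] * k for _ in range(k)]
    # Pair each post with the one that follows it.
    for prev, curr in zip(sentiment_sequence, sentiment_sequence[1:]):
        counts[index[prev]][index[curr]] += 1
    # Normalize each row so that it sums to one.
    return [[c / sum(row) for c in row] for row in counts]

if __name__ == "__main__":
    labels = ["happy", "unhappy"]
    history = ["unhappy", "unhappy", "happy", "happy"]
    for label, row in zip(labels, transition_matrix(history, labels)):
        print(label, row)
```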
To achieve our goal, we first learn every user's contextual sentiment transition matrix $M_c$ from the data. In $M_c$, each row represents a distribution that sums to one; therefore, we can compare two matrices $M_{c1}$ and $M_{c2}$ by averaging the symmetric KL-divergence of each row. That is,
$$\mathrm{Distance}(M_1, M_2) = \operatorname{Average}_{i=1}^{n}\, \mathrm{KL}\big(\mathrm{Row}(M_1, i), \mathrm{Row}(M_2, i)\big).$$
Two persons are considered to have similar emotion patterns if their contextual sentiment transition matrices are similar, that is, if the distance between them is small.
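The distance can be sketched as follows. Two points are assumptions: the symmetric KL-divergence is taken here as the sum KL(p‖q) + KL(q‖p), which is one common form, and all matrix entries are assumed to be strictly positive so that the logarithms are finite. The function names are hypothetical.

```python
import math

def kl(p, q):
    """KL-divergence KL(p || q) between two discrete distributions.

    Assumes q has no zero entry wherever p is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p, q):
    """Symmetric KL-divergence, assumed here to be KL(p||q) + KL(q||p)."""
    return kl(p, q) + kl(q, p)

def distance(m1, m2):
    """Average the symmetric KL-divergence over corresponding rows."""
    row_distances = [symmetric_kl(r1, r2) for r1, r2 in zip(m1, m2)]
    return sum(row_distances) / len(row_distances)

if __name__ == "__main__":
    user = [[0.5, 0.5], [0.9, 0.1]]
    friends = {
        "friend_a": [[0.6, 0.4], [0.9, 0.1]],
        "friend_b": [[0.1, 0.9], [0.2, 0.8]],
    }
    # Friends with the smallest distance come first.
    ranked = sorted(friends, key=lambda name: distance(user, friends[name]))
    print(ranked)
```

Sorting friends by this distance in ascending order gives one way to prioritize those whose emotion patterns are closest to the user's.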
Having explained the three factors, we describe the approaches that use them in the following section.