Semantic Frame-based Approach for Reader-Emotion Detection (基於語意框架之讀者情緒偵測研究) - NCCU Academic Hub (政大學術集成)


Department of Computer Science, National Chengchi University (國立政治大學資訊科學系)
Master's Thesis
Semantic Frame-based Approach for Reader-Emotion Detection (基於語意框架之讀者情緒偵測研究)
Student: 陳聖傑. Advisors: 許聞廉, 劉昭麟.
January 2016

Student: Cen-Chieh Chen (陳聖傑). Advisors: Wen-Lian Hsu (許聞廉) and Chao-Lin Liu (劉昭麟).
A thesis submitted to the Department of Computer Science, National Chengchi University, in partial fulfillment of the requirements for the degree of Master in Computer Science, January 2016.

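Acknowledgments

As I walk the last mile of my master's studies, I find myself thinking about how to express the gratitude I feel. For the completion of this thesis, I first sincerely thank my advisor, Professor 許聞廉. Looking back on my research life in the IASL laboratory at the Institute of Information Science, Academia Sinica, it was under Professor 許's kindness and guidance that I had the chance to glimpse the beauty of natural language, and his rigor in scholarship is a model for all of us. I thank Professor 劉昭麟 for his attentive guidance at NCCU; from taking part in historical research to helping organize international conferences, each occasion was a new learning experience. Moving between Academia Sinica and NCCU, two research environments similar in nature yet different in style, left a deep impression on me. On the details of the research, I thank my senior colleague 張詠淳, who guided my work. From our first attempt at TAAI, through presenting results at PACIS, to finally reaching ACL, the ups and downs along the way showed that the academic path demands steady patience and perseverance.

Professor 戴敏育 deserves special thanks. Years ago, on his recommendation, I joined the IASL laboratory, which opened a new chapter in my academic life. He always conveys abundant positive energy and gives younger researchers the drive to keep moving forward. Knowledge is not the same as wisdom; whether something is a stepping stone or a stumbling block is often only a matter of perspective, yet it reflects accumulated wisdom about life. His support and encouragement along the way not only broadened my academic life but also deepened my outlook on how to conduct myself.

I also thank my colleagues in the IASL laboratory at Academia Sinica: 翠玲 and 照玲, who prayed for me many times; 永瑜, both teacher and friend; and my senior 士弘, who gave me direction when my research hit a bottleneck. Thank you for your company and help. I also thank my seniors 建良, 孫瑋, and 柏誠 of the MIG laboratory at NCCU for sharing their experience, my peers 致凱, 書佑, and 植琨, who worked hard alongside me, and the junior members 國峰 and 張逸.

Finally, I thank my family and my father in heaven. My time as a master's student was short but deeply felt. I thank my beloved mother, who was the family's strong support from my father's illness until his passing, so that I could push ahead in my studies without worry, and my respected older brother, who has always been the most important guide in my life. Lastly, I thank my girlfriend 晏菁 for her company; she has made me a better person, and my life is better for having her in it.

陳聖傑, January 2016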

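Semantic Frame-based Approach for Reader-Emotion Detection (基於語意框架之讀者情緒偵測研究)

摘要 (Chinese abstract)

Previous research on emotion analysis has rarely focused on reader emotion and has mostly concentrated on writer emotion. Reader emotion refers to the feelings a reader experiences after reading an article. The same article may evoke several emotional reactions in readers, and may even evoke feelings quite different from the writer's, which makes reader-emotion analysis a more complex problem. The goal of this research is to identify the precise emotion of readers after they read an article. Document classification methods can be applied effectively to reader-emotion detection: besides identifying the correct reader emotion, they retain the relevant content of the documents associated with that emotion. However, current information retrieval systems still lack the ability to recognize documents with implicit emotion, particularly reader emotion. In addition, machine learning-based methods are hard for humans to understand, and it is difficult to determine why recognition fails, so one cannot tell what kind of article triggers a particular emotion in readers. We therefore propose a semantic frame-based approach (FBA) for reader-emotion detection. FBA simulates the way humans read articles and effectively constructs the basic knowledge of reader emotion to form a reader-emotion knowledge base. FBA extracts the basic knowledge of semantic concepts in a highly automated way. In addition to syntactic features, we consider the surrounding context and semantic associations, integrate similar knowledge into discriminative semantic frames, and match reader-emotion documents through sequence alignment. Experimental results show that the proposed method can be applied effectively to reader-emotion detection.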

Semantic Frame-Based Approach for Reader-Emotion Detection

Abstract

Previous studies on emotion classification mainly focus on the writer's emotional state. By contrast, this research emphasizes emotion detection from the readers' perspective. The classification of documents into reader-emotion categories can be applied in several ways; one application is to retain only the documents that cause desired emotions, enabling users to retrieve documents that contain relevant content and at the same time instill proper emotions. However, current IR systems lack the ability to discern emotion within texts, and reader-emotion detection has yet to achieve comparable performance. Moreover, previous machine learning-based approaches are generally not understandable to humans, so it is difficult to pinpoint the reasons for recognition failures and to understand what emotions articles trigger in their readers. We propose a flexible semantic frame-based approach (FBA) for reader-emotion detection that simulates this process in human perception. FBA is a highly automated process that incorporates various knowledge sources to learn, from raw text, semantic frames that characterize an emotion and are comprehensible to humans. The generated frames are used to predict reader-emotion through an alignment-based matching algorithm that allows a semantic frame to be partially matched through a statistical scoring scheme. Experimental results demonstrate that our approach can effectively detect reader-emotion by exploiting the syntactic structures and semantic associations in the context, and that it outperforms well-known statistical text classification methods and the state-of-the-art reader-emotion detection method.

TABLE OF CONTENTS

1. Introduction
1.1. Background
1.2. Text Classification
1.3. Problem Definition
1.4. Our Goal
1.5. Organization of this Dissertation
2. Related Work
3. System Architecture
3.1. Crucial Element Labeling (CEL)
3.1.1. Emotion Keyword (Keyword)
3.1.2. Named Entity Ontology (NEO)
3.1.3. Extended HowNet (E-HowNet)
3.2. Semantic Frame Generation (SFG)
3.3. Semantic Frame Matching (SFM)
4. Experiment
4.1. Experiment Setting
4.1.1. Datasets
4.1.2. Comparison Setting and Evaluation Metrics
4.2. Results and Discussion
5. Conclusion and Future Work
References

LIST OF FIGURES

Figure 1: Architecture of our semantic frame-based emotion detection system
Figure 2: Object-Oriented Ontology Architecture
Figure 3: Architecture of named entity ontology
Figure 4: Crucial Element Labeling process
Figure 5: The rank-frequency distribution of semantic classes

LIST OF TABLES

Table 1: Illustration of a dominating frame and some dominated frames in the emotion “Worried” generated by SFG
Table 2: The distribution of data corpus
Table 3.1: Accuracy of emotion detection system
Table 3.2: Accuracy of emotion detection systems with Frame number
Table 3.3: Accuracy of emotion detection systems with Frame coverage

1 Introduction

1.1 Background

With the rapid growth of the Internet, the amount of information on the web is increasing exponentially, and people find it difficult to locate documents of interest among the vast number available. Moreover, people can easily share their daily experiences and opinions anytime and anywhere. People often express their feelings through writing and reading articles, and with the advancement of technology, articles distributed over the Internet have become the most common way for members of a society to share information and emotions. While past research on emotions mainly focused on detecting the sentiments that the authors of documents were expressing, it is worth noting that readers' emotions in some respects differ from those of the authors and may be even more complex [36]. For instance, an infamous politician's blog entry describing his miserable day may not cause readers who oppose him to feel the same way.

Reader-emotion detection has several applications. One is to integrate reader-emotion detection into a web search engine, thereby enabling users to retrieve documents that contain relevant content and at the same time instill proper emotions. Another is to classify a website's contents into emotion classes so that users can browse the website's sections by emotion. Multiple emotions are often evoked in readers as a response to text stimuli such as news articles.

1.2 Text Classification

To detect the reader-emotion of documents effectively, we model reader-emotion detection as a classification problem. In natural language processing (NLP), an important task is to recognize various linguistic expressions. Text classification [38] [39] is a problem in library

science, information science and computer science. The task is to assign a document to one or more classes or categories. The intellectual classification of documents has mostly been the province of library science, while the algorithmic classification of documents is mainly studied in information science and computer science. The problems overlap, however, and there is therefore interdisciplinary research on document classification. Documents may be classified according to their subjects or according to other attributes. In the rest of this article only subject classification is considered. Emotion detection from text can be envisioned simply as the classification of a given text according to predefined emotional classes. Rule-based and machine learning-based approaches are commonly used to solve classification problems.

Many linguistic expressions can be represented as rules or templates. These templates are matched by computer to identify the corresponding linguistic objects in text. Rule induction algorithms aim at discovering a description for the target concept in the form of explicit rules formulated in terms of tests for certain values of the attributes. The resulting rule set should be able to correctly recognize instances of the target concept and discriminate them from objects that do not belong to it. The use of rule sets as knowledge representation also makes them very competitive in terms of interpretability, since the rules can be read easily by human experts. Besides, rule-based approaches can achieve high precision and support knowledge accumulation. But there is an unavoidable trade-off between the amount of time and effort spent creating and maintaining rules and the variety and quality of the output.

On the other hand, machine learning has become the more sophisticated approach in this domain. A machine learning method is provided with a set of labeled texts for each category, which is used as the training set, and automatically produces a classifier from them. It attempts to detect the overall reader-emotion of the whole text. To manifest topic-associated features, one

often needs to annotate the features in documents, which is rarely done in most machine learning models [37]. In addition, domain knowledge is only needed to assign a label to each existing text in the training set, which involves a much lower workload than writing rules. The original purpose of machine learning is to learn text patterns that are general enough to be applied to unseen texts. However, current machine learning models applied to natural language processing have encountered various bottlenecks due to knowledge shortage, and these patterns can only achieve mediocre scores. This is especially obvious when we compare the similarity of two sentences [27]. One can easily find two sentences that are literally different but convey similar semantic knowledge, which confuses most machine learning models.

1.3 Problem Definition

Research on reader-emotion detection has novel applications. For instance, an enterprise pursuing business intelligence that is capable of identifying the emotional effect of a document on its readers can provide services that retain only the documents that cause desired emotions, further enabling users to retrieve documents that contain relevant content and at the same time instill proper emotions. Such enterprises are therefore able to obtain opportunities and advantages in the competitive market. However, current information systems lack the ability to discern emotion within texts, and reader-emotion detection has yet to achieve comparable performance [15]. Nevertheless, how to make the best of rule-based and statistical approaches has always been a challenging task.

The main advantages of the rule- or template-based approach are high precision and the capability of knowledge accumulation. When faced with a new domain, rule-based systems can be adapted by adding new templates or rules that supply the missing knowledge. However, only a limited number of cases can be captured by a single rule, and adding more

rules could often create conflicts with old rules. On the other hand, current machine learning models applied to natural language processing have encountered various bottlenecks due to knowledge shortage. The original purpose of machine learning is to learn text patterns that are general enough to be applied to unseen texts. However, these patterns can only achieve mediocre scores. This is especially obvious when we compare the similarity of two sentences. One can easily find two sentences that are literally different but convey similar semantic knowledge, which confuses most machine learning models.

1.4 Our Goal

In light of this, we propose a flexible semantic frame-based approach (FBA) for reader-emotion detection that simulates this process in human perception. Our method aims to find out what emotions documents trigger in their readers, instead of the writer-emotion investigated in previous research. It differs from existing reader-emotion detection approaches in a number of aspects. FBA is a highly automated process that integrates various knowledge sources to generate discriminative linguistic patterns for document representation, which can be regarded as the essential knowledge for each emotion and are comprehensible to humans. Furthermore, it recognizes the reader-emotion of documents using an alignment-based algorithm that allows a semantic frame to be partially matched through a statistical scoring scheme.

1.5 Organization of this Dissertation

The remainder of this dissertation is organized as follows. Section 2 reviews related work and summarizes its contributions. Section 3 describes the system architecture of our semantic frame-based approach (FBA). Section 4 presents the statistics of the emotion dataset and compares the results of different experiments. Section 5 provides concluding remarks and considers future research avenues.

2 Related Work

Mining opinions and sentiments from natural language is challenging, because it requires a deep understanding of the explicit and implicit, regular and irregular, and syntactic and semantic rules of language. While research in lexicon building [26] [30] and classification [16] may ultimately be relevant, we focus on work in extracting sub-sentential structure relevant to subjectivity. The authors of [23] address the task of extracting propositional opinions and their holders. They define an opinion as “a sentence, or part of a sentence that would answer the question ‘How does X feel about Y?’ ” A propositional opinion is an opinion “localized in the propositional argument” of certain verbs, such as “believe” or “realize”. Their task then corresponds to identifying a direct subjective expression (DSE), its associated direct source, and the content of the private state. However, in each sentence, they seek only a single verb with a propositional argument.

Cardie et al. [8] discuss opinion-oriented information extraction. They aim to create summary representations of opinions to perform question answering. Moreover, they propose to use opinion-oriented “scenario templates” to act as summary representations of the opinions expressed in a document or a set of documents. Morinaga et al. [20] compare reviews of different products in one category to find the reputation of the target product. Their method does not summarize reviews, and it does not mine product features on which the reviewers have expressed their opinions. Although they do find some frequent phrases indicating reputations, these phrases may not be product features. There are many sources of emotional ambiguity. Emotional ambiguity may result from the blending of emotions. Most previous work focuses on the writer's perspective. Pang et al. [4] design an algorithm to determine

whether a document's author expresses a positive or negative sentiment. They discover that using Support Vector Machines (SVM) [5] [7] with word unigram features results in the best performance. Since then, more work has been done to find features better than unigrams. Hu et al. [31] show that word sentiment information can be exploited to achieve better classification accuracy. The authors of [25] show that high-order n-grams are beneficial if the corpus size is large enough.

Sentiment classification of texts is not restricted to the document level. The work in [12] conducts experiments to learn the subjectivity of adjectives, whereas [21] studies sentence sentiments. Several works considered emoticons from weblogs as categories for text classification. Yang et al. [32] proposed a sentence-level emotion recognition method using dialogs as their corpus, in which “Happy”, “Unhappy”, or “Neutral” was assigned to each sentence as its emotion category. The authors of [32] adopted Thayer's model to classify music emotions. Each music segment can be classified into four classes of moods. Pang et al. [4] classified movie reviews into positive and negative emotions. Wu et al. [11] used emoticons as tags to train SVM classifiers at the document or sentence level, respectively. In their studies, emoticons were taken as moods or emotion tags, and textual keywords were taken as features.

However, writers and readers do not always share the same emotions regarding the same text. With the recent increase in the popularity of the Internet, certain news websites, such as Yahoo! News (https://tw.yahoo.com/), incorporate Web 2.0 technologies that allow readers to express their emotions toward news articles. Only a few studies in the past deal with the reader aspect of

emotion analysis. For example, Lin et al. [17] classify documents into reader-emotion categories. Classifying emotion from the reader's perspective is a challenging task, and research on this topic is relatively sparse compared to work that considers the writer's point of view. Lin et al. [15] address classification into eight emotion classes from the readers' perspective. Yang et al. [10] automatically annotated reader emotions on a writer-emotion corpus with a reader-emotion classifier, and studied the interactions between writers and readers with the resulting writer-reader emotion corpus.

Our approach differs from existing reader-emotion detection approaches in a number of aspects. First, we propose a semantic frame-based approach that mimics the perceptual behaviour of humans in understanding text. Second, the generated semantic frames can be regarded as the domain knowledge required for detecting reader-emotion. In addition to syntactic features, we further consider the surrounding context and semantic associations to recognize reader-emotion efficiently. Finally, our research differs from other work on Chinese text, which relies on word segmentation for pre-processing, by utilizing an ontology for semantic class labeling.

The proposed semantic frame-based approach (FBA) consists of two mechanisms, frame generation and frame matching. To create linguistic frames that are more general, we adopt an algorithm to reduce the set of generated frames. Moreover, a matching mechanism allowing insertion, deletion, and substitution (IDS) of words and phrases is employed together with a statistical scoring mechanism. We expect the semantic frame-based method to be effective in emotion detection. Furthermore, the proposed semantic frame generation and matching mechanism exploits the syntactic structures, semantic associations, and content within the text.

3 System Architecture

Our research presents a flexible semantic frame-based approach (FBA) for detecting the reader-emotion of documents. We model reader-emotion detection as a classification problem. Our proposed method is different in that we take advantage of multiple knowledge sources and implement an automatic generation algorithm to generate frames that represent discriminative patterns in documents. FBA mainly consists of three components, crucial element labeling (CEL), semantic frame generation (SFG), and semantic frame matching (SFM), as shown in Figure 1.

The CEL first uses prior knowledge to mark the semantic classes of words in the corpus. To create linguistic patterns that are more general, we adopt the dominating set algorithm to reduce the number of patterns from 75,000 to 350. The dominating set has been used extensively in network routing research and has been adopted in NLP-related tasks such as text summarization. In addition to syntactic features, we further consider the surrounding context and semantic associations. Thus, the obtained semantic frames can be accumulated and considered as the essential knowledge for reader-emotion.

During detection, an article is first labelled by the CEL as well. Then, the SFM applies an alignment-based algorithm that utilizes our knowledge base to calculate the similarity between each emotion and the article to determine the main emotion of this article. Furthermore, a matching mechanism allowing insertion, deletion, and substitution (IDS) of words and phrases is employed together with a statistical scoring mechanism. Details of these components will be explained in the following sections.
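The IDS matching mentioned above can be pictured with a small global-alignment sketch in Python. It aligns a frame (a sequence of semantic classes) with a labelled clause and tolerates insertions, deletions, and substitutions. The function name `align_score` and the fixed scores for match, substitution, insertion, and deletion are hypothetical placeholders chosen for illustration; the system described here uses a statistical scoring scheme, which this sketch does not attempt to reproduce.

```python
def align_score(frame, clause, match=1.0, substitution=-1.0,
                insertion=-0.5, deletion=-1.0):
    """Global alignment score between a frame and a labelled clause.

    insertion: a clause element that the frame does not contain.
    deletion:  a frame element that is missing from the clause.
    All four scores are illustrative constants, not learned values.
    """
    rows, cols = len(frame) + 1, len(clause) + 1
    dp = [[0.0] * cols for _ in range(rows)]
    # Aligning a frame prefix with an empty clause deletes every element.
    for i in range(1, rows):
        dp[i][0] = dp[i - 1][0] + deletion
    # Aligning an empty frame with a clause prefix inserts every element.
    for j in range(1, cols):
        dp[0][j] = dp[0][j - 1] + insertion
    for i in range(1, rows):
        for j in range(1, cols):
            same = frame[i - 1] == clause[j - 1]
            diagonal = dp[i - 1][j - 1] + (match if same else substitution)
            dp[i][j] = max(diagonal,
                           dp[i - 1][j] + deletion,
                           dp[i][j - 1] + insertion)
    return dp[-1][-1]


frame = ["bacteria", "country", "spread"]
clause = ["bacteria", "country", "human", "spread"]
print(align_score(frame, clause))  # three matches and one insertion
```

Under this kind of scheme, an article could be assigned the emotion whose frames accumulate the highest alignment scores over its clauses; that aggregation step is left out of the sketch.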

Figure 1. Architecture of our semantic frame-based emotion detection system.

3.1 Crucial Element Labeling (CEL)

As shown in Figure 1, this mechanism first labels the words in an article with semantic classes in order to increase the frequency of these classes, which enables us to extract distinctive semantic features of a certain emotion. The resulting frames form the basis of the reader-emotion detection mechanism that follows. The CEL exploits three knowledge sources: emotion keywords, the Named Entity Ontology, and Extended HowNet. The role of each component is described below.

3.1.1 Emotion Keyword (Keyword)

Frequent keywords within an emotion are also considered useful information. A keyword is a word whose frequency is significantly higher (or lower) in a corpus of interest than in a reference corpus. Keywords let us see what words can be considered important

words in a given text. The log-likelihood ratio (LLR) [28] is a probability statistic that compares the frequency of occurrence of words in two corpora. Nevertheless, when the relative proportions of word occurrences are the same, words with higher absolute frequencies, which are most likely common words, receive higher LLR values. We use LLR as an effective feature selection method to learn a set of reader-emotion specific keywords.

Given a training dataset, LLR employs Equation (1) to calculate the likelihood of the assumption that the occurrence of a word w in reader-emotion E is not random. In (1), E denotes the set of documents of the reader-emotion in the training dataset; N(E) and N(¬E) are the numbers of on-emotion and off-emotion documents, respectively; and N(w∧E) and N(w∧¬E) are the numbers of on-emotion and off-emotion documents containing w. The probabilities p(w), p(w|E), and p(w|¬E) are estimated using maximum likelihood estimation. A word with a large LLR value is closely associated with the reader-emotion. We rank the words in the training dataset based on their LLR values and select the top 200 to compile an emotion keyword list.

\[
\mathrm{LLR}(w, E) = -2 \log \left[ \frac{p(w)^{N(w \wedge E)} \, (1 - p(w))^{N(E) - N(w \wedge E)} \; p(w)^{N(w \wedge \neg E)} \, (1 - p(w))^{N(\neg E) - N(w \wedge \neg E)}}{p(w \mid E)^{N(w \wedge E)} \, (1 - p(w \mid E))^{N(E) - N(w \wedge E)} \; p(w \mid \neg E)^{N(w \wedge \neg E)} \, (1 - p(w \mid \neg E))^{N(\neg E) - N(w \wedge \neg E)}} \right] \tag{1}
\]

3.1.2 Named Entity Ontology (NEO)

Ontologies play a central role in the development of the Semantic Web. Relating text to ontologies is a crucial part of knowledge base construction and maintenance. It provides, on the one hand, a customised ontology related to the data and domain with which we are concerned and, on the other hand, a richer ontology that can be used for a variety of Semantic Web-related tasks such as knowledge management, information retrieval, question answering, and semantic desktop applications.
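Returning to the keyword selection of Section 3.1.1, the following minimal Python sketch computes an LLR score from document counts and ranks words by it. The function names `llr` and `top_keywords`, the representation of each document as a set of words, and the sign convention (null model over emotion-conditioned model, scaled by -2 so that strongly associated words score high) are assumptions made for this sketch rather than details confirmed by the system described here.

```python
import math


def _xlogy(x, y):
    """x * log(y), with the convention that 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def _loglik(k, n, q):
    """Binomial log-likelihood of k hits in n documents with rate q."""
    return _xlogy(k, q) + _xlogy(n - k, 1.0 - q)


def llr(n_w_e, n_e, n_w_not_e, n_not_e):
    """LLR of a word from on-emotion and off-emotion document counts."""
    p = (n_w_e + n_w_not_e) / (n_e + n_not_e)   # p(w)
    p_e = n_w_e / n_e                           # p(w | E)
    p_not_e = n_w_not_e / n_not_e               # p(w | not E)
    null = _loglik(n_w_e, n_e, p) + _loglik(n_w_not_e, n_not_e, p)
    alt = _loglik(n_w_e, n_e, p_e) + _loglik(n_w_not_e, n_not_e, p_not_e)
    return -2.0 * (null - alt)


def top_keywords(docs, emotion, k=200):
    """docs: list of (set_of_words, emotion_label); returns the top-k words."""
    on = [words for words, label in docs if label == emotion]
    off = [words for words, label in docs if label != emotion]
    vocab = set().union(*(words for words, _ in docs))
    scores = {
        w: llr(sum(w in d for d in on), len(on),
               sum(w in d for d in off), len(off))
        for w in vocab
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


docs = [({"earthquake", "japan"}, "Worried"),
        ({"earthquake", "tsunami"}, "Worried"),
        ({"champion", "japan"}, "Happy"),
        ({"champion", "win"}, "Happy")]
print(top_keywords(docs, "Worried", k=2))
```

Because the statistic is two-sided, a word that is unusually rare in the on-emotion documents also scores high, which matches the "significantly higher (or lower)" definition of a keyword given above.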

Ontology is the philosophical study of existence. Object-oriented ontology puts things at the centre of this study. Its proponents contend that nothing has special status, but that everything exists. Object-oriented ontology rejects the claims that human experience rests at the centre of philosophy, and that things can be understood by how they appear to us. In place of science alone, object-oriented ontology uses speculation to characterize how objects exist and interact, as shown in Figure 2.

[Figure 2 content: a Domain contains Categories 1 to n; each Category contains a set of Concepts; each Concept has Attributes and Operations; the Relations between concepts are Association, Generalization (is-kind-of), and Aggregation (is-part-of).]

Figure 2. Object-Oriented Ontology Architecture

Recently, there has been increasing interest in utilizing ontologies as artefacts to represent human knowledge and as critical components in knowledge management, which can be observed in the Semantic Web, business-to-business applications, and several other application areas. It is commonly accepted that an ontology is an explicit specification of a conceptualisation.

Figure 3. Architecture of named entity ontology

In the areas of knowledge representation and reasoning and of conceptual modelling, it has long been recognised that conceptualising a domain is a prerequisite for understanding the domain and processing information about the domain. Figure 3 depicts the architecture of the NE ontology, which includes an emotion layer, a semantic layer, and an instance layer. There are eight emotions in the emotion layer, namely “Angry”, “Worried”, “Boring”, “Happy”, “Odd”, “Depressing”, “Informative” and “Warm”. Moreover, each semantic class in the semantic layer denotes a general semantic meaning of named entities that can be aggregated from many emotions, including "政治人物 (Politician)", "疾病 (Disease)" and others. The instance layer represents 6,323 named entities extracted from

documents across eight emotions by the Stanford NER (http://nlp.stanford.edu/). Each named entity is annotated with multiple semantic concepts by mining Wikipedia Infobox attribute values and wiki pages. Domain experts then refine the annotation by assigning the corresponding semantic classes for the purpose of generalization. Each instance in the instance layer can connect to multiple semantic classes according to the generalized relations. For example, the named entity “蘋果 (apple)” can be generalized to “水果 (fruit)” and “3C 產品 (3C product)”.

3.1.3 Extended HowNet (E-HowNet)

E-HowNet [13], a frame-based entity-relation knowledge representation model evolved from HowNet [34], is an on-line lexical knowledge base with a structured representation of knowledge and semantics. It connects approximately 90 thousand words of the CKIP Chinese Lexical Knowledge Base and HowNet, and includes extra frequent words that are specific to Traditional Chinese. It also contains a different formulation of each word to better fit its semantic representation, as well as distinct definitions of function and content words. A total of four basic semantic classes are applied, namely object, act, attribute, and value. Furthermore, compared to HowNet, E-HowNet possesses a layered definition scheme and complex relationship formulation, and uses simpler concepts to replace sememes as the basic element when defining a more complex concept or relationship. To illustrate, E-HowNet defines “血癌 (leukemia)” as follows:

Simple Definition: {癌症|cancer:position={血液|blood}}
Expanded Definition: {disease|疾病:position={BodyFluid|體液:telic={transport|送:patient={gas|氣:predication={respire|呼吸:patient={~}}},instrument={~}}},qualification={serious|嚴重}}

We can see that E-HowNet not only contains the semantic representation of a word, but also its relations to other words or entities. This enables us to combine or dissect the meaning of words by using their semantic components.

Figure 4. Crucial Element Labeling process

The CEL uses the clause as the minimum labeling unit. With the above resources, the CEL can transform words in the original documents into their corresponding semantic labels. To illustrate the process of CEL, consider the clause Cn = “歐巴馬又代表民主黨贏得美國總統選舉 (Obama again represents the Democratic Party and wins the U.S. presidential election)”, as shown in Figure 4. First, “巴拉克•歐巴馬 (Barack Obama)” is found in the keyword dictionary and tagged. Then, NEs like “民主黨 (Democratic Party)” and “總統選舉 (Presidential elections)” are recognized and tagged as “{政黨 (Party)}” and “{總統選舉 (Presidential elections)}”. Next, other terms like “代表 (represents)” and “贏得 (won)” are

labeled with their corresponding E-HowNet senses if they exist. Finally, we obtain a sequence like “[巴拉克•歐巴馬]:{代表}:{政黨}:{得到}:{國家}:{總統選舉} ([Barack Obama]:{represents}:{party}:{got}:{country}:{Presidential elections})”. This labeling process not only prevents errors caused by Chinese word segmentation, but also groups synonyms together using semantic labels. It enables us to generate distinctive and prominent semantic templates in the next stage.

Figure 5. The rank-frequency distribution of semantic classes.

3.2 Semantic Frame Generation (SFG)

In our experiment, we observed that many of the labeled semantic classes rarely occurred in the documents and that the rank-frequency distribution of semantic classes followed Zipf's law [6], as shown in Figure 5. Low-frequency semantic classes usually carry semantics that are irrelevant to the emotion. To generate emotion-relevant semantic frames, the high-frequency semantic classes are used to dominate the tail of the distribution.
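To make the labeling step of Section 3.1 concrete before moving on to frame generation, here is a minimal Python sketch that reproduces the labelled sequence of the Obama example with a character-level longest-match lookup. The three toy dictionaries, the function name `label_clause`, and the lookup strategy itself (longest match first, with keyword, NE ontology, and E-HowNet consulted in that order for a given span) are assumptions for illustration; the actual lookup procedure of the CEL is not specified in the text.

```python
# Toy stand-ins for the three knowledge sources (hypothetical entries).
KEYWORDS = {"歐巴馬": "[巴拉克•歐巴馬]"}
NE_ONTOLOGY = {"民主黨": "{政黨}", "美國": "{國家}", "總統選舉": "{總統選舉}"}
E_HOWNET = {"代表": "{代表}", "贏得": "{得到}"}


def label_clause(clause, lexicons):
    """Replace known spans of a clause with semantic labels.

    Scans the clause character by character, so no word segmenter is
    needed. At each position the longest span found in any lexicon wins;
    for the same span, earlier lexicons take priority. Characters that
    match nothing are skipped.
    """
    max_len = max(len(term) for lexicon in lexicons for term in lexicon)
    labels = []
    i = 0
    while i < len(clause):
        for length in range(min(max_len, len(clause) - i), 0, -1):
            span = clause[i:i + length]
            label = next((lex[span] for lex in lexicons if span in lex), None)
            if label is not None:
                labels.append(label)
                i += length
                break
        else:
            i += 1  # no lexicon covers this character
    return ":".join(labels)


clause = "歐巴馬又代表民主黨贏得美國總統選舉"
print(label_clause(clause, [KEYWORDS, NE_ONTOLOGY, E_HOWNET]))
```

Operating on characters rather than segmented words is what lets a sketch like this avoid segmentation errors, at the cost of depending entirely on dictionary coverage.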

We adopt the dominating set algorithm so that only the most frequent 20% of the frames are used to cover the rest. Since finding a minimum dominating set is an NP-hard problem, we implement an approximation using a greedy algorithm. First of all, we construct a directed graph G = {V, E}, in which the vertices V contain all crucial element sequences {CES1, …, CESm} in each reader-emotion, and the edges E represent the dominating relations between sequences. If CESx dominates CESy, there is an edge CESx → CESy. The definition of a dominating relation is as follows. 1) High-frequency crucial element sequences are selected as the dominators. 2) Longer sequences dominate shorter ones if their head and tail elements are identical. The intermediate elements can be skipped, as they can be identified as insertions and given scores based on their distribution in this category during the matching process. Using the dominating set helps us capture the most prominent and representative sequences within a category.

Afterwards, the dominating sequences further undergo a selection process that is similar to our keyword selection method mentioned above. Lastly, we rank the frames by their dominating rate and retain the top 100 from approximately 55,000 crucial element sequences. By doing so, we can reduce the number of frames while keeping the most prominent and distinctive ones, which also aids the execution of our matching algorithm. For instance, the dominator [微生物 bacteria]:[國家 country]:[人 human]:[傳播 spread] can dominate [微生物 bacteria]:[國家 country]:[傳播 spread] by skipping one semantic class. The reason is that, during our matching process, those skipped classes can be identified as insertions and given scores based on their statistical distribution in this emotion. Table 1 illustrates a dominating frame in the emotion “Worried” generated by SFG. All the semantic classes in the dominator are more frequent than other semantic classes.
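The dominating relation and the greedy approximation described above can be sketched in Python as follows. The names `dominates` and `greedy_dominating_set`, the tuple representation of sequences, the subsequence test, and the tie-breaking by frequency are assumptions of this sketch; it illustrates a standard greedy heuristic for dominating sets and is not claimed to be the exact selection procedure of SFG.

```python
def is_subsequence(short, long):
    """True if `short` can be obtained from `long` by skipping elements."""
    remaining = iter(long)
    return all(element in remaining for element in short)


def dominates(x, y):
    """x dominates y: x is longer, shares head and tail, and contains y."""
    return (len(x) > len(y)
            and x[0] == y[0] and x[-1] == y[-1]
            and is_subsequence(y, x))


def greedy_dominating_set(freq, max_frames=None):
    """Greedily pick sequences until every sequence is covered.

    freq maps each non-empty sequence (a tuple of semantic classes) to its
    frequency. At each step the sequence covering the most uncovered
    sequences is chosen; ties go to the more frequent sequence.
    """
    seqs = list(freq)
    covers = {x: {y for y in seqs if y == x or dominates(x, y)} for x in seqs}
    uncovered = set(seqs)
    chosen = []
    while uncovered and (max_frames is None or len(chosen) < max_frames):
        best = max(seqs, key=lambda x: (len(covers[x] & uncovered), freq[x]))
        chosen.append(best)
        uncovered -= covers[best]
    return chosen


freq = {
    ("bacteria", "country", "human", "spread"): 5,
    ("bacteria", "country", "spread"): 3,
    ("bacteria", "spread"): 2,
    ("player", "get", "championship"): 4,
}
print(greedy_dominating_set(freq))
```

In this toy input the four-element sequence covers both of its shorter variants, so two frames suffice to cover all four sequences. The `max_frames` argument is a hypothetical cap that stands in for retaining only a fixed number of top frames.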
With this dominator, if a sentence can be matched, we assume that the reader feels worried. Using dominating sets to find frequent patterns on semantic graphs helps us capture the most prominent and representative frames within an emotion.

Table 1. Illustration of a dominating frame and some dominated frames in the emotion “Worried” generated by SFG.

In our system, a reader-emotion is represented by a set of semantic frames that consist of crucial elements and keywords. The semantic frame generation (SFG) process aims to automatically generate N representative frames from the sequences of crucial elements in the documents. These representative (or dominating) frames can be used as background knowledge for each reader-emotion when recognizing documents and, more importantly, can be understood by humans. To illustrate, consider the emotion “Happy” and one of the automatically generated semantic frames, “{運動員 (player)}:{得到 (get)}:[冠軍 (championship)].” We can think of various semantically similar sentences that are covered by this semantic frame, e.g., “柯瑞帶領勇士贏得了 NBA 總冠軍賽 (Stephen Curry led the Warriors to win the NBA championship)” or “費德勒擊敗納達爾獲得溫普敦冠軍

(Roger Federer defeated Rafael Nadal to win the Wimbledon championship)”. This sort of interpretable knowledge cannot be easily obtained from ordinary machine learning models.

3.3 Semantic Frame Matching (SFM)

FBA uses an alignment algorithm to measure the similarity of frames, since alignment enables a single frame to match multiple semantically similar expressions with appropriate scores. The CEL first uses prior knowledge to mark the semantic classes of words in the corpus. Then the SFG collects frequently co-occurring elements and generates frames for each emotion. These frames are stored in the emotion-dependent knowledge base to provide domain-specific knowledge for our emotion detection. During the detection process, an article is likewise first labeled by the CEL. Subsequently, the SFM applies an alignment-based algorithm that uses our knowledge base to calculate the similarity between each emotion and the article, in order to determine the main emotion of the article. We believe that humans perceive an emotion by recognizing important events or semantic contents that rapidly evoke it. For instance, when an article contains strongly correlated words such as “Japan (country)”, “Earthquake (disaster)” and “Tsunami (disaster)” simultaneously, it is natural to conclude that the article has a much higher chance of eliciting emotions like depressed and worried rather than happy and warm. During matching, a document is first labeled with crucial elements. The alignment-based algorithm is then applied to determine to what degree a semantic frame fits the document. For each clause within a given document dj, we first label the crucial elements cs = {ce1, …, cen}, followed by the matching procedure that compares all sequences of crucial elements in dj to all the semantic frames SF = {sf1, …, sfj} in each emotion, and

calculates the sum of scores for each emotion during semantic frame alignment, using matches, insertions, and deletions as the scoring criteria. The emotion $e_i$ with the highest sum of scores, as defined in (2), is considered the winner.

$$\mathrm{Emotion} = \arg\max_{e_i \in E} \sum_{sf_n \in SF_{e_i},\; cs_m \in CS_{d_j}} \Delta(sf_n, cs_m), \qquad \Delta(sf_n, cs_m) = \sum_{k} \sum_{l} \Delta(sf_n \cdot fe_k,\; cs_m \cdot ce_l) \tag{2}$$

where $fe_k$ and $ce_l$ represent the kth frame element of $sf_n$ and the lth crucial element of $cs_m$, respectively. The matched and unmatched components of semantic frames are scored as follows. If $sf_n \cdot fe_k$ and $cs_m \cdot ce_l$ are identical, we add a matched score (MS), which is the LLR value of $ce_l$ if it matches a keyword. Otherwise, the score is determined by multiplying the frequency of the crucial element in emotion $e_i$ by a normalizing factor, as in (3). Conversely, if an element is not matched, the score of an insertion or deletion is calculated. An insertion score (IS), defined in (4), is the inverse entropy of the crucial element, which represents the uniqueness or generality of this element among emotions. A deletion score (DS), defined in (5), is computed from the log frequency of the crucial element in this emotion. The detailed algorithm is described in Algorithm 1.

$$MS(ce_l) = \begin{cases} LLR_{ce_l}, & \text{if it matches a keyword} \\ \dfrac{f_{ce_l}}{\sum_{i=1}^{m} f_{ce_i}}, & \text{otherwise} \end{cases} \tag{3}$$

$$IS(ce_l) = \frac{1}{-\sum_{i=1}^{m} P(ce_l^{e_i}) \cdot \log_2\big(P(ce_l^{e_i})\big)} \tag{4}$$

$$DS(ce_l) = \log \frac{f_{ce_l}}{\sum_{i=1}^{m} f_{ce_i}} \tag{5}$$

Algorithm 1: Semantic Frame Matching
Input: A semantic frame sf = {fe1, …, fem}, fe: frame elements; a sequence of crucial elements from a clause cs = {ce1, …, cen}
Output: Matching score σ between sf and cs
BEGIN
    pos ← 0; σ ← 0;
    FOR i = 1 to m DO
        isMatched ← false;
        pos ← current matched position in CE;
        IF found cej EQUAL TO fei after pos THEN
            σ ← σ + MatchedScore(cej);
            isMatched ← true;
        END IF
        IF isMatched != true THEN
            σ ← σ − DeletionScore(fei);
            σ ← σ − InsertionScore(cej);
        END IF
    END FOR
    Output σ
END
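Equations (2) to (5) and Algorithm 1 can be read together as the following Python sketch. It is one possible reading under several assumptions, not the thesis code:

- The dictionaries `freq` (element counts per emotion) and `llr` (keyword LLR values per emotion) and all function names are hypothetical.
- The normalizing sum in (3) and (5) is taken over all crucial elements of the emotion.
- $P(ce_l^{e_i})$ is read as the share of an element's occurrences that fall in emotion $e_i$.
- The logarithm in (5) is taken as natural.
- Zero-count and zero-entropy cases, which the formulas leave undefined, are mapped to 0.0.
- Clause elements skipped between two matches are treated as insertions, following the textual description rather than a literal transcription of the pseudocode.

```python
import math

def matched_score(ce, emotion, freq, llr):
    """Eq. (3): LLR value for keywords, otherwise relative frequency."""
    if ce in llr.get(emotion, {}):
        return llr[emotion][ce]
    total = sum(freq[emotion].values())
    return freq[emotion].get(ce, 0) / total if total else 0.0

def insertion_score(ce, freq):
    """Eq. (4): inverse entropy of the element's distribution over emotions."""
    counts = [freq[e].get(ce, 0) for e in freq]
    total = sum(counts)
    if total == 0:
        return 0.0  # assumption: unseen elements carry no penalty
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c)
    # Assumption: zero entropy (element seen in a single emotion) maps to 0.0
    # to avoid dividing by zero; Eq. (4) does not cover this case.
    return 1.0 / entropy if entropy > 0 else 0.0

def deletion_score(fe, emotion, freq):
    """Eq. (5): log relative frequency (natural log assumed, 0.0 if unseen)."""
    total = sum(freq[emotion].values())
    count = freq[emotion].get(fe, 0)
    return math.log(count / total) if count else 0.0

def match_frame(frame, clause, ms, ins, dele):
    """Left-to-right alignment of one frame against one clause."""
    score, pos = 0.0, 0
    for fe in frame:
        try:
            j = clause.index(fe, pos)
        except ValueError:
            score -= dele(fe)        # frame element missing from the clause
            continue
        for skipped in clause[pos:j]:
            score -= ins(skipped)    # clause elements skipped over
        score += ms(clause[j])
        pos = j + 1
    return score

def detect_emotion(clauses, frames_by_emotion, freq, llr):
    """Eq. (2): pick the emotion with the highest summed matching score."""
    totals = {}
    for emotion, frames in frames_by_emotion.items():
        ms = lambda ce, e=emotion: matched_score(ce, e, freq, llr)
        dele = lambda fe, e=emotion: deletion_score(fe, e, freq)
        ins = lambda ce: insertion_score(ce, freq)
        totals[emotion] = sum(match_frame(f, c, ms, ins, dele)
                              for f in frames for c in clauses)
    return max(totals, key=totals.get), totals

# Hypothetical knowledge base and one labeled clause.
toy_freq = {
    "Worried": {"disaster": 6, "country": 2, "human": 2},
    "Happy": {"player": 4, "championship": 3, "country": 1, "human": 2},
}
toy_llr = {"Worried": {"disaster": 10.0}, "Happy": {"championship": 8.0}}
toy_frames = {
    "Worried": [("country", "disaster")],
    "Happy": [("player", "championship")],
}
best, totals = detect_emotion([["country", "human", "disaster"]],
                              toy_frames, toy_freq, toy_llr)
```

Note that DS as printed in (5) is the logarithm of a ratio no larger than 1 and is therefore non-positive. The sketch applies the formulas literally and leaves the sign convention open.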

4 Experiment

4.1 Experiment Setting

4.1.1 Datasets

To the best of our knowledge, there is no publicly available corpus for reader-emotion detection. Therefore, we compiled a corpus for performance evaluation. The corpus contains news articles spanning a period from 2012 to 2014, collected from Yahoo Chinese News³. It is an independent common resource for performance evaluation in reader-emotion research (e.g. [15]), since it has a special feature that allows the reader of a news article to choose one of eight emotions to represent how he or she feels after reading it, i.e., “Angry”, “Worried”, “Boring”, “Happy”, “Odd”, “Depressing”, “Warm”, and “Informative”. To ensure the quality of the corpus, only articles with a clear statistical distinction between the emotion with the highest vote and the others, as determined by a t-test at the 95% confidence level, were retained. Finally, a total of 47,285 articles were retained from the original 68,026 articles, and they were divided into a training set of 11,681 articles and a testing set of 35,604 articles.

| | Angry | Worried | Boring | Happy | Odd | Depressing | Warm | Informative |
|---|---|---|---|---|---|---|---|---|
| #Training | 2,001 | 261 | 1,473 | 2,001 | 1,536 | 1,573 | 835 | 2,001 |
| #Test | 4,326 | 261 | 1,473 | 7,344 | 1,526 | 1,573 | 835 | 18,266 |
| #Total | 6,327 | 522 | 2,946 | 9,345 | 3,062 | 3,146 | 1,670 | 20,267 |

Table 2. The distribution of the data corpus.

³ https://tw.news.yahoo.com/

4.1.2 Comparison Setting and Evaluation Metrics

This section provides a comprehensive performance evaluation of the FBA against other methods. The first is an emotion keyword-based model trained with an SVM to demonstrate the effect of our keyword extraction approach (denoted as KW-SVM). Another is a probabilistic graphical model that uses LDA as the document representation to train an SVM that classifies documents as either emotion-relevant or irrelevant [35] (denoted as LDA-SVM). The last is a state-of-the-art reader-emotion recognition method that combines various feature sets, including bigrams, words, metadata, and emotion category words from [33] (denoted as CF-SVM). For comparison purposes, we also include the results of Naïve Bayes [36] as a baseline (denoted as NB). Details of the implementations of these methods are as follows. We employed CKIP⁴ for Chinese word segmentation. The dictionary required by Naïve Bayes and LDA-SVM is constructed by removing stop words according to a Chinese stop word list provided by [34], and retaining the tokens that make up 90% of the accumulated frequency. In other words, the dictionary covers up to 90% of the tokens in the corpus. As for unseen events, we use Laplace smoothing in Naïve Bayes, and an LDA toolkit⁵ is used to perform the detection of LDA-SVM. As for the CF-SVM, the words output by the segmentation tool were used. Information about the news reporter, news category, location of the news event, time (hour of publication), and news agency is used as the metadata features. The extracted emotion keywords are used as the emotion category words, since the original emotion category words are unreleased and the emotion categories of Yahoo! Blog⁶ are not provided.

⁴ http://ckipsvr.iis.sinca.edu.tw
⁵ http://nlp.stanford.edu/software/tmt/tmt-0.4
⁶ http://www.yahoo.com.tw
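The dictionary construction described above (stop word removal followed by a 90% cumulative-frequency cut-off) might look like the sketch below. The function name `build_dictionary`, its parameters, and the toy tokens are assumptions for illustration. The exact cut-off rule used in the experiments is not specified, so the loop here simply stops once the retained tokens account for at least the requested share of occurrences.

```python
from collections import Counter

def build_dictionary(token_lists, stop_words, coverage=0.9):
    """Keep the most frequent tokens until `coverage` of occurrences is reached."""
    counts = Counter(token for doc in token_lists for token in doc
                     if token not in stop_words)
    total = sum(counts.values())
    vocabulary, covered = [], 0
    # most_common() yields tokens by descending count; ties keep first-seen order.
    for token, n in counts.most_common():
        if covered / total >= coverage:
            break
        vocabulary.append(token)
        covered += n
    return vocabulary

# Hypothetical segmented documents and stop word list.
toy_docs = [
    ["a", "a", "a", "a", "a", "b", "b", "b", "c", "the"],
    ["a", "b", "d", "the"],
]
vocab = build_dictionary(toy_docs, stop_words={"the"})
```

With these toy counts (a: 6, b: 4, c: 1, d: 1), the first three tokens already account for 11 of 12 occurrences, so the rarest token is dropped.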

To evaluate the effectiveness of these systems, we adopted the accuracy measures used by [33]. Moreover, we used the macro-average and micro-average to compute the average performance. These measures are defined based on a contingency table of predictions for a target emotion $E_k$. The accuracy $A(E_k)$, macro-average $A^M$, and micro-average $A^\mu$ are defined as follows:

$$A(E_k) = \frac{TP(E_k) + TN(E_k)}{TP(E_k) + FP(E_k) + TN(E_k) + FN(E_k)} \tag{6}$$

$$A^{M} = \frac{1}{m} \sum_{k=1}^{m} A(E_k) \tag{7}$$

$$A^{\mu} = \frac{\sum_{k=1}^{m} \big(TP(E_k) + TN(E_k)\big)}{\sum_{k=1}^{m} \big(TP(E_k) + FP(E_k) + TN(E_k) + FN(E_k)\big)} \tag{8}$$

where $TP(E_k)$ is the number of test documents correctly classified to the emotion $E_k$, $FP(E_k)$ is the number of test documents incorrectly classified to the emotion, $FN(E_k)$ is the number of test documents wrongly rejected, and $TN(E_k)$ is the number of test documents correctly rejected.

4.2 Results and Discussion

The performance of the emotion detection systems is listed in Table 3.1. First of all, the Naïve Bayes classifier, which is a keyword statistics-based system, achieves only mediocre performance. Since it considers only surface word weightings, it is difficult to

represent between-word relations. The overall accuracy of the Naïve Bayes classifier is 36.84%, with some emotions like “Warm” having only around 20% correct. By contrast, the LDA-SVM includes both keyword and long-distance relations and greatly outperforms Naïve Bayes, with an overall accuracy of 76.1%. It even achieves the highest accuracy among all five methods, 92.83% and 85.40%, in the emotions “Worried” and “Odd”, respectively. As we can see, the KW-SVM brings substantial proficiency in detecting the emotions, with 77.70% overall accuracy. This indicates that using only the LLR scores of keywords can effectively recognize readers' emotion. The reason is that the LLR calculates the likelihood that the occurrence of a word in a certain emotion is not random. Words with a larger LLR value are considered closely associated with the emotion. The FBA further improves on the basic keyword-based method with rich context and semantic information, thus achieving the best overall accuracy of 84.65%.

| Emotion | NB | LDA-SVM | KW-SVM | CF-SVM | FBA |
|---|---|---|---|---|---|
| Angry | 47.00 | 74.21 | 79.21 | 83.71 | 87.83 |
| Worried | 69.56 | 92.83 | 81.96 | 87.50 | 75.80 |
| Boring | 75.67 | 76.21 | 84.34 | 87.52 | 90.52 |
| Happy | 37.90 | 67.59 | 80.97 | 86.27 | 88.94 |
| Odd | 73.90 | 85.40 | 77.05 | 84.25 | 83.34 |
| Depressing | 73.76 | 81.43 | 85.00 | 87.70 | 92.15 |
| Warm | 15.09 | 87.09 | 79.59 | 85.83 | 91.91 |
| Informative | 20.60 | 44.02 | 74.74 | 83.59 | 80.92 |
| AM | 51.69 | 76.10 | 80.36 | 85.80 | 86.43 |
| Aμ | 34.52 | 58.68 | 77.68 | 84.61 | 84.63 |

Table 3.1 Accuracy (%) of emotion detection systems.

It is worth mentioning that the CF-SVM achieves excellent performance, with about 80%

accuracy among all emotions. This is because the combined lexicon feature sets (i.e., character bigrams, word dictionary, and emotion keywords) of CF-SVM have a certain influence on the classification accuracy. In addition, the metadata of the articles are associated with reader-emotion. For instance, we found that many sports-related news articles evoke the “Happy” emotion. In particular, 45% of all “Happy” instances belong to the news category sports. It is also observed that an instance with the news category sports has a 31% chance of having the true class “Happy”. So, the high accuracy of the “Happy” emotion can be a result of people’s general enthusiasm for sports rather than a result of a particular event. On top of that, the FBA can generate distinct semantic frames that capture alternations of similar combinations, achieving the most satisfactory outcome. For instance, a semantic frame generated by our system, “{國家 country}:[發生 occur]:[地震 earthquake]:{劫難 disaster}”, belongs to the emotion “Depressing”. It is perceivable that this frame relays information about disastrous earthquakes that occurred in a certain country, and such news often makes readers depressed. This example demonstrates that the automatically generated semantic frames are comprehensible to humans and can be utilized to effectively detect reader’s emotion. Nevertheless, our system could not surpass the LDA-SVM in the emotion “Worried”. This may be attributed to the inadequate quality of the semantic frames generated for this emotion. We examined some of the frames within this emotion and found that they mostly contain very general semantic classes, such as “{機構 institution}:{組織 organization}:{政黨 party}:{實現 realize}:{程度 degree}:{念頭 thought}”, thereby reducing its accuracy. Apart from the “Worried” emotion, we were able to identify distinctive semantic frames for the other emotions. For instance, the frame “[婦女 women]:{救助 help}:[小孩 child]:{當作 treat}:{人 human}:[認為 consider]” was generated for the emotion “Warm”, and it is understandable that news

about a woman helping a child would evoke a warm feeling in the readers’ hearts. The ability to generate such emotion-specific frames is considered the main reason why FBA outperforms the other systems. Furthermore, to verify that the frames generated by SFG are discriminative enough to constitute a knowledge base of emotions, we also examine two kinds of features: frame number and frame coverage. The frame number refers to the top N frames generated by SFG, and the frame coverage refers to the total coverage percentage. Table 3.2 shows the results with the frame number feature. The accuracy is stable from Top 10 to Top 100 for each emotion. On the other hand, the frame coverage feature does not perform as well as the frame number. We examined some of the frames selected by frame coverage and found that they mostly contain very general semantic classes, as shown previously for the frames of the “Worried” emotion.

| Frames | angry | boring | depressing | happy | informative | odd | warm | worried | AM | Aμ |
|---|---|---|---|---|---|---|---|---|---|---|
| Top 10 | 85.28% | 90.10% | 82.73% | 81.68% | 75.24% | 75.99% | 74.14% | 76.71% | 80.23% | 78.75% |
| Top 20 | 87.00% | 91.11% | 88.21% | 84.57% | 77.83% | 74.56% | 75.37% | 77.72% | 82.05% | 81.14% |
| Top 30 | 87.76% | 89.70% | 91.14% | 81.75% | 75.60% | 75.61% | 78.45% | 79.88% | 82.49% | 79.71% |
| Top 40 | 87.82% | 89.22% | 90.99% | 83.84% | 77.13% | 77.81% | 78.55% | 78.86% | 83.03% | 81.00% |
| Top 50 | 87.79% | 91.47% | 92.78% | 88.96% | 81.07% | 83.40% | 80.41% | 76.50% | 85.30% | 84.51% |
| Top 60 | 87.79% | 89.81% | 91.64% | 85.90% | 78.55% | 81.19% | 80.55% | 77.39% | 84.10% | 82.38% |
| Top 70 | 87.79% | 91.47% | 92.78% | 88.96% | 81.07% | 83.40% | 80.41% | 76.50% | 85.30% | 84.51% |
| Top 80 | 87.84% | 91.28% | 92.27% | 89.36% | 81.23% | 83.63% | 84.34% | 75.43% | 85.67% | 84.75% |
| Top 90 | 83.83% | 90.29% | 92.00% | 88.82% | 80.81% | 83.22% | 82.51% | 75.79% | 84.66% | 83.82% |
| Top 100 | 87.83% | 90.52% | 92.15% | 88.94% | 80.92% | 83.34% | 91.91% | 75.80% | 86.43% | 84.63% |

Table 3.2 Accuracy of emotion detection systems with frame number.

| Coverage | angry | boring | depressing | happy | informative | odd | warm | worried | AM | Aμ |
|---|---|---|---|---|---|---|---|---|---|---|
| Coverage 10 | 87.82% | 89.22% | 90.99% | 83.84% | 77.13% | 77.81% | 78.55% | 78.86% | 83.03% | 81.00% |
| Coverage 20 | 87.81% | 89.41% | 90.71% | 77.54% | 71.96% | 74.14% | 76.98% | 79.76% | 81.04% | 76.86% |
| Coverage 30 | 83.83% | 90.29% | 92.00% | 88.82% | 80.81% | 83.22% | 82.51% | 75.79% | 84.66% | 83.82% |
| Coverage 40 | 87.84% | 67.81% | 76.28% | 75.03% | 69.94% | 73.59% | 74.17% | 77.29% | 75.24% | 73.67% |
| Coverage 50 | 87.84% | 91.12% | 92.66% | 89.33% | 81.21% | 83.62% | 84.56% | 75.48% | 85.73% | 84.75% |
| Coverage 60 | 87.83% | 57.70% | 70.22% | 72.15% | 67.84% | 72.28% | 73.76% | 76.92% | 72.34% | 71.24% |
| Coverage 70 | 87.83% | 58.26% | 70.59% | 72.23% | 67.88% | 72.44% | 73.67% | 76.84% | 72.47% | 71.32% |
| Coverage 80 | 87.84% | 54.00% | 95.35% | 86.80% | 49.53% | 72.51% | 86.77% | 92.79% | 78.20% | 66.26% |
| Coverage 90 | 87.84% | 54.38% | 95.53% | 87.09% | 49.67% | 72.57% | 85.93% | 92.29% | 78.16% | 66.39% |
| Coverage 100 | 87.84% | 47.90% | 63.79% | 67.59% | 63.93% | 69.19% | 73.08% | 76.30% | 68.70% | 67.45% |

Table 3.3 Accuracy of emotion detection systems with frame coverage.

To summarize, the proposed FBA integrates the syntactic, semantic, and context information in text to identify reader-emotions and achieves the best performance among the compared methods. It also demonstrates the capability of our approach to integrate statistical and knowledge-based models. Notably, the generated frames can be regarded as the fundamental knowledge for each emotion and are comprehensible to humans, whereas the models used by previous machine learning-based methods are generally not human-understandable.
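The accuracy, macro-average, and micro-average measures reported in the tables above can be computed from per-emotion contingency tables as in the following sketch. The class `Contingency`, the function names, and the toy counts are assumptions made for this example. The counts are invented and are not taken from the experiments.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Contingency:
    """Contingency table of predictions for one target emotion."""
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def correct(self):
        return self.tp + self.tn

    @property
    def total(self):
        return self.tp + self.fp + self.tn + self.fn

def accuracy(table):
    """Per-emotion accuracy: correct decisions over all decisions."""
    return table.correct / table.total

def macro_average(tables):
    """Unweighted mean of the per-emotion accuracies."""
    return sum(accuracy(t) for t in tables.values()) / len(tables)

def micro_average(tables):
    """Accuracy over the pooled decisions of all emotions."""
    correct = sum(t.correct for t in tables.values())
    total = sum(t.total for t in tables.values())
    return correct / total

# Hypothetical tables with deliberately different sizes.
toy_tables = {
    "Happy": Contingency(tp=40, fp=5, tn=45, fn=10),   # 85 of 100 correct
    "Worried": Contingency(tp=6, fp=2, tn=8, fn=4),    # 14 of 20 correct
}
macro = macro_average(toy_tables)
micro = micro_average(toy_tables)
```

The two averages differ only when the per-emotion tables have different totals. The macro-average weights every emotion equally, whereas the micro-average is dominated by the emotions with more test decisions, which is one way to read the gap between the AM and Aμ rows.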

5 Conclusion and Future Work

With the rapid growth of computer-mediated communication applications, research on emotion classification has recently been attracting more and more attention from enterprises interested in business intelligence. Reader-emotion recognition concerns the emotion expressed by a reader after reading the text, and it has potential applications that differ from those of writer-emotion analysis. For instance, by integrating readers’ emotion into information retrieval, users are able to retrieve documents that contain relevant contents and at the same time produce desired feelings. The contribution of this research is a flexible frame-based approach (FBA) for detecting reader’s emotion that simulates the process of human perception. By utilizing knowledge sources, keywords, and syntactic and semantic structures, our fully automatic frame generation process can obtain distinctive patterns that may trigger various emotions. FBA successfully combines the advantages of rule-based and machine learning approaches, which allows us to effectively recognize the reader-emotion of text. Our experimental results demonstrate that the FBA achieves higher performance than other well-known methods of reader-emotion detection. The results show that this framework can effectively detect the reader-emotion of documents, as well as assist the user in constructing background knowledge of each reader-emotion in order to better understand its essence. In the future, we will expand the ontology to improve the effect of crucial element labeling and semantic frame generation. Moreover, we will reduce the human effort and rapidly broaden the coverage of the knowledge ontology through automatic construction.

Furthermore, we will also modularize the semantic labeling mechanism for ease of use in other research.

References
[1] A. Bechara and A. R. Damasio, “The somatic marker hypothesis: A neural theory of economic decision,” Games and Economic Behavior, 52(2), pp. 336-372, 2005.
[2] B. Pang and L. Lee, “A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts,” In Proceedings of the ACL, Barcelona, Spain, Main Volume, pp. 271–278, 2004.
[3] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, 3, pp. 993-1022, 2003.
[4] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up? Sentiment Classification using Machine Learning Techniques,” Association for Computational Linguistics, 2002.
[5] C. Cortes and V. Vapnik, “Support-Vector Networks,” Machine Learning, 20, pp. 273-297, 1995.
[6] C. D. Manning and H. Schütze, “Foundations of statistical natural language processing,” volume 999. MIT Press, 1999.
[7] C. J. C. Burges, “A Tutorial on Support Vector Machines for Pattern Recognition,” Data Mining and Knowledge Discovery, 2(2), pp. 1-47, 1998.
[8] C. Cardie, J. Wiebe, T. Wilson, and D. Litman, “Low-Level and Summary Representations of Opinions for Multi-Perspective Question Answering,” AAAI Spring Symposium on New Directions in Question Answering, 2003.

[9] W. Chai and B. Vercoe, “Folk music classification using hidden Markov models,” In Proceedings of the International Conference on Artificial Intelligence, 2001.
[10] C. H. Yang, K. H. Y. Lin, and H. H. Chen, “Writer meets reader: Emotion analysis of social media from both the writer’s and reader’s perspectives,” In Proceedings of the IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, Vol. 01, pp. 287–290, 2009.
[11] C. H. Wu, Z. J. Chuang, and Y. C. Lin, “Emotion recognition from text using semantic labels and separable mixture models,” ACM Transactions on Asian Language Information Processing (TALIP), vol. 5, issue 2, pp. 165-183, 2006.
[12] J. M. Wiebe, “Learning Subjective Adjectives from Corpora,” In Proceedings of 17th Conference of the American Association for Artificial Intelligence, pp. 735-740. AAAI, 2000.
[13] K. Chen, S. Huang, Y. Shih, and Y. Chen, “Multi-level Definitions and Complex Relations in Extended-HowNet,” Workshop on Chinese Lexical Semantics, 2004.
[14] H. Kanayama and T. Nasukawa, “Fully Automatic Lexicon Expansion for Domain-Oriented Sentiment Analysis,” In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2006.
[15] K. H. Y. Lin, C. H. Yang, and H. H. Chen, “What emotions do news articles trigger in their readers?,” In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 733–734. ACM, 2006.
[16] K. Dave, S. Lawrence, and D. M. Pennock, “Mining the peanut gallery: Opinion extraction and semantic classification of product reviews,” WWW2003, 2003.

[17] K. H. Lin, C. H. Yang, and H. H. Chen, “Emotion Classification of Online News Articles from the Reader’s Perspective,” In Proceedings of International Conference on Web Intelligence, pp. 220-226, 2008.
[18] A. McCallum and K. Nigam, “A comparison of event models for Naïve Bayes text classification,” In Proceedings of AAAI/ICML-98 Workshop on Learning for Text Categorization, pp. 41-48, 1998.
[19] M. R. Garey and D. S. Johnson, “Computers and Intractability: A Guide to the Theory of NP-Completeness,” Freeman, San Francisco, 1979.
[20] S. Morinaga, K. Yamanishi, K. Tateishi, and T. Fukushima, “Mining Product Reputations on the Web,” KDD’02, 2002.
[21] S. M. Kim and E. Hovy, “Determining the Sentiment of Opinions,” In Proceedings of 20th International Conference on Computational Linguistics. ACL, Geneva, CH, 2004.
[22] S. B. Needleman and C. D. Wunsch, “A general method applicable to the search for similarities in the amino acid sequence of two proteins,” Journal of Molecular Biology, 48(3):443–453, 1970.
[23] S. Bethard, H. Yu, A. Thornton, V. Hatzivassiloglou, and D. Jurafsky, “Automatic extraction of opinion propositions and their holders,” In Working Notes of the AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications, 2004.
[24] S. Guha and S. Khuller, “Approximation algorithms for connected dominating sets,” Algorithmica, 20(4):374–387, 1998.

[25] T. Mullen and N. Collier, “Sentiment Analysis Using Support Vector Machines with Diverse Information Sources,” In Proceedings of Conference on Empirical Methods in Natural Language Processing. ACL, Barcelona, ES, 2004.
[26] V. Hatzivassiloglou and K. R. McKeown, “Predicting the semantic orientation of adjectives,” In ACL97, 1997.
[27] W. L. Hsu, Y. S. Chen, and Y. K. Wang, “A context sensitive model for concept understanding,” In Proceedings of 3rd International Conference on Information Theoretic Approaches to Logic, Language, and Computation, 1998.
[28] S. S. Wilks, “The Likelihood Test of Independence in Contingency Tables,” Ann. Math. Statist. 6, no. 4, pp. 190-196. doi:10.1214/aoms/1177732564, 1935.
[29] Y. Hu, J. Duan, X. Chen, B. Pei, and R. Lu, “A New Method for Sentiment Classification in Text Retrieval,” In Proceedings of 2nd International Joint Conference on Natural Language Processing, pp. 1-9. Jeju Island, KR, 2005.
[30] C. H. Yang, K. H. Y. Lin, and H. H. Chen, “Building emotion lexicon from Weblog corpora,” In Proceedings of 45th Annual Meeting of Association for Computational Linguistics, poster, 2007.
[31] Y. Hu, J. Duan, X. Chen, B. Pei, and R. Lu, “A New Method for Sentiment Classification in Text Retrieval,” In Proceedings of 2nd International Joint Conference on Natural Language Processing, pp. 1-9. Jeju Island, KR, 2005.
[32] Y. H. Yang, C. C. Liu, and H. H. Chen, “Music emotion classification: A fuzzy approach,” In Proceedings of the 14th Annual ACM International Conference on Multimedia, MULTIMEDIA, pp. 81–84, 2006.

[33] Z. Kovecses, “Language and emotion concepts,” In Metaphor and Emotion: Language, Culture, and Body in Human Feeling, Cambridge: Cambridge University Press, 2003.
[34] Z. D. Dong, Q. Dong, and C. L. Hao, “Hownet and its computation of meaning,” In Proceedings of the 23rd International Conference on Computational Linguistics: Demonstrations, pp. 53–56. Association for Computational Linguistics, 2010.
[35] F. Zou, F. L. Wang, X. Deng, S. Han, and L. S. Wang, “Automatic construction of Chinese stop word list,” In Proceedings of the 5th WSEAS International Conference on Applied Computer Science, pp. 1010-1015, 2006.
[36] Y. J. Tang and H. H. Chen, “Mining sentiment words from microblogs for predicting writer-reader emotion transition,” In LREC, pp. 1226–1229, 2012.
[37] S. Scott and S. Matwin, “Feature engineering for text classification,” In Proceedings of the 16th International Conference on Machine Learning, pp. 379-388, 1999.
[38] B. Mirkin, “Mathematical Classification and Clustering,” Kluwer, 1996.
[39] Y. M. Yang, “An evaluation of statistical approaches to text categorization,” Journal of Information Retrieval, 1999.
