
1.1 Background

Micro-blogging services such as Twitter3, Plurk4, and Jaiku5 are a form of social media that has emerged and become popular in recent years. They allow users to share immediate but short messages with friends. People use brief, colloquial expressions to share what happens in their lives or to discuss anything of interest, e.g. a fashionable product, a political event, or the day's news.

Generally, a large amount of information is posted on micro-blogs every day. Some posts reveal personal status, and some serve a social purpose. We can learn the latest news about friends, as well as public news that is being hotly discussed. The users of a micro-blog platform range from the general public to celebrities such as politicians, famous singers, and scientists, so we can observe different opinions coming from different types of people. Moreover, because users come from all over the world, we can learn how people in different countries view the same issue.

Besides, we can observe that posts contain explicit or implicit emotions. For instance, “back from Amsterdam, was a really nice trip” expresses joy explicitly, whereas in “done with today's deadline” we can sense relief and implicit joy even though the post contains no obvious emotion words. Therefore, if we further analyze the sentiment involved, we may learn the public sentiment toward a product or the recent feelings of friends. Consequently, micro-blogs are a good source for opinion mining and sentiment analysis.

3 http://twitter.com/

4 http://www.plurk.com/

5 http://www.jaiku.com/

Generally speaking, sentiment detection is one task within sentiment analysis. Its goal is to detect the subjective emotion expressed in a given text. The text may be at the document, sentence, or phrase level, and may belong to different domains, e.g. blogs, reviews, or micro-blogs, each of which may have a different writing style. Currently, there are two problems in sentiment detection. The first, subjectivity classification, distinguishes whether a text is a subjective opinion or an objective fact, so that subjective opinions can be retrieved for further sentiment detection. The second, sentiment classification, assigns a given text to one of a predefined set of sentiment categories, for example positive, neutral, and negative. In this paper, we mainly explore whether temporal context, social, and response information can aid the sentiment classification of micro-blog posts. Currently, the common research problem in micro-blog sentiment detection is to detect positive and negative emotion in a given post; it is usually solved with machine learning methods, which can achieve good performance.

1.2 Motivation and Purpose

Recently, many sentiment analysis websites for micro-blogs have appeared, for example TwitterSentiment6 [6] and TweetFeel7. Given a query term, such a service analyzes the sentiment about it on the micro-blog platform. The sentiment detection component is important for such services. However, because micro-blog posts are limited in length, only 140 characters are available for linguistic and textual analysis. In addition, people can express emotion so inconspicuously that even humans cannot easily judge and categorize the emotion of a post. Most previous works ([4][5][6][8][9][10][11][22][23]) focus on the 140-character post itself and try to find useful linguistic features or effective classification methods to solve this problem. Therefore, in addition to building on their results, we look for related information, drawn from the properties of the micro-blog platform, that may be correlated with the post, and propose approaches that use this information in sentiment detection.

6 http://twittersentiment.appspot.com/

7 http://www.tweetfeel.com/

We begin by examining the properties of micro-blogs. Micro-blogging services possess some signature properties that differentiate them from conventional weblogs and forums. First, micro-blogging is time-traceable. Temporal information is crucial because posts that appear close together in time are, to some extent, correlated.

Second, the style of micro-blogging posts tends to be conversation-based with a sequence of responses. This phenomenon indicates that the posts and their responses are highly correlated in many aspects. Third, micro-blogging is friendship-influenced. Posts from a particular user can also be viewed by his/her friends and might have an impact on them (e.g. the empathy effect) implicitly or explicitly. Therefore, posts from friends in the same period may be correlated sentiment-wise as well as content-wise.

In this paper, we therefore focus on these three kinds of information, i.e. the posts that come from the temporal context, from responses, and from friends. We propose three approaches that exploit this information and verify whether they can aid sentiment detection for a post. Besides, to reflect the diversity of human emotion, we focus on six basic sentiments {anger, surprise, sadness, disgust, fear, joy} rather than only positive and negative.

1.3 Research Statement

This research discusses how response, context, and friendship information can be exploited to improve sentiment analysis of the short text in micro-blog posts.

Moreover, whereas previous works ([4][6][8][9][10]) focus only on binary classification of positive and negative sentiment, we focus on multi-class classification of the six basic sentiments.

1.4 Methodology Outline

The current approach to sentiment detection on micro-blogs focuses on feature engineering over the 140 characters of a post and on trying different machine learning classifiers. In addition to this, we look for further information that may be helpful. From the three factors mentioned above, i.e. temporal context, social relations (friendship), and responses, we can collect related posts that may help sentiment detection.

Therefore, we propose three methods: (1) Feature engineering based, (2) Graphical model based, and (3) Markov-transition based. Each can exploit the above three types of information for sentiment detection, and we can then verify that the three types of information are helpful when an appropriate approach is used.

First, we follow the current practice of adopting supervised machine learning for sentiment detection in micro-blogs, so the Feature engineering based approach is the most intuitive one: we transform the information coming from the posts of the three factors into features and examine whether these features help.
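As a rough illustration of the feature engineering idea, the sketch below appends features derived from related posts to the word features of the target post. The toy lexicon, the feature names, the factor keys, and the helper functions are all assumptions made for this example and are not taken from the method described in this paper.

```python
from collections import Counter

SENTIMENTS = ["anger", "surprise", "sadness", "disgust", "fear", "joy"]
FACTORS = ["temporal", "friend", "response"]

# Hypothetical toy lexicon, used only for this illustration.
LEXICON = {"happy": "joy", "nice": "joy", "angry": "anger",
           "sad": "sadness", "scared": "fear"}


def text_features(text):
    """Bag-of-words counts for the target post itself."""
    return Counter(f"word={w}" for w in text.lower().split())


def context_features(related_posts):
    """For each factor, the share of lexicon hits per sentiment in its posts."""
    features = {}
    for factor in FACTORS:
        counts = Counter()
        for post in related_posts.get(factor, []):
            for word in post.lower().split():
                if word in LEXICON:
                    counts[LEXICON[word]] += 1
        total = sum(counts.values())
        for sentiment in SENTIMENTS:
            # All zeros when the factor has no posts or no lexicon hits.
            features[f"{factor}:{sentiment}"] = counts[sentiment] / total if total else 0.0
    return features


def build_features(target_text, related_posts):
    """Merge target-post features with features from the three factors."""
    features = dict(text_features(target_text))
    features.update(context_features(related_posts))
    return features


example = build_features(
    "done with deadline",
    {"response": ["so happy for you", "nice work"], "friend": ["i am sad"]},
)
```

The resulting dictionary could be passed to any supervised classifier that accepts sparse feature dictionaries; whether such context features actually help is the empirical question this approach is meant to answer.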

On the other hand, to model the sentiment correlation between the target post and the posts from the three factors explicitly, the Graphical model based approach is the intuitive choice. We adopt a probabilistic graphical model, which uses a graph to represent the relationships between variables: nodes represent variables and edges represent correlations between them, so we can explicitly model the sentiment correlation between posts via the graph. Graphical models have also been used in previous work ([17]) to model the sentiment correlation between sentences in a blog article. Finally, the Markov-transition based approach, like the Feature engineering based approach, builds on supervised machine learning, and we can choose any effective supervised learning method that outputs a probability distribution for a given post.

In addition, the Markov-transition matrix of sentiments captures the relationship between the three types of posts and the target post, just as the Graphical model based approach does. In more detail, we first choose a base sentiment classifier; it is replaceable and only needs to output a sentiment probability distribution for a given post. We use the base classifier to predict the sentiment distribution of the target post. Meanwhile, we use the learned sentiment transition matrix together with the posts from the three factors to predict a second sentiment distribution for the target post. Finally, we merge the two distributions to output the final, adjusted sentiment distribution of the target post.
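The steps of the Markov-transition based approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact formulation used in this paper: the transition matrix is estimated from smoothed counts of (related-post sentiment, target-post sentiment) label pairs, the related posts are combined by simple averaging, and the two distributions are merged by linear interpolation with a hypothetical `weight` parameter.

```python
SENTIMENTS = ["anger", "surprise", "sadness", "disgust", "fear", "joy"]


def normalize(weights):
    """Scale non-negative weights so they sum to 1 (uniform if all are zero)."""
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def learn_transition_matrix(pairs, smoothing=1.0):
    """Estimate P(target sentiment | related-post sentiment) from label pairs.

    Add-one style smoothing is an assumption of this sketch.
    """
    index = {s: i for i, s in enumerate(SENTIMENTS)}
    n = len(SENTIMENTS)
    counts = [[smoothing] * n for _ in range(n)]
    for related, target in pairs:
        counts[index[related]][index[target]] += 1.0
    return [normalize(row) for row in counts]


def transition_prediction(related_dists, matrix):
    """Push each related post's distribution through the matrix and average."""
    n = len(SENTIMENTS)
    acc = [0.0] * n
    for dist in related_dists:
        for i, p in enumerate(dist):
            for j in range(n):
                acc[j] += p * matrix[i][j]
    return normalize(acc)


def merge(base_dist, transition_dist, weight=0.5):
    """Linear interpolation of the two distributions (an assumed merge rule)."""
    mixed = [(1.0 - weight) * b + weight * t
             for b, t in zip(base_dist, transition_dist)]
    return normalize(mixed)


# Usage: an uncertain base classifier is adjusted by one joyful related post.
matrix = learn_transition_matrix(
    [("joy", "joy"), ("joy", "joy"), ("sadness", "sadness")]
)
base = [1.0 / 6] * 6                      # base classifier output (uniform here)
related = [[0.0, 0.0, 0.0, 0.0, 0.0, 1.0]]  # a related post labelled as joy
final = merge(base, transition_prediction(related, matrix))
```

Because the base classifier is only required to output a probability distribution, it can be swapped for any other probabilistic classifier without changing the merging step.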

1.5 Contributions

The detailed contributions in this paper are listed as follows:

• We study the problem of sentiment detection for six basic sentiments {anger, surprise, sadness, disgust, fear, joy} in micro-blogs. It is natural to study more fine-grained sentiments now that positive and negative sentiments have been widely explored.

• We propose three approaches that exploit temporal context, friendship, and responses to improve sentiment classification: (1) Feature engineering based, (2) Graphical model based, and (3) Markov-transition based. The Markov-transition based approach shows the largest improvement after exploiting the three types of information.

• The Markov-transition based approach can be applied to the sentiment detection component of the Memetube8 system, making it more accurate than the original keyword matching approach.

1.6 Paper Organization

This paper is organized as follows. Section 2 reviews related work on sentiment detection in micro-blog platforms. Section 3 introduces our proposed methods, which use temporal context, friendship, and responses. Section 4 evaluates our models on Plurk data. Section 5 presents conclusions and future work.
