This study explored three challenging topics in natural language processing and linguistics, namely emotion modeling, advertising legality identification and irony analysis, based on microtext materials. The intent of the microtext author and/or readers hidden behind a short text was discovered and analyzed.
In this study, emotion generation models and transition models were proposed. Both writers’ and readers’ perspectives on emotion modeling were presented with regard to emotion recognition in microblog posts. Since the Plurk microblogging platform is also a social networking system, social relation, user behavior and relevance degree features were used along with textual features to build classifiers. Of these, the first two types of features (social relation and user behavior) proved useful for achieving better emotion-recognition performance.
This study also shows that predicting emotion from the reader’s perspective is more challenging than predicting it from the writer’s perspective.
The results of the emotion analysis suggest that a reader model should be treated differently from a writer model. The same bigram or word can have different effects on writers’ and readers’ emotional expression. For example, greetings can cause a positive reader response even if the writer uses a negative emoticon. These findings suggest that reader emotions need to be further studied in the future.
As for the advertising legality recognition experiments, three subtopics were covered: illegal advertising statement recognition, illegal advertising verb phrase mining and the construction of an automatic false online advertisement recognition system. To identify the legality of short advertising statements, the log relative frequency ratio was found to be useful when used as the weight of textual features. The log relative frequency ratio was also used for illegal advertising verb phrase mining. By combining the techniques and results of the above two tasks, a false advertisement recognition system was built. Internet users, advertisers, online advertising platforms and the authorities can all benefit from the efficiency and convenience of the system and minimize the damage caused by false online advertising.
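The weighting idea summarized above can be illustrated with a minimal sketch. It assumes that the log relative frequency ratio of a term is the logarithm of its relative frequency in one collection divided by its relative frequency in a contrasting collection. The pairing of illegal versus legal statements, the add-one smoothing, the function names and the toy tokens are all assumptions made for illustration and are not taken from the experiments.

```python
import math
from collections import Counter


def log_relative_frequency_ratio(illegal_docs, legal_docs, smoothing=1.0):
    """Return {term: log((f_illegal / N_illegal) / (f_legal / N_legal))}.

    Additive smoothing (an assumption of this sketch) keeps the ratio
    finite for terms that occur in only one of the two collections.
    """
    illegal_counts = Counter(t for doc in illegal_docs for t in doc)
    legal_counts = Counter(t for doc in legal_docs for t in doc)
    vocab = set(illegal_counts) | set(legal_counts)
    n_illegal = sum(illegal_counts.values()) + smoothing * len(vocab)
    n_legal = sum(legal_counts.values()) + smoothing * len(vocab)
    return {
        t: math.log(((illegal_counts[t] + smoothing) / n_illegal)
                    / ((legal_counts[t] + smoothing) / n_legal))
        for t in vocab
    }


def weighted_features(doc, lrfr):
    """Map a tokenized statement to {term: weight}, skipping unseen terms."""
    return {t: lrfr[t] for t in set(doc) if t in lrfr}


# Toy, made-up tokenized statements (hypothetical, not data from the study).
illegal = [["cures", "acne", "fast"], ["cures", "wrinkles"]]
legal = [["moisturizes", "skin"], ["cleans", "skin", "fast"]]

lrfr = log_relative_frequency_ratio(illegal, legal)
features = weighted_features(["cures", "skin"], lrfr)
```

Under this definition, a positive weight marks a term that is relatively more frequent in the illegal collection and a negative weight marks one that is more typical of legal statements. The resulting dictionary could then be passed to a classifier as a weighted bag-of-words vector.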
In this study, the NTU Irony Corpus, which contains more than 1,000 Chinese ironic expressions in the form of microtexts, was constructed based on linguistic forms and sentiment classification. To the best of our knowledge, this is the first Chinese irony corpus that is annotated with irony element labels. To build this corpus, a bootstrapping procedure was used to reduce human effort. In addition, the linguistic structure of irony was explored. Ironic words and phrases, contextual information, and rhetoric were found to constitute an ironic expression.
However, more irony patterns and phenomena have yet to be discovered and should be examined in future studies. For instance, the following types of irony are not included in the NTU Irony Corpus:
(1) Literally negative expressions that are actually positive
(2) Ironic understatements
(3) Situational irony
Up to now, irony in the Chinese language has not been thoroughly investigated by either linguists or computer scientists. In order to improve the performance of irony detection, more studies on the linguistic aspects of irony are needed in the future. The diversity of irony corpora should also be increased by including different types and linguistic patterns of irony.
As shown in the data observed in this study, other types of irony or sarcasm (e.g., ironic understatements, situational irony, and irony in which positive meanings are expressed through negative literal meanings) are seldom seen in microtexts, and the retrieval of these ironic texts requires more contextual and non-linguistic information. The identification of these kinds of ironic use is even more challenging and needs to be further explored in the future.