A topic is associated with specific times, places, and persons. Discovering the interactions between the persons would help readers construct the background of the topic and facilitate document comprehension. In this paper, we have proposed an interaction detection method called FISER, which employs nineteen features covering syntactic, context-dependent, and semantic information in text to detect interactive segments in topic documents.
Our method differs from previous relation detection studies in three respects. First, instead of detecting static and permanent relations, our method detects interactive segments, in which the interactions between persons are dynamic and topic-dependent. Second, in addition to syntactic features, we devise context-dependent and semantic features to detect interactive segments effectively. Finally, whereas most previous approaches analyze only the text between entities, our method also considers the contexts before and after person names to enhance relation detection performance.
We present an effective recognizer that considers syntactic, context-dependent, and semantic information to detect topic-dependent interactive relations. The experimental results demonstrate the efficacy of FISER, which outperforms well-known open IE methods dramatically. Using all nineteen features, the precision, recall, and F1-score are 70.2%, 54.8%, and 60.7%, respectively. The best combination of features is {TNB, PS, NS, TNV, SL, NV, NPV, FPP, PD, ICS}, for which the precision, recall, and F1-score are 72.6%, 55.6%, and 61.9%, respectively.
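As a reminder of how the three reported measures relate, the following minimal Python sketch computes the F1-score as the harmonic mean of precision and recall. The function name `f1_score` is an illustrative choice and is not part of FISER. Applied directly to the aggregate precision and recall given above, it yields roughly 61.6% and 63.0%, about one point above the reported F1-scores. One possible explanation, which is an assumption here and is not stated in this section, is that the reported F1-scores were averaged over evaluation folds or topics rather than derived from the aggregate figures.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (both in [0, 1])."""
    if precision + recall == 0:
        # Convention: F1 is defined as 0 when both measures are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Aggregate figures from the text, expressed as fractions.
all_features = f1_score(0.702, 0.548)   # all nineteen features
best_subset = f1_score(0.726, 0.556)    # best ten-feature combination

print(f"all nineteen features: {all_features:.3f}")
print(f"best feature subset:   {best_subset:.3f}")
```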
In future work, we will employ more sophisticated syntactic features, such as the dependency tree of a sentence, to enhance FISER’s syntactic features. Moreover, external knowledge bases will be incorporated into E-HowNet to improve the detection of interactive segments. We will also investigate using information extraction algorithms to extract interaction tuples from the detected interactive segments and construct an interaction network of topic persons.
References
Agichtein, Eugene and Gravano, Luis, "Snowball: extracting relations from large plain-text collections," In Proceedings of the 5th ACM Conference on Digital Libraries, 85-94, (2000).
Banko, Michele, Cafarella, Michael J., Soderland, Stephen, Broadhead, Matt and Etzioni, Oren, "Open information extraction from the web," In Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2670-2676, (2007).
Banko, Michele and Etzioni, Oren, "The tradeoffs between open and traditional relation extraction," In Proceedings of the 46th Annual Meeting on Association for Computational Linguistics on Human Language Technologies, 28-36, (2008).
Berger, Adam L., Pietra, Vincent J. Della and Pietra, Stephen A. Della, "A maximum entropy approach to natural language processing," Comput. Linguist., 22, 39-71, (1996).
Chen, Chien Chin and Chen, Meng Chang, "TSCAN: A content anatomy approach to temporal topic summarization," IEEE Transactions on Knowledge and Data Engineering, 24, 170-183, (2012).
Chieu, Hai and Ng, Hwee, "A maximum entropy approach to information extraction from semi-structured and free text," In Proceedings of the 18th National Conference on Artificial intelligence, 786-791, (2002).
Christensen, Janara, Mausam, Soderland, Stephen and Etzioni, Oren, "Semantic role labeling for open information extraction," In Proceedings of the NAACL HLT 2010 1st International Workshop on Formalisms and Methodology for Learning by Reading, 52-60, (2010).
Culotta, Aron and Sorensen, Jeffrey, "Dependency tree kernels for relation extraction," In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, 423-429, (2004).
Etzioni, Oren, Fader, Anthony, Christensen, Janara, Soderland, Stephen and Mausam, "Open information extraction: the second generation," In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, 3-10, (2011).
Fader, Anthony, Soderland, Stephen and Etzioni, Oren, "Identifying relations for open information extraction," In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 1535-1545, (2011).
Feng, Ao and Allan, James, "Finding and linking incidents in news," In Proceedings of the 16th ACM International Conference on Information and Knowledge Management, 821-830, (2007).
Feng, Haodi, Chen, Kang, Deng, Xiaotie and Zheng, Weimin, "Accessor variety criteria for Chinese word extraction," Comput. Linguist., 30, 75-93, (2004).
Han, Jiawei and Kamber, Micheline, Data mining: concepts and techniques: Morgan Kaufmann Publishers, 2nd edn., 2006.
Hatzivassiloglou, Vasileios and Weng, Wubin, "Learning anchor verbs for biological interaction patterns from published text articles," International Journal of Medical Informatics, 67, 19-23, (2002).
Hirano, Toru, Asano, Hisako, Matsuo, Yoshihiro and Kikui, Genichiro, "Recognizing relation expression between named entities based on inherent and context-dependent features of relational words," In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, 409-417, (2010).
Hirano, Toru, Matsuo, Yoshihiro and Kikui, Genichiro, "Detecting semantic relations between named entities in text using contextual features," In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, 157-160, (2007).
Huang, Shu-Ling, Chung, You-Shan and Chen, Keh-Jiann, "E-HowNet: the expansion of HowNet," In Proceedings of the 1st National HowNet Workshop, 10-22, (2008).
Jindal, Nitin and Liu, Bing, "Opinion spam and analysis," In Proceedings of the International Conference on Web Search and Web Data Mining, 219-230, (2008).
Kambhatla, Nanda, "Combining lexical, syntactic, and semantic features with maximum entropy models for extracting relations," In Proceedings of the 42nd Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, 178-181, (2004).
Kohavi, Ron, "A study of cross-validation and bootstrap for accuracy estimation and model selection," In Proceedings of the 14th International Joint Conference on Artificial Intelligence, 1137-1143, (1995).
Lafferty, John D., McCallum, Andrew and Pereira, Fernando C. N., "Conditional random fields: probabilistic models for segmenting and labeling sequence data," In Proceedings of the 18th International Conference on Machine Learning, 282-289, (2001).
Li, Wenjie, Zhang, Peng, Wei, Furu, Hou, Yuexian and Lu, Qin, "A novel feature-based approach to Chinese entity relation extraction," In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, 89-92, (2008).
Ling, Charles and Li, Chenghui, "Data Mining for Direct Marketing: Problems and Solutions," In Knowledge Discovery and Data Mining, 73-79, (1998).
Ling, Goh Chooi, Asahara, Masayuki and Matsumoto, Yuji, "Chinese unknown word identification using character-based tagging and chunking," In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, 197-200, (2003).
Manning, Chris and Schütze, Hinrich, Foundations of statistical natural language processing: MIT Press, Cambridge, Massachusetts, 1st edn., 1999.
Manning, Christopher D., Raghavan, Prabhakar and Schütze, Hinrich, Introduction to information retrieval: Cambridge University Press, Cambridge, U.K., 2nd edn., 2008.
Mitchell, T.M., Machine learning: McGraw-Hill, New York, 1st edn., 1997.
Nallapati, Ramesh, Feng, Ao, Peng, Fuchun and Allan, James, "Event threading within news topics," In Proceedings of the 13th ACM International Conference on Information and Knowledge Management, 446-453, (2004).
Pantel, Patrick and Pennacchiotti, Marco, "Espresso: leveraging generic patterns for automatically harvesting semantic relations," In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, 113-120, (2006).
Vernon, G.M., Human interaction: an introduction to sociology: Ronald Press Co., New York, 1st edn., 1965.
Wang, Yuan-Kai, Chen, Yi-Shiou and Hsu, Wen-Lian, "Empirical study of Mandarin Chinese discourse analysis: an event-based approach," In Proceedings of the 10th IEEE International Conference on Tools with Artificial Intelligence, 466-473, (1998).
Zelenko, Dmitry, Aone, Chinatsu and Richardella, Anthony, "Kernel methods for relation extraction," The Journal of Machine Learning Research, 3, 1083-1106, (2003).
Zhou, Guodong, Qian, Longhua and Fan, Jianxi, "Tree kernel-based semantic relation extraction with rich syntactic and semantic information," Information Sciences, 180, 1313-1325, (2010).
Zhu, Jun, Nie, Zaiqing, Liu, Xiaojiang, Zhang, Bo and Wen, Ji-Rong, "StatSnowball: a statistical approach to extracting entity relationships," In Proceedings of the 18th International Conference on World Wide Web, 101-110, (2009).