Conclusions and Future Work - A Multi-Document Summarization System Based on the Mutual Reinforcement Principle

5.1 Research Summary

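The multi-document summarization system developed in this thesis is built on the Mutual Reinforcement principle, which is quite intuitive: important terms appear in many important sentences, and important sentences in turn contain many important terms. In addition, the system uses an Alignment algorithm together with the Mutual Reinforcement algorithm to retain the more information-rich sentences in each document, which supports the sentence-scoring algorithm in the next stage.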
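The Mutual Reinforcement principle described above can be illustrated with a small iterative sketch. This is not the thesis's implementation: it assumes sentences are already tokenized, uses a binary term-in-sentence relation, and normalizes scores with the L2 norm after every round. The function name `mutual_reinforcement` and the `iterations` parameter are hypothetical.

```python
import math


def mutual_reinforcement(sentences, iterations=50):
    """Score terms and sentences so that each reinforces the other.

    sentences: list of token lists (tokenization is assumed done elsewhere).
    Returns (term_scores, sentence_scores).
    """
    vocab = sorted({t for s in sentences for t in s})
    term_score = {t: 1.0 for t in vocab}
    sent_score = [1.0] * len(sentences)

    for _ in range(iterations):
        # A sentence is important if it contains important terms.
        new_sent = [sum(term_score[t] for t in set(s)) for s in sentences]

        # A term is important if it appears in important sentences.
        new_term = {t: 0.0 for t in vocab}
        for s, score in zip(sentences, new_sent):
            for t in set(s):
                new_term[t] += score

        # L2 normalization keeps the scores bounded (an assumption of this sketch).
        sent_norm = math.sqrt(sum(v * v for v in new_sent)) or 1.0
        term_norm = math.sqrt(sum(v * v for v in new_term.values())) or 1.0
        sent_score = [v / sent_norm for v in new_sent]
        term_score = {t: v / term_norm for t, v in new_term.items()}

    return term_score, sent_score
```

With this update rule, a sentence that shares terms with many other sentences ends up ranked above an isolated sentence, which matches the intuition stated in the text.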

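The sentence-scoring algorithm considers three relationships: term to sentence, title to sentence, and sentence to sentence. These are scored with the HITS web-ranking algorithm, cosine similarity, and the PageRank web-ranking algorithm, respectively. This component is the core of the summarization system.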
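As a rough illustration of two of these scores, the sketch below computes cosine similarity between token lists (usable for the title-to-sentence score) and runs a PageRank-style iteration over a sentence similarity graph (for the sentence-to-sentence score). Several details are assumptions made for the example rather than taken from the thesis: raw term counts as vector weights, cosine similarity as the edge weight, a damping factor of 0.85, and uniform redistribution of rank from sentences that have no similar neighbor. The names `cosine` and `pagerank_sentences` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token lists, using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def pagerank_sentences(sentences, damping=0.85, iterations=50):
    """PageRank-style scores over a graph whose edges are sentence similarities."""
    n = len(sentences)
    if n == 0:
        return []
    # Edge weight between two different sentences is their cosine similarity.
    w = [[cosine(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]
    rank = [1.0 / n] * n

    for _ in range(iterations):
        # Rank held by sentences with no neighbors is spread uniformly.
        dangling = sum(rank[i] for i in range(n) if out[i] == 0.0)
        new_rank = []
        for j in range(n):
            incoming = sum(rank[i] * w[i][j] / out[i]
                           for i in range(n) if out[i] > 0.0)
            new_rank.append((1 - damping) / n
                            + damping * (incoming + dangling / n))
        rank = new_rank
    return rank
```

A title-to-sentence score under these assumptions would simply be `cosine(title_tokens, sentence_tokens)` for each sentence.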

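Next, the three sentence scores are linearly combined to produce the final sentence ranking. The system then selects summary sentences according to this weighted ranking and assembles the system summary within the required summary length.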
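The combination and selection step can be sketched as follows. The weights are placeholders, since this section does not state the values used. Min-max normalization before combining, skipping sentences that would exceed the word budget, and restoring document order in the output are all assumptions of this sketch. The names `min_max`, `combine_scores`, and `select_summary` are hypothetical.

```python
def min_max(values):
    """Rescale scores to [0, 1] so the three score types are comparable."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def combine_scores(hits, title, pagerank, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Linear combination of the three per-sentence scores.

    The equal weights are placeholders, not values from the thesis.
    """
    a, b, c = weights
    return [a * h + b * t + c * p
            for h, t, p in zip(min_max(hits), min_max(title), min_max(pagerank))]


def select_summary(sentences, scores, max_words):
    """Greedily pick top-ranked sentences that fit within max_words."""
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        length = len(sentences[i].split())
        # Skip a sentence that would exceed the budget; try shorter ones next.
        if used + length <= max_words:
            chosen.append(i)
            used += length
    # Emit the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]
```

In practice the weights would be tuned on held-out data, and a length limit defined in bytes or characters rather than words would need a different `length` computation.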

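According to the evaluation experiments, the proposed system ranks reasonably well compared with other summarization systems. We attribute this to the fact that it does not rely on a single factor such as terms or sentences alone, but bases sentence selection on all three relationships: term to sentence, title to sentence, and sentence to sentence. As a result, the final system summary can be readable, fluent, concise, comprehensive, and objective.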

5.2 Future Work

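In future research, we hope to address the shortcomings of this summarization system and to investigate techniques that allow subsequent systems to perform better. The following issues are possible directions for more in-depth study: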

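1. Documents inevitably contain many pronouns, each of which stands in for a noun in a preceding or following sentence. If each pronoun can be resolved to the noun it replaces, the projected vector space model built from the text should benefit considerably. Moreover, the sentences of the final system summary would no longer contain unresolved pronouns, which can otherwise make the meaning ambiguous or even wrong. Successfully resolving pronouns should therefore improve the quality of the system summary to some degree.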

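2. The system currently lacks semantic analysis of the terms and sentences in a document, so we hope to incorporate semantic methods. With such methods, lexical chains built from the semantics of terms or sentences (including structure, part of speech, synonymy, antonymy, and hypernymy/hyponymy) can be used to construct a hierarchy of topic categories for the document. We can then analyze whether these categories are the main topics discussed in the document, and use the main topics as the primary basis for selecting summary sentences.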

