
5.1 Research Summary

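The multi-document summarization system developed in this thesis is designed around the Mutual Reinforcement principle, which is quite intuitive: important terms appear in many important sentences, and important sentences contain many important terms. In addition, the system uses the Alignment algorithm together with the Mutual Reinforcement algorithm to retain the more information-rich sentences in the documents, which supports the sentence-scoring algorithm in the next stage.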
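The Mutual Reinforcement principle described above can be illustrated with a minimal sketch. This is not the thesis implementation: the function name `mutual_reinforcement`, the whitespace tokenization, the fixed iteration count, and the L2 normalization are all assumptions made for illustration. It alternately updates sentence scores from term scores and term scores from sentence scores on a bipartite term-sentence graph.

```python
import math


def mutual_reinforcement(sentences, iterations=50):
    """Score terms and sentences on a bipartite term-sentence graph.

    Hypothetical sketch: tokenization is a lowercase whitespace split,
    and every term-sentence edge has weight 1.
    """
    tokenized = [set(s.lower().split()) for s in sentences]
    vocab = sorted({t for toks in tokenized for t in toks})
    term_score = {t: 1.0 for t in vocab}
    sent_score = [1.0] * len(sentences)

    for _ in range(iterations):
        # A sentence is important if it contains important terms.
        sent_score = [sum(term_score[t] for t in toks) for toks in tokenized]
        norm = math.sqrt(sum(v * v for v in sent_score)) or 1.0
        sent_score = [v / norm for v in sent_score]

        # A term is important if it appears in important sentences.
        term_score = {t: 0.0 for t in vocab}
        for toks, score in zip(tokenized, sent_score):
            for t in toks:
                term_score[t] += score
        norm = math.sqrt(sum(v * v for v in term_score.values())) or 1.0
        term_score = {t: v / norm for t, v in term_score.items()}

    return term_score, sent_score


docs = [
    "summary system ranks sentences",
    "summary system scores terms",
    "unrelated weather report",
]
term_scores, sentence_scores = mutual_reinforcement(docs)
```

In this toy input the first two sentences share terms and reinforce each other, so they end up scored above the isolated third sentence, and the shared terms score above terms that occur only once.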

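In the sentence-scoring algorithm, the system considers three distinct relations: term-to-sentence, title-to-sentence, and sentence-to-sentence. It computes a score for each of them using, respectively, the HITS web-ranking algorithm, cosine similarity, and the PageRank web-ranking algorithm. This stage is the core of the summarization system.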
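Two of the three scores named above can be sketched as follows. This is a simplified illustration under stated assumptions, not the system's actual code: the helper names `cosine` and `pagerank`, the raw term-frequency vectors, the damping factor of 0.85, and the fixed iteration count are hypothetical choices. The title-to-sentence score is a cosine similarity, and the sentence-to-sentence score is PageRank over a sentence similarity graph.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pagerank(weights, damping=0.85, iterations=100):
    """Power-iteration PageRank on a weighted adjacency matrix.

    weights[i][j] is the edge weight from node i to node j. Nodes with
    no outgoing weight simply leak rank mass in this sketch.
    """
    n = len(weights)
    if n == 0:
        return []
    out_sum = [sum(row) for row in weights]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        rank = [
            (1 - damping) / n
            + damping * sum(
                rank[i] * weights[i][j] / out_sum[i]
                for i in range(n)
                if out_sum[i] > 0
            )
            for j in range(n)
        ]
    return rank


title = "multi document summarization"
sentences = [
    "multi document summarization selects sentences",
    "sentences are ranked by a graph algorithm",
    "summarization of a document uses sentences",
]
vectors = [Counter(s.lower().split()) for s in sentences]
title_vector = Counter(title.lower().split())

# Title-to-sentence score: similarity of each sentence to the title.
title_scores = [cosine(title_vector, v) for v in vectors]

# Sentence-to-sentence score: PageRank over pairwise similarities.
n = len(vectors)
similarity = [
    [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
    for i in range(n)
]
sentence_scores = pagerank(similarity)
```

The term-to-sentence score would come from a HITS-style iteration over the term-sentence graph, which is omitted here to keep the sketch short.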

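The three sentence scores are then linearly combined to produce the final sentence ranking. The system selects summary sentences according to this weighted ranking and assembles the system summary within the required summary length.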
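The combination and selection step above can be sketched as follows. The equal default weights, the function names `combine_scores` and `select_summary`, the policy of skipping a sentence that does not fit, and the choice to output sentences in document order are assumptions for illustration; the text above does not specify them.

```python
def combine_scores(hits, title, pagerank, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Linearly combine three per-sentence score lists.

    The weights are hypothetical; in practice they would be tuned.
    """
    a, b, c = weights
    return [a * h + b * t + c * p for h, t, p in zip(hits, title, pagerank)]


def select_summary(sentences, scores, max_words):
    """Greedily pick top-scoring sentences within a word budget."""
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        length = len(sentences[i].split())
        # Assumption: a sentence that would exceed the budget is skipped,
        # and lower-ranked sentences may still be added if they fit.
        if used + length <= max_words:
            chosen.append(i)
            used += length
    # Assumption: selected sentences are emitted in document order.
    return [sentences[i] for i in sorted(chosen)]


final_scores = combine_scores([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], (0.5, 0.25, 0.25))
summary = select_summary(["a b c", "d e", "f g h i"], [0.9, 0.1, 0.5], max_words=5)
```

If the three scores are on different scales, normalizing each list before combining them would likely be necessary so that no single score dominates the weighted sum.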

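According to the evaluation experiments, the proposed summarization system ranked fairly well in comparison with other summarization systems. We attribute this to the fact that the system does not consider only a single factor such as terms or sentences. Instead, it uses all three relations (term-to-sentence, title-to-sentence, and sentence-to-sentence) as the basis for selecting summary sentences, so that the final system summary is readable, fluent, concise, comprehensive, and objective.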

5.2 Future Work

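In future research, we hope to address the shortcomings of this summarization system and to study techniques or methods that allow subsequent systems to achieve better performance. The following issues are possible directions for deeper investigation: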

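1. Documents inevitably contain many pronouns, each standing in for a noun in a preceding or following sentence. If each pronoun can be resolved back to the noun it replaces, this would greatly help the projected vector space model that the system builds. Moreover, the final system summary would no longer contain unresolved pronouns, which can make the meaning ambiguous or even wrong. Successfully resolving pronouns should therefore improve the performance of the system summary to some degree.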

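2. The system currently lacks semantic analysis of the terms and sentences in a document, so we hope to incorporate such semantic-analysis methods. With these methods, lexical chains built from the semantics of terms or sentences (including structure, part of speech, synonymy, antonymy, and hypernymy/hyponymy) can be used to construct a hierarchy of topic categories for a document. We can then analyze whether each topic category is a main topic of the document, and finally use those main topics as the primary basis for selecting the system summary.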


