Conclusion and Future Research Directions - A Study of Multi-Faceted Analysis of News Forums

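This thesis studies how to automatically select, from a collection of articles, the set of facet keywords relevant to the discussion, and how to automatically build a hierarchical structure over those facets. The method requires no external resources, either to extract the facets or to build the hierarchy; both are mined automatically from the given article collection. The experimental results also show that when articles on several topics are mixed together, the method still correctly extracts the facets covered by the articles of each topic.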

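When computing the relevance between a facet and an article, we expand the facet's related terms using information from the document collection itself rather than from external resources, which finds the terms related to a facet within the given collection more effectively. We use the vector space model to compute the relevance between articles and facets, and we also compute the similarity between each sentence of an article and each facet. In our facet analysis system, users can clearly see the viewpoint facets relevant to the articles discussing a topic, the hierarchical structure of those facets, and the articles related to each facet. For each article, the system lists its related facets and marks the sentences related to each facet in a different color. The experimental results show that the facets selected by our system agree closely with the facets chosen by human subjects.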
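The sentence-level scoring described above can be illustrated with a minimal sketch. It assumes raw term-frequency vectors and cosine similarity, which is one common instance of the vector space model; the exact term weighting used in the thesis is not restated in this chapter. The function names, the pre-segmented input, and the sample terms are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as unrelated
    return dot / (norm_a * norm_b)


def score_sentences(sentences, facet_terms):
    """Score each pre-segmented sentence against a facet's expanded term list."""
    facet_vec = Counter(facet_terms)
    return [cosine_similarity(Counter(s), facet_vec) for s in sentences]


# Hypothetical input: a facet with its expanded terms and two segmented sentences.
facet_terms = ["health", "lung", "cancer"]
sentences = [
    ["smoking", "harms", "lung", "health"],
    ["tax", "revenue", "rises"],
]
scores = score_sentences(sentences, facet_terms)
```

Sentences whose score exceeds some chosen threshold could then be the ones highlighted for that facet; the threshold itself is a design choice not specified here.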

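One area for future improvement is the handling of variant or misused characters, for example 吸『菸』 versus 吸『煙』 (to smoke), 消費『券』 versus 消費『卷』 (consumption voucher), and 『抽』菸 versus 『吸』菸 (to smoke). The two terms in each pair should be treated as synonyms, yet our system treats them as different terms, so their occurrence counts are split when frequencies are tallied. If some method were used, for example building a synonym dictionary, to treat the two terms as the same term and merge [...] builds a two-level hierarchy, but some lower-level facets may themselves be the upper-level facet of other facets. If, while building the hierarchy, we recursively checked whether a lower-level facet can be divided further into other facets, a multi-level hierarchy could be formed that better presents the complete structure. These issues are all directions for further improvement and study.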
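As a rough illustration of the synonym-dictionary idea suggested above, the sketch below maps variant spellings to one canonical form before counting, so that their frequencies are no longer split. The word-level mapping table and the function names are hypothetical; a real dictionary would have to be curated or learned, and the thesis does not prescribe one.

```python
from collections import Counter

# Hypothetical variant-to-canonical table built from the examples in the text.
CANONICAL = {
    "吸煙": "吸菸",
    "抽菸": "吸菸",
    "消費卷": "消費券",
}


def normalize(term):
    """Return the canonical form of a term, or the term itself if unknown."""
    return CANONICAL.get(term, term)


def merged_frequencies(terms):
    """Count term occurrences after mapping variants to their canonical form."""
    return Counter(normalize(t) for t in terms)


# Hypothetical segmented terms from a small document collection.
terms = ["吸菸", "吸煙", "抽菸", "消費券", "消費卷", "健康"]
freq = merged_frequencies(terms)
```

With this normalization the three smoking variants contribute to a single count rather than three separate ones.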

