Conclusions and Future Research Directions

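This thesis applies several importance measures to investigate how product feature terms can be extracted automatically and effectively from forum posts. In addition to term frequency, which is commonly used as the basis for feature extraction, we consider the differences between the term probability distributions of the individual brands, and we compute the degree of association between each brand and each feature term. We also propose cross-corpus term analysis: we incorporate term statistics from a corpus of camera introduction documents and use the divergence between the probability distributions of the two corpora to select among the candidate feature terms, with a common-word list helping to filter out general colloquial words. Finally, we propose an importance evaluation function for product feature terms that combines the importance scores obtained from each method and serves as the basis for feature term extraction.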
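
As a rough illustration of how such a combined importance score might be computed, the following is a minimal sketch only. The exact formulas and weights are defined in earlier chapters and are not repeated here, so the function name `term_scores`, the equal default weights, and the specific forms chosen for the KL and MI components (the largest per-brand pointwise KL contribution and the largest positive pointwise mutual information) are all assumptions made for this example.

```python
import math
from collections import Counter


def term_scores(brand_counts, weights=(1.0, 1.0, 1.0)):
    """Score candidate terms by a weighted sum of TF, KL and MI components.

    brand_counts: dict mapping a brand name to a Counter of noun counts
                  observed in that brand's forum posts.
    weights:      (w_tf, w_kl, w_mi); the defaults are illustrative assumptions.
    """
    # Pool all brands to estimate the overall term distribution P(term).
    total = Counter()
    for counts in brand_counts.values():
        total.update(counts)
    n_all = sum(total.values())
    brand_sizes = {b: sum(c.values()) for b, c in brand_counts.items()}

    w_tf, w_kl, w_mi = weights
    scores = {}
    for term, c_all in total.items():
        p_all = c_all / n_all          # P(term), used here as the TF component
        kl = 0.0                       # largest per-brand KL contribution
        mi = 0.0                       # largest positive PMI(term, brand)
        for brand, counts in brand_counts.items():
            c = counts.get(term, 0)
            if c == 0:
                continue
            p_b = c / brand_sizes[brand]   # P(term | brand)
            log_ratio = math.log(p_b / p_all)
            # Pointwise contribution of this term to KL(P(.|brand) || P(.)).
            kl = max(kl, p_b * log_ratio)
            # PMI(term, brand) = log P(term|brand) / P(term); negatives clamp to 0.
            mi = max(mi, log_ratio)
        scores[term] = w_tf * p_all + w_kl * kl + w_mi * mi
    return scores


if __name__ == "__main__":
    toy = {
        "brand_a": Counter({"lens": 8, "photo": 2}),
        "brand_b": Counter({"photo": 2, "sensor": 8}),
    }
    for term, score in sorted(term_scores(toy).items(), key=lambda kv: -kv[1]):
        print(term, round(score, 3))
```

In this toy input, a term concentrated in a single brand receives extra credit from the KL and MI components, whereas a term spread evenly across brands is scored by its frequency alone.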

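The experimental results show that, when only the intra-corpus term analysis scores are used to extract feature terms, KL divergence performs better on feature terms specific to a product brand. For feature term extraction overall, however, combining TF, KL, and MI and filtering with the common-word score (FL value) performs better. The cross-corpus importance analysis effectively filters out forum jargon and other general words unrelated to cameras. When intra-corpus and cross-corpus importance scores are combined, TF alone combined with the cross-corpus score gives more pronounced results than first combining several intra-corpus measures (the Intra score) and then combining the result with the cross-corpus score. Adding the common-word score (FL value) to these combinations effectively filters out general colloquial words and yields better product feature term extraction.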
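
The cross-corpus filtering step together with the common-word filter can be sketched as follows. This is an assumed simplification rather than the scoring used in the experiments: the cross-corpus score is modeled as a smoothed log-ratio between a term's probability in the camera introduction corpus and in the forum corpus, and the FL value is reduced to membership in a plain word set. The function name `cross_corpus_filter` and the parameters `min_log_ratio` and `smoothing` are hypothetical.

```python
import math
from collections import Counter


def cross_corpus_filter(forum_counts, reference_counts, common_words,
                        min_log_ratio=0.0, smoothing=1.0):
    """Keep forum terms that are relatively prominent in the reference corpus.

    forum_counts:     Counter of candidate term counts in the forum corpus.
    reference_counts: Counter of term counts in the camera introduction corpus.
    common_words:     set of general colloquial words to drop (stand-in for FL).
    """
    vocab = set(forum_counts) | set(reference_counts)
    # Add-k smoothing so that terms missing from one corpus get a small probability.
    n_forum = sum(forum_counts.values()) + smoothing * len(vocab)
    n_ref = sum(reference_counts.values()) + smoothing * len(vocab)

    kept = {}
    for term in forum_counts:
        if term in common_words:
            continue  # filtered by the common-word list
        p_forum = (forum_counts[term] + smoothing) / n_forum
        p_ref = (reference_counts.get(term, 0) + smoothing) / n_ref
        log_ratio = math.log(p_ref / p_forum)
        if log_ratio >= min_log_ratio:
            kept[term] = log_ratio
    return kept


if __name__ == "__main__":
    forum = Counter({"lens": 5, "thanks": 10, "thing": 3, "aperture": 2})
    reference = Counter({"lens": 6, "aperture": 4, "thing": 5})
    print(sorted(cross_corpus_filter(forum, reference, {"thing"})))
```

With these toy counts, a forum-only word such as "thanks" falls below the threshold because it is absent from the reference corpus, and "thing" is removed by the common-word list even though it occurs in both corpora.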

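This thesis restricts feature terms to nouns. Future work could analyze the degree of opinion expressed in forum posts and then identify feature terms through syntactic and semantic analysis of opinion sentences, or select feature terms according to the degree of opinion expressed in the sentences in which they occur. Extracting terms in phrase form, for more complete term extraction, is another direction worth exploring.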

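In addition, the term importance function proposed in this thesis assigns different weights to terms in order to measure their importance. Whether a term importance function suited to a given document collection can be derived automatically from the characteristics of that collection is another topic for future research.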

For forum terms, we focused on the precision of the extracted results. Improving recall is also a priority for future research.

