
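This thesis studies the automatic extraction of keywords from users' microblog posts, uses Wikipedia to mine the category concepts that represent each user's interests, and examines how concentrated the interest categories of a user's posts are.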

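Because Wikipedia's category structure can yield many category concepts for a single term, we combine the category branching degree with the number of levels traversed to compute a weight for each category concept. Accumulating the weights over all terms in a post gives the post's topic concepts, and aggregating the topic categories of all of a user's posts gives the user's interest category concepts. To mine the concentration of a microblog user's posts, we use three features: whether terms appear in Wikipedia, whether a topic can be found for a post, and how concentrated or dispersed the topic concept scores are. Experimental results show that these features can distinguish users whose posts are highly concentrated from users whose posts are not, and that the mined categories are consistent, to a certain degree, with the categories labeled by the participants.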
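As a rough illustration of the weighting idea summarized above, the following sketch spreads a term's weight upward through a toy category graph, dividing it evenly among the parent categories at each level (the branching degree) and stopping after a fixed number of levels. The exact formula used in this thesis is not restated in this chapter, so the even-split rule, the `max_depth` parameter, the toy `PARENTS` graph, and the entropy-based `concentration` measure are all assumptions made for illustration only.

```python
import math
from collections import defaultdict

# Hypothetical toy category graph (node -> parent categories).
# A real system would read these relations from Wikipedia's category structure.
PARENTS = {
    "Jay Chou": ["Taiwanese singers", "Taiwanese actors"],
    "Taiwanese singers": ["Music"],
    "Taiwanese actors": ["Film"],
    "Guitar": ["Musical instruments"],
    "Musical instruments": ["Music"],
}


def category_weights(term, max_depth=2):
    """Spread a weight of 1.0 from `term` upward for `max_depth` levels.

    Assumed rule: a node's weight is split evenly among its parents,
    so a high branching degree dilutes each parent's share.
    """
    weights = defaultdict(float)
    frontier = {term: 1.0}
    for _ in range(max_depth):
        next_frontier = defaultdict(float)
        for node, weight in frontier.items():
            parents = PARENTS.get(node, [])
            for parent in parents:
                share = weight / len(parents)
                weights[parent] += share
                next_frontier[parent] += share
        frontier = next_frontier
    return dict(weights)


def post_topics(terms, max_depth=2):
    """Accumulate category weights over all terms of one post."""
    total = defaultdict(float)
    for term in terms:
        for category, weight in category_weights(term, max_depth).items():
            total[category] += weight
    return dict(total)


def concentration(scores):
    """Return 1 minus the normalized entropy of the score distribution.

    1.0 means all score mass sits in one category; 0.0 means the scores
    are spread uniformly (or there are no scores at all).
    """
    total = sum(scores.values())
    if total == 0:
        return 0.0
    if len(scores) < 2:
        return 1.0
    probs = [s / total for s in scores.values() if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return 1.0 - entropy / math.log(len(scores))


topics = post_topics(["Jay Chou", "Guitar"])
top_category = max(topics, key=topics.get)
```

In this toy graph both terms reach "Music" within two levels, so it accumulates the largest score and becomes the post's top category. A term that is missing from the graph contributes nothing, which mirrors the "term not found in Wikipedia" feature mentioned above.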

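One direction for future work is to analyze the content of the links that microblog users post. Because each microblog post is very short, users often post URLs that link to other blogs, news articles, videos, and so on. If the content behind a URL could be analyzed, for example whether a YouTube URL points to music or drama, we could learn whether the user likes listening to music, which singers the user likes, and so on. Analyzing which category a linked news article belongs to would reveal the news categories the user is interested in. In addition, terms extracted from the linked content could be used to expand the keywords of a post.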

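Microblog users connect with friends who are close to them or who share similar interests. Future work could therefore analyze the social network formed among microblog users, identify the community a user belongs to, and use the other users in the same community to find that user's interest category concepts.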

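Wikipedia contains disambiguation terms: terms that share part of an entry name but are too incomplete to be looked up directly in Wikipedia. This thesis does not handle these terms. If the category concept implied by such a term could be found by computing the similarity between the content of the Wikipedia articles for the disambiguation candidates and either the content of the user's post or the user's past interest topics, accuracy should improve further.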
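The similarity-based disambiguation proposed above could be sketched as follows. This is not part of the thesis's implemented system; it is a minimal bag-of-words cosine similarity under assumed inputs. The function names, the candidate titles, and the pre-tokenized word lists are hypothetical, and real Chinese text would first need word segmentation.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(context_tokens, candidates):
    """Pick the candidate article whose text is most similar to the context.

    `context_tokens`: tokens from the user's post (or past interest topics).
    `candidates`: hypothetical mapping of candidate title -> article tokens.
    Returns (best_title, scores); best_title is None if there are no candidates.
    """
    context = Counter(context_tokens)
    scores = {
        title: cosine(context, Counter(tokens))
        for title, tokens in candidates.items()
    }
    if not scores:
        return None, scores
    return max(scores, key=scores.get), scores


# Illustrative data only; these token lists are made up for the example.
candidates = {
    "Apple Inc.": ["iphone", "company", "technology", "mac"],
    "Apple (fruit)": ["fruit", "tree", "sweet", "orchard"],
}
best, scores = disambiguate(["new", "iphone", "technology", "launch"], candidates)
```

Here the post shares two tokens with the first candidate and none with the second, so the first candidate is selected and its categories would then feed the category weighting step.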



Appendix


Appendix B: Example Word Segmentation Results and System-Selected Categories for Microblog Posts

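Note: The following is the analysis result for one microblog user. The four columns are the post number, the post content, the word segmentation result for the post, and the category concepts covered by the post.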

