Conclusion and Future Work - A Study on Opinion Holder Identification in News Articles Combining Supervised and Unsupervised Methods

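This study proposes an opinion holder identification method built on supervised learning, using English news articles as the research corpus. The text first passes through a detailed natural language processing pipeline, and unsupervised learning is then combined with the supervised stage to implement the opinion holder identification system.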

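According to the type of opinion holder, the task is divided into two parts: author opinion identification and opinion holder identification. For author opinions, the study uses features drawn from lexical, part-of-speech, punctuation, named-entity, syntactic, and opinion-word information, and trains a support vector machine (SVM) as the supervised learner. For opinion holder identification, the study uses features drawn from part-of-speech, lexical, named-entity, sentence-composition, and punctuation information, again with an SVM as the supervised learner, and adds an unsupervised step that repairs erroneous results in order to raise overall identification accuracy.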

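In the implemented system, supervised learning with an SVM produces a preliminary identification result. An unsupervised stage then applies the correction actions we defined: erroneous results are repaired by rules and by a weighted distance formula, so that the system finally outputs a more accurate opinion holder. For author opinion identification on the Taiwan News English corpus, the system reaches an F-1 score of 89.71% for the training model and 91.58% on the test data. For opinion holder identification on the corpus, it reaches an F-1 score of 71.83%.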

The experimental results are summarized below; the two main identification experiments are listed in Table 5.1:

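Table 5.1 Experimental results of this study

Experiment                      Precision (P)   Recall (R)   F-1      Accuracy (A)
Author opinion identification   99.23%          86.08%       91.58%   95.68%
Opinion holder identification   73.63%          71.94%       71.83%   75.62%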
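The four measures reported in Table 5.1 are conventionally defined from confusion counts. The following is a minimal sketch of those standard definitions; the function name `prf_accuracy` and the counts in the usage line are made up for illustration and are not taken from the experiments. If the reported figures are averages over classes or folds (this section does not say), the reported F-1 need not equal the harmonic mean of the reported P and R.

```python
def prf_accuracy(tp, fp, fn, tn):
    """Return (precision, recall, F-1, accuracy) from confusion counts."""
    # Precision: share of predicted positives that are correct.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of true positives that were found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Accuracy: share of all decisions that are correct.
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, f1, accuracy


# Hypothetical counts, for illustration only.
p, r, f1, acc = prf_accuracy(tp=8, fp=2, fn=2, tn=8)
```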

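In addition, the results were compared with related studies; because the test corpora differ, the figures cannot be compared directly; (Maximum Entropy Ranking Algorithm), obtained through the parse tree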

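Coreference resolution is a major challenge in opinion holder identification. This study resolves most such errors with the Hobbs Algorithm: when the identified holder is a personal pronoun, it is re-resolved to a meaningful representative term for the opinion holder. Combining the preliminary output of the Stanford Coreference Resolution System with a Hobbs Algorithm step based on parse-tree syntactic rules, the system reaches an accuracy of 75.41%. The opinion holder phrase expansion experiment uses an unsupervised method: the combination rules we defined repair a single-word holder into a phrase, which yields a more complete holder name and reaches an accuracy of 94.02%.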
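The idea of expanding a single-word holder into a phrase can be sketched as follows. This is only an illustration under an assumed rule, not the combination rules defined in this study: it assumes Penn Treebank part-of-speech tags and grows the holder over adjacent proper-noun tokens. The names `PHRASE_TAGS` and `expand_holder` and the example sentence are hypothetical.

```python
# Assumed rule: adjacent proper-noun tokens belong to the same holder phrase.
PHRASE_TAGS = {"NNP", "NNPS"}


def expand_holder(tagged, head):
    """Grow the single-word holder at index `head` into a contiguous phrase.

    `tagged` is a list of (word, pos_tag) pairs for one sentence.
    """
    start = end = head
    # Extend to the left while the previous token is a proper noun.
    while start > 0 and tagged[start - 1][1] in PHRASE_TAGS:
        start -= 1
    # Extend to the right while the next token is a proper noun.
    while end + 1 < len(tagged) and tagged[end + 1][1] in PHRASE_TAGS:
        end += 1
    return " ".join(word for word, _ in tagged[start:end + 1])


# Hypothetical tagged sentence, for illustration only.
sentence = [("Minister", "NNP"), ("John", "NNP"), ("Smith", "NNP"),
            ("said", "VBD"), ("the", "DT"), ("plan", "NN"), ("works", "VBZ")]
holder = expand_holder(sentence, 1)  # starts from the single word "John"
```

Starting from the head word alone keeps the sketch independent of how the first-stage classifier picked it; a fuller rule set would also need to handle titles, conjunctions, and organization names that contain common nouns.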

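The experiments in this study use an English news corpus. In future work we hope to extend the method to multilingual identification and to corpora with different wording habits, such as social network content, blog posts, and reviews on shopping platforms. By manually annotating opinion holders, this study has built a reusable corpus that later studies can use to propose further features helpful for identification.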

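Current opinion holder research can identify only a single representative person or organization, and this remains an open problem. In general, a document-level opinion article contains multiple opinion topics, holders, and targets. This study can process document-level text, deciding for each sentence whether it expresses the author's opinion or that of a holder with a representative term, and assigning subjective or objective sentiment to the document-level text. However, it cannot extract the opinions of several persons or organizations in the same sentence at once. Doing so will require deep learning at the word level: algorithms that use multiple processing layers, with complex structure or composed of multiple non-linear transformations, to build high-level abstractions of the data, and that replace the time spent hand-crafting features with efficient unsupervised or semi-supervised feature learning and hierarchical feature extraction.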

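The system identifies opinion holders effectively, even in cases that involve coreference. Most of the remaining errors can be resolved; the greatest challenge lies in the named-entity assumption, that is, in additional unknown words and new types of opinion holders. One weakness is over-reliance on lexical cues: lexical features are effective, but low-frequency words are hard to use as cues. In addition, a specially defined entity can be regarded as an opinion holder, but this study did not focus on recognizing location entities and handled only person names, organization names, and job titles. There is also the problem of multiple opinion holders: scoring the results by majority vote could select several holders, but the method in this study extracts only a single holder per sentence. The model needs strengthening here, because no features were selected specifically for multiple holders. A more complex problem arises when an opinion holder appears both as an anaphor and as a non-anaphoric mention, for example "Someone said he/she …"; this study gives priority to the person name and excludes the personal pronoun as the holder.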
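The preference for a person name over a personal pronoun can be sketched as a simple selection step. This is a minimal illustration, assuming the candidates for one sentence are already available as strings in order of appearance; the pronoun list and the function name `pick_holder` are hypothetical and not part of the system described here.

```python
# Assumed pronoun list, for illustration only.
PRONOUNS = {"he", "she", "they", "it", "i", "we", "you"}


def pick_holder(candidates):
    """Return one holder, preferring a non-pronoun mention over a pronoun."""
    # First pass: take the earliest candidate that is not a personal pronoun.
    for cand in candidates:
        if cand.lower() not in PRONOUNS:
            return cand
    # Fallback: only pronouns (or nothing) were found.
    return candidates[0] if candidates else None


# Hypothetical candidates for a sentence such as "John Smith said he ...".
chosen = pick_holder(["John Smith", "he"])
```

Like the method described above, this returns a single holder per sentence; extracting several holders would need a different return type and features aimed at that case.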

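This study combines supervised and unsupervised learning to identify the article author and the opinion holder for sentences in a news corpus, and shows that the approach is feasible. Its contributions are an annotated opinion holder corpus and an effective combined identification method for English news text.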

