Conclusions and Future Work - Automated Extraction of Specific Relation Pairs from Biomedical Literature

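Using the construction of a model that automatically extracts specified relations from biomedical literature datasets as an example, this study proposes a systematic method for building machine learning models. The method can be applied across domains.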

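Both training datasets in this study are imbalanced. The minority class is augmented with the Synthetic Minority Over-sampling Technique (SMOTE) until it contains as many samples as the majority class. To build the model, features are first extracted for each specified relation pair. Depending on the feature selection method used, a support vector machine produces different first-stage results. Different feedback methods then feed the first-stage results back into the training and test data as new features, and the model is trained again; the second-stage results are reported as the recognition performance. In the drug-disease relation recognition experiment, the recognition performance is an Accuracy of 75.7%, which already includes the 7.4% improvement obtained through the feedback method. In the drug-drug relation recognition experiment, the recognition performance is an F-1 measure of 57.6%, which already includes the 1.7% improvement obtained through the feedback method.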
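The oversampling step described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names `smote` and `balance`, the neighbour count `k`, and the random choice of the base sample are all assumptions made for illustration. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours, and points are generated until the two classes have the same size.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours.
    Simplified variant: the base sample is drawn at random each time."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of base inside the minority class (base excluded),
        # ranked by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

def balance(majority, minority, k=5, seed=0):
    """Oversample the minority class until both classes have equal size."""
    n_new = max(0, len(majority) - len(minority))
    return minority + smote(minority, n_new, k, seed)
```

Calling `balance(majority, minority)` returns the original minority samples followed by the synthetic ones, so the result can be concatenated with the majority samples to form a balanced training set.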

The content and contributions of this study are summarized as follows:

1. This study proposes a cross-domain method for recognizing specified relations in document content.

2. This study designs the N_transform algorithm, which converts text into numbers.

3. This study adds a feedback method to the training process: classification results are fed back and the model is retrained.

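4. This study builds its own drug-disease relation recognition corpus and, by also applying the proposed method to the drug-drug relation recognition experiment, cross-validates that the corpus is effective.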
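The feedback method in contribution 3 can be sketched as a two-stage procedure. The sketch below is an illustration under stated assumptions, not the study's implementation: a small linear SVM trained by stochastic gradient descent on the hinge loss stands in for the SVM library used in the experiments, the predicted label is taken as the single fed-back feature (the study compares several feedback methods), and the names `train_linear_svm`, `predict`, and `two_stage_feedback` are hypothetical.

```python
import random

def train_linear_svm(X, y, epochs=100, lr=0.01, lam=0.01, seed=0):
    """Linear SVM stand-in: SGD on the L2-regularised hinge loss (labels +1/-1)."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # Shrink the weights (regularisation), then step on the hinge
            # term only if the sample violates the margin.
            w = [wj * (1 - lr * lam) for wj in w]
            if y[i] * score < 1:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def predict(model, X):
    """Return +1/-1 labels from the sign of the linear decision function."""
    w, b = model
    return [1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1 for x in X]

def two_stage_feedback(X_train, y_train, X_test):
    # Stage 1: train on the original features and classify both sets.
    stage1 = train_linear_svm(X_train, y_train)
    fb_train = predict(stage1, X_train)
    fb_test = predict(stage1, X_test)
    # Feedback: append the stage-1 prediction as one new feature (assumption).
    X_train2 = [x + [float(p)] for x, p in zip(X_train, fb_train)]
    X_test2 = [x + [float(p)] for x, p in zip(X_test, fb_test)]
    # Stage 2: retrain on the augmented features; its output is the final result.
    stage2 = train_linear_svm(X_train2, y_train)
    return fb_test, predict(stage2, X_test2)
```

The function returns both the first-stage and the second-stage test predictions, so the effect of the feedback step can be measured by scoring the two lists against the same gold labels.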


Possible directions for future research are as follows:

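1. With the current data, the results converge after a single round of feedback. Future work can study the theory behind the feedback method in more depth.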

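2. The text transformation method (N_transform) is shown in this study to be effective in machine learning models built with support vector machines. Future work can examine whether it is also effective in other kinds of machine learning models, whether the algorithm design can be improved, and whether it can be combined with other text transformation methods.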

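3. The corpus for the drug-disease relation recognition experiment has been validated as effective in this study. Future work can apply other machine learning methods to build better-performing models.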

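4. The proposed method can also perform relation extraction in other domains. Experiments on different relation pairs can be used to refine the experimental procedure.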

References

Aronson, A. R. (2001). Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. In Proceedings of the AMIA Symposium (p. 17). American Medical Informatics Association.

Björne, J., Heimonen, J., Ginter, F., Airola, A., Pahikkala, T., & Salakoski, T. (2011). Extracting contextualized complex biological events with rich graph-based feature sets. Computational Intelligence, 27(4), 541-557.

Björne, J., Kaewphan, S., & Salakoski, T. (2013, June). UTurku: drug named entity recognition and drug-drug interaction extraction using SVM classification and domain knowledge. In Second Joint Conference on Lexical and Computational Semantics (*SEM) (Vol. 2, pp. 651-659).

Bobic, T., Fluck, J., & Hofmann-Apitius, M. (2013). SCAI: Extracting drug-drug interactions using a rich feature vector. In Second Joint Conference on Lexical and Computational Semantics (*SEM) (Vol. 2, pp. 676-683).

Bokharaeian, B., & Díaz, A. (2013, June). NIL UCM: Extracting Drug-Drug interactions from text through combination of sequence and tree kernels. In Second Joint Conference on Lexical and Computational Semantics. Atlanta, Georgia, USA (pp. 644-650).


Chang, C. C., & Lin, C. J. (2011). LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3), 27.

Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357.

Chowdhury, M. F. M., & Lavelli, A. (2013). FBK-irst: A multi-phase kernel based approach for drug-drug interaction detection and classification that exploits linguistic information. Atlanta, Georgia, USA, 351, 53.

Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine learning, 20(3), 273-297.

Hailu, N. D., Hunter, L. E., & Cohen, K. B. (2013). UColorado SOM: extraction of drug-drug interactions from biomedical text using knowledge-rich and knowledge-poor features. Proceedings of SemEval, 684-8.

Neves, M. L., Carazo, J. M., & Pascual-Montano, A. (2009, June). Extraction of biomedical events using case-based reasoning. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task (pp. 68-76). Association for Computational Linguistics.

Rastegar-Mojarad, M., Boyce, R. D., & Prasad, R. (2013, June). UWM-TRIADS: classifying drug-drug interactions with two-stage SVM and post-processing. In Proceedings of the 7th International Workshop on Semantic Evaluation (pp. 667-674).

Sánchez Cisneros, D. (2013). UC3M: A kernel-based approach to identify and classify DDIs in biomedical texts. Association for Computational Linguistics.

Segura-Bedmar, I., Martínez, P., & Zazo, M. H. (2013). SemEval-2013 task 9: Extraction of drug-drug interactions from biomedical texts (DDIExtraction 2013). In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (Vol. 2, pp. 341-350).

Thomas, P., Neves, M., Rocktäschel, T., & Leser, U. (2013, June). WBI-DDI: drug-drug interaction extraction using majority voting. In Second Joint Conference on Lexical and Computational Semantics (*SEM) (Vol. 2, pp. 628-635).

石琢暐 (2011). Introduction to support vector machines. Available from http://eeil.imis.ncku.edu.tw/knowledgebase/zhi-yuan-xiang-liang-ji-support-vector-machine

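李伯勳 (2017). Automated extraction of patterns of disease-drug relations from biomedical literature (unpublished master's thesis). Department of Computer Science and Information Engineering, National Taiwan Normal University.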

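陳佩瑄 (2017). A study on extracting drug-drug interactions from biomedical literature using a hybrid approach (unpublished master's thesis). Department of Computer Science and Information Engineering, National Taiwan Normal University.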

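張毓珊 (2009). Developing a data mining model for handling the class imbalance problem (unpublished master's thesis). Department of Information Management, Chaoyang University of Technology.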
