Conclusions and Future Work - Extracting Drug-Drug Interactions from Biomedical Literature Using a Hybrid Approach

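This study proposed a hybrid approach that combines machine learning with rule-based methods to extract drug-drug interactions (DDIs) from a literature dataset, performing detection and classification in two stages. Because the training dataset is imbalanced, the minority-class data were increased until they were roughly as numerous as the majority-class data. For each drug pair, auxiliary, distance, negation, verb, part-of-speech (POS) combination, keyword, and neighboring-POS features were then extracted. Different feature selection schemes produced different SVM training and prediction results, so the top-ranked experiments were combined with the rule-based approach, with a different set of rules applied at each stage.

For DDI extraction on the combined MedLine and DrugBank data, the method achieved a detection performance of 70.8% and a classification performance of 62.5%. Compared with the teams that participated in DDIExtraction 2013, this result does not surpass the first-place team but exceeds the average performance, and it is particularly strong in the MEC (mechanism) category.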
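This chapter does not specify how the minority-class data were increased. As one possible reading, the following minimal sketch assumes random oversampling with replacement: minority-class drug pairs are duplicated at random until every class is as large as the majority class. The function name `random_oversample` and the toy labels are hypothetical and not taken from the thesis.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples (drawn with replacement) until every
    class has as many samples as the largest class. Assumed technique."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # keep every original sample, then add random copies to reach the target
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy example: 6 non-interacting pairs vs. 2 interacting pairs
X = [f"pair{i}" for i in range(8)]
y = ["false"] * 6 + ["true"] * 2
X_bal, y_bal = random_oversample(X, y)
print(sorted(Counter(y_bal).items()))  # [('false', 6), ('true', 6)]
```

Duplicating existing pairs adds no new information, so it can encourage overfitting; the undersampling methods in the reference list trade that risk for discarding majority-class data instead.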
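To make several of the listed feature types concrete, the sketch below computes distance, negation, verb, POS-combination, keyword, and neighboring-POS features for one drug pair in a POS-tagged sentence. The cue-word lists `NEGATION_WORDS` and `KEYWORDS`, the Penn Treebank tag set, the exact feature definitions, and the function name `pair_features` are all assumptions for illustration; auxiliary features are omitted because their definition is not given in this chapter.

```python
# Hypothetical cue-word lists; the thesis's actual lists are not given here.
NEGATION_WORDS = {"no", "not", "without", "neither", "nor"}
KEYWORDS = {"increase", "decrease", "inhibit", "interact", "potentiate"}


def pair_features(tokens, tags, i, j):
    """Compute illustrative features for the drug pair at token indices i < j.

    tokens and tags are parallel lists for one sentence; tags are assumed to be
    Penn Treebank POS tags produced by an external tagger.
    """
    between = [t.lower() for t in tokens[i + 1:j]]
    between_tags = tags[i + 1:j]
    return {
        # distance: number of tokens separating the two drug mentions
        "distance": j - i - 1,
        # negation: any negation cue between the drugs
        "has_negation": any(w in NEGATION_WORDS for w in between),
        # verbs: tokens tagged VB, VBD, VBZ, ... between the drugs
        "verbs_between": [w for w, t in zip(between, between_tags) if t.startswith("VB")],
        # POS combination: the tag sequence between the drugs as one string
        "pos_pattern": "-".join(between_tags),
        # keywords: cue stems that prefix some token between the drugs
        "keywords_between": sorted({k for k in KEYWORDS for w in between if w.startswith(k)}),
        # neighboring POS: tag left of drug 1 and tag right of drug 2
        "left_pos": tags[i - 1] if i > 0 else "<S>",
        "right_pos": tags[j + 1] if j + 1 < len(tags) else "</S>",
    }


tokens = ["Aspirin", "may", "increase", "the", "effect", "of", "warfarin", "."]
tags = ["NNP", "MD", "VB", "DT", "NN", "IN", "NNP", "."]
print(pair_features(tokens, tags, 0, 6))
```

In practice such a dictionary would be converted into a sparse numeric vector before being passed to an SVM.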
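The two-stage design described above (detection, then classification, each stage with its own rules) can be sketched as follows. The classifiers are passed in as plain callables standing in for the trained SVM models, and both rules are hypothetical examples. In this sketch a rule overrides the classifier whenever it fires, which is only one of several ways the rule-based and machine learning components could be combined. Type labels follow the DDIExtraction 2013 types (mechanism, effect, advise, int).

```python
from typing import Any, Callable, Dict, Optional

Pair = Dict[str, Any]


def detect_rules(pair: Pair) -> Optional[bool]:
    """Hypothetical stage-1 rule: a drug paired with itself is not an interaction."""
    if pair["drug1"].lower() == pair["drug2"].lower():
        return False
    return None  # no rule fired; defer to the classifier


def classify_rules(pair: Pair) -> Optional[str]:
    """Hypothetical stage-2 rule: advice cue words between the drugs imply 'advise'."""
    if {"should", "recommended", "avoid"} & set(pair["between"]):
        return "advise"
    return None


def two_stage(pair: Pair,
              detector: Callable[[Pair], bool],
              classifier: Callable[[Pair], str]) -> str:
    """Stage 1 decides whether the pair interacts; stage 2 assigns the type.

    At each stage, a rule that fires overrides the classifier's output.
    """
    verdict = detect_rules(pair)
    if verdict is None:
        verdict = detector(pair)
    if not verdict:
        return "none"
    label = classify_rules(pair)
    return label if label is not None else classifier(pair)


def svm_detect(p: Pair) -> bool:
    return p["distance"] < 10  # stand-in for a trained detection SVM


def svm_classify(p: Pair) -> str:
    return "effect"  # stand-in for a trained classification SVM


pair = {"drug1": "aspirin", "drug2": "warfarin",
        "between": ["should", "not", "be", "combined"], "distance": 4}
print(two_stage(pair, svm_detect, svm_classify))  # advise
```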
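This chapter does not name the metric behind the 70.8% and 62.5% figures. Assuming they are F1 scores, the measure used to rank systems in DDIExtraction 2013, the sketch below shows how precision, recall, and F1 follow from true-positive, false-positive, and false-negative counts, and how micro-averaging pools the counts across interaction types before computing F1. The function names are hypothetical, and the counts in the usage example are invented and unrelated to the thesis's results.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def micro_prf(counts):
    """Micro-average: sum (tp, fp, fn) over all types, then compute P/R/F1 once."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return prf(tp, fp, fn)


# Invented (tp, fp, fn) counts per interaction type, for illustration only.
counts = {"mechanism": (3, 1, 1), "effect": (5, 1, 1)}
p, r, f1 = micro_prf(counts)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # P=0.800 R=0.800 F1=0.800
```

For detection, the same `prf` applies with a single positive class (interaction vs. no interaction).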

Future research could proceed in the following directions:

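(1) Add more rules to improve overall detection and classification performance, for example, rules that handle iterated (repeated) words.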

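(2) For each dataset, the experiments yielded 10,816 configurations in the detection stage and 2,159 in the classification stage. This study analyzed only the top 10 in each stage; future work could analyze the top 50, the top 100, or even more. It could also examine feature combinations that fall outside the top 10 on their own but rise into the top 10 once the rule-based approach is added.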
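Direction (2) amounts to comparing two rankings of the same feature combinations. A minimal sketch, assuming each combination is identified by a string key and scored once with the SVM alone and once after the rules are added: `top_n` returns the N best combinations, and `promoted_by_rules` lists those that reach the top N only after the rules are applied. The function names, keys, and scores are hypothetical.

```python
def top_n(scores, n):
    """Keys of the n highest scores; ties broken alphabetically for determinism."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [key for key, _ in ranked[:n]]


def promoted_by_rules(svm_scores, hybrid_scores, n=10):
    """Combinations outside the SVM-only top n that reach the top n with rules added."""
    before = set(top_n(svm_scores, n))
    return [key for key in top_n(hybrid_scores, n) if key not in before]


# Invented scores for three feature combinations.
svm_only = {"dist+neg": 0.70, "dist+verb": 0.68, "kw+pos": 0.65}
with_rules = {"dist+neg": 0.72, "dist+verb": 0.69, "kw+pos": 0.73}
print(top_n(with_rules, 2))                          # ['kw+pos', 'dist+neg']
print(promoted_by_rules(svm_only, with_rules, n=2))  # ['kw+pos']
```

Raising `n` from 10 to 50 or 100 widens the analysis without changing the code, although applying the rules to every one of the 10,816 detection configurations would be the costly step.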

(3) The method could be applied not only to DDI extraction but also to other relation extraction tasks, such as food-drug or disease-drug relation extraction.

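(4) Implement the method as a system whose input is sentence-segmented text in which the positions of the drug entities are already known. Based on the sentence features and the trained model, the system would output whether each drug pair interacts. Such a system could serve as a reference for researchers and reduce the time they spend determining whether drug pairs interact.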
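The input and output of the system in direction (4) could look like the sketch below: a segmented sentence plus the character offsets of the known drug mentions goes in, every pair of mentions in the sentence is treated as a candidate, and a prediction function reports whether each pair interacts. The `predict` callable stands in for the trained model; the function names, the end-exclusive offset convention, and the toy prediction rule are assumptions.

```python
from itertools import combinations
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets; end is exclusive


def candidate_pairs(sentence: str, drugs: List[Span]):
    """Yield (text1, text2, span1, span2) for every unordered pair, in text order."""
    for a, b in combinations(sorted(drugs), 2):
        yield sentence[a[0]:a[1]], sentence[b[0]:b[1]], a, b


def annotate(sentence: str, drugs: List[Span],
             predict: Callable[[str, Span, Span], bool]) -> List[Tuple[str, str, bool]]:
    """Return (drug1, drug2, interacts) for each candidate pair."""
    return [(d1, d2, predict(sentence, s1, s2))
            for d1, d2, s1, s2 in candidate_pairs(sentence, drugs)]


def toy_predict(sentence: str, s1: Span, s2: Span) -> bool:
    # Stand-in for the trained model: interaction if "increases" lies between the drugs.
    return "increases" in sentence[s1[1]:s2[0]]


s = "Ketoconazole increases the levels of cisapride and terfenadine."
drugs = [(s.index(d), s.index(d) + len(d)) for d in ("Ketoconazole", "cisapride", "terfenadine")]
for row in annotate(s, drugs, toy_predict):
    print(row)
```

A sentence with k drug mentions produces k(k-1)/2 candidate pairs, so long enumerations of drugs dominate the prediction workload.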

References

Altincay, H., & Ergun, C. (2004, January). Clustering based under-sampling for improving speaker verification decisions using AdaBoost. In SSPR/SPR (pp. 698-706).

Aronson, A. R. (2001). Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. In Proceedings of the AMIA Symposium (p. 17). American Medical Informatics Association.

Björne, J., Heimonen, J., Ginter, F., Airola, A., Pahikkala, T., & Salakoski, T. (2011). Extracting contextualized complex biological events with rich graph-based feature sets. Computational Intelligence, 27(4), 541-557.

Björne, J., Kaewphan, S., & Salakoski, T. (2013, June). UTurku: drug named entity recognition and drug-drug interaction extraction using SVM classification and domain knowledge. In Second Joint Conference on Lexical and Computational Semantics (*SEM) (Vol. 2, pp. 651-659).

Bobic, T., Fluck, J., & Hofmann-Apitius, M. (2013). SCAI: Extracting drug-drug interactions using a rich feature vector. In Second Joint Conference on Lexical and Computational Semantics (*SEM) (Vol. 2, pp. 676-683).

Bokharaeian, B., & Díaz, A. (2013, June). NIL UCM: Extracting drug-drug interactions from text through combination of sequence and tree kernels. In Second Joint Conference on Lexical and Computational Semantics. Atlanta, Georgia, USA (pp. 644-650).

Chang, C. C., & Lin, C. J. (2011). LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3), 27.

Chowdhury, M. F. M., & Lavelli, A. (2013). FBK-irst: A multi-phase kernel based approach for drug-drug interaction detection and classification that exploits linguistic information. Atlanta, Georgia, USA, 351, 53.

Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine learning, 20(3), 273-297.

Hailu, N. D., Hunter, L. E., & Cohen, K. B. (2013). UColorado SOM: extraction of drug-drug interactions from biomedical text using knowledge-rich and knowledge-poor features. Proceedings of SemEval, 684-8.

Kubat, M., & Matwin, S. (1997, July). Addressing the curse of imbalanced training sets: one-sided selection. In ICML (Vol. 97, pp. 179-186).

Lewis, D. D., & Catlett, J. (1994, July). Heterogeneous uncertainty sampling for supervised learning. In Proceedings of the eleventh international conference on machine learning (pp. 148-156).

Neves, M. L., Carazo, J. M., & Pascual-Montano, A. (2009, June). Extraction of biomedical events using case-based reasoning. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task (pp. 68-76). Association for Computational Linguistics.

Rastegar-Mojarad, M., Boyce, R. D., & Prasad, R. (2013, June). UWM-TRIADS: classifying drug-drug interactions with two-stage SVM and post-processing. In Proceedings of the 7th International Workshop on Semantic Evaluation (pp. 667-674).

Sánchez Cisneros, D. (2013). UC3M: A kernel-based approach to identify and classify DDIs in biomedical texts. Association for Computational Linguistics.

Segura-Bedmar, I., Martinez, P., & de Pablo-Sánchez, C. (2011). Using a shallow linguistic kernel for drug–drug interaction extraction. Journal of biomedical informatics, 44(5), 789-804.

Segura-Bedmar, I., Martínez, P., & Herrero-Zazo, M. (2013). SemEval-2013 Task 9: Extraction of Drug-Drug Interactions from Biomedical Texts (DDIExtraction 2013). Atlanta, Georgia, USA, 3206(65), 341.

Thomas, P., Neves, M., Rocktäschel, T., & Leser, U. (2013, June). WBI-DDI: drug-drug interaction extraction using majority voting. In Second Joint Conference on Lexical and Computational Semantics (*SEM) (Vol. 2, pp. 628-635).

Weiss, G. M., & Provost, F. (2001). The effect of class distribution on classifier learning: an empirical study. Rutgers Univ.

Yang, Y., & Pedersen, J. O. (1997, July). A comparative study on feature selection in text categorization. In ICML (Vol. 97, pp. 412-420).

石琢暐 (2011). 支持向量機簡介 [An introduction to support vector machines]. Available from http://eeil.imis.ncku.edu.tw/knowledgebase/zhi-yuan-xiang-liang-ji-support-vector-machine

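張毓珊 (2009). 發展處理類別不平衡問題之資料探勘模式 [Developing a data mining model for handling class imbalance problems]. Thesis, Department of Information Management, Chaoyang University of Technology.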

In: A Study on Extracting Drug-Drug Interactions from Biomedical Literature Using a Hybrid Approach (pp. 85-90)