Conclusion and Future Work - A Probabilistic-Model-Based Method for Anaphora Resolution in Biomedical Documents

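This thesis proposes a method for resolving non-pronominal anaphora in biomedical literature. Four full-text biomedical articles were first split into sentences and filtered for noise. Each sentence was then analyzed with the GDep parser (GENIA Dependency parser), which performs gene name tagging, part-of-speech tagging, and noun phrase chunking. To obtain the required feature values, two further steps were carried out: boundary detection between antecedents and anaphors, and identification of the types of all noun phrases (NPs). The feature values were then extracted using the feature set and the rule set, and finally a probabilistic model based on Bayes' theorem was used to resolve the anaphora. The experiments yielded a precision of 73.83%, a recall of 67.36%, and an F-measure of 70.36%.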
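The final scoring step can be illustrated with a minimal sketch. It assumes that the features are treated as conditionally independent given the class (a naive Bayes simplification that may differ from the model used in this thesis), and the feature names `same_head_noun` and `same_sentence` as well as all probabilities are hypothetical values chosen for illustration.

```python
from math import prod

# Hypothetical numbers for illustration only; real values would be
# estimated from an annotated corpus.
PRIOR = {True: 0.2, False: 0.8}  # P(class); True = "is the antecedent"

# LIKELIHOOD[feature][class][value] = P(feature = value | class)
LIKELIHOOD = {
    "same_head_noun": {
        True: {True: 0.7, False: 0.3},
        False: {True: 0.1, False: 0.9},
    },
    "same_sentence": {
        True: {True: 0.6, False: 0.4},
        False: {True: 0.3, False: 0.7},
    },
}


def posterior(features):
    """P(antecedent | features) via Bayes' theorem.

    Assumes the features are independent given the class.
    """
    joint = {}
    for cls in (True, False):
        joint[cls] = PRIOR[cls] * prod(
            LIKELIHOOD[name][cls][value] for name, value in features.items()
        )
    # Normalize over the two classes to obtain the posterior.
    return joint[True] / (joint[True] + joint[False])


def resolve(candidates):
    """Return the candidate noun phrase with the highest posterior."""
    return max(candidates, key=lambda c: posterior(c["features"]))


candidates = [
    {"np": "the protein",
     "features": {"same_head_noun": True, "same_sentence": False}},
    {"np": "these cells",
     "features": {"same_head_noun": False, "same_sentence": True}},
]
best = resolve(candidates)
print(best["np"], round(posterior(best["features"]), 3))
```

Normalizing over the two classes keeps the scores of different candidates on the same 0 to 1 scale, which makes them directly comparable when selecting an antecedent.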

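This study applies a statistical model to anaphora resolution and proposes a method for deciding among candidates that receive tied scores. Because Gasperin et al. (2008) and D'Souza et al. (2012) addressed coreference resolution, their results cannot be compared directly with those obtained here. Nevertheless, the experiments show that a statistical model can achieve good results on this task.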
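One possible way to handle tied scores is sketched below. The rule used here, preferring the tied candidate closest to the anaphor, is an assumption made for illustration and is not necessarily the rule adopted in this thesis; the function name `pick_antecedent` and the tuple layout are likewise hypothetical.

```python
import math


def pick_antecedent(scored, tol=1e-9):
    """Choose an antecedent from (candidate, score, distance) tuples.

    Candidates whose score is within `tol` of the best score count as
    tied; among them the one with the smallest distance to the anaphor
    is returned (an assumed tie-breaking rule).
    """
    best = max(score for _, score, _ in scored)
    tied = [item for item in scored
            if math.isclose(item[1], best, abs_tol=tol)]
    return min(tied, key=lambda item: item[2])[0]


scored = [
    ("the gene", 0.5, 3),      # same score, three NPs away
    ("this protein", 0.5, 1),  # same score, one NP away
    ("the cell", 0.2, 0),      # closest, but lower score
]
print(pick_antecedent(scored))
```

Comparing scores with a tolerance rather than exact equality avoids treating two candidates as different only because of floating-point rounding.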

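As for future work, the features available for anaphora resolution in biomedical literature are limited, but we hope to identify more useful features, to weight each feature according to its importance, and to use a parser with better recognition performance. In addition, the distance feature could be optimized to find the setting best suited to this method, and anaphors could be filtered more precisely, in order to achieve better results.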
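Weighting features by importance could, for example, be realized by scaling each feature's log-likelihood before summing. The following is only a sketch of that idea under assumed values: the function `weighted_log_score`, the feature names, the probabilities, and the weights are all hypothetical, and how the weights would be chosen is left open.

```python
import math


def weighted_log_score(prior, likelihoods, weights):
    """log P(class) + sum_i w_i * log P(f_i | class).

    With all weights equal to 1.0 this reduces to the unweighted log
    joint probability; a weight above 1.0 increases a feature's influence.
    """
    return math.log(prior) + sum(
        weights[name] * math.log(p) for name, p in likelihoods.items()
    )


likelihoods = {"same_head_noun": 0.7, "same_sentence": 0.4}
uniform = {"same_head_noun": 1.0, "same_sentence": 1.0}
emphasized = {"same_head_noun": 2.0, "same_sentence": 1.0}

print(weighted_log_score(0.2, likelihoods, uniform))
print(weighted_log_score(0.2, likelihoods, emphasized))
```

Working in log space turns the product of probabilities into a sum, so a weight acts as an exponent on the corresponding likelihood.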

References

Bayes' theorem. Available from http://en.wikipedia.org/wiki/Bayes%27_theorem.

BioNLP-2011. Available from https://sites.google.com/site/bionlpst/.

Brennan, S.E., Friedman, M.W. and Pollard, C.J. (1987). “A Centering Approach to Pronouns,” Proceedings of Association for Computational Linguistics Conference ACL’87, Stanford, California, USA, pp. 155-162.

Briscoe, T., Carroll, J. and Watson, R. (2006). “The second release of the RASP system,” Proceedings of Association for Computational Linguistics Conference ACL’06, Sydney, Australia, pp. 77-80.

Cardie, C. and Wagstaff, K. (1999). “Noun Phrase Coreference as Clustering,” Proceedings of Joint SIGDAT Conference on Empirical Methods in NLP and Very Large Corpora, pp. 82-89.

Chen, B., Yang, X.F., Su, J., Zhou, G. and Tan, C.L. (2008). “Other-Anaphora Resolution in Biomedical Texts with Automatically Mined Patterns,” Proceedings of International Conference on Computational Linguistics Conference COLING’08, Vol. 1, Manchester, pp. 121-128.

Manning, C.D., Raghavan, P. and Schütze, H. (2008). Introduction to Information Retrieval, Cambridge University Press.

CLEF. Available from http://www.clef2013.org/index.php.

Dagan, I. and Itai, A. (1990). “Automatic Processing of Large Corpora for the Resolution of Anaphora Reference,” Proceedings of International Conference on Computational Linguistics Conference COLING’90, Vol. 3, Helsinki, Finland, pp. 330-332.

D'Souza, J. and Ng, V. (2012). “Anaphora Resolution in Biomedical Literature: A Hybrid Approach,” Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine, pp. 113-122.

Eilbeck, K. and Lewis, S.E. (2004). “Sequence Ontology annotation guide,” Comparative and Functional Genomics, Vol. 5, no. 8, pp. 642-647.

Gasperin, C. and Briscoe, T. (2008). “Statistical Anaphora Resolution in Biomedical Texts,” Proceedings of International Conference on Computational Linguistics Conference COLING’08, Vol. 1, Manchester, pp. 257-264.

GDep. Available from http://people.ict.usc.edu/~sagae/parser/gdep/.

GENIA corpus. Available from http://www-tsujii.is.s.u-tokyo.ac.jp/~genia/topics/Corpus/.

Hobbs, J. (1986). Readings in Natural Language Processing. Morgan Kaufmann Publishers Inc. San Francisco, CA, USA.

Kennedy, C. and Boguraev, B. (1996). “Anaphora for Everyone: Pronominal Anaphora Resolution Without A Parser,” Proceedings of 16th Conference on Computational Linguistics COLING’96, Vol. 1, pp. 113-118.

Lappin, S. and Leass, H.J. (1994). “An Algorithm for Pronominal Anaphora Resolution,” Computational Linguistics, Vol. 20, no. 4, pp. 535-561.

Li, D.C., Miller, T. and Schuler, W. (2011). “A Pronoun Anaphora Resolution System based on Factorial Hidden Markov Models,” Proceedings of Association for Computational Linguistics Conference ACL’11, Portland, Oregon, pp. 1169-1178.

Marcus, M.P., Santorini, B. and Marcinkiewicz, M.A. (1993). “Building a Large Annotated Corpus of English: The Penn Treebank,” Computational Linguistics, Vol. 19, no. 2, pp. 313-330.

McCarthy, J.F. and Lehnert, W.G. (1995). “Using Decision Trees for Coreference Resolution,” Proceedings of International Joint Conference on Artificial Intelligence Conference, pp. 1050-1055.

MUC. Available from http://www.cs.nyu.edu/cs/faculty/grishman/muc6.html.

NLPBA. Available from http://www.nactem.ac.uk/tsujii/GENIA/ERtask/report.html.

Penn Treebank. Available from http://www.cis.upenn.edu/~treebank/.

PubMED. Available from http://www.ncbi.nlm.nih.gov/pubmed.

QA4MRE. Available from http://celct.fbk.eu/QA4MRE/.

Quinlan, J.R. (1993). C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

Sagae, K. and Tsujii, J. (2007). “Dependency Parsing and Domain Adaptation with LR Models and Parser Ensembles,” Proceedings of EMNLP-CoNLL, pp. 1044-1050.

Soon, W., Ng, H. and Lim, D. (2001). “A Machine Learning Approach to Coreference Resolution of Noun Phrases,” Computational Linguistics, Vol. 27, no. 4, pp. 521-544.

Vlachos, A. and Gasperin, C. (2006). “Bootstrapping and Evaluating Named Entity Recognition in the Biomedical Domain,” Proceedings of BioNLP at HLT-NAACL. Conference LNLBioNLP’06, New York, pp. 138-145.

Yang, X.F., Su, J., Zhou, G. and Tan, C.L. (2004). “An NP-Cluster Based Approach to Coreference Resolution,” Proceedings of International Conference on Computational Linguistics Conference COLING’04, Geneva, Switzerland, pp. 226-232.

Yang, X.F., Zhou, G., Su, J. and Tan, C.L. (2003). “Coreference Resolution Using Competition Learning Approach,” Proceedings of Association for Computational Linguistics Conference ACL’03, Sapporo, Japan, pp. 176-183.

Yang, Y., Li, Y.C., Zhou, G. and Zhou, Q.M. (2008). “Research on Distance Information for Anaphora Resolution,” Journal of Chinese Information Processing, Vol. 22, no. 5, pp. 80-90.
