Biomedical Semantic Role Labeling System

CHAPTER 4 Results and Discussion


National Chengchi University (國立政治大學)

4.3 Related Work

4.3.1 Biomedical Semantic Role Labeling Corpus

PASBio [21] is the first PAS standard used in the biomedical field, but it does not provide an SRL corpus. GREC [22] is an information extraction corpus that focuses on gene regulation events. However, GREC does not support Treebank-format SRL annotations [23].

BioProp, created by Chou et al. [24], is the only corpus that provides SRL annotations and annotates semantic role labels on syntactic trees. BioProp selects the 30 most frequent or important verbs in the biomedical literature and defines a standard for biomedical PAS. Furthermore, following the style of PropBank [7], which annotates PAS on the Penn Treebank (PTB) [23], BioProp annotates its PAS on the beta version of the GENIA TreeBank (GTB) [25]. GTB contains 500 MEDLINE abstracts selected from the search results for the keywords human, blood cells, and transcription factors, together with syntactic trees that follow the Penn Treebank style.

4.3.2 Biomedical Semantic Role Labeling System

Most semantic role labeling systems follow the pipeline method, which consists of predicate identification, argument identification, and argument classification. In recent years, however, several studies have shown that collective learning can outperform the pipeline method. Riedel and Meza-Ruiz [20] use Markov Logic to learn these stages collectively for SRL. However, we found no SRL system in the biomedical field that uses a Markov Logic Network (MLN). Dahlmeier and Ng [26] use domain adaptation approaches to improve SRL in the biomedical field. Bethard et al. [27] treat SRL as a token-by-token labeling problem and focus on SRL for protein transport predicates. BIOSMILE [2] is a biomedical SRL system that focuses on 30 frequently appearing or important verbs in the biomedical literature; it is trained on BioProp and is based on a Maximum Entropy (ME) model.
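To make the three pipeline stages concrete, the following is a minimal Python sketch. It is not the method of any system cited above: the predicate lexicon, the span heuristic, and the ARG0/ARG1 decision rule are hypothetical stand-ins for classifiers that a real system would train on a corpus such as BioProp. It only illustrates how each stage consumes the output of the previous one, which is why errors made in an early stage cannot be corrected later.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon (an assumption for illustration only);
# a real system would learn predicate identification from annotated data.
PREDICATES = {"activates", "inhibits", "binds"}


@dataclass
class Argument:
    text: str
    role: str


def identify_predicates(tokens):
    """Stage 1: return the indices of tokens treated as predicates."""
    return [i for i, tok in enumerate(tokens) if tok.lower() in PREDICATES]


def identify_arguments(tokens, pred_idx):
    """Stage 2: propose candidate argument spans as (start, end) pairs.

    Toy heuristic: everything before the predicate is one candidate,
    everything after it is another.
    """
    spans = []
    if pred_idx > 0:
        spans.append((0, pred_idx))
    if pred_idx + 1 < len(tokens):
        spans.append((pred_idx + 1, len(tokens)))
    return spans


def classify_argument(span, pred_idx):
    """Stage 3: assign a role label to one candidate span.

    Toy rule: spans that end before the predicate are ARG0, others ARG1.
    """
    _, end = span
    return "ARG0" if end <= pred_idx else "ARG1"


def label_sentence(tokens):
    """Run the three stages in sequence for every predicate in the sentence."""
    results = {}
    for p in identify_predicates(tokens):
        results[tokens[p]] = [
            Argument(" ".join(tokens[s:e]), classify_argument((s, e), p))
            for s, e in identify_arguments(tokens, p)
        ]
    return results
```

Calling `label_sentence("IL-2 activates STAT5".split())` runs all three stages on a single toy sentence. A collective approach would instead score the decisions of all stages jointly.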


CHAPTER 5 Conclusion

We observe that some SRL systems ignore the complexity of classification and the dependencies between semantic roles. These systems usually take all constituents as candidate semantic roles and use a post-processing step to handle their dependencies. In this paper, to tackle both problems, we construct a biomedical SRL system that uses SRL patterns and a Markov Logic Network (MLN) to learn semantic roles collectively. Because SRL patterns are difficult to write manually, we generate them automatically and use them to recognize word boundaries and semantic role candidates simultaneously. Our system is trained on the BioProp corpus. The experimental results show that using SRL patterns improves the F-score on overall ARG by 0.54%. Furthermore, collective learning that incorporates linguistic constraints improves the F-score by 1.65%. We show that using SRL patterns improves the efficiency of training the model and predicate instances and reduces memory usage. We also show that our approaches can compete with current state-of-the-art approaches. The corpus used in our experiments is a small biomedical SRL corpus that covers only one quarter of the GENIA TreeBank corpus and focuses on only 30 verbs. It is important to enable SRL systems to be trained on a large corpus in the future. We consider that our approaches provide a possible solution for processing large SRL corpora.
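For reference, the two quantities mentioned above can be written out as follows. The first line is the standard joint distribution of a Markov Logic Network as defined by Richardson and Domingos [13]; the second is the balanced F-score, which we assume here is the measure behind the reported improvements (the text does not state the weighting explicitly). The symbols $w_i$, $n_i$, $Z$, $P$, and $R$ are notation introduced only for this sketch.

```latex
% MLN: each first-order formula F_i has a weight w_i;
% n_i(x) is the number of true groundings of F_i in the world x,
% and Z is the partition function that normalizes over all worlds.
P(X = x) = \frac{1}{Z} \exp\!\Big( \sum_{i} w_i \, n_i(x) \Big),
\qquad
Z = \sum_{x'} \exp\!\Big( \sum_{i} w_i \, n_i(x') \Big)

% Balanced F-score from precision P and recall R (assumed F_1).
F = \frac{2 P R}{P + R}
```

Under this formulation, a linguistic constraint can be expressed as a formula with a large weight, so that worlds violating it receive low probability; collective learning then scores all role assignments of a sentence jointly instead of one constituent at a time.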


Technology, vol. 25, pp. 169-179, 2010.

[2] R. T.-H. Tsai, W.-C. Chou, Y.-S. Su, Y.-C. Lin, C.-L. Sung, H.-J. Dai, I. T.-H. Yeh, W. Ku, T.-Y. Sung, and W.-L. Hsu, "BIOSMILE: a semantic role labeling system for biomedical verbs using a maximum-entropy model with automatically generated template features," BMC Bioinformatics, vol. 8, p. 325, 2007.

[3] S. Pradhan, W. Ward, K. Hacioglu, J. Martin, and D. Jurafsky, "Shallow Semantic Parsing using Support Vector Machines," in Proceedings of the Human Language Technology Conference/North American chapter of the Association for Computational Linguistics annual meeting (HLT/NAACL-2004), Boston, MA, USA, 2004.

[4] K. B. Cohen and L. Hunter, "A critical review of PASBio's argument structures for biomedical verbs," BMC Bioinformatics, vol. 7, 2006.

[5] S. Pradhan, K. Hacioglu, V. Krugler, W. Ward, J. H. Martin, and D. Jurafsky, "Support Vector Learning for Semantic Argument Classification," 2005.

[6] T. Cohn and P. Blunsom, "Semantic role labelling with tree conditional random fields," in Proceedings of CoNLL-2005, 2005, pp. 169-172.

[7] P. Kingsbury and M. Palmer, "From Treebank to PropBank," 2002.

[8] X. Carreras and L. Marquez, "Introduction to the CoNLL-2005 Shared Task: Semantic Role Labeling," 2005.

[9] D. Gildea and M. Palmer, "The necessity of parsing for predicate argument recognition," in ACL '02: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, 2001, pp. 239-246.

[10] V. Punyakanok, D. Roth, W.-t. Yih, and D. Zimak, "Semantic role labeling via integer linear programming inference," in Proceedings of COLING-04, 2004, pp. 1346-1352.

[11] S. Riedel, "Improving the Accuracy and Efficiency of MAP Inference for Markov Logic," in Proceedings of the 24th Annual Conference on Uncertainty in AI (UAI '08), 2008, pp. 468-475.

[12] P. Domingos and M. Richardson, "Markov Logic: A Unifying Framework for Statistical Relational Learning," in Proceedings of the ICML-2004 Workshop on Statistical Relational Learning and its Connections to Other Fields, 2004, pp. 49-54.

[13] M. Richardson and P. Domingos, "Markov logic networks," Machine Learning, vol. 62, pp. 107-136, 2006.

[14] K. Crammer and Y. Singer, "Ultraconservative online algorithms for multiclass problems," Journal of Machine Learning Research, vol. 3, pp. 951-991, 2003.

[15] S. Riedel, "Improving the accuracy and efficiency of MAP inference for Markov logic," presented at the Proceedings of UAI 2008, 2008.

[16] D. Gildea and D. Jurafsky, "Automatic labeling of semantic roles," Comput. Linguist., vol. 28, pp. 245-288, 2002.

[17] N. Xue, "Calibrating features for semantic role labeling," in Proceedings of EMNLP 2004, 2004, pp. 88-94.

[18] M. Surdeanu, S. Harabagiu, J. Williams, and P. Aarseth, "Using predicate-argument structures for information extraction," presented at the Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1, Sapporo, Japan, 2003.

[19] R. Agrawal, T. Imieliński, and A. Swami, "Mining association rules between sets of items in large databases," SIGMOD Rec., vol. 22, pp. 207-216, 1993.

[20] S. Riedel and I. Meza-Ruiz, "Collective semantic role labelling with Markov logic," presented at the Proceedings of the Twelfth Conference on Computational Natural Language Learning, Manchester, United Kingdom, 2008.

[21] T. Wattarujeekrit, P. Shah, and N. Collier, "PASBio: predicate-argument structures for event extraction in molecular biology," BMC Bioinformatics, vol. 5, p. 155, 2004.

[22] P. Thompson, S. Iqbal, J. McNaught, and S. Ananiadou, "Construction of an annotated corpus to support biomedical information extraction," BMC Bioinformatics, vol. 10, p. 349, 2009.

[23] A. Bies, "Bracketing Guidelines for Treebank II Style Penn Treebank Project," 1995.

[24] W.-C. Chou, R. T.-H. Tsai, Y.-S. Su, W. Ku, T.-Y. Sung, and W.-L. Hsu, "A semi-automatic method for annotating a biomedical proposition bank," presented at the Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006, Sydney, Australia, 2006.

[25] J. D. Kim, T. Ohta, Y. Tateisi, and J. Tsujii, "GENIA corpus -- a semantically annotated corpus for bio-textmining," Bioinformatics, vol. 19, pp. i180-i182, 2003.

[26] D. Dahlmeier and H. T. Ng, "Domain Adaptation for Semantic Role Labeling in the Biomedical Domain," Bioinformatics (Oxford, England), 2010.


[27] S. Bethard, Z. Lu, J. Martin, and L. Hunter, "Semantic Role Labeling for Protein Transport Predicates," BMC Bioinformatics, vol. 9, p. 277, 2008.

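Source document: "Using a Markov Logic Network Model and Automatically Generated Templates to Enhance Semantic Role Labeling of Biomedical Literature", NCCU Academic Hub (政大學術集成) (page 37-0)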
