
CHAPTER 1 Introduction


National Chengchi University

emphasizes the importance of collectively learned semantic roles.

1.5 Our Goal

In this paper, we focus on 1) how to automatically select and assign initial semantic roles, and 2) how to enhance SRL with collectively learned semantic roles using a Markov Logic Network [11]. The following is an overview of this paper. In Chapter 2, we describe our method: subsection 2.1 introduces the MLN proposed by [12]; subsection 2.2 describes the implementation of our SRL system; subsection 2.3 proposes the method of using SRL patterns to select and assign initial semantic role labels (because it is difficult to employ biological experts to manually write SRL patterns, we propose an automatic pattern generation method); and subsection 2.4 introduces the collectively learned semantic role method that implements linguistic constraints. In Chapter 3, we detail the experiments designed to examine the effect of our methods. In Chapter 4, we present the experimental results and related analysis. Finally, Chapter 5 summarizes the contributions of this paper.

MLN combines first-order logic (FOL) and Markov networks. In FOL, formulae consist of four types of symbols: constants, variables, functions, and predicates. Constants represent objects in a specific domain (e.g., the part-of-speech tags NN, VB, etc.). Variables range over the objects (e.g., pos ranges over the set of part-of-speech tags). Functions represent mappings from tuples of objects to objects (e.g., ChildrenOf, which maps a tree node to its children). Predicates represent relations among objects (e.g., the PoS of a headword) or attributes of objects (e.g., Arg0). Constants and variables may belong to specific types. An atom is a predicate symbol applied to a list of arguments, which may be constants or variables (e.g., role(p,i,r)). A ground atom is an atom whose arguments are all constants. A possible world assigns a truth value to every possible ground atom. A knowledge base (KB) is a partial specification of a world; each atom in it is true, false, or unknown.

2.1.3 Markov Logic Networks

An MLN is a set of weighted first-order formulae. Together with a set of constants representing objects in the domain, it defines a Markov network with one variable per ground atom and one feature per ground formula. The probability distribution over possible worlds

is given by

P(X = x) = (1/Z) exp( Σ_{i∈F} Σ_{j∈G_i} w_i g_j(x) )

where Z is the normalization constant, F is the set of all first-order formulae in the MLN, G_i is the set of groundings of the i-th first-order formula, w_i is the weight of the i-th formula, and g_j(x) = 1 if the j-th ground formula is true and g_j(x) = 0 otherwise. Markov logic enables us to compactly represent complex models in non-i.i.d. domains. General algorithms for inference and learning in Markov logic are discussed in Richardson and Domingos [13]. We use the 1-best MIRA online learning method [14] for learning weights and employ cutting plane inference [15] with integer linear programming as its base solver, both at test time and during MIRA online learning. To avoid ambiguity between the predicates of FOL and those of SRL, we refer to the SRL predicate as the "event trigger" from now on.

2.2 Implementing Biomedical Semantic Role Labeling

2.2.1 Formulating SRL

Our SRL system incorporates three components: (1) SRL patterns; (2) collective learning formulae; (3) an MLN-based classifier.

SRL patterns: The SRL patterns are the patterns described in subsection 2.3. We use the predicate pattern_match(p, i, r) to state that, according to a pattern, there is an event trigger p and the constituent i has the semantic role r.

Collective learning formulae: The collective learning formulae are the formulae described in subsection 2.4.

MLN-based classifier: Our MLN-based classifier takes the features of BIOSMILE and transforms them into formulae. In subsection 2.2.2, we describe how to transform these features into formulae and how to incorporate the SRL patterns into the classifier. In subsection 2.2.3, we propose a method to automatically generate conjunction formulae using only annotated PAS information. In subsection 2.2.4, we further turn the features into global formulae, since an MLN accepts only formulae, not features.

A basic formula consists of two predicates, one corresponding to the event trigger and the other to a feature of a constituent. For example, the headword feature can be expressed in FOL as pattern_match(p,i,r) ∧ headword(i,+w) ⇒ role(p,i,r), where w is the headword of the constituent i. The "+" symbol before a variable indicates that each different value of the variable has its own weight.
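To make the mapping from classifier features to basic formulae concrete, here is a minimal sketch. The rendering format is an assumption for illustration, not the exact syntax accepted by our MLN engine:

```python
# Features taken from the classifier; the names are illustrative.
FEATURES = ["headword", "path", "constituent_type"]

def basic_formula(feature):
    """Render one basic formula: the pattern and feature predicates imply
    the role predicate; '+v' gives each feature value its own weight."""
    return f"pattern_match(p,i,r) ^ {feature}(i,+v) => role(p,i,r)"

formulas = [basic_formula(f) for f in FEATURES]
```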

Table 2.1: Features used in previous SRL systems.

BASIC FEATURES

- Predicate – The predicate lemma
- Path – The syntactic path through the parse tree from the constituent being classified to the predicate
- Constituent type
- Position – Whether the phrase is located before or after the predicate
- Voice – Passive if the predicate has the POS tag VBN and its chunk is not a VP, or it is preceded by a form of "to be" or "to get" within its chunk; otherwise active
- Head word – Calculated using the head word table described by Collins (1999)
- Head POS – The POS of the head word
- Sub-categorization – The phrase structure rule that expands the predicate's parent node in the parse tree
- First and last words and their POS tags
- Level – The level in the parse tree

PREDICATE FEATURES

- Predicate's verb class
- Predicate POS tag
- Predicate frequency
- Predicate's context POS
- Number of predicates

FULL PARSING FEATURES

- Parent, left-sibling, and right-sibling paths, constituent types, positions, head words, and head POS tags
- Head of prepositional phrase (PP) parent – If the parent is a PP, the head of this PP is also used as a feature

COMBINATION FEATURES

- Predicate distance combination
- Predicate phrase type combination
- Head word and predicate combination
- Voice position combination

OTHERS

- Syntactic frame of predicate/NP
- Headword suffixes of lengths 2, 3, and 4
- Number of words in the phrase
- Context words & POS tags

2.2.3 Conjunction formulae

In addition to the basic formulae described above, we also employ conjunction formulae. We generate them with an approach similar to the one described in subsection 2.3. However, unlike the patterns, which aim for high recall without regard to precision, the conjunction formulae should improve both recall and precision as far as possible. We therefore use the Apriori algorithm to generate conjunction formulae. The Apriori algorithm is described in subsection 2.3; to generate conjunction formulae we set the default minimum support to 15 occurrences and the minimum confidence to 80%, values that we believe yield reliable conjunction formulae.

Conjunction formulae are composed of three or more predicates: one is the event trigger and the others are linguistic properties of a constituent. For instance,

constituent_type(i, "PP") ∧ first_word(i, "in") ∧ last_word(i, "cell") ⇒ role(p, i, "ARGM-LOC")

means that the constituent i should be labeled "ARGM-LOC" when its constituent type is "PP", its first word is "in", and its last word is "cell".

2.2.4 Global formulae

Basic formulae and conjunction formulae are local formulae, whose conditions consider only observed predicates; that is, the dependencies among semantic roles are not taken into account. A global formula is a formula whose condition includes hidden predicates, or a constraint that cannot be violated. In our system the hidden predicate is the semantic role, which is learned collectively under dependencies that include the tree collective and the path collective.

2.3 Patterns for SRL

2.3.1 Introduction of the Patterns

In this section, we give the formal definition of our patterns. Ideally, SRL patterns could express sentences directly as PASs. However, this is difficult without the help of biological experts. For example, one pattern may indicate that the noun phrase appearing in front of an active verb such as bind is the agent, while another pattern may indicate that the protein before bind is the agent. It is difficult to determine which pattern is correct: the first pattern might be wrong when the noun phrase describes a process involving a protein, and the second pattern might fail when the protein cannot be recognized. Resolving such conflicts would require manually designing the dependencies among the patterns, which is difficult for humans to do. Our patterns are therefore designed to answer what the candidate semantic role labels in a sentence are, rather than what the appropriate semantic role labels are. They focus on removing the constituents that should not be assigned semantic role labels and on recognizing the candidate semantic role labels. The following sections describe our SRL patterns.

2.3.2 Tree Pruning

Since the ratio of constituents bearing semantic roles to all constituents is very low, the goal of tree pruning is to reduce the number of candidate constituents. Some SRL systems likewise use pruning methods [17] or pre-/post-processing filters [10] to reduce complexity or improve performance, and these methods are also used in our SRL patterns. We use two tree pruning methods. The first removes the constituents on the same path as the predicate: if a constituent overlaps with the predicate, it should not be assigned a semantic role, and removing these overlapping constituents before classification both guarantees that they cannot be assigned semantic roles and makes training/testing more efficient. Figure 2.1.a shows an example. The second reflects that SRL prefers to annotate semantic roles on phrases rather than tokens: constituents that 1) are leaves, 2) have no siblings, and 3) are stop words are removed. Figure 2.1.b shows an example.
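The two pruning rules can be sketched as follows. The nested-dict tree representation, the stop-word list, and the toy tree are illustrative assumptions; our system operates on full parse trees:

```python
STOP_WORDS = {"the", "a", "an", "of", "to", "in"}  # illustrative list

def prune(node, predicate_path, path=()):
    """Collect candidate constituents, applying both pruning rules:
    1) drop nodes on the root-to-predicate path (they overlap the predicate);
    2) drop leaves that are stop words and have no siblings."""
    candidates = []
    children = node.get("children", [])
    on_predicate_path = path == predicate_path[:len(path)]
    if not on_predicate_path:
        lonely_stopword_leaf = (not children
                                and node.get("only_child", False)
                                and node.get("word", "").lower() in STOP_WORDS)
        if not lonely_stopword_leaf:
            candidates.append(node["label"])
    for k, child in enumerate(children):
        child["only_child"] = len(children) == 1
        candidates += prune(child, predicate_path, path + (k,))
    return candidates

# Toy tree for "The / regulates the gene"; the predicate VBZ is at path (1, 0).
tree = {"label": "S", "children": [
    {"label": "NP", "children": [
        {"label": "DT", "word": "The", "children": []}]},   # lonely stop word
    {"label": "VP", "children": [
        {"label": "VBZ", "word": "regulates", "children": []},
        {"label": "NP", "children": [
            {"label": "DT", "word": "the", "children": []},
            {"label": "NN", "word": "gene", "children": []}]}]}]}
candidates = prune(tree, predicate_path=(1, 0))
```

Here S, VP, and VBZ are pruned by rule 1 (they lie on the predicate path), and the subject's lone stop-word leaf is pruned by rule 2, while the object's "the" survives because it has a sibling.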


Figure 2.1: Examples of tree pruning.

a. Constituents that overlap with the predicate regulate.

b. Constituents that are stop words or have no siblings.

2.3.3 Lexicon Pattern

Lexicon patterns assign semantic role labels to constituents. Most SRL systems use lexical features for argument identification/classification. Here we describe semantic role labels that can be identified from their words alone, using string matching.

Discourse (ARGM-DIS): A discourse marker connects a sentence to a preceding sentence. It is not necessary to use classification to find these; a simple word list suffices.

Modals (ARGM-MOD), Negation (ARGM-NEG): When the predicate is next to one of these words, we assign the word the corresponding semantic role. We collect these word/semantic-role pairs in a word list.
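The lexicon patterns amount to word-list lookups with an adjacency condition. A minimal sketch follows; the word lists here are illustrative placeholders, since the actual lists are collected from the training data:

```python
# Illustrative word lists; the actual lists come from the training data.
LEXICON = {
    "ARGM-DIS": {"however", "therefore", "thus", "moreover"},
    "ARGM-MOD": {"can", "could", "may", "might", "should", "will"},
    "ARGM-NEG": {"not", "n't", "never"},
}

def lexicon_label(token, next_to_predicate):
    """String-match a token against the word lists. ARGM-MOD and
    ARGM-NEG additionally require adjacency to the predicate."""
    word = token.lower()
    if word in LEXICON["ARGM-DIS"]:
        return "ARGM-DIS"
    if next_to_predicate:
        for label in ("ARGM-MOD", "ARGM-NEG"):
            if word in LEXICON[label]:
                return label
    return None
```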


2.3.4 Temporal Pattern

Semantic roles involving numbers or time are difficult to recognize, and they also make sentences more complex. Recognizing these semantic roles before classification reduces the complexity of the sentences. We manually write patterns to recognize them.

Extent Marker ( ARGM-EXP ) : An extent marker indicates the amount of change caused by an action, such as "... fold". We observe that extent markers are usually siblings of the verb. Therefore, we design our pattern as follows: if a trigger of an extent marker such as "%" or "fold" is present, the sibling constituent of the verb that contains the trigger is assigned the extent marker.

Temporal Marker ( ARGM-TMP ) : A temporal marker indicates when an action took place. Like extent markers, temporal markers are usually siblings of the verb, so we use the same method to find them. Furthermore, temporal markers sometimes appear at the beginning of a sentence; we therefore also assign the temporal marker to a sentence-initial constituent that contains a temporal trigger such as "hour".

2.3.5 Conjunction Pattern

In addition to all of the patterns above, there are still many potential patterns that could be used to annotate semantic roles. Here we propose a method that uses association rule mining to automatically generate patterns that conjoin several features of a constituent. For instance, first_word ( i, "in" ) ∧ last_word ( i, "cell" ) ⇒ role ( p, i, "ARGM-LOC" ) means that a constituent i starting with "in" and ending with "cell" should be assigned the locative modifier ARGM-LOC. In subsection 2.3.5.1, we introduce association rule mining; in subsection 2.3.5.2, we propose our formulation of transactions for SRL; in subsection 2.3.5.3, we describe our filtering method for selecting conjunction patterns.


2.3.5.1 Association Rule Mining

Association rule mining [19] discovers interesting relations, called association rules, in a database, and is a popular research area. An association rule is a rule such as "If a person buys wine and bread, he/she often buys cheese, too". We found that SRL patterns are very similar to association rules; for instance, an SRL pattern can be written as a rule such as "If a constituent starts with the word in and ends with the word cell, it often plays the role ARGM-LOC". Therefore, we apply association rule mining to generate conjunction patterns. To discover interesting relations, four notions must be defined: item, transaction, support, and confidence. An item is an object participating in the rules; continuing the example, the first word in, the last word cell, and the semantic role ARGM-LOC are items. A transaction is a collection of items.

The support of an itemset (a subset of a transaction) is the number of transactions in the collection in which it appears; a minimum support ensures that the mined rules do not overfit the database. The confidence of a rule is the number of transactions in which the whole rule holds divided by the number in which its condition holds; a minimum confidence ensures that mined rules are usually correct in the database, and a maximum confidence ensures that mined rules are not trivially true in the database. In this paper, we focus on how to discover the rules rather than on how to implement the association rule mining method itself.
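The support and confidence definitions above can be sketched directly over SRL-style transactions. The transactions below are fabricated for illustration, not drawn from BioProp:

```python
# Each transaction holds the items extracted from one labeled constituent.
transactions = [
    {"type=PP", "first=in", "last=cells", "role=ARGM-LOC"},
    {"type=PP", "first=in", "last=cells", "role=ARGM-LOC"},
    {"type=PP", "first=in", "last=mice", "role=ARGM-LOC"},
    {"type=NP", "first=the", "last=protein", "role=ARG1"},
]

def support(itemset):
    """Number of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(condition, consequent):
    """How often the whole rule holds when its condition holds."""
    return support(condition | consequent) / support(condition)

s = support({"first=in", "last=cells"})
c = confidence({"first=in", "last=cells"}, {"role=ARGM-LOC"})
```

With this data, the rule "first=in ∧ last=cells ⇒ role=ARGM-LOC" has support 2 and confidence 1.0.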

2.3.5.2 Formulate the Transaction

By observing individual semantic roles, we find that a semantic role can sometimes be determined by its first and last words; for example, a phrase like "in ... cell" usually plays the role ARGM-LOC. Therefore, we propose a method that automatically generates such patterns, with the following steps:


Figure 2.2: The examples for mining association rules.

a. T3 efficiently induced erythroid differentiation in these cells, thus overcoming the v-erbA-mediated differentiation arrest.

b. In contrast, mRNA representing pAT 591/EGR2 was not expressed in these cells.

Table 2.2: The information extracted from ARGM-LOC on Figure 2.2.a.

Constituent type: PP
First word / POS: in / IN
Last word / POS: cells / NNS
Syntactic path from the predicate: VBD > VP < PP
Predicate: induce

Step 1: Extract information about each argument, including the constituent type, the first and last word, the syntactic path from the predicate, and the predicate itself. We treat these pieces of information as items. For instance, for the ARGM-LOC in Figure 2.2.a, we extract the information shown in Table 2.2.

Step 2: We treat each constituent bearing a semantic role as a transaction, and the information extracted in Step 1 as its items. For instance, the two sentences shown in Figure 2.2 can be transformed into the transactions in Table 2.3.

Table 2.3: The transactions transformed from Figure 2.2.

Itemset

Step 3: Using association rule mining, we can generate rules such as first_word ( i, "in" ) ∧ last_word ( i, "cells" ) ⇒ role ( p, i, "ARGM-LOC" ).

2.3.5.3 Select the Patterns

However, the patterns generated in Step 3 may not all be suitable for SRL. We therefore keep only the patterns whose conditions combine informative items, such as "the first word and the last word" or "the first word and the syntactic path", and that appear more than 2 times.

2.3.6 Syntactic Path Pattern

In addition to all of the methods above, we use shortest-syntactic-path patterns: when a constituent has no candidate semantic role label, we check whether it shares a similar syntactic path with a semantic role that appears in the training set; if so, the constituent is assigned that semantic role label.
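This fallback lookup can be sketched as follows. The training paths and the majority-vote tie-breaking are illustrative assumptions:

```python
from collections import Counter

# Hypothetical (syntactic path, role) pairs observed in a training set.
training_paths = [
    ("VBD>VP<PP", "ARGM-LOC"),
    ("VBD>VP<PP", "ARGM-LOC"),
    ("VBD>VP<NP", "ARG1"),
]

# For each path, remember its most frequent role.
by_path = {}
for path, role in training_paths:
    by_path.setdefault(path, Counter())[role] += 1
path_to_role = {p: c.most_common(1)[0][0] for p, c in by_path.items()}

def path_pattern(path, has_candidate_label):
    """Fall back to the path lookup only for constituents that no
    other pattern gave a candidate label."""
    if has_candidate_label:
        return None
    return path_to_role.get(path)
```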

2.4 Collective Learning for SRL

2.4.1 Collective Learning

Collective learning is also known as collective classification. Classification problems assign appropriate labels to instances; for instance, disease-gene-related document classification distinguishes disease-gene-related documents from other documents. This formulation assumes that whether a document is disease-gene related is independent of the documents it references. However, the referenced documents carry rich information, and collective learning can benefit from it. MLNs have also been shown to perform well on collective learning [20].

2.4.2 Linguistic Constraints

Linguistic constraints [10] have been shown to contribute to SRL. In this paper, we call the linguistic constraints we use the tree collective and the path collective.

The tree collective addresses the case in which two or more arguments in a sentence are assigned the same semantic role, which contradicts the PAS. To prevent this, we use the formula

role(p, i, r) ∧ i ≠ j ⇒ ¬role(p, j, r)

as a hard constraint. This formula ensures that each semantic role is assigned to only one constituent. We call this formula the tree collective, since it limits an event trigger to at most one constituent per core argument (the numbered arguments ARGX).

Furthermore, arguments may overlap when a node and its ancestor node(s) are both assigned semantic roles. The formula

overlap(i, j) ∧ role(p, i, r1) ⇒ ¬role(p, j, r2)

ensures that if two constituents overlap, only one of them can be assigned a semantic role. We call this formula the path collective, since it prevents arguments from appearing on the same path of the syntactic tree.
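In our system these two constraints are enforced as hard formulae inside the MLN/ILP inference. The standalone checker below only illustrates what they forbid; the constituent-id representation and the precomputed overlap relation are assumptions for the sketch:

```python
CORE_ROLES = {"ARG0", "ARG1", "ARG2", "ARG3", "ARG4", "ARG5"}

def violates_tree_collective(labeling):
    """True if some core role is given to two constituents of one trigger.
    labeling: dict mapping constituent id -> assigned role."""
    core = [r for r in labeling.values() if r in CORE_ROLES]
    return len(core) != len(set(core))

def violates_path_collective(labeling, overlaps):
    """True if two labeled constituents lie on the same tree path.
    overlaps: set of frozensets, each an overlapping pair of ids."""
    ids = list(labeling)
    return any(frozenset((a, b)) in overlaps
               for n, a in enumerate(ids) for b in ids[n + 1:])
```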


Table 3.1: The statistics on the BioProp corpus.

Role Number

Core argument types 11

Adjunctive argument types 21

Feature Number

Constituent types 17

Unique words 5258

Part-of-speech 34

Other Number

Event types 30

Abstracts with Propositions 445

Sentences with Propositions 1622

Propositions 1962

CHAPTER 3 Experiment

3.1 Dataset

To evaluate our SRL system, we select BioProp [9] as our dataset. BioProp is a semantic role labeling corpus containing 445 biomedical abstracts labeled with semantic roles for 30 predicates, which are among the most important or most frequently occurring in the biomedical literature. Table 3.1 shows the statistics of BioProp.

Core arguments play the major semantic roles of an event and include ARGX, R-ARGX, and C-ARGX. ARGX usually denotes agents, patients, and objects; R-ARGX marks the start of a clause that describes an ARGX; C-ARGX marks the continuation of an ARGX. Adjunctive arguments express the location, manner, time, or extent that indicates the state of the event.

3.2.1 Experiment 1 – The Effect of Automatically Generated Patterns

In this experiment, we evaluate the effect of using SRL patterns by comparing three configurations. 1) BIOSMILE : this configuration implements the basic formulae. 2) BIOSMILE + manual pattern : this configuration adds the manually designed patterns; automatically generated patterns are not used in this configuration. 3) BIOSMILE + pattern : this configuration implements all the patterns and the formulae, including the basic formulae and the conjunction formulae. Comparing configurations 2 and 3 shows the effect of using automatically generated patterns.

3.2.2 Experiment 2 – Improvement by Using Collective Learning

In this experiment, we examine whether the patterns, incorporated with collective learning, can further enhance SRL. We compare four configurations. 1) BIOSMILE : the same configuration as in Experiment 1. 2) BIOSMILE + pattern : also the same as in Experiment 1. 3) BIOSMILE + CL : BIOSMILE combined with collective learning. 4) BIOSMILE + pattern + CL : BIOSMILE + pattern combined with collective learning.

3.3 Evaluation Metric

The formulae for calculating precision and recall are as follows:

Precision = TP / (TP + FP), Recall = TP / (TP + FN),

where TP, FP, and FN are the numbers of true positives, false positives, and false negatives; the F-score is the harmonic mean of precision and recall.

In order to provide a fairer comparison, we apply a two-sample paired t-test, defined as follows. The null hypothesis, which states that there is no difference between the two configurations A and B, is given as

H0: μA = μB,

where μA is the true mean F-score of configuration A and μB is that of configuration B, and the alternative hypothesis is

H1: μA > μB.

A paired t-test is applied since the two configurations are evaluated on the same samples, so the samples are paired rather than independent. Letting d_i be the difference between the F-scores of A and B on the i-th sample, d̄ the mean of these differences, s_d their standard deviation, and n the number of samples, the t statistic is

t = d̄ / (s_d / √n).

If the resulting t-score is equal to or less than 1.67, with 29 degrees of freedom at the 95% confidence level, the null hypothesis is accepted; otherwise it is rejected.
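The evaluation metric and the paired t statistic can be computed with the standard library alone. The scores below are illustrative numbers, not results from our experiments:

```python
import math
from statistics import mean, stdev

def precision_recall_f(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F-score)."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

def paired_t(a_scores, b_scores):
    """One-tailed paired t statistic for H1: mean(A) > mean(B)."""
    diffs = [a - b for a, b in zip(a_scores, b_scores)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Illustrative counts and per-sample F-scores.
p, r, f = precision_recall_f(tp=80, fp=20, fn=20)
t = paired_t([0.80, 0.82, 0.81], [0.78, 0.80, 0.80])
```

A t-score this large would exceed the one-tailed critical value of 1.67 and reject the null hypothesis; in our experiments the comparison is run over the 30 test samples, giving 29 degrees of freedom.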
