4. Performance Evaluation

4.4 Comparison with Open IE Methods

We compare the proposed feature set with those of three well-known open IE methods: TEXTRUNNER (Banko et al., 2007), O-CRF (Banko and Etzioni, 2008), and StatSnowball (Zhu et al., 2009). We conduct two types of evaluation. The first runs the original methods on our corpus to examine whether they can detect interactive segments. The second compares the feature sets of the methods: for each compared method, we use its feature set to train an ME classifier and evaluate whether the features can detect interactive segments precisely. To keep the comparison fair, every feature set is evaluated with the same type of classifier, and 10-fold cross validation is used to obtain its global performance.
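The evaluation protocol can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: the `train` callback is a hypothetical stand-in for ME training, and both the round-robin fold assignment and the per-fold averaging of precision, recall, and F1 are assumptions, since the text does not state how folds are formed or how the global score is aggregated.

```python
from typing import Callable, List, Sequence, Tuple

Features = Sequence[int]
Classifier = Callable[[Features], int]                  # features -> 0/1 label
Trainer = Callable[[List[Features], List[int]], Classifier]


def precision_recall_f1(gold: List[int], pred: List[int]) -> Tuple[float, float, float]:
    """Scores for the positive class (1 = interactive segment)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split range(n) into k disjoint test folds (round-robin; an assumption)."""
    return [list(range(i, n, k)) for i in range(k)]


def cross_validate(X: List[Features], y: List[int], train: Trainer,
                   k: int = 10) -> Tuple[float, float, float]:
    """Average precision, recall, and F1 over k held-out folds."""
    totals = [0.0, 0.0, 0.0]
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [label for i, label in enumerate(y) if i not in held_out]
        classifier = train(train_X, train_y)      # fit on the other k-1 folds
        gold = [y[i] for i in test_idx]
        pred = [classifier(X[i]) for i in test_idx]
        for j, score in enumerate(precision_recall_f1(gold, pred)):
            totals[j] += score
    return totals[0] / k, totals[1] / k, totals[2] / k
```

Under this sketch, each feature set would be passed through `cross_validate` with the same `train` function, so that differences in the scores reflect the features rather than the learner.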

TEXTRUNNER, the first open IE system, employs six syntactic features to extract the relations between entities. The features include the part-of-speech (POS) tag sequence between an entity pair, the number of tokens and the number of stop words between the entity pair, whether an object is a proper noun, the POS tag to the left of the target entity, and the POS tag to the right of the candidate entity.
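As an illustration, the six features could be computed from a POS-tagged segment roughly as below. The sketch rests on several assumptions not taken from the text: entities are single tokens, the target entity precedes the candidate entity, the "object" in the fourth feature is the candidate entity, proper nouns carry the tag `NNP` or `NR`, and the stop-word list and boundary markers are toy placeholders.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]                          # (word, POS tag)

STOP_WORDS = {"the", "a", "of", "to", "with"}    # toy list (hypothetical)
PROPER_NOUN_TAGS = {"NNP", "NR"}                 # assumed proper-noun tags


def textrunner_features(tokens: List[Token], target: int,
                        candidate: int) -> Dict[str, object]:
    """Six TEXTRUNNER-style features for the pair (target, candidate)."""
    if not 0 <= target < candidate < len(tokens):
        raise ValueError("expected 0 <= target < candidate < len(tokens)")
    between = tokens[target + 1:candidate]
    last = len(tokens) - 1
    return {
        # F1: POS tag sequence between the entity pair
        "F1_pos_sequence": " ".join(pos for _, pos in between),
        # F2: number of tokens between the entity pair
        "F2_num_tokens": len(between),
        # F3: number of stop words between the entity pair
        "F3_num_stop_words": sum(1 for w, _ in between if w.lower() in STOP_WORDS),
        # F4: whether the object (assumed: the candidate entity) is a proper noun
        "F4_object_is_proper_noun": tokens[candidate][1] in PROPER_NOUN_TAGS,
        # F5: POS tag immediately left of the target entity
        "F5_pos_left_of_target": tokens[target - 1][1] if target > 0 else "<BOS>",
        # F6: POS tag immediately right of the candidate entity
        "F6_pos_right_of_candidate": tokens[candidate + 1][1] if candidate < last else "<EOS>",
    }
```

The English toy tokens used to try this out are only for readability; the corpus in the experiments is Chinese, where the tag set and stop-word list would differ.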

O-CRF considers syntactic features, including POS tags and regular expressions over surface syntax (e.g., detecting capitalization and punctuation). It also uses context words to identify relation keywords between entities. The context features consist of the word sequence between an entity pair, conjunctions of features occurring within six words to the left or right of the current token, and the punctuation between the entity pair.
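A rough sketch of these O-CRF-style features is given below. It is an assumption-laden illustration: the conjunction features are instantiated here as pairs of adjacent POS tags inside the six-word window, entities are single tokens, and the punctuation set is a small hand-picked placeholder.

```python
import string
from typing import Dict, List, Tuple

Token = Tuple[str, str]                                   # (word, POS tag)

# ASCII punctuation plus a few common Chinese marks (placeholder set)
PUNCTUATION = set(string.punctuation) | set("，。、；：！？")


def window_conjunctions(tags: List[str], i: int, window: int = 6) -> List[str]:
    """Conjoin POS tags at adjacent positions within `window` tokens of position i.

    Offsets are relative to the current token, e.g. "-1:WP&0:VBD".
    """
    lo = max(0, i - window)
    hi = min(len(tags) - 1, i + window)
    return [f"{j - i}:{tags[j]}&{j + 1 - i}:{tags[j + 1]}" for j in range(lo, hi)]


def ocrf_features(tokens: List[Token], target: int, candidate: int,
                  current: int) -> Dict[str, object]:
    """O-CRF-style features for an entity pair and one current token."""
    if not 0 <= target < candidate < len(tokens):
        raise ValueError("expected 0 <= target < candidate < len(tokens)")
    between = tokens[target + 1:candidate]
    tags = [pos for _, pos in tokens]
    return {
        "F1_pos_sequence": " ".join(pos for _, pos in between),
        "F2_has_punctuation": any(word in PUNCTUATION for word, _ in between),
        "F3_context_words": [word for word, _ in between],
        "F4_conjunctions": window_conjunctions(tags, current),
    }
```

In a CRF, features like these would be produced for every token position in turn, with `current` ranging over the segment.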

StatSnowball also adopts syntactic features to identify relation keywords between entities. The selected features include POS tags and occurrences of non-stop words. The original StatSnowball features focus on nouns, but interactions between persons are usually described by verbs. For a fair comparison, we therefore change the noun-related features to their verb counterparts.

For example, one StatSnowball feature determines whether the token preceding a target entity is a noun; we change it to examine whether that token is a verb. Specifically, the adapted features examine whether the verbs between an entity pair are all non-stop words that occur more often than a pre-defined threshold, whether the token preceding the target entity is a verb, and whether the token following the candidate entity is a verb.
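The three verb-based features can be sketched as follows. The details are assumptions made for illustration: verbs are recognized by POS tags beginning with `V`, entities are single tokens, the first feature is false when no verb appears between the pair, and the stop-word list and threshold value are placeholders.

```python
from collections import Counter
from typing import Dict, List, Tuple

Token = Tuple[str, str]                  # (word, POS tag)

STOP_WORDS = {"is", "has", "do"}         # toy stop-word list (hypothetical)
MIN_OCCUR = 2                            # placeholder threshold value


def is_verb(pos: str) -> bool:
    """Assumes verb tags start with 'V' (e.g. VB, VBD, VV)."""
    return pos.startswith("V")


def statsnowball_features(tokens: List[Token], target: int, candidate: int,
                          verb_counts: Counter,
                          min_occur: int = MIN_OCCUR) -> Dict[str, bool]:
    """Verb-adapted StatSnowball-style features for one entity pair."""
    if not 0 <= target < candidate < len(tokens):
        raise ValueError("expected 0 <= target < candidate < len(tokens)")
    verbs = [w for w, pos in tokens[target + 1:candidate] if is_verb(pos)]
    # F1: every verb between the pair is a non-stop word seen more than
    # min_occur times in the corpus (False if there is no verb: an assumption)
    f1 = bool(verbs) and all(
        w.lower() not in STOP_WORDS and verb_counts[w] > min_occur for w in verbs
    )
    # F2: the token before the target entity is a verb
    f2 = target > 0 and is_verb(tokens[target - 1][1])
    # F3: the token after the candidate entity is a verb
    f3 = candidate + 1 < len(tokens) and is_verb(tokens[candidate + 1][1])
    return {"F1": f1, "F2": f2, "F3": f3}
```

Here `verb_counts` is meant to hold corpus-wide verb frequencies, so the first feature depends on the whole corpus rather than on the segment alone.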

The features of the compared methods are listed in Table 4-5, and we evaluate the open IE methods on these feature sets. Notably, O-CRF and StatSnowball, which are designed for relation extraction, extract interaction keywords from a candidate segment in our experiment. Hence, a candidate segment is classified as non-interactive if no interaction keyword is extracted from it.
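The decision rule for the keyword-extracting systems amounts to the small wrapper below. The `toy_extractor` is purely hypothetical and stands in for a trained extraction model; only the rule in `classify_segment` reflects the text.

```python
from typing import Callable, List

KeywordExtractor = Callable[[str], List[str]]


def classify_segment(segment: str, extract_keywords: KeywordExtractor) -> bool:
    """Return True (interactive) iff at least one interaction keyword is extracted."""
    return len(extract_keywords(segment)) > 0


def toy_extractor(segment: str) -> List[str]:
    """Hypothetical stand-in for a learned extractor: simple lexicon lookup."""
    lexicon = {"met", "criticized", "supported"}
    return [word for word in segment.split() if word in lexicon]
```

Under this rule, every interactive segment for which the extractor returns nothing is counted as a miss, so the recall of segment detection is bounded by the coverage of the extractor.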

Table 4-5. The features of the compared methods

Method            Description of features

TEXTRUNNER_F      F1: the presence of part-of-speech tag sequences between entity pair.
                  F2: the number of tokens between entity pair.
                  F3: the number of stop-words between entity pair.
                  F4: whether or not an object e is found to be a proper noun.
                  F5: the part-of-speech tag to the left of target entity e_t.
                  F6: the part-of-speech tag to the right of candidate entity e_c.

O-CRF_F           F1: the part-of-speech tags sequence between entity pair.
                  F2: is there any punctuation between entity pair.
                  F3: context words sequence between entity pair.
                  F4: conjunctions of features occurring in adjacent positions within six words to the left and six words to the right of the current token.

StatSnowball_F    F1: verbs between entity pair are all not stop word and occur more than MIN_OCCUR times.
                  F2: the previous token of target entity e_t is verb.
                  F3: the following token of candidate entity e_c is verb.

The performance of the compared feature sets and methods is shown in Table 4-6. The results of TEXTRUNNER_F, O-CRF_F, and StatSnowball_F denote the performance of the compared feature sets when used to train ME classifiers, and the results of TEXTRUNNER, O-CRF, and StatSnowball denote the performance of the original systems on our corpus. As shown in the table, FISER outperforms all the compared methods and feature sets. Because the compared methods and feature sets use only syntactic features, they cannot capture the semantics of person interactions in candidate segments. By contrast, FISER incorporates semantic and context-dependent features, and thus achieves the best precision, recall, and F1 score. O-CRF outperforms StatSnowball and TEXTRUNNER in F1 score because its feature set considers the context information of a candidate segment.

It is interesting to note that the original O-CRF and StatSnowball systems are inferior to O-CRF_F and StatSnowball_F, and that their recall rates are very low (8.8% and 5.5%, respectively). O-CRF and StatSnowball employ the CRF model to learn the extraction patterns of interaction keywords. Since non-interactive segments contain no interaction keywords, only the interactive segments of the training data are useful for pattern learning. As shown in Table 4-1, most of the candidate segments are non-interactive. Thus, the learned extraction patterns fail to cover many interactive segments, and the recall rates of the methods deteriorate. This outcome agrees with the observation of Li et al. (2008) that detecting relation segments is necessary to ensure that extractions of relation keywords are reliable.

Table 4-6. The interaction detection results of the compared methods

Method / feature set    Precision    Recall    F1-score
TEXTRUNNER              32.7%        2.2%      4.0%
O-CRF                   42.1%        8.8%      14.6%
StatSnowball            48.1%        5.5%      9.9%
TEXTRUNNER_F            48.8%        34.3%     38.9%
O-CRF_F                 53.2%        39.8%     43.5%
StatSnowball_F          52.6%        25.2%     32.1%
FISER_all               70.2%        54.8%     60.7%
FISER_best              72.6%        55.6%     61.9%
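For reference, the F1-score is the harmonic mean of precision and recall. The helper below is a small sketch for recomputing it from a precision and recall pair given in percent. An F1 recomputed from the rounded averages in the table need not match the reported value exactly if the reported F1 was averaged over the ten folds; that aggregation is an assumption here, since the text does not state it.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit in, same unit out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Example with the O-CRF row (values in percent)
ocrf_f1 = round(f1_score(42.1, 8.8), 1)
```

Because the harmonic mean is dominated by the smaller of the two values, the very low recall of the original systems keeps their F1-score low even when precision is moderate.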

Based on the experimental results, we conclude that syntactic features alone cannot detect interactive segments reliably. Existing open IE studies focus on discovering static and permanent relations between entities, for which the syntactic features of the text between entities are useful. Banko and Etzioni (2008) claim that 86% of relation expressions appear between the given entities. However, in our data corpus, only 56% of the interaction expressions appear between the given person names in Chinese. In addition, Chinese sentences are complex and contain many unknown words (Ling et al., 2003), which affect the correctness of the syntactic features used by the compared methods. Therefore, the compared methods are inferior in detecting interactive segments.

To sum up, FISER employs effective features that cover the syntactic, context-dependent, and semantic information of text to detect interactive segments in topic documents. Because FISER filters out non-interactive segments and discriminates between interactive segments effectively, it outperforms the compared methods.

Source document: A Study on Extracting Interpersonal Interaction Relations from Topic Documents (pp. 39-44)