Improving Definite Anaphora Resolution by Effective Weight Learning and Web-Based Knowledge Acquisition


IEICE TRANS. INF. & SYST., VOL.E94–D, NO.3 MARCH 2011
PAPER, Special Section on Data Engineering

Dian-Song WU†a), Student Member, and Tyne LIANG†b), Nonmember

SUMMARY. This paper addresses Chinese definite anaphora resolution using feature weight learning and Web-based knowledge acquisition. The presented salience measurement uses entropy-based feature weights to select among antecedent candidates. The knowledge acquisition model extracts additional semantic features, such as gender, number, and semantic compatibility, by employing multiple resources and Web mining. The resolution is evaluated on a real corpus and compared with a classification-based model. Experimental results show that our approach yields a 72.5% success rate on 426 anaphoric instances, an improvement of 4.7 percentage points over a general classification-based approach.
key words: definite anaphora resolution, feature weight learning, Web mining

1. Introduction

In natural language, anaphora plays an essential role in the cohesion of discourse. Anaphora denotes the phenomenon of referring back to previously mentioned entities in a text. The referring expression is called an anaphor, and the entity to which it refers is its antecedent. Anaphora resolution is the process of identifying the anaphoric relation between two expressions in a context. Effective anaphora resolution facilitates natural language processing tasks such as text summarization, information extraction, and machine translation. Different kinds of anaphoric expressions can appear in a context, such as personal pronouns, definite noun phrases, and ellipses. In this paper, we focus on the resolution of Chinese definite anaphora.
Manuscript received June 4, 2010. Manuscript revised October 3, 2010.
† The authors are with the Department of Computer Science, National Chiao Tung University, 1001 Ta Hsueh Road, Hsinchu, Taiwan.
a) E-mail: diansongwu@cs.nctu.edu.tw
b) E-mail: tliang@cs.nctu.edu.tw
DOI: 10.1587/transinf.E94.D.535

Traditionally, anaphora resolution is approached with hand-crafted rules concerning constraints like gender agreement, number agreement, syntactic parallelism, and proximity [1]–[4]. In the recent decade, the trend in anaphora resolution has moved toward machine learning approaches [5]–[10]. Most of these learning-based approaches recast anaphora resolution as a binary classification problem: a classifier is trained in advance to determine whether a candidate and an anaphor are anaphoric or not. In addition, to deal with insufficient knowledge acquired from lexicons or given corpora, the World Wide Web has been widely used as a corpus [11]–[15]. For example, Markert et al. [14] utilized Web search and lexico-syntactic patterns to solve the out-of-vocabulary problem of hand-crafted lexicons.

In contrast to the extensive studies of English texts, effective Chinese anaphora resolution has not been widely addressed [16]. The difficulties are mainly attributed to the following factors. First, morphological clues for determining the gender or number of Chinese nouns are rare [17]. Second, Chinese has no capitalization to mark proper nouns. Third, no sufficiently rich ontology comparable to WordNet is available for identifying hypernymy or hyponymy relations between concepts. Not only morphological and syntactic knowledge but also lexical semantics and domain knowledge are required to enhance the resolution results. Moreover, conventional approaches to anaphora resolution have some drawbacks. In a rule-based approach, a salience score based on manually assigned weights is usually adopted to select the antecedent.
Errors may occur because such feature weights are chosen from intuitive observations and are subject to bias. On the other hand, the drawback of a classification-based approach is that it forces different candidates for the same anaphor to be considered independently [18]. Only a single candidate is evaluated at a time, and the resolution proceeds in the reverse order of sentences until an antecedent is found. As a result, the true antecedent may be missed once the classifier labels some other candidate as positive.

In this paper, a novel approach using two strategies is presented to resolve Chinese definite anaphors in written texts and avoid the drawbacks mentioned above. The first is an adaptive-weight salience measurement for antecedent identification, in which a weighted ranker evaluates the entire set of candidates simultaneously. The second is a Web-based knowledge acquisition model that extracts useful lexical knowledge, such as gender, number, and semantic compatibility. The experimental results show that our proposed approach yields a 72.5% success rate on 426 anaphoric instances, an improvement of 4.7 percentage points over the result obtained with a conventional classifier.

The rest of the paper is organized as follows. Section 2 introduces the definite anaphora instances commonly seen in Chinese texts. Section 3 describes the proposed method, which uses feature weight learning and lexical knowledge acquisition. Section 4 describes the experimental results and analysis. Section 5 presents the conclusions.

Copyright © 2011 The Institute of Electronics, Information and Communication Engineers

2. Chinese Definite Anaphora

2.1 Types of Chinese Definite Anaphora

Three common Chinese anaphoric phenomena are zero, pronominal, and definite anaphora. In our previous research, these three types accounted for 57%, 29%, and 14% of anaphoric instances, respectively. For Chinese zero and pronominal anaphora resolution, it has been shown that weight learning significantly improves resolution performance in comparison with unweighted methods [19]. We therefore address Chinese definite anaphora here to complete the coverage of Chinese anaphora resolution tasks.

In Chinese definite anaphora (DA), an antecedent is referred to by a definite noun phrase preceded by a demonstrative; four demonstratives are considered, two glossed as "this" and two glossed as "that". Similarly, an English definite noun phrase is introduced by the definite article "the". In this paper, we tackle Chinese definite anaphors matching the pattern "[(this)] + [(quantifier)] + [(physical noun phrase)]". Grammatically, definiteness is a feature of noun phrases, indicating entities which are specific and identifiable in a given context. The relation between a definite anaphor and its antecedent may be partial overlap as in example (1), synonymy as in example (2), or hyponymy-hypernymy as in example (3).

(1) Partial overlap relation:
The Gulf War_1 was led by the United States to attack Iraq in 1991. The war_1 had a huge impact on international politics and economics.

(2) Synonymous relation:
Thieves_2 broke into a neighbor's house last week. The police thought that the burglars_2 probably entered the house through the windows.

(3) Hyponymy-hypernymy relation:
Parkinson's disease_3 is caused by the degradation of brain cells. Medically, it is believed that this disease_3 is closely related to dopamine.

The anaphoric relation in (1) can be resolved by matching the head nouns of the noun phrases explicitly. For the other two cases, surface features are no longer adequate to identify the correct antecedents. Most previous studies rely on pre-constructed lexicons as knowledge sources, which suffer from limited coverage. Besides, no sophisticated lexicon is yet available for identifying relations between Chinese expressions such as the one in (3). Thus, we utilize a Web-based approach to exploit semantic relationships, such as hyponymy and hypernymy, that are not included in lexicon resources.

2.2 Text Preprocessing

The training and testing data are selected from the Academia Sinica Balanced Corpus† (ASBC). Named entity identification is done by applying the hybrid approach presented in [20]. For noun phrase chunking, we built a finite-state-machine chunker; the noun phrases it produces are treated as antecedent candidates. In Chinese, the head noun occurs at the end of a noun phrase, so the words preceding the head noun are regarded as modifiers. The head noun is assigned feature values such as gender or animacy, since it determines the fundamental properties of the noun phrase. Five types of head nouns are defined in [21]: common nouns, proper nouns, location nouns, temporal nouns, and pronouns. Several examples of noun phrases recognized by the presented chunker are as follows:

(1) (Nes)†† (Nf) (Na) (DE) (the basic rights of everyone)
(2) (Na) (Na) (Nb) (the student union chairman Chenwensheng)
(3) (Dfa) (VH) (DE) (a very famous company)
(4) (Nd) (Nd) (Nd) (the afternoon of March 3)

The presented chunker is also able to recognize verbal nominalization and transformation (as shown in Table 1) by utilizing heuristics discussed in [22]. These cases are handled by the following heuristics:

1. If the word preceding the verb is tagged with DE, then the verb is treated as a noun during the chunking phase.
2. If the verb is followed by a word tagged with DE, then the verb is regarded as a modifier of a noun phrase.
3. If the verb is followed by the word “ ”, then the verb is treated as an adverb.

Table 1. Verbal nominalization and transformation cases of the word “努力” (work hard).

In addition, we investigate the positional distribution of 618 anaphor-antecedent pairs in our training data. Table 2 shows that 93% of antecedents occur within the two sentences preceding the definite anaphor.

Table 2. The positional distribution of anaphor-antecedent pairs.

† Academia Sinica Balanced Corpus is available at http://www.sinica.edu.tw/SinicaCorpus/
†† The symbol in brackets denotes the part-of-speech tag of a word. A detailed description is available at http://ckipsvr.iis.sinica.edu.tw/papers/category list.doc.

2.3 Antecedent Candidate Identification

For each definite anaphor, we extract the set of all potential NP antecedents in the two-sentence window. The following constraints are then applied to filter out candidates with respect to the corresponding definite anaphor. Let can denote an item in the candidate set preceding the definite anaphor ana. If can satisfies any of the following patterns, it is regarded as a non-antecedent instance.

1. Conjunction pattern: ana [c] can or can [c] ana, where c ∈ { }
2. Verb pattern: ana [Vt] can or can [Vt] ana, where Vt denotes a transitive verb in the sentence
3. Preposition pattern: ana [p] can or can [p] ana, where p ∈ { }

3. The Approach

Fig. 1. The system architecture.

Table 3. A comparison of knowledge acquisition methods.

Figure 1 illustrates the presented definite anaphora resolution system, which incorporates three external resources: Web search results, the CKIP lexicon†, and Tongyici Cilin††. Resolution is implemented in a training phase and a testing phase. The training phase involves feature weight learning and lexical knowledge acquisition. Three kinds of lexical knowledge are addressed: gender, number, and semantic compatibility. Feature weight learning employs an entropy-based approach.
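As a concrete illustration of the candidate identification step in Section 2.3, the following minimal Python sketch collects NP chunks within a sentence window before the anaphor and drops those linked to it by a single connector. It rests on several assumptions that are not stated in the paper: the `Chunk` type, the English toy sentence, and the placeholder word sets passed as `conjunctions` and `prepositions` are hypothetical (the paper's Chinese word sets are not reproduced here), the pattern "can [x] ana" is read as strict adjacency within one sentence, and `window` is read as a maximum sentence distance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str  # surface form of the chunk or word
    kind: str  # "NP" for noun-phrase chunks, otherwise a POS-like tag
    sent: int  # index of the sentence containing the chunk


def blocked(seq, i, j, conjunctions, prepositions, vt_tag="Vt"):
    """True if chunks i and j are joined by exactly one connector
    (conjunction, transitive verb, or preposition) in the same sentence."""
    lo, hi = sorted((i, j))
    if hi - lo != 2 or seq[lo].sent != seq[hi].sent:
        return False
    mid = seq[lo + 1]
    return (mid.text in conjunctions
            or mid.text in prepositions
            or mid.kind == vt_tag)


def candidates(seq, ana_idx, conjunctions, prepositions, window=2):
    """NP chunks preceding the anaphor within `window` sentences,
    excluding those matched by one of the three filter patterns."""
    ana = seq[ana_idx]
    kept = []
    for k in range(ana_idx):
        chunk = seq[k]
        if chunk.kind != "NP":
            continue
        if ana.sent - chunk.sent > window:
            continue
        if blocked(seq, k, ana_idx, conjunctions, prepositions):
            continue
        kept.append(chunk.text)
    return kept


# Toy English example; the last chunk plays the role of the definite anaphor.
seq = [
    Chunk("the company", "NP", 0),
    Chunk("announced", "Vt", 0),
    Chunk("a new phone", "NP", 0),
    Chunk("reviewers", "NP", 1),
    Chunk("praised", "Vt", 1),
    Chunk("this device", "NP", 1),
]
result = candidates(seq, 5, conjunctions={"and"}, prepositions={"of"})
print(result)
```

In this toy input, "reviewers" is removed by the verb pattern because a transitive verb directly links it to the anaphor, while the two noun phrases of the previous sentence remain as candidates.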
The testing phase concerns text preprocessing, antecedent candidate identification, feature extraction, and antecedent identification. The following subsections describe each component and the resolution procedure.

3.1 Lexical Knowledge Acquisition

To resolve Chinese definite anaphora more accurately, knowledge about gender, number, and semantic compatibility is essential. To acquire such knowledge, we first utilize pre-constructed patterns, lexicon resources, and context information. However, these methods may suffer from data sparseness. To deal with this problem, Web-based knowledge acquisition methods are then applied: latent semantic relations that are not identified in local contexts can be acquired from Web mining results. Table 3 compares the knowledge acquisition methods.

3.1.1 Gender Extraction

Gender extraction aims to classify each noun phrase as male, female, or unknown with the help of so-called gender-indicating patterns (GPs) and Web mining results. All the gender modifiers are mined from the Web in advance by the procedure shown in Fig. 2. There are six kinds of GPs (denoted GP_i, 1 ≤ i ≤ 6), and each GP is utilized to identify the occurrence of a masculine or feminine pattern, as shown in Fig. 3. Figure 4 illustrates the overall three-layer gender feature extraction for each N_i of an input document D_i.

† CKIP (Chinese Knowledge Information Processing Group) lexicon is available at http://www.aclclp.org.tw/use ckip c.php
†† Tongyici Cilin extended version is available at http://ir.hit.edu.cn/demo/ltp/Sharing Plan.htm.

Fig. 2. The gender modifier mining algorithm.

Fig. 3. The gender-indicating pattern identification algorithm.

Fig. 4. The gender extraction procedure.

The gender extraction procedure is described as follows:
Step 1: If N_i is matched with the tagged CKIP lexicon or the Common Name Profile†, then return the corresponding gender.
Step 2: Else search D_i with the help of gender-indicating patterns and the gender information statistic Gender(N_i) defined in Eq. (2). If the gender feature can be decided as male or female, then return the corresponding gender.
Step 3: Else transform N_i into queries according to each kind of GP, for example, “N_i + [ ]” and “N_i + [ ]”. Search the Web corpus for each gender-indicating pattern and calculate Gender(N_i). If the gender feature can be decided as male or female, then return the corresponding gender.
Step 4: For other cases, the gender feature is marked unknown.

† Common Chinese person names are available at http://zh.wikipedia.org/w/index.php.

3.1.2 Number Extraction

Number extraction is aimed at facilitating the resolution of plural anaphors. With a collection of numerical and quantitative clue words, the extraction is implemented as shown in Fig. 5.

3.1.3 Semantic Compatibility Extraction

To acquire semantic knowledge from the Web, we submit queries consisting of candidates and anaphors to the Google search engine. Queries are formed by patterns that structurally express the same semantic relationship. The co-occurrence statistics of such patterns can then be used as a mechanism for detecting the hypernymy-hyponymy relation between the definite anaphor and its potential antecedents. As an example, consider a candidate meaning "apple" and a definite anaphor meaning "this kind of fruit".
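The idea of pattern-based queries can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm of Fig. 6: the three templates are English glosses standing in for the Chinese query patterns, `hit_count` is a hypothetical callable that stands in for a search-engine lookup (no real search API is called), and normalising the pattern hits by the candidate's own hit count is an assumed scoring choice.

```python
from typing import Callable, Iterable, List

# English glosses standing in for the Chinese lexico-syntactic patterns (assumption).
TEMPLATES = (
    "{hypo} is a kind of {hyper}",
    "{hypo} this kind of {hyper}",
    "{hypo} and other {hyper}",
)


def build_queries(candidate: str, anaphor_head: str,
                  templates: Iterable[str] = TEMPLATES) -> List[str]:
    """Instantiate every pattern with the candidate and the anaphor's head noun."""
    return [t.format(hypo=candidate, hyper=anaphor_head) for t in templates]


def compatibility(candidate: str, anaphor_head: str,
                  hit_count: Callable[[str], int]) -> float:
    """Hypothetical score: total pattern hits divided by the candidate's own hits."""
    base = hit_count(candidate)
    if base == 0:
        return 0.0
    pattern_hits = sum(hit_count(q) for q in build_queries(candidate, anaphor_head))
    return pattern_hits / base


# Made-up counts that replace a real Web search in this sketch.
FAKE_COUNTS = {
    "apple": 1000,
    "apple is a kind of fruit": 40,
    "apple and other fruit": 10,
    "car": 2000,
}


def fake_hit_count(query: str) -> int:
    return FAKE_COUNTS.get(query, 0)


apple_score = compatibility("apple", "fruit", fake_hit_count)
car_score = compatibility("car", "fruit", fake_hit_count)
print(apple_score, car_score)
```

Passing the lookup as a function keeps the scoring logic independent of any particular search engine, so the same code can be exercised with cached or synthetic counts.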

For the apple example (a candidate meaning "apple" with a definite anaphor meaning "this kind of fruit"), queries such as <“(apple is a kind of)” + “(fruit)”>, <“(apple this kind of)” + “(fruit)”>, and <“(apple and other)” + “(fruit)”> are issued, and the implementation is shown in Fig. 6.

Fig. 5. The number extraction procedure.

Fig. 6. The semantic compatibility extraction algorithm.

3.2 Feature Set

There are fifteen features, as shown in Table 4, where can denotes an antecedent candidate and ana denotes the definite anaphor. For each feature, we set its value to 1 if the antecedent candidate satisfies the feature constraint; otherwise we set its value to 0.

Table 4. Summary of features.

3.3 Feature Weight Learning

The entropy value denotes the uncertainty associated with a random variable. In our case, a feature with lower entropy reduces more of the uncertainty in selecting correct antecedents. Therefore, a feature with lower entropy is given a higher weight, and vice versa. In the training phase, 318 news documents containing 618 positive and 1077 negative pairs are used as training data. The weight of each feature is calculated by Eq. (4). Figure 7 shows the entropy-based weight distribution of each feature.

$$
\begin{aligned}
\mathrm{weight}_i &= 1 - \mathrm{entropy}_i(S) \\
\mathrm{entropy}_i(S) &= \sum_{j=1}^{v} \frac{|S_j|}{|S|} \times \mathrm{entropy}(S_j) \qquad (4) \\
\mathrm{entropy}(S) &= -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}
\end{aligned}
$$

where
S: the set of training instances
S_j: the subset of training instances in which fval_i has value j
v: the number of distinct values of fval_i
p: the number of positive instances
n: the number of negative instances

Fig. 7. Entropy-based weight distribution.

3.4 Classification-Based Module

Support Vector Machine (SVM) is a useful technique for data classification and is widely used in natural language processing research. In anaphora resolution, SVM-based classifiers are commonly applied for identifying potential antecedents [8], [9]. For comparison with our proposed method, we use an SVM as the baseline model and LIBSVM† as the classification tool.

3.5 Antecedent Identification

The task of antecedent identification is to select the most likely candidate from the candidate set by Eq. (5). Each candidate is filtered by checking its gender, number, and animacy agreement. Agreement_k is a binary function with value 0 or 1, and the value of Rank(can, ana) is set to zero if any of the three agreements is zero. The candidate with the highest value is selected as the antecedent of the target definite anaphor. The antecedent identification is implemented as shown in Fig. 8.

Fig. 8. The antecedent identification algorithm.

4. Experiments and Analysis

We extract 204 news documents from ASBC as our resolution corpus, in which 426 anaphor-antecedent pairs are identified by experts. Table 5 lists the top 10 semantic classes in our corpus. The resolution performance is evaluated in terms of the success rate defined by Eq. (6).

Table 5. Distribution of top 10 semantic classes.
Semantic class    Ratio
mankind           22.0%
equipments         8.9%
place              6.1%
machines           4.4%
organizations      4.4%
buildings          3.8%
fine arts          3.3%
nonhuman           3.0%
solid              3.0%
regions            2.8%

Table 6. Performance evaluation.
Model                   Success rate
Equal-weighted          48.6%
Classification-based    67.8%
Our method              72.5%

Table 7. Performance of leave-group-out evaluation.
Feature type left out   Success rate
Lexical                 62.6%
Grammatical             66.8%
Semantic                58.2%
Heuristic               65.5%

To evaluate the performance of our proposed method, we implement three resolution strategies for comparison, as shown in Table 6. The first model utilizes equal-weighted salience measures to identify antecedents; that is, the weight of each feature is set to 1.
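To make the contrast between equal weights and learned weights concrete, the sketch below computes the weights of Eq. (4) on a toy training set and then ranks candidates under both settings. The form of `rank` is an assumption: Eq. (5) is not reproduced in the text, so the score is taken to be a weighted sum of binary feature values that is set to zero when any agreement check fails, as Section 3.5 describes. The three anonymous features, the training pairs, and the candidates are made-up illustrations, not data from the paper.

```python
import math


def entropy(p, n):
    """Entropy of a set with p positive and n negative instances."""
    total = p + n
    if total == 0:
        return 0.0
    h = 0.0
    for k in (p, n):
        if k:
            q = k / total
            h -= q * math.log2(q)
    return h


def feature_weight(pairs, i):
    """weight_i = 1 - sum_j |S_j|/|S| * entropy(S_j), splitting S on binary feature i."""
    total = len(pairs)
    conditional = 0.0
    for value in (0, 1):
        labels = [label for feats, label in pairs if feats[i] == value]
        p = sum(labels)
        n = len(labels) - p
        conditional += len(labels) / total * entropy(p, n)
    return 1.0 - conditional


def rank(feats, agreements, weights):
    """Assumed form of Rank(can, ana): zero if any agreement fails,
    otherwise the weighted sum of the binary feature values."""
    if not all(agreements):
        return 0.0
    return sum(w * f for w, f in zip(weights, feats))


def resolve(cands, weights):
    """Return the name of the highest-ranked candidate."""
    return max(cands, key=lambda c: rank(c["feats"], c["agree"], weights))["name"]


# Toy training pairs: (feature values, label); 1 marks a true antecedent pair.
# Feature 0 separates the labels perfectly; features 1 and 2 carry no information.
pairs = [((1, 1, 0), 1), ((1, 0, 1), 1), ((0, 1, 0), 0), ((0, 0, 1), 0)]
weights = [feature_weight(pairs, i) for i in range(3)]

# agree = (gender, number, animacy) agreement with the anaphor.
cands = [
    {"name": "A", "feats": (1, 0, 0), "agree": (True, True, True)},
    {"name": "B", "feats": (0, 1, 1), "agree": (True, True, True)},
    {"name": "C", "feats": (1, 1, 1), "agree": (True, False, True)},
]
equal_choice = resolve(cands, [1.0, 1.0, 1.0])
learned_choice = resolve(cands, weights)
print(weights, equal_choice, learned_choice)
```

With equal weights, candidate B wins because it satisfies two uninformative features; with the entropy-based weights, those features contribute nothing and candidate A is selected. Candidate C is excluded in both settings because one agreement check fails.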
In the second model, a classification-based method is implemented by using SVM. In our proposed method, each feature is weighted by Eq. (4). The five features with the highest weights are, in order, Head Match, Sem Com, Animate, Same SC, and Sent Lead. This result indicates that Head Match, Animate, and Same SC are three dominant features characterizing semantic agreement. In addition, the Sem Com feature shows the significance of collocational compatibility in selecting antecedents, and Sent Lead reflects the fact that Chinese is a topic-prominent language. Table 6 shows that our method yields a 72.5% success rate on 426 anaphoric instances by employing the entropy-based weighting scheme and Web-based lexical knowledge, an improvement of about 4.7 percentage points over the classification-based model. In addition, to find out the contribution of each type of feature in our proposed method, we conduct a leave-group-out evaluation, as shown in Table 7. Four types of features are considered, namely lexical, grammatical, semantic, and heuristic. The results show that semantic features play the most important role, since the success rate decreases most sharply (to 58.2%) when this type of feature is disabled.

$$
\text{success rate} = \frac{\text{number of correct resolution cases}}{\text{total number of anaphora cases identified}} \qquad (6)
$$

† The LIBSVM tool is available at http://www.csie.ntu.edu.tw/~cjlin/.

5. Conclusions

To our knowledge, our method represents the first attempt to use weight learning and Web-based knowledge acquisition for resolving definite anaphora in Chinese text. To overcome the drawback of common rule-based methods that employ manually assigned weights, a salience measurement based on entropy-based weights is constructed to estimate the likelihood of antecedent candidates. Moreover, to cope with the difficulty of feature extraction in Chinese texts, a Web-based knowledge acquisition model is proposed to extract gender, number, and semantic compatibility from contextual information and Web resources. Our experimental results show that the method achieves an increase in success rate of around 4.7 percentage points over the classification-based baseline when lexical knowledge learning and entropy-based weighting are utilized.

References

[1] C. Kennedy and B. Boguraev, “Anaphora for everyone: Pronominal anaphora resolution without a parser,” Proc. COLING, pp.113–118, 1996.
[2] S. Lappin and H. Leass, “An algorithm for pronominal anaphora resolution,” Computational Linguistics, vol.20, no.4, pp.535–562, 1994.
[3] R. Mitkov, “Robust pronoun resolution with limited knowledge,” Proc. COLING/ACL, pp.869–875, 1998.
[4] N. Wang, C.F. Yuan, K.F. Wong, and W.J. Li, “Anaphora resolution in Chinese financial news for information extraction,” Proc. 4th World Congress on Intelligent Control and Automation, pp.2422–2426, 2002.
[5] V. Ng and C. Cardie, “Improving machine learning approaches to coreference resolution,” Proc. 40th Annual Meeting of the Association for Computational Linguistics, pp.104–111, 2002.
[6] J. Lang, T. Liu, and B. Qin, “Decision trees-based Chinese noun phrase coreference resolution,” Student Workshop of Computational Linguistics, 2004.
[7] V. Ng, “Machine learning for coreference resolution: From local classification to global ranking,” Proc. 43rd Annual Meeting of the Association for Computational Linguistics, pp.157–164, 2005.
[8] S. Bergsma and D. Lin, “Bootstrapping path-based pronoun resolution,” Proc. 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, pp.33–40, 2006.
[9] Y.C. Li, Y. Yang, G.D. Zhou, and Q.M. Zhu, “Anaphora resolution of noun phrase based on SVM,” Computer Engineering, vol.35, no.3, pp.199–204, 2009.
[10] J. Peng and K. Araki, “Zero-anaphora resolution in Chinese using maximum entropy,” IEICE Trans. Inf. & Syst., vol.E90-D, no.7, pp.1092–1102, July 2007.
[11] R. Bunescu, “Associative anaphora resolution: A web-based approach,” Proc. EACL Workshop on the Computational Treatment of Anaphora, pp.47–52, 2003.
[12] K. Shinzato and K. Torisawa, “Acquiring hyponymy relations from web documents,” Proc. HLT-NAACL, pp.73–80, 2004.
[13] K. Markert and M. Nissim, “Comparing knowledge sources for nominal anaphora resolution,” Computational Linguistics, vol.31, no.3, pp.367–402, 2005.
[14] K. Markert, M. Nissim, and N. Modjeska, “Using the web for nominal anaphora resolution,” Proc. EACL Workshop on the Computational Treatment of Anaphora, pp.39–46, 2003.
[15] N. Garera and D. Yarowsky, “Resolving and generating definite anaphora by modeling hypernymy using unlabeled corpora,” Proc. 10th Conference on Computational Natural Language Learning, pp.37–44, 2006.
[16] S.P. Converse, “Resolving pronominal references in Chinese with the Hobbs algorithm,” Proc. 4th SIGHAN Workshop on Chinese Language Processing, pp.116–122, 2005.
[17] H.F. Wang and Z. Mei, “Robust pronominal resolution within Chinese text,” J. Software, vol.16, no.5, pp.700–707, 2005.
[18] P. Denis and J. Baldridge, “A ranking approach to pronoun resolution,” Proc. IJCAI, pp.1588–1593, 2007.
[19] D.S. Wu and T. Liang, “Chinese pronominal anaphora resolution using lexical knowledge and entropy-based weight,” J. American Society for Information Science and Technology, vol.59, no.13, pp.2138–2145, 2008.
[20] T. Liang, C.H. Yeh, and D.S. Wu, “A corpus-based categorization for Chinese proper nouns,” Proc. National Computer Symposium, pp.434–443, 2003.
[21] C.H. Yu and H.H. Chen, A study of Chinese Information Extraction Construction and Coreference, Unpublished master’s thesis, National Taiwan University, Taiwan, 2000.
[22] B.G. Ding, C.N. Huang, and D.G. Huang, “Chinese main verb identification: From specification to realization,” International J. Computational Linguistics and Chinese Language Processing, vol.10, no.1, pp.53–94, 2005.
[23] S.J. Russell and P. Norvig, Artificial Intelligence: A modern approach, 2nd ed., Prentice Hall, 2003.

Dian-Song Wu received his M.S. degree in Computer Science from National Chiao Tung University, Hsinchu, Taiwan in 2003. He is now studying toward his Ph.D. degree at the graduate school of Computer Science, National Chiao Tung University, Taiwan. His research interests include machine learning, natural language processing, and web mining.

Tyne Liang received her Ph.D. degree in Computer Science from National Chiao Tung University, Hsinchu, Taiwan. Currently, she is an associate professor in the Dept. of Computer Science, National Chiao Tung University, Taiwan. Her research interests include information retrieval, natural language processing, web mining, and interconnection networks.
