Results from Bootstrapping Approach - 中文動詞上下位關係自動標記法

The bootstrapping approach returned a total of 11289 Chinese verb pairs mapped from WordNet. These results were manually evaluated with test sentences by a group of people² who are all native speakers of Chinese with a linguistics background. Besides counting the correct hypernymy-troponymy verb pairs, the evaluators assigned tags to mark incorrect verb pairs. The evaluative tags are as follows:

• Hypernymy-troponymy: The returned Chinese verb pairs stand in a hypernymy-troponymy relation.

• Non-lexicalized verb: One or both of the verbs in a returned pair are not lexicalized in Chinese; that is, they may be expressed by phrases or sentences. Conceptually or semantically, however, the hypernymy-troponymy relation holds.

• Other relation: Returned verb pairs stand in other lexical semantic relations.

• Unrelated: Returned pairs are unrelated to each other.

Table 4.3 displays the overall results of the bootstrapping approach. A total of 8305 verb pairs out of 11289 were marked as standing in a hypernymy-troponymy relation, resulting in a 73.56% accuracy rate (correct returned pairs are listed in Appendix C). In the studies by Huang et al. [24][25], a small portion of experimental data in Chinese (210 lemmas including N, V, Adv. and Adj.) was used to investigate how LSRs can be inferred by bootstrapping. For verbs, the precision of English-to-Chinese semantic relation inference was around 80%, with 70% for hypernym inference and 82.4% for hyponym inference. Our results are broadly similar, reaching a 73.56% accuracy rate. With a much larger data source, the precision rate is a bit lower than the results in [24].

| Tags | Number of verb pairs | Percentage |
|---|---|---|
| Hypernymy-troponymy | 8305 | 73.56% |
| Non-lexicalized verb | 893 | 7.91% |
| Other relation | 700 | 6.20% |
| Unrelated | 1391 | 12.32% |
| Total | 11289 | 100% |

Table 4.3: Overall results from bootstrapping approach

Previous studies focused on how LSRs can be inferred through bootstrapping and which ones. They discussed only the correct results and the feasibility of the approach. Without more in-depth analysis, the problems that caused errors were simply attributed to translational idiosyncrasies. In this thesis, the incorrect results are examined closely. Returned verb pairs that cannot be assigned a hypernymy-troponymy relation are roughly classified into three types of errors. Surprisingly, as Table 4.3 shows, 12.32% of the returned pairs were marked Unrelated, the highest proportion among the three error types. Unlike Non-lexicalized verb and Other relation, in which the returned pairs still stand in certain LSRs, Unrelated verb pairs show no semantic connection between the two verbs. These results contradict our assumption and deserve an in-depth investigation. The following subsections therefore discuss several causes that lead to the errors.

² Four people spent about two weeks evaluating and tagging the returned results.

4.2.1 Error Analyses

The evaluative tags Non-lexicalized verb, Other relation, and Unrelated were assigned to returned pairs that are not in a hypernymy-troponymy relation.

| Tags | Number of verb pairs | Percentage |
|---|---|---|
| Non-lexicalized verb | 893 | 7.91% |
| Other relation | 700 | 6.20% |
| Unrelated | 1391 | 12.32% |
| Total | 2984 | 26.43% |

Table 4.4: Non hypernymy-troponymy verb pairs (total number of returned verb pairs = 11289)
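The percentages in Tables 4.3 and 4.4 can be recomputed from the raw tag counts. The following is a minimal Python sketch; the names `TAG_COUNTS`, `TARGET_TAG`, and `percentage` are illustrative and not part of the thesis. Since 8305/11289 is approximately 73.567%, the reported 73.56% corresponds to truncating to two decimals rather than rounding, so the sketch truncates as well.

```python
from decimal import Decimal, ROUND_DOWN

# Tag counts as reported in Table 4.3.
TAG_COUNTS = {
    "Hypernymy-troponymy": 8305,
    "Non-lexicalized verb": 893,
    "Other relation": 700,
    "Unrelated": 1391,
}
TARGET_TAG = "Hypernymy-troponymy"


def percentage(count, total):
    """Share of `count` in `total` as a percentage, truncated to two decimals."""
    share = Decimal(count) * 100 / Decimal(total)
    return share.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


total_pairs = sum(TAG_COUNTS.values())
# Every pair not tagged with the target relation counts as an error (Table 4.4).
error_pairs = sum(n for tag, n in TAG_COUNTS.items() if tag != TARGET_TAG)

for tag, n in TAG_COUNTS.items():
    print(f"{tag}: {n} ({percentage(n, total_pairs)}%)")
print(f"Errors: {error_pairs} ({percentage(error_pairs, total_pairs)}%)")
```

Using `Decimal` with `ROUND_DOWN` keeps the truncation explicit; with ordinary rounding the accuracy figure would come out as 73.57%.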

As Table 4.4 shows, a total of 2984 verb pairs fall into three types of errors. This subsection examines these three types to determine what reduces performance: what makes the returned verb pairs incorrect, and which other relations a returned pair stands in when it is not in the target relation. Generally speaking, errors arise from translational idiosyncrasies, inaccurate translations, and conceptual differences across languages. They are illustrated as follows:

Non-lexicalized verbs

A single word in one language often has a meaning that requires several words in another language to express. In our results, many verbs, especially troponyms, could not be rendered by a single lexeme in Chinese. Consider the following examples:

(16) a. anesthetize ⇒ cocainize

b. 麻醉 (MaZue) ⇒ 用古柯鹼麻醉 (Yong GuKeJian MaZue)

(17) a. laugh ⇒ chuckle

b. 笑 (Shiao) ⇒ 咯咯地笑 (GeGe Di Shiao)

Example (16-a) is a synset pair in WordNet standing in a hypernymy-troponymy relation. This thesis hereafter uses the arrow to indicate the troponymy relation: the word to the left of the arrow is the hypernym and the word to the right is the troponym. Hence, under a certain sense in WordNet, ‘anesthetize’ is the hypernym of ‘cocainize’, or ‘to cocainize’ is a way ‘to anesthetize’. After mapping through ECTED, the program returned the Chinese verb pair shown in (16-b). No single lexeme in Chinese properly describes the troponym ‘cocainize’; rather, it is expressed by a phrase in which a more specific manner (Yong GuKeJian ‘use cocaine’) is added to a general verb (MaZue ‘to anesthetize’). Similarly, the troponym ‘chuckle’ in (17-a) cannot be translated into a single lexeme in Chinese but has to be expressed by an adverbial phrase, GeGe Di Shiao ‘to laugh in a chuckling manner’. The results tagged Non-lexicalized verb are widely found to be expressed by the lexico-syntactic patterns yi/yong ... V_h ‘by/with (manner) to V_h’ or ... Di V_h ‘to V_h X-ly’. Interestingly, this conforms to the lexico-syntactic patterns used in our previous approach, where we assumed that troponyms are expressed by adding specified manners to more general verbs. This also explains why non-lexicalized verbs appear more often among the troponyms than among the hypernyms. Translational discrepancy also appears with verbalized words. Here is an example:

(18) a. prepare ⇒ summerize

b. 準備 (ZuenBei) ⇒ 為夏天做準備 (Wei ShaTien Zuo ZuenBei)

English makes productive use of morphology: syntactic categories can be flexibly changed, and new words created, through morphological alteration. As (18-a) shows, the troponym ‘summerize’ is verbalized from the noun ‘summer’, meaning to prepare for summer. Verbalization is not restricted to English, yet in Chinese it requires several words to express, as (18-b) shows. Foreign words also affect the precision of equivalent translation, especially in terminology, jargon, or vernacular. Consider the following example:

(19) a. score ⇒ eagle

b. 得分 (DeFen) ⇒ 在打出低於標準桿兩桿的桿數 (Zai DaChu DiYu BiaoZuenGan Liang Gan De GanShu)

The troponym ‘eagle’ in (19-a) is a golf term meaning to shoot two strokes under par, and is thus a way to score. However, since no precise translation can be found in Chinese, it can only be expressed by following the definition of the original term, as can be seen in (19-b).

Of the 11289 verb pairs acquired from the bootstrapping approach, 893 pairs involve this type of error, accounting for 7.91% of the total, as shown in Table 4.4. The proportion is not the highest of the three; nevertheless, the problem of non-lexicalization may not be easily solved. It can be traced back to the fundamental issue that Chinese has no word delimiters. Without blanks to mark word boundaries, the distinction between words, morphemes, and lexemes is not easily defined. Note that all verb pairs marked Non-lexicalized verb stand conceptually or semantically in a hypernymy-troponymy relation, except that the concepts are described by phrases or sentences. Lexicalization certainly poses problems, especially in creating a WordNet-like database. Although in the construction of ECTED each translated entry was expressed by a lexicalized word rather than a descriptive phrase wherever possible [25], our results suggest that there is still room for modification.
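The two phrasal patterns observed among non-lexicalized troponyms (yi/yong ... V_h and ... Di V_h) suggest a simple surface check. The sketch below is only an illustration under stated assumptions, not the procedure used in this thesis: the function name `manner_pattern` is hypothetical, 以 is assumed to be the character for yi (用 and 地 are taken from examples (16-b) and (17-b)), and a plain suffix match stands in for real word segmentation.

```python
def manner_pattern(hypernym, troponym):
    """Label the phrasal pattern by which a troponym expands its hypernym.

    Returns None when the troponym is identical to the hypernym or does not
    end with it, i.e. when no "manner + V_h" shape is visible on the surface.
    """
    if troponym == hypernym or not troponym.endswith(hypernym):
        return None
    # Whatever precedes the hypernym is treated as the added manner.
    modifier = troponym[: -len(hypernym)]
    if modifier.startswith(("以", "用")):  # yi/yong ... V_h (以 is an assumption)
        return "yi/yong ... V_h"
    if modifier.endswith("地"):  # ... Di V_h
        return "... Di V_h"
    return "other phrasal expansion"


# Returned pairs from examples (16) to (19): (hypernym, troponym).
examples = [
    ("麻醉", "用古柯鹼麻醉"),
    ("笑", "咯咯地笑"),
    ("準備", "為夏天做準備"),
    ("得分", "在打出低於標準桿兩桿的桿數"),
]
for c_h, c_t in examples:
    print(c_h, "⇒", c_t, ":", manner_pattern(c_h, c_t))
```

A check of this kind could at most pre-sort candidates for the human evaluators. Example (19-b) shows its limit, since a definition-like paraphrase need not contain the hypernym at all, and a genuinely lexicalized compound ending in the hypernym would be flagged by mistake.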

Other relations

Returned results were sometimes found to stand in other relations. We grouped these incorrect pairs together because they are neither unrelated to each other nor in our target relation. The most common semantic relation found among the wrongly acquired Chinese verb pairs is near-synonymy or synonymy. This can be illustrated by the following examples:

(20) a. disconnect ⇒ detach

b. 使分離 (ShiFenLi) ⇒ 使分開 (ShiFenKai)

(21) a. end ⇒ lapse

b. 結束 (JienShu) ⇒ 終止 (ZhongZhi)

Since LSRs can be inferred by bootstrapping, the returned verb pairs would reasonably be expected to stand in a hypernymy-troponymy relation. However, returned pairs such as (20-b) and (21-b) are unexpected in that both are in fact in a near-synonymous relation. The glosses of their English counterparts in WordNet reveal differences in how events are conceptualized across languages. To be more specific, ‘disconnect’ in (20-a), according to its gloss in WordNet, means ‘to make disconnected, disjoin or unfasten’, while its troponym ‘detach’ means ‘cause to become detached or separated’. In English, ‘to detach’ can be seen as a way of making things disconnected, although the distinction may be slight. However, this distinction disappears when the verbs are translated into Chinese. As (20-b) shows, both ShiFenLi and ShiFenKai carry nearly the same meaning, ‘to cause something to separate’. Examples (21-a) and (21-b) reveal the same problem. Under a certain sense in English, ‘to lapse’ is ‘to end’, at least for a long time, and can thus be seen as a troponym of ‘end’. Nevertheless, the differentiation vanishes after translation into Chinese. As (21-b) shows, JienShu and ZhongZhi are intuitively judged to be near-synonyms. The difference between ‘to end’ and ‘to lapse’ cannot be found in their Chinese counterparts; at least, this difference has not been carried over in ECTED. Moreover, many returned pairs are found to be exactly the same. For example, the hypernymy-troponymy verb pair in (22-a) was mapped through ECTED to two identical verbs in Chinese, as shown in (22-b).

(22) a. change ⇒ convert

b. 改變 (GaiBien) ⇒ 改變 (GaiBien)

Like examples (20) and (21), example (22) reveals the same problem: the concept that makes a pair of verb synsets stand in a hierarchical relation disappears after being carried over into Chinese. This time, as (22-b) shows, bootstrapping returned two identical verbs, which cannot possibly stand in a hypernymy-troponymy relation. This indicates a difference in conceptualizing the manners of verbs. In English, the concept of ‘change’ is broader than ‘convert’ and thus includes ‘convert’. Despite this slight difference in English, there is no such distinction in Chinese. Therefore, both ‘change’ and ‘convert’ were translated into GaiBien in Chinese.
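The mapping step and the identical-pair case in (22) can be sketched in a few lines of Python. This is a simplified illustration rather than the thesis implementation: `TOY_ECTED` is a hypothetical stand-in for ECTED that maps English lemmas directly to one Chinese translation each, whereas the real database is keyed by WordNet synsets, and the function names are invented for this example.

```python
# Hypothetical miniature of ECTED: English verb -> Chinese translation.
# Entries are taken from examples (20) and (22).
TOY_ECTED = {
    "disconnect": "使分離",
    "detach": "使分開",
    "change": "改變",
    "convert": "改變",
}


def map_pair(english_pair, table):
    """Map an English (hypernym, troponym) pair to Chinese via the table.

    Returns None when either verb has no translation in the table.
    """
    e_h, e_t = english_pair
    if e_h not in table or e_t not in table:
        return None
    return table[e_h], table[e_t]


def is_identical_pair(chinese_pair):
    """True when both verbs map to the same Chinese word, as in (22-b)."""
    c_h, c_t = chinese_pair
    return c_h == c_t


english_pairs = [("disconnect", "detach"), ("change", "convert")]
for pair in english_pairs:
    mapped = map_pair(pair, TOY_ECTED)
    if mapped is not None:
        print(pair, "->", mapped, "identical:", is_identical_pair(mapped))
```

Identical pairs such as (22-b) can be filtered out mechanically in this way. Near-synonymous pairs such as (20-b) cannot, which is why they required manual evaluation.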

In addition to the near-synonymy or synonymy relation found in returned pairs, a reversed relation is also observed in our results. Here, a reversed relation means the following: for an English hypernymy-troponymy synset pair E_h and E_t, each member is mapped to its equivalent translation in Chinese, C_h and C_t. However, the expected relation C_h ⇒ C_t does not hold; rather, the relation between C_h and C_t is reversed, in that C_t is more reasonably taken to be the hypernym of C_h. A concrete example is shown as follows:

(23) a. expectorate ⇒ spit

b. 吐痰 (TuTan) ⇒ 吐 (Tu)

As example (23-b) shows, TuTan ‘to spit (phlegm)’ and Tu ‘to throw up (anything from the stomach or lung)’ were returned by mapping from (23-a). However, the returned verb pair does not stand in the expected hypernymy-troponymy relation; that is, TuTan is not the hypernym of Tu, for the pair does not pass our evaluating sentences. On closer observation, it is more reasonable to judge that Tu is actually the hypernym of TuTan. Once again, this kind of error stems from how languages conceptualize verbs and how manners are distinguished by verbs across languages. According to the glosses in WordNet, ‘expectorate’ is defined as ‘to discharge (phlegm or sputum) from the lungs and out of the mouth’ and ‘spit’ is defined as ‘expel or eject (saliva or phlegm or sputum) from the mouth’. These two verbs are distinguished by manner. However, when they are carried over into Chinese, the distinction of manner vanishes.

The above examples confirm that linguistic ontologies vary: the way a concept is described may not be exactly the same across languages. In particular, English is found to conceptualize verbs in a more detailed way, in which manners are distinguished by different lexemes. Chinese, by contrast, conceptualizes verb meanings in a broader or more general mode. Of the 11289 returned verb pairs in Chinese, 700 pairs were marked as standing in semantic relations other than hypernymy-troponymy, accounting for 6.20% of the total. These incorrect verb pairs are not further divided into near-synonymy and reversed relations, because both arise from the similar causes described in this subsection.

Unrelated

A total of 1391 returned pairs were evaluated as unrelated, accounting for 12.32% of the total, which is the highest proportion among the three error types. Unrelated pairs, unlike the error types discussed above, stand neither in a hypernymy-troponymy relation nor in any other lexical semantic relation. So many incorrect pairs are somewhat surprising, since a hypernymy-troponymy synset pair in WordNet is not expected to turn out unrelated in Chinese after bootstrapping. The causes of this drop in performance therefore deserve close investigation. From the results, unrelated verb pairs in Chinese can be ascribed to two main problems: inaccurate translation in ECTED, and verbs with abstract concepts such as ‘make’, ‘be’, and ‘put’ in English.

The first problem leading to unrelated verb pairs is inaccurate translation in ECTED. Consider the following examples:

(24) a. change integrity ⇒ condense

b. 整型 (ZenShing) ⇒ 濃縮 (NongSuo)

(25) a. dance ⇒ folk dance

b. 舞蹈 (WuDao) ⇒ 民間舞曲 (MinJenWuChu)

Intuitively, example (24-b) cannot be judged a hypernymy-troponymy pair, for NongSuo ‘to condense’ can hardly be associated with ZenShing ‘to perform plastic surgery’. Comparing (24-b) with its English counterpart (24-a), there is apparently a translational discrepancy between ‘change integrity’ and ZenShing. According to the definition in WordNet, ‘change integrity’ describes something that ‘changes in physical make-up’, which is wrongly translated into ZenShing in Chinese. In common usage, ZenShing in Chinese almost exclusively means ‘to perform plastic surgery’. The incorrect translation in ECTED consequently leads to the unrelated verb pair returned by bootstrapping. Similarly, (25-b) is an example of inaccurate translation in ECTED, in that ‘folk dance’ was wrongly translated into a noun, MinJenWuChu ‘music for a folk dance’. Inaccurate translation sometimes involves imprecise translation as well; that is, the translation itself is not totally unrelated to its English counterpart. Rather, the Chinese translation is indistinct and does not match the proper sense. Consider the following example:

(26) a. trouble ⇒ erupt

b. 打擾 (DaRao) ⇒ 發疹 (FaZhen)

As example (26-a) shows, when ‘trouble’ serves as a hypernym of ‘erupt’, it means ‘to cause bodily suffering and make sick or indisposed’. Under this sense, ‘trouble’ is imprecisely translated into DaRao ‘to bother’ in Chinese, which makes it intuitively hard to connect with FaZhen.

Another problem that causes returned pairs to be unrelated is that verbs with abstract concepts are difficult to carry over into another language. In English, the concepts of verbs such as ‘be’, ‘make’, or ‘seem’, which do not exhibit any literal spatial properties, can hardly be conveyed equivalently in Chinese. The following examples illustrate this:

(27) a. be ⇒ gape

b. 是 (Shi) ⇒ 張開 (ZhangKai)

(28) a. be ⇒ sit

b. 在 (Zai) ⇒ 坐 (Zuo)

The copula ‘be’ is used widely in English. However, the concept of ‘be’ is rather abstract. For example, one of the senses of the copula ‘be’ in WordNet is defined as ‘having the quality of being’. Under this sense, more than one hundred direct troponyms can be found in WordNet. The equivalent translation of the copula ‘be’ in Chinese is Shi, which is used to connect with an adjective or a predicate noun. However, the copula Shi in Chinese does not carry any concrete meaning, nor does it express the quality of being as its English counterpart does. Therefore, when hypernymy-troponymy pairs containing the copula ‘be’ are mapped to Chinese, none of them shows a hypernymy-troponymy relation, as can be seen in (27-b). Similarly, another sense of ‘be’, according to WordNet, is ‘occupy a certain position or area; be somewhere’, which has troponyms such as ‘sit’ (be located or situated somewhere), as example (28-a) shows. Again, ‘be’ under this sense is translated into the monomorphemic word Zai in ECTED. In Chinese, the verb Zai indicates ‘being, to be located at’; it is rather a morpheme that shows no concrete concept unless it is attached to other morphemes. Consequently, Zai and Zuo are not considered to be in a hypernymy-troponymy relation.

Verbs with abstract concepts are found in Chinese too. Among our returned results tagged Unrelated, many verbs exhibit rather abstract concepts, such as 使 Shi ‘make, let’, 成為 ChenWei ‘become’, or 似乎 SiHu ‘seem’, which are hard to link to any troponyms. Consider the following examples:

(29) a. make ⇒ prepare

b. 使 (Shi) ⇒ 準備 (ZuenBei)

(30) a. become ⇒ reduce

b. 成為 (ChenWei) ⇒ 精簡 (JinJen)

(31) a. seem ⇒ glitter

b. 似乎 (SiHu) ⇒ 閃耀 (ShanYao)

The returned Chinese verbs Shi, ChenWei, and SiHu come from their English counterparts ‘make’, ‘become’, and ‘seem’, and exhibit no translational inconsistencies. However, none of the Chinese pairs passed our testing sentences; at the very least, the result makes no sense when these pairs are substituted into the basic definition of troponymy: V_t Shi YiZong V_h De FanShi ‘To V_t is to V_h in a particular manner’. Take (29-b) for example: ‘*ZuenBei Shi YiZong Shi De FanShi’ (? to prepare is a way to make) makes no sense in Chinese. Hence, ZuenBei ‘to prepare’ and Shi ‘to make (causative)’ were not regarded as being in a hypernymy-troponymy relation.
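The substitution test described above lends itself to a small helper that generates the sentence an evaluator has to judge. The following Python sketch is illustrative only: the function and constant names are hypothetical, and the Chinese characters 是一種…的方式 are an assumed rendering of the romanized frame V_t Shi YiZong V_h De FanShi. Judging whether a generated sentence is acceptable remains a human task.

```python
# Test frames for troponymy. The Chinese frame is an assumed rendering of
# the romanized "V_t Shi YiZong V_h De FanShi"; the English frame follows
# the gloss "To V_t is to V_h in a particular manner".
ZH_FRAME = "{v_t}是一種{v_h}的方式"
EN_FRAME = "To {v_t} is to {v_h} in a particular manner"


def troponymy_test_sentence(v_t, v_h, frame=ZH_FRAME):
    """Substitute a candidate troponym and hypernym into a test frame."""
    return frame.format(v_t=v_t, v_h=v_h)


# Returned pairs from examples (29) to (31), listed as (hypernym, troponym).
returned_pairs = [("使", "準備"), ("成為", "精簡"), ("似乎", "閃耀")]
for c_h, c_t in returned_pairs:
    # Each printed sentence is the one an evaluator would accept or reject.
    print(troponymy_test_sentence(c_t, c_h))
```

For (29-b) this produces the sentence rejected in the discussion above, while a pair such as (17-b) yields a frame that reads naturally.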

Most of the unrelated verb pairs arise from inaccurate or imprecise translation in ECTED, as the above examples show. Conceptual idiosyncrasies aggravate the translational problems as well. As mentioned, the equivalent translation database was manually created on the basis of WordNet synsets. Nevertheless, given the large amount of data (12127 verb synset pairs) and the mutability of polysemous words, it is hard to reach one hundred percent correspondence. While inaccurate translations can be corrected by double-checking, conceptual idiosyncrasies are hard to prevent and consequently become the toughest problem in bootstrapping.

Interim Summary

Section 4.2 reports the results returned by the bootstrapping approach. Bootstrapping reaches 73.56% accuracy, which is higher than that of the syntactic pattern-based approach. The incorrect results that lower performance were investigated in depth. In general, they can be roughly divided into three error types, which are mainly caused by translational idiosyncrasies and conceptual inconsistencies across languages.
