3.3 Evaluation and Scoring
4.1.1 Error Analyses
Inspection of the results returned by our program showed that unwanted or incorrect answers lowered performance. A total of 640 incorrect hypernymy-troponymy verb pairs (that is, 640 incorrect hypernyms) were found: 590 among the results of syntactic pattern 1 and 50 among those of pattern 2. This subsection examines the error types, which can be grouped as follows:
• Type 1. The ambiguity of yi
• Type 2. Errors caused by syntactic structure
• Type 3. Verb and verb phrase
• Type 4. The problem of synonym or near-synonym
• Type 5. Abstract concept verbs
• Type 6. Wrong tagging of POS
Each error type is discussed in detail in the following subsections, and the percentage of each error type is listed in Table 4.2:
Error type   Incorrect hypernyms   Incorrect hypernyms   Total incorrect   Error rate
             from Pattern 1        from Pattern 2        hypernyms
Type 1       194                   0                     194               30.31%
Type 2       215                   22                    237               37.03%
Type 3       71                    8                     79                12.34%
Type 4       56                    17                    73                11.40%
Type 5       38                    1                     39                6.09%
Type 6       16                    2                     18                2.81%
Total        590                   50                    640               100%
Table 4.2: Error types and percentage
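The totals and error rates in Table 4.2 follow directly from the per-pattern counts. A minimal sketch of that arithmetic is given below; the dictionary layout and the function name error_rates are our own illustrative choices, and only the counts are taken from the table.

```python
# Counts copied from Table 4.2: (errors from pattern 1, errors from pattern 2).
ERROR_COUNTS = {
    "Type 1": (194, 0),
    "Type 2": (215, 22),
    "Type 3": (71, 8),
    "Type 4": (56, 17),
    "Type 5": (38, 1),
    "Type 6": (16, 2),
}


def error_rates(counts):
    """Return {error type: (total errors, share of all errors in percent)}."""
    grand_total = sum(p1 + p2 for p1, p2 in counts.values())
    return {
        name: (p1 + p2, 100.0 * (p1 + p2) / grand_total)
        for name, (p1, p2) in counts.items()
    }


rates = error_rates(ERROR_COUNTS)
for name, (total, pct) in rates.items():
    print(f"{name}: {total} errors, {pct:.2f}%")
```

Depending on whether a value is rounded or truncated, the last printed digit may differ slightly from the figure in the table.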
Type 1. The ambiguity of yi
Recall that the first syntactic pattern used in the first approach is ‘yi/yong/liyong ... Vh ...’ (by/with ... to Vh), chosen because it satisfies the definition of a troponym: to Vt is to Vh in a specific way or with a particular manner, where Vt is the troponym of Vh. In Chinese, this can be expressed with the target words yi or yong. Hypernymy-troponymy pairs were therefore expected to be found by first extracting sentences containing the target words. Unfortunately, errors occurred when extracting the target yi, because not every yi extracted by our program is the intended one. This is due to the lack of WSD (Word Sense Disambiguation). According to the classification and definitions in CWN, yi serves as both a preposition (P) and a conjunction (Cbb). In this approach, only the preposition yi is extracted. Nevertheless, yi remains ambiguous because it has multiple senses.
When yi serves as a preposition, CWN distinguishes about ten different senses, including by, in order to, dependent on, according to, etc. This approach expects the preposition yi to be used as ‘by ... manner’ or ‘with ... manner’, as shown in the following examples:
(1) 保存 3(VC) to keep @ To store (something) by (their) original state.
(2) 走 3(VA) to walk @ moving forwards by two feet.
In example (1), BaoChun ‘to keep’ can be seen as a way of ShouChang ‘to store/to stock’, elaborated by a certain manner: to store something in its original state. We therefore obtain a hypernymy-troponymy pair: (ShouChang ‘to store’/BaoChun ‘to keep’). By the same token, we get another pair (YiDong ‘to move’/Zou ‘to walk’) from example (2).
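The extraction step for syntactic pattern 1 can be sketched as follows. This is a simplified illustration rather than the program used in the experiment: the function name extract_candidates, the (word, POS) tuple format, and the romanized toy tokens (including ShuangJiao, XiangQian and the tag D) are assumptions made for the example.

```python
# Romanized target prepositions of syntactic pattern 1.
TARGET_WORDS = {"yi", "yong", "liyong"}


def extract_candidates(headword, tagged_definition):
    """Return (candidate hypernym, headword) pairs for syntactic pattern 1.

    tagged_definition is a list of (word, pos) tuples. Every verb (tag
    starting with "V") that follows a target word tagged as a preposition
    ("P") is returned as a candidate hypernym of the headword.
    """
    pairs = []
    seen_target = False
    for word, pos in tagged_definition:
        if word in TARGET_WORDS and pos == "P":
            seen_target = True
        elif seen_target and pos.startswith("V"):
            pairs.append((word, headword))
    return pairs


# Toy input loosely modelled on example (2); segmentation and tags are assumed.
definition = [("yi", "P"), ("ShuangJiao", "Na"), ("XiangQian", "D"), ("YiDong", "VA")]
pairs = extract_candidates("Zou", definition)
print(pairs)  # [('YiDong', 'Zou')]
```

Because the sketch checks only the word form and the tag P, it cannot tell the ‘by ... manner’ sense of yi from its other prepositional senses.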
However, not every yi found by the program carries the meaning we want. As mentioned, yi is a multi-sense preposition with about ten different senses. The results show that yi is also widely used as ‘in order to (Vj)’. In this situation, yi fails to indicate a hypernymy-troponymy relation. In the following examples, the definition of each verb entry contains the preposition yi with the sense ‘in order to’:
(3) to surmise @ testing someone in order to discover the truth.
(4) 施肥 1 to fertilize @ fertilize plants in order to promote the growth.
In (3) and (4), none of the verbs that occur after yi is a correct hypernym of the defined verb: (FaShien ‘to discover’/Ce ‘to surmise’) is not a hypernymy-troponymy verb pair, and neither is (ChuJing ‘to promote’/ShiFei ‘to fertilize’) nor (ShengZhang ‘to grow’/ShiFei ‘to fertilize’). Therefore, when yi carries the meaning ‘in order to Vj’, extracting Vj as the hypernym fails. As Table 4.2 shows, the ambiguity of yi accounts for 194 of the 640 incorrect results, an error rate of 30.31%, which is comparatively high. Ambiguity does not arise with the other target words, such as yong ‘by/with’ in pattern 1 and di ‘adverbial suffix’ in syntactic pattern 2, because each carries only one sense under the relevant POS: yong has a single sense as a preposition, and so does di as an adverb.
Word Sense Disambiguation has long been an open, central problem at the lexical level in NLP. Ambiguous words can lead to the retrieval of irrelevant or unwanted information, as the ambiguous preposition yi shows here. Many approaches and disambiguators have been proposed to solve problems caused by ambiguity [36]. While many of them were designed to disambiguate words in English, researchers have also found that the features important for disambiguation in English are not the same as those for Chinese [12]. For example, parse, predicate-argument information and selectional restrictions play an important role in disambiguation in English but a rather minor one in Chinese [12] [47]. When ambiguity occurs at the most basic level, the lexical level, it seems even harder to disambiguate polysemous senses automatically; at least in the present approach, sense determination has to be done manually.
Type 2. Errors caused by syntactic structure
The second error type is caused by problems related to syntactic structure. As shown in Table 4.2, 237 of the 640 incorrect hypernyms belong to this type, the highest share at 37.03%. Both Pattern 1 and Pattern 2 are involved in this type of error. As mentioned, all verbs occurring after the target words are extracted as possible hypernyms; as a result, many unwanted results were extracted as well, which lowered the accuracy. Extracting all verbs means disregarding the syntactic structure or, more precisely, the constituency of the sentences. This can be illustrated by the following examples:
(5) to pluck @ producing a bed quilt by flipping the cotton.
(6) 釣1 1(VC)
to fish @ To catch creatures in the water by using a pole with hook and fishing line.
Taking examples (5) and (6) as inputs, the program extracted all verbs occurring after the target words as candidates and hence returned the following possible hypernymy-troponymy pairs:
(i) *打1 25(VC)/ 彈(VC) *(Da ‘to pluck’/Tan ‘to flip’)
(ii) 打 25(VC)/ 製造出(VC) (Da ‘to pluck’/ZhiZaoChu ‘to produce’)
(iii) *釣1 1(VC)/ 連結(VC) *(Diao ‘to fish’/LianJei ‘to connect’)
(iv) 釣1 1(VC)/捕捉(VC) (Diao ‘to fish’/ BuZhou ‘to catch’)
The wrongly extracted hypernyms (labelled with *) result from syntactic structure. The following tree diagrams illustrate the boundaries of each phrase. The verb Tan ‘to flip’ belongs to the prepositional phrase headed by yi (P), whereas the other verb, ZhiZaoChu ‘to produce’, is the head of the entire verb phrase, which is the proper slot for the predicted hypernym. By the same token, one of the verbs in example (6), LienJie ‘to connect’, is syntactically embedded in a noun phrase and consequently leads to an incorrect prediction.
This problem concerns sentence parsing during the pre-processing stage. In this thesis, input data were processed only by segmentation and POS tagging; parsing was left out in order to limit the time and resources required for processing the input data. Moreover, although syntactic structure is known to cause errors in the automatic extraction of semantic relations, it is not rigid or regular enough to generate rules without human decisions. This issue is left for future work, which could use Chinese parsers such as Sinica Treebank¹, which can do sentence parsing and automatic semantic role assignment for structured trees [8].
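To make the role of constituency concrete, the sketch below contrasts extracting every verb with extracting only the verbs that sit directly under the top-level verb phrase. No parsing was performed in this thesis, so this is purely an illustration of what a parsed input would allow: the nested-tuple tree encoding, the function names, and the toy tree for example (5), including the romanized nouns MianHua and MianBei, are all assumptions.

```python
def is_leaf(node):
    """A leaf is a (pos, word) pair; a phrase is (label, child, child, ...)."""
    return len(node) == 2 and isinstance(node[1], str)


def all_verbs(tree):
    """Collect every verb leaf, ignoring constituency."""
    if is_leaf(tree):
        pos, word = tree
        return [word] if pos.startswith("V") else []
    verbs = []
    for child in tree[1:]:
        verbs.extend(all_verbs(child))
    return verbs


def head_level_verbs(tree):
    """Collect only verb leaves that are immediate children of the root phrase."""
    return [c[1] for c in tree[1:] if is_leaf(c) and c[0].startswith("V")]


# Hypothetical parse of the definition in example (5):
# [VP [PP yi [VP Tan MianHua]] ZhiZaoChu [NP MianBei]]
tree = (
    "VP",
    ("PP", ("P", "yi"), ("VP", ("VC", "Tan"), ("NP", ("Na", "MianHua")))),
    ("VC", "ZhiZaoChu"),
    ("NP", ("Na", "MianBei")),
)

print(all_verbs(tree))         # ['Tan', 'ZhiZaoChu']
print(head_level_verbs(tree))  # ['ZhiZaoChu']
```

In this toy case the verb embedded in the prepositional phrase is dropped and only the head of the verb phrase survives; real parser output would need a more careful notion of head than "immediate child of the root".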
Type 3. Verb and verb phrase
A total of 79 (12.34%) of the 640 errors belong to this error type. This type of error shows that the hypernymy-troponymy relation sometimes holds between a verb and a verb phrase rather than between two verbs. More precisely, the extracted verb alone cannot indicate the existence of a hypernymy-troponymy relation; rather, it is the whole verb phrase that demonstrates the target relation. The following are two examples:
(7) 標示 1(VC) to label @ Provide messages by words or symbols.
(8) 敬禮 1(VB)
to salute @ showing respect by movement, typically by lowering one’s head.
In examples (7) and (8), neither verb after yi is the hypernym of the input verb, because neither yields a grammatical or meaningful evaluation sentence; therefore, TiGon ‘to provide’ is not the hypernym of BiaoShi ‘to label’, and BiaoShi ‘to show’ is not the hypernym of JinLi ‘to salute’ either. If we scrutinize the results and their sentences, we find that the incorrect hypernyms found by the program are not totally irrelevant to the input verb. Rather, it is the whole verb phrase that more reasonably stands in a hypernymy-troponymy relation with the input verb. Thus, although TiGon ‘to provide’ alone is not sufficient to indicate a troponymy relation with BiaoShi ‘to label’, the whole verb phrase TiGon(VE)+ShunShi(Na) ‘to provide message’ can be seen as the hypernym of BiaoShi ‘to label’, since labeling is a way of providing messages. The same holds for JinLi ‘to salute’ and BiaoShi(VE)+JinYi(Na) ‘to show respect’ in example (8). Although a semantic relation may exist between verbs and verb phrases, such cases are not counted as hypernymy-troponymy pairs in our experiment, since lexical semantic relations hold between lexemes, not phrases. At least, we limit the semantic relation to lexicalized verbs, which are segmented and tagged with a single POS in Chinese.
¹ http://rocling.iis.sinica.edu.tw/CKIP/treebank.htm
Type 4. The problem of synonym or near-synonym
It is common to define an unknown word by means of synonymous words. The results show that synonymous verbs are used to define other verbs even when the definition sentences conform to our target syntactic patterns. As example (9) shows, the definition sentence complies with our first syntactic pattern, ‘yi ... to Vj’, but the verb HuaChu ‘to paint’, which occurs after yi, is not the hypernym of HuaTu ‘to paint’. Rather, HuaChu ‘to paint’ and HuaTu ‘to paint’ stand in a synonymous relation.
(9) 畫圖 1(VA)
to paint @ Drawing figures or patterns by pens or other equipment.
(10) 拍3 1(VC) to photograph @ Taking photos by photographic apparatus.
Another example is shown in (10): one of the senses of Pai ‘to photograph’ is to take photos. The program extracted PaiZhaoPien ‘to take photos’ as the possible hypernym of Pai ‘to photograph’. Note that we ignore the problem of verb phrases here, since PaiZhaoPien ‘to take photos’ or PaiZhao ‘to take photos’ has already been lexicalized in CWN or CKIP. Nevertheless, Pai ‘to photograph’ and PaiZhaoPien ‘to take photos’, under this sense, are in a synonymous relation rather than a troponymy relation. The problem of synonymy also occurred in syntactic pattern 2, ‘X Di Vj (to Vj X-ly)’, as shown in example (11), where YaoDong ‘to shake’ and YaoHuang ‘to rock’ may be intuitively judged as synonymous rather than as standing in a hypernymy-troponymy relation.
(11) 搖晃 1(VAC) to rock @ Objects shake back and forth irregularly.
As Table 4.2 shows, a total of 73 (11.40%) synonymy errors were found in the results. Although this type of error does not account for a high share of the errors, the problem may not be easily solved, because it depends on how definitions were written by lexicographers; the synonymy can only be filtered out manually.
Type 5. Abstract concept verbs
Inspection of the incorrect results shows that certain verbs always lead to errors. These verbs are 成/成為 Cheng/ChengWei ‘become’ and 作/作為 Zuo/ZuoWei ‘be’, as the following examples show:
(12) 編 9(VA)
to arrange @ To create the following works by using already existing materials or
(13) to assume @ To make an assumption by the following statement.
In Chinese, verbs such as Cheng ‘become’ or Wei ‘be’, as shown in examples (12) and (13), function as copula-like verbs to which nouns or predicates are attached. Apart from their grammatical function in Chinese, these verbs carry very abstract concepts semantically. More precisely, they are hard to associate with concrete events, actions, or images. At the very least, substituting these pairs into the basic definition of troponymy, Vt Shi YiZong Vh De FanShi ‘To Vt is to Vh in a particular manner’, makes no sense. Take (13) for example: ‘*She Shi Yi Zong Wei De FanShi’ (? to assume is a way to be) makes no sense in Chinese. In total, 39 returned verbs belong to this type of error, accounting for an error rate of 6.09%.
Type 6. Wrong tagging of POS
As can be seen in Table 4.2, 18 of the 640 errors are due to wrong POS tagging, the lowest share at 2.81%. POS tagging is the process of assigning a part of speech or other lexical class marker to each word in a corpus. Taggers play an increasingly important role in speech recognition, natural language parsing and information retrieval. In the experiment, we use the CKIP Word Segmentation tool to segment sentences and tag POS. However, Chinese part-of-speech tagging is more difficult than its English counterpart because it has to be solved together with the problems of unknown words and word segmentation. Also, Chinese part-of-speech classes are very ambiguous; many words can be either adjective or noun, or noun or verb, without any change in morphology. For these reasons, no tagger or segmenter has been proved one hundred percent accurate for Chinese [29]. Therefore, incorrect tags are sometimes found in the tagger output:
(14) 拍攝 1(VC) to photograph @ To record images with photographic apparatus.
(15) 確保 1(VE)
to guarantee @ accurately indemnify the following subject’s safety or existence.
In example (14), JiLu ‘to record’ can serve as both a noun and a verb according to CWN. JiLu in example (14) clearly functions as a verb but is wrongly tagged as a noun by CKIP.
Similarly, BaoZhang ‘to indemnify’ can function as a verb or as a noun (security) according to CWN. In example (15) it is clearly used as a verb but is wrongly tagged as a noun (Na) by CKIP.
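Cases like (14) and (15) could be surfaced for manual review by cross-checking the tagger output against a lexicon that records which categories each word can take. The following sketch assumes such a lexicon is available as a plain dictionary; the dictionary contents, the function name flag_possible_mistags and the toy tokens (including the entries QueShi and AnQuan) are hypothetical, and the check only flags candidates rather than correcting tags.

```python
# Hypothetical lexicon: word -> set of coarse categories it can take ("N", "V").
LEXICON = {
    "JiLu": {"N", "V"},
    "BaoZhang": {"N", "V"},
    "AnQuan": {"N"},
}


def flag_possible_mistags(tagged_tokens, lexicon):
    """Return words tagged as nouns that the lexicon also lists as verbs.

    Flagged words are only candidates for manual review: a noun tag on a
    noun/verb-ambiguous word may well be correct in context.
    """
    flagged = []
    for word, pos in tagged_tokens:
        if pos.startswith("N") and "V" in lexicon.get(word, set()):
            flagged.append(word)
    return flagged


# Toy tagger output loosely modelled on example (15); tokens and tags are assumed.
tokens = [("QueShi", "D"), ("BaoZhang", "Na"), ("AnQuan", "Na")]
flagged = flag_possible_mistags(tokens, LEXICON)
print(flagged)  # ['BaoZhang']
```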