
Translating Common English and Chinese Verb-Noun Pairs in Technical Documents with Collocational and Bilingual Information

6 Experimental Results

The remaining equations, (3) and (4), take an extreme assumption. We assumed the availability of the Chinese translation of the English object at the time of translation, and used this special information in different ways. Equation (3) considers the words EV, EN, and CN. In sharp contrast, Equation (4) considers only EV and CN to determine the translation of the English verb. The conditional probabilities in Equations (3) and (4) were calculated with Equations (6) and (7), respectively.
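To make the count-based estimates concrete, the following sketch shows how conditional probabilities like those in Equations (6) and (7) could be computed as relative frequencies over the aligned VN pairs. The function and variable names are ours, not the authors'; the sample counts in the test are illustrative.

```python
from collections import Counter

def estimate_translation_probs(aligned_pairs):
    """Estimate Pr(CV | EV, EN, CN) and Pr(CV | EV, CN) from aligned VN pairs.

    aligned_pairs: iterable of (EV, EN, CV, CN) tuples, where EV/EN are the
    English verb/noun and CV/CN are their Chinese translations.
    """
    joint = Counter()      # C(EV, EN, CN, CV)
    cond_full = Counter()  # C(EV, EN, CN)
    pair = Counter()       # C(EV, CN, CV)
    cond_vc = Counter()    # C(EV, CN)
    for ev, en, cv, cn in aligned_pairs:
        joint[(ev, en, cn, cv)] += 1
        cond_full[(ev, en, cn)] += 1
        pair[(ev, cn, cv)] += 1
        cond_vc[(ev, cn)] += 1

    def p_full(cv, ev, en, cn):
        # Relative-frequency estimate of Pr(CV | EV, EN, CN), cf. Equation (6).
        denom = cond_full[(ev, en, cn)]
        return joint[(ev, en, cn, cv)] / denom if denom else None

    def p_vc(cv, ev, cn):
        # Relative-frequency estimate of Pr(CV | EV, CN), cf. Equation (7).
        denom = cond_vc[(ev, cn)]
        return pair[(ev, cn, cv)] / denom if denom else None

    return p_full, p_vc
```

Returning `None` when the conditioning event was never observed matches the paper's choice to reject rather than smooth.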

We considered the exploration of using the information about the Chinese translation of the English noun to be interesting. Would the information about CN provide more help, given that we already had information about EV and EN? How well would we do when we had information about only EV and CN, but not EN?

In all of the experiments, we used 80% of the available aligned VN pairs as the training data, and the remaining 20% as test data. The training data were randomly sampled from the available data.
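A minimal sketch of such a random 80/20 split; the helper name and the fixed seed are our own illustrative choices, not the authors' setup.

```python
import random

def split_pairs(pairs, train_ratio=0.8, seed=0):
    """Randomly split aligned VN pairs into training and test portions.

    With the default ratio, 80% of the instances go to training and the
    remaining 20% to testing, after a seeded shuffle for reproducibility.
    """
    pairs = list(pairs)
    rng = random.Random(seed)
    rng.shuffle(pairs)
    cut = int(len(pairs) * train_ratio)
    return pairs[:cut], pairs[cut:]
```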

As a consequence, it was possible for us to encounter the zero-probability problem. Take Equation (6) for example. If, for a test case, we needed C(EV, EN, CN) in (6), but we happened not to have observed any instances of (EV, EN, CN) in the aligned VN pairs in the training data, then we would not be able to compute (6) for the test case. When such cases occurred, we chose to allow our system to admit that it was not able to recommend a translation, rather than resorting to smoothing techniques.
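This reject-rather-than-smooth policy can be sketched as follows, assuming simple count tables; all names and sample counts here are illustrative.

```python
from collections import Counter

def conditional_prob(counts_joint, counts_cond, outcome, condition):
    """Return the relative-frequency conditional probability, or None to
    signal rejection when the conditioning event was never observed in
    training (no smoothing is applied)."""
    denom = counts_cond[condition]
    if denom == 0:
        return None  # the system declines to recommend a translation
    return counts_joint[condition + (outcome,)] / denom

# Illustrative counts: (EV, EN, CN) -> CV observed twice in training.
cj = Counter({("take", "measure", "措施", "採取"): 2})
cc = Counter({("take", "measure", "措施"): 2})
print(conditional_prob(cj, cc, "採取", ("take", "measure", "措施")))  # 1.0
print(conditional_prob(cj, cc, "採取", ("make", "decision", "決定")))  # None
```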


Using the formulas in Table 3 would allow our systems to recommend only one Chinese translation. In fact, we relaxed this unnecessary constraint by allowing our systems to consider the k largest conditional probabilities and to recommend k translations.
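A sketch of this k-best relaxation, assuming a scoring function that returns a conditional probability (or `None` when it cannot be computed); the names are ours.

```python
def recommend_top_k(prob_of, candidates, k=3):
    """Rank candidate Chinese translations by their conditional probability
    and return up to k recommendations.

    Fewer than k items may come back when fewer candidates score above zero,
    which is exactly the saturation effect discussed for Table 5.
    """
    scored = [(cv, prob_of(cv)) for cv in candidates]
    scored = [(cv, p) for cv, p in scored if p and p > 0]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [cv for cv, _ in scored[:k]]
```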

Although we have been presenting this paper with the 1 million parallel sentences in the NTCIR PatentMT data as the example, we also ran our experiments with the English-Chinese bilingual version of Scientific American. Moreover, we ran experiments that aimed at finding the best Chinese translations of English objects. The formulas were defined analogously to those listed in Table 3.

Table 3: Translation decisions

6.1 Basic Results for the Top 100 Verbs in Patent Documents

When we conducted experiments for the top 100 verbs (cf. Section 5.1), we had 24,300 instances of aligned VN pairs for training and 6,076 instances for testing.

We measured four rates as indicators of the performance of using a particular formula in Table 3. The rejection rate is the percentage of test cases to which a system was unable to respond. This is due to our choosing not to smooth the probability distributions, as we explained in Section 5.2.

It is not surprising that the rejection rates increased as we considered more information in the formulas. The rejection rates were 0, 0.201, 0.262, and 0.218 when we applied Equations (1) through (4), respectively. As expected, we encountered the highest rejection rate when using (3). Note that using (4) resulted in a higher rejection rate than using (2). This was because each English noun could have more than one possible Chinese translation, so the conditioning events for Pr(CVi | EV, CN) were rarer, and the distributions sparser, than those for Pr(CVi | EV, EN) on average.

Table 4 shows the rates at which the correct answers were included in the k recommended translations. We did not consider the cases in which our systems could not answer when computing the statistics in Table 4. Hence, the data show the average inclusion rates when our systems could answer.
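The inclusion rate could be computed along these lines, under our reading that rejected cases are excluded from the denominator; the names are illustrative.

```python
def inclusion_rate(results):
    """Fraction of answered test cases whose recommendation list contained
    the recorded correct translation.

    results: list of (recommendations, correct_answer) pairs, where
    recommendations is a list of candidate translations, or None when the
    system rejected the case. Rejected cases are excluded, matching the
    convention used for Table 4.
    """
    answered = [(recs, gold) for recs, gold in results if recs is not None]
    if not answered:
        return None
    hits = sum(1 for recs, gold in answered if gold in recs)
    return hits / len(answered)
```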

As one may have expected, when we increased k, the inclusion rates also increased.

It may be surprising that the inclusion rates for Equations (2) through (4) seem to have saturated when we increased k from 3 to 5. This was because our systems could not actually recommend five possible translations, even when they were allowed to. Although we had hundreds or thousands of aligned VN pairs for an English verb (cf. Table 1), including more conditioning information in Equations (2) through (4) still reduced the number of VN pairs qualified for training and testing, thereby limiting the actual numbers of recommended translations. Table 5 shows the average number of actual recommendations in the tests.

The main advantage of using Equations (2) through (4) is that they were more precise when they could answer. Table 6 shows the average ranks of the correct translation among the recommended translations. The average ranks improved as we considered more information, from Equation (1) to (2) and to (3). Using (2) achieved almost the same average rank as using (4), but using (4) led to slightly better performance.
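A sketch of the average-rank measure, under the assumption that ranks are 1-based and computed only over cases where the correct translation appeared in the recommendation list; the names are ours.

```python
def average_rank(results):
    """Average 1-based rank of the correct answer among the recommendations.

    results: list of (recommendations, correct_answer) pairs; cases that
    were rejected (None) or whose list missed the answer are skipped.
    """
    ranks = [recs.index(gold) + 1
             for recs, gold in results
             if recs is not None and gold in recs]
    return sum(ranks) / len(ranks) if ranks else None
```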

6.2 Improving Results for the Top 100 Verbs in Patent Documents

Results reported in the previous subsection indicated that Equation (1) is robust in that it could offer candidate answers all the time. Methods that employed more information could choose translations more precisely, but were less likely to respond to test cases. Hence, a natural question is whether we could combine these methods to achieve better performance. To answer this question, we evaluated combinations of the basic methods listed in Table 3.

In Tables 7 and 8, we use the notation EqX+EqY to indicate that we used EqX to find as many candidate translations as possible. If applying EqX could not offer sufficiently many candidate translations, we applied EqY to recommend more candidates until we acquired k recommendations.
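The EqX+EqY back-off can be sketched as follows; this is our reading of the combination scheme, not the authors' code, and the names are illustrative.

```python
def combined_recommend(primary, backoff, k=3):
    """EqX+EqY back-off: take the primary method's recommendations first,
    then fill the remaining slots from the backoff method's list,
    skipping duplicates, until k recommendations are collected."""
    recs = list(primary[:k])
    for cv in backoff:
        if len(recs) >= k:
            break
        if cv not in recs:
            recs.append(cv)
    return recs
```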

Table 4: Inclusion rates for top 100 verbs

inclusion   k=1     k=3     k=5
Eq1         0.768   0.953   0.975
Eq2         0.786   0.913   0.918
Eq3         0.795   0.911   0.916
Eq4         0.791   0.910   0.916

Table 5: Average number of recommendations

recommend   k=1     k=3     k=5
Eq1         1.000   2.919   4.614
Eq2         1.000   1.923   2.225
Eq3         1.000   1.847   2.107
Eq4         1.000   1.920   2.244

Table 6: Average ranks of the answers

ranking     k=1     k=3     k=5
Eq1         1.000   1.241   1.310
Eq2         1.000   1.166   1.185
Eq3         1.000   1.151   1.168
Eq4         1.000   1.153   1.173


Using Eq1 is sufficiently robust in that the conditional probabilities would not be zero, unless the training data did not contain any instances that included the English verb.

Hence, in our experiments, the rejection rates for “Eq2+Eq1”, “Eq3+Eq1”, and “Eq4+Eq1” became zero. In other words, our systems responded to all test cases when we used these combined methods by allowing all methods to recommend up to k candidates.

We compare the performance of these combined methods with the best performing methods in Tables 7 and 8. We copy the inclusion rates of Eq1 from Table 4 to Table 7 to facilitate the comparison, because Eq1 was the best performer, on average, in Table 4. The combined methods improved the inclusion rates, although the improvement was marginal.

Moreover, we copy the average ranks for Eq1 and Eq3 from Table 6 to Table 8. Using Eq1 and using Eq3 led to the worst and the best average ranks in Table 6, respectively. Again, using the combined methods, we improved the average ranks marginally over the results of using Eq1.

Statistics in Table 7 suggest that using a machine-assisted approach to translating verbs in common VN pairs in the PatentMT data is feasible. Providing the top five candidates for a human translator to choose from would allow the translator to find the recorded answer nearly 98% of the time.

It is interesting to find that using Equation (2) and Equation (4) did not lead to significantly different results in Tables 4 through 8. The results suggest that using either the English nouns or the Chinese nouns as a condition contributed similarly to the translation quality of the English verbs. More specifically, the translation of the English verb may be conditionally independent of the information about the noun's Chinese translation, given the English verb and the English noun. This does not imply that the translation of the English verb is unconditionally independent of the information about the English noun's translation. That Eq4 performed better than Eq1 in Table 6 offers reasonable support for this point.

6.3 Results for the Most Challenging 22 Verbs in Patent Documents

We repeated the experiments that we conducted for the top 100 verbs for the most challenging 22 verbs (cf. Section 5.1). Tables 9 through 13 correspond to Tables 4 through 8, respectively. The most noticeable difference between Table 9 and Table 4 is the reduction of the inclusion rates achieved by Eq1 when k=1. Although the inclusion rates also dropped noticeably when we used Eq2, Eq3, and Eq4, the drop in the inclusion rate for Eq1 (when k=1) was the most significant.

Although we did not define the challenging index of verbs based on their numbers of possible translations, comparing the corresponding numbers in Table 10 and Table 5 suggests that the challenging verbs also have more possible translations in the NTCIR data.

Table 9: Inclusion rates for 22 challenging verbs

inclusion   k=1     k=3     k=5
Eq1         0.449   0.865   0.923
Eq2         0.561   0.818   0.820
Eq3         0.564   0.827   0.829
Eq4         0.550   0.827   0.829

Table 10: Average number of recommendations

recommend   k=1     k=3     k=5
Eq1         1.000   2.977   4.756
Eq2         1.000   2.090   2.364
Eq3         1.000   2.022   2.230
Eq4         1.000   2.106   2.411

Table 11: Average ranks of the answers

ranking     k=1     k=3     k=5
Eq1         1.000   1.607   1.773
Eq2         1.000   1.365   1.373
Eq3         1.000   1.374   1.383
Eq4         1.000   1.394   1.400

Table 7: Inclusion rates (combined methods)

inclusion   k=1     k=3     k=5
Eq1         0.768   0.953   0.975
Eq2+Eq1     0.772   0.960   0.979
Eq3+Eq1     0.778   0.960   0.979
Eq4+Eq1     0.776   0.959   0.978

Table 8: Average ranks of the correct answers (combined methods)

ranking     k=1     k=3     k=5
Eq1         1.000   1.241   1.310
Eq3         1.000   1.151   1.168
Eq2+Eq1     1.000   1.240   1.301
Eq3+Eq1     1.000   1.234   1.294
Eq4+Eq1     1.000   1.233   1.296

Corresponding numbers in Table 11 and Table 6 support the claim that translating the 22 challenging verbs is relatively more difficult. The average ranks of the answers became worse in Table 11.

Data in Tables 12 and 13 repeat the trends that we observed in Tables 7 and 8. Using the combined methods allowed us to answer all test cases, and improved both the inclusion rates and the average ranks of the answers.

If we built a computer-assisted translation system for these 22 verbs, the performance would not be as good as if we built a system for the top 100 verbs. When the system suggested the leading 3 translations (k=3), the inclusion rates dropped to around 0.90 in Table 12 from 0.96 in Table 7.

Again, using either the English nouns or the Chinese nouns, along with the English verbs, in the conditions of the methods listed in Table 3 did not make significant differences, as suggested by the results in Tables 9 through 13.

6.4 More Experimental Results

We repeated the experiments that we conducted for the top 100 verbs, this time for the top 100 nouns in the PatentMT data. For the experiments with the nouns, we had only 3,952 test instances. The goal was to find the best Chinese translation of an English object, given its contextual and bilingual information. More specifically, in addition to the English verbs and the English nouns, we were interested in whether providing the Chinese translations of the English verbs would help us improve the translation quality of the English objects.

Due to page limits, we cannot show all of the tables as we did in Section 6.1. The statistics showed trends analogous to those we reported in Section 6.1. Namely, the availability of the Chinese translations of the English verbs did not help when we already considered the English verbs and objects in the translation decisions.

Scientific American is a magazine for the general public, and its writing style is closer to everyday language. We ran a limited-scale experiment with the available text from Scientific American.

We had about 1,500 training instances and 377 test instances for 25 verbs. The results indicated that using the Chinese translations of the English objects influenced the translation quality of the English verbs. However, the observed differences were not significant. A side observation was that it was relatively harder to find good translations of English verbs in Scientific American than in the PatentMT data. When providing five recommendations, our system's recommendations included the correct translation only about 88% of the time. In contrast, we had achieved inclusion rates well above 90% in Tables 7 and 12.
