In this chapter, the results of the three research questions are presented.
First, the outcome of using a corpus-based method for the translation error analysis is presented, with a demonstration of the content and interface of the completed annotated translation learner corpus. In the following section, Error Analysis from the Translation Product, the mean number of errors of each type is compared between groups text by text, followed by statistical tests of the significance of the differences between groups. A review of the translation errors in the six texts is then conducted group by group, in which errors are viewed as the way different groups responded to a text; that is, how each group made errors in translating a text can be read as its description of that text, and different groups or individuals might describe a text from very different perspectives. Finally, to address the third research question, the section Error Analysis from the Translation Process unfolds the stories behind the numbers of translation errors by piecing together the fragments of information gathered in the retrospective interviews about how the participants translated.
The Translation Learner Corpus and the Error Annotation
Although non-automatic corpus annotation of all kinds has proved to be time-consuming and labor-intensive, the results can be valuable resources for advancing the practice of and research on translation teaching and learning.
Applying MMAX2 to error-tagging the translation learner corpus of this study demonstrated several benefits that could complement the traditional teaching methods and techniques used in translation classrooms and validated a corpus-based method for empirical research on translator education. These benefits are described below in terms of the customization of the interface and the query functions of the error-tagged translation learner corpus in MMAX2.
Customization of the Interface
Defining an annotation scheme was the second step25 in the annotation life cycle, which outlined how the data should be described to represent the annotations. The researcher designed a three-level annotation scheme comprising text information, translator background, and translation error typology. Markables at these three levels were tagged in each translation in the corpus. In the level of text information (see Table 9), the source text was assigned a text type (informative, operative, or expressive) and a text number ranging from 001 to 00926 to form a unique Source Text ID. This allowed a repertoire of at least 2,997 (= 3 × 999) source texts for the future expansion of the corpus. The language combination included translation from English into Chinese and from Chinese into English. The name of the annotator and the year of annotation were also recorded, to allow the calculation of inter-/intra-rater reliability and longitudinal research on student development.
25 The third step of the annotation life cycle has been described in Data Analysis.
26 In fact, the Source Text Number could be any number, not limited to 001-009, because it was set to be free text in the style sheet.
Table 9. The Attributes in the Level of Text Information
The resulting customized interface for this study is shown in Figure 22, with the source text number, annotator, and translation year entered as free text, while the text type and language combination could be chosen by pressing a nominal button.
Figure 22. The Level of Text Information in MMAX2
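To make the customization concrete, the following is a minimal sketch of what a MMAX2 scheme file for this level could look like, using the freetext and nominal_button attribute types that correspond to the free-text and nominal-button styles described above. All ids, attribute names, and value names here are illustrative assumptions, not the actual scheme files of this study.

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Hypothetical sketch of a scheme file for the text information
         level; ids, names, and values are illustrative only. -->
    <annotationscheme>
      <!-- Free-text attribute: any source text number can be typed in. -->
      <attribute id="att_1" name="source_text_number" type="freetext"/>
      <!-- Nominal attribute: the text type is selected by button. -->
      <attribute id="att_2" name="text_type" type="nominal_button">
        <value id="val_1" name="informative"/>
        <value id="val_2" name="operative"/>
        <value id="val_3" name="expressive"/>
      </attribute>
      <!-- Nominal attribute: the language combination (direction). -->
      <attribute id="att_3" name="language_combination" type="nominal_button">
        <value id="val_4" name="English-Chinese"/>
        <value id="val_5" name="Chinese-English"/>
      </attribute>
      <!-- Free-text attributes: annotator name and year of annotation. -->
      <attribute id="att_4" name="annotator" type="freetext"/>
      <attribute id="att_5" name="annotation_year" type="freetext"/>
    </annotationscheme>

Analogous scheme files, with nominal lists instead of buttons where appropriate, would define the levels of translator background and error typology described below.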
In the level of translator background, 15 attributes were assigned to describe the translators (see Table 10):
• Status: the current status of the translator, chosen from a nominal list including undergraduate student, graduate student, or professional.
• Track: the training received by the translator, chosen from a nominal list including translation, interpretation, or T & I (translation and interpretation).
• Translator number: any number given to a student translator to trace his or her identity. The style of this attribute was free text.
• Gender: the gender of the translator, chosen from a nominal list including female or male.
• Age: the age of the translator, chosen from a nominal list of seven age ranges: 18-23, 24-28, 29-35, 36-40, 41-50, 51-60, and 60+ (years old).
• Year of translation: the year when the translation was done. The time of translation could be more specific if needed because the style of this attribute was free text.
• Level of experience in translation: the level of experience in translation, chosen from a nominal list of seven roughly defined levels: 0 (no experience), 1 (the word count of the source texts translated was no more than 10,000, or the total working days were no more than 90), 2 (10,001-100,000 words, or 91-180 working days), 3 (100,001-1,000,000 words, or 181-360 working days), 4 (1,000,001-5,000,000 words, or 361-720 working days), 5 (five to ten million words, or a total of two to four years of working days), and 6 (more than ten million words, or more than four years of working days).
• Level of experience in interpretation: the level of experience in interpretation, chosen from a nominal list of seven roughly defined levels: 0 (no experience), 1 (no more than 30 hours or five events of simultaneous/consecutive interpreting), 2 (31-60 hours or 6-10 events), 3 (61-120 hours or 11-20 events), 4 (121-300 hours or 21-50 events), 5 (301-600 hours or 51-100 events), and 6 (more than 600 hours or more than 100 events).
• English in the language combination: the status of English for the translator, chosen from a nominal list: the first language (L1), the second language (L2), the first foreign language (FL1), or the second foreign language (FL2).
• Chinese in the language combination: the status of Chinese for the translator, chosen from the same nominal list: L1, L2, FL1, or FL2.
• Major in college: the college major of the translator, chosen from eleven roughly defined categories: a (translation and interpretation), b (Chinese or related subjects), c (English, foreign languages, or English-language related subjects), d (languages other than Chinese or English), e (education related subjects), f (subjects in liberal arts and social science not listed in the previous classifications), g (computer science related subjects), h (science, medicine, or engineering related subjects), i (arts and music), j (communication related subjects), and k (others).
• Major in graduate school: the graduate major of the translator, chosen from twelve roughly defined categories: none (for undergraduate students), a (translation and interpretation), b (Chinese or related subjects), c (English, foreign languages, or English-language related subjects), d (languages other than Chinese or English), e (education related subjects), f (subjects in liberal arts and social science not listed in the previous classifications), g (computer science related subjects), h (science, medicine, or engineering related subjects), i (arts and music), j (communication related subjects), and k (others).
• Credits already earned in translation: the official credits already earned by the translator in translation, chosen from a nominal list of eight ranges: 0-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, and 71+ (credits).
• Credits already earned in interpretation: the official credits already earned by the translator in interpretation, chosen from the same nominal list of eight ranges.
• Months living in English-speaking communities: the length of time the translator had stayed or lived in English-speaking countries or communities, chosen from a nominal list of seven ranges: 0, 1-6, 7-12, 13-24, 25-60, 61-120, and 120+ (months).
Table 10. The Attributes in the Level of Translator Background
The resulting customized interface for translator background is illustrated in Figure 23, with only two attributes, Translator_Number and Translation_Year, entered as free text, while the others came in nominal lists.
In the level of translation error typology, two attributes were assigned (see Table 11): binary errors and non-binary errors, which ranged from EB11 (mistranslation) to EB31 and from EN11 to EN31, respectively. This typology was the tagset used for annotation, as detailed in Table 7 and Table 8.
Table 11. The Attributes in the Level of Translation Error Typology
The resulting customized interface is shown in Figure 24. The error type of a markable could be chosen from a nominal list.
Figure 24. The Level of Translation Error Typology in MMAX2
Although the customized interface presented above serves a corpus of one text type (informative texts), the flexibility of MMAX2 will allow the raw corpus (the base data) to be expanded by appending new student translations of other text types to the existing ones, and will permit the number of attributes within a level and the number of levels to be modified for different research and teaching purposes in the future. That is, under the structure created for the learner translation corpus and the annotation scheme of this research, researchers can expand the size and the text types (beyond informative texts) of the corpus and can design additional attributes in the level of error typology to cover the error types of translations of other text types. Furthermore, the three levels can be developed into more levels if necessary.
Query Display and Statistics
The query function is arguably the most valuable feature of MMAX2 for users. Using the annotated corpus of this research as an example, the query results useful to teachers (as researchers) and to students are manifested in three aspects: the display of a specific search item, the statistics of the search item, and the HTML output of the searched item with its annotations.
The query results could be used as feedback to an individual student when the teacher observed an idiosyncratic trait, or as an illustration to the whole class when an error seemed common to all students. They were also sources of data for longitudinal research on translation error analysis, for the investigation of the linguistic features of learner translators, and for studying how translation skopos relates to different types of translation errors.
Using Text I005 as an example, locating all types of errors made by student GT009 required the following four-line script in the Query Console: line 1 identified the translator under inquiry (the ninth translation student in the Grad Group), line 2 identified the error types under inquiry (all binary and all non-binary errors), line 3 defined the combination of student and error type to be searched, and line 4 requested the statistics of the search item defined on line 3:
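Purely for illustration, the four-line structure can be sketched in pseudocode as follows; the variable names, attribute names, and operators here are hypothetical, the actual commands appear in Figure 25, and the real Query Console syntax may differ:

    // Line 1: bind the markables of the target translator (GT009).
    let $student = translator_background.translator_number = GT009
    // Line 2: bind the markables of all binary (EB) and non-binary (EN) errors.
    let $errors = error_typology.EB* or error_typology.EN*
    // Line 3: combine student and error type into one search item.
    let $hits = $student and $errors
    // Line 4: request the statistics of the search item.
    stats $hits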
As illustrated in Figure 25, the four lines of commands were entered one by one, each followed by the Enter key and then a press of Search. The results appeared in the Query Console as illustrated in Figure 26, where the Markable Tuples sheet showed a total of 17 matches (the number of binary and non-binary errors) and the Statistics sheet (see Figure 27) showed the number and percentage of errors of each type. Clicking any item on the Markable Tuples sheet displayed its corresponding context in the Main Window (as illustrated in Figure 28).
Figure 25. The Scripts for Searching All EB and EN Errors of Student GT009 in Text I005 in
the Query Console
Figure 26. The Results of All EB and EN Errors of Student GT009 in Text I005 in the Markable
Tuples of the Query Console
Figure 27. The Statistics of All EB and EN Errors of Student GT009 in Text I005 in the Query
Console
Figure 28. A Search Item Shown in the Main Window
Along with the error statistics, the annotated translation could be sent to the students by exporting the file in HTML format (as in Figure 29).
Figure 29. The HTML Output of All Annotations of GT005 in Text I005
To produce the annotated translation in HTML format, the script in Figure 30 was edited in a plain-text editor and saved as a batch file (.bat), which could then be run automatically in the Windows environment.
Figure 30. The Scripts for HTML Output of Annotations
The six marked segments in Figure 30 are described below; a reconstructed example of the full batch file follows the list:
1. “java” was the command to launch the Java runtime and execute the application “org.eml.MMAX2.process”, which dumped the annotations to the HTML file “Show_Error_Details_HTML.html”.
2. “-classpath” was the parameter of the java command that set the class search paths, given as directories and jar/zip files; it was followed by a space and the search paths, with the individual paths separated by semicolons.
3. “-in”, the first of the four parameters of the application “org.eml.MMAX2.process”, indicated the .mmax file to be processed; it was always followed by a space and the file name, including its path in case the .mmax file was not in the same folder as MMAX2-process.bat.
4. “-common_paths”, the second of the four parameters, indicated which common_paths.xml was referred to; this file recorded the paths of the annotation-related files.
5. “-xsl”, the third of the four parameters, indicated the path and file name of the style file that determined the layout and contents of the output HTML.
6. “-out”, the last of the four parameters, indicated the name of the output HTML file that stored the annotation results.
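Assembling the six segments, the batch file would take roughly the following form; the class paths, folder names, and the .xsl file name below are placeholders for illustration, not the exact values used in this study:

    REM Hypothetical reconstruction of MMAX2-process.bat; all paths
    REM and file names other than the output file are placeholders.
    java -classpath C:\MMAX2\mmax2.jar;C:\MMAX2\libs ^
         org.eml.MMAX2.process ^
         -in C:\corpus\I005_GT009.mmax ^
         -common_paths C:\corpus\common_paths.xml ^
         -xsl C:\corpus\styles\error_details.xsl ^
         -out Show_Error_Details_HTML.html

The caret (^) is the Windows line-continuation character, so the command stays readable in the .bat file while executing as a single line.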
In addition to searching for the errors of one student, the query also allowed the display of the errors of a group of students; for example, finding out how the GT students did in Text I005 on error type EN14 required the following script in the Query Console:
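Mirroring the pseudocode sketch given earlier, and with the same caveat that the names and syntax are hypothetical, the group-level search would differ only in binding a group rather than a single translator:

    // Bind all graduate translation-track (GT) students.
    let $group = translator_background.status = graduate_student and
                 translator_background.track = translation
    // Bind the EN14 error markables and combine with the group.
    let $en14 = error_typology.EN14
    let $hits = $group and $en14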
The results showed nine matches of error EN14 in the Markable Tuples (see Figure 31), and the HTML output is shown in Figure 32.
Figure 31. The Results of EN14 Errors of GT Students in Text I005 in the Markable Tuples of
the Query Console
Figure 32. The HTML Output of All Annotations of GT Students in Text I005
Used in tandem with the annotated corpus, concordancing of the raw corpus (the corpus without any annotation) proved to be a powerful tool for teachers, who no longer had to rely on impressions and intuitions alone but could use corpus evidence to illustrate points and facilitate discussions. The following examples from the raw corpus complement the error-tagged corpus, which offers an approach to issues such as style comparison and translation difficulty through an examination of errors. Concordancing offered basic textual information about the translations at a glance. For example, the word list for Text I001 contained 1,750 headwords (types) and 22,791 tokens, with 631 occurrences of 能源 [neng yuan] (see Figure 33), the content word of the highest frequency and also the key word of the theme of the text.
Figure 33. The Headword of the Highest Frequency in Text I001: 能源 [neng yuan]
The illustrative, data-driven-learning nature of a corpus could be exemplified by the translation of “Wednesday” in Text I001, for which the 70 student translators produced four options; the comparison among the available options could be an important topic for beginners. As shown in Figure 34, there were 23 occurrences of 星期三 [xing qi san], chosen by eight graduate students and 15 undergraduate students.
Figure 34. Translation of “Wednesday” in Text I001: Option 1 (星期三 [xing qi san])
The second option for “Wednesday” was 週三 [zhou san] (see Figure 35), the preferred translation of 20 of the 39 graduate students but favored by only seven of the 31 undergraduates.
Figure 35. Translation of “Wednesday” in Text I001: Option 2 (週三 [zhou san])
The third option for “Wednesday” was 周三 [zhou san] (see Figure 36), chosen by seven graduate students and six undergraduates. While these first three options for translating “Wednesday” were acceptable, the fourth option, 禮拜三 [li bai san] (see Figure 44, the same figure that illustrates the translation of “Tuesday”), would be marked as an EN13 error (inappropriate style/register).
Figure 36. Translation of “Wednesday” in Text I001: Option 3 (周三 [zhou san])
Another readily observable example for raising students' awareness of differences in style was the translation of “two-thirds” in Text I001. As seen in Figure 37, nine students translated “two-thirds” as 2/3, while 55 students chose 三分之二 [san fen zhi er] (see Figure 38), three students 二分之三 [er fen zhi san] (see Figure 39), and two students 3分之2 [3 fen zhi 2] (see Figure 40).
However, 二分之三 [er fen zhi san], which means three-halves in Chinese, was clearly a mistranslation (marked as EB11); the reference column in Figure 39 shows that, among the three translators who chose it, one was a translation graduate student, one an interpretation graduate student, and one an undergraduate.
Figure 37. Translation of “two-thirds” in Text I001: Option 1 (2/3)
Figure 38. Translation of “two-thirds” in Text I001: Option 2 (三分之二 [san fen zhi er])
Figure 39. Translation of “two-thirds” in Text I001: Option 3 (二分之三 [er fen zhi san])
Figure 40. Translation of “two-thirds” in Text I001: Option 4 (3分之2 [3 fen zhi 2])
According to the analysis of error frequency in the first section of this chapter, in Text I001 the Under Group made significantly more errors than the Grad Group on EB11 (mistranslation), EB Sum, and EB-EN Sum, but the four subgroups within the Grad Group (GIA, GIB, GTA, and GTB) did not differ significantly on any error type. Nonetheless, looking into the concordance lines, we could see what the numbers did not reveal about the differences between translations.
Take the translation of “Tuesday” as an instance: three acceptable options were observed, with 28 hits of 週二 [zhou er] (see Figure 41), 23 hits of 星期二 [xing qi er] (see Figure 42), and 13 hits of 周二 [zhou er] (see Figure 43), while two erroneous options were observed, with 3 hits of 禮拜二 [li bai er] (see Figure 44) and 3 hits of 週四 [zhou si] (see Figure 45). 禮拜二 [li bai er] and 週四 [zhou si] were both marked as errors, the former as an EN13 (inappropriate style/register) and the latter as an EB11 (mistranslation); the reference column showed that 禮拜二 [li bai er] was produced by one interpretation graduate student and two undergraduates, while 週四 [zhou si] was, surprisingly, produced by graduate students (two in interpretation and one in translation). In this case, EB11 errors (mistranslations) did more harm to the purpose of a translation that aimed to inform than to one that intended to be expressive. Demonstrating such comparisons should effectively raise students' awareness of the relationship between their responsibility as translators and the communicative nature of their translations. In addition to comparing translations of specific items among groups, inconsistency of style or usage within a single translator could be identified as well. For example, two undergraduate students (UT011 and UT014) did not adhere to the same principle in translating the days of the week within the same text: as shown in Figure 34, “Wednesday” was translated as 星期三 [xing qi san], while, as in Figure 41, “Tuesday” was translated as 週二 [zhou er], although either 星期 [xing qi] or 週 [zhou] should be used consistently throughout one text.
Figure 41. Translation of “Tuesday” in Text I001: Option 1 (週二 [zhou er])
Figure 42. Translation of “Tuesday” in Text I001: Option 2 (星期二 [xing qi er])
Figure 43. Translation of “Tuesday” in Text I001: Option 3 (周二 [zhou er])
Figure 44. Translation of “Tuesday” in Text I001: Option 4 (禮拜二 [li bai er])
Figure 45. Translation of “Tuesday” in Text I001: Option 5 (週四 [zhou si])
From the search results of the annotated corpus, a great number of the translations of “artisans” in Text I003 fell into error type EN14 (other inappropriate lexical/phrasal choices). As many as 18 solutions for translating “artisans” were found in the translation corpus: 工匠 [gong jiang]; 工藝師 [gong yi shi]; 工藝師傅 [gong yi shi fu]; 師傅 [shi fu]; 工匠師傅 [gong jiang shi fu]; 手藝師 [shou yi shi]; 手工藝師 [shou gong yi shi]; 工匠師 [gong jiang shi]; 匠師 [jiang shi]; 手藝人 [shou yi ren]; 手藝師 [shou yi shi]; 藝匠 [yi jiang]; 工藝匠 [gong yi jiang]; 藝術家 [yi shu jia]; 技師 [ji shi]; 師父 [shi fu]; 工藝師父 [gong yi shi fu]; 工匠師父 [gong jiang shi fu] (see Figure 46 to Figure 54). Such a wide range of possible solutions to one translation problem could imply very dissimilar interpretations by individual translators, which played a vital role in understanding their translation process and deserved attention in class discussions. The more possible solutions there were, the more mental effort might be required during the decision-making process, and this might denote that the term had a higher