Non-specialized Texts in the Year 2007, 2008, and 2009 Chinese and English Translation and Interpretation Competency Examinations, the only official competency examination on translation and interpretation in Taiwan, organized by the Bureau of International Cultural & Educational Relations under the Ministry of Education (MOE). The topics of these six texts14 were: international oil prices, biotechnology and human development, cloud computing, designers and artisans, the dilemma of Wikipedia, and a ban on genetic discrimination; each text was 240 to 270 words in length. As described in the test guidelines, the texts were selected from mass media easily available to the general public, such as books, magazines, newspapers, and the Internet, and the subjects of the texts covered, but were not limited to, business, finance, education, culture, popular science, health care, and information technology, all aimed at a non-specialized readership. The test scores of the Year 2007, 2008, and 2009 Examinations were normally distributed, and the results (see Table 6) were compared by the research team for the development of the Examination in the report of Pan, Lai, and Lin (2010). The report suggested that text 2009A was the most difficult text for translation (with the lowest average score among the six), potentially as a result of its use of figurative expressions, while text 2009B was the least difficult (with the highest average score among the six). One key finding from the research on the 2007-2009 Examination results (2010) was that the difficulty of English-to-Chinese translation was not aligned with the difficulty of English reading but was influenced mainly by the complexity of the English sentence structure.
14 The six texts were referred to as text 2007A, 2007B, 2008A, 2008B, 2009A, and 2009B in sequence, and were renamed in this research as Text I001_Oil, Text I002_Biotechnology, Text I003_Designers, Text I004_Computers, Text I005_Wikipedia, and Text I006_Anti-genetic.
Table 6. Test Scores on the MOE 2007, 2008, and 2009 Translation Competency Examinations, adapted from Pan et al., 2010, p. 19.
These six texts were chosen for this study mainly for the following reasons: (a) the validity of the test items had been endorsed by the MOE; (b) the test items could serve as appropriate teaching materials, as they were all authentic texts accompanied by validated purposes of translation; (c) adults over 18 years of age are eligible to sit the MOE translation proficiency examination15, and all the participants in the study were eligible testees16.
Instruments
For the statistical tests of error frequencies, SPSS 17 (Statistical Package for the Social Sciences), a widely used program for statistical analysis in the social sciences, was employed in this study.
15 Although the organizer suggests that the testees be equipped with Effective Operational Proficiency in English, i.e. the C1 level of proficiency as described in the Common European
For the collection of retrospective data, each interview was recorded with a 32MB digital recorder that saved the audio recordings in MP3 (MPEG-1 Audio Layer 3) format. Transcriptions were made with Express Scribe17, an audio playback program for both PC and Mac that offers features for typists including variable-speed playback (with constant pitch), multi-channel control, video playback, file management, and more.
For the compilation of an annotated learner corpus, the Chinese texts needed to be tokenized before further concordancing or annotation. The segmenter used in this study was the freely available "Chinese Word Segmentation System with Unknown Word Extraction and POS Tagging18," developed by the Chinese Knowledge and Information Processing (CKIP) Group of Academia Sinica, Taiwan. The system claimed an accuracy rate of 99% in tokenization for non-specialized texts, which made it particularly applicable to this study, because the corpus in use comprised non-specialized texts addressed to the general public. The concordancing needed for this study was performed with the freeware concordance program AntConc19, developed by Laurence Anthony; the program is available for Windows, Macintosh OS X, and Linux. For the corpus annotation, the versatile annotation tool Multi-Modal Annotation in XML (MMAX) 220 was used in this study, primarily for its multi-level annotation, which is more flexible than existing
17 The researcher has used the free trial version of Express Scribe, which is downloadable for use without expiration at http://www.nch.com.au/scribe/.
18 The service was available at http://ckipsvr.iis.sinica.edu.tw/. The segmenter relied on a built-in Chinese dictionary containing 100,000 entries and claimed 99% accuracy for general texts, without considering neologisms. Non-listed strings were segmented into individual characters, which were treated by concordancers as separate words.
19 AntConc was available at http://www.antlab.sci.waseda.ac.jp/software.html.
20 Downloadable at http://sourceforge.net/projects/mmax2/files/.
single-level annotation tools21, and furthermore for its stand-off XML data format as well as its advanced and customizable methods for information and relation visualization. The installation of MMAX2 should be preceded by that of the programming language and computing platform Java22.
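As footnote 18 notes, strings absent from the segmenter's dictionary are split into individual characters. This fallback can be illustrated with a toy forward-maximum-matching segmenter (a simplified sketch only; the mini-dictionary is hypothetical and this is not CKIP's actual algorithm):

```python
# Toy forward-maximum-matching segmenter illustrating the fallback
# behaviour: substrings not found in the dictionary fall apart into
# single characters, which a concordancer then treats as separate words.
# (Illustrative only; not the CKIP algorithm or dictionary.)

DICTIONARY = {"翻譯", "學習", "語料庫"}  # hypothetical mini-dictionary
MAX_WORD_LEN = 4

def segment(text):
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest dictionary match starting at position i.
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in DICTIONARY:
                # length == 1 is the unknown-character fallback.
                tokens.append(candidate)
                i += length
                break
    return tokens

print(segment("翻譯學習語料庫"))  # known words stay whole
print(segment("翻譯雲端"))        # "雲端" absent, so it splits into characters
```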
Data Collection Procedure
The collection of the student translations and the administration of the interviews were conducted by the researcher. The major stages of data collection for this study are described in the following three sections: the translation learner corpus and the error annotation, translation error analyses, and retrospective interviews (illustrated in Figure 1, Figure 2, and Figure 3, respectively).
The Translation Learner Corpus and the Error Annotation
The translation learner corpus was compiled from the electronic texts that participants sent to the researcher via email. All texts were consolidated and saved in plain text format23 and were subsequently segmented by the CKIP segmenter and saved as plain text to form the raw corpus. To ensure that the compilation of the raw corpus was successful, the corpus was opened in concordancing software and the software functions were tested.
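The sanity check on the raw corpus (opening it in a concordancer and exercising the basic functions) can be approximated in a few lines of code. The following is a minimal keyword-in-context sketch, assuming whitespace-delimited tokens such as those produced by the segmenter; the actual concordancing in this study was done with AntConc:

```python
def kwic(tokens, keyword, window=3):
    """Return keyword-in-context lines: `window` tokens of left and
    right context around every occurrence of `keyword`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{keyword}] {right}")
    return lines

# Hypothetical tokenized sample standing in for the raw corpus.
corpus = "the raw corpus was opened by the concordancer and the functions were tested".split()
for line in kwic(corpus, "the"):
    print(line)
```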
21 Such as the single-level annotation program Markin. A free version with limited use is available at http://www.cict.co.uk/markin/download.php.
22 Java Runtime Environment was downloadable at
http://www.oracle.com/technetwork/java/javase/downloads/index.html.
23 For later MMAX2-based annotation, the encoding system is the American Standard Code for Information Interchange (ASCII); for concordancing of the raw corpus, the encoding system is the
The raw corpus was saved as a new text file for annotation with three markable levels in MMAX2, and the raw corpus itself was kept as a backup in case of any data corruption in the subsequent steps. The annotation was performed by following the five steps of "the life cycle of an annotation," namely "the preparation of the machine-readable corpus, the definition and formalization of the annotation task, the manual annotation proper, the checking of the feasibility of the annotation, and the actual utilization of the completed annotation" (Müller & Strube, 2006, p. 199).
Figure 1. Data Collection Procedure for the Translation Learner Corpus and the Error Annotation
(1) The participants sent six translations in MS Word format to the researcher via email.
(2) The translated texts were segmented by the CKIP segmenter and saved as plain text to form the raw corpus. To ensure the compilation of the raw corpus was successful, the corpus was opened in concordancing software and the software functions were tested.
(3) The raw corpus was saved as a new text file, which was annotated with three markable levels in MMAX2 to become the annotated corpus.
(4) The annotation began with the preparation of the machine-readable corpus, the definition and formalization of the annotation task, the manual annotation proper, and the checking of the feasibility of the annotation, and ended with the actual utilization of the completed annotation.
Translation Error Analyses
Upon confirmation of participation in this study, each student received six English texts, both as print-outs and as Microsoft Word documents via email, and was asked to translate them into Chinese according to the clearly stated translation brief preceding each source text. Participants were allowed to use any tools and resources available to them, except that, to avoid possible interference, they were advised not to refer to translations of the same source texts that might be found on the Internet. The date for submitting the translations was decided by each participant at their convenience. Once the six translations were completed, they were emailed to the researcher with a note reporting the amount of time spent on each translation. All the electronic files of the translations were kept securely by the researcher, and a copy of each translation was printed out for error marking and for review with the participant in the interviews.
Figure 2. Data Collection Procedure for Translation Error Analyses
(1) Participants were recruited.
(2) The list of participants was finalized.
(3) The interviews for each participant were scheduled and the dates for submitting translations were arranged.
(4) Translations from participants were received in electronic form (Microsoft Word) via email.
(5) Each translation was printed out for later use in the interview, and the electronic text files were saved for the compilation of the translation learner corpus.
Retrospective Interviews
The date of the interview was scheduled by each participant at their convenience, ranging from one day to three weeks after their completion of the translations. All the interviews were conducted in the same well-lit meeting room, which was reserved beforehand for each interview to ensure minimal interference and noise.
When the participants arrived at the meeting room, they were first briefed on the procedures of the interview, signed the consent form (Appendix B), and filled out a questionnaire (Appendix C) on their backgrounds. The interview was divided into two parts. The first part posed general questions to the participants to gain an overall understanding of the translation learning strategies they generally used, against the backdrop of translating the six texts. The interview guides were designed according to the backgrounds of the Grad Group and the Under Group; therefore, the two interview guides, though covering the same topics (warm-ups, metacognitive activities, research tools/ability, and coping strategies for problems), differed slightly in some of the questions asked (see Appendix D for the interview guide for the Grad Group and Appendix E for that of the Under Group).
In the second half of the interview, the researcher reviewed each of the six translations with the participant on the parts highlighted with error marks and asked the participant whether s/he was satisfied with such renderings and why.
The marked parts were reviewed in the order of the types shown in the error typology table (see Table 7 and Table 8); i.e., errors marked as EB11 (mistranslation) were discussed first, then those marked as EB12 (unintelligibility), and so forth. In one case, the participant did not agree with the researcher on a few error markings; the researcher recorded these problematic segments and sought advice from another researcher after the interview. The error markings and their categories remained the same after discussion with the other researcher.
Figure 3. Data Collection Procedure for Retrospective Interviews
(1) The researcher briefed the participants on the procedures of the interview.
(2) Participants filled in the questionnaire on their backgrounds.
(3) The first half of the interview: the participant answered a number of open-ended questions from the interview guide.
(4) The second half of the interview: the researcher reviewed the parts marked as errors in each translation with the participant and probed for the reasons [analysis of error sources]; the researcher then offered feedback on the translations according to the results of the error analysis [error remediation].
(5) The researcher concluded the interview, and the participant shared any feedback on the error analysis and this study.
Data Analysis
This study collected two types of data: one was the 420 Chinese translations produced by 70 participants from the same six English source texts; the other was the transcriptions of the interviews. The 420 Chinese translations were first compiled into a translation learner corpus, which was then annotated with errors. The number of errors in the translations was calculated for statistical tests to examine the differences in error frequencies among groups. Meanwhile, the interview data were reviewed to tease out the reasons for each error type.
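The frequency counts fed into the statistical tests can be sketched as follows. This is a simplified illustration assuming error records stored as (group, error type) pairs; the group names and error labels here are hypothetical placeholders, and the actual tests were run in SPSS 17:

```python
from collections import Counter

# Hypothetical error records: one (group, error_type) pair per
# annotated error. Labels are placeholders for illustration.
errors = [
    ("Grad", "EB11"), ("Grad", "EB12"), ("Under", "EB11"),
    ("Under", "EB11"), ("Under", "EB12"), ("Grad", "EB11"),
]

# Frequency of each error type within each group.
freq = Counter(errors)
for (group, err_type), n in sorted(freq.items()):
    print(group, err_type, n)

# Per-group totals, useful for normalising counts before comparison.
totals = Counter(group for group, _ in errors)
print(totals)
```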
The Translation Learner Corpus and the Error Annotation
This section describes how the third step in the annotation life cycle, the manual annotation proper, and the fourth step, checking the feasibility of the annotation, were completed for this research.
Upon the completion of the translation learner corpus (the first step in the annotation life cycle), the researcher needed to set up the computing environment by (a) downloading and unzipping MMAX2 and (b) installing the Java Runtime Environment24, before proceeding to the second step of defining the annotation scheme (the results of its customization for this research are described in Chapter Four).
The creation of annotation and the checking of its feasibility are described in terms of the user interface and the folder structure, which can be overwhelmingly complex for researchers new to MMAX2.
User interface
When MMAX2 was started, three windows were launched by default: the Main Window, the Attribute Window, and the Markable Level Control Panel. The Main Window (see Figure 4) was the main editor and content viewer where the text and annotations were shown.
Figure 4. MMAX2 User Interface: the Main Window
The Attribute Window (see Figure 5) allowed the editing of each attribute of a markable.
Figure 5. MMAX2 User Interface: the Attribute Window
The Markable Level Control Panel (see Figure 6) was the panel in which the user could manipulate each markable level by choosing active (to edit), visible (to view), or invisible (to hide) from the drop-down list.
Figure 6. MMAX2 User Interface: the Markable Level Control Panel
The style of the content output could be chosen from Style Sheet (see Figure 7) in Settings in the Markable Level Control Panel.
Figure 7. MMAX2 User Interface: Style Sheet in the Markable Level Control Panel
The researcher could choose the display style if need be. The following style was the default display (see Figure 8) when the project was created.
Figure 8. MMAX2 User Interface: Default Display Style in the Main Window
To fulfill the purpose of this study, it was necessary to show the error typology label at the lower right corner of each markable (see Figure 9), and this style was used.
Figure 9. MMAX2 User Interface: Display Style Showing the Error Typology Label at the Lower Right Corner of Markables
When annotations were made and saved, the corpus was ready for research inquiries. The Query Console (see Figure 10) would appear by choosing Tools and then Query Console in the Main Window.
Figure 10. MMAX2 User Interface: Query Console in the Main Window
Folder structure
After the creation of a project, all the documents generated during the process would go to the project folder. For the effective management of the files, creating subfolders to accommodate different function files was essential, i.e., creating Basedata, Customization, Markable, Scheme, Style folders (as illustrated in Figure 11) and assigning them as the destination for related files.
Figure 11. MMAX2 Folder Structure
The Basedata folder accommodated the files generated by MMAX2 from the input file in the MMAX2 Project Wizard.
The Scheme folder included the files that were automatically generated upon project creation, corresponding to the Markable Levels section added in the project wizard. The level with the least granularity should be placed at the top; e.g., in this study, the sequence of the markable levels was error typology, translator background, and text information. The attributes for a markable level were added or edited by manually modifying the files in the Scheme folder.
The Style folder accommodated the files that determined the layout of each markable level; that is, the user could choose which attribute value was displayed with a markable.
The Markable folder contained the files storing the annotation data; normally there was no need to modify these files.
The Customization folder held the files that defined the look (e.g., color, font size) of the markables (annotations); each markable had its own customization file.
The folder structure of a project could be found in the Common_paths file.
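The subfolder layout described above can also be created programmatically, which keeps projects consistent. A sketch with Python's pathlib follows; the folder names are those listed in the text, while the project root is hypothetical:

```python
from pathlib import Path

# Subfolder names as described for the MMAX2 project layout.
SUBFOLDERS = ["Basedata", "Customization", "Markable", "Scheme", "Style"]

def create_project_layout(root):
    """Create the MMAX2 project subfolders under `root` and return
    the sorted names of the directories that now exist there."""
    root = Path(root)
    for name in SUBFOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())

print(create_project_layout("mmax2_project"))  # hypothetical project root
```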
After being familiarized with the working environment of MMAX2 as described above, the researcher took the following steps for annotating the translation learner corpus:
Step 1: Making sure that the Chinese texts to be annotated were segmented and saved in UTF-‐8 encoding (by a plain text editor, e.g. Microsoft Notepad).
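The encoding requirement in Step 1 can be verified mechanically rather than by inspecting each file in an editor. A minimal sketch (the sample strings are illustrative):

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# A UTF-8-encoded Chinese string passes; byte sequences that are not
# valid UTF-8 (e.g. a UTF-16 byte-order mark) do not.
text = "翻譯學習"
print(is_utf8(text.encode("utf-8")))  # True
print(is_utf8(b"\xff\xfe"))           # False
```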
Step 2: Starting MMAX2 and creating a project (Tools -> Project Wizard) (see Figure 12) for the UTF-8 plain texts.
Figure 12. Snapshot of MMAX2 Project Wizard
Step 3: Defining the attributes of each markable level by opening the xyz_scheme.xml document available in the project folder (where xyz is the name of a markable level defined by the creator of the project) in a plain text editor.
MMAX2 supported three types of attributes: (a) FREETEXT, which defined an attribute as a free string of text; (b) NOMINAL_LIST, which defined an attribute as a drop-down list from which the user could choose a value; and (c) NOMINAL_BUTTON, which defined an attribute as a set of buttons from which the desired item could be chosen.
The following is an example of the defined scheme for the Source Text Information markable level in this study, to which the researcher assigned five attributes: (1) the source text ID, (2) the source text type, (3) the direction of translation, (4) the name of the annotator, and (5) the year of annotation. The document Source_Txt_Info_scheme.xml was opened in a plain text editor, and the scripts were written according to the above five attributes. From line 4 in the scripts shown below, "Source_txt_ID" was set to "freetext", while from line 6, "Source_Txt_Type" was a "nominal_button" followed by three choices ("Informative", "Expressive", and "Operative") on lines 7 to 9.
---- begin of Source_Txt_Info_scheme.xml ----
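The contents of the original file are not reproduced in this excerpt. As a rough illustration only, the following sketch builds and parses a scheme-like fragment with one freetext attribute and one nominal_button attribute carrying three values; the element and attribute names are assumptions for illustration and may not match MMAX2's actual scheme syntax:

```python
import xml.etree.ElementTree as ET

# Hypothetical scheme fragment: the element and attribute names below
# are illustrative assumptions, not necessarily MMAX2's real scheme
# vocabulary; the attribute names and values follow the text above.
SCHEME_XML = """<?xml version="1.0" encoding="UTF-8"?>
<annotationscheme>
  <attribute name="Source_txt_ID" type="freetext"/>
  <attribute name="Source_Txt_Type" type="nominal_button">
    <value name="Informative"/>
    <value name="Expressive"/>
    <value name="Operative"/>
  </attribute>
</annotationscheme>
"""

root = ET.fromstring(SCHEME_XML)
for attr in root.findall("attribute"):
    values = [v.get("name") for v in attr.findall("value")]
    print(attr.get("name"), attr.get("type"), values or "")
```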
The finished display on the user interface is shown below in Figure 13.
Figure 13. Finished Display of Annotation Levels
Step 4: Loading the new project to start the annotation manipulation (modifying, adding, or deleting annotations) by choosing the target corpus in the Input File cell and UTF-8 for Encoding.
When the text appeared, the researcher marked a string of base data elements (see Figure 14) to create a markable. A markable could be discontinuous when it crossed a segment boundary and omitted some elements in the string (Müller, 2006, p. 74).
Figure 14. Marking Elements in the Base Data
When a string of text was chosen, it could be clicked again and a pop-‐up menu of the markable level (see Figure 15) would appear.
Figure 15. The Pop-‐up Menu of Markable Levels
By clicking the chosen string of text again, the researcher could edit its attributes in the Attribute Window. For a chosen string to carry more than one markable level, the same procedure was repeated: marking the string, choosing the markable level, and editing the attributes. After applying the changes, it was necessary to click Display -> Reapply style sheet after base data editing in the Main Window to make the selected style sheet take effect on the new annotation.
In case it was necessary to make any changes in the corpus, for example, to delete, encoding, the changes should be made in the Main Window through Settings ->