
Non-specialized Texts in the Year 2007, 2008, and 2009 Chinese and English Translation and Interpretation Competency Examinations, the only official competency examination on translation and interpretation in Taiwan, organized by the Bureau of International Cultural & Educational Relations under the Ministry of Education (MOE). The topics of these six texts14 were: international oil prices, biotechnology and human development, cloud computing, designers and artisans, the dilemma of Wikipedia, and a ban on genetic discrimination; each text was 240 to 270 words in length. As described in the test guideline, the texts were selected from mass media easily available to the general public, such as books, magazines, newspapers, and the Internet, and the subjects of the texts covered, but were not limited to, business, finance, education, culture, popular science, health care, and information technology, all aiming at a non-specialized readership. The test scores of the Year 2007, 2008, and 2009 Examinations were normally distributed, and the results (see Table 6) were compared by the research team for the development of the Examination in the report of Pan, Lai, and Lin (2010), which suggested that text 2009A was the most difficult text for translation (with the lowest average score among the six), potentially because of its use of figurative expressions, while text 2009B was the least difficult (with the highest average score among the six). One of the key findings from the research on the 2007-2009 Examination results (2010) was that the difficulty of English-to-Chinese translation was not aligned with the difficulty of English reading, but was influenced mainly by the complexity of the English sentence structure.


14 The six texts were referred to as text 2007A, 2007B, 2008A, 2008B, 2009A, and 2009B in sequence, and were renamed in this research as Text I001_Oil, Text I002_Biotechnology, Text I003_Designers, Text I004_Computers, Text I005_Wikipedia, and Text I006_Anti-genetic.

Table 6. Test Scores on the MOE 2007, 2008, and 2009 Translation Competency Examinations, adapted from Pan et al. (2010, p. 19).

These six texts were chosen in this study mainly for the following reasons: (a) the validity of the test items has been endorsed by the MOE; (b) the test items could serve as appropriate teaching materials, as they were all authentic texts accompanied by validated purposes of translation; (c) adults over 18 years of age are eligible to sit for the MOE translation proficiency examination15, and all the participants in the study were eligible testees16.

 

Instruments  

For the statistical tests of error frequencies, SPSS 17 (Statistical Package for the Social Sciences), a widely used program for statistical analysis in the social sciences, was employed in this study.


15 Although the organizer suggests that the testees be equipped with Effective Operational Proficiency in English, i.e. the C1 level of proficiency as described in the Common European Framework of Reference for Languages (CEFR).

For the collection of retrospective data, each interview was recorded with a 32MB digital recorder that saved the audio recordings in the mp3 (MPEG-1 Audio Layer 3) format. Transcriptions were made using Express Scribe17, audio playback software for both PC and Mac, which offers features for typists including variable-speed playback (at constant pitch), multi-channel control, video playback, file management, and more.

For the compilation of an annotated learner corpus, the Chinese texts needed to be tokenized before further concordancing or annotation. The segmenter used in this study was the freely available “Chinese Word Segmentation System with Unknown Word Extraction and POS Tagging18,” developed by the Chinese Knowledge and Information Processing (CKIP) Group of Academia Sinica, Taiwan. The system claimed an accuracy rate of 99% in tokenizing non-specialized texts, which was considered particularly applicable to this study because the corpus in use comprised texts that were non-specialized and addressed to the general public. The concordancing needed for this study was performed with the freeware concordance program AntConc19, developed by Laurence Anthony; the program is available for Windows, Macintosh OS X, and Linux. For the corpus annotation, the versatile annotation tool MMAX2 (Multi-Modal Annotation in XML)20 was used in this study, primarily for its support of multi-level annotation, which is more flexible than existing

17 The researcher has used the free trial version of Express Scribe, which is downloadable for use without expiration at http://www.nch.com.au/scribe/.

18 The service was available at http://ckipsvr.iis.sinica.edu.tw/. The segmenter relied on a built-in Chinese dictionary that contained 100,000 entries and claimed 99% accuracy for general texts, without considering neologisms. Non-listed strings were segmented into individual characters, which were treated by concordancers as separate words.

19 AntConc was available at http://www.antlab.sci.waseda.ac.jp/software.html.

20 Downloadable at http://sourceforge.net/projects/mmax2/files/.

single-level annotation tools21, and furthermore for its stand-off XML data format as well as its advanced and customizable methods for information and relation visualization. The installation of MMAX2 must be preceded by the installation of Java22, the programming language and computing platform on which it runs.

 

Data Collection Procedure

The collection of the student translations and the administration of the interviews were conducted by the researcher. The major stages of data collection for this study are described in the following three sections: the translation learner corpus and the error annotation, translation error analyses, and retrospective interviews (illustrated in Figure 1, Figure 2, and Figure 3, respectively).

 

The Translation Learner Corpus and the Error Annotation

The translation learner corpus was compiled from all the electronic texts the participants sent to the researcher via email. All texts were consolidated and saved in plain text format23, then segmented by the CKIP segmenter and saved as plain text to form the raw corpus. To ensure that the compilation of the raw corpus was successful, the corpus was opened in concordancing software and the software functions were tested.


21 Such as the single-level annotation program Markin. A free version with limited use is available at http://www.cict.co.uk/markin/download.php.

22 Java Runtime Environment was downloadable at http://www.oracle.com/technetwork/java/javase/downloads/index.html.

23 For later MMAX2-based annotation, the encoding system is the American Standard Code for Information Interchange (ASCII); for concordancing of the raw corpus, the encoding system is the

The raw corpus was saved as another new text file for annotation with three markable levels in MMAX2, and the raw corpus was then kept as a backup in case of any data corruption in the subsequent steps. The annotation was performed by following the five steps in “the life cycle of an annotation,” which included “the preparation of the machine-readable corpus, the definition and formalization of the annotation task, the manual annotation proper, the checking of the feasibility of the annotation, and the actual utilization of the completed annotation” (Müller & Strube, 2006, p. 199).

 

Figure 1. Data Collection Procedure for the Translation Learner Corpus and the Error Annotation

1. The participants sent six translations in MS Word documents to the researcher via email.
2. The translated texts were segmented by the Chinese segmenter CKIP and saved as plain text to form the raw corpus. To ensure that the compilation of the raw corpus was successful, the corpus was opened in concordancing software and the software functions were tested.
3. The raw corpus was saved as another new text file, and the new text file was annotated with three markable levels in MMAX2 to become the annotated corpus.
4. The annotation began with the preparation of the machine-readable corpus, the definition and formalization of the annotation task, the manual annotation proper, and the checking of the feasibility of the annotation, and ended with the actual utilization of the completed annotation.

Translation Error Analyses

Upon the confirmation of participation in this study, each student received six English texts, in both print-out and Microsoft Word formats (the latter via email), which s/he was asked to translate into Chinese according to the clearly stated translation brief preceding each source text. Participants were allowed to use any tools and resources available to them, but to avoid possible interference they were advised not to refer to translations of the same source texts that might be found on the Internet. The date for submitting the translations was decided by each participant at their convenience. Once the six translations were completed, they were emailed to the researcher with a note reporting the amount of time spent on each translation. All the electronic files of the translations were kept securely by the researcher, and a copy of each translation was printed out for error marking and for review with the participant in the interviews.

 

Figure 2. Data Collection Procedure for Translation Error Analyses

1. Participants were recruited.
2. The list of participants was finalized.
3. The interviews for each participant were scheduled and the dates for submitting translations were arranged.
4. Translations from participants were received in electronic form (Microsoft Word) via email.
5. Each translation was printed out for later use in the interview, and the electronic text files were saved for the compilation of the translation learner corpus.

Retrospective Interviews

The date of the interview was scheduled by each participant at their convenience, ranging from one day to three weeks after their completion of the translations. All the interviews were conducted in the same well-lit meeting room, which was reserved beforehand for each interview to ensure minimal interference and noise.

When the participants arrived at the meeting room, they were first briefed on the procedures of the interview, signed the consent form (Appendix B), and filled out a questionnaire (Appendix C) on their background. The interview was divided into two parts; the first part contained general questions for an overall understanding of the translation learning strategies the participants generally used, against the backdrop of translating these six texts. The interview guides were designed according to the backgrounds of the Grad Group and the Under Group; therefore, the two interview guides, though covering the same topics (warm-ups, metacognitive activities, research tools/ability, and coping strategies for problems), were slightly different in some of the questions asked (see Appendix D for the interview guide for the Grad Group and Appendix E for the Under Group).

In the second half of the interview, the researcher reviewed each of the six translations with the participant, focusing on the parts highlighted with error marks, and asked the participant whether s/he was satisfied with such renderings and why. The marked parts were reviewed in the order of the types shown in the error typology table (see Table 7 and Table 8), i.e. errors marked as EB11 (mistranslation) were discussed first, then those marked as EB12 (unintelligibility), and so forth. In one case, the participant did not agree with the researcher on a few error markings; the researcher recorded these problematic segments and sought advice from another researcher after the interview. The error markings and their categories remained the same after the discussion with the other researcher.

 

Figure 3. Data Collection Procedure for Retrospective Interviews

1. The researcher briefed the participants on the procedures of the interview.
2. Participants filled in the questionnaire on their backgrounds.
3. The first half of the interview: the participant answered a number of open-ended questions from the interview guide.
4. The second half of the interview: the researcher reviewed the parts marked as errors in each translation with the participant and probed for the reasons [analysis of error sources].
5. The researcher offered feedback on the translations according to the results of the error analysis [error remediation].
6. The researcher concluded the interview, and the participant shared their feedback on the error analysis and this study, if any.

Data Analysis

This study collected two types of data: one was the 420 Chinese translations produced by 70 participants from the same six English source texts; the other was the transcriptions of the interviews. The 420 Chinese translations were first compiled into a translation learner corpus, and the corpus was then annotated with errors. The number of errors in the translations was counted for statistical tests in order to examine the differences in error frequencies among groups. Meanwhile, the interview data were reviewed to tease out the reasons for each error type.

 

The Translation Learner Corpus and the Error Annotation

This section describes how the third step in the annotation life cycle (the manual annotation proper) and the fourth step (checking the feasibility of the annotation) were completed for this research.

Upon the completion of the translation learner corpus (the first step in the annotation life cycle), the researcher needed to set up the computing environment by (a) downloading and unzipping MMAX2 and (b) installing the Java Runtime Environment24, before proceeding to the second step of defining the annotation scheme (the results of its customization for this research are described in Chapter Four).

The creation of the annotation and the checking of its feasibility are described in terms of the user interface and the folder structure, which can be overwhelmingly complex for researchers new to MMAX2.

 

User interface

When MMAX2 was started, three windows would be launched by default: the Main Window, the Attribute Window, and the Markable Level Control Panel.

The Main Window (see Figure 4) was the main editor and content viewer where the text and annotations were shown.

Figure 4. MMAX2 User Interface: the Main Window

The Attribute Window (see Figure 5) allowed the editing of each attribute of a markable.

Figure 5. MMAX2 User Interface: the Attribute Window

The Markable Level Control Panel (see Figure 6) was the panel in which the user could manipulate each markable level by choosing active (to edit), visible (to view), or invisible (to hide) in the drop-down list.

Figure 6. MMAX2 User Interface: the Markable Level Control Panel

The style of the content output could be chosen from Style Sheet (see Figure 7) under Settings in the Markable Level Control Panel.

Figure 7. MMAX2 User Interface: Style Sheet in the Markable Level Control Panel

The researcher could choose the display style if need be. The following style was the default display (see Figure 8) when the project was created.

Figure 8. MMAX2 User Interface: Default Display Style in the Main Window

 

To fulfill the purpose of this study, showing the error typology label at the lower right corner of each markable (see Figure 9) was necessary, so this style was used.

Figure 9. MMAX2 User Interface: Display Style Showing the Error Typology Label at the Lower Right Corner of Markables

 

When annotations were made and saved, the corpus was ready for research inquiries. The Query Console (see Figure 10) would appear by choosing Tools and then Query Console in the Main Window.

Figure 10. MMAX2 User Interface: Query Console in the Main Window

Folder structure

After the creation of a project, all the documents generated during the process would go to the project folder. For effective file management, it was essential to create subfolders to accommodate the files of different functions, i.e., creating the Basedata, Customization, Markable, Scheme, and Style folders (as illustrated in Figure 11) and assigning them as the destinations for the related files.

Figure 11. MMAX2 Folder Structure

The Basedata folder accommodated the files generated by MMAX2 from the input file in the MMAX2 Project Wizard.
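As an illustration of what these generated files look like, the listing below is a minimal sketch assuming the stock MMAX2 base data (words.xml) format; the ids and the tokens (taken here from the cloud-computing topic) are illustrative only.

---- begin of an illustrative words.xml ----
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE words SYSTEM "words.dtd">
<words>
<!-- one element per token produced by the CKIP segmenter -->
<word id="word_1">雲端</word>
<word id="word_2">運算</word>
<word id="word_3">的</word>
</words>
---- end of an illustrative words.xml ----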

The Scheme folder included the files that were automatically generated upon project creation, corresponding to the Markable Levels section added in the project wizard. The level with the least granularity should be placed at the top; e.g., in this study, the sequence of the markable levels should be error typology, translator background, and text information. By manually modifying the files in the Scheme folder, the attributes for a markable level could be added or edited.

The Style folder accommodated the files that decided the layout of each markable level; that is, the user could choose, for instance, to display an attribute value behind a markable.

The Markable folder contained the files storing the annotation data; normally there was no need to modify these files.
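A minimal sketch of one such file is given below, assuming the standard MMAX2 stand-off markable format; the namespace, the span, and the error_type value are illustrative only. Each markable points back to the base data by word id rather than embedding the text itself, which is what the stand-off XML data format mentioned earlier amounts to.

---- begin of an illustrative error_typology_markables.xml ----
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE markables SYSTEM "markables.dtd">
<markables xmlns="www.eml.org/NameSpaces/error_typology">
<!-- a stand-off annotation: the markable references base data words by id -->
<markable id="markable_1" span="word_1..word_3" error_type="EB11"/>
</markables>
---- end of an illustrative error_typology_markables.xml ----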

The Customization folder held the files that defined the look (e.g. color, font size) of the markables (annotations). Each markable level had its own customization file.

The folder structure of a project could be found in the Common_paths file.
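A minimal sketch of a Common_paths file matching the folder structure above is given below, assuming the stock MMAX2 project format; the level names follow the three markable levels of this study (ordered as described for the Scheme folder), while the exact file names are illustrative only.

---- begin of an illustrative common_paths.xml ----
<?xml version="1.0" encoding="UTF-8"?>
<common_paths>
<!-- each subfolder created above is registered as a path -->
<basedata_path>Basedata/</basedata_path>
<scheme_path>Scheme/</scheme_path>
<style_path>Style/</style_path>
<customization_path>Customization/</customization_path>
<markable_path>Markable/</markable_path>
<!-- one entry per markable level, least granular level first -->
<annotations>
<level name="error_typology" schemefile="error_typology_scheme.xml" customization_file="error_typology_customization.xml">error_typology_markables.xml</level>
<level name="translator_background" schemefile="translator_background_scheme.xml" customization_file="translator_background_customization.xml">translator_background_markables.xml</level>
<level name="text_information" schemefile="text_information_scheme.xml" customization_file="text_information_customization.xml">text_information_markables.xml</level>
</annotations>
</common_paths>
---- end of an illustrative common_paths.xml ----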

After becoming familiar with the working environment of MMAX2 as described above, the researcher took the following steps to annotate the translation learner corpus:

 

Step 1: Making sure that the Chinese texts to be annotated were segmented and saved in UTF-8 encoding (using a plain text editor, e.g. Microsoft Notepad).

Step 2: Starting MMAX2 and creating a project (Tools -> Project Wizard) (see Figure 12) for the UTF-8 plain texts.

 

Figure 12. Snapshot of MMAX2 Project Wizard

 

Step 3: Defining the attributes of each markable level by opening the xyz_scheme.xml document available in the project folder (xyz is the name of a markable level defined by the creator of the project) in a plain text editor.

MMAX2 supported three types of attributes: (a) FREETEXT, which defined an attribute as a free string of text; (b) NOMINAL_LIST, which defined an attribute as a drop-down list from which the user could choose a value; and (c) NOMINAL_BUTTON, which defined an attribute as a set of buttons from which the user could choose the desired item.

The following is an example of the defined scheme of the Source Text Information markable level for this study, to which the researcher assigned five attributes: (1) the source text ID, (2) the source text type, (3) the direction of translation, (4) the name of the annotator, and (5) the year of annotation. The document Source_Txt_Info_scheme.xml was opened in a plain text editor, and the scripts were written according to the above five attributes. In line 4 of the scripts shown below, "Source_txt_ID" was set to "freetext", while in line 6 "Source_Txt_Type" was set to "nominal_button", followed by three choices ("Informative", "Expressive", and "Operative") in lines 7 to 9.

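The listing below is a minimal sketch of this scheme file, assuming the stock MMAX2 scheme syntax; the id values, the XML comments, and the rendering of attributes (3) to (5) (here as Translation_Direction, Annotator, and Annotation_Year, with illustrative values) are assumptions, while lines 4 and 6 to 9 follow the description above.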
---- begin of Source_Txt_Info_scheme.xml ----
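<?xml version="1.0" encoding="UTF-8"?>
<annotationscheme>
<!-- (1) the source text ID, defined as a free string of text -->
<attribute id="att_1" name="Source_txt_ID" type="freetext"/>
<!-- (2) the source text type, defined as buttons with three choices -->
<attribute id="att_2" name="Source_Txt_Type" type="nominal_button">
<value id="val_1" name="Informative"/>
<value id="val_2" name="Expressive"/>
<value id="val_3" name="Operative"/>
</attribute>
<!-- (3)-(5) direction of translation, annotator, and year of annotation; names and values illustrative -->
<attribute id="att_3" name="Translation_Direction" type="nominal_button">
<value id="val_4" name="English_to_Chinese"/>
<value id="val_5" name="Chinese_to_English"/>
</attribute>
<attribute id="att_4" name="Annotator" type="freetext"/>
<attribute id="att_5" name="Annotation_Year" type="freetext"/>
</annotationscheme>
---- end of Source_Txt_Info_scheme.xml ----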

The finished display on the user interface was as shown in Figure 13.

Figure 13. Finished Display of Annotation Levels

 

Step 4: Loading the new project to start the annotation manipulation (to modify, add, or delete annotations) by choosing the target corpus in the Input File cell, and UTF-8 for Encoding.

When the text appeared, the researcher marked a string of base data elements (see Figure 14) to create a markable. A markable could be discontinuous when it crossed a segment boundary and omitted some elements in the string (Müller, 2006, p. 74).

 

Figure 14. Marking Elements in the Base Data
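To give a concrete picture of how such a discontinuous markable would be stored, the fragment below is an illustrative sketch assuming the standard MMAX2 span notation, in which comma-separated fragments skip the omitted elements; the ids and the error_type value are hypothetical.

---- begin of an illustrative discontinuous markable ----
<!-- two fragments: word_4 to word_6 and word_9; word_7 and word_8 are omitted -->
<markable id="markable_2" span="word_4..word_6,word_9" error_type="EB12"/>
---- end of an illustrative discontinuous markable ----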

 

When a string of text was chosen, it could be clicked again and a pop-up menu of the markable levels (see Figure 15) would appear.

 

Figure 15. The Pop-up Menu of Markable Levels

 

By clicking the chosen string of text again, the researcher could edit its attributes in the Attribute Window. For a chosen string to carry more than one level of markable, the same procedure was repeated: marking the string, choosing the markable level, and editing the attributes. After applying the changes, it was necessary to click Display -> Reapply style sheet after base data editing in the Main Window to make the selected style sheet take effect on the new annotation.

In case it was necessary to make any changes in the corpus, for example, to delete elements or to change the encoding, the changes should be made in the Main Window through the Settings menu.
