Cognitive Processes during Pauses in Interpreting Output: Evidence from Eye Movements in Sight Translation

Chapter 1 Introduction

1.2 Research questions

This research proposes to incorporate eye movement data into the analysis of pause occurrence in sight translation output. Because the eye tracker records data moment by moment, it can indicate the interpreter’s cognitive state at the very moment a pause begins.

The purpose of this research is to establish an empirical basis for explaining pause occurrence in interpretation. The primary research questions are as follows:

(1) To examine the interpreter’s eye movements at pause onset in sight translation. The eye movement parameters examined include the reading pass in which the eye is located, fixation location and saccade direction.

(2) To infer, from the eye movement data, the cognitive processes underlying pauses in sight translation.
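Research question (1) names three eye movement parameters. As a rough illustration of how they might be read off an eye-tracking record at the moment a pause begins, the sketch below assumes that fixations have already been mapped to word positions in the source text. The `Fixation` and `PauseOnsetGaze` types and the `gaze_at_pause_onset` function are hypothetical, and the rules are simplifying assumptions: a fixation counts as first pass when no word further to the right has been fixated before it, and saccade direction is judged only by comparing word positions with the previous fixation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Fixation:
    start_ms: int    # fixation onset, relative to the start of the trial
    end_ms: int      # fixation offset
    word_index: int  # position of the fixated word in the source text (0-based)


@dataclass(frozen=True)
class PauseOnsetGaze:
    word_index: int    # fixation location at pause onset
    reading_pass: str  # "first pass" or "rereading" (simplified rule, see below)
    saccade: str       # direction of the saccade that led into this fixation


def gaze_at_pause_onset(
    fixations: Sequence[Fixation], pause_onset_ms: int
) -> Optional[PauseOnsetGaze]:
    """Describe the fixation in progress when a pause begins.

    Returns None if the pause began during a saccade or outside the record.
    """
    for i, fix in enumerate(fixations):
        if fix.start_ms <= pause_onset_ms < fix.end_ms:
            earlier = fixations[:i]
            # Assumed rule: first pass if the eye has not yet been further right.
            furthest = max((f.word_index for f in earlier), default=-1)
            reading_pass = "first pass" if fix.word_index >= furthest else "rereading"
            if not earlier:
                saccade = "none"
            elif fix.word_index > earlier[-1].word_index:
                saccade = "progressive"
            elif fix.word_index < earlier[-1].word_index:
                saccade = "regressive"
            else:
                saccade = "refixation"
            return PauseOnsetGaze(fix.word_index, reading_pass, saccade)
    return None
```

For example, if the eye moves from word 3 back to word 2 and a pause starts during that fixation, the sketch reports word 2, "rereading" and "regressive". Real studies may define reading passes over larger regions (phrases or sentences) rather than single words.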

The remainder of this thesis is organized into four chapters. Chapter 2 reviews previous studies on sight translation, pauses in spontaneous speech and interpretation, and eye tracking. Chapter 3 describes the oral pause data collection process and presents preliminary pause analyses. Chapter 4 describes the eye movement data collection process and incorporates the eye movement data into the final pause analyses. Finally, Chapter 5 discusses exceptions in the eye movement data, considers how the findings can be applied to interpretation studies and training, and offers suggestions for future research.

Chapter 2 Literature Review

2.1 Interpretation and sight translation

Interpretation can be performed in several modes, the most common being consecutive interpretation (CI), simultaneous interpretation (SI) and sight translation (ST).

In both CI and SI the input is oral: the interpreter orally renders the source text spoken by the speaker into the target language. In CI, the speaker and the interpreter usually appear side by side before the audience and take turns speaking. The interpreter takes notes while the speaker talks for a certain period, usually a few minutes, and begins interpreting once the speaker pauses, sometimes with the help of these notes. In SI, the interpreter works in a booth, listening to the speaker’s source text through headphones and delivering the interpretation to the audience through a microphone. As the speaker talks continuously, the interpreter must listen to and comprehend the source text, reformulate the message and produce the interpretation in the target language at the same time, throughout the entire speech.

Sight translation differs from CI and SI mainly in that the source input is written text rather than oral speech. In a typical ST task, the interpreter is given a written text and is either allowed to skim through the entire text first or asked to begin interpreting orally right away. In professional settings, most interpretation is delivered in CI or SI mode. ST is most commonly used alongside SI at scientific and technical meetings, in the form of SI with text (Lambert, 2004). Since speakers at these conferences tend to read directly from their published reports or prepared speeches, the interpreter receives the written text of the speech beforehand. When performing SI, the interpreter can then rely not only on the speaker’s oral input but also on the text at hand to make up for missed information. SI with text differs slightly from a typical ST task, since the interpreter cannot simply interpret from the written text but must monitor the speaker’s output constantly, in case the interpreter gets ahead of the speaker or the speaker deviates from the prepared text.

2.1.1 Characteristics and difficulties of sight translation

ST is commonly believed to be easier and less demanding than both translation, with which it shares visual input, and interpretation, with which it shares oral output (Sampaio, 2007). However, interpretation scholars have argued that ST is not easier but simply different from CI and SI, requiring different interpreting strategies and cognitive effort. The complexity and techniques of ST should not be regarded as less demanding than those of other modes of interpretation (Agrifoglio, 2004; Shreve, Lacruz, & Angelone, 2010).

The most distinctive feature of ST is its visual input. As the interpreter translates from a written text, the source text remains available throughout the whole task. This luxury is absent in SI and CI, in which the oral input is transient and presented only once, so it cannot be referred to or reviewed during the production phase. The SI interpreter must rely on memory and the CI interpreter on notes to produce the interpretation in the target language, whereas an interpreter performing ST can move forward and backward in the source text as he wishes to find the information he needs for production.

ST production is also not paced by a speaker, as there is none. By contrast, an interpreter performing SI cannot set the pace of his interpretation, since the output must keep up with the speaker so that the audience can follow and so that the interpreter does not have to hold too much information in working memory, which would increase the difficulty of the task. Even the CI interpreter, who can pace his own production freely, is constrained by the speaker: he is usually required to begin interpreting as soon as the speaker stops, so the time available for comprehension and reformulation is set by the speaker, not the interpreter (Agrifoglio, 2004; Shreve et al., 2010).

However, the ST interpreter also faces certain difficulties unique to ST. First, because written text and oral speech differ in linguistic and syntactic structure, the ST interpreter must comprehend an input text that is more formal and contains complex structures such as compound clauses. Comprehending the source text in ST therefore costs the interpreter more effort than in CI and SI (Agrifoglio, 2004; Shreve et al., 2010). Moreover, the ST interpreter is expected to render the written text into smooth output that sounds like genuine oral speech (Weber, 1990). The interpreter must therefore endeavor to use simpler words and transform the written text into sentences better suited to oral speech, all of which adds pressure to the interpreting process (楊承淑, 2000).

In fact, the constant presence of the source text in ST may hinder the production of natural, fluent interpretation. When listening, people cannot review the transient speech, so they tend to grasp the gist and general meaning of the content; when reading, the words remain in black and white on the page, so readers tend instead to focus on remembering the actual words used in the text. Recalling every single word rather than the meaning of the text may be detrimental to the ST interpreter, as he must allocate extra effort to achieve independence from the source text. The constant presence of the source text may thus become a significant source of interference in the interpreting process, affecting the grammatical and syntactic structure of the target production (Agrifoglio, 2004; Shreve et al., 2010). In her experiment comparing ST, SI and CI output, Agrifoglio (2004) found that ST had significantly fewer meaning errors but many more expression problems than the other modes of interpretation.

Sight translation, like all forms of interpretation, is a complex and highly cognitively demanding task. In his Effort Model, Gile (1995) divided the act of interpretation into several “efforts” needed to produce adequate interpretation. As the total amount of effort available is fixed, the interpreter must constantly allocate it among the tasks at hand to maintain fluent and correct interpretation.

Gile described the efforts of ST as:

ST = R (reading and analysis) + P (production)

The “reading effort” replaces the “listening effort” of the SI and CI models. The “memory effort” is excluded, as Gile believed that the constant presence of the source text makes little demand on memory in ST, and the “coordination effort” present in SI and CI is also left out.

Agrifoglio (2004) argued that both the memory and coordination efforts may still be present in ST. To ensure smooth delivery, the ST interpreter must start reformulating and producing while still reading. Therefore, even though the source text can always be reviewed during production, the interpreter still needs to hold some of the information already read in memory until it can be produced in the target output. And since the “reading and analysis” and “production” efforts overlap, a coordination effort is also required to manage them smoothly. Agrifoglio therefore proposed a modification to Gile’s ST effort model:

ST = R + P + m (slight use of memory) + C (coordination)
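To make the difference between the two formulas concrete, the sketch below treats each model as a list of effort components drawing on a fixed total capacity. The numeric values are invented purely for illustration (neither Gile nor Agrifoglio assigns numbers to efforts), and the saturation check, flagging trouble when the summed requirements exceed the fixed total, is our simplified reading of the fixed-capacity assumption rather than a formula taken from either author. All names are hypothetical.

```python
from typing import Mapping, Tuple

# Effort components of each ST model, as listed in the text.
GILE_ST: Tuple[str, ...] = ("R", "P")                  # reading and analysis + production
AGRIFOGLIO_ST: Tuple[str, ...] = ("R", "P", "m", "C")  # adds slight memory use and coordination


def total_requirement(model: Tuple[str, ...], requirements: Mapping[str, float]) -> float:
    """Sum the momentary requirements of the efforts included in a model."""
    return sum(requirements[effort] for effort in model)


def is_saturated(
    model: Tuple[str, ...], requirements: Mapping[str, float], available: float
) -> bool:
    """True when the summed requirements exceed the fixed capacity (simplified assumption)."""
    return total_requirement(model, requirements) > available


# Arbitrary illustrative values in hypothetical units, not measured data.
requirements = {"R": 4.0, "P": 4.0, "m": 1.0, "C": 1.5}
available = 10.0
```

With these invented values, Gile’s two-effort model totals 8.0 and stays within capacity, while Agrifoglio’s version totals 10.5 and exceeds it. The point is only qualitative: once memory and coordination are counted, the same reading and production load leaves less spare capacity.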

As ST is rarely performed on its own in the professional interpreting market, it nowadays serves a mainly pedagogical purpose: it is taught at the early stages of interpreter training to prepare trainees for CI and SI (Weber, 1990). Training in ST can equip the interpreter with faster reading of written text, rapid conversion of information from the source language into the target language, the ability to avoid word-for-word translation, and public speaking techniques. Mastering these skills can greatly improve the interpreter’s performance in all modes of interpretation, as he will be able to familiarize himself quickly with conference materials and process the source text faster.

In short, sight translation is a challenging task that demands no less cognitive effort than other interpretation modes. A close examination of the ST process can reveal important clues about the complex process of interpretation. Moreover, because ST uses written text as input, eye trackers can be employed in ST studies to collect eye movement data. These data can be analyzed alongside the oral ST output for triangulation, bringing researchers one step closer to uncovering what goes on inside interpreters’ minds as they translate.

2.1.2 The interpretation process

The act of interpretation consists of a series of complex processes that can be roughly divided into comprehension, reformulation and production phases (Macizo & Bajo, 2004; Seleskovitch, 1976). The precise relationship among these phases and how they are coordinated have long been of interest to researchers.

Macizo and Bajo (2004) investigated the relationship between the comprehension and reformulation phases, and suggested two possible perspectives: vertical and horizontal.

The vertical perspective is based on Seleskovitch’s (1976) “deverbalization theory,” which proposes that comprehension and reformulation are performed sequentially during interpretation. The interpreter first processes the incoming source text, absorbing its meaning while discarding the linguistic form of the source language. After fully comprehending the source text, the interpreter then reconstructs the information into an output that conforms to the grammar of the target language. As access to the target language begins only after comprehension of the source language is complete, there is no link between the source and target languages at the lexical and syntactic levels. If ST follows the vertical perspective, reading for translation should be very similar to normal reading, as translation is not yet involved at this stage.

The horizontal perspective instead suggests that comprehension and reformulation overlap. While the interpreter is still listening to or reading the source text, partial code-switching is already taking place alongside comprehension. Under these circumstances, the source and target languages can be matched at the lexical and syntactic levels, since both languages are accessed at the same time. If ST follows the horizontal perspective, reading for translation would impose more pressure on the interpreter than normal reading because of the added effort of code-switching.

Although Macizo and Bajo’s experiment supported the horizontal perspective, problems with its design meant that the study could not be taken as a natural, moment-to-moment observation of the interpreting process. Huang (2011) examined the vertical and horizontal perspectives in an eye-tracking ST experiment, and her findings supported the vertical perspective for ST. First-pass indices such as first fixation duration, gaze duration and fixation probability yielded similar results for silent reading and ST, suggesting that no extra effort is needed for ST during first-pass reading, as the interpreter is only comprehending the source text at that stage. Indices of activity after the first pass, such as rereading time and total viewing time, were significantly higher for ST than for silent reading, showing that more effort is needed for reformulation and production in ST after the initial comprehension phase.
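The first-pass and later indices above follow conventions common in reading research. The sketch below computes them for one target word under widely used definitions, which are assumptions here and may not match Huang’s (2011) exact operationalization: first-pass measures count only if the eye lands on the word before fixating any word to its right; gaze duration sums the first-pass fixations before the eye leaves the word; rereading time is taken as total viewing time minus gaze duration. Fixation probability is represented by the flag `fixated_in_first_pass`; averaging it over words or trials gives a probability. The names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class WordMeasures:
    fixated_in_first_pass: bool
    first_fixation_duration: int  # ms; 0 if the word was skipped in first pass
    gaze_duration: int            # ms; 0 if the word was skipped in first pass
    total_viewing_time: int       # ms; all fixations on the word
    rereading_time: int           # ms; total viewing time minus gaze duration (assumed)


def word_measures(fixations: Sequence[Tuple[int, int]], word: int) -> WordMeasures:
    """Compute eye movement indices for one word.

    fixations: chronological (word_index, duration_ms) pairs.
    """
    total = sum(d for w, d in fixations if w == word)
    first_fix = gaze = 0
    fixated = False
    for i, (w, d) in enumerate(fixations):
        if w > word:
            break  # a later word was fixated first: the target was skipped in first pass
        if w == word:
            fixated, first_fix = True, d
            # Gaze duration: this fixation plus immediate refixations on the same word.
            for w2, d2 in fixations[i:]:
                if w2 != word:
                    break
                gaze += d2
            break
    return WordMeasures(fixated, first_fix, gaze, total, total - gaze)
```

In a sequence where word 1 receives fixations of 180 ms and 150 ms, the eye moves on to word 2, and then returns to word 1 for 300 ms, the sketch gives a first fixation duration of 180 ms, a gaze duration of 330 ms, a total viewing time of 630 ms and a rereading time of 300 ms.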

Although the vertical and horizontal perspectives are important aspects of interpretation studies and may bear on the current study, they will not be discussed in the subsequent pause analysis, owing to conflicting results across interpreting studies and the lack of clear definitions.

The overlap of the comprehension and production phases can be observed not only in SI, where the interpreter must interpret while listening to the speaker’s non-stop source text, but also in ST. Scholars have noted that, to deliver smooth output that sounds like genuine communication and to avoid hesitation and uncomfortable pauses in ST, the interpreter must keep comprehending the upcoming source text while still producing the target text (Sampaio, 2007; Weber, 1990). In other words, the interpreter must constantly read ahead, keeping his eyes ahead of what he is saying.

Huang’s (2011) experiment also provided objective evidence of read-ahead in ST. The interpreter was deemed to be “reading ahead” when his eyes fixated on Sentence N+1 for the first time while he was still orally producing content from Sentence N. The overall read-ahead probability in ST was 72.80%, indicating that comprehension in ST is a continuous effort that overlaps with production.
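A minimal sketch of how a read-ahead probability of this kind could be computed, assuming that the oral rendition of each sentence has been time-stamped and that the time of the first fixation on each sentence is known. The function name and input layout are hypothetical, and reading “while still producing Sentence N” as “inside the production interval of Sentence N” is our assumption; Huang’s exact procedure may differ, for instance in how fixations made before production of Sentence N begins are treated.

```python
from typing import Optional, Sequence, Tuple


def read_ahead_probability(
    production: Sequence[Tuple[int, int]],
    first_fixation: Sequence[Optional[int]],
) -> float:
    """Share of sentence pairs (N, N+1) in which N+1 is first fixated during production of N.

    production[n]: (start_ms, end_ms) of the oral rendition of sentence n.
    first_fixation[n]: time of the first fixation on sentence n, or None if never fixated.
    """
    eligible = hits = 0
    for n in range(len(production) - 1):
        t = first_fixation[n + 1]
        if t is None:
            continue  # sentence N+1 was never fixated: no read-ahead decision possible
        eligible += 1
        start, end = production[n]
        if start <= t < end:  # assumed reading of "while still producing Sentence N"
            hits += 1
    return hits / eligible if eligible else 0.0
```

For three sentences produced at 1000 to 4000 ms, 4500 to 8000 ms and 8200 to 11000 ms, with first fixations on Sentences 2 and 3 at 3200 ms and 9000 ms, only the first pair counts as read-ahead, giving a probability of 0.5.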

2.2 Pauses

Physically and linguistically speaking, all pauses in spoken language can be classified as either intra-segmental or inter-lexical pauses.

An intra-segmental pause is a brief silence within a word, usually caused by natural occlusion of the vocal tract during speech. An inter-lexical pause appears between two words and can occur for various reasons. Following psycholinguistic classification, inter-lexical pauses can be further divided into silent pauses and filled pauses. A silent pause is a perceived silent segment in speech, whereas a filled pause is a voiced segment, which can take the form of “drawls, repetitions of utterances, words, syllables, sounds and false starts” (Zellner, 1994). For example, “um” and “uh” are commonly heard filled pauses in spoken English. It has been suggested that silent and filled pauses serve different functions in speech. Fox Tree (2002) inferred from her experiment that a filled pause signals that the speaker knows in advance of an oncoming silent period caused by production difficulties. A filled pause would therefore indicate a more severe production problem than a silent pause.

Research on pause function can be approached from the psycholinguistic side or from the rhetoric and public speaking side. The psycholinguistic approach treats all forms of disfluency, including pauses, as cognitive items whose occurrence in speech indicates production problems. Investigating disfluencies would thus provide researchers with clues about the speech production process (Shriberg, 1999). The rhetoric and public speaking approach instead focuses on the communicative functions of pauses. Pauses are vital in speech because the speaker can use them skillfully to emphasize new or important information and to organize the structure of discourse (Cecot, 2001). Speech hesitation can also play a large part in social perception, affecting how listeners judge the speaker’s competence, social attractiveness and trustworthiness (Greene, 1984).

Cecot (2001) combined previous categorizations of pauses and proposed a comprehensive categorization covering both the psycholinguistic and rhetorical approaches. All disruptions in the flow of speech, referred to as “non-fluencies,” were divided into silent pauses and disfluencies.

Disfluencies included filled pauses and utterance interruptions such as repetition and restructuring. Silent pauses were further divided into communicative and non-communicative pauses. Communicative pauses correspond to the rhetorical approach and include segmentation pauses, which help the listener understand the syntactic structure of the discourse, and rhetorical pauses, which emphasize the word they precede. Non-communicative pauses essentially refer to hesitation pauses, the focus of psycholinguistic research. Given the scope of this research, only silent pauses in the output will be selected and analyzed.
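Cecot’s categorization forms a tree. The sketch below encodes it as nested dictionaries, reconstructed only from the prose above (the original table may contain further detail), and shows how the subset selected for this study, silent pauses, can be extracted. `NON_FLUENCIES`, `path_to` and `leaves` are illustrative names.

```python
from typing import Dict, List, Optional

# Each key is a category; an empty dict marks a most specific category.
NON_FLUENCIES: Dict[str, dict] = {
    "silent pauses": {
        "communicative pauses": {
            "segmentation pauses": {},  # help the listener parse discourse structure
            "rhetorical pauses": {},    # emphasize the word they precede
        },
        "non-communicative pauses": {
            "hesitation pauses": {},    # focus of psycholinguistic research
        },
    },
    "disfluencies": {
        "filled pauses": {},
        "utterance interruptions": {
            "repetition": {},
            "restructuring": {},
        },
    },
}


def path_to(label: str, tree: Dict[str, dict] = NON_FLUENCIES) -> Optional[List[str]]:
    """Return the chain of categories leading to `label`, or None if absent."""
    for key, subtree in tree.items():
        if key == label:
            return [key]
        sub = path_to(label, subtree)
        if sub is not None:
            return [key] + sub
    return None


def leaves(tree: Dict[str, dict]) -> List[str]:
    """List the most specific categories under a (sub)tree, in order."""
    out: List[str] = []
    for key, subtree in tree.items():
        out.extend(leaves(subtree) if subtree else [key])
    return out
```

Here `leaves(NON_FLUENCIES["silent pauses"])` yields the three silent pause types within the scope of this research, and `path_to("hesitation pauses")` places hesitation pauses under non-communicative silent pauses.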

Table 1-1. Categorization of pauses

2.2.1 Juncture pause and hesitation pause

Lounsbury (1954) was the first to suggest that spontaneous speech contains two different kinds of silent pause, which he named juncture pauses and hesitation pauses.

He defined juncture pauses as pauses that appear at the boundaries of major syntactic units, or syntactic junctures. These pauses can be fleetingly brief (shorter than 100 ms) or exaggeratedly long for emphasis and stylistic effect.

Juncture pauses are produced to help listeners parse the structure of the sentences in the speech. Hesitation pauses, by contrast, tend to appear not at standard linguistic boundaries but mid-sentence, between two words of low transitional probability. These pauses are usually longer than juncture pauses and signal that the speaker is thinking on his feet or groping for the right expression. To the listener, hesitation pauses can sound annoying.

According to Lounsbury’s definition, juncture pauses correspond to units of decoding, meaning they appear solely to help listeners decipher the speech. Hesitation pauses correspond to units of encoding, as they occur because of difficulties the speaker encounters while encoding speech.

Boomer and Dittman (1962) supported Lounsbury’s categorization and reinforced the view that juncture and hesitation pauses differ in function. In their experiment, they defined pauses occurring after a terminal juncture as juncture pauses and all others as hesitation pauses. In line with Lounsbury, they claimed that while hesitation pauses can occur for various reasons, ranging from transitional probabilities between words to familiarity with the material, juncture pauses are deliberately inserted to reinforce the preceding juncture and help the listener grasp the syntactic structure of the speech. Their experiment examined the perception thresholds for both types of pause and found that juncture pauses were harder for listeners to detect. While listeners detected most hesitation pauses longer than 200 ms and almost all of them at 500 ms, juncture pauses could not be reliably detected until they exceeded 500 ms.
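Boomer and Dittman’s operational rule can be expressed as a simple labeling pass over an annotated transcript. In this sketch (all names hypothetical), each token carries an analyst-supplied flag marking whether it ends at a terminal juncture, plus the duration of any silent pause after it. How junctures are identified and what minimum duration counts as a pause are left to the annotation and are not taken from the study.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Token:
    text: str
    ends_terminal_juncture: bool  # hypothetical analyst annotation
    pause_after_ms: int           # silent pause following the token (0 = none)


def classify_pauses(tokens: Sequence[Token]) -> List[Tuple[int, int, str]]:
    """Label each silent pause as 'juncture' (after a terminal juncture) or 'hesitation'.

    Returns (token_index, pause_duration_ms, label) for every token followed by a pause.
    """
    labels = []
    for i, tok in enumerate(tokens):
        if tok.pause_after_ms <= 0:
            continue  # no silent pause after this token
        label = "juncture" if tok.ends_terminal_juncture else "hesitation"
        labels.append((i, tok.pause_after_ms, label))
    return labels
```

Under this rule, a 350 ms silence between “report” and “ended” is a hesitation pause, while a 600 ms silence after a sentence-final “ended.” is a juncture pause, regardless of duration.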

Early research presented juncture and hesitation pauses as functionally mutually exclusive, with hesitation pauses indicating increased processing effort and juncture pauses signaling syntactic structure to listeners. This notion, however, is debatable. Researchers have argued that juncture pauses not only help the listener understand but also serve the speaker by granting more time for processing (Rochester, 1973). Goldman-Eisler found through her experiment that pauses at the same grammatical juncture have shorter duration in silent reading than in
