
Chapter 2 Literature Review

2.3 Eye movement

2.3.3 Eye movement and pause

Eye trackers have rarely been employed in studies of spontaneous speech. As with interpreting studies, this is because most speaking tasks involve no visual material. Consequently, no research on eye movement data during pauses in spontaneous speech has yet been found.

Shreve, Lacruz and Angelone (2011) may have been the first to address eye movement during pauses in interpreting studies. The authors described their research as “the first report on speech disfluencies in sight translation.” It examined the effect of syntactic complexity on disfluencies in sight translation (ST) output.

The experimental data were drawn from an earlier study by the same researchers on eye movements during ST and written translation (Shreve et al., 2010). Four Spanish texts were selected as the experimental material. Each text was divided into two paragraphs of similar length, and one paragraph was manipulated to contain a single syntactically complex segment.

Each text had an A and a B version. In the A version, the syntactically complex segment was in the second paragraph; in the B version, it was in the first paragraph.

Eleven graduate students from translation programs participated in the experiment, and each sight translated one A text and one B text into English. Their eye movements and ST output were recorded during the task.

In the 2010 study, the researchers examined only eye movement indices (number of fixations, fixation duration, number of regressions, and fixation/regression ratio) and how they responded to syntactic complexity; the ST output was recorded but not analyzed. In the 2011 study, the researchers instead analyzed how syntactic complexity influenced a series of disfluency measures, including silent pauses, filled pauses, repetitions and repairs/revisions.

The findings confirmed the 2010 result that syntactic complexity places extra pressure on ST production. The syntactically complex area of interest (AOI) contained 11% more disfluent events than its syntactically simple counterpart, and both silent and filled pauses were 5% longer in the complex AOI. This corresponded to the eye movement data examined in the 2010 study: mean total viewing time was longer in the complex AOI (27,658ms) than in the non-complex one (19,104ms), and the complex AOI also attracted a higher mean number of fixations (101 vs. 73).
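The disfluency differences are given as percentages, while the eye movement differences are given as raw means. To put the two on a common relative scale, the percentage increase can be computed directly from the reported means. The Python sketch below is illustrative only: the helper name `percent_increase` is hypothetical, and the output describes the quoted means rather than a tested effect.

```python
def percent_increase(complex_value: float, simple_value: float) -> float:
    """Relative increase of the complex-AOI mean over the non-complex-AOI mean, in percent."""
    return (complex_value - simple_value) / simple_value * 100

# Means quoted above for the complex and non-complex AOIs.
viewing_time_ms = {"complex": 27_658, "non_complex": 19_104}
fixation_count = {"complex": 101, "non_complex": 73}

viewing_increase = percent_increase(viewing_time_ms["complex"], viewing_time_ms["non_complex"])
fixation_increase = percent_increase(fixation_count["complex"], fixation_count["non_complex"])

print(f"Total viewing time: +{viewing_increase:.1f}%")  # Total viewing time: +44.8%
print(f"Fixation count:     +{fixation_increase:.1f}%")  # Fixation count:     +38.4%
```

On these means, the relative differences in the eye movement measures (roughly 45% and 38%) are larger than the 11% and 5% differences reported for the disfluency measures.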

The researchers went a step further by comparing the eye tracking data with some of the disfluencies found in the ST output. They reasoned that if longer pauses indicate increased cognitive effort, long pauses should be accompanied by clusters of eye fixations.

Using “hot spot” images, they found that at the onset of several long pauses, large clusters of fixations gathered around the locations in the source text that corresponded to the pause sites in the oral output.

Although the correlation between eye tracking and disfluency data was not the focus of the 2011 study, the study nevertheless established basic links between the disfluency and eye movement indices used to evaluate cognitive effort. It also laid the foundation for triangulating disfluency and eye movement data in research on sight translation.

Chapter 3 Pause Data Collection and Analyses

3.1 Data source and collection

The data used in this research were taken from Huang’s (2011) experiment on eye movements in sight translation. Eighteen students (aged 23 to 40) from graduate translation and interpreting programs in Taiwan participated in the experiment. All had received at least one year of sight translation training before the experiment, and all had Mandarin Chinese as their A language and English as their B language.

The materials used in the experiment were six Chinese texts of approximately 150 words each. The texts were excerpts from authentic Chinese speeches, and no changes were made to them to introduce specific linguistic features. The speech topics were general and required no specialized prior knowledge.

Each participant went through three stages in the experiment. In each stage, participants were presented with one of the speech texts on a computer screen and asked to perform one of three tasks: silent reading, reading aloud or sight translation. Their eye movements were recorded throughout the tasks.

Participants’ oral output during the reading-aloud and sight translation stages was also recorded for comparison with their eye movement data. This research used only the data collected in the sight translation stage.

3.2 Data processing

3.2.1 Selection of observation points

Each participant’s sight translation recording was loaded into Audacity and amplified to sharpen the contrast between speech and pauses. The recording of participant 18 was discarded because it had failed to capture the translation output.

By inspecting the waveform displayed in Audacity, all pauses (segments where the waveform was flat) longer than 200ms were selected. This threshold follows the standard auditory threshold for pause perception set by Goldman-Eisler (1968), who found that pauses shorter than 200ms are not easily noticed by listeners. Such short pauses most likely arise from physical constraints and seldom reflect cognitive processes in speech production.
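In this study, pauses were picked out visually from the Audacity waveform. As an illustration only, the same selection rule (flat segments longer than 200ms) could be approximated automatically. The Python sketch below assumes the recording has already been reduced to one amplitude value per fixed-length frame (for example, RMS over 10ms frames); the frame length, the `silence_level` cutoff and the function name are hypothetical choices, not values from the study.

```python
from typing import List, Tuple

def find_pauses(frame_levels: List[float], frame_ms: int = 10,
                silence_level: float = 0.02,
                min_pause_ms: int = 200) -> List[Tuple[int, int]]:
    """Return (onset_ms, duration_ms) for every silent run longer than min_pause_ms.

    frame_levels: one amplitude value per frame (hypothetical input format).
    silence_level: level below which a frame counts as "flat" (assumed value).
    min_pause_ms: Goldman-Eisler's 200ms perception threshold.
    """
    pauses = []
    run_start = None
    # A trailing sentinel above any threshold closes a silent run at the very end.
    for i, level in enumerate(frame_levels + [float("inf")]):
        if level < silence_level:
            if run_start is None:
                run_start = i                # a flat stretch begins
        elif run_start is not None:
            duration_ms = (i - run_start) * frame_ms
            if duration_ms > min_pause_ms:   # "over 200ms", i.e. strictly longer
                pauses.append((run_start * frame_ms, duration_ms))
            run_start = None
    return pauses

# Toy amplitude track: speech, 470ms silence, speech, 150ms silence, speech.
levels = [0.3] * 50 + [0.01] * 47 + [0.4] * 30 + [0.005] * 15 + [0.2] * 10
print(find_pauses(levels))  # [(500, 470)]; the 150ms gap falls below the threshold
```

A frame-based detector is sensitive to the choice of `silence_level`: breaths or background noise can split one perceived pause into several shorter runs, which is one reason to keep checking the waveform visually, as was done here.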

The onset time and duration of each pause were tagged, and the onset time was used to identify the fixation or fixations that occurred during the pause. The eye movement data had already been processed in Huang’s experiment to indicate which word or phrase was fixated at each fixation point. This is essential for the combined analyses of eye movements that follow, since certain words/phrases in the source text are categorized as “regions of interest (ROIs)” in order to identify definite, relevant fixation locations at pause onset.

Some fixation points were marked as “out of range,” meaning that the eye tracker could not determine the fixation position, either because the interpreter blinked or because the interpreter looked at parts of the screen outside the designated areas of interest. Pauses containing any “out of range” fixation were excluded from the data set. In total, 200 observation points were collected from 11 tasks for further analysis.
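In the study, pauses were matched to fixations through the onset times tagged in Audacity and the word-level fixation coding from Huang’s experiment. The following Python sketch shows one way this matching and the “out of range” exclusion could be automated. It assumes, hypothetically, that audio and eye tracker timestamps share one clock and that a fixation “occurs during” a pause whenever their time spans overlap. The class and function names are illustrative, and the timings are invented toy values.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Fixation:
    start_ms: int
    end_ms: int
    word: Optional[str]   # None represents a fixation marked "out of range"

@dataclass
class Pause:
    onset_ms: int
    duration_ms: int

def fixations_during(pause: Pause, fixations: List[Fixation]) -> List[Fixation]:
    """Fixations whose time span overlaps the pause interval [onset, onset + duration)."""
    pause_end = pause.onset_ms + pause.duration_ms
    return [f for f in fixations if f.start_ms < pause_end and f.end_ms > pause.onset_ms]

def observation_points(pauses: List[Pause],
                       fixations: List[Fixation]) -> List[Tuple[Pause, List[str]]]:
    """Pair each pause with its fixated words, dropping pauses with any out-of-range fixation."""
    points = []
    for pause in pauses:
        during = fixations_during(pause, fixations)
        if any(f.word is None for f in during):
            continue  # excluded from the data set
        points.append((pause, [f.word for f in during]))
    return points

# Toy data on a shared clock (invented for illustration only).
fixations = [Fixation(0, 400, "健康"), Fixation(400, 700, "失去"),
             Fixation(700, 900, None), Fixation(900, 1300, "問題")]
pauses = [Pause(100, 469), Pause(650, 311), Pause(1000, 250)]

points = observation_points(pauses, fixations)
for pause, words in points:
    print(pause.onset_ms, pause.duration_ms, "→".join(words))
```

Here the second pause overlaps the out-of-range fixation and is dropped, so only two of the three pauses become observation points.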

Figure 3-1. Selection of pause data in Audacity

3.2.2 Annotated protocol

Huang had already transcribed all the sight translation output for her experiment. For the purposes of this research, however, transcripts alone were not sufficient. According to Shreve et al. (2011), a transcript represents not the actual oral production of the speaker or interpreter but what the transcriber perceives and organizes while listening to the output. Consequently, many properties of speech performance, such as silent pauses, fillers, repetitions and rephrasing, are absent from transcripts. Shreve et al. therefore proposed that discourse analysis requires an annotated protocol instead.

An annotated protocol is a complete, faithful transcription of the speaker’s output, including every sound the speaker produces along the way. It also records the length of every silent and filled pause, which makes the annotated protocol a kind of “live record” that documents and reproduces speech production exactly as it occurred.

For this research, the output audio files of the 11 selected tasks were listened to repeatedly in order to supplement and revise the transcripts from Huang’s experiment, expanding them into annotated protocols. The position of every pause longer than 200ms was marked in the transcription with “^”. The mark was followed by the duration of the silent pause and, in order, the words or phrases the interpreter fixated on during that pause. Since filled pauses are not discussed in this research, they were noted in the annotated protocols without their duration or eye fixation information.

The pauses were then categorized as “juncture pauses” or “hesitation pauses” according to their position in the ST output. Juncture pauses were those occurring at syntactic junctures, namely where punctuation such as a comma or period was marked in the transcription. Hesitation pauses comprised all remaining pauses, which occurred within sentences.

Previous pause research has suggested that most pauses occur not at sentence boundaries but between syntactic clauses. When attempting to categorize pauses by this criterion, however, this research ran into the differences between English and Chinese grammatical structure.

The clause boundaries of the Chinese source text may not align with those in the English target transcriptions. Such misalignment, which rarely occurs at sentence boundaries, would complicate the subsequent analyses of pause and eye fixation positions. Sentence boundaries were therefore chosen as the criterion for categorizing pauses in this research. Of the 200 observation points, 87 were juncture pauses and 113 were hesitation pauses.
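The juncture/hesitation rule depends only on what immediately precedes each “^” marker, so it can be applied mechanically to an annotated protocol. The Python sketch below illustrates that rule under the assumption that only ASCII commas and periods in the English transcription count as juncture marks; the function name is hypothetical.

```python
import re
from collections import Counter
from typing import List

# Punctuation treated as a juncture in the English transcription.
JUNCTURE_MARKS = {",", "."}

def classify_pauses(protocol: str) -> List[str]:
    """Label each "^" pause marker as a juncture or a hesitation pause.

    A pause is a juncture pause when the last non-space character before
    its "^" is a comma or period; every other pause is a hesitation pause.
    """
    labels = []
    for marker in re.finditer(r"\^", protocol):
        preceding = protocol[:marker.start()].rstrip()
        is_juncture = bool(preceding) and preceding[-1] in JUNCTURE_MARKS
        labels.append("juncture" if is_juncture else "hesitation")
    return labels

# Annotated protocol of Ex. 3-1.
ex_3_1 = ("Often ^[469ms, 健康→失去] when after we are lose losing our health,"
          "^[311ms, 問題] we will start caring about ^[670ms, 健康→的] "
          "our health problems.^[580ms, 只有→疲倦]")

labels = classify_pauses(ex_3_1)
print(labels)  # ['hesitation', 'juncture', 'hesitation', 'juncture']
print(Counter(labels))
```

Running the same function over all 11 protocols and summing the counters would be one way to reproduce the 87/113 split, provided the punctuation in the transcriptions is consistent.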

The following are a few examples of annotated protocols from the experiment. Pauses marked with “^” after a comma or period are juncture pauses, and the remaining pauses, marked between words, are hesitation pauses. The bracket after each pause gives the pause duration and, in order, the word(s)/phrase(s) fixated during the pause. Note that repetitions (“lose losing”) and other hesitation markers (“um”) are all recorded in the annotated protocols.

Ex. 3-1

SOURCE SEGMENT: 通常大家總是在失去健康之後，才會關心健康的問題。只有在感覺疲倦或者生病的時候…

ANNOTATED PROTOCOL: Often ^[469ms, 健康→失去] when after we are lose losing our health,^[311ms, 問題] we will start caring about ^[670ms, 健康→的] our health problems.^[580ms, 只有→疲倦]

Ex. 3-2

SOURCE SEGMENT: 反過來說，個性保守的人，適合風險比較低的投資工具。但也要記得做有一點風險的投資…

ANNOTATED PROTOCOL: On the other hand, if you are conservative person, ^[379ms, 比較→風險] then it would bet ^[207ms, 比較→風險] be better for you to invest in ^[1855ms, 風險→比較→工具] um items that are have lower risks.^[825ms, 做→要→的→記得]
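Because every annotation follows the same “^[duration, word→word…]” pattern, the protocols can be read back into structured data for combined pause and fixation analyses. A minimal Python sketch, assuming the bracket format shown in Ex. 3-1 and Ex. 3-2; the regular expression and the function name are illustrative and not part of the original workflow.

```python
import re
from typing import List, Tuple

# One pause annotation: "^[<duration>ms" optionally followed by ", w1→w2→..." and "]".
PAUSE_TAG = re.compile(r"\^\[\s*(\d+)\s*ms\s*(?:,\s*([^\]]*))?\]")

def parse_pauses(protocol: str) -> List[Tuple[int, List[str]]]:
    """Return (duration_ms, fixated words in order) for every annotated pause."""
    pauses = []
    for tag in PAUSE_TAG.finditer(protocol):
        duration_ms = int(tag.group(1))
        fixation_field = tag.group(2) or ""   # empty when fixations are omitted, e.g. "^[676ms]"
        # Split the fixation sequence on the arrow and strip surrounding spaces.
        fixated = [w.strip() for w in fixation_field.split("→") if w.strip()]
        pauses.append((duration_ms, fixated))
    return pauses

# Annotated protocol of Ex. 3-2.
ex_3_2 = ("On the other hand, if you are conservative person, ^[379ms, 比較→風險] "
          "then it would bet ^[207ms, 比較→風險] be better for you to invest in "
          "^[1855ms, 風險→比較→工具] um items that are have lower risks.^[825ms, 做→要→的→記得]")

for duration_ms, fixated in parse_pauses(ex_3_2):
    print(duration_ms, " → ".join(fixated))
```

Keeping the fixated words as an ordered list preserves the fixation sequence within each pause (for example, 風險→比較→工具 during the 1855ms pause), which matters for any analysis of fixation order.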

To present the data clearly, the annotated protocols in the examples that follow mark only the pause under discussion; other pauses within the same sentence are left unmarked. For examples that examine only the oral production data, fixation information is also omitted from the annotated protocol.

3.3 Oral data analyses results

Following the categorization criterion described above, the 200 observation points were divided into 87 juncture pauses (pauses occurring where commas or periods were marked in the transcripts) and 113 hesitation pauses (pauses occurring elsewhere in the sentences).

Figure 3-2. Categorization of pauses

This result differs from findings on spontaneous speech, in which more than 66% of all pauses tend to occur at syntactic boundaries (Hawkins, 1971); here, only 87 of the 200 pauses (43.5%) occurred at junctures.
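The comparison with Hawkins (1971) amounts to a simple share calculation, sketched below in Python. Because this study counted only comma and period junctures rather than all syntactic (clause) boundaries, the gap shown should be read as indicative rather than as a like-for-like comparison. The variable names are illustrative.

```python
# Pause counts reported for the 200 observation points.
counts = {"juncture": 87, "hesitation": 113}
total = sum(counts.values())
shares = {kind: 100 * n / total for kind, n in counts.items()}

# Hawkins (1971): more than 66% of pauses in spontaneous speech fall at syntactic boundaries.
SPONTANEOUS_BOUNDARY_SHARE = 66.0

for kind, share in shares.items():
    print(f"{kind:<10} {counts[kind]:>3}  {share:.1f}%")
gap = shares["juncture"] - SPONTANEOUS_BOUNDARY_SHARE
print(f"juncture share vs. spontaneous-speech benchmark: {gap:+.1f} percentage points")
```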

The different pausing patterns suggest that sight translation imposes different cognitive demands from those of spontaneous speech.

Spontaneous speaking involves turning one’s thoughts into verbal output in a single language. Sight translation, by contrast, involves the far more complex process of reformulating information from the source language into the target language while coordinating the simultaneous comprehension of the source text with oral interpretation. Sight translation may therefore demand more cognitive effort, which manifests as an increase in the number of hesitation pauses, the type of pause associated with problem solving and difficulty.

3.3.1 Juncture pause

All observation points were first analyzed using the oral data only. All juncture pauses occurred at sentence junctures, after commas or periods in the transcriptions. The oral output showed that these pauses appeared between two complete, fluently produced segments and were seldom followed by signs of hesitancy such as filled pauses, repetitions or rephrasing. The oral data thus show no sign of production difficulty at juncture pauses, consistent with Lounsbury’s (1954) definition of juncture pauses.

The following are two examples of juncture pauses at positions marked by a comma and a period, respectively, in both the source text and the target transcription. In Example 3-3, the interpreter paused briefly (676ms) after completing a segment that ends with a comma in the source text, then continued smoothly with the next segment. In Example 3-4, the juncture pause followed a segment that ends with a period in the source text; the interpreter again paused, for 727ms, before beginning the next sentence fluently.

Ex. 3-3

SOURCE SEGMENT: 但也要記得做有一點風險的投資，否則要靠投資賺錢恐怕很難了。

ANNOTATED PROTOCOL: But you should also remember to invest in some items that carry a little bit more risk,^[676ms] or it will be hard for you to make money by investing.

Ex. 3-4

SOURCE SEGMENT: 沒有病痛就就代表身體很健康。事實上，…

ANNOTATED PROTOCOL: …it means you are healthy.^[727ms] But in fact,…

Judging from the oral transcriptions alone, these pauses do not indicate difficulties or production errors, since no hesitation signals appear in the transcriptions. Instead, these juncture pauses seem to be devoted to the overall processing of the next segment, which is interpreted for the first time immediately after the pause. The interpreter is likely planning the upcoming interpretation during juncture pauses so that the next sentence can begin fluently right after the pause. These interpretations, however, are only assumptions based on oral production data and require further support from eye movement data.

3.3.2 Hesitation pause

Since hesitation pauses tend to appear mid-sentence and disrupt the flow of speech, the words immediately before and after a pause should offer some insight into the processing that occurred during it. Studies that use oral production transcripts to investigate the content processed during pauses have based their assumptions on the Main Interruption Rule for repairs (Levelt, 1983). The rule states that in a string of speech such as [X <pause> Y Z] (where X, Y and Z are words), the speaker detects the lack of an available continuation only after vocalizing element X, which makes a pause necessary immediately afterwards. The pause reflects the time needed to produce the missing continuation (element Y), which is spoken immediately after the pause as speech resumes. The pause in this example therefore serves element Y; that is, the scope of the pause covers element Y.

Later scholars have argued that the scope of a pause may cover longer segments, such as both elements Y and Z in the example above (Chanquoy, Foulin, & Fayol, 1995; Schilperoord, 2002). They propose that when pauses are organized hierarchically by location (between paragraphs, sentences, clauses and constituents), each group of pauses covers a different scope, although the hypothesized scope of each kind of pause still awaits empirical verification. There is no doubt, however, that the scope of a pause covers at least the first element vocalized directly after it.

Thus, in the following analysis of oral transcripts, the content processed by the interpreter during a pause is judged from the first word or phrase produced after the pause.
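The scope rule can be made concrete with a small helper that, for each “^” marker in an annotated protocol, collects the first word(s) spoken after the pause. The Python sketch below is illustrative only. The function name is hypothetical, the sketch assumes English output tokenized on letters and apostrophes, and it stops each scope at the next pause marker. Setting `n_words=1` corresponds to the minimal scope (element Y), while `n_words=2` approximates the longer Y + Z scope proposed by later scholars.

```python
import re
from typing import List

# "^" optionally followed by its [duration, fixations] bracket.
PAUSE_TAG = re.compile(r"\^(?:\[[^\]]*\])?")
WORD = re.compile(r"[A-Za-z']+")

def pause_scopes(protocol: str, n_words: int = 1) -> List[List[str]]:
    """For each pause, return the first n_words English words produced after it."""
    tags = list(PAUSE_TAG.finditer(protocol))
    scopes = []
    for i, tag in enumerate(tags):
        # Look only at the speech between this pause and the next one.
        end = tags[i + 1].start() if i + 1 < len(tags) else len(protocol)
        following = protocol[tag.end():end]
        scopes.append(WORD.findall(following)[:n_words])
    return scopes

# Annotated protocol of Example 3-7.
ex_3_7 = "…can be of no influence to the quality of that car ^[862ms] that automobile."

print(pause_scopes(ex_3_7))              # [['that']]
print(pause_scopes(ex_3_7, n_words=2))   # [['that', 'automobile']]
```

For Example 3-7 (discussed later in this section), the one-word scope yields only “that”, while the two-word scope also captures “automobile”, the word that replaces “car”. This illustrates why the first element after the pause is treated as a lower bound on what the pause covers.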

Examining the oral production before and after hesitation pauses in this study of Chinese-English sight translation revealed several possible triggers of hesitation pauses, including lexical transfer and syntactic transfer.

a) Lexical transfer

Of the 113 hesitation pauses, 74 (65%) fall into this category. The interpreted output following these pauses suggests that the interpreter was engaged in lexical selection during the pause, that is, trying to find an appropriate English equivalent for a word or phrase in the Chinese source text.

For instance, in the following example, the interpreter paused for 401ms in the middle of a noun phrase before producing the word “risks.” Comparing the output transcription with the source text suggests that the interpreter could not immediately retrieve an equivalent for the Chinese word “風險” (risk) and therefore needed extra time to search memory for the proper translation.

Ex. 3-5

SOURCE SEGMENT: 比如積極型的人適合獲利機會高但是風險也高的東西，

ANNOTATED PROTOCOL: …but of course, it’s…it is companied with high ^[401ms] risks

In this example, the interpreter appears to be retrieving the translation rather than choosing among alternatives, as the Chinese word “風險” has only one English translation.

Comparing the transcriptions of different interpreters for the same source text showed that some Chinese words and phrases correspond to multiple English translations. Hesitation pauses before such words or phrases may suggest that the interpreter was not only searching for a translation during the pause but also judging which translation best suited the context. For instance, the phrase “不友善” (unfriendly) can be translated in several ways, which appears to have triggered long pauses while the appropriate equivalent was selected. In the end, interpreter A translated the phrase as “not kind,” while interpreter B translated it as “not nice.”

Ex. 3-6

SOURCE SEGMENT: 如果有哪一個員工對客人不友善，

ANNOTATED PROTOCOL A: If any one of the employees is ^[922ms] not kind to the customers,

ANNOTATED PROTOCOL B: If one of the employees ^[810ms] is not nice to the customers,

The selection of an equivalent sometimes takes place after the interpreter has already translated the word or phrase. In Example 3-7, the interpreter completed a translated sentence, paused for 862ms and then produced an alternative translation of “car.” The output indicates that the interpreter deemed the first translation inappropriate and replaced “car” with “automobile” to improve the interpretation. It may therefore be argued that the interpreter used this hesitation pause to search for a replacement for the word “car.”

Ex. 3-7

SOURCE SEGMENT: 他在裝配時候的心情對這輛車的品質影響不大

ANNOTATED PROTOCOL: …can be of no influence to the quality of that car ^[862ms] that automobile.

b) Syntactic transfer

Unlike Spanish and English, which share relatively similar grammatical and syntactic structures, Chinese and English differ considerably in syntax (Tao, 1996). Consequently, the syntactic problems that may have caused pausing in Shreve et al. (2011), such as rendering relative pronouns and plural nouns in the correct English form or rearranging contrastive Spanish syntax into appropriate English word order, were not found in the materials for this research.

The Cognitive Process during Pauses in Interpreting Output: Evidence from Eye Movements in Sight Translation