
Ninety-five undergraduate students at a university in northern Taiwan participated in the study. All participants were native speakers of Mandarin Chinese, and their English proficiency was comparable to the B2 level of the Common European Framework of Reference for Languages (CEFR), as determined by their English proficiency test scores (72-95 on TOEFL iBT, 785-945 on TOEIC, 5.5-6.5 on IELTS, or high-intermediate on GEPT).

3.2 Materials

3.2.1 Video selection

The viewing material was selected from the TED talk website for the following reasons. TED talk videos contain authentic academic lectures suitable for the undergraduates participating in the study. The delivery style of TED talks is also familiar to the participants, since such talks are commonly used as teaching material in their English classes.

From a more technical perspective, the TED talk website provides access to accurate transcripts and full captioning, making it an ideal source of material for self-learning, teaching, and caption research.

The TED talk used for this study was selected based on three criteria: accent, difficulty level, and topic familiarity. First, the speech in the video is delivered in American English, since this is the variety used in the participants’ classroom instruction. This criterion was also included in Mirzaei et al.’s (2017) video selection. Second, due to the importance of using level-appropriate materials in caption studies (as evidenced in Perez et al., 2014 and Perez et al., 2014), the difficulty level of the selected video had to match the participants’ proficiency level. Finally, the topic of the selected video needed to be familiar to the participants, since listeners’ background knowledge is a crucial factor affecting L2 learners’ listening comprehension (Othman & Vanathas, 2017). In addition, videos containing many technical terms were avoided, as in Mirzaei et al.’s (2017) study, to prevent confusion.

By following the three criteria, an eleven-minute, level-appropriate TED talk video on successful leadership, a topic familiar to undergraduates, was chosen as the viewing material of the study. The video contained 1832 words in total, with an average of 13.09 words per sentence. The average speech rate of the video was 4.24 syllables per second.

3.2.2 Caption mode environment

Under each caption condition (i.e., no, full, partial, and real-time captioning), any captions appeared at the bottom of the screen. The selected video was shown on the YouTube platform, since its captioning service was used to operationalize the partial and real-time caption modes (see Figure 2 for examples). The following subsections provide detailed information on the sources and production of the three caption presentation modes.

3.2.2.1 Full captioning

A full caption for each oral discourse unit was displayed in the bottom area of the screen before the discourse unit was heard, a viewing environment typical of existing captioned video instruction materials (see Figure 2). Once the oral discourse was heard, the captions, along with the video content, were replaced by those of the following unit.

3.2.2.2 Partial captioning

In the current study, the partial caption viewing condition was operationalized in the form of keyword captions (e.g., Guillory, 1998; Perez et al., 2014; Yang & Chang, 2014). In much existing caption research, keywords are defined as the important words viewers need in order to understand the main idea of the video content. For keyword determination, the study relied on professional judgment, a procedure also used in Guillory (1998), Perez et al. (2014), and Behroozizad and Majidi (2015). To this end, five experienced EFL instructors were invited to select the keywords. They were asked to highlight the words that B2-level viewers would need to comprehend the oral discourse while watching the video. Words selected by more than three of the five EFL instructors were designated as the keywords for each oral discourse unit.
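The agreement rule described above can be sketched as a simple vote count. This is only an illustration: the function name, the rater data, and the reading of "more than three" as "at least four of five" are assumptions, not details reported in the study.

```python
from collections import Counter


def select_keywords(selections, min_votes=4):
    """Return the words highlighted by at least `min_votes` raters.

    `selections` holds one set of highlighted words per rater.
    Assumption: "more than three" of five raters means four or more.
    """
    # Count each word once per rater, even if a rater lists it twice.
    votes = Counter(word for chosen in selections for word in set(chosen))
    return {word for word, count in votes.items() if count >= min_votes}


# Hypothetical highlights from five raters for one oral discourse unit.
raters = [
    {"leadership", "trust", "team"},
    {"leadership", "trust"},
    {"leadership", "team", "success"},
    {"leadership", "trust", "team"},
    {"trust", "leadership"},
]
keywords = select_keywords(raters)
```

In this made-up example, only the words chosen by four or five raters survive, while words chosen by three or fewer are dropped.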

Under this viewing condition, the keyword was displayed in the center of the bottom area of the screen while the word in the oral discourse was heard (see Figure 2).

The keyword disappeared from the screen as soon as the following word in the oral discourse was heard. In the current study, partial captioning was operationalized through this viewing procedure for three important reasons:

(1) Comparability concern: This is the most frequent partial captioning operationalization method used in other studies;

(2) Function concern: Partial captioning in other forms (e.g., key phrases) may present longer texts and continually draw the participants’ focal attention to the captions, and hence may be functionally indistinguishable from full captions;

(3) Practicality concern: This caption presentation mode can be readily produced using YouTube services free of charge, and can serve as a handy, accessible caption production option for L2 instructors.

3.2.2.3 Real-time captioning

Under this viewing condition, the participants saw the captions being presented to them serially, word by word and from left to right, in temporal alignment with each word in the oral discourse (see Figure 2). This captioning was generated using the YouTube platform, the platform on which the video was shown.

Figure 2. Screenshots of the three caption modes operationalized in the study (panels: full captioning, partial captioning, real-time captioning)

3.3 Design

This study was an experimental investigation, with random assignment, of the impact of (1) an intervention (the three caption viewing conditions) and (2) the participants’ modality preference on their listening comprehension. The study used a 4x3 factorial design with one dependent variable (the participants’ understanding of a captioned video) and two between-subject independent variables: caption condition (four levels) and modality preference (three levels). For the purpose of the study, a Caption Reliance Test (CRT; see Section 3.4 for more detail) was administered to the participants to determine their modality preference in real-time input processing. Based on the CRT results, participants of each modality preference (visual, auditory, balanced) were randomly assigned to one control condition (no captioning) or one of three captioning conditions (full, partial, and real-time captioning). Figure 3 schematizes the design of this study.
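One way to picture the grouping step is random assignment carried out separately within each modality group. The sketch below is an assumption about the mechanics: the study does not report how randomization was implemented, and the shuffle-then-deal scheme, the function name, and the participant IDs are hypothetical.

```python
import random

CONDITIONS = ["none", "full", "partial", "real-time"]


def assign_conditions(participants, seed=0):
    """Randomly assign participants to caption conditions within each modality group.

    `participants` maps a participant ID to a modality label.
    """
    rng = random.Random(seed)
    by_modality = {}
    for pid, modality in participants.items():
        by_modality.setdefault(modality, []).append(pid)

    assignment = {}
    for ids in by_modality.values():
        rng.shuffle(ids)
        # Deal the shuffled IDs round-robin so that the four conditions
        # stay as balanced as possible within each modality group.
        for i, pid in enumerate(ids):
            assignment[pid] = CONDITIONS[i % len(CONDITIONS)]
    return assignment


# Hypothetical sample: 8 visual, 8 auditory, and 4 balanced learners.
labels = ["visual"] * 8 + ["auditory"] * 8 + ["balanced"] * 4
participants = {f"P{i:02d}": label for i, label in enumerate(labels)}
assignment = assign_conditions(participants)
```

Assigning within each modality group keeps every caption condition populated by all three learner types, which is what a 4x3 design needs.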

A questionnaire was administered at the end of the experiment to obtain qualitative data on the participants’ perception of the caption viewing experience. Although this was not the major aim of the study, the participants’ feedback helped explain the quantitative results and generated insights for future research and practice.

Figure 3. A visualization of the participants’ grouping process

3.4 Instruments

3.4.1 Caption Reliance Test (CRT)

The CRT, which included 41 items in total, was used to determine the participants’ modality predilection in real-time input processing. The format of the CRT was based on Leveridge and Yang’s (2013) design. In this study, each CRT item comprised a short video and a multiple-choice comprehension question. Each video presented a short conversation between a man and a woman with visual images, audio discourse, and full caption support. After watching the video, the participants answered a question based on their understanding of the conversation. The videos were level-appropriate and comparable to the TED talk video used as the viewing material in the study. The questions were also designed to match the participants’ proficiency level. To examine the appropriateness of the test items, a pilot CRT was administered.

Importantly, a CRT, as prescribed by Leveridge and Yang (2013), contains both congruent and incongruent items. In congruent items, the content presented in the audio recording and the captions is identical. Incongruent items involve a one-word mismatch between the audio and the captions, with the purpose of determining the participants’ preferred modalities. For example, the participants may hear the following conversation:

Woman: Hey, just curious, what’s something I do that gets on your nerves?

Man: I think sometimes you underestimate your abilities, and that’s what annoys me the most.

However, they would read the caption “I think sometimes you overestimate your abilities, and that’s what annoys me.” Under this circumstance, there is no “correct” answer to the succeeding comprehension question, yet the participant’s choice would reveal their modality preference. For example, the participants would see the following question:

Question: What does the woman do that annoys the man?

(A) Overestimates herself (B) Underestimates herself (C) Overanalyzes herself (D) Undervalues herself

If the participant is a visual learner, option (A) is more likely to be chosen, since visual learners tend to rely more on visual information (captions, in this case) when viewing multimodal materials. Auditory learners, on the other hand, are more likely to select option (B), which matches the audio. To prevent the participants from becoming aware of the incongruence, only 25% of the CRT items were incongruent, while the other 75% were congruent. This proportion was set based on the guidelines given in Leveridge and Yang (2013). The participants’ answers to the incongruent items were used to determine their modality preferences and to categorize them into three modality groups²: visual, auditory, and balanced learners.

As for the analysis of the CRT results, this study adopted the CRT scoring policy used in Liu and Todd (2014). When answering an incongruent CRT item, the participants scored one point under “visual modality” if they chose an answer according to the video captions; on the other hand, they scored one point under “auditory modality” if they selected an answer according to the audio recording. The researcher then calculated a preferred modality (PM) index value by dividing each participant’s visual modality points by their auditory modality points (see Table 1). Accordingly, a participant relying more on captions (visual input) than on the oral discourse would obtain a PM index value larger than 1. For instance, eight visual modality points and two auditory modality points would result in a PM index value of 4 (8/2). On the other hand, a participant attending more to the oral discourse would obtain a PM index value smaller than 1. For example, four visual modality points and six auditory modality points would result in a PM index value of 0.67 (4/6). Participants with a PM index value equal to 1 would be those who show no particular modality preference (e.g., Participant #3 in Table 1). The PM index value thus helped categorize the participants into three modality groups: visual (PM value>1), auditory (PM value<1), and balanced (PM value=1).

2 The CRT results of the pilot test and of the current study were similar regarding the proportion of learners in the three modality groups. Visual, auditory, and balanced learners accounted for 44%, 44%, and 12% of the participants in the pilot test, and for 46%, 43%, and 10% of the participants in the current study. Similar proportions of the three modality groups across the two CRT administrations are suggestive of the reliability of the CRT.

If the participant chose an answer based on neither the captions nor the oral discourse, one point was given under “non-target words.” Two or more points under “non-target words” (e.g., Participant #4 in Table 1) resulted in data exclusion, since this suggested that the test items were not fully understood by the participant or were too challenging for them, an issue also seen in other L2 captioning studies.
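The scoring and exclusion rules above can be summarized in a short sketch. The response labels ("caption", "audio", "other") and the function name are hypothetical, and the handling of a participant with zero auditory points is an assumption, since the source does not say how a zero denominator was treated.

```python
def score_crt(responses):
    """Tally incongruent-item responses and classify one participant.

    `responses` holds one label per incongruent item: "caption" (answer
    follows the captions), "audio" (answer follows the audio recording),
    or "other" (answer follows neither, i.e. a non-target word).
    """
    visual = responses.count("caption")
    auditory = responses.count("audio")
    non_target = responses.count("other")

    # Two or more non-target answers lead to data exclusion.
    if non_target >= 2:
        return {"pm_index": None, "group": "excluded"}
    if auditory == 0:
        # Assumption: the ratio is undefined here; a participant with only
        # caption-based answers is treated as visual in this sketch.
        return {"pm_index": None, "group": "visual" if visual > 0 else "undefined"}

    pm_index = visual / auditory
    if pm_index > 1:
        group = "visual"
    elif pm_index < 1:
        group = "auditory"
    else:
        group = "balanced"
    return {"pm_index": pm_index, "group": group}


# The two worked examples from the text: 8/2 and 4/6.
caption_reliant = score_crt(["caption"] * 8 + ["audio"] * 2)
audio_reliant = score_crt(["caption"] * 4 + ["audio"] * 6)
```

The first call reproduces the PM index of 4 from the text, and the second gives 4/6, which rounds to 0.67.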

Table 1. CRT test ballot sheet

3.4.2 Listening comprehension test

To measure the participants’ comprehension under the aforementioned caption viewing conditions, a listening comprehension test was constructed and administered to the participants immediately after the video viewing session. The comprehension test consisted of fifteen multiple-choice questions, each with four options: one correct answer and three distractors. Of the fifteen questions, five targeted global understanding, five required understanding of details, and five focused on inferences.

For each item, the participants first heard and read the stem of the comprehension question on the screen. Then, they read the options on the screen before selecting their answers. This test design was based on the format of the TOEFL iBT listening test. The participants’ answers to the comprehension questions were scored dichotomously: a correct answer received one point, while an incorrect answer received zero points.
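Dichotomous scoring of this kind can be illustrated with a few lines of code. The answer key, the order of the question types, and the function name below are hypothetical; the source does not list the actual items, and the per-type breakdown is shown only for illustration.

```python
def score_test(answers, key, item_types):
    """Score answers dichotomously and break the total down by question type."""
    totals = {"total": 0}
    for given, correct, kind in zip(answers, key, item_types):
        point = 1 if given == correct else 0  # one point if correct, else zero
        totals["total"] += point
        totals[kind] = totals.get(kind, 0) + point
    return totals


# Hypothetical layout: five global, five detail, and five inference items.
item_types = ["global"] * 5 + ["detail"] * 5 + ["inference"] * 5
key = list("ABCDABCDABCDABC")
answers = list(key)
answers[0] = "D"  # one incorrect global item
answers[7] = "A"  # one incorrect detail item
result = score_test(answers, key, item_types)
```

The maximum total is fifteen, and each subscore ranges from zero to five.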

3.4.3 Questionnaire

The questionnaire consisted of seven statements aiming to probe the participants’ general perception of the captioned-video viewing task. Among the seven statements, one was designed to establish the level appropriateness of the video, three focused on the participants’ perception of caption use, and three addressed their self-reported attention allocation while watching the video. The questionnaire ended with an invitation for open-ended comments on the caption viewing experience.

3.5 Procedure of data collection

The experiment consisted of three major phases: (1) the CRT, (2) the captioned video viewing session, and (3) the listening comprehension test and questionnaire. In the first phase, the participants took the CRT, which took approximately twenty minutes to complete, and were then categorized into three types of learners (i.e., visual, auditory, and balanced) based on their CRT results. One week after the CRT, the participants of different modality preferences were invited to participate in an eleven-minute video viewing session under one of four caption conditions, i.e., no, full, partial, or real-time captioning. After viewing the video once, the participants spent another fifteen minutes completing the listening comprehension test and filling out the questionnaire. The third phase ended with a semi-structured interview in which the researcher asked the participants follow-up questions based on their responses to the questionnaire.

3.6 Data analyses

STATISTICA was used as the statistical analysis software in the current study. A two-way ANOVA was conducted to compare comprehension performance under the four caption conditions (research question 1) and to examine the interaction between caption mode and the participants’ modality preference (research question 2). The two independent variables were caption mode and the participants’ modality preference. The dependent variable was the listening comprehension test score. To further investigate the differences among groups, Fisher’s Least Significant Difference (LSD) post hoc test was used wherever a significant difference was found. As for the questionnaire, the five-point Likert-scale responses were analyzed using descriptive statistics.
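To make the structure of the two-way ANOVA concrete, the sketch below computes the three F ratios for a caption-by-modality design. It is not the STATISTICA procedure used in the study: it assumes equal cell sizes (the study's cells were probably unequal, which calls for a different sums-of-squares method), it uses synthetic scores, and it stops at the F ratios without computing p-values or the LSD comparisons.

```python
from itertools import product
from statistics import mean


def two_way_anova(cells):
    """Compute F ratios for a balanced two-way between-subjects ANOVA.

    `cells` maps (caption_level, modality_level) to a list of scores.
    Assumption: every cell holds the same number of scores.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if any(len(cells.get((a, b), [])) != n for a, b in product(levels_a, levels_b)):
        raise ValueError("this sketch requires equal, non-empty cell sizes")

    grand = mean(x for scores in cells.values() for x in scores)
    mean_a = {a: mean(x for b in levels_b for x in cells[a, b]) for a in levels_a}
    mean_b = {b: mean(x for a in levels_a for x in cells[a, b]) for b in levels_b}

    # Sums of squares for the two main effects, the interaction, and error.
    ss_a = n * len(levels_b) * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = n * len(levels_a) * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    ss_ab = n * sum(
        (mean(cells[a, b]) - mean_a[a] - mean_b[b] + grand) ** 2
        for a, b in product(levels_a, levels_b)
    )
    ss_err = sum((x - mean(s)) ** 2 for s in cells.values() for x in s)

    df_a, df_b = len(levels_a) - 1, len(levels_b) - 1
    df_ab = df_a * df_b
    df_err = len(levels_a) * len(levels_b) * (n - 1)
    ms_err = ss_err / df_err
    return {
        "F_caption": (ss_a / df_a) / ms_err,
        "F_modality": (ss_b / df_b) / ms_err,
        "F_interaction": (ss_ab / df_ab) / ms_err,
        "df_error": df_err,
    }


# Synthetic scores (not study data): additive effects, two scores per cell.
caption_effect = {"none": 8, "full": 11, "partial": 10, "real-time": 9}
modality_effect = {"visual": 0, "auditory": 1, "balanced": 2}
cells = {
    (c, m): [ce + me - 1, ce + me + 1]
    for c, ce in caption_effect.items()
    for m, me in modality_effect.items()
}
result = two_way_anova(cells)
```

Because the synthetic cell means are purely additive, the interaction F ratio comes out as zero here; a non-zero value would indicate that the effect of caption mode differs across modality groups, which is what research question 2 examines.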
