Does Everyone Need “Captions”? Exploring How Different Learning Behaviors Affect the Effectiveness of Captions for English Listening Comprehension

CHAPTER 3 METHODOLOGY

3.1 Participants

This study included 84 college-level learners of English from a local university in northern Taiwan. All participants were native speakers of Mandarin and had learned English as a foreign language (EFL) during their 12 years of formal education. To ensure comparable target-language proficiency, the participants were required to have a proficiency test score (i.e., TOEFL, TOEIC, IELTS, or high-intermediate on the GEPT) equivalent to the B2 level of the Common European Framework of Reference for Languages (CEFR).

Prior to the study, the participants were already familiar with the functions and appearance of captioning support, for it is widely used in multimedia-enhanced English learning settings.

3.2 Materials

3.2.1 Video Selection

The video used in this study was obtained from TED (Technology, Entertainment, and Design), a multifunctional platform that provides authentic listening materials. This feature is particularly useful for L2 learners who lack exposure to the target language outside of the classroom, such as the participants in this study. Hence, TED is considered a popular and suitable learning resource for L2 learners in an EFL setting. Another educational value of TED is that it provides real-time captioning and oral transcripts. L2 learners are free to utilize these functions, making their multimedia learning experience more flexible and accessible. In a nutshell, the popularity, suitability, and accessibility of TED made it an ideal database for selecting video materials.

Given this rationale, a TED talk video was selected based on the following three criteria suggested in Mirzaei et al. (2017): (1) linguistic difficulty level, (2) accent, and (3) topic familiarity. First, the linguistic difficulty of the video had to be compatible with the participants’ actual proficiency level, for employing level-appropriate materials is crucial in generating reliable and valid L2 learning outcomes in caption studies (Perez et al., 2013). Videos that contained low-frequency words or specialized academic terms were avoided to prevent misinterpretation or confusion (Mirzaei et al., 2017). Second, the speaker’s accent in the video had to be standard American English, for it is the most popular and common accent in the participants’ English learning experiences. Third, the topic of the video had to be familiar to the participants, for L2 learners’ background knowledge has been found crucial in modulating the listening outcome (Bloomfield, Wayland, Rhoades, Blodgett, Linck, & Ross, 2010; Gebhard, 2000).

Adhering to the above three criteria, this study utilized a level-appropriate TED talk video on successful leadership, a topic familiar to the participants, as the viewing material. The speaker provided an analysis of the patterns found in several successful leaders’ effective communicative behaviors. With its humor and character, the content of this video was considered captivating and educational for the participants in this study. The total length of the video was roughly 11 minutes, with an average speech rate of 4.18 syllables/second. The video was played on a projector screen in a noise-free classroom setting.

3.2.2 Caption Viewing Conditions

Two caption modes were used in this study: full captions and no captions. When the video was played, half of the participants received full captions of the corresponding oral text, displayed at the bottom of the screen along with the video content, while the other half viewed the video without captioning support. The video was played on YouTube, an online viewing platform that provides both captioning options at the viewer’s disposal.

3.3 Design

A 2x3x2 factorial design was used in this study to explore the impact of two within-subject independent variables (working memory capacity and modality preferences) on one dependent variable (listening comprehension scores) under one between-subject independent variable (two caption modes). A randomized block design (see Figure 2) was employed to assign participants equally to the experimental (caption) and control (no caption) groups. By doing this, factor variability was controlled to ensure group comparability prior to the study.

Figure 2. A visual schematization of the grouping procedure

The grouping procedure began with all participants receiving (1) a Caption Reliance Test (CRT) to measure their preferred input modality and (2) a reading-span task (RST) to determine their working memory capacity. After being assessed, all participants were first distributed into two groups based on their CRT results (visual or auditory learners). Then, in ascending order of RST scores (lowest to highest), a number was assigned to each participant (e.g., the participant with the lowest RST score was number “1”, the second lowest was number “2”, and so on). Once a number had been assigned to each participant, the odd-numbered participants were put into the no caption group, and the even-numbered participants were put into the caption group.
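The odd/even assignment rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_groups` and the tuple layout are hypothetical, ranking is assumed to be done separately within each modality block, and ties in RST scores are assumed to be broken by participant ID, none of which is specified in the procedure itself.

```python
from collections import defaultdict


def assign_groups(participants):
    """Assign participants to caption conditions by RST rank.

    participants: list of (participant_id, modality, rst_score) tuples,
    where modality is "visual" or "auditory" (from the CRT).
    Returns a dict mapping participant_id to "caption" or "no caption".
    """
    # Block participants by CRT modality first (assumption: ranking
    # is then carried out within each block).
    blocks = defaultdict(list)
    for pid, modality, rst in participants:
        blocks[modality].append((rst, pid))

    assignment = {}
    for members in blocks.values():
        # Number participants from the lowest to the highest RST score;
        # ties are broken by participant ID (assumption).
        for number, (_, pid) in enumerate(sorted(members), start=1):
            # Odd numbers go to the no caption group, even to caption.
            assignment[pid] = "no caption" if number % 2 == 1 else "caption"
    return assignment
```

Because neighboring ranks alternate between the two conditions, the RST distributions of the two groups stay close to each other within each modality block.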

As Figure 2 shows, this grouping process helped ensure that there was no statistically significant difference between the caption and the no caption groups in terms of their RST scores, which, in turn, resulted in comparable data for later analysis. In addition, to make inferences about the modulating role that working memory capacity played in this study, the participants were divided into two groups (i.e., high and low) at the 50th percentile of their RST scores. Scores below the 50th percentile were categorized as “low”, while scores above the 50th percentile were categorized as “high”.
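The high/low split at the 50th percentile can be illustrated as follows. This is a minimal sketch, not the procedure's actual implementation: the function name `split_by_median` is hypothetical, the 50th percentile is taken to be the sample median, and scores falling exactly on the cutoff are left unlabeled because the text does not say how they were handled.

```python
from statistics import median


def split_by_median(rst_scores):
    """Label each participant's working memory capacity as high or low.

    rst_scores: dict mapping participant_id to RST score.
    Returns (cutoff, labels), where labels maps participant_id to
    "high", "low", or None for a score exactly at the cutoff.
    """
    cutoff = median(rst_scores.values())  # 50th percentile (assumption)
    labels = {}
    for pid, score in rst_scores.items():
        if score > cutoff:
            labels[pid] = "high"
        elif score < cutoff:
            labels[pid] = "low"
        else:
            # The handling of scores exactly at the 50th percentile
            # is not stated, so such cases are left undecided here.
            labels[pid] = None
    return cutoff, labels
```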

Under both the experimental and control conditions, the participants received a listening comprehension test to assess their understanding of the viewing materials. A questionnaire was distributed at the end of the study to collect information on the participants’ backgrounds, their language learning experiences, and their perceptions of the tasks.

3.4 The Assessment Tasks and Scoring

The instruments that were employed in this study were: (a) the Caption Reliance Test (CRT); (b) a reading-span task (RST); (c) a listening comprehension test; (d) a post-study questionnaire.

3.4.1 The CRT

This study used the CRT to determine participants’ modality preferences in real-time input processing based on two criteria: suitability and practicality. First, although the CRT was originally developed to measure L2 learners’ reliance on captioning (Leveridge & Yang, 2014), it was later modified and used in Liu and Todd’s (2014) study to determine L2 learners’ online modality preferences (i.e., visual and auditory). Second, the items and length of the CRT can be easily adjusted to different kinds of test takers (e.g., an individual or groups of participants) as well as different testing environments (e.g., laboratories and classrooms). The CRT was, therefore, suitable for the investigation in this study and also highly practical for researchers and teachers with different purposes.

The CRT consisted of items that involved two successive components: (1) watching a captioned video of a short dialogue between a man and a woman and (2) answering a multiple-choice comprehension question. The videos and questions were level-appropriate and suitable for the participants in this study. To determine modality preference online, the CRT featured congruent and, importantly, incongruent items. In the congruent items, the audio texts and captions were identical; in the incongruent items, there was a one-word mismatch between what the participants heard and what they saw.

In one such item, for example, participants simultaneously saw “meditation” in the caption and heard “medication” in the audio while viewing the captioned video. Under this circumstance, there was no right or wrong answer to the succeeding comprehension question, as both options appeared in the test item below:

Question: What can be the man’s solution to his problem?

(A) Meditation (B) Regulation (C) Medication (D) Recitation

Following Mayer’s CTML, it was assumed that visual learners were more likely to attend heavily to captions and thus to give answers based on what they saw. In this respect, they were more likely to choose (A) in the example above. Auditory learners, on the other hand, were more likely to select (C), for they preferred to rely on input presented in sound form. To accurately assess preferred modality online, it was imperative that the participants remain unaware of the incongruent items during the CRT. To prevent them from noticing the mismatches, this study adopted a ratio of 75% congruent to 25% incongruent items, based on the guideline prescribed by Leveridge and Yang (2013). The participants’ performance on the incongruent items was the focus of the succeeding analysis to determine their online modality preference.

Following the scoring policy used in Liu and Todd (2014), one point was awarded under the “visual modality” for choosing (A), and one point was awarded under the “auditory modality” for choosing (C). After totaling the points for each modality, each participant’s preferred modality (PM) index value was calculated by dividing the visual modality points by the auditory modality points (see Table 1).

The participants relying more on visual input, such as captions, achieved a PM index value larger than 1 (e.g., 7 visual modality points / 5 auditory modality points = PM index value > 1), while those attending more to the audio texts achieved a PM index value of less than 1 (e.g., 3 visual modality points / 9 auditory modality points = PM index value < 1). This categorization, consequently, divided participants into two groups: (1) visual learners (PM value > 1) and (2) auditory learners (PM value < 1).

As for the outliers, if the participants chose answers from neither the captions nor the auditory text (answers other than “medication” or “meditation” in the above example), one point was recorded under “non-target” answers. Two or more “non-target” points across the ten incongruent items suggested that the participants did not consistently choose the appropriate cues, either auditory or visual (captions), to comprehend the video content. This indicated that the CRT may have been too difficult for these participants; in this case, their performance data were excluded from the analysis.
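The PM index and the exclusion rule can be expressed as a short worked example in Python. It is a sketch under assumptions rather than the scoring script used in the study: the function name `classify_modality` is hypothetical, and the handling of two edge cases that the text does not address (zero auditory points, and a PM index of exactly 1) is an assumption marked in the comments.

```python
import math


def classify_modality(visual_points, auditory_points, non_target_points):
    """Return (pm_index, label) for one participant's incongruent items.

    label is "visual", "auditory", "undetermined", or "excluded".
    """
    # Two or more non-target answers: data excluded from the analysis.
    if non_target_points >= 2:
        return None, "excluded"

    if auditory_points == 0:
        # Division is undefined here; treating the index as infinite,
        # and the learner as visual, is an assumption.
        return math.inf, "visual"

    pm_index = visual_points / auditory_points
    if pm_index > 1:
        return pm_index, "visual"
    if pm_index < 1:
        return pm_index, "auditory"
    # A PM index of exactly 1 is not covered by the categorization
    # above, so it is reported as undetermined (assumption).
    return pm_index, "undetermined"
```

With the figures from the examples above, 7 visual and 5 auditory points give a PM index of 1.4 (a visual learner), whereas 3 visual and 9 auditory points give roughly 0.33 (an auditory learner).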

To ensure the content validity of the CRT, three experts with professional experience in the field of TESOL were consulted. Also, the British National Corpus (BNC) was used to ensure that the difficulty level of the CRT matched the participants’ L2 level. In addition, a pilot test revealed a close-to-even ratio between visual and auditory learners, which was almost identical to the ratio obtained from the actual results in this study. The near-identical ratio between the pilot and the experimental outcomes supported the reliability of the CRT.

3.4.2 Reading-Span Task (RST)

Many measures have been used to assess L2 learners’ working memory capacity, but not all of them can provide a complete picture of such a “dynamic” memory system. This system, according to Baddeley (2003), is responsible for storing, rehearsing, and simultaneously processing multiple inputs in real time. To orchestrate these dynamic processes, L2 learners have to rely on their executive (attentional) control and rehearsal mechanisms, which makes both aspects of working memory functioning crucial considerations in choosing a measure.

To date, three working memory measures have been designed to tap into the functioning of the executive control and rehearsal mechanisms: counting span, operation span, and reading span tasks. In the current study, the reading span task (RST), originally developed by Daneman and Carpenter (1980), was used for reasons of practicality. Unlike the other two working memory assessment tasks, the RST can be easily administered to a group of students in classroom settings (Turner & Engle, 1989), allowing instructors to efficiently make inferences about their students’ working memory capacity for pedagogical purposes.

Typically administered on a computer, the RST had 12 trials in total. Each trial featured a sentence set consisting of 2-5 sentences: 3 trials of 2 sentences, 3 trials of 3 sentences, 3 trials of 4 sentences, and 3 trials of 5 sentences. The Corpus of Contemporary American English (COCA) was used to determine the frequency of the words in the sentences and thereby ensure that their difficulty level was appropriate for the participants in this study. All the sentences were composed of 13-16 words and were, therefore, comparable in length (sentence length range: 12-17 words, 20-22 syllables, 55 to 73 letters). The maximum presentation time for each sentence was 6.5 seconds. All sentence sets were presented to the participants in random order on a computer screen to control for order effects (Engle, Cantor, & Carullo, 1992).
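The trial structure described above implies 3 × (2 + 3 + 4 + 5) = 42 sentences, and therefore 42 letters to be recalled, across the 12 trials. The sketch below builds such a randomized trial order in Python; the function names, the default arguments, and the use of a seeded random generator are illustrative assumptions, not details of the software used in the study.

```python
import random


def build_trial_order(set_sizes=(2, 3, 4, 5), trials_per_size=3, seed=None):
    """Return a randomized list of sentence-set sizes, one per trial."""
    trials = [size for size in set_sizes for _ in range(trials_per_size)]
    rng = random.Random(seed)  # a seed makes the order reproducible
    rng.shuffle(trials)
    return trials


def max_reading_time(trial_order, seconds_per_sentence=6.5):
    """Upper bound on sentence presentation time, in seconds."""
    return sum(trial_order) * seconds_per_sentence
```

At the maximum presentation time of 6.5 seconds per sentence, sentence reading alone takes at most 42 × 6.5 = 273 seconds; the remainder of the session is taken up by sentence verification, letter presentation, and recall.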

The first trial of the RST began with the participants reading a sentence on the projector screen (e.g., Emily slapped the face of a ridiculous sky). A slide showing “yes” and “no” was then presented, and the participants judged whether the sentence made sense (Turner & Engle, 1989). Immediately after they responded, an isolated letter appeared after a 1-second delay, and the participants attempted to retain the letter in their minds for later recall. After this procedure had been repeated for all the sentences in a given trial, a question mark (see Figure 3) appeared on the screen to prompt the participants to recall all the letters presented in the trial.

Figure 3. The procedure of a trial in the RST.

The first trial ended when the participants indicated that they had finished recalling all the letters; they then proceeded to the second trial. A paper-based instrument was used to collect all of the responses in this study. The entire task (12 trials) took roughly 20 minutes to complete.

To calculate the RST scores, the number of isolated letters recalled across all trials was totaled (one point for each accurately recalled letter) (Friedman & Miyake, 2004; Turner & Engle, 1989). The final scores allowed the researcher to make inferences about each participant’s working memory capacity, with higher scores indicating greater capacity. Nevertheless, not all recalled letters were counted and included in the analysis. In order to ensure that the participants engaged in both storing (remembering isolated letters) and processing (understanding the sentences), two critical functions of working memory, during the RST, this study adopted the 80% accuracy criterion from Turner and Engle (1989) for the sentence verification. In their study, those with a total accuracy rate below 80% were excluded from the analysis, for this suggested that the participants might have been focusing exclusively on the isolated letters without devoting their attentional resources to processing sentences. This criterion, therefore, helped guarantee that the participants’ attention was directed to both the processing and the storing components of the task.
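The scoring rule and the 80% criterion can be sketched as follows. This is an illustrative example under assumptions, not the scoring procedure actually used: the function name `score_rst` and the dictionary keys are hypothetical, and a recalled letter is counted as accurate if it was presented in the same trial regardless of recall order, which the text does not specify.

```python
def score_rst(trials, accuracy_threshold=0.80):
    """Score one participant's reading-span task.

    trials: list of dicts, each with
        "presented": letters shown in the trial,
        "recalled":  letters the participant wrote down,
        "judgments": booleans, True if a sentence verification
                     (yes/no) response was correct.
    Returns (letter_score, verification_accuracy, included).
    """
    letter_score = 0
    correct_judgments = 0
    total_judgments = 0
    for trial in trials:
        # One point per presented letter that was recalled
        # (order-free matching is an assumption).
        letter_score += len(set(trial["presented"]) & set(trial["recalled"]))
        correct_judgments += sum(trial["judgments"])
        total_judgments += len(trial["judgments"])

    accuracy = correct_judgments / total_judgments
    # Participants below the threshold are excluded from the analysis.
    included = accuracy >= accuracy_threshold
    return letter_score, accuracy, included
```

A participant who recalls many letters but verifies fewer than 80% of the sentences correctly is flagged for exclusion, since the high letter score may reflect attention to storage alone.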

To ensure the content validity of the RST, three experts in the field of TESOL were consulted. Also, the British National Corpus (BNC) was used to ensure that the difficulty level of the RST matched the participants’ L2 level. In addition, a pilot test was administered to establish the appropriate time lapse for sentence presentation and verification.

3.4.3 The Listening Comprehension Test

After watching the video, the participants, irrespective of their video viewing condition (with or without captioning support), took a listening comprehension test modeled on the listening section of the TOEFL test, i.e., listening to the text provided and answering multiple-choice comprehension questions (see Appendix 2) on an answer sheet. This comprehension test contained 15 test items to assess participants’ global, local, and inferential understanding of the content. Each item consisted of four options with only one correct answer.

The listening comprehension test was administered in a classroom setting with the assistance of a computer, a projector, and a screen. The participants heard and read the stem of each comprehension question on the screen. They then read the options on the screen before writing their answers on the answer sheet. Each correct answer was given one point, whereas an incorrect answer received no points. A pilot test was conducted to ensure the validity and reliability of the test items.

3.4.4 Questionnaire

The questionnaire (see Appendix 3) administered at the end of the study aimed to elicit participants’ general experience of watching captioned videos and their perceptions of watching the (captioned) video in this study. Clear instructions were given to ensure that all participants understood the meaning of the five-point Likert scale items.

Two versions of the questionnaire were distributed, one to the no-caption participants and one to the full-caption participants. The no-caption L2 learners were given the first version. In this version, the first section contained 8 items that aimed to collect the participants’ general experience of watching captioned videos in their everyday lives. The second section contained 4 items to probe into the participants’ video viewing experience in this study, with constructs such as interest in the task, topic familiarity, level appropriateness of the task, and the degree of mental effort invested to complete the task. For the full-caption L2 learners, the first section of the questionnaire was identical to that of the no-caption version. The second section, however, contained 10 items to probe into their captioned video viewing experience. Specifically, the second section included items that attempted to elicit how the presence of captions affected the participants’ processing of the multimodal input. The questionnaire ended with an invitation for participants to give open-ended comments on their experience during task performance.

3.4.5 Interview

To collect participants’ open-ended comments on their (captioned) viewing experience during task performance, a follow-up interview was conducted after the participants filled out the questionnaires. Two experts were in charge of analyzing and interpreting the collected data with the aim of triangulating the questionnaire results.

3.5 Data Collection Procedure

Data collection began with gaining ethical approval. All participants were asked to sign a consent form after reading the information sheet that explained the overall structure and procedure of the study, the potential benefits of participation, the measures taken to guarantee their privacy and confidentiality, and the right to withdraw from the study at any given moment (see Appendix 1). Next, all participants were grouped into two viewing conditions based on their CRT and RST scores (see Figure 2). One week after the CRT and RST tests, each group was separately invited to take part in a video viewing session. After the participants viewed the video, a listening comprehension test was administered to assess their understanding of the viewing material. The session ended after the participants completed the post-study questionnaire, followed by an interview session.

3.6 Statistical Analysis

STATISTICA 13.0 for Windows was used to generate descriptive statistics for the data in this study. A three-way ANOVA was used to explore the effects of the independent variable (i.e., caption modes) and the moderating variables (i.e., working memory capacity and modality preference) on the dependent variable (i.e., listening comprehension scores). The grouping conditions in this study resulted in a sufficient number of participants per condition to use a three-way ANOVA to test main effects and interactions among variables. Also, ANOVA is suited to data that contain one interval-scale variable (i.e., listening comprehension) and three categorical variables (i.e., caption modes, working memory capacity, and modality preference). Independent t-tests were used to compare the means of different conditions. The alpha level for all tests was set at .05.
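For readers unfamiliar with the post-hoc comparison, the independent t statistic can be computed by hand as in the sketch below. It is only an illustration: the analyses in this study were run in STATISTICA, the function name `independent_t` is hypothetical, and the equal-variance (pooled) form of the test is assumed because the text does not state which variant was used.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Return (t, degrees_of_freedom) for two independent samples.

    Uses the pooled-variance (Student) form of the test, which
    assumes equal population variances.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, df
```

The resulting t value is then compared against the t distribution with the returned degrees of freedom at the .05 alpha level; that lookup is left to a statistics package.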

CHAPTER 4 RESULTS

This study set out to investigate whether working memory capacity and online preferred modality moderated the effect of captioning on L2 listening comprehension. The following sections present the quantitative and qualitative data in turn. To begin with, an overview of the participants’ caption viewing behaviors is given through the descriptive statistics. Then the three-way ANOVA results are reported, along with the results of the independent t-tests used for post-hoc analysis. The questionnaire data, supported by the interviews, are presented last.

In this study, a total of 84 participants were recruited for the experiment. Twelve of them were excluded from later analysis because they reported having already watched the selected (captioned) video. Additionally, they felt that the content of the viewing materials was “too easy” for them. Because neither of these was desirable in a participant, their data were left out of the later analysis.

4.1 Quantitative

4.1.1 Descriptive Statistics

Table 2 presents an overview of the participants’ listening comprehension scores in relation to the factors of this study. Although the mean difference between the control (no caption) and experimental (full caption) conditions is small (M diff. = .005), larger variance is found in every condition with captions than without captions. Notably, the variance in the full caption conditions increases when individual differences (i.e., online preferred modality, working memory capacity, or both) are taken into account. As this pattern manifests consistently throughout Table 2, it calls for a more fine-grained analysis of the potential effects and interactions among the
