Comparing the effects of different caption modes on listening comprehension of EFL learners with different modality preferences

Master's Thesis
Department of English, National Taiwan Normal University

Advisor: Dr. Yeu-Ting Liu
Student: Ping-Jung Lee

April 2018

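CHINESE ABSTRACT (摘要)

Many existing studies indicate that captions can effectively support second-language learners' listening comprehension. However, because human cognitive capacity is limited and captions appear on screen only briefly, not all learners can attend to captions throughout a video. Moreover, because captioned videos present information in several modalities at once, learners' modality preferences readily affect how they allocate attention while viewing. To maximize the pedagogical value of captions and captioned videos, it is necessary first to understand how learners' modality preferences affect the way they process captions and how well they understand the content.

This study investigates the effects of different caption modes (full, partial, and real-time captioning) on the listening comprehension of English learners with different modality preferences (visual and auditory). The participants were 95 high-intermediate English learners in Taiwan. They first took the Caption Reliance Test, which identifies the modality a learner prefers when processing real-time input, and were classified as visual or auditory learners according to the results. They were then randomly assigned to four groups, watched a video under one of four caption conditions (no, full, partial, or real-time captioning), and completed a comprehension test.

The results show that caption mode alone produced no significant difference in comprehension performance, but the difference became significant once modality preference was taken into account: auditory learners performed best with partial captions and worst with full captions, whereas visual learners performed best with full captions and worst with partial captions. This indicates that the benefit of each caption mode for listening comprehension varies with learners' modality preferences. The findings highlight the importance of individual differences among second-language learners and encourage instructors who use captioned videos as teaching materials to identify learners' modality preferences and to incorporate differentiated instruction where appropriate in order to improve learning outcomes.

Keywords: caption mode, modality preference, second-language listening comprehension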

ABSTRACT

Existing research has established captions as effective second-language (L2) listening comprehension aids. However, because human cognitive capacity is limited and captions are transient, not all L2 learners are able to attend to captions in all cases. To optimize the pedagogical value of captions and to use captioned videos (multimodal materials frequently used in L2 classrooms) for the benefit of L2 learners, it is important to understand how L2 learners' preferred modality (i.e., visual or auditory) affects their processing of captions and the resulting comprehension outcomes.

The current study set out to investigate the effects of different caption modes (full vs. partial vs. real-time) on the L2 listening comprehension of 95 high-intermediate Taiwanese EFL learners with different modality preferences. All participants took the Caption Reliance Test (CRT), an instrument used in L2 research to determine learners' modality preference in real-time L2 processing, and were categorized as visual or auditory L2 learners based on their test results. Next, the participants were randomly assigned to four video caption viewing conditions (no, full, partial, and real-time captioning) and completed an exit comprehension test.

The results showed no significant difference in the participants' performance across the four caption conditions when L2 learners' modality preference was not considered. When it was considered, however, the difference became significant, which suggests that captions have a selective effect on L2 learners with different modality preferences. Auditory L2 learners performed best under the partial caption condition and worst under the full caption condition, whereas visual L2 learners scored highest under the full caption condition and lowest under the partial caption condition. The finding underscores the importance of considering L2 learners' processing profiles when using captioned video as instructional material and of providing differentiated video materials for optimal learning outcomes.

Keywords: caption mode, modality preference, L2 listening comprehension

ACKNOWLEDGEMENT

Thesis writing is undeniably a journey that requires dedication and perseverance. Were it not for the love and support I have been granted along the way, the completion of this study would not have been possible.

My most profound gratitude goes to my advisor, Professor Yeu-Ting Liu, who guided me through this academic adventure with professional instruction and constant encouragement. He was the one who recognized my potential in the first place and gave me the courage to challenge myself with high standards and ambitious goals. I am grateful for his inspiring input and constructive feedback, which not only helped me accomplish this mission but also fostered my growth in critical thinking, research design, and academic writing. More importantly, the emotional support he provided was an indispensable source of strength that helped me overcome difficulties and continue to march on. His dedication to his work and his students deeply inspired me to aim high and to become a giving person with love, patience, and wisdom.

My next sincere appreciation goes to my committee members, Professor Chun-chieh Tseng from NTNU and Professor Wen-ta Tseng from NTUST. I am grateful for the time and effort they invested in reviewing my thesis. Their insightful comments and suggestions helped improve and refine my work in all aspects, and their kind recognition of my achievement meant a lot to me.

I am also indebted to the professors who supported the participant recruitment: Professor Yeu-Ting Liu, Professor Chun-chieh Tseng, Professor Hsi-chin Chu, Professor Yu-chuan Shao, Professor Hui-hua Wang, Professor Hsi-yao Su, Professor Pei-ching Chang, Professor Mei-chen Wu, Professor Li-hsin Ning, and Professor Justin Prystash. They were kind enough to allow me into their classrooms to promote my thesis experiment. Without their generosity, the target number of participants could never have been reached within weeks. Special thanks also go to all the participants who took part in the experiment. Thanks to their contribution and cooperation, I was able to collect sufficient data from which valuable insights could be drawn.

In addition, my thesis adventure would not have been so rewarding without the support and assistance of my dear friends. First, I would like to express my greatest gratitude to my thesis partner, Emily Kam, who accompanied me through all the ups and downs as we sailed across the stormy sea of academic research. I cherish the mutual trust and respect established in our friendship and partnership, as well as the days and nights we spent together pursuing perfection. Deepest gratitude also goes to Emily's father, who allowed us to hold a whole-day discussion in his laboratory, and to her father's student, Ming-feng Chuang, who selflessly provided technical support and expert advice on the statistical analysis. Special thanks to Eric Ku, Winnie Chiu, Jojo Tang, and Edward Chin for their assistance in videotaping the CRT videos and selecting keywords for partial captioning. The valuable time and effort they devoted gave birth to the research materials that help shed light on the topic investigated. I especially want to express my gratitude to my friend Edward, who has been a supportive cheerleader, listened to my anxiety in moments of self-doubt, and constantly reminded me to have faith in myself.

Finally, I owe a debt of gratitude to my family, especially my mother, whose unconditional love and support enabled me to complete my graduate study without worries. She has been the greatest blessing in my life, and it is to her that I would like to dedicate my thesis.

TABLE OF CONTENTS

CHINESE ABSTRACT (摘要)
ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF TABLES AND FIGURES
CHAPTER ONE: INTRODUCTION
CHAPTER TWO: LITERATURE REVIEW
    Listening as a multimodal process
    Theoretical underpinnings of using captions to assist L2 listening
    Captions and L2 listening
        Full captioning and L2 listening
        Relative effects of partial and full captioning on L2 listening
        Real-time captioning and L2 listening
    The role of modality preference in viewing L2 captioned videos
    Summary
CHAPTER THREE: METHODOLOGY
    Participants
    Materials
        Video selection
        Caption mode environment
            Full captioning
            Partial captioning
            Real-time captioning
    Design
    Instruments
        Caption Reliance Test (CRT)
        Listening comprehension test
        Questionnaire
    Procedure of data collection
    Data analyses
CHAPTER FOUR: RESULTS
    Descriptive statistics
    Two-way ANOVA analysis
    Post-hoc test results
    Questionnaire results
CHAPTER FIVE: DISCUSSION
    Research Question 1: Overall effects of different caption modes
    Research Question 2: Effects of L2 learners' modality preference under different caption modes
        Effects of different caption modes for visual L2 learners
        Effects of different caption modes for auditory L2 learners
CHAPTER SIX: CONCLUSION
    Summary of the major findings
    Pedagogical implications
        Differentiated caption support is warranted and should be implemented based on L2 learners' modality preference.
        CRT serves as a feasible tool to determine L2 learners' modality preferences in L2 classroom and self-learning scenarios.
    Limitations and suggestions for future research
REFERENCES

LIST OF TABLES AND FIGURES

TABLES
Table 1. CRT test ballot sheet
Table 2. Means and standard deviations on listening comprehension
Table 3. Two-way ANOVA analysis results
Table 4. Fisher's LSD post hoc test results
Table 5. Descriptive statistics of the five-point Likert scale items

FIGURES
Figure 1. Screen shots of a TED talk with YouTube self-generated real-time captioning
Figure 2. Screen shots of the three caption modes operationalized in the study
Figure 3. A visualization of the participants' grouping process
Figure 4. Listening comprehension test means of visual and auditory learners under four caption conditions
Figure 5. Visual schematization of between-group comparisons between visual and auditory L2 learners
Figure 6. Visual schematization of respective within-group comparisons under the four caption conditions

CHAPTER ONE: INTRODUCTION

Scholars have found that listening in the first language (L1) is a multimodal process in which various cues are automatically and efficiently drawn upon to interpret the intended meaning of an oral discourse (Vigliocco, Perniss, & Vinson, 2014). This multimodal listening process, however, is not automatic in the second language (L2) because L2 learners' oral decoding skills are still immature. For this reason, many L2 instructors and learners draw on captioned videos (videos accompanied by a transcription of the corresponding oral discourse) to help perceive word boundaries, parse running speech segments, and hence better understand L2 oral discourse (Danan, 2004).

To gain insight into the pedagogical potency of captioned videos for various aspects of L2 listening, many empirical studies have been undertaken, and they have established the efficacy of captioned videos in assisting L2 listening (BavaHarji, Alavi, & Letchumanan, 2014; Garza, 1991; Huang & Eskey, 1999; Perez, Van Den Noortgate, & Desmet, 2013; Winke, Gass, & Sydorenko, 2010). For instance, Winke et al. (2010) found that captioned video led to significantly higher comprehension test scores than noncaptioned video, demonstrating the beneficial role of captions in helping foreign language learners make sense of multimodal input.

Nevertheless, the benefit of captioning is not universal. Because captions are transient, L2 learners, especially those whose L2 oral decoding skills are not yet automatic, might not attend to captions in all cases (Kruger & Steyn, 2014; Winke, Gass, & Sydorenko, 2013). Specifically, captions are mostly presented as full captioning, which provides L2 learners with a synchronized transcription of the complete corresponding oral discourse. However, given their limited cognitive capacity and the transient nature of captions on the screen, L2 learners may selectively attend to only parts of full captions, a scenario reminiscent of L2 learners' nonlinear reading behaviors (Bolter, 2001; Rayner, 1998). If L2 learners do not consistently attend to full captions, the advantage of using them to aid comprehension of multimodal materials would be compromised (Kruger & Steyn, 2014).

One possible way to optimize L2 learners' attention allocation to captions is to manipulate the presentation of captions. Two alternatives to full captioning, partial and real-time captioning, have been developed and appear in recent video materials and applications (e.g., Live Caption by Gut Reaction), some of which are specifically designed for language learning purposes (see Mirzaei, Meshgi, Akita, & Kawahara, 2017). In partial captioning, only parts of the corresponding oral discourse are presented. Owing to the "selective" nature of partial captioning, L2 learners might have a better chance to attend to the presented visual text (Guillory, 1998). Real-time captioning, unlike full captioning, where each stretch of transcribed discourse is presented all at once, displays the oral discourse incrementally and verbatim. Both partial and real-time captioning, despite their potential differences in text-to-speech alignment, aim to redirect and hence optimize L2 learners' attentional allocation. Such redirection of attention is believed to increase the likelihood that L2 learners make better use of captions as comprehension aids, thereby amplifying the effects of captioning (Kruger & Steyn, 2014). However, empirical evidence is needed to establish the (relative) efficacy of the two caption presentations in aiding L2 learners' listening comprehension.

Despite the potential of caption manipulation to alter L2 learners' viewing behaviors, whether such alteration helps or hinders L2 comprehension might depend on one important variable: L2 learners' modality preferences/predilections¹ in real-time input processing (i.e., visual, auditory, or balanced).
It is conceivable that L2 learners with different modality predilections (e.g., visual vs. auditory learners) might exhibit different viewing behaviors when attending to different caption modes. For instance, captions may not be consistently attended to if visual input is not a learner's preferred modality. Therefore, it is important to take L2 learners' modality preferences into consideration when investigating the pedagogical value of different caption modes.

Given that captioned videos are often used both in L2 classrooms and in L2 self-learning scenarios to assist L2 comprehension, whether differentiated caption instruction is needed for L2 learners with different profiles is a crucial question for L2 instructors and learners alike. For L2 instructors and developers of L2 self-learning platforms and software to provide caption support more effectively, an investigation of how L2 learners' modality preferences affect L2 comprehension under different caption conditions is warranted. However, caption presentation mode and L2 learners' modality preferences have not been considered simultaneously in existing L2 caption research.

In light of this, the current study investigated the effects of three caption modes (full, partial, and real-time captioning), compared with a no-caption control group, on the comprehension of L2 learners with different modality preferences (visual, auditory, and balanced). The study aimed to shed light on the optimal caption presentation for L2 learners with different processing preferences. The research questions are as follows:

1. Do caption presentation modes (i.e., no, full, partial, and real-time captioning) affect L2 learners' comprehension? If so, to what extent?
2. Do L2 learners' modality preferences (i.e., visual, auditory, and balanced) affect their comprehension in the four caption modes respectively? If so, to what extent?

¹ The two terms, modality preference and modality predilection, are used interchangeably in the current study. Both refer to the modality each L2 learner is inclined to focus on more when presented with multimodal materials.

CHAPTER TWO: LITERATURE REVIEW

2.1 Listening as a multimodal process

Multimodality, as stated in Kress (2010), is "the normal state of human communication" (p. 1), which entails accessing information through diverse semiotic modes such as text, image, and gesture. Listening, a fundamental element of human communication, is no exception. Whether in L1 or L2, listening involves multimodal input processing, that is, information processing through multiple modalities (e.g., auditory input of speech, facial expressions, and manual gestures) (Vigliocco et al., 2014). This multimodal processing requires listeners to decode different sensory representations, which are then mapped onto the knowledge stored in long-term memory to extract meaning from an utterance (Winke et al., 2010). For L2 learners, however, this process is not automatic owing to their unsophisticated L2 oral decoding skills (Vandergrift, 2004; Vanderplank, 2010).

To facilitate L2 learners' listening comprehension, instructors have turned to multimedia materials such as audio books and captioned videos to enhance learners' multimodal L2 listening experience. Among the various kinds of multimedia instructional tools, captions have been empirically established as powerful L2 listening comprehension aids (e.g., Garza, 1991; Perez et al., 2013; Winke et al., 2010). However, whether and how these multimedia materials should be implemented has been a matter of concern for CALL researchers (e.g., Guichon & McLornan, 2008; Kozan, Erçetin, & Richardson, 2015), since multimodal input might not benefit all L2 learners alike (Mayer, 2005). In the case of captions, the transient display of textual support and the limits of human processing are likely to result in different caption-viewing behaviors among L2 learners, which would in turn modulate the effects of captions as L2 listening aids (Kruger & Steyn, 2014). Therefore, exploring how alternative caption presentations and L2 learners' individual differences in real-time processing influence the effects of captioned videos is essential for both researchers and practitioners when incorporating multimodality into L2 listening training.

To contextualize the current study, the following section provides the theoretical background of using captions as L2 listening aids by referring to multimodality-related beliefs and Mayer's (2001, 2005) cognitive theory of multimedia learning. The assumptions proposed by Mayer also support the investigation of the two target factors (i.e., caption modes and L2 learners' modality preferences).

2.2 Theoretical underpinnings of using captions to assist L2 listening

The rationale for adding captions to enhance L2 listening has its roots in the prevailing belief that more cognitively challenging tasks lead to more effective listening (Robinson & Gilabert, 2007; Robinson, 2001). Incorporating multiple modalities into L2 listening materials is, inter alia, an effective way to render a listening task cognitively challenging and to enhance L2 learners' listening development (Guichon & McLornan, 2008; Winke et al., 2010). Because L2 learners' cognitive capacity is limited, L2 materials presented through multiple modality channels require more attentional resources to comprehend, thus resulting in greater depth of information processing. In this vein, captioned videos have been used extensively in various L2 listening activities to add textual support to the originally dual-modality input (i.e., auditory input of oral discourse and visual images) and thereby enhance L2 listening outcomes.

The beneficial role of multimodal materials in facilitating L2 listening is also advocated by Mayer's (2001, 2005) cognitive theory of multimedia learning (CTML). CTML rests on the belief that multimedia instruction designed in line with human cognitive processing is more likely to generate meaningful learning. In this regard, the multichannel input presented in captioned videos, whose comprehension requires multimodal processing on the part of L2 learners, could potentially enhance L2 multimodal listening. Furthermore, Mayer specified three assumptions about the human information processing system (dual channels, limited capacity, and active processing), the first of which lends support to captions as comprehension aids. The dual channels assumption suggests that simultaneous processing of both verbal input (i.e., spoken or written words) and nonverbal input (such as pictures or videos) leads to better learning outcomes than processing of either input alone. According to Mayer, when L2 learners are presented with input from both channels concurrently, they can construct more coherent and elaborate mental representations and integrate them with prior knowledge more effectively, which leads to greater depth of processing.

On the other hand, the remaining two assumptions posited by Mayer (2001, 2005), namely the limited capacity and active processing assumptions, help pinpoint the limitation of traditional full captioning and the significance of considering L2 learners' modality preferences in caption research. The limited capacity assumption holds that humans can process only a certain amount of information in each channel at one time because the capacity of working memory is finite. Under this circumstance, L2 learners are likely to process only parts of full captioning, which, as mentioned in Chapter One, might jeopardize the advantages of captions as tools for facilitating L2 listening. This assumption thus speaks to the need for different caption presentations as a means of redirecting L2 learners' attention to captions, which is believed to optimize their learning outcomes. Next, the active processing assumption emphasizes the active role of the mind in information processing. Mayer (2001, 2005) argues that, instead of passively receiving verbal or nonverbal input, humans actively select and organize incoming materials before integrating them with previously stored knowledge to achieve meaningful learning.

During this process, the selection of input could differ significantly between individuals depending on their input processing profiles, such as their working memory capacity and modality preference. In the case of viewing multimodal materials such as captioned videos, L2 learners' attention allocation would vary specifically according to their modality preferences (i.e., visual, auditory, and balanced). Consequently, as mentioned in Chapter One, L2 learners' modality preference might be a crucial factor determining whether captions enhance or impair L2 listening comprehension.

In sum, the pedagogical value of captioned video in L2 listening is theoretically grounded in the belief that multimodal input invokes deeper processing and meaningful learning. However, Mayer's (2001, 2005) CTML implies that the pedagogical potency of captioned video materials may be modulated by different caption presentation modes and by how L2 learners with different modality preferences react to them. The following two sections provide the empirical background of these two factors respectively.

2.3 Captions and L2 listening

The advantages of using captions to assist L2 listening have been widely acknowledged in L2 acquisition and CALL research (e.g., Garza, 1991; Guichon & McLornan, 2008; Hayati & Mohmedi, 2011; Huang & Eskey, 1999; Markham, Peter, & McCarthy, 2001; Perez et al., 2013; Winke et al., 2010). Empirical research has established that captions in audiovisual materials can help L2 learners better parse the aural input through phonological visualization (Vanderplank, 1988; Guichon & McLornan, 2008) and offer scaffolding for L2 learners' comprehension when they encounter linguistic elements beyond their current level (Danan, 2004). In terms of affective factors, captions have been found to enhance L2 learners' learning motivation and to reduce their anxiety when they miss certain parts of the audio text (Almeida & Costa, 2014; Vanderplank, 2010). Most importantly, the multisensory presentations and multimodal input provided by captioned videos have been shown to increase the depth of L2 learners' listening comprehension (Danan, 2004; Winke et al., 2010).

During the past decade, with the advance of educational technology, videos with various caption presentation methods have been used to enhance L2 listening comprehension outcomes. Among them, full captioning, partial captioning, and real-time captioning are the three most popular options in research and practice. While full captioning has been the primary focus of existing L2 caption research and practice, partial and real-time captioning have only started to receive attention from practitioners. Although some studies have set out to examine the relative efficacy of full captioning and partial captioning (e.g., Guillory, 1998; Mirzaei et al., 2017; Perez et al., 2014), no empirical evidence is available on whether the current display possibilities (full, partial, and real-time captioning) benefit L2 learners with different input processing profiles to the same extent. To contextualize the current study, the following subsections introduce these three caption modes in turn, along with reviews of related L2 studies.

2.3.1 Full captioning and L2 listening

Existing studies comparing full captions with no captions have generally demonstrated positive effects of full captioning on L2 learners' comprehension of audiovisual materials. Garza (1991), one of the earliest canonical studies, evaluated the use of captioned video with advanced L2 learners of Russian and English and found that the caption group significantly outperformed their counterparts in the content-based comprehension test. Huang and Eskey's (1999) study generated similar results, suggesting that intermediate ESL learners who watched television series with full captions scored higher in the listening comprehension test.

(17) More recently, in Winke et al.’s (2010) study, a group of second- and fourth-year Spanish learners were asked to watch three selected videos twice with or without full captions. The results showed that the full caption group scored significantly higher than the no caption group in the multiple-choice comprehension tests. This finding, according to Winke et al. (2010), underscores the importance of multi-modality input in L2 listening since different input modes would reinforce one another and thus increase the depth of processing and facilitate comprehension. Another line of research where scholars compared L1 subtitles and L2 captions with no caption also supports the use of full captions to facilitate L2 listening. Markham et al. (2001) divided 169 intermediate Spanish learners into three groups—full subtitles, full captions, and no-caption group—and had them complete a written summary along with a comprehension test after watching the assigned video. The result indicates that both full subtitles and captions groups significantly outperformed the no-caption group. Such finding was later replicated in Guichon and Mcloran’s (2008) and Hayati and Mohmedi’s (2011) research, both of which included intermediate L2 learners as participants and adopted similar experimental design with Markham et al.’s (2001). With the aim to summarize primary L2 caption research, Perez et al. (2013) conducted a meta-analysis of fifteen full captions studies on L2 listening comprehension, where test type and proficiency level were determined as two potential moderators. The analysis revealed a large effect (Hedges’ g = 0.99) of full captioning on general L2 comprehension. In terms of moderators, test type was discovered to decrease the effect size; while significant effect of caption was found when receptive test was administered, non-significant effect was generated when productive test was utilized. 
On the other hand, proficiency level did not significantly influence the effects of captions on L2 listening comprehension. This meta-analysis demonstrates the overall positive results of the existing full captions studies on L2 listening. In a nutshell, it can

be concluded that the addition of captions as one of the multimodal input channels could improve L2 learners’ listening comprehension of audiovisual materials. Despite the empirically established pedagogical potency of captions as L2 comprehension aids, (full) captions may not benefit L2 learners of different profiles to the same extent (Lwo & Lin, 2012; Taylor, 2005). Lwo and Lin’s (2012) study is a case in point. In this study, L2 learners of different proficiency profiles were exposed to four viewing conditions: Chinese captions, English captions, Chinese plus English captions, and no captions. L2 learners of lower proficiency were found to have satisfactory comprehension outcomes only under the English captions or Chinese plus English captions conditions. That is, the four captioning viewing conditions exerted selective effects on their comprehension. However, such a selective effect was not observed in more advanced L2 learners. In light of this finding, Lwo and Lin urged scholars and practitioners to consider and explore various captioning (and subtitling) presentation possibilities to cater to the needs of learners who may differ from each other in terms of input processing profiles. Specifically, when processing static texts, L2 learners are inclined to selectively focus on certain words depending on word length, contextual constraint, or frequency of use (Rayner, 1998). This processing behavior is even more selective and changeable when viewing captioned video due to the fleeting nature of captions (Bisson, Van Heuven, Conklin, & Tunney, 2014; Winke et al., 2013). In this case, L2 learners’ idiosyncratic modality preferences in real-time input processing (e.g., visual- vs. auditory-based predilection) would further dictate the extent to which L2 learners may attend to captions (in part or in full) while watching videos or video-learning materials (Mayer & Moreno, 2003; Taylor, 2005).
Consequently, L2 learners’ caption viewing behaviors would vary based on their individual processing differences and predilections. Nonetheless, existing research has seldom probed into the effects of input processing

predilections on the efficacy of full captions. Kruger and Steyn’s (2014) eye-tracking study is the only example so far. In the study, the researchers conducted a two-week experiment and discovered that those who preferred to read captions more fully tended to score higher in the comprehension posttest. While this finding underscores the importance of focal attention to captions in understanding video content, it also shows that constant attention allocation to full captions is a challenge to L2 learners. Accordingly, other tools that could redirect L2 viewers’ attention, such as partial captions and real-time captions, have gradually gained momentum in L2 learning contexts.

2.3.2 Relative effects of partial and full captioning on L2 listening

When operationalizing the partial captioning viewing condition, researchers generally produced keyword captions, where only a selected set of words from the transcription is presented. Existing partial captioning studies typically established the efficacy of partial captions on L2 listening through the comparison between full and keyword captions. Albeit limited in number, this line of research has produced inconclusive results regarding the effects of partial captions on L2 listening comprehension. Guillory (1998) assigned 202 college French beginners to watch videos under three caption modes (full, keyword, and no captions) before completing a short answer comprehension test. The keywords, defined as words essential to the main idea of the video, were selected by a group of French native speakers. The result reveals no significant difference between full and keyword caption groups, which was interpreted by the researcher as support for keyword captioning. Her explanation was that keyword captions led to the same level of L2 comprehension as full captions with fewer words being presented; moreover, keyword captions could potentially reorient L2

learners’ attention, prevent cognitive overload and encourage more listening and less reading. Guillory’s (1998) finding was later confirmed in Mirzaei et al.’s (2017) study, where Mirzaei et al. once again found a lack of significant difference between full and partial caption groups. The study reveals that partial and full captions led to equally good L2 comprehension performance, with partial captions providing less than 30% of the textual support. Like Guillory (1998), Mirzaei et al. (2017) proposed a view in favor of partial captioning. Specifically, the observation that the more limited textual support under the partial caption condition could lead to a comprehension outcome comparable to that under the full caption condition suggests the efficacy of partial captioning. Mirzaei et al. thus concluded that partial captioning can be implemented as an effective tool which helps decrease L2 learners’ reliance on captions while preparing them for real-life L2 listening. Nonetheless, a counterargument could be made, since the production of partial captioning is more time-consuming and requires more technical support than full captioning. Perez et al. (2014) conducted a similar experiment on 226 intermediate French learners yet generated dissimilar results. They found that while full captions were significantly more beneficial to learners than keyword or no captions in global understanding, no significant difference was found among the three groups in detailed understanding. However, the comprehension scores under the three caption conditions were rather low, indicating that the video material might have been too challenging for the participants, which was likely to diminish the differences among the three caption groups.
A few months later, the researchers published another study (Perez, Peters, Clarebout, & Desmet, 2014) comparing the effects of four caption modes: full, keyword, no captions, and full captions with highlighted keywords. The result suggests that caption modes made no difference in L2 learners’ comprehension performance.

The researchers explained that the test items were not challenging enough for the participants, which, as in their former research, reduced the differences among the four caption groups. The limitation observed in these two studies highlights the importance of using level-appropriate materials and instruments in caption research. Yang and Chang (2014) also investigated the effects of different caption modes, including full captioning, keyword captioning, and annotated keyword captioning, on 44 EFL learners’ overall listening comprehension and reduced forms learning. Reduced forms refer to the phonological variations appearing in authentic speech such as assimilation (e.g., skirt) and liaison (e.g., gonna, gotta). While the set of keywords in this study referred to words and phrases uttered with reduced forms, the annotated keyword captions contained pictorial symbols providing reduced forms instruction. The results showed that all three caption groups exhibited improvement in listening comprehension from pretest to posttest, with the annotated keyword caption group obtaining the highest mean scores and the full caption group the lowest. Their finding showcases the positive impact of partial captioning on L2 learners’ listening comprehension in comparison with full captioning. Although their operationalization of partial (i.e., keyword) captioning was different from prior studies, Yang and Chang (2014) shed light on the potential of partial captioning as a more effective alternative to full captioning. Unlike the preceding cross-sectional studies, Behroozizad and Majidi (2015) adopted a five-week diachronic treatment design. The study also distinguished itself from others by including the listening part of a standardized test as an instrument to investigate the improvement of intermediate L2 learners’ comprehension ability over time.
Among the three caption conditions (i.e., full, keyword, no captions), both the keyword and full caption groups improved significantly from pretest to posttest, with the latter achieving a higher mean score in the posttest. Although the result favors full

captions more than keyword captions, it targets the development of L2 listening ability instead of content-based L2 comprehension performance, which is the major concern of the current study. Aside from L2 partial captioning, the effects of L1 plus L2 partial captioning were also compared with those of L1 plus L2 full captioning in Hsu, Hwang, Chang and Chang’s (2013) study. The researchers assigned three classes of low-proficiency elementary school L2 learners to watch videos on handheld devices with three different caption modes: full Chinese plus English captions, keyword Chinese plus English captions, and no captions. After four video viewings over four weeks, all three groups achieved significant improvement in L2 listening comprehension from pretest to posttest. On the other hand, no significant difference was found between groups in terms of their posttest results, which corroborates most of the other partial captioning research findings. With the exception of Behroozizad and Majidi (2015) and Yang and Chang (2014), most of the existing empirical evidence seems to suggest that partial and full captioning do not differ from each other in terms of pedagogical potency. While most of the existing studies attributed the lack of difference between the partial and full captioning viewing conditions to different explanations (e.g., materials being too challenging or too easy), the finding itself conflicts with the eye-tracking studies indicating that any change in input, i.e., different caption display modes, would significantly affect L2 learners’ viewing behaviors (Choi, 2016; Perez, Peters, & Desmet, 2015). Perez et al.’s (2015) study provides eye-tracking data unveiling L2 learners’ caption viewing behaviors under full and keyword caption conditions. The data show that the visual salience induced by keyword captions led to longer gaze duration, second pass reading time, and total fixation duration compared to full captions.
This implies that the selective presentation of oral discourse in partial captioning did enhance L2 learners’ attention toward captions. Similarly, Choi (2016) found that L2 learners’ recall and understanding

of L2 text was enhanced when their attention was (re)directed to certain parts of the text. Choi’s (2016) and Perez et al.’s (2015) studies collectively show that manipulation of input presentation (as in the case of caption mode manipulation) could potentially alter L2 learners’ processing and attention allocation. If this line of thinking is correct, a question is raised concerning the inconclusive results presented in existing partial captioning studies: why did different caption modes lead to no substantial difference in L2 learners’ comprehension, as shown in Guillory (1998), Hsu et al. (2013), Mirzaei et al. (2017), Perez et al. (2014) and Perez et al. (2014)? One possible hypothesis is that L2 learners’ processing profiles intervened and reduced the discrepancy caused by caption manipulation. L2 learners’ input processing predilections, as elucidated in the preceding section, might contribute to different viewing behaviors. In this vein, it is reasonable to assume that individual L2 learners would react differently to each caption mode, which may have affected the overall efficacy of both full and partial captioning in the research reviewed above. If this conjecture is accurate, the examination of L2 learners’ processing preferences as a modulating factor in the current study could help unravel the inconclusive results reported above. Further investigation with a more robust experimental design is also needed to generate more convincing results regarding the efficacy of partial captioning.

2.3.3 Real-time captioning and L2 listening

Real-time captioning, with its verbatim and incremental display of oral discourse (see Figure 1 for an example), often serves as a tool to enhance deaf learners’ access to spoken information in recorded events such as lectures or meetings.
Though many normal-hearing L2 learners have also started to use real-time captioned videos to enhance their L2 listening skills, the pedagogical application of this caption mode has mainly targeted deaf or hard-of-hearing learners and was not specific to language

learning. While real-time captioning is traditionally transcribed by stenographers or captionists, the advance of automatic speech recognition provides a faster and cheaper way to generate real-time captions. Over the past few years, information technology professionals have been seeking methods and techniques to help increase the intelligibility and accuracy rates of automatic speech recognition in processing real-time captions (Kushalnagar, Lasecki, & Bigham, 2012; Lasecki, Kushalnagar, & Bigham, 2014; Wald, 2006).

Figure 1. Screen shots of a TED talk with YouTube self-generated real-time captioning.

With the purpose of helping people parse the ongoing speech stream, real-time captioning provides instantaneous word-level transcription of speech. It is thus different from the sentence-level transcription carried out in full captioning and is more suitable for instantaneous text-to-speech mapping training. Despite the lack of real-time captioning research in L2 contexts, the technique of text-to-speech synchronization, a characteristic of real-time captioning, has been utilized by a few scholars as an L2 learning aid. Most of them incorporated text-to-speech synchronization into L2 static text reading programs by visually enhancing the word vocalized in the

audio recording. For example, Trancoso, Serralheiro, Viana, Caseiro and Mascarenhas (2007) introduced a Digital Talking Book project where written texts were highlighted in alignment with recorded speech in either the same or different languages. This tool, according to the researchers’ preliminary evaluation, held potential for enhancing L2 vocabulary learning and reading. More recently, Bailly and Barbour (2011) proposed a “karaoke-style” synchronous reading system featuring phoneme-level alignment with the aim of assisting L2 orthography learning. While listening to the recorded speech, users would see a cursor highlighting the phonemes that were being vocalized in real time. Their experiment reveals that synchronous reading, compared to audio recording alone, led to greater improvement in L2 learners’ implicit learning of word spelling. These two studies demonstrated that word- and phoneme-level synchronization can be beneficial for L2 reading, vocabulary learning and word spelling. However, it is important to note that the viewing condition in Bailly and Barbour (2011) and Trancoso et al. (2007) did not render a true real-time captioning viewing environment; the texts were not presented in an incremental manner as in real-time captioning, and viewers had full access to the textual transcription of the corresponding oral discourse during the whole viewing process. Given the differences between text-to-speech alignment and real-time captioning, the findings of Bailly and Barbour (2011) and Trancoso et al. (2007) do not lend empirical support to the efficacy of real-time captioning. The impact of real-time captioning on L2 learners’ viewing behaviors and comprehension is yet to be empirically established. In conclusion, real-time captioning, with its synchronized and incremental display of oral discourse, is likely to engage L2 learners’ focal attention to serially process captions.
Its dynamic caption display is also hypothesized to actively attract L2 learners’ attention toward captioning. The question as to whether this caption mode disrupts or enhances L2 learners’ listening comprehension and whether L2 learners’ modality

preferences modulate the results still awaits verification. Such empirical validation would shed light on the optimal captioning options for L2 learners of different input processing predilections when choosing videos for developing L2 listening skills. Although existing studies have established that real-time captioning is helpful for deaf and hard-of-hearing learners (Kushalnagar et al., 2014; Lasecki et al., 2014; Wald, 2006), it is essential to note that the information processing of hearing learners could be rather different from that of deaf and hard-of-hearing learners, especially in multimodal environments. While deaf and hard-of-hearing learners’ access to oral input is limited to visual presentation, hearing learners can receive input from both auditory and visual modalities. Further studies are therefore warranted to investigate whether hearing L2 learners could likewise benefit from real-time captioning with regard to listening comprehension. To conclude the current section, full captioning, which has been widely utilized and empirically investigated, may not be beneficial in all cases since L2 learners’ attention toward full captions tends to wander. Partial and real-time captioning might serve as possible solutions since caption display is likely to affect L2 learners’ viewing behaviors and these two caption modes could potentially reorient their attention toward captions. The exploration of these three caption modes and their effects could shed light on the optimal caption presentation for L2 listening comprehension. However, caption mode alone might not be enough to achieve this research goal due to the possible influence of individual differences, among which L2 learners’ modality preferences might be especially crucial in the exploration of different caption modes. The following section will further elaborate on the concept of modality preference and present findings of a few related studies.

2.4 The role of modality preference in viewing L2 captioned videos

The processing of captions, regardless of their presentation modes, is different from

that of static text due to the transient nature of caption display. Because their attentional resources are limited, L2 learners often have difficulty attending to both auditory and visual input at all times while viewing captioned videos (Taylor, 2005). Under this circumstance, L2 learners would choose to spend more time and effort attending to the input modality they prefer. Auditory L2 learners, while watching captioned videos, may depend more on the auditory input for comprehension since they prefer to process auditory information. In this regard, captions would be less useful as a comprehension support. On the other hand, visual L2 learners may rely more on captions for understanding since they are inclined to process visual information. Balanced L2 learners, as the name suggests, do not show a specific preference toward either auditory or visual information processing when the two modalities exist at the same time (Liu & Todd, 2014). Therefore, whether L2 learners take advantage of such comprehension aids might depend on their modality preference in real-time processing (i.e., auditory, visual, or balanced) since it could influence their attention allocation and thereby lead to different caption viewing behaviors. Most existing studies examining L2 learners’ preferred modalities when viewing captioned videos administered instruments such as questionnaires or interviews (see Hsu et al., 2013 for example), which ask L2 learners to recall the listening experience. However, since listening processing is rather “unobservable” (Graham, 2006, p. 166) and “implicit” (Vandergrift, 2007, p. 191), L2 learners might not be fully aware of it, especially when such processing is carried out in real time as in viewing captioned videos. To overcome the limitation of these retrospection techniques (e.g., questionnaire, interview), a real-time measurement of L2 learners’ modality preferences is needed to generate more reliable data (Liu & Todd, 2014).
One such solution can be found in Leveridge and Yang’s (2013) caption study. The researchers proposed a testing instrument called Caption Reliance Test (CRT) to assess

L2 learners’ preferred modality in real-time processing of audiovisual materials. The CRT was subsequently utilized in an experiment which investigated the degree of EFL high school learners’ caption reliance while viewing captioned videos and its relationship with their L2 listening comprehension. The results revealed a negative correlation between L2 learners’ caption reliance and their comprehension performance, indicating that lower achievers tend to rely more on captions than their counterparts. The CRT provided a measure of L2 learners’ real-time modality preferences and was later employed in Liu and Todd’s (2014) study to determine L2 learners’ learning styles. While the main focus of that study is the efficacy of using dual modality in repeated reading on EFL learners’ reading comprehension and vocabulary acquisition, L2 learners’ learning styles (categorized into auditory, visual, and balanced) were examined as one of the modulating factors. The finding indicates that only auditory and balanced L2 learners benefited from the treatment, showing that such individual differences indeed play a crucial role in affecting the efficacy of dual modality in repeated reading. Both Leveridge and Yang’s (2013) and Liu and Todd’s (2014) research demonstrated the importance of considering L2 learners’ modality preferences when implementing multimodal materials. They also demonstrated the plausibility of the CRT as a real-time measure of this variable. To test the researcher’s hypothesis that L2 learners’ preferred modalities serve as a key factor modulating the efficacy of captioning, the present study will administer the CRT to categorize L2 learners based on their modality preferences in real-time L2 (multimedia) input processing and explore how this individual difference affects their comprehension under three different caption conditions.

2.5 Summary

With the aim to explore the efficacy of three caption modes (i.e., full, keyword, and

real-time captioning) on L2 learners with different modality preferences, previous sections have provided a comprehensive review of existing L2 research on each of the three caption modes, followed by an introduction to L2 learners’ real-time modality preference along with means to measure it. Listed below are several main ideas regarding this research topic based on the studies reviewed above:

What is already known about this topic:

- Full captioning has been empirically established as an effective comprehension aid for L2 learners since it makes captioned videos multimodal, which thereby increases the depth of information processing.
- However, full captions may benefit L2 learners of different proficiency and processing profiles to different extents.
- Some empirical evidence has shown that partial captioning generally leads to a similar level of L2 comprehension compared to full captioning. However, this lack of difference may be ‘masked’ by other factors such as L2 learners’ modality preferences in real-time input processing.
- The efficacy of real-time captioning is yet to be established in the context of promoting L2 listening.

Though the efficacy of full captioning has been widely acknowledged, issues of L2 learners’ wandering attention and the concomitant variation of their viewing behaviors have received scant attention. In addition, the effects of partial captioning remain inconclusive and those of real-time captioning unexamined. Through the comparison of these three caption modes, with L2 learners’ modality preference serving as the hypothesized moderating factor, this study could add more insights into this topic.

It is also essential to reiterate a few issues on experimental design illuminated by the studies reviewed. First, utilizing level-appropriate materials and instruments is of fundamental importance so as to prevent undesirable moderation of caption effects. Second, L2 learners’ real-time modality preferences are more suitably measured not by questionnaires or interviews, but through a real-time testing instrument such as the CRT. The upcoming chapter provides detailed information on the research methodology employed in the experiment.

CHAPTER THREE: METHODOLOGY

3.1 Participants

Ninety-five undergraduate students at a university in northern Taiwan participated in the study. All the participants were native speakers of Mandarin Chinese and their English proficiency was comparable to the B2 level of the Common European Framework of Reference for Languages (CEFR), as determined by their English proficiency test score (72-95 on TOEFL iBT, 785-945 on TOEIC, 5.5-6.5 on IELTS or high-intermediate on GEPT).

3.2 Materials

3.2.1 Video selection

The viewing material was selected from the TED talk website for the following reasons. TED talk videos contain authentic academic lectures suitable for the undergraduates participating in the study. The delivery of TED talks is also familiar to the participants since they are commonly used as teaching material in their English classes. From a more technical perspective, the TED talk website provides access to accurate oral transcripts and full captioning, making it an ideal source of material for self-learning, teaching and caption research. The TED talk used for this study was selected based on the following three criteria: accent, difficulty level and topic familiarity. First, the speech in the video is delivered in American English since it is the norm used in the participants’ classroom instruction. This criterion was also included in Mirzaei et al.’s (2017) video selection. Secondly, due to the importance of utilizing level-appropriate materials in caption studies (as evidenced in Perez et al., 2014 and Perez et al., 2014), the difficulty level of the selected video ought to match the participants’ proficiency level. Finally, the topic of the selected

video needs to be familiar to the participants since the listeners’ background knowledge is a crucial factor affecting L2 learners’ listening comprehension (Othman & Vanathas, 2017). In addition, videos containing a great deal of technical terms were avoided, as in Mirzaei et al.’s (2017) study, to prevent confusion. Following the three criteria, an eleven-minute level-appropriate TED talk video on successful leadership, a topic familiar to undergraduates, was utilized as the viewing material of the study. The video contained 1832 words in total, with 13.09 words per sentence on average. The average speech rate of the video was 4.24 syllables per second.

3.2.2 Caption mode environment

Under each caption condition (i.e., no, full, partial, and real-time captioning), captions appeared at the bottom of the screen. The selected video was shown on the YouTube platform since its captioning service was used to operationalize the partial and real-time caption modes (see Figure 2 for examples). The following subsections include detailed information regarding the sources and production of the three caption presentation modes.

3.2.2.1 Full captioning

A full caption of the corresponding oral discourse unit was displayed in the bottom area of the screen before the discourse unit was heard, a viewing environment typical of existing captioned video instruction materials (see Figure 2). Once the oral discourse was heard, the captions, along with the video content, were refreshed by the following video content.

3.2.2.2 Partial captioning

In the current study, the partial caption viewing condition was operationalized in

the form of keyword captions (e.g., Guillory, 1998; Perez et al., 2014; Yang & Chang, 2014). In much existing caption research, keywords were defined as important words necessary for viewers to understand the main idea of the video content. As for the keyword determination, the study employed professional judgments, a keyword determination procedure also seen in Guillory (1998), Perez et al. (2014) and Behroozizad and Majidi (2015). To this end, five experienced EFL instructors were invited to select the keywords. They were asked to highlight the words that B2-level viewers would need in order to comprehend the oral discourse while watching the video. Words selected by more than three EFL instructors were determined as the keywords for each oral discourse unit. Under this viewing condition, the keyword was displayed in the center of the bottom area of the screen while the word in the oral discourse was heard (see Figure 2). The keyword disappeared from the screen as soon as the following word in the oral discourse was heard. In the current study, partial captioning was operationalized in the aforementioned viewing procedure for three important reasons:

(1) Comparability concern: This is the most frequent partial captioning operationalization method used in other studies;
(2) Function concern: Partial captioning that takes other forms (e.g., key phrases) may present longer texts and incessantly attract the participants’ focal attention to captions, and hence may be functionally indistinguishable from full captions;
(3) Practicality concern: This captioning presentation mode can be readily produced using YouTube services free of charge, and can serve as a handy, accessible captioning production option for L2 instructors.
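The keyword determination rule described above (a word counts as a keyword when more than three of the five instructors highlight it) can be sketched as follows. This is only an illustrative sketch: the function name `select_keywords`, the data layout, and the sample words are hypothetical and are not taken from the study's materials.

```python
from collections import Counter


def select_keywords(rater_selections, min_votes=4):
    """Return words highlighted by at least `min_votes` raters.

    `rater_selections` holds one list of highlighted words per rater.
    With five raters, "more than three" corresponds to min_votes=4.
    Words are returned in the order in which they were first highlighted.
    """
    votes = Counter()
    for selection in rater_selections:
        # A rater counts at most once per word, even if it was marked twice.
        votes.update(set(selection))

    ordered = []
    for selection in rater_selections:
        for word in selection:
            if word not in ordered:
                ordered.append(word)
    return [word for word in ordered if votes[word] >= min_votes]


# Hypothetical highlights from five raters for one oral discourse unit.
sample_unit = [
    ["leadership", "trust", "team"],
    ["leadership", "trust"],
    ["leadership", "team", "trust"],
    ["leadership", "trust", "goal"],
    ["leadership", "goal"],
]
print(select_keywords(sample_unit))
```

In this made-up unit, "leadership" receives five votes and "trust" four, so both pass the threshold, while "team" and "goal" (two votes each) do not.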

3.2.2.3 Real-time captioning

Under this viewing condition, the participants saw the captions being presented to them serially, word by word and from left to right, in temporal alignment with each word in the oral discourse (see Figure 2). This particular captioning was generated using the YouTube platform, the platform on which the video was shown.

Figure 2. Screen shots of the three caption modes operationalized in the study (panels: full captioning, partial captioning, real-time captioning).

3.3 Design

This study was an experimental investigation aiming to explore the impact of (1) an intervention (the three caption viewing conditions) and (2) the participants’ modality preference on the participants’ listening comprehension with random assignment. The study used a 4x3 factorial design that included one dependent variable (the participants’ understanding of a captioned video) and two independent variables, including one between-subject variable (caption condition), and one within-subject variable (modality preference). For the purpose of the study, a Caption Reliance Test (CRT; see Section 3.4 for more detail) was administered to the participants to determine their modality preference in real-time input processing. Based on the results of the CRT, participants of different modality predilections (visual, auditory, balanced) were randomly assigned to one control group (no captioning) and three captioning viewing

conditions: full captioning, partial captioning, and real-time captioning (see Section 3.4 for more detail). Figure 3 visually schematizes the design of this study. A questionnaire was administered at the end of the experiment to obtain qualitative data on the participants’ perception of the caption viewing experience. Though this was not the major aim of the study, the participants’ feedback helped explain the quantitative results and generated insights for future research and practice.

Figure 3. A visualization of the participants’ grouping process.

3.4 Instruments

3.4.1 Caption Reliance Test (CRT)

The CRT, which included 41 items in total, was used to determine the participants’ modality predilection in real-time input processing. The format of the CRT was based on Leveridge and Yang’s (2013) design. In this study, each CRT item comprised a short video and a multiple-choice comprehension question. Each video presented a short conversation between a man and a woman with visual images, audio discourse, and full caption support. After watching the video, the participants had to answer a question

(36) based on their understanding of the conversation. The videos were level-appropriate and comparable to the TED talk video used as the viewing material in the study. The questions were also designed based on the participants’ proficiency level. To examine the appropriateness of the test items, a pilot CRT was administered.

Importantly, a CRT, as prescribed by Leveridge and Yang (2013), contains both congruent and incongruent items. In congruent items, the content presented in the audio recording and in the captions is identical. The other items involve a one-word incongruence between the audio and the captions, with the purpose of determining the participants’ preferred modalities. For example, the participants may hear the following conversation:

Woman: Hey, just curious, what’s something I do that gets on your nerves?
Man: I think sometimes you underestimate your abilities, and that’s what annoys me the most.

However, they would read the caption “I think sometimes you overestimate your abilities, and that’s what annoys me.” Under this circumstance, there is no “correct” answer to the succeeding comprehension question, yet the participants’ choice would reveal their modality preference. For example, the participants would see the following question:

Question: What does the woman do that annoys the man?
(A) Overestimates herself
(B) Underestimates herself
(C) Overanalyzes herself
(D) Undervalues herself

(37) If the participant is a visual learner, option (A) is more likely to be chosen, since visual learners tend to rely more on visual information (the captions in this case) when viewing multimodal materials. Auditory learners, on the other hand, are more likely to select option (B), which matches the audio. To prevent the participants from becoming aware of the incongruence, only 25% of the CRT items were incongruent while the other 75% were congruent. This proportion was set based on the guidelines given in Leveridge and Yang (2013). The participants’ answers to the incongruent items were used to determine their modality preferences and to categorize them into three modality groups: visual, auditory, and balanced learners (see footnote 2).

As for the analysis of the CRT results, this study adopted the CRT scoring policy utilized in Liu and Todd (2014). When answering an incongruent CRT item, the participants scored one point under “visual modality” if they chose an answer according to the video captions; they scored one point under “auditory modality” if they selected an answer according to the audio recording. The researcher then calculated a preferred modality (PM) index value by dividing each participant’s visual modality points by his or her auditory modality points (see Table 1). A participant relying more on captions (visual input) than on the oral discourse would thus obtain a PM index value larger than 1. For instance, eight visual modality points and two auditory modality points would result in a PM index value of 4 (8/2). Conversely, a participant attending more to the oral discourse would obtain a PM index value smaller than 1. For example, four visual modality points and six auditory modality points would result in a PM index value of 0.67 (4/6).
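The scoring rule can be illustrated with a minimal Python sketch. It is not the scoring script used in the study: the names `CrtTally`, `pm_index`, and `classify` are hypothetical, the exclusion threshold follows the rule stated with Table 1 (two or more non-target answers), and the handling of a zero auditory score is an assumption, since the source does not say how that case was treated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrtTally:
    """Points one participant earned on the incongruent CRT items."""
    visual: int      # answers that followed the captions
    auditory: int    # answers that followed the audio recording
    non_target: int  # answers that followed neither source


def pm_index(tally: CrtTally) -> Optional[float]:
    """Preferred modality index: visual points divided by auditory points."""
    if tally.auditory == 0:
        # Assumption: the source does not define the index for this case.
        return None
    return tally.visual / tally.auditory


def classify(tally: CrtTally) -> str:
    """Assign a modality group, or 'excluded' for invalid CRT data."""
    if tally.non_target >= 2:
        return "excluded"
    # Comparing the raw counts is equivalent to comparing the index with 1
    # and avoids both floating-point equality and division by zero.
    if tally.visual > tally.auditory:
        return "visual"
    if tally.visual < tally.auditory:
        return "auditory"
    return "balanced"


# The four example participants listed in Table 1.
examples = [CrtTally(8, 2, 0), CrtTally(4, 6, 0), CrtTally(5, 5, 0), CrtTally(4, 3, 3)]
groups = [classify(t) for t in examples]
```

For the four example rows in Table 1, `groups` comes out as visual, auditory, balanced, and excluded, matching the table.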
The participants with a PM index value equal to 1 would be those who show no particular modality preference (e.g., Participant #3 in Table 1).

Footnote 2. The CRT results of the pilot test and of the current study were similar regarding the proportion of learners in the three modality groups. Visual, auditory, and balanced learners made up 44%, 44%, and 12% of the participants in the pilot test, and 46%, 43%, and 10% of the participants in the current study. The similar proportions of the three modality groups across the two CRT administrations are suggestive of the reliability of the CRT.

(38) Therefore, the PM index value helped categorize the participants into three modality groups: visual (PM value > 1), auditory (PM value < 1), and balanced (PM value = 1). If a participant chose an answer based on neither the captions nor the oral discourse, one point was given under “non-target words.” Two or more points under “non-target words” (e.g., Participant #4 in Table 1) resulted in data exclusion, since this suggested that the test items were not fully understood by the participant or were too challenging, an issue also seen in other L2 captioning studies.

Table 1. CRT test ballot sheet

                  Visual     Auditory   Non-target   PM index   Modality
                  modality   modality   words        value      preference
                  score      score
Participant #1    8          2          0            4          Visual
Participant #2    4          6          0            0.67       Auditory
Participant #3    5          5          0            1          Balanced
Participant #4    4          3          3            Excluded   Excluded

3.4.2 Listening comprehension test
To assess the participants’ understanding of the video content under the four aforementioned caption viewing conditions, a listening comprehension test was constructed and administered to the participants immediately after the video viewing session. The comprehension test consisted of fifteen multiple-choice questions, each of which comprised four options: one correct answer and three distractors. Of the fifteen questions, five targeted global understanding, five required understanding of details, and five focused on inferences.

(39) For each item, the participants first heard and read the stem of the comprehension question on the screen. Then, they read the options on the screen before selecting their answers. This test design was based on the format of the TOEFL iBT listening test. The participants’ answers to the comprehension questions were scored dichotomously: a correct answer was given one point while an incorrect answer was given zero points.

3.4.3 Questionnaire
The questionnaire consisted of seven statements aiming to probe the participants’ general perception of the captioned-video viewing task. Among the seven statements, one was designed to establish the level appropriateness of the video, three focused on the participants’ perception of caption use, and three addressed their self-reported attention allocation while watching the video. The questionnaire ended with an invitation for the participants to give open-ended comments on their caption viewing experience.

3.5 Procedure of data collection
The experiment consisted of three major phases: (1) the CRT, (2) the captioned video viewing session, and (3) the listening comprehension test and questionnaire. In the first phase, the participants took the CRT, which took approximately twenty minutes to complete, and were then categorized into three different types of learners (i.e., visual, auditory, and balanced) based on their CRT results. One week after the CRT, the participants of different modality preferences were invited to participate in an eleven-minute video viewing session under four caption conditions, i.e., no, full, partial, and real-time captioning. After viewing the video once, the participants spent another fifteen minutes completing the listening comprehension test and filling out the questionnaire. The third phase ended with a semi-structured interview in which the researcher asked the participants follow-up questions based on their responses to the questionnaire.
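The random assignment of each modality group to the four caption conditions (Section 3.3) could be carried out along the following lines. This is a minimal sketch only: the source does not describe the actual randomization procedure, so the function name `assign_conditions`, the fixed seed, and the round-robin dealing that keeps the conditions roughly equal in size within each modality group are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# The four viewing conditions named in the procedure.
CONDITIONS = ["no", "full", "partial", "real-time"]


def assign_conditions(preferences, seed=0):
    """Randomly assign participants to caption conditions within each modality group.

    preferences maps a participant id to a modality group
    (for example "visual", "auditory", or "balanced").
    Returns a dict mapping each participant id to a condition.
    """
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced

    by_group = defaultdict(list)
    for participant, group in preferences.items():
        by_group[group].append(participant)

    assignment = {}
    for group in sorted(by_group):
        members = sorted(by_group[group])
        rng.shuffle(members)  # random order within the modality group
        # Assumption: deal the shuffled members out in turn so that the
        # four conditions end up with (nearly) equal numbers per group.
        for position, participant in enumerate(members):
            assignment[participant] = CONDITIONS[position % len(CONDITIONS)]
    return assignment
```

Shuffling within each modality group before dealing participants out means that every condition receives a comparable mix of visual and auditory learners, which is what the caption-by-modality comparison requires.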

(40) 3.6 Data analyses
STATISTICA was used as the statistical analysis software in the current study. A two-way ANOVA was conducted to compare the comprehension performance under the four caption conditions (research question 1) and to examine the interaction between caption mode and the participants’ modality preference (research question 2). The two independent variables were caption mode and the participants’ modality preference. The dependent variable was the listening comprehension test score. To further investigate the differences among groups, Fisher’s Least Significant Difference (LSD) post hoc test was applied wherever a significant difference was found. As for the questionnaire, the analysis of the five-point Likert scale items was based on descriptive statistics.
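To make the logic of the two-way ANOVA concrete, the sketch below computes the F ratios for the two main effects and the interaction in plain Python. It is an illustration under stated assumptions, not the analysis run in STATISTICA: it handles only balanced designs (equal cell sizes), it stops at the F ratios without p-values or the LSD post hoc test, the function name `two_way_anova_f` is hypothetical, and the scores in `toy` are made up and are not the thesis data.

```python
from statistics import mean


def two_way_anova_f(cells):
    """F ratios for a balanced two-way between-subjects design.

    cells maps (caption_mode, modality) to a list of scores; every cell
    must hold the same number of scores.
    """
    captions = sorted({c for c, _ in cells})
    modalities = sorted({m for _, m in cells})
    n = len(next(iter(cells.values())))
    if any(len(scores) != n for scores in cells.values()):
        raise ValueError("this sketch only handles equal cell sizes")

    grand = mean(x for scores in cells.values() for x in scores)
    cap_mean = {c: mean(x for m in modalities for x in cells[(c, m)]) for c in captions}
    mod_mean = {m: mean(x for c in captions for x in cells[(c, m)]) for m in modalities}
    cell_mean = {key: mean(scores) for key, scores in cells.items()}

    # Sums of squares for each main effect, the interaction, and the error term.
    ss_cap = n * len(modalities) * sum((cap_mean[c] - grand) ** 2 for c in captions)
    ss_mod = n * len(captions) * sum((mod_mean[m] - grand) ** 2 for m in modalities)
    ss_int = n * sum(
        (cell_mean[(c, m)] - cap_mean[c] - mod_mean[m] + grand) ** 2
        for c in captions
        for m in modalities
    )
    ss_err = sum((x - cell_mean[key]) ** 2 for key, scores in cells.items() for x in scores)

    df_cap = len(captions) - 1
    df_mod = len(modalities) - 1
    df_int = df_cap * df_mod
    df_err = len(captions) * len(modalities) * (n - 1)
    ms_err = ss_err / df_err

    return {
        "caption": (ss_cap / df_cap) / ms_err,
        "modality": (ss_mod / df_mod) / ms_err,
        "interaction": (ss_int / df_int) / ms_err,
    }


# Made-up scores (not the thesis data) showing a crossover pattern.
toy = {
    ("full", "visual"): [12, 13, 14],
    ("full", "auditory"): [8, 9, 10],
    ("partial", "visual"): [8, 9, 10],
    ("partial", "auditory"): [12, 13, 14],
}
f_ratios = two_way_anova_f(toy)
```

In the toy data both main-effect F ratios are zero while the interaction F ratio is large, which shows how a caption effect can be invisible when modality preference is ignored and still appear once it is entered as a second factor. A real analysis with unequal cell sizes would need a statistics package rather than this hand computation.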

(41) CHAPTER FOUR: RESULTS
The current study set out to explore whether different caption modes affect L2 learners’ listening comprehension and whether L2 learners’ modality preferences influence their comprehension performance under different caption conditions. The ensuing paragraphs report the results of the quantitative and qualitative data. Specifically, the report begins with the descriptive statistics, which give an overview of the participants’ caption viewing behaviors, and then presents the two-way ANOVA results, the post hoc test analysis, and the questionnaire data.

It is important to note that a few participants’ data were not included in the ANOVA and post hoc analysis for the following reasons: (1) invalid CRT data, (2) an insufficient number of balanced learners, and (3) outlier or atypical behaviors identified from the interview data. First, several participants scored two or more points under the “non-target words” category in the CRT, which suggests that the CRT items were probably too challenging for these participants and thus did not serve as a valid mechanism for determining their input modality preference (see Section 3.4.1 for a more detailed explanation). Second, the CRT results indicated that while 46% and 43% of the participants were categorized as visual and auditory learners, respectively, only 10% were classified as balanced learners. Given that the sample of balanced learners was too small to be compared with the other modality groups for meaningful statistical inference (Faber & Fonseca, 2014), the balanced learners’ data were removed from the current study. Lastly, based on the participants’ interview data, the researcher noticed that a few participants were either familiar with the selected video or fully aware of the purpose of this study. Since these participants might have approached the task with strategies that were not characteristic of normal video viewing behavior, their data were excluded from the analysis regardless of their comprehension performance.