Captions for All? Validating the Effect of Captions on L2 Learners with Different Online Processing Profiles


Master’s Thesis
Department of English
National Taiwan Normal University

Captions for All? Validating the Effect of Captions on L2 Learners with Different Online Processing Profiles

Advisor: Dr. Liu, Yeu-Ting (劉宇挺)
Graduate: Kam, Emily Fen (關蕙芯)

August 2018

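Chinese Abstract

In multimedia learning, captioned videos are often used to train second language (L2) English listening. Previous research has also found that captions help L2 learners understand the content of English videos. However, a recent meta-analysis found that learners’ different learning behaviors lead to differences in the degree to which captions affect their comprehension of the English content of captioned videos. This study examines how two learner behaviors, learning style (auditory vs. visual) and memory capacity (high vs. low), affect the effectiveness of captions for English listening comprehension. Results from 84 English learners show that these two behaviors influence the effectiveness of captions to different degrees and even interact significantly. The study finds that learners’ memory capacity strongly influences the effect of learning style on English listening comprehension. For learners with lower memory capacity, learning style had no significant effect on listening comprehension; for learners with higher memory capacity, however, videos without captions suited auditory learners better, whereas visual learners were better served by captioned videos for improving listening comprehension. The results again underscore the important influence of different learning behaviors on English listening, particularly on how learners use captions to understand the content of English multimedia. The thesis also offers suggestions on English listening for teachers, learners, and multimedia developers, in particular on providing suitable captioned videos for different learning behaviors so as to improve learners’ English listening.

Keywords: L2 English listening, multimedia learning, captions, learning style, memory capacity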

Abstract

Captioned video is widely used by L2 learners to enhance their exposure to native oral input beyond the classroom setting. Such exposure to captioning has been found to provide useful visual aid for parsing and understanding L2 oral discourse. Nevertheless, a recent meta-analysis has shown that captioning exerts a selective effect on L2 learners with different input processing profiles. This study investigated whether L2 learners’ modality preferences (visual vs. auditory) and working memory capacity (high vs. low) would modulate the effect of captions on L2 listening outcomes. Results from 84 participants revealed that both cognitive variables affected their L2 listening to different extents. Notably, working memory capacity modulated the impact of L2 learners’ modality preferences on their listening outcomes. Modality preference did not exert any significant impact on the listening outcomes of those with lower working memory capacity. For the L2 learners with high working memory capacity, modality preference was crucial in determining their listening outcomes; in this case, visual learners did best when watching videos with captions, whereas auditory learners exhibited the best listening outcomes when captions were not provided. The findings of this study shed further light on the importance of taking individual differences into consideration when employing captioned videos to maximize L2 learners’ listening for different pedagogical purposes.

Keywords: L2 listening, multimedia learning, captions, modality preference, working memory capacity

Acknowledgment

This long journey of many discoveries would not have been completed without the kindness, support, and love from many.

To Tony: We both know this finished piece did not happen without your unconditional and loving support. Words cannot express the emotions I have towards everything you have put into this journey. Thank you, Boss.

To my blood-related family: Daddy, Mommy, Michael, I love you, thank you for everything, seriously.

To my friends at NTNU: Winnie, Jojo, Eric, Aletha, Wendy, Tiffany, and many more. Life at NTNU would have sucked so bad if you guys were absent. Thank you all for this exhilarating ride. Special thanks to Winnie. Thank you for bringing me home.

To my soul family, our home: The moment I stepped into our home my soul had never cheered this loud for 3000 something years. You are the backbone, the fire, the waves, the breeze, the ocean, the forest, the light, and the love in my life.

Finally, I would like to dedicate this entire journey to Habala, Jesus, and the 12 Archangels. Thank you for your creation, love, and everlasting light. The road is still long, but faith will never fade.

At last, thank you, Master JC, for everything.

Table of Contents

CHAPTER 1 INTRODUCTION
CHAPTER 2 LITERATURE REVIEW
  2.1 Theoretical Underpinnings of Using Captions to Assist Multimodal L2 Processing
  2.2 Empirical Studies of Captioning on L2 Listening
  2.3 The Role of L2 Proficiency Profiles on Captioning in L2 Listening
  2.4 Working Memory and its Potential Role in Modulating the Effect of Captions
  2.5 Preferred Input Modality and its Potential Role in Modulating the Effect of Captions
  2.6 Summary
CHAPTER 3 METHODOLOGY
  3.1 Participants
  3.2 Materials
    3.2.1 Video Selection
    3.2.2 Caption Viewing Conditions
  3.3 Design
  3.4 The Assessment Tasks and Scoring
    3.4.1 The CRT
    3.4.2 Reading-Span Task (RST)
    3.4.3 The Listening Comprehension Test
    3.4.4 Questionnaire
    3.4.5 Interview
  3.5 Data Collection Procedure
  3.6 Statistical Analysis
CHAPTER 4 RESULTS
  4.1 Quantitative
    4.1.1 Descriptive Statistics
    4.1.2 A three-way ANOVA
  4.2 Qualitative data
CHAPTER 5 DISCUSSION
  5.1 RQ1: Does preferred modality modulate the effect of captioning?
    5.1.1 Effects of caption mode on auditory learners
    5.1.2 Effects of caption modes on visual learners
  5.2 RQ2: Does working memory capacity modulate the effect of captioning?
    5.2.1 Effects of caption modes on L2 learners with lower working memory capacity
    5.2.2 Effects of caption modes on L2 learners with higher working memory capacity
CHAPTER 6 CONCLUSIONS
  6.1 Conclusion and implications
    6.1.1 Differentiated instructions for optimal implementations of captions in multimodal listening
    6.1.2 The implementation of CRT and RST for pedagogical purposes
  6.2 Future suggestions and limitations
References
Appendix 1 Information Sheet and Consent Form
Appendix 2 Listening Comprehension Questions
Appendix 3 Questionnaires

List of Tables

Table 1 CRT Score Ballet Sheet
Table 2 Descriptive statistics of the participants’ listening comprehension test scores
Table 3 A three-way ANOVA result for the effects of caption modes, preferred modality, and working memory and their interactions on L2 listening comprehension
Table 4 Descriptive statistics of the questionnaire data with five-point Likert scale items

List of Figures

Figure 1. Mayer’s cognitive theory of multimedia learning (Mayer, 2005a)
Figure 2. A visual schematization of the grouping procedure
Figure 3. The procedure of a trial in the RST
Figure 4. Mean difference between auditory and visual L2 learners’ listening comprehension scores
Figure 5. The interactions between caption mode and preferred modality
Figure 6. The interactions among caption mode, preferred modality, and working memory capacity

CHAPTER 1 INTRODUCTION

Understanding running speech requires listeners to simultaneously process various kinds of input (e.g., audio text, facial expressions, and gestures) (Vigliocco, Perniss, & Vinson, 2014). Yet whether such processing can be successfully executed depends on how listeners hold (i.e., the functioning of working memory capacity) and select (i.e., online modality preferences, visual vs. auditory) relevant information delivered multimodally (i.e., through different representations of the content presented). While this dynamic processing occurs automatically in first language (L1) listeners, it does not apply to all second language (L2) learners, especially those who have yet to automatize their oral decoding skills. And since human cognitive processing capacity is limited (Mayer, 2001), automatizing L2 listening skills becomes even more challenging.

To help overcome this challenge, captions in videos have captured pedagogical interest as a multimedia scaffold to enhance the understanding of L2 oral discourse. Designed to help segment and parse running speech during listening, captions are visual texts delivered simultaneously with animation that match the auditory target language (Leveridge & Yang, 2013; Stewart & Pertusa, 2004). The addition of captions to videos allows L2 learners to simultaneously parse and process the discourse through dual or multiple channels, which is believed to enhance the breadth and depth of multimedia learning (Low & Sweller, 2005; Mayer, 2001, 2005a; Moreno, 2006; Moreno & Mayer, 2007). In fact, the benefits of captioned videos have been evidenced in some empirical studies, showing that captioning provides instantaneous, useful visual aid for parsing. Specifically, captioning has been found to facilitate beginning and intermediate L2 learners’ understanding of L2 running speech (Chai & Erlam, 2008; Gruba, 2004).

Although some studies have endorsed the pedagogical potency of captioning for L2 listening, the findings from a recent meta-analysis (Perez, Noortgate, & Desmet, 2013) indicated that captioning may exert a selective effect on L2 learners with different proficiency profiles (i.e., beginning, intermediate, and advanced). That is, participants’ different degrees of L2 command could modulate the effect of captioning on their listening comprehension. Nevertheless, existing studies that have attempted to further this line of research have yielded inconclusive results due to methodological differences (e.g., achievement tests, proficiency tests, length of study, and other individually certified criteria) in determining participants’ L2 proficiency levels (Leveridge & Yang, 2013; Pujola, 2002; Taylor, 2015). Such methodological incongruence poses comparability issues in existing L2 caption research, raising questions as to whether the findings can be generalized to L2 learners of different proficiency profiles.

The above inconclusive findings of L2 caption research on listening comprehension jointly imply that not all L2 learners benefit from captions equally. The issue at focus narrows down to: who will? To further explore this issue, it is imperative to target variables that directly affect the real-time listening process. Thus far, L2 caption research has yet to explore factors that influence how L2 listeners simultaneously process information presented through multiple channels. Whether these online input processing profiles alter the effect of captioning awaits empirical investigation, which could potentially shed light on optimizing its effect on L2 listening comprehension.

In light of the significance of online input processing profiles, this study selects two variables that are highly relevant to the multichannel processing of L2 listening: L2 learners’ working memory capacity (Mayer, 2005a) and their preferred input processing modality (Leveridge & Yang, 2013). With multiple inputs (i.e., video, audio, and text) at play during video watching, how efficiently L2 learners process the information at their disposal depends on their working memory capacity (Mayer, 2001; Moreno & Mayer, 2007). And since L2 learners are limited in the amount of information they can process, it is conceivable that those with greater working memory capacity may benefit more from the addition of an extra input, captions, while viewing video. Furthermore, the constraint of processing capacity forces L2 listeners to be more selective in their attention allocation (Mayer, 2005b), which is likely to drive L2 learners to orient their attention to the information delivered in their most preferred modality (e.g., visual vs. auditory). Nonetheless, whether both variables indeed modulate the effect of captioning in a multimodal environment is yet to be empirically established. Further investigation is needed to offer evidence-based guidelines for using captioned videos to enhance listening comprehension for L2 learners with different input processing profiles.

To this end, the current study sets out to explore whether differential working memory capacity and modality preferences during input processing modulate the effect of captioning on L2 listening comprehension. By investigating how learners with different L2 online processing profiles benefit from captioned videos, the present study provides suggestions regarding differentiated instruction on the use of captions that caters to individual differences during multimodal input processing. In language learning classrooms, in particular, providing students with different avenues of learning allows them to comprehend audiovisual materials according to their preferred mode of learning. This way, learning becomes personalized even in a classroom setting, which in turn empowers all students to learn effectively. As captioning has not been found beneficial to all L2 learners, it is crucial for instructors to gain further insights into learner differences in order to provide optimal multimedia scaffolding for L2 listening. The two research questions explored in this study are as follows:

1. Does working memory capacity modulate the effect of captions in understanding L2 videos? If so, to what extent?
2. Does preferred input processing modality (i.e., visual and auditory) modulate the effect of captions in understanding L2 videos? If so, to what extent?

CHAPTER 2 LITERATURE REVIEW

This chapter reviews theoretical frameworks and empirical findings that are crucial for identifying the research niche and contextualizing the present study. To this end, the literature review consists of five major sections. The first section (2.1) provides the theoretical background for this study by introducing Mayer’s cognitive theory of multimedia learning (CTML), a theoretical framework addressing how the human mind processes multiple inputs in a multimedia-enhanced learning setting. This framework is then discussed vis-à-vis the use of captions to enrich learners’ L2 listening experience. In particular, three key assumptions of CTML that address the success and constraints of multimodal input processing are highlighted, followed by a discussion of the effect of captioning on L2 listening. After addressing the theoretical basis of using captioning to assist L2 listening, the second section (2.2) reviews existing L2 caption research. Through a critical review of these empirical studies, the third section (2.3) proposes that captioning exerts a selective effect on learners with different L2 proficiency profiles, which further validates the necessity of investigating factors highly relevant to real-time parsing. In this vein, the fourth (2.4) and fifth (2.5) sections discuss two input processing factors that might modulate the effect of captioning in assisting the multimodal (L2) listening process: 1) working memory capacity; and 2) preferred modality in real-time input processing (visual and auditory), respectively.

2.1 Theoretical Underpinnings of Using Captions to Assist Multimodal L2 Processing

L2 listening entails rigorous real-time processing of input (i.e., audio input of speech, prosody, postures, gestures, facial expressions, and body movements) delivered through different channels (Vigliocco et al., 2014). Such real-time processing requires efficient and flexible attentional control, allowing L2 learners to attend to various inputs in order to parse running speech and form mental notes (Vandergrift, 2004; Vanderplank, 2010). This attentional allocation, however, is not always easy for L2 learners, especially those whose L2 is still developing, for they are limited in the amount of information they can rehearse and process in working memory (Mayer, 2001, 2005a). As this cognitive constraint increases the difficulty of automatizing oral decoding and encoding skills, it makes multichannel listening a cognitively challenging activity for many L2 learners.

To overcome this challenge, instructors and researchers have sought various multimedia supports to facilitate L2 listening. Among them, captioning in videos prevails as an increasingly popular listening scaffold. Captioning, the visual text delivered with the corresponding oral discourse, has been extensively used to facilitate and enhance L2 learners’ listening experience, for its presence gives additional cues for L2 learners to better visualize, parse, and hence understand the running speech. Although such additional input may impose cognitive load on the L2 listening process, its support has been established by much empirical evidence: captioned videos can potentially aid L2 listening comprehension (e.g., Garza, 1991; Guichon & McLornan, 2008; Hayati & Mohmedi, 2011; Huang & Eskey, 1999; Markham, Peter & McCarthy, 2001; Perez et al., 2013; Winke, Gass, & Sydorenko, 2013).

The multimodal effect of captioned videos is theoretically premised on a “learner-centered” and “cognitive-constructivist oriented” theory: Mayer’s (2001, 2005a) cognitive theory of multimedia learning (CTML). CTML stipulates that multimedia designed to invoke processes compatible with how the brain works is more likely to lead to meaningful learning. In this vein, any multimedia-enhanced learning that provides and calls for multimodal processing would potentially facilitate L2 multimodal processing. This contention thus offers the theoretical basis for using captions as a way to enrich the multichannel input provided by video to enhance L2 multimodal listening.

Although CTML holds that multimedia-enhanced learning (MEL) has the potency to enhance L2 multimodal processing, Mayer argues that this stipulation may not be true in all cases. He proposes three assumptions that would either facilitate or constrain learners’ multimedia-enhanced processing experience: the dual-channel, limited capacity, and active processing assumptions.

First, regarding the dual-channel assumption, Mayer and other cognitive scientists argue that the outcome of multimodal processing is optimal when learners are simultaneously exposed to verbal input (i.e., spoken or written words) and nonverbal input (such as animation, pictures, or video), rather than to either input alone. Specifically, when both inputs are simultaneously presented to learners, they are more likely to build elaborate, meaningful, and coherent mental representations of the input (Mayer, 2001, 2005a, 2005b). He further posits that such simultaneous processing of various verbal and nonverbal inputs encourages active mapping and integration with prior knowledge, which is likely to generate greater depth of learning (Low & Sweller, 2005; Mayer, 2001; Moreno & Mayer, 2007). The dual-channel assumption is theoretically based on the architecture of working memory. Specifically, working memory entails separate rehearsal/processing mechanisms for visual and auditory input. That is, what is heard (i.e., audio text and background sounds) is processed in the auditory phonological loop, and what is seen (i.e., animation and written texts) is processed in the visuospatial sketchpad. Multimodal input that combines both verbal (e.g., captions) and nonverbal (e.g., video content) input has the potency to activate the functioning of both the visuospatial sketchpad and the phonological loop, thereby optimizing the processing in working memory.

Second, despite the benefit of simultaneous processing of multimodal (e.g., verbal + nonverbal) input, working memory has its limitations. This gives rise to the second assumption in CTML: the limited capacity assumption. In simultaneous input processing, working memory functions as a middleman that first converts sounds and images from sensory memory into verbal and pictorial models, and later integrates the converted models with prior knowledge before storing them in long-term memory (see Figure 1). During these processes, whether simultaneous processing of multimodal input is beneficial to learners depends on whether this middleman job can be performed efficiently and automatically by learners.

Figure 1. Mayer’s cognitive theory of multimedia learning (Mayer, 2010, p. 545).

The benefits of the above processing, however, may not apply to all learners, for working memory capacity varies significantly from person to person. Since this cognitive capacity governs the amount of information learners can temporarily rehearse and retain in each channel at a given time, its variance may become a crucial determinant of their learning success. In this case, when a captioned animation or video (a typical multimodal input) is presented, L2 learners are only able to attend to limited portions of the video input (e.g., the video content without the captions) in their working memory rather than taking a snapshot of the entire content (captions + video content). Any cognitive overload from either channel (i.e., visual or auditory; verbal or nonverbal) can jeopardize learning because working memory capacity is exceeded (Jong, 2010).

In light of inter-learner differences in processing capacity, Mayer (2001, 2005a) proposed the third assumption of CTML: the active processing assumption. Specifically, due to limited processing capacity, learners have to be highly active and selective in determining what is relevant and what to attend to among different modalities (i.e., video, audio, and captions) in order to learn meaningfully and to benefit from MEL (Mayer, 1996, 1999). This active role learners play during MEL emphasizes the importance of learner control (Hasler, Kersten, & Sweller, 2007) in selecting input during real-time multimodal processing. Since learning is personal and invisible to the eye, individual differences in how learners utilize different inputs are likely to influence their MEL outcomes. It is conceivable that learners may rely more heavily on the input from the modality they prefer in multimodal processing. In the case of using captioned videos to assist L2 listening, which modality of input L2 learners prefer the most and select during multimodal listening is likely to modulate the extent to which they attend to captions, and ultimately benefit from them. When the aural input is preferred over the visual input, the advantage of captioning in aiding multimodal L2 listening could be compromised. Accordingly, whether or not L2 learners can use captioning support to their listening advantage depends on their working memory capacity and their preferred modality in multimodal input processing.

The theoretical underpinning from CTML helps the present study contextualize the potential role of captioning in enhancing L2 listening; importantly, the effect of captioning in assisting L2 multimodal listening may be modulated by differential learner input processing profiles, in particular with regard to their working memory capacity and preferred modality in input processing (see more elaboration on this issue in Sections 2.4-2.6). However, this has not been examined in existing L2 caption studies. To contextualize our examination of the role of working memory capacity and preferred modality in modulating the effect of captions, a review of existing caption studies is warranted. The following section reviews L2 caption studies investigating the effects of full/partial captioning on L2 listening comprehension.

2.2 Empirical Studies of Captioning on L2 Listening

Defined as the bimodal (audio and visual) presentation of texts delivered via videos in multimedia, captioning is believed to facilitate the development of various listening skills to enhance L2 listening. When input is enhanced through the provision of captioning, indistinct auditory text becomes salient to L2 learners (Chang, Tseng & Tseng, 2011; Danan, 2004). Such saliency helps visualize word boundaries and parse the aural stream, making multimodal input more comprehensible (Bird & Williams, 2002; Garza, 1991; Winke, Gass, & Sydorenko, 2010). It also provides appropriate scaffolding to L2 learners when input is slightly beyond their proficiency level (Danan, 2004), and thereby increases learner motivation by reducing anxiety during listening (Vanderplank, 2010). Accordingly, captioning facilitates the process of decoding and understanding L2 content, and has the potential to generate a positive mindset towards L2 listening (Bird & Williams, 2002).

With the prevailing belief that captioning benefits L2 listening, many empirical findings obtained from experimental designs have also lent support to this effect. An early study by Garza (1991) used full captions in videos to investigate whether advanced L2 learners of Russian and English would achieve higher listening comprehension. The results showed that the caption group significantly outperformed their counterparts in the content-based comprehension test. Specifically, when captions were provided, L2 learners of Russian had higher gains in comprehension than L2 learners of English. Similar benefits from captioning were also shown in Huang and Eskey’s (1999) study, in which intermediate ESL learners who watched television series with full captions scored significantly higher on the listening comprehension test. In a more recent study, Winke et al. (2010) asked a group of second- and fourth-year Spanish learners to watch three selected videos twice with or without full captions. L2 listening comprehension was then measured using multiple-choice tests. A significant difference was found between the caption and no-caption groups, suggesting that exposure to captions substantially aided the understanding of audiovisual materials. The findings are also in line with Mayer’s CTML, which stipulates that when modalities reinforce one another during input processing, the impact on L2 listening is substantial.

Other lines of research that compare full and no captions with other caption variations also support the prominence of captioning. Several studies have found that intermediate L2 learners with the support of L1 subtitles and L2 captions significantly outperformed the no-caption group in listening comprehension tests (Markham et al., 2001; Hayati & Mohmedi, 2011; Markham & Peter, 2003). In a more recent study, Mirzaei, Meshgi, Akita, and Kawahara (2017) compared the effect of two captioning conditions, i.e., full (FC) and partial synchronized (PSC) captioning, with no captioning on 58 Japanese learners of English with varying command of the target language. The quantitative results corroborate the previous findings (Markham et al., 2001; Markham & Peter, 2003; Winke et al., 2010), suggesting that the presence of captions, whether FC or PSC, significantly aids listening comprehension.

On a larger scale, Perez et al. (2013) conducted a meta-analysis aiming to determine the overall effectiveness of full and partial captioning on L2 listening. Although a large effect of full captioning on L2 listening comprehension was found (Hedges’ g = 0.99), it was noted that different L2 proficiency profiles could account for variation in the effect size (beginning: g = 0.6; intermediate: g = 1.6, p < .001; advanced: g = 0.7) [1]. Differences in participants’ proficiency profiles may lead to comparability issues, as L2 learners with varying command of the target language process input differently (Leveridge, 2015). Consequently, this difference may potentially modulate the effect of captioning. Further insights into how differential L2 proficiency profiles may adjust the effect of captioning and other MEL devices are warranted. Such research will help explain why the effect of captioning is not unequivocal, an issue to be turned to in the following subsection.

[1] Perez et al. (2013) noted that this finding should be interpreted with care due to the disproportionate inclusion of studies (beginning : intermediate : advanced = 3 : 6 : 1).

2.3 The Role of L2 Proficiency Profiles on Captioning in L2 Listening

(26) As shown in the meta-analysis by Perez et al. (2013), there is still a lack of consensus on the effect of captioning, for it may not be facilitative to L2 learners with different proficiency profiles. In an effort to bridge this gap, Taylor (2005) investigated whether differences in the length of L2 (Spanish) study would modulate the effectiveness of captioning on L2 listening comprehension. Two groups of Spanish beginners were divided into captioning and no captioning session, and the data was divided into first-year learners versus 3-year learners. t test results from between-group comparison revealed that L2 learners with longer (3-year) study outperformed those in their first year in the captioning session. Importantly, between-group comparison from first-year learners showed that not only did captioning not aid their listening comprehension, it was in fact detrimental to their understanding of the audiovisual materials. This significant difference (p = .03), however, was not shown in third-year learners (p = .49), indicating that captioning was neither harmful nor beneficial to their listening process. Retrospective data subsequently revealed that captioning was more distracting to first-year students when attending to all three modalities (i.e., video, audio, and captions), yet all learners expressed positive attitude towards visual texts. Findings from Taylor (2005), therefore, not only sheds light on the possible modulating effect of different L2 proficiency profiles, but also raised an issue as to whether, with longer L2 exposure, captioning in videos can be a valuable scaffold to comprehension for. 16.

beginners. Contrary to the findings in Taylor (2005), Lwo and Lin’s (2012) study found that captioning is in fact more beneficial to less competent L2 learners. This study examined whether learners of different target language competency (determined by their achievement test results) would benefit from different types of captioning (i.e., Chinese captions, English captions, Chinese plus English captions, and no captions). Results indicated that when target language competency was considered, the less competent learners benefited from viewing only English or Chinese plus English captions, whereas their more competent counterparts did not receive any facilitative effect in the listening comprehension results. Such findings also correspond with the study by Chang et al. (2011), which established that employing captions improved low English proficiency learners’ listening comprehension as compared to their more advanced counterparts. The mixed results from the above studies echo Markham’s (1989) findings, suggesting that captioning support may benefit L2 learners of different proficiency profiles to different extents. However, findings on this difference have been anything but consistent; exactly which group of learners (e.g., advanced vs. beginning) would benefit the most from captioning is still an unresolved issue (Winke et al., 2010). Which learner profile factor actually modulates the effect of captioning requires more empirical validation for two important reasons. First, there is a lack of methodological

homogeneity in determining the level of L2 command (e.g., achievement tests, proficiency tests, length of study, and other individually certified criteria) in existing studies. Such inconsistency does not shed light on which demographic factor(s) actually modulate(s) the pedagogical potency of captioning. Importantly, the inconsistency in methodological instruments raises the question of whether the results are comparable and generalizable to L2 learners of different proficiency profiles. Second, L2 command obtained through offline measurement (e.g., achievement tests), as seen in the studies reviewed above in this subsection, may not reflect learners’ online information processing while viewing captioned video, which underlies the nature of L2 listening. In this respect, factors that are more directly related to this process are more likely to determine the degree to which learners utilize captions in viewing videos. L2 learners’ input processing profiles, therefore, may help answer this question. The present study, therefore, selects two variables that are highly relevant to the nature of L2 listening (working memory capacity and idiosyncratic modality preferences) to investigate the extent to which both variables modulate the effect of captioning. First, since listening is transient in nature, how automatically L2 learners simultaneously decode and encode multiple inputs will determine the amount of information that can be stored in their working memory for comprehension purposes.

Second, the multimodality of captioned videos requires efficient input selection; in this case, L2 learners’ preferred input modality will most likely affect their caption viewing behaviors, and ultimately influence their listening outcome. Therefore, to optimize the effectiveness of captions, more research investigating the aforementioned two factors is warranted.

2.4 Working Memory and its Potential Role in Modulating the Effect of Captions

Working memory, a cognitive system with limited capacity for maintaining and storing information in the short term, is a multicomponent concept that underlies human thought processes in learning (Baddeley, 2003). Such information processing is not only important in L2 learning, but also essential to comprehending and producing language (Miyake & Friedman, 1998). In particular, this cognitive capacity has been established as a major source of individual differences in L2 learner profiles (Révész, 2012). As a result, the role of working memory capacity has received considerable attention in bettering the understanding of L2 learning. Among the many theoretical models of working memory, Baddeley and Hitch’s (1974) proposed model is the most widely accepted and cited. Its cognitive architecture consists of an executive control which subsumes two slave systems: (1) the phonological loop and (2) the visuo-spatial sketchpad. While the phonological loop is

responsible for storing and processing verbal and acoustic messages, the visuo-spatial sketchpad is in charge of information concerning visual and spatial elements. The executive control handles complex cognitive activities such as focusing and allocating attention; activating, inhibiting, and selecting different processes; and policing information traffic between short-term storage and long-term memory. The commonality that weaves the three components together is that they are all limited in capacity. How efficiently each system can complete a task largely depends on whether it can deal with other activities that need to be completed. For example, during listening, since articulation takes place in real time, it increases the number of items that need to be rehearsed in the phonological loop. In this case, the more incoming items, the more likely the first item will decay before it can be rehearsed (Révész, 2012). The second principle of Mayer’s CTML, which concerns whether or not listeners are able to process multisensorial and multimodal input efficiently and with automaticity, is based on this theoretical premise regarding the constraints of the phonological loop and the visuo-spatial sketchpad. In the multimodal context of captioned videos, such cognitive capacity may determine whether captioning facilitates the depth of listening comprehension. Because such activity takes place under time pressure, it is conceivable that greater working memory capacity enables more efficient processing of multiple

input. Empirical evidence has shown that individuals with higher scores on working memory tests (an indication of greater working memory capacity) have better control over what they attend to, exhibiting more dexterous skills during information processing (Colflesh & Conway, 2007). More control in processing indicates more flexibility in utilizing verbal information, thereby enabling deeper input processing to occur (Révész, 2012). Consequently, L2 learners are better able to retain the comprehended materials for a longer period of time. Hence the effect of captioned videos is highly dependent on the cognitive capacity of working memory in maintaining and processing multiple incoming items. And since such limited capacity (Engle, Kane, & Tuholski, 1999; Mayer, 2001, 2005a, 2005b) varies from one person to another, it can be hypothesized that individual differences in working memory capacity are likely to lead to differential degrees of listening success during real-time information processing. In light of the aforementioned rationale, more research on the mediating role of working memory capacity in L2 listening is needed in order to optimize the effect of captioning in videos. In the following section, we will discuss another important factor that may modulate the effect of captions: preferred input modality (i.e., audio and visual) in online input processing. As noted above, whether multimodality in captioned videos is positive or negative for L2 listening may depend on L2 learners’ input modality predilections during

real-time input processing.

2.5 Preferred Input Modality and its Potential Role in Modulating the Effect of Captions

To benefit from captions in multichannel listening, L2 learners have to make the best of their attentional resources while actively processing all three modalities (i.e., audio, video, and texts). The biggest challenge lies in the fact that, as pointed out by Taylor (2005), L2 learners need to efficiently process input from different modalities, and then integrate it with existing schema or knowledge (Al-Shehri & Gitsaki, 2010). What L2 learners would draw on to a greater extent during this multimodal process is further complicated by L2 learners’ idiosyncratic modality preference in real-time processing; L2 learners are more likely to focus on the channel of information that they prefer for parsing and for comprehension (Leveridge & Yang, 2012). The possibility that L2 learners may process input in light of their preferred modality implies that L2 learners would learn best when their preferred input modality matches what is presented to them (Oxford, 2003). Under this circumstance, visual learners may comprehend better under the captioned viewing condition due to their reliance on the visual support of captioning. Auditory learners, on the other hand, may not benefit as much from the captioned viewing condition because captioning (presented

visually) may not be their preferred channel during input processing. In this respect, their real-time processing may not be affected by the presence or absence of captioning. Additionally, different modality preferences may lead to excessive information processing from the preferred channel, which could consequently impose extra load on L2 learners’ limited input processing capacity. Differential modality preferences, therefore, should be considered an important factor that could modulate the benefits of captioning, which may in turn lead to various listening outcomes. To optimize the effect of captioning, an accurate understanding of L2 learners’ caption viewing behaviors is helpful, especially in the case of exploiting different input combinations to benefit L2 listening (Leveridge & Yang, 2012; Sun & Dong, 2004). Nevertheless, questions regarding how L2 learners of different modality preferences process captions or what they attend to have gone unanswered (Winke et al., 2010). To uncover L2 caption viewing behaviors during listening, an understanding of learners’ visual and auditory preference for processing linguistic input is needed. Thus far, many researchers have employed retrospective techniques such as questionnaires or self-reports (see Reid, 1995, for examples of such surveys; see also Scarcella & Oxford, 1992) to gain an understanding of L2 learners’ self-perceived modality reliance. However, results from offline measurements may not accurately reflect the online functioning of input processing, which is often not accessible to L2 learners’ conscious

state of mind during listening (Leveridge & Yang, 2013; Winke et al., 2013). To accurately assess visual-auditory preferences in language learning, Leveridge and Yang’s (2013a, 2013b) caption reliance test (CRT) offers a time-sensitive assessment that provides a relatively reflective and accurate measure of L2 learners’ real-time modality preferences. In their study, the CRT was used to assess EFL high school learners’ reliance on captions while viewing captioned videos. The results exhibited individual variance in the degree of reliance, with lower-level learners relying more on captions than their higher-level counterparts. Also, a negative correlation was found between caption reliance and listening comprehension scores; EFL learners with a heavier reliance on captions (visual learners) tended to score lower in their listening comprehension tests. The insights from this study, therefore, demonstrate the close relationship between modality reliance and L2 listening, which again stresses its importance in shaping the effect of captioning in multichannel processing. Furthermore, the application of the CRT also helped Liu and Todd’s (2014) study to identify preferred learning styles (i.e., visual and auditory) as a modulating factor in determining the effect of dual-modality input in enhancing L2 learners’ reading comprehension and vocabulary acquisition. The study found a selective effect of dual-modality input on L2 learners’ performances, with modality preferences as a key mediating variable. With both studies demonstrating the importance of differential

preferred modality, the reliability of the CRT is confirmed as a plausible measure for activities that are carried out in real time. In this vein, the present study administers the CRT to determine L2 learners’ preferred modality and to explore how this individual variance modulates the effect of captioning on L2 listening in a multimodal environment (i.e., captioned video).

2.6 Summary

This literature review has provided a critical appraisal of the theoretical underpinnings and empirical evidence to explore how the effect of captioning may be modulated by L2 learners’ working memory capacity and modality preferences. To summarize, the main ideas based on the studies reviewed above are as follows:

1. Captioning (full or partial) in videos is a multimodal processing experience that has been theoretically motivated and empirically established to enhance L2 listening comprehension.

2. Nevertheless, existing L2 caption studies show that captioning exerts a selective effect on L2 learners with different proficiency profiles, and it is still unclear which learner profile factor actually modulates the effect of captioning.

3. Since the modulating factors investigated in the previous studies were not directly related to how L2 learners actually process input during listening, different input processing profiles, such as differential working memory capacity and input modality predilections, should be empirically investigated to optimize the effectiveness of captions.

Investigating how different input processing profiles may alter the effect of captioning will shed light on optimal differentiated instruction when multimedia (in this case captioned videos) are used to assist L2 listening. This insight may help teachers tailor their instruction and/or use different video materials (with and without captioning) to meet different individual needs during multimodal L2 listening. By understanding internal variance and its relationship with captioning support in L2 listening, the present study hopes to shed more light on this issue in search of optimal listening outcomes in a multimodal-enhanced environment.

CHAPTER 3
METHODOLOGY

3.1 Participants

This study included 84 college-level learners of English from a local university in northern Taiwan. All participants were native speakers of Mandarin and had learned English as a foreign language (EFL) during their 12 years of formal education. To ensure target language comparability, the participants were required to have a proficiency test score (i.e., TOEFL, TOEIC, IELTS, or high-intermediate on GEPT) equivalent to the B2 level of the Common European Framework of Reference for Languages (CEFR). Prior to the study, the participants were already familiar with the functions and appearance of captioning support, for it is widely used in various English learning settings enhanced with multimedia.

3.2 Materials

3.2.1 Video Selection

The video used in this study was obtained from TED (Technology, Entertainment, and Design), a multifunctional platform that provides authentic listening materials. This feature is particularly useful for L2 learners who lack exposure to the target language outside of the classroom, such as the participants in this study. Hence, TED is

considered a popular and suitable learning resource for L2 learners in an EFL setting. Another educational value of TED is that it provides real-time captioning and oral transcripts. L2 learners are free to utilize these functions, making their multimedia learning experience more flexible and accessible. In a nutshell, the popularity, suitability, and accessibility of TED made it an ideal database for selecting video materials. On this rationale, this study selected a TED talk video based on the following three criteria suggested in Mirzaei et al. (2017): (1) linguistic difficulty level, (2) accent, and (3) topic familiarity. First, the linguistic difficulty of the video was compatible with the participants’ actual proficiency level, for employing level-appropriate materials is crucial in generating reliable and valid L2 learning outcomes in caption studies (Perez et al., 2013). Videos that contained low-frequency words or academic-specific terms were avoided to prevent misinterpretation or confusion (Mirzaei et al., 2017). Second, the speaker’s accent in the video was standard American English, for it is the most popular and common accent in the participants’ English learning experiences. Third, the topic of the video had to be familiar to the participants, for L2 learners’ background knowledge has been found crucial in modulating the listening outcome (Bloomfield, Wayland, Rhoades, Blodgett, Linck, & Ross, 2010; Gebhard, 2000). Adhering to the above three criteria, this study utilized a level-appropriate TED talk

video on successful leadership, a topic familiar to the participants, as the viewing material. The speaker provided an analysis of the patterns collected from several successful leaders’ effective communicative behaviors. With its humor and character, the content of this video was considered captivating and educational for the participants in this study. The total length of the video was roughly 11 minutes, with an average speech rate of 4.18 syllables/second. The video was played on a projector screen in a noise-free classroom setting.

3.2.2 Caption Viewing Conditions

Two caption modes were used in this study: full captions and no captions. When the video was played, half of the participants received full captioning of the corresponding oral text displayed at the bottom of the screen along with the video content, while the other half viewed the video without captioning support. The video was played on YouTube, an online viewing platform that provided the above two captioning options at the viewer’s disposal.

3.3 Design

A 2x3x2 factorial design was used in this study to explore the impact of two within-subject independent variables (working memory capacity and modality preferences) on

one dependent variable (listening comprehension scores) under one between-subject independent variable (two caption modes). A randomized block design (see Figure 2) was employed to equally assign participants to the experimental (caption) and control (no caption) groups. By doing this, factor variability was controlled to ensure group comparability prior to the study.

Figure 2. A visual schematization of the grouping procedure.

The grouping procedure began with all participants receiving (1) a CRT to measure their preferred input modality and (2) a reading-span task (RST) to determine their working memory capacity. After being assessed, all participants were first distributed into two groups based on their CRT results (visual or auditory learners). Then, in ascending order (lowest to highest) of RST scores, a number was assigned to each participant (e.g., the participant with

the lowest RST score was number “1”, the second lowest was number “2”, and so on). Once a number was assigned to each participant, the odd-numbered participants were put into the no-caption group, and the even-numbered participants were put into the caption group. As Figure 2 shows, this grouping process helped ensure that there was no statistically significant difference between the caption and the no-caption groups in terms of their RST scores, which in turn resulted in comparable data for later analysis. In addition, to make inferences about the modulating role working memory capacity played in this study, the participants were dichotomously distributed into two groups (i.e., high and low) according to the 50th percentile of their RST scores. Scores below the 50th percentile were categorized as “low”, while scores above the 50th percentile were categorized as “high”. Under both experimental and control conditions, the participants received a listening comprehension test to assess their understanding of the viewing materials. A questionnaire was distributed at the end of the study to collect information regarding the participants’ background, their language learning experiences, and their perceptions of the tasks.
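The grouping procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure's actual implementation: the function name `assign_groups` and the dictionary keys are hypothetical, the ranking is assumed to be done separately within each modality block, and a score exactly equal to the median (a case the text does not address) is labelled "low" here.

```python
from statistics import median


def assign_groups(participants):
    """Assign caption condition and working-memory level.

    participants: list of dicts with hypothetical keys
    'id', 'modality' ('visual' or 'auditory'), and 'rst' (RST score).
    """
    assigned = []
    for modality in ("visual", "auditory"):
        # Assumption: ranking happens within each modality block.
        block = sorted(
            (p for p in participants if p["modality"] == modality),
            key=lambda p: p["rst"],
        )
        for rank, person in enumerate(block, start=1):
            # Odd ranks go to the no-caption group, even ranks to the caption group.
            condition = "no_caption" if rank % 2 == 1 else "caption"
            assigned.append({**person, "rank": rank, "condition": condition})

    # 50th-percentile split on RST scores.
    cut = median(p["rst"] for p in participants)
    for person in assigned:
        # Assumption: a score equal to the median counts as "low".
        person["wm"] = "high" if person["rst"] > cut else "low"
    return assigned
```

Alternating assignment over the sorted scores keeps the two caption groups close in their RST distributions, which is the comparability the blocking is meant to achieve.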

3.4 The Assessment Tasks and Scoring

The instruments employed in this study were: (a) the Caption Reliance Test (CRT); (b) a reading-span task (RST); (c) a listening comprehension test; (d) a post-study questionnaire.

3.4.1 The CRT

This study used the CRT to determine participants’ modality preferences in real-time input processing based on two criteria: suitability and practicality. First, although the CRT was originally developed to measure L2 learners’ reliance on captioning (Leveridge & Yang, 2014), it was later modified and used in Liu and Todd’s (2014) study to determine L2 learners’ modality preferences online (i.e., visual and auditory). Second, the items and length of the CRT can be easily adjusted to different kinds of test takers (e.g., an individual or groups of participants) as well as testing environments (e.g., laboratories and classrooms). The CRT was, therefore, suitable for the investigation of this study and also highly practical for researchers and teachers with different purposes. The CRT consisted of items that involved two successive components: (1) watching a captioned video about a short dialogue between a man and a woman and (2) answering a multiple-choice comprehension question. The videos and questions were

level-appropriate and suitable for the participants in this study. To determine modality preference online, the CRT featured congruent and, importantly, incongruent items. In the congruent items, the audio texts and captions were identical; in the incongruent items, there was a one-word mismatch between what the participants heard and saw. In one example item, participants simultaneously saw “meditation” and heard “medication” while viewing the captioned video. Under this circumstance, there was no right or wrong answer to the succeeding comprehension question, as both options appeared in the test item below:

Question: What can be the man’s solution to his problem?
(A) Meditation. (B) Regulation. (C) Medication. (D) Recitation.

Following Mayer’s CTML, it was assumed that visual learners were more likely to attend heavily to captions and thus gave answers based on what they saw. In this respect, they were more likely to choose (A) from the example above. Auditory learners, on the other hand, may be more likely to select (C), for they preferred to rely more on input presented in sound form. To accurately assess preferred modality online, it was imperative that the participants were unaware of the incongruent items during the CRT.

To keep the participants from noticing the mismatches, this study adopted a ratio of 75% congruent to 25% incongruent items based on the guideline prescribed by Leveridge and Yang (2013). The participants’ performance on the incongruent items was the focus of the succeeding analysis to determine their online modality preference. Based on the scoring policy used in Liu and Todd (2014), another study that utilized the CRT, one point was awarded under the “visual modality” for choosing the caption-based option ((A) in the example above), and one point was awarded under the “auditory modality” for choosing the audio-based option ((C) in the example above). After totaling the points for either modality, each participant’s preferred modality (PM) index value was calculated by dividing the visual modality points by the auditory modality points (see Table 1).

Table 1
CRT Score Ballot Sheet

                  Visual modality   Non-target   Auditory modality   PM index    Preferred
                  scores            scores       scores              value       modality
Participant # 1   7                 0            5                   1.4 (>1)    Visual
Participant # 2   3                 0            9                   0.3 (<1)    Auditory
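As a worked illustration of the scoring policy and the PM index in Table 1, the following Python sketch tallies responses on incongruent items and classifies a participant. The function names and the tuple layout are hypothetical, and two cases the text does not specify are handled by assumption: an auditory score of zero is treated as an infinitely large index, and an index of exactly 1 is labelled "undetermined".

```python
def tally(responses):
    """Count points per category on the incongruent items.

    responses: list of (chosen, caption_option, audio_option) tuples,
    e.g. ("A", "A", "C") for a participant who followed the caption.
    """
    counts = {"visual": 0, "auditory": 0, "non_target": 0}
    for chosen, caption_option, audio_option in responses:
        if chosen == caption_option:
            counts["visual"] += 1
        elif chosen == audio_option:
            counts["auditory"] += 1
        else:
            counts["non_target"] += 1
    return counts


def pm_index(visual_points, auditory_points):
    """PM index = visual modality points / auditory modality points."""
    if visual_points == 0 and auditory_points == 0:
        raise ValueError("no visual or auditory points to compare")
    if auditory_points == 0:
        # Assumption: treat a zero denominator as an unboundedly visual profile.
        return float("inf")
    return visual_points / auditory_points


def classify(visual_points, auditory_points):
    """Label a participant from the PM index (> 1 visual, < 1 auditory)."""
    value = pm_index(visual_points, auditory_points)
    if value > 1:
        return "visual"
    if value < 1:
        return "auditory"
    # Assumption: the source does not say how an index of exactly 1 is handled.
    return "undetermined"
```

With the Table 1 values, `classify(7, 5)` gives "visual" (7 / 5 = 1.4) and `classify(3, 9)` gives "auditory" (3 / 9 is roughly 0.33, shown as 0.3 in the table).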

The participants relying more on visual input, such as captions, achieved a PM index value larger than 1 (e.g., 7 visual modality points / 5 auditory modality points = PM index value > 1), while those attending more to the audio texts achieved a PM index value of less than 1 (e.g., 3 visual modality points / 9 auditory modality points = PM index value < 1). This categorization, consequently, divided participants into two groups: (1) visual learners (PM value > 1) and (2) auditory learners (PM value < 1). As for the outliers, if a participant chose an answer from neither the captions nor the audio text (answers other than “medication” or “meditation” in the above example), one point was given under the “non-target” answers. If there were two or more “non-target” points across the ten incongruent items, it suggested that the participant did not consistently choose the appropriate cues, either the auditory or the visual (caption) cues, to comprehend the video content. This indicated that the CRT may have been too difficult for the participant in question; in this case, their performance data were excluded from the analysis. To ensure the content validity of the CRT, three experts experienced in the field of TESOL were consulted. Also, the British National Corpus (BNC) was used to ensure that the difficulty level of the CRT matched the participants’ L2 level. In addition, a pilot test revealed a close-to-even ratio between visual and auditory learners, which was almost identical to the ratio obtained from the actual results in this

study. The near-identical ratios in the pilot and the experimental outcomes suggested that the reliability of the CRT was secured.

3.4.2 Reading-Span Task (RST)

Many measures have been used to assess L2 learners’ working memory capacity, but not all of them can provide a complete picture of such a ‘dynamic’ memory system. This system, according to Baddeley (2003), is responsible for storing, rehearsing, and simultaneously processing multiple inputs in real time. To orchestrate the above dynamic processes, L2 learners have to rely on their executive (attentional) control and rehearsal mechanism, making both aspects of working memory crucial in choosing a measure. To date, three working memory measures have been designed to tap into the functioning of the executive control and rehearsal mechanisms: counting span, operation span, and reading span tasks. In the current study, the reading span task (RST), originally developed by Daneman and Carpenter (1980), was used for reasons of practicality. Unlike the other two working memory assessment tasks, the RST can be easily administered to a group of students in classroom settings (Turner & Engle, 1989), allowing instructors to efficiently make inferences about their students’ working memory capacity for pedagogical purposes.

Typically administered on a computer, the RST had 12 trials in total. Each trial featured a sentence set consisting of 2-5 sentences. The RST used in this study included 3 trials of 2 sentences, 3 trials of 3 sentences, 3 trials of 4 sentences, and 3 trials of 5 sentences. The Corpus of Contemporary American English (COCA) was used to determine the frequency of the sentence words to ensure that their difficulty level was appropriate for the participants in this study. All the sentences were composed of 13-16 words and were, therefore, comparable in length (sentence length range: 12-17 words, 20-22 syllables, 55 to 73 letters). The maximum presentation time for each sentence was 6.5 seconds. All sentence sets were randomly presented to the participants on a computer screen to control for order effects (Engle, Cantor, & Carullo, 1992). The first trial of the RST began with the participants reading a sentence on the projector screen (e.g., Emily slapped the face of a ridiculous sky). Then a slide showing “yes” and “no” was presented for them to judge whether the sentence made sense or not (Turner & Engle, 1989). Immediately after they responded, an isolated letter appeared after a 1-second delay, and the participants attempted to retain the letter in their minds for later recall. After the above procedure was repeated for all the sentences in a given trial, a question mark (see Figure 3) appeared on the screen to prompt the participants’ recall of all the letters presented in the trial.
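The trial structure above implies a fixed number of sentences and letters, which a short Python sketch can make explicit. This is an illustration only: the names `SET_SIZES`, `TRIALS_PER_SIZE`, and `build_schedule` are hypothetical, and shuffling whole trials with a seeded generator is an assumption about how the random presentation might be arranged.

```python
import random

SET_SIZES = (2, 3, 4, 5)   # sentences per trial, as described in the text
TRIALS_PER_SIZE = 3        # three trials for each set size


def build_schedule(seed=None):
    """Return a randomly ordered list of set sizes, one entry per trial."""
    rng = random.Random(seed)
    schedule = [size for size in SET_SIZES for _ in range(TRIALS_PER_SIZE)]
    rng.shuffle(schedule)  # assumption: whole trials are shuffled
    return schedule


schedule = build_schedule(seed=0)
total_trials = len(schedule)     # 3 trials x 4 set sizes = 12
total_sentences = sum(schedule)  # 3 x (2 + 3 + 4 + 5) = 42
# One letter follows each sentence, so 42 letters are presented in total.
total_letters = total_sentences
```

Under these counts, a participant sees 42 sentences and 42 letters across the 12 trials.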

Figure 3. The procedure of a trial in the RST.

The first trial ended when the participants indicated that they had finished recalling all the letters, after which they proceeded to the second trial. A paper-based instrument was used to collect all of the responses in this study. The entire task (12 trials) took roughly 20 minutes to complete. To calculate the RST scores, the number of isolated letters recalled across all trials was totaled (one point for each accurately recalled letter) (Friedman & Miyake, 2004; Turner & Engle, 1989). The final scores allowed the researcher to make inferences about each participant’s working memory capacity: the higher the score, the greater the capacity. Nevertheless, not all recalled letters were counted and included in the analysis. In order to ensure that

the participants engaged in both storing (remembering isolated letters) and processing (understanding the sentences), two critical functions of working memory, during the RST, this study adopted an 80% accuracy criterion from Turner and Engle (1989) for the sentence verification. In their study, those with a total accuracy rate below 80% were excluded from the analysis, for this suggested that the participants might be focusing exclusively on the isolated letters without devoting their attentional resources to processing sentences. This criterion, therefore, helped guarantee that the participants’ attention was directed to both the processing and the storing components of the task. To ensure the content validity of the RST, three experts in the field of TESOL were consulted. Also, the British National Corpus (BNC) was used to ensure that the difficulty level of the RST matched the participants’ L2 level. In addition, a pilot test was administered to establish the appropriate time lapse for sentence presentation and verification.

3.4.3 The Listening Comprehension Test

After watching the video, the participants, irrespective of their video viewing condition (with or without captioning support), took a listening comprehension test modeled on the listening section of the TOEFL test, i.e., listening to the text provided and answering multiple-choice comprehension questions (see Appendix 2) on

an answer sheet. This comprehension test contained 15 test items to assess participants’ global, local, and inferential understanding of the content. Each item consisted of four options with only one correct answer. The listening comprehension test was administered in a classroom setting with the assistance of a computer, a projector, and a screen. The participants heard and read the stem of each comprehension question on the screen. They then read the options on the screen before writing their answers on the answer sheet. Each correct answer was given one point, whereas an incorrect answer received no points. A pilot test took place to ensure the validity and reliability of the test items.

3.4.4 Questionnaire

The questionnaire (see Appendix 3) administered at the end of the study aimed to elicit participants’ general experience of watching captioned videos and their perception of watching the (captioned) video in this study. Clear instructions were given to ensure that all participants understood the meaning of the five-point Likert scale items. Two versions of the questionnaire were distributed to the no-caption and full-caption participants. The no-caption L2 learners were given the first version of the questionnaire. In this version, the first section contained 8 items aimed at collecting the participants’ general experience of watching captioned videos in their

(51) everyday lives. The second section contained 4 items to probe into the participants’ video viewing experience in this study, with constructs such as, interest in the task, topic familiarity, level appropriateness of the task, and the degree of mental effort invested to complete the task. For the full-captioned L2 learners, the first section of the given questionnaire was identical to the no-caption counterparts. The second section, however, entailed 10 items to probe into their captioned video viewing experience. Specifically, the second section included items that attempt to elicit how the presence of captions affected the participants’ processing of the multimodal input. At the end with an invitation to elicit participants’ open-ended comments on their experience during task performance.. 3.4.5 Interview To collect participants’ open-ended comments on their (captioned) experience during task performance, a follow-up interview was conducted after the participants filled out the questionnaires. Two professional experts were in charge of analyzing and interpreting the collected data with an aim to triangulate the questionnaire results.. 41.
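The two scoring rules described in Sections 3.4.2 and 3.4.3 (the 80% sentence-verification criterion for the RST and the one-point-per-correct-item rule for the listening test) can be sketched as follows. This is only an illustrative sketch: the function names, argument names, and example values are hypothetical and do not come from the study materials.

```python
from typing import List

# Criterion cited in Section 3.4.2 (Turner and Engle, 1989): 80% accuracy.
RST_CUTOFF_PERCENT = 80


def passes_rst_criterion(correct_judgments: int, total_sentences: int) -> bool:
    """Return True when sentence-verification accuracy is at least 80%.

    Integer arithmetic is used so that a score of exactly 80% is never
    rejected because of floating-point rounding.
    """
    if total_sentences <= 0:
        raise ValueError("total_sentences must be positive")
    return correct_judgments * 100 >= RST_CUTOFF_PERCENT * total_sentences


def score_listening_test(responses: List[str], answer_key: List[str]) -> int:
    """Give one point per correct multiple-choice answer and none otherwise."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    return sum(1 for given, correct in zip(responses, answer_key) if given == correct)


# Hypothetical usage: a participant with 48 of 60 correct verifications is kept.
keep_participant = passes_rst_criterion(48, 60)
```

Under these assumptions, a participant who falls below the cutoff would simply be dropped before the listening scores are analyzed.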

3.5 Data Collection Procedure

Data collection began with gaining ethical approval: all participants were asked to sign a consent form after reading the information sheet, which explained the overall structure and procedure of the study, the potential benefits of participation, the measures taken to guarantee their privacy and confidentiality, and the right to withdraw from the study at any given moment (see Appendix 1). Next, all participants were grouped into two viewing conditions based on their CRT and RST scores (see Figure 2). One week after the CRT and RST, each group was separately invited to take part in a video viewing session. After the viewing, a listening comprehension test was administered to assess their understanding of the viewing material. The session ended once the participants had completed the post-study questionnaire and the follow-up interview.

3.6 Statistical Analysis

STATISTICA 13.0 for Windows was used to generate descriptive statistics for the data in this study. A three-way ANOVA was used to explore the effects of the independent variable (i.e., caption mode) and the moderating variables (i.e., working memory capacity and modality preference) on the dependent variable (i.e., listening comprehension scores). The grouping conditions in this study yielded a sufficient number of participants per condition to use a three-way ANOVA to estimate main effects and interactions among the variables. ANOVA was also appropriate because the data comprised one interval-scale variable (i.e., listening comprehension) and three categorical variables (i.e., caption mode, working memory capacity, and modality preference). Independent t-tests were used to compare the means of different conditions. The alpha level for all tests was set at .05.
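The three categorical variables define a 2 × 2 × 2 layout of eight conditions. A minimal sketch of how per-condition group sizes could be tallied before running the ANOVA is given below. The record fields (`caption`, `modality`, `wm`) and the sample records are hypothetical; the level labels follow the abbreviations used in the thesis tables (NC/FC, A/V, L/H).

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, Mapping, Tuple

CAPTION_MODES = ("NC", "FC")   # no caption, full caption
MODALITIES = ("A", "V")        # auditory, visual
WM_LEVELS = ("L", "H")         # low, high working memory capacity

Cell = Tuple[str, str, str]


def cell_counts(participants: Iterable[Mapping[str, str]]) -> Dict[Cell, int]:
    """Count participants in each of the eight caption x modality x WM cells."""
    counts = Counter((p["caption"], p["modality"], p["wm"]) for p in participants)
    # List every cell explicitly so that empty cells show up as zero.
    return {cell: counts.get(cell, 0)
            for cell in product(CAPTION_MODES, MODALITIES, WM_LEVELS)}


def smallest_cell(counts: Mapping[Cell, int]) -> int:
    """Return the size of the smallest cell, a quick check on the design."""
    return min(counts.values())


# Hypothetical records, for illustration only.
sample = [
    {"caption": "NC", "modality": "A", "wm": "H"},
    {"caption": "NC", "modality": "A", "wm": "H"},
    {"caption": "FC", "modality": "V", "wm": "L"},
]
counts = cell_counts(sample)
```

Listing all eight cells, including empty ones, makes it easy to see whether any condition has too few participants for the interaction terms to be estimated.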

CHAPTER 4 RESULTS

This study set out to investigate whether working memory capacity and online preferred modality modulated the effect of captioning on L2 listening comprehension. The following sections present the quantitative and qualitative data in turn. To begin with, an overview of the participants’ caption viewing behaviors is given through the descriptive statistics. Then the three-way ANOVA results are presented, along with the results of independent t-tests for post-hoc analysis. The questionnaire data, supported by the interviews, are presented last. In this study, a total of 84 participants were recruited for the experiment. Twelve of them reported having already watched the selected (captioned) video and, additionally, felt that the content of the viewing materials was “too easy” for them. Because neither was desirable participant behavior, these twelve participants were excluded from the later data analysis.

4.1 Quantitative

4.1.1 Descriptive Statistics

Table 2 presents an overview of the participants’ listening comprehension scores vis-à-vis the factors of this study. Although the mean difference between the control (no caption) and experimental (full caption) conditions is small (M diff. = .05), larger variance is found in every condition with captions than without captions. Notably, such variance in the full caption conditions increases when individual differences (i.e., online preferred modality, working memory capacity, or both) are taken into account. As this pattern manifests consistently throughout Table 2, it calls for a more fine-grained analysis of the potential effects and interactions among the variables in this study.

Table 2 Descriptive Statistics of the participants’ listening comprehension test scores

Variables    Level of factors    N     Mean     SD
Total                            60    11.15    1.62
CM           NC                  28    11.17    1.46
             FC                  32    11.12    1.77
PM           A                   29    11.55    1.37
             V                   31    10.77    1.76
WM           L                   24    10.91    1.44
             H                   36    11.30    1.73
CM*PM        NC*A                14    12.07    1.26
             NC*V                14    10.28    1.06
             FC*A                15    11.06    1.33
             FC*V                17    11.17    2.12
CM*PM*WM     NC*A*L              4     11.5     0.57
             NC*V*L              8     10.5     1.30
             FC*A*L              5     11.8     0.83
             FC*V*L              7     10.42    1.98
             NC*A*H              10    12.3     1.41
             NC*V*H              6     10       0.63
             FC*A*H              10    10.7     1.41
             FC*V*H              10    11.7     2.16

Note: Caption Mode (CM), Preferred Modality (PM), Working Memory (WM), No caption (NC), Full Caption (FC), Auditory (A), Visual (V), Low (L), High (H).

4.1.2 A three-way ANOVA

The Shapiro-Wilk test was used to assess the normality of the listening comprehension scores, which showed a slightly left-skewed distribution (W = .958, p = .039). Since the scores did not follow a normal distribution, this study adopted a nonparametric generalized technique (Thomas, Nelson, & Thomas, 1999) to convert the raw data into ranked data. This procedure allowed the researcher to perform standard parametric tests (e.g., t-test, ANOVA, and regression) on the converted comprehension scores. The results of the three-way ANOVA revealed that caption mode, online preferred modality, and working memory capacity together explained 27% of the variance in listening comprehension performance (R² = .27). Table 3 displays the main effects of the three independent variables on listening comprehension, showing that only the participants’ preferred modality significantly affected their listening comprehension scores (p = .02). This effect had a close-to-large effect size (η² = .09). Notably, the auditory learners significantly outperformed the visual learners in their listening comprehension outcomes, a contrast illustrated in Figure 4.

Table 3 A three-way ANOVA result for the effects of caption modes, preferred modality, and working memory and their interactions on L2 listening comprehension

Source       DF    F        p        Partial eta-squared (η²)    Observed power (alpha = 0.05)
CM           1     0.44     0.5      0.008                       0.10
PM           1     5.67     0.02*    0.09                        0.64
WM           1     0.001    0.96     0.00002                     0.05
CM*PM        1     4.94     0.03*    0.08                        0.58
CM*WM        1     0        0.99     0                           0.05
CM*PM*WM     1     5.797    0.01*    0.10                        0.65
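Two computations behind Section 4.1.2 can be sketched briefly: converting raw scores into ranks, and recovering partial eta-squared from an F ratio. Both functions are illustrative only. Giving tied scores the mean of their ranks is an assumption (the text does not state how ties were handled), and the error degrees of freedom used in the example (52, i.e., 60 participants minus 8 cells) are likewise assumed rather than reported.

```python
from typing import List, Sequence


def average_ranks(scores: Sequence[float]) -> List[float]:
    """Rank scores from 1 to n, giving tied scores the mean of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta-squared from an F ratio: F*df1 / (F*df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Hypothetical raw scores; the two 12s share ranks 3 and 4.
ranked = average_ranks([12, 10, 12, 9])

# Preferred-modality effect from Table 3, with an ASSUMED error df of 52.
eta_pm = partial_eta_squared(5.67, 1, 52)
```

With the assumed error degrees of freedom, the value for the preferred-modality effect comes out close to the .09 reported in Table 3; a different error term would shift it slightly.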

[Figure 4 plot: categories Auditory and Visual; vertical axis from 9 to 13.5]

Figure 4. Mean difference between auditory and visual L2 learners’ listening comprehension scores.

The listening advantage of the auditory learners, however, did not always hold when caption mode was taken into consideration. As shown in Table 3, there was an interaction (p = .03; η² = .08) between the participants’ preferred modality and their caption viewing condition (i.e., full caption or no caption) on their listening performance. Post-hoc analysis with an independent t-test further confirmed that captions had a significantly disruptive effect on the auditory learners (p = .048). The disruptive effect is illustrated in Figure 5, which shows that the auditory learners performed substantially worse with captions than without them, losing their advantage over the visual learners. Without captions, however, the listening advantage of the auditory learners resurfaced, as they significantly outperformed their visual counterparts (p < .01) in the no caption condition. Despite the negative effect of captions on the auditory learners, no such effect was observed for the visual learners, whose listening comprehension did not differ significantly between the full and no caption conditions (p = .144).

[Figure 5 plot: conditions No Caption and Full Caption; series Auditory and Visual; vertical axis from 9 to 13.5]

Figure 5. The interactions between caption mode and preferred modality.

Although the effect of captioning did not reach significance for the visual learners, a large variance was observed in their listening performance under the full caption condition (SD = 2.12). Such variance implies that preferred modality may not be the only factor modulating the effect of captioning. This study found that whether captions actually facilitated L2 listening comprehension depended not only on preferred modality but, more importantly, on working memory capacity.
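The kind of post-hoc comparison reported above can be illustrated with an independent-samples t statistic computed from summary statistics. The sketch below assumes a pooled-variance (Student) t-test, which the text does not specify, and it uses the raw means and standard deviations from Table 2. The thesis ran its tests on rank-transformed scores, so the reported p-values need not match what these raw figures would give.

```python
import math
from typing import Tuple


def pooled_t(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> Tuple[float, int]:
    """Return (t, df) for a pooled-variance independent-samples t-test."""
    df = n1 + n2 - 2
    if df <= 0:
        raise ValueError("need at least three observations in total")
    # Weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Auditory vs. visual learners in the no caption condition (Table 2 values).
t_value, df = pooled_t(12.07, 1.26, 14, 10.28, 1.06, 14)
```

On the raw summary statistics this gives a t of roughly 4.07 with 26 degrees of freedom, in line with the direction of the auditory advantage reported for the no caption condition.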
