
The Vocabulary Coverage in American Television Programs: A Corpus-Based Study


Department of English, National Taiwan Normal University
Master's Thesis

The Vocabulary Coverage in American Television Programs: A Corpus-Based Study

Advisor: Dr. Hao-Jan Chen
Graduate Student: Yang-Ting Chou

July 2014


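Chinese Abstract
In environments where English is a foreign language, learners have difficulty accessing rich target-language input. Television programs combine reading and listening, so they are a highly motivating learning resource for English learners. However, few studies have treated television programs as authentic language-learning materials. Many studies have indicated that media materials have great potential to stimulate vocabulary learning, and the researcher wanted to know how large a vocabulary learners need in order to understand the content of television programs.
This study investigates the vocabulary coverage needed to comprehend authentic American television programs. Its main purposes are:
(1) to examine how many word families are needed to reach 95% and 98% coverage of American television programs, using two sets of lists: the word lists compiled from the British National Corpus (the BNC word lists), and the lists compiled from the BNC together with the Corpus of Contemporary American English (COCA);
(2) to examine the vocabulary size that different genres of television programs require to reach 95% and 98% coverage;
(3) to analyze the vocabulary that appears in American television programs but is not included in the word lists, and to compare the similarities and differences between the two sets of lists (the BNC word lists and the BNC/COCA word lists).
The researcher collected sixty American television programs, comprising 7,279 episodes and 31,323,019 words. Range was used to analyze how large a vocabulary from each set of lists is needed to comprehend the programs. Through corpus analysis, the study further compared the similarities and differences between the two sets of lists in their coverage of American television programs.
The results show that, with proper nouns and marginal words included, 2,000 to 7,000 word families from the BNC lists are needed to reach 95% coverage. The combined BNC and COCA lists need 2,000 to 6,000 word families.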

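To reach 98% coverage, both sets of lists require 5,000 or more word families.
Second, studies have suggested that adequate text comprehension requires 95% coverage (Laufer, 1989; Rodgers & Webb, 2011; Webb, 2010a, 2010b, 2010c; Webb & Rodgers, 2010a, 2010b). At that level, the study found the following. Serial dramas and serial supernatural dramas require the smallest vocabulary. Procedurals and serial medical dramas are the most challenging because they require the largest vocabulary. Sitcoms vary the most in the vocabulary they require.
Third, the words that appear in American television programs but are not included in the word lists fall broadly into four types: (1) proper nouns; (2) marginal words; (3) transparent compounds; and (4) abbreviations. The two sets of lists are essentially comprehensive. However, the study shows that vocabulary is constantly being renewed: newly coined words such as Facebook are not included in the lists.
The study also summarizes the similarities and differences between the two sets of lists in their coverage of American television programs. To reach 95% coverage, knowledge of 4,000 BNC word families plus proper nouns and marginal words is sufficient. With the combined BNC and COCA lists plus proper nouns and marginal words, only 3,000 word families are needed. To reach 98% coverage, the combined lists plus proper nouns and marginal words require 10,000 word families. The BNC lists cannot provide enough vocabulary to reach 98% coverage of American television programs.
The results indicate that knowledge of 3,000 word families plus proper nouns and marginal words is necessary for adequate comprehension of American television programs. Vocabulary coverage is an important indicator of how well American television programs can be comprehended. It can also help in selecting materials suitable for learners, which makes language teaching with television programs more effective.
Keywords: vocabulary coverage, corpus analysis, second language vocabulary learning, American television programs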

ABSTRACT
In EFL contexts, learners of English are rarely exposed to ample language input. Television programs combine properties of reading and listening programs, and they are a motivating source of language input for EFL learners. However, television programs have not been widely investigated as a source of authentic materials for language learning. Much research has suggested great potential for learning vocabulary through media exposure. The researcher therefore wanted to know how much learners can comprehend with the vocabulary they have learned. The study set out to investigate what vocabulary size is needed to comprehend authentic American television programs. The purposes of the study are:
(1) to examine to what extent the BNC lists, and the latest combination of the BNC lists with the COCA lists, reach 95% and 98% vocabulary coverage of authentic American television programs;
(2) to investigate what vocabulary size different genres require to reach 95% and 98% coverage, in the BNC lists and the BNC/COCA lists respectively;
(3) to investigate the vocabulary not found in the BNC lists and the BNC/COCA lists.
The results from the two sets of lists were also compared.
The scripts of 7,279 episodes of sixty television programs, consisting of 31,323,019 running words, were analyzed with the Range program (Heatley et al., 2002). The analysis was run once with the BNC lists and once with the BNC/COCA lists. Qualitative analysis was also carried out to examine how the coverage of the television programs differed between the two sets of lists.
The analysis yielded several findings. First, the most frequent 2,000 to 7,000 word families in the BNC lists, plus proper nouns and marginal words, could provide 95% coverage. In the BNC/COCA lists, 2,000 to 6,000 word families would provide 95% coverage for the television programs. To reach 98% coverage, a vocabulary size of 5,000 to over 14,000 word families plus proper nouns and marginal words was needed.
Second, 95% coverage has been suggested to be sufficient for adequate comprehension (Laufer, 1989; Rodgers & Webb, 2011; Webb, 2010a, 2010b, 2010c; Webb & Rodgers, 2010a, 2010b). On that basis, serial dramas and serial supernatural dramas would be the least demanding programs. Procedurals and serial medical dramas were the most challenging programs in the present study because they required a larger vocabulary size. The vocabulary demands of sitcoms, however, depended on their topics and varied the most among all the genres in the present study.
Third, the words Not Found in Any Lists were mainly proper nouns, marginal words, transparent compounds, and abbreviations. These are the categories covered by the four supplementary lists added to the BNC/COCA lists. The two sets of lists included almost the complete vocabulary. However, newly coined words such as Facebook were not found in the lists, which suggests that the vocabulary of a language is ever-growing.
A comparison of the word families in the BNC and BNC/COCA lists showed that the two sets could provide 95% coverage for the television programs at the 4,000- and 3,000-word levels respectively. With proper nouns added, the BNC/COCA lists could provide 98% coverage at the 10,000-word level, but the BNC lists could not provide 98% coverage. The findings also suggest that adequate comprehension could occur with the most frequent 3,000 word families plus proper nouns and marginal words. Pedagogical implications of these findings and possible directions for future studies are discussed in detail.

Keywords: vocabulary coverage, corpus-driven study, L2 vocabulary learning, television programs

ACKNOWLEDGEMENT
First and foremost, I would like to express my utmost gratitude to my advisor, Dr. Hao-Jan Howard Chen. During my graduate study, you showed me many up-to-date tools and the latest developments in Computer Assisted Language Learning. They sparked my interest in corpus study and helped me come up with my thesis topic. I still remember not only your enthusiasm and professionalism toward research, but also your emails full of care about my study and my life. Thank you for your suggestions and support, which led me steadily toward the completion of this study.
My sincere gratitude also goes to the honorable members of my thesis committee, Dr. Chih-Cheng Lin and Dr. Zhao-Ming Gao. Your insightful comments and valuable opinions made the thesis more organized and informative, and your helpful advice greatly improved and enriched the study.
I also owe my deepest gratitude to my friends and colleagues in different places and schools. Whenever you asked me about the progress of my thesis, I felt really lucky to have you in my life. You pushed me hard to finish my writing and encouraged me all the way to the end.
Lastly, I truly appreciate the selfless love of my family. You helped me through the tough time of balancing work and study. With your encouragement, care, and consideration, I could finally finish Mission Almost Impossible. Thank you. I will not let you down: I will work hard in my career and contribute to society.

TABLE OF CONTENTS
Chinese Abstract
ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER ONE INTRODUCTION
  Background
  Motivation
  Purpose of the Study
  Significance of the Study
  Definition of Key Term
CHAPTER TWO LITERATURE REVIEW
  Vocabulary Needed for Comprehension
    Vocabulary Needed in Reading
    Vocabulary Needed in Listening
    Vocabulary Coverage in Related Texts
  Vocabulary Coverage for Comprehending Television Programs
    Vocabulary Needed in Television Programs
    Related Texts in Television Programs
CHAPTER THREE METHOD
  Corpus Data
  The Word Lists
  Range Program
  Analytical Procedures
CHAPTER FOUR RESULTS AND DISCUSSION
  Coverage of Individual Programs
    Vocabulary Size to Reach 95% and 98% in the BNC Lists
    Vocabulary Size to Reach 95% and 98% in the BNC/COCA Lists
  Coverage of Different Genres
    Coverage of Different Genres in the BNC Lists
    Coverage of Different Genres in the BNC/COCA Lists
  Coverage of Words Not Found in Any Lists
  Comparison of Coverage Distribution between the Two Lists
  Discussion of the Coverage Necessary to Reach Adequate Comprehension
CHAPTER FIVE CONCLUSION
  Summary of the Major Findings
  Pedagogical Implications
  Limitations and Future Research
REFERENCES
APPENDIX

LIST OF TABLES
Table 1 Vocabulary Coverage Suggested in Studies
Table 2 Studies on Vocabulary Coverage in Television Programs and Movies
Table 3 Genres of the Television Programs in the Present Study
Table 4 Genres of the Television Programs and Numbers of the Episodes
Table 5 Coverage for Individual Programs of Sitcom in the BNC Lists
Table 6 Coverage for Individual Programs of Procedural in the BNC Lists
Table 7 Coverage for Individual Programs of Serial Drama in the BNC Lists
Table 8 Coverage for Individual Programs of Serial Medical Drama in the BNC Lists
Table 9 Coverage for Individual Programs of Serial Supernatural Drama in the BNC Lists
Table 10 Coverage for Individual Programs of Serial Criminal Drama in the BNC Lists
Table 11 Numbers of Programs Reaching 95% and 98% Coverage at the Word-Family Levels in the BNC Lists
Table 12 Coverage for Individual Programs of Sitcom in the BNC/COCA Lists
Table 13 Coverage for Individual Programs of Procedural in the BNC/COCA Lists
Table 14 Coverage for Individual Programs of Serial Drama in the BNC/COCA Lists
Table 15 Coverage for Individual Programs of Serial Medical Drama in the BNC/COCA Lists
Table 16 Coverage for Individual Programs of Serial Supernatural Drama in the BNC/COCA Lists
Table 17 Coverage for Individual Programs of Serial Criminal Drama in the BNC/COCA Lists
Table 18 Numbers of Programs Reaching 95% and 98% Coverage at the Word-Family Levels in BNC/COCA Lists
Table 19 Coverage of Different Genres in the BNC Lists
Table 20 The Percentage in the Most Frequent BNC 1,000 Word Families in Each Genre
Table 21 Coverage of Different Genres in the BNC/COCA Lists
Table 22 The Percentage in the Most Frequent BNC/COCA 1,000 Word Families in Each Genre
Table 23 Categorization and Frequency of Words Not Found in Any Lists
Table 24 Percentage of Coverage for All the Programs
Table 25 Proportion of Word Levels Providing 95% and 98% Coverage
Table 26 Recommended American Television Programs

LIST OF FIGURES
Figure 1 Retrieving Vocabulary Coverage with Range Program
Figure 2 The Word-Level in the BNC Lists Providing 95% Coverage in Each Genre
Figure 3 The Word-Level in the BNC/COCA Lists Providing 95% Coverage in Each Genre

CHAPTER ONE INTRODUCTION

Background
In English as a foreign language (EFL) contexts, learners of English are rarely exposed to ample language input. One solution has been the growth of extensive reading programs, in which language learners read a great number of books. These programs have helped learners acquire second language vocabulary incidentally (Kweon & Kim, 2008). However, extensive reading programs cannot satisfy the need for aural input. Recent research has therefore investigated authentic listening as an appropriate source of input for EFL learners (Bonk, 2000; Brown, Waring, & Donkaewbua, 2008; Penno, Wilkinson, & Moore, 2002). Considerable gains in vocabulary knowledge and listening skills have been shown to result from exposure to extensive reading (Pigada & Schmitt, 2006) or listening (Vandergrift, 2007).
Television programs combine properties of reading and listening programs, and they are a motivating source of language input for EFL learners. However, television programs have not been widely investigated as a source of authentic materials for language learning. Much has changed for language learners today. The target language is far more accessible through mass media and all sorts of communication, and watching authentic television programs has become a potential option for language learning. As audio-visual input, television programs are now as accessible as written materials, and the accessibility of this resource cannot be ignored (Sherman, 2003). Written materials may still be powerful. Nevertheless, movies, television, computers, and mobile devices saturate our lives with texts, images, and audio that tell stories from around the world (Chamberlin-Quinlisk, 2012).
Nowadays, television programs are not necessarily watched on a standard television set. With the increasing use of the Internet, the online television industry is growing rapidly. Many movies and network television series can be watched online through a large number of commercial suppliers (Waterman, Sherman, & Ji, 2013). Television is a long-established popular medium, and the Internet and handheld devices are increasingly used. Television programs can therefore be an informative and entertaining source of language input.

Motivation
Like the books in extensive reading programs, television programs have been identified as having the positive attribute of authenticity (Rodgers & Webb, 2011). Widdowson (1998) challenged the use of authentic language in the classroom. He argued that such language cannot by itself provide contextual cues for language learners, so using it is "largely impossible." Even so, media language does reflect what is going on in a language and can be seen as authentic material for language learners (Tagliamonte & Roberts, 2005). Indeed, researchers have recently applied methods from the study of real-life conversation to the dialogues written for television programs (Herman, 1995; Norrick & Spitz, 2010; Quaglio, 2009b; Richardson, 2010; Tagliamonte & Roberts, 2005). These methods include ethnography and conversation analysis. Data from television programs can thus provide interesting and informative material for study in sociolinguistics as well as in TESOL. Television programs draw audiences with the thrill of the real thing, so their appeal should lie in authenticity itself, in which "native speakers engage in speaking for purposes other than to teach their language" (Rings, 1986).
In addition, learners' motivation to learn English through watching videos is quite important. Learners may be motivated by watching movies or television programs, which can also be a valuable source of language input (Chapple & Curtis, 2000). Suitable programs should be goal-oriented and tailored to learners' needs and proficiency levels. If such programs are chosen, watching videos provides pleasurable language learning opportunities (King, 2002). For learning spoken language as discourse, "vocabulary must be a concern as much as any other aspect of language form" (McCarthy & Carter, 1994). The dialogues in television programs offer great potential for incidental vocabulary learning. They provide fairly accurate examples of "the relationship between certain structural forms and their functional correlates for EFL purposes" (Quaglio, 2009a). Variable lexical density is one characteristic of contextual dialogues, in which language-in-action texts serve to provide illustrative examples of demonstrative usage (McCarthy, 1998). Recent research has started to explore incidental vocabulary learning from several sources:
- American television drama (Wang, 2012);
- subtitled animated cartoons (Karakaş & Sariçoban, 2012);
- discipline-related movies and television programs (Csomay & Petrovic, 2012);
- aural input (Lévesque, 2013).
As McCarthy (1998) noted, "The vocabulary of a language is an integrated resource which serves the progression and development of topics and participant goals, and just as importantly, the construction and maintenance of social relations." That is why the researcher focuses on vocabulary learning in television programs.

Purpose of the Study
Much research has suggested great potential for learning vocabulary through media exposure (d'Ydewalle & Van de Poel, 1999; Karakaş & Sariçoban, 2012; Koolstra & Beentjes, 1999; Koskinen, Wilson, Gambrell, & Neuman, 1993; Lévesque, 2013; Neuman & Koskinen, 1992; Rice & Woodsmall, 1988; Wang, 2012). However, it remains unclear how much learners can comprehend with the vocabulary they have learned. This question, approached from the other direction, prompted the present investigation. The first purpose of the analysis is to find out to what extent English learners can comprehend authentic American television programs with the vocabulary they have learned. To do so, it examines to what extent the BNC lists, and the latest combination of the BNC lists with the COCA lists, reach 95% and 98% vocabulary coverage of authentic American television programs.
The second purpose of the study is to investigate whether the vocabulary size needed to reach 95% and 98% coverage differs across genres of American television programs, in the BNC lists and the BNC/COCA lists respectively. Television programs span many genres, e.g., sitcom, soap opera, and drama. These in turn include sub-genres such as action, fantasy, crime, horror, and science fiction. The study investigates whether vocabulary coverage in television programs differs by genre.
The third purpose of the study is to investigate the vocabulary that appears frequently in American television programs but is not included in the BNC lists or the BNC/COCA lists. The differences between the two sets of lists in the words they omit are also worth investigating. Dialogues in television programs do reflect what is going on in real language (Tagliamonte & Roberts, 2005). High-frequency vocabulary in television programs, as in spoken language, may help in "empowering the learner to become a natural user of the target spoken vocabulary" (McCarthy, 1998). Hence, it would help English learners to identify the vocabulary that appears frequently in television programs but is not listed in the BNC or BNC/COCA lists.
In view of these research purposes, the study addresses three major research questions:
1. How many words are necessary to reach adequate comprehension of American television programs?
2. What vocabulary size is necessary to reach 95% and 98% coverage, respectively, in the different genres?
3. What is the coverage of the words in the programs that are Not Found in Any Lists?

Significance of the Study
The study is one of the few that focus on vocabulary coverage in television programs. Previous investigations have explored incidental vocabulary learning from watching authentic television programs or videos with captions or subtitles (d'Ydewalle & Van de Poel, 1999; Karakaş & Sariçoban, 2012; Koskinen et al., 1993; Neuman & Koskinen, 1992; Rice & Woodsmall, 1988; Wang, 2012). Other research has examined vocabulary coverage in reading (Hazenberg & Hulstijn, 1996; Hu & Nation, 2000; Laufer, 1989; Laufer & Ravenhorst-Kalovski, 2010; Laufer & Sim, 1985), listening (Bonk, 2000; Nation, 2006), and spoken discourse (Adolphs & Schmitt, 2003). Research on vocabulary coverage in television programs, however, is relatively sparse.
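Research questions 1 and 2 come down to a cumulative calculation. Suppose every running word has been assigned to a 1,000-word-family band, which is the kind of profile a tool such as the Range program reports. Coverage can then be added band by band until it reaches 95% or 98%. The Python sketch below illustrates this under several assumptions. The function name `level_needed` is hypothetical. The lists are assumed to be organized in 1,000-family bands. Proper nouns and marginal words are treated as known from the start, mirroring the "plus proper nouns and marginal words" figures reported in this study. The token counts are invented for illustration and are not results from the study.

```python
from itertools import accumulate


def level_needed(level_counts, extra_known, total_tokens, threshold_pct):
    """Return the smallest word-family level (1000, 2000, ...) at which
    cumulative coverage reaches threshold_pct, or None if it never does.

    level_counts: tokens in the 1st, 2nd, 3rd, ... 1,000-family band
    extra_known: tokens treated as known at every level (here standing in
        for proper nouns and marginal words; an assumption of this sketch)
    total_tokens: all running words, including words found in no list
    threshold_pct: target coverage as a whole percentage, e.g. 95 or 98
    """
    for level, cumulative in enumerate(accumulate(level_counts), start=1):
        # Compare in integers to avoid floating-point rounding at the boundary.
        if 100 * (cumulative + extra_known) >= threshold_pct * total_tokens:
            return level * 1000
    return None


# Invented counts for a hypothetical 1,000-token script (not study data).
bands = [820, 60, 40, 20, 10, 5]
print(level_needed(bands, extra_known=30, total_tokens=1000, threshold_pct=95))  # 3000
print(level_needed(bands, extra_known=30, total_tokens=1000, threshold_pct=98))  # 5000
print(level_needed(bands, extra_known=0, total_tokens=1000, threshold_pct=95))   # 5000
print(level_needed(bands, extra_known=30, total_tokens=1000, threshold_pct=99))  # None
```

In this toy profile, counting proper nouns and marginal words as known lowers the level needed for 95% coverage from 5,000 to 3,000 word families. This is why coverage figures need to state whether such words are included. The 99% target is never reached, so the function returns None. The case is analogous to a set of lists that cannot provide a given coverage.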

Webb and Rodgers conducted a series of studies from 2009 to 2011 on vocabulary coverage in authentic language materials. The present study differs from them in its data collection. Its data are obtained from television programs, whereas previous studies collected data from movies (Webb, 2010a; Webb & Rodgers, 2009a). Vocabulary coverage in television programs has been investigated to some extent (Webb, 2010b, 2010c; Webb & Rodgers, 2009b). Those studies, however, focused on low-frequency vocabulary. They asked how much coverage could be gained by learning the low-frequency levels of Nation's (2004) British National Corpus (BNC) word lists. The present study approaches the question from the other direction: it focuses on how well English learners can comprehend authentic television programs with target vocabulary lists.
Previous studies have also compared coverage in related and random television programs (Rodgers & Webb, 2011). They have compared coverage between programs of the same genre (drama) and unrelated sets of programs (Webb, 2011). The present study goes further and explores vocabulary coverage across different genres. This may give language teachers and learners more specific guidance in choosing authentic television programs for vocabulary learning.
Rodgers and Webb (2011) analyzed the scripts of 288 television episodes. Although the data were authentic, their corpus was composed of only six individual television programs and six random sets of programs, which limited it to a few genres. The present study expands the corpus to 7,618 television episodes from a wider range of genres in order to obtain more robust results.
Furthermore, most previous studies investigated vocabulary coverage with the BNC word lists. However, the Corpus of Contemporary American English (COCA), the only large and balanced corpus of American English, has recently come into play. The best-known corpus, the BNC, was used in Webb and Rodgers' series of studies on vocabulary coverage in television programs. It was completed in 1993, and no texts have been added to it since. Most corpora remain static, but COCA was constructed from the ground up as a "monitor corpus" that can be used to track changes in current language (Davies, 2010). The present study not only expands the quantity of data but also explores vocabulary coverage in television programs with the latest word lists combining the BNC with COCA (the BNC/COCA lists). Little previous research has examined coverage with these lists. The differences between the coverage provided by the two sets of lists are worth discussing, since the results could benefit language teachers and learners alike.

Definition of Key Term
Vocabulary Coverage
Vocabulary coverage is the percentage of running words in a text that a reader knows. It is used to estimate the amount of vocabulary a person needs in a foreign language to understand an authentic text (Laufer, 1989). Coverage is a valuable measurement because it provides a target vocabulary size. With that vocabulary size, a language learner may understand a discourse with reasonable comprehension or guess unknown words from context (Hu & Nation, 2000; Webb & Rodgers, 2009b). Vocabulary coverage has been shown to predict achievement in reading (Hirsh & Nation, 1992), listening (Bonk, 2000), and both (Nation, 2006; van Zeeland & Schmitt, 2012). In the present study, vocabulary coverage is used to estimate how many words language learners need to know to comprehend authentic television programs.
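As a percentage, coverage is the share of running words (tokens) in a text that the learner knows. At 95% coverage, about one running word in every 20 is unknown; at 98%, about one in every 50. The Python sketch below illustrates both calculations. The helper names `vocabulary_coverage` and `words_per_unknown` are hypothetical. For simplicity, the sketch matches lowercase word forms rather than the word families used by the BNC and BNC/COCA lists, so it is only an illustration and not a substitute for the Range program.

```python
import re


def vocabulary_coverage(text, known_words):
    """Percentage of running words in `text` that appear in `known_words`.

    Simplification: tokens are lowercase word forms, whereas coverage
    studies group forms such as 'walk', 'walks' and 'walked' into one
    word family.
    """
    # Runs of letters, optionally joined by internal apostrophes ("don't").
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return 100 * known / len(tokens)


def words_per_unknown(coverage_pct):
    """Average number of running words per unknown word at a given coverage."""
    if coverage_pct >= 100:
        return float("inf")
    return 100 / (100 - coverage_pct)


line = "We should go home now before the storm gets worse."
known = {"we", "should", "go", "home", "now", "before", "the", "gets", "worse"}
print(vocabulary_coverage(line, known))  # 90.0 (9 of 10 tokens known)
print(words_per_unknown(95), words_per_unknown(98))  # 20.0 50.0
```

The sketch counts every occurrence of a word (tokens), not each distinct word (types), because coverage is defined over running words.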

(23) CHAPTER TWO LITERATURE REVIEW. This chapter is divided into two sections. The first section presents studies related to the suggested vocabulary coverage necessary for comprehension in reading, listening, and related texts. The second section reviews previous studies on vocabulary coverage needed in television programs.. Vocabulary Needed for Comprehension Several studies have investigated language learning through watching television programs in different dimensions. In general, Wang (2012) suggested that by watching American drama, factors such as the authentic language, words in context, and the elements of drama all contribute to the learning of L2 vocabulary. Al-Surmi (2012) also suggested that television programs, especially sitcom, captured the linguistic features between daily conversation, which could be useful for language learning and teaching. Csomay and Petrovic (2012) went further and found out that by watching L2 discipline-related movies and television shows, great potential for incidental technical vocabulary learning occurred. Kuppens (2010) also concluded that L2 learners who frequently watch subtitled English television programs and movies perform significantly better on oral translation tests, which confirmed the incidental language learning from media exposure. Furthermore, MacFadden, Barrett, and Horst (2009) compiled a list of specialized vocabulary that appeared frequently in television programs. The list was found to be useful to gain 1-2% coverage to comprehend particular programs. Since television programs are motivating for language learners to learn a second language, vocabulary coverage may be an issue that how much vocabulary is needed for comprehension of a text or whether the 9.

(24) learners are able to guess words from context in order for incidental vocabulary learning to occur. Whether language learners watch television programs may depend on whether they have the vocabulary necessary to comprehend the programs. Television programs are the combination of reading, listening, and visual input. Little research has studied on the relationship between vocabulary coverage and television programs comprehension. However, coverage research on reading and listening comprehension may provide insight for the present study. From reading comprehension research, though the grammatical structures and background knowledge of subject matters affected comprehension, greatest need was still for vocabulary (Laufer & Sim, 1985). For listening comprehension, though factors such as strategies, may be effective for comprehension, high vocabulary coverage was still needed for good comprehension (Bonk, 2000). Moreover, Webb and Rodgers (2009a) discovered the importance of vocabulary coverage needed by language learners to comprehend movies. However, the coverage needed for comprehending television programs is still unclear. In order to understand television programs, vocabulary coverage needed for reading and listening for comprehension are reviewed in the following sections.. Vocabulary Needed in Reading Vocabulary coverage studies have focused on reading comprehension primarily since Laufer and Sim (1985) started the earliest research on the vocabulary threshold for academic purposes texts, in which they recognized the importance of vocabulary size. Later, studies differed in suggesting vocabulary coverage needed for adequate reading comprehension. Laufer (1989) attempted to find out the vocabulary coverage for EFL learners to comprehend an authentic academic text, in which 95% coverage was suggested for reasonable comprehension. Similarly, at least 95% coverage was 10.
needed before learners could efficiently guess L2 vocabulary from context in unsimplified texts (Liu & Nation, 1985) and in graded readers (Nation & Wang, 1999). However, Hirsh and Nation (1992) suggested a higher coverage of 97-98% for reading unsimplified texts, such as short novels, for pleasure. Carver (1994) later went further, pointing out that even 98% coverage did not make comprehension easy. With fiction texts, language learners needed 98% coverage to comprehend the text without assistance (Hu & Nation, 2000). Nation (2006) reviewed several types of written texts, such as novels, newspapers, and graded readers, and suggested 98% as the ideal coverage for comprehension. Laufer and Ravenhorst-Kalovski (2010) revisited the vocabulary threshold for reading comprehension using more rigorous research instruments than the earlier study, such as the most recent version of the Vocabulary Profile. They suggested a minimal threshold of 95% coverage and an optimal threshold of 98%, because with the same group of participants, texts of different difficulty were associated with different percentages of vocabulary coverage. Similarly, Schmitt, Jiang, and Grabe (2011) found a linear relationship between vocabulary coverage and reading comprehension. They stated that "There was no indication of a vocabulary 'threshold,' where comprehension increased dramatically at a particular percentage of vocabulary knowledge." However, they still suggested 98% coverage for reasonable comprehension of academic texts. Among the most recent coverage research, van Zeeland and Schmitt (2012) investigated the vocabulary coverage needed for both reading and listening comprehension. They concluded that the 98% figure suggested for reading comprehension (Nation, 2006) may be an overestimate when the goal is adequate rather than full comprehension, and that 95% coverage might be sufficient for
listening comprehension. Likewise, Schmitt and Schmitt (2012) reassessed the relationship between frequency and vocabulary size in second language vocabulary. They defined mid-frequency vocabulary as the band between high-frequency vocabulary (the most frequent 3,000 word families) and low-frequency vocabulary (beyond the most frequent 9,000 word families). Nation and Anthony (2013) showed that high-frequency vocabulary (the first 3,000 word families) together with proper nouns typically covers around 95% of the running words in a text, and that adding mid-frequency vocabulary (the 4,000 to 9,000 levels) raises coverage to over 98%. Findings on the coverage needed for reading comprehension varied because the difficulty of the reading materials and the proficiency of the learners varied. Nevertheless, 95% and 98% were the coverage figures suggested across previous studies, and they remain important measures, since coverage indicates whether or not learners will be able to understand a reading text.

Vocabulary Needed in Listening

Compared with research on reading comprehension, relatively little research has investigated the vocabulary coverage needed for listening comprehension. In the earliest study of the effects of vocabulary coverage on listening comprehension, Bonk (2000) suggested that coverage well below 95% could be enough for listening comprehension when listeners used effective coping strategies. He also pointed out that the study tested listening comprehension without visual support; that is, with the support of images or texts, good listening comprehension may occur with vocabulary coverage lower than 95%. In addition, Adolphs and Schmitt (2003) also suggested coverage of less than 95% for comprehending modern spoken
discourse by analyzing the CANCODE corpus. However, a higher coverage of around 96% was suggested in their second study, of the spoken component of the British National Corpus. In contrast, Nation (2006) pointed out that if 98% coverage is taken as necessary for comprehension, an 8,000-9,000 word-family vocabulary is needed for written text, whereas a smaller vocabulary of 6,000-7,000 word families is enough for spoken text. High-frequency vocabulary was found to be used more in spoken language than in written language. Similarly, Stæhr (2009) suggested 98% coverage for comprehending the spoken texts used in listening tests. However, in the most recent study of vocabulary coverage for both reading and listening comprehension, van Zeeland and Schmitt (2012) showed that both native and non-native speakers achieved adequate listening comprehension even with 90% coverage; for non-native speakers, 95% coverage was suggested for relatively good listening comprehension. Although the listening texts reviewed above varied, researchers suggested similar coverage for good listening comprehension: 95% or less. Compared with the coverage suggested for good reading comprehension, spoken discourse required lower coverage, since high-frequency vocabulary appears more often in spoken than in written texts (see Table 1). In addition, Mueller (1980) examined the effect of visual aids on the listening comprehension process. In that study, visual input supported listening comprehension, although its effect was not the same for learners at different proficiency levels. Later, video-based teaching materials were confirmed as an effective and strong support for listening comprehension (Nation, 2006; Secules, Herron, & Tomasello, 1992). Moreover, Chapple and Curtis (2000) and Rubin (1994) suggested that visual input in movies and television programs supported listening
comprehension, which suggests that the vocabulary coverage needed to comprehend television programs, which combine subtitles for reading with a soundtrack for listening, might be lower than the suggested 95%. Table 1 shows the vocabulary coverage suggested in previous research on reading and listening.

Table 1
Vocabulary Coverage Suggested in Studies

| Study | Text type | Tool for measuring vocabulary coverage | Reading | Listening |
|---|---|---|---|---|
| Liu and Nation (1985) | Unsimplified texts | Counting | 95% at least | |
| Laufer (1989) | Academic texts | Underlining of unknown words and translation of vocabulary lists | 95% | |
| Nation and Wang (1999) | Graded readers | The Oxford Bookworms (6 levels; a vocabulary of 2,410 word families) | 95% at least | |
| Hirsh and Nation (1992) | Unsimplified novels | The General Service List of 2,000 words | 97-98% | |
| Carver (1994) | Textbook texts | Counting | 98% at least | |
| Bonk (2000) | Short stories | Measured with a recall test | | 95% below |
| Hu and Nation (2000) | Fiction texts | Measured with a recall test | 98% | |
| Adolphs and Schmitt (2003) | General spoken discourse | The BNC | | 95% below; 96-97% |
| Nation (2006) | Written texts | The BNC | 98% | |
| Stæhr (2009) | Spoken texts with listening tests | Measured by the Vocabulary Levels Test | | 98% |
| Laufer and Ravenhorst-Kalovski (2010) | Texts from a standardized national test | Vocabulary Profile; measured by the Vocabulary Levels Test | 95-98% | |
| Schmitt, Jiang, and Grabe (2011) | Academic texts | Measured by a vocabulary checklist | 98% | |
| van Zeeland and Schmitt (2012) | Short stories | Measured with the Vocabulary Levels Test | 98% | 90%-95% |
| Nation and Anthony (2013) | Various texts | BNC/COCA word lists | 98% | |

Vocabulary Coverage in Related Texts

Essentially, vocabulary learning from extensive reading is very fragile. If the small amount of learning of a word is not soon reinforced by another meeting, then that learning will be lost. It is thus critically important in an extensive reading program that learners have the opportunity to keep meeting words they have met before. (Nation, 1997, p. 15)

"It may be that narrow input is much more efficient for second language acquisition" (Krashen, 2004). As language teachers, we usually provide learners
with exposure to a variety of topics. However, Krashen (2004) argued that this may be all wrong. As noted by Krashen (1996):

Narrow reading means focusing on the work of a single author or reading a great deal about a single topic that the reader is interested in, rather than attempting to read a wide variety of texts. Narrow reading, it was argued, is better because it helps ensure that the input is comprehensible; the reader has advantage of the previous context to help him or her understand the current text.

Corpus-driven studies comparing related and unrelated texts showed that, with an almost equal number of running words, texts on related topics tended to contain fewer different words than unrelated texts, which lowered the vocabulary burden for language learners. Hwang and Nation (1989) compared the vocabulary in newspaper stories and their follow-up stories with that in unrelated newspaper stories. Low-frequency vocabulary was repeated more often in the related stories than in the unrelated ones, which not only increased the potential for incidental vocabulary learning but also reduced the number of different low-frequency words learners encountered. They suggested that narrow reading reduces the vocabulary load by decreasing the chance of encountering new low-frequency vocabulary. Similarly, Schmitt and Carter (2000) compared the vocabulary in two sets of newspaper stories of the same length (7,843 running words): a set of stories about Princess Diana and a set of unrelated stories. They also found that low-frequency vocabulary was repeated more often in the related stories than in the unrelated stories, and that the vocabulary burden was lower when reading related stories. Gardner (2008) compared the vocabulary in fourteen sets of children's reading materials, each sharing a theme or written by the same author, with that in unrelated stories. Specific vocabulary items were analyzed to determine under what
conditions and how frequently they were repeated. Results showed that theme-related words were repeated more often in expository texts, whereas in narrative texts the most repeated items were the names of characters and places. "Narrow reading is meant to provide a transition to more advanced input. There are indications that this will happen naturally" (Cho, Ahn, & Krashen, 2005). Cho et al. also suggested that beginning-level language learners were motivated to keep reading through narrow reading. Research on narrow reading indicates that related texts are likely to present fewer low-frequency words for learners to encounter than unrelated texts. These investigations shed light on which materials are more suitable for learners to gain vocabulary knowledge, and they also give language teachers direction on how to choose materials. Again, reading materials with related texts may be more effective, since stories on related topics may lower the vocabulary load of reading. Hence, the vocabulary coverage of television programs might differ across genres, and examining the coverage of American television programs by genre might provide clearer guidance for both language teachers and learners on which genres are suitable.

This section has reviewed studies on the vocabulary coverage needed for comprehension, which indicate that for good comprehension of television programs, vocabulary coverage might range from 90% (van Zeeland & Schmitt, 2012) to 98% (Liu & Nation, 1985; Nation, 2006) or more (Carver, 1994). In the present study, coverage of 95% and 98% was chosen as the lower and upper boundaries for comprehending American television programs, since these levels may represent good comprehension of listening (Adolphs & Schmitt, 2003; Bonk, 2000) combined with reading (Hu & Nation, 2000; Laufer & Ravenhorst-Kalovski, 2010; Nation,
2006).

Vocabulary Coverage for Comprehending Television Programs

The coverage levels set in the present study do not necessarily guarantee full comprehension of television programs. However, they may give language learners insight into whether they can comprehend television programs well enough for incidental vocabulary learning to occur. The following section discusses studies on vocabulary coverage in television programs.

Vocabulary Needed in Television Programs

A series of studies on vocabulary in television programs and movies has been conducted (Webb, 2007, 2010a, 2010b, 2010c; Webb & Rodgers, 2009a, 2009b). These studies found that low-frequency words were encountered rarely in sets of movies and television programs. However, as the number of television programs and movies viewed increased, the potential for incidental vocabulary learning still existed. Although positive results of learning English through watching authentic television programs have been confirmed, vocabulary coverage remains an issue for learners' comprehension of authentic television programs. The earliest study of vocabulary coverage in video was Nation (2006), who examined how large a vocabulary language learners need to understand the children's movie Shrek. A vocabulary of the most frequent 4,000 words provided 95% coverage, while 98% coverage was reached with a vocabulary of the most frequent 7,000 words from the BNC.
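The coverage figures reported in these studies are cumulative percentages of the running words (tokens) in a text. As a worked definition, with symbols introduced here for illustration rather than taken from the studies themselves, let N be the total number of running words, T_i the number of tokens belonging to the i-th 1,000-word-family list, and S the number of tokens covered by supplementary lists such as proper nouns and marginal words (S = 0 when those lists are excluded). The cumulative coverage at the k-th 1,000 level, and the smallest level k* at which a target coverage θ (for example, 95% or 98%) is reached, can then be written as:

```latex
C_k = \frac{S + \sum_{i=1}^{k} T_i}{N} \times 100\%,
\qquad
k^{*}(\theta) = \min\{\, k : C_k \ge \theta \,\}
```

In this notation, Nation's (2006) figures for Shrek correspond to k*(95%) = 4 and k*(98%) = 7, with k counted in 1,000-item steps of the BNC lists.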

Later, Webb and Rodgers (2009a) analyzed 318 movie scripts totaling 2,841,887 running words to determine the vocabulary coverage necessary for comprehension of movies. They classified the movies as American or British and then by genre. Similar to Nation's (2006) findings for Shrek, 95% coverage of the different genres was reached with the most frequent 3,000 to 4,000 word families of the British National Corpus (BNC), while 98% coverage was reached at the level of the most frequent 5,000 to 10,000 word families. Rather than analyzing individual genres, Webb (2010a) analyzed the scripts of 143 movies in five sets from five different genres. The most frequent 2,000 to 4,000 word families provided 95% coverage, and the most frequent 4,000 to 6,000 word families provided 98% coverage, although one genre set reached 98% coverage only at the most frequent 10,000 word families. Taken together, the three studies shed some light on the vocabulary coverage needed to comprehend movies. However, the features and use of vocabulary might differ between movies and television programs. Hence, Webb and Rodgers (2009b) examined the vocabulary demands of television programs. Eighty-eight television programs with 264,384 running words were analyzed by genre. Depending on the genre, the most frequent 2,000 to 4,000 word families provided 95% coverage, and the most frequent 5,000 to 9,000 word families provided 98% coverage. They also suggested that 95% coverage may be sufficient for comprehending television programs, since viewers can read subtitles, listen, and draw on strong visual imagery (Nation, 2006). Similarly, Rodgers and Webb (2011) analyzed 288 television episodes with 1,330,268 running words. By comparing six television programs from a single season with six sets of random television programs, they found that 95% coverage was reached at the most frequent 3,000 word families, while 98% coverage was reached at 6,000 to 8,000 word families. The
same pattern of results was found in Webb's (2011) study, which investigated vocabulary coverage in television programs from the same genre. Table 2 shows Webb and Rodgers' studies on vocabulary coverage in movies and television programs.

Table 2
Studies on Vocabulary Coverage in Television Programs and Movies

| No. | Study | Content | Study focus | Word list |
|---|---|---|---|---|
| 1 | Webb & Rodgers (2009a) | Movies | The vocabulary size necessary to understand 95% and 98% of the 318 movies | The BNC lists |
| 2 | Webb & Rodgers (2009b) | Television programs | Vocabulary coverage and the encounters of low-frequency vocabulary in eighty-eight programs | The BNC lists |
| 3 | Webb (2010a) | Movies | The number of encounters of low-frequency vocabulary in 143 movies | The BNC lists |
| 4 | Webb (2010b) | Television programs | Coverage of low-frequency vocabulary in eight programs from eight different genres | The BNC lists |
| 5 | Webb (2010c) | Television programs | Coverage of low-frequency vocabulary in two programs | The BNC lists |
| 6 | Rodgers & Webb (2011) | Television programs | One single season of six programs vs. six sets of random programs | The BNC lists |
| 7 | Webb (2011) | Television programs | The same subgenres vs. unrelated ones from different genres | The BNC lists |

Webb and Rodgers have investigated vocabulary coverage in movies and television programs extensively, seeking to find out how encounters with low-frequency vocabulary can help language learners better comprehend authentic language input. However, the BNC was the only corpus from which the word lists in their studies were derived. The present study also uses the more recent BNC/COCA lists, which combine lists derived from the BNC and COCA. Furthermore, the differences in the vocabulary coverage of American television programs between the two sets of lists are worth investigating.

Related Texts in Television Programs

The effect of related texts on comprehension in television programs was discussed in Rodgers and Webb's (2011) study. Six individual programs, each from a single season, were compared with six sets of unrelated television programs. With a similar number of running words, low-frequency vocabulary was more likely to recur in the sets from a single season than in the unrelated episodes. In addition, Webb (2011) went further to investigate the extent to which vocabulary recurs in programs from the same genre compared with unrelated programs. The
genres in one group were three American dramas (medical, spy/action, and criminal forensic investigation), while the other group contained unrelated episodes from different television programs. Similarly, the results showed that low-frequency vocabulary was encountered more often in the related programs than in the unrelated ones. Webb therefore suggested that watching television programs from the same subgenre could be a promising approach to language learning, since "it reduces the lexical demands of viewing". Since these are the only two studies on the comprehension of television programs with related themes, the present study also draws on research into the comprehension of related texts in reading and listening. The studies on narrow reading have important implications for language learning. Texts on the same topic are likely to lower learners' vocabulary burden and thus enhance comprehension. Compared with stories on random topics, texts on a single topic or related topics require learners to deal with less vocabulary. In addition, low-frequency or theme-based vocabulary is repeated more often in texts on the same or related topics, and these repeated encounters, especially with low-frequency items, assist learners' comprehension. The more often unknown words are encountered in context, the more likely they are to be learned, and extensive reading has been found to result in a great amount of vocabulary learning (Gardner, 2004; Kweon & Kim, 2008; Pigada & Schmitt, 2006; Saragi, Nation, & Meister, 1978). Language learners should therefore not only focus on narrow reading with related themes but also be exposed to a great amount of input. Compared with reading materials, television programs expose language learners to a large amount of authentic language input in related texts (Webb, 2011).
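The corpus comparisons behind narrow reading can be made concrete with a small count: for two sets of texts with the same number of running words, how many distinct word types appear, and how many low-frequency types recur. This is an illustrative sketch only. The tokenizer, the tiny `HIGH_FREQ` set standing in for a high-frequency word list, and the toy sentences are all hypothetical, and the cited studies counted word families rather than raw word types.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Crude lower-case word tokenizer (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_profile(texts: list[str], high_freq: set[str]) -> dict[str, int]:
    """Summarise a set of texts: running words, distinct word types, and how
    many low-frequency types (types not in `high_freq`) occur at least twice."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    low = {word: n for word, n in counts.items() if word not in high_freq}
    return {
        "tokens": sum(counts.values()),
        "types": len(counts),
        "low_freq_types": len(low),
        "recurring_low_freq_types": sum(1 for n in low.values() if n > 1),
    }


# Hypothetical toy data: two sets with the same number of running words.
HIGH_FREQ = {"the", "met", "thanked"}
related = ["The surgeon met the patient.", "The patient thanked the surgeon."]
unrelated = ["The surgeon met the patient.", "The pilot thanked the senator."]

print(lexical_profile(related, HIGH_FREQ))
# → {'tokens': 10, 'types': 5, 'low_freq_types': 2, 'recurring_low_freq_types': 2}
print(lexical_profile(unrelated, HIGH_FREQ))
# → {'tokens': 10, 'types': 7, 'low_freq_types': 4, 'recurring_low_freq_types': 0}
```

With equal running words, the related set contains fewer types and its low-frequency words recur, which is the pattern the narrow-reading studies report at a much larger scale.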
Sherman (2003) defined three basic genres of television programs: procedurals, sitcoms, and serial dramas (also called soap operas). Procedurals always have the same main settings and cast. A leading storyline runs through a procedural, but each episode can be understood quite well without reference to previous episodes. Procedurals are very often set in realistic, professional environments, such as medical practices, law firms (e.g., Drop Dead Diva), and crime scenes (e.g., Crime Scene Investigation). Each episode presents a case to be solved, such as a surgery, a lawsuit, or a crime; a crisis usually occurs as the climax, and the characters reach a solution by the end of the episode. A procedural drama has been defined more precisely as follows: "Shows usually have an episodic format that does not require the viewer to have seen previous episodes. Episodes typically have a self-contained, also referred to as stand-alone, plot that is introduced and resolved within the same episode." Such an episode is often called a "case-of-the-week", since the story focuses on the problem presented in that episode. Like procedurals, situation comedies (hereafter sitcoms) have the same main characters, placed in different comic situations in each episode. Classic sitcoms such as Friends are highly comprehensible, with up-to-date language and themes. However, sitcom plots can be complicated, exposing audiences to fast-paced language and double meanings. Idioms, colloquialisms, and slang are also common in sitcoms, which can make it difficult for language learners to read between the lines. Serial dramas (soap opera being the older term), which rely more on story arcs, are typically contrasted with procedurals. Serial dramas usually follow an endless running narrative with several ongoing stories.
Unlike procedurals, serial dramas repeat events and relationships over and over again, which creates more 'intertextuality': "Meaning in a text can only ever be understood in relation to other texts; no work stands alone but is interlinked with the tradition that came before it and the context in which it is produced" (Allen, 2011). Since soap operas or serial dramas unfold the story episode by episode, audiences rely heavily on previous episodes to understand the whole story.

The studies reviewed in this section highlight two points. First, related themes, topics, or story lines may make the vocabulary of texts easier for learners to handle and so support their comprehension of the context. Television programs, composed of serialized episodes, take advantage of these features while also offering a great amount of input, as extensive reading does. Second, only two studies have focused on vocabulary coverage in related television programs (Rodgers & Webb, 2011; Webb, 2011). In the present study, based on Sherman's (2003) definition of the three genres, the American television programs are divided into procedurals, serial dramas (soap operas), and sitcoms. In addition, based on Webb and Rodgers' (2009a) classification of subgenres, the present study further divides the American television programs into nine subgenres: dramedy, drama, medical drama, legal drama, crime-thriller drama, supernatural/science-fiction drama, teen drama, period drama, and action drama. Table 3 shows the genres categorized in the present study. The researcher hopes to identify American television programs that language teachers can choose as appropriate for learners with a given vocabulary size.

Table 3
Genres of the Television Programs in the Present Study

| Genre | Subgenre |
|---|---|
| Sitcom | |
| Procedural | |
| Serial Drama | Drama + Teen Drama |
| | Medical Drama |
| | Crime-Thriller + Action Drama |
| | Supernatural + Sci-fi Drama |
In summary, the review in this chapter has shown that two boundaries have been set for the vocabulary coverage needed for good comprehension: 95% for reasonable comprehension (Laufer, 1989) and 98% for ideal comprehension (Nation, 2006). The first focus of the present study is whether language learners who know the words in the BNC lists or the BNC/COCA lists can comprehend authentic television programs, a potentially motivating language resource. In addition, the three types of television programs differ in the nature of their intertextuality. The second focus of the present study is therefore to identify the extent of vocabulary coverage in television programs from different genres. This would help language learners identify the types of television programs with the most potential for good comprehension and, in turn, for incidental vocabulary learning.

CHAPTER THREE
METHOD

In this corpus-based study, the transcripts of sixty American television programs, consisting of 31,323,019 running words, were analyzed using the Range program (Heatley et al., 2002). The present study examines the vocabulary coverage of American television programs by genre. This chapter consists of four sections. The first section describes the corpus data collected from American television programs. The second section describes the compilation of the two sets of baseword lists used in Range, the BNC lists and the BNC/COCA lists. The third section describes the software used in the present study, the Range program. The fourth section explains the analysis procedures.

Corpus Data

The transcripts of 7,279 episodes of sixty American television programs were downloaded from TVsubtitles.net (http://www.tvsubtitles.net/) and analyzed in the present study. The downloaded transcripts cannot be guaranteed to reproduce the actual dialogue of the programs with complete accuracy. However, "the scripts should provide a reliable assessment of the vocabulary in television programs" (Rodgers & Webb, 2011). The programs were chosen according to three factors: the popularity of the programs, genre, and the availability of transcripts (Webb & Rodgers, 2009b). For popularity, TV Guide, a highly credible magazine in the USA, provided a list of the 97 most popular television programs on its website (http://www.tvguide.com/top-tv-shows). The list included both American and British television programs of various genres, such as comedy, drama, crime, reality television shows, and game shows. Reality television shows and game shows were excluded because their transcripts were not available. For genre, the programs were categorized into the three basic types of television programs: procedurals, serial dramas, and sitcoms. The serial dramas were further categorized into the subgenres of drama, medical drama, supernatural/sci-fi drama, and crime-thriller drama, as shown in Table 4. The titles of the episodes of each television program are listed in the Appendix.

Table 4
Genres of the Television Programs and Numbers of Episodes

| No. | Genre | Title | Episodes |
|---|---|---|---|
| 1 | Sitcoms | The Big Bang Theory | 121 |
| 2 | | Two and a Half Men | 210 |
| 3 | | How I Met Your Mother | 169 |
| 4 | | The Simpsons | 420 |
| 5 | | 30 Rock | 130 |
| 6 | | Community | 71 |
| 7 | | The Office | 170 |
| 8 | | Modern Family | 81 |
| 9 | | New Girl | 34 |
| 10 | | Friends | 200 |
| 11 | Procedurals | NCIS | 218 |
| 12 | | Person of Interest | 30 |
| 13 | | Criminal Minds | 169 |
| 14 | | Bones | 151 |
| 15 | | The Closer | 108 |
| 16 | | Castle | 90 |
| 17 | | Chase | 18 |
| 18 | | Law & Order: SVU | 260 |
| 19 | | The Mentalist | 103 |
| 20 | | CSI: Miami | 232 |
| 21 | Serial dramas: Drama | Revenge | 57 |
| 22 | | Brothers & Sisters | 109 |
| 23 | | Ugly Betty | 85 |
| 24 | | Dawson's Creek | 127 |
| 25 | | The West Wing | 154 |
| 26 | | Friday Night Lights | 76 |
| 27 | | Glee | 96 |
| 28 | | Pretty Little Liars | 93 |
| 29 | | Gossip Girl | 119 |
| 30 | | Desperate Housewives | 177 |
| 31 | Serial dramas: Medical Drama | Grey's Anatomy | 177 |
| 32 | | House M.D. | 176 |
| 33 | | ER | 300 |
| 34 | | Saving Hope | 32 |
| 35 | | Scrubs | 181 |
| 36 | | Emily Owens, M.D. | 13 |
| 37 | | Body of Proof | 42 |
| 38 | | A Gifted Man | 16 |
| 39 | | Private Practice | 112 |
| 40 | | Nip/Tuck | 100 |
| 41 | Serial dramas: Supernatural/Sci-fi Drama | True Blood | 70 |
| 42 | | Supernatural | 158 |
| 43 | | The Walking Dead | 47 |
| 44 | | Fringe | 100 |
| 45 | | The Vampire Diaries | 103 |
| 46 | | Lost | 121 |
| 47 | | Smallville | 218 |
| 48 | | Charmed | 178 |
| 49 | | Angel | 110 |
| 50 | | Buffy the Vampire Slayer | 166 |
| 51 | Serial dramas: Crime-Thriller Drama | Breaking Bad | 62 |
| 52 | | Dexter | 96 |
| 53 | | Sons of Anarchy | 79 |
| 54 | | The Wire | 69 |
| 55 | | The Shield | 88 |
| 56 | | Justified | 60 |
| 57 | | Chuck | 91 |
| 58 | | Nikita | 73 |
| 59 | | Burn Notice | 91 |
| 60 | | Leverage | 72 |
| | | Total episodes | 7,279 |

The Word Lists

The word lists used in the present study to determine the 1,000-word level at which each word occurs are the BNC lists and the BNC/COCA lists. The BNC is largely written, with only 10% spoken language (Nation, 2004). According to Nation (2006), the 14 BNC lists were based on the range, frequency, and dispersion of words in the BNC. The lists were checked in three ways to confirm that they were properly ordered. First, when the lists are run over an independent corpus, the number of tokens, types, and families should decrease from the first 1,000 list to the fourteenth 1,000 list. Second, since low-frequency words have fewer family members than high-frequency words, a low-frequency list should contain fewer word types than a high-frequency list with the same number of families. Third, the lists were run over several corpora to make sure that no wide-range, high-frequency words were missing from the lists. The word family is the unit of counting in the BNC lists (Nation, 2006), and
word families are defined at Level 6 of Bauer and Nation's (1993) word-family levels. Level 6 includes inflections (plural -s; third person singular present tense; past tense -ed; past participle; -ing; comparative -er; superlative -est) and over 80 derivational affixes, including -able, -ee, -ic, -ify, -ion, -less, -age, -ant, -ward, circum-, and -y. For example, a high-frequency word family at Level 6 has the following members: natural, unnatural, unnaturally, naturally, natures, naturistic, naturistically, naturalness, naturalist, naturalists, and naturalism. The fifteenth BNC list contains proper nouns and marginal words, such as interjections, exclamations, and hesitation procedures, which necessarily appear in television programs. Since proper nouns and marginal words can be easily understood in television programs, coverage can be calculated both with and without the fifteenth list to see the difference. Two points should be noted. First, the BNC lists are based primarily on written language, with only 10% spoken language, so Rodgers and Webb (2011) suggested that their coverage estimates might be somewhat conservative. Second, the BNC lists are based primarily on British texts (Nation, 2006), so coverage might also be underestimated because the corpus data come from American programs. When Rodgers and Webb (2011) analyzed related versus unrelated television programs using the BNC lists, they anticipated that the coverage of American programs could be higher with lists developed from an American corpus. In the present study, COCA comes into play. Unlike the BNC, COCA is a corpus of 400 million words evenly divided among spoken language, fiction, magazines, newspapers, and academic journals (Davies, 2010). Unlike many other corpora, COCA is a 'monitor corpus', which tracks changes in the language as it is used in the real world.
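The first of the three validity checks on the BNC lists described above lends itself to a simple test: across adjacent 1,000 lists, tokens, types, and families found in an independent corpus should all go down. The sketch below assumes those per-list counts have already been obtained; the `ListCounts` record, the function name, and the figures are hypothetical, and reading "decrease" as strictly decreasing is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ListCounts:
    """Counts observed for one 1,000-word-family list in an independent corpus."""
    level: int     # 1 = first 1,000 families, 2 = second 1,000, ...
    tokens: int    # running words covered by this list
    types: int     # distinct word forms from this list that occur
    families: int  # distinct families from this list that occur


def ordering_violations(rows: list[ListCounts]) -> list[tuple[int, int]]:
    """Return adjacent (level, next_level) pairs where tokens, types, or
    families fail to decrease; an empty list means the lists look well ordered."""
    rows = sorted(rows, key=lambda r: r.level)
    violations = []
    for a, b in zip(rows, rows[1:]):
        if not (a.tokens > b.tokens and a.types > b.types and a.families > b.families):
            violations.append((a.level, b.level))
    return violations


# Hypothetical counts for four lists; list 4 covers more tokens than list 3.
sample = [
    ListCounts(1, 80_000, 3_000, 980),
    ListCounts(2, 9_000, 2_500, 900),
    ListCounts(3, 4_000, 2_100, 850),
    ListCounts(4, 4_200, 1_900, 800),
]
print(ordering_violations(sample))  # → [(3, 4)]
```

A flagged pair would suggest that some families in the later list are more frequent in that corpus than their placement implies.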
The BNC/COCA lists were developed in almost the same way as the BNC lists. Twenty-five word-family lists were developed based on frequency and range data, and four additional lists contain (1) proper nouns, (2) marginal words, such as swear words, exclamations, and letters of the alphabet, (3) transparent compounds, and (4) abbreviations. Unlike the BNC lists, the first two 1,000 word-family lists were made using a specially designed 10-million-token corpus (Nation, 2012). Since the earlier BNC lists were strongly influenced by a corpus consisting primarily of written language (Nation, 2004), very common spoken English words were added to the high-frequency lists, the first two 1,000 lists. The remaining lists, from the third to the twenty-fifth 1,000 list, were developed based on frequency and range data in the same way as the BNC lists.

Range Program

The transcripts were analyzed with the Range software (Nation & Heatley, 2002). Range is a computer program that compares the vocabulary of up to 32 different texts at the same time. For each word in the texts, it reports how many texts the word occurs in (its range), a headword frequency, a word-family frequency, and a frequency for each text in which the word occurs (see Figure 1). In the present study, it is used to investigate the coverage of the sixty American television programs by the BNC lists and the BNC/COCA lists. With the BNC lists, the fourteen 1,000-word-family lists were used in Range to show the level at which each word occurs. Nation (2006) created these fourteen lists from the British National Corpus (BNC), and the word families in the lists were defined at Level 6 (Bauer & Nation, 1993). Proper nouns and marginal words such as ah, oh, and um were listed in the fifteenth and sixteenth lists. Words in the texts that cannot be found in any list are reported as "Not found in any lists".
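Range is a ready-made program, and the sketch below is not its code; it is only a minimal Python illustration, under stated assumptions, of the kind of profile described above. It assumes transcripts are already tokenized into lower-case word forms and that each word family is stored as a headword with its member forms; the `BASEWORDS` data, the list labels, and the function names are hypothetical. Like Range, a plain lookup of word forms cannot tell homographs apart, and the headword-only frequencies that Range also reports are omitted for brevity.

```python
from collections import Counter, defaultdict

# Hypothetical, heavily truncated stand-in for baseword files:
# (list_id, headword, member forms). List assignments are illustrative only.
BASEWORDS = [
    ("list_1", "natural", ["natural", "naturally", "unnatural", "unnaturally", "naturalness"]),
    ("list_1", "say", ["say", "says", "said", "saying"]),
    ("marginal", "oh", ["oh"]),
]


def build_lookup(basewords):
    """Map every family member (word form) to its (list_id, headword)."""
    return {m: (list_id, head) for list_id, head, members in basewords for m in members}


def profile(texts, lookup):
    """Range-style profile of tokenized texts.

    Returns (families, list_tokens, not_found):
    - families: per headword, its range (number of texts it occurs in),
      its family frequency, and its frequency in each text
    - list_tokens: tokens covered by each list, the input to coverage figures
    - not_found: word forms found in no list, with their frequencies
    """
    per_text = defaultdict(Counter)  # headword -> Counter{text_index: frequency}
    list_tokens = Counter()
    not_found = Counter()
    for i, tokens in enumerate(texts):
        for tok in tokens:
            hit = lookup.get(tok)
            if hit is None:
                not_found[tok] += 1
            else:
                list_id, head = hit
                list_tokens[list_id] += 1
                per_text[head][i] += 1
    families = {
        head: {"range": len(c), "freq": sum(c.values()), "by_text": dict(c)}
        for head, c in per_text.items()
    }
    return families, list_tokens, not_found


episode_1 = ["oh", "said", "naturally"]
episode_2 = ["says", "facebook", "unnatural", "says"]
families, list_tokens, not_found = profile([episode_1, episode_2], build_lookup(BASEWORDS))
print(families["say"])  # → {'range': 2, 'freq': 3, 'by_text': {0: 1, 1: 2}}
print(not_found)        # → Counter({'facebook': 1})
```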

More details on the lists can be found in Nation (2006).

Figure 1 Retrieving Vocabulary Coverage with Range Program

Another set of lists, the BNC/COCA word family lists, expands the number of frequency lists to twenty-five. Lists 26 to 30 each contain a single nonsense word because Nation and Davies intended to leave room for additional lists. Lists 31 to 34 contain (1) proper nouns, (2) marginal words, including letters of the alphabet, swear words, and exclamations, (3) transparent compounds such as airbag and leadwork, and (4) abbreviations such as CSI and FBI. For more information about the word lists, see Nation and Webb (2011). The word lists and the Range program can be downloaded from Paul Nation’s website: http://www.victoria.ac.nz/lals/about/staff/paul-nation. Although Range can count frequency and range, it cannot distinguish homographs. In addition, although the lists integrate the BNC and COCA corpora, phrases are not included. These limitations should be noted in the present study.

Analytical Procedures

The transcripts of the sixty American television programs were run through Range to determine the cumulative coverage for each program, for programs of the same genre, and for all sixty programs combined, using the BNC lists and the BNC/COCA lists respectively. Because proper nouns, marginal words, transparent compounds, and abbreviations are easily learned (Nation, 2006; Webb & Rodgers, 2009a, 2009b), these four lists were included in the cumulative coverage. To answer the research questions, the cumulative coverage of each program was calculated by the three sets of lists. In addition, the researcher identified the word-frequency level at which each program reached 95% and 98% coverage respectively. To determine which genres reached 95% and 98% coverage with the most and the least frequent word lists, the programs within each genre were examined separately. A qualitative analysis was also carried out to examine the differences in television program coverage between the two sets of lists. Words that appeared frequently in the television programs but were not in any of the lists merit discussion, so the researcher examined the words not found in any lists to identify discrepancies between the corpus data and the BNC and BNC/COCA lists.
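The cumulative-coverage step can be sketched as follows, assuming per-list percentages like those Range reports are already available. Following the study's practice of counting proper nouns and marginal words (and, for the BNC/COCA lists, compounds and abbreviations) as known, the supplementary percentages are added before the first 1,000-family list. The function name `level_reaching` and the sample figures are hypothetical, not taken from any program in the study.

```python
def level_reaching(level_coverage, supplementary, target):
    """Smallest word-family level at which cumulative coverage hits target.

    level_coverage: coverage (%) of the 1st, 2nd, 3rd, ... 1,000-family
    lists, in frequency order.
    supplementary: summed coverage (%) of lists treated as known from the
    start, such as proper nouns and marginal words.
    Returns the level in word families (1000, 2000, ...) or None when the
    target is never reached.
    """
    cumulative = supplementary
    for i, pct in enumerate(level_coverage, start=1):
        cumulative += pct
        # Small tolerance so float rounding does not hide an exact hit.
        if cumulative >= target - 1e-9:
            return i * 1000
    return None


# Hypothetical program: seven 1,000-family bands plus 3.0% PN and MW.
LEVELS = [82.0, 6.0, 3.0, 1.5, 1.0, 0.8, 0.5]

if __name__ == "__main__":
    print(level_reaching(LEVELS, 3.0, 95))  # 4000
    print(level_reaching(LEVELS, 3.0, 98))  # None
```

Returning `None` rather than the last level keeps "never reached within the lists" distinct from "reached at the final list", which matters for programs that fall short of 98%.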

CHAPTER FOUR RESULTS AND DISCUSSION

This chapter reports the results of the present study. It begins by presenting the overall coverage of individual American television programs by the two sets of word lists and then shows the coverage by genre. Coverage by words Not Found in Any Lists is discussed next. The comparison of the coverage distributions in the two sets of word lists and of the coverage among genres is also presented. Finally, the major findings of the study are discussed further.

Coverage of Individual Programs

The American television programs analyzed comprised 31,323,019 tokens from 7,279 episodes, and their coverage was measured against the 14,000 Level-6 word families in the BNC lists and the 25,000 Level-6 word families in the BNC/COCA lists. The results showed that the first 1,000 word families accounted for the highest percentage of coverage in all the programs, indicating the importance of the most frequent words. However, the programs reached 95% and 98% coverage at different levels.

Vocabulary Size to Reach 95% and 98% in the BNC Lists

Table 5 shows the coverage of individual sitcoms. The vocabulary necessary to reach 95% coverage ranged from 2,000 to 7,000 word families plus proper nouns and marginal words. To reach 98% coverage, over 11,000 word families had to be known.
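To make the two thresholds concrete, the corpus totals above give a rough average episode length. The calculation below assumes every episode has that average length, which real episodes do not, so the figures are only indicative.

```latex
\begin{aligned}
\bar{T} &= \frac{31{,}323{,}019 \text{ tokens}}{7{,}279 \text{ episodes}} \approx 4{,}303 \text{ tokens per episode} \\
U_{95} &= (1 - 0.95)\,\bar{T} \approx 215, \qquad U_{98} = (1 - 0.98)\,\bar{T} \approx 86
\end{aligned}
```

In other words, 95% coverage leaves roughly one unknown token in every 20, and 98% roughly one in every 50.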

Table 5 Coverage for Individual Programs of Sitcom in the BNC Lists
Programs: The Simpsons, How I Met Your Mother, Two and a Half Men, The Big Bang Theory, 30 Rock, Community, The Office, Modern Family, New Girl, Friends
Rows: coverage (%) at each word level from 1,000 to 14,000 word families, PN (proper nouns), MW (marginal words), Not found, and Tokens
a Reaching 95% coverage. b Reaching 98% coverage.

However, seven of the ten sitcoms could not reach 98% coverage because over 2% of their tokens could not be found in any of the BNC lists. It is worth noting that proper nouns accounted for the third highest percentage of coverage, after the first and second 1,000 lists. This clearly shows the importance of recognizing proper nouns when watching television programs. In addition, from the sixth 1,000 word families onward, each list covered less than 1% of the text, indicating the relative importance of the frequent words. Although the percentages differed across programs, six of the ten reached 95% coverage between the 3,000 and 5,000 word levels. Friends reached 95% at the 2,000-word level, while How I Met Your Mother and Community reached 95% only at the higher 6,000-word level. The Simpsons required the least frequent word families of all, reaching 95% at the 7,000-word level. Although The Simpsons is an animated series, it is aimed at adults and its content is full of sarcasm, which involves less frequent vocabulary. On the one hand, this is similar to the finding in Webb’s (2010b) study that children’s programs required 4,000 word families plus proper nouns and marginal words to reach 95% coverage. On the other hand, the American comedy analyzed in Webb’s (2010b) study needed only the most frequent 2,000 word families plus proper nouns and marginal words to reach 95% coverage, similar to the sitcoms analyzed by Webb and Rodgers (2009b), which needed the most frequent 3,000 word families plus proper nouns and marginal words. Only three of the ten sitcoms reached 98% within the 14,000 word families of the BNC lists. These three programs, Two and a Half Men, New Girl, and Friends, revolve around the daily lives and relationships of a fixed set of characters, and such sitcoms use more frequent vocabulary.
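The reasoning behind the seven sitcoms that could not reach 98% can be checked directly: coverage from all fourteen lists plus proper nouns and marginal words can never exceed 100% minus the Not found share. The sketch below applies this to the Not found percentages reported for the ten sitcoms in Table 5 (listed in table column order; program labels are omitted here). The helper names are hypothetical, and the check is a necessary condition only: a program must still accumulate enough coverage across the lists to reach the target.

```python
# "Not found in any lists" percentages for the ten sitcoms in Table 5.
NOT_FOUND = [2.39, 1.84, 3.03, 3.37, 2.73, 2.83, 2.15, 2.45, 1.87, 1.33]


def max_coverage(not_found_pct):
    """Highest coverage attainable if every listed word family is known."""
    return 100 - not_found_pct


def can_reach(target, not_found_pct):
    """Necessary (not sufficient) condition for reaching target coverage."""
    return max_coverage(not_found_pct) >= target


if __name__ == "__main__":
    unreachable = [p for p in NOT_FOUND if not can_reach(98, p)]
    print(len(unreachable))  # 7
```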
