This chapter reports the results of the present study. It begins with the overall coverage of individual American television programs in the two word lists and then presents the coverage by genre. Coverage of words not found in any list is discussed next. The chapter also compares the coverage distribution across the two word lists and across genres. Finally, the major findings of the study are discussed further.
Coverage of Individual Programs
The American television programs analyzed comprised 31,323,019 tokens from 7,279 episodes, which were checked against the 14,000 Level-6 word families from the BNC lists and the 25,000 Level-6 word families from the BNC/COCA lists. The first 1,000 word families accounted for the highest percentage of coverage in every program, indicating the importance of the most frequent words. However, the programs reached 95% and 98% coverage at different levels.
Vocabulary Size to Reach 95% and 98% in the BNC Lists
Table 5 shows the coverage of the individual sitcoms. The vocabulary necessary to reach 95% coverage ranged from 2,000 to 7,000 word families, including proper nouns and marginal words. Among the three programs that reached 98% coverage, 6,000 to 12,000 word families were needed.
Table 5 Coverage for Individual Programs of Sitcom in the BNC Lists
Word level | The Big Bang Theory | Two and a Half Men | How I Met Your Mother | The Simpsons | 30 Rock | Community | The Office | Modern Family | New Girl | Friends
1,000 | 83.42 | 86.34 | 83.94 | 79.29 | 83.15 | 82.3 | 85.88 | 85.49 | 85.89 | 85.25
2,000 | 4.74 | 3.50 | 3.83 | 5.07 | 4.45 | 4.57 | 3.83 | 3.57 | 3.98 | 3.07a
3,000 | 1.97 | 1.58a | 1.72 | 2.50 | 1.91 | 1.96 | 1.52a | 1.66 | 1.66a | 1.37
4,000 | 1.35 | 1.08 | 1.33 | 1.47 | 1.33 | 1.38 | 0.89 | 0.90a | 0.91 | 0.71
5,000 | 0.77a | 0.59 | 1.24 | 1.19 | 0.83a | 1.08 | 0.61 | 0.74 | 0.57 | 0.45
6,000 | 0.60 | 0.46 | 0.45a | 0.73 | 0.52 | 0.58a | 0.45 | 0.68 | 0.43 | 0.35b
7,000 | 0.38 | 0.30 | 0.40 | 0.57a | 0.33 | 0.40 | 0.28 | 0.25 | 0.34 | 0.17
8,000 | 0.27 | 0.17 | 0.22 | 0.35 | 0.21 | 0.29 | 0.17 | 0.13 | 0.15 | 0.11
9,000 | 0.33 | 0.15 | 0.19 | 0.28 | 0.17 | 0.26 | 0.14 | 0.13 | 0.13 | 0.13
10,000 | 0.22 | 0.14 | 0.22 | 0.29 | 0.17 | 0.22 | 0.15 | 0.17 | 0.17 | 0.10
11,000 | 0.18 | 0.09b | 0.10 | 0.17 | 0.12 | 0.13 | 0.10 | 0.08 | 0.12 | 0.06
12,000 | 0.13 | 0.07 | 0.09 | 0.11 | 0.08 | 0.10 | 0.06 | 0.04 | 0.06b | 0.04
13,000 | 0.11 | 0.05 | 0.06 | 0.14 | 0.07 | 0.09 | 0.06 | 0.07 | 0.05 | 0.03
14,000 | 0.08 | 0.04 | 0.04 | 0.06 | 0.05 | 0.04 | 0.04 | 0.04 | 0.05 | 0.02
PN | 2.14 | 2.04 | 1.88 | 2.72 | 2.69 | 2.02 | 2.46 | 1.69 | 2.24 | 2.50
MW | 0.91 | 1.57 | 1.27 | 1.71 | 1.15 | 1.74 | 1.21 | 1.90 | 1.38 | 4.30
Not found | 2.39 | 1.84 | 3.03 | 3.37 | 2.73 | 2.83 | 2.15 | 2.45 | 1.87 | 1.33
Tokens | 311,803 | 548,656 | 507,305 | 915,377 | 424,687 | 214,685 | 525,159 | 293,934 | 113,109 | 596,773
a Reaching 95% coverage.
b Reaching 98% coverage.
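The "a" and "b" markers in Table 5 can be reproduced with a short calculation. The sketch below assumes that cumulative coverage is the sum of the proper noun (PN) and marginal word (MW) percentages plus each successive 1,000-word band, which is how the text describes the 95% figures; the function name and data layout are illustrative only and are not taken from the study. The figures are the Two and a Half Men column of Table 5.

```python
from decimal import Decimal

# Coverage percentages for Two and a Half Men, copied from Table 5 (BNC lists).
# Index 0 is the first 1,000 word families, index 13 the fourteenth.
BANDS = ["86.34", "3.50", "1.58", "1.08", "0.59", "0.46", "0.30",
         "0.17", "0.15", "0.14", "0.09", "0.07", "0.05", "0.04"]
PROPER_NOUNS = "2.04"
MARGINAL_WORDS = "1.57"


def level_reaching(target, bands, proper_nouns, marginal_words):
    """Return the first word level (1000, 2000, ...) at which cumulative
    coverage meets the target, or None if no listed level reaches it.

    Assumption: proper nouns and marginal words count as known, so their
    coverage is added before the first band.
    """
    cumulative = Decimal(proper_nouns) + Decimal(marginal_words)
    for index, band in enumerate(bands, start=1):
        cumulative += Decimal(band)
        if cumulative >= Decimal(target):
            return index * 1000
    return None


print(level_reaching("95", BANDS, PROPER_NOUNS, MARGINAL_WORDS))  # 3000
print(level_reaching("98", BANDS, PROPER_NOUNS, MARGINAL_WORDS))  # 11000
```

Under this assumption the program reaches 95.03% at the 3,000-word level and 98.01% at the 11,000-word level, matching the markers in the table. Decimal arithmetic is used so that rounded table values sitting close to a threshold are not disturbed by floating-point error.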
However, seven of the ten sitcoms could not reach 98% coverage because more than 2% of their tokens could not be found in any of the BNC lists. It is worth noting that proper nouns accounted for the third highest percentage, after the first and second 1,000 lists. This clearly shows the importance of recognizing proper nouns when watching television programs. In addition, from the sixth set of 1,000 word families onward, each list covered less than 1% of the tokens, indicating the relative importance of the frequent words.
Although the percentages differed from program to program, six of the ten sitcoms reached 95% coverage between the 3,000- and 5,000-word levels. Friends reached 95% at the 2,000-word level, while How I Met Your Mother and Community reached 95% at the higher 6,000-word level. Furthermore, The Simpsons reached 95% only at the 7,000-word level, which involves less frequent word families. Although The Simpsons is an animation, it is aimed at adults and its content is full of sarcasm, which draws on less frequent vocabulary.
On the one hand, this is similar to the result in Webb’s (2010b) study that children’s programs required the most frequent 4,000 word families plus proper nouns and marginal words to reach 95% coverage. On the other hand, the American comedy analyzed in Webb’s (2010b) study needed only the most frequent 2,000 word families plus proper nouns and marginal words to reach 95% coverage, close to the most frequent 3,000 word families plus proper nouns and marginal words needed by the sitcoms analyzed by Webb and Rodgers (2009b).
Only three of the ten sitcoms reached 98% coverage within the 14,000 word families; the other seven did not. The three programs, Two and a Half Men, New Girl, and Friends, revolve around the daily lives and relationships of a fixed set of characters, and such sitcoms use more frequent vocabulary.
Table 6 Coverage for Individual Programs of Procedural in the BNC Lists
Word level | NCIS | Person of Interest | Criminal Minds | Bones | The Closer | Castle | Chase | Law & Order: SVU | The Mentalist | CSI Miami
1,000 | 82.94 | 85.91 | 84.87 | 82.47 | 84.9 | 84.53 | 84.38 | 84.17 | 84.62 | 84.39
2,000 | 5.07 | 4.54 | 4.79 | 5.04 | 5.08 | 4.43 | 4.07 | 4.87 | 4.54 | 4.71
3,000 | 1.98 | 1.66 | 1.65 | 2.15 | 1.77a | 2.19 | 1.60 | 1.93 | 1.79 | 1.93
4,000 | 1.30 | 0.98a | 1.04a | 1.32 | 0.99 | 1.17a | 1.05a | 1.49a | 1.02a | 1.14a
5,000 | 0.67a | 0.63 | 0.56 | 1.14a | 0.47 | 0.55 | 0.91 | 0.65 | 0.55 | 0.60
6,000 | 0.64 | 0.30 | 0.39 | 0.52 | 0.58 | 0.42 | 0.34 | 0.44 | 0.41 | 0.42
7,000 | 0.32 | 0.25 | 0.23 | 0.33 | 0.20 | 0.24 | 0.25 | 0.28 | 0.28 | 0.27
8,000 | 0.28 | 0.18 | 0.24 | 0.30 | 0.23 | 0.22 | 0.20 | 0.31 | 0.25 | 0.28
9,000 | 0.23 | 0.13 | 0.16 | 0.28 | 0.13 | 0.17 | 0.16 | 0.16 | 0.21 | 0.19
10,000 | 0.19 | 0.12 | 0.15 | 0.24 | 0.11 | 0.13 | 0.12 | 0.14 | 0.16 | 0.15
11,000 | 0.14 | 0.10 | 0.11 | 0.18 | 0.11b | 0.14 | 0.19 | 0.14 | 0.10 | 0.14
12,000 | 0.15 | 0.09 | 0.1 | 0.18 | 0.10 | 0.11 | 0.09 | 0.16 | 0.08 | 0.19
13,000 | 0.10 | 0.05 | 0.06 | 0.11 | 0.05 | 0.07 | 0.04 | 0.06 | 0.06 | 0.10
14,000 | 0.05 | 0.27 | 0.04 | 0.10 | 0.04 | 0.06 | 0.04 | 0.05 | 0.05 | 0.07
PN | 2.71 | 2.31 | 2.38 | 2.22 | 2.53 | 2.42 | 3.39 | 2.59 | 2.44 | 2.46
MW | 0.48 | 0.45 | 0.73 | 0.91 | 0.91 | 1.01 | 0.85 | 0.28 | 1.31 | 0.64
Not found | 2.75 | 2.02 | 2.47 | 2.49 | 1.80 | 2.14 | 2.33 | 2.28 | 2.13 | 2.31
Tokens | 1,135,465 | 143,590 | 883,878 | 938,516 | 656,406 | 598,467 | 76,735 | 154,402 | 540,897 | 1,025,464
a Reaching 95% coverage.
b Reaching 98% coverage.
Table 6 shows the coverage of the tokens in the procedurals. A vocabulary of 3,000 to 5,000 word families plus proper nouns and marginal words (MW) would provide over 95% coverage, and nine of the ten programs could not reach 98% coverage within the BNC lists. A vocabulary of 4,000 word families plus proper nouns and marginal words would provide over 95% coverage of eight of the procedurals.
Compared with the ten sitcoms, the procedurals were more consistent in the level at which they reached 95% coverage. Their similar crime-related themes and content may contribute to this consistency.
However, the fact that more programs could not reach 98% coverage may result from the crime- or police-related themes, in which vocabulary less frequent than the 14,000 word families may occur.
Two things should be noted here. First, as with the sitcoms, from the sixth set of 1,000 word families onward, each list provided less than 1% coverage. Second, proper nouns accounted for the third highest percentage, after the first and second 1,000 word families. Again, the frequent vocabulary and proper nouns proved significant for vocabulary coverage.
This is similar to the analysis of CSI (Rodgers & Webb, 2011), which needed the most frequent 4,000 word families plus proper nouns and marginal words to reach 95% coverage. However, the word families needed to reach 95% coverage may differ from program to program, and even from episode to episode (Webb, 2011).
Table 7 Coverage for Individual Programs of Serial Drama in the BNC Lists
Word level | Revenge | Brothers & Sisters | Ugly Betty | Dawson’s Creek | The West Wing | Friday Night Lights | Glee | Pretty Little Liars | Gossip Girl | Desperate Housewives
1,000 | 86.54 | 87.15 | 85.70 | 87.34 | 85.07 | 86.32 | 89.03 | 87.01 | 87.97 | 89.06
2,000 | 4.22 | 3.02 | 3.69 | 3.74 | 4.71 | 4.35 | 3.11a | 3.60 | 3.59 | 3.47
3,000 | 1.50a | 1.23a | 1.39a | 1.54a | 1.68 | 1.77a | 1.28 | 1.39a | 1.63a | 1.19a
4,000 | 0.92 | 0.73 | 1.01 | 0.81 | 1.43a | 0.99 | 0.65 | 0.91 | 0.81 | 0.73
5,000 | 0.52 | 0.80 | 0.51 | 0.55 | 0.82 | 0.69 | 0.43 | 0.58 | 0.52 | 0.47
6,000 | 0.28 | 0.41 | 0.33 | 0.33 | 0.51 | 0.48 | 0.25b | 0.30 | 0.77 | 0.33
7,000 | 0.18b | 0.20 | 0.22 | 0.22 | 0.31 | 0.40 | 0.16 | 0.19 | 0.20 | 0.16
8,000 | 0.18 | 0.13 | 0.14 | 0.14 | 0.28 | 0.18 | 0.10 | 0.15 | 0.14 | 0.09
9,000 | 0.13 | 0.12 | 0.12 | 0.12b | 0.20 | 0.36b | 0.09 | 0.11 | 0.14b | 0.16b
10,000 | 0.13 | 0.09 | 0.11 | 0.11 | 0.15 | 0.16 | 0.10 | 0.13b | 0.11 | 0.07
11,000 | 0.11 | 0.06 | 0.09 | 0.07 | 0.11 | 0.11 | 0.05 | 0.09 | 0.09 | 0.15
12,000 | 0.06 | 0.04b | 0.05b | 0.05 | 0.09 | 0.09 | 0.07 | 0.05 | 0.06 | 0.07
13,000 | 0.06 | 0.05 | 0.05 | 0.04 | 0.07 | 0.10 | 0.08 | 0.06 | 0.07 | 0.08
14,000 | 0.03 | 0.03 | 0.03 | 0.03 | 0.05b | 0.08 | 0.17 | 0.04 | 0.04 | 0.02
PN | 2.98 | 2.04 | 2.70 | 1.94 | 2.43 | 1.77 | 2.62 | 2.56 | 1.89 | 1.68
MW | 0.95 | 1.99 | 1.89 | 1.28 | 0.11 | 1.04 | 0.77 | 1.16 | 0.45 | 0.76
Not found | 1.20 | 1.91 | 1.99 | 1.67 | 1.97 | 1.21 | 1.05 | 1.68 | 1.52 | 1.50
Tokens | 143,230 | 727,945 | 458,449 | 802,600 | 902,129 | 450,977 | 279,645 | 716,028 | 659,138 | 418,385
a Reaching 95% coverage.
b Reaching 98% coverage.
As can be seen in Table 7, unlike the previous twenty programs, the serial dramas reached 95% coverage with a vocabulary of 2,000 to 4,000 word families and reached 98% coverage, the level suggested by Nation (2006) for ideal comprehension, with a vocabulary of 6,000 to 14,000 word families. Among the twenty sitcoms and procedurals, only four reached 98% coverage within the fourteen BNC lists, whereas all ten serial dramas did. The serial drama genre thus achieved high coverage with more frequent vocabulary.
Although the programs in Table 7 reached 95% coverage between the 2,000- and 4,000-word levels, eight of them did so at the 3,000-word level. The West Wing, unlike the other programs, deals with political issues, which require less frequent vocabulary. Glee, on the other hand, is a teen musical drama that uses more frequent vocabulary in its dialogue and song lyrics. It should also be noted that each list provided less than 1% coverage from the 5,000-word level onward. As can be seen, serial dramas, compared with sitcoms and procedurals, make greater use of higher-frequency vocabulary.
These results were consistent with previous studies of dramas (Webb, 2010b; Webb & Rodgers, 2009a, 2009b). Dramas in movies (Webb & Rodgers, 2009a) and in television programs (Webb, 2010b; Webb & Rodgers, 2009b) needed the most frequent 3,000 word families plus proper nouns and marginal words to reach 95% coverage. However, even though the results were similar overall, programs such as Glee and The West Wing still needed different levels of word families to reach 95% coverage.
Table 8 Coverage for Individual Programs of Serial Medical Drama in the BNC Lists
Word level | Grey’s Anatomy | House M.D. | ER | Saving Hope | Scrubs | Emily Owen M.D. | Body of Proof | A Gifted Man | Private Practice | Nip/Tuck
1,000 | 86.13 | 84.73 | 85.40 | 85.16 | 85.63 | 85.59 | 84.36 | 84.62 | 87.41 | 85.72
2,000 | 3.95 | 4.77 | 4.02 | 3.94 | 4.30 | 3.79 | 4.90 | 4.15 | 3.31 | 3.98
3,000 | 1.76 | 2.35 | 2.06 | 2.10 | 1.98 | 1.84 | 1.99 | 2.08 | 1.39a | 1.76
4,000 | 0.90a | 1.26 | 1.09 | 0.93 | 0.87a | 1.08 | 1.09a | 0.90a | 0.71 | 1.22a
5,000 | 0.60 | 0.88a | 0.69a | 0.84a | 0.72 | 0.86a | 0.66 | 0.72 | 0.61 | 0.62
6,000 | 0.29 | 0.52 | 0.41 | 0.43 | 0.44 | 0.40 | 0.37 | 0.42 | 0.27 | 0.47
7,000 | 0.24 | 0.35 | 0.28 | 0.36 | 0.32 | 0.26 | 0.29 | 0.25 | 0.20 | 0.22
8,000 | 0.18 | 0.33 | 0.24 | 0.15 | 0.19 | 0.28 | 0.22 | 0.21 | 0.20 | 0.20
9,000 | 0.19 | 0.27 | 0.23 | 0.27 | 0.17 | 0.21 | 0.20 | 0.17 | 0.14 | 0.19
10,000 | 0.12 | 0.21 | 0.19 | 0.12 | 0.14 | 0.23 | 0.21 | 0.22 | 0.11 | 0.12
11,000 | 0.17 | 0.18 | 0.17 | 0.12 | 0.15 | 0.16 | 0.12 | 0.12 | 0.09 | 0.10
12,000 | 0.08 | 0.15 | 0.11 | 0.18 | 0.07 | 0.08 | 0.15 | 0.10 | 0.04 | 0.07
13,000 | 0.07 | 0.10 | 0.10 | 0.14 | 0.07 | 0.10 | 0.11 | 0.10 | 0.04 | 0.07
14,000 | 0.05 | 0.12 | 0.08 | 0.07 | 0.04 | 0.11 | 0.07 | 0.05 | 0.05 | 0.03
PN | 1.84 | 0.74 | 1.87 | 1.58 | 1.79 | 1.59 | 2.23 | 2.48 | 1.99 | 2.06
MW | 1.22 | 0.47 | 0.52 | 1.02 | 0.80 | 1.10 | 1.01 | 0.91 | 1.27 | 0.58
Not found | 2.20 | 2.56 | 2.54 | 2.61 | 2.32 | 2.34 | 2.02 | 2.48 | 2.19 | 2.59
Tokens | 970,557 | 923,936 | 1,686,261 | 10,399 | 558,862 | 55,951 | 235,614 | 49,518 | 570,941 | 428,688
a Reaching 95% coverage.
b Reaching 98% coverage.
Table 8 lists the coverage of ten individual programs in the medical drama genre in the BNC lists. As with the procedurals, 3,000 to 5,000 word families would provide 95% coverage of the texts. The obvious difference, however, is that the medical dramas reached 95% mainly at the 4,000- and 5,000-word levels, whereas the procedurals did so mainly at the 4,000-word level.
In previous research, two of these programs, Grey’s Anatomy and House M.D., were analyzed as whole programs and found to reach 95% coverage at the most frequent 3,000-word level and the most frequent 4,000-word level, respectively. Those results differ slightly from the results of the present study. A closer look at Webb’s (2010c) study shows that, although a single episode of Grey’s Anatomy required the most frequent 3,000 word families to reach 95% coverage, a single episode of House M.D. required the most frequent 5,000 word families, indicating that the word families needed to reach a given coverage differ from episode to episode.
The discrepancy between the previous studies and the present one could result from differences in data collection: Rodgers and Webb (2011) collected one season of each program (18 to 22 episodes), whereas the present study collected 177 episodes of Grey’s Anatomy and 176 episodes of House M.D.
The results of the present study also reveal a striking difference: proper nouns (PN) provided a smaller percentage of coverage than in the programs discussed earlier.
In half of the programs shown in Table 8, proper nouns no longer accounted for the third highest percentage. In ER, Saving Hope, Scrubs, and Emily Owen M.D., proper nouns accounted for the fourth highest percentage of tokens.
Moreover, proper nouns accounted for only 0.74% of the tokens in House M.D., the sixth highest percentage. This suggests that proper nouns were used less in House M.D. than in the other programs; indeed, it was the lowest percentage of proper nouns of all the programs.
Table 9 Coverage for Individual Programs of Serial Supernatural Drama in the BNC Lists
Word level | True Blood | Supernatural | The Walking Dead | Fringe | The Vampire Diaries | Lost | Smallville | Charmed | Angel | Buffy the Vampire Slayer
1,000 | 87.33 | 86.03 | 88.56 | 85.25 | 87.43 | 88.08 | 86.79 | 87.08 | 86.63 | 89.04
2,000 | 3.59 | 3.71 | 3.49 | 4.55 | 3.12 | 3.92 | 3.85 | 3.75 | 3.92 | 3.30
3,000 | 1.52 | 1.79 | 1.39a | 1.83 | 1.36a | 1.45a | 1.64a | 1.67 | 1.94 | 1.52a
4,000 | 1.52a | 1.68a | 0.90 | 1.17a | 0.86 | 0.77 | 0.92 | 1.26a | 1.53a | 1.08
5,000 | 0.52 | 0.58 | 0.56 | 0.65 | 0.57 | 0.43 | 0.47 | 0.56 | 0.65 | 0.36
6,000 | 0.47 | 0.41 | 0.37 | 0.40 | 0.36 | 0.26 | 0.46 | 0.41 | 0.48 | 0.27
7,000 | 0.23 | 0.29 | 0.28b | 0.25 | 0.29 | 0.28 | 0.20 | 0.37 | 0.28 | 0.48b
8,000 | 0.12 | 0.18 | 0.17 | 0.25 | 0.13 | 0.18b | 0.17 | 0.17 | 0.23 | 0.15
9,000 | 0.13 | 0.17 | 0.12 | 0.19 | 0.09b | 0.13 | 0.22 | 0.18 | 0.17 | 0.13
10,000 | 0.40b | 0.18 | 0.14 | 0.18b | 0.38 | 0.11 | 0.15 | 0.40 | 0.30b | 0.10
11,000 | 0.07 | 0.10b | 0.06 | 0.12 | 0.05 | 0.07 | 0.15 | 0.10 | 0.11 | 0.07
12,000 | 0.06 | 0.07 | 0.06 | 0.10 | 0.05 | 0.04 | 0.08b | 0.08b | 0.08 | 0.07
13,000 | 0.05 | 0.07 | 0.03 | 0.11 | 0.06 | 0.03 | 0.08 | 0.07 | 0.09 | 0.17
14,000 | 0.03 | 0.05 | 0.03 | 0.05 | 0.03 | 0.04 | 0.05 | 0.04 | 0.07 | 0.05
PN | 1.65 | 1.91 | 1.95 | 2.53 | 2.93 | 2.27 | 2.61 | 1.27 | 1.35 | 1.78
MW | 0.83 | 1.02 | 0.68 | 0.81 | 0.92 | 0.44 | 0.34 | 0.73 | 0.60 | 0.39
Not found | 1.47 | 1.76 | 1.19 | 1.56 | 1.37 | 1.52 | 1.81 | 1.84 | 1.57 | 1.02
Tokens | 313,644 | 582,861 | 88,133 | 399,251 | 305,294 | 391,074 | 823,462 | 599,090 | 430,940 | 707,921
a Reaching 95% coverage.
b Reaching 98% coverage.
Table 9 shows the coverage of the ten serial supernatural dramas in the BNC lists. Knowledge of the most frequent 3,000 to 4,000 word families provided 95% coverage of the programs. The results were similar to those of Webb (2010b) and Webb and Rodgers (2009b), who examined sci-fi/supernatural television programs and found that the most frequent 4,000 word families plus proper nouns and marginal words were needed to reach 95% coverage. However, coverage differed from program to program, and the most frequent 3,000 word families would be enough for a sci-fi/supernatural movie to reach 95% coverage (Webb & Rodgers, 2009a).
As with the serial dramas, a vocabulary at the 7,000- to 12,000-word level provided 98% coverage of the tokens. Proper nouns again accounted for the third highest percentage of coverage, clearly revealing the significance of knowing proper nouns.
The coverage of the programs shown in Table 9 indicated that higher-frequency vocabulary was needed. A possible reason is that, although these programs are categorized as supernatural, their themes center on human nature, interactions in daily life, and relationships between the characters, so higher-frequency vocabulary is used more often, as in serial dramas. Further evidence appears in the coverage of the first 1,000 word families, which provided over 85% coverage in each program, indicating heavy use of the most frequent vocabulary.
As shown in Table 10, knowledge of the most frequent 3,000 to 4,000 word families provided 95% coverage of the texts, while none of the programs except Nikita reached 98% coverage within the fourteen BNC lists. It is not surprising that knowledge of the 14,000 word families could not provide 98% coverage of the tokens, since the genre is crime-related, similar to the procedurals.
Table 10 Coverage for Individual Programs of Serial Criminal Drama in the BNC Lists
Word level | Breaking Bad | Dexter | Sons of Anarchy | The Wire | The Shield | Chuck | Nikita | Burn Notice | Leverage | Justified
1,000 | 86.37 | 85.79 | 84.71 | 85.92 | 85.97 | 84.35 | 86.42 | 86.35 | 85.04 | 86.53
2,000 | 3.60 | 4.04 | 3.90 | 4.04 | 4.05 | 4.51 | 4.19 | 4.12 | 4.18 | 3.45
3,000 | 1.53 | 1.76 | 1.56 | 1.37 | 1.70 | 1.68 | 1.74a | 1.62a | 1.78 | 1.42
4,000 | 1.02a | 1.47a | 1.97a | 2.03a | 1.16a | 0.92a | 0.99 | 0.94 | 1.11a | 1.04a
5,000 | 0.53 | 0.57 | 0.55 | 0.61 | 0.55 | 0.75 | 0.62 | 0.64 | 0.65 | 0.62
6,000 | 0.35 | 0.44 | 0.52 | 0.40 | 0.34 | 0.40 | 0.34 | 0.30 | 0.37 | 0.36
7,000 | 0.25 | 0.31 | 0.3 | 0.27 | 0.23 | 0.43 | 0.24 | 0.22 | 0.25 | 0.19
8,000 | 0.18 | 0.17 | 0.15 | 0.23 | 0.19 | 0.20 | 0.26 | 0.18 | 0.23 | 0.20
9,000 | 0.14 | 0.13 | 0.16 | 0.16 | 0.13 | 0.16 | 0.17 | 0.15 | 0.19 | 0.11
10,000 | 0.12 | 0.14 | 0.16 | 0.07 | 0.16 | 0.19 | 0.13b | 0.15 | 0.18 | 0.13
11,000 | 0.09 | 0.11 | 0.07 | 0.11 | 0.12 | 0.11 | 0.08 | 0.10 | 0.12 | 0.13
12,000 | 0.07 | 0.13 | 0.07 | 0.12 | 0.05 | 0.06 | 0.06 | 0.06 | 0.07 | 0.05
13,000 | 0.05 | 0.09 | 0.03 | 0.06 | 0.05 | 0.07 | 0.06 | 0.05 | 0.09 | 0.05
14,000 | 0.05 | 0.06 | 0.02 | 0.03 | 0.04 | 0.05 | 0.03 | 0.03 | 0.07 | 0.02
PN | 1.55 | 2.08 | 2.37 | 1.61 | 1.81 | 2.31 | 2.56 | 2.24 | 1.90 | 2.31
MW | 1.39 | 0.70 | 0.50 | 0.25 | 0.47 | 1.33 | 0.45 | 0.69 | 1.35 | 0.81
Not found | 2.72 | 2.01 | 2.97 | 2.72 | 2.98 | 2.48 | 1.66 | 2.15 | 2.43 | 2.57
Tokens | 202,731 | 432,100 | 237,784 | 310,628 | 539,978 | 465,446 | 212,140 | 568,193 | 374,038 | 293,810
a Reaching 95% coverage.
b Reaching 98% coverage.
However, the coverage of words not found in any list is worth noting and is discussed in later sections: since Burn Notice reached 95% coverage at the 3,000-word level, 98% coverage would be expected within the fourteen BNC lists.
In answer to the first research question, the results indicated that the coverage of the sixty American television programs varied slightly between genres but differed considerably between individual programs (see Table 11). A vocabulary of 2,000 to 7,000 word families plus proper nouns and marginal words reached 95% coverage of the tokens. To reach 98% coverage, knowledge of 6,000 to 14,000 word families was needed, and for many programs even this was not enough. Table 11 also shows that the coverage provided by each list dropped below 1% between the 4,000- and 6,000-word levels, in line with the suggestion of Nation and Anthony (2013) that the number of word families drops drastically at the 4,000 word-family level.
Across the sixty programs, a vocabulary of 3,000 to 4,000 word families provided 95% coverage for most of them. This is similar to the results of Webb and Rodgers (2009b), in which the most frequent 2,000 to 4,000 word families provided 95% coverage of American television programs, including news, drama, science fiction, children’s programs, older programs, and situation comedies.
The word families needed to reach 95% coverage of a television program varied. Previous studies suggested the most frequent 3,000 word families plus proper nouns and marginal words (Rodgers & Webb, 2011; Webb, 2010a, 2010b, 2010c; Webb & Rodgers, 2009a, 2009b). However, as can be seen in Table 11, the results of the present study suggest that the most frequent 4,000 word families plus proper nouns and marginal words were needed to reach 95% coverage of the television programs.
This difference could have three causes. First, the present study collected far more data than the previous studies, which analyzed at most one season of a program (Webb, 2010b, 2010c) or even a single episode representing a program (Webb, 2011). The present study, in contrast, collected each whole program from the first episode of the first season to the latest episode. The size of the data set may account for the difference between the results.
Second, as discussed previously, the word families needed to reach 95% coverage varied from episode to episode, let alone from program to program. The programs or episodes collected in the previous studies might differ from those in the present study. Although the same genres were chosen for analysis, the particular programs and episodes chosen could make a significant difference to the results.
Last but not least, the word level reported for 95% coverage does not mean that coverage was exactly 95%, nor that all of the word families at that level were needed to reach it. In other words, a slight difference in coverage can separate two 1,000-word levels. For example, the most frequent 3,000 word families might provide 95.02% coverage of one program but 94.98% of another. The difference in coverage between the two programs is only 0.04 percentage points, yet the results would report that the most frequent 3,000 word families are enough for the first program to reach 95% coverage, whereas the second program would be reported as needing the most frequent 4,000 word families.
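The threshold effect described above can be illustrated with a small sketch. Only the 95.02% and 94.98% figures come from the example in the text; the other cumulative percentages, the program labels, and the function name are hypothetical values chosen for illustration.

```python
def reported_level(cumulative_by_level, target=95.0):
    """Return the lowest word level whose cumulative coverage meets the target,
    or None if no listed level reaches it."""
    for level in sorted(cumulative_by_level):
        if cumulative_by_level[level] >= target:
            return level
    return None


# Cumulative coverage (%) by word level. The 3,000-level values are the ones
# used in the text; the 2,000- and 4,000-level values are hypothetical.
program_a = {2000: 93.40, 3000: 95.02, 4000: 96.10}
program_b = {2000: 93.36, 3000: 94.98, 4000: 96.06}

gap = round(program_a[3000] - program_b[3000], 2)
print(reported_level(program_a))  # 3000
print(reported_level(program_b))  # 4000
print(gap)                        # 0.04
```

A gap of 0.04 percentage points at the 3,000-word level is enough to move the reported level of the second program up by a full 1,000 word families, which is why the reported levels are best read together with the underlying cumulative percentages.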
As for 98% coverage, Webb and Rodgers (2009b) suggested knowledge of the most frequent 5,000 to 9,000 word families, while Rodgers and Webb (2011) reported a vocabulary at the 6,000- to 8,000-word level. In the present study, however, Table 11 reveals that a vocabulary of 6,000 to 14,000 word families was needed to reach 98% coverage of the texts, and thirty-five of the programs could not reach that percentage. Words not found in any list are discussed in later sections to examine the striking discrepancies between the present study and the reviewed literature.
Table 11 Numbers of Programs Reaching 95% and 98% Coverage at the Word-Family Levels in the BNC Lists
Word Level 95% 98% Below 1%
2,000 2 - -
3,000 20 - -
4,000 27 - 26
5,000 8 - 30
6,000 2 2 4
7,000 1 3 -
8,000 - 1 -
9,000 - 5 -
10,000 - 5 -
11,000 - 3 -
12,000 - 5 -
14,000 - 1 -
Not found - 35 -
Total 60 60 60
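The 95% column of Table 11 can be cross-checked against the "a" markers in Tables 5 to 10. The sketch below lists, for each genre, the word level at which each of the ten programs is marked as reaching 95% coverage in the BNC lists (in table column order) and tallies them; the variable names are illustrative only.

```python
from collections import Counter

# Word level marked "a" (reaching 95% coverage) for each program,
# read column by column from Tables 5 to 10 (BNC lists).
LEVELS_95 = {
    "sitcom":       [5000, 3000, 6000, 7000, 5000, 6000, 3000, 4000, 3000, 2000],
    "procedural":   [5000, 4000, 4000, 5000, 3000, 4000, 4000, 4000, 4000, 4000],
    "serial drama": [3000, 3000, 3000, 3000, 4000, 3000, 2000, 3000, 3000, 3000],
    "medical":      [4000, 5000, 5000, 5000, 4000, 5000, 4000, 4000, 3000, 4000],
    "supernatural": [4000, 4000, 3000, 4000, 3000, 3000, 3000, 4000, 4000, 3000],
    "criminal":     [4000, 4000, 4000, 4000, 4000, 4000, 3000, 3000, 4000, 4000],
}

# Count how many of the sixty programs reach 95% coverage at each level.
tally = Counter(level for levels in LEVELS_95.values() for level in levels)

for level in sorted(tally):
    print(f"{level:>6,}  {tally[level]:>2}")
```

The tally gives 2, 20, 27, 8, 2, and 1 programs at the 2,000- to 7,000-word levels, which agrees with the 95% column of Table 11 and sums to sixty.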
Vocabulary Size to Reach 95% and 98% in the BNC/COCA Lists
Table 12 shows that the first 1,000 word families accounted for most of the coverage, and the percentage for each list decreased consistently as word frequency decreased. A vocabulary of 2,000 to 6,000 word families provided 95% coverage of the texts, while knowledge at the 5,000-word level or above was needed to reach 98% coverage. As with the BNC lists, proper nouns accounted for the third highest percentage, after the first and second high-frequency vocabulary lists.
However, proper nouns (PN) in Friends accounted for the second highest coverage in the BNC/COCA lists, indicating the relative significance of knowing proper nouns (Webb & Rodgers, 2009b).
Although six of the programs reached 95% coverage at the 3,000- to 4,000-word levels, Friends reached 95% at the 2,000-word level, while How I Met Your Mother, 30 Rock, and Community, which use less frequent vocabulary, reached it at the 6,000-word level.
Table 12 also shows that a vocabulary of 5,000 to over 14,000 word families provided 98% coverage of the texts. Two and a Half Men, The Office, Modern Family, and New Girl reached 98% coverage at the 9,000-word level, while How I Met Your Mother and The Simpsons could not reach 98% coverage within the 25,000 word families.
The words not found in any list are worth discussing, since corpora as large as the BNC and COCA would be expected to provide 98% coverage of the texts in television programs.
Table 12 Coverage for Individual Programs of Sitcom in the BNC/COCA Lists
Word level | The Big Bang Theory | Two and a Half Men | How I Met Your Mother | The Simpsons | 30 Rock | Community | The Office | Modern Family | New Girl | Friends
1,000 | 84.04 | 87.21 | 84.78 | 80.41 | 83.71 | 83.09 | 86.14 | 86.57 | 86.88 | 85.92
2,000 | 4.26 | 3.12 | 3.41 | 4.68 | 3.78 | 4.07 | 3.61 | 3.22 | 3.06 | 2.64a
3,000 | 1.82 | 0.89a | 1.13 | 1.52 | 1.46 | 1.31 | 1.31a | 0.88a | 0.88a | 0.69
4,000 | 1.12a | 0.77 | 0.84 | 1.36a | 1.21 | 1.12 | 0.74 | 0.77 | 0.76 | 0.58
5,000 | 0.85 | 0.65 | 0.97 | 1.32 | 0.67 | 1.00 | 0.57 | 0.63 | 0.71 | 0.47b
6,000 | 0.56 | 0.44 | 1.04a | 0.71 | 0.48a | 0.62a | 0.48 | 0.54 | 0.45 | 0.37
7,000 | 0.35 | 0.26 | 0.27 | 0.42 | 0.25 | 0.37 | 0.26 | 0.35 | 0.23 | 0.18
8,000 | 0.31 | 0.21 | 0.33 | 0.39 | 0.28 | 0.41 | 0.21 | 0.40 | 0.62 | 0.15
9,000 | 0.25 | 0.23b | 0.20 | 0.31 | 0.19 | 0.24 | 0.16b | 0.17b | 0.19b | 0.10
10,000 | 0.17 | 0.12 | 0.11 | 0.22 | 0.15 | 0.17 | 0.12 | 0.09 | 0.11 | 0.07
11,000 | 0.17 | 0.14 | 0.12 | 0.17 | 0.12 | 0.18 | 0.12 | 0.10 | 0.11 | 0.08
12,000 | 0.13b | 0.10 | 0.15 | 0.16 | 0.11 | 0.17 | 0.08 | 0.07 | 0.11 | 0.06
13,000 | 0.10 | 0.07 | 0.07 | 0.10 | 0.07 | 0.11 | 0.05 | 0.06 | 0.05 | 0.04
14,000 | 0.10 | 0.04 | 0.05 | 0.08 | 0.05b | 0.05b | 0.04 | 0.03 | 0.03 | 0.03
PN | 2.56 | 2.26 | 2.25 | 3.31 | 3.71 | 1.69 | 2.86 | 2.13 | 2.61 | 2.84
MW | 0.97 | 1.66 | 1.32 | 1.83 | 1.19 | 1.75 | 1.27 | 1.94 | 1.43 | 4.50
CP | 0.47 | 0.45 | 0.46 | 0.45 | 0.45 | 0.42 | 0.38 | 0.42 | 0.35 | 0.34
Abbr. | 0.08 | 0.04 | 0.11 | 0.07 | 0.08 | 0.06 | 0.05 | 0.04 | 0.08 | 0.03
Not found | 1.41 | 1.16 | 2.16 | 2.14 | 1.87 | 1.70 | 1.40 | 1.47 | 1.19 | 0.66
Tokens | 311,803 | 548,656 | 507,305 | 915,377 | 424,687 | 214,685 | 525,159 | 293,934 | 113,109 | 596,773
a Reaching 95% coverage.
b Reaching 98% coverage.
Table 13 presents the vocabulary coverage of procedurals in the BNC/COCA lists. Knowledge of 3,000 word families would provide 95% coverage of seven of the programs. A vocabulary of 4,000 word families would provide 95% coverage of CSI Miami and NCIS, while the 5,000-word level would provide 95% coverage of Bones. The results were consistent with three other studies of the coverage of television programs (Rodgers & Webb, 2011; Webb, 2010b; Webb & Rodgers, 2009b), in which an average of 3,000 word families was found to provide 95% coverage.
The word levels needed for 98% coverage differed. Knowledge of 7,000 to 13,000 word families would provide 98% coverage of the ten programs in the present study, whereas 5,000 to 9,000 word families were needed in Webb and Rodgers’ (2009b) study and 6,000 to 8,000 word families in Rodgers and Webb’s (2011) study. A possible reason is that those two studies analyzed only eighty-eight programs and 288 episodes, respectively, which consisted of similar theme-related episodes.
Table 13 Coverage for Individual Programs of Procedural in the BNC/COCA Lists
Word level | NCIS | Person of Interest | Criminal Minds | Bones | The Closer | Castle | Chase | Law & Order: SVU | The Mentalist | CSI Miami
1,000 | 82.33 | 86.02 | 84.78 | 82.89 | 84.73 | 84.64 | 84.88 | 84.41 | 84.87 | 84.55
2,000 | 5.31 | 4.55 | 4.55 | 5.15 | 5.02 | 4.73 | 3.59 | 4.80 | 4.32 | 4.63
3,000 | 1.98 | 1.59a | 1.98a | 2.13 | 1.67a | 1.58a | 1.28a | 2.11a | 1.54a | 1.68
4,000 | 1.26a | 0.81 | 0.88 | 1.30 | 1.21 | 0.97 | 0.71 | 1.07 | 0.91 | 1.16a
5,000 | 0.73 | 0.57 | 0.52 | 0.75a | 0.46 | 0.56 | 0.77 | 0.69 | 0.59 | 0.67
6,000 | 0.45 | 0.35 | 0.38 | 0.78 | 0.32 | 0.40 | 0.43 | 0.44 | 0.41 | 0.42
7,000 | 0.27 | 0.28 | 0.26 | 0.39 | 0.21b | 0.31 | 0.33 | 0.32 | 0.25 | 0.29
8,000 | 0.27 | 0.15 | 0.21 | 0.34 | 0.16 | 0.23 | 0.36 | 0.24 | 0.29 | 0.27
9,000 | 0.16 | 0.10b | 0.13 | 0.26 | 0.14 | 0.19b | 0.16 | 0.16b | 0.19b | 0.18
10,000 | 0.13 | 0.08 | 0.11 | 0.19 | 0.10 | 0.11 | 0.07b | 0.13 | 0.11 | 0.12b
11,000 | 0.11 | 0.29 | 0.10 | 0.16 | 0.10 | 0.12 | 0.08 | 0.11 | 0.10 | 0.11
12,000 | 0.09 | 0.04 | 0.09b | 0.13 | 0.06 | 0.10 | 0.11 | 0.08 | 0.06 | 0.10
13,000 | 0.08b | 0.03 | 0.05 | 0.09b | 0.02 | 0.04 | 0.03 | 0.05 | 0.05 | 0.06
14,000 | 0.06 | 0.03 | 0.03 | 0.06 | 0.02 | 0.06 | 0.05 | 0.04 | 0.05 | 0.05
PN | 3.71 | 2.72 | 2.81 | 2.20 | 3.02 | 2.93 | 3.92 | 3.00 | 2.86 | 2.84
MW | 0.56 | 0.49 | 0.78 | 0.94 | 1.01 | 1.06 | 1.05 | 0.38 | 1.41 | 0.66
CP | 0.39 | 0.33 | 0.30 | 0.29 | 0.32 | 0.38 | 0.39 | 0.35 | 0.32 | 0.38
Abbr. | 0.16 | 0.10 | 0.13 | 0.08 | 0.12 | 0.08 | 0.10 | 0.07 | 0.08 | 0.16
Not found | 1.69 | 1.36 | 1.72 | 1.59 | 1.21 | 1.37 | 1.49 | 1.44 | 1.47 | 1.49
a Reaching 95% coverage.
b Reaching 98% coverage.
Table 14 shows the vocabulary coverage of serial dramas in the BNC/COCA lists. It is not surprising that the most frequent 1,000 word families accounted for the highest percentage of coverage of all the lists. Nor is it surprising that the second set of 1,000 word families accounted for at most 5.2%. In the serial drama genre, the most frequent 2,000 word families plus proper nouns, marginal words, apparent compounds, and abbreviations would provide 95% coverage, while The West Wing and Dawson’s Creek needed the most frequent 3,000 word families to reach the same percentage.
The reason why the two programs reached 95% at a less frequent word level could be that they were originally broadcast in 1999 and 1998, respectively, when vocabulary usage would have differed from that of today. This finding supports Webb’s (2011) earlier