Organization of this study - Measuring Early Vocabulary Growth of Mandarin-Speaking Children: A Corpus-Based Study

Chapter 1 Introduction

1.3. Organization of this study

The thesis consists of five chapters. Chapter 1 provides the background of this study and introduces its purpose and research questions. Chapter 2 is a literature review, introducing the methods used to observe children’s language development in past studies and previous findings on children’s early vocabulary acquisition. Chapter 3 presents the methodology used in this study, which is a corpus-based analysis. The corpus is introduced first, including the reason for choosing it, information about the language samples examined, and the definitions of the lexical categories, semantic categories, and conceptual levels used in the study. The measures of vocabulary growth and vocabulary organization are then introduced. Chapter 4 presents the results of the measurements described in Chapter 3 and builds up the profile of children’s early lexicon. A general observation of the whole corpus is provided first, followed by detailed results for all measured items. Chapter 5 provides a general discussion based on the results, the implications of this study, suggestions for future studies, and a conclusion. Abbreviations used in this study are listed below.

CHILDES: Child Language Data Exchange System

MCDI: MacArthur-Bates Communicative Development Inventory (Liu & Tsao, 2010)

NVR: Noun/Verb ratio; N/V ratio

POS: Part-of-speech

TCCM: Taiwan Corpus of Child Mandarin

TTR: Type/Token ratio

Chapter 2 Literature Review

2.1. Methods of exploring children’s early vocabulary

Various methods have been adopted for the study of children’s vocabulary development. In some studies, a wordlist was first constructed and validated with parents’ input (Fenson, Dale, Reznick, & Bates, 1994; Dale & Fenson, 1996). Parents were asked to check the wordlist to record the words and expressions that their children had acquired. Researchers have also designed experiments to examine children’s performance on specific language knowledge (Haryu et al., 2005; Mervis & Crisafi, 1982; Zeng & Zou, 2012). Other researchers have audio-recorded children’s speech, transcribed the collected data, and then analyzed it (Hsu, 1996; Huang, 2009; Lee, 2014; Tardif, 1996; Yeh, 2009). In this section, three methods of investigating children’s language are reviewed: parental report, experiment, and corpus-based study using CHILDES.

Parental report

Vocabulary development is an essential milestone in children’s early language development. It can be used to predict children’s later language skills (Hao et al., 2008). Because of the importance of children’s early vocabulary, researchers have built norms for vocabulary measurement, among which the parental report has been the most widely used. A parental report is less time-consuming and easy to conduct, so researchers can collect a large number of children’s early words in a short time. Parental report needs parents to judge whether their children can

and Dale and Fenson (1996) were the first to do this study and provided a norm for the parental report, which was called the MacArthur-Bates Communicative Development Inventory (MCDI). It was used to investigate American children’s comprehension and production vocabulary between the ages of 0;8 (8 months) and 2;4 (2 years 4 months). Based on the American English version of the MCDI, researchers have established versions of the MCDI in other languages and linguistic varieties, such as British English (Hamilton, Plunkett, & Schafer, 2000), Spanish (Jackson-Maldonado, Thal, Marchman, Bates, & Gutierrez-Clellen, 1993), and Mandarin Chinese (Liu & Tsao, 2010).

Experiments

Some researchers choose to conduct experiments to examine children’s language performance. The advantage of an experimental approach is that researchers can test an assumption by controlling variables. In this way, researchers can determine which assumption applies to children’s acquisition. The disadvantage, however, is that children’s language samples are elicited, and elicited data might differ from children’s natural speech. Since each research method has advantages and disadvantages, researchers use different methods or different subjects to test an assumption repeatedly.

For instance, in addressing the question of a possible noun bias in early vocabulary acquisition, Gentner (1982), Hsu (1996), and Tardif (1996) counted the numbers of words in children’s spontaneous speech. Haryu and colleagues (2005), on the other hand, used an experimental method. They conducted an experiment to investigate whether there is a “Noun Bias” in three- and five-year-old preschoolers speaking Mandarin Chinese, Japanese, and English. They compared the ease of fast-mapping novel nouns to a novel object and novel verbs to a novel action. Their findings supported the universal “Noun Bias” view.

To explore the basic-level effect in children’s vocabulary, Jiang (2000) and Lee (2014) collected spontaneous speech samples from children and counted the number of words. Mervis and Crisafi (1982), on the other hand, conducted an experiment and found that basic-level terms were more advantageous to children’s lexical learning than superordinate and subordinate terms. Furthermore, Zeng and Zou (2012) used three methods, including controlled experiments, to explore the early development of category levels in Mandarin-speaking children. Their results showed that basic-level words were dominant in the comprehension and production of early vocabulary.

Corpus-based method

A corpus is a large collection of language transcriptions. A corpus database provides children’s spontaneous speech, recorded directly from their everyday conversations. The best-known corpus of child language is the Child Language Data Exchange System (CHILDES) (MacWhinney, 2000). It provides a transcription format, the CHAT format, and analysis tools, the CLAN tools (MacWhinney, 2000). There are several advantages to using CHILDES. Researchers can conduct their research with available data when it is difficult to recruit native speakers of other languages as subjects. They can also make cross-linguistic comparisons to find general tendencies in language development. On the other hand, the disadvantage of a corpus-based approach is that researchers cannot test specific assumptions directly. Corpus data contain a sample of children’s everyday language use, and the collected language samples may be restricted to certain conversation topics. Researchers interested in other

themselves.

2.2. Performance of children’s early vocabulary

2.2.1. Vocabulary size

Past studies of English-speaking children have found that few children produce any words before age one. Most children produce their first recognizable words at around 15 months. They have approximately 100 to 600 distinct words at 2 years old, and by 6 years old they have about 14,000 words in comprehension and fewer in production. These numbers imply that children between 2 and 6 years old acquire words at a rate of nine to ten words a day (Clark, 1993). Hsu (1996) reported the vocabulary size of Mandarin-speaking children before 6 years old. Children had 260 words before 2 years old, 634 words before 3 years old, 771 words before 4 years old, 808 words before 5 years old, and 895 words before 6 years old. Yeh’s study (2009) reported the mean number of nouns and verbs: 166 words before 2 years, 402 words before 3 years, and 478 words before 4 years. Tsay and Cheng (2011) reported the vocabulary size of 8 children speaking Taiwanese Southern Min from approximately one and a half years old to 4 years old. The total vocabulary size of these children was smaller than, or only slightly larger than, 2,000 words. The vocabulary growth rate of two children was also reported and compared with Clark’s results. Clark (1993) discussed two young children’s vocabulary growth. One child, Keren (reported in Dromi, 1987), produced up to 337 new words by 1;5, while the other child, Damon, produced up to 337 new words by 1;9. Tsay and Cheng (2011) found that one child reached 337 new words between 1;5 and 1;6, and another reached 337 new words between 1;6 and 1;7.
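The rate of nine to ten words a day attributed to Clark (1993) above can be checked with simple arithmetic. The sketch below is only an illustration: it assumes 365 days per year and treats the 2-year-old figures (100 to 600 words) and the 6-year-old figure (14,000 words) as directly comparable, which is a simplification. The function name `words_per_day` is hypothetical.

```python
DAYS_PER_YEAR = 365  # assumption: calendar days, leap years ignored


def words_per_day(start_words, end_words, start_age_years, end_age_years):
    """Average number of new words per day between two ages."""
    days = (end_age_years - start_age_years) * DAYS_PER_YEAR
    return (end_words - start_words) / days


# Starting from the upper (600) and lower (100) estimates at age 2.
low = words_per_day(600, 14000, 2, 6)
high = words_per_day(100, 14000, 2, 6)
print(round(low, 1), round(high, 1))
```

Under these assumptions both estimates fall between nine and ten words a day, which agrees with the rate cited in the text.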

2.2.2. Noun bias or Verb bias

A universal “Noun Bias” in young children’s vocabulary development has been hotly debated for years. Gentner (1982) proposed the Natural Partitions Hypothesis, stating that there is a preexisting perceptual-conceptual distinction between concrete concepts (nouns) and predicative concepts (verbs), and that the distinction between nouns and predicate terms is based on this perceptual-conceptual distinction. The Natural Partitions Hypothesis also holds that nouns, which denote concrete concepts, are conceptually simpler or more basic than verbs and other predicates. Cross-linguistic data provided further evidence that young children acquire nouns more easily and earlier than verbs, and that this “Noun Bias” is a universal phenomenon.

However, some studies of Mandarin Chinese and Korean vocabulary development have argued against the “Noun Bias”, suggesting that nouns are not always acquired first (Choi & Gopnik, 1995; Tardif, 1996; Sheng, Deng, Zhang, Liang, & Lu, 2012). Based on data collected in Beijing, Tardif (1996) argued against the universality of the “Noun Bias”. Ten Mandarin-speaking children participated in her longitudinal study, but only the second or third recording, made when the children’s mean age was 21 months, was analyzed. To find out whether different definitions of nouns and verbs lead to different conclusions, she included several strict and broad definitions of nouns and verbs, object labels and action words, and nominals and predicates. The results of her sliced data suggested that Mandarin-speaking children produce more verbs than nouns in their early lexicon. She further suggested that linguistic and sociocultural input factors accounted for a “Verb Bias”. Mandarin verbs occur frequently in adult input and are highlighted by occurring in salient positions. Furthermore, morphological simplicity affects the bias in children’s performance. The morphology of English nouns is

nouns. Unlike English, Mandarin is morphologically transparent. The “Noun Bias” is not reinforced by Mandarin morphology, so nouns are acquired later. The input frequency account was further supported by Dhillon (2010), who examined observational data of English-, Spanish-, and Mandarin-speaking children in the CHILDES database. The children’s ages ranged from 1 year and 7 months to 2 years and 11 months. The number of noun types, verb types, and other types, the number of noun types versus the number of verb types, and the proportion of nouns divided by the sum of the proportions of nouns and verbs were computed. The results showed that Mandarin-speaking children exhibited a “Noun Bias” in the early stage (1;7 to 2;0) but no “Noun Bias” at later ages. English- and Spanish-speaking children displayed a “Noun Bias” across all ages. These results supported Dhillon’s (2010) argument that argument-dropping in Mandarin makes children receive more verbs from adult input than children learning other languages. These factors lead to the prediction that children will learn verbs more easily.
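The noun and verb measures described above can be illustrated with a small sketch. It is not Dhillon’s (2010) actual procedure: it assumes the input is a list of (word, part-of-speech) pairs, that the hypothetical tags "n" and "v" mark nouns and verbs, and that the measures are computed over word types. The function name `noun_verb_measures` and the toy sample are made up for illustration.

```python
def noun_verb_measures(tagged_words):
    """Count noun and verb types and derive N/V and N/(N+V) from (word, pos) pairs."""
    noun_types = {word for word, pos in tagged_words if pos == "n"}
    verb_types = {word for word, pos in tagged_words if pos == "v"}
    n, v = len(noun_types), len(verb_types)
    return {
        "noun_types": n,
        "verb_types": v,
        # Ratios are left undefined (None) when the denominator is zero.
        "nv_ratio": n / v if v else None,
        "noun_share": n / (n + v) if n + v else None,
    }


# Hypothetical toy sample: three noun types (one repeated) and two verb types.
sample = [("球", "n"), ("燈", "n"), ("吃", "v"), ("球", "n"), ("跑", "v"), ("筆", "n")]
measures = noun_verb_measures(sample)
print(measures)
```

In this sketch an N/V ratio above 1, or equivalently a noun share above 0.5, means that the sample contains more noun types than verb types.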

Nevertheless, Hsu (1996) reported that Mandarin-speaking children produce more nouns (60%) than verbs (25%), supporting a “Noun Bias”. Yeh (2009) studied young children’s acquisition of nouns and verbs in Mandarin Chinese in Taiwan. The results indicated that nouns are the major words children acquire at an early age, and that younger children use relatively more nouns whereas older children use relatively more verbs. The study also suggested that children’s cognitive abilities and social interactions influence the early acquisition of nouns and verbs. In addition to observational studies, Haryu and colleagues (2005) conducted an experiment to investigate whether there was a “Noun Bias” in three- and five-year-old preschoolers speaking Mandarin Chinese, Japanese, and English. They compared the ease of fast-mapping novel nouns to a novel object and novel verbs to a novel action. One of the conditions was a “bare verb condition”, in which novel verbs were presented with no argument. There are morphological affixes in English and Japanese but not in Mandarin, which made the condition a “bare word condition” in Mandarin. The results showed that both the 3- and the 5-year-olds in the three languages could fast-map a novel noun to a novel object, but they could not fast-map a novel verb to its meaning properly until five years old. Moreover, the results of the bare word condition in Mandarin showed that children tended to map a novel word to a novel object. These findings supported the universal “Noun Bias” view. The authors concluded that the difficulty of Mandarin verb learning resulted from the lack of verb morphology and the argument-dropping property of Mandarin. When children encounter a novel word, they need clues to decide whether to map it to an object or an action. However, argument-dropping is allowed in Mandarin, and Mandarin verbs are not morphologically inflected. These linguistic properties imply that the clues needed to decide whether a word maps to an object or an action are not always available. As a consequence, children rely on the universal “Noun Bias” to map a novel word to a novel object.

In a cross-linguistic corpus-based study, Liu and her collaborators (2008) examined the noun versus verb (N/V) ratio in types in English, Cantonese, and Mandarin, and found different results. They conducted two studies examining data from 13- to 60-month-old children in the CHILDES database (MacWhinney, 2000). The first study selected a total of 72 files, and the second study used all the data available from the corpora. Their results revealed that, when averaged over all ages, English-speaking children showed a stronger “Noun Bias”, Cantonese-speaking children had a relatively weak “Noun Bias”, and Mandarin-speaking children had no “Noun Bias” at all. The same pattern was observed in adult input, and Mandarin-speaking adults even displayed a “Verb Bias”. The authors believed that adult input might influence children’s vocabulary learning. As Tardif (2006) and Tardif, Gelman, and Xu (1999) stated, Mandarin-speaking parents focused on verbs when talking to their children, and their verb use was highly specific. In contrast, English-speaking parents focused on nouns and used more general-purpose verbs when talking to their children.

The controversy over the “Noun Bias” in Mandarin remains despite continuous efforts. Previous studies investigating this issue in Mandarin using the CHILDES database found a “Verb Bias”, challenging the universal “Noun Bias” view. To clarify the dialectal variation in the so-called “Noun Bias” in Mandarin, this study examines the use of nouns and verbs by young Mandarin-speaking children in Taiwan. Moreover, Liu and her colleagues (2008) and Tardif (1996) concluded that there is no “Noun Bias” for Mandarin-speaking children using data from the CHILDES database. It is possible to reach a different conclusion using a different corpus. In addition, although Tardif (1996), Dhillon (2010), and Haryu and colleagues (2005) all explained the biases in terms of the morphology and argument structure of Mandarin, their explanations differed. Therefore, if a “Noun Bias” is observed in this study, it supports the explanation of Haryu and colleagues (2005); if no “Noun Bias” is observed, it supports the explanation of Tardif (1996) and Dhillon (2010).

2.2.3. Lexical complexity in acquisition

Word frequency provides general information about the collected language samples. Word type frequency represents the number of different words children know, and word token frequency represents the total number of items children produce. However, producing many types does not mean using all of those types frequently, and producing many tokens does not mean producing many different word types. Thus, vocabulary diversity needs to be examined, and measures of vocabulary diversity are frequently used in language research. One such measure, based on the ratio of the number of different words (types) to the total number of words (tokens), is known as the type-token ratio (TTR). A high TTR may indicate rich vocabulary diversity, meaning that children produce relatively more different types in their speech. Conversely, a low TTR may indicate weak vocabulary diversity, meaning that children produce fewer different types or repeatedly produce the same types in their speech.
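As a minimal sketch of the definition above, TTR can be computed by dividing the number of distinct words by the total number of words. The function name `type_token_ratio` and the toy sample are hypothetical, and the sketch assumes the speech has already been segmented into word tokens.

```python
def type_token_ratio(tokens):
    """Return the number of distinct words divided by the total number of words."""
    if not tokens:
        raise ValueError("TTR is undefined for an empty sample")
    return len(set(tokens)) / len(tokens)


# Hypothetical toy sample: six tokens but only three different types.
sample = ["球", "球", "燈", "筆", "球", "燈"]
print(type_token_ratio(sample))  # 3 types / 6 tokens = 0.5
```

A sample in which every token is a different word would give a TTR of 1, the highest possible value.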

Many measures of vocabulary diversity have been based on the type-token ratio. Unfortunately, Heaps’s law (Heaps, 1978) predicts that the more words (tokens) a sample has, the less likely it is that new words (types) will show up. That is to say, the first few tokens in a sample are likely to be new types, but later tokens are likely to be types that have been used before. Thus, measures based on TTR are likely to be affected by sample size: TTR values are lower in samples with more tokens and higher in samples with fewer tokens (Tweedie & Baayen, 1998).
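The sample-size effect described above can be seen in a toy example. The sketch below is an illustration only: it assumes a hypothetical speaker who keeps cycling through the same 20 word types, and it computes TTR on longer and longer prefixes of the same sample. The function name `ttr_by_sample_size` is made up.

```python
def ttr_by_sample_size(tokens, sizes):
    """Return {size: TTR of the first `size` tokens} for each requested size."""
    result = {}
    for size in sizes:
        prefix = tokens[:size]
        result[size] = len(set(prefix)) / len(prefix)
    return result


# Hypothetical speaker who cycles through the same 20 word types.
tokens = [f"word{i % 20}" for i in range(200)]
curve = ttr_by_sample_size(tokens, [10, 20, 40, 100, 200])
for size, ttr in curve.items():
    print(size, round(ttr, 2))
```

The vocabulary of this speaker never changes, yet the TTR drops from 1.0 for the first 20 tokens to 0.1 for all 200 tokens, which is why TTR values from samples of different lengths are not directly comparable.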

To address the sample-size problem of TTR, another measure of lexical diversity was developed. A program called vocd was written to calculate D (McKee, Malvern, & Richards, 2000). This method depends on an analysis of the probability of a new word appearing in longer and longer samples. The analysis leads to a mathematical model of how TTR varies with token size. By comparing the mathematical model with empirical data in a transcript, the program provides a measure of lexical diversity called D. The model is given by the following equation.

$$\mathrm{TTR} = \frac{D}{N}\left[\left(1 + \frac{2N}{D}\right)^{1/2} - 1\right]$$

produ

is calculated, and the average TTR for each sample size is calculated to represent that point on the TTR curve of the transcript. Third, the software finds the best fit to this empirical curve based on the TTR and the token size, and obtains the value of the D-measure. The average D value of the subsamples of varying sample sizes is calculated, and the best-fit D value is obtained with the least-squares method. Finally, steps 1 to 3 are repeated twice. The average best-fit D value is the best D value of that transcript (see McKee et al., 2000, for a complete flow chart). A high value of the D-measure reflects high lexical diversity, and a low value reflects low lexical diversity.

The calculation of the D-measure takes different sample sizes into consideration, so it is considered a more valid and reliable measure of lexical diversity (MacWhinney, 2000; McCarthy & Jarvis, 2007; McKee et al., 2000).
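The curve-fitting idea behind D can be sketched as follows. This is a simplified illustration, not the vocd program itself. It uses the usual form of the D model, TTR = (D/N)[(1 + 2N/D)^(1/2) − 1], where N is the number of tokens in a subsample. The subsample sizes (35 to 50 tokens), the 100 random draws per size, and the three runs are assumptions based on commonly described vocd defaults, and a plain grid search stands in for the program’s own fitting routine. All function names are hypothetical.

```python
import math
import random


def model_ttr(n, d):
    """TTR predicted by the D model for a sample of n tokens."""
    return (d / n) * (math.sqrt(1.0 + 2.0 * n / d) - 1.0)


def empirical_curve(tokens, sizes, draws, rng):
    """Mean TTR of `draws` random subsamples (without replacement) per size."""
    curve = []
    for n in sizes:
        ttrs = [len(set(rng.sample(tokens, n))) / n for _ in range(draws)]
        curve.append((n, sum(ttrs) / draws))
    return curve


def fit_d(curve, d_max=300.0, step=0.1):
    """Least-squares D found by a plain grid search over (0, d_max]."""
    candidates = [step * k for k in range(1, int(round(d_max / step)) + 1)]

    def squared_error(d):
        return sum((model_ttr(n, d) - ttr) ** 2 for n, ttr in curve)

    return min(candidates, key=squared_error)


def estimate_d(tokens, sizes=range(35, 51), draws=100, runs=3, seed=0):
    """Average best-fit D over several runs of the sampling procedure."""
    if len(tokens) < max(sizes):
        raise ValueError("transcript is shorter than the largest sample size")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    fits = [fit_d(empirical_curve(tokens, sizes, draws, rng)) for _ in range(runs)]
    return sum(fits) / runs


# Toy transcripts: 200 tokens built from 10 word types versus 60 word types.
repetitive = [f"w{i % 10}" for i in range(200)]
varied = [f"w{i % 60}" for i in range(200)]
print(estimate_d(repetitive), estimate_d(varied))
```

In this sketch the transcript built from 60 word types receives a higher D than the one built from 10 word types, in line with the interpretation that a higher D reflects higher lexical diversity. The grid limit `d_max` caps the estimate, so a real analysis should rely on an established tool such as the CLAN implementation.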

Liu and her colleagues (2008) used the D-measure to quantify the vocabulary diversity of speech samples in their cross-linguistic study of early lexical development. They found that both language and age had significant main effects on lexical diversity. Mandarin-speaking children had a mean D value of 44.21. They also found that the differences between any two age groups were significant, except for the difference between the 25-36-month-old and 37-48-month-old groups. The results reveal that children’s speech becomes more diverse with increasing age.

2.2.4. Performance in semantic categories

Peng and Chong (2010) studied the early acquisition of nouns by two Mandarin-speaking children aged 1 to 3 years. They categorized nouns into kinship terms, organs, clothing, device terms, vehicles, food, animals, natural objects, and colors. Their study has indicated that children’s early noun acquisition is

category of device nouns, such as 燈 (dēng, lamp), 筆 (bǐ, pen) and 球 (qiú, ball), more frequently than nouns of other categories. On the other hand, nouns in categories such as kinship terms, organs, natural objects, and colors are produced relatively infrequently. The high frequency of device nouns is due to their frequent occurrence in the input of children’s daily life. In addition, children’s cognitive ability affects their language acquisition, so children do not acquire color terms until about three years old.

Sheng et al. (2012) examined young Chinese children’s Mandarin development in Nanjing, using a revised Chinese version of the MCDI (Liang et al., 2001). The study included 326 toddlers at two age stages, 14-16 months and 24-26 months. They found that toddlers at stage 1 could express a mean of 42.09 words (SD = 64.43), and that they acquired more onomatopoeic words than words in other categories (i.e., nouns, verbs, adjectives/adverbs, conventional game words, numeral words, quantifiers, interrogatives, pronouns, directional words, and time words). In addition, they could express more verbs than nouns. The results for stage 1 further showed that the age of 15 months was the critical period for producing nouns, verbs, adjectives/adverbs, conventional game words, and onomatopoeia.

Likewise, Lee (2014) found that animals, tools, and food account for the largest proportion of nouns in early Hakka vocabulary acquisition. She also concluded that the process of acquiring nouns, such as personal pronouns, body parts, clothing, tools, food, and animals, coincided with cognitive development.
