Chapter One Introduction
This study investigated the acquisition and development of the aspiration contrast in
Mandarin Chinese by one- to six-year-old children. The one-year-olds were
observed longitudinally for a five-month period while the two- to six-year-olds were
examined in a cross-sectional study. Children’s speech samples collected in a picture-naming
task were analyzed instrumentally and then compared with adults’ speech
production in order to sketch a general tendency in the development of aspiration. Once
the developmental trend had been established, it was related to similar research on other
languages for a cross-linguistic comparison, which is expected to reveal a common
progression in the acquisition of aspiration.
As mentioned above, instrumental analysis was employed to explore the speech
data in the present experiment. Specifically, voice onset time (VOT) was selected
from among various acoustic parameters to measure the aspiration duration of stop
consonants. Although the present study obtained and explained its
results by instrumental means, there is a long history of inquiring
into child phonological progress by applying generative phonological rules to child
speech development. To explain the motive for choosing experimental research
rather than phonological derivation in this study, we first discuss
the one-to-many correspondence between phonological contrasts and
phonetic values, and the inadequacy of applying adult-based phonological theory to
child language acquisition. Finally, we show how voice onset time can be
used to supplement the traditional approach to the
study of language acquisition.
1.1 The correlation between phonological contrast and phonetic values
The phonological feature of aspiration or voicing is primarily used to characterize
stop consonants in the languages of the world, but the phonetic realization of the feature
may be different from language to language (Allen 1985, Cho and Ladefoged 1999,
Goodluck 1991, Johnson 2003, Ladefoged and Maddieson 1996, Ladefoged 2001,
Menn 1983, Macken 1980, Öğüt et al. 2006, Smith and Kenney 1999, Snow 1997, Wei
1997). Confusion is therefore likely if cross-linguistic reference relies only
on the orthographic label. To avoid this problem, more and more studies rely on
phonetic values furnished by instrumental analyses. As the
relationships between phonological terms and their corresponding phonetic output
vary among languages, instrumental analysis can serve as an impartial tool
that is independent of the conventional feature classifications of individual languages.
A pair of stop consonants at the same place of articulation is traditionally
differentiated by the feature of aspiration or voicing, aside from other
acoustic parameters. At the phonological level, voiced unaspirated stops are
traditionally represented as /b/, /d/, and /g/, as in Thai and Turkish, and their voiceless
counterparts as /p/, /t/, and /k/, as in English and French. Furthermore, voiceless
aspirated stop phonemes may be labeled /ph/, /th/, and /kh/, as in Cantonese and
Apache. At the phonetic level, however, the same orthographic representation is
not always realized in the same way in different languages. For example, with regard to the
voicing contrast, Öğüt et al. (2006) and Gandour et al. (1986) reported that /b/, /d/, and
/g/ in Turkish and Thai, respectively, are classified as voiced stop consonants according to
the manner of articulation. Both studies revealed that these three stops are
pronounced as voiced by adult native speakers. By contrast, the same three symbols
in English are usually produced as voiceless unaspirated stops, as are the Turkish and Thai
voiceless stops (Macken 1980, Wei 1997). These cross-linguistic results indicate that
languages of the world may differ in how they phonetically realize
the same phonemic labels.
The voicing contrast thus does not seem to have identical phonetic realizations across languages,
nor does the aspiration contrast. Cho and Ladefoged (1999) found that both voiceless
aspirated and unaspirated stops may be realized with a wide range of aspiration
durations among the eighteen languages they investigated. Take the velar plosives /kh/ and
/k/ as an example: the aspirated and unaspirated stops have mean VOTs
of 80 and 31 ms, respectively, in Apache; 154 and 45 ms in Navajo; and 128 and 28 ms in Tlingit.
The realizations of both the aspirated and the unaspirated stops thus differ considerably
across the three languages. Such divergence makes the category of aspiration
hard to delimit, i.e. it is unclear to what extent a stop phoneme can be identified as aspirated or
unaspirated. In light of this cross-linguistic phonetic variation, Cho and Ladefoged
further divided the category of aspiration into four groups on the basis of VOT values:
unaspirated, slightly aspirated, aspirated, and highly aspirated stops.
Even though four types of aspiration were proposed, a language that has an aspiration
contrast normally characterizes its stop consonants as aspirated or unaspirated
irrespective of which type the sound belongs to. The phonological
term aspiration for a stop phoneme can therefore be comparatively vague and
ambiguous when research is conducted across languages.
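The spread in the velar figures quoted above can be made explicit with a short calculation. The following Python sketch is only an illustration: the dictionary layout and the helper names vot_gap and spread are assumptions introduced here, and the only data used are the mean VOT values cited above from Cho and Ladefoged (1999).

```python
# Mean VOT values (ms) for velar stops, as quoted in the text above
# from Cho and Ladefoged (1999). The data layout is an assumption.
MEAN_VOT_MS = {
    "Apache": {"aspirated": 80, "unaspirated": 31},
    "Navajo": {"aspirated": 154, "unaspirated": 45},
    "Tlingit": {"aspirated": 128, "unaspirated": 28},
}


def vot_gap(language):
    """Return the aspirated-minus-unaspirated difference in mean VOT (ms)."""
    values = MEAN_VOT_MS[language]
    return values["aspirated"] - values["unaspirated"]


def spread(category):
    """Return the max-minus-min mean VOT (ms) across languages for one category."""
    means = [values[category] for values in MEAN_VOT_MS.values()]
    return max(means) - min(means)


for language in MEAN_VOT_MS:
    print(f"{language}: gap between /kh/ and /k/ = {vot_gap(language)} ms")

print(f"Spread of aspirated means: {spread('aspirated')} ms")
print(f"Spread of unaspirated means: {spread('unaspirated')} ms")
```

On these figures the within-language gap ranges from 49 ms (Apache) to 109 ms (Navajo), and the aspirated means alone span 74 ms across the three languages, which is the kind of divergence that a two-way phonological label does not capture.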
Given the language-specific relationship between the designation at the
phonological level and the realization at the phonetic level, misunderstanding is likely
to arise if stop consonants are described either in terms of
phonemic labels such as /b/ and /t/ or in terms of contrastive categories such as voiced or aspirated.
VOT is therefore generally acknowledged as a language-independent means of
characterizing stops.
1.2 Child phonology
In research on the development of speech production, a child’s speech data have been
considered the output derived from phonological rules (Fromkin and Rodman 1997,
O’Grady et al. 1997, Smith 1973). The derivation process normally adopts broad
transcription to represent a child’s pronunciation. However, broad transcription
interprets the changes from a child’s underlying representation to their surface
production as abrupt categorical changes rather than progressive gradient changes.
Such a change from one phoneme category to another may misrepresent the
phonetic facts of speech development (Hewlett and Waters 2004, Menn and
Stoel-Gammon 1995, Scobbie et al. 2000, Stewart and Vaillette 2001). Some of the
literature on the development of child speech production has therefore employed instrumental
analyses as an aid to investigating children’s phonological development.
In generative phonology, every surface output is
derived from an underlying form through phonological processes such as substitution,
deletion, or insertion of a segment. It has been presumed that a child also
follows the same processes to produce words from an underlying form that was
constructed from their comprehension of adult speech, even though their speech
is not entirely adult-like in form. This notion implies that children’s ability to perceive
phonological contrasts is better than their ability to produce them. O’Grady et al. (1997)
illustrated that a child may be unable to pronounce cart and card, or jug and duck, distinctively but
can still identify them correctly in a comprehension task. Under the derivational approach
based on the adult system, if a child pronounces tent as [det], the process would involve
voicing and nasal deletion, yielding a categorical change at the surface level.
However, the broad transcription used to represent this example may overlook
some phonetic facts of the surface form. As children’s pronunciation tends to be more
variable, they may well produce the two phonemes /t/ and /d/ in ways that sound alike to a
listener’s ear but differ under instrumental analysis.
The subtle distinction between a child’s productions of two phonemes can be
viewed as a covert contrast, which may not be adequately accounted for by phonological
processes. Covert contrasts cannot be explicitly described by phonemic transcription
because each phonemic label represents a discrete unit with no intervening area
between units. Nevertheless, a child’s immature speech is often intermediate between
sounds, as in Smith’s (1973) finding that his young subject produced some [ts] sounds
when he seemed to be initially developing [s] out of [t]. It is plausible that before an adult can
distinguish a child’s two target phonemes, the child has already produced a covert
contrast between them. In the beginning, the phonetic differences of the covert contrast are
normally too small to be discerned by human perception. Not until the
contrasts are large enough to affect categorical perception can an adult detect the
differences between the sounds. It is therefore helpful to apply instrumental analysis to the study
of child language development, as instruments can measure fine-grained components
of speech sounds.
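As a rough illustration of how an instrumental measure might expose a covert contrast, the Python sketch below compares two sets of VOT measurements that a listener could plausibly transcribe with the same symbol. The numbers are hypothetical values invented purely for illustration and are not data from this study or from any cited work; the use of Welch's t statistic and the 30 ms reference value are likewise assumptions made for the sketch, not the analysis adopted here.

```python
from math import sqrt
from statistics import mean, stdev


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples with unequal variances."""
    var_a = stdev(sample_a) ** 2 / len(sample_a)
    var_b = stdev(sample_b) ** 2 / len(sample_b)
    return (mean(sample_a) - mean(sample_b)) / sqrt(var_a + var_b)


# HYPOTHETICAL VOT measurements (ms), invented for illustration only.
# Imagine every token was transcribed as unaspirated [t] by a listener.
target_unaspirated = [8, 12, 10, 15, 9, 11]   # child's attempts at /t/
target_aspirated = [16, 20, 14, 22, 18, 19]   # child's attempts at /th/

# Assumed reference value (ms) below which tokens tend to sound unaspirated.
ASSUMED_BOUNDARY_MS = 30

both_sound_unaspirated = (
    mean(target_unaspirated) < ASSUMED_BOUNDARY_MS
    and mean(target_aspirated) < ASSUMED_BOUNDARY_MS
)
t_value = welch_t(target_aspirated, target_unaspirated)

print(f"Mean VOT for /t/ targets:  {mean(target_unaspirated):.1f} ms")
print(f"Mean VOT for /th/ targets: {mean(target_aspirated):.1f} ms")
print(f"Both means below the assumed boundary: {both_sound_unaspirated}")
print(f"Welch's t for the difference: {t_value:.2f}")
```

In this made-up case both means fall on the same side of the assumed boundary, so a broad transcription would record no contrast, yet the two targets are systematically separated in the measurements. That is the pattern the term covert contrast is meant to describe.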
1.3 Voice onset time (VOT)
Among the acoustic properties of stops, one of the most prominent is the burst
shown on the spectrogram (Borden et al. 1994, Clumeck et al. 1981, Davis 1995, Eilers
et al. 1984, Gandour et al. 1986, Johnson 2003, Ladefoged 2001, Ladefoged and
Maddieson 1996, Macken and Barton 1979, 1980). The interval between the
release burst of the stop and the onset of voicing in the following sound is called voice
onset time (VOT). VOT is also influenced by place of articulation: the further
back the place of articulation, the longer the VOT. Lisker and Abramson first identified
three categories of VOT commonly observed in many human languages (cited in
Clumeck et al. 1981, Davis 1995, Macken et al. 1979, 1980). The first is
voicing lead, in which voicing is detected before the release of the stop. Thai, for
example, has a voiced stop consonant such as /d/ in /daaw/ “star”. The second category is
short lag VOT, or voiceless unaspirated, in which voicing usually starts within 30
milliseconds (ms) after the release of the stop, as in the /t/ of Spanish. The last is
long lag, in which voicing starts more than about 30 ms after the stop is released, as in English
/p/ in pen, /t/ in tea, and /k/ in cool. According to this categorization of
VOT, the unaspirated stops in Mandarin, /p/, /t/, and /k/, are assumed to
correspond to the short lag type. Presumably their aspirated counterparts /ph/, /th/,
and /kh/ correspond to long lag.
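The definition of VOT and the three-way categorization described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the time stamps are assumed to come from manual annotation of the burst and the voicing onset, and the 30 ms cut-off is the approximate figure given above rather than a fixed universal boundary.

```python
# Approximate short-lag / long-lag boundary (ms), taken from the description
# above. It is an assumption of this sketch, not a fixed universal value.
SHORT_LAG_LIMIT_MS = 30.0


def voice_onset_time_ms(burst_time_s, voicing_onset_s):
    """Return VOT in ms: voicing onset minus release burst.

    A negative value means that voicing began before the release.
    """
    return (voicing_onset_s - burst_time_s) * 1000.0


def classify_vot(vot_ms, short_lag_limit_ms=SHORT_LAG_LIMIT_MS):
    """Assign a VOT value to one of the three categories described above."""
    if vot_ms < 0:
        return "voicing lead"
    if vot_ms <= short_lag_limit_ms:
        return "short lag"
    return "long lag"


# Hypothetical annotation times in seconds, for illustration only.
examples = [(0.500, 0.420), (0.500, 0.512), (0.500, 0.580)]
for burst, voicing in examples:
    vot = voice_onset_time_ms(burst, voicing)
    print(f"VOT = {vot:.0f} ms -> {classify_vot(vot)}")
```

Under this cut-off the Apache unaspirated /k/ mean of 31 ms cited earlier would fall just inside the long lag band, which is one reason the boundary is best treated as approximate and language-dependent.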
The distinction of aspiration in Mandarin Chinese is used to discriminate
syllable-initial stop consonants (Cheng 1973, Li and Thompson 1981, Wei 1997, Xu
1990). The distinction between aspirated and unaspirated stops can be illustrated by the
following minimal pairs (Cheng 1973:35-36): ba /pa/ “father” vs. pa /pha/ “to fear”; da
/ta/ “big” vs. ta /tha/ “to step”; gu /ku/ “old” vs. ku /khu/ “bitter”.
As the above examples show, aspiration is one of the major distinctive
features that contrast three pairs of cognate stops at three different places of
articulation. It distinguishes the two labials /p/ and /ph/, the two dentals /t/ and
/th/, and the two velars /k/ and /kh/. In some languages, such as English, the same distinction is
conventionally described as one of voicing.
This study aims to investigate Mandarin-speaking children’s acquisition and
development of word-initial stops by means of VOT. The objective of this
experimental study is to identify the possible developmental stages in the acquisition
of stops at different places of articulation and with different values of aspiration.
Furthermore, the results can be compared and contrasted with
previous findings in other languages to identify a common developmental tendency.
Chapter two reviews the literature on the acquisition
of stop consonants. Chapter three outlines the methodology of the longitudinal and
cross-sectional studies in this experiment and the administration of a pretest. Chapters
four and five present and discuss the results, respectively.