CHAPTER 2 LITERATURE REVIEW
National Chengchi University
2.3 Mandarin phonological system
Mandarin has 12 possible syllable structures: V, CV, GV, VG, VN, CVG, CVN, CGV,
GVG, GVC, CGVG, and CGVN. A Mandarin syllable is traditionally divided into three
parts: an optional initial, a final, and a tone (C. Cheng, 1973). The initial is a
consonant, which may be a nasal. The final contains an optional prenuclear glide, a
vowel, and an optional postnuclear glide or nasal. Over the past two decades, however,
the status of the prenuclear glide in the Mandarin syllable has been much debated
(Bao, 2002; Yip, 2002; Duanmu, 2002; Wan, 2002a). Because the status of the
prenuclear glide is not the focus of the present study, the glide was grouped with
neither the onset nor the rime; like the initial consonant and the vowel, it was
replaced by the hiccup noise on its own. Finally, so that the rime would not be much
longer than the prenuclear glide or the initial consonant, the rime was further
divided into a vowel plus a postnuclear glide, or a vowel plus a final nasal. Each
part of the rime could be replaced by the hiccup noise individually.
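The division into separately replaceable parts can be sketched in a few lines of Python. This is only an illustration of the scheme described above, not the procedure used in the experiments: the function name replaceable_units and the part labels are assumptions introduced here, and the label "final consonant" exists only because the template GVC appears in the list of syllable structures.

```python
def replaceable_units(template):
    """Split a syllable template such as 'CGVN' into separately replaceable parts.

    C = consonant, G = glide, V = vowel, N = final nasal.
    The labels returned here are illustrative names, not terms from the study.
    """
    if template.count("V") != 1:
        raise ValueError("template must contain exactly one vowel (V)")

    before_vowel = {"C": "initial consonant", "G": "prenuclear glide"}
    after_vowel = {
        "G": "postnuclear glide",
        "N": "final nasal",
        "C": "final consonant",  # only needed for the listed template GVC
    }

    units = []
    seen_vowel = False
    for symbol in template:
        if symbol == "V":
            units.append("vowel")
            seen_vowel = True
        elif not seen_vowel and symbol in before_vowel:
            units.append(before_vowel[symbol])
        elif seen_vowel and symbol in after_vowel:
            units.append(after_vowel[symbol])
        else:
            raise ValueError("unexpected symbol %r in %r" % (symbol, template))
    return units


if __name__ == "__main__":
    for template in ["CV", "CGVN", "GVG"]:
        print(template, replaceable_units(template))
```

Each element of the returned list corresponds to one stretch of the syllable that could be replaced by the noise independently of the others.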
2.4 The acoustic-phonetic cues of the consonants in Taiwan Mandarin
In Taiwan Mandarin, there are 21 onset consonants in total: six oral stops
/p/, /pʰ/, /t/, /tʰ/, /k/, /kʰ/; two nasal stops /m/, /n/; six fricatives /f/, /ɕ/, /x/, /ʂ/, /s/, /ʐ/;
six affricates /tɕ/, /tʰɕ/, /tʂ/, /tʰʂ/, /ts/, /tʰs/; and one liquid /l/. In the following sections,
the acoustic-phonetic characteristics of these onset consonants are introduced. These
characteristics serve as the segmentation criteria in Experiments 1 and 2.
2.4.1 The acoustic-phonetic cues of stops
Three acoustic-phonetic cues distinguish stops: formant transitions, burst
amplitude, and duration.
First, formant transitions are crucial for detecting the place of articulation of
stops. The F2 and F3 transitions from bilabial stops into the following vowel are
rising, those from alveolar stops are almost flat, and those from velar stops
converge. Second, previous research (Repp, 1984) indicated that the burst amplitude of
labial stops is weaker than that of alveolar and velar stops. Perceptual experiments
have shown that burst amplitude can influence the identification of labial and alveolar
stops. This effect is more evident for voiceless stops than for voiced stops. Third,
voice onset time (VOT) is of paramount importance for the detection of voicing. Stops
with relatively long VOT tend to be perceived as voiceless, whereas stops with
relatively short VOT tend to be perceived as voiced. In addition, voiceless aspirated
stops have longer VOT than both voiced stops and voiceless unaspirated stops. In
Mandarin, the mean VOTs for /p/, /pʰ/, /t/, /tʰ/, /k/, and
/kʰ/ are 14 ms, 82 ms, 16 ms, 81 ms, 27 ms, and 92 ms, respectively (Chao et al.,
2006).
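To make the size of the aspiration contrast concrete, the mean VOTs above can be placed in a small lookup table. The Python sketch below assigns a measured VOT to whichever member of a stop pair has the closer mean. The nearest-mean rule, the function name closest_stop_by_vot and the ASCII labels (ph, th, kh for the aspirated stops) are assumptions made for illustration; they are not a perceptual model and are not taken from the cited study.

```python
# Mean VOTs in ms for Mandarin stops, as cited above (Chao et al., 2006).
# "ph", "th", "kh" are ASCII stand-ins for the aspirated stops.
MEAN_VOT_MS = {"p": 14, "ph": 82, "t": 16, "th": 81, "k": 27, "kh": 92}

# Unaspirated/aspirated pairs by place of articulation.
STOP_PAIRS = {
    "bilabial": ("p", "ph"),
    "alveolar": ("t", "th"),
    "velar": ("k", "kh"),
}


def closest_stop_by_vot(vot_ms, place):
    """Return the stop at `place` whose mean VOT is closest to `vot_ms`.

    The nearest-mean rule is an illustrative assumption only.
    """
    candidates = STOP_PAIRS[place]
    return min(candidates, key=lambda stop: abs(MEAN_VOT_MS[stop] - vot_ms))


if __name__ == "__main__":
    for place, vot in [("bilabial", 20), ("alveolar", 75), ("velar", 70)]:
        print(place, vot, "ms ->", closest_stop_by_vot(vot, place))
```

Because the aspirated means (81 to 92 ms) lie far from the unaspirated means (14 to 27 ms), even this crude rule separates the two series over a wide range of VOT values.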
2.4.2 The acoustic-phonetic cues of nasals
According to Ladefoged (2000), there are four acoustic-phonetic cues for
recognizing nasals. First, there is a sharp change in the spectrogram at the time of the
formation of the articulatory closure. Second, the bands of the nasal are lighter than
those of the vowel, which indicates that the intensity of the nasal is weaker than that
of the vowel. Third, the F1 of the nasal is often very low, centered at around 250 Hz.
Fourth, there is a large space above the F1 with no energy. Based on these
acoustic-phonetic cues, nasals can be identified.
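The four cues can be read as a checklist applied to a candidate segment. The Python sketch below expresses that checklist under stated assumptions: the function name matches_nasal_cues, its parameters, and the 100 Hz tolerance around 250 Hz are hypothetical choices made here, and the two yes/no cues are assumed to have been judged by eye from the spectrogram.

```python
def matches_nasal_cues(abrupt_change_at_closure, nasal_intensity_db,
                       vowel_intensity_db, f1_hz, quiet_above_f1,
                       f1_target_hz=250.0, f1_tolerance_hz=100.0):
    """Check a candidate segment against the four nasal cues listed above.

    Returns (all_cues_met, per_cue_results). The tolerance around the
    250 Hz F1 target is an illustrative assumption.
    """
    cues = {
        "abrupt spectral change at closure": bool(abrupt_change_at_closure),
        "weaker than the adjacent vowel": nasal_intensity_db < vowel_intensity_db,
        "low F1": abs(f1_hz - f1_target_hz) <= f1_tolerance_hz,
        "little energy above F1": bool(quiet_above_f1),
    }
    return all(cues.values()), cues


if __name__ == "__main__":
    ok, details = matches_nasal_cues(True, 60.0, 72.0, 260.0, True)
    print(ok)
    for cue, met in details.items():
        print(" ", cue, "->", met)
```

Returning the per-cue results alongside the overall verdict makes it easy to see which cue failed when a segment is rejected.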
2.4.3 The acoustic-phonetic cues of fricatives
The most crucial acoustic-phonetic cue for separating voiceless fricatives from
voiced fricatives is the extended period of noise (Borden et al., 1994), which can
be easily detected on the spectrogram. Voiceless fricatives have longer duration
and stronger intensity. In contrast, voiced fricatives (i.e., /ʐ/ in Mandarin) are
shorter in duration and weaker in intensity, but their formant frequencies are
clearer than those of voiceless fricatives.
Fricatives are characterized by high-frequency noise in the spectrum, which serves
as one acoustic-phonetic cue to their place of articulation. Another
acoustic-phonetic cue to the place of articulation of fricatives is the
intensity of frication. Sibilants (i.e., /s/, /ɕ/, /ʂ/, /ʐ/, /tɕ/, /tʰɕ/, /tʂ/, /tʰʂ/, /ts/, and /tʰs/ in
Mandarin) are characterized by relatively steep, high-frequency spectral peaks, whereas
nonsibilants (i.e., /f/ and /x/ in Mandarin) are characterized by relatively flat,
wider-band spectra. Moreover, alveolar sibilants (i.e., /s/ in Mandarin) can be distinguished
from palatal sibilants (i.e., /ɕ/...) by the location of the lowest spectral peak, which
is around 4000 Hz for alveolar sibilants and around 2500 Hz for palatal sibilants.
Furthermore, the intensity shown on the spectrogram can also differentiate the place of
articulation of fricatives: sibilants have stronger intensity and nonsibilants weaker
intensity. This is because the resonating cavity in front of the alveolar or palatal
constriction results in high intensity, whereas there is no resonating cavity in
front of the labio-dental constriction, which results in relatively weak intensity.
The acoustic-phonetic characterization of fricatives is illustrated in Figure 3.
Figure 3. Acoustic-phonetic characteristics of fricatives (Borden et al., 1994)
Figure 3 shows how listeners perceive fricatives. When the listener hears an
input, it enters the first filter, which checks whether it is a noisy sound of
relatively long duration. If so, the input is regarded as a fricative and
sent to the next filter. The second filter checks whether its intensity is
relatively high. If so, the input is considered a sibilant and
sent to the next filter. The third filter examines its first spectral
peak. If the first spectral peak is around 4 kHz, the input is viewed as /s/ or /z/ and
sent to the next filter. The fourth filter asks, “phonation exists or
duration and intensity small enough?” If the answer is yes, the input is perceived as /z/;
if the answer is no, it is perceived as /s/. Through these filters, the input is examined step by step.
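The cascade of filters described for Figure 3 can be written as a short decision function. The Python sketch below follows the four questions in the order given. The function name classify_fricative, its parameters, the 500 Hz tolerance around 4 kHz, and the labels returned on the "no" branches are assumptions added here for illustration; the figure itself is not reproduced in this text, so only the branches described above are modelled.

```python
def classify_fricative(noisy_and_long, high_intensity, first_peak_hz,
                       voiced_or_weak, peak_target_hz=4000.0,
                       peak_tolerance_hz=500.0):
    """Walk an input through the four filters described for Figure 3.

    The boolean inputs stand for judgements made elsewhere (for example,
    from a spectrogram). The tolerance around 4 kHz is an assumption.
    """
    # Filter 1: noisy sound with relatively long duration?
    if not noisy_and_long:
        return "not a fricative"
    # Filter 2: relatively high intensity?
    if not high_intensity:
        return "nonsibilant fricative"
    # Filter 3: first spectral peak around 4 kHz?
    if abs(first_peak_hz - peak_target_hz) > peak_tolerance_hz:
        return "sibilant other than /s/ or /z/"
    # Filter 4: phonation present, or duration and intensity small enough?
    return "/z/" if voiced_or_weak else "/s/"


if __name__ == "__main__":
    print(classify_fricative(True, True, 4000.0, False))
    print(classify_fricative(True, True, 4100.0, True))
    print(classify_fricative(True, True, 2500.0, False))
    print(classify_fricative(True, False, 4000.0, False))
```

Writing the filters as early returns mirrors the description: an input proceeds to the next question only if it passes the current one.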
2.4.4 The acoustic-phonetic cues of affricates
There are three pairs of affricates in Mandarin: /tɕ/ and /tʰɕ/, /tʂ/ and /tʰʂ/, and /ts/ and /tʰs/.
According to Ladefoged (2000), an affricate is simply a stop followed
by a homorganic fricative. Affricates can therefore be expected to have the
acoustic-phonetic characteristics of both stops and fricatives.