The acoustic-phonetic cues of the consonants in Taiwan Mandarin

CHAPTER 2 LITERATURE REVIEW

National Chengchi University (國立政治大學)

2.3 Mandarin phonological system

There are 12 possible syllable structures in Mandarin: V, CV, GV, VG, VN, CVG, CVN, CGV, GVG, GVN, CGVG, and CGVN. A Mandarin syllable is traditionally divided into three parts: an optional initial, a final, and a tone (C. Cheng, 1973). The initial, when present, is a consonant, which may be a nasal. The final contains an optional prenuclear glide, a vowel, and an optional postnuclear glide or nasal. Over the past two decades, however, the status of the prenuclear glide in the Mandarin syllable has been widely debated (Bao, 2002; Yip, 2002; Duanmu, 2002; Wan, 2002a). Because the status of the prenuclear glide is not the focus of the present study, the glide was grouped with neither the onset nor the rime; like the initial consonant and the vowel, it was replaced by the hiccup noise on its own. Finally, so that the rime would not be much longer than the prenuclear glide or the initial consonant, the rime was further divided into a vowel plus a postnuclear glide, or a vowel plus a final nasal. Each part of the rime could be replaced by the hiccup noise individually.
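
The division of the syllable into separately replaceable parts can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the segmentation procedure used in the study: the function name `split_syllable` and the slot labels are hypothetical, and the input is a template string such as "CGVN" rather than recorded speech. The pattern encodes the structure described above (optional initial, optional prenuclear glide, obligatory vowel, optional postnuclear glide or nasal).

```python
import re

# Optional initial (C), optional prenuclear glide (G), obligatory vowel (V),
# and an optional postnuclear glide (G) or final nasal (N).
SYLLABLE = re.compile(r"(C?)(G?)(V)([GN]?)")

# Hypothetical labels for the four parts that can be replaced separately.
SLOT_LABELS = ("initial", "prenuclear_glide", "vowel", "coda")


def split_syllable(template):
    """Split a template such as 'CGVN' into its replaceable parts.

    Empty slots are omitted from the result. A string that does not fit
    the syllable structure raises ValueError.
    """
    match = SYLLABLE.fullmatch(template)
    if match is None:
        raise ValueError(f"not a Mandarin syllable template: {template!r}")
    return {label: part for label, part in zip(SLOT_LABELS, match.groups()) if part}


# Enumerate every template that the structure allows.
ALL_TEMPLATES = sorted(
    c + g + "V" + coda
    for c in ("", "C")
    for g in ("", "G")
    for coda in ("", "G", "N")
)
```

Two choices for the initial, two for the prenuclear glide, and three for the end of the rime give 2 x 2 x 3 = 12 templates, which matches the count stated in the text.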

2.4 The acoustic-phonetic cues of the consonants in Taiwan Mandarin

Taiwan Mandarin has 21 onset consonants in total: six oral stops /p/, /pʰ/, /t/, /tʰ/, /k/, /kʰ/; two nasal stops /m/, /n/; six fricatives /f/, /ɕ/, /x/, /ʂ/, /s/, /ʐ/; six affricates /tɕ/, /tʰɕ/, /tʂ/, /tʰʂ/, /ts/, /tʰs/; and one liquid /l/. The following sections introduce the acoustic-phonetic characteristics of these onset consonants, which serve as the segmentation criteria in Experiments 1 and 2.

2.4.1 The acoustic-phonetic cues of stops

Three acoustic-phonetic cues distinguish stops: formant transitions, burst amplitude, and duration (voice onset time, VOT).

First, formant transitions are crucial for detecting the place of articulation of stops. The F2 and F3 transitions from bilabial stops into the following vowel rise, those from alveolar stops are almost flat, and those from velar stops converge. Second, previous research (Repp, 1984) indicated that the burst amplitude of labial stops is weaker than that of alveolar and velar stops. Perceptual experiments have shown that burst amplitude can influence the identification of labial and alveolar stops. This effect is more evident for voiceless stops than for voiced stops. Third, VOT is of paramount importance for the detection of voicing. Stops with relatively long VOTs tend to be perceived as voiceless, whereas stops with relatively short VOTs tend to be perceived as voiced. In addition, voiceless aspirated stops have longer VOTs than both voiced stops and voiceless unaspirated stops. In Mandarin, the mean VOTs for /p/, /pʰ/, /t/, /tʰ/, /k/, and /kʰ/ are 14 ms, 82 ms, 16 ms, 81 ms, 27 ms, and 92 ms, respectively (Chao et al., 2006).
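
To make the role of VOT concrete, the sketch below assigns a measured VOT to the aspirated or unaspirated category at a given place of articulation. The mean values are those reported above (Chao et al., 2006); the nearest-mean rule and the function name `classify_aspiration` are assumptions introduced for illustration and are not a procedure taken from the cited studies.

```python
# Mean VOTs (ms) for Mandarin stops as reported in the text (Chao et al., 2006).
MEAN_VOT_MS = {
    "bilabial": {"unaspirated": 14, "aspirated": 82},
    "alveolar": {"unaspirated": 16, "aspirated": 81},
    "velar": {"unaspirated": 27, "aspirated": 92},
}


def classify_aspiration(place, vot_ms):
    """Return the category whose mean VOT is closest to the measured VOT.

    The nearest-mean rule is an illustrative assumption, not a method
    described in the cited work.
    """
    means = MEAN_VOT_MS[place]
    return min(means, key=lambda category: abs(means[category] - vot_ms))
```

For example, a velar stop with a VOT of 70 ms is assigned to the aspirated category, because 70 ms lies closer to 92 ms than to 27 ms.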

2.4.2 The acoustic-phonetic cues of nasals

According to Ladefoged (2000), four acoustic-phonetic cues identify nasals. First, the spectrogram changes sharply at the moment the articulatory closure is formed. Second, the bands of the nasal are lighter than those of the vowel, which indicates that the nasal is less intense than the vowel. Third, the F1 of the nasal is often very low, centered at around 250 Hz. Fourth, there is a large region above F1 with no energy. Nasals can be identified on the basis of these cues.
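
The four cues can be read as a checklist applied to a candidate segment. The sketch below is a minimal illustration under stated assumptions: the class `SegmentMeasurements`, its field names, and the 100 Hz tolerance around the 250 Hz target are hypothetical, and the two visual cues are entered as yes/no judgments rather than computed from a spectrogram.

```python
from dataclasses import dataclass


@dataclass
class SegmentMeasurements:
    """Hypothetical hand-measured values for a candidate nasal segment."""
    abrupt_spectral_change: bool  # sharp change at the closure, judged visually
    intensity_db: float           # mean intensity of the candidate segment
    vowel_intensity_db: float     # mean intensity of the adjacent vowel
    f1_hz: float                  # first formant of the candidate segment
    energy_gap_above_f1: bool     # region above F1 with no visible energy


def nasal_cues(seg, f1_target_hz=250.0, f1_tolerance_hz=100.0):
    """Report which of the four cues the segment satisfies.

    The 250 Hz target comes from the text; the 100 Hz tolerance is an
    arbitrary illustrative value.
    """
    return {
        "abrupt_change": seg.abrupt_spectral_change,
        "weaker_than_vowel": seg.intensity_db < seg.vowel_intensity_db,
        "low_f1": abs(seg.f1_hz - f1_target_hz) <= f1_tolerance_hz,
        "gap_above_f1": seg.energy_gap_above_f1,
    }


def looks_like_nasal(seg):
    """Return True only when all four cues are present."""
    return all(nasal_cues(seg).values())
```

Requiring all four cues is a deliberately strict choice for the sketch; a segmentation procedure might reasonably weigh the cues differently.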

2.4.3 The acoustic-phonetic cues of fricatives

The most crucial acoustic-phonetic cue for separating voiceless fricatives from voiced fricatives is the extended period of noise (Borden et al., 1994), which is easily detected on the spectrogram. Voiceless fricatives are longer in duration and stronger in intensity. In contrast, voiced fricatives (i.e., /ʐ/ in Mandarin) are shorter in duration and weaker in intensity, but their formant frequencies are clearer than those of voiceless fricatives.

Fricatives are characterized by high-frequency noise in the spectrum, which serves as a cue to their place of articulation. Another cue to place of articulation is the intensity of frication. Sibilants (i.e., /s/, /ɕ/, /ʂ/, /ʐ/, /tɕ/, /tʰɕ/, /tʂ/, /tʰʂ/, /ts/, and /tʰs/ in Mandarin) show relatively steep, high-frequency spectral peaks, whereas nonsibilants (i.e., /f/ and /x/ in Mandarin) show relatively flat, broader-band spectra. Moreover, alveolar sibilants (i.e., /s/ in Mandarin) can be distinguished from palatal sibilants (i.e., /ɕ/...) by the location of the lowest spectral peak, which lies around 4000 Hz for alveolar sibilants and around 2500 Hz for palatal sibilants. Intensity, as shown on the spectrogram, also differentiates place of articulation: sibilants have stronger intensity and nonsibilants weaker intensity. This is because the resonating cavity in front of an alveolar or palatal constriction produces high intensity, whereas no resonating cavity lies in front of the labio-dental constriction, which yields relatively weak intensity. The acoustic-phonetic characterization of fricatives is illustrated in Figure 3.

Figure 3. Acoustic-phonetic characteristics of fricatives (Borden et al., 1994)

Figure 3 shows how listeners perceive fricatives. An input that the listener hears passes through a series of filters. The first filter checks whether the input is a noisy sound of relatively long duration; if so, the input is regarded as a fricative and sent to the next filter. The second filter checks whether its intensity is relatively high; if so, the input is considered a sibilant and sent to the next filter. The third filter examines its first spectral peak; if the peak is around 4 kHz, the input is viewed as /s/ or /z/ and sent to the next filter. The fourth filter asks, “phonation exists or duration and intensity small enough?” If the answer is yes, the input is perceived as /z/; if the answer is no, it is perceived as /s/. Through these filters, the input is examined step by step.
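
The filter sequence described for Figure 3 amounts to a short decision cascade, sketched below. Several details are assumptions rather than part of the figure as described: the function name `classify_fricative`, the use of yes/no inputs for the qualitative judgments, the 500 Hz tolerance around 4 kHz, and the labels returned on the "no" branches of the first three filters, which the description does not spell out.

```python
def classify_fricative(noisy_and_long, high_intensity, first_peak_hz,
                       voiced_or_short_and_weak, peak_tolerance_hz=500.0):
    """Walk the four filters in order and return a label.

    The boolean inputs stand for the qualitative judgments in the figure.
    The tolerance around 4 kHz and the labels for the "no" branches are
    illustrative assumptions.
    """
    # Filter 1: noisy sound with relatively long duration?
    if not noisy_and_long:
        return "not a fricative"
    # Filter 2: relatively high intensity?
    if not high_intensity:
        return "nonsibilant fricative"
    # Filter 3: first spectral peak around 4 kHz?
    if abs(first_peak_hz - 4000.0) > peak_tolerance_hz:
        return "sibilant other than /s/ or /z/"
    # Filter 4: phonation present, or duration and intensity small enough?
    if voiced_or_short_and_weak:
        return "/z/"
    return "/s/"
```

A loud, long, noisy input with a first peak near 4 kHz and no phonation therefore comes out as /s/, while the same input with phonation comes out as /z/.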

2.4.4 The acoustic-phonetic cues of affricates

There are three pairs of affricates in Mandarin: /tɕ/ and /tʰɕ/, /tʂ/ and /tʰʂ/, and /ts/ and /tʰs/. According to Ladefoged (2000), an affricate is simply a sequence of a stop followed by a homorganic fricative. Affricates can therefore be expected to show the acoustic-phonetic characteristics of both stops and fricatives.

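Source document: The Process of Spoken Word Recognition in Taiwan Mandarin: Evidence from Disyllabic Words, 政大學術集成 (NCCU Academic Hub), pp. 29-34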
