
Lexical tone is a feature of every Mandarin syllable. There are four lexical tones⁴ in Mandarin. Chao (1948) described the 1st Tone (abbreviated as T1 hereafter) as a high-level tone, the 2nd Tone (T2) as a high-rising tone, the 3rd Tone (T3) as a low-dipping tone, and the 4th Tone (T4) as a high-falling tone. To make these descriptions more concrete and accurate, this section presents how the four tones are realized with respect to fundamental frequency (Section 2.2.1), duration (Section 2.2.2), and amplitude (Section 2.2.3) in acoustic studies.

2.2.1 Fundamental Frequency

The four basic Mandarin tones have been observed to differ from each other in F0 in production. Moore & Jongman (1997) provided a plot of a female speaker’s F0 as she pronounced the Mandarin syllable ma⁵ in each of the four tones, as a demonstration of F0 changes. For T1, the F0 track stayed high and flat at around 245 Hz throughout the syllable. For T2, the F0 started at a mid height of 220 Hz, dropped slightly to 210 Hz, and then rose all the way to a high point of 260 Hz. T3’s F0 began at a low point of 200 Hz and gradually descended to 180 Hz, followed by a rise back to 200 Hz. T4 started at the highest point, 270 Hz, and dropped drastically to the lowest point, 180 Hz.

The F0 tracks of the tones are consistent with Chao’s (1948) descriptions, with the exception of T2, which in fact shows a short, minor F0 dip before the major rise.

4 The discussion of the neutral tone is not included here, so there are four (but not five) tones in total. The F0 pattern of the neutral tone varies according to the preceding tone (Jongman et al., 2006). Since the neutral tone does not have a constant F0 feature as the four tones do, it is excluded from the discussion here.

5 The Mandarin syllables are spelled in Pinyin in italic letters throughout this thesis. The tone of each
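The F0 values reported above can be tabulated to make the contour shapes easier to compare. The following is a minimal sketch: only the Hz values come from the description of Moore & Jongman's (1997) plot; the names `F0_ANCHORS_HZ`, `summarize`, and `SUMMARY` are illustrative assumptions, and no timing information is modelled because the text gives none.

```python
# F0 anchor values (Hz) for one female speaker, as quoted in the text
# from Moore & Jongman (1997). Names and structure are assumptions.
F0_ANCHORS_HZ = {
    "T1": [245, 245],        # high and flat
    "T2": [220, 210, 260],   # slight dip, then rise
    "T3": [200, 180, 200],   # fall, then rise back
    "T4": [270, 180],        # steep fall
}


def summarize(anchors):
    """Return onset, offset, net change, and drop below onset (all in Hz)."""
    onset, offset = anchors[0], anchors[-1]
    return {
        "onset": onset,
        "offset": offset,
        "net_change": offset - onset,             # positive = overall rise
        "drop_below_onset": onset - min(anchors),  # size of any fall from onset
    }


SUMMARY = {tone: summarize(a) for tone, a in F0_ANCHORS_HZ.items()}

for tone, stats in SUMMARY.items():
    print(tone, stats)
```

In this tabulation T2 shows a 10 Hz drop below its onset before a net rise of 40 Hz, which is the departure from Chao's (1948) "high-rising" label noted above.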

The track of F0 can influence the perception of Mandarin tones. Yang (2010) carried out a perception experiment using the resynthesized syllable tao with varying F0 starting points and ending points, followed by the syllable qian2 (錢) “money” or shui4 (稅) “tax”. The syllable tao, according to Yang, can mean “pay” if pronounced in T1 (掏), “avoid” in T2 (逃), “ask for” in T3 (討), and “drag” in T4 (套).

It was found that a syllable with a starting F0 value close to its ending value would be perceived as T1. If a syllable had a clearly lower starting point than ending point, it would be perceived as T2 or T3. When the starting point was clearly higher than the ending point, the syllable would be perceived as T4. This result did not specify the factor(s) that could differentiate T2 from T3.

Shen & Lin (1991) investigated the perceptual differences between T2 and T3. They synthesized falling-rising tones that varied along two dimensions: the magnitude of the fall and the timing of the turning point (i.e., the end point of the falling part). The magnitude was defined as the F0 difference between the starting point and the turning point, and the timing of the turning point was defined as the length of the falling part as a percentage of the length of the whole syllable; for example, a turning point at 25 ms in a 250-ms-long syllable would be defined as “the 10% turning point.” When a tone had a 30 Hz falling magnitude and a turning point earlier than the 40% timing, it was mostly identified as T2. Tones with a 30 Hz falling magnitude and a turning point later than the 40% timing were mostly identified as T3. For tones with a 15 Hz falling magnitude, turning points earlier than around the 60–70% timing led to a T2 percept, while turning points later than the 60–70% timing were perceived as T3. The T2/T3 threshold in turning-point timing thus differed across falling magnitudes, which indicated that turning-point timing and falling magnitude together could help differentiate T2 from T3.
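The perceptual patterns reported by Yang (2010) and Shen & Lin (1991) can be restated as simple decision rules. The sketch below is illustrative only: the function names are hypothetical, `tolerance_hz` is an assumed value (the text gives no number for "close"), and the 65% boundary is an assumed midpoint of the reported 60–70% range. The original findings are tendencies in listener responses, not hard thresholds.

```python
def turning_point_percent(turn_ms, syllable_ms):
    """Timing of the turning point as a percentage of syllable length."""
    return 100.0 * turn_ms / syllable_ms


def classify_by_endpoints(start_hz, end_hz, tolerance_hz=10.0):
    """Coarse percept from onset vs. offset F0 (pattern from Yang, 2010).

    tolerance_hz is a hypothetical cut-off for "start close to end".
    """
    diff = end_hz - start_hz
    if abs(diff) <= tolerance_hz:
        return "T1"
    return "T2 or T3" if diff > 0 else "T4"


# Approximate T2/T3 boundary (percent timing) for each falling magnitude (Hz).
# 40 is stated in the text; 65 is an assumed midpoint of "60-70%".
BOUNDARY_PERCENT = {30: 40.0, 15: 65.0}


def classify_t2_t3(fall_hz, turn_percent):
    """More likely percept for a falling-rising tone (Shen & Lin, 1991).

    Only the two reported magnitudes (15 and 30 Hz) are supported; a turning
    point exactly at the boundary is arbitrarily assigned to T3 here.
    """
    boundary = BOUNDARY_PERCENT[fall_hz]
    return "T2" if turn_percent < boundary else "T3"


# The text's own example: a turning point at 25 ms in a 250-ms syllable.
print(turning_point_percent(25, 250))   # 10.0
print(classify_t2_t3(30, 10.0))         # T2
print(classify_t2_t3(30, 50.0))         # T3
print(classify_t2_t3(15, 50.0))         # T2
```

The last two calls show the interaction described above: the same 50% turning point falls on the T3 side for a 30 Hz fall but on the T2 side for a 15 Hz fall.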

2.2.2 Duration

As demonstrated by Moore & Jongman’s (1997) measurements, the four tones differed in duration. T2 and T3 were similar in length (almost 300 ms), and they were the longest of the four tones. T1 (about 250 ms) was slightly shorter than T2 and T3. T4 (about 175 ms) was the shortest. Moore & Jongman’s (1997) measurements did not show an apparent difference between T2 and T3 in duration. Shen (1990), however, found that T3 is actually longer than T2.
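To illustrate what these production values imply for a duration-only comparison, the sketch below looks up the tone(s) whose typical duration lies closest to a given duration. It is an illustration under stated assumptions, not a model from any of the cited studies: the durations are the rounded values quoted above from Moore & Jongman (1997) ("almost 300 ms" is entered as 300), and `nearest_tones` is a hypothetical helper.

```python
# Rounded typical durations (ms) as quoted in the text from
# Moore & Jongman (1997); "almost 300 ms" is entered as 300.
TYPICAL_DURATION_MS = {"T1": 250, "T2": 300, "T3": 300, "T4": 175}


def nearest_tones(duration_ms):
    """Return every tone whose typical duration is closest to duration_ms."""
    best = min(abs(duration_ms - d) for d in TYPICAL_DURATION_MS.values())
    return sorted(
        tone
        for tone, d in TYPICAL_DURATION_MS.items()
        if abs(duration_ms - d) == best
    )


print(nearest_tones(180))  # ['T4']
print(nearest_tones(290))  # ['T2', 'T3']
```

With these rounded values, T2 and T3 tie for any long token, which mirrors the observation that Moore & Jongman's measurements alone do not separate the two tones by duration.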

Liu & Samuel (2004) examined the relation between the four tones’ duration and their identification. They used whispered Mandarin tones in their perception test in order to remove the F0 cue. The participants were able to identify T1, T3 and T4 based on the tones’ durations. According to Liu & Samuel, because F0 is not realized in whispering, listeners relied on exaggerated duration contrasts to differentiate the tones.

In conclusion, duration does differentiate the four tones in perception; however, this differentiation is possible when duration contrasts are exaggerated to compensate for the lack of F0 contrasts.

2.2.3 Amplitude

Whalen & Xu (1992) provided measurements of the amplitude contours of a male speaker producing the Mandarin syllable yi in the four tones. The contours all began with a rise and ended with a drop, with a period of fluctuation in between. T1 had the highest peak amplitude, and T4 the second highest. T3’s peak amplitude was the lowest, and T2’s was the second lowest.

In Whalen & Xu’s (1992) perception test, a metronome was used to pace a male speaker producing the four tones at different speeds, which yielded tokens with both typical and atypical durations. For each tone, the token with a duration typical of that tone was selected as experimental material. In addition, tokens with durations atypical of the tone but typical of the other tones were also selected. In other words, each tone was represented at four durations: those typical of T1, T2, T3 and T4. The F0 cues of all the tokens were removed, so only the duration and amplitude cues were preserved.

When each tone was presented with its typical duration, the rates of correct identification for T2, T3 and T4 were higher than 50%, while that for T1 was lower than 50%. When each tone was presented with all four durations, the rates of identifying a tone according to its duration (e.g., identifying a T2 as T3 when the T2 had a duration typical of T3) were all lower than 50%. If the listeners’ identification had been dominated entirely by duration, they should have perceived a T2 with a duration typical of T3 as T3. The low rate of such identifications showed that this was unlikely; therefore, duration was not the dominant cue. Since the only two types of cues available in this experiment were duration and amplitude, it appeared that, when the F0 cue was unavailable and the duration cue was misleading, the amplitude cue could override the duration cue and enable the listeners to identify tones correctly.

2.2.4 The Dominance of F0 in Mandarin Tones

The previous sections have shown that Mandarin tones differ from each other in terms of F0, duration and amplitude in production. In addition, each of the three acoustic parameters contributes to the recognition of the tones. However, F0 is the dominant cue among the three for identifying Mandarin tones.

From a descriptive perspective, Chao (1948) characterized the four tones as “high-level”, “high-rising”, “low-dipping” and “high-falling” based on audible pitch, which is the perceptual counterpart of F0 in acoustics. Also, as pointed out by Liu & Samuel (2004), F0 cues are used to teach children or adults who are learning Mandarin. If F0 were not the dominant cue for Mandarin tones, it would not be adopted to describe or teach them.

In studies investigating Mandarin tones from an acoustic perspective, F0 has been considered the primary cue (Whalen & Xu, 1992; Liu & Samuel, 2004; Jongman et al., 2006). Though duration and amplitude have been found to contribute to the recognition of Mandarin tones, they serve only as secondary cues. For example, as mentioned earlier, Liu & Samuel (2004) found that when F0 cues were not available, duration could help distinguish T1, T3, and T4. Also, Whalen & Xu (1992) found that listeners were able to recognize T2, T3 and T4 based on amplitude contour when F0 cues were removed. In these two studies, duration and amplitude worked only as “backup parameters” when F0 was absent. Moreover, each was able to differentiate only three of the four tones. This “backup” status and the inability to differentiate all four tones show that neither duration nor amplitude can be the dominant cue for differentiating Mandarin tones.

To sum up, Mandarin tones differ from each other in F0, duration and amplitude in production; however, when it comes to perception, F0 is the dominant cue.
