Chapter 1 Introduction
National Chengchi University
effects of markedness in Taiwan Mandarin investigated the initial consonants regarding place of articulation (Wan, 2002) and nasal codas (Hsu, 2011). Wan (2002) found that consonants at different places of articulation tend to substitute for unmarked coronals, while Hsu (2011) found that in the coda position, it is the marked velar nasal that tends to be replaced.
The purpose of this thesis is to investigate the voice onset time of the stops and the affricates in Taiwan Mandarin. In addition, the present study aims to elicit speech errors and to identify the substitution patterns that reflect the markedness effect regarding aspiration in Taiwan Mandarin. To this end, two experiments are conducted using non-word materials as stimuli.
1.3 The framework of the thesis
In Chapter 2, the relevant studies will be reviewed; in section 2.1, the definition and studies related to voice onset time of stops and affricates will be discussed, and in section 2.2, the experimental studies of speech errors and the markedness effect will be addressed. Finally, in section 2.3, the research questions of the present study will be stated. In Chapter 3, section 3.1 will briefly introduce the background of the subjects for the present study, and in section 3.2, the equipment applied in the two experiments will be described. The materials of the experiments will be elaborated in
section 3.5, the data analysis regarding the measurement of the voice onset time and the transcription of speech errors will be discussed. The results and analysis will be in Chapter 4. Section 4.1 will display the values of the voice onset time for the stops and the affricates, and in section 4.2, the speech error analysis will be provided. The discussion and conclusion of this study will be presented in Chapter 5.
Chapter 2
Literature Review
The present study investigates the voice onset time (henceforth, VOT) and the markedness effect of Mandarin unaspirated and aspirated stops and affricates using experimentally elicited data. VOT has been studied both within single languages and cross-linguistically, following the pioneering work of Lisker and Abramson (1964), while the examination of speech errors via experimental induction has flourished since Baars, Motley and MacKay (1975) and Motley and Baars (1976) proposed the classic SLIP technique. Therefore, in this chapter, previous studies concerning VOT, experimental studies of speech errors, and the markedness effect on consonants will be discussed. In section 2.1, the basic idea and definition of VOT and the research on the VOT of stops and affricates will be reviewed. In section 2.2, previous studies on the experimental induction of speech errors will be addressed, and in section 2.3, the notion of markedness and previous studies related to markedness in Mandarin will be reviewed. The research questions for this thesis will be addressed in section 2.4.
2.1 Voice onset time
In this section, the issues of VOT on stops and affricates will be introduced. First,
stops cross-linguistically and of Mandarin will be in 2.1.2, while those on affricates will be illustrated in 2.1.3. The effects of other factors on VOT will be addressed in 2.1.4, and the summary of this section will be in 2.1.5.
2.1.1 Definition of VOT
Voice onset time (VOT), defined as the interval between the burst release of the initial stop and the onset of voicing (vocal fold vibration) of the following vowel (Ladefoged & Johnson, 2011; Lisker & Abramson, 1964), has been regarded as a reliable measure for describing the acoustic properties of consonants. VOT values can be measured directly from the waveform and the spectrogram (see Figure 2.1).
Figure 2.1 The spectrogram of [tʰa55]¹
The spike on the spectrogram, or the short vibration on the waveform, indicates where the burst of the stop starts; it is followed by the stop gap, the noise (aspiration), and then the formants of the vowel (Kent & Read, 2002). Figure 2.1 shows the spectrogram of [tʰa55] in Mandarin. As the spectrogram clearly shows, there is a spike indicating the release of the stop. The VOT is measured from this spike to the onset of voicing, as represented by the red block in the figure.
¹ The spectrogram was analyzed using KayPENTAX, provided by the Phonetics and Psycholinguistics Lab at National Chengchi University.
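The measurement described above amounts to subtracting two time stamps. The following is a minimal sketch, assuming that the burst release and the onset of voicing have already been marked by hand on the waveform or spectrogram; the function name `vot_ms` and the example time stamps are hypothetical and are not read from Figure 2.1.

```python
def vot_ms(burst_release_s: float, voicing_onset_s: float) -> float:
    """Return VOT in milliseconds from two annotated time points (in seconds).

    A positive value means voicing starts after the stop release;
    a negative value means voicing starts before the stop release.
    """
    return (voicing_onset_s - burst_release_s) * 1000.0


# Hypothetical annotation: burst release at 0.120 s, voicing onset at 0.195 s.
example_vot = vot_ms(0.120, 0.195)
print(f"VOT = {example_vot:.1f} ms")
```

Keeping the two time points as separate arguments mirrors the way the interval is read off the figure: one boundary at the spike and one at the first voicing pulse.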
Kent and Read (2002), in addition, defined the perceptual components of VOT as four acoustic events: transient, frication interval, aspiration, and onset of voicing. As an important phonetic cue, the values of VOT distinguish the voicing and the aspiration of syllable-initial consonants, and they may vary depending on various factors such as the place of articulation or speech rate.
2.1.2 Stops
Stop consonants have been described in different ways. On the one hand, syllable-initial stops have three acoustic or phonetic phases: closure, release (which can be aspirated or unaspirated), and formant transition (Kent & Read, 2002). On the other hand, stops can be further examined in terms of voicing and aspiration, and these contrasts are realized differently across languages.
Lisker and Abramson (1964) conducted a classic cross-linguistic study and classified eleven languages into three groups based on the number of contrasts among the stops: “two-category languages”² (e.g., German, which contrasts in voicing, having voiced and voiceless unaspirated stops), “three-category languages” (e.g., Korean), and “four-category languages” (e.g., Hindi). In their study, they introduced the notions of “voicing lead” and “voicing lag” of VOT. Voicing lead is represented by negative values, since voicing starts before the stop release, while voicing lag is represented by positive values, since voicing starts after the stop release. Voicing lag can be further classified as “short lag” or “long lag” depending on how late voicing begins; short lag ranges around 0-25 ms, and long lag is higher than 35 ms (Auzou et al., 2000; Keating, 1984; Kent & Read, 2002). As a result, VOT is able to differentiate unaspirated stops from aspirated stops.
² Two-category languages refer to languages in which the stops contrast in either voicing or aspiration;
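The three-way division into voicing lead, short lag, and long lag can be written as a simple decision rule. The sketch below is only an illustration under stated assumptions: the function name `classify_vot` is hypothetical, the cut-offs are the approximate ranges quoted above (negative values, 0-25 ms, and above 35 ms), and values between 25 and 35 ms are returned as "unclassified" because the ranges above do not assign them to either lag category.

```python
def classify_vot(vot_ms: float) -> str:
    """Assign a VOT value (ms) to one of the categories described above."""
    if vot_ms < 0:
        return "voicing lead"   # voicing starts before the stop release
    if vot_ms <= 25:
        return "short lag"      # roughly 0-25 ms
    if vot_ms > 35:
        return "long lag"       # higher than 35 ms
    return "unclassified"       # 25-35 ms: not covered by the quoted ranges


# Illustrative values only; they are not measurements from any study.
for value in (-100.0, 14.0, 30.0, 74.0):
    print(f"{value:7.1f} ms -> {classify_vot(value)}")
```

Treating the gap between 25 and 35 ms as a separate outcome avoids forcing a category onto values that the quoted ranges leave open.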
In their study, Lisker and Abramson (1964) measured the VOT of the initial stops in 11 languages and pointed out that VOT effectively distinguishes voicing, aspiration, and place of articulation. Their results showed that the VOT ranges for the stops in two-category languages are -125 to -75 ms (voiced), 0 to 25 ms (voiceless unaspirated), and 60 to 100 ms (voiceless aspirated). Moreover, regarding aspiration, they found that the VOT values of aspirated stops are longer than those of unaspirated stops; regarding place of articulation, the values of the velar stops are the longest among labial, apical (dental and alveolar), and velar stops. Their classic study inspired many researchers to study VOT in different languages. Cho and Ladefoged (1999) integrated previous research findings and proposed three major principles of VOT regarding place of articulation. First, the duration of VOT is related to how far back the closure is: the further back the closure, the longer the VOT. Second, the duration of VOT changes with the extent of the contact area. Third, the duration of VOT changes with the speed of movement of the articulator. Therefore, the VOT values of the world's languages are expected to generally follow the principle that the values of the alveolar stops are shorter than those of the velar stops but longer than those of the bilabial stops (Cho & Ladefoged, 1999; Kent & Read, 2002). Although VOT values generally seem to follow this principle, Cho and Ladefoged (1999), who measured the VOT of word-initial voiceless aspirated and voiceless unaspirated stops across 18 languages, reported that the VOT values of velar stops are the longest among bilabial, coronal, and velar stops; however, in some languages the difference between the values of bilabial stops and coronal stops does not reach significance, and VOT values are believed to be language-specific. In addition, their study provided boundaries of the VOT values for unaspirated and aspirated stops and classified them into four categories: 30 milliseconds for unaspirated stops, 50 milliseconds for slightly aspirated stops, 90 milliseconds for aspirated stops, and over 90 milliseconds for highly aspirated stops.
Mandarin oral stops are voiceless, and they show a two-way contrast in aspiration: voiceless aspirated stops and voiceless unaspirated stops. Table 2.1 displays the consonant inventory of Mandarin from Lin (2007).
Table 2.1 The consonant chart of Mandarin Chinese (Lin, 2007)
[Table body not recovered from the source; surviving column labels: bilabial, labio-]
… /k/, /kʰ/, and they can be further classified as bilabial stops, dental stops, and velar stops based on their place of articulation. In addition, the oral stops can only occur syllable-initially.
The VOT of the stops has been investigated extensively in both Beijing Mandarin and Taiwan Mandarin. Z. Wu (1987) and H. Liu et al. (2007) discussed the VOT of the stops in Beijing Mandarin, and the VOT of the consonants of Taiwan Mandarin has been examined since Chen, Tsay, and Hong’s (1998) comprehensive study. The VOT values of the stops in Taiwan Mandarin have since been investigated in several studies (H. L. Wu, 2009; H. M. Liu et al., 1999; Jeng, 2005; Lai, 2013; L. M. Chen et al., 2007; Peng, 2009); their figures are shown in Table 2.2.
Table 2.2 The mean VOT values (ms) of Taiwan Mandarin Oral Stops
Study                        [p]     [pʰ]    [t]     [tʰ]     [k]     [kʰ]
Chen, Tsay, & Hong (1998)    13.9    74.3    14.7    81.2     24.2    88.5
H. M. Liu et al. (1999)      9       72      14      74       24      83
Jeng (2005)                  11      80      19      68       23      87
Chen, Chao, & Peng (2007)    13.9    77.8    15.3    75.5     27.4    85.7
H. L. Wu (2009)              13.60   62.08   14.97   62.21    34.51   79.11
Peng (2009)                  14.68   89.4    14.73   86.52    31.05   109.21
Lai (2013)                   47.69   91.99   62.04   171.40   62.63   168.40
Z. Wu (1987) explored the duration of prevocalic stops and affricates in Beijing Mandarin from two perspectives: a biological one, by measuring the pressure and airstream in the vocal tract, and an acoustic one, through the analysis of intensity and spectrograms. His results showed that the duration of unaspirated stops is eight times shorter than that of aspirated stops. However, his study only provided the mean duration of each stop and affricate in different vowel contexts rather than an overall mean for each stop and affricate. H. Liu et al. (2007) analyzed and compared the VOT of Beijing Mandarin stops produced by esophageal speakers and normal speakers in terms of place of articulation and aspiration. The figures for individual consonants were not provided; however, it was reported that, for normal speakers, the VOT of the velar stops is significantly the longest among the initial oral stops, and the values of the unaspirated stops are shorter than those of the aspirated ones. These findings correspond to the general rules and principles.
Concerning the VOT of Taiwan Mandarin, K. Chen et al. (1998) conducted a comprehensive survey of the duration of initial consonants. The findings suggested that the duration varies with several factors. Aspiration is one of the two main factors: the duration of the unaspirated sounds (e.g., [p]) is shorter than that of their aspirated counterparts (e.g., [pʰ]). The duration is also influenced by the place of articulation. Taking unaspirated stops as an example, the duration of the labial stop [p] is shorter than that of the dental stop [t], which in turn is shorter than that of the velar stop [k]. This finding is in line with the principle proposed by Cho and Ladefoged (1999).
Other research studied VOT from different perspectives. Some studies explored the issue only in Taiwan Mandarin (H. M. Liu et al., 1999); some compared the VOT of stops in Taiwan Mandarin with the VOT of English stops produced by Taiwanese native speakers (L. M. Chen et al., 2007), or the VOT of Taiwan Mandarin with that of Hakka (H. L. Wu, 2009; Peng, 2009; Peng et al., 2009); and others compared the VOT of Taiwan Mandarin stops and affricates produced by native Taiwan Mandarin speakers with that produced by international learners of Taiwan Mandarin as a second language (Lai, 2013). The methods of these studies were similar in that disyllabic word lists were created and the participants were asked to read them. As a result, most of the Mandarin VOT results in previous studies were similar, as in Table 2.2. Their results can be further interpreted with regard to aspiration and place of articulation. With respect to aspiration, the VOT values of aspirated stops are longer than those of unaspirated ones, which confirms the cross-linguistic findings for the world's languages (H. Liu et al., 2007; H. L. Wu, 2009; H. M. Liu et al., 1999; Jeng, 2005, 2011; K. Chen et al., 1998; L. M. Chen et al., 2007; Peng, 2009; Z. Wu, 1987). All VOT values of the aspirated stops in previous studies were higher than 35 milliseconds, indicating that the aspirated stops in Taiwan Mandarin fall into the ‘long lag’ category. With respect to the place of articulation, the VOT values of previous studies are similar. As Table 2.2 shows, except for the figures provided by Lai (2013), which are noticeably higher, possibly because of the design of the test materials and the location of the target words, the VOT values of each consonant in the rest of the studies generally fall within a range: [p] between 9 and 14.68 milliseconds, [pʰ] between 62.08 and 89.4 milliseconds, [t] between 14 and 19 milliseconds, [tʰ] from 62.21 to 86.52 milliseconds, [k] from 23 to 34.51 milliseconds, and [kʰ] from 79.11 to 109.21 milliseconds. The figures in Table 2.2 show that in previous studies the VOT values of the velar stops are significantly the longest and those of the bilabial stops the shortest, which conforms to the principle. However, the findings of Jeng (2005) and L. M. Chen et al. (2007) revealed that the VOT values of the voiceless aspirated stops do not follow the principle; instead, voiceless aspirated dental stops have shorter VOT values than voiceless aspirated bilabial stops do.
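The ranges quoted above can be recomputed directly from Table 2.2. The following is a minimal sketch rather than part of the original analysis: the values are copied from Table 2.2, the aspirated stops are written with ASCII labels ("ph", "th", "kh"), and the names `TABLE_2_2`, `vot_ranges`, and `velar_is_longest` are hypothetical. Lai (2013) is excluded from the ranges, as in the text.

```python
STOPS = ["p", "ph", "t", "th", "k", "kh"]  # ASCII labels for the six oral stops

# Mean VOT values (ms) copied from Table 2.2, in the order given by STOPS.
TABLE_2_2 = {
    "Chen, Tsay, & Hong (1998)": [13.9, 74.3, 14.7, 81.2, 24.2, 88.5],
    "H. M. Liu et al. (1999)": [9, 72, 14, 74, 24, 83],
    "Jeng (2005)": [11, 80, 19, 68, 23, 87],
    "Chen, Chao, & Peng (2007)": [13.9, 77.8, 15.3, 75.5, 27.4, 85.7],
    "H. L. Wu (2009)": [13.60, 62.08, 14.97, 62.21, 34.51, 79.11],
    "Peng (2009)": [14.68, 89.4, 14.73, 86.52, 31.05, 109.21],
    "Lai (2013)": [47.69, 91.99, 62.04, 171.40, 62.63, 168.40],
}

EXCLUDED = {"Lai (2013)"}  # noticeably higher figures, set aside as in the text


def vot_ranges(table, excluded):
    """Return {stop: (min, max)} over all studies that are not excluded."""
    ranges = {}
    for i, stop in enumerate(STOPS):
        values = [row[i] for study, row in table.items() if study not in excluded]
        ranges[stop] = (min(values), max(values))
    return ranges


def velar_is_longest(row):
    """True if [k] and [kh] have the longest VOT within their aspiration class."""
    p, ph, t, th, k, kh = row
    return k > max(p, t) and kh > max(ph, th)


ranges = vot_ranges(TABLE_2_2, EXCLUDED)
for stop, (low, high) in ranges.items():
    print(f"[{stop}]: {low} to {high} ms")

for study, row in TABLE_2_2.items():
    if study not in EXCLUDED:
        print(f"{study}: velar longest = {velar_is_longest(row)}")
```

Storing each study as one row keeps the code aligned with the layout of Table 2.2, so a value can be checked against the table at a glance.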
2.1.3 Affricates
Affricates, such as [dʒ] or [tʃ] in English, are phonemes that consist of two segments, a stop and a homorganic fricative (Kent & Read, 2002; Ladefoged & Johnson, 2011). Kent and Read (2002) stated that affricates have the acoustic properties of both stops and fricatives: affricates resemble stops in that there is “a period of complete obstruction in the vocal tract,” and resemble fricatives in that there is “a period of frication,” but the frication is shorter than that of fricatives.
Since affricates have the acoustic quality of stops, VOT is believed to serve as an efficient measure for them. Yiu (2008) measured the closure period, fricative period, and VOT of Cantonese stops ([t], [tʰ]), fricatives ([s]), and affricates ([ts], [tsʰ]). Her results were in accordance with the general rule that the VOT values of aspirated stops and affricates are longer than those of their unaspirated counterparts: the VOT value of [t] is 52.91 milliseconds, [tʰ] 69.11 milliseconds, [ts] 79.80 milliseconds, and [tsʰ] 106.01 milliseconds.
Mandarin, similar to Cantonese, has more affricates than English does. Unlike English affricates, which contrast in voicing, Chinese affricates contrast in aspiration.
The affricates in Chinese include the dental affricates [ts] and [tsʰ], the post-alveolar affricates [tʂ] and [tʂʰ], and the alveolo-palatal affricates [tɕ] and [tɕʰ], as shown in Table 2.1.
In contrast to the large amount of research on stops, affricates in Chinese have not been much discussed. Z. Wu (1987) pointed out that, in Beijing Mandarin, the duration of voiceless unaspirated affricates is three times shorter than that of voiceless aspirated affricates, inasmuch as the time for aspiration is condensed by the presence of the fricative part. Regarding Taiwan Mandarin, past research has examined affricates by measuring the noise duration. Tse (1988) measured the noise duration of Mandarin fricatives and affricates, and some of the findings were as follows: (1) unaspirated affricates have a shorter mean duration than their aspirated counterparts, and (2) both voiceless unaspirated and voiceless aspirated affricates have a shorter mean duration than voiceless fricatives. Later studies measuring the noise duration demonstrated corresponding results (H. M. Liu et al., 1999; Jeng, 2005; K. Chen et al., 1998). While past research studied affricates by measuring the noise duration, recent research has started to study them by measuring the VOT, since affricates have the acoustic properties of stops (H. L. Wu, 2009; Lai, 2013).
An example of the Mandarin word [tʂʰa55], with a word-initial affricate, is presented in Figure 2.2.
Mandarin affricates are composed of a stop ([t]) together with a fricative ([s], [ʂ], [ɕ]). That is, in the spectrogram there is a spike representing the burst of the stop, followed by the fricative noise and aspiration, as shown in the block of Figure 2.3.
As presented in Table 2.3, the results obtained by measuring the VOT are not much different from those obtained by measuring the noise duration.
Table 2.3 The mean noise duration (ms) & VOT (ms) of Mandarin Affricates
[ts] [tsʰ] [tʂ] [tʂʰ] [tɕ] [tɕʰ]
The figures in Table 2.3 show that the variation of the affricates is large, as the ranges of both the duration and the VOT are wide. Excluding the noticeably higher figures of Lai (2013), the duration of the unaspirated dental affricate [ts] ranges from 58.9 milliseconds to 90 milliseconds, the aspirated dental affricate [tsʰ] from 101.68 milliseconds to 166.3 milliseconds, the unaspirated post-alveolar [tʂ] from 55.6 milliseconds to 87 milliseconds, the aspirated post-alveolar [tʂʰ] from 114.6 milliseconds to 153 milliseconds, the unaspirated alveolo-palatal [tɕ] from 72 milliseconds to 84 milliseconds, and the aspirated alveolo-palatal [tɕʰ] from 123.5 milliseconds to 145.8 milliseconds.
The figures reveal considerable variation across studies; however, the studies share two findings, concerning aspiration and place of articulation. Regarding aspiration, the results are consistent with the general rule that aspirated affricates have longer noise duration/VOT than unaspirated ones; regarding place of articulation, the results correspond to what Tse (1988) and Lai (2013) found: the variation between different places of articulation of voiceless unaspirated affricates does not reach the level of significance, nor does that of voiceless aspirated affricates.
2.1.4 The effect of other factors on VOT
The results of previous studies indicate that several factors influence the duration/VOT values of both stops and affricates, such as speech rate, the position of the target, and vowel context. It has been reported that when stops or affricates are produced in continuous speech, either in sentences or in paragraphs, the VOT values are shorter than when they are produced in isolated words or isolated syllables (Howell & Rosen, 1983; Lisker & Abramson, 1964).
Concerning speaking rate, Jeng (2005, 2011) investigated the duration of Taiwan Mandarin consonants at different speech rates. Her results showed that for the affricates, the noise duration becomes shorter as the speech rate increases; for the aspirated stops, the VOT likewise becomes shorter as the speech rate increases, whereas this is not the case for unaspirated stops.
Apart from these two factors, vowel context affects the VOT values as well. However, it has not been discussed much. K. Chen et al. (1998) proposed that vowel context influences the duration of consonants. Previous studies of Taiwan Mandarin focused on the influence of the single vowels [i], [u], and [a] on stops (L. M. Chen et al., 2007; Peng, 2009). The findings of both studies suggested that the VOT values of stops are affected by vowel height; in other words, stops have shorter VOT in the context of the low vowel [a] and longer VOT in the context of the high vowels [i] and [u].
2.1.5 Summary
VOT is an efficient tool for measuring stops and affricates. Table 2.4 lists the previous studies of Taiwan Mandarin reviewed above and shows whether each focused on stops, affricates, or both.
Table 2.4 The research focus of VOT in Taiwan Mandarin
[Table body not recovered from the source; surviving column labels: unaspirated stops, aspirated stops, unaspirated affricates]
From the data of previous studies on the stops of Taiwan Mandarin, the results for the unaspirated stops showed that the VOT values of the velar stops are clearly longer than those of the dental ones, and that those of the dental ones are longer than those of the bilabial ones. Unlike the unaspirated stops, the aspirated stops conform to the principle only in that the VOT values of the velar ones are the longest; there is no tendency for those of the dental ones to be significantly longer than those of the bilabial ones.
Moreover, the aspirated stops [pʰ], [tʰ], [kʰ], all with VOT values longer than 35 milliseconds, are shown to fall into the long-lag category. With respect to Mandarin affricates, the data of