Graduate Institute of Linguistics, College of Liberal Arts, National Taiwan University
Master Thesis
The Effect of Age and Prosody on Syllable-final Nasal Mergers in Taiwan Mandarin Spontaneous Speech
葉宇喬 Yu-Chiao Yeh
Advisor: 馮怡蓁 Janice Fon, Ph.D.
July, 2015
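ACKNOWLEDGEMENTS
My three years as a master's student felt long in a way that invites reflection, yet they were without doubt the most important period of exploration in my life. So much happened along the way, and the gratitude I need to express runs deep.
First, I thank the faculty of the Graduate Institute of Linguistics at National Taiwan University, and above all my advisor, Professor 馮怡蓁 (Janice Fon), for everything she has done for me. Her guidance in coursework and her help with writing and presentation skills gave me a wealth of knowledge and showed me the care that research demands. I am also very grateful to my two committee members, Professor 許慧如 and Professor 張詠翔, for taking time out of their busy schedules to attend my oral defense and for offering invaluable comments from their respective fields. Special thanks go to Professor 張 for pointing out the shortcomings of the thesis clearly and concisely, one by one, which was an enormous help in revising it.
Beyond the thesis, there are many other teachers to thank. I thank Professor 魏岫明 for leading me into the field of linguistics and starting my career in it. I thank Professor 劉德馨 for bringing all kinds of issues into lively classroom discussion, which gave me a view of the arguments and rivalries in the development of phonological and sociolinguistic theory. I thank Professor 宋麗梅 for her Austronesian field methods course, which lifted the veil on languages that are at once so near and so distant, and for pointing out where my academic English writing needed work. I thank Professor 楊秀芳 for her courses on Chinese dialects and the languages of Taiwan; her clear and gentle explanations were a perfect model of scholarly grace and gave me a new way of understanding the languages around me. I also thank 張君松 and 張雯媛, who without doubt gave me the most important spiritual nourishment outside of research. Thank you for everything you have done for me, teacher; it is something I will not and cannot forget for the rest of my life. Many thanks to the professor for inviting me to be an assistant, which has brought me much pleasure and success. Finally, I thank Professor 馮怡蓁 once more for her guidance in phonetics, and for demonstrating in person what it means to preserve a mother tongue, which renewed my attention to Min and Hakka.
I thank the senior members of the lab over these three years, who meant that there was always someone I could ask. I thank Shelly, who served as teaching assistant in my first year, when I was at my greenest, for solving all sorts of problems and offering constant encouragement. I thank Sally for her care and for always being the warmest support when it was needed. I thank Sarah for the many valuable suggestions she gave while I was preparing and writing the thesis, which allowed it to take shape. I thank 聖富 for taking care of countless chores and for accompanying us through the hardships of building the corpus.
I also thank the classmates who grew alongside me. I thank 翔宇 for always discussing language-related topics with me, for helping where my own abilities fell short, for putting up with my excessive venting, and for facing the challenges of the thesis process with me. I thank Ajax for great help with English, for being the best example of passion and method in language learning, and for helping me over many hurdles. I thank Amber for always being so gentle, considerate, and understanding, and for being the best moral support as we faced thesis deadlines at the same time. I thank Angus for the exchange of knowledge during our studies, which showed me how much of the world of language is still left for me to explore. I thank Crystal for bringing stress-relieving coloring activities during the most overwhelming periods, which made for a very soothing research environment. I thank Emily for always bringing cheer and for generously sharing practical writing tips and reference books, which truly helped. I thank Jenny for her many kind reminders; her spontaneous expressions of concern always warmed me. I thank Shannon for always seeing things with ease and humor; our chats relieved a great deal of my stress. I thank Taco and Yvonne for keeping me company through countless long nights, for always serving as my best Min and Mandarin informants, and for talking with me about current events, the past, and the future; even though we work in different places, I know you are always there. I thank 馨妍 for our occasional rambling conversations, light and deep alike, which let me relax and recharge, and for giving me a space to use Hakka freely; without doubt the best conversation partner I could have.
Finally, I thank my family for their wholehearted support throughout my nineteen years of schooling, without a word of complaint, so that whenever I came home tired from my studies I could rest with peace of mind and return to work full of energy.
There are far too many people and things left to thank. To everyone who appeared around me and interacted with me: for all the help you brought, my heartfelt thanks.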
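ABSTRACT (CHINESE)
This study investigates the behavior of syllable-final nasal mergers in Taiwan Mandarin. Research over the past decades has established that the mergers differ by region, but their origins remain unsettled. This study therefore includes speakers of different generations to examine the question of origin in greater depth. In addition, the use of spontaneous speech makes it possible to examine the influence of prosodic factors. Thirty-two male and female speakers from two generations, all born in Taipei or Kaohsiung, each contributed about 30 minutes of speech. Five factors were examined: three social factors (gender, age, and region) and two prosodic factors (prosodic prominence and prosodic boundary). The results show that young northerners use /in/ → [iŋ] most often, whereas southerners, especially older southern women, make heavy use of /iŋ/ → [in]; the two post-/i/ merger rules tend to compete with each other. Speakers of all social backgrounds use /əŋ/ → [ən] widely. As for the prosodic factors, prosodic boundary has an overall strengthening effect on all three merger rules, while the effect of prosodic prominence is more complex: it strengthens /in/ → [iŋ] across all social groups and /iŋ/ → [in] among young speakers, but tends to inhibit /əŋ/ → [ən] among young speakers. The results further suggest that the three rules differ in status: /in/ → [iŋ] is an innovation originating in Taipei, while its competitor /iŋ/ → [in] stems from negative transfer from Min. The origin of /əŋ/ → [ən] remains uncertain, but it may be the most widespread because it faces no competing merger rule. Prosodic boundary strengthens all of the mergers, whereas the effect of prosodic prominence may be related to how far each rule has progressed, its social connotation, and the markedness of the nasals.
Keywords: syllable-final nasal mergers, prosodic prominence, prosodic boundary, spontaneous speech, Taiwan Mandarin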
ABSTRACT
The present study investigated the behavior of syllable-final nasal mergers in Taiwan Mandarin. Decades of research have identified regional differences in the application of the mergers, but their origins are still under debate. In this study, generational difference was examined in order to explore this issue in greater depth. The use of spontaneous speech also allowed the effect of prosodic factors to be examined. Thirty-two speakers of both genders, from two generations and two regions (Taipei and Kaohsiung), were recruited, and each contributed around 30 minutes of speech. The effects of five factors were observed: three social factors (gender, age, and region) and two linguistic factors (prosodic prominence and prosodic boundary). Results showed that the /in/ → [iŋ] merger was led by the young northerners, whereas the /iŋ/ → [in] merger was dominated by the southerners, especially the old southern females. The two post-/i/ nasal mergers were generally in competition with each other. The /əŋ/ → [ən] merger, on the other hand, was widely applied by speakers of all social groups. As for the linguistic factors, prosodic boundary had an overall strengthening effect on all three mergers, while prosodic prominence had a more complicated effect: an overall strengthening effect on the /in/ → [iŋ] merger in all social groups and on the /iŋ/ → [in] merger in the young groups, and a restraining trend on the /əŋ/ → [ən] merger in the young groups. Our results further suggested that the three mergers differ in status: the /in/ → [iŋ] merger was an innovation of Taipei origin, while its competitor, the /iŋ/ → [in] merger, was a negative transfer from Min. The origin of the /əŋ/ → [ən] merger was still unclear, but its lack of competition perhaps made it the most frequently used merger. Although prosodic boundary enhanced all the mergers, the effect of prosodic prominence seemed to interact with rule progression, rule connotation, and the markedness of nasals.
Keywords: syllable-final nasal mergers, prosodic prominence, prosodic boundary, spontaneous speech, Taiwan Mandarin
TABLE OF CONTENTS
Acknowledgements
Abstract (Chinese)
Abstract
1 Introduction
1.1 Linguistic background
1.2 Motivation
1.3 Aims of study
1.4 Significance
1.5 Organization
2 Literature Review
2.1 Previous works regarding syllable-final nasal mergers
2.1.1 Debate on merger types
2.1.2 Evidence for regional difference
2.1.3 Possible generational difference
2.2 The effect of prosody on the application of phonological rules
2.2.1 Prosodic prominence
2.2.2 Prosodic boundary
3 Methods
3.1 Data collection of corpus
3.2 Background of speakers
3.3 Labeling of target words
3.3.1 Labeling of realization
3.3.2 Labeling of intention
3.4 Prosodic labeling
3.4.1 Stress labeling
3.4.2 Break labeling
4 Results
4.1 Overall distribution
4.1.1 Examples of discarded tokens
4.1.2 Data for analyses
4.1.3 Analyses on tokens of nasalized vowels
4.2 Effects of social factors on nasal mergers
4.2.1 Individual difference
4.2.2 Nasal merging rates of the speakers who applied mergers
4.3 Effects of stress on nasal mergers
4.4 Effects of break on nasal mergers
5 Discussion
5.1 The realizations of syllable-final nasals in spontaneous speech
5.2 Effect of social factors on nasal mergers
5.2.1 The /in/ → [iŋ] merger
5.2.2 The /iŋ/ → [in] merger
5.2.3 The /əŋ/ → [ən] merger
5.2.4 The status and possible origins of the mergers
5.3 Effect of linguistic factors on nasal mergers
5.3.1 The role of stress on nasal mergers
5.3.2 The role of break on nasal mergers
6 Conclusion
References
Appendix I
Appendix II
LIST OF FIGURES
Figure 1.1 The distribution of the self-identified ethnic groups of Taiwan residents surveyed in 2013. Data were from the 2013TSCS survey (Fu et al., 2014) with 1952 valid samples
Figure 1.2 The most often used language at home. Data were from the 2013TSCS survey (Fu et al., 2014). There were 2016 valid samples in the year of 2003 and 1952 valid samples in the year of 2013. Mix usage of Chinese languages stands for bilingual usage of Min and Mandarin (19.5%), or Hakka and Mandarin (1.5%), or trilingual usage of Min, Hakka and Mandarin (0.8%) in the year of 2013
Figure 3.1 An example of nasal realization (tier 2) and intention (tier 3) labeling
Figure 3.2 An example of stress labeling
Figure 3.3 An example of break indices labeling
Figure 4.1 An example of truncation
Figure 4.2 An example of hesitation
Figure 4.3 An example of syllabic nasals
Figure 4.4 An example of creaky voice
Figure 4.5 An example of breathy voice
Figure 4.6 An example of nasal loss
Figure 4.7 Nasalization rates of targets
Figure 4.8 Application rates of nasal mergers divided by each speaker
Figure 4.9 Speech error rates of references divided by each speaker
Figure 4.10 Application rates of nasal mergers divided by different social factors
Figure 4.11 Application rates of nasal mergers divided by different social factors and stress levels
Figure 4.12 Application rates of nasal mergers divided by different social factors and break indices
LIST OF TABLES
Table 2.1 Summary of debates on merging direction
Table 2.2 Levels of stress in Pan-Mandarin ToBI (Peng et al., 2005)
Table 2.3 Modified criteria for stress in Pan-Mandarin ToBI (Chuang, 2009)
Table 2.4 Levels of break indices in Pan-Mandarin ToBI (Peng et al., 2005)
Table 3.1 The number of subjects in different social conditions
Table 3.2 Modified criteria for levels of prosodic boundary
Table 4.1 Distribution of targets and references
Table 4.2 Distribution of tokens subdivided by different reasons of discarding
Table 4.3 Distribution of valid tokens
Table 4.4 Statistics of step-wise binary logistic regression of nasalization rates
Table 4.5 Subgrouping of VN tokens
Table 4.6 Token distributions according to the categories in Table 4.5
Table 4.7 The number of speakers that applied mergers
Table 4.8 Statistics of step-wise binary logistic regression (social factors)
Table 4.9 Crosstabs of numbers of targets divided by stress and break
Table 4.10 Statistics of step-wise binary logistic regression (stress)
Table 4.11 Statistics of step-wise binary logistic regression (break)
CHAPTER 1 INTRODUCTION
1.1 Linguistic background
Taiwan is a multilingual society whose main languages are Mandarin, Min¹, Hakka, and more than ten indigenous languages. These languages entered Taiwan in different periods². Unlike the aboriginal languages, which have been present in Taiwan for thousands of years, Min and Hakka arrived only a few hundred years ago. Owing to the demographic advantage of the Min population, Min had long been the lingua franca in Taiwan before the government imposed a Mandarin-only policy in 1956 (Huang, 1993). Mandarin, in contrast, is a relative newcomer to Taiwan. Despite its late arrival, Mandarin has rapidly become the most commonly used language in formal situations due to the government’s mandatory promotion (Ang, 1997; Ang, 2002).
Nowadays, most people living in Taiwan can speak Mandarin, with varying degrees of accent from their substrate mother tongues when those differ from Mandarin. According to the “Taiwan Social Change Survey: Year 4 of Round 6 (2013TSCS)” (Fu et al., 2014), sponsored by the Ministry of Science and Technology, Taiwan, Republic of China, Min is the largest ethnic group (76.2%) in Taiwan based on residents’ self-identity, followed by Hakka (9.2%), Mainlanders (6.6%) and Austro-Polynesian aborigines (1.1%).³ Figure 1.1 shows the ethnic composition of Taiwan residents surveyed in 2013.
1 Min languages consist of a large group of different dialects, such as Northern Min, Southern Min, etc.
Since Taiwan Southern Min is the most popular Min language in Taiwan, it is abbreviated as Min in the current study.
2 The ancestors of the indigenous languages were reported to immigrate into Taiwan around 15,000 years ago, before the end of the last ice age (M.-l. Lin, 2010).
3 The terminologies for ethnic identities in the 2013TSCS survey were ‘Taiwan Southern Min’, ‘Taiwan Hakka’, ‘Taiwan Aborigines’, ‘people from different provinces of Mainland (China)’, ‘Taiwan Mainlanders’, ‘people from Kinmen or Matsu Islands’, ‘people from Southeast Asia countries’, and others. Here we combined ‘people from different provinces of Mainland (China)’ and ‘Taiwan Mainlanders’ into a single group, ‘Mainlanders’.
Figure 1.1 The distribution of the self-identified ethnic groups of Taiwan residents surveyed in 2013. Data were from the 2013TSCS survey (Fu et al., 2014) with 1952 valid samples. (Legend: Min, Hakka, Mainlander, Aborigine, Others.)
Given that the Min ethnic group dominates the population in Taiwan, it is not surprising that Min is used much more frequently than other languages. Based on the 2013TSCS data (Fu et al., 2014) (see Figure 1.2), Min was still the language most often used at home (44.2%) in 2013, ten years after a similar survey conducted in 2003. As a result, Min possibly contributes the largest substrate influence to the Mandarin spoken in Taiwan. In fact, it has long been reported that Min has a profound influence on the phonology of Taiwan Mandarin. Ing’s (1985) report, for example, described various divergences of Taiwan Mandarin from the Standard Mandarin taught in school, in onsets, rhymes, and tones, that are due to Min and to different dialects or accents of Mandarin.
However, the accents of Mandarin spoken in Taiwan are not homogeneous across the island, owing to different levels of Min fluency. Phonological systems of Min and Mandarin are competing with each other
for bilingual speakers, depending on complex factors such as frequency of language use, language competence, and age of onset of learning. As Figure 1.2 shows, Mandarin had the largest growth in domestic usage over the ten years (from 23.4% to 31.4%), implying more common use of Mandarin among the younger generation. The domestic usage of Min seems to be relatively stable (from 45.3% to 44.2%), whereas that of Hakka and other languages decreased over the period.
Figure 1.2 The most often used language at home. Data were from the 2013TSCS survey (Fu et al., 2014). There were 2016 valid samples in the year of 2003 and 1952 valid samples in the year of 2013. Mix usage of Chinese languages stands for bilingual usage of Min and Mandarin (19.5%), or Hakka and Mandarin (1.5%), or trilingual usage of Min, Hakka and Mandarin (0.8%) in the year of 2013. (Horizontal axis: 2003, 2013; vertical axis: 0% to 100%; legend: Mandarin, Min, Hakka, Mix usage of Chinese languages, Others.)
Besides the aforementioned survey data, Ang (1992) also pointed out the change in the status of language usage in Taiwan over recent decades. Before 1945, Mandarin was hardly heard in Taiwan. Min dominated the other languages in society, while Hakka and the aboriginal languages were also frequently used in their own regions. Min was still the dominant language in most parts of Taiwan before the 1960s, but Mandarin gradually suppressed Min and other languages after half a century of rule by the Nationalist government.
Such a change of language usage in society due to governmental enforcement led to different Mandarin proficiency levels, and thus different accents, among different generations. The older generation speaks Mandarin with a stronger influence on pronunciation from the substrate languages. Such an influence among the younger generation is weaker since they do not use substrate languages as frequently as the older generation.
Apart from the aforementioned generation difference, the frequency of usage of languages also differs across regions. According to the “Census of population and housing 2010” (DGBAS, 2011), the usage of Mandarin decreases and that of Min increases as one goes further south in the western part of Taiwan. The average percentages of domestic usage of Mandarin for the northern, the central, and the southern regions are 91.8%, 78.5%, and 73.8%, respectively, while those of Min are 73.3%, 88.7%, and 91.9%, respectively. These findings all suggest that Taiwan is not a linguistically homogeneous society. Both age and region are possible independent variables that may shape one’s accent.
1.2 Motivation
The influence of Min on the phonology of Taiwan Mandarin is a topic with a long history. The first phenomenon that comes to mind may be deretroflexion, which originates from an obvious mismatch between the sibilants of the two languages. However, less obvious phenomena can also be found, such as syllable-final nasal mergers. Compared with the highly stigmatized deretroflexion in Taiwan Mandarin, the nasal mergers seem to receive a relatively neutral judgment from native speakers. Although Ing (1985) reported strongly negative connotations for both deretroflexion and the nasal mergers, other indirect evidence distinguishes the two in terms of social connotation. According to the perception test in Fon et al. (2011), the connotations attached to variant forms of syllable-final nasals differed widely across groups of people. Such inconsistent judgments may imply that the variants of syllable-final nasals are less stigmatized than the variants of retroflex sibilants, since some of the nasal variants have even received positive evaluations, whereas deretroflexion has never been regarded as a positive sound change. The nasal mergers therefore appear to be less conscious and more opaque sound changes in progress, and they merit careful investigation.
There are two syllable-final nasals in Mandarin, /n/ and /ŋ/, which can be preceded by the five vowels in the system (i.e., /i/, /ə/, /a/, /u/, and /y/). Nonetheless, only the first three vowels result in monophthongal realizations regardless of the following nasal (Duanmu, 2000). It has been reported that the two nasals tend to be neutralized under certain conditions, but disagreement about the direction of the sound changes remains. Most researchers agree that /əŋ/ tends to be realized as [ən]⁴ (Chen, 1991; Hsu, 2006; Hsu & Tse, 2007; C.-w. Hung, 2006; Kubler, 1985; C. C. Lin, 2002; Tse, 1992; J. H.-T. Yang, 2010; Yueh, 1992), but they disagree about the direction of the nasal merger after the vowel /i/. Some argued that /iŋ/ tends to be realized as [in] (C.-w. Hung, 2006; Kubler, 1985; Tse, 1992; J. H.-T. Yang, 2010; Yueh, 1992), while others suggested that /in/ tends to be realized as [iŋ] (Chen, 1991; Hsu, 2006; Hsu & Tse, 2007; C. C. Lin, 2002). The ‘merging directions’ of the nasals after the vowel /i/ reported by the two groups of researchers thus contradict each other. Such a contradiction implies that some factors may lead to different kinds of nasal mergers. Outside these two camps, one study proposed that /n/ and /ŋ/ are highly fluid after the vowels /i/, /ə/, and /a/ (Ing, 1985).
4 The vowel labialization of /əŋ/ (/əŋ/ → [ɔŋ]) after labial onsets (/p/, /pʰ/, /m/, or /f/) in Taiwan Mandarin was also found in previous studies (e.g., Ing, 1985; Kubler, 1985). It has been reported to frequently block the nasal merger /əŋ/ → [ən] (e.g., J. H.-T. Yang, 2010). This phenomenon was also found in the current study. However, because only a limited number of tokens began with labial onsets, the issue is not pursued here.
The divergent proposals of the merging directions mentioned above may be attributed to several possible reasons. First of all, previous studies differ in their research materials and experimental designs. Ing (1985) and Kubler (1985) primarily based their studies on their own observations of language use in society, while most of the other early studies used read speech of controlled tokens to discuss the nasal mergers (e.g., Fon et al., 2011; Hsu & Tse, 2007; C. C. Lin, 2002; J. H.-T. Yang, 2010). In contrast, only Su (2012) used spontaneous speech from interviews, which provides a more natural and realistic speech flow. As for the target syllables under investigation, some researchers did not include the full set of possible syllables with both kinds of nasal endings (e.g., C.-w. Hung, 2006; Su, 2012), and thus the problem was not fully explored in their studies. This study therefore combines the strengths of the previous approaches, using spontaneous speech to investigate all possible syllable combinations.
Besides, the linguistic backgrounds and places of origin of the speakers were not controlled, or were only partially controlled, in most of the earlier studies. Since Fon et al. (2011) pointed out the existence of a regional split in the merging directions for /in/ and /iŋ/, many of the earlier studies have become difficult to interpret with regard to regional differences. Only studies that recruited speakers from the same region (Hsu, 2006; Hsu & Tse, 2007; C.-w. Hung, 2006) can be used to support the regional split without trouble. Studies without careful control of speakers’ origins are more difficult to interpret.
In addition, the factors examined were very diverse. Most studies were sociolinguistic and focused on extra-linguistic variables such as gender, age, and social class. Few paid attention to the linguistic factors carried in spontaneous speech. Studies have shown that prosodic factors such as prosodic prominence may influence the realization of phonemes (e.g., Chuang, 2009; Duanmu, 2000; J. Wang, 1996). The effect of such linguistic factors on the nasal mergers has yet to be considered.
Given the aforementioned reasons, we would like to investigate the nasal mergers more thoroughly, by examining the full set of syllables with several linguistic and social factors.
1.3 Aims of study
In order to understand the possible mechanisms and regular patterns behind the nasal mergers, we draw on previous works and attempt to explore the topic from a different perspective. Specifically, the study has three research goals.
The first goal is to investigate the realization of syllable-final nasals in Mandarin more systematically, by using all possible tokens containing the three monophthongal vowels paired with the two nasal codas (i.e., /in/, /iŋ/, /ən/, /əŋ/, /an/, and /aŋ/) in a spontaneous speech corpus. Most past studies adopted read speech of carefully designed tokens, which may depart from the reality of language use. In spontaneous speech, the frequencies of content and function words are closer to their composition in natural language, since no artificial adjustment is made in token selection. At the same time, the extra attention that intentionally elicited read speech draws to pronunciation is eliminated, making the pronunciation more natural. Although Su (2012) also used spontaneous speech to investigate the issue, only tokens of /iŋ/ and /əŋ/ were included in her study. A more thorough investigation of spontaneous speech with all possible vowel-nasal combinations is needed.
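As a minimal sketch of what examining the full set of vowel-nasal combinations involves, the following Python snippet enumerates the six target rhymes and labels each token by comparing the intended rhyme with the realized one. The function names, the token format, and the toy tokens are hypothetical and serve illustration only; they are not taken from the corpus or from the labeling procedure of this study.

```python
from collections import Counter

# The six target rhymes: three monophthongal vowels x two nasal codas.
VOWELS = ("i", "ə", "a")
CODAS = ("n", "ŋ")
RHYMES = [v + c for v in VOWELS for c in CODAS]


def classify(intended: str, realized: str) -> str:
    """Label a token as 'faithful', a merger such as '/in/ → [iŋ]', or 'other'."""
    if intended not in RHYMES:
        raise ValueError(f"not a target rhyme: {intended}")
    if realized == intended:
        return "faithful"
    # A merger keeps the vowel and swaps the coda for the other nasal.
    if realized in RHYMES and realized[0] == intended[0]:
        return f"/{intended}/ → [{realized}]"
    return "other"


def application_rates(tokens):
    """Return {merger label: merged tokens / all tokens of that intended rhyme}."""
    totals = Counter(intended for intended, _ in tokens)
    merged = Counter()
    for intended, realized in tokens:
        label = classify(intended, realized)
        if label not in ("faithful", "other"):
            merged[(intended, label)] += 1
    return {label: n / totals[intended] for (intended, label), n in merged.items()}


# Toy (intended, realized) pairs, invented purely for illustration.
toy_tokens = [
    ("in", "iŋ"), ("in", "in"),
    ("iŋ", "in"), ("iŋ", "iŋ"), ("iŋ", "iŋ"), ("iŋ", "iŋ"),
    ("əŋ", "ən"), ("an", "an"),
]
rates = application_rates(toy_tokens)
```

Covering both codas for each vowel is what allows mergers in either direction (for example, /in/ → [iŋ] as well as /iŋ/ → [in]) to be counted from the same data.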
The second research goal is to examine social factors more completely by using corpus data with stricter control over speakers’ backgrounds. Although Su (2012), Hsu & Tse (2007), and Chen (1991) also controlled social factors, their factors were limited to either region (Su, 2012) or age (Hsu & Tse, 2007; Chen, 1991), and none of them used both. This study thus included speakers from distinct regions and of different age groups to specify the role of Min in the nasal mergers, as both factors covary with Min and Mandarin proficiency. As mentioned earlier, Taipei residents and younger speakers use Mandarin more in their daily life than southerners and the elderly do. The influence of Min may therefore be hidden behind these social factors. Gender is also an important social factor that cannot be discarded, since we cannot guarantee that rule application is uniform across genders. With the combination of the three social variables, we are able to understand the nasal mergers to a greater extent.
The last goal of our study is to include linguistic factors that have not been examined in any of the previous works on the nasal mergers. Prosodic prominence has been incorporated in examining the realization of different phonemes in Mandarin (e.g., Chuang, 2009; Duanmu, 2000; J. Wang, 1996) and in other languages (e.g., Beckman & Edwards, 1994; Cole et al., 2007). Prosodic boundaries have also been reported to have strengthening effects, such as in durational cues (Horne et al., 1995; Krivokapić, 2007) or phrase-initial strengthening (Cho, 2003; Kraehenmann & Lahiri, 2008). Previous studies of prosodic factors mainly focused on phonetic-level realizations. As a first attempt, this study examines how prosodic prominence and prosodic boundaries modify the usage of nasal variants, which is closer to the application of phonological rules.
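To make the question concrete, a descriptive first step could be to tabulate how often a merger applies at each level of a prosodic label. The Python sketch below does this for toy tokens; the field names (`merged`, `stress`, `break`), the integer levels, and the data are assumptions made for illustration, and the sketch is not the statistical procedure used in this study.

```python
from collections import defaultdict


def rate_by_level(tokens, factor):
    """Return {level: share of merged tokens} for one prosodic factor.

    Each token is a dict with a boolean 'merged' field and an integer label
    under `factor` ('stress' or 'break' in this sketch).
    """
    counts = defaultdict(lambda: [0, 0])  # level -> [merged, total]
    for tok in tokens:
        cell = counts[tok[factor]]
        cell[0] += int(tok["merged"])
        cell[1] += 1
    return {level: merged / total for level, (merged, total) in sorted(counts.items())}


# Toy tokens, invented purely for illustration.
toy = [
    {"merged": True, "stress": 2, "break": 3},
    {"merged": False, "stress": 0, "break": 1},
    {"merged": True, "stress": 2, "break": 1},
    {"merged": False, "stress": 2, "break": 3},
]
by_stress = rate_by_level(toy, "stress")
by_break = rate_by_level(toy, "break")
```

Tabulating the two factors separately, as here, says nothing about how they interact with each other or with the social factors; a regression model would be needed for that.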
1.4 Significance
Building on the foundation laid by previous research on nasal mergers, this study takes the topic one step further by examining the full set of vowel-nasal combinations in which the main vowel remains monophthongal, together with their association with social factors, using spontaneous speech. It is hoped that a more thorough examination of more natural speech will yield a more general picture.
This study incorporated two prosodic factors to investigate a sound change with relatively neutral connotations compared with deretroflexion. Because speakers are less self-conscious about their pronunciation of the nasals, the effect of linguistic factors on the realization of nasal mergers can be studied more easily. By using prosodic prominence and prosodic boundaries at the same time, this study may help refine existing prosodic theory by showing how the two dimensions function differently on the same phonological rule.
1.5 Organization
The remaining chapters are organized as follows. Chapter 2 discusses the literature on syllable-final nasal mergers in Taiwan Mandarin and relevant prosodic issues. Chapter 3 introduces the research methods, from data collection to data processing. Chapter 4 presents the results and statistical analyses. Chapter 5 and Chapter 6 provide the discussion and the conclusion, respectively.
CHAPTER 2 LITERATURE REVIEW
This chapter provides an overview of relevant studies. Section 2.1 reviews studies on syllable-final nasal mergers in Taiwan Mandarin. Discrepancies among researchers in terms of merging directions are discussed together with social factors. Section 2.2 reviews studies on prosodic factors, covering prosodic prominence and prosodic phrasing, both of which are the focus of the current study.
2.1 Previous works regarding syllable-final nasal mergers
As early as 1985, Ing observed the instability and interchangeability of the syllable-final nasals in Taiwan Mandarin (i.e., /n/ and /ŋ/ might each be pronounced as either [n] or [ŋ] when preceded by the vowel /i/, /ə/ or /a/), and attributed the mispronunciation to the effect of Min and a number of Chinese dialects on Mandarin. In the same year, Kubler reported the replacement of [iŋ] and [əŋ] by [in] and [ən], respectively, in Taiwan Mandarin due to the lack of [iŋ] and [əŋ] in Min⁵. In spite of their inconsistent observations on the merging behavior of the syllable-final nasals, the two studies ascribed their observations primarily to Min influence and triggered the subsequent wave of research on the topic.
2.1.1 Debate on merger types
Studies after Ing (1985) and Kubler (1985) further investigated the nasal mergers with elicited experimental data, but they did not reach an agreement on the merger direction. After Ing (1985), other studies no longer observed the nasal mergers after the vowel /a/. While Tse (1992) found that /ŋ/ to [n] mergers dominated over /n/ to [ŋ] mergers and agreed with Kubler’s (1985) observation, Chen (1991) and C. C. Lin (2002), by contrast, reported the opposite merging direction, /n/ to [ŋ], when the nasal followed the vowel /i/.
5 In Kubler (1985), Min was reported to have only the following vowel finals ending in [ŋ]: [iəŋ], [ɑŋ], [iɑŋ], [ɔŋ] and [iɔŋ]. According to Yuan et al. (2001), there are also five rhymes ending in /ŋ/, namely, /iŋ/, /aŋ/, /iaŋ/, /ɔŋ/ and /iɔŋ/. Although there is a discrepancy between Kubler (1985) and Yuan et al. (2001), the symbols in Yuan et al. (2001) are phonemic labels, and fieldwork records (e.g., Li, 2009, p.112 & p.120; Yuan et al., 2001, p.243) report a transitional /ᵊ/ between the /i/ and /ŋ/ of the /iŋ/ rhyme in the Chôan, Chiang, and Ē-bn̂g dialects alike.
Later on, studies focusing on social factors also appeared. Yueh (1992) was the first to examine a number of social factors, such as location, gender, and age, but the results showed no significant effect of any of them. C.-w. Hung’s (2006) study of Kaohsiung residents also focused on social factors, including gender, age, social class, ethnicity, and context (level of formality). The results indicated that age, social class, and context had a significant influence on the variation of /ŋ/. Compared with the younger speakers (16 to 30 years old) and the older speakers (above 51 years old), those aged between 31 and 50 produced the prestigious form [ŋ] most often. Higher social class and more formal styles and contexts inhibited the mergers. Gender was not a determining factor, since male speakers, senior female subjects, and lower-middle class subjects all produced many [n] variants. J. H.-T. Yang (2010) compared the nasal mergers of Taiwan Mandarin and Mainland Mandarin and reported a difference between the two. Speakers from Taiwan seemed to show a high level of homogeneity, and no regional difference was found among his subjects from Taiwan. The /ŋ/ to [n] merger was found to lead in Taiwan, whereas the /n/ to [ŋ] merger was reported to lead in China.
Yueh (1992), C.-w. Hung (2006) and J. H.-T. Yang (2010) all found the nasal mergers after the vowels /i/ and /ə/ to run in the /ŋ/ to [n] direction, which again supported Kubler’s view. However, Hsu (Hsu, 2006; Hsu & Tse, 2007) found /in/ to [iŋ] and /əŋ/ to [ən] mergers, which provided another counterexample to Kubler, although none of the social factors (age, gender, and ethnicity) in their work was statistically significant. The debate on the merging directions of syllable-final nasals can be summarized as follows: most researchers agreed on the merging direction after the vowel /ə/, i.e., /əŋ/ to [ən], but disagreed on the merging direction after the vowel /i/ (Table 2.1).
Table 2.1 Summary of debates on merging direction.
Merger types                                    Studies
----------------------------------------------  --------------------------------------------------
/n/ → [ŋ], /ŋ/ → [n] (interchangeable)          Ing (1985)
/iŋ/ → [in], /əŋ/ → [ən] (same direction)       Kubler (1985); Tse (1992); Yueh (1992); C.-w. Hung (2006); J. H.-T. Yang (2010)
/in/ → [iŋ], /əŋ/ → [ən] (opposite direction)   Chen (1991); C. C. Lin (2002); Hsu (2006); Hsu & Tse (2007); R. J.-m. Hung (2007)
2.1.2 Evidence for regional difference
According to the works reviewed above (see Table 2.1), except for Ing (1985), there seemed to be two camps of researchers, who agreed on the merging direction after /ə/ but disagreed on the direction after /i/. One possible explanation for the discrepancy lies in regional variation, as subjects from different populations were recruited for these studies. Yueh (1992) was the first study to survey regional difference. J. H.-T. Yang (2010) also recruited speakers from different parts of Taiwan. Unfortunately, neither study found regional variation. However, later studies such as R. J.-m. Hung (2007) and Fon et al. (2011) pointed out that the contradictory results in merging direction were very likely caused by regional and methodological differences. Fon et al.’s (2011) experiment recruited subjects from Taipei and southwestern Taiwan and found /ŋ/ to [n] mergers in both regions and an additional /in/ to [iŋ] merger in Taipei.
Returning to the discrepancies, the studies that reported the /in/ to [iŋ] merger were Chen (1991), C. C. Lin (2002), and Hsu (Hsu, 2006; Hsu & Tse, 2007), all of which used speakers from Taipei as their subjects. This pattern is consistent with Fon et al.’s observation that the /in/ to [iŋ] merger was found only in Taipei. All the other studies, except for Ing (1985), reported only /ŋ/ to [n] mergers (C.-w. Hung, 2006; Tse, 1992; J. H.-T. Yang, 2010; Yueh, 1992). Although Tse (1992), J. H.-T. Yang (2010), and Yueh (1992) enlisted Taipei residents as part of their subjects, their results did not contradict Fon et al. (2011) either. The regional difference between the north and the south was thus largely established.
Su (2012) further examined the social and contextual factors influencing the variation of /ŋ/, using data from 35 sociolinguistic interviews among college students in Taipei and Tainan. As the first attempt to use spontaneous speech, Su replicated the regional difference from a different angle. Although the realization of /n/ was not discussed, a regional split and a gender difference could be seen in the merging rate of /iŋ/: southerners and male speakers used the variant form [in] more frequently than their counterparts. To fill the gap left by Su’s study and to replicate Fon et al.’s finding, this study used a full set of vowel-nasal combinations in which the main vowel is realized as a monophthong, so that nasal mergers in the opposite direction could also be covered. Although the effect of region is not the main focus of the current study, this factor is still included since regional differences have been well established by recent studies (Fon et al., 2011; Su, 2012).
2.1.3 Possible generational difference
Previous studies gave different explanations for the existence of nasal mergers. Some argued that they were due to negative transfer from Min (Ing, 1985; Kubler, 1985); others claimed that the mergers were neutral innovations by young speakers due to assimilation (C. C. Lin, 2002; Tse, 1992; Yueh, 1992); still others pointed out the possibility of a natural sound change inspired by historical rhyme books or dictionaries (Chen, 1991; Hsu & Tse, 2007). Ing (1985) and Kubler (1985) attributed the nasal mergers to the influence of Min, since it is the most influential substrate language in Taiwan. Tse (1992) and Yueh (1992) claimed that the /ŋ/ to [n] mergers were an innovation favored by the younger generation, because most younger speakers acquired Mandarin as their first language, and both Mandarin monolingual and Mandarin-Min bilingual speakers showed a similar trend, suggesting no obvious effect of Min transfer. The rules could thus best be described as frontness assimilation, as both /i/ and /ə/, as well as /n/, are produced in the front half of the vocal tract. C. C. Lin (2002) also regarded the mergers as an innovation, but his explanation was somewhat different because of the opposite direction of the /in/ → [iŋ] merger. Since both [ŋ] and [i] can be regarded as [+high], and both [n] and [ə] as [–high], he characterized the mergers as an [αhigh] assimilation.
These studies also reported different degrees of progression for different merging rules regardless of merging direction, implying a possible generational difference. The /ŋ/ to [n] mergers were found to range from the burgeoning stage (17-28% in Tse, 1992), to mergers in progress (32-43% in C.-w. Hung, 2006), to changes almost complete (95-97% in J. H.-T. Yang, 2010). As for the /in/ → [iŋ] and /əŋ/ → [ən] mergers, Chen (1991) claimed that the merging process was more advanced for /in/ → [iŋ] than for /əŋ/ → [ən] among the three age groups in his study, while Hsu (Hsu, 2006; Hsu & Tse, 2007) reported the opposite. Hsu’s older speakers showed lower merging rates for /in/ → [iŋ] than for /əŋ/ → [ən], while no such difference was found for her younger speakers regardless of gender and language background. As a result, Hsu argued that /əŋ/ to [ən] was the leading merger instead. Owing to the divergent findings in previous reports, this study aimed to investigate potential generational differences in more detail.
2.2 The effect of prosody on the application of phonological rules
Prosody, including stress, rhythm, and intonation, has been found to serve crucial functions in language use. For example, different stress patterns can convey different meanings or emphasize different elements; rhythm can help newborn babies chunk or segment sentences and words (e.g., Christophe et al., 2014; Wellmann et al., 2012); and sentence intonation can express emotions or distinguish statements from questions (e.g., Eady & Cooper, 1986; Ma et al., 2006; van Heuven & Haan, 2000). Regardless of the debate on the linguistic or paralinguistic nature of prosody, researchers (e.g., Ladd, 2008) have been dedicated to constructing a reasonable and systematic framework for analyzing prosody. The following sections review the functions of prosodic prominence and boundary, and their accompanying influence on phonetic realization.
2.2.1 Prosodic prominence
The function of prominence has been widely explored. In English, for example, stress, as the realization of prominence, can be used to make lexical contrasts (e.g., WHITE house vs. white HOUSE) or to mark narrow focus (e.g., TOM loves Mary vs. Tom loves MARY). Stress has also been reported to influence the phonetic realization of words, such as vowel space or consonant gesture. Vowels are pronounced with greater gestural effort in stressed positions (e.g., Beckman & Edwards, 1994), and consonants retain greater distinctions in some dimensions under stronger stress (e.g., Cole et al., 2007). In contrast, unstressed vowels and consonants are more centralized and lose their original distinctive phonetic features (e.g., van Bergem, 1993; van Son & Pols, 1999).
As for stress in Mandarin, Chao (1968) suggested a three-level distinction, i.e., contrastive stress, normal stress, and weak stress. Contrastive stress occurs on words in a contrastive context and is realized with a wider pitch range and longer duration. Weak stress is associated with neutral tone syllables, most of which are grammatical function words, such as –de (possessive marker) or –zi (noun suffix). The name “neutral” is given because the original tonal range is “flattened to practically zero”. Compared to syllables with contrastive stress, neutral tone syllables are flatter in pitch range and shorter in duration. Finally, any syllable that bears neither contrastive stress nor weak stress bears normal stress.
As proposed by Chao (1968), the most relevant phonetic correlates of stress in Mandarin are pitch range and duration, with loudness being less relevant. This idea was further supported by Jin’s (1996) acoustic study, which reported pitch to be even more closely related to stress than duration. Following Chao, the Pan-Mandarin ToBI system developed by Peng et al. (2005) provided a systematic tool for stress labeling. It distinguishes four levels of stress, S0 to S3, whose definitions are shown in Table 2.2.
Table 2.2 Levels of stress in Pan-Mandarin ToBI (Peng et al., 2005).
Stress Description
S3 Syllables with a fully realized lexical tone.
S2 Syllables with substantial tone reduction.
S1 Syllables that have lost their lexical tonal specification.
S0 Syllables with a lexically-defined neutral tone.
Peng et al. divided Chao’s weak stress into two levels (S0 and S1), depending on whether the neutral tone of a syllable is lexically specified. Note that tonal realization was taken as the sole criterion for labeling stress levels. However, Shen’s (1993) perceptual experiment found that listeners were able to recognize stress locations even without F0 information, suggesting that none of the cues (pitch, duration, and loudness) was indispensable. Taking a step further, Chuang (2009) refined the criteria for stress labeling in her study with cues from multiple dimensions (see Table 2.3).
Table 2.3 Modified criteria for stress in Pan-Mandarin ToBI (Chuang, 2009).
Stress  Tone                          Amp.     Duration    Segmental information
S3      tone expanded/raised          loud     lengthened  target accurately reached
S2      default                       default  default     default
S1      loss of original tonal shape  soft     shortened   target neutralized
S0      lexically neutral tone        soft     shortened   target neutralized
2.2.2 Prosodic boundary
Prosody also functions as an aid for chunking, breaking a stream of speech into smaller prosodic phrases. Because of some similarities, such phrases are often confused with syntactic phrases. Some studies proposed that prosodic phrases and boundaries should be defined by syntactic projection and juncture (Hayes, 1989; Selkirk, 1986); however, Ladd (2008) regarded this proposal as a hypothesis rather than an established definition. Despite the high correlation between prosodic and syntactic phrasing, prosodic trees differ from syntactic ones in that they should be non-recursive (Halliday, 1966; Huddleston, 1965; Matthews, 1966; Nespor & Vogel, 1986; Selkirk, 1984), which entails the fixed-depth property of prosodic trees (Ladd, 2008).
Although the theory of prosody is still under construction, prosodic boundaries have been shown to be associated with changes in the acoustic signal, such as articulatory strengthening (Fougeron & Keating, 1997) and declination reset (Ladd, 1988). In order to quantify prosodic boundaries for further exploration, studies have proposed an impressionistic way of labeling different strengths of word and phrase boundaries, termed “break indices”. This method became part of the major prosodic labeling system known as ToBI (Beckman et al., 2005; Brugos et al., 2006; Pitrelli et al., 1994; Silverman et al., 1992).
In Pan-Mandarin ToBI (Peng et al., 2005), there are six levels of break indices, B0 to B5. Their definitions are shown in Table 2.4. Note that Peng et al. also mentioned the difficulty of distinguishing B4 from B5 and B2 from B3 because of the subtle differences in their definitions. Even trained labelers showed a low rate of consistency for these two pairs of tags. Since both B4 and B5 refer to a breath group boundary, and both B3 and B2 refer to a phrase boundary, it is not surprising that the distinction, which mainly depends on the length or existence of pauses, is hard to make.
Table 2.4 Levels of break indices in Pan-Mandarin ToBI (Peng et al., 2005).
BI Definition
B5 Prosodic group boundary: a breath group boundary accompanied by a prolonged pause.
B4 Breath group boundary: reset of pitch between sentences or phrases.
B3 Major phrase boundary.
B2 Minor phrase boundary: must be followed by at least S2.
B1 Regular syllable boundary: the ‘default’ case within a polysyllabic word.
B0 Reduced syllable boundary, i.e., contraction: require S0 or S1 on left or right.
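Two of the definitions in Table 2.4 tie break indices to stress levels (B2 "must be followed by at least S2"; B0 "require S0 or S1 on left or right"). The following is a minimal sketch of how such co-occurrence constraints could be checked on a labeled utterance. The function name, the list-based encoding (one break index between each pair of adjacent syllables), and the reading of "left or right" as "at least one neighboring syllable" are assumptions made for illustration; they are not part of Peng et al.’s (2005) system or of the labeling procedure used in this study.

```python
def check_break_constraints(stresses, breaks):
    """Check the stress/break co-occurrence constraints stated in Table 2.4.

    stresses: stress level (0-3) of each syllable, in order.
    breaks:   break index (0-5) between syllable i and syllable i + 1,
              so len(breaks) == len(stresses) - 1 (hypothetical encoding).
    Returns a list of (position, message) pairs, one per violation.
    """
    if len(breaks) != len(stresses) - 1:
        raise ValueError("expected one break index between each syllable pair")
    problems = []
    for i, b in enumerate(breaks):
        left, right = stresses[i], stresses[i + 1]
        # B2: the following syllable must be at least S2.
        if b == 2 and right < 2:
            problems.append((i, "B2 must be followed by at least S2"))
        # B0: at least one neighboring syllable must be S0 or S1.
        if b == 0 and min(left, right) > 1:
            problems.append((i, "B0 requires S0 or S1 on left or right"))
    return problems
```

For instance, `check_break_constraints([3, 1], [2])` reports a violation, because the syllable after the B2 break is only S1.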
Most studies focused on the effect of prosodic boundaries on durational cues (e.g., Horne et al., 1995; Krivokapić, 2007; S.-F. Wang, 2013), and fewer investigated their influence on the phonetic realization of vowels or consonants. Some studies have shown a strengthening effect of prosodic boundaries, especially phrase-initial strengthening (e.g., Cho, 2003; Kraehenmann & Lahiri, 2008). Although their findings mainly concern subtle phonetic differences in VOT rather than phonological changes of phonemes, they still shed light on the possible influence of prosodic boundaries on neighboring segments. Therefore, this study examines the effect of prosodic boundaries on syllable-final nasal mergers, since syllable-final nasals may be influenced by prosodic boundaries owing to their adjacency. It would thus be interesting to extend the finding from the phonetic level to the phonological level.
CHAPTER 3 METHODS
This chapter describes the methods of the present study. Section 3.1 describes the process of data collection. Section 3.2 introduces the social factors represented by the speakers. Section 3.3 describes basic labeling, including segmentation of syllables and classification of target syllables. Section 3.4 presents the criteria and examples of prosodic labeling.
3.1 Data collection of corpus
This study utilized part of the speech data from the Mandarin-Min bilingual corpus constructed by Fon (2004). The social variables in the corpus, including age at the time of recording and place of upbringing, were all strictly controlled, providing a good source for comparison between different regions and age groups. In addition, the ethnicity of the speakers’ parents was confined to Min only, so that the influence from other languages could be minimized. Only the Mandarin part of the corpus was utilized in this study.
3.2 Background of speakers
The data of 32 speakers in the corpus were included for analyses. Each speaker contributed around 30 minutes of spontaneous interview speech. The speakers were equally divided into 8 groups according to their gender, age, and place of upbringing. Each of the three social factors had two levels, resulting in a 2×2×2 design in which each of the 8 cells contained 4 participants. Table 3.1 shows the distribution of the subjects. The speakers were evenly split by gender. Half of the speakers were between 20 and 35 years old, while the other half were between 50 and 65 years old at the time of recording. Half of them lived solely in Taipei City/County at least between the ages of 3 and 18, whereas the other half lived in Kaohsiung City/County.6
Table 3.1 The number of subjects in different social conditions.
Place of origin    Taipei           Kaohsiung
Age                Young   Old      Young   Old
Gender   male      4       4        4       4
         female    4       4        4       4
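The balanced design in Table 3.1 can be enumerated mechanically. The sketch below is only an illustration of the 2×2×2 layout with 4 speakers per cell; the constant and function names are hypothetical and do not come from the corpus itself.

```python
from itertools import product

# Factor levels as described in Section 3.2
GENDERS = ("male", "female")
AGES = ("young", "old")            # 20-35 vs. 50-65 years old at recording
REGIONS = ("Taipei", "Kaohsiung")  # place of upbringing
SPEAKERS_PER_CELL = 4


def design_cells():
    """Return every (gender, age, region) combination of the 2x2x2 design."""
    return list(product(GENDERS, AGES, REGIONS))


def total_speakers():
    """Number of speakers implied by a fully balanced design."""
    return len(design_cells()) * SPEAKERS_PER_CELL


if __name__ == "__main__":
    for cell in design_cells():
        print(cell, SPEAKERS_PER_CELL)
    print("total speakers:", total_speakers())
```

Eight cells of four speakers each yield the 32 speakers analyzed in this study.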
3.3 Labeling of target words
The recordings were first transcribed in Chinese characters, later romanized in Hanyu Pinyin, and then labeled using Praat (Boersma & Weenink, 2008). Each syllable that contained /in/, /iŋ/, /ən/, /əŋ/, /an/, or /aŋ/ was first identified, and its phonetic realization (described in the next section), stress level, and prosodic break index were labeled by two native speakers of Taiwan Mandarin, one of whom was the author of the current study. Each of the two labelers was responsible for half of the data. In order to increase inter-labeler reliability, the standard for labeling was frequently discussed at initial stages. Ambiguous cases were selected and checked by both labelers in order to improve the consistency of the labeling standard, and the standard was applied throughout the whole process of labeling.
3.3.1 Labeling of realization
Based on the auditory perception of the labelers, aided by the acoustic signal, all target syllables were assigned to one of the following categories: [Vm], [Vɱ], [Vn], [Vɲ], [Vŋ], [Ṽ], [N̩], [V], and [V̥]. Although there are only two types of syllable-final nasals in Mandarin, /Vn/ and /Vŋ/, we observed five different realizations of nasals in terms of place of articulation, owing to the inevitable assimilation or coarticulation caused by following syllables. Wherever possible, tokens were assigned one of the two default nasals, [Vn] or [Vŋ]. They were assigned other labels only when neither of the two nasals suited them. We also found several variants due to fast speech rate. Sometimes the nasal was merged into the vowel, resulting in a nasalized vowel (assigned to [Ṽ]); sometimes the vowel was deleted, resulting in a syllabic nasal (assigned to [N̩]); sometimes the nasal was completely deleted, leaving a pure oral vowel (assigned to [V]); sometimes the voice quality changed, so that the vowel was realized as breathy or creaky (assigned to [V̥]). Apart from the aforementioned reasons, other factors also contributed to the variant forms, such as speech errors, careless speech styles, or other unpredictable causes that come with sampling from spontaneous speech.
6 Taipei County has become New Taipei City and Kaohsiung County has merged with Kaohsiung City since the 25th of December, 2010.
3.3.2 Labeling of intention
The categorizing principles mentioned in Section 3.3.1 focused basically on the nasal portion of the signal; as a result, the realization of a great number of tokens was inevitably affected by coarticulation and lost its representativeness and explanatory power. In order to recover some of these tokens, we took the vowel portion of the signal into account to judge the speakers’ intended nasal target where possible. This kind of recovery worked best for the /an/ and /aŋ/ pair, because the phonetic realizations of the vowel /a/ before different nasal codas are more distinct (i.e., [an] vs. [ɑŋ]) than those of /i/ and /ə/.
Figure 3.1 shows an example of nasal realization and intention labeling. Syllables were transcribed in the first tier, acoustic/perceptual realizations of nasals were labeled in the second tier, and the speakers’ intended nasals were tagged in the third tier if they were judged to be different from those in the second tier. The realization of nan2 was [am]. Since the [m] final of nan2 was caused by assimilation or coarticulation with the following syllable, and there seemed to be no intention of pronouncing [ɑŋ], we assigned [an] on the intention tier to bring the token back into valid analysis. The intention of speakers was mainly judged according to the acoustic/perceptual realizations of vowels, since vowel differences resulting from different nasal targets could also reflect the intention of pronouncing a certain nasal.
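The relation between the two tiers can be summarized as a simple fallback rule: the intention tier overrides the realization tier only when it has been filled in. The sketch below illustrates this under the assumption that an empty intention tier is represented by `None` or an empty string; the function name and this representation are hypothetical and are not taken from the Praat files used in this study.

```python
def label_for_analysis(realization, intention=None):
    """Pick the nasal label used for analysis.

    realization: label on the second tier (e.g. "am").
    intention:   label on the third tier, or None/"" when that tier was
                 left empty because it did not differ from the realization.
    """
    # The intention label, when present, takes precedence.
    return intention if intention else realization
```

For the nan2 token in Figure 3.1, `label_for_analysis("am", "an")` returns `"an"`, whereas a token with an empty intention tier keeps its realization label.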
Figure 3.1 An example of nasal realization (tier 2) and intention (tier 3) labeling.
hen3-duo1 nan2-bu4 de0 xiao1-xi2
“lots of news from the south”
3.4 Prosodic labeling
Two prosodic factors were examined in our study, i.e., prosodic prominence and prosodic phrasing. Different levels of prominence and phrasing were labeled using the stress and break indices of a modified Pan-Mandarin ToBI system (Peng et al., 2005).
3.4.1 Stress labeling
Peng et al. (2005) designed the Pan-Mandarin ToBI system for labeling different Mandarin varieties. Here we adopted the modified version from Chuang (2009) to label the prominence level of each syllable. In the original version, the criteria that differentiate levels of prominence are generally based on tonal realizations or pitch contours. The modified version incorporates additional cues, such as duration, amplitude, and segmental information, which make it easier to adapt to the various situations in spontaneous speech. There are four levels of prominence in total, from S0 to S3. S3 represents the prominent condition, S0 and S1 are the reduced ones, and S2 stands for the default condition. The modified set of criteria is provided in Table 2.3.
Examples of stress labeling are shown in Figure 3.2. she4 and gao1 were labeled S3 since they met multiple criteria in Table 2.3, with louder amplitude, longer duration, and targets accurately reached. Although gao1 had a relatively low pitch, we still labeled it S3 because of the long and clear closure of its initial plosive, which made it sound emphatic. bi3 was weaker than the other syllables; it received S1 rather than S0 because it does not carry a lexically neutral tone. de0 automatically received S0 because it carries a lexically neutral tone.
Figure 3.2 An example of stress labeling.
she4-jing1 di4-wei4 bi3-jiao4 gao1 de0
“those that were higher in social class”
3.4.2 Break labeling
A modified Pan-Mandarin ToBI (Peng et al., 2005) was also used to label prosodic break indices. In the original version, there are six ordinary levels (see Table 2.4), but no levels are designed for disfluency, which occurs very often in spontaneous speech. In order to improve its feasibility, we referred to the P diacritic of English ToBI (Beckman et al., 2005) and combined some of the ordinary levels in the original version. The modified version of break indices is shown in Table 3.2. There are four ordinary levels, from B0 to B3, and two additional levels with diacritics to accommodate the disfluency of spontaneous speech. B3 and B2 indicate the existence of prosodic boundaries, B1 is the default syllable boundary, and B0 stands for the reduced one.
Table 3.2 Modified criteria for levels of prosodic boundary.
Break  Definition
B3     intonation phrase boundary
B2     phonological phrase boundary
B1     regular syllable boundary
B0     reduced syllable boundary
B2P    hesitation
B1P    truncated
Examples of break labeling are shown in Figure 3.3. The boundary between qu1 and yu4 was reduced to the point of being erased and was thus labeled B0. There seemed to be a small break after zhong4, which was thus tagged B2. heng2 was at the end of a sentence and was perceived with a sense of finality, so it received B3. All the other syllable boundaries were ordinary and received B1.
Figure 3.3 An example of break indices labeling.
zhu4-zhong4 qu1-yu4 de0 ping2-heng2
“pay attention to the areal balance”
CHAPTER 4 RESULTS
Data with statistical analyses are presented in this chapter. The overall distribution of relevant syllables is provided in Section 4.1. The examination of social factors is summarized in Section 4.2. The effects and statistical results of two linguistic factors, stress and break, are shown in Sections 4.3 and 4.4, respectively.
4.1 Overall distribution
There are in total 46,324 tokens containing an /in/, /iŋ/, /ən/, /əŋ/, /an/, or /aŋ/ vowel-nasal sequence in the recordings of the 32 speakers. The summary of recording content is provided in APPENDIX I. The detailed distribution of the underlying vowel-nasal sequences is shown in APPENDIX II, subdivided by age (Young/Old), gender (Male/Female), region (Taipei/Kaohsiung), stress level (S3/S2/S1/S0), and break index (B3/B2/B1/B0/B2P/B1P).
According to previous research, nasal mergers were mainly found in /in/, /iŋ/, and /əŋ/ syllable types; these syllables were thus chosen as the target of analyses (termed “target” in the rest of the study). On the other hand, /ən/, /an/, and /aŋ/ syllable types were also included and used as references (termed “reference” in the remaining part). A previous study has shown a lower degree of nasal mergers in spontaneous speech (Su, 2012). It would thus be easier to judge and compare the proportion of nasal mergers with some reference points established. Overall token numbers and relative percentages of target syllables and references are shown in Table 4.1. The target syllables made up 22.14% of the tokens, whereas the references made up the other 77.86%.
Table 4.1 Distribution of targets and references.
Syllable type   Token number   Percentage
Target
  /in/          3,080          6.68%
  /iŋ/          3,987          8.65%
  /əŋ/          3,143          6.82%
Reference
  /ən/          9,502          20.62%
  /an/          16,828         36.52%
  /aŋ/          9,546          20.71%
Total           46,086         100.00%
Some of the data were discarded before analyses due to various reasons, such as bad voice quality or disfluency. The details of data distribution divided by different reasons of exclusion are shown in Table 4.2. The upper part of the table gives a general distribution of the targets, while the lower part provides that of the references.
The valid data were those containing both a vowel and a nasal in some form in the signal. Only such tokens underwent further investigation since our research goals were based on the realization of nasals. Note that we also labeled the intention of some tokens, as mentioned in Section 3.3.2. As a result, some of the tokens that would otherwise have been discarded could be brought back into the analyses. However, most of the rescued tokens were /an/ and /aŋ/. Among all valid data, only 6 tokens of /in/, 33 tokens of /iŋ/, and 7 tokens of /əŋ/ were rescued from invalid tokens with the help of intention labeling.
Except for /ən/ syllables (only 88%), valid tokens accounted for approximately 95% of the data, which revealed a highly consistent distribution among syllable types. The exception of /ən/ came from the large number of men0 tokens, the plural suffix for personal pronouns, which was often realized as a syllabic nasal since it is a high-frequency function word. The low level of prosodic prominence (S0) is also a plausible explanation for the large number of syllabic nasals in men0. Since the number of invalid tokens was small, we discarded them without further statistical analyses.
Table 4.2 Distribution of tokens subdivided by different reasons of discarding.
Target
              /in/                        /iŋ/                        /əŋ/
              Token number  Percentage    Token number  Percentage    Token number  Percentage
valid         3,000         97.40%        3,835         96.19%        3,031         96.44%
truncation    11            0.36%         15            0.38%         12            0.38%
hesitation    16            0.52%         47            1.18%         51            1.62%
syllabic N    26            0.84%         48            1.20%         25            0.80%
creaky V      4             0.13%         14            0.35%         12            0.38%
breathy V     4             0.13%         21            0.53%         1             0.03%
loss of N     19            0.62%         7             0.17%         11            0.35%
Total         3,080         100.00%       3,987         100.00%       3,143         100.00%

Reference
              /ən/                        /an/                        /aŋ/
              Token number  Percentage    Token number  Percentage    Token number  Percentage
valid         8,366         88.05%        16,037        95.30%        9,046         94.76%
truncation    92            0.97%         77            0.46%         31            0.32%
hesitation    235           2.47%         228           1.36%         164           1.72%
syllabic N    619           6.52%         9             0.05%         8             0.08%
creaky V      103           1.08%         321           1.91%         209           2.19%
breathy V     24            0.25%         31            0.18%         33            0.35%
loss of N     63            0.66%         125           0.74%         55            0.58%
Total         9,502         100.00%       16,828        100.00%       9,546         100.00%
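As a rough cross-check of Table 4.2, the proportion of valid tokens per syllable type and the size of the valid data set can be recomputed from the token counts. The sketch below copies the "valid" and "Total" rows from the table; the dictionary layout and function names are hypothetical, and rates obtained this way should agree with the tabulated percentages only up to rounding.

```python
# (valid tokens, total tokens) per syllable type, copied from Table 4.2
TABLE_4_2 = {
    "in": (3000, 3080),
    "iŋ": (3835, 3987),
    "əŋ": (3031, 3143),
    "ən": (8366, 9502),
    "an": (16037, 16828),
    "aŋ": (9046, 9546),
}
TARGETS = ("in", "iŋ", "əŋ")
REFERENCES = ("ən", "an", "aŋ")


def valid_rate(syllable):
    """Percentage of valid tokens for one syllable type."""
    valid, total = TABLE_4_2[syllable]
    return 100 * valid / total


def valid_total(syllables):
    """Number of valid tokens summed over the given syllable types."""
    return sum(TABLE_4_2[s][0] for s in syllables)


if __name__ == "__main__":
    for syllable in TABLE_4_2:
        print(f"/{syllable}/ {valid_rate(syllable):.2f}%")
    print("valid targets:", valid_total(TARGETS))
    print("valid references:", valid_total(REFERENCES))
```

Summing the valid counts gives the number of targets and references retained for further examination.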
4.1.1 Examples of discarded tokens
As shown in Table 4.2, several types of tokens were discarded for different reasons. The following figures illustrate each type of discarded token. Figure 4.1 shows an example of a truncated token. Tokens of this kind were discarded because sudden interruption and lack of clarity in syllable-final position made it hard or impossible to identify the realization of the nasal.
Figure 4.1 An example of truncation.
jiu4-shi4 hen3
“it’s just very”
Figure 4.2 is an example of hesitation. Tokens with hesitation may show patterns of nasal merger different from those of fluent speech. It would thus be inappropriate to group these tokens with the others without first confirming that they behave alike. Because the number of such tokens was too small for a separate analysis, we removed them from further analyses.
Figure 4.2 An example of hesitation.
li4-shi3 xi4 gen1
“Department of History and”
Figure 4.3 displays an instance of a syllabic nasal. Without a vowel in the realization, such tokens no longer showed a vowel-nasal sequence. They could be influenced by neighboring syllables on both sides and were thus excluded from the data under investigation.
Figure 4.3 An example of syllabic nasals.
ying1-gai1 jiu4
“it should just”
Figure 4.4 and Figure 4.5 present cases of creaky voice and breathy voice, respectively. Owing to their bad voice quality, signals of nasals became harder to recognize. Such tokens were also discarded as a consequence.
Figure 4.4 An example of creaky voice.
na4-yang4
“that kind”
Figure 4.5 An example of breathy voice.
xian4-zai4 jiu4 qu4
“now go to”
Figure 4.6 is a demonstration of tokens without any nasal articulation. These tokens were also discarded since there were no nasals in the signal any longer. Only the oral vowel remained.
Figure 4.6 An example of nasal loss.
yi1-ding4 yao4
“must”
After removing the aforementioned variants, we obtained a set of 43,315 tokens as our database for further examination, of which 9,866 were targets and 33,449 were references.
4.1.2 Data for analyses
Among the remaining tokens, there were mainly two types of realizations: one with a relatively complete or obvious sequence of vowel and nasal in the signal, and the other with merely a nasalized vowel with little or no trace of formant transitions caused by nasals. Their relative proportions are shown in Table 4.3. The nasalization rates of the tokens varied from 9.67% to 26.23% for different vowel-nasal sequences.
Table 4.3 Distribution of valid tokens.
Target
         /in/                        /iŋ/                        /əŋ/
         Token number  Percentage    Token number  Percentage    Token number  Percentage
VN       2,213         73.77%        3,464         90.33%        2,615         86.28%
Ṽ        787           26.23%        371           9.67%         416           13.72%
Total    3,000         100.00%       3,835         100.00%       3,031         100.00%

Reference
         /ən/                        /an/                        /aŋ/
         Token number  Percentage    Token number  Percentage    Token number  Percentage
VN       7,264         86.83%        13,027        81.23%        7,959         87.98%
Ṽ        1,102         13.17%        3,010         18.77%        1,087         12.02%
Total    8,366         100.00%       16,037        100.00%       9,046         100.00%
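The nasalization rate of each syllable type is simply the share of nasalized-vowel tokens among its valid tokens. A minimal sketch, using the counts in Table 4.3 and hypothetical names, shows which syllable types mark the two ends of the reported range:

```python
# (VN tokens, nasalized-vowel tokens) per syllable type, copied from Table 4.3
TABLE_4_3 = {
    "in": (2213, 787),
    "iŋ": (3464, 371),
    "əŋ": (2615, 416),
    "ən": (7264, 1102),
    "an": (13027, 3010),
    "aŋ": (7959, 1087),
}


def nasalization_rate(syllable):
    """Percentage of valid tokens realized as a nasalized vowel."""
    vn, nasalized = TABLE_4_3[syllable]
    return 100 * nasalized / (vn + nasalized)


def rate_range():
    """Return (syllable, rate) for the lowest and highest nasalization rates."""
    rates = {s: nasalization_rate(s) for s in TABLE_4_3}
    lowest = min(rates, key=rates.get)
    highest = max(rates, key=rates.get)
    return (lowest, round(rates[lowest], 2)), (highest, round(rates[highest], 2))
```

With these counts, the lowest rate belongs to /iŋ/ and the highest to /in/, which correspond to the two ends of the range given above.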
4.1.3 Analyses on tokens of nasalized vowels
Nasal weakening or deletion was not a surprising result in spontaneous speech. However, its occurrence interfered with our research focus. Although it might be a by-product of spontaneous speech, its unequal distribution among different vowel-nasal combinations seemed to contain some information. Before these tokens are set aside, it is worth examining their distribution with regard to social factors. Since all of the social factors are categorical, step-wise binary logistic regressions were carried out for the three types of target syllables. The dependent variable was whether or not the syllable was realized as a nasalized vowel without a nasal murmur. All independent variables were categorical and were coded with simple contrasts, including Age (Young = 0.5, Old = −0.5), Gender (Male = 0.5, Female = −0.5), and Region (Taipei = 0.5, Kaohsiung = −0.5), together with all possible combinations of these variables, resulting in a total of seven predictors.
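The count of seven predictors follows from three main effects, three two-way interactions, and one three-way interaction. The sketch below builds these predictors for a single speaker from the ±0.5 codes given above. Treating each interaction term as the product of its component codes is a common convention and is assumed here; it is not a description of the statistical software actually used, and all names in the code are hypothetical.

```python
from itertools import combinations
from math import prod

# Simple contrast codes as stated in the text
CODES = {
    "Age": {"Young": 0.5, "Old": -0.5},
    "Gender": {"Male": 0.5, "Female": -0.5},
    "Region": {"Taipei": 0.5, "Kaohsiung": -0.5},
}


def _term_sets():
    """All non-empty combinations of the three factors (7 in total)."""
    factors = list(CODES)
    return [c for k in range(1, len(factors) + 1) for c in combinations(factors, k)]


def predictor_names():
    """Names of the main effects and interaction terms."""
    return [" x ".join(c) for c in _term_sets()]


def predictor_row(age, gender, region):
    """Predictor values for one speaker; interactions are products of codes."""
    values = {
        "Age": CODES["Age"][age],
        "Gender": CODES["Gender"][gender],
        "Region": CODES["Region"][region],
    }
    return {" x ".join(c): prod(values[f] for f in c) for c in _term_sets()}
```

For a young female speaker from Taipei, for example, the Age x Gender term is 0.5 × (−0.5) = −0.25.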
Figure 4.7 Nasalization rates of targets.
The statistical results are presented in Table 4.4. The overall model fit test was significant for /in/ → [ĩ] [χ²(4) = 213.562, p < .001], for /iŋ/ → [ĩ] [χ²(4) = 80.024, p < .001], and for /əŋ/ → [ə̃] [χ²(4) = 139.709, p < .001].
For all targets of /in/, /iŋ/, and /əŋ/, the main effect of Age was the first predictor that entered each of the three models; switching from Old to Young group would increase the odds of applying nasalization by a factor of 3.595, 2.684, or 3.938, respectively. There are still other successful predictors for each type of targets as