
Spoken Word Recognition in Taiwan Mandarin: Evidence from Isolated Disyllabic Words (臺灣華語的口語詞彙辨識歷程:從雙音節詞來看)


Academic year: 2021


National Chengchi University, Graduate Institute of Linguistics, Master's Thesis (國立政治大學語言學研究所碩士論文)

Spoken Word Recognition in Taiwan Mandarin: Evidence from Isolated Disyllabic Words (臺灣華語的口語詞彙辨識歷程:從雙音節詞來看)

Advisor: Dr. I-Ping Wan (萬依萍)
Student: Yu-Fu Chien (錢昱夫)
July 2011


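Chinese Abstract

National Chengchi University master's thesis abstract. Institute: Graduate Institute of Linguistics. Thesis title: Spoken Word Recognition in Taiwan Mandarin: Evidence from Isolated Disyllabic Words. Advisor: Dr. I-Ping Wan. Student: Yu-Fu Chien. Abstract (one volume, 17,770 characters, six chapters):

This study uses disyllabic words to examine the importance of different segments and of tone in spoken word recognition in Taiwan Mandarin. The Cohort model (1978) places strong emphasis on word-initial information, whereas the Merge model (2000) holds that the overall match between the input and the phonological representation matters most. This study therefore explores the importance of different segments and of word onsets and offsets in spoken word recognition in Taiwan Mandarin. Tone was not discussed in the earlier models, so the role of tone in spoken word recognition in Taiwan Mandarin is also examined here, as is the word frequency effect.

The same fifteen subjects took part in all three experiments. Experiment 1 tested the importance of different segments. It used 12 high-frequency and 12 low-frequency disyllabic words, and each segment of each word was masked by noise in turn. Experiment 2 explored the importance of word onsets and offsets. It used 12 high-frequency and 12 low-frequency disyllabic words whose initial CV or final VG/N was masked by noise. Experiment 3 used 24 high-frequency and 24 low-frequency disyllabic words whose tones were leveled to 100 Hz. In all three experiments, subjects listened to the manipulated disyllabic words and identified them. Reaction times and recognition accuracy were recorded with E-Prime.

The results show that the traditional Cohort model cannot be fully supported, because words whose initial information was masked by noise could still be recognized successfully. The Merge model, which stresses the overall match between the acoustic input and the phonological representation, accounts for the results better. However, the Merge model must add prosody nodes in order to handle tone recognition in Taiwan Mandarin. The study also shows that the vowel of the first syllable of a disyllabic word is the most important for spoken word recognition and the vowel of the second syllable is the second most important, because vowels carry the most information, including tone. In addition, word onsets and offsets play roughly equally important roles in spoken word recognition in Taiwan Mandarin. The vowel is the most important segment for the perception of tone. The word frequency effect is also fully evident in spoken word recognition in Taiwan Mandarin.

Keywords: spoken word recognition, Taiwan Mandarin, Mandarin tones, segments, Cohort model, Merge model.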

Abstract

The present study investigated the importance of different segments and of tone in spoken word recognition in Taiwan Mandarin, using isolated disyllabic words. The Cohort model (1978) emphasized the absolute importance of word-initial information. In contrast, Merge (2000) proposed that the overall match between the input and the phonological representation is the most crucial. Therefore, this study investigated the importance of different segments and of onsets and offsets in the processing of Mandarin spoken words. Because tone was not included in the previous models, the importance of tone was also investigated. The frequency effect was explored as well.

Three experiments were designed, and fifteen subjects participated in all three. Experiment 1 was designed to investigate the importance of different segments in Taiwan Mandarin. In Experiment 1, 12 high-frequency and 12 low-frequency disyllabic words were selected, and each segment of each disyllabic word was replaced by the hiccup noise. Experiment 2 was designed to investigate the importance of onsets and offsets. In Experiment 2, 12 high-frequency and 12 low-frequency disyllabic words were chosen, and the CV of the first syllable and the VG/N of the second syllable were replaced by the hiccup noise. Experiment 3 was designed to investigate the importance of Mandarin tones. In Experiment 3, 24 high-frequency and 24 low-frequency disyllabic words were selected, and the tones of the disyllabic words were leveled to 100 Hz. In the three experiments, subjects listened to the stimuli and identified them. Reaction time and accuracy were measured with E-Prime.
The results indicated that the traditional Cohort model cannot be fully supported, because words can still be correctly recognized when word-initial information is disrupted. The Merge model, which proposes that the overall match between the input and the lexical representation is the most important, was more compatible with the results. However, the Merge model needs to include prosody nodes so that it can account for the processing of tones in Taiwan Mandarin. In addition, the current study showed that the first vowel of a disyllabic word is the most crucial and the second vowel is the second most influential, since the vowel carries the most important information, including tone. The results of Experiment 2 demonstrated that onsets and offsets are almost equally important in Mandarin. Furthermore, the vowel is the most influential segment for the perception of Mandarin tones. Finally, a frequency effect appeared in the processing of Mandarin words.

Keywords: spoken word recognition, Taiwan Mandarin, Mandarin tones, segments, Cohort, Merge.

TABLE OF CONTENTS

CHAPTER 1 INTRODUCTION
1.1 The background of spoken word recognition
1.2 Motivation and research questions
1.3 Organization
CHAPTER 2 LITERATURE REVIEW
2.1 Models of spoken word recognition
2.1.1 Cohort model (1980)
2.1.2 Merge (2000)
2.2 The role of acoustic onsets and offsets
2.3 Mandarin phonological system
2.4 The acoustic-phonetic cues of the consonants in Taiwan Mandarin
2.4.1 The acoustic-phonetic cues of stops
2.4.2 The acoustic-phonetic cues of nasals
2.4.3 The acoustic-phonetic cues of fricatives
2.4.4 The acoustic-phonetic cues of affricates
2.5 The acoustic-phonetic cues of the vowels in Taiwan Mandarin
2.6 Mandarin tone
2.6.1 The perception of Mandarin Chinese tones
2.6.2 The processing of Mandarin tone
2.7 Summary
CHAPTER 3 METHODS
3.1 Subjects
3.2 Equipment
3.3 Stimuli
3.3.1 Word frequencies
3.3.2 Segmentation
3.3.2.1 Segmentation of the initial consonant
3.3.2.2 Segmentation of prenuclear glides
3.3.2.3 Segmentation of vowels
3.3.2.4 Segmentation of postnuclear glides, and final nasals
3.3.3 The leveling of tones
3.4 Design
3.5 Procedures
CHAPTER 4 RESULTS
4.1 Experiment 1: one-segment disruption
4.2 Experiment 2: two-segment disruption

4.3 Experiment 3: tone leveling
CHAPTER 5 DISCUSSION
5.1 The results and the two models (Cohort and Merge)
5.2 Cohort and Merge models in Taiwan Mandarin
5.3 Merge model: Spoken word recognition in Taiwan Mandarin
CHAPTER 6 CONCLUSION
REFERENCES
APPENDIX 1
APPENDIX 2

LIST OF FIGURES

Figure 1 Schema of the cohort model
Figure 2 The basic architecture of Merge (Norris, McQueen, & Cutler, 2000)
Figure 3 Acoustic-phonetic characteristics of fricatives (Borden et al., 1994)
Figure 4 The marked part designates the initial consonant /t/ in /ta51 35/
Figure 5 The marked part designates the prenuclear glide / / in /ta51 35/
Figure 6 The marked part designates the vowel / / in /t ŋ21 tʰwan35/
Figure 7 The marked part designates the final nasal /n/ in /xwan35 t in51/
Figure 8 The test item 中心 (/t oŋ55 in55/, ‘a center’) whose tones are leveled at around 100 Hz
Figure 9 The test item /min35 t oŋ51/ ‘the common people’ (民眾) whose second rime is replaced by the hiccup noise
Figure 10 The test item /nej51 oŋ35/ ‘contents’ (內容) whose initial CV is replaced by the hiccup noise

LIST OF TABLES

Table 1 High-frequency words: one-segment disruption
Table 2 Low-frequency words: one-segment disruption
Table 3 Incorrect responses of tones (high-frequency words): one-segment disruption
Table 4 Incorrect responses of tones (low-frequency words): one-segment disruption
Table 5 Two-segment disruption
Table 6 Incorrect responses of tones: two-segment disruption
Table 7 Results of tone-leveled stimuli

CHAPTER 1. Introduction

1.1 The background of spoken word recognition

Speech perception has been a popular research topic for several decades. Researchers in fields such as physics, engineering, linguistics, and psychology have long been concerned with how humans perceive speech sounds effectively and efficiently. In linguistics, the long-standing questions about speech perception include how acoustic signals map to phonetic segments, how phonetic segments map to phonemes, and how phonemes are combined to form words.

The most fundamental problem in speech perception is how acoustic properties map to phonetic segments. According to many previous studies, acoustic signals vary among individuals, between genders, and even within a particular person. In addition, the acoustic properties of a specific segment change from context to context. How the human perceptual system decodes these varied acoustic signals, and how it picks out any invariance amid the plethora of variance in speech signals, are therefore issues that researchers have tackled for many years.

Another unresolved issue in speech perception is what the elementary unit of perception is. Traditionally, it is assumed that the phonetic segment is the elementary

unit of speech perception because it can differentiate one word from another and it is the minimal speech sound unit (Cutler, Norris, & Williams, 1987). However, some have proposed that the phonetic feature is the true basic unit of speech perception because it cannot be broken down further into smaller linguistic units (Jakobson, Fant, & Halle, 1952). In contrast to both suggestions, some studies indicated that the syllable is the elementary unit of speech perception, since it is impossible to draw a clear-cut boundary between segments in a syllable (Savin and Bever, 1970): the acoustic properties of one segment overlap with those of the preceding or following segments. Therefore, what the elementary unit of speech perception is remains a controversial issue for researchers to explore.

As already noted, early research on speech perception mainly focused on phonetic segments, including how phonetic segments are discriminated from one another and how they are categorized. In the 1970s, a new issue concerning the processes and representations for the perception of spoken words attracted many researchers’ attention. This new issue arose from the concern that a comprehensive theory of speech perception cannot focus only on consonants, vowels, and syllables. How hearers perceive and understand fluent speech is the most crucial issue. Hence, spoken words became the focus of the research.

Prior to the studies on the perception of spoken words, the research regarding the

perception of printed words had already been conducted by many scholars. However, theories of visual word recognition could not be applied to the recognition of spoken words, since they could not explain how acoustic signals are perceived by listeners and mapped to the hundreds of thousands of representations in the human brain.

One of the most significant models of spoken word recognition was the Cohort theory, proposed by Marslen-Wilson (1978, 1980). Marslen-Wilson’s Cohort theory turned over a new leaf in the history of speech perception and set up the field of spoken word recognition. According to the Cohort model (Marslen-Wilson & Welsh, 1978), word-initial information is crucial for activating an initial set of hypotheses about the acoustic input. However, over-emphasizing word-initial information results in incorrect predictions, since humans can still correctly recognize a word even if the word-initial phoneme is disrupted. Thus, some models were developed to remedy the defects of the Cohort theory. In contrast with the Cohort model, disruptions of word-initial phonemes are not disastrous in models such as Race (Cutler & Norris, 1979), TRACE (McClelland & Elman, 1986), Shortlist (Norris, 1994b), and Merge (Norris, McQueen, and Cutler, 2000), because the acoustic information of the other phonemes still contributes to the activation of a lexical entry. Although there had already been some studies

pointing out the shortcomings of the Cohort theory, it cannot be denied that Marslen-Wilson’s Cohort theory stimulated a great deal of research in the following decades. Indeed, many issues that are still active in the field of spoken word recognition are either closely related to the Cohort theory or attempts to remedy the defects of the original model.

1.2 Motivation and research questions

Four groups of questions will be discussed in this study. They are as follows.

(i) Is word-initial information as important as Cohort theory predicts? If so, then any stimulus in this study whose initial segment is replaced by the hiccup noise cannot be perceived correctly. If word-initial information is not as crucial as the Cohort theory predicts, then stimuli whose word-initial segments are disrupted can still be perceived correctly by listeners. However, if the results show that the prediction of Cohort theory is wrong, this raises the question of the status of the acoustic onset and offset in spoken word recognition. That is, is it the onset or the offset that is the most crucial for spoken word recognition in Mandarin?

Among the models of spoken word recognition, there are two different

arguments concerning the importance of the initial segment. Cohort theory (Marslen-Wilson & Welsh, 1978) proposed that a set of representations in memory, known as the word-initial cohort, is activated by the acoustic input. All of the words that share the same initial acoustic information as the input signal are activated in the listener’s mind. Therefore, the early Cohort theory put much emphasis on the importance of the word-initial input, suggesting that spoken word recognition would break down if the initial input is seriously disturbed. However, other models of spoken word recognition did not emphasize the importance of the word-initial input to such an extent. The Merge model (Norris, McQueen, & Cutler, 2000) focused on the overall similarity between the acoustic input and the words being activated. It suggests that the word-initial segment is not of critical importance. Even if the word-initial information is severely damaged, the particular word can still be activated and recognized on the basis of the rest of the acoustic information. Although a great number of studies have been conducted to investigate the importance of word-initial information in the recognition of spoken words, few studies focused on the role of the final segments in spoken word recognition (Wingfield, Goodglass, & Lindfield, 1997). In addition, most of the studies concerning this issue

focused on English or other Western languages such as Dutch; very few focused on the role of the initial and final segments in spoken word recognition in Mandarin. Hence, the questions arising from these gaps will be tackled in this study.

(ii) What is the status of different segments in spoken word recognition? If the initial consonant is the most important segment in spoken word recognition, then the experiment should yield the longest reaction time and lowest accuracy when the initial consonant is replaced by the hiccup noise. However, if it is the prenuclear glide, vowel, postnuclear glide, or final nasal that occupies the privileged status in spoken word recognition, then the experiment should yield the longest reaction time and lowest accuracy when that segment is replaced by the hiccup noise.

One of the active questions in the field of spoken word recognition concerns the nature of lexical and sublexical representations. Research on lexical competition mainly emphasizes the competition between the representations of words. Nevertheless, another crucial issue in spoken word recognition is the nature and existence of sublexical representations. Marslen-Wilson and Warren (1994) argued against the

existence of sublexical representations. They suggested that phonetic features map to words directly, without any intermediate sublexical representations. Other researchers, in contrast to Cohort theory, argued for the existence of sublexical representations, though different models proposed different views about the interactions between segmental and lexical representations. At present, much evidence favors the existence of sublexical representations, in contrast with Cohort theory. However, there is still a gap in that few studies focused on the role of different segments in a word. Thus, the questions concerning this gap will be dealt with in this study.

(iii) Can spoken words be recognized successfully if the tones of the words are leveled? What is the interaction between Mandarin tones and segments in the recognition of spoken words? Which segment, namely the initial consonant, prenuclear glide, vowel, postnuclear glide, or final nasal, is the most influential for the perception of Mandarin tones? If replacing a segment with the hiccup noise results in the wrong perception of a particular tone, it can be inferred that the segment carries the most important acoustic information of that tone. If the tone is still perceived correctly when a segment is replaced by the hiccup noise, it

suggests that the acoustic information in that segment is not important enough for its loss to cause incorrect perception of the Mandarin tone.

In the history of speech perception research, the issues of how acoustic signals map to phonetic features, how phonetic features map to phonemes, how phonemes map to syllables, and how syllables map to words have already been tackled by a number of researchers. These issues are segmental. Other issues concerning suprasegmentals have also been explored. For example, studies of the segmentation of words in fluent speech suggested a prosodic solution to the segmentation problem, which states that listeners parse the speech stream by exploiting the rhythmic characteristics of their language (Cutler, 1996; Cutler & Norris, 1988). Although a great number of studies have investigated both segmental and suprasegmental issues in speech perception, few concern the status of Mandarin tones in speech perception. Therefore, the status of Mandarin tones will be investigated in this study.

(iv) Does word frequency affect Mandarin spoken word recognition? If the frequency effect really exists in Mandarin, then the reaction time for high-frequency words will be shorter than that for low-frequency words. Conversely, if the frequency effect plays no role in Mandarin spoken word

recognition, the reaction time for high-frequency words will not be shorter than that for low-frequency words.

A number of previous studies have already shown that low-frequency words are more difficult to recognize than high-frequency words in spoken word recognition (Monaco, 2007; Savin, 1963; Broadbent, 1967; Elliott, 1987). Nevertheless, few studies examined this phenomenon in Taiwan Mandarin. As a result, this study will examine the effect in Taiwan Mandarin.

Given the gaps mentioned above, this study intends to investigate several issues concerning Mandarin word recognition more thoroughly and completely.

1.3 Organization

The following chapters are organized as follows. Chapter 2 reviews the literature on models of spoken word recognition, together with a number of issues concerning spoken word recognition. Chapter 3 focuses on the methods of this study, including details of the subjects, the segmentation criteria for the initial consonant, prenuclear glide, vowel, postnuclear glide, and final nasal, the equipment used, and the procedures of the experiments. Chapter 4 presents the statistical analyses and a brief summary of the results. The discussion and theoretical explanations relevant to the study are given in Chapter 5.

CHAPTER 2. LITERATURE REVIEW

In this chapter, previous studies on word recognition from acoustic signals, and the acoustic-phonetic cues of consonants, vowels, and tones in Mandarin, are discussed. Section 2.1 introduces the Cohort model (Marslen-Wilson & Welsh, 1978), which is a parallel lexical access model, and the Merge model (Norris, McQueen, and Cutler, 2000), which is a kind of connectionist model. Section 2.2 reviews studies of auditory word recognition, focusing on acoustic onsets and acoustic offsets. Section 2.3 describes the Mandarin phonological system, including the syllable structure of Mandarin. Section 2.4 presents the acoustic-phonetic cues of Mandarin consonants. Section 2.5 briefly presents the acoustic-phonetic cues of Mandarin vowels. Section 2.6 reviews the perception of Mandarin tones, with emphasis on the acoustic-phonetic cues and the processing of Mandarin tones.

2.1 Models of spoken word recognition

In this section, we briefly introduce two influential models of spoken word recognition, Cohort (1978) and Merge (2000).

2.1.1 Cohort model (1978)

The Cohort model, although it shares some basic assumptions about lexical access with the logogen model, was designed to explain the process of auditory word

recognition. Marslen-Wilson et al. (Marslen-Wilson & Zwitserlood, 1989; Tyler, 1984; Marslen-Wilson & Tyler, 1980, 1981) proposed that when we hear a word, all of the words that bear a phonological resemblance to the heard word are activated. For example, if we hear the sentence “Paul wants to be a ca-…,” then cap, capital, Capricorn, capture, captain, captive, and many others would be activated, which means that all of these activated words are candidates for selection. This set of words is called the “word-initial cohort”. Hence, as in the logogen model, possible candidates remain activated until the final candidate is identified. As in other parallel access models, activation of a word in the Cohort model is based on direct mapping between the speech input and the lexicon.

In Cohort theory, all possible candidates for lexical access are activated by the auditory input and then eliminated gradually in one of two ways: either the context narrows the word-initial cohort, or the candidates are eliminated as more and more phonological information is perceived. In the latter case, as more of the spoken word is identified, the cohort narrows. For instance, if the phoneme /p/ follows the sequence ca-, then captain, captive, and all the other words that share the same initial segments remain as potential candidates from the initial cohort. The pool of candidates continues to narrow as more of the acoustic signal is received. Only when a single candidate is left can the word be recognized. The schema of the

Cohort model and how the model operates are shown in Figure 1.

Time step 1. Input (recognized phonemes): [t]. Current cohort: [tri:], [taɪm], [trɛspəs], [trɛɪn], [trɛnd], [trɛs], ...
Time step 2. Input (recognized phonemes): [tr]. Current cohort: [tri:], [trɛspəs], [trɛɪn], [trɛnd], [trɛs], ...
Time step 3. Input (recognized phonemes): [trɛ]. Current cohort: [trɛspəs], [trɛɪn], [trɛnd], [trɛs], ...
Time step 4. Input (recognized phonemes): [trɛs]. Current cohort: [trɛspəs], [trɛs], ...
Time step 5. Input (recognized phonemes): [trɛsp]. Current cohort: [trɛspəs]. Recognized word: [trɛspəs] (trespass)

Figure 1. Schema of the cohort model.

This figure shows that when the phoneme [t] is recognized, it activates a set of words that begin with [t]. This set of words is called the “word-initial cohort.” As more and more phonemes are recognized, fewer and fewer words remain activated. Finally, when the phoneme [p] is recognized, the word [trɛspəs] is also recognized, because the phoneme sequence [trɛsp] can activate only [trɛspəs] and no other candidates.

Originally, the Cohort theory put heavy emphasis on the absolute match between the perceived auditory signal and the phonological representation in the mental lexicon, which means that the word-initial stimulus is of paramount importance and cannot be mispronounced or masked by a cough or surrounding noise. If the initial stimulus is disturbed, the word-initial cohort cannot be activated. However, subsequent experiments indicate that a word can still be recognized even if the initial input of the word is obstructed (Marslen-Wilson, 1987). Therefore, the

Cohort theory was revised by Marslen-Wilson (1987) so that the system chooses the best match for an incoming word. Under this revised Cohort theory, word recognition depends less on the initial auditory input. A word can be recognized as long as its phonological representation shares enough features with the incoming stimulus. Nevertheless, Marslen-Wilson (1989) reemphasized the importance of word-initial information, because lexical activation would be obstructed even if all the information other than the word-initial information is consistent with the target words.

2.1.2 Merge (2000)

The Merge model (2000), which is an autonomous model, was proposed by Norris, McQueen, and Cutler. The network of Merge is a simple competition-activation network with the same basic dynamics as Shortlist (Norris, 1994b). In Merge, there are three types of nodes: input nodes, lexical nodes, and phoneme decision nodes. As shown in Figure 2, the input nodes are connected by facilitatory links to the appropriate lexical nodes and phoneme decision nodes. The lexical nodes are also connected by facilitatory links to the appropriate phoneme decision nodes. But, unlike the TRACE model (McClelland & Elman, 1986), an interactive model, there is no feedback from the lexical nodes to the prelexical phoneme nodes. Inhibitory activation happens between lexical nodes as

well as between phoneme decision nodes, but not between input nodes.

Figure 2. The basic architecture of Merge. The facilitatory connections, which are unidirectional, are shown by bold lines with arrows; the inhibitory connections, which are bidirectional, are shown by fine lines with circles (Norris, McQueen, & Cutler, 2000).

Figure 2 displays the simulation of subcategorical mismatch in the architecture of Merge. The network was designed with only 14 nodes: 6 input nodes (/dʒ/, / /, /g/, /b/, /v/, and /z/), 4 phoneme decision nodes, and 4 possible word nodes, job, jog, jov, and joz. The latter two word nodes stand only for possible combinations, not real words.

The Merge model, which is faithful to the basic principles of autonomy, was designed to explain data that were not compatible with TRACE (McClelland & Elman, 1986). Merge, with phoneme decision nodes that combine the lexical and phonemic information flows, provides a simple and appropriate account of the data reported by Marslen-Wilson & Warren (1994), McQueen et al. (1999a), Connine et al. (1997), and Frauenfelder et al. (1990), which cannot be explained by TRACE

appropriately. Therefore, Merge can give a full explanation of the known empirical findings on phonemic decision making.

2.2 The role of acoustic onsets and acoustic offsets

A basic feature of the speech signal is its intrinsic directionality in time. As an utterance proceeds, the speech signal moves along the time line from the beginning to the end of the utterance. This fundamental property strongly implies that the initial acoustic signal is of paramount importance, which accords with the claims of Cohort theory (Marslen-Wilson, 1984).

Auditory word recognition is a very complicated language processing issue because many linguistic and non-linguistic factors may disrupt the acoustic cues of the speech signal. These disruptive factors include speech errors, acoustic-phonetic variability under different phonological conditions, and auditory obstructions due to surrounding noise. Such acoustic disruptions can happen at any moment during auditory word recognition. However, the human brain can still recognize words with little difficulty most of the time. Moreover, the speech input is a continuous stream of acoustic signal, and hearers do not know exactly whether a particular input is in the initial, medial, or final position of a word.

Given the above difficulties of speech processing, it is clear that the Cohort model first proposed by Marslen-Wilson & Welsh (1978), putting great

emphasis on the importance of initial acoustic cues, cannot account for the fact that the speech signal is more or less varied or disrupted under different circumstances. Therefore, the Cohort model was revised by Marslen-Wilson (1987), who rejected total dependence on word-initial cues for auditory word recognition. Unlike the old Cohort model (1978), the newer Cohort theory claims that disruptions of the word-initial signal are not fatal, because non-word-initial information can still bring about the activation of candidates. Therefore, in the later Cohort theory, a 100 percent match between the word-initial acoustic signal and the phonological representation of a given word is not as crucial as the original Cohort theory claims, given the remaining acoustic information in spoken words. In addition, other experiments indicate that auditory word recognition is not blocked even when the word-initial signal is distorted. In one such experiment, a phoneme ambiguous between /d/ and /t/ is presented before the sequence /ajp/. After listening to the stimulus, the subjects have to decide which phoneme they heard (Connine & Clifton, 1987). The results show that subjects tend to label the ambiguous phoneme as /t/ when it is followed by the sequence /ajp/, because /tʰajp/ ‘type’ is a word. This indicates that the word-initial signal is not extremely crucial in auditory word recognition; otherwise the word ‘type’ could not be recognized, given the ambiguous word-initial acoustic cues.
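The contrast between strict word-initial matching and best-match selection can be illustrated with a small sketch. It is only a toy under stated assumptions, not an implementation of the Cohort or Merge models: the lexicon is the set of forms listed in Figure 1, the `?` symbol is a hypothetical stand-in for a segment masked by noise, and the position-by-position score in `overall_match` is an invented simplification of "overall match" (Merge itself is an activation network, not a string comparison).

```python
# Toy lexicon: the phonemic forms listed in Figure 1 (illustrative only).
LEXICON = ["tri:", "taɪm", "trɛspəs", "trɛɪn", "trɛnd", "trɛs"]

MASK = "?"  # hypothetical symbol for a segment replaced by noise


def strict_cohort(heard, lexicon):
    """Original Cohort idea: keep only words that begin with exactly what was heard."""
    return [word for word in lexicon if word.startswith(heard)]


def overall_match(heard, word):
    """Proportion of positions where input and word agree (a crude, assumed measure of fit)."""
    length = max(len(heard), len(word))
    if length == 0:
        return 0.0
    hits = sum(1 for a, b in zip(heard, word) if a == b)
    return hits / length


def best_match(heard, lexicon):
    """Best-match idea: return the word with the highest overall agreement with the input."""
    return max(lexicon, key=lambda word: overall_match(heard, word))


if __name__ == "__main__":
    # The cohort shrinks as more of the word is heard.
    for prefix in ["t", "tr", "trɛ", "trɛs", "trɛsp"]:
        print(prefix, strict_cohort(prefix, LEXICON))

    # With the first segment masked, strict matching finds no candidate,
    # while best-match selection still picks a word.
    masked = MASK + "rɛspəs"
    print(strict_cohort(masked, LEXICON))
    print(best_match(masked, LEXICON))
```

In this sketch a masked first segment empties the strict cohort, which corresponds to the prediction that recognition breaks down, whereas the best-match selection still returns [trɛspəs] because the remaining segments agree with it.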

Marslen-Wilson and Zwitserlood (1989) conducted experiments to investigate whether a nonword can activate a real word if the nonword differs from the real word only in the initial phoneme. The results of their study indicated that a nonword differing from a real word merely in the initial phoneme generally cannot activate the real word. On the basis of these results, Marslen-Wilson and Zwitserlood reemphasized the importance of word-initial information. They claimed that lexical activation would be barred even if all the other information except the initial phoneme is consistent with the hypothesized word. Therefore, mispronunciation of the initial phoneme of a word does not facilitate the base word but instead precludes its activation. From the result that a nonword derived from a real word by merely changing its initial phoneme cannot facilitate the real word, it is clear that word-initial information is very important in auditory word recognition.

In addition to the studies regarding the initial segment of the input, Nooteboom and van der Vlugt (1988) compared the importance of word onsets and offsets. The results indicated that words can be recognized equally well whether the inputs are heard from the beginnings or from the endings, as long as the hearers know which part of the words they have heard. However, they still claimed that word-beginning priority exists, because word-initial information is more easily associated with the correct lexical representation than word-final information.

Another relevant study concerning the role of the initial segment of a nonword was done by Connine, Blasko, and Titone (1993). The purpose of their research was to determine whether phonetically similar initial phonemes in a derived nonword would be sufficient to produce activation of a base word. They designed nonwords that differed from the base words by only one or two phonetic features. The altered segments of those nonwords were in either the initial position or the medial position. The results of the study indicated that a base word can still be activated by a nonword with a similar initial phoneme. The results also showed that the position of the altered segment in a nonword is not a factor that influences the priming effect. Connine, Blasko, and Titone (1993) concluded that the relative similarity of elements in the input to a lexical representation is critical for auditory word recognition. Furthermore, it is not the exact positional acoustic information of a particular lexical item that is important in spoken word recognition, but the overall similarity between the acoustic-phonetic input and the lexical representation that is influential. Therefore, the findings of their study contradict the Cohort theory, which claims that the initial segment serves to determine the activated word candidates.

Wingfield et al. (1997) used the gating technique to investigate the interaction among acoustic onsets and offsets, cohort size, and syllabic stress in English. Their analysis of the cohort sizes from both forward and backward gating showed

that the cohort size is significantly larger at the recognition point from forward gating than from backward gating for two- and three-syllable words and for all stress patterns. This finding showed a great advantage of forward gating over backward gating for two- and three-syllable words and for all stress patterns, indicating that acoustic onset information is much more important than acoustic offset information for all stress patterns, though words can be identified from both the beginning and the ending directions.

However, Wingfield et al. (1997) qualified the absolute word-onset priority principle when taking stress patterns into consideration. They assumed that stress patterns can restrict the cohort size. They showed that, once stress was taken into account, the cohort sizes at the recognition point were not only significantly reduced but also equal in the forward and backward gating directions. This analysis supported the claim that cohort reduction is a crucial mechanism in auditory word recognition regardless of the direction of gating, which in turn supported the overall goodness-of-fit hypothesis rather than the absolute word-onset priority principle. Nevertheless, Wingfield et al. did not deny that more acoustic information is needed for word recognition if a word is gated from its ending. That is possibly because, for any given cohort size, a longer gate duration is needed in the backward-gating condition than in the forward-gating condition in English.
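The contrast between strict word-onset priority and overall goodness of fit discussed in this section can be made concrete with a small sketch. The code below is only a toy illustration and is not an implementation of the Cohort or Merge models: the four-word lexicon, the "?" symbol for a masked or ambiguous onset, and the position-by-position matching score are all assumptions introduced here.

```python
# Toy lexicon: each word is a tuple of phoneme symbols (hypothetical entries).
LEXICON = {
    "type": ("t", "a", "j", "p"),
    "pipe": ("p", "a", "j", "p"),
    "tide": ("t", "a", "j", "d"),
    "dime": ("d", "a", "j", "m"),
}


def strict_cohort(heard, lexicon=LEXICON):
    """Keep only words whose initial phoneme matches the input exactly."""
    return sorted(w for w, ph in lexicon.items() if ph[0] == heard[0])


def goodness_of_fit(heard, lexicon=LEXICON):
    """Score each word by the proportion of positions that match the input."""
    scores = {}
    for word, phonemes in lexicon.items():
        length = max(len(phonemes), len(heard))
        matches = sum(a == b for a, b in zip(phonemes, heard))
        scores[word] = matches / length
    return scores


# "?" stands for an onset that was masked or is ambiguous (assumed notation).
heard = ("?", "a", "j", "p")

print(strict_cohort(heard))  # no candidate survives a strict onset filter

fit = goodness_of_fit(heard)
best = max(fit.values())
print(sorted(w for w, s in fit.items() if s == best))  # best overall matches
```

Under the strict onset filter a disrupted first segment leaves no candidates, whereas the overall-match score still ranks the words that agree with the remaining segments highest, which is the intuition behind the goodness-of-fit hypothesis.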

2.3 Mandarin phonological system

There are 12 combinations of Mandarin syllable structure, including V, CV, GV, VG, VN, CVG, CVN, CGV, GVG, GVN, CGVG, and CGVN. In Mandarin, a syllable is traditionally divided into three parts, including an optional initial, a final, and a tone (C. Cheng, 1973). The initial part can be a nasal or another consonant. The final part contains an optional prenuclear glide, a vowel, and an optional postnuclear glide or nasal. However, during the past two decades, the status of the prenuclear glides in the Mandarin syllable has raised many debates (Bao, 2002; Yip, 2002; Duanmu, 2002; Wan, 2002a). In this study, since the status of the prenuclear glide is not the focus, the prenuclear glide was not grouped with the onset or the rime and was replaced by the hiccup noise on its own, just like the initial consonant and the vowel. Last but not least, in order not to let the duration of the rime be much longer than that of the prenuclear glide and that of the initial consonant, the rime was further divided into a vowel plus a postnuclear glide, or a vowel plus a final nasal. Each part of the rime could be replaced by the hiccup noise individually.

2.4 The acoustic-phonetic cues of the consonants in Taiwan Mandarin

In Taiwan Mandarin, there are overall 21 onset consonants, namely, six oral stops /p/, /pʰ/, /t/, /tʰ/, /k/, /kʰ/, two nasal stops /m/, /n/, six fricatives /f/, / /, /x/, / /, /s/, / /, six affricates /t /, /tʰ /, /t /, /tʰ /, /ts/, /tʰs/, and one liquid /l/. In the following sections,

the acoustic-phonetic characteristics of those onset consonants are introduced. These characteristics serve as the criteria of segmentation in experiments 1 and 2.

2.4.1 The acoustic-phonetic cues of stops

There are three acoustic-phonetic cues for distinguishing stops. They are formant transitions, burst amplitude, and duration (voice onset time, VOT).

First, formant transitions are crucial for detecting the place of articulation of stops. The F2 and F3 transitions from bilabial stops into the following vowels are rising. The F2 and F3 transitions from alveolar stops into the following vowels are almost flat. The F2 and F3 transitions from velar stops into the following vowels come together. Second, previous research (Repp, 1984) indicated that the burst amplitude of labial stops is weaker than that of alveolar and velar stops. Perceptual experiments have shown that burst amplitude can influence the identification of labial and alveolar stops. This effect is more evident for voiceless stops than for voiced stops. Third, VOT is of paramount importance for the detection of voicing. Stops with relatively long VOT tend to be perceived as voiceless stops; in contrast, stops with relatively short VOT are prone to be recognized as voiced stops. In addition, voiceless aspirated stops have the longest VOT compared with voiced stops and voiceless unaspirated stops. In Mandarin, the mean VOTs for /p/, /pʰ/, /t/, /tʰ/, /k/, and /kʰ/ are 14 ms, 82 ms, 16 ms, 81 ms, 27 ms, and 92 ms, respectively (Chao et al.,

2006).

2.4.2 The acoustic-phonetic cues of nasals

According to Ladefoged (2000), there are four acoustic-phonetic cues for recognizing nasals. First, there is a sharp change in the spectrogram at the time of the formation of the articulatory closure. Second, the bands of the nasal are lighter than those of the vowel, which indicates that the intensity of the nasal is weaker than that of the vowel. Third, the F1 of the nasal is often very low, centered at around 250 Hz. Fourth, there is a large space above the F1 with no energy. Based on these acoustic-phonetic cues, nasals can be identified.

2.4.3 The acoustic-phonetic cues of fricatives

The most crucial acoustic-phonetic cue for separating voiceless fricatives from voiced fricatives is the extended period of noise (Borden et al., 1994), which can be easily detected on the spectrogram. Voiceless fricatives have longer duration and stronger intensity. In contrast, voiced fricatives (i.e., / / in Mandarin) are shorter in duration and weaker in intensity, but their formant frequencies are clearer than those of voiceless fricatives. Fricatives are known for their high-frequency noise in the spectrum, which is an acoustic-phonetic cue for distinguishing the place of articulation of fricatives. Another acoustic-phonetic cue for distinguishing the place of articulation of fricatives is the

intensity of frication. Sibilants (i.e., /s/, / /, / /, / /, /t /, /tʰ /, /t /, /tʰ /, /ts/, and /tʰs/ in Mandarin) are noted for relatively steep, high-frequency spectral peaks, whereas nonsibilants (i.e., /f/ and /x/ in Mandarin) are characterized by relatively flat and wider-band spectra. Moreover, alveolar sibilants (i.e., /s/ in Mandarin) can be distinguished from palatal sibilants (i.e., / /...) by the location of the lowest spectral peak. The lowest spectral peak of the alveolar sibilants is around 4000 Hz, while the lowest spectral peak of the palatal sibilants is around 2500 Hz. Furthermore, the intensity shown on the spectrogram can also differentiate the place of articulation of fricatives. Stronger intensity is the feature of sibilants; weaker intensity, the feature of nonsibilants. This is because the resonating cavity in front of the alveolar or palatal constriction results in high intensity. However, there is no resonating cavity in front of the labio-dental constriction, which brings about the relatively weak intensity. The acoustic-phonetic characterization of fricatives is illustrated in Figure 3.

Figure 3. Acoustic-phonetic characteristics of fricatives (Borden et al., 1994).

Figure 3 shows how listeners perceive fricatives. When the listener hears an input, it enters the first filter and is judged by whether it is a noisy sound with relatively long duration. If the answer is yes, the input is regarded as a fricative and sent to the next filter. In the second filter, the input is examined for whether its intensity is relatively high. If the answer is yes, the input is considered a sibilant and sent to the next filter. In the third filter, the input is examined for its first spectral peak. If the first spectral peak of the input is around 4 kHz, it is viewed as /s/ or /z/ and sent to the next filter. In the fourth filter, the input is judged by the question “phonation exists or duration and intensity small enough?” If the answer is yes, the input is perceived as /z/; if the answer is no, it is perceived as /s/. Through these filters, the input is examined step by step and finally recognized by the listener.
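The four filters described for Figure 3 can be read as a simple decision cascade. The sketch below is an illustrative reading of that description only, not a perceptual model taken from Borden et al. (1994): the function name, the boolean inputs, the labels returned on the "no" branches, and the 500 Hz tolerance used to interpret "around 4 kHz" are assumptions introduced here.

```python
# Assumed tolerance for "around 4 kHz"; the text gives no numeric margin.
PEAK_TOLERANCE_HZ = 500.0


def classify_fricative(noisy_and_long, high_intensity, first_peak_hz,
                       voiced_or_weak):
    """Walk the four yes/no filters in the order described for Figure 3."""
    # Filter 1: noisy sound with relatively long duration?
    if not noisy_and_long:
        return "not a fricative"
    # Filter 2: relatively high intensity means a sibilant.
    if not high_intensity:
        return "nonsibilant"
    # Filter 3: first spectral peak around 4 kHz means /s/ or /z/.
    if abs(first_peak_hz - 4000.0) > PEAK_TOLERANCE_HZ:
        return "other sibilant"
    # Filter 4: phonation present, or duration and intensity small enough?
    return "/z/" if voiced_or_weak else "/s/"


print(classify_fricative(True, True, 4000.0, False))
print(classify_fricative(True, True, 2500.0, False))
print(classify_fricative(True, False, 4000.0, False))
```

Each input passes to the next filter only on a "yes" answer, which mirrors the step-by-step examination in the text; the labels for inputs that leave the cascade early are placeholders.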

2.4.4 The acoustic-phonetic cues of affricates

There are three pairs of affricates in Mandarin: /t /, /tʰ /, /t /, /tʰ /, /ts/, and /tʰs/. According to Ladefoged (2000), an affricate is simply a sequence of a stop followed by a homorganic fricative. Therefore, it can be inferred that affricates have the acoustic-phonetic characteristics of both stops and fricatives.

2.5 The acoustic-phonetic cues of the vowels in Taiwan Mandarin

Phonetically speaking, there are overall 12 vowels in Taiwan Mandarin, including 4 high vowels ([i], [u], [y], and [ ]), 2 low vowels ([a] and [ ]), as well as 6 mid vowels ([e], [ ], [ə], [ ], [o], and [ ]). Vowels have very different phonetic cues from consonants. First of all, vowels have much longer duration than consonants. Second, the formants of vowels are much clearer than those of consonants. Third, the energy of vowels is stronger than that of consonants, resulting in a darker spectrogram. Fourth, the F0 in vowels carries the tones in Mandarin. Using these acoustic-phonetic cues, vowels can be distinguished from consonants.

2.6 Mandarin tone

2.6.1 The perception of Mandarin Chinese tones

Lexical tones are pitch patterns that can distinguish lexical meanings in a given language. In Mandarin Chinese, tones, like the aspirated and unaspirated stops, are phonemic features that can differentiate word meanings. Mandarin Chinese

phonemically distinguishes four tones, which are Tone 1, with high-level pitch, Tone 2, with high-rising pitch, Tone 3, with low falling-rising pitch, and Tone 4, with high-falling pitch (Chao, 1948). The same syllable can have different meanings if it carries different tones. For instance, ma with Tone 1 means ‘mother’; ma with Tone 2 means ‘numbness’; ma with Tone 3 means ‘horse’; ma with Tone 4 means ‘scold’.

There are several factors that can affect the perception of Mandarin Chinese tones. First, fundamental frequency plays a role in Mandarin Chinese tone perception. Previous acoustic studies have found that F0 height and F0 contour are the acoustic cues for Mandarin Chinese tone perception. Howie (1976) performed tone perception experiments to test whether the participants could identify the correct tones of the stimuli. Howie designed three contrasting conditions, which were synthetic speech with natural F0 patterns, synthetic speech with a monotonic F0 contour, and synthetic speech sounding like a whisper. The results showed that subjects easily recognized the synthetic speech whose F0 patterns were maintained. Gandour (1984) and Tseng & Cohen (1985) indicated that both F0 height and F0 contour are crucial acoustic cues for Mandarin tone perception, and neither can be dispensed with. Moore and Jongman (1997) differentiated Tone 2 from Tone 3 in terms of two characteristics. One is the turning point, which is the point in time at which the tone

changes from falling to rising, and the other is ∆F0, which is the F0 change from the onset to the turning point. Moore and Jongman found that the turning point of Tone 2 is earlier than that of Tone 3, and the ∆F0 of Tone 2 is smaller than that of Tone 3. Comparing the acoustic cues of Tone 3 and Tone 4, Garding et al. (1986) found that stimuli which have an early pitch peak and fall dramatically after the turning point tend to be perceived as Tone 4. Stimuli which stay in the low F0 range and have long duration tend to be recognized as Tone 3. This study demonstrates that F0 contour is of paramount importance for Mandarin tone perception.

The second factor that can influence the perception of Mandarin Chinese tones is the temporal properties of tones. According to production data, Nordenhake and Svantesson (1983) found that the duration of Tone 3 is the longest, being only slightly longer than that of Tone 2, while the duration of Tone 4 is the shortest. Given that the F0 contours of Tone 2 and Tone 3 are similar, Nordenhake and Svantesson (1983) further indicated that Tone 2 could be perceived as Tone 3 if it is lengthened.

In addition to F0 and duration, amplitude can also affect the perception of Mandarin Chinese tones, though only to a small extent. Whalen and Xu (1992) designed stimuli whose formant structures and F0 contours were removed but whose amplitude cues were preserved, and then they asked the participants to

identify the stimuli. The results demonstrated that participants could successfully identify Tone 2, Tone 3, and Tone 4, but failed to recognize Tone 1.

From the above-mentioned studies on the acoustic-phonetic characteristics of Mandarin Chinese tones, it is clear that fundamental frequency, turning point, ∆F0, duration, and amplitude are the acoustic cues which play a critical role in the perception of Mandarin Chinese tones. Nevertheless, the acoustic quality of tones can be influenced by the surrounding context, which may also affect the perception of tones.

Shen (1990) studied the tonal coarticulation of Mandarin Chinese and found that tonal coarticulation affects not only the F0 height of the onset or offset but also the F0 height of the entire word. The tones that are most easily affected are those which follow Tone 1 and Tone 2. Both Tone 1 and Tone 2 have a high offset F0 value, which can raise the overall F0 of the following tones. In addition, the high onset F0 value of Tone 4 also has the effect of raising the overall F0 of the preceding tones. Unlike Tone 1, the offset of Tone 2, and the onset of Tone 4, the onsets of Tone 2 and Tone 3, whose F0 values sit in the middle of the frequency range, cannot raise the overall F0 of the preceding tones. In addition to these findings, Shen also found that the tonal contour does not change even if the tone’s overall F0 value has risen. Shen finally pointed out that tonal coarticulation

cannot extend beyond one syllable.

2.6.2 The processing of Mandarin tone

Lexical tone is of paramount importance in processing spoken words in tone languages. Fox and Unkefer (1985) asked subjects to identify the tone of each stimulus in a continuum. The results showed that subjects were much more likely to label an ambiguous stimulus so that it formed a real word rather than a nonword. On the basis of these results, Fox and Unkefer indicated that lexical tone is an integral part of lexical representation in Mandarin. Cutler and Chen (1997) asked subjects to judge whether pairs of monosyllabic Cantonese words and nonwords, differing only in onset consonant, vowel, or tone, were the same or different. The results showed that responses were slower and less accurate when the words and nonwords differed in tone than when they differed in onset consonant or vowel. Cutler and Chen proposed that tonal information arrives later than segmental information, which results in the slower processing of tone. Moreover, Ye and Connine (1999) performed a tone monitoring task. In the experiment, the tone of the penultimate syllable of the idiom (t in55 y51 lj ŋ35 j n35) was changed either to a tone close to that of the final syllable (i.e., Tone 3 and Tone 2 are acoustically close) or to a tone far from it (i.e., Tone 4 and Tone 2 are acoustically far). That is, lj ŋ35 was changed to lj ŋ21 and lj ŋ51. The results demonstrated that responses to far tones were significantly slower than responses to close tones. The results indicated that tonal

information maps to lexical representation in a graded manner.

Lee (2000) investigated the processing of lexical tone and segments in Mandarin by using a direct form priming task. In the experiment, eighty monosyllabic Mandarin words were selected as targets. For each target, there were four primes with different kinds of relationship to it: an ID prime, a Same Seg prime, a Same Tone prime, and an Unrelated prime. An ID prime shares both the segments and the tone with the target, such as the prime pʰ w21 (‘to run’) and the target pʰ w21. A Same Seg prime shares only the segments with the target, but not the tone, such as the prime pʰ w55 (‘to fling’) and the target pʰ w21. A Same Tone prime shares only the tone with the target, but not the segments, such as the prime fej21 (‘a bandit’) and the target pʰ w21. An Unrelated prime is not similar to the target at all, such as the prime t yn51 (‘handsome’) and the target pʰ w21. During the experiment, subjects heard the monosyllabic prime first and then judged whether the second monosyllabic word, the target, was a real Mandarin word or not. In this experiment, Lee (2000) found a significant facilitatory priming effect for targets following ID primes. A non-significant facilitatory priming effect occurred for targets following Same Seg primes, while a non-significant inhibitory priming effect occurred for targets following Same Tone primes. From the results, Lee (2000) indicated that

Same Seg primes cannot fully activate the targets although their segments overlap totally. Therefore, it can be inferred that lexical tone in Mandarin is used on-line to resolve lexical identity. Lee also suggested that the decrease in activation level from ID prime to Same Seg prime to Same Tone prime may be due to the degree of phonological match between the input and the lexical representation. That is, segmental information has a higher degree of phonological match to the lexical representation than tonal information, so that a Same Seg prime has stronger power of activation than a Same Tone prime. It can be further implied that tone is more like a phonetic segment than an independent tier in the processing of Mandarin words.

2.7 Summary

In this chapter, two models of spoken word recognition and past studies concerning the acoustic-phonetic cues for word recognition in Mandarin were discussed. In general, a gap can be observed; that is, previous studies of spoken word recognition mainly focused on Western languages, whereas only a few studies focused on Mandarin. It is one of our main goals to fill this gap by investigating the status of different segments and tones in spoken word recognition in Taiwan Mandarin.

Two spoken word recognition models, the Cohort model and the Merge model, were presented as well. In this study, we are going to examine the two models

to see which model can best explain spoken word recognition in Taiwan Mandarin.

CHAPTER 3. METHODS

This chapter presents the research methods used in this study. Section 1 introduces the subjects’ backgrounds. Section 2 describes the recording and playback equipment. Section 3 introduces the details of the stimuli. Section 4 illustrates the design and procedures of the study.

3.1 Subjects

Fifteen subjects were recruited in this study. They all lived in Taipei City or Taipei County. The 15 subjects, 7 males and 8 females, were all native Mandarin speakers and were not proficient in Taiwan Southern Min. They were between 22 and 30 years old at the time of participating in the experiment.

3.2 Equipment

During the experiment, participants listened to the stimuli played on an ACER Aspire One Series computer. E-Prime was used to record the participants’ responses. The reaction time of the subjects’ responses was also measured by E-Prime.

3.3 Stimuli

All of the 48 stimuli in the experiment were disyllabic words embedded in the carrier sentence “t 51-k ts 51 51 ____” (‘This word is ____’). The stimuli were all recorded in Praat with a mono channel and an 11 kHz sampling rate. The 48 disyllabic words were selected from the Academia Sinica Balanced Corpus of Modern Chinese. The

details of the stimuli are listed in Appendices 1 and 2.

3.3.1 Word frequencies

There were 48 disyllabic words in total, which included 24 high-frequency words and 24 low-frequency words. The 48 disyllabic words were chosen from the Word List with Accumulated Word Frequency in Sinica Corpus 3.0. The average frequency of the 24 high-frequency words was 2503 occurrences out of 5 million tokens. The average frequency of the 24 low-frequency words was 16 occurrences out of 5 million tokens.

3.3.2 Segmentation

There are overall 12 combinations of Mandarin syllable structure, namely V, CV, GV, VG, VN, CVG, CVN, CGV, GVG, GVN, CGVG, and CGVN. In this study, the 48 disyllabic words contain all of the possible syllable structures in Mandarin except for one structure, “V”. The syllable structure “V” is excluded because, if a syllable consisting of a single vowel is replaced by the hiccup noise, there is nothing left of it to be heard by the subjects. Therefore, syllables consisting of one vowel and nothing else are excluded in this study.

Since speech is continuous, it is really challenging to make a clear-cut distinction between segments. Nonetheless, for the purpose of finding out which part of the syllable is of paramount importance for auditory word recognition in Mandarin,

segmentation needs to be conducted. The following acoustic principles were the criteria for segmentation in this study.

3.3.2.1 Segmentation of the initial consonant

The first way to segment the initial consonant from the following prenuclear glide or vowel was to inspect the waveform. The initial consonant was measured from the starting point of the vibration on the waveform to the beginning of the intense vibration on the waveform. In order to further eliminate the quality of the initial consonant, the boundary between the initial consonant and the prenuclear glide or vowel was moved about 13 milliseconds later, into the following segment, as in Figure 4.

Another way to distinguish the first cut-off part from the following prenuclear glide or vowel was to examine the spectrogram. The boundary between the initial consonant and the following prenuclear glide or vowel lay on the first relatively dark vertical striation. For the purpose of further excluding the quality of the initial consonant, the end of the cut-off part was again extended by about 13 milliseconds. After the cut-off part had been decided, we eliminated that part and pasted the hiccup noise, whose duration was the same as that of the cut-off part, into the position originally occupied by the cut-off part, as exemplified in Figure 4.
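The replacement step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually run for the stimuli: the signal is a plain list of samples, the exact sampling rate of 11000 Hz is assumed from the reported "11 kHz", uniform random noise stands in for the recorded hiccup noise, and the function name and parameters are hypothetical.

```python
import random

SAMPLE_RATE = 11000  # Hz; assumed exact value for the reported 11 kHz
EXTRA_MS = 13        # extension past the segment end, as described above


def replace_with_noise(samples, start_s, end_s, sample_rate=SAMPLE_RATE,
                       extra_ms=EXTRA_MS, noise_amp=0.1, seed=0):
    """Return a copy of samples with [start_s, end_s + extra_ms] masked.

    The masked stretch keeps its original duration; only its content is
    replaced (here by placeholder white noise, not the hiccup noise).
    """
    rng = random.Random(seed)
    start = int(round(start_s * sample_rate))
    end = int(round((end_s + extra_ms / 1000.0) * sample_rate))
    end = min(end, len(samples))  # never run past the end of the signal
    masked = list(samples)
    for i in range(start, end):
        masked[i] = rng.uniform(-noise_amp, noise_amp)
    return masked


# Hypothetical 100 ms silent signal with a consonant marked at 20-40 ms.
signal = [0.0] * 1100
masked = replace_with_noise(signal, 0.020, 0.040)
print(len(masked) == len(signal))  # the overall duration is unchanged
```

Because the noise is written over the cut-off part sample by sample, the total duration of the word stays the same, which matches the requirement that the noise have the same duration as the part it replaces.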

Figure 4. The marked part designates the initial consonant /t/ in /ta51 35/. In order to further eliminate the quality of /t/, the hiccup noise replaces the part starting from the first dotted line to 13 milliseconds after the second dotted line.

Figure 4 marks the initial consonant /t/ in /ta51 35/ ‘university’ (大學). The first red dotted line, situated at the start of the vibration on the waveform, designates the start of the initial consonant /t/. The second red dotted line, located at the first relatively dark vertical striation on the spectrogram, marks the end of the initial consonant /t/. The space between the two red dotted lines is the initial consonant /t/.

3.3.2.2 Segmentation of prenuclear glides

There were two issues in the segmentation of the prenuclear glide. One was where the boundary between the initial consonant and the prenuclear glide is; the other was where the boundary between the prenuclear glide and the vowel is. The boundary between

the initial consonant and the prenuclear glide could generally be defined by the waveform and the spectrogram. Nevertheless, in order to fully eliminate the quality of the prenuclear glide, the starting point of the cut-off part was situated slightly before the beginning of the intense vibration on the waveform or the first relatively dark vertical striation on the spectrogram. The boundary between the prenuclear glide and the vowel is relatively difficult to define. Chang (2009) investigated the vowels in Taiwan Mandarin acoustically. Chang’s principle for analyzing the vowel quality of a diphthong was to examine the energy and the comparatively steady formants. In the spectrogram, the darker area represents stronger energy; the lighter, weaker energy. Vowels usually have stronger energy than prenuclear glides. In addition, the formants of the vowel are steady compared with those of the prenuclear glide. Therefore, Chang (2009) proposed that the main vowel of a diphthong should be the section having both the strongest energy and the comparatively steady formants. In this study, following Chang’s (2009) methods, the boundary between the prenuclear glide and the vowel could be decided, as illustrated in Figure 5.
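The criterion of strong energy together with comparatively steady formants can be sketched as a frame-by-frame search. The example is a simplification and is not Chang’s (2009) procedure: it tracks only F2, and the thresholds (a maximum F2 change of 50 Hz between frames and a minimum of 80% of peak energy), the function name, and the sample values are all assumptions introduced for illustration.

```python
def vowel_onset_frame(energy, f2_hz, max_f2_step_hz=50.0,
                      min_energy_ratio=0.8):
    """Return the index of the first frame that is both strong and steady.

    energy and f2_hz are per-frame measurements of equal length. A frame
    counts as steady when F2 moved little since the previous frame, and as
    strong when its energy is close to the peak energy of the syllable.
    """
    peak = max(energy)
    for i in range(1, len(energy)):
        steady = abs(f2_hz[i] - f2_hz[i - 1]) <= max_f2_step_hz
        strong = energy[i] >= min_energy_ratio * peak
        if steady and strong:
            return i
    return None  # no frame satisfies both conditions


# Hypothetical glide-to-vowel transition: F2 falls, then levels off.
energy = [0.2, 0.4, 0.7, 0.9, 1.0, 1.0]
f2_hz = [2100.0, 1900.0, 1700.0, 1550.0, 1520.0, 1510.0]
print(vowel_onset_frame(energy, f2_hz))
```

In practice the boundaries in this study were placed by visual inspection of the spectrogram; the sketch only shows how the two conditions combine to pick out the start of the main vowel.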

Figure 5. The marked part designates the prenuclear glide / / in /ta51 35/. The part between the two dotted lines is then replaced by the hiccup noise.

Figure 5 marks the prenuclear glide / / in /ta51 35/ ‘university’ (大學). The first red dotted line, situated at the first relatively dark vertical striation on the spectrogram, designates the start of the prenuclear glide / /. The second red dotted line, located at the start of the comparatively steady formants on the spectrogram, marks the end of the prenuclear glide / /. The space between the two red dotted lines is the prenuclear glide / /. After the starting point and the ending point of the prenuclear glide were determined, the prenuclear glide was replaced by the hiccup noise, which had the same duration as the prenuclear glide.

3.3.2.3 Segmentation of vowels

The vowel in Mandarin is preceded either by the initial consonant or by the prenuclear glide. The ways to distinguish the initial consonant and the prenuclear glide from the vowel have already been discussed above. What has yet to be discussed is how to distinguish the vowel from the postnuclear glide or final nasal. The ways to distinguish the vowel from the postnuclear glide were similar to the ways to distinguish the vowel from the prenuclear glide. The relatively dark area represents the strong-energy section, which designates the position of the main vowel. The comparatively light area represents the weak-energy section, which shows the position of the postnuclear glide. Furthermore, the comparatively steady formants can also designate the position of the main vowel.

The vowel in Mandarin can also be followed by a final nasal. The way to distinguish the vowel from the final nasal was based on the acoustic cues proposed by Ladefoged (2006). Four acoustic cues were used in this study to differentiate the vowel from the final nasal. First, a clear mark of a nasal consonant is a sharp change in the spectrogram at the time of the formation of the articulatory closure. Second, the bands of the nasal are fainter than those of the vowel. Third, the first formant of the nasal consonant is usually very low, centered at about 250 Hz. Fourth, there is a large region above the first formant with no energy. According to the acoustic

cues mentioned above, the vowel and the final nasal can be distinguished. After the starting point and the ending point of the vowel were determined, the vowel was replaced by the hiccup noise, as illustrated in Figure 6.

Figure 6. The marked part designates the vowel /a/ in /taŋ21 tʰwan35/. The part between the two dotted lines is then replaced by the hiccup noise.

Figure 6 marks the vowel /a/ in /taŋ21 tʰwan35/ ‘political party’ (黨團). The first red dotted line, situated at the first relatively dark vertical striation on the spectrogram, designates the start of the vowel /a/. The second red dotted line, located at the sharp change on the spectrogram, marks the start of the final nasal /ŋ/. The space between the two red dotted lines is the vowel /a/.

3.3.2.4 Segmentation of postnuclear glides and final nasals

The ways of designating the boundary between the postnuclear glide and the preceding vowel, as well as between the final nasal and the preceding vowel, have already been addressed above. The ending points of the postnuclear glide and the final nasal were designated in the same way, namely, as the point where the waveform shows no vibration and the spectrogram displays no energy, as illustrated in Figure 7. After the starting point and the ending point of the postnuclear glide or the final nasal were determined, the section occupied by the postnuclear glide or the final nasal was cut off and replaced by the hiccup noise with the same duration.

Figure 7. The marked part designates the final nasal /n/ in /xwan35 t in51/. The first dotted line displays the beginning of /n/; the second dotted line shows the end of /n/. This marked part is then replaced by the hiccup noise.

Figure 7 marked the final nasal /n/ in /xwan35 t in51/ ‘the environment’ (環境). The first red dotted line, which was situated at the sharp change on the spectrogram, designated the start of the final nasal /n/. The second red dotted line, which was located at the point where the waveform showed no vibration and the spectrogram showed no energy, marked the end of the final nasal /n/. The space between the two red dotted lines was the final nasal /n/.

3.3.3 Leveling of tones

The tones of the 48 stimuli, including both high-frequency and low-frequency words, were leveled, which means that the pitch contours of the disyllabic words disappeared, resulting in robot-like sounds. The F0 of the stimuli centered around 100 Hz, as illustrated in Figure 8.

Figure 8. The test item 中心 (/t oŋ55 in55/, ‘center’) whose tones are leveled at around 100 Hz.

In this figure, the original tones of the test item 中心 (/t oŋ55 in55/, ‘center’) were designated by the two gray dotted lines. The green straight line marked the manipulated tones, centered at 100 Hz.
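What leveling does to a pitch contour can be illustrated with a minimal Python sketch. The text does not state which tool or resynthesis method was used, so the frame-by-frame representation, the function name `level_f0`, and the example values below are assumptions; the sketch covers only the contour manipulation, not the resynthesis of the audio.

```python
def level_f0(f0_track, target_hz=100.0):
    """Flatten a frame-by-frame F0 track to one constant value.

    Voiced frames (an F0 value in Hz) are set to `target_hz`;
    unvoiced frames (None) are left unvoiced.
    """
    if target_hz <= 0:
        raise ValueError("target_hz must be positive")
    return [None if f0 is None else target_hz for f0 in f0_track]


# Hypothetical contour: a high level tone followed by a falling tone,
# separated by two unvoiced frames.
original = [220.0, 221.0, 219.0, None, None, 240.0, 200.0, 160.0]
leveled = level_f0(original)
```

Only the voiced frames change, so the timing of the word and its segmental content stay as recorded while the tonal contrast between the two syllables is removed.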

