CHAPTER II. LITERATURE REVIEW
2.6 Summary
國立政治大學 (National Chengchi University)
Previous studies have widely applied the acoustic approach to the investigation of speech sounds in the world's languages. Some studies have examined the acoustic properties of the speech sounds of Hakka dialects, but most of this acoustic research focuses on the phonological system of the Si-xian dialect.
Since scarcely any studies focus on the acoustics of Hai-lu Hakka segments, the aim of the present study is to investigate the acoustic nature of Hai-lu Hakka vowels and to provide reference material for further research on speech cognition, production, and perception, on the pathological diagnosis and correction of speech disorders, and on Hakka language teaching.
CHAPTER III
METHODOLOGY
3.1 Subjects
A total of six subjects were invited to participate in the study. The subjects were aged from fifty-three to seventy-eight. Three of them were male and three were female. All of them are Hakka, were brought up in Hakka families, and speak Hai-lu Hakka as their mother tongue. Their birthplaces and places of residence are within Xinpu town, Hsinchu County. As for their language background, all of the subjects are fluent in Hai-lu Hakka, and one female subject is also fluent in Si-xian Hakka. All of the subjects speak Taiwan Mandarin fluently. Most of them understand Taiwanese Southern Min but cannot speak it fluently.
Four of the subjects learned some English in school, but most consider themselves not fluent by self-evaluation. For the detailed data and language background of the subjects, please see Appendix I.
3.2 Materials
This section presents the materials used in the present study, including the design of the testing words and the equipment.
3.2.1 Design of testing words
There were 32 test items in total in this study. The 32 test items cover five types of syllable structure: CV, CVV, CVVV, CVC, and CVVC. During the experiment, the testing words were produced in two forms: the citation form and the sentence form. In the citation form, the test words were produced as individual syllables with an interval of several seconds between syllables. In the sentence form, the test words were produced in the short carrier sentence [tshiaŋ24 ŋiam33 tshut5 ___ lia24 kai11 sɨ33 loi55] (Please say ____ this word), with intervals between the words.
Almost all of the testing words have the tone value [55], except for the word [nie31] (meaning, ‘蟻’ ‘ants’). Tone [55] was chosen for consistency and because it is the most common tone value, with the largest number of possible syllables in the lexicon. With this tone there are more possible syllables that cover a greater variety of segment combinations in the syllable final and that can also be written in Chinese characters. The same tone value was selected for the test items because studies such as Howie (1976), Tsao & Yang (1984), and Hoole & Hu (2004) found that tones influence vowel quality. For example, Howie (1976) found that syllables with high tones have higher vowel quality.
For the detailed test words involved in this study, please see Appendix II.
3.2.2 Equipment
This section introduces the equipment used in the present study. The sub-sections cover the stimuli, the recording apparatus, the acoustic analysis apparatus, and the statistical and vowel space plotting software.
3.2.2.1 Stimuli
Testing word cards were used as the stimuli for this study. The cards were made of paper and measured 6 cm × 12 cm. The characters of the testing words, and the carrier sentences containing the testing words, were printed in black 72-pt New Ximing font in the center of the cards. Since there were three practice cards for the citation section, three for the sentence section, and 192 tokens for each subject, a total of 205 word cards were employed in this study.
3.2.2.2 Recording apparatus
The recording apparatus used in the present study was KAY Electronics’ CSL 4100 speech analysis package, which was provided by the phonetic and psycholinguistic laboratory of the Graduate Institute of Linguistics, National Chengchi University. One advantage of using the CSL 4100 package to record speech is that it converts analog signals into digital signals with little distortion, which makes it more convenient to edit, store, and analyze the speech sounds on a computer. During the recording, the microphone was placed at a short distance from the subject’s mouth, so that the sounds could be recorded clearly while the subject remained comfortable. The whole recording session was stored as PCM, 11.025 kHz, 16-bit, monaural WAV sound files.
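The storage format described above can be checked programmatically. The following is a minimal sketch, assuming Python's standard `wave` module rather than the CSL 4100 software used in the study; the helper names `write_silence` and `describe_wav` are hypothetical. It writes a short silent file with the stated parameters and reads them back. At a sampling rate of 11.025 kHz the highest representable frequency (the Nyquist frequency) is 5512.5 Hz, which lies above the F1 and F2 values reported in this chapter.

```python
import io
import wave

SAMPLE_RATE_HZ = 11025   # 11.025 kHz, as stated in the text
SAMPLE_WIDTH_BYTES = 2   # 16 bits per sample
CHANNELS = 1             # monaural


def write_silence(buffer, seconds=0.5):
    """Write `seconds` of silence as PCM WAV with the stated parameters."""
    n_frames = int(SAMPLE_RATE_HZ * seconds)
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(b"\x00" * (n_frames * SAMPLE_WIDTH_BYTES * CHANNELS))


def describe_wav(buffer):
    """Return the format parameters of a WAV file as a dictionary."""
    with wave.open(buffer, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bits": wav.getsampwidth() * 8,
            "sample_rate_hz": rate,
            "duration_s": wav.getnframes() / rate,
            "nyquist_hz": rate / 2,
        }


# An in-memory buffer stands in for a recording file on disk (assumption).
buffer = io.BytesIO()
write_silence(buffer)
buffer.seek(0)
info = describe_wav(buffer)
print(info)
```

The same `describe_wav` helper could be pointed at an actual recording opened in binary mode to confirm that it matches the format given in the text.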
3.2.2.3 Acoustic analysis apparatus
The apparatus used in this study to analyze the recorded files was also KAY Electronics’ CSL 4100 speech analysis package. The package has been widely used for acoustic analysis in the fields of acoustics, audiology, speech pathology, and acoustic phonetics because it makes editing, storing, and analyzing speech convenient.
3.2.2.4 Statistical software and vowel space plotting software
The statistical data were processed with SPSS 11.0 (Statistical Package for the Social Sciences, version 11.0), released by SPSS Inc. SPSS is one of the most widely used programs for statistical analysis in the social sciences. The analysis method used in this study was descriptive statistics. Most of the vowel spaces and formant plots were plotted with the software Origin 6.0.
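Descriptive statistics of this kind amount to per-vowel counts, means, and standard deviations of the formant values. The sketch below uses Python's standard `statistics` module as a stand-in for SPSS 11.0; the function name `summarize` is hypothetical, and the input consists only of the example measurements of [i] and [a] given in Section 3.3.2, used for illustration rather than as the full data set.

```python
from statistics import mean, stdev

# (vowel, F1 in Hz, F2 in Hz), taken from the worked examples in Section 3.3.2
measurements = [
    ("i", 245.00, 2820.00),
    ("i", 560.81, 2421.68),
    ("i", 676.67, 2100.00),
    ("i", 560.00, 2006.67),
    ("a", 917.69, 1962.83),
    ("a", 863.33, 1656.00),
    ("a", 956.67, 1656.00),
]


def summarize(rows):
    """Group formant values by vowel and compute n, mean, and sample SD."""
    by_vowel = {}
    for vowel, f1, f2 in rows:
        formants = by_vowel.setdefault(vowel, {"F1": [], "F2": []})
        formants["F1"].append(f1)
        formants["F2"].append(f2)
    return {
        vowel: {
            name: {"n": len(values), "mean": mean(values), "sd": stdev(values)}
            for name, values in formants.items()
        }
        for vowel, formants in by_vowel.items()
    }


summary = summarize(measurements)
for vowel, formants in summary.items():
    for name, stats in formants.items():
        print(f"[{vowel}] {name}: n={stats['n']} "
              f"mean={stats['mean']:.2f} Hz sd={stats['sd']:.2f} Hz")
```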
3.3 Procedures
This section introduces the procedures of this study. Section 3.3.1 describes the recording procedure, and Section 3.3.2 presents the acoustic measurement of the vowels after the data were collected.
3.3.1 Recording
The subject was seated in a quiet room, and the recording apparatus was placed on a table in front of the subject. Before the recording began, the subject filled in a personal data form and received instructions. During the instruction session, the subjects were informed of the requirements and of the points to note during the recording. For the personal data form for the subjects, please see Appendix III.
Each of the 32 words was printed on a card. A microphone was placed on the table in front of the subject’s mouth, at a distance of approximately 15 cm. The recording system consisted of an IBM ThinkPad X200s laptop computer and a general-purpose external microphone. The subjects were asked to read the words on the cards aloud at a normal speaking rate. If any word was unfamiliar, the experimenter explained the word or asked the subject to try to read it with the help of its phonetic transcription. The experimenter did not model the pronunciation.
After the instruction session, the recording was divided into two sections: a citation form section and a sentence form section. Before each section, three practice word cards in the form of that section were given to simulate the recording process. During each section, three piles of word cards were given to the subjects, since three tokens of each testing word were required and consequently three rounds of reading were needed. To keep the reading speed at a similar pace, the word cards were presented to the subjects one by one by the experimenter rather than handled by the subjects themselves.
The subjects were asked to read the testing words at the speed of their ordinary conversation. The order of the word cards was randomized in each round to prevent any patterns created by the order of the testing words. There were short breaks between sections for the subjects to rest and for the experimenter to set up the recording instrument.
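The presentation scheme described above (two sections, three rounds per section, and a freshly randomized card order in each round) can be sketched as follows. This is an illustration in Python, not the procedure actually used to shuffle the physical cards; the function name `build_rounds`, the placeholder item labels, and the fixed random seed are assumptions.

```python
import random

N_ITEMS = 32                         # test items in the study
N_ROUNDS = 3                         # three tokens per testing word
SECTIONS = ("citation", "sentence")  # the two recording sections


def build_rounds(items, n_rounds, rng):
    """Return n_rounds independently shuffled copies of the item list."""
    rounds = []
    for _ in range(n_rounds):
        order = list(items)   # copy so the master list is left untouched
        rng.shuffle(order)    # new random order for every round
        rounds.append(order)
    return rounds


# Placeholder labels; the real items are listed in Appendix II.
items = [f"item_{i:02d}" for i in range(1, N_ITEMS + 1)]
rng = random.Random(0)  # fixed seed only to make the sketch reproducible

schedule = {section: build_rounds(items, N_ROUNDS, rng) for section in SECTIONS}
tokens_per_subject = sum(
    len(order) for rounds in schedule.values() for order in rounds
)
print(tokens_per_subject)  # 192 = 32 items x 3 rounds x 2 sections
```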
3.3.2 Acoustic measurements
This section introduces the acoustic measurement of the five types of syllable structure: CV, CVV, CVVV, CVC, and CVVC. All the acoustic measurements were made with KAY CSL 4100.
3.3.2.1 Acoustic measurement of CV
Figure 2.1 The spectrogram of [tʰi55] (meaning, ‘提’ ‘lift’)
Table 2.1 Acoustic measurement of [tʰi55] (meaning, ‘提’ ‘lift’)
              [tʰ]         [i]
Start time    0.1567 s     0.29887 s
End time      0.28953 s    0.58017 s
F1                         245 Hz
F2                         2820 Hz
In the CV syllable structure, the word [tʰi55] (meaning, ‘提’ ‘lift’) is chosen as an example for acoustic analysis. The spectrogram of [tʰi55] is shown in Figure 2.1. [tʰi55] consists of the aspirated voiceless alveolar stop [tʰ] and the high front vowel [i]. We can identify the stop [tʰ] on the spectrogram by the burst between 0.1567 s and 0.28953 s. The stop [tʰ] is marked by the rectangle on the left of Figure 2.1.
Between 0.28953 s and 0.29887 s, the sound is a combination of the stop [tʰ] and the high front vowel [i], so this part is excluded from our analysis of the high front vowel [i]. The high front vowel [i] starts at about 0.29887 s and ends at 0.58017 s. Once the vowel is identified on the spectrogram, we move the cursor on the left to the steady state of the vowel. The arrows show where the first formant and the second formant of the high front vowel [i] are. The F1 of the high front vowel [i] is about 245 Hz, and the F2 is about 2820 Hz.
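The segmentation in Table 2.1 can be expressed as a small data structure from which the segment durations follow directly. The sketch below is in Python; the `Segment` class is hypothetical, the time values are assumed to be in seconds, and taking the temporal midpoint of the vowel as a stand-in for its steady state is an assumption of this illustration (in the study the steady state was located manually with the cursor on the spectrogram).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A labelled interval on the time axis of the spectrogram."""
    label: str
    start: float  # assumed to be in seconds
    end: float    # assumed to be in seconds

    @property
    def duration(self) -> float:
        return self.end - self.start

    @property
    def midpoint(self) -> float:
        # Assumption: the temporal midpoint approximates the steady state.
        return (self.start + self.end) / 2


# Values from Table 2.1 for [tʰi55]
stop = Segment("tʰ", 0.1567, 0.28953)
vowel = Segment("i", 0.29887, 0.58017)

# Interval between the stop and the vowel, excluded from the vowel analysis
transition = vowel.start - stop.end

print(f"[{stop.label}] duration:  {stop.duration:.5f}")
print(f"transition:     {transition:.5f}")
print(f"[{vowel.label}] duration:  {vowel.duration:.5f}")
print(f"[{vowel.label}] midpoint:  {vowel.midpoint:.5f}")
```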
3.3.2.2 Acoustic Measurement of CVV
Figure 2.2 The spectrogram of [kʰia55] (meaning, ‘佢’ ‘his’)
Table 2.2 Acoustic measurement of [kʰia55] (meaning, ‘佢’ ‘his’)
              [kʰ]         [i]           [a]
Start time    0.45760 s    0.51471 s     0.65157 s
End time      0.49855 s    0.64295 s     0.73885 s
F1                         560.81 Hz     917.69 Hz
F2                         2421.68 Hz    1962.83 Hz
In the CVV syllable structure, the word [kʰia55] (meaning, ‘佢’ ‘his’) is chosen as an example for acoustic analysis. The spectrogram of [kʰia55] is shown in Figure 2.2. [kʰia55] consists of the aspirated voiceless velar stop [kʰ], the high front vowel [i], and the low central vowel [a]. We can identify the stop [kʰ] on the spectrogram by the burst between 0.45760 s and 0.49855 s. The stop [kʰ] is marked by the rectangle on the left of Figure 2.2.
Between 0.49855 s and 0.51471 s, the sound is a combination of the stop [kʰ] and the high front vowel [i], so this part is excluded from our analysis of the high front vowel [i]. The high front vowel [i] starts at about 0.51471 s and ends at 0.64295 s. Once the vowel is identified on the spectrogram, we move the cursor on the left to the steady state of the vowel. The two black arrows show where the first formant and the second formant of the high front vowel [i] are. The F1 of the high front vowel [i] is about 560.81 Hz, and the F2 is about 2421.68 Hz.
Between 0.64295 s and 0.65157 s, there is a transition between the high front vowel [i] and the low central vowel [a], so this part is excluded from our analysis of the two vowels. The low central vowel [a] starts at about 0.65157 s and ends at 0.73885 s. Once the vowel is identified on the spectrogram, we move the cursor on the left to the steady state of the vowel. The two white arrows show where the first formant and the second formant of the low central vowel [a] are. The F1 of the low central vowel [a] is about 917.69 Hz, and the F2 is about 1962.83 Hz.
3.3.2.3 Acoustic measurement of CVVV
Figure 2.3 The spectrogram of [tʰiau55] (meaning, ‘條’ ‘a long narrow strip’)
Table 2.3 Acoustic measurement of [tʰiau55] (meaning, ‘條’ ‘a long narrow strip’)
              [tʰ]         [i]           [a]           [u]
Start time    0.41672 s    0.58200 s     0.65913 s     0.72524 s
End time      0.56397 s    0.63408 s     0.71622 s     0.91155 s
F1                         676.67 Hz     863.33 Hz     700.00 Hz
F2                         2100.00 Hz    1656.00 Hz    933.33 Hz
In the CVVV syllable structure, the word [tʰiau55] (meaning, ‘條’ ‘a long narrow strip’) is chosen as an example for acoustic analysis. The spectrogram of [tʰiau55] is shown in Figure 2.3. The syllable [tʰiau55] consists of the aspirated voiceless alveolar stop [tʰ], the high front vowel [i], the low central vowel [a], and the high back rounded vowel [u]. We can identify the stop [tʰ] on the spectrogram by the burst between 0.41672 s and 0.56397 s. The stop [tʰ] is marked by the rectangle on the left of Figure 2.3.
Between 0.56397 s and 0.58200 s, the sound is a combination of the stop [tʰ] and the high front vowel [i], so this part is excluded from our analysis of the high front vowel [i]. The high front vowel [i] starts at about 0.58200 s and ends at 0.63408 s. Once the vowel is identified on the spectrogram, we move the cursor on the left to the steady state of the vowel. The two black arrows show where the first formant and the second formant of the high front vowel [i] are. The F1 of the high front vowel [i] is about 676.67 Hz, and the F2 is about 2100.00 Hz.
Between 0.63408 s and 0.65913 s, there is a transition between the high front vowel [i] and the low central vowel [a], so this part is excluded from our analysis of the two vowels. The low central vowel [a] starts at about 0.65913 s and ends at 0.71622 s. Once the vowel is identified on the spectrogram, we move the cursor on the left to the steady state of the vowel. The two white arrows show where the first formant and the second formant of the low central vowel [a] are. The F1 of the low central vowel [a] is about 863.33 Hz, and the F2 is about 1656.00 Hz.
Between 0.71622 s and 0.72524 s, there is a transition between the low central vowel [a] and the high back rounded vowel [u], so this part is excluded from our analysis of the two vowels. The high back rounded vowel [u] starts at about 0.72524 s and ends at 0.91155 s. Once the vowel is identified on the spectrogram, we move the cursor on the left to the steady state of the vowel. The two arrows with dashed lines show where the first formant and the second formant of the high back rounded vowel [u] are. The F1 of the high back rounded vowel [u] is about 700.00 Hz, and the F2 is about 933.33 Hz.
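For a syllable with several segments, the excluded transitions are simply the gaps between the end of one labelled segment and the start of the next. The following Python sketch derives them from the values in Table 2.3; the function name `excluded_transitions` is hypothetical, and the time values are assumed to be in seconds.

```python
# (label, start, end) for [tʰiau55], from Table 2.3; times assumed in seconds
segments = [
    ("tʰ", 0.41672, 0.56397),
    ("i", 0.58200, 0.63408),
    ("a", 0.65913, 0.71622),
    ("u", 0.72524, 0.91155),
]


def excluded_transitions(segs):
    """Return (name, start, end) for each gap between adjacent segments."""
    gaps = []
    for (left, _, left_end), (right, right_start, _) in zip(segs, segs[1:]):
        if right_start < left_end:
            raise ValueError(f"segments {left} and {right} overlap")
        gaps.append((f"{left}-{right}", left_end, right_start))
    return gaps


gaps = excluded_transitions(segments)
for name, start, end in gaps:
    print(f"{name}: {start:.5f} to {end:.5f} (length {end - start:.5f})")
```

Checking that each segment starts no earlier than the previous one ends also catches transcription slips in the boundary values.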
3.3.2.4 Acoustic measurement of CVC
Figure 2.4 The spectrogram of [tʰot55] (meaning, ‘脫’ ‘get off’)
Table 2.4 Acoustic measurement of [tʰot55] (meaning, ‘脫’ ‘get off’)
              [tʰ]         [o]          [t]
Start time    0.34820 s    0.46966 s    0.56728 s
End time      0.46058 s    0.55706 s    0.63652 s
F1                         256.67 Hz
F2                         700 Hz
In the CVC syllable structure, the word [tʰot55] (meaning, ‘脫’ ‘get off’) is chosen as an example for acoustic analysis. The spectrogram of [tʰot55] is shown in Figure 2.4. The syllable [tʰot55] consists of the aspirated voiceless alveolar stop [tʰ], the mid back rounded vowel [o], and the unaspirated voiceless alveolar stop [t]. We can identify the stop [tʰ] on the spectrogram by the burst between 0.34820 s and 0.46058 s. The stop [tʰ] is marked by the rectangle on the left of Figure 2.4.
Between 0.46058 s and 0.46966 s, the sound is a combination of the stop [tʰ] and the mid back rounded vowel [o], so this part is excluded from our analysis of the mid back rounded vowel [o]. The mid back rounded vowel [o] starts at about 0.46966 s and ends at 0.55706 s. Once the vowel is identified on the spectrogram, we move the red cursor on the left to the steady state of the vowel. The two black arrows show where the first formant and the second formant of the mid back rounded vowel [o] are. The F1 of the mid back rounded vowel [o] is about 256.67 Hz, and the F2 is about 700 Hz.
Between 0.55706 s and 0.56728 s, there is a transition between the mid back rounded vowel [o] and the unaspirated voiceless alveolar stop [t], so this part is excluded from our analysis of the vowel. The unaspirated voiceless alveolar stop [t] starts at about 0.56728 s and ends at 0.63652 s.
3.3.2.5 Acoustic measurement of CVVC
Figure 2.5 The spectrogram of [tʰiap55] (meaning, ‘帖’ ‘a handwritten copy’)
Table 2.5 Acoustic measurement of [tʰiap55] (meaning, ‘帖’ ‘a handwritten copy’)
              [tʰ]         [i]           [a]           [p]
Start time    0.44591 s    0.53339 s     0.60555 s     0.67707 s
End time      0.48986 s    0.59278 s     0.66302 s     0.71411 s
F1                         560.00 Hz     956.67 Hz
F2                         2006.67 Hz    1656.00 Hz
In the CVVC syllable structure, the word [tʰiap55] (meaning, ‘帖’ ‘a handwritten copy’) is chosen as an example for acoustic analysis. The spectrogram of [tʰiap55] is shown in Figure 2.5. The syllable [tʰiap55] consists of the aspirated voiceless alveolar stop [tʰ], the high front vowel [i], the low central vowel [a], and the unaspirated voiceless bilabial stop [p]. We can identify the stop [tʰ] on the spectrogram by the burst between 0.44591 s and 0.48986 s. The stop [tʰ] is marked by the rectangle on the left of Figure 2.5.
Between 0.48986 s and 0.53339 s, the sound is a combination of the stop [tʰ] and the high front vowel [i], so this part is excluded from our analysis of the high front vowel [i]. The high front vowel [i] starts at about 0.53339 s and ends at 0.59278 s. Once the vowel is identified on the spectrogram, we move the red cursor on the left to the steady state of the vowel. The two black arrows show where the first formant and the second formant of the high front vowel [i] are. The F1 of the high front vowel [i] is about 560.00 Hz, and the F2 is about 2006.67 Hz.
Between 0.59278 s and 0.60555 s, there is a transition between the high front vowel [i] and the low central vowel [a], so this part is excluded from our analysis of the two vowels. The low central vowel [a] starts at about 0.60555 s and ends at 0.66302 s. Once the vowel is identified on the spectrogram, we move the red cursor on the left to the steady state of the vowel. The two white arrows show where the first formant and the second formant of the low central vowel [a] are. The F1 of the low central vowel [a] is about 956.67 Hz, and the F2 is about 1656.00 Hz.
Between 0.66302 s and 0.67707 s, there is a transition between the low central vowel [a] and the unaspirated voiceless bilabial stop [p], so this part is excluded from our analysis of the vowel. The unaspirated voiceless bilabial stop [p] starts at about 0.67707 s and ends at 0.71411 s.
CHAPTER IV
FINDINGS AND DISCUSSIONS
4.1 Overview of the data
In this chapter, we present the results of the study. Among the 32 test words, one test word, [tsʰioi55], appears to be no longer spoken as transcribed in the Hakka dictionary: all six subjects pronounced it as [tsʰoi55] instead, and thus this test word is excluded from the present study. Besides [tsʰioi55], another test word, [iai55], could be pronounced correctly by only two of the six subjects.
Excluding the test word [tsʰioi55], there are 31 test words, so each subject produced 186 tokens (31 × 3 × 2), and we expect 1116 acoustic data (31 test words × 3 repetitions × 2 conditions × 6 speakers). However, the test word [iai55] could be pronounced correctly by only two of the six subjects; the other four subjects pronounced [ʒai55] instead. We therefore subtract 24 acoustic data (1 test word × 3 repetitions × 2 conditions × 4 speakers), which leaves 1092 recording data under study.
After the recording was done, all the recording data were analyzed with KAY CSL 4100. Among the 1092 recording data, 6 were blurred and unreadable to the software, so they were also excluded from the present analysis. In total, therefore, 1086 reliable acoustic data are under study.
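The token counts above follow from simple multiplication and subtraction, which the short Python sketch below reproduces. The variable names are illustrative only; the numbers are those stated in the text.

```python
TEST_WORDS = 31   # 32 items minus the excluded [tsʰioi55]
REPETITIONS = 3   # three tokens per word
CONDITIONS = 2    # citation form and sentence form
SPEAKERS = 6

tokens_per_speaker = TEST_WORDS * REPETITIONS * CONDITIONS
expected_total = tokens_per_speaker * SPEAKERS

# [iai55] was produced as [ʒai55] by four of the six speakers.
mispronounced = 1 * REPETITIONS * CONDITIONS * 4

# Tokens too blurred for the analysis software to read.
unreadable = 6

after_mispronunciation = expected_total - mispronounced
reliable_total = after_mispronunciation - unreadable

print(tokens_per_speaker)      # 186
print(expected_total)          # 1116
print(after_mispronunciation)  # 1092
print(reliable_total)          # 1086
```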
After the 1086 acoustic data were analyzed with KAY CSL 4100 and processed with SPSS 11.0, the mean F1 and F2 frequencies of the six monophthongs, eleven diphthongs, and three triphthongs are presented in the following sub-sections.
Section 4.2 presents the acoustic analysis and the spectrograms of the six monophthongs [i], [e], [ɨ], [a], [o], and [u]. Section 4.3 presents the acoustic analysis and the spectrograms of the eleven diphthongs [ie], [ia], [io], [iu], [eu], [ai], [au], [oi], [ui], [ue], and [ua]. Section 4.4 presents the acoustic analysis and the spectrograms of the three triphthongs [iai], [iau], and [uai]. A summary of the findings is provided in Section 4.5.
4.2 Acoustic analysis of monophthongs
This section presents the acoustic analysis of the six monophthongs [i], [e], [ɨ], [a], [o], and [u]. To locate the six monophthongs precisely in a phonetic vowel space, we designed twelve test words that place them in both CV and CVC structures. The twelve test words are [tʰi55], [kʰe55], [tsʰɨ55], [kʰa55], [tʰo55], [tʰu55], [tit55], [tet55], [ʒɨt55], [kat55], [tʰot55], and [kut].
The mean F1 and F2 values (Hz) of the six monophthongs of Hai-lu Hakka for all subjects are presented in Section 4.2.1, and the spectrogram analysis of the six monophthongs is provided in Section 4.2.2.
4.2.1 Formant frequencies of monophthongs
This section gives the acoustic data analysis of the six monophthongs. After the acoustic data were analyzed with KAY CSL 4100, the mean F1 and F2 frequencies of the six monophthongs from the six subjects are provided in Table 3.1.
Table 3.1 Mean F1 and F2 values (Hz) of 6 monophthongs of Hai-lu Hakka in all subjects
[...] subject 5, and subject 6) are similar to those for the female speakers (subject 1, subject 2, and subject 3), respectively. The overall formant frequency values, particularly the F1 and F2 values, of the vowels for the male speakers are lower than those for the female speakers, as expected. It is due to the differences in