
Chapter 3 Experiments

3.3 Discussion

These experiments show that the proposed system and the applied techniques outperform comparable systems reported in the literature.

In the first set of experiments, we showed that the perception model of Hillenbrand [5] has limited feasibility for this task. In the second set of experiments, we evaluated the classification ability of the neural fuzzy classifier SONFIN against the system proposed by Zahorian (1999) [13], which also used the DCSCs as the feature set but employed partitioned neural networks (PNNs) as the classifier. SONFIN achieved an accuracy of 72.72%, compared with 71.50% for Zahorian's system, an improvement of about 1.2%.

The experimental results demonstrate the strong classification ability of SONFIN. It has a simpler structure and is easier to train than the PNN, yet performs better. The likely reasons are its ability to partition the input/output space and its effective rule inference. Each rule constructed in SONFIN can divide the output space into as many regions as there are output classes, and multiple rules are integrated via fuzzy inference, which partitions the output space carefully and precisely.
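To make this concrete, the sketch below shows how a small set of fuzzy rules with Gaussian membership functions can blend per-rule class scores in proportion to their firing strengths. It is a minimal illustration of fuzzy-inference classification in general, not the actual SONFIN formulation; the rule format, function names, and all numeric values are invented for the example.

```python
import numpy as np

def gaussian_mf(x, center, sigma):
    """Gaussian membership degree, evaluated per input dimension."""
    return np.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def fuzzy_classify(x, rules):
    """Blend per-rule class scores in proportion to rule firing strengths.

    Each rule is a tuple (centers, sigmas, class_scores); its firing
    strength is the product of the per-dimension membership degrees.
    """
    strengths = np.array([np.prod(gaussian_mf(x, c, s)) for c, s, _ in rules])
    strengths = strengths / strengths.sum()      # normalized firing strengths
    blended = sum(w * scores for w, (_, _, scores) in zip(strengths, rules))
    return int(np.argmax(blended))               # index of the winning class

# Two illustrative rules over a 2-D feature space and 2 vowel classes.
rules = [
    (np.array([0.2, 0.8]), np.array([0.3, 0.3]), np.array([0.9, 0.1])),
    (np.array([0.7, 0.1]), np.array([0.3, 0.3]), np.array([0.2, 0.8])),
]
print(fuzzy_classify(np.array([0.25, 0.7]), rules))  # -> 0
```

Because every rule contributes to every class score, the decision boundary is shaped jointly by all rules rather than by a single winning partition, which is one view of why the fuzzy integration partitions the output space so finely.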

The modified feature set proved more representative in experiment set III: the recognition rate reached 74.41%, which is 1.69% higher than that in experiment set II. This result shows that enhancing the acoustic characteristics in the spectral domain is feasible. The acoustic-enhancement procedure emphasizes the spectral harmonics and balances their levels, as suggested by acoustic-phonetic research [5]. The acoustic-checking procedure is applied in experiment set IV, where the recognition rate rises further to 74.75%. These results verify the potential and effectiveness of the proposed system.
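As a rough illustration of the acoustic-enhancement idea, the following sketch emphasizes the harmonic peaks of a log-magnitude spectrum and re-balances its overall level. The smoothing length and envelope weight are arbitrary placeholders, and the moving-average envelope estimate is a schematic stand-in for the actual procedure described in this thesis.

```python
import numpy as np

def enhance_spectrum(mag, smooth_len=9, env_weight=0.5):
    """Illustrative harmonic emphasis plus spectrum-level normalization.

    mag: 1-D magnitude spectrum of one analysis frame.
    """
    log_spec = np.log(mag + 1e-10)
    # Moving average estimates the gross spectral envelope (level trend).
    kernel = np.ones(smooth_len) / smooth_len
    envelope = np.convolve(log_spec, kernel, mode="same")
    # Harmonic emphasis: keep only the local excess over the envelope,
    # which suppresses the dips between harmonics.
    peaks = np.maximum(log_spec - envelope, 0.0)
    # Level normalization: re-center the envelope so that low- and
    # high-frequency regions contribute on a comparable scale.
    balanced = envelope - envelope.mean()
    return peaks + env_weight * balanced
```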

Chapter 4 Conclusion

In this thesis, we develop a more robust speaker-independent automatic recognizer for English vowels. We integrate spectral-shape-based features and acoustic characteristics in our system.

Moreover, several techniques are applied in this work.

First of all, we modify the gross shape of the spectrum, which is then encoded into the feature set of the token. In this phase, we enhance the spectral peaks by reducing the variation between harmonics, and we balance the amplitude differences with a spectrum-level-normalization process. The DCTC is adopted to encode the spectrum with a nonlinear frequency warping that follows the characteristics of human perception. To represent temporal cues, we use the DCSC to encode the trajectory of the spectrum; a suitable time warping can be adjusted to better preserve the information within a finite feature dimension.
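The exact basis functions and warpings follow Zahorian's formulation [13]; the sketch below only illustrates the two-stage encoding idea: a cosine expansion of each frame's log spectrum over a warped frequency axis (DCTC-style), followed by a cosine expansion of each coefficient trajectory over a warped time axis (DCSC-style). All function names, dimensions, and normalizations here are illustrative, not the values used in this work.

```python
import numpy as np

def cosine_basis(n_points, n_coeffs, warp=None):
    """Cosine basis over a (possibly warped) axis normalized to [0, 1]."""
    u = np.linspace(0.0, 1.0, n_points)
    if warp is not None:
        u = warp(u)                      # e.g. a perceptual frequency warp
    return np.cos(np.pi * np.outer(np.arange(n_coeffs), u))

def dctc(log_spectrum, n_coeffs=10, freq_warp=None):
    """Cosine-series coefficients of one frame's log spectrum over a
    warped frequency axis (DCTC-style static features)."""
    basis = cosine_basis(len(log_spectrum), n_coeffs, freq_warp)
    return basis @ log_spectrum / len(log_spectrum)

def dcsc(dctc_frames, n_coeffs=3, time_warp=None):
    """Cosine-series coefficients of each DCTC trajectory over a warped
    time axis (DCSC-style dynamic features)."""
    basis = cosine_basis(dctc_frames.shape[0], n_coeffs, time_warp)
    return basis @ dctc_frames / dctc_frames.shape[0]

# Example: 20 frames of a 128-bin log spectrum -> a 3 x 10 DCSC block.
frames = np.log(np.abs(np.random.randn(20, 128)) + 1e-10)
coeffs = np.stack([dctc(f) for f in frames])   # shape (20, 10)
features = dcsc(coeffs).ravel()                # 30-dim feature vector
```

The appeal of this two-stage encoding is that both the spectral shape and its temporal trajectory are compressed into a fixed, small number of coefficients regardless of the token's duration.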

A neural fuzzy inference classifier called SONFIN is adopted in the proposed system as the main recognizer. SONFIN can construct its optimal structure by itself and can self-adjust its parameters, such as the membership functions and the consequent parameters.
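SONFIN's actual structure- and parameter-learning algorithms are given in [14]; as a rough illustration of the self-constructing idea only, the following sketch shows the kind of firing-strength test that decides when a new rule is created online. The threshold and initial width are arbitrary placeholders.

```python
import numpy as np

def maybe_add_rule(x, rules, fire_threshold=0.3, init_sigma=0.3):
    """One step of online structure learning: create a new rule only when
    no existing rule fires strongly enough on the incoming input x."""
    if rules:
        strengths = [np.prod(np.exp(-((x - c) ** 2) / (2.0 * s ** 2)))
                     for c, s in rules]
        if max(strengths) >= fire_threshold:
            return rules                 # input already covered; no growth
    # Otherwise grow the network: a new rule centered at the novel input.
    rules.append((x.copy(), np.full_like(x, init_sigma)))
    return rules
```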

Experiments showed that SONFIN has a simpler structure yet better performance.

Finally, a formant-checking procedure is performed in the system. This procedure distinguishes ambiguous cases according to their acoustic evidence. If the confidence factor from the SONFIN classification is not high enough, the recognition result is treated as ambiguous, and acoustic cues such as the fundamental frequency and the formant trajectory are evaluated and checked against the vowel models. This procedure provides another view of the token and yields a more accurate recognition result. Experiments based on a popular acoustic-phonetic database were conducted, and the results show that the proposed system performs substantially better.
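As a schematic illustration of this checking step (the threshold, cue set, and distance measure below are placeholders, not the values used in this work), the following sketch falls back to nearest-template matching of acoustic cues whenever the main classifier's confidence is low.

```python
import numpy as np

def check_ambiguous(labels, scores, cues, templates, conf_threshold=0.6):
    """Fall back to acoustic matching when the classifier is unsure.

    labels:    vowel labels aligned with the classifier's score vector.
    scores:    per-class confidence values from the main classifier.
    cues:      measured acoustic cues for the token, e.g. [F0, F1, F2].
    templates: dict mapping each vowel label to a reference cue vector.
    """
    if scores.max() >= conf_threshold:
        return labels[int(np.argmax(scores))]  # confident: keep the result
    # Ambiguous token: choose the vowel whose acoustic template is nearest.
    dists = {v: np.linalg.norm(np.asarray(cues) - t)
             for v, t in templates.items()}
    return min(dists, key=dists.get)
```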

Bibliography

[1] L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition. Englewood Cliffs, NJ: Prentice-Hall, 1993.

[2] A. M. A. Ali, J. van der Spiegel, and P. Mueller, “Acoustic-phonetic features for the automatic classification of stop consonants,” IEEE Trans. Speech and Audio Processing, vol. 9, pp. 833–841, 2001.

[3] G. Peterson and H. L. Barney, “Control methods used in a study of the vowels,” J. Acoust. Soc. Amer., vol. 24, pp. 175–184, 1952.

[4] J. Hillenbrand, L. A. Getty, M. J. Clark, and K. Wheeler, “Acoustic characteristics of American English vowels,” J. Acoust. Soc. Amer., vol. 97, pp. 3099–3111, 1995.

[5] J. Hillenbrand and R. A. Houde, “A narrow band pattern-matching model of vowel perception,” J. Acoust. Soc. Amer., vol. 113, no. 2, pp. 1044–1055, 2003.

[6] L. C. W. Pols, L. J. van der Kamp, and R. Plomp, “Perceptual and physical space of vowel sounds,” J. Acoust. Soc. Amer., vol. 46, pp. 458–467, 1969.

[7] Z. B. Nossair and S. A. Zahorian, “Dynamic spectral shape features as acoustic correlates for initial stop consonants,” J. Acoust. Soc. Amer., vol. 89, pp. 2978–2991, 1991.

[8] Z. B. Nossair, P. L. Silsbee, and S. A. Zahorian, “Signal modeling enhancements for automatic speech recognition,” in Proc. ICASSP’95, pp. 824–827.

[9] L. Rudasi and S. A. Zahorian, “Text-independent talker identification with neural networks,” in Proc. ICASSP’91, pp. 389–392.

[10] L. Rudasi and S. A. Zahorian, “Text-independent speaker identification using binary-pair partitioned neural networks,” in Proc. IJCNN’92, pp. IV:679–684.

[11] S. A. Zahorian and A. J. Jagharghi, “Spectral-shape features versus formants as acoustic correlates for vowels,” J. Acoust. Soc. Amer., vol. 94, pp. 1966–1982, 1993.

[12] S. A. Zahorian, D. Qian, and A. J. Jagharghi, “Acoustic-phonetic transformations for improved speaker-independent isolated word recognition,” in Proc. ICASSP’91, pp. 561–564.

[13] S. A. Zahorian and Z. B. Nossair, “A partitioned neural network approach for vowel classification using smoothed time/frequency features,” IEEE Trans. Speech and Audio Processing, vol. 7, pp. 414–425, 1999.

[14] C. F. Juang and C. T. Lin, “An on-line self-constructing neural fuzzy inference network and its application,” IEEE Trans. Fuzzy Syst., vol. 6, pp. 12–32, 1998.

[15] C. T. Lin and C. S. G. Lee, Neural Fuzzy Systems: A Neural-Fuzzy Synergism to Intelligent Systems. Englewood Cliffs, NJ: Prentice-Hall, 1996.

[16] C. T. Lin, Neural Fuzzy Control Systems with Structure and Parameter Learning. Singapore: World Scientific, 1994.

[17] W. D. Goldenthal, “Statistical trajectory models for phonetic recognition,” Ph.D. dissertation, Mass. Inst. Technol., Cambridge, MA, 1994.

[18] W. D. Goldenthal and J. R. Glass, “Modeling spectral dynamics for vowel classification,” in Proc. EUROSPEECH’93, pp. 289–292.

[19] H. Leung and V. Zue, “Some phonetic recognition experiments using artificial neural nets,” in Proc. ICASSP’88, pp. I:422–425.

[20] H. Leung and V. Zue, “Phonetic classification using multi-layer perceptrons,” in Proc. ICASSP’90, pp. I: 525–528.

[21] H. M. Meng and V. Zue, “Signal representations for phonetic classification,” in Proc. ICASSP’91, pp. 285–288.

[22] H. Gish and K. Ng, “A segmental speech model with applications to word spotting,” in Proc. ICASSP’93, pp. II-447–II-450.

[23] M. S. Phillips, “Speaker independent classification of vowels and diphthongs in continuous speech,” in Proc. 11th Int. Cong. Phonetic Sciences, 1987, vol. 5, pp. 240–243.

[24] R. A. Cole and Y. K. Muthusamy, “Perceptual studies on vowels excised from continuous speech,” in Proc. ICSLP’92, pp. 1091–1094.

[25] B. S. Rosner and J. B. Pickering, Vowel Perception and Production. Oxford, U.K.: Oxford Univ. Press, 1994.

[26] D. H. Klatt, “Prediction of perceived phonetic distance from critical-band spectra: A first step,” in Proc. ICASSP’82, pp. 1278–1281.

[27] S. K. Pal and S. Mitra, “Multilayer perceptron, fuzzy sets, and classification,” IEEE Trans. Neural Networks, vol. 3, pp. 683–697, 1992.

[28] S. Mitra and S. K. Pal, “Fuzzy multilayer perceptron, inferencing and rule generation,” IEEE Trans. Neural Networks, vol. 6, pp. 51–63, 1995.

[29] S. Mitra, R. K. De, and S. K. Pal, “Knowledge-based fuzzy MLP for classification and rule generation,” IEEE Trans. Neural Networks, vol. 8, pp. 1338–1350, 1997.

[30] X. Sun, “Pitch determination and voice quality analysis using subharmonic-to-harmonic ratio,” in Proc. ICASSP’02, vol. 1, pp. 333–336.
