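This thesis enhances emotional speech signals using empirical mode decomposition (EMD) combined with particle swarm optimization (PSO), which is used to train the optimal weights of the intrinsic mode functions (IMFs). Speech features are then computed as Mel-frequency cepstral coefficients (MFCC) and finally recognized by a hidden Markov model (HMM), with the aim of improving the recognition rate of emotional speech. The IMFs produced by EMD carry different proportions of emotional content, and the experimental results show that the amount of emotional content also differs between corpora. As a result, EMD+MFCC cannot effectively separate out the emotional content, and its recognition rate is lower than that of the MFCC method. We therefore combine EMD with PSO, using PSO to find a combination of IMF weights for each corpus; the experiments show that this approach improves the average recognition rate by about 9% to 10% over the MFCC method. The experiments also show that features extracted with the EMDRE and ERBAF methods yield recognition rates far below those of the PEM feature computation method: PEM outperforms EMDRE and ERBAF by 10% to 21%. In summary, enhancing emotional speech signals with EMD, using PSO to find the optimal IMF weight combination for each corpus to synthesize a new emotional speech signal, and then computing features with MFCC is a feasible way to improve the recognition rate of emotional speech.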
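The IMF-weighting step described above can be sketched as a global-best PSO with an inertia weight. This is a minimal sketch, assuming the enhanced signal is the weighted sum x̂(t) = Σ_i w_i·c_i(t) of the IMFs c_i(t), with the EMD residue omitted (an assumption; the thesis text does not say). In the thesis the fitness is the recognition rate obtained from MFCC features and an HMM; training an HMM does not fit in a short example, so the hypothetical `fitness` below instead rewards closeness to a synthetic target signal built from known weights. The search range [0, 2], the swarm size, and the coefficients `w`, `c1`, `c2` are illustrative assumptions, not values from the thesis.

```python
import math
import random


def recombine(imfs, weights):
    """Weighted IMF sum: x_hat[t] = sum_i weights[i] * imfs[i][t] (residue omitted, assumed)."""
    return [sum(w * imf[t] for w, imf in zip(weights, imfs)) for t in range(len(imfs[0]))]


def pso(fitness, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5,
        lo=0.0, hi=2.0, seed=0):
    """Global-best PSO with inertia weight w; maximizes fitness over [lo, hi]^dim."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                     # best position seen by each particle
    pbest_val = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]    # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp the weight to the search range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical stand-in data: three "IMFs" as sinusoids, target built from known weights.
N = 200
imfs = [[math.sin(2 * math.pi * k * t / N) for t in range(N)] for k in (5, 13, 29)]
target = recombine(imfs, [1.0, 0.5, 0.0])


def fitness(weights):
    """Hypothetical fitness: negative mean squared error to the target signal.
    In the thesis this role is played by the HMM recognition rate."""
    est = recombine(imfs, weights)
    return -sum((a - b) ** 2 for a, b in zip(est, target)) / N


best, best_val = pso(fitness, dim=3)
print([round(v, 3) for v in best], best_val)
```

Swapping `fitness` for a function that recombines each utterance's IMFs, extracts MFCCs, and returns the recognition rate on held-out data would match the per-corpus weight search described above; each fitness call would then cost a full train-and-test cycle, so swarm size and iteration count become the main budget.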
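In the future, our laboratory hopes to apply the proposed method to other fields, or to refine it to further improve the recognition rate of emotional speech. Future research directions are as follows: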
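1. Because EMD is computationally expensive, it is not practical for real-time use on intelligent systems with limited resources. We hope that methods for speeding up EMD will become available.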
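2. In the EMD experiments of this thesis, all emotional speech was trained and recognized with a single set of IMF weights. Future work could use a separate set of IMF weights for each emotion to obtain better recognition rates.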
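3. We are not yet familiar with the HTK environment and could not fine-tune its parameters. In the future, we hope to study the HTK toolkit further, learn its parameter settings, and complete the program structure for applying the PEM method within HTK, thereby improving the recognition rate obtained with HTK.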
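4. In recent years, biomedical signals have been used in the biomedical field to recognize emotional states, but these studies have not adopted the EMD method. Combining EMD with biomedical signals could therefore improve emotion recognition rates, and we look forward to research in this direction.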
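Item 1 above attributes the main obstacle to EMD's computation time. The simplified sketch below shows where that cost comes from: each IMF is extracted by repeated sifting, and every sifting pass locates all local extrema and interpolates upper and lower envelopes over the whole signal, so the work grows roughly with (number of IMFs) × (sifting passes) × (signal length). For brevity it assumes linear interpolation (`numpy.interp`) in place of the cubic splines of standard EMD, holds each envelope constant beyond the outermost extrema, and stops sifting with a simplified SD-style criterion; the threshold `tol` and the limits `max_iter` and `max_imfs` are illustrative assumptions. It is not the EMD implementation used in this thesis.

```python
import numpy as np


def local_extrema(x):
    """Indices of strict interior local maxima and minima of a 1-D array."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima


def envelope(x, idx):
    """Envelope through x[idx]; linear interpolation stands in for cubic splines.

    np.interp holds the value constant outside [idx[0], idx[-1]],
    a crude boundary treatment (assumption).
    """
    return np.interp(np.arange(len(x)), idx, x[idx])


def sift_one_imf(x, max_iter=50, tol=0.05):
    """Extract one IMF candidate by repeated sifting; returns (imf, passes)."""
    h = x.copy()
    passes = 0
    for _ in range(max_iter):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        mean_env = 0.5 * (envelope(h, maxima) + envelope(h, minima))
        h_new = h - mean_env
        passes += 1
        # Simplified SD-style stopping rule: relative change between passes.
        sd = np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12)
        h = h_new
        if sd < tol:
            break
    return h, passes


def emd(x, max_imfs=8):
    """Decompose x into IMFs plus a residue; also returns the total sifting passes."""
    residue = np.asarray(x, dtype=float).copy()
    imfs, total_passes = [], 0
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema left: treat the remainder as the residue
        imf, passes = sift_one_imf(residue)
        imfs.append(imf)
        residue = residue - imf
        total_passes += passes
    return imfs, residue, total_passes


fs = 8000  # sampling rate in Hz (illustrative)
t = np.arange(800) / fs
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 600 * t)
imfs, residue, total_passes = emd(x)
print(len(imfs), total_passes)
```

The returned `total_passes` counts sifting passes, a rough proxy for the cost a faster EMD variant would need to reduce. By construction, the IMFs plus the residue sum back to the input signal.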
References
[1] B. Schuller, G. Rigoll, and M. Lang, “Hidden Markov Model-Based Speech Emotion Recognition”, IEEE International Conference on Acoustics, Speech, and Signal Processing ICASSP 2003, Vol. 2, pp. II-1-4, 2003.
[2] N. Fragopanagos and J. G. Taylor, “Emotion Recognition in Human-Computer Interaction”, Neural Networks, Vol.18, Iss. 4, pp. 389-405, May 2005.
[3] L. Cen, W. Ser, Z. L. Yu, and W. Cen, “Automatic Recognition of Emotional States From Human Speeches”, in Pattern Recognition Recent Advances, A. Herout, Ed., Ch. 22, 2010.
[4] S. Wu, T. H. Falk, and W. Y. Chan, “Automatic Speech Emotion Recognition Using Modulation Spectral Features”, Speech Communication, Vol. 53, Iss. 5, pp. 768-785, May-Jun. 2011.
[5] V. A. Petrushin, “Emotion Recognition in Speech Signal: Experimental Study, Development, and Application”, International Conference on Spoken Language Processing ICSLP 2000, 2000.
[6] R. Tato, R. Santos, R. Kompe, and J. M. Pardo, “Emotional Space Improves Emotion Recognition”, International Conference on Spoken Language Processing ICSLP 2002, 2002.
[7] S. Yacoub, S. Simske, X. Lin, and J. Burns, “Recognition of Emotion in Interactive Voice Response Systems”, 8th European Conference on Speech Communication and Technology, Geneva, Switzerland, 1-4 Sep. 2003.
[8] B. Schuller, G. Rigoll, and M. Lang, “Speech Emotion Recognition Combining Acoustic Features and Linguistic Information in a Hybrid Support Vector Machine-Belief Network Architecture”, IEEE International Conference on Acoustics, Speech, and Signal Processing ICASSP 2004, Vol. 1, pp. I-577-580, 2004.
[9] T. L. Nwe, S. W. Foo, and L. C. De Silva, “Speech Emotion Recognition Using Hidden Markov Models”, Speech Communication, Vol. 41, Iss. 4, pp. 603-623, Nov. 2003.
[10] N. E. Huang et al., “The Empirical Mode Decomposition and the Hilbert Spectrum for Nonlinear and Non-Stationary Time Series Analysis”, Proc. R. Soc. London A, Vol. 454, pp. 903-995, 1998.
[11] H. Huang and J. Pan, “Speech Pitch Determination Based on Hilbert-Huang Transform”, Signal Processing, Vol. 86, Iss. 4, pp. 792-803, Apr. 2006.
[12] K. Khaldi, A. O. Boudraa, A. Bouchikhi, M. Turki-Hadj Alouane, and E. H. S. Diop, “Speech Signal Noise Reduction by EMD”, IEEE International Symposium on Communications, Control and Signal Processing ISCCSP 2008, St. Julians, Malta, 12-14 Mar. 2008.
[13] R. R. Zhang, S. Ma, and S. Hartzell, “Signatures of the Seismic Source in EMD-Based Characterization of the 1994 Northridge, California, Earthquake Recordings”, Bulletin of the Seismological Society of America, Vol. 93, No. 1, pp. 501-518, Feb. 2003.
[14] B. Weng, M. Blanco-Velasco, and K. E. Barner, “ECG Denoising Based on the Empirical Mode Decomposition”, IEEE International Conference on Engineering in Medicine and Biology Society EMBS 2006, NY, USA, 30 Aug. - 3 Sep. 2006.
[15] L. He, M. Lech, N. C. Maddage, and N. B. Allen, “Study of Empirical Mode Decomposition and Spectral Analysis for Stress and Emotion Classification in Natural Speech”, Biomedical Signal Processing and Control, Vol. 6, Iss. 2, pp. 139-146, Apr. 2011.
[16] E. K. Lenzi, R. S. Mendes, and L. R. Da Silva, “Statistical Mechanics Based on Renyi Entropy”, Physica A: Statistical Mechanics and its Applications, Vol. 280, Iss. 3-4, pp. 337-345, 2000.
[17] D. M. Deaven and K. M. Ho, “Molecular Geometry Optimization with a Genetic Algorithm”, Physical Review Letters, Vol. 75, Iss. 2, pp. 288-291, Jul. 1995.
[18] U. Maulik and S. Bandyopadhyay, “Genetic Algorithm-Based Clustering Technique”, Pattern Recognition, Vol. 33, Iss. 9, pp. 1455-1465, Sep. 2000.
[19] R. C. Eberhart and Y. Shi, “Particle Swarm Optimization: Developments, Applications and Resources”, Congress on Evolutionary Computation CEC 2001, Vol. 1, pp. 81-86, 2001.
[20] P. Bajpai and S. N. Singh, “Fuzzy Adaptive Particle Swarm Optimization for Bidding Strategy in Uniform Price Spot Market”, IEEE Transactions on Power Systems, Vol. 22, No. 4, 2007.
[21] J. Brest, S. Greiner, B. Boskovic, M. Mernik, and V. Zumer, “Self-Adapting Control Parameters in Differential Evolution: A Comparative Study on Numerical Benchmark Problems”, IEEE Transactions on Evolutionary Computation, Vol. 10, Iss. 6, pp. 646-657, Dec. 2006.
[22] R. Storn, “Differential Evolution Design of an IIR-Filter”, IEEE International Conference on Evolutionary Computation, pp. 268-273, May 1996.
[23] M. El Ayadi, M. S. Kamel, and F. Karray, “Survey on Speech Emotion Recognition: Features, Classification Schemes, and Databases”, Pattern Recognition, Vol. 44, Iss. 3, pp. 572-587, Mar. 2011.
[24] Berlin Database of Emotional Speech (Emo-DB), pascal.kgw.tu-berlin.de/emodb/index-1024.html
[25] eNTERFACE'05 Audio-Visual Emotion Database, www.enterface.net/enterface05/main.php?frame=emotion
[26] L. R. Rabiner and R. W. Schafer, Theory and Applications of Digital Speech Processing, Pearson, 2010.
[27] H.-C. Wang (王小川), Speech Signal Processing (語音訊號處理), Chuan Hwa Book Co. (全華科技出版社), 2004.
[28] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Processing, 2nd Edition, Pearson, 1999.
[29] S. Oraintara, Y. J. Chen, and T. Q. Nguyen, “Integer Fast Fourier Transform”, IEEE Transactions on Signal Processing, Vol. 50, No. 3, pp. 607-618, Mar. 2002.
[30] M. J. Carey, E. S. Parris, and H. Lloyd-Thomas, “A Comparison of Features for Speech, Music Discrimination”, IEEE International Conference on Acoustics, Speech, and Signal Processing ICASSP 1999, Vol. 1, pp. 149-152, 15-19 Mar. 1999.
[31] J. H. Mathews and K. D. Fink, Numerical Methods Using MATLAB, 4th Edition, Prentice-Hall, 2004.
[32] P. Blunsom, “Hidden Markov Models”, ww2.cs.mu.oz.au/460/2004/material.pdf, 19 Aug. 2004.
[33] X. Huang, A. Acero, and H.-W. Hon, Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, Pearson, 2005.
[34] S. Young, G. Evermann, M. Gales, T. Hain, D. Kershaw, X. Liu, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK Book, Entropic Cambridge Research Laboratory, 1995.
[35] R. L. Haupt and S. E. Haupt, Practical Genetic Algorithms, 2nd Edition, Wiley, 2004.
[36] J. Kennedy and R. Eberhart, “Particle Swarm Optimization”, IEEE International Conference on Neural Networks, Vol. 4, pp. 1942-1948, 1995.
[37] R. Storn and K. Price, “Differential Evolution – A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces”, Journal of Global Optimization, Vol. 11, No. 4, pp. 341-359, 1997.
[38] www.imtoo.com/3gp-video-converter.html