7.1 Conclusions
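This thesis combines empirical mode decomposition (EMD) with conventional hidden Markov model (HMM) speech recognition, with the aim of building a noise-robust speech recognition system, and implements it on an embedded system built around an FPGA. The experiments show that a speech model trained on a single intrinsic mode function (IMF) achieves a lower recognition rate than a model trained on the original signal. This indicates that every IMF carries some portion of the speech information, so no single IMF is sufficient by itself. The ratio of speech to noise within an individual IMF depends on the level of the background noise: the stronger the noise, the more the speech information shifts toward the IMFs extracted later, while the proportion of noise in the IMFs extracted earlier rises. In other words, the optimal IMF combination weights change with the background noise level.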
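The weighted recombination of IMFs described above can be illustrated with a minimal sketch. It assumes the EMD step has already produced the IMFs as equal-length sequences; the function name `combine_imfs`, the toy sample values, and the example weights are hypothetical and are not taken from the thesis's implementation.

```python
from typing import List, Sequence


def combine_imfs(imfs: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Return the weighted sum of the IMFs, sample by sample.

    imfs[k][n] is sample n of the k-th IMF; weights[k] scales that IMF.
    """
    if len(imfs) != len(weights):
        raise ValueError("need exactly one weight per IMF")
    if not imfs:
        return []
    length = len(imfs[0])
    if any(len(imf) != length for imf in imfs):
        raise ValueError("all IMFs must have the same length")
    return [sum(w * imf[n] for w, imf in zip(weights, imfs)) for n in range(length)]


# Toy data (hypothetical): three "IMFs" of four samples each.
imfs = [
    [1.0, -1.0, 1.0, -1.0],   # earliest IMF (fastest oscillation)
    [0.5, 0.5, -0.5, -0.5],
    [0.2, 0.2, 0.2, 0.2],     # latest IMF (slowest component)
]

# With every weight equal to 1, the plain sum of the components is returned.
full = combine_imfs(imfs, [1.0, 1.0, 1.0])

# Down-weighting the earliest IMF, as one might try under strong background noise.
enhanced = combine_imfs(imfs, [0.25, 1.0, 1.0])
```

Choosing the weight vector separately for each noise level is what turns this simple sum into an enhancement step.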
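On the embedded implementation side, the integer FFT ran much faster than the conventional FFT. Besides the speed gained from integer arithmetic, the precomputed integer lookup tables for the cos and sin values are presumably another major reason for the speedup. The EMD stage was less satisfactory: it accounted for 72% of the total processing time, a problem that must be addressed.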
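To make the table-lookup idea concrete, the following is a minimal sketch of a radix-2 FFT that uses only integer arithmetic and a precomputed integer cos/sin table. It is a plain fixed-point formulation written for illustration and may differ from the integer FFT actually used in this work; the Q14 scaling, the function names, and the test signal are assumptions.

```python
import math
from typing import List, Tuple

Q = 14                      # fractional bits of the fixed-point format (assumed)
SCALE = 1 << Q


def make_twiddle_table(n: int) -> List[Tuple[int, int]]:
    """Precompute integer (cos, -sin) pairs for k = 0 .. n/2 - 1."""
    return [
        (round(math.cos(2 * math.pi * k / n) * SCALE),
         round(-math.sin(2 * math.pi * k / n) * SCALE))
        for k in range(n // 2)
    ]


def int_fft(re: List[int], im: List[int], table: List[Tuple[int, int]]) -> None:
    """In-place radix-2 decimation-in-time FFT; len(re) must be a power of two."""
    n = len(re)
    # Bit-reversal permutation of the input order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            re[i], re[j] = re[j], re[i]
            im[i], im[j] = im[j], im[i]
    # Butterfly stages.
    size = 2
    while size <= n:
        half = size // 2
        step = n // size          # stride into the twiddle table
        for start in range(0, n, size):
            for k in range(half):
                wr, wi = table[k * step]
                a, b = start + k, start + k + half
                # Complex multiply in fixed point; >> Q removes the table scaling.
                tr = (re[b] * wr - im[b] * wi) >> Q
                ti = (re[b] * wi + im[b] * wr) >> Q
                re[b], im[b] = re[a] - tr, im[a] - ti
                re[a], im[a] = re[a] + tr, im[a] + ti
        size *= 2


# Demo: 8-point transform of an integer test signal (values are hypothetical).
N = 8
table = make_twiddle_table(N)
re = [1000, 0, -1000, 0, 1000, 0, -1000, 0]
im = [0] * N
int_fft(re, im, table)
```

The table is built once, so the inner loop contains no calls to trigonometric functions, only integer multiplies, adds, and shifts. For this test signal, a cosine with two cycles over eight samples, the energy lands in bins 2 and 6.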
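For the genetic algorithm, we searched for the optimal IMF combination weights separately for each background noise level. This offers an alternative approach to speech enhancement. The experiments show that speech models trained on signals decomposed and recombined for a given noise level achieve a substantially higher recognition rate under that same noise level than conventional models trained without decomposition. Even for clean speech, the same procedure raises the recognition rate further, so it is a feasible way to improve recognition.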
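A weight search of this kind can be sketched as a small genetic algorithm. The sketch below is illustrative only: the operators (truncation selection, single-point crossover, Gaussian mutation, elitism), the parameter values, and the function names are assumptions, and `toy_fitness` is a hypothetical stand-in for the recognition rate, which in a real run would require training and scoring a recognizer for every candidate weight vector.

```python
import random
from typing import Callable, List, Tuple


def evolve_weights(
    fitness: Callable[[List[float]], float],
    n_weights: int,
    pop_size: int = 20,
    generations: int = 40,
    mutation_rate: float = 0.2,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Search [0, 1]^n_weights for a weight vector that maximizes `fitness`.

    Requires n_weights >= 2 and pop_size >= 4.
    """
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_weights)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        elite = ranked[: pop_size // 2]          # truncation selection
        children = [ranked[0][:]]                # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_weights)    # single-point crossover
            child = a[:cut] + b[cut:]
            for i in range(n_weights):
                if rng.random() < mutation_rate:
                    # Gaussian mutation, clipped back into [0, 1].
                    child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0.0, 0.1)))
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)


# Hypothetical stand-in for "recognition rate at one noise level": the score
# peaks when the weights match an arbitrary target vector.
TARGET = [0.2, 0.6, 1.0, 0.8, 0.4]


def toy_fitness(w: List[float]) -> float:
    return -sum((wi - ti) ** 2 for wi, ti in zip(w, TARGET))


best_w, best_score = evolve_weights(toy_fitness, n_weights=len(TARGET))
```

Running the search once per noise level, each time with a fitness measured at that level, would yield one weight vector per level.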
7.2 Future Work and Outlook
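To date, little research has applied EMD to speech recognition, so there was little material to consult. Besides the many difficulties encountered, the work also prompted some new ideas that interested readers may pursue further. Several points are listed below: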
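1. EMD is computationally very expensive, which makes an embedded implementation a major challenge. We found no existing research on accelerating it and hope for a breakthrough in the future.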
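2. The time-frequency representation obtained by combining EMD with the Hilbert transform contains three dimensions: time, frequency, and energy. Whether it can replace the time-frequency data computed from frames and Mel frequencies and yield a higher recognition rate remains to be tested experimentally.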
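3. For endpoint detection, the waveforms after EMD suggest that IMF2 and IMF3 show a clearer difference between silent and speech segments, which would favor endpoint detection. However, performing EMD before endpoint detection means decomposing the entire recording, silence included, which substantially increases the EMD time. Whether to adopt this approach depends on the hardware's computing speed.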
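4. The combination weights found for each background noise level can be used to train the best speech model for that level. The noise level could then be estimated from a silent segment and the matching speech model selected dynamically for recognition. This could greatly improve noise robustness, at the cost of storing more speech models.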
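5. Since EMD is a speech enhancement method, it can be combined with robust features and acoustic model adaptation to further improve robustness against noise.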
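The dynamic model selection proposed in point 4 could look roughly like the sketch below. It assumes that the first few frames of a recording contain noise only, estimates the signal-to-noise ratio from frame powers, and picks the stored model trained at the nearest noise level. All function names, model names, and signal values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def frame_powers(signal: Sequence[float], frame_len: int) -> List[float]:
    """Mean power of consecutive non-overlapping frames (tail remainder dropped)."""
    return [
        sum(x * x for x in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def estimate_snr_db(signal: Sequence[float], frame_len: int, n_silence_frames: int) -> float:
    """Estimate the SNR, assuming the first n_silence_frames frames are noise only."""
    powers = frame_powers(signal, frame_len)
    if len(powers) <= n_silence_frames:
        raise ValueError("signal too short for the requested silence segment")
    noise = sum(powers[:n_silence_frames]) / n_silence_frames
    rest = powers[n_silence_frames:]
    noisy_speech = sum(rest) / len(rest)
    # Subtract the noise floor; the small constants avoid log of zero.
    speech = max(noisy_speech - noise, 1e-12)
    return 10.0 * math.log10(speech / max(noise, 1e-12))


def select_model(snr_db: float, models: Dict[float, str]) -> str:
    """Pick the model whose training SNR (dict key, in dB) is closest to snr_db."""
    nearest = min(models, key=lambda trained_snr: abs(trained_snr - snr_db))
    return models[nearest]


# Hypothetical model store: one model per training noise level.
models = {0.0: "hmm_0dB", 10.0: "hmm_10dB", 20.0: "hmm_20dB"}

# Synthetic recording: 32 low-level "noise" samples followed by 32 loud samples.
signal = [0.1, -0.1] * 16 + [1.0, -1.0] * 16

snr = estimate_snr_db(signal, frame_len=8, n_silence_frames=4)
chosen = select_model(snr, models)
```

The trade-off named in point 4 is visible here: every additional noise level adds one entry to the model store.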