
Comparison of MFCC and Improved MFCC Features


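Although this experiment uses the new endpoint detection method and the DBNN classifier, the recognition accuracy is still not very high. The reason is that a bird call usually contains several different syllables, so training and recognizing on a single syllable is difficult. Features extracted from a complete call, rather than from one syllable, are therefore more representative. For this reason, this thesis further improves MFCC, the feature that best captures the characteristics of bird sounds; the improvement method was described in an earlier chapter. Unlike before, where MFCCs were averaged per syllable, here they are averaged over the entire segment. Table 4.6 gives the system recognition accuracy using MFCC and improved MFCC features, each combined with the BP and DBNN classifiers. Table 4.7 gives the recognition accuracy when the new endpoint detection method is also used, again with MFCC and improved MFCC features combined with the BP and DBNN classifiers. Comparing these results with the earlier ones shows that the features used in this experiment better represent the characteristics of bird calls. Within both tables, the improved MFCC outperforms plain MFCC in every configuration, by 12.52 to 18.72 percentage points.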
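The change described above, averaging MFCC frames over a whole call segment instead of per syllable, can be sketched as follows. This is a minimal illustration only: the MFCC matrix and syllable boundaries are hypothetical toy inputs, the function names are ours, and it covers just the averaging step, not the full improved-MFCC procedure from the earlier chapter.

```python
import numpy as np

def per_syllable_means(mfcc, syllables):
    """Average the MFCC frames inside each syllable (the earlier approach).

    mfcc: array of shape (n_frames, n_coeffs)
    syllables: list of (start, end) frame indices, end exclusive
    Returns shape (n_syllables, n_coeffs): one feature vector per syllable.
    """
    return np.stack([mfcc[s:e].mean(axis=0) for s, e in syllables])

def segment_mean(mfcc, start, end):
    """Average all MFCC frames of a whole call segment into one vector."""
    return mfcc[start:end].mean(axis=0)

# Hypothetical toy call: 6 frames x 2 coefficients.
# Syllable A occupies frames 0-2, syllable B occupies frames 3-5.
mfcc = np.array([[1.0, 0.0],
                 [1.0, 0.0],
                 [1.0, 0.0],
                 [3.0, 2.0],
                 [3.0, 2.0],
                 [3.0, 2.0]])

print(per_syllable_means(mfcc, [(0, 3), (3, 6)]))  # two vectors, one per syllable
print(segment_mean(mfcc, 0, 6))                     # one vector for the whole call
```

With per-syllable averaging, a call made of two different syllables yields two unrelated vectors, and a classifier trained on one of them never sees the other; the segment mean folds both syllables into a single descriptor of the call.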

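Table 4.6 Recognition accuracy of MFCC and improved MFCC features, each combined with the BP and DBNN classifiers

| Data | MFCC + BP | Improved MFCC + BP | MFCC + DBNN | Improved MFCC + DBNN |
|---|---|---|---|---|
| Bird sounds, alarm calls not separated | 54.69% | 73.41% | 65.37% | 77.89% |
| Bird sounds, alarm calls separated | 59.16% | 76.97% | 69.24% | 83.11% |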

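Table 4.7 Recognition accuracy with the new endpoint detection method (NED), using MFCC and improved MFCC features, each combined with the BP and DBNN classifiers

| Data | NED + MFCC + BP | NED + Improved MFCC + BP | NED + MFCC + DBNN | NED + Improved MFCC + DBNN |
|---|---|---|---|---|
| Bird sounds, alarm calls not separated | 58.33% | 75.71% | 65.68% | 80.43% |
| Bird sounds, alarm calls separated | 64.42% | 80.93% | 73.19% | 86.59% |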
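To make the comparison concrete, the accuracy differences can be computed directly from Tables 4.6 and 4.7. The sketch below only does arithmetic on the reported percentages; the dictionary and function names are illustrative, and the differences say nothing about statistical significance.

```python
# Accuracies (%) copied from Tables 4.6 and 4.7.
# Column order: MFCC+BP, improved MFCC+BP, MFCC+DBNN, improved MFCC+DBNN
TABLE_4_6 = {
    "alarm calls not separated": [54.69, 73.41, 65.37, 77.89],
    "alarm calls separated":     [59.16, 76.97, 69.24, 83.11],
}
TABLE_4_7 = {  # same columns, with the new endpoint detection (NED) added
    "alarm calls not separated": [58.33, 75.71, 65.68, 80.43],
    "alarm calls separated":     [64.42, 80.93, 73.19, 86.59],
}

def feature_gain(row):
    """Percentage-point gain of improved MFCC over plain MFCC, as (BP, DBNN)."""
    return (round(row[1] - row[0], 2), round(row[3] - row[2], 2))

def ned_gain(row_without, row_with):
    """Percentage-point gain from adding NED, column by column."""
    return [round(b - a, 2) for a, b in zip(row_without, row_with)]

for condition in TABLE_4_6:
    print(condition)
    print("  improved-MFCC gain, Table 4.6 (BP, DBNN):", feature_gain(TABLE_4_6[condition]))
    print("  improved-MFCC gain, Table 4.7 (BP, DBNN):", feature_gain(TABLE_4_7[condition]))
    print("  NED gain per column:", ned_gain(TABLE_4_6[condition], TABLE_4_7[condition]))
```

On these values the feature change dominates: the improved MFCC adds 12.52 to 18.72 points in every configuration, while NED adds between 0.31 and 5.26 points per column.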

Chapter 5 Conclusions and Future Work

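We built a bird sound recognition system because birds are easy to see in everyday life, yet they move so quickly that observing them closely, or even photographing them, is not easy. Identifying birds by their calls is therefore a practical alternative. Building such a system is difficult, however: first, the system must be reasonably robust to noise; second, the extracted acoustic features must be truly representative. Drawing on earlier research on speech signals, we chose, from among various endpoint detection methods, to combine the R-S endpoint detection method with spectral continuity, and the results show that this approach improves the accuracy of the bird sound recognition system. For the features, we used the wavelet transform to describe the characteristics of the MFCC curves and used the result as the feature for bird sound recognition; the results confirm that this feature yields higher recognition accuracy than the original MFCC. Finally, for the classifier, we replaced the conventional backpropagation (BP) network with a decision-based neural network (DBNN) and found that it indeed outperforms the BP network for bird sound recognition.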
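The R-S part of the endpoint detector can be sketched as a double-threshold detector on short-time magnitude with a zero-crossing extension. We take R-S to mean the Rabiner-Sambur energy and zero-crossing-rate algorithm; this is a simplified sketch, not the thesis's detector. Assumptions: thresholds are fractions of the peak magnitude (Rabiner-Sambur derive them from background-noise statistics), the frame size, hop, and search window are hypothetical, the zero-crossing rule is simplified (no minimum count of threshold crossings), and the spectral-continuity criterion is not modeled because this chapter does not specify it.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])

def short_time_features(frames):
    """Per-frame magnitude sum and zero-crossing count."""
    magnitude = np.abs(frames).sum(axis=1)
    signs = np.sign(frames)
    zero_crossings = (np.diff(signs, axis=1) != 0).sum(axis=1)
    return magnitude, zero_crossings

def detect_endpoints(x, frame_len=256, hop=128, itu_ratio=0.5, itl_ratio=0.1,
                     zcr_thresh=None, search=10):
    """Return (start_frame, end_frame) of the detected call, or None if silent."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    magnitude, zcr = short_time_features(frames)
    itu = itu_ratio * magnitude.max()  # upper threshold (hypothetical ratio)
    itl = itl_ratio * magnitude.max()  # lower threshold (hypothetical ratio)
    above = np.flatnonzero(magnitude > itu)
    if above.size == 0:
        return None
    start, end = int(above[0]), int(above[-1])
    # Extend outward from the ITU crossings while magnitude stays above ITL.
    while start > 0 and magnitude[start - 1] > itl:
        start -= 1
    while end < len(magnitude) - 1 and magnitude[end + 1] > itl:
        end += 1
    if zcr_thresh is not None:
        # Within `search` frames outside each boundary, move the boundary to the
        # outermost frame whose zero-crossing count exceeds the threshold,
        # to recover weak, noise-like onsets and offsets.
        before = [i for i in range(max(0, start - search), start) if zcr[i] > zcr_thresh]
        if before:
            start = before[0]
        after = [i for i in range(end + 1, min(len(zcr), end + 1 + search))
                 if zcr[i] > zcr_thresh]
        if after:
            end = after[-1]
    return start, end
```

For example, `detect_endpoints(x)` on a signal that is silent except for a tonal burst returns the frame indices bracketing the burst, and returns `None` for an all-zero signal.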
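The idea of describing MFCC curves with a wavelet transform can be sketched with a plain Haar decomposition. This chapter does not state which wavelet, how many levels, or which coefficients the thesis keeps, so all of these are assumptions: we treat each cepstral coefficient's trajectory over the frames of a call as the "MFCC curve", use the Haar wavelet, and summarize each curve by the mean of its final approximation band plus the energy of each detail band. The function names are hypothetical.

```python
import numpy as np

def haar_dwt(signal, levels):
    """Multi-level orthonormal Haar DWT of a 1-D sequence.

    Returns (approximation, [detail_level1, ..., detail_levelN]).
    Odd-length inputs are padded by repeating the last sample.
    """
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            approx = np.append(approx, approx[-1])
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))  # high-pass (detail) band
        approx = (even + odd) / np.sqrt(2)         # low-pass (approximation) band
    return approx, details

def wavelet_mfcc_features(mfcc, levels=2):
    """Hypothetical feature vector from MFCC curves.

    mfcc: array of shape (n_frames, n_coeffs). For each coefficient's curve
    over time, keep the approximation mean and the energy of every detail band,
    giving n_coeffs * (1 + levels) values.
    """
    feats = []
    for curve in mfcc.T:  # one curve per cepstral coefficient
        approx, details = haar_dwt(curve, levels)
        feats.append(approx.mean())
        feats.extend(float(np.sum(d ** 2)) for d in details)
    return np.array(feats)
```

The approximation band captures the slow shape of each curve, while the detail energies measure how quickly it changes; a segment-level average alone discards the second kind of information.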
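The key difference from a BP network is the decision-based learning rule used by DBNNs: weights change only when a training sample is misclassified, with the correct class reinforced and the wrongly winning class anti-reinforced. The sketch below shows that rule under strong simplifying assumptions (one prototype per class, a squared-Euclidean discriminant, prototypes initialized at the class means); a full DBNN may use richer subnetworks, and the thesis's configuration is not given in this chapter. Class and method names are hypothetical.

```python
import numpy as np

class DecisionBasedClassifier:
    """Minimal decision-based learning sketch with one prototype per class.

    Discriminant phi(x, w_k) = -0.5 * ||x - w_k||^2, so grad_w phi = (x - w_k).
    """

    def __init__(self, prototypes, lr=0.1):
        self.w = np.array(prototypes, dtype=float)  # shape (n_classes, n_features)
        self.lr = lr

    @classmethod
    def from_class_means(cls, X, y, lr=0.1):
        """Assumed initialization: start each prototype at its class mean."""
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        means = [X[y == k].mean(axis=0) for k in range(y.max() + 1)]
        return cls(means, lr)

    def scores(self, x):
        return -0.5 * np.sum((self.w - x) ** 2, axis=1)

    def predict(self, x):
        return int(np.argmax(self.scores(x)))

    def fit(self, X, y, epochs=20):
        X = np.asarray(X, dtype=float)
        for _ in range(epochs):
            errors = 0
            for x, target in zip(X, y):
                winner = self.predict(x)
                if winner != target:  # update only on a wrong decision
                    errors += 1
                    self.w[target] += self.lr * (x - self.w[target])  # reinforced learning
                    self.w[winner] -= self.lr * (x - self.w[winner])  # anti-reinforced learning
            if errors == 0:
                break
        return self
```

Because correctly classified samples leave the weights untouched, training effort concentrates on the decision boundary, which is the property that distinguishes this rule from backpropagation's update on every sample.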

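Although these methods achieved an acceptable recognition rate, there is still much room to improve accuracy. The choice of classifier deserves further study; the threshold values for endpoint detection could be determined by artificial-intelligence methods; and entirely new features not based on MFCC could be developed. In addition, although our database is sufficient, many bird species are still missing from it, and the amount of data for the species it does include is not yet complete. Finally, alarm calls can currently be handled only as a separate category. All of these are directions for future research.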

