5.1 Conclusions
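This thesis combines a Delay-and-Sum Beamformer spatial filter with a Gaussian mixture model (GMM) to detect the occurrence of abnormal sounds, and evaluates detection performance for abnormal sounds at different signal-to-noise ratios (SNRs) and arriving from different directions. In conventional abnormal-sound monitoring based on voice activity detection (VAD), the decision is made from frame-by-frame changes in energy and frequency content. At low SNR these changes are small, so the decisions are unreliable. The benefit of adding a spatial filter is that it uses spatial information to enhance the sound arriving from the steered direction. Because the signal from that direction still contains noise, the GMM is then used to decide whether the beamformer output matches the learned statistics of the sound field. Finally, the same GMM decision mechanism can also estimate the direction of the abnormal sound from the likelihood values.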
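As a rough illustration of the spatial filtering step, the following Python/NumPy sketch steers a delay-and-sum beamformer on a uniform linear array: each channel is advanced by its plane-wave arrival delay in the frequency domain (so fractional-sample delays are allowed, with circular shifts) and the aligned channels are averaged. The array geometry (8 microphones, 4 cm spacing), the 16 kHz sampling rate, the 343 m/s speed of sound, and the helper names `steering_delays`, `delay_and_sum`, and `simulate_plane_wave` are illustrative assumptions, not parameters taken from this thesis.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_delays(n_mics, spacing, angle_deg):
    """Plane-wave arrival delays (s) on a uniform linear array.

    The angle is measured from broadside; microphone 0 is the reference (delay 0).
    """
    positions = np.arange(n_mics) * spacing
    return positions * np.sin(np.deg2rad(angle_deg)) / SPEED_OF_SOUND


def delay_and_sum(x, fs, spacing, angle_deg):
    """Steer a delay-and-sum beamformer toward angle_deg.

    x has shape (n_mics, n_samples). Each channel is advanced by its steering
    delay in the frequency domain, then the aligned channels are averaged.
    """
    n_mics, n = x.shape
    tau = steering_delays(n_mics, spacing, angle_deg)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectra = np.fft.rfft(x, axis=1)
    aligned = spectra * np.exp(2j * np.pi * freqs[None, :] * tau[:, None])
    return np.fft.irfft(aligned.mean(axis=0), n=n)


def simulate_plane_wave(source, fs, n_mics, spacing, angle_deg):
    """Delay one source signal onto every array channel (test helper)."""
    n = source.size
    tau = steering_delays(n_mics, spacing, angle_deg)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    delayed = np.fft.rfft(source)[None, :] * np.exp(
        -2j * np.pi * freqs[None, :] * tau[:, None])
    return np.fft.irfft(delayed, n=n, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs, n_mics, spacing = 16000, 8, 0.04
    source = rng.standard_normal(4096)  # broadband test source at 30 degrees
    x = simulate_plane_wave(source, fs, n_mics, spacing, 30.0)
    x += 0.1 * rng.standard_normal(x.shape)  # uncorrelated sensor noise
    angles = np.arange(-90, 91)
    powers = [np.mean(delay_and_sum(x, fs, spacing, a) ** 2) for a in angles]
    print("angle with maximum output power:", angles[int(np.argmax(powers))], "deg")
```

Scanning the steering angle for maximum output power is used here only to check that the beamformer aligns the source; in the system described in this thesis, the beamformer output for a steered direction is passed on to the GMM stage instead.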
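Therefore, a microphone array not only detects abnormal sounds in noisier environments than a single microphone can, but also reveals the direction from which the abnormal sound arrives, which a single microphone cannot do. These are the reasons for, and the advantages of, combining sound monitoring with spatial filtering.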
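The decision stage can be sketched as follows, under explicit assumptions: one background GMM (diagonal covariances, fitted with plain EM) is trained per monitored steering direction on feature vectors of the beamformer output; the current block of frames for each direction is scored by its average log-likelihood; a block scoring below a fixed threshold is flagged as abnormal; and the least likely direction is reported as the source direction. The feature vectors, the threshold value, the helper names, and the "lowest likelihood wins" rule are hypothetical choices for illustration. The thesis itself only states that the direction is judged from the probability values.

```python
import numpy as np


def _component_log_pdf(X, means, variances):
    """log N(x | mean_k, diag(var_k)) for every frame/component pair -> (n, k)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances[None], axis=2)
                   + np.sum(np.log(2.0 * np.pi * variances), axis=1))


def _log_sum_exp(a):
    """Row-wise log(sum(exp(a))) computed stably."""
    m = a.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(a - m).sum(axis=1))


def fit_gmm(X, k, n_iter=100, seed=0, reg=1e-3):
    """Fit a diagonal-covariance GMM with plain EM (random frames as initial means)."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, size=k, replace=False)].astype(float)
    variances = np.tile(X.var(axis=0) + reg, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each frame
        log_p = _component_log_pdf(X, means, variances) + np.log(weights)
        resp = np.exp(log_p - _log_sum_exp(log_p)[:, None])
        # M-step: re-estimate weights, means and (floored) diagonal variances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = resp.T @ X / nk[:, None]
        variances = np.maximum(resp.T @ X ** 2 / nk[:, None] - means ** 2, 0.0) + reg
    return weights, means, variances


def mean_log_likelihood(X, gmm):
    """Average per-frame log-likelihood of frames X under a fitted GMM."""
    weights, means, variances = gmm
    log_p = _component_log_pdf(X, means, variances) + np.log(weights)
    return float(np.mean(_log_sum_exp(log_p)))


def detect_abnormal(frames_per_direction, background_gmms, threshold):
    """Return (is_abnormal, direction_index, scores) for one block of frames.

    Each direction's frames are scored against that direction's background GMM;
    the least likely direction is taken as the abnormal-sound direction.
    """
    scores = [mean_log_likelihood(F, g)
              for F, g in zip(frames_per_direction, background_gmms)]
    worst = int(np.argmin(scores))
    return scores[worst] < threshold, worst, scores


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical 2-D features of background frames at three steering angles.
    train = [rng.standard_normal((500, 2)) for _ in range(3)]
    gmms = [fit_gmm(F, k=2) for F in train]
    block = [rng.standard_normal((50, 2)) for _ in range(3)]
    block[1] += 6.0  # an abnormal sound appears only in direction 1
    flag, direction, _ = detect_abnormal(block, gmms, threshold=-10.0)
    print("abnormal:", flag, "direction index:", direction)
```

In practice the threshold would have to be calibrated on held-out background recordings for each direction rather than fixed by hand as in this toy example.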
5.2 Future Work
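The GMM yields good abnormal-sound detection results, but it is computationally expensive, so building the models takes considerable time. To reduce the computation, future work could investigate how to select parameters such as the number of GMM components and the number of frames sampled for training automatically.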
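One common way to choose the number of GMM components automatically, offered here as a possible direction rather than the method of this thesis, is the Bayesian information criterion, BIC = -2 ln L + p ln n, where L is the maximized likelihood, p the number of free parameters, and n the number of training frames. The sketch below fits diagonal GMMs with k = 1 to 6 components and keeps the k with the lowest BIC. The farthest-point initialization, the candidate range, and the synthetic 1-D data are assumptions made for illustration.

```python
import numpy as np


def _component_log_pdf(X, means, variances):
    """log N(x | mean_k, diag(var_k)) for every frame/component pair -> (n, k)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances[None], axis=2)
                   + np.sum(np.log(2.0 * np.pi * variances), axis=1))


def _log_sum_exp(a):
    m = a.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(a - m).sum(axis=1))


def _farthest_point_init(X, k, seed):
    """Pick k initial means: one random frame, then repeatedly the farthest frame."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(X)))]
    dist = np.sum((X - X[idx[0]]) ** 2, axis=1)
    for _ in range(1, k):
        idx.append(int(np.argmax(dist)))
        dist = np.minimum(dist, np.sum((X - X[idx[-1]]) ** 2, axis=1))
    return X[idx].astype(float)


def fit_gmm_loglik(X, k, n_iter=200, seed=0, reg=1e-3):
    """Fit a diagonal GMM with EM and return the total log-likelihood of X."""
    n, _ = X.shape
    means = _farthest_point_init(X, k, seed)
    variances = np.tile(X.var(axis=0) + reg, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        log_p = _component_log_pdf(X, means, variances) + np.log(weights)
        resp = np.exp(log_p - _log_sum_exp(log_p)[:, None])  # E-step
        nk = resp.sum(axis=0) + 1e-12                        # M-step
        weights = nk / n
        means = resp.T @ X / nk[:, None]
        variances = np.maximum(resp.T @ X ** 2 / nk[:, None] - means ** 2, 0.0) + reg
    log_p = _component_log_pdf(X, means, variances) + np.log(weights)
    return float(_log_sum_exp(log_p).sum())


def select_n_components(X, candidates=range(1, 7)):
    """Return (best_k, bic_by_k), choosing the k with the lowest BIC."""
    n, d = X.shape
    bic = {}
    for k in candidates:
        p = (k - 1) + 2 * k * d  # mixture weights + means + diagonal variances
        bic[k] = -2.0 * fit_gmm_loglik(X, k) + p * np.log(n)
    return min(bic, key=bic.get), bic


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    # Synthetic 1-D "features" drawn from three well-separated clusters.
    X = np.concatenate([rng.normal(mu, 1.0, size=200) for mu in (0.0, 10.0, 20.0)])[:, None]
    best_k, bic = select_n_components(X)
    print("selected number of components:", best_k)
```

Note that this search itself fits several candidate models, so it removes the need to tune the component count by hand rather than reducing the cost of a single fit.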
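Currently, the angle monitored at start-up is chosen at random. In the future, the angles at which abnormal sounds are most likely to occur could be configured in advance and monitored first, so that an abnormal sound does not first appear at an angle that is not being monitored.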