Conclusion and Future Work - Using Signal Features and a Microphone Array

5.1 Conclusion

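This thesis combines a Delay and Sum Beamformer spatial filter with a Gaussian mixture model (GMM) to detect the occurrence of abnormal sounds, and tests the detection results for abnormal sounds at different signal-to-noise ratios (SNRs) and from different directions. In conventional abnormal-sound monitoring, detection based on voice activity detection judges each frame by its changes in energy and frequency. At low SNR these changes are small, so the detection results are unsatisfactory. The benefit of adding a spatial filter is that it purifies the sound source using spatial information. The signal from the steered direction still contains noise, however, so a GMM is used to judge whether the signal matches the statistics of the sound-field information. The same GMM-based decision mechanism can then also estimate the direction of the abnormal sound from the probability values.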
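As an illustration of the spatial filtering step, the following is a minimal sketch of a delay-and-sum beamformer, not the implementation used in this thesis. It assumes a uniform linear array, a far-field plane wave, delays rounded to whole samples, and a sound speed of 343 m/s. The function names `steering_delays` and `delay_and_sum` are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_delays(num_mics, spacing_m, angle_deg, sample_rate):
    """Arrival delays (in whole samples) at each microphone of a uniform
    linear array, for a far-field source at angle_deg from broadside."""
    theta = math.radians(angle_deg)
    delays = []
    for m in range(num_mics):
        tau = m * spacing_m * math.sin(theta) / SPEED_OF_SOUND
        delays.append(round(tau * sample_rate))
    # Shift so that the smallest delay is zero (keeps every index non-negative).
    base = min(delays)
    return [d - base for d in delays]


def delay_and_sum(channels, delays):
    """Advance each channel by its arrival delay, then average the channels.

    Sound from the steered direction adds coherently, while sound from
    other directions and uncorrelated noise are attenuated by the averaging.
    """
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]
```

Scanning several candidate angles with `steering_delays` and passing each beamformer output to the detector is one way to monitor more than one direction. Fractional delays would need interpolation or a frequency-domain phase shift, which this sketch omits.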

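Compared with a single microphone, a microphone array can therefore not only detect abnormal sounds in noisier environments but also determine the direction from which they come, which a single microphone cannot do. This is both the reason for and the advantage of combining sound monitoring with spatial filtering.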
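The direction decision can be sketched as a comparison of GMM likelihoods across steered directions. The example below is only an illustration under stated assumptions: each candidate angle has its own pre-trained one-dimensional GMM (weights, means, variances), each frame is reduced to a single scalar feature, and the detection threshold is chosen by the user. The names `gmm_log_likelihood` and `most_likely_direction` are hypothetical, and the actual features and decision rule in this thesis may differ.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of a scalar feature x under a 1-D Gaussian mixture."""
    logs = [
        math.log(w) - 0.5 * math.log(2.0 * math.pi * v) - (x - m) ** 2 / (2.0 * v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(logs)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(term - peak) for term in logs))


def most_likely_direction(features_by_angle, models_by_angle, threshold):
    """Return (angle, score) for the best-scoring direction, or (None, score)
    when even the best average log-likelihood stays below the threshold."""
    best_angle, best_score = None, -math.inf
    for angle, frames in features_by_angle.items():
        weights, means, variances = models_by_angle[angle]
        score = sum(
            gmm_log_likelihood(x, weights, means, variances) for x in frames
        ) / len(frames)
        if score > best_score:
            best_angle, best_score = angle, score
    if best_score < threshold:
        return None, best_score
    return best_angle, best_score
```

Averaging the log-likelihood over frames keeps the score comparable between directions that contribute different numbers of frames.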

5.2 Future Work

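The GMM gives good results for abnormal-sound detection, but it requires a large amount of computation, so building the model takes considerable time. To reduce the computational load effectively, future work could investigate how to automatically select the number of GMM components, the number of training frames sampled, and so on.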
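One common way to choose the number of mixture components automatically is the Bayesian information criterion (BIC). The thesis does not name a selection method, so BIC is an assumption here, as is the reading of "number of GMMs" as the number of mixture components. The sketch fits one-dimensional mixtures with plain EM and keeps the component count with the lowest BIC. The names `fit_gmm_1d` and `select_components_by_bic` are hypothetical.

```python
import math


def fit_gmm_1d(data, k, iterations=50):
    """Fit a k-component 1-D Gaussian mixture with plain EM.

    Returns (weights, means, variances, log_lik), where log_lik is the
    total log-likelihood computed in the last E-step.
    """
    ordered = sorted(data)
    n = len(ordered)
    floor = 1e-6  # variance floor, guards against division by zero
    # Deterministic initialisation: spread the means over the data quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mean_all = sum(ordered) / n
    var_all = sum((x - mean_all) ** 2 for x in ordered) / n
    variances = [max(var_all, floor)] * k
    weights = [1.0 / k] * k

    log_lik = -math.inf
    for _ in range(iterations):
        # E-step: responsibility of every component for every sample.
        resp = []
        log_lik = 0.0
        for x in ordered:
            dens = [
                w * math.exp(-(x - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300
            log_lik += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, ordered)) / nj
            variances[j] = max(
                floor,
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, ordered)) / nj,
            )
    return weights, means, variances, log_lik


def select_components_by_bic(data, max_k):
    """Return (best_k, {k: bic}) for k = 1..max_k; lower BIC is better."""
    n = len(data)
    scores = {}
    for k in range(1, max_k + 1):
        *_, log_lik = fit_gmm_1d(data, k)
        num_params = 3 * k - 1  # k means + k variances + (k - 1) free weights
        scores[k] = -2.0 * log_lik + num_params * math.log(n)
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

The BIC penalty grows with the number of parameters, so extra components are kept only when they improve the fit enough to justify their cost. The same loop could be run over different numbers of training frames to study that trade-off.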

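At present, the angle monitored at the start of abnormal-sound monitoring is also set at random. In the future, the monitored angles could be preset to the directions where abnormal sounds are more likely to appear, which would avoid an abnormal sound first appearing at an angle that is not being monitored.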

