Real-Time Performance Verification

Chapter 4 Implementation and Experimental Results

4.3 Real-Time Performance Verification

National Chengchi University (國立政治大學)

Table 4-13. Real-time verification results (car alarm sound)

Number of detected events    Response time    Bilateral filter time
0                            719 ms           672 ms
1                            842 ms           656 ms
2                            922 ms           656 ms
3                            1031 ms          656 ms
4                            1140 ms          656 ms
5                            1250 ms          656 ms
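As a quick check on the figures in Table 4-13, the short Python sketch below computes how much of each response time is spent in the bilateral filter and how much each additional detected event adds. The numbers are transcribed from the table; the variable names are illustrative only and are not taken from the original system.

```python
# Values transcribed from Table 4-13 (car alarm sound).
# Index = number of detected events (0 to 5).
response_ms = [719, 842, 922, 1031, 1140, 1250]
filter_ms = [672, 656, 656, 656, 656, 656]

# Fraction of the response time spent in the bilateral filter.
shares = [f / r for f, r in zip(filter_ms, response_ms)]

# Extra response time added by each additional detected event.
increments = [b - a for a, b in zip(response_ms, response_ms[1:])]
mean_increment = sum(increments) / len(increments)

for n, (r, s) in enumerate(zip(response_ms, shares)):
    print(f"{n} events: {r} ms total, filter share {s:.0%}")
print(f"mean increment per detected event: {mean_increment:.1f} ms")
```

In every row the filter accounts for more than half of the response time, whereas each additional event adds only about 0.1 second.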

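The results show that the bilateral filter is the bottleneck of the response time: it takes about 656 to 672 ms regardless of the number of detected events. This raises the question of whether the bilateral filter could be dropped to save time. The following experiment was designed to answer that question. Its results are given in the tables below, which compare onset detection with and without the bilateral filter.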

Table 4-14. Real-time verification

Number of test samples    Response time    Bilateral filter time
638                       1176 ms          656 ms

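Table 4-15. Comparison of onset detection results with and without the bilateral filter

Audio event    Onset detection    Onset detection with bilateral filter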


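When the test environment contains only audio events, the system can detect onsets effectively without the bilateral filter. When environmental sound is present, however, as the table above shows, running onset detection directly on the audio frames causes over-segmentation. The effect is especially pronounced for news broadcasts and talk shows in the television scenario, and it degrades the test results. The bilateral filter therefore remains necessary.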
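The over-segmentation effect can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the thesis implementation: the filter is a 1-D bilateral filter applied to a synthetic energy envelope, and the onset detector is a plain first-difference threshold. The function names, the synthetic signal, and all parameter values (`radius`, `sigma_s`, `sigma_r`, `threshold`) are hypothetical.

```python
import math


def bilateral_filter_1d(x, radius=6, sigma_s=2.0, sigma_r=0.2):
    """Edge-preserving smoothing: each neighbour is weighted by its
    distance in time (sigma_s) and its difference in value (sigma_r)."""
    out = []
    for i, center in enumerate(x):
        num = den = 0.0
        for j in range(max(0, i - radius), min(len(x), i + radius + 1)):
            w_space = math.exp(-((j - i) ** 2) / (2 * sigma_s ** 2))
            w_range = math.exp(-((x[j] - center) ** 2) / (2 * sigma_r ** 2))
            num += w_space * w_range * x[j]
            den += w_space * w_range
        out.append(num / den)
    return out


def count_onsets(x, threshold=0.08):
    """Count sample-to-sample rises that exceed the threshold."""
    return sum(1 for a, b in zip(x, x[1:]) if b - a > threshold)


# Synthetic envelope: one audio event (samples 50 to 149) on top of a
# small alternating ripple that stands in for environmental sound.
envelope = [(1.0 if 50 <= n < 150 else 0.0) + (0.05 if n % 2 == 0 else -0.05)
            for n in range(200)]

raw_onsets = count_onsets(envelope)
filtered_onsets = count_onsets(bilateral_filter_1d(envelope))
print("onsets without filter:", raw_onsets)
print("onsets with filter:", filtered_onsets)
```

On this synthetic input the unfiltered envelope yields many spurious onsets from the ripple, while the filtered envelope keeps only the single rise of the event, because the range weight suppresses averaging across the large step.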


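When intra-class variants are treated as different classes, the classification accuracy reaches 80%; when they are treated as the same class, it rises above 90%. Experiments in which environmental sound and multiple audio events occur simultaneously were designed to simulate situations likely to arise in a real home. The system can still recognize the various sounds to a considerable degree and classify them correctly: under the different scenario environmental sounds the recognition accuracy remains at 85%, and when two audio events occur at the same time, each individual event is still recognized with an accuracy above 70%. In home scenarios, which tolerate errors relatively well, the system classifies the various audio events effectively. As for real-time performance, the system's response time is also quite good. Processing one audio frame takes about 1 second, and because the worst case arises only at system start-up, the average response time to an audio event is between about 1 and 5.5 seconds. Across the various smart-home scenarios, the system can respond within a reasonable time limit.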

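To cope with changes in the test environment, however, more effective sound source separation is one of the most important goals for future work. Using multiple directional microphones or a microphone array to localize several simultaneous sources would allow sources to be separated by spatial position, so that audio events occurring at the same time, or even in the same frequency band, could be told apart. In addition, collecting more audio training data to build a more complete database, or excluding environmental sound through background modeling, would extend the system of this study to larger test environments and help realize more complete computer audition technology.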


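B. Experimental Results of Basic Global Thresholding

Class1 (doorbell sound 1): basic global thresholding results
[Figure panels: audio event image; intensity histogram of the audio event image; results for ΔT = 50, 30, 20, 10, 5, 1]

Class2 (doorbell sound 2): basic global thresholding results
[Figure panels: audio event image; intensity histogram of the audio event image; results for ΔT = 50, 30, 20, 10, 5, 1]

Class3 (telephone ring 1): basic global thresholding results
[Figure panels: audio event image; intensity histogram of the audio event image; results for ΔT = 50, 30, 20, 10, 5, 1]

Class4 (telephone ring 2): basic global thresholding results
[Figure panels: audio event image; intensity histogram of the audio event image; results for ΔT = 5, 1]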
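To make the role of ΔT in these results concrete, the following Python sketch shows the standard iterative basic global thresholding scheme, in which ΔT is the stopping tolerance on the change of the threshold between iterations. It assumes that this is the scheme the appendix refers to; the function name, the toy pixel values, and the handling of an empty group are illustrative choices, not the original implementation.

```python
def basic_global_threshold(pixels, delta_t=1.0):
    """Iterative global threshold: start from the mean intensity, split the
    pixels at the current threshold, move the threshold to the midpoint of
    the two group means, and stop once it changes by less than delta_t."""
    t = sum(pixels) / len(pixels)
    while True:
        low = [p for p in pixels if p <= t]
        high = [p for p in pixels if p > t]
        if not low or not high:
            # All pixels fall on one side; no further update is possible.
            return t
        new_t = 0.5 * (sum(low) / len(low) + sum(high) / len(high))
        if abs(new_t - t) < delta_t:
            return new_t
        t = new_t


# Toy intensity data (hypothetical): dark background, a few mid-grey
# pixels, and a few bright pixels.
pixels = [0] * 90 + [30] * 5 + [100] * 5

for delta_t in (50, 30, 20, 10, 5, 1):
    print(f"delta_t = {delta_t}: threshold = "
          f"{basic_global_threshold(pixels, delta_t):.2f}")
```

With this toy input a loose tolerance such as ΔT = 50 stops after the first update, while ΔT = 1 keeps iterating until the threshold stabilizes at a higher value, which mirrors why the appendix compares several ΔT settings.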


C. Experimental Results of Double Thresholding

₂ is the mean of the whole image

Class4 (telephone ring 2): double thresholding results


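Class1. Doorbell sound 1: audio event and audio block samples

Class2. Doorbell sound 2: audio event and audio block samples

Class3. Telephone ring 1: audio event and audio block samples

Class8. Fire alarm sound: audio event and audio block samples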

