6.1 Trend Estimation Results
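This thesis proposes removing the background accompaniment with a GMM-based transformation function and using PFCC as the feature parameters. It then proposes a two-stage trend estimation method. Table 5-3 shows that the coverage of the correct answers exceeds 90% for every entry. This figure is the upper bound on the final accuracy, because only the range selected by trend estimation is passed on to the pitch-tracking step. For this stage, 90% is a satisfactory result: it means that most of the correct pitches fall inside the selected range. The concern is that Table 5-3 also shows that the number of audio files with low coverage barely decreases. This suggests that these special cases cannot be handled by the current method. In the trend-estimation optimization step, the system parameters (such as M and N, introduced in Chapter 3) are chosen to give the best overall performance, so they are hard to fine-tune for individual special cases. Popular music is highly diverse, so such special cases are bound to occur, and handling them remains an issue for future work.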
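The claim that coverage caps the final accuracy can be checked with a small sketch. It assumes pitches in MIDI semitones with 0 marking unvoiced frames and a tolerance of ±50 cents (0.5 semitone), the usual raw-pitch-accuracy convention. It also assumes, since this section does not define coverage precisely, that a frame counts as covered when its reference pitch lies inside the trend range widened by that same tolerance. The function name is hypothetical and is not the thesis's code.

```python
import numpy as np


def coverage_and_raw_pitch_accuracy(ref_midi, trend_lo, trend_hi, est_midi, tol=0.5):
    """Return (coverage, raw_pitch_accuracy) over voiced frames.

    All pitches are in MIDI semitones; a reference value of 0 marks an
    unvoiced frame. tol=0.5 semitone corresponds to +/-50 cents.
    """
    ref = np.asarray(ref_midi, dtype=float)
    lo = np.asarray(trend_lo, dtype=float)
    hi = np.asarray(trend_hi, dtype=float)
    est = np.asarray(est_midi, dtype=float)

    voiced = ref > 0
    n_voiced = int(voiced.sum())
    if n_voiced == 0:
        raise ValueError("no voiced frames in the reference")

    # Assumed coverage definition: the reference lies inside the trend range
    # widened by the same tolerance that is used for scoring.
    covered = voiced & (ref >= lo - tol) & (ref <= hi + tol)

    # Pitch tracking may only answer inside the trend range, so clip the estimate.
    est_in_range = np.clip(est, lo, hi)
    correct = voiced & (np.abs(est_in_range - ref) <= tol)

    return covered.sum() / n_voiced, correct.sum() / n_voiced


cov, acc = coverage_and_raw_pitch_accuracy(
    ref_midi=[60, 62, 0, 65, 70],
    trend_lo=[58, 58, 58, 60, 60],
    trend_hi=[64, 64, 64, 66, 66],
    est_midi=[60.2, 63.0, 50.0, 65.4, 66.0],
)
```

Every correctly tracked frame has an estimate inside the trend range and within the tolerance of the reference. Its reference must therefore lie inside the widened range, so under this definition the returned accuracy can never exceed the returned coverage.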
6.2 Accuracy Results
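The best performance in this thesis is given by the last column of Table 5-3, “New_2stage_noDP_H&N”. This configuration uses two-stage trend estimation and applies HPSS and NSHS processing to the original spectrum. Instead of DP (dynamic programming), it takes the maximum of each frame within the trend range as the predicted pitch. Its accuracy is 79.6%, slightly below the 80.12% of Hsu’s method. The two approaches differ in that the proposed method is built on parameter training, whereas Hsu’s method is constructed entirely from a signal-processing perspective. Table 6-1 shows the positive effect of the amount of training data on the system. The training data used in this thesis total only about 40 minutes (column C of the table), roughly the length of 10 pop songs, which is very little. When we increased the training data to 60 minutes, accuracy rose to 80.68% (column D). We therefore expect the proposed method to perform better as more music data become available.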
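The decoding rule of “New_2stage_noDP_H&N” takes, without DP, the maximum of each frame inside the trend range. This amounts to a masked per-frame argmax. The sketch assumes a salience matrix (frames × pitch bins) that has already been produced by HPSS and NSHS processing, which are not implemented here. It also assumes a pitch axis in MIDI semitones and per-frame trend bounds. The function name and the choice to return NaN for frames whose range contains no bin are illustrative and not taken from the thesis.

```python
import numpy as np


def pick_pitch_in_trend(salience, bin_midi, trend_lo, trend_hi):
    """Per-frame argmax of `salience`, restricted to each frame's trend range.

    salience: array of shape (n_frames, n_bins), e.g. an HPSS + NSHS
              processed spectrum (assumed to be computed elsewhere).
    bin_midi: pitch of each bin in MIDI semitones, shape (n_bins,).
    trend_lo, trend_hi: per-frame range bounds in MIDI semitones.
    Returns one pitch per frame; NaN where the range contains no bin.
    """
    S = np.asarray(salience, dtype=float)
    bins = np.asarray(bin_midi, dtype=float)
    lo = np.asarray(trend_lo, dtype=float)[:, None]
    hi = np.asarray(trend_hi, dtype=float)[:, None]

    # Boolean mask of the bins each frame is allowed to choose from.
    allowed = (bins[None, :] >= lo) & (bins[None, :] <= hi)

    # Bins outside the range can never win the argmax.
    masked = np.where(allowed, S, -np.inf)
    best = np.argmax(masked, axis=1)

    pitch = bins[best]
    pitch[~allowed.any(axis=1)] = np.nan
    return pitch


p = pick_pitch_in_trend(
    salience=[[0.1, 0.9, 0.2, 0.8],
              [0.5, 0.1, 0.3, 0.7],
              [1.0, 1.0, 1.0, 1.0]],
    bin_midi=[60, 61, 62, 63],
    trend_lo=[62, 60, 70],
    trend_hi=[63, 61, 72],
)
```

In the first frame the global maximum (bin 61) lies outside the range, so the sketch returns 63 instead. This is the intended effect of restricting the search with trend estimation.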
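Table 6-1: Effect of the amount of training data on accuracy and coverage

                                          A        B        C        D
Raw pitch accuracy                        66.97%   73.59%   79.6%    80.68%
Files with raw pitch accuracy < 50%       199      124      37       33
Files with raw pitch accuracy < 25%       97       46       10       4
Files with raw pitch accuracy < 10%       43       13       5        1
Correct-answer coverage                   79.63%   84.57%   90.75%   91.16%
Files with coverage < 50%                 146      79       19       19
Files with coverage < 25%                 64       29       10       4
Files with coverage < 10%                 30       11       5        1

* A: 64,727 feature vectors in total, about 10 minutes.
  B: 126,430 feature vectors in total, about 20 minutes.
  C: 253,455 feature vectors in total, about 40 minutes (the amount finally used in this thesis).
  D: 365,780 feature vectors in total, about 60 minutes.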
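The count rows of Table 6-1 give the number of audio files whose score falls below 50%, 25%, and 10%. Below is a minimal sketch of that count. It assumes one score per file expressed as a fraction in [0, 1] and a strict “below” comparison. The function name is hypothetical, and no per-file data from the thesis are reproduced.

```python
def count_files_below(per_file_scores, thresholds=(0.50, 0.25, 0.10)):
    """Count files whose score is strictly below each threshold.

    per_file_scores: one value in [0, 1] per audio file, either its raw
    pitch accuracy or its correct-answer coverage.
    Returns a dict mapping each threshold to a file count.
    """
    scores = list(per_file_scores)
    return {t: sum(1 for s in scores if s < t) for t in thresholds}


counts = count_files_below([0.05, 0.2, 0.3, 0.6, 0.9])
```

The same function serves both groups of rows; only the per-file score that is passed in changes.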
References
[1] Wei-Ho Tsai and Hao-Ping Lin, “Background Music Removal Based on Cepstrum Transformation for Popular Singer Identification,” IEEE Trans. on Audio, Speech, and Language Processing, vol. 19, no. 5, July 2011.
[2] 宋柏毅, “以韻律模型為基礎之中文韻律轉換研究,” M.S. thesis, National Chiao Tung University, 2009.
[3] Sebastian Ewert and Meinard Müller, “Chroma Toolbox: Matlab Implementations for Extracting Variants of Chroma-Based Audio Features,” in Proc. ISMIR, 2011.
[4] Sebastian Ewert, Meinard Müller and Michael Clausen, “Towards Timbre-Invariant Audio Features for Harmony-Based Music,” IEEE Trans. on Audio, Speech, and Language Processing, vol. 18, no. 3, pp. 649–662, 2010.
[5] Chao-Ling Hsu and Jyh-Shing Roger Jang, MIR-1K dataset, https://sites.google.com/site/unvoicedsoundseparation/mir-1k (n.d.).
[6] Mike Brookes, VOICEBOX, http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html (n.d.).
[7] 白宗儒, “一個適用於複音音樂之音高追蹤的混乘法,” M.S. thesis, National Tsing Hua University, 2011.
[8] N. Ono, K. Miyamoto, J. Le Roux, H. Kameoka, and S. Sagayama, “Separation of a monaural audio signal into harmonic/percussive components by complementary diffusion on spectrogram,” in Proc. EUSIPCO, 2008.
[9] D. J. Hermes, “Measurement of pitch by subharmonic summation,” J. Acoust. Soc. Am., vol. 83, pp. 257–264, 1988.
[10] M. Goto, “A Real-Time Music Scene Description System: Predominant-F0 Estimation for Detecting Melody and Bass Lines in Real-World Audio Signals,” Speech Communication, vol. 43, no. 4, pp. 311–329, 2004.
[11] Chao-Ling Hsu and Roger Jang, “Singing Pitch Extraction at MIREX 2010,” The Music Information Retrieval Evaluation Exchange (MIREX), 2010.
[12] Justin Salamon and Emilia Gómez, “Melody Extraction from Polyphonic Music Signals using Pitch Contour Characteristics,” IEEE Trans. on Audio, Speech, and Language Processing, 2012.
[13] A. Klapuri, “Multiple fundamental frequency estimation by summing harmonic amplitudes,” in Proc. 7th Int. Conf. on Music Information Retrieval, Victoria, Canada, Oct. 2006.
[14] G. Poliner and D. Ellis, “A classification approach to melody transcription,” in Proc. 6th Int. Conf. on Music Information Retrieval, London, Sep. 2005.
[15] M. Lagrange, L. G. Martins, J. Murdoch, and G. Tzanetakis, “Normalized cuts for predominant melodic source separation,” IEEE Trans. on Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 278–290, Feb. 2008.
[16] Yannis Panagakis, Constantine Kotropoulos and Gonzalo R. Arce, “ℓ1-Graph Based Music Structure Analysis,” 12th International Society for Music Information Retrieval Conference (ISMIR), 2011.
[17] Joan Serrà, Emilia Gómez, Perfecto Herrera and Xavier Serra, “Chroma Binary Similarity and Local Alignment Applied to Cover Song Identification,” IEEE Trans. on Audio, Speech, and Language Processing, 2008.
[18] Cynthia C. S. Liem and Alan Hanjalic, “Cover Song Retrieval: A Comparative Study of System Component Choices,” 10th International Society for Music Information Retrieval Conference (ISMIR), 2009.
[19] D. Ellis and G. Poliner, “Identifying Cover Songs with Chroma Features and Dynamic Programming Beat Tracking,” in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2007.
[20] Meinard Müller, “Information Retrieval for Music and Motion,” Chapter 3, 2007.
[21] Frank Kurth and Meinard Müller, “Efficient Index-Based Audio Matching,” IEEE Trans. on Audio, Speech, and Language Processing, vol. 16, no. 2, 2008.
[22] Riccardo Miotto and Nicola Orio, “A Music Identification System Based on Chroma Indexing and Statistical Modeling,” ISMIR, Content-Based Retrieval, Categorization and Similarity, 2008.
[23] D. Ellis, “Classifying Music Audio with Timbral and Chroma Features,” International Society for Music Information Retrieval Conference (ISMIR), 2007.
[24] George Tzanetakis and Perry Cook, “Musical Genre Classification of Audio Signals,” IEEE Trans. on Speech and Audio Processing, vol. 10, no. 5, 2002.
[25] Yuxiang Liu, Qiaoliang Xiang, Ye Wang and Lianhong Cai, “Cultural Style Based Music Classification of Audio Signals,” in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2009.
[26] D. Ellis, “Beat Tracking by Dynamic Programming,” J. New Music Research, Special Issue on Beat and Tempo Extraction, vol. 36, no. 1, pp. 51–60, 2007.