Result Analysis and Future Outlook

6.1 Trend Estimation Results

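This thesis proposes removing the background accompaniment with a GMM-based transformation function, uses PFCC as the feature parameters, and finally proposes a two-stage trend estimation method. Table 5-3 shows that the correct-answer coverage rate of every configuration is above 90%. This value is the upper bound on the accuracy computed at the end, because only the range selected by trend estimation is passed on to the next step, pitch tracking. A coverage rate of 90% is a satisfactory result for this stage: it means that most of the correct answers lie inside the selected range. A concern, however, is that Table 5-3 also shows that the number of audio files with a low coverage rate hardly decreases, which indicates that the current method cannot handle these special cases. In the trend estimation optimization step, the system parameters (such as M and N described in Chapter 3) are chosen to give the best overall performance, so it is difficult to fine-tune them for special cases. Popular music is highly varied and such special cases are bound to exist, so this is a problem that future work must face.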
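The relationship between coverage rate and accuracy can be illustrated with a minimal sketch. It assumes, for illustration only, that pitch is quantized into integer bins, that a frame counts as correct when the predicted bin equals the true bin, and that every prediction is taken from inside the trend range; the function names and the toy data are hypothetical and are not taken from the thesis.

```python
from typing import List, Tuple


def coverage_rate(truth: List[int], ranges: List[Tuple[int, int]]) -> float:
    """Fraction of frames whose true pitch bin lies inside the trend range."""
    hits = sum(lo <= t <= hi for t, (lo, hi) in zip(truth, ranges))
    return hits / len(truth)


def raw_pitch_accuracy(truth: List[int], predicted: List[int]) -> float:
    """Fraction of frames whose predicted pitch bin equals the true bin."""
    hits = sum(t == p for t, p in zip(truth, predicted))
    return hits / len(truth)


# Toy data (hypothetical): true pitch bin and trend range for five frames.
truth = [60, 62, 64, 65, 67]
ranges = [(58, 63), (58, 63), (62, 66), (60, 64), (65, 69)]
# Predictions are restricted to the trend range of their frame.
predicted = [60, 61, 64, 64, 67]

cov = coverage_rate(truth, ranges)
acc = raw_pitch_accuracy(truth, predicted)
print(f"coverage rate = {cov:.2f}, raw pitch accuracy = {acc:.2f}")
```

In the fourth frame the true bin lies outside the trend range, so no prediction drawn from that range can be correct there. Under these assumptions the accuracy can never exceed the coverage rate.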

6.2 Accuracy Results

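The best performance in this thesis is the last column of Table 5-3, “New_2stage_noDP_H&N”. This configuration uses two-stage trend estimation, applies HPSS and NSHS to the original spectrum, and does not use DP; instead, the maximum value of each frame within the trend range is taken directly as the predicted answer. Its accuracy is 79.6%, which is still slightly below the 80.12% of Hsu’s method. The two approaches differ in that the method proposed in this thesis is based on parameter training, whereas Hsu’s method is built entirely from a signal processing perspective. Table 6-1 shows the positive effect of the amount of training data on the system. The training data used in this thesis totals only about 40 minutes (column C in the table), roughly the length of 10 popular songs, which is very little. When we increased the training data to 60 minutes, the accuracy rose to 80.68% (column D in the table). We therefore expect the proposed method to perform better in the future, when more music data is available.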
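The prediction step without DP, in which each frame takes the maximum value inside its trend range, can be sketched as follows. This is an illustrative sketch only: the salience matrix stands in for the HPSS- and NSHS-processed spectrum, the trend ranges are given as inclusive bin indices, and the function name and data are hypothetical.

```python
from typing import List, Tuple


def pick_pitch_without_dp(
    salience: List[List[float]], ranges: List[Tuple[int, int]]
) -> List[int]:
    """For each frame, return the bin with the largest salience inside [lo, hi]."""
    picks = []
    for frame, (lo, hi) in zip(salience, ranges):
        # Search only the bins allowed by the trend range of this frame.
        best_bin = max(range(lo, hi + 1), key=lambda b: frame[b])
        picks.append(best_bin)
    return picks


# Toy salience values (hypothetical): three frames, five pitch bins each.
salience = [
    [0.1, 0.9, 0.3, 0.2, 0.8],
    [0.2, 0.1, 0.7, 0.4, 0.9],
    [0.5, 0.2, 0.1, 0.6, 0.3],
]
ranges = [(2, 4), (1, 3), (0, 2)]

picks = pick_pitch_without_dp(salience, ranges)
print(picks)
```

In the first frame the global maximum is at bin 1, but that bin lies outside the trend range (2, 4), so bin 4 is selected instead. Each frame is decided independently, with no smoothing across frames.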


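Table 6-1: Effect of the amount of training data on accuracy and coverage rate

                                                  A        B        C        D
Raw pitch accuracy                                66.97%   73.59%   79.6%    80.68%
Number of files with raw pitch accuracy < 50%     199      124      37       33
Number of files with raw pitch accuracy < 25%     97       46       10       4
Number of files with raw pitch accuracy < 10%     43       13       5
Correct-answer coverage rate                      79.63%   84.57%   90.75%   91.16%
Number of files with coverage rate < 50%          146      79       19
Number of files with coverage rate < 25%          64       29       10
Number of files with coverage rate < 10%          30       11       5

A: 64727 feature vectors in total, about 10 minutes
B: 126430 feature vectors in total, about 20 minutes
C: 253455 feature vectors in total, about 40 minutes (the amount of data finally used in this thesis)
D: 365780 feature vectors in total, about 60 minutes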
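The raw pitch accuracies in Table 6-1 can be converted into the gain obtained at each increase of training data with simple arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and the durations are the approximate totals listed under the table.

```python
# Approximate training duration (minutes) and raw pitch accuracy (%) from Table 6-1.
settings = [("A", 10, 66.97), ("B", 20, 73.59), ("C", 40, 79.6), ("D", 60, 80.68)]

gains = []
for (prev_name, prev_min, prev_acc), (name, minutes, acc) in zip(settings, settings[1:]):
    gain = acc - prev_acc                     # accuracy gain in percentage points
    per_minute = gain / (minutes - prev_min)  # gain per added minute of training data
    gains.append((f"{prev_name}->{name}", gain, per_minute))

for step, gain, per_minute in gains:
    print(f"{step}: +{gain:.2f} points, {per_minute:.3f} points per added minute")
```

In these four settings the gain per added minute becomes smaller at each step, although the accuracy is still rising at 60 minutes.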

References

[1] Wei-Ho Tsai and Hao-Ping Lin, “Background Music Removal Based on Cepstrum Transformation for Popular Singer Identification”, IEEE Trans. on Audio, Speech, and Language Processing, vol. 19, no. 5, July 2011.

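[2] 宋柏毅, “以韻律模型為基礎之中文韻律轉換研究” (A Study of Mandarin Prosody Conversion Based on a Prosody Model), Master's thesis, National Chiao Tung University, 2009.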

[3] Sebastian Ewert and Meinard Müller, “Chroma Toolbox: Matlab Implementations for Extracting Variants of Chroma-Based Audio Features”, ISMIR 2011.

[4] Sebastian Ewert, Meinard Müller and Michael Clausen, “Towards Timbre-Invariant Audio Features for Harmony-Based Music”, IEEE Trans. on Audio, Speech and Language Process., Vol. 18, No. 3, pp. 649-662, 2010.

[5] Chao-Ling Hsu and Jyh-Shing Roger Jang, “https://sites.google.com/site/unvoicedsoundseparation/mir-1k”, (n.d.).

[6] Mike Brookes, “http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html”, (n.d.).

[7] 白宗儒, “一個是用於複音音樂之音高追蹤的混乘法” (A Hybrid Method for Pitch Tracking in Polyphonic Music), Master's thesis, National Tsing Hua University, 2011.

[8] N. Ono, K. Miyamoto, J. Le Roux, H. Kameoka, and S. Sagayama, “Separation of a monaural audio signal into harmonic/percussive components by complementary diffusion on spectrogram”, Proc. of EUSIPCO, 2008.

[9] D. J. Hermes, “Measurement of pitch by subharmonic summation”, J. Acoust. Soc. Am., vol. 83, pp. 257-264, 1988.

[10] M. Goto, “A Real-Time Music Scene Description System: Predominant-F0 Estimation for Detecting Melody and Bass Lines in Real-World Audio Signals”, Speech Communication, vol. 43, no. 4, pp. 311–329, 2004.

[11] Chao-Ling Hsu and Roger Jang, “Singing Pitch Extraction at MIREX 2010”, The Music Information Retrieval Evaluation Exchange (MIREX), 2010.

[12] Justin Salamon and Emilia Gómez, “Melody Extraction from Polyphonic Music Signals using Pitch Contour Characteristics”, IEEE Trans. on Audio, Speech, and Language Processing, 2012.

[13] A. Klapuri, “Multiple fundamental frequency estimation by summing harmonic amplitudes,” in Proc. 7th Int. Conf. on Music Inform. Retrieval, Victoria, Canada, Oct. 2006.

[14] G. Poliner and D. Ellis, “A classification approach to melody transcription,” in Proc. 6th Int. Conf. on Music Inform. Retrieval, London, Sep. 2005.

[15] M. Lagrange, L. G. Martins, J. Murdoch, and G. Tzanetakis, “Normalized cuts for predominant melodic source separation processing,” IEEE Trans. on Audio, Speech and Language Process., vol. 16, no. 2, pp. 278–290, Feb. 2008.

[16] Yannis Panagakis, Constantine Kotropoulos and Gonzalo R. Arce, “ℓ1-Graph Based Music Structure Analysis”, 12th International Society for Music Information Retrieval Conference (ISMIR), 2011.

[17] Joan Serrà, Emilia Gómez, Perfecto Herrera and Xavier Serra, “Chroma Binary Similarity and Local Alignment Applied to Cover Song Identification”, IEEE Trans. on Audio, Speech, and Language Processing, 2008.

[18] Cynthia C.S. Liem and Alan Hanjalic, “Cover Song Retrieval: a Comparative Study of System Component Choices”, 10th International Society for Music Information Retrieval Conference (ISMIR), 2009.

[19] D. Ellis and G. Poliner, “Identifying Cover Songs with Chroma Features and Dynamic Programming Beat Tracking”, IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2007.

[20] Meinard Müller, “Information Retrieval for Music and Motion (Chapter 3)”, 2007.

[21] Frank Kurth and Meinard Müller, “Efficient Index-Based Audio Matching”, IEEE Trans. on Audio, Speech, and Language Processing, Vol. 16, No. 2, 2008.

[22] Riccardo Miotto and Nicola Orio, “A Music Identification System Based on Chroma Indexing and Statistical Modeling”, ISMIR Content-Based Retrieval, Categorization and Similarity, 2008.

[23] D. Ellis, “Classifying Music Audio with Timbral and Chroma Features,” International Society for Music Information Retrieval Conference (ISMIR), 2007.

[24] George Tzanetakis and Perry Cook, “Musical Genre Classification of Audio Signals,” IEEE Trans. on Speech and Audio Processing, Vol. 10, No. 5, 2002.

[25] Yuxiang Liu, Qiaoliang Xiang, Ye Wang and Lianhong Cai, “Cultural Style Based Music Classification of Audio Signals”, IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2009.

[26] D. Ellis, “Beat Tracking by Dynamic Programming”, J. New Music Research, Special Issue on Beat and Tempo Extraction, Vol. 36, No. 1, pp. 51-60, 2007.

