
Comparison of multiple representations of each feature set


In this experiment, each feature set is represented by multiple prototypes generated with the k-means clustering algorithm. Tables 3.4 to 3.7 show the classification results for different numbers of prototypes (clusters). The best classification accuracy, 89.99%, is obtained when α = 0.99, nine prototypes are used for each music genre class, and LDA is used for dimension reduction of the feature set SMMFCC+SMOSC+SMNASE+SMCOMB.
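For illustration only, the following sketch shows one way to realize this multiple-prototype scheme: k-means is run within each genre class to obtain k prototypes, and a test vector is assigned to the class of its nearest prototype. The function names are hypothetical, scikit-learn's KMeans is an assumed implementation choice, and the feature vectors are assumed to have already been projected (e.g., by LDA); this is a minimal sketch, not the thesis implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_prototypes(features_by_class, k):
    """Run k-means inside each class and keep the k cluster centers as prototypes."""
    prototypes = {}
    for label, X in features_by_class.items():        # X: (n_samples, n_dims) array
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        prototypes[label] = km.cluster_centers_        # shape (k, n_dims)
    return prototypes

def classify(x, prototypes):
    """Assign x to the class whose nearest prototype is closest (Euclidean distance)."""
    best_label, best_dist = None, np.inf
    for label, centers in prototypes.items():
        dist = np.min(np.linalg.norm(centers - x, axis=1))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```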

Table 3.1 Classification accuracy (%) of different modulation spectral features using LDA or NDA as the classifier

α = 0.98
Feature Set  LDA  NDA
SMMFCC  82.30  81.62
DMMFCC  72.43  72.15
SMOSC  80.52  80.38
DMOSC  77.09  76.54
SMNASE  80.38  80.52
DMNASE  76.41  75.72
SMMFCC⊕SMOSC  86.28  86.28
DMMFCC⊕DMOSC  81.21  81.21
SMMFCC⊕SMNASE  85.32  86.01
DMMFCC⊕DMNASE  80.52  79.97
SMOSC⊕SMNASE  84.22  84.64
DMOSC⊕DMNASE  78.60  78.88
SMMFCC⊕SMOSC⊕SMNASE  86.83  87.65
DMMFCC⊕DMOSC⊕DMNASE  80.93  81.76
SMMFCC+SMOSC+SMNASE  83.95  84.77
DMMFCC+DMOSC+DMNASE  81.89  81.89
SMMFCC+SMOSC+SMNASE+DMMFCC+DMOSC+DMNASE  83.54  84.50
SMMFCC+SMOSC+SMNASE+SMCOMB  85.19  85.73
DMMFCC+DMOSC+DMNASE+DMCOMB  82.30  82.99
SMMFCC+SMOSC+SMNASE+SMCOMB+DMMFCC+DMOSC+DMNASE+DMCOMB  84.50  85.73

α = 0.99
Feature Set  LDA  NDA
SMMFCC  81.48  83.26
DMMFCC  75.17  73.53
SMOSC  82.17  81.62
DMOSC  77.37  76.82
SMNASE  80.38  81.34
DMNASE  77.09  77.09
SMMFCC⊕SMOSC  86.01  85.73
DMMFCC⊕DMOSC  81.62  81.07
SMMFCC⊕SMNASE  86.42  86.42
DMMFCC⊕DMNASE  80.66  80.66
SMOSC⊕SMNASE  86.56  86.56
DMOSC⊕DMNASE  79.70  79.84
SMMFCC⊕SMOSC⊕SMNASE  88.07  88.07
DMMFCC⊕DMOSC⊕DMNASE  82.72  82.30
SMMFCC+SMOSC+SMNASE  85.19  86.15
DMMFCC+DMOSC+DMNASE  80.93  80.93
SMMFCC+SMOSC+SMNASE+DMMFCC+DMOSC+DMNASE  83.95  85.32
SMMFCC+SMOSC+SMNASE+SMCOMB  86.97  87.93
DMMFCC+DMOSC+DMNASE+DMCOMB  82.58  82.72
SMMFCC+SMOSC+SMNASE+SMCOMB+DMMFCC+DMOSC+DMNASE+DMCOMB  86.15  86.83

Table 3.2 Classification accuracy (%) using the kNN classifier with LDA

α = 0.98
Feature Set  kNN=5  kNN=10  kNN=15  kNN=20
SMMFCC  80.38  80.93  82.30  81.21
DMMFCC  76.68  76.95  76.82  76.41
SMOSC  81.62  81.76  82.30  81.76
DMOSC  76.95  78.74  77.91  77.78
SMNASE  80.25  81.07  81.34  80.52
DMNASE  75.03  75.58  76.82  77.09
SMMFCC⊕SMOSC  86.69  86.01  86.56  87.11
DMMFCC⊕DMOSC  79.97  79.70  80.66  80.52
SMMFCC⊕SMNASE  85.19  85.19  85.46  85.87
DMMFCC⊕DMNASE  78.60  79.42  80.80  80.80
SMOSC⊕SMNASE  84.22  83.68  83.40  83.54
DMOSC⊕DMNASE  80.38  79.15  80.11  80.11
SMMFCC⊕SMOSC⊕SMNASE  87.65  87.11  87.79  87.65
DMMFCC⊕DMOSC⊕DMNASE  80.93  81.76  81.62  81.21
SMMFCC+SMOSC+SMNASE  84.50  84.50  84.77  84.36
DMMFCC+DMOSC+DMNASE  80.25  79.70  80.11  79.42
SMMFCC+SMOSC+SMNASE+DMMFCC+DMOSC+DMNASE  84.09  83.68  83.13  82.58
SMMFCC+SMOSC+SMNASE+SMCOMB  85.87  86.01  85.87  85.05
DMMFCC+DMOSC+DMNASE+DMCOMB  81.76  81.48  81.76  81.62
SMMFCC+SMOSC+SMNASE+SMCOMB+DMMFCC+DMOSC+DMNASE+DMCOMB  85.19  84.64  84.77  84.22

α = 0.99
Feature Set  kNN=5  kNN=10  kNN=15  kNN=20
SMMFCC  81.48  82.85  82.58  81.76
DMMFCC  75.72  76.95  76.27  75.31
SMOSC  82.58  83.26  82.85  82.58
DMOSC  74.90  77.23  77.09  77.37
SMNASE  80.80  81.34  80.80  80.66
DMNASE  75.99  76.82  76.41  76.95
SMMFCC⊕SMOSC  86.01  87.11  86.83  86.69
DMMFCC⊕DMOSC  80.93  81.07  81.62  81.07
SMMFCC⊕SMNASE  85.73  86.28  86.01  85.60
DMMFCC⊕DMNASE  81.34  81.48  80.80  81.21
SMOSC⊕SMNASE  84.64  85.19  84.91  85.05
DMOSC⊕DMNASE  79.29  79.01  79.42  79.56
SMMFCC⊕SMOSC⊕SMNASE  87.24  87.93  87.38  86.97
DMMFCC⊕DMOSC⊕DMNASE  81.76  82.58  83.26  83.26
SMMFCC+SMOSC+SMNASE  85.05  85.19  85.46  84.50
DMMFCC+DMOSC+DMNASE  80.52  80.93  79.97  80.11
SMMFCC+SMOSC+SMNASE+DMMFCC+DMOSC+DMNASE  84.22  84.77  84.22  83.68
SMMFCC+SMOSC+SMNASE+SMCOMB  87.11  86.83  86.69  86.56
DMMFCC+DMOSC+DMNASE+DMCOMB  82.03  83.13  82.30  82.30
SMMFCC+SMOSC+SMNASE+SMCOMB+DMMFCC+DMOSC+DMNASE+DMCOMB  86.01  86.56  86.42  86.01

Table 3.3 Classification accuracy (%) using the kNN classifier with NDA

α = 0.98
Feature Set  kNN=5  kNN=10  kNN=15  kNN=20
SMMFCC  81.21  82.17  82.03  82.99
DMMFCC  74.90  75.58  75.58  76.13
SMOSC  80.66  80.93  82.44  80.93
DMOSC  76.82  76.95  77.23  77.78
SMNASE  80.66  81.48  81.07  81.48
DMNASE  77.50  78.88  78.74  78.33
SMMFCC⊕SMOSC  85.05  85.19  85.73  86.69
DMMFCC⊕DMOSC  81.07  79.84  80.25  80.38
SMMFCC⊕SMNASE  85.60  86.28  85.73  85.32
DMMFCC⊕DMNASE  79.01  79.56  80.38  79.56
SMOSC⊕SMNASE  83.95  85.19  84.91  84.64
DMOSC⊕DMNASE  78.60  80.52  80.93  80.11
SMMFCC⊕SMOSC⊕SMNASE  87.24  88.34  88.61  87.93
DMMFCC⊕DMOSC⊕DMNASE  81.62  82.44  82.17  82.17
SMMFCC+SMOSC+SMNASE  85.46  84.64  85.05  84.77
DMMFCC+DMOSC+DMNASE  81.34  80.52  81.34  80.93
SMMFCC+SMOSC+SMNASE+DMMFCC+DMOSC+DMNASE  83.26  83.54  83.26  82.85
SMMFCC+SMOSC+SMNASE+SMCOMB  87.24  86.69  87.52  87.52
DMMFCC+DMOSC+DMNASE+DMCOMB  82.30  82.03  81.76  81.62
SMMFCC+SMOSC+SMNASE+SMCOMB+DMMFCC+DMOSC+DMNASE+DMCOMB  85.60  85.73  84.91  84.64

α = 0.99
Feature Set  kNN=5  kNN=10  kNN=15  kNN=20
SMMFCC  82.30  83.13  82.30  82.58
DMMFCC  74.62  75.72  76.54  76.13
SMOSC  82.44  82.44  83.40  83.40
DMOSC  75.17  75.17  77.64  78.19
SMNASE  81.62  82.03  81.07  81.76
DMNASE  77.64  79.01  79.15  78.33
SMMFCC⊕SMOSC  87.65  88.07  87.65  87.38
DMMFCC⊕DMOSC  81.76  80.80  80.93  82.17
SMMFCC⊕SMNASE  86.01  86.42  87.11  87.11
DMMFCC⊕DMNASE  79.29  80.38  81.62  80.80
SMOSC⊕SMNASE  85.60  86.15  86.15  85.46
DMOSC⊕DMNASE  80.11  80.80  81.48  81.07
SMMFCC⊕SMOSC⊕SMNASE  87.24  87.52  88.07  87.93
DMMFCC⊕DMOSC⊕DMNASE  81.89  82.72  83.40  82.85
SMMFCC+SMOSC+SMNASE  85.87  86.15  86.42  85.87
DMMFCC+DMOSC+DMNASE  80.93  80.80  80.93  80.66
SMMFCC+SMOSC+SMNASE+DMMFCC+DMOSC+DMNASE  84.77  85.46  85.05  84.36
SMMFCC+SMOSC+SMNASE+SMCOMB  88.89  88.75  88.61  88.20
DMMFCC+DMOSC+DMNASE+DMCOMB  82.99  82.58  81.89  81.62
SMMFCC+SMOSC+SMNASE+SMCOMB+DMMFCC+DMOSC+DMNASE+DMCOMB  86.28  86.69  87.52  86.28

Table 3.4 Classification accuracy (%) using LDA with multiple prototypes for each music class and α=0.98

α = 0.98
Feature Set  k=2  k=3  k=4  k=5
SMMFCC  82.44  83.95  82.58  84.77
DMMFCC  73.66  75.45  77.37  76.68
SMOSC  80.93  82.99  81.07  82.99
DMOSC  75.45  77.37  77.09  77.37
SMNASE  81.34  82.30  82.72  83.40
DMNASE  74.90  76.68  76.68  75.99
SMMFCC⊕SMOSC  87.52  86.56  88.48  87.65
DMMFCC⊕DMOSC  81.34  79.84  80.11  80.66
SMMFCC⊕SMNASE  86.01  86.56  86.69  87.79
DMMFCC⊕DMNASE  80.93  80.52  81.07  81.07
SMOSC⊕SMNASE  86.01  84.77  85.19  85.73
DMOSC⊕DMNASE  79.15  80.93  79.70  79.01
SMMFCC⊕SMOSC⊕SMNASE  89.03  89.30  88.48  87.65
DMMFCC⊕DMOSC⊕DMNASE  81.89  81.48  81.21  81.34
SMMFCC+SMOSC+SMNASE  87.93  87.79  88.61  88.20
DMMFCC+DMOSC+DMNASE  80.66  82.03  82.30  82.17
SMMFCC+SMOSC+SMNASE+DMMFCC+DMOSC+DMNASE  86.83  86.15  87.11  87.38
SMMFCC+SMOSC+SMNASE+SMCOMB  88.89  88.89  89.30  89.03
DMMFCC+DMOSC+DMNASE+DMCOMB  83.95  82.58  82.99  82.44
SMMFCC+SMOSC+SMNASE+SMCOMB+DMMFCC+DMOSC+DMNASE+DMCOMB  87.24  87.65  88.75  88.48

Feature Set  k=6  k=7  k=8  k=9
SMMFCC  83.68  84.36  84.50  84.09
DMMFCC  77.50  76.54  75.99  76.95
SMOSC  82.99  82.58  83.13  83.13
DMOSC  75.58  76.13  75.58  75.99
SMNASE  82.85  83.68  83.54  82.72
DMNASE  76.68  77.50  78.60  78.60
SMMFCC⊕SMOSC  88.07  88.89  87.79  88.34
DMMFCC⊕DMOSC  80.38  81.21  81.76  80.66
SMMFCC⊕SMNASE  87.52  87.11  88.07  87.52
DMMFCC⊕DMNASE  81.48  82.03  81.62  80.38
SMOSC⊕SMNASE  85.46  85.60  86.28  85.46
DMOSC⊕DMNASE  80.38  79.84  80.80  81.34
SMMFCC⊕SMOSC⊕SMNASE  87.65  86.97  87.65  87.79
DMMFCC⊕DMOSC⊕DMNASE  81.62  81.48  81.89  82.03
SMMFCC+SMOSC+SMNASE  87.93  88.61  88.34  87.79
DMMFCC+DMOSC+DMNASE  82.44  82.58  82.17  82.58
SMMFCC+SMOSC+SMNASE+DMMFCC+DMOSC+DMNASE  86.56  87.38  87.38  87.93
SMMFCC+SMOSC+SMNASE+SMCOMB  89.30  89.71  89.16  89.44
DMMFCC+DMOSC+DMNASE+DMCOMB  82.99  83.13  82.17  82.58
SMMFCC+SMOSC+SMNASE+SMCOMB+DMMFCC+DMOSC+DMNASE+DMCOMB  88.75  87.79  88.20  88.48

Table 3.5 Classification accuracy (%) using LDA with multiple prototypes for each music class and α=0.99.

α = 0.99
Feature Set  k=2  k=3  k=4  k=5
SMMFCC  83.68  83.54  83.68  85.19
DMMFCC  73.66  75.31  73.39  75.17
SMOSC  83.26  83.68  83.54  82.72
DMOSC  76.13  77.23  77.91  78.05
SMNASE  81.89  82.72  83.40  84.22
DMNASE  74.76  76.41  77.64  77.64
SMMFCC⊕SMOSC  88.20  87.11  87.93  87.52
DMMFCC⊕DMOSC  80.25  80.93  80.38  81.34
SMMFCC⊕SMNASE  86.28  86.69  86.42  87.52
DMMFCC⊕DMNASE  81.48  81.62  81.62  81.07
SMOSC⊕SMNASE  87.52  86.56  86.42  86.56
DMOSC⊕DMNASE  80.66  81.34  81.62  80.80
SMMFCC⊕SMOSC⊕SMNASE  88.07  88.07  88.89  88.48
DMMFCC⊕DMOSC⊕DMNASE  82.85  83.54  82.44  82.99
SMMFCC+SMOSC+SMNASE  88.20  87.24  88.48  88.48
DMMFCC+DMOSC+DMNASE  80.66  82.99  82.72  82.30
SMMFCC+SMOSC+SMNASE+DMMFCC+DMOSC+DMNASE  86.83  87.38  87.65  88.48
SMMFCC+SMOSC+SMNASE+SMCOMB  88.34  88.61  89.03  89.16
DMMFCC+DMOSC+DMNASE+DMCOMB  82.99  83.68  84.22  83.54
SMMFCC+SMOSC+SMNASE+SMCOMB+DMMFCC+DMOSC+DMNASE+DMCOMB  87.65  88.61  88.07  88.34

Feature Set  k=6  k=7  k=8  k=9
SMMFCC  84.22  85.60  85.46  84.09
DMMFCC  75.58  74.35  74.90  75.99
SMOSC  83.13  84.36  84.50  85.46
DMOSC  77.50  77.64  74.90  75.45
SMNASE  84.09  83.95  84.50  83.81
DMNASE  77.64  78.88  79.70  79.97
SMMFCC⊕SMOSC  88.34  88.61  87.93  87.24
DMMFCC⊕DMOSC  81.34  80.66  81.76  81.89
SMMFCC⊕SMNASE  86.01  86.15  86.42  85.87
DMMFCC⊕DMNASE  83.68  81.62  82.58  81.76
SMOSC⊕SMNASE  86.83  86.28  87.24  87.11
DMOSC⊕DMNASE  79.70  80.25  82.30  81.89
SMMFCC⊕SMOSC⊕SMNASE  88.61  88.34  87.79  88.20
DMMFCC⊕DMOSC⊕DMNASE  82.99  82.44  82.85  83.26
SMMFCC+SMOSC+SMNASE  87.79  88.20  88.75  87.79
DMMFCC+DMOSC+DMNASE  83.40  82.30  81.48  82.58
SMMFCC+SMOSC+SMNASE+DMMFCC+DMOSC+DMNASE  88.07  87.52  88.07  88.20
SMMFCC+SMOSC+SMNASE+SMCOMB  89.03  89.03  89.85  89.99
DMMFCC+DMOSC+DMNASE+DMCOMB  83.68  83.40  82.17  83.13
SMMFCC+SMOSC+SMNASE+SMCOMB+DMMFCC+DMOSC+DMNASE+DMCOMB  89.16  88.75  88.61  88.20

Table 3.6 Classification accuracy (%) using NDA with multiple prototypes for each music class and α=0.98.

α = 0.98
Feature Set  k=2  k=3  k=4  k=5
SMMFCC  82.17  82.58  82.17  84.09
DMMFCC  73.39  75.31  75.99  74.49
SMOSC  82.03  83.40  82.30  82.44
DMOSC  75.72  76.82  76.54  78.46
SMNASE  80.80  82.17  82.44  82.99
DMNASE  74.62  75.86  76.54  75.31
SMMFCC⊕SMOSC  87.24  87.52  88.20  87.93
DMMFCC⊕DMOSC  81.62  80.66  81.76  81.48
SMMFCC⊕SMNASE  85.73  87.52  87.38  87.38
DMMFCC⊕DMNASE  80.93  80.80  80.38  81.48
SMOSC⊕SMNASE  85.73  85.32  84.77  85.60
DMOSC⊕DMNASE  78.60  80.66  80.11  79.70
SMMFCC⊕SMOSC⊕SMNASE  89.30  88.20  89.44  87.93
DMMFCC⊕DMOSC⊕DMNASE  82.44  81.34  81.89  82.58
SMMFCC+SMOSC+SMNASE  86.56  87.65  88.34  87.79
DMMFCC+DMOSC+DMNASE  81.34  82.03  83.13  82.44
SMMFCC+SMOSC+SMNASE+DMMFCC+DMOSC+DMNASE  86.42  86.56  86.83  87.38
SMMFCC+SMOSC+SMNASE+SMCOMB  88.34  88.75  88.89  89.30
DMMFCC+DMOSC+DMNASE+DMCOMB  83.40  83.26  83.13  83.54
SMMFCC+SMOSC+SMNASE+SMCOMB+DMMFCC+DMOSC+DMNASE+DMCOMB  87.38  88.07  88.07  88.48

Feature Set  k=6  k=7  k=8  k=9
SMMFCC  84.09  84.64  84.77  83.95
DMMFCC  76.27  77.50  76.27  77.64
SMOSC  82.44  82.03  83.68  83.26
DMOSC  77.23  76.82  75.99  75.17
SMNASE  82.03  83.54  83.68  82.85
DMNASE  76.41  77.50  79.01  78.88
SMMFCC⊕SMOSC  88.89  88.34  87.93  88.07
DMMFCC⊕DMOSC  80.93  80.80  81.48  81.48
SMMFCC⊕SMNASE  87.52  87.38  88.07  87.52
DMMFCC⊕DMNASE  80.52  80.66  81.48  80.25
SMOSC⊕SMNASE  85.87  85.87  85.73  85.87
DMOSC⊕DMNASE  79.97  79.70  81.34  81.62
SMMFCC⊕SMOSC⊕SMNASE  87.79  86.42  88.20  87.65
DMMFCC⊕DMOSC⊕DMNASE  82.03  82.30  81.34  81.76
SMMFCC+SMOSC+SMNASE  86.97  88.20  88.75  88.07
DMMFCC+DMOSC+DMNASE  82.17  82.30  82.44  82.99
SMMFCC+SMOSC+SMNASE+DMMFCC+DMOSC+DMNASE  87.38  87.65  87.93  88.34
SMMFCC+SMOSC+SMNASE+SMCOMB  88.61  88.75  89.16  89.44
DMMFCC+DMOSC+DMNASE+DMCOMB  83.81  83.81  83.13  82.72
SMMFCC+SMOSC+SMNASE+SMCOMB+DMMFCC+DMOSC+DMNASE+DMCOMB  88.07  87.38  88.34  88.34

Table 3.7 Classification accuracy (%) using NDA with multiple prototypes for each music class and α=0.99

α = 0.99
Feature Set  k=2  k=3  k=4  k=5
SMMFCC  83.81  83.13  82.72  84.77
DMMFCC  73.80  75.72  73.53  75.03
SMOSC  83.68  84.09  82.44  82.85
DMOSC  75.99  77.91  77.91  77.78
SMNASE  82.03  84.09  83.26  84.77
DMNASE  75.86  78.05  77.23  77.23
SMMFCC⊕SMOSC  87.24  87.93  88.48  87.38
DMMFCC⊕DMOSC  81.34  81.07  80.38  81.07
SMMFCC⊕SMNASE  86.28  87.65  87.24  87.24
DMMFCC⊕DMNASE  81.89  82.30  81.89  81.89
SMOSC⊕SMNASE  86.83  86.42  86.01  86.69
DMOSC⊕DMNASE  80.93  80.93  81.76  79.97
SMMFCC⊕SMOSC⊕SMNASE  88.34  88.34  89.30  88.07
DMMFCC⊕DMOSC⊕DMNASE  82.99  83.68  82.99  83.40
SMMFCC+SMOSC+SMNASE  88.34  87.38  88.20  88.48
DMMFCC+DMOSC+DMNASE  81.89  82.58  81.76  82.58
SMMFCC+SMOSC+SMNASE+DMMFCC+DMOSC+DMNASE  87.65  87.11  86.97  87.93
SMMFCC+SMOSC+SMNASE+SMCOMB  88.89  88.20  88.61  88.75
DMMFCC+DMOSC+DMNASE+DMCOMB  84.22  84.36  84.50  83.54
SMMFCC+SMOSC+SMNASE+SMCOMB+DMMFCC+DMOSC+DMNASE+DMCOMB  88.48  88.61  88.61  87.93

Feature Set  k=6  k=7  k=8  k=9
SMMFCC  83.68  85.05  84.50  84.77
DMMFCC  75.31  74.21  74.49  76.95
SMOSC  83.40  83.68  84.22  85.60
DMOSC  77.64  76.95  75.72  74.62
SMNASE  83.95  83.81  83.81  83.13
DMNASE  77.37  78.74  79.56  80.80
SMMFCC⊕SMOSC  87.65  87.65  86.83  87.79
DMMFCC⊕DMOSC  80.80  79.84  81.48  81.07
SMMFCC⊕SMNASE  86.28  86.69  86.15  85.87
DMMFCC⊕DMNASE  82.17  81.48  81.76  81.21
SMOSC⊕SMNASE  86.28  85.73  86.97  86.83
DMOSC⊕DMNASE  80.52  80.11  81.34  81.48
SMMFCC⊕SMOSC⊕SMNASE  87.52  86.69  88.07  88.61
DMMFCC⊕DMOSC⊕DMNASE  83.26  82.58  82.30  82.85
SMMFCC+SMOSC+SMNASE  87.65  88.07  88.61  88.34
DMMFCC+DMOSC+DMNASE  82.99  82.85  82.03  82.99
SMMFCC+SMOSC+SMNASE+DMMFCC+DMOSC+DMNASE  87.79  87.11  88.34  87.79
SMMFCC+SMOSC+SMNASE+SMCOMB  88.34  88.75  89.30  89.71
DMMFCC+DMOSC+DMNASE+DMCOMB  84.36  83.95  82.72  83.40
SMMFCC+SMOSC+SMNASE+SMCOMB+DMMFCC+DMOSC+DMNASE+DMCOMB  89.16  89.16  89.16  89.16

Chapter 4 Conclusion

A novel feature set, derived from the modulation spectral analysis of spectral (OSC and NASE) and cepstral (MFCC) features, is proposed for music genre classification. Long-term modulation spectrum analysis is employed to capture the time-varying behavior of each feature value. For each spectral or cepstral feature set, a modulation spectrogram is generated by collecting the modulation spectra of all corresponding feature values.
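As a rough illustration of the modulation spectrogram idea (not the exact implementation used in this thesis), the sketch below Fourier-transforms each feature trajectory along the time axis and collects the magnitude spectra into a coefficient-by-modulation-frequency matrix; the frame rate, mean removal, and Hann window are assumptions made for the example.

```python
import numpy as np

def modulation_spectrogram(feature_seq, frame_rate=100.0):
    """feature_seq: (n_frames, n_coeffs) array, one column per MFCC/OSC/NASE coefficient."""
    n_frames = feature_seq.shape[0]
    seq = feature_seq - feature_seq.mean(axis=0)          # remove the DC part of each trajectory
    window = np.hanning(n_frames)[:, None]                # taper each trajectory before the FFT
    spectrum = np.abs(np.fft.rfft(seq * window, axis=0))  # magnitude spectrum along the time axis
    mod_freqs = np.fft.rfftfreq(n_frames, d=1.0 / frame_rate)
    return spectrum.T, mod_freqs                          # (n_coeffs, n_mod_bins) and the Hz axis
```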

Statistical aggregations of all MSCs, MSVs, MSEs, MSCen, and MSFs are computed to generate effective and compact discriminating features. The music database employed in the ISMIR2004 Audio Description Contest, in which all music tracks are classified into six classes, was used for performance comparison. From the experimental results, we can see that NDA and LDA both achieve good classification accuracy (about 88%). Furthermore, when multiple prototypes are used to represent each music class, the best classification accuracy achieved is 89.99%.
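A purely hypothetical sketch of such an aggregation step is given below: descriptor values computed per analysis segment and per modulation subband are pooled into one fixed-length vector by their mean and standard deviation. The descriptor names and array shapes are placeholders and do not follow the exact definitions given in the earlier chapters.

```python
import numpy as np

def aggregate_descriptors(descriptors):
    """descriptors: dict mapping a descriptor name to a (n_segments, n_subbands) array."""
    parts = []
    for name in sorted(descriptors):
        values = descriptors[name]
        parts.append(values.mean(axis=0))   # mean over segments, one value per subband
        parts.append(values.std(axis=0))    # standard deviation over segments
    return np.concatenate(parts)            # one compact, fixed-length feature vector
```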

