
(1)

Homogeneous Segmentation and Classifier Ensemble for Audio Tag Annotation and Retrieval

Hung-Yi Lo, Ju-Chiang Wang, and Hsin-Min Wang

July 20, 2010

Spoken Language Processing Group

Natural Language and Knowledge Processing Lab.

Institute of Information Science, Academia Sinica, Taiwan

http://sovideo.iis.sinica.edu.tw/SLG

(2)

Social Tagging to Music

(3)

Audio Tag Annotation and Retrieval

Annotating audio clips with tags

[Figure: an audio clip is annotated using one predictor for each tag; the scores of the tag predictors are shown for the tags Female, R&B, Guitar, Metal, Bass]

Retrieving audio clips using a tag query

[Figure: for a query such as "Rock", the audio clips are ranked based on the scores of the Rock predictor; the ranking list for the query runs from high relevance to low relevance]
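The two tasks on this slide can be sketched with a small score table. This is only an illustration: the clip names, tag names, and score values are made up, and the annotation step assumes the scores of different tag predictors are directly comparable, which a later slide notes is not true for raw classifier outputs.

```python
# Hypothetical scores: scores[clip][tag] is the output of that tag's predictor.
scores = {
    "clip_a": {"rock": 0.9, "female": 0.2, "guitar": 0.7},
    "clip_b": {"rock": 0.1, "female": 0.8, "guitar": 0.3},
    "clip_c": {"rock": 0.6, "female": 0.4, "guitar": 0.9},
}

def retrieve(scores, tag):
    """Retrieval: rank clips for a tag query, highest predictor score first."""
    return sorted(scores, key=lambda clip: scores[clip][tag], reverse=True)

def annotate(scores, clip, top_k=2):
    """Annotation: return the top_k tags for one clip, highest score first."""
    tags = scores[clip]
    return sorted(tags, key=tags.get, reverse=True)[:top_k]

rock_ranking = retrieve(scores, "rock")
clip_c_tags = annotate(scores, "clip_c")
```

Retrieval compares clips under one predictor, while annotation compares predictors on one clip, which is why the two tasks end up needing different ways of merging scores.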

(4)

Our Contributions

1. Dividing the audio signal into homogeneous segments using an audio novelty curve

2. Each tag predictor is an ensemble classifier combining two classifiers: SVM and AdaBoost
  - Ranking Ensemble for audio tag retrieval
  - Probability Ensemble for audio tag annotation

• Our ranking ensemble won the Audio Tagging Competition in the 2009 Music Information Retrieval Evaluation eXchange (MIREX)
  - In terms of tag F-measure and the area under the ROC curve given a tag (for audio retrieval)

(5)

Audio Segmentation

• Feature of the matrix: 13-dim MFCC

• Kernel type: Gaussian

• Kernel size: 128 frames

• The prediction score on the whole clip is the average of scores on each segment.
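A minimal sketch of how a novelty curve of this kind can be computed. It assumes a Foote-style approach: a frame-by-frame self-similarity matrix correlated along its diagonal with a Gaussian-tapered checkerboard kernel. The slide gives only the feature (13-dim MFCC), the kernel type, and the kernel size, so the cosine similarity, the kernel width `sigma`, the peak-picking rule, and every function name below are assumptions for illustration. A tiny kernel and toy 2-dim frames stand in for 128 frames of MFCCs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def checkerboard_kernel(size):
    """Gaussian-tapered checkerboard kernel; size must be even.
    sigma = size / 4 is an assumed width, not a value from the slides."""
    half = size // 2
    sigma = size / 4.0
    kernel = []
    for m in range(-half, half):
        row = []
        for n in range(-half, half):
            x, y = m + 0.5, n + 0.5  # offsets relative to the candidate boundary
            sign = 1.0 if x * y > 0 else -1.0  # +1 within one side, -1 across sides
            row.append(sign * math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)))
        kernel.append(row)
    return kernel

def novelty_curve(frames, kernel_size):
    """curve[i] is high when frames before i differ from frames from i onward."""
    n = len(frames)
    sim = [[cosine(frames[i], frames[j]) for j in range(n)] for i in range(n)]
    kernel = checkerboard_kernel(kernel_size)
    half = kernel_size // 2
    curve = [0.0] * n
    for i in range(half, n - half + 1):
        total = 0.0
        for a in range(kernel_size):
            for b in range(kernel_size):
                total += kernel[a][b] * sim[i - half + a][i - half + b]
        curve[i] = total
    return curve

def pick_boundaries(curve, threshold):
    """Assumed rule: a boundary is a local peak of the curve above a threshold."""
    return [i for i in range(1, len(curve) - 1)
            if curve[i] > threshold and curve[i] >= curve[i - 1] and curve[i] > curve[i + 1]]

def clip_score(segment_scores):
    """Whole-clip score as the average of the per-segment scores."""
    return sum(segment_scores) / len(segment_scores)

# Toy clip: 8 frames of one "sound" followed by 8 frames of another.
frames = [[1.0, 0.0]] * 8 + [[0.0, 1.0]] * 8
curve = novelty_curve(frames, kernel_size=4)
boundaries = pick_boundaries(curve, threshold=0.5)
```

On the toy clip the curve peaks where the two kinds of frames meet, so the clip splits into two homogeneous segments; each segment would then be scored separately and the scores averaged with `clip_score`.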


(7)

Audio Feature Extraction Using MIRToolbox

Classes and features:

• Dynamics: RMS

• Rhythm: peak and centroids of the fluctuation summary; tempo; attack slope and attack time of the onset

• Timbre: zero-crossing rate; spectral centroid, spread, skewness and kurtosis; brightness; rolloff with 95% threshold; rolloff with 85% threshold; spectral entropy and flatness; roughness; irregularity; inharmonicity; MFCCs, delta-MFCCs, and delta-delta-MFCCs; low energy rate; spectral flux

• Pitch: pitch

• Tonality: chromagram and its centroids and highest peak; key clarity; key mode; harmonic change

(8)

Classification Methods and the Difficulties

• The tag predictor is an ensemble that combines the outputs of two classifiers
  - SVM: linear SVM implemented by the LIBLINEAR package
  - AdaBoost: decision stump as the base learner

• Two methods to merge the two prediction scores

1. Ranking Ensemble for the retrieval task
  - The scales of the two classifiers' prediction scores are rather different

2. Probability Ensemble for the annotation task
  - The scores of different tag predictors are not comparable

[Figure: scores of the tag predictors for the tags Female, R&B, Guitar, Metal, Bass]

(9)

Ranking Ensemble

[Figure: worked example of the ranking ensemble. The AdaBoost and SVM prediction scores are converted to their respective rankings, and the average ranking gives the merged prediction. For example, a clip with an AdaBoost score of 1.9 (rank 1) and an SVM score of 7.1 (rank 2) has an average ranking of 1.5.]
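The rank-then-average idea can be sketched as follows. The scores are made up for illustration, and the tie-breaking rule (input order) is an assumption, since the slides do not say how ties are handled.

```python
def to_ranks(scores):
    """Rank 1 goes to the highest score; ties are broken by input order (an assumption)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def ranking_ensemble(adaboost_scores, svm_scores):
    """Average the two classifiers' ranks per clip; a lower value means more relevant."""
    ada_ranks = to_ranks(adaboost_scores)
    svm_ranks = to_ranks(svm_scores)
    return [(a + s) / 2 for a, s in zip(ada_ranks, svm_ranks)]

# Made-up scores for four clips; note the very different scales.
adaboost = [2.0, -1.0, 0.5, -3.0]
svm = [5.0, 9.0, 1.0, 0.0]
average_ranks = ranking_ensemble(adaboost, svm)
```

Because only the ordering within each classifier is used, the difference in score scales between AdaBoost and the SVM no longer matters.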

(10)

Probability Ensemble

• In the audio annotation task, we need to compare the scores of all tag predictors
  - The raw scores of different tag classifiers are not comparable

• We transform the output scores of SVM and AdaBoost into probability scores with a sigmoid function:

$$\Pr(y = 1 \mid x) \approx \frac{1}{1 + \exp(Af + B)}$$

  - f: the output score of a classifier
  - A, B: can be learned by solving a regularized maximum likelihood problem
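A minimal sketch of fitting A and B for the sigmoid above. The slides do not give the optimization details, so this assumes Platt-style smoothed targets as the regularization and plain gradient descent as the solver; the learning rate, step count, and toy data are illustrative. Averaging the two calibrated probabilities in `probability_ensemble` is also an assumption about how the scores are merged.

```python
import math

def sigmoid_prob(f, A, B):
    """Pr(y = 1 | x) = 1 / (1 + exp(A*f + B)), computed without overflow."""
    z = A * f + B
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def fit_sigmoid(scores, labels, lr=0.1, steps=2000):
    """Fit A and B by gradient descent on the negative log-likelihood.
    Targets are smoothed away from 0 and 1 (assumed regularization)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)
    t_neg = 1.0 / (n_neg + 2.0)
    targets = [t_pos if y == 1 else t_neg for y in labels]
    A, B = 0.0, 0.0
    for _ in range(steps):
        grad_A = grad_B = 0.0
        for f, t in zip(scores, targets):
            diff = t - sigmoid_prob(f, A, B)  # derivative of the loss w.r.t. A*f + B
            grad_A += diff * f
            grad_B += diff
        A -= lr * grad_A / len(scores)
        B -= lr * grad_B / len(scores)
    return A, B

def probability_ensemble(p_svm, p_adaboost):
    """Assumed merge rule: average the two calibrated probabilities."""
    return 0.5 * (p_svm + p_adaboost)

# Toy classifier outputs: higher score means more likely positive.
toy_scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
toy_labels = [0, 0, 0, 1, 1, 1]
A, B = fit_sigmoid(toy_scores, toy_labels)
p_high = sigmoid_prob(2.0, A, B)
p_low = sigmoid_prob(-2.0, A, B)
```

With this sign convention a classifier whose higher scores indicate the positive class ends up with a negative A. Once each tag's classifiers are calibrated this way, their outputs lie on a common 0 to 1 scale and can be compared across tags for a given clip.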

(11)

Model Selection

• MIREX evaluates submitted algorithms by 3-fold cross-validation

• Inner cross-validation on the training set to determine the classifier parameters
  - The cost parameter C in the linear SVM
  - The number of base learners in AdaBoost

• Re-train the classifiers with the complete training set and the selected parameters

• Model selection criterion: AUC-ROC
  - Since the class distributions for some tags are imbalanced

[Figure: inner cross-validation nested inside the outer cross-validation]
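The selection loop can be sketched as below. AUC-ROC is computed from its pairwise definition (the fraction of positive/negative pairs ordered correctly, with ties counting half). The `train_and_score` callable, the toy "classifier", and the data are hypothetical stand-ins; a real run would train the linear SVM or AdaBoost on each inner training fold.

```python
def auc_roc(scores, labels):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def select_parameter(candidates, folds, train_and_score):
    """Return the candidate with the best mean AUC-ROC over the inner folds.
    train_and_score(param, train, valid) must return one score per valid example."""
    best_param, best_auc = None, -1.0
    for param in candidates:
        aucs = []
        for train, valid in folds:
            scores = train_and_score(param, train, valid)
            labels = [y for _, y in valid]
            aucs.append(auc_roc(scores, labels))
        mean_auc = sum(aucs) / len(aucs)
        if mean_auc > best_auc:
            best_param, best_auc = param, mean_auc
    return best_param, best_auc

def toy_train_and_score(weight, train, valid):
    """Toy stand-in for a classifier: ignores training data, scores by weight * feature."""
    return [weight * x for x, _ in valid]

# Each example is (feature, label); two inner folds built from eight examples.
data = [(0.1, 0), (0.2, 0), (0.4, 0), (0.6, 1), (0.3, 0), (0.9, 1), (0.8, 1), (0.5, 0)]
folds = [(data[4:], data[:4]), (data[:4], data[4:])]
best_weight, best_auc = select_parameter([-1.0, 1.0], folds, toy_train_and_score)
```

AUC-ROC depends only on how positives are ordered relative to negatives, so a predictor that gives every clip the same score gets 0.5 no matter how rare the tag is, whereas accuracy would reward always predicting the majority class.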

(12)

MIREX 2009 Results on the MajorMiner Dataset

Submission  Tag F-measure  Tag Accuracy  Tag AUC-ROC  Clip AUC-ROC
No Seg      0.289          0.900         0.782        0.751
Seg         0.311          0.903         0.807        0.774
BP1         0.277          0.868         0.742        0.871
BP2         0.290          0.859         0.761        0.861
CC1         0.209          0.912         0.762        0.882
CC2         0.241          0.905         0.791        0.882
CC3         0.170          0.913         0.721        0.854
CC4         0.263          0.890         0.749        0.854
GP          0.012          0.891
GT1         0.290          0.850         0.784        0.872

• Audio Retrieval: given a tag query, correct audio clips should be ranked higher

• Audio Annotation: given a clip, correct tags should have higher scores

(13)

MIREX 2009 Results on the Mood Dataset

Submission  Tag F-measure  Tag Accuracy  Tag AUC-ROC  Clip AUC-ROC
No Seg      0.204          0.882         0.667        0.678
Seg         0.219          0.887         0.701        0.704
BP1         0.195          0.837         0.648        0.854
BP2         0.193          0.829         0.632        0.859
CC1         0.172          0.878         0.652        0.849
CC2         0.180          0.882         0.681        0.848
CC3         0.147          0.882         0.629        0.812
CC4         0.183          0.862         0.646        0.812
GP          0.084          0.863
GT1         0.211          0.823         0.649        0.860
GT2         0.209          0.824         0.655        0.861
HBC         0.063          0.909         0.664        0.861

(14)

Extended Experiments

• We extensively evaluate the classifiers and the ensemble methods on the downloaded MajorMiner dataset
  - MajorMiner is a web-based music labeling game: http://majorminer.org/

• Our extended experiments basically follow the MIREX 2009 setup
  - Use the same 45 tags and download all the audio clips that are associated with these tags
  - The dataset might be slightly different from that used in MIREX 2009
  - The resulting audio database contains 2,472 clips

• Repeat cross-validation twenty times to reduce variance

Tags shown on the slide:

metal, instrumental, horns, piano, guitar
ambient, saxophone, house, loud, bass
fast, keyboard, vocal, noise, british
solo, electronica, beat, 80s, dance
jazz, drum machine, strings, pop, r&b
female, distortion, voice, rap, male

(15)

Results of the Audio Retrieval Task

Mean ± standard deviation

                      Tag AUC-ROC                         Tag F-measure
                      Without Seg.      With Seg.         Without Seg.      With Seg.
AdaBoost              0.7520 ± 0.0026   0.7943 ± 0.0024   0.2856 ± 0.0036   0.3034 ± 0.0051
Linear SVM            0.7848 ± 0.0029   0.7990 ± 0.0030   0.3092 ± 0.0028   0.3169 ± 0.0038
Probability Ensemble  0.7894 ± 0.0030   0.8108 ± 0.0020   0.3163 ± 0.0037   0.3296 ± 0.0039
Ranking Ensemble      0.7997 ± 0.0022   0.8189 ± 0.0017   0.3211 ± 0.0032   0.3332 ± 0.0038

• Tag AUC-ROC with segmentation is better than without segmentation (absolute difference): AdaBoost 4.23%, Linear SVM 1.42%, Probability Ensemble 2.14%, Ranking Ensemble 1.92%

• Ranking Ensemble with segmentation is 6.69% better in Tag AUC-ROC than AdaBoost without segmentation

(16)

Results of the Audio Annotation Task

Mean ± standard deviation

                      Clip AUC-ROC                        Tag Accuracy
                      Without Seg.      With Seg.         Without Seg.      With Seg.
AdaBoost              0.8627 ± 0.0009   0.8774 ± 0.0009   0.9162 ± 0.0004   0.9184 ± 0.0004
Linear SVM            0.8788 ± 0.0009   0.8828 ± 0.0012   0.9191 ± 0.0004   0.9200 ± 0.0003
Probability Ensemble  0.8788 ± 0.0007   0.8848 ± 0.0007   0.9191 ± 0.0002   0.9201 ± 0.0003
Ranking Ensemble      0.7626 ± 0.0012   0.7814 ± 0.0010   0.9016 ± 0.0004   0.9057 ± 0.0003

• Probability Ensemble with segmentation is 10.34% better in Clip AUC-ROC than Ranking Ensemble with segmentation (absolute difference)

(17)

Conclusion

• This paper has presented our methods for audio tag annotation and retrieval

• Major contributions:
  - Use a novelty curve to divide audio clips into homogeneous segments
  - Exploit two classifier ensembles: ranking ensemble and probability ensemble

• The ranking ensemble performed very well in the MIREX 2009 audio tag classification task in terms of audio retrieval metrics
  - But it did not perform as well in terms of audio annotation metrics

• The probability ensemble method performs very well in terms of audio annotation metrics

(18)

Thank You
