(1)

Audio Tag Annotation and Retrieval by Using Tag Count Information

Hung-Yi Lo [1,2], Shou-De Lin [2], and Hsin-Min Wang [1]

[1] Academia Sinica, [2] National Taiwan University

MMM 2011

Jan. 06, 2011

(2)

Social Tagging to Music

Tag count: the number of different users who have annotated the tag (a larger font indicates a higher tag count)

(3)

Audio Tag Annotation and Retrieval

Annotating audio clips with tags

[Figure: an audio clip is annotated using one predictor for each tag; the scores of the tag predictors are shown for the tags Female, R&B, Guitar, Metal, and Bass]

Retrieving audio clips using a tag query

[Figure: for the query "Rock", audio clips are ranked based on the scores of the Rock predictor; the ranking list for the query runs from high relevance to low relevance]
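The two tasks on this slide can be sketched with a small score table. This is only an illustration under stated assumptions, not the authors' implementation: the clip names and predictor scores are made-up values, and `annotate` and `retrieve` are hypothetical helper names.

```python
# Hypothetical predictor scores: scores[clip][tag] is the output of the
# per-tag predictor for that clip (values are made up for illustration).
scores = {
    "clip_a": {"rock": 0.9, "guitar": 0.7, "female": 0.1},
    "clip_b": {"rock": 0.2, "guitar": 0.4, "female": 0.8},
    "clip_c": {"rock": 0.6, "guitar": 0.3, "female": 0.5},
}


def annotate(clip, top_k=2):
    """Annotation: return the top_k tags of one clip, highest score first."""
    tag_scores = scores[clip]
    return sorted(tag_scores, key=tag_scores.get, reverse=True)[:top_k]


def retrieve(tag):
    """Retrieval: rank all clips by the score of the predictor for `tag`."""
    return sorted(scores, key=lambda clip: scores[clip][tag], reverse=True)


print(annotate("clip_a"))  # tags of clip_a ordered by predictor score
print(retrieve("rock"))    # clips ordered from high to low relevance
```

Annotation reads one row of the table (one clip, all tags), while retrieval reads one column (one tag, all clips).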

(4)

Motivation (1/2)

Noisy Social Tags:

Social tags are assigned by people with different levels of musical knowledge, so they inevitably contain noisy information

Social tags are unstructured, free-form text that may contain misspellings

Tags may even be assigned by malicious users

Tag count information should be considered in automatic tagging because the count reflects the confidence degree of the tag

Tags with high counts are more reliable and credible

(5)

Motivation (2/2)

In previous works, the tag count is transformed into 1 (with a tag) or 0 (without a tag) by using a threshold

Then a binary classifier is trained for each tag and used to make predictions about untagged audio clips

The tag count information is lost

A tag assigned twice is treated in the same way as a tag assigned hundreds of times

Question: how can the tag count information be used for audio tag annotation and retrieval?

Answer: cost-sensitive learning with the tag counts as costs
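A minimal sketch of the contrast drawn on this slide, under stated assumptions: the tag counts are made-up values, the threshold of 2 and the negative cost of 1.0 are illustrative choices, and `binarize` and `label_and_cost` are hypothetical helper names.

```python
# Hypothetical tag counts for the tag "rock" on four clips.
tag_counts = {"clip_a": 0, "clip_b": 2, "clip_c": 150, "clip_d": 1}


def binarize(count, threshold=2):
    """Thresholding: 1 (with the tag) if count >= threshold, else 0."""
    return 1 if count >= threshold else 0


def label_and_cost(count, threshold=2, negative_cost=1.0):
    """Cost-sensitive view: a positive example keeps its count as its cost."""
    if count >= threshold:
        return 1, float(count)
    return -1, negative_cost  # assumed uniform cost for negatives


binary_labels = {clip: binarize(n) for clip, n in tag_counts.items()}
cost_labels = {clip: label_and_cost(n) for clip, n in tag_counts.items()}

print(binary_labels)  # clip_b and clip_c become indistinguishable
print(cost_labels)    # clip_c keeps a much larger cost than clip_b
```

After thresholding, a clip tagged twice and a clip tagged 150 times carry the same label; keeping the count as a cost preserves the difference.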

(6)

Factors That Affect the Tag Counts

1. Consistent Agreement

When a large number of users consider that an audio clip should be associated with a particular tag, the label information is more reliable

If only a small portion of an audio clip is related to a certain tag, the tag count will be small

Tags with higher counts are major properties of the audio

2. Tag Bias

Some tags are used more often (such as “rock”)

Others are unpopular or hard to recognize (“Baroque” is less popular than “classic”; “drum machine” might easily be recognized as “drum”)

3. Song/Album/Artist Popularity

Popular songs, albums, and artists usually receive more tags, since people tend to tag music that they like or are familiar with

(7)

An Example: “Let it Be” and its Tags

(8)

An Example: “Let it Be” and its Tags with Higher Counts

More Reliable, Major Properties, and Important Tags

(9)

Cost‐sensitive Learning

Given some training examples (x, y, c), where c is the misclassification cost of the example

Learn a classifier h that minimizes the expected cost on unseen instances:

$$E\big[\, c \, I\big(h(\mathbf{x}) \neq y\big) \,\big]$$

where I(·) is an indicator function that yields 1 if its argument is true and 0 otherwise

A more general setup than the traditional classification problem
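As a rough illustration of the expected-cost objective, the sketch below estimates it empirically on a toy set of (x, y, c) triples. The data, the two threshold classifiers, and the function name `average_cost` are assumptions made for this example only.

```python
def average_cost(examples, h):
    """Empirical estimate of E[c * I(h(x) != y)] over (x, y, c) triples."""
    total = sum(c for x, y, c in examples if h(x) != y)
    return total / len(examples)


# Hypothetical one-dimensional data: (feature, label, misclassification cost).
examples = [(0.9, 1, 100.0), (0.8, 1, 2.0), (0.2, -1, 1.0), (0.6, -1, 1.0)]


def h_low(x):
    """Toy classifier with a low decision threshold."""
    return 1 if x > 0.5 else -1


def h_high(x):
    """Toy classifier with a high decision threshold."""
    return 1 if x > 0.85 else -1


print(average_cost(examples, h_low))   # one error, on a cost-1 example
print(average_cost(examples, h_high))  # one error, on a cost-2 example
```

Both classifiers make exactly one mistake, so their error rates are equal, yet their costs differ. Setting every c to 1 reduces the quantity to the ordinary error rate, which is the sense in which the setup generalizes traditional classification.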

(10)

Cost‐sensitive Learning Applications

Business Marketing

Given some features of potential customers, each customer has a purchasing amount as the misclassification cost

Decide which customers to mail a new catalog to

Cost-sensitive learning with the purchasing amount as the cost

Goal: maximize the total profit obtained from unseen test customers

Audio Tag Annotation with Tag Counts as Costs

Given some acoustic features of an audio clip with its tags and tag counts

Goal: minimize the misclassified tag counts

If 100 users annotate an audio clip with “rock” but the classifier produces a false negative, it has to pay a cost equal to 100

Pay more attention to the reliable, major, and important tags

(11)

Cost‐insensitive Support Vector Machine

Optimization problem:

$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \ \frac{1}{2}\mathbf{w}^{T}\mathbf{w} + C\sum_{i}\xi_{i}$$

$$\text{s.t.}\quad y_{i}\,(\mathbf{w}^{T}\mathbf{x}_{i} + b) \ge 1 - \xi_{i},\qquad \xi_{i} \ge 0$$

Train a binary classifier for each tag, gathering all instances annotated with this tag as positive examples

(12)

Cost‐sensitive Support Vector Machine

Optimization problem:

$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \ \frac{1}{2}\mathbf{w}^{T}\mathbf{w} + C\sum_{i}c_{i}\,\xi_{i}$$

$$\text{s.t.}\quad y_{i}\,(\mathbf{w}^{T}\mathbf{x}_{i} + b) \ge 1 - \xi_{i},\qquad \xi_{i} \ge 0$$

The cost c_i is assigned as the tag count for positive examples

The cost c_i is assigned uniformly for negative examples, since people do not use negative tags like “non-rock” and “no drum”
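To make the role of the per-example cost concrete, the sketch below evaluates the cost-sensitive SVM objective for a fixed linear model, with each slack variable set to its smallest feasible value. It does not solve the optimization problem; the data, the model (w, b), and the function name `svm_objective` are assumptions for illustration.

```python
def svm_objective(w, b, data, C=1.0):
    """Evaluate 0.5 * w.w + C * sum_i c_i * xi_i for a linear model.

    data holds (x, y, c) triples with y in {+1, -1}; each slack xi_i is set
    to its smallest feasible value max(0, 1 - y_i * (w.x_i + b)).
    """
    regularizer = 0.5 * sum(wj * wj for wj in w)
    penalty = 0.0
    for x, y, c in data:
        margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
        penalty += c * max(0.0, 1.0 - margin)
    return regularizer + C * penalty


# Hypothetical clips: positives carry their tag count, negatives a uniform cost.
data = [([2.0, 0.0], 1, 100.0), ([0.5, 0.0], 1, 2.0), ([-1.0, 0.0], -1, 1.0)]
w, b = [1.0, 0.0], 0.0

cost_sensitive = svm_objective(w, b, data)
cost_insensitive = svm_objective(w, b, [(x, y, 1.0) for x, y, _ in data])
print(cost_sensitive, cost_insensitive)
```

Only the second clip violates the margin, and its slack is penalized twice as heavily once its tag count of 2 is used as the cost. Libraries that accept per-example weights, for example the `sample_weight` argument of scikit-learn's `SVC.fit`, rescale C per example in the same spirit; whether the authors used such a mechanism is not stated on the slides.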

(13)

Cost‐sensitive AdaBoost

[Figure: boosting pipeline. h1(x) is trained on the training sample; h2(x), h3(x), ..., hT(x) are each trained on a reweighted sample so as to minimize the weighted error]

Cost-sensitive Weight Update Rule:

$$D_{t+1}(i) = \frac{c_{i}\, D_{t}(i)\, \exp\big(-\alpha_{t}\, y_{i}\, h_{t}(\mathbf{x}_{i})\big)}{Z_{t}}$$
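One round of the cost-sensitive weight update can be sketched as follows, assuming Z_t is the usual normalization factor that makes the new weights sum to one. The costs, labels, weak-learner outputs, and the value of alpha are made-up values, and `update_weights` is a hypothetical helper name.

```python
import math


def update_weights(D, costs, alpha, y, predictions):
    """One update: D_{t+1}(i) = c_i * D_t(i) * exp(-alpha * y_i * h_t(x_i)) / Z_t."""
    unnormalized = [
        c * d * math.exp(-alpha * yi * hi)
        for d, c, yi, hi in zip(D, costs, y, predictions)
    ]
    Z = sum(unnormalized)  # normalization factor Z_t (assumed)
    return [u / Z for u in unnormalized]


# Hypothetical round: three examples, the weak learner misclassifies the first.
D = [1 / 3, 1 / 3, 1 / 3]
costs = [100.0, 2.0, 1.0]
y = [1, 1, -1]
predictions = [-1, 1, -1]

D_next = update_weights(D, costs, alpha=0.5, y=y, predictions=predictions)
print(D_next)
```

The misclassified example with a tag count of 100 receives almost all of the weight in the next round, so the next weak learner is pushed to get it right.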

(14)

Cost‐sensitive AdaBoost

[Figure: boosting pipeline. h1(x) is trained on the training sample; h2(x), h3(x), ..., hT(x) are trained on weighted samples]

Final Classifier:

$$H(\mathbf{x}) = \sum_{t=1}^{T} \alpha_{t}\, h_{t}(\mathbf{x})$$
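The final classifier is a weighted vote of the weak learners. The sketch below shows that combination step only, assuming decision stumps as weak learners; the stumps and their weights are made-up values rather than the result of an actual boosting run, and `make_stump` and `ensemble_score` are hypothetical names.

```python
def make_stump(feature, threshold):
    """Weak learner h_t: +1 if x[feature] > threshold, otherwise -1."""
    return lambda x: 1 if x[feature] > threshold else -1


def ensemble_score(x, weak_learners, alphas):
    """H(x) = sum_t alpha_t * h_t(x), usable directly as a ranking score."""
    return sum(alpha * h(x) for alpha, h in zip(alphas, weak_learners))


# Hypothetical stumps and weights; a real run would learn both by boosting.
weak_learners = [make_stump(0, 0.5), make_stump(1, 0.2), make_stump(0, 0.8)]
alphas = [0.9, 0.4, 0.3]

print(ensemble_score([0.7, 0.1], weak_learners, alphas))
print(ensemble_score([0.9, 0.3], weak_learners, alphas))
```

Keeping H(x) as a real-valued score, rather than taking only its sign, is what allows clips or tags to be ranked.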

(15)

Cost‐sensitive Evaluation Metrics: Which One is Better?

(16)

Cost‐sensitive Evaluation Metrics: Which One is Better?

(17)

New Evaluation Metrics: Cost‐sensitive Area Under ROC

ROC curve: a graphical plot of the true positive rate vs. the false positive rate as the discrimination threshold is varied

Cost-sensitive Area Under ROC Curve (AUC): replace the true positive rate by the cost-weighted true positive rate

Clip AUC: given an audio clip, give correct tags higher scores (for audio annotation)

Tag AUC: given a tag, give correct audio clips higher scores (for audio retrieval)
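A minimal sketch of a cost-sensitive AUC, assuming the cost-weighted true positive rate is the summed cost of the positives ranked above the threshold divided by the summed cost of all positives, while the false positive rate stays unweighted. Under that assumption the area equals a weighted pairwise statistic, which is what the code computes. The scores and costs are made-up values, and the handling of ties is an illustrative choice.

```python
def cost_sensitive_auc(scores, labels, costs):
    """Area under the curve of cost-weighted TPR against FPR.

    Computed as a weighted pairwise statistic: each (positive, negative)
    pair counts with the cost of the positive; ties count as one half.
    """
    positives = [(s, c) for s, y, c in zip(scores, labels, costs) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y != 1]
    total = sum(c for _, c in positives) * len(negatives)
    won = 0.0
    for s_pos, c in positives:
        for s_neg in negatives:
            if s_pos > s_neg:
                won += c
            elif s_pos == s_neg:
                won += 0.5 * c
    return won / total


# Hypothetical scores for one tag over four clips (or one clip over four tags).
scores = [0.9, 0.3, 0.5, 0.1]
labels = [1, 1, -1, -1]
costs = [100.0, 2.0, 1.0, 1.0]  # tag counts for positives

cs_auc = cost_sensitive_auc(scores, labels, costs)
regular_auc = cost_sensitive_auc(scores, labels, [1.0] * 4)
print(cs_auc, regular_auc)
```

The same function serves both views: pass the scores of all clips for one tag to obtain a Tag AUC, or the scores of all tags for one clip to obtain a Clip AUC. In the toy data the only ranking mistake involves the low-count positive, so the cost-sensitive value is far higher than the regular one.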

(18)

New Evaluation Metrics: Cost‐sensitive F‐measure

Cost-sensitive precision (CSP):

$$\mathrm{CSP} = \frac{\text{Weighted TP}}{\text{Weighted TP} + \text{Weighted FP}}$$

Cost-sensitive recall (CSR):

$$\mathrm{CSR} = \frac{\text{Weighted TP}}{\text{Weighted TP} + \text{Weighted FN}}$$

Cost-sensitive F-measure:

$$\frac{2 \times \mathrm{CSP} \times \mathrm{CSR}}{\mathrm{CSP} + \mathrm{CSR}}$$

We evaluate on both cost-sensitive metrics and cost-insensitive metrics
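The cost-sensitive precision, recall, and F-measure can be sketched by summing costs instead of counting examples. This is an illustration under assumptions: the predictions and costs are made-up values, negatives carry a uniform cost of 1, and returning 0.0 when a denominator is zero is a convention chosen here, not something the slides specify.

```python
def cost_sensitive_f_measure(predicted, labels, costs):
    """Return (CSP, CSR, F) with TP, FP and FN summed as costs, not counts."""
    triples = list(zip(predicted, labels, costs))
    w_tp = sum(c for p, y, c in triples if p == 1 and y == 1)
    w_fp = sum(c for p, y, c in triples if p == 1 and y != 1)
    w_fn = sum(c for p, y, c in triples if p != 1 and y == 1)
    csp = w_tp / (w_tp + w_fp) if w_tp + w_fp else 0.0
    csr = w_tp / (w_tp + w_fn) if w_tp + w_fn else 0.0
    f = 2 * csp * csr / (csp + csr) if csp + csr else 0.0
    return csp, csr, f


# Hypothetical decisions for one tag over five clips.
predicted = [1, -1, 1, 1, -1]
labels = [1, 1, -1, -1, -1]
costs = [8.0, 2.0, 1.0, 1.0, 1.0]  # tag counts for positives, 1 for negatives

csp, csr, f = cost_sensitive_f_measure(predicted, labels, costs)
_, _, regular_f = cost_sensitive_f_measure(predicted, labels, [1.0] * 5)
print(csp, csr, f, regular_f)
```

Here the classifier finds the high-count positive and misses the low-count one, so the cost-sensitive F-measure is higher than the regular F-measure computed with unit costs.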

(19)

Experimental Setup

Compare with our winning method (cost-insensitive) in the MIREX 2009 audio tagging competition

MIREX refers to the Music Information Retrieval Evaluation eXchange

Experiments basically follow the MIREX 2009 setup

Download audio data from MajorMiner, a web-based music labeling game: http://majorminer.org/

Use the same 45 tags and download all the audio clips that are associated with these tags

The resulting audio database contains 2,472 clips

Select parameters using inner cross-validation on the training data

Repeat cross-validation twenty times to reduce variance

Tags shown on the slide:

metal, instrumental, horns, piano, guitar
ambient, saxophone, house, loud, bass
fast, keyboard, vocal, noise, british
solo, electronica, beat, 80s, dance
jazz, drum machine, strings, pop, r&b
female, distortion, voice, rap, male
slow, electronic, quiet, techno, drum
funk, acoustic, rock, organ, soft
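The repeated cross-validation protocol can be sketched as below. Several details are assumptions, since the slide does not give them: the number of folds (3 here), the way clips are shuffled, and the choice to report the mean and standard deviation over the per-repeat averages. The `dummy_evaluate` function is a placeholder for training and scoring a tagger.

```python
import random
import statistics


def k_fold_indices(n_items, n_folds, seed):
    """Shuffle item indices with the given seed and deal them into folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_folds] for k in range(n_folds)]


def repeated_cross_validation(n_items, evaluate, n_folds=3, n_repeats=20):
    """Repeat cross-validation; return mean and st. dev. of per-repeat scores."""
    repeat_scores = []
    for seed in range(n_repeats):
        folds = k_fold_indices(n_items, n_folds, seed)
        fold_scores = []
        for k, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            fold_scores.append(evaluate(train_idx, test_idx))
        repeat_scores.append(statistics.mean(fold_scores))
    return statistics.mean(repeat_scores), statistics.stdev(repeat_scores)


def dummy_evaluate(train_idx, test_idx):
    """Placeholder: a real version would train on train_idx, score on test_idx."""
    return len(train_idx) / (len(train_idx) + len(test_idx))


mean, std = repeated_cross_validation(2472, dummy_evaluate)
print(mean, std)
```

Using a different seed for each repeat gives a different partition of the clips each time, which is what averages out the variance caused by any single split.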

(20)

Results of Cost‐sensitive Metrics

Mean±St. D.    Cost-sensitive Tag AUC    Cost-sensitive Clip AUC    Cost-sensitive F-measure
AdaBoost       0.8055±0.0027             0.8892±0.0011              0.4099±0.0052
CS AdaBoost    0.8169±0.0023             0.8967±0.0005              0.4469±0.0081
SVM            0.8112±0.0022             0.8957±0.0007              0.4354±0.0077
CS SVM         0.8215±0.0023             0.9005±0.0004              0.4593±0.0056
Ensemble       0.8334±0.0019             0.8979±0.0007              0.4606±0.0067
CS Ensemble    0.8356±0.0018             0.9032±0.0006              0.4808±0.0072

Each cost-sensitive (CS) method is better than its cost-insensitive counterpart

(21)

Results of Cost‐insensitive (Regular) Metrics

Mean±St. D.    Tag AUC          Clip AUC         F-measure
AdaBoost       0.7941±0.0027    0.8773±0.0011    0.3018±0.0035
CS AdaBoost    0.8050±0.0023    0.8854±0.0005    0.3216±0.0049
SVM            0.7992±0.0021    0.8837±0.0007    0.3226±0.0053
CS SVM         0.8106±0.0021    0.8894±0.0004    0.3299±0.0037
Ensemble       0.8221±0.0018    0.8859±0.0007    0.3386±0.0046
CS Ensemble    0.8247±0.0017    0.8921±0.0005    0.3442±0.0046

Each cost-sensitive (CS) method is better than its cost-insensitive counterpart

Interesting, since they are trained by cost-sensitive learning but evaluated with regular metrics

Tags with smaller counts may contain noisy labeling information

The cost-sensitive learning (CSL) method can ignore the noisy information by giving it a smaller penalty (cost), and thereby train a more accurate classifier

(22)

Conclusion

This paper has presented our novel idea for exploiting tag count information in audio tagging tasks

Discussed several factors that affect the tag counts; consistent agreement is the most important issue

Formulated the audio tag prediction task as a cost-sensitive classification problem to minimize the misclassified tag counts

Presented cost-sensitive versions of several regular evaluation metrics

The proposed cost-sensitive methods outperform their cost-insensitive counterparts in terms of not only the cost-sensitive evaluation metrics but also the regular evaluation metrics

This is because the tag count tells us whether we should trust a tag or not

(23)

Thank You
