Audio Tag Annotation and Retrieval by Using Tag Count Information

(1)

Hung-Yi Lo¹,², Shou-De Lin², and Hsin-Min Wang¹

¹ Academia Sinica, ² National Taiwan University

MMM 2011

Jan. 06, 2011

(2)

Social Tagging to Music

Tag count: the number of different users who have annotated the tag (a larger font indicates a higher tag count)

(3)

Audio Tag Annotation and Retrieval

Annotating audio clips with tags

[Figure: an audio clip is annotated using one predictor for each tag; the panel shows the scores of the tag predictors for the example tags Female, R&B, Guitar, Metal, and Bass]

Retrieving audio clips using a tag query

[Figure: for the query "Rock", audio clips are ranked by the scores of the Rock predictor; the ranking list for the query runs from high relevance to low relevance]
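The two tasks on this slide can be sketched with a small table of predictor scores. This is only an illustration: the clip names, tag names, score values, and the `top_k` parameter are hypothetical and do not come from the talk.

```python
# Hypothetical predictor outputs: scores[clip][tag] is the score that the
# per-tag predictor assigns to the clip (all values are made up).
scores = {
    "clip_a": {"rock": 0.91, "female": 0.12, "guitar": 0.75},
    "clip_b": {"rock": 0.20, "female": 0.88, "guitar": 0.35},
    "clip_c": {"rock": 0.64, "female": 0.05, "guitar": 0.81},
}


def annotate(clip, top_k=2):
    """Annotation: return the top_k highest-scoring tags for one clip."""
    tag_scores = scores[clip]
    return sorted(tag_scores, key=tag_scores.get, reverse=True)[:top_k]


def retrieve(tag):
    """Retrieval: rank all clips by the score of the predictor for `tag`."""
    return sorted(scores, key=lambda clip: scores[clip][tag], reverse=True)


annotation = annotate("clip_a")   # tags ordered by score for one clip
ranking = retrieve("rock")        # clips ordered by the "rock" predictor
```

Annotation reads the score table along one clip, while retrieval reads it along one tag.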

(4)

Motivation (1/2)

Noisy Social Tags:

Social tags are assigned by people with different levels of musical knowledge, so they inevitably contain noisy information

Social tags are unstructured, free-form text that may contain misspellings

Tags may even be assigned by malicious users

Tag count information should be considered in automatic tagging because the count reflects the confidence degree of the tag

Tags with high counts are more reliable and credible

(5)

Motivation (2/2)

In previous works, the tag count is transformed into 1 (with a tag) or 0 (without a tag) by using a threshold

A binary classifier is then trained for each tag and used to make predictions about untagged audio clips

The tag count information is lost: a tag assigned twice is treated in the same way as a tag assigned hundreds of times

Question: how to use the tag count information for audio tag annotation and retrieval?

Answer: cost-sensitive learning with the tag counts as costs
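A small sketch of the information loss caused by thresholding, and of what keeping the count as a cost preserves. The clip names, the counts, the threshold value, and the unit cost for untagged clips are all hypothetical choices made for illustration.

```python
# Hypothetical tag counts for one tag (e.g. "rock") on three clips.
tag_counts = {"clip_a": 2, "clip_b": 300, "clip_c": 0}

THRESHOLD = 2  # assumed threshold; the talk does not state a value

# Previous approach: binarize the count and discard it.
binary_labels = {
    clip: int(count >= THRESHOLD) for clip, count in tag_counts.items()
}

# Cost-sensitive view: keep the binary label, but attach the count as the
# cost of a positive example (negatives get an assumed uniform cost of 1).
costs = {
    clip: count if binary_labels[clip] == 1 else 1
    for clip, count in tag_counts.items()
}
```

After binarization `clip_a` and `clip_b` are indistinguishable, whereas their costs (2 and 300) still record how many users agreed on the tag.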

(6)

Factors Affecting the Tag Counts

1. Consistent Agreement

When a large number of users consider that an audio clip should be associated with a particular tag, the label information is more reliable

If only a small portion of an audio clip is related to a certain tag, then the tag count will be small

Tags with higher counts describe major properties of the audio

2. Tag Bias

Some tags are used more often (such as “rock”)

Some others are unpopular or hard to recognize (“Baroque” is less popular than “classic”; “drum machine” might easily be recognized as “drum”)

3. Song/Album/Artist Popularity

Popular songs, albums, and artists usually receive more tags, since people tend to tag music that they like or are familiar with

(7)

An Example: “Let it Be” and its Tags

(8)

An Example: “Let it Be” and its Tags with Higher Counts

More Reliable, Major Properties, and Important Tags

(9)

Cost‐sensitive Learning

Given training examples (x, y, c), where c is the misclassification cost of the example

Learn a classifier h that minimizes the expected cost on unseen instances:

\[
E\big[\, c \, I\big(h(\mathbf{x}) \neq y\big) \big]
\]

where I(·) is an indicator function that yields 1 if its argument is true and 0 otherwise

A more general setup of the traditional classification problem, which is recovered when every cost c equals 1
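The expected cost above can be estimated on a finite sample by averaging the costs of the misclassified examples. The following is a minimal sketch; the function name and the toy data are hypothetical.

```python
def empirical_cost(predictions, labels, costs):
    """Average of c * I(h(x) != y) over a sample of examples."""
    total = sum(
        c for p, y, c in zip(predictions, labels, costs) if p != y
    )
    return total / len(labels)


# Toy sample: the second and third examples are misclassified.
labels = [1, -1, 1, -1]
predictions = [1, 1, -1, -1]
costs = [5, 1, 100, 1]

weighted = empirical_cost(predictions, labels, costs)
# With unit costs the same quantity reduces to the ordinary error rate.
unweighted = empirical_cost(predictions, labels, [1] * len(labels))
```

Both mistakes count equally in the ordinary error rate, but the cost-sensitive estimate is dominated by the single mistake with cost 100.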

(10)

Cost‐sensitive Learning Applications

Business Marketing

Given some features of potential customers, each customer has a purchasing amount as misclassification cost

Decide which customers to mail a new catalog

• Cost-sensitive learning with purchasing amount as cost

Goal: maximize the total profit obtained from unseen test customers

Audio Tag Annotation with Tag Counts as Costs

Given some acoustic features of an audio clip with its tags and tag counts

Goal: minimize misclassified tag counts

If 100 users annotate an audio clip with “rock”, but the classifier causes a false negative, then it has to pay a cost equal to 100

Pay more attention to the reliable, major, and important tags

(11)

Cost‐insensitive Support Vector Machine

Optimization problem:

\[
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \quad \frac{1}{2}\mathbf{w}^{T}\mathbf{w} + C \sum_{i} \xi_{i}
\]

\[
\text{s.t.} \quad y_{i}\big(\mathbf{w}^{T}\mathbf{x}_{i} + b\big) \geq 1 - \xi_{i}, \qquad \xi_{i} \geq 0
\]

Train a binary classifier for each tag, gathering all instances annotated with this tag as positive examples

(12)

Cost‐sensitive Support Vector Machine

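Optimization problem:

\[
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \quad \frac{1}{2}\mathbf{w}^{T}\mathbf{w} + C \sum_{i} c_{i}\, \xi_{i}
\]

\[
\text{s.t.} \quad y_{i}\big(\mathbf{w}^{T}\mathbf{x}_{i} + b\big) \geq 1 - \xi_{i}, \qquad \xi_{i} \geq 0
\]

The cost c_i is assigned as the tag count for positive examples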

The cost c_i is assigned uniformly for negative examples

Since people do not use negative tags like “non‐rock” and “no drum”
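A minimal sketch of how the costs could be assigned and how they enter the objective. It evaluates the cost-sensitive SVM objective for a given linear model, using the fact that the optimal slack of an example is its hinge loss max(0, 1 − y(wᵀx + b)). The function names, the toy data, and the value 1.0 for the uniform negative cost are assumptions; the talk only says that negative costs are uniform.

```python
def example_costs(labels, tag_counts, negative_cost=1.0):
    """Tag count as cost for positives, an assumed uniform cost for negatives."""
    return [
        float(count) if y == 1 else negative_cost
        for y, count in zip(labels, tag_counts)
    ]


def cs_svm_objective(w, b, X, labels, costs, C=1.0):
    """0.5 * w.w + C * sum_i c_i * xi_i, with xi_i set to the hinge loss."""
    regularizer = 0.5 * sum(wj * wj for wj in w)
    weighted_slack = 0.0
    for x, y, c in zip(X, labels, costs):
        margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
        weighted_slack += c * max(0.0, 1.0 - margin)
    return regularizer + C * weighted_slack


# Toy data: two positive clips (tag counts 100 and 2) and one negative clip.
X = [[1.0, 0.0], [-1.0, 0.0], [0.5, 0.0]]
labels = [1, -1, 1]
tag_counts = [100, 0, 2]

costs = example_costs(labels, tag_counts)
objective = cs_svm_objective([1.0, 0.0], 0.0, X, labels, costs)
trivial = cs_svm_objective([0.0, 0.0], 0.0, X, labels, costs)
```

Per-example weights of this kind are what, for instance, scikit-learn's `SVC.fit(..., sample_weight=...)` accepts; which solver the authors used is not stated on the slides.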

(13)

Cost‐sensitive AdaBoost

[Figure: boosting pipeline. The training sample is used to train h_1(x); each subsequent weak learner h_2(x), h_3(x), ..., h_T(x) is trained on a reweighted sample]

Each weak learner minimizes the weighted error:

\[
\sum_{i} D_{t}(i) \exp\big(-y_{i}\, h_{t}(\mathbf{x}_{i})\big)
\]

Cost-sensitive weight update rule:

\[
D_{t+1}(i) = \frac{D_{t}(i) \exp\big(-\alpha_{t}\, c_{i}\, y_{i}\, h_{t}(\mathbf{x}_{i})\big)}{Z_{t}}
\]
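One round of the cost-sensitive weight update can be sketched as follows, reading the rule as D_{t+1}(i) = D_t(i) exp(−α_t c_i y_i h_t(x_i)) / Z_t with Z_t normalizing the weights to sum to one. The function name and the toy numbers are hypothetical, and α_t is passed in as a parameter because the slide does not show how it is chosen.

```python
import math


def update_weights(weights, costs, labels, predictions, alpha):
    """Apply one cost-sensitive AdaBoost weight update and renormalize."""
    unnormalized = [
        d * math.exp(-alpha * c * y * h)
        for d, c, y, h in zip(weights, costs, labels, predictions)
    ]
    z = sum(unnormalized)  # normalization factor Z_t
    return [u / z for u in unnormalized]


# Two positive examples, both misclassified by the current weak learner;
# the first one has the larger cost (e.g. a higher tag count).
new_weights = update_weights(
    weights=[0.5, 0.5],
    costs=[2.0, 1.0],
    labels=[1, 1],
    predictions=[-1, -1],
    alpha=0.5,
)
```

Both examples gain weight relative to correctly classified ones, and the misclassified example with the larger cost gains more, so the next weak learner concentrates on it.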

(14)

Cost‐sensitive AdaBoost

[Figure: the same boosting pipeline, h_1(x) through h_T(x), each weak learner after the first trained on a weighted sample]

Final classifier:

\[
H(\mathbf{x}) = \sum_{t=1}^{T} \alpha_{t}\, h_{t}(\mathbf{x})
\]
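The final classifier is a weighted sum of the weak learners' outputs, which can be sketched in a few lines. The decision stumps and the α values below are hypothetical stand-ins for trained weak learners.

```python
def final_score(x, weak_learners, alphas):
    """H(x) = sum over t of alpha_t * h_t(x)."""
    return sum(alpha * h(x) for alpha, h in zip(alphas, weak_learners))


# Hypothetical decision stumps on a two-dimensional feature vector.
weak_learners = [
    lambda x: 1 if x[0] > 0.5 else -1,
    lambda x: 1 if x[1] > 0.5 else -1,
    lambda x: 1 if x[0] > 0.9 else -1,
]
alphas = [0.8, 0.3, 0.2]

score = final_score([0.7, 0.2], weak_learners, alphas)
```

The real-valued score can be used directly for ranking tags or clips, and its sign gives a binary decision.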

(15)

Cost‐sensitive Evaluation Metrics: Which One is Better?

(16)

Cost‐sensitive Evaluation Metrics: Which One is Better?

(17)

New Evaluation Metrics: Cost‐sensitive Area Under ROC

ROC curve: a graphical plot of the true positive rate vs. the false positive rate as the discrimination threshold is varied

Cost-sensitive Area Under ROC Curve (AUC): replace the true positive rate by the cost-weighted true positive rate

Clip AUC: given an audio clip, give correct tags higher scores (for audio annotation)

Tag AUC: given a tag, give correct audio clips higher scores (for audio retrieval)
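One way to compute a cost-sensitive AUC is the pairwise form of the area under the curve of cost-weighted true positive rate against false positive rate: each (positive, negative) pair that is ranked correctly contributes the positive example's cost. This is a sketch under that reading; the function name, the tie handling (half credit), and the toy data are assumptions rather than the authors' exact definition.

```python
def cost_sensitive_auc(scores, labels, costs):
    """Area under the cost-weighted TPR vs. FPR curve, in pairwise form."""
    positives = [(s, c) for s, y, c in zip(scores, labels, costs) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y != 1]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    total = 0.0
    for s_pos, c in positives:
        for s_neg in negatives:
            if s_pos > s_neg:
                total += c          # correctly ordered pair
            elif s_pos == s_neg:
                total += 0.5 * c    # assumed: ties get half credit
    return total / (sum(c for _, c in positives) * len(negatives))


scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
costs = [10.0, 1.0, 1.0, 1.0]   # costs of negatives are not used here

cs_auc = cost_sensitive_auc(scores, labels, costs)
plain_auc = cost_sensitive_auc(scores, labels, [1.0] * 4)
```

In this toy case the badly ranked positive has a low cost, so the cost-sensitive AUC is higher than the regular AUC computed with unit costs.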

(18)

New Evaluation Metrics: Cost‐sensitive F‐measure

Cost-sensitive precision (CSP):

\[
\mathrm{CSP} = \frac{\text{Weighted TP}}{\text{Weighted TP} + \text{Weighted FP}}
\]

Cost-sensitive recall (CSR):

\[
\mathrm{CSR} = \frac{\text{Weighted TP}}{\text{Weighted TP} + \text{Weighted FN}}
\]

Cost-sensitive F-measure:

\[
\frac{2 \times \mathrm{CSP} \times \mathrm{CSR}}{\mathrm{CSP} + \mathrm{CSR}}
\]

We evaluate on both cost‐sensitive metrics and cost‐insensitive metrics
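The cost-sensitive precision, recall, and F-measure can be sketched by summing example costs instead of counting examples. The function name and the toy data are hypothetical, and weighting each false positive by the cost of that negative example is an assumption about how "Weighted FP" is computed.

```python
def cost_sensitive_f_measure(predictions, labels, costs):
    """Return (CSP, CSR, F) with TP, FP, and FN replaced by cost sums."""
    triples = list(zip(predictions, labels, costs))
    weighted_tp = sum(c for p, y, c in triples if p == 1 and y == 1)
    weighted_fp = sum(c for p, y, c in triples if p == 1 and y != 1)
    weighted_fn = sum(c for p, y, c in triples if p != 1 and y == 1)
    csp = weighted_tp / (weighted_tp + weighted_fp) if weighted_tp + weighted_fp > 0 else 0.0
    csr = weighted_tp / (weighted_tp + weighted_fn) if weighted_tp + weighted_fn > 0 else 0.0
    f = 2 * csp * csr / (csp + csr) if csp + csr > 0 else 0.0
    return csp, csr, f


labels = [1, 1, 0, 0]
predictions = [1, 0, 1, 0]
costs = [100.0, 2.0, 1.0, 1.0]   # tag counts for positives, 1 for negatives

csp, csr, f_measure = cost_sensitive_f_measure(predictions, labels, costs)
```

With unit costs the same function returns the regular precision, recall, and F-measure.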

(19)

Experimental Setup

Compare with our winning (cost-insensitive) method in the MIREX 2009 audio tagging competition

MIREX refers to Music Information Retrieval Evaluation eXchange

Experiments basically follow the MIREX 2009 setup

Download audio data from MajorMiner, a web‐based music labeling game:

http://majorminer.org/

Use the same 45 tags and download all the audio clips that are associated with these tags

The resulting audio database contains 2,472 clips

Select parameters using inner cross-validation on the training data

Repeat cross-validation twenty times to reduce variance

Tags listed on the slide:

metal        instrumental   horns      piano    guitar
ambient      saxophone      house      loud     bass
fast         keyboard       vocal      noise    british
solo         electronica    beat       80s      dance
jazz         drum machine   strings    pop      r&b
female       distortion     voice      rap      male
slow         electronic     quiet      techno   drum
funk         acoustic       rock       organ    soft
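The repeated cross-validation protocol can be sketched as a generator of train/test index splits plus a mean and standard deviation summary. The number of folds is not given on the slide, so `n_folds=3` below is an assumption, as are the function names and the seed; the inner cross-validation for parameter selection is not shown.

```python
import random
import statistics


def repeated_kfold(n_items, n_folds, n_repeats, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for each split."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_items))
        rng.shuffle(indices)  # a fresh random partition per repetition
        for fold in range(n_folds):
            test = indices[fold::n_folds]
            test_set = set(test)
            train = [i for i in indices if i not in test_set]
            yield repeat, fold, train, test


def summarize(values):
    """Mean and sample standard deviation, as in 'Mean±St.D.' tables."""
    return statistics.mean(values), statistics.stdev(values)


# Twenty repetitions (from the slide) of an assumed 3-fold split of 10 items.
splits = list(repeated_kfold(n_items=10, n_folds=3, n_repeats=20))
mean, std = summarize([0.8, 0.9])
```

Each repetition would produce one score per metric, and the reported mean and standard deviation would then be taken over the twenty repetitions.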

(20)

Results of Cost‐sensitive Metrics

Mean±St.D.     Cost-sensitive Tag AUC   Cost-sensitive Clip AUC   Cost-sensitive F-measure
AdaBoost       0.8055±0.0027            0.8892±0.0011             0.4099±0.0052
CS AdaBoost    0.8169±0.0023            0.8967±0.0005             0.4469±0.0081
SVM            0.8112±0.0022            0.8957±0.0007             0.4354±0.0077
CS SVM         0.8215±0.0023            0.9005±0.0004             0.4593±0.0056
Ensemble       0.8334±0.0019            0.8979±0.0007             0.4606±0.0067
CS Ensemble    0.8356±0.0018            0.9032±0.0006             0.4808±0.0072

Each cost-sensitive (CS) method is better than its cost-insensitive counterpart

(21)

Results of Cost‐insensitive (Regular) Metrics

Mean±St.D.     Tag AUC          Clip AUC         F-measure
AdaBoost       0.7941±0.0027    0.8773±0.0011    0.3018±0.0035
CS AdaBoost    0.8050±0.0023    0.8854±0.0005    0.3216±0.0049
SVM            0.7992±0.0021    0.8837±0.0007    0.3226±0.0053
CS SVM         0.8106±0.0021    0.8894±0.0004    0.3299±0.0037
Ensemble       0.8221±0.0018    0.8859±0.0007    0.3386±0.0046
CS Ensemble    0.8247±0.0017    0.8921±0.0005    0.3442±0.0046

Each cost-sensitive (CS) method is better than its cost-insensitive counterpart

Interesting, since they are trained by cost-sensitive learning but evaluated with regular metrics

Tags with smaller counts may contain noisy labeling information

The CSL method can ignore the noisy information by giving it a smaller penalty (cost), and thereby train a more accurate classifier

(22)

Conclusion

This paper has presented our novel idea for exploiting tag count information in audio tagging tasks

Discussed several factors that affect the tag counts; consistent agreement is the most important issue

Formulated the audio tag prediction task as a cost-sensitive classification problem to minimize the misclassified tag counts

Presented cost-sensitive versions of several regular evaluation metrics

The proposed cost-sensitive methods outperform their cost-insensitive counterparts in terms of not only the cost-sensitive evaluation metrics but also the regular evaluation metrics

This is because the tag count tells us whether we should trust a tag or not
