(1)

Automatic Genre Classification of Music Content

[A survey]

Nicolas Scaringella, Giorgio Zoia, Daniel Mlynek

IEEE SIGNAL PROCESSING MAGAZINE, MARCH 2006

By Yi-Tang Wang

(2)

Outline

• Introduction

• Feature extraction techniques

• Genre classification paradigms

• Classification results

• Future directions & Conclusion

(3)

Introduction

• EMD (electronic music distribution)

– Restoration of analog archives

– New content

– Music catalogues become huge

• What do you want to listen to?

– 1 million tracks online

– Efficient ways to browse & organize

(4)

Introduction (cont.)

• Music Genres

– Categories to characterize similarities

– Boundaries are fuzzy

• Automatic Classification

– Finding a taxonomy

– Hierarchical set of categories

– Nontrivial task

(5)

Critical issues

• Artists, Albums, or Titles

– One song to one genre (?)

– Albums: heterogeneous material

– Artists: several albums

– Same titles?

• Nonagreement on Taxonomies

– Allmusic, Amazon, Mp3

[2] F. Pachet and D. Cazaly, “A taxonomy of musical genres,” in Proc. Content-Based Multimedia Information Access (RIAO), Paris, France, 2000.

(6)

Critical issues (cont.)

• Ill-Defined Genre Labels

– Varied criteria (geographical, temporal, etc.)

– Culturally dependent

• Scalability of genre taxonomies

– New genres appear frequently

– Merging or splitting

– Automatic system

(7)

Feature extraction techniques

• High-level model

– Event-like format (MIDI)

– Symbolic format (MusicXML)

– Rarely available

• Low-level

– Audio samples

– Low level and low density of info

• Do feature extraction

– Timbre, Melody, Harmony, Rhythm

(8)

Timbre

• Same pitch and loudness but sound different

• Features to characterize timbre

– Temporal features

– Energy features

– Spectral shape features

– Perceptual features

– Some have been standardized in MPEG-7
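
As a rough illustration of the feature families listed above, the sketch below computes one temporal feature (zero-crossing rate) and one energy feature (RMS) for a single frame. The function names and the eight-sample sine frame are assumptions made for this example; they are not taken from the survey.

```python
import math

def zero_crossing_rate(frame):
    """Temporal feature: fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

def rms_energy(frame):
    """Energy feature: root-mean-square amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

# Hypothetical frame: one cycle of a sine wave sampled at 8 points.
frame = [math.sin(2 * math.pi * n / 8) for n in range(8)]
features = (zero_crossing_rate(frame), rms_energy(frame))
```

In practice such values would be computed per short analysis frame and stacked into a feature vector alongside spectral shape and perceptual features.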

(9)

Timbre (cont.)

(10)

Timbre (cont.)

• Transformations

– New feature or increase dimensionality

– Suggested transforming into logarithmic decibel scale

• Texture window

– Larger window

– Reduce computation

– Increase classification accuracy

– 1s

– Variant size and positions
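
The logarithmic decibel transform mentioned above can be sketched as follows. This is a minimal example assuming the inputs are linear power values; the `floor` guard against log(0) and the sample band powers are assumptions for illustration.

```python
import math

def to_decibels(power, floor=1e-10):
    """Map a linear power value to dB; `floor` (assumed) avoids log10(0)."""
    return 10.0 * math.log10(max(power, floor))

# Hypothetical band powers, including a silent band.
band_powers = [1.0, 0.1, 0.001, 0.0]
band_db = [to_decibels(p) for p in band_powers]
```

With this floor, a silent band maps to -100 dB rather than raising an error.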

(11)

Timbre (cont.)

• Texture model

– Model of features over texture window:

• 1) simple modeling with low-order statistics

• 2) modeling with autoregressive model

• 3) modeling with distribution estimation algorithms (for example, EM estimation of a GMM of frames)
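
Option 1 above (low-order statistics) might look like the sketch below, which summarizes a sequence of frame-level feature values with the mean and variance of each texture window. The function name, window length, and hop size are assumptions chosen for the example.

```python
def texture_stats(frames, window, hop):
    """Mean and (population) variance of a scalar feature per texture window."""
    stats = []
    for start in range(0, len(frames) - window + 1, hop):
        chunk = frames[start:start + window]
        mean = sum(chunk) / window
        var = sum((x - mean) ** 2 for x in chunk) / window
        stats.append((mean, var))
    return stats

# Hypothetical frame-level feature values (one scalar per analysis frame).
frame_feature = [1.0, 3.0, 2.0, 4.0, 6.0, 8.0]
summary = texture_stats(frame_feature, window=4, hop=2)
```

Each window collapses to two numbers, which is why a larger texture window reduces the amount of data handed to the classifier.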

(12)

Melody & Harmony

• Melody

– Succession of pitched events

– Horizontal element

• Harmony

– Pitch simultaneity, chords

– Vertical element

(13)

Melody & Harmony (cont.)

• Pitch function

– Characterizing pitch distribution

– Amplitude, position of main peak, …

– Unfolded

• Contains pitch content and info of its range

– Folded

• Mapped to a single octave

• Harmonic content
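
The difference between the unfolded and folded representations can be sketched with pitch histograms. This example assumes pitch estimates are already available as MIDI note numbers, which is a simplification: the slides describe a pitch function derived from audio.

```python
from collections import Counter

def unfolded_histogram(midi_notes):
    """Keeps the octave, so the pitch range is preserved."""
    return Counter(midi_notes)

def folded_histogram(midi_notes):
    """Maps every note to one of 12 pitch classes (a single octave)."""
    return Counter(n % 12 for n in midi_notes)

# Hypothetical note sequence: C4, C5, E4, G4, C3, E5.
notes = [60, 72, 64, 67, 48, 76]
unfolded = unfolded_histogram(notes)
folded = folded_histogram(notes)
```

Here the three C notes in different octaves fall into pitch class 0 after folding, so the folded histogram emphasizes harmonic content while discarding range.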

(14)

Rhythm

• No precise definition

• Generically, all of the temporal aspects

• Periodicity function

– Low-level approach, as with the pitch function

• 1) tempo: periodicities typically in the range 0.3–1.5 s (i.e., 200–40 bpm)

• 2) musical pattern: periodicities between 2 and 6 s (corresponding to the length of one or more measure bars)

– Gouyon et al. get MFCC-like descriptors
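
The relation between the periodicity ranges and tempo quoted above is bpm = 60 / period. The sketch below applies it and labels a period with the two ranges from the slide; the function names and the "other" label are assumptions for illustration.

```python
def period_to_bpm(period_s):
    """Convert a periodicity in seconds to beats per minute."""
    return 60.0 / period_s

def periodicity_kind(period_s):
    """Label a period using the ranges quoted on the slide."""
    if 0.3 <= period_s <= 1.5:
        return "tempo"
    if 2.0 <= period_s <= 6.0:
        return "musical pattern"
    return "other"  # assumed label for periods outside both ranges

fast = period_to_bpm(0.3)   # upper end of the tempo range
slow = period_to_bpm(1.5)   # lower end of the tempo range
```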

(15)

Extracting from segments

• Small segment may contain sufficient information

• Reduced required computation

• Typically 30s segment

– and 30s after beginning

• Artist classification

– Voice is easier to identify than music only

(16)

Local conclusion

• Extracting high-level descriptors from polyphonic audio signals is not yet state of the art

• Focus on timbre modeling

• Timbre may contain sufficient info

– 250ms : 53% , 3s : 72%

– Among 10 genres

(17)

Local conclusion (cont.)

• Another point of view (pessimistic)

– Timbre similarity measure & 20,000 titles distributed over 18 genres

– Little correlation

– May not be scalable

– Take cultural features into account

(18)

Genre classification

• Expert systems

• Unsupervised approach

– Clustering

• Supervised approach

– Machine learning algorithms

(19)

Expert systems

• A knowledge-based system made up of a set of rules

• No model based on it so far

• Expensive to implement and maintain

• May yield unexpected interactions

(20)

Expert systems (cont.)

• Pachet and Cazaly’s work

– Differences between genres stated with language-based descriptors, e.g. instrumentation

(21)

Unsupervised approach

• Clustering with similarity measures

• Similarity measures

– If time invariant

• Euclidean distance or cosine distance

– Otherwise

• Build statistical model (Gaussian or GMMs)

– Kullback-Leibler divergence, relative entropy

– Sampling, earth mover's distance, asymptotic likelihood approximation

• Shao et al. use HMMs
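
A minimal sketch of the measures named above: Euclidean and cosine distance for fixed-length feature vectors, and the closed-form Kullback-Leibler divergence between two single Gaussians. The diagonal-covariance restriction is an assumption made to keep the example short; for GMMs no closed form exists, which is why the slide lists sampling and other approximations.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """1 minus the cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def kl_diag_gaussians(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for Gaussians with diagonal covariance (assumed here)."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )

# Hypothetical one-dimensional song models with unit variance.
d = kl_diag_gaussians([0.0], [1.0], [1.0], [1.0])
```

KL divergence is not symmetric, so a symmetrized form such as KL(p || q) + KL(q || p) is a common choice when a distance-like quantity is needed.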

(22)

Unsupervised approach

• Clustering algorithms

– K-means

– Shao et al.’s work

• agglomerative hierarchical clustering

– SOM (self-organizing map)

• Artificial neural network

• High dim onto lower dim

• GHSOM (growing hierarchical SOM)

– Rauber et al.
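
Of the clustering algorithms listed above, K-means is the simplest to sketch. The version below is a plain Lloyd's iteration with caller-supplied starting centroids and squared Euclidean distance; the toy 2-D points and the fixed iteration count are assumptions for illustration.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return dists.index(min(dists))

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm; empty clusters keep their previous centroid."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Hypothetical 2-D feature vectors forming two obvious groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, labels = kmeans(points, centroids=[(0, 0), (10, 10)])
```

Unlike the supervised setting, no genre labels are used: the cluster indices only say which songs are grouped together.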

(23)

Supervised approach

• A taxonomy of genres is given

• VS. Expert System

– No rules (or description to genre)

• Supervised machine learning algorithms

– KNN (K-Nearest Neighbor)

– GMMs (Gaussian Mixture Models)

– HMM (Hidden Markov Models)

– LDA (Linear Discriminant Analysis)

– SVMs (Support Vector Machines)

– ANNs (Artificial Neural Networks)
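
As an example of the first algorithm in the list, a K-nearest-neighbor genre classifier can be sketched in a few lines. The two-dimensional feature vectors and genre labels below are made up for illustration and do not come from any dataset in the survey.

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, genre_label) pairs; majority vote of k nearest."""
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled training data (the given taxonomy has two genres here).
train = [
    ((0.1, 0.2), "classical"),
    ((0.2, 0.1), "classical"),
    ((0.9, 0.8), "rock"),
    ((0.8, 0.9), "rock"),
    ((0.85, 0.7), "rock"),
]
predicted = knn_predict(train, (0.15, 0.15), k=3)
```

No rules describing the genres are written by hand: the classifier relies only on labeled examples, which is the contrast with expert systems drawn above.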

(24)

Classification results

• MIREX genre classification contest

– 1,005 / 510 songs over ten genres

– 940 / 447 songs over six genres

(25)

Classification results

(26)

Future directions

• Classification into perceptual categories

– Moods, emotions

• Novelty Detection

– New or unknown data (not belonging to any class)

• Classification with multiple labels

– Probably closer to human experience

• From taxonomies to folksonomies

– Does the taxonomy fit the users?

(27)

Conclusion

• Definitions of music genres are convoluted

• Features → classification → result

• Research is evolving from purely objective machine calculations to techniques

• Machine learning plays a fundamental role in classification domains

(28)