Automatic Genre Classification of Music Content
[A survey]
Nicolas Scaringella, Giorgio Zoia, Daniel Mlynek
IEEE Signal Processing Magazine, March 2006
By Yi-Tang Wang
Outline
• Introduction
• Feature extraction techniques
• Genre classification paradigms
• Classification results
• Future directions & Conclusion
Introduction
• EMD (electronic music distribution)
– Restoration of analog archives
– New content
– Music catalogues become huge
• What do you want to listen to?
– 1 million tracks online
– Efficient ways to browse & organize
Introduction (cont.)
• Music Genres
– Categories to characterize similarities
– Boundaries are fuzzy
• Automatic Classification
– Finding a taxonomy
– Hierarchical set of categories
– Nontrivial task
Critical issues
• Artists, Albums, or Titles
– One song to one genre?
– Albums: heterogeneous material
– Artists: several albums
– Same titles?
• No agreement on taxonomies
– Allmusic, Amazon, Mp3
[2] F. Pachet and D. Cazaly, “A taxonomy of musical genres,” in Proc. Content-Based Multimedia Information Access (RIAO), Paris, France, 2000
Critical issues (cont.)
• Ill-defined genre labels
– Varied criteria (geographical, temporal, etc.)
– Culturally dependent
• Scalability of genre taxonomies
– New genres appear frequently
– Merging or splitting
– Automatic system
Feature extraction techniques
• High-level model
– Event-like format (MIDI)
– Symbolic format (MusicXML)
– Rarely available
• Low-level
– Audio samples
– Low level and low density of info
• Do feature extraction
– Timbre, Melody, Harmony, Rhythm
Timbre
• Sounds with the same pitch and loudness can still sound different
• Features to characterize timbre
– Temporal features
– Energy features
– Spectral shape features
– Perceptual features
– Some have been standardized in MPEG-7
Timbre (cont.)
• Transformations
– New features or increased dimensionality
– Transforming features to a logarithmic decibel scale is suggested
• Texture window
– Larger window
– Reduce computation
– Increase classification accuracy
– 1 s
– Variable size and position
Timbre (cont.)
• Texture model
– Model of features over the texture window:
• 1) simple modeling with low-order statistics
• 2) modeling with an autoregressive model
• 3) modeling with distribution estimation algorithms (for example, EM estimation of a GMM of frames)
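The simplest of these texture models, low-order statistics over a texture window, can be sketched as follows. This is only an illustration under stated assumptions, not the survey's implementation: the function names `to_decibels` and `texture_stats`, the per-frame energy values, and the window length of four frames are all made up for the example.

```python
import math
from statistics import mean, pvariance

def to_decibels(energy, floor=1e-10):
    """Map a linear energy value to a logarithmic decibel scale."""
    return 10.0 * math.log10(max(energy, floor))

def texture_stats(frame_features, window_len):
    """Summarize each non-overlapping texture window by (mean, variance)."""
    stats = []
    for start in range(0, len(frame_features) - window_len + 1, window_len):
        window = frame_features[start:start + window_len]
        stats.append((mean(window), pvariance(window)))
    return stats

# Hypothetical per-frame energies (linear scale), 4 frames per texture window.
energies = [1.0, 10.0, 100.0, 10.0, 1.0, 1.0, 1.0, 1.0]
frames_db = [to_decibels(e) for e in energies]
windows = texture_stats(frames_db, window_len=4)
```

Each texture window is reduced to two numbers per feature, so the classifier sees far fewer, smoother vectors than the raw frames.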
Melody & Harmony
• Melody
– Succession of pitched events
– Horizontal element
• Harmony
– Pitch simultaneity, chords
– Vertical element
Melody & Harmony (cont.)
• Pitch function
– Characterizing pitch distribution
– Amplitude, position of main peak, …
– Unfolded
• Contains pitch content and info on its range
– Folded
• Mapped to a single octave
• Harmonic content
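The difference between the unfolded and folded pitch functions can be illustrated with a small sketch. It assumes that pitches have already been detected and are given as MIDI note numbers, which is an assumption of this example (the slides do not say how pitches are represented); the function names are hypothetical.

```python
from collections import Counter

def unfolded_histogram(midi_pitches):
    """Count each absolute pitch, which keeps the register (range) information."""
    return Counter(midi_pitches)

def folded_histogram(midi_pitches):
    """Map every pitch onto a single octave (12 pitch classes) and count."""
    return Counter(p % 12 for p in midi_pitches)

# Hypothetical detected pitches as MIDI note numbers (60 = middle C).
pitches = [60, 72, 64, 67, 48, 76]
unfolded = unfolded_histogram(pitches)
folded = folded_histogram(pitches)
main_peak = folded.most_common(1)[0]  # (pitch class, count) of the main peak
```

In the folded version the three C notes from different octaves collapse into one bin, which is why it describes harmonic content rather than pitch range.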
Rhythm
• No precise definition
• Generically, all of the temporal aspects
• Periodicity function
– Low-level approach, like the pitch function
• 1) tempo: periodicities typically in the range 0.3–1.5 s (i.e., 200–40 bpm)
• 2) musical pattern: periodicities between 2 and 6 s (corresponding to the length of one or more measures)
– Gouyon et al. derive MFCC-like descriptors
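One common way to build a periodicity function is autocorrelation of an onset-strength envelope; the sketch below uses it to pick a tempo in the 0.3–1.5 s range quoted above. Autocorrelation is a choice made for this example, not necessarily the method of the surveyed papers, and the envelope, frame rate, and function names are hypothetical.

```python
def autocorrelation(signal, max_lag):
    """Unnormalized autocorrelation for lags 0..max_lag."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def period_to_bpm(period_seconds):
    """Convert a beat period in seconds to beats per minute."""
    return 60.0 / period_seconds

def estimate_tempo(envelope, frame_rate, min_period=0.3, max_period=1.5):
    """Pick the strongest periodicity inside the tempo range, in bpm."""
    min_lag = max(1, round(min_period * frame_rate))
    max_lag = round(max_period * frame_rate)
    acf = autocorrelation(envelope, max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: acf[lag])
    return period_to_bpm(best_lag / frame_rate)

# Hypothetical onset envelope: one pulse every 0.5 s at 20 frames/s, 10 s long.
frame_rate = 20
envelope = [1.0 if i % 10 == 0 else 0.0 for i in range(200)]
tempo = estimate_tempo(envelope, frame_rate)
```

The same conversion explains the range on the slide: a 0.3 s period is 200 bpm and a 1.5 s period is 40 bpm.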
Extracting from segments
• A small segment may contain sufficient information
• Reduces required computation
• Typically a 30 s segment
– Starting 30 s after the beginning
• Artist classification
– Voice is easier to identify than music only
Local conclusion
• Extracting high-level descriptors from polyphonic audio signals is not yet state of the art
• Focus on timbre modeling
• Timbre may contain sufficient info
– 250 ms: 53%, 3 s: 72%
– Among 10 genres
Local conclusion (cont.)
• Another point of view (pessimistic)
– Timbre similarity measure & 20,000 titles distributed over 18 genres
– Little correlation
– May not scale
– Take cultural features into account
Genre classification
• Expert systems
• Unsupervised approach
– Clustering
• Supervised approach
– Machine learning algorithms
Expert systems
• A knowledge-based system made up of a set of rules
• No model based on it so far
• Expensive to implement and maintain
• May yield unexpected interactions
Expert systems (cont.)
• Pachet and Cazaly’s work
– States genre differences in a descriptive language, e.g., instrumentation
Unsupervised approach
• Clustering with similarity measures
• Similarity measures
– If time invariant
• Euclidean distance or cosine distance
– Otherwise
• Build statistical model (Gaussian or GMMs)
– Kullback-Leibler divergence (relative entropy)
– Sampling, earth mover’s distance, asymptotic likelihood approximation
• Shao et al. use HMMs
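Two of the similarity measures above can be sketched in a few lines: cosine distance between fixed-length feature vectors, and the Kullback-Leibler divergence between two songs each modeled by a single Gaussian. The sketch assumes diagonal covariances, for which the divergence has a closed form; the function names and the symmetrized variant are choices made for this example. For GMMs no closed form exists, which is why approximations such as sampling are listed on the slide.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def kl_diag_gaussians(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for Gaussians with diagonal covariance (closed form)."""
    return sum(
        0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )

def symmetric_kl(mean_p, var_p, mean_q, var_q):
    """Symmetrized divergence, since KL itself is not symmetric."""
    return (kl_diag_gaussians(mean_p, var_p, mean_q, var_q)
            + kl_diag_gaussians(mean_q, var_q, mean_p, var_p))

# Hypothetical two-dimensional song models (means and variances).
song_a = ([0.0, 0.0], [1.0, 1.0])
song_b = ([1.0, 0.0], [1.0, 1.0])
divergence = symmetric_kl(*song_a, *song_b)
```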
Unsupervised approach (cont.)
• Clustering algorithms
– K-means
– Shao et al.’s work
• Agglomerative hierarchical clustering
– SOM (self-organizing map)
• Artificial neural network
• Maps high-dimensional data onto a lower-dimensional space
• GHSOM (growing hierarchical SOM)
– Rauber et al.
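As an illustration of the first clustering algorithm listed, here is a minimal K-means sketch over per-song feature vectors. It is a toy under stated assumptions: the two-dimensional feature vectors are invented, and the centroids are initialized deterministically from the first k points, which real systems would normally replace with random or smarter seeding.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def k_means(points, k, iterations=20):
    """Plain K-means; centroids start at the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical per-song feature vectors forming two loose groups.
songs = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
labels, centroids = k_means(songs, k=2)
```

No genre labels are used anywhere: the groups emerge from the similarity measure alone, which is the point of the unsupervised approach.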
Supervised approach
• A taxonomy of genres is given
• VS. Expert System
– No rules (or genre descriptions) needed
• Supervised machine learning algorithms
– KNN (K-Nearest Neighbor)
– GMMs (Gaussian Mixture Models)
– HMM (Hidden Markov Models)
– LDA (Linear Discriminant Analysis)
– SVMs (Support Vector Machines)
– ANNs (Artificial Neural Networks)
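The first algorithm in this list, K-nearest neighbors, is simple enough to sketch directly. The example assumes each song is already summarized by a fixed-length feature vector; the function name `knn_predict`, the feature values, and the genre names are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_features, train_genres, query, k=3):
    """Label a query song by majority vote among its k nearest training songs."""
    ranked = sorted(
        zip(train_features, train_genres),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(genre for _, genre in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled training set: the taxonomy is given, no rules are written.
features = [(0.1, 0.2), (0.0, 0.3), (0.2, 0.1), (0.9, 0.8), (1.0, 0.9)]
genres = ["classical", "classical", "classical", "rock", "rock"]

prediction_a = knn_predict(features, genres, (0.15, 0.15))
prediction_b = knn_predict(features, genres, (0.95, 0.85))
```

Unlike an expert system, the only genre knowledge here is the labeled examples themselves.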
Classification results
• MIREX genre classification contest
– 1,005 / 510 songs over ten genres
– 940 / 447 songs over six genres
Future directions
• Classification into perceptual categories
– Moods, emotions
• Novelty Detection
– New or unknown data (not belonging to any class)
• Classification with multiple labels
– Probably closer to human experience
• From taxonomies to folksonomies
– Does the taxonomy fit the users?
Conclusion
• Definitions of music genres are convoluted
• Features → classification → result
• Research is evolving from purely objective machine calculations to techniques
• Machine learning plays a fundamental role in classification domains