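Music Style Mining Techniques and Applications (音樂曲風的探勘技術與應用)

National Science Council, Executive Yuan: Research Project Final Report

Project type: Individual project

Project number: NSC92-2213-E-004-006-

Project period: August 1, 2003 to July 31, 2004 (ROC years 92 to 93)

Executing institution: Department of Computer Science, National Chengchi University

Principal investigator: 沈錳坤

Report type: Concise report

Attachments: Report on attendance at an international conference, and the published paper

Disclosure: This project involves patents or other intellectual property rights; the report becomes publicly searchable after 2 years

December 10, 2004 (ROC year 93)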

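National Science Council, Executive Yuan: Funded Research Project Final Report

Music Style Mining Techniques and Applications

Project type: ⌧ Individual project □ Integrated project

Project number: NSC92-2213-E-004-006

Project period: August 1, 2003 to July 31, 2004

Principal investigator: 沈錳坤

Project participants: 何旻璟, 周大鈞, 廖忠訓, 郭芳菲

Report type (submitted as required by the approved budget list): ⌧ Concise report □ Full report

This report includes the following required attachments:

□ One report on an overseas business trip or study visit

□ One report on a business trip or study visit to mainland China

⌧ One report on attendance at an international academic conference, and one copy of the published paper

□ One overseas research report for an international cooperative research project

Disclosure: Except for industry-academia cooperative research projects, projects for upgrading industrial technology and cultivating talent, controlled projects, and the following case, the report may be made publicly searchable immediately

□ Involves patents or other intellectual property rights; publicly searchable after □ one year ⌧ two years

Executing institution: Department of Computer Science, National Chengchi University

October 31, 2004 (ROC year 93)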

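National Science Council, Executive Yuan: Research Project Final Report

Music Style Mining Techniques and Applications

Project number: NSC92-2213-E-004-006

Project period: August 1, 2003 to July 31, 2004

Principal investigator: 沈錳坤, Department of Computer Science, National Chengchi University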

ABSTRACT

With the growth of digital music, content-based music retrieval (CBMR) has attracted increasing attention. Most CBMR systems return music objects that are similar to the query in syntactic properties such as pitch and interval contour sequence. These approaches let users look for music that they have heard before. Sometimes, however, listeners are looking not for music they already know but for music that is new to them. Moreover, people sometimes want to retrieve music that “feels like” another music object or a music style. To the best of our knowledge, no published work investigates content-based music style retrieval. This project investigates an approach to CBMR by melody style. We propose four types of query specification for melody style queries. The output of a melody style query is a list of music ranked by its degree of relevance, in terms of music style, to the query. We developed a melody style mining algorithm to obtain melody style classification rules, and the style ranking is determined by these rules. The experiments show that the proposed approach provides a satisfactory way to query by melody style.

Keywords

Content-Based Music Retrieval, Music Style Mining, Query by Melody Style.

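ABSTRACT (translated from Chinese)

With the development of digital music, research on content-based music retrieval has gradually matured. Most existing content-based music retrieval systems retrieve music from queries based on pitch and note duration. These systems mainly let users search for music they have heard before. Sometimes, however, users want to retrieve not music they have already heard but new music. In addition, users sometimes want to find music that feels similar or is similar in style. To the best of our knowledge, there has been no prior research on retrieval by melody style. This project studies techniques for retrieving music by melody style. We propose four ways of specifying a melody style query. A melody style query returns music to the user ranked by its stylistic similarity to the query. We developed a music style mining algorithm to produce melody style classification rules, and the style ranking is determined by these rules. The experimental results show that the techniques we developed provide users with satisfactory melody style queries.

Keywords (translated from Chinese)

Content-based music retrieval, music style mining, query by music style.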

1. INTRODUCTION

Music information retrieval (MIR) has become an increasingly important field of research in recent years. In traditional MIR systems, queries are based on text-based metadata. Content-based music retrieval (CBMR) allows users to query by music content instead of metadata.

Much work has been done on the development of CBMR. Query by humming or singing[7][8][11][13][14][18] is a common approach to retrieval from acoustic input. The queries are melodies hummed or sung by the user, which are transcribed into symbolic MIDI format. Query by tapping is another query method, which uses beat information for retrieval[10]. Recently, several researchers have explored polyphonic content-based music retrieval[6][15][16]. Polyphonic music retrieval techniques are more suitable than monophonic ones for retrieving performance data and for querying by polyphonic input.

The main goal of previous CBMR research is to return music objects that are similar to the query in pitch, interval contour, or rhythm. These approaches let users look for music that they have heard before. Sometimes, however, listeners are looking not for music they already know but for music that is new to them. Moreover, people sometimes want to retrieve music that “feels like” another music object or a style.

Query by humming, singing, or tapping does not help when we look for new music that we have not heard. It is therefore necessary to develop techniques for querying music by melody style. Music style reflects the human perception of music and is the feature that people often use to classify music. Although text-based metadata that records a textual description of music style can be used for melody style queries, such metadata must be annotated manually. Furthermore, users may sometimes wish to query a mixed style. For example, a user may want to retrieve music that sounds mainly like Chopin with a little Bach. The returned music objects should then be more similar to the Chopin style but also have a little of the feeling of the Bach style.

The purpose of our research is to investigate techniques for content-based music retrieval by melody style. Our work addresses four issues:

(1) To develop methods for the specification of the query style.

(2) To determine the appropriate feature for music style and its representation.

(3) To discover the description of melody style.

(4) To measure the degree of relevance between a music object and the query style.

For the first issue, we present four types of query specification for the query style. For the second issue, the basic elements of music include melody, harmony, rhythm, and so on. Above all, melody is the most memorable aspect of music. Accordingly, we concentrate on melody style and use chords as the melody feature for retrieval by music style. For the third issue, we develop an algorithm that discovers the common characteristics of music of the same style and finds the patterns that discriminate between music of different styles. The melody styles are described by the discovered set of style rules. For the last issue, the discovered set of style rules is used to rank the music objects.

Our work is useful in many applications. For example, it can help a physiotherapist find music that will motivate a patient, help a film director find music that conveys a certain mood[9], and help a restaurateur find music that targets a certain clientele. Query by melody style lets users find music whose style is similar to what they like.

This report is organized as follows. Section 2 gives a brief review of previous work related to content-based music retrieval and music style discovery. Section 3 presents the music style retrieval model. Section 4 describes our proposed methodology. The experiments and the performance analysis are described in Section 5. Section 6 concludes the report.

2. RELATED WORK

Much research has been done on the development of content-based music retrieval technology. Query by humming or singing is a common approach to querying by acoustic input[7][8][11][13][14][18]. Ghias et al.[7] introduced a query by humming system. The query input was converted into a melodic contour, and the contour was matched against the music in the library by approximate string matching. McNab et al.[14] presented a CBMR system that accepted singing or humming queries. They investigated people’s singing accuracy and suggested that the music transcription should adapt to the user’s tuning. In Tseng’s research[18], key melody extraction is used for query suggestion and effective retrieval, where the key melodies are representative fragments of music. To allow queries in any key and approximate matching, pitch profile encoding and n-note indexing techniques were used, respectively. Kline et al.[11] developed approximate matching algorithms that make better use of both pitch and duration information, which improved results when users have relatively little musical experience or ability. Lu et al.[13] proposed a new melody representation and a hierarchical matching method for a query by humming system. The melody representation is a combination of pitch contour, pitch interval and duration.

Jang et al.[10] presented a new query paradigm that allows users to query by tapping. Melodies are transformed into time vectors that contain the beat information. Hu et al.[8] compared the performance of several retrieval algorithms. The types of query include humming, singing and whistling. In [4], Chen et al. investigated music content representation and retrieval techniques. They proposed the music segment as a music content representation, which consists of both melody and rhythm information.

Several researchers have explored polyphonic content-based music retrieval[6][15][16]. Doraisamy et al.[6] proposed polyphonic music indexing using pitch and rhythm information. In [16], a probabilistic model is proposed for retrieving performances that include a large number of variations in performing a melody and its accompaniment. Pickens et al.[15] proposed a harmonic description that contains the information from all chords and combined it with a Markov method to model music documents and queries.

Although the aim of this work is melody style retrieval rather than melody style classification, several works on music genre classification are related to ours. The work developed at the MIT Media Lab[1] employed hidden Markov models to model and classify melodies, which were represented as sequences of absolute pitches, absolute pitches with duration, intervals, and contours. Another study at CMU used a naïve classifier, a linear classifier and a neural network, respectively, to recognize music style for interactive performance systems[5]. Thirteen statistical features derived from MIDI were identified for learning music style. In [19], music genre classification algorithms aimed at audio signals were explored. The authors proposed features for representing the musical surface and rhythmic structure and classified the music with statistical pattern recognition classifiers.

3. MUSIC STYLE RETRIEVAL MODEL

Before describing the proposed approaches to music style retrieval, we first formalize how music style is modeled.

Definition 1 A music object O is represented as $O = O(M, F, R)$, where

M is the raw music data, for example, a MIDI file;

$F = \{f_i\}$ is a set of low-level music features associated with the music object;

$R = \{r_{ij}\}$ is a set of representations for a given feature $f_i$.

Style usually refers to collections of data. Style is a concept description that generates descriptions for characterization and discrimination. Characterization refers to the common patterns of a given collection while discrimination denotes the comparison among collections. Therefore, the music style involves both the characterization of music patterns for each collection of music object and the discrimination of music features among collections of music objects.

Definition 2 The music style T is modeled as $T = D(C(G(O)))$, where G is the taxonomy of the music objects, C is the characterization function, and D is the discrimination function.

For example, music objects may be classified according to the composer. Folk songs may be classified according to the ethnic group. Western music may be classified according to the eras of the history of Western music, namely the Baroque, the Classical, the Romantic and the Modern era.

In the taxonomy of Western music, a piece shares aspects of style with other pieces written at roughly the same time. In the Baroque era, melodies are ornate and often make use of dramatic leaps. Repetition and simple binary and ternary forms provide the basis for musical structure. Rhythms are often derived from dance rhythms. Harmony is based on major/minor tonality, and dissonances become more common. The music style of the Classical era is reflected in simple texture (homophonic textures became the standard while contrapuntal texture was used sparingly), simple melodies (melodies usually fall into even phrases and were often organized into symmetrical "question and answer" structures) and simple, rational forms (simple two- and three-part forms became the essential building blocks of all Classical forms, especially the Sonata Allegro form). In the Romantic era, melodies are longer, more dramatic and more emotional. Moreover, tempos are more extreme, and harmonies are fuller and more dissonant. In the Modern era, melodies can be long and abstract or reduced to small gestures. Form can be controlled to an almost infinite degree, or it may be the result of improvisation and chance.

Definition 3 The music style retrieval is modeled as S = S(T, O)

where S is the ranking function which measures the similarity between a given music object O and a specific music style T.

4. METHODOLOGY

4.1 Query Specification

The style query can be described in many ways. In our work, we proposed four types of query specification for music style query as follows.

(1) Query-by-music-group (QBMG): The user specifies the query style by selecting a group of music from a set of example music, which is randomly generated by the system. The common style of the selected music group is what the user wishes to retrieve. The collection of these query examples can be regarded as a new, user-defined music style.

(2) Query-by-music-example (QBME): This is similar to query-by-music-group with the exception that only one example is selected. In this way, the user can retrieve the music with style similar to the query example.

(3) Query-by-taxonomic-style (QBTS): An example is to retrieve the music with Baroque style.

(4) Query-by-taxonomic-style-combinations (QBTSC): For instance, to retrieve the music with both Baroque and Romantic styles. In this way, the combination of these styles can be viewed as a new style.

Figure 1 shows the flowchart of our approach for processing these four types of query. The kernel is the feature extraction and feature representation module. Each MIDI file in the music digital library is processed offline by feature extraction and representation, and the resulting representations are stored in the library. The feature extraction and representation modules first process each query issued by the user, whatever its type. For a query of type QBME, the representation of the extracted feature is evaluated against the corresponding representation of each MIDI file in the library, and the ranking list is generated. For a query of type QBMG, QBTS, or QBTSC, the style patterns generated from the query are evaluated against the corresponding representation of each MIDI file in the library, and the ranking list is generated. The style patterns are generated by characterization and discrimination from the music set specified in the query. For QBMG, the music set is the selected group of music. For QBTS and QBTSC, the music set is the music corresponding to the specified taxonomic styles.

4.2 Feature Extraction

Music is usually polyphonic; that is, two or more notes sound simultaneously. Since we focus on melody style, it is necessary to extract melodies from the MIDI files. We have proposed a melody extraction method for this task [12], which considers the instrument, volume and highest-pitch information in the MIDI file. Then, the proposed chord assignment algorithm extracts chords from the melody [12]. The chord assignment algorithm is a heuristic method based on harmony and music theory. Sixty common chords are chosen as the candidates. For each melody, the algorithm first decides the length of the sampling unit used for music segmentation. The chord candidates are scored for each sampling unit, and the candidate with the highest score is assigned to the sampling unit. When the chord with the highest score is not unique, the algorithm assigns a set of chords (a chord-set) to the sampling unit. The output of the chord assignment algorithm is a sequence of chord-sets, in which the chords are represented in Roman numerals such as Ⅰ, Ⅲmaj, Ⅵm7 for key invariance. For a more detailed explanation of the chord assignment algorithm, please refer to [12].

Figure 1. Flowchart of proposed approach.

4.3 Feature Representation

After feature extraction, there are three different representations for the chord feature as follows:


(1) Set of chord-sets: music is represented as a set of items, where each item is a chord-set.

(2) Set of bigrams: music is represented as a set of bi-grams of chord-sets. A bi-gram is an adjacent pair of chord-sets extracted from a sequence of chord-sets. Therefore, a melody with n units consists of (n-1) bi-grams.

(3) Sequence of chord-sets: music is represented as a sequence of chord-sets. In this way, a melody with n units is actually an n-gram.
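The three representations above can be sketched in a few lines of Python. This is only an illustration: the type alias `ChordSet`, the function names, and the sample melody are assumptions made for the sketch and do not come from the system described in this report.

```python
from typing import FrozenSet, List, Set, Tuple

# Assumption: a chord-set is modeled as a frozenset of chord names.
ChordSet = FrozenSet[str]


def set_of_chord_sets(seq: List[ChordSet]) -> Set[ChordSet]:
    """Representation (1): the distinct chord-sets, order ignored."""
    return set(seq)


def set_of_bigrams(seq: List[ChordSet]) -> Set[Tuple[ChordSet, ChordSet]]:
    """Representation (2): the distinct adjacent pairs of chord-sets."""
    return {(seq[i], seq[i + 1]) for i in range(len(seq) - 1)}


def sequence_of_chord_sets(seq: List[ChordSet]) -> Tuple[ChordSet, ...]:
    """Representation (3): the full ordered sequence."""
    return tuple(seq)


# Hypothetical melody with four sampling units.
melody = [
    frozenset({"I"}),
    frozenset({"V", "VIm7"}),
    frozenset({"V"}),
    frozenset({"I", "III", "VIm7"}),
]
print(len(set_of_chord_sets(melody)), len(set_of_bigrams(melody)))
```

A melody with n units yields n-1 adjacent pairs; because the bigram representation is a set, repeated pairs would be counted once.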

4.4 Query Processing

4.4.1 Query-By-Music-Group

As stated in Section 3, music style involves both the characterization and the discrimination of music features. Processing this type of query therefore takes three major steps.

(1) The first step discovers the common characteristics of the selected group and of the unselected group of music examples, respectively.

(2) The second step finds the discrimination between the characteristics of these two groups. The result of this step is a two-way classifier.

(3) Finally, a ranking function measures the degree of relevance between a music object and the query style based on the two-way classifier. Given the ranking function, all the music objects in the library are evaluated, and a ranking list is produced and returned to the user.

Characterization

The first step takes the features of the selected group and of the unselected group as input, respectively. Frequent pattern mining is employed to derive, from music of the same group, the common properties and the interesting hidden relationships between chords and melody styles. Two frequent pattern mining methods are used, depending on the representation of the melody style feature.

If the melody feature is represented as a set of chord-sets or a set of bigrams, the concept of the frequent itemset from association rule mining is used [2]. In the terminology of association rule mining, the support of an itemset is defined as the percentage of transactions that contain this itemset. Given the minimum support specified by the user, an itemset is frequent if its support is larger than the minimum support.

In our approach, the transaction database for the selected group (or the unselected group) consists of the features of the music belonging to that group. Each transaction corresponds to the set of chord-sets of a specific piece of music. In other words, a chord-set corresponds to an item in the terminology of association rule mining. A frequent itemset denotes a set of chord-sets that occur together in the melodies of most music in the selected group. For example, if {{I}, {V, VIm7}, {V}} is a frequent itemset for lyric-style music, then the melodies of a large part of lyric-style music contain the chord-sets {I}, {V, VIm7} and {V} together. The same concept applies to the representation as a set of bigrams; that is, a bigram of chord-sets corresponds to an item.
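As a minimal sketch of the support computation described above, the following Python code counts the transactions that contain an itemset of chord-sets. The function names and the four transactions are hypothetical; the strict comparison follows the statement that an itemset is frequent if its support is larger than the minimum support.

```python
from typing import FrozenSet, List, Set

ChordSet = FrozenSet[str]  # assumption: a chord-set is a frozenset of chord names


def support(itemset: Set[ChordSet], transactions: List[Set[ChordSet]]) -> float:
    """Fraction of transactions that contain every chord-set of the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def is_frequent(itemset: Set[ChordSet], transactions: List[Set[ChordSet]],
                min_support: float) -> bool:
    """An itemset is frequent if its support is larger than the minimum support."""
    return support(itemset, transactions) > min_support


I, V, IV, II = (frozenset({c}) for c in ("I", "V", "IV", "II"))
V_VIm7 = frozenset({"V", "VIm7"})

# Hypothetical transaction database: one set of chord-sets per piece of music.
transactions = [
    {I, V_VIm7, V},
    {I, V_VIm7, V, IV},
    {I, V},
    {II, V_VIm7},
]
pattern = {I, V_VIm7, V}
print(support(pattern, transactions))
```

For the bigram representation, the same functions apply if each item is a pair of chord-sets instead of a single chord-set.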

If the melody style feature is represented as a sequence of chord-sets, we propose a new type of pattern, the frequent consecutive sequential pattern, to find the common characteristics of music of the same group. The concept is modified from that of the sequential pattern [2] in sequence data mining. Unlike the original sequential pattern, a consecutive sequential pattern must be contiguous: it is said to be contained in a transaction if the pattern is a consecutive subsequence of this transaction. For example, the consecutive sequential pattern ({V, VIm7}, {V}, {I, III, VIm7}) is contained in the transaction ({I}, {V, VIm7}, {V}, {I, III, VIm7}), while ({V, VIm7}, {I, III, VIm7}) is not. The support of a consecutive sequential pattern is defined as the percentage of transactions that contain it. Given the minimum support specified by the user, a consecutive sequential pattern is frequent if its support is larger than the minimum support. We modified the join step of the Apriori-based sequential pattern mining algorithm to find frequent consecutive sequential patterns.
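The containment test for consecutive sequential patterns can be illustrated with the example from the paragraph above. The sketch below is an assumption-laden illustration (hypothetical function names, chord-sets modeled as frozensets); it shows only the containment and support checks, not the modified Apriori join step.

```python
from typing import FrozenSet, List, Sequence

ChordSet = FrozenSet[str]  # assumption: a chord-set is a frozenset of chord names


def contains_consecutive(transaction: Sequence[ChordSet],
                         pattern: Sequence[ChordSet]) -> bool:
    """True if the pattern occurs as a contiguous run inside the transaction."""
    n, m = len(transaction), len(pattern)
    return any(tuple(transaction[i:i + m]) == tuple(pattern)
               for i in range(n - m + 1))


def sequence_support(pattern: Sequence[ChordSet],
                     transactions: List[Sequence[ChordSet]]) -> float:
    """Fraction of transactions that contain the pattern consecutively."""
    hits = sum(1 for t in transactions if contains_consecutive(t, pattern))
    return hits / len(transactions)


fs = frozenset
transaction = (fs({"I"}), fs({"V", "VIm7"}), fs({"V"}), fs({"I", "III", "VIm7"}))
contiguous = (fs({"V", "VIm7"}), fs({"V"}), fs({"I", "III", "VIm7"}))
with_gap = (fs({"V", "VIm7"}), fs({"I", "III", "VIm7"}))

print(contains_consecutive(transaction, contiguous))
print(contains_consecutive(transaction, with_gap))
```

The second pattern would be contained under the ordinary sequential pattern definition, which allows gaps, but it is rejected here because its two chord-sets are not adjacent in the transaction.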

Discrimination

The frequent patterns indicate the common properties of the music objects belonging to the same style. However, frequent patterns alone are not enough to discriminate one style from the others. In general, people recognize a music style not only by its own characteristics but also by the differences between this style and others. Discrimination finds the differences among the characteristics of the music groups. The result of discrimination for a taxonomy of music groups is a melody style pattern set, which consists of melody style rules.

Definition 4 A melody style rule r is of the form $l \Rightarrow y$, where y is a music group corresponding to a melody style and l is a characteristic of y, which may be a frequent set of chord-sets, a frequent set of bigrams, or a frequent consecutive sequential pattern.

Definition 5 A melody style pattern set is an ordered set of melody style rules. The format of a melody style pattern set is $\langle r_1, r_2, \ldots, r_n, \text{default\_class} \rangle$, where the melody style rules $r_i$ are ranked by confidence. Given the set of music and the taxonomy, the confidence of a rule $r_i$ is the percentage of the music objects satisfying the characteristic of $r_i$ that belong to the music group $y_i$.

In our work, if the characteristic l is a consecutive sequential pattern, the feature f of a music object satisfies l if l is contained in f. If l is a set of chord-sets or a set of bigrams, f satisfies l if l is a subset of f. The melody style pattern set may be regarded as a classifier learned from the given taxonomy of music objects and the corresponding characteristics, and it can be used to classify music of an unknown group. A music object is classified by the first rule that it satisfies. If it satisfies no rule, it is classified according to the default_class. Figure 2 shows an example of a melody style pattern set.

We proposed a melody style classification algorithm in [12], which uses frequent patterns to differentiate melody styles. In this work, we employ that classification algorithm to generate classification rules and regard the rules as the melody style pattern set. The characteristics in our melody style rules are frequent sets of chord-sets, frequent sets of bigrams, and frequent consecutive sequential patterns. Moreover, some music styles may have many rules of lower support, while other styles have fewer rules of higher support. In other words, the appropriate value of the minimum support differs from style to style. The rules built by our classification algorithm therefore mix multiple types of characteristics, and the minimum support may differ from rule to rule. Our algorithm uses five-fold cross-validation to determine the appropriate minimum supports. For more detail on the music style classification algorithm, refer to [12].

Ranking Function

After the melody style pattern set for the style of the query music group has been generated, the similarity between the music in the digital library and the query style is evaluated in the same way as the music data is classified. As stated in the previous subsection, the melody style rules in the melody style pattern set are ordered by confidence. The confidence indicates the degree to which the characteristic of the rule belongs to the style. Hence, the ranking of a piece of music is decided by the confidence of the first rule that it satisfies.

If the first matched rule for a piece of music in the library does not belong to the style of the selected group, the piece is not a qualified answer. Otherwise, the confidence of this rule is taken as the ranking measure for the piece. In the example of Figure 2, if the sequence of chord-sets of a music object in the library is (Ⅱ7 Ⅴ Ⅲ Ⅱ Ⅴ Ⅰ Ⅶ), it matches the third melody style rule, and the ranking score of this music object with respect to the query group is 0.6.

Set: {Ⅰ, Ⅲ, Ⅳ7} → style 1, conf = 0.9

Bigram: {(Ⅴ Ⅰ), (Ⅴ7 Ⅶ)} → style 2, conf = 0.75

Sequence: (Ⅴ Ⅲ Ⅱ Ⅴ Ⅰ) → style 1, conf = 0.6

Bigram: {(Ⅰ Ⅱ), (Ⅳ7 Ⅴ), (Ⅱ Ⅵ)} → style 1, conf = 0.57

Default_class: style 2, conf = 0.55

Figure 2. An example of melody style pattern set.
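To make the ranking step concrete, the following Python sketch encodes the pattern set of Figure 2 and applies the first-matching-rule policy. It is a simplified illustration, not the system's implementation: each chord-set is written as a single chord name as in the figure, the names `RULES`, `first_match` and `ranking_score` are hypothetical, and treating a piece that falls through to the default class like any other match is an assumption.

```python
from typing import List, Optional, Tuple


def bigrams(seq: List[str]) -> set:
    """Distinct adjacent pairs of the chord sequence."""
    return {(seq[i], seq[i + 1]) for i in range(len(seq) - 1)}


def contains_consecutive(seq: List[str], pattern: Tuple[str, ...]) -> bool:
    """True if the pattern occurs as a contiguous run inside the sequence."""
    m = len(pattern)
    return any(tuple(seq[i:i + m]) == pattern for i in range(len(seq) - m + 1))


# Rules of Figure 2, already ordered by confidence: (kind, pattern, style, conf).
RULES = [
    ("set", {"I", "III", "IV7"}, "style 1", 0.9),
    ("bigram", {("V", "I"), ("V7", "VII")}, "style 2", 0.75),
    ("sequence", ("V", "III", "II", "V", "I"), "style 1", 0.6),
    ("bigram", {("I", "II"), ("IV7", "V"), ("II", "VI")}, "style 1", 0.57),
]
DEFAULT = ("style 2", 0.55)


def satisfies(kind: str, pattern, seq: List[str]) -> bool:
    if kind == "set":
        return pattern <= set(seq)        # the rule's chords all occur
    if kind == "bigram":
        return pattern <= bigrams(seq)    # the rule's bigrams all occur
    if kind == "sequence":
        return contains_consecutive(seq, pattern)
    raise ValueError(f"unknown rule kind: {kind}")


def first_match(seq: List[str]) -> Tuple[str, float]:
    """Style and confidence of the first satisfied rule, else the default class."""
    for kind, pattern, style, conf in RULES:
        if satisfies(kind, pattern, seq):
            return style, conf
    return DEFAULT


def ranking_score(seq: List[str], query_style: str) -> Optional[float]:
    """Confidence of the first matched rule, or None if it names another style."""
    style, conf = first_match(seq)
    return conf if style == query_style else None


music = ["II7", "V", "III", "II", "V", "I", "VII"]
print(first_match(music), ranking_score(music, "style 1"))
```

Sorting the library by `ranking_score` in descending order, after dropping the pieces that return `None`, would give the ranked list for the query style.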

4.4.2 Query-By-Music-Example (QBME)

Query-by-music-example allows users to query for music of a similar style with a single example of music rather than a group of music. For QBME, we match the style of each piece of music in the library against the query music directly. The style matching is based on the similarity between the melody feature of the music in the library and that of the query music. The result of QBME is a list of music ranked by this similarity.

As stated in section 4.2, the extracted chords are used as the feature of melody. Consequently, the melody style matching process becomes the similarity measurement of chord features. We first give the definitions for the feature representation of chord-sets.

Definition 6 Given two chord-sets u and v, the similarity s(u, v) between them is defined as

$s(u, v) = \frac{|u \cap v|}{\sqrt{|u| \times |v|}}$,

where |u| is the cardinality of the set u and ∩ is the set intersection operation.

Definition 7 Given two sets of chord-sets $U = \{u_1, u_2, \ldots, u_M\}$ and $V = \{v_1, v_2, \ldots, v_N\}$, the similarity constraint δ, and the similarities $s(u_i, v_j)$, ∀ i, 1 ≤ i ≤ M, ∀ j, 1 ≤ j ≤ N, a mapping between them is a one-to-one relation $R_{set}$ from {1, 2, …, M} to {1, 2, …, N} such that for each ordered pair (i, j) in $R_{set}$, $s(u_i, v_j) \ge \delta$.

Definition 8 Given two sets of chord-sets $U = \{u_1, u_2, \ldots, u_M\}$ and $V = \{v_1, v_2, \ldots, v_N\}$ and the similarity constraint δ, the similarity between U and V for a mapping $R_{set}$, $S'_{R_{set}}(U, V, \delta)$, is defined as

$S'_{R_{set}}(U, V, \delta) = \frac{\sum_{\forall (i, j) \in R_{set}} s(u_i, v_j)}{\sqrt{M \times N}}$.

Definition 9 Given two sets of chord-sets U and V and the similarity constraint δ, the similarity between U and V, $S_{set}(U, V, \delta)$, is defined as

$S_{set}(U, V, \delta) = \max_{\forall R_{set}} \{ S'_{R_{set}}(U, V, \delta) \}$.

Example 1 Consider the following two sets of chord-sets:

$U = \{u_1, u_2, u_3, u_4\} = \{\{I, V\}, \{IV_m\}, \{I, IV\}, \{II, IV, VI_m\}\}$ and

$V = \{v_1, v_2, v_3\} = \{\{I, IV\}, \{II, V, IV_m\}, \{V_{maj}, IV_m, II\}\}$.

Given the similarity constraint δ = 0.4, the pairs of chord-sets whose similarities are larger than or equal to δ are $(u_1, v_1)$, $(u_1, v_2)$, $(u_2, v_2)$, $(u_2, v_3)$, $(u_3, v_1)$ and $(u_4, v_1)$, and their similarities are $\frac{1}{2}$, $\frac{1}{\sqrt{6}}$, $\frac{1}{\sqrt{3}}$, $\frac{1}{\sqrt{3}}$, $1$ and $\frac{1}{\sqrt{6}}$ respectively. $S_{set}(U, V, \delta) = 0.986$.

To find the similarity defined in Definition 9, we employed the Kuhn-Munkres algorithm (also known as Hungarian method). Given a weighted complete bipartite graph G=(U∪V, U×V), the Kuhn-Munkres algorithm finds a matching from U to V with maximum weight. Such a matching from U to V is called an optimal matching.
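The set similarity of Definitions 6 to 9 can be sketched as follows. Two points are assumptions of this sketch rather than statements from the report: the normalization by $\sqrt{M \times N}$, and the use of a brute-force search over all one-to-one assignments in place of the Kuhn-Munkres algorithm, which is only practical for small inputs such as Example 1. For that reason the unnormalized matching weight is exposed separately. All function names are hypothetical.

```python
from itertools import permutations
from math import sqrt
from typing import FrozenSet, List

ChordSet = FrozenSet[str]


def chord_set_similarity(u: ChordSet, v: ChordSet) -> float:
    """Definition 6: |u ∩ v| / sqrt(|u| * |v|), for non-empty chord-sets."""
    return len(u & v) / sqrt(len(u) * len(v))


def max_matching_weight(U: List[ChordSet], V: List[ChordSet], delta: float) -> float:
    """Largest total similarity over one-to-one mappings whose pairs reach delta.

    Brute force: every injection of the shorter list into the longer one is
    tried, and pairs below delta are left unmatched (they contribute 0).
    """
    small, large = (U, V) if len(U) <= len(V) else (V, U)
    best = 0.0
    for perm in permutations(range(len(large)), len(small)):
        total = 0.0
        for i, j in enumerate(perm):
            s = chord_set_similarity(small[i], large[j])
            if s >= delta:
                total += s
        best = max(best, total)
    return best


def set_similarity(U: List[ChordSet], V: List[ChordSet], delta: float) -> float:
    """Definition 9, assuming the normalization sqrt(M * N)."""
    return max_matching_weight(U, V, delta) / sqrt(len(U) * len(V))


fs = frozenset
U = [fs({"I", "V"}), fs({"IVm"}), fs({"I", "IV"}), fs({"II", "IV", "VIm"})]
V = [fs({"I", "IV"}), fs({"II", "V", "IVm"}), fs({"Vmaj", "IVm", "II"})]
print(max_matching_weight(U, V, 0.4))
```

For larger inputs, a library implementation of the Hungarian method would replace the permutation loop; the pairwise similarities and the threshold handling would stay the same.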

For the representation of bigram set, the definition of similarity is similar to those of the set representation. The only exception lies in the similarity measure between two bigrams.

Definition 10 Given two bigrams x and y, where $x = u_1 \bullet u_2$ and $y = v_1 \bullet v_2$, the similarity s(x, y) between them is defined as

$s(x, y) = \frac{|u_1 \cap v_1|}{\sqrt{|u_1| \times |v_1|}} \times \frac{|u_2 \cap v_2|}{\sqrt{|u_2| \times |v_2|}}$,

where |u| is the cardinality of the set u and ∩ is the set intersection operation.

Example 2 Consider the following two bigrams:

$x = u_1 \bullet u_2 = \{I, V\} \bullet \{IV_m\}$ and $y = v_1 \bullet v_2 = \{I, IV\} \bullet \{II, V, IV_m\}$.

The similarity is $s(x, y) = \frac{1}{2} \times \frac{1}{\sqrt{3}}$.
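The bigram similarity of Example 2 can be checked with a short Python sketch. The function names are hypothetical, and a bigram is modeled here as a pair of frozensets.

```python
from math import sqrt
from typing import FrozenSet, Tuple

ChordSet = FrozenSet[str]
Bigram = Tuple[ChordSet, ChordSet]


def chord_set_similarity(u: ChordSet, v: ChordSet) -> float:
    """|u ∩ v| / sqrt(|u| * |v|), for non-empty chord-sets."""
    return len(u & v) / sqrt(len(u) * len(v))


def bigram_similarity(x: Bigram, y: Bigram) -> float:
    """Product of the similarities of the first and of the second chord-sets."""
    (u1, u2), (v1, v2) = x, y
    return chord_set_similarity(u1, v1) * chord_set_similarity(u2, v2)


x = (frozenset({"I", "V"}), frozenset({"IVm"}))
y = (frozenset({"I", "IV"}), frozenset({"II", "V", "IVm"}))
print(bigram_similarity(x, y))
```

Because the two factors are multiplied, a bigram pair scores zero as soon as either position shares no chord.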

Definition 11 Given two chord-set sequences $A = (a_1, a_2, \ldots, a_M)$ and $B = (b_1, b_2, \ldots, b_N)$, the similarity constraint δ, and the similarities $s(a_i, b_j)$, ∀ i, 1 ≤ i ≤ M, ∀ j, 1 ≤ j ≤ N, a mapping between them is a one-to-one relation $R_{seq}$ from {1, 2, …, M} to {1, 2, …, N} such that

(1) For each ordered pair (i, j) in $R_{seq}$, $s(a_i, b_j) \ge \delta$,

(2) For any two ordered pairs (i, j), (k, l) in $R_{seq}$, [(j - l) = 1] if and only if [(i - k) = 1].

Definition 12 Given two chord-set sequences $A = (a_1, a_2, \ldots, a_M)$ and $B = (b_1, b_2, \ldots, b_N)$ and the similarity constraint δ, the similarity between A and B for a given mapping $R_{seq}$, $S'_{R_{seq}}(A, B, \delta)$, is defined as

$S'_{R_{seq}}(A, B, \delta) = \frac{\sum_{\forall (i, j) \in R_{seq}} s(a_i, b_j)}{\sqrt{M \times N}}$.

Definition 13 Given two chord-set sequences A and B and the similarity constraint δ, the similarity between A and B, $S_{seq}(A, B, \delta)$, is defined as

$S_{seq}(A, B, \delta) = \max_{\forall R_{seq}} \{ S'_{R_{seq}}(A, B, \delta) \}$.

Example 3 Consider the following two sequences of chord-sets:

$A = (a_1, a_2, a_3, a_4) = (\{I, V\}, \{IV_m\}, \{I, IV\}, \{II, IV, VI_m\})$ and

$B = (b_1, b_2, b_3) = (\{I, IV\}, \{II, V, IV_m\}, \{V_{maj}, IV_m, II\})$.

Given the similarity constraint δ = 0.4, the similarity between A and B is

$S_{seq}(A, B, \delta) = \frac{\frac{1}{2} + \frac{1}{\sqrt{3}}}{\sqrt{4 \times 3}}$.

To compute this similarity measure, the algorithm is based on the dynamic programming strategy.
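The report does not give the dynamic programming algorithm itself, so the sketch below deliberately swaps in a brute-force enumeration: it tries every subset of the admissible pairs and keeps those that form a valid mapping under the two conditions of Definition 11. This is exponential and only meant for checking small cases such as Example 3. The function names are hypothetical, and the normalization by $\sqrt{M \times N}$ is an assumption.

```python
from itertools import combinations
from math import sqrt
from typing import FrozenSet, List, Sequence, Tuple

ChordSet = FrozenSet[str]


def chord_set_similarity(u: ChordSet, v: ChordSet) -> float:
    """|u ∩ v| / sqrt(|u| * |v|), for non-empty chord-sets."""
    return len(u & v) / sqrt(len(u) * len(v))


def is_valid_mapping(pairs: List[Tuple[int, int]]) -> bool:
    """One-to-one, and (j - l) = 1 if and only if (i - k) = 1 for all pairs."""
    if len({i for i, _ in pairs}) != len(pairs):
        return False
    if len({j for _, j in pairs}) != len(pairs):
        return False
    return all(((j - l) == 1) == ((i - k) == 1)
               for i, j in pairs for k, l in pairs)


def sequence_matching_weight(A: Sequence[ChordSet], B: Sequence[ChordSet],
                             delta: float) -> float:
    """Largest total similarity over all valid mappings (brute force)."""
    candidates = []
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            s = chord_set_similarity(a, b)
            if s >= delta:  # condition (1) of the mapping definition
                candidates.append((i, j, s))
    best = 0.0
    for r in range(1, len(candidates) + 1):
        for subset in combinations(candidates, r):
            if is_valid_mapping([(i, j) for i, j, _ in subset]):
                best = max(best, sum(s for _, _, s in subset))
    return best


def sequence_similarity(A: Sequence[ChordSet], B: Sequence[ChordSet],
                        delta: float) -> float:
    """Best mapping weight, assuming the normalization sqrt(M * N)."""
    return sequence_matching_weight(A, B, delta) / sqrt(len(A) * len(B))


fs = frozenset
A = (fs({"I", "V"}), fs({"IVm"}), fs({"I", "IV"}), fs({"II", "IV", "VIm"}))
B = (fs({"I", "IV"}), fs({"II", "V", "IVm"}), fs({"Vmaj", "IVm", "II"}))
print(sequence_matching_weight(A, B, 0.4))
```

On Example 3 the best valid mapping pairs the first two chord-sets of A with the first two of B; the single pair with similarity 1 cannot be combined with any other admissible pair without violating the adjacency condition.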

4.4.3 Query-By-Taxonomic-Style (QBTS) and Query-By-Taxonomic-Style-Combination (QBTSC)

QBTS allows users to query music by a taxonomic style predefined by the system. Processing this query requires a preprocessing step that generates the melody style pattern set corresponding to the predefined taxonomy. The music objects in the library are grouped according to this taxonomy. If the taxonomy consists of m styles of music, then there are m groups of music in the library. The melody style pattern set for these m groups of music is generated in the same way as for QBMG. The only difference lies in the number of music groups: in QBMG there are only two groups, the selected group and the unselected group. After the melody style pattern set for QBTS has been generated, ranking is the same as in QBMG.

For a QBTSC query, the melody style pattern set is generated in the same way as for QBTS. Ranking is done by multiplying the ranking scores with respect to the styles specified in the QBTSC query.

5. EXPERIMENTS

We have evaluated the effectiveness of the proposed melody style mining approach; for more detail, please refer to [12] and [17]. In this report, we focus on evaluating the proposed style query specifications and ranking measures. We implemented a music style retrieval system (http://140.113.215.246) to perform the experiments.

The music digital library contains four styles of classical music: Baroque, Classical, Romantic and Modern. Each style contains fifty MIDI files. All MIDI files were gathered from the Internet. The Baroque style includes music of J. S. Bach, Vivaldi and Handel. The Classical style contains music composed by Haydn and Beethoven. The Romantic style includes music of Chopin and Brahms. The Modern style consists of music of Debussy, Ravel, Prokofiev and Saint-Saens. The music of Bach was downloaded from http://www.bachcentral.com. The music of Beethoven and Brahms was downloaded from http://www.midi iofm.net. Chopin’s music was acquired from the web site http://egalvao.com/chopin. The others were accessed from http://www.music-scores.com.

For each file in the library, melody extraction and chord assignment were performed. Figure 3 shows a snapshot of the query by music group, while Figure 4 shows the results returned by the system.

We invited ten users whose backgrounds cover various levels of music training to perform the experiments. One user had learned guitar for several years, three had learned piano for a few years, one is the co-leader of the chorus, one is highly interested in classic music and the others don’t have more music discipline besides the basic music courses in the school.

For each type of proposed query specifications, the users made three rounds of tests respectively. In each round of test, they made the query and gave scores to the music files in the result lists based on their perception of the style similarity between query and results. The users were requested to listen to all music files in the result list to ensure the reliability of the scores. There are seven levels of the score: -5, -3, -1, 0, 1, 3, 5, where the score 5 indicates the highly relevant and -5 indicates the highly non-relevant.

For the QBTS and QBTSC methods, users need to know the characteristics of the Baroque, Classic, Romantic and Modern styles. To give users a rough knowledge of these styles, the system provided a brief introduction and some famous works for each style. Table 1 shows these representative works.

The system generated random music lists from which users selected the query example(s) for QBMG and QBME. The query list contains twenty music files for QBMG and ten for QBME. For QBTS, QBTSC and QBMG, each result list contains twenty music files; for QBME, the system returned ten results for each of the three proposed similarity measures. As stated in the first section, music retrieval by style tries to find music that is similar to the query style. People wish to find something new, not something known. Recall is therefore not an adequate performance measure, and we measure performance only by precision and by the average score given by the users. Precision is defined as

$$\text{precision} = \frac{N_{\text{retrieved\_relevant}}}{N_{\text{retrieved}}},$$

where $N_{\text{retrieved\_relevant}}$ is the number of relevant music files retrieved and $N_{\text{retrieved}}$ is the number of music files retrieved. A music file is relevant if its score is greater than or equal to zero. The average score is defined as

$$\text{average\_score} = \frac{\sum_{i=1}^{N_{\text{retrieved}}} \mathit{Score}_i}{N_{\text{retrieved}}},$$

where $\mathit{Score}_i$ is the score that the user gave to music file $i$.
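Both measures can be computed directly from the scores a user assigns to one result list. The following Python sketch is illustrative only: the function names and the example scores are hypothetical, and treating the plotted curves as values computed over the first k results is an assumption rather than something the report states.

```python
from typing import List, Sequence, Tuple


def precision(scores: Sequence[int]) -> float:
    """Fraction of retrieved files whose user score is >= 0 (the relevance rule)."""
    if not scores:
        raise ValueError("the result list must not be empty")
    relevant = sum(1 for s in scores if s >= 0)
    return relevant / len(scores)


def average_score(scores: Sequence[int]) -> float:
    """Mean of the user scores over all retrieved files."""
    if not scores:
        raise ValueError("the result list must not be empty")
    return sum(scores) / len(scores)


def curves(scores: Sequence[int]) -> List[Tuple[float, float]]:
    """(precision, average score) over the first k results, for k = 1..len(scores).

    Reading the figures as cut-off curves is an assumption of this sketch.
    """
    return [
        (precision(scores[:k]), average_score(scores[:k]))
        for k in range(1, len(scores) + 1)
    ]


# Hypothetical scores one user gave to a five-item result list.
example = [5, 3, -1, 0, 1]
print(precision(example))      # 4 of the 5 scores are >= 0
print(average_score(example))  # (5 + 3 - 1 + 0 + 1) / 5
```

Because a score of zero counts as relevant, precision and average score can diverge: a list of neutral results has perfect precision but an average score of zero.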

We calculate the precision and average score for each round of queries of each user, and then average these values per user. The overall performance of each type of query specification and similarity measure is the average of all users' average precisions and average scores. Figure 5 shows the average precision and average score curves for the three similarity measures of QBME. Both curves decline gradually. The average precisions range between 0.63 and 1, and the average scores range between 0.62 and 4.73. There are no significant differences among the set, bigram and sequence similarity measures, but in most cases the bigram similarity performs better. In the following experimental results, we therefore use the bigram similarity measure for QBME.

Table 1. Representative works for each style.

| Style | Music title | Composer |
|---|---|---|
| Baroque | Cantata No. 147: Jesu, Joy of Man's Desiring | J. S. Bach |
| Baroque | Invention in A minor, BWV 784 | J. S. Bach |
| Baroque | Invention in C major, BWV 772 | J. S. Bach |
| Baroque | Messiah No. 7 Chorus: And he shall purify | Handel |
| Baroque | The Four Seasons: "Autumn" (Allegro) | Vivaldi |
| Classic | Trumpet Concerto in Eb, 3rd movement | Haydn |
| Classic | Bagatelle No. 3, Op. 33 | Beethoven |
| Classic | Ruins of Athens Overture, Op. 113 | Beethoven |
| Classic | Moonlight Sonata Op. 27 No. 2, 1st | Beethoven |
| Classic | Fur Elise | Beethoven |
| Romantic | Mazurka in Bm, Op. 33 No. 4 | Chopin |
| Romantic | Mazurka in F#m, Op. 59 No. 3 | Chopin |
| Romantic | Mazurka in Bb, Op. 7 No. 1 | Chopin |
| Romantic | Etude in E, Op. 10 No. 3 | Chopin |
| Romantic | Hungarian Dance No. 5 | Brahms |
| Modern | Golliwogg's Cake-walk | Debussy |
| Modern | Doctor Gradus ad Parnassum | Debussy |
| Modern | Serenade for the doll | Debussy |
| Modern | Bolero | Ravel |
| Modern | Carnival of the Animals: Elephant | Saint-Saens |

The precision and average score curves of the four types of query specification are shown in Figure 6. The precision ranges between 0.86 and 0.91 for QBTSC and QBTS, between 0.71 and 0.83 for QBMG, and between 0.66 and 1 for QBME. The average score ranges between 2.26 and 3.27 for QBTSC and QBTS, between 0.82 and 2.24 for QBMG, and between 0.62 and 4.64 for QBME. The precision curves of QBTSC and QBTS are flat, while those of QBMG and QBME decline gradually. The average scores of all query specification types trend downwards. The results show that QBTSC and QBTS perform better than QBMG and QBME, and that QBTS has higher average scores than QBTSC.

For QBTSC and QBTS, the query is one taxonomic style or a combination of taxonomic styles, whereas the query of QBME and QBMG is one or several music files. The scope of the query style is therefore larger for QBTSC and QBTS than for QBME and QBMG, and the slopes of the precision curves reflect this difference. More music files correspond to the query style of QBTSC and QBTS, so the precision stays high. In contrast, the query style of QBMG is more specific and its precision curve is steeper; the query of QBME contains only one music file, so its curve is the steepest. Furthermore, users may judge more strictly when the query is more specific.

Figure 3. Snapshot of query-by-music-group.

Figure 4. Snapshot of query result.

For further analysis, we divide the users into two groups according to their music background. Group 1 includes the six users with more music training, and group 2 includes the other four users, who have only the basic music education given in school. Figure 7 shows the average precisions of group 1 and group 2. For group 1, QBTSC performs better than the other types of query specification, while QBTS performs better for group 2. In our observation, the users in group 2 have less knowledge of the taxonomic styles, so it is harder for them to identify a music style that combines multiple taxonomic styles, whereas they found it easier to identify a single taxonomic style. This explains the difference between the two groups in the precision curves of QBTSC and QBTS. For QBMG and QBME, there is no significant difference between the groups.

6. Evaluation of Project

In this project, we have proposed an approach for melody style retrieval. We proposed four types of query specification for melody style queries and presented the query processing for each type. Query processing involves feature extraction, feature representation, melody style pattern generation and ranking. Melody style pattern generation integrates characterization and discrimination. The performance measured by precision indicates that the test users were satisfied with the results returned by the system.

Our work provides a new and effective way to retrieve music in terms of music style rather than the syntactic features of music. Future research could provide other query methods, such as query by selecting multiple styles and query by style example music, and define the corresponding similarity measures.

Our work has been published in the ACM/IEEE Joint Conference on Digital Libraries (JCDL) and the IEEE International Conference on Data Mining (ICDM); the acceptance rates of both conferences are about 30%. Moreover, the master's student who joined the project won the Acer Excellent Thesis Award.

REFERENCES

[1] Agrawal, R. and Srikant, R. Fast Algorithms for Mining Association Rules. In Proceedings of the International Conference on Very Large Data Bases, 1994.

[2] Agrawal, R. and Srikant, R. Mining Sequential Patterns. In Proceedings of the International Conference on Data Engineering, 1995.

[3] Chai, W. and Vercoe, B. Folk Music Classification Using Hidden Markov Models. In Proceedings of the International Conference on Artificial Intelligence, 2001.

[4] Chen, A. L. P., Chang, M., Chen, J., Hsu, J. L., Hsu, C. H. and Hua, S. Y. S. Query by Music Segments: An Efficient Approach for Song Retrieval. In Proceedings of the IEEE International Conference on Multimedia and Expo, 2000.

[5] Dannenberg, R. B., Thom, B. and Watson, D. A Machine Learning Approach to Musical Style Recognition. In Proceedings of the International Computer Music Conference, 1997.

[6] Doraisamy, S. and Ruger, S. M. An Approach Towards a Polyphonic Music Retrieval System. In Proceedings of the International Symposium on Music Information Retrieval, 2001.

[7] Ghias, A., Logan, J., Chamberlin, D. and Smith, B. C. Query by Humming: Musical Information Retrieval in an Audio Database. In Proceedings of the ACM International Multimedia Conference, 1995.

[8] Hu, N. and Dannenberg, R. B. A Comparison of Melodic Database Retrieval Techniques Using Sung Queries. In Proceedings of the ACM Joint Conference on Digital Libraries, 2002.

[9] Huron, D. and Aarden, B. Cognitive Issues and Approaches in Music Information Retrieval. In Music Information Retrieval, edited by S. Downie and D. Byrd, 2002.

[10] Jang, J. S. R., Lee, H. R. and Yeh, C. H. Query by Tapping: A New Paradigm for Content-Based Music Retrieval from Acoustic Input. In Proceedings of the IEEE Pacific-Rim Conference on Multimedia, 2001.

[11] Kline, R. L. and Glinert, E. P. Approximate Matching Algorithms for Music Information Retrieval Using Vocal Input. In Proceedings of the ACM International Multimedia Conference, 2003.

[12] Kuo, F. F. and Shan, M. K. A Personalized Music Filtering System Based on Melody Style Classification. In Proceedings of the IEEE International Conference on Data Mining, 2002.

[13] Lu, L., You, H. and Zhang, H. J. A New Approach to Query by Humming in Music Retrieval. In Proceedings of the IEEE International Conference on Multimedia and Expo, 2001.

[14] McNab, R. J., Smith, L. A., Witten, I. H., Henderson, C. L. and Cunningham, S. J. Towards the Digital Music Library: Tune Retrieval from Acoustic Input. In Proceedings of the ACM International Conference on Digital Libraries, 1996.

[15] Pickens, J. and Crawford, T. Harmonic Models for Polyphonic Music Retrieval. In Proceedings of the ACM International Conference on Information and Knowledge Management, 2002.

[16] Shalev-Shwartz, S., Dubnov, S., Friedman, N. and Singer, Y. Robust Temporal and Spectral Modeling for Query by Melody. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, 2002.

[17] Shan, M. K. and Kuo, F. F. Music Style Mining and Classification by Melody. IEICE Transactions on Information and Systems, E86-D(4), 2003.

[18] Tseng, Y. H. Content-Based Retrieval for Music Collections. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, 1999.

[19] Tzanetakis, G., Essl, G. and Cook, P. Automatic Musical Genre Classification of Audio Signals. In Proceedings of the International Symposium on Music Information Retrieval, 2001.


[Plots: precision and average score versus number of retrieved music for QBME-set, QBME-bigram and QBME-seq.]

Figure 5. Average precision and score curves of QBME.

[Plots: precision and average score versus number of retrieved music for QBMG, QBME-bigram, QBTSC and QBTS.]

Figure 6. Average precision and score curves of all users.

[Plots: two panels of precision versus number of retrieved music for QBMG, QBME-bigram, QBTSC and QBTS.]


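Data Sheet of Research Results Available for Promotion

□ Patentable

⌧ Available for technology transfer

Date: year / month / day

NSC-funded project

Project title: Music Style Mining Techniques and Applications (音樂曲風的探勘技術與應用)

Principal investigator: Man-Kwan Shan (沈錳坤)

Project number: NSC92-2213-E-004-006

Field: Information science

Technology / work title

1. Personalized music recommendation technique

2. Method for retrieving music by style

Inventors / creators

Man-Kwan Shan (沈錳坤), Fang-Fei Kuo (郭芳菲)

Technical description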

We invented a new approach for content-based music retrieval. Traditional content-based music retrieval provides users the capability to look for known music. Our approach allows users to retrieve unknown music by melody style. We proposed four types of query specification for melody style queries: query-by-music-group (QBMG), query-by-music-example (QBME), query-by-taxonomic-style (QBTS) and query-by-taxonomic-style-combinations (QBTSC). The output of a melody style query is a music list ranked by the degree of relevance, in terms of music style, to the query.

The kernel of query processing is the feature extraction and feature representation module. Each MIDI file in the music digital library is processed offline by feature extraction and representation, and the resulting representations are stored in the library. Each of the four types of query issued by the user is first processed by the same feature extraction and representation modules. For a QBME query, the representation of the extracted feature is then evaluated against the corresponding representation of each MIDI file in the library, and the ranking list is generated. For a QBMG, QBTSC or QBTS query, the style patterns generated from the query are evaluated against the corresponding representation of each MIDI file in the library, and the ranking list is generated. The style patterns are generated by characterization and discrimination from the music set specified in the query. For QBMG, the music set is the selected group of music. For QBTS and QBTSC, the music set is the music corresponding to the specified taxonomy of music. The style patterns are generated by the music style mining algorithm, which uses a modified associative classification algorithm. A set of ranking functions is used to evaluate the relevance of the music in the database to the query.
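The QBME path described above (compare the query representation with each stored representation, then rank) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the `LibraryEntry` type, the chord-sequence format and the use of Jaccard overlap of chord bigrams as the similarity measure are all hypothetical stand-ins, since the exact ranking functions are not given here. Queries of type QBMG, QBTS and QBTSC would instead rank by mined style patterns, which this sketch does not cover.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Bigram = Tuple[str, str]


@dataclass
class LibraryEntry:
    """One library item with its offline-computed representation (hypothetical format)."""
    title: str
    chords: List[str]  # chord sequence assigned to the extracted melody


def chord_bigrams(chords: List[str]) -> Set[Bigram]:
    """Set of adjacent chord pairs in a chord sequence."""
    return set(zip(chords, chords[1:]))


def bigram_similarity(a: List[str], b: List[str]) -> float:
    """Jaccard overlap of chord bigrams; an assumed stand-in for the real measure."""
    x, y = chord_bigrams(a), chord_bigrams(b)
    if not x and not y:
        return 0.0
    return len(x & y) / len(x | y)


def rank_qbme(query_chords: List[str], library: List[LibraryEntry],
              top_k: int = 10) -> List[str]:
    """Titles of the top_k entries most similar to the query example."""
    scored = [(bigram_similarity(query_chords, e.chords), e.title) for e in library]
    # Highest similarity first; ties are broken by title for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_k]]


# Toy library with made-up chord sequences.
library = [
    LibraryEntry("A", ["C", "G", "Am", "F"]),
    LibraryEntry("B", ["C", "G", "C", "G"]),
    LibraryEntry("C", ["Dm", "E7", "Am"]),
]
print(rank_qbme(["C", "G", "Am"], library))
```

Keeping the representation step (`chord_bigrams`) separate from the ranking step mirrors the offline and online split in the description: library representations can be computed once and reused for every query.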

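Applicable industries and potential products

The music style query technique we propose can be applied to the digital content industry and to personalized wireless content services. From the server side, a music website can proactively recommend newly released music according to the user's preferences. At the same time, the website learns the user's style preferences from the user's listening behavior. Style queries can also be applied to music search in mobile ringtone or music download systems and to soundtrack production for films. In addition, couples or restaurants may search for romantic, lyrical music for Valentine's Day; program producers may search for tense, exciting music for background music; and counselors may search for relaxing music for psychological workshops.

Technical features

The style query system differs from existing music query systems. Existing systems let users search for music they have heard before. The style query system we propose lets users search for music by style, so the music a user searches for is not necessarily music the user has heard.

Value for promotion and application

This system can be widely applied in the digital content industry to music production, soundtrack scoring, music websites and music search engines.

※ 1. Please complete each research result in duplicate: one copy is submitted to the Council with the final report, and one copy is sent to your institution's unit for promoting research results (such as the technology transfer center).

※ 2. If a patent has not yet been applied for, please do not disclose the main patentable content.

※ 3. If this form is insufficient, please photocopy it for use.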


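ACM/IEEE Joint Conference on Digital Libraries 2004

Conference Report

Fang-Fei Kuo (郭芳菲)

Ph.D. program, Department of Computer Science and Information Engineering, National Chiao Tung University

1. Conference Date and Location

Tucson, Arizona, USA, June 7 to June 11, 2004.

2. Conference Overview and Attendance

The Joint Conference on Digital Libraries (JCDL) was formed in 2001, when the ACM and the IEEE merged the two major conferences on digital libraries, the ACM Digital Libraries Conferences and the IEEE-CS Advances in Digital Libraries Conferences. JCDL is a highly representative international academic conference in the field of digital library research.

The papers presented at this year's conference comprised 34 full papers, 27 short papers, 35 posters and 16 demonstrations. The acceptance rate was 29.8% for full papers and 30% for short papers. The first day of the conference was devoted to tutorials, and the formal program began on the second day. Our full paper, “Looking for New, Not Known Music Only: Music Retrieval by Melody Style”, was scheduled for Session 8A: Indexing Music and Chinese Text, on the afternoon of the third day. Scholars from Taiwan who presented papers included Professor 蘇豐文 of National Tsing Hua University, Professor 劉吉軒 of National Chengchi University and Dr. 簡立峰 of Academia Sinica.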

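The formal program opened with a keynote speech by Vint Cerf, senior vice president of technology strategy at MCI. Cerf was responsible for the original design of the TCP/IP protocols and the architecture of the Internet, and is widely recognized as a father of the Internet. His talk mentioned several interesting trends; for example, both now and in the future, Asia will be the region with the most Internet users in the world. Cerf also introduced RFID technology and some of the benefits it will bring.

The sessions for the rest of that day covered Repository Architectures; Evaluation; Geographic Aspects of Digital Libraries; Books and Reading; The Virtual and the Real: Current Research on Museum Audiences and Library Users; Translating Unknown Cross-Lingual Queries in Digital Libraries Using a Web-based Approach; Surrogates for Physical Artifacts; Crawling the Web; and Automated Techniques for Managing Collections. Among these, The Virtual and the Real: Current Research on Museum Audiences and Library Users was a panel discussion. The panel discussed several interesting questions, for example whether online resources make people visit traditional museums more often or less often, and whether online users really want digital libraries or "virtual experiences".

From 6:30 to 9:00 on the evening of the second day, the poster session, the demonstration session and the welcome reception were held at the same time, with many interesting posters and demos on display. Several senior fellow students from National Tsing Hua University took part in the poster session; we exchanged many views on research and impressions of the conference, which I found very rewarding. The session ended with the presentation of the best paper award.

The sessions on the third day included Educational Aspects of Digital Libraries; Image and Video Digital Libraries; Collaboration and Group Work; Indexing Music and Chinese Text; Demonstrating Education Impact: Challenges in the Years Ahead; Digital Preservation; Mining and Disambiguating Names; Library Leaders on Digital Libraries and the Future of the Research Library: A Panel Discussion; and Interacting with Collections. Our paper was presented in the Indexing Music and Chinese Text session, chaired by Professor Bob Allen of the University of Maryland. Many researchers working on music attended this session, so the questions and discussion were lively.

Paper sessions ran until the morning of the fourth day and included Search and Query Strategies, Supporting Personalization, and Interchange and Interoperability. The conference closed after the final panel discussion, “10 Years Hence and 10 Before”, which mainly discussed digital libraries over the previous ten years and their development over the next ten years.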

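3. Reflections

Attending this JCDL conference gave me much valuable experience, both in preparing the presentation beforehand and in taking part in the conference itself. Besides researchers from computer science, a large share of JCDL participants come from the digital libraries field, and from the questions and presentations at the conference I could see the different ways of thinking of researchers from different fields. Presenting a paper at the conference was also a very good training opportunity: it exercises presentation skills and strengthens one's English at the same time.

Two other music-related papers were presented in my session; their titles were “Discovery of [...]” and “[...] retrieval”. The first paper addresses the problem of finding retrograde or inverted musical themes in music. Many earlier studies used data mining methods to find musical themes, but most of them considered only repeated themes. Musical themes, however, vary in many ways beyond repetition, retrograde and inversion, and how to find these other kinds of themes is an issue worth studying. The second paper studies how to speed up music retrieval. Current research on music retrieval mainly aims to improve retrieval effectiveness and pays less attention to retrieval speed. In practical applications, however, retrieval speed is an important issue, because users want query results immediately. I believe that, besides improving the effectiveness of different music retrieval methods, the speed of different retrieval methods is also an important research direction.

During this presentation, I was pleased to find that my ability to present in English had clearly improved compared with my previous international conferences, and I also found that there is still much I need to strengthen. The questions other scholars asked me will also be very helpful for my future research. I hope to have more opportunities to attend international academic conferences, to improve my abilities and to broaden my horizons.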

4. Titles and Contents of Materials Brought Back


Looking for New, Not Known Music Only:

Music Retrieval by Melody Style

Fang-Fei Kuo

Dept. of Computer Science and Information Engineering, National Chiao Tung University

HsinChu, Taiwan, ROC

Man-Kwan Shan

Dept. of Computer Science, National Cheng Chi University

Taipei, Taiwan, ROC

ABSTRACT

With the growth of digital music, content-based music retrieval (CBMR) has attracted increasing attention. For most CBMR systems, the task is to return music objects similar to the query in syntactic properties such as pitch and interval contour sequence. These approaches give users the capability to look for music that they have already heard. However, listeners are sometimes looking not for music they already know, but for music that is new to them. Moreover, people sometimes want to retrieve music that “feels like” another music object or a music style. To the best of our knowledge, no published work investigates content-based music style retrieval. This paper describes an approach for CBMR by melody style. We propose four types of query specification for melody style queries. The output of a melody style query is a music list ranked by the degree of relevance, in terms of music style, to the query. We developed a melody style mining algorithm to obtain melody style classification rules, and the style ranking is determined by these rules. The experiment showed that the proposed approach provides a satisfactory way to query by melody style.

Categories and Subject Descriptors

H.2.8 [Database Management]: Database Applications – Data mining; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – Query formulation, retrieval models, search process; H.3.7 [Information Storage and Retrieval]: Digital Libraries – Systems issues; H.5.5 [Information Interfaces and Presentation]: Sound and Music Computing – Methodologies and techniques; J.5 [Computer Applications]: Arts and Humanities – Performing arts.

General Terms

Algorithms, Design, Experimentation, Human Factors.

Keywords

Content-Based Music Retrieval, Music Style Mining, Query by Melody Style, Music Classification.

1. INTRODUCTION

Music information retrieval (MIR) has become an increasingly important field of research in recent years. In traditional MIR systems, the query is based on text-based metadata. Content-based music retrieval (CBMR) allows users to query by music content instead of metadata.

Much work has been done on the development of CBMR. Query by humming or singing [7][8][11][13][14][18] is a common approach for retrieval from acoustic input. The queries are melodies hummed or sung by the user and are transcribed into symbolic MIDI format. Query by tapping is another query method, which uses beat information for retrieval [10]. Recently, several researchers have explored polyphonic content-based music retrieval [6][15][16]. Polyphonic music retrieval techniques are more suitable than monophonic techniques for retrieving performance data and for queries with polyphonic input.

The main goal of previous CBMR research is to return music objects that are similar to the query in pitch, interval contour or rhythm. These approaches give users the capability to look for music that they have already heard. However, listeners are sometimes looking not for music they already know, but for music that is new to them. Moreover, people sometimes want to retrieve music that “feels like” another music object or a style.

Query by humming, singing or tapping is of no help when looking for new music that we have not heard before. It is therefore necessary to develop techniques for querying music by melody style. Music style reflects the human perception of music and is a feature that people often use to classify music. Text-based metadata that records a textual description of music style could be used for melody style queries, but it must be annotated manually. Furthermore, users may sometimes wish to query a mixed style. For example, a user may want to retrieve music that sounds mainly like Chopin with a little Bach. The returned music objects should be more similar to the Chopin style but also carry a little of the feeling of the Bach style.

The purpose of our research was to investigate techniques for content-based music retrieval by melody style. Our work addresses four issues:

(1) To develop methods for the specification of the query style.

(2) To determine the appropriate feature for music style and its representation.

(3) To discover the description of melody style.

(4) To measure the degree of relevance between a music object and the query style.

For the first issue, we present four types of query specification for the query style. For the second issue, note that the basic elements of music include melody, harmony, rhythm and so on, and that melody is the most memorable aspect of music. Accordingly, we concentrated on melody style and used chords as the melody feature for retrieval by music style. For the third issue, we developed an algorithm that discovers the common characteristics of music of the same style and finds the patterns that discriminate between music of different styles. The melody styles are described by the discovered set of style rules. For the last issue, the discovered set of style rules is used to rank the music objects.

Our work is useful in many applications: for example, helping a physiotherapist find music that will motivate a patient, helping a film director find music that conveys a certain mood [9], or helping a restaurateur find music that targets a certain clientele. Query by melody style gives users the capability to find music whose style is similar to what they like.

This paper is organized as follows. Section 2 gives a brief review of previous work related to content-based music retrieval and music style discovery. Section 3 presents the music style retrieval model. Section 4 describes our proposed methodology. Section 5 describes the experiments and the performance analysis. Section 6 concludes the paper.

2. RELATED WORK

Much research has been done on the development of content-based music retrieval technology. Query by humming or singing is a common approach for query by acoustic input [7][8][11][13][14][18]. Ghias et al. [7] introduced a query by humming system. The query input was converted into a melodic contour, and the contour was matched against the music in the library by approximate string matching. McNab et al. [14] presented a CBMR system that accepted sung or hummed queries. They investigated people's singing accuracy and suggested that music transcription should adapt to the user's tuning. In Tseng's research [18], key melody extraction is used for query suggestion and effective retrieval, where the key melodies are representative fragments of music. Pitch profile encoding was used to allow queries in any key, and n-note indexing was used to allow approximate matching. Kline et al. [11] developed approximate matching algorithms that make better use of both pitch and duration information, which improved results for users with relatively little music experience or ability. Lu et al. [13] proposed a new melody representation and a hierarchical matching method for query by humming systems. The melody representation is a combination of pitch contour, pitch interval and duration. Jang et al. [10] presented a new query paradigm that allows users to query by tapping. Melodies are transformed into time vectors that contain the beat information. Hu et al. [8] compared the performance of several retrieval algorithms; the types of query include humming, singing and whistling. Chen et al. [4] investigated music content representation and retrieval techniques. They proposed the music segment as a music content representation, which consists of both melody and rhythm information.

Several researchers have explored polyphonic content-based music retrieval [6][15][16]. Doraisamy et al. [6] proposed polyphonic music indexing using pitch and rhythm information. In [16], a probabilistic model is proposed for retrieving performances that include a large number of variations in performing a melody and accompaniment. Pickens et al. [15] proposed a harmonic description that contains the information from all chords, and combined it with a Markov method to model music documents and queries.

Although the aim of this work is melody style retrieval rather than melody style classification, several works on music genre classification are related to ours and are described as follows. Work at the MIT Media Lab [3] employed hidden Markov models to model and classify melodies, which were represented as sequences of absolute pitches, absolute pitches with duration, intervals and contours. Research at CMU used a naïve classifier, a linear classifier and a neural network to recognize music style for interactive performance systems [5]. Thirteen statistical features derived from MIDI were identified for learning music style. In [19], music genre classification algorithms for audio signals were explored. The authors proposed features for representing the musical surface and rhythmic structure, and classified them with statistical pattern recognition classifiers.

3. MUSIC STYLE RETRIEVAL MODEL

Before describing the proposed approaches to music style retrieval, we first formalize how music style is modeled.

Definition 1. A music object O is represented as O = O(M, F, R), where

M is the raw music data, for example, a MIDI file;

F = {f_i} is a set of low-level music features associated with the music object;

R = {r_ij} is a set of representations for a given feature f_i.

Style usually refers to collections of data. Style is a concept description that generates descriptions for characterization and discrimination. Characterization refers to the common patterns of a given collection, while discrimination denotes the comparison among collections. Therefore, music style involves both the characterization of music patterns for each collection of music objects and the discrimination of music features among collections of music objects.

Definition 2. The music style T is modeled as T = D(C(G(O))),

where G is the taxonomy of the music objects, C is the characterization function and D is the discrimination function. For example, music objects may be classified according to the composer. Folk songs may be classified according to the people they come from. Western music may be classified according to the eras of the history of Western music, namely the Baroque, the Classical, the Romantic and the Modern era.
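The composition T = D(C(G(O))) can be made concrete with a small sketch. Everything below is a hypothetical illustration of the two definitions, not the paper's mining algorithm: the `MusicObject` fields, the use of chord bigrams as patterns, the one-representation-per-feature simplification, the `min_support` threshold, and the rule "keep patterns that no other group shares" are all assumptions chosen for brevity.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Pattern = Tuple[str, str]  # a chord bigram, used here as a stand-in pattern


@dataclass
class MusicObject:
    raw: str                                   # M: raw music data, e.g. a MIDI file path
    features: Dict[str, List[str]] = field(default_factory=dict)  # F, one representation each (simplified R)
    label: str = ""                            # taxonomy label, e.g. era or composer


def group(objects: List[MusicObject]) -> Dict[str, List[MusicObject]]:
    """G: partition the music objects by their taxonomy label."""
    groups: Dict[str, List[MusicObject]] = defaultdict(list)
    for obj in objects:
        groups[obj.label].append(obj)
    return dict(groups)


def characterize(groups: Dict[str, List[MusicObject]],
                 min_support: float = 0.5) -> Dict[str, Set[Pattern]]:
    """C: patterns occurring in at least min_support of a group's objects."""
    common: Dict[str, Set[Pattern]] = {}
    for label, members in groups.items():
        counts: Counter = Counter()
        for obj in members:
            chords = obj.features.get("chord", [])
            counts.update(set(zip(chords, chords[1:])))  # count each pattern once per object
        common[label] = {p for p, c in counts.items() if c / len(members) >= min_support}
    return common


def discriminate(common: Dict[str, Set[Pattern]]) -> Dict[str, Set[Pattern]]:
    """D: keep only the patterns that no other group has in common."""
    styles: Dict[str, Set[Pattern]] = {}
    for label, patterns in common.items():
        others: Set[Pattern] = set()
        for other_label, other_patterns in common.items():
            if other_label != label:
                others |= other_patterns
        styles[label] = patterns - others
    return styles


def music_style(objects: List[MusicObject], min_support: float = 0.5) -> Dict[str, Set[Pattern]]:
    """T = D(C(G(O))) under the assumptions stated above."""
    return discriminate(characterize(group(objects), min_support))
```

A short usage example with made-up chord sequences: two "Baroque" objects with chords `C G C` and `G C G`, and two "Romantic" objects with chords `C G Dm E7` and `Dm E7 C G`, share the bigram (C, G) across both groups, so with `min_support=1.0` it is removed by the discrimination step and each group keeps only the patterns specific to it.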

For the taxonomy of Western music, the music shares aspects of style with other pieces written at roughly the same time. In the Baroque era, melodies are ornate and often make use of dramatic leaps. Repetition and simple binary and ternary forms provide the basis for musical structure. Rhythms are often derived from dance
