Music Object Modeling - Data Mining and Intelligent Retrieval Techniques for Digital Music Archives (II)

A good music representation should help the system capture a user's semantic concept in terms of low-level music features. A music object can be characterized by multiple features such as tempo, rhythm, and melody, and each feature can be expressed by a set of representations. For example, the average pitch difference and the pitch standard deviation can serve as representations of the melody feature. In the representation space, a semantic concept can be characterized as the subset of representations that discriminates it from other concepts. For instance, an inspiring piece of music whose melody rises and falls sharply can be described by the average pitch difference.

To understand a user's concept, both global features, which describe an entire music object, and local features, which describe each music segment, should be considered. A music object is composed of a set of music segments. It can therefore be described globally by a single set of representations in the feature space, or locally by multiple sets of representations, each corresponding to one music segment.

We propose a segment-based music modeling technique that represents a music object at the segment level. The modeling approach consists of three steps. In the first step, each music object is represented as a set of segments found by the motivic repeating pattern finding algorithm. In the second step, multiple feature representations are extracted from each music segment; global feature representations are also extracted from the entire music object to represent it as a whole. The last step filters significant motivic patterns based on pattern frequency.

The rest of the music modeling approach is organized as follows. Section 3.1 describes the technique for finding motivic repeating patterns, Section 3.2 introduces feature extraction, and Section 3.3 explains how significant motivic repeating patterns are filtered.

3.1 Motivic Repeating Pattern Finding

In music, a motive is a salient recurring fragment of notes that may be used to construct all or part of complete melodies and themes. Therefore, each music object can be described by a set of motives. A motive does not necessarily recur as an exact repetition; it may recur with some variation. This is called motivic treatment in musicology [26].

Six common motivic treatments are considered in our work (Figure 3.2): (a) repetition (exact repeat), (b) transpose (interval repeat), (c) sequence, (d) contrary motion, (e) retrograde, and (f) augmentation/diminution repetition.

We first apply the all-mono method to extract the main melody. The extracted main melody is represented as a note sequence in which each note is expressed by its pitch and duration.
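As an illustration only, the sketch below assumes that the all-mono method keeps, at every onset time, the highest-pitched note across all tracks (a skyline-style reduction). The input format `(onset, pitch, duration)` and the function name `all_mono` are hypothetical, and truncating each note at the next melody onset is our own simplification.

```python
def all_mono(notes):
    """Reduce polyphonic notes to a monophonic melody (sketch).

    notes: iterable of (onset, pitch, duration) tuples from all tracks.
    Returns a list of (pitch, duration) tuples ordered by onset.
    """
    highest = {}  # onset -> (pitch, duration) of the highest note seen so far
    for onset, pitch, duration in notes:
        if onset not in highest or pitch > highest[onset][0]:
            highest[onset] = (pitch, duration)

    onsets = sorted(highest)
    melody = []
    for k, onset in enumerate(onsets):
        pitch, duration = highest[onset]
        if k + 1 < len(onsets):
            # Cut the note short so it does not overlap the next melody note.
            duration = min(duration, onsets[k + 1] - onset)
        melody.append((pitch, duration))
    return melody


notes = [(0, 60, 1), (0, 67, 1), (1, 64, 2), (2, 62, 1), (2, 55, 1)]
print(all_mono(notes))  # [(67, 1), (64, 1), (62, 1)]
```

The resulting `(pitch, duration)` list is the kind of note sequence the pattern-finding steps below operate on.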

Then, we modified the correlative matrix method [12], originally designed for finding exact repeating patterns, to discover the six variations of motivic repetition [9]. Finally, a minimum length constraint is applied so that only motivic patterns of more than four notes are retained.

The correlative matrix method discovers repeating patterns in a given note sequence. It consists of the following three steps:

1) Construct Correlative Matrix:

The correlative matrix is a data structure built from the given note sequence: if the note sequence has length n, the matrix has size n × n.

The purpose of the first step is to fill the matrix row by row. Let Mij denote the cell in the i^th row and j^th column. If the i^th note and the j^th note of the sequence are the same, Mij is set to v+1, where v is the value of the cell Mi-1,j-1 (taken as 0 when that cell is empty); otherwise Mij is left empty. In particular, a match whose diagonal predecessor is empty gets the value one. The value in a cell thus indicates the length of a potential repetition ending at the i^th and j^th notes.

After the construction step, the matrix will keep all of the intermediate results of substring matching.
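The construction step can be sketched in Python as follows. This is a minimal illustration, not the authors' code: indices are 0-based (so the text's M26 is `m[1][5]`), 0 stands for an empty cell, and only cells with j > i are filled, an assumption consistent with the worked example in this section.

```python
def correlative_matrix(seq):
    """Build the correlative matrix of a note sequence (0 marks an empty cell).

    Only cells above the diagonal (j > i) are filled, matching the worked
    example in which row 1 records the repeats of note 1 at positions 4, 5, 8.
    """
    n = len(seq)
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if seq[i] == seq[j]:
                # Extend the match that ends one note earlier in both copies.
                v = m[i - 1][j - 1] if i > 0 else 0
                m[i][j] = v + 1
    return m


m = correlative_matrix("CAACCAACD")
print(m[1][5], m[2][6], m[3][7])  # M26, M37, M48 (1-based) -> 2 3 4
```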

2) Find Candidate Set:

For each non-empty cell, the corresponding pattern is regarded as a candidate, that is, a potential repeating pattern. As each candidate is found, its associated information (pattern, rep_count, sub_count) is computed: pattern is the repeating pattern itself, rep_count is the number of matches found for the pattern, and sub_count is the number of other repeating patterns that contain this pattern. To calculate rep_count and sub_count for the cell in the i^th row and j^th column (Mij), the cells Mi-1,j-1 and Mi+1,j+1 have to be taken into account. After every non-empty cell has been processed, the patterns with their repetition counts and substring counts are used to calculate pattern frequencies in the next step.
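One way to compute (rep_count, sub_count) from the matrix is sketched below. It is an assumption-laden reconstruction rather than the authors' procedure: each non-empty cell Mij with value v is taken to stand for one matching pair of occurrences of every suffix of the match ending at note j, and sub_count is incremented when that occurrence pair lies inside a longer match (v is larger than the suffix length, or Mi+1,j+1 is non-empty). The helper `candidate_set` is hypothetical.

```python
from collections import defaultdict


def correlative_matrix(seq):
    n = len(seq)
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if seq[i] == seq[j]:
                m[i][j] = (m[i - 1][j - 1] if i > 0 else 0) + 1
    return m


def candidate_set(seq):
    """Map each candidate pattern (a tuple of notes) to [rep_count, sub_count]."""
    n = len(seq)
    m = correlative_matrix(seq)
    cands = defaultdict(lambda: [0, 0])
    for i in range(n):
        for j in range(i + 1, n):
            v = m[i][j]
            # Does this match continue one note further in both copies?
            extends = j + 1 < n and m[i + 1][j + 1] > 0
            for k in range(1, v + 1):
                pattern = tuple(seq[j - k + 1 : j + 1])  # length-k suffix ending at note j
                cands[pattern][0] += 1  # one more matching pair of occurrences
                if k < v or extends:
                    cands[pattern][1] += 1  # this pair lies inside a longer match
    return dict(cands)


cands = candidate_set("CAACCAACD")
print(cands[tuple("CAA")], cands[("C",)])  # [1, 1] [6, 2]
```

With this counting scheme, “CAA” gets (1, 1) as in the worked example, while “C” gets six matching pairs, two of which lie inside longer matches.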

3) Discover Non-trivial Repeating Patterns:

The purpose of this step is to discover all non-trivial repeating patterns and calculate the actual frequency of each of them. A pattern is trivial if its rep_count equals its sub_count. The trivial case indicates that there exists a superstring S’ containing the pattern S, and S only appears along with S’. In such a case, the superstring S’ is considered more representative, and the trivial pattern S is removed. After removal, the frequency f(p,m) of each pattern p in a music object m is calculated from its rep_count. Since rep_count counts matching pairs of occurrences, a pattern that occurs f times has rep_count = f(f-1)/2, which gives:

$$f(p,m) = \frac{1 + \sqrt{1 + 8 \cdot rep\_count(p)}}{2}$$

Table 3.1 shows an example. Given the note sequence “CAACCAACD”, the correlative matrix is constructed by substring matching row by row. The 1st note “C” repeats at the 4th, 5th, and 8th positions of the sequence. For the cell M26, because the “A” in the 2nd row matches the “A” in the 6th column and M15 is 1, M26 is set to 2; the value 2 indicates the pattern “CA” of length 2. To find all candidates, all non-empty cells are scanned and the associated information of each candidate is computed. Take M37 as an example. The pattern corresponding to M37 is “CAA”, whose match count so far is one, and it is a substring of the pattern “CAAC” since M48 is not empty. Hence, the associated information of “CAA” is (“CAA”, 1, 1). Since the rep_count of “CAA” equals its sub_count, “CAA” is a trivial pattern and is removed. The pattern “C”, in contrast, is an example of a non-trivial pattern.
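Step 3 can be sketched as follows, again as an illustration rather than the authors' code. It assumes that rep_count counts matching pairs of occurrences, so a pattern occurring f times has rep_count = f(f-1)/2 (consistent with “CAA”, which occurs twice and has rep_count 1), and it inverts that relation to obtain f. The function name `nontrivial_frequencies` is hypothetical, and the input counts for “C” and “CAAC” were worked out by hand for “CAACCAACD” under the same pair-counting assumption.

```python
import math


def nontrivial_frequencies(candidates):
    """Drop trivial patterns and convert rep_count into an occurrence count.

    candidates: dict mapping pattern -> (rep_count, sub_count).
    Returns a dict mapping each non-trivial pattern -> frequency f.
    """
    freqs = {}
    for pattern, (rep_count, sub_count) in candidates.items():
        if rep_count == sub_count:
            continue  # trivial: the pattern only appears along with a superstring
        # Invert rep_count = f(f-1)/2, keeping the positive root.
        freqs[pattern] = round((1 + math.sqrt(1 + 8 * rep_count)) / 2)
    return freqs


candidates = {"C": (6, 2), "CAA": (1, 1), "CAAC": (1, 0)}
print(nontrivial_frequencies(candidates))  # {'C': 4, 'CAAC': 2}
```

“C” indeed occurs four times and “CAAC” twice in the example sequence, while the trivial “CAA” is dropped.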

Table 3.1 An example of a correlative matrix.

    C  A  A  C  C  A  A  C  D
C            1  1        1
A         1        2  1
A                  1  3
C               1        4
C                        1
A                     1
A
C
D

The method described above is the standard version for discovering exact repeating patterns and cannot be applied to the other variants shown in Figure 3.2 without modification.

For exact repetition (Figure 3.2 (a)), the method can be applied directly.

For transpose (interval repeat) (Figure 3.2 (b)), we first transform the pitch sequence into a pitch interval sequence (Figure 3.1), so that a motive and its transposition, which share the same intervals, yield identical subsequences. The correlative matrix method is then applied to the pitch interval sequence.

Figure 3.1 Transformation from pitch sequence into pitch interval sequence.
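A minimal sketch of the interval transformation, assuming pitches are MIDI note numbers. The exact-pitch matrix finds nothing in a motive followed by its transposition, while the same matrix construction on the interval sequence does. The function names are hypothetical; note that a run of v equal intervals spans v+1 notes.

```python
def pitch_intervals(pitches):
    """Transform a pitch sequence into its pitch-interval sequence."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def correlative_matrix(seq):
    n = len(seq)
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if seq[i] == seq[j]:
                m[i][j] = (m[i - 1][j - 1] if i > 0 else 0) + 1
    return m


# Motive C-D-E followed by its transposition G-A-B (MIDI numbers).
pitches = [60, 62, 64, 67, 69, 71]
intervals = pitch_intervals(pitches)  # [2, 2, 3, 2, 2]
exact = correlative_matrix(pitches)
by_interval = correlative_matrix(intervals)
print(max(map(max, exact)), by_interval[1][4])  # 0 2
```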

Sequence is a type of motivic treatment that consists of more than three consecutive transpositions of a motive (Figure 3.2 (c)). In addition, all transpositions must move in the same direction, either ascending or descending. In Figure 3.2 (c), the first rectangle indicates the original motive, and the second and third are transpositions of it. Sequences are discovered in the same way as transpositions, except that we additionally check whether the discovered pattern is repeated consecutively.
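The extra check for sequences might look like the following sketch. Its inputs are assumptions: `starts` holds the sorted start indices (in the pitch sequence) of the occurrences of one interval pattern spanning `length` notes, and `min_count` is a hypothetical parameter to be set according to the required number of transpositions; here it defaults to 3, the number of statements shown in Figure 3.2 (c).

```python
def is_sequence(pitches, starts, length, min_count=3):
    """Check whether occurrences of one motive form a sequence.

    pitches: the full pitch sequence.
    starts: sorted start indices of the motive's occurrences.
    length: number of notes in the motive.
    """
    if len(starts) < min_count:
        return False
    # Consecutive: each occurrence starts right where the previous one ends.
    if any(b - a != length for a, b in zip(starts, starts[1:])):
        return False
    # Transposition offset between neighbouring occurrences.
    shifts = [pitches[b] - pitches[a] for a, b in zip(starts, starts[1:])]
    # All ascending or all descending.
    return all(s > 0 for s in shifts) or all(s < 0 for s in shifts)


rising = [60, 62, 64, 62, 64, 66, 64, 66, 68]
mixed = [60, 62, 64, 62, 64, 66, 60, 62, 64]
print(is_sequence(rising, [0, 3, 6], 3), is_sequence(mixed, [0, 3, 6], 3))  # True False
```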

Contrary motion (Figure 3.2 (d)) is a motivic treatment in which the pitch interval sequence is inverted while the rhythm stays the same; that is, the contrary motion of the original motive is obtained by reversing the sign of each pitch interval. To discover contrary motion, the correlative matrix is constructed from two different sequences: the original pitch interval sequence and its sign-inverted copy. The rest of the method remains the same.
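A sketch of the two-sequence construction, with rows indexed by the original intervals and columns by the sign-inverted copy. Because "a equals the negation of b" is symmetric, this sketch assumes filling only cells with j > i is still sufficient. The function name is hypothetical, and the rhythm check the text also requires is not shown.

```python
def pitch_intervals(pitches):
    return [b - a for a, b in zip(pitches, pitches[1:])]


def contrary_matrix(intervals):
    """Correlative matrix between an interval sequence and its sign-inverted copy.

    Cell [i][j] (j > i) counts how many consecutive intervals ending at i
    are the exact negation of those ending at j.
    """
    inverted = [-x for x in intervals]
    n = len(intervals)
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if intervals[i] == inverted[j]:
                m[i][j] = (m[i - 1][j - 1] if i > 0 else 0) + 1
    return m


# A rising motive (+2, +3) followed by its contrary motion (-2, -3).
intervals = pitch_intervals([60, 62, 65, 65, 63, 60])  # [2, 3, 0, -2, -3]
m = contrary_matrix(intervals)
print(m[1][4])  # 2
```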

Retrograde is a repetition in which the pitch contour is reversed while the rhythm stays the same. Figure 3.2 (e) gives an example: the second motive <72, 72, 71, 67, 65, 65> is the retrograde of the first one <65, 65, 67, 71, 72, 72>. To discover retrogrades, the rule for assigning a value to each cell is changed: where the original method takes Mi-1,j-1 into account when assigning Mij, the retrograde case considers Mi-1,j+1 instead. The rest of the method remains the same.
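The modified cell rule can be sketched as below. The function name is hypothetical, and placing the two motives from Figure 3.2 (e) directly back to back is an assumption made for a compact example; only pitches are compared, so the rhythm check is omitted.

```python
def retrograde_matrix(seq):
    """Correlative matrix variant for retrograde repetition.

    Mij extends Mi-1,j+1 instead of Mi-1,j-1, so a value v in cell [i][j]
    means seq[i-v+1 .. i] read backwards equals seq[j .. j+v-1].
    """
    n = len(seq)
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if seq[i] == seq[j]:
                v = m[i - 1][j + 1] if i > 0 and j + 1 < n else 0
                m[i][j] = v + 1
    return m


# The two motives from Figure 3.2 (e), placed back to back.
seq = [65, 65, 67, 71, 72, 72, 72, 72, 71, 67, 65, 65]
m = retrograde_matrix(seq)
print(m[5][6])  # 6
```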

Augmentation (diminution) repetition is a repetition in which the pitch sequence remains the same while the rhythm becomes slower (faster) by a constant ratio. In Figure 3.2 (f), the second motive is an augmentation of the first one and the third motive is a diminution of it. To discover augmentation (diminution) repetitions, the repeating pattern discovery process remains the same, but an additional check on the results ensures that the rhythm of the repetitions of a pattern is scaled by a constant ratio.
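The additional rhythm check might be sketched as follows, assuming positive durations in beats and reading "scaled by a constant ratio" as every duration being multiplied by the same factor. `rhythm_ratio` is a hypothetical helper; `limit_denominator` guards against small float rounding errors.

```python
from fractions import Fraction


def rhythm_ratio(original, repetition):
    """Return the constant duration ratio between two occurrences, or None.

    original, repetition: lists of (pitch, duration) with positive durations.
    A ratio > 1 indicates augmentation (slower), < 1 indicates diminution (faster).
    """
    if [p for p, _ in original] != [p for p, _ in repetition]:
        return None  # pitch sequences must be identical
    ratios = {
        Fraction(d2).limit_denominator(1000) / Fraction(d1).limit_denominator(1000)
        for (_, d1), (_, d2) in zip(original, repetition)
    }
    if len(ratios) != 1:
        return None  # durations are not scaled uniformly
    ratio = ratios.pop()
    return None if ratio == 1 else ratio  # ratio 1 is an exact repetition


motive = [(60, 1), (62, 0.5), (64, 0.5)]
print(rhythm_ratio(motive, [(60, 2), (62, 1), (64, 1)]))  # 2
print(rhythm_ratio(motive, [(60, 0.5), (62, 0.25), (64, 0.25)]))  # 1/2
```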

After the six discovery processes have been performed on each music object, we keep only the original motives to represent the structural feature of that music object. The next step is feature extraction.

3.2 Feature Extraction

We extract six kinds of global feature representations and five kinds of local feature representations, as shown in Table 3.2. In other words, each music object is modeled as a six-attribute global feature and a set of five-attribute local features, one for each music segment. The music features considered in this report are melody, rhythm, and tempo. The melody feature is represented by average pitch, pitch standard deviation, highest/lowest pitch, chord sequence, and average pitch difference. The rhythm feature is represented by density, and the tempo feature by the tempo value alone.

Average pitch is the average pitch value of the notes within a music piece (an entire music object or a segment). Pitch standard deviation is the standard deviation of the pitch values of the notes within a music piece. The highest and lowest pitch values are extracted from the entire music object, and the average pitch difference is the average difference in pitch between consecutive notes within a music segment. Chord sequence is the sequence of chords within a segment, computed by a chord assignment algorithm, a heuristic method based on harmony and music theory; details can be found in [14]. The density of a music piece is defined as the number of notes divided by its total duration. Tempo denotes the speed of a music object and is defined as the number of beats per minute.
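The numeric pitch and density features can be computed roughly as in this sketch. Several choices are assumptions not fixed by the text: pitches are MIDI numbers, durations are in beats, the standard deviation is the population one, and the average pitch difference uses absolute differences. Tempo (from the file's metadata) and the chord sequence (see [14]) are not covered, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def segment_features(notes):
    """Compute pitch and density features for a list of (pitch, duration) notes.

    Assumptions: pitches are MIDI numbers, durations are in beats, the
    standard deviation is the population one, and the average pitch
    difference uses absolute differences between consecutive notes.
    """
    pitches = [p for p, _ in notes]
    total_duration = sum(d for _, d in notes)
    diffs = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    return {
        "density": len(notes) / total_duration,
        "average_pitch": mean(pitches),
        "pitch_sd": pstdev(pitches),
        "highest_pitch": max(pitches),
        "lowest_pitch": min(pitches),
        "average_pitch_difference": mean(diffs) if diffs else 0,
    }


features = segment_features([(60, 1), (62, 1), (64, 1), (62, 1)])
print(features)
```

The same function can serve both levels: applied to a whole music object it yields the global values, and applied to a segment it yields the local ones.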

Table 3.2 Global and local features considered in our work.

Global feature                    Local feature
Density (gd)                      Density (ld)
Average Pitch (gap)               Average Pitch (lap)
Pitch standard deviation (gsd)    Pitch standard deviation (lsd)
Tempo (gt)                        Chord sequence (lcs)
Highest Pitch (ghp)               Average Pitch difference (lapd)
Lowest Pitch (glp)

In some cases, two different music segments may sound approximately the same. To tolerate such small differences, we quantize the feature values of each segment. For the global features, density and pitch standard deviation are quantized with a step of 0.5: the quantized value is the integer quotient obtained by dividing the raw value by 0.5. For instance, densities of 1.7 and 1.9 are both quantized as 3 (1.7/0.5 = 3.4 and 1.9/0.5 = 3.8). In the same way, the average pitch, highest pitch, and lowest pitch are divided by 5. For the local features, density, pitch standard deviation, and average pitch difference are divided by 0.5, the average pitch is quantized as in the global feature, and the chord sequence keeps its original value.
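The quantization rule can be sketched as a table of step sizes plus an integer-quotient function. Reading "quotient" as the floor of value / step is an assumption consistent with the 1.7 and 1.9 examples; the feature keys follow Table 3.2, and the raw global values below are taken from Table 3.3 purely for illustration.

```python
import math

# Step sizes stated in the text. Tempo (gt) is not mentioned, and the chord
# sequence (lcs) keeps its original value, so both pass through unchanged.
GLOBAL_STEPS = {"gd": 0.5, "gsd": 0.5, "gap": 5, "ghp": 5, "glp": 5}
LOCAL_STEPS = {"ld": 0.5, "lsd": 0.5, "lapd": 0.5, "lap": 5}


def quantize(features, steps):
    """Replace each raw value by the integer quotient of value / step."""
    return {
        name: math.floor(value / steps[name]) if name in steps else value
        for name, value in features.items()
    }


raw = {"gd": 1.7, "gap": 73, "gsd": 3.5, "gt": 7, "ghp": 81, "glp": 72}
print(quantize(raw, GLOBAL_STEPS))
# {'gd': 3, 'gap': 14, 'gsd': 7, 'gt': 7, 'ghp': 16, 'glp': 14}
```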

To observe the impact of quantization on performance, we keep two copies of the features, the raw one and the quantized one. These two copies are processed in the next step and in the sequential learning process, respectively. We compare the performance of the two representations in Chapter 5.

3.3 Significant Motive Selection

In this step we filter significant motivic patterns (SMPs). We measure the significance of each motivic repeating pattern and retain the significant ones to serve as music segments. A motivic repeating pattern with a high frequency in one music object is not necessarily more important than one with a low frequency in another music object, because raw frequencies are not comparable across music objects.

Therefore, the frequency of a motive p in music object m, f(p,m), is normalized by dividing it by the maximal frequency of any motivic pattern p’ in m. In addition, a motive is more important with respect to a music object if it is more specific in the music database (DB), that is, if it is shared by fewer music objects. The importance of a motive with respect to a music object is thus defined as follows:

)

where sup(p,DB) stands for the support of p in the DB.
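The sketch below shows how the two ingredients described here, the normalized frequency and the support sup(p,DB), could be combined. The combination itself is an assumption: it uses an inverse-support factor log(|DB| / sup(p,DB)) in the style of TF-IDF, which may differ from the report's definition. Support is counted as the number of music objects containing p, the 0.5 threshold mirrors the example that follows, and the function names are hypothetical.

```python
import math


def importance(p, freq_in_m, db):
    """Importance of motive p for one music object m (illustrative sketch).

    freq_in_m: dict mapping each motivic pattern of m to its frequency f(p, m).
    db: list of sets, one set of motivic patterns per music object (m included).
    """
    normalized = freq_in_m[p] / max(freq_in_m.values())
    support = sum(1 for patterns in db if p in patterns)  # sup(p, DB) as a count
    specificity = math.log(len(db) / support)  # ASSUMED inverse-support factor
    return normalized * specificity


def significant_patterns(freq_in_m, db, threshold=0.5):
    return {p for p in freq_in_m if importance(p, freq_in_m, db) > threshold}


freq_in_m = {"A": 4, "B": 2}
db = [{"A", "B"}, {"A"}, {"A"}, {"C"}]
print(significant_patterns(freq_in_m, db))  # {'B'}
```

Here “A” is the most frequent motive in m but appears in three of the four music objects, so its assumed specificity is low, whereas the rarer “B” passes the threshold.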

Table 3.3 shows the representations of the song “Don’t Let the Sun Go Down on Me”, which contains eight SMPs with importance higher than 0.5. Only the global feature and the local features of three of these SMPs are shown in Table 3.3. Figure 3.3 illustrates the corresponding score for these SMPs.

Figure 3.2 Examples of six motivic treatments.

Table 3.3 Representation of the global feature and three SMPs of the music object M.

Global feature of M:
      gd    gap   gsd   gt    ghp   glp
M     1.7   73    3.5   7     81    72

Local features of three SMPs:
         ld    lap   lsd   lcs   lapd
SMP 1    2     75    2     0     3
SMP 2    1.1   74    2     7     1
SMP 3    0.4   76    2     3     0.3

Figure 3.3 Corresponding score of SMPs in Table 3.3.

CHAPTER 4
