Music Structure Analysis and Structure Rule Learning

4 Analysis and Learning

4.1 Music Structure Analysis and Structure Rule Learning

Musical structure can be regarded as a hierarchical structure similar to the structure of an article. In our approach, a music object is composed of sections and a section is composed of one or more phrases [19]. The structure analysis component discovers the section-phrase hierarchical structure of a music object while the structure learning component analyzes common characteristics from structures of given music examples.

Some researchers have devoted to the investigation of music segmentation techniques for symbolic music [8][11][32][44][49]. Most work focuses on the extraction of phrases. Takasu et al. proposed the phrase extraction method based on some heuristic rules and extended the work to classify phrases into two classes:

theme phrases and non-theme phrases based on the decision tree algorithm [44]. Chen et al. adopted a similar approach to extract phrases and to group similar phrases for sentence extraction [11]. Nevertheless, to the best of our knowledge, nobody has focused on the detection of musical sections which constitute the musical form.

Therefore, we present a heuristic method for section detection.

There are three main steps in the music structure analysis component. In the first step, section detection

is performed. To find the section structure in music, first we discover repeating patterns over main melody. A repeating pattern is a sequence of notes repeats several times in a music object. The main melody is a note sequence where a note can be parameterized with several property values such as pitch, duration and velocity.

Velocity is only relevant to music performance. Therefore only pitch and duration are considered for the structure analysis. The repeating patterns over pitch-duration sequence are discovered by repeating pattern mining techniques. Over the past few years, several algorithms have been proposed to mine repeating patterns in a sequence [25][27][30]. Suffix tree is a well-know data structure originally developed for string matching.

It is also utilized to find the repeating patterns by storing the suffix information [30]. There may exist a large number of repeating patterns in a sequence. Therefore, the concept of non-trivial repeating patterns was introduced by Hsu et al. [25]. A repeating pattern is non-trivial if and only if there does not exist another superstring with the same frequency. Hsu et al. proposed two different approaches, correlative matrix and string-join, to mine non-trivial repeating patterns [25]. In the former approach, a data structure called correlative matrix is constructed by lining up the note sequence the horizontal and vertical dimensions respectively to keep the intermediate information of substring matching. Each cell of this matrix denotes the length of a founded repeating pattern. After the construction of this matrix, the repeating frequencies of all repeating patterns can be derived by computing the non-zero-cells in the matrix. The latter approach, string-join, utilizes the anti-monotony property to avoid generating large amount of candidates. In this approach, shorter patterns are joined into longer ones and the non-qualified candidates are pruned out. Here, we employ the correlative matrix technique to find the repeating patterns.

NTRP 0 NTRP 1 NTRP 2 NTRP 3 NTRP 4 NTRP 5

Figure 5: The instances of repeating patterns in the main melody of “Little Bee.”

After the repeating pattern mining process, a music object may contain more than one repeating pattern.

Each repeating pattern appears several times. Figure 5 illustrates the instances of non-trivial repeating patterns after correlative matrix technique has been applied to the musical piece “Little Bee.” Repeating patterns with durations shorter than two bars are not shown here. In Figure 5, each strip denotes an instance of a non-trivial repeating pattern. There are six non-trivial repeating patterns. For example, the first pattern NTRP0 has three instances. One starts at the first bar, another starts at the fifth bar, and the other starts at the thirteenth bar.

Algorithm Pattern-Selection

Input: A set of n phrase, P = {pi =(li, ri) 1 i  n}

Output: The subset Q of P such that the total duration is maximized

1) Sort the left positions and the right positions of the phrases into a 1-dimension array T of 2n elements 2) Create a 1-dimension array S of 2n elements

3) for i  1 to 2n //initializing value for S 4)

S

i  0

5) for i  2 to 2n //fill up array in order 6) if T(i) is the right end of a phrase pk

= (l

_k, rk) 7) f  lk

8) d  rk - lk

S

i  max{Sf -1+ d, Si-1} 10)

else

11)

S

i  Si-1

12) i  2n; Q  {};

13) While (i  2) //backtracking s2n to extract phrases 14)

if ( S

_i≠ Si-1 )

15) let pk be the phrase ends at Ti

16) Add pk to Q 17)

i  l

18)

else

19)

i  i-1

20) return Q

Figure 6: Pattern-selection algorithm.

As not all instances of repeating pattern are required for section structure detection, appropriate instances need to be selected. First, in our proposed approach, all the instances of the repeating patterns with durations shorter than two bars are filtered out. Then we attempt to extract the musical sections by finding the set of non-overlapping repeating pattern instances such that the total length of the selected instances is maximized.

We modify the algorithm for exon-chaining problem developed in the field of bioinformatics [26]. Given a set of weighted intervals in a chain, the exon-chaining problem attempts to find a set of non-overlapping intervals such that the total weight is maximal. This algorithm can be modified to accommodate the pattern selection problem by replacing the weight of an interval with its duration. As the detailed pattern-selection algorithm in Figure 6 shows, given n pattern instances, this problem can be solved using dynamic programming in a one dimension array S of 2n elements, n of which corresponds to the starting (left) positions

of the instances and n of which corresponds to the ending (right) positions of the phrases. For the sake of simplicity, it is assumed that all instances are distinct. This algorithm starts by sorting the starting and ending positions of all instances into a one dimension array T of 2n elements. The i-th element of S, Si, represents the maximum duration for the set of instances which ends before the position Ti. The set of instances, which ends before or at Ti, to maximize the total duration either excludes the instance ends at Ti, or includes the instance ends at Ti to the maximum set of phrases which ends before the left position of this instance (Figure 6, lines 5 to 11). This leads to the following formula



The set of selected phrases can be derived by backtracking the array S (Figure 6, lines 13 to 20). To describe the computation of this algorithm, Figure 7 gives the example corresponding to the pattern instances in “Little Bee.” In this example, the duration of each instance is measured in beats. The circled instances constitute the set which maximizes the total duration.

The next step of section structure detection is to find the section boundary based on the selected pattern instances. Each selected instance corresponds to a section. Initially, the starting and ending positions of a section are set to those of corresponding instance. Then each section is shrunk or expanded by adjusting the starting position to the nearest starting position of an odd numbered bar. For example, the second section in Figure 8 is shrunk by adjusting its starting position to the first beat of the fifth bar. At last, each section is expanded by aligning its ending position to meet the starting position of the next section. For example, in Figure 8, the ending position of the first section is set to the end of the fourth bar.

Figure 7: An illustration of example of pattern selection corresponding to Figure 5.

NTRP 0

Figure 8: The selected pattern instances of the example in Figure 5.

The second step in music structure is phrase detection. To obtain the number of phrases in a musical section, the LBDM (Local Boundary Detection Model) approach developed by Cambouropoulos et al. [8] is used to segment a section into phrases. Previous experiments have adopted LBDM as representative of the melodic feature-based algorithms for melody segmentation [32]. LBDM extracts the pitch interval sequence, the inter onset interval sequence and the reset sequence from the main melody. Then these three sequences are integrated into the sequence of boundary strength values measured by the change rule and the proximity rule.

The resulting peaks of the boundary strength value sequence are regarded as the phrase boundaries.

The final step in music structure analysis is to extract features for each section, including section labeling.

Each section is labeled such that all the instances of the same repeating pattern are labeled with the same symbol. For example, in Figure 8, the labeled sequence becomes ABCDB. The second and the fifth section correspond to the same repeating pattern, so do the third and the fourth section. At last, the structure analysis component outputs a section sequence where the section is parameterized by label, numberOfOccurrences,

numOfPhrases and length. While the attribute label denotes which label it is, the attribute numberOfOccurrences denotes the number of appearances of the same label. The attribute numOfPhrases

denotes the number of phrases in this section and the attribute length denotes the length of the section measured in beats.

In the learning step, the probability distribution of section sequences along with associated meters is derived to model the style of musical form. Moreover, the conditional probability Prob(numOfPhrases|label,

numberOfOccurrences) of the number of phrases given a label and an occurrence value is derived. So does

the conditional probability that Prob(length|label, numberOfOccurrences) of the section length given a label and an occurrence value.

在文檔中以音樂動機探勘為導向的電腦音樂作曲 (I) (頁 11-15)