
Chapter 3 Automatic Music Arrangement Framework

3.3 Phrase Identification and Utility Assignment Phase

3.3.1 Phrase Identification

In this subsection, we identify the phrases in a segmented track. As mentioned in [49], the definition of “phrase” is ambiguous. The phrase we try to find is a monophonic melodic group of notes with similar properties, usually separated from its neighbors by a breath point or a large pitch interval. Many approaches have been proposed that perform well in finding this type of phrase. Because such phrases are found in monophonic music, we first have to extract the monophonic lines from a segmented track. Thus, the process of phrase identification consists of two steps: (1) finding monophonic lines; and (2) identifying phrases from the monophonic lines.

In the first step, we adopt the approach proposed by Lui [37] because, to the best of our knowledge, no other study has investigated this topic so far. One of the most important issues in finding monophonic lines in polyphonic music is preserving the best voice leading, which keeps the most natural melodic continuity between notes. The notes are grouped as follows. First, the chord progression of each measure is determined. For each consecutive pair of chords, let C_fewer be the chord with fewer notes and C_more be the chord with more notes. Each tendency tone is resolved, and then each note of C_fewer is grouped with its neighbor of the nearest pitch in C_more. For different chords, the notes are grouped based on the following:

• For common chords, such as I and V, use voice-leading matrices to resolve tendency tones.

• For the other chords, group each note of the preceding chord with its nearest neighbor in the succeeding chord.

The voice-leading matrix is two-dimensional (12×12). Its indices are pitches relative to the tonic, and each entry indicates the voice-leading priority from the row pitch to the column pitch. Interested readers can refer to [37] for a detailed description.
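The nearest-pitch grouping described above can be illustrated with a minimal sketch. It assumes that chords are given as lists of MIDI pitch numbers and covers only the nearest-neighbor rule; tendency-tone resolution and the voice-leading matrix of [37] are not modeled, and the function name `group_nearest` is hypothetical.

```python
def group_nearest(prev_chord, next_chord):
    """Pair each note of the smaller chord with its nearest-pitch note in the larger chord.

    Chords are lists of MIDI pitch numbers (an assumption of this sketch).
    Returns a list of (pitch_in_prev_chord, pitch_in_next_chord) pairs.
    """
    prev_is_fewer = len(prev_chord) <= len(next_chord)
    if prev_is_fewer:
        c_fewer, c_more = prev_chord, next_chord
    else:
        c_fewer, c_more = next_chord, prev_chord

    pairs = []
    for note in c_fewer:
        # Nearest neighbor by absolute pitch distance; ties go to the first candidate.
        nearest = min(c_more, key=lambda p: abs(p - note))
        # Keep the pair ordered in time: (preceding note, succeeding note).
        pairs.append((note, nearest) if prev_is_fewer else (nearest, note))
    return pairs


pairs = group_nearest([60, 64, 67], [59, 65])
```

In this sketch two notes of C_fewer may be paired with the same note of C_more; how such conflicts are resolved is left to the method in [37].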

Once the monophonic lines have been extracted in the first step, the second step identifies the phrases in each monophonic line. We investigated many works on this issue and chose the local boundary detection model (LBDM) [8] because of its easy implementation and good performance. The approach identifies phrases by segmenting a monophonic line at larger pitch intervals or at breath points around long notes. The model consists of a change rule, which assigns boundary strengths in proportion to the degree of change between consecutive intervals, and a proximity rule, which scales the boundary strength according to the size of the intervals involved. The LBDM operates on three independent parametric melodic profiles

$$\mathit{Profile}_k = [x_1, x_2, \ldots, x_n]$$

where $k \in \{\text{pitch}, \text{ioi}, \text{rest}\}$, $i \in \{1, 2, \ldots, n\}$, and ioi stands for inter-onset interval. The boundary strength at interval $x_i$ is defined by

$$\mathit{strength}_i = x_i \times (r_{i-1,i} + r_{i,i+1}) \qquad (2)$$

where $r_{i-1,i}$ is the degree of change between two successive intervals and, following [8], can be calculated by

$$r_{i-1,i} = \begin{cases} \dfrac{|x_{i-1} - x_i|}{x_{i-1} + x_i} & \text{if } x_{i-1} + x_i \neq 0 \\ 0 & \text{if } x_{i-1} = x_i = 0 \end{cases} \qquad (3)$$

For each parameter $k$, the boundary strength profile $\mathit{strength}_i$ is calculated and normalized into the range [0, 1]. A weighted sum of the strengths is then computed, using the weights derived by trial and error in the previous study [8] (0.25 for pitch and rest, and 0.5 for ioi). Finally, boundaries are detected where the combined strength profile exceeds a predefined threshold.
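The change rule, normalization, weighted sum, and thresholding described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work. The degree of change follows the LBDM as published in [8]. Dividing by the maximum strength to normalize, the threshold value of 0.5, and all function names are assumptions of the sketch. Each profile is assumed to be a list of non-negative interval values of equal length.

```python
WEIGHTS = {"pitch": 0.25, "ioi": 0.5, "rest": 0.25}  # weights reported in [8]


def degree_of_change(a, b):
    """Degree of change between two successive non-negative intervals."""
    total = a + b
    return abs(a - b) / total if total != 0 else 0.0


def boundary_strengths(profile):
    """strength_i = x_i * (r_{i-1,i} + r_{i,i+1}), scaled into [0, 1]."""
    n = len(profile)
    raw = []
    for i, x in enumerate(profile):
        # Missing neighbors at the ends of the profile contribute no change.
        left = degree_of_change(profile[i - 1], x) if i > 0 else 0.0
        right = degree_of_change(x, profile[i + 1]) if i < n - 1 else 0.0
        raw.append(x * (left + right))
    peak = max(raw, default=0.0)
    # Assumed normalization: divide by the largest strength in the profile.
    return [s / peak if peak > 0 else 0.0 for s in raw]


def lbdm_boundaries(profiles, threshold=0.5):
    """Return (boundary indices, combined strength profile).

    `profiles` maps "pitch", "ioi", and "rest" to equal-length interval lists.
    The threshold default is an assumed value, not one given in the text.
    """
    if len({len(profiles[k]) for k in WEIGHTS}) != 1:
        raise ValueError("all profiles must have the same length")
    strengths = {k: boundary_strengths(profiles[k]) for k in WEIGHTS}
    n = len(profiles["ioi"])
    combined = [sum(WEIGHTS[k] * strengths[k][i] for k in WEIGHTS) for i in range(n)]
    return [i for i, s in enumerate(combined) if s > threshold], combined


example = {
    "pitch": [2, 2, 7, 2],  # absolute pitch intervals in semitones
    "ioi": [1, 1, 4, 1],    # inter-onset intervals in beats
    "rest": [0, 0, 2, 0],   # rests in beats
}
boundaries, combined = lbdm_boundaries(example)
```

In the example, the third interval combines a large leap, a long inter-onset interval, and a rest, so it is the only position whose combined strength exceeds the assumed threshold.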

Figure 3-3. An example of phrase identification

Figure 3-3 illustrates an example of phrase identification. The given segmented track, on the left-hand side, is polyphonic. In step 1, the monophonic lines are identified. In the beginning, A5 overlaps with B4, and two temporary monophonic lines, tml1 (A5) and [...] be chosen. According to the chord progression and pitch difference, B5 is grouped into tml1 and B4 is grouped into tml2. By the same process, monophonic lines 1 and 2, ml1 and ml2, are formed.

Then, ml1 and ml2 are fed into the LBDM. When ml1 is processed, a cut point is found between the 5th and 6th notes of ml1 because the combined strength profile exceeds the threshold there.

Finally, three phrases are identified in the example.

3.3.2 Utility Assignment

Each identified phrase is of different importance to the arrangement. We define the importance of a phrase, called its utility, based on two factors. The first factor considers the types of arrangement elements that the user wants for the target instrument. As mentioned in Section 3.2, a classifier outputs, for each segmented track, the probabilities of the five types of arrangement elements. As part of the input of our framework, the types of arrangement elements that the user wants to arrange for the target instrument are specified in advance. The probabilities of these user-defined types are taken as the first part of the utility: the probabilities that the phrase inherits from the segmented track to which it belongs are summed up and, to normalize the value, the sum is divided by the number of considered types. The first factor is denoted as $F_1(phr_{st,i})$. In its definition, $ae$ ranges over the arrangement-element types {[...], Fill}; $P(ae \mid st)$ is the probability that the segmented track $st$ belongs to arrangement element $ae$; and $\varphi_{ae}$ is the user preference on arrangement element $ae$, with $\varphi_{ae} \in (0, 1]$. For example, if we consider the arrangement elements lead and fill to be important, we can set $\varphi_{lead}$ and $\varphi_{fill}$ to 1 and set the others close to 0. Note that all phrases in the same segmented track have equal $F_1$ values.
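Based only on the verbal description above, the first factor could be sketched as a preference-weighted sum of classifier probabilities divided by the number of considered types. The exact formula is not reproduced in this text, so the form below is an assumption, as are the function name `first_factor` and the choice to treat the keys of `preferences` as the considered types. The example lists only the lead and fill types named in the text.

```python
def first_factor(probabilities, preferences):
    """Assumed form of F1 for every phrase of one segmented track.

    probabilities: maps an arrangement-element type to P(ae | st).
    preferences:   maps each considered type to its preference phi_ae in (0, 1].
    """
    if not preferences:
        raise ValueError("at least one arrangement-element type must be considered")
    for ae, phi in preferences.items():
        if not 0 < phi <= 1:
            raise ValueError(f"preference for {ae!r} must lie in (0, 1]")
    weighted = sum(phi * probabilities[ae] for ae, phi in preferences.items())
    # Normalize by the number of considered types, as described in the text.
    return weighted / len(preferences)


# Hypothetical classifier output for one segmented track (other types omitted).
f1 = first_factor({"lead": 0.6, "fill": 0.2}, {"lead": 1.0, "fill": 1.0})
```

Because the probabilities belong to the segmented track rather than to an individual phrase, the function is evaluated once per track and its value is shared by all phrases in that track.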

The second factor considers the richness of a phrase, because we expect richer phrases to make the newly arranged music richer. Entropy is used to measure the richness of a phrase; that is, a phrase is richer when more bits are needed to represent its pitches. The second factor, $F_2(phr_{st,i})$, is defined by an entropy formula in which $m$ is the number of distinct pitch values in the phrase $phr_{st,i}$ and $pv_i$ is the proportion of a pitch value in the phrase.

Note that an upper bound is defined for the entropy so that it can be normalized into the range [0, 1].

Here, the upper bound of the entropy is heuristically set to 64, since a phrase usually falls within two measures, which contain at most 16 distinct pitches when the minimum note length is 1/8 in 4/4 music.
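As an illustration of the entropy-based richness measure, the following sketch computes the standard Shannon entropy (in bits) of the pitch distribution of a phrase and divides it by an upper bound. The exact formula of the second factor is not reproduced in this text, so the use of plain Shannon entropy, the clipping to 1, and the function names are assumptions. Pitches are assumed to be MIDI numbers, and the upper bound is left as a parameter whose default is the heuristic value stated above.

```python
import math
from collections import Counter


def pitch_entropy(pitches):
    """Shannon entropy, in bits, of the pitch-value distribution of a phrase."""
    total = len(pitches)
    counts = Counter(pitches)  # one entry per distinct pitch value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def second_factor(pitches, upper_bound=64.0):
    """Entropy divided by the upper bound and clipped to 1 (assumed form of F2)."""
    if not pitches:
        return 0.0
    return min(pitch_entropy(pitches) / upper_bound, 1.0)


varied = pitch_entropy([60, 62, 64, 65])    # four equally frequent pitches
repeated = pitch_entropy([60, 60, 60, 60])  # a single repeated pitch
```

A phrase that repeats one pitch has zero entropy, whereas a phrase whose notes are spread evenly over more distinct pitches scores higher, which matches the intended notion of richness.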

We combine the values of these two factors as the utility of a phrase with predefined weights.

Because the phrases must be selected on the score and constraints exist among phrases over the time domain, the range of utility values leads to a situation in which most of the selected phrases are short. To assign the utility fairly over the time domain, the length of the phrase is also considered. Therefore, the utility of a phrase, $U(phr_{st,i})$, is defined in terms of the two weighted factors and the length of the phrase.
