Divide-and-Conquer-based segmentation - A Study of Probabilistic Model-Based Clustering Methods and Their Applications (機率式模型分群法之研究與其應用)

In this section, we present two implementations of the divide-and-conquer paradigm for detecting multiple change points in an analysis window. Note that the proposed approaches are based on the same assumption as that of WinGrow, i.e., the feature vectors of audio segments from different acoustic sources are derived from different Gaussian distributions.

5.2.1 The DACDec1 approach

We use the example in Figure 5.3 to explain the concept of divide-and-conquer-based segmentation. It is assumed that the audio stream in Figure 5.3 (a) consists of three homogeneous segments derived from different speakers. Initially, OCD-Chen is applied in an analysis window that covers the entire audio stream. After the change point C₂ has been detected with the ∆BIC curve in Figure 5.3 (b), the audio stream is divided into two analysis windows. Then, OCD-Chen is recursively applied in these two windows to search for the remaining change points so that C₁ can be detected. This approach, called DACDec1, allows us to detect the change points by a divide-and-conquer (DAC) strategy.

Figure 5.3: (a) An audio stream comprised of three speech segments, each derived from a distinct speaker. C₁ and C₂ are the change points. (b) The ∆BIC curve obtained by applying OCD-Chen to the audio stream in (a).

As described in Algorithm 6, DACDec1 terminates (returns) if no change point is detected by OCD-Chen in the analysis window or the size of the analysis window is smaller than a pre-defined value, denoted as N_min samples. In the Divide stage, the analysis window is partitioned into two sub-windows at the change point detected by OCD-Chen. Then, the sub-windows are input to DACDec1 in the Solve sub-instances stage. Finally, the Combine stage outputs all the change points detected in step 1) and step 4) (i.e., the Solve sub-instances stage).
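The recursion just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis implementation: features are one-dimensional so that only the standard library is needed, `delta_bic` assumes the common Gaussian form of ∆BIC with penalty weight λ, and `ocd` is a hypothetical stand-in for OCD-Chen that scans every split at least `margin` samples from the window edges. The names `log_var`, `delta_bic`, `ocd`, `margin`, and `offset` are illustrative and do not come from the source.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def log_var(x: Sequence[float]) -> float:
    """Log of the maximum-likelihood variance (floored to avoid log(0))."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return math.log(max(var, 1e-12))


def delta_bic(x: Sequence[float], t: int, lam: float) -> float:
    """Assumed 1-D Delta-BIC for splitting x at t; positive favours H1."""
    n = len(x)
    gain = 0.5 * (n * log_var(x) - t * log_var(x[:t]) - (n - t) * log_var(x[t:]))
    # For d = 1, H1 has d + d(d+1)/2 = 2 extra parameters,
    # so the penalty is lam * 0.5 * 2 * log(n).
    return gain - lam * math.log(n)


def ocd(x: Sequence[float], lam: float, margin: int) -> Optional[Tuple[int, float]]:
    """Stand-in for OCD-Chen: best split index and its Delta-BIC, or None."""
    candidates = range(margin, len(x) - margin + 1)
    if not candidates:
        return None
    best_t = max(candidates, key=lambda t: delta_bic(x, t, lam))
    return best_t, delta_bic(x, best_t, lam)


def dacdec1(x: Sequence[float], n_min: int = 40, lam: float = 1.0,
            margin: int = 10, offset: int = 0) -> List[int]:
    """Change points of x as indices into the original stream."""
    # Steps 1-2: run the detector and check termination.
    found = ocd(x, lam, margin) if len(x) >= n_min else None
    if found is None or found[1] <= 0:
        return []
    t_hat = found[0]
    # Steps 3-4: divide at t_hat and solve both sub-windows recursively.
    left = dacdec1(x[:t_hat], n_min, lam, margin, offset)
    right = dacdec1(x[t_hat:], n_min, lam, margin, offset + t_hat)
    # Step 5: combine.
    return left + [offset + t_hat] + right


# Synthetic stream: three "speakers" with clearly different means.
rng = random.Random(0)
stream = [rng.gauss(mu, 1.0) for mu in (0.0, 8.0, 16.0) for _ in range(100)]
change_points = dacdec1(stream, n_min=40, lam=3.0)
print(change_points)
```

On this synthetic stream the two reported indices should fall close to samples 100 and 200, the true boundaries. The value λ = 3 is chosen only to keep false alarms unlikely in this toy setting.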

Discussions: In general, when the data samples are derived from more than one Gaussian distribution, two Gaussians (the H₁ hypothesis) fit the distribution of the data better than one Gaussian (the H₀ hypothesis) if the samples belonging to the same Gaussian are used together to estimate the parameters. For example, Figure 5.4 schematically illustrates a case where the three audio segments are derived from three different speakers and their feature vectors are distributed as three Gaussian clusters. This case explains why the ∆BIC values at C₁ and C₂ in Figure 5.3 (b) are positive. From the above perspective, if the homogeneous segments in the analysis window of DACDec1 are always derived from different speakers during the recursive process, we can be confident that, at each change point, the H₁ hypothesis will fit the data better than the H₀ hypothesis; thus, the ∆BIC value will be positive.
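For reference, the hypothesis test discussed above is commonly written as follows. This is the widely used BIC-based test for a single change point between full-covariance Gaussians, given here as an assumption about the form that OCD-Chen evaluates. The symbols N (window length), d (feature dimension), t (candidate split), and the covariance estimates are notation introduced for this note and may differ from the definitions elsewhere in the thesis.

```latex
\begin{align*}
H_0 &: x_1,\dots,x_N \sim \mathcal{N}(\mu,\Sigma) \\
H_1 &: x_1,\dots,x_t \sim \mathcal{N}(\mu_1,\Sigma_1), \quad
       x_{t+1},\dots,x_N \sim \mathcal{N}(\mu_2,\Sigma_2) \\
\Delta\mathrm{BIC}(t) &= \frac{N}{2}\log\lvert\hat{\Sigma}\rvert
  - \frac{t}{2}\log\lvert\hat{\Sigma}_1\rvert
  - \frac{N-t}{2}\log\lvert\hat{\Sigma}_2\rvert
  - \frac{\lambda}{2}\Bigl(d + \frac{d(d+1)}{2}\Bigr)\log N
\end{align*}
```

Under this form, the first three terms measure how much better two Gaussians fit the window than one, and the λ-weighted term penalizes the extra parameters of H₁. A positive value favors H₁, which matches the sign convention used in this section.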

However, if two or more segments in the analysis window are derived from the same speaker, the performance of DACDec1 may decline dramatically. For example, in Figure 5.5 (a), the first and third segments are derived from the same speaker (Speaker1), while the second segment is derived from another speaker (Speaker2). When applying OCD-Chen to the audio stream in Figure 5.5 (a) with the same λ value of BIC used in the example in Figure 5.3, we obtain the ∆BIC curve in Figure 5.5 (b). The curve still has two peaks at the change points C₁ and C₂ because the H₁ hypothesis models the distribution of the data samples better at change points than it does at non-change points.

Algorithm 6 CP ← DACDec1(W)
Require: W: the analysis window
Ensure: CP: the set of change points detected in W
Begin
  1. detect whether there is a change point in W by OCD-Chen;
  2. //Check termination
     if (there is no change point in W or the size of W is smaller than N_min)
       CP ← ∅; //empty set
       goto End; //return
  3. //Divide
     let t̂ be the change point detected in 1);
     divide W into two sub-windows, W₁ and W₂, at t̂;
  4. //Solve sub-instances
     CP_W₁ ← DACDec1(W₁);
     CP_W₂ ← DACDec1(W₂);
  5. //Combine
     CP ← {t̂} ∪ CP_W₁ ∪ CP_W₂;
End

We use Figures 5.5 (c) and (d) to explain this perspective. Figure 5.5 (c) diagrammatically illustrates the two hypotheses at C₂, where all the data samples of Speaker2 (the circles) are used with those of Speaker1 (the stars) to estimate one Gaussian in H₁. In contrast, at the non-change point R in Figure 5.5 (b), as shown in Figure 5.5 (d), the data samples of Speaker2 are divided into two parts, each of which is combined with the data samples of Speaker1 (one with the stars and the other with the diamonds) to estimate a distinct Gaussian in H₁. Clearly, the H₁ hypothesis in Figure 5.5 (c) fits the data better than that in Figure 5.5 (d).

In this example, we have peaks at C₁ and C₂. However, their ∆BIC values are negative, and no change point will be output by OCD-Chen because, as illustrated in Figure 5.5 (c), H₁ over-fits the data samples of Speaker1 and obtains a smaller BIC value than that of H₀. We may adjust the value of λ so that, at C₂, the ∆BIC value will be positive (i.e., the hypothesis test favors H₁). However, this may result in false alarms when the recursive process continues to detect change points in a homogeneous segment.

In other words, it is difficult to determine a reliable λ value for an audio stream like the example in Figure 5.5 (a). Moreover, it is infeasible to adjust the value of λ for each specific audio stream in practical applications.

5.2.2 The DACDec2 approach

Figure 5.4: An illustration of data samples distributed as three Gaussian clusters. For this case, generally, two Gaussians (H₁) fit the distribution of the data better than one Gaussian (H₀) if the samples belonging to the same Gaussian cluster are used together to estimate the parameters.

To overcome the performance limitation caused by unreliable ∆BIC measurements of the over-fitting cases in DACDec1, we developed an alternative implementation of the divide-and-conquer paradigm, called DACDec2. In this approach (Algorithm 7), the ∆BIC value is not used in the Check termination stage because it may be unreliable, as illustrated in Figures 5.3 and 5.5. Instead, the recursive process terminates (returns) only when the size of the analysis window is smaller than N_min samples. In the Divide stage, the analysis window is partitioned into two sub-windows at the time index t̂ that has the largest ∆BIC value located by OCD-Chen. Then, the sub-windows are input to DACDec2 in the Solve sub-instances stage. In the Combine stage, t̂ is labeled as a change point if the ∆BIC value at t̂ calculated in the Divide stage is positive; otherwise, it needs to be verified using its two neighboring segments X and Y. In the verification process, t̂ is labeled as a change point only if ∆BIC_{X,Y}(t̂) > 0.
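The Divide, Solve sub-instances, and Combine stages of DACDec2 can be sketched as below. As before, this is an illustrative sketch under assumptions rather than the thesis code: features are one-dimensional, `delta_bic` assumes the common Gaussian form of ∆BIC, and `best_split` is a hypothetical stand-in for OCD-Chen that returns the index with the largest ∆BIC regardless of its sign. The parameters `lo`, `hi`, and `margin` are illustrative, and "merging X and Y" is modelled simply by not reporting t̂.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def log_var(x: Sequence[float]) -> float:
    """Log of the maximum-likelihood variance (floored to avoid log(0))."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return math.log(max(var, 1e-12))


def delta_bic(x: Sequence[float], t: int, lam: float) -> float:
    """Assumed 1-D Delta-BIC for splitting x at t; positive favours H1."""
    n = len(x)
    gain = 0.5 * (n * log_var(x) - t * log_var(x[:t]) - (n - t) * log_var(x[t:]))
    return gain - lam * math.log(n)  # 2 extra parameters when d = 1


def best_split(x: Sequence[float], lam: float, margin: int) -> Tuple[int, float]:
    """Stand-in for OCD-Chen: index with the largest Delta-BIC and its value."""
    candidates = range(margin, len(x) - margin + 1)
    best_t = max(candidates, key=lambda t: delta_bic(x, t, lam))
    return best_t, delta_bic(x, best_t, lam)


def dacdec2(x: Sequence[float], lo: int = 0, hi: Optional[int] = None,
            n_min: int = 60, lam: float = 1.0, margin: int = 15) -> List[int]:
    """Change points inside x[lo:hi], as indices into x."""
    if n_min < 2 * margin:
        raise ValueError("n_min must be at least 2 * margin")
    hi = len(x) if hi is None else hi
    # 1. Check termination: window size only, never the Delta-BIC sign.
    if hi - lo < n_min:
        return []
    # 2. Divide at the index with the largest Delta-BIC.
    t_rel, score = best_split(x[lo:hi], lam, margin)
    t_hat = lo + t_rel
    # 3. Solve sub-instances.
    left = dacdec2(x, lo, t_hat, n_min, lam, margin)
    right = dacdec2(x, t_hat, hi, n_min, lam, margin)
    # 4. Combine: verify t_hat against its neighbouring segments X and Y.
    if score <= 0:
        x_start = left[-1] if left else lo   # X = x[x_start:t_hat]
        y_end = right[0] if right else hi    # Y = x[t_hat:y_end]
        if delta_bic(x[x_start:y_end], t_hat - x_start, lam) <= 0:
            return left + right              # X and Y are merged
    return left + [t_hat] + right


# Synthetic stream: Speaker1, Speaker2, Speaker1 again.
rng = random.Random(1)
stream = [rng.gauss(mu, 1.0) for mu in (0.0, 8.0, 0.0) for _ in range(100)]
top_score = best_split(stream, lam=30.0, margin=15)[1]
change_points = dacdec2(stream, lam=30.0)
print(top_score, change_points)
```

Here λ is set deliberately high so that the window-level ∆BIC at the first divide point should come out negative, which would stop a DACDec1-style recursion immediately. Because DACDec2 keeps dividing and verifies that point later against its two neighboring segments, both boundaries (near samples 100 and 200) should still be recovered.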

Figure 5.6 illustrates a recursive tree that simulates the recursive process of DACDec2 on the audio stream in Figure 5.5 (a). We assume that there are no miss or false-alarm errors in the detection process. In the figure, each tree node corresponds to a divide-point (i.e., t̂) in the analysis window; the number inside the node indicates the order of the division, while the number below the node indicates the order in which the divide-point is verified in the Combine stage. In Figure 5.5 (b), Node 1 (C₂) has a negative ∆BIC value in the Divide stage; however, it will be labeled as a change point by the verification process with segments {c, d, e, f} and {g, h, i} in the Combine stage. Node 2 (C₁) has a positive ∆BIC value in the Divide stage; thus, it is labeled as a change point and verification is not necessary. Segments {a} and {b} will be used for verifying Node 3; segments {c, d} and {e, f} will be used for verifying Node 4, and so on.

Figure 5.5: (a) An audio stream comprised of three speech segments; the first and third segments are derived from the same speaker (Speaker1), while the second is derived from another speaker (Speaker2). (b) The ∆BIC curve obtained by applying OCD-Chen to the audio stream in (a). (c) The diagram of the hypothesis test at the change point C₂ in (b). (d) The diagram of the hypothesis test at the non-change point R in (b).

Algorithm 7 CP ← DACDec2(W)
Require: W: the analysis window
Ensure: CP: the set of change points detected in W
Begin
  1. //Check termination
     if (the size of W is smaller than N_min)
       CP ← ∅; //empty set
       goto End; //return
  2. //Divide
     apply OCD-Chen to W and let t̂ be the time index with the largest ∆BIC value;
     divide W into two sub-windows, W₁ and W₂, at t̂;
  3. //Solve sub-instances
     CP_W₁ ← DACDec2(W₁);
     CP_W₂ ← DACDec2(W₂);
  4. //Combine
     if (∆BIC_{W₁,W₂}(t̂) calculated in 2) is positive)
       CP ← {t̂} ∪ CP_W₁ ∪ CP_W₂;
     else
       let X be the segment on the left of t̂ in W₁ and Y be the segment on the right of t̂ in W₂;
       if (∆BIC_{X,Y}(t̂) > 0) //t̂ is a change point
         CP ← {t̂} ∪ CP_W₁ ∪ CP_W₂;
       else //t̂ is not a change point
         merge X and Y;
         CP ← CP_W₁ ∪ CP_W₂;
End

5.2.3 Sequential segmentation by DACDec1 and DACDec2

For a long audio stream, such as a one-hour broadcast news program, the segmentation task becomes computationally intractable when DACDec1 or DACDec2 is applied to the entire stream. Moreover, if the initial analysis window contains too many segments, it may be difficult to find an appropriate λ value with which OCD-Chen obtains robust ∆BIC measurements for the various hypothesis tests in the recursive process. Therefore, in practical applications, we apply DACDec1 and DACDec2 in a large analysis window of fixed size (e.g., 20 seconds) that moves from the beginning to the end of the audio stream to detect the speaker changes sequentially. The proposed sequential segmentation algorithms, SeqDACDec1 and SeqDACDec2, are shown in Figure 5.7. In SeqDACDec1 (or SeqDACDec2), if a change point is detected in the fixed-size analysis window by DACDec1 (or DACDec2), the window is moved to the change point with the largest time index. Otherwise, it is moved forward by ηL samples, where L denotes the window size, and η > 0. Note that a small η will allow a missed change point to be checked again by DACDec1 (or DACDec2) in the subsequent fixed-size analysis window. Like WinGrow, SeqDACDec1 and SeqDACDec2 are suitable for on-line applications.
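The window-moving rule can be sketched as a small driver loop. This is a sketch of the rule as described in the text, not a transcription of Figure 5.7. The `detector` argument stands for any function that runs DACDec1 or DACDec2 on a single window and returns change-point indices relative to that window. The names `seq_dacdec` and `toy_detector` and the handling of the shortened final windows are assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Detector = Callable[[Sequence[float]], List[int]]


def seq_dacdec(stream: Sequence[float], detector: Detector,
               window_size: int, eta: float = 0.5) -> List[int]:
    """Slide a fixed-size window over the stream and collect change points."""
    if window_size <= 0 or eta <= 0:
        raise ValueError("window_size and eta must be positive")
    change_points: List[int] = []
    start = 0
    while start < len(stream):
        window = stream[start:start + window_size]
        # Convert window-relative indices to stream indices; index 0 is the
        # window start itself and is ignored so the loop always advances.
        found = sorted(start + t for t in detector(window) if t > 0)
        if found:
            change_points.extend(found)
            start = found[-1]                        # jump to the last change point
        else:
            start += max(1, int(eta * window_size))  # move forward by eta * L
    return change_points


def toy_detector(window: Sequence[float]) -> List[int]:
    """Hypothetical stand-in detector: reports every index where the value changes."""
    return [i for i in range(1, len(window)) if window[i] != window[i - 1]]


labels = [0.0] * 50 + [1.0] * 50 + [0.0] * 50
boundaries = seq_dacdec(labels, toy_detector, window_size=40, eta=0.5)
print(boundaries)  # prints [50, 100]
```

With η = 0.5, consecutive windows overlap by half when nothing is found, which is what gives a missed change point a second chance in the next window. A larger η trades that safety margin for fewer detector calls.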
