In this section, we present two implementations of the divide-and-conquer paradigm for detecting multiple change points in an analysis window. Note that the proposed approaches are based on the same assumption as WinGrow, i.e., the feature vectors of audio segments from different acoustic sources are derived from different Gaussian distributions.
5.2.1 The DACDec1 approach
We use the example in Figure 5.3 to explain the concept of divide-and-conquer-based segmentation. It is assumed that the audio stream in Figure 5.3 (a) consists of three homogeneous segments derived from different speakers. Initially, OCD-Chen is applied in an analysis window that covers the entire audio stream. After the change point C2 has been detected with the ∆BIC curve in Figure 5.3 (b), the audio stream is divided into two analysis windows. Then, OCD-Chen is recursively applied in these two windows to search for the remaining change points so that C1 can be detected. This approach, called DACDec1, allows us to detect the change points by a divide-and-conquer (DAC) strategy.
Figure 5.3: (a) An audio stream comprised of three speech segments, each derived from a distinct speaker. C1 and C2 are the change points. (b) The ∆BIC curve obtained by applying OCD-Chen to the audio stream in (a).
As described in Algorithm 6, DACDec1 terminates (returns) if no change point is detected by OCD-Chen in the analysis window or the size of the analysis window is smaller than a pre-defined value, denoted as Nmin samples. In the Divide stage, the analysis window is partitioned into two sub-windows at the change point detected by OCD-Chen. Then, the sub-windows are input to DACDec1 in the Solve sub-instances stage. Finally, the Combine stage outputs all the change points detected in step 1) and step 4) (i.e., the Solve sub-instances stage).
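The recursion of Algorithm 6 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the analysis window is represented by a half-open index range [lo, hi), `detect` is a hypothetical callback standing in for OCD-Chen (it returns a change-point index strictly inside the window, or None), and `make_oracle` is a toy detector built from known boundaries, used only to exercise the control flow.

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for OCD-Chen: given a window [lo, hi), return the
# index of a detected change point strictly inside it, or None.
Detector = Callable[[int, int], Optional[int]]


def dacdec1(lo: int, hi: int, detect: Detector, n_min: int) -> List[int]:
    """Divide-and-conquer change-point detection on the window [lo, hi)."""
    # Step 1: try to detect one change point in the window.
    t_hat = detect(lo, hi)
    # Step 2: check termination (nothing detected, or the window is too small).
    if t_hat is None or hi - lo < n_min:
        return []
    # Steps 3-4: divide at t_hat and solve both sub-windows recursively.
    left = dacdec1(lo, t_hat, detect, n_min)
    right = dacdec1(t_hat, hi, detect, n_min)
    # Step 5: combine; the result is in increasing time order.
    return left + [t_hat] + right


def make_oracle(boundaries: List[int]) -> Detector:
    """Toy detector (an assumption for this demo): it knows the true boundaries."""
    def detect(lo: int, hi: int) -> Optional[int]:
        inside = [b for b in boundaries if lo < b < hi]
        if not inside:
            return None
        # Report the boundary closest to the window centre.
        return min(inside, key=lambda b: abs(b - (lo + hi) / 2))
    return detect


change_points = dacdec1(0, 300, make_oracle([100, 250]), n_min=20)
print(change_points)  # [100, 250]
```

Because `detect` only returns indices strictly inside the window, both sub-windows are smaller than their parent and the recursion always terminates.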
Discussion: In general, when the data samples are derived from more than one Gaussian distribution, two Gaussians (the H1 hypothesis) fit the distribution of the data better than one Gaussian (the H0 hypothesis) if the samples belonging to the same Gaussian are used together to estimate the parameters. For example, Figure 5.4 schematically illustrates a case where the three audio segments are derived from three different speakers and their feature vectors are distributed as three Gaussian clusters. This case explains why the ∆BIC values at C1 and C2 in Figure 5.3 (b) are positive. From the above perspective, if the homogeneous segments in the analysis window of DACDec1 are always derived from different speakers during the recursive process, we can be confident that, at each change point, the H1 hypothesis will fit the data better than the H0 hypothesis; thus, the ∆BIC value will be positive.
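For reference, the hypothesis test discussed above can be written out explicitly. The form below is the standard BIC-based test of Chen and Gopalakrishnan, which OCD-Chen is assumed to follow here. The symbols are notation introduced for this sketch: N is the number of feature vectors in the analysis window, d is their dimension, t is the candidate split point, Σ is the sample covariance of the whole window (H0), and Σ1 and Σ2 are the sample covariances of the parts before and after t (H1).

```latex
\Delta\mathrm{BIC}(t) = \frac{N}{2}\log|\Sigma| - \frac{t}{2}\log|\Sigma_1| - \frac{N-t}{2}\log|\Sigma_2| - \frac{\lambda}{2}\left(d + \frac{d(d+1)}{2}\right)\log N
```

The first three terms measure how much better two Gaussians fit the window than one, and the last term penalizes the extra parameters of H1, weighted by λ. A positive value favors H1, i.e., a change point at t.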
However, if two or more segments in the analysis window are derived from the same speaker, the performance of DACDec1 may decline dramatically. For example, in Figure 5.5 (a), the first and third segments are derived from the same speaker (Speaker1), while the second segment is derived from another speaker (Speaker2). When applying OCD-Chen to the audio stream in Figure 5.5 (a) with the same λ value of BIC used in the example in Figure 5.3, we obtain the ∆BIC curve in Figure 5.5 (b). The curve still has two peaks at the change points C1 and C2 because the H1 hypothesis models the distribution of the data samples better at change points than it does at non-change points.
We use Figures 5.5 (c) and (d) to explain this perspective. Figure 5.5 (c) diagrammatically illustrates the two hypotheses at C2, where all the data samples of Speaker2 (the circles) are used with those of Speaker1 (the stars) to estimate one Gaussian in H1. In contrast, at the non-change point R in Figure 5.5 (b), as shown in Figure 5.5 (d), the data samples of Speaker2 are divided into two parts, each of which is combined with the data samples of Speaker1 (one with the stars and the other with the diamonds) to estimate a distinct Gaussian in H1. Clearly, the H1 hypothesis in Figure 5.5 (c) fits the data better than that in Figure 5.5 (d).
Algorithm 6 CP ← DACDec1(W)
Require: W: the analysis window
Ensure: CP: the set of change points detected in W
Begin
1. detect whether there is a change point in W by OCD-Chen;
2. //Check termination
   if (there is no change point in W or the size of W is smaller than Nmin)
      CP ← φ; //empty set
      goto End; //return
3. //Divide
   let t̂ be the change point detected in 1);
   divide W into two sub-windows, W1 and W2, at t̂;
4. //Solve sub-instances
   CPW1 ← DACDec1(W1);
   CPW2 ← DACDec1(W2);
5. //Combine
   CP ← t̂ ∪ CPW1 ∪ CPW2;
End
In this example, we have peaks at C1 and C2. However, their ∆BIC values are negative, and no change point will be output by OCD-Chen because, as illustrated in Figure 5.5 (c), H1 over-fits the data samples of Speaker1 and obtains a smaller BIC value than that of H0. We may adjust the value of λ so that, at C2, the ∆BIC value will be positive (i.e., the hypothesis test favors H1). However, this may result in false alarms when the recursive process continues to detect change points in a homogeneous segment.
In other words, it is difficult to determine a reliable λ value for an audio stream like the example in Figure 5.5 (a). Moreover, it is infeasible to adjust the value of λ for each specific audio stream in practical applications.
5.2.2 The DACDec2 approach
To overcome the performance limitation caused by unreliable ∆BIC measurements of the over-fitting cases in DACDec1, we developed an alternative implementation of the divide-and-conquer paradigm, called DACDec2. In this approach (Algorithm 7), the ∆BIC value is not used in the Check termination stage because it may be unreliable, as illustrated in Figures 5.3 and 5.5. The recursive process terminates (returns) when the size of the analysis window is smaller than Nmin samples. In the Divide stage, the analysis window is partitioned into two sub-windows at the time index t̂ that has the largest ∆BIC value located by OCD-Chen. Then, the sub-windows are input to DACDec2 in the Solve sub-instances stage. In the Combine stage, t̂ is labeled as a change point if the ∆BIC value at t̂ calculated in the Divide stage is positive; otherwise, it needs to be verified using its two neighboring segments X and Y. In the verification process, t̂ is only labeled as a change point if ∆BIC{X,Y}(t̂) > 0.
Figure 5.4: An illustration of data samples distributed as three Gaussian clusters. For this case, generally, two Gaussians (H1) fit the distribution of the data better than one Gaussian (H0) if the samples belonging to the same Gaussian cluster are used together to estimate the parameters.
Figure 5.6 illustrates a recursive tree that simulates the recursive process of DACDec2 on the audio stream in Figure 5.5 (a). We assume that there are no missed-detection or false-alarm errors in the detection process. In the figure, each tree node corresponds to a divide-point (i.e., t̂) in the analysis window; the number inside the node indicates the order of the division, while the number below the node indicates the order in which the divide-point is verified in the Combine stage. As shown in Figure 5.5 (b), Node 1 (C2) has a negative ∆BIC value in the Divide stage; however, it will be labeled as a change point by the verification process with segments {c, d, e, f} and {g, h, i} in the Combine stage. Node 2 (C1) has a positive ∆BIC value in the Divide stage; thus, it is labeled as a change point and verification is not necessary. Segments {a} and {b} will be used for verifying Node 3; segments {c, d} and {e, f} will be used for verifying Node 4, and so on.
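The Divide and Combine stages of Algorithm 7 can be sketched in Python as follows. This is a simplified illustration under loud assumptions, not the authors' implementation: features are one-dimensional, `delta_bic` uses a single univariate Gaussian per segment (so the penalty reduces to λ log N), `best_split` is a hypothetical stand-in for OCD-Chen, `margin` is an added parameter that keeps a few samples on each side of a candidate split, and `tone` generates a deterministic toy stream rather than real audio features.

```python
import math
from typing import List, Sequence, Tuple


def delta_bic(x: Sequence[float], lo: int, t: int, hi: int, lam: float) -> float:
    """Delta-BIC for splitting x[lo:hi] at t, with one 1-D Gaussian per part."""
    def n_log_var(a: int, b: int) -> float:
        seg = x[a:b]
        mean = sum(seg) / len(seg)
        var = sum((v - mean) ** 2 for v in seg) / len(seg)
        return len(seg) * math.log(max(var, 1e-12))  # floor avoids log(0)

    gain = 0.5 * (n_log_var(lo, hi) - n_log_var(lo, t) - n_log_var(t, hi))
    # H1 has one extra mean and one extra variance: penalty = lam * log(N).
    return gain - lam * math.log(hi - lo)


def best_split(x: Sequence[float], lo: int, hi: int,
               lam: float, margin: int) -> Tuple[int, float]:
    """Stand-in for OCD-Chen: the split of [lo, hi) with the largest Delta-BIC."""
    scores = [(delta_bic(x, lo, t, hi, lam), t)
              for t in range(lo + margin, hi - margin + 1)]
    score, t_hat = max(scores, key=lambda s: s[0])
    return t_hat, score


def dacdec2(x: Sequence[float], lo: int, hi: int, lam: float = 1.0,
            n_min: int = 20, margin: int = 5) -> List[int]:
    # 1. Check termination: only the window size is tested, not Delta-BIC.
    if hi - lo < max(n_min, 2 * margin):
        return []
    # 2. Divide at the largest Delta-BIC, whatever its sign.
    t_hat, score = best_split(x, lo, hi, lam, margin)
    # 3. Solve sub-instances.
    cp_left = dacdec2(x, lo, t_hat, lam, n_min, margin)
    cp_right = dacdec2(x, t_hat, hi, lam, n_min, margin)
    # 4. Combine: if the Divide-stage value is not positive, verify t_hat
    #    against its neighbouring segments X and Y.
    if score <= 0:
        x_start = cp_left[-1] if cp_left else lo   # X = [x_start, t_hat)
        y_end = cp_right[0] if cp_right else hi    # Y = [t_hat, y_end)
        if delta_bic(x, x_start, t_hat, y_end, lam) <= 0:
            return cp_left + cp_right              # rejected: X and Y merge
    return cp_left + [t_hat] + cp_right


def tone(mean: float, start: int, stop: int) -> List[float]:
    """Deterministic toy 'speaker': values alternate around a mean (variance 1)."""
    return [mean + (1.0 if i % 2 == 0 else -1.0) for i in range(start, stop)]


# Speaker1, Speaker2, Speaker1 with change points at 120 and 200.
stream = tone(0.0, 0, 120) + tone(5.0, 120, 200) + tone(0.0, 200, 300)
print(dacdec2(stream, 0, len(stream)))            # [120, 200]
print(dacdec2(stream, 0, len(stream), lam=20.0))  # [120, 200]
```

The second call uses an exaggerated penalty weight, chosen only so that this 1-D toy reproduces the situation in Figure 5.5: with lam=20.0 the ∆BIC at the first divide-point (index 120) is negative over the whole stream, so a termination test based on that value would stop and report nothing. Here the point is still recovered because it is re-tested against its two neighboring segments only.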
Figure 5.5: (a) An audio stream comprised of three speech segments; the first and third segments are derived from the same speaker (Speaker1), while the second is derived from another speaker (Speaker2). (b) The ∆BIC curve obtained by applying OCD-Chen to the audio stream in (a). (c) The diagram of the hypothesis test at the change point C2 in (b). (d) The diagram of the hypothesis test at the non-change point R in (b).
5.2.3 Sequential segmentation by DACDec1 and DACDec2
Algorithm 7 CP ← DACDec2(W)
Require: W: the analysis window
Ensure: CP: the set of change points detected in W
Begin
1. //Check termination
   if (the size of W is smaller than Nmin)
      CP ← φ; //empty set
      goto End; //return
2. //Divide
   apply OCD-Chen to W and let t̂ be the time index with the largest ∆BIC value;
   divide W into two sub-windows, W1 and W2, at t̂;
3. //Solve sub-instances
   CPW1 ← DACDec2(W1);
   CPW2 ← DACDec2(W2);
4. //Combine
   if (∆BIC{W1,W2}(t̂) calculated in 2) is positive)
      CP ← t̂ ∪ CPW1 ∪ CPW2;
   else
      let X be the segment on the left of t̂ in W1 and Y be the segment on the right of t̂ in W2;
      if (∆BIC{X,Y}(t̂) > 0) //t̂ is a change point
         CP ← t̂ ∪ CPW1 ∪ CPW2;
      else //t̂ is not a change point
         merge X and Y;
         CP ← CPW1 ∪ CPW2;
End
For a long audio stream, such as a one-hour broadcast news program, the segmentation task becomes computationally intractable when DACDec1 or DACDec2 is used to detect change points. Moreover, if the initial analysis window contains too many segments, it may be difficult for OCD-Chen to find an appropriate λ value to obtain robust ∆BIC measurements for the various hypothesis tests in the recursive process. Therefore, in practical applications, we apply DACDec1 and DACDec2 in a large analysis window of fixed size (e.g., 20 seconds) that moves from the beginning to the end of the audio stream to detect the speaker changes sequentially. The proposed sequential segmentation algorithms, SeqDACDec1 and SeqDACDec2, are shown in Figure 5.7. In SeqDACDec1 (or SeqDACDec2), if a change point is detected in the fixed-size analysis window by DACDec1 (or DACDec2), the window is moved to the change point with the largest time index. Otherwise, it is moved forward by ηL samples, where L denotes the window size, and η > 0. Note that a small η will allow a missed change point to be checked again by DACDec1 (or DACDec2) in the subsequent fixed-size analysis window. Like WinGrow, SeqDACDec1 and SeqDACDec2 are suitable for on-line applications.
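The window-moving rule of SeqDACDec1 and SeqDACDec2 described in this subsection can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `detect_window` is a hypothetical callback standing in for DACDec1 or DACDec2 applied to the window [lo, hi), `win_len` and `eta` correspond to L and η, and the toy detector in the demo simply knows the true boundaries.

```python
from typing import Callable, List

# Hypothetical stand-in for DACDec1/DACDec2: returns the change points found
# inside the window [lo, hi).
WindowDetector = Callable[[int, int], List[int]]


def seq_segment(n_total: int, win_len: int, eta: float,
                detect_window: WindowDetector) -> List[int]:
    """Slide a fixed-size analysis window over a stream of n_total samples."""
    step = max(1, int(eta * win_len))  # eta * L samples, at least one
    found: List[int] = []
    start = 0
    while start < n_total:
        end = min(start + win_len, n_total)
        # Keep only points strictly inside the window so the loop always advances.
        cps = sorted(t for t in detect_window(start, end) if start < t < end)
        if cps:
            found.extend(cps)
            start = cps[-1]    # move to the change point with the largest index
        else:
            start += step      # nothing found: move forward by eta * L
    return found


# Toy detector (an assumption for this demo): it knows the true boundaries.
true_boundaries = [15, 37, 80]

def oracle(lo: int, hi: int) -> List[int]:
    return [b for b in true_boundaries if lo < b < hi]


result = seq_segment(n_total=100, win_len=20, eta=0.5, detect_window=oracle)
print(result)  # [15, 37, 80]
```

With eta=0.5, consecutive windows overlap by half when no change point is found, so a point near the end of one window is examined again in the next; this is the re-check that a small η provides.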