Chapter 3 Downsample Codebook Search with Modified Codebook
3.4 Classification before Stochastic Codebook Search
So far, we have focused on the downsample method as a way to reduce the computational load. The fundamental problem of the downsample method is that it may select the wrong codebook index. We therefore apply the re-search method as a remedy. However, re-search is not always effective and can be a computational burden. In the “miss” case, where the correct index is not among the candidate indices, the re-search method is not only useless but also induces extra computation. Therefore, we want to know when re-search is effective and when it is useless, so that we apply it only in situations where it can recover the right codebook index, and eliminate the useless computation otherwise.
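The “hit” and “miss” cases can be illustrated with a minimal sketch. It assumes plain decimation (every m-th sample), codewords that are already filtered, the usual CELP match score (x·y)²/(y·y), and a hypothetical candidate count of four; the function names and sizes are illustrative and not taken from the thesis.

```python
import numpy as np

def match_score(target, codewords):
    """CELP-style match score (x.y)^2 / (y.y) for each codeword row y."""
    corr = codewords @ target
    energy = np.sum(codewords ** 2, axis=1)
    return corr ** 2 / energy

def full_search(target, codebook):
    """Reference search at full resolution; returns the best index."""
    return int(np.argmax(match_score(target, codebook)))

def downsample_search(target, codebook, m=2, n_candidates=4):
    """Score on every m-th sample only and keep the top candidate indices."""
    scores = match_score(target[::m], codebook[:, ::m])
    return np.argsort(scores)[::-1][:n_candidates]

def re_search(target, codebook, candidates):
    """Re-evaluate only the candidate indices at full resolution."""
    scores = match_score(target, codebook[candidates])
    return int(candidates[int(np.argmax(scores))])

rng = np.random.default_rng(0)
codebook = rng.standard_normal((512, 60))  # assumed: 512 codewords of 60 samples
target = rng.standard_normal(60)

best = full_search(target, codebook)
candidates = downsample_search(target, codebook, m=2, n_candidates=4)
hit = bool(best in candidates)   # "hit": the true index survived the downsampled search
chosen = re_search(target, codebook, candidates)
```

In a “hit”, `chosen` equals `best`, so the re-search pays off; in a “miss”, the re-search cannot reach `best` and its extra evaluations are wasted.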
Here, we continue the experiment in Figure 3-13. In that experiment, we get
We analyze the spectrum of the target signal $s_{3,k}$ when the situation is “hit” and “miss”.
The results are shown in Figure 3-14.
Figure 3-14 The spectrum of $s_{3,k}$ when “hit” or “miss”.
From Figure 3-14, we can easily see that the magnitude spectrum of the target signal $S^{av}(\omega)$ differs clearly between “hit” and “miss”. The magnitude spectrum of the target signal in “hit”, $S^{av}_{hit}(\omega)$, contains more energy at low frequencies, while $S^{av}_{miss}(\omega)$ is white-noise-like.
That is an important result, because we can tell when the situation tends to be “hit” and apply the re-search method then, and disable it when the situation tends to be “miss”. Thus, by applying the classification we can avoid inducing extra useless computation.
However, why do lowpass-like signals tend to be “hit” after projection, while white-noise-like signals tend to be “miss”? In a later chapter, we repeat the same experiments with dimensions 30 and 20 (downsample two and downsample three), and the result is like Figure 3-13; that is, we get a lowpass signal in the “hit” situation. Besides, the results show a tendency: the more we downsample, the lower in the spectrum the energy is concentrated.
From these experiments, we conjecture that the downsample method can replace the standard search because it effectively compares only the lower part of the spectrum, which contains most of the characteristics of speech. From basic DSP theory, downsampling a signal by $M$ stretches its spectrum by a factor of $M$. Thus, if the aliasing from higher frequencies after downsampling is negligible, comparing the signals after the spectrum expansion amounts to comparing their lowpass parts before downsampling, and the reduction in complexity can be achieved.
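The spectrum expansion and the aliasing it can cause are easy to check numerically. The sketch below assumes plain decimation by M = 2 without an anti-aliasing filter and uses two hypothetical test tones; frequencies are in cycles per sample.

```python
import numpy as np

def dominant_frequency(x):
    """Location of the magnitude-spectrum peak, in cycles per sample."""
    spectrum = np.abs(np.fft.rfft(x))
    return np.argmax(spectrum) / len(x)

n = np.arange(240)
m = 2
low = np.cos(2 * np.pi * 0.05 * n)    # tone below 1/(2*m): survives decimation
high = np.cos(2 * np.pi * 0.40 * n)   # tone above 1/(2*m): aliases

f_low = dominant_frequency(low[::m])    # 0.05 is stretched to 0.05 * m = 0.10
f_high = dominant_frequency(high[::m])  # 0.40 * m = 0.80 folds back to 0.20
```

The low tone keeps its identity after decimation (only its frequency scales by M), whereas the high tone lands on a different, aliased frequency, which is why the comparison after downsampling is reliable mainly for lowpass-like signals.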
However, this conjecture may be hazardous: there is aliasing in the spectrum after downsampling, so the comparison includes the aliased components; moreover, lowpass-like signals do not always lead to “hit”. Strictly speaking, they only result in a higher probability of “hit”. The reasons behind this result therefore need further research.
So far in this section, we have seen that classification can decide whether or not to apply the re-search method. Now, we have to investigate how to implement the classification.
Some systems already apply voiced/unvoiced classification. In most situations, voiced signals are lowpass-like. Thus, we can reuse such a function to do the classification.
However, if there is no voiced/unvoiced classification function in the CELP system, we can classify by comparing the energy of the target signal at lower frequencies with its energy at higher frequencies. If the lower-frequency energy exceeds the higher-frequency energy by more than a specific threshold, we classify the target as a lowpass-like signal and apply the downsample search along with the re-search method in the closed-loop search of this subframe. If the energies do not differ much (the difference is below the threshold), we view the target as a noise-like signal and perform the closed-loop search as in the standard. This is shown in Figure 3-15.
Figure 3-15 Flow chart of classification.
To get the lower-frequency part and the higher-frequency part of $s_{3,k}$, we add one lowpass filter $g_{low}[n]$ and one highpass filter $g_{high}[n]$.
From these two filtered signals, we classify the situation as “hit” if the energy difference is much larger than a threshold $c$. We can implement it as
Here, we adopt the absolute value instead of the square for low complexity.
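A minimal sketch of such a classifier is given below. The exact expression used in the thesis is not reproduced here; the sketch assumes the decision compares the sum of absolute values of the lowpass-filtered target with that of the highpass-filtered target. The 2-tap filters and the threshold value are hypothetical placeholders.

```python
import numpy as np

def is_lowpass_like(target, g_low, g_high, threshold):
    """True when the low-band magnitude exceeds the high-band magnitude by more than threshold."""
    s_low = np.convolve(target, g_low, mode="same")
    s_high = np.convolve(target, g_high, mode="same")
    # Absolute values instead of squares, to avoid extra multiplications.
    difference = np.sum(np.abs(s_low)) - np.sum(np.abs(s_high))
    return bool(difference > threshold)

# Hypothetical filters: a 2-tap averager (lowpass) and a 2-tap differencer (highpass).
g_low = np.array([0.5, 0.5])
g_high = np.array([0.5, -0.5])

n = np.arange(60)
smooth = np.cos(2 * np.pi * 0.02 * n)    # slowly varying, lowpass-like target
alternating = (-1.0) ** n                # energy concentrated at the highest frequency

use_re_search = is_lowpass_like(smooth, g_low, g_high, threshold=5.0)
use_standard = not is_lowpass_like(alternating, g_low, g_high, threshold=5.0)
```

A lowpass-like target would then go through the downsample search with re-search, and any other target through the standard closed-loop search.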
Assume that the filters $g_{low}[n]$ and $g_{high}[n]$ are of order $G$. We then add $2GL$ multiplications and $2L$ additions per loop to perform the classification. With a reasonable choice of the filter order $G$, this computational load is much smaller than that of the closed-loop search.
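To give a feel for the scale, the counts of 2GL multiplications and 2L additions can be evaluated for sample values. The numbers below are assumptions for illustration only: a filter order of 8, a subframe of 60 samples, and a 512-entry codebook, with the search cost taken as one correlation per codeword and ignoring filtering and any fast-search structure.

```python
def classification_cost(filter_order, subframe_length):
    """Return (multiplications, additions) for the two-filter classification."""
    multiplications = 2 * filter_order * subframe_length
    additions = 2 * subframe_length
    return multiplications, additions

def naive_search_multiplications(codebook_size, subframe_length):
    """Rough count: one length-L correlation per codeword, nothing else."""
    return codebook_size * subframe_length

G, L, N = 8, 60, 512   # assumed filter order, subframe length, codebook size
mults, adds = classification_cost(G, L)
search_mults = naive_search_multiplications(N, L)
ratio = mults / search_mults   # classification cost relative to the naive search
```

Under these assumptions the classification costs a few percent of the correlation work of a full search, so a saved re-search in the “miss” case easily outweighs it.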
The computational load of the methods mentioned above is summarized in Table 3-2.
Table 3-2 Computational load of classification, standard search, and downsample search along with re-search.
Chapter 4
Simulation Results
In this chapter, we present the simulation results and listening tests for the methods proposed in this thesis. All simulations are based on the FS1016 standard with some required modifications.
In Section 4.1, downsampling by two, three and four is applied. In Section 4.2, the different methods are compared in terms of signal-to-noise ratio (SNR), listening test and computational load.
4.1 Comparison in Downsample two, three and four
In Chapter 3, the hit rate of different points is shown in Figure 3-2. Here, we present only downsampling by two, three and four. As mentioned above, the more we downsample the target vectors and candidate vectors, the more computation we save; however, the more speech quality we sacrifice.
Here, we focus on the distribution in rank and on the spectrum of the target signal $s_{3,k}$ when the situation is “hit”.
4.1.1 Downsample Two
Figure 4-2 The spectrum of $s_{3,k}$ when “hit” or “miss” in downsampling 2. Panels: the spectrum of target signal when hit; the spectrum of target signal when miss.
4.1.2 Downsample Three
[Figure panels: spectrum of target signals when hit; spectrum of target signals when miss]
4.1.3 Downsample Four
Figure 4-6 The spectrum of $s_{3,k}$ when “hit” or “miss” in downsampling 4. Panels: spectrum of target signal when hit; spectrum of target signal when miss.
4.1.4 Discussion
From the above results, we can see that the hit rate decreases as the downsample level increases. Besides, in the cases of downsample three and downsample four, the differences between the spectra of the target signals for “hit” and “miss” are not as apparent as in downsample two. This means that we have to use sharper filters during classification.
4.2 Comparison between different proposed methods
In this section, the different methods are compared. We design four situations (A, B, C, D) for the comparison.
In situation A, all computations are the same as in standard FS1016. In situation B, the closed-loop search in the stochastic codebook part is downsampled by two. In situation C, the modified codebook with downsample two is used, and in situation D, classification with the re-search method is added to situation B. This is summarized in Table 4-1.
Table 4-1 Applied methods in the different situations (○ = applied, - = not applied).
Situation   Downsample by 2   Modified codebook   Classification   Re-search
A           -                 -                   -                -
B           ○                 -                   -                -
C           ○                 ○                   -                -
D           ○                 -                   ○                ○
Speech samples from three males (m1, m2, m3) and three females (f1, f2, f3) are used. The results are shown in Figure 4-7 and Table 4-2.
Figure 4-7 The SNR in different methods.
Table 4-2 The SNR(dB) in different methods.
As shown in Figure 4-7, situation B suffers a little degradation, while situation D improves the speech quality degraded by downsampling.
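For reference, an SNR figure of this kind can be computed as sketched below. The sketch assumes an overall (whole-utterance) SNR between the original and the decoded speech; whether the reported values are overall or segmental is not stated here, and the sample signals are hypothetical.

```python
import numpy as np

def snr_db(reference, decoded):
    """Overall SNR in dB: signal energy over the energy of the coding error."""
    reference = np.asarray(reference, dtype=float)
    decoded = np.asarray(decoded, dtype=float)
    noise = reference - decoded
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

# Hypothetical check: an error of 10% of the amplitude corresponds to 20 dB.
original = np.sin(2 * np.pi * 0.05 * np.arange(160))
example_snr = snr_db(original, 0.9 * original)
```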
Here, we use the comparison category rating (CCR) to evaluate the subjective speech quality of these methods [3]. Twenty listeners write down their scores in comparison with situation A (FS1016 only) using the scale shown in Table 4-3.
The average scores are shown in Table 4-4.
Table 4-3 CCR scale.
Comparison        Score
Much better       +3
Better            +2
Slightly better   +1
About the same     0
Slightly worse    -1
Worse             -2
Much worse        -3
Table 4-4 The averaged CCR scores in comparison with situation A (standard).
Speech sample   Situation B   Situation C   Situation D
m1               0            -0.1           0
m2               0            -0.1           0
m3              -0.1           0             0
f1               0             0             0
f2               0            -0.1           0
f3               0             0             0
We can see that the subjective speech quality decreases slightly due to downsampling by 2. Moreover, the quality becomes worse when the modified stochastic codebook is applied. However, listeners often cannot tell the difference between situation A and situation D.
Besides, the computational load is shown in Figure 4-8.
Figure 4-8 The computational load in closed-loop search.
In Figure 4-8, we consider only the multiplications in the closed-loop search. From this figure, we can easily compare the computational load of the different methods we propose.
Here, we show more detailed information about the methods. In Figure 4-9, the hit rates of situations B and D are shown.
Figure 4-9 The probability of hit in situation B and D.
Chapter 5
Conclusion and Future work
In this thesis, the proposed downsample method with the modified codebook saves a lot of computation compared to the standard. Besides, the quality issue of these methods has been considered and addressed. The re-search and classification scheme that we propose as a remedy for the downsample method raises the quality to nearly that of the standard FS1016, while the additional computational load is low. To sum up, we can fairly say that by applying these methods the total computational load can be reduced at the expense of imperceptible degradation.
However, the principle behind the downsample method is not yet very clear. Thus, the research is limited to statistical analysis. Besides, these techniques have not yet been verified on other codebook structures. Thus, there is still much room that is worth investigating.
Bibliography
[1] Wai C. Chu, “Speech coding algorithms,” John Wiley & Sons, 2003.
[2] Masahiro Serizawa, Hironori Ito and Toshiyuki Nomura, “A silence compression algorithm for multi-rate/dual-bandwidth MPEG-4 CELP standard,” IEEE, 2000.
[3] Lucio Martins da Silva and Abraham Alcaim, “A modified CELP model with computationally efficient adaptive codebook search,” IEEE, 1995.