An optimization method for efficiently searching for the optimal parameters of the MMSE-VAD-TRA-NR algorithm has been proposed. To obtain the best NR performance, the method employs simulated annealing (SA) together with an appropriately constructed objective function. We observe that the parameters β and δ need to be chosen carefully because they strongly affect the estimate of the noise spectrum; that is, β and δ are the parameters that most significantly affect the NR performance of the MMSE-VAD-TRA-NR algorithm.
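The parameter search described above can be illustrated with a minimal simulated annealing sketch. It is not the exact procedure of this paper: the objective `toy_objective`, the bounds, the step size, and the temperature schedule are all hypothetical placeholders, and a real run would replace the objective with a score computed from the NR output (for example one built from SNRseg and PESQ).

```python
import math
import random


def simulated_annealing(objective, start, bounds, t0=1.0, tf=1e-3,
                        cooling=0.95, moves_per_temp=20, step=0.1, seed=0):
    """Maximize objective(beta, delta) with a basic simulated annealing loop.

    All default values are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    current = list(start)
    q_current = objective(*current)
    best, q_best = list(current), q_current
    t = t0
    while t > tf:
        for _ in range(moves_per_temp):
            # Propose a neighbour by perturbing each parameter, clipped to its bounds.
            candidate = [
                min(max(x + rng.uniform(-step, step), lo), hi)
                for x, (lo, hi) in zip(current, bounds)
            ]
            q_candidate = objective(*candidate)
            # Metropolis rule: always accept improvements, and accept a worse
            # candidate with probability exp((Q_new - Q_current) / T).
            if (q_candidate > q_current
                    or math.exp((q_candidate - q_current) / t) > rng.random()):
                current, q_current = candidate, q_candidate
                if q_current > q_best:
                    best, q_best = list(current), q_current
        t *= cooling  # geometric cooling: T <- alpha_c * T
    return best, q_best


def toy_objective(beta, delta):
    """Hypothetical stand-in for an NR quality score; its maximum is at (0.6, 0.5)."""
    return -((beta - 0.6) ** 2 + (delta - 0.5) ** 2)


if __name__ == "__main__":
    params, score = simulated_annealing(
        toy_objective, start=(1.6, 1.0), bounds=[(0.0, 2.0), (0.0, 2.0)]
    )
    print("beta, delta:", params, "objective:", score)
```

The sketch tracks the best point seen so far separately from the current point, so a late uphill move that is later abandoned is not lost. Starting from (1.6, 1.0) mirrors the non-optimized settings listed in Table I.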
Several NR algorithms have been compared in terms of computational complexity, objective tests, and subjective listening tests. The processing times reflect the computational and data-processing complexity of these NR algorithms. The results of the objective and subjective tests show not only that the Wiener filtering algorithm leaves more residual noise in order to avoid serious signal distortion, but also that overestimation of the noise leads to the lowest signal distortion (SIG) scores for the KLT-NR algorithm. The subjective listening tests indicate that, for almost all subjective indices, the MMSE-NR, MMSE-TRA-NR and MMSE-VAD-TRA-NR algorithms perform equally well in the white and car noise scenarios. It can therefore be concluded from these comparisons that the MMSE-NR, MMSE-TRA-NR and MMSE-VAD-TRA-NR algorithms are better NR algorithms than the others.
Improving the recognition rate is not the main purpose of a general NR algorithm. Processing by a general NR algorithm enhances the speech and reduces the noise, but the speech is sometimes distorted because the noise reduction is too aggressive. The MMSE-TRA-NR algorithm, however, allows its parameters to be adjusted to improve the recognition rate and to mitigate the trade-off between distortion and noise reduction.
Future research will integrate the noise reduction algorithms with microphone arrays to exploit their full noise-suppression potential in telecommunication applications such as peer-to-peer Internet telephony, hands-free car kits, wireless earphones, and so forth.
TABLE I. NR performance of the MMSE-TRA-NR and MMSE-VAD-TRA-NR algorithms in terms of SNRseg and PESQ for different values of the parameters β and δ.
Noise type    Algorithm          β        δ        SNRseg (dB)   PESQ
White noise   TRA                1.6      1        -1.0942       1.9639
White noise   optimal-TRA        0.6117   0.5214    1.5155       2.1619
White noise   VAD-TRA            1.6      1        -1.1833       1.7369
White noise   optimal-VAD-TRA    0.5671   0.2606    1.5899       2.1582
Car noise     TRA                1.6      1        -1.5609       2.2168
Car noise     optimal-TRA        0.7128   0.5265    0.7061       2.3145
Car noise     VAD-TRA            1.6      1        -1.4524       2.1396
Car noise     optimal-VAD-TRA    0.6896   0.1724    0.7666       2.3219
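The SNRseg figures in Table I can be read against the usual definition of segmental SNR: the per-frame ratio of clean-signal energy to error energy, in dB, averaged over frames. The sketch below is a minimal illustration under stated assumptions. The frame length of 256 samples, the absence of frame overlap, and the clamping limits of -10 dB and 35 dB are common conventions assumed here, not settings confirmed by this paper.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, floor_db=-10.0, ceil_db=35.0):
    """Mean per-frame SNR in dB between a clean reference and a processed signal.

    frame_len, floor_db and ceil_db are assumed defaults, not values from the paper.
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have the same length")
    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        out = enhanced[start:start + frame_len]
        signal_energy = sum(s * s for s in ref)
        error_energy = sum((s - y) ** 2 for s, y in zip(ref, out))
        if signal_energy == 0.0:
            continue  # skip silent frames: their SNR is undefined
        if error_energy == 0.0:
            snr_db = ceil_db  # perfect reconstruction in this frame
        else:
            snr_db = 10.0 * math.log10(signal_energy / error_energy)
        # Clamp so that a few extreme frames do not dominate the average.
        frame_snrs.append(min(max(snr_db, floor_db), ceil_db))
    if not frame_snrs:
        raise ValueError("no usable frames")
    return sum(frame_snrs) / len(frame_snrs)


if __name__ == "__main__":
    clean = [math.sin(0.05 * n) for n in range(1024)]
    scaled = [1.1 * s for s in clean]  # error is 10% of the signal amplitude
    print(round(segmental_snr(clean, scaled), 3))
```

Because the measure averages frame-level values in dB, a negative SNRseg such as those of the non-optimized settings in Table I means that, in a typical frame, the residual error carries more energy than the clean speech.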
TABLE II. MANOVA results of the subjective listening test in the white noise and car noise conditions, comparing the MMSE-TRA-NR and MMSE-VAD-TRA-NR algorithms with and without optimization. Entries are significance values.
Algorithm       Noise type    SIG     BAK     OVL
MMSE-TRA        White noise   0.040   0.000   0.117
MMSE-TRA        Car noise     0.017   0.000   0.784
MMSE-VAD-TRA    White noise   0.042   0.000   0.126
MMSE-VAD-TRA    Car noise     0.015   0.000   0.631
TABLE III. Comparison of computational requirement and objective noise reduction performance of the six noise reduction algorithms.
NR algorithm            SNRseg (White)   SNRseg (Car)   PESQ (White)   PESQ (Car)
Spectral subtraction    2.115            1.450          2.224          2.118
Wiener filtering        0.878            0.073          2.162          2.322
MMSE-NR                 2.215            1.224          2.250          2.394
MMSE-TRA-NR             1.515            0.7061         2.161          2.314
MMSE-VAD-TRA-NR         1.5899           0.7666         2.1582         2.3219
KLT-NR                  3.177            1.856          2.400          2.367
TABLE IV. The MANOVA output of the listening test of the NR algorithms. Cases with a significance value p below 0.05 indicate that a statistically significant difference exists among the methods.
Noise type     SIG     BAK     OVL
White noise    0.007   0.000   0.006
Car noise      0.012   0.000   0.083
TABLE V. Results of Tukey’s HSD test for SIG, BAK and OVL between the NR algorithms. (The NR algorithms marked with asterisks perform equally well; those without asterisks perform poorly.)
Columns: SIG, BAK and OVL, each under the White and Car noise conditions.
Spectral subtraction: * * *
Wiener filtering: * * * *
MMSE-NR: * * * * * *
MMSE-TRA-NR: * * * * *
MMSE-VAD-TRA-NR: * * * * * *
KLT-NR: * * *
FIG. 1. General structure of NR algorithms. (Blocks: noisy signal, forward transform, main NR process, inverse transform, enhanced speech.)
FIG. 2. Block diagram of the filtering problem. (The noisy signal y(n) passes through a linear time-invariant filter with weight vector w = [w_0, w_1, ..., w_{M-1}]^T; the filter output is subtracted from the desired signal d(n) to give the error e(n).)
FIG. 3. The smoothing factor α(λ, k) calculated according to Eq. (40) for different values of the parameter β when δ = 1. (Solid line: β = 5; dashed line: β = 10; dotted line: β = 20.)
FIG. 4. Comparison of VAD and TRA algorithms. Plots of the non-stationary noise (solid line) estimated using the VAD (top panel) and TRA (bottom panel) algorithms from the noisy speech signal (dotted line). (Axes: time (sample) versus amplitude (V).)
FIG. 5. The flow diagram of the SA method. (Flowchart nodes: set the initial temperature T_0, final temperature T_f, cooling factor α_c and objective function; compute the new objective function Q; test Q > Q_opt; test exp((Q − Q_opt)/T) > φ; set Q_opt = Q; cool the temperature by the factor α_c; test T ≤ T_f; stop.)
FIG. 6. The waveforms of a test sentence corrupted by white noise (top panel) and car noise (bottom panel). (Axes: time (sample) versus amplitude (V).)
FIG. 7. The spectrograms of the test sentence corrupted by two noise conditions. (a) White noise. (b) Car noise. (Axes: time versus frequency.)
FIG. 8. The results of the listening test analyzed using MANOVA. (a) MMSE-TRA-NR in the white noise condition. (b) MMSE-TRA-NR in the car noise condition. (c) MMSE-VAD-TRA-NR in the white noise condition. (d) MMSE-VAD-TRA-NR in the car noise condition. (The panels plot the SIG, BAK and OVL grades, on a scale from 0 to 5, for the algorithm without SA and with SA.)
FIG. 9. Simulation results for six NR algorithms. (a) The noise-free speech signal used for the computer simulation. (b) Waveforms of the noisy and processed speech signals via six NR algorithms in the white noise condition. (c) Waveforms of the noisy and processed speech signals via six NR algorithms in the car noise condition. (Dotted line: noisy speech signals; solid line: processed speech signals. Axes: time (sample) versus amplitude (V).)
FIG. 10. The results of the listening test analyzed using MANOVA. (a) White noise case. (b) Car noise case. (Each panel plots the SIG, BAK and OVL grades for KLT-NR, MMSE-NR, MMSE-TRA-NR, spectral subtraction, MMSE-VAD-TRA-NR and Wiener filtering.)
FIG. 11. The recognition rate under different noise conditions and SNR levels.
FIG. 12. The recognition rate in the movie noise condition for different window lengths and SNR levels. (Axes: SNR (dB) versus recognition rate (%); curves: 20 ms, 50 ms, 80 ms and original.)
FIG. 13. Comparison of the recognition rate with and without SA in the babble noise condition (left) and in the movie noise condition (right).
FIG. 14. The recognition rate in the movie noise condition for different processed signal ratios. (Axes: processed signal ratio (%) versus recognition rate (%).)
FIG. 15. Comparison of the recognition rate with and without optimization in the babble noise condition (left) and in the movie noise condition (right).