CONCLUSIONS AND FUTURE WORK

An optimization method that efficiently searches for the optimal parameters of the MMSE-VAD-TRA-NR algorithm has been proposed. To obtain optimal NR performance, the method employs simulated annealing (SA) with an appropriately constructed objective function. We observe that the parameters β and δ must be chosen carefully because they strongly affect the estimate of the noise spectrum; that is, β and δ are the parameters with the greatest influence on the NR performance of the MMSE-VAD-TRA-NR algorithm.

Several NR algorithms have been compared in terms of computational complexity, objective tests, and subjective listening tests. The processing times reflect the computational and data-processing complexity of the algorithms. The objective and subjective test results indicate that the Wiener filtering algorithm leaves more residual noise in order to avoid serious signal distortion, and that the overestimation of noise in the KLT-NR algorithm results in the lowest signal distortion (SIG) scores. The subjective listening tests indicate that, for nearly all subjective indices, the MMSE-NR, MMSE-TRA-NR, and MMSE-VAD-TRA-NR algorithms perform equally well in the white and car noise scenarios. It can therefore be concluded from these comparisons that the MMSE-NR, MMSE-TRA-NR, and MMSE-VAD-TRA-NR algorithms are better NR algorithms than the others.

Enhancing the recognition rate is not the main purpose of a general NR algorithm. After processing by a general NR algorithm, the speech is enhanced and the noise is reduced, but the speech is sometimes distorted because the noise reduction is too aggressive. The MMSE-TRA-NR algorithm, however, can adjust its parameters to enhance the recognition rate and to mitigate the trade-off between distortion and noise reduction.

Future research is planned on integrating the noise reduction algorithms with microphone arrays to exploit their full potential for noise suppression in telecommunication applications such as peer-to-peer Internet telephony, hands-free car kits, and wireless earphones.


TABLE I. NR performance of the MMSE-TRA-NR and MMSE-VAD-TRA-NR algorithms in terms of SNRseg and PESQ for different values of the parameters β and δ.

Noise type    Algorithm          β        δ        SNRseg     PESQ
White noise   TRA                1.6      1        -1.0942    1.9639
White noise   optimal-TRA        0.6117   0.5214    1.5155    2.1619
White noise   VAD-TRA            1.6      1        -1.1833    1.7369
White noise   optimal-VAD-TRA    0.5671   0.2606    1.5899    2.1582
Car noise     TRA                1.6      1        -1.5609    2.2168
Car noise     optimal-TRA        0.7128   0.5265    0.7061    2.3145
Car noise     VAD-TRA            1.6      1        -1.4524    2.1396
Car noise     optimal-VAD-TRA    0.6896   0.1724    0.7666    2.3219
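The SNRseg values in Table I are segmental signal-to-noise ratios. As a rough illustration of how such a figure can be computed, the sketch below averages per-frame SNRs between a clean reference and a processed signal. The frame length, the clamping limits of -10 dB and 35 dB, and the function name `segmental_snr` are assumptions made for this example; the exact definition used for Table I may differ.

```python
import numpy as np

def segmental_snr(clean, enhanced, frame_len=256, floor_db=-10.0, ceil_db=35.0):
    """Mean per-frame SNR in dB between a clean reference and a processed signal.

    frame_len, floor_db and ceil_db are assumed values, not taken from the paper.
    """
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    n_frames = min(len(clean), len(enhanced)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")
    eps = 1e-12  # avoids division by zero and log of zero
    values = []
    for m in range(n_frames):
        seg = slice(m * frame_len, (m + 1) * frame_len)
        signal_energy = np.sum(clean[seg] ** 2)
        error_energy = np.sum((clean[seg] - enhanced[seg]) ** 2)
        snr_db = 10.0 * np.log10((signal_energy + eps) / (error_energy + eps))
        # Clamp each frame so silent or perfectly matched frames do not dominate.
        values.append(np.clip(snr_db, floor_db, ceil_db))
    return float(np.mean(values))

# Synthetic demo: a 200 Hz tone plus white noise (not the paper's test material).
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 200 * np.arange(8000) / 8000.0)
noisy = clean + 0.1 * rng.standard_normal(clean.size)
score = segmental_snr(clean, noisy)
```

Clamping each frame before averaging is a common design choice because a few near-silent frames would otherwise pull the mean far below the perceived quality.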

TABLE II. MANOVA results of the subjective listening tests in white noise and car noise conditions, comparing MMSE-TRA-NR and MMSE-VAD-TRA-NR with and without optimization.

                               Significance value
Algorithm       Noise type     SIG      BAK      OVL
MMSE-TRA        White noise    0.040    0.000    0.117
MMSE-TRA        Car noise      0.017    0.000    0.784
MMSE-VAD-TRA    White noise    0.042    0.000    0.126
MMSE-VAD-TRA    Car noise      0.015    0.000    0.631

TABLE III. Comparison of computational requirement and objective noise reduction performance of the six noise reduction algorithms.

                        SNRseg              PESQ
NR algorithm            White     Car       White     Car
Spectral subtraction    2.115     1.450     2.224     2.118
Wiener filtering        0.878     0.073     2.162     2.322
MMSE-NR                 2.215     1.224     2.250     2.394
MMSE-TRA-NR             1.515     0.7061    2.161     2.314
MMSE-VAD-TRA-NR         1.5899    0.7666    2.1582    2.3219
KLT-NR                  3.177     1.856     2.400     2.367

TABLE IV. The MANOVA output of the listening test of the NR algorithms. Cases with a significance value p below 0.05 indicate that a statistically significant difference exists among the methods.

               Significance value p
Noise type     SIG      BAK      OVL
White noise    0.007    0.000    0.006
Car noise      0.012    0.000    0.083
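The decision rule in the Table IV caption can be applied mechanically. The short sketch below copies the p-values from Table IV and lists, for each noise type, the subjective indices whose p-value falls below 0.05. The function name `significant_indices` is hypothetical and serves only this illustration.

```python
ALPHA = 0.05  # significance threshold stated in the Table IV caption

# p-values copied from Table IV
p_values = {
    "White noise": {"SIG": 0.007, "BAK": 0.000, "OVL": 0.006},
    "Car noise": {"SIG": 0.012, "BAK": 0.000, "OVL": 0.083},
}

def significant_indices(table, alpha=ALPHA):
    """Return, per noise type, the indices with p strictly below alpha."""
    return {
        noise: [index for index, p in row.items() if p < alpha]
        for noise, row in table.items()
    }

result = significant_indices(p_values)
for noise, indices in result.items():
    print(noise, "->", ", ".join(indices))
```

With these values, all three indices are significant in white noise, whereas OVL is not significant in car noise (p = 0.083).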

TABLE V. Results of Tukey's HSD test for SIG, BAK, and OVL across the NR algorithms. (Algorithms marked with asterisks perform equally well; algorithms without asterisks perform poorly.)

Columns: SIG (White, Car), BAK (White, Car), OVL (White, Car)

NR algorithm            Asterisks
Spectral subtraction    * * *
Wiener filtering        * * * *
MMSE-NR                 * * * * * *
MMSE-TRA-NR             * * * * *
MMSE-VAD-TRA-NR         * * * * * *
KLT-NR                  * * *

FIG. 1. General structure of NR algorithms.

[Block diagram: noisy signal, forward transform, main NR process, inverse transform, enhanced speech.]
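The structure in Fig. 1 (forward transform, main NR process, inverse transform) can be sketched as a frame-based analysis, modification, and synthesis loop. The sketch below is only an illustration under stated assumptions: the FFT is taken as the forward transform, square-root Hann windows with 50% overlap are used for analysis and synthesis, the frame length is 256 samples, and `gain_fn` is a hypothetical placeholder for whatever gain rule the main NR process applies.

```python
import numpy as np

def enhance(noisy, gain_fn, frame_len=256):
    """Forward transform -> per-bin gain -> inverse transform with overlap-add.

    gain_fn is a hypothetical callback: it receives one frame's spectrum and
    returns a gain per frequency bin. frame_len must be even.
    """
    noisy = np.asarray(noisy, dtype=float)
    hop = frame_len // 2
    n = np.arange(frame_len)
    # Square-root periodic Hann: analysis window times synthesis window sums
    # to one across frames at 50% overlap.
    window = np.sqrt(0.5 - 0.5 * np.cos(2.0 * np.pi * n / frame_len))
    n_frames = int(np.ceil(len(noisy) / hop)) + 1
    # Zero-pad by one hop on the left so every sample is covered by two frames.
    padded = np.zeros((n_frames + 1) * hop)
    padded[hop:hop + len(noisy)] = noisy
    out = np.zeros_like(padded)
    for m in range(n_frames):
        start = m * hop
        frame = padded[start:start + frame_len] * window
        spectrum = np.fft.rfft(frame)              # forward transform
        spectrum = gain_fn(spectrum) * spectrum    # main NR process (placeholder)
        restored = np.fft.irfft(spectrum, n=frame_len)  # inverse transform
        out[start:start + frame_len] += restored * window
    return out[hop:hop + len(noisy)]

# With a unit gain the pipeline returns the input unchanged.
x = np.random.default_rng(0).standard_normal(1000)
y = enhance(x, lambda s: np.ones(s.shape))
```

Checking that a unit gain reproduces the input is a useful sanity test before plugging in any actual gain rule, since it separates windowing errors from errors in the NR process itself.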

FIG. 2. Block diagram of the filtering problem.

[Block diagram: the noisy signal y(n) passes through a linear time-invariant filter w = [w_0, w_1, ..., w_{M-1}] to give the output, which is subtracted from the desired signal d(n) to form the error e(n).]
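The filtering problem in Fig. 2 amounts to choosing the M coefficients of w so that the error e(n) between the desired signal d(n) and the filter output is small. The sketch below solves a sample-based least-squares version of this problem with NumPy. It is an illustration only: the function name `fir_wiener`, the zero initial conditions, and the three-tap demo filter are assumptions, and the least-squares fit stands in for the statistical Wiener solution.

```python
import numpy as np

def fir_wiener(y, d, num_taps):
    """Least-squares FIR filter: minimise sum_n e(n)^2 with
    e(n) = d(n) - sum_k w_k * y(n - k), k = 0..num_taps-1."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    # Column k holds y delayed by k samples (zeros before the signal starts),
    # so row n is [y(n), y(n-1), ..., y(n-M+1)].
    Y = np.column_stack(
        [np.concatenate([np.zeros(k), y[:len(y) - k]]) for k in range(num_taps)]
    )
    w, *_ = np.linalg.lstsq(Y, d, rcond=None)
    error = d - Y @ w
    return w, error

# Demo with a hypothetical three-tap system used only for this example.
rng = np.random.default_rng(0)
y = rng.standard_normal(500)
true_w = np.array([0.5, -0.3, 0.2])
d = np.convolve(y, true_w)[:len(y)]
w, e = fir_wiener(y, d, num_taps=3)
```

In this noise-free demo the fitted coefficients match the system that generated d(n), and the error is numerically zero.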

FIG. 3. The smoothing factor α(λ, k) calculated according to Eq. (40) for different values of the parameter β when δ = 1. (Solid line: β = 5; dashed line: β = 10; dotted line: β = 20.)

FIG. 4. Plots of the non-stationary noise (solid line) estimated using the VAD (top panel) and TRA (bottom panel) algorithms from the noisy speech signal (dotted line).

[Two-panel plot titled "Comparison of VAD and TRA algorithms"; x-axis: time (sample), 0 to 14 × 10^4; y-axis: amplitude (V), -1 to 1.]

FIG. 5. The flow diagram of the SA method.

[Flowchart: set the initial temperature T_0, final temperature T_f, cooling factor α_c, and objective function Q_f, with Q_opt = Q_f; compute a new objective function Q; test Q > Q_opt; test e^{(Q - Q_opt)/T} > φ; update Q_opt = Q; cool with T = α_c × T; test T ≤ T_f; stop.]
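The SA loop of Fig. 5 can be sketched as follows. This is a minimal illustration and not the implementation used in this work: the toy objective (a hypothetical quality surface peaking at β = 0.6, δ = 0.5), the uniform neighbour step, the threshold φ drawn uniformly from [0, 1), and all numerical settings are assumptions. In practice the objective would be built from NR performance measures such as those in Table I.

```python
import math
import random

def simulated_annealing(objective, x0, step=0.1, t0=1.0, t_final=1e-3,
                        cooling=0.99, seed=0):
    """Maximise objective(x) by simulated annealing with geometric cooling."""
    rng = random.Random(seed)
    x_cur, q_cur = list(x0), objective(x0)
    x_best, q_best = list(x_cur), q_cur
    temperature = t0
    while temperature > t_final:                      # stop once T <= T_f
        candidate = [xi + rng.uniform(-step, step) for xi in x_cur]
        q_new = objective(candidate)
        # Always accept an improvement; accept a worse point when
        # exp((Q - Q_cur) / T) exceeds a uniform random threshold phi.
        if q_new > q_cur or math.exp((q_new - q_cur) / temperature) > rng.random():
            x_cur, q_cur = candidate, q_new
            if q_cur > q_best:
                x_best, q_best = list(x_cur), q_cur
        temperature *= cooling                        # T = alpha_c * T
    return x_best, q_best

def toy_objective(params):
    """Hypothetical stand-in for an NR quality score; not from the paper."""
    beta, delta = params
    return -((beta - 0.6) ** 2 + (delta - 0.5) ** 2)

start = [1.6, 1.0]  # the non-optimized (beta, delta) listed in Table I
best_params, best_score = simulated_annealing(toy_objective, start)
```

One design choice differs from the flowchart: the sketch keeps the best point seen so far separately from the current point, so that an accepted downhill move cannot overwrite the best result.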

FIG. 6. The waveforms of a test sentence corrupted by white noise (top panel) and car noise (bottom panel).

[Two panels titled "white noise" and "car noise"; x-axis: time (sample), 0 to 18000; y-axis: amplitude (V), -0.4 to 0.4.]

FIG. 7. The spectrograms of the test sentence corrupted by two noise conditions. (a) White noise. (b) Car noise.

[Two spectrograms titled "white noise condition" (a) and "car noise condition" (b); x-axis: time, 0 to 2; y-axis: frequency, 0 to 4000; colour scale: -70 to 20 in (a), -100 to 20 in (b).]

FIG. 8. The results of the listening test analyzed by using the MANOVA. (a) MMSE-TRA-NR in white noise condition. (b) MMSE-TRA-NR in car noise condition. (c) MMSE-VAD-TRA-NR in white noise condition. (d) MMSE-VAD-TRA-NR in car noise condition.

[Four panels of grades (0.0 to 5.0) for SIG, BAK, and OVL; legend in panels (c) and (d): MMSE-VAD-TRA-NR (no SA), MMSE-VAD-TRA-NR (SA).]

FIG. 9. Simulation results for six NR algorithms. (a) The noise-free speech signal used for a computer simulation. (b) Waveforms of the noisy and processed speech signals via six NR algorithms in white noise condition. (c) Waveforms of the noisy and processed speech signals via six NR algorithms in car noise condition. (Dotted line: noisy speech signals; solid line: processed speech signals.)

[Panel (a) titled "noise-free speech signal"; x-axis: time (sample), 0 to 18000; y-axis: amplitude (V), -0.5 to 0.5.]

FIG. 10. The results of the listening test analyzed by using the MANOVA. (a) White noise case. (b) Car noise case.

[Two panels titled "White noise condition" (a) and "Car noise condition" (b); grades (1.0 to 5.5) for SIG, BAK, and OVL; legend: KLT-NR, MMSE-NR, MMSE-TRA-NR, Spec-sub, MMSE-VAD-TRA-NR, Wiener filtering.]

FIG. 11. The recognition rate in different noise conditions and SNR levels.

FIG. 12. The recognition rate in movie noise condition with different window lengths and SNR levels.

[Panel titled "Movie noise condition"; x-axis: SNR (dB), 6 to 12; y-axis: recognition rate (%), 0 to 100; legend: 20 ms, 50 ms, 80 ms, Original.]

FIG. 13. Comparison of the recognition rate with and without SA in the babble noise condition (left) and in the movie noise condition (right).

FIG. 14. The recognition rate in movie noise condition with different processed signal ratios.

[Panel titled "Movie noise condition"; x-axis: processed signal ratio (%), 0 to 100; y-axis: recognition rate (%), 62 to 78.]

FIG. 15. Comparison of the recognition rate with and without optimization in the babble noise condition (left) and in the movie noise condition (right).
