C. Beam steering
IV. CONCLUSIONS
This paper has presented the enhancement of speech recognition using microphone arrays. A super-directive microphone array steered to endfire performs well when the noise source is at the rear. An equalizer is applied to the super-directive microphone array to prevent distortion of the speech signal. For the case in which the noise source is close to the speech source, PD is proposed. Using GSS to find the optimal ITD threshold, which varies with the subtending angle, together with the optimal volume, further improves speech recognition. Finally, simulated and experimental results are discussed and show that the proposed methods are effective in enhancing speech recognition.
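As an illustration of the GSS step mentioned above, the following is a minimal sketch of a golden section search that maximizes a unimodal function over an interval. The function name `golden_section_max` and the toy objective are assumptions made for illustration only; in the setting of this paper the objective would be the recognition rate as a function of the ITD threshold, which is not reproduced here.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618


def golden_section_max(f, lo, hi, tol=1e-4):
    """Locate the maximizer of a unimodal function f on [lo, hi]."""
    c = hi - INV_PHI * (hi - lo)
    d = lo + INV_PHI * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:
            # The maximum lies in [lo, d]; reuse c as the new upper probe.
            hi, d, fd = d, c, fc
            c = hi - INV_PHI * (hi - lo)
            fc = f(c)
        else:
            # The maximum lies in [c, hi]; reuse d as the new lower probe.
            lo, c, fc = c, d, fd
            d = lo + INV_PHI * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2.0


# Hypothetical stand-in for "recognition rate versus threshold":
# a smooth curve with a single peak at 0.7.
def toy_rate(tau):
    return -(tau - 0.7) ** 2


best_tau = golden_section_max(toy_rate, 0.0, 1.0)
print(f"estimated peak location: {best_tau:.3f}")
```

Each iteration reuses one of the two previous probe points, so only one new evaluation of the objective is needed per step. This matters when every evaluation means running a full recognition experiment.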
REFERENCES
1. Y. Gong, "Speech recognition in noisy environments: a survey," Speech Commun. 16 (1995), 261–291.
2. J. Bitzer, K. U. Simmer and K. D. Kammeyer, "Multi-microphone noise reduction techniques for hands-free speech recognition - a comparative study," in Robust Methods for Speech Recognition in Adverse Conditions (ROBUST99), 171–174, Tampere, Finland, May 1999.
3. M. Cooke, P. Green, L. Josifovski, A. Vizinho, "Robust automatic speech recognition with missing and unreliable acoustic data," Speech Commun. 34 (2001), 267–285.
4. S. Srinivasan, N. Roman, D.L. Wang, "Binary and ratio time-frequency masks for robust speech recognition," Speech Commun. 48 (2006), 1486–1501.
5. R. M. Stern, E. Gouvea, C. Kim, K. Kumar, and H. Park, "Binaural and multiple-microphone signal processing motivated by auditory perception," in Hands-Free Speech Communication and Microphone Arrays, pages 98–103, May 2008.
6. R. M. Stern and C. Trahiotis, "Models of binaural interaction," in Hearing, B. C. J. Moore, Ed. Academic Press, 2002, pp. 347–386.
7. H. Park and R. M. Stern, "Spatial separation of speech signals using amplitude estimation based on interaural comparisons of zero crossings," Speech Communication, vol. 51, no. 1, pp. 15–25, Jan. 2009.
8. K.J. Palomaki, G.J. Brown, D.L. Wang, "A binaural processor for missing data speech recognition in the presence of noise and small-room reverberation," Speech Commun. 43 (2004), 361–378.
9. N. Roman, D.L. Wang, G.J. Brown, "Speech segregation based on sound localization," J. Acoust. Soc. Am. 114, 2236–2252, 2003.
10. M. Brandstein and D. Ward, Microphone arrays (Springer, New York, 2001).
11. S.L. Gay, J. Benesty, Acoustic signal processing for telecommunication (Kluwer Academic Publishers, 2000).
12. C. Kim, K. Kumar, B. Raj, and R. M. Stern, "Signal Separation for Robust Speech Recognition Based on Phase Difference Information Obtained in the Frequency Domain," in INTERSPEECH-2009, pages 2495–2498, Sept. 2009.
13. H. Teutsch, G.W. Elko, "First- and Second-order adaptive differential microphone arrays," 2001.
14. H. Song, J. Liu, "First-Order Differential Microphone Array for Robust Speech Enhancement," Language and Image Processing, 2008.
15. P. H. Rogers, A. L. Van Buren, "New approach to a constant beamwidth transducer," J. Acoust. Soc. Am. 64(1), July 1978.
16. W. Marshall Leach, Jr., Introduction to electroacoustics and audio amplifier design (Kendall/Hunt Publishing Company, 2003).
17. J. G. Wilpon, L. R. Rabiner, C. H. Lee, E. R. Goldman, "Automatic recognition of keywords in unconstrained speech using hidden Markov models," IEEE Trans. ASSP, Nov. 1990.
18. H. Ney, "The Use of a One-Stage Dynamic Programming Algorithm for Connected Word Recognition," IEEE Trans. Acoustics, Speech, Signal Proc., vol. 32, no. 2, pp. 263–271, April 1984.
19. Chin-Hui Lee, Frank K. Soong and Kuldip K. Paliwal, "Automatic Speech and Speaker Recognition," Kluwer Academic Publishers, 1995.
20. Numerical Recipes in C: The Art of Scientific Computing, 2nd Edition, 1993.
21. J. Bergqvist and F. Rudolf, "A silicon condenser microphone using bond and etch-back technology," Sensors and Actuators A, 45, 115–124 (1994).
22. C. Kim, R.M. Stern, K. Eom, J. Lee, "Automatic selection of thresholds for signal separation algorithm based on interaural delay," 2010.
TABLE I Table of first-order differential designs.
Microphone type   DI (dB)   FBR (dB)   3 dB beamwidth (degrees)   Nulls (degrees)
Dipole            4.77      0.00       90.00                      90.00
Cardioid          4.77      8.45       131.06                     180.00
Hypercardioid     6.02      8.45       104.90                     109.47
Supercardioid     5.72      11.44      114.90                     125.26
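The entries of Table I can be cross-checked numerically. The sketch below assumes the standard first-order directional response E(θ) = α1 + (1 − α1) cos θ, with α1 = 0, 0.5, 0.25 and (√3 − 1)/2 for the dipole, cardioid, hypercardioid and supercardioid respectively. The response formula, the supercardioid value and the function name `first_order_metrics` are assumptions introduced for illustration. They are not taken from this section, although α1 = 0.25 and α1 = 0.5 do match the values used in FIG. 5.

```python
import math


def first_order_metrics(alpha):
    """Return (DI dB, FBR dB, 3 dB beamwidth deg, null angle deg)
    for the assumed pattern E(theta) = alpha + (1 - alpha) * cos(theta)."""
    if not 0.0 <= alpha <= 0.5:
        raise ValueError("this sketch only covers 0 <= alpha <= 0.5")
    b = 1.0 - alpha
    # Directivity index: on-axis power (E(0) = 1) over the mean of E^2
    # on the sphere, which is alpha^2 + (1 - alpha)^2 / 3.
    di = 10.0 * math.log10(1.0 / (alpha**2 + b**2 / 3.0))
    # Front and rear hemisphere integrals of E^2 (substituting u = cos(theta)).
    front = alpha**2 + alpha * b + b**2 / 3.0
    rear = alpha**2 - alpha * b + b**2 / 3.0
    fbr = 10.0 * math.log10(front / rear)
    # Full width between the two angles where |E| falls to 1/sqrt(2).
    half_angle = math.acos((1.0 / math.sqrt(2.0) - alpha) / b)
    beamwidth = 2.0 * math.degrees(half_angle)
    # Null where alpha + (1 - alpha) * cos(theta) = 0.
    null = math.degrees(math.acos(max(-1.0, -alpha / b)))
    return di, fbr, beamwidth, null


# Assumed alpha values for the four designs listed in Table I.
designs = {
    "Dipole": 0.0,
    "Cardioid": 0.5,
    "Hypercardioid": 0.25,
    "Supercardioid": (math.sqrt(3.0) - 1.0) / 2.0,
}

for name, alpha in designs.items():
    di, fbr, bw, null = first_order_metrics(alpha)
    print(f"{name:14s} {di:5.2f} {fbr:6.2f} {bw:7.2f} {null:7.2f}")
```

Under these assumptions the computed values agree with Table I to within rounding, which suggests the table follows the usual closed-form expressions for first-order patterns.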
TABLE II Comparison of the effective beamwidth corresponding to the optimal ITD threshold with the subtending angle.
Average τ                           0.9909   0.9597   0.9025   0.7055   0.4676   0.2653
Corresponding effective beamwidth   58.72    55.9     51.1     37.5     23.8     13.2
The subtending angle                90       75       60       45       30       15
FIG. 1 Diagram of first-order microphone composed of two microphones.
FIG. 2 Front-to-back ratio of first-order microphone versus the first-order differential parameter α1.
FIG. 3 Directivity index of first-order microphone versus the first-order differential parameter α1.
FIG. 4 Various first-order directional responses: (a) dipole, (b) cardioid, (c) hypercardioid, (d) supercardioid.
FIG. 5 The directivity pattern of first-order DMAs. (a) α1 = 0.25. (b) α1 = 0.5.
FIG. 6 The block diagram of the first-order ADMA.
FIG. 7 Directivity pattern of the first-order back-to-back cardioid system.
FIG. 8 Various directivity patterns for a first-order ADMA.
FIG. 9 The model of the optimal beamformer, which is a filter-and-sum system.
FIG. 10 Contour plots of the super-directive microphone arrays. The four plots represent the designs for maximum DI, maximum FBR, constant beamwidth, and the first-order DMA, respectively.
FIG. 11 The power spectral density of super-directive microphone arrays.
FIG. 12 The process of applying an equalizer to the super-directive microphone arrays.
FIG. 13 The recognition rate (%) of the noisy speech (white noise) using different algorithms, where the noise source is located at (a) 180 degrees, (b) 90 degrees, (c) 45 degrees, (d) 0 degrees.
FIG. 14 The recognition rate (%) of the noisy speech (car noise) using different algorithms, where the noise source is located at (a) 180 degrees, (b) 90 degrees, (c) 45 degrees, (d) 0 degrees.
FIG. 15 The block diagram of phase-difference estimation.
FIG. 16 The block diagram of the proposed PDE-based enhancement algorithm, where θ is the subtending angle estimated by DOA.
FIG. 17 The search for the ITD threshold by GSS. (a) The search process for τ. (b) Relative recognition rate.
FIG. 18 (a) Recognition rate in babble noise at an SNR of 0 dB. (b) The optimal ITD threshold τ and its polynomial fit.
FIG. 19 Comparison of the recognition rate at different volumes.
FIG. 20 Comparison of the recognition rate when the source is not in the direction of the designed mainlobe, and the effect of beam steering, where "15degs." means the source is 15 degrees off the desired main axis.
FIG. 21 The simulated and experimental environments. [Diagram labels: target sources, noise, microphone array, center, subtending angle; marked dimensions: 12 m, 12 m, 9 m, 5 cm, 0.3 m, 1 m.]
FIG. 22 Comparison of the performance of the original noisy signal, the PDE algorithm with a fixed ITD threshold, the automatic ITD threshold selection algorithm, and the proposed PDE-based enhancement algorithm. (a) Subtending angle = 75°. (b) Subtending angle = 45°. (c) Subtending angle = 15°.
FIG. 23 The effect of reverberation, where the subtending angle ranges from 0 to 90 degrees. (a) T60 = 0.138 s. (b) T60 = 0.966 s. (c) T60 = 2.898 s.
FIG. 24 The recognition rate with the optimal threshold for the recorded wave files.