NOVEL BINAURAL SPECTRO-TEMPORAL ALGORITHM FOR
SPEECH ENHANCEMENT IN LOW SNR ENVIRONMENTS
Po-Hsun Sung, Bo-Wei Chen, Ling-Sheng Jang and Jhing-Fa Wang
Department of Electrical Engineering, National Cheng Kung University, Tainan City, Taiwan, R.O.C.
ABSTRACT
A novel BInaural Spectro-Temporal (BIST) algorithm is proposed in this paper to increase speech intelligibility in low or negative SNR noisy environments. The BIST algorithm consists of two modules: binaural auditory processing, which receives sound from a specific direction, and a spectro-temporal modulation filter for noise reduction. Most speech enhancement algorithms are not applicable in harsh environments because the energy of the speech is overwhelmed by the noise. To increase speech intelligibility in low or negative SNR noisy environments, a distinctive approach is proposed. First, the BIST algorithm uses binaural auditory processing as a spatial mask to separate the speech and noise according to their locations. Next, the modulation filter is applied to reduce the noise in the scale-rate (spectro-temporal modulation) domain according to their different acoustic features. It works like the spectro-temporal receptive field (STRF), which is the perceptual response of the human auditory cortex. The experimental results demonstrate that the proposed BIST algorithm can improve speech intelligibility by 20%.

INTRODUCTION
• Cocktail Party Effect
• Speech with heavy noise remains a difficult issue because it is hard to separate the mixture of speech and noise efficiently.
• Humans can pay attention to a particular target while filtering out other sound sources, even in adverse listening environments.
• The BInaural Spectro-Temporal (BIST) filtering algorithm is proposed to solve this issue.
In this paper, the clean speech is extracted from noisy sound step by step, according to its location and acoustic characteristics. The binaural auditory processing functions as a spatial mask that separates the target source from the noises based on their directions. Regarding the pattern of the sound source, a transformation in the representation of sound is found in mammalian auditory systems, called spectro-temporal receptive fields (STRFs). STRFs represent the transformation of neuron responses in the auditory cortex when the sound wave arrives. This transformation decomposes the content of the auditory spectrogram into the scale-rate domain. The scale …

BINAURAL AUDITORY PROCESSING
In our experiment, the clean speech and noise are played via two loudspeakers in front of the microphone array (condition 1: a single microphone; condition 2: an artificial head). One loudspeaker is at 0°, and the other is at θ = 90°. The distance between the loudspeakers and the microphone array is 1 m.
Figure 2. Function block diagram of binaural auditory processing

SPECTRO-TEMPORAL MODULATION FILTERING
Figure 3. Function block diagram of spectro-temporal modulation filtering
Figure 4. Representation of clean speech (axes: time (ms) vs. frequency (Hz); rate (Hz) vs. scale (cyc/oct))
Processing of Temporal Modulation Filter: original and filtered signals (axes: sample points vs. amplitude, normalized for the filtered signal)

EXPERIMENTAL RESULTS
Figure 8. Cochleagram of different signals (axes: time (ms) vs. frequency (Hz)): (a) Noisy speech at SNR -10 dB; (b) Enhanced by spectral subtraction; (c) Enhanced after spatial mask; (d) Enhanced by BIST filtering algorithm

Summary
We have demonstrated a novel approach to improve speech intelligibility in negative SNR environments. In this method, the integration of binaural auditory processing and auditory cortical processing functions as a mechanism of human auditory perception. It is shown that using binaural hearing as a spatial mask at the first stage helps reduce the complexity of multiple sound sources. Furthermore, the speech can be extracted from the noisy signal based on their distinct spectro-temporal modulation patterns. The objective tests …
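
The noisy test condition is characterized by its SNR (for example, -10 dB for the noisy speech in Figure 8). As a minimal sketch, assuming SNR is defined as the ratio of average speech power to average noise power, a mixture at a chosen SNR could be prepared as follows. The function names `mix_at_snr` and `measure_snr_db` are hypothetical and are not taken from the poster.

```python
import math


def power(samples):
    """Average power (mean square) of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def measure_snr_db(speech, noise):
    """SNR in dB, defined here as 10*log10(P_speech / P_noise)."""
    return 10.0 * math.log10(power(speech) / power(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals target_snr_db.

    Returns (mixture, scaled_noise). Assumes equal-length inputs.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_noise = power(noise)
    if p_noise == 0.0:
        raise ValueError("noise must have non-zero power")
    # Required noise power is P_speech / 10^(SNR/10); gain is its square root
    # relative to the current noise power.
    gain = math.sqrt(power(speech) / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled
```

At -10 dB the scaled noise carries ten times the power of the speech, which is the regime the abstract describes as speech being overwhelmed by noise.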
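
The idea of a spatial mask that keeps sound from one direction can be illustrated with a binary time-frequency mask based on the interaural phase difference (IPD). This is a swapped-in, simplified technique and not the block diagram of Figure 2, which is not recoverable from the text. It assumes the target is straight ahead (0°), so its two ear signals are nearly in phase, while an off-axis source shows a larger phase difference. The names `dft` and `frontal_mask`, the threshold `max_ipd`, and the energy `floor` are all hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive DFT of a real frame; returns bins 0..N/2 (fine for short frames)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def frontal_mask(left, right, max_ipd=math.pi / 8, floor=1e-6):
    """Binary mask over frequency bins for one frame.

    A bin is kept (True) when the phase difference between the left and
    right spectra is at most `max_ipd`, i.e. the energy in that bin is
    consistent with a source directly in front of the listener.
    """
    mask = []
    for a, b in zip(dft(left), dft(right)):
        cross = a * b.conjugate()  # phase of `cross` is the IPD of this bin
        if abs(cross) < floor:
            mask.append(False)  # too little energy to estimate a direction
        else:
            mask.append(abs(cmath.phase(cross)) <= max_ipd)
    return mask
```

In a full system the mask would be computed frame by frame and the rejected bins attenuated before resynthesis; only the masking decision is sketched here.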
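
The scale-rate (spectro-temporal modulation) domain can be illustrated with a 2-D Fourier transform of a small log-frequency spectrogram: the transform along time gives the rate (Hz) and the transform along frequency gives the scale (cyc/oct). This is a simplification used only for illustration. Cortical models typically use a bank of STRF-like filters, and the poster's actual filter is not specified in the text. The names `dft2` and `dominant_modulation` and the parameters `frame_rate_hz` and `channels_per_octave` are hypothetical.

```python
import cmath
import math


def dft2(matrix):
    """Naive 2-D DFT of a (channels x frames) matrix; fine for small inputs."""
    rows, cols = len(matrix), len(matrix[0])
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):      # spectral-modulation (scale) index
        for v in range(cols):  # temporal-modulation (rate) index
            acc = 0j
            for f in range(rows):
                for t in range(cols):
                    angle = -2.0 * math.pi * (u * f / rows + v * t / cols)
                    acc += matrix[f][t] * cmath.exp(1j * angle)
            out[u][v] = acc
    return out


def dominant_modulation(spec, frame_rate_hz, channels_per_octave):
    """Return (scale in cyc/oct, rate in Hz) of the strongest non-DC component.

    `spec` is indexed as spec[channel][frame], with channels spaced
    uniformly on a log-frequency axis.
    """
    rows, cols = len(spec), len(spec[0])
    coeffs = dft2(spec)
    best, best_mag = (0, 0), -1.0
    for u in range(rows // 2 + 1):  # non-negative scales only
        for v in range(cols):
            if u == 0 and v == 0:
                continue  # skip the DC term
            mag = abs(coeffs[u][v])
            if mag > best_mag:
                best, best_mag = (u, v), mag
    u, v = best
    # Indices above cols/2 are negative rates; the sign distinguishes the
    # two ripple directions.
    signed_v = v if v <= cols // 2 else v - cols
    scale = u * channels_per_octave / rows
    rate = signed_v * frame_rate_hz / cols
    return scale, rate
```

A modulation filter in this domain would keep the scale-rate region where speech energy concentrates and attenuate the rest before inverting the transform; the sketch only locates the dominant modulation component.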