Chapter 2 Measurement of feedback paths in hearing aids
2.4 Modeling the EFP in the time-domain
In the previous section, we modeled the frequency response of the EFP using a sweep stimulus. Because most recent feedback cancellation methods are based on adaptive filtering (the filter response changes over time to track variations in the EFP), it is imperative to model the EFP in the time domain as well. The time-domain model proposed by Kates [9], as mentioned earlier, considered only the magnitude response and ignored the phase response. However, as shown in the previous section, the phase of the EFP is important for determining the frequency at which acoustic oscillation occurs.
To determine the impulse response of the external feedback path, we used a Matlab program that applies the IFFT function to transform the frequency response into the time domain, that is, into the impulse response. To obtain a more natural impulse response, we used the broadband frequency response of Fig. 2.4 as the input. The resulting impulse response is depicted in Fig. 2.8, from which we obtain a feedback time of about 3 milliseconds (ms). Consequently, the processing time of feedback cancellation should be within 3 ms. The time-domain model (the impulse response) can therefore be used to develop acoustic feedback cancellation algorithms in the future.
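The frequency-to-time conversion described above can be sketched as follows. This is a minimal illustration in Python rather than the Matlab program used in this work. The feedback path here is a hypothetical pure delay with flat attenuation, and the sampling rate, delay, and gain are assumed values chosen so that the expected impulse response is known in advance.

```python
import cmath
import math


def inverse_dft(spectrum):
    """Inverse DFT of a full-length complex spectrum (O(N^2), adequate for a demo)."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def synthetic_feedback_response(n_bins, delay_samples, gain):
    """Hypothetical feedback path: a pure delay with flat attenuation."""
    return [
        gain * cmath.exp(-2j * math.pi * k * delay_samples / n_bins)
        for k in range(n_bins)
    ]


fs = 24000      # assumed sampling rate in Hz
n_bins = 128    # number of frequency samples (assumed)
delay = 12      # assumed delay in samples, i.e. 0.5 ms at 24 kHz

H = synthetic_feedback_response(n_bins, delay, gain=0.1)
h = [value.real for value in inverse_dft(H)]   # impulse response

# The location of the largest tap gives the dominant feedback delay.
peak_index = max(range(n_bins), key=lambda i: abs(h[i]))
peak_time_ms = 1000.0 * peak_index / fs
print(peak_index, peak_time_ms)
```

With a measured response in place of the synthetic one, the same peak search gives a rough estimate of the feedback time read from Fig. 2.8.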
Fig. 2.8 Impulse response of the EFP.
Chapter 3
Adaptive noise cancellation algorithms in hearing aids
3.1 Introduction
Noise problems arise naturally in many areas, such as voice communication, speech recognition, and hearing aids. As a result, noise cancellation has become an important research topic, and many studies have used techniques such as short-time spectral amplitude estimation [10]-[13], iterative Wiener filtering [14]-[16], audio-based filtering [17],[18], signal-subspace processing [19],[20], and hidden Markov modeling (HMM) [21],[22].
Although significant results have been achieved, most of them are not suitable for real-time implementation because their computational complexities are generally too high.
The Kalman filter is well known in signal processing for its efficient structure, and it can be used for time-varying systems. It is used in a wide range of engineering applications, from radar to computer vision, and is an important topic in control theory and noise cancellation. The Kalman filter is a recursive estimator; that is, it needs only the estimated state from the previous time step and the current measurement to estimate the current state. Since it has the potential for high performance, is naturally adaptive, and may be implementable at low computational cost, we decided to study the possibility of applying it to noise cancellation in hearing aids.
In [23], Paliwal and Basu used a Kalman filter to enhance speech corrupted by white noise. On a short-time basis, speech signals were modeled as stationary AR processes and the AR parameters were assumed to be known. Gibson, Koo, and Gray considered speech enhancement with colored noise in [24]. They modeled both speech and colored noise as AR processes and developed scalar and vector Kalman filtering algorithms. To estimate the AR coefficients, an EM-based algorithm was employed. In [25], Lee, Lee, and Ann proposed a non-Gaussian AR model for speech signals. They modeled the distribution of the driving noise as a Gaussian mixture and applied a decision-directed nonlinear Kalman filter. Again, an EM-based algorithm was used to identify the unknown parameters. Niedzwiecki and Cisowski [26] assumed that speech signals are nonstationary AR processes and used a random-walk model for the AR coefficients. An extended Kalman filter was then used to estimate the speech and the AR coefficients simultaneously. Note that the stability of the extended Kalman filter is not guaranteed and that the dimensions of the Kalman filter are greatly increased.
The aforementioned Kalman filtering algorithms still require extensive computation for two reasons: first, using EM algorithms to identify the AR coefficients is costly, and second, Kalman filters usually involve matrix inversion. However, these costs depend on the order of the Kalman filter used. To carefully assess the pros and cons of using Kalman filters in hearing aids, we intend to use low-order Kalman filters and the low-cost Yule-Walker method for estimating the AR coefficients. As a future plan, Kalman filters incorporated into a sub-band filtering structure will also be studied; in this case, zero- or first-order Kalman filters may be sufficient and the calculation involves only scalar operations, thus saving a considerable amount of computation [27].
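To illustrate why a first-order filter involves only scalar operations, the following Python sketch runs a Kalman filter on a synthetic AR(1) signal. It is an illustration under assumed values, not the sub-band algorithm of [27]: the AR coefficient, the two variances, the initial error variance, and the test signal are all hypothetical.

```python
import random


def scalar_kalman(z, a, q, r):
    """First-order (AR(1)) Kalman filter; every step is a scalar operation."""
    x_hat = 0.0   # initial state estimate
    s = q         # initial filtering-error variance (assumed)
    out = []
    for zn in z:
        m = a * a * s + q                 # prediction-error variance
        g = m / (m + r)                   # Kalman gain: one division, no matrix inverse
        pred = a * x_hat                  # predicted sample
        x_hat = pred + g * (zn - pred)    # measurement update
        s = (1.0 - g) * m                 # filtering-error variance
        out.append(x_hat)
    return out


def mse(u, v):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(u, v)) / len(u)


# Synthetic AR(1) signal in white Gaussian noise (all values are assumptions).
rng = random.Random(0)
a, q, r = 0.95, 0.1, 1.0
x, clean = 0.0, []
for _ in range(2000):
    x = a * x + rng.gauss(0.0, q ** 0.5)
    clean.append(x)
noisy = [c + rng.gauss(0.0, r ** 0.5) for c in clean]

estimate = scalar_kalman(noisy, a, q, r)
print(mse(noisy, clean), mse(estimate, clean))
```

The single division per sample replaces the matrix inversion that higher-order filters nominally require.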
3.2 Kalman filtering for white noise cancellation
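We derived the following from [27],[24]. On a short-time basis, a speech sequence $x(n)$ can be represented as an AR process, which is essentially the output of an all-pole linear system driven by a white noise sequence:

\[ x(n) = \sum_{i=1}^{p} a_i\, x(n-i) + w(n) \tag{3.1} \]

where $a_1, \ldots, a_p$ are the AR coefficients and $w(n)$ is the white driving sequence. The speech signal is assumed to be contaminated by a zero-mean additive Gaussian noise $v(n)$, i.e., $z(n) = x(n) + v(n)$. Let $v(n)$ be white and $\mathbf{x}(n) \triangleq [\,x(n)\;\; x(n-1)\;\; \cdots\;\; x(n-p+1)\,]^T$. Equation (3.1) and the corrupted speech $z(n)$ can be formulated in the state-space domain as

\[ \mathbf{x}(n) = F\,\mathbf{x}(n-1) + H^T w(n) \tag{3.2} \]

\[ z(n) = H\,\mathbf{x}(n) + v(n) \]

where, for (3.2) to agree with (3.1), $F$ is the $p \times p$ companion matrix whose first row is $[\,a_1\;\; a_2\;\; \cdots\;\; a_p\,]$ and whose subdiagonal entries are 1, and $H = [\,1\;\; 0\;\; \cdots\;\; 0\,]$. From standard Kalman filtering theory for white message and measurement noise, the state vector estimate is

\[ \hat{\mathbf{x}}(n) = F\,\hat{\mathbf{x}}(n-1) + G(n)\,[\,z(n) - HF\,\hat{\mathbf{x}}(n-1)\,] \tag{3.6} \]

with the initial condition $\hat{\mathbf{x}}(0) = \mathbf{0}$. The gain and error covariance equations are

\[ G(n) = M(n)H^T\,[\,HM(n)H^T + R\,]^{-1} \tag{3.7} \]

\[ M(n) = F\,S(n-1)\,F^T + H^T Q H \tag{3.8} \]

\[ S(n) = [\,I - G(n)H\,]\,M(n) \tag{3.9} \]

where $G(n)$ is the Kalman gain vector, $M(n)$ is the prediction-error covariance matrix, $S(n)$ is the filtering-error covariance matrix, $R = \sigma_v^2$ is the variance of the noise sequence $\{v(n)\}$, and $Q = \sigma_w^2$ is the variance of the driving term $\{w(n)\}$. When the true values are not known, the parameters are estimated from the noisy observations with the autocorrelation method.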
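The recursion (3.6)-(3.9) can be sketched in Python as follows. The sketch assumes the usual companion form of $F$ and $H = [1\; 0\; \cdots\; 0]$, so that $HM(n)H^T + R$ is a scalar and no matrix inversion is needed. The initial covariance $S(0) = QI$, the AR(2) coefficients, and the variances are assumptions made only for this illustration.

```python
import random


def companion(a):
    """State-transition matrix F for AR coefficients a = [a1, ..., ap]."""
    p = len(a)
    F = [[0.0] * p for _ in range(p)]
    F[0] = list(a)
    for i in range(1, p):
        F[i][i - 1] = 1.0
    return F


def matmul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def ar_kalman(z, a, q, r):
    """Kalman filter of (3.6)-(3.9) with H = [1, 0, ..., 0]."""
    p = len(a)
    F = companion(a)
    Ft = [list(row) for row in zip(*F)]
    x_hat = [0.0] * p                                    # x_hat(0) = 0
    S = [[q if i == j else 0.0 for j in range(p)]        # S(0) = Q*I (assumed)
         for i in range(p)]
    out = []
    for zn in z:
        M = matmul(matmul(F, S), Ft)                     # (3.8): F S F^T ...
        M[0][0] += q                                     # ... + H^T Q H
        denom = M[0][0] + r                              # H M H^T + R is a scalar
        G = [M[i][0] / denom for i in range(p)]          # (3.7)
        x_pred = [sum(F[i][k] * x_hat[k] for k in range(p)) for i in range(p)]
        innovation = zn - x_pred[0]                      # z(n) - H F x_hat(n-1)
        x_hat = [x_pred[i] + G[i] * innovation for i in range(p)]   # (3.6)
        S = [[M[i][j] - G[i] * M[0][j] for j in range(p)]           # (3.9)
             for i in range(p)]
        out.append(x_hat[0])                             # current speech estimate
    return out


def mse(u, v):
    """Mean squared error between two equal-length sequences."""
    return sum((s - t) ** 2 for s, t in zip(u, v)) / len(u)


# Synthetic AR(2) "speech" in white Gaussian noise (assumed values).
rng = random.Random(1)
a_true, q, r = [1.5, -0.7], 1.0, 4.0
x1 = x2 = 0.0
clean = []
for _ in range(3000):
    x = a_true[0] * x1 + a_true[1] * x2 + rng.gauss(0.0, q ** 0.5)
    clean.append(x)
    x2, x1 = x1, x
noisy = [c + rng.gauss(0.0, r ** 0.5) for c in clean]
enhanced = ar_kalman(noisy, a_true, q, r)
print(mse(noisy, clean), mse(enhanced, clean))
```

Here the true AR coefficients are passed to the filter; in practice they would be estimated, as discussed next.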
The frame length of 360 samples corresponds to 15 ms and was selected arbitrarily from within the common parameter update range of 10 to 30 ms often used in linear predictive systems. The frame length is not critical to the filter performance within these bounds. The noise variance $\sigma_v^2$ is estimated in a time interval before speech is present. Fig. 3.1 is the flowchart of the frame-based AR Kalman filtering for white noise cancellation. At each iteration, we alternately estimate the parameters and filter the speech.
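The per-frame parameter estimation can be sketched as below. The code solves the Yule-Walker equations with the Levinson-Durbin recursion on the biased autocorrelation of each 360-sample frame. It is a minimal illustration: the function names and the synthetic AR(2) test signal are hypothetical, and the 24 kHz sampling rate is taken from the simulation setup in section 3.3.

```python
import random


def autocorr(frame, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for x(n) = sum_i a_i x(n-i) + w(n).

    Returns (a, err): the AR coefficients [a1, ..., ap] and the residual
    variance, which serves as an estimate of the driving-noise variance Q.
    """
    a = []
    err = r[0]
    for m in range(order):
        acc = r[m + 1] - sum(a[i] * r[m - i] for i in range(m))
        k = acc / err                                   # reflection coefficient
        a = [a[i] - k * a[m - 1 - i] for i in range(m)] + [k]
        err *= (1.0 - k * k)
    return a, err


def frame_ar_coefficients(signal, frame_len, order):
    """Estimate AR coefficients independently for each non-overlapping frame."""
    coeffs = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        a, _ = levinson_durbin(autocorr(frame, order), order)
        coeffs.append(a)
    return coeffs


fs = 24000                      # sampling rate in Hz
frame_len = fs * 15 // 1000     # 15 ms frame -> 360 samples

# Synthetic AR(2) signal standing in for speech (assumed coefficients).
rng = random.Random(0)
x1 = x2 = 0.0
signal = []
for _ in range(10 * frame_len):
    x = 1.5 * x1 - 0.7 * x2 + rng.gauss(0.0, 1.0)
    signal.append(x)
    x2, x1 = x1, x

per_frame = frame_ar_coefficients(signal, frame_len, order=2)

# Sanity check on the exact normalized autocorrelation of the same AR(2) model.
exact, _ = levinson_durbin([1.0, 1.5 / 1.7, 1.5 * 1.5 / 1.7 - 0.7], 2)
print(frame_len, per_frame[0], exact)
```

When the frames contain noisy speech rather than clean speech, the estimated coefficients are biased by the noise, which is one motivation for the fixed-coefficient variant described next.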
To reduce the cost of estimating the AR coefficients, we propose a simpler method. First we estimate the AR coefficients from clean speech, and then we use the Kalman filter with these fixed AR coefficients to filter the noisy speech. Fig. 3.2 is the flowchart of the fixed AR Kalman filtering for white noise cancellation. This reduces the computation while still giving acceptable performance in white noise cancellation. The simulation results are listed in section 3.3.
Fig. 3.1 The flowchart of the frame-based AR Kalman filtering for white noise cancellation.
Fig. 3.2 The flowchart of the fixed AR Kalman filtering for white noise cancellation.
3.3 Simulation
In this section, we use a recording of a girl saying “girl, meat ball” in Mandarin, provided by the Hearing and Speech Engineering Lab at National Yang-Ming University, as the clean input speech. The sentence is sampled at 24 kHz and lasts about 3.2 s. The white noise is generated with the Matlab function “rand”. We add the white noise at several SNR levels and use the Kalman filter to enhance the noisy signal. The simulation results are presented below and discussed in the next section.
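Adding noise at a prescribed SNR can be sketched as follows. This is an assumed procedure, not the Matlab script used here: the noise is scaled so that the ratio of signal power to noise power matches the target, the noise samples are drawn from a Gaussian generator rather than from “rand”, and a sine tone stands in for the speech recording.

```python
import math
import random


def power(x):
    """Average power of a sequence."""
    return sum(v * v for v in x) / len(x)


def add_white_noise(clean, snr_db, rng):
    """Return (noisy, noise) with the noise scaled to the requested SNR in dB."""
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    scale = math.sqrt(power(clean) / (power(noise) * 10.0 ** (snr_db / 10.0)))
    noise = [scale * v for v in noise]
    noisy = [c + v for c, v in zip(clean, noise)]
    return noisy, noise


def measured_snr_db(clean, noise):
    """SNR in dB computed from the clean signal and the added noise."""
    return 10.0 * math.log10(power(clean) / power(noise))


fs = 24000                                   # sampling rate in Hz
clean = [math.sin(2.0 * math.pi * 200.0 * n / fs) for n in range(2400)]
rng = random.Random(0)

noisy, noise = add_white_noise(clean, -5.0, rng)   # -5 dB, as in Fig. 3.3
snr_check = measured_snr_db(clean, noise)
print(snr_check)
```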
In the frame-based AR method, we estimate the AR coefficients and filter the speech in every frame. Fig. 3.3 and Fig. 3.4 show the results of the frame-based method. We then evaluate the SNRseg and PESQ of the speech frames for different AR orders and SNRs in Table 3.1 and Table 3.2. In the fixed AR method, by contrast, we estimate the AR coefficients once from the clean sentence and filter the noisy sentence with these fixed coefficients. The corresponding plots are shown in Fig. 3.5 and Fig. 3.6, and Table 3.3 and Table 3.4 give the evaluation of the speech frames for different AR orders and SNRs.
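The exact SNRseg definition used for the tables is not stated in this chapter. The Python sketch below assumes a common one: the per-frame SNR in dB between the clean and the enhanced signal, clamped to a fixed range and averaged over frames. The clamping limits, the skipping of silent frames, and the test signal are assumptions.

```python
import math


def snrseg(clean, enhanced, frame_len=360, lo=-10.0, hi=35.0):
    """Segmental SNR in dB: mean of the clamped per-frame SNR values."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in c)
        error_energy = sum((x - y) ** 2 for x, y in zip(c, e))
        if signal_energy == 0.0:
            continue                      # skip silent frames (assumed convention)
        if error_energy == 0.0:
            snr = hi                      # perfect reconstruction hits the upper clamp
        else:
            snr = 10.0 * math.log10(signal_energy / error_energy)
        values.append(min(max(snr, lo), hi))
    return sum(values) / len(values)


fs = 24000
clean = [math.sin(2.0 * math.pi * 200.0 * n / fs) for n in range(3600)]
# A 10 % amplitude error gives an error energy of 1 % of the signal energy.
enhanced = [1.1 * x for x in clean]
score = snrseg(clean, enhanced)
print(score)
```

PESQ, by contrast, is defined by ITU-T P.862 and is normally computed with the reference implementation rather than reimplemented.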
Finally, we also use five girls’ speech samples from the Hearing and Speech Engineering Lab at National Yang-Ming University as clean input speech. We contaminate the five speech samples at an SNR of 0 dB (SNR0) and evaluate their PESQ as the benchmark. We then use the fixed AR method and the frame-based AR method with AR order 4 to enhance the noisy speech. The simulation results are shown in Fig. 3.11 and Fig. 3.12.
Fig. 3.3 Time plot of enhanced SNR-5 white noisy speech by frame-based AR4 Kalman filtering.
Fig. 3.4 Spectrogram of enhanced SNR-5 white noisy speech by frame-based AR4 Kalman filtering.
Fig. 3.5 Time plot of enhanced SNR-5 white noisy speech by fixed AR4 Kalman filtering.
Table 3.1 SNRseg of speech frames for frame-based AR Kalman filtering.
Fig. 3.7 SNRseg of speech frames for frame-based AR Kalman filtering.
Table 3.2 PESQ of speech frames for frame-based AR Kalman filtering.
Fig. 3.8 PESQ of speech frames for frame-based AR Kalman filtering.
Table 3.3 SNRseg of speech frames for fixed AR Kalman filtering.
Fig. 3.9 SNRseg of speech frames for fixed AR Kalman filtering.
Table 3.4 PESQ of speech frames for fixed AR Kalman filtering.
Fig. 3.10 PESQ of speech frames for fixed AR Kalman filtering.
Fig. 3.11 PESQ of sentences in different SNR0 speech samples.
Fig. 3.12 PESQ of speech frames in different SNR0 speech samples.
3.4 Discussion
From Table 3.1 to Table 3.4 we can see that SNRseg and PESQ increase with the AR order. Intuitively, cleaner speech signals generate more accurate estimates of the AR coefficients, so the speech signal can be modeled more accurately. In these examples, the SNRseg and PESQ scores of the two methods saturate at AR3. In my experience, different noisy speech signals saturate at different AR orders. In general, AR4 is good enough to model the speech signals.
In Fig. 3.11 and Fig. 3.12, we contaminate five different speech samples and use the two methods to enhance them. For speech samples 1 and 3, the fixed AR method is clearly better than the frame-based method, whereas the frame-based method is better for the other samples. In general, both methods enhance the noisy speech, with a few exceptions. For example, for speech sample 4 the PESQ of the fixed AR method is worse than that of the unfiltered noisy speech. In a subjective evaluation of speech quality, however, the filtered speech still sounds better than the unfiltered noisy speech. Because PESQ is an objective measure that approximates subjective speech quality, it is a valuable reference but not an absolute standard. To further compare the differences in speech quality produced by these algorithms, subjective listening tests should be conducted [31].
In this part, we discuss the computational complexity of Kalman filtering. The following result is derived from [27]. First, we define three terms for measuring complexity: MPU, multiplications per unit of time; DVU, divisions per unit of time; and APU, additions per unit of time. According to (3.1), speech is modeled as AR(p). If $p \ge 1$, estimating the AR coefficients requires $3p^2$ MPU, and estimating $Q$ requires $p$ MPU and $p$ APU. The Kalman filter described in (3.6)-(3.9) requires $3p^2+2p$ MPU, $p$ DVU, and $3p^2+2$ APU. In total, the frame-based AR Kalman filter needs $6p^2+3p$ MPU, $3p^2+p+2$ APU, and $p$ DVU. On the other hand, the Kalman filter with fixed AR coefficients needs $3p^2+2p$ MPU, $3p^2+2$ APU, and $p$ DVU. Table 3.5 lists the complexity of each stage, and Table 3.6 compares the complexities of Kalman filtering and spectrum subtraction for white noise cancellation. In the single-band case, the AR3 and AR4 Kalman filters are clearly more complex than spectrum subtraction.
Table 3.5 Overall complexities of a Kalman filter.
          MPU        DVU    APU
AR        3p^2       0      0
Q         p          0      p
Kalman    3p^2+2p    p      3p^2+2
Table 3.6 Complexities of Kalman filtering and spectrum subtraction for white noise cancellation.
        Fixed AR                Frame-based AR          Spectrum
        AR1   AR2   AR3   AR4   AR1   AR2   AR3   AR4   subtraction
MPU     5     16    33    56    9     30    63    108   0
DVU     1     2     3     4     1     2     3     4     0
APU     5     14    29    50    6     16    32    54    119
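The Kalman filter columns of Table 3.6 follow directly from the per-stage operation counts given in the text. The short Python sketch below evaluates those formulas for AR orders 1 to 4; the function names are hypothetical, and the spectrum subtraction column is not derived here because no formula for it is given.

```python
def ar_ops(p):
    """Operations for estimating the AR coefficients."""
    return {"MPU": 3 * p * p, "DVU": 0, "APU": 0}


def q_ops(p):
    """Operations for estimating the driving-noise variance Q."""
    return {"MPU": p, "DVU": 0, "APU": p}


def kalman_ops(p):
    """Operations for the Kalman recursion (3.6)-(3.9)."""
    return {"MPU": 3 * p * p + 2 * p, "DVU": p, "APU": 3 * p * p + 2}


def fixed_ar_ops(p):
    """Fixed AR method: only the Kalman recursion runs per unit of time."""
    return kalman_ops(p)


def frame_based_ops(p):
    """Frame-based method: AR estimation + Q estimation + Kalman recursion."""
    stages = (ar_ops(p), q_ops(p), kalman_ops(p))
    return {key: sum(stage[key] for stage in stages) for key in ("MPU", "DVU", "APU")}


for p in range(1, 5):
    print(p, fixed_ar_ops(p), frame_based_ops(p))
```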
From Table 3.1 to Table 3.4, the Kalman filters perform well at every AR order and SNR. However, their complexity is too high, so single-band Kalman filters may not be able to process noisy speech in real time. As a result, they may not be suitable for white noise cancellation in hearing aid systems. In the future, we will evaluate multi-band Kalman filtering for white and colored noise cancellation in order to balance performance against low power and cost.
Chapter 4
Conclusions and future work
4.1 Conclusions
We have described a platform that facilitates research on acoustic feedback cancellation algorithms, based on an ITC hearing instrument placed in situ. We used the sweep stimulus method to measure the broadband frequency response of the EFP and obtained its equivalent impulse response. We then applied a 40 dB gain to obtain the closed-loop response and used the Nyquist criterion to find the frequencies at which oscillation is most likely to occur. Our measurement results are helpful for the design and realization of feedback cancellation algorithms.
We introduced the Kalman filter and applied it to white noise cancellation. To reduce the cost of estimating the AR coefficients, we proposed a simpler method: first we estimate the AR coefficients from clean speech, and then we use the Kalman filter with these fixed AR coefficients to filter the noisy speech.
We canceled part of the noise and obtained good PESQ scores. We then compared the complexity of the Kalman filters with that of spectrum subtraction. I think that single-band Kalman filtering is not suitable for white noise cancellation in hearing aid systems because of its high complexity. As a result, we will evaluate combining the Kalman filter with the existing analysis/synthesis filter bank as a future development, in order to balance performance against low power and cost.
4.2 Future work
There are several possible extensions of our research:
(1) Use the platform to measure the EFP in other situations, such as during jaw movements or with a handset nearby, and with our future hearing aids.
(2) Evaluate the possibility of using state augmentation or other methods so that the Kalman filter can cancel colored noise.
(3) Combine the Kalman filter with the existing analysis/synthesis filter bank to balance performance against low power and cost.
(4) Conduct subjective listening tests according to ITU-T P.835 [31],[32].
Bibliography
[1] Grzegorz Szwoch, Bozena Kostek, “Waveguide model of the hearing aid earmold system,” Diagnostic Pathology, May 2006.
[2] Jingbo Yang, Meng Tong Tan, and Joseph S. Chang, “Modeling External Feedback Path of an ITE Digital Hearing Instrument for Acoustic Feedback Cancellation,” Circuits and Systems, 2005. ISCAS 2005. IEEE International Symposium on, vol. 2, pp. 1326-1329, 23-26 May 2005
[3] Hsiang-Feng Chi, Shawn X. Gao, Sigfrid D. Soli and Abeer Alwan, “Band-limited feedback cancellation with a modified filtered-X LMS algorithm for hearing aids,” Speech Communication 39 (2003) 147-161.
[4] Ann Spriet, Geert Rombouts, Marc Moonen, and Jan Wouters, “Combined Feedback and Noise Suppression in Hearing Aids,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 6, Aug. 2007
[5] D. K. Bustamante et al., “Measurement and adaptive suppression of acoustic feedback in hearing aids,” Proc. Int. Conf. Acoustics, Speech, Signal Processing, pp. 2017-2020, 1989
[6] S. F. Lybarger, “Acoustic Feedback Control,” The Vanderbilt Hearing-Aid Report, edited by G. A. Studebaker, 1989
[7] M. R. Stinson et al., “Effects of handset proximity on hearing aid feedback,” J. Acoust. Soc. Am. 115, 1147, 2004
[8] D. P. Egolf, “Simulating the open-loop transfer function as a means for understanding acoustic feedback in hearing aid,” J. Acoust. Soc. Am. 85(1), 1989
[9] J. Kates, “A Time-Domain Digital Simulation of Hearing Aid Response,” J. Rehabil. Res. Dev., vol. 27, no. 3, 1990
[10] J.S. Lim, “Evaluation of a correlation subtraction method for enhancing speech degraded by additive white noise,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-26, pp. 471-472, Oct. 1978
[11] S. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-27, pp. 113-120, Oct. 1979
[12] R. J. McAulay and M. L. Malpass, “Speech enhancement using soft-decision noise suppression filter,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-28, pp. 137-145, Apr. 1980
[13] Y. Ephraim and D. Malah, “Speech enhancement using minimum mean-square error short-time spectral amplitude estimator,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. 1109-1121, Dec. 1984
[14] J. S. Lim and A. V. Oppenheim, “All-pole modeling of degraded speech,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-26, pp. 197-210, Oct. 1978
[15] J. H. L. Hansen and M. A. Clements, “Constrained iterative speech enhancement with application to speech recognition,” IEEE Trans. Signal Processing, vol. 39, pp. 795-805, Apr. 1991
[16] T. V. Sreenivas and P. Kirnapure, “Codebook constrained Wiener filtering for speech enhancement,” IEEE Trans. Speech, Audio Processing, vol. 4, pp. 383-389, Sept. 1996
[17] Y. Cheng and D. O’Shaughnessy, “Speech enhancement based conceptually on auditory evidence,” IEEE Trans. Signal Processing, vol. 39, pp. 1943-1954, Sept. 1991
[19] Y. Ephraim and H. L. Van Trees, “A signal subspace approach for speech enhancement,” IEEE Trans. Speech Audio Processing, vol. 3, pp. 251-266, July 1995
[20] S. H. Jensen, P. H. Hansen, S. D. Hansen, and J. A. Sorensen, “Reduction of broad-band noise in speech by truncated QSVD,” IEEE Trans. Speech Audio Processing, vol. 3, pp. 439-448, Nov. 1995
[21] Y. Ephraim, “A Bayesian estimation approach for speech enhancement using hidden Markov models,” IEEE Trans. Signal Processing, vol. 40, pp. 725-735, Apr. 1992
[22] K. Y. Lee and K. Shirai, “Efficient recursive estimation for speech enhancement in color noise,” IEEE Signal Processing Lett., vol. 3, pp. 196-199, July 1996
[23] K. K. Paliwal and A. Basu, “A speech enhancement method based on Kalman filtering,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 177-180, Apr. 1987
[24] J. D. Gibson, B. Koo, and S. D. Gray, “Filtering of colored noise for speech enhancement and coding,” IEEE Trans. Signal Processing, vol. 39, pp. 1732-1741, Aug. 1991
[25] B. Lee, K. Y. Lee, and S. Ann, “An EM-based approach for parameter enhancement with an application to speech signals,” Signal Process., vol. 46, no. 1, pp. 1-14, Sept. 1995
[26] M. Niedźwiecki and K. Cisowski, “Adaptive scheme for elimination of broadband noise and impulsive disturbance from AR and ARMA signals,” IEEE Trans. Signal Processing, vol. 44, pp. 528-537, Mar. 1996
[27] Wen-Rong Wu and Po-Cheng Chen, “Subband Kalman filtering for speech enhancement,” IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, vol. 45, no. 8, Aug. 1998
[28] Y. T. Kuo, T. J. Lin, Y. T. Li, W. H. Chang, C. W. Liu, and S. T. Young, “Design of ANSI S1.11 Filter Bank for Digital Hearing Aids,” Electronics, Circuits and Systems, 2007. ICECS 2007. 14th IEEE International Conference on, pp. 242-245, Dec. 2007
[29] http://www.hearingconsultants.com.au/body_products.html, “How hearing aids work today,”
[30] W. F. Trench, “An algorithm for the inversion of finite Toeplitz matrices,” J. Soc. Indust. Appl. Math., vol. 12, pp. 515-522, 1964
[31] Mingsian R. Bai, Ping-Ju Hsieh, and Kur-Nan Hur, “Optimal design of minimum mean-square error noise reduction algorithms using the simulated annealing technique,” J. Acoust. Soc. Am. 125, 934, 2009
[32] ITU-T Rec. P.835, “Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm,” International Telecommunications Union, Geneva, Switzerland, 2003
About the Author
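Name: Chien-Nan Chen (陳建男)
Place of birth: Kaohsiung City
Date of birth: 1983. 1. 27
Education:
1989. 9 ~ 1990. 6 Kaohsiung Municipal Aiqun Elementary School (高雄市立愛群國小)
1990. 9 ~ 1995. 6 Kaohsiung Municipal Lequn Elementary School (高雄市立樂群國小)
1995. 9 ~ 1998. 6 Kaohsiung Municipal Guanghua Junior High School (高雄市立光華國中)
1998. 9 ~ 2001. 6 Kaohsiung Municipal Kaohsiung Senior High School (高雄市立高雄中學)
2001. 9 ~ 2005. 2 B.S., Department of Electrical Engineering, Chung Cheng University (中正大學 電機工程學系)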