
Speech enhancement using an equivalent source inverse filtering-based microphone array


Academic year: 2021


Mingsian R. Bai,a) Kur-Nan Hur, and Ying-Ting Liu

Department of Mechanical Engineering, National Chiao-Tung University, 1001 Ta-Hsueh Road, Hsin-Chu 300, Taiwan

(Received 6 August 2009; revised 16 December 2009; accepted 20 December 2009)

This paper presents a microphone array technique aimed at enhancing speech quality in a reverberant environment. This technique is based on the central idea of single-input-multiple-output equivalent source inverse filtering (SIMO-ESIF). The inverse filters required by the time-domain processing in the technique serve two purposes: de-reverberation and noise reduction. The proposed approach could be useful in telecommunication applications such as automotive hands-free systems, where noise-corrupted speech signal generally needs to be enhanced. SIMO-ESIF can be further enhanced against uncertainties and perturbations by including an adaptive generalized side-lobe canceller. The system is implemented and validated experimentally in a car. As indicated by numerous performance measures, the proposed system proved effective in reducing noise in human speech without significantly compromising the speech quality. In addition, listening tests were conducted to assess the subjective performance of the proposed system, with results processed by using the analysis of variance and a post hoc Fisher's least significant difference (LSD) test to assess the pairwise difference between the noise reduction (NR) algorithms. © 2010 Acoustical Society of America. [DOI: 10.1121/1.3291684]

PACS number(s): 43.60.Ac, 43.60.Dh, 43.60.Fg, 43.60.Np [EJS] Pages: 1373–1380

I. INTRODUCTION

Signal processing using microphone arrays has found applications in teleconferencing, telecommunication, speech recognition, speech enhancement, hearing aids, and so forth.1 In these applications, how to communicate effectively in noisy or reverberant environments has been one of the pressing issues. Array techniques such as the well-known delay-and-sum (DAS) beamformer do not function well in such environments due to multiple reflections.2 To enhance interference rejection, a superdirective beamformer was suggested for its excellent spatial filtering capability.3 Adaptive arrays provided useful alternatives for interference rejection.1–13 The generalized side-lobe canceller (GSC) is an elegant approach in beamformer design in which a constrained optimization problem is converted into an unconstrained one from a linear algebraic perspective.4 Its idea originated from the linearly constrained minimum variance (LCMV) beamformer5 and was first utilized by Owsley.6 Griffiths and Jim7 analyzed it and coined the term GSC. Their GSC corresponds to what we call in the present paper the Griffiths–Jim beamformer (GJBF). The key notion of the GSC hinges on the use of the blocking matrix, which further enhances the performance achievable by the fixed beamformer regardless of how it is implemented. However, adaptive implementation of the GSC is often preferred for two practical reasons. First, a fixed GSC requires knowledge of the covariance matrix, which is computationally prohibitive in real-time applications. Second, an adaptive GSC is more robust in the presence of background noise, pointing errors, and other system uncertainties. There are various forms of adaptive implementation. Among them, the GJBF is perhaps the best known; it eliminates the need for on-line calculation of the signal correlation matrix and is robust against uncertainties and perturbations of the system. However, the GJBF could still fail in a reverberant environment in which multiple reflections cause problems for beamforming.8 One example of such an environment is the interior of a car cabin, which is notoriously noisy and reverberant for speech communication. The speech signals tend to be corrupted by noise from the engine, tires, wind, etc., and by reflections from the windows, dashboard, seats, ceiling, etc.

In this paper, a microphone array technique is proposed for processing speech signals in noisy and reverberant environments. The idea of the technique originated from an equivalent source inverse filtering (ESIF) technique9 developed for noise source identification purposes. However, the model in this paper is based on a single-input multiple-output (SIMO) structure, while the previous paper is based on a multiple-input multiple-output (MIMO) structure. This seemingly minor difference leads to many distinctive features in the implementation. First, unlike the MIMO approach, the SIMO-ESIF formulation results in only one single focus, which tremendously simplifies the filter design: it requires only simple phase conjugation and scaling, without having to design complicated inverse filters explicitly. The propagation matrices are basically represented by a vector $\mathbf{h}$, which renders the term $\mathbf{h}^H\mathbf{h}$ in the inverse filter a nonzero scalar, so regularization is literally unnecessary. Second, by posing the problem within a SIMO framework, inverse filters are designed based on measured acoustical plants, or systems to be controlled,14 that include the effects of not only direct propagation from the source but also the reflections from the boundaries, which is different from the previous MIMO nearfield equivalence source imaging (NESI) that employs only a free-field point source model. It follows that, with inverse filtering, both noise reduction and de-reverberation are fulfilled at the same time. Finally, the present work and the previous approach have totally different purposes in application. The ESIF algorithm aims at speech enhancement for telecommunication, whereas the previous NESI algorithm is intended for noise source identification (NSI) and sound field visualization. Another unique feature of the proposed technique is the use of the GSC to further enhance its performance. An exact blocking matrix (BM) differing from those used in traditional beamformers is derived in this paper by solving a LCMV problem with two mutually orthogonal subspaces.5 The leaky least-mean-squares (LMS) algorithm is exploited for adaptive filtering in the multiple-input canceller (MC).15,16

a) Author to whom correspondence should be addressed. Electronic mail:

The proposed algorithms were implemented for enhancing speech communication quality in a car by using a multi-channel data acquisition system. Objective tests were carried out to evaluate the algorithms.1 In addition, listening tests were conducted to assess the subjective performance of the algorithms, with data processed by the multivariate analysis of variance (MANOVA) (Ref. 17) and Fisher's least significant difference (LSD) post hoc test.

II. EQUIVALENT SOURCE INVERSE FILTERING

The central idea of the proposed SIMO-ESIF algorithm is introduced in this section. In Fig. 1, M microphones are employed to pick up the sound emitted from a source positioned in the farfield. In the frequency domain, the sound pressure received at the microphones and the source signal can be related by an M×1 transfer matrix $\mathbf{H}$ as follows:

$$\mathbf{P} = \mathbf{H}\,q(\omega), \tag{1}$$

where $q(\omega)$ is the Fourier transform of a scalar source strength, $\mathbf{P} = [p_1(\omega) \cdots p_M(\omega)]^T$ is the pressure vector with T denoting matrix transpose, and $\mathbf{H} = [h_1(\omega) \cdots h_M(\omega)]^T$ is the M×1 propagation matrix. The aim here is to estimate the source signal $q(\omega)$ based on the pressure measurement $\mathbf{P}$ by using a set of inverse filters

$$\mathbf{C} = [c_1(\omega) \cdots c_M(\omega)]^T, \tag{2}$$

such that $\mathbf{C}^T\mathbf{H} \approx I$ and therefore

$$\hat{q} = \mathbf{C}^T\mathbf{P} = \mathbf{C}^T\mathbf{H}q \approx q. \tag{3}$$

On the other hand, this problem can also be written in the context of the following least-squares optimization problem:

$$\min_{q} \|\mathbf{P} - \mathbf{H}q\|_2^2, \tag{4}$$

where $\|\cdot\|_2$ denotes the vector 2-norm. This is an overdetermined problem whose least-squares solution is given by

$$\hat{q} = (\mathbf{H}^H\mathbf{H})^{-1}\mathbf{H}^H\mathbf{P} = \frac{\mathbf{H}^H\mathbf{P}}{\|\mathbf{H}\|_2^2}, \tag{5}$$

where the superscript H denotes Hermitian transpose. Comparison of Eqs. (3) and (5) yields the following optimal inverse filters:

$$\mathbf{C}^T = \frac{\mathbf{H}^H}{\|\mathbf{H}\|_2^2}. \tag{6}$$

If the scalar $\|\mathbf{H}\|_2^2$ is omitted, the inverse filters above reduce to the "phase-conjugated" filters, or the "time-reversed" filters in the free-field context. Specifically, for a point source in the free field, it is straightforward to show that

$$\|\mathbf{H}\|_2^2 = \sum_{m=1}^{M} \frac{1}{r_m^2}, \tag{7}$$

where $r_m$ is the distance between the source and the mth microphone. Since $\|\mathbf{H}\|_2^2$ is a frequency-independent constant, the inverse filters and the time-reversed filters differ only by a constant. In a reverberant environment, these filters are different in general. Being able to incorporate the reverberant characteristics in the measured acoustical plant model represents an advantage of the proposed approach over conventional methods such as the DAS beamformer.
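As a rough numerical illustration of Eqs. (1), (3), (6), and (7), the following Python sketch builds the inverse filters at one frequency for a free-field point source. The response model $h_m = e^{-jkr_m}/r_m$, the microphone distances, the sound speed of 343 m/s, and all function names are assumptions made for this example only and are not taken from the experiments reported here.

```python
import cmath
import math

def free_field_transfer(distances, freq_hz, c=343.0):
    """Assumed free-field point-source model: h_m = exp(-j*k*r_m) / r_m."""
    k = 2.0 * math.pi * freq_hz / c  # wavenumber
    return [cmath.exp(-1j * k * r) / r for r in distances]

def inverse_filters(h):
    """Least-squares inverse filters C^T = H^H / ||H||_2^2, as in Eq. (6)."""
    norm_sq = sum(abs(hm) ** 2 for hm in h)
    return [hm.conjugate() / norm_sq for hm in h]

def apply_filters(c, p):
    """Source estimate q_hat = C^T P, as in Eq. (3)."""
    return sum(cm * pm for cm, pm in zip(c, p))

# Hypothetical geometry: four microphones at slightly different ranges (metres).
distances = [0.42, 0.40, 0.40, 0.42]
h = free_field_transfer(distances, freq_hz=1000.0)
c = inverse_filters(h)

q = 0.5 + 0.25j                  # arbitrary source strength
p = [hm * q for hm in h]         # noise-free pressures, Eq. (1)
q_hat = apply_filters(c, p)      # recovers q up to rounding error
norm_sq = sum(abs(hm) ** 2 for hm in h)   # equals sum of 1/r_m^2, Eq. (7)
```

Under this free-field assumption the scaling term does not depend on frequency, so dropping it leaves the phase-conjugated filters unchanged up to a constant gain.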

In real-time implementation, the inverse filters are converted to time-domain finite-impulse-response (FIR) filters with the aid of the inverse fast Fourier transform (IFFT) and circular shift. Thus, the source signal can be recovered by filtering the pressure signals with the inverse filters $\mathbf{c}(k)$ as follows:

$$\hat{q}(k) = \mathbf{c}^T(k) * \mathbf{p}(k), \tag{8}$$

where k is the discrete-time index, $\mathbf{c}(k)$ is the impulse response of the inverse filter, and "∗" denotes convolution.
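The conversion to FIR filters and the filtering of Eq. (8) can be sketched as follows. This is a minimal illustration under stated assumptions: a naive inverse DFT stands in for the IFFT, the circular shift is taken to be half the filter length, and the function names are hypothetical. The actual shift and filter length used in the implementation are not specified in the text.

```python
import cmath
import math

def idft(spectrum):
    """Naive inverse DFT; O(N^2), adequate for a short illustration."""
    n = len(spectrum)
    return [
        sum(spectrum[b] * cmath.exp(2j * math.pi * b * k / n) for b in range(n)) / n
        for k in range(n)
    ]

def to_causal_fir(spectrum):
    """Inverse-transform a sampled frequency response and rotate it by N/2 taps."""
    taps = [v.real for v in idft(spectrum)]  # assumes a conjugate-symmetric spectrum
    shift = len(taps) // 2                   # assumed shift of half the length
    return taps[-shift:] + taps[:-shift]     # circular shift to the right

def convolve(x, h):
    """Direct-form linear convolution of two real sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def estimate_source(filters, pressures):
    """q_hat(k) = sum over microphones of c_m(k) * p_m(k), as in Eq. (8)."""
    outputs = [convolve(p, c) for c, p in zip(filters, pressures)]
    return [sum(vals) for vals in zip(*outputs)]

# A one-sample advance is non-causal: its impulse wraps to the last tap.
# The circular shift moves it to the middle of the filter, making it causal.
N = 8
advance = [cmath.exp(2j * math.pi * b / N) for b in range(N)]
fir = to_causal_fir(advance)
```

The circular shift trades a fixed processing delay for causality, which is why phase-conjugated (time-reversed) responses can be realized as ordinary FIR filters.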

III. ADAPTIVE GSC-ENHANCED SIMO-ESIF ALGORITHM

The SIMO-ESIF algorithm can be further enhanced by introducing an adaptive GSC to the system. The benefit is twofold. First, the directivity of the array is increased by suppressing the interferences due to side-lobe leakage. Second, the robustness of the array is improved in the face of uncertainties and perturbations. The block diagram of the GSC with M microphones is shown in Fig. 2. The system comprises a fixed beamformer (FBF), a MC, and a BM.11 The FBF aims at forming a beam in the look direction so that the target signal is passed and signals from other directions are rejected. Here $p_m(k)$ is the signal received at the mth microphone and $\hat{q}(k)$ is the output signal of the FBF at the time instant k. The MC consists of multiple adaptive filters that generate replicas of components correlated with the interferences. The components correlated with the output signals $y_m(k)$ of the BM are subtracted from the delayed output signal $\hat{q}(k-Q)$ of the FBF, where Q is the modeling delay in samples. Contrary to the FBF, which produces a main-lobe, the BM forms a null in the look direction so that the target signal is suppressed and all other signals are passed through, hence the name "blocking matrix." The GSC subtracts the interferences that "leak" to the side-lobes in the subtractor output $z(k)$ and effectively improves spatial filtering.

FIG. 1. The block diagram of the SIMO-ESIF algorithm. The parameter $q_m(\omega)$ is the input source, $h_m(\omega)$ is the propagation matrix, and $c_m(\omega)$ is the

A. Formulation of the blocking matrix

The purpose of the GSC depicted in Fig. 3 lies in minimizing the array output power while maintaining unity gain in the look direction (0-deg broadside is assumed here), which can be posed in the following constrained optimization formalism:5

$$\min_{\mathbf{w}} E\{|z|^2\} = \min_{\mathbf{w}} \mathbf{w}^H\mathbf{R}_{pp}\mathbf{w} \tag{9}$$

$$\text{subject to } \mathbf{H}^H\mathbf{w} = 1, \tag{10}$$

where z is the array output signal, $\mathbf{R}_{pp} = E\{\mathbf{p}\mathbf{p}^H\}$ is the data correlation matrix, $E\{\cdot\}$ symbolizes the expected value, $\mathbf{H}$ is the frequency response vector corresponding to the propagation paths from the source to each microphone, and $\mathbf{w}$ is the coefficient vector of the array filters. This constrained optimization problem can be converted into an unconstrained one by decomposing the optimal filter $\mathbf{w}$ into two linearly independent components belonging to two mutually orthogonal subspaces, the constraint range space $R(\mathbf{H})$ and the orthogonal null space $N(\mathbf{H}^H)$:

$$\mathbf{w} = \mathbf{w}_0 - \mathbf{v}, \tag{11}$$

where $\mathbf{w}_0 \in R(\mathbf{H})$ is a fixed filter and $\mathbf{v} = \mathbf{B}\mathbf{w}_a \in N(\mathbf{H}^H)$ with $\mathbf{w}_a$ being an adaptive filter. It follows that

$$\mathbf{H}^H\mathbf{w} = \mathbf{H}^H(\mathbf{w}_0 - \mathbf{B}\mathbf{w}_a) = \mathbf{H}^H\mathbf{w}_0 - \mathbf{H}^H\mathbf{B}\mathbf{w}_a \approx 1. \tag{12}$$

The fixed filter $\mathbf{w}_0$ represents the quiescent component that guarantees the essential performance of beamforming. The filter design is off-line since it is independent of the data correlation matrix. It turns out that the minimization can then be carried out in the orthogonal subspace ($\mathbf{v}$) without impacting the constraint.
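For reference, the unconstrained problem implied by Eqs. (9)–(12) can be written out as below. This is a sketch of the standard GSC argument rather than a derivation given in this paper. It assumes that $\mathbf{H}^H\mathbf{B} = \mathbf{0}$ holds exactly, that $\mathbf{B}^H\mathbf{R}_{pp}\mathbf{B}$ is invertible, and that the quiescent filter is chosen as $\mathbf{w}_0 = \mathbf{H}/\|\mathbf{H}\|_2^2$, which matches the inverse filters of Eq. (6) in the sense that $\mathbf{w}_0^H = \mathbf{C}^T$.

```latex
% Quiescent filter satisfying the constraint of Eq. (10) (assumed choice):
\mathbf{w}_0 = \frac{\mathbf{H}}{\|\mathbf{H}\|_2^2}
\quad\Rightarrow\quad
\mathbf{H}^H\mathbf{w}_0 = 1 .

% Substituting w = w_0 - B w_a into Eq. (9) removes the constraint:
\min_{\mathbf{w}_a}\;
(\mathbf{w}_0 - \mathbf{B}\mathbf{w}_a)^H \mathbf{R}_{pp}
(\mathbf{w}_0 - \mathbf{B}\mathbf{w}_a) .

% Setting the gradient with respect to w_a^* to zero gives the fixed solution:
\mathbf{B}^H\mathbf{R}_{pp}\mathbf{B}\,\mathbf{w}_a
= \mathbf{B}^H\mathbf{R}_{pp}\mathbf{w}_0
\quad\Rightarrow\quad
\mathbf{w}_a = (\mathbf{B}^H\mathbf{R}_{pp}\mathbf{B})^{-1}
\mathbf{B}^H\mathbf{R}_{pp}\mathbf{w}_0 .
```

The closed-form $\mathbf{w}_a$ requires $\mathbf{R}_{pp}$, which is the quantity an adaptive implementation avoids computing.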

Traditionally, various ad hoc blocking matrices have been suggested. These matrices are based on the simple idea that, for free-field plane waves incident from the farfield broadside direction, $\mathbf{H} = [1\ 1 \cdots 1]^H$. Since $\mathbf{H}^H\mathbf{B} = 0$, blocking is ensured if the columns of $\mathbf{B}$ sum up to zero; e.g., subtraction of signals of adjacent channels is a widely used approach. However, for a complex propagation matrix in a reverberant field, these ad hoc blocking matrices are inadequate. As a major distinction between the present approach and the conventional approaches, we shall derive an exact blocking matrix for a more general acoustical environment.

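To fulfill the condition that $\mathbf{B}\mathbf{w}_a \in N(\mathbf{H}^H) \Leftrightarrow \mathbf{H}^H\mathbf{B}\mathbf{w}_a = 0$, the columns of $\mathbf{B}$ must be constructed from the basis vectors of $N(\mathbf{H}^H)$ such that $\mathbf{H}^H\mathbf{B} = 0$. Let

$$\mathbf{H} = [a_1, a_2, \ldots, a_n]^H, \qquad \mathbf{x} = [x_1, x_2, \ldots, x_n]^T \in N(\mathbf{H}^H).$$

Then

$$\mathbf{H}^H\mathbf{x} = 0 \;\Rightarrow\; a_1x_1 + a_2x_2 + \cdots + a_nx_n = 0.$$

If $a_1 \neq 0$,

$$x_1 = -\frac{a_2}{a_1}x_2 - \frac{a_3}{a_1}x_3 - \cdots - \frac{a_n}{a_1}x_n.$$

Let $x_2 = \alpha_2,\ x_3 = \alpha_3,\ \ldots,\ x_n = \alpha_n$, so that

$$x_1 = -\frac{a_2}{a_1}\alpha_2 - \frac{a_3}{a_1}\alpha_3 - \cdots - \frac{a_n}{a_1}\alpha_n$$

and

$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ \vdots \\ x_n \end{bmatrix}
= \alpha_2 \underbrace{\begin{bmatrix} -a_2/a_1 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}}_{\mathbf{v}_2}
+ \alpha_3 \underbrace{\begin{bmatrix} -a_3/a_1 \\ 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}}_{\mathbf{v}_3}
+ \cdots
+ \alpha_n \underbrace{\begin{bmatrix} -a_n/a_1 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}}_{\mathbf{v}_n}.$$

It is not difficult to see that $\mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_n$ are linearly independent and form a basis of the null space $N(\mathbf{H}^H)$. Thus, the matrix $\mathbf{B} = (\mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_n)$ with $\mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_n$ as its columns can be used as the blocking matrix; i.e.,

$$\mathbf{B} = \begin{bmatrix}
-\dfrac{a_2}{a_1} & -\dfrac{a_3}{a_1} & \cdots & -\dfrac{a_n}{a_1} \\
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}. \tag{13}$$

FIG. 2. The block diagram of the GSC, comprised of the FBF, the BM, and the MC.

FIG. 3. The block diagram of the SIMO-ESIF-GSC algorithm. The parameter $\mathbf{p}(k)$ is the microphone signal, $\mathbf{B}^H$ is the BM, and $\mathbf{w}_0$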
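The construction of Eq. (13) is easy to check numerically at a single frequency. The sketch below is illustrative only: the propagation vector is made up, and the function names are hypothetical.

```python
def blocking_matrix(a):
    """Build B of Eq. (13) from the entries a_1..a_n of H^H = [a_1, ..., a_n].

    Returns an n x (n-1) matrix as a list of rows. Requires a_1 != 0.
    """
    n = len(a)
    if n < 2 or a[0] == 0:
        raise ValueError("need at least two channels and a nonzero a_1")
    first_row = [-a[j] / a[0] for j in range(1, n)]
    identity_rows = [
        [1.0 if i == j else 0.0 for j in range(n - 1)] for i in range(n - 1)
    ]
    return [first_row] + identity_rows

def row_times_matrix(row, matrix):
    """Multiply a length-n row vector by an n x m matrix."""
    cols = len(matrix[0])
    return [sum(row[i] * matrix[i][j] for i in range(len(row))) for j in range(cols)]

# Hypothetical propagation vector H at one frequency; a holds the entries of H^H.
H = [1.0 + 0.5j, 0.8 - 0.2j, 0.3 + 0.9j, -0.4 + 0.6j]
a = [h.conjugate() for h in H]
B = blocking_matrix(a)
residual = row_times_matrix(a, B)   # H^H B, which should vanish

# Broadside plane-wave case: every column of B sums to zero.
B_broadside = blocking_matrix([1.0, 1.0, 1.0, 1.0])
```

For the broadside plane-wave case the first row reduces to all −1, so each column subtracts the first channel from one of the others, which recovers the channel-subtraction idea of the ad hoc blocking matrices.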

Physical insights can be gained by observing the beam patterns of the FBF and the BM shown in Fig. 4. Three sine-wave signals at 500 Hz, 1 kHz, and 2 kHz are used to compare the FBF and the BM. In the look direction, the FBF forms a main-lobe, whereas the BM forms a null so that the signal in the look direction is "blocked." The blocked path will attempt to further reduce the noise or interference outside the principal look direction (side-lobes). Note that the formulation above is in the frequency domain. For real-time implementation, the blocking matrix $\mathbf{B}$ needs to be converted to impulse responses using IFFT and circular shift.
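The main-lobe and null behaviour can be reproduced qualitatively with a simple model. The sketch below assumes farfield plane waves on a uniform linear array of four microphones with 0.08 m spacing (the array geometry quoted in Sec. V), a sound speed of 343 m/s, and a quiescent filter $\mathbf{w}_0 = \mathbf{H}/\|\mathbf{H}\|_2^2$. It is not the measured-plant design used to produce Fig. 4, and the function names are hypothetical.

```python
import cmath
import math

def steering_vector(theta_deg, freq_hz, num_mics=4, spacing=0.08, c=343.0):
    """Assumed plane-wave response of a uniform linear array (angle from broadside)."""
    k = 2.0 * math.pi * freq_hz / c
    s = math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * k * m * spacing * s) for m in range(num_mics)]

def fbf_gain(theta_deg, freq_hz):
    """|w0^H h(theta)| with w0 = H / ||H||^2 designed for the 0-degree look direction."""
    look = steering_vector(0.0, freq_hz)
    norm_sq = sum(abs(v) ** 2 for v in look)
    h = steering_vector(theta_deg, freq_hz)
    return abs(sum(l.conjugate() * v for l, v in zip(look, h)) / norm_sq)

def bm_gain(theta_deg, freq_hz):
    """Norm of B^H h(theta), with B built as in Eq. (13) for the look direction."""
    look = steering_vector(0.0, freq_hz)
    a = [v.conjugate() for v in look]   # entries of H^H
    h = steering_vector(theta_deg, freq_hz)
    # Column j of B is [-a_j/a_1, 0, ..., 1, ..., 0]^T, so
    # (B^H h)_j = conj(-a_j/a_1) * h_1 + h_j.
    outputs = [(-a[j] / a[0]).conjugate() * h[0] + h[j] for j in range(1, len(h))]
    return math.sqrt(sum(abs(o) ** 2 for o in outputs))

def to_db(x, floor=1e-12):
    """Convert a linear gain to decibels, with a floor to avoid log(0)."""
    return 20.0 * math.log10(max(x, floor))

# Sweep the directions of arrival for one of the test frequencies.
angles = list(range(-60, 61, 5))
fbf_pattern = [to_db(fbf_gain(t, 1000.0)) for t in angles]
bm_pattern = [to_db(bm_gain(t, 1000.0)) for t in angles]
```

In this idealized model the FBF gain at 0 deg is exactly unity and the BM output is exactly zero, so any look-direction leakage observed in practice would come from mismatch between the assumed and the actual propagation paths.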

B. Multiple-input canceller

In practice, the GSC is implemented using adaptive filters that are generally more robust than fixed filters. The need to compute the data correlation matrix $\mathbf{R}_{pp}$ is eliminated using such an approach. For example, the leaky adaptive filters (LAFs) (Ref. 16) can be used in the MC block. LAFs subtract the components correlated with $y_n(k)$, $(n = 0, \ldots, N-1)$, from $\hat{q}(k-Q)$, where Q is the modeling delay for causality. Let $M_2$ be the number of taps in each LAF and $\mathbf{w}_n(k)$ and $\mathbf{y}_n(k)$ be the coefficient vector and the signal vector of the nth LAF, respectively. The output of the MC module can be written as

$$z(k) = \hat{q}(k-Q) - \sum_{n=0}^{N-1} \mathbf{w}_n^T(k)\,\mathbf{y}_n(k), \tag{14}$$

$$\mathbf{w}_n(k) \triangleq [w_{n,0}(k), w_{n,1}(k), \ldots, w_{n,M_2-1}(k)]^T, \tag{15}$$

$$\mathbf{y}_n(k) \triangleq [y_n(k), y_n(k-1), \ldots, y_n(k-M_2+1)]^T. \tag{16}$$

The filter coefficients can be updated using the LMS algorithm

$$\mathbf{w}_n(k+1) = \mathbf{w}_n(k) + \mu\,z(k)\,\mathbf{y}_n(k), \tag{17}$$

where $\mu$ is the step size.
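A compact time-domain sketch of Eqs. (14)–(17) is given below. The optional `leak` argument is a hypothetical addition illustrating one common form of leaky LMS; with `leak=0.0` the update is exactly Eq. (17). The toy signals, the step size, and the number of taps are chosen for illustration and do not correspond to the experimental settings.

```python
import random

def multiple_input_canceller(q_hat, y, num_taps, mu, delay=0, leak=0.0):
    """Subtract components correlated with the BM outputs from the delayed FBF output.

    q_hat : FBF output samples
    y     : list of BM output channels, each as long as q_hat
    delay : modeling delay Q in samples
    leak  : hypothetical leakage factor; 0.0 gives the plain LMS update of Eq. (17)
    """
    num_ch = len(y)
    w = [[0.0] * num_taps for _ in range(num_ch)]
    z = []
    for k in range(len(q_hat)):
        # Tap-delay-line vectors y_n(k) = [y_n(k), ..., y_n(k - M2 + 1)], Eq. (16)
        taps = [
            [y[n][k - i] if k - i >= 0 else 0.0 for i in range(num_taps)]
            for n in range(num_ch)
        ]
        desired = q_hat[k - delay] if k - delay >= 0 else 0.0
        estimate = sum(
            w[n][i] * taps[n][i] for n in range(num_ch) for i in range(num_taps)
        )
        zk = desired - estimate  # Eq. (14)
        for n in range(num_ch):
            for i in range(num_taps):
                w[n][i] = (1.0 - mu * leak) * w[n][i] + mu * zk * taps[n][i]
        z.append(zk)
    return z, w

# Toy check: the FBF output contains only a scaled copy of the interference
# seen at the BM output, so the canceller should learn the scale factor.
rng = random.Random(0)
interference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
fbf_output = [0.7 * v for v in interference]
z, w = multiple_input_canceller(fbf_output, [interference], num_taps=2, mu=0.05)
```

In this toy case the first coefficient converges towards 0.7 and the residual `z` decays towards zero. With target speech present in `q_hat`, the same update removes only what is correlated with the BM outputs.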

In Fig. 5, the beam pattern at 500 Hz of the proposed adaptive GSC algorithm is compared with two other conventional algorithms: GJBF (Ref. 7) and LAF-LAF.16 The GJBF algorithm adopts subtracted signals of adjacent channels as its BM block, whereas the LAF-LAF algorithm uses adaptive filters to block the target signals. Both algorithms use an adaptive algorithm identical to the MC block in Eq. (17). As clearly seen in Fig. 5, the proposed adaptive GSC algorithm attains the sharpest beam in the look direction with minimum side-lobes.

FIG. 4. The directivity pattern of the SIMO-ESIF-GSC algorithm at different frequencies (500 Hz, 1000 Hz, and 2000 Hz), plotted as gain (dB) versus direction of arrival (degrees). (a) FBF with a main-lobe at the look direction. (b) BM with a null at the look direction.

FIG. 5. The comparison of the beam patterns at 500 Hz obtained using the GJBF, LAF-LAF, and SIMO-ESIF-GSC algorithms, plotted as gain (dB) versus direction of arrival (degrees).

IV. ARRAY PERFORMANCE MEASURES

In this section, objective performance measures are defined for evaluating the array performance.1 With the first microphone as the reference, the input signal-to-noise ratio (SNR) is defined as

$$\mathrm{SNR}_1\,(\mathrm{dB}) = 10\log\frac{E\{x_1(k)^2\}}{E\{v_1(k)^2\}}, \tag{18}$$

where k is the discrete-time index, and $x_1(k)$ and $v_1(k)$ are the speech signal and the noise, respectively, received at microphone 1. The output SNR can also be defined for the array output

$$\mathrm{SNR}_A\,(\mathrm{dB}) = 10\log\frac{E\{|\mathbf{c}(k)^T * \mathbf{x}(k)|^2\}}{E\{|\mathbf{c}(k)^T * \mathbf{v}(k)|^2\}}, \tag{19}$$

where $\mathbf{c}(k)$ is the impulse response of the inverse filter and "∗" denotes convolution. Hence, the SNR gain (SNRG) is obtained by subtracting the input SNR from the output SNR:

$$\mathrm{SNRG}\,(\mathrm{dB}) = \mathrm{SNR}_A - \mathrm{SNR}_1. \tag{20}$$

The SNRG quantifies the noise reduction performance due to array processing. However, noise reduction comes at the price of speech distortion in general. To assess speech distortion, a speech-distortion index (SDI) is defined as

$$\mathrm{SDI}\,(\mathrm{dB}) = 10\log\frac{E\{x_1(k)^2\}}{E\{|x_1(k) - \mathbf{c}(k)^T * \mathbf{x}(k)|^2\}}. \tag{21}$$

It is impractical to maximize both indices at the same time. The aim of array processing is then to reach the best compromise between the two indices.
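The four measures can be computed from separately recorded speech and noise components, as in the following sketch. The synthetic signals and the function names are hypothetical, expectations are replaced by sample means, and base-10 logarithms are assumed.

```python
import math

def mean_power(x):
    """Sample estimate of E{x^2}."""
    return sum(v * v for v in x) / len(x)

def snr_db(speech, noise):
    """10*log10(E{speech^2} / E{noise^2}), as in Eqs. (18) and (19)."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))

def snr_gain_db(snr_in, snr_out):
    """SNRG = SNR_A - SNR_1, Eq. (20)."""
    return snr_out - snr_in

def sdi_db(reference_speech, processed_speech):
    """Speech-distortion index of Eq. (21); larger values mean less distortion."""
    err = [r - p for r, p in zip(reference_speech, processed_speech)]
    return 10.0 * math.log10(mean_power(reference_speech) / mean_power(err))

# Synthetic example: the array passes 90% of the speech amplitude and
# 10% of the noise amplitude (made-up numbers for illustration).
x1 = [math.sin(0.05 * k) for k in range(1000)]               # speech at microphone 1
v1 = [0.5 if k % 2 == 0 else -0.5 for k in range(1000)]      # noise at microphone 1
x_out = [0.9 * s for s in x1]                                # speech at array output
v_out = [0.1 * n for n in v1]                                # noise at array output

snr1 = snr_db(x1, v1)
snra = snr_db(x_out, v_out)
snrg = snr_gain_db(snr1, snra)
sdi = sdi_db(x1, x_out)
```

With these definitions, Eq. (20) applied to the ESIF-GSC column of Table III gives 12.72 − (−1.04) = 13.76 dB, which matches the tabulated SNRG.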

V. OBJECTIVE AND SUBJECTIVE PERFORMANCE EVALUATIONS

The proposed algorithms have been examined experimentally in the vehicle compartment of a 2-L sedan. Figure 6 shows the experimental arrangement inside the car. Array signal processing algorithms are all implemented on National Instruments LABVIEW 8.6 and an NI-PXI 8105 data acquisition system.18 The sampling rate is 8 kHz. The sound pressure data were picked up by using a four-microphone (PCB 130D20) linear uniform array with inter-element spacing of 0.08 m. A loudspeaker positioned at (0.4 m, 0-deg) with respect to the array center was used to broadcast a clip of male speech in English, while another loudspeaker positioned at (0.3 m, 53-deg) was used to generate white noise as the interference.

Objective and subjective experiments were undertaken to evaluate the proposed methods. The SIMO-ESIF algorithm is used as the FBF, and 512-tap adaptive filters with step size $\mu = 0.001$ are used in the MC and LAF. There are variations of the SIMO-ESIF algorithm, depending on the plant model used and the filtering method in the FBF, as summarized in Table I. Two kinds of plant models, the free-field point source model and the measured plant in the car, are employed for designing the inverse filters. Two filtering methods, the inverse filtering and the time-reversed filtering, are employed in the FBF design. In addition, three variations of the processing methods with GSC are also included in Table I.

A. Objective evaluation

The objective measures SNR1, SNRA, SNRG, and SDI are employed to assess the performance of the six proposed algorithms. The experimental results are summarized in Table II. Comparing the SIMO-ESIF and the SIMO-ESIF-GSC algorithms, the algorithms with GSC have attained significantly higher noise reduction (SNRG) and lower speech distortion (SDI) than the algorithms without GSC. The time-reversed filters in general yield inferior performance to the inverse filters. The inverse filtering with the measured plant model considerably outperforms the point source model, e.g., SNRG of GSC-MIF = 15.41 dB vs SNRG of GSC-PIF = 11.49 dB. The implication of this result is that the inverse filters based on measured plant models have provided a "de-reverberation" effect in addition to noise reduction. Although the point source model-based inverse filtering (PIF) method tends to yield the least distortion, its noise reduction performance is also the worst. Figure 7 compares the time-domain wave forms obtained using the SIMO-ESIF algorithm with and without GSC. Evidently, introduction of GSC has a positive impact on the noise reduction performance of the array.

Table III compares the proposed adaptive GSC algorithm and two other conventional algorithms, GJBF (Ref. 7) and CCAF.11 The GJBF algorithm subtracts signals of adjacent channels as its BM block, whereas the coefficient-constrained adaptive filtering (CCAF) algorithm uses constrained adaptive filters to block the target signals. Both algorithms use the adaptive algorithm identical to the MC block. The result revealed that the SIMO-ESIF algorithm augmented with the GSC outperformed the SIMO-ESIF algorithm without GSC. Among the GSC-based algorithms, the proposed GSC attained the highest SNRG and thus performed the best in noise reduction.

FIG. 6. The experimental arrangement for validating the SIMO-ESIF algorithms, showing the target source, the noise source, and the microphone array. (a) The test car. (b) The experimental arrangement inside the car.

TABLE I. The acronyms and descriptions of six SIMO-ESIF algorithms.

Algorithm        Acronym    Description
SIMO-ESIF        PIF        Point source model-based inverse filtering
                 MIF        Measured plant-based inverse filtering
                 MTR        Measured plant-based time-reversed filtering
SIMO-ESIF-GSC    GSC-PIF    Point source model-based inverse filtering
                 GSC-MIF    Measured plant-based inverse filtering
                 GSC-MTR    Measured plant-based time-reversed filtering

B. Subjective evaluation

Apart from the preceding objective tests, listening tests were conducted according to ITU-R BS.1116 (Ref. 19) to validate the algorithms. Subjective perception of the proposed algorithms was evaluated in terms of noise reduction and speech distortion. Specifically, three subjective attributes including signal distortion (SIG), background intrusiveness (BAK), and overall quality (OVL) were employed in the test. Fourteen participants in the listening tests were instructed with definitions of the subjective attributes and the procedures prior to the test. The subjective attributes are measured on an integer scale from 1 to 5. The participants were asked to respond in a questionnaire after listening. The six proposed algorithms previously used in the objective test are compared in the listening test. The test signals and conditions remain the same as in the preceding objective tests. A reference signal and an anchor signal are required in ITU-R BS.1116. In the test, the unprocessed signal received at the first microphone was used as the reference, while the lowpass-filtered reference was used as the hidden anchor. The mean and spread of the results of the listening test are illustrated in Fig. 8. In order to assess statistical significance of the results, the test data were processed using MANOVA (Ref. 17) with significance levels summarized in Table IV. Cases with significance levels below 0.05 indicate that a statistically significant difference exists among the algorithms. In particular, the difference of the indices SIG and BAK among the six proposed methods was found to be statistically significant. Multiple regression analysis was applied to analyze the linear dependence of the OVL on the SIG and BAK. The regression model was found to be OVL = 1.71 + 0.2 × SIG + 0.28 × BAK. It revealed that the SIG has comparable but only slightly higher influence on the OVL than the BAK, whereas the indices SIG and the BAK are normally trade-offs. This explains why no significant difference can be found among methods in the OVL.

TABLE II. The objective performance summary of the six algorithms.

SIMO-ESIF      PIF                MIF                MTR
GSC            Without   With     Without   With     Without   With
SNR1 (dB)      3.79      3.79     3.79      3.79     3.79      3.79
SNRA (dB)      12.96     15.28    15.56     19.19    13.58     13.66
SNRG (dB)      9.16      11.49    11.77     15.41    9.78      9.87
SDI (dB)       2.87      2.60     1.72      1.59     0.86      1.56

FIG. 7. The comparison of the MIF algorithm and the SIMO-ESIF-GSC-MIF algorithm by experimental measurement, for the unprocessed, MIF, and GSC-MIF signals. (a) The time-domain wave forms, plotted as amplitude (V) versus time (samples). (b) The power spectral density functions (Welch estimate), plotted as power/frequency (dB/Hz) versus frequency (kHz).

After the MANOVA, a post hoc Fisher's LSD test was employed to perform multiple paired comparisons. In Fig. 8, as opposed to the results of objective evaluation, the GSC-MIF algorithm did not perform quite as expected in SIG. The price paid for the high noise reduction seems to be the signal distortion, which was noticed by many subjects. For the SIG index, the results of the post hoc test reveal that the GSC-PIF method outperforms the other methods. For the BAK index, the GSC-MIF method received the highest grade among all methods, which means that the inverse filtering approach has achieved both de-reverberation and noise reduction successfully. Despite the excellent performance in SIG, the PIF algorithm received low scores in BAK, which is consistent with the observation in the objective test. On the other hand, the GSC-PIF algorithm received a higher SIG grade than the plain PIF algorithm, indicating that the GSC algorithm enhanced the SIMO-ESIF algorithm. However, the grades in the SIG and BAK indices showed no significant difference between the measured plant-based time-reversed filtering (MTR) and GSC-MTR algorithms. In terms of the BAK grade, all the proposed methods performed better than the reference signal. Figure 9 compares the proposed GSC algorithm, GJBF,6 and CCAF (Ref. 11) algorithms. The proposed GSC algorithm attained the highest BAK grades, while it also yielded lower SIG grades than the other algorithms. Apparently, the proposed GSC attained the best performance in noise reduction at the expense of signal distortion. This trade-off between signal distortion and noise reduction performance is typical of speech enhancement algorithms in general.

VI. CONCLUSIONS

A SIMO-ESIF microphone array technique has been developed for noisy automotive environments. Speech communication quality has been improved owing to the noise reduction and de-reverberation functions provided by the proposed system. With the use of the specially derived BM of the GSC, the performance of SIMO-ESIF has been further enhanced.

The proposed algorithms have been validated via extensive objective and subjective tests. Overall, the results reveal that both de-reverberation and noise reduction can be achieved by using the SIMO-ESIF techniques. The methods exhibit different degrees in trading off noise reduction performance and speech-distortion quality. The MIF and GSC-MIF algorithms seem to have achieved a satisfactory compromise between these two attributes. All this leads to the conclusion that SIMO-ESIF-GSC-MIF proves effective in reducing noise and interference without markedly compromising speech quality.

TABLE IV. The MANOVA output of the listening test of the six algorithms. Cases with significance value p below 0.05 indicate that statistically significant difference exists among all methods.

               Significance value p
Noise type     SIG      BAK      OVL
White noise    0.000    0.000    0.847

FIG. 8. The MANOVA output of the subjective listening test for the six SIMO-ESIF algorithms. Three subjective attributes including signal distortion (SIG), background intrusiveness (BAK), and overall quality (OVL) were evaluated in the test.

FIG. 9. The MANOVA output of the subjective listening test for the different GSC algorithms. Three subjective attributes including signal distortion (SIG), background intrusiveness (BAK), and overall quality (OVL) were evaluated in the test.

TABLE III. The objective performance summary of the four beamforming algorithms including the ESIF, ESIF-GSC, GJBF, and CCAF algorithms.

                   MIF
Objective index    ESIF     ESIF-GSC    GJBF     CCAF
SNR1 (dB)          −1.04    −1.04       −1.04    −1.04
SNRA (dB)          6.20     12.72       10.27    9.92
SNRG (dB)          7.24     13.76       11.31    10.96

ACKNOWLEDGMENT

The work was supported by the National Science Council of Taiwan, Republic of China, under Project No. NSC 97-2221-E-009-010-MY3.

1 J. Benesty, J. Chen, and Y. Huang, Microphone Arrays Signal Processing (Springer, New York, 2008).

2 J. Bitzer, K. U. Simmer, and K. D. Kammeyer, "Multi-microphone noise reduction techniques for hands-free speech recognition—A comparative study," in Robust Methods for Speech Recognition in Adverse Conditions (ROBUST99), Tampere, Finland (1999), pp. 171–174.

3 J. Bitzer, K. D. Kammeyer, and K. U. Simmer, "An alternative implementation of the superdirective beamformer," Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, October (1999).

4 H. L. Van Trees, Optimum Array Processing (Wiley, New York, 2002).

5 H. Cox, R. M. Zeskind, and M. M. Owen, "Robust adaptive beamforming," IEEE Trans. Acoust., Speech, Signal Process. ASSP-35, 1365–1376 (1987).

6 N. L. Owsley, "Source location with an adaptive antenna array," Technical Report No. AD0719896, Naval Underwater Systems Center, National Technical Information Service, Springfield, VA, 1971.

7 L. J. Griffiths and C. W. Jim, "An alternative approach to linear constrained adaptive beamforming," IEEE Trans. Antennas Propag. 30, 27–34 (1982).

8 J. Bitzer, K. U. Simmer, and K. D. Kammeyer, "Multichannel noise reduction—Algorithms and theoretical limits," in Proceedings of the EURASIP European Signal Proceeding Conference (EUSIPCO), Rhodes, Greece, September (1998), Vol. 1, pp. 105–108.

9 M. R. Bai and J. H. Lin, "Source identification system based on the time-domain nearfield equivalence source imaging: Fundamental theory and implementation," J. Sound Vib. 307, 202–225 (2007).

10 M. Brandstein and D. Ward, Microphone Arrays (Springer, New York, 2001).

11 O. Hoshuyama, A. Sugiyama, and A. Hirano, "A robust adaptive beamformer for microphone array with a blocking matrix using constrained adaptive filters," IEEE Trans. Signal Process. 47, 2677–2684 (1999).

12 Y. Grenier, "A microphone array for car environment," Speech Commun. 12, 25–39 (1993).

13 O. L. Frost III, "An algorithm for linearly-constrained adaptive array processing," Proc. IEEE 60, 926–935 (1972).

14 G. F. Franklin, M. L. Workman, and D. Powell, Feedback Control of Dynamic Systems, 2nd ed. (Addison-Wesley, Boston, MA, 1993).

15 I. Claesson and S. Nordholm, "A spatial filtering approach to robust adaptive beamforming," IEEE Trans. Antennas Propag. 40, 1093–1096 (1992).

16 O. Hoshuyama and A. Sugiyama, "A robust generalized sidelobe canceller with a blocking matrix using leaky adaptive filters," Electron Commun. Jpn. 80, 56–65 (1997).

17 S. Sharma, Applied Multivariate Techniques (Wiley, New York, 1996).

18 National Instruments, http://sine.ni.com/nips/cds/view/p/lang/zht/nid/202630 (Last viewed 7/17/2009).

19 ITU-R Rec. BS.1116-1, "Methods for the subjective assessment of small impairments in audio systems including multichannel sound systems," International Telecommunications Union, Geneva, Switzerland, 1994.
